The Lottery Ticket Hypothesis
On Sparse, Trainable Neural Networks (Jonathan Frankle)
Neural networks are large
Prune: to reduce the extent of a neural network by removing unwanted parts
Goal: reduce the cost of inference
Structure: what do we remove? Neurons, attention heads, weights, blocks?
Focus here: pruning individual weights
Superfluous: how do we decide which parts are unwanted? By magnitudes? Gradients? Activations?
Fine-tune / retrain: after pruning, the surviving weights are trained further to recover accuracy (a minimal sketch of magnitude pruning and mask application follows below)
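The magnitude criterion is the simplest of these: keep the largest-magnitude weights, zero out the rest, and retrain the survivors. Below is a minimal PyTorch sketch of global magnitude pruning; the function names (`global_magnitude_masks`, `apply_masks`) and the choice to skip biases are illustrative assumptions, not the exact pipeline from the talk.

```python
import torch
import torch.nn as nn


def global_magnitude_masks(model: nn.Module, sparsity: float) -> dict:
    """Binary masks that zero out the smallest-magnitude fraction of weights, scored globally."""
    # Only consider weight matrices/tensors; biases are left untouched for simplicity.
    weights = {name: p for name, p in model.named_parameters() if p.dim() > 1}
    scores = torch.cat([p.detach().abs().flatten() for p in weights.values()])
    k = int(sparsity * scores.numel())  # number of weights to prune
    threshold = torch.kthvalue(scores, k).values if k > 0 else scores.new_tensor(-1.0)
    return {name: (p.detach().abs() > threshold).float() for name, p in weights.items()}


def apply_masks(model: nn.Module, masks: dict) -> None:
    """Zero out pruned weights in place."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])
```

During fine-tuning, `apply_masks` would be called after every optimizer step so that pruned weights stay at zero.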
Cost
Financial cost
Environmental cost (CO2)
Computational cost
We prune models after training; can we instead train smaller models from the start?
Capacity needed to represent a function vs. capacity needed to optimize (train) it
Yes, such smaller networks exist in principle
But the sparse structure is found only in hindsight, by pruning after (or at best early in) training the full network
And the pruned architecture, retrained from scratch, does not reach full accuracy
Contribution: you need to keep the original initializations
IMP (iterative magnitude pruning) finds subnetworks that can train from the start to full accuracy at non-trivial sparsities
Random subnetworks cannot train to full accuracy
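To make the IMP loop concrete, here is a minimal sketch of iterative magnitude pruning with rewinding to the original initialization. It assumes a caller-supplied `train_fn` that trains the model to completion while holding masked weights at zero, and a `prune_fn` that scores weights by magnitude (e.g. the hypothetical `global_magnitude_masks` helper sketched earlier); these names are illustrative, not from the original talk.

```python
import copy
from typing import Callable, Dict, Optional

import torch
import torch.nn as nn


def iterative_magnitude_pruning(
    model: nn.Module,
    train_fn: Callable[[nn.Module, Optional[Dict[str, torch.Tensor]]], None],
    prune_fn: Callable[[nn.Module, float], Dict[str, torch.Tensor]],
    rounds: int = 5,
    prune_fraction: float = 0.2,
) -> Dict[str, torch.Tensor]:
    """Sketch of IMP with rewinding: train, prune the smallest surviving weights,
    reset the survivors to their original initialization, and repeat."""
    init_state = copy.deepcopy(model.state_dict())  # theta_0: the original initialization
    masks: Optional[Dict[str, torch.Tensor]] = None
    sparsity = 0.0
    for _ in range(rounds):
        train_fn(model, masks)  # train to completion; pruned weights are held at zero
        # Remove prune_fraction of the weights that survived the previous round.
        sparsity = 1.0 - (1.0 - sparsity) * (1.0 - prune_fraction)
        masks = prune_fn(model, sparsity)  # score by magnitude of the trained weights
        model.load_state_dict(init_state)  # rewind surviving weights to theta_0
    return masks  # the masks define the candidate winning-ticket subnetwork
```

Training the subnetwork then means loading the original initialization, applying the final masks, and running the normal training loop; the random-subnetwork control does the same but with a fresh random initialization instead of theta_0.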