The Lottery Ticket Hypothesis
On Sparse, Trainable Neural Networks (Jonathan Frankle)
Background: Neural Network Pruning
Neural networks are large
Prune: to reduce the extent of a neural network by removing unwanted parts
Goal: reduce the cost of inference
Structure
Neurons, attention heads, weights, blocks?
Focus on pruning weights
Which weights are superfluous?
Magnitudes? Gradients? Activations?
Fine-tune after pruning
Retrain to recover accuracy
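The standard criterion above (magnitude) can be sketched in a few lines. This is a minimal, framework-free illustration with numpy: zero out the smallest-magnitude fraction of weights and return a binary mask of survivors. The function name and interface are illustrative, not from the paper.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of `weights`.

    Returns (pruned_weights, mask), where mask is 1 for surviving weights.
    Ties at the threshold may prune slightly more than requested.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to remove
    if k == 0:
        mask = np.ones_like(weights)
    else:
        # k-th smallest magnitude becomes the pruning threshold
        threshold = np.partition(flat, k - 1)[k - 1]
        mask = (np.abs(weights) > threshold).astype(weights.dtype)
    return weights * mask, mask
```

In practice the same idea is applied per layer (or globally across layers) to a trained network, and the surviving weights are then fine-tuned.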
Contribution
Training is expensive
Cost
Financial cost
Environmental cost (CO2)
Computational cost
Research Question
If we can prune models after training, can we train smaller models from the start?
Capacity for representation vs. optimization
Yes
Early in training
Lottery Ticket Hypothesis
A randomly initialized dense network contains a subnetwork that, trained in isolation, can match the accuracy of the full network
Hindsight: the winning subnetwork is only identified after training
A pruned subnetwork with re-randomized weights does not reach full accuracy
Contribution: you need to keep the original initializations
Iterative Magnitude Pruning (IMP)
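The IMP procedure can be sketched as: train, prune a fraction of the smallest-magnitude surviving weights, rewind survivors to their original initialization, and repeat. A minimal numpy sketch, where `train_fn` is an assumed stand-in for a full training run (not part of the original slides):

```python
import numpy as np

def iterative_magnitude_pruning(init_weights, train_fn, rounds=3, prune_rate=0.2):
    """Sketch of IMP with rewinding to the original initialization.

    train_fn(weights, mask) is an assumed placeholder for training the
    masked network; it must return trained weights of the same shape.
    """
    mask = np.ones_like(init_weights)
    weights = init_weights.copy()
    for _ in range(rounds):
        trained = train_fn(weights, mask) * mask
        # Prune prune_rate of the *remaining* (unmasked) weights by magnitude.
        alive = np.abs(trained[mask == 1])
        k = int(prune_rate * alive.size)
        if k > 0:
            threshold = np.partition(alive, k - 1)[k - 1]
            mask = mask * (np.abs(trained) > threshold)
        # Rewind: surviving weights return to their initial values.
        weights = init_weights * mask
    return weights, mask
```

Pruning a fixed fraction of the survivors each round compounds the sparsity (e.g. three rounds at 20% leave roughly 0.8³ ≈ 51% of the weights), which is why IMP reaches high sparsities gradually rather than in one shot.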
Result
IMP finds subnetworks that can train from the start to full accuracy at non-trivial sparsities
Randomly chosen subnetworks cannot train to full accuracy