The Lottery Ticket Hypothesis
On Sparse, Trainable Neural Networks (Jonathan Frankle)
Background: Neural Network Pruning
Neural networks are large
Prune: to reduce the extent of a neural network by removing unwanted parts
Goal: reduce the cost of inference
Structure
Neurons, attention heads, weights, blocks?
Focus on pruning weights
Which weights are superfluous?
Magnitudes? Gradients? Activations?
Fine-tune after pruning
Retrain to recover accuracy
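The standard criterion above (magnitude) can be sketched in a few lines. This is a minimal, framework-free illustration with numpy: zero out the smallest-magnitude fraction of weights and return a binary mask of survivors. The function name and interface are illustrative, not from the paper.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of `weights`.

    Returns (pruned_weights, mask), where mask is 1 for surviving weights.
    Ties at the threshold may prune slightly more than requested.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to remove
    if k == 0:
        mask = np.ones_like(weights)
    else:
        # k-th smallest magnitude becomes the pruning threshold
        threshold = np.partition(flat, k - 1)[k - 1]
        mask = (np.abs(weights) > threshold).astype(weights.dtype)
    return weights * mask, mask
```

In practice the same idea is applied per layer (or globally across layers) to a trained network, and the surviving weights are then fine-tuned.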
Contribution
Training is expensive
Cost
Financial cost
Environmental cost (CO2)
Computational cost
Research Question
If we can prune models after training, can we train smaller models from the start?
Capacity for representation vs. optimization
Yes
Early in training
Lottery Ticket Hypothesis
A randomly initialized dense network contains a subnetwork that, trained in isolation, can match the accuracy of the full network
Hindsight: the winning subnetwork is only identified after training
A pruned subnetwork with re-randomized weights does not reach full accuracy
Contribution: you need to keep the original initializations
Iterative Magnitude Pruning (IMP)
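The IMP procedure can be sketched as: train, prune a fraction of the smallest-magnitude surviving weights, rewind survivors to their original initialization, and repeat. A minimal numpy sketch, where `train_fn` is an assumed stand-in for a full training run (not part of the original slides):

```python
import numpy as np

def iterative_magnitude_pruning(init_weights, train_fn, rounds=3, prune_rate=0.2):
    """Sketch of IMP with rewinding to the original initialization.

    train_fn(weights, mask) is an assumed placeholder for training the
    masked network; it must return trained weights of the same shape.
    """
    mask = np.ones_like(init_weights)
    weights = init_weights.copy()
    for _ in range(rounds):
        trained = train_fn(weights, mask) * mask
        # Prune prune_rate of the *remaining* (unmasked) weights by magnitude.
        alive = np.abs(trained[mask == 1])
        k = int(prune_rate * alive.size)
        if k > 0:
            threshold = np.partition(alive, k - 1)[k - 1]
            mask = mask * (np.abs(trained) > threshold)
        # Rewind: surviving weights return to their initial values.
        weights = init_weights * mask
    return weights, mask
```

Pruning a fixed fraction of the survivors each round compounds the sparsity (e.g. three rounds at 20% leave roughly 0.8³ ≈ 51% of the weights), which is why IMP reaches high sparsities gradually rather than in one shot.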
Result
IMP finds subnetworks that can train from the start to full accuracy at non-trivial sparsities
Randomly chosen subnetworks cannot train to full accuracy