The Lottery Ticket Hypothesis

On Sparse, Trainable Neural Networks (Jonathan Frankle)

Background: Neural Network Pruning

  • Neural networks are large

  • Prune: to reduce the extent of a neural network by removing unwanted parts

  • Goal: reduce the cost of inference

  • Structure: what to prune?

    • Neurons, attention heads, weights, blocks?

    • Focus on pruning weights

  • Superfluous: which parts are unnecessary?

    • Magnitudes? Gradients? Activations?

  • Fine-tune

    • Retrain after pruning to recover accuracy
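The recipe above (train, zero out the weights judged superfluous by magnitude, then fine-tune) can be sketched in NumPy. This is an illustrative sketch, not the paper's exact procedure; `magnitude_prune` and the percentile threshold are assumed names and choices:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude weights.

    weights:  trained parameters (any shape)
    sparsity: fraction of weights to remove, e.g. 0.8
    Returns the pruned weights and the binary mask.
    """
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = (np.abs(weights) > threshold).astype(weights.dtype)
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(100,))
pruned, mask = magnitude_prune(w, 0.8)
# 80% of the weights are zeroed; fine-tuning would then continue
# training with the mask held fixed.
```

Magnitude is only one possible saliency criterion (gradients and activations are alternatives, per the bullets above), but it is the one the lottery-ticket work uses.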


Training is expensive

  • Cost

    • Financial cost

    • Environmental cost (CO2)

    • Computational cost

Research Question

  • If we can prune models after training, can we train smaller models from the start?

    • Capacity for representation vs. optimization

  • Yes

    • Trainable subnetworks exist early in training

Lottery Ticket Hypothesis

  • Hindsight: the winning subnetwork is identified only after training the full network

  • With random reinitialization, the subnetwork does not reach full accuracy

  • Contribution: you need to keep the original initializations
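The hypothesis can be stated formally, following Frankle & Carbin's paper (notation sketched here: m is a binary pruning mask, theta_0 the original initialization):

```latex
For a dense network $f(x; \theta)$ with initialization
$\theta = \theta_0 \sim \mathcal{D}_\theta$, there exists a mask
$m \in \{0, 1\}^{|\theta|}$ with $\|m\|_0 \ll |\theta|$ such that the
subnetwork $f(x; m \odot \theta_0)$, trained in isolation, matches the
accuracy of the full network in at most as many training iterations.
```

The $m \odot \theta_0$ term is the contribution noted above: the mask is applied to the original initialization, not a fresh random one.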

Iterative Magnitude Pruning (IMP)


  • IMP finds subnetworks that can train from the start to full accuracy at non-trivial sparsities

    • Random subnetworks cannot train to full accuracy
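The IMP loop (train, prune a fraction of the smallest surviving weights, rewind to the original initialization, repeat) can be sketched as a toy NumPy implementation. `noisy_train` is a stand-in for a full SGD run, and the 20%-per-round rate is one typical setting, not a prescription:

```python
import numpy as np

def imp(theta0, train, prune_frac=0.2, rounds=5):
    """Iterative Magnitude Pruning (sketch).

    theta0:     initial weights, kept around so each round can rewind
    train:      function mapping weights -> trained weights
                (stands in for a full training run)
    prune_frac: fraction of the *surviving* weights pruned per round
    """
    mask = np.ones_like(theta0)
    for _ in range(rounds):
        # 1. Train the current subnetwork from the original init.
        trained = train(theta0 * mask) * mask
        # 2. Prune the smallest-magnitude surviving weights.
        alive = np.abs(trained[mask == 1])
        threshold = np.quantile(alive, prune_frac)
        mask = mask * (np.abs(trained) > threshold)
        # 3. Rewind: the next round restarts from theta0 (the key step
        #    distinguishing IMP from prune-and-fine-tune).
    return mask, theta0 * mask

rng = np.random.default_rng(0)
theta0 = rng.normal(size=200)
noisy_train = lambda w: w + 0.01 * rng.normal(size=w.shape)  # toy "training"
mask, ticket = imp(theta0, noisy_train)
# After 5 rounds at 20% per round, roughly 0.8^5 ~ 33% of weights survive.
```

Pruning iteratively rather than in one shot is what lets IMP reach the non-trivial sparsities mentioned above while still training to full accuracy.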
