# The Lottery Ticket Hypothesis

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MXEEj5bMiYwx_YOJRfc%2F-MXEEwt7RpPToTN5v-vn%2Fimage.png?alt=media\&token=ee138dea-8aa7-4896-8a91-2e83d98b07bf)

### Background: Neural Network Pruning

* Neural networks are large
* Prune: remove parts of a trained network that are deemed unnecessary
* Goal: reduce the cost of inference

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MXEFG17i5zxzIF1CkuZ%2F-MXEFfIB-ehsP827PKsC%2Fimage.png?alt=media\&token=8a4a69d9-dce4-4469-8d45-03d14cf2cd32)

* Which structures to prune?
  * Neurons, attention heads, weights, blocks?
  * The focus here is on pruning individual weights
* How to decide which parts are superfluous?
  * Magnitudes? Gradients? Activations?
* What happens after pruning?
  * Fine-tune or retrain the remaining weights (see the sketch after the figure below)

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MXEFG17i5zxzIF1CkuZ%2F-MXEG8WFm5ikCbcZokN6%2Fimage.png?alt=media\&token=040c6adc-fc45-4400-a1f4-af26876d5b17)
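As a concrete illustration of these choices, here is a minimal sketch of one common recipe: unstructured, global magnitude pruning with the masks reapplied during retraining. It is written in PyTorch style; the sparsity level and the decision to prune only weight matrices (not biases) are assumptions for the example, not details from the slides.

```python
import torch

def magnitude_prune_masks(model, sparsity=0.8):
    """Build binary masks that zero out the smallest-magnitude weights.

    One instantiation of the choices above: prune individual weights
    (unstructured), score them by absolute magnitude, and keep the
    largest (1 - sparsity) fraction globally across all layers.
    """
    # Gather every weight magnitude to find a single global threshold.
    all_weights = torch.cat([p.detach().abs().flatten()
                             for p in model.parameters() if p.dim() > 1])
    threshold = torch.quantile(all_weights, sparsity)

    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:  # prune weight matrices, leave biases dense (an assumption)
            masks[name] = (p.detach().abs() > threshold).float()
    return masks

def apply_masks(model, masks):
    """Zero out pruned weights in place; call after each optimizer step
    while fine-tuning/retraining so pruned weights stay at zero."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])
```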

### Contribution

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MXEFG17i5zxzIF1CkuZ%2F-MXEGFxRgOhuvyNk5t7v%2Fimage.png?alt=media\&token=c5c69176-33e9-4e51-8ff9-9aae3a2e799f)

### Training is expensive

* Costs
  * Financial cost
  * Environmental cost (CO2 emissions)
  * Computational cost

### Research Question

* If models can be pruned after training, can we instead train the smaller models from the start?
  * Is the extra capacity needed for representation, or only to make optimization easier?
* Yes
  * The trainable subnetwork can be identified early in training

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MXEFG17i5zxzIF1CkuZ%2F-MXEH6pa9CNnyQvUbbCq%2Fimage.png?alt=media\&token=0ddcdd2d-1942-4bf9-a993-439a4b78d30b)

### Lottery Ticket Hypothesis

* The winning subnetwork is identified in hindsight, only after the full network has been trained

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MXEFG17i5zxzIF1CkuZ%2F-MXEHG5N8t59G5iFQENz%2Fimage.png?alt=media\&token=7d5b679b-b80b-4771-a6f9-c7efa8146254)

* Retrained from a new random initialization, the pruned subnetwork does not reach full accuracy

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MXEFG17i5zxzIF1CkuZ%2F-MXEHQk-yNHv4fII98m3%2Fimage.png?alt=media\&token=aedb8243-52e1-4f52-853c-3a6f96449ade)

* Contribution: the pruned subnetwork must keep (be reset to) its original initialization rather than being randomly reinitialized; a minimal sketch of this reset follows
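
A minimal sketch of that reset step, assuming a copy of the initial parameters θ₀ (here `initial_state`) was saved before training began and that `masks` are the binary pruning masks sketched above:

```python
import torch

def reset_to_initialization(model, initial_state, masks):
    """Rewind the surviving weights to their original initial values.

    The key LTH step: the pruned subnetwork is trained from the very same
    random initialization it started with, not from fresh random weights.
    """
    with torch.no_grad():
        for name, p in model.named_parameters():
            p.copy_(initial_state[name])   # back to theta_0
            if name in masks:
                p.mul_(masks[name])        # keep only the winning ticket's weights

# Usage sketch: snapshot theta_0 before any training happens.
# initial_state = {n: p.detach().clone() for n, p in model.named_parameters()}
```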

### Iterative Magnitude Pruning (IMP)

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MXEFG17i5zxzIF1CkuZ%2F-MXEI5hiy0B9HKkxRtX7%2Fimage.png?alt=media\&token=dda791d6-1ff1-44c6-9c38-bfa02b16d986)
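
Putting the pieces together, one hedged sketch of the IMP loop shown in the figure, reusing the hypothetical helpers from earlier (`magnitude_prune_masks`, `reset_to_initialization`) and assuming a `train_fn(model, masks)` that trains the masked network to convergence while reapplying the masks after each optimizer step:

```python
import torch

def iterative_magnitude_pruning(model, train_fn, rounds=5, prune_rate=0.2):
    """Sketch of IMP: train, prune the smallest surviving weights, rewind, repeat."""
    # Save theta_0 so every round can rewind the survivors to it.
    initial_state = {n: p.detach().clone() for n, p in model.named_parameters()}
    masks = {n: torch.ones_like(p)
             for n, p in model.named_parameters() if p.dim() > 1}
    sparsity = 0.0
    for _ in range(rounds):
        train_fn(model, masks)                                   # train the current subnetwork
        sparsity = 1.0 - (1.0 - sparsity) * (1.0 - prune_rate)   # prune e.g. 20% of what remains
        masks = magnitude_prune_masks(model, sparsity)
        reset_to_initialization(model, initial_state, masks)     # rewind survivors to theta_0
    return masks, initial_state
```

The number of rounds and the per-round pruning rate are illustrative defaults, not the paper's exact schedule.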

### Results

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MXEFG17i5zxzIF1CkuZ%2F-MXEISnStQqZbYtFHV2C%2Fimage.png?alt=media\&token=65399da6-293d-4b2b-a545-c7f19fcac8aa)

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MXEFG17i5zxzIF1CkuZ%2F-MXEIVYam4B5Yq32jRBn%2Fimage.png?alt=media\&token=2b84e4a7-e83d-4021-a99c-c824ab8a13cd)

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MXEFG17i5zxzIF1CkuZ%2F-MXEIy2IAi-s7Pq-ERF0%2Fimage.png?alt=media\&token=f065171e-b514-4191-952c-beeaf08729d6)

* IMP finds subnetworks that can train from the start to full accuracy at non-trivial sparsities
  * Random subnetworks of the same sparsity cannot train to full accuracy (a sketch of this baseline follows the figure below)

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MXEFG17i5zxzIF1CkuZ%2F-MXEJWvaZ5Qay25T69VH%2Fimage.png?alt=media\&token=c86a0e5c-83a8-440d-9d02-ba3d873ea94d)
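
For the random-subnetwork control mentioned above, one way to build a baseline mask with the same per-layer sparsity as the winning ticket (a hypothetical helper, not from the paper's code):

```python
import torch

def random_masks_like(masks):
    """Random subnetwork matching the per-layer sparsity of the given masks."""
    random_masks = {}
    for name, m in masks.items():
        k = int(m.sum().item())              # number of surviving weights in this layer
        if k == 0:
            random_masks[name] = torch.zeros_like(m)
            continue
        scores = torch.rand_like(m)          # random score per weight
        threshold = torch.topk(scores.flatten(), k).values.min()
        random_masks[name] = (scores >= threshold).float()
    return random_masks
```

Training the subnetwork defined by `random_masks_like(masks)` instead of the winning-ticket masks is the kind of control that fails to reach full accuracy in the results above.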
