
# Wavelet: Efficient DNN Training with Tick-Tock Scheduling

![](/files/-MZJHrcvkurLhL0_80Tt)

![](/files/-MZJI2TizWyLnlVOVye7)

![](/files/-MZJI5jFAkCwiVp84Q5B)

![](/files/-MZJIEQ9PlY_p9VFtvJl)

1. All-reduce
2. Parameter server
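Both synchronization schemes compute the same averaged gradient. They differ in who does the reduction and which links carry the traffic. Below is a minimal single-process sketch, assuming gradients are plain Python lists. The function names are hypothetical, there is no real networking, and this is not Wavelet's code.

```python
from typing import List

Vector = List[float]


def parameter_server_average(worker_grads: List[Vector]) -> List[Vector]:
    """Every worker pushes its gradient to one server, which averages and broadcasts."""
    n = len(worker_grads)
    dim = len(worker_grads[0])
    # The server receives all n gradients, so its link carries the most traffic.
    avg = [sum(g[i] for g in worker_grads) / n for i in range(dim)]
    # Every worker pulls back the same averaged vector.
    return [list(avg) for _ in range(n)]


def ring_allreduce_average(worker_grads: List[Vector]) -> List[Vector]:
    """Ring all-reduce: reduce-scatter, then all-gather, each taking n - 1 steps."""
    n = len(worker_grads)
    dim = len(worker_grads[0])
    assert dim % n == 0, "sketch assumes the vector splits evenly into n chunks"
    size = dim // n
    bufs = [list(g) for g in worker_grads]

    def idx(c: int) -> range:
        return range(c * size, (c + 1) * size)

    def ring_step(chunk_of, accumulate: bool) -> None:
        # Each worker w sends one chunk to its right neighbour (w + 1) % n.
        # Payloads are copied first so all sends in a step happen "at once".
        sends = [(w, chunk_of(w)) for w in range(n)]
        payloads = [[bufs[w][i] for i in idx(c)] for w, c in sends]
        for (w, c), data in zip(sends, payloads):
            dst = bufs[(w + 1) % n]
            for i, v in zip(idx(c), data):
                dst[i] = dst[i] + v if accumulate else v

    # Reduce-scatter: afterwards worker w holds the full sum of chunk (w + 1) % n.
    for s in range(n - 1):
        ring_step(lambda w: (w - s) % n, accumulate=True)
    # All-gather: circulate the finished chunks until every worker has all of them.
    for s in range(n - 1):
        ring_step(lambda w: (w + 1 - s) % n, accumulate=False)
    return [[v / n for v in b] for b in bufs]
```

In the ring version each worker sends only 2(n - 1)/n of the gradient per iteration. With a single parameter server, the server's link must absorb n full gradients and send n back.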

![](/files/-MZJIkXbMCo4GXykSsCO)

* Why?
  * Cluster-level
  * Possibly more fragmentation
  * Does not address single-task utilization

![](/files/-MZJJgXFy6cTOa9fLTEL)

* A training job does not use all of its resources all the time

![](/files/-MZJLDw2NZC84OyGv3NK)

Gandiva:

* Cluster-level
* Does not improve a single job's performance
* A single job still takes the same time

![](/files/-MZJLjcaAdA5Wd7vfw3t)

* Increase inter-batch parallelism
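A toy illustration of the tick-tock idea. A single wave's memory footprint rises through the forward pass and falls through the backward pass. A second wave launched half an iteration later can therefore fill the first wave's memory valleys instead of stacking its peak on top. The memory profile and function names below are illustrative assumptions, not measurements or code from the paper.

```python
from typing import List

# Hypothetical per-step GPU memory (GB) of ONE training wave over one iteration:
# activations accumulate during the forward pass, peak at the forward/backward
# boundary, and are released during the backward pass. Illustrative numbers only.
PROFILE: List[float] = [2.0, 4.0, 6.0, 8.0, 10.0, 8.0, 6.0, 4.0]


def combined_memory(profile: List[float], offset: int) -> List[float]:
    """Memory on a GPU shared by two identical waves, the second launched
    `offset` steps after the first (each wave repeats its profile every iteration)."""
    n = len(profile)
    return [profile[t] + profile[(t - offset) % n] for t in range(n)]


def peak(series: List[float]) -> float:
    return max(series)


def best_offset(profile: List[float]) -> int:
    """Offset (in steps) that minimises the combined peak memory."""
    return min(range(len(profile)), key=lambda o: peak(combined_memory(profile, o)))


if __name__ == "__main__":
    half = len(PROFILE) // 2
    print("in lockstep:        peak =", peak(combined_memory(PROFILE, 0)))     # 20.0
    print("half-iteration lag: peak =", peak(combined_memory(PROFILE, half)))  # 12.0
    print("best offset:", best_offset(PROFILE))                                # 4
```

With this symmetric toy profile, the half-iteration lag flattens combined memory to its mean. Real profiles are not this tidy, and the sketch ignores compute contention between the two waves.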

Gandiva:

* Multiple jobs and single jobs

![](/files/-MZJMJg-RKJLQQu_rZnE)

PipeDream: focuses on minimizing communication
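A rough way to see why pipelining can cut communication: data parallelism synchronizes the whole model's gradients every iteration. A pipeline stage only exchanges activations, and their gradients, with its neighbour. The back-of-the-envelope sketch below uses hypothetical helper names and sizes. It is not PipeDream's cost model. Whether pipelining actually wins depends on the ratio of boundary activation size to model size, which is why PipeDream profiles the model before partitioning it.

```python
def ring_allreduce_bytes_per_worker(model_bytes: float, workers: int) -> float:
    """Data parallelism: bytes each worker sends per iteration with ring all-reduce.
    Reduce-scatter and all-gather each move (N - 1) / N of the gradient."""
    return 2 * (workers - 1) / workers * model_bytes


def pipeline_boundary_bytes(activation_bytes: float) -> float:
    """Pipeline parallelism: bytes crossing one stage boundary per minibatch,
    i.e. activations sent forward plus their gradients sent backward."""
    return 2 * activation_bytes


if __name__ == "__main__":
    # Hypothetical sizes: 100M fp32 parameters (400 MB) and 32 MB of boundary activations.
    model_bytes = 100e6 * 4
    print(ring_allreduce_bytes_per_worker(model_bytes, workers=4) / 1e6, "MB")  # 600.0 MB
    print(pipeline_boundary_bytes(32e6) / 1e6, "MB")                            # 64.0 MB
```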

![](/files/-MZJN23lrzALtlNppBHL)

![](/files/-MZJNvlpnqgIlGPg1Ro3)

Which version of the model (weights) is read?

![](/files/-MZJOfmIK64BGEKI-OPj)

![](/files/-MZJQoeZdz8q49q0Ydno)

![](/files/-MZJQsRX_Szk6Ojj2bpz)

### Slides
