# Beyond Data and Model Parallelism for Deep Neural Networks

* Parallelizing DNN training is hard
  * Complex DNN models --> complex machine architectures
* Existing approaches: data and model parallelism
  * Data parallelism is the default strategy in existing DNN frameworks
  * Manually designed strategies
    * Combine data and model parallelism to accelerate DNNs
  * Automatically generated strategies
    * ColocRL uses RL to find device placements for model parallelism
  * Exploring dimensions beyond data and model parallelism can further accelerate DNN training (by up to 3.3x)
* A search-based approach
  * Define the SOAP search space of possible parallelization strategies
  * A cost model and a search algorithm
  * Combining them: optimized strategies
* The SOAP search space
  * Samples, operators, attributes, parameters
    * Samples: partitioning training samples (data parallelism)
    * Operators: partitioning DNN operators (model parallelism)
    * Attributes: partitioning attributes in a sample (e.g., different pixels)
    * Parameters: partitioning parameters in an operator
  * Hybrid parallelism: different strategies perform the same computation (same accuracy), so the focus is on runtime performance
* This work: by considering a larger search space, it is able to find better solutions
  * Example: data parallelism, model parallelism, hybrid
* A cost model and a search algorithm
  * Find an optimized strategy in this search space
  * FlexFlow
    * Input: operator graph (computation in DNN model), device topology (set of available devices, and their inter-connections)
    * Execution optimizer
      * MCMC: search algorithm
        * Iteratively generates candidate strategies
      * Execution simulator: cost model
        * Simulate the execution of the strategies and send the simulated performance back to the search algorithm
        * Challenge: measuring distributed executions on real hardware is slow
        * Two observations
          * The performance of DNN operators is highly predictable
          * DNN models only use a small number of distinct operators (redundancy)
        * Execution simulator
          * Measure each distinct operator once
          * Use the measurements to estimate the performance of different parallelization strategies
          * Delta simulation algorithm
            * Idea: do not have to build the task graph from scratch
            * Observation
              * The MCMC search proposes a new strategy by updating the previous strategy
              * Most of the task graph does not change
            * Solution: simulate a new strategy using incremental updates to previous simulations
      * The best strategy found is sent to the distributed runtime to parallelize training
  * Evaluation
    * Simulation reduces the search time by 2-7x
    * The search only takes a few minutes
    * Two clusters, six DNN benchmarks
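
To make the SOAP dimensions from the notes above more concrete, here is a minimal Python sketch, not FlexFlow's actual API. `ParallelConfig` and `enumerate_configs` are hypothetical names introduced for illustration. The sketch assumes each operator picks its own degree of parallelism in the sample, attribute, and parameter dimensions; the operator dimension then corresponds to different operators choosing different configurations. Device assignment is left out.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ParallelConfig:
    """Hypothetical per-operator config: degree of parallelism per dimension."""
    sample: int     # S: split the training batch
    attribute: int  # A: split within a sample (e.g., image rows)
    parameter: int  # P: split the operator's parameters (e.g., output channels)

    @property
    def num_tasks(self) -> int:
        # Each combination of partitions becomes one task.
        return self.sample * self.attribute * self.parameter


def enumerate_configs(num_devices: int) -> list:
    """List every (S, A, P) split that produces exactly num_devices tasks."""
    degrees = range(1, num_devices + 1)
    return [
        ParallelConfig(s, a, p)
        for s, a, p in product(degrees, repeat=3)
        if s * a * p == num_devices
    ]


configs = enumerate_configs(4)
for cfg in configs:
    print(cfg)
```

In this toy setting, `ParallelConfig(4, 1, 1)` is plain data parallelism for one operator, while a configuration such as `ParallelConfig(2, 1, 2)` is a hybrid that splits both the batch and the parameters.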

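The optimizer loop in the notes (a search algorithm proposing strategies, a simulator scoring them) can be sketched with a Metropolis-style acceptance rule. This is an illustrative toy under stated assumptions, not the paper's implementation: the operator list, the measured times in `MEASURED_MS`, the `SYNC_MS` overhead, and the additive cost formula are all made up, and a strategy is simplified to one parallelism degree per operator.

```python
import math
import random

# Hypothetical per-operator-type timings (ms), "measured once" and reused.
MEASURED_MS = {"conv": 8.0, "matmul": 4.0}
OPS = ["conv", "conv", "matmul", "conv", "matmul"]  # toy operator graph
DEGREES = [1, 2, 4]                                  # allowed parallelism degrees
SYNC_MS = 0.5                                        # made-up cost per extra device


def simulated_cost(strategy):
    """Toy cost model: compute time shrinks with degree, sync overhead grows."""
    return sum(MEASURED_MS[op] / d + SYNC_MS * (d - 1)
               for op, d in zip(OPS, strategy))


def propose(strategy, rng):
    """Propose a new strategy by changing the degree of one random operator."""
    i = rng.randrange(len(strategy))
    new = list(strategy)
    new[i] = rng.choice(DEGREES)
    return tuple(new)


def mcmc_search(initial, steps=1000, beta=1.0, seed=0):
    rng = random.Random(seed)
    current, current_cost = initial, simulated_cost(initial)
    best, best_cost = current, current_cost
    for _ in range(steps):
        candidate = propose(current, rng)
        candidate_cost = simulated_cost(candidate)
        # Always accept improvements; accept regressions with
        # probability exp(-beta * cost_increase).
        delta = candidate_cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


initial = (1,) * len(OPS)
best, best_cost = mcmc_search(initial)
print(best, best_cost)
```

Because each proposal changes only one operator, a real simulator would not need to recompute everything; in this toy cost model the new cost could be obtained by subtracting the old term for that operator and adding the new one, which is the intuition behind incremental (delta) simulation.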
