> For the complete documentation index, see [llms.txt](https://sliu583.gitbook.io/blog/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://sliu583.gitbook.io/blog/specific-work/seminar-and-talk/fall-21-reading-list/automatically-discovering-machine-learning-optimizations.md).

# Automatically Discovering Machine Learning Optimizations

* ML computation is increasingly exponentially&#x20;
  * Amount of computation in largest ML training doubles every 3.4 months&#x20;
* Computation Graph of ML Model&#x20;
  * Nodes: tensor algebra operator (e.g., convolution, matrix mul)
  * Edge: tensor (high-dim array)
* ML systems are a key ingredient in ML&#x20;
  * Distributed heterogenous hardware architectures&#x20;
* Challenges of building ML systems&#x20;
  * Increasingly diverse models&#x20;
  * Increasingly heterogeneous hardware&#x20;
  * Summary&#x20;
    * Massively parallel: tensor algebra is parallelizable in many dimensions&#x20;
    * New ML operators: continuously introduced into ML systems&#x20;
    * Heterogenous hardware: different processor kinds and complex memory hierarchy&#x20;
  * Limitations&#x20;
    * Hard to manually design&#x20;
    * Suboptimal performance&#x20;
    * Correctness relies on manual proof&#x20;
* Catalyst: automated learning systems lab
  * Mission: automate the design and optimization of ML systems&#x20;
    * Statistical and mathematical properties of ML algorithms&#x20;
    * Domain knowledge of modern hardware platforms&#x20;
  * Graph optimizations: TASO, PET --> automated generation of graph optimizations&#x20;
  * Parallelization optimizations: FlexFlow, Dorylus (automated discovery of fast parallelization strategy)&#x20;
  * Data layout & placement: Lux, Roc
* Advantages of automated ML systems&#x20;
  * Better runtime performance: discovering novel optimizations hard to manually designed, 3-10x speedup over manual optimizations&#x20;
  * Less engineering effort: code for discovering optimizations is generally much less than manual implementation of these optimizations&#x20;
  * Stronger correctness guarantees: using formal verification techniques&#x20;
* TASO, PET&#x20;
  * Current rule-based graph optimizations&#x20;
  * Challenges of graph optimizations for ML: infeasible to manually design graph optimizations for all cases&#x20;
    * Operators, graph architectures, hardware backends&#x20;
  * TASO: tensor algebra super-optimizer&#x20;
    * Automated generation and verification of graph substitutions&#x20;
    * Workflow&#x20;
      * Operator specifications&#x20;
      * Graph subset generator&#x20;
        * Huge amount of graphs&#x20;
        * Explicitly considering all pairs does not scale --> compute output fingerprints, pairs of graphs with identical fingerprint are candidate substitutions&#x20;
      * Candidate substitution&#x20;
      * Graph substitution verifier&#x20;
      * Graph optimizer&#x20;
        * Based on individual operators' cost
        * Measure the cost of each operator on hardware&#x20;
        * Cost-based backtracking search&#x20;
  * PET: first tensor program optimizer with partially equivalent transformations&#x20;
    * Larger optimization space&#x20;
    * Better performance&#x20;
    * Correctness&#x20;
