Automatically Discovering Machine Learning Optimizations

https://www.youtube.com/watch?v=XyXzzjbuXCs&t=2s

  • ML computation is increasing exponentially

    • Amount of computation in the largest ML training runs doubles every 3.4 months
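
A quick sanity check on what that rate implies (plain exponent arithmetic, nothing assumed beyond the 3.4-month figure):

```python
# Doubling every 3.4 months compounds to 2**(12 / 3.4) per year.
print(2 ** (12 / 3.4))  # ~11.5x growth in compute per year
```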

  • Computation Graph of ML Model

    • Nodes: tensor algebra operators (e.g., convolution, matrix multiplication)

    • Edges: tensors (high-dimensional arrays)
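
The graph abstraction is easy to make concrete. A minimal Python sketch of the data structure (class and function names here are hypothetical, not from the talk):

```python
from dataclasses import dataclass, field

@dataclass
class Tensor:
    # An edge: a high-dimensional array, described here only by its
    # shape; `producer` is the operator node that outputs it.
    shape: tuple
    producer: "Operator | None" = None

@dataclass
class Operator:
    # A node: a tensor algebra operator such as "matmul" or "conv2d",
    # connected to its input and output tensors.
    kind: str
    inputs: list
    outputs: list = field(default_factory=list)

def matmul(a: Tensor, b: Tensor) -> Tensor:
    # (m, k) x (k, n) -> (m, n)
    assert a.shape[1] == b.shape[0]
    op = Operator("matmul", [a, b])
    out = Tensor((a.shape[0], b.shape[1]), producer=op)
    op.outputs.append(out)
    return out

# Build a tiny two-node graph: C = (X @ W1) @ W2
x, w1, w2 = Tensor((32, 64)), Tensor((64, 128)), Tensor((128, 10))
c = matmul(matmul(x, w1), w2)
print(c.shape)  # (32, 10)
```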

  • ML systems are a key ingredient in ML

    • Distributed, heterogeneous hardware architectures

  • Challenges of building ML systems

    • Increasingly diverse models

    • Increasingly heterogeneous hardware

    • Summary

      • Massively parallel: tensor algebra is parallelizable along many dimensions (see the loop-nest sketch after this list)

      • New ML operators: continuously introduced into ML systems

      • Heterogeneous hardware: different kinds of processors and complex memory hierarchies

    • Limitations of the manual approach

      • Hard to manually design

      • Suboptimal performance

      • Correctness relies on manual proof
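
To make the "parallelizable along many dimensions" point concrete, here is the matrix-multiplication loop nest with each dimension's parallelism annotated (an illustrative sketch, not a performant implementation):

```python
import numpy as np

def matmul_loops(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for i in range(m):          # data-parallel: output rows are independent
        for j in range(n):      # data-parallel: output columns are independent
            for r in range(k):  # reduction: parallelizable via partial/tree sums
                C[i, j] += A[i, r] * B[r, j]
    return C

A, B = np.random.randn(4, 5), np.random.randn(5, 3)
assert np.allclose(matmul_loops(A, B), A @ B)
```

Choosing which of these dimensions to split across devices is exactly the strategy space that systems like FlexFlow search automatically.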

  • Catalyst: automated learning systems lab

    • Mission: automate the design and optimization of ML systems, combining:

      • Statistical and mathematical properties of ML algorithms

      • Domain knowledge of modern hardware platforms

    • Graph optimizations: TASO, PET (automated generation of graph optimizations)

    • Parallelization optimizations: FlexFlow, Dorylus (automated discovery of fast parallelization strategies)

    • Data layout & placement: Lux, Roc

  • Advantages of automated ML systems

    • Better runtime performance: discovers novel optimizations that are hard to design manually; 3-10x speedups over manual optimizations

    • Less engineering effort: the code that discovers optimizations is generally much smaller than manual implementations of those optimizations

    • Stronger correctness guarantees: using formal verification techniques

  • TASO, PET

    • Current practice: rule-based graph optimizations written by hand (a minimal example of such a rule is sketched below)

    • Challenges of graph optimizations for ML: it is infeasible to manually design graph optimizations for every combination of

      • Operators, graph architectures, hardware backends
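
Before the TASO workflow, a minimal sketch of what one hand-written rewrite rule looks like, using the classic observation that two matmuls sharing an input can be fused (the function name and representation are illustrative):

```python
import numpy as np

def fuse_shared_matmuls(A, B, C):
    # Rewrite rule:  A @ B  and  A @ C  can be computed as one larger
    # matmul followed by a split:  split(A @ concat(B, C), axis=1)
    fused = A @ np.concatenate([B, C], axis=1)
    return fused[:, :B.shape[1]], fused[:, B.shape[1]:]

A = np.random.randn(8, 16)
B = np.random.randn(16, 4)
C = np.random.randn(16, 6)
y1, y2 = fuse_shared_matmuls(A, B, C)
assert np.allclose(y1, A @ B) and np.allclose(y2, A @ C)
```

Rule-based optimizers ship a fixed catalog of such hand-proven rules, which is why coverage across all operators, graph architectures, and hardware backends does not scale.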

    • TASO: tensor algebra super-optimizer

      • Automated generation and verification of graph substitutions

      • Workflow

        • Operator specifications: mathematical properties of each operator, written once by experts

        • Graph substitution generator

          • Enumerates a huge number of candidate graphs

          • Explicitly comparing all pairs of graphs does not scale, so the generator computes an output fingerprint for each graph (a hash of its outputs on random inputs); pairs with identical fingerprints become candidate substitutions (see the sketch after this workflow)

        • Candidate substitutions

        • Graph substitution verifier: formally checks each candidate against the operator specifications (see the theorem-prover sketch after this workflow)

        • Graph optimizer

          • Cost model based on individual operators' costs

          • Measures the cost of each operator on the target hardware

          • Applies verified substitutions via cost-based backtracking search (sketched after this workflow)
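
A minimal sketch of the fingerprint idea from the generator step: evaluate every enumerated graph on the same random inputs and hash the outputs, so only graphs whose fingerprints collide need to be considered as substitution pairs (the graph representation here is hypothetical; TASO's generator enumerates graphs up to a fixed size):

```python
import hashlib
from collections import defaultdict
import numpy as np

# Shared random test inputs, fixed across all enumerated graphs.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
Y = rng.standard_normal((4, 4))

# A few tiny "graphs", each a function of the same inputs.
graphs = {
    "(X+Y)@Y":   lambda: (X + Y) @ Y,
    "X@Y + Y@Y": lambda: X @ Y + Y @ Y,   # equivalent to the first
    "X@Y":       lambda: X @ Y,           # not equivalent
}

def fingerprint(out: np.ndarray) -> str:
    # Hash the rounded output tensor; rounding avoids spurious
    # mismatches from floating-point noise.
    return hashlib.sha256(np.round(out, 6).tobytes()).hexdigest()

buckets = defaultdict(list)
for name, graph in graphs.items():
    buckets[fingerprint(graph())].append(name)

# Graphs sharing a bucket are candidate substitutions; every other
# pair is discarded without any pairwise comparison.
print([names for names in buckets.values() if len(names) > 1])
# [['(X+Y)@Y', 'X@Y + Y@Y']]
```

Colliding candidates are not trusted as-is: they still go through the verifier; the fingerprint only prunes the quadratic pairwise comparison.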
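
For the verifier step, TASO checks each candidate substitution against first-order properties of the operators with an automated theorem prover. A scalar-flavored sketch of the proving pattern using the z3 SMT solver (the real checks quantify over tensors and operator axioms):

```python
from z3 import Reals, prove

# Scalar stand-in for a substitution check: the rewrite
# "(x + y) * z  ->  x*z + y*z" is valid iff the equality is a theorem.
x, y, z = Reals("x y z")
prove((x + y) * z == x * z + y * z)  # prints: proved
```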
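
And a sketch of the optimizer step: a cost-based backtracking search in which verified substitutions propose neighboring graphs, measured operator costs rank them, and a relaxation factor lets the search pass through temporarily worse graphs (the `alpha` parameter and its default here are assumptions mirroring TASO's pruning knob):

```python
import heapq
from itertools import count

def optimize(graph, substitutions, cost, alpha=1.05, budget=1000):
    """Cost-based backtracking search over graphs (simplified sketch).

    graph:         any hashable graph representation
    substitutions: functions mapping a graph to rewritten graphs
    cost:          measured runtime of a graph on the target hardware
    alpha:         graphs costing more than alpha * best are abandoned,
                   but slightly worse graphs may still be explored
    """
    tie = count()  # tie-breaker so the heap never compares graphs
    best, best_cost = graph, cost(graph)
    queue = [(best_cost, next(tie), graph)]
    seen = {graph}
    while queue and budget > 0:
        budget -= 1
        c, _, g = heapq.heappop(queue)
        if c > alpha * best_cost:
            continue  # prune this branch of the search
        if c < best_cost:
            best, best_cost = g, c
        for substitute in substitutions:
            for g2 in substitute(g):
                if g2 not in seen:
                    seen.add(g2)
                    heapq.heappush(queue, (cost(g2), next(tie), g2))
    return best

# Toy usage: "graphs" are strings, cost is length, one rule rewrites "ab" -> "c".
subs = [lambda g: [g.replace("ab", "c")] if "ab" in g else []]
print(optimize("abab", subs, cost=len))  # "cc"
```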

    • PET: the first tensor program optimizer built on partially equivalent transformations plus automated corrections (see the toy example after this list)

      • Larger optimization space: transformations need only be equivalent on part of the output

      • Better performance than optimizers restricted to fully equivalent transformations

      • Correctness: automatically generated correction kernels restore full equivalence
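
A toy illustration of the partially-equivalent idea (numpy; all details hypothetical): replace a zero-padded 1-D convolution with a cheaper unpadded one, which matches the original only on interior outputs, then run a small correction for the boundary outputs.

```python
import numpy as np

def conv1d_same(x, w):
    # Reference operator: length-3 correlation with zero padding,
    # producing an output the same length as x.
    xp = np.pad(x, 1)
    return np.array([xp[i:i + 3] @ w for i in range(len(x))])

def conv1d_valid(x, w):
    # Transformed operator: no padding, so it produces len(x) - 2
    # outputs and agrees with conv1d_same only on interior positions.
    return np.array([x[i:i + 3] @ w for i in range(len(x) - 2)])

x = np.random.default_rng(0).standard_normal(16)
w = np.array([0.25, 0.5, 0.25])
ref = conv1d_same(x, w)

# Partially equivalent transformation + correction kernel:
y = np.empty_like(ref)
y[1:-1] = conv1d_valid(x, w)               # bulk of the work, cheaper form
y[0]  = np.array([0.0, x[0], x[1]]) @ w    # correct the left boundary
y[-1] = np.array([x[-2], x[-1], 0.0]) @ w  # correct the right boundary

assert np.allclose(y, ref)
```

PET discovers such transformations automatically, bounds where they can disagree with the original program, and generates the correction kernels, which is what opens up the larger optimization space noted above.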
