Automatically Discovering Machine Learning Optimizations
https://www.youtube.com/watch?v=XyXzzjbuXCs&t=2s
ML computation is increasing exponentially
Amount of computation used in the largest ML training runs doubles every 3.4 months (roughly 2^(12/3.4) ≈ 11x more compute per year)
Computation Graph of ML Model
Nodes: tensor algebra operators (e.g., convolution, matrix multiplication)
Edges: tensors (high-dimensional arrays)
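A minimal sketch of such a graph (the Node class and operator choices are illustrative, not any particular framework's API):

```python
import numpy as np

# Toy computation graph: nodes are tensor-algebra operators,
# edges are the tensors flowing between them.
class Node:
    def __init__(self, op, inputs):
        self.op, self.inputs = op, inputs   # op: callable; inputs: Nodes or arrays

    def evaluate(self):
        args = [i.evaluate() if isinstance(i, Node) else i for i in self.inputs]
        return self.op(*args)

x = np.random.randn(8, 16).astype(np.float32)    # input tensor (an edge)
w1 = np.random.randn(16, 32).astype(np.float32)
w2 = np.random.randn(32, 4).astype(np.float32)

h = Node(np.matmul, [x, w1])    # matrix-multiplication operator (a node)
y = Node(np.matmul, [h, w2])    # consumes h's output tensor (an edge)
print(y.evaluate().shape)       # (8, 4)
```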
ML systems are a key ingredient in ML
Distributed, heterogeneous hardware architectures
Challenges of building ML systems
Increasingly diverse models
Increasingly heterogeneous hardware
Summary
Massively parallel: tensor algebra is parallelizable in many dimensions (see the matmul sketch after this list)
New ML operators: continuously introduced into ML systems
Heterogeneous hardware: different processor kinds and complex memory hierarchies
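For instance, a single matrix multiplication can be partitioned along its row, column, or reduction dimension; a quick numpy check of that claim:

```python
import numpy as np

# C = A @ B can be parallelized over M (rows of A), N (columns of B),
# and K (the reduction dimension) -- all three splits give the same result.
M, K, N = 8, 6, 4
A, B = np.random.randn(M, K), np.random.randn(K, N)

C_rows = np.concatenate([Ai @ B for Ai in np.split(A, 2, axis=0)], axis=0)  # split over M
C_cols = np.concatenate([A @ Bj for Bj in np.split(B, 2, axis=1)], axis=1)  # split over N
C_red = sum(Ak @ Bk for Ak, Bk in zip(np.split(A, 2, axis=1),
                                      np.split(B, 2, axis=0)))              # split over K

assert np.allclose(C_rows, A @ B)
assert np.allclose(C_cols, A @ B)
assert np.allclose(C_red, A @ B)
```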
Limitations
Hard to manually design
Suboptimal performance
Correctness relies on manual proof
Catalyst: automated learning systems lab
Mission: automate the design and optimization of ML systems
Statistical and mathematical properties of ML algorithms
Domain knowledge of modern hardware platforms
Graph optimizations: TASO, PET --> automated generation of graph optimizations
Parallelization optimizations: FlexFlow, Dorylus (automated discovery of fast parallelization strategies)
Data layout & placement: Lux, Roc
Advantages of automated ML systems
Better runtime performance: discovers novel optimizations that are hard to design manually, with 3-10x speedups over manual optimizations
Less engineering effort: the code that discovers optimizations is generally much smaller than manual implementations of those optimizations
Stronger correctness guarantees: using formal verification techniques
TASO, PET
Current practice: rule-based graph optimizations
Challenge: infeasible to manually design graph optimizations for all cases (see the fusion example below)
Operators, graph architectures, hardware backends
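As an example of one such hand-written rule (a numpy sketch, not any framework's actual rule set): two matrix multiplications that share an input can be fused into a single larger one.

```python
import numpy as np

# Hand-written rewrite rule: matmul(X, W1), matmul(X, W2)
#   -> split(matmul(X, concat(W1, W2)))
X = np.random.randn(8, 16)
W1 = np.random.randn(16, 32)
W2 = np.random.randn(16, 32)

y1, y2 = X @ W1, X @ W2                        # original graph: two operators
fused = X @ np.concatenate([W1, W2], axis=1)   # rewritten graph: one operator
z1, z2 = np.split(fused, 2, axis=1)

assert np.allclose(y1, z1) and np.allclose(y2, z2)
```

Every such rule must be designed, implemented, and proven correct by hand for each new operator, graph architecture, and backend, which is exactly what TASO automates.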
TASO: tensor algebra super-optimizer
Automated generation and verification of graph substitutions
Workflow
Operator specifications
Graph subset generator
Huge number of generated graphs
Explicitly considering all pairs does not scale --> compute output fingerprints on random inputs; pairs of graphs with identical fingerprints become candidate substitutions (toy pipeline sketched below)
Candidate substitutions
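A toy version of the generate-and-fingerprint pipeline (the names and the tiny operator set are illustrative): enumerate small expressions over two inputs, hash each one's outputs on random inputs, and treat fingerprint collisions as candidate substitutions.

```python
import numpy as np
from itertools import product

def enumerate_exprs(depth):
    # Enumerate all expressions up to `depth` over inputs A, B with a tiny
    # operator set -- the count explodes as depth/operator count grows.
    if depth == 0:
        return [("A", lambda a, b: a), ("B", lambda a, b: b)]
    smaller = enumerate_exprs(depth - 1)
    exprs = list(smaller)
    exprs += [(f"transpose({n})", lambda a, b, f=f: f(a, b).T)
              for n, f in smaller]
    exprs += [(f"ewadd({n1}, {n2})", lambda a, b, f1=f1, f2=f2: f1(a, b) + f2(a, b))
              for (n1, f1), (n2, f2) in product(smaller, smaller)]
    return exprs

def fingerprint(fn, trials=2):
    # Hash the expression's outputs on a few random inputs.
    digests = []
    for seed in range(trials):
        r = np.random.default_rng(seed)
        x, y = r.standard_normal((4, 4)), r.standard_normal((4, 4))
        digests.append(np.round(fn(x, y), 6).tobytes())
    return hash(b"".join(digests))

groups = {}
for name, fn in enumerate_exprs(2):
    groups.setdefault(fingerprint(fn), set()).add(name)

for names in groups.values():
    if len(names) > 1:     # e.g. ewadd(A, B) vs ewadd(B, A)
        print("candidate substitution:", sorted(names)[:2])
```

Rounding before hashing sidesteps floating-point noise in this toy version; collisions are only candidates and still have to pass the verifier.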
Graph substitution verifier
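TASO checks candidate substitutions against first-order operator specifications using an automated theorem prover; a minimal sketch of that style of check with Z3, using a tiny illustrative axiom set for an elementwise-add operator:

```python
from z3 import DeclareSort, Function, Consts, ForAll, Solver, Not, unsat

T = DeclareSort("Tensor")
ewadd = Function("ewadd", T, T, T)
x, y, z = Consts("x y z", T)

# Operator specification: a small fragment of illustrative axioms.
axioms = [
    ForAll([x, y], ewadd(x, y) == ewadd(y, x)),                          # commutativity
    ForAll([x, y, z], ewadd(ewadd(x, y), z) == ewadd(x, ewadd(y, z))),   # associativity
]

# Candidate substitution found by fingerprinting: is it always correct?
a, b, c = Consts("a b c", T)
lhs, rhs = ewadd(a, ewadd(b, c)), ewadd(ewadd(c, b), a)

s = Solver()
s.add(axioms)
s.add(Not(lhs == rhs))          # search for a counterexample
print("verified" if s.check() == unsat else "rejected")
```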
Graph optimizer
Based on individual operators' cost
Measure the cost of each operator on hardware
Cost-based backtracking search
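A sketch of this search under assumed, hypothetical Graph/Substitution interfaces: per-operator costs are measured once on the target hardware, a graph's cost is the sum of its operators' costs, and the backtracking search prunes candidates much worse than the current best.

```python
def graph_cost(graph, measured_cost):
    # Graph cost = sum of per-operator costs measured on the hardware.
    return sum(measured_cost[op.kind] for op in graph.operators)

def optimize(graph, substitutions, measured_cost, alpha=1.05):
    # Cost-based backtracking search (Graph/Substitution API is hypothetical):
    # repeatedly apply substitutions, keep the cheapest graph found, and
    # prune any candidate costing more than alpha * current best.
    best, stack, seen = graph, [graph], {graph.hash()}
    while stack:
        g = stack.pop()
        if graph_cost(g, measured_cost) < graph_cost(best, measured_cost):
            best = g
        for sub in substitutions:
            for g2 in sub.apply_all(g):          # every place the rule matches
                if (g2.hash() not in seen and
                        graph_cost(g2, measured_cost)
                        < alpha * graph_cost(best, measured_cost)):
                    seen.add(g2.hash())
                    stack.append(g2)
    return best
```

The alpha relaxation lets the search pass through temporarily worse graphs that enable better rewrites later.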
PET: first tensor program optimizer with partially equivalent transformations
Larger optimization space: also explores transformations that are only partially equivalent
Better performance: partially equivalent transformations unlock optimizations that fully equivalent rewrites cannot express
Correctness: restored by automatically generated correction kernels
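An illustrative numpy sketch of the shape of this idea (not PET's actual transformations or kernels): a zero-padded 1D convolution is replaced by a circular FFT-style convolution that matches on interior outputs, and a small correction step patches the boundary.

```python
import numpy as np

x = np.random.randn(16)
k = np.random.randn(3)
exact = np.convolve(x, k, mode="same")      # reference operator

# Partially equivalent replacement: circular convolution via FFT.
circ = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k, 16)))
approx = np.roll(circ, -1)                  # align with "same" padding

# Interior outputs already agree; only the two boundary outputs differ.
assert np.allclose(approx[1:-1], exact[1:-1])

# Correction step: recompute just the mismatched region exactly.
approx[[0, -1]] = exact[[0, -1]]
assert np.allclose(approx, exact)
```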