Automatically Discovering Machine Learning Optimizations
https://www.youtube.com/watch?v=XyXzzjbuXCs&t=2s
ML computation is increasing exponentially
Amount of computation used in the largest ML training runs doubles every 3.4 months (roughly 2^(12/3.4) ≈ 11x more compute per year)
Computation Graph of ML Model
Nodes: tensor algebra operators (e.g., convolution, matrix multiplication)
Edges: tensors (high-dimensional arrays)
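A minimal sketch of such a graph (the Node class and operator choices are illustrative, not any particular framework's API):

```python
import numpy as np

# Toy computation graph: nodes are tensor-algebra operators,
# edges are the tensors flowing between them.
class Node:
    def __init__(self, op, inputs):
        self.op, self.inputs = op, inputs   # op: callable; inputs: Nodes or arrays

    def evaluate(self):
        args = [i.evaluate() if isinstance(i, Node) else i for i in self.inputs]
        return self.op(*args)

x = np.random.randn(8, 16).astype(np.float32)    # input tensor (an edge)
w1 = np.random.randn(16, 32).astype(np.float32)
w2 = np.random.randn(32, 4).astype(np.float32)

h = Node(np.matmul, [x, w1])    # matrix-multiplication operator (a node)
y = Node(np.matmul, [h, w2])    # consumes h's output tensor (an edge)
print(y.evaluate().shape)       # (8, 4)
```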
ML systems are a key ingredient in ML
Distributed, heterogeneous hardware architectures
Challenges of building ML systems
Increasingly diverse models
Increasingly heterogeneous hardware
Summary
Massively parallel: tensor algebra is parallelizable in many dimensions (see the matmul sketch after this list)
New ML operators: continuously introduced into ML systems
Heterogeneous hardware: different processor kinds and complex memory hierarchies
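For instance, a single matrix multiplication can be partitioned along its row, column, or reduction dimension; a quick numpy check of that claim:

```python
import numpy as np

# C = A @ B can be parallelized over M (rows of A), N (columns of B),
# and K (the reduction dimension) -- all three splits give the same result.
M, K, N = 8, 6, 4
A, B = np.random.randn(M, K), np.random.randn(K, N)

C_rows = np.concatenate([Ai @ B for Ai in np.split(A, 2, axis=0)], axis=0)  # split over M
C_cols = np.concatenate([A @ Bj for Bj in np.split(B, 2, axis=1)], axis=1)  # split over N
C_red = sum(Ak @ Bk for Ak, Bk in zip(np.split(A, 2, axis=1),
                                      np.split(B, 2, axis=0)))              # split over K

assert np.allclose(C_rows, A @ B)
assert np.allclose(C_cols, A @ B)
assert np.allclose(C_red, A @ B)
```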
Limitations
Hard to manually design
Suboptimal performance
Correctness relies on manual proof
Catalyst: automated learning systems lab
Mission: automate the design and optimization of ML systems
Statistical and mathematical properties of ML algorithms
Domain knowledge of modern hardware platforms
Graph optimizations: TASO, PET --> automated generation of graph optimizations
Parallelization optimizations: FlexFlow, Dorylus (automated discovery of fast parallelization strategies)
Data layout & placement: Lux, Roc
Advantages of automated ML systems
Better runtime performance: discovers novel optimizations that are hard to design manually, with 3-10x speedups over manual optimizations
Less engineering effort: the code that discovers optimizations is generally much smaller than manual implementations of those optimizations
Stronger correctness guarantees: using formal verification techniques
TASO, PET
Current practice: rule-based graph optimizations
Challenge: infeasible to manually design graph optimizations for all cases (see the fusion example below)
Operators, graph architectures, hardware backends
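As an example of one such hand-written rule (a numpy sketch, not any framework's actual rule set): two matrix multiplications that share an input can be fused into a single larger one.

```python
import numpy as np

# Hand-written rewrite rule: matmul(X, W1), matmul(X, W2)
#   -> split(matmul(X, concat(W1, W2)))
X = np.random.randn(8, 16)
W1 = np.random.randn(16, 32)
W2 = np.random.randn(16, 32)

y1, y2 = X @ W1, X @ W2                        # original graph: two operators
fused = X @ np.concatenate([W1, W2], axis=1)   # rewritten graph: one operator
z1, z2 = np.split(fused, 2, axis=1)

assert np.allclose(y1, z1) and np.allclose(y2, z2)
```

Every such rule must be designed, implemented, and proven correct by hand for each new operator, graph architecture, and backend, which is exactly what TASO automates.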
TASO: tensor algebra super-optimizer
Automated generation and verification of graph substitutions
Workflow
Operator specifications
Graph subset generator
Huge number of generated graphs
Explicitly considering all pairs does not scale --> compute output fingerprints on random inputs; pairs of graphs with identical fingerprints become candidate substitutions (toy pipeline sketched below)
Candidate substitutions
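A toy version of the generate-and-fingerprint pipeline (the names and the tiny operator set are illustrative): enumerate small expressions over two inputs, hash each one's outputs on random inputs, and treat fingerprint collisions as candidate substitutions.

```python
import numpy as np
from itertools import product

def enumerate_exprs(depth):
    # Enumerate all expressions up to `depth` over inputs A, B with a tiny
    # operator set -- the count explodes as depth/operator count grows.
    if depth == 0:
        return [("A", lambda a, b: a), ("B", lambda a, b: b)]
    smaller = enumerate_exprs(depth - 1)
    exprs = list(smaller)
    exprs += [(f"transpose({n})", lambda a, b, f=f: f(a, b).T)
              for n, f in smaller]
    exprs += [(f"ewadd({n1}, {n2})", lambda a, b, f1=f1, f2=f2: f1(a, b) + f2(a, b))
              for (n1, f1), (n2, f2) in product(smaller, smaller)]
    return exprs

def fingerprint(fn, trials=2):
    # Hash the expression's outputs on a few random inputs.
    digests = []
    for seed in range(trials):
        r = np.random.default_rng(seed)
        x, y = r.standard_normal((4, 4)), r.standard_normal((4, 4))
        digests.append(np.round(fn(x, y), 6).tobytes())
    return hash(b"".join(digests))

groups = {}
for name, fn in enumerate_exprs(2):
    groups.setdefault(fingerprint(fn), set()).add(name)

for names in groups.values():
    if len(names) > 1:     # e.g. ewadd(A, B) vs ewadd(B, A)
        print("candidate substitution:", sorted(names)[:2])
```

Rounding before hashing sidesteps floating-point noise in this toy version; collisions are only candidates and still have to pass the verifier.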
Graph substitution verifier
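TASO checks candidate substitutions against first-order operator specifications using an automated theorem prover; a minimal sketch of that style of check with Z3, using a tiny illustrative axiom set for an elementwise-add operator:

```python
from z3 import DeclareSort, Function, Consts, ForAll, Solver, Not, unsat

T = DeclareSort("Tensor")
ewadd = Function("ewadd", T, T, T)
x, y, z = Consts("x y z", T)

# Operator specification: a small fragment of illustrative axioms.
axioms = [
    ForAll([x, y], ewadd(x, y) == ewadd(y, x)),                          # commutativity
    ForAll([x, y, z], ewadd(ewadd(x, y), z) == ewadd(x, ewadd(y, z))),   # associativity
]

# Candidate substitution found by fingerprinting: is it always correct?
a, b, c = Consts("a b c", T)
lhs, rhs = ewadd(a, ewadd(b, c)), ewadd(ewadd(c, b), a)

s = Solver()
s.add(axioms)
s.add(Not(lhs == rhs))          # search for a counterexample
print("verified" if s.check() == unsat else "rejected")
```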
Graph optimizer
Based on individual operators' cost
Measure the cost of each operator on hardware
Cost-based backtracking search
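A sketch of this search under assumed, hypothetical Graph/Substitution interfaces: per-operator costs are measured once on the target hardware, a graph's cost is the sum of its operators' costs, and the backtracking search prunes candidates much worse than the current best.

```python
def graph_cost(graph, measured_cost):
    # Graph cost = sum of per-operator costs measured on the hardware.
    return sum(measured_cost[op.kind] for op in graph.operators)

def optimize(graph, substitutions, measured_cost, alpha=1.05):
    # Cost-based backtracking search (Graph/Substitution API is hypothetical):
    # repeatedly apply substitutions, keep the cheapest graph found, and
    # prune any candidate costing more than alpha * current best.
    best, stack, seen = graph, [graph], {graph.hash()}
    while stack:
        g = stack.pop()
        if graph_cost(g, measured_cost) < graph_cost(best, measured_cost):
            best = g
        for sub in substitutions:
            for g2 in sub.apply_all(g):          # every place the rule matches
                if (g2.hash() not in seen and
                        graph_cost(g2, measured_cost)
                        < alpha * graph_cost(best, measured_cost)):
                    seen.add(g2.hash())
                    stack.append(g2)
    return best
```

The alpha relaxation lets the search pass through temporarily worse graphs that enable better rewrites later.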
PET: first tensor program optimizer with partially equivalent transformations
Larger optimization space: also explores transformations that are only partially equivalent
Better performance: partially equivalent transformations unlock optimizations that fully equivalent rewrites cannot express
Correctness: restored by automatically generated correction kernels
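An illustrative numpy sketch of the shape of this idea (not PET's actual transformations or kernels): a zero-padded 1D convolution is replaced by a circular FFT-style convolution that matches on interior outputs, and a small correction step patches the boundary.

```python
import numpy as np

x = np.random.randn(16)
k = np.random.randn(3)
exact = np.convolve(x, k, mode="same")      # reference operator

# Partially equivalent replacement: circular convolution via FFT.
circ = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k, 16)))
approx = np.roll(circ, -1)                  # align with "same" padding

# Interior outputs already agree; only the two boundary outputs differ.
assert np.allclose(approx[1:-1], exact[1:-1])

# Correction step: recompute just the mismatched region exactly.
approx[[0, -1]] = exact[[0, -1]]
assert np.allclose(approx, exact)
```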