IOS: Inter-Operator Scheduler for CNN Acceleration

https://arxiv.org/abs/2011.01302

Executive summary
- Motivation: sequential execution --> under-utilization problem
Inter-Operator Scheduler
- Inter-operator parallelism
- Dynamic programming --> optimal schedule
- 1.1-1.5x speedup
Efficient deployment of CNNs is important
- Does CNN inference in current DL libraries utilize the underlying hardware well?
Motivation for Inter-Operator Parallelization
- More small convs in CNN design
- GPU peak performance increased
- Intra- and inter-operator parallelization
  - Sequential execution (intra-operator parallelization only): device under-utilization (small op & powerful GPU)
  - Inter-Op Parallel Execution: better device utilization
Background: wavefront schedule policy
- Execute all available operators stage by stage
- A better schedule
  - Adding an op to an already saturated stage: marginal benefit
  - Under-utilization problem
  - Wavefront schedule policy is sub-optimal
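
The wavefront policy described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `wavefront_schedule` and the toy graph are hypothetical and do not come from the paper or from the IOS code base.

```python
def wavefront_schedule(ops, edges):
    """Group operators into stages: each stage holds every operator whose
    predecessors have all been executed in earlier stages."""
    preds = {op: set() for op in ops}
    for src, dst in edges:
        preds[dst].add(src)

    done, stages = set(), []
    while len(done) < len(ops):
        # Every operator that is ready now goes into the same stage.
        stage = [op for op in ops if op not in done and preds[op] <= done]
        if not stage:
            raise ValueError("graph has a cycle")
        stages.append(stage)
        done.update(stage)
    return stages


# Hypothetical toy graph: a -> c, b -> d, and c, d -> e.
ops = ["a", "b", "c", "d", "e"]
edges = [("a", "c"), ("b", "d"), ("c", "e"), ("d", "e")]
stages = wavefront_schedule(ops, edges)
print(stages)
```

Because the policy is greedy, it never asks whether adding one more ready operator to a stage actually helps, which is why it can end up sub-optimal.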
Inter-operator scheduler (IOS)
- General idea: explore the schedule space exhaustively
- Challenge: the number of schedules is exponential in the number of operators
  - Prohibitive to enumerate
- Observation 1: optimal schedule for a subgraph can be reused
  - Key idea: dynamic programming
- Observation 2: the width of the computation graph (max number of parallelizable operators) is usually small
  - Key result: time complexity is only exponential in the width
- Parallelization strategy selection
  - Concurrent execution --> multiple GPU kernels run at the same time
  - Operator merge --> merged convolution, usually better performance
  - Profile & select
- Last stage candidates
  - S' can be the last stage of S <--> there is no edge from S' to S - S'
- Transition graph and time complexity
- Methodology
  - Benchmarks: Inception V3, SqueezeNet, RandWire, NasNet
  - Baselines: state-of-the-art frameworks, different schedules on IOS Runtime
  - Environment: NVIDIA V100, CUDA, cuDNN
- More active warps improve utilization
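
The dynamic programming idea from the IOS section (Observation 1 plus the last-stage-candidate condition) can be sketched as below. It is a minimal sketch, not the authors' implementation. Assumptions: `ios_style_schedule`, `toy_latency`, and the `TIME` / `UTIL` tables are hypothetical names and made-up numbers; a real scheduler would profile each stage on the GPU instead of using a formula. As a simplification, a candidate last stage is restricted to operators that have no successor inside S, so every candidate satisfies "no edge from S' to S - S'" and its operators are mutually independent; candidates with internal dependencies, which the condition alone also admits, are skipped. Only concurrent execution is modeled, not operator merge.

```python
from functools import lru_cache
from itertools import combinations


def ios_style_schedule(ops, edges, stage_latency):
    """Return (total_latency, stages) minimizing the summed stage latency."""
    succs = {op: set() for op in ops}
    for src, dst in edges:
        succs[src].add(dst)

    def last_stage_candidates(S):
        # Sinks of the subgraph induced by S: operators with no successor in S.
        sinks = sorted(op for op in S if not (succs[op] & S))
        for k in range(1, len(sinks) + 1):
            for combo in combinations(sinks, k):
                yield frozenset(combo)

    @lru_cache(maxsize=None)  # memoization: each subgraph S is solved once
    def best(S):
        if not S:
            return 0.0, ()
        options = []
        for last in last_stage_candidates(S):
            cost, stages = best(S - last)
            options.append(
                (cost + stage_latency(last), stages + (tuple(sorted(last)),))
            )
        return min(options)

    return best(frozenset(ops))


# Hypothetical cost model: "a" saturates the device, the others use half of it.
TIME = {"a": 2.0, "b": 1.0, "c": 1.0, "d": 1.0}
UTIL = {"a": 1.0, "b": 0.5, "c": 0.5, "d": 0.5}


def toy_latency(stage):
    # Slowest operator, stretched when the stage over-subscribes the device.
    longest = max(TIME[op] for op in stage)
    return longest * max(1.0, sum(UTIL[op] for op in stage))


ops = ["a", "b", "c", "d"]
edges = [("a", "d"), ("b", "d"), ("c", "d")]
total, stages = ios_style_schedule(ops, edges, toy_latency)
print(total, stages)  # 4.0 (('a',), ('b', 'c'), ('d',))
```

Under this toy cost model the wavefront policy would run a, b, and c together and then d, for a total of 5.0, while the search keeps the saturating operator a in its own stage and reaches 4.0. The number of candidates per subgraph is bounded by the subsets of its sinks, which is where the dependence on graph width rather than graph size shows up in this simplified version.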
