# IOS: Inter-Operator Scheduler for CNN Acceleration

* Executive summary
  * Motivation: sequential execution --> under-utilization problem
* Inter-Operator Scheduler
  * Inter-operator parallelism
  * Dynamic programming --> optimal schedule
  * 1.1-1.5x speedup
* Efficient deployment of CNNs is important
  * Do current DL libraries fully utilize the underlying hardware during CNN inference?
* Motivation for Inter-Operator Parallelization
  * More small convs in CNN design
  * GPU peak performance has increased
  * Intra- and inter-operator parallelization
    * Sequential execution (intra-operator parallelization only): device under-utilization (small op & powerful GPU)
    * Inter-operator parallel execution: better device utilization
* Background: wavefront schedule policy
  * Execute all available operators stage by stage
  * A better schedule
    * Putting an op into an already saturated stage: marginal benefit
    * Under-utilization problem
    * Wavefront schedule policy is sub-optimal
* Inter-operator scheduler (IOS)
  * General idea: explore the schedule space exhaustively
  * Challenge: the number of schedules is exponential in the number of operators
    * Prohibitive to enumerate
  * Observation 1: the optimal schedule for a subgraph can be reused
    * Key idea: dynamic programming
  * Observation 2: the width of the computation graph (the maximum number of operators that can run in parallel) is usually small
    * Key result: time complexity is only exponential in the width
  * ![](/files/UIocBRnacCsbwYyHuIHx)
  * Parallelization strategy selection
    * Concurrent execution --> multiple GPU kernels running at the same time
    * Operator merge --> merged convolution, usually better performance
    * Profile & select
  * Last stage candidates
    * S' can be the last stage of S <--> there is no edge from S' to S - S'
  * Transition graph and time complexity
  * Methodology
    * Benchmarks
      * Inception V3, SqueezeNet, RandWire, NasNet
    * Baselines: state-of-the-art frameworks, and different schedules on the IOS runtime
    * Environment: NVIDIA V100, CUDA, cuDNN
  * More active warps improve utilization
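
The dynamic-programming idea and the last-stage condition from the notes above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the IOS implementation: the names `optimal_schedule`, `is_ending`, `stage_groups`, and `toy_stage_latency`, the four-operator graph, and the analytic cost model are all hypothetical. IOS itself profiles candidate stages on the GPU rather than using a formula, and this sketch enumerates candidate last stages naively.

```python
from functools import lru_cache
from itertools import combinations


def is_ending(candidate, remaining, succs):
    """S' can be the last stage of S iff no edge goes from S' into S - S'."""
    rest = remaining - candidate
    return all(not (succs[op] & rest) for op in candidate)


def optimal_schedule(preds, stage_latency):
    """DP over operator sets: cost[S] = min over endings S' of cost[S - S'] + latency(S')."""
    succs = {op: set() for op in preds}
    for op, ps in preds.items():
        for p in ps:
            succs[p].add(op)

    @lru_cache(maxsize=None)  # Observation 1: reuse the optimal schedule of each subgraph
    def best(remaining):
        if not remaining:
            return 0.0, ()
        best_cost, best_stages = float("inf"), ()
        members = sorted(remaining)
        for k in range(1, len(members) + 1):
            for combo in combinations(members, k):  # naive enumeration of candidates
                last = frozenset(combo)
                if not is_ending(last, remaining, succs):
                    continue
                cost, stages = best(remaining - last)
                cost += stage_latency(last)
                if cost < best_cost:
                    best_cost, best_stages = cost, stages + (last,)
        return best_cost, best_stages

    return best(frozenset(preds))


def stage_groups(stage, preds):
    """Split a stage into weakly connected groups, using only edges inside the stage."""
    parent = {op: op for op in stage}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for op in stage:
        for p in preds[op]:
            if p in stage:
                parent[find(op)] = find(p)
    groups = {}
    for op in stage:
        groups.setdefault(find(op), []).append(op)
    return list(groups.values())


def toy_stage_latency(stage, preds, time, demand, capacity=4):
    """Assumed cost model: each group runs sequentially, groups run concurrently,
    and total work cannot exceed the device capacity."""
    longest_group = max(sum(time[op] for op in g) for g in stage_groups(stage, preds))
    work = sum(time[op] * demand[op] for op in stage) / capacity
    return max(longest_group, work)


# Hypothetical graph: a -> b, a -> c, and d is independent. Op a saturates the device.
preds = {"a": set(), "b": {"a"}, "c": {"a"}, "d": set()}
time = {"a": 2, "b": 2, "c": 2, "d": 2}
demand = {"a": 4, "b": 1, "c": 1, "d": 2}


def latency(stage):
    return toy_stage_latency(stage, preds, time, demand)


cost, stages = optimal_schedule(preds, latency)
wavefront = latency(frozenset("ad")) + latency(frozenset("bc"))
print(cost, [sorted(s) for s in stages], wavefront)
```

On this toy instance the wavefront split `{a, d}, {b, c}` costs 5.0 because `d` is added to a stage that `a` already saturates, while the DP finds `{a}, {b, c, d}` at 4.0. The brute-force subset enumeration is only practical for tiny graphs; it stands in for the more careful enumeration that keeps the real search exponential only in the graph width.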

