# Rearchitecting Linux Storage Stack for µs Latency and High Throughput

### Presentation

* Widespread belief: Linux cannot achieve micro-second scale latency and high throughput
  * Adoption of high-performance H/W, but stagnant single-core capacity
    * T-app: throughput-bound app
    * Static data path --> hard to utilize all cores
  * Co-location of apps with different performance goals
    * L-app: latency-sensitive app
    * High latency due to head-of-line (HoL) blocking
* Performance of existing storage stack
  * Applications accessing in-memory data in remote servers (single-core case)
    * ![](/files/jA1xtzPvEB0IC7pAGrRu)
    * Low latency or high throughput, but not both
* blk-switch summary
  * Linux can achieve micro-second scale latency while achieving near H/W capacity throughput!
    * Without changes in applications, kernel CPU scheduler, kernel TCP/IP stack, and network hardware
  * For example, blk-switch achieves this
    * Even with tens of applications (6 L-apps + 6 T-apps on 6 cores)
    * Even with complex interference at the compute, storage, and network stacks (remote storage access over 100 Gbps)
* Key insight
  * Observation: today's Linux storage stack is conceptually similar to network switches
  * ![](/files/qPqs8wcTR9bfL9IStHIQ)
  * blk-switch: switched Linux storage stack architecture
    * Enables decoupling request processing from application cores
    * Multi-egress queues, prioritization, and load balancing

#### Architecture

![](/files/6P8Pd1eZleg042sQGYy8)

1. Egress queue per-(core, app-class)

![](/files/xrnnQpovzOMQpkmChtgw)

2\. Flexible mapping from ingress to egress queues

\--> decoupling request processing from application cores: "static --> flexible"
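The two architectural points above can be sketched as a toy model. This is a minimal illustration, not the kernel implementation: `SwitchedStack`, `AppClass`, `submit`, and `next_request` are hypothetical names, and strict priority between the two classes is an assumption about how the L-app queue is served first.

```python
from collections import deque
from enum import IntEnum


class AppClass(IntEnum):
    L_APP = 0  # latency-sensitive; lower value is served first
    T_APP = 1  # throughput-bound


class SwitchedStack:
    """Toy model: one egress queue per (core, app-class) pair."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.egress = {
            (core, cls): deque()
            for core in range(num_cores)
            for cls in AppClass
        }

    def submit(self, request, app_class, ingress_core, egress_core=None):
        """Enqueue a request; by default it stays on the submitting core.

        Passing a different egress_core models the flexible mapping:
        the request is processed away from the application's core.
        """
        target = ingress_core if egress_core is None else egress_core
        self.egress[(target, app_class)].append(request)
        return target

    def next_request(self, core):
        """Assumed strict priority: drain the L-app queue before the T-app queue."""
        for cls in sorted(AppClass):
            queue = self.egress[(core, cls)]
            if queue:
                return queue.popleft()
        return None


# Usage: a T-app request arrives first, but the L-app request is served first.
stack = SwitchedStack(num_cores=2)
stack.submit("t-read-0", AppClass.T_APP, ingress_core=0)
stack.submit("l-read-0", AppClass.L_APP, ingress_core=0)
stack.submit("t-read-1", AppClass.T_APP, ingress_core=0, egress_core=1)
first_on_core0 = stack.next_request(0)
```

Because each core keeps a separate queue per class, an L-app request never waits behind queued T-app requests on the same core, which is the head-of-line blocking the static data path suffers from.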

Three techniques:

* blk-switch prioritization
  * Prioritize L-app request processing
  * Multi-egress queues + prioritization: near-optimal latency for L-apps
* blk-switch request steering for transient loads
  * Challenge: prioritization of L-apps can lead to transient starvation of T-apps
  * Steer requests to underutilized cores at per-request granularity
    * Select target cores using known techniques
    * Capture only T-app load
  * Request steering allows blk-switch to maintain high throughput, even under transient loads
* blk-switch application steering for persistent loads
  * Challenge: persistent loads lead to high system overheads
  * Steer apps to cores with low average utilization
    * Long time scales (e.g., every 10 ms)
    * Both L-app and T-app load
  * High throughput for T-apps even under persistent loads
  * Even lower latency for L-apps due to fewer context switches

#### Evaluation

* Implemented entirely in the Linux kernel with minimal changes (LOC: \~928)
* To stress test blk-switch
  * Complex interaction among the compute, storage, and network stack
  * Evaluate "remote storage access"
* To push the bottleneck to the storage stack processing
  * Two 32-core servers connected directly over 100 Gbps
* To access data on remote servers
  * Linux / blk-switch use i10
  * SPDK uses userspace NVMe-over-TCP
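
Returning to the two steering techniques from the Architecture notes, here is a minimal sketch of how each decision might look. Everything in it is an assumption for illustration: the notes only say target cores are selected "using known techniques", so power-of-two-choices sampling is swapped in as one such technique, and `steer_request`, `steer_application`, `threshold`, and `min_gain` are hypothetical names and parameters.

```python
import random


def steer_request(home_core, t_load, threshold, rng=random):
    """Pick the core that should process one T-app request.

    t_load[c] is the outstanding T-app load on core c (for example,
    queued bytes). Only T-app load is considered, as in the notes.
    """
    if t_load[home_core] <= threshold or len(t_load) < 3:
        return home_core  # home core is not overloaded, or too few cores
    others = [c for c in range(len(t_load)) if c != home_core]
    a, b = rng.sample(others, 2)  # assumed: power-of-two-choices sampling
    candidate = a if t_load[a] <= t_load[b] else b
    # Steer only if the candidate is actually less loaded than home.
    return candidate if t_load[candidate] < t_load[home_core] else home_core


def steer_application(current_core, avg_util, min_gain=0.1):
    """Pick the core an application should run on for the next interval.

    avg_util[c] is the average utilization of core c over a long
    window (the notes mention 10 ms), counting L-app and T-app load.
    min_gain is a hypothetical margin that avoids needless migrations.
    """
    best = min(range(len(avg_util)), key=lambda c: avg_util[c])
    if avg_util[current_core] - avg_util[best] > min_gain:
        return best
    return current_core


# Usage: core 0 is overloaded, so its T-app request is steered elsewhere,
# and an application on core 0 is moved to the least-utilized core.
t_load = [900, 100, 50, 400]
request_target = steer_request(0, t_load, threshold=512, rng=random.Random(0))
app_target = steer_application(0, [0.95, 0.40, 0.20, 0.70])
```

The split mirrors the notes: the per-request decision is cheap and reacts to transient load, while the per-application decision runs rarely and uses averaged utilization, so persistent imbalance is fixed by moving the app instead of steering every request.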

#### Summary

* It is possible to achieve micro-second scale latency and high throughput with Linux
* blk-switch insight: modern storage stack is conceptually similar to network switches
  * Decoupling request processing from application cores
  * Multi-egress queue architecture, prioritization, request steering, and application steering
* blk-switch achieves
  * 10s of micro-second scale avg latency and < 190 micro-second tail latency with in-memory storage
  * Near-hardware capacity throughput

