# Toward Reconfigurable Kernel Datapaths with Learned Optimizations

* Problem: OS kernels are under stress! (diversifying applications and hardware)
  * Heterogeneous hardware such as SmartNICs, TPUs, and GPUs, along with their diverging compute units, memory hierarchies, and accelerators
  * Applications are evolving: distinct characteristics, latency requirements, and memory access patterns
* Kernel optimizations today
  * **Key limitations with today's kernel optimizations**
    * **Ad hoc heuristics**, without guarantees they'll always work
    * **One-size-fits-all optimizations** for all apps and workloads
    * **Cannot generalize to new** apps, workloads, and hardware
* **Research agenda**
  * **Goal**: better, principled kernel optimizations
    * Tailored for each app, workload, or hardware
    * Adapt to new scenarios
    * Minimize hand-tuning and benchmarking
  * Approach: leverage machine learning (ML)
* Why ML?
  * Four types of benefits
    * **Reduce unnecessary kernel monitoring**
      * E.g., reduce page faults caused by memory affinity monitoring
    * **Better and more robust heuristics**
      * E.g., ML-based page prefetching, task migration, I/O scheduling, and congestion control
    * **Generalization to unseen scenarios**
      * E.g., an ML prefetcher could adapt online to new access patterns
    * **Cross-application optimizations**
      * E.g., enable efficient communication between producers and consumers
  * Kernel-ML example
    * Swap area page prefetching
      * Baseline #1: Linux readahead
        * Only captures sequential patterns
      * Baseline #2: majority-vote-based prefetcher (Leap)
        * Captures sequential and striding patterns
      * Ours: decision-tree-based prefetcher
        * Captures more complex patterns
    * Workloads: contain non-sequential / striding patterns
      * E.g., video resizing and matrix convolution
      * Each workload runs in a dedicated cgroup
    * Better performance on multiple metrics
      * **Better coverage**: higher hit rate and less access overhead
      * **Better accuracy**: less memory pollution by irrelevant pages
      * **Better execution time**: workload completes faster
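
To make the second baseline concrete, here is a minimal user-space sketch of a majority-vote stride prefetcher. It is only an illustration of the general idea (find the dominant delta in a window of recent page accesses and prefetch along it), not the actual Leap or kernel implementation. The names `majority_delta` and `StridePrefetcher`, and the `window` and `depth` parameters, are hypothetical.

```python
from collections import deque
from typing import List, Optional, Sequence


def majority_delta(deltas: Sequence[int]) -> Optional[int]:
    """Boyer-Moore majority vote: return the delta that makes up more than
    half of `deltas`, or None if there is no strict majority."""
    candidate, count = None, 0
    for d in deltas:
        if count == 0:
            candidate, count = d, 1
        elif d == candidate:
            count += 1
        else:
            count -= 1
    # Second pass: confirm the surviving candidate is a strict majority.
    if candidate is not None and 2 * sum(1 for d in deltas if d == candidate) > len(deltas):
        return candidate
    return None


class StridePrefetcher:
    """Hypothetical prefetcher: tracks recent access deltas and, when one
    stride dominates the window, predicts the next `depth` pages."""

    def __init__(self, window: int = 8, depth: int = 4):
        self.deltas = deque(maxlen=window)  # most recent page-number deltas
        self.last_page: Optional[int] = None
        self.depth = depth

    def access(self, page: int) -> List[int]:
        """Record an access and return the list of pages to prefetch."""
        if self.last_page is not None:
            self.deltas.append(page - self.last_page)
        self.last_page = page
        stride = majority_delta(self.deltas)
        if stride is None or stride == 0:
            return []  # no dominant pattern: do not pollute memory
        return [page + stride * i for i in range(1, self.depth + 1)]


if __name__ == "__main__":
    prefetcher = StridePrefetcher()
    for page in [0, 4, 8, 100, 104]:  # stride of 4 with one outlier jump
        print(page, prefetcher.access(page))
```

A scheme like this tolerates occasional outliers but still assumes one dominant stride, which is the gap the notes attribute to the decision-tree prefetcher ("captures more complex patterns").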

But how?

* **A range of research questions** exist:
  * What should be the in-kernel ML infrastructure?
  * How to rearchitect the kernel to use learned decisions?
  * How to manage a diverse range of ML models?
* Our proposal: reconfigurable kernel datapaths
  * Architect kernel datapaths with reconfigurable match tables (RMT)
    * **Matches** check execution context (e.g., page faults)
    * **Actions** perform data collection and adaptive decisions
    * **Kernel ML lib** performs learning and inference (e.g., NNs)
    * The RMT verifier analyzes program complexity and behavior to provide performance and safety guarantees

![](/files/DAVKNEpbKXcrHLrkjCwE)
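
The match/action split described above can be sketched in a few lines of user-space Python. This is only a toy model under stated assumptions: the `Context` dictionary, the `Entry` and `MatchActionTable` classes, and the `readahead` and `learned_prefetch` stand-ins are all hypothetical names, and a real RMT program would be verified and compiled before running in the kernel rather than interpreted like this.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical execution context seen at a kernel decision point.
Context = Dict[str, Any]


@dataclass
class Entry:
    match: Callable[[Context], bool]   # does this entry apply to the context?
    action: Callable[[Context], Any]   # data collection or adaptive decision


class MatchActionTable:
    """Toy match-action table: the first matching entry decides; otherwise
    the existing heuristic (the default action) is used."""

    def __init__(self, default_action: Callable[[Context], Any]):
        self.entries: List[Entry] = []
        self.default_action = default_action

    def insert(self, entry: Entry) -> None:
        # Entries can be added at run time, which is what makes the
        # datapath reconfigurable in this sketch.
        self.entries.append(entry)

    def lookup(self, ctx: Context) -> Any:
        for entry in self.entries:
            if entry.match(ctx):
                return entry.action(ctx)
        return self.default_action(ctx)


def readahead(ctx: Context) -> List[int]:
    """Stand-in for an existing heuristic: prefetch the next page."""
    return [ctx["page"] + 1]


def learned_prefetch(ctx: Context) -> List[int]:
    """Stand-in for a call into a kernel ML library (assumed, not real)."""
    return [ctx["page"] + 16]


table = MatchActionTable(default_action=readahead)
table.insert(Entry(
    match=lambda ctx: ctx["event"] == "page_fault" and ctx["pid"] == 42,
    action=learned_prefetch,
))
```

In this sketch, `table.lookup({"event": "page_fault", "pid": 42, "page": 100})` takes the learned path, while a fault from any other process falls back to `readahead`, so a learned policy can be scoped to one application without touching the default behavior.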

### Challenges

* **Challenge #1: Infrastructure for in-kernel ML**
  * Applications in user space can easily build ML models, but deploying ML to steer kernel decisions with run-time reconfigurability requires infrastructure for:
    * Data monitoring and collection
    * Kernel ML deployment
    * Model management
  * Even with eBPF, the existing kernel does not support such functionality
  * Thus, we need to build our own in-kernel RMT infrastructure to intervene in critical kernel decisions
* **Proposal #1: in-kernel RMT virtual machine**
  * Achieve dynamic reconfigurability
  * Steps
    * RMT program: initializes and inserts the match-action table
    * The program is verified, compiled into bytecode, and then further compiled into machine code
    * Main block: match-action table
      * Organized as a key-value map
      * Defines key decision points in the kernel datapath
      * Match: the kernel reacts
      * Entry: execution context (such as CPU load and process ID) plus an action: train the ML model, or use the ML model in place of the heuristics
* **Challenge #2: Customizing ML for the kernel**
  * Kernel operations are limited and time-sensitive, so we need to mitigate the overheads induced by ML
  * What techniques can achieve efficient, high-accuracy ML training and prediction inside the kernel with tolerable overhead?
* **Proposal #2: lightweight in-kernel ML**
  * ML library to speed up model construction and computation
    * ML data structures: Conv layer, MLP layer, etc.
    * Helper functions: matrix multiplication, backpropagation
  * ML training
    * Efficient background training
    * Offline training and updates
    * ...
  * Customized ML
    * Neural architecture search
    * Meta learning
    * ...
  * ML inference
    * Model compression
    * Model quantization
    * ...
* **Challenge #3: guarantees of infrastructure behavior**
  * We need to guarantee the integrity of the whole infrastructure
  * ML-based decisions may lead to undesirable states, e.g., resource imbalance
  * An incorrect model may cause a kernel panic, e.g., a defective ML model
  * Privacy violations, e.g., data leaks
* **Proposal #3: The RMT verifier**
  * Performance interference: prevent RMT programs from driving the kernel into undesirable states
  * Model safety: verify the correctness of the model, or add guardrails to black-box inference
  * Privacy enhancement: implement privacy mechanisms to prevent malicious queries
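
One way to picture "guardrails to black-box inference" from Proposal #3 is a wrapper that only accepts a model's output when it is well-formed and in range, and otherwise falls back to the existing heuristic. The sketch below is an assumption about what such a guardrail could look like, using a task-migration decision as the example. It is a run-time check, not the static verification the notes describe, and `guarded_decision`, `pick_least_loaded`, and the example models are hypothetical.

```python
from typing import Callable, Sequence


def guarded_decision(
    model: Callable[[Sequence[int]], int],
    heuristic: Callable[[Sequence[int]], int],
    features: Sequence[int],
    lo: int,
    hi: int,
) -> int:
    """Return the model's prediction only if it is an integer in [lo, hi];
    otherwise fall back to the heuristic's decision."""
    try:
        pred = model(features)
    except Exception:
        # A defective model must not take the decision path down with it.
        return heuristic(features)
    if not isinstance(pred, int) or not (lo <= pred <= hi):
        return heuristic(features)
    return pred


def pick_least_loaded(cpu_loads: Sequence[int]) -> int:
    """Stand-in heuristic: migrate the task to the least-loaded CPU."""
    return min(range(len(cpu_loads)), key=lambda cpu: cpu_loads[cpu])


def good_model(cpu_loads: Sequence[int]) -> int:
    return 2  # pretend the model prefers CPU 2


def out_of_range_model(cpu_loads: Sequence[int]) -> int:
    return 99  # not a valid CPU id on a 4-CPU machine


def crashing_model(cpu_loads: Sequence[int]) -> int:
    raise ValueError("corrupted weights")


loads = [70, 10, 40, 90]  # per-CPU load, 4 CPUs with ids 0..3
```

Here `good_model` is trusted because CPU 2 exists, while the out-of-range and crashing models both resolve to the heuristic's choice of CPU 1. A range check like this only bounds the damage of a bad prediction; it says nothing about whether an in-range prediction is a good one.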

### Summary

* Motivation: better, generalizable kernel optimizations
* Proposal: reconfigurable kernel datapaths with learned optimizations
* Key approach
  * The RMT virtual machine
  * Lightweight kernel ML
  * The RMT verifier
* Preliminary results
  * Page prefetching, CPU scheduling

