Toward Reconfigurable Kernel Datapaths with Learned Optimizations

https://www.youtube.com/watch?v=yI5Q61V2wT4

Problem: OS kernels are under stress! (diversifying applications and hardware)
- Heterogeneous hardware such as SmartNICs, TPUs, and GPUs, along with their diverging compute units, memory hierarchies, and accelerators
- Applications are evolving: distinct characteristics, latency requirements, and memory access patterns
Kernel optimizations today
- Key limitations with today's kernel optimizations
  - Ad hoc heuristics, without guarantees they'll always work
  - One-size-fits-all optimizations for all apps and workloads
  - Cannot generalize to new apps, workloads, and hardware
Research agenda
- Goal: better, principled kernel optimizations
  - Tailored for each app, workload, or hardware
  - Adapt to new scenarios
  - Minimize hand-tuning and benchmarking
- Approach: leverage machine learning (ML)
Why ML?
- Four types of benefits
  - Reduce unnecessary kernel monitoring
    E.g., reduce page faults caused by memory affinity monitoring
  - Better and more robust heuristics
    E.g., ML-based page prefetching, task migration, I/O scheduling, and congestion control
  - Generalization to unseen scenarios
    E.g., an ML prefetcher could adapt online to new access patterns
  - Cross-application optimizations
    E.g., enable efficient communication between producers and consumers
- Kernel-ML example
  - Swap area page prefetching
    Baseline #1: Linux readahead; only captures sequential patterns
    Baseline #2: majority-vote-based prefetcher (Leap); captures sequential and striding patterns
    Ours: decision-tree-based prefetcher; captures more complex patterns
  - Workloads: contain non-sequential / striding patterns
    E.g., video resizing and matrix convolution
    Each workload runs in a dedicated cgroup
  - Better performance on multiple metrics
    Better coverage: higher hit-rate and less access overhead
    Better accuracy: less memory pollution by irrelevant pages
    Better execution time: workload completes faster
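
To make the baseline and the metrics above concrete, here is a minimal sketch of a majority-vote stride prefetcher together with a coverage/accuracy measurement. It is a simplified illustration, not Leap's actual implementation: the class name, the window and depth parameters, and the metric definitions (coverage as the fraction of accesses served by a prefetched page, accuracy as the fraction of prefetched pages that were later used) are all assumptions made for this example.

```python
from collections import Counter, deque


class MajorityStridePrefetcher:
    """Toy prefetcher: follows the majority stride seen in a recent window."""

    def __init__(self, window=4, depth=2):
        self.history = deque(maxlen=window + 1)  # recent page numbers
        self.depth = depth                       # pages to prefetch per access

    def access(self, page):
        """Record an access and return the pages to prefetch."""
        self.history.append(page)
        pages = list(self.history)
        strides = [b - a for a, b in zip(pages, pages[1:])]
        if len(strides) < 2:
            return []                            # not enough history to vote
        stride, votes = Counter(strides).most_common(1)[0]
        # Prefetch only when one non-zero stride holds a strict majority.
        if stride == 0 or votes * 2 <= len(strides):
            return []
        candidates = [page + stride * i for i in range(1, self.depth + 1)]
        return [p for p in candidates if p >= 0]  # page numbers are non-negative


def evaluate(trace, prefetcher):
    """Return (coverage, accuracy) of a prefetcher over a page-access trace."""
    prefetched, used = set(), set()
    hits = 0
    for page in trace:
        if page in prefetched:                   # access served by a prefetched page
            hits += 1
            used.add(page)
        prefetched.update(prefetcher.access(page))
    coverage = hits / len(trace)
    accuracy = len(used) / len(prefetched) if prefetched else 0.0
    return coverage, accuracy


strided = evaluate([0, 2, 4, 6, 8, 10], MajorityStridePrefetcher())
irregular = evaluate([5, 1, 9, 3, 7], MajorityStridePrefetcher())
```

On the strided trace the vote locks onto a stride of 2 after three accesses; on the irregular trace no stride wins a majority, so nothing is prefetched. That gap is the kind of pattern a learned prefetcher is meant to cover.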

But, how?

A range of research questions exist:
- What should be the in-kernel ML infrastructure?
- How to rearchitect the kernel to use learned decisions?
- How to manage a diverse range of ML models?
Our proposal: reconfigurable kernel datapaths
- Architect kernel datapaths with reconfigurable match tables (RMT)
  - Matches check execution context (e.g., page faults)
  - Actions perform data collection and adaptive decisions
  - Kernel ML lib performs learning and inference (e.g., NNs)
  - Complexity and behavior are analyzed by the RMT verifier for performance and safety guarantees
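
The match/action split above can be sketched as a small dispatch table. This is only an illustration of the idea under stated assumptions: `Entry`, `MatchActionTable`, `install`, `dispatch`, and the `"page_fault"` hook name are hypothetical and do not come from the talk. An action either collects data (returns `None`) or returns a decision; if no entry yields a decision, the existing heuristic is used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Entry:
    match: Callable[[dict], bool]             # predicate over the execution context
    action: Callable[[dict], Optional[int]]   # collects data or returns a decision


@dataclass
class MatchActionTable:
    entries: Dict[str, List[Entry]] = field(default_factory=dict)

    def install(self, hook, entry):
        """Attach an entry to a decision point (hook) in the datapath."""
        self.entries.setdefault(hook, []).append(entry)

    def dispatch(self, hook, ctx, default):
        """Run matching entries in order; fall back to the default heuristic."""
        for entry in self.entries.get(hook, []):
            if entry.match(ctx):
                decision = entry.action(ctx)
                if decision is not None:
                    return decision
        return default(ctx)


samples = []                                  # data collected for later training
table = MatchActionTable()
# Entry 1: monitor every page fault (data collection only, returns None).
table.install("page_fault", Entry(lambda c: True, lambda c: samples.append(c["addr"])))
# Entry 2: for one process, replace the heuristic with a stand-in "learned" decision.
table.install("page_fault", Entry(lambda c: c["pid"] == 42, lambda c: c["addr"] + 2))


def next_page(c):
    return c["addr"] + 1                      # stand-in for the built-in heuristic


tuned = table.dispatch("page_fault", {"pid": 42, "addr": 100}, next_page)
untuned = table.dispatch("page_fault", {"pid": 7, "addr": 100}, next_page)
```

Keeping the default heuristic as the fallback means an empty or partially populated table leaves kernel behavior unchanged.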

Challenges

Challenge #1: Infrastructure for in-kernel ML
- Applications in user space can easily build ML models; deploying ML to influence kernel decisions with run-time reconfigurability, however, requires an infrastructure for:
  - Data monitoring and collection
  - Kernel ML deployment
  - Model management
- Even with eBPF, the existing kernel does not support such functionality
- Thus, we need to build our own in-kernel RMT infrastructure to influence critical kernel decisions
Proposal #1: in-kernel RMT virtual machine
- Achieve dynamic reconfigurability
- Steps
  - RMT program: initializes and inserts the match-action table
  - The program is verified, compiled into bytecode, and further compiled into machine code
  - Main block: match-action table
    Organized as a key-value map
    Defines the key decision points in the kernel datapath
    Match: the kernel reacts when an entry matches
    Entry: matches on execution context (e.g., CPU load, process ID); its action trains an ML model or uses the model in place of the heuristic
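
The verify-then-run step can be illustrated with a toy bytecode. Everything here is an assumption made for the sketch: the instruction set, the size limit, and the rule that jumps must go forward (which guarantees termination) are not taken from the talk, and the program is interpreted rather than compiled to machine code.

```python
ALLOWED = {"LOAD_CTX", "PUSH", "GT", "JMP_IF_FALSE", "RET"}
MAX_INSNS = 64  # hypothetical size limit


def verify(program):
    """Reject programs that are too large, malformed, or could loop."""
    if not 0 < len(program) <= MAX_INSNS:
        raise ValueError("program size out of bounds")
    for pc, (op, arg) in enumerate(program):
        if op not in ALLOWED:
            raise ValueError(f"unknown opcode {op!r} at {pc}")
        # Forward-only, in-bounds jumps mean every run terminates.
        if op == "JMP_IF_FALSE" and not (isinstance(arg, int) and pc < arg < len(program)):
            raise ValueError(f"jump at {pc} must go forward and stay in bounds")
    if program[-1][0] != "RET":
        raise ValueError("program must end with RET")


def run(program, ctx):
    """Interpret a verified program against an execution context."""
    verify(program)
    stack, pc = [], 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "LOAD_CTX":
            stack.append(ctx[arg])
        elif op == "PUSH":
            stack.append(arg)
        elif op == "GT":
            b, a = stack.pop(), stack.pop()
            stack.append(a > b)
        elif op == "JMP_IF_FALSE":
            if not stack.pop():
                pc = arg
        elif op == "RET":
            return stack.pop()


# Example entry: return 1 (migrate the task) if CPU load exceeds 80, else 0.
migrate_if_busy = [
    ("LOAD_CTX", "cpu_load"),
    ("PUSH", 80),
    ("GT", None),
    ("JMP_IF_FALSE", 6),
    ("PUSH", 1),
    ("RET", None),
    ("PUSH", 0),
    ("RET", None),
]
```

A fuller verifier would also track stack depth and the types of context fields; the sketch checks only size, opcodes, and control flow.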
Challenge #2: Customizing ML for the kernel
- Kernel operations are limited and time-sensitive, so we need to mitigate the overhead induced by ML
- Which techniques enable efficient, high-accuracy ML training and prediction inside the kernel with tolerable overhead?
Proposal #2: lightweight in-kernel ML
- ML library to speed up model construction and computation
  - ML data structures: Conv layer, MLP layer, etc.
  - Helper functions: matrix multiplication, backward propagation
- ML training
  - Efficient background training
  - Offline training and updates
  - ...
- Customized ML
  - Neural architecture search
  - Meta learning
  - ...
- ML inference
  - Model compression
  - Model quantization
  - ...
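
As one illustration of the inference-side techniques, model quantization can be sketched as follows. This is a generic symmetric int8 scheme chosen for the example, not necessarily the one the authors use; the function names are hypothetical. The appeal in a kernel setting is that the inner loop uses only integer arithmetic, which matters because kernel code generally avoids floating point.

```python
def quantize(values, bits=8):
    """Symmetric quantization: map floats to signed integers plus one scale."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for int8
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0
    return [round(v / scale) for v in values], scale


def int_dot(q_weights, q_inputs):
    """Integer-only dot product: the part that would sit on the hot path."""
    return sum(w * x for w, x in zip(q_weights, q_inputs))


def quantized_neuron(weights, inputs):
    """Approximate sum(w * x) using int8 operands and a single final rescale."""
    qw, w_scale = quantize(weights)
    qx, x_scale = quantize(inputs)
    return int_dot(qw, qx) * w_scale * x_scale


weights = [0.5, -1.0, 0.25]
inputs = [2.0, 1.0, 4.0]
exact = sum(w * x for w, x in zip(weights, inputs))
approx = quantized_neuron(weights, inputs)
```

The final rescale still uses floats here for readability; an in-kernel version would presumably fold the two scales into a fixed-point multiplier and quantize the weights offline.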
Challenge #3: guarantees of infrastructure behavior
- We need to guarantee the integrity of the whole infrastructure
- Decisions based on ML may lead to an undesirable state, e.g., resource imbalance
- An incorrect model may cause a kernel panic, e.g., a defective ML model
- Privacy violations, e.g., data leakage
Proposal #3: The RMT verifier
- Performance interference: prevent RMT programs from driving the kernel into undesirable states
- Model safety: verify the correctness of the model or add guardrails to black-box inference
- Privacy enhancement: implement privacy mechanisms to prevent malicious queries
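
The "guardrails to black-box inference" idea can be sketched as a wrapper that accepts a model's output only when it is well-formed and within a safe range, and otherwise falls back to the existing heuristic. The function name, the integer-decision assumption, and the bounds are hypothetical and chosen only for illustration.

```python
def guarded_decision(model, heuristic, ctx, lower, upper):
    """Use the model's output only if it is an integer within [lower, upper]."""
    try:
        prediction = model(ctx)
    except Exception:
        return heuristic(ctx)                  # a failing model must not break the datapath
    well_formed = isinstance(prediction, int) and not isinstance(prediction, bool)
    if not well_formed or not lower <= prediction <= upper:
        return heuristic(ctx)                  # out-of-range output is ignored
    return prediction


def readahead_heuristic(ctx):
    return 4                                   # stand-in default: prefetch 4 pages


def good_model(ctx):
    return 3


def wild_model(ctx):
    return 10_000                              # would pollute memory if trusted


def broken_model(ctx):
    raise RuntimeError("defective model")


results = [
    guarded_decision(m, readahead_heuristic, {}, lower=0, upper=8)
    for m in (good_model, wild_model, broken_model)
]
```

With this wrapper a defective model can at worst degrade decisions back to the default heuristic, which is one way to bound the damage without verifying the model itself.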

Summary

Motivation: better, generalizable kernel optimizations
Proposal: reconfigurable kernel datapaths with learned optimizations
Key approach
- The RMT virtual machine
- Lightweight kernel ML
- The RMT verifier
Preliminary results
- Page prefetching, CPU scheduling


