Toward Reconfigurable Kernel Datapaths with Learned Optimizations
https://www.youtube.com/watch?v=yI5Q61V2wT4
Problem: OS kernels are under stress! (diversifying applications and hardware)
Heterogeneous hardware such as SmartNICs, TPUs, and GPUs, with their divergent compute units, memory hierarchies, and accelerators
Applications are evolving: distinct characteristics, e.g., latency requirements and memory access patterns
Kernel optimizations today
Key limitations with today's kernel optimizations
Ad hoc heuristics, with no guarantee they will always work
One-size-fits-all optimizations for all apps and workloads
Cannot generalize to new apps, workloads, and hardware
Research agenda
Goal: better, principled kernel optimizations
Tailored for each app, workload, or hardware
Adapt to new scenarios
Minimize hand-tuning and benchmarking
Approach: leverage machine learning (ML)
Why ML?
Four types of benefits
Reduce unnecessary kernel monitoring
e.g. reduce page faults caused by memory affinity monitoring
Better and more robust heuristics
e.g., use cases in ML-based page prefetching, task migration, I/O scheduling, and congestion control
Generalization to unseen scenarios
E.g. ML prefetcher could do online adaptation to new access patterns
Cross-application optimizations
E.g., enable efficient communication between producers and consumers
Kernel-ML example
Swap area page prefetching
Baseline #1: Linux readahead
Only captures sequential patterns
Baseline #2: majority-vote-based prefetcher (Leap)
Captures sequential and striding patterns
Ours: decision tree based prefetcher
Captures more complex patterns
Workloads: contain non-sequential / striding patterns
E.g., video resizing and matrix convolution
Each workload runs in a dedicated cgroup
Better performance on multiple metrics
Better coverage: higher hit rate and lower access overhead
Better accuracy: less memory pollution by irrelevant pages
Better execution time: workload completes faster
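The decision-tree prefetcher above is not described in implementation detail here; as a toy stand-in, the sketch below learns the most likely next page-fault delta from the previous two deltas (a depth-2 decision tree over those same two features would encode the same piecewise-constant predictor). All names and the feature choice are illustrative, not the talk's actual design.

```python
from collections import Counter, defaultdict

class ToyDeltaPrefetcher:
    """Toy delta-history prefetcher: predict the next faulting page
    from the two most recent page-fault deltas. Unlike readahead
    (sequential only) or majority vote (one global stride), it can
    capture patterns that depend on recent context."""

    def __init__(self):
        # (delta[-2], delta[-1]) -> histogram of the delta that followed
        self.counts = defaultdict(Counter)
        self.history = []  # recent faulting page numbers

    def record_fault(self, page):
        self.history.append(page)
        if len(self.history) >= 4:
            d1 = self.history[-3] - self.history[-4]
            d2 = self.history[-2] - self.history[-3]
            nxt = self.history[-1] - self.history[-2]
            self.counts[(d1, d2)][nxt] += 1

    def predict_next(self):
        if len(self.history) < 3:
            return None
        d1 = self.history[-2] - self.history[-3]
        d2 = self.history[-1] - self.history[-2]
        hist = self.counts.get((d1, d2))
        if not hist:
            return None  # fall back to the default kernel heuristic
        delta, _ = hist.most_common(1)[0]
        return self.history[-1] + delta
```

On a pure stride (0, 2, 4, 6, 8) it predicts 10, and on an alternating pattern like 0, 1, 5, 6, 10, 11, ... (deltas 1, 4, 1, 4) it learns both branches, which a single majority vote cannot.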
But, how?
A range of research questions exist:
What should be the in-kernel ML infrastructure?
How to rearchitect the kernel to use learned decisions?
How to manage a diverse range of ML models?
Our proposal: reconfigurable kernel datapaths
Architect kernel datapaths with reconfigurable match tables (RMT)
Matches check execution context (e.g., page faults)
Actions perform data collection and adaptive decisions
Kernel ML lib performs learning and inference (e.g., NNs)
Complexity and behavior are analyzed by the RMT verifier for performance and safety guarantees
Challenges
Challenge #1: Infrastructure for in-kernel ML
User-space applications can easily build ML models; deploying ML to drive kernel decisions with run-time reconfigurability, however, requires infrastructure for:
Data monitoring and collection
Kernel ML deployment
Model management
Even with eBPF, the existing kernel does not support such functionality
Thus, we need to build our own in-kernel RMT infrastructure to drive critical kernel decisions
Proposal #1: in-kernel RMT virtual machine
Achieve dynamic reconfigurability
Steps
RMT program: initializing and inserting the match-action table
Verified, compiled into bytecode, and further compiled into machine code
Main block: match-action table
Organized as a key-value map
Defines key decision points in the kernel datapath
Match: the execution context at which the kernel reacts
Entry: an execution context (e.g., CPU load, process ID) paired with an action, such as training an ML model or using the model to replace a heuristic
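The match-action structure above can be pictured with a minimal sketch: entries pair a match over the execution context with an action, and the kernel consults the table at each decision point. Everything here (field names, the cgroup match, the action) is hypothetical; the real table lives in the in-kernel RMT virtual machine, not Python.

```python
# Illustrative sketch of an RMT-style match-action table. A match
# predicate inspects the execution context; the action collects data,
# trains a model, or runs inference in place of the stock heuristic.

class MatchActionTable:
    def __init__(self):
        self.entries = []  # ordered list of (match_fn, action_fn)

    def insert(self, match_fn, action_fn):
        self.entries.append((match_fn, action_fn))

    def dispatch(self, ctx):
        """Invoked at a kernel decision point (e.g., on a page fault)."""
        for match_fn, action_fn in self.entries:
            if match_fn(ctx):
                return action_fn(ctx)
        return None  # no entry matched: fall back to the default heuristic

table = MatchActionTable()
# Hypothetical entry: on page faults from one cgroup, ask a learned
# prefetcher instead of readahead (the action here is a stand-in).
table.insert(
    match_fn=lambda ctx: ctx["event"] == "page_fault" and ctx["cgroup"] == "ml-app",
    action_fn=lambda ctx: ("prefetch", ctx["page"] + 1),
)
```

Dynamic reconfigurability then amounts to inserting or replacing entries at run time, without recompiling the datapath.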
Challenge #2: Customizing ML for the kernel
Kernel operations are constrained and time-sensitive, so we need to mitigate the overhead induced by ML
What techniques can achieve efficient, high-accuracy ML training and inference inside the kernel with tolerable overhead?
Proposal #2: lightweight in-kernel ML
ML library to speed up model construction and computation
ML data structures: conv layers, MLP layers, etc.
Helper functions: matrix multiplication, backward propagation
ML training
Efficient background training
Offline training and updates
...
Customized ML
Neural architecture search
Meta learning
...
ML inference
Model compression
Model quantization
...
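To make the quantization point above concrete: the kernel avoids floating point on its fast paths, so a model's weights can be mapped to int8 with a fixed-point scale and the dot products done in integer arithmetic. This is a generic symmetric-quantization sketch, not the talk's actual library; the numbers below are made up for illustration.

```python
# Sketch of symmetric per-tensor int8 quantization, as one might use
# for low-overhead in-kernel inference (integer math on the hot path,
# a single dequantization at the end).

def quantize_int8(values):
    """Map floats to int8 with a shared fixed-point scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-128, min(127, round(v / scale))) for v in values]
    return quantized, scale

def int_dot(q_weights, q_inputs, w_scale, x_scale):
    """Integer dot product; dequantize once at the end."""
    acc = sum(w * x for w, x in zip(q_weights, q_inputs))  # fits int32
    return acc * w_scale * x_scale
```

For small models the quantized result stays within a fraction of a percent of the float dot product, while the inner loop needs only integer multiply-accumulate.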
Challenge #3: guarantees of infrastructure behavior
We need to guarantee the integrity of the whole infrastructure
ML-based decisions may lead the kernel into undesirable states: e.g., resource imbalance
Incorrect models causing kernel panics: e.g., a defective ML model
Privacy violations: e.g., data leakage
Proposal #3: The RMT verifier
Performance interference: prevent RMT programs from driving the kernel into undesirable states
Model safety: verify the correctness of the model or add guardrails to black box inference
Privacy enhancement: implement privacy mechanisms to prevent malicious queries
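The "guardrails on black-box inference" idea can be sketched simply: since the verifier cannot prove an arbitrary model correct, a runtime wrapper bounds what any one learned decision may do before the kernel acts on it. The function name, region bounds, and budget below are hypothetical examples, not part of the actual verifier.

```python
# Illustrative guardrail for a learned prefetcher: whatever the model
# outputs, the kernel only acts on pages inside the valid swap region
# and never fetches more than a fixed budget per decision.

def guarded_prefetch(model_fn, fault_page, region_start, region_end, budget=8):
    """Clamp a black-box model's prefetch list to safe, bounded work."""
    pages = model_fn(fault_page)                           # untrusted output
    safe = [p for p in pages if region_start <= p < region_end]
    return safe[:budget]                                   # cap the work done
```

A defective model can then waste at most `budget` prefetches and can never touch pages outside the region, turning a potential kernel panic into bounded inefficiency.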
Summary
Motivation: better, generalizable kernel optimizations
Proposal: reconfigurable kernel datapaths with learned optimizations
Key approach
The RMT virtual machine
Lightweight kernel ML
The RMT verifier
Preliminary results
Page prefetching, CPU scheduling