Toward Reconfigurable Kernel Datapaths with Learned Optimizations

https://www.youtube.com/watch?v=yI5Q61V2wT4

  • Problem: OS kernels are under stress! (diversifying applications and hardware)

    • Heterogeneous hardware such as SmartNICs, TPUs, and GPUs, along with their diverging compute units, memory hierarchies, and accelerators

    • Applications are evolving: distinct performance characteristics, latency requirements, and memory access patterns

  • Kernel optimizations today

    • Key limitations with today's kernel optimizations

      • Ad hoc heuristics, without guarantees they'll always work

      • One-size-fits-all optimizations for all apps and workloads

      • Cannot generalize to new apps, workloads, and hardware

  • Research agenda

    • Goal: better, principled kernel optimizations

      • Tailored for each app, workload, or hardware

      • Adapt to new scenarios

      • Minimize hand-tuning and benchmarking

    • Approach: leverage machine learning (ML)

  • Why ML?

    • Four types of benefits

      • Reduce unnecessary kernel monitoring

        • e.g. reduce page faults caused by memory affinity monitoring

      • Better and more robust heuristics

        • e.g., use cases in ML-based page prefetching, task migration, I/O scheduling, congestion control, etc.

      • Generalization to unseen scenarios

        • e.g., an ML prefetcher can adapt online to new access patterns

      • Cross-application optimizations

        • e.g., enabling efficient communication between producers and consumers

    • Kernel-ML example

      • Swap area page prefetching

        • Baseline #1: Linux readahead

          • Only captures sequential patterns

        • Baseline #2: majority-vote-based prefetcher (Leap)

          • Captures sequential and striding patterns

        • Ours: decision-tree-based prefetcher (see the sketch after this list)

          • Captures more complex patterns

      • Workloads: contain non-sequential / striding patterns

        • E.g., video resizing and matrix convolution

        • Each workload runs in a dedicated cgroup

      • Better performance on multiple metrics

        • Better coverage: higher hit rate and lower access overhead

        • Better accuracy: less memory pollution by irrelevant pages

        • Better execution time: workload completes faster
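
The talk shows results rather than code, so below is a minimal user-space C sketch of what a decision-tree prefetcher might look like. The branch conditions are hand-written stand-ins for splits that would actually be learned from fault traces, and the trace itself is fabricated for illustration.

```c
#include <stdio.h>

#define HISTORY 4   /* recent inter-fault deltas used as features */

/* Hand-rolled stand-in for a trained decision tree: given the last
 * few page-fault deltas (in pages), predict the next delta, or 0 for
 * "no confident prediction". A real deployment would learn these
 * splits offline or in a background thread. */
static long predict_next_delta(const long d[HISTORY])
{
    if (d[0] == 1 && d[1] == 1 && d[2] == 1)
        return 1;                    /* sequential: readahead's case */
    if (d[0] == d[1] && d[1] == d[2])
        return d[0];                 /* constant stride: Leap's case */
    if (d[0] == d[2] && d[1] == d[3])
        return d[1];                 /* alternating stride: tree only */
    return 0;                        /* leaf: don't prefetch */
}

int main(void)
{
    /* Fabricated fault trace with an alternating 1/17 stride, loosely
     * like a convolution sweeping a matrix stored row-major. */
    unsigned long trace[] = { 100, 101, 118, 119, 136, 137, 154 };
    long deltas[HISTORY] = { 0 };

    for (size_t i = 1; i < sizeof(trace) / sizeof(trace[0]); i++) {
        for (int j = HISTORY - 1; j > 0; j--)   /* shift history */
            deltas[j] = deltas[j - 1];
        deltas[0] = (long)(trace[i] - trace[i - 1]);

        long next = predict_next_delta(deltas);
        if (next)
            printf("fault at page %lu -> prefetch page %lu\n",
                   trace[i], trace[i] + (unsigned long)next);
        else
            printf("fault at page %lu -> no prefetch\n", trace[i]);
    }
    return 0;
}
```

The branches mirror the three tiers above: the first covers readahead's sequential case, the second covers the constant strides that Leap's majority vote also captures, and the third covers an alternating-stride pattern that only the tree predicts.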

But how?

  • A range of research questions exist:

    • What should be the in-kernel ML infrastructure?

    • How to rearchitect the kernel to use learned decisions?

    • How to manage a diverse range of ML models?

  • Our proposal: reconfigurable kernel datapaths

    • Architect kernel datapaths with reconfigurable match tables (RMT)

      • Matches check execution context (e.g., page faults)

      • Actions perform data collection and adaptive decisions

      • Kernel ML lib performs learning and inference (e.g., NNs)

      • Complexity and behavior are analyzed by the RMT verifier for performance and safety guarantees (a dispatch sketch follows)
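
To make the division of labor concrete, here is a minimal C sketch of the dispatch loop such a datapath might run at each decision point. Every name (rmt_ctx, rmt_entry, the page-fault hook) is a hypothetical assumption illustrating the match/action/ML-library/verifier roles, not the actual implementation.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

struct rmt_ctx {                /* execution context at a hook */
    uint32_t hook;              /* decision point id */
    uint32_t pid;               /* current process */
    uint64_t addr;              /* e.g., faulting address */
    uint32_t cpu_load;          /* coarse load metric */
};

struct rmt_entry {              /* one match-action pair */
    int  (*match)(const struct rmt_ctx *);
    void (*action)(const struct rmt_ctx *);
};

/* The datapath walks the table at each decision point; the verifier
 * would bound this walk's cost before the table is installed. */
static void rmt_dispatch(const struct rmt_entry *tbl, size_t n,
                         const struct rmt_ctx *ctx)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].match(ctx))
            tbl[i].action(ctx);
}

#define HOOK_PAGE_FAULT 1       /* made-up hook id */

static int match_page_fault(const struct rmt_ctx *ctx)
{
    return ctx->hook == HOOK_PAGE_FAULT;
}

static void action_learned_prefetch(const struct rmt_ctx *ctx)
{
    /* In the real design this would call into the kernel ML library
     * (e.g., run a prefetch model); here we just log the decision. */
    printf("pid %u faulted at %#llx -> consult learned prefetcher\n",
           ctx->pid, (unsigned long long)ctx->addr);
}

int main(void)
{
    struct rmt_entry table[] = {
        { match_page_fault, action_learned_prefetch },
    };
    struct rmt_ctx ctx = { .hook = HOOK_PAGE_FAULT, .pid = 42,
                           .addr = 0x7f1200, .cpu_load = 3 };
    rmt_dispatch(table, sizeof(table) / sizeof(table[0]), &ctx);
    return 0;
}
```

Keeping matches and actions behind table entries is what makes the datapath reconfigurable at run time: installing new entries changes kernel behavior without recompiling the datapath, and the verifier can reason about the table before it is installed.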

Challenges

  • Challenge #1: Infrastructure for in-kernel ML

    • Applications in user space can easily build ML models; however, deploying ML to drive kernel decisions with run-time reconfigurability requires an infrastructure for:

      • Data monitoring and collection

      • Kernel ML deployment

      • Model management

    • Even with eBPF, the existing kernel does not support such functionality

    • Thus, we need to build our own in-kernel RMT infrastructure to influence critical kernel decisions

  • Proposal #1: in-kernel RMT virtual machine

    • Achieve dynamic reconfigurability

    • Steps

      • RMT program: initializes and inserts the match-action table

      • The program is verified, compiled into bytecode, and then further compiled into machine code

      • Main block: match-action table

        • Organized as a key-value (K-V) map (see the sketch after this list)

        • Defines key decision points in the kernel datapath

        • Match: the entry's key is checked against the execution context (e.g., CPU load, process ID) when the kernel reaches a decision point

        • Action: collect data, train the ML model, or use the model in place of the heuristic
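
As a rough illustration of the K-V organization, the sketch below keys a tiny hash table on a (decision point, process) tuple and maps it to an action. The linear-probing map and all names are illustrative assumptions, not the system's actual data structure.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

enum rmt_action { ACT_COLLECT, ACT_TRAIN, ACT_INFER };

struct rmt_key {
    uint32_t hook;   /* which datapath decision point */
    uint32_t pid;    /* which process (a real RMT may wildcard this) */
};

struct rmt_kv {
    struct rmt_key key;
    enum rmt_action act;
    int used;
};

#define TBL_SZ 8
static struct rmt_kv tbl[TBL_SZ];

static unsigned hash(struct rmt_key k)
{
    return (k.hook * 2654435761u ^ k.pid) % TBL_SZ;
}

static void rmt_insert(struct rmt_key k, enum rmt_action a)
{
    unsigned i = hash(k);
    while (tbl[i].used)                   /* linear probing */
        i = (i + 1) % TBL_SZ;
    tbl[i] = (struct rmt_kv){ k, a, 1 };
}

static int rmt_lookup(struct rmt_key k, enum rmt_action *out)
{
    unsigned i = hash(k);
    for (unsigned n = 0; n < TBL_SZ; n++, i = (i + 1) % TBL_SZ) {
        if (tbl[i].used && !memcmp(&tbl[i].key, &k, sizeof k)) {
            *out = tbl[i].act;
            return 1;
        }
    }
    return 0;
}

int main(void)
{
    /* "When process 42 hits the page-fault hook, run the model." */
    rmt_insert((struct rmt_key){ .hook = 1, .pid = 42 }, ACT_INFER);

    enum rmt_action a;
    if (rmt_lookup((struct rmt_key){ .hook = 1, .pid = 42 }, &a))
        printf("match -> action %d (model replaces heuristic)\n", a);
    return 0;
}
```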

  • Challenge #2: Customizing ML for the kernel

    • Kernel operations are resource-constrained and time-sensitive, so we need to mitigate the overheads induced by ML

    • What techniques can be adopted to achieve efficient, high-accuracy ML training and inference inside the kernel with tolerable overhead?

  • Proposal #2: lightweight in-kernel ML

    • An ML library to speed up model construction and computation (see the fixed-point sketch after this proposal's bullets)

      • ML data structures: convolutional layers, MLP layers, etc.

      • Helper functions: matrix multiplication, backward propagation

    • ML training

      • Efficient background training

      • Offline training and updates

      • ...

    • Customized ML

      • Neural architecture search

      • Meta learning

      • ...

    • ML inference

      • Model compression

      • Model quantization

      • ...
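
Since kernel code typically avoids the FPU, one plausible building block for such a library is fixed-point arithmetic. The sketch below implements a Q16.16 MLP layer (a multiply-accumulate helper plus ReLU); the Q16.16 format, layer sizes, and weights are assumptions for illustration, not the library's actual design.

```c
#include <stdio.h>
#include <stdint.h>

#define FRAC_BITS 16
#define FP(x) ((int32_t)((x) * (1 << FRAC_BITS)))   /* float -> Q16.16 */

/* Q16.16 multiply: widen to 64 bits, then shift back down. */
static int32_t fp_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> FRAC_BITS);
}

/* out[j] = relu(sum_i in[i] * w[j][i] + b[j]) */
static void mlp_layer(const int32_t *in, int n_in,
                      const int32_t *w, const int32_t *b,
                      int32_t *out, int n_out)
{
    for (int j = 0; j < n_out; j++) {
        int32_t acc = b[j];
        for (int i = 0; i < n_in; i++)
            acc += fp_mul(in[i], w[j * n_in + i]);
        out[j] = acc > 0 ? acc : 0;                 /* ReLU */
    }
}

int main(void)
{
    /* Toy 3-input, 2-output layer with made-up weights. */
    int32_t in[3] = { FP(1.0), FP(0.5), FP(-0.25) };
    int32_t w[2 * 3] = { FP(0.2), FP(-0.4), FP(0.1),
                         FP(0.7), FP(0.3),  FP(0.9) };
    int32_t b[2] = { FP(0.05), FP(-0.1) };
    int32_t out[2];

    mlp_layer(in, 3, w, b, out, 2);
    for (int j = 0; j < 2; j++)
        printf("out[%d] = %f\n", j,
               out[j] / (double)(1 << FRAC_BITS));
    return 0;
}
```

This also hints at why the quantization bullet matters for inference: integer-only arithmetic keeps the per-decision cost low enough for a time-sensitive datapath.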

  • Challenge #3: guarantees of infrastructure behavior

    • We need to guarantee the integrity of the whole infrastructure

    • ML-based decisions may lead to undesirable states: e.g., resource imbalance

    • An incorrect model may cause a kernel panic: e.g., a defective ML model

    • Privacy violations: e.g., data leakage

  • Proposal #3: The RMT verifier

    • Performance interference: prevent RMT programs from driving the kernel into undesirable states

    • Model safety: verify the correctness of the model, or add guardrails around black-box inference (see the sketch below)

    • Privacy enhancement: implement privacy mechanisms to prevent malicious queries
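
As one example of a model-safety guardrail, the sketch below clamps a black-box prefetcher's output to a range the kernel can tolerate, falling back to a default heuristic on nonsense values. The bounds and fallback are illustrative assumptions, not the verifier's actual policy.

```c
#include <stdio.h>

#define PREFETCH_DEFAULT 8     /* fallback: readahead-like window */
#define PREFETCH_MAX     64    /* hard cap a verifier could enforce */

/* Guardrail around black-box inference: never trust the model's raw
 * output to size a kernel operation. */
static int guarded_prefetch_count(int model_out)
{
    if (model_out < 0 || model_out > 4 * PREFETCH_MAX)
        return PREFETCH_DEFAULT;   /* defective output: use heuristic */
    return model_out > PREFETCH_MAX ? PREFETCH_MAX : model_out;
}

int main(void)
{
    int samples[] = { 4, 200, -3 };
    for (int i = 0; i < 3; i++)
        printf("model said %d -> prefetch %d pages\n",
               samples[i], guarded_prefetch_count(samples[i]));
    return 0;
}
```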

Summary

  • Motivation: better, generalizable kernel optimizations

  • Proposal: reconfigurable kernel datapaths with learned optimizations

  • Key approach

    • The RMT virtual machine

    • Lightweight kernel ML

    • The RMT verifier

  • Preliminary results

    • Page prefetching, CPU scheduling
