A Vision for Runtime Programmable Networks

https://dl.acm.org/doi/10.1145/3484266.3487377

  • High velocity (fast, frequent change) is important for improving the manageability and functionality of networks

  • SOTA in network programming

    • With P4 and other programmable network frameworks, operators compile a new program and reflash the data plane with it

      • Much faster than buying new hardware, so changes can be made far more quickly

    • However, reflashing can cause downtime and packet loss

      • Want high up-time

    • Physical resources are often scarce

    • To avoid downtime, changes must be infrequent and operator driven

  • Benefits of velocity

    • Frequent upgrades to the data path

    • 90% of incidents happen during management operations

    • Networks in production use are not static

      • New needs inevitably arise! (e.g., security responses)

    • Changes need to be coordinated across switches, NICs, servers

      • Applied to only part of the traffic at a time

  • FlexNet: live changes in seconds

    • Runtime network (re)programming end-to-end

    • No downtime, zero packet loss, consistency guarantees

    • Users can inject customizations to the network

  • Why FlexNet?

    • Ex: real-time security defense

      • Network devices swap defenses

      • Hot patching zero-day attacks in the network

      • Dynamic scaling based on attack patterns

    • Just-in-time network specialization

      • Common mode: basic network logic, low footprint

      • JIT specialization based on workloads and applications

    • Tenant-specific extensions

      • Tenants directly customize network logic

    • Incremental infrastructure upgrades

      • Coordinated changes at the NIC, switch, and the host

      • E.g., deploy new congestion control (CC) protocols for a slice (for a particular group of cloud tenants)

  • Possible? Why now?

    • Individual targets are becoming runtime programmable

      • P4 and NPL programmable switch ASICs

      • SoC and FPGA SmartNICs

        • Runtime programmability --> partial reconfigurations

      • Host kernel stacks

        • Host kernel: eBPF programs can be loaded at runtime, so kernel packet processing can be reconfigured dynamically

    • Ex: Runtime programmable switches (NSDI '22)

      • Add, remove, and modify match / action tables

      • Change packet header parsers

      • Reconfigure control flow

  • Runtime programmable networks

  • Need a way to program the entire network, and new abstractions to map packet processing onto its individual components --> whole-network programming

  • Runtime reconfigurability: incremental changes that are minimally disruptive to each individual component

  • Control plane with real-time network control: manage the network as apps move from one place to another
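
To make the runtime-programmable switch operations listed above (adding, removing, and modifying match/action tables) more concrete, here is a minimal Python sketch. It is a toy model, not FlexNet's or any switch vendor's API: the names `Table`, `Pipeline`, `drop`, and `forward_to` are hypothetical, and it assumes that "no downtime" can be approximated by building a new immutable table list off to the side and swapping a single reference, so every packet sees either the old or the new pipeline in full.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Packet = Dict[str, object]
Action = Callable[[Packet], Packet]


def drop(pkt: Packet) -> Packet:
    """Hypothetical action: mark the packet as dropped."""
    return {**pkt, "dropped": True}


def forward_to(port: int) -> Action:
    """Hypothetical action factory: set the egress port."""
    return lambda pkt: {**pkt, "egress_port": port}


@dataclass(frozen=True)
class Table:
    """An immutable exact-match table: (match value, action) pairs."""
    name: str
    key_field: str
    entries: Tuple[Tuple[object, Action], ...] = ()

    def apply(self, pkt: Packet) -> Packet:
        for match_value, action in self.entries:
            if pkt.get(self.key_field) == match_value:
                return action(pkt)
        return pkt  # miss: pass the packet through unchanged


class Pipeline:
    """Toy data plane whose table list is replaced, never mutated in place."""

    def __init__(self, tables: Tuple[Table, ...] = ()) -> None:
        self._tables: Tuple[Table, ...] = tuple(tables)

    def process(self, pkt: Packet) -> Packet:
        tables = self._tables  # read the snapshot once per packet
        for table in tables:
            if pkt.get("dropped"):
                break
            pkt = table.apply(pkt)
        return pkt

    def add_table(self, table: Table, index: Optional[int] = None) -> None:
        tables = list(self._tables)
        tables.insert(len(tables) if index is None else index, table)
        self._tables = tuple(tables)  # single reference swap

    def remove_table(self, name: str) -> None:
        self._tables = tuple(t for t in self._tables if t.name != name)

    def modify_table(self, table: Table) -> None:
        # Replace the table with the same name; other tables are untouched.
        self._tables = tuple(
            table if t.name == table.name else t for t in self._tables
        )


# Usage: hot-insert an ACL in front of forwarding, then remove it again.
pipeline = Pipeline((Table("l2_fwd", "dst", (("10.0.0.2", forward_to(2)),)),))
pkt = {"src": "6.6.6.6", "dst": "10.0.0.2"}
before = pipeline.process(pkt)
pipeline.add_table(Table("acl", "src", (("6.6.6.6", drop),)), index=0)
during = pipeline.process(pkt)
pipeline.remove_table("acl")
after = pipeline.process(pkt)
```

The snapshot-and-swap design is only one way to get per-packet consistency in software; real targets would rely on their own hardware mechanisms for partial reconfiguration.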

Overall, FlexNet raises new challenges across the stack

  • FlexNet supports runtime deployment, migration, and scale-in, scale-out of applications

  • New abstractions --> hide the fact that program components can migrate across the network

  • Programming a "fungible" datapath

    • Compiler: analyze programs, and automatically generate the distribution and migration plans

    • The end-to-end network provides a fungible datapath

      • A logical "whole-stack" device

      • Hiding details on vertical / horizontal distribution

      • Whole-network program compiled to the devices

  • Incorporating real-time changes

    • Infrastructure program provided by the network owner

      • Basic utilities, e.g., ACL, telemetry

    • Tenant extensions injected in real time

      • User-specific upgrades, e.g., DDoS, refined telemetry

    • Research questions: programming, compiling, and optimizing these runtime changes are all interesting questions to look at

  • Managing elastic network apps

    • Network management system requires a re-design

    • Today's management and control solutions are box-centric and focus on individual devices, but FlexNet apps can migrate across the network (no fixed location)

    • Need solutions that are app-centric

    • New primitives are needed to name the apps, control the migration and replication without worrying about the device-level details

    • Supporting control operations with new data plane primitives

      • E.g. app / state replication
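
The app-centric management idea above (name the apps, then control placement and migration without touching device-level details) can be sketched roughly as follows. This is an illustrative assumption, not the paper's design: `Device`, `AppController`, the abstract integer `capacity` units, and the "most free capacity" placement rule are all hypothetical. Migration replicates app state to the destination before removing it from the source, loosely mirroring the app / state replication primitive mentioned in the notes.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class Device:
    name: str
    capacity: int  # abstract resource units (hypothetical)
    used: int = 0
    apps: Dict[str, dict] = field(default_factory=dict)  # app name -> state

    def free(self) -> int:
        return self.capacity - self.used


class AppController:
    """Operators address apps by name; devices are an internal detail."""

    def __init__(self, devices: Iterable[Device]) -> None:
        self.devices = {d.name: d for d in devices}
        self.location: Dict[str, str] = {}  # app name -> device name
        self.cost: Dict[str, int] = {}      # app name -> resource units

    def deploy(self, app: str, cost: int) -> str:
        # Assumed policy: place on the device with the most free capacity.
        fits = [d for d in self.devices.values() if d.free() >= cost]
        if not fits:
            raise RuntimeError(f"no device can host {app!r}")
        target = max(fits, key=lambda d: d.free())
        target.apps[app] = {}
        target.used += cost
        self.location[app] = target.name
        self.cost[app] = cost
        return target.name

    def state(self, app: str) -> dict:
        return self.devices[self.location[app]].apps[app]

    def migrate(self, app: str, to: str) -> None:
        src = self.devices[self.location[app]]
        dst = self.devices[to]
        if src is dst:
            return
        cost = self.cost[app]
        if dst.free() < cost:
            raise RuntimeError(f"{to!r} lacks capacity for {app!r}")
        # Make-before-break: replicate state, redirect, then free the source.
        dst.apps[app] = dict(src.apps[app])
        dst.used += cost
        self.location[app] = dst.name
        del src.apps[app]
        src.used -= cost


# Usage: deploy by name, update state, then migrate without losing it.
ctl = AppController([Device("nic0", capacity=4), Device("switch0", capacity=10)])
placed_on = ctl.deploy("ddos_filter", cost=3)
ctl.state("ddos_filter")["blocked"] = 17
ctl.migrate("ddos_filter", to="nic0")
```

A real system would also have to keep packets flowing consistently while the state is copied; this sketch ignores updates that arrive mid-migration.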

Summary

  • Non-programmable: vendors control network logic, changes happen in years

  • Compile-time programmable: operators control network logic, changes happen in weeks

  • Runtime programmable: users control network logic, changes happen in seconds

  • A new modality of network programmability
