IncBricks: Toward In-Network Computation with an In-Network Cache
Abstract
Emergence of programmable network devices + growing datacenter data traffic --> in-network computation
Offload compute operations to intermediate network devices
Serve network requests with low latency
Reduce datacenter traffic + reduce congestion
Save energy
Challenges:
No general compute capabilities
Commodity datacenter networks are complex
Key: in-network caching fabric with basic computing primitives
Intro
Goal: reduce traffic, lower communication latency, reduce communication overheads
SDN
programmable switches (application-specific header parsing, customized match-action rules, light-weight programmable forwarding plane)
network accelerators: low-power multicore processors and fast traffic managers
INC: offload a set of compute operations from end-servers onto programmable network devices (switches, network accelerators)
Challenges
Limited compute power and little storage for DC computation
Keeping computation and state coherent across networking elements is complex
INC requires a simple and general computing abstraction to be integrated with application logic
Propose: in-network caching fabric with basic computing primitives based on programmable network devices
IncBox: hybrid switch/network accelerator architecture, offload application-level operations
IncCache: in-network cache for KV store
System Architecture
Hierarchical topology
ToR: 10 Gbps
aggregation: 10-40 Gbps
core switches: 100 Gbps
Multiple paths in the core of the network by adding redundant switches
Traditional Ethernet switches
Packets: forwarded based on the forwarding database (FDB)
Data plane: process network packets at line rate
Ingress / Egress controller: match transmitted and received packets between their wire-level representation and a unified, structured internal format
Packet memory: buffer in-flight packets across all ingress ports
Switching module: makes packet forwarding decisions based on the forwarding database
Control plane: configure forwarding policies
low-power processor for adding and removing forwarding rules
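Rough C++ sketch of this control-plane/data-plane split; the FDB here is keyed by destination MAC and all names are illustrative (a real FDB lives in specialized switch memory and is looked up at line rate):
```cpp
#include <cstdint>
#include <unordered_map>

// Illustrative sketch only: shows the split between the control plane
// (rule installation) and the data plane (per-packet forwarding decision).
class EthernetSwitch {
public:
    // Control plane: a low-power processor adds/removes forwarding rules.
    void add_rule(uint64_t dst_mac, uint16_t egress_port) { fdb_[dst_mac] = egress_port; }
    void remove_rule(uint64_t dst_mac) { fdb_.erase(dst_mac); }

    // Data plane: the switching module consults the FDB per packet;
    // unknown destinations are flooded (represented here by kFlood).
    uint16_t forward(uint64_t dst_mac) const {
        auto it = fdb_.find(dst_mac);
        return it != fdb_.end() ? it->second : kFlood;
    }

    static constexpr uint16_t kFlood = 0xFFFF;

private:
    std::unordered_map<uint64_t, uint16_t> fdb_;  // dst MAC -> egress port
};
```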
Programmable switch and network accelerator
Programmable switches: reconfigurability in forwarding plane
Programmable parser, match memory, action engine
Packet formats customizable
Simple operations based on headers of incoming packets
Network accelerators
Traffic manager: fast DMA between TX/RX ports and internal memory
Packet scheduler: maintains incoming packet order and distributes packets to cores
Low-power multicore processor: payload modifications
Con: only a few interface ports, limiting processing bandwidth
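Rough sketch of the match-action abstraction that the programmable forwarding plane above exposes; the field names, table layout, and use of std::function are illustrative only (real switches implement this in hardware pipelines):
```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Illustrative match-action stage: the programmable parser extracts header
// fields, the match memory holds rules, and the action engine runs the
// action of the first matching rule.
struct ParsedHeaders {
    uint16_t ether_type;
    uint16_t dst_port;
    uint32_t app_magic;   // custom application-specific header field
};

struct MatchActionRule {
    std::function<bool(const ParsedHeaders&)> match;
    std::function<void(ParsedHeaders&)> action;
};

class MatchActionTable {
public:
    void install(MatchActionRule rule) { rules_.push_back(std::move(rule)); }

    // Apply the first matching rule, if any.
    void apply(ParsedHeaders& hdr) const {
        for (const auto& r : rules_) {
            if (r.match(hdr)) { r.action(hdr); return; }
        }
    }

private:
    std::vector<MatchActionRule> rules_;
};
```
An installed rule could, for instance, match on a custom application field and redirect the packet to the port facing the network accelerator.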
Combine the two hardware devices
IncBox: hardware unit of a network accelerator co-located with an Ethernet switch
INC packets: the switch forwards them to the network accelerator for computation
IncCache: distributed, coherent KV store with computing capabilities --> packet parsing, hashtable lookup, command execution, packet encapsulation
IncBox
Design Decisions
Must support three functions
F1: Parse in-transit network packets and extract some fields for the IncBricks logic
F2: Modify both header and payload and forward the packet based on the hash of the key
F3: Cache key/value data and potentially execute basic operations on the cached value
Should provide: (P1) high throughput and (P2) low latency
Programmable switches:
can only support simple operations (read, write, add, subtract, shift on counters)
packet buffer is on the order of a few tens of MB, mostly used for storing incoming packet traffic, leaving little space for caching
Can meet F1 and F2, but hard to satisfy F3, as well as P1/P2 for payload-related operations
Network accelerators satisfy the rest of the requirements
Traffic manager can serve packet data faster than kernel bypass techniques
Kernel bypass: eliminates the overheads of in-kernel network stacks by moving protocol processing to user space
E.g., dedicate the NIC to an application, or keep the NIC kernel-managed while allowing applications to map NIC queues into their address space
Multicore processors can easily saturate 40-100 Gbps of bandwidth
Support multiple GB of memory, which can be used for caching
Design
Switch:
Packet checking to filter in-network caching packets based on the application header
Match: forward to network accelerator
Otherwise: processed in the original processing pipeline
Hit check: checks whether the network accelerator has cached the key
Packet steering: forwards the packet to a specific port based on the hash value of the key
Network accelerator:
Performs application-layer computations and runs the IncCache system
Extracts the KV pair and the command from the packet payload
Conducts memory-related operations
Write
Read
Cache lookup: on a miss, stops and forwards the packet; on a hit, executes the command
After execution, rebuilds the packet and sends it back to the switch
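Simplified sketch of this IncBox flow; the command encoding, the `served` flag, and the miss-handling policy are assumptions for illustration (the switch and accelerator actually operate on raw packets):
```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

enum class Cmd : uint8_t { GET, SET, DEL };

struct IncPacket {
    Cmd cmd;
    uint64_t key_hash;     // hash carried in the application header
    std::string key;
    std::string value;     // payload for SET, filled in on a GET hit
    bool served = false;   // true if the IncBox answered the request
};

class Accelerator {
public:
    // Cache lookup + command execution; "rebuilding" the packet is modeled
    // here by mutating it in place before it goes back to the switch.
    void process(IncPacket& p) {
        switch (p.cmd) {
        case Cmd::SET:
            cache_[p.key] = p.value;          // write into the cache
            p.served = true;
            break;
        case Cmd::GET: {
            auto it = cache_.find(p.key);
            if (it == cache_.end()) return;   // miss: forward toward the server
            p.value = it->second;             // hit: execute and reply
            p.served = true;
            break;
        }
        case Cmd::DEL:
            cache_.erase(p.key);
            p.served = true;
            break;
        }
    }

private:
    std::unordered_map<std::string, std::string> cache_;
};

// Switch-side steering: an INC packet is forwarded to the accelerator-facing
// port chosen by the hash of the key; other traffic keeps its normal pipeline.
uint16_t steer(const IncPacket& p, uint16_t num_accel_ports) {
    return static_cast<uint16_t>(p.key_hash % num_accel_ports);
}
```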
IncCache
Able to
Cache data on both IncBox units and end-servers
Keep the cache coherent using a directory-based cache coherence protocol
Handle scenarios related to multipath routing and failures
Provide basic compute primitives
Packet format: ID, magic field, command, hash, application payload
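A possible header layout matching these fields; the field widths and ordering are assumptions:
```cpp
#include <cstdint>

// Sketch of the IncCache application header carried in the packet. The paper
// lists the fields (ID, magic, command, hash) followed by the application
// payload; the exact sizes below are assumed for illustration.
#pragma pack(push, 1)
struct IncCacheHeader {
    uint32_t request_id;   // ID used to match requests and responses
    uint32_t magic;        // marks the packet as an in-network caching packet
    uint16_t command;      // e.g., GET / SET / DEL or a compute primitive
    uint64_t key_hash;     // hash of the key, used for steering and lookup
    // variable-length application payload (key and/or value) follows
};
#pragma pack(pop)
```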
Hash table based data cache
On both network accelerators and endhost servers
network accelerator: fixed-size, lock-free hash table
endhost servers: extensible, lock-free hash table
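Minimal sketch of a fixed-size cache table on the accelerator side; the real table is lock-free, whereas this single-threaded version only illustrates the fixed-capacity, hash-indexed layout (bucket count and the overwrite-on-collision policy are assumptions):
```cpp
#include <cstdint>
#include <string>
#include <vector>

class FixedSizeCache {
public:
    FixedSizeCache() : slots_(kSlots) {}

    bool get(uint64_t key_hash, const std::string& key, std::string& out) const {
        const Entry& e = slots_[key_hash % kSlots];
        if (!e.valid || e.key != key) return false;
        out = e.value;
        return true;
    }

    void set(uint64_t key_hash, const std::string& key, const std::string& value) {
        Entry& e = slots_[key_hash % kSlots];   // collisions simply overwrite
        e.key = key;
        e.value = value;
        e.valid = true;
    }

    void del(uint64_t key_hash, const std::string& key) {
        Entry& e = slots_[key_hash % kSlots];
        if (e.valid && e.key == key) e.valid = false;
    }

private:
    struct Entry {
        bool valid = false;
        std::string key;
        std::string value;
    };
    static constexpr size_t kSlots = 1 << 16;   // fixed capacity
    std::vector<Entry> slots_;
};
```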
Cache coherence protocol: keep data consistent without incurring high overhead
Hierarchical directory-based cache coherence protocol
Take advantage of the structured network topology by using a hierarchical distributed directory mechanism
Decouple system interface and program interface to provide flexible programmability
Support sequential consistency for high-performance SET/GET/DEL requests
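Highly simplified sketch of directory-based invalidation on writes; the state encoding and message handling below are assumptions for illustration, not the paper's exact hierarchical protocol:
```cpp
#include <cstdint>
#include <functional>
#include <set>
#include <unordered_map>

// For each key hash, the directory records which IncBox units hold a copy,
// and a write invalidates the other copies before the new value is installed,
// so later reads cannot return stale data. Multipath and failure handling
// from the real protocol are not modeled here.
class Directory {
public:
    using Invalidate = std::function<void(uint32_t incbox_id, uint64_t key_hash)>;

    explicit Directory(Invalidate cb) : invalidate_(std::move(cb)) {}

    // Record that an IncBox cached this key (e.g., after a GET filled it).
    void add_sharer(uint64_t key_hash, uint32_t incbox_id) {
        sharers_[key_hash].insert(incbox_id);
    }

    // On a SET/DEL issued via `writer`, invalidate every other cached copy.
    void on_write(uint64_t key_hash, uint32_t writer) {
        auto it = sharers_.find(key_hash);
        if (it == sharers_.end()) return;
        for (uint32_t id : it->second) {
            if (id != writer) invalidate_(id, key_hash);
        }
        it->second = {writer};   // the writer keeps the only valid copy
    }

private:
    Invalidate invalidate_;
    std::unordered_map<uint64_t, std::set<uint32_t>> sharers_;
};
```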