Xenic: SmartNIC-accelerated distributed transactions

https://dl.acm.org/doi/abs/10.1145/3477132.3483555

Presentation

  • Distributed transactions in the datacenter

    • Our target: distributed ACID transactions on a replicated, in-memory database

    • Common approach: optimistic concurrency control (OCC) + replication (see the commit sketch at the end of this section)

    • Viability depends on efficient remote operations --> hardware acceleration
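
A minimal sketch of the commit-time work an OCC + replication design performs: lock the write set, validate the read set against the versions observed during execution, then install the new values. All structure and helper names are illustrative, not Xenic's actual code, and replication of the write set is elided.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative read-set entry: the version observed during execution. */
struct read_entry {
    uint64_t key;
    uint64_t observed_version;
};

/* Hypothetical remote primitives; in practice these are RDMA verbs,
 * RPCs, or SmartNIC-mediated operations. */
bool     remote_lock(uint64_t key);
void     remote_unlock(uint64_t key);
uint64_t remote_version(uint64_t key);
void     remote_install(uint64_t key, const void *val);  /* bumps version, unlocks */

bool occ_commit(const struct read_entry *reads, int nr,
                const uint64_t *wkeys, const void **wvals, int nw)
{
    int locked = 0;

    /* Phase 1: lock the write set. */
    for (; locked < nw; locked++)
        if (!remote_lock(wkeys[locked]))
            goto abort;                         /* lock conflict */

    /* Phase 2: validate the read set. */
    for (int i = 0; i < nr; i++)
        if (remote_version(reads[i].key) != reads[i].observed_version)
            goto abort;                         /* a concurrent writer intervened */

    /* Phase 3: install new values (and replicate, elided here). */
    for (int i = 0; i < nw; i++)
        remote_install(wkeys[i], wvals[i]);
    return true;

abort:
    while (locked-- > 0)
        remote_unlock(wkeys[locked]);
    return false;
}
```

Every step above is a remote operation, which is why the rest of the talk is about how cheaply those operations can be executed.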

  • Recent work applies RDMA

    • One-sided read/write primitives are high-performance, but restrict design (see the ibverbs sketch at the end of this section)

      • Impact data structure and protocol overheads

    • Two-sided RPCs are costly

      • Add latency overhead, processing costs

    • FaRM: one-sided RDMA

    • FaSST: two-sided RPCs

    • DrTM+H: uses both

    • Ongoing debate over how best to apply RDMA: trade-offs are unavoidable
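
For context, a hedged ibverbs sketch of the one-sided primitive these systems build on: a single RDMA READ pulls remote memory without involving the remote CPU. The queue pair, local memory registration, and the remote address/rkey are assumed to have been exchanged at setup; only posting the work request is shown, and the completion is reaped from the completion queue later.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Post a one-sided RDMA READ: copy `len` bytes from the remote address
 * into a locally registered buffer, bypassing the remote CPU. */
int post_rdma_read(struct ibv_qp *qp,
                   void *local_buf, uint32_t lkey, uint32_t len,
                   uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,
        .length = len,
        .lkey   = lkey,
    };
    struct ibv_send_wr wr, *bad_wr = NULL;

    memset(&wr, 0, sizeof(wr));
    wr.wr_id               = 1;                 /* matched on completion */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.opcode              = IBV_WR_RDMA_READ;
    wr.send_flags          = IBV_SEND_SIGNALED; /* generate a CQE */
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    return ibv_post_send(qp, &wr, &bad_wr);     /* 0 on success */
}
```

The restriction the talk alludes to: the reader gets only raw remote memory, so index traversal and validation may take several such reads, which is where the data structure and protocol overheads come from.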

  • On-path SmartNICs: another option for hardware acceleration

    • Programmable remote operations, without host processing

    • Cost-effective compute: ~30% of NIC die area, 25W line-rate processing

  • SmartNIC opportunities

    • Flexible CPU-bypass remote operations

    • Latency savings via stateful NIC operations, efficient PCIe DMA

    • Efficient NIC-to-NIC communication

    • But

      • Software packet pipeline --> latency overhead

      • Limited NIC resources

  • Xenic

    • Distributed transactions accelerated with on-path SmartNICs

    • Key ideas

      • Co-designed data store, spread across NIC + host DRAM

        • Minimize lookup overhead, utilizing NIC's on-board memory

      • SmartNIC function shipping

        • Offload transaction logic to avoid PCIe crossings

      • Multi-hop OCC protocols

        • Reduce communication with optimized message patterns

      • Stateful, asynchronous SmartNIC operation framework

        • Exploit the SmartNIC's hardware interfaces
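
The talk does not spell out the framework's API; the following is only a sketch of the pattern it implies: each NIC core keeps per-operation state and advances operations event by event (DMA completion, message arrival) instead of blocking, so many transactions stay in flight per core. All names are illustrative assumptions.

```c
#include <stdint.h>

/* Illustrative per-operation state machine on a SmartNIC core. */
enum op_phase { OP_LOOKUP, OP_WAIT_DMA, OP_REPLY, OP_DONE };

struct nic_op {
    enum op_phase phase;
    uint64_t      key;
    uint64_t      host_addr;   /* resolved from a NIC-resident lookup hint */
    void         *dma_buf;     /* NIC-local buffer for the DMA read */
    uint32_t      len;
};

/* Hypothetical asynchronous primitives exposed by the NIC runtime. */
int  dma_read_async(uint64_t host_addr, void *nic_buf, uint32_t len,
                    struct nic_op *ctx);          /* completes later */
int  dma_poll_done(struct nic_op *ctx);           /* nonzero when done */
void send_reply(struct nic_op *ctx);

/* Advance one operation by one step; a scheduler round-robins over many
 * such operations so a stalled DMA never idles the core. */
void nic_op_step(struct nic_op *op)
{
    switch (op->phase) {
    case OP_LOOKUP:
        dma_read_async(op->host_addr, op->dma_buf, op->len, op);
        op->phase = OP_WAIT_DMA;
        break;
    case OP_WAIT_DMA:
        if (dma_poll_done(op))
            op->phase = OP_REPLY;
        break;
    case OP_REPLY:
        send_reply(op);
        op->phase = OP_DONE;
        break;
    case OP_DONE:
        break;
    }
}
```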

  • Xenic: Robinhood Data Store

    • Host DRAM contains all objects; SmartNIC caches objects and lookup hints

    • Critical path accesses: NIC memory hit or DMA read, DMA log write

      • Lookup hints limit DMA cost for cache misses

        • Cache miss: bounded DMA read

        • Cache hit: NIC DRAM

      • OCC + pinning ensure NIC/host consistency
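
A hedged sketch of the critical-path read described above: hit in NIC memory if the object is cached, otherwise follow the lookup hint with a single, bounded DMA read from host DRAM. Structure and function names are assumptions, not the paper's code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* NIC-resident entry: either the object itself or a hint giving its
 * host DRAM address, so a miss costs one bounded DMA read. */
struct nic_entry {
    uint64_t key;
    bool     cached;        /* object bytes resident in NIC DRAM */
    void    *nic_copy;      /* valid when cached */
    uint64_t host_addr;     /* lookup hint: object's location in host DRAM */
    uint32_t len;
};

/* Hypothetical primitives. */
struct nic_entry *nic_index_lookup(uint64_t key);
void dma_read(uint64_t host_addr, void *nic_buf, uint32_t len);  /* blocking for brevity */

/* Read an object on the SmartNIC's critical path. */
void *store_read(uint64_t key, void *scratch)
{
    struct nic_entry *e = nic_index_lookup(key);
    if (e == NULL)
        return NULL;                /* not tracked: host fallback elided */

    if (e->cached)
        return e->nic_copy;         /* cache hit: served from NIC DRAM */

    /* Cache miss: the hint bounds the cost to a single DMA read,
     * rather than a pointer chase through host memory. */
    dma_read(e->host_addr, scratch, e->len);
    return scratch;
}
```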

  • Xenic: SmartNIC function shipping

    • Provides SmartNIC cores as a function shipping target

    • Shipping execution can reduce overhead, depending on application-level computation and state requirements

    • Saves coordinator PCIe crossings
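
A sketch of what shipping execution to the SmartNIC might look like: the host posts one compact request naming a pre-registered transaction handler plus its arguments, and the NIC runs the read/validate/commit logic itself instead of crossing PCIe once per operation. The request format and handler table are illustrative assumptions.

```c
#include <stdint.h>

/* Compact request the host posts to its local SmartNIC. */
struct ship_req {
    uint32_t handler_id;     /* index into the NIC's handler table */
    uint32_t nargs;
    uint64_t args[8];        /* e.g. keys and small values */
};

/* NIC side: transaction handlers compiled into the NIC runtime. */
typedef int (*txn_handler_t)(const uint64_t *args, uint32_t nargs);
extern txn_handler_t handler_table[];

/* Hypothetical NIC dispatch: the handler performs its accesses against
 * NIC memory and host DRAM (via DMA) and its remote accesses via
 * NIC-to-NIC messages, without bouncing back to the host CPU. */
int nic_dispatch(const struct ship_req *req)
{
    return handler_table[req->handler_id](req->args, req->nargs);
}
```

Whether shipping wins depends on how much application-level computation and state the handler needs on the NIC, per the trade-off noted above.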

  • Xenic: multi-hop OCC protocols

    • Ships execution to remote SmartNICs

    • Multi-hop NIC-to-NIC communication increases network efficiency
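
The point here is the message pattern rather than an API: instead of the coordinator issuing one round trip per replica, a commit-phase message can be forwarded NIC-to-NIC along the primary and its backups, with only the last hop replying. The sketch below is illustrative, not the paper's protocol code.

```c
#include <stdint.h>

#define MAX_HOPS 4

/* A chained message carries its remaining route, so each SmartNIC can
 * forward it on directly. */
struct chain_msg {
    uint32_t hop;                 /* index of the current hop */
    uint32_t nhops;
    uint16_t route[MAX_HOPS];     /* node ids: primary, then backups */
    uint64_t txn_id;
    /* ... write set / log payload elided ... */
};

/* Hypothetical NIC-to-NIC primitives. */
void nic_send(uint16_t node, struct chain_msg *m);
void apply_locally(const struct chain_msg *m);
void ack_coordinator(const struct chain_msg *m);

/* On each SmartNIC along the chain: apply, then either forward to the
 * next hop or acknowledge the coordinator from the final hop. One
 * chained traversal replaces per-replica round trips. */
void on_chain_msg(struct chain_msg *m)
{
    apply_locally(m);
    if (++m->hop < m->nhops)
        nic_send(m->route[m->hop], m);
    else
        ack_coordinator(m);
}
```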

  • Evaluation

    • Robinhood + NIC lookup hints effectively reduce lookup cost

    • SmartNIC increases DMA lookup efficiency, even for cache misses

    • Lower end-to-end bandwidth/latency cost than FaRM and DrTM+H

    • Better latency & throughput than RPC, RDMA, hybrid designs

    • Also measured in our paper

      • Cumulative core savings

      • Full TPC-C, Retwis, SmallBank

  • Summary: high-performance, CPU-efficient distributed transactions

    • Leveraging on-path SmartNICs:

      • Avoids RDMA compromises

      • Provides a new, remote access-optimized data store

      • Selectively offloads transaction logic

      • Applies multi-hop communication patterns

      • Delivers >2x throughput over 100Gbps RDMA, latency savings relative to RPCs
