Xenic: SmartNIC-accelerated distributed transactions

https://dl.acm.org/doi/abs/10.1145/3477132.3483555

Presentation

  • Distributed transactions in the datacenter

    • Our target: distributed ACID transactions on a replicated, in-memory database

    • Common approach: optimistic concurrency control (OCC) + replication (see the commit sketch at the end of this section)

    • Viability depends on efficient remote operations --> hardware acceleration
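
A minimal sketch of the commit-time work an OCC + replication design performs: lock the write set, validate the read set against the versions observed during execution, then install the new values. All structure and helper names are illustrative, not Xenic's actual code, and replication of the write set is elided.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative read-set entry: the version observed during execution. */
struct read_entry {
    uint64_t key;
    uint64_t observed_version;
};

/* Hypothetical remote primitives; in practice these are RDMA verbs,
 * RPCs, or SmartNIC-mediated operations. */
bool     remote_lock(uint64_t key);
void     remote_unlock(uint64_t key);
uint64_t remote_version(uint64_t key);
void     remote_install(uint64_t key, const void *val);  /* bumps version, unlocks */

bool occ_commit(const struct read_entry *reads, int nr,
                const uint64_t *wkeys, const void **wvals, int nw)
{
    int locked = 0;

    /* Phase 1: lock the write set. */
    for (; locked < nw; locked++)
        if (!remote_lock(wkeys[locked]))
            goto abort;                         /* lock conflict */

    /* Phase 2: validate the read set. */
    for (int i = 0; i < nr; i++)
        if (remote_version(reads[i].key) != reads[i].observed_version)
            goto abort;                         /* a concurrent writer intervened */

    /* Phase 3: install new values (and replicate, elided here). */
    for (int i = 0; i < nw; i++)
        remote_install(wkeys[i], wvals[i]);
    return true;

abort:
    while (locked-- > 0)
        remote_unlock(wkeys[locked]);
    return false;
}
```

Every step above is a remote operation, which is why the rest of the talk is about how cheaply those operations can be executed.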

  • Recent work applies RDMA

    • One-sided read/write primitives are high-performance, but restrict design (see the ibverbs sketch at the end of this section)

      • Impact data structure and protocol overheads

    • Two-sided RPCs are costly

      • Add latency overhead, processing costs

    • FaRM: one-sided RDMA

    • FaSST: two-sided RPCs

    • DrTM+H: uses both

    • Ongoing debate over how best to apply RDMA: trade-offs are unavoidable
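
For context, a hedged ibverbs sketch of the one-sided primitive these systems build on: a single RDMA READ pulls remote memory without involving the remote CPU. The queue pair, local memory registration, and the remote address/rkey are assumed to have been exchanged at setup; only posting the work request is shown, and the completion is reaped from the completion queue later.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Post a one-sided RDMA READ: copy `len` bytes from the remote address
 * into a locally registered buffer, bypassing the remote CPU. */
int post_rdma_read(struct ibv_qp *qp,
                   void *local_buf, uint32_t lkey, uint32_t len,
                   uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,
        .length = len,
        .lkey   = lkey,
    };
    struct ibv_send_wr wr, *bad_wr = NULL;

    memset(&wr, 0, sizeof(wr));
    wr.wr_id               = 1;                 /* matched on completion */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.opcode              = IBV_WR_RDMA_READ;
    wr.send_flags          = IBV_SEND_SIGNALED; /* generate a CQE */
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    return ibv_post_send(qp, &wr, &bad_wr);     /* 0 on success */
}
```

The restriction the talk alludes to: the reader gets only raw remote memory, so index traversal and validation may take several such reads, which is where the data structure and protocol overheads come from.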

  • On-path SmartNICs: another option for hardware acceleration

    • Programmable remote operations, without host processing

    • Cost-effective compute: ~30% of NIC die area, 25W line-rate processing

  • SmartNIC opportunities

    • Flexible CPU-bypass remote operations

    • Latency savings via stateful NIC operations, efficient PCIe DMA

    • Efficient NIC-to-NIC communication

    • But

      • Software packet pipeline --> latency overhead

      • Limited NIC resources

  • Xenic

    • Distributed transactions accelerated with on-path SmartNICs

    • Key ideas

      • Co-designed data store, spread across NIC + host DRAM

        • Minimize lookup overhead, utilizing NIC's on-board memory

      • SmartNIC function shipping

        • Offload transaction logic to avoid PCIe crossings

      • Multi-hop OCC protocols

        • Reduce communication with optimized message patterns

      • Stateful, asynchronous SmartNIC operation framework

        • Exploit the SmartNIC's hardware interfaces
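
The talk does not spell out the framework's API; the following is only a sketch of the pattern it implies: each NIC core keeps per-operation state and advances operations event by event (DMA completion, message arrival) instead of blocking, so many transactions stay in flight per core. All names are illustrative assumptions.

```c
#include <stdint.h>

/* Illustrative per-operation state machine on a SmartNIC core. */
enum op_phase { OP_LOOKUP, OP_WAIT_DMA, OP_REPLY, OP_DONE };

struct nic_op {
    enum op_phase phase;
    uint64_t      key;
    uint64_t      host_addr;   /* resolved from a NIC-resident lookup hint */
    void         *dma_buf;     /* NIC-local buffer for the DMA read */
    uint32_t      len;
};

/* Hypothetical asynchronous primitives exposed by the NIC runtime. */
int  dma_read_async(uint64_t host_addr, void *nic_buf, uint32_t len,
                    struct nic_op *ctx);          /* completes later */
int  dma_poll_done(struct nic_op *ctx);           /* nonzero when done */
void send_reply(struct nic_op *ctx);

/* Advance one operation by one step; a scheduler round-robins over many
 * such operations so a stalled DMA never idles the core. */
void nic_op_step(struct nic_op *op)
{
    switch (op->phase) {
    case OP_LOOKUP:
        dma_read_async(op->host_addr, op->dma_buf, op->len, op);
        op->phase = OP_WAIT_DMA;
        break;
    case OP_WAIT_DMA:
        if (dma_poll_done(op))
            op->phase = OP_REPLY;
        break;
    case OP_REPLY:
        send_reply(op);
        op->phase = OP_DONE;
        break;
    case OP_DONE:
        break;
    }
}
```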

  • Xenic: Robinhood Data Store

    • Host DRAM contains all objects; SmartNIC caches objects and lookup hints

    • Critical path accesses: NIC memory hit or DMA read, DMA log write

      • Lookup hints limit DMA cost for cache misses

        • Cache miss: bounded DMA read

        • Cache hit: NIC DRAM

      • OCC + pinning ensure NIC/host consistency
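
A hedged sketch of the critical-path read described above: hit in NIC memory if the object is cached, otherwise follow the lookup hint with a single, bounded DMA read from host DRAM. Structure and function names are assumptions, not the paper's code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* NIC-resident entry: either the object itself or a hint giving its
 * host DRAM address, so a miss costs one bounded DMA read. */
struct nic_entry {
    uint64_t key;
    bool     cached;        /* object bytes resident in NIC DRAM */
    void    *nic_copy;      /* valid when cached */
    uint64_t host_addr;     /* lookup hint: object's location in host DRAM */
    uint32_t len;
};

/* Hypothetical primitives. */
struct nic_entry *nic_index_lookup(uint64_t key);
void dma_read(uint64_t host_addr, void *nic_buf, uint32_t len);  /* blocking for brevity */

/* Read an object on the SmartNIC's critical path. */
void *store_read(uint64_t key, void *scratch)
{
    struct nic_entry *e = nic_index_lookup(key);
    if (e == NULL)
        return NULL;                /* not tracked: host fallback elided */

    if (e->cached)
        return e->nic_copy;         /* cache hit: served from NIC DRAM */

    /* Cache miss: the hint bounds the cost to a single DMA read,
     * rather than a pointer chase through host memory. */
    dma_read(e->host_addr, scratch, e->len);
    return scratch;
}
```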

  • Xenic: SmartNIC function shipping

    • Provides SmartNIC cores as a function shipping target

    • Shipping execution can reduce overhead, depending on application-level computation and state requirements

    • Saves coordinator PCIe crossings
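
A sketch of what shipping execution to the SmartNIC might look like: the host posts one compact request naming a pre-registered transaction handler plus its arguments, and the NIC runs the read/validate/commit logic itself instead of crossing PCIe once per operation. The request format and handler table are illustrative assumptions.

```c
#include <stdint.h>

/* Compact request the host posts to its local SmartNIC. */
struct ship_req {
    uint32_t handler_id;     /* index into the NIC's handler table */
    uint32_t nargs;
    uint64_t args[8];        /* e.g. keys and small values */
};

/* NIC side: transaction handlers compiled into the NIC runtime. */
typedef int (*txn_handler_t)(const uint64_t *args, uint32_t nargs);
extern txn_handler_t handler_table[];

/* Hypothetical NIC dispatch: the handler performs its accesses against
 * NIC memory and host DRAM (via DMA) and its remote accesses via
 * NIC-to-NIC messages, without bouncing back to the host CPU. */
int nic_dispatch(const struct ship_req *req)
{
    return handler_table[req->handler_id](req->args, req->nargs);
}
```

Whether shipping wins depends on how much application-level computation and state the handler needs on the NIC, per the trade-off noted above.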

  • Xenic: multi-hop OCC protocols

    • Ships execution to remote SmartNICs

    • Multi-hop NIC-to-NIC communication increases network efficiency
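
The point here is the message pattern rather than an API: instead of the coordinator issuing one round trip per replica, a commit-phase message can be forwarded NIC-to-NIC along the primary and its backups, with only the last hop replying. The sketch below is illustrative, not the paper's protocol code.

```c
#include <stdint.h>

#define MAX_HOPS 4

/* A chained message carries its remaining route, so each SmartNIC can
 * forward it on directly. */
struct chain_msg {
    uint32_t hop;                 /* index of the current hop */
    uint32_t nhops;
    uint16_t route[MAX_HOPS];     /* node ids: primary, then backups */
    uint64_t txn_id;
    /* ... write set / log payload elided ... */
};

/* Hypothetical NIC-to-NIC primitives. */
void nic_send(uint16_t node, struct chain_msg *m);
void apply_locally(const struct chain_msg *m);
void ack_coordinator(const struct chain_msg *m);

/* On each SmartNIC along the chain: apply, then either forward to the
 * next hop or acknowledge the coordinator from the final hop. One
 * chained traversal replaces per-replica round trips. */
void on_chain_msg(struct chain_msg *m)
{
    apply_locally(m);
    if (++m->hop < m->nhops)
        nic_send(m->route[m->hop], m);
    else
        ack_coordinator(m);
}
```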

  • Evaluation

    • Robinhood + NIC lookup hints effectively reduce lookup cost

    • SmartNIC increases DMA lookup efficiency, even for cache misses

    • Lower end-to-end bandwidth/latency cost than FaRM and DrTM+H

    • Better latency & throughput than RPC, RDMA, hybrid designs

    • Also measured in our paper

      • Cumulative core savings

      • Full TPC-C, Retwis, SmallBank

  • Summary: high-performance, CPU-efficient distributed transactions

    • Leveraging on-path SmartNICs:

      • Avoids RDMA compromises

      • Provides a new, remote access-optimized data store

      • Selectively offloads transaction logic

      • Applies multi-hop communication patterns

      • Delivers >2x throughput over 100Gbps RDMA, latency savings relative to RPCs
