Xenic: SmartNIC-accelerated distributed transactions
https://dl.acm.org/doi/abs/10.1145/3477132.3483555
Distributed transactions in the datacenter
Our target: distributed ACID transactions over a replicated, in-memory database
Common approach: optimistic concurrency control + replication
Viability depends on efficient remote operations --> hardware acceleration
Recent work applies RDMA
One-sided read/write primitives are high-performance, but restrict design
Affects data structure choices and protocol overheads
Two-sided RPCs are costly
Add latency overhead, processing costs
FaRM: one-sided RDMA
FaSST: two-sided RPCs
DrTM+H: uses both
Ongoing debate over how to apply RDMA: trade-offs are necessary
On-path SmartNICs: another option for hardware acceleration
Programmable remote operations, without host processing
Cost-effective compute: ~30% of NIC die area, 25W line-rate processing
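The optimistic concurrency control (OCC) approach named above can be illustrated with a minimal single-node sketch. This is not Xenic's protocol: replication, the network, and the NIC are all left out, and the names `Record` and `run_transaction` are hypothetical. It only shows the usual execute / lock / validate / commit phases that remote operations have to make fast.

```python
from dataclasses import dataclass


@dataclass
class Record:
    value: object
    version: int = 0
    locked: bool = False


def run_transaction(records, read_keys, compute_writes):
    """Try to commit one transaction; return True on commit, False on abort.

    `records` is a dict of key -> Record (a stand-in for the data store).
    `compute_writes` maps the values read to a dict of key -> new value.
    """
    # Execute: read values and remember the version seen for each key.
    read_set = {k: (records[k].value, records[k].version) for k in read_keys}
    writes = compute_writes({k: v for k, (v, _) in read_set.items()})

    locked = []
    try:
        # Lock the write set; abort if another transaction holds a lock.
        for k in sorted(writes):
            rec = records.setdefault(k, Record(None))
            if rec.locked:
                return False
            rec.locked = True
            locked.append(rec)

        # Validate: every key read must be unchanged, and must not be
        # locked by someone else.
        for k, (_, seen_version) in read_set.items():
            rec = records[k]
            if rec.version != seen_version:
                return False
            if rec.locked and k not in writes:
                return False

        # Commit: install the new values and bump versions.
        for k, v in writes.items():
            records[k].value = v
            records[k].version += 1
        return True
    finally:
        # Release only the locks this transaction acquired.
        for rec in locked:
            rec.locked = False
```

For example, `run_transaction(records, ["a", "b"], lambda v: {"a": v["a"] - 3, "b": v["b"] + 3})` moves 3 units from `a` to `b` if neither record changed between the read and validation steps. In a distributed setting each phase becomes one or more remote operations per participant, which is why their cost dominates.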
SmartNIC opportunities
Flexible CPU-bypass remote operations
Latency savings via stateful NIC operations, efficient PCIe DMA
Efficient NIC-to-NIC communication
But
Software packet pipeline --> latency overhead
Limited NIC resources
Xenic
Distributed transactions accelerated with on-path SmartNICs
Key ideas
Co-designed data store, spread across NIC + host DRAM
Minimize lookup overhead, utilizing NIC's on-board memory
SmartNIC function shipping
Offload transaction logic to avoid PCIe crossings
Multi-hop OCC protocols
Reduce communication with optimized message patterns
Stateful, asynchronous SmartNIC operation framework
Exploit the SmartNIC's hardware interfaces
Xenic: Robinhood Data Store
Host DRAM contains all objects; SmartNIC caches objects and lookup hints
Critical path accesses: NIC memory hit or DMA read, DMA log write
Lookup hints limit DMA cost for cache misses
Cache miss: bounded DMA read
Cache hit: NIC DRAM
OCC + pinning ensure NIC/host consistency
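To make the idea of a bounded lookup concrete, here is a minimal sketch of Robin Hood hashing with a per-bucket displacement hint. It is an illustration under assumptions, not Xenic's data structure: the class `RobinHoodTable`, the `hint` array, and `probe_bound` are hypothetical names, the table lives in one Python list rather than being split across host DRAM and NIC memory, and deletion and caching are omitted. The point is that if the hint for a key's home slot is known, a miss can be served by reading one contiguous window of known size.

```python
class RobinHoodTable:
    """Open-addressing table with linear probing and Robin Hood swaps."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity  # each slot: (key, value) or None
        self.hint = [0] * capacity      # per home slot: max displacement seen
        self.size = 0

    def _home(self, key):
        return hash(key) % self.capacity

    def _note(self, key, dist):
        # Record an upper bound on displacement for this key's home slot.
        home = self._home(key)
        self.hint[home] = max(self.hint[home], dist)

    def insert(self, key, value):
        if self.size >= self.capacity:
            raise RuntimeError("table full")
        entry, idx, dist = (key, value), self._home(key), 0
        while True:
            resident = self.slots[idx]
            if resident is None:
                self.slots[idx] = entry
                self._note(entry[0], dist)
                self.size += 1
                return
            if resident[0] == entry[0]:
                self.slots[idx] = entry  # overwrite an existing key
                return
            resident_dist = (idx - self._home(resident[0])) % self.capacity
            if resident_dist < dist:
                # Robin Hood rule: the entry further from home takes the
                # slot, and the displaced resident keeps probing.
                self.slots[idx] = entry
                self._note(entry[0], dist)
                entry, dist = resident, resident_dist
            idx = (idx + 1) % self.capacity
            dist += 1

    def probe_bound(self, key):
        """Number of contiguous slots a lookup for `key` may need to read."""
        return self.hint[self._home(key)] + 1

    def lookup(self, key):
        home = self._home(key)
        for d in range(self.probe_bound(key)):
            slot = self.slots[(home + d) % self.capacity]
            if slot is not None and slot[0] == key:
                return slot[1]
        return None
```

In this sketch `probe_bound(key)` plays the role of a lookup hint: a reader that holds only the hints could fetch exactly that many slots starting at the home slot in one read, instead of probing slot by slot.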
Xenic: SmartNIC function shipping
Provides SmartNIC cores as a function shipping target
Shipping execution can reduce overhead, depending on application-level computation and state requirements
Saves coordinator PCIe crossings
Xenic: multi-hop OCC protocols
Ships execution to remote SmartNICs
Multi-hop NIC-to-NIC communication increases network efficiency
Evaluation
Robinhood + NIC lookup hints effectively reduce cost
SmartNIC increases DMA lookup efficiency, even for cache misses
For FaRM and DrTM+H, end-to-end bandwidth/latency cost
Better latency & throughput than RPC, RDMA, hybrid designs
Also measured in our paper
Cumulative core savings
Full TPC-C, Retwis, Smallbank
Summary: high-performance, CPU-efficient distributed transactions
Leveraging on-path SmartNICs:
Avoids RDMA compromises
Provides a new, remote access-optimized data store
Selectively offloads transaction logic
Applies multi-hop communication patterns
Delivers >2x throughput over 100Gbps RDMA, latency savings relative to RPCs