# Dagger: Efficient and Fast RPCs in Cloud Microservices with Near-Memory Reconfigurable NICs

### Presentation

* Trends in cloud computing
  * Monoliths: tightly coupled application logic in a single statically or dynamically linked library
  * Shift towards microservices
    * Loosely coupled application logic split into many small, independent applications
  * Shift towards serverless
    * Fine application granularity
    * Fine lifetime granularity
* Cloud applications today are interactive
  * Frequent interaction with large sets of users
  * Strict performance requirements, expressed as SLOs
    * Low tail latency under high load
    * Performance predictability
* Focus: improving the communication stack in microservices
  * Communication happens over RPCs
  * RPC requests in microservices are small, and their sizes vary across tiers
  * Takeaways
    * Per-request communication overheads are large
    * Cannot tune communication stacks for small messages only
    * Need an adaptive stack
  * RPC stacks run on the same CPUs as highly concurrent applications
    * CPUs are already under high pressure from applications
    * Intensive traffic of small messages
* Dagger: a HW/SW co-designed end-host RPC stack
  * Design principles
    * **Hardware offload**
      * Existing techniques to improve the efficiency of cloud networking
        * Kernel bypass: IX, eRPC, mTCP, and many others
          * Removes per-packet kernel overheads and tightly couples networking stacks with applications, but still runs everything in SW
        * RDMA systems
          * Offload networking stacks to hardware
          * But:
            * Provide only low-level abstractions; the RPC layer still runs in SW
            * Require specialized adapters
      * Dagger: a hardware NIC that implements the end-host communication stack, from the L1 (PHY) layer all the way up to the application (RPC) layer
        * Completely frees the CPU from any work related to data exchange
    * **Reconfigurability**
      * Networking protocols, load balancers, threading, data representation, and data manipulation all need to be configurable; the HW should also be reconfigurable
      * Dagger is based on an FPGA!
        * Configurable transport: UDP, TCP, mTCP, Homa, Tonic
        * Configurable load balancer / flow controller: static, round-robin, random, application-specific
        * Configurable host-NIC interface: PCIe doorbells, PCIe MMIOs, coherency-based
        * Configurable threading model: connection/thread/queue/flow mapping, number of NIC flows / queues
    * **Tight coupling**
      * Dagger is based on a cache-coherent FPGA tightly coupled with the host CPU
        * Inspired by soNUMA and a series of RDMA studies
        * The FPGA acts as a NUMA node
          * No DMAs are required to exchange data between NUMA nodes
          * No explicit MMIO requests
          * Minimal software overhead
          * NUMA interconnects have lower latency than PCIe
      * Existing SmartNICs are based on PCIe, which introduces overheads
        * Doorbell scheme
          * Multiple PCIe roundtrips
          * Expensive and CPU-inefficient rings based on MMIOs
        * Existing optimizations (combined descriptors and packets, packet writes with MMIOs, doorbell batching, ...) reduce these overheads but fail to eliminate them
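
To make the doorbell overheads above concrete, here is a minimal counting sketch in Python. It is a toy model, not a measurement and not code from Dagger: the function name `doorbell_pcie_transactions` and the assumed per-request costs (one MMIO doorbell write by the CPU, then one DMA read for the descriptor and one for the payload by the NIC) are illustrative assumptions about a generic doorbell-based transmit path.

```python
import math


def doorbell_pcie_transactions(requests, doorbell_batch=1, combined_descriptor=False):
    """Count PCIe transactions on a doorbell-based transmit path (toy model).

    Assumed cost without optimizations (hypothetical, for illustration only):
    one MMIO doorbell write by the CPU per request, plus two DMA reads by the
    NIC per request (descriptor first, then payload).
    """
    if requests < 0 or doorbell_batch < 1:
        raise ValueError("requests must be >= 0 and doorbell_batch >= 1")
    # Doorbell batching: one MMIO write announces up to `doorbell_batch` requests.
    doorbells = math.ceil(requests / doorbell_batch)
    # Combined descriptors and packets: a single DMA read fetches both.
    dma_reads_per_request = 1 if combined_descriptor else 2
    return doorbells + requests * dma_reads_per_request


if __name__ == "__main__":
    n = 8
    print(doorbell_pcie_transactions(n))                       # plain doorbells
    print(doorbell_pcie_transactions(n, doorbell_batch=4))     # doorbell batching
    print(doorbell_pcie_transactions(n, doorbell_batch=4,
                                     combined_descriptor=True))  # both optimizations
```

Under these assumptions, every variant still issues at least one PCIe transaction per request, which is consistent with the note that the optimizations reduce the overheads without eliminating them.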

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MVORxAomcgtzVVUqmws%2Fuploads%2FpzGLLupdSiFcCZiXwMuw%2Fimage.png?alt=media\&token=bbc7b0be-0285-42bc-9711-1b32a5004f20)
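
The configurable load balancer options listed under Reconfigurability (static, round-robin, random, application-specific) can be pictured as interchangeable policies that map an incoming RPC to a NIC flow. The Python sketch below is only an illustration in software: Dagger implements this on the FPGA, and every name here (`static_policy`, `key_hash_policy`, the `conn_id` and `key` fields) is hypothetical. In particular, treating "static" as "fixed by connection ID" and "application-specific" as "hash of a request key" are assumptions, not the paper's definitions.

```python
import random
import zlib
from itertools import count
from typing import Callable

# A policy maps an RPC (modelled as a plain dict) to a flow index in [0, num_flows).
Policy = Callable[[dict], int]


def static_policy(num_flows: int) -> Policy:
    # Assumption: the flow is fixed by the connection the RPC arrived on.
    return lambda rpc: rpc["conn_id"] % num_flows


def round_robin_policy(num_flows: int) -> Policy:
    ticket = count()  # 0, 1, 2, ... one ticket per dispatched RPC
    return lambda rpc: next(ticket) % num_flows


def random_policy(num_flows: int, seed: int = 0) -> Policy:
    rng = random.Random(seed)  # private generator, seeded for reproducibility
    return lambda rpc: rng.randrange(num_flows)


def key_hash_policy(num_flows: int, field: str = "key") -> Policy:
    # Assumption: an application-specific policy steers by a request field,
    # so that requests for the same key always reach the same flow.
    return lambda rpc: zlib.crc32(str(rpc[field]).encode()) % num_flows


if __name__ == "__main__":
    rr = round_robin_policy(4)
    print([rr({}) for _ in range(5)])          # cycles through the four flows
    print(static_policy(4)({"conn_id": 5}))    # connection 5 maps to flow 1
```

Keeping each policy behind the same call signature mirrors the idea of swapping the load balancer without touching the rest of the stack.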
