LineFS: Efficient SmartNIC offload of a distributed file system with pipeline parallelism

Motivation

  • Growing DFS host resource consumption

    • DFS is popular in cloud and HPC --> consolidated with applications on the same hosts

    • Resource contention between DFS and co-running applications

  • Problem: performance interference

    • Host resource contention degrades the performance of both

      • 1) the DFS and 2) co-running applications

  • Solution: offload DFS to SmartNIC

    • To reduce interference

    • Challenges

      • C1: high access latency from the SmartNIC CPU to host PM

      • C2: wimpy SmartNIC architecture (limited processing power)

  • LineFS: SmartNIC offload of DFS with pipeline parallelism

    • Goal: minimizing host application interference and DFS slowdown

    • Design principles

      • D1: persist-and-publish

        • Persist data / metadata: latency-critical task (fast host CPU)

        • Background publish: deferrable task (wimpy SmartNIC CPU)

      • D2: pipeline parallelism

        • Increase throughput of offloaded tasks

          • D2-1: publishing pipeline

          • D2-2: replication pipeline

D1: persist-and-publish
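The persist-and-publish split can be sketched as follows. This is a minimal illustration, not LineFS's actual code: the class and field names are hypothetical, a Python list stands in for the host PM log, a dict stands in for published file-system state, and a background thread stands in for the SmartNIC publisher.

```python
import threading
import queue

class PersistAndPublish:
    """Illustrative sketch: the host persists updates on the critical
    path; a background worker (standing in for the SmartNIC) publishes
    them to shared state later."""

    def __init__(self):
        self.log = []                  # stands in for a host PM log
        self.shared_state = {}         # stands in for published FS state
        self.pending = queue.Queue()
        worker = threading.Thread(target=self._publish_loop, daemon=True)
        worker.start()

    def write(self, path, data):
        # Latency-critical: append to the durable log and return.
        entry = (path, data)
        self.log.append(entry)         # "persist" step (fast host CPU)
        self.pending.put(entry)        # hand off to background publisher

    def _publish_loop(self):
        # Deferrable: apply logged updates to shared state (wimpy CPU).
        while True:
            path, data = self.pending.get()
            self.shared_state[path] = data
            self.pending.task_done()

fs = PersistAndPublish()
fs.write("/a", b"hello")               # returns as soon as the log write is done
fs.pending.join()                      # wait for the background publish
assert fs.shared_state["/a"] == b"hello"
```

The point of the split is that the application-visible write latency depends only on the persist step; the publish work is off the critical path and can run on slower cores.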

D2-1: pipeline parallelism - publishing
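Pipeline parallelism in the publishing path can be sketched with staged workers connected by queues, so several log batches are in flight at once. The stage names (fetch, validate, apply) are illustrative stand-ins, not LineFS's actual pipeline stages; the key idea is that each stage runs on its own (wimpy) core and overlaps with the others.

```python
import threading
import queue

def stage(fn, inq, outq):
    # One pipeline stage: pull items from inq, process, push to outq.
    # A None sentinel shuts the stage down and is forwarded downstream.
    def loop():
        while True:
            item = inq.get()
            if item is None:
                if outq:
                    outq.put(None)
                break
            if outq:
                outq.put(fn(item))
            else:
                fn(item)
    t = threading.Thread(target=loop)
    t.start()
    return t

# Hypothetical three-stage publishing pipeline:
# fetch a log batch -> validate it -> apply it to published state.
published = []
q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    stage(lambda batch: batch, q1, q2),                   # fetch
    stage(lambda batch: batch, q2, q3),                   # validate
    stage(lambda batch: published.extend(batch), q3, None),  # apply
]

for batch in ([1, 2], [3, 4], [5]):
    q1.put(batch)
q1.put(None)                          # shut the pipeline down
for t in threads:
    t.join()
assert published == [1, 2, 3, 4, 5]
```

With a single worker, batch N+1 could not start until batch N finished every step; with the pipeline, each stage works on a different batch concurrently, raising throughput without needing any single fast core.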

D2-2: pipeline parallelism - replication

  • Replication: "persisting data to all the nodes on fsync"

  • Persist data in advance: use SmartNIC resources

  • Reduce fsync latency: most of the data has already been persisted
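The idea above can be sketched as eager background replication: writes are streamed to the replica as they arrive, so fsync only waits for the small un-replicated tail. This is an illustrative model with hypothetical names, where a Python list stands in for the replica's persisted log.

```python
import threading
import queue

class EagerReplicator:
    """Illustrative sketch: replicate writes in the background (using
    SmartNIC resources in the real system), so fsync() only drains the
    not-yet-replicated tail instead of shipping everything at once."""

    def __init__(self):
        self.replica = []              # stands in for the remote persisted log
        self.q = queue.Queue()
        threading.Thread(target=self._replicate_loop, daemon=True).start()

    def write(self, data):
        self.q.put(data)               # background replication starts immediately

    def _replicate_loop(self):
        while True:
            data = self.q.get()
            self.replica.append(data)  # "persist on the replica"
            self.q.task_done()

    def fsync(self):
        # Only waits for data not yet replicated; by fsync time the
        # queue is usually nearly empty, so this returns quickly.
        self.q.join()

r = EagerReplicator()
for chunk in (b"a", b"b", b"c"):
    r.write(chunk)
r.fsync()
assert r.replica == [b"a", b"b", b"c"]
```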

Other design ideas in the paper:

  • Linearizability & prefix crash consistency

  • Leveraging data-path processing

  • Availability during host OS failure

  • Shared file management with lease mechanism

Evaluation

  • Does LineFS provide adequate offloaded DFS performance?

  • Does LineFS alleviate interference?

Summary

  • Increasing DFS host resource use causes contention

    • Degrading applications' performance

    • Unpredictable DFS operation latencies

  • Naive DFS offload to SmartNIC has high overhead

    • Long distance to host PM

    • Wimpy processing power

  • LineFS: Efficient DFS SmartNIC offload

    • Design principles: persist-and-publish, pipeline parallelism

    • 90% lower LevelDB latency, 79% higher Filebench throughput versus Assise under host resource contention
