Bladerunner: Stream Processing at Scale for a Live View of Backend Data Mutations at the Edge

https://dl.acm.org/doi/pdf/10.1145/3477132.3483572

Problem statement

  • Users of Facebook services access the social graph from around the world

  • Challenges in data liveness

    • The social graph is constantly mutating, and providing every one of our end users with fresh, up-to-date data is challenging

    • Challenges

      • Mutability and relevance

        • Parts of the social graph can mutate at a high rate, and some content is time-sensitive

      • Rapidly changing foci

        • Users change the focus of their interest frequently

      • Legacy substrate

        • Client devices have limited processing power and storage capacity

  • Polling solution

    • The client device issues an initial query when needed and then periodically polls the server for updates

      • Wastes network bandwidth and client battery

      • Leads to many costly queries to the server

        • e.g., range queries and intersection queries

      • The polling interval adds latency, since an update is not seen until the next poll
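To make the polling costs concrete, here is a minimal Python sketch under simplifying assumptions (integer times, a fixed polling interval, instant server responses). `polling_cost` is a hypothetical helper, not something from the paper. It counts polls, the polls that return nothing new, and how long each update waits for the next poll.

```python
import math


def polling_cost(update_times, interval, duration):
    """Model a client that polls every `interval` seconds for `duration` seconds.

    Assumes integer times in (0, duration] and that `duration` is a multiple
    of `interval`. Each update is seen at the first poll at or after it.
    Returns (number_of_polls, wasted_polls, per_update_delays).
    """
    polls = [k * interval for k in range(1, duration // interval + 1)]
    # Delay = time between the update and the next poll that picks it up.
    delays = [math.ceil(t / interval) * interval - t for t in update_times]
    # A poll is wasted if no update happened since the previous poll.
    wasted = sum(
        1 for p in polls
        if not any(p - interval < t <= p for t in update_times)
    )
    return len(polls), wasted, delays


polls, wasted, delays = polling_cost([3, 4, 25], interval=10, duration=60)
print(polls, wasted, delays)  # 6 4 [7, 6, 5]
```

With updates at seconds 3, 4, and 25, a 10-second interval, and a 60-second session, the client issues 6 polls, 4 of which return nothing, and the updates wait 7, 6, and 5 seconds. A push-based design would send 3 messages with no polling delay. When updates arrive at uniformly random times, the expected added wait is about half the polling interval.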

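  • Bladerunner architecture: Pylon delivers published updates to BRASSes, which process them per application and push results to client devices over long-running BURST streams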

  • Deep Dive - Pylon

    • A simple pub/sub system that delivers published messages to BRASSes across regions
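The pub/sub role described above can be illustrated with a toy, single-process sketch. `ToyPubSub`, the region names, and the topic strings are hypothetical; the real Pylon is a distributed service whose API is not described in these notes. Subscriptions are grouped by region, so a publish fans out to each region and then to every subscribed BRASS instance there.

```python
from collections import defaultdict


class ToyPubSub:
    """In-memory topic-based pub/sub, loosely modeled on Pylon's role."""

    def __init__(self):
        # topic -> region -> set of subscriber (BRASS instance) ids
        self.subs = defaultdict(lambda: defaultdict(set))
        # subscriber id -> list of (topic, message) delivered to it
        self.inbox = defaultdict(list)

    def subscribe(self, topic, region, subscriber):
        self.subs[topic][region].add(subscriber)

    def unsubscribe(self, topic, region, subscriber):
        self.subs[topic][region].discard(subscriber)

    def publish(self, topic, message):
        """Deliver `message` to all current subscribers; return the delivery count."""
        delivered = 0
        for members in self.subs.get(topic, {}).values():
            for subscriber in sorted(members):
                self.inbox[subscriber].append((topic, message))
                delivered += 1
        return delivered


bus = ToyPubSub()
bus.subscribe("video:42/comments", "region-a", "brass-1")
bus.subscribe("video:42/comments", "region-b", "brass-7")
print(bus.publish("video:42/comments", "new comment"))   # 2
print(bus.publish("video:99/comments", "no listeners"))  # 0
```

The publisher only names a topic; it never needs to know which BRASSes, or how many regions, are listening.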

  • Deep Dive - BRASS

    • Responsible for application-specific stream processing
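As an illustration of application-specific stream processing, here is a minimal sketch that assumes a BRASS-like processor is configured with two application hooks: `keep` (should this viewer see this event?) and `shape` (trim the event to what the client needs). The class name, the hooks, the blocking rule, and the `internal_score` field are all hypothetical; the paper's actual BRASS programming model may differ.

```python
from typing import Callable, Dict, List


class ToyBrass:
    """Toy application-specific stream processor (hypothetical API)."""

    def __init__(self, keep: Callable[[str, dict], bool], shape: Callable[[dict], dict]):
        self.keep = keep    # per-viewer filter, e.g. a blocking or privacy check
        self.shape = shape  # per-event transform, e.g. drop server-only fields
        self.streams: Dict[str, List[dict]] = {}  # viewer id -> messages pushed

    def open_stream(self, viewer: str) -> None:
        self.streams.setdefault(viewer, [])

    def close_stream(self, viewer: str) -> None:
        self.streams.pop(viewer, None)

    def on_event(self, event: dict) -> int:
        """Filter and transform one pub/sub event per open stream; return push count."""
        pushed = 0
        for viewer, outbox in self.streams.items():
            if self.keep(viewer, event):
                outbox.append(self.shape(event))
                pushed += 1
        return pushed


blocked = {("alice", "mallory")}  # (viewer, author) pairs to hide
brass = ToyBrass(
    keep=lambda viewer, ev: (viewer, ev["author"]) not in blocked,
    shape=lambda ev: {"author": ev["author"], "text": ev["text"]},
)
brass.open_stream("alice")
brass.open_stream("bob")
print(brass.on_event({"author": "mallory", "text": "hi", "internal_score": 0.3}))  # 1
print(brass.streams)  # {'alice': [], 'bob': [{'author': 'mallory', 'text': 'hi'}]}
```

Keeping the per-product logic in hooks is one way a shared streaming tier could host many applications while each defines its own filtering.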

  • Deep Dive - BURST

    • An application-layer protocol developed to support long-running streams between client devices and BRASSes

  • Deep Dive - Failure Handling and Delivery Guarantee

    • BRASS and BURST are designed to provide flexibility in choosing different levels of delivery guarantees, while most applications on Bladerunner today only require best-effort delivery
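The difference between delivery levels on a long-running stream can be sketched with sequence numbers and replay. This is a common technique chosen for illustration, not BURST's actual wire format or API; these notes do not say how stronger guarantees are implemented. `ToyStream` and `run` are hypothetical. In best-effort mode a message sent during a disconnect is lost; in reliable mode unacknowledged messages are buffered and replayed when the device reconnects.

```python
class ToyStream:
    """Server-side view of one stream to a device (hypothetical sketch)."""

    def __init__(self, reliable: bool):
        self.reliable = reliable
        self.next_seq = 0
        self.unacked = {}     # seq -> message (reliable mode only)
        self.connected = True
        self.device_log = []  # (seq, message) pairs the device received

    def send(self, message):
        seq = self.next_seq
        self.next_seq += 1
        if self.reliable:
            self.unacked[seq] = message
        if self.connected:
            self.device_log.append((seq, message))
        # Best effort: a message sent while disconnected is simply lost.

    def ack(self, seq):
        """Cumulative ack: the device has everything up to and including `seq`."""
        for s in [s for s in self.unacked if s <= seq]:
            del self.unacked[s]

    def disconnect(self):
        self.connected = False

    def reconnect(self, last_seen):
        """Device reconnects, reporting the highest sequence number it received."""
        self.connected = True
        self.ack(last_seen)
        for seq in sorted(self.unacked):
            self.device_log.append((seq, self.unacked[seq]))


def run(reliable):
    stream = ToyStream(reliable)
    stream.send("a")
    stream.send("b")
    stream.disconnect()
    stream.send("c")
    stream.send("d")
    stream.reconnect(last_seen=1)
    return [m for _, m in stream.device_log]


print(run(reliable=False))  # ['a', 'b']
print(run(reliable=True))   # ['a', 'b', 'c', 'd']
```

The reliable mode costs server memory for the unacknowledged buffer, which is one reason best effort is an attractive default for content like live comments.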

  • Use case

    • Comments for live videos

  • Production experience - live video comments

    • Switching from polling to Bladerunner yielded large efficiency wins, in addition to lower latency in comment delivery
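A popular live video can receive comments faster than any viewer can read them. As a hypothetical example of per-viewer processing a BRASS could apply to this stream (the notes do not say which policy production uses), here is a sliding-window limiter that pushes at most `max_per_window` comments per `window` seconds to each viewer. `CommentThrottle` and its parameters are illustrative names.

```python
from collections import defaultdict, deque


class CommentThrottle:
    """Per-viewer sliding-window limiter for a live-video comment stream (hypothetical)."""

    def __init__(self, max_per_window: int, window: float):
        self.max_per_window = max_per_window
        self.window = window
        self.sent = defaultdict(deque)  # viewer -> timestamps of recent pushes

    def allow(self, viewer: str, now: float) -> bool:
        """Return True (and record the push) if `viewer` may receive a comment at `now`."""
        recent = self.sent[viewer]
        # Forget pushes that have slid out of the window.
        while recent and recent[0] <= now - self.window:
            recent.popleft()
        if len(recent) < self.max_per_window:
            recent.append(now)
            return True
        return False


throttle = CommentThrottle(max_per_window=2, window=1.0)
print([throttle.allow("viewer-1", t) for t in (0.0, 0.1, 0.2, 1.05, 1.2)])
# [True, True, False, True, True]
```

Each viewer has an independent window, so throttling one busy viewer's stream does not affect anyone else's.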

  • Other learnings

    • Explicitly specify liveness requirements for product services

    • Ranking?

      • Polling: each poll returns a batch of items that can be ranked together

      • Bladerunner: items arrive one at a time, so the product must weigh showing the latest items against waiting for more engaging ones
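The ranking trade-off can be illustrated with a small sketch. The `hold` parameter, the scores, and `pick_per_window` are hypothetical: with no hold, every item is pushed the moment it arrives; with a hold, items are buffered briefly and only the highest-scoring one is pushed.

```python
def pick_per_window(items, hold):
    """Decide what a stream pushes, given (arrival_time, score, item_id) tuples
    sorted by arrival time.

    hold == 0: push every item the moment it arrives (freshest, unranked).
    hold > 0: buffer items until `hold` seconds after the first buffered item
    arrived, then push only the highest-scoring one and drop the rest.
    Returns a list of (push_time, item_id).
    """
    if hold == 0:
        return [(t, item_id) for t, _, item_id in items]
    pushed, buffer, deadline = [], [], None
    for t, score, item_id in items:
        if deadline is not None and t > deadline:
            # The window closed before this item arrived: flush the best one.
            best = max(buffer, key=lambda entry: entry[1])
            pushed.append((deadline, best[2]))
            buffer, deadline = [], None
        if deadline is None:
            deadline = t + hold
        buffer.append((t, score, item_id))
    if buffer:
        best = max(buffer, key=lambda entry: entry[1])
        pushed.append((deadline, best[2]))
    return pushed


items = [(0.0, 0.2, "a"), (0.5, 0.9, "b"), (1.0, 0.1, "c"), (3.0, 0.4, "d")]
print(pick_per_window(items, hold=0))    # [(0.0, 'a'), (0.5, 'b'), (1.0, 'c'), (3.0, 'd')]
print(pick_per_window(items, hold=2.0))  # [(2.0, 'b'), (5.0, 'd')]
```

With a 2-second hold, the viewer sees the higher-scoring "b", but 1.5 seconds after it arrived, and never sees "a" or "c". A polling client gets similar batch-then-rank behavior, with the batch boundary set by the polling interval.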
