Bladerunner: Stream Processing at Scale for a Live View of Backend Data Mutations at the Edge
https://dl.acm.org/doi/pdf/10.1145/3477132.3483572
Users of Facebook services access the social graph from around the world
Challenges of data liveness
The social graph mutates constantly, and providing every end user with fresh, up-to-date data is challenging
Challenges
Mutability and relevance
Parts of the social graph can mutate at a high rate, and some content is time-sensitive
Rapidly changing focus
Users frequently change the focus of their interest
Legacy substrate
Client devices have limited processing power and storage capacity
Polling solution
The client device issues an initial query when needed and then periodically polls the server for updates
Wastes network bandwidth and client battery
Generates many costly queries to the server
Range queries and intersection queries are costly
The polling interval introduces extra latency
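The extra latency from polling is easy to quantify: an update published just after a poll sits on the server for up to a full polling interval before the client's next poll picks it up. A minimal sketch (the 5-second interval is illustrative, not from the paper):

```python
POLL_INTERVAL = 5.0  # seconds between client polls (illustrative value)

def extra_latency(publish_offset: float, interval: float = POLL_INTERVAL) -> float:
    """Extra delay before a client polling every `interval` seconds
    observes an update published `publish_offset` seconds after the
    previous poll."""
    return interval - (publish_offset % interval)

# An update published right after a poll waits almost a full interval.
print(extra_latency(0.1))  # 4.9 seconds of added latency
```

Push-based delivery removes this waiting time entirely, since the server forwards the update as soon as it is published.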
Bladerunner architecture
Deep Dive - Pylon
A simple pub/sub system that delivers published messages to BRASS instances across regions
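The core contract is topic-based fan-out: every subscriber of a topic receives each message published to it. A toy sketch of that contract (in the real system, Pylon's subscribers are BRASS instances in other regions, not in-process callbacks):

```python
from collections import defaultdict
from typing import Callable

class MiniPubSub:
    """Toy topic-based pub/sub in the spirit of Pylon's contract:
    a message published to a topic is fanned out to every subscriber
    of that topic. Sketch only, not the real Pylon API."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subs[topic].append(callback)

    def publish(self, topic: str, message: str) -> None:
        for cb in self._subs[topic]:  # fan out to all subscribers
            cb(message)

bus = MiniPubSub()
received: list[str] = []
bus.subscribe("video:123:comments", received.append)
bus.publish("video:123:comments", "nice stream!")
print(received)  # ['nice stream!']
```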
Deep Dive - BRASS
Responsible for application-specific stream processing
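"Application-specific" means each product attaches its own logic to the stream before messages reach clients. A hedged sketch of that idea, using a filter plus a transform as stand-ins for app-defined operators (the operator names here are illustrative, not the real BRASS API):

```python
from typing import Callable, Iterable, Iterator

def process_stream(
    messages: Iterable[dict],
    keep: Callable[[dict], bool],
    shape: Callable[[dict], str],
) -> Iterator[str]:
    """Apply an app-defined filter and transform to each message
    before it is pushed to subscribed clients."""
    for msg in messages:
        if keep(msg):
            yield shape(msg)

comments = [
    {"text": "great video", "spam": False},
    {"text": "BUY FOLLOWERS", "spam": True},
]
delivered = list(process_stream(
    comments,
    keep=lambda c: not c["spam"],     # e.g. drop spam server-side
    shape=lambda c: c["text"],        # e.g. strip fields clients don't need
))
print(delivered)  # ['great video']
```

Doing this server-side keeps the work off client devices, which (per the challenges above) have limited processing power and storage.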
Deep Dive - BURST
An application-layer protocol developed to support long-running streams between client devices and BRASS instances
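The shape of such a protocol: the client sends a subscribe request once, then the server pushes data frames on the resulting stream until the client unsubscribes. A minimal sketch of the framing, with JSON frames and field names that are purely illustrative (not the actual BURST wire format):

```python
import json

def subscribe_frame(topic: str) -> bytes:
    """Client -> server: open a long-lived stream for a topic."""
    return json.dumps({"type": "subscribe", "topic": topic}).encode()

def data_frame(stream_id: int, payload: str) -> bytes:
    """Server -> client: push one update on an open stream."""
    return json.dumps(
        {"type": "data", "stream": stream_id, "payload": payload}
    ).encode()

def parse(frame: bytes) -> dict:
    return json.loads(frame)

req = parse(subscribe_frame("video:123:comments"))
assert req["type"] == "subscribe"

push = parse(data_frame(7, "new comment"))
print(push["payload"])  # new comment
```

The key contrast with polling is that after the single subscribe frame, the client sends nothing; all subsequent traffic is server-initiated pushes.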
Deep Dive - Failure Handling and Delivery Guarantee
BRASS and BURST are designed to provide flexibility in choosing among different levels of delivery guarantee, though most applications on Bladerunner today require only best-effort delivery
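The two ends of that spectrum can be sketched as follows: best effort sends once and moves on, while at-least-once retries until the receiver acknowledges (at the cost of possible duplicates). The `send` callback and retry scheme here are hypothetical, just to illustrate the trade-off:

```python
from typing import Callable

def deliver(
    msg: str,
    send: Callable[[str], bool],   # hypothetical unreliable send; True = acked
    guarantee: str = "best_effort",
    max_retries: int = 3,
) -> bool:
    if guarantee == "best_effort":
        send(msg)                  # fire and forget; ignore the outcome
        return True
    for _ in range(max_retries):   # at-least-once: retry until acked
        if send(msg):
            return True
    return False                   # gave up; caller must handle the gap

attempts: list[str] = []
def flaky_send(msg: str) -> bool:
    attempts.append(msg)
    return len(attempts) >= 2      # first send fails, second is acked

ok = deliver("comment", flaky_send, guarantee="at_least_once")
print(ok, len(attempts))  # True 2
```

For use cases like live comments, a dropped message is cheap and duplicates are annoying, which is why best effort suffices for most applications.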
Use case
Comments for live videos
Production experience - live video comments
Large efficiency wins after switching from polling to Bladerunner, in addition to reduced latency in comment delivery
Other learnings
Explicitly specify liveness requirement for product services
Ranking
Polling: updates arrive in batches, so items can be ranked before display
Bladerunner: trade-off between delivering the latest items immediately vs. waiting to surface more engaging items in the product experience