Snicket: Query-Driven Distributed Tracing
https://dl.acm.org/doi/10.1145/3484266.3487393
The rise of microservices
Growing complexity and scaling needs --> microservices
Advantages
Flexibility with languages
Development velocity
Challenges
Distribution
Trace: a directed tree representing the calls made as a result of one user request
Head-based sampling: decide whether to collect a trace at its start
Unusual traces may not be collected
Tail-based sampling: decide after the full trace is buffered
May lose information about "how common" a behavior is, since the kept traces are no longer a uniform sample
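A minimal sketch of the two sampling strategies above, to make the trade-off concrete. Function names, the sampling rate, and the SLO threshold are illustrative assumptions, not from the paper.

```python
import random

def head_based_sample(trace_id: int, rate: float = 0.01) -> bool:
    """Decide at the trace root, before any spans are collected.
    Cheap, but unusual traces are dropped with the same probability
    as ordinary ones."""
    # Seed on the trace ID so every service makes the same decision
    # for a given trace (illustrative; real tracers hash the ID).
    random.seed(trace_id)
    return random.random() < rate

def tail_based_sample(trace: list[dict], latency_slo_ms: float = 500) -> bool:
    """Decide after the full trace is buffered. Keeps unusual traces
    (here: SLO violations), but the retained set is biased, so
    "how common" estimates over it are skewed."""
    return any(span["latency_ms"] > latency_slo_ms for span in trace)
```

Head-based sampling trades completeness for low overhead; tail-based sampling trades unbiased frequency statistics for coverage of rare events.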
Problems with current approaches
Persists information at the granularity of whole traces
Most queries are interested in only a subset of the trace
Queries run on incomplete data
Uniform sampling misses unusual traces
Trace data is biased by the sampling strategy
Key idea: collect only the information necessary for the query
What is Snicket?
Query-driven
Allows for complex queries that include graph-based reasoning
Tightly ties collection and computation: collect only what you need
Why is Snicket possible now?
The emergence of service proxies, analogous to application-level switches (e.g., Envoy, Linkerd)
Rise of proxy extension mechanisms that allow running computation close to the data source
Allows network programmability higher up the stack
Snicket Design Overview
The Snicket Query Language
Structural filter: match on isomorphic subtree (Match)
Attribute filter: match on attributes of nodes or traces (Where)
Map: create new developer-defined attributes (e.g., latency)
Return: return attribute values
Aggregate: aggregate the returned values (e.g., avg)
Query example
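A hypothetical sketch of how the operators above compose, written in Python rather than Snicket's actual query syntax; the trace shape, service names, and attributes are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    service: str
    start_ms: float
    end_ms: float
    children: list["Span"] = field(default_factory=list)

# Toy trace: frontend -> cart -> db (a directed tree of spans)
trace = Span("frontend", 0, 40, [Span("cart", 5, 35, [Span("db", 10, 30)])])

def spans(root):
    """Flatten the trace tree."""
    yield root
    for child in root.children:
        yield from spans(child)

# Match: structural filter -- every parent/child pair is a match
# against a two-node subtree pattern
pairs = [(p, c) for p in spans(trace) for c in p.children]

# Where: attribute filter on the matched nodes
pairs = [(p, c) for (p, c) in pairs if p.service == "frontend"]

# Map: developer-defined attribute (latency of the child span)
latencies = [c.end_ms - c.start_ms for (_, c) in pairs]

# Return + Aggregate: average of the returned values
avg_latency = sum(latencies) / len(latencies)
print(avg_latency)  # 30.0
```

This mirrors the query-driven idea: only the attributes the query touches (here, start and end times of spans matching the pattern) would need to be collected at the proxies.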
Evaluation: expressiveness, interactivity, and latency
Online Boutique: 10 microservices
7 nodes of e2-highmem-4
Horizontal and vertical autoscaling enabled: 11->12 nodes
Locust as load generator
~15ms difference