Snicket: Query-Driven Distributed Tracing

https://dl.acm.org/doi/10.1145/3484266.3487393

  • The rise of microservices

    • Growing complexity and scaling needs push applications toward microservices

    • Advantages

      • Flexibility with languages

      • Development velocity

    • Challenges

      • Distribution

  • Trace: a directed tree representing the calls made as a result of one user request (sketched below)

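A trace like this can be modeled as a small tree of spans. A minimal, illustrative sketch in Python (the `Span` class, service names, and attribute keys are assumptions, not from the paper):

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Span:
    """One node in the trace tree: a single service call."""
    service: str
    attrs: Dict[str, float] = field(default_factory=dict)
    children: List["Span"] = field(default_factory=list)


# A toy trace: one user request enters at the frontend and fans out.
trace = Span("frontend", {"duration_ms": 42.0}, [
    Span("cart", {"duration_ms": 10.0}),
    Span("product-catalog", {"duration_ms": 25.0}, [
        Span("currency", {"duration_ms": 5.0}),
    ]),
])
```
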
  • Head-Based Sampling

    • The sampling decision is made at the start of a request, so unusual data is often not collected

  • Tail-based sampling

    • Biases collection toward unusual traces, so information about how common a behavior is may be lost (both strategies sketched below)

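A minimal sketch contrasting the two sampling strategies (the sampling rate, the error/latency heuristic, and the flat span representation are assumptions for illustration only):

```python
import random


def head_based_sample(rate: float = 0.01) -> bool:
    """Decide at the start of a request, before any spans exist, so rare or
    anomalous traces are dropped at the same rate as ordinary ones."""
    return random.random() < rate


def tail_based_sample(spans, latency_threshold_ms: float = 500.0) -> bool:
    """Decide after the whole trace is assembled; keeps 'interesting' traces,
    but the stored set no longer reflects how common each behavior is."""
    has_error = any(s.get("error", False) for s in spans)
    is_slow = max(s["duration_ms"] for s in spans) > latency_threshold_ms
    return has_error or is_slow


# Example trace as a flat list of span records.
spans = [
    {"service": "frontend", "duration_ms": 620.0},
    {"service": "cart", "duration_ms": 15.0, "error": True},
]
print(head_based_sample(), tail_based_sample(spans))
```
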
  • Problems with current approaches

    • Persist information at the granularity of whole traces

      • Most queries are interested in only a subset of each trace

    • Queries run on incomplete data

      • Uniform sampling misses unusual traces

      • Trace data is biased by the sampling strategy

  • Key idea: collect only the information necessary for the query

  • What is Snicket?

    • Query-driven

    • Allows for complex queries that include graph-based reasoning

    • Tightly ties collection and computation: collect only what you need

  • Why is Snicket possible now?

    • The emergence of service proxies, analogous to application-level switches (e.g., Envoy, Linkerd)

    • Rise of proxy extension mechanisms that allow computation close to the source

      • Allows network programmability higher up the stack

  • Snicket Design Overview

    • The Snicket Query Language

      • Structural filter: match traces containing an isomorphic subtree (Match)

      • Attribute filter: match on attributes of nodes or traces (Where)

      • Map: create new developer-defined attributes (e.g., latency)

      • Return: return an attribute's value

      • Aggregate: aggregate returned values (e.g., avg)

  • Query example (illustrative sketch below)

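These notes do not reproduce the paper's example query, so below is a hypothetical Python sketch of what the five primitives compute over in-memory traces. The trace shape, attribute names, and the frontend -> product-catalog pattern are invented for illustration and do not use Snicket's actual syntax:

```python
from statistics import mean

# Each trace is a nested dict: {"service": ..., "attrs": {...}, "children": [...]}
traces = [
    {"service": "frontend", "attrs": {"duration_ms": 40.0, "status": 200}, "children": [
        {"service": "product-catalog", "attrs": {"duration_ms": 22.0}, "children": []},
    ]},
    {"service": "frontend", "attrs": {"duration_ms": 95.0, "status": 500}, "children": [
        {"service": "product-catalog", "attrs": {"duration_ms": 80.0}, "children": []},
    ]},
]


def has_edge(node, parent, child):
    """Match: structural filter -- does the trace contain a parent -> child call?"""
    if node["service"] == parent and any(c["service"] == child for c in node["children"]):
        return True
    return any(has_edge(c, parent, child) for c in node["children"])


def catalog_duration(node):
    """Map: a developer-defined attribute -- the product-catalog call's duration."""
    if node["service"] == "product-catalog":
        return node["attrs"]["duration_ms"]
    for c in node["children"]:
        d = catalog_duration(c)
        if d is not None:
            return d
    return None


# Match: keep traces where frontend calls product-catalog.
matched = [t for t in traces if has_edge(t, "frontend", "product-catalog")]

# Where: attribute filter on the root span.
filtered = [t for t in matched if t["attrs"]["status"] == 200]

# Map + Return + Aggregate: compute the new attribute, return it, and average it.
print(mean(catalog_duration(t) for t in filtered))  # -> 22.0
```
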
  • Evaluation: expressiveness, interactivity, and latency

    • Online Boutique: 10 microservices

    • 7 GCP e2-highmem-4 nodes

    • Horizontal and vertical autoscaling enabled: 11->12 nodes

    • Locust as load generator

    • ~15 ms latency difference with Snicket enabled vs. without
