Leveraging Service Meshes as a New Network Layer

https://www.youtube.com/watch?v=ajkQNjXP5Zs

  • A recent shift from monolithic to microservice architecture

    • For better scalability and functionality

    • Microservices talk to each other through network calls

    • Network communication has become intrinsic to the application's functioning

  • Visualization of a large scale microservice

    • Is usually a multi-hop DAG with many stages

    • Developers end up writing a lot of "communication code"

  • Service Meshes have emerged to "factor out communication" in microservices

    • Take the existing microservices and convert them into an architecture where all communication functionality is factored out and placed in a separate process called a sidecar

    • Sidecar performs all inbound and outbound network communication

    • The sidecar routes traffic at the granularity of an application-level request (e.g., HTTP)

    • End to end path involves 3 transport connections

      • Microservice A --> sidecar A (source)

      • Sidecar A --> sidecar B

      • Sidecar B --> Microservice B (destination)

    • Data plane (i.e., sidecar) offers:

      • Load balancing: between replicas of the service

      • Name resolution: the app does not need to keep track of IP addresses

      • Encryption

      • Tracing

      • Authorization

      • ...
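The first two data plane functions above (name resolution and load balancing between replicas) can be illustrated with a minimal sketch. It is a toy model, not how any particular mesh implements its sidecar: the `Sidecar` class, the static `registry` dictionary, and the addresses are assumptions made up for illustration, and round-robin is just one possible balancing policy.

```python
import itertools
from typing import Dict, Iterator, List


class Sidecar:
    """Toy sidecar: resolves a service name and picks a replica per request."""

    def __init__(self, registry: Dict[str, List[str]]) -> None:
        # registry maps a service name to its replica addresses. In a real
        # mesh this view would come from the control plane; here it is
        # hypothetical static data.
        self._cycles: Dict[str, Iterator[str]] = {
            name: itertools.cycle(replicas) for name, replicas in registry.items()
        }

    def route(self, service: str) -> str:
        """Return the replica address that should receive the next request."""
        if service not in self._cycles:
            raise KeyError(f"unknown service: {service}")
        # Round-robin: each call advances to the next replica of the service.
        return next(self._cycles[service])


# The application only names the service; it never sees replica addresses.
sidecar = Sidecar({"cart": ["10.0.0.1:8080", "10.0.0.2:8080"]})
picks = [sidecar.route("cart") for _ in range(3)]
```

The caller addresses the service as `"cart"` only, which is the sense in which the app is unaware of which replica it is talking to.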

  • Control plane offers global view and policy

    • Service discovery

    • Certificate management

    • Configuration management

  • Data plane executes local function

  • Service Meshes as a new network layer

    • How is it a layer?

      • Service mesh abstracts lower layer functionality for the application

      • E.g., app is unaware which exact replica of a service it is talking to

      • All inbound/outbound app traffic goes through service mesh via the sidecars

    • Why is that a useful abstraction?

      • Highlights service meshes as a convenient location to observe and enhance cross-layer communication

      • Can be application-agnostic and expose standard APIs which make any optimizations available to all apps running on top of it

    • Cloud-native network stack

  • New opportunities enabled by service meshes

    • Enhanced visibility

      • So far: IP packet

      • Now: HTTP request --> internal, app-level API call tree

      • Enhanced visibility is useful for

        • Monitoring

        • Troubleshooting

        • Root cause analysis
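To make the "API call tree" idea concrete, here is a small sketch of how request-level records observed by sidecars could be assembled into a caller-to-callee tree. The span format (`span_id`, `parent_id`, `service`) and the service names are hypothetical; real tracing systems carry more fields.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (span_id, parent_span_id, service): a hypothetical, simplified span record.
Span = Tuple[str, Optional[str], str]


def build_call_tree(spans: List[Span]) -> Dict[str, List[str]]:
    """Map each caller service to the services it called, in span order."""
    service_of = {span_id: service for span_id, _, service in spans}
    children: Dict[str, List[str]] = defaultdict(list)
    for _span_id, parent_id, service in spans:
        # Root spans have no parent and therefore no caller edge.
        if parent_id is not None:
            children[service_of[parent_id]].append(service)
    return dict(children)


spans = [
    ("s1", None, "frontend"),
    ("s2", "s1", "cart"),
    ("s3", "s1", "recommend"),
    ("s4", "s2", "db"),
]
tree = build_call_tree(spans)
```

A tree like this is not recoverable from IP packets alone, which is why request-level visibility helps with troubleshooting and root cause analysis.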

    • Better knowledge of application needs

      • Dozens of proposals assume knowledge of application specifics

        • Priority aware flow scheduling

        • App-aware traffic engineering

        • ...

      • With service mesh APIs, apps can directly signal preferences

      • These proposals could be made more practical through service meshes

    • Easier evolvability

      • Service meshes provide a convenient platform to deploy new functionality

        • Run in user space on end-hosts

        • Service Mesh APIs are easily extensible

        • E.g., WASM extensions can be used to extend sidecars directly

        • E.g., transport protocols

    • Coordination with lower layers

      • SDN controller can provide explicit hints to the sidecar (e.g., via congestion notification)

      • Example: congestion increases, SDN controller notifies service mesh, service mesh reroutes traffic
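The congestion example above can be sketched as follows. This is only an illustration under stated assumptions: the `HintAwareRouter` class and its `on_congestion_hint` callback are hypothetical names, and how an SDN controller would actually deliver such a hint is not specified in the talk.

```python
from typing import List, Set


class HintAwareRouter:
    """Toy sidecar router that avoids replicas flagged as congested."""

    def __init__(self, replicas: List[str]) -> None:
        self.replicas = list(replicas)
        self.congested: Set[str] = set()

    def on_congestion_hint(self, replica: str, congested: bool) -> None:
        # Hypothetical callback invoked when the lower layer reports
        # that the path to `replica` became (or stopped being) congested.
        if congested:
            self.congested.add(replica)
        else:
            self.congested.discard(replica)

    def pick(self) -> str:
        """Prefer the first uncongested replica; fall back to any replica."""
        healthy = [r for r in self.replicas if r not in self.congested]
        candidates = healthy or self.replicas
        return candidates[0]


router = HintAwareRouter(["replica-a", "replica-b"])
before = router.pick()
router.on_congestion_hint("replica-a", congested=True)
after = router.pick()
```

The fallback to the full replica list keeps the service reachable even if every path has been flagged.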

  • Case study:

    • Goal: provide request-level prioritization for latency-sensitive requests in a microservice application that serves a mix of workloads

    • Aim: implement this prioritization by leveraging the sidecar's capability for cross-layer coordination, with minimal coupling to the application

    • Design components

      • Classify application's performance objectives at the ingress

      • Carry these performance objectives (i.e., priorities) through the entire system

      • Implement cross-layer prioritization at the sidecars

        • Prioritization at the request/message queue level (e.g., at the RabbitMQ queue)

        • Choice: transport protocol (BBR, PCC) as per QoS of the request

        • Kernel prioritization for latency sensitive packets

      • What does this prototype demonstrate?

        • Use knowledge of app needs across the microservice via provenance tracing (in this case, we trace priority of requests through the microservice)

        • Easily evolvable sidecar can deploy many optimization schemes without modifying the app

        • Abstract the details from the app and make these optimizations available to any app running on top of the service mesh layer (i.e., no tight coupling)
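The first design components above (classify at the ingress, carry the priority through the system, prioritize at the request queue) can be sketched in a few lines. This is not the prototype from the talk: the header name `x-request-priority`, the path-based classification rule, and the in-memory heap standing in for a message queue are all assumptions chosen for illustration. Transport protocol selection and kernel-level prioritization are not modeled here.

```python
import heapq
import itertools
from typing import Dict, List, Tuple

PRIORITY_HEADER = "x-request-priority"  # hypothetical header name


def classify_at_ingress(path: str) -> int:
    """Return 0 for latency-sensitive requests, 1 otherwise (illustrative rule)."""
    return 0 if path.startswith("/checkout") else 1


def propagate(incoming: Dict[str, str], outgoing: Dict[str, str]) -> Dict[str, str]:
    """Copy the priority of an inbound request onto an outbound call."""
    tagged = dict(outgoing)
    tagged[PRIORITY_HEADER] = incoming.get(PRIORITY_HEADER, "1")
    return tagged


class PriorityRequestQueue:
    """Sidecar-side queue: lower priority value is served first, FIFO within a level."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def push(self, headers: Dict[str, str], request_id: str) -> None:
        priority = int(headers.get(PRIORITY_HEADER, "1"))
        heapq.heappush(self._heap, (priority, next(self._seq), request_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]


queue = PriorityRequestQueue()
batch = {PRIORITY_HEADER: str(classify_at_ingress("/reports/daily"))}
urgent = {PRIORITY_HEADER: str(classify_at_ingress("/checkout/pay"))}
queue.push(batch, "batch-1")
queue.push(urgent, "pay-1")
queue.push(propagate(urgent, {}), "pay-1-downstream")
order = [queue.pop() for _ in range(3)]
```

Because classification and propagation live in the sidecar, the application code never reads or sets the priority itself, which is the loose coupling the prototype aims for.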

    • Challenges and future directions

      • Added latency due to sidecars

        • Explore kernel-bypass techniques to improve performance

      • Different layers may conflict in terms of their roles and functionality

        • For example, app layer LB may conflict with network layer LB

        • How do we define the scope of responsibilities?

      • Extending cross-layer prioritization

        • Include compute and storage level prioritization
