Leveraging Service Meshes as a New Network Layer

https://www.youtube.com/watch?v=ajkQNjXP5Zs

A recent shift from monolithic to microservice architecture
- For better scalability and functionality
- Microservices talk to each other through network calls
- Network communication has become intrinsic to the application's functioning
Visualization of a large-scale microservice application
- Usually a multi-hop DAG with many stages
- Developers end up writing a lot of "communication code"
Service Meshes have emerged to "factor out communication" in microservices
- Take the existing microservices and convert them into an architecture where all the communication functionality is factored out and placed in a separate process called a sidecar
- Sidecar performs all inbound and outbound network communication
- The sidecar routes traffic at the granularity of an application-level request (e.g., HTTP)
- End-to-end path involves three transport connections
  - Microservice A --> sidecar A (source)
  - Sidecar A --> sidecar B
  - Sidecar B --> Microservice B (destination)
- Data plane (i.e., sidecar) offers:
  - Load balancing: between replicas of the service
  - Name resolution: the app does not need to keep track of IP addresses
  - Encryption
  - Tracing
  - Authorization
  - ...
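
The first two data-plane functions above (name resolution and load balancing between replicas) can be illustrated with a minimal sketch. This is not how any particular service mesh implements them: the `Sidecar` class, its `registry` mapping, and the addresses are hypothetical, and round-robin is just one assumed balancing policy.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Sidecar:
    """Toy sidecar: resolves a service name and picks a replica per request."""

    # Hypothetical registry: service name -> replica addresses.
    # In a real mesh this information would come from the control plane.
    registry: dict[str, list[str]]
    _cursors: dict[str, itertools.cycle] = field(default_factory=dict, init=False)

    def pick_replica(self, service: str) -> str:
        """Name resolution plus round-robin load balancing (assumed policy)."""
        if service not in self.registry:
            raise LookupError(f"unknown service: {service}")
        if service not in self._cursors:
            self._cursors[service] = itertools.cycle(self.registry[service])
        return next(self._cursors[service])

    def send(self, service: str, request: str) -> str:
        """Route one application-level request; the app only names the service."""
        replica = self.pick_replica(service)
        return f"{request} -> {replica}"


# The app addresses "cart" by name and never sees a replica's IP address.
sidecar = Sidecar({"cart": ["10.0.0.1:8080", "10.0.0.2:8080"]})
results = [sidecar.send("cart", f"GET /items/{i}") for i in range(3)]
```

Because the choice is made per request rather than per connection, consecutive requests from the same app can land on different replicas.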
Control plane offers global view and policy
- Service discovery
- Certificate management
- Configuration management
Data plane executes local functions
Service Meshes as a new network layer
- How is it a layer?
  - Service mesh abstracts lower layer functionality for the application
  - E.g., app is unaware which exact replica of a service it is talking to
  - All inbound/outbound app traffic goes through service mesh via the sidecars
- Why is that a useful abstraction?
  - Highlights service meshes as a convenient location to observe and enhance cross-layer communication
  - Can be application-agnostic and expose standard APIs which make any optimizations available to all apps running on top of it
- Cloud-native network stack
New opportunities enabled by service meshes
- Enhanced visibility
  - So far: IP packet
  - Now: HTTP requests --> visibility into the internal app-level API call tree
  - Enhanced visibility is useful for
    Monitoring
    Troubleshooting
    Root cause analysis
- Better knowledge of application needs
  - Dozens of proposals assume knowledge of application specifics
    Priority aware flow scheduling
    App-aware traffic engineering
    ...
  - With service mesh APIs, apps can directly signal preferences
  - These proposals could be made more practical through service meshes
- Easier evolvability
  - Service meshes provide a convenient platform to deploy new functionality
    Run in user space on end-hosts
    Service Mesh APIs are easily extensible
    E.g., WASM extensions can be used to extend sidecars directly
    E.g., new transport protocols
- Coordination with lower layers
  - SDN controller can provide explicit hints to the sidecar (e.g., via congestion notification)
  - Example: congestion increases, SDN controller notifies service mesh, service mesh reroutes traffic
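
The coordination example above (congestion increases, the SDN controller notifies the service mesh, the mesh reroutes traffic) might look roughly like the sketch below. The `MeshRouter` class and its `on_congestion_hint` callback are hypothetical names invented for illustration; the talk does not specify an interface between the controller and the sidecar.

```python
from dataclasses import dataclass, field


@dataclass
class MeshRouter:
    """Toy sidecar routing table that reacts to hints from a lower layer."""

    # Hypothetical: candidate replicas per service, in preference order.
    routes: dict[str, list[str]]
    congested: set[str] = field(default_factory=set)

    def on_congestion_hint(self, replica: str, is_congested: bool) -> None:
        """Callback a (hypothetical) SDN controller would invoke."""
        if is_congested:
            self.congested.add(replica)
        else:
            self.congested.discard(replica)

    def route(self, service: str) -> str:
        """Prefer the first replica whose path is not reported as congested."""
        candidates = self.routes[service]
        for replica in candidates:
            if replica not in self.congested:
                return replica
        # Every path is congested: fall back to the default choice.
        return candidates[0]


router = MeshRouter({"search": ["replica-a", "replica-b"]})
before = router.route("search")
router.on_congestion_hint("replica-a", True)   # controller reports congestion
after = router.route("search")                 # mesh reroutes around it
```

The app keeps calling "search" throughout; only the sidecar's routing decision changes.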
Case study:
- Goal: provide request-level prioritization for latency-sensitive requests in a microservice application that serves a mix of workloads
- Aim: implement this prioritization by leveraging the sidecar's capability for cross-layer coordination, with minimal coupling to the application
- Design components
  - Classify application's performance objectives at the ingress
  - Carry these performance objectives (i.e. priorities) through the entire system
  - Implement cross-layer prioritization at the sidecars
    Prioritization at the request/message queue level (e.g., at the RabbitMQ queue)
    Choice of transport protocol (BBR, PCC) according to the QoS of the request
    Kernel prioritization for latency-sensitive packets
  - What does this prototype demonstrate?
    Use knowledge of app needs across the microservice via provenance tracing (in this case, we trace priority of requests through the microservice)
    Easily evolvable sidecar can deploy many optimization schemes without modifying the app
    Abstract the details from the app and make these optimizations available to any app running on top of the service mesh layer (i.e., no tight coupling)
- Challenges and future directions
  - Added latency due to sidecars
    Explore kernel-bypass techniques to improve performance
  - Different layers may conflict in terms of their roles and functionality
    For example, app layer LB may conflict with network layer LB
    How do we define the scope of responsibilities of each layer?
  - Extending cross-layer prioritization
    Include compute and storage level prioritization
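
The case study's design components (classify at the ingress, carry the priority through the system, prioritize in the sidecar) can be sketched as follows. This covers only queue-level prioritization, not the transport-protocol or kernel-level parts. The header name `x-request-priority`, the path-based classification rule, and all function and class names are assumptions made for illustration, not details of the prototype.

```python
import heapq
import itertools

PRIORITY_HEADER = "x-request-priority"  # hypothetical header name
HIGH, LOW = 0, 1  # lower number is served first


def classify_at_ingress(path: str) -> dict[str, str]:
    """Tag a request once at the ingress. The rule here is an assumed example."""
    priority = HIGH if path.startswith("/interactive") else LOW
    return {PRIORITY_HEADER: str(priority)}


def propagate(inbound_headers: dict[str, str]) -> dict[str, str]:
    """Copy the priority onto an outbound call so it follows the request downstream."""
    return {PRIORITY_HEADER: inbound_headers.get(PRIORITY_HEADER, str(LOW))}


class PrioritySidecarQueue:
    """Request queue serving high priority first, FIFO within a priority class."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def enqueue(self, headers: dict[str, str], request: str) -> None:
        priority = int(headers.get(PRIORITY_HEADER, LOW))
        heapq.heappush(self._heap, (priority, next(self._seq), request))

    def dequeue(self) -> str:
        return heapq.heappop(self._heap)[2]


queue = PrioritySidecarQueue()
for path in ["/batch/report", "/interactive/search", "/batch/export"]:
    headers = propagate(classify_at_ingress(path))
    queue.enqueue(headers, path)
order = [queue.dequeue() for _ in range(3)]
```

The latency-sensitive request is dequeued first even though it arrived second, and the application code never inspects the priority itself.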


