Leveraging Service Meshes as a New Network Layer
https://www.youtube.com/watch?v=ajkQNjXP5Zs
A recent shift from monolithic to microservice architecture
For better scalability and functionality
Microservices talk to each other through network calls
Network communication has become intrinsic to an application's functioning
Visualization of a large-scale microservice application
Usually a multi-hop DAG with many stages
Developers end up writing a lot of "communication code"
Service Meshes have emerged to "factor out communication" in microservices
Take the existing microservices and convert them into an architecture where all the communication functionality has been factored out and placed in a separate process called a sidecar
Sidecar performs all inbound and outbound network communication
The sidecar routes traffic at the granularity of an application-level request (e.g., HTTP)
End to end path involves 3 transport connections
Microservice A --> sidecar A (source)
Sidecar A --> sidecar B
Sidecar B --> Microservice B (destination)
Data plane (i.e., sidecar) offers:
Load balancing: between replicas of the service
Name resolution: the app does not need to keep track of IP addresses
Encryption
Tracing
Authorization
...
Control plane offers global view and policy
Service discovery
Certificate management
Configuration management
Data plane executes local function
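The split between a global control plane and a local data plane can be sketched in a few lines. This is a minimal illustration, not how any particular mesh implements it: the `ControlPlane` and `Sidecar` classes, the service name `cart`, and the addresses are all hypothetical, and round-robin is just one possible load-balancing policy.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Hypothetical control plane: holds the global service -> replicas view."""
    registry: dict = field(default_factory=dict)

    def register(self, service, address):
        self.registry.setdefault(service, []).append(address)

    def replicas(self, service):
        return list(self.registry.get(service, []))


class Sidecar:
    """Hypothetical sidecar: resolves names and balances load locally."""

    def __init__(self, control_plane):
        self.control_plane = control_plane
        self._cursors = {}  # one round-robin counter per service name

    def pick_replica(self, service):
        # Name resolution: the app passes a service name, never an IP address.
        replicas = self.control_plane.replicas(service)
        if not replicas:
            raise LookupError(f"no replicas registered for {service!r}")
        # Load balancing: simple round-robin over the known replicas.
        cursor = self._cursors.setdefault(service, itertools.count())
        return replicas[next(cursor) % len(replicas)]


control_plane = ControlPlane()
control_plane.register("cart", "10.0.0.1:8080")
control_plane.register("cart", "10.0.0.2:8080")

sidecar = Sidecar(control_plane)
picks = [sidecar.pick_replica("cart") for _ in range(4)]
print(picks)
```

The application only ever names the service (`cart`); which replica serves each request is decided inside the sidecar.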
Service Meshes as a new network layer
How is it a layer?
Service mesh abstracts lower layer functionality for the application
E.g., app is unaware which exact replica of a service it is talking to
All inbound/outbound app traffic goes through service mesh via the sidecars
Why is that a useful abstraction?
Highlights service meshes as a convenient location to observe and enhance cross-layer communication
Can be application-agnostic and expose standard APIs which make any optimizations available to all apps running on top of it
Cloud-native network stack
New opportunities enabled by service meshes
Enhanced visibility
So far: IP packet
Now: HTTP request --> internal, app-level API call tree
Enhanced visibility is useful for
Monitoring
Troubleshooting
Root cause analysis
Better knowledge of application needs
Dozens of proposals assume knowledge of application specifics
Priority aware flow scheduling
App-aware traffic engineering
...
With service mesh APIs, apps can directly signal preferences
These proposals could be made more practical through service meshes
Easier evolvability
Service meshes provide convenient platform to deploy new functionality
Run in user space on end-hosts
Service Mesh APIs are easily extensible
E.g., WASM extensions can be used to extend sidecars directly
E.g. transport protocols
Coordination with lower layers
SDN controller can provide explicit hints to the sidecar (e.g., via congestion notification)
Example: congestion increases, SDN controller notifies service mesh, service mesh reroutes traffic
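The rerouting example can be sketched as follows. This is only an illustration under stated assumptions: the `ReroutingSidecar` class, the `on_congestion_hint` callback, and the path names are hypothetical, and the notes do not specify how the SDN controller actually delivers its hints.

```python
class ReroutingSidecar:
    """Hypothetical sidecar that avoids paths the SDN controller marks congested."""

    def __init__(self, paths):
        self.order = list(paths)  # paths in order of preference
        self.congested = {path: False for path in self.order}

    def on_congestion_hint(self, path, congested):
        # Assumed callback: invoked when the controller reports a path's state.
        if path not in self.congested:
            raise KeyError(f"unknown path {path!r}")
        self.congested[path] = congested

    def choose_path(self):
        # Use the most preferred path that is not currently congested.
        for path in self.order:
            if not self.congested[path]:
                return path
        # Every path is congested: fall back to the preferred one.
        return self.order[0]


rerouter = ReroutingSidecar(["path-a", "path-b"])
before = rerouter.choose_path()
rerouter.on_congestion_hint("path-a", True)  # controller reports congestion
after = rerouter.choose_path()
print(before, after)
```

The application is not involved at any point: the hint arrives at the sidecar and the sidecar changes the route.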
Case study:
Goal: provide request-level prioritization for latency-sensitive requests in a microservice application that serves a mix of workloads
Aim: implement this prioritization by leveraging the sidecar's capability to do cross-layer coordination, with minimal coupling with the application
Design components
Classify application's performance objectives at the ingress
Carry these performance objectives (i.e. priorities) through the entire system
Implement cross-layer prioritization at the sidecars
Prioritization at the request/message queue level (e.g., at the RabbitMQ queue)
Choice of transport protocol (e.g., BBR, PCC) according to the QoS of the request
Kernel prioritization for latency sensitive packets
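The first two design components, plus queue-level prioritization, can be sketched as below. This is a simplified illustration, not the prototype from the talk: the header name `x-request-priority`, the path-based classification rule, and the in-memory `PriorityRequestQueue` (standing in for a real message queue such as RabbitMQ) are all assumptions.

```python
import heapq
import itertools

PRIORITY_HEADER = "x-request-priority"  # hypothetical header name


def classify_at_ingress(request):
    """Tag a request with a priority; the path rule here is purely illustrative."""
    headers = dict(request.get("headers", {}))
    latency_sensitive = request["path"].startswith("/checkout")
    headers[PRIORITY_HEADER] = "high" if latency_sensitive else "low"
    return {**request, "headers": headers}


def propagate(parent, child):
    """Copy the parent's priority onto a downstream call made on its behalf."""
    headers = dict(child.get("headers", {}))
    headers[PRIORITY_HEADER] = parent["headers"][PRIORITY_HEADER]
    return {**child, "headers": headers}


class PriorityRequestQueue:
    """In-memory stand-in for a message queue that serves high priority first."""

    RANK = {"high": 0, "low": 1}

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # keeps FIFO order within a priority

    def push(self, request):
        rank = self.RANK[request["headers"][PRIORITY_HEADER]]
        heapq.heappush(self._heap, (rank, next(self._seq), request))

    def pop(self):
        return heapq.heappop(self._heap)[2]


batch = classify_at_ingress({"path": "/reports/daily"})
checkout = classify_at_ingress({"path": "/checkout/pay"})
downstream = propagate(checkout, {"path": "/inventory/reserve"})

queue = PriorityRequestQueue()
queue.push(batch)       # arrives first, low priority
queue.push(downstream)  # arrives second, inherits high priority
served = [queue.pop()["path"], queue.pop()["path"]]
print(served)
```

The downstream call is served first even though it arrived later, because it carries the priority assigned to its parent request at the ingress.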
What does this prototype demonstrate?
Use knowledge of app needs across the microservice via provenance tracing (in this case, we trace priority of requests through the microservice)
Easily evolvable sidecar can deploy many optimization schemes without modifying the app
Abstract the details from the app and make these optimizations available to any app running on top of the service mesh layer (i.e., no tight coupling)
Challenges and future directions
Added latency due to sidecars
Explore kernel-bypass techniques to improve performance
Different layers may conflict in terms of their roles and functionality
For example, app layer LB may conflict with network layer LB
How do we define the scope of responsibilities of each layer?
Extending cross-layer prioritization
Include compute and storage level prioritization