
# Leveraging Service Meshes as a New Network Layer

* A recent shift from monolithic to microservice architectures
  * For better scalability and functionality
  * Microservices talk to each other through network calls
  * Network communication has become intrinsic to an application's functioning
* Visualization of a large-scale microservice application
  * Usually a multi-hop DAG with many stages
  * Developers end up writing a lot of "communication code"
* **Service meshes have emerged to "factor out communication" in microservices**
  * Take the existing microservices and convert them into an architecture where all the communication functionality has been factored out and placed within a separate process called a sidecar
  * The sidecar performs all inbound and outbound network communication
  * The sidecar routes traffic at the granularity of an application-level request (e.g., HTTP)
  * The end-to-end path involves 3 transport connections
    * Microservice A --> sidecar A (source)
    * Sidecar A --> sidecar B
    * Sidecar B --> Microservice B (destination)
  * **Data plane** (i.e., sidecar) offers:
    * Load balancing: between replicas of the service
    * Name resolution: the app does not need to keep track of IP addresses
    * Encryption
    * Tracing
    * Authorization
    * ...
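As a rough illustration of the name resolution and load balancing that the data plane takes off the application's hands, a toy sidecar might look like the Python sketch below. The `Sidecar` class, the `registry` contents, and the replica addresses are all hypothetical and chosen only for illustration; a production sidecar is far more involved.

```python
import itertools


class Sidecar:
    """Toy sidecar: resolves a service name and round-robins across its replicas."""

    def __init__(self, registry):
        # registry maps a service name to replica addresses (hypothetical data
        # that a control plane would normally push to the sidecar).
        self._cycles = {
            name: itertools.cycle(addrs) for name, addrs in registry.items()
        }

    def route(self, service_name):
        """Return the replica address that should receive the next request."""
        if service_name not in self._cycles:
            raise KeyError(f"unknown service: {service_name}")
        return next(self._cycles[service_name])


# The application only names the service; it never sees replica addresses.
registry = {"cart": ["10.0.0.1:8080", "10.0.0.2:8080"]}
sidecar = Sidecar(registry)
picks = [sidecar.route("cart") for _ in range(3)]
```

Here the caller asks for `"cart"` by name, and successive requests are spread across the two assumed replicas.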
* **Control plane** offers a global view and policy
  * Service discovery
  * Certificate management
  * Configuration management
* The data plane executes local functions
* Service meshes as a new network layer
  * How is it a layer?
    * Service mesh abstracts lower layer functionality for the application
    * E.g., the app is unaware of which exact replica of a service it is talking to
    * All inbound/outbound app traffic goes through the service mesh via the sidecars
  * Why is that a useful abstraction?
    * Highlights service meshes as a convenient location to observe and enhance cross-layer communication
    * Can be application-agnostic and expose standard APIs, which makes any optimization available to all apps running on top of it
  * Cloud-native network stack
* New opportunities enabled by service meshes
  * **Enhanced visibility**
    * So far: visibility at the level of IP packets
    * Now: visibility at the level of HTTP requests --> internal app-level API call tree
    * Enhanced visibility is useful for
      * Monitoring
      * Troubleshooting
      * Root cause analysis
  * **Better knowledge of application needs**
    * Dozens of proposals assume knowledge of application specifics
      * Priority-aware flow scheduling
      * App-aware traffic engineering
      * ...
    * With service mesh APIs, apps can directly signal preferences
    * These proposals could be made more practical through service meshes
  * **Easier evolvability**
    * Service meshes provide a convenient platform to deploy new functionality
      * Run in user space on end-hosts
      * Service mesh APIs are easily extensible
      * E.g., WASM extensions can be used to extend sidecars directly
      * E.g., new transport protocols can be deployed in the sidecar
  * **Coordination with lower layers**
    * An SDN controller can provide explicit hints to the sidecar (e.g., via congestion notification)
    * Example: congestion increases, the SDN controller notifies the service mesh, and the service mesh reroutes traffic
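The congestion example can be sketched in Python as a small callback interface between the two layers. This is only a sketch under stated assumptions: `MeshRouter`, `on_congestion_hint`, and the path names are hypothetical, and the notes do not specify how the SDN controller and the service mesh actually exchange hints.

```python
class MeshRouter:
    """Toy service-mesh router that reacts to congestion hints from a lower layer."""

    def __init__(self, paths):
        # paths: hypothetical path names, all initially marked uncongested.
        self.congested = {path: False for path in paths}

    def on_congestion_hint(self, path, congested):
        """Callback that a (hypothetical) SDN controller would invoke."""
        self.congested[path] = congested

    def pick_path(self):
        """Prefer the first uncongested path; fall back to the first path."""
        for path, is_congested in self.congested.items():
            if not is_congested:
                return path
        return next(iter(self.congested))


router = MeshRouter(["path-a", "path-b"])
before = router.pick_path()                # no hints yet, so the first path is used
router.on_congestion_hint("path-a", True)  # controller reports congestion
after = router.pick_path()                 # traffic moves to the other path
```

The application itself is untouched: the reroute happens entirely inside the mesh in response to the hint.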
* Case study
  * ![](/files/p6jV57jONKQdalYRJvc9)
  * Goal: provide request-level prioritization for latency-sensitive requests in a microservice application that serves a mix of workloads
  * Aim: implement this prioritization by leveraging the capability of the sidecar to do cross-layer coordination, with minimal coupling with the application
  * Design components
    * Classify the application's performance objectives at the ingress
    * Carry these performance objectives (i.e., priorities) through the entire system
    * Implement cross-layer prioritization at the sidecars
      * Prioritization at the request/message queue level (e.g., at the RabbitMQ queue)
      * Choice of transport protocol (e.g., BBR, PCC) according to the QoS of the request
      * Kernel prioritization for latency-sensitive packets
    * What does this prototype demonstrate?
      * Using knowledge of app needs across the microservices via provenance tracing (in this case, tracing the priority of requests through the microservices)
      * An easily evolvable sidecar can deploy many optimization schemes without modifying the app
      * Abstracting the details from the app makes these optimizations available to any app running on top of the service mesh layer (i.e., no tight coupling)
  * Challenges and future directions
    * Added latency due to sidecars
      * Explore kernel-bypass techniques to improve performance
    * Different layers may conflict in terms of their roles and functionality
      * For example, app-layer LB may conflict with network-layer LB
      * How do we define the scope of responsibilities?
    * Extending cross-layer prioritization
      * Include compute- and storage-level prioritization
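The first design components of the case study (classify at the ingress, carry the priority with the request, prioritize at the request queue) can be sketched in Python as follows. This is a minimal sketch, not the prototype's implementation: the header name `x-request-priority`, the `/checkout` classification rule, and the `SidecarQueue` class are assumptions made for illustration, and an in-memory heap stands in for a real message queue.

```python
import heapq
import itertools

PRIORITY_HEADER = "x-request-priority"  # hypothetical header name


def classify_at_ingress(request):
    """Tag a request with a priority (0 = latency-sensitive, 1 = background)."""
    latency_sensitive = request["path"].startswith("/checkout")  # assumed rule
    headers = dict(request.get("headers", {}))
    headers[PRIORITY_HEADER] = "0" if latency_sensitive else "1"
    return {**request, "headers": headers}


class SidecarQueue:
    """Toy request queue inside a sidecar that serves lower priority values first."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def enqueue(self, request):
        # The priority travels with the request, so any hop can read it.
        priority = int(request["headers"].get(PRIORITY_HEADER, "1"))
        heapq.heappush(self._heap, (priority, next(self._seq), request))

    def dequeue(self):
        return heapq.heappop(self._heap)[2]


queue = SidecarQueue()
for path in ["/reports/daily", "/checkout/pay", "/reports/weekly"]:
    queue.enqueue(classify_at_ingress({"path": path}))
order = [queue.dequeue()["path"] for _ in range(3)]
```

The latency-sensitive request is dequeued ahead of the background requests even though it arrived second, and the application code never inspects the priority itself.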
