Invisinets

Problem

Cloud tenant networks are complex to provision, configure, and manage: tenants must assemble a large set of low-level building blocks in order to achieve their high-level goals. This complexity scales poorly both within a single cloud and across clouds, as deployments increasingly span multiple clouds and on-premises infrastructure.

Main Insight

  • The guiding philosophy of this paper is that "the right way for a tenant to think about the network is not to think about it at all".

  • The root of the complexity is that tenant endpoints live in a private IP address space, so tenants must establish (virtual) connectivity between these private addresses themselves (e.g., managing subnets, constructing a virtual topology of links, routers, and appliances, running routing protocols, etc.).

  • The paper simplifies connectivity with only minor infrastructure changes, primarily by giving every endpoint an existing "publicly-routable but default-off" address. Taking L3 connectivity as a given, the authors develop a new declarative, endpoint-centric API that lets cloud tenant operators reserve network resources through high-level abstractions (a sketch of what such an API could look like follows).
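
A minimal sketch, in Python, of what such a declarative, endpoint-centric connectivity API could look like. The names (Endpoint, PermitRule, apply_permit_list) are illustrative assumptions, not the paper's actual interface: the tenant only declares who may reach an endpoint, while subnets, routes, and appliances remain the provider's responsibility.

    from __future__ import annotations
    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class PermitRule:
        """Allow traffic from a peer (tag or address) on a protocol/port."""
        peer: str                  # e.g. another endpoint tag or a CIDR
        protocol: str              # "tcp" | "udp" | "icmp"
        port: int | None = None    # None = all ports

    @dataclass
    class Endpoint:
        """A publicly-routable but default-off endpoint."""
        name: str
        address: str                                   # provider-assigned public address
        permits: set[PermitRule] = field(default_factory=set)

    def apply_permit_list(ep: Endpoint, rules: list[PermitRule]) -> None:
        """Declaratively replace the endpoint's permit list; realizing the
        connectivity is the provider's job, not the tenant's."""
        ep.permits = set(rules)

    # Usage: allow HTTPS traffic to a web endpoint from a database-tier tag.
    web = Endpoint("web-1", "203.0.113.10")
    apply_permit_list(web, [PermitRule(peer="tag:db-tier", protocol="tcp", port=443)])
    print(web.permits)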

Key strength

  • Simplicity, operability: the declarative API is designed around high-level tenant goals, letting tenants achieve connectivity trivially while shifting responsibility to the cloud provider wherever possible. This greatly reduces complexity for tenants, cuts the number of tenant errors, and lowers the support burden on the cloud provider.

  • Deployability: the API can co-exist with existing abstractions rather than requiring a complete replacement; incremental deployability eases adoption.

  • The simplicity metrics and the case studies are particularly interesting and appealing.

Key weakness

  • Tenants may view public IPs as a risk for their endpoints, since they remove the layer of isolation provided by a virtual network's private address space.

  • The QoS API rests on assumptions such as good inter-cloud connectivity (so that dedicated egress between a cloud region and another domain can approximate a direct link between two clouds), and it requires a cloud provider that can classify egress traffic into reserved-bandwidth and best-effort packets. Is that true of cloud providers in general? (A sketch of the assumed reservation and classification model follows this list.)
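
A minimal sketch, in Python, of the per-tenant, per-region reservation and the egress classification it assumes. The names and the simple rate-budget classification are illustrative assumptions, not the paper's mechanism.

    from dataclasses import dataclass

    @dataclass
    class Reservation:
        tenant: str
        src_region: str      # e.g. "cloudA:us-west"
        dst_domain: str      # another cloud region or an on-prem site
        gbps: float          # reserved egress bandwidth

    def reserve_egress(reservations: list[Reservation], r: Reservation) -> None:
        """Tenant declares the reservation; the provider must provision it on
        its dedicated egress toward the destination domain."""
        reservations.append(r)

    def classify_egress(bytes_sent_this_interval: int, r: Reservation,
                        interval_s: float = 1.0) -> str:
        """Provider-side classification assumed by the API: traffic under the
        reserved rate is 'reserved', any excess is 'best-effort'."""
        budget = r.gbps * 1e9 / 8 * interval_s   # bytes allowed per interval
        return "reserved" if bytes_sent_this_interval <= budget else "best-effort"

    # Usage: reserve 5 Gbps from cloudA:us-west toward cloudB:eu-central.
    reservations: list[Reservation] = []
    resv = Reservation("tenant-42", "cloudA:us-west", "cloudB:eu-central", 5.0)
    reserve_egress(reservations, resv)
    print(classify_egress(400_000_000, resv))   # under the 5 Gbps budget -> "reserved"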

Questions

  • QoS is per region, per tenant; are there any ways to enforce QoS within a single region? [like NetHint]

  • Are there better simplicity metrics? E.g., different boxes, configuration points, or links may require different levels of knowledge to configure, so they could be weighted differently (a small sketch follows).
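
A small sketch, in Python, of the weighted metric this question hints at: instead of counting all configuration points equally, weight each kind by an assumed difficulty. The categories and weights are illustrative only.

    # Assumed difficulty weights per kind of configuration element.
    DIFFICULTY = {"link": 1, "configuration_point": 2, "box": 3}

    def weighted_complexity(counts: dict[str, int]) -> int:
        """Sum of (count x assumed difficulty) over each configuration category."""
        return sum(DIFFICULTY[kind] * n for kind, n in counts.items())

    # Usage: a setup with 4 links, 6 configuration points, and 2 middleboxes.
    print(weighted_complexity({"link": 4, "configuration_point": 6, "box": 2}))  # 22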
