ONIX: A Distributed Control Platform for Large-scale Production Networks

https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Koponen.pdf

The paper presents Onix, a network-wide control platform
- The control platform handles state distribution and coordinates state among the platform's servers (an SDN setup)
- Provides a unified framework for understanding the scaling and performance properties of the system
- Provides a higher-level API to control applications
- Takes on the difficulty of implementing distribution mechanisms and deploying them on switches
Essence of SDN: a single control platform implements a range of control functions (e.g. routing, traffic engineering, access control, VM migration) over a spectrum of granularities (e.g. from individual flows to large traffic aggregates) in a variety of contexts (e.g. enterprises, data centers, WANs)
- basic primitives for state distribution should be implemented once in the control platform rather than separately for individual control tasks
- should use well-known and general-purpose techniques from the distributed systems literature rather than the more specialized algorithms found in routing protocols and other network control mechanisms
Challenges: generality, scalability, reliability, simplicity, control plane performance (adequate, not optimal)
Key contributions
- General API
- Flexible distribution primitives (e.g. DHT storage, group memberships)
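To make "DHT storage" concrete, here is a minimal sketch of the general idea: keys are hashed onto a ring and each key is stored on whichever node owns that point. This is generic consistent hashing, not Onix's implementation; `ToyDHT`, the node names, and the key format are all hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Map a string to a 64-bit point on the ring (the width is an arbitrary choice here).
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class ToyDHT:
    """Single-process stand-in for a DHT: one dict per hypothetical node."""

    def __init__(self, nodes):
        self._ring = sorted((_hash(n), n) for n in nodes)
        self._points = [h for h, _ in self._ring]
        self._stores = {n: {} for n in nodes}

    def owner(self, key: str) -> str:
        # The first node clockwise from the key's hash owns the key.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

    def put(self, key: str, value) -> None:
        self._stores[self.owner(key)][key] = value

    def get(self, key: str):
        return self._stores[self.owner(key)].get(key)


nodes = ["onix-a", "onix-b", "onix-c"]  # hypothetical instance names
dht = ToyDHT(nodes)
dht.put("switch-42/port-1", "up")
print(dht.owner("switch-42/port-1"), dht.get("switch-42/port-1"))
```

The point of the sketch is only that any node can compute the owner of a key locally, so a lookup does not need a central directory.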

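Physical infrastructure: switches, routers, and other network elements
- Must expose an interface that lets Onix read and write the state controlling each element's behavior (e.g. forwarding table entries)
- Need not run any software beyond what is required to support this interface and achieve basic connectivity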
Connectivity infrastructure: communication between the physical networking gear and Onix (the "control traffic") transits the connectivity infrastructure
- Needs to support bidirectional communication between Onix instances and switches
Onix: a distributed system running on a cluster of physical servers, each of which may run multiple Onix instances
- 1) gives the control logic programmatic access to the network (reading and writing network state)
- 2) provides the requisite resilience for production deployments
Control logic: implemented on top of Onix's API
- Determines the desired network behavior

The Onix API consists of a data model that represents the network infrastructure, with each network element corresponding to one or more data objects
Through this API, the control logic can:
- Read the current state of the object
- Alter the network state by operating on these objects
- Register for notifications of state changes to these objects
- Customize the data model
- Control the placement and consistency of each component of the network state
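The read / alter / notify operations listed above can be sketched as a small in-memory object store. This is a toy under stated assumptions, not Onix's actual API: `ToyObjectStore`, `Entity`, and the method names `read`, `write`, and `register` are hypothetical, and the sketch runs in one process with no distribution or consistency control.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

# Hypothetical callback shape: (entity id, attribute key, new value).
Callback = Callable[[int, str, Any], None]


class Entity:
    """One network element, represented as a bag of key-value attributes."""

    def __init__(self, entity_id: int) -> None:
        self.entity_id = entity_id
        self.attrs: Dict[str, Any] = {}


class ToyObjectStore:
    def __init__(self) -> None:
        self._entities: Dict[int, Entity] = {}
        self._listeners: Dict[int, List[Callback]] = defaultdict(list)

    def create(self, entity_id: int) -> Entity:
        entity = Entity(entity_id)
        self._entities[entity_id] = entity
        return entity

    def read(self, entity_id: int, key: str) -> Any:
        # Read the current state of an object (None if the key is unset).
        return self._entities[entity_id].attrs.get(key)

    def write(self, entity_id: int, key: str, value: Any) -> None:
        # Alter state, then notify everyone registered on this object.
        self._entities[entity_id].attrs[key] = value
        for callback in self._listeners[entity_id]:
            callback(entity_id, key, value)

    def register(self, entity_id: int, callback: Callback) -> None:
        # Register for notifications of state changes to one object.
        self._listeners[entity_id].append(callback)


store = ToyObjectStore()
store.create(1)
events = []
store.register(1, lambda eid, key, value: events.append((eid, key, value)))
store.write(1, "port_status", "up")
print(store.read(1, "port_status"))
print(events)
```

Here the control logic never talks to a switch directly; it only reads, writes, and subscribes to objects, which is the shape of interaction the notes describe.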
Network Information Base (NIB)
- Graph of entities within a network topology
- Control applications are implemented by reading and writing to the NIB
- Onix provides scalability and resilience by replicating and distributing the NIB across multiple running instances, as configured by the application
- Details
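  - Network entities: each holds a set of key-value pairs and is identified by a flat, 64-bit global identifier
  - Typed entities: can contain a predefined set of attributes and methods to perform operations over these attributes
  - No fine-grained or distributed locking: the NIB only offers a mechanism to request and release exclusive access to the NIB data structure of the local instance. Coordination across instances must use mechanisms external to the NIB
  - All operations are asynchronous: an update is only guaranteed to be sent out eventually, with no ordering or latency guarantees. A synchronization primitive is provided for control logic that needs to know when its updates have completed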
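The asynchronous-update behaviour and the synchronization primitive described above can be illustrated with a toy queue. Everything here is an assumption made for illustration: `AsyncToyNIB`, `update`, `sync`, and `drain` are hypothetical names, and the sketch applies updates in FIFO order for simplicity, which is a stronger ordering than the notes say the real system guarantees.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, Tuple


class AsyncToyNIB:
    """Updates are queued and applied later; sync() is a barrier with a callback."""

    def __init__(self) -> None:
        self.state: Dict[Tuple[int, str], Any] = {}
        self._pending: Deque[tuple] = deque()

    def update(self, entity_id: int, key: str, value: Any) -> None:
        # Returns immediately; the change is not visible until it is drained.
        self._pending.append(("update", (entity_id, key), value))

    def sync(self, on_done: Callable[[], None]) -> None:
        # The callback fires once every update queued before it has been applied.
        self._pending.append(("sync", on_done, None))

    def drain(self) -> None:
        # Stand-in for the platform pushing queued updates out in the background.
        while self._pending:
            kind, first, second = self._pending.popleft()
            if kind == "update":
                self.state[first] = second
            else:
                first()


nib = AsyncToyNIB()
log = []
nib.update(7, "mtu", 1500)
nib.update(7, "mtu", 9000)
nib.sync(lambda: log.append(dict(nib.state)))
visible_before = dict(nib.state)  # nothing has been applied yet
nib.drain()
print(visible_before, log)
```

The caller gets control back right after `update`, and only the `sync` callback tells it that the earlier writes have taken effect.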

The platform needs to scale as the number of elements in the network increases
- What's the normal scale?
NIB distribution framework

In-band vs. out-of-band control
- In-band: control traffic shares the same forwarding elements as the data traffic in the network
- Out-of-band: a separate physical network is used to carry the control traffic
