ONIX: A Distributed Control Platform for Large-scale Production Networks
https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Koponen.pdf
Contribution
Present a network-wide control platform
Control platform handles state distribution and coordinates the state among various platform servers (SDN setup)
Provides a unified framework for understanding the scaling and performance properties of the system
Provides a higher level API
Handles difficulty of implementing distribution mechanisms and deploying them on switches
Essence of SDN: a single control platform implements a range of control functions (e.g. routing, traffic engineering, access control, VM migration) over a spectrum of granularities (e.g. from individual flows to large traffic aggregates) in a variety of contexts (e.g. enterprises, data centers, WANs)
Basic primitives for state distribution should be implemented once in the control platform rather than separately for individual control tasks
The platform should use well-known, general-purpose techniques from the distributed systems literature rather than the more specialized algorithms found in routing protocols and other network control mechanisms
Challenges: generality, scalability, reliability, simplicity, control plane performance (adequate, not optimal)
Key contribution
General API
Flexible distribution primitives (e.g. DHT storage, group memberships)
Design
Components
Physical infrastructure: switches, routers, other network elements
Run only the software required to support the interface through which Onix reads and writes their state, and to achieve basic connectivity
Connectivity infrastructure: the communication between the physical networking gear and Onix (the "control traffic") transits the connectivity infrastructure
Need to support bi-directional communication between Onix instances and switches
Onix: distributed system running on a cluster of physical servers, each of which might run multiple Onix instances
1) Gives the control logic programmatic access to the network
2) Provides the requisite resilience for production deployments
Control logic: implement on top of Onix's API
Determine desired network behavior
API: useful and general
Consists of a data model that represents the network infrastructure, with each network element corresponding to one or more data objects
Control logic functionality
Read the current state of the object
Alter the network state by operating on these objects
Register for notifications of state changes to these objects
Customize the data model
Have control over placement and consistency of each component of the network state
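The read / alter / register-for-notifications pattern above can be sketched in a few lines of Python. This is only an illustration of the idea: `Entity`, `get`, `set`, and `subscribe` are hypothetical names chosen for this note, not Onix's actual API.

```python
from typing import Any, Callable, Dict, List

# Hypothetical names throughout: this is not Onix's actual API.
Callback = Callable[[str, Any, Any], None]  # (key, old_value, new_value)


class Entity:
    """A network element modeled as a data object holding key-value attributes."""

    def __init__(self, entity_id: int) -> None:
        self.entity_id = entity_id
        self._attrs: Dict[str, Any] = {}
        self._listeners: List[Callback] = []

    def get(self, key: str, default: Any = None) -> Any:
        """Read the current state of the object."""
        return self._attrs.get(key, default)

    def set(self, key: str, value: Any) -> None:
        """Alter the state, then notify every registered listener."""
        old = self._attrs.get(key)
        self._attrs[key] = value
        for listener in list(self._listeners):
            listener(key, old, value)

    def subscribe(self, listener: Callback) -> None:
        """Register for notifications of state changes to this object."""
        self._listeners.append(listener)


# Usage: control logic watches a port object and reacts to changes.
port = Entity(entity_id=1)
events = []
port.subscribe(lambda key, old, new: events.append((key, old, new)))
port.set("admin_state", "up")
port.set("admin_state", "down")
print(events)
```

In this sketch notifications fire synchronously inside `set`; a real platform would more likely deliver them asynchronously.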
Network Information Base (NIB)
Graph of entities within a network topology
Control applications are implemented by reading and writing to the NIB
Onix: provides scalability and resilience by replicating and distributing the NIB between multiple running instances (with app-specific logic)
Details
Network entities: each holds a set of key-value pairs and is identified by a flat, 64-bit global identifier
typed entities: can contain a predefined set of attributes and methods to perform operations over these attributes
No fine-grained or distributed locking mechanisms, just a mechanism to request and release exclusive access to the NIB data structure of the local instance
Coordination across Onix instances must use a mechanism external to the NIB
All operations are asynchronous, but the NIB also provides a synchronization primitive
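A minimal sketch of the two properties above, assuming a single process: exclusive access is a lock local to one instance, and updates are exported asynchronously unless the caller explicitly waits. `LocalNib`, `exclusive`, `update`, and `sync` are hypothetical names, and the `exported` list merely stands in for messages sent to switches or other instances.

```python
import queue
import threading
from typing import Any, Dict


class LocalNib:
    """Hypothetical single-instance NIB: local lock plus asynchronous export."""

    def __init__(self) -> None:
        self._entities: Dict[int, Dict[str, Any]] = {}
        self._lock = threading.RLock()   # protects only this instance's copy
        self._pending = queue.Queue()    # updates not yet exported
        self.exported = []               # stand-in for messages sent out
        threading.Thread(target=self._drain, daemon=True).start()

    def exclusive(self):
        """Return the local lock; it says nothing about other instances."""
        return self._lock

    def update(self, entity_id: int, key: str, value: Any) -> None:
        """Apply the change locally and queue it for export; returns at once."""
        with self._lock:
            self._entities.setdefault(entity_id, {})[key] = value
        self._pending.put((entity_id, key, value))

    def sync(self) -> None:
        """Block until every queued update has been exported."""
        self._pending.join()

    def _drain(self) -> None:
        # Background worker that "sends" queued updates one by one.
        while True:
            item = self._pending.get()
            self.exported.append(item)
            self._pending.task_done()


nib = LocalNib()
switch_id = 1  # hypothetical identifier
with nib.exclusive():
    nib.update(switch_id, "name", "edge-1")
    nib.update(switch_id, "ports", 48)
nib.sync()  # without this call, the export may not have happened yet
print(len(nib.exported))
```

The lock here guards only the local data structure, which is why coordination between instances needs something outside the NIB.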
Scaling and reliability
Need to scale (i.e. as the # of elements of the network increases)
What's the normal scale?
NIB distribution framework
Scaling
Three strategies
Allow control applications to partition the workload so that adding instances reduces work without merely replicating it
Each controller can use the NIB’s representation of the complete physical topology (from the replicated database), coupled with link utilization data (from the DHT), to configure the forwarding state necessary to ensure paths meeting the deployment’s requirements throughout the network
Allow for aggregation in which the network managed by a cluster of Onix nodes appears as a single node in a separate cluster's NIB
Federated, hierarchical structuring of Onix clusters
Provide applications with control over the consistency and durability of the network state
Two data stores
State for which applications favor durability and stronger consistency: replicated transactional database
Volatile state that is more tolerant of inconsistency: memory-based one-hop DHT
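To make the "one-hop" idea concrete: every node knows the full membership, so it can compute a key's owner locally and reach it directly, with no multi-hop routing. The sketch below simulates this in one process using consistent hashing. The class `OneHopDht`, the node names, and the key format are assumptions made for illustration; the paper's actual DHT and the replicated transactional database are not modeled here.

```python
import bisect
import hashlib
from typing import Any, Dict, List, Tuple


def _ring_position(name: str) -> int:
    """Map a node name or key to a position on a 64-bit hash ring."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest()[:8], "big")


class OneHopDht:
    """Hypothetical in-process simulation of a memory-based one-hop DHT."""

    def __init__(self, node_names: List[str]) -> None:
        # Full membership is known everywhere, so lookups need no routing.
        self._ring: List[Tuple[int, str]] = sorted(
            (_ring_position(n), n) for n in node_names
        )
        self._positions = [pos for pos, _ in self._ring]
        # One in-memory store per node; nothing is written to disk.
        self._stores: Dict[str, Dict[str, Any]] = {n: {} for n in node_names}

    def owner(self, key: str) -> str:
        """Return the first node clockwise from the key's ring position."""
        idx = bisect.bisect_right(self._positions, _ring_position(key))
        return self._ring[idx % len(self._ring)][1]

    def put(self, key: str, value: Any) -> None:
        self._stores[self.owner(key)][key] = value

    def get(self, key: str, default: Any = None) -> Any:
        return self._stores[self.owner(key)].get(key, default)


dht = OneHopDht(["onix-1", "onix-2", "onix-3"])
dht.put("link:s1-s2:utilization", 0.42)  # volatile, frequently updated state
print(dht.get("link:s1-s2:utilization"))
```

State kept this way is lost if the owning node fails, which fits volatile data such as link utilization but not state that needs durability.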
Notion
In-band vs. out-of-band
In-band: control traffic shares the same forwarding elements as the data traffic in the network
Out-of-band: a separate physical network is used to handle the control traffic