Scalable and Efficient Self-configuring Networks (thesis)
Abstract:
Managing today's data networks is highly expensive, difficult, and error-prone. At the center of this enormous difficulty lies configuration: a Sisyphean task of updating operational settings of numerous network devices and protocols. Much has been done to mask this configuration complexity intrinsic to conventional networks, but little effort has been made to redesign the networks themselves to make them easier to configure.
As part of a broad effort to rearchitect networks with ease of configuration in mind, this dissertation focuses on enabling self-configuration in edge networks (corporate and university-campus networks, data-center networks, and virtual private networks), which are rapidly growing and yet significantly under-explored. To see wide deployment, however, self-configuring networks must also be scalable and efficient. To this end, we first identify three technical principles: flat addressing (enabling self-configuration), traffic indirection (enhancing scalability), and usage-driven optimization (improving efficiency). Then, to demonstrate the benefits of these principles, we design, implement, and deploy practical network architectures built upon them.
Our first architecture, SEATTLE, combines Ethernet's self-configuration capability with IP's scalability and efficiency. Its key contribution is a novel host-information resolution system that leverages the strong consistency of a network-layer routing protocol. The resulting architecture lets enterprises and campuses build a large-scale plug-and-play network. Extensive simulation and emulation tests, conducted by replaying real-world traffic traces on various real network topologies built with prototype SEATTLE switches, confirm that SEATTLE efficiently handles network failures and host mobility, while reducing control-plane overhead and state requirements by roughly two orders of magnitude compared with Ethernet bridging.
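The resolution idea can be illustrated with a small sketch. Because every switch learns the full switch membership from the link-state routing protocol, all switches can independently hash a host identifier onto the same resolver switch, with no directory servers to configure. The ring construction and names below (`ResolverRing`, `resolver_for`, the 32-bit hash) are illustrative assumptions, not the dissertation's actual implementation:

```python
import hashlib
from bisect import bisect_right

def h(key: str) -> int:
    # stable hash onto a fixed 32-bit ring (SHA-1, truncated)
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2**32)

class ResolverRing:
    """Consistent-hash ring mapping host identifiers to resolver switches.

    Every switch builds the same ring from the link-state view of the
    switch membership, so lookups agree network-wide by construction.
    """
    def __init__(self, switches):
        self.ring = sorted((h(s), s) for s in switches)
        self.points = [p for p, _ in self.ring]

    def resolver_for(self, host_mac: str) -> str:
        # the resolver is the first switch clockwise from the host's hash
        i = bisect_right(self.points, h(host_mac)) % len(self.ring)
        return self.ring[i][1]

switches = [f"sw{i}" for i in range(4)]
ring = ResolverRing(switches)
resolver = ring.resolver_for("aa:bb:cc:00:00:01")
# any other switch, computing independently, picks the same resolver
assert ResolverRing(switches).resolver_for("aa:bb:cc:00:00:01") == resolver
```

A consistent-hash ring also keeps churn local: when a switch fails, only the host entries it resolved move to a neighbor on the ring, which is one reason this style of resolution scales better than flooding.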
Our second solution, VL2, enables a plug-and-play network for a large cloud-computing data center. The core objective of a data-center network is to keep server utilization uniformly high, and doing so requires agility: the ability to assign any available server to any service. VL2 ensures agility by establishing reachability without addressing and routing configuration, and by furnishing huge server-to-server capacity. Meanwhile, VL2 delivers all these benefits using only commodity IP and Ethernet functions, without any expensive high-performance components. We built a prototype VL2 network using commodity Ethernet switches interconnecting hundreds of servers. Tests with various real and synthetic traffic patterns confirm that the VL2 design achieves 93% of the optimal utilization in the worst case. Our prototype network will soon be expanded into a cloud-computing cluster of more than a thousand servers offering real-world service to customers.
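The "huge server-to-server capacity" rests on spreading traffic over many cheap paths rather than over a few expensive ones; in VL2 this takes the form of Valiant load balancing, where each flow is bounced off a randomly chosen intermediate switch. The sketch below is a toy model of that idea only; the topology, names, and per-flow seeding are assumptions for illustration:

```python
import random

def vlb_path(src_tor, dst_tor, intermediates, flow_id):
    """Valiant load balancing: route each flow via a randomly chosen
    intermediate switch so offered load spreads evenly over the core.
    Seeding by flow_id keeps a flow's packets on one path (in order)."""
    rng = random.Random(flow_id)
    mid = rng.choice(intermediates)
    return [src_tor, mid, dst_tor]

core = [f"int{i}" for i in range(4)]
paths = [vlb_path("tor1", "tor2", core, f) for f in range(1000)]
loads = {m: sum(p[1] == m for p in paths) for m in core}
# with many flows, each intermediate carries roughly 1/4 of the load
```

Randomizing per flow (not per packet) is the usual compromise: it avoids TCP reordering while still averaging out hot spots across the core, which is what lets commodity switches substitute for a big high-end fabric.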
Then we turn our focus to VPNs, networks that interconnect geographically distributed corporate sites through public carrier networks. VPNs today are built with an efficient self-configuring architecture, which allows customer sites to autonomously choose their own address blocks and communicate with one another through the shortest (i.e., most efficient) paths. This architecture, however, blindly replicates customers' routing information at every router in the VPN provider network and thus rapidly depletes router memory as the number of customers grows, significantly impairing scalability. Our solution, Relaying, is based on traffic indirection: it lowers routers' memory footprint by choosing a small number of hub routers in each VPN that maintain full routing information, and by allowing non-hub routers to keep a single default route to a hub. Relaying can be implemented via a slight modification to the routers' configuration, without any router hardware or software upgrades. Extensive evaluations using real traffic matrices, routing configurations, and VPN topologies demonstrate that Relaying reduces routing tables by up to 90%, while masking the additional latency and workload introduced by indirection.
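The memory saving follows directly from the hub/spoke split described above: hubs keep the full customer routing table, while every other router keeps just a default route pointing at a hub. A minimal sketch of that accounting (router and prefix names are made up; the real hub-selection problem, which must also bound indirection cost, is far richer):

```python
def relaying_tables(routers, hubs, prefixes):
    """Assign routing state under Relaying: hub routers keep every
    customer prefix; non-hub routers keep one default route to a hub."""
    tables = {}
    for r in routers:
        if r in hubs:
            tables[r] = set(prefixes)       # full customer routing table
        else:
            tables[r] = {"0.0.0.0/0"}       # single default route
    return tables

routers = [f"pe{i}" for i in range(10)]
prefixes = [f"10.{i}.0.0/16" for i in range(100)]
tables = relaying_tables(routers, hubs={"pe0"}, prefixes=prefixes)

full = len(routers) * len(prefixes)         # state under full replication
relayed = sum(len(t) for t in tables.values())
print(f"reduction: {1 - relayed / full:.0%}")   # prints "reduction: 89%"
```

Even this toy instance shows the shape of the trade-off: one hub out of ten routers cuts total state by roughly 89%, at the cost of detouring spoke-to-spoke traffic through the hub, which is why the evaluation must also account for the latency and workload of indirection.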