Data centers are the digital-era analogue of factories and are gaining huge popularity among service providers and enterprises. The golden rule of designing and operating a data center is maximizing the amount of useful work per dollar spent. To meet this goal, the most desirable technical feature is agility—the ability to assign any computing resource to any tenant at any time. Anything less inevitably results in stranded resources and poor performance as perceived by data-center users.
In this talk, I first show how and why conventional networks, even those specifically designed for large data centers, inhibit rather than facilitate agility. Location-dependent constraints, large and unpredictable performance variance, and poor scalability are the main culprits. I then put forward network virtualization as the key architecture for ensuring agility at scale. The core properties of my network-virtualization architecture—abstraction, isolation, and efficiency—allow service providers to build a network that is not susceptible to location dependencies or poor scalability in the first place, eliminating the heavy burden of complicated and less effective cross-layer optimizations otherwise needed to work around those constraints. Next, I explain how I turned this architecture into an operational system that virtualizes mega data-center networks for a real-world cloud service. In particular, I show how my virtualization architecture and its realization uniquely take advantage of a few critical opportunities and technical trends in data centers, ranging from the power of the software switch present in every hypervisor, to the principle of separating network state from host state, to the availability of commodity switching ASICs. Finally, I evaluate how faithfully the resulting system meets the goal of offering a powerful and simple virtual-network abstraction: an imaginary switch that can host as many servers as a customer desires, offers predictably and uniformly high capacity between any servers under any traffic pattern, and yet is dedicated solely to that customer.
Changhoon Kim works at Windows Azure, Microsoft's cloud-service division, where he leads research and engineering projects on the architecture, performance, management, and operation of datacenter and enterprise networks. His research themes span network virtualization, self-configuring networks, and debugging and diagnosis of large-scale distributed systems. Changhoon received his Ph.D. from Princeton University in 2009, where he worked with Prof. Jennifer Rexford. Many of his research outcomes (including SEATTLE, VL2, VNet, Seawall, and the relay-routing technology for VPNs) have either been directly adopted by production service providers or are under review by standards bodies such as the IETF. In particular, his VL2 work was published as an invited paper in the Research Highlights section of the Communications of the ACM (CACM), which the editors recognize as "one of the most important research results published in CS in recent years".