Floodless in SEATTLE: A Scalable Ethernet Architecture for Large Enterprises. Changhoon Kim and Jennifer Rexford, Princeton University.

Floodless in SEATTLE: A Scalable Ethernet Architecture for Large Enterprises. Changhoon Kim and Jennifer Rexford, Princeton University

2 Goals of Today's Lecture
 Reviewing Ethernet bridging (Lec. 10, 11)
- Flat addressing and plug-and-play networking
- Flooding, broadcasting, and spanning trees
- VLANs
 New challenges to Ethernet
- Control-plane scalability: avoiding flooding and reducing routing-protocol overhead
- Data-plane efficiency: enabling shortest-path forwarding and load balancing
 SEATTLE as a solution
- An amalgamation of various networking technologies covered so far, e.g., link-state routing, name resolution, encapsulation, and DHTs

Quick Review of Ethernet

4 Ethernet
 Dominant wired LAN technology
- Covers the first IP hop in most enterprises and campuses
 First widely used LAN technology
- Simpler and cheaper than token LANs, ATM, and IP
- Kept up with the speed race: 10 Mbps to 10 Gbps
[Figure: Metcalfe's original Ethernet sketch]

5 Ethernet Frame Structure
 Addresses: source and destination MAC addresses
- Flat, globally unique, and permanent 48-bit values
- The adapter passes a frame to the network-level protocol if the destination address matches its own, or if the destination is the broadcast address
- Otherwise, the adapter discards the frame
 Type: indicates the higher-layer protocol
- Usually IP

6 Ethernet Bridging: Routing at L2
 Routing determines the paths to destinations through which traffic is forwarded
 Routing takes place at any layer (including L2) where devices are reachable across multiple hops
- IP routing (Lec. 13-15)
- Overlay routing (Lec. 17)
- P2P or CDN routing (Lec. 18)
- Ethernet bridging (Lec. 10, 11)

7 Ethernet Bridges Self-learn Host Info
 Bridges (switches) forward frames selectively
- Forward frames only on segments that need them
 Switch table
- Maps destination MAC address to outgoing interface
- Goal: construct the switch table automatically
[Figure: a switch connecting hosts A, B, C, and D]

8 Self Learning: Building the Table
 When a frame arrives
- Inspect the source MAC address
- Associate that address with the incoming interface
- Store the mapping in the switch table
- Use a time-to-live field to eventually forget the mapping
[Figure: the switch learns how to reach A]

9 Self Learning: Handling Misses
 The switch floods when a frame arrives with an unfamiliar destination or a broadcast address
- Forward the frame out all of the interfaces
- … except for the one on which the frame arrived
- Hopefully, this case won't happen very often
"When in doubt, shout!"
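The self-learning and flooding behavior on the last two slides can be sketched as a toy model (hypothetical code with invented port numbers and a simple TTL; real switches implement this in hardware):

```python
import time

class LearningSwitch:
    """Minimal illustrative model of an Ethernet learning switch."""

    def __init__(self, ports, ttl=300):
        self.ports = ports          # e.g., [1, 2, 3, 4]
        self.ttl = ttl              # seconds before a learned entry expires
        self.table = {}             # MAC -> (port, expiry time)

    def receive(self, src_mac, dst_mac, in_port):
        # Self-learning: associate the source MAC with the incoming port.
        self.table[src_mac] = (in_port, time.time() + self.ttl)

        # Forward/filter decision.
        entry = self.table.get(dst_mac)
        if dst_mac == "ff:ff:ff:ff:ff:ff" or entry is None or entry[1] < time.time():
            # Broadcast or unknown destination: flood out every port
            # except the one the frame arrived on ("when in doubt, shout").
            return [p for p in self.ports if p != in_port]
        return [entry[0]]           # known destination: forward selectively

sw = LearningSwitch(ports=[1, 2, 3, 4])
flood = sw.receive("aa", "bb", in_port=1)  # "bb" unknown: flood, but learn "aa" is on port 1
out = sw.receive("bb", "aa", in_port=2)    # "aa" now known: forward only to port 1
```

Note how the table is populated purely as a side effect of observing source addresses; no protocol messages are exchanged, which is exactly what makes bridging plug-and-play.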

10 Flooding Can Lead to Loops
 Flooding can lead to forwarding loops, confuse bridges, and even collapse the entire network
- E.g., if the network contains a cycle of switches
- Either accidentally, or by design for higher reliability

11 Solution: Spanning Trees
 Ensure the topology has no loops
- Avoid using some of the links when flooding
- … to avoid forming a loop
 Spanning tree
- A sub-graph that covers all vertices but contains no cycles
- Links not in the spanning tree do not forward frames
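The sketch below shows why flooding over tree links cannot loop: a spanning tree of the switch graph keeps |V| - 1 links and blocks the rest. This is a BFS simplification with a fixed root (the real 802.1D protocol elects the root and exchanges BPDUs):

```python
from collections import deque

def spanning_tree(adjacency, root):
    """Compute a spanning tree of the switch graph by BFS from the root."""
    tree_links = set()
    visited = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in visited:
                visited.add(v)
                tree_links.add(frozenset((u, v)))
                queue.append(v)
    return tree_links

# A cycle of three switches: one link is left out of the tree,
# so frames flooded only along tree links can never circulate.
adj = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
tree = spanning_tree(adj, root="A")
assert len(tree) == 2   # |V| - 1 links; the B-C link is blocked
```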

12 Interaction with the Upper Layer (IP)
 Bootstrapping end hosts by automating host configuration (e.g., IP address assignment)
- DHCP (Dynamic Host Configuration Protocol)
- Hosts broadcast DHCP discovery and request messages
 Bootstrapping each conversation by enabling resolution from IP to MAC addresses
- ARP (Address Resolution Protocol)
- Hosts broadcast ARP requests
 Both protocols work via Ethernet-layer broadcasting (i.e., shouting!)

13 Broadcast Domain and IP Subnet
 Ethernet broadcast domain
- The group of hosts and switches to which the same broadcast or flooded frame is delivered
- Note: a broadcast domain is not the same as a collision domain
 Broadcast domain == IP subnet
- Hosts use ARP to reach other hosts in the same subnet
- Hosts use the default gateway to reach hosts in different subnets
 Too large a broadcast domain leads to
- Excessive flooding and broadcasting overhead
- Insufficient security/performance isolation

New Challenges to Ethernet, and SEATTLE as a solution

15 “All-Ethernet” Enterprise Network?
 “All-Ethernet” makes network management easier
- Flat addressing and self-learning enable plug-and-play networking
- Permanent, location-independent addresses also simplify host mobility, access-control policies, and network troubleshooting

16 But Ethernet Bridging Does Not Scale
 Flooding-based delivery
- Frames to unknown destinations are flooded
 Broadcasting for basic service
- Bootstrapping relies on broadcasting
- Vulnerable to resource-exhaustion attacks
 Inefficient forwarding paths
- Loops are fatal due to broadcast storms, so bridging requires the Spanning Tree Protocol (STP)
- Forwarding along a single tree leads to inefficiency and lower utilization

17 State of the Practice: A Hybrid Architecture
Enterprise networks are comprised of Ethernet-based IP subnets (broadcast domains, i.e., LANs or VLANs) interconnected by routers.
 Ethernet bridging within each subnet: flat addressing, self-learning, flooding, and forwarding along a tree
 IP routing (e.g., OSPF) between subnets: hierarchical addressing, subnet configuration, host configuration, and forwarding along shortest paths

18 Motivation
Neither bridging nor routing is satisfactory. Can't we take only the best of each?

Feature                  | Ethernet bridging | IP routing | SEATTLE
Ease of configuration    | yes               |            | yes
Optimality in addressing |                   | yes        | yes
Host mobility            | yes               |            | yes
Path efficiency          |                   | yes        | yes
Load distribution        |                   | yes        | yes
Convergence speed        |                   | yes        | yes
Tolerance to loops       |                   | yes        | yes

SEATTLE (Scalable Ethernet ArchiTecTure for Large Enterprises) aims to provide all seven.

19 Overview  Objectives  SEATTLE architecture  Evaluation  Applications and benefits  Conclusions

20 Overview: Objectives  Objectives Avoiding flooding Restraining broadcasting Keeping forwarding tables small Ensuring path efficiency  SEATTLE architecture  Evaluation  Applications and Benefits  Conclusions

21 Avoiding Flooding
 Bridging uses flooding as a routing scheme
- Unicast frames to unknown destinations are flooded
- "Don't know where the destination is." "Send it everywhere! At least they'll learn where the source is."
- Does not scale to a large network
 Objective #1: Unicast unicast traffic
- Need a control-plane mechanism to discover and disseminate hosts' location information

22 Restraining Broadcasting
 Liberal use of broadcasting for bootstrapping (DHCP and ARP)
- Broadcasting is a vestige of shared-medium Ethernet
- Very serious overhead in switched networks
 Objective #2: Support unicast-based bootstrapping
- Need a directory service
 Sub-objective #2.1: Still support general broadcast
- Nonetheless, handling broadcast should be made more scalable

23 Keeping Forwarding Tables Small
 Flooding and self-learning lead to unnecessarily large forwarding tables
- Large tables are not only inefficient, but also dangerous
 Objective #3: Install hosts' location information only when and where it is needed
- Need a reactive resolution scheme
- Enterprise traffic patterns are well suited to reactive resolution
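One way to picture Objective #3 is an ingress switch's cache of host locations, filled only on demand and bounded by an eviction policy. A hypothetical LRU sketch (the class name, capacity, and MAC/switch names are invented for illustration):

```python
from collections import OrderedDict

class LocationCache:
    """Sketch of an ingress switch's reactive host-location cache:
    entries are installed only when traffic needs them, and the least
    recently used entry is evicted when the table is full."""

    def __init__(self, capacity, resolve):
        self.capacity = capacity
        self.resolve = resolve      # fallback: query the host's relay switch
        self.cache = OrderedDict()  # MAC -> egress switch

    def lookup(self, mac):
        if mac in self.cache:
            self.cache.move_to_end(mac)       # mark as recently used
            return self.cache[mac]
        egress = self.resolve(mac)            # on-demand resolution via relay
        self.cache[mac] = egress
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least recently used
        return egress

calls = []
def resolve_via_relay(mac):
    calls.append(mac)               # stands in for a unicast query to the relay
    return {"aa": "switch-7", "bb": "switch-3"}[mac]

table = LocationCache(capacity=1, resolve=resolve_via_relay)
table.lookup("aa")                  # miss: queries the relay
table.lookup("aa")                  # hit: served from the cache
table.lookup("bb")                  # miss: evicts "aa" (capacity 1)
```

Because traffic exhibits small, static communities of interest, even a tiny cache like this keeps the hit rate high, which is the premise the later evaluation slides rely on.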

24 Ensuring Optimal Forwarding Paths
 A spanning tree avoids broadcast storms, but forwarding along a single tree is inefficient
- Poor load balancing and longer paths
- Multiple spanning trees are insufficient and expensive
 Objective #4: Utilize shortest paths
- Need a routing protocol
 Sub-objective #4.1: Prevent broadcast storms
- Need an alternative measure (other than a spanning tree) to prevent broadcast storms

25 Backwards Compatibility
 Objective #5: Do not modify end hosts
- From the end hosts' view, the network must work the same way
- End hosts should use the same protocol stacks and applications, and not be forced to run an additional protocol

26 Overview: Architecture  Objectives  SEATTLE architecture Hash-based location management Shortest-path forwarding Responding to network dynamics  Evaluation  Applications and Benefits  Conclusions

27 SEATTLE in a Slide
 Flat addressing of end hosts
- Switches use hosts' MAC addresses for routing
- Ensures zero configuration and backwards compatibility (Obj #5)
 Automated host discovery at the edge
- Switches detect the arrival/departure of hosts
- Obviates flooding and ensures scalability (Obj #1, 5)
 Hash-based on-demand resolution
- A hash function deterministically maps a host to a switch
- Switches resolve end hosts' locations and addresses via hashing
- Ensures scalability (Obj #1, 2, 3)
 Shortest-path forwarding between switches
- Switches run link-state routing to maintain only the switch-level topology (i.e., they do not disseminate end-host information)
- Ensures data-plane efficiency (Obj #4)
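The hash-based resolution idea can be illustrated with a deliberately naive sketch: every switch applies the same function F to a host's MAC address, so they all agree on which switch stores that host's location. The MAC and switch names are invented, and this uses simple modulo hashing where SEATTLE uses consistent hashing:

```python
import hashlib

def F(host_mac, switches):
    """Deterministically map a host to the switch that stores its location.
    A naive modulo-hash sketch; SEATTLE itself uses consistent hashing."""
    h = int(hashlib.sha256(host_mac.encode()).hexdigest(), 16)
    return sorted(switches)[h % len(switches)]

switches = ["A", "B", "C", "D"]
relay = F("00:1c:42:9f:aa:01", switches)
# Every switch computes the same relay for this host, so the host's egress
# registers its (MAC -> egress) mapping there, and any ingress can resolve
# the location with a single unicast query instead of a network-wide flood.
assert all(F("00:1c:42:9f:aa:01", switches) == relay for _ in range(3))
```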

28 How does it work?
The entire enterprise forms a large single IP subnet over a link-state core of switches (A through E). Host x attaches at switch A; host y attaches at switch D.
 Host discovery or registration (control flow): A detects x, computes the hash F(x) = B, and stores x's location at relay switch B.
 Traffic to x (data flow): D computes F(x) = B and tunnels the traffic to relay switch B; B tunnels it to the egress switch A, which delivers it to x.
 B also notifies D of x's location, so subsequent traffic is forwarded directly from D to A (optimized forwarding).

29 Terminology
For traffic from source y to destination x: D is the ingress (where y attaches), B is the relay for x, and A is the egress (where x attaches). Once the ingress caches x's location, shortest-path forwarding is used, and the ingress applies a cache-eviction policy to that entry.

30 Responding to Topology Changes
 The quality of hashing matters!
- Switches (A through F) hash onto a ring, and each host hashes to a point on the ring
- Consistent hashing minimizes re-registration overhead when switches join or leave
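The consistent-hashing claim on this slide can be demonstrated: when a switch leaves the ring, only the hosts it was responsible for must re-register, and every other host-to-relay assignment stays put. A sketch with an invented hash function and synthetic MAC addresses:

```python
import hashlib

def _h(key):
    """Hash a name to a point on a 2^32-position ring."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % 2**32

def ring_successor(host_mac, switches):
    """Consistent hashing sketch: a host is stored at the first switch
    clockwise from the host's hash point on the ring."""
    points = sorted((_h(s), s) for s in switches)
    x = _h(host_mac)
    for point, switch in points:
        if point >= x:
            return switch
    return points[0][1]   # wrap around the ring

# Remove switch D: only hosts that mapped to D move (to D's successor);
# all other host-to-relay assignments are unchanged.
hosts = [f"02:00:00:00:00:{i:02x}" for i in range(32)]
before = {h: ring_successor(h, ["A", "B", "C", "D"]) for h in hosts}
after = {h: ring_successor(h, ["A", "B", "C"]) for h in hosts}
moved = [h for h in hosts if before[h] != after[h]]
assert all(before[h] == "D" for h in moved)   # only D's hosts re-register
```

With naive modulo hashing, removing one switch would instead re-map roughly all hosts, which is exactly the re-registration overhead the slide says consistent hashing avoids.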

31 Single-Hop Lookup
When y sends traffic to x, the ingress computes F(x) directly. Every switch on the ring is logically one hop away, so resolution takes a single lookup rather than multi-hop DHT routing.

32 Responding to Host Mobility
When host x moves from its old egress to a new one (G), only x's relay B needs to be updated. Ingresses holding stale cached entries are corrected when shortest-path forwarding is used.

33 Unicast-based Bootstrapping: ARP
 ARP
- Ethernet: broadcast requests
- SEATTLE: hash-based on-demand address resolution
 Resolution steps for host a, the owner of (IP_a, mac_a), attached to switch s_a, and requester b, attached to switch s_b:
1. Host discovery: s_a detects a
2. Hashing: s_a computes F(IP_a) = r_a
3. Storing: s_a stores (IP_a, mac_a, s_a) at r_a
4. b broadcasts an ARP request for a
5. Hashing: the ingress s_b computes F(IP_a) = r_a
6. s_b unicasts the ARP request to r_a
7. r_a unicasts the ARP reply (IP_a, mac_a, s_a) to the ingress

34 Unicast-based Bootstrapping: DHCP
 DHCP
- Ethernet: broadcast requests and replies
- SEATTLE: utilize a DHCP relay agent (RFC 2131), with proxy resolution by ingress switches via unicasting
 Steps for DHCP server d, registered under the well-known name mac_d = 0xDHCP and attached to switch s_d, and client h, attached to switch s_h:
1. Host discovery: s_d detects d
2. Hashing: s_d computes F(mac_d) = r
3. Storing: s_d stores (mac_d, s_d) at r
4. h broadcasts a DHCP discovery message
5. Hashing: the ingress s_h computes F(0xDHCP) = r
6. s_h unicasts the DHCP message to r
7. r forwards the DHCP message to s_d
8. s_d delivers the DHCP message to d

35 Overview: Evaluation  Objectives  SEATTLE architecture  Evaluation Scalability and efficiency Simple and flexible network management  Applications and Benefits  Conclusions

36 Control-Plane Scalability When Using Relays
 Minimal overhead for disseminating host-location information
- Each host's location is advertised to only two switches
 Small forwarding tables
- The total number of host-information entries across all switches is O(H), not O(SH)
 Simple and robust mobility support
- When a host moves, updating only its relay suffices
- No forwarding loop is created, since the update is atomic

37 Data-Plane Efficiency Without Compromise
 The price of path optimization
- Additional control messages for on-demand resolution
- Larger forwarding tables
- Control overhead for updating stale information about mobile hosts
 The gain is much bigger than the cost
- Most hosts maintain a small, static community of interest (COI) [Aiello et al., PAM'05]
- Classical analogy: COI ↔ working set (WS); caching is effective when a working set is small and static

38 Large-scale Packet-level Simulation
 In-house packet-level simulator
- Event-driven (similar to NS-2)
- Optimized for intensive control-plane simulation; data-plane modeling is limited (e.g., queueing is not modeled)
 Test network topologies
- A small enterprise (synthetic), a campus (a large state university), and a large Internet service provider (AS 1239)
- Varying numbers of end hosts (10 to 50K) with up to 500 switches
 Test traffic
- Synthetic traffic based on a large national research lab's internal packet traces: 17.8M packets from 5,128 hosts across 22 subnets
- The trace is inflated while preserving the original destination-popularity distribution

39 Tuning the System

40 Stretch: Path Optimality Stretch = Actual path length / Shortest path length
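The stretch metric on this slide can be computed on a toy topology (a hypothetical four-switch ring matching the earlier ingress/relay/egress example, with BFS hop counts standing in for path lengths):

```python
from collections import deque

def hops(adj, src, dst):
    """Shortest-path hop count between two switches, by BFS."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    raise ValueError("unreachable")

# Four switches in a ring: ingress D, egress A, relay B.
adj = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
via_relay = hops(adj, "D", "B") + hops(adj, "B", "A")  # 2 + 1 = 3 hops
direct = hops(adj, "D", "A")                           # 1 hop
print(via_relay / direct)  # 3.0: the stretch before the ingress caches x's location
```

Once the ingress caches the destination's location and forwards directly, the path length drops to the shortest path and the stretch returns to 1.0, which is what the evaluation measures.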

41 Control Overhead: Noisiness of Protocol

42 Amount of State: Conciseness of Protocol

43 Prototype Implementation
 Link-state routing: XORP (the eXtensible Open Router Platform), running an OSPF daemon
 Host-information management and traffic forwarding: the Click modular router (user/kernel), comprising a Ring Manager, a Host Info Manager, and a SeattleSwitch element
- Click maintains the network map and routing table from link-state advertisements received from other switches (via the XORP interface), and handles host-info registration and notification messages alongside data frames

44 Emulation Using the Prototype
 Emulab experimentation
- Emulab is a large set of time-shared PCs and the networks interconnecting them
 Test network configuration
- 10 PC-3000 FreeBSD nodes (switches SW0-SW3 and hosts N0-N3 in the figure), with realistic latency on each link
 Test traffic
- Replayed LBNL internal packet traces in real time
 Models tested
- Ethernet, SEATTLE without path optimization, and SEATTLE with path optimization
- Inactive-timeout-based eviction: 5 min ltout, 60 sec rtout

45 Table Size

46 Control Overhead

47 Overview: Applications and Benefits  Objectives  SEATTLE architecture  Evaluation  Applications and Benefits  Conclusions

48 Ideal Application: Data Center Network
 Data centers
- The backend of the Internet
- Mid-scale (most enterprises) to mega-scale (Google, Yahoo, MS, etc.)
- E.g., a regional DC of a major online service provider consists of 25K servers plus 1K switches/routers
 To ensure business continuity and to lower operational cost, DCs must
- Adapt to varying workloads (breathing)
- Avoid or minimize service disruption during maintenance or failure (agility)
- Maximize aggregate throughput (load balancing)

49 DC Mechanisms to Ensure HA and Low Cost
 Agility and flexibility mechanisms
- Server virtualization and virtual-machine migration to mask failures
- Even networking devices could be virtualized as well
 IP routing is scalable and efficient; however, it
- Can't ensure service continuity across VM migration
- Requires reconfiguring the network and hosts to handle topology changes (e.g., maintenance, breathing)
 Ethernet allows for business continuity and lowers operational cost; however, it
- Can't put 25K hosts and 1K switches in a single broadcast domain
- Tree-based forwarding simply doesn't work at that scale
 SEATTLE meets all these requirements neatly

50 Conclusions
 SEATTLE is a plug-and-playable enterprise architecture ensuring both scalability and efficiency
 Enabling design choices
- Hash-based location management
- Reactive location resolution and caching
- Shortest-path forwarding
 Lessons
- Trading a little data-plane efficiency for huge control-plane scalability makes a qualitatively different system
- Traffic patterns are our friends

51 More Lessons
 You can create a new solution by combining existing techniques and ideas from different layers
- E.g., DHT-based routing: first used for P2P, CDNs, and overlays; then extended to L3 routing (ID-based routing); then extended again to L2 (SEATTLE)
- Deflecting through intermediaries
- Link-state routing
- Caching
- Mobility support through fixed registration points
 Innovation is still underway

52 Thank you. Full paper is available at

Backup Slides

54 Solution: Sub-dividing Broadcast Domains
 A large broadcast domain → several small domains
- Group hosts by a certain rule (e.g., physical location, organizational structure)
- Then wire hosts in the same group to a set of switches dedicated to that group
 But people (and hosts) move, and structures change
- Re-wiring whenever such an event occurs is a major pain
 Solution: VLAN (Virtual LAN)
- Define a broadcast domain logically, rather than physically

55 Example: Two Virtual LANs
Red VLAN and Orange VLAN: switches forward traffic as needed. [Figure: hosts tagged R and O are spread across the switches, grouped into two logical LANs regardless of physical location]

56 VLANs Are Not Satisfactory Either
 VLANs reduce the amount of broadcast and flooding, and enhance mobility to some extent
- Hosts can retain their IP addresses when moving inside a VLAN
 Unfortunately, most problems remain, and new problems arise
- A switch must handle frames carried in every VLAN it participates in; increasing mobility forces switches to join many, sometimes all, VLANs
- The forwarding path (i.e., a tree) in each VLAN is still inefficient
- STP converges slowly
- Trunk-configuration overhead increases significantly

57 More Unique Benefits
 Optimal load balancing via relayed delivery
- Flows sharing the same ingress and egress switches are spread over multiple indirect paths
- For any valid traffic matrix, this practice guarantees 100% throughput with minimal link usage [Zhang-Shen et al., HotNets'04/IWQoS'05]
 Simple and robust access control
- Enforcing access-control policies at relays makes policy management simple and robust
- Why? Because routing changes and host mobility do not change the policy-enforcement points