Revisiting Ethernet: Plug-and-play made scalable and efficient

Slides:



Advertisements
Similar presentations
Interconnection: Switching and Bridging CS 4251: Computer Networking II Nick Feamster Fall 2008.
Advertisements

Shortest Path Bridging IEEE 802
Communication Networks Recitation 3 Bridges & Spanning trees.
Logically Centralized Control Class 2. Types of Networks ISP Networks – Entity only owns the switches – Throughput: 100GB-10TB – Heterogeneous devices:
PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric
Mobility Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101
Multi-Layer Switching Layers 1, 2, and 3. Cisco Hierarchical Model Access Layer –Workgroup –Access layer aggregation and L3/L4 services Distribution Layer.
Scalable Flow-Based Networking with DIFANE 1 Minlan Yu Princeton University Joint work with Mike Freedman, Jennifer Rexford and Jia Wang.
Revisiting Ethernet: Plug-and-play made scalable and efficient Changhoon Kim, and Jennifer Rexford Princeton University.
Floodless in SEATTLE: A Scalable Ethernet Architecture for Large Enterprises Chang Kim, and Jennifer Rexford Princeton.
Revisiting Ethernet: Plug-and-play made scalable and efficient Changhoon Kim and Jennifer Rexford Princeton University.
Projects Related to Coronet Jennifer Rexford Princeton University
Tesseract A 4D Network Control Plane
COS 461: Computer Networks
CN2668 Routers and Switches Kemtis Kunanuraksapong MSIS with Distinction MCTS, MCDST, MCP, A+
Network Redundancy Multiple paths may exist between systems. Redundancy is not a requirement of a packet switching network. Redundancy was part of the.
Chapter 4: Managing LAN Traffic
Itrat Rasool Quadri ST ID COE-543 Wireless and Mobile Networks
Homework Assignment #1 1. Homework Assignment Part 1: LAN setup –All nodes are hosts (including middle nodes) –Each link is its own LAN, with its own.
TRansparent Interconnection of Lots of Links (TRILL) March 11 th 2010 David Bond University of New Hampshire: InterOperability.
1 Introducing Routing 1. Dynamic routing - information is learned from other routers, and routing protocols adjust routes automatically. 2. Static routing.
Address Resolution Protocol(ARP) By:Protogenius. Overview Introduction When ARP is used? Types of ARP message ARP Message Format Example use of ARP ARP.
NUS.SOC.CS2105 Ooi Wei Tsang Application Transport Network Link Physical you are here.
Floodless in SEATTLE : A Scalable Ethernet ArchiTecTure for Large Enterprises. Changhoon Kim, Matthew Caesar and Jenifer Rexford. Princeton University.
“Hashing Out” the Future of Enterprise and Data-Center Networks Jennifer Rexford Princeton University Joint with Changhoon.
ICS 156: Networking Lab Magda El Zarki Professor, ICS UC, Irvine.
Cisco Confidential © 2013 Cisco and/or its affiliates. All rights reserved. 1 Cisco Networking Training (CCENT/CCT/CCNA R&S) Rick Rowe Ron Giannetti.
5: DataLink Layer 5a-1 Bridges and spanning tree protocol Reference: Mainly Peterson-Davie.
1 LAN switching and Bridges Relates to Lab Outline Interconnection devices Bridges/LAN switches vs. Routers Bridges Learning Bridges Transparent.
BUFFALO: Bloom Filter Forwarding Architecture for Large Organizations Minlan Yu Princeton University Joint work with Alex Fabrikant,
Assignment 1  Chapter 1:  Question 11  Question 13  Question 14  Question 33  Question 34  Chapter 2:  Question 6  Question 39  Chapter 3: 
Routing and Routing Protocols CCNA 2 v3 – Module 6.
1 Computer Networks Chapter 5. Network layer The network layer is concerned with getting packets from the source all the way to the destination. Getting.
Computer Communication Networks
NAT – Network Address Translation
Multi Node Label Routing – A layer 2.5 routing protocol
IP: Addressing, ARP, Routing
Dynamic Routing Protocols II OSPF
Semester 3, Chapter 5 Allan Johnson
Cellular IP: A New Approach to Internet Host Mobility
Routing Jennifer Rexford.
ETHANE: TAKING CONTROL OF THE ENTERPRISE
Scaling the Network: The Internet Protocol
CS4470 Computer Networking Protocols
Revisiting Ethernet: Plug-and-play made scalable and efficient
MAC Addresses and ARP 32-bit IP address:
Mobicom ‘99 Per Johansson, Tony Larsson, Nicklas Hedman
ICMP ICMP – Internet Control Message Protocol
NOX: Towards an Operating System for Networks
Chapter 4: Network Layer
Chapter 4 Data Link Layer Switching
: An Introduction to Computer Networks
Introduction to Networking
Net 323: NETWORK Protocols
Intra-Domain Routing Jacob Strauss September 14, 2006.
Hubs Hubs are essentially physical-layer repeaters:
LAN switching and Bridges
Chapter 5: Dynamic Routing
Dynamic Routing Protocols II OSPF
ECE 544 Project3 Team member: BIAO LI, BO QU, XIAO ZHANG 1 1.
Dynamic Routing and OSPF
LAN switching and Bridges
VL2: A Scalable and Flexible Data Center Network
Scaling the Network: The Internet Protocol
LAN switching and Bridges
Ch 17 - Binding Protocol Addresses
EE 122: Lecture 13 (IP Multicast Routing)
Computer Networks Protocols
Reconciling Zero-conf with Efficiency in Enterprises
Chapter 5: Link Layer 5.1 Introduction and services
Presentation transcript:

Revisiting Ethernet: Plug-and-play made scalable and efficient Changhoon Kim, and Jennifer Rexford http://www.cs.princeton.edu/~chkim Princeton University

An “All Ethernet” Enterprise Network? “All Ethernet” makes network management easier Zero-configuration of end-hosts and network due to Flat addressing Self-learning Location independent and permanent addresses also simplify Host mobility Network troubleshooting Access-control policies

But, Ethernet bridging does not scale Flooding-based delivery Frames to unknown destinations are flooded Broadcasting for basic service Bootstrapping relies on broadcasting Vulnerable to resource exhaustion attacks Inefficient forwarding paths Loops are fatal due to broadcast storms; use the STP Forwarding along a single tree leads to inefficiency

State of the Practice: A Hybrid Architecture Enterprise networks comprised of Ethernet-based IP subnets interconnected by routers Ethernet Bridging - Flat addressing - Self-learning - Flooding - Forwarding along a tree R R IP Routing - Hierarchical addressing - Subnet configuration - Host configuration - Forwarding along shortest paths R R R

SEIZE (Scalable and Efficient Zero-config Enterprise) Motivation Neither bridging nor routing is satisfactory. Can’t we take only the best of each? Architectures Features Ethernet Bridging IP Routing Ease of configuration   Optimality in addressing Mobility support Path efficiency Load distribution Convergence speed Tolerance to loop SEIZE  SEIZE (Scalable and Efficient Zero-config Enterprise)

Overview Objectives SEIZE architecture Evaluation Conclusions

Overview: Objectives Objectives SEIZE architecture Evaluation Avoiding flooding Restraining broadcasting Keeping forwarding tables small Ensuring path efficiency SEIZE architecture Evaluation Conclusions

Avoiding Flooding Bridging uses flooding as a routing scheme Unicast frames to unknown destinations are flooded Does not scale to a large network Objective #1: Unicast unicast traffic Need a control-plane mechanism to discover and disseminate hosts’ location information “Send it everywhere! At least, they’ll learn where the source is.” “Don’t know where destination is.”

Restraining Broadcasting Liberal use of broadcasting for bootstrapping (DHCP and ARP) Broadcasting is a vestige of shared-medium Ethernet Very serious overhead in switched networks Objective #2: Support unicast-based bootstrapping Need a directory service Sub-objective #2.1: Support general broadcast However, handling broadcast should be more scalable

Keeping Forwarding Tables Small Flooding and self-learning lead to unnecessarily large forwarding tables Large tables are not only inefficient, but also dangerous Objective #3: Install hosts’ location information only when and where it is needed Need a reactive resolution scheme Enterprise traffic patterns are better-suited to reactive resolution

Ensuring Optimal Forwarding Paths Spanning tree avoids broadcast storms. But, forwarding along a single tree is inefficient. Poor load balancing and longer paths Multiple spanning trees are insufficient and expensive Objective #4: Utilize shortest paths Need a routing protocol Sub-objective #4.1: Prevent broadcast storms Need an alternative measure to prevent broadcast storms

Backwards Compatibility Objective #5: Do not modify end-hosts From end-hosts’ view, network must work the same way End hosts should Use the same protocol stacks and applications Not be forced to run an additional protocol

Overview: Architecture Objectives SEIZE architecture Hash-based location management Shortest-path forwarding Responding to network dynamics Evaluation Conclusions

SEIZE in a Slide Flat addressing of end-hosts Switches use hosts’ MAC addresses for routing Ensures zero-configuration and backwards-compatibility (Obj # 5) Automated host discovery at the edge Switches detect the arrival/departure of hosts Obviates flooding and ensures scalability (Obj #1, 5) Hash-based on-demand resolution Hash deterministically maps a host to a switch Switches resolve end-hosts’ location and address via hashing Ensures scalability (Obj #1, 2, 3) Shortest-path forwarding between switches Switches run link-state routing with only their own connectivity info Ensures data-plane efficiency (Obj #4)

How does it work? y x C A D B E Optimized forwarding Deliver to x directly from D to A Deliver to x y x C Host discovery or registration A Traffic to x Tunnel to egress node, A Hash (F(x) = B) Tunnel to relay switch, B Hash (F(x) = B) D Entire enterprise (A large single IP subnet) LS core Notifying <x, A> to D B Store <x, A> at B Switches E End-hosts Control flow Data flow

cut-through forwarding Terminology Dst x < x, A > cut-through forwarding Src y A Ingress Egress D < x, A > Ingress applies a cache eviction policy to this entry Relay (for x) B < x, A >

Responding to Topology Changes Consistent Hash [Karger et al.,STOC’97] minimizes re-registration h h A E h h F B h h h h h D h C

Every switch on a ring is Single Hop Look-up y sends traffic to x y x A E Every switch on a ring is logically one hop away B F(x) D C

Responding to Host Mobility Old Dst < x, G > x < x, A > Src y when cut-through forwarding is used A D < x, A > < x, G > Relay (for x) G B New Dst < x, G > < x, A > < x, G >

Unicast-based Bootstrapping ARP Ethernet: Broadcast requests SEIZE: Hash-based on-demand address resolution Exactly the same mechanism as location resolution Proxy resolution by ingress switches via unicasting DHCP Ethernet: Broadcast requests and replies SEIZE: Utilize DHCP relay agent (RFC 2131)

Overview: Evaluation Objectives SEIZE architecture Evaluation Scalability and efficiency Simple and flexible network management Conclusions

Control-Plane Scalability When Using Relays Minimal overhead for disseminating host-location information Each host’s location is advertised to only two switches Small forwarding tables The number of host information entries over all switches leads to O(H), not O(SH) Simple and robust mobility support When a host moves, updating only its relay suffices No forwarding loop created since update is atomic

Data-Plane Efficiency w/o Compromise Price for path optimization Additional control messages for on-demand resolution Larger forwarding tables Control overhead for updating stale info of mobile hosts The gain is much bigger than the cost Because most hosts maintain a small, static communities of interest (COIs) [Aiello et al., PAM’05] Classical analogy: COI ↔ Working Set (WS); Caching is effective when a WS is small and static

Prototype Implementation Link-state routing: eXtensible Open Router Platform [Handley et al., NSDI’05] Host information management and traffic forwarding: The Click modular router [Kohler et al., TOCS’00] XORP Link-state advertisements from other switches Click Interface Network Map OSPF Daemon Click Host info. registration and notification messages Routing Table Ring Manager Host Info Manager SeizeSwitch Data Frames Data Frames

Evaluation: Set-up and Models Simulation; Emulation on Emulab Test Network Configuration Test Traffic LBNL internal packet traces [Pang et al., IMC’05] 17.8M packets from 5,128 hosts across 22 subnets Real-time replay Models tested Ethernet w/ STP, SEIZE w/o path opt., and SEIZE w/ path opt. Inactive timeout-based eviction: 5 min ltout, 1 ~ 60 sec rtout N0 N1 SW0 SW1 SW2 SW3 N2 N3

Simulations on AP: 315 rtrs, 50K hosts

Simulations: SmallDC: 4 core + 21 agg

Simulations: Host mobility on campus

Prototype: Failover performance Without backup pre-registration With backup pre-registration

Some Unique Benefits Optimal load balancing via relayed delivery Flows sharing the same ingress and egress switches are spread over multiple indirect paths For any valid traffic matrix, this practice guarantees 100% throughput with minimal link usage [Zhang-Shen et al., HotNets’04/IWQoS’05] Simple and robust access control Enforcing access-control policies at relays makes policy management simple and robust Why? Because routing changes and host mobility do not change policy enforcement points

Conclusions SEIZE is a plug-and-playable enterprise architecture ensuring both scalability and efficiency Enabling design choices Hash-based location management Reactive location resolution and caching Shortest-path forwarding Lessons Trading a little data-plane efficiency for huge control-plane scalability makes a qualitatively different system Traffic patterns (small static COIs, and short flow interarrival times) are our friends

Future Work Enriching evaluation Various topologies Dynamic set-ups (topology changes, and host mobility) Applying reactive location resolution to other networks There are some routing systems that need to be slimmer Generalization How aggressively can we optimize control-plane without losing data-plane efficiency?

Full paper is available at http://www.cs.princeton.edu/~chkim Thank you. Full paper is available at http://www.cs.princeton.edu/~chkim

Backup Slides

Overall Comparison Data-plane Efficiency Control-plane Scalability Low Cost

Sensitivity to Cache Eviction Policy

Group-based Broadcasting SEIZE uses per-group multicast tree

Group-based Access Control Relay switches enforce inter-group access policies The idea Allow resolution only when the access policy between a resolving host’s group and a resolved host’s group permits access

Simple and Flexible Management Using only a number of powerful switches as relays? Yes, a pre-hash can generate a set of identifiers for a switch Applying cut-through forwarding selectively? Yes, ingress switches can adaptively decide which policy to use (E.g., no cut-through forwarding for DNS look-ups) Controlling (or predicting) a switch’s table size? Yes, pre-hashing can determine the number of hosts for which a switch provides relay service The number of directly connected hosts to a switch is also usually known ahead of time Traffic engineering? Yes, adjusting link weights works effectively

Control Overhead

Host Information Replication Factor Max = NH RF 2.23 2H RF 1.83 RF 1.76 min = H

Path Efficiency + 29% + 27% + 2% Optimum

Understanding Traffic Patterns

Understanding Traffic Patterns - cont’d

Evaluation: Prototype Implementation XORP Click FEA RIBD OSPF Daemon Link State Advertisements from other switches Click Host info. registration and optimization msgs IP Forwarding Ring Mgr HostInfo Mgr SeizeSwitch Data Frames Data Frames

Prototype: Inside a Click Process FromDevice(em0) FromDevice(em1) Classifier(…) ARP IP Classifier(…) ARP IP FromDevice(eth0) FromDevice(eth1) to ARPResponder or ARPQuerier to ARPResponder or ARPQuerier Strip(14) CheckIPHeader(…) Classifier(…) ARP Strip(14) SeizeSwitch(…) IP IP LookupIPRoute(…) Classifier(…) IP Proto SEIZE Strip(20) Others Others ProcessIPMisc(…) ProcessIPMisc(…) to upper layer ARPQuerier(…) ARPQuerier(…) ToDevice(eth0) ToDevice(eth1) ToDevice(em0) ToDevice(em1)

Inside a SeizeSwitch Element EthFrame<srcmac, dstmac> arrives Layer 2 Layer 3 store or update <srcmac, in-port> in host-table c-hash srcmac, get a relay node rn notify <srcmac, my-IP> to rn to  Source Learning is dstmac me or broadcast? yes no  strip Ethernet header L2 Control  yes control message? no IP Forwarding is dstIP me? yes no apply to host table look up routing table is dstmac on host-table? yes no send down to L2 c-hash dstmac, get a relay node rn yes is dstmac local? no proto == SEIZE ? yes no encapsuate with <my-IP, rn>, set proto to SEIZE get egress-IP of dstmac, from host table send out to interface send up to L4 encapsuate with <my-IP, egress-IP>, set proto to SEIZE to  strip IP header inform ingress of <dstmac, egress-IP> to  to  to  L2 Data Forwarding EthFrame<srcmac, dstmac> departs

Control Plane: Single Hop DHT 1’s LOCAL K J 6 L C, H D, F 1 2 1 Forgets L 1’s REMOTE_AUTH A 2 Registers F I B 3 E, K, L 3 Registers L H C G D E 5 B, G F 4 A, J, I

Temporal Traffic Locality

Spatial Traffic Locality

Failover Performance Time/Sequence Graph Time/Sequence Graph Sequence Num. [KB] 50 150 250 350 450 550 650 Time (s) SW down SW up New ST built 100,000 50,000 Time/Sequence Graph Sequence Num. [KB] 100,000 50,000 50 100 150 Time (s) Relay down Relay up OSPF cnvg & host registration