SEATTLE and Recent Work Jennifer Rexford Princeton University Joint with Changhoon Kim, Minlan Yu, and Matthew Caesar

Challenges in Edge Networks Large number of hosts –Tens or even hundreds of thousands of hosts Dynamic hosts –Host mobility –Virtual machine migration Cost conscious –Equipment costs –Network-management costs 2 Need a scalable and efficient self-configuring network

An All-Ethernet Solution? Self-configuration –Hosts: MAC addresses –Switches: self-learning Simple host mobility –Location-independent flat addresses But, poor scalability and performance –Flooding frames to unknown destinations –Large forwarding tables (one entry per address) –Broadcast for discovery (e.g., ARP, DHCP) –Inefficient delivery of frames over spanning tree 3

An All-IP Solution? Scalability –Hierarchical prefixes (smaller tables) –No flooding Performance –Forwarding traffic over shortest paths But, several disadvantages –Complex joint configuration of routing and DHCP –Clumsy mobility (location-dependent addresses) –Expensive longest-prefix match in data plane 4

Compromise: Hybrid Architecture 5 Ethernet-based IP subnets interconnected by routers [Diagram: Ethernet bridging within each subnet (flat addressing, self-learning, flooding, forwarding along a tree) and IP routing between subnets (hierarchical addressing, subnet configuration, host configuration, forwarding along shortest paths).] Sacrifices Ethernet’s simplicity and IP’s efficiency for scalability

Can We Do Better? Shortest-path routing on flat addresses –Shortest paths: scalability and performance –MAC addresses: self-configuration and mobility Scalability without hierarchical addressing –Limit dissemination and storage of host info –Sending packets on slightly longer paths 6 [Diagram: a network of switches (S) with attached hosts (H).]

SEATTLE 7 Scalable Ethernet Architecture for Large Enterprises (joint work with Changhoon Kim and Matt Caesar)

SEATTLE Design Decisions 8 (Objective / Approach / Solution)
1. Avoiding flooding: never broadcast unicast traffic – network-layer one-hop DHT
2. Restraining broadcasting: bootstrap hosts via unicast – network-layer one-hop DHT
3. Reducing routing state: populate host info only when and where it is needed – traffic-driven resolution with caching
4. Shortest-path forwarding: allow switches to learn the topology – L2 link-state routing maintaining only the switch-level topology
* Meanwhile, avoid modifying end hosts

Network-Layer One-hop DHT Maintains (key, value) pairs with function F –Consistent hash mapping a key to a switch –F is defined over the set of live switches One-hop DHT –Link-state routing ensures switches know each other Benefits –Fast and efficient reaction to changes –Reliability and capacity naturally grow with the size of the network
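A minimal Python sketch of what a consistent hash F over the set of live switches could look like (the hash-ring construction, class, and switch names are illustrative assumptions, not SEATTLE's actual implementation). The point is that every switch can compute the resolver for any key locally, because link-state routing already tells it which switches are alive.

```python
import hashlib
from bisect import bisect_right

class OneHopDHT:
    def __init__(self, live_switches, vnodes=64):
        # Place several virtual points per switch on a hash ring for load balance.
        self.ring = sorted(
            (self._hash(f"{sw}:{i}"), sw)
            for sw in live_switches
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key):
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def resolver(self, key):
        """Return the switch responsible for `key` (e.g., a host MAC address)."""
        h = self._hash(key)
        idx = bisect_right(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

dht = OneHopDHT(["A", "B", "C", "D", "E"])
print(dht.resolver("00:1a:2b:3c:4d:5e"))   # the switch that stores this host's info
```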

Location Resolution 10 [Animated diagram (switches, end hosts, control messages, data traffic): when host x is discovered, its attachment switch hashes F(MAC_x) = B and publishes x's location at resolver B; when another host sends traffic to x, its ingress switch also hashes F(MAC_x) = B and tunnels the packets to B, which tunnels them on to x's switch A and notifies the sender's switch of x's location, so subsequent traffic is forwarded directly from D to A. Roles: owner, user, resolver.]

Address Resolution 11 [Diagram: the owner's switch also stores x's information at resolver B = F(IP_x); a broadcast ARP request for IP_x is converted into a unicast lookup to B, which sends a unicast reply.] Traffic following ARP takes a shortest path without separate location resolution
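A sketch of how publish/resolve could sit on top of the one-hop DHT sketched above (the DirectoryService class, stored values, and message handling are illustrative assumptions): the owner's switch publishes host information at F(key), and any switch resolves it with a single unicast lookup to the same resolver.

```python
# Builds on the OneHopDHT class from the previous sketch.
class DirectoryService:
    def __init__(self, dht, live_switches):
        self.dht = dht
        self.tables = {sw: {} for sw in live_switches}   # per-resolver key -> value store

    def publish(self, key, value):
        # The owner's ingress switch stores (key, value) at resolver F(key).
        self.tables[self.dht.resolver(key)][key] = value

    def resolve(self, key):
        # Any switch resolves a key with one unicast lookup to F(key).
        return self.tables[self.dht.resolver(key)].get(key)

switches = ["A", "B", "C", "D", "E"]
directory = DirectoryService(OneHopDHT(switches), switches)
directory.publish("00:1a:2b:3c:4d:5e", "A")                    # location: MAC_x -> attachment switch
directory.publish("10.0.0.7", ("00:1a:2b:3c:4d:5e", "A"))      # address: IP_x -> (MAC_x, switch)
print(directory.resolve("10.0.0.7"))                           # ('00:1a:2b:3c:4d:5e', 'A')
```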

Handling Network and Host Dynamics Network events –Switch failure/recovery: changes the (key, value) pairs owned by DHT neighbors; fortunately, switch failures are not common –Link failure/recovery: link-state routing finds new shortest paths Host events –Changes in host location, MAC address, or IP address –Must update stale host-information entries 12

Handling Host Information Changes 13 Dealing with host mobility: [Diagram: when host x moves from its old location to a new switch, the new switch re-publishes x's information at the resolver, and the switch of host y, still talking with x, is updated.] MAC- or IP-address changes can be handled similarly

Packet-Level Simulations Large-scale packet-level simulation –Event-driven simulation of control plane –Synthetic traffic based on LBNL traces –Campus, data center, and ISP topologies Main results –Much less routing state than Ethernet –Only slightly more stretch than IP routing –Low overhead for handling host mobility 14

Amount of Routing State 15 [Plot comparing routing state for Ethernet, SEATTLE without caching, and SEATTLE with caching.] SEATTLE reduces the amount of routing state by more than an order of magnitude

Cache Size vs. Stretch 16 Stretch = actual path length / shortest path length (in latency) [Plot: stretch vs. amount of routing state for SEATTLE and ROFL.] SEATTLE offers near-optimal stretch with a very small amount of routing state

Sensitivity to Mobility 17 [Plot comparing Ethernet, SEATTLE without caching, and SEATTLE with caching under host mobility.] SEATTLE rapidly updates routing state with very low overhead

Prototype Implementation 18 [Diagram: a user-space XORP OSPF daemon exchanges link-state advertisements and maintains the network map and routing table; a kernel-level Click SeattleSwitch, with a Ring Manager and a Host Info Manager, exchanges host-info registration and notification messages and forwards data frames through the Click interface.] Throughput: 800 Mbps for 512B packets, or 1400 Mbps for 896B packets

Conclusions on SEATTLE SEATTLE –Self-configuring –Scalable –Efficient Enabling design decisions –One-hop DHT with link-state routing –Reactive location resolution and caching –Shortest-path forwarding 19

VL2 Architecture (Microsoft) 20

VL2 Data Center Architecture VL2 work at Microsoft –Focus on data centers, rather than enterprise networks Similarities to SEATTLE –Flat addressing –Mapping of destination address to egress switch –Packet encapsulation to egress switch Differences from SEATTLE –Separate directory servers, not an in-network DHT –Lookup and encapsulation performed by the servers –Load balancing over many paths (ECMP + anycast) 21

Scalable Flow-Based Networking with DIFANE Joint work with Minlan Yu, Mike Freedman, and Jia Wang

Flexible Policies in Enterprises Access control –Drop packets from malicious hosts Customized routing –Direct Skype calls on a low-latency path Measurement –Collect detailed HTTP traffic statistics 23

Flow-based Switches Install rules in flow-based switches –Store rules in high-speed memory (TCAM) Perform simple actions based on rules –Rules: match on bits in the packet header –Actions: drop, forward, count 24 [Diagram: rules partition the flow space by source and destination, e.g., drop vs. forward via link 1.]
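A small sketch of wildcard rule matching with priorities, as a software stand-in for what a TCAM does in hardware (the Rule fields and actions are illustrative, not an OpenFlow API): each rule matches exact bits or "don't care" positions, and the highest-priority matching rule's action wins.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    pattern: str      # e.g. "00**": '0'/'1' are exact bits, '*' is "don't care"
    priority: int
    action: str       # e.g. "drop", "forward:1", "count"

def matches(pattern, header_bits):
    return all(p == "*" or p == b for p, b in zip(pattern, header_bits))

def lookup(rules, header_bits):
    hits = [r for r in rules if matches(r.pattern, header_bits)]
    return max(hits, key=lambda r: r.priority).action if hits else "drop"

rules = [Rule("00**", 10, "forward:1"), Rule("01**", 10, "drop"), Rule("****", 1, "forward:2")]
print(lookup(rules, "0010"))   # -> "forward:1" (highest-priority match)
```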

Challenges of Policy-Based Management Policy-based network management –Specify high-level policies in a management system –Enforce low-level rules in the switches Challenges –Large number of hosts, switches and policies –Limited TCAM space in switches –Support host mobility –No hardware changes to commodity switches 25

Pre-install Rules in Switches [Diagram: the controller pre-installs rules in the switches; packets hit the rules and are forwarded.] Problems: –No host mobility support –Switches do not have enough memory 26

Install Rules on Demand (Ethane, NOX) [Diagram: the first packet misses the rules; the switch buffers the packet and sends the packet header to the controller, which installs rules, and the packet is then forwarded.] Problems: –Delay of going through the controller –Switch complexity –Misbehaving hosts 27

DIFANE Architecture (two stages) DIstributed Flow Architecture for Networked Enterprises 28

Stage 1 The controller proactively generates the rules and distributes them to authority switches. 29

Partition and Distribute the Flow Rules 30 [Diagram: the controller partitions the flow space (accept/reject rules) across Authority Switches A, B, and C, distributes the rules to those authority switches, and distributes the partition information to the ingress and egress switches.]

Stage 2 The authority switches always keep packets in the data plane and reactively cache rules. 31

Packet Redirection and Rule Caching 32 [Diagram: the first packet of a flow is redirected by the ingress switch to the authority switch, which forwards it to the egress switch and feeds back cache rules; following packets hit the cached rules at the ingress switch and are forwarded directly.] A slightly longer path in the data plane is faster than going through the control plane

Locate Authority Switches Partition information in ingress switches –Using a small set of coarse-grained wildcard rules –… to locate the authority switch for each packet A distributed directory service, but not a DHT –Hashing does not work for wildcards –Keys can have wildcards in arbitrary bit positions 33 Example partition: X:0-1, Y:0-3 → A; X:2-5, Y:0-1 → B; X:2-5, Y:2-3 → C
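A sketch of the coarse-grained partition lookup, using the example ranges from the slide (the representation as Python ranges is illustrative; in a real switch these are wildcard TCAM rules): an ingress switch only needs this small table to find the authority switch responsible for any packet.

```python
# Partition of a two-dimensional flow space onto authority switches A, B, C.
PARTITION = [
    (range(0, 2), range(0, 4), "A"),   # X:0-1, Y:0-3 -> Authority Switch A
    (range(2, 6), range(0, 2), "B"),   # X:2-5, Y:0-1 -> Authority Switch B
    (range(2, 6), range(2, 4), "C"),   # X:2-5, Y:2-3 -> Authority Switch C
]

def authority_switch(x, y):
    """Return the authority switch whose region of flow space covers (x, y)."""
    for xs, ys, switch in PARTITION:
        if x in xs and y in ys:
            return switch
    return None

print(authority_switch(3, 2))   # -> 'C'
```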

Three Sets of Rules in TCAM 34 (Type / Priority / Field 1 / Field 2 / Action / Timeout)
Cache rules (in ingress switches; reactively installed by authority switches), e.g.:
–Priority 210, fields 00** / 111*, action: forward to Switch B, timeout 10 sec
–…, fields … / **, action: drop, timeout 10 sec
Authority rules (in authority switches; proactively installed by the controller), e.g.:
–Priority 110, fields 00** / 001*, action: forward and trigger cache manager, timeout: infinity
–…, fields … / ***, action: drop and trigger cache manager
Partition rules (in every switch; proactively installed by the controller), e.g.:
–Priority 15, fields 0*** / 000*, action: redirect to authority switch
–Priority 14, …

DIFANE Switch Prototype Built with an OpenFlow switch 35 [Diagram: the data plane holds cache rules, authority rules, and partition rules; a control-plane cache manager (present only in authority switches) receives notifications and sends cache updates, which other switches receive and install as cache rules.] Just a software modification for authority switches

Caching Wildcard Rules Overlapping wildcard rules –Cannot simply cache a matching rule: a cached lower-priority rule could wrongly catch packets that should match a higher-priority overlapping rule that is not in the cache 36

Caching Wildcard Rules Multiple authority switches –Contain independent sets of rules –Avoid cache conflicts in the ingress switch 37 [Diagram: the flow space split between authority switch 1 and authority switch 2.]

Partition Wildcard Rules Partition rules –Minimize the TCAM entries in switches –Decision-tree based rule partition algorithm 38 [Diagram: two candidate cuts of the flow space, Cut A and Cut B; Cut B is better than Cut A because it splits fewer rules.]
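A toy sketch of why the choice of cut matters when partitioning wildcard rules (the rectangle representation and cost function are illustrative assumptions, not the paper's decision-tree algorithm): a rule that straddles a cut must be duplicated in both partitions, so a good cut splits as few rules as possible.

```python
def rules_after_cut(rules, axis, split):
    """rules: list of (xlo, xhi, ylo, yhi) rectangles in flow space.
    Returns the total number of TCAM entries after cutting `axis` at `split`."""
    total = 0
    for xlo, xhi, ylo, yhi in rules:
        lo, hi = (xlo, xhi) if axis == "x" else (ylo, yhi)
        total += 2 if lo < split <= hi else 1   # straddling rules are duplicated
    return total

rules = [(0, 3, 0, 1), (0, 3, 2, 3), (4, 7, 0, 3)]
print(rules_after_cut(rules, "x", 4))   # cut between x=3 and x=4: no rule duplicated -> 3
print(rules_after_cut(rules, "y", 2))   # cut between y=1 and y=2: one rule duplicated -> 4
```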

Handling Network Dynamics 39 (effect on cache rules / authority rules / partition rules)
–Policy changes at controller: timeout / change / mostly no change
–Topology changes at switches: no change / no change / change
–Host mobility: timeout / no change / no change

Prototype Evaluation Evaluation setup –Kernel-level Click-based OpenFlow switch –Traffic generators, switches, controller run on separate 3.0GHz 64-bit Intel Xeon machines Compare delay and throughput –NOX: Buffer packets and reactively install rules –DIFANE: Forward packets to authority switches 40

Delay Evaluation Average delay (RTT) of the first packet –NOX: 10 ms –DIFANE: 0.4 ms Reasons for performance improvement –Always keep packets in the data plane –Packets are delivered without waiting for rule caching –Easily implemented in hardware to further improve performance 41

Peak Throughput [Plot (one authority switch; single-packet flows): throughput limited by the controller bottleneck (50K), the ingress-switch bottleneck (20K), or DIFANE (800K).] DIFANE further increases the throughput linearly with the number of authority switches.

Scaling with Many Rules How many authority switches do we need? –Depends on the total number of rules –… and the TCAM space in these authority switches 43
–Campus: 30K rules, 1.7K switches, assumed authority-switch TCAM size 160 KB, required authority switches: 5 (0.3%)
–IPTV: 5M rules, 3K switches, assumed authority-switch TCAM size 1.6 MB, required authority switches: 100 (3%)

Conclusion DIFANE is a scalable way to support flexible policies –Mix of reactive and proactive rule installation –Proactively compute and partition the rules –Reactively cache rules at ingress switches Generalized SEATTLE –SEATTLE: destination-based forwarding, with rules based (only) on flat MAC addresses and lookup using hashing in a one-hop DHT –DIFANE: multidimensional packet classification, with rules based on many fields (with possible wildcards) and lookup based on coarse-grained partition rules Scalable solution for flow-based networking 44

Backup Slides 45

46 BUFFALO Bloom Filter Forwarding Architecture for Large Organizations

Data Plane Scaling Challenge Large layer-two networks –Many end-host MAC addresses –Many switches Forwarding-table size becomes a problem –Requires a large, fast memory –Expensive and power hungry –Over-provisioning to avoid running out Buffalo’s goal –Given a small, fast memory –… make the most of it!

48 Bloom Filters Bloom filters in fast memory –A compact data structure for a set of elements –Calculate s hash functions to store element x –Easy to check set membership –Reduce memory at the expense of false positives [Diagram: element x is hashed by h1(x), h2(x), h3(x), …, hs(x) into positions of a bit vector V0 … Vm-1.]
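A minimal Bloom filter sketch matching the slide (illustrative Python, not BUFFALO's fast-memory implementation): s hash functions set s bits per element; membership tests can yield false positives but never false negatives.

```python
import hashlib

class BloomFilter:
    def __init__(self, m_bits, s_hashes):
        self.m, self.s = m_bits, s_hashes
        self.bits = bytearray(m_bits)   # one byte per bit, for simplicity

    def _positions(self, x):
        for i in range(self.s):
            h = hashlib.sha1(f"{i}:{x}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, x):
        for pos in self._positions(x):
            self.bits[pos] = 1

    def __contains__(self, x):
        return all(self.bits[pos] for pos in self._positions(x))

bf = BloomFilter(m_bits=1024, s_hashes=4)
bf.add("00:1a:2b:3c:4d:5e")
print("00:1a:2b:3c:4d:5e" in bf)   # True
print("ff:ff:ff:ff:ff:ff" in bf)   # almost certainly False (small chance of a false positive)
```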

49 Bloom Filter Forwarding One Bloom filter (BF) per next hop –Store all addresses forwarded to that next hop [Diagram: a packet is queried against the Bloom filters for next hop 1, next hop 2, …, next hop T, and forwarded on a hit.]
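A BUFFALO-style forwarding sketch reusing the BloomFilter class from the previous block (the class name and tie-breaking are illustrative): one filter per next hop is populated with the addresses routed to that hop, and a multi-hit, which can only come from false positives, is resolved by excluding the incoming interface and picking randomly, previewing the handling described a few slides below.

```python
import random

class BuffaloFIB:
    def __init__(self, next_hops, m_bits=4096, s_hashes=4):
        self.filters = {nh: BloomFilter(m_bits, s_hashes) for nh in next_hops}

    def install(self, dst_mac, next_hop):
        self.filters[next_hop].add(dst_mac)

    def forward(self, dst_mac, incoming=None):
        hits = [nh for nh, bf in self.filters.items()
                if dst_mac in bf and nh != incoming]          # never send back where we came from
        return random.choice(hits) if hits else None          # multiple hits mean false positives

fib = BuffaloFIB(next_hops=[1, 2, 3])
fib.install("00:1a:2b:3c:4d:5e", 2)
print(fib.forward("00:1a:2b:3c:4d:5e"))   # 2 (barring a false positive on another filter)
```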

50 BUFFALO Challenges How to optimize memory usage? Minimize the false-positive rate How to handle false positives quickly? No memory/payload overhead How to handle routing dynamics? Make it easy and fast to adapt Bloom filters

51 1. Optimize Memory Usage Goal: Minimize overall false-positive rate –Probability that one BF has a false positive Input: –Fast memory size M –Number of destinations per next hop –The maximum number of hash functions Output: the size of each Bloom filter –Larger BF for next hop with more destinations

52 1. Optimize Memory Usage (cont.) Constraints –Memory constraint: sum of all BF sizes <= fast memory size M –Bound on the number of hash functions: to bound CPU calculation time; Bloom filters share the same hash functions Convex optimization problem –An optimal solution exists –Solved by the IPOPT optimization tool –Runs in about 50 msec
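A sketch of the quantity being minimized, using the standard Bloom-filter approximation rather than the paper's exact objective: a filter with m bits, n stored addresses, and k hash functions has a false-positive rate of roughly (1 - e^(-kn/m))^k, and summing the per-next-hop rates (a union-bound approximation of "some BF has a false positive") shows why larger filters should go to next hops with more destinations, subject to the total memory budget M.

```python
import math

def fp_rate(m_bits, n_elems, k_hashes):
    """Approximate false-positive rate of one Bloom filter."""
    return (1 - math.exp(-k_hashes * n_elems / m_bits)) ** k_hashes

def overall_fp_rate(sizes, counts, k_hashes):
    """sizes[i]: bits given to next hop i; counts[i]: addresses stored for next hop i.
    Assumes sum(sizes) <= fast memory size M."""
    return sum(fp_rate(m, n, k_hashes) for m, n in zip(sizes, counts))

# Giving more bits to the next hop with more destinations lowers the overall rate.
print(overall_fp_rate([8000, 8000], [1000, 100], 8))    # equal split
print(overall_fp_rate([12000, 4000], [1000, 100], 8))   # skewed split is better
```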

53 1. Minimize False Positives [Plot: false-positive rate for a forwarding table with 200K entries, 10 next hops, and 8 hash functions.]

54 1. Comparing with Hash Table Save 65% memory with 0.1% false positives

55 2. Handle False Positives Design goals –Should not modify the packet –Never go to slow memory –Ensure timely packet delivery BUFFALO solution –Exclude the incoming interface: avoid loops in the one-false-positive case –Random selection among the rest: guarantee reachability with multiple FPs

56 2. One False Positive Most common case: one false positive –When there are multiple matching next hops –Avoid sending to the incoming interface We prove that there is at most a 2-hop loop, with a stretch <= l(AB) + l(BA) [Diagram: a false-positive link from A to B, alongside the shortest path from A to the destination.]

57 2. Multiple False Positives Handle multiple false positives –Random selection from the matching next hops –Random walk on the shortest-path tree plus a few false-positive links –Eventually finds a way to the destination [Diagram: the shortest-path tree for the destination, with a few false-positive links added.]

58 2. Multiple False Positives (cont.) Provable stretch bound –With k false positives, the expected stretch is at most O(k² · 3^(k/3)) –Proved using random-walk theory Stretch bound is actually not that bad –False positives are independent: different switches use different hash functions –k false positives for one packet are rare: the probability of k false positives drops exponentially in k Tighter bounds in special cases, e.g., trees

59 2. Stretch in Campus Network [Plot of the stretch distribution: when fp = 0.001%, 99.9% of the packets have no stretch; when fp = 0.5%, a small percentage of packets have a stretch of 6 times the shortest-path length.]

3. Update on Routing Change Use a counting Bloom filter (CBF) in slow memory –Assists the BF in handling forwarding-table updates –Easy to add/delete a forwarding-table entry [Diagram: deleting a route decrements counters in the CBF in slow memory, which is then used to update the BF in fast memory.]
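A counting-Bloom-filter sketch for handling deletions (illustrative Python, not BUFFALO's code): the CBF in slow memory keeps a counter per position, so removing an address just decrements counters, and the fast-memory BF is simply "counter > 0" at each position.

```python
import hashlib

class CountingBloomFilter:
    def __init__(self, m_bits, s_hashes):
        self.m, self.s = m_bits, s_hashes
        self.counts = [0] * m_bits

    def _positions(self, x):
        return [int(hashlib.sha1(f"{i}:{x}".encode()).hexdigest(), 16) % self.m
                for i in range(self.s)]

    def add(self, x):
        for pos in self._positions(x):
            self.counts[pos] += 1

    def remove(self, x):
        for pos in self._positions(x):
            self.counts[pos] -= 1

    def to_bitmap(self):
        # The plain Bloom filter pushed to fast memory.
        return [1 if c > 0 else 0 for c in self.counts]

cbf = CountingBloomFilter(64, 3)
cbf.add("00:aa:bb:cc:dd:01")
cbf.add("00:aa:bb:cc:dd:02")
cbf.remove("00:aa:bb:cc:dd:01")     # deletion would be impossible with a plain BF
print(sum(cbf.to_bitmap()))         # only the bits for ...:02 remain set
```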

3. Occasionally Resize BF Under significant routing changes –# of addresses in BFs changes significantly –Re-optimize BF sizes Use the CBF to assist in resizing the BF –Large CBF and small BF –Easy to expand the BF size by contracting the CBF [Diagram: the small BF is hard to expand to size 4, but the large CBF is easy to contract to size 4.]
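A sketch of contracting a larger CBF into a smaller Bloom filter (illustrative; it assumes hash positions are taken modulo the table size and that the new size divides the old one, so counters that land on the same slot are summed): this is why keeping the large CBF around makes resizing the small BF cheap.

```python
def contract(counts, new_size):
    """Fold a CBF counter array down to `new_size` slots and emit the resulting BF bitmap."""
    folded = [0] * new_size
    for pos, c in enumerate(counts):
        folded[pos % new_size] += c
    return [1 if c > 0 else 0 for c in folded]

print(contract([0, 1, 0, 2, 0, 0, 1, 0], 4))   # 8-slot CBF folded into a 4-bit BF -> [0, 1, 1, 1]
```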

BUFFALO Switch Architecture 62 Prototype implemented in kernel-level Click

63 Prototype Evaluation Environment –3.0 GHz 64-bit Intel Xeon –2 MB L2 data cache (fast-memory size M) –200K forwarding-table entries and 10 next hops Peak forwarding rate –365 Kpps, 1.9 μs per packet –10% faster than hash-based EtherSwitch Performance with forwarding-table updates –10.7 μs to update a route –0.47 s to reconstruct BFs based on CBFs

64 Conclusion on BUFFALO BUFFALO –Small, bounded memory requirement –Small stretch –Fast reaction to routing updates Enabling design decisions –Bloom filter per next hop –Optimization of Bloom filter sizes –Preventing forwarding loops –Dynamic updates using counting Bloom filters

65 SeaBuff Seattle + Buffalo

Seattle –Shortest-path routing and scalable control plane –Fewer host MAC addresses stored at switches Buffalo –Less memory for a given # of routable addresses –Graceful handling of an increase in # of addresses Combined data plane: [Diagram: a Seattle cache maps the destination to its egress switch, and a Buffalo Bloom filter maps the egress switch to an outgoing link.]

Two Small Sources of Stretch Seattle: diverting some packets through relay B Buffalo: extra stretch from D to B, or B to A [Diagram: traffic to x from y's switch D may be diverted through B, the relay for x, before reaching x's switch A.]

Choosing the Right Solution Spatial distribution of traffic –Sparse: caching in Seattle is effective –Dense: caching is not as effective Temporal distribution of traffic –Stable: shortest path routing is effective –Volatile: forwarding through relay spreads traffic Topology –Arbitrary: false positives have more impact –Tree-like: false positives have less impact

Conclusions Growing importance of edge networks –Enterprise networks –Data center networks Shortest-path routing and flat addressing –Self-configuring and efficient –Scalability in exchange for stretch Ongoing work –SeaBuff = Seattle + Buffalo –Theoretical analysis of stretch in Buffalo –Efficient access control in OpenFlow