The TeraPaths Testbed: Exploring End-to-End Network QoS. Dimitrios Katramatos, Dantong Yu, Bruce Gibbard, Shawn McKee. TridentCom 2007. Presented by D. Katramatos, BNL
2 Outline: Introduction; The TeraPaths project; The TeraPaths system architecture; The TeraPaths testbed; In progress/future work
3 Introduction. Project background: the modern nuclear and high-energy physics community (e.g., the LHC experiments) makes extensive use of the grid computing model; US, European, and international networks are being upgraded to multiple 10 Gbps connections to cope with data movements of gigantic proportions. The problem: supporting efficient, reliable, and predictable peta-scale data movement in modern grid environments that use high-speed networks; multiple data flows have varying priorities, and default “best effort” network behavior can cause performance and service disruption problems. Solution: enhance network functionality with QoS features to allow prioritization and protection of data flows, and schedule network usage as a critical grid resource.
4 e.g., ATLAS Data Distribution. [Figure: tiered ATLAS data distribution. Labels: ATLAS experiment; Online System; CERN; BNL; Tier 0+1; Tier 1 sites; Tier 2 sites; Tier 3 sites; Tier 4; Workstations; UMich muon calibration. Link rate labels: ~PBps, ~GBps, ~10-40 Gbps, ~10 Gbps, ~Gbps, Mbps.]
5 Prioritized vs. Best Effort Traffic
6 Partition Available Bandwidth. Minimum best effort traffic. Dynamic bandwidth allocation: shared dynamic class(es) with dynamic aggregate and microflow policing. Dedicated static classes: aggregate flow policing. Shared static classes: aggregate and microflow policing. Mark packets within a class using DSCP bits, police at ingress, trust DSCP bits downstream.
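The aggregate and microflow policing named on this slide can be illustrated with token buckets. The following is a minimal sketch, not the switch configuration used on the testbed: `TokenBucket`, `ClassPolicer`, and every rate and burst size are hypothetical names and values chosen for illustration.

```python
from typing import Dict, Hashable


class TokenBucket:
    """Single-rate policer: a packet conforms only if enough tokens are available."""

    def __init__(self, rate_bps: float, burst_bytes: int) -> None:
        self.rate = rate_bps / 8.0          # refill rate in bytes per second
        self.burst = float(burst_bytes)     # bucket depth in bytes
        self.tokens = float(burst_bytes)    # start with a full bucket
        self.last = 0.0                     # time of the previous check

    def conforms(self, now: float, size: int) -> bool:
        # Refill for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


class ClassPolicer:
    """One aggregate bucket for a service class plus one bucket per microflow."""

    def __init__(self, aggregate_bps: float, per_flow_bps: float, burst_bytes: int) -> None:
        self.aggregate = TokenBucket(aggregate_bps, burst_bytes)
        self.per_flow_bps = per_flow_bps
        self.burst_bytes = burst_bytes
        self.flows: Dict[Hashable, TokenBucket] = {}

    def admit(self, flow_id: Hashable, now: float, size: int) -> bool:
        if flow_id not in self.flows:
            self.flows[flow_id] = TokenBucket(self.per_flow_bps, self.burst_bytes)
        # The microflow check runs first, so a flow exceeding its own share
        # does not drain the aggregate bucket. Simplification: if the flow
        # conforms but the aggregate does not, the flow's tokens stay spent.
        return self.flows[flow_id].conforms(now, size) and self.aggregate.conforms(now, size)
```

A packet that fails `admit` is out of profile; whether it is then dropped or re-marked to best effort is a policy decision this sketch leaves open.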
7 The TeraPaths Project. BNL’s TeraPaths project: under the U.S. ATLAS umbrella, funded by DOE; research the use of DiffServ and MPLS/GMPLS in data-intensive distributed computing environments; develop theoretical models for LAN/WAN coordination; develop the software necessary to integrate end-site services and WAN services and provide end-to-end (host-to-host) guaranteed-bandwidth network paths to users; create, maintain, and expand a multi-site testbed for QoS network research. Collaboration includes BNL, University of Michigan, ESnet, Internet2, and SLAC; Tier 2 centers are being added; Tier 3s to follow.
8 End-to-End QoS… How? Within a site’s LAN (administrative domain), DiffServ works and scales fine: assign data flows to service classes; pass through “difficult” segments. But once packets leave the site… DSCP markings get reset. Unless… There is an ongoing effort by new high-speed network providers to offer reservation-controlled dedicated paths with predetermined bandwidth: ESnet’s OSCARS; Internet2’s BRUW and DRAGON. Reserved paths are configured to respect DSCP markings. Scalability is addressed by grouping data flows with the same destination and forwarding them to common tunnels/circuits.
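The scalability point (grouping data flows with the same destination into common tunnels/circuits) amounts to aggregating per-flow requests by destination. A minimal sketch follows, assuming a flow is described by a source host, a destination site, and a bandwidth in Mbps; the `Flow` tuple layout and the function name are illustrative and not taken from TeraPaths.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical flow description: (source host, destination site, requested Mbps).
Flow = Tuple[str, str, int]


def group_by_destination(flows: Iterable[Flow]) -> Dict[str, dict]:
    """Aggregate flows that share a destination into one tunnel request each."""
    tunnels: Dict[str, dict] = defaultdict(lambda: {"flows": [], "mbps": 0})
    for src, dst, mbps in flows:
        tunnels[dst]["flows"].append(src)   # remember which hosts share the tunnel
        tunnels[dst]["mbps"] += mbps        # the tunnel must carry the sum
    return dict(tunnels)
```

With this grouping, the number of WAN reservations grows with the number of destination sites rather than with the number of individual flows.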
9 Extend LAN QoS through WAN(s). End sites use the DiffServ architecture to prioritize data flows at the packet level: per-packet QoS marking; DSCP bits (64 classes of service); pass-through: make high-risk and 3rd-party segments QoS-friendly. WAN(s) connecting end sites forward prioritized traffic: MPLS tunnels of requested bandwidth (L3 service); circuits of requested bandwidth (L2 service); no changes to DSCP bits.
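The 64 classes of service follow from the DSCP field being 6 bits wide; it occupies the upper six bits of the IP TOS/DS byte. The sketch below shows one way a host application could mark its own packets through the standard socket API (available on Linux and most Unix-like systems). Marking at the host is an assumption made for illustration; marking can equally be done by the network devices. The choice of DSCP 46 (Expedited Forwarding) is an example value, not the testbed's setting.

```python
import socket

# Expedited Forwarding is a standard code point; using it here is an
# illustrative assumption, not a value taken from the TeraPaths testbed.
DSCP_EF = 46


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP value in the upper six bits of the TOS/DS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits (0-63)")
    return dscp << 2


def open_marked_socket(dscp: int) -> socket.socket:
    """Create a TCP socket whose outgoing IPv4 packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

Whether such a host-set marking survives depends on the first switch: it must be configured to trust the DSCP bits, otherwise the packet is re-marked at ingress.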
10 Automate LAN/WAN Setup. QoS reservation and network configuration grid middleware: bandwidth partitioning; access to QoS reservations through an interactive web interface, an API, and integration with popular grid data transfer tools (plug-ins); compatibility with a variety of networking hardware; coordination with WAN providers and remote LAN sites; user access control and accounting. Enables the reservation of end-to-end network resources to assure specific levels of bandwidth: the user requests bandwidth, start time, and duration; the system either grants the request or makes a “counter offer”; the network is set up end-to-end with a single user request.
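The request cycle on this slide (bandwidth, start time, duration in; grant or counter offer out) can be sketched as a small admission check against a link's capacity. This is a toy model under stated assumptions, not the TeraPaths scheduler: `Reservation`, `Scheduler`, the integer time units, and the rule that a counter offer proposes the bandwidth still free over the requested window are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Reservation:
    start: int      # seconds since an arbitrary epoch
    duration: int   # seconds
    bandwidth: int  # Mbps

    @property
    def end(self) -> int:
        return self.start + self.duration


class Scheduler:
    def __init__(self, capacity_mbps: int) -> None:
        self.capacity = capacity_mbps
        self.booked: List[Reservation] = []

    def available(self, start: int, end: int) -> int:
        """Minimum free bandwidth over the half-open window [start, end)."""
        # Booked usage can only rise at a reservation's start time, so it is
        # enough to sample the window start and each start inside the window.
        points = {start} | {r.start for r in self.booked if start < r.start < end}
        peak = max(
            sum(r.bandwidth for r in self.booked if r.start <= t < r.end)
            for t in points
        )
        return self.capacity - peak

    def request(self, req: Reservation) -> Tuple[str, Optional[Reservation]]:
        free = self.available(req.start, req.end)
        if free >= req.bandwidth:
            self.booked.append(req)
            return "granted", req
        if free > 0:
            # Counter offer: same window, reduced bandwidth. It is not booked
            # until the user accepts and resubmits it.
            return "counteroffer", Reservation(req.start, req.duration, free)
        return "rejected", None
```

A counter offer could just as well shift the start time instead of reducing the bandwidth; the sketch picks one option to keep the example short.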
11 End-to-End Configuration Models. [Figure: three ways to set up a path between end sites A and B across a WAN chain (WAN 1, WAN 2, …, WAN n): A. “star” model; B. “daisy chain” model; C. star/daisy chain hybrid model.]
12 End-to-End Configuration Models. Model C, the hybrid star/daisy chain, was selected as most feasible: end sites and WAN providers would rather not have to understand each other’s internal operations; end sites have no direct control over WAN providers, and vice versa; end sites typically deal only with the first provider of a WAN chain and have no direct business with downstream providers. Star model A requires extensive topology information at end sites and authorization at all involved WAN segments. Daisy chain model B requires wide adoption of new, sophisticated communication protocols; if the end sites cannot agree, the daisy chain wastes time and cycles.
13 Envisioned Overall Architecture. [Figure: TeraPaths at Sites A, B, C, and D, interconnected through a WAN chain (WAN 1, WAN 2, WAN 3). Legend: service invocation; data flow; peering.]
14 TeraPaths System Architecture. [Figure: Site A (initiator) and Site B (remote), each with web services (user manager, scheduler, …, router manager) above hardware drivers; QoS requests enter through the Web Interface and API; both sites connect to the WAN chain through WAN web services.]
15 TeraPaths Web Services. TeraPaths modules are implemented as “web services”: each network device (router/switch) is accessible/programmable from at least one management node; the site management node maintains databases (reservations, etc.) and distributes network programming by invoking web services on subordinate management nodes; remote requests to/from other sites invoke the corresponding site’s TeraPaths public web services layer; WAN services are invoked through clients appearing as proxy servers (standardization of interface, dynamic pluggability, fault tolerance). Web services benefits: standardized, reliable, and robust environment; implemented in Java for portability; accessible via web interface and/or API; integration with grid services.
16 TeraPaths Web Services Architecture. [Figure labels: Web Interface; API; Admin Module; Public Services (remote); Internal Services (local); NDC Database; protected network; WAN Services; WAN Services proxy.]
17 Reservation Negotiation. Capabilities of site reservation systems: yes/no vs. counteroffer(s); direct commit vs. temporary/commit/start. Algorithms: serial vs. parallel; counteroffer processing vs. multiple trials. TeraPaths (current implementation): counteroffers and temporary/commit/start; serial procedure (local site/remote site/WAN), limited iterations; user approval requested for counteroffers; WAN is yes/no and direct commit.
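The serial procedure with temporary holds can be sketched as follows: hold at the local site, then at the remote site, then ask the WAN (which only answers yes or no and commits directly), and roll back every hold if any step fails. This is a simplified illustration under stated assumptions; the `Site` class, the `negotiate` function, and the `wan_accepts` callback are hypothetical, and counteroffers, iteration limits, and the final "start" step are left out.

```python
from typing import Callable, Dict, List


class Site:
    """Toy site reservation system with temporary holds and commits."""

    def __init__(self, name: str, free_mbps: int) -> None:
        self.name = name
        self.free = free_mbps
        self.held: Dict[int, int] = {}    # reservation id -> held Mbps
        self.state: Dict[int, str] = {}   # reservation id -> lifecycle state

    def hold(self, rid: int, mbps: int) -> bool:
        if mbps > self.free:
            return False
        self.free -= mbps
        self.held[rid] = mbps
        self.state[rid] = "temporary"
        return True

    def commit(self, rid: int) -> None:
        self.state[rid] = "committed"

    def release(self, rid: int) -> None:
        self.free += self.held.pop(rid)
        self.state[rid] = "released"


def negotiate(rid: int, mbps: int, local: Site, remote: Site,
              wan_accepts: Callable[[int], bool]) -> bool:
    """Serial negotiation: local site, then remote site, then WAN."""
    held: List[Site] = []
    for site in (local, remote):
        if not site.hold(rid, mbps):
            for s in held:                # undo the holds made so far
                s.release(rid)
            return False
        held.append(site)
    if not wan_accepts(mbps):             # WAN is yes/no with direct commit
        for s in held:
            s.release(rid)
        return False
    for s in held:                        # every party agreed: make it firm
        s.commit(rid)
    return True
```

The temporary state is what makes the serial order safe: a site that said yes early keeps the bandwidth set aside while later parties are consulted, and gets it back if one of them refuses.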
18 Initial Experimental Testbed. Full-featured LAN QoS simulation testbed using a private network environment: two Cisco switches (same models as production hardware) interconnected with a 1 Gb link; two managing nodes, one per switch; four host nodes, two per switch; all nodes have dual 1 Gb Ethernet ports and are also connected to the BNL campus network; managing nodes run web services and database servers and have exclusive access to the switches. A demo of prototype TeraPaths functionality was given at SC’05.
19 LAN Setup. [Figure labels: hosts (trust DSCP; ACLs, policers); non-participating subnets (ACLs, policers); border router; to WAN; from WAN; admit, police, mark; admit, re-police, re-mark.]
20 New TeraPaths Testbed (end-to-end). [Figure labels: test host; NDC; BNL testbed edge router; BNL testbed (virtual) border router; BNL border router; ESnet; OSCARS; UltraLight; peering at Chicago; UltraLight router at UMich; TeraPaths.] 1st end-to-end fully automated route setup BNL-ESnet-UMich on 1:41pm EST
21 BNL-side Testbed
22 BNL-UMich route peering
23 Current BNL/UMich Testbed Details
24 Testbed Expansion in 2007. Sites for 2007 expansion: University of Michigan / Michigan State University; University of Chicago / Indiana University; University of Oklahoma / University of Texas at Arlington; Boston University / Harvard University; SLAC; more?
25 L2 issues. The special path appears as a single-hop connection between end sites (dynamic VLAN setup). Forwarding of priority flows to the special path now has to take place at the source end site instead of at the WAN ingress point. Non-priority flows must not have access to the special path. Pass-through issues. Stricter coordination is necessary between end sites and WAN segments (VLAN tags, routing tables). No mix and match between L2 and L3, but coexistence is required. Scalability issues.
26 Utilizing Dynamic Circuits. [Figure: hosts 1 to n on a DSCP-friendly LAN; a TeraPaths-controlled “virtual border” router (TeraPaths-controlled host router) directs flows with PBR, e.g., 1 to X, 2 to Y, onto VLANs #X and #Y; the trunked VLANs pass through the site’s border router and the local provider’s router to the WAN switch.]
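The policy-based routing (PBR) step, where selected flows are steered onto specific circuits (e.g., 1 to X, 2 to Y) while all other traffic stays on the regular routed path, can be modeled as a lookup table keyed by flow. The sketch below is an illustration only: `PolicyRouter`, the string return values, the VLAN IDs, and the documentation-range IP addresses in the usage are hypothetical and do not describe the testbed's router configuration.

```python
from typing import Dict, Tuple

FlowKey = Tuple[str, str]  # (source address, destination address)


class PolicyRouter:
    """Toy model of a router that steers selected flows onto circuit VLANs."""

    def __init__(self) -> None:
        self.policies: Dict[FlowKey, int] = {}

    def add_policy(self, src: str, dst: str, vlan: int) -> None:
        """Install a rule when a reservation starts."""
        self.policies[(src, dst)] = vlan

    def remove_policy(self, src: str, dst: str) -> None:
        """Remove the rule when the reservation ends."""
        self.policies.pop((src, dst), None)

    def next_hop(self, src: str, dst: str) -> str:
        # Only flows with an installed policy may use a circuit; everything
        # else falls through to the default routed path.
        vlan = self.policies.get((src, dst))
        return f"vlan {vlan}" if vlan is not None else "default"
```

For example, after `add_policy("192.0.2.1", "198.51.100.1", 3001)`, that flow resolves to `"vlan 3001"`, while any flow without a rule resolves to `"default"`, which is how non-priority traffic is kept off the special path.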
27 In progress/future work. End site support for dynamic circuits; interoperation with Internet2’s Dynamic Circuit Services (DCS, through DRAGON software); expansion to Tier 2 sites and beyond (including BNL’s production network); experimentation with grid data transfer tools (dCache, GridFTP, etc.); continual improvement of TeraPaths software and feature addition: reservation negotiation algorithms, grid software plug-ins, bandwidth partitioning schemes, grid AAA.
28 Thank you! Questions?
29 Simulated (testbed) and Actual Traffic. [Figure captions: “BNL to UMich: 2 bbcp dtd transfers with iperf background traffic through ESnet MPLS tunnel”; “Testbed demo: competing iperf streams”.]
30 BNL Site Infrastructure. [Figure labels: TeraPaths Domain Controller; LAN/MPLS; LAN QoS; M10; ingress/egress; OSCARS; ESnet; MPLS/L2 requests; Bandwidth Requests & Releases; traffic identification (addresses, port #, DSCP bits); grid AAA; network usage policy; monitoring; data transfer management; GridFTP & dCache/SRM SE; remote TeraPaths; Remote LAN QoS requests.]