TeraPaths: End-to-End Network Path QoS Configuration Using Cross-Domain Reservation Negotiation
Bruce Gibbard, Dimitrios Katramatos, Shawn McKee, Dantong Yu
GridNets 2006
2 Outline
- Introduction
- The TeraPaths project
- The TeraPaths system architecture
- Experimental deployment and testing
- Future work
3 Introduction
- The problem: support efficient, reliable, and predictable peta-scale data movement in modern high-speed networks
  - Multiple data flows with different priorities
  - Default “best effort” network behavior can cause performance and service disruption problems
- The solution: enhance network functionality with QoS features to allow prioritization and protection of data flows
4 e.g., ATLAS Data Distribution
[Diagram: ATLAS data distribution hierarchy, from the ATLAS experiment and the CERN Tier 0+1 Online System, through Tier 1 sites (e.g., BNL), Tier 2 sites (NE, SW, Midwest, SLAC), and Tier 3 sites (e.g., UMich muon calibration), down to Tier 4 workstations; link rates on the diagram range from ~PBps and ~10-40 Gbps near the source through ~10 Gbps, ~GBps, and ~Gbps to Mbps at the edges]
5 Combining LAN QoS with MPLS
- End sites use the DiffServ architecture to prioritize data flows at the packet level. DiffServ supports:
  - Per-packet QoS marking
  - IP precedence (6+2 classes of service)
  - DSCP (64 classes of service)
- WAN(s) connecting end sites direct prioritized traffic through MPLS tunnels of requested bandwidth, configured to preserve packet markings.
- MPLS/GMPLS:
  - Uses RSVP-TE
  - Is QoS compatible
  - Supports virtual tunnels, constraint-based routing, policy-based routing
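For concreteness, here is a minimal Java sketch of DSCP marking at the application level: the sender sets the DSCP bits of its own packets so that DiffServ-enabled switches can classify the flow. The chosen code point (EF = 46), destination host, and port are illustrative assumptions, not part of TeraPaths.

```java
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Minimal sketch: mark a data flow with a DSCP value so that
 * DiffServ-enabled LAN switches can classify and prioritize it.
 * The DSCP code point (EF = 46), destination host, and port are
 * illustrative assumptions, not part of the TeraPaths code base.
 */
public class DscpMarkedTransfer {
    public static void main(String[] args) throws Exception {
        int dscp = 46;                    // EF (Expedited Forwarding)
        int trafficClass = dscp << 2;     // DSCP occupies the upper 6 bits of the TOS byte

        try (Socket socket = new Socket()) {
            socket.setTrafficClass(trafficClass);   // best-effort hint; the OS may ignore it
            socket.connect(new InetSocketAddress("data.sink.example.org", 5001));
            OutputStream out = socket.getOutputStream();
            out.write(new byte[64 * 1024]);         // stand-in for a bulk data transfer
        }
    }
}
```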
6 Prioritized vs. Best Effort Traffic
7 The TeraPaths Project
- The TeraPaths project investigates the integration and use of LAN QoS and MPLS/GMPLS-based differentiated network services in the ATLAS data-intensive distributed computing environment, in order to manage the network as a critical resource
- DOE: the collaboration includes BNL and the University of Michigan, as well as OSCARS (ESnet), Lambda Station (FNAL), and DWMI (SLAC)
- NSF: BNL participates in UltraLight to provide the network advances required for enabling petabyte-scale analysis of globally distributed data
- NSF: BNL participates in a new network initiative, PLaNetS (Physics Lambda Network System), led by Caltech
8 Automate MPLS/LAN QoS Setup
- QoS reservation and network configuration system for data flows
- Access to QoS reservations:
  - Manually, through an interactive web interface
  - From a program, through APIs
- Compatible with a variety of networking components
- Cooperation with WAN providers and remote LAN sites
- Access control and accounting
- System monitoring
- Design goal: enable the reservation of end-to-end network resources to assure a specified “Quality of Service”
  - The user requests minimum bandwidth, start time, and duration
  - The system either grants the request or makes a “counter offer”
  - The end-to-end network path is set up with one simple user request
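As a rough illustration of programmatic access, the following hedged Java sketch models a reservation request (minimum bandwidth, start time, duration) and a grant-or-counter-offer response. All names here (ReservationRequest, ReservationService, reserve) are hypothetical and do not reflect the actual TeraPaths API.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical sketch of a programmatic QoS reservation request.
 * The types and method names below are illustrative only; the actual
 * TeraPaths API is exposed through its web services layer.
 */
public class ReserveExample {

    record ReservationRequest(String srcHost, String dstHost,
                              long minBandwidthMbps, Instant start, Duration duration) {}

    interface ReservationService {
        /** Returns the granted reservation, or a counter offer with modified parameters. */
        ReservationRequest reserve(ReservationRequest request);
    }

    static void requestPath(ReservationService service) {
        ReservationRequest wanted = new ReservationRequest(
                "dcache.bnl.example", "umich.tier2.example",
                800,                                     // minimum bandwidth in Mb/s
                Instant.parse("2006-09-01T18:00:00Z"),   // requested start time
                Duration.ofHours(4));                    // requested duration

        ReservationRequest answer = service.reserve(wanted);
        if (answer.equals(wanted)) {
            System.out.println("Reservation granted as requested.");
        } else {
            System.out.println("Counter offer: " + answer);  // user decides whether to accept
        }
    }
}
```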
9 End-to-End Configuration Models
[Diagram: three models for coordinating end sites A and B across intervening WANs (WAN 1, WAN 2, ..., WAN n): A. the “star” model, B. the “daisy chain” model, and C. a star/daisy chain hybrid model that uses a WAN chain]
10 Envisioned Overall Architecture
[Diagram: TeraPaths instances at Sites A-D peer with one another and invoke services on a chain of WANs (WAN 1, WAN 2, WAN 3); the arrows distinguish service invocation, data flow, peering, and the WAN chain]
11 TeraPaths System Architecture
[Diagram: Site A (initiator) and Site B (remote) each run a user manager, scheduler, site monitor, router manager, etc., on top of hardware drivers; QoS requests enter through a web page, APIs, or the command line, the sites interact through their web services, and the WAN chain is reached through WAN web services and WAN monitoring]
12 TeraPaths Web Services
- TeraPaths modules are implemented as “web services”
  - Each network device (router/switch) is accessible/programmable from at least one management node
  - The site management node maintains databases (reservations, etc.) and distributes network programming by invoking web services on subordinate management nodes
  - Remote requests to/from other sites are invoked via the corresponding site’s TeraPaths public web services layer
  - WAN services are invoked through proxy servers (standardization of interface, dynamic pluggability, fault tolerance), which enables interoperability among different implementations of web services
- Web services benefits:
  - Standardized, reliable, and robust environment
  - Implemented in Java and completely portable
  - Accessible via web clients and/or APIs
  - Compatible with and easily portable to Grid services and the Web Services Resource Framework (WSRF in GT4)
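The following Java sketch illustrates only the proxy idea behind the WAN services layer: site code programs against one stable interface, while pluggable adapters (e.g., one that would wrap OSCARS calls) and a fail-over proxy provide the interchangeability and fault tolerance mentioned above. All interface and class names are assumptions made for illustration.

```java
/**
 * Sketch of the proxy idea behind the WAN services layer: site code talks
 * to one stable interface, while pluggable adapters translate the calls to
 * whatever a particular WAN provider (e.g., OSCARS) actually exposes.
 * All names here are illustrative assumptions, not the real TeraPaths classes.
 */
public class WanProxySketch {

    /** Stable interface the site-level services program against. */
    interface WanCircuitService {
        String requestCircuit(String ingress, String egress, long bandwidthMbps);
    }

    /** Adapter that would wrap the OSCARS web service calls (omitted here). */
    static class OscarsAdapter implements WanCircuitService {
        @Override
        public String requestCircuit(String ingress, String egress, long bandwidthMbps) {
            // In a real deployment this would invoke the OSCARS web service;
            // here we simply return a placeholder identifier.
            return "oscars-circuit-" + ingress + "-" + egress + "-" + bandwidthMbps;
        }
    }

    /** Proxy that adds fault tolerance by falling back to an alternate adapter. */
    static class FailoverProxy implements WanCircuitService {
        private final WanCircuitService primary;
        private final WanCircuitService fallback;

        FailoverProxy(WanCircuitService primary, WanCircuitService fallback) {
            this.primary = primary;
            this.fallback = fallback;
        }

        @Override
        public String requestCircuit(String ingress, String egress, long bandwidthMbps) {
            try {
                return primary.requestCircuit(ingress, egress, bandwidthMbps);
            } catch (RuntimeException e) {
                return fallback.requestCircuit(ingress, egress, bandwidthMbps);
            }
        }
    }
}
```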
13 TeraPaths Web Services Architecture
[Diagram: a web interface and public services front the internal services, admin module, and NDC database inside the protected network; local and remote clients use the API, and WAN services are reached through a WAN services proxy]
14 Site Bandwidth Partitioning Scheme
- Minimum best-effort traffic
- Dynamic bandwidth allocation
  - Shared dynamic class(es)
  - Dynamic microflow policing: mark packets within a class using DSCP bits, police at ingress, trust DSCP bits downstream
- Dedicated static classes: aggregate flow policing
- Shared static classes: aggregate and microflow policing
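A toy Java sketch of the bookkeeping such a partitioning scheme implies: a protected best-effort floor, with the remaining capacity allocated and released per DSCP-marked class. The class names, numbers, and allocation rule are illustrative assumptions, not the TeraPaths implementation.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy sketch of site bandwidth partitioning: a fixed share is kept for
 * best-effort traffic, and the remaining capacity is handed out and
 * reclaimed per DSCP-marked class as reservations come and go.
 * The numbers and the allocation rule are illustrative assumptions only.
 */
public class BandwidthPartition {
    private final long linkCapacityMbps;
    private final long bestEffortFloorMbps;
    private final Map<Integer, Long> allocatedByDscp = new HashMap<>();

    public BandwidthPartition(long linkCapacityMbps, long bestEffortFloorMbps) {
        this.linkCapacityMbps = linkCapacityMbps;
        this.bestEffortFloorMbps = bestEffortFloorMbps;
    }

    /** Try to allocate bandwidth for a flow marked with the given DSCP value. */
    public synchronized boolean allocate(int dscp, long mbps) {
        long inUse = allocatedByDscp.values().stream().mapToLong(Long::longValue).sum();
        if (inUse + mbps > linkCapacityMbps - bestEffortFloorMbps) {
            return false;   // would eat into the minimum best-effort share
        }
        allocatedByDscp.merge(dscp, mbps, Long::sum);
        return true;
    }

    /** Release the bandwidth when a reservation expires. */
    public synchronized void release(int dscp, long mbps) {
        allocatedByDscp.merge(dscp, -mbps, Long::sum);
    }
}
```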
15 Reservation Negotiation
- Capabilities of site reservation systems:
  - Yes/no vs. counteroffer(s)
  - Direct commit vs. temporary/commit/start
  - Algorithms: serial vs. parallel, counteroffer processing vs. multiple trials
- TeraPaths (current implementation):
  - Counteroffers and temporary/commit/start
  - Serial procedure (local site / remote site / WAN), limited iterations
  - User approval requested for counteroffers
  - WAN (OSCARS) is yes/no and direct commit
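The Java sketch below mimics the serial negotiation outlined above: propose to the local site, the remote site, and the WAN in turn, treat any modified reply as a counter offer, retry for a limited number of iterations, and commit only when all three agree. Every type and method name is an assumption made for illustration, not actual TeraPaths code.

```java
import java.util.Optional;

/**
 * Sketch of a serial reservation negotiation: try the local site, then the
 * remote site, then the WAN, accepting counter offers for a limited number
 * of iterations. All types and names are illustrative assumptions.
 */
public class NegotiationSketch {

    record Offer(long bandwidthMbps, long startEpochSec, long durationSec) {}

    interface Domain {
        /** Returns the same offer if acceptable, a counter offer otherwise, or empty if refused. */
        Optional<Offer> propose(Offer offer);
        void commit(Offer offer);    // second phase of temporary/commit/start
    }

    static Optional<Offer> negotiate(Offer initial, Domain local, Domain remote, Domain wan,
                                     int maxIterations) {
        Offer current = initial;
        for (int i = 0; i < maxIterations; i++) {
            Offer start = current;
            for (Domain d : new Domain[] {local, remote, wan}) {   // serial: local / remote / WAN
                Optional<Offer> answer = d.propose(current);
                if (answer.isEmpty()) {
                    return Optional.empty();                       // hard refusal somewhere
                }
                current = answer.get();                            // may be a counter offer
            }
            if (current.equals(start)) {                           // all three agreed
                local.commit(current);
                remote.commit(current);
                wan.commit(current);
                return Optional.of(current);
            }
            // otherwise ask the user whether to retry with the counter offer (omitted here)
        }
        return Optional.empty();
    }
}
```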
16 Initial Experimental Testbed
- Full-featured LAN QoS simulation testbed using a private network environment:
  - Two Cisco switches (same models as production hardware) interconnected with a 1 Gb link
  - Two managing nodes, one per switch
  - Four host nodes, two per switch
  - All nodes have dual 1 Gb Ethernet ports, also connected to the BNL campus network
  - Managing nodes run the web services and database servers and have exclusive access to the switches
- Demo of prototype TeraPaths functionality given at SC’05
17 Simulated (testbed) and Actual Traffic
[Figure: two throughput plots]
- BNL to UMich: 2 bbcp dtd transfers with iperf background traffic through an ESnet MPLS tunnel
- Testbed demo: competing iperf streams
18 New TeraPaths Testbed (end-to-end)
[Diagram: BNL testbed edge router, BNL testbed (virtual) border router, BNL border router, ESnet (OSCARS), UltraLight, TeraPaths peering at Chicago, UltraLight router at UMich, test host, NDC]
- 1st end-to-end fully automated route setup, BNL-ESnet-UMich, at 1:41 pm EST
19 In Progress
- Develop TeraPaths-aware tools (e.g., iperf, bbcp, GridFTP)
- Dynamically configure and partition QoS-enabled paths to meet time-constrained network requirements
- Develop a site-level network resource manager for multiple VOs vying for limited WAN resources
- Integrate with software from other network projects: OSCARS, Lambda Station, and DWMI
- Collaborate on creating standards for the support of the “daisy chain” setup model
20 Future Work
- Support dynamic bandwidth/routing adjustments based on resource usage policies and network monitoring data (provided by DWMI)
- Extend MPLS within a site’s LAN backbone
- Further goal: widen the deployment of QoS capabilities to Tier 1 and Tier 2 sites and create services to be honored/adopted by the CERN ATLAS/LHC Tier 0
21 Route Planning with MPLS (future capability)
[Diagram: TeraPaths site monitoring working with WAN monitoring and WAN web services across the WAN]
22 BNL Site Infrastructure
[Diagram: the TeraPaths resource manager coordinates LAN QoS and MPLS setup at BNL; labels include data transfer management (GridFtp & dCache/SRM SE), grid AAA, network usage policy, monitoring, traffic identification (addresses, port #, DSCP bits), bandwidth requests & releases, MPLS requests to OSCARS/ESnet (ingress/egress, M10), remote TeraPaths, and remote LAN QoS requests]