The DataTAG Project
CHEP'2003 Conference, 25 March 2003, UCSD/La Jolla, USA
Olivier H. Martin / CERN
http://www.datatag.org

Funding agencies & cooperating networks

EU collaborators: Brunel University, CERN, CLRC, CNAF, DANTE, INFN, INRIA, NIKHEF, PPARC, UvA, University of Manchester, University of Padova, University of Milano, University of Torino, UCL

US collaborators: Northwestern University, ANL, UIC, Caltech, Fermilab, University of Chicago, University of Michigan, SLAC, StarLight, FSU, Globus, Indiana, Wisconsin

Project information & goals
Two-year project, started on 1 January 2002; following the successful first-year review, an extension until 1Q04 is likely
Budget: 3.9 MEUR (roughly 50% circuit cost & hardware, the remainder manpower)
Grid-related network research: high-performance transport protocols, inter-domain QoS, advance bandwidth reservation
Interoperability between European and US Grids

Workplan
WP1: Establishment of a high performance intercontinental Grid testbed (CERN)
WP2: High performance networking (PPARC)
WP3: Bulk data transfer validations and application performance monitoring (UvA)
WP4: Interoperability between Grid domains (INFN)
WP5 & WP6: Dissemination and project management (CERN)

Interoperability framework (diagram): the GLUE effort, coordinated through the HICB, links the EU side (DataGrid, DataTAG WP4) with the US side (GriPhyN, PPDG, iVDGL), feeding the HEP experiments, LCG and the LCG middleware selection.

DataTAG testbed status

Evolution of the testbed
2.5G circuit in operation since August 20, 2002
On request from the partners, the testbed evolved from a simple layer 3 testbed into an extremely rich, probably unique, multi-vendor layer 2 & layer 3 testbed (Alcatel, Cisco, Juniper)
Direct extensions to Amsterdam (UvA)/SURFnet (10G) & Lyon (INRIA)/VTHD (2.5G)
VPN layer 2 extension to INFN/CNAF over GEANT & GARR using Juniper's MPLS
In order to guarantee exclusive access to the testbed, a reservation application has been developed; it has proved to be essential

DataTAG connectivity (map): major 2.5/10 Gbps circuits between Europe & USA: the CERN-StarLight circuit (2.5G, upgrading to 10G), with GEANT, GARR-B (IT), SuperJANET4 (UK), SURFnet (NL, 10G), INRIA/ATRIUM-VTHD (FR) and a 3*2.5G layer 2 VPN on the European side, and Abilene, ESnet, MREN and STAR TAP on the US side.

Multi-vendor layer 2/3 testbed (diagram): CERN (Geneva), StarLight (Chicago) and INFN (Bologna) interconnected at 2.5/10 Gbps using Juniper, Alcatel, Cisco and Extreme Summit5i equipment, GigE attachments and A1670 multiplexers (layer 2 over SDH), with onward connectivity to Abilene, Canarie, ESnet, GEANT, SURFnet and INRIA (Lyon).

Phase I (iGRID2002): layer 2 configuration

Phase II: generic layer 3 configuration (Oct. 2002 - Feb. 2003) (diagram): servers behind GigE switches at CERN and StarLight, interconnected by Cisco 7606 routers over the 2.5 Gbps circuit.

Phase III: layer 2/3 configuration (March 2003) (diagram): at CERN, servers and GigE switches feed an Alcatel 7770, a Cisco 7606 and a Juniper M10 through an A1670 multiplexer onto the 2.5G circuit to StarLight, plus a 10G path via a Cisco ONS 15454 towards UvA/SURFnet; layer 1/2 extensions reach INRIA over VTHD and INFN/CNAF over GEANT & GARR, with Abilene, ESnet and Canarie reachable at StarLight.

Main achievements
GLUE interoperability effort with DataGrid, iVDGL & Globus: GLUE testbed & demos
VOMS design and implementation in collaboration with DataGrid; VOMS evaluation within iVDGL underway
Integration of GLUE-compliant components in DataGrid and VDT middleware
Internet land speed records have been beaten one after the other by DataTAG project members and/or teams closely associated with DataTAG:
Atlas Canada lightpath experiment (iGRID2002)
New Internet2 land speed record (I2 LSR) by NIKHEF/Caltech team (SC2002)
Scalable TCP, HSTCP, GridDT & FAST experiments (DataTAG partners & Caltech)
Intel 10GigE tests between CERN (Geneva) and SLAC (Sunnyvale) (Caltech, CERN, Los Alamos National Laboratory, SLAC): 2.38 Gbps sustained rate, single TCP/IP flow, 1 TB in one hour (S. Ravot/Caltech)

10GigE Data Transfer Trial
On Feb. 27-28, a terabyte of data was transferred in 3700 seconds by S. Ravot of Caltech between the Level3 PoP in Sunnyvale, near SLAC, and CERN, through the TeraGrid router at StarLight, from memory to memory with a single TCP/IP stream. This achievement translates to an average rate of 2.38 Gbps (using large windows and 9 KB "jumbo frames"). This beat the former record by a factor of ~2.5 and used the US-CERN link at 99% efficiency.
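A quick back-of-the-envelope check of the quoted rate (a minimal sketch; it assumes the "terabyte" is a binary terabyte, 2^40 bytes, which is what makes the 2.38 Gbps figure come out):

```python
# Back-of-the-envelope check of the 2.38 Gbps figure.
# Assumption: "a terabyte" means 2**40 bytes (TiB); with 10**12 bytes the
# same transfer time would give about 2.16 Gbps instead.
bytes_transferred = 2**40          # 1 TiB
duration_s = 3700                  # transfer time in seconds
rate_gbps = bytes_transferred * 8 / duration_s / 1e9
print(f"{rate_gbps:.2f} Gbps")     # ~2.38 Gbps
```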

PFLDnet workshop (CERN, Feb 3-4)
1st workshop on protocols for fast long-distance networks
Co-organized by Caltech & DataTAG, sponsored by Cisco
65 attendees; most key actors were present, e.g. S. Floyd, T. Kelly, S. Low
Headlines: HighSpeed TCP (HSTCP), Limited Slow-Start, QuickStart, XCP, Tsunami (UDP), Grid DT, Scalable TCP, FAST (Fast AQM (Active Queue Management) Scalable TCP)

Main TCP issues
Does not scale to some environments: high speed, high latency, noisy links
Unfair behaviour with respect to RTT, MSS and bandwidth
Widespread use of multiple streams to compensate for the inherent TCP/IP unfairness (e.g. GridFTP, bbFTP): a bandage rather than a cure
New TCP/IP proposals aim to restore performance in single-stream environments

TCP dynamics (10 Gbps, 100 ms RTT, 1500-byte packets)
Window size (W) = bandwidth * round-trip time
W_bits = 10 Gbps * 100 ms = 1 Gb
W_packets = 1 Gb / (8 * 1500) = 83,333 packets
Standard Additive Increase Multiplicative Decrease (AIMD) mechanism:
W = W/2 (halving the congestion window on a loss event)
W = W + 1 (increasing the congestion window by one packet every RTT)
Time to recover from W/2 to W (congestion avoidance) at 1 packet per RTT: RTT * W_packets/2 = 1.157 hours
In practice, 1 packet per 2 RTTs because of delayed ACKs, i.e. 2.31 hours
Packets per second: W_packets / RTT = 833,333 packets
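These numbers follow directly from the bandwidth-delay product; a minimal sketch reproducing them (plain arithmetic, not DataTAG code):

```python
# Bandwidth-delay product and AIMD recovery time for the slide's example.
bandwidth_bps = 10e9      # 10 Gbps
rtt_s = 0.100             # 100 ms
mss_bytes = 1500

w_bits = bandwidth_bps * rtt_s                 # 1e9 bits
w_packets = w_bits / (8 * mss_bytes)           # ~83,333 packets

# Congestion avoidance grows the window by 1 packet per RTT,
# so recovering from W/2 to W takes W/2 RTTs.
recovery_h = (w_packets / 2) * rtt_s / 3600    # ~1.157 hours
recovery_delayed_ack_h = recovery_h * 2        # ~2.31 hours with delayed ACKs

pkts_per_second = w_packets / rtt_s            # ~833,333 packets/s
print(w_packets, recovery_h, recovery_delayed_ack_h, pkts_per_second)
```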

Maximum throughput with standard window sizes as a function of the RTT (throughput = window / RTT):

RTT (ms)   16 KB window   32 KB window   64 KB window
25         640 KB/s       1.28 MB/s      2.56 MB/s
50         320 KB/s       640 KB/s       1.28 MB/s
100        160 KB/s       320 KB/s       640 KB/s

The best throughput one can hope for, on a standard intra-European path with 50 ms RTT, is only about 10 Mb/s!
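The whole table follows from throughput = window / RTT; a small sketch to regenerate it:

```python
# Maximum TCP throughput limited by the receive window: rate = window / RTT.
windows_kb = [16, 32, 64]
rtts_ms = [25, 50, 100]

for rtt in rtts_ms:
    row = []
    for win in windows_kb:
        kb_per_s = win / (rtt / 1000.0)          # KB per second
        row.append(f"{kb_per_s:.0f} KB/s")
    print(f"RTT {rtt:3d} ms: " + ", ".join(row))
# e.g. a 64 KB window at 50 ms RTT gives 1280 KB/s, i.e. about 10 Mb/s.
```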

HSTCP (IETF draft, August 2002)
Modifies TCP's response function in order to allow high performance in high-speed environments and in the presence of packet losses
Target: 10 Gbps performance in 100 ms round-trip time (RTT) environments
Acceptable fairness when competing with standard TCP in environments with packet loss rates of 10^-4 or 10^-5
Standard TCP response function: W_MSS = 1.2/sqrt(p), equivalent to W/1.5 RTTs between losses

HSTCP response function (additive increase: HSTCP vs standard TCP; standard-TCP values in parentheses)

Packet drop rate   Avg. congestion window   RTTs between losses
10^-2              12                       8
10^-3              38                       25
10^-4              263 (120)                38 (80)
10^-5              1795 (379)               57 (252)
10^-6              12279 (1200)             83 (800)
10^-7              83981 (3795)             123 (2530)
...
10^-10             26864653 (120000)        388 (80000)
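For reference, a sketch of the HighSpeed TCP response function as specified in the Floyd draft (later RFC 3649); the constants Low_Window = 38 at p = 10^-3 and High_Window = 83000 at p = 10^-7 come from that draft, and the output approximately reproduces the HSTCP column of the table above:

```python
import math

# HighSpeed TCP response function (Floyd draft / RFC 3649 style).
# Below Low_Window the standard TCP response w = 1.2/sqrt(p) applies;
# above it, the window grows as a power law of the loss rate.
LOW_WINDOW, LOW_P = 38, 1e-3        # standard TCP: 1.2/sqrt(1e-3) ~ 38
HIGH_WINDOW, HIGH_P = 83000, 1e-7   # target: ~10 Gbps at 100 ms RTT, 1500 B MSS

S = (math.log(HIGH_WINDOW) - math.log(LOW_WINDOW)) / \
    (math.log(HIGH_P) - math.log(LOW_P))

def standard_tcp_window(p):
    return 1.2 / math.sqrt(p)

def hstcp_window(p):
    if p >= LOW_P:                  # low windows: behave like standard TCP
        return standard_tcp_window(p)
    return LOW_WINDOW * (p / LOW_P) ** S

for p in (1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-10):
    print(f"p={p:.0e}  HSTCP ~{hstcp_window(p):10.0f}  "
          f"standard ~{standard_tcp_window(p):10.0f} MSS")
```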

Relative fairness (HSTCP vs standard TCP)

Packet drop rate   Relative fairness   Aggregate window   Aggregate bandwidth
10^-2              1                   24                 2.8 Mbps
10^-3              1                   76                 9.1 Mbps
10^-4              2.2                 383                45.9 Mbps
10^-5              4.7                 2174               260.8 Mbps
10^-6              10.2                13479              1.6 Gbps
10^-7              22.1                87776              10.5 Gbps
...
10^-10             223.9               26984653           3238.1 Gbps

N.B. Aggregate bandwidth used by one standard TCP plus one HSTCP connection with 100 ms RTT and 1500-byte MSS

Limited Slow-Start (IETF draft, August 2002)
The current "slow-start" procedure can result in increasing the congestion window by thousands of packets in a single RTT: massive packet losses, counter-productive
Limited slow-start introduces a new parameter, max_ssthresh, to limit the increase of the congestion window when max_ssthresh < cwnd < ssthresh (recommended value: 100 MSS)
When cwnd > max_ssthresh:
K = int(cwnd / (0.5 * max_ssthresh))
cwnd += int(MSS/K) for each received ACK, instead of cwnd += MSS
This ensures that cwnd is increased by at most max_ssthresh/2 per RTT, i.e. the per-ACK increment is 1/2 MSS when cwnd = max_ssthresh, 1/3 MSS when cwnd = 1.5*max_ssthresh, etc. (see the sketch below)
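A minimal sketch of the per-ACK update described above, with window and increment expressed in MSS units (illustrative only, not the reference implementation):

```python
def limited_slow_start_increment(cwnd, max_ssthresh=100):
    """Per-ACK congestion window increment in MSS units (draft-floyd style).

    Below max_ssthresh this is ordinary slow start (one MSS per ACK);
    above it the increment shrinks so cwnd grows by at most
    max_ssthresh/2 MSS per round-trip time.
    """
    if cwnd <= max_ssthresh:
        return 1.0                        # standard slow start
    k = int(cwnd / (0.5 * max_ssthresh))  # K = 2 at max_ssthresh, 3 at 1.5x, ...
    return 1.0 / k

# Example: per-ACK increment (in MSS) at a few window sizes.
for w in (50, 100, 150, 1000):
    print(w, limited_slow_start_increment(w))
```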

Limited Slow-Start (cont.)
With limited slow-start it takes:
log(max_ssthresh) RTTs to reach the point where cwnd = max_ssthresh
log(max_ssthresh) + (cwnd - max_ssthresh)/(max_ssthresh/2) RTTs to reach a congestion window of cwnd when cwnd > max_ssthresh
Thus, with max_ssthresh = 100 MSS, it would take 836 RTTs to reach a congestion window of 83,000 packets, compared to 16 RTTs otherwise (assuming NO packet drops)
Transient queue limited to 100 packets, against 32,000 packets otherwise!
Limited slow-start could be used in conjunction with rate-based pacing

Slow-start vs limited slow-start (plot): congestion window size (MSS, log scale) vs time (RTT), comparing standard and limited slow-start for a 10 Gbps path (RTT = 100 ms, MSS = 1500 B), with ssthresh = 83,333 and max_ssthresh = 100 marked.

Scalable TCP (Tom Kelly, Cambridge)
The responsiveness of a traditional TCP connection to loss events is proportional to the window size and the round-trip time; with Scalable TCP the responsiveness is proportional to the round-trip time only.
Scalable TCP alters the congestion window, cwnd, on each acknowledgement in an RTT without loss:
cwnd -> cwnd + 0.01
and for each window experiencing loss, cwnd is reduced:
cwnd -> cwnd - 0.125*cwnd
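A minimal sketch of these two update rules (the constants follow Kelly's proposal; this is illustrative pseudocode, not the actual kernel patch):

```python
A = 0.01   # per-ACK increase, in MSS, during a loss-free RTT
B = 0.125  # multiplicative decrease factor on a loss event

def scalable_on_ack(cwnd):
    """Scalable TCP: each ACK in a loss-free RTT grows cwnd by a fixed amount."""
    return cwnd + A

def scalable_on_loss(cwnd):
    """Scalable TCP: a loss event shrinks cwnd by 12.5%."""
    return cwnd - B * cwnd

# Over one loss-free RTT roughly cwnd ACKs arrive, so the window grows by
# about 1% per RTT regardless of its size: exponential, rate-independent growth.
w = 1000.0
for _ in range(1000):      # roughly one RTT's worth of ACKs at cwnd ~ 1000
    w = scalable_on_ack(w)
print(w)                   # 1010.0: ~1% growth in one RTT
```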

Scalable TCP (2)
As a result, the responsiveness of a connection with 200 ms RTT changes as follows:
Standard TCP: packet loss recovery time is nearly 3 minutes at 100 Mbit/s and 28 minutes at 1 Gbit/s
Scalable TCP: packet loss recovery time is about 2.7 s at any rate
Scalable TCP has been implemented on a Linux 2.4.19 kernel, together with gigabit kernel modifications: removing the copying of small packets in the SysKonnect driver and scaling the device-driver decoupling buffers to match Gigabit Ethernet devices
Initial performance results suggest that the variant is capable of providing high speed in a robust manner using only sender-side modifications: up to 400% improvement over standard Linux 2.4.19
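A quick numerical check of those recovery times (a sketch; it uses the textbook one-packet-per-RTT congestion-avoidance model and the Scalable TCP constants from the previous slide):

```python
import math

RTT = 0.2          # 200 ms
MSS_BITS = 1500 * 8

def standard_recovery_time(rate_bps, rtt=RTT):
    # After halving, standard TCP regains W/2 packets at 1 packet per RTT.
    w = rate_bps * rtt / MSS_BITS
    return (w / 2) * rtt

def scalable_recovery_time(rtt=RTT, a=0.01, b=0.125):
    # Growing back from (1-b)*W to W at ~1% per RTT is rate-independent.
    return math.log(1 / (1 - b), 1 + a) * rtt

print(standard_recovery_time(100e6) / 60)   # ~2.8 minutes at 100 Mbit/s
print(standard_recovery_time(1e9) / 60)     # ~28 minutes at 1 Gbit/s
print(scalable_recovery_time())             # ~2.7 seconds at any rate
```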

Scalable TCP (3) (plot): packet loss recovery time vs window size for standard TCP/IP and Scalable TCP/IP.

QuickStart
Initial assumption: routers have the ability to determine whether the destination link is significantly under-utilized
Similar capabilities are also assumed by Active Queue Management (AQM) and Explicit Congestion Notification (ECN) techniques
Coarse-grain mechanism focusing only on the initial window size; allows incremental deployment
New IP & TCP options: QS request (IP) & QS response (TCP)
Initial window size = rate * RTT * MSS

QuickStart (cont.)
The QS Request (QSR) is a new IP option carried in SYN/SYN-ACK IP packets
Two TTL (Time To Live) fields: IP & QSR
Sending rate expressed in packets per 100 ms, so the maximum rate is 2560 packets/second; rate-based pacing is assumed
A non-participating router ignores the QSR option and therefore does not decrease the QSR TTL
A participating router deletes the QSR option or resets the initial sending rate: it can accept or reduce the initial rate (see the sketch below)
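A rough sketch of how a sender might turn an approved QuickStart rate into an initial window, following the mechanism described above (the function and field names are illustrative, not the exact draft encoding):

```python
def quickstart_initial_cwnd(approved_rate_pkts_per_100ms, rtt_s, mss_bytes,
                            ip_ttl_delta, qsr_ttl_delta):
    """Turn an approved QuickStart rate into an initial congestion window (bytes).

    If any router on the path ignored the QS Request, the QSR TTL was
    decremented fewer times than the IP TTL, so the approval is not valid:
    fall back to a standard initial window.
    """
    if qsr_ttl_delta != ip_ttl_delta:
        return 2 * mss_bytes                       # conservative standard start
    rate_pkts_per_s = approved_rate_pkts_per_100ms * 10
    # Initial window = rate * RTT * MSS, then paced out over the first RTT.
    return int(rate_pkts_per_s * rtt_s * mss_bytes)

# Example: 100 packets/100 ms approved on a 100 ms RTT path, 1500-byte MSS.
print(quickstart_initial_cwnd(100, 0.1, 1500, ip_ttl_delta=12, qsr_ttl_delta=12))
```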

XCP: congestion window set by the bottleneck router

FAST (S. Low/Caltech)
Intellectual advances: a new mathematical theory of large-scale networks
FAST = Fast Active-queue-managed Scalable TCP
Innovative implementation: TCP stack in Linux
Experimental facilities: high energy physics networks; Caltech and CERN/DataTAG site equipment (switches, routers, servers); Level(3) SNV-CHI OC192 link; DataTAG link; Cisco 12406, GbE and 10 GbE port cards donated; Abilene, CalREN2, ...
Unique features: delay (RTT) as the congestion measure; a feedback loop designed for a resilient window and stable throughput (see the sketch below)
netlab.caltech.edu
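To illustrate what "delay (RTT) as congestion measure" means in practice, here is a toy delay-based window update in the spirit of FAST; the update form and the alpha/gamma parameters follow the later FAST TCP publications rather than this slide, and this is not the actual FAST implementation:

```python
def fast_like_update(cwnd, base_rtt, current_rtt, alpha=200, gamma=0.5):
    """Toy delay-based window update in the spirit of FAST TCP.

    When current_rtt equals base_rtt (no queueing delay), the window grows by
    about alpha packets per update; as queueing delay builds up, the ratio
    base_rtt/current_rtt shrinks and the window converges instead of
    oscillating around a loss-induced sawtooth.
    """
    target = (base_rtt / current_rtt) * cwnd + alpha
    return min(2 * cwnd, (1 - gamma) * cwnd + gamma * target)

# Example: window evolution as queueing delay appears on a 100 ms path.
w, base = 1000.0, 0.100
for rtt in (0.100, 0.100, 0.105, 0.110, 0.110, 0.110):
    w = fast_like_update(w, base, rtt)
    print(round(w))
```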

SCinet Bandwidth Challenge (FAST), SC2002, Baltimore, Nov 2002
Highlights (FAST TCP, standard MTU):
Peak window = 14,100 pkts
940 Mbps with a single flow/GE card: 9.4 petabit-meter/sec, 1.9 times the I2 LSR
9.4 Gbps with 10 flows: 37.0 petabit-meter/sec, 6.9 times the I2 LSR
16 TB in 6 hours with 7 flows
Implementation: sender-side modification, delay-based, stabilized Vegas
Paths: Sunnyvale-Geneva, Baltimore-Geneva, Baltimore-Sunnyvale (map with Sunnyvale, Baltimore, Chicago and Geneva; segment lengths of roughly 1000, 3000 and 7000 km)
(Figure: SC2002 runs with 1, 2 and 10 flows plotted against earlier I2 LSR marks (29.3.00 multiple streams, 9.4.02 single flow, 22.8.02 IPv6); sketch of the Internet as a distributed TCP/AQM feedback system, theory vs experiment)
C. Jin, D. Wei, S. Low, FAST Team & Partners, netlab.caltech.edu/FAST

Grid DT (Sylvain Ravot/Caltech)
Set of patches to Linux (Red Hat) allowing control of:
Slow-start threshold & behaviour
AIMD parameters
Parameter tuning:
A new parameter to better start a TCP transfer: set the value of the initial SSTHRESH
Smaller backoff: reduce the strong penalty imposed by a loss

Grid DT (cont.)
Modifications of the TCP algorithms (RFC 2001):
Modification of the well-known congestion avoidance algorithm: during congestion avoidance, for every acknowledgement received, cwnd increases by A * (segment size) * (segment size) / cwnd. This is equivalent to increasing cwnd by A segments each RTT. A is called the additive increment.
Modification of the slow start algorithm: during slow start, for every acknowledgement received, cwnd increases by M segments. M is called the multiplicative increment.
Note: A = 1 and M = 1 in TCP Reno.
A single stream with a modified backoff policy and a different increment allows us to simulate multi-streaming. The single-stream implementation differs from multi-stream in some important ways (see the sketch below):
- it is simpler (CPU utilization - to be quantified)
- startup and shutdown are faster (performance impact on short transfers - to be quantified)
- fewer keys to manage if the transfer is secured
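A minimal sketch of the two modified increments described above (cwnd in bytes; A and M are the Grid DT tuning knobs, with A = M = 1 reducing to standard Reno behaviour):

```python
def congestion_avoidance_ack(cwnd_bytes, mss_bytes, A=1):
    """Per-ACK increase during congestion avoidance: ~A segments per RTT."""
    return cwnd_bytes + A * mss_bytes * mss_bytes / cwnd_bytes

def slow_start_ack(cwnd_bytes, mss_bytes, M=1):
    """Per-ACK increase during slow start: cwnd grows by a factor (1+M) per RTT."""
    return cwnd_bytes + M * mss_bytes

# Example: with A=8 a single flow ramps up during congestion avoidance
# roughly like the aggregate of 8 parallel Reno flows.
cwnd, mss = 64 * 1500, 1500
for _ in range(int(cwnd / mss)):          # roughly one RTT's worth of ACKs
    cwnd = congestion_avoidance_ack(cwnd, mss, A=8)
print(cwnd / mss)                         # ~72 segments, i.e. about +8 per RTT
```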

Comments on the above proposals
Recent Internet history shows that any modification to the Internet standards can take years before being accepted and widely deployed, especially if it involves router modifications (e.g. RED, ECN)
Therefore, the chances of getting QuickStart- or XCP-type proposals implemented and widely deployed soon are somewhat limited!
Proposals requiring only TCP sender-stack modifications are much easier to deploy.

Conclusions
The TCP/IP performance problem in long-distance, high-speed networks has been known for many years. What is new, however, is the widespread availability of 10 Gbps A&R (academic & research) backbones as well as the emergence of 10GigE technology.
Thus, the awareness that the problem requires quick resolution has been growing rapidly during the last two years, hence the flurry of proposals. Hard to predict which one will win!