The DataTAG Project
1st European Across Grids Conference, Santiago de Compostela, Spain
Olivier H. Martin / CERN
http://www.datatag.org

Presentation outline
- Partners: funding agencies & cooperating networks; EU & US partners
- Goals: networking testbed; Grid interoperability; early results
- Status & perspectives: major Grid networking issues; PFLDnet workshop headlines

Funding agencies Cooperating Networks

EU contributors

US contributors

Main goals
- End-to-end Gigabit Ethernet performance using innovative high-performance transport protocol stacks
- Assess, develop & demonstrate inter-domain QoS and bandwidth reservation techniques
- Interoperability between major Grid projects' middleware/testbeds in Europe and North America:
  - DataGrid, CrossGrid, possibly other EU-funded Grid projects
  - PPDG, GriPhyN, iVDGL, Teragrid (USA)
  - LCG (LHC Computing Grid)

In a nutshell
- Grid-related network research (WP2, WP3):
  - 2.5 Gbps transatlantic lambda between CERN (Geneva) and StarLight (Chicago) (WP1)
  - Dedicated to research (no production traffic)
  - A unique multi-vendor testbed with layer 2 and layer 3 capabilities; in effect, a distributed transatlantic Internet Exchange Point
- Interoperability between European and US Grids (WP4):
  - Middleware integration and coexistence; applications
  - GLUE integration/standardization
  - GLUE testbed and demo

Multi-vendor testbed with layer 2 & layer 3 capabilities
[Network diagram: CERN (Geneva), StarLight (Chicago), INFN (Bologna) and INRIA (Lyon) interconnected at 2.5 Gbps, with peerings to Abilene, Canarie, ESnet, GEANT and SURFnet; Juniper, Cisco 6509 and Alcatel equipment, GbE access links; M = layer 2 mux]

Phase I - Generic configuration (August 2002)
[Diagram: servers behind GigE switches at CERN and StarLight, Cisco 7606 routers at each end, 2.5 Gbps transatlantic link]

Phase II (March 2003)
[Diagram: CERN and StarLight servers on GigE switches; Alcatel 1670 multiplexer, Alcatel 7770, Cisco 7606, Juniper M10 and Cisco ONS 15454 equipment; n*GigE links; connections to GEANT, VTHD, Amsterdam, Abilene, ESnet and Canarie]

Phase III (September 2003, tentative)
[Diagram: as in Phase II, with a multi-service multiplexer, Cisco 7609, n*GigE/10GigE access links, 10 Gbps and n*2.5 Gbps transatlantic capacity]

Major 2.5/10 Gbps circuits between Europe & USA
[Map of DataTAG connectivity: CERN to StarLight at 2.5 Gbps (10 Gbps planned) via New York and Abilene; UK SuperJANET4 (3*2.5G), IT GARR-B, NL SURFnet, FR INRIA ATRIUM/VTHD; ESnet, GEANT, MREN, STAR-TAP]

DataTAG network map
[Detailed Geneva-Chicago diagram: Alcatel 7770, Juniper M10 and Cisco 7606/7609 routers with Alcatel 1670 and ONS 15454 multiplexers at both ends; STM-16/STM-64 circuits (Deutsche Telekom, Colt, Swisscom, FranceTelecom, Level 3 to Sunnyvale); 1GE/10GE server connections; links to SURFnet, CANARIE, VTHD/INRIA, Abilene, Teragrid, GARR/CNAF, SWITCH, GEANT and the CERN external network; edoardo.martelli@cern.ch - last update: 20021204]

DataTAG/WP4 framework and relationships
[Diagram: HEP applications and other experiments; integration with GriPhyN, PPDG, iVDGL; interoperability and standardization through HICB/HIJTB and GLUE]

Status of GLUE activities in DataTAG
- Resource discovery and GLUE schema
- Authorization services: from VO LDAP to VOMS
- Common software deployment procedures
- IST2002 and SC2002 joint EU-US demos
- Interoperability between Grid domains for core grid services; coexistence of optimization/collective services
- Data movement and description
- Job submission services

Some early results: Atlas Canada Lightpath Data Transfer Trial
- A terabyte of research data was recently transferred between Vancouver and CERN, disk to disk, at close to gigabit-per-second rates
- This is equivalent to transferring a full CD in less than 8 seconds (or a full-length DVD movie in less than 1 minute)
- How much data is a terabyte? Equivalent to the amount of data on approximately 1500 CDs (680 MB each) or 200 full-length DVD movies
- Corrie Kost, Steve McDonald (TRIUMF), Bryan Caron (Alberta), Wade Hong (Carleton)
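As a rough sanity check of these figures, the following sketch (not part of the original slides) assumes 680 MB per CD and a disk-to-disk rate close to 1 Gb/s, both as quoted above:

```python
# Back-of-the-envelope check of the slide's figures.
TB = 1e12            # bytes in a terabyte (decimal convention)
CD = 680e6           # bytes on a CD
rate_bps = 1e9       # ~1 Gb/s transfer rate

print(TB / CD)               # ~1470 CDs per terabyte ("approximately 1500")
print(CD * 8 / rate_bps)     # ~5.4 s to move one CD at ~1 Gb/s ("less than 8 seconds")
```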

[Diagram: the TRIUMF to CERN data path used for the trial, built on Extreme Networks equipment]

Comparative results

Tool                        Transferred   Average     Max
wuftp (100 MbE)             -             3.4 Mbps    -
wuftp (10 GbE)              6442 MB       71 Mbps     -
iperf                       275 MB        940 Mbps    1136 Mbps
pftp                        600 MB        532 Mbps    -
bbftp (10 streams)          1.4 TB        666 Mbps    710 Mbps
Tsunami (disk to disk)      0.5 TB        700 Mbps    825 Mbps
Tsunami (disk to memory)    12 GB         > 1 Gbps    -

Project status
- A great deal of expertise on high-speed transport protocols is now available through DataTAG
- We plan more active dissemination in 2003 to share our experiences with the Grid community at large
- The DataTAG testbed is open to other EU Grid projects
- In order to guarantee exclusive access to the testbed, a reservation application has been developed; it has proved to be essential
- As the access requirements to the testbed evolve (e.g. access to GEANT, INRIA) and as the testbed itself changes (e.g. inclusion of additional layer 2 services), additional features will need to be provided

Major Grid networking issues (1)
- QoS (Quality of Service)
  - Still largely unresolved on a wide scale because of the complexity of deployment
  - Non-elevated services like "Scavenger/LBE" (lower than best effort) or Alternate Best Effort (ABE) are very fashionable!
- End-to-end performance in the presence of firewalls
  - There is (and will always be) a lack of high-performance firewalls; can we rely on products becoming available, or should a new architecture be evolved? Full transparency?
- Evolution of LAN infrastructure to 1 Gbps then 10 Gbps
  - Uniform end-to-end performance (LAN/WAN)

CERN's new firewall architecture
[Diagram: regular flows pass over Fast Ethernet through the Cisco PIX firewall and Cabletron SSR, with a security monitor; high-throughput flows use the HTAR (High Throughput Access Route) through CERNH2 (Cisco OSR 7603) over 1/10 Gbit Ethernet]

Major Grid networking issues (2)
- TCP/IP performance over high-bandwidth, long-distance networks
  - The loss of a single packet will affect a 10 Gbps stream with 100 ms RTT (round-trip time) for 1.16 hours; during that time the average throughput will be 7.5 Gbps
  - On the 2.5 Gbps DataTAG circuit with 100 ms RTT, this translates to 38 minutes of recovery time, during which the average throughput will be 1.875 Gbps
- Link error & loss event rates
  - A 2.5 Gbps circuit can absorb 0.2 million 1500-byte packets per second
  - A bit error rate of 10^-9 means one packet loss every 250 milliseconds
  - A bit error rate of 10^-11 means one packet loss every 25 seconds

TCP/IP responsiveness (I) - courtesy S. Ravot (Caltech)
The responsiveness r measures how quickly we go back to using the network link at full capacity after experiencing a loss, assuming that the congestion window size is equal to the bandwidth-delay product when the packet is lost:

    r = C * RTT^2 / (2 * MSS)

where C is the capacity of the link.

TCP/IP responsiveness (II) - courtesy S. Ravot (Caltech)

Case                                  C          RTT (ms)         MSS (bytes)          Responsiveness
Typical LAN in 1988                   10 Mb/s    [2; 20]          1460                 [1.7 ms; 171 ms]
Typical LAN today                     1 Gb/s     2 (worst case)                        96 ms
Future LAN                            10 Gb/s                                          1.7 s
WAN Geneva <-> Chicago                           120                                   10 min
WAN Geneva <-> Sunnyvale                         180                                   23 min
WAN Geneva <-> Tokyo                             300                                   1 h 04 min
                                      2.5 Gb/s                                         58 min
Future WAN CERN <-> Starlight                                                          1 h 32 min
Future WAN link CERN <-> Starlight                                8960 (jumbo frame)   15 min

The Linux 2.4.x kernel implements delayed acknowledgments; due to delayed acknowledgments, the responsiveness is multiplied by two. Therefore the values above have to be doubled!
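A minimal sketch of the responsiveness formula from the previous slide, checked against the 1988-LAN row of this table; the 1 Gb/s capacity used for the Geneva <-> Chicago row is an assumption (the slide leaves that cell blank):

```python
# Responsiveness r = C * RTT^2 / (2 * MSS), with the MSS converted from bytes to bits.
def responsiveness(capacity_bps, rtt_s, mss_bytes=1460):
    """Seconds to grow back to the full window, at one MSS per RTT, after a loss."""
    return capacity_bps * rtt_s ** 2 / (2 * mss_bytes * 8)

print(responsiveness(10e6, 0.002))  # ~0.0017 s (1.7 ms), 1988 LAN, best case
print(responsiveness(10e6, 0.020))  # ~0.171 s (171 ms), 1988 LAN, worst case
print(responsiveness(1e9, 0.120))   # ~616 s (~10 min), Geneva <-> Chicago at an assumed 1 Gb/s
```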

Maximum throughput with standard window sizes as a function of the RTT

RTT (ms)    W = 16 KB    W = 32 KB    W = 64 KB
25          640 KB/s     1.28 MB/s    2.56 MB/s
50          320 KB/s     640 KB/s     1.28 MB/s
100         160 KB/s     320 KB/s     640 KB/s

N.B. The best throughput one can hope for, on a standard intra-European path with 50 ms RTT, is only about 10 Mb/s!
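The table follows directly from throughput = window / RTT; a small illustrative sketch (not from the slides) that reproduces it and the 10 Mb/s note:

```python
# Window-limited TCP throughput = window / RTT.
def max_throughput_bps(window_bytes, rtt_s):
    return window_bytes * 8 / rtt_s

for rtt_ms in (25, 50, 100):
    for win_kb in (16, 32, 64):
        bps = max_throughput_bps(win_kb * 1024, rtt_ms / 1000)
        print(f"RTT {rtt_ms} ms, window {win_kb} KB: {bps / 1e6:.1f} Mb/s")
# 64 KB over a 50 ms path gives ~10.5 Mb/s, i.e. the ~10 Mb/s ceiling noted above.
```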

Considerations on WAN & LAN
- For many years the wide area network has been the bottleneck, hence the common belief that if that bottleneck were to disappear, a global transparent Grid could easily be deployed!
- Unfortunately, in real environments good end-to-end performance, e.g. Gigabit Ethernet, is somewhat easier to achieve when the bottleneck link is in the WAN rather than in the LAN, e.g. 1 GigE over 622 Mbps rather than 1 GigE over 2.5 Gbps.

Considerations on WAN & LAN (cont.)
- The dream of abundant bandwidth has now become a reality in large, but not all, parts of the world!
- The challenge has shifted from getting adequate bandwidth to deploying adequate LANs and security infrastructure, as well as making effective use of them!
- Major transport protocol issues still need to be resolved; however, there are very encouraging signs that practical solutions may now be in sight (see PFLDnet summary).

PFLDnet workshop (CERN, February 3-4)
- 1st workshop on protocols for fast long-distance networks
- Co-organized by Caltech & DataTAG, sponsored by Cisco
- Most key actors were present, e.g. S. Floyd, T. Kelly, S. Low
- Headlines: HighSpeed TCP (HSTCP), Limited Slow-Start, QuickStart, XCP, Tsunami, GridDT, Scalable TCP, FAST (Fast AQM (Active Queue Management) Scalable TCP)

TCP dynamics (10 Gbps, 100 ms RTT, 1500-byte packets)
- Window size (W) = bandwidth * round-trip time
  - W_bits = 10 Gbps * 100 ms = 1 Gb
  - W_packets = 1 Gb / (8 * 1500) = 83,333 packets
- Standard Additive Increase Multiplicative Decrease (AIMD) mechanism:
  - W = W/2 (halve the congestion window on a loss event)
  - W = W + 1 (increase the congestion window by one packet every RTT)
- Time to recover from W/2 to W (congestion avoidance) at 1 packet per RTT: RTT * W_packets/2 = 1.157 hours
  - In practice, 1 packet per 2 RTTs because of delayed ACKs, i.e. 2.31 hours
- Packets per second: W_packets / RTT = 833,333 packets/s
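The arithmetic above can be reproduced in a few lines (a sketch, using only the parameters stated on the slide):

```python
# Standard TCP AIMD on a 10 Gb/s path with 100 ms RTT and 1500-byte packets.
C = 10e9          # link capacity, bits/s
RTT = 0.100       # round-trip time, s
PKT = 1500 * 8    # packet size, bits

W_packets = C * RTT / PKT            # ~83,333 packets in flight at full rate
pkts_per_s = W_packets / RTT         # ~833,333 packets per second
recovery_s = (W_packets / 2) * RTT   # +1 packet per RTT from W/2 back to W: ~4,167 s

print(W_packets, pkts_per_s, recovery_s / 3600)   # recovery ~1.16 h (2.31 h with delayed ACKs)
```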

HSTCP (IETF draft, August 2002)
- Modifies TCP's response function in order to allow high performance in high-speed environments and in the presence of packet losses
- Target: 10 Gbps performance in 100 ms round-trip time (RTT) environments
- Acceptable fairness when competing with standard TCP in environments with packet loss rates of 10^-4 or 10^-5
- Standard TCP's response function: W (in MSS) = 1.2/sqrt(p), equivalent to W/1.5 RTTs between losses

HSTCP response function (additive increase, HSTCP vs standard TCP)

Packet drop rate    Congestion window     RTTs between losses
10^-2               12                    8
10^-3               38                    25
10^-4               120 (263)             80 (38)
10^-5               379 (1795)            252 (57)
10^-6               1200 (12279)          800 (83)
10^-7               3795 (83981)          2530 (123)
...
10^-10              120000 (26864653)     80000 (388)

(Values without parentheses: standard TCP; values in parentheses: HSTCP.)
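The two columns can be reproduced from the response functions: standard TCP's W = 1.2/sqrt(p) quoted on the previous slide, and HighSpeed TCP's response function with the default parameters from the IETF draft (Low_Window = 38 at p = 10^-3, High_Window = 83000 at p = 10^-7); those parameter values come from the draft, not from this slide:

```python
import math

def w_standard(p):
    return 1.2 / math.sqrt(p)          # standard TCP congestion window (MSS)

# HSTCP: log-linear interpolation between (Low_P, Low_Window) and (High_P, High_Window).
S = (math.log(83000) - math.log(38)) / (math.log(1e-7) - math.log(1e-3))

def w_hstcp(p):
    if p >= 1e-3:                      # at high loss rates HSTCP behaves like standard TCP
        return w_standard(p)
    return 38 * (p / 1e-3) ** S

for p in (1e-2, 1e-3, 1e-4, 1e-5, 1e-7, 1e-10):
    print(f"p={p:.0e}  standard={w_standard(p):9.0f}  hstcp={w_hstcp(p):11.0f}")
# The standard column matches 12, 38, 120, 379, 3795, 120000 above; the HSTCP values
# land close to the parenthesised ones (263, 1795, 83981, ...).
```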

Limited Slow-Start (IETF draft, August 2002)
- The current "slow-start" procedure can result in increasing the congestion window by thousands of packets in a single RTT
  - Massive packet losses; counter-productive
- Limited slow-start introduces a new parameter, max_ssthresh, in order to limit the increase of the congestion window (max_ssthresh < cwnd < ssthresh); recommended value 100 MSS
- When cwnd > max_ssthresh (see the sketch below):
  - K = int(cwnd / (0.5 * max_ssthresh))
  - cwnd += int(MSS/K) for each received ACK, instead of cwnd += MSS
- This ensures that cwnd is increased by at most max_ssthresh/2 per RTT, i.e. 1/2 MSS per ACK when cwnd = max_ssthresh, 1/3 MSS when cwnd = 1.5 * max_ssthresh, etc.
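A minimal sketch of the per-ACK update described above (cwnd counted in MSS; illustrative only):

```python
def limited_slow_start_ack(cwnd_mss, max_ssthresh=100):
    """Return the new cwnd (in MSS) after one ACK during slow start."""
    if cwnd_mss <= max_ssthresh:
        return cwnd_mss + 1                  # classic slow start: +1 MSS per ACK
    k = int(cwnd_mss / (0.5 * max_ssthresh))
    return cwnd_mss + 1.0 / k                # +MSS/K per ACK, at most ~max_ssthresh/2 per RTT

# One RTT's worth of ACKs with cwnd = 200 MSS and max_ssthresh = 100 MSS:
cwnd = 200.0
for _ in range(200):
    cwnd = limited_slow_start_ack(cwnd)
print(cwnd)   # ~250 MSS: grew by ~max_ssthresh/2 instead of doubling to 400
```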

Limited Slow-Start (cont.)
- With limited slow-start it takes:
  - log(max_ssthresh) RTTs to reach the condition where cwnd = max_ssthresh
  - log(max_ssthresh) + (cwnd - max_ssthresh)/(max_ssthresh/2) RTTs to reach a congestion window of cwnd when cwnd > max_ssthresh
- Thus with max_ssthresh = 100 MSS:
  - It would take 836 RTTs to reach a congestion window of 83,000 packets, compared to 16 RTTs otherwise (assuming NO packet drops)
  - The transient queue is limited to 100 packets, against 32,000 packets otherwise!
- Limited slow-start could be used in conjunction with rate-based pacing

Slow-start vs limited slow-start
[Plot: congestion window size (MSS, log scale from 100 to 100,000) vs time (RTT, log scale from 16 to 16,000) for a 10 Gbps path with RTT = 100 ms and MSS = 1500 B; ssthresh = 83,333, max_ssthresh = 100]

QuickStart
- Initial assumption: routers have the ability to determine whether the destination link is significantly under-utilized
  - Similar capabilities are also assumed for Active Queue Management (AQM) and Explicit Congestion Notification (ECN) techniques
- Coarse-grained mechanism, focusing only on the initial window size
- Incremental deployment
- New IP & TCP options: QS request (IP) & QS response (TCP)
- Initial window size = rate * RTT * MSS
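A trivial sketch of the initial window sizing quoted above; the units (rate in packets/s, RTT in seconds, MSS in bytes) are an assumption for illustration:

```python
def quickstart_initial_window_bytes(rate_pkts_per_s, rtt_s, mss_bytes=1460):
    # Initial window = rate * RTT * MSS, as stated on the slide.
    return rate_pkts_per_s * rtt_s * mss_bytes

# e.g. an approved rate of 2560 packets/s (the maximum quoted on the next slide)
# over a 100 ms path:
print(quickstart_initial_window_bytes(2560, 0.100))   # ~374 kB initial window
```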

QuickStart (cont.)
- SYN/SYN-ACK IP packets carry a new IP option, Quick Start Request (QSR)
  - Two TTLs (Time To Live): IP & QSR
  - Sending rate expressed in packets per 100 ms; therefore the maximum rate is 2560 packets/second
  - Rate-based pacing assumed
- Non-participating router: ignores the QSR option and therefore does not decrease the QSR TTL
- Participating router: deletes the QSR option or resets the initial sending rate; accepts or reduces the initial rate

Scalable TCP (Tom Kelly, Cambridge)
- The Scalable TCP algorithm modifies the characteristic AIMD behaviour of TCP for the conditions found on high bandwidth-delay links
- This work differs from the HighSpeed TCP proposal by using a fixed adjustment for both the increase and the decrease of the congestion window
- Scalable TCP alters the congestion window, cwnd, as follows:
  - on each acknowledgement in an RTT without loss: cwnd -> cwnd + 0.02
  - for each window experiencing loss: cwnd -> cwnd - 0.125 * cwnd
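A sketch of the two update rules quoted above, next to standard TCP's AIMD for comparison (cwnd in MSS; a simplification, not the kernel implementation):

```python
def scalable_tcp(cwnd, loss):
    if loss:
        return cwnd - 0.125 * cwnd    # back off by 1/8 of the window on a loss
    return cwnd + 0.02                # fixed per-ACK increment

def standard_tcp(cwnd, loss):
    if loss:
        return cwnd / 2               # halve the window on a loss
    return cwnd + 1.0 / cwnd          # ~ +1 MSS per RTT in congestion avoidance

print(scalable_tcp(1000, True), standard_tcp(1000, True))   # 875.0 vs 500.0
```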

Scalable TCP (2)
- The responsiveness of a traditional TCP connection to loss events is proportional to the connection's window size and round-trip time
- With Scalable TCP, the responsiveness is proportional to the round-trip time only
- This invariance to link bandwidth allows Scalable TCP to outperform traditional TCP in high-speed wide area networks
- For example, for a connection with a round-trip time of 200 ms, the responsiveness of a traditional TCP connection is nearly 3 minutes at 100 Mbit/s and 28 minutes at 1 Gbit/s, while a connection using Scalable TCP has a packet loss recovery time of about 2.7 s at any rate

Scalable TCP status
- Scalable TCP has been implemented on a Linux 2.4.19 kernel
- The implementation went through various performance-debugging iterations, primarily relating to internal kernel network buffers and the SysKonnect driver
- These alterations, termed the gigabit kernel modifications, remove the copying of small packets in the SysKonnect driver and scale the device driver decoupling buffers to reflect Gigabit Ethernet devices
- Initial performance results suggest that the variant is capable of providing high speed in a robust manner using only sender-side modifications
  - Up to 400% improvement over standard Linux 2.4.19
- It is also intended to improve the code performance to lower CPU utilisation; for example, a transfer rate of 1 Gbit/s currently uses 50% of a dual 2.2 GHz Xeon, including the user (non-kernel) copy

Grid DT (Sylvain Ravot, Caltech)
- Similar to MulTCP, i.e. aggregate N virtual TCP connections on a single connection
- Avoids the brute-force approach of opening N parallel connections a la GridFTP or BBFTP
- A set of patches to Linux RedHat allowing control of:
  - the slow-start threshold & behaviour
  - the AIMD parameters

Linux patch "GRID DT"
- Parameter tuning
  - New parameter to better start a TCP transfer: set the value of the initial ssthresh
- Modifications of the TCP algorithms (RFC 2001)
  - Modification of the well-known congestion avoidance algorithm: during congestion avoidance, for every acknowledgement received, cwnd increases by A * (segment size) * (segment size) / cwnd. This is equivalent to increasing cwnd by A segments each RTT. A is called the additive increment (sketched below).
  - Modification of the slow-start algorithm: during slow start, for every acknowledgement received, cwnd increases by M segments. M is called the multiplicative increment.
  - Note: A = 1 and M = 1 in TCP Reno.
- Smaller backoff: reduces the strong penalty imposed by a loss
- A single stream with a modified backoff policy and a different increment allows us to simulate multi-streaming; the single-stream implementation differs from multi-stream in some important ways:
  - it is simpler (CPU utilization - try to quantify)
  - startup and shutdown are faster (performance impact on short transfers - try to quantify)
  - fewer keys to manage if it is secure
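A sketch of the A and M increments described above (cwnd and segment size in bytes; illustrative only, not the actual patch):

```python
def grid_dt_congestion_avoidance_ack(cwnd, mss, A=1):
    # Per-ACK increase of A * MSS^2 / cwnd, i.e. roughly +A segments per RTT.
    return cwnd + A * mss * mss / cwnd

def grid_dt_slow_start_ack(cwnd, mss, M=1):
    # Per-ACK increase of M segments during slow start.
    return cwnd + M * mss

# A = 1, M = 1 reproduces TCP Reno; a larger A behaves roughly like A parallel Reno streams.
print(grid_dt_congestion_avoidance_ack(1_000_000, 1460, A=10))
```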

Grid DT
- Very simple modifications to the TCP/IP stack
- An alternative to multi-stream TCP transfers; compared with multiple streams, a single stream:
  - is simpler
  - starts up and shuts down faster
  - has fewer keys to manage (if it is secure)
- Virtual increase of the MTU compensates for the effect of delayed ACKs
- Can improve "fairness" between flows with different RTTs and between flows with different MTUs

Comments on the above proposals
- Recent Internet history shows that any modification to the Internet standards can take years before being accepted and widely deployed, especially if it involves router modifications, e.g. RED, ECN
- Therefore, the chances of getting QuickStart- or XCP-type proposals implemented in commercial routers soon are somewhat limited!
- Modifications to TCP stacks are more promising and much easier to deploy incrementally when only the TCP stack of the sender is affected
- Is active network technology a possible solution to help solve the de-facto freeze of the Internet?

Additional slides
- Tsunami (S. Wallace, Indiana University)
- Grid DT (S. Ravot, Caltech)
- FAST (S. Low, Caltech)

The Tsunami protocol (S. Wallace, Indiana University)
- Developed specifically to address extremely high-performance batch file transfer over global-scale WANs
- Transport is UDP, using 32 KB datagrams/blocks superimposed over standard 1500-byte Ethernet packets
- No sliding window (a la TCP): each missed/dropped block is re-requested autonomously (similar to a smart ACK)
- Very limited congestion avoidance compared to TCP; loss behaviour is similar to Ethernet collision behaviour, not TCP congestion avoidance
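A highly simplified sketch of the client-side bookkeeping this implies (gaps detected by block sequence number, missing blocks re-requested over the TCP control channel); the function and the request format are illustrative, not the real Tsunami wire protocol:

```python
def blocks_to_rerequest(received_seqs, highest_expected):
    """Return the block numbers missing so far, to be re-requested over the control channel."""
    return sorted(set(range(highest_expected + 1)) - set(received_seqs))

# e.g. blocks 0..9 announced, 3 and 7 never arrived over UDP:
print(blocks_to_rerequest([0, 1, 2, 4, 5, 6, 8, 9], 9))   # -> [3, 7]
```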

Tsunami protocol
[Diagram: data blocks flow from server to client over UDP (type + sequence number per datagram); retransmit requests and the shutdown request flow from client to server over a TCP control connection]

Effect of the MTU on the responsiveness
- Measurements of the effect of the MTU on a transfer between CERN and StarLight (RTT = 117 ms, bandwidth = 1 Gb/s)
- A larger MTU improves the TCP responsiveness, because cwnd is increased by one MSS each RTT
- Wire speed could not be reached with the standard MTU
- A larger MTU also reduces the per-frame overhead (saves CPU cycles, reduces the number of packets)
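Using the responsiveness formula from the earlier slide with the path parameters quoted here (1 Gb/s, RTT = 117 ms) and the 8960-byte jumbo-frame MSS from the responsiveness table, a quick illustration of why the larger MTU helps:

```python
def responsiveness(capacity_bps, rtt_s, mss_bytes):
    # r = C * RTT^2 / (2 * MSS), MSS converted from bytes to bits
    return capacity_bps * rtt_s ** 2 / (2 * mss_bytes * 8)

print(responsiveness(1e9, 0.117, 1500))   # ~570 s with the standard Ethernet MTU
print(responsiveness(1e9, 0.117, 8960))   # ~95 s with jumbo frames: ~6x faster recovery
```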

TCP
- Carries >90% of Internet traffic
- Prevents congestion collapse
- Not scalable to ultrascale networks: equilibrium and stability problems
[ns-2 simulation plot]
(netlab.caltech.edu)

Intellectual advances
- New mathematical theory of large-scale networks
- FAST = Fast Active-queue-managed Scalable TCP
- Innovative implementation: TCP stack in Linux
- Experimental facilities: "high energy physics networks"
  - Caltech and CERN/DataTAG site equipment: switches, routers, servers
  - Level(3) SNV-CHI OC192 link; DataTAG link; Cisco 12406, GbE and 10 GbE port cards donated
  - Abilene, Calren2, ...
- Unique features:
  - Delay (RTT) as the congestion measure
  - Feedback loop for a resilient window and stable throughput
(netlab.caltech.edu)

SCinet Bandwidth Challenge, SC2002, Baltimore, Nov 2002
Highlights (FAST TCP, standard MTU):
- Peak window = 14,100 packets
- 940 Mbps single flow/GE card: 9.4 petabit-meter/sec, 1.9 times the I2 LSR
- 9.4 Gbps with 10 flows: 37.0 petabit-meter/sec, 6.9 times the I2 LSR
- 16 TB in 6 hours with 7 flows
Implementation:
- Sender-side modification
- Delay based; stabilized Vegas
[Diagram: Sunnyvale-Baltimore-Chicago-Geneva test paths (3000 km, 1000 km, 7000 km) with the SC2002 1-, 2- and 10-flow runs and earlier I2 LSR entries (29.3.00 multiple, 9.4.02 1 flow, 22.8.02 IPv6) marked; block diagram of the Internet as a distributed feedback system between TCP and AQM, with routing matrices Rf(s), Rb'(s), rates x and prices p]
C. Jin, D. Wei, S. Low, FAST Team & Partners - netlab.caltech.edu/FAST

Stability: from the CIT and SLAC booths to StarLight + Sunnyvale, and to Abilene
[Plot: aggregate bandwidth (Gbps) vs time (100-900 s): 14 streams reach 11.5 Gbps; 10 streams (~9 Gbps) to StarLight & Sunnyvale with 150 Mbps of ACKs, plus 4 streams to Abilene; a power glitch and reboot show rapid recovery after the disturbance]
(netlab.caltech.edu)

SCinet Bandwidth Challenge, SC2002, Baltimore, Nov 2002

Entry               Bmps (pbm/s)   Throughput (Mbps)   Duration (s)
FAST, 10 flows      37.0           9,400               300
FAST, 1 flow        9.42           940                 1,152
I2 LSR, multiple    5.38           1,020               82
I2 LSR, 1 flow      4.93           402                 13
I2 LSR, IPv6        0.03           8                   3,600

FAST, 7 flows (17 Nov 2002, Sun Network; SC2002 Baltimore -> SLAC Sunnyvale; GE, standard MTU, cwnd = 6,658 pkts per flow):
- Data: 2.857 TB; distance: 3,936 km; delay: 85 ms
- Average: duration 60 min, throughput 6.35 Gbps, 24.99 petabit-m/s
- Peak: duration 3.0 min, throughput 6.58 Gbps, 25.90 petabit-m/s

FAST, 1 flow (17 Nov 2002, Sun Network; CERN Geneva -> SLAC Sunnyvale; GE, standard MTU, cwnd = 14,100 pkts):
- Data: 273 GB; distance: 10,025 km; delay: 180 ms
- Average: duration 43 min, throughput 847 Mbps, 8.49 petabit-m/s
- Peak: duration 19.2 min, throughput 940 Mbps, 9.42 petabit-m/s

Acknowledgments (netlab.caltech.edu/FAST):
- Prototype: C. Jin, D. Wei
- Theory: D. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA)
- Experiment/facilities: Caltech: J. Bunn, S. Bunn, C. Chapman, C. Hu (Williams/Caltech), H. Newman, J. Pool, S. Ravot (Caltech/CERN), S. Singh; CERN: O. Martin, P. Moroni; Cisco: B. Aiken, V. Doraiswami, M. Turzanski, D. Walsten, S. Yip; DataTAG: E. Martelli, J. P. Martin-Flatin; Internet2: G. Almes, S. Corbato; SCinet: G. Goddard, J. Patton; SLAC: G. Buhrmaster, L. Cottrell, C. Logg, W. Matthews, R. Mount, J. Navratil; StarLight: T. deFanti, L. Winkler
- Major sponsors/partners: ARO, CACR, Cisco, DataTAG, DoE, Lee Center, Level 3, NSF

FAST summary (Les Cottrell, SLAC)
- Our team achieved outstanding success with the Level(3) circuit from Sunnyvale to Chicago/StarLight (for a blow-by-blow account see http://www-iepm.slac.stanford.edu/monitoring/bulk/sc2002/hiperf.htm)
- With a new experimental TCP stack being developed by Steven Low & his group at Caltech (see http://netlab.caltech.edu/), we demonstrated over 950 Mbits/s of TCP traffic with standard packet sizes (MSS 1460 bytes) from SC2002 to Sunnyvale (~3000 km)
- Using 12 hosts at SC2002 to Sunnyvale, we demonstrated over 9 Gbits/s of TCP traffic over a 10 Gbit/s circuit (rather stably)
- From Amsterdam to Sunnyvale, with the NIKHEF & CERN people, we demonstrated a single TCP stream at 923 Mbits/s over 10,974 km, and have submitted an Internet2 Land Speed Record entry (see http://www.internet2.edu/lsr/)
- For the SC2002 Bandwidth Challenge we were able to sustain 11.5 Gbits/s for 15 minutes (see http://www-iepm.slac.stanford.edu/monitoring/bulk/sc2002/bwc-scinet-plot.gif), using the Level(3) Sunnyvale link and the SC2002 10 Gbit/s Abilene link
(netlab.caltech.edu)

Single stream vs multiple streams: effect of a single packet loss (e.g. link error, buffer overflow)
[Plot: throughput (Gbps) vs time for 10, 5 and 1 streams (RTT = 100 ms, MSS = 1500 B); recovery after a single loss takes T = 1.16 hours, and the average throughputs during recovery shown on the curves are 7.5, 6.25, 4.375 and 3.75 Gbps]