Breaking the Internet2 Land Speed Record: Twice

Breaking the Internet2 Land Speed Record: Twice. Les Cottrell – SLAC. Prepared for Ricoh, Menlo Park, April 2003. http://www.slac.stanford.edu/grp/scs/net/talk/ricoh-hiperf.html Partially funded by the DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM) and by the SciDAC base program.

Outline
- Who did it?
- What was done?
- How was it done?
- What was special about this anyway?
- Who needs it?
- So what's next?
- Where do I find out more?

Who did it: Collaborators and sponsors
- Caltech: Harvey Newman, Steven Low, Sylvain Ravot, Cheng Jin, Xiaoling Wei, Suresh Singh, Julian Bunn
- SLAC: Les Cottrell, Gary Buhrmaster, Fabrizio Coccetti
- LANL: Wu-chun Feng, Eric Weigle, Gus Hurwitz, Adam Englehart
- NIKHEF/UvA: Cees DeLaat, Antony Antony
- CERN: Olivier Martin, Paolo Moroni
- ANL: Linda Winkler
- DataTAG, StarLight, TeraGrid, SURFnet, NetherLight, Deutsche Telecom, Information Society Technologies
- Cisco, Level(3), Intel
- DoE, European Commission, NSF

What was done?
- Set a new Internet2 TCP land speed record: 10,619 Tbit-meters/sec (see http://lsr.internet2.edu/)
- With 10 streams, achieved 8.6 Gbps across the US
- Beat the 1 Gbps limit for a single TCP stream across the Atlantic – transferred a TByte in an hour

  When            From       To         Bottleneck  MTU    Streams  TCP       Throughput
  Nov '02 (SC02)  Amsterdam  Sunnyvale  1 Gbps      9000B  1        Standard  923 Mbps
  Nov '02 (SC02)             Baltimore  10 Gbps     1500B  10       FAST      8.6 Gbps
  Feb '03                    Geneva     2.5 Gbps                              2.38 Gbps
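The land speed record is scored as end-to-end throughput multiplied by the terrestrial path length. A minimal sketch of that arithmetic, in Python; the 1 Gbps and 10,000 km inputs are purely illustrative, not the figures from any of the record submissions:

```python
# Sketch of the Internet2 Land Speed Record metric:
# end-to-end throughput times terrestrial path length, in terabit-meters/sec.
# Inputs below are illustrative only, not the record submissions' figures.
def lsr_metric(throughput_gbps: float, distance_km: float) -> float:
    bits_per_second = throughput_gbps * 1e9
    meters = distance_km * 1e3
    return bits_per_second * meters / 1e12  # terabit-meters per second

print(lsr_metric(1.0, 10_000))  # 1 Gbps over 10,000 km -> 10,000 Tbit-m/s
```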

10GigE Data Transfer Trial
On February 27-28, over a Terabyte of data was transferred in 3700 seconds by S. Ravot of Caltech between the Level3 PoP in Sunnyvale, near SLAC, and CERN. The data passed through the TeraGrid router at StarLight from memory to memory as a single TCP/IP stream at an average rate of 2.38 Gbps (using large windows and 9 KByte "jumbo" frames). This beat the former record by a factor of approximately 2.5, and used the US-CERN link at 99% efficiency.
Original slide by: Olivier Martin, CERN
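As a sanity check of the quoted average rate: throughput is just bytes moved times 8 over elapsed time. The 1.1 TB payload below is an assumption consistent with "over a Terabyte", not a figure from the slide:

```python
# Back-of-envelope check of the quoted average rate.
bytes_moved = 1.1e12   # assumed payload, consistent with "over a Terabyte"
duration_s = 3700      # from the slide
avg_gbps = bytes_moved * 8 / duration_s / 1e9
print(f"average rate ~ {avg_gbps:.2f} Gbps")   # ~2.38 Gbps
```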

How was it done: Typical testbed
[Network diagram: Sunnyvale (SNV) - Chicago (CHI) - Amsterdam (AMS) - Geneva (GVA), > 10,000 km end to end. SNV-CHI runs over OC192/POS (10 Gbits/s); the EU+US segment is 2.5 Gbits/s. Equipment labelled: Cisco 7609s, a GSR and a Juniper T640, 6*2cpu and 12*2cpu compute servers, and two groups of 4 disk servers. The Sunnyvale section was deployed for SC2002 (Nov 02).]

Typical Components
- CPU: Pentium 4 (Xeon) with 2.4 GHz cpu, Linux 2.4.19 or 20
- NICs: for GE used Syskonnect; for 10GE used Intel
- Routers: Cisco GSR 12406 with OC192/POS & 1 and 10GE server interfaces (loaned, list > $1M); Cisco 760x; Juniper T640 (Chicago)
- Fibers: Level(3) OC192/POS (loaned; SNV-CHI monthly lease cost ~ $220K)
- Disk servers and compute servers
[Photos of the racks, with callouts: earthquake strap, heat sink, GSR, bootees]

Challenges
- PCI bus limitations (66 MHz * 64 bit = 4.2 Gbits/s at best)
- At 2.5 Gbits/s and 180 msec RTT a ~120 MByte window is required (see the back-of-envelope sketch below)
- Some tools (e.g. bbcp) will not allow a large enough window (bbcp is limited to 2 MBytes)
- Slow start: reaching 1 Gbits/s takes about 5-6 secs on a 180 msec link, i.e. to get 90% of a measurement in the stable (non slow start) phase you need to measure for 60 secs, which means shipping > 700 MBytes at 1 Gbits/s
- After a loss it can take over an hour for stock TCP (Reno) to recover to maximum throughput at 1 Gbits/s, i.e. a loss rate of 1 in ~2 Gpkts (3 Tbits), or a BER of 1 in 3.6*10^12
[Plot: recovery after a loss, Sunnyvale-Geneva, 1500 Byte MTU, stock TCP]
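A back-of-envelope sketch of where window and recovery figures of this kind come from. The "socket buffer ~ 2x the bandwidth-delay product" rule of thumb and the one-segment-per-RTT Reno growth model are my assumptions, so the outputs only approximate the slide's numbers:

```python
# Rough model, not the authors' calculation: bandwidth-delay product (BDP)
# and an AIMD (Reno) recovery-time estimate after a single loss.
def bdp_bytes(bw_bps: float, rtt_s: float) -> float:
    """Bytes in flight needed to keep the pipe full."""
    return bw_bps * rtt_s / 8

def reno_recovery_s(bw_bps: float, rtt_s: float, mss_bytes: int = 1500) -> float:
    """Time for Reno to regrow after halving a full window, at 1 MSS per RTT."""
    w_segments = bw_bps * rtt_s / (mss_bytes * 8)
    return (w_segments / 2) * rtt_s

bw, rtt = 2.5e9, 0.180
print(f"BDP ~ {bdp_bytes(bw, rtt)/1e6:.0f} MB "
      f"(socket buffers are often sized at ~2x this)")
print(f"Reno recovery after one loss ~ {reno_recovery_s(bw, rtt)/60:.0f} min")
```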

Windows and Streams
- It is well accepted that multiple streams (n) and/or big windows are important to achieve optimal throughput
- Multiple streams effectively reduce the impact of a loss by 1/n and improve the recovery time by 1/n (toy illustration below)
- The optimum windows & streams change as the path changes (e.g. with utilization), so n is hard to optimize
- Can be unfriendly to others
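A toy illustration of the 1/n argument, not a model of the actual testbed: split a fixed aggregate window over n Reno streams and only the stream that sees the loss halves its window, so both the aggregate throughput hit and the regrowth time shrink roughly as 1/n:

```python
# Toy 1/n arithmetic for parallel Reno streams sharing a fixed aggregate window.
def single_loss_impact(total_window_segments: float, n_streams: int):
    per_stream = total_window_segments / n_streams
    drop = per_stream / 2          # only the hit stream halves its window
    recovery_rtts = per_stream / 2 # it regrows at 1 MSS per RTT
    return drop / total_window_segments, recovery_rtts

for n in (1, 4, 16):
    frac, rtts = single_loss_impact(15_000, n)
    print(f"n={n:2d}: aggregate drop {frac:.1%}, recovery ~{rtts:,.0f} RTTs")
```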

- Even with big windows (1 MB) you still need multiple streams with standard TCP
- ANL, Caltech & RAL reach a knee (between 2 and 24 streams); above this the gain in throughput is slow
- Above the knee, performance still improves slowly, maybe because the large number of streams squeezes out others and takes more than a fair share
- Streams and windows can change during the day, hard to optimize

Impact on others

New TCP Stacks
- Reno (AIMD) based – loss indicates congestion:
  - Standard TCP
  - The new stacks back off less when they see congestion and recover more quickly after backing off:
    - Scalable TCP: exponential recovery. Tom Kelly, "Scalable TCP: Improving Performance in Highspeed Wide Area Networks", submitted for publication, December 2002.
    - High Speed TCP: same as Reno at low performance, then increases the window more and more aggressively as the window grows, using a table
- Vegas based – RTT indicates congestion:
  - Caltech FAST TCP: quicker response to congestion, but …
[Plot legend: Standard, Scalable, High Speed]
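A rough sketch of the per-ACK / per-loss window rules the slide contrasts. The Scalable TCP constants (a=0.01, b=0.125) are the published proposal's as best I recall, and High Speed TCP's table lookup is reduced to fixed placeholder parameters, so treat this as illustrative rather than a faithful implementation:

```python
# Per-ACK / per-loss congestion window updates, in segments (sketch only).

def reno(cwnd: float, loss: bool) -> float:
    # AIMD(1, 1/2): +1 MSS per RTT, halve on loss.
    return cwnd * 0.5 if loss else cwnd + 1.0 / cwnd

def scalable(cwnd: float, loss: bool) -> float:
    # Scalable TCP (Kelly): constant per-ACK increase, small multiplicative cut.
    return cwnd * (1 - 0.125) if loss else cwnd + 0.01

def highspeed(cwnd: float, loss: bool, a: float = 1.0, b: float = 0.5) -> float:
    # Real HSTCP looks a(cwnd) and b(cwnd) up in a table: a grows and b shrinks
    # as cwnd grows. Fixed placeholders here degenerate to Reno.
    return cwnd * (1 - b) if loss else cwnd + a / cwnd

# Effect of a single ACK and of a single loss, starting from 1000 segments.
w = 1000.0
print(reno(w, False), scalable(w, False), highspeed(w, False))
print(reno(w, True),  scalable(w, True),  highspeed(w, True))
```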

Stock vs FAST TCP
- MTU = 1500B
- Need to measure everything to understand the effects of the parameters and configurations: windows, streams, txqueuelen, TCP stack, MTU, NIC card – a lot of variables
- Examples of 2 TCP stacks
- FAST TCP no longer needs multiple streams; this is a major simplification (it reduces the number of variables to tune by 1)
[Plots: Stock TCP and FAST TCP throughput, 1500B MTU, 65 ms RTT]

Jumbo frames
- Become more important at higher speeds: fewer interrupts to the CPU and fewer packets to process, reducing cpu utilization
- Similar effect to using multiple streams (T. Hacker)
- Jumbos can achieve > 95% utilization from SNV to CHI or GVA with 1 or multiple streams, up to a Gbit/s
- Factor 5 improvement over single-stream 1500B MTU throughput for stock TCP (SNV-CHI (65 ms) & CHI-AMS (128 ms))
- A complementary approach to a new stack
- Deployment doubtful: few sites have deployed jumbos, and they are not part of the GE or 10GE standards
[Plot: 1500B vs Jumbo frames]
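The packet-rate arithmetic behind the interrupt argument: at a given line rate, 9000B frames mean roughly six times fewer packets (and, without coalescing, interrupts) per second than 1500B frames. A quick calculation:

```python
# Packets per second at 1 Gbit/s for standard vs jumbo frames.
# Interrupt coalescing changes the absolute interrupt rate, not the ratio.
line_rate_bps = 1e9
for mtu in (1500, 9000):
    pps = line_rate_bps / (mtu * 8)
    print(f"MTU {mtu:5d} B -> ~{pps:,.0f} packets/s at 1 Gbit/s")
```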

TCP stacks with 1500B MTU @ 1 Gbps
[Plot; txqueuelen]

Jumbo frames, new TCP stacks at 1 Gbits/s SNV-GVA

Other gotchas
- Large windows and a large number of streams can cause the last stream to take a long time to close
- Linux memory leak
- Linux TCP configuration caching
- What is the window size actually used/reported?
- 32 bit counters in iperf and routers wrap; need the latest releases with 64 bit counters
- Effects of txqueuelen (the number of packets queued for the NIC)
- Routers that do not pass jumbos
- Performance differs between drivers and NICs from different manufacturers
- May require tuning a lot of parameters (see the sketch below)
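A minimal, Linux-specific sketch for chasing one of the gotchas above (which window the kernel will actually grant): print the sysctl limits that silently cap the socket buffers an application asks for. It assumes a 2.4-or-later kernel with the usual /proc/sys layout:

```python
# Print the kernel limits that cap TCP socket buffer (window) sizes on Linux.
from pathlib import Path

for key in ("net/core/rmem_max", "net/core/wmem_max",
            "net/ipv4/tcp_rmem", "net/ipv4/tcp_wmem"):
    path = Path("/proc/sys") / key
    print(f"{key}: {path.read_text().strip()}")
```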

What was special?
- End-to-end, application-to-application, single and multi-stream (not just internal backbone aggregate speeds)
- TCP has not run out of steam yet; it scales into the multi-Gbits/s region
- TCP is well understood and mature, with many good features (reliability etc.), and is friendly on shared networks
- New TCP stacks only need to be deployed at the sender; often there are just a few data sources and many destinations
- No modifications to backbone routers etc.
- No need for jumbo frames
- Used Commercial Off The Shelf (COTS) hardware and software

Who needs it?
- HENP – the current driver:
  - Data intensive science: multi-hundreds of Mbits/s and multi-TByte files/day transferred across the Atlantic today
  - The SLAC BaBar experiment already has > 1 PByte stored
  - Tbits/s and ExaBytes (10^18) stored in a decade
- Data intensive science: astrophysics, global weather, bioinformatics, fusion, seismology…
- Industries such as aerospace, medicine, security…
- Future: media distribution
  - 1 Gbits/s = 2 full length DVD movies/minute
  - 2.36 Gbits/s is equivalent to transferring a full CD in 2.3 seconds (i.e. 1565 CDs/hour), or 200 full length DVD movies in one hour (i.e. 1 DVD in 18 seconds)
  - Will sharing movies be like sharing music today?
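The media arithmetic spelled out; the nominal 650 MB CD and 4.7 GB DVD sizes are my assumptions, so the results land near, but not exactly on, the slide's figures (which imply slightly larger discs or some overhead):

```python
# Transfer times at the 2.36 Gbps average rate, for nominal disc sizes.
rate_bps = 2.36e9
for name, size_bytes in (("CD", 650e6), ("DVD", 4.7e9)):
    t = size_bytes * 8 / rate_bps
    print(f"{name}: ~{t:.1f} s per disc, ~{3600 / t:,.0f} per hour")
```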

When will it have an impact?
- ESnet traffic has been doubling per year since 1990
- SLAC capacity has been increasing by 90%/year since 1982
- SLAC Internet traffic increased by a factor of 2.5 in the last year
- International throughput increased by a factor of 10 in 4 years
- So traffic increases by a factor of 10 every 3.5 to 4 years; hence in:
  - 3.5 to 5 years: 622 Mbps => 10 Gbps
  - 3-4 years: 155 Mbps => 1 Gbps
  - 3.5-5 years: 45 Mbps => 622 Mbps
- 2010-2012: 100s of Gbits/s for high speed production network end connections; 10 Gbps will be mundane for R&E and business
- Home: doubling ~ every 2 years, 100 Mbits/s by the end of the decade?
[Plot: Throughput from US, Mbits/s]
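The extrapolation behind "a factor of 10 every 3.5 to 4 years" is just compound growth; a quick calculation for the two growth rates quoted above:

```python
# Years needed to grow traffic 10x at a given annual growth rate.
import math

def years_to_10x(annual_growth: float) -> float:
    return math.log(10) / math.log(1 + annual_growth)

print(f"doubling each year (100%): {years_to_10x(1.00):.1f} years to 10x")
print(f"90% per year:              {years_to_10x(0.90):.1f} years to 10x")
```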

What’s next?
- Break the 2.5 Gbits/s limit
- Disk-to-disk throughput & useful applications: need faster cpus (an extra 60% MHz/Mbits/s over TCP for disk-to-disk); understand how to use multi-processors
- Evaluate new stacks on real-world links and with other equipment:
  - Other NICs
  - Response to congestion, pathologies
  - Fairness
- Deploy for some major (e.g. HENP/Grid) customer applications
- Understand how to make 10GE NICs work well with 1500B MTUs
- Move from "hero" demonstrations to commonplace use

More Information
- Internet2 Land Speed Record publicity:
  - www-iepm.slac.stanford.edu/lsr/
  - www-iepm.slac.stanford.edu/lsr2/
- 10GE tests:
  - www-iepm.slac.stanford.edu/monitoring/bulk/10ge/
  - sravot.home.cern.ch/sravot/Networking/10GbE/10GbE_test.html
- TCP stacks:
  - netlab.caltech.edu/FAST/
  - datatag.web.cern.ch/datatag/pfldnet2003/papers/kelly.pdf
  - www.icir.org/floyd/hstcp.html
- Stack comparisons:
  - www-iepm.slac.stanford.edu/monitoring/bulk/fast/
  - www.csm.ornl.gov/~dunigan/net100/floyd.html