
1 Achieving high performance throughput in production networks Les Cottrell – SLAC Presented at the Internet 2 HENP Networking Working Group kickoff meeting at Internet 2 Ann Arbor, Michigan, Oct 26 ‘01 Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM), also supported by IUPAP

2 High Speed Bulk Throughput
Driven by:
–Data-intensive science, e.g. data grids
–HENP data rates, e.g. BaBar 300TB/year, collection doubling yearly, i.e. PBytes in a couple of years
–Data rate from experiment today ~ 20MBytes/s ~ 200GBytes/d
–Multiple regional computer centers (e.g. Lyon-FR, RAL-UK, INFN-IT, LBNL-CA, LLNL-CA, Caltech-CA) need copies of the data; a Tier A center gets 1/3 of the data in 1/3 of a year (at full rate), and SLAC does not keep a copy
–Shipping data by Boeing 747 gives high throughput, BUT poor latency (~ 2 weeks) and is very people-intensive
So we need high-speed networks and the ability to utilize them
–High speed today = a few hundred GBytes/day (100GB/d ~ 10Mbits/s)
[Chart: data volume growth vs. Moore's law]
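As a quick check on the rate conversion quoted above, a minimal sketch of the arithmetic (illustrative only):

# Rough rate conversions used on this slide (illustrative arithmetic only).
SECONDS_PER_DAY = 86400

def gbytes_per_day_to_mbps(gb_per_day: float) -> float:
    """Convert a daily transfer volume (GBytes/day) to an average rate in Mbits/s."""
    bits_per_day = gb_per_day * 1e9 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e6

if __name__ == "__main__":
    # 100 GBytes/day works out to roughly 9-10 Mbits/s sustained,
    # matching the "100GB/d ~ 10Mbits/s" figure on the slide.
    print(f"100 GB/day ~ {gbytes_per_day_to_mbps(100):.1f} Mbits/s")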

3 How to measure network throughput
Selected about 2 dozen major collaborator sites in US, CA, JP, FR, CH, IT, UK over the last year
–Of interest to SLAC
–Can get logon accounts
Use iperf
–Choose window size and # of parallel streams
–Run for 10 seconds together with ping (loaded)
–Stop iperf, run ping (unloaded) for 10 seconds
–Change window or number of streams & repeat
Record # streams, window, throughput (Mbits/s), loaded & unloaded ping responses, cpu utilization, real time
Verify window sizes are set properly by using tcpdump (can't believe what the application tells you)
Note cpu speeds, interface speeds, operating system, path characteristics
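A minimal sketch of this measurement loop, using classic iperf client flags (-c, -w, -P, -t) and ping; the host name and result handling are placeholders, not the actual IEPM harness:

# Sketch of the measurement loop described on this slide: for each window size
# and stream count, run iperf for 10 s alongside ping (loaded), then ping alone
# (unloaded). Host name and result handling are illustrative placeholders.
import subprocess

HOST = "remote.example.edu"  # hypothetical collaborator site

def run(cmd):
    return subprocess.run(cmd, capture_output=True, text=True).stdout

results = []
for window in ["64K", "128K", "256K", "512K", "1M"]:
    for streams in [1, 2, 4, 8, 16]:
        # Loaded measurement: iperf with chosen window/streams, ping in parallel.
        ping_loaded = subprocess.Popen(["ping", "-c", "10", HOST],
                                       stdout=subprocess.PIPE, text=True)
        iperf_out = run(["iperf", "-c", HOST, "-w", window,
                         "-P", str(streams), "-t", "10"])
        loaded_rtt = ping_loaded.communicate()[0]
        # Unloaded measurement: ping alone after iperf has stopped.
        unloaded_rtt = run(["ping", "-c", "10", HOST])
        results.append((window, streams, iperf_out, loaded_rtt, unloaded_rtt))
        print(f"done: window={window}, streams={streams}")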

4 Typical results today
Hi-thru usually = big windows & multiple streams
Improves ~ linearly with streams for small windows
Broke the 100Mbps trans-Atlantic barrier
[Chart: throughput vs. number of streams for window sizes 8kB, 16kB, 32kB, 64kB, 100kB; Solaris default window size marked]

5 Windows vs Streams
Often for a fixed streams*window product, streams are more effective than window size, e.g. SLAC > CERN, Jul '01
There is an optimum number of streams above which performance flattens out
Common for throughputs to be asymmetric
–more congestion one way, different routes, host dependencies
[Table: window, streams, achieved Mbps and window*streams, for windows from 64KB to 4096KB]

6 Windows vs Streams
Multi-streams often more effective than windows
–more agile in face of congestion
Often easier to set up
–Need root to configure kernel to set max window
–Network components may not support big windows
–Some OS' treat max windows strangely
May be able to take advantage of multiple paths
But:
–may be considered over-aggressive (RFC 2914)
–can take more cpu cycles
–how to know how many streams?

7 Iperf client CPU utilization
As expected, increases with throughput (mainly kernel), ~ 0.7 MHz per Mbit/s
For fixed throughput:
–Fewer streams take less cpu
–E.g. 1-4 streams take 20% less cpu than 8-16 streams for the same throughput (if you can get it)

8 Throughput quality improvements
TCP BW < MSS/(RTT*sqrt(loss))
("The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm", Mathis, Semke, Mahdavi, Ott, Computer Communication Review 27(3), July 1997)
[Chart: derived TCP throughput over time by region; note E. Europe keeping up, China marked; 80% annual improvement ~ a factor of 10 over 4 years]
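A small illustrative evaluation of the Mathis bound quoted above (constant taken as 1; the MSS, RTT and loss values are examples, not measurements from this talk):

# Illustrative evaluation of the Mathis et al. bound:
#   TCP BW < MSS / (RTT * sqrt(loss))
# The MSS, RTT and loss rate below are example values, not measured data.
from math import sqrt

def mathis_bound_mbps(mss_bytes: float, rtt_s: float, loss: float) -> float:
    """Upper bound on single-stream TCP throughput in Mbits/s."""
    return (mss_bytes * 8) / (rtt_s * sqrt(loss)) / 1e6

# e.g. 1460-byte MSS, 150 ms trans-Atlantic RTT, 0.1% packet loss
print(f"{mathis_bound_mbps(1460, 0.150, 0.001):.1f} Mbits/s")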

9 Bandwidth changes with time (1/2)
Short term: competing cross-traffic, other users; factors of 3-5 observed in 1 minute
Long term: link, route upgrades; factors of 3-16 in 12 months
All hosts had 100Mbps NICs. Recently have measured 105Mbps SLAC > IN2P3 and 340Mbps Caltech > SLAC with GE

10 Network Simulator (ns-2)
From UCB, simulates the network
–Choice of stack (Reno, Tahoe, Vegas, SACK…)
–RTT, bandwidth, flows, windows, queue lengths…
Compare with measured results
–Agrees well
–Confirms observations (e.g. linear growth in throughput for small window sizes as the number of flows increases)

11 Agreement of ns2 with observed

12 Ns-2 throughput & loss predictions
Indicates that on an unloaded link one can get ~70% of the available bandwidth without causing noticeable packet loss
Can get over 80-90% of the available bandwidth
Can overdrive: no extra throughput BUT extra loss

13 Simulator benefits
No traffic on the network (n.b. a real throughput measurement can use 90%)
Can do what-if experiments
No need to install iperf servers or have accounts
No need to configure hosts to allow large windows
BUT
–Need to estimate simulator parameters, e.g. RTT (use ping or synack); bandwidth (use pchar, pipechar etc., moderately accurate)
AND it's not the real thing
–Need to validate vs. observed data
–Need to simulate cross-traffic etc.

14 Impact on Others
Make ping measurements with & without iperf loading
–Loss loaded (vs. unloaded)
–RTT
Looking at how to avoid the impact: e.g. QBSS/LBE, application pacing, a control loop on stdev(RTT) that reduces streams (see the sketch below); want to avoid scheduling
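A minimal sketch of the kind of control loop mentioned above: back off the number of parallel streams when RTT variability under load grows. The thresholds, host name and the set_streams() hook are illustrative assumptions, not the actual IEPM implementation:

# Sketch of a control loop on stdev(RTT): reduce the number of parallel streams
# when RTT variability under load grows, ramp up again when it shrinks.
# Thresholds, host name and the set_streams() hook are assumptions.
import re
import subprocess
import time
from statistics import stdev

HOST = "remote.example.edu"  # hypothetical measurement target

def measure_rtts(count=10):
    """Return a list of RTT samples in ms from ping."""
    out = subprocess.run(["ping", "-c", str(count), HOST],
                         capture_output=True, text=True).stdout
    return [float(m) for m in re.findall(r"time=([\d.]+)", out)]

def set_streams(n):
    """Placeholder: tell the transfer tool to use n parallel streams."""
    print(f"now using {n} streams")

def control_loop(streams=8, min_streams=1, max_streams=16):
    baseline_sd = stdev(measure_rtts())           # unloaded RTT variability
    while True:
        sd = stdev(measure_rtts())                # variability under load
        if sd > 3 * baseline_sd and streams > min_streams:
            streams -= 1                          # hurting others: back off
        elif sd < 1.5 * baseline_sd and streams < max_streams:
            streams += 1                          # headroom: ramp up
        set_streams(streams)
        time.sleep(10)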

15 File Transfer
Used bbcp (written by Andy Hanushevsky)
–similar methodology to iperf, except runs for the file length rather than a fixed time; provides incremental throughput reports; supports /dev/zero; adding a duration option
–looked at /afs/, /tmp/, /dev/null
–checked different file sizes
Behavior with windows & streams similar to iperf
Thru(bbcp) ~ 0.8 * Thru(iperf)
For modest throughputs (< 50Mbits/s) rates are independent of whether the destination is /afs/, /tmp/ or /dev/null
Cpu utilization ~ 1MHz/Mbit/s, ~ 20% more than for iperf
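For illustration, a sketch of driving bbcp from a script as in these tests; the -s (streams), -w (window) and -P (progress interval) flag meanings are assumed from common bbcp usage and should be checked against the local bbcp documentation, and the hosts/paths are placeholders:

# Sketch of invoking bbcp with a chosen window size and stream count, as in the
# file-transfer tests on this slide. The -s (streams), -w (window) and -P
# (progress interval, seconds) flags are assumed from common bbcp usage.
import subprocess

def bbcp_copy(src, dst, streams=8, window="256K"):
    cmd = ["bbcp", "-s", str(streams), "-w", window, "-P", "1", src, dst]
    return subprocess.run(cmd, capture_output=True, text=True)

# e.g. copy from /dev/zero on a remote host to the local /dev/null to measure
# pure network throughput, independent of disk speed (placeholder host):
result = bbcp_copy("user@remote.example.edu:/dev/zero", "/dev/null")
print(result.stdout)
print(result.stderr)  # incremental throughput reports (output stream assumed)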

16 Application rate-limiting
bbcp has transfer rate limiting
–Could use network information (e.g. from Web100 or independent pinging) to tell bbcp to reduce/increase its transfer rate, or change the number of parallel streams
[Charts: throughput vs. time with no rate limiting and with 15MB/s rate limiting, both with a 64KB window and 32 streams]

17 Using bbcp to make QBSS measurements
Run bbcp with src data = /dev/zero, dst = /dev/null, reporting throughput at 1-second intervals
–with TOS=32 (QBSS)
–After 20 s, run bbcp with no TOS bits specified (BE)
–After 20 s, run bbcp with TOS=40 (priority)
–After 20 more seconds, turn off Priority
–After 20 more seconds, turn off BE
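As an illustration of what marking traffic with these TOS values involves at the socket level (this shows the mechanism only; it is not how bbcp itself sets its TOS bits):

# Marking a socket's traffic with the TOS values used on this slide:
# 32 (0x20) for QBSS/scavenger, 40 (0x28) for priority, 0 for best effort.
import socket

QBSS_TOS = 32      # QBone Scavenger Service
PRIORITY_TOS = 40
BEST_EFFORT_TOS = 0

def connect_with_tos(host, port, tos):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)  # mark outgoing packets
    s.connect((host, port))
    return s

# e.g. a bulk-transfer connection marked as scavenger traffic (placeholder host/port):
# sock = connect_with_tos("remote.example.edu", 5001, QBSS_TOS)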

18 QBSS test bed with Cisco 7200s
Set up QBSS testbed
Configure router interfaces
–3 traffic types: QBSS, BE, Priority
–Define policy, e.g. QBSS > 1%, priority < 30%
–Apply policy to router interface queues
[Diagram: testbed of Cisco 7200s with 10Mbps, 100Mbps and 1Gbps links]

19 Example of effects Also tried: 1 stream for all, and priority at 30%

20 QBSS with Cisco 6500s + Policy Feature Card (PFC)
–Routing by PFC2, policing on switch interfaces
–2 queues, 2 thresholds each
–QBSS assigned to its own queue with 5% bandwidth – guarantees QBSS gets something
–BE & Priority traffic in the 2nd queue with 95% bandwidth
–Apply ACL to switch port to police Priority traffic to < 30%
[Diagram: Cisco 6500s + MSFC/Sup2 testbed with 100Mbps and 1Gbps links; chart of bandwidth share over time: BE, Priority (30%), QBSS (~5%), 100% total]

21 Impact on response time (RTT)
Run ping with iperf loading under various QoS settings, iperf ~ 93Mbps
–No iperf: ping avg RTT ~ 300usec (regardless of QoS)
–Iperf = QBSS, ping = BE or Priority: RTT ~ 550usec, 70% greater than unloaded
–Iperf QoS = ping QoS (exc. Priority): RTT ~ 5msec, more than a factor of 10 larger than unloaded
–If both ping & iperf have QoS = Priority, then ping RTT is very variable since iperf is limited to 30%: RTT quick when iperf is limited, long when iperf transmits

22 Possible HEP usage
Apply priority to lower-volume interactive voice/video-conferencing and real-time control
Apply QBSS to high-volume data replication
Leave the rest as Best Effort
Since 40-65% of bytes to/from SLAC come from a single application, we have modified it to enable setting of the TOS bits
Need to identify bottlenecks and implement QBSS there
Bottlenecks tend to be at the edges, so we hope to try this with a few HEP sites

23 Acknowledgements for SC2001
Many people assisted in getting accounts, setting up servers, providing advice, software etc.
–Suresh Man Singh, Harvey Newman, Julian Bunn (Caltech), Andy Hanushevsky, Paola Grosso, Gary Buhrmaster, Connie Logg (SLAC), Olivier Martin (CERN), Loric Totay, Jerome Bernier (IN2P3), Dantong Yu (BNL), Robin Tasker, Paul Kummer (DL), John Gordon (RL), Brian Tierney, Bob Jacobsen (LBL), Stanislav Shalunov (Internet 2), Joe Izen (UT Dallas), Linda Winkler, Bill Allcock (ANL), Ruth Pordes, Frank Nagy (FNAL), Emanuele Leonardi (INFN), Chip Watson (JLab), Yukio Karita (KEK), Tom Dunigan (ORNL), Andrew Daviel (TRIUMF), Paul Avery, Greg Goddard (UFL), Paul Barford, Miron Livny (UWisc), Shane Canon (NERSC), Andy Germain (NASA), Richard Baraniuk, Rolf Riedi (Rice).

24 SC2001 demo
Send data from SLAC/FNAL booth computers (emulating a tier 0 or 1 HENP site) to over 20 other sites with good connections in about 6 countries
–Throughputs from SLAC range from 3Mbps to > 300Mbps
Part of the bandwidth challenge proposal: saturate the 2Gbps connection to the floor network
Apply QBSS to some sites, priority to a few, and Best Effort to the rest
–See how QBSS works at high speeds: competing bulk-throughput streams and interactive low-throughput streams; look at RTT with ping

25 WAN throughput conclusions
High FTP performance across WAN links is possible
–Even with a 20-30Mbps bottleneck, can do > 100Gbytes/day
–Can easily saturate a fast Ethernet interface over the WAN
–Need GE NICs and > OC3 WANs to improve performance further
Performance is improving
OS must support big windows selectable by the application
Need multiple parallel streams in some cases
Loss is important, in particular the interval between losses
Can get close to max throughput with small (<=32Mbyte) windows given sufficient (5-10) streams
Improvements of 5 to 60 times in throughput by using multiple streams & larger windows
Impacts other users; QBSS looks hopeful

26 More Information
IEPM/PingER home site:
–www-iepm.slac.stanford.edu/
Bulk throughput site:
–www-iepm.slac.stanford.edu/monitoring/bulk/
Transfer tools:
–
TCP Tuning:
–www-didc.lbl.gov/tcp-wan.html
QBSS measurements:
–www-iepm.slac.stanford.edu/monitoring/qbss/measure.html

27 Extra supporting slides

28 Streams usually share bandwidth TCP fair share with best effort

29 Progress towards goal: 100 Mbytes/s Site-to-Site
Focus on SLAC – Caltech over NTON; using NTON wavelength-division fibers up & down the US West Coast
Replaced Exemplar with 8*OC3 & Suns with Pentium IIIs & OC12 (622Mbps); SLAC Cisco with OC48 (2.4Gbps) and 2 × OC12; Caltech Juniper M160 & OC48
~500 Mbits/s single stream achieved recently over a single OC12
2 OC12s, 1 machine at Caltech and 2 machines at SLAC gave >~ 600Mbits/s

30 Optimizing streams Choose # streams to optimize throughput/impact –Measure RTT from Web100 –App controls # streams

31 SC2000 WAN Challenge
SC2000, Dallas to SLAC, RTT ~ 48msec
–SLAC/FNAL booth: Dell PowerEdge PIII 2 * 550MHz with 64-bit PCI + Dell 850MHz, both running Linux, each with GigE, connected to a Cat 6009 with 2 GigE bonded to the Extreme SC2000 floor switch
–NTON: OC48 to GSR to Cat 5500 GigE to Sun E4500 4*460MHz and Sun E4500 6*336MHz
Achieved: Internet 2: 300 Mbits/s; NTON: 960Mbits/s
Details:
–www-iepm.slac.stanford.edu/monitoring/bulk/sc2k.html

32 Optimum window & streams (1/2)
Require window * streams ≈ RTT * bandwidth (the bandwidth-delay product)
E.g. for SLAC to CERN:
–Window*streams = 159msec * 85Mbits/s ~ 1.7Mbytes
–Over 1000 packets in the pipe at a time
Use ping for the RTT, pipechar for the bandwidth
[Table: window (KB), streams, and window*streams (MB) combinations]
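A small illustrative check of the SLAC to CERN numbers above:

# Check of the bandwidth-delay product figures on this slide:
# window * streams should roughly equal RTT * bandwidth.
rtt_s = 0.159          # SLAC-CERN RTT, 159 ms
bw_bps = 85e6          # 85 Mbits/s available bandwidth

bdp_bytes = rtt_s * bw_bps / 8
print(f"BDP ~ {bdp_bytes / 1e6:.2f} MBytes")                        # ~1.7 MBytes
print(f"~ {bdp_bytes / 1500:.0f} x 1500-byte packets in the pipe")  # > 1000 packets

# e.g. a 256 KB window would need roughly this many parallel streams:
window_bytes = 256 * 1024
print(f"streams needed with a 256KB window: {bdp_bytes / window_bytes:.1f}")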

33 LAN MTU
Caltech: 2 Dells, dual 730MHz & dual 860MHz PCs, connected with SysKonnect dual-port GE interfaces, Linux SMP
–Run iperf & ttcp, varying the window
–Got 475Mbits/s per port for 1.5KB MTU
–Got 960Mbits/s per port for 9KB MTU
Windows 2000
–Run Microsoft "speedy" application
–Got 820Mbits/s with 9KB MTU, max IRQ 3000/s, 128KB buffers, 20MB TCP window size, single stream

34 LAN windows & streams
With today's GE interfaces need big windows:
–e.g. 2 hosts on the same Ethernet switch, RTT ~ 300usec, needs a window size of 300KBytes
[Chart: throughput (MB/s) for window >= 64KB & streams 5-8]
–windows*streams does not work
–windows alone no good
–streams alone no good
Measured between a Sun E420R with 4 cpus at 450MHz and a Netra 1405 with 4 cpus running at 440MHz, running Solaris 5.7

35 Bandwidth changes with time (2/2)
CERN to Caltech over the last year and a half
–Improvements come in steps, when the bottleneck is upgraded

36 Other Improvements
Improved TCP stacks
–Faster recovery after losses, selective acknowledgement
Web100 experimental TCP stack
–Allows read/write access to TCP internal variables
–Can provide improved monitoring & diagnostics
–So the application can tune the stack on the fly

37 Applications
Main network application focus today is on replication at multiple sites worldwide (mainly N. America, Europe and Japan)
Need a fast, secure, easy to use, extendable way to copy data between sites
–Need to coexist with interactive and real-time traffic at the same time, e.g. experiment control, video & voice conferencing
HEP community has developed 2 major (freely available) applications to meet the replication need: bbftp and bbcp

38 Bbcp
Peer-to-peer copy program with multiple (<=64) streams, large window support, secure password exchange (ssh control path, single-use passwords on the data path), similar syntax to scp
[Diagram: two bbcp peers exchanging data directly]
Peer-to-peer
–No server: if you have the program, you have the service (usually no need for admins); any node can act as source or sink; 3rd-party copies
Provides sequential I/O (e.g. from /dev/zero, to a pipe, tape or /dev/null) and progress reporting
C++ component design allows testing new algorithms (relatively easy to extend)

39 Bbcp: algorithms
Data pipelining
–Multiple streams pushed "simultaneously"
Automatically adapts to router traffic shaping
Can control the maximum rate
Can write to tape, read from /dev/zero, write to /dev/null or a pipe
Check-pointing (resume a failed transmission)
Coordinated buffers
–All buffers same-sized end-to-end
Page-aligned buffers
–Allows direct I/O on many file-systems (e.g. Veritas)
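A small illustration of the page-alignment point above: direct I/O (e.g. O_DIRECT on Linux) requires buffers aligned to the page/block size, and an anonymous mmap is page-aligned by construction. This only demonstrates the alignment property; it is not bbcp's buffer code:

# Direct I/O needs page-aligned buffers; an anonymous mmap provides one.
import ctypes
import mmap

PAGE = mmap.PAGESIZE
buf = mmap.mmap(-1, 4 * 1024 * 1024)      # anonymous 4 MB mapping, page-aligned

addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
print(f"buffer address 0x{addr:x}, page size {PAGE}, aligned: {addr % PAGE == 0}")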