High Performance Active End-to-end Network Monitoring

High Performance Active End-to-end Network Monitoring. Les Cottrell, Connie Logg, Warren Matthews, Jiri Navratil, Ajay Tirumala (SLAC). Prepared for the Protocols for Long Distance Networks Workshop, CERN, February 2003. Partially funded by the DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM) and by the SciDAC base program; also supported by IUPAP.

Outline: high-performance testbed; challenges for measurements at high speeds; simple infrastructure for regular high-performance measurements; results.

Testbed [diagram]: groups of CPU servers (12, 6 and 6) and disk servers (4 and 4) connected through 7606, T640 and GSR routers over OC192/POS (10Gbits/s) and 2.5Gbits/s links. The Sunnyvale section was deployed for SC2002 (Nov 02).

Problems: Achievable TCP throughput. We typically use iperf, and want to measure stable throughput (i.e. after slow start). Slow start takes quite long at high BW*RTT: for GE from California to Geneva (RTT=182ms), slow start takes ~5s (about double for Vegas/FAST TCP). So for slow start to contribute <10% of the measured throughput, a test needs to run for ~50s. The slow-start time is $T_{ss} \approx 2\lceil \log_2(W/\mathrm{MSS}) \rceil \cdot \mathrm{RTT}$, where $W = \mathrm{RTT} \times \mathrm{BW}$. So we are developing Quick Iperf: use web100 to tell when the connection is out of slow start, then measure for 1 second afterwards. This gives a 90% reduction in test duration and bandwidth used.
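The slow-start estimate above can be checked with a short Python sketch. It assumes an MSS of 1460 bytes (a 1500-byte MTU minus 40 bytes of IP/TCP headers) and decimal units; the function names are illustrative and are not part of iperf or Quick Iperf.

```python
import math

def slow_start_time(bw_bps: float, rtt_s: float, mss_bytes: int = 1460) -> float:
    """T_ss ~ 2 * ceil(log2(W / MSS)) * RTT, with W = RTT * BW in bytes."""
    window_bytes = bw_bps * rtt_s / 8  # bandwidth-delay product in bytes
    doublings = math.ceil(math.log2(window_bytes / mss_bytes))
    return 2 * doublings * rtt_s

def min_test_duration(t_ss: float, max_fraction: float = 0.10) -> float:
    """Shortest run for which slow start is at most max_fraction of the test."""
    return t_ss / max_fraction

# GE (1 Gbit/s) from California to Geneva, RTT = 182 ms
t_ss = slow_start_time(1e9, 0.182)
print(f"slow start ~{t_ss:.1f} s; run for at least {min_test_duration(t_ss):.0f} s")
```

For this path it prints about 5.1 s of slow start and a minimum run of about 51 s, in line with the ~5s and ~50s figures above.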

Examples (stock TCP, MTU 1500B): Caltech is typical, with RTT=24ms, BW*RTT≈800KB and a maximum TCP window (Tcp_win_max) of 16MB. Rice has a small receive window of 256KB but BW*RTT=1.6MB, RTT=45ms. Japan (apan.jp) has an RTT of ~132-140ms and BW*RTT≈5MB.
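A single TCP stream cannot send more than one window per RTT, which is why the Rice example is window-limited. A minimal sketch, assuming decimal units (1KB = 10^3 bytes, 1MB = 10^6 bytes) and treating BW*RTT as the path's bandwidth-delay product; the helper names are hypothetical.

```python
def window_limited_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on a single TCP stream: at most one window per RTT."""
    return window_bytes * 8 / rtt_s

def max_utilization(window_bytes: float, bdp_bytes: float) -> float:
    """Fraction of the path a single stream can fill (capped at 100%)."""
    return min(1.0, window_bytes / bdp_bytes)

# Rice: 256KB receive window, BW*RTT = 1.6MB, RTT = 45 ms
print(f"Rice: <= {window_limited_bps(256e3, 0.045) / 1e6:.1f} Mbits/s, "
      f"{max_utilization(256e3, 1.6e6):.0%} of the path")
# Caltech: a 16MB maximum window easily covers BW*RTT = 800KB
print(f"Caltech: {max_utilization(16e6, 800e3):.0%} of the path")
```

Under these assumptions a single Rice stream is capped near 45 Mbits/s, about 16% of what the path could carry.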

Problems: Achievable bandwidth. We typically use packet pair dispersion or packet size techniques (e.g. pchar, pipechar, pathload, pathchirp, …). In our experience, current implementations fail above 155Mbits/s and/or take a long time to make a measurement; pipechar typically takes 4 minutes. So we developed ABwE, a simple, practical packet pair tool: it typically uses 40 packets, has been tested up to 950Mbits/s, has low impact, and takes a few seconds per measurement (so it can be used for real-time monitoring).
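Packet-pair tools rest on the idea that two back-to-back packets leave the bottleneck link separated by the time needed to serialize one packet, so capacity ≈ packet size / dispersion. The sketch below shows only that generic estimate, taking the median over many pairs to damp cross traffic; it is not ABwE's actual algorithm, and the gap values are made up for illustration.

```python
import statistics

def packet_pair_capacity_bps(packet_bytes: int, gaps_s: list[float]) -> float:
    """Generic packet-pair estimate: capacity = packet size / dispersion.

    gaps_s holds the receiver-side spacing of each back-to-back pair; the
    median across pairs damps pairs spread out by cross traffic.
    """
    valid = [g for g in gaps_s if g > 0]
    if not valid:
        raise ValueError("need at least one positive dispersion")
    return packet_bytes * 8 / statistics.median(valid)

# Hypothetical: 20 pairs of 1500-byte packets (40 packets); most gaps match a
# 1 Gbit/s bottleneck (12 us), a few are stretched by cross traffic.
gaps = [12e-6] * 15 + [20e-6] * 5
print(f"{packet_pair_capacity_bps(1500, gaps) / 1e6:.0f} Mbits/s")
```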

ABwE results: measurements at 1-minute separation, normalized with iperf. Note the sudden dip in available bandwidth every hour, caused by a cron job copying data to an NFS file every hour.

Problem: File copy applications. Some tools will not allow a large enough window (e.g. bbcp, currently limited to 2MBytes). They have the same slow start problem as iperf. We need a big file to ensure it is not cached, e.g. 2GBytes, which takes 80s to transfer at 200Mbits/s and even longer at lower speeds. We are looking at whether we can get the same effect as a big file with a small (64MByte) file by playing with commit. Many more factors are involved, e.g. file system, disk speeds, RAID etc. Maybe the best bet is to let the user measure it for us.
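The file-size arithmetic above is simply size × 8 / rate. This sketch compares the 2 GByte file with the 64 MByte alternative at a couple of rates; it assumes decimal units and a steady rate with no slow start, and the helper name is illustrative.

```python
def transfer_time_s(size_bytes: float, rate_bps: float) -> float:
    """Seconds to move size_bytes at a steady rate_bps (slow start ignored)."""
    return size_bytes * 8 / rate_bps

for size, label in [(2e9, "2 GBytes"), (64e6, "64 MBytes")]:
    for rate in (200e6, 50e6):
        t = transfer_time_s(size, rate)
        print(f"{label} at {rate / 1e6:.0f} Mbits/s: {t:.1f} s")
```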

Passive (Netflow) measurements. Use Netflow measurements from the border router: Netflow records time, duration, bytes, packets etc. per flow, and we calculate throughput as bytes/duration, validated against iperf, bbcp etc. There is no extra load on the network, and it provides data on other SLAC & remote hosts & applications: ~10-20K flows/day, 100-300 unique pairs/day. It is tricky to aggregate all the flows for a single application call. We look for flows with a fixed triplet (source & destination address, and port) starting at the same time ±2.5s and ending at roughly the same time; this needs tuning, since it misses some delayed flows. We check that it works for known active flows. To identify the application we need a fixed server port (bbcp is peer-to-peer, but we have modified it to support this). We are investigating differences with tcpdump. Aggregate throughputs, noting the number of flows/streams.
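The flow-grouping heuristic can be sketched as follows. The `Flow` record and its fields are hypothetical stand-ins for Netflow fields (addresses, port, start/end time, bytes). The 2.5 s start tolerance comes from the slide; the 5 s end tolerance is an assumption which, as noted above, would need tuning.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    src: str
    dst: str
    port: int      # fixed server port used to identify the application
    start: float   # flow start time, seconds
    end: float     # flow end time, seconds
    nbytes: int

def aggregate_flows(flows, start_tol=2.5, end_tol=5.0):
    """Group flows sharing (src, dst, port) whose start and end times are
    close to those of the group's first flow; return one tuple per group:
    ((src, dst, port), number_of_streams, aggregate_throughput_bps)."""
    groups = []
    for fl in sorted(flows, key=lambda x: (x.src, x.dst, x.port, x.start)):
        key = (fl.src, fl.dst, fl.port)
        if groups:
            first = groups[-1][0]
            same_call = ((first.src, first.dst, first.port) == key
                         and fl.start - first.start <= start_tol
                         and abs(fl.end - first.end) <= end_tol)
            if same_call:
                groups[-1].append(fl)
                continue
        groups.append([fl])
    results = []
    for members in groups:
        t0 = min(m.start for m in members)
        t1 = max(m.end for m in members)
        total_bits = 8 * sum(m.nbytes for m in members)
        duration = t1 - t0
        rate = total_bits / duration if duration > 0 else 0.0
        head = members[0]
        results.append(((head.src, head.dst, head.port), len(members), rate))
    return results

# Hypothetical hosts and port: four parallel streams of one copy, plus a
# later, unrelated transfer between the same pair.
streams = [Flow("slac", "caltech", 5031, s, e, 25_000_000)
           for s, e in [(0.0, 10.0), (0.5, 10.2), (1.0, 10.1), (1.5, 10.3)]]
later = Flow("slac", "caltech", 5031, 100.0, 110.0, 10_000_000)
for key, n, bps in aggregate_flows(streams + [later]):
    print(key, n, f"{bps / 1e6:.1f} Mbits/s")
```

Throughput is taken over the union of the grouped flows (earliest start to latest end), so a straggling stream lowers the aggregate rather than being dropped.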

Passive vs active [plots: throughput (Mbits/s) vs date, active and passive, SLAC to Caltech, Feb-Mar '02, for iperf and bbftp]. Iperf matches well; bbftp reports less than it achieves.

Problems: Host configuration. Need a fast interface and a high-speed Internet connection, a powerful enough host, large enough available TCP windows, enough memory, and enough disk space.

Windows and streams. It is well accepted that multiple streams and/or big windows are important to achieve optimal throughput, but they can be unfriendly to others. The optimum windows & streams change with changes in the path, so they are hard to optimize. For 3Gbits/s and 200ms RTT you need a 75MByte window.
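The 75MByte figure is the bandwidth-delay product, BW × RTT / 8 bytes. A small sketch, assuming decimal units and (hypothetically) that parallel streams share the load equally, shows how spreading the transfer over streams lowers the window each one needs:

```python
def required_window_bytes(bw_bps: float, rtt_s: float) -> float:
    """Window needed to keep the pipe full: the bandwidth-delay product."""
    return bw_bps * rtt_s / 8

def per_stream_window_bytes(bw_bps: float, rtt_s: float, streams: int) -> float:
    """Window each of N parallel streams needs if they share the load equally."""
    return required_window_bytes(bw_bps, rtt_s) / streams

print(f"{required_window_bytes(3e9, 0.2) / 1e6:.0f} MBytes for 3 Gbits/s at 200 ms")
for n in (1, 8, 16):
    print(f"{n:2d} streams: {per_stream_window_bytes(3e9, 0.2, n) / 1e6:.2f} MBytes each")
```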

Even with big windows (1MB), stock TCP still needs multiple streams. ANL, Caltech & RAL reach a knee (between 2 and 24 streams), above which the gain in throughput is slow. The slow improvement above the knee may be due to squeezing out others and taking more than a fair share because of the large number of streams.

Impact on others

Configurations 1/2. Do we measure with standard parameters, or with optimal ones? We need to measure all of them to understand the effects of parameters and configurations: windows, streams, txqueuelen, TCP stack, MTU. That is a lot of variables. Examples of 2 TCP stacks [plots: stock TCP vs FAST TCP, 1500B MTU, 65ms RTT]: FAST TCP no longer needs multiple streams, which is a major simplification (it reduces the number of variables by 1).

Configurations: Jumbo frames. These become more important at higher speeds: they reduce interrupts to the CPU and the number of packets to process, with a similar effect to using multiple streams (T. Hacker). Jumbo frames can achieve >95% utilization from SNV to CHI or GVA with 1 or multiple streams, up to Gbit/s: a factor 5 improvement over 1500B MTU throughput for stock TCP (SNV-CHI (65ms) & CHI-AMS (128ms)). They are an alternative to a new stack.

Time to reach maximum throughput: at 23ms RTT, ~320 packets with 9000-Byte MTUs vs 1916 packets with 1500-Byte MTUs.
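The two packet counts match the window needed to fill a 1 Gbit/s path at 23ms RTT (an assumption; the slide does not state the rate). Under standard Reno congestion avoidance, which adds one packet to the window per RTT, regrowing a halved window takes about W/2 round trips, so fewer, larger packets recover faster. A sketch that ignores header overhead; the function names are illustrative:

```python
def packets_per_window(bw_bps: float, rtt_s: float, mtu_bytes: int) -> float:
    """Bandwidth-delay product expressed in packets (header overhead ignored)."""
    return bw_bps * rtt_s / 8 / mtu_bytes

def reno_recovery_time_s(bw_bps: float, rtt_s: float, mtu_bytes: int) -> float:
    """Time for congestion avoidance (+1 packet per RTT) to regrow the window
    from W/2 back to W after a single loss: about W/2 round trips."""
    return packets_per_window(bw_bps, rtt_s, mtu_bytes) / 2 * rtt_s

for mtu in (9000, 1500):
    w = packets_per_window(1e9, 0.023, mtu)
    t = reno_recovery_time_s(1e9, 0.023, mtu)
    print(f"MTU {mtu}: ~{w:.0f} packets per window, ~{t:.1f} s to recover")
```

With these assumptions the 1500-byte case takes six times longer to recover than the 9000-byte case.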

Other gotchas: a Linux memory leak; Linux TCP configuration caching; what window size is actually used/reported; 32-bit counters in iperf and routers wrap, so you need the latest releases with 64-bit counters; effects of txqueuelen; routers that do not pass jumbos.
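To see why 32-bit counters are a problem at these speeds, and how a reader can tolerate a single wrap between samples, here is a small sketch. It assumes the counter counts bytes and is sampled at least once per wrap interval; the names are illustrative.

```python
COUNTER_32 = 2**32

def counter_delta(prev: int, curr: int, modulus: int = COUNTER_32) -> int:
    """Increase of an unsigned wrapping counter between two readings,
    assuming it wrapped at most once in between."""
    return (curr - prev) % modulus

def wrap_interval_s(rate_bps: float, modulus: int = COUNTER_32) -> float:
    """Seconds until a byte counter of the given size wraps at rate_bps."""
    return modulus / (rate_bps / 8)

print(f"32-bit byte counter wraps every {wrap_interval_s(1e9):.1f} s at 1 Gbit/s")
print(f"and every {wrap_interval_s(10e9):.1f} s at 10 Gbits/s")
print(counter_delta(2**32 - 100, 50))  # 150 bytes, despite the wrap
```

If samples are further apart than the wrap interval, no arithmetic can recover the true count, which is why 64-bit counters are needed.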

Repetitive long term measurements

IEPM-BW = PingER NG. Driven by the data replication needs of HENP, PPDG and DataGrid. We no longer ship plane/truck loads of data, since the latency is poor; we now ship all data by network (TB/day today, doubling each year). It complements PingER, but for high-performance nets. We need an infrastructure to make E2E network (e.g. iperf, packet pair dispersion) & application (FTP) measurements for high-performance A&R networking. Started SC2001.

Tasks: develop/deploy a simple, robust, ssh-based E2E application & network measurement and management infrastructure for making regular measurements. The major step is setting up collaborations and getting trust, accounts and passwords. We can use dedicated or shared hosts, located at borders or with real applications. COTS hardware & OS (Linux or Solaris) simplifies application integration. Integrate a base set of measurement tools (ping, iperf, bbcp …) and provide simple (cron) scheduling. Develop data extraction, reduction, analysis, reporting, simple forecasting & archiving.

Purposes: compare & validate tools with one another (pipechar vs pathload vs iperf, or bbcp vs bbftp vs GridFTP vs Tsunami), with passive measurements, and with web100. Evaluate TCP stacks (FAST, Sylvain Ravot, HS TCP, Tom Kelly, Net100 …). Trouble shooting. Set expectations, planning. Understand the requirements for high performance and jumbo performance issues in the network, OS, CPU, disk/file system etc. Provide public access to results for people & applications.

Measurement sites. Production sites (i.e. they choose their own remote hosts and run the monitor themselves): SLAC (40), San Francisco; FNAL (2), Chicago; INFN (4), Milan; NIKHEF (32), Amsterdam; APAN, Japan (4). Evaluating the toolkit: Internet 2 (Michigan), Manchester University, UCL, Univ. Michigan, GA Tech (5). Also demonstrated at iGrid2002 and SC2002, and used on the Caltech / SLAC / DataTag / Teragrid / StarLight / SURFnet testbed. If all goes well, it takes 30-60 minutes to install a monitoring host; there are often problems with keys, disk space, blocked ports, hosts not registered in DNS, and the need for web access. SLAC is monitoring over 40 sites in 9 countries.

[Figure: map of monitoring sites, remote hosts and the networks connecting them. Sites shown include SLAC, Stanford, Caltech, SDSC, LANL, NERSC, ANL, FNAL, ORNL, BNL, JLAB, Rice, UTDallas, UIUC, UMich, UFL, TRIUMF, CERN, IN2P3, NIKHEF, RAL, DL, UManc, UCL, INFN-Roma, INFN-Milan, KEK, RIKEN and APAN; networks include ESnet, Abilene (I2), CalREN, CAnet, Surfnet, Renater, JAnet, GARR, Geant, CESnet, NNW and SOX, with node labels SEA, SNV, NY, CHI, ATL, HSTN, IPLS, CLV and ORN.] Boxes with a bold border are monitoring sites; crosshatched boxes are network collaborators; boxes with diagonal lines are PPDG/GriPhyN collaborators; open boxes are EDG collaborators. Grey characters are the "GigaPoPs" that the nodes connect to in the ISP. Italics are hosts with 100Mbits/s NICs; the others have GE NICs. Clouds are ISPs; there is not enough space to show all the ISPs outside the US.

Results: time series data, scatter plots, histograms; CPU utilization required (MHz/Mbits/s) for jumbo and standard frames and new stacks; forecasting; diurnal behavior characterization; disk throughput as a function of OS, file system and caching; correlations with passive measurements and web100.

www.slac.stanford.edu/comp/net/bandwidth-tests/antonia/html/slac_wan_bw_tests.html

Excel

Problem detection. There must be lots of people working on this? Our approach is rolling averages if we have recent data, plus diurnal changes.

Rolling averages [plots: step changes; diurnal changes]. EWMA ≈ average of the last 5 points, ±2%.
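The slide equates the EWMA with roughly the average of the last 5 points. A common way to get that behavior is α = 2/(N+1) with N = 5. The sketch below uses that choice plus a hypothetical alarm rule (flag a point more than 30% below the previous smoothed value). Neither the α rule nor the threshold is taken from IEPM-BW.

```python
def ewma(values, n=5):
    """Exponentially weighted moving average with alpha = 2 / (n + 1)."""
    alpha = 2 / (n + 1)
    out, avg = [], None
    for v in values:
        avg = v if avg is None else alpha * v + (1 - alpha) * avg
        out.append(avg)
    return out

def step_alarms(values, n=5, drop_fraction=0.3):
    """Indices whose value falls more than drop_fraction below the EWMA of
    the preceding points (hypothetical alarm rule)."""
    smoothed = ewma(values, n)
    return [i for i in range(1, len(values))
            if values[i] < (1 - drop_fraction) * smoothed[i - 1]]

# Hypothetical throughput series (Mbits/s) with a step drop at index 10
series = [400] * 10 + [150] * 5
print(step_alarms(series))
```

The same step keeps triggering until the average catches up with the new level, so a real alarm system would probably suppress repeats.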

Fit to $a\sin(t+f)+g$. Indicate "diurnalness" by df. If we do not have recent measurements, we can look at the previous week at the same time. 25% of hosts show strong diurnalness.
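The sinusoid can be fitted by ordinary least squares, because $a\sin(\omega t+f) = a\cos f\,\sin\omega t + a\sin f\,\cos\omega t$ is linear in the two coefficients. A sketch assuming t in hours and ω = 2π/24 (the slide writes sin(t+f), leaving the scaling implicit), with samples evenly spaced over whole days so the closed-form projections are exact; the synthetic data are made up for illustration.

```python
import math

def fit_diurnal(samples, period_h=24.0):
    """Least-squares fit of y = a*sin(w*t + f) + g, with w = 2*pi/period_h.

    samples: list of (t_hours, y), assumed evenly spaced over whole periods
    (at least 3 per period) so sin and cos terms are orthogonal.
    Returns (a, f, g).
    """
    n = len(samples)
    w = 2 * math.pi / period_h
    g = sum(y for _, y in samples) / n
    s = 2 / n * sum((y - g) * math.sin(w * t) for t, y in samples)  # a*cos(f)
    c = 2 / n * sum((y - g) * math.cos(w * t) for t, y in samples)  # a*sin(f)
    return math.hypot(s, c), math.atan2(c, s), g

# Two days of hourly synthetic throughput with a 20 Mbits/s daily swing
data = [(t, 100 + 20 * math.sin(2 * math.pi * t / 24 + 0.5)) for t in range(48)]
a, f, g = fit_diurnal(data)
print(f"a={a:.2f} f={f:.2f} g={g:.2f}")
```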

Alarms. There is too much to keep track of, and we would rather not wait for complaints. Automated alarms: rolling average à la RIPE-TTM.


Action. However the concern is generated: look for changes in traceroute, compare tools, compare common routes, and cross-reference other alarms.
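For the first action, a route change can be spotted by comparing successive traceroutes hop by hop. A minimal sketch, assuming each traceroute has been reduced to a list of hop addresses with "*" for a hop that did not reply; the function name and hop lists are hypothetical.

```python
def first_divergence(prev_hops, curr_hops):
    """Index of the first hop where two traceroutes differ, or None.

    Hops that did not reply ("*") are not treated as changes; a different
    path length counts as a change at the end of the shorter path.
    """
    for i, (a, b) in enumerate(zip(prev_hops, curr_hops)):
        if a != b and "*" not in (a, b):
            return i
    if len(prev_hops) != len(curr_hops):
        return min(len(prev_hops), len(curr_hops))
    return None

before = ["10.0.0.1", "192.0.2.1", "198.51.100.7", "203.0.113.5"]
after = ["10.0.0.1", "192.0.2.1", "198.51.100.9", "203.0.113.5"]
print(first_divergence(before, after))  # 2
```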

Next steps: rewrite (again) based on experiences. Improve the ability to add new tools (GridFTP, tsunami, UDPMon, pathload …) to the measurement engine and integrate them into extraction and analysis. Improve robustness, error diagnosis and management. We need improved scheduling, and want to look at other security mechanisms.

More information: IEPM/PingER home site: www-iepm.slac.stanford.edu/ ; IEPM-BW site: www-iepm.slac.stanford.edu/bw ; Quick Iperf: http://www-iepm.slac.stanford.edu/bw/iperf_res.html ; ABwE: submitted to PAM2003.