What we have learned from developing and running ABwE Jiri Navratil, Les R.Cottrell (SLAC)

Slides:



Advertisements
Similar presentations
pathChirp Efficient Available Bandwidth Estimation
Advertisements

Bandwidth Estimation Workshop 2003 Evaluating pathrate and pathload with realistic cross-traffic Ravi Prasad Manish Jain Constantinos Dovrolis (ravi, jain,
Pathload A measurement tool for end-to-end available bandwidth Manish Jain, Univ-Delaware Constantinos Dovrolis, Univ-Delaware Sigcomm 02.
Computer Networks Performance Metrics Computer Networks Term B10.
1 Enhanced EDF Scheduling Algorithms for Orchestrating Network-wide Active Measurements Prasad Calyam, Chang-Gun Lee Phani Kumar Arava, Dima Krymskiy OARnet,
QoS Solutions Confidential 2010 NetQuality Analyzer and QPerf.
1 Traceanal: a tool for analyzing and representing traceroutes Les Cottrell, Connie Logg, Ruchi Gupta, Jiri Navratil SLAC, for the E2Epi BOF, Columbus.
1 Correlating Internet Performance & Route Changes to Assist in Trouble- shooting from an End-user Perspective Les Cottrell, Connie Logg, Jiri Navratil.
PIPE Dreams Trouble Shooting Network Performance for Production Science Data Grids Presented by Warren Matthews at CHEP’03, San Diego March 24-28, 2003.
Available bandwidth measurement as simple as running wget D. Antoniades, M. Athanatos, A. Papadogiannakis, P. Markatos Institute of Computer Science (ICS),
WBest: a Bandwidth Estimation Tool for IEEE Wireless Networks Presented by Feng Li Mingzhe Li, Mark Claypool, and.
Chapter 10 Introduction to Wide Area Networks Data Communications and Computer Networks: A Business User’s Approach.
1 Active Probing for Available Bandwidth Estimation Sridhar Machiraju UC Berkeley OASIS Retreat, Jan 2005 Joint work with D.Veitch, F.Baccelli, A.Nucci,
Internet Bandwidth Measurement Techniques Muhammad Ali Dec 17 th 2005.
AdHoc Probe: Path Capacity Probing in Wireless Ad Hoc Networks Ling-Jyh Chen, Tony Sun, Guang Yang, M.Y. Sanadidi, Mario Gerla Computer Science Department,
1 Chapter 10 Introduction to Metropolitan Area Networks and Wide Area Networks Data Communications and Computer Networks: A Business User’s Approach.
Network Topologies.
Network Performance Measurement Atlas Tier 2 Meeting at BNL December Joe Metzger
KEK Network Qi Fazhi KEK SW L2/L3 Switch for outside connections Central L2/L3 Switch A Netscreen Firewall Super Sinet Router 10GbE 2 x GbE IDS.
1 ESnet Network Measurements ESCC Feb Joe Metzger
ESnet Abilene 3+3 Measurements Presented at the Joint Techs Meeting in Columbus July 19 th 2004 Joe Metzger ESnet Network Engineer
How do loss and delay occur?
Estimating Bandwidth of Mobile Users Sept 2003 Rohit Kapoor CSD, UCLA.
Tony McGregor RIPE NCC Visiting Researcher The University of Waikato DAR Active measurement in the large.
Comparison of Public End-to-End Bandwidth Estimation tools on High-Speed Links Alok Shriram, Margaret Murray, Young Hyun, Nevil Brownlee, Andre Broido,
Comparison of Public End-to-End Bandwidth Estimation tools on High- Speed Links Alok Shriram, Margaret Murray, Young Hyun, Nevil Brownlee, Andre Broido,
11 Experimental and Analytical Evaluation of Available Bandwidth Estimation Tools Cesar D. Guerrero and Miguel A. Labrador Department of Computer Science.
DoE SciDAC high-performance networking research project: INCITE INCITE.rice.edu 2004 Technical Challenges INCITE R. Baraniuk, E. Knightly, R. Nowak, R.
1 Overview of IEPM-BW - Bandwidth Testing of Bulk Data Transfer Tools Connie Logg & Les Cottrell – SLAC/Stanford University Presented at the Internet 2.
IEPM-BW: Bandwidth Change Detection and Traceroute Analysis and Visualization Connie Logg, Joint Techs Workshop February 4-9, 2006.
Scavenger performance Cern External Network Division - Caltech Datagrid WP January, 2002.
1 Measurements of Internet performance for NIIT, Pakistan Jan – Feb 2004 PingER From Les Cottrell, SLAC For presentation by Prof. Arshad Ali, NIIT.
Multiplicative Wavelet Traffic Model and pathChirp: Efficient Available Bandwidth Estimation Vinay Ribeiro.
Iperf Quick Mode Ajay Tirumala & Les Cottrell. Sep 12, 2002 Iperf Quick Mode at LBL – Les Cottrell & Ajay Tirumala Iperf QUICK Mode Problem – Current.
Bandwidth Estimation Workshop 2003 Evaluating pathrate and pathload with realistic cross-traffic Ravi Prasad Manish Jain Constantinos Dovrolis (ravi, jain,
1 SLAC IEPM PingER and BW monitoring & tools PingER Presented by Les Cottrell, SLAC At LBNL, Jan 21,
IEPM. Warren Matthews (SLAC) Presented at the ESCC Meeting Miami, FL, February 2003.
N. Hu (CMU)L. Li (Bell labs) Z. M. Mao. (U. Michigan) P. Steenkiste (CMU) J. Wang (AT&T) Infocom 2005 Presented By Mohammad Malli PhD student seminar Planete.
1 Capacity Dimensioning Based on Traffic Measurement in the Internet Kazumine Osaka University Shingo Ata (Osaka City Univ.)
PathChirp Spatio-Temporal Available Bandwidth Estimation Vinay Ribeiro Rolf Riedi, Richard Baraniuk Rice University.
A Bandwidth Estimation Method for IP Version 6 Networks Marshall Crocker Department of Electrical and Computer Engineering Mississippi State University.
1 IEPM/PingER Project Les Cottrell, SLAC DoE 2004 PI Network Research Meeting, FNAL Sep ‘04
ICFA/SCIC Aug '05 SLAC Site Report 1 SLAC Site Report Les Cottrell, Gary Buhrmaster, Richard Mount, SLAC For ICFA/SCIC meeting 8/22/05
DoE SciDAC high-performance networking research project: INCITE INCITE.rice.edu 2004 Technical Challenges INCITE R. Baraniuk, E. Knightly, R. Nowak, R.
End-to-end Bandwidth Estimation in the Wide Internet Daniele Croce PhD dissertation, April 16, 2010.
Internet Connectivity and Performance for the HEP Community. Presented at HEPNT-HEPiX, October 6, 1999 by Warren Matthews Funded by DOE/MICS Internet End-to-end.
TCP transfers over high latency/bandwidth networks & Grid DT Measurements session PFLDnet February 3- 4, 2003 CERN, Geneva, Switzerland Sylvain Ravot
Internet Measurement and Analysis Vinay Ribeiro Shriram Sarvotham Rolf Riedi Richard Baraniuk Rice University.
PathChirp Efficient Available Bandwidth Estimation Vinay Ribeiro Rice University Rolf Riedi Rich Baraniuk.
Igniting Internet Innovation
A special acknowledge goes to J.F Kurose and K.W. Ross Some of the slides used in this lecture are adapted from their original slides that accompany the.
Toward a Measurement Infrastructure. Warren Matthews (SLAC) Presented at the e2e Workshop Miami, FL, February 2003.
Bandwidth Estimation of a Network Path ET-4285 Measuring & Simulating the internet Bandwidth Estimation of a Network Path Group 4: S. Ngabonziza Rugemintwaza.
Rohit Kapoor, Ling-Jyh Chen, M. Y. Sanadidi, Mario Gerla
Connie Logg, Joint Techs Workshop February 4-9, 2006
Prepared by Les Cottrell & Hadrien Bullot, SLAC & EPFL, for the
Wide Area Networking at SLAC, Feb ‘03
ABwE: Available Bandwidth Estimator Jiri Navratil R. Les
My Experiences, results and remarks to TCP BW and CT Measurements Tools Jiří Navrátil SLAC.
Experiences in Traceroute and Available Bandwidth Change Analysis
Pong: Diagnosing Spatio-Temporal Internet Congestion Properties
Experiences in Traceroute and Available Bandwidth Change Analysis
SLAC monitoring Web Services
IEPM. Warren Matthews (SLAC)
TCP Congestion Control
Wide-Area Networking at SLAC
Correlating Internet Performance & Route Changes to Assist in Trouble-shooting from an End-user Perspective Les Cottrell, Connie Logg, Jiri Navratil SLAC.
PIPE Dreams Trouble Shooting Network Performance for Production Science Data Grids Presented by Warren Matthews at CHEP’03, San Diego March 24-28, 2003.
pathChirp Efficient Available Bandwidth Estimation
pathChirp Efficient Available Bandwidth Estimation
Presentation transcript:

What we have learned from developing and running ABwE Jiri Navratil, Les R.Cottrell (SLAC)

Why E2E tools are needed The scientific community is increasingly dependent on networking as international cooperation grows. HEP users (needs transfer huge amount of data between experimental sites as SLAC, FNAL, CERN, etc. (where data is created) and home institutes spread over the world) What ISPs (as Abilene,Esnet,Geant..) can offer to the users for getting information? (Not too much because they are only in the middle of the path and they don’t cover all parts of connections)

FZU LAN FZU LAN RAL LAN RAL LAN DL LAN DL LAN CESNET JANET IN2P3 LAN IN2P3 LAN CERN LAN CERN LAN RENATER INFN FNAL-LAN GEANT ABILENE ESNET SLAC LAN SLAC LAN MichNET NERSC- LAN CALREN Data sources Users MIB LAN MIB LAN

FZU LAN FZU LAN RAL LAN RAL LAN DL LAN DL LAN CESNET JANET IN2P3 LAN IN2P3 LAN CERN LAN CERN LAN RENATER INFN FNAL-LAN GEANT ABILENE ESNET SLAC LAN SLAC LAN MichNET NERSC- LAN CALREN Data sources Users MIB LAN MIB LAN

There must be always somebody who gives complex information to the users of the community or the users have to have a tool which give them such information How fast I can transfer 20 GB from my experimental site (SLAC,CERN) to my home institute? Can I run graphical 3D visualization program with data located 1000 miles away? How stable is line ? (Can I use it in the same conditions for 5 minutes or 2 hours or whole day ?) All such questions must be replied in few seconds doesn’t matter if for individual user or for Grid brokers Global science has no day and night. To reply this we needed the tools that could be used in continuous mode 24 hours a day 7 days a week which can non intrusively detect changes on multiple path or on demand by any user

ABwE:Basic terminology: Generally: Available bandwidth = Capacity – Load ABwE measure Td – Time dispersion P1-P2 (20x PP) We are trying to distinguish two basic states in our results: - “Dominate (free)” – when Td ~= const -“loaded” with Td = other value Td results from “Dominate” state are used to estimate DBC - Dynamic Bottleneck Capacity Td measured during the “loaded” state is used to estimate the level of XTR (cross traffic) ABw = DBC – XTR f Td

Dbc= L pp /T d domin ”Dominating state” (when sustained load or no load) u = q/(q+1) CT=u*Dbc Abw= Dbc -CT Abing: Estimation principles: Td Tp (pairs) q = Tx/Tn (Tx=Td –Tp) Tx – busy time (transmit time for cross trafic) Tn – transmit time for average packet q – relative queue increment (QDF) during decision interval Td (h-1) TnTn Tx (cross traffic) Td domin Td i = Td i+1 =.. Td i+n “Load state” (when load is changing) Td Examples Td from different paths f f Td

What is DBC DBC characterize instant high capacity bottleneck that DOMINATE on the path It covers situations when routers in the path are overloaded and sending packets back to back with its maximal rates We discovered that in most cases only one node dominates in the instant of our measurements (in our decision interval)

load Empty pipes No impact (in t1) Light source Light beam DBC No impact (in t1) ABwE: Example of narrow link in the path ABW link that has domination effect on bandwidth DBC (Pipes analogy with different diameter and aperture) Abw = DBC – XTR ABW monitor SLAC to UFL

load Empty links (pipes) No impact (in t1) strong XTraffic -> Impact (in t1) Light source Light beam DBC Example of heavy loaded link in the path (Pipes analogy with different diameter and aperture) Heavy load (strong cross traffic) appeared in the path It shows new DBC in the path because this load dominates in whole path ! Normal situation DBC~ 400 Mbits/s Available bandwidthAbilene MRTG graph ATLA to UFL Abw = DBC – XTR ABW monitor SLAC to UFL strong XTR (cross traffic)

Heavy load (xtraffic) appeared in the path (defined new DBC in the path) Normal situation ABwE / MRTG match: TCP test to UFL IPLS shows traffic Mbits/s CALREN shows sending traffic 600 Mbits/s UFL

Confront ABwE results with other tools Iperf,Pathload,Pathchirp

Probe Sender XT gen. Probe Receiver XT rec. DataTag SLAC 1 rtr-gsr-test ms ms ms 2 rtr-dmz1-ger ms ms ms 3 slac-rt4.es.net ms ms ms 4 snv-pos-slac.es.net ms ms ms 5 chicr1-oc192-snvcr1.es.net ms ms ms 6 chirt1-ge0-chicr1.es.net ms ms ms 7 chi-esnet.abilene.iu.edu ms ms ms 8 r04chi-v-187.caltech.datatag.org ms ms ms ES.net path (622 Mbits/s) Chicago, Il Menlo Park, Ca To CERN (Ch) Probing packets Injected Cross traffic Experimental path ES.net NIC-1000Mbps User traffic User traffic (background) SLAC-DataTAG-CERN test environment (4 workstations with NIC1000Mbis/s + OC-12 ES.net path) GbE 2.5 Gbits/s ES.net User traffic

Zoom Level of background traffic Injected CT (cross traffic by Iperf) Measured xt ( cross-traffic) DBC (OC-12 ) The match of the cross traffic (ABW – XT compare to injection traffic generated by Iperf) Available bandwidth Conlusion: Iperf measure own performance which can approach DBC (in best case)

What we learned from CAIDA testbed

CT1 CT3 Packet Length ~ MTU 1. Packet Pair 2. Packet Pair 25 ms Internet HOP/HOPS vers. Testbed CT2 TBed CT I-HOP TBED PP Internet cross traffic Simul. cross traffic PP Initial decision interval Decision interval (12  s for Oc12) Cross traffic sources Probes I n t e r n e t P a t h Decision interval is changing (growing) If CT < 30% abw had detection problem !.. 20 x cause a dispersionRelevant packets Not relevant packets

CT Packet Length ~ MTU 1. Packet Pair2. Packet Pair 25 ms How to improve “detection effectiveness” cause a dispersion Solution LP Solution LP – Long packets (9k) (creates micro-bottlenecks) Solution nP – n dummy Packets (mini-train) Solution nP New initial decision interval Relevant packets decision interval.. 20 x x Measurement time 0.5 s to 2.5 s Solution X

S2 (PP-Packet Pair) S10 (Mini-train with 8 dummy packets) PP versus TRAIN: ABW and DBC merge in TRAIN samples (SLAC-CALTECH path)

s2s3s4 s5 s7 s10 PP versus TRAIN: ABW and DBC merge in TRAIN samples (SLAC-CALTECH path)

Compare long term Bandwidth statistics on real paths ESNET, Abilene, Europe

SLAC - Rice.edu SLAC - Man.ac.uk SLAC - Mib.infn.it SLAC - ANL.gov IEPM-Iperf vers. ABW (24 hours match) IEPM (achievable throughput via Iperf) (red bars) IEPM (achievable throughput via Iperf) (red bars) ABW: Available bandwidth (blue lines)

Scatter plot graphs Achievable throughput via Iperf versus ABw on different paths (range 20–800 Mbits/s) (28 days history)

ABw data Iperf data 28 days bandwidth history During this time we can see several different situations caused by different routing from SLAC to CALTECH to 100 Mbits/s by error Drop to 622 Mbits/s path back to new CENIC path New CENIC path 1000 Mbits/s In all cases the match of results from Iperf and ABw is evident

What we can detect with continues bandwidth monitoring Immediate bandwidth on the path Automatic routing changes when line is broken (move to backup lines) Unexpected Network changes (Routing changes between networks, etc.) Line updates (155 -> 1Giga, etc.) Extreme heavy load

Via Abilene Original path via CALREN/CENIC (Example from SLAC – CENIC path) Problematic link discovered Bandwidth problem discovered (14:00) BW problem resolved (17:00) Routing back on standard path Results of traceroute analysis Standard routing via CALREN/CENIC Available bandwidth Send alarm ABw as Troubleshooting tool ( Discovering Routing problems and initiate alarming ) DBC User traffic

SLAC – CENIC path upgrade from 1 to 10 Gigabit (Current monitoring machines allow monitor traffic in range 1 < 1000 Mbits only) To backup Router (degrading line for while) Skip to new 10GBits/s link (our monitor is on 1GbE)

Upgrade 155Mbits/s line to 1000Mbits/s at dl.uk

via Abilenevia ESNET SLAC changed routing to CESNET

Situation when the cross-traffic extreamly grows, BW decreased SNVA-STTL (line broken) STTL-DNVR DNVR-STTL Abilene – automatic rerouting – June 11,2003 Sending traffic from south branch receiving

Transatlantic line to CERN (green=input) SLAC-ESNET (red output) Seen at Chicago Seen at SLAC Seen at CERN User traffic (bbftp to IN2p3.fr) Additional traffic Iperf Seen by ABW at CERN Fig.12 Typical SLAC traffic (long data transfer when physical experiment ends) MRTG shows only the traffic which pass to IN2p3.fr Additional trafficIperf to Chicago seen also at CERN (common path)

Interactive ( reply < 1 second) Very low impact on the network traffic (40 packets to get value for destination) Simple and robust (responder can be installed on any machine on the network) Keyword function for protecting the client- server communication Measurements in both directions Same resolution as other similar methods Abing new ABwE tool

Thank you References: