1 IEPM-BW
Warren Matthews (SLAC). Presented at the UCL Monitoring Infrastructure Workshop, London, May 15-16, 2003.
2 Overview / Goals
– IEPM-BW monitoring and results
– Other measurements
– Publishing
– Troubleshooting
– Tools
– Further work
3 IEPM-BW
– SLAC package for monitoring and analysis
– Currently 10 monitoring sites: SLAC, FNAL, GATech (SOX), INFN (Milan), NIKHEF, APAN (Japan), Manchester, UMich, UCL
– Internet targets
4 [Network map: IEPM-BW monitoring sites and targets across ESnet, Abilene, CalREN, JAnet, GEANT, CESnet and APAN, including SLAC, FNAL, ANL, BNL, CERN, IN2P3, NIKHEF, RAL, UCL, UManc, DL, NNW, CALTECH, SDSC, NCSA, NERSC, LANL, JLAB, ORNL, Rice, UTDallas, UMich, UFL, TRIUMF, KEK, RIKEN, INFN-Roma and INFN-Milan; EDG and PPDG/GriPhyN monitoring sites marked.]
5 Measurement Engine
– Ping, Traceroute
– Iperf, Bbftp, Bbcp (mem and disk)
– ABwE
– GridFTP, UDPmon
– Web100
– Passive (Netflow)
7 Other Projects (U.S.)
– PingER (SLAC, FNAL)
– eJDS (SLAC, ICTP)
– AMP (NLANR)
– NIMI (ICIR, PSC)
– MAGGIE (ICIR, PSC, SLAC, LBL, ANL)
– NASA, SCNM (LBL)
– Surveyor (Internet2)
– E2E PI and PIPES (Internet2)
Also, SLAC has a RIPE-TT box.
8 Publishing
– Web Service: SOAP::Lite Perl module, Python, Java
– NMWG
– OGSA
9 Publishing
– NMWG Properties document
– Path.delay.roundtrip (demo)
– Hop.bandwidth.capacity (tracespeed)
– Guthrie (demo)
– Almost 1000 nodes in database (PingER, Networks, Arena)
10 Advisor
Screenshot taken from the talk by Jim Ferguson at the E2E workshop, Miami, Feb 2003.
11 MonALISA
– Front-end visualization
– Vital component for development of the LHC Computing Model
– JINI/Java and WSDL/SOAP
– demo
12 Troubleshooting
– RIPE-TT Testbox Alarm
– AMP Automatic Event Detection
– Our approach: diurnal changes
13 Diurnal Changes (1/4)
– Either performance varies during the day, or it doesn't
– No variation is the special case of variation = 0
14 Diurnal Changes (2/4)
– Either performance (within the bin) is variable, or it isn't
– No variation is the special case of variation = 0
15 Diurnal Changes (3/4)
– Parameterize performance in terms of the hour and the variability within that hourly bin
– Measurements can then be classified by how much they differ from the historical value
– Recent problems are flagged by their difference from the historical value
– Compare to the measurement in the previous bin to reduce false positives
16 Diurnal Changes (4/4)
– Calculate the median and standard deviation of the last five measurements in the bin, e.g. Monday 7pm-8pm
– "Concerned" if the latest measurement is more than 1 s.d. from the median
– "Alarmed" if the latest measurement is more than 2 s.d. from the median
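A minimal sketch of this classification, assuming the bin history holds the last five measurements for the same hour-of-week bin (function and variable names are illustrative, not taken from the IEPM-BW code):

```python
from statistics import median, stdev

def classify(bin_history, latest):
    """Classify the latest measurement against the last five in the
    same bin (e.g. Monday 7pm-8pm): more than 2 s.d. from the median
    raises an Alarm, more than 1 s.d. a Concern, otherwise the
    measurement is within boundaries."""
    m = median(bin_history)
    sd = stdev(bin_history)
    deviation = abs(latest - m)
    if deviation > 2 * sd:
        return "Alarm"
    if deviation > sd:
        return "Concern"
    return "Within boundaries"
```

For example, with a bin history of 88-92 Mbit/s (s.d. roughly 1.6), a reading of 85 lies more than two standard deviations from the median and would be flagged as an Alarm.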
17 Trouble Detection
$ tail maggie.log
04/28/ :58:47 (1:14) gnt Alarm (AThresh=38.33)
04/28/ :25:45 (1:16) gnt Concern (CThresh=87.08)
04/28/ :55:21 (1:17) gnt Within boundaries
Columns: date and time, bin, node, throughput (iperf), status.
– Only write to the log if an alarm is triggered
– Keep writing to the log until the alarm is cleared
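The write-only-while-alarmed policy can be sketched as a small filter (a hypothetical helper, not the actual maggie.log writer); note that the clearing "Within boundaries" line is itself logged, as in the excerpt above:

```python
def log_lines(records):
    """Filter a stream of (timestamp, status) measurements down to the
    lines that reach the log: start writing when a measurement first
    leaves 'Within boundaries', keep writing while the condition
    persists, and write one final line when it clears."""
    out, active = [], False
    for stamp, status in records:
        triggered = status != "Within boundaries"
        if triggered or active:        # in trouble, or just cleared
            out.append(f"{stamp} {status}")
        active = triggered
    return out
```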
18 Trouble Status
– Tempted to make a color-coded web page, but all the hard work is still left to do
– Use knowledge of the topology to see common points of failure
– A production table would be >> 36x700, so instead figure out where to flag
19 Net Rat
– Alarm system: multiple tools, multiple measurement points
– Cross-reference: trigger further measurements; a starting point for human intervention; informant database (hop.performance)
– No measurement is 'authoritative'; cannot even believe a single measurement
20 Limitations
– Could be over an hour before an alarm is generated
– More frequent measurements impact the network, and measurements overlap
– Low-impact tools allow finer-grained measurement
21 Where next?
– GLUE, OGSA, CIM
– Work with other projects
– Publishing and troubleshooting
– Discovery
– Security
22 Toward a Monitoring Infrastructure
– Certainly the need: DOE Science Community; Japanese Earth Simulator; Grid; Troubleshooting / E2E PI
– Many of the ingredients: many monitoring projects; PIPES; MAGGIE
23 Summary
“It is widely believed that a ubiquitous monitoring infrastructure is required.”
24 Links
This talk, IEPM-BW, PingER, ABwE, AMP, NIMI, MAGGIE, RIPE-TT, Surveyor, E2E PI, SLAC Web Services, GGF NMWG, Arena, MonALISA, Advisor, Troubleshooting
25 Credits
Les Cottrell; Connie Logg, Jerrod Williams; Jiri Navratil; Fabrizio Coccetti; Brian Tierney; Frank Nagy, Maxim Grigoriev; Eric Boyd, Jeff Boote; Vern Paxson, Andy Adams; Iosif Legrand; Jim Ferguson, Steve Englehart; local admins and other volunteers; DoE/MICS
26 Demos
This is the output from the "Publishing" demo on slide 9.
$ more soap_client.pl
#!/usr/local/bin/perl
use SOAP::Lite;
print SOAP::Lite
  -> service('…')   # WSDL URL elided in the original
  -> hopBandwidthCapacity("brdr.slac.stanford.edu:i2-gateway.stanford.edu");
$ ./soap_client.pl
1000Mb
27 Demos
This is the output from the "tracespeed" demo on slide 9.
$ ./tracespeed thunderbird.internet2.edu
0 doris 10Mb
1 core ( ) 1000Mb
2 brdr ( ) 1000Mb
3 i2-gateway.stanford.edu ( ) No Data.
4 stan.pos.calren2.net ( ) No Data.
5 sunv--stan.pos.calren2.net ( ) No Data.
6 abilene--qsv.pos.calren2.net ( ) No Data.
7 kscyng-snvang.abilene.ucaid.edu ( ) No Data.
8 iplsng-kscyng.abilene.ucaid.edu ( ) No Data.
9 so-0-2-0x1.aa1.mich.net ( ) No Data.
10 so-0-0-0x0.ucaid2.mich.net ( ) No Data.
11 thunderbird.internet2.edu ( ) No Data.
28 Aside: NetRat (1/5)
– If the last measurement was within 1 s.d., mark each hop as good (hop.performance = good)
– If the last measurement was a "Concern", mark each hop as acceptable
– If the last measurement was an "Alarm", mark each hop as poor
29 Aside: NetRat (2/5)
– A measurement generates an alarm
– Set each hop.performance = poor
30 Aside: NetRat (3/5)
– Other measurements from the same site do not generate alarms
– Set each hop.performance = good
– Immediately rules out a problem in the local LAN or host machine
31 Aside: NetRat (4/5)
– A different site monitors the same target and no alarm is generated
– Set each hop.performance = good
– Pinpoints a possible problem in the intermediate network
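A hedged sketch of this cross-referencing, reduced to the alarm/no-alarm case: hops traversed by any non-alarming measurement are marked good, and whatever remains poor on the alarming path is the suspect region (the data layout and names here are assumptions for illustration, not the NetRat informant-database schema):

```python
def suspect_hops(measurements):
    """measurements: (path, alarmed) pairs, where path is the list of
    hops a measurement traversed.  Hops on an alarming path start as
    poor; any hop also crossed by a clean measurement (same site to a
    different target, or a different site to the same target) is
    overridden to good.  The hops left over localize the problem."""
    poor, good = set(), set()
    for path, alarmed in measurements:
        (poor if alarmed else good).update(path)
    return sorted(poor - good)
```

With an alarming path local-lan → border → mid → target, a clean measurement from the same site clearing local-lan and border, and a clean measurement from another site clearing target, only mid remains as the suspect.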