Published by Camilla Reynolds. Modified over 9 years ago.
1
Troubleshooting GridFTP flows with XSP and Periscope
Dan Gunter, presenter
Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany
2
Outline
  Motivation
  Review of perfSONAR
  PerfSONAR issues
  New components to address them
    UNIS
    Periscope
    XSP
    NLMI
  E2E example with GridFTP
  Visualizations from SC10 demo
  Questions & rotten fruit
2/1/2011. Internet2 Joint Techs 2011. Clemson, SC
3
Motivating Use-Cases
  Analyzing PBs of experimental data on an HPC cluster
  Offloading or disseminating PBs of simulation output
  Large data transfers
  source: http://xkcd.com/401/
4
PerfSONAR Overview
Infrastructure & software for network performance analysis

Abbr  Name                    Purpose
LS    Lookup service          Find sources of measurements
TopS  Topology Service        Describe network topology
MP    Measurement point       Retrieve/publish measurements
MA    Measurement archive     Store/publish measurements
TS    Transformation service  Aggregate, sample, smooth measurements

Diagram labels: User or Application; Discovery; Data
5
Motivating questions
  How can we accurately forecast application performance?
  How can we detect performance anomalies in real-time?
  How can we troubleshoot poor application performance?
    And improve it!
  ‘Shooting the gap between expectation and reality
6
PerfSONAR issues
① Data is hard to find
    Cannot simply ask “which MPs have data for path”
② Slow
    Lookups across multiple domains
    Polling for data = RTT_net + Delay_DB + Delay_WS
    XML serialization/deserialization
③ E2E analysis is difficult
    No integrated host, application monitoring
    Analysis/visualization done client-side and not exported
④ Measurement frequency is static
    Always-on and lack of aggregation encourages large intervals
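The polling-cost expression under issue ② can be read as a per-request sum that repeats for every domain queried. A minimal sketch of that arithmetic follows; the function names and the millisecond values are illustrative assumptions, not measurements from the talk.

```python
def polling_delay_ms(rtt_net_ms, delay_db_ms, delay_ws_ms):
    """Latency of one poll: network round trip + database lookup + web-service overhead."""
    return rtt_net_ms + delay_db_ms + delay_ws_ms


def total_delay_ms(num_domains, rtt_net_ms, delay_db_ms, delay_ws_ms):
    """Cost of polling one service in each domain, one after another."""
    return num_domains * polling_delay_ms(rtt_net_ms, delay_db_ms, delay_ws_ms)


if __name__ == "__main__":
    # Hypothetical values, chosen only to show how the terms add up.
    print(polling_delay_ms(80, 30, 120))   # one poll
    print(total_delay_ms(4, 80, 30, 120))  # four domains, polled sequentially
```

With these assumed inputs a single poll costs 230 ms and four sequential polls cost 920 ms, which is why per-request overhead matters once lookups span several domains.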
7
① Data is hard to find
8
Unified Network Information Service (UNIS)
  Merges the Topology Service (TopS) and Lookup Service (LS)
  Topology model
    Tree of nodes at different layers (Network/Node/Port)
    Relations between arbitrary nodes
    Node properties
  ‘GIS for networks’
    Relates MPs, MAs to topology
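One way to picture the topology model on this slide is a tree of elements that also carry free-form relations, so that measurement services can be looked up by topology element. The sketch below is an assumption-laden illustration: the class `TopoNode`, the helper `services_for`, the relation name `measured_by`, and the example URNs are all hypothetical and are not the UNIS schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class TopoNode:
    """One element of the topology tree (a network, node, or port)."""
    urn: str
    layer: str                                      # "network", "node", or "port"
    properties: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    relations: list = field(default_factory=list)   # (kind, other_id) pairs

    def add_child(self, child):
        self.children.append(child)
        return child

    def walk(self):
        """Yield this element and every descendant, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def services_for(root, urns, kind="measured_by"):
    """Return the ids of services related to any of the given topology URNs."""
    wanted = set(urns)
    found = []
    for elem in root.walk():
        if elem.urn in wanted:
            found.extend(other for k, other in elem.relations if k == kind)
    return found


if __name__ == "__main__":
    net = TopoNode("urn:ogf:network:domain=example.net", "network")
    r1 = net.add_child(TopoNode(net.urn + ":node=r1", "node", {"description": "router"}))
    p1 = r1.add_child(TopoNode(r1.urn + ":port=ge-0/0/0", "port"))
    r1.relations.append(("measured_by", "ma-b"))    # hypothetical MA
    p1.relations.append(("measured_by", "mp-a"))    # hypothetical MP
    print(services_for(net, [r1.urn, p1.urn]))
```

Given the elements along a path, a lookup of this shape answers the "which MPs have data for path" question in a single traversal.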
9
② Slow
10
Periscope: Topologically aware cache
  PerfSONAR requests have topological locality
    Pre-fetch and cache relevant perfSONAR information
    New protocols to indicate interesting sub-topologies
  Analysis functions
    domain-specific transformations, e.g. forecasting
    visualization (whee!)
  Preserve uniform perfSONAR interface
Diagram labels: User or Application; perfSONAR interface; Periscope; MP/MA; LS; ...
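The pre-fetching idea can be sketched as a cache that, on a miss, also loads data for the requested element's topological neighbors. This is a toy illustration under stated assumptions: `TopoCache`, its `fetch` callback, and the adjacency dict are hypothetical names, and Periscope's actual design and protocols are not shown in the slides.

```python
class TopoCache:
    """Cache that, on a miss, also fetches data for topological neighbors."""

    def __init__(self, fetch, neighbors):
        self._fetch = fetch            # callable: element id -> measurement data
        self._neighbors = neighbors    # dict: element id -> list of adjacent ids
        self._store = {}
        self.backend_calls = 0         # how many times the slow backend was hit

    def _load(self, elem):
        if elem not in self._store:
            self._store[elem] = self._fetch(elem)
            self.backend_calls += 1

    def get(self, elem):
        if elem not in self._store:
            self._load(elem)
            # Pre-fetch: later requests are likely to touch nearby elements.
            for near in self._neighbors.get(elem, []):
                self._load(near)
        return self._store[elem]


if __name__ == "__main__":
    links = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    cache = TopoCache(fetch=lambda e: {"elem": e, "util": 0.5}, neighbors=links)
    cache.get("a")               # miss: loads "a" and pre-fetches "b"
    cache.get("b")               # served from the cache
    print(cache.backend_calls)
```

The sketch pre-fetches synchronously to stay short; a real cache would more plausibly do this in the background and expire stale entries.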
11
Periscope data representation
  Follow PerfSONAR data model
  But use a simpler, more efficient format
  Many good options:
    JSON ✔
    BSON ✔
    Thrift
    Avro
    Protobuf
    NetLogger
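To make the "simpler format" point concrete, the sketch below serializes one measurement record both as compact JSON and as a naive XML element tree, then compares sizes. The record fields and the XML layout are invented for illustration; neither is the perfSONAR or Periscope schema, and the size comparison is not a benchmark.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical measurement record; field names are illustrative only.
record = {
    "subject": "urn:ogf:network:domain=example.net:node=r1:port=ge-0/0/0",
    "event_type": "utilization",
    "ts": 1296518400,
    "value": 0.42,
}


def to_json(rec):
    """Compact JSON encoding (no spaces after separators)."""
    return json.dumps(rec, separators=(",", ":"))


def to_xml(rec):
    """Naive XML encoding: one child element per field."""
    root = ET.Element("datum")
    for key, val in rec.items():
        ET.SubElement(root, key).text = str(val)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(len(to_json(record)), "bytes as JSON")
    print(len(to_xml(record)), "bytes as XML")
    # JSON keeps numeric types on the round trip; the XML text does not.
    print(json.loads(to_json(record)) == record)
```

Beyond size, the JSON form decodes straight back into native numbers and strings, whereas the XML form needs a schema or per-field conversion to recover types.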
12
③ E2E Analysis is Difficult
13
Missing metrics

Network layers
OSI Layer     perfSONAR metrics
Application   X
Presentation  X
Session       X
Transport     bandwidth, delay
Network       capacity, bandwidth, delay
Data link     availability, loss, errors
Physical      availability, errors

End-to-end components
E2E Component   perfSONAR metrics
Disk            X
Host / Cluster  X
Network         “yes”
14
NetLogger Machine Information (NLMI)
  Basic set of host probes, using /proc
    Host interface statistics
    TCP settings
    CPU, memory
    Disk I/O
  Export data in Periscope data model
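A host-interface probe of the kind listed above can be approximated by parsing the Linux `/proc/net/dev` counters. The sketch below is not NLMI code; `parse_net_dev` and `read_net_dev` are hypothetical helpers, and the sample text is fabricated in the standard `/proc/net/dev` layout so the parser can be exercised on any platform.

```python
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0: 5000000    4000    2    0    0     0          0         0  3000000    3500    0    0    0     0       0          0
"""


def parse_net_dev(text):
    """Parse /proc/net/dev-style text into {interface: {counter: int}}."""
    stats = {}
    for line in text.splitlines()[2:]:          # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        f = [int(x) for x in data.split()]      # 8 receive + 8 transmit counters
        stats[name.strip()] = {
            "rx_bytes": f[0], "rx_packets": f[1], "rx_errs": f[2],
            "tx_bytes": f[8], "tx_packets": f[9], "tx_errs": f[10],
        }
    return stats


def read_net_dev(path="/proc/net/dev"):
    """Read live counters on a Linux host."""
    with open(path) as fh:
        return parse_net_dev(fh.read())


if __name__ == "__main__":
    for iface, counters in parse_net_dev(SAMPLE).items():
        print(iface, counters)
```

The counters are cumulative, so a probe would sample them periodically and report differences between samples as rates.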
15
④ Measurement frequency is static
16
eXtensible Session Protocol (XSP)
  Establishment, termination, and negotiation of a session between end-user application processes
  Session = stateful layer over multiple other NE’s
  In-band or OOB signaling of control information
  Other metadata can also be forwarded
Diagram labels: A; B; C; TCP; xspd; App; Session; NE; Metadata
17
Monitoring GridFTP
  GridFTP’s XIO allows interception of I/O
  New XIO layer can talk to a local xspd
    Signaling: open/close
    Performance: aggregated read/write
  NetLogger’s nlcalipers library aggregates reads/writes into periodic summaries
Diagram labels: GridFTP server; XIO layer; XIO/XSP; Disk and Network operation; xspd; signaling; performance
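The aggregation step can be illustrated by collapsing individual read/write events into one summary per time interval and operation. This sketch does not use or imitate the nlcalipers API; `summarize`, the event tuple layout, and the summary fields are assumptions made for the example.

```python
from collections import defaultdict


def summarize(events, interval=1.0):
    """Collapse (timestamp, op, nbytes) I/O events into per-interval summaries.

    Returns a list of dicts, one per (interval start, op) pair, sorted by both.
    """
    buckets = defaultdict(lambda: {"count": 0, "bytes": 0})
    for ts, op, nbytes in events:
        start = int(ts // interval) * interval   # start of the enclosing interval
        bucket = buckets[(start, op)]
        bucket["count"] += 1
        bucket["bytes"] += nbytes
    return [
        {"start": start, "op": op, **vals}
        for (start, op), vals in sorted(buckets.items())
    ]


if __name__ == "__main__":
    # Hypothetical I/O trace: (seconds, operation, bytes)
    events = [
        (0.1, "read", 4096),
        (0.4, "read", 4096),
        (0.7, "write", 8192),
        (1.2, "read", 1024),
    ]
    for row in summarize(events):
        print(row)
```

Sending one summary per interval instead of one message per I/O call keeps the monitoring traffic small relative to the transfer being monitored.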
18
Combining XSP, Periscope, NLMI
Diagram labels: GridFTP server (XIO layer, XSP layer, NLMI host stats); xspd (signaling, XIO performance); Periscope; perfSONAR services; perfSONAR protocols; Clients
19
Visualization
20
Visualization cont.
21
Conclusions
  Periscope provides a platform for perfSONAR analysis
    Caching to reduce latency, centralized correlation
  Integration with XSP provides transparent monitoring and awareness of application state
  Still polling perfSONAR, though – Publish/Subscribe?
Guilty parties
  D. Martin Swany, Faculty, UD
  Ezra Kissel, Grad student, UD
  Ahmed El-Hassany, Grad student, UD
  Guilherme Fernandes, Grad student, UD
22
Questions
Contact: dkgunter@lbl.gov
23
Extra slides
24
UNIS example
topology
  id : esnet
  domain
    id : urn:ogf:network:domain=ps.es.net
    node
      id : urn:ogf:network:domain=ps.es.net:node=albu-cr1
      name : albu-cr1
      description : Juniper
      address
        type : hostname
        value : albu-cr1
      location
        latitude : +35.08
        longitude : -106.64
25
UNIS Example, cont.
  134.55.40.186
  albucr1-sdn-a-albusdn1.es.net
  urn:ogf:network:domain=ps.es.net:node=albu-cr1:port=ge-5/0/0
  255.255.255.252