Published by Leo Ross. Modified over 9 years ago.
1
Scalable Integrated Performance Analysis of Multi-Gigabit Networks
Ezra Kissel, U. Delaware
Ahmed El-Hassany, Guilherme Fernandes, Martin Swany, Indiana U.
Dan Gunter, Taghrid Samak, LBNL
Jen Schopf, WHOI
2
What I hope you learn
1. Why we care about bulk data transfer at multi-gigabit rates
2. Why and how detailed monitoring is helpful
3. How dynamic control of monitoring is related to Session Layer protocols
4/16/12 1
3
Bulk data transfer needs
Some domains of interest:
–Climate simulation (Earth System Grid)
–Genomics (JGI)
–High-energy physics (Large Hadron Collider)
–Astronomy (Large Synoptic Survey Telescope)
–Astrophysics (FLASH)
[Diagram labels: Huge data; Analysis sites]
4
Multi-gigabit rates
Networks connecting national labs and universities have 10Gb/s and soon 100Gb/s capability.
One PB ≈ one day at 100Gb/s
These rates are rarely achieved due to bottlenecks:
–Host: application or disks
–Campus/local networks
–Wide area networks
Hard to tell why, where, or even if there is a problem
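The "one PB in one day at 100Gb/s" figure can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 PB = 10^15 bytes, 1 Gb/s = 10^9 bits/s) and an ideal link with no protocol overhead; the function and constant names are illustrative and not from the talk.

```python
def transfer_time_seconds(size_bytes: float, rate_bits_per_s: float) -> float:
    """Ideal transfer time: link fully utilized, no protocol overhead."""
    return size_bytes * 8 / rate_bits_per_s


PB = 10**15    # assumption: decimal petabyte, in bytes
GBPS = 10**9   # assumption: decimal gigabit per second, in bits/s

for rate in (10 * GBPS, 100 * GBPS):
    hours = transfer_time_seconds(PB, rate) / 3600
    print(f"{rate // GBPS:>3} Gb/s: {hours:.1f} hours per PB")
```

Under these assumptions, one PB takes a little over 22 hours at 100Gb/s and a bit more than nine days at 10Gb/s, so any bottleneck that halves throughput costs days on large transfers.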
5
Solution
Monitor all the time
Analyze all the time, but much more when something interesting is happening
Use analysis results as feedback
6
System components
eXtensible Session Protocol (XSP)
–Associate multiple TCP connections, L2 circuits, as a "session"
–Provide channels for bi-directional metadata
NL-Calipers
–Summarize in situ timings of every read/write
BLiPP
–Host and TCP stack info. using XSP channels
perfSONAR
–Standard information formats and exchange protocols
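To illustrate what summarizing in situ timings of every read/write can look like, here is a minimal sketch that keeps running statistics without storing individual measurements. It is not the NL-Calipers API: `RunningSummary` and `timed_read` are hypothetical names, Welford's online algorithm is an assumed choice, and skipping zero-byte reads is an assumption of this sketch.

```python
import time


class RunningSummary:
    """Running count, mean, min, max, and sample variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0            # sum of squared deviations from the mean
        self.min = float("inf")
        self.max = float("-inf")

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    @property
    def variance(self):
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def timed_read(stream, nbytes, durations, sizes):
    """Read from `stream` and fold the call's duration and size into summaries."""
    start = time.perf_counter()
    data = stream.read(nbytes)
    elapsed = time.perf_counter() - start
    if data:  # assumption: zero-byte reads are not counted
        durations.add(elapsed)
        sizes.add(len(data))
    return data
```

Because only a handful of numbers are updated per call, this style of instrumentation can stay enabled for every read or write, and the summaries can be reported and reset on a fixed interval.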
7
Dynamic Session Monitoring
[Diagram labels, in extraction order: User; (1) Start xfer; (2) Open session; (3) NL-Calipers data; (4) Signal; TCP; (5) data; Network engineer: Look at the performance]
8
Bottleneck detection
Instrumentation:
–Triangles give "instantaneous" throughput
–On fixed intervals, summarize all measurements into mean, min, max, variance for both rate and #bytes
Analysis: pick lowest mean value as bottleneck, apply t-test
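The analysis step can be sketched as follows. The slide does not say which t-test is applied or at what threshold, so this example assumes Welch's unequal-variance t-test between the two slowest components and a rough critical value of 2.0; the component names and rate samples are made up for illustration, and at least two components with two or more samples each are required.

```python
import math
from statistics import mean, variance


def summarize(samples):
    """Per-interval summary: count, mean, min, max, sample variance."""
    return {"n": len(samples), "mean": mean(samples),
            "min": min(samples), "max": max(samples),
            "var": variance(samples)}


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two summaries."""
    va, vb = a["var"] / a["n"], b["var"] / b["n"]
    diff = a["mean"] - b["mean"]
    if va + vb == 0:  # identical samples on both sides: no spread to test against
        t = 0.0 if diff == 0 else math.copysign(math.inf, diff)
        return t, float(a["n"] + b["n"] - 2)
    t = diff / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (a["n"] - 1) + vb ** 2 / (b["n"] - 1))
    return t, df


def find_bottleneck(rates_by_component, t_critical=2.0):
    """Pick the component with the lowest mean rate and test it against the runner-up."""
    summaries = {name: summarize(r) for name, r in rates_by_component.items()}
    ranked = sorted(summaries, key=lambda name: summaries[name]["mean"])
    lowest, runner_up = ranked[0], ranked[1]
    t, df = welch_t(summaries[lowest], summaries[runner_up])
    return lowest, abs(t) > t_critical, t, df


# Hypothetical rate samples (MB/s) for two instrumented points in a transfer.
rates = {
    "disk_read": [410, 395, 402, 398, 405],
    "network_write": [900, 880, 910, 905, 895],
}
name, significant, t, df = find_bottleneck(rates)
print(name, significant)
```

Testing the lowest mean against the next lowest guards against declaring a bottleneck when two components are statistically indistinguishable.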
9
TCP throughput
Time series of throughput* for representative TCP experiments:
(a) 1 stream memory-to-disk with 100ms latency
(b) 1 stream memory-to-memory with no latency
(c) 1 stream disk-to-disk with no latency
(d) 4 streams memory-to-disk with 100ms latency and 1% loss added at 60 seconds
10
UDT throughput
Time series of throughput* for representative UDT experiments:
(a) 4 streams memory-to-disk with 100ms latency
(b) 4 streams memory-to-disk with 100ms latency and 1% loss added at 60 seconds
(c) 4 streams disk-to-disk with 100ms latency
(d) 4 streams memory-to-memory with 100ms latency
11
Wait, what?
12
Half as many read()s. Others return zero, not counted
Variance
Less work being done
13
Review
Why we care about bulk data transfer at multi-gigabit rates
Why and how detailed monitoring is helpful
How monitoring is related to Session Layer protocols
–and how that might integrate with a management framework
Questions?
14
Related projects
NetLogger: netlogger.lbl.gov
perfSONAR: perfsonar.org
XSP: damsl.cis.udel.edu/
GENI: geni.net
CEDPS: cedps-scidac.org
15
Topology-aware Monitoring