Published by Kerry Norman. Modified over 9 years ago.
1
2006 © SWITCH
End-to-end Performance over Research Networks
Simon Leinen, SWITCH
Wizard gap, PERT, performance monitoring, Premium IP
2
End-to-end Performance Issues
Performance seen by end users hasn't followed backbone upgrades
“Wizard gap” (ordinary users vs. land speed record heroes)
Issues solving multi-domain performance problems
Issues solving multi-layer performance problems
Lack of performance-oriented network monitoring
-> The “ends” must be included in network performance work!
  endpoints, i.e. hosts, operating systems, applications (users even)
  campus networks and their administrators
3
Various efforts to improve e2e performance
Internet2 “e2epi” (end-to-end performance initiative)
– Performance workshops
– Web100 kernel instrumentation and other TCP enhancements for Linux
  enable end-user tools such as NDT (e.g. ndt.switch.ch)
  auto-tuning for TCP buffers
  experimental TCP variants (Vegas, Westwood, HS-TCP, BIC, S-TCP, H-TCP...)
GN2
– PERT (Performance Enhancement and Response Team)
  “like a CERT but for performance”
  chartered to “own” performance issues (no fingerpointing)
  collect knowledge, produce documentation (to make itself obsolete)
– Premium IP and other backbone-specific enhancements
4
Bandwidth is not everything
Most transfers over the Internet (including the GTREN) are limited by RTT
– TCP window-size limitations for “LFNs” (Long Fat Networks)
– short flows
– delay-sensitive applications (conversational A/V, RPC, games...)
-> what works well in the LAN won't always do so over the WAN
– help users tune TCP (Web100/NDT very useful here)
– provide assistance with application design and engineering
  alternatives to TCP etc.
RTT is harder to improve than bandwidth
– speed-of-light issue (btw. router hop-count quickly becoming irrelevant)
– some inter-continental connections more useful than others
  e.g. TEIN link through Siberia reduces EU-China RTT by half
Other important performance indicators: availability, predictability...
-> using capacity as prime “connectivity” metric no longer justified.
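The window-size point on this slide can be made concrete with a little arithmetic. The following is a minimal sketch, not something from the talk: a TCP sender can have at most one window of data in flight per round trip, so throughput is bounded by window / RTT whatever the link capacity. The function names, the 64 KiB window, the 1 Gb/s link and the RTT values are all illustrative assumptions.

```python
"""Sketch: why RTT, not link capacity, often bounds a single TCP flow."""


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bps * rtt_s / 8


def window_limited_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    window = 64 * 1024  # hypothetical untuned 64 KiB window
    link_bps = 1e9      # hypothetical 1 Gb/s path
    for rtt_ms in (1, 20, 150, 300):  # illustrative RTTs, not measurements
        rtt_s = rtt_ms / 1000
        # The flow can go no faster than the link, nor than window / RTT.
        ceiling = min(link_bps, window_limited_bps(window, rtt_s))
        needed_kib = bdp_bytes(link_bps, rtt_s) / 1024
        print(f"RTT {rtt_ms:>3} ms: at most {ceiling / 1e6:7.1f} Mb/s; "
              f"window needed to fill the link: {needed_kib:8.0f} KiB")
```

Under these assumptions the same host and the same link give very different ceilings in the LAN and over an intercontinental path, which is the case the slide makes for buffer tuning and for treating RTT as a first-class metric.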
5
Example from right here (how NOT to do it)

My traceroute [v0.71]
agathe (0.0.0.0)                                  Wed May 24 10:24:32 2006
Keys: Help  Display mode  Restart statistics  Order of fields  quit
                                  Packets              Pings
    Host                       Loss%  Snt   Last    Avg   Best   Wrst  StDev
 1. 10.129.21.252               0.0%  377    5.1    8.3    2.5  181.5   15.5
 2. 10.64.1.8                   1.3%  377  531.6  507.7  125.1  992.6  152.5
 3. 172.28.95.109               2.1%  377  544.3  506.3   98.1  1003.  157.6
 4. 172.28.74.22                1.6%  377  499.9  509.9  123.5  1204.  162.7
 5. 172.28.76.19                1.6%  377  479.8  512.4  117.8  1155.  160.2
 6. 172.28.76.33                2.7%  377  475.0  513.0  110.3  1134.  159.7
 7. 172.28.75.17                2.7%  377  421.9  515.9  135.5  1102.  158.2
 8. 172.28.87.4                 2.9%  376  424.8  517.4  119.1  1067.  154.8
 9. 172.28.218.241              2.1%  376  583.6  522.1  113.3  1096.  159.4
10. 193.158.5.13                2.9%  376  536.9  513.6  107.3  919.3  156.1
11. zrh-e4.ZRH.CH.net.DTAG.DE   3.7%  376  556.2  526.1  106.6  1027.  154.3
12. swiix1-g2-1.switch.ch       2.9%  376  511.2  534.6  120.0  1087.  158.8
13. 130.59.36.249               2.9%  376  533.0  529.7  139.7  1053.  152.1
14. swiCS3-10GE-1-1.switch.ch   2.7%  376  527.4  525.6  111.8  1052.  148.1
15. swiNM1-G1-0-25.switch.ch    1.6%  376  529.3  528.9  125.7  1090.  150.4
16. swiLM1-V610.switch.ch       2.4%  375  510.2  526.9  136.2  1037.  153.8
17. diotima.switch.ch           1.9%  375  575.2  526.9  149.9  959.0  152.4
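One way to read a report like the one above is to look for the hop where the average RTT rises most over the previous hop. The sketch below does that with the hop numbers, hosts and "Avg" values copied from the mtr output; the function name and the approach are illustrative assumptions, not a tool used in the talk. Per-hop figures from mtr reflect how each router answers probes, so a single high hop is weak evidence; an increase that persists on every later hop, as here, is a better hint that the delay is added on the path.

```python
# (hop, host, average RTT in ms), copied from the mtr report above
HOPS = [
    (1, "10.129.21.252", 8.3),
    (2, "10.64.1.8", 507.7),
    (3, "172.28.95.109", 506.3),
    (4, "172.28.74.22", 509.9),
    (5, "172.28.76.19", 512.4),
    (6, "172.28.76.33", 513.0),
    (7, "172.28.75.17", 515.9),
    (8, "172.28.87.4", 517.4),
    (9, "172.28.218.241", 522.1),
    (10, "193.158.5.13", 513.6),
    (11, "zrh-e4.ZRH.CH.net.DTAG.DE", 526.1),
    (12, "swiix1-g2-1.switch.ch", 534.6),
    (13, "130.59.36.249", 529.7),
    (14, "swiCS3-10GE-1-1.switch.ch", 525.6),
    (15, "swiNM1-G1-0-25.switch.ch", 528.9),
    (16, "swiLM1-V610.switch.ch", 526.9),
    (17, "diotima.switch.ch", 526.9),
]


def largest_rtt_jump(hops):
    """Return (hop, host, increase_ms) for the hop whose average RTT rises
    most over the previous hop, or None if there are fewer than two hops."""
    best = None
    for (_, _, prev_avg), (hop, host, avg) in zip(hops, hops[1:]):
        jump = avg - prev_avg
        if best is None or jump > best[2]:
            best = (hop, host, jump)
    return best


if __name__ == "__main__":
    hop, host, jump = largest_rtt_jump(HOPS)
    print(f"largest increase in average RTT: +{jump:.1f} ms at hop {hop} ({host})")
```

On this data the increase sits between the first and second hop, close to the measuring host, and every later hop inherits it.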
7
GN2 PERT
Part of SA3 (Service Activity – End-to-end Quality of Service)
– also called PACE - “Performance and Allocated Capacity for End-Users”
PERT Case Managers mostly from several NRENs
– duty CMs, rotating weekly (with videoconference briefings)
– dedicated CMs for some cases
– reachable through PTS (PERT Ticket System) or pert-report@geant2.net
Subject Matter Experts (SMEs) participation
– issues of “recruiting” and involvement (on demand vs. interest-based)
PERT Knowledge Base (KB)
– currently Wiki-based - http://kb.pert.switch.ch/
– “Performance Guides” published as deliverables
8
GN2 PERT Ticket System (PTS)
9
PERT Knowledge Base (KB)
10
GN2 PERT Cases (closed)
DEISA TCP Throughput Reduction
– solved – due to GEANT packet reordering with heavy cross-traffic
  will partly go away with GEANT2 (some of the routers are upgraded)
DEISA-Teragrid Performance (TCP throughput)
– closed, but not solved in time (the demo was over first)
DEISA TCP Throughput issues with some sites
– found RTT dependency, GEANT->GEANT2 changes explain variations
Loss of large packets on one of the e-VLBI (-> JIVE) paths
– resolved by configuration
11
GN2 PERT cases (ongoing)
ITER VPN
– information-gathering phase
– VPN makes traditional diagnostics hard
e-VLBI
– ongoing investigation
– infrequent tests and network changes over time
EU->US routing through Japan
– ongoing, but maybe not really a case for PERT?
  or, should we have all (GTREN) BGP geeks participate as SMEs?
12
GN2 PERT Experience
Weaknesses
– Few, and often difficult (but interesting!) cases
  Mostly large groups: DEISA, e-VLBI (JIVE), DESY/FNAL, ITER...
  Trying to open up to larger customer base
– It's hard to close cases!
  lack of clear success indicators
– Friction can be further reduced
  weekly Case Manager handover, PTS, SME involvements
Strengths
– Brings users (researchers) closer to NOCs
– Mutual learning experience
  Bodes well for PERT Knowledge Base
  Provides vital input on measurement infrastructure requirements
– Inspires PERT activities in NRENs
13
SWITCH PERT Example: Opera oberta
Opera oberta
– high-quality multicast transmissions of opera from Barcelona and Madrid
– mostly Spanish participants, but a few in FR, MX, and now CH
– currently 9 Mb/s DVB+D5.1, experimenting with HDTV (~15 Mb/s)
Customer (EPFL) contacted us
– early tests were unsatisfactory (due to problems at source, it turns out)
– set up NOC support (awareness, test participation, monitoring)
– one transmission still failed (due to misconfigured SWITCH router)
– fixed problem, improved NOC support (out-of-hours service)
– next transmission (last night) a success – it had to be...
-> include aspects of availability and support in “performance” notion
14
Conclusions
Significant potential for service improvements on current infrastructure
– end-host tuning, delay-robust protocols, better NOC cooperation
PERT concept really helps
– improves customers' “reach” into backbones
– “user interface” can still be improved
Leverage new developments in the future
– backbone measurement instrumentation, e.g. GN2 JRA1 PerfSONAR
– Premium IP and other “on-demand” services
Long-term benefits
– smart users + dumb networks -> unexpected performance and innovation
The end-to-end principles are honoured!