1 Evaluating NGI performance
Matt Mathis
Mathis@psc.edu
2 Evaluating NGI Performance
How well is the NGI being used?
Where can we do better?
3 Outline
Why is this such a hard problem?
–Architectural reasons
–Scale
A systematic approach
4 TCP/IP Layering
The good news:
–TCP/IP hides the details of the network from users and applications
–This is largely responsible for the explosive growth of the Internet
5 TCP/IP Layering
The bad news:
–All bugs and inefficiencies are hidden from users, applications and network administrators
–The only "legal" symptoms for any problem anywhere are connection failures or less-than-expected performance
6 Six performance problems
IP path:
–Packet routing, round-trip time
–Packet reordering
–Packet losses, congestion, lame HW
Host or end-system:
–MSS negotiation, MTU discovery
–TCP sender or receiver buffer space
–Inefficient applications
7 Layering obscures problems
Consider: trying to fix the weakest link of an invisible chain
Typical users, system and network administrators routinely fail to "tune" their own systems
In the future, WEB100 will help…
8 NGI Measurement Challenges
The NGI is so large and complex that you cannot observe all of it directly.
We want to assess both network and end-system problems
–The problems mask each other
–The users & admins can't even diagnose their own problems
9 The Strategy
Decouple paths from end-systems
–Test some paths using well-understood end-systems
–Collect packet traces and algorithmically characterize performance problems
10 Performance is minimum of:
–TCP bulk transport (path limitation)
–Sender or receiver TCP buffer space
–Application, CPU or other I/O limit
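The path-limited term here is presumably the macroscopic TCP model of Mathis, Semke, Mahdavi and Ott (1997); a sketch of the combined limit, where W_buf (the smaller of the two socket buffers) and C ≈ √(3/2) are my notation rather than the original slide's:

\[
\text{Rate} \;\le\; \min\!\left(\frac{MSS}{RTT}\cdot\frac{C}{\sqrt{p}},\;\; \frac{W_{buf}}{RTT},\;\; \text{application rate}\right), \qquad C \approx \sqrt{3/2} \approx 1.22
\]

Here p is the packet loss probability, so the first term says that sustained throughput falls off as the square root of the loss rate.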
11 Packet trace instrumentation
Independent measures of model:
–Data rate, MSS, RTT and p
–Measure independent distributions for each
Detect end-system limitations
–Whenever the model does not fit (a sketch of such a fit test follows)
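A minimal sketch of the fit test, assuming the model above; the function names and the 50% tolerance are illustrative, not the actual trace-analysis algorithm:

    import math

    # Model constant for periodic loss, from the 1997 macroscopic-model
    # paper; an assumption here, since the slides do not show a value.
    C = math.sqrt(3.0 / 2.0)  # ~1.22

    def model_rate_bps(mss_bytes, rtt_s, p):
        """Path-limited rate predicted by the macroscopic TCP model, in bits/s."""
        return (mss_bytes * 8 / rtt_s) * (C / math.sqrt(p))

    def classify_flow(observed_bps, mss_bytes, rtt_s, p, tolerance=0.5):
        """Crude fit test: a flow running well below the model's prediction
        is presumed end-system limited (buffers, application, CPU, ...)."""
        if observed_bps >= tolerance * model_rate_bps(mss_bytes, rtt_s, p):
            return "path limited"      # losses are enough to explain the rate
        return "end-system limited"    # model does not fit; look at the host

    # Example: 1460-byte MSS, 70 ms RTT, one loss per 100,000 packets.
    # The model allows ~64 Mbit/s, so a 20 Mbit/s flow is flagged.
    print(classify_flow(20e6, 1460, 0.070, 1e-5))  # -> end-system limited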
12 The Experiments
Actively test a (small) collection of paths with carefully tuned systems
Passively trace and diagnose all traffic at a small number of points to observe large collections of paths and end systems
[Wanted] Passively observe flow statistics for many NGI paths to take a complete census of all end systems capable of high data rates
13 Active Path Testing
Use uniform test systems
–Mostly Hans Werner Braun's AMP systems
–Well-tuned systems and application
–Known TCP properties
Star topology from PSC for initial tests
–Evolve to multi-star and sparse mesh
Use passive instrumentation
14 Typical (Active) Data
83 paths measured
For the moment assume:
–All host problems have been eliminated
–All bottlenecks are due to the path
Use traces to measure path properties
–Rate, MSS, and RTT
–Estimate window sizes and loss interval
Sample has target selection bias
15 Data Rate [figure]
16 Data Rate Observations
Only one path performed well
–(74 Mbit/s)
About 15% of the paths beat 100 MB/30 s
–(27 Mbit/s)
About half of the paths were below old Ethernet rates
–(10 Mbit/s)
17 Round Trip Times [figure]
18 RTT Observations
About 25% of the RTTs are too high (PSC to San Diego is ~70 ms)
–Many reflect routing problems
–At least a few are queuing (traffic) related
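As a rough sanity check on that ~70 ms figure (the distance is my assumption, not from the slide): Pittsburgh to San Diego is about 3,500 km great-circle, and light in fiber travels at roughly 2×10^8 m/s, so the physical floor is

\[
RTT_{\min} \approx \frac{2 \times 3.5\times10^{6}\,\text{m}}{2\times10^{8}\,\text{m/s}} = 35\,\text{ms}
\]

Real fiber routes are longer than great-circle, so ~70 ms is plausible for a well-routed path, and RTTs far above it point at routing or queuing problems.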
19 Loss Interval (1/p) [figure]
20 Loss Interval Observations
Only a few paths do very well
–Some low-loss paths have high delay
Only paths with fewer than 10 losses per million packets are OK
Finding packet losses at this level can be difficult
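A worked example under assumed parameters (the 1460-byte MSS and ~70 ms RTT used elsewhere in this deck, and the model above with C ≈ 1.22): at 10 losses per million packets (p = 10^-5),

\[
\frac{1460 \times 8\,\text{bits}}{0.07\,\text{s}} \cdot \frac{1.22}{\sqrt{10^{-5}}} \;\approx\; 0.17\,\text{Mbit/s} \times 386 \;\approx\; 64\,\text{Mbit/s}
\]

so even a handful of losses per million packets caps a cross-country path well below 100 Mbit/s.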
21 Passive trace diagnosis
Trace Analysis and Automatic Diagnosis (TAAD)
Passively observe user traffic to measure the network
These are very early results
22 Example Passive Data
Traffic is through the Pittsburgh GigaPoP
Collected with MCI/NLANR/CAIDA OC3mon and CoralReef software
This data set is mostly commodity traffic
Future data sets will be self-weighted NGI samples
23 Observed and Predicted Window
Window can be observed by looking at TCP retransmissions
Window can be predicted from the observed interval between losses
If they agree, the flow is path limited
–The bulk performance model fits the data
If they don't, the flow is end-system limited
–Observed window is probably due to buffer limits, but may be due to other bottlenecks (a sketch of the comparison follows)
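A minimal sketch of this comparison, assuming the congestion-avoidance sawtooth behind the model; the thresholds and the rate-based window proxy are my illustrative choices, not the deck's actual procedure:

    import math

    def predicted_window_pkts(loss_interval_pkts):
        """Window implied by the congestion-avoidance sawtooth: one loss per
        cycle, each cycle delivering ~ (3/8) * W^2 packets, so W = sqrt(8L/3)."""
        return math.sqrt(8.0 * loss_interval_pkts / 3.0)

    def observed_window_pkts(rate_bps, mss_bytes, rtt_s):
        """Average window in packets, rate * RTT / MSS -- a proxy for the
        flight size a trace analyzer would read off at each retransmission."""
        return rate_bps * rtt_s / (mss_bytes * 8)

    def diagnose(rate_bps, mss_bytes, rtt_s, loss_interval_pkts, slack=2.0):
        w_obs = observed_window_pkts(rate_bps, mss_bytes, rtt_s)
        w_pred = predicted_window_pkts(loss_interval_pkts)
        # If the observed window sits well below what the loss interval
        # would allow, something other than the path is the bottleneck.
        return "path limited" if slack * w_obs >= w_pred else "end-system limited"

    # Example: 1 Mbit/s flow, 1460-byte MSS, 70 ms RTT, one loss per
    # 50,000 packets: ~6 packets observed vs ~365 allowed by the losses.
    print(diagnose(1e6, 1460, 0.070, 50000))  # -> end-system limited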
25 Window Sizes [figure]
26 Window Sizes [figure]
27 Observations
60% of the commodity flows are path limited, with window sizes smaller than 5 kBytes
Huge discontinuity at 8 kBytes reflects common default buffer limits
About 15% of the flows are affected by this limit
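For scale, assuming the ~70 ms cross-country RTT used earlier (my assumption; the slide does not state one), an 8 kByte window caps throughput at

\[
\text{Rate}_{\max} = \frac{W_{buf}}{RTT} = \frac{8192 \times 8\,\text{bits}}{0.07\,\text{s}} \approx 0.94\,\text{Mbit/s}
\]

regardless of how clean the path is.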
28 Need NGI host census
Identify populations of end systems which have reached significant performance plateaus
–i.e., have solved "all" performance problems
Confirm other distributions
Best collected within the network itself
29 Conclusion
TCP/IP layering confounds diagnosis
–Especially with multiple problems
Many pervasive network and host problems
–Multiple problems seem to be the norm
Better diagnosis requires better visibility
–Ergo WEB100