Performance Diagnostic Research at PSC

Matt Mathis, John Heffner, Ragu Reddy
5/12/05
The Wizard Gap
The Wizard Gap

The non-experts are falling behind

[Table: Year / Experts / Non-experts / Ratio — values garbled in extraction. Expert data rates grow from Mb/s to multi-Gb/s while non-experts remain near 3 Mb/s, so the ratio grows from roughly 3:1 toward 3000:1.]

Why?
TCP tuning requires expert knowledge

By design, TCP/IP hides the 'net from upper layers
– TCP/IP provides basic reliable data delivery
– The "hour glass" between applications and networks
This is a good thing, because it allows:
– Old applications to use new networks
– New applications to use old networks
– Invisible recovery from data loss, etc.
But then (nearly) all problems have the same symptom
– Less than expected performance
– The details are hidden from nearly everyone
TCP tuning is really debugging

Application problems:
– Inefficient or inappropriate application designs
Operating system or TCP problems:
– Negotiated TCP features (SACK, WSCALE, etc.)
– Failed MTU discovery
– Too-small retransmission or reassembly buffers
Network problems:
– Packet losses, congestion, etc.
– Packets arriving out of order or even duplicated
– "Scenic" IP routing or excessive round trip times
– Improper packet size limits (MTU)
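The OS/TCP class of problems is often just a matter of inspecting kernel settings. A minimal Python sketch of such a check, assuming Linux's /proc/sys layout (the 4 MB buffer threshold is only an illustrative figure, not a pathdiag rule):

```python
# Minimal sketch: check a few Linux TCP settings that commonly cause the
# OS/TCP problems listed above.  Paths are Linux-specific; the 4 MB buffer
# threshold is an illustrative assumption.
from pathlib import Path

CHECKS = {
    "net/ipv4/tcp_sack": ("SACK enabled", lambda v: int(v) == 1),
    "net/ipv4/tcp_window_scaling": ("Window scaling enabled", lambda v: int(v) == 1),
    "net/ipv4/tcp_rmem": ("Max receive buffer >= 4 MB", lambda v: int(v.split()[2]) >= 4 << 20),
    "net/ipv4/tcp_wmem": ("Max send buffer >= 4 MB", lambda v: int(v.split()[2]) >= 4 << 20),
}

def check_tcp_config(root="/proc/sys"):
    for rel, (label, ok) in CHECKS.items():
        path = Path(root) / rel
        try:
            value = path.read_text().strip()
        except OSError:
            print(f"SKIP: {label} ({path} not readable)")
            continue
        print(f"{'PASS' if ok(value) else 'WARN'}: {label} = {value}")

if __name__ == "__main__":
    check_tcp_config()
```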
TCP tuning is painful debugging

All problems reduce performance
– But the specific symptoms are hidden
Any one problem can prevent good performance
– Completely masking all other problems
Trying to fix the weakest link of an invisible chain
– General tendency is to guess and "fix" random parts
– Repairs are sometimes "random walks"
– Repair one problem at a time, at best
The Web100 project

When there is a problem, just ask TCP
– TCP has the ideal vantage point
  In between the application and the network
– TCP already "measures" key network parameters
  Round Trip Time (RTT) and available data capacity
  Can add more
– TCP can identify the bottleneck
  Why did it stop sending data?
– TCP can even adjust itself
  "Autotuning" eliminates one major class of bugs
See:
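Web100 instruments the kernel's TCP directly, but even stock Linux gives a small taste of "ask TCP" through the TCP_INFO socket option. A rough sketch (not Web100 code) that reads TCP's own RTT estimate and congestion window for a connected socket; the struct offsets follow the Linux tcp_info layout and are an assumption that may vary across kernel versions:

```python
import socket, struct

TCP_INFO = getattr(socket, "TCP_INFO", 11)   # option number 11 on Linux

def tcp_snapshot(sock):
    """Read a few of TCP's own measurements for a connected socket."""
    buf = sock.getsockopt(socket.IPPROTO_TCP, TCP_INFO, 192)
    # Assumed layout: 8 one-byte fields, then an array of 32-bit counters
    # (Linux struct tcp_info; exact offsets can differ by kernel version).
    u32 = struct.unpack_from("24I", buf, 8)
    return {
        "snd_mss_bytes": u32[2],
        "srtt_ms": u32[15] / 1000.0,   # smoothed RTT, kernel reports microseconds
        "rttvar_ms": u32[16] / 1000.0,
        "snd_cwnd_segments": u32[18],  # congestion window, in segments
    }

if __name__ == "__main__":
    # "example.org" is just a placeholder peer for the demonstration.
    with socket.create_connection(("example.org", 80)) as s:
        s.sendall(b"HEAD / HTTP/1.0\r\nHost: example.org\r\n\r\n")
        s.recv(4096)
        print(tcp_snapshot(s))
```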
Key Web100 components

Better instrumentation within TCP
– 120 internal performance monitors
– Poised to become an Internet standard "MIB"
TCP autotuning
– Selects the ideal buffer sizes for TCP
– Eliminates the need for user expertise
Basic network diagnostic tools
– Require less expertise than prior tools
  Excellent for network admins
  But still not useful for end users
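The arithmetic that autotuning automates is the bandwidth-delay product: the socket buffer must hold at least rate × RTT bytes. A one-function sketch (mine, not Web100 code):

```python
# Bandwidth-delay-product rule behind buffer autotuning: to keep a path
# full, the socket buffer must hold at least rate * RTT bytes.
def required_buffer_bytes(rate_bits_per_s, rtt_s):
    return rate_bits_per_s / 8 * rtt_s

# e.g. 1 Gb/s across a 100 ms path needs ~12.5 MB of buffer,
# far beyond a traditional 64 kB default.
print(required_buffer_bytes(1e9, 0.100))   # 12500000.0
```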
Web100 Status

Two-year no-cost extension
– Can only push standardization after most of the work
– Ongoing support of research users
Partial adoption
– Current Linux includes (most of) autotuning
  John Heffner is maintaining patches for the rest of Web100
– Microsoft
  Experimental TCP instrumentation
  Working on autotuning (to support FTTH)
– IBM "z/OS Communications Server"
  Experimental TCP instrumentation
The next step

Web100 tools still require too much expertise
– They are not really end-user tools
– Too easy to overlook problems
– Current diagnostic procedures are still cumbersome
New insight from Web100 experience
– Nearly all symptoms scale with round trip time
New NSF funding
– Network Path and Application Diagnosis (NPAD)
– 3 years; we are at the midpoint
Nearly all symptoms scale with RTT

For example
– TCP buffer space, network loss and reordering, etc.
– On a short path TCP can compensate for the flaw
Local client to server: all applications work
– Including all standard diagnostics
Remote client to server: all applications fail
– Leading to faulty implication of other components
Examples of flaws that scale

Chatty application (e.g., 50 transactions per request)
– On a 1 ms LAN, this adds 50 ms to user response time
– On a 100 ms WAN, this adds 5 s to user response time
Fixed TCP socket buffer space (e.g., 32 kBytes)
– On a 1 ms LAN, limits throughput to 200 Mb/s
– On a 100 ms WAN, limits throughput to 2 Mb/s
Packet loss (e.g., 0.1% loss at 1500 bytes)
– On a 1 ms LAN, models predict 300 Mb/s
– On a 100 ms WAN, models predict 3 Mb/s
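A small sketch (not from the original deck) that reproduces the flavor of these numbers. The loss line uses the familiar Mathis-model form rate ≈ C·MSS/(RTT·√p); the constant C is model-dependent, so the absolute values are approximate while the 1/RTT scaling is exact:

```python
import math

def chatty_overhead_s(transactions, rtt_s):
    # Each serialized transaction costs one round trip.
    return transactions * rtt_s

def buffer_limited_rate_bps(buffer_bytes, rtt_s):
    # A fixed socket buffer caps TCP at one buffer's worth per round trip.
    return buffer_bytes * 8 / rtt_s

def loss_limited_rate_bps(mss_bytes, rtt_s, loss_rate, c=1.0):
    # Mathis et al. model: rate ~= C * MSS / (RTT * sqrt(p)).
    # C is model-dependent (roughly 0.7-1.2); treat results as approximate.
    return c * mss_bytes * 8 / (rtt_s * math.sqrt(loss_rate))

for rtt in (0.001, 0.100):   # 1 ms LAN vs 100 ms WAN
    print(f"RTT {rtt*1000:5.0f} ms:",
          f"chatty app +{chatty_overhead_s(50, rtt):.2f} s,",
          f"32 kB buffer caps at {buffer_limited_rate_bps(32_000, rtt)/1e6:.3g} Mb/s,",
          f"0.1% loss caps at {loss_limited_rate_bps(1500, rtt, 0.001)/1e6:.3g} Mb/s")
```

Running it for 1 ms and 100 ms shows each limit degrading by the same factor of 100.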
Review

For nearly all network flaws
– The only symptom is reduced performance
– But this reduction is scaled by RTT
On short paths many flaws are undetectable
– False pass for even the best conventional diagnostics
– Leads to faulty inductive reasoning about flaw locations
– This is the essence of the "end-to-end" problem
– Current state-of-the-art diagnosis relies on tomography and complicated inference techniques
Our new tool: pathdiag

Specify the end-to-end application performance goal
– Round Trip Time (RTT) of the full path
– Desired application data rate
Measure the performance of a short path section
– Use Web100 to collect detailed statistics
– Loss, delay, queuing properties, etc.
Use models to extrapolate results to the full path
– Assume that the rest of the path is ideal
Pass/Fail on the basis of the extrapolated performance
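A heavily simplified reconstruction of the extrapolation step (not pathdiag source code): invert the loss/throughput model to get the end-to-end loss budget for the target rate and RTT, then compare the loss rate measured on the short section against it. The constant c = 0.7 is an assumption that happens to reproduce the loss budgets quoted in the sample reports below; the real tool's model may differ in detail.

```python
def loss_budget(target_rate_bps, end_to_end_rtt_s, mss_bytes=1448, c=0.7):
    # Largest loss rate that still permits the target rate over the full path,
    # from rate ~= c * MSS / (RTT * sqrt(p)).  c = 0.7 is an assumption.
    w = target_rate_bps * end_to_end_rtt_s / (mss_bytes * 8)  # window in packets
    return (c / w) ** 2

def extrapolate(measured_loss_rate, target_rate_bps, end_to_end_rtt_s):
    budget = loss_budget(target_rate_bps, end_to_end_rtt_s)
    verdict = "Pass" if measured_loss_rate <= budget else "Fail"
    print(f"{verdict}: measured loss {measured_loss_rate:.2e}, "
          f"budget {budget:.2e} ({1/budget:.0f} pkts between losses)")

# e.g. the 4 Mb/s target over a 200 ms path from the first sample report below,
# with one loss event every 3960 packets measured on the short section:
extrapolate(measured_loss_rate=1/3960, target_rate_bps=4e6, end_to_end_rtt_s=0.200)
```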
Deploy as a Diagnostic Server

Use pathdiag in a Diagnostic Server (DS) in the GigaPop
Specify end-to-end target performance
– From server (S) to client (C): RTT and data rate
Measure the performance from DS to C
– Use Web100 in the DS to collect detailed statistics
– Extrapolate performance assuming an ideal backbone
Pass/Fail on the basis of extrapolated performance
Example diagnostic output 1

Tester at IP address: xxx.xxx
Target at IP address: xxx.xxx
Warning: TCP connection is not using SACK
Fail: Received window scale is 0, it should be 2.
Diagnosis: TCP on the test target is not properly configured for this path.
> See TCP tuning instructions at
Pass data rate check: maximum data rate was Mb/s
Fail: loss event rate: % (3960 pkts between loss events)
Diagnosis: there is too much background (non-congested) packet loss.
The events averaged losses each, for a total loss rate of %
FYI: To get 4 Mb/s with a 1448 byte MSS on a 200 ms path the total end-to-end loss budget is % (9733 pkts between losses).
Warning: could not measure queue length due to previously reported bottlenecks
Diagnosis: there is a bottleneck in the tester itself or the test target (e.g. insufficient buffer space or too much CPU load)
> Correct previously identified TCP configuration problems
> Localize all path problems by testing progressively smaller sections of the full path.
FYI: This path may pass with a less strenuous application:
  Try rate=4 Mb/s, rtt=106 ms
Or if you can raise the MTU:
  Try rate=4 Mb/s, rtt=662 ms, mtu=9000
Some events in this run were not completely diagnosed.
Example diagnostic output 2

Tester at IP address:
Target at IP address:
FYI: TCP negotiated appropriate options: WSCALE=8, SACKok, and Timestamps
Pass data rate check: maximum data rate was Mb/s
Pass: measured loss rate % (22364 pkts between loss events)
FYI: To get 10 Mb/s with a 1448 byte MSS on a 10 ms path the total end-to-end loss budget is % (152 pkts between losses).
FYI: Measured queue size, Pkts: 33 Bytes: Drain time: ms
Passed all tests!
FYI: This path may even pass with a more strenuous application:
  Try rate=10 Mb/s, rtt=121 ms
  Try rate=94 Mb/s, rtt=12 ms
Or if you can raise the MTU:
  Try rate=10 Mb/s, rtt=753 ms, mtu=9000
  Try rate=94 Mb/s, rtt=80 ms, mtu=9000
Example diagnostic output 3

Tester at IP address:
Target at IP address:
Fail: Received window scale is 0, it should be 1.
Diagnosis: TCP on the test target is not properly configured for this path.
> See TCP tuning instructions at
Test 1a (7 seconds): Coarse Scan
Test 2a (17 seconds): Search for the knee
Test 2b (10 seconds): Duplex test
Test 3a (8 seconds): Accumulate loss statistics
Test 4a (17 seconds): Measure static queue space
The maximum data rate was Mb/s
This is below the target rate ( ).
Diagnosis: there seems to be a hard data rate limit
> Double check the path: is it via the route and equipment that you expect?
Pass: measured loss rate % (7834 pkts between loss events)
FYI: To get 10 Mb/s with a 1448 byte MSS on a 50 ms path the total end-to-end loss budget is % (3802 pkts between losses).
Key DS features

Nearly complete coverage for OS and network flaws
– Does not address flawed routing at all
– May fail to detect flaws that only affect outbound data
  Unless you have Web100 in the client or a (future) portable DS
– May fail to detect a few rare corner cases
– Eliminates all other false pass results
Tests become more sensitive on shorter paths
– Conventional diagnostics become less sensitive
– Depending on models, perhaps too sensitive
  New problem is false fail (queue space tests)
Flaws no longer completely mask other flaws
– A single test often detects several flaws
  E.g. both OS and network flaws in the same test
– They can be repaired in parallel
Key features, continued

Results are specific and less geeky
– Intended for end users
– Provides a list of action items to be corrected
  Failed tests are showstoppers for high-performance applications
– Details for escalation to network or system admins
Archived results include raw data
– Can reprocess with updated reporting software
The future

Current service is "pre-alpha"
– Please use it so we can validate the tool
  We can often tell when it got something wrong
– Please report confusing results
  So we can improve the reports
– Please get us involved if it is not helpful
  We need interesting pathologies
Will soon have another server near FRGP
– NCAR in Boulder, CO
Will someday be in a position to deploy more
– Should there be one at PSU?
What about flaws in applications?

NPAD is also thinking about applications
Using an entirely different collection of techniques
– Symptom scaling still applies
Tools to emulate ideal long paths on a LAN
– Prove or bench-test applications in the lab
  Also checks some OS and TCP features
– If it fails in the lab, it cannot work on a WAN
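One common way to build such an emulator (not named in the slides) is Linux's tc/netem queueing discipline. A sketch that inserts artificial delay on a test interface; it assumes root access, and "eth0" and the 100 ms figure are placeholders:

```python
# Sketch of long-path emulation on a LAN using Linux tc/netem.
import subprocess

def emulate_wan(dev="eth0", delay_ms=100):
    # Add (or replace) a netem qdisc that delays every outgoing packet,
    # turning a ~1 ms LAN into a ~100 ms "WAN" for bench testing.
    subprocess.run(
        ["tc", "qdisc", "replace", "dev", dev, "root",
         "netem", "delay", f"{delay_ms}ms"],
        check=True)

def restore(dev="eth0"):
    # Remove the artificial delay again.
    subprocess.run(["tc", "qdisc", "del", "dev", dev, "root"], check=True)

if __name__ == "__main__":
    emulate_wan()   # run the application under test, then call restore()
```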
For example classic ssh & scp Long known performance problems Recently diagnosed –Internal flow control for port forwarding –NOT encryption Chris Rapier developed a patch –Update flow control windows from kernel buffer size –Already running on most PSC systems See:
NPAD Goal

Build a minimal tool set that can detect "every" flaw
– Pathdiag: all flaws affecting inbound data
– Web100 in servers or portable diagnostic servers: all flaws affecting outbound data
– Application bench test: all application flaws
– Traceroute: routing flaws
We believe that this is a complete set