Published by Sheila Caldwell. Modified over 8 years ago.
1
Visual Flow Analysis: What do real-world problems look like?
Brent Draney
NERSC Center Division, LBNL
2/07/06
2
What is NERSC?
DOE scientific computer center
Supports ~2000 scientists around the world (mainly DOE and universities)
Supports most major disciplines
Combined ~20 TFLOPS and 8.8 petabytes
10 Gigabit LAN backbone and 10 Gigabit ESnet uplink
O(100) sockets account for ~95% of bytes transferred
O(5000) IP addresses in a single building but only 100 desktops
3
Network and Security Team (NAST)
Enablers and inhibitors of the network in one group
– All responsibility is here
Networking is responsible for end-to-end performance
– Wherever the customer is
– “Not our problem” is not sufficient or acceptable
4
Performance tools
Optical taps everywhere
Mobile crash cart with all types of interfaces
tcpdump, tcptrace and xplot
A lot of head scratching
Note: Analyzing a multi-gigabyte flow packet by packet is impossible!
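To put a rough number on why packet-by-packet inspection does not scale, the sketch below estimates how many full-sized data packets a flow contains. The 10 GB flow size, the MTU values, and the 40-byte header overhead (IPv4 plus TCP without options) are illustrative assumptions, not figures from the talk.

```python
import math

def packets_in_flow(flow_bytes: int, mtu: int = 1500, header_bytes: int = 40) -> int:
    """Estimate the number of TCP data packets needed to carry flow_bytes.

    header_bytes=40 assumes IPv4 (20 bytes) plus TCP (20 bytes) with no options.
    """
    payload = mtu - header_bytes  # bytes of application data per packet
    return math.ceil(flow_bytes / payload)

if __name__ == "__main__":
    ten_gb = 10 * 10**9  # hypothetical flow size
    print("1500-byte MTU:", packets_in_flow(ten_gb), "packets")
    print("9000-byte MTU:", packets_in_flow(ten_gb, mtu=9000), "packets")
```

Either way the count runs into the millions, which is the motivation for plotting the flow instead of reading the trace.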
5
Simple Example
Consistent slope
No anomalies
Protocol limited
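One common reading of "protocol limited" is that the slope of the plot (the throughput) is capped by the TCP window rather than by the link: at most one window of data can be in flight per round trip. The sketch below works through that arithmetic. The function names and the example window and RTT values are assumptions chosen for illustration.

```python
def window_limited_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on TCP throughput when one window is sent per round trip."""
    return window_bytes * 8 / rtt_seconds

def window_needed_bytes(target_bps: float, rtt_seconds: float) -> float:
    """Window (bandwidth-delay product) needed to sustain target_bps at this RTT."""
    return target_bps * rtt_seconds / 8

if __name__ == "__main__":
    rtt = 0.05  # hypothetical 50 ms round-trip time
    mbps = window_limited_throughput_bps(64 * 1024, rtt) / 1e6
    print(f"64 KiB window at 50 ms RTT: about {mbps:.1f} Mbit/s")
    need_mb = window_needed_bytes(10e9, rtt) / 1e6
    print(f"Window to fill 10 Gbit/s at 50 ms RTT: about {need_mb:.1f} MB")
```

A flow whose slope matches the first number and never varies is behaving correctly; it is simply constrained by its buffers.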
6
Simple Example Detail
Packets
ACK’ed data
Sender Advertised Window
7
Brick Wall Example
Few anomalies
Transfer hangs
8
Brick Wall Detail
One dropped packet
3 duplicate ACKs
No retransmit, ever
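The pattern on this slide (a run of three duplicate ACKs, which would normally trigger a fast retransmit) can also be searched for mechanically in a trace. The following is a minimal sketch that assumes the ACK numbers seen by the sender have already been extracted into a list; the function name and sample data are hypothetical, and it ignores the extra conditions a real TCP stack checks (no payload, unchanged window).

```python
def find_triple_dup_acks(ack_numbers):
    """Return each ACK value that is repeated three times in a row after its first arrival."""
    hits = []
    last = None
    dup_count = 0
    for ack in ack_numbers:
        if ack == last:
            dup_count += 1
            if dup_count == 3:  # third duplicate: fast retransmit would be expected
                hits.append(ack)
        else:
            last = ack
            dup_count = 0
    return hits

if __name__ == "__main__":
    # Hypothetical ACK stream: the receiver is stuck asking for byte 3000.
    acks = [1000, 2000, 3000, 3000, 3000, 3000]
    print(find_triple_dup_acks(acks))
```

If the ACK number never advances past a value reported here, the expected retransmission did not reach the receiver.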
9
Brick Wall Example: Troubleshooting and Answer
Troubleshooting
– Sender verifies that retransmits are sent
– “Non-tuned” traffic never fails
Answer
– A stateful firewall tracking TCP sequence numbers didn’t believe that the retransmits were legitimate
10
Perverse Example
Holy Mackerel!
Jumbo packets
Retransmits
11
Perverse Example
Is PMTU working? Yes
[Scratch head]
12
Perverse Example: Troubleshooting and Answer
Troubleshooting
– Review sender configuration
– PMTU installed in routing table correctly? Yes
– tcpdump on host shows 64K packets leaving a 9K interface
– “Large Send” enabled, offloading packet creation to the NIC
Answer
– NIC doesn’t have access to the routing table: route MTU not honored
– Retransmits are handled by the kernel: route MTU honored
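The mismatch described above can be illustrated with a small model: the NIC cuts a 64K buffer into packets sized for the interface MTU, while the kernel cuts retransmits to the route MTU. This is a sketch under stated assumptions, not a model of any particular driver. The 1500-byte path MTU and the 40-byte header overhead are hypothetical values; only the 64K buffer and the 9K interface come from the slide.

```python
def segment_sizes(buffer_bytes: int, mtu: int, header_bytes: int = 40) -> list:
    """Split a send buffer into on-the-wire packet sizes for the given MTU."""
    payload = mtu - header_bytes
    full, rest = divmod(buffer_bytes, payload)
    sizes = [mtu] * full  # full-sized packets
    if rest:
        sizes.append(rest + header_bytes)  # final short packet
    return sizes

def oversized(packet_sizes: list, path_mtu: int) -> list:
    """Packets too large to cross a path with the given MTU."""
    return [size for size in packet_sizes if size > path_mtu]

if __name__ == "__main__":
    buffer_bytes = 64 * 1024
    interface_mtu = 9000  # from the slide: a 9K interface
    path_mtu = 1500       # assumed route MTU learned via PMTU discovery

    nic_packets = segment_sizes(buffer_bytes, interface_mtu)   # large send on the NIC
    kernel_packets = segment_sizes(buffer_bytes, path_mtu)     # kernel retransmit path
    print("NIC packets too big for path:", len(oversized(nic_packets, path_mtu)))
    print("Kernel packets too big for path:", len(oversized(kernel_packets, path_mtu)))
```

In this model every first transmission is too large for the path and only the kernel-built retransmits fit, which is consistent with a plot full of jumbo packets followed by retransmits.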
13
Conclusions
Diverse problems have the same general feel of poor performance.
Flow visualization can isolate problems quickly.
Very large flows require visualization.
Protocol limits (host buffers, sftp …) are still a major cause but are becoming less so.
New and “creative” methods to achieve higher performance can create strangeness and are becoming more of a problem.
Seeing is believing. Pictures are convincing (to users, system admins and network admins).