NET100
Development of network-aware operating systems
Tom Dunigan

UT-BATTELLE U.S. Department of Energy Oak Ridge National Laboratory

Net100 project
New DOE-funded (Office of Science) project ($1M/yr, 3 yrs)
Principal investigators
  – Wendy Huntoon and the NCAR/PSC/Web100 team (Matt Mathis)
  – Brian Tierney, LBNL
  – Tom Dunigan, ORNL
Objective: develop network-aware operating systems
  – optimize and understand end-to-end network and application performance
  – eliminate the “wizard gap”
Motivation
  – DOE has a large investment in high-speed networks (ESnet) and distributed applications
  – many network applications are not utilizing the available bandwidth

Net100 approach
Develop Network Tool Analysis Framework (NTAF)
  – collect data for network tuning
      develop/evaluate/deploy network tools (Enable, NWS, iperf, pipechar, …)
      aggregate and transform output from tools and Web100
      store/query/archive performance data
  – evaluate network applications over DOE’s ESnet (OC12, OC48, 10GigE…)
      bulk transfers over high bandwidth/delay network
      distributed applications (grid)
Investigate TCP optimizations
  – simulate/emulate/deploy
  – Linux kernel mods
Autotune network applications
  – WAD (workaround daemon)

Web100 summary
NSF funded (NCAR/PSC), web100.org
Modified Linux kernel (2.4.9)
Instrumented kernel to read/set TCP variables for a specific flow
  – readable: RTT, counts (bytes, pkts, retransmits, dups), state (SACKs, windowscale, cwnd, ssthresh) (115 variables!)
  – settable: buffer sizes
GUI to display/modify a flow’s TCP variables, real-time
API for network-aware applications
Early evaluators: ANL, SLAC, LBNL, ORNL, universities

Web100 GUI
“Creating a window into the network”

Motivation
Bulk transfers are slow
  – faster links (OC12, OC48, 10GigE), but long delay
  – classic TCP tuning problem
  – also broken TCP stacks
  – under-provisioned routers/switches
  – TCP is lossy, slow to recover
      tune it or replace it?
Compute/data grids
  – sense/probe link bandwidths/latencies
  – schedule/configure distributed application
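
The “classic TCP tuning problem” on this slide is usually stated in terms of the bandwidth-delay product: one TCP flow can keep a path full only if its window (and so its socket buffer) is at least bandwidth × round-trip time. The sketch below just does that arithmetic. The function names and the 70 ms round-trip time are illustrative assumptions, not Net100 measurements; 622.08 Mbit/s is the nominal OC-12 line rate.

```python
def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: the window needed to fill the path."""
    # bits/s * ms = millibits; divide by 8 * 1000 to get bytes
    return bandwidth_bps * rtt_ms // 8000


def window_limited_bps(window_bytes: int, rtt_ms: int) -> int:
    """Upper bound on throughput (bits/s) when at most one window is sent per RTT."""
    return window_bytes * 8 * 1000 // rtt_ms


OC12_BPS = 622_080_000   # nominal OC-12 line rate in bits/s
ASSUMED_RTT_MS = 70      # assumption: a hypothetical long-haul round-trip time

if __name__ == "__main__":
    print("window needed to fill the link:",
          bdp_bytes(OC12_BPS, ASSUMED_RTT_MS), "bytes")
    print("ceiling with a 64 KB window:",
          window_limited_bps(64 * 1024, ASSUMED_RTT_MS), "bit/s")
```

Under these assumptions a default-sized 64 KB window caps a flow at a small fraction of the link rate, which is the gap that buffer tuning is meant to close.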

TCP losses
Packet losses during startup, linear recovery (0.5 Mb/s)
[Figure: throughput plot showing instantaneous and average rates, annotated “Packet loss” and “Early packet drops”]

TCP tuning (workarounds)
Avoid losses
  – retain/probe for “optimal” buffer sizes
  – ECN-capable routers/hosts
  – reduce bursts (TCP Vegas)
Faster recovery
  – bigger MSS (jumbo frames)
  – speculative recovery (D-SACK)
  – modified congestion avoidance?
Autotune (WAD variables)
  – buffer size
  – dupthresh
  – delayed ACK, Nagle
  – AIMD
  – virtual MSS
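
The “faster recovery” and AIMD items can be illustrated with a back-of-the-envelope model. It assumes textbook congestion avoidance: on a loss the window is multiplied by a decrease factor (0.5 by default), then grows by a fixed number of segments per RTT (1 by default). The function and parameter names are hypothetical, and this is a simplified sketch rather than the WAD's actual algorithm. Reading “virtual MSS” as a larger additive increase is likewise an assumption.

```python
import math


def recovery_rtts(window_segments: float,
                  increase: float = 1.0,
                  decrease: float = 0.5) -> int:
    """RTTs needed to regrow the window to its pre-loss size after one loss.

    After the loss the window is window_segments * decrease; it then grows
    by `increase` segments per RTT (additive increase).
    """
    deficit = window_segments * (1.0 - decrease)
    return math.ceil(deficit / increase)


def recovery_seconds(window_bytes: int, mss_bytes: int, rtt_s: float,
                     increase: float = 1.0, decrease: float = 0.5) -> float:
    """Same model, expressed in seconds for a window given in bytes."""
    segments = window_bytes / mss_bytes
    return recovery_rtts(segments, increase, decrease) * rtt_s


if __name__ == "__main__":
    window = 5_443_200   # assumed window in bytes (illustrative)
    rtt = 0.070          # assumed RTT in seconds (illustrative)
    for mss in (1460, 8960):  # standard Ethernet vs. jumbo-frame payloads
        print(f"MSS {mss}: about {recovery_seconds(window, mss, rtt):.0f} s to recover")
```

In this model a bigger MSS, a larger additive increase, or a gentler multiplicative decrease each shorten the linear recovery phase, which is why those knobs appear in the list above.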

Tuning opportunities
Parallel streams (psockets)
  – how to choose number of streams, buffer sizes?
  – autotune?
Application routing daemons
  – indirect TCP
  – alternate path (Wolski, UCSB)
  – multipath (Rao, ORNL)
Other protocols (SCTP, DCP)
  – out-of-order delivery
  – rate-based
Are these fair?
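
For the parallel-streams question, one simple starting point (an assumption here, not a Net100 recommendation) is to split the path's bandwidth-delay product evenly across the streams and request that much buffer per socket. The sketch uses only the standard sockets API (SO_SNDBUF, SO_RCVBUF), not the psockets library; the helper names are hypothetical.

```python
import socket


def per_stream_buffer(bdp_bytes: int, n_streams: int) -> int:
    """Even share of the bandwidth-delay product per stream, rounded up."""
    return -(-bdp_bytes // n_streams)  # ceiling division


def make_stream_sockets(n_streams: int, buf_bytes: int) -> list:
    """Create unconnected TCP sockets with the requested buffer sizes.

    Buffers are set before connect() so the window scale negotiated at
    connection setup can reflect them. The OS may clamp or adjust the
    requested value, so read it back with getsockopt() if it matters.
    """
    socks = []
    for _ in range(n_streams):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buf_bytes)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buf_bytes)
        socks.append(s)
    return socks


if __name__ == "__main__":
    share = per_stream_buffer(5_443_200, 4)  # illustrative BDP, 4 streams
    streams = make_stream_sockets(4, share)
    for s in streams:
        print("granted send buffer:",
              s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
        s.close()
```

This does not answer how many streams to use or whether the result is fair to other traffic; it only shows where the two tunables sit in application code.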

Network Tool Analysis Framework (NTAF)
Configure and launch network tools
  – measure bandwidth/latency (iperf, pchar, pipechar)
  – collect passive data (SNMP from routers, OS counters)
  – forecast bandwidth/latency for grid resource scheduling
  – augment tools to report Web100 data
Collect and transform tool results into a common format
Save results for short-term auto-tuning and archive for later analysis
  – compare predicted to actual performance
  – measure effectiveness of tools and auto-tuning
Auto-tune network applications
  – WAD (WorkAround Daemon)
  – tunable TCP stack
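
To make “a common format” and “compare predicted to actual performance” concrete, here is a minimal sketch of what a normalized measurement record might look like. The field names, the JSON encoding, and the host names are all assumptions chosen for illustration; the slides do not specify NTAF's actual schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Measurement:
    """Hypothetical common record for one result from any tool."""
    tool: str         # e.g. "iperf", "pipechar"
    src: str          # measuring host
    dst: str          # target host
    metric: str       # e.g. "throughput", "rtt"
    value: float
    unit: str         # e.g. "Mbit/s", "ms"
    timestamp: float  # seconds since the epoch


def to_record(m: Measurement) -> str:
    """Serialize a measurement as one JSON line for storage or archiving."""
    return json.dumps(asdict(m), sort_keys=True)


def relative_error(predicted: float, actual: float) -> float:
    """Signed forecast error as a fraction of the measured value."""
    return (predicted - actual) / actual


if __name__ == "__main__":
    # Host names and values below are made up for the example.
    m = Measurement("iperf", "hostA.example.org", "hostB.example.org",
                    "throughput", 80.0, "Mbit/s", 0.0)
    print(to_record(m))
    print("forecast error:", relative_error(100.0, m.value))
```

Once every tool's output is reduced to records like this, the archive, the forecasting step, and the auto-tuner can all consume the same stream.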

Net100 progress
Provide Web100 hosts over Internet
  – ORNL, LBL, NCAR, PSC, UT, NERSC, SLAC
  – installed Enable/iperf/pipechar/netperf
  – Web100 daemon to archive link data, instrument iperf/ttcp
  – develop WAD framework
Characterize NERSC/ORNL ESnet link and Probe applications
  – latency/bandwidth and loss characterization
  – HSI/pftp/bbftp/iperf studies (gridFTP soon)
  – tune HSI transfer with Web100 (NERSC/ORNL)
  – TCP tuning studies with ns and atou
  – SCTP testbed

Net100 areas of interest
Network characterization tools
  – active probes
  – passive sensors
Auto-tuning
  – TCP optimizations
  – non-TCP protocols?