NET100 Development of network-aware operating systems Tom Dunigan thd@ornl.gov

UT-BATTELLE U.S. Department of Energy Oak Ridge National Laboratory
Net100 project
New DOE-funded (Office of Science) project ($1M/yr, 3 yrs)
Principal investigators
  – Wendy Huntoon and the NCAR/PSC/Web100 team (Matt Mathis)
  – Brian Tierney, LBNL
  – Tom Dunigan, ORNL
Objective: develop network aware operating systems
  – optimize and understand end-to-end network and application performance
  – eliminate the “wizard gap”
Motivation
  – DOE has a large investment in high speed networks (ESnet) and distributed applications

Web100 summary
NSF funded (NCAR/PSC), web100.org
Modified Linux kernel (2.4.9)
Instrumented kernel to read/set TCP variables for a specific flow
  – settable: buffer sizes
  – readable: RTT, counts (bytes, pkts, retransmits, dups), state (SACKs, windowscale, cwnd, ssthresh)
GUI to display/modify a flow’s TCP variables, real-time API for network-aware applications
Early evaluators: ANL, SLAC, LBNL, ORNL, universities
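
The one settable variable listed above, buffer size, can also be requested per socket through the standard sockets API, without Web100. The sketch below does not use the Web100 API (which is not shown in these slides); the function names `open_tuned_socket` and `effective_buffers` are hypothetical helpers chosen for illustration.

```python
import socket

def open_tuned_socket(sndbuf_bytes, rcvbuf_bytes):
    """Create a TCP socket and request the given send/receive buffer sizes.

    Buffers are set before connect() so that the window scale option
    negotiated at connection setup can reflect the larger receive buffer.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    return sock

def effective_buffers(sock):
    """Return the (send, receive) buffer sizes the kernel actually granted.

    The kernel may adjust the request: Linux, for example, doubles the value
    for bookkeeping overhead and caps it at a system-wide maximum.
    """
    return (
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
    )

if __name__ == "__main__":
    s = open_tuned_socket(4 * 1024 * 1024, 4 * 1024 * 1024)
    print("granted send/receive buffers:", effective_buffers(s))
    s.close()
```

Reading the values back matters because a request larger than the system limit is silently reduced, which is one way an application ends up with a smaller window than it asked for.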

Web100 GUI
“Creating a window into the network”

Net100 approach
Deploy/enhance Web100 into DOE network applications
  – collect performance statistics to understand/tune networks and applications
      Passive (web100, snmp, …)
      Active (pipechar, NWS, ping, iperf, …)
  – evaluate network applications over DOE’s ESnet (OC12, OC48, 10GigE, …)
      bulk transfers over high bandwidth/delay network
      distributed applications (grid)
Develop Network Tools Analysis Framework (NTAF)
  – Develop/evaluate network tools (Enable, NWS, iperf, pipechar, …)
  – aggregate and transform output from tools and Web100
  – Store/query/archive performance data
Autotune network applications

Motivation
bulk transfers are slow
  – faster links (OC12, OC48, 10GigE, …), but long delay
  – classic TCP tuning problem
  – also broken TCP stacks
  – Under-provisioned routers/switches
Compute/data grids
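
The "classic TCP tuning problem" is the bandwidth-delay product: a sender can have at most one window of data in flight per round trip, so a fast link with a long delay needs a large buffer to stay full. A minimal sketch of the arithmetic follows. The link rates are the nominal line rates of the links named above; the 70 ms round-trip time and the 64 KB window are assumptions chosen for illustration, not figures from the slides.

```python
# Nominal line rates in bits per second (SONET OC-n is n x 51.84 Mb/s).
LINK_RATES_BPS = {
    "OC12": 622_080_000,
    "OC48": 2_488_320_000,
    "10GigE": 10_000_000_000,
}

def bdp_bytes(rate_bps, rtt_ms):
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return rate_bps * rtt_ms / 1000 / 8

def window_limited_bps(window_bytes, rtt_ms):
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8 * 1000 / rtt_ms

if __name__ == "__main__":
    rtt_ms = 70  # assumed round-trip time, for illustration only
    for name, rate in LINK_RATES_BPS.items():
        mb = bdp_bytes(rate, rtt_ms) / 1e6
        print(f"{name}: about {mb:.1f} MB of buffer needed to fill the pipe")
    limit = window_limited_bps(65536, rtt_ms) / 1e6
    print(f"64 KB window at {rtt_ms} ms: at most {limit:.1f} Mb/s")
```

Under these assumptions a 64 KB window caps a single flow at a few Mb/s regardless of how fast the underlying link is, which is why buffer size is the first thing to tune.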

TCP losses
Packet losses during startup, linear recovery
[Figure: throughput plot; labels: 0.5 Mb/s, instantaneous, average, packet loss, early packet drops]
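
The cost of "linear recovery" can be estimated with a short calculation. The sketch assumes standard congestion avoidance, in which the congestion window is halved on a loss and then grows by one segment per round trip; the function name and the example link parameters are illustrative assumptions, not values taken from the plot.

```python
def recovery_time_s(rate_bps, rtt_ms, mss_bytes=1460):
    """Estimate seconds needed to regain full rate after a single loss.

    Assumes the window was exactly one bandwidth-delay product before the
    loss, is halved by the loss, and then grows by one MSS per RTT.
    """
    window_segments = rate_bps * rtt_ms / 1000 / 8 / mss_bytes
    rtts_needed = window_segments / 2  # climb from W/2 back to W, 1 segment per RTT
    return rtts_needed * rtt_ms / 1000

if __name__ == "__main__":
    # Assumed example: OC12 line rate with a 70 ms round-trip time.
    for mss in (1460, 8960):
        t = recovery_time_s(622_080_000, 70, mss)
        print(f"MSS {mss} bytes: about {t:.0f} s to recover from one loss")
```

With these assumed numbers a single loss costs on the order of two minutes of reduced throughput at a 1460-byte MSS, so even rare early drops dominate the average rate of a bulk transfer.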

TCP tuning (workarounds)
Avoid losses
  – retain/probe for “optimal” buffer sizes
  – ECN capable routers/hosts
  – reduce bursts
Faster recovery
  – bigger MSS (jumbo frames)
  – speculative recovery (D-SACK)
  – modified congestion avoidance?
Autotune
  – Buffer size
  – Dupthresh
  – Delayed ACK, Nagle
  – AIMD
  – Virtual MSS
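
Why a bigger MSS and fewer losses both help can be seen from the widely used macroscopic TCP throughput model of Mathis et al., which bounds steady-state rate by (MSS/RTT) * C / sqrt(p) for loss rate p. The slides do not cite this model; it is swapped in here as an illustration. The constant C = sqrt(3/2) assumes periodic loss and an ACK for every segment, and the RTT and loss rate below are assumed values.

```python
import math

# Constant from the Mathis et al. model: periodic loss, one ACK per segment.
MATHIS_C = math.sqrt(3 / 2)

def mathis_bps(mss_bytes, rtt_ms, loss_rate):
    """Approximate upper bound on steady-state TCP throughput in bits/s."""
    return (mss_bytes * 8) / (rtt_ms / 1000) * MATHIS_C / math.sqrt(loss_rate)

if __name__ == "__main__":
    rtt_ms, loss = 70, 1e-5  # assumed path characteristics
    for label, mss in (("standard frames", 1460), ("jumbo frames", 8960)):
        mbps = mathis_bps(mss, rtt_ms, loss) / 1e6
        print(f"{label} (MSS {mss}): about {mbps:.0f} Mb/s")
```

In this model throughput scales linearly with MSS but only with the inverse square root of the loss rate, so moving to jumbo frames gives roughly the same gain as cutting losses by a factor of about 38.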

Tuning opportunities
Parallel streams (psockets)
  – how to choose number of streams, buffer sizes?
  – autotune?
Application routing daemons
  – indirect TCP
  – alternate path (Wolski, UCSB)
  – multipath (Rao, ORNL)
Other protocols (SCTP, DCP)
  – Out of order delivery
Are these fair?

Network Tool Analysis Framework (NTAF)
Configure and launch network tools
  – measure bandwidth/latency (iperf, pchar, pipechar)
  – collect passive data (SNMP from routers, OS counters)
  – forecast bandwidth/latency for grid resource scheduling
  – augment tools to report Web100 data
Collect and transform tool results into a common format
Save results for short-term auto-tuning and archive for later analysis
  – compare predicted to actual performance
  – measure effectiveness of tools and auto-tuning
Auto-tune network applications
  – WAD (WorkAround Daemon)
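
One way to picture "a common format" is a single record type with values normalized to base units, so that results from different tools can be stored and compared together. The sketch below is not the NTAF schema, which these slides do not describe; the `Measurement` fields, the unit table, and the host names are all hypothetical.

```python
from dataclasses import dataclass

# Multipliers to base units: bits per second for bandwidth, seconds for delay.
UNIT_SCALE = {
    "bit/s": 1.0, "Kbit/s": 1e3, "Mbit/s": 1e6, "Gbit/s": 1e9,
    "s": 1.0, "ms": 1e-3, "us": 1e-6,
}

@dataclass(frozen=True)
class Measurement:
    """Hypothetical common record for one result from any tool."""
    tool: str         # e.g. "iperf", "ping", "pipechar"
    src: str
    dst: str
    metric: str       # e.g. "throughput" or "rtt"
    value: float      # always in base units (bit/s or s)
    timestamp: float  # seconds since the epoch

def to_measurement(tool, src, dst, metric, value, unit, timestamp):
    """Convert a tool-specific reading into the common record."""
    return Measurement(tool, src, dst, metric, value * UNIT_SCALE[unit], timestamp)

if __name__ == "__main__":
    # Illustrative inputs only, not real measurements.
    print(to_measurement("iperf", "hostA", "hostB", "throughput", 400, "Mbit/s", 0.0))
    print(to_measurement("ping", "hostA", "hostB", "rtt", 70, "ms", 0.0))
```

Normalizing at collection time keeps the later steps (comparing predicted to actual performance, feeding an auto-tuner) free of per-tool unit handling.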

Usage

NTAF Use Case
The NTAF is configured to run the following network tests every few hours over a period of several days:
  – ping -- measure network delay
  – pipechar -- actively measure speed of the bottleneck link
  – iperf -- actively measure TCP throughput. Multiple iperf tests could be run with different parameters for the number of parallel streams {e.g.: 1,2,4} and the method of tuning the TCP buffers {auto-tuned, hand-tuned}
  – Collect passive data from web100 (other?)
  – Measure/predict network delay/bandwidth
  – Format/store/archive performance data
Use data to tune/schedule network applications
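
The iperf test matrix described above (stream counts crossed with tuning methods) can be enumerated as follows. This is a sketch of how such a schedule might be generated, not NTAF code. It only builds command lines and does not run them. It uses iperf's `-c` (client), `-P` (parallel streams) and `-w` (window size) options; the host name and the 4M hand-tuned window are placeholders, and treating "auto-tuned" as "omit `-w`" is an assumption.

```python
from itertools import product

STREAM_COUNTS = [1, 2, 4]
TUNING_METHODS = ["auto-tuned", "hand-tuned"]

def iperf_commands(host, hand_tuned_window="4M"):
    """Return (streams, tuning, argv) for every combination in the matrix."""
    commands = []
    for streams, tuning in product(STREAM_COUNTS, TUNING_METHODS):
        argv = ["iperf", "-c", host, "-P", str(streams)]
        if tuning == "hand-tuned":
            # Explicit window size; for auto-tuned runs the option is left
            # off so the host's own buffer sizing applies (an assumption).
            argv += ["-w", hand_tuned_window]
        commands.append((streams, tuning, argv))
    return commands

if __name__ == "__main__":
    for streams, tuning, argv in iperf_commands("test.example.org"):
        print(f"{streams} stream(s), {tuning}: {' '.join(argv)}")
```

Each generated command could then be launched on the schedule the slide describes, with its output stored alongside the ping and pipechar results for the same path.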

Net100 areas of interest
Network characterization tools
  – Active probes
  – Passive sensors
Auto-tuning
http://www.net100.org
