UT-BATTELLE U.S. Department of Energy Oak Ridge National Laboratory Net100 PIs: Wendy Huntoon/PSC, Tom Dunigan/ORNL, Brian Tierney/LBNL Impact and Connections.


1 UT-BATTELLE U.S. Department of Energy Oak Ridge National Laboratory
NET100: Developing network-aware operating systems
Net100 PIs: Wendy Huntoon/PSC, Tom Dunigan/ORNL, Brian Tierney/LBNL
www.net100.org
Date Prepared: 1/7/02
High-Performance Network Research - SciDAC/Base
MICS Program Manager: Thomas Ndousse

Tasks:
  - develop/deploy network probes/sensors
  - develop network metrics database
  - develop transport protocol optimizations
  - develop network-tuning daemon

Net100 Novel Ideas
  - Net100 will tune network-UNaware applications based on recent and current link characteristics
  - Net100 will tune more than just transport buffer sizes, such as
      - TCP AIMD parameters
      - DUP threshold
      - Delayed ACK
  - Net100 will determine optimal paths and whether to use multiple streams and/or multiple paths
  - Net100 kernel utilizes passive monitoring from the Web100 kernel

Impact and Connections
IMPACT:
  - increase throughput of bulk transfers over high delay/bandwidth networks (like DOE’s ESnet)
  - select optimal paths and transport parameters for distributed (Grid) applications (e.g., GridFTP)
  - provide network performance database from active and passive monitoring
CONNECTIONS:
  - SciDAC: Astrophysics, Bandwidth Estimation, Data Grid, INCITE, Logistical Networking
  - Base: Network Monitoring, Data Grid, Transport Protocols

Milestones/Dates/Status (target Mon/Yr, then date DONE where given)
Network probes and sensors
  - initial sensor and tool deployment: 12/01 (DONE 12/01)
  - database design: 4/02
  - initial database implementation: 9/02
  - final sensor/database: 6/03
Transport protocol optimizations
  - protocol analysis: 11/02
  - initial tuning daemon: 3/02
  - bulk transfer tuning demos: 8/02
  - final tuning daemon: 6/03
Multipath support
  - analytical analysis: 8/02
  - proof-of-principle routing daemons: 12/02
  - grid applications demos: 4/03

2 Net100 project
New DOE-funded (Office of Science) project ($1M/yr, 3 yrs)
Principal investigators
  – Wendy Huntoon and the NCAR/PSC/Web100 team (Matt Mathis)
  – Brian Tierney, LBNL
  – Tom Dunigan, ORNL
Objective: develop network-aware operating systems
  – optimize and understand end-to-end network and application performance
  – eliminate the “wizard gap”
Motivation
  – DOE has a large investment in high-speed networks (ESnet) and distributed applications
  – many network applications are not utilizing the available bandwidth

3 Net100 approach
Develop Network Tools Analysis Framework (NTAF)
  – collect data for network tuning
      develop/evaluate/deploy network tools (Enable, NWS, iperf, pipechar, …)
      aggregate and transform output from tools and Web100
      store/query/archive performance data
  – evaluate network applications over DOE’s ESnet (OC12, OC48, 10GigE, …)
      bulk transfers over high bandwidth/delay network
      distributed applications (grid)
Investigate TCP optimizations
  – simulate/emulate/deploy
  – Linux kernel mods
Autotune network applications
  – WAD (workaround daemon)

4 Web100 summary
NSF funded (NCAR/PSC), web100.org
Modified Linux kernel (2.4.9)
Instrumented kernel to read/set TCP variables for a specific flow
  – readable: RTT, counts (bytes, pkts, retransmits, dups), state (SACKs, windowscale, cwnd, ssthresh) (115 variables!)
  – settable: buffer sizes
GUI to display/modify a flow’s TCP variables, real-time
API for network-aware applications
Early evaluators: ANL, SLAC, LBNL, ORNL, universities

5 Motivation
Bulk transfers are slow
  – faster links (OC12, OC48, 10GigE), but long delay
  – classic TCP tuning problem
  – also broken TCP stacks
  – under-provisioned routers/switches
  – TCP is lossy, slow to recover: tune it or replace it?
Compute/data grids
  – sense/probe link bandwidths/latencies
  – schedule/configure distributed application
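The "classic TCP tuning problem" named on this slide is usually stated in terms of the bandwidth-delay product (BDP): a sender can keep a long, fast path full only if its window is at least bandwidth times round-trip time. The sketch below is an illustration, not part of the Net100 code; the 622 Mb/s nominal OC12 rate and the 70 ms round-trip time are assumed example values, and the function names are hypothetical.

```python
def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bytes that must be in flight to keep the path full."""
    # bits/s * ms = millibits; divide by 1000 (ms -> s) and 8 (bits -> bytes).
    return bandwidth_bps * rtt_ms // 8000


def window_limited_bps(window_bytes: int, rtt_ms: int) -> float:
    """Best-case throughput when the window, not the link, is the limit."""
    return window_bytes * 8 * 1000 / rtt_ms


if __name__ == "__main__":
    oc12_bps = 622_000_000  # assumed nominal OC12 line rate
    rtt_ms = 70             # assumed cross-country round-trip time

    print("window needed (bytes):", bdp_bytes(oc12_bps, rtt_ms))
    print("64 KB window tops out at (Mb/s):",
          window_limited_bps(65_536, rtt_ms) / 1e6)
```

Under these assumptions the path needs a window of roughly 5.4 MB, while a 64 KB window caps a single flow at about 7.5 Mb/s no matter how fast the link is.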

6 TCP losses
Packet losses during startup, linear recovery
[Figure: plot with curves labeled "instantaneous" and "average"; annotations: "0.5 Mbs", "Packet loss", "Early packet drops"]
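The "linear recovery" on this slide can be put in numbers with a back-of-the-envelope model. The sketch below assumes standard TCP congestion avoidance (window halved on a loss, then grown by one segment per round trip) and ignores delayed ACKs, which would slow recovery further. The function name and the example link parameters are illustrative assumptions, not values from the slides.

```python
def recovery_time_s(bandwidth_bps: float, rtt_s: float, mss_bytes: int = 1460) -> float:
    """Seconds to grow the window from W/2 back to W after a single loss.

    W is the window (in segments) that fills the path. Additive increase
    adds one segment per RTT, so regaining W/2 segments takes W/2 RTTs.
    """
    window_segments = bandwidth_bps * rtt_s / (8 * mss_bytes)
    return (window_segments / 2) * rtt_s


if __name__ == "__main__":
    # Assumed example path: 622 Mb/s, 70 ms round-trip time.
    for mss in (1460, 8960):
        print(f"MSS {mss}: {recovery_time_s(622e6, 0.070, mss):.1f} s")
```

With these assumed numbers a single loss costs on the order of two minutes of reduced throughput at a 1460-byte MSS and roughly 20 seconds with jumbo-frame-sized segments, since the recovery time scales inversely with the MSS.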

7 TCP tuning (workarounds)
Avoid losses
  – retain/probe for “optimal” buffer sizes
  – ECN capable routers/hosts
  – reduce bursts (TCP Vegas)
Faster recovery
  – bigger MSS (jumbo frames)
  – speculative recovery (D-SACK)
  – modified congestion avoidance?
Autotune (WAD variables)
  – buffer size
  – dupthresh
  – delayed ACK, Nagle
  – AIMD
  – virtual MSS
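Buffer size is the first tunable in the list above, and it is the one a network-aware application can already set by hand through the standard sockets API. The sketch below shows that manual route using Python's `socket` module; it is not the WAD or Web100 interface, and the helper names and the 4 MB request are assumptions for illustration. The operating system may clamp or adjust the requested value, so the code reads back what was actually granted.

```python
import socket


def open_tuned_socket(buffer_bytes: int) -> socket.socket:
    """Create a TCP socket and request send/receive buffers of the given size.

    The request is made before connect() so that it can influence the
    window scale negotiated when the connection is set up.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return sock


def effective_buffers(sock: socket.socket) -> tuple:
    """Return the (send, receive) buffer sizes the OS actually granted."""
    return (
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
    )


if __name__ == "__main__":
    s = open_tuned_socket(4 * 1024 * 1024)  # assumed 4 MB request
    print("granted (send, recv):", effective_buffers(s))
    s.close()
```

The other tunables on the slide (dupthresh, delayed ACK, AIMD, virtual MSS) have no portable per-socket option of this kind, which is one reason the deck proposes kernel modifications and a tuning daemon.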

8 Tuning opportunities
Parallel streams (psockets)
  – how to choose number of streams, buffer sizes?
  – autotune?
Application routing daemons
  – indirect TCP
  – alternate path (Wolski, UCSB)
  – multipath (Rao, ORNL)
Other protocols (SCTP, DCP)
  – out-of-order delivery
  – rate-based
Are these fair?
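One simple way to approach the slide's question of how many streams and what buffer sizes to use is to split the path's bandwidth-delay product across sockets when a single socket's buffer is capped by the host. The heuristic below is only an assumed starting point, not the psockets algorithm or anything specified by Net100; the function name, the 1 MB cap, and the 16-stream ceiling are hypothetical. It also says nothing about the fairness question the slide raises.

```python
import math


def plan_streams(bdp_bytes: int, max_buffer_bytes: int, max_streams: int = 16) -> tuple:
    """Pick a stream count and per-stream buffer so the buffers together cover the BDP.

    Returns (streams, per_stream_buffer_bytes). If the stream ceiling is hit,
    the per-stream buffer is capped and the plan covers less than the full BDP.
    """
    streams = max(1, math.ceil(bdp_bytes / max_buffer_bytes))
    streams = min(streams, max_streams)
    per_stream = min(max_buffer_bytes, math.ceil(bdp_bytes / streams))
    return streams, per_stream


if __name__ == "__main__":
    # Assumed example: ~5.4 MB BDP, host caps each socket buffer at 1 MB.
    print(plan_streams(5_442_500, 1_048_576))
```

For the assumed example this yields six streams of a little under 1 MB each; a short, slow path whose BDP fits in one buffer stays at a single stream.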

9 Work-around Daemon (WAD)
Version 0
  – passively collect flow data
  – tune unknowing sender/receiver
  – config file with “tuning info”?
  – based on Web100/Linux 2.4
To be done
  – collecting tuning info
  – adding more knobs to kernel
Related work
  – Feng’s Dynamic Right Sizing
  – Linux 2.4 auto-tuning/caching
  – Mathis TCP buffer tuning
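The slide leaves the form of the "tuning info" config file open. As a rough sketch of the idea, the code below maps destination networks to buffer settings and picks the most specific match for a flow's remote address. The table layout, the setting names, and the function name are all hypothetical, and the addresses are documentation-only ranges. How the daemon would then apply the settings to a live flow through the Web100 kernel is deliberately not shown, since that interface is not described in these slides.

```python
import ipaddress

# Hypothetical tuning table: destination network -> settings for flows to it.
TUNING_TABLE = {
    "192.0.2.0/24": {"sndbuf": 4_000_000, "rcvbuf": 4_000_000},
    "192.0.2.128/25": {"sndbuf": 8_000_000, "rcvbuf": 8_000_000},
    "198.51.100.0/24": {"sndbuf": 1_000_000, "rcvbuf": 1_000_000},
}


def lookup_tuning(remote_ip: str, table: dict):
    """Return the settings for the longest matching prefix, or None if no entry matches."""
    addr = ipaddress.ip_address(remote_ip)
    best_net, best_settings = None, None
    for cidr, settings in table.items():
        net = ipaddress.ip_network(cidr)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_settings = net, settings
    return best_settings


if __name__ == "__main__":
    for ip in ("192.0.2.17", "192.0.2.200", "203.0.113.5"):
        print(ip, "->", lookup_tuning(ip, TUNING_TABLE))
```

Flows with no matching entry return None, which a daemon could treat as "leave the kernel defaults alone".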

10 Network Tool Analysis Framework (NTAF)
Configure and launch network tools
  – measure bandwidth/latency (iperf, pchar, pipechar)
  – collect passive data (SNMP from routers, OS/Web100 counters)
  – forecast bandwidth/latency for grid resource scheduling
  – augment tools to report Web100 data
Collect and transform tool results into a common format
Save results for short-term auto-tuning and archive for later analysis
  – compare predicted to actual performance
  – measure effectiveness of tools and auto-tuning
Auto-tune network applications
  – WAD (WorkAround Daemon)
  – tunable TCP stack
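To make "transform tool results into a common format" and "compare predicted to actual performance" concrete, here is a minimal sketch of what such a record and comparison might look like. The `Measurement` fields, the unit table, and the function names are assumptions made for illustration; the slides do not specify the NTAF format, and no real tool output is parsed here.

```python
from dataclasses import dataclass

# Conversion factors to bits per second for the units this sketch accepts.
_TO_BPS = {"bps": 1.0, "Kbps": 1e3, "Mbps": 1e6, "Gbps": 1e9}


@dataclass(frozen=True)
class Measurement:
    """One normalized result, whatever tool produced it (hypothetical schema)."""
    tool: str
    src: str
    dst: str
    metric: str
    value: float
    unit: str
    timestamp: float


def normalize_bandwidth(tool: str, src: str, dst: str,
                        value: float, unit: str, timestamp: float) -> Measurement:
    """Convert a bandwidth reading in any supported unit to bits per second."""
    return Measurement(tool, src, dst, "bandwidth", value * _TO_BPS[unit], "bps", timestamp)


def relative_error(predicted: float, actual: float) -> float:
    """Fraction by which a forecast missed the measured value."""
    return abs(predicted - actual) / actual


if __name__ == "__main__":
    m = normalize_bandwidth("iperf", "hostA", "hostB", 94.5, "Mbps", 0.0)
    print(m)
    print("forecast error:", relative_error(100e6, m.value))
```

Keeping every reading in one unit and one record shape is what lets the archive answer questions such as how far a forecast was from the measured rate, regardless of which tool made the measurement.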

11 Net100 interactions
Net100 is both a producer and consumer of network performance data
  – active probes (Claffy Bandwidth Estimation, INCITE)
  – passive sensors (LBL Network monitoring)
Auto-tuning
  – TCP optimizations (Feng/LANL, Linux 2.4)
  – smart transfer (IQecho, Logistical networking)
  – non-TCP protocols (DCP, STP, SCTP, rate-based, ?)
Net100 tuning could be applied to distributed applications
  – Climate/Probe, SuperNova, DataGrids
  – interact with Grid metaware (forecasting, scheduling, tuning)
http://www.net100.org

