NET100: Development of network-aware operating systems
Tom Dunigan

UT-BATTELLE U.S. Department of Energy Oak Ridge National Laboratory

Net100 project
New DOE-funded (Office of Science) project ($1M/yr, 3 yrs)
Principal investigators
– Wendy Huntoon and the PSC/Web100 team (Janet Brown, Matt Mathis)
– Brian Tierney, LBNL
– Tom Dunigan, ORNL
– Rich Wolski, UCSB
– collaborators: Basil Irwin, Bill Wing, Nageswara Rao
Objective: develop network-aware operating systems
– optimize and understand end-to-end network and application performance
– eliminate the “wizard gap”
– Web100 to Web1000?
Motivation
– DOE has a large investment in high-speed networks (ESnet) and distributed applications

Net100 approach
Deploy/enhance Web100 into DOE network applications
– auto-tune network applications to optimize performance
– collect performance statistics to understand/tune networks and applications
– evaluate network applications over DOE’s ESnet (OC12, OC48?): bulk transfers over high bandwidth/delay networks; distributed applications (grid)
Develop Network Tools Analysis Framework (NTAF)
– configure/launch network tools (NWS, pathrate, pipechar, …)
– aggregate and transform output from tools and Web100
Develop Network Analysis Information Base (NAIB)
– repository for NTAF data
– API to collect and query

Bulk transfers
ORNL/NERSC Probe project
– wide-area distributed storage testbed (HPSS)
– investigate protocols, software, devices
Climate model data transfers were slow
– OC3 with 60 ms RTT
– classic TCP tuning problem
– also broken TCP stacks
– developed (almost) TCP-over-UDP test harness, instrumented and tunable
Recent upgrade (?) to OC12, 100 ms RTT
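The “classic TCP tuning problem” is that default socket buffers are far smaller than the path’s bandwidth-delay product (BDP). A minimal sketch of that arithmetic, assuming nominal SONET line rates for OC3/OC12 and a 64 KB default window (these figures are assumptions, not measurements from the Probe testbed):

```python
# Sketch: how much TCP buffer a path needs, from its bandwidth-delay product.
# Rates are nominal SONET line rates (an assumption); usable payload is lower.

OC3_BPS = 155_520_000   # OC-3 line rate, bits/s
OC12_BPS = 622_080_000  # OC-12 line rate, bits/s


def bdp_bytes(rate_bps: int, rtt_ms: int) -> int:
    """Bytes that must be in flight to keep the path full (rate x RTT)."""
    return rate_bps * rtt_ms // (8 * 1000)


def window_limited_bps(window_bytes: int, rtt_ms: int) -> float:
    """Upper bound on one TCP stream's throughput when its window is the limit."""
    return window_bytes * 8 * 1000 / rtt_ms


if __name__ == "__main__":
    print("OC3, 60 ms RTT:   BDP =", bdp_bytes(OC3_BPS, 60), "bytes")
    print("OC12, 100 ms RTT: BDP =", bdp_bytes(OC12_BPS, 100), "bytes")
    # A 64 KB window (no window scaling) on the 60 ms OC3 path:
    print(f"64 KB window at 60 ms: {window_limited_bps(65535, 60) / 1e6:.1f} Mb/s")
```

Under these assumptions the OC3 path needs about 1.17 MB of buffer, while a 64 KB window caps a single stream near 8.7 Mb/s no matter how fast the link is; the OC12 upgrade at 100 ms raises the requirement to about 7.8 MB.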

TCP losses
[Figure: early packet drops during startup followed by linear recovery; instantaneous and average throughput plotted (0.5 Mbs)]
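Losses hurt so much on long paths because a Reno-style stream’s steady-state rate is bounded by its loss rate. The widely used Mathis et al. approximation relates throughput to MSS, RTT and loss rate p; this sketch evaluates it for an assumed 1460-byte MSS on the 60 ms OC3 path, with illustrative loss rates rather than values measured here:

```python
import math

# Mathis et al. constant sqrt(3/2): assumes periodic loss and one ACK per
# segment; delayed ACKs give a smaller constant.
C = math.sqrt(1.5)


def loss_limited_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state Reno throughput: (MSS/RTT) * C / sqrt(p)."""
    if not 0.0 < loss_rate < 1.0:
        raise ValueError("loss_rate must be strictly between 0 and 1")
    return (mss_bytes * 8 / rtt_s) * C / math.sqrt(loss_rate)


if __name__ == "__main__":
    for p in (1e-3, 1e-4, 1e-5):
        mbps = loss_limited_bps(1460, 0.060, p) / 1e6
        print(f"loss rate {p:g}: about {mbps:.1f} Mb/s")
```

Even a loss rate of one packet in ten thousand holds a single stream to roughly 24 Mb/s on this path under the model, well below OC3 line rate.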

Net100 to the rescue?
Avoid losses
– retain/probe for “optimal” buffer sizes
– autotuning (Web100/Net100)
– ECN-capable routers/hosts
– reduce bursts
Faster recovery
– shorter RTT (“fix” routes)
– bigger MSS (jumbo frames)
– speculative recovery
– modified congestion avoidance?
SCTP, out-of-order delivery
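Shorter RTT and bigger MSS speed recovery because linear congestion avoidance refills a halved window one segment per round trip, so recovery time grows with RTT squared and shrinks in proportion to MSS. A back-of-envelope sketch, assuming Reno-style growth, a 1460-byte standard MSS, and an 8960-byte jumbo-frame MSS (9000-byte MTU minus IP and TCP headers):

```python
def recovery_time_s(rate_bps: int, rtt_ms: int, mss_bytes: int) -> float:
    """Seconds for linear growth to refill the pipe after a single loss.

    Assumes the congestion window W, in segments, equals the path's
    bandwidth-delay product, is halved by the loss, and then grows by one
    segment per RTT, so recovery takes W/2 round trips.
    """
    window_segments = rate_bps * rtt_ms / (8 * 1000 * mss_bytes)
    return (window_segments / 2) * (rtt_ms / 1000)


if __name__ == "__main__":
    OC12_BPS = 622_080_000  # nominal OC-12 line rate, bits/s
    for rtt_ms, mss in ((100, 1460), (100, 8960), (60, 1460)):
        t = recovery_time_s(OC12_BPS, rtt_ms, mss)
        print(f"OC12, RTT {rtt_ms} ms, MSS {mss} B: {t:.0f} s to recover")
```

On a 100 ms OC12 path this model gives roughly 266 s to recover from one loss with a 1460-byte MSS, versus about 43 s with jumbo frames.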

Bulk transfer speedups
Parallel streams (psockets)
– how to choose number of streams, buffer sizes?
– Web100 autotune?
Application routing daemons
– indirect TCP
– alternate path (Wolski, UCSB)
– multipath (Rao, ORNL)
Are these fair?
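One naive answer to “how many streams, what buffer sizes?” when each socket’s buffer is capped (for example at 64 KB without window scaling) is to open just enough streams to cover the bandwidth-delay product. This is a hypothetical heuristic, not how psockets or Web100 decide; it ignores loss, and under loss N Reno streams tend to take roughly N times the share of a single competing stream, which is the fairness question the slide raises:

```python
import math


def streams_needed(bdp: int, max_buffer: int) -> int:
    """Fewest parallel streams whose per-stream buffers together cover the BDP."""
    if bdp <= 0 or max_buffer <= 0:
        raise ValueError("bdp and max_buffer must be positive")
    return math.ceil(bdp / max_buffer)


def per_stream_buffer(bdp: int, streams: int) -> int:
    """Even split of the BDP, rounded up so the streams never fall short."""
    return math.ceil(bdp / streams)


if __name__ == "__main__":
    bdp = 1_166_400  # 155.52 Mb/s x 60 ms / 8, i.e. OC3 at 60 ms RTT
    n = streams_needed(bdp, 65_535)  # 64 KB cap: no window scaling
    print(n, "streams of", per_stream_buffer(bdp, n), "bytes each")
```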

Network Tool Analysis Framework (NTAF)
Configure and launch network tools
– measure bandwidth/latency (iperf, pchar, pipechar)
– collect passive data (SNMP from routers, OS counters)
– forecast bandwidth/latency (NWS) for grid resource scheduling
– augment tools to report Web100 data
Collect and transform tool results into a common format
Save results for short-term auto-tuning and archive (NAIB) for later analysis
– compare predicted to actual performance
– measure effectiveness of tools and auto-tuning
Use NetLogger to format and send data to NAIB
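“Collect and transform tool results into a common format” amounts to one small adapter per tool that maps its text output onto a shared record. A minimal sketch, assuming a hypothetical `Measurement` record and parsing only the summary lines of ping (Linux or BSD style) and iperf2; NTAF’s actual schema and parsers are not described in these slides:

```python
import re
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measurement:
    """Hypothetical common record for NTAF-style tool results."""
    tool: str
    src: str
    dst: str
    metric: str       # e.g. "rtt_avg_ms" or "throughput_mbps"
    value: float
    timestamp: float  # seconds since the epoch


# Summary line of Linux ("mdev") or BSD/macOS ("stddev") ping.
PING_RE = re.compile(r"min/avg/max/(?:mdev|stddev) = [\d.]+/([\d.]+)/")
# Bandwidth column of iperf2 reports, e.g. "941 Mbits/sec".
IPERF_RE = re.compile(r"([\d.]+) ([KMG])bits/sec")
TO_MBPS = {"K": 1e-3, "M": 1.0, "G": 1e3}


def _now(ts: Optional[float]) -> float:
    return time.time() if ts is None else ts


def from_ping(src: str, dst: str, output: str, ts: Optional[float] = None) -> Measurement:
    match = PING_RE.search(output)
    if match is None:
        raise ValueError("no ping summary line found")
    return Measurement("ping", src, dst, "rtt_avg_ms", float(match.group(1)), _now(ts))


def from_iperf(src: str, dst: str, output: str, ts: Optional[float] = None) -> Measurement:
    matches = IPERF_RE.findall(output)
    if not matches:
        raise ValueError("no iperf bandwidth figure found")
    number, prefix = matches[-1]  # last report line is the summary ([SUM] with -P)
    return Measurement("iperf", src, dst, "throughput_mbps",
                       float(number) * TO_MBPS[prefix], _now(ts))


if __name__ == "__main__":
    # Made-up output lines, in the style of the real tools, for illustration.
    print(from_ping("ornl", "nersc", "rtt min/avg/max/mdev = 60.1/60.5/61.2/0.3 ms"))
    print(from_iperf("ornl", "nersc", "[  3]  0.0-10.0 sec  1.10 GBytes   941 Mbits/sec"))
```

Keeping the record flat (one metric per row) makes it easy to archive results from very different tools side by side and to compare them later.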

NetLogger
End-to-end performance monitoring tool
Modify application to log interesting events
Support for distributed applications (NTP timestamps)
Identify application/network bottlenecks
Components
– IETF draft standard message format (ULM)
– API for event logging
– tools for collecting/sorting log files
– visualization tool for monitoring/playback
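A ULM record is a single line of space-separated KEY=VALUE pairs. A minimal formatter, assuming the field layout seen in published NetLogger examples (DATE as a UTC timestamp YYYYMMDDhhmmss.ffffff, then HOST, PROG, LVL and an NL.EVNT event name, followed by free-form fields). This is not NetLogger’s own API, and the host, event, and field names in the usage are hypothetical:

```python
from datetime import datetime, timezone
from typing import Mapping, Optional


def ulm_line(host: str, prog: str, event: str, fields: Mapping[str, object],
             when: Optional[datetime] = None, level: str = "Usage") -> str:
    """Format one event as a ULM-style line of KEY=VALUE pairs."""
    when = when or datetime.now(timezone.utc)
    # Microsecond UTC timestamps let logs from NTP-synchronized hosts be merged.
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S.%f")
    pairs = [("DATE", stamp), ("HOST", host), ("PROG", prog),
             ("LVL", level), ("NL.EVNT", event), *fields.items()]
    out = []
    for key, value in pairs:
        text = str(value)
        # Values are space-separated, so this sketch rejects embedded whitespace.
        if not text or any(ch.isspace() for ch in text):
            raise ValueError(f"{key} value must be non-empty with no whitespace")
        out.append(f"{key}={text}")
    return " ".join(out)


if __name__ == "__main__":
    print(ulm_line("host.example.org", "iperf", "TestEnd", {"THROUGHPUT.MBPS": 941}))
```

Because every line carries its own timestamp and host, log files from several machines can be concatenated and sorted by DATE to reconstruct an end-to-end timeline.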

NetLogger

Network Analysis Information Base (NAIB)
Extensible infrastructure for performance data
Collect data from active and passive Net100 probes via NetLogger
Gather and serve data via programmatic and graphical interfaces
Based on Network Weather Service
– distributed set of performance sensors (latency/bandwidth/memory/CPU) with robust periodic data collection
– forecasting module
– database of sensor configurations and sensor data
– needed for selecting/scheduling grid resources
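The NWS forecasting module runs a family of simple predictors over each sensor’s history and reports whichever has been most accurate so far. A simplified sketch of that idea; the four predictors, the window of 5, and the cumulative absolute-error score are assumptions, and the real NWS uses a larger predictor set:

```python
from statistics import mean, median
from typing import Callable, Dict, List, Tuple


class AdaptiveForecaster:
    """Run several simple predictors; report the one with lowest error so far."""

    def __init__(self, window: int = 5) -> None:
        self.history: List[float] = []
        self.predictors: Dict[str, Callable[[List[float]], float]] = {
            "last": lambda h: h[-1],
            "mean": lambda h: mean(h),
            "sliding_mean": lambda h: mean(h[-window:]),
            "sliding_median": lambda h: median(h[-window:]),
        }
        self.errors = {name: 0.0 for name in self.predictors}

    def update(self, value: float) -> None:
        """Score every predictor on the new measurement, then record it."""
        if self.history:
            for name, predict in self.predictors.items():
                self.errors[name] += abs(predict(self.history) - value)
        self.history.append(value)

    def forecast(self) -> Tuple[str, float]:
        """Name and prediction of the currently most accurate predictor."""
        if not self.history:
            raise ValueError("no measurements yet")
        best = min(self.errors, key=self.errors.__getitem__)
        return best, self.predictors[best](self.history)
```

Selecting the predictor per sensor lets a steady link favor a long-run mean while a trending or bursty link favors recent values, without hand-tuning each series.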

NTAF Use Case
The NTAF is configured to run the following network tests every few hours over a period of several days:
– ping: measure network delay
– pipechar: actively measure the speed of the bottleneck link
– iperf: actively measure TCP throughput. Multiple iperf tests could be run with different parameters for the number of parallel streams (e.g., 1, 2, 4) and the method of tuning the TCP buffers (Web100 auto-tuned, hand-tuned)
– NWS: measure and predict network delay and bandwidth using NWS’s own sensors
All tools will use the Web100 TCP-KIS interface to collect TCP information from the Web100 kernel, and then use NetLogger to format and send this data to the NWS NAIB database.
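The iperf portion of this test plan is a small matrix: three stream counts times two tuning methods. A sketch that builds the six client command lines, assuming iperf2 client options (-c host, -t seconds, -P streams, -w window) and assuming that “Web100 auto-tuned” runs simply omit -w so the kernel sizes the buffers; the host name, duration, and window value are hypothetical:

```python
from itertools import product
from typing import List, Tuple

STREAM_COUNTS = (1, 2, 4)
TUNING_MODES = ("web100-autotuned", "hand-tuned")


def iperf_test_matrix(host: str, window_bytes: int,
                      duration_s: int = 30) -> List[Tuple[int, str, List[str]]]:
    """Build one iperf2 client command per (stream count, tuning mode) pair."""
    tests = []
    for streams, mode in product(STREAM_COUNTS, TUNING_MODES):
        cmd = ["iperf", "-c", host, "-t", str(duration_s), "-P", str(streams)]
        if mode == "hand-tuned":
            # Explicit per-stream socket buffer; auto-tuned runs omit -w
            # and leave buffer sizing to the (assumed Web100) kernel.
            cmd += ["-w", str(window_bytes)]
        tests.append((streams, mode, cmd))
    return tests


if __name__ == "__main__":
    for streams, mode, cmd in iperf_test_matrix("remote.example.org", 1_166_400):
        print(f"{streams} stream(s), {mode}: {' '.join(cmd)}")
```

Each command list could be passed to `subprocess.run(cmd, capture_output=True, text=True)` and its output parsed; repeating the matrix every few hours is left to an external scheduler such as cron.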

Use Case (cont.)
Analysis based on this test configuration includes the ability to:
– compare Web100-tuned throughput to hand-tuned throughput
– compare NWS-predicted bandwidth with application and iperf bandwidth
– determine the advantage, if any, of parallel data streams, using both hand-tuned and auto-tuned (Web100-tuned) TCP
– see the variability of the results over time
– compare pipechar and pathrate to see which is more accurate
– measure the impact of tuned TCP streams on non-tuned streams
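Several of these comparisons reduce to a few summary statistics over the archived measurements. A sketch of three of them; the choice of mean absolute percentage error for forecast accuracy, coefficient of variation for variability, and a ratio of means for tuned-versus-hand-tuned speedup are assumptions, not metrics the slides specify:

```python
from statistics import mean, pstdev
from typing import Sequence


def mape(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Mean absolute percentage error of forecasts against measurements."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need equal-length, non-empty sequences")
    return 100 * mean(abs(p - m) / m for p, m in zip(predicted, measured))


def coefficient_of_variation(samples: Sequence[float]) -> float:
    """Spread of repeated measurements relative to their mean."""
    return pstdev(samples) / mean(samples)


def speedup(candidate: Sequence[float], baseline: Sequence[float]) -> float:
    """Ratio of mean throughputs, e.g. Web100-tuned over hand-tuned."""
    return mean(candidate) / mean(baseline)
```

For example, NWS forecasts would be paired with the iperf throughput measured closest in time before computing `mape`, and `coefficient_of_variation` applied per test configuration across the several days of runs.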

Usage

Net100 outreach
Web pages describing current results
Downloadable Net100 software
NAIB data available
Tutorials, talks, and papers
Interact with DOE grid projects and Data Grid projects