NET100 Development of network-aware operating systems Tom Dunigan

Presentation transcript:

NET100 Development of network-aware operating systems Tom Dunigan

UT-BATTELLE U.S. Department of Energy Oak Ridge National Laboratory
Net100 project
New DOE-funded (Office of Science) project ($1M/yr, 3 yrs)
Principal investigators
  – Wendy Huntoon and the NCAR/PSC/Web100 team (Matt Mathis)
  – Brian Tierney, LBNL
  – Tom Dunigan, ORNL
Objective: develop network-aware operating systems
  – optimize and understand end-to-end network and application performance
  – eliminate the “wizard gap”
Motivation
  – DOE has a large investment in high-speed networks (ESnet) and distributed applications
  – many network applications are not utilizing the available bandwidth

Net100 approach
Develop Network Tools Analysis Framework (NTAF)
  – collect data for network tuning
      Develop/evaluate/deploy network tools (Enable, NWS, iperf, pipechar, …)
      aggregate and transform output from tools and Web100
      Store/query/archive performance data
  – evaluate network applications over DOE’s ESnet (OC12, OC48, 10GigE, …)
      bulk transfers over high bandwidth/delay network
      distributed applications (grid)
Investigate TCP optimizations
  – simulate/emulate/deploy
  – Linux kernel mods
Autotune network applications
  – WAD (workaround daemon)

Web100 summary
NSF funded (NCAR/PSC), web100.org
Modified Linux kernel (2.4.9)
Instrumented kernel to read/set TCP variables for a specific flow
  – readable: RTT, counts (bytes, pkts, retransmits, dups), state (SACKs, windowscale, cwnd, ssthresh) (115 variables!)
  – settable: buffer sizes
GUI to display/modify a flow’s TCP variables, real-time
API for network-aware applications
Early evaluators: ANL, SLAC, LBNL, ORNL, universities

Web100 GUI
“Creating a window into the network”

Motivation
Bulk transfers are slow
  – faster links (OC12, OC48, 10GigE), but long delay
  – classic TCP tuning problem
  – also broken TCP stacks
  – under-provisioned routers/switches
  – TCP is lossy, slow to recover
  – tune it or replace it?
Compute/data grids
  – sense/probe link bandwidths/latencies
  – schedule/configure distributed application
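The "classic TCP tuning problem" on a fast, long-delay link is usually stated in terms of the bandwidth-delay product: a sender limited to a window of W bytes cannot exceed W/RTT. The sketch below is only an illustration of that arithmetic. The 80 ms round-trip time and the 64 KiB window are assumed values chosen for the example, not measurements from the slides; the OC-12 line rate of 622.08 Mb/s is the standard figure.

```python
def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return round(bandwidth_bps * rtt_seconds / 8)


def window_limited_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when at most one window is sent per round trip."""
    return window_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    oc12_bps = 622.08e6      # OC-12 line rate
    rtt = 0.080              # ASSUMED round-trip time (80 ms), for illustration only
    small_window = 64 * 1024 # ASSUMED untuned window of 64 KiB

    print("buffer needed to fill the link:", bdp_bytes(oc12_bps, rtt), "bytes")
    print("throughput with a 64 KiB window:",
          window_limited_throughput_bps(small_window, rtt) / 1e6, "Mb/s")
```

Under these assumptions the untuned window caps the transfer at a few Mb/s regardless of link speed, which is the gap that buffer tuning is meant to close.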

TCP losses
Packet losses during startup, linear recovery 0.5 Mbs
[Plot: traces labeled "instantaneous" and "average"; annotations "Packet loss" and "Early packet drops"]
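The linear recovery noted on this slide can be illustrated with a toy model of standard additive-increase/multiplicative-decrease congestion avoidance. This is a sketch under stated assumptions, not the model behind the plot: it assumes a single loss, a window that halves on that loss, growth of one segment per round trip, and no delayed ACKs. The function names and parameters are hypothetical.

```python
def recovery_time_seconds(bandwidth_bps: float, rtt_seconds: float,
                          mss_bytes: int = 1460) -> float:
    """Time to regain the full window after one loss (ASSUMED Reno-style AIMD)."""
    # Window, in segments, needed to fill the path.
    window_segments = bandwidth_bps * rtt_seconds / (8 * mss_bytes)
    # The window halves on loss, then grows by one segment per round trip.
    rtts_to_recover = window_segments / 2
    return rtts_to_recover * rtt_seconds


def simulate_aimd(full_window: int, loss_rtt: int, total_rtts: int) -> list:
    """Congestion window (segments) per round trip with one loss at loss_rtt."""
    cwnd = float(full_window)
    trace = []
    for t in range(total_rtts):
        if t == loss_rtt:
            cwnd = cwnd / 2                      # multiplicative decrease
        elif cwnd < full_window:
            cwnd = min(full_window, cwnd + 1)    # additive increase
        trace.append(cwnd)
    return trace


if __name__ == "__main__":
    # ASSUMED example path: 622.08 Mb/s, 80 ms RTT, 1460-byte segments.
    print("seconds to recover:", recovery_time_seconds(622.08e6, 0.080))
    print(simulate_aimd(full_window=10, loss_rtt=2, total_rtts=10))
```

Because recovery time grows with the square of the round-trip time at a fixed bandwidth, a single loss on a long, fast path can cost minutes of reduced throughput in this model.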

TCP tuning (workarounds)
Avoid losses
  – retain/probe for “optimal” buffer sizes
  – ECN-capable routers/hosts
  – reduce bursts (TCP Vegas)
Faster recovery
  – bigger MSS (jumbo frames)
  – speculative recovery (D-SACK)
  – modified congestion avoidance?
Autotune (WAD variables)
  – Buffer size
  – Dupthresh
  – Del ACK, Nagle
  – AIMD
  – Virtual MSS
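Two of the tunables listed above, buffer size and Nagle, are also reachable from an application through the standard sockets API. The sketch below shows that manual route with Python's `socket` module; it is not the WAD or Web100 interface, and the helper names and the 4 MiB request are illustrative assumptions. The other listed variables (dupthresh, AIMD parameters, virtual MSS) have no portable socket option and are not shown.

```python
import socket


def make_tuned_socket(buffer_bytes: int, disable_nagle: bool = False) -> socket.socket:
    """Create a TCP socket and request larger send/receive buffers.

    Buffers are requested before connect() so that the window scale option
    can be negotiated to match. The OS may clamp or adjust the request.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    if disable_nagle:
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s


def effective_buffers(s: socket.socket) -> tuple:
    """Return the (send, receive) buffer sizes the OS actually granted."""
    return (s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
            s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))


if __name__ == "__main__":
    sock = make_tuned_socket(4 * 1024 * 1024)  # ASSUMED 4 MiB request
    print("granted (send, recv):", effective_buffers(sock))
    sock.close()
```

Checking the granted sizes matters because system-wide limits often cap the request silently, which is one reason an autotuning daemon is attractive compared with per-application changes.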

Tuning opportunities
Parallel streams (psockets)
  – how to choose number of streams, buffer sizes?
  – autotune?
Application routing daemons
  – indirect TCP
  – alternate path (Wolski, UCSB)
  – multipath (Rao, ORNL)
Other protocols (SCTP, DCP)
  – out-of-order delivery
  – rate-based
Are these fair?
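One rough way to reason about the number of parallel streams is the well-known Mathis et al. approximation for loss-limited TCP throughput, roughly C * MSS / (RTT * sqrt(p)) with C near sqrt(3/2). The sketch below applies it per stream. It rests on a strong assumption, flagged in the code, that every stream sees the same independent loss rate and that throughput adds linearly; real streams share a bottleneck, which is exactly why the fairness question on the slide arises. The function names and example numbers are hypothetical.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_seconds: float, loss_rate: float,
                          c: float = math.sqrt(1.5)) -> float:
    """Approximate loss-limited throughput of one TCP stream (Mathis et al. model)."""
    return c * mss_bytes * 8 / (rtt_seconds * math.sqrt(loss_rate))


def streams_needed(target_bps: float, mss_bytes: int, rtt_seconds: float,
                   loss_rate: float, c: float = math.sqrt(1.5)) -> int:
    """Streams required to reach target_bps.

    ASSUMPTION: streams are independent, see the same loss rate, and their
    throughputs simply add. This ignores shared-bottleneck effects.
    """
    per_stream = mathis_throughput_bps(mss_bytes, rtt_seconds, loss_rate, c)
    return max(1, math.ceil(target_bps / per_stream))


if __name__ == "__main__":
    # ASSUMED example: 1460-byte MSS, 80 ms RTT, 0.01% loss, 400 Mb/s target.
    print("per-stream estimate (Mb/s):",
          mathis_throughput_bps(1460, 0.080, 1e-4) / 1e6)
    print("streams needed:", streams_needed(400e6, 1460, 0.080, 1e-4))
```

The same formula shows why a bigger MSS helps: throughput in this model scales linearly with segment size, so jumbo frames act much like extra streams.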

Network Tool Analysis Framework (NTAF)
Configure and launch network tools
  – measure bandwidth/latency (iperf, pchar, pipechar)
  – collect passive data (SNMP from routers, OS counters)
  – forecast bandwidth/latency for grid resource scheduling
  – augment tools to report Web100 data
Collect and transform tool results into a common format
Save results for short-term auto-tuning and archive for later analysis
  – compare predicted to actual performance
  – measure effectiveness of tools and auto-tuning
Auto-tune network applications
  – WAD (WorkAround Daemon)
  – tunable TCP stack
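The "common format" and "compare predicted to actual" steps can be pictured with a small record type. Everything in this sketch is hypothetical: the slides do not specify NTAF's actual schema, so the `Measurement` fields, the unit table, the host names, and the helper names are assumptions made only to show the idea of normalizing results from different tools before archiving and comparing them.

```python
import json
from dataclasses import dataclass, asdict

# ASSUMED unit table for normalizing tool output to bits per second.
UNIT_SCALE = {"bps": 1.0, "Kbps": 1e3, "Mbps": 1e6, "Gbps": 1e9}


@dataclass
class Measurement:
    """Hypothetical common record; not the real NTAF schema."""
    tool: str
    src: str
    dst: str
    metric: str
    value: float
    timestamp: float


def normalize_bandwidth(tool: str, src: str, dst: str, value: float,
                        unit: str, timestamp: float) -> Measurement:
    """Convert a tool-specific bandwidth reading into the common record."""
    return Measurement(tool, src, dst, "bandwidth_bps",
                       value * UNIT_SCALE[unit], timestamp)


def prediction_error(predicted: Measurement, actual: Measurement) -> float:
    """Relative error of a forecast against the measured value."""
    return (predicted.value - actual.value) / actual.value


if __name__ == "__main__":
    # Host names and readings are made up for illustration.
    forecast = normalize_bandwidth("forecast", "hostA", "hostB", 150, "Mbps", 0.0)
    measured = normalize_bandwidth("iperf", "hostA", "hostB", 100, "Mbps", 60.0)
    print(json.dumps(asdict(measured)))
    print("relative error:", prediction_error(forecast, measured))
```

Keeping every result in one unit and one record shape is what makes it possible to archive results from several tools together and to score forecasts against later measurements.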

Net100 progress
Provide Web100 hosts over Internet
  – ORNL, LBL, NCAR, PSC, UT, NERSC, SLAC
  – installed Enable/iperf/pipechar/netperf
  – Web100 daemon to archive link data, instrument iperf/ttcp
  – develop WAD framework
Characterize NERSC/ORNL ESnet link and probe applications
  – latency/bandwidth and loss characterization
  – HSI/pftp/bbftp/iperf studies (GridFTP soon)
  – tune HSI transfer with Web100 (NERSC/ORNL)
  – TCP tuning studies with ns and atou
  – SCTP testbed

Net100 areas of interest
Network characterization tools
  – Active probes
  – Passive sensors
Auto-tuning
  – TCP optimizations
  – non-TCP protocols?