NET100: Development of network-aware operating systems (Tom Dunigan)

Presentation transcript:

Net100 project
New DOE-funded (Office of Science) project ($1M/yr, 3 yrs)
Principal investigators
–Wendy Huntoon and the NCAR/PSC/Web100 team (Matt Mathis)
–Brian Tierney, LBNL
–Tom Dunigan, ORNL
Objective: develop network-aware operating systems
–optimize and understand end-to-end network and application performance
–eliminate the "wizard gap"
Motivation
–DOE has a large investment in high-speed networks (ESnet) and distributed applications

Web100 summary
NSF-funded (NCAR/PSC), web100.org
Modified Linux kernel (2.4.9): instrumented kernel to read/set TCP variables for a specific flow
–settable: buffer sizes
–readable: RTT, counts (bytes, pkts, retransmits, dups), state (SACKs, windowscale, cwnd, ssthresh)
GUI to display/modify a flow's TCP variables in real time
API for network-aware applications (a read sketch follows below)
Early evaluators: ANL, SLAC, LBNL, ORNL, universities
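In the Web100 patch, the per-flow variables above are exposed through a per-connection /proc interface and a user-space library (libweb100). The snippet below is a minimal sketch of enumerating instrumented flows by walking an assumed /proc/web100/<cid>/ layout with an ASCII connection-spec file; the directory and file names here are assumptions about that layout, and real tools would go through the libweb100 API instead.

```python
import os

WEB100_ROOT = "/proc/web100"   # assumed mount point of the per-flow Web100 data

def list_instrumented_flows():
    """Yield (connection_id, spec_text) for each instrumented TCP flow.

    Assumes each connection appears as a numeric directory containing an
    ASCII 'spec-ascii' file with the flow's 4-tuple (an assumption about
    the Web100 /proc layout; production code should use libweb100).
    """
    for entry in sorted(os.listdir(WEB100_ROOT)):
        if not entry.isdigit():
            continue                      # skip 'header' and other non-flow entries
        spec_path = os.path.join(WEB100_ROOT, entry, "spec-ascii")
        try:
            with open(spec_path) as f:
                yield entry, f.read().strip()
        except OSError:
            pass                          # flow may have closed while iterating

if __name__ == "__main__":
    for cid, spec in list_instrumented_flows():
        print(f"flow {cid}: {spec}")
```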

Web100 GUI
"Creating a window into the network"

Net100 approach
Deploy/enhance Web100 into DOE network applications
–collect performance statistics to understand/tune networks and applications
  Passive (web100, snmp, …)
  Active (pipechar, NWS, ping, iperf, …)
–evaluate network applications over DOE's ESnet (OC12, OC48, 10GigE, …)
  bulk transfers over high bandwidth/delay networks
  distributed applications (grid)
Develop Network Tools Analysis Framework (NTAF)
–develop/evaluate network tools (Enable, NWS, iperf, pipechar, …)
–aggregate and transform output from tools and Web100
–store/query/archive performance data
Autotune network applications

Motivation
Bulk transfers are slow
–faster links (OC12, OC48, 10GigE, …), but long delays
–classic TCP tuning problem (see the worked buffer-size example below)
–also broken TCP stacks
–under-provisioned routers/switches
Compute/data grids
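The "classic TCP tuning problem" on long-delay links is mostly arithmetic: a single TCP stream can send at most one window of data per round trip, so the socket buffer must cover the bandwidth-delay product. The calculation below uses illustrative numbers (not project measurements) to show why a default buffer cripples an OC12 path at a 100 ms round-trip time.

```python
# Throughput of one TCP stream is bounded by window / RTT, so the buffer
# (window) must be at least bandwidth * RTT to fill the pipe.
link_rate_bps = 622e6              # OC12, roughly 622 Mb/s
rtt_s = 0.100                      # 100 ms cross-country round-trip time
default_buffer_bytes = 64 * 1024   # a typical default socket buffer of the era

bdp_bytes = link_rate_bps * rtt_s / 8
capped_rate_bps = default_buffer_bytes * 8 / rtt_s

print(f"bandwidth-delay product: {bdp_bytes / 1e6:.1f} MB of buffer needed")
print(f"with a {default_buffer_bytes // 1024} KB buffer: "
      f"{capped_rate_bps / 1e6:.1f} Mb/s maximum, regardless of link speed")
```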

TCP losses
(figure: throughput trace with instantaneous and average curves near 0.5 Mb/s, showing packet losses during startup, early packet drops, and slow linear recovery)

TCP tuning (workarounds)
Avoid losses
–retain/probe for "optimal" buffer sizes
–ECN-capable routers/hosts
–reduce bursts
Faster recovery (see the recovery-time arithmetic below)
–bigger MSS (jumbo frames)
–speculative recovery (D-SACK)
–modified congestion avoidance?
Autotune
–buffer size
–dupthresh
–delayed ACK, Nagle
–AIMD
–virtual MSS
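The case for faster recovery, jumbo/virtual MSS, and tuned AIMD is also simple arithmetic: after one loss, standard congestion avoidance halves cwnd and then regains only one MSS per RTT, so recovery time grows with the bandwidth-delay product. The sketch below uses illustrative numbers (1 Gb/s, 100 ms RTT); it is not Net100 code, just the reasoning behind enlarging the additive-increase step.

```python
# Time for standard TCP congestion avoidance to regain full rate after one loss,
# and the speed-up from adding k segments per RTT (a "virtual MSS" of k).
link_rate_bps = 1e9     # illustrative 1 Gb/s path
rtt_s = 0.100           # 100 ms round-trip time
mss_bytes = 1500        # standard Ethernet MSS (no jumbo frames)

cwnd_pkts = link_rate_bps * rtt_s / 8 / mss_bytes   # window needed to fill the pipe
after_loss = cwnd_pkts / 2                          # cwnd after multiplicative decrease

for k in (1, 6, 20):    # additive increase of k segments per RTT
    rtts_to_recover = after_loss / k
    print(f"increment {k:2d} MSS/RTT: ~{rtts_to_recover * rtt_s:6.0f} s to refill the pipe")
```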

Tuning opportunities
Parallel streams (psockets); a parallel-stream sketch follows below
–how to choose the number of streams and buffer sizes?
–autotune?
Application routing daemons
–indirect TCP
–alternate path (Wolski, UCSB)
–multipath (Rao, ORNL)
Other protocols (SCTP, DCP)
–out-of-order delivery
Are these fair?
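As an illustration of the parallel-streams idea (in the spirit of psockets, not its actual API): split a transfer across N TCP connections so the aggregate window is roughly N times one stream's window. The host, port, and stream count below are hypothetical, and the sketch ignores reassembly order and the fairness question this slide raises.

```python
import socket
import threading

HOST, PORT = "data.sink.example.org", 5001   # hypothetical receiver
STREAMS = 4                                  # how many streams? (the open tuning question)
CHUNK = 8 * 1024 * 1024                      # bytes pushed per stream in this toy example

def send_one_stream(stream_id: int) -> None:
    """Open one TCP connection and push CHUNK bytes of dummy data."""
    with socket.create_connection((HOST, PORT)) as s:
        # Each stream gets its own congestion window; N streams ~ N x one window.
        s.sendall(bytes(CHUNK))
        print(f"stream {stream_id} done")

threads = [threading.Thread(target=send_one_stream, args=(i,)) for i in range(STREAMS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```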

Network Tool Analysis Framework (NTAF)
Configure and launch network tools
–measure bandwidth/latency (iperf, pchar, pipechar)
–collect passive data (SNMP from routers, OS counters)
–forecast bandwidth/latency for grid resource scheduling
–augment tools to report Web100 data
Collect and transform tool results into a common format (one possible record shape is sketched below)
Save results for short-term auto-tuning and archive for later analysis
–compare predicted to actual performance
–measure effectiveness of tools and auto-tuning
Auto-tune network applications
–WAD (WorkAround Daemon)
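One plausible shape for the "common format" is a flat record per measurement, archived as JSON lines so results can be queried later. The field names, hostnames, and file name below are assumptions for illustration, not the NTAF schema.

```python
import json
import time

def make_record(tool: str, src: str, dst: str, metric: str, value: float, unit: str) -> dict:
    """One measurement in a uniform shape, whatever tool produced it."""
    return {
        "timestamp": time.time(),   # when the measurement finished
        "tool": tool,               # e.g. "iperf", "pipechar", "ping", "web100"
        "src": src,
        "dst": dst,
        "metric": metric,           # e.g. "tcp_throughput", "rtt", "bottleneck_capacity"
        "value": value,
        "unit": unit,
    }

def archive(record: dict, path: str = "ntaf_results.jsonl") -> None:
    """Append the record to a JSON-lines archive for later analysis."""
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Example: an iperf result normalized into the common format (made-up hosts and value).
archive(make_record("iperf", "hostA.example.org", "hostB.example.org",
                    "tcp_throughput", 87.3, "Mb/s"))
```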

Usage

NTAF Use Case
The NTAF is configured to run the following network tests every few hours over a period of several days (a minimal test-loop sketch follows below):
–ping: measure network delay
–pipechar: actively measure the speed of the bottleneck link
–iperf: actively measure TCP throughput; multiple iperf tests can be run with different numbers of parallel streams (e.g., 1, 2, 4) and different TCP buffer-tuning methods (auto-tuned, hand-tuned)
–collect passive data from Web100 (other sources?)
–measure/predict network delay/bandwidth
–format/store/archive performance data
Use the data to tune/schedule network applications
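A minimal version of this test schedule could look like the loop below: every few hours it runs ping and iperf (with 1, 2, and 4 parallel streams) against a hypothetical peer and appends the raw output to a log. It assumes the classic iperf command line (-c, -t, -P); pipechar and the Web100 collection are omitted, and a real NTAF run would parse the output into the common record format rather than keep raw text.

```python
import subprocess
import time

TARGET = "testhost.example.org"        # hypothetical measurement peer
INTERVAL_S = 4 * 3600                  # "every few hours"

def run(cmd: list[str]) -> str:
    """Run one measurement command and return its combined output."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=300)
    return result.stdout + result.stderr

while True:
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
    with open("ntaf_raw.log", "a") as log:
        log.write(f"=== {stamp} ping ===\n")
        log.write(run(["ping", "-c", "10", TARGET]))
        for streams in (1, 2, 4):      # vary the number of parallel streams
            log.write(f"=== {stamp} iperf -P {streams} ===\n")
            log.write(run(["iperf", "-c", TARGET, "-t", "10", "-P", str(streams)]))
    time.sleep(INTERVAL_S)
```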

Net100 areas of interest
Network characterization tools
–active probes
–passive sensors
Auto-tuning