UT-BATTELLE U.S. Department of Energy Oak Ridge National Laboratory Net100 PIs: Wendy Huntoon/PSC, Tom Dunigan/ORNL, Brian Tierney/LBNL Impact and Connections.

Presentation transcript:

UT-BATTELLE U.S. Department of Energy Oak Ridge National Laboratory
Net100 PIs: Wendy Huntoon/PSC, Tom Dunigan/ORNL, Brian Tierney/LBNL

Impact and Connections
- IMPACT:
  - increase throughput of bulk transfers over high delay/bandwidth networks (like DOE’s ESnet)
  - select optimal paths and transport parameters for distributed (Grid) applications (e.g., GridFTP)
  - provide network performance data base from active and passive monitoring
- CONNECTIONS:
  - SciDAC: Astrophysics, Bandwidth Estimation, Data Grid, INCITE, Logistical Networking
  - Base: Network Monitoring, Data Grid, Transport Protocols

Milestones/Dates/Status (Mon/Yr, DONE)
- Network probes and sensors
  - initial sensor and tool deployment: 12/01 (DONE 12/01)
  - data base design: 4/02
  - initial data base implementation: 9/02
  - final sensor/data base: 6/03
- Transport protocol optimizations
  - protocol analysis: 11/02
  - initial tuning daemon: 3/02
  - bulk transfer tuning demos: 8/02
  - final tuning daemon: 6/03
- Multipath support
  - analytical analysis: 8/02
  - proof-of-principle routing daemons: 12/02
  - grid applications demos: 4/03

Net100 Novel Ideas
- Net100 will tune network-UNaware applications based on recent and current link characteristics
- Net100 will tune more than just transport buffer sizes, such as
  - TCP AIMD parameters
  - DUP threshold
  - Delayed ACK
- Net100 will determine optimal paths and whether to use multiple streams and/or multiple paths
- Net100 kernel utilizes passive monitoring from the Web100 kernel

NET100: Developing network-aware operating systems
Tasks:
- develop/deploy network probes/sensors
- develop network metrics data base
- develop transport protocol optimizations
- develop network-tuning daemon

Date Prepared: 1/7/02
High-Performance Network Research, SciDAC/Base
MICS Program Manager: Thomas Ndousse

Net100 project
- New DOE-funded (Office of Science) project ($1M/yr, 3 yrs)
- Principal investigators
  - Wendy Huntoon and the NCAR/PSC/Web100 team (Matt Mathis)
  - Brian Tierney, LBNL
  - Tom Dunigan, ORNL
- Objective: develop network-aware operating systems
  - optimize and understand end-to-end network and application performance
  - eliminate the “wizard gap”
- Motivation
  - DOE has a large investment in high-speed networks (ESnet) and distributed applications
  - many network applications are not utilizing the available bandwidth

Net100 approach
- Develop Network Tools Analysis Framework (NTAF)
  - collect data for network tuning
    - develop/evaluate/deploy network tools (Enable, NWS, iperf, pipechar, …)
    - aggregate and transform output from tools and Web100
    - store/query/archive performance data
  - evaluate network applications over DOE’s ESnet (OC12, OC48, 10GigE…)
    - bulk transfers over high bandwidth/delay network
    - distributed applications (grid)
- Investigate TCP optimizations
  - simulate/emulate/deploy
  - Linux kernel mods
- Autotune network applications
  - WAD (workaround daemon)

Web100 summary
- NSF funded (NCAR/PSC), web100.org
- Modified Linux kernel (2.4.9)
- Instrumented kernel to read/set TCP variables for a specific flow
  - readable: RTT, counts (bytes, pkts, retransmits, dups), state (SACKs, windowscale, cwnd, ssthresh) (115 variables!)
  - settable: buffer sizes
- GUI to display/modify a flow’s TCP variables, real-time
- API for network-aware applications
- Early evaluators: ANL, SLAC, LBNL, ORNL, universities

Motivation
- Bulk transfers are slow
  - faster links (OC12, OC48, 10GigE), but long delay
  - classic TCP tuning problem
  - also broken TCP stacks
  - under-provisioned routers/switches
  - TCP is lossy, slow to recover
    - tune it or replace it?
- Compute/data grids
  - sense/probe link bandwidths/latencies
  - schedule/configure distributed application
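The "classic TCP tuning problem" on fast, long-delay links comes down to the bandwidth-delay product: a sender can have at most one window of data in flight per round trip, so a small buffer caps throughput regardless of link speed. A minimal sketch of that arithmetic follows. The nominal line rates and the 70 ms round-trip time are illustrative assumptions, not measurements from the slides.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return round(bandwidth_bps * rtt_s / 8)


def window_limited_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_s


# Nominal line rates in bits per second (assumed round figures).
LINKS = {"OC12": 622.08e6, "OC48": 2488.32e6, "10GigE": 10e9}

if __name__ == "__main__":
    rtt = 0.070  # assumed cross-country round-trip time, in seconds
    for name, rate in LINKS.items():
        print(f"{name}: buffer needed ~ {bdp_bytes(rate, rtt) / 1e6:.1f} MB")
    # A 64 KiB window (no window scaling) on the same path:
    capped = window_limited_throughput_bps(64 * 1024, rtt)
    print(f"64 KiB window caps throughput at ~ {capped / 1e6:.1f} Mb/s")
```

The second function shows why untuned applications fall so far short of the link rate: the cap depends only on window size and delay.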

TCP losses
- Packet losses during startup, linear recovery
[Figure: plot with labels "0.5 Mbs", "instantaneous", "average", "Packet loss", "Early packet drops"]
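The cost of "linear recovery" can be estimated with simple arithmetic. Under standard AIMD congestion avoidance, a loss halves the congestion window and the window then grows by one segment per round trip, so regaining a window of W segments takes about W/2 round trips. The sketch below assumes that textbook model. The increase and decrease parameters, the 1460-byte MSS, and the link figures are illustrative assumptions.

```python
import math


def recovery_rtts(window_segments: float,
                  increase: float = 1.0,
                  decrease: float = 0.5) -> int:
    """Round trips for cwnd to climb back to window_segments after one loss.

    increase: segments added per RTT (additive increase).
    decrease: factor applied to cwnd on loss (multiplicative decrease).
    """
    reduced = window_segments * decrease
    return math.ceil((window_segments - reduced) / increase)


def recovery_seconds(bandwidth_bps: float, rtt_s: float,
                     mss_bytes: int = 1460,
                     increase: float = 1.0,
                     decrease: float = 0.5) -> float:
    """Time to regain a full-pipe window after a single loss."""
    window = bandwidth_bps * rtt_s / 8 / mss_bytes  # full-pipe window, segments
    return recovery_rtts(window, increase, decrease) * rtt_s


if __name__ == "__main__":
    # Assumed path: 1 Gb/s, 100 ms RTT.
    print(f"standard MSS: {recovery_seconds(1e9, 0.1):.1f} s")
    print(f"jumbo frames: {recovery_seconds(1e9, 0.1, mss_bytes=8960):.1f} s")
```

With these assumed figures a single loss costs minutes of reduced throughput at the standard MSS, which is why a larger MSS or modified AIMD parameters shorten recovery.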

TCP tuning (workarounds)
- Avoid losses
  - retain/probe for “optimal” buffer sizes
  - ECN-capable routers/hosts
  - reduce bursts (TCP Vegas)
- Faster recovery
  - bigger MSS (jumbo frames)
  - speculative recovery (D-SACK)
  - modified congestion avoidance?
- Autotune (WAD variables)
  - Buffer size
  - Dupthresh
  - Del ACK, Nagle
  - AIMD
  - Virtual MSS
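Two of the knobs listed above, buffer size and Nagle, are reachable from an ordinary application through the standard sockets API, which is the manual tuning a daemon such as WAD aims to make unnecessary. The sketch below uses Python's `socket` module. The 4 MB request is an arbitrary illustrative value, and the kernel may grant a different size depending on system limits. Dupthresh, AIMD parameters, and virtual MSS are not exposed this way and are not shown.

```python
import socket


def make_tuned_socket(buffer_bytes: int, disable_nagle: bool = False) -> socket.socket:
    """Create a TCP socket with explicit send/receive buffer requests.

    Buffers should be set before connect() so the window scale option
    negotiated at connection setup can reflect them.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    if disable_nagle:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


# Request 4 MB buffers (illustrative value) and read back what was granted.
sock = make_tuned_socket(4 * 1024 * 1024)
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
sock.close()
print(f"requested 4194304 bytes, kernel reports {granted} bytes")
```

Reading the value back matters because the operating system can silently clamp the request to its configured maximum.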

Tuning opportunities
- Parallel streams (psockets)
  - how to choose number of streams, buffer sizes?
  - autotune?
- Application routing daemons
  - indirect TCP
  - alternate path (Wolski, UCSB)
  - multipath (Rao, ORNL)
- Other protocols (SCTP, DCP)
  - out-of-order delivery
  - rate-based
- Are these fair?
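One rough way to approach "how to choose number of streams" is the well-known steady-state TCP throughput approximation, rate ≈ (MSS/RTT) · C/√p, with C ≈ √(3/2) and p the packet loss rate. The sketch below applies it under the idealized assumption that N parallel streams each see the same loss rate and add up linearly. That assumption, and the example path figures, are mine and not taken from the slides.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float,
                          c: float = math.sqrt(1.5)) -> float:
    """Approximate steady-state throughput of one TCP stream, in bits/s."""
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


def streams_needed(target_bps: float, mss_bytes: int,
                   rtt_s: float, loss_rate: float) -> int:
    """Streams required to reach target_bps, assuming linear aggregation."""
    per_stream = mathis_throughput_bps(mss_bytes, rtt_s, loss_rate)
    return math.ceil(target_bps / per_stream)


if __name__ == "__main__":
    # Assumed path: 1460-byte MSS, 100 ms RTT, 0.01% loss.
    one = mathis_throughput_bps(1460, 0.1, 1e-4)
    print(f"one stream: ~{one / 1e6:.1f} Mb/s")
    print(f"streams for 100 Mb/s: {streams_needed(100e6, 1460, 0.1, 1e-4)}")
```

The same arithmetic bears on the fairness question: N streams claim roughly N times the share of a single conforming flow.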

Work-around Daemon (WAD)
- Version 0
  - passively collect flow data
  - tune unknowing sender/receiver
  - config file with “tuning info”?
  - based on Web100/Linux 2.4
- To be done
  - collecting tuning info
  - adding more knobs to kernel
- Related work
  - Feng’s Dynamic Right Sizing
  - Linux 2.4 auto-tuning/caching
  - Mathis TCP buffer tuning
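The "config file with tuning info" idea can be pictured as a lookup from a flow's remote address to a set of tuning values. The sketch below is purely hypothetical: the `Tuning` fields, the table contents, and the longest-prefix matching rule are assumptions made for illustration, not the WAD file format or its kernel interface. The addresses come from documentation-only ranges.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tuning:
    """Hypothetical per-destination tuning record."""
    buffer_bytes: int
    dupthresh: int = 3        # duplicate ACKs before fast retransmit
    delayed_ack: bool = True


# Hypothetical tuning table keyed by destination prefix.
TUNING_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): Tuning(buffer_bytes=4_000_000),
    ipaddress.ip_network("198.51.100.0/24"): Tuning(buffer_bytes=1_000_000,
                                                    dupthresh=9),
}


def lookup_tuning(remote_addr: str) -> Optional[Tuning]:
    """Return the tuning for the most specific matching prefix, if any."""
    addr = ipaddress.ip_address(remote_addr)
    matches = [(net, t) for net, t in TUNING_TABLE.items() if addr in net]
    if not matches:
        return None  # unknown destination: leave the flow untuned
    return max(matches, key=lambda m: m[0].prefixlen)[1]


if __name__ == "__main__":
    print(lookup_tuning("192.0.2.10"))
    print(lookup_tuning("203.0.113.5"))
```

A daemon built this way would consult the table whenever it sees a new flow, so the application itself never has to know about tuning.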

Network Tool Analysis Framework (NTAF)
- Configure and launch network tools
  - measure bandwidth/latency (iperf, pchar, pipechar)
  - collect passive data (SNMP from routers, OS/Web100 counters)
  - forecast bandwidth/latency for grid resource scheduling
  - augment tools to report Web100 data
- Collect and transform tool results into a common format
- Save results for short-term auto-tuning and archive for later analysis
  - compare predicted to actual performance
  - measure effectiveness of tools and auto-tuning
- Auto-tune network applications
  - WAD (WorkAround Daemon)
  - tunable TCP stack
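Collecting tool results "into a common format" and forecasting from them might look like the sketch below. The `Measurement` record and the exponentially weighted moving average forecast are hypothetical choices made for illustration. The slides do not specify NTAF's schema or its forecasting method, and the sample values are invented placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Measurement:
    """Hypothetical common record for results from different tools."""
    tool: str         # e.g. "iperf" or "pipechar"
    src: str
    dst: str
    metric: str       # e.g. "bandwidth_mbps" or "rtt_ms"
    value: float
    timestamp: float  # seconds since the epoch


def forecast(samples: Iterable[Measurement], src: str, dst: str,
             metric: str, alpha: float = 0.5) -> Optional[float]:
    """Exponentially weighted moving average over one path and metric."""
    estimate = None
    for m in sorted(samples, key=lambda s: s.timestamp):
        if (m.src, m.dst, m.metric) != (src, dst, metric):
            continue
        if estimate is None:
            estimate = m.value
        else:
            estimate = alpha * m.value + (1 - alpha) * estimate
    return estimate


if __name__ == "__main__":
    history = [  # invented placeholder samples
        Measurement("iperf", "hostA", "hostB", "bandwidth_mbps", 100.0, 1.0),
        Measurement("pipechar", "hostA", "hostB", "bandwidth_mbps", 200.0, 2.0),
        Measurement("iperf", "hostA", "hostB", "rtt_ms", 70.0, 2.0),
    ]
    print(forecast(history, "hostA", "hostB", "bandwidth_mbps"))
```

Keeping the predicted value next to later measurements is what would allow predicted and actual performance to be compared.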

Net100 interactions
- Net100 is both a producer and consumer of network performance data
  - Active probes (Claffy Bandwidth Estimation, INCITE)
  - Passive sensors (LBL Network monitoring)
- Auto-tuning
  - TCP optimizations (Feng/LANL, Linux 2.4)
  - smart transfer (IQecho, Logistical networking)
  - non-TCP protocols (DCP, STP, SCTP, rate-based, ?)
- Net100 tuning could be applied to distributed applications
  - Climate/Probe, SuperNova, DataGrids
  - interact with Grid metaware (forecasting, scheduling, tuning)