A TCP Tuning Daemon
SC2002, November 19, 2002
Tom Dunigan, Matt Mathis, Brian Tierney


UT-BATTELLE U.S. Department of Energy Oak Ridge National Laboratory

Roadmap
Motivation
Net100 project
  –Web100
  –network probes & sensors
  –protocol analysis
A TCP tuning daemon
Tuning experiments

… and now a word from our sponsors
DOE-funded project (Office of Science)
$1M/yr, 3 yrs beginning 9/01
LBL, ORNL, PSC, NCAR
Net100 project objectives: (network-aware operating systems)
  –measure, understand, and improve end-to-end network/application performance
  –tune network protocols and applications (grid and bulk transfer)
  –first year emphasis: TCP bulk transfer over high delay/bandwidth nets

Motivation
Poor network application performance
  –High bandwidth paths, but app’s slow
  –Is it application? OS? network? … Yes
  –Often need a network “wizard”
Changing: bandwidths
  –9.6 Kb/s … 1.5 Mb/s … 45 … 100 … 1000 … ? Gb/s
Unchanging: TCP
  –speed of light (RTT)
  –MTU (still 1500 bytes)
  –TCP congestion avoidance
TCP is lossy by design!
  –2x overshoot at startup, sawtooth
  –recovery after a loss can be very slow on today’s high delay/bandwidth links
  –Recovery proportional to MSS/RTT^2
[Plot: ORNL to NERSC ftp, 8 Mb/s, GigE/OC12, 80 ms RTT. Curves: instantaneous bandwidth, average bandwidth. Annotations: early startup losses; linear recovery at 0.5 Mb/s!]

TCP tuning
set optimal (?) buffer size
  –need buffer = bandwidth*RTT
  –ORNL/NERSC (80 ms, OC12) need 6 MB
avoid losses
  –modified slow-start
  –reduce bursts
  –anticipate loss (ECN, Vegas?)
  –reorder threshold
speed recovery
  –bigger MTU or “virtual MSS”
  –modified AIMD (0.5,1)
  –delayed ACKs and initial window
avoid congestion collapse
be fair (?) … intranets, QoS
[Plot: ns simulation, 500 Mb/s link, 80 ms RTT. Packet loss early in slow start. Standard TCP with delayed ACKs takes 10 minutes to recover!]
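
The buffer-size and recovery figures on this slide can be roughly reproduced with a few lines of arithmetic. The sketch below is illustrative only: the 1460-byte MSS (derived from the 1500-byte MTU), the 622.08 Mb/s OC12 line rate, the assumption that an early slow-start loss leaves the window near one segment, and all function names are assumptions that do not come from the slides.

```python
"""Back-of-the-envelope TCP buffer and recovery estimates (illustrative only)."""

OC12_BPS = 622.08e6   # assumed OC12 line rate, bits per second
MSS_BYTES = 1460      # assumed MSS for a 1500-byte MTU


def buffer_bytes(bandwidth_bps, rtt_s):
    """Buffer needed to keep the path full: bandwidth * RTT, in bytes."""
    return bandwidth_bps * rtt_s / 8


def recovery_seconds(bandwidth_bps, rtt_s, start_segments=1.0,
                     ai_segments=1.0, rtts_per_increase=1.0,
                     mss_bytes=MSS_BYTES):
    """Time for linear (congestion-avoidance) growth to refill the window.

    start_segments    -- window size when linear growth begins (assumed)
    ai_segments       -- additive increase, segments added per step
    rtts_per_increase -- 1 without delayed ACKs, about 2 with them (assumed)
    """
    full_window = buffer_bytes(bandwidth_bps, rtt_s) / mss_bytes
    missing = max(full_window - start_segments, 0.0)
    return missing / ai_segments * rtts_per_increase * rtt_s


if __name__ == "__main__":
    # ORNL/NERSC path: OC12, 80 ms RTT
    print("buffer (MB):", buffer_bytes(OC12_BPS, 0.080) / 1e6)
    # ns scenario: 500 Mb/s link, 80 ms RTT, delayed ACKs
    slow = recovery_seconds(500e6, 0.080, rtts_per_increase=2.0)
    print("recovery with delayed ACKs (min):", slow / 60)
```

Under these assumptions the buffer comes out a little over 6 MB and the delayed-ACK recovery at roughly nine minutes, in line with the "6 MB" and "10 minutes" quoted on the slide. Raising `ai_segments` or setting `rtts_per_increase` to 1 shortens the recovery proportionally.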

Net100 components for tuning
TCP protocol analysis
  –simulation/emulation
  –kernel tuning extensions
Web100 Linux kernel (NSF)
  –instrumented TCP stack (IETF MIB draft)
  –100+ variables per flow (/proc/web100)
  –socket open/close event notification
  –API and tools for tracing and tuning, e.g., bw tester: firebird.ccs.ornl.gov:7123
Path characterization
  –Network Tuning and Analysis Framework (NTAF)
  –both active and passive measurement: iperf, pipechar
  –schedule probes and distribute/archive results
  –database of measurements
  –NTAF/Net100 hosts at PSC, NCAR, LBL, ORNL, NERSC, CERN, UT, SLAC
TCP tuning daemon

TCP Tuning Daemon
Work-around Daemon (WAD)
  –tune unknowing sender/receiver at startup and/or during flow
  –Web100 kernel extensions
      pre-set windowscale to allow dynamic tuning
      uses netlink to alert daemon of socket open/close (or poll)
      besides existing Web100 buffer tuning, new tuning options using WAD_* variables
      knobs to disable Linux 2.4 caching, burst mgt., and sendstall
  –config file with static tuning data
      mode specifies dynamic tuning (Floyd AIMD, NTAF buffer size, concurrent streams)
  –daemon periodically polls NTAF for fresh tuning data
  –written in C (also python version)

WAD config file
  [bob]
  src_addr:
  src_port: 0
  dst_addr:
  dst_port: 0
  mode: 1
  sndbuf:
  rcvbuf:
  wadai: 6
  wadmd: 0.3
  maxssth: 100
  divide: 1
  reorder: 9
  sendstall: 0
  delack: 0
  floyd: 1
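
The config file shown on this slide is INI-like (a bracketed section name followed by `key: value` lines), so it can be read with Python's standard `configparser`. The sketch below is an assumption about how such a file could be loaded, not the WAD's actual parser; the function name `load_wad_config` is hypothetical, and the address and buffer fields are left blank because their values do not appear in the transcript.

```python
import configparser

# Section copied from the slide; blank fields are blank in the transcript.
SAMPLE_CONFIG = """\
[bob]
src_addr:
src_port: 0
dst_addr:
dst_port: 0
mode: 1
sndbuf:
rcvbuf:
wadai: 6
wadmd: 0.3
maxssth: 100
divide: 1
reorder: 9
sendstall: 0
delack: 0
floyd: 1
"""


def load_wad_config(text):
    """Parse WAD-style config text into {section: {key: value}}.

    Hypothetical helper: empty values become None, numeric values become
    int or float, anything else is kept as a string.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    flows = {}
    for name in parser.sections():
        entry = {}
        for key, raw in parser.items(name):
            value = raw.strip()
            if value == "":
                entry[key] = None
                continue
            try:
                entry[key] = float(value) if "." in value else int(value)
            except ValueError:
                entry[key] = value
        flows[name] = entry
    return flows


if __name__ == "__main__":
    for key, value in load_wad_config(SAMPLE_CONFIG)["bob"].items():
        print(key, "=", value)
```

Each section would then describe one flow to tune, keyed by source and destination, with `wadai` and `wadmd` carrying the additive-increase and multiplicative-decrease settings.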

Experimental results
Evaluating the tuning daemon in the wild
  –emphasis: bulk transfers over high delay/bandwidth nets (Internet2, ESnet)
  –tests over: 10GigE, OC48, OC12, OC3, ATM/VBR, GigE, FDDI, 100/10T, cable, ISDN, wireless (802.11b), dialup
  –tests over NistNET 100T testbed
Various TCP tuning options
  –buffer tuning
  –AIMD mods (including Floyd, both in-kernel and in WAD)
  –slow-start mods
  –parallel vs single
Results are anecdotal
  –more systematic testing is on-going
  –Your mileage may vary ….
Network professionals on a closed course. Do not attempt this at home.

WAD tuning results
Classic buffer tuning
  ORNL to PSC, OC12, 80 ms RTT
  –network-challenged app. gets 10 Mb/s
  –same app., WAD/NTAF tuned buffer gets 143 Mb/s
Virtual MSS
  –tune TCP’s additive increase (WAD_AI)
  –add k segments per RTT during recovery
  –k=6 like GigE jumbo frame, but:
      interrupt rate not reduced
      doesn’t do k segments for initial window
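
The effect of the virtual MSS can be seen by counting round trips during linear recovery. The sketch below is a simplified model, not WAD code: it assumes a loss halves the window, growth is exactly `ai` segments per RTT with no further losses, and the window sizes are made-up illustrative values.

```python
def rtts_to_recover(target_segments, start_segments, ai=1):
    """Count RTTs for cwnd to grow from start to target, adding ai per RTT."""
    cwnd = start_segments
    rtts = 0
    while cwnd < target_segments:
        cwnd += ai   # additive increase: ai segments per round trip
        rtts += 1
    return rtts


if __name__ == "__main__":
    full, halved = 3000, 1500   # illustrative window sizes, in segments
    print("standard AI (k=1):", rtts_to_recover(full, halved, ai=1), "RTTs")
    print("virtual MSS (k=6):", rtts_to_recover(full, halved, ai=6), "RTTs")
```

With k=6 the window regrows six times faster, which is the recovery behavior a 9000-byte jumbo frame would give over 1500-byte frames, although the packets on the wire (and therefore the interrupt rate) stay the same.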

Tuning around Linux (2.4) TCP
Tunable ssthresh caching
Tunable “sendstall” (txqueuelen)
[Plot: 600 Mb/s Amsterdam-Chicago GigE via 10GigE, 100 ms RTT; sendstalls and a UDP event marked]
Floyd AIMD
  –as cwnd grows increase AI and decrease MD, do the reverse when cwnd shrinks
  –Added to Net100 kernel and to WAD (WAD tunable)
[Plot: Floyd AIMD vs. standard AIMD]
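
The window-dependent AI and MD described above can be sketched as follows, assuming that "Floyd AIMD" refers to Sally Floyd's HighSpeed TCP proposal (later published as RFC 3649). The constants are that RFC's default parameters; whether the Net100 kernel or the WAD used these exact values is not stated in the slides, and the function names are illustrative.

```python
import math

# Default HighSpeed TCP parameters from RFC 3649 (assumed, not from the slides)
LOW_WINDOW = 38        # at or below this cwnd, behave like standard TCP
HIGH_WINDOW = 83000
HIGH_DECREASE = 0.1    # multiplicative decrease at HIGH_WINDOW


def md(cwnd):
    """Multiplicative-decrease fraction b(w): 0.5 for small windows, less for large."""
    if cwnd <= LOW_WINDOW:
        return 0.5
    frac = (math.log(cwnd) - math.log(LOW_WINDOW)) / (
        math.log(HIGH_WINDOW) - math.log(LOW_WINDOW))
    return (HIGH_DECREASE - 0.5) * frac + 0.5


def loss_rate(cwnd):
    """Loss rate p(w) at which the response function sustains window w."""
    return 0.078 / cwnd ** 1.2


def ai(cwnd):
    """Additive increase a(w) in segments per RTT (left unrounded here)."""
    if cwnd <= LOW_WINDOW:
        return 1.0
    b = md(cwnd)
    return cwnd * cwnd * loss_rate(cwnd) * 2.0 * b / (2.0 - b)


if __name__ == "__main__":
    for w in (38, 100, 1000, 10000, 83000):
        print(f"cwnd={w:6d}  AI={ai(w):6.2f}  MD={md(w):.3f}")
```

At small windows this reduces to the standard (0.5, 1) behavior; as cwnd grows, AI rises and MD falls, so a single loss costs less and is repaired faster.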

WAD tuning
Modified slow-start and AI
  ORNL to NERSC, OC12, 80 ms RTT
  –often losses in slow-start
  –WAD tuned Floyd slow-start and fixed AI (6)
WAD-tuned AIMD and slow-start
  ORNL to CERN, OC12, 150 ms RTT
  –parallel streams AIMD (1/(2k),k)
  –WAD-tuned single stream (0.125,4)
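
The (1/(2k), k) rule above can be checked with a short calculation: k parallel streams together add k segments per RTT, and a single loss halves only one of them, so the aggregate window shrinks by 1/(2k). The sketch below assumes the k streams share the window equally and that a loss hits exactly one stream; the function names are illustrative.

```python
def equivalent_aimd(k):
    """(MD, AI) for one stream tuned to mimic k equal parallel streams."""
    return 1.0 / (2 * k), k


def aggregate_after_single_loss(total_window, k):
    """Aggregate window of k equal streams after one of them halves."""
    per_stream = total_window / k
    return total_window - per_stream / 2


if __name__ == "__main__":
    md, ai = equivalent_aimd(4)
    print("k=4 gives MD, AI =", (md, ai))
    # Same drop seen two ways: one of four streams halving vs. MD=1/(2k)
    print(aggregate_after_single_loss(1000, 4), 1000 * (1 - md))
```

For k=4 this gives the (0.125, 4) setting quoted on the slide for the WAD-tuned single stream.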

GridFTP tuning
Can tuned single stream compete with parallel streams?
Mostly not with “equivalence” tuning, but sometimes….
Parallel streams have slow-start advantage.
WAD can divide buffer among concurrent flows: fairer/faster?
Tests inconclusive so far….
Testing on real Internet is problematic.
Is there a “congestion metric”? Per unit of time?
[Table: Flow, Mb/s, congestion, re-xmits for untuned, tuned, and parallel flows]
Data/plots from Web100 tracer
Buffers: 64K I/O, 4MB TCP (untuned 64K TCP: 8 Mb/s, 200s)

Future TCP tuning
Reorder threshold
  –seeing more out of order packets
  –WAD tune a bigger reorder threshold for path
      40x improvement!
  –Linux 2.4 does a good job already
      adjusts and caches reorder threshold
      “undo” congestion avoidance
Delayed ACKs
  –WAD could turn off delayed ACKs: 2x improvement in recovery rate and slow-start
  –Linux 2.4 already turns off delayed ACKs for initial slow-start
[Plot: ns simulation, 500 Mb/s link, 80 ms RTT. Packet loss early in slow-start. Standard TCP with delayed ACKs takes 10 minutes to recover! NOTE aggressive static AIMD (Floyd pre-tune)]
LBL to ORNL (using our TCP-over-UDP): dup3 case had 289 retransmits, but all were unneeded!

Futures
Net100
  –analyze effectiveness/fairness of current tuning options
      simulation
      emulation
      on the net (systematic tests)
  –NTAF probes: characterizing a path to tune a flow
      router data (passive)
      monitoring applications with Web100
  –additional tuning algorithms
      Vegas, ECN
      non-TCP
      identify non-congestive loss?
  –parallel/multipath selection/tuning
  –WAD-to-WAD tuning
  –jumbo frames experiments… the quest for bigger and bigger MTUs
  –more user-friendly Web100 extensions
  –refine user interface and API
  –port to other OS’s

Summary
Novel approaches
  –non-invasive dynamic tuning of legacy applications
  –using TCP to tune TCP (Web100)
  –tuning on a per flow/destination basis
Effective evaluation framework
  –protocol analysis and tuning + net/app/OS debugging
  –out-of-kernel tuning
Beneficial interactions
  –TCP protocols (Floyd, Wu Feng (DRS), Web100, parallel/non-TCP)
  –Path characterization research (SciDAC, CAIDA, Pinger)
  –Scientific application and Data grids (SciDAC, CERN)
Performance improvements