Breaking the 1 GByte/sec Barrier? High Speed WAN Data Transfers for Science
S. Ravot, J. Bunn, H. Newman, Y. Xia, D. Nae, California Institute of Technology
CHEP 2004 Network Session, September 1, 2004

Agenda
- TCP limitations
- Internet2 Land Speed Record
- New TCP implementations
- Disk-to-disk transfers: breaking the 1 GByte/sec barrier
- Light paths: UltraLight

Single TCP stream performance under periodic losses
- Available bandwidth = 1 Gbps; loss rate = 0.01%:
  - LAN BW utilization = 99%
  - WAN BW utilization = 1.2%
- TCP throughput is much more sensitive to packet loss in WANs than in LANs (a rough throughput estimate illustrating this follows below)
- TCP's congestion control algorithm (AIMD) is not well suited to gigabit networks; the effect of packet loss can be disastrous
- TCP is inefficient in high bandwidth*delay networks
- The performance outlook for computational grids is poor if we continue to rely solely on the widely deployed TCP Reno
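A rough way to see why the same loss rate hurts so much more over the WAN is the standard Mathis et al. steady-state estimate for TCP Reno, throughput ~ (MSS/RTT) * C / sqrt(p). This formula is not on the slide itself, and the RTT and MSS values below are illustrative assumptions, but the resulting LAN/WAN contrast matches the utilization figures above.

```python
import math

def reno_throughput_bps(mss_bytes, rtt_s, loss_rate, c=1.22):
    """Mathis et al. steady-state estimate of TCP Reno throughput, in bits/s."""
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)

loss = 1e-4   # 0.01% packet loss rate, as on the slide
mss = 1460    # standard Ethernet MSS (assumption)
for label, rtt in [("LAN (~1 ms RTT, assumed)", 0.001),
                   ("WAN Geneva-LA (~180 ms RTT, assumed)", 0.180)]:
    bw = reno_throughput_bps(mss, rtt, loss)
    print(f"{label}: ~{bw/1e9:.2f} Gbit/s achievable vs 1 Gbit/s available")
```

On the LAN the estimate exceeds the 1 Gbit/s available (full utilization), while on the WAN it drops to a few Mbit/s, i.e. around 1% utilization.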

Single TCP stream between Caltech and CERN
- Available (PCI-X) bandwidth = 8.5 Gbps
- RTT = 250 ms (16,000 km)
- 9000-byte MTU
- 15 min to increase throughput from 3 to 6 Gbps
- Sending station: Tyan S2882 motherboard, 2x Opteron 2.4 GHz, 2 GB DDR
- Receiving station: CERN OpenLab HP rx4640, 4x 1.5 GHz Itanium-2, zx1 chipset, 8 GB memory
- Network adapter: S2io 10 GbE
(Throughput plot annotations: burst of packet losses, single packet loss, CPU load = 100%.)

Responsiveness
Time to recover from a single packet loss: T = C * RTT^2 / (2 * MSS), where C is the capacity of the link.

Path               | Bandwidth | RTT (ms) | MTU (Byte) | Time to recover
LAN                | 10 Gb/s   | …        | …          | … ms
Geneva-Chicago     | 10 Gb/s   | …        | …          | … hr 32 min
Geneva-Los Angeles | 1 Gb/s    | …        | …          | … min
Geneva-Los Angeles | 10 Gb/s   | …        | …          | … hr 51 min
Geneva-Los Angeles | 10 Gb/s   | …        | …          | … min
Geneva-Los Angeles | 10 Gb/s   | …        | 64k (TSO)  | 5 min
Geneva-Tokyo       | 1 Gb/s    | …        | …          | … hr 04 min

- A large MTU accelerates the growth of the congestion window, so the time to recover from a packet loss decreases with a larger MTU
- A larger MTU also reduces per-frame overhead (saves CPU cycles, reduces the number of packets)
(A worked example of the recovery-time formula follows below.)
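As a quick check of the formula above, here is a small calculation of T = C * RTT^2 / (2 * MSS) for a 10 Gb/s path. The 180 ms RTT and the segment sizes are assumptions chosen to illustrate the MTU effect, not values read from the (partly truncated) table.

```python
def recovery_time_s(capacity_bps, rtt_s, mss_bytes):
    """Time for AIMD to rebuild cwnd after a single loss: T = C * RTT^2 / (2 * MSS)."""
    return capacity_bps * rtt_s ** 2 / (2 * mss_bytes * 8)

capacity = 10e9   # 10 Gb/s link
rtt = 0.180       # ~180 ms Geneva - Los Angeles RTT (assumption)
for label, mss in [("1460-byte MSS (1500-byte MTU)", 1460),
                   ("8960-byte MSS (9000-byte MTU)", 8960),
                   ("64 kB segments (TSO)", 64 * 1024)]:
    t = recovery_time_s(capacity, rtt, mss)
    print(f"{label}: ~{t/60:.1f} minutes to recover")
```

With these assumptions the recovery time drops from several hours with a 1500-byte MTU to tens of minutes with a 9000-byte MTU, and to about 5 minutes with 64 kB TSO segments, which is consistent with the trend in the table.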

Internet2 Land Speed Record (LSR)
- IPv6 record: 4.0 Gbps between Geneva and Phoenix (SC2003)
- IPv4 single-stream record (June 2004): 6.6 Gbps over 16,000 km with Linux
- We have exceeded 100 Petabit-meters/sec (1 Petabit-meter = 10^15 bit*meter) with both Linux and Windows (see the quick check below)
- Distance has no significant effect on performance
(Figures: LSR history for the IPv4 single stream; monitoring of the Abilene traffic in Los Angeles.)
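A quick check of the bit-meter product behind the record claim; the rate and distance are taken from the bullets above, and the arithmetic is purely illustrative.

```python
rate_bps = 6.6e9        # 6.6 Gbit/s single stream
distance_m = 16_000e3   # ~16,000 km path length
product = rate_bps * distance_m
print(f"{product:.3e} bit*m/s = {product/1e15:.0f} Petabit-m/s")
# ~1.06e17 bit*m/s, i.e. about 106 Petabit-m/s, comfortably above 100
```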

TCP Variants
- HSTCP, Scalable TCP, Westwood+, BIC TCP, H-TCP
- FAST TCP
  - Based on TCP Vegas
  - Uses end-to-end delay and loss to dynamically adjust the congestion window (a sketch of the window update follows below)
  - Defines an explicit equilibrium
  - 7.3 Gbps between CERN and Caltech, sustained for hours
- FAST vs Reno (capacity = 1 Gbps, 180 ms round-trip latency, 1 flow, 3600 s):
  - Linux TCP: 19% BW utilization
  - Linux TCP (optimized): 27% BW utilization
  - FAST: 95% BW utilization
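For reference, a minimal sketch of the FAST TCP window update as published by Jin, Wei and Low: once per RTT the window is pulled toward an equilibrium at which roughly alpha packets are buffered in the path. The parameter values and the toy loop below are illustrative assumptions, not the production implementation.

```python
def fast_window_update(w, base_rtt, rtt, alpha=200, gamma=0.5):
    """One FAST TCP update step (applied once per RTT):
    w <- min(2w, (1-gamma)*w + gamma*((base_rtt/rtt)*w + alpha))."""
    target = (base_rtt / rtt) * w + alpha
    return min(2 * w, (1 - gamma) * w + gamma * target)

# Toy example: 180 ms propagation delay plus a little queueing delay (assumed fixed).
base_rtt, rtt, w = 0.180, 0.185, 100.0
for _ in range(50):
    w = fast_window_update(w, base_rtt, rtt)
print(f"window after 50 RTTs: ~{w:.0f} packets")
# At equilibrium (base_rtt/rtt)*w + alpha = w, i.e. w = alpha*rtt/(rtt-base_rtt) = 7400 packets here.
```

The point of the explicit equilibrium is that the window converges smoothly instead of oscillating around losses as AIMD does.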

TCP variants performance
- Tests between CERN and Caltech
- Capacity = OC-192 (10 Gb/s); 264 ms round-trip latency; 1 flow
- Sending station: Tyan S2882 motherboard, 2x Opteron 2.4 GHz, 2 GB DDR
- Receiving station (CERN OpenLab): HP rx4640, 4x 1.5 GHz Itanium-2, zx1 chipset, 8 GB memory
- Network adapter: S2io 10 GbE
(Chart of measured single-flow throughputs: Linux TCP 3.0 Gbps, Linux Westwood+ 4.1 Gbps, Linux BIC TCP 5.0 Gbps, FAST 7.3 Gbps.)

New TCP implementations
- Web100 patch: HSTCP, Scalable TCP, FAST (soon)
- New stacks are part of the Linux kernel as options, enabled via sysctl:
  - Westwood+: sysctl -w net.ipv4.tcp_westwood=1
  - BIC TCP: sysctl -w net.ipv4.tcp_bic=1
  - TCP Vegas: sysctl -w net.ipv4.tcp_vegas=1
- Sender-side modification

High throughput disk-to-disk transfers: from 0.1 to 1 GByte/sec
- Server hardware (rather than network) bottlenecks:
  - Read/write and transmit tasks share the same limited resources: CPU, PCI-X bus, memory, I/O chipset
  - PCI-X bus bandwidth: 8.5 Gbps [133 MHz x 64 bit] (see the quick calculation below)
- Link aggregation (802.3ad): one logical interface built from two physical interfaces on two independent PCI-X buses
  - LAN test: 11.1 Gbps (memory to memory)
- Performance in this range (from 100 MByte/sec up to 1 GByte/sec) is required to build a responsive Grid-based processing and analysis system for the LHC
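A quick sanity check of the bus numbers quoted above; the 133 MHz clock and 64-bit width come from the slide, while treating the bus as ideal (no protocol overhead) is an assumption.

```python
clock_hz = 133.33e6   # PCI-X clock from the slide (133 MHz)
width_bits = 64       # bus width from the slide
single_bus_bps = clock_hz * width_bits
print(f"one PCI-X bus: {single_bus_bps/1e9:.2f} Gbit/s theoretical peak")
print(f"two buses (802.3ad aggregation): {2*single_bus_bps/1e9:.1f} Gbit/s upper bound")
# ~8.5 Gbit/s per bus, ~17 Gbit/s for two; the measured 11.1 Gbit/s LAN result
# sits between a single bus and the two-bus upper bound.
```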

Transferring a TB from Caltech to CERN under 64-bit MS Windows
- Latest disk-to-disk over the 10 Gbps WAN: 4.3 Gbits/sec (536 MB/sec), 8 TCP streams from CERN to Caltech, 1 TB file (see the transfer-time estimate below)
- 3 Supermicro Marvell SATA disk controllers + 24 SATA 7200 rpm disks
  - Local disk I/O: 9.6 Gbits/sec (1.2 GBytes/sec read/write, with <20% CPU utilization)
- S2io SR 10 GbE NIC
  - 1x 10 GbE NIC: 7.5 Gbits/sec (memory to memory, with 52% CPU utilization)
  - 2x 10 GbE NIC (802.3ad link aggregation): 11.1 Gbits/sec (memory to memory)
- Memory-to-memory WAN data flow and local memory-to-disk read/write flow are not matched when the two operations are combined
- Quad AMD Opteron (… GHz) processors with 3 AMD-8131 chipsets: 4 64-bit/133 MHz PCI-X slots
- Interrupt Affinity Filter: lets the user change the CPU affinity of interrupts in the system
- Packet loss is overcome with re-connect logic
- Proposed Internet2 Terabyte File Transfer Benchmark
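A small check of the rate and transfer-time figures quoted above; treating 1 TB as 10^12 bytes is an assumption.

```python
file_bytes = 1e12          # 1 TB file (decimal terabyte assumed)
rate_bytes_per_s = 536e6   # 536 MB/sec disk-to-disk rate from the slide
duration_s = file_bytes / rate_bytes_per_s
print(f"rate: {rate_bytes_per_s*8/1e9:.1f} Gbit/s")          # ~4.3 Gbit/s
print(f"1 TB transfer time: ~{duration_s/60:.0f} minutes")   # ~31 minutes
```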

UltraLight: Developing Advanced Network Services for Data-Intensive HEP Applications
- UltraLight: a next-generation hybrid packet- and circuit-switched network infrastructure
  - Packet-switched: cost-effective solution; requires ultrascale protocols to share 10G efficiently and fairly
  - Circuit-switched: scheduled or sudden "overflow" demands handled by provisioning additional wavelengths; use path diversity, e.g. across the US, Atlantic, Canada, …
- Extend and augment existing grid computing infrastructures (currently focused on CPU/storage) to include the network as an integral component
- Use MonALISA to monitor and manage global systems
- Partners: Caltech, UF, FIU, UMich, SLAC, FNAL, MIT/Haystack; CERN, NLR, CENIC, Internet2; TransLight, UKLight, NetherLight; UvA, UCL, KEK, Taiwan
- Strong support from Cisco

UltraLight Optical Exchange Point
- L1, L2 and L3 services
- Interfaces
  - 1 GE and 10 GE
  - 10 GE WAN-PHY (SONET friendly)
- Hybrid packet- and circuit-switched PoP
  - Interface between packet- and circuit-switched networks
- Control plane is L3
(Photo: Calient photonic cross-connect switch.)

UltraLight MPLS Network
- Compute a path from one given node to another such that the path does not violate any constraints (bandwidth/administrative requirements); a toy constrained-path sketch follows below
- Ability to set the path the traffic will take through the network (with simple configuration, management, and provisioning mechanisms)
  - Take advantage of the multiplicity of waves/L2 channels across the US (NLR, HOPI, Ultranet and Abilene/ESnet MPLS services)
- EoMPLS will be used to build layer-2 paths
- A natural step toward the deployment of GMPLS
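To make the constraint-based path computation concrete, here is a minimal sketch of the usual approach: prune links that cannot satisfy the bandwidth constraint, then run an ordinary shortest-path search on what remains. The topology, link capacities and function names are invented for illustration; this is not UltraLight's actual provisioning code.

```python
import heapq

def constrained_path(links, src, dst, min_bw_gbps):
    """Constraint-based routing sketch: drop links below the requested bandwidth,
    then run Dijkstra (by link cost) on the pruned graph."""
    graph = {}
    for a, b, bw, cost in links:
        if bw >= min_bw_gbps:  # prune links that violate the bandwidth constraint
            graph.setdefault(a, []).append((b, cost))
            graph.setdefault(b, []).append((a, cost))
    heap, seen = [(0, src, [src])], set()
    while heap:
        dist, node, path = heapq.heappop(heap)
        if node == dst:
            return dist, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, cost in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(heap, (dist + cost, nxt, path + [nxt]))
    return None  # no path satisfies the constraint

# Hypothetical wave/L2-channel topology (names and numbers are illustrative only).
links = [("Caltech", "StarLight", 10, 1), ("StarLight", "CERN", 10, 1),
         ("Caltech", "Abilene", 2.5, 1), ("Abilene", "CERN", 2.5, 1)]
print(constrained_path(links, "Caltech", "CERN", min_bw_gbps=10))
# -> (2, ['Caltech', 'StarLight', 'CERN']); the 2.5 Gb/s route is pruned by the constraint.
```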

Tenth of a Terabit/sec Challenge
- Joint Caltech, FNAL, CERN, SLAC, UF, …
- … Gbps waves to the HEP booth
- Bandwidth Challenge: aggregate throughput of 100 Gbps
- FAST TCP

Summary  For many years the Wide Area Network has been the bottleneck; this is no longer the case in many countries thus making deployment of a data intensive Grid infrastructure possible!  Recent I2LSR records show for the first time ever that the network can be truly transparent and that throughputs are limited by the end hosts  Challenge shifted from getting adequate bandwidth to deploying adequate infrastructure to make effective use of it!  Some transport protocol issues still need to be resolved; however there are many encouraging signs that practical solutions aremay now be in sight.  1GByte/sec disk to disk challenge. Today: 1 TB at 536 MB/sec from CERN to Caltech  Still in Early Stages; Expect Substantial Improvements  Next generation network and Grid system: UltraLight  Deliver the critical missing component for future eScience: the integrated, managed network  Extend and augment existing grid computing infrastructures (currently focused on CPU/storage) to include the network as an integral component.