
1 Breaking the 1 GByte/sec Barrier? High Speed WAN Data Transfers for Science
S. Ravot, J. Bunn, H. Newman, Y. Xia, D. Nae (California Institute of Technology)
CHEP 2004 Network Session, September 1, 2004

2 Agenda
- TCP limitations
- Internet2 Land Speed Record
- New TCP implementations
- Disk-to-disk transfers: breaking the 1 GByte/sec barrier
- Light paths: UltraLight

3 Single TCP stream performance under periodic losses
With 1 Gbps of available bandwidth and a loss rate of 0.01%:
- LAN bandwidth utilization: 99%
- WAN bandwidth utilization: 1.2%
- TCP throughput is much more sensitive to packet loss on WANs than on LANs
- TCP's congestion control algorithm (AIMD) is not well suited to gigabit networks
- The effect of packet loss can be disastrous
- TCP is inefficient in high bandwidth*delay networks
- The performance outlook for computational grids is poor if we continue to rely solely on the widely deployed TCP Reno
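The gap between the LAN and WAN figures follows from the standard steady-state TCP throughput model (the Mathis et al. formula, which the slide does not quote explicitly), where p is the packet loss rate:

\[ \text{Throughput} \approx \frac{MSS}{RTT} \cdot \frac{1.22}{\sqrt{p}} \]

Assuming MSS = 1460 bytes and p = 10^-4 (0.01%), a transatlantic RTT of roughly 120 ms yields about 12 Mb/s, i.e. about 1.2% of a 1 Gb/s link, while a sub-millisecond LAN RTT yields far more than the link can carry, hence utilization near 99%.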

4 Single TCP stream between Caltech and CERN
- Available (PCI-X) bandwidth = 8.5 Gbps
- RTT = 250 ms (16,000 km)
- 9000-byte MTU
- 15 min to increase throughput from 3 to 6 Gbps
- Sending station: Tyan S2882 motherboard, 2x Opteron 2.4 GHz, 2 GB DDR
- Receiving station (CERN OpenLab): HP rx4640, 4x 1.5 GHz Itanium-2, zx1 chipset, 8 GB memory
- Network adapter: S2io 10 GbE
[Throughput plot annotations: burst of packet losses, single packet loss, CPU load = 100%]

5 Responsiveness
Time to recover from a single packet loss:

\[ \tau = \frac{C \cdot RTT^2}{2 \cdot MSS} \qquad C:\ \text{capacity of the link} \]

Path               | Bandwidth | RTT (ms) | MTU (Byte) | Time to recover
LAN                | 10 Gb/s   | 1        | 1500       | 430 ms
Geneva-Chicago     | 10 Gb/s   | 120      | 1500       | 1 hr 32 min
Geneva-Los Angeles | 1 Gb/s    | 180      | 1500       | 23 min
Geneva-Los Angeles | 10 Gb/s   | 180      | 1500       | 3 hr 51 min
Geneva-Los Angeles | 10 Gb/s   | 180      | 9000       | 38 min
Geneva-Los Angeles | 10 Gb/s   | 180      | 64k (TSO)  | 5 min
Geneva-Tokyo       | 1 Gb/s    | 300      | 1500       | 1 hr 04 min

- A large MTU accelerates the growth of the window
- The time to recover from a packet loss decreases with a large MTU
- A larger MTU reduces the per-frame overhead (saves CPU cycles, reduces the number of packets)
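As a sanity check on the table, here is a minimal Python sketch (not from the slides) that evaluates the recovery-time formula above, approximating the MSS by the MTU; it reproduces the table to within a few percent, the residual differences coming from MSS-versus-MTU and payload-rate assumptions:

    # Recovery time after a single loss for AIMD TCP: tau = C * RTT^2 / (2 * MSS)
    def recovery_time_s(capacity_bps, rtt_s, mtu_bytes):
        mss_bits = mtu_bytes * 8              # approximate the MSS by the MTU
        return capacity_bps * rtt_s ** 2 / (2 * mss_bits)

    paths = [
        ("LAN",                10e9, 0.001, 1500),
        ("Geneva-Chicago",     10e9, 0.120, 1500),
        ("Geneva-Los Angeles",  1e9, 0.180, 1500),
        ("Geneva-Los Angeles", 10e9, 0.180, 1500),
        ("Geneva-Los Angeles", 10e9, 0.180, 9000),
        ("Geneva-Los Angeles", 10e9, 0.180, 64 * 1024),   # 64k TSO
        ("Geneva-Tokyo",        1e9, 0.300, 1500),
    ]

    for name, c, rtt, mtu in paths:
        t = recovery_time_s(c, rtt, mtu)
        print(f"{name:20s} MTU {mtu:6d}: {t:8.1f} s (~{t / 60:5.1f} min)")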

6 Internet2 Land Speed Record (LSR)
- IPv6 record: 4.0 Gbps between Geneva and Phoenix (SC2003)
- Single stream: 6.6 Gbps over 16,000 km with Linux
- We have exceeded 100 Petabit-meter/sec (1 Petabit-meter = 10^15 bit*meter) with both Linux and Windows
- http://www.guinnessworldrecords.com/
- Distance has no significant effect on performance
[Charts: LSR history - IPv4 single stream, June 2004 record; monitoring of the Abilene traffic in Los Angeles]
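For reference, the record metric is simply throughput multiplied by distance; with the single-stream numbers quoted above:

\[ 6.6\ \text{Gb/s} \times 16{,}000\ \text{km} = 6.6 \times 10^{9}\ \tfrac{\text{bit}}{\text{s}} \times 1.6 \times 10^{7}\ \text{m} \approx 1.06 \times 10^{17}\ \tfrac{\text{bit}\cdot\text{m}}{\text{s}} \approx 105\ \text{Petabit}\cdot\text{m/s} \]

which is how the "exceeded 100 Petabit-meter/sec" figure is obtained.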

7 TCP Variants
- HSTCP, Scalable TCP, Westwood+, BIC TCP, H-TCP
- FAST TCP:
  - Based on TCP Vegas
  - Uses end-to-end delay and loss to dynamically adjust the congestion window
  - Defines an explicit equilibrium
  - 7.3 Gbps between CERN and Caltech for hours
- FAST vs Reno (capacity = 1 Gbps; 180 ms round-trip latency; 1 flow; 3600 s): Linux TCP 19% bandwidth utilization, Linux TCP (optimized) 27%, FAST 95%
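To make the "uses end-to-end delay to adjust the congestion window" point concrete, here is a minimal Python sketch of the delay-based window update from the published FAST TCP algorithm; the parameter values and example RTTs below are illustrative, not taken from the slide:

    def fast_window_update(w, base_rtt, rtt, alpha=200, gamma=0.5):
        """One FAST-style congestion window update (delay-based).

        w        -- current congestion window, in packets
        base_rtt -- minimum RTT observed (propagation delay estimate)
        rtt      -- current average RTT (propagation + queueing delay)
        alpha    -- target number of packets kept queued in the network
        gamma    -- smoothing factor in (0, 1]
        """
        target = (base_rtt / rtt) * w + alpha
        # Move part of the way toward the target; never more than double.
        return min(2 * w, (1 - gamma) * w + gamma * target)

    # At equilibrium each flow keeps about `alpha` packets queued at the
    # bottleneck instead of filling the buffer until a loss occurs.
    w = 1000.0
    for step in range(5):
        w = fast_window_update(w, base_rtt=0.180, rtt=0.182)
        print(f"step {step}: cwnd ~ {w:.0f} packets")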

8 TCP variants performance
- Tests between CERN and Caltech
- Capacity = OC-192, 9.5 Gbps; 264 ms round-trip latency; 1 flow
- Sending station: Tyan S2882 motherboard, 2x Opteron 2.4 GHz, 2 GB DDR
- Receiving station (CERN OpenLab): HP rx4640, 4x 1.5 GHz Itanium-2, zx1 chipset, 8 GB memory
- Network adapter: S2io 10 GbE
[Throughput chart comparing Linux TCP, Linux Westwood+, Linux BIC TCP and FAST, with measured rates of 3.0, 4.1, 5.0 and 7.3 Gbps; FAST reached 7.3 Gbps]

9 New TCP implementations
- Web100 patch (www.web100.org): HSTCP, Scalable TCP, FAST (soon)
- New stacks included in the Linux 2.6.6 kernel as options:
  - Westwood+: sysctl -w net.ipv4.tcp_westwood=1
  - BIC TCP: sysctl -w net.ipv4.tcp_bic=1
  - TCP Vegas: sysctl -w net.ipv4.tcp_vegas=1
- Sender-side modification
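A small Python helper (an illustrative sketch, assuming a 2.6.x kernel that exposes the /proc entries behind the sysctls named on the slide; a given build may not provide all of them) to check which of the optional stacks is enabled:

    # Read the /proc/sys/net/ipv4 entries corresponding to the sysctls above.
    for name in ("tcp_westwood", "tcp_bic", "tcp_vegas"):
        path = f"/proc/sys/net/ipv4/{name}"
        try:
            with open(path) as f:
                print(f"{name} = {f.read().strip()}")
        except OSError:
            print(f"{name}: not available in this kernel")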

10 High Throughput Disk-to-Disk Transfers: From 0.1 to 1 GByte/sec
- Server hardware (rather than network) bottlenecks:
  - Read/write and transmit tasks share the same limited resources: CPU, PCI-X bus, memory, I/O chipset
  - PCI-X bus bandwidth: 8.5 Gbps (133 MHz x 64 bit)
- Link aggregation (802.3ad): logical interface with two physical interfaces on two independent PCI-X buses
  - LAN test: 11.1 Gbps (memory to memory)
- Performance in this range (from 100 MByte/sec up to 1 GByte/sec) is required to build a responsive Grid-based processing and analysis system for the LHC
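A quick back-of-the-envelope check of the bus numbers quoted above (plain arithmetic; the comments restate the slide's link-aggregation argument):

    # PCI-X: 64-bit bus clocked at 133 MHz
    pcix_bps = 133e6 * 64
    print(f"One PCI-X bus: {pcix_bps / 1e9:.1f} Gbps")      # ~8.5 Gbps

    # A single bus cannot carry a full 10 GbE stream plus disk traffic, so
    # the setup bonds two NICs on two independent PCI-X buses (802.3ad),
    # consistent with the 11.1 Gbps memory-to-memory LAN test.
    print(f"Two buses    : {2 * pcix_bps / 1e9:.1f} Gbps")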

11 Transferring a TB from Caltech to CERN in 64-bit MS Windows
- Latest disk-to-disk over 10 Gbps WAN: 4.3 Gbits/sec (536 MB/sec), 8 TCP streams from CERN to Caltech; 1 TB file
- 3 Supermicro Marvell SATA disk controllers + 24 SATA 7200 rpm disks
  - Local disk I/O: 9.6 Gbits/sec (1.2 GBytes/sec read/write, with <20% CPU utilization)
- S2io SR 10 GbE NIC
  - 1x 10 GbE NIC: 7.5 Gbits/sec (memory-to-memory, with 52% CPU utilization)
  - 2x 10 GbE NICs (802.3ad link aggregation): 11.1 Gbits/sec (memory-to-memory)
- Memory-to-memory WAN data flow and local memory-to-disk read/write flow are not matched when the two operations are combined
- Quad Opteron AMD848 2.2 GHz processors with 3 AMD-8131 chipsets: 4x 64-bit/133 MHz PCI-X slots
- Interrupt Affinity Filter: lets a user change the CPU affinity of the interrupts in a system
- Packet loss overcome with re-connect logic
- Proposed Internet2 Terabyte File Transfer Benchmark
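To put the headline figures in perspective, a short calculation (assuming 1 TB = 10^12 bytes and the sustained rates quoted above):

    TB = 1e12  # bytes

    for label, rate_mb_s in [("measured WAN disk-to-disk (536 MB/s)", 536),
                             ("1 GByte/sec target (1000 MB/s)", 1000)]:
        seconds = TB / (rate_mb_s * 1e6)
        print(f"{label}: 1 TB in about {seconds / 60:.0f} minutes")

    # 4.3 Gbit/s on the wire is ~536 MB/s of payload, roughly half of the
    # 9.6 Gbit/s the local disk arrays sustain, which is why the combined
    # read + send + receive + write path is the limiting step.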

12 UltraLight: Developing Advanced Network Services for Data Intensive HEP Applications
- UltraLight: a next-generation hybrid packet- and circuit-switched network infrastructure
  - Packet-switched: cost-effective solution; requires ultrascale protocols to share 10G efficiently and fairly
  - Circuit-switched: scheduled or sudden "overflow" demands handled by provisioning additional wavelengths; use path diversity, e.g. across the US, Atlantic, Canada, ...
- Extend and augment existing grid computing infrastructures (currently focused on CPU/storage) to include the network as an integral component
- Using MonALISA to monitor and manage global systems
- Partners: Caltech, UF, FIU, UMich, SLAC, FNAL, MIT/Haystack; CERN, NLR, CENIC, Internet2; Translight, UKLight, Netherlight; UvA, UCL, KEK, Taiwan
- Strong support from Cisco

13 UltraLight Optical Exchange Point
- L1, L2 and L3 services
- Interfaces: 1 GbE and 10 GbE; 10 GbE WAN-PHY (SONET friendly)
- Hybrid packet- and circuit-switched PoP: interface between packet- and circuit-switched networks
- Control plane is L3
[Photo: Calient photonic cross-connect switch]

14 UltraLight MPLS Network
- Compute a path from one given node to another such that the path does not violate any constraints (bandwidth/administrative requirements); a sketch follows this list
- Ability to set the path the traffic will take through the network (with simple configuration, management, and provisioning mechanisms)
- Take advantage of the multiplicity of waves/L2 channels across the US (NLR, HOPI, UltraNet and Abilene/ESnet MPLS services)
- EoMPLS will be used to build layer-2 paths
- A natural step toward the deployment of GMPLS
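As a rough illustration of the constraint-based path computation in the first bullet, here is a minimal Python sketch; the prune-then-shortest-path approach, node names and link capacities are illustrative assumptions, not UltraLight's actual tooling:

    import heapq

    def constrained_path(links, src, dst, min_bw_gbps):
        """Prune links below the bandwidth constraint, then run Dijkstra
        on hop count over what remains (CSPF-style 'prune and compute')."""
        graph = {}
        for a, b, bw in links:
            if bw >= min_bw_gbps:            # constraint check
                graph.setdefault(a, []).append(b)
                graph.setdefault(b, []).append(a)
        queue, seen = [(0, src, [src])], set()
        while queue:
            cost, node, path = heapq.heappop(queue)
            if node == dst:
                return path
            if node in seen:
                continue
            seen.add(node)
            for nxt in graph.get(node, []):
                if nxt not in seen:
                    heapq.heappush(queue, (cost + 1, nxt, path + [nxt]))
        return None  # no path satisfies the constraint

    # Hypothetical wave/L2 topology (names and capacities are made up).
    links = [("Caltech", "LA", 10), ("LA", "Chicago", 10),
             ("Chicago", "CERN", 10), ("LA", "CERN", 2.5)]
    print(constrained_path(links, "Caltech", "CERN", min_bw_gbps=10))
    # -> ['Caltech', 'LA', 'Chicago', 'CERN']  (the 2.5 Gbps shortcut is pruned)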

15 Tenth of a Terabit/sec Challenge
- Joint Caltech, FNAL, CERN, SLAC, UF, ...
- 11x 10 Gbps waves to the HEP booth
- Bandwidth Challenge: aggregate throughput of 100 Gbps
- FAST TCP

16 Summary
- For many years the Wide Area Network has been the bottleneck; this is no longer the case in many countries, making deployment of a data-intensive Grid infrastructure possible!
- Recent Internet2 LSR records show for the first time that the network can be truly transparent and that throughputs are limited by the end hosts
- The challenge has shifted from getting adequate bandwidth to deploying adequate infrastructure to make effective use of it!
- Some transport protocol issues still need to be resolved; however, there are many encouraging signs that practical solutions may now be in sight
- 1 GByte/sec disk-to-disk challenge. Today: 1 TB at 536 MB/sec from CERN to Caltech
  - Still in early stages; expect substantial improvements
- Next-generation network and Grid system: UltraLight
  - Deliver the critical missing component for future eScience: the integrated, managed network
  - Extend and augment existing grid computing infrastructures (currently focused on CPU/storage) to include the network as an integral component

