
1 Breaking the 1 GByte/sec Barrier? High Speed WAN Data Transfers for Science
S. Ravot, J. Bunn, H. Newman, Y. Xia, D. Nae (California Institute of Technology)
CHEP 2004 Network Session, September 1, 2004

2 Agenda
- TCP limitations
- Internet2 Land Speed Record
- New TCP implementations
- Disk-to-disk transfers: breaking the 1 GByte/sec barrier
- Light paths: UltraLight

3 Single TCP stream performance under periodic losses
With 1 Gbps of available bandwidth and a loss rate of 0.01%:
- LAN bandwidth utilization: 99%
- WAN bandwidth utilization: 1.2%
- TCP throughput is much more sensitive to packet loss on WANs than on LANs
- TCP's congestion control algorithm (AIMD) is not well suited to gigabit networks
- The effect of packet loss can be disastrous
- TCP is inefficient in high bandwidth*delay networks
- The performance outlook for computational grids is poor if we continue to rely solely on the widely deployed TCP Reno
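The gap between the LAN and WAN figures follows from the standard steady-state TCP throughput model (the Mathis et al. formula, which the slide does not quote explicitly), where p is the packet loss rate:

\[ \text{Throughput} \approx \frac{MSS}{RTT} \cdot \frac{1.22}{\sqrt{p}} \]

Assuming MSS = 1460 bytes and p = 10^-4 (0.01%), a transatlantic RTT of roughly 120 ms yields about 12 Mb/s, i.e. about 1.2% of a 1 Gb/s link, while a sub-millisecond LAN RTT yields far more than the link can carry, hence utilization near 99%.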

4 Single TCP stream between Caltech and CERN
- Available (PCI-X) bandwidth = 8.5 Gbps
- RTT = 250 ms (16,000 km)
- 9000-byte MTU
- 15 min to increase throughput from 3 to 6 Gbps
- Sending station: Tyan S2882 motherboard, 2x Opteron 2.4 GHz, 2 GB DDR
- Receiving station (CERN OpenLab): HP rx4640, 4x 1.5 GHz Itanium-2, zx1 chipset, 8 GB memory
- Network adapter: S2io 10 GbE
[Throughput plot annotations: burst of packet losses, single packet loss, CPU load = 100%]

5 Responsiveness
Time to recover from a single packet loss:

\[ \tau = \frac{C \cdot RTT^2}{2 \cdot MSS} \qquad C:\ \text{capacity of the link} \]

Path               | Bandwidth | RTT (ms) | MTU (Byte) | Time to recover
LAN                | 10 Gb/s   | 1        | 1500       | 430 ms
Geneva-Chicago     | 10 Gb/s   | 120      | 1500       | 1 hr 32 min
Geneva-Los Angeles | 1 Gb/s    | 180      | 1500       | 23 min
Geneva-Los Angeles | 10 Gb/s   | 180      | 1500       | 3 hr 51 min
Geneva-Los Angeles | 10 Gb/s   | 180      | 9000       | 38 min
Geneva-Los Angeles | 10 Gb/s   | 180      | 64k (TSO)  | 5 min
Geneva-Tokyo       | 1 Gb/s    | 300      | 1500       | 1 hr 04 min

- A large MTU accelerates the growth of the window
- The time to recover from a packet loss decreases with a large MTU
- A larger MTU reduces the per-frame overhead (saves CPU cycles, reduces the number of packets)
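As a sanity check on the table, here is a minimal Python sketch (not from the slides) that evaluates the recovery-time formula above, approximating the MSS by the MTU; it reproduces the table to within a few percent, the residual differences coming from MSS-versus-MTU and payload-rate assumptions:

    # Recovery time after a single loss for AIMD TCP: tau = C * RTT^2 / (2 * MSS)
    def recovery_time_s(capacity_bps, rtt_s, mtu_bytes):
        mss_bits = mtu_bytes * 8              # approximate the MSS by the MTU
        return capacity_bps * rtt_s ** 2 / (2 * mss_bits)

    paths = [
        ("LAN",                10e9, 0.001, 1500),
        ("Geneva-Chicago",     10e9, 0.120, 1500),
        ("Geneva-Los Angeles",  1e9, 0.180, 1500),
        ("Geneva-Los Angeles", 10e9, 0.180, 1500),
        ("Geneva-Los Angeles", 10e9, 0.180, 9000),
        ("Geneva-Los Angeles", 10e9, 0.180, 64 * 1024),   # 64k TSO
        ("Geneva-Tokyo",        1e9, 0.300, 1500),
    ]

    for name, c, rtt, mtu in paths:
        t = recovery_time_s(c, rtt, mtu)
        print(f"{name:20s} MTU {mtu:6d}: {t:8.1f} s (~{t / 60:5.1f} min)")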

6 Internet2 Land Speed Record (LSR)
- IPv6 record: 4.0 Gbps between Geneva and Phoenix (SC2003)
- Single stream: 6.6 Gbps over 16,000 km with Linux
- We have exceeded 100 Petabit-meter/sec (1 Petabit-meter = 10^15 bit*meter) with both Linux and Windows
- http://www.guinnessworldrecords.com/
- Distance has no significant effect on performance
[Charts: LSR history - IPv4 single stream, June 2004 record; monitoring of the Abilene traffic in Los Angeles]
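For reference, the record metric is simply throughput multiplied by distance; with the single-stream numbers quoted above:

\[ 6.6\ \text{Gb/s} \times 16{,}000\ \text{km} = 6.6 \times 10^{9}\ \tfrac{\text{bit}}{\text{s}} \times 1.6 \times 10^{7}\ \text{m} \approx 1.06 \times 10^{17}\ \tfrac{\text{bit}\cdot\text{m}}{\text{s}} \approx 105\ \text{Petabit}\cdot\text{m/s} \]

which is how the "exceeded 100 Petabit-meter/sec" figure is obtained.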

7 TCP Variants
- HSTCP, Scalable TCP, Westwood+, BIC TCP, H-TCP
- FAST TCP:
  - Based on TCP Vegas
  - Uses end-to-end delay and loss to dynamically adjust the congestion window
  - Defines an explicit equilibrium
  - 7.3 Gbps between CERN and Caltech for hours
- FAST vs Reno (capacity = 1 Gbps; 180 ms round-trip latency; 1 flow; 3600 s): Linux TCP 19% bandwidth utilization, Linux TCP (optimized) 27%, FAST 95%
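To make the "uses end-to-end delay to adjust the congestion window" point concrete, here is a minimal Python sketch of the delay-based window update from the published FAST TCP algorithm; the parameter values and example RTTs below are illustrative, not taken from the slide:

    def fast_window_update(w, base_rtt, rtt, alpha=200, gamma=0.5):
        """One FAST-style congestion window update (delay-based).

        w        -- current congestion window, in packets
        base_rtt -- minimum RTT observed (propagation delay estimate)
        rtt      -- current average RTT (propagation + queueing delay)
        alpha    -- target number of packets kept queued in the network
        gamma    -- smoothing factor in (0, 1]
        """
        target = (base_rtt / rtt) * w + alpha
        # Move part of the way toward the target; never more than double.
        return min(2 * w, (1 - gamma) * w + gamma * target)

    # At equilibrium each flow keeps about `alpha` packets queued at the
    # bottleneck instead of filling the buffer until a loss occurs.
    w = 1000.0
    for step in range(5):
        w = fast_window_update(w, base_rtt=0.180, rtt=0.182)
        print(f"step {step}: cwnd ~ {w:.0f} packets")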

8 TCP variants performance
- Tests between CERN and Caltech
- Capacity = OC-192, 9.5 Gbps; 264 ms round-trip latency; 1 flow
- Sending station: Tyan S2882 motherboard, 2x Opteron 2.4 GHz, 2 GB DDR
- Receiving station (CERN OpenLab): HP rx4640, 4x 1.5 GHz Itanium-2, zx1 chipset, 8 GB memory
- Network adapter: S2io 10 GbE
[Throughput chart comparing Linux TCP, Linux Westwood+, Linux BIC TCP and FAST, with measured rates of 3.0, 4.1, 5.0 and 7.3 Gbps; FAST reached 7.3 Gbps]

9 New TCP implementations
- Web100 patch (www.web100.org): HSTCP, Scalable TCP, FAST (soon)
- New stacks included in the Linux 2.6.6 kernel as options:
  - Westwood+: sysctl -w net.ipv4.tcp_westwood=1
  - BIC TCP: sysctl -w net.ipv4.tcp_bic=1
  - TCP Vegas: sysctl -w net.ipv4.tcp_vegas=1
- Sender-side modification
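A small Python helper (an illustrative sketch, assuming a 2.6.x kernel that exposes the /proc entries behind the sysctls named on the slide; a given build may not provide all of them) to check which of the optional stacks is enabled:

    # Read the /proc/sys/net/ipv4 entries corresponding to the sysctls above.
    for name in ("tcp_westwood", "tcp_bic", "tcp_vegas"):
        path = f"/proc/sys/net/ipv4/{name}"
        try:
            with open(path) as f:
                print(f"{name} = {f.read().strip()}")
        except OSError:
            print(f"{name}: not available in this kernel")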

10 High Throughput Disk-to-Disk Transfers: From 0.1 to 1 GByte/sec
- Server hardware (rather than network) bottlenecks:
  - Read/write and transmit tasks share the same limited resources: CPU, PCI-X bus, memory, I/O chipset
  - PCI-X bus bandwidth: 8.5 Gbps (133 MHz x 64 bit)
- Link aggregation (802.3ad): logical interface with two physical interfaces on two independent PCI-X buses
  - LAN test: 11.1 Gbps (memory to memory)
- Performance in this range (from 100 MByte/sec up to 1 GByte/sec) is required to build a responsive Grid-based processing and analysis system for the LHC
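A quick back-of-the-envelope check of the bus numbers quoted above (plain arithmetic; the comments restate the slide's link-aggregation argument):

    # PCI-X: 64-bit bus clocked at 133 MHz
    pcix_bps = 133e6 * 64
    print(f"One PCI-X bus: {pcix_bps / 1e9:.1f} Gbps")      # ~8.5 Gbps

    # A single bus cannot carry a full 10 GbE stream plus disk traffic, so
    # the setup bonds two NICs on two independent PCI-X buses (802.3ad),
    # consistent with the 11.1 Gbps memory-to-memory LAN test.
    print(f"Two buses    : {2 * pcix_bps / 1e9:.1f} Gbps")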

11 Transferring a TB from Caltech to CERN in 64-bit MS Windows
- Latest disk-to-disk over 10 Gbps WAN: 4.3 Gbits/sec (536 MB/sec), 8 TCP streams from CERN to Caltech; 1 TB file
- 3 Supermicro Marvell SATA disk controllers + 24 SATA 7200 rpm disks
  - Local disk I/O: 9.6 Gbits/sec (1.2 GBytes/sec read/write, with <20% CPU utilization)
- S2io SR 10 GbE NIC
  - 1x 10 GbE NIC: 7.5 Gbits/sec (memory-to-memory, with 52% CPU utilization)
  - 2x 10 GbE NICs (802.3ad link aggregation): 11.1 Gbits/sec (memory-to-memory)
- Memory-to-memory WAN data flow and local memory-to-disk read/write flow are not matched when the two operations are combined
- Quad Opteron AMD848 2.2 GHz processors with 3 AMD-8131 chipsets: 4x 64-bit/133 MHz PCI-X slots
- Interrupt Affinity Filter: lets a user change the CPU affinity of the interrupts in a system
- Packet loss overcome with re-connect logic
- Proposed Internet2 Terabyte File Transfer Benchmark
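To put the headline figures in perspective, a short calculation (assuming 1 TB = 10^12 bytes and the sustained rates quoted above):

    TB = 1e12  # bytes

    for label, rate_mb_s in [("measured WAN disk-to-disk (536 MB/s)", 536),
                             ("1 GByte/sec target (1000 MB/s)", 1000)]:
        seconds = TB / (rate_mb_s * 1e6)
        print(f"{label}: 1 TB in about {seconds / 60:.0f} minutes")

    # 4.3 Gbit/s on the wire is ~536 MB/s of payload, roughly half of the
    # 9.6 Gbit/s the local disk arrays sustain, which is why the combined
    # read + send + receive + write path is the limiting step.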

12 UltraLight: Developing Advanced Network Services for Data Intensive HEP Applications
- UltraLight: a next-generation hybrid packet- and circuit-switched network infrastructure
  - Packet-switched: cost-effective solution; requires ultrascale protocols to share 10G efficiently and fairly
  - Circuit-switched: scheduled or sudden "overflow" demands handled by provisioning additional wavelengths; use path diversity, e.g. across the US, Atlantic, Canada, ...
- Extend and augment existing grid computing infrastructures (currently focused on CPU/storage) to include the network as an integral component
- Using MonALISA to monitor and manage global systems
- Partners: Caltech, UF, FIU, UMich, SLAC, FNAL, MIT/Haystack; CERN, NLR, CENIC, Internet2; Translight, UKLight, Netherlight; UvA, UCL, KEK, Taiwan
- Strong support from Cisco

13 UltraLight Optical Exchange Point
- L1, L2 and L3 services
- Interfaces: 1 GbE and 10 GbE; 10 GbE WAN-PHY (SONET friendly)
- Hybrid packet- and circuit-switched PoP: interface between packet- and circuit-switched networks
- Control plane is L3
[Photo: Calient photonic cross-connect switch]

14 UltraLight MPLS Network
- Compute a path from one given node to another such that the path does not violate any constraints (bandwidth/administrative requirements); a sketch follows this list
- Ability to set the path the traffic will take through the network (with simple configuration, management, and provisioning mechanisms)
- Take advantage of the multiplicity of waves/L2 channels across the US (NLR, HOPI, UltraNet and Abilene/ESnet MPLS services)
- EoMPLS will be used to build layer-2 paths
- A natural step toward the deployment of GMPLS
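As a rough illustration of the constraint-based path computation in the first bullet, here is a minimal Python sketch; the prune-then-shortest-path approach, node names and link capacities are illustrative assumptions, not UltraLight's actual tooling:

    import heapq

    def constrained_path(links, src, dst, min_bw_gbps):
        """Prune links below the bandwidth constraint, then run Dijkstra
        on hop count over what remains (CSPF-style 'prune and compute')."""
        graph = {}
        for a, b, bw in links:
            if bw >= min_bw_gbps:            # constraint check
                graph.setdefault(a, []).append(b)
                graph.setdefault(b, []).append(a)
        queue, seen = [(0, src, [src])], set()
        while queue:
            cost, node, path = heapq.heappop(queue)
            if node == dst:
                return path
            if node in seen:
                continue
            seen.add(node)
            for nxt in graph.get(node, []):
                if nxt not in seen:
                    heapq.heappush(queue, (cost + 1, nxt, path + [nxt]))
        return None  # no path satisfies the constraint

    # Hypothetical wave/L2 topology (names and capacities are made up).
    links = [("Caltech", "LA", 10), ("LA", "Chicago", 10),
             ("Chicago", "CERN", 10), ("LA", "CERN", 2.5)]
    print(constrained_path(links, "Caltech", "CERN", min_bw_gbps=10))
    # -> ['Caltech', 'LA', 'Chicago', 'CERN']  (the 2.5 Gbps shortcut is pruned)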

15 Tenth of a Terabit/sec Challenge
- Joint Caltech, FNAL, CERN, SLAC, UF, ...
- 11x 10 Gbps waves to the HEP booth
- Bandwidth Challenge: aggregate throughput of 100 Gbps
- FAST TCP

16 Summary
- For many years the Wide Area Network has been the bottleneck; this is no longer the case in many countries, making deployment of a data-intensive Grid infrastructure possible!
- Recent Internet2 LSR records show for the first time that the network can be truly transparent and that throughputs are limited by the end hosts
- The challenge has shifted from getting adequate bandwidth to deploying adequate infrastructure to make effective use of it!
- Some transport protocol issues still need to be resolved; however, there are many encouraging signs that practical solutions may now be in sight
- 1 GByte/sec disk-to-disk challenge. Today: 1 TB at 536 MB/sec from CERN to Caltech
  - Still in early stages; expect substantial improvements
- Next-generation network and Grid system: UltraLight
  - Deliver the critical missing component for future eScience: the integrated, managed network
  - Extend and augment existing grid computing infrastructures (currently focused on CPU/storage) to include the network as an integral component

