High Speed File Replication
Les Cottrell & Davide Salomoni – SLAC
www.slac.stanford.edu/grp/scs/net/talk/thru-escc-nov00/
Presented at the ESCC meeting, San Ramon, CA, Nov. 2000
Partially funded by the DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM); also supported by IUPAP
Overview
- PPDG & replication
- High thruput measurements
  - Methodology
  - LAN
  - WAN
  - SC2000
- Packet reordering
- Impact on others
Participants in PPDG & GriPhyN Projects
- Networks: CalREN, NTON, ESNet, Abilene, MREN
- Sites: Caltech, SLAC, SDSC, Fermilab, Wisconsin, Indiana, Boston, BNL, JLAB, ANL, Florida, LBNL/UCB
Site-to-site replication service
PRIMARY SITE: Data Acquisition, CPU, Disk, Tape Robot
SECONDARY SITE: CPU, Disk, Tape Robot
- Network protocols tuned for high throughput
- Use of DiffServ for (1) predictable high-priority delivery of high-bandwidth data streams, (2) reliable background transfers
- Use of integrated instrumentation to detect/diagnose/correct problems in long-lived high-speed transfers [NetLogger + DoE/NGI developments]
- Coordinated reservation/allocation techniques for storage-to-storage performance
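As an aside on the DiffServ item above: a minimal sketch of how a transfer application could mark its traffic, assuming POSIX sockets and illustrative DSCP values (EF for the high-priority stream, a low code point for background transfers); the endpoint named in the comment is hypothetical.

```python
import socket

# Illustrative DSCP code points (assumptions, not values from the talk):
# EF (Expedited Forwarding) for the high-priority stream, a low CS1-style
# value for "reliable background" transfers.
DSCP_EF = 46
DSCP_BACKGROUND = 8

def open_marked_socket(host, port, dscp):
    """Open a TCP socket whose packets carry the given DSCP marking.

    The DSCP occupies the top six bits of the IP TOS byte, hence the
    shift by two. Routers along the path must be configured to honor it.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)
    s.connect((host, port))
    return s

# e.g. a high-priority replication stream (hypothetical endpoint):
# stream = open_marked_socket("secondary.example.org", 5001, DSCP_EF)
```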
Measurement methodology
- Iperf with multiple windows & streams
- Selected 10 sites:
  - of critical interest for high-performance transfers with SLAC
  - where iperf servers can be installed
- Production links, so we do not control the utilization
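A minimal sketch of such a windows-and-streams sweep, assuming the standard iperf client flags (-c server, -w window, -P parallel streams, -t seconds, -f m for Mbits/sec); the server name and sweep values are placeholders.

```python
import re
import subprocess

# Hypothetical remote iperf server; window/stream values span the
# ranges discussed in this talk.
HOST = "iperf.example.org"
WINDOWS = ["8K", "16K", "32K", "64K", "100K", "1M"]
STREAMS = [1, 2, 5, 10, 20]

def run_iperf(host, window, streams, secs=10):
    """Run one iperf client test and return the aggregate Mbits/sec."""
    out = subprocess.run(
        ["iperf", "-c", host, "-w", window, "-P", str(streams),
         "-t", str(secs), "-f", "m"],
        capture_output=True, text=True, check=True).stdout
    # Take the last bandwidth figure reported (the SUM line when -P > 1).
    return float(re.findall(r"([\d.]+) Mbits/sec", out)[-1])

for w in WINDOWS:
    for p in STREAMS:
        print(w, p, run_iperf(HOST, w, p))
```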
What does thruput depend on?
- End-to-end bandwidth, i.e. min(BW of the links), AKA the bottleneck bandwidth
- Round Trip Time (RTT): for TCP, to keep the pipe full the window ~ RTT*BW
- Loss: thruput ~ 1/sqrt(loss)
- Competing utilization
[Diagram: src/rcv timeline with ACKs, illustrating pipe = RTT*BW]
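The loss dependence quoted above is the macroscopic TCP throughput model of Mathis et al., where MSS is the segment size, p the loss probability, and C a constant of order one (about sqrt(3/2) for periodic loss):

```latex
\[
  \text{Thruput} \;\lesssim\; \frac{MSS}{RTT}\cdot\frac{C}{\sqrt{p}},
  \qquad p = \text{packet loss probability},\quad C \approx \sqrt{3/2}
\]
```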
LAN thruput vs windows & streams
- High performance = big windows & multiple streams
[Chart: thruput vs window size & number of streams; default window size marked]
LAN throughput measurements
[Chart: Sun/Solaris vs PIII/Linux]
Progress towards goal: 100 Mbytes/s site-to-site
- Focus on SLAC – Caltech over NTON
- Using NTON wavelength-division fibers up & down the US West Coast
- Replaced the Exemplar (8*OC3) & Suns with Pentium IIIs & OC12 (622 Mbps)
- SLAC: Cisco 12000 with OC48 (2.4 Gbps) and 2*OC12; Caltech: Juniper M160 & OC48
- ~500 Mbits/s single stream achieved recently over OC12
Intercontinental high performance thruput on production networks
SLAC (California, US) to CERN (Switzerland)
SLAC to CERN thruput vs windows & streams
- High performance = big windows & multiple streams
- Improves ~ linearly with streams for small windows
[Chart: thruput vs streams for window sizes 8kB, 16kB, 32kB, 64kB, 100kB, 1MB; default window size marked]
SLAC to CERN thruput vs windows & streams
[Contour chart: thruput over window size (kB) vs parallel streams; regions <2Mbps, >8Mbps, >10Mbps]
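The near-linear gain with streams at small windows drops out of the pipe-size arithmetic: N streams of window W fill the pipe when N*W ~ RTT*BW. A quick sketch, assuming an illustrative 20 Mbit/s bottleneck and 170 ms RTT (placeholder numbers, not measurements from this talk):

```python
# Pipe (bandwidth-delay product) that the aggregate TCP window must cover.
BW_BITS = 20e6    # assumed bottleneck bandwidth, bits/s
RTT = 0.170       # assumed round-trip time, seconds

pipe_bytes = BW_BITS * RTT / 8          # ~425 kBytes in flight
for window in (8e3, 32e3, 64e3, 100e3, 1e6):
    streams = pipe_bytes / window       # streams needed so N*W ~ RTT*BW
    print(f"{window/1e3:6.0f} kB window -> ~{streams:5.1f} streams to fill the pipe")
```

With 64 kB windows this gives roughly 7 streams, which is consistent with the "5-10 streams" rule of thumb in the conclusions below.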
E.g. thruput vs windows & streams
[Charts: Mbits/s vs window & streams for ANL, Caltech, Colorado, IN2P3 (FR), CERN (CH), Daresbury (UK), INFN (IT)]
Measured WAN thruput
[Chart: measured thruput by remote site; annotations: poor agreement; improvement with move to I2; improves with time (upgrades)]
Iperf throughput conclusions
- Pathchar does a poor job of predicting thruput at these rates
- Need > 1 stream
- Can get close to max thruput with small (<=32 kByte) windows given sufficient (5-10) streams
- Improvements of 5 to 60 times in thruput by using multiple streams & larger windows
- Increasing streams is often more effective than increasing windows
- See www-iepm.slac.stanford.edu/monitoring/bulk/
File transfer + compression
- bbftp tool from Gilles Farrache, IN2P3
- 10 streams & 256 kByte window, SLAC > Lyon, FR: got 25 Mbps with NO compression
- With compression, CPU power is important:
  - Sun E4500, 4 CPUs @ 336 MHz: best was 13.6 Mbps with 5 streams; more streams go slower (e.g. 10 streams => 7.4 Mbps)
  - Sun E450, 4 CPUs @ 450 MHz: 26 Mbps with 10 streams
- Compression factor of 2-3 times, so the effective data rate is boosted to 27-41 Mbps (E4500) or 52-78 Mbps (E450)
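The boosted figures are just the on-the-wire rate times the quoted 2-3x compression factor; a quick check:

```python
# Effective data rate = on-the-wire rate * compression factor (2-3x).
for host, wire_mbps in [("E4500", 13.6), ("E450", 26.0)]:
    lo, hi = wire_mbps * 2, wire_mbps * 3
    print(f"{host}: {wire_mbps} Mbps on the wire -> {lo:.0f}-{hi:.0f} Mbps effective")
# E4500: 13.6 -> 27-41 Mbps; E450: 26 -> 52-78 Mbps, as quoted above.
```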
SC2000 WAN Challenge
- SC2000, Dallas to SLAC: RTT ~ 48 msec
- SLAC/FNAL booth: Dell PowerEdge PIII 2 * 550 MHz with 64-bit PCI + Dell 850 MHz, both running Linux, each with GigE, connected to a Cat 6009 with 2 GigE bonded to the SCinet Extreme switch
- NTON/SLAC: OC48 to GSR to Cat 5500, GigE to Sun E4500 4*460 MHz and Sun E4500 6*336 MHz
- Internet 2: 300 Mbits/s; NTON: measured 990 Mbits/s
- 990 Mbits/s == 200 TBytes in 20 days (BaBar yearly production)
  == 250 MBytes in 2 seconds (12K BaBar events, or 1% error stats)
  == copy a 10-minute QuickTime movie in ~1 second
  == 50K simultaneous VoIP calls (enough for 500 sites like SLAC)
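These equivalences follow from unit conversions on the measured 990 Mbits/s; a quick check:

```python
rate_bps = 990e6                        # measured NTON throughput, bits/s
bytes_per_s = rate_bps / 8              # ~124 MBytes/s

# 20 days of continuous transfer:
print(bytes_per_s * 86400 * 20 / 1e12)  # ~214 TBytes ("~200 TBytes in 20 days")

# 2 seconds of transfer:
print(bytes_per_s * 2 / 1e6)            # ~248 MBytes ("250 MBytes in 2 seconds")

# Per-call bandwidth if the link carried 50K simultaneous VoIP calls:
print(rate_bps / 50e3 / 1e3)            # ~19.8 kbits/s per call
```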
Packet reordering
- Impacts TCP congestion avoidance algorithms
- Is more common than had been thought
- Took 256 PingER sites
- Measured with pings: 5 * 1 sec separations, 50 back-to-back
- Look for out-of-sequence replies
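A minimal sketch of the out-of-sequence test, assuming standard ping output with icmp_seq= fields (sub-second -i intervals may require privileges on some systems; the host name is a placeholder):

```python
import re
import subprocess

def out_of_order(host, count=50, interval=0.01):
    """Send `count` closely spaced pings and report replies whose
    icmp_seq arrives out of order relative to the previous reply."""
    out = subprocess.run(
        ["ping", "-c", str(count), "-i", str(interval), host],
        capture_output=True, text=True).stdout
    seqs = [int(s) for s in re.findall(r"icmp_seq=(\d+)", out)]
    return [s for prev, s in zip(seqs, seqs[1:]) if s < prev]

# e.g. print(out_of_order("remote.example.org"))
```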
Impact on Others
- Make ping measurements with & without iperf loading
[Charts: loss and RTT, loaded vs. unloaded]
Effect of load on other traffic
- Measured ping RTT for 60 secs on a normally unloaded link, with & without iperf load
- Difference ~30-50 ms
Possible alleviation
- Less than Best Effort QoS service
- Choose the number of streams to optimize thruput vs. impact
- Measure RTT and use it to control the number of streams
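One way the RTT-based control could look, as a sketch rather than the method used here: back off the stream count when the transfer's own load inflates the measured RTT past a threshold. The baseline, threshold, and step sizes are illustrative assumptions.

```python
import random  # stands in for a real RTT probe in this sketch

BASELINE_RTT_MS = 20.0   # RTT measured before the transfer starts (assumed)
MAX_INFLATION_MS = 30.0  # back off if we add more delay than this (assumed)
streams = 10             # starting stream count (assumed)

def probe_rtt_ms():
    """Placeholder for a real ping-based RTT measurement."""
    return BASELINE_RTT_MS + random.uniform(0, 60)

def adjust_streams(streams):
    """Reduce parallelism when our load visibly inflates others' RTT,
    cautiously grow it back when the path looks unloaded."""
    inflation = probe_rtt_ms() - BASELINE_RTT_MS
    if inflation > MAX_INFLATION_MS:
        return max(1, streams - 2)
    if inflation < MAX_INFLATION_MS / 2:
        return streams + 1
    return streams

# Called periodically during the transfer, e.g. once per probe interval:
# streams = adjust_streams(streams)
```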
WAN thruput conclusions
- High FTP performance across WAN links is possible
- Even with a 20-30 Mbps bottleneck, can do > 100 GBytes/day
- OS must support big windows, selectable by the application
- Need multiple parallel streams
- Loss is important, in particular the interval between losses
- Compression looks promising, but needs CPU power
- Can get close to max thruput with small (<=32 kByte) windows given sufficient (5-10) streams
- Improvements of 5 to 60 times in thruput by using multiple streams & larger windows
- Impacts other users; need a Less than Best Effort QoS service or friendlier applications
More Information
- This talk: www.slac.stanford.edu/grp/scs/net/talk/thru-escc-nov00/
- IEPM/PingER home site: www-iepm.slac.stanford.edu/
- Bulk throughput measurements: www-iepm.slac.stanford.edu/monitoring/bulk/
- Effect of load on thruput & loss: www-iepm.slac.stanford.edu/monitoring/load/
- Windows vs. streams: www-iepm.slac.stanford.edu/monitoring/bulk/window-vs-streams.html
- SC2000 thruput to SLAC: www-iepm.slac.stanford.edu/monitoring/bulk/sc2k.html
- Packet reordering: www-iepm.slac.stanford.edu/monitoring/reorder/