
Slide 1: Interactions Between Networks, Protocols & Applications
HPCN-RG, Richard Hughes-Jones
OGF20, Manchester, May 2007
© 2006 Open Grid Forum

Slide 2: ESLEA and UKLight at SC|05

Slide 3: ESLEA and UKLight
- 6 x 1 Gbit transatlantic Ethernet layer 2 paths: UKLight + NLR
- Disk-to-disk transfers with bbcp, Seattle to UK: TCP buffer / application set to give ~850 Mbit/s; one stream of data achieved 840 Mbit/s
- Streamed UDP VLBI data, UK to Seattle: 620 Mbit/s; no packet loss, worked well
- Reverse TCP

Slide 4: SC|05 HEP: Moving Data with bbcp
- What is the end-host doing with your network protocol? Look at the PCI-X buses.
- Setup: 3Ware 9000 controller with RAID0, 1 Gbit Ethernet link, 2.4 GHz dual Xeon: ~660 Mbit/s
- Traces of the PCI-X bus with the RAID controller and the PCI-X bus with the Ethernet NIC show reads from disk for 44 ms in every 100 ms and writes to the network for 72 ms
- Power is needed in the end hosts; careful application design matters
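
The 44 ms / 72 ms duty cycles and the ~660 Mbit/s average are mutually consistent, which a back-of-the-envelope check illustrates. The script below is only a sketch: the ~940 Mbit/s usable wire rate is an assumption, not a figure from the slide.

```python
# Back-of-the-envelope check of the SC|05 bbcp duty cycles (sketch; wire rate is assumed).
wire_rate_mbit = 940.0      # assumed usable rate of the 1 Gbit Ethernet link
net_duty = 72 / 100.0       # NIC busy writing to the network for 72 ms in every 100 ms
disk_duty = 44 / 100.0      # RAID controller busy reading from disk for 44 ms in every 100 ms

avg_net_mbit = wire_rate_mbit * net_duty
print(f"average network rate ~ {avg_net_mbit:.0f} Mbit/s (slide reports ~660 Mbit/s)")

# The disk must supply the same average; during its 44% duty cycle it must burst at:
disk_burst_mbit = avg_net_mbit / disk_duty
print(f"implied disk read burst ~ {disk_burst_mbit:.0f} Mbit/s "
      f"(~{disk_burst_mbit / 8:.0f} Mbyte/s from the 3Ware RAID0)")
```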

Slide 5: SC2004: Disk-to-Disk bbftp
- The bbftp file transfer program uses TCP/IP
- UKLight path: London - Chicago - London; PCs: Supermicro + 3Ware RAID0
- MTU 1500 bytes; socket size 22 Mbytes; rtt 177 ms; SACK off
- Moving a 2 Gbyte file, Web100 plots show:
  Standard TCP: average 825 Mbit/s (bbcp: 670 Mbit/s)
  Scalable TCP: average 875 Mbit/s (bbcp: 701 Mbit/s, ~4.5 s of overhead)
- Disk-TCP-Disk at 1 Gbit/s works!
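
The 22 Mbyte socket size is essentially the bandwidth-delay product for a 1 Gbit/s path with a 177 ms round-trip time, which is why the transfer can keep the pipe full. A quick check of that arithmetic (an illustrative calculation, not from the slides):

```python
# Bandwidth-delay product for the SC2004 UKLight path (illustrative check).
line_rate_bit_s = 1e9        # 1 Gbit/s path
rtt_s = 0.177                # 177 ms round-trip time

bdp_bytes = line_rate_bit_s * rtt_s / 8
print(f"BDP = {bdp_bytes / 1e6:.1f} Mbytes")   # ~22.1 Mbytes, matching the 22 Mbyte socket size
```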

Slide 6: Network & Disk Interactions
- Hosts: Supermicro X5DPE-G2 motherboards; dual 2.8 GHz Xeon CPUs with 512 kbyte cache and 1 Mbyte memory; 3Ware 8506-8 controller on a 133 MHz PCI-X bus configured as RAID0; six 74.3 GByte Western Digital Raptor WD740 SATA disks; 64 kbyte stripe size
- Measure memory-to-RAID0 transfer rates with and without UDP traffic (plots also show % CPU in kernel mode):
  Disk write: 1735 Mbit/s
  Disk write + 1500-byte MTU UDP: 1218 Mbit/s (drop of 30%)
  Disk write + 9000-byte MTU UDP: 1400 Mbit/s (drop of 19%)
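
A minimal way to reproduce the memory-to-RAID0 part of this measurement is to time large sequential writes to a file on the array. The sketch below is a simplification: the path and sizes are placeholders, and it neither generates the competing UDP traffic nor bypasses the page cache, both of which the original measurements would have had to control for.

```python
# Rough memory-to-disk write rate measurement (sketch; /raid0/testfile is a placeholder path).
import os, time

BLOCK = 64 * 1024            # 64 kbyte blocks, matching the RAID0 stripe size
TOTAL = 2 * 1024**3          # write 2 GBytes in total
buf = os.urandom(BLOCK)      # data already in memory, so we time memory -> disk

t0 = time.time()
with open("/raid0/testfile", "wb") as f:
    written = 0
    while written < TOTAL:
        f.write(buf)
        written += BLOCK
    f.flush()
    os.fsync(f.fileno())     # make sure the data has really reached the array
elapsed = time.time() - t0

print(f"wrote {TOTAL / 1e9:.1f} GB in {elapsed:.1f} s "
      f"= {TOTAL * 8 / elapsed / 1e6:.0f} Mbit/s")
```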

Slide 7: Remote Computing Farms in the ATLAS TDAQ Experiment

Slide 8: ATLAS Remote Farms – Network Connectivity

Slide 9: ATLAS Remote Computing: Application Protocol
- Event request: the EFD requests an event from the SFI; the SFI replies with the event (~2 Mbytes)
- Processing of the event
- Return of the computation: the EF asks the SFO for buffer space; the SFO sends OK; the EF transfers the results of the computation
- tcpmon: an instrumented TCP request-response program that emulates the Event Filter Daemon (EFD) to SFI communication (see the sketch below)
[Diagram: message sequence between the Event Filter Daemon (EFD) and the SFI/SFO: Request event, Send event data, Process event, Request buffer, Send OK, Send processed event; with a histogram of request-response times]
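
tcpmon itself is not reproduced in the slides, so the following is only a hedged, minimal Python emulation of the same request-response pattern: a client sends a small request and times how long the ~2 Mbyte reply takes over a single TCP connection. The host, port, and message sizes are illustrative assumptions.

```python
# Minimal request-response timing emulation in the spirit of tcpmon (sketch, not the real tool).
import socket, time

HOST, PORT = "127.0.0.1", 5001        # illustrative endpoint
REQUEST_SIZE = 64                     # small "request event" message
RESPONSE_SIZE = 2 * 1024 * 1024       # ~2 Mbyte "event" reply

def serve_one():
    """Accept one connection and answer each request with a 2 Mbyte event."""
    with socket.create_server((HOST, PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            while conn.recv(REQUEST_SIZE):
                conn.sendall(b"\0" * RESPONSE_SIZE)

def request_events(n=10):
    """Send n requests and print the request-response time for each."""
    with socket.create_connection((HOST, PORT)) as s:
        for i in range(n):
            t0 = time.time()
            s.sendall(b"R" * REQUEST_SIZE)
            received = 0
            while received < RESPONSE_SIZE:
                chunk = s.recv(65536)
                if not chunk:
                    raise ConnectionError("server closed early")
                received += len(chunk)
            print(f"event {i}: {(time.time() - t0) * 1e3:.1f} ms")
```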

Slide 10: TCP Activity, Manchester-CERN Req-Resp
- Round trip time 20 ms; 64-byte request (green), 1 Mbyte response (blue)
- TCP is in slow start: the 1st event takes 19 rtt, or ~380 ms
- The TCP congestion window gets re-set on each request: the TCP stack applies the RFC 2581 & RFC 2861 reduction of cwnd after inactivity (see the note below)
- Even after 10 s, each response takes 13 rtt, or ~260 ms
- Achievable transfer throughput: 120 Mbit/s
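
On Linux the RFC 2861 behaviour of collapsing cwnd after an idle period is controlled by the net.ipv4.tcp_slow_start_after_idle sysctl; disabling it is one way to avoid the per-request restart seen here. A hedged sketch of checking and changing it from Python (writing requires root, and other operating systems expose different knobs):

```python
# Inspect / disable Linux's cwnd collapse after idle (RFC 2861 behaviour). Sketch; writing needs root.
KNOB = "/proc/sys/net/ipv4/tcp_slow_start_after_idle"

with open(KNOB) as f:
    print("tcp_slow_start_after_idle =", f.read().strip())   # 1 = collapse cwnd after idle (default)

# To keep the congestion window across idle periods between request-response pairs:
# with open(KNOB, "w") as f:
#     f.write("0\n")
```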

Slide 11: TCP Activity, Manchester-CERN Req-Resp, TCP Stack with No cwnd Reduction
- Round trip time 20 ms; 64-byte request (green), 1 Mbyte response (blue)
- TCP starts in slow start: the 1st event takes 19 rtt, or ~380 ms (see the toy model below)
- The TCP congestion window grows nicely; a response takes 2 rtt after ~1.5 s
- Rate ~10/s (with a 50 ms wait); achievable transfer throughput grows to 800 Mbit/s
- Data is transferred WHEN the application requires it
[Plot annotations: 3 round trips early on, 2 round trips later]
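
A first event costing many round trips is what you expect when a 1 Mbyte response has to be pushed through slow start in ~1448-byte segments. The toy model below only illustrates the idea: it assumes an initial window of two segments and growth limited by ACK-every-other-segment, so the exact count differs from a real stack.

```python
# Toy slow-start model: how many round trips to deliver a 1 Mbyte response? (illustrative only)
MSS = 1448                            # segment payload, bytes
RESPONSE = 1_000_000                  # 1 Mbyte response
segments_left = -(-RESPONSE // MSS)   # ceiling division: ~691 segments

cwnd = 2                              # assumed initial window, in segments
rtts = 0
while segments_left > 0:
    sent = min(cwnd, segments_left)
    segments_left -= sent
    cwnd += sent // 2                 # delayed ACKs: roughly one increase per two segments ACKed
    rtts += 1

print(f"~{rtts} round trips of data transfer "
      "(plus connection setup and the request itself; the slide observes 19 rtt in total)")
```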

Slide 12: TCP Activity, Alberta-CERN Req-Resp, TCP Stack with No cwnd Reduction
- Round trip time 150 ms; 64-byte request (green), 1 Mbyte response (blue)
- TCP starts in slow start: the 1st event takes 11 rtt, or ~1.67 s
- The TCP congestion window is in slow start until ~1.8 s, then congestion avoidance
- A response takes 2 rtt after ~2.5 s; rate 2.2/s (with a 50 ms wait)
- Achievable transfer throughput grows slowly from 250 to 800 Mbit/s

Slide 13: Moving Constant Bit-Rate Data in Real Time for Very Long Baseline Interferometry
Stephen Kershaw, Ralph Spencer, Matt Strong, Simon Casey, Richard Hughes-Jones, The University of Manchester

Slide 14: What is VLBI?
- The data wave front is sent over the network to the correlator
- [Diagram: VLBI signal wave front arriving at telescopes separated by a baseline]
- Resolution improves with baseline length; sensitivity depends on the bandwidth B as much as on the integration time τ: we can use as many Gigabits as we can get!
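
The relations behind that statement are not spelled out on the slide; the standard radio-interferometry forms are roughly:

```latex
% Angular resolution of an interferometer with baseline D at wavelength \lambda,
% and the radiometer equation for the noise in the correlated signal:
\theta \approx \frac{\lambda}{D}, \qquad
\Delta S \propto \frac{1}{\sqrt{B\,\tau}}
% so doubling the bandwidth B buys the same sensitivity as doubling the integration time \tau.
```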

Slide 15: European e-VLBI Test Topology
[Map of the test network linking Jodrell Bank (UK), Dwingeloo (Netherlands), Medicina (Italy), Torun (Poland), Metsähovi (Finland), Onsala (Sweden) and Chalmers University of Technology, Gothenburg; link labels include "Gbit link", "Dedicated DWDM link" and "2 x 1 Gbit links"]

Slide 16: CBR Test Setup

Slide 17: CBR over TCP: Timely Arrival of Data
- Can TCP deliver the data on time?
- Effect of loss rate on message arrival time: TCP buffer 1.8 MB (BDP), RTT 27 ms
- When there is packet loss, TCP decreases the rate: TCP buffer 0.9 MB (BDP), RTT 15.2 ms

Slide 18: [Plot of arrival time vs message number: a packet loss causes a delay in the stream relative to the expected arrival time at the constant bit rate, followed by resynchronisation]

Slide 19: CBR over TCP – Large TCP Buffer
- Message size: 1448 bytes; data rate: 525 Mbit/s
- Route: Manchester - JIVE; RTT 15.2 ms; TCP buffer 160 MB
- Drop 1 in 1.12 million packets
- Throughput increases: peak throughput ~734 Mbit/s, minimum throughput ~252 Mbit/s

Slide 20: CBR over TCP – Message Delay
- Message size: 1448 bytes; data rate: 525 Mbit/s
- Route: Manchester - JIVE; RTT 15.2 ms; TCP buffer 160 MB
- Drop 1 in 1.12 million packets
- Peak delay ~2.5 s (see the check below)
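
A ~2.5 s peak delay is about what a 160 MB send buffer can hold at the 525 Mbit/s constant bit rate, so the delay peak is plausibly the socket buffer filling while TCP recovers. That reading is an interpretation, but the arithmetic is easy to check:

```python
# How much CBR data fits in the TCP buffer, expressed as time? (consistency check, not from the slide)
cbr_rate_bit_s = 525e6        # constant bit-rate source, 525 Mbit/s
tcp_buffer_bytes = 160e6      # 160 MB TCP buffer

buffered_seconds = tcp_buffer_bytes * 8 / cbr_rate_bit_s
print(f"a full buffer corresponds to ~{buffered_seconds:.1f} s of data")   # ~2.4 s, close to the observed ~2.5 s peak delay
```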

Slide 21: [No transcript text; image-only slide]

Slide 22: Summary & Conclusions
- Standard TCP is not optimum for high-throughput, long-distance links
- Packet loss is a killer for TCP: check campus links & equipment and the access links to backbones; users need to collaborate with the campus network teams; Dante PERT
- New stacks are stable and give better response & performance: still need to set the TCP buffer sizes (see the sketch below); check other kernel settings, e.g. the window-scale maximum; watch for "TCP stack implementation enhancements"
- TCP tries to be fair: a large MTU has an advantage; short distances (small RTT) have an advantage
- TCP does not share bandwidth well with other streams
- The end hosts themselves: plenty of CPU power is required for the TCP/IP stack as well as the application; packets can be lost in the IP stack due to lack of processing power; the interaction between hardware, protocol processing, and the disk sub-system is complex
- Application architecture & implementation are also important: the TCP protocol dynamics strongly influence the behaviour of the application
- Users are now able to perform sustained 1 Gbit/s transfers
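
Setting the TCP buffer explicitly in the application is straightforward; below is a minimal Python sketch. The buffer size and endpoint are placeholders, and the kernel must also permit buffers this large through its own limits and window-scaling settings.

```python
# Request a large TCP socket buffer before connecting (sketch; size and host are placeholders).
import socket

BUFFER = 22 * 1024 * 1024                     # e.g. ~BDP for 1 Gbit/s at 177 ms
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUFFER)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUFFER)

# The kernel may clamp the request; check what was actually granted:
print("send buffer:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
print("recv buffer:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))

# s.connect(("remote.example.org", 5001))     # placeholder endpoint
```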

Slide 23: Any Questions?

Slide 24: Network Switch Limits Behaviour
- End-to-end UDP packets from udpmon
- Only 700 Mbit/s throughput; lots of packet loss
- The packet loss distribution shows the throughput is limited (see the sketch below)
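
udpmon itself is not reproduced here; the following is only a rough Python sketch of the same kind of end-to-end UDP probe, with the receiver counting packets to estimate throughput and loss. The addresses, port, packet size, and packet count are illustrative assumptions.

```python
# Very rough udpmon-style UDP probe (sketch, not the real tool). Start receive() first, then send().
import socket, time

ADDR = ("192.0.2.10", 6000)   # illustrative receiver address
SIZE, COUNT = 1472, 100_000   # 1472-byte payloads fit a 1500-byte MTU with IP/UDP headers

def send():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = bytes(SIZE)
    t0 = time.time()
    for i in range(COUNT):
        s.sendto(i.to_bytes(4, "big") + payload[4:], ADDR)
    print(f"offered {COUNT * SIZE * 8 / (time.time() - t0) / 1e6:.0f} Mbit/s")

def receive(timeout=5.0):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("", ADDR[1]))
    s.settimeout(timeout)
    got, t0 = 0, None
    try:
        while True:
            s.recvfrom(65536)
            if t0 is None:
                t0 = time.time()
            got += 1
    except socket.timeout:
        pass
    if t0 is not None:
        elapsed = time.time() - t0 - timeout    # exclude the final timeout wait
        print(f"received {got}/{COUNT} packets, loss {(COUNT - got) / COUNT:.2%}, "
              f"~{got * SIZE * 8 / elapsed / 1e6:.0f} Mbit/s")
```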

Slide 25: LightPath Topologies

Slide 26: Switched LightPaths [1]
- Lightpaths are a fixed point-to-point path or circuit: optical links (with FEC) have a BER of 10^-16, i.e. a packet loss rate of ~10^-12, or 1 loss in about 160 days (see the check below); in SJ5 lightpaths are known as Bandwidth Channels
- Host-to-host lightpath: one application, no congestion; advanced TCP stacks for large delay-bandwidth products
- Lab-to-lab lightpaths: many applications share the path, with classic congestion points; TCP stream sharing and recovery; advanced TCP stacks
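
A quick order-of-magnitude check of that claim (illustrative only: the packet size and sustained traffic rate are assumptions, chosen so that the result lands near the slide's ~160 days):

```python
# BER -> packet loss rate -> mean time between losses (order-of-magnitude check, assumed traffic figures).
ber = 1e-16
packet_bits = 1500 * 8                     # assume 1500-byte packets
sustained_rate_bit_s = 0.7e9               # assume ~700 Mbit/s of sustained traffic on the 1 Gbit path

packet_loss_rate = ber * packet_bits       # ~1.2e-12, i.e. the slide's ~10^-12
seconds_between_losses = 1 / (ber * sustained_rate_bit_s)
print(f"packet loss rate ~ {packet_loss_rate:.1e}")
print(f"one loss every ~ {seconds_between_losses / 86400:.0f} days")   # ~165 days at this traffic rate
```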

Slide 27: Switched LightPaths [2]
- Some applications suffer when using TCP and may prefer to use UDP, DCCP, XCP, ...
- E.g. with e-VLBI the data wave-front gets distorted and the correlation fails
- User-controlled lightpaths: Grid scheduling of CPUs & network; many application flows; no congestion on each path; lightweight framing possible

Slide 28: Test of TCP Sharing: Methodology (1 Gbit/s)
- Chose 3 paths from SLAC (California): Caltech (10 ms), Univ. Florida (80 ms), CERN (180 ms)
- Used iperf/TCP and UDT/UDP to generate traffic, with ICMP/ping traffic at 1/s alongside
- Each run was 16 minutes, in 7 regions
[Diagram: iperf or UDT flows from SLAC to CERN through a TCP/UDP bottleneck, with 1/s ping traffic; regions of 2 and 4 minutes]
(Les Cottrell & RHJ, PFLDnet 2005)

Slide 29: TCP Reno, Single Stream (SLAC to CERN)
- Low performance on fast, long-distance paths: AIMD (add a=1 packet to cwnd per RTT; decrease cwnd by factor b=0.5 on congestion)
- Net effect: it recovers slowly and does not effectively use the available bandwidth, so poor throughput and unequal sharing (see the sketch below)
- Increase recovery rate
- Congestion has a dramatic effect and recovery is slow; the RTT increases when it achieves its best throughput
- Remaining flows do not take up the slack when a flow is removed
(Les Cottrell & RHJ, PFLDnet 2005)
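
To see why AIMD with a=1, b=0.5 recovers slowly on a long path, a toy congestion-window model is enough; the link rate, RTT, and single-loss scenario below are assumptions chosen to resemble the SLAC-CERN path, not measured values.

```python
# Toy AIMD (Reno-style) congestion-avoidance model on a long fat path (illustrative parameters).
MSS = 1460                    # bytes per segment
rtt = 0.18                    # ~180 ms SLAC-CERN round trip
link_rate = 1e9               # 1 Gbit/s bottleneck
cwnd_for_link = link_rate * rtt / (8 * MSS)   # ~15,400 segments needed to fill the pipe

cwnd = cwnd_for_link          # start with the pipe full...
cwnd *= 0.5                   # ...then a single loss halves cwnd (b = 0.5)

rtts = 0
while cwnd < cwnd_for_link:   # additive increase: a = 1 segment per RTT
    cwnd += 1
    rtts += 1

print(f"pipe needs ~{cwnd_for_link:.0f} segments in flight")
print(f"after one loss, recovery takes ~{rtts} RTTs = {rtts * rtt / 60:.0f} minutes")
```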

Slide 30: Hamilton TCP
- One of the best performers: throughput is high
- Big effects on the RTT when it achieves its best throughput
- Flows share equally; two flows share equally on SLAC-CERN
- Appears to need >1 flow to achieve the best throughput
- >2 flows appears less stable

