1
ESLEA Technical Collaboration Meeting, 20-21 Jun 2006, R. Hughes-Jones, Manchester
Protocols: Recent and Current Work
Richard Hughes-Jones, The University of Manchester
www.hep.man.ac.uk/~rich/ then "Talks"
2
Outline
- SC|05
  - TCP and UDP memory-to-memory & disk-to-disk flows
  - 10 Gbit Ethernet
- VLBI
  - Jodrell Mark5 problem (see Matt's talk)
  - Data delay on a TCP link: how suitable is TCP? 4th Year MPhys Project, Stephen Kershaw & James Keenan
  - Throughput on the 630 Mbit JB-JIVE UKLight link
  - 10 Gbit in FABRIC
- ATLAS
  - Network tests on Manchester T2 farm
  - The Manc-Lanc UKLight link
  - ATLAS remote farms
- RAID tests
  - HEP server, 8 lane PCIe RAID card
3
Collaboration at SC|05
- SCINet
- Caltech Booth
- The BWC at the SLAC Booth
- ESLEA Boston Ltd. & Peta-Cache Sun
- StorCloud
4
Bandwidth Challenge wins Hat Trick
- The maximum aggregate bandwidth was >151 Gbit/s
  - 130 DVD movies in a minute
  - Serve 10,000 MPEG2 HDTV movies in real time
- 22 10 Gigabit Ethernet waves: Caltech & SLAC/FERMI booths
  - In 2 hours transferred 95.37 TByte
  - In 24 hours moved ~475 TBytes
- Showed real-time particle event analysis
- SLAC Fermi UK Booth:
  - 1 10 Gbit Ethernet to UK, NLR & UKLight: transatlantic HEP disk to disk; VLBI streaming
  - 2 10 Gbit links to SLAC: rootd low-latency file access application for clusters; Fibre Channel StorCloud
  - 4 10 Gbit links to Fermi: dCache data transfers
[Plot: traffic in to booth and out of booth; SC2004 level marked at 101 Gbit/s]
5
ESLEA and UKLight
- 6 * 1 Gbit transatlantic Ethernet layer 2 paths: UKLight + NLR
- Disk-to-disk transfers with bbcp, Seattle to UK
  - Set TCP buffer and application to give ~850 Mbit/s
  - One stream of data: 840-620 Mbit/s
- Stream UDP VLBI data, UK to Seattle
  - 620 Mbit/s
[Plot label: Reverse TCP]
6
SLAC 10 Gigabit Ethernet
- 2 lightpaths:
  - Routed over ESnet
  - Layer 2 over Ultra Science Net
- 6 Sun V20Z systems per λ
- dcache remote disk data access
  - 100 processes per node
  - Node sends or receives
  - One data stream: 20-30 Mbit/s
- Used Neterion NICs & Chelsio TOE
- Data also sent to StorCloud using Fibre Channel links
- Traffic on the 10 GE link for 2 nodes: 3-4 Gbit/s per node, 8.5-9 Gbit/s on trunk
7
VLBI Work
TCP Delay and VLBI Transfers
Manchester 4th Year MPhys Project by Stephen Kershaw & James Keenan
8
VLBI Network Topology
9
VLBI Application Protocol
- VLBI data is Constant Bit Rate (CBR)
- tcpdelay: an instrumented TCP program that emulates sending CBR data and records the relative 1-way delay
- Segment time on wire = bits in segment / BW
- Remember the Bandwidth*Delay Product: BDP = RTT*BW
[Diagrams: sender and receiver timelines showing Data1 to Data4 and Timestamp1 to Timestamp5 passing through TCP & network, with a packet loss marked; segment and ACK exchange over one RTT]
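The two formulas on this slide can be checked with a few lines of Python. This is only an illustrative sketch: the 1 Gbit/s link speed is an assumption (the slide does not state one), the 26 ms RTT and 1448 byte message size are taken from the later test slides, and the function names are made up for this example.

```python
def segment_time_on_wire(segment_bytes: int, bandwidth_bps: float) -> float:
    """Seconds needed to clock one segment onto the wire: bits / BW."""
    return segment_bytes * 8 / bandwidth_bps


def bandwidth_delay_product(rtt_s: float, bandwidth_bps: float) -> float:
    """Bytes that must be in flight to keep the path full: BDP = RTT * BW."""
    return rtt_s * bandwidth_bps / 8


if __name__ == "__main__":
    bw = 1e9      # ASSUMPTION: 1 Gbit/s path
    rtt = 0.026   # ~26 ms, as quoted for the Man-ukl-JIVE-prod-Man route
    print(f"segment time: {segment_time_on_wire(1448, bw) * 1e6:.2f} us")
    print(f"BDP: {bandwidth_delay_product(rtt, bw) / 1e6:.2f} MByte")
```

A TCP buffer smaller than the BDP cannot keep the link busy for a whole round trip, which is the effect examined in the send-time measurements.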
10
Send time: 10,000 packets
Check the send time
- 10,000 messages
- Message size: 1448 bytes
- Wait time: 0
- TCP buffer: 64k
- Route: Man-ukl-JIVE-prod-Man
- RTT ~26 ms
- Slope 0.44 ms/message
- From TCP buffer size & RTT, expect ~42 messages/RTT, i.e. ~0.6 ms/message
[Plot: send time (s) vs message number; scale marker 1 sec]
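The expected figures on this slide follow from the fact that a full send buffer lets at most one buffer's worth of data leave per round trip. A rough sketch of that arithmetic is below; it assumes "64k" means 65536 bytes and ignores header and buffer-accounting overheads, so it gives about 45 messages per RTT rather than the slide's ~42. The function name is hypothetical.

```python
def window_limited(buffer_bytes: int, message_bytes: int, rtt_s: float):
    """Return (messages per RTT, seconds per message, throughput in bit/s)
    for a sender limited to one buffer of data per round trip."""
    msgs_per_rtt = buffer_bytes / message_bytes
    time_per_msg = rtt_s / msgs_per_rtt
    throughput_bps = buffer_bytes * 8 / rtt_s
    return msgs_per_rtt, time_per_msg, throughput_bps


if __name__ == "__main__":
    # ASSUMPTION: 64k taken as 65536 bytes, no overheads
    msgs, t_msg, rate = window_limited(65536, 1448, 0.026)
    print(f"{msgs:.1f} messages/RTT, {t_msg * 1e3:.2f} ms/message, "
          f"{rate / 1e6:.1f} Mbit/s")
```

The result of roughly 0.57 ms/message agrees with the slide's estimate of ~0.6 ms/message.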
11
Send Time Detail
- TCP send buffer limited
- After slow start the buffer is full
- Packets are sent out in bursts each RTT
- Program blocked on sendto()
[Plot: send time (s) vs message number; annotations: Message 76, Message 102, 26 messages, about 25 us, one RTT; scale marker 100 ms]
12
1-Way Delay: 10,000 packets
- 10,000 messages
- Message size: 1448 bytes
- Wait time: 0
- TCP buffer: 64k
- Route: Man-ukl-JIVE-prod-Man
- RTT ~26 ms
[Plot: 1-way delay vs message number; scale markers 100 ms]
13
1-Way Delay Detail
- Why not just 1 RTT?
- After slow start the TCP buffer is full
- Messages at the front of the TCP send buffer have to wait for the next burst of ACKs, 1 RTT later
- Messages further back in the TCP send buffer wait for 2 RTT
[Plot: 1-way delay vs message number; annotations: 26 ms, = 1 x RTT, = 1.5 x RTT, ≠ 0.5 x RTT; scale markers 10 ms]
14
1-Way Delay with packet drop
- Route: LAN gig8-gig1
- Ping 188 μs
- 10,000 messages
- Message size: 1448 bytes
- Wait times: 0 μs
- Drop 1 in 1000
- Manc-JIVE tests show times increasing with a "saw-tooth" around 10 s
[Plot: 1-way delay vs message number; annotations: 800 us, 28 ms; scale markers 5 ms and 10 ms]
15
10 Gbit in FABRIC
16
FABRIC 4 Gbit Demo
- 4 Gbit lightpath between GÉANT PoPs
- Collaboration with Dante
- Continuous (days) data flows: VLBI_UDP and multi-Gigabit TCP tests
17
10 Gigabit Ethernet: UDP Data transfer on PCI-X
- Sun V20z 1.8 GHz to 2.6 GHz dual Opterons
- Connect via 6509
- XFrame II NIC
- PCI-X mmrbc 2048 bytes, 66 MHz
- One 8000 byte packet
  - 2.8 us for CSRs
  - 24.2 us data transfer, effective rate 2.6 Gbit/s
- 2000 byte packet, wait 0 us: ~200 ms pauses
- 8000 byte packet, wait 0 us: ~15 ms between data blocks
[Logic analyser traces: CSR access (2.8 us), data transfer]
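The quoted effective rate can be reproduced from the bus timings. The sketch below is illustrative only and the function name is made up; it shows that 2.6 Gbit/s corresponds to the 24.2 us data-transfer phase alone, and that adding the 2.8 us of CSR access lowers the per-packet rate.

```python
def pci_rate_gbps(packet_bytes: int, transfer_us: float,
                  overhead_us: float = 0.0) -> float:
    """Rate in Gbit/s for one packet crossing the bus.

    bits / nanoseconds is numerically equal to Gbit/s.
    """
    return packet_bytes * 8 / ((transfer_us + overhead_us) * 1e3)


if __name__ == "__main__":
    print(f"data transfer only: {pci_rate_gbps(8000, 24.2):.2f} Gbit/s")
    print(f"including CSR access: {pci_rate_gbps(8000, 24.2, 2.8):.2f} Gbit/s")
```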
18
ATLAS
19
ESLEA: ATLAS on UKLight
- 1 Gbit lightpath Lancaster-Manchester
- Disk-to-disk transfers
- Storage Element with SRM using distributed disk pools: dCache & xrootd
20
udpmon: Lanc-Manc Throughput
- Lanc to Manc
  - Plateau ~640 Mbit/s wire rate
  - No packet loss
- Manc to Lanc
  - ~800 Mbit/s but packet loss
- Send times
  - Pause of 695 μs every 1.7 ms
  - So expect ~600 Mbit/s
- Receive times (Manc end)
  - No corresponding gaps
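The "expect ~600 Mbit/s" estimate is a duty-cycle calculation: the sender is idle for 695 μs out of every 1.7 ms. A minimal sketch follows, assuming the sender otherwise runs at the 1 Gbit/s line rate of the lightpath (the slide does not state the rate used) and using a hypothetical function name.

```python
def expected_rate_bps(line_rate_bps: float, pause_s: float,
                      period_s: float) -> float:
    """Average rate of a sender that pauses for pause_s in every period_s."""
    active_fraction = (period_s - pause_s) / period_s
    return line_rate_bps * active_fraction


if __name__ == "__main__":
    # ASSUMPTION: 1 Gbit/s while actively sending
    rate = expected_rate_bps(1e9, 695e-6, 1.7e-3)
    print(f"expected throughput: {rate / 1e6:.0f} Mbit/s")
```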
21
udpmon: Manc-Lanc Throughput
- Manc to Lanc
  - Plateau ~890 Mbit/s wire rate
- Packet loss
  - Large frames: 10% when at line rate
  - Small frames: 60% when at line rate
- 1-way delay
22
ATLAS Remote Computing: Application Protocol
- Event request
  - EFD requests an event from SFI
  - SFI replies with the event, ~2 Mbytes
- Processing of event
- Return of computation
  - EF asks SFO for buffer space
  - SFO sends OK
  - EF transfers results of the computation
- tcpmon: an instrumented TCP request-response program that emulates the Event Filter EFD to SFI communication
[Diagram: timeline between the Event Filter Daemon (EFD) and SFI/SFO: request event, send event data, process event, request buffer, send OK, send processed event; request-response time is histogrammed]
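To make the request-response pattern concrete, here is a small Python sketch that times fixed-size requests and responses over a loopback TCP connection. It is not tcpmon and does not reproduce its instrumentation; the function names, the loopback setup and the 1 Mbyte response size (borrowed from the Manc-CERN tests) are all assumptions made for illustration.

```python
import socket
import threading
import time

REQUEST_BYTES = 64           # request size used in the tcpmon tests
RESPONSE_BYTES = 1_000_000   # ASSUMPTION: ~1 Mbyte response


def recv_exact(sock: socket.socket, nbytes: int) -> bytes:
    """Read exactly nbytes from sock, or raise if the peer closes early."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def serve(listener: socket.socket, n_requests: int) -> None:
    """Hypothetical responder: answer each request with a fixed-size reply."""
    conn, _ = listener.accept()
    with conn:
        payload = b"\0" * RESPONSE_BYTES
        for _ in range(n_requests):
            recv_exact(conn, REQUEST_BYTES)
            conn.sendall(payload)


def measure(n_requests: int = 5, wait_s: float = 0.0) -> list:
    """Return the request-response time in seconds for each request."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # loopback, port chosen by the OS
    listener.listen(1)
    server = threading.Thread(target=serve, args=(listener, n_requests),
                              daemon=True)
    server.start()
    times = []
    with socket.create_connection(listener.getsockname()) as client:
        for _ in range(n_requests):
            start = time.perf_counter()
            client.sendall(b"\0" * REQUEST_BYTES)
            recv_exact(client, RESPONSE_BYTES)
            times.append(time.perf_counter() - start)
            time.sleep(wait_s)        # idle gap between events
    server.join(timeout=5)
    listener.close()
    return times


if __name__ == "__main__":
    for i, t in enumerate(measure()):
        print(f"request {i}: {t * 1e3:.2f} ms")
```

On loopback the times are dominated by copying rather than round trips; over a long path, varying `wait_s` is where congestion-window behaviour after idle periods would become visible.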
23
tcpmon: TCP Activity Manc-CERN Req-Resp
- Web100 hooks for TCP status
- Round trip time 20 ms
- 64 byte request (green), 1 Mbyte response (blue)
- TCP in slow start: 1st event takes 19 RTT or ~380 ms
- TCP congestion window gets reset on each request
  - TCP stack: RFC 2581 & RFC 2861 reduction of cwnd after inactivity
- Even after 10 s, each response takes 13 RTT or ~260 ms
- Transfer achievable throughput 120 Mbit/s
- Event rate very low
- Application not happy!
24
tcpmon: TCP Activity Manc-CERN Req-Resp, no cwnd reduction
- Round trip time 20 ms
- 64 byte request (green), 1 Mbyte response (blue)
- TCP starts in slow start: 1st event takes 19 RTT or ~380 ms
- TCP congestion window grows nicely
- Response takes 2 RTT after ~1.5 s
- Rate ~10/s (with 50 ms wait)
- Transfer achievable throughput grows to 800 Mbit/s
- Data transferred WHEN the application requires the data
[Plot annotations: 3 round trips, 2 round trips]
25
Recent RAID Tests
Manchester HEP Server
26
"Server Quality" Motherboards
- Boston/Supermicro H8DCi
- Two dual-core Opterons, 1.8 GHz
- 550 MHz DDR memory
- HyperTransport
- Chipset: nVidia nForce Pro 2200/2050
- AMD 8132 PCI-X bridge
- PCI
  - 2 16 lane PCIe buses
  - 1 4 lane PCIe
  - 133 MHz PCI-X
- 2 Gigabit Ethernet
- SATA
27
Disk_test: Areca PCI-Express 8 port, Maxtor 300 GB SATA disks

Configuration         Read          Write
RAID0, 5 disks        2.5 Gbit/s    1.8 Gbit/s
RAID5, 5 data disks   1.7 Gbit/s    1.48 Gbit/s
RAID6, 5 data disks   2.1 Gbit/s    1.0 Gbit/s
28
Any Questions?
29
More Information: Some URLs 1
- UKLight web site: http://www.uklight.ac.uk
- MB-NG project web site: http://www.mb-ng.net/
- DataTAG project web site: http://www.datatag.org/
- UDPmon / TCPmon kit + writeup: http://www.hep.man.ac.uk/~rich/net
- Motherboard and NIC tests: http://www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt & http://datatag.web.cern.ch/datatag/pfldnet2003/
  "Performance of 1 and 10 Gigabit Ethernet Cards with Server Quality Motherboards", FGCS Special issue 2004, http://www.hep.man.ac.uk/~rich/
- TCP tuning information may be found at: http://www.ncne.nlanr.net/documentation/faq/performance.html & http://www.psc.edu/networking/perf_tune.html
- TCP stack comparisons: "Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks", Journal of Grid Computing 2004
- PFLDnet: http://www.ens-lyon.fr/LIP/RESO/pfldnet2005/
- Dante PERT: http://www.geant2.net/server/show/nav.00d00h002
30
More Information: Some URLs 2
- Lectures, tutorials etc. on TCP/IP:
  - www.nv.cc.va.us/home/joney/tcp_ip.htm
  - www.cs.pdx.edu/~jrb/tcpip.lectures.html
  - www.raleigh.ibm.com/cgi-bin/bookmgr/BOOKS/EZ306200/CCONTENTS
  - www.cisco.com/univercd/cc/td/doc/product/iaabu/centri4/user/scf4ap1.htm
  - www.cis.ohio-state.edu/htbin/rfc/rfc1180.html
  - www.jbmelectronics.com/tcp.htm
- Encyclopaedia: http://www.freesoft.org/CIE/index.htm
- TCP/IP resources: www.private.org.il/tcpip_rl.html
- Understanding IP addresses: http://www.3com.com/solutions/en_US/ncs/501302.html
- Configuring TCP (RFC 1122): ftp://nic.merit.edu/internet/documents/rfc/rfc1122.txt
- Assigned protocols, ports etc. (RFC 1010): http://www.es.net/pub/rfcs/rfc1010.txt & /etc/protocols
31
Backup Slides
32
SuperComputing
33
SC2004: Disk-Disk bbftp
- bbftp file transfer program uses TCP/IP
- UKLight path: London-Chicago-London; PCs: Supermicro + 3Ware RAID0
- MTU 1500 bytes; socket size 22 Mbytes; RTT 177 ms; SACK off
- Move a 2 Gbyte file
- Web100 plots:
  - Standard TCP: average 825 Mbit/s (bbcp: 670 Mbit/s)
  - Scalable TCP: average 875 Mbit/s (bbcp: 701 Mbit/s, ~4.5 s of overhead)
- Disk-TCP-Disk at 1 Gbit/s is here!
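The 22 Mbyte socket size matches the bandwidth*delay product of a 1 Gbit/s path with a 177 ms RTT. The sketch below shows one way such a buffer could be requested from Python; the 1 Gbit/s figure is an assumption, the helper names are hypothetical, and the operating system may clamp or reject the request depending on its configured limits, so the granted sizes are read back rather than assumed.

```python
import socket


def bdp_bytes(rtt_s: float, bandwidth_bps: float) -> int:
    """Bandwidth*delay product in bytes, rounded to the nearest byte."""
    return round(rtt_s * bandwidth_bps / 8)


def request_buffers(sock: socket.socket, nbytes: int):
    """Ask for send/receive buffers of nbytes and return what was granted."""
    for opt in (socket.SO_SNDBUF, socket.SO_RCVBUF):
        try:
            sock.setsockopt(socket.SOL_SOCKET, opt, nbytes)
        except OSError:
            pass  # some systems refuse sizes above their configured maximum
    return (sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
            sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))


if __name__ == "__main__":
    want = bdp_bytes(0.177, 1e9)   # ASSUMPTION: 1 Gbit/s path
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        snd, rcv = request_buffers(s, want)
    print(f"requested {want} bytes, granted send={snd} recv={rcv}")
```

Buffers should be set before connecting so that TCP window scaling can be negotiated for the larger window.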
34
SC|05 HEP: Moving data with bbcp
- What is the end-host doing with your network protocol?
- Look at the PCI-X
- 3Ware 9000 controller RAID0
- 1 Gbit Ethernet link
- 2.4 GHz dual Xeon
- ~660 Mbit/s
- Power needed in the end hosts
- Careful application design
[Logic analyser traces: PCI-X bus with RAID controller, read from disk for 44 ms every 100 ms; PCI-X bus with Ethernet NIC, write to network for 72 ms]
35
10 Gigabit Ethernet: UDP Throughput
- 1500 byte MTU gives ~2 Gbit/s
- Used 16144 byte MTU, max user length 16080
- DataTAG Supermicro PCs
  - Dual 2.2 GHz Xeon CPU, FSB 400 MHz
  - PCI-X mmrbc 512 bytes
  - Wire rate throughput of 2.9 Gbit/s
- CERN OpenLab HP Itanium PCs
  - Dual 1.0 GHz 64 bit Itanium CPU, FSB 400 MHz
  - PCI-X mmrbc 4096 bytes
  - Wire rate of 5.7 Gbit/s
- SLAC Dell PCs
  - Dual 3.0 GHz Xeon CPU, FSB 533 MHz
  - PCI-X mmrbc 4096 bytes
  - Wire rate of 5.4 Gbit/s
36
10 Gigabit Ethernet: Tuning PCI-X
- 16080 byte packets every 200 µs
- Intel PRO/10GbE LR Adapter
- PCI-X bus occupancy vs mmrbc
  - Measured times
  - Times based on PCI-X times from the logic analyser
  - Expected throughput ~7 Gbit/s
  - Measured 5.7 Gbit/s
[Logic analyser traces for mmrbc 512, 1024, 2048 and 4096 bytes (5.7 Gbit/s at 4096 bytes); PCI-X sequence: CSR access, data transfer, interrupt & CSR update]
37
10 Gigabit Ethernet: TCP Data transfer on PCI-X
- Sun V20z 1.8 GHz to 2.6 GHz dual Opterons
- Connect via 6509
- XFrame II NIC
- PCI-X mmrbc 4096 bytes, 66 MHz
- Two 9000 byte packets back-to-back
  - Average rate 2.87 Gbit/s
- Burst of packets: length 646.8 us
- Gap between bursts: 343 us
- 2 interrupts / burst
[Logic analyser traces: CSR access, data transfer]
38
TCP on the 630 Mbit Link
Jodrell, UKLight, JIVE
39
TCP Throughput on 630 Mbit UKLight
- Manchester gig7 to JBO mk5 606
- 4 Mbyte TCP buffer
[Plots: test 0 (annotations: Dup ACKs seen, other reductions), test 1, test 2]
40
Comparison of Send Time & 1-way delay
[Plot: send time (s) vs message number; annotations: Message 76, Message 102, 26 messages; scale marker 100 ms]
41
1-Way Delay: 1448 byte msg
- Route: Man-ukl-ams-prod-man
- RTT 27 ms
- 10,000 messages
- Message size: 1448 bytes
- Wait times: 0 μs
- BDP = 3.4 MByte
- TCP buffer 10 MByte
- Web100 plot
  - Starts after 5.6 s due to clock sync
  - ~400 pkts/10 ms
  - Rate similar to iperf
[Plot: 1-way delay vs message number; scale marker 50 ms]
42
Related Work: RAID, ATLAS Grid
- RAID0 and RAID5 tests
  - 4th Year MPhys project last semester
  - Throughput and CPU load
  - Different RAID parameters: number of disks, stripe size, user read / write size
  - Different file systems: ext2, ext3, XFS
  - Sequential file write, read
  - Sequential file write, read with continuous background read or write
- Status
  - Need to check some results & document
  - Independent RAID controller tests planned
43
HEP: Service Challenge 4
- Objective: demo 1 Gbit/s aggregate bandwidth between RAL and 4 Tier 2 sites
- RAL has SuperJANET4 and UKLight links
- RAL capped firewall traffic at 800 Mbit/s
- SuperJANET sites: Glasgow, Manchester, Oxford, QMWL
- UKLight site: Lancaster
- Many concurrent transfers from RAL to each of the Tier 2 sites
- Applications able to sustain high rates
- SuperJANET5, UKLight & new access links very timely
[Plot annotations: ~700 Mbit UKLight; peak 680 Mbit SJ4]
44
Network switch limits behaviour
- End2end UDP packets from udpmon
  - Only 700 Mbit/s throughput
  - Lots of packet loss
  - Packet loss distribution shows throughput limited