Slide: 1 Richard Hughes-Jones PFLDnet2005 Lyon Feb 05 R. Hughes-Jones Manchester 1 Investigating the interaction between high-performance network and disk.

Slide 1: Investigating the interaction between high-performance network and disk sub-systems
Richard Hughes-Jones, Stephen Dallison
The University of Manchester
PFLDnet2005, Lyon, Feb 05
MB-NG

Slide 2: Introduction
- AIMD and High Bandwidth, Long Distance networks
  - The assumption that packet loss means congestion is well known
- Focus
  - Data-moving applications with different TCP stacks and network environments
  - The interaction between network hardware, protocol stack and disk sub-system
  - Almost a user view
- We studied
  - Different TCP stacks: standard, HSTCP, Scalable, H-TCP, BIC, Westwood
  - Several applications: bbftp, bbcp, Apache, GridFTP
  - 3 networks: MB-NG, SuperJANET4, UKLight
  - RAID0 and RAID5 controllers

Slide 3: Topology of the MB-NG Network
[Network diagram. Key: Gigabit Ethernet, 2.5 Gbit POS access, MPLS, admin domains. Labels: Manchester Domain, UCL Domain, RAL Domain, UKERNA Development Network; Cisco 7609 edge router and boundary routers; hosts man01, man02, man03, lon01, lon02, lon03, ral01, ral02; HW RAID.]

Slide 4: Topology of the Production Network
[Network diagram. Key: Gigabit Ethernet, 2.5 Gbit POS access, 10 Gbit POS. Labels: Manchester Domain, RAL Domain; hosts man01, ral01; HW RAID; "3 routers", "2 switches".]

Slide 5: SC2004 UKLIGHT Overview
[Network diagram. Labels: MB-NG 7600 OSR; Manchester; ULCC UKLight; UCL HEP; UCL network; K2; Ci; Chicago Starlight; Amsterdam; SC2004 Caltech Booth; UltraLight IP; SLAC Booth; Cisco 6509; Caltech 7600; UKLight 10G, four 1GE channels; UKLight 10G; Surfnet/EuroLink 10G, two 1GE channels; NLR Lambda NLR-PITT-STAR-10GE-16.]

Slide 6: Packet Loss with new TCP Stacks
- TCP Response Function
  - Throughput vs Loss Rate; further to right: faster recovery
  - Drop packets in kernel
[Plots: MB-NG, rtt 6 ms; DataTAG, rtt 120 ms]
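
For orientation, the response function of standard TCP can be sketched with the widely used square-root approximation, rate ≈ (MSS / RTT) · sqrt(3 / (2p)), where p is the packet loss rate. This is a minimal sketch, not the analysis used for the slide's plots. The 1460 byte MSS and the 1 Gbit/s cap are assumptions, and the new stacks (HighSpeed, Scalable and others) follow different curves that are not modelled here.

```python
import math

def standard_tcp_rate_mbit(loss_rate, rtt_s, mss_bytes=1460, line_rate_mbit=1000.0):
    """Approximate steady-state rate of standard TCP in Mbit/s.

    Uses the square-root approximation rate = (MSS/RTT) * sqrt(1.5/p),
    capped at an assumed line rate. mss_bytes and line_rate_mbit are
    illustrative assumptions, not values taken from the slides.
    """
    rate_bit_s = (mss_bytes * 8 / rtt_s) * math.sqrt(1.5 / loss_rate)
    return min(rate_bit_s / 1e6, line_rate_mbit)

# Compare the two round-trip times quoted on the slide.
for rtt_ms in (6, 120):
    for loss_rate in (1e-3, 1e-5, 1e-7):
        rate = standard_tcp_rate_mbit(loss_rate, rtt_ms / 1000)
        print(f"rtt {rtt_ms:3d} ms, loss {loss_rate:.0e}: {rate:7.1f} Mbit/s")
```

At a given loss rate the achievable rate scales as 1/RTT, which is why the long-distance path is so much more sensitive to loss than the 6 ms path.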

Slide 7: Packet Loss and new TCP Stacks
- TCP Response Function
  - UKLight London-Chicago-London, rtt 177 ms
  - 2.6.6 Kernel
- Agreement with theory good

Slide 8: iperf Throughput + Web100
- SuperMicro on MB-NG network
  - HighSpeed TCP
  - Line speed 940 Mbit/s
  - DupACK? <10 (expect ~400)
- BaBar on Production network
  - Standard TCP
  - 425 Mbit/s
  - DupACKs 350-400, re-transmits

Slide 9: End Systems: NICs & Disks

Slide 10: End Hosts & NICs, SuperMicro P4DP6
- Use UDP packets to characterise Host & NIC
  - SuperMicro P4DP6 motherboard
  - Dual Xeon 2.2 GHz CPU
  - 400 MHz system bus
  - 66 MHz 64 bit PCI bus
[Plots: Latency, Throughput, Bus Activity]

Slide 11: RAID Controller Performance
- RAID5 (striped with redundancy)
- 3Ware 7506 Parallel 66 MHz; 3Ware 7505 Parallel 33 MHz
- 3Ware 8506 Serial ATA 66 MHz; ICP Serial ATA 33/66 MHz
- Tested on Dual 2.2 GHz Xeon Supermicro P4DP8-G2 motherboard
- Disk: Maxtor 160GB 7200rpm 8MB Cache
- Read ahead kernel tuning: /proc/sys/vm/max-readahead = 512
- Rates for the same PC, RAID0 (striped): Read 1040 Mbit/s, Write 800 Mbit/s
[Plots: Disk-Memory Read Speeds, Memory-Disk Write Speeds]

Slide 12: SC2004 RAID Controller Performance
- Supermicro X5DPE-G2 motherboards loaned from Boston Ltd.
- Dual 2.8 GHz Xeon CPUs with 512 k byte cache and 1 M byte memory
- 3Ware 8506-8 controller on 133 MHz PCI-X bus
- Configured as RAID0, 64k byte stripe size
- Six 74.3 GByte Western Digital Raptor WD740 SATA disks
  - 75 Mbyte/s disk-buffer, 150 Mbyte/s buffer-memory
- Scientific Linux with 2.6.6 Kernel + altAIMD patch (Yee) + packet loss patch
- Read ahead kernel tuning: /sbin/blockdev --setra 16384 /dev/sda
- RAID0 (striped), 2 GByte file: Read 1500 Mbit/s, Write 1725 Mbit/s
[Plots: Disk-Memory Read Speeds, Memory-Disk Write Speeds]
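
As a rough aid to reading the read-ahead setting: `blockdev --setra` takes its argument in 512-byte sectors, so 16384 corresponds to 8 MiB of read-ahead. The sketch below does that conversion and relates it to the RAID0 geometry on the slide. It assumes "64k byte" means 64 KiB and that a full stripe spans all six disks; the function names are illustrative only.

```python
SECTOR_BYTES = 512  # unit used by `blockdev --setra`

def readahead_bytes(setra_sectors):
    """Convert a `blockdev --setra` value (512-byte sectors) to bytes."""
    return setra_sectors * SECTOR_BYTES

def full_stripe_bytes(disks, stripe_unit_bytes):
    """Bytes in one full RAID0 stripe across all member disks."""
    return disks * stripe_unit_bytes

readahead = readahead_bytes(16384)
stripe = full_stripe_bytes(disks=6, stripe_unit_bytes=64 * 1024)  # assumed 64 KiB unit
print(f"read-ahead: {readahead / 2**20:.0f} MiB")
print(f"full stripe: {stripe / 2**10:.0f} KiB")
print(f"read-ahead covers about {readahead / stripe:.1f} full stripes")
```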

Slide 13: Data Transfer Applications

Slide 14: bbftp: Host & Network Effects
- 2 Gbyte file, RAID5 disks: 1200 Mbit/s read, 600 Mbit/s write
- Scalable TCP
- BaBar + SuperJANET
  - Instantaneous 220 - 625 Mbit/s
- SuperMicro + SuperJANET
  - Instantaneous 400 - 665 Mbit/s for 6 sec
  - Then 0 - 480 Mbit/s
- SuperMicro + MB-NG
  - Instantaneous 880 - 950 Mbit/s for 1.3 sec
  - Then 215 - 625 Mbit/s

Slide 15: bbftp: What else is going on?
- Scalable TCP
- BaBar + SuperJANET
- SuperMicro + SuperJANET
- Congestion window, dupACK
- Variation not TCP related?
  - Disk speed / bus transfer
  - Application

Slide 16: Applications: Throughput Mbit/s
- HighSpeed TCP
- 2 GByte file, RAID5
- SuperMicro + SuperJANET
- bbcp
- bbftp
- Apache
- Gridftp
- Previous work used RAID0 (not disk limited)

Slide 17: Average Transfer Rates Mbit/s

App      TCP Stack   SuperMicro   SuperMicro on   BaBar on      SC2004 on
                     on MB-NG     SuperJANET4     SuperJANET4   UKLight
Iperf    Standard    940          350-370         425           940
         HighSpeed   940          510             570           940
         Scalable    940          580-650         605           940
bbcp     Standard    434          290-310         290
         HighSpeed   435          385             360
         Scalable    432          400-430         380
bbftp    Standard    400-410      325             320           825
         HighSpeed   370-390      380
         Scalable    430          345-532         380           875
apache   Standard    425          260             300-360
         HighSpeed   430          370             315
         Scalable    428          400             317
Gridftp  Standard    405          240
         HighSpeed   320
         Scalable    335

- New stacks give more throughput
- Rate decreases

Slide 18: SC2004 & Transfers with UKLight

Slide 19: SC2004 Disk-Disk bbftp (work in progress)
- bbftp file transfer program uses TCP/IP
- UKLight: Path: London-Chicago-London; PCs: Supermicro + 3Ware RAID0
- MTU 1500 bytes; Socket size 22 Mbytes; rtt 177 ms; SACK off
- Move a 2 Gbyte file
- Web100 plots:
  - Standard TCP: average 825 Mbit/s (bbcp: 670 Mbit/s)
  - Scalable TCP: average 875 Mbit/s (bbcp: 701 Mbit/s, ~4.5 s of overhead)
- Disk-TCP-Disk at 1 Gbit/s
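
The 22 Mbyte socket size is consistent with the bandwidth-delay product of a 1 Gbit/s path with a 177 ms round-trip time. The sketch below reproduces that arithmetic and shows the throughput ceiling a smaller window would impose. The 1 Gbit/s target rate and the 64 KiB comparison window are assumptions chosen for illustration.

```python
def bdp_bytes(rate_bit_s, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to fill the path."""
    return rate_bit_s * rtt_s / 8

def window_limited_rate_mbit(window_bytes, rtt_s):
    """Upper bound on TCP throughput when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_s / 1e6

rtt_s = 0.177          # London-Chicago-London rtt from the slide
target_bit_s = 1e9     # assumed 1 Gbit/s target rate

print(f"BDP at 1 Gbit/s: {bdp_bytes(target_bit_s, rtt_s) / 1e6:.1f} Mbytes")
print(f"64 KiB window ceiling: {window_limited_rate_mbit(64 * 1024, rtt_s):.2f} Mbit/s")
```

A socket buffer much smaller than the bandwidth-delay product caps the transfer rate regardless of which TCP stack is in use.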

Slide 20: SC2004 Disk-Disk bbftp (work in progress)
- UKLight: Path: London-Chicago-London; PCs: Supermicro + 3Ware RAID0
- MTU 1500 bytes; Socket size 22 Mbytes; rtt 177 ms; SACK off
- Move a 2 Gbyte file
- Web100 plots:
  - HS TCP
- Don't believe this is a protocol problem!

Slide 21: Network & Disk Interactions (work in progress)
- Hosts:
  - Supermicro X5DPE-G2 motherboards
  - Dual 2.8 GHz Xeon CPUs with 512 k byte cache and 1 M byte memory
  - 3Ware 8506-8 controller on 133 MHz PCI-X bus, configured as RAID0
  - Six 74.3 GByte Western Digital Raptor WD740 SATA disks, 64k byte stripe size
- Measure memory to RAID0 transfer rates with & without UDP traffic
  - Disk write: 1735 Mbit/s
  - Disk write + 1500 MTU UDP: 1218 Mbit/s, drop of 30%
  - Disk write + 9000 MTU UDP: 1400 Mbit/s, drop of 19%
[Plot: % CPU kernel mode]
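
The quoted drops follow directly from the measured write rates. A small sketch of the arithmetic, using only the three rates given on the slide (the function name is illustrative):

```python
def percent_drop(baseline_mbit, loaded_mbit):
    """Relative reduction in rate, as a percentage of the baseline."""
    return (baseline_mbit - loaded_mbit) / baseline_mbit * 100

baseline = 1735  # disk write alone, Mbit/s
for label, rate in (("1500 MTU UDP", 1218), ("9000 MTU UDP", 1400)):
    print(f"Disk write + {label}: {rate} Mbit/s, "
          f"drop of {percent_drop(baseline, rate):.1f}%")
```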

Slide 22: Network & Disk Interactions
- Disk Write, mem-disk: 1735 Mbit/s
  - Tends to be in 1 die
- Disk Write + UDP 1500, mem-disk: 1218 Mbit/s
  - Both dies at ~80%
- Disk Write + CPU mem, mem-disk: 1341 Mbit/s
  - 1 CPU at ~60%, other 20%
  - Large user mode usage
  - Below Cut = hi BW
  - Hi BW = die1 used
- Disk Write + CPUload, mem-disk: 1334 Mbit/s
  - 1 CPU at ~60%, other 20%
  - All CPUs saturated in user mode
[Plots: Total CPU load, Kernel CPU load]

Slide 23: Summary, Conclusions & Thanks
- Host is critical: motherboards, NICs, RAID controllers and disks matter
  - The NICs should be well designed:
    - NIC should use 64 bit 133 MHz PCI-X (66 MHz PCI can be OK)
    - NIC/drivers: CSR access / clean buffer management / good interrupt handling
  - Worry about the CPU-memory bandwidth as well as the PCI bandwidth
    - Data crosses the memory bus at least 3 times
  - Separate the data transfers: use motherboards with multiple 64 bit PCI-X buses
    - 32 bit 33 MHz is too slow for Gigabit rates
    - 64 bit 33 MHz > 80% used
  - Choose a modern high throughput RAID controller
    - Consider SW RAID0 of RAID5 HW controllers
- Need plenty of CPU power for sustained 1 Gbit/s transfers
- Packet loss is a killer
  - Check on campus links & equipment, and access links to backbones
- New stacks are stable and give better response & performance
  - Still need to set the TCP buffer sizes!
  - Check other kernel settings, e.g. window-scale
- Application architecture & implementation is also important
- Interaction between HW, protocol processing, and disk sub-system is complex
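
The remarks about PCI bus speeds can be made concrete with the theoretical peak of a parallel bus, which is simply width times clock. The sketch below compares that peak with a 1 Gbit/s stream. The assumption that the payload crosses the same bus twice (once from the RAID controller and once to the NIC) is illustrative and holds only when both devices share one bus; real buses also deliver well below the theoretical peak.

```python
def pci_peak_mbit(width_bits, clock_mhz):
    """Theoretical peak rate of a parallel PCI/PCI-X bus in Mbit/s."""
    return width_bits * clock_mhz

def bus_load_fraction(payload_mbit, width_bits, clock_mhz, crossings=2):
    """Fraction of the theoretical peak used when the payload crosses the
    bus `crossings` times (an assumed value, e.g. disk-to-memory and then
    memory-to-NIC on a shared bus)."""
    return payload_mbit * crossings / pci_peak_mbit(width_bits, clock_mhz)

for width, clock in ((32, 33), (64, 33), (64, 66), (64, 133)):
    peak = pci_peak_mbit(width, clock)
    load = bus_load_fraction(1000, width, clock)
    print(f"{width} bit {clock:3d} MHz: peak {peak:5d} Mbit/s, "
          f"1 Gbit/s crossing twice uses {load:.0%} of peak")
```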

Slide 24: More Information: Some URLs
- UKLight web site: http://www.uklight.ac.uk
- MB-NG project web site: http://www.mb-ng.net/
- DataTAG project web site: http://www.datatag.org/
- UDPmon / TCPmon kit + writeup: http://www.hep.man.ac.uk/~rich/net
- Motherboard and NIC Tests: www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt & http://datatag.web.cern.ch/datatag/pfldnet2003/
  "Performance of 1 and 10 Gigabit Ethernet Cards with Server Quality Motherboards", FGCS Special issue 2004
- TCP tuning information may be found at: http://www.ncne.nlanr.net/documentation/faq/performance.html & http://www.psc.edu/networking/perf_tune.html
- TCP stack comparisons: "Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks", Journal of Grid Computing 2004

Slide 25: Backup Slides

Slide 26: High Throughput Demonstrations
[Diagram. Labels: Manchester (Geneva), man03; London (Chicago), lon01; Dual Xeon 2.2 GHz; 1 GEth; Cisco GSR; Cisco 7609; Cisco 7609; 2.5 Gbit SDH MB-NG Core.]
- Send data with TCP
- Drop packets
- Monitor TCP with Web100

Slide 27: High Performance TCP: MB-NG
- Drop 1 in 25,000
- rtt 6.2 ms
- Recover in 1.6 s
[Plots: Standard, HighSpeed, Scalable]
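
A recovery time of about 1.6 s is what a simple additive-increase model predicts for this round-trip time. After a single loss the congestion window halves and then grows by one segment per round trip, so regaining the full window takes roughly (W / 2) round trips, where W is the window in segments at line rate. The sketch below assumes a 1 Gbit/s line rate and a 1460 byte MSS, neither of which is stated on the slide, and ignores delayed ACKs and slow start.

```python
def aimd_recovery_time_s(rate_bit_s, rtt_s, mss_bytes=1460):
    """Time for standard AIMD to regain full rate after one loss.

    The window at full rate is W = rate * rtt / (8 * MSS) segments. A loss
    halves it, and additive increase adds one segment per RTT, so recovery
    takes about W / 2 round trips.
    """
    window_segments = rate_bit_s * rtt_s / (8 * mss_bytes)
    rtts_needed = window_segments / 2
    return rtts_needed * rtt_s

# rtt from the slide; 1 Gbit/s is an assumed line rate.
print(f"{aimd_recovery_time_s(1e9, 0.0062):.2f} s")
```

Because the result grows with the square of the round-trip time, the same model gives recovery times of many minutes on long-distance paths.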

Slide 28: High Performance TCP: DataTAG
- Different TCP stacks tested on the DataTAG Network
- rtt 128 ms
- Drop 1 in 10^6
- High-Speed: rapid recovery
- Scalable: very fast recovery
- Standard: recovery would take ~20 mins

Slide 29: SC2004 RAID Controller Performance
- Supermicro X5DPE-G2 motherboards
- Dual 2.8 GHz Xeon CPUs with 512 k byte cache and 1 M byte memory
- 3Ware 8506-8 controller on 133 MHz PCI-X bus
  - Configured as RAID0, 64k byte stripe size
- Six 74.3 GByte Western Digital Raptor WD740 SATA disks
  - 75 Mbyte/s disk-buffer, 150 Mbyte/s buffer-memory
- Scientific Linux with 2.4.20 Kernel + altAIMD patch (Yee) + packet loss patch
- Read ahead kernel tuning: /proc/sys/vm/max-readahead = 512
- RAID0 (striped), 2 Gbyte file: Read 1460 Mbit/s, Write 1320 Mbit/s
[Plots: Disk-Memory Read Speeds, Memory-Disk Write Speeds]

Slide 30: The performance of the end host / disks. BaBar Case Study: RAID BW & PCI Activity
- 3Ware 7500-8 RAID5 parallel EIDE
- 3Ware forces PCI bus to 33 MHz
- BaBar Tyan to MB-NG SuperMicro: network mem-mem 619 Mbit/s
- Disk-disk throughput, bbcp: 40-45 Mbytes/s (320-360 Mbit/s)
- PCI bus effectively full!
- User throughput ~250 Mbit/s
[Plots: Read from RAID5 Disks, Write to RAID5 Disks]

Slide 31: Gridftp Throughput + Web100
- RAID0 disks: 960 Mbit/s read, 800 Mbit/s write
- Throughput Mbit/s:
  - See alternate 600/800 Mbit and zero
- Data rate: 520 Mbit/s
- Cwnd smooth
- No dup Ack / send stall / timeouts

Slide 32: http data transfers, HighSpeed TCP
- Same hardware
- RAID0 disks
- Bulk data moved by web servers
- Apache web server out of the box!
- Prototype client: curl http library
- 1 Mbyte TCP buffers
- 2 Gbyte file
- Throughput ~720 Mbit/s
- Cwnd: some variation
- No dup Ack / send stall / timeouts

Slide 33: bbcp & GridFTP Throughput
- RAID5, 4 disks, Manc - RAL
- 2 Gbyte file transferred
- bbcp
  - Mean 710 Mbit/s
- GridFTP
  - See many zeros
- DataTAG altAIMD kernel in BaBar & ATLAS
[Plot labels: Mean ~710, Mean ~620]

