
1 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 1 High Performance Networking for ALL
Members of GridPP are in many network collaborations, including MB-NG.
Close links with: SLAC; UKERNA, SURFNET and other NRNs; Dante; Internet2; Starlight, Netherlight; GGF; RIPE; Industry …

2 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 2 Network Monitoring [1]
- Architecture: DataGrid WP7 code extended by Gareth (Manchester)
- Technology transfer to UK e-Science: developed by Mark Lees (DL)
- Fed back into DataGrid by Gareth
- Links to: GGF NM-WG, Dante, Internet2
- Characteristics, schema & web services
Success

3 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 3 Network Monitoring [2]
- 24 Jan to 4 Feb 04, TCP iperf, RAL to HEP sites: only 2 sites > 80 Mbit/s
- 24 Jan to 4 Feb 04, TCP iperf, DL to HEP sites
HELP!
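For reference, the monitoring numbers above come from iperf-style memory-to-memory TCP tests between sites. A minimal sketch of such a probe in Python is shown below; the hostname, port, duration and chunk size are placeholders, not values from the monitoring framework, and a real deployment would simply run iperf itself.

```python
# Minimal memory-to-memory TCP throughput probe in the spirit of the
# iperf tests above. Hostname, port and duration are illustrative only.
import socket
import time

def tcp_send_test(host: str, port: int, seconds: float = 10.0,
                  chunk: int = 64 * 1024) -> float:
    """Send zero-filled data to a listening sink and return Mbit/s."""
    payload = b"\x00" * chunk
    sent = 0
    with socket.create_connection((host, port)) as sock:
        end = time.time() + seconds
        while time.time() < end:
            sock.sendall(payload)
            sent += chunk
    return sent * 8 / seconds / 1e6   # Mbit/s

def tcp_sink(port: int) -> None:
    """Run on the far end: accept one connection and discard the data."""
    with socket.create_server(("", port)) as srv:
        conn, _ = srv.accept()
        with conn:
            while conn.recv(1 << 20):
                pass

if __name__ == "__main__":
    # Example (hypothetical receiver):
    # print(f"{tcp_send_test('receiver.example.ac.uk', 5001):.1f} Mbit/s")
    pass
```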

4 RIPE-47, Amsterdam, 29 January 2004 High bandwidth, Long distance…. Where is my throughput? Robin Tasker CCLRC, Daresbury Laboratory, UK [r.tasker@dl.ac.uk] DataTAG is a project sponsored by the European Commission - EU Grant IST-2001-32459

5 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 5 Throughput… What's the problem?
One Terabyte of data transferred in less than an hour. On February 27-28 2003, the transatlantic DataTAG network was extended, i.e. CERN - Chicago - Sunnyvale (>10,000 km). For the first time, a terabyte of data was transferred across the Atlantic in less than one hour using a single TCP (Reno) stream. The transfer was accomplished from Sunnyvale to Geneva at a rate of 2.38 Gbits/s.

6 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 6 Internet2 Land Speed Record On October 1 2003, DataTAG set a new Internet2 Land Speed Record by transferring 1.1 Terabytes of data in less than 30 minutes from Geneva to Chicago across the DataTAG provision, corresponding to an average rate of 5.44 Gbits/s using a single TCP (Reno) stream

7 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 7 So how did we do that?
Management of the end-to-end connection:
- Memory-to-memory transfer; no disk system involved
- Processor speed and system bus characteristics
- TCP configuration – window size and frame size (MTU)
- Network Interface Card, its driver and their configuration
- End-to-end no-loss environment from CERN to Sunnyvale!
- At least a 2.5 Gbit/s capacity pipe on the end-to-end path
- A single TCP connection on the end-to-end path
- No real user application
That's to say – not the usual user experience!
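As a rough illustration of the "TCP window size" point above, here is a minimal Python sketch that computes the bandwidth-delay product for a long fat pipe and requests matching socket buffers. The 2.5 Gbit/s capacity and ~180 ms RTT are taken as indicative of a CERN - Sunnyvale style path, not measured values from the talk, and the kernel will cap the request at its own sysctl limits.

```python
# Back-of-the-envelope sizing of the TCP window for a long fat pipe.
# The capacity and RTT figures are illustrative, not from the record run.
import socket

def bdp_bytes(capacity_gbit_s: float, rtt_ms: float) -> int:
    """Bandwidth-delay product: the window needed to keep the pipe full."""
    return int(capacity_gbit_s * 1e9 / 8 * rtt_ms / 1e3)

# A 2.5 Gbit/s path with ~180 ms RTT needs a window of roughly 56 MB.
window = bdp_bytes(2.5, 180)

# Request matching socket buffers; the kernel may clamp these to the
# limits set by e.g. net.core.rmem_max / net.ipv4.tcp_rmem.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, window)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, window)
print(f"BDP ~ {window / 2**20:.0f} MB")
```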

8 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 8 Realistically – what's the problem & why do network research?
End system issues:
- Network Interface Card and driver and their configuration
- TCP and its configuration
- Operating system and its configuration
- Disk system
- Processor speed
- Bus speed and capability
Network infrastructure issues:
- Obsolete network equipment
- Configured bandwidth restrictions
- Topology
- Security restrictions (e.g., firewalls)
- Sub-optimal routing
Transport protocols
Network capacity and the influence of others!
- Many, many TCP connections
- Mice and elephants on the path
- Congestion

9 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 9 End Hosts: Buses, NICs and Drivers
- Latency, throughput, bus activity
- Use UDP packets to characterise the Intel PRO/10GbE Server Adaptor
- SuperMicro P4DP8-G2 motherboard
- Dual Xeon 2.2 GHz CPUs
- 400 MHz system bus
- 133 MHz PCI-X bus

10 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 10 End Hosts: Understanding NIC Drivers
Linux driver basics – TX:
- Application system call
- Encapsulation in UDP/TCP and IP headers
- Enqueue on device send queue
- Driver places information in the DMA descriptor ring
- NIC reads data from main memory via DMA and sends it on the wire
- NIC signals to the processor that the TX descriptor has been sent
Linux driver basics – RX:
- NIC places data in main memory via DMA to a free RX descriptor
- NIC signals that the RX descriptor has data
- Driver passes the frame to the IP layer and cleans the RX descriptor
- IP layer passes the data to the application
Linux NAPI driver model:
- On receiving a packet, the NIC raises an interrupt
- Driver switches off RX interrupts and schedules an RX DMA ring poll
- Frames are pulled off the DMA ring and are processed up to the application
- When all frames are processed, RX interrupts are re-enabled
- Dramatic reduction in RX interrupts under load
Improving the performance of a Gigabit Ethernet driver under Linux: http://datatag.web.cern.ch/datatag/papers/drafts/linux_kernel_map/

11 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 11 Protocols: TCP (Reno) – Performance
AIMD and high bandwidth, long distance networks: poor performance of TCP in high bandwidth wide area networks is due in part to the TCP congestion control algorithm.
- For each ACK in an RTT without loss: cwnd → cwnd + a/cwnd (Additive Increase, a = 1)
- For each window experiencing loss: cwnd → cwnd − b·cwnd (Multiplicative Decrease, b = 1/2)
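To see why this hurts on a long fat network, the following small Python sketch (not from the talk) models the per-RTT Reno window and counts how long a single loss takes to recover; the 2.5 Gbit/s capacity, 180 ms RTT and 1500-byte segment size are illustrative assumptions.

```python
# Per-RTT model of TCP Reno's AIMD congestion window, showing why a
# single loss costs so much on a high bandwidth-delay-product path.
# Path figures (2.5 Gbit/s, 180 ms RTT, 1500-byte segments) are illustrative.

SEG = 1500                              # bytes per segment
BDP = int(2.5e9 / 8 * 0.180 / SEG)      # segments needed to fill the pipe

def rtts_to_recover(a: float = 1.0, b: float = 0.5) -> int:
    """RTTs for cwnd to climb from (1-b)*BDP back to BDP with no further loss."""
    cwnd = (1 - b) * BDP
    rtts = 0
    while cwnd < BDP:
        cwnd += a        # +a segments per RTT (a/cwnd per ACK, cwnd ACKs per RTT)
        rtts += 1
    return rtts

rtts = rtts_to_recover()
print(f"pipe = {BDP} segments; Reno needs {rtts} RTTs "
      f"(~{rtts * 0.180 / 60:.0f} minutes) to recover from one loss")
```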

12 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 12 Protocols: HighSpeed TCP & Scalable TCP
Adjusting the AIMD algorithm – TCP Reno:
- For each ACK in an RTT without loss: cwnd → cwnd + a/cwnd (Additive Increase, a = 1)
- For each window experiencing loss: cwnd → cwnd − b·cwnd (Multiplicative Decrease, b = 1/2)
HighSpeed TCP: a and b vary with the current cwnd. a increases more rapidly at larger cwnd, so cwnd returns to the optimal size for the network path sooner; b decreases less aggressively, so cwnd is cut back less on loss and throughput does not drop as far.
Scalable TCP: a and b are fixed adjustments for the increase and decrease of cwnd, such that the increase is greater than TCP Reno's and the decrease on loss is less than TCP Reno's.
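The sketch below compares, under the same illustrative path as the previous sketch, how quickly Reno and Scalable TCP recover from one loss. The Scalable TCP constants used (a = 0.01 per ACK, b = 0.125 on loss) are the commonly cited values, not figures taken from this talk, and HighSpeed TCP's cwnd-dependent table of a and b is omitted for brevity.

```python
# Compare the Reno and Scalable TCP responses to a single loss, using the
# per-window update rules above. Scalable TCP constants (a = 0.01 per ACK,
# b = 0.125 on loss) are the commonly cited values, assumed here.

def recover(kind: str, target: float, cwnd: float) -> int:
    """RTTs for cwnd to grow back to 'target' after one loss, no further losses."""
    if kind == "reno":
        cwnd *= 0.5                  # Reno: b = 1/2
    else:
        cwnd *= 1 - 0.125            # Scalable TCP: b = 0.125
    rtts = 0
    while cwnd < target:
        if kind == "reno":
            cwnd += 1.0              # +a/cwnd per ACK, a = 1 => +1 segment per RTT
        else:
            cwnd *= 1.01             # +0.01 per ACK => factor 1.01 per RTT
        rtts += 1
    return rtts

PIPE = 37500    # segments for a 2.5 Gbit/s, 180 ms path (see previous sketch)
for kind in ("reno", "scalable"):
    print(f"{kind}: {recover(kind, PIPE, PIPE)} RTTs to recover")
```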

13 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 13 Protocols: HighSpeed TCP & Scalable TCP
- HighSpeed TCP implemented by Gareth (Manchester)
- Scalable TCP implemented by Tom Kelly (Cambridge)
- Integration of the stacks into the DataTAG kernel by Yee (UCL) + Gareth
Success

14 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 14 Some Measurements of Throughput: CERN - SARA, using the GÉANT backup link
- 1 GByte file transfers (plots: blue = data, red = TCP ACKs)
- Standard TCP: average throughput 167 Mbit/s (users see 5 - 50 Mbit/s!)
- HighSpeed TCP: average throughput 345 Mbit/s
- Scalable TCP: average throughput 340 Mbit/s

15 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 15 Users, The Campus & the MAN [1]
- NNW-to-SJ4 access: 2.5 Gbit PoS, hits 1 Gbit 50%
- Manchester-to-NNW access: 2 × 1 Gbit Ethernet
(Pete White, Pat Meyrs)

16 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 16 Users, The Campus & the MAN [2]
- LMN to site 1: access 1 Gbit Ethernet
- LMN to site 2: access 1 Gbit Ethernet
Message: continue to work with your network group; understand the traffic levels; understand the network topology.

17 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 17 10 GigEthernet: Tuning PCI-X

18 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 18 10 GigEthernet at SC2003 BW Challenge (Phoenix)
- Three server systems with 10 GigEthernet NICs
- Used the DataTAG altAIMD stack, 9000 byte MTU
- Streams from the SLAC/FNAL booth in Phoenix to:
  Palo Alto PAIX, 17 ms rtt
  Chicago Starlight, 65 ms rtt
  Amsterdam SARA, 175 ms rtt

19 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 19 Helping Real Users [1]: Radio Astronomy VLBI
PoC with NRNs & GÉANT: 1024 Mbit/s, 24 on 7, NOW

20 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 20 VLBI Project: Throughput Jitter & 1-way Delay
- 1472 byte packets, Manchester -> JIVE: jitter FWHM 22 µs (back-to-back 3 µs)
- 1-way delay, 1472 byte packets, Manchester -> Dwingeloo (JIVE): note the packet loss (points with zero 1-way delay)

21 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 21 VLBI Project: Packet Loss Distribution
- Measure the time between lost packets in the time series of packets sent. Lost 1410 packets in 0.6 s.
- Is it a Poisson process? Assume the Poisson process is stationary, λ(t) = λ, and use the probability density function P(t) = λ e^(−λt).
- Mean λ = 2360 /s [mean gap 426 µs]
- Plot of the log: slope −0.0028, expect −0.0024
- Could be an additional process involved
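A minimal sketch of this loss-interval test is given below, assuming only an array of measured loss timestamps (here a placeholder called loss_times_us, in microseconds); it fits an exponential to the histogram of inter-loss gaps and compares the fitted slope with −λ from the mean loss rate, as described above.

```python
# Sketch of the loss-interval test above: if losses form a stationary
# Poisson process, the gaps between lost packets follow P(t) = lam*exp(-lam*t).
# 'loss_times_us' is a placeholder for measured loss timestamps in microseconds.
import numpy as np

def poisson_check(loss_times_us: np.ndarray) -> None:
    gaps = np.diff(np.sort(loss_times_us))        # inter-loss times, µs
    lam = 1.0 / gaps.mean()                       # losses per µs
    print(f"mean rate ~ {lam * 1e6:.0f} /s  (mean gap {gaps.mean():.0f} µs)")

    # Histogram the gaps and fit a straight line to log(counts):
    counts, edges = np.histogram(gaps, bins=50)
    centres = 0.5 * (edges[:-1] + edges[1:])
    keep = counts > 0
    slope, _ = np.polyfit(centres[keep], np.log(counts[keep]), 1)
    print(f"fitted slope {slope:.5f} /µs, expected -lambda = {-lam:.5f} /µs")
    # A clear mismatch, as seen in the talk, hints at an additional process.
```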

22 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 22 VLBI Traffic Flows – Only testing!
- Manchester – NetNorthWest – SuperJANET access links
- Two 1 Gbit/s access links: SJ4 to GÉANT; GÉANT to SurfNet

23 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 23 Throughput & PCI transactions on the Mark5 PC: read/write n bytes, wait time
Mark5 uses a Supermicro P3TDLE motherboard:
- 1.2 GHz PIII
- Memory bus 133/100 MHz
- 2 × 64-bit 66 MHz PCI
- 4 × 32-bit 33 MHz PCI
(Diagram: SuperStor input card, NIC, IDE disc pack, Ethernet, logic analyser display)

24 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 24 PCI Activity: Read Multiple Data Blocks, 0 wait
- Read 999424 bytes; each data block: set up CSRs, data movement, update CSRs
- For 0 wait between reads: data blocks ~600 µs long take ~6 ms, then a 744 µs gap
- PCI transfer rate 1188 Mbit/s (148.5 Mbytes/s); read_sstor rate 778 Mbit/s (97 Mbyte/s); PCI bus occupancy 68.44%
- Concern about Ethernet traffic: 64-bit 33 MHz PCI needs ~82% occupancy for 930 Mbit/s; expect ~360 Mbit/s data transfer
(Trace labels: data transfer, CSR access; PCI burst 4096 bytes; data block 131,072 bytes)

25 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 25 PCI Activity: Read Throughput
- Flat then 1/t dependence
- ~860 Mbit/s for read blocks >= 262144 bytes
- CPU load ~20%
- Concern about the CPU load needed to drive a Gigabit link

26 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 26 Helping Real Users [2] HEP BaBar & CMS Application Throughput

27 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 27 BaBar Case Study: Disk Performance
BaBar disk server:
- Tyan Tiger S2466N motherboard, 1 × 64-bit 66 MHz PCI bus
- Athlon MP2000+ CPU, AMD-760 MPX chipset
- 3Ware 7500-8 RAID5, 8 × 200 GB Maxtor IDE 7200 rpm disks
- Note the VM parameter readahead max
Results:
- Disk to memory (read): max throughput 1.2 Gbit/s (150 MBytes/s)
- Memory to disk (write): max throughput 400 Mbit/s (50 MBytes/s) [not as fast as RAID0]
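A simple disk-to-memory read test of the kind behind these figures can be sketched as follows; the file path is a placeholder, and on Linux the page cache should be dropped first (or a file much larger than RAM used), otherwise the test measures memory rather than the RAID array.

```python
# Simple sequential disk-to-memory read throughput test. The path is a
# placeholder; drop the page cache first (echo 3 > /proc/sys/vm/drop_caches,
# Linux, as root) or use a file much larger than RAM for a fair result.
import time

def read_throughput(path: str, block: int = 1 << 20) -> None:
    read = 0
    start = time.time()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            read += len(chunk)
    secs = time.time() - start
    print(f"{read / 2**20 / secs:.0f} MBytes/s  "
          f"({read * 8 / secs / 1e6:.0f} Mbit/s)")

# read_throughput("/raid5/testfile_2GB")   # hypothetical mount point and file
```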

28 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 28 BaBar: Serial ATA RAID Controllers
- 3Ware, 66 MHz PCI
- ICP, 66 MHz PCI

29 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 29 BaBar Case Study: RAID Throughput & PCI Activity
- 3Ware 7500-8 RAID5, parallel EIDE; 3Ware forces the PCI bus to 33 MHz
- BaBar Tyan to MB-NG SuperMicro: network mem-mem 619 Mbit/s
- Disk-to-disk throughput with bbcp: 40-45 Mbytes/s (320 - 360 Mbit/s)
- PCI bus effectively full!
(Traces: read from RAID5 disks; write to RAID5 disks)

30 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 30 BaBar Case Study: MB-NG and SuperJANET4 Development Network
Topology: RAL and MCC MAN routers (OSM-1OC48-POS-SS); Gigabit Ethernet; 2.5 Gbit POS access; 2.5 Gbit POS core; MPLS admin domains; SJ4 Dev; BaBar PC with 3ware RAID5 at each end.
Status / Tests:
- Manchester host has the DataTAG TCP stack
- RAL host now available
- BaBar-BaBar mem-mem
- BaBar-BaBar real data, MB-NG
- BaBar-BaBar real data, SJ4
- mbng-mbng real data, MB-NG
- mbng-mbng real data, SJ4
- Different TCP stacks already installed

31 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 31 Study of Applications: MB-NG and SuperJANET4 Development Network
Topology: UCL and MCC MAN routers (OSM-1OC48-POS-SS); Gigabit Ethernet; 2.5 Gbit POS access; 2.5 Gbit POS core; MPLS admin domains; SJ4 Dev; PC with 3ware RAID0 at each end.

32 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 32 24 Hours HighSpeed TCP mem-mem
- TCP mem-mem, lon2-man1
- Interrupt coalescence: Tx 64, Tx-abs 64, Rx 64, Rx-abs 128
- 941.5 Mbit/s ± 0.5 Mbit/s
MB-NG

33 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 33 Gridftp Throughput, HighSpeedTCP
- Interrupt coalescence 64 / 128; txqueuelen 2000; TCP buffer 1 Mbyte (rtt × BW = 750 kbytes)
- Interface throughput, ACKs received, data moved: 520 Mbit/s
- Same for back-to-back tests – so it's not that simple!
MB-NG

34 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 34 Gridftp Throughput + Web100
- Throughput Mbit/s: see alternating 600/800 Mbit/s and zero
- Cwnd smooth; no duplicate ACKs / send stalls / timeouts
MB-NG

35 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 35 http data transfers, HighSpeed TCP
- Apache web server out of the box!
- Prototype client – curl http library
- 1 Mbyte TCP buffers, 2 Gbyte file
- Throughput 72 MBytes/s
- Cwnd – some variation; no duplicate ACKs / send stalls / timeouts
MB-NG
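The prototype client in the talk was built on the curl http library; as a rough stand-in, the Python sketch below times a large HTTP GET and reports the transfer rate in MBytes/s. The URL is a placeholder, not the server used in these tests.

```python
# Minimal HTTP transfer timing, a stand-in for the curl-based prototype
# client described above. The URL below is a placeholder.
import time
import urllib.request

def http_throughput(url: str, block: int = 1 << 20) -> None:
    moved = 0
    start = time.time()
    with urllib.request.urlopen(url) as resp:
        while True:
            chunk = resp.read(block)
            if not chunk:
                break
            moved += len(chunk)
    secs = time.time() - start
    print(f"{moved / 2**20 / secs:.1f} MBytes/s over {secs:.1f} s")

# http_throughput("http://server.example.ac.uk/2GB.dat")   # hypothetical URL
```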

36 GridPP Meeting Edinburgh 4-5 Feb 04 R. Hughes-Jones Manchester 36 More Information – Some URLs
- MB-NG project web site: http://www.mb-ng.net/
- DataTAG project web site: http://www.datatag.org/
- UDPmon / TCPmon kit + writeup: http://www.hep.man.ac.uk/~rich/net
- Motherboard and NIC tests: www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt & http://datatag.web.cern.ch/datatag/pfldnet2003/
- TCP tuning information: http://www.ncne.nlanr.net/documentation/faq/performance.html & http://www.psc.edu/networking/perf_tune.html

