GEANT2 Network Performance Workshop, 11-12 Jan 200, R. Hughes-Jones Manchester 1 TCP/IP Masterclass or So TCP works … but still the users ask: Where is.


1 GEANT2 Network Performance Workshop, 11-12 Jan 200, R. Hughes-Jones, Manchester
TCP/IP Masterclass, or: So TCP works … but still the users ask: Where is my throughput?
Richard Hughes-Jones, The University of Manchester
www.hep.man.ac.uk/~rich/ then “Talks”

2 Layers & IP

3 The Transport Layer 4: TCP
- TCP: RFC 793, RFC 1122. Provides:
  - Connection-oriented service over IP
    - During setup the two ends agree on details
    - Explicit teardown
    - Multiple connections allowed
  - Reliable end-to-end byte stream delivery over an unreliable network
    - It takes care of lost packets, duplicated packets and out-of-order packets
- TCP provides:
  - Data buffering
  - Flow control
  - Error detection & handling
  - Limits network congestion

4 The TCP Segment Format
[Diagram: a frame consists of frame header, IP header, TCP header, application data and FCS. The TCP header is 20 bytes without options, drawn in 32-bit rows (bit positions 0, 4, 10, 16, 24, 31):]
  Source port | Destination port
  Sequence number
  Acknowledgement number
  Hlen | Resv | Code | Window
  Checksum | Urgent ptr
  Options (if any) | Padding

5 TCP Segment Format – cont.
- Source/Dest port: TCP port numbers to identify applications at both ends of the connection
- Sequence number: first byte in segment from sender’s byte stream
- Acknowledgement: identifies the number of the byte the sender of this (ACK) segment expects to receive next
- Code: used to determine segment purpose, e.g. SYN, ACK, FIN, URG
- Window: advertises how much data this station is willing to accept. Can depend on buffer space remaining.
- Options: used for window scaling, SACK, timestamps, maximum segment size etc.
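To make the field layout concrete, here is a minimal Python sketch that decodes the fixed 20-byte TCP header with the standard `struct` module. The function name `parse_tcp_header`, the dictionary keys and the example port numbers are illustrative assumptions, not something from the talk; options are not parsed.

```python
import struct

# Flag bits in the low 6 bits of the Code field, least significant first.
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (hypothetical helper; options ignored)."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    # Network byte order: two 16-bit ports, 32-bit seq and ack, four 16-bit words.
    src, dst, seq, ack, off_flags, window, checksum, urgent = struct.unpack(
        "!HHIIHHHH", segment[:20])
    flags = off_flags & 0x3F
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": (off_flags >> 12) * 4,  # Hlen counts 32-bit words
        "flags": [name for i, name in enumerate(FLAG_NAMES) if flags & (1 << i)],
        "window": window,
        "checksum": checksum,
        "urgent_ptr": urgent,
    }

# Example: a SYN+ACK segment with Sequence 1024, Ack 2048 and a 65535-byte window.
example = struct.pack("!HHIIHHHH", 5001, 80, 1024, 2048, (5 << 12) | 0x12, 65535, 0, 0)
info = parse_tcp_header(example)
```

The 16-bit Window field is why window scaling (one of the options listed above) is needed for windows larger than 65535 bytes.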

6 TCP – providing reliability
- Positive acknowledgement (ACK) of each received segment
  - Sender keeps record of each segment sent
  - Sender awaits an ACK – “I am ready to receive byte 2048 and beyond”
  - Sender starts timer when it sends segment – so can re-transmit
- Inefficient – sender has to wait
[Diagram: sender/receiver time line. Segment n (Sequence 1024, Length 1024) is answered by the ACK of segment n (Ack 2048) one RTT later; segment n+1 (Sequence 2048, Length 1024) is answered by Ack 3072.]
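A rough way to see why waiting for every ACK is inefficient: with one segment outstanding per round trip, the rate cannot exceed segment size / RTT. The sketch below assumes the 1024-byte segment from the diagram and a 20 ms RTT (the Manchester-CERN figure quoted later in the talk), and ignores the time the segment spends on the wire. The function name is hypothetical.

```python
def stop_and_wait_throughput(segment_bytes: int, rtt_s: float) -> float:
    """Upper bound in bit/s when only one segment may be unacknowledged."""
    return segment_bytes * 8 / rtt_s

# One 1024-byte segment per 20 ms round trip.
rate_bps = stop_and_wait_throughput(1024, 0.020)
rate_mbps = rate_bps / 1e6
```

Under these assumptions the result is roughly 0.41 Mbit/s, regardless of how fast the link is.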

7 Flow Control: Sender – Congestion Window
- Uses congestion window, cwnd, a sliding window to control the data flow
  - Byte count giving highest byte that can be sent without an ACK
  - Transmit buffer size and advertised receive buffer size important
  - ACK gives next sequence number to receive AND the available space in the receive buffer
  - Timer kept for each packet
[Diagram: regions of the send buffer: data sent and ACKed; sent data buffered waiting for ACK; unsent data that may be transmitted immediately; data to be sent, waiting for window to open (application writes here). TCP cwnd slides: the sending host advances a marker as data is transmitted, a received ACK advances the trailing edge, and the receiver’s advertised window advances the leading edge.]

8 Flow Control: Receiver – Lost Data
[Diagram: regions of the receive buffer: data given to application (application reads here); ACKed but not given to user; lost data; received but not ACKed. Markers: last ACK given, next byte expected, expected sequence no. The window slides; the receiver’s advertised window advances the leading edge.]
- If new data is received with a sequence number ≠ next byte expected, a duplicate ACK is sent with the expected sequence number
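The duplicate-ACK rule can be sketched as a small receiver model. This is an illustration only: `receiver_acks` is a hypothetical helper, all segments are assumed to have the same length, and delayed ACKs and SACK are not modelled.

```python
def receiver_acks(arrivals, first_expected, seg_len):
    """Return the cumulative ACK number sent for each arriving segment."""
    expected = first_expected
    buffered = set()          # out-of-order segments held until the gap is filled
    acks = []
    for seq in arrivals:
        if seq == expected:
            expected += seg_len
            # The gap is filled: release any segments that were waiting behind it.
            while expected in buffered:
                buffered.remove(expected)
                expected += seg_len
        elif seq > expected:
            buffered.add(seq)
        # In every case the ACK carries the next byte expected.
        acks.append(expected)
    return acks

# Segment 2048 is delayed; the three segments behind it each trigger a duplicate ACK.
acks = receiver_acks([1024, 3072, 4096, 5120, 2048], first_expected=1024, seg_len=1024)
```

In this run the receiver repeats Ack 2048 three more times after the first, then jumps to 6144 once the missing segment arrives.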

9 How it works: TCP Slowstart
- Probe the network - get a rough estimate of the optimal congestion window size
- The larger the window size, the higher the throughput
  - Throughput = Window size / Round-trip Time
- Exponentially increase the congestion window size until a packet is lost
  - cwnd initially 1 MTU, then increased by 1 MTU for each ACK received
  - Send 1st packet, get 1 ACK, increase cwnd to 2
  - Send 2 packets, get 2 ACKs, increase cwnd to 4
  - Time to reach cwnd size W: T_W = RTT*log2(W) (not exactly slow!)
  - Rate doubles each RTT
[Plot: CWND vs time. Slow start: exponential increase; congestion avoidance: linear increase; packet loss; timeout; retransmit: slow start again.]
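The T_W = RTT*log2(W) estimate can be checked with a few lines of Python. This is a sketch under the slide's idealisation (cwnd starts at 1 segment and doubles every RTT, no loss, no delayed ACKs); the function names are made up for the example.

```python
import math

def slow_start_rtts(target_cwnd: int) -> int:
    """Count round trips until cwnd (in segments) reaches target_cwnd."""
    cwnd, rtts = 1, 0
    while cwnd < target_cwnd:
        cwnd *= 2          # one extra segment per ACK doubles cwnd each RTT
        rtts += 1
    return rtts

def time_to_window(target_cwnd: int, rtt_s: float) -> float:
    """Closed form from the slide: T_W = RTT * log2(W)."""
    return rtt_s * math.log2(target_cwnd)

rtts_needed = slow_start_rtts(1024)        # doubling count for W = 1024 segments
seconds_needed = time_to_window(1024, 0.1) # same W on a 100 ms path (assumed RTT)
```

For W = 1024 segments the loop and the formula agree on 10 round trips, i.e. about one second on a 100 ms path.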

10 How it works: TCP Congestion Avoidance
- Additive increase: starting from the rough estimate, linearly increase the congestion window size to probe for additional available bandwidth
  - cwnd increased by 1 segment per rtt
  - cwnd increased by 1/cwnd for each ACK – linear increase in rate
- TCP takes packet loss as indication of congestion!
- Multiplicative decrease: cut the congestion window size aggressively if a packet is lost
  - Standard TCP reduces cwnd by a factor of 0.5
  - Slow start to congestion avoidance transition determined by ssthresh
[Plot: CWND vs time. Slow start: exponential increase; congestion avoidance: linear increase; packet loss; timeout; retransmit: slow start again.]
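The sawtooth produced by additive increase and multiplicative decrease can be reproduced with a per-RTT model. This is a simplified sketch, not the kernel algorithm: cwnd is counted in segments, losses are injected at chosen RTT indices, and slow start and timeouts are left out. `aimd_trace` and its parameters are hypothetical names.

```python
def aimd_trace(cwnd, rtts, loss_at, a=1.0, b=0.5):
    """Return cwnd after each RTT: +a per loss-free RTT, -b*cwnd on a loss."""
    trace = []
    for t in range(rtts):
        if t in loss_at:
            cwnd = max(1.0, cwnd - b * cwnd)   # multiplicative decrease
        else:
            cwnd = cwnd + a                    # additive increase
        trace.append(cwnd)
    return trace

# Start at 10 segments, lose a packet in the third RTT.
trace = aimd_trace(10, 5, loss_at={2})
```

Here the window climbs 11, 12, is halved to 6 by the loss, then resumes the slow linear climb.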

11 TCP Fast Retransmit & Recovery
- Duplicate ACKs are due to lost segments or segments out of order.
- Fast Retransmit: if the receiver transmits 3 duplicate ACKs (i.e. it received 3 additional segments without getting the one expected)
  - Sender re-transmits the missing segment
  - Set ssthresh to 0.5*cwnd – so enter congestion avoidance phase
  - Set cwnd = (0.5*cwnd + 3) – the 3 dup ACKs
  - Increase cwnd by 1 segment for each further duplicate ACK
  - Keep sending new data if allowed by cwnd
  - Set cwnd to half original value on new ACK
  - No need to go into “slow start” again
- At the steady state, cwnd oscillates around the optimal window size
- With a retransmission timeout, slow start is triggered again
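The window bookkeeping listed above can be written down as a small state machine. This is a sketch of the steps on the slide only, with cwnd in segments; the class name `RenoSender` and its attributes are invented for the example, and window growth outside recovery and retransmission timeouts are deliberately omitted.

```python
class RenoSender:
    """Tracks cwnd/ssthresh through fast retransmit and fast recovery."""

    def __init__(self, cwnd):
        self.cwnd = float(cwnd)
        self.ssthresh = float("inf")
        self.dup_acks = 0
        self.in_recovery = False
        self.retransmits = 0

    def on_dup_ack(self):
        self.dup_acks += 1
        if self.in_recovery:
            self.cwnd += 1                     # each extra dup ACK inflates cwnd
        elif self.dup_acks == 3:
            self.ssthresh = 0.5 * self.cwnd    # remember half the old window
            self.cwnd = self.ssthresh + 3      # account for the 3 dup ACKs
            self.in_recovery = True
            self.retransmits += 1              # resend the missing segment

    def on_new_ack(self):
        if self.in_recovery:
            self.cwnd = self.ssthresh          # deflate to half the original value
            self.in_recovery = False
        self.dup_acks = 0

sender = RenoSender(cwnd=20)
for _ in range(3):
    sender.on_dup_ack()
after_third_dup = sender.cwnd
sender.on_dup_ack()
after_fourth_dup = sender.cwnd
sender.on_new_ack()
```

Starting from 20 segments, the window goes to 13 on the third duplicate ACK, 14 on the fourth, and settles at 10 when the retransmitted segment is acknowledged, without returning to slow start.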

12 TCP: Simple Tuning - Filling the Pipe
- Remember, TCP has to hold a copy of data in flight
- Optimal (TCP buffer) window size depends on:
  - Bandwidth end to end, i.e. min(BW_links), AKA bottleneck bandwidth
  - Round Trip Time (RTT)
- The number of bytes in flight to fill the entire path:
  - Bandwidth*Delay Product: BDP = RTT*BW
  - Sizing the window to the BDP can increase throughput by orders of magnitude
- Windows also used for flow control
[Diagram: sender/receiver time line showing segments, ACKs and the RTT. Segment time on wire = bits in segment / BW.]
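As a worked example of BDP = RTT*BW, the sketch below computes the buffer needed to fill a 1 Gbit/s path with the 177 ms London-Chicago-London round trip quoted later in the talk. The function name `bdp_bytes` is an assumption for illustration.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth*Delay Product in bytes: data in flight needed to fill the path."""
    return bandwidth_bps * rtt_s / 8

# 1 Gbit/s bottleneck, 177 ms round trip.
needed = bdp_bytes(1e9, 0.177)
needed_mbytes = needed / 1e6
```

The result is about 22 Mbytes, which lines up with the 22 Mbyte socket size used for the SC2004 bbftp transfers described later.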

13 Standard TCP (Reno) – What’s the problem?
- TCP has 2 phases:
  - Slowstart: probe the network to estimate the available BW; exponential growth
  - Congestion Avoidance: main data transfer phase – transfer rate grows “slowly”
- AIMD and High Bandwidth – Long Distance networks
  - Poor performance of TCP in high bandwidth wide area networks is due in part to the TCP congestion control algorithm.
  - For each ack in a RTT without loss: cwnd -> cwnd + a / cwnd (Additive Increase, a = 1)
  - For each window experiencing loss: cwnd -> cwnd – b (cwnd) (Multiplicative Decrease, b = ½)
- Packet loss is a killer!!

14 TCP (Reno) – Details of problem #1
- Time for TCP to recover its throughput from 1 lost 1500 byte packet given by: [equation image not in transcript]
- for rtt of ~200 ms @ 1 Gbit/s: 2 min
[Plot labels: UK 6 ms, Europe 25 ms, USA 150 ms; 1.6 s, 26 s, 28 min]
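The equation itself did not survive in the transcript. A commonly used estimate, which follows from the AIMD rules on the previous slides, is: a full window is W = C*RTT/MSS packets, a loss halves it, and congestion avoidance regains 1 packet per RTT, so recovery takes W/2 round trips, i.e. C*RTT^2 / (2*MSS) seconds. The sketch below evaluates that assumed formula; whether it is exactly the expression on the original slide cannot be confirmed from the transcript.

```python
def recovery_time_s(capacity_bps: float, rtt_s: float, mss_bytes: int = 1500) -> float:
    """Seconds for Reno to climb from W/2 back to W at 1 packet per RTT."""
    mss_bits = mss_bytes * 8
    return capacity_bps * rtt_s ** 2 / (2 * mss_bits)

# 1 Gbit/s path, 1500-byte packets, three example round-trip times.
uk = recovery_time_s(1e9, 0.006)
europe = recovery_time_s(1e9, 0.025)
long_haul = recovery_time_s(1e9, 0.200)
long_haul_minutes = long_haul / 60
```

With these assumptions a 6 ms path recovers in about 1.5 s, a 25 ms path in about 26 s, and a 200 ms path in roughly 28 minutes, because the time grows with the square of the RTT.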

15 Investigation of new TCP Stacks
- The AIMD Algorithm – Standard TCP (Reno)
  - For each ack in a RTT without loss: cwnd -> cwnd + a / cwnd (Additive Increase, a = 1)
  - For each window experiencing loss: cwnd -> cwnd – b (cwnd) (Multiplicative Decrease, b = ½)
- High Speed TCP
  - a and b vary depending on current cwnd using a table
  - a increases more rapidly with larger cwnd – returns to the ‘optimal’ cwnd size sooner for the network path
  - b decreases less aggressively and, as a consequence, so does the cwnd. The effect is that there is not such a decrease in throughput.
- Scalable TCP
  - a and b are fixed adjustments for the increase and decrease of cwnd
  - a = 1/100 – the increase is greater than TCP Reno
  - b = 1/8 – the decrease on loss is less than TCP Reno
  - Scalable over any link speed.
- Fast TCP
  - Uses round trip time as well as packet loss to indicate congestion, with rapid convergence to fair equilibrium for throughput.
- HSTCP-LP, H-TCP, BiC-TCP
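To see why “scalable over any link speed” matters, the sketch below counts the round trips each stack needs to regain its window after a single loss. It is a per-RTT approximation of the per-ACK rules above: Reno adds 1 segment per RTT after halving, while Scalable TCP adds 0.01 per ACK (about 1% of cwnd per RTT) after an 1/8 reduction. The function name and the `stack` strings are hypothetical.

```python
def rtts_to_recover(window: float, stack: str) -> int:
    """Round trips to climb back to `window` segments after one loss."""
    if stack == "reno":
        cwnd = window * (1 - 0.5)
        grow = lambda c: c + 1.0           # a = 1 segment per RTT
    elif stack == "scalable":
        cwnd = window * (1 - 0.125)
        grow = lambda c: c + 0.01 * c      # 0.01 per ACK, cwnd ACKs per RTT
    else:
        raise ValueError("unknown stack: " + stack)
    rtts = 0
    while cwnd < window:
        cwnd = grow(cwnd)
        rtts += 1
    return rtts

reno_small = rtts_to_recover(1000, "reno")
reno_large = rtts_to_recover(10000, "reno")
scalable_small = rtts_to_recover(1000, "scalable")
scalable_large = rtts_to_recover(10000, "scalable")
```

In this model Reno's recovery time grows in proportion to the window (500 and 5000 round trips), while Scalable TCP needs the same 14 round trips in both cases.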

16 Let’s check out this theory about new TCP stacks. Does it matter? Does it work?

17 Packet Loss with new TCP Stacks
- TCP Response Function
  - Throughput vs Loss Rate – further to right: faster recovery
  - Drop packets in kernel
[Plots: MB-NG rtt 6 ms; DataTAG rtt 120 ms]

18 Packet Loss and new TCP Stacks
- TCP Response Function
  - UKLight London-Chicago-London rtt 177 ms
  - 2.6.6 Kernel
- Agreement with theory good
- Some new stacks good at high loss rates

19 High Throughput Demonstrations
- Send data with TCP
- Drop packets
- Monitor TCP with Web100
[Diagram labels: Manchester (Geneva); London (Chicago); rtt 6.2 ms; rtt 128 ms; hosts man03 and lon01, dual Xeon 2.2 GHz; 1 GEth; Cisco 7609; Cisco GSR; 2.5 Gbit SDH MB-NG core; Cisco 7609]

20 High Performance TCP – MB-NG
- Drop 1 in 25,000
- rtt 6.2 ms
- Recover in 1.6 s
[Plots: Standard; HighSpeed; Scalable]

21 High Performance TCP – DataTAG
- Different TCP stacks tested on the DataTAG Network
- rtt 128 ms
- Drop 1 in 10^6
- High-Speed: rapid recovery
- Scalable: very fast recovery
- Standard: recovery would take ~ 20 mins

22 FAST demo via OMNInet and Datatag
- FAST Demo: Cheng Jin, David Wei (Caltech)
- J. Mambretti, F. Yeh (Northwestern); A. Adriaanse, C. Jin, D. Wei (Caltech); S. Ravot (Caltech/CERN)
[Diagram labels: workstations at NU-E (Leverone), 2 x GE; OMNInet with Nortel Passport 8600 and photonic switches; StarLight-Chicago with Cisco 7609, Alcatel 1670, 10GE; OC-48 DataTAG, 7,000 km; CERN-Geneva with Alcatel 1670, CERN Cisco 7609, 2 x GE, workstations; CalTech; San Diego FAST display; Layer 2 path; Layer 2/3 path]

23 FAST TCP vs newReno
[Plots of traffic flow. Channel #1: newReno; Channel #2: FAST. Utilization: 70%; utilization: 90%]

24 Problem #2: Is TCP fair? Look at Round Trip Times & Max Transfer Unit

25 MTU and Fairness
- Two TCP streams share a 1 Gb/s bottleneck
- RTT = 117 ms
- MTU = 3000 bytes; avg. throughput over a period of 7000 s = 243 Mb/s
- MTU = 9000 bytes; avg. throughput over a period of 7000 s = 464 Mb/s
- Link utilization: 70.7 %
[Diagram labels: CERN (GVA) and Starlight (Chi), each with Host #1 and Host #2 on 1 GE, GbE switch and routers; POS 2.5 Gbps between sites; the 1 GE link is the bottleneck]
Sylvain Ravot, DataTag 2003

26 RTT and Fairness
- Two TCP streams share a 1 Gb/s bottleneck
- CERN-Sunnyvale RTT = 181 ms; avg. throughput over a period of 7000 s = 202 Mb/s
- CERN-Starlight RTT = 117 ms; avg. throughput over a period of 7000 s = 514 Mb/s
- MTU = 9000 bytes
- Link utilization = 71.6 %
[Diagram labels: Sunnyvale; Starlight (Chi); CERN (GVA); Host #1 and Host #2 on 1 GE at each end; GbE switch; routers; POS 2.5 Gb/s; POS 10 Gb/s; 10GE; the 1 GE link is the bottleneck]
Sylvain Ravot, DataTag 2003

27 Problem #n: Do TCP Flows Share the Bandwidth?

28 Test of TCP Sharing: Methodology (1 Gbit/s)
- Chose 3 paths from SLAC (California): Caltech (10 ms), Univ Florida (80 ms), CERN (180 ms)
- Used iperf/TCP and UDT/UDP to generate traffic
- Each run was 16 minutes, in 7 regions
[Diagram labels: SLAC to Caltech/UFL/CERN across a bottleneck; iperf or UDT TCP/UDP traffic; ICMP/ping traffic, ping 1/s; iperf; regions of 2 mins and 4 mins]
Les Cottrell, PFLDnet 2005

29 TCP Reno single stream
- Low performance on fast long distance paths
  - AIMD (add a=1 pkt to cwnd / RTT, decrease cwnd by factor b=0.5 in congestion)
  - Net effect: recovers slowly, does not effectively use available bandwidth, so poor throughput
  - Unequal sharing
[Plot annotations, SLAC to CERN: congestion has a dramatic effect; recovery is slow; increase recovery rate; RTT increases when achieves best throughput; remaining flows do not take up slack when flow removed]
Les Cottrell, PFLDnet 2005

30 Fast
- As well as packet loss, FAST uses RTT to detect congestion
  - RTT is very stable: σ(RTT) ~ 9ms vs 37±0.14ms for the others
[Plot annotations, SLAC-CERN: big drops in throughput which take several seconds to recover from; 2nd flow never gets equal share of bandwidth]

31 Hamilton TCP
- One of the best performers
  - Throughput is high
  - Big effects on RTT when achieves best throughput
  - Flows share equally
[Plot annotations, SLAC-CERN: appears to need >1 flow to achieve best throughput; two flows share equally; > 2 flows appears less stable]

32 Problem #n+1: To SACK or not to SACK?

33 The SACK Algorithm
- SACK Rationale
  - Non-contiguous blocks of data can be ACKed
  - Sender transmits just lost packets
  - Helps when multiple packets lost in one TCP window
- The SACK processing is inefficient for large bandwidth delay products
  - Sender write queue (linked list) walked for:
    - Each SACK block
    - To mark lost packets
    - To re-transmit
  - Processing takes so long that the input queue becomes full
  - Get timeouts
[Plot labels: SACKs updated; rtt 150 ms, Standard SACKs; rtt 150 ms, HS-TCP; Dell 1650 2.8 GHz, PCI-X 133 MHz, Intel Pro/1000]
Doug Leith, Yee-Ting Li

34 SACK …
- Look into what’s happening at the algorithmic level with web100:
- Strange hiccups in cwnd; only correlation is SACK arrivals
[Plot: Scalable TCP on MB-NG with 200 Mbit/s CBR background]
Yee-Ting Li

35 Real Applications on Real Networks
- Disk-2-disk applications on real networks
  - Memory-2-memory tests
  - Transatlantic disk-2-disk at Gigabit speeds
- Remote Computing Farms
  - The effect of TCP
  - The effect of distance
- Radio Astronomy e-VLBI
  - Leave for Ralph’s talk

36 iperf Throughput + Web100
- SuperMicro on MB-NG network
  - HighSpeed TCP
  - Linespeed 940 Mbit/s
  - DupACK? <10 (expect ~400)
- BaBar on Production network
  - Standard TCP
  - 425 Mbit/s
  - DupACKs 350-400 – re-transmits

37 Applications: Throughput Mbit/s
- HighSpeed TCP
- 2 GByte file RAID5
- SuperMicro + SuperJANET
- bbcp
- bbftp
- Apache
- Gridftp
- Previous work used RAID0 (not disk limited)

38 bbftp: What else is going on?
- Scalable TCP
- SuperMicro + SuperJANET: instantaneous 0 - 550 Mbit/s
- Congestion window – duplicate ACK
- Throughput variation not TCP related?
  - Disk speed / bus transfer
  - Application architecture
- BaBar + SuperJANET: instantaneous 200 – 600 Mbit/s
- Disk-mem ~ 590 Mbit/s – remember the end host

39 Transatlantic Disk to Disk Transfers With UKLight, SuperComputing 2004

40 SC2004 UKLIGHT Overview
[Diagram labels: MB-NG 7600 OSR, Manchester; ULCC UKLight; UCL HEP, UCL network; K2, Ci; Chicago Starlight; Amsterdam; SC2004: Caltech Booth (UltraLight IP), SLAC Booth (Cisco 6509); UKLight 10G, four 1GE channels; UKLight 10G; Surfnet/EuroLink 10G, two 1GE channels; NLR Lambda NLR-PITT-STAR-10GE-16; K2, Ci; Caltech 7600]

41 Transatlantic Ethernet: TCP Throughput Tests
- Supermicro X5DPE-G2 PCs
- Dual 2.9 GHz Xeon CPU, FSB 533 MHz
- 1500 byte MTU
- 2.6.6 Linux Kernel
- Memory-memory TCP throughput
- Standard TCP
- Wire rate throughput of 940 Mbit/s
- First 10 sec
- Work in progress to study:
  - Implementation detail
  - Advanced stacks
  - Effect of packet loss
  - Sharing

42 SC2004 Disk-Disk bbftp
- bbftp file transfer program uses TCP/IP
- UKLight: Path: London-Chicago-London; PCs: Supermicro + 3Ware RAID0
- MTU 1500 bytes; socket size 22 Mbytes; rtt 177 ms; SACK off
- Move a 2 GByte file
- Web100 plots:
  - Standard TCP: average 825 Mbit/s (bbcp: 670 Mbit/s)
  - Scalable TCP: average 875 Mbit/s (bbcp: 701 Mbit/s, ~4.5 s of overhead)
- Disk-TCP-Disk at 1 Gbit/s

43 Network & Disk Interactions (work in progress)
- Hosts:
  - Supermicro X5DPE-G2 motherboards
  - dual 2.8 GHz Xeon CPUs with 512 k byte cache and 1 M byte memory
  - 3Ware 8506-8 controller on 133 MHz PCI-X bus configured as RAID0
  - six 74.3 GByte Western Digital Raptor WD740 SATA disks, 64k byte stripe size
- Measure memory to RAID0 transfer rates with & without UDP traffic
  - Disk write: 1735 Mbit/s
  - Disk write + 1500 MTU UDP: 1218 Mbit/s, drop of 30%
  - Disk write + 9000 MTU UDP: 1400 Mbit/s, drop of 19%
[Plot axis: % CPU kernel mode]

44 Remote Computing Farms in the ATLAS TDAQ Experiment

45 ATLAS Remote Farms – Network Connectivity

46 ATLAS Application Protocol
- Event Request
  - EFD requests an event from SFI
  - SFI replies with the event, ~2 Mbytes
- Processing of event
- Return of computation
  - EF asks SFO for buffer space
  - SFO sends OK
  - EF transfers results of the computation
- tcpmon - instrumented TCP request-response program emulates the Event Filter EFD to SFI communication.
[Diagram: time line between the Event Filter Daemon (EFD) and SFI and SFO: request event; send event data; process event; request buffer; send OK; send processed event. Request-Response time (histogram).]

47 tcpmon: TCP Activity Manc-CERN Req-Resp
- Round trip time 20 ms
- 64 byte Request (green), 1 Mbyte Response (blue)
- TCP in slow start
  - 1st event takes 19 rtt or ~ 380 ms
- TCP Congestion window gets re-set on each Request
  - TCP stack RFC 2581 & RFC 2861 reduction of cwnd after inactivity
  - Even after 10 s, each response takes 13 rtt or ~260 ms
- Transfer achievable throughput 120 Mbit/s

48 tcpmon: TCP Activity Manc-CERN Req-Resp, TCP stack tuned
- Round trip time 20 ms
- 64 byte Request (green), 1 Mbyte Response (blue)
- TCP starts in slow start
  - 1st event takes 19 rtt or ~ 380 ms
- TCP Congestion window grows nicely
  - Response takes 2 rtt after ~1.5 s
  - Rate ~10/s (with 50 ms wait)
- Transfer achievable throughput grows to 800 Mbit/s
- Data transferred WHEN the application requires the data

49 tcpmon: TCP Activity Alberta-CERN Req-Resp, TCP stack tuned
- Round trip time 150 ms
- 64 byte Request (green), 1 Mbyte Response (blue)
- TCP starts in slow start
  - 1st event takes 11 rtt or ~ 1.67 s
- TCP Congestion window in slow start to ~1.8 s, then congestion avoidance
  - Response in 2 rtt after ~2.5 s
  - Rate 2.2/s (with 50 ms wait)
- Transfer achievable throughput grows slowly from 250 to 800 Mbit/s

50 Summary & Conclusions
- Standard TCP not optimum for high throughput long distance links
- Packet loss is a killer for TCP
  - Check on campus links & equipment, and access links to backbones
  - Users need to collaborate with the Campus Network Teams
  - Dante PERT
- New stacks are stable and give better response & performance
  - Still need to set the TCP buffer sizes!
  - Check other kernel settings e.g. window-scale maximum
  - Watch for “TCP Stack implementation Enhancements”
- TCP tries to be fair
  - Large MTU has an advantage
  - Short distances, small RTT, have an advantage
- TCP does not share bandwidth well with other streams
- The End Hosts themselves
  - Plenty of CPU power is required for the TCP/IP stack as well as the application
  - Packets can be lost in the IP stack due to lack of processing power
  - Interaction between HW, protocol processing, and disk sub-system is complex
- Application architecture & implementation are also important
  - The TCP protocol dynamics strongly influence the behaviour of the Application.
- Users are now able to perform sustained 1 Gbit/s transfers

51 More Information: Some URLs 1
- UKLight web site: http://www.uklight.ac.uk
- MB-NG project web site: http://www.mb-ng.net/
- DataTAG project web site: http://www.datatag.org/
- UDPmon / TCPmon kit + writeup: http://www.hep.man.ac.uk/~rich/net
- Motherboard and NIC Tests: http://www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt & http://datatag.web.cern.ch/datatag/pfldnet2003/
  - “Performance of 1 and 10 Gigabit Ethernet Cards with Server Quality Motherboards”, FGCS Special issue 2004, http://www.hep.man.ac.uk/~rich/
- TCP tuning information may be found at: http://www.ncne.nlanr.net/documentation/faq/performance.html & http://www.psc.edu/networking/perf_tune.html
- TCP stack comparisons: “Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks”, Journal of Grid Computing 2004
- PFLDnet: http://www.ens-lyon.fr/LIP/RESO/pfldnet2005/
- Dante PERT: http://www.geant2.net/server/show/nav.00d00h002

52 More Information: Some URLs 2
- Lectures, tutorials etc. on TCP/IP:
  - www.nv.cc.va.us/home/joney/tcp_ip.htm
  - www.cs.pdx.edu/~jrb/tcpip.lectures.html
  - www.raleigh.ibm.com/cgi-bin/bookmgr/BOOKS/EZ306200/CCONTENTS
  - www.cisco.com/univercd/cc/td/doc/product/iaabu/centri4/user/scf4ap1.htm
  - www.cis.ohio-state.edu/htbin/rfc/rfc1180.html
  - www.jbmelectronics.com/tcp.htm
- Encyclopaedia: http://www.freesoft.org/CIE/index.htm
- TCP/IP Resources: www.private.org.il/tcpip_rl.html
- Understanding IP addresses: http://www.3com.com/solutions/en_US/ncs/501302.html
- Configuring TCP (RFC 1122): ftp://nic.merit.edu/internet/documents/rfc/rfc1122.txt
- Assigned protocols, ports etc (RFC 1010): http://www.es.net/pub/rfcs/rfc1010.txt & /etc/protocols

53 Any Questions?

54 Backup Slides

55 Latency Measurements
- UDP/IP packets sent between back-to-back systems
  - Processed in a similar manner to TCP/IP
  - Not subject to flow control & congestion avoidance algorithms
  - Used UDPmon test program
- Latency
  - Round trip times measured using Request-Response UDP frames
  - Latency as a function of frame size
    - Slope is given by: mem-mem copy(s) + pci + Gig Ethernet + pci + mem-mem copy(s)
    - Intercept indicates: processing times + HW latencies
  - Histograms of ‘singleton’ measurements
  - Tells us about:
    - Behavior of the IP stack
    - The way the HW operates
    - Interrupt coalescence
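Extracting the slope and intercept from latency-versus-frame-size measurements is an ordinary least-squares line fit. The sketch below uses only the standard library; the function name `fit_line` and the sample numbers are invented for illustration and are not measurements from the talk.

```python
def fit_line(xs, ys):
    """Least-squares fit y = slope*x + intercept; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up data: frame sizes in bytes and round-trip latencies in microseconds.
frame_bytes = [64, 512, 1024, 1472]
latency_us = [31.28, 40.24, 50.48, 59.44]
slope_us_per_byte, intercept_us = fit_line(frame_bytes, latency_us)
```

For this synthetic data the fit returns a slope of 0.02 µs/byte (the per-byte cost of copies, PCI and Ethernet transfers) and an intercept of 30 µs (fixed processing and hardware latency).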

56 Throughput Measurements
- UDP Throughput
- Send a controlled stream of UDP frames spaced at regular intervals
[Diagram: sender/receiver time line. Zero stats, OK done; send data frames (n bytes, number of packets, wait time) at regular intervals, recording time to send, time to receive and inter-packet time (histogram); signal end of test, OK done; get remote statistics, send statistics: no. received, no. lost + loss pattern, no. out-of-order, CPU load & no. int, 1-way delay.]

57 PCI Bus & Gigabit Ethernet Activity
- PCI Activity
- Logic Analyzer with:
  - PCI probe cards in sending PC
  - Gigabit Ethernet fiber probe card
  - PCI probe cards in receiving PC
[Diagram labels: CPU, mem, chipset and NIC in each PC; PCI bus; Gigabit Ethernet probe; logic analyser display; possible bottlenecks]

58 “Server Quality” Motherboards
- SuperMicro P4DP8-2G (P4DP6)
- Dual Xeon
- 400/522 MHz front side bus
- 6 PCI PCI-X slots
- 4 independent PCI buses
  - 64 bit 66 MHz PCI
  - 100 MHz PCI-X
  - 133 MHz PCI-X
- Dual Gigabit Ethernet
- Adaptec AIC-7899W dual channel SCSI
- UDMA/100 bus master/EIDE channels, data transfer rates of 100 MB/sec burst

59 “Server Quality” Motherboards
- Boston/Supermicro H8DAR
- Two Dual Core Opterons
- 200 MHz DDR Memory, theory BW: 6.4 Gbit
- HyperTransport
- 2 independent PCI buses, 133 MHz PCI-X
- 2 Gigabit Ethernet
- SATA
- (PCI-e)

60 Network switch limits behaviour
- End2end UDP packets from udpmon
  - Only 700 Mbit/s throughput
  - Lots of packet loss
  - Packet loss distribution shows throughput limited

61 10 Gigabit Ethernet: UDP Throughput
- 1500 byte MTU gives ~ 2 Gbit/s
- Used 16144 byte MTU, max user length 16080
- DataTAG Supermicro PCs
  - Dual 2.2 GHz Xeon CPU, FSB 400 MHz
  - PCI-X mmrbc 512 bytes
  - wire rate throughput of 2.9 Gbit/s
- CERN OpenLab HP Itanium PCs
  - Dual 1.0 GHz 64 bit Itanium CPU, FSB 400 MHz
  - PCI-X mmrbc 4096 bytes
  - wire rate of 5.7 Gbit/s
- SLAC Dell PCs
  - Dual 3.0 GHz Xeon CPU, FSB 533 MHz
  - PCI-X mmrbc 4096 bytes
  - wire rate of 5.4 Gbit/s

62 10 Gigabit Ethernet: Tuning PCI-X
- 16080 byte packets every 200 µs
- Intel PRO/10GbE LR Adapter
- PCI-X bus occupancy vs mmrbc
  - Measured times
  - Times based on PCI-X times from the logic analyser
  - Expected throughput ~7 Gbit/s
  - Measured 5.7 Gbit/s
[Logic analyser traces for mmrbc 512, 1024, 2048 and 4096 bytes (5.7 Gbit/s at 4096 bytes), showing CSR access, PCI-X sequence, data transfer, interrupt & CSR update]

63 UDP Datagram format
[Diagram: a frame consists of frame header, IP header, UDP header (8 bytes), application data and FCS. The UDP header is drawn in 32-bit rows (bit positions 0, 8, 16, 24, 31):]
  Source port | Destination port
  UDP message len | Checksum (opt.)
- Source/destination port: port numbers identify sending & receiving processes
  - Port number & IP address allow any application on Internet to be uniquely identified
  - Ports can be static or dynamic
    - Static (< 1024) assigned centrally, known as well known ports
    - Dynamic
- Message length: in bytes, includes the UDP header and data (min 8, max 65,535)
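The 8-byte UDP header is simple enough to build and decode by hand. The sketch below packs the four 16-bit fields with Python's `struct` module; the helper names and port numbers are assumptions for the example, and the checksum is left at 0, which in IPv4 means it was not computed (consistent with the “opt.” on the slide).

```python
import struct

def build_udp_header(src_port: int, dst_port: int, payload: bytes, checksum: int = 0) -> bytes:
    """Pack the 8-byte UDP header; length covers header plus data."""
    length = 8 + len(payload)
    if length > 65535:
        raise ValueError("UDP message length is limited to 65,535 bytes")
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)

def parse_udp_header(datagram: bytes) -> dict:
    """Unpack the first 8 bytes of a UDP datagram."""
    src, dst, length, checksum = struct.unpack("!HHHH", datagram[:8])
    return {"src_port": src, "dst_port": dst, "length": length, "checksum": checksum}

payload = b"x" * 100
header = build_udp_header(5001, 5002, payload)
fields = parse_udp_header(header + payload)
```

With a 100-byte payload the length field comes out as 108, and an empty payload gives the minimum value of 8.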

64 Congestion control: ACK clocking

65 End Hosts & NICs CERN-nat-Manc.
- Use UDP packets to characterise Host, NIC & Network
  - SuperMicro P4DP8 motherboard
  - Dual Xeon 2.2 GHz CPU
  - 400 MHz system bus
  - 64 bit 66 MHz PCI / 133 MHz PCI-X bus
- The network can sustain 1 Gbps of UDP traffic
- The average server can lose smaller packets
- Packet loss caused by lack of power in the PC receiving the traffic
- Out of order packets due to WAN routers
- Lightpaths look like extended LANs and have no re-ordering
[Plots: Request-Response latency; throughput; packet loss; re-order]

66 tcpdump / tcptrace
- tcpdump: dump all TCP header information for a specified source/destination
  - ftp://ftp.ee.lbl.gov/
- tcptrace: format tcpdump output for analysis using xplot
  - http://www.tcptrace.org/
  - NLANR TCP Testrig: nice wrapper for tcpdump and tcptrace tools, http://www.ncne.nlanr.net/TCP/testrig/
- Sample use:
    tcpdump -s 100 -w /tmp/tcpdump.out host hostname
    tcptrace -Sl /tmp/tcpdump.out
    xplot /tmp/a2b_tsg.xpl

67 tcptrace and xplot
- X axis is time
- Y axis is sequence number
- The slope of this curve gives the throughput over time.
- The xplot tool makes it easy to zoom in

68 Zoomed In View
- Green Line: ACK values received from the receiver
- Yellow Line: tracks the receive window advertised from the receiver
- Green Ticks: track the duplicate ACKs received.
- Yellow Ticks: track the window advertisements that were the same as the last advertisement.
- White Arrows: represent segments sent.
- Red Arrows (R): represent retransmitted segments

69 TCP Slow Start

