Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester

TCP/IP and Other Transports for High Bandwidth Applications
Back to Basics

Richard Hughes-Jones
The University of Manchester
Slides: … then “Talks”, then look for “Brasov”
Structure of the Talks

The aim is to give you a picture of how researchers are using high performance networks to support their work.
- Back to Basics
  - Simple Introduction to Networking
- TCP/IP on High Bandwidth Long Distance Networks
  - But TCP/IP works!
  - The effect of packet loss
  - Advanced TCP Stacks
  - Fairness
- Real Applications on Real Networks
  - Disk-2-disk applications on real networks
  - Memory-2-memory tests
  - Transatlantic disk-2-disk at Gigabit speeds
  - Remote Computing Farms
  - The effect of distance
  - Radio Astronomy e-VLBI

Thanks to the following for allowing me to use their slides: Sylvain Ravot (CERN), Les Cottrell (SLAC), Brian Tierney (LBL), Robin Tasker (DL).
Simple Introduction to Networking
What is a Protocol Stack?
- The ISO OSI (Open Systems Interconnection) Seven Layer Model defines a framework that allows real network protocols to be developed.
- A layer:
  - performs unique and specific tasks
  - only has knowledge of the layers immediately above and below it
  - uses the services of the layer below, and provides services to the layer above
  - defines services that are implementation independent: it is a definition of how things work conceptually
  - communicates with its peer layer in the remote system
The Layering Principle
- Encapsulation: each protocol layer N adds a header to the data unit it receives from layer N+1. The header contains control information.

Layer 7: Application. User processes.
Layer 6: Presentation. Data interpretation, code transformation.
Layer 5: Session. Connection and negotiation control.
Layer 4: Transport. End-to-end data transfer and integrity; packet sequencing, flow control.
Layer 3: Network. Addressing, routing; packet sequencing, flow control.
Layer 2: Data Link. Packet assembly/disassembly; transmission control, error checking.
Layer 1: Physical. Electrical, optical, mechanical.

How the data unit grows on the way down the stack (PH, SH, TH, NH, DH: presentation, session, transport, network and data link headers; FCS: Frame Check Sequence):
Layer 6: PH | App data
Layer 5: SH | PH | App data
Layer 4: TH | SH | PH | App data (Segment)
Layer 3: NH | TH | SH | PH | App data (Packet)
Layer 2: DH | NH | TH | SH | PH | App data | FCS (Frame)
Layer 1: bits on the “wire”
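The encapsulation principle can be sketched in a few lines of Python. This is a toy model only: it assumes two-byte text labels (PH, SH, TH, NH, DH) and a three-byte "FCS" stand in for real headers and trailers, and the function names are hypothetical. It shows each layer wrapping the unit from the layer above on the way down, and the peer layer unwrapping it in reverse order on the way up.

```python
# Toy model of layering: labels stand in for real headers (hypothetical, for illustration).
DOWN_THE_STACK = [b"PH", b"SH", b"TH", b"NH"]  # presentation, session, transport, network


def encapsulate(app_data: bytes) -> bytes:
    """Each layer N prefixes its header to the data unit handed down by layer N+1."""
    unit = app_data
    for header in DOWN_THE_STACK:
        unit = header + unit
    # The data link layer adds both a header (DH) and a trailer (FCS) to form the frame.
    return b"DH" + unit + b"FCS"


def decapsulate(frame: bytes) -> bytes:
    """Each layer removes the header that its peer layer added, in reverse order."""
    if not (frame.startswith(b"DH") and frame.endswith(b"FCS")):
        raise ValueError("not a frame")
    unit = frame[2:-3]
    for header in reversed(DOWN_THE_STACK):
        if not unit.startswith(header):
            raise ValueError(f"expected header {header!r}")
        unit = unit[len(header):]
    return unit


frame = encapsulate(b"App data")
print(frame)               # b'DHNHTHSHPHApp dataFCS'
print(decapsulate(frame))  # b'App data'
```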
What do the Layers do?
- Transport Layer: acts as a go-between for the user and the network.
  - Provides end-to-end data movement and control.
  - Gives the level of reliability/integrity needed by the application.
  - Can ensure a reliable service (which the network layer cannot), e.g. it assigns sequence numbers to identify “lost” packets.
- Network Layer: deals with logical addressing and the transmission of packets; provides the mechanism for routing.
- Data Link Layer: provides synchronization and error checking for the data transmitted over a single physical link (and may ensure correct delivery of frames).
  - Going down: fits packets from the network layer above into frames.
  - Going up: groups bits from the physical layer into frames.
- Physical Layer: concerned with the transmission of individual bits.
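How sequence numbers let a receiver spot “lost” packets can be shown with a small Python sketch. It assumes, for illustration only, that each packet carries an integer sequence number and that the receiver knows the first and last numbers sent; real TCP numbers bytes rather than packets and learns about gaps through its acknowledgements. `find_missing` is a hypothetical helper.

```python
def find_missing(received: list[int], first: int, last: int) -> list[int]:
    """Return the sequence numbers in [first, last] that never arrived.

    Duplicates and out-of-order arrivals are tolerated, because the
    network layer below is allowed to deliver packets that way.
    """
    seen = set(received)
    return [seq for seq in range(first, last + 1) if seq not in seen]


# Packets 0..7 were sent: 3 and 6 were lost, 4 arrived twice, 5 overtook 4.
arrivals = [0, 1, 2, 5, 4, 4, 7]
print(find_missing(arrivals, 0, 7))  # [3, 6]
```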
How do the “IP” Protocols fit together?
- Application (Presentation, Session):
  - File Transfer Protocol (FTP): RFC 959
  - Simple Mail Transfer Protocol (SMTP): RFC 821
  - TELNET: RFC 854
  - TFTP: RFC 783
  - NFS: RFC 1014, 1057 and 1094
  - SNMP: RFC 1157
  - also ssh, HTTP, POP3/IMAP, DNS, ping, traceroute
- Transport:
  - Transmission Control Protocol (TCP): RFC 793
  - User Datagram Protocol (UDP): RFC 768
- Network:
  - Internet Protocol (IP): RFC 791
  - Internet Control Message Protocol (ICMP): RFC 792
  - Address Resolution Protocols: ARP RFC 826, RARP RFC 903
  - Routing: OSPF, BGP
- Data Link (Network Interface Cards): Ethernet, Token Ring, ISDN, FDDI, SMDS, ATM, SDH/SONET, xDSL
- Physical (transmission mode): TP copper, fibre optic, satellite, microwave, DWDM, CWDM, etc.
Some of the “IP” Protocols
- Transmission Control Protocol. TCP gives application programs access to the network through a reliable, connection-oriented transport layer service.
- User Datagram Protocol. UDP provides an unreliable, connectionless delivery service, using the IP protocol to transport messages between machines. It adds the ability to distinguish among multiple destinations on a single host computer.
- Internet Protocol. IP receives datagrams from the upper-layer software and transmits them to the destination host using a best-effort, connectionless delivery service.
- Internet Control Message Protocol. ICMP allows internet routers to transmit error messages and test messages.
- Internet Group Management Protocol. IGMP is used with multicast: hosts use it to tell routers which multicast groups they belong to, so that UDP datagrams sent to a group reach multiple hosts.
- Address Resolution Protocol. ARP translates between a 32-bit IP address and a 48-bit LAN address.
- Reverse Address Resolution Protocol. RARP translates between a 48-bit LAN address and a 32-bit IP address.
The Physical Layer 1: Ethernet
The Link Layer 2: Ethernet Frame
Frame layout: Preamble (7 bytes) | SFD (1) | DA (6) | SA (6) | Length/Type (2) | Data, e.g. an IP datagram (46 to 1500) | FCS (4), followed by a 12-byte Inter Frame Gap.
- Preamble: 56 bits of alternating 0s and 1s. It gives all the nodes on the network a signal against which to synchronize.
- Start Frame Delimiter (SFD): marks the start of a frame. It is 8 bits long with the pattern 10101011.
- Media Access Control (MAC) addresses: every Ethernet network card has, built into its hardware, a unique six-octet (48-bit) number, usually written in hexadecimal, that differentiates it from all other Ethernet cards in the universe. The destination address (DA) and source address (SA) define the path across the link.
- Length/Type field: two octets long. If the value is ≤ 1500 (0x05DC), it gives the length of the data. If the value is ≥ 1536 (0x0600), it identifies the network-layer protocol (the “Ethernet Types”, e.g. 0x0800 for IP).
- Data: the reason the frame exists. Its maximum size is the MTU (Maximum Transmission Unit), 1500 bytes for standard Ethernet.
- Frame Check Sequence (FCS): protects the frame contents.
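The frame layout also fixes how much of the line is left for data. The Python sketch below decodes the 14-byte header and counts the bytes each frame occupies on the wire. It assumes an untagged frame whose preamble and SFD have already been stripped by the interface; `parse_ethernet_header` and `wire_bytes` are illustrative names, not a library API.

```python
import struct

# Destination MAC, source MAC, Length/Type: 14 bytes, network byte order.
ETH_HEADER = struct.Struct("!6s6sH")


def parse_ethernet_header(frame: bytes) -> dict:
    """Decode an untagged Ethernet header (preamble and SFD already removed by the NIC)."""
    dst, src, length_type = ETH_HEADER.unpack_from(frame)
    info = {"dst": dst.hex(":"), "src": src.hex(":")}
    if length_type <= 1500:        # up to 0x05DC: length of the data field
        info["length"] = length_type
    elif length_type >= 0x0600:    # 1536 and above: an Ethernet Type, e.g. 0x0800 = IP
        info["ethertype"] = length_type
    else:
        raise ValueError(f"undefined Length/Type value {length_type}")
    return info


def wire_bytes(data_len: int) -> int:
    """Bytes of line time used by one frame carrying data_len bytes of data."""
    data_len = max(data_len, 46)  # short data fields are padded to the 46-byte minimum
    # preamble + SFD (8), header (14), data, FCS (4), inter-frame gap (12)
    return 8 + 14 + data_len + 4 + 12


frame = bytes.fromhex("ffffffffffff" "001122334455" "0800") + bytes(46)
print(parse_ethernet_header(frame))
# {'dst': 'ff:ff:ff:ff:ff:ff', 'src': '00:11:22:33:44:55', 'ethertype': 2048}
print(wire_bytes(1500))                           # 1538
print(round(1000 * 1500 / wire_bytes(1500), 1))   # 975.3 (Mbit/s of data at 1 Gbit/s)
```

With full 1500-byte frames, roughly 975 Mbit/s of a Gigabit Ethernet link is left for the IP datagrams themselves.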
The Link Layer: Ethernet VLANs
- VLANs are logical networks built over the same physical cable plant.
- A VLAN header (tag) identifies which logical network each Ethernet frame belongs to.
- A tagged frame is marked by the value 0x8100 in the Type field location. The next two octets hold three fields:
  - User Priority: 3 bits, used to define the priority of the Ethernet frame, and hence to define and deliver a class of service.
  - Canonical Format Indicator: 1 bit. Just **don’t** ask!!!
  - VLAN Identifier: 12 bits, containing the VLAN identifier (VID) of this frame.
- The original Length/Type field then follows the inserted VLAN tag.
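The bit layout of the two octets that follow 0x8100 can be shown with a short Python sketch. It assumes the frame is given from the destination address onwards (no preamble) and carries at most one tag; `parse_vlan_tag` is a hypothetical helper name.

```python
import struct

TPID_8021Q = 0x8100  # value found where the Length/Type field would normally be


def parse_vlan_tag(frame: bytes):
    """Return (priority, cfi, vid, inner_length_type), or None for an untagged frame."""
    (tpid,) = struct.unpack_from("!H", frame, 12)  # skip DA (6 bytes) + SA (6 bytes)
    if tpid != TPID_8021Q:
        return None
    tci, inner = struct.unpack_from("!HH", frame, 14)
    priority = tci >> 13        # top 3 bits: User Priority
    cfi = (tci >> 12) & 0x1     # next bit: Canonical Format Indicator
    vid = tci & 0x0FFF          # low 12 bits: VLAN Identifier
    return priority, cfi, vid, inner


# A frame tagged for VLAN 100 with priority 5, carrying IP (Type 0x0800).
tci = (5 << 13) | (0 << 12) | 100
frame = bytes(12) + struct.pack("!HHH", TPID_8021Q, tci, 0x0800)
print(parse_vlan_tag(frame))  # (5, 0, 100, 2048)
```

Because the VID is 12 bits wide, a single tag can distinguish at most 4096 values.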
The Network Layer 3: IP
- IP layer properties:
  - Provides best effort delivery
  - It is unreliable: packets may be lost, duplicated or delivered out of order
  - Connectionless
  - Provides logical addresses
  - Provides routing
  - Demultiplexes data on the protocol number
The Internet datagram
IP header, 20 bytes without options (field widths in bits, 32 bits per row):
| Vers (4) | Hlen (4) | Type of serv. (8) | Total length (16) |
| Identification (16) | Flags (3) | Fragment offset (13) |
| TTL (8) | Protocol (8) | Header Checksum (16) |
| Source IP address (32) |
| Destination IP address (32) |
| IP Options (if any) | Padding |
Encapsulation: Frame header | IP header | Transport | FCS
IP Datagram Format (cont.)
- Type of Service (TOS): now being used for QoS.
- Total length: length of the datagram in bytes, including header and data.
- Time to live (TTL): specifies how long the datagram is allowed to remain in the internet.
  - Routers decrement it by 1.
  - When TTL = 0 the router discards the datagram.
  - This prevents infinite loops.
- Protocol: specifies the format of the data area.
  - Protocol numbers are administered by a central authority to guarantee agreement, e.g. ICMP=1, TCP=6, UDP=17, …
- Source & destination IP address (32 bits each): contain the IP addresses of the sender and the intended recipient.
- Options (variable length): mainly used to record a route or timestamps, or to specify routing.
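To make the header layout concrete, the Python sketch below decodes the 20 fixed bytes of an IP header and checks its Header Checksum. The checksum is taken here as the 16-bit ones' complement of the ones' complement sum of the header's 16-bit words, the method RFC 791 specifies; the helper names and the small protocol table are illustrative assumptions, and options are not decoded.

```python
import struct

# Vers/Hlen, TOS, Total length, Identification, Flags/Fragment offset,
# TTL, Protocol, Header Checksum, Source address, Destination address.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")
PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}  # a few of the assigned protocol numbers


def ones_complement_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def parse_ipv4_header(packet: bytes) -> dict:
    (ver_hlen, tos, total_len, ident, flags_frag,
     ttl, proto, _checksum, src, dst) = IPV4_HEADER.unpack_from(packet)
    hlen = (ver_hlen & 0x0F) * 4             # Hlen counts 32-bit words
    return {
        "version": ver_hlen >> 4,
        "header_len": hlen,
        "tos": tos,
        "total_len": total_len,
        "id": ident,
        "flags": flags_frag >> 13,
        "frag_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": PROTOCOLS.get(proto, proto),
        "src": ".".join(map(str, src)),
        "dst": ".".join(map(str, dst)),
        # Re-summing a header that already holds its checksum gives 0 if it is intact.
        "checksum_ok": ones_complement_checksum(packet[:hlen]) == 0,
    }


# Header for a UDP datagram with no data (8-byte UDP header), then fill in the checksum.
hdr = IPV4_HEADER.pack(0x45, 0, 28, 0x1234, 0x4000, 64, 17, 0,
                       bytes([192, 168, 0, 1]), bytes([192, 168, 0, 2]))
hdr = hdr[:10] + struct.pack("!H", ones_complement_checksum(hdr)) + hdr[12:]
info = parse_ipv4_header(hdr)
print(info["protocol"], info["ttl"], info["src"], info["checksum_ok"])  # UDP 64 192.168.0.1 True
```

Since a router decrements the TTL at every hop, it must also recompute this checksum before forwarding the datagram.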
Internet Class-based addresses
- An address is 32 bits, written as four decimal bytes; the leading bits of the first byte give the class.
- Class A: large number of hosts, few networks
  0nnnnnnn hhhhhhhh hhhhhhhh hhhhhhhh
  7 network bits (0 and 127 reserved, so 126 networks), 24 host bits (> 16M hosts/net)
  Initial byte (decimal): 1 to 126
- Class B: medium number of hosts and networks
  10nnnnnn nnnnnnnn hhhhhhhh hhhhhhhh
  16,384 class B networks, 65,534 hosts/network
  Initial byte (decimal): 128 to 191
- Class C: large number of small networks
  110nnnnn nnnnnnnn nnnnnnnn hhhhhhhh
  2,097,152 networks, 254 hosts/network
  Initial byte (decimal): 192 to 223
- Class D: Multicast (see RFC 1112)
  1110gggg gggggggg gggggggg gggggggg (28-bit multicast group address)
  Initial byte (decimal): 224 to 239
- Class E: Reserved
  Initial byte (decimal): 240 to 255
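The class rules can be applied mechanically. The Python sketch below uses the standard `ipaddress` module only to parse the dotted-decimal string, then classifies the address from the leading bits of its first byte and reproduces the host counts quoted above. It covers the classful scheme only and does not flag the reserved networks 0 and 127; the function names are hypothetical.

```python
import ipaddress

HOST_BITS = {"A": 24, "B": 16, "C": 8}


def address_class(addr: str) -> str:
    """Classful category, read from the leading bits of the first byte."""
    first = int(ipaddress.IPv4Address(addr)) >> 24
    if first >> 7 == 0b0:
        return "A"
    if first >> 6 == 0b10:
        return "B"
    if first >> 5 == 0b110:
        return "C"
    if first >> 4 == 0b1110:
        return "D"  # multicast
    return "E"      # reserved


def hosts_per_network(cls: str) -> int:
    """The all-zeros and all-ones host parts are reserved, hence the -2."""
    return 2 ** HOST_BITS[cls] - 2


for a in ["10.1.2.3", "130.88.0.1", "192.168.1.1", "224.0.0.1", "240.0.0.1"]:
    print(a, address_class(a))
print([hosts_per_network(c) for c in "ABC"])  # [16777214, 65534, 254]
```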
The Transport Layer 4: UDP
- UDP provides:
  - Connectionless service over IP
    - No setup or teardown
    - One packet at a time
  - Minimal overhead, hence high performance
  - Best effort delivery
  - It is unreliable: packets may be lost, duplicated or delivered out of order
- The application is responsible for:
  - Data reliability
  - Flow control
  - Error handling
UDP Datagram format
UDP header, 8 bytes (field widths in bits):
| Source port (16) | Destination port (16) |
| UDP message len (16) | Checksum, optional (16) |
Encapsulation: Frame header | IP header | UDP header | Application data | FCS
- Source/destination port: port numbers identify the sending and receiving processes.
  - The port number and IP address together allow any application on the Internet to be uniquely identified.
  - Ports can be static or dynamic:
    - Static (< 1024): assigned centrally, known as well-known ports
    - Dynamic
- Message length: in bytes, includes the UDP header and data (min 8, max 65,535).
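The 8-byte header is simple enough to build by hand. This Python sketch packs and unpacks it with the standard `struct` module; it leaves the optional checksum at 0 rather than computing the real value (which also covers a pseudo-header containing the IP addresses), and the function names are hypothetical.

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")  # source port, destination port, length, checksum


def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    length = UDP_HEADER.size + len(payload)  # 8-byte header plus data
    if length > 65535:
        raise ValueError("UDP message length cannot exceed 65,535 bytes")
    # A checksum of 0 means "not computed", which is allowed over IPv4.
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload


def parse_udp_datagram(datagram: bytes):
    src_port, dst_port, length, _checksum = UDP_HEADER.unpack_from(datagram)
    return src_port, dst_port, datagram[UDP_HEADER.size:length]


d = build_udp_datagram(40000, 53, b"hello")  # 53 is a well-known port (DNS)
print(len(d), parse_udp_datagram(d))  # 13 (40000, 53, b'hello')
```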
The Transport Layer 4: TCP
- TCP: RFC 793, RFC 1122
- Provides:
  - Connection-oriented service over IP
    - During setup the two ends agree on details
    - Explicit teardown
    - Multiple connections allowed
  - Reliable end-to-end byte stream delivery over an unreliable network
- It takes care of:
  - Lost packets
  - Duplicated packets
  - Out of order packets
- TCP provides:
  - Data buffering
  - Flow control
  - Error detection & handling
  - Limits network congestion
The TCP Segment Format
TCP header, 20 bytes without options (field widths in bits, 32 bits per row):
| Source port (16) | Destination port (16) |
| Sequence number (32) |
| Acknowledgement number (32) |
| Hlen (4) | Resv (6) | Code (6) | Window (16) |
| Checksum (16) | Urgent ptr (16) |
| Options (if any) | Padding |
Encapsulation: Frame header | IP header | TCP header | Application data | FCS
TCP Segment Format (cont.)
- Source/Dest port: TCP port numbers that identify the applications at both ends of the connection.
- Sequence number: the position in the sender’s byte stream of the first data byte in this segment.
- Acknowledgement: the number of the next byte that the sender of this segment expects to receive.
- Code: determines the purpose of the segment, e.g. SYN, ACK, FIN, URG.
- Window: advertises how much data this station is willing to accept; this can depend on the buffer space remaining.
- Options: used for window scaling, SACK, timestamps, maximum segment size, etc.
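The Code bits share one 16-bit word with Hlen and the reserved bits, so decoding them takes a little bit arithmetic. The Python sketch below assumes the classic layout (4-bit Hlen, 6 reserved bits, 6 code bits in the RFC 793 order FIN, SYN, RST, PSH, ACK, URG, least significant first); it does not decode options or verify the checksum, and `parse_tcp_header` is an illustrative name.

```python
import struct

# Source port, Dest port, Sequence, Acknowledgement, Hlen/Resv/Code, Window, Checksum, Urgent ptr
TCP_HEADER = struct.Struct("!HHIIHHHH")
CODE_BITS = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # least significant bit first


def parse_tcp_header(segment: bytes) -> dict:
    (src, dst, seq, ack, hlen_resv_code,
     window, _checksum, _urgent) = TCP_HEADER.unpack_from(segment)
    hlen = (hlen_resv_code >> 12) * 4   # Hlen counts 32-bit words
    code = hlen_resv_code & 0x3F        # the six code bits
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": hlen,
        "flags": [name for i, name in enumerate(CODE_BITS) if code & (1 << i)],
        "window": window,
        "options": segment[TCP_HEADER.size:hlen],
    }


# A SYN+ACK from a web server: 5-word header, no options.
seg = TCP_HEADER.pack(80, 50000, 1000, 2001, (5 << 12) | 0x12, 65535, 0, 0)
print(parse_tcp_header(seg)["flags"])  # ['SYN', 'ACK']
```

The Window value returned is the raw 16-bit field; if the window scaling option was negotiated, it still has to be shifted by the agreed scale factor.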
The RTP Header Format
Encapsulation: Frame header | IP header | UDP header | RTP header | Application data | FCS
RTP header (field widths in bits, 32 bits per row):
| Ver (2) | P (1) | X (1) | CSRC count (4) | M (1) | PT (7) | Sequence Num. (16) |
| RTP Time Stamp (32) |
| Synchronization Source Identifier (32) |
| CSRC identifiers (if any) |
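The bit fields packed into the first 32-bit word are easiest to see in code. This Python sketch follows the 12-byte fixed header of RFC 3550, assumed here to be the layout the diagram shows; header extensions are not decoded and `parse_rtp_header` is a hypothetical helper.

```python
import struct

RTP_FIXED = struct.Struct("!BBHII")  # 12-byte fixed header


def parse_rtp_header(packet: bytes) -> dict:
    b0, b1, seq, timestamp, ssrc = RTP_FIXED.unpack_from(packet)
    cc = b0 & 0x0F  # CSRC count: number of 32-bit CSRC identifiers that follow
    csrc = list(struct.unpack_from(f"!{cc}I", packet, RTP_FIXED.size))
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": csrc,
    }


pkt = RTP_FIXED.pack(0x80, 96, 1, 160, 0x12345678) + b"payload"
hdr = parse_rtp_header(pkt)
print(hdr["version"], hdr["payload_type"], hdr["csrc"])  # 2 96 []
```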
More Information
- Lectures, tutorials etc. on TCP/IP:
- Encyclopaedia
- TCP/IP Resources
- Understanding IP addresses
- Configuring TCP (RFC 1122): ftp://nic.merit.edu/internet/documents/rfc/rfc1122.txt
- Assigned protocols, ports etc. (RFC 1010) & /etc/protocols
Any Questions?
Backup Slides
More Information: Some URLs
- UKLight web site:
- MB-NG project web site:
- DataTAG project web site:
- UDPmon / TCPmon kit + writeup:
- Motherboard and NIC Tests: “Performance of 1 and 10 Gigabit Ethernet Cards with Server Quality Motherboards”, FGCS Special issue
- TCP tuning information may be found at:
- TCP stack comparisons: “Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks”, Journal of Grid Computing 2004
- PFLDnet
- Dante PERT
tcpdump / tcptrace
- tcpdump: dumps all TCP header information for a specified source/destination. ftp://ftp.ee.lbl.gov/
- tcptrace: formats tcpdump output for analysis using xplot
- NLANR TCP Testrig: a nice wrapper for the tcpdump and tcptrace tools
- Sample use:
    tcpdump -s 100 -w /tmp/tcpdump.out host hostname   # capture the first 100 bytes of each packet to/from hostname
    tcptrace -Sl /tmp/tcpdump.out                      # -S: generate time-sequence graphs, -l: long output
    xplot /tmp/a2b_tsg.xpl                             # view the a-to-b time-sequence graph