Computer Communication Example
[Figure: sending a picture and the message "Hello!" to a friend. The sender uses Microsoft Outlook and the receiver uses Netscape Messenger; the two are linked by a communication channel.]
Packetization of Data
- To transmit a stream of data bits (a message), the message is typically partitioned into "packets"
- A packet consists of (at the very least):
  - Packet header (destination, routing info, etc.)
  - Data payload (the bits of the message)
  - Check bits (redundant bits used to check for errors in the received packet)
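A minimal sketch of building and checking one packet. The header layout (4-byte destination ID, 2-byte sequence number, 2-byte payload length) and the use of CRC-32 as the check bits are assumptions for illustration, not a real protocol's format.

```python
import struct
import zlib

# Hypothetical header layout: destination ID, sequence number, payload length (network byte order).
HEADER = struct.Struct("!IHH")
CHECK = struct.Struct("!I")  # 32 check bits holding a CRC-32 of header + payload

def make_packet(dest: int, seq: int, payload: bytes) -> bytes:
    """Build header + data payload + check bits."""
    header = HEADER.pack(dest, seq, len(payload))
    crc = zlib.crc32(header + payload)
    return header + payload + CHECK.pack(crc)

def parse_packet(packet: bytes):
    """Split a packet into (dest, seq, payload); raise ValueError if the check bits do not match."""
    dest, seq, length = HEADER.unpack_from(packet, 0)
    end = HEADER.size + length
    payload = packet[HEADER.size:end]
    (crc,) = CHECK.unpack_from(packet, end)
    if zlib.crc32(packet[:end]) != crc:  # receiver recomputes the check bits
        raise ValueError("check bits do not match: packet corrupted")
    return dest, seq, payload
```

For example, `parse_packet(make_packet(7, 0, b"Hello!"))` returns `(7, 0, b'Hello!')`, while flipping any payload byte makes `parse_packet` raise `ValueError`.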
Communication Protocols
- For a packet to be transmitted and received successfully, the transmitter and receiver must agree on a "communication protocol": a set of rules on how the packet is interpreted
  - How to sample the bits of the packet
    - Signaling method
    - Synchronization of the transmitter and receiver
  - How to determine which parts of the packet are the packet header (destination info, etc.), the data payload, the check bits, etc.
  - How to interpret the bits of the data payload
    - Integer, floating-point number, character string, JPEG picture, etc.
Computer Communication Models and Communication Protocol Suites
- The most commonly used reference communication model is the Open Systems Interconnection (OSI) model
  - Standardized by the International Organization for Standardization (ISO)
- The most widely used protocol suite in practice is the TCP/IP protocol suite (or stack), whose layers map loosely onto the OSI model
  - TCP = Transmission Control Protocol
  - IP = Internet Protocol
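Communication Protocols
[Figure: the seven OSI layers, L1 (physical), L2 (data link), L3 (network), L4 (transport), L5 (session), L6 (presentation), L7 (application) [Forouzan 2003]]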
Layer-by-Layer (OSI Model) View
[Figure: data passing through the OSI layers, with labels marking packets and frames]
Activities Required (Sender Side)
- Edit the message and click "Send" (e.g., in MS Outlook Express)
- Convert the message into a sequence of bits
  - Tags must be inserted so that the original message can be reconstructed at the destination
  - E.g., "string" … "JPEG" … "end" …
- Encrypt the message if necessary for privacy
- Compress if necessary
- Partition into packets of a fixed maximum size
- Attach header information (packet ID, destination IP address, checksum, …)
- Intersperse with packets from messages created by other applications
- On the first link of the path, partition each packet into frames of a fixed maximum size (each with its own header)
- Send each frame out onto the network
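A minimal sketch of the tagging, compression, and partitioning steps, assuming a simple tag + length + data encoding. The one-byte tag values are hypothetical stand-ins for the slide's "string" / "JPEG" / "end" markers, zlib is an assumed choice of compressor, and encryption and headers are left out to keep it short.

```python
import struct
import zlib

# Hypothetical one-byte type tags standing in for "string", "JPEG", "end".
TAG_END, TAG_STRING, TAG_JPEG = 0, 1, 2

def encode_message(parts):
    """Turn (tag, bytes) parts into one bit stream: tag (1 byte) + length (4 bytes) + data."""
    stream = bytearray()
    for tag, data in parts:
        stream += struct.pack("!BI", tag, len(data)) + data
    stream += struct.pack("!BI", TAG_END, 0)  # end-of-message marker
    return bytes(stream)

def packetize(parts, max_payload=64, compress=True):
    """Encode, optionally compress, then split into (seq, chunk) packets of at most max_payload bytes."""
    stream = encode_message(parts)
    if compress:
        stream = zlib.compress(stream)
    return [(seq, stream[i:i + max_payload])
            for seq, i in enumerate(range(0, len(stream), max_payload))]
```

Each `(seq, chunk)` pair would then receive a header such as destination and checksum before being interleaved with other applications' packets.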
Activities Required on the Network
- Route each packet to its destination
- During each "hop" of the path:
  - Exchange signals to coordinate sending and receiving the stream of bits that makes up a frame (handshaking)
  - Check each frame for errors
  - Request retransmission in the case of errors
  - Arrange received frames into the proper order
  - Wait for all frames of the packet to be received
- Once a packet reaches its destination node:
  - Store the packet in a memory buffer at the destination (identified by IP address and port number)
  - Send a signal to the destination CPU to inform it of the arrival of the new packet
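The per-hop error check and retransmission can be sketched as a stop-and-wait scheme, one common way to do it (an assumption here, not necessarily what a given link uses). The lossy link is simulated with a seeded random generator that flips a bit with probability `error_rate`; both the simulation and the parameter names are hypothetical.

```python
import random
import zlib

def send_over_hop(frames, error_rate=0.3, seed=1):
    """Stop-and-wait sketch: send one frame, check it, retransmit until it arrives intact."""
    rng = random.Random(seed)  # simulated noisy link (hypothetical)
    delivered, transmissions = [], 0
    for seq, data in enumerate(frames):
        crc = zlib.crc32(data)  # check bits computed by the sender
        while True:
            transmissions += 1
            received = bytearray(data)
            if received and rng.random() < error_rate:
                received[0] ^= 0x01  # simulated single-bit error on the link
            if zlib.crc32(bytes(received)) == crc:  # receiver's error check
                delivered.append((seq, bytes(received)))  # ACK: move on to the next frame
                break
            # NAK (or timeout): loop and retransmit the same frame
    return delivered, transmissions
```

Because only one frame is outstanding at a time, frames arrive in order; the extra transmissions are the cost paid for the errors.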
Activities at the Destination Node
- Receive packets
- Check each packet for errors and request retransmission in the case of errors
- Arrange received packets into the proper order
- Once all packets have been received, form the complete message
- Decompress if necessary
- Decrypt if necessary
- Check for errors
- Use the tags in the bit stream to reconstruct the message
- Show the message to the user with an application (e.g., MS Outlook Express)
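A minimal sketch of the reordering and reassembly steps, assuming each packet arrives as a `(seq, chunk)` pair, that the total packet count is known (for example, carried in a header), and that the sender used zlib compression. Per-packet error checks and decryption are omitted.

```python
import zlib

def reassemble(packets, total):
    """Put (seq, chunk) packets back in order; return None while any packet is still missing."""
    by_seq = {}
    for seq, chunk in packets:
        by_seq.setdefault(seq, chunk)  # keep the first copy, ignore duplicate retransmissions
    if any(i not in by_seq for i in range(total)):
        return None  # keep waiting for the missing packets
    return b"".join(by_seq[i] for i in range(total))

def receive(packets, total, compressed=True):
    """Reassemble the message, then decompress it if the sender compressed it."""
    stream = reassemble(packets, total)
    if stream is None:
        return None
    return zlib.decompress(stream) if compressed else stream
```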
Network Addresses
- IP (Internet Protocol) address
  - Address used to identify a computing node on the Internet
  - Network layer (L3) address
  - E.g., (look up "Properties" of "TCP/IP" under "Network")
- MAC (Medium Access Control) address
  - Address used to identify a LAN card; assigned by the manufacturer and normally fixed (most operating systems can override it in software)
  - Data link layer (L2) address
  - E.g., a 48-bit value written as six hexadecimal byte pairs (enter "ipconfig /all" in an MS Windows "cmd" window)
- Port address
  - Address used to identify a network interface point for an application program
  - Corresponds to a memory buffer
    - Send a message: write to a memory buffer on a remote computer
    - Receive a message: read from a memory buffer on the local computer
  - Examples: 21 (for FTP), 3000 (for a user-defined port)
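A small sketch of the "port corresponds to a memory buffer" idea, using Python's standard `socket` module on the local host. A datagram sent to an (IP address, port) pair waits in that port's receive buffer until the owning program reads it. Using the loopback address and letting the OS choose the port (port 0) are assumptions made so the example runs anywhere.

```python
import socket

def port_demo(message: bytes) -> bytes:
    """Send a datagram to a local port, then read it back from that port's receive buffer."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))     # port 0: let the OS pick a free port
        ip, port = receiver.getsockname()   # the (IP address, port) pair names the endpoint
        sender.sendto(message, (ip, port))  # "write" into the destination port's buffer
        receiver.settimeout(2.0)
        data, _ = receiver.recvfrom(4096)   # "read" from the local port's buffer
        return data
```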
Connection-Oriented and Connectionless Networking
- Connection-oriented networking
  - A connection is established between the two end points and maintained for the duration of the communication (in virtual-circuit networks, a specific network path is set up for the connection)
  - Three phases: connection establishment, data transfer, connection termination
  - Main advantage: reliable communication
  - Main implementation method: TCP (Transmission Control Protocol)
  - Used in the "parallel merge sort" socket-based program (TCP sockets interface)
- Connectionless networking
  - Each packet is routed independently, so different packets may take different paths
  - Main advantage: fast communication for short messages
  - Main implementation method: UDP (User Datagram Protocol)
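The three TCP phases can be traced in a minimal local echo exchange, assuming Python's standard `socket` and `threading` modules and a loopback connection. The helper `_recv_exact` is a hypothetical name; it is needed because TCP delivers a byte stream that may arrive in pieces.

```python
import socket
import threading

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a TCP stream (data may arrive in pieces)."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf += chunk
    return buf

def tcp_round_trip(message: bytes) -> bytes:
    """Send a message over a local TCP connection and return the echoed reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: OS picks a free port
        server.listen(1)
        port = server.getsockname()[1]

        def serve():
            conn, _ = server.accept()  # returns once the handshake has completed
            with conn:
                conn.sendall(_recv_exact(conn, len(message)))  # echo the data back

        worker = threading.Thread(target=serve)
        worker.start()
        # Phase 1: connection establishment (three-way handshake inside create_connection)
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)  # Phase 2: data transfer
            reply = _recv_exact(client, len(message))
        # Phase 3: leaving the with-block closes the client socket (connection termination)
        worker.join()
    return reply
```

A UDP version would skip phases 1 and 3 entirely and simply send datagrams, which is why it is faster for short messages but gives no delivery guarantee.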
Communication Performance Parameters (1)
- Throughput
  - Actual number of bits transmitted per second
  - Note 1: different from latency
  - Note 2: different from bandwidth
  - The most important communication performance parameter
- Typical measurement method
  - Send a data file from a source node to a destination node
  - Record the time t1 when the first byte of the data is received
  - Record the time t2 when the last byte of the data is received
  - Divide the amount of data received by (t2 - t1)
- Note: Mbps = megabits per second (not bytes)
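The final step of the measurement as a small helper. It assumes t1 and t2 are in seconds and uses the usual networking convention of 1 Mbps = 10^6 bits per second; the function name is hypothetical.

```python
def throughput_mbps(bytes_received: int, t1: float, t2: float) -> float:
    """Throughput in megabits per second, given first-byte time t1 and last-byte time t2 (seconds)."""
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    bits = bytes_received * 8  # Mbps counts bits, not bytes
    return bits / (t2 - t1) / 1e6
```

For example, 1,250,000 bytes received over one second is 10 Mbps, not 1.25 Mbps.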
Communication Performance Parameters (2)
- Bandwidth
  - Maximum number of bits that can be transmitted per second
  - Note 1: different from latency
  - Note 2: different from throughput
  - Measures the performance of the network only (not of the computer hardware or software)
- Typical measurement method
  - Difficult to measure directly, because the effects of small data amounts and of the software and hardware at the source and destination nodes must be removed
  - The "rated" figure stated in the specification of the relevant communication protocol is most commonly used
  - E.g., 11 Mbps for IEEE 802.11b
Communication Performance Parameters (3)
- Latency
  - Time required for the first byte of a message to be transferred from the source node to the destination node
  - Should include software processing time
- Typical measurement method
  - At time t1, the source node sends a very small message to the destination node
  - The destination node receives the message and sends it back to the source node
  - The source node receives the message and records the time t2
  - One-way communication latency is (t2 - t1) / 2
- Why can't we measure latency directly (record time t3 at the destination and take t3 - t1 as the latency)?
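A minimal ping-pong sketch of this measurement in Python. The two endpoints are a local `socket.socketpair()` with a thread standing in for the destination node, so it measures local inter-process latency rather than a real network. The `iterations` and `size` parameters are hypothetical, and averaging over many round trips smooths out timer resolution. Both timestamps are read on the source side.

```python
import socket
import threading
import time

def _recv_exact(sock, n):
    """Read exactly n bytes (stream sockets may return partial data)."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed early")
        buf += chunk
    return buf

def one_way_latency(iterations=100, size=1):
    """Estimate one-way latency in seconds as half the mean round-trip time."""
    src, dst = socket.socketpair()  # local stand-ins for source and destination nodes
    with src, dst:
        def echo():  # destination: receive, then immediately send back
            for _ in range(iterations):
                dst.sendall(_recv_exact(dst, size))

        worker = threading.Thread(target=echo)
        worker.start()
        msg = b"x" * size
        t1 = time.perf_counter()  # both timestamps come from the source's clock
        for _ in range(iterations):
            src.sendall(msg)
            _recv_exact(src, size)
        t2 = time.perf_counter()
        worker.join()
    return (t2 - t1) / iterations / 2
```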
Computer Communication Example (Revisited)
[Figure: the picture-and-"Hello!" example again (Microsoft Outlook at the sender, Netscape Messenger at the receiver), now showing the data path at each end: user memory space, OS kernel memory space, and NIC hardware (LAN card). Three ways of moving data between the NIC and memory are marked: 1. polling, 2. interrupt, 3. DMA, with "zero copy" [IBM'08].]
EECE
Section 7.8 of [Culler 1999]
- Communication microbenchmarks at three levels
  - Basic network transaction
  - Shared address space
  - Message passing using MPI
- Network transaction performance
  - Echo test using Active Messages (AM), a user-level software network interface
  - Source: send a k-byte message; receive the reply; compute the one-way communication delay
  - Destination: receive the message and immediately send a reply
- Why must this type of echo test be used?
LogP Communication Model
- The LogP model is used for modeling network transaction performance
  - L: latency (within the physical network)
  - o: overhead (= sending overhead + receiving overhead)
  - g: gap (the minimum interval between consecutive message send operations)
  - P: number of processors
- Refer to Figs and 7.31 [Culler 1999]
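A rough LogP estimate for one processor sending n small messages to another, as a sketch. It follows this slide's convention that o already combines sending and receiving overhead, so a single message costs o + L. It also assumes g is at least the per-message overhead on each side, so the gap is what limits how fast new messages can be injected.

```python
def logp_time(n_messages: int, L: float, o: float, g: float) -> float:
    """
    Estimated time for n back-to-back small messages from one processor to another.
    o = sending overhead + receiving overhead (the slide's convention), so one message takes o + L.
    Assumes g >= per-message overhead, so a new message can start only every g time units.
    """
    if n_messages <= 0:
        return 0.0
    return (n_messages - 1) * g + L + o
```

With hypothetical values L = 5, o = 3, g = 4 (microseconds), `logp_time(10, 5.0, 3.0, 4.0)` gives 44.0: nine gaps of 4, plus 5 + 3 for the last message to arrive.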
Message-Passing Operations
- Simple model for the overall time to send n bytes: $T(n) = T_0 + n/B$
  - $T_0$ is the time to send the initial byte of data over the network (sending overhead + receiving overhead)
  - $n$ is the number of bytes
  - $B$ is the bandwidth of the network link
- $r_\infty$: asymptotic bandwidth
- $n_{1/2}$: transfer size at which throughput $= \frac{1}{2} r_\infty$
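Both derived quantities follow directly from the model, taking the delivered throughput to be $r(n) = n / T(n)$:

```latex
r(n) = \frac{n}{T(n)} = \frac{n}{T_0 + n/B},
\qquad
r_\infty = \lim_{n \to \infty} r(n) = B

r(n_{1/2}) = \tfrac{1}{2} r_\infty
\;\Rightarrow\; \frac{n_{1/2}}{T_0 + n_{1/2}/B} = \frac{B}{2}
\;\Rightarrow\; 2\,n_{1/2} = B\,T_0 + n_{1/2}
\;\Rightarrow\; n_{1/2} = T_0\,B
```

At $n = n_{1/2}$ the startup cost and the streaming time are equal, $T(n_{1/2}) = 2T_0$, so messages much shorter than $n_{1/2}$ are dominated by $T_0$.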
Table 7.1 of [Culler 1999]: progressive improvement in $T_0$, $B$, and MFLOPS/processor
- Berkeley NOW: $T_0$ = 6 microseconds, $r_\infty$ = 120 MB/s (megabytes per second)
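Plugging the Berkeley NOW figures into the model $T(n) = T_0 + n/B$, as a sketch. It assumes MB = 10^6 bytes and takes $B = r_\infty$, which is what the simple model implies. Under these assumptions the half-power point comes out to about 720 bytes.

```python
def transfer_time(n_bytes: float, t0: float, bandwidth: float) -> float:
    """T(n) = T0 + n/B in seconds, with bandwidth in bytes per second."""
    return t0 + n_bytes / bandwidth

def effective_bandwidth(n_bytes: float, t0: float, bandwidth: float) -> float:
    """Delivered rate n / T(n); approaches r_inf = B as n grows."""
    return n_bytes / transfer_time(n_bytes, t0, bandwidth)

def half_power_point(t0: float, bandwidth: float) -> float:
    """n_1/2 = T0 * B: message size at which the delivered rate is half of r_inf."""
    return t0 * bandwidth

T0 = 6e-6       # 6 microseconds (Berkeley NOW, Table 7.1)
R_INF = 120e6   # 120 MB/s, assuming MB = 10**6 bytes
n_half = half_power_point(T0, R_INF)  # about 720 bytes
```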
Application-Level Performance
- How does LogP affect application performance?
  - Depends on the characteristics of the application
- General trends observable in Figures 7.35, 7.36, 7.37, 7.38 and Table 7.2 [Culler 1999]
  - $T_0$ large → larger messages are preferable
  - $T_0$ small, $B$ large → small messages are acceptable
  - Larger numbers of processors → smaller message sizes, smaller working sets (the data a processor actively uses, which then fits more easily into faster memory such as cache)
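The first two trends can be checked with the same $T(n) = T_0 + n/B$ model. This sketch moves a fixed amount of data as a number of equal messages, and each message pays $T_0$ once. The function name and the example sizes are hypothetical.

```python
def total_time(total_bytes: int, msg_size: int, t0: float, bandwidth: float) -> float:
    """Time to move total_bytes as ceil(total_bytes / msg_size) messages, each costing T0 + n/B."""
    n_msgs = -(-total_bytes // msg_size)  # ceiling division
    return n_msgs * t0 + total_bytes / bandwidth
```

With $T_0$ = 6 µs and $B$ = 120 MB/s, sending 1 MB as 64-byte messages pays the startup cost 15,625 times, while 64 KB messages pay it only 16 times, so the large-message version is roughly ten times faster. If $T_0$ shrinks to a nanosecond, the two versions take almost the same time.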
Synchronization Issues
- Message-passing model
  - Locks are not necessary, since mutual exclusion is not a problem
  - Each process has exclusive access to its local memory and uses message passing to send/receive data to/from other nodes
  - Group synchronization and group communication still have to be addressed
- Shared-address-space model
  - Requires basic support for "locks" and "barriers"
  - Software algorithms execute on top of basic atomic exchange primitives
  - The programming environment/hardware must provide the perception of atomic memory operations
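A test-and-set spin lock illustrates a lock built on an atomic exchange primitive. Python exposes no raw atomic exchange, so this sketch assumes `threading.Lock.acquire(blocking=False)` as a stand-in: it atomically checks and sets a flag and returns True only if the flag was clear. Real shared-memory machines would use a hardware instruction instead.

```python
import threading
import time

class SpinLock:
    """Test-and-set spin lock; Lock.acquire(blocking=False) stands in for an atomic exchange."""

    def __init__(self):
        self._flag = threading.Lock()

    def acquire(self):
        while not self._flag.acquire(blocking=False):  # test-and-set failed: lock is held
            time.sleep(0)  # yield so the holder can run (Python threads share one interpreter lock)

    def release(self):
        self._flag.release()

def count_with_lock(n_threads=4, increments=500):
    """Several threads increment a shared counter; the lock makes each read-modify-write atomic."""
    lock = SpinLock()
    total = [0]

    def worker():
        for _ in range(increments):
            lock.acquire()
            total[0] += 1  # critical section
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

For barriers, Python's standard library already provides `threading.Barrier`. In the message-passing model neither primitive is needed for local data, because no other process can touch it.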
Group Communication Operations
- Unicast (one-to-one)
- Multicast (one-to-many)
- Broadcast (one-to-all)
- All-to-all broadcast
  - Also referred to as "gossiping"
- All-to-all personalized multicast (or broadcast)
- Special operations used for performance improvement
  - Parallel prefix (used with parallel supercomputers)
  - Map-reduce (described in a paper by Google engineers)
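Parallel prefix computes all running sums in a logarithmic number of steps. This sketch simulates the Hillis-Steele formulation (one standard variant, chosen here for illustration) on a single machine. In each step, every position reads the previous step's values, as if all processors updated at once.

```python
def parallel_prefix(values):
    """
    Hillis-Steele inclusive scan: in each step, position i (for i >= step) adds the value
    `step` places to its left, and step doubles. n values need ceil(log2 n) steps.
    """
    x = list(values)
    step = 1
    while step < len(x):
        prev = x[:]  # snapshot: the values every processor sees in this step
        for i in range(step, len(x)):  # on a parallel machine these updates run concurrently
            x[i] = prev[i] + prev[i - step]
        step *= 2
    return x
```

For `[1, 2, 3, 4, 5, 6, 7, 8]` this takes three steps and returns `[1, 3, 6, 10, 15, 21, 28, 36]`.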
Communication Support in the ESA Lab Cluster
- 1 Gbps Ethernet cards and switches
- Myrinet switches, Myrinet LAN cards (from Myricom)
  - 1.28 Gbps/port
  - TCP/IP, Myrinet GM and BIP LAN interface software [Kim 2001]
- Myrinet2000 switch and Myrinet2000 LAN cards
  - 2.0 Gbps/port bandwidth (= 250 MBps)
  - TCP sockets: > 100 microsecond latency, throughput much lower than peak bandwidth
  - Myrinet GM LAN interface software: around 5 microsecond latency, close to peak bandwidth
- Note: the current (2009) state of the art is Myrinet10G with MX software
  - Around 2 microsecond latency, close to 10 Gbps throughput
References
- B. A. Forouzan, TCP/IP Protocol Suite, 2nd ed., McGraw-Hill, Boston, 2003.
- D. E. Culler, J. P. Singh and A. Gupta, Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann, San Francisco, 1999.
- zerocopy/, zerocopy/
- S. C. Kim and S. Lee, "Measurement and prediction of communication latencies in Myrinet networks," J. Parallel and Distributed Computing, Vol. 61, No. 11, November 2001.