CS-556: Distributed Systems Manolis Marazakis Inter-process Communication (III)


Fall Semester 2005, CS-556: Distributed Systems

Berkeley Sockets (I)
Socket primitives for TCP/IP.

Primitive   Meaning
Socket      Create a new communication endpoint
Bind        Attach a local address to a socket
Listen      Announce willingness to accept connections
Accept      Block caller until a connection request arrives
Connect     Actively attempt to establish a connection
Send        Send some data over the connection
Receive     Receive some data over the connection
Close       Release the connection

Berkeley Sockets (II)
Connection-oriented communication pattern using sockets.
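
The connection-oriented pattern can be sketched with Python's standard socket module, whose calls map one-to-one onto the primitives in the table. This is a minimal illustration, not part of the original slides: the helper names (run_echo_server, echo_once), the loopback address and the 4096-byte read size are assumptions chosen for the example.

```python
import socket
import threading

def run_echo_server(listener: socket.socket) -> None:
    """Accept one connection and echo back whatever arrives until the peer closes."""
    conn, _addr = listener.accept()              # Accept: block until a request arrives
    with conn:
        while True:
            data = conn.recv(4096)               # Receive
            if not data:                         # empty result: peer closed its side
                break
            conn.sendall(data)                   # Send

def echo_once(payload: bytes) -> bytes:
    """Run a one-shot echo server and client on the loopback interface."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # Socket
    listener.bind(("127.0.0.1", 0))              # Bind (port 0: let the OS pick a port)
    listener.listen(1)                           # Listen
    port = listener.getsockname()[1]
    server = threading.Thread(target=run_echo_server, args=(listener,))
    server.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)     # Socket
    client.connect(("127.0.0.1", port))          # Connect
    client.sendall(payload)                      # Send
    client.shutdown(socket.SHUT_WR)              # signal that the client has no more data
    chunks = []
    while True:
        chunk = client.recv(4096)                # Receive
        if not chunk:
            break
        chunks.append(chunk)
    client.close()                               # Close
    server.join()
    listener.close()
    return b"".join(chunks)
```

The client reads in a loop because a single recv() is not guaranteed to return the whole reply.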

Connected vs Connectionless (I)
- IP: best-effort, unreliable, connectionless
  - Remembers nothing about a packet after it has sent it
  - Checksum computed on header only
  - No assumptions about the underlying physical medium: serial link, Ethernet, Token Ring, X.25, ATM, wireless CDPD, …
- UDP:
  - (optional) checksum
  - notion of port

Connected vs Connectionless (II)
- TCP: reliable connection-oriented service
  - Segments are sent in IP datagrams
  - Checksum of data in each segment
  - Sequence # of the 1st byte in the segment
  - Acknowledge-and-retransmit mechanism
- Each side maintains a receive window
  - Range of sequence #s that this side is prepared to receive
  - Any arriving data with a sequence # outside the receive window is discarded
  - Queuing of data arriving out of order
  - Window slides to the right if the next expected sequence # has arrived … and an ACK is sent back with the sequence # expected next
- Send window:
  - Bytes sent but not yet acknowledged
    - RTO timer (retransmission timeout)
    - Timeout does not always mean that the data was lost!
  - Bytes that can be sent but have not yet been sent
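
The receive-window rules above can be modelled in a few lines. This is a toy sketch under simplifying assumptions, not how any real TCP stack is written: the class name ReceiveWindow is hypothetical, sequence numbers do not wrap, and the window has a fixed size.

```python
class ReceiveWindow:
    """Toy model of a TCP-style receive window over byte sequence numbers."""

    def __init__(self, next_expected: int, size: int) -> None:
        self.next_expected = next_expected   # left edge of the window
        self.size = size                     # window width in bytes (fixed here)
        self.pending = {}                    # out-of-order bytes keyed by sequence #
        self.delivered = bytearray()         # in-order data ready for the application

    def on_segment(self, seq: int, data: bytes) -> int:
        """Process a segment whose first byte has sequence # `seq`; return the ACK #."""
        for offset, byte in enumerate(data):
            s = seq + offset
            if self.next_expected <= s < self.next_expected + self.size:
                self.pending[s] = byte       # inside the window: queue it
            # bytes outside the window (old duplicates, too far ahead) are discarded
        while self.next_expected in self.pending:
            self.delivered.append(self.pending.pop(self.next_expected))
            self.next_expected += 1          # the window slides to the right
        return self.next_expected            # ACK carries the sequence # expected next
```

An out-of-order segment is queued and the ACK number stays put; once the gap is filled, the window slides past everything that has become contiguous.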

UDP Failure Model
- Omission failures
  - timeouts
  - duplicate messages
  - lost messages
- Need to maintain history
  - Last reply sent to each client, provided that a client can make only one request at a time
  - Server interprets each request as the ACK for the previous reply
  - Periodic "purge" of history: no ACK for the last response received before a client terminates
- Fixed max. buffer size (8 KB)
- No message order guarantee
- Process crash failures
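
A server-side history of this kind might look as follows. This is a sketch only: the class name ReplyHistory, the use of a string client id and an integer request id are assumptions for illustration, and the periodic purge is left out.

```python
class ReplyHistory:
    """Keeps the last reply per client so that duplicate requests are not re-executed."""

    def __init__(self) -> None:
        self.last = {}   # client id -> (request id, reply)

    def handle(self, client: str, request_id: int, execute):
        """Return the reply for this request, running `execute` at most once per request."""
        entry = self.last.get(client)
        if entry is not None and entry[0] == request_id:
            return entry[1]                      # duplicate: resend the stored reply
        reply = execute()                        # a new request acts as the ACK for the old reply
        self.last[client] = (request_id, reply)  # overwrite: only the last reply is kept
        return reply
```

Keeping one entry per client is only safe under the slide's condition that a client has at most one outstanding request.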

TCP Failure Model
- Reliable message delivery: checksums, sequence numbers, timeouts
  - No need for applications to deal with retransmissions, duplicates, reordering
  - No need for histories
- Flow control mechanism: large transfers without overwhelming the receiver
- … BUT not reliable sessions: connections may be severed or severely congested
  - Processes cannot distinguish network failure from process failure
  - Processes cannot tell if their recent messages were received

TCP is a stream protocol
- No inherent notion of "message boundary"
- The amount of data in a packet is not directly related to the amount of data delivered to TCP in the send() call
- No reliable way for the receiver to determine how the data was packetized
  - Several packets may have arrived between recv() calls
- The amount of data returned in any given read() is unpredictable
  - Fixed-length messages
  - Variable-length messages
    - End-of-record marker
    - Fixed-length header (including record length) + variable data
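
The "fixed-length header + variable data" option can be sketched as below. The 4-byte big-endian length header and the helper names (recv_exact, send_record, recv_record) are assumptions made for this example, not a format prescribed by the slides.

```python
import socket
import struct

HEADER = struct.Struct("!I")   # assumed wire format: 4-byte big-endian record length

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Call recv() repeatedly until exactly n bytes have arrived."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))   # may return fewer bytes than requested
        if not chunk:
            raise ConnectionError("peer closed the connection mid-record")
        buf.extend(chunk)
    return bytes(buf)

def send_record(sock: socket.socket, payload: bytes) -> None:
    """Send one record: the length header followed by the payload."""
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_record(sock: socket.socket) -> bytes:
    """Read one complete record, however the stream happened to be packetized."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

The loop in recv_exact is the important part: record boundaries are recovered from the header, never from how many bytes a single recv() happens to return.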

TCP Failure Modes (I)
- "TCP guarantees delivery of the data it sends": true or false? Guarantee to whom?
- False … How can we handle outages & crashes?
[Figure: Application (A) and Application (B) in user space, each above a kernel-space TCP / IP / NIC stack; the path between them crosses intermediate hops that have only IP / NIC layers.]

TCP Failure Modes (II)
- IP is a best-effort, unreliable protocol … so the TCP layer is the first place in the data path where it makes sense to even talk about guarantees
- The sender's TCP layer can make no guarantee about segments that arrive at the receiver's TCP layer
  - An arriving segment may be corrupted, or it may contain duplicate data, or it may be out of order …
- The receiver's TCP layer guarantees to the sender's TCP layer that any segment that it ACKs, and all data that came before it, have been correctly received
  - This does not mean that the data has been delivered to the application … or that it will ever be delivered!
  - For example, the receiving host may crash after the ACK but before delivery …

TCP Failure Modes (III)
- It also makes sense to talk about guarantees at application B (receiver)
  - There can be no guarantee that all data sent by application A will arrive
  - However, all data that does arrive will be in order and uncorrupted
- Avoid the attitude that "TCP will take care of everything"
- TCP is an end-to-end protocol, providing a reliable transport mechanism between peers …
  - The "peers" are the TCP layers of the sender & the receiver!

TCP Failure Modes (IV)
- Explicit acknowledgements
  - What does the client do if the server does not ACK receipt?
  - It may not be safe to simply resend a request …
- Cases: network outage, peer crash, peer's host crash
- When a problem occurs at an endpoint, there is generally no alternative path
  - The problem persists until it is repaired
- An intermediate router may send the originator an ICMP message indicating that the destination network or host is unreachable
- OR: the sender eventually times out & resends the segments not ACKed. This continues until the sender gives up & drops the connection (~9 minutes).
  - Pending read -> ETIMEDOUT
  - Otherwise, the next write fails -> SIGPIPE or EPIPE

TCP Failure Modes (V)
- Peer crash: indistinguishable from the case of the peer calling close() and then exit()
- The peer's TCP layer issues a FIN segment
  - This does not necessarily imply that the peer has no more data to send, or even that it is not willing to receive more data …
- Reception of the FIN may come at different execution states of the application
  - If the client is blocked, TCP has no way of notifying it
- The next transmission generates a RST segment -> ECONNRESET
- If the RST is ignored & more data is transmitted -> SIGPIPE
  - This may occur if the client performs >= 2 consecutive write() calls without an intervening read()
  - Notification takes place only after the 2nd write()
  - If the client has a pending read(), it gets an immediate error indication (e.g. read() returns EOF)

TCP Failure Modes (VI)
- Peer's host crash: the peer's TCP cannot issue the FIN segment
- Until recovery, this case cannot be distinguished from a network outage
  - The peer's TCP no longer responds, but the sender keeps retransmitting
  - … until either the host recovers, or the sender gives up the connection -> ETIMEDOUT
- If the host reboots before the sender gives up, a retransmitted segment may arrive at the TCP layer … without it having knowledge of the connection -> RST
  - If the sender has a read() pending -> ECONNRESET
  - Else, the next write() results in a SIGPIPE signal

Behavior of Peers
- Checking for client termination
  - Heartbeats, timeouts for read operations, SO_KEEPALIVE option, …
- Checking for valid input
  - Buffer overflow errors
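
Two of these checks, read timeouts and SO_KEEPALIVE, are available directly on a socket. The sketch below uses only standard socket-module calls; the helper names (guard_socket, read_or_none) are hypothetical, and the timing of keep-alive probes is system-dependent and not configured here.

```python
import socket

def guard_socket(sock: socket.socket, read_timeout: float) -> None:
    """Enable keep-alive probes and bound how long a read may block."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.settimeout(read_timeout)

def read_or_none(sock: socket.socket, nbytes: int):
    """Return received bytes, or None if nothing arrived within the socket's timeout."""
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None          # the caller decides: retry, send a heartbeat, or give up
```

An empty bytes result from read_or_none still means the peer closed the connection; None only means the peer has been silent for the timeout period.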

We rely on DNS …

The Message-Passing Interface
Some of the most intuitive primitives of MPI.

Primitive      Meaning
MPI_bsend      Append outgoing message to a local send buffer
MPI_send       Send a message and wait until copied to local or remote buffer
MPI_ssend      Send a message and wait until receipt starts
MPI_sendrecv   Send a message and wait for reply
MPI_isend      Pass reference to outgoing message, and continue
MPI_issend     Pass reference to outgoing message, and wait until receipt starts
MPI_recv       Receive a message; block if there are none
MPI_irecv      Check if there is an incoming message, but do not block

Group Communication
- Multicasting: 1-to-many communication pattern
- Applications:
  - replicated services (better fault tolerance)
  - discovery of services
  - replicated data (better performance)
  - propagation of event notifications
- Failure model: depends on implementation
  - IP multicast (UDP datagrams): omission failures
    - class-D Internet addresses: "1110" bit prefix
    - TTL
  - reliable multicast
  - ordered multicast: FIFO, causal, total
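
The "1110" prefix rule for class-D addresses can be checked with a couple of shifts. A small sketch; the function name is_class_d is an assumption for this example.

```python
import ipaddress

def is_class_d(addr: str) -> bool:
    """True if the IPv4 address starts with the bit prefix 1110 (class D, multicast)."""
    first_octet = int(ipaddress.IPv4Address(addr)) >> 24   # top 8 bits of the address
    return first_octet >> 4 == 0b1110                      # top 4 bits of that octet
```

The prefix corresponds to the range 224.0.0.0 to 239.255.255.255, which is also what the standard library's IPv4Address.is_multicast property tests.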

Conventional Procedure Call
a) Parameter passing in a local procedure call: the stack before the call to read
b) The stack while the called procedure is active

Software layers
Layers, top to bottom (the middle three form the middleware):
- Applications and Services
- RPC and RMI
- request-reply protocol
- marshalling and external data representation
- UDP and TCP
RPC is more than a (transport) protocol: a structuring mechanism for distributed systems

Steps of a Remote Procedure Call
1. Client procedure calls client stub in normal way
2. Client stub builds message, calls local OS
3. Client's OS sends message to remote OS
4. Remote OS gives message to server stub
5. Server stub unpacks parameters, calls server
6. Server does work, returns result to the stub
7. Server stub packs it in message, calls local OS
8. Server's OS sends message to client's OS
9. Client's OS gives message to client stub
10. Stub unpacks result, returns to client
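
The ten steps can be traced in a single-process sketch. Everything here is illustrative: the message layout (a 4-byte procedure number followed by an 8-byte argument), the PROCEDURES table and the in-memory transport function are assumptions that stand in for a real stub generator and network.

```python
import struct

def square(n: int) -> int:
    """The server procedure (step 6 runs this)."""
    return n * n

PROCEDURES = {1: square}    # hypothetical dispatch table: procedure # -> implementation

def server_stub(request: bytes) -> bytes:
    proc, arg = struct.unpack("!Iq", request)   # step 5: unpack parameters
    result = PROCEDURES[proc](arg)              # step 6: call the server procedure
    return struct.pack("!q", result)            # step 7: pack the result in a message

def transport(request: bytes) -> bytes:
    """Stand-in for the two operating systems and the network (steps 3-4 and 8-9)."""
    return server_stub(request)

def square_client_stub(n: int) -> int:
    request = struct.pack("!Iq", 1, n)          # step 2: build the request message
    reply = transport(request)                  # steps 3-9
    (result,) = struct.unpack("!q", reply)      # step 10: unpack the result
    return result
```

The caller sees an ordinary function call (step 1): square_client_stub(11) returns 121 without the caller knowing a message was built.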

Client and Server Stubs
Principle of RPC between a client & server program.

Example (Sun RPC - ONC)
- long square(long) example
  - client ren.eecis.udel.edu 11
  - result: 121
- Need RPC specification file (square.x): defines procedure name, arguments & results
- Run rpcgen square.x: generates square.h, square_clnt.c, square_xdr.c, square_svc.c
  - square_clnt.c & square_svc.c: stub routines for client & server
  - square_xdr.c: XDR (External Data Representation) code; takes care of data type conversions

RPC Specification File (square.x)
IDL: Interface Definition Language

    struct square_in {
        long arg1;
    };

    struct square_out {
        long res1;
    };

    program SQUARE_PROG {
        version SQUARE_VERS {
            square_out SQUAREPROC(square_in) = 1;   /* procedure # */
        } = 1;                                      /* version # */
    } = 0x ;                                        /* program # */

Parameter Specification & Stub Generation
[Figure: a procedure and the corresponding message.]

Writing a Client & a Server
The steps in writing a client & a server in DCE RPC.

Binding (Sun RPC)
- Port mapper (rpcbind) listens at UDP port 111
- Server registers program ID & version
  - rpcinfo -p -> display all registered RPC servers
- When the client issues clnt_create, the port mapper is contacted: program-to-port number mapping
  - arguments: (program ID, version, protocol)
  - response: server's port number

Binding (DCE)

Passing Value Parameters (I)

Passing Value Parameters (II)
a. Original message on Pentium (little-endian)
b. The message after receipt on SPARC (big-endian)
c. The message after being inverted
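
The effect of a byte-order mismatch on an integer parameter can be reproduced with the standard struct module. A minimal sketch; the function name demo_byte_order and the choice of a 4-byte unsigned integer are assumptions for illustration.

```python
import struct

def demo_byte_order(value: int) -> dict:
    """Show how the same 4-byte integer looks to little- and big-endian hosts."""
    little = struct.pack("<I", value)            # bytes as laid out by a little-endian sender
    big = struct.pack(">I", value)               # bytes a big-endian receiver expects
    misread = struct.unpack(">I", little)[0]     # receiver reads the raw bytes unconverted
    return {"little": little, "big": big, "misread": misread}
```

For the value 5 the unconverted read yields 5 * 2**24. Reversing every 4-byte word fixes integers but would also reverse character data, so the stubs need to know the type of each parameter.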

Passing Value Parameters (III)
- How to pass pointers? Meaningful only within a specific address space!
- Arrays (of known length) & structures:
  - Copy/restore semantics (between stubs)
  - IN/OUT/INOUT markers
    - Optimization: may eliminate one copy operation
- Pointer to an arbitrary data structure?
  - No general solution
  - Work-around: pass back the pointer to its "source"

External Data Representation (I)
- Data structures:
  - "flattened" on transmission
  - rebuilt upon reception
- Primitive data types:
  - byte order (big-endian: MSB comes first)
  - ASCII vs UNICODE (2 bytes per character)
- marshalling/unmarshalling
  - to/from agreed external format

External Data Representation (II)
- XDR (RFC 1832), CDR (CORBA), Java:
  - data -> byte stream
  - object references
- HTTP/MIME:
  - data -> ASCII text
- Object reference fields: IP address | port | time | object ID | interface ID

CORBA CDR example
The flattened form represents a Person struct with value: {'Smith', 'London', 1934}

index in sequence of bytes   4 bytes   notes on representation
0–3                          5         length of string 'Smith'
4–7                          "Smit"
8–11                         "h___"
12–15                        6         length of string 'London'
16–19                        "Lond"
20–23                        "on__"
24–27                        1934      unsigned long
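
The 28-byte layout in the table can be reproduced as follows. This follows the slide's simplified layout only and is not a complete CDR encoder (real CDR differs in details such as string termination); the helper names and the zero padding byte are assumptions.

```python
import struct

def cdr_string(s: str) -> bytes:
    """Length as a 4-byte unsigned long, then the characters padded to a 4-byte boundary."""
    data = s.encode("ascii")
    padded_len = -(-len(data) // 4) * 4          # round up to a multiple of 4
    return struct.pack(">I", len(data)) + data.ljust(padded_len, b"\x00")

def flatten_person(name: str, place: str, year: int) -> bytes:
    """Flatten the Person struct field by field, in declaration order."""
    return cdr_string(name) + cdr_string(place) + struct.pack(">I", year)
```

flatten_person("Smith", "London", 1934) produces seven 4-byte words, matching the index column above; big-endian order is chosen here for the example.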

Properties of TCP
- Connected vs Connectionless Protocols
- TCP is a stream protocol
- Performance of TCP
- Avoid re-inventing TCP!
- TCP failure modes
- Behaviour of peers
- LAN vs WAN testing
- Tools & Resources

Basic socket calls
[Figure: SERVER: socket, bind (local host sockaddr_in), listen, accept (peer sockaddr_in), recv / send. CLIENT: socket, connect (peer sockaddr_in), send / recv.]

Performance of TCP (I)
- 4.4BSD implementation:
  - UDP: ~800 LOC
  - TCP: ~4,500 LOC
- CPU processing: checksums, data copying
- TCP ACKs:
  - Receiver can piggyback the ACK
  - Usually every second segment is ACKed
  - May even delay ACKs (up to 0.5 sec)
- Connection setup: 3 segments
  - 1½ RTT: SYN, SYN+ACK, ACK
- Connection tear-down: 4 segments
  - FIN, ACK, FIN (server-to-client), ACK
- Except the last segment, these can be combined with data-bearing segments

Performance of TCP (II)
- Results from a benchmark involving transmission of 5,000 data blocks
- UDP datagram size = TCP write size = 1,440 bytes
  - Ethernet frame: 1,500 bytes
  - IP header: 20 bytes, TCP header: 20 bytes
  - TCP options: 12 bytes
- Average over 50 runs
- Client produces data blocks, transmits them, and then exits
- Server may run on:
  - localhost ( )
  - Same host as the client, but given as an address
  - Other host

Performance of TCP (III)
Table columns: Server | TCP: time, MB/sec | UDP: time, MB/sec, drops
Table rows (server location): Client, Localhost, Remote
- Localhost (loop-back): MTU=16,384
- Client (network I/f): MTU=1,500

Performance of TCP (IV)
Results for write size = 300 bytes
Table columns: Server | TCP: time, MB/sec | UDP: time, MB/sec, drops
Table rows (server location): Client, Remote

Avoid re-inventing TCP!
- Retransmissions?
  - RTO: must be adjustable
  - Exponential back-off
- Flow control: sliding window
- Congestion control
- Matching replies to requests?
  - Sequence # for each request
- Efficiency of the implementation?
  - TCP code base is highly optimized … and runs in kernel-space
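
Even the retransmission item alone means re-implementing a timer policy. The sketch below shows exponential back-off of the RTO under simple assumptions: the function name rto_schedule and its parameters are hypothetical, and a real TCP also adapts the initial RTO to measured round-trip times, which is omitted here.

```python
def rto_schedule(initial_rto: float, max_rto: float, max_retries: int) -> list:
    """Timeouts to wait before each successive retransmission, doubling up to a cap."""
    timeouts = []
    rto = initial_rto
    for _ in range(max_retries):
        timeouts.append(rto)
        rto = min(rto * 2, max_rto)    # exponential back-off, bounded by max_rto
    return timeouts
```

For example, rto_schedule(1.0, 64.0, 8) gives 1, 2, 4, 8, 16, 32, 64, 64 seconds; an application-level protocol over UDP would have to pick and tune all three parameters itself.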

LAN vs WAN testing
- Performance on the WAN may not be satisfactory, due to the extra latency
  - … may have to reconsider the design
- Incorrect code is more likely to be triggered on the WAN
  - … assumptions on volume/rate of arriving data

HTTP
Request message fields: method | URL or pathname | HTTP version | headers | message body
  Example: GET // 1.1
Reply message fields: HTTP version | status code | reason | headers | message body
  Example: HTTP/1.1 | 200 | OK | resource data
- Resource := MIME-encoded data
- Content negotiation
- Authentication
- Methods: GET, HEAD, POST, PUT, DELETE, TRACE, OPTIONS

Tools (I)
- ping
  - IP header + ICMP echo request/reply
- tcpdump
  - Network analyzer ("sniffer")
- traceroute
  - Determines the network path by forcing each intermediate router to send an ICMP error message to the originator
  - Send a UDP datagram with TTL=1, so that the 1st router in the path will discard it
  - Send a 2nd UDP datagram with TTL=2, so that the 2nd router in the path will discard it
  - …
  - At the last hop, TTL=1 & an attempt is made to deliver the datagram (generating the ICMP error message "port unreachable")

Tools (II)
- ttcp
  - Benchmarking tool, with many parameters
  - UDP or TCP transfers, buffers, size of reads/writes
- lsof
  - Determine which process has a "file descriptor" open (file or socket)
  - lsof -i TCP:6000
  - lsof
- netstat
  - Active sockets: netstat -af inet
  - Interfaces: netstat -i
  - Routing table: netstat -rn
  - Protocol statistics: netstat -sp tcp
- System call tracers: strace, truss, ktrace

Resources
- Books:
  - Richard Stevens:
    - TCP/IP Illustrated series: Protocols; Implementation; T/TCP, HTTP, NNTP, Domain Sockets
    - UNIX Network Programming series: Networking APIs (Sockets, XTI); Interprocess Communication
  - J.C. Snader: "Effective TCP/IP Programming"
- RFCs: