Published by Alexander Johnston. Modified over 9 years ago.
1
CS-556: Distributed Systems
Manolis Marazakis (maraz@csd.uoc.gr)
Inter-process Communication (III)
2
Fall Semester 2005, CS-556: Distributed Systems
Berkeley Sockets (I)
Socket primitives for TCP/IP.

Primitive   Meaning
Socket      Create a new communication endpoint
Bind        Attach a local address to a socket
Listen      Announce willingness to accept connections
Accept      Block caller until a connection request arrives
Connect     Actively attempt to establish a connection
Send        Send some data over the connection
Receive     Receive some data over the connection
Close       Release the connection
3
Berkeley Sockets (II)
[Figure: connection-oriented communication pattern using sockets.]
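The connection-oriented pattern can be sketched with Python's standard `socket` module, which wraps the same Berkeley calls. This is a minimal illustration, not code from the course: the helper names `run_echo_server` and `echo_once` are hypothetical, and the server handles a single client on the loopback interface.

```python
import socket
import threading


def run_echo_server(listener: socket.socket) -> None:
    """Accept one connection and echo bytes back until the peer closes."""
    conn, _peer = listener.accept()          # Accept: block until a client connects
    with conn:
        while True:
            data = conn.recv(4096)           # Receive
            if not data:                     # empty read: peer closed its sending side
                break
            conn.sendall(data)               # Send


def echo_once(payload: bytes) -> bytes:
    """Start a one-shot echo server on loopback and send it `payload`."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # Socket
    listener.bind(("127.0.0.1", 0))          # Bind: port 0 lets the OS pick a free port
    listener.listen(1)                       # Listen
    port = listener.getsockname()[1]
    server = threading.Thread(target=run_echo_server, args=(listener,))
    server.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)     # Socket
    client.connect(("127.0.0.1", port))      # Connect
    client.sendall(payload)                  # Send
    client.shutdown(socket.SHUT_WR)          # tell the server no more data will follow
    chunks = []
    while True:
        chunk = client.recv(4096)            # Receive until the server closes
        if not chunk:
            break
        chunks.append(chunk)
    client.close()                           # Close
    server.join()
    listener.close()
    return b"".join(chunks)
```

The server side follows socket, bind, listen, accept; the client side follows socket, connect; both then exchange data and close.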
4
Connected vs Connectionless (I)
- IP: best-effort, unreliable, connectionless
  - Remembers nothing about a packet after it has sent it
  - Checksum computed on header only
  - No assumptions about the underlying physical medium: serial link, Ethernet, Token Ring, X.25, ATM, wireless CDPD, …
- UDP: adds an (optional) checksum and the notion of a port
5
Connected vs Connectionless (II)
- TCP: reliable, connection-oriented service
  - Segments are sent in IP datagrams
  - Checksum of data in each segment
  - Sequence # of the 1st byte in the segment
  - Acknowledge-and-retransmit mechanism
- Each side maintains a receive window
  - Range of sequence #s that this side is prepared to receive
  - Any arriving data with a sequence # outside the receive window is discarded
  - Queuing of data arriving out of order
  - Window slides to the right if the next expected sequence # has arrived … and an ACK is sent back with the sequence # expected next
- Send window:
  - Bytes sent but not yet acknowledged
    - RTO timer (retransmission timeout)
    - Timeout does not always mean that the data was lost!
  - Bytes that can be sent but have not yet been sent
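The receive-window rules can be illustrated with a toy model. This sketch is an assumption-laden simplification rather than real TCP: the class name `ReceiveWindow` is hypothetical, the window size stays fixed, sequence numbers never wrap, and data is tracked byte by byte.

```python
class ReceiveWindow:
    """Toy model of a TCP receive window (fixed size, no sequence wrap-around)."""

    def __init__(self, initial_seq: int, size: int) -> None:
        self.next_expected = initial_seq   # left edge of the window
        self.size = size                   # window size in bytes
        self.pending = {}                  # out-of-order bytes: sequence # -> byte value
        self.delivered = bytearray()       # in-order data ready for the application

    def on_segment(self, seq: int, data: bytes) -> int:
        """Process one segment and return the ACK (next expected sequence #)."""
        right_edge = self.next_expected + self.size
        for offset, byte in enumerate(data):
            s = seq + offset
            # Bytes outside the window (old duplicates or too far ahead) are discarded.
            if self.next_expected <= s < right_edge:
                self.pending[s] = byte
        # Slide the window to the right while the next expected byte has arrived.
        while self.next_expected in self.pending:
            self.delivered.append(self.pending.pop(self.next_expected))
            self.next_expected += 1
        return self.next_expected
```

An out-of-order segment is queued and the ACK stays at the old value; once the gap is filled, the window slides past everything that is now contiguous.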
6
UDP Failure Model
- Omission failures: timeouts, duplicate messages, lost messages
- Need to maintain history
  - Last reply sent to each client, provided that a client can make only one request at a time
  - Server interprets each request as the ACK for the previous reply
  - Periodic "purge" of history: there is no ACK for the last response received before the client terminates
- Fixed max. buffer size (8 KB)
- No message order guarantee
- Process crash failures
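The history mechanism can be sketched as a small server-side table. This is an illustrative sketch under assumptions: the class `ReplyHistory`, its method names, and the use of request ids to detect duplicates are hypothetical choices, not part of the slides.

```python
import time


class ReplyHistory:
    """Last reply sent to each client, kept so duplicates are answered without re-execution."""

    def __init__(self):
        self._last = {}   # client -> (request id, reply, time stored)

    def handle(self, client, request_id, request, execute, now=None):
        now = time.monotonic() if now is None else now
        entry = self._last.get(client)
        if entry is not None and entry[0] == request_id:
            return entry[1]                # duplicate request: resend the saved reply
        reply = execute(request)
        # A new request implicitly ACKs the previous reply, so overwrite it.
        self._last[client] = (request_id, reply, now)
        return reply

    def purge(self, max_age, now=None):
        """Drop entries older than max_age seconds; returns how many were dropped."""
        now = time.monotonic() if now is None else now
        stale = [c for c, (_, _, t) in self._last.items() if now - t > max_age]
        for client in stale:
            del self._last[client]
        return len(stale)
```

Keeping one entry per client is only sufficient because each client has at most one outstanding request; the purge covers clients that terminate without sending a further request.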
7
TCP Failure Model
- Reliable message delivery
  - checksums, sequence numbers, timeouts
  - no need for applications to deal with retransmissions, duplicates, reordering
  - no need for histories
- Flow control mechanism: large transfers without overwhelming the receiver
- … BUT not reliable sessions:
  - Connections may be severed or severely congested
  - Processes cannot distinguish network failure from process failure
  - Processes cannot tell if their recent messages were received
8
TCP is a stream protocol
- No inherent notion of "message boundary"
- The amount of data in a packet is not directly related to the amount of data delivered to TCP in the send() call
- No reliable way for the receiver to determine how the data was packetized
  - Several packets may have arrived between recv() calls
- The amount of data returned in any given read() is unpredictable
- Fixed-length messages
- Variable-length messages
  - End-of-record marker
  - Fixed-length header (including record length) + variable data
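The "fixed-length header + variable data" option can be sketched as follows. The 4-byte big-endian length header and the helper names `recv_exact`, `send_record`, and `recv_record` are assumptions made for illustration; any agreed header layout would do.

```python
import socket
import struct

HEADER = struct.Struct("!I")   # assumed header: 4-byte record length, network byte order


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Loop on recv() until exactly n bytes have arrived."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))    # may return fewer bytes than requested
        if not chunk:
            raise ConnectionError("connection closed before a full record arrived")
        buf.extend(chunk)
    return bytes(buf)


def send_record(sock: socket.socket, payload: bytes) -> None:
    """Send one record: length header followed by the data."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_record(sock: socket.socket) -> bytes:
    """Receive one record, regardless of how TCP packetized it."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

The receiver never assumes that one recv() corresponds to one send(); it reads the header first and then exactly as many bytes as the header announces.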
9
TCP Failure Modes (I)
- "TCP guarantees delivery of the data it sends": true or false?
  - Guarantee to whom?
  - False …
- How can we handle outages & crashes?
[Figure: protocol stacks along the path between Application (A) and Application (B): TCP, IP and NIC on the two end hosts; IP and NIC on the intermediate nodes; applications in user space, the protocol stack in kernel space.]
10
TCP Failure Modes (II)
- IP is a best-effort, unreliable protocol … so the TCP layer is the first place in the data path where it makes sense to even talk about guarantees
- The sender's TCP layer can make no guarantee about segments that arrive at the receiver's TCP layer
  - An arriving segment may be corrupted, or it may contain duplicate data, or it may be out of order …
- The receiver's TCP layer guarantees to the sender's TCP layer that any segment that it ACKs & all data that came before it have been correctly received
  - This does not mean that the data has been delivered to the application … or that it will ever be delivered!
  - For example, the receiving host may crash after the ACK but before delivery …
11
TCP Failure Modes (III)
- It also makes sense to talk about guarantees at application B (receiver)
  - There can be no guarantee that all data sent by application A will arrive
  - However, all data that does arrive will be in order and uncorrupted
- Avoid the attitude that "TCP will take care of everything"
- TCP is an end-to-end protocol, providing a reliable transport mechanism between peers …
  - The "peers" are the TCP layers of the sender & the receiver!
12
TCP Failure Modes (IV)
- Explicit acknowledgements
  - What does the client do if the server does not ACK receipt?
  - It may not be safe to simply resend a request …
- Failure cases: network outage, peer crash, peer's host crash
- When a problem occurs at an endpoint, there is generally no alternative path
  - The problem persists until it is repaired
- An intermediate router may send the originator an ICMP message indicating that the destination network or the host is unreachable
- OR: the sender eventually times out & resends the segments not ACKed. This continues until the sender gives up & drops the connection (~9 minutes).
  - Pending read -> ETIMEDOUT
  - Otherwise, the next write fails -> SIGPIPE or EPIPE
13
TCP Failure Modes (V)
- Peer crash: indistinguishable from the case of the peer calling close() and then exit()
- The peer's TCP layer issues a FIN segment
  - This does not necessarily imply that the peer has no more data to send, or even that it is not willing to receive more data …
- Reception of the FIN may come at different execution states of the application
  - If client is blocked, TCP has no way of notifying it
    - The next transmission generates a RST segment -> ECONNRESET
    - If the RST is ignored & more data is transmitted -> SIGPIPE
    - This may occur if the client performs >= 2 consecutive write() calls without an intervening read()
    - Notification takes place only after the 2nd write()
  - If client has a pending read(), it gets an immediate error indication (e.g., read() returns EOF)
14
TCP Failure Modes (VI)
- Peer's host crash: the peer's TCP cannot issue the FIN segment
  - Until recovery, this case cannot be distinguished from a network outage
- The peer's TCP no longer responds, but the sender keeps retransmitting … until either the host recovers, or the sender gives up the connection -> ETIMEDOUT
- If the host reboots before the sender gives up, a retransmitted segment may arrive at the TCP layer … without it having knowledge of the connection -> RST
  - If sender has a read() pending -> ECONNRESET
  - Else, the next write() results in a SIGPIPE signal
15
Behavior of Peers
- Checking for client termination: heartbeats, timeouts for read operations, SO_KEEPALIVE option, …
- Checking for valid input: buffer overflow errors
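Two of the techniques for detecting a vanished peer, the SO_KEEPALIVE option and a timeout on reads, might look like this with Python's `socket` module. The helper names `enable_keepalive` and `recv_with_timeout` are hypothetical, and keepalive probe intervals are left at the operating system defaults.

```python
import socket


def enable_keepalive(sock: socket.socket) -> None:
    """Ask the TCP layer to probe an idle connection (intervals are OS defaults)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)


def recv_with_timeout(sock: socket.socket, nbytes: int, seconds: float):
    """Return received bytes, b"" if the peer closed, or None if nothing arrived in time."""
    sock.settimeout(seconds)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None                 # silence: the application decides what to do next
    finally:
        sock.settimeout(None)       # restore blocking mode
```

A read timeout only reports silence; whether that means a dead peer, a slow peer, or a network outage is still up to the application, for example by sending a heartbeat and waiting once more.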
16
We rely on DNS …
17
The Message-Passing Interface
Some of the most intuitive primitives of MPI.

Primitive      Meaning
MPI_bsend      Append outgoing message to a local send buffer
MPI_send       Send a message and wait until copied to local or remote buffer
MPI_ssend      Send a message and wait until receipt starts
MPI_sendrecv   Send a message and wait for reply
MPI_isend      Pass reference to outgoing message, and continue
MPI_issend     Pass reference to outgoing message, and wait until receipt starts
MPI_recv       Receive a message; block if there are none
MPI_irecv      Check if there is an incoming message, but do not block
18
Group Communication
- Multicasting: 1-to-many communication pattern
- Applications:
  - replicated services (better fault tolerance)
  - discovery of services
  - replicated data (better performance)
  - propagation of event notifications
- Failure model: depends on implementation
  - IP multicast (UDP datagrams): omission failures
    - class-D Internet addresses: "1110" bit prefix
    - TTL
  - reliable multicast
  - ordered multicast: FIFO, causal, total
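The "1110" prefix rule for class-D (IP multicast) addresses can be checked with a few bit operations. This is a small sketch; the function name `is_class_d` is hypothetical.

```python
import ipaddress


def is_class_d(address: str) -> bool:
    """True if the IPv4 address starts with the bits 1110 (224.0.0.0 to 239.255.255.255)."""
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    return (first_octet >> 4) == 0b1110
```

The standard library's `ipaddress.IPv4Address(...).is_multicast` tests the same address range, so the hand-written check is only there to make the bit prefix visible.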
19
Conventional Procedure Call
a) Parameter passing in a local procedure call: the stack before the call to read
b) The stack while the called procedure is active
20
Software layers
[Figure: layer stack, top to bottom]
  Applications and services
  RPC and RMI                                     (middleware)
  Request-reply protocol                          (middleware)
  Marshalling and external data representation    (middleware)
  UDP and TCP
- RPC is more than a (transport) protocol: it is a structuring mechanism for distributed systems
21
Steps of a Remote Procedure Call
1. Client procedure calls client stub in normal way
2. Client stub builds message, calls local OS
3. Client's OS sends message to remote OS
4. Remote OS gives message to server stub
5. Server stub unpacks parameters, calls server
6. Server does work, returns result to the stub
7. Server stub packs it in message, calls local OS
8. Server's OS sends message to client's OS
9. Client's OS gives message to client stub
10. Stub unpacks result, returns to client
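The ten steps can be traced in a toy, in-process sketch. Everything here is an assumption for illustration: the message layout (a 4-byte procedure number plus an 8-byte argument), the names `client_stub_square` and `server_stub`, and the `transport` callable that stands in for the two operating systems and the network.

```python
import struct

PROC_SQUARE = 1                      # hypothetical procedure number
REQUEST = struct.Struct("!Iq")       # procedure number + 64-bit signed argument
REPLY = struct.Struct("!q")          # 64-bit signed result


def square(x: int) -> int:
    """The server procedure (step 6)."""
    return x * x


def server_stub(request: bytes) -> bytes:
    """Steps 5 to 7: unpack parameters, call the server, pack the result."""
    proc, arg = REQUEST.unpack(request)
    if proc != PROC_SQUARE:
        raise ValueError("unknown procedure number")
    return REPLY.pack(square(arg))


def client_stub_square(x: int, transport) -> int:
    """Steps 1, 2 and 10: build the message, hand it over, unpack the reply."""
    request = REQUEST.pack(PROC_SQUARE, x)
    reply = transport(request)       # steps 3, 4, 8, 9 happen inside the transport
    (result,) = REPLY.unpack(reply)
    return result
```

Calling `client_stub_square(11, server_stub)` passes the server stub directly as the transport; in a real system the transport would send the bytes over UDP or TCP, but the caller would look the same.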
22
Client and Server Stubs
[Figure: principle of RPC between a client & server program.]
23
Example (Sun RPC - ONC)
- Remote procedure: long square(long)
- Example run: client ren.eecis.udel.edu 11
  result: 121
- Need RPC specification file (square.x): defines procedure name, arguments & results
- Run rpcgen square.x: generates square.h, square_clnt.c, square_xdr.c, square_svc.c
  - square_clnt.c & square_svc.c: stub routines for client & server
  - square_xdr.c: XDR (External Data Representation) code; takes care of data type conversions
24
RPC Specification File (square.x)

    struct square_in {
        long arg1;
    };

    struct square_out {
        long res1;
    };

    program SQUARE_PROG {
        version SQUARE_VERS {
            square_out SQUAREPROC(square_in) = 1;   /* procedure # */
        } = 1;                                      /* version # */
    } = 0x31230000;   /* program #: must fit in 32 bits; the slide's 0x321230000 does not */

IDL: Interface Definition Language
25
Parameter Specification & Stub Generation
[Figure: a procedure and the corresponding message.]
26
Writing a Client & a Server
[Figure: the steps in writing a client & a server in DCE RPC.]
27
Binding (SUN RPC)
- Port mapper (rpcbind) listens at port 111 (UDP and TCP)
- Server registers program ID & version
  - rpcinfo -p -> display all registered RPC servers
- When client issues clnt_create, the port mapper is contacted for the program-to-port number mapping
  - arguments: (program ID, version, protocol)
  - response: server's port number
28
Binding (DCE)
29
Passing Value Parameters (I)
30
Passing Value Parameters (II)
a. Original message on Pentium (little-endian)
b. The message after receipt on SPARC (big-endian)
c. The message after being inverted
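The three cases (a), (b), (c) can be reproduced with Python's `struct` module. The message contents used here, the integer 5 followed by the 4-character string "JILL", are illustrative values chosen for the sketch, and `invert_words` is a hypothetical helper.

```python
import struct

# (a) Message as laid out by a little-endian sender: a 32-bit integer, then 4 characters.
message = struct.pack("<I", 5) + b"JILL"

# (b) A big-endian receiver that reads the same bytes sees a different integer.
misread = struct.unpack(">I", message[:4])[0]      # 5 * 2**24


def invert_words(msg: bytes) -> bytes:
    """(c) Reverse the bytes of every 4-byte word."""
    return b"".join(msg[i:i + 4][::-1] for i in range(0, len(msg), 4))


inverted = invert_words(message)
fixed_int = struct.unpack(">I", inverted[:4])[0]   # the integer is now correct
garbled_text = inverted[4:]                        # but the string has been reversed
```

Blind inversion repairs the integer and breaks the string, which is why the stubs need to know the type of each field instead of swapping every word.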
31
Passing Value Parameters (III)
- How to pass pointers? They are meaningful only within a specific address space!
- Arrays (of known length) & structures: copy/restore semantics (between stubs)
  - IN/OUT/INOUT markers
  - Optimization: may eliminate one copy operation
- Pointer to an arbitrary data structure? No general solution
  - Work-around: pass back the pointer to its "source"
32
External Data Representation (I)
- Data structures: "flattened" on transmission, rebuilt upon reception
- Primitive data types:
  - byte order (big-endian: MSB comes first)
  - ASCII vs UNICODE (2 bytes per character)
- Marshalling/unmarshalling to/from an agreed external format
33
External Data Representation (II)
- XDR (RFC 1832), CDR (CORBA), Java: data -> byte stream
  - object references
- HTTP/MIME: data -> ASCII text
[Figure: fields of an object reference: IP address | port | time | object ID | interface ID]
34
CORBA CDR example
The flattened form represents a Person struct with value: {'Smith', 'London', 1934}

Index in sequence of bytes   4 bytes   Notes on representation
0–3                          5         length of string 'Smith'
4–7                          "Smit"
8–11                         "h___"
12–15                        6         length of string 'London'
16–19                        "Lond"
20–23                        "on__"
24–27                        1934      unsigned long
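The 28-byte layout can be reproduced with a short marshalling sketch. It follows the slide's simplified layout under stated assumptions: big-endian byte order, zero bytes as padding, and a string length that counts only the characters (the full CDR rules differ in details such as string termination and sender-chosen byte order). The function names are hypothetical.

```python
import struct


def cdr_string(text: str) -> bytes:
    """4-byte length, then the characters, padded to a multiple of 4 bytes."""
    raw = text.encode("ascii")
    padding = (-len(raw)) % 4
    return struct.pack(">I", len(raw)) + raw + b"\x00" * padding


def flatten_person(name: str, place: str, year: int) -> bytes:
    """Flatten a Person struct {name, place, year} into a byte sequence."""
    return cdr_string(name) + cdr_string(place) + struct.pack(">I", year)


flattened = flatten_person("Smith", "London", 1934)
```

Each string occupies a length word plus two 4-byte words of characters and padding, and the year takes the final word, giving seven 4-byte words in total.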
35
Properties of TCP
- Connected vs connectionless protocols
- TCP is a stream protocol
- Performance of TCP
- Avoid re-inventing TCP!
- TCP failure modes
- Behaviour of peers
- LAN vs WAN testing
- Tools & resources
36
Basic socket calls
[Figure: sequence of socket calls on each side]
SERVER: socket -> bind (local host sockaddr_in) -> listen -> accept (peer sockaddr_in) -> recv / send
CLIENT: socket -> connect (peer sockaddr_in) -> recv / send
37
Performance of TCP (I)
- 4.4BSD implementation: UDP ~800 LOC, TCP ~4,500 LOC
- CPU processing: checksums, data copying
- TCP ACKs:
  - Receiver can piggyback the ACK
  - Usually every second segment is ACKed
  - May even delay ACKs (up to 0.5 sec)
- Connection setup: 3 segments, 1½ RTT: SYN, SYN+ACK, ACK
- Connection tear-down: 4 segments: FIN, ACK, FIN (server-to-client), ACK
- Except the last segment, these can be combined with data-bearing segments
38
Performance of TCP (II)
- Results from a benchmark involving transmission of 5,000 data blocks
  - UDP datagram size = TCP write size = 1,440 bytes
  - Ethernet frame: 1,500 bytes
  - IP header: 20 bytes, TCP header: 20 bytes, TCP options: 12 bytes
  - Average over 50 runs
- Client produces data blocks, transmits them, and then exits
- Server may run on:
  - localhost (127.0.0.1)
  - same host as the client, but given as an address
  - other host
39
Performance of TCP (III)

Server      TCP time   TCP MB/sec   UDP time   UDP MB/sec   UDP drops
Client      2.88       2.5          1.96       3.67         336
Localhost   0.95       7.53         1.97       3.64         272
Remote      7.18       1.002        5.82       1.23         440

- Localhost (loop-back): MTU=16,384
- Client (network I/f): MTU=1,500
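The MB/sec figures can be cross-checked against the benchmark parameters (5,000 blocks of 1,440 bytes). This is a back-of-the-envelope sketch that assumes times are in seconds and 1 MB = 10^6 bytes; the function name is hypothetical.

```python
BLOCKS = 5_000          # data blocks per run, from the benchmark description
BLOCK_SIZE = 1_440      # bytes per block


def throughput_mb_per_sec(seconds: float, blocks: int = BLOCKS,
                          block_size: int = BLOCK_SIZE) -> float:
    """Bytes transferred divided by elapsed time, in units of 10**6 bytes per second."""
    return blocks * block_size / seconds / 1_000_000
```

For example, a 2.88-second TCP run moves 7.2 million bytes at about 2.5 MB/sec, and a 1.96-second UDP run comes out at about 3.67 MB/sec, ignoring any datagrams that were dropped.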
40
Performance of TCP (IV)
Results for write size = 300 bytes

Server   TCP time   TCP MB/sec   UDP time   UDP MB/sec   UDP drops
Client   1.05       1.41         1.63       0.91         212
Remote   1.55       0.965        1.91       0.78         306
41
Avoid re-inventing TCP!
- Retransmissions?
  - RTO must be adjustable
  - Exponential back-off
- Flow control: sliding window
- Congestion control
- Matching replies to requests? Sequence # for each request
- Efficiency of the implementation? TCP code base is highly optimized … and runs in kernel space
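Exponential back-off of the retransmission timeout can be sketched in a few lines. The function name, the doubling factor, and the cap are assumptions chosen for illustration; a real protocol would also adapt the initial RTO to measured round-trip times.

```python
def backoff_schedule(initial_rto: float, max_rto: float, max_retries: int) -> list:
    """Timeouts to wait before each retransmission: double each time, up to max_rto."""
    timeouts = []
    rto = initial_rto
    for _ in range(max_retries):
        timeouts.append(rto)
        rto = min(rto * 2, max_rto)
    return timeouts
```

With an initial RTO of 1 second and a 64-second cap, eight retries wait 1, 2, 4, 8, 16, 32, 64 and 64 seconds, so a sender built on UDP would need more than three minutes before giving up.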
42
LAN vs WAN testing
- Performance on the WAN may not be satisfactory, due to the extra latency … may have to reconsider the design
- Incorrect code is more likely to be triggered on the WAN … assumptions on volume/rate of arriving data
43
HTTP
- Request: GET //www.dcs.qmw.ac.uk/index.html HTTP/1.1
  - fields: method | URL or pathname | HTTP version | headers | message body
- Reply: HTTP/1.1 200 OK resource data
  - fields: HTTP version | status code | reason | headers | message body
- Resource := MIME-encoded data
- Content negotiation
- Authentication
- Methods: GET, HEAD, POST, PUT, DELETE, TRACE, OPTIONS
44
Tools (I)
- ping: IP header + ICMP echo request/reply
- tcpdump: network analyzer ("sniffer")
- traceroute: determine the network path by forcing each intermediate router to send an ICMP error message to the originator
  - Send a UDP datagram with TTL=1, so that the 1st router in the path will discard it
  - Send a 2nd UDP datagram with TTL=2, so that the 2nd router in the path will discard it
  - …
  - At the last hop, TTL=1 & an attempt is made to deliver the datagram (generating the ICMP error message "port unreachable")
45
Tools (II)
- ttcp: benchmarking tool, with many parameters
  - UDP or TCP transfers, buffers, size of reads/writes
- lsof: determine which process has a "file descriptor" open (file or socket)
  - lsof -i TCP:6000
  - lsof -i @remotehost.xdomain.net
- netstat
  - Active sockets: netstat -af inet
  - Interfaces: netstat -i
  - Routing table: netstat -rn
  - Protocol statistics: netstat -sp tcp
- System call tracers: strace, truss, ktrace
46
Resources
- Books:
  - Richard Stevens: TCP/IP Illustrated series
    - Protocols; Implementation; T/TCP, HTTP, NNTP, Domain Sockets
  - Richard Stevens: UNIX Network Programming series
    - Networking APIs: Sockets, XTI
    - Interprocess Communication
  - J.C. Snader: "Effective TCP/IP Programming"
- RFCs: http://www.rfc-editor.org