Accessing Remote Sites
CS 188 Distributed Systems
January 13, 2015

Deutsch’s “Seven Fallacies of Distributed Computing”
1. The network is reliable
2. There is no latency (instant response time)
3. The available bandwidth is infinite
4. The network is secure
5. The topology of the network does not change
6. There is one administrator for the whole network
7. The cost of transporting additional data is zero
Bottom Line: true transparency is not achievable

Introduction
- Distributed systems require nodes to talk to each other
- How do they do that, mechanically?
- To achieve various effects?
  - Synchronization of computations
  - Data passing
  - File systems

Messages
- The ultimate answer to all of these problems
- At the bottom, it’s all done with messages
- But messages are very general
- So even given you’re using messages, there are many options

Key Characteristics of Messages
- They are explicitly sent
  - Someone asks to send them
- They have well defined contents
  - Under control of the sender
- They are all-or-nothing
  - Entire message is delivered or nothing is
- Receipt is usually optional
  - Receiver must ask to get the message
- Always unidirectional, usually unicast

Synchronizing Computation With Messages

[Figure: Machine A runs a nested loop; when the loop finishes, A sends a “Done!” message to Machine B]

    for (i = 0; i < 10; i++) {
        for (j = 0; j < 10; j++) {
            /* do stuff */
            . . .
        }
    }
    /* Tell machine B that loop is done */

Conceptually Simple
- Messages can’t be received before they’re sent
- So receiving process can’t start computation till after the sender sends the message
- Sender doesn’t send until he’s ready for receiver to go ahead
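
To make the idea concrete, here is a minimal Python sketch of this pattern, with the two machines simulated as threads on one host over a TCP socket; the port number and message contents are arbitrary choices, not part of the lecture. A finishes its loop and only then sends; B blocks until the message arrives and only then starts its dependent computation.

    import socket
    import threading

    PORT = 50007  # arbitrary port chosen for this sketch

    def machine_a():
        for i in range(10):
            for j in range(10):
                pass  # do stuff
        # Loop is done: now, and only now, send the message.
        with socket.create_connection(("localhost", PORT)) as sock:
            sock.sendall(b"done")

    def machine_b(server):
        conn, _ = server.accept()   # block until A connects
        with conn:
            msg = conn.recv(4)      # block until the message arrives
            assert msg == b"done"
            # Only now is it safe to start the dependent computation.

    server = socket.create_server(("localhost", PORT))
    worker = threading.Thread(target=machine_b, args=(server,))
    worker.start()
    machine_a()
    worker.join()
    server.close()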

Tricky Issues
- Delivery isn’t reliable
  - What if the message doesn’t get there?
- Delivery speed is not predictable
  - So what? But . . .
- Assumes receiver follows the rules
  - Doesn’t start till he gets the message
  - Does receive the message some reasonable time after its delivery
- Security issues

What If the Sender Needs Feedback?

[Figure: Machine A finishes its loop and sends “Done!” to Machine B; B replies “I started”; once B starts, A does its next task]

- This will require a second message
- From B to A, this time

So What?
- There is now a round trip delay
  - A to B
  - Then B to A
- Two opportunities for messages to get lost
- Two chances for the participating processes to screw up
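
A sketch of that round trip, under the same local-socket assumptions as the earlier sketch: A now blocks on B’s “I started” reply before moving on, paying the full A-to-B-to-A delay.

    import socket
    import threading

    PORT = 50008  # arbitrary

    def machine_b(server):
        conn, _ = server.accept()
        with conn:
            assert conn.recv(4) == b"done"
            conn.sendall(b"I started")   # the second message, B to A
            # ... B's own computation would run here ...

    server = socket.create_server(("localhost", PORT))
    threading.Thread(target=machine_b, args=(server,), daemon=True).start()

    with socket.create_connection(("localhost", PORT)) as sock:
        sock.sendall(b"done")        # first message, A to B
        ack = sock.recv(16)          # A blocks for one full round trip
        assert ack == b"I started"
        # Only now does A start its next task.
    server.close()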

So Why Wait?

[Figure: the same exchange as before, but Machine A sends “Done!” and goes straight on to its next task, without waiting for a reply from B]

Results of Not Waiting
- Less delay
  - No more two-message round trip delays
- No certainty about event ordering
  - Did A start its new code before or after B got the message?
  - What if messages are lost, B fails, etc.?

Lessons From Operating System Synchronization
- Ensuring proper ordering of events is critical to predictable behavior
- Thinking about proper synchronization is hard
- Getting proper synchronization usually requires blocking
  - With bad performance implications
  - And possible deadlocks

Distributed System Implications
- Total correctness will be hard and expensive to achieve
- Getting acceptable results without total correctness is desirable
- For complex scenarios, likely to be many possible outcomes
  - Some likely to be unpredictable

Data Passing With Messages
- Many distributed computations require local processes to share data
  - Often with remote processes
- Requires moving data to the other machine
- Again, all you’ve got is messages

Moving Data

[Figure: a data item (Alpha, Beta, Gamma, . . ., Omega) copied from Machine A to Machine B]

Seems simple enough. But . . .

Some Complications
- What if the data won’t fit in one message?
  - We must use multiple messages (a chunking sketch follows below)
  - Leading to issues of ordering, losses, knowing when you’re done, etc.
- What if the receiver doesn’t have room to store all of it?
- Are we keeping a copy at the sender, as well?
  - If so, are both copies writeable?
  - If so, what happens if someone writes?
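
For the multiple-message case, here is a minimal sketch of one common approach (not something the slides prescribe): tag each chunk with a sequence number and a total count, so the receiver can reassemble in order and knows when it is done. The chunk size and names are illustrative.

    import struct

    CHUNK_SIZE = 1024   # assumed per-message payload limit

    def to_messages(data):
        chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
        total = len(chunks)
        # Header on each message: 4-byte sequence number, 4-byte total count.
        return [struct.pack(">II", seq, total) + c for seq, c in enumerate(chunks)]

    def from_messages(messages):
        parts = {}
        for m in messages:
            seq, total = struct.unpack(">II", m[:8])
            parts[seq] = m[8:]
        if len(parts) != total:
            raise ValueError("missing chunks")   # a loss: see the options later
        return b"".join(parts[i] for i in range(total))

    assert from_messages(to_messages(b"x" * 5000)) == b"x" * 5000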

File Systems With Messages
- Like data passing, in most ways
- But the inherent persistence of the data raises issues
- Particularly for data that can be written
  - Either on the original site or the accessing site

Accessing Files With Messages

[Figure: Machine B sends “Open file X for read” to Machine A, which holds file X; someone on B now has file X open]

Now what?

Reading

[Figure: with file X open, Machine B sends “Read file X” to Machine A]

Note a few things:
- Either machine can fail at any time
- The data is still on machine A
- But there’s a copy on machine B
  - Maybe more than one

Writing

[Figure: having read file X, Machine B now sends “Write file X” to Machine A]

What about machine A’s original copy?

Some Obvious Complexities
- What if the write message is lost?
- What if A reads the file before the write arrives?
  - But after B does the write?
  - Is that good, bad, or ambiguous?
- What if A writes the file after B reads it, but before the write message arrives?
- There are many other complexities

Some Security Issues
- A message arrives purporting to come from machine X
- Perhaps you would do the requested action for machine X, but . . .
  - Is it really machine X asking?
  - If machine X is asking for something, is it really what the message says?
  - If you send a response, how can you be sure only machine X gets it?

Basic Solution
- Authenticate the message
  - Obtain evidence that the message came from its purported sender
  - And that it wasn’t altered in transit
- Don’t take actions without suitably strong authentication

Message Authentication
- In rare cases, the network is closed
  - E.g., it’s a direct wire to machine X
  - Nobody else can inject the message, so it must come from X
- More commonly, use cryptographic methods
  - Which we won’t cover in detail here
  - But typically require secure key distribution
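
As one concrete instance of those cryptographic methods, here is a minimal sketch using an HMAC from Python’s standard library. It assumes the sender and receiver already share a secret key, which is exactly the secure key distribution problem noted above.

    import hashlib
    import hmac

    shared_key = b"distributed securely in advance, somehow"   # assumed pre-shared

    def tag(message):
        # Sender attaches this authenticator to the message.
        return hmac.new(shared_key, message, hashlib.sha256).digest()

    def verify(message, received_tag):
        # Receiver recomputes the tag; constant-time comparison.
        return hmac.compare_digest(tag(message), received_tag)

    msg = b"machine X asks: read file X"
    t = tag(msg)
    assert verify(msg, t)                    # authentic and unaltered
    assert not verify(b"write file X", t)    # altered in transit: rejected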

One Unhandled Issue
- “If you send a response, how can you be sure only machine X gets it?”
- Authentication doesn’t help here
- If you’re sending out secret information, it isn’t enough that X asked
  - Only X should see the answer
- Usually handled by encrypting the message
  - With same key issues as above

One Other Security Issue
- Can an attacker prevent a given message from being received?
- A problem of denial of service
  - Often achieved by causing congestion
- Cryptography doesn’t help here

Basic Data Transport
- Machine A asks machine B for some data
- In general, more than can be held in one message
- How do we handle that?

A Simple Approach

[Figure: Machine A sends “Send me X” to Machine B, which returns data item X as a series of messages, X (Part 1) through X (Part 4)]

Assuming everything goes well . . . In distributed systems, it rarely does.

One Possible Problem

[Figure: the same transfer, but one of the four parts of X never arrives at Machine A]

- How do we know something bad happened?
- What remedial action can we take?

Options For Lost Messages
- Detect loss and cancel operation
- Detect loss and request retransmission (sketched below)
- Go ahead without the lost data
- Ensure redundancy in data sent and regenerate lost data on that basis
- Proper choice depends on system specifics
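
Here is a minimal sketch of the second option, detect loss and request retransmission, as stop-and-wait over UDP with a timeout. The “receiver” thread deliberately swallows the first datagram to simulate a loss; the port, timeout, and retry bound are arbitrary choices.

    import socket
    import threading

    PORT = 50009  # arbitrary

    rsock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rsock.bind(("localhost", PORT))

    def flaky_receiver():
        rsock.recvfrom(1024)               # simulate loss: swallow the first copy
        data, addr = rsock.recvfrom(1024)  # the retransmission arrives
        rsock.sendto(b"ack", addr)         # acknowledge it

    threading.Thread(target=flaky_receiver, daemon=True).start()

    ssock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    ssock.settimeout(0.5)                  # how long to wait before assuming loss
    for attempt in range(5):               # bounded retries
        ssock.sendto(b"X, part 3", ("localhost", PORT))
        try:
            ack, _ = ssock.recvfrom(1024)
            print(f"acknowledged after {attempt + 1} send(s)")
            break
        except socket.timeout:
            continue                       # no ack: assume lost and retransmit
    else:
        print("giving up: cancel the operation")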

Another Possible Problem

[Figure: the four parts of X arrive at Machine A out of order]

How to handle out-of-order delivery?

Options for Misordered Delivery
- Wait (a reorder-buffer sketch follows below)
  - And, possibly, wait, and wait, and wait
  - How long do you wait?
- Assume misordered message was dropped
  - And take an action appropriate for drops
  - If it comes in eventually, ignore it
- On detection, ask the sender to retransmit
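
A minimal sketch of the “wait” option: hold early arrivals in a reorder buffer and release data only in sequence order. A real system would also bound the wait and fall back to one of the drop or retransmit policies above.

    def deliver_in_order(arrivals):
        """arrivals: iterable of (sequence_number, payload) in arrival order."""
        pending = {}      # early arrivals, keyed by sequence number
        expected = 0
        for seq, payload in arrivals:
            pending[seq] = payload
            while expected in pending:    # release any now-contiguous run
                yield pending.pop(expected)
                expected += 1

    out = list(deliver_in_order([(1, "beta"), (0, "alpha"), (3, "delta"), (2, "gamma")]))
    assert out == ["alpha", "beta", "gamma", "delta"]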

Another Potential Problem
- What if X is big?
  - Really big
  - Perhaps machine A doesn’t know how big
- What’s really going on at machine A?

Handling the Incoming Data
- Data must be stored somewhere
  - In RAM, at first
  - Perhaps on disk/flash later
- As each message holding part of X arrives, its content must be stored
- Machine A must allocate buffer space for that data
  - How much, for how long?
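
One way to get the “in RAM at first, perhaps on disk later” behavior is Python’s standard SpooledTemporaryFile, which keeps data in memory until a size threshold and then transparently rolls over to disk. The threshold and the simulated chunk stream below are illustrative, not values from the lecture.

    import tempfile

    RAM_LIMIT = 1 << 20   # spill to disk beyond 1 MB (assumed budget)

    buf = tempfile.SpooledTemporaryFile(max_size=RAM_LIMIT)
    for chunk in (b"a" * 65536 for _ in range(32)):   # 2 MB arriving in parts
        buf.write(chunk)                              # store each part of X

    buf.seek(0)
    data = buf.read()
    assert len(data) == 2 * (1 << 20)   # all of X is there, wherever it lives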

Illustrating the Problem

[Figure: after “Send me X”, parts of X (part 7, part 5, part 8, part 9, part 6, part 3, part 4, part 1, part 2) pour into Machine A’s buffer for X, in no particular order]

Now what?

Another Wrinkle
- The problem is actually worse at the system level
- Every incoming message must be put in a buffer
  - In the OS or device driver or elsewhere
- Stays in the buffer until the application process “receives” it
- What if the messages arrive too fast and fill all buffers?

What Are Your Options?
- When you have used up either the application or system buffers
- The same as in the network
  - Store until you run out of storage
  - Send it back to where it came from
  - Drop something
- You can make things better by asking for help, though

Asking For Help
- IP doesn’t ask for help
  - It must deal with each packet on its own
- At a higher level, though, help is possible
- Flow control
  - Ask the sender to slow down

Where Do We Do Flow Control?
- Almost always end-to-end
- But “end” has a flexible definition
- Could be “end machine”
  - TCP flow control
- Could be “end application”
  - Application level flow control

A Brief Diversion
- Why not flow control in IP?
- Based, in part, on the end-to-end argument
  - An argument in network and system design
- Put application functionality into the application end points, not the network
  - E.g., if you need flow control for your application, put it in the application
- Important caveat: IF the endpoints can actually do it

Returning to the Core Problem
- We are receiving data
- We need to store (at least temporarily) the data we receive
- How do we make sure we can do so?
  - Ideally without wasting resources

Possible Answers
- Figure out how much data will be sent before you set aside space
  - So you definitely have enough space
- “Use” some of the data as you go along
  - Allowing you to reuse buffer space
- Allocate more space as you need it
- Tell sender to stop when you can’t hold more data (sketched below)
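
A minimal sketch of the last option as credit-based flow control: the receiver grants the sender credits as it frees buffer slots, and the sender blocks (stops) whenever it holds no credit. A queue stands in for the network here; the slot count and names are illustrative.

    import queue
    import threading

    BUFFER_SLOTS = 4
    network = queue.Queue()
    credits = threading.Semaphore(BUFFER_SLOTS)   # initial window: 4 messages

    def sender(parts):
        for p in parts:
            credits.acquire()       # no credit left: sender blocks (stops)
            network.put(p)

    def receiver(n):
        for _ in range(n):
            part = network.get()    # take a message out of the buffer...
            # ...consume it, freeing a slot...
            credits.release()       # ...and grant the sender one more credit

    t = threading.Thread(target=receiver, args=(100,))
    t.start()
    sender([f"X, part {i}" for i in range(100)])
    t.join()
    print("all 100 parts delivered without overflowing a 4-slot buffer")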

Making Life Even Worse
- The problems we’ve seen occur with only two participating nodes
- One is a wonderful magic number in distributed systems
  - Everything easier when there’s only one of whatever we’re considering
- Two is a fairly wonderful magic number
  - It’s either here or there, which simplifies life
- Three and above aren’t wonderful or magic
  - They make life hard

One Example

[Figure: three machines now; Machine C sends “Read file X” to Machine A, which holds file X, so C ends up with a copy; Machine B is also in the picture]

So Far, So Good, But . . .

[Figure: Machine B sends “Write file X” to Machine A]

What about Machine C’s copy?

Some Choices
- Send a message to C invalidating its copy (sketched below)
  - Either from A or B
- Send a message to C updating its copy
- Don’t allow the situation at all
  - E.g., don’t allow C to read X
- Don’t worry about it
  - It’s C’s problem, if C cares
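
A minimal sketch of the first choice, invalidation, with plain objects standing in for machines and method calls standing in for messages (all names are illustrative): the holder of the original tracks who has copies and invalidates them before applying a write.

    class FileServer:
        def __init__(self):
            self.files = {"X": b"original contents"}
            self.cachers = {"X": set()}       # who holds a copy of each file

        def read(self, client, name):
            self.cachers[name].add(client)    # remember that C has a copy now
            client.cache[name] = self.files[name]

        def write(self, name, data):
            for client in self.cachers[name]: # one message per cached copy
                client.invalidate(name)
            self.cachers[name].clear()
            self.files[name] = data

    class Client:
        def __init__(self):
            self.cache = {}

        def invalidate(self, name):
            self.cache.pop(name, None)        # drop the stale copy

    server, c = FileServer(), Client()
    server.read(c, "X")
    assert "X" in c.cache
    server.write("X", b"new contents")        # B writes; C gets invalidated
    assert "X" not in c.cache                 # C must re-read to see X again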

Implications of This Choice
- Making sure everyone knows what’s happening requires more messages
  - Which could be delayed, lost, etc.
  - Generally more processing and delay
- Not worrying about the state of other nodes requires fewer messages
  - But those nodes may act on old data
  - Could get weird behaviors
- Preventing the problem from occurring reduces the system’s utility and power
  - But at low cost and with fewer surprises

Which Choice Is Best?
- It all depends
  - What is your model of system behavior?
  - How important is consistency, and what are the effects of inconsistency?
- Remember to consider failure models
  - What if some message is lost/delayed?
- Different distributed systems deal with the issue in different ways

Another Complexity
- “When sorrows come, they come not single spies, but in battalions”
  - William Shakespeare, Hamlet, Act 4, Scene 5
- Distributed systems suffer that problem, too
- You can’t be sure there’s only one fault
  - Multiple independent or related faults are possible
  - And will happen, sooner or later
- Will your solution to one be sabotaged by another?

Tying It Back To Distributed Systems
- Even the relatively simple issue of moving data around proves complex
- Highly desirable to understand how your system behaves
- Transparent solutions are nice
  - But often too expensive
- An important design choice