Published by Sheldon Borron, modified over 9 years ago
1
BASIC BUILDING BLOCKS - Harit Desai
2
Byzantine Generals Problem
If a computer fails, either
– it behaves in a well-defined manner: a component always shows a zero at its output, or simply stops executing; or
– it behaves arbitrarily: it sends totally different information to the different components with which it communicates.
The problem of reaching agreement in a system where components can fail in an arbitrary manner is called the Byzantine generals problem.
3
Interactive Consistency Problem
Each node makes its decision based on the values it receives.
We require all non-faulty nodes to make the same decision.
So the goal is that all non-faulty nodes get the same set of values; consensus can then be achieved.
But a faulty node may send different values to different nodes.
4
[Figure: two scenarios, each showing a Transmitter and nodes i and j exchanging the values 1 and 0]
5
Protocols with ordinary messages
Requirement: n >= 3m+1, where n = total number of nodes and m = number of faulty nodes.
Assumptions about the message-passing system:
– Every message sent by a node is delivered correctly by the message-passing system to the receiver.
6
Assumptions (continued)
– The receiver of a message knows which node sent it.
– The absence of a message can be detected.
7
Interactive consistency algorithm
Algorithm ICA(0)
1) The transmitter sends its value to all the other n-1 nodes.
2) Each node uses the value it receives from the transmitter, or the default value if it receives no value.
8
Algorithm ICA(m), m > 0
1) The transmitter sends its value to all the other n-1 nodes.
2) Let Vi be the value node i receives from the transmitter, or the default value if it receives no value. Node i acts as the transmitter in algorithm ICA(m-1) to send the value Vi to each of the other n-2 nodes.
3) For each node i, let Vj be the value node i receives from node j (j != i) in step 2, or the default value if it receives no value. Node i uses the value majority(V1, V2, ..., Vn-1).
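The recursion in ICA(m) is easier to follow when it is executed. Below is a minimal sketch that simulates it in a single process. The `send` hook, the default value `0`, and the rule that a tied majority falls back to the default are assumptions made for illustration; the slides do not specify them.

```python
from collections import Counter

DEFAULT = 0  # hypothetical default value (the slides do not fix one)


def majority(values):
    """Most common value; a tie falls back to DEFAULT (an assumption)."""
    ranked = Counter(values).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return DEFAULT
    return ranked[0][0]


def ica(m, transmitter, receivers, value, send):
    """Run ICA(m) and return {receiver: value that receiver uses}.

    send(sender, receiver, value) is a hypothetical hook returning what a
    sender actually delivers; a non-faulty sender returns value unchanged.
    """
    n = len(receivers) + 1
    assert n >= 3 * m + 1, "ICA(m) requires n >= 3m + 1"
    # Step 1: the transmitter sends its value to the other n-1 nodes.
    received = {i: send(transmitter, i, value) for i in receivers}
    if m == 0:
        return received  # ICA(0): use the value received from the transmitter
    # Step 2: each node i acts as transmitter in ICA(m-1) for its value Vi.
    relayed = {
        i: ica(m - 1, i, [j for j in receivers if j != i], received[i], send)
        for i in receivers
    }
    # Step 3: node i takes the majority of its own Vi and the Vj it learned
    # about every other node j through the ICA(m-1) sub-instances.
    return {
        i: majority([received[i]] + [relayed[j][i] for j in receivers if j != i])
        for i in receivers
    }


def liar(sender, receiver, value):
    """Node 3 is faulty and flips every bit it sends; everyone else is honest."""
    return 1 - value if sender == 3 else value


print(ica(1, 0, [1, 2, 3], 1, liar))  # non-faulty nodes 1 and 2 both use 1
```

With n = 4 and m = 1, the two correct receivers agree on the transmitter's value even though node 3 lies. The number of messages grows roughly as n^m, which is why ICA(m) is expensive for large m.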
9
Protocol with signed messages
Algorithm SM(m)
Initially Vi = {} (the empty set).
1) The transmitter signs its value and sends it to all other nodes.
2) For each i:
(a) If node i receives a message of the form v:0 from the transmitter, then
(a1) it sets Vi to {v}, and
(a2) it sends the message v:0:i to every other node.
10
(continued)
(b) If node i receives a message of the form v:0:j1:j2:...:jk and v is not in Vi, then
(b1) it adds v to Vi, and
(b2) if k < m, it sends the message v:0:j1:j2:...:jk:i to every node other than j1, j2, ..., jk.
3) For each i: when node i will receive no more messages, it takes choice(Vi) as its final value.
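A minimal simulation of SM(m), assuming signatures are unforgeable and modelling a signature chain as the tuple of signer ids (0, j1, ..., jk). The definition of `choice` (the single element if Vi has exactly one value, else a default of 0) is an assumption, since the slides leave it open. Only the transmitter may be faulty in this sketch.

```python
from collections import deque

DEFAULT = 0  # hypothetical value returned by choice() when Vi is not a singleton


def choice(values):
    """Assumed choice(): the single element if |Vi| == 1, else DEFAULT."""
    return next(iter(values)) if len(values) == 1 else DEFAULT


def sm(m, n, sent_by_transmitter):
    """Simulate SM(m) with transmitter 0 and nodes 1..n-1; return {i: final value}.

    sent_by_transmitter[i] is the signed value the (possibly faulty)
    transmitter sends to node i. The other nodes are assumed correct.
    """
    V = {i: set() for i in range(1, n)}
    # Step 1: the transmitter signs its value and sends it to every node.
    inbox = deque((i, v, (0,)) for i, v in sent_by_transmitter.items())
    while inbox:  # step 3 is reached once no more messages will arrive
        i, v, signers = inbox.popleft()
        if v in V[i]:
            continue  # already recorded: nothing to add or relay
        V[i].add(v)  # (a1) / (b1)
        k = len(signers) - 1  # number of relaying signatures j1..jk
        if k == 0 or k < m:  # (a2) always relays; (b2) relays only if k < m
            chain = signers + (i,)
            for j in range(1, n):
                if j not in chain:
                    inbox.append((j, v, chain))
    return {i: choice(V[i]) for i in V}


# A two-faced transmitter sends 1 to node 1 and 0 to node 2.
print(sm(1, 3, {1: 1, 2: 0}))  # {1: 0, 2: 0}: both see {0, 1} and agree
```

Because a relayed value carries the transmitter's signature, both nodes discover that the transmitter sent two different values, so three nodes suffice for m = 1 here, the case that ordinary messages cannot solve.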
11
Clock synchronization
Problems:
– Clocks of different nodes have different times and may run at different speeds.
– Communication introduces a delay between sending and receiving a message.
– Network delays can vary.
– Clocks may be faulty (dual-faced).
12
Requirements of clock synchronization
– For a nonfaulty clock Ci, |dCi/dt - 1| < $, where the drift bound $ is of the order of 10^-5.
– At any time, the values of all nonfaulty processors' clocks must be approximately equal: |Ci(t) - Cj(t)| <= b, where b is a constant.
– There is a small bound on how much a nonfaulty clock is changed during resynchronization.
13
Synchronization protocols
Deterministic protocols
– guarantee the clock synchronization conditions and bounds,
– but require some assumptions about message delays.
Probabilistic protocols
– do not require any assumptions about message delays,
– but guarantee precision only with some probability.
14
Deterministic Clock Synchronization
– All clocks are initially synchronized to approximately the same value: |Ci(t0) - Cj(t0)| < b.
– Each process can communicate directly with every other process.
– If a process sends a message at real time t and it is received at real time t', the message delay is t' - t.
– Every message is delivered within [& - e, & + e] time, for fixed & and e with & > e.
15
The algorithm works in rounds.
The ith round is triggered when the clock reaches Ti.
When process j's clock reaches Ti, it broadcasts a message containing Ti.
It also collects ith-round messages for a bounded amount of time and records their arrival times according to its local clock.
The waiting period is long enough that every correct process's message arrives within it.
16
Bounded waiting time
– Process j's clock reaches Ti.
– Process k's clock will reach Ti within time b, and at that moment k broadcasts Ti to all processes.
– The message delay is at most & + e.
– So j receives k's message at most (b + & + e) after its own clock reaches Ti.
– Clock rates may differ from real time by $, so the bound, on j's clock, within which j should receive k's message containing Ti is (1 + $)(b + & + e).
– Once this time has elapsed, the process must have received messages from all non-faulty processes.
17
A process then applies an averaging function to the set of arrival times and uses the result to switch its logical clock to a new value. It then waits until its clock reaches Ti + t before executing the next round.
Averaging function: since there are at most f faulty processes, it discards the f largest and the f smallest values, then takes the midpoint of the remaining values.
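The averaging function can be sketched in a few lines. The names `reduce_values` and `mid` are illustrative (the slides call the first one reduce; the name here avoids clashing with Python's `functools.reduce`), and the arrival times below are hypothetical.

```python
def reduce_values(values, f):
    """reduce: drop the f smallest and the f largest values."""
    if len(values) <= 2 * f:
        raise ValueError("need more than 2f values")
    ordered = sorted(values)
    return ordered[f:len(ordered) - f]


def mid(values):
    """mid: midpoint of the range spanned by the values."""
    return (min(values) + max(values)) / 2


# Arrival times (local clock, ms) of one round's messages, at most f = 1 faulty.
arrivals = [100, 102, 101, 500, 99]
print(mid(reduce_values(arrivals, f=1)))  # 101.0: the outlier 500 is discarded
```

Discarding f values at each end guarantees that any value a faulty process reports is either thrown away or lies between two values reported by correct processes.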
18
Algorithm: notation
$ = bound on clock drift
b = bound on how far apart the clocks are initially
&, e = bounds on message delay (every delay lies in [& - e, & + e])
t = period between rounds
CORR = correction variable, initially empty
ARR = array containing the arrival times of the most recent messages (ARR[k] for process k)
NOW = the current logical clock time
reduce = function that takes an array and returns its middle values
mid = returns the midpoint of a set of values
19
Do forever
   /* u = (m, k): message m received from process k */
   /* in case messages are received before the process reaches Ti */
   while u = (m, k) do ARR[k] := NOW
   /* fall out of loop when u = START or TIMER and begin round */
   T := NOW
   broadcast(T)
   set-timer(T + (1 + $)(b + & + e))
   while u = (m, k) do ARR[k] := NOW
   /* fall out of loop when u = TIMER; end round */
   AV := mid(reduce(ARR))
   ADJ := T + & - AV
   CORR := CORR + ADJ
   set-timer(T + t)
End do
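The loop above can be sketched as a small state object that handles one round at a time. This is a simulation under stated assumptions: the class and method names are hypothetical, `broadcast` is not modelled, arrival times are passed in directly, and the Greek-letter-like symbols of the slides are mapped to named fields ($ to `rho`, b to `beta`, & to `delta`, e to `eps`, t to `period`).

```python
from dataclasses import dataclass, field


@dataclass
class ClockSyncProcess:
    """State of one process; field names map to the slides' symbols."""
    rho: float     # $: bound on clock drift
    beta: float    # b: bound on initial clock skew
    delta: float   # &: nominal message delay
    eps: float     # e: delay uncertainty (delay lies in [delta - eps, delta + eps])
    period: float  # t: period between rounds
    f: int         # assumed maximum number of faulty processes
    corr: float = 0.0                        # CORR
    arr: dict = field(default_factory=dict)  # ARR[k]: arrival time of k's message

    def on_message(self, k, now):
        """u = (m, k): record the arrival time (logical clock) of k's message."""
        self.arr[k] = now

    def start_round(self, T):
        """T := NOW; broadcast(T) is not modelled. Return the end-of-round timer."""
        return T + (1 + self.rho) * (self.beta + self.delta + self.eps)

    def end_round(self, T):
        """On TIMER: AV := mid(reduce(ARR)), apply ADJ, return next round start."""
        ordered = sorted(self.arr.values())
        if len(ordered) <= 2 * self.f:
            raise ValueError("need more than 2f arrival times")
        kept = ordered[self.f:len(ordered) - self.f]  # reduce
        av = (kept[0] + kept[-1]) / 2                  # mid
        adj = T + self.delta - av                      # ADJ := T + & - AV
        self.corr += adj                               # CORR := CORR + ADJ
        return T + self.period                         # set-timer(T + t)


p = ClockSyncProcess(rho=1e-5, beta=5, delta=10, eps=2, period=1000.0, f=1)
T = 1000.0
timer = p.start_round(T)  # round ends at about 1017.00017 on the logical clock
for k, now in {"a": 1005, "b": 1006, "c": 1008, "d": 950}.items():
    p.on_message(k, now)  # "d" stands for a faulty process sending far too early
next_T = p.end_round(T)
print(p.corr, next_T)  # 4.5 2000.0
```

The messages arrived, on average, earlier than T + & would predict, meaning the other clocks are slightly ahead; ADJ is therefore positive and this process advances its logical clock.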
20
Probabilistic clock synchronization
– Assumes that dual-faced clocks do not exist; clocks are assumed to be correct.
– Message delays are unbounded, but there is a minimum delay, min.
– To read the clock of process i, process j sends a message to i. When i receives this message, it replies with a message containing T, the current value of its clock.
– The round-trip delay for reading i's clock, as measured by j's clock, is 2D.
– Process j can then estimate the time at node i.
21
Let t be the real time at which j receives the reply from i, and let 2d be the real-time round-trip delay.
– At that moment i's clock must read at least T + min(1 - $), where $ is the bound on clock drift, because the reply took at least min.
– The return message can take at most 2d - min, since the request took at least min.
– In i's clock time this is at most (2d - min)(1 + $).
– Also, 2d <= 2D(1 + $).
– Hence the maximum delay in clock time is 2D(1 + $)(1 + $) - min(1 + $) ≈ 2D(1 + 2$) - min(1 + $), neglecting the $^2 term.
So j can infer that, when it receives the message from i, i's clock lies in the range
[T + min(1 - $), T + 2D(1 + 2$) - min(1 + $)].
22
Now j has to select a value in this interval as its estimate of i's clock. The error is minimized if the midpoint is selected; the maximum error is then half the width of the interval, E = D(1 + 2$) - min.
This maximum error E can be taken as the precision with which j reads i's clock: the shorter the round-trip delay, the better the precision.
If process j wants to read the clock of process i with a specified precision e, it must discard every reading attempt whose round-trip delay exceeds 2U, where U = (1 - 2$)(e + min). (Requiring E <= e gives D <= (e + min)/(1 + 2$) ≈ (1 - 2$)(e + min).)
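The bounds above translate directly into arithmetic. A minimal sketch, assuming the slides' symbols map to the parameters as noted (`rho` for $, `min_delay` for min, `precision` for e); the function names are hypothetical.

```python
def remote_clock_interval(T, D, rho, min_delay):
    """Bounds on i's clock when j receives the reply carrying T.

    2D is the round trip measured on j's clock, rho the drift bound ($ on the
    slides), min_delay the minimum one-way message delay (min).
    """
    low = T + min_delay * (1 - rho)
    high = T + 2 * D * (1 + 2 * rho) - min_delay * (1 + rho)
    return low, high


def estimate_remote_clock(T, D, rho, min_delay):
    """Midpoint estimate and its maximum error E = D(1 + 2rho) - min."""
    low, high = remote_clock_interval(T, D, rho, min_delay)
    return (low + high) / 2, D * (1 + 2 * rho) - min_delay


def accept_reading(round_trip, precision, rho, min_delay):
    """Keep a reading only if its round trip 2D is at most 2U, U = (1 - 2rho)(e + min)."""
    U = (1 - 2 * rho) * (precision + min_delay)
    return round_trip <= 2 * U


# Hypothetical numbers in ms, drift ignored (rho = 0) to keep the arithmetic exact.
print(remote_clock_interval(T=1000, D=5, rho=0, min_delay=2))  # (1002, 1008)
print(estimate_remote_clock(T=1000, D=5, rho=0, min_delay=2))  # (1005.0, 3)
print(accept_reading(10, precision=4, rho=0, min_delay=2))     # True
```

A round trip of 10 ms gives a maximum error of 3 ms, within the requested 4 ms; a 14 ms round trip would exceed 2U = 12 ms and the reading would be discarded and retried.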
23
Stable storage
A computer system can have stable storage, whose contents are preserved despite failures. Failures that these techniques cannot handle:
– Transient failures: these cause the disk to behave unpredictably for a short period of time.
– Bad sector: a page becomes corrupted, and the data stored in it cannot be read.
– Controller failure: the disk controller fails.
– Disk failure: the entire disk becomes unreadable.
24
Undesirable results of a read(a) operation:
– Soft read error: page a is good, but read returns bad. This does not persist for long and is caused by transient failures.
– Persistent read error: page a is good, but read returns bad, and successive reads also return bad (bad sector).
– Undetected error: page a is bad but read returns good, or page a is good but read returns different data.
Undesirable results of a write(a, d) operation:
– Null write: page a is unchanged.
– Bad write: page a becomes (bad, d).
25
Decay events
– Corruption: a page goes from (good, d) to (bad, d).
– Revival: a page goes from (bad, d) to (good, d).
– Undetected error: a page changes from (s, d) to (s, d') with d != d'.
26
Implementation
Using one disk:
– CarefulRead: the read is performed repeatedly until it returns the status good, or until the page still cannot be read after a certain number of tries.
– CarefulWrite: performs a write followed by a read, repeating until the read returns the status good.
– But this cannot take care of decay events.
Stable storage is therefore represented by an ordered pair of disk pages:
– StableRead: performs a CarefulRead from one of the paired pages and, if the result is bad, performs a CarefulRead from the other.
– StableWrite: performs a CarefulWrite to one of the representative pages first; when that operation completes, it performs a CarefulWrite to the other page.
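A minimal sketch of these four operations against a simulated page. The `Page` class, its `soft_errors` counter (used to inject transient read failures), and the retry limit `MAX_TRIES` are hypothetical stand-ins for a real disk driver. In this model a write always succeeds, so the CarefulWrite loop terminates; a real persistently bad sector would need separate handling.

```python
GOOD, BAD = "good", "bad"
MAX_TRIES = 5  # hypothetical retry limit for CarefulRead


class Page:
    """A simulated disk page holding (status, data)."""

    def __init__(self, data=None):
        self.status, self.data = GOOD, data
        self.soft_errors = 0  # pending transient read failures (simulation only)

    def read(self):
        if self.soft_errors > 0:  # soft read error: page is good, read says bad
            self.soft_errors -= 1
            return BAD, None
        return self.status, self.data

    def write(self, data):
        self.status, self.data = GOOD, data


def careful_read(page):
    """Retry until the status is good or MAX_TRIES attempts have failed."""
    for _ in range(MAX_TRIES):
        status, data = page.read()
        if status == GOOD:
            return GOOD, data
    return BAD, None


def careful_write(page, data):
    """Write, then read back; repeat until the read returns good."""
    while True:
        page.write(data)
        if careful_read(page)[0] == GOOD:
            return


def stable_read(pair):
    status, data = careful_read(pair[0])
    return (status, data) if status == GOOD else careful_read(pair[1])


def stable_write(pair, data):
    careful_write(pair[0], data)  # finish the first copy ...
    careful_write(pair[1], data)  # ... before touching the second


pair = (Page("old"), Page("old"))
stable_write(pair, "new")
pair[0].status = BAD      # decay event: corruption of the first copy
print(stable_read(pair))  # ('good', 'new')
```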
27
This takes care of decay events. However, a crash during StableWrite may leave the two pages different. To handle this, a cleanup operation is performed:
Do a CarefulRead from each of the two representative pages
If both return good and the same data then
   do nothing
Else if one returns bad then
   do a CarefulWrite of the data from the good page to the bad page
Else if both return good but different data then
   choose either page and do a CarefulWrite of its data to the other page
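The cleanup procedure as a sketch. To stay short, pages are plain dictionaries and `careful_read`/`careful_write` are simplified single-attempt stand-ins (an assumption; the real operations retry as described on the previous slide).

```python
GOOD, BAD = "good", "bad"


def careful_read(page):
    # Simplified stand-in: a real CarefulRead retries until the status is good.
    return page["status"], page["data"]


def careful_write(page, data):
    # Simplified stand-in: a real CarefulWrite rereads until the status is good.
    page["status"], page["data"] = GOOD, data


def cleanup(pair):
    """Make the two copies of a stable page agree after a crash."""
    (s0, d0), (s1, d1) = careful_read(pair[0]), careful_read(pair[1])
    if s0 == GOOD and s1 == GOOD and d0 == d1:
        return                       # consistent: do nothing
    if s0 == BAD and s1 == GOOD:
        careful_write(pair[0], d1)   # copy the good page over the bad one
    elif s0 == GOOD and s1 == BAD:
        careful_write(pair[1], d0)
    elif s0 == GOOD and s1 == GOOD:
        careful_write(pair[1], d0)   # differing data: this sketch picks page 0
    # both bad: not covered by the procedure above; left unchanged here


# Crash after StableWrite updated page 0 but before it updated page 1.
pair = [{"status": GOOD, "data": "new"}, {"status": GOOD, "data": "old"}]
cleanup(pair)
print(pair[1]["data"])  # new
```

Since StableWrite updates page 0 first, picking page 0 completes an interrupted write, whereas picking page 1 would roll it back; either choice leaves the pair consistent, as the procedure allows.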
28
Disk shadowing
Disk shadowing is a technique for maintaining a set of identical disk images on separate disk devices. Its primary purpose is to increase reliability and availability.
Consider the case of two disks (mirrored disks):
– Total failure occurs only if both disks fail, i.e. the second disk fails while the first is being repaired.
– MTTFm = (MTTF / 2) * (MTTF / MTTR) = MTTF^2 / (2 * MTTR)
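A worked example of the mirrored-disk formula. The disk figures are hypothetical, chosen only to show the scale of the improvement.

```python
def mirrored_mttf(mttf, mttr):
    """MTTFm = (MTTF / 2) * (MTTF / MTTR) for a pair of mirrored disks."""
    return (mttf / 2) * (mttf / mttr)


# Hypothetical disk: MTTF = 100,000 hours, MTTR = 10 hours.
print(mirrored_mttf(100_000, 10))  # 500000000.0 hours
```

The first factor is the mean time until either disk fails; the second, MTTF/MTTR, is roughly how many such failures occur before one is followed by a second failure during a repair window.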
29
Redundant Arrays of Disks
Data is spread over multiple disks using bit-interleaving, which provides high I/O performance.
But such arrays are not reliable, since the failure of any one disk can make the entire data unavailable.
So the disks are partitioned into groups:
– each group has some data disks and some check disks;
– the number of check disks depends on the coding technique used;
– for example, if the check disk stores parity, then only one check disk is required.
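The reason a single parity disk suffices can be shown with XOR. In this sketch each "disk" is a short byte string (hypothetical contents), and the failed disk is assumed to be known, as it is when a drive reports a failure.

```python
from functools import reduce


def parity(blocks):
    """XOR the blocks position by position to form the check-disk block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity_block):
    """Recover the one missing data block: XOR of the survivors and the parity."""
    return parity(surviving_blocks + [parity_block])


data_disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
check = parity(data_disks)
# Disk 1 fails; its contents are rebuilt from disks 0 and 2 plus the check disk.
print(rebuild([data_disks[0], data_disks[2]], check) == data_disks[1])  # True
```

Byte-wise XOR is the same as bit-wise parity applied to eight bit positions at once, so the idea carries over to bit-interleaved data.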
30
Failure of a RAID group occurs only if more than one of its disks fails, i.e. a second disk fails before the first failed disk is repaired.
– Assume that the RAID consists of only one group of disks, with G data disks and C check disks.
– If disk failures and repairs are exponentially distributed, then the mean time to failure of the group is MTTF/(G + C) * (MTTF/(G + C - 1)) / MTTR.
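Evaluating the group formula for a hypothetical single-parity group (the disk figures are illustrative only):

```python
def group_mttf(mttf, mttr, g, c):
    """Group MTTF with g data and c check disks: MTTF/(G+C) * (MTTF/(G+C-1)) / MTTR."""
    n = g + c
    return (mttf / n) * (mttf / (n - 1)) / mttr


# Hypothetical values: MTTF = 100,000 h per disk, MTTR = 10 h, 10 data + 1 parity disk.
print(round(group_mttf(100_000, 10, g=10, c=1)))  # 9090909 hours
```

With G + C = 2 the expression reduces to the mirrored-disk formula MTTF^2 / (2 * MTTR); larger groups need fewer check disks per data disk but have a lower MTTF, because more disks can fail and more can fail during a repair.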