Unreliable Failure Detectors for Reliable Distributed Systems Tushar Deepak Chandra Sam Toueg Presentation for EECS454 Lawrence Leinweber.


1 Unreliable Failure Detectors for Reliable Distributed Systems
Tushar Deepak Chandra, Sam Toueg
Presentation for EECS454
Lawrence Leinweber

2 Two-Army Problem
Unreliable Channel
–Can’t Guarantee Correct Communication
–Last Message May be Lost

3 Byzantine Generals Problem (1)
Unreliable Processors (Traitors)
–Report Incorrect Values (Troop Levels)
[Figure: troop levels exchanged among the generals]

4 Byzantine Generals Problem (2)
Loyal Generals Need to Verify Reports
–Use Reports as Votes on Correct Values
–That’s About It with the Color Diagrams
[Figure: vectors of reported values collected by each general]

5 Distributed System
1. System of Processors
2. Connected In a Network
3. Running Independently
4. Solving Problems Together

6 Types of Failure
1. Unreliable Communication Channels
2. Processors Crash or Create Mischief
3. Synchronizing Processors
–Atomic Broadcast
4. Problems Agreeing On Results
–Consensus

7 Scope of This Solution
1. Processors Can Crash
–Crashed Processors Never Recover
–Processors are Not Malicious
2. Reliable Communication Channels
3. Asynchronous
–Synchronize After a Finite Number of Steps
4. At Least One Processor is Correct
–Every Down Processor is Detected By at Least One Up Processor
–At Least One Up Processor is Detected By All Up Processors

8 Failure Detectors
Attached to Each Processor
Determine the Crash State of Some Processors
–Processors Communicate Crash State Information
Imperfect
–Suspect Processors Crashed
–Slow Processors Might Become “Unsuspected”
–Cause Host Processor to Abandon Other Processors
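One way to picture the "suspect, then unsuspect" behaviour on this slide is a heartbeat detector with per-processor timeouts. This is only a minimal sketch: the slides do not say how a detector is built, so the heartbeat mechanism, the class name `TimeoutDetector`, the doubling rule, and all parameter names are assumptions made for illustration.

```python
class TimeoutDetector:
    """Hypothetical heartbeat-based detector attached to one host processor."""

    def __init__(self, peers, initial_timeout=1.0):
        # Assumed: one timeout and one "last heard" timestamp per peer.
        self.timeout = {q: initial_timeout for q in peers}
        self.last_heard = {q: 0.0 for q in peers}
        self.suspected = set()

    def on_heartbeat(self, q, now):
        """Record a heartbeat from q received at time `now`."""
        self.last_heard[q] = now
        if q in self.suspected:
            # q was slow, not crashed: unsuspect it and wait longer next time.
            self.suspected.discard(q)
            self.timeout[q] *= 2

    def check(self, now):
        """Suspect every peer that has been silent longer than its timeout."""
        for q, heard in self.last_heard.items():
            if now - heard > self.timeout[q]:
                self.suspected.add(q)
        return set(self.suspected)


detector = TimeoutDetector(["q", "r"])
detector.on_heartbeat("q", 0.5)
detector.on_heartbeat("r", 0.5)
early = detector.check(1.0)     # nobody has been silent long enough yet
late = detector.check(2.0)      # both q and r are now suspected
detector.on_heartbeat("q", 2.1)  # q was only slow: it becomes unsuspected
```

The detector can be wrong in both directions for a while, which is the sense in which it is unreliable: it never learns that a processor is down, only that it has been quiet for too long.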

9 Completeness & Accuracy
Completeness
–Down Processors are Abandoned
Accuracy
–Up Processors are Not Abandoned

10 Function Definitions
abandons(p, q, t)
–Processor p Abandons Processor q at Time t
isDown(q, t)
–Processor q is Really Down at Time t

11 Completeness
Strong Completeness
–Every Down Processor is Abandoned by Every Up Processor Eventually
–$\forall p, \forall q, \exists t_0, \forall t > t_0: \mathit{isDown}(q, t) \Rightarrow \mathit{abandons}(p, q, t)$
Weak Completeness
–Every Down Processor is Abandoned by At Least One Up Processor Eventually
–$\forall q, \exists p, \exists t_0, \forall t > t_0: \mathit{isDown}(q, t) \Rightarrow \mathit{abandons}(p, q, t)$
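The two completeness properties can be checked against a recorded run. The sketch below is an illustration under stated assumptions: the trace format (`history[t][p]` is the set p abandons at step t, `crash_time[q]` is the step at which q crashed) is hypothetical, and "eventually" is approximated on a finite trace by looking at the final step only.

```python
def is_down(crash_time, q, t):
    """isDown(q, t): q crashed at or before step t (crashes are permanent)."""
    return q in crash_time and t >= crash_time[q]


def abandons(history, p, q, t):
    """abandons(p, q, t): q is on p's list of abandoned processors at step t."""
    return q in history[t].get(p, set())


def split_up_down(crash_time, procs, t):
    """Partition the processors into those up and those down at step t."""
    up = [p for p in procs if not is_down(crash_time, p, t)]
    down = [q for q in procs if is_down(crash_time, q, t)]
    return up, down


def strong_completeness(history, crash_time, procs):
    """Every down processor is abandoned by every up processor (final step)."""
    t = len(history) - 1
    up, down = split_up_down(crash_time, procs, t)
    return all(abandons(history, p, q, t) for q in down for p in up)


def weak_completeness(history, crash_time, procs):
    """Every down processor is abandoned by at least one up processor."""
    t = len(history) - 1
    up, down = split_up_down(crash_time, procs, t)
    return all(any(abandons(history, p, q, t) for p in up) for q in down)


# Hypothetical run: c crashes at step 1 and only a notices.
procs = ["a", "b", "c"]
crash_time = {"c": 1}
history = [
    {"a": set(), "b": set(), "c": set()},
    {"a": {"c"}, "b": set()},
    {"a": {"c"}, "b": set()},
]
weak_ok = weak_completeness(history, crash_time, procs)
strong_ok = strong_completeness(history, crash_time, procs)
```

In this run the detectors are weakly complete but not strongly complete, because b never abandons the crashed processor c.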

12 Accuracy
Strong Accuracy (Perpetual/Eventual)
–Every Up Processor is Not Abandoned by Any Processor Ever/Eventually
–Perpetual: $\forall p, \forall q, \forall t: \neg\mathit{isDown}(q, t) \Rightarrow \neg\mathit{abandons}(p, q, t)$
–Eventual: $\forall p, \forall q, \exists t_0, \forall t > t_0: \neg\mathit{isDown}(q, t) \Rightarrow \neg\mathit{abandons}(p, q, t)$
Weak Accuracy (Perpetual/Eventual)
–At Least One Up Processor is Not Abandoned by Any Processor Ever/Eventually
–Perpetual: $\exists q, \forall p, \forall t: \neg\mathit{isDown}(q, t) \wedge \neg\mathit{abandons}(p, q, t)$
–Eventual: $\exists q, \forall p, \exists t_0, \forall t > t_0: \neg\mathit{isDown}(q, t) \wedge \neg\mathit{abandons}(p, q, t)$
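The four accuracy variants differ only in which processors must stay unabandoned (every up processor, or at least one) and from when (always, or from some step on). A minimal sketch follows, assuming a hypothetical trace format in which `history[t][p]` is the set p abandons at step t and `crash_time[q]` is the step at which q crashed; on a finite trace, "eventually" is approximated as "from some recorded step to the end".

```python
def is_down(crash_time, q, t):
    """isDown(q, t): q crashed at or before step t."""
    return q in crash_time and t >= crash_time[q]


def abandoned_by_anyone(history, q, t):
    """True if some processor has q on its abandoned list at step t."""
    return any(q in suspects for suspects in history[t].values())


def strong_accuracy(history, crash_time, procs, start=0):
    """From step `start` on, no up processor is abandoned by anyone."""
    return all(
        is_down(crash_time, q, t) or not abandoned_by_anyone(history, q, t)
        for t in range(start, len(history))
        for q in procs
    )


def weak_accuracy(history, crash_time, procs, start=0):
    """From step `start` on, some processor stays up and is never abandoned."""
    return any(
        all(
            not is_down(crash_time, q, t)
            and not abandoned_by_anyone(history, q, t)
            for t in range(start, len(history))
        )
        for q in procs
    )


def eventually(check, history, crash_time, procs):
    """Eventual variant: the property holds from some step to the end."""
    return any(
        check(history, crash_time, procs, start=s) for s in range(len(history))
    )


# Hypothetical run: nobody crashes, but b wrongly abandons a at step 0.
procs = ["a", "b", "c"]
crash_time = {}
history = [
    {"a": set(), "b": {"a"}, "c": set()},
    {"a": set(), "b": set(), "c": set()},
]
perpetual_strong = strong_accuracy(history, crash_time, procs)
perpetual_weak = weak_accuracy(history, crash_time, procs)
eventual_strong = eventually(strong_accuracy, history, crash_time, procs)
```

Here one early mistake rules out perpetual strong accuracy, while perpetual weak accuracy survives because b itself is never abandoned, and eventual strong accuracy holds once b corrects its list.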

13 Classes of Failure Detectors
                     Strong Perpetual  Weak Perpetual  Strong Eventual  Weak Eventual
                     Accuracy          Accuracy        Accuracy         Accuracy
Strong Completeness  P                 S               ◇P               ◇S
Weak Completeness    Q                 W               ◇Q               ◇W
8 Combinations of Completeness and Accuracy

14 Reducibility (Emulation)
Some Classes are More Powerful Than Others
–Strong Complete Can Emulate Weak Complete
Some Classes Can Emulate Others Using an Algorithm:
–Up Processors Share Lists of Abandoned Processors, Exclude Themselves
–Abandoned by One Becomes Abandoned by All
–Weak Complete Can Emulate Strong Complete
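The sharing algorithm on this slide can be sketched as a single exchange among the up processors. This is an illustration rather than the presenters' code: the function name `share_suspicions`, the dictionary layout, and the assumption that every message sent in the exchange is delivered are all hypothetical.

```python
def share_suspicions(local, up):
    """One exchange of abandoned-processor lists among the up processors.

    local[p] is the list produced by p's own weakly complete detector.
    Returns the emulated (shared) list for each up processor.
    """
    output = {p: set(local[p]) for p in up}
    for sender in sorted(up):  # crashed processors send nothing
        for receiver in up:
            # Abandoned by one becomes abandoned by all.
            output[receiver] |= local[sender]
            # A message from the sender shows that the sender is up.
            output[receiver].discard(sender)
    return output


# Hypothetical state: d has crashed and only a has noticed.
up = {"a", "b", "c"}
local = {"a": {"d"}, "b": set(), "c": set()}
shared = share_suspicions(local, up)
```

After the exchange every up processor abandons d, so a weakly complete detector is emulating a strongly complete one. No up processor is added unless some detector already listed it, so the exchange does not invent new mistakes.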

15 Completeness Classes Are Equivalent
                     Strong Perpetual  Weak Perpetual  Strong Eventual  Weak Eventual
                     Accuracy          Accuracy        Accuracy         Accuracy
Strong Completeness  P                 S               ◇P               ◇S
Weak Completeness    Q                 W               ◇Q               ◇W
4 Distinct Accuracy Classes

16 Relationship of Accuracy Classes
Perpetual is More Powerful Than Eventual
–Perpetual: $\forall t$
–Eventual: $\exists t_0, \forall t > t_0$
Strong is More Powerful Than Weak
–Strong: $\forall q$
–Weak: $\exists q$

17 Relationship of Failure Detector Classes
                     Strong Perpetual  Weak Perpetual  Strong Eventual  Weak Eventual
                     Accuracy          Accuracy        Accuracy         Accuracy
Strong Completeness  P                 S               ◇P               ◇S
Weak Completeness    Q                 W               ◇Q               ◇W
P is Most Powerful; ◇S is Least Powerful
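The ordering between classes can be written down as a small comparison on the two accuracy dimensions. The sketch below assumes, as the earlier slide states, that the weakly complete classes are equivalent to their strongly complete counterparts; the names `ACCURACY`, `EQUIVALENT`, and `at_least_as_powerful` are hypothetical.

```python
# Accuracy of each strongly complete class: (scope, duration).
ACCURACY = {
    "P": ("strong", "perpetual"),
    "S": ("weak", "perpetual"),
    "◇P": ("strong", "eventual"),
    "◇S": ("weak", "eventual"),
}

# Weakly complete classes, mapped to their equivalent strongly complete class.
EQUIVALENT = {"Q": "P", "W": "S", "◇Q": "◇P", "◇W": "◇S"}


def at_least_as_powerful(a, b):
    """True if class a guarantees everything that class b guarantees."""
    scope_a, duration_a = ACCURACY[EQUIVALENT.get(a, a)]
    scope_b, duration_b = ACCURACY[EQUIVALENT.get(b, b)]
    scope_ok = scope_a == "strong" or scope_b == "weak"
    duration_ok = duration_a == "perpetual" or duration_b == "eventual"
    return scope_ok and duration_ok


top = all(at_least_as_powerful("P", c) for c in ACCURACY)
bottom = all(at_least_as_powerful(c, "◇S") for c in ACCURACY)
s_vs_ep = (at_least_as_powerful("S", "◇P"), at_least_as_powerful("◇P", "S"))
```

Under this comparison P sits above every class and ◇S below every class, while S and ◇P are not ordered with respect to each other: one is stronger in duration, the other in scope.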

18 The Consensus Problem
Processors Reach Agreement on a Value
–Termination: All Up Processors Eventually Decide
–Agreement: All Agree to Same Value
–Integrity: Decision is Final
–Validity: A Proposed Value is Chosen
If They Can Agree on One Thing, They Can Agree on Anything
Algorithms for S and ◇S Detectors
–At Least One Up Processor Using S Detectors
–A Majority of Up Processors Using ◇S Detectors

19 Algorithm for S Detectors
S Detectors: At Least One Up Processor is Not Abandoned by Any Up Processor Ever
1. Collect Proposed Values from Each Processor
–or the News That the Processor Crashed
2. Collect Other Processors’ Knowledge of Proposed Values
–Discard Values not Known to All
3. Pick (Consistently) a Value from Known Values
All Processors Get Phase 1 & 2 Information from the Processor That is Never Abandoned
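Steps 2 and 3 above can be sketched in a few lines. This is a simplification, not the full protocol: it assumes step 1 has already run, that each up processor holds a vector with one slot per processor (`None` where it learned nothing), and that every up processor's vector reaches every other. The function names `discard_unshared` and `pick` are hypothetical.

```python
def discard_unshared(vectors):
    """Step 2: keep a slot only if every collected vector has a value for it."""
    collected = list(vectors.values())
    return [
        column[0] if all(v is not None for v in column) else None
        for column in zip(*collected)
    ]


def pick(vector):
    """Step 3: everyone applies the same rule, here the first known value."""
    for value in vector:
        if value is not None:
            return value
    raise ValueError("no proposal is known to all processors")


# Hypothetical run with processors a, b, c, d (slots in that order).
# d crashed while sending, so only a learned d's proposal (40).
vectors = {
    "a": [10, 20, 30, 40],
    "b": [10, 20, 30, None],
    "c": [10, 20, 30, None],
}
shared = discard_unshared(vectors)
decision = pick(shared)
```

Because every processor ends up with the same shared vector and applies the same picking rule, they all choose the same value, and that value was proposed by some processor.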

20 Algorithm for ◇S Detectors
Rotating Coordinator
–Each Processor Takes Their Turn
–Tries to Make Decision
–If the Processor is Up and is Not Abandoned by Any Up Processor, the Decision is Made

21 Each Round of ◇S Algorithm
At Least One Up Processor is Not Abandoned by Any Up Processor Eventually
1. All Processors Send Value and the Round Number to Coordinator
2. Coordinator Waits for a Majority and Sends the Value with the Latest Round Number to All Processors
3. Each Processor Indicates If It Abandoned Coordinator
4. Coordinator Waits for a Majority, If No Processor Abandoned Coordinator, the Value is Decided
Repeat Until Coordinator is Not Abandoned (Eventually)
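The four steps of a round can be simulated in one function. This is a sketch under simplifying assumptions, not the published protocol: messages arrive instantly and in the order of `procs`, "a majority" means the first `n // 2 + 1` messages in that order, and the names `run_round`, `consensus`, and `suspects_by_round` are hypothetical.

```python
def run_round(r, procs, state, up, suspects):
    """Run round r (1-based). Return the decided value, or None if undecided.

    state[p] = (value, round in which p last adopted it); updated in place.
    suspects[p] = processors that p abandons during this round.
    """
    majority = len(procs) // 2 + 1
    coordinator = procs[(r - 1) % len(procs)]
    if coordinator not in up:
        return None  # assumed: every up processor ends up abandoning it

    # Steps 1-2: the coordinator hears from a majority and keeps the value
    # that carries the latest round number.
    received = [state[p] for p in procs if p in up][:majority]
    if len(received) < majority:
        return None  # the real protocol would block here
    value = max(received, key=lambda pair: pair[1])[0]

    # Step 3: each up processor adopts the value, or reports that it
    # abandoned the coordinator.
    replies = []
    for p in procs:
        if p not in up:
            continue
        accepted = coordinator not in suspects.get(p, set())
        if accepted:
            state[p] = (value, r)
        replies.append(accepted)

    # Step 4: decide only if a majority replied and none of them abandoned.
    first = replies[:majority]
    return value if len(first) == majority and all(first) else None


def consensus(procs, proposals, up, suspects_by_round, max_rounds=10):
    """Rotate the coordinator until some round decides."""
    state = {p: (proposals[p], 0) for p in procs}
    for r in range(1, max_rounds + 1):
        decided = run_round(r, procs, state, up, suspects_by_round.get(r, {}))
        if decided is not None:
            return decided, r
    return None, max_rounds


procs = ["a", "b", "c"]
proposals = {"a": 1, "b": 2, "c": 3}
# Run 1: a has crashed, so round 1 fails and b decides in round 2.
crashed_run = consensus(procs, proposals, {"b", "c"}, {})
# Run 2: everyone is up, but b and c wrongly abandon a during round 1.
mistaken_run = consensus(procs, proposals, {"a", "b", "c"},
                         {1: {"b": {"a"}, "c": {"a"}}})
```

In the second run the round-1 value survives even though round 1 did not decide: a adopted it with round number 1, so the round-2 coordinator prefers it over its own proposal.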

22 Atomic Broadcast
All Processors Receive the Same Messages in the Same Order
Atomic Broadcast is Equivalent to Consensus
–Each Can Be Reduced to the Other
–Solution to Consensus Applies to Atomic Broadcast

23 Atomic Broadcast Reduces to Consensus
Atomic Broadcast Can Be Implemented Using a Consensus Algorithm
–Each Processor Proposes a Message
–Consensus is Used to Decide Which Message is Recognized as the Next Atomically Broadcast Message
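The reduction on this slide can be sketched as a loop that runs one consensus instance per delivered message. The consensus algorithm itself is replaced by a stand-in (`pick_smallest`, which simply takes the smallest proposal), and the function and parameter names are hypothetical; any real consensus implementation could be passed in its place.

```python
def pick_smallest(proposals):
    """Stand-in for a consensus algorithm: all agree on the smallest proposal."""
    return min(proposals.values())


def atomic_broadcast(pending, consensus=pick_smallest):
    """Return the single delivery order that every processor observes.

    pending[p] lists the messages p has received but not yet delivered,
    in whatever order they happened to arrive at p.
    """
    pending = {p: list(msgs) for p, msgs in pending.items()}
    delivered = []
    while any(pending.values()):
        # Each processor proposes the oldest message it is still holding.
        proposals = {p: msgs[0] for p, msgs in pending.items() if msgs}
        # Consensus decides which message is delivered next.
        chosen = consensus(proposals)
        delivered.append(chosen)
        for msgs in pending.values():
            if chosen in msgs:
                msgs.remove(chosen)
    return delivered


# Hypothetical run: a and b received the same messages in different orders.
order = atomic_broadcast({"a": ["m2", "m1"], "b": ["m1", "m2"]})
```

The arrival order differs per processor, but the delivery order does not, because each position in it is fixed by one agreed decision.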

24 Consensus Reduces to Atomic Broadcast
Consensus Can Be Implemented Using An Atomic Broadcast Algorithm
–To Decide a Value, a Process Atomically Broadcasts It
–Every Process Decides on the First Value Delivered
–Go to Lunch Early

25 Summary
Reliable Distributed Systems
Unreliable Failure Detectors
Relationship of Detector Classes
Algorithms for Consensus
Equivalence with Atomic Broadcast

