Eran Bergman & Eddie Bortnikov, Principles of Reliable Distributed Systems, Technion EE, Spring
Principles of Reliable Distributed Systems
Recitation 1: Introduction
Spring 2007
Alex Shraer

Last Time
Models
–Synchronous and asynchronous
–Failure models (a little…)
Specifications
–Liveness and safety
The Coordinated Attack Problem
Note: the proofs given on the board are included in the course’s material
–Yes, you should know them for the exam

Safety and Liveness
Both kinds of property are verifiable on an execution’s trace.
Safety = the property holds at every point of the trace (“something bad never happens”)
–Closed under prefixes: every prefix of a trace that satisfies the property also satisfies it
Liveness = the property eventually holds (“something good eventually happens”)

Safety and Liveness
A safety property cannot be “fixed” after it is violated.
Finite refutation: if every counter-example trace for a property p has a finite prefix in which p is already violated, then p is a safety property.
On the other hand, a finite trace can always be extended so that it satisfies a liveness property.
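The contrast between the two kinds of property can be sketched on finite traces. The snippet below is only an illustration: the state names (`"moving"`, `"stopped_at_floor"`, `"stopped_between_floors"`) and the helpers `first_violation` and `eventually` are hypothetical and are not part of the course material. It shows that a safety violation is witnessed by a finite prefix and survives every extension, while an unmet liveness property can still be satisfied by extending the trace.

```python
from typing import Callable, Optional, Sequence


def first_violation(trace: Sequence[str], bad: Callable[[str], bool]) -> Optional[int]:
    """Length of the shortest prefix containing a bad state, or None if there is none."""
    for i, state in enumerate(trace):
        if bad(state):
            return i + 1  # the prefix trace[:i + 1] refutes the safety property
    return None


def eventually(trace: Sequence[str], good: Callable[[str], bool]) -> bool:
    """True if some state of the finite trace is good."""
    return any(good(state) for state in trace)


def is_bad(state: str) -> bool:
    # Hypothetical bad state for "the elevator will not stop in between floors".
    return state == "stopped_between_floors"


def is_good(state: str) -> bool:
    # Hypothetical good state for "the elevator will eventually stop" (at a floor).
    return state == "stopped_at_floor"


violating = ["moving", "stopped_between_floors"]
pending = ["moving", "moving"]

# Safety: once refuted, every extension is refuted by the same prefix.
print(first_violation(violating, is_bad))
print(first_violation(violating + ["stopped_at_floor"], is_bad))

# Liveness: not yet satisfied, but one more step can satisfy it.
print(eventually(pending, is_good))
print(eventually(pending + ["stopped_at_floor"], is_good))
```

Note that `eventually` returning `False` on a finite trace does not refute the liveness property; it only says the property has not been satisfied yet.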

Safety/Liveness/Both/None?
Consider a partial elevator spec:
The elevator will not stop in between floors.
The elevator may break after the 1st year of use.
If someone summons the elevator to some floor:
–The elevator will eventually stop.
–The elevator reaches that floor within 1 minute.

Safety/Liveness/Both/None?
Rosh Hashanah falls exactly once a year.
The Attorney General will be able to be appointed as a judge.
A driver who has accumulated three traffic violations will not drive before completing a defensive driving course.
No process can use the CPU for an infinite amount of time.
Every function call returns.
The police will clear the blocked intersection, but it will take them at least half an hour.
The police will clear the blocked intersection within half an hour at most.
The number of locks the building manager installs during the year is at least 30.

Coordinated Attack
[Figure: generals A and B; message “Let’s attack”]

The Model: Synchronous with Message Loss
Message loss can be detected
–Bounded delay, timeouts
Message loss is unbounded
–In some runs, all the messages are lost

Coordinated Attack Definition (Reminder)
Requirements:
–Both generals must decide the same: either to attack or not to attack
–If both are not ready to attack, they must not attack
–If both are ready to attack and no messages are lost, then they must attack
Still cannot be achieved!

Properties of Coordinated Attack
Agreement: If both generals decide, they decide the same.
Termination: Every general eventually decides.
Validity:
–If both inputs are “not ready”, then no general decides “attack”.
–If both inputs are “ready” and every message sent is delivered, then no general decides “no-attack”.
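The three properties can be written as predicates over the outcome of a single completed run. This is a minimal sketch under stated assumptions: the function `check_run`, its parameters, and the encoding of inputs as booleans (`True` for “ready”) and of an undecided general as `None` are hypothetical choices made for illustration. Termination is a liveness property, so the check below is meaningful only for a run that is treated as complete.

```python
from typing import Dict, Optional


def check_run(
    inp_a: bool,
    inp_b: bool,
    dec_a: Optional[str],
    dec_b: Optional[str],
    all_delivered: bool,
) -> Dict[str, bool]:
    """Evaluate Agreement, Termination and Validity on one completed run.

    inp_a / inp_b: True means "ready", False means "not ready".
    dec_a / dec_b: "attack", "no-attack", or None if the general never decided.
    all_delivered: True if every message sent in the run was delivered.
    """
    decisions = [d for d in (dec_a, dec_b) if d is not None]

    # Agreement: if both generals decide, they decide the same value.
    agreement = len(set(decisions)) <= 1

    # Termination: every general decides.
    termination = dec_a is not None and dec_b is not None

    # Validity, clause 1: both "not ready" forbids deciding "attack".
    validity = True
    if not inp_a and not inp_b and "attack" in decisions:
        validity = False
    # Validity, clause 2: both "ready" and no loss forbids deciding "no-attack".
    if inp_a and inp_b and all_delivered and "no-attack" in decisions:
        validity = False

    return {"agreement": agreement, "termination": termination, "validity": validity}


print(check_run(True, True, "attack", "attack", all_delivered=True))
print(check_run(True, True, "attack", "no-attack", all_delivered=False))
```

When at least one input is “not ready” and the other is “ready”, or when both are “ready” but some message is lost, Validity places no constraint on the decision value.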

What happens if? (cont’d)
Weak Termination: If there are no message losses, then all processes eventually decide.
We want an algorithm that solves the problem where Agreement, Weak Termination and Validity are required.

What happens if? (cont’d)
Unanimous Termination: If any process decides, then all processes eventually decide.
We want an algorithm that solves the problem where Agreement, Weak Termination, Unanimous Termination and Validity are required.
Homework

Where’s the difference?
Why couldn’t we use the proof from class when only Weak Termination was required?

Stronger Models
Bounded loss rate – take 1
–At most 10 messages are lost on each channel (from general A to general B and vice versa).
Is it enough?

Interfaces (Reminder)
There are two generals A, B.
Each has an input: inp_A, inp_B ∈ {“ready”, “not ready”}
Possible actions for Q ∈ {A, B}:
–Decide_Q(v), v ∈ {“attack”, “no attack”} (Output)
–Send_Q(m), m ∈ {“yes”, “no”} (Output)
–Deliver_Q(m) (Input)

Suggested Algorithm
Each general performs the following:
–Repeat 11 times: Send(inp)
–Upon Deliver(m): Decide(this.inp & m.inp)
 (or any deterministic rule that satisfies validity)
–Halt.
With at most 10 losses per channel, 11 sends guarantee that at least one copy is delivered in each direction.
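The suggested algorithm can be simulated under the bounded-loss assumption of at most 10 lost messages per channel. The sketch below is an illustration, not the course's implementation: the names `run_general`, `simulate`, `SENDS` and `MAX_LOSS` are hypothetical, inputs are encoded as booleans (`True` for “ready”), and message loss is modelled as a set of indices of dropped copies on each channel.

```python
from itertools import product
from typing import Optional, Set, Tuple

SENDS = 11      # copies of the input each general sends
MAX_LOSS = 10   # assumed bound on losses per channel


def run_general(own_ready: bool, peer_ready: bool, dropped: Set[int]) -> Optional[str]:
    """Decision of one general, given which incoming copies (0..SENDS-1) were dropped."""
    if len(dropped) > MAX_LOSS:
        raise ValueError("loss pattern exceeds the assumed bound")
    for k in range(SENDS):
        if k not in dropped:
            # Upon the first Deliver(m): decide the AND of both inputs, then halt.
            return "attack" if (own_ready and peer_ready) else "no-attack"
    return None  # unreachable while len(dropped) <= MAX_LOSS < SENDS


def simulate(
    inp_a: bool, inp_b: bool, dropped_a_to_b: Set[int], dropped_b_to_a: Set[int]
) -> Tuple[Optional[str], Optional[str]]:
    """Run both generals; each channel has its own loss pattern."""
    dec_a = run_general(inp_a, inp_b, dropped_b_to_a)
    dec_b = run_general(inp_b, inp_a, dropped_a_to_b)
    return dec_a, dec_b


# Worst case allowed by the model: the first 10 copies are lost on both channels.
worst = set(range(MAX_LOSS))
for inp_a, inp_b in product([True, False], repeat=2):
    print(inp_a, inp_b, simulate(inp_a, inp_b, worst, worst))
```

Because both generals apply the same deterministic rule to the same pair of inputs, they reach the same decision regardless of which copies are dropped, as long as the loss bound holds.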

To Summarize
The exact model assumptions and the exact problem specification are critical
–Minor changes in either lead to different results.