Isis 2 guarantees: Did you understand how Isis 2 ordering and durability work?


1 Isis 2 guarantees: Did you understand how Isis 2 ordering and durability work?

2 Group multicast In the Isis 2 programming manual, you’ll see that multicast primitives like g.Send allow you to specify the destinations for a message: you can call g.Send(rcode, List dests, args). True or false: g.Send(rcode, args) behaves exactly the same as g.Send(rcode, g.GetView().members, args)? [Figure: timeline for processes S, P, Q, where g.GetView().members = {S,P,Q} and S is the sender.]

3 Group Multicast False. When you use g.Send(rcode, args) to send a message to the entire group, Isis 2 uses a lock to synchronize sending with the group membership. This ensures that your message goes to all members in some view of the group (the sender gets a copy too). In the second approach, a race could arise if some member was joining the group just as you do the send. [Figure: S, P, Q timeline, then S, P, Q, R once R has joined; g.GetView().members = {S,P,Q} and S is the sender.] The problem is that g.Send() really is issued after GetView() was evaluated. During that split second, R might have joined!
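The race can be reproduced with a toy, single-process model. `ToyGroup`, `send_to_view` and `send_to_list` are hypothetical names, not the Isis 2 API. The sketch only assumes what the slide states: a whole-group send reads the membership under the same lock that view changes take, while an explicit destination list may already be stale.

```python
import threading

class ToyGroup:
    """Hypothetical in-memory model of a process group (not the Isis 2 API)."""

    def __init__(self, members):
        self._lock = threading.Lock()      # serializes sends with view changes
        self.members = list(members)
        self.inbox = {m: [] for m in members}

    def get_view(self):
        with self._lock:
            return list(self.members)      # a snapshot that may go stale

    def join(self, member):
        with self._lock:                   # a view change takes the same lock
            self.members.append(member)
            self.inbox[member] = []

    def send_to_view(self, msg):
        # Models g.Send(rcode, args): destinations are read under the lock,
        # so the message reaches exactly the members of one view.
        with self._lock:
            for m in self.members:
                self.inbox[m].append(msg)

    def send_to_list(self, dests, msg):
        # Models g.Send(rcode, dests, args): uses whatever list the caller built.
        with self._lock:
            for m in dests:
                self.inbox[m].append(msg)

g = ToyGroup(["S", "P", "Q"])
dests = g.get_view()          # {S, P, Q}
g.join("R")                   # R joins in the "split second" before the send
g.send_to_list(dests, "m1")   # R misses m1 although it is now a member
g.send_to_view("m2")          # m2 reaches every member of the current view
print(g.inbox["R"])           # ['m2']
```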

4 Fault-Tolerance for Multicasts Suppose that you use g.Send() to send a message and some member of the group, process p, receives it and prints the message to the local console (without doing any other Isis 2 system calls first). True or false: If g was in view {p,q,r} when p received the message, q and r will receive it too.

5 Fault-Tolerance for Multicasts False. g.Send() is an optimistic delivery multicast: it doesn’t wait to be certain that every member has received a copy. Thus there are patterns of failures in which the sender could crash, and process p could also crash at the same time, so that q and r never receive the message. [Figure: S sends a message, but then crashes. P receives it, then P crashes too. Q and R do not receive it, and the new view reports the failures.]

6 Fault-Tolerance for Multicasts True or false: In Isis 2 the only way to prevent the scenario seen here is to use the more costly g.SafeSend(). [Figure: S, P, Q, R timeline; S and P crash while Q and R survive.]

7 Fault-Tolerance for Multicasts As worded, false. The SafeSend protocol has a pessimistic mode of delivery: before any copy is delivered, every member logs the message. But you can also avoid the risk of an outcome like this one by using an optimistic protocol and calling the g.Flush() method before “talking” to the outside world. [Figure: S, P, Q, R timeline; S and P crash while Q and R survive.]

8 Fault-Tolerance for Multicasts True or false: In Isis 2 there is a way to “mask” the optimistic delivery behavior so that no external observer could ever know it arose. [Figure: S, P, Q, R timeline; S and P crash while Q and R survive.]

9 Fault-Tolerance for Multicasts True. If the recipient of a g.Send() calls g.Flush() before printing to the console, an outside observer will never see a non-atomic delivery, because g.Flush() delays until delivery is finished. P calls g.Flush() before printing anything: it has received the message, but doesn’t act upon it until g.Flush() finishes. [Figure: S, P, Q, R timeline; S and P crash while Q and R survive.]

10 Ordering for Multicasts Suppose S uses g.Send to send M0, then M1. True or false: Every receiver gets M0 before M1. [Figure: S, P, Q, R timeline; S multicasts M0, then M1.]

11 Ordering for Multicasts True. g.Send() preserves the “FIFO” or sender ordering. FIFO is short for “first in, first out”, but in Isis 2 it means “first sent, first delivered”. [Figure: S, P, Q, R timeline; S multicasts M0, then M1.]
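Sender (FIFO) order is commonly implemented with a per-sender sequence number and a hold-back queue. The Python sketch below assumes that design. It illustrates the guarantee and is not Isis 2's implementation. It also shows the retransmission case: if M0 is lost, M1 waits until the retransmitted M0 arrives.

```python
from collections import defaultdict

class FifoReceiver:
    """Sketch of per-sender FIFO delivery with a hold-back queue (assumed design)."""

    def __init__(self):
        self.next_seq = defaultdict(int)    # next expected sequence number per sender
        self.held = defaultdict(dict)       # sender -> {seq: msg} that arrived early
        self.delivered = []

    def receive(self, sender, seq, msg):
        if seq < self.next_seq[sender]:
            return                          # duplicate (e.g. extra retransmission): ignore
        self.held[sender][seq] = msg
        # Deliver every held message that is now contiguous with what was delivered.
        while self.next_seq[sender] in self.held[sender]:
            s = self.next_seq[sender]
            self.delivered.append((sender, self.held[sender].pop(s)))
            self.next_seq[sender] = s + 1

r = FifoReceiver()
r.receive("S", 1, "M1")   # M0 was lost, so M1 is held back
r.receive("S", 0, "M0")   # the retransmitted M0 arrives: M0, then M1, are delivered
print(r.delivered)        # [('S', 'M0'), ('S', 'M1')]
```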

12 Ordering for Multicasts True or false: Suppose that only the leader (0’th ranked member) sends multicasts in a group. Then with g.Send(), every member sees every multicast in the identical order. [Figure: S, P, Q, R timeline; S multicasts M0 and M1, then fails; M2 and M3 are multicast in the new view {P,Q,R}.]

13 Ordering for Multicasts True. After the old leader fails and the new view is reported, we won’t see further messages from the old leader. Thus the ordering guarantee of Send (FIFO) is all we need, on a view-by-view basis: the group will never receive additional multicasts from S after S leaves the view. [Figure: S, P, Q, R timeline; S multicasts M0 and M1, then fails; M2 and M3 are multicast in the new view {P,Q,R}.]

14 Ordering for Multicasts Suppose S uses g.Send to send M0, and then, much later in real time, R sends M1. True or false: Every receiver gets M0 before M1. [Figure: S, P, Q, R timeline; S multicasts M0, and later R multicasts M1.]

15 Ordering for Multicasts False. Message loss or scheduling delays could affect delivery ordering. g.Send() only promises to maintain the ordering for messages that had the identical sender, and these messages had different senders. [Figure: S multicasts M0, and later R multicasts M1; P gets M1, then M0, while all others get M0 before M1.]

16 g.OrderedSend Suppose S uses g.OrderedSend to send M0, and then, much later in real time, R sends M1. True or false: Every receiver gets M0 before M1. [Figure: S, P, Q, R timeline; S multicasts M0, and later R multicasts M1.]

17 g.OrderedSend False. With OrderedSend, every process gets messages in the identical order, but it might not be the real-time order in which messages were sent. So every process gets M0 M1, or every process gets M1 M0. [Figure: S, P, Q, R timeline with multicasts M0 to M3; every process gets M0 M1, and every process gets M3 M2 even though M2 was sent first.]

18 g.OrderedSend True or false: With OrderedSend, every process gets messages in the order they were sent, assuming they were sent by some single source. [Figure: S, P, Q, R timeline with multicasts M0 to M3.]

19 g.OrderedSend True. In this example, M0 will always be delivered before M2, and M1 will always be delivered before M3. OrderedSend respects the FIFO ordering, but might not follow a real-time (clock) ordering if the senders are different (as for M2 and M3). [Figure: S, P, Q, R timeline with multicasts M0 to M3.]

20 Fault-Tolerance for g.OrderedSend Suppose that you use g.OrderedSend() to send a message and some member of the group, process p, receives it and prints the message to the local console (without doing any other Isis 2 system calls first). True or false: If g was in view {p,q,r} when p received the message, q and r will receive it too.

21 Fault-Tolerance for g.OrderedSend False. g.OrderedSend() is an optimistic delivery multicast: it doesn’t wait to be certain that every member has received a copy. Thus there are patterns of failures in which the sender could crash, and process p could also crash at the same time, so that q and r never receive the message. [Figure: S sends a message, but then crashes. P receives it, then P crashes too. Q and R do not receive it, and the new view reports the failures.]

22 Fault-Tolerance for g.OrderedSend True or false: Suppose that you use g.OrderedSend() to send a message and some member of the group, process p, receives it, but calls g.Flush() before taking any external action. Then no external observer will ever observe a non-atomic delivery.

23 Fault-Tolerance for g.OrderedSend True. If the recipient of a g.OrderedSend() calls g.Flush() before printing to the console, an outside observer will never see a non-atomic delivery, because g.Flush() delays until delivery is finished. P calls g.Flush() before printing anything: it has received the message, but doesn’t act upon it until g.Flush() finishes. [Figure: S, P, Q, R timeline; S and P crash while Q and R survive.]

24 Ordering for g.OrderedSend True or false: Like g.Send(), g.OrderedSend() will always select an ordering that preserves the sender (FIFO) order. g.OrderedSend() goes beyond what g.Send() offers by also ensuring that, even with multiple concurrent senders, every process sees messages in the same order.

25 Ordering for g.OrderedSend True. All Isis 2 multicast primitives maintain the sender order, even if a message is lost and must be retransmitted. [Figure: S, P, Q, R timeline; S multicasts M0, then M1.] Due to a message loss, delivering M1 immediately would violate FIFO order, so Isis 2 delays M1 until after M0 is retransmitted.

26 Ordering for g.SafeSend True or false: The ordering provided by g.SafeSend() is actually identical to that of g.OrderedSend().

27 Ordering for g.SafeSend True. The ordering provided by g.SafeSend() is actually identical to that of g.OrderedSend(). Both provide FIFO ordering for the sender. And both place concurrent multicasts from different senders into some fixed order, which cannot easily be predicted and might not be the real-time ordering, and then deliver in that fixed order.

28 g.OrderedSend vs. g.Send True or false: g.OrderedSend is always slower than g.Send, but faster than g.SafeSend.

29 g.OrderedSend vs. g.Send False. g.OrderedSend adapts to the situation. It will be as fast as g.Send if there is just one sender in the group. But it will run slower if there are multiple concurrent senders. Even so, g.OrderedSend is always faster than g.SafeSend, which logs messages to a disk file before delivering them.
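The multiple-sender case can be pictured as a leader that queues incoming messages and announces their delivery order in batches. That is the role the SETORDER message plays in Isis 2's OrderedSend. The classes below are a hypothetical toy model of that idea, not Isis 2 code, and they cover only the multi-sender path. With a single sender, the queueing step would simply be skipped. Each sender's messages reach the leader in FIFO order, so the leader's own arrival order already respects FIFO.

```python
class Member:
    """Hypothetical group member that delivers only what a SETORDER names."""

    def __init__(self, name):
        self.name = name
        self.queued = {}        # msg_id -> payload, received but not yet ordered
        self.delivered = []

    def receive(self, msg_id, payload):
        self.queued[msg_id] = payload       # hold until a SETORDER names it

    def on_setorder(self, order):
        # Deliver the named messages in exactly the order the leader chose.
        for msg_id in order:
            self.delivered.append(self.queued.pop(msg_id))


class Leader(Member):
    """Hypothetical leader: its own arrival order becomes the group's order."""

    def __init__(self, name):
        super().__init__(name)
        self.unordered = []

    def receive(self, msg_id, payload):
        super().receive(msg_id, payload)
        self.unordered.append(msg_id)

    def make_setorder(self):
        # One SETORDER can order many queued messages at once.
        order, self.unordered = self.unordered, []
        return order


leader, p, q = Leader("L"), Member("P"), Member("Q")
# Two concurrent senders (a and b); arrival orders differ per member but keep FIFO.
for node, arrivals in [(leader, ["a0", "b0", "a1"]),
                       (p, ["b0", "a0", "a1"]),
                       (q, ["a0", "a1", "b0"])]:
    for msg_id in arrivals:
        node.receive(msg_id, msg_id.upper())
setorder = leader.make_setorder()            # disseminated to everyone with plain Send
for node in (leader, p, q):
    node.on_setorder(setorder)
print(p.delivered == q.delivered == leader.delivered)   # True
print(leader.delivered)                                 # ['A0', 'B0', 'A1']
```

P and Q see the concurrent messages arrive in different orders, but all three members deliver the same sequence. The extra SETORDER round is the cost that appears only when there is more than one sender.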

30 g.OrderedSend+g.Flush True or false: If g.OrderedSend() is used, and g.Flush() is called prior to interacting with external entities, the result is identical to g.SafeSend().

31 g.OrderedSend+g.Flush This issue is very subtle and goes beyond what was covered in the video. [Case A: true] In a soft-state (first tier) service this statement would be true. In such a situation, you cannot distinguish between g.OrderedSend+g.Flush and g.SafeSend, provided that g.Flush is called prior to interacting with the outside world.

32 g.OrderedSend+g.Flush [Case B: false] With a durable service things get more complex. In the event of a total failure where all group members crash concurrently, g.SafeSend maintains a log and will replay all messages after failure. g.OrderedSend doesn’t log its messages, hence no replay occurs. This matters when a group is used as the “front end” to a durable replicated database. In such cases we need g.SafeSend.

33 g.Flush(k) True or false: Calling g.Flush(k) causes Isis 2 to:
1. Form a list of known active multicasts, namely messages that have been sent or received but that are still active in the system.
2. Check how many recipients each of the multicasts has.
3. Wait until at least k recipients have acknowledged receipt of each.
Thus, after g.Flush(k) returns, a multicast can’t vanish from the system unless k concurrent failures occur, and they would need to include every one of the processes that acknowledged receipt.

34 g.Flush(k) True. Without a value specified for k, g.Flush() waits until every group member has received any pending multicasts. With k>1 specified, g.Flush() waits until k group members have acknowledged each pending multicast.
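The waiting rule for g.Flush(k) can be sketched with a condition variable. `PendingTracker`, `add`, `ack` and the `timeout` argument are hypothetical names chosen for illustration, not the Isis 2 API. The sketch records which members hold a copy of each pending multicast. It then blocks until every multicast pending at call time has at least k holders. It also shows why k=1 is trivial: the caller's own copy already counts.

```python
import threading

class PendingTracker:
    """Sketch of a Flush(k)-style wait (hypothetical names, not the Isis 2 API)."""

    def __init__(self):
        self._cv = threading.Condition()
        self._acks = {}                    # msg_id -> set of members holding a copy

    def add(self, msg_id, holder):
        with self._cv:
            self._acks[msg_id] = {holder}  # the local copy counts as one

    def ack(self, msg_id, member):
        with self._cv:
            self._acks[msg_id].add(member)
            self._cv.notify_all()          # wake any flush() that is waiting

    def flush(self, k, timeout=None):
        # Snapshot the currently pending multicasts, then wait until each has
        # at least k holders. Returns False if the (sketch-only) timeout expires.
        with self._cv:
            pending = list(self._acks)
            return self._cv.wait_for(
                lambda: all(len(self._acks[m]) >= k for m in pending),
                timeout=timeout)

t = PendingTracker()
t.add("M0", "P")                  # P holds copies of M0 and M1
t.add("M1", "P")
for msg_id in ("M0", "M1"):       # Q's acknowledgements arrive a little later
    threading.Timer(0.05, t.ack, args=(msg_id, "Q")).start()
ok1 = t.flush(1)                  # trivially satisfied: P already holds each copy
ok2 = t.flush(2, timeout=2.0)     # returns once Q has acknowledged both
ok3 = t.flush(3, timeout=0.1)     # times out: only P and Q hold copies
print(ok1, ok2, ok3)              # True True False
```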

35 g.Flush(k) True or False: it makes no sense at all to call g.Flush(1).

36 g.Flush(k) True. The value of k tells Isis 2 how many “acknowledged copies” must exist of every pending multicast known to the process P at which g.Flush(k) was called. But any multicast known to P already has at least one copy, since P itself holds it, so k=1 is meaningless. We normally use k=2 or k=3.

37 Let’s try a harder case Mx is an update and YT means “your turn”. We want them processed in the identical order. The idea is that S “owns” some variable X and sends updates M0 and M1. Then R takes over and sends updates. [Figure: S, P, Q, R timeline; S multicasts M0, M1, then YT; R then multicasts M2 and M3.]

38 Let’s try a harder case Assume the service is running in the soft-state tier. Will this algorithm work properly if OrderedSend() is used for the Mx and YT multicasts? [Figure: S multicasts M0, M1, then YT; R then multicasts M2 and M3.]

39 Let’s try a harder case… Yes. OrderedSend() respects FIFO order, and as long as R waits to receive YT before sending M2 and M3, every process will receive every multicast in the identical order. [Figure: S multicasts M0, M1, then YT; R then multicasts M2 and M3.]

40 Let’s try a harder case If FIFO order is what we need, will this code be correct if g.Send() is used for the Mx and YT multicasts? [Figure: S multicasts M0, M1, then YT; R then multicasts M2 and M3.]

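41 Let’s try a harder case… No. Consider this example: S’s g.Send() multicasts could reach R before they reach P and Q. This creates a race condition. R sends M2 and M3 as soon as it sees YT, and P or Q might then receive M2 and M3 ahead of M0 and M1, out of order. [Figure: S multicasts M0, M1, then YT; R then multicasts M2 and M3.]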

42 Let’s try a harder case… What if we use g.Send(), but insert a call to g.Flush() before sending the YT multicast? [Figure: S multicasts M0 and M1, flushes, then sends YT; R then multicasts M2 and M3.]

43 Let’s try a harder case… Yes. This version works because M0 and M1 are already stable before the YT multicast is sent. But it could be slow, because S waits before sending YT: the flush waits for acknowledgements. [Figure: S multicasts M0 and M1, flushes, then sends YT; R then multicasts M2 and M3.]

44 Let’s try a harder case… What if we use g.Send(), but insert a call to g.Flush() after receiving the YT multicast? [Figure: S multicasts M0, M1, then YT; R flushes after receiving YT, then multicasts M2 and M3.]

45 Let’s try a harder case… Yes! The g.Flush() waits for unstable multicasts to finish, and we know that R has seen M0, M1 and YT. So those finish, and every process receives M2 M3 after M0 M1. [Figure: S multicasts M0, M1, then YT; R flushes after receiving YT, then multicasts M2 and M3.]

46 Another hard case Suppose a group manages red objects and blue objects, and they have nothing to do with one another. Moreover, suppose there is a single member that sends red updates, and a different single member that sends blue updates.
A. We can use g.Send() for the updates.
B. We should use g.OrderedSend() for updates.

47 Another hard case Suppose a group manages red objects and blue objects, and they have nothing to do with one another. Moreover, suppose there is a single member that sends red updates, and a different single member that sends blue updates.
A. We can use g.Send() for the updates.
B. We should use g.OrderedSend() for updates.
Option A is fine. g.Send() will have one order for red updates (based on the FIFO send order) and another one for blue. Thus updates may interleave in different orders at different members, but this won’t matter because the group treats red and blue objects as independent.

48 Another hard case Same setup, but now suppose we also need to do a green operation that looks at both red and blue objects and computes some sort of aggregate statistic over the full set.
A. We can still use g.Send() for the updates, but should use g.OrderedSend() for the green operations.
B. We should use g.OrderedSend() for everything.

49 Another hard case Same setup, but now suppose we also need to do a green operation that looks at both red and blue objects and computes some sort of aggregate statistic over the full set.
A. We can still use g.Send() for the updates, but should use g.OrderedSend() for the green operations.
B. We should use g.OrderedSend() for everything.
Option A won’t work, because OrderedSend isn’t ordered relative to Send. Thus the green operations could “see” an inconsistent state in which the red and blue updates are only partially completed at some members. With option B, every operation uses OrderedSend, so the green operations see a single well-defined state, consistent across the entire group.
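The danger in option A can be shown with a toy replay. Both members below deliver the same red, blue and green messages, and both orders respect each sender's FIFO order. Even so, the green aggregate comes out differently. The `(kind, amount)` message encoding and the `apply` helper are purely illustrative.

```python
def apply(log):
    """Replay a delivery sequence; return the value each green operation observed."""
    state = {"red": 0, "blue": 0}
    observed = []
    for kind, amount in log:
        if kind == "green":
            observed.append(state["red"] + state["blue"])   # aggregate over both sets
        else:
            state[kind] += amount                           # red or blue update
    return observed

red1, blue1, green = ("red", 5), ("blue", 7), ("green", None)

# Each sender sent only one message here, so any interleaving is FIFO-consistent.
member_x = [red1, blue1, green]
member_y = [red1, green, blue1]
print(apply(member_x), apply(member_y))   # [12] [5]: the members disagree
```

Ordering only the green operation does not help, because its position relative to the Send-delivered red and blue updates can still differ between members. If OrderedSend carries everything, every member replays one common sequence and computes the same aggregate.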

50 Which is the most efficient?
– OrderedSend() is slower than Send() in this setting.
– Flush() by the sender is faster than by a receiver.
  – The sender is first to learn that multicasts are stable.
  – A receiver needs to wait until the sender reports stability.
– Best solution: call Flush() before sending YT.
– Mystery: will the new sender delay, waiting to learn about stability of the prior YT?
  – Yes if the code that sends the multicasts is slow enough, but perhaps not if the code is very fast.
  – You may need to experiment to find out!

51 Visualizing the flush delay The acks and the stability report are shown by green dashes in the figure. The g.Flush() needs to wait for that stability report. At the sender, this happens as fast as possible… [Figure: S multicasts M0 and M1, flushes, then sends YT; R then multicasts M2 and M3; acks and the stability report are drawn as green dashes.]

52 Visualizing the flush delay At the receiver in this example, a call to g.Flush() needs to wait for a stability report from the prior sender, so this could be slower. [Figure: S multicasts M0, M1, then YT; R flushes after receiving YT, then multicasts M2 and M3.]

53 Checkpoints True or false: An Isis 2 checkpoint is used to:
– Initialize a joining group member.
– Maintain the group state even across failures in which all group members crash or shut down.

54 Checkpoints True. An Isis 2 checkpoint is used to:
– Initialize a joining group member (state transfer).
– Maintain the group state even across failures in which all group members crash or shut down. For this you must call g.Persistent(filename), specifying a file in which the state can be stored.

55 Checkpoints True. This is important to keep in mind when using a secured Isis 2 process group. g.SetSecure() encrypts data on the wire, but not in memory within the group members, and any checkpoints stored to disk will be unencrypted. Isis 2 assumes that the operating system security model is adequate to protect the contents of the checkpoint file.

56 Checkpoints for State Transfer True or false. Isis 2 prevents race conditions relative to group multicasts that might be occurring just as a member joins. This ensures that the state in a state transfer has every relevant update.

57 Checkpoints for State Transfer True. Isis 2 finishes pending multicasts, then creates a checkpoint for the state transfer. A joining member loads the state transfer before receiving messages in the new view. [Figure: S, P, Q, R timeline.] Even in a very busy group, exactly the right data is included in the state transfer: every prior update is represented, and any future update will reach the new member. The virtual synchrony model gives these terms mathematically rigorous meaning!

58 Checkpoints for State Transfer True or false. With Paxos, the same sort of reconfiguration behavior is provided by the so-called “dynamic membership” feature.

59 Checkpoints for State Transfer False. The classic Paxos protocols do include a dynamic membership option, but messages sent in a past view might become committed in a future view. A recent “virtually synchronous Paxos” model merges the two approaches. Isis 2 implements that model.

60 Relative speed True or false: The g.SafeSend() protocol requires 2 phases for delivery.
– In the first phase, messages are written to logs, which might be stored on disk, and members vote on message ordering.
– In the second phase, ordered, durable delivery occurs.
– In a hidden background step, low-level acknowledgements form a kind of third phase.

61 Relative speed True. Like the Paxos protocol on which SafeSend is based, the Isis 2 protocol is fairly expensive and behaves like a 2-phase commit. It should only be used when a group manages replicated state that lives outside the system, perhaps in replicas of an ACID database. [Figure: S, P, Q, R timeline.] Messages are stored in a log during the first phase, which causes brief delays while waiting for the disk I/O. Ordered delivery occurs in the second phase. Low-level messages are acknowledged in the background, but this is normally not visible to users.

62 Relative speed True or false. Other versions of Paxos don’t cost remotely so much, and they don’t have disk based logs. So the Isis 2 version is unusually expensive.

63 Relative speed False. Paxos is poorly understood even by many practitioners who use prebuilt Paxos implementations in their systems. In fact Paxos always has the kind of structure seen in the g.SafeSend() protocol. The Isis 2 protocol is a faithful implementation of Paxos.

64 Relative speed True or false: g.OrderedSend() is basically just as costly as g.SafeSend(), especially if g.Flush() is called before interacting with external users.

65 Relative speed False. g.OrderedSend() is much cheaper, even with g.Flush() used in the recommended way. In fact g.OrderedSend() is often as fast as g.Send(), as the graphs on this slide show. [Figure: performance graphs for g.Send().]

66 Understanding OrderedSend With a single sender in a group, OrderedSend is mapped directly to Send, so the speed is exactly as on that graph. With two or more senders, incoming messages get queued, and then the leader sends a SETORDER message, using Send.
– Each SETORDER can order many OrderedSends.
– As soon as it arrives, those OrderedSends are dequeued and delivered.
The steady-state speed will be similar to Send, but a little slower than what was seen on the graph.
– This is a continuous concurrent process in applications that send steady rates of multicasts.

67 Relative speed True or false: g.OrderedSend():
1. Delivers “optimistically”, meaning during the first phase, as soon as the ordering is determined.
2. Doesn’t log messages to disk.
3. With a single sender, just runs g.Send(). The more costly ordering decision is only active if there are two or more senders running during some single view of the group.

68 Relative speed True. g.OrderedSend():
1. Delivers “optimistically”, meaning during the first phase, as soon as the ordering is determined. This is why g.Flush(k) is recommended before interacting with an external user.
2. Doesn’t log messages to disk, so no replay of past messages is available. SafeSend, by contrast, maintains a permanent record.
3. With a single sender, just runs g.Send(). The more costly ordering decision is only active if there are two or more senders running during some single view of the group. The protocol counts senders and changes modes if more than one member sends.

69 Relative speed True or false: It is always a good idea to try to replace g.OrderedSend() with g.Send().

70 Relative speed False. Although g.Send() is faster when there are multiple senders (the case in which g.OrderedSend must pick an ordering and tell everyone what it is), g.OrderedSend() always delivers every message in the same order at all members. If you replace g.OrderedSend with g.Send but actually needed identical ordering, your code will be inconsistent. Thus this change must be made carefully.

71 Relative speed True or false: With the g.SafeSend() protocol, if you plan to use it in the soft-state tier, you can configure it to keep the message logs in memory rather than on disk. But for use with some kind of external database replicas (e.g. if the messages are database updates), you need to configure SafeSend to use a disk-based log.

72 Relative speed True. By default, the g.SafeSend log is maintained in memory, but this is not a completely safe option: if the group experiences a total failure, those in-memory logs are lost and hence durability is not maintained. To ensure durability across total failures, the data must be stored on disk. Use the “g.SafeSend disklogger durability method” for this purpose.

73 g.SafeSend() By replicating a file or database and using g.SafeSend(), one can create identical replicas that will be correct even if a total failure of the group occurs, and then a restart. True or false: To use this option, the application must ensure that there is a way to determine which updates were applied to a recovering database replica.

74 g.SafeSend() True. When restarting, Isis 2 will replay the logged SafeSend messages. We want them applied to the database exactly once, so the application must somehow check whether these are repeats of operations the database processed before crashing.
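One common way to make replay safe is to give every update a unique id and to record applied ids in the same transaction as the update itself. The sketch below assumes that scheme. It uses Python's built-in sqlite3 as a stand-in for the replicated database. The update ids are hypothetical application-assigned values, and Isis 2 does not prescribe this particular design.

```python
import sqlite3

def open_replica(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v INTEGER)")
    # Records which (hypothetical, application-assigned) update ids were applied.
    db.execute("CREATE TABLE IF NOT EXISTS applied (update_id TEXT PRIMARY KEY)")
    return db

def apply_update(db, update_id, key, value):
    """Apply a possibly replayed update exactly once; return True if it was applied."""
    with db:   # one transaction: the update and its id commit together, or neither does
        seen = db.execute("SELECT 1 FROM applied WHERE update_id = ?",
                          (update_id,)).fetchone()
        if seen:
            return False                    # a replay of an update already applied
        db.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
        db.execute("INSERT INTO applied (update_id) VALUES (?)", (update_id,))
        return True

db = open_replica()
log = [("u1", "x", 1), ("u2", "x", 2)]
first = [apply_update(db, *u) for u in log]     # normal delivery
replay = [apply_update(db, *u) for u in log]    # replay of the same log after a restart
value = db.execute("SELECT v FROM kv WHERE k = 'x'").fetchone()
print(first, replay, value)                     # [True, True] [False, False] (2,)
```

An in-memory database keeps the demo self-contained. A real replica would use a file path, so that the `applied` table survives the crash along with the data it protects.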

75 g.SafeSend() True or false: By replicating a file or database and using g.SafeSend(), one can create identical replicas that will be correct even if a total failure of the group occurs, and then a restart. This option is valuable even in the “first” or “soft state” tier of the cloud.

76 g.SafeSend() False. The g.SafeSend() multicast is only used when building a durable replicated database with Isis 2, and isn’t meaningful in the so-called soft-state tier of the cloud. In the soft state tier of the cloud, a failure resets the state of the failed node. Any local files or databases are restored to the original state from when the VM image was created.

77 Soft State Tier True or False: The outer tier of the cloud is said to be the soft-state or tier-one layer. Programs in this layer cannot maintain values in variables or in files, and must be purely functional.

78 Soft State Tier False: The term soft-state doesn’t mean that a program can’t have variables or local files. In fact these programs are just like any others, and they run in VM images that include file systems. The issue is that if the cloud shuts such a node down, the state is erased. When the node restarts, it is always restored to the initial state that was originally in the VM.

79 Views and Multicasts True or false: Isis 2 delivers new View events and multicasts on the same single thread (per group). Thus if this thread blocks for any reason (on a lock, or a Semaphore, or even by calling Sleep or doing file I/O), no other group events can occur until the thread unblocks.

80 Views and Multicasts True. A common mistake made by new users of Isis 2 is to write code that blocks or even simply runs very slowly in response to an incoming View or multicast. If you need to do something slow, fork off a new thread. That will maintain “liveness” of the system.
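Here is a minimal Python sketch of the pattern. It assumes a hypothetical `on_multicast` callback that runs on the group's single delivery thread. The handler only enqueues the message and returns, and a background worker does the slow part. The sleep stands in for file I/O or a lock wait.

```python
import queue
import threading
import time

work = queue.Queue()
results = []

def slow_task(msg):
    time.sleep(0.1)                   # stands in for file I/O, a lock wait, etc.
    results.append(msg.upper())

def worker():
    while True:                       # daemon thread: exits with the process
        msg = work.get()
        slow_task(msg)
        work.task_done()

def on_multicast(msg):
    """Hypothetical delivery callback: must return quickly, so it only enqueues."""
    work.put(msg)

threading.Thread(target=worker, daemon=True).start()
start = time.monotonic()
for m in ["a", "b", "c"]:
    on_multicast(m)                   # returns at once; the delivery thread stays live
handler_time = time.monotonic() - start
work.join()                           # wait for the background work (demo only)
print(results, handler_time < 0.1)    # ['A', 'B', 'C'] True
```

The slide suggests forking off a new thread. A single long-lived worker, as used here, also processes the offloaded work in delivery order, whereas a fresh thread per message would not preserve that order.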

81 Views and Multicasts True or false: If processes print the current group View when they receive a multicast, every receiver of the multicast will print the identical View.

82 Views and Multicasts True. In the Isis 2 virtual synchrony model, multicasts are “totally ordered” relative to Views. For example, if process P receives a multicast M when the group membership is, {P,Q,R}, we know that Q and R will also receive M, and that they will see group view {P,Q,R} too.

83 Views and Point-to-Point Messages True or false: Suppose that an application uses the Isis 2 unicast options, P2PSend or P2PQuery. Then these also are ordered relative to views.

84 Views and Point-to-Point Messages True. P2PSend and P2PQuery messages have a sender, and Isis 2 guarantees that the sender will still be a group member when the P2P message delivery occurs. Thus if process S crashes and process P sees the new view, P is certain to never receive a multicast or a P2P message from S in the future. In Isis 2, a process never receives messages “from the dead”.

85 Other options True or false: g.RawSend() is FIFO ordered, but doesn’t attempt to recover lost messages.

86 Other options True. g.RawSend() is FIFO ordered, but doesn’t attempt to recover lost messages. In the event of a network packet loss, a gap can occur. But if M 0 was sent before M 2, and Isis 2 delivers M 2, it will never deliver M 0 later even if a copy somehow shows up “late”.
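The RawSend rule, FIFO without recovery, reduces to remembering the highest sequence number delivered from each sender. Below is a hypothetical sketch of that rule, not Isis 2 code, replaying the slide's scenario: M0 is delayed, M2 is delivered, and the late M0 is then dropped.

```python
class RawFifoReceiver:
    """Sketch of 'FIFO but no recovery': deliver anything newer, drop anything older."""

    def __init__(self):
        self.last = {}            # sender -> highest sequence number delivered
        self.delivered = []

    def receive(self, sender, seq, msg):
        if seq <= self.last.get(sender, -1):
            return False          # late or duplicate copy: never delivered
        self.last[sender] = seq   # gaps are accepted; nothing is retransmitted
        self.delivered.append(msg)
        return True

r = RawFifoReceiver()
r.receive("S", 1, "M1")           # M0 is delayed in the network: a gap
r.receive("S", 2, "M2")
late = r.receive("S", 0, "M0")    # the late copy of M0 shows up: dropped
print(r.delivered, late)          # ['M1', 'M2'] False
```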

87 Other options True or false: g.CausalSend() has an ordering guarantee that can be described as the transitive closure of the FIFO property.

88 Other options True. g.CausalSend() guarantees that if M was sent before M’, by any group member, then M’ will be delivered after M. This is a version of Lamport’s happens-before relationship, sometimes written M → M’.
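Causal order is classically implemented with vector clocks: a message is held back until everything that causally precedes it has been delivered. The sketch below uses that textbook scheme for illustration. It is not necessarily how Isis 2 implements g.CausalSend(). In the scenario, S sends M, and P delivers M before sending M’, so M → M’. M’ reaches R first, but R still delivers M before M’.

```python
class CausalMember:
    """Textbook vector-clock causal delivery (assumed scheme, not Isis 2 code)."""

    def __init__(self, name, names):
        self.name = name
        self.vc = {n: 0 for n in names}   # count of messages delivered per sender
        self.held = []                    # (sender, stamp, msg) not yet deliverable
        self.delivered = []

    def send(self, msg):
        self.vc[self.name] += 1           # our own send counts as delivered locally
        self.delivered.append(msg)
        return (self.name, dict(self.vc), msg)

    def _deliverable(self, sender, stamp):
        # Next message from this sender, and nothing it depends on is missing.
        return (stamp[sender] == self.vc[sender] + 1 and
                all(stamp[n] <= self.vc[n] for n in self.vc if n != sender))

    def receive(self, packet):
        self.held.append(packet)
        progress = True
        while progress:                   # one delivery may unblock held messages
            progress = False
            for pkt in list(self.held):
                sender, stamp, msg = pkt
                if self._deliverable(sender, stamp):
                    self.held.remove(pkt)
                    self.vc[sender] += 1
                    self.delivered.append(msg)
                    progress = True

names = ["S", "P", "R"]
s, p, r = (CausalMember(n, names) for n in names)
m = s.send("M")          # S sends M
p.receive(m)             # P delivers M ...
m2 = p.send("M'")        # ... then sends M', so M -> M'
r.receive(m2)            # M' reaches R first and is held back
r.receive(m)             # M arrives: R delivers M, then M'
print(r.delivered)       # ['M', "M'"]
```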

