Distributed Shared Memory for Large-Scale Dynamic Systems. Vincent Gramoli, supervised by Michel Raynal. November 22, 2007 (61 slides).


1 Distributed Shared Memory for Large-Scale Dynamic Systems (Vincent Gramoli, supervised by Michel Raynal; November 22, 2007)

2 My Thesis: Implementing a distributed shared memory for large-scale dynamic systems

3 My Thesis: Implementing a distributed shared memory for large-scale dynamic systems is NECESSARY,

4 My Thesis: Implementing a distributed shared memory for large-scale dynamic systems is NECESSARY, DIFFICULT,

5 My Thesis: Implementing a distributed shared memory for large-scale dynamic systems is NECESSARY, DIFFICULT, DOABLE!

6 RoadMap: Necessary? Communicating in Large-Scale Systems; An Example of Distributed Shared Memory; Difficult? Facing Dynamism is not trivial; Difficult? Facing Scalability is tricky too; Doable? Yes, here is a solution!; Conclusion

7 RoadMap: Necessary? Communicating in Large-Scale Systems; An Example of Distributed Shared Memory; Difficult? Facing Dynamism is not trivial; Difficult? Facing Scalability is tricky too; Doable? Yes, here is a solution!; Conclusion

8 Distributed Systems Enlarge
– Internet explosion
– IPv4 -> IPv6
– Multiplication of personal devices
– 17 billion network devices by 2012 (IDC prediction)

9 Distributed Systems are Dynamic
Independent computational entities act asynchronously and are affected by unpredictable events (joins and departures). These sporadic activities make the system dynamic.

10 Massively Accessed Applications
Web services use large amounts of information:
– eBay: auctioning service (increase an auction)
– Wikipedia: collaborative encyclopedia (modify an article)
– LastMinute: booking application (reserve tickets)
…but they require too much power and cost too much.

11 Massively Distributed Applications
Peer-to-peer applications share resources:
– BitTorrent: file sharing
– Skype: voice over IP
– Joost: video streaming
…but they prevent large-scale collaboration.
[Figure labels: copy, exchange, create]

12 Filling the Gap is Necessary
Providing distributed applications where entities (nodes) can fully collaborate:
– P2Pedia: using P2P to build a collaborative encyclopedia
– P2P eBay: using P2P as an auctioning service

13 There are 2 Ways of Collaborating
Using a Shared Memory
– A node writes information in the memory
– Another node reads information from the memory
Using Message Passing
– A node sends a message to another node
– The second node receives the message from the first
[Figure: shared memory (Memory, Node 1, Node 2, Node 3; Write v, Read v) versus message passing (Node 1, Node 2, Node 3; Send v, Recv v)]

14 Shared Memory is Easier to Use
Shared Memory is easy to use
– If information is written, collaboration progresses!
Message Passing is difficult to use
– To which node should the information be sent?

15 Message Passing Tolerates Failures
Shared Memory is failure-prone
– Communication relies on memory availability
Message Passing is fault-tolerant
– As long as there is a way to route a message
[Figure: shared memory (Memory, Node 1, Node 2, Node 3; Write v, Read v) versus message passing (Node 1, Node 2, Node 3; Send v, Recv v)]

16 The Best of the 2 Ways
Distributed Shared Memory (DSM)
– emulates a Shared Memory to provide simplicity,
– in the Message Passing model to tolerate failures.
[Figure: the DSM accepts read / write(v) operations and returns read-ack(v) / write-ack]

17 RoadMap: Necessary? Communicating in Large-Scale Systems; An Example of Distributed Shared Memory; Difficult? Facing Dynamism is not trivial; Difficult? Facing Scalability is tricky too; Doable? Yes, here is a solution!; Conclusion

18 Our DSM Consistency: Atomicity
Atomicity (Linearizability) defines an operation ordering:
– If an operation ends before another starts, then it cannot be ordered after that other operation
– Write operations are totally ordered and read operations are ordered with respect to write operations
– A read returns the last value written (or the default one if none exists)

19 Quorum-based DSM
Quorums: mutually intersecting sets of nodes
Ex. 3 quorums of size q = 2, with memory size m = 3: Q1 ∩ Q2 ≠ Ø, Q1 ∩ Q3 ≠ Ø, Q2 ∩ Q3 ≠ Ø
Each node of the quorums maintains:
– A local value v of the object
– A unique tag t, the version number of this value
Reference: H. Attiya, A. Bar-Noy, D. Dolev, "Sharing Memory Robustly in Message-Passing Systems", JACM 1995
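The intersection property of this example can be checked mechanically. The following is a minimal sketch, assuming three hypothetical node names `a`, `b`, `c`; the helper `is_quorum_system` is illustrative and does not come from the talk.

```python
from itertools import combinations

def is_quorum_system(quorums):
    """Return True if every pair of quorums shares at least one node."""
    return all(q1 & q2 for q1, q2 in combinations(quorums, 2))

# m = 3 nodes, 3 quorums of size q = 2 (node names are hypothetical).
Q1, Q2, Q3 = {"a", "b"}, {"b", "c"}, {"a", "c"}
print(is_quorum_system([Q1, Q2, Q3]))         # True
print(is_quorum_system([{"a"}, {"b", "c"}]))  # False: the two sets are disjoint
```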

20 Quorum-based DSM
Read and write operations
– A node i reads the object value $v_k$ by
  (Get) Asking $v_j$ and $t_j$ from each node j of a quorum
  Choosing the value $v_k$ with the largest tag $t_k$
  (Set) Replicating $v_k$ and $t_k$ to all nodes of a quorum
– A node i writes a new object value $v_n$ by
  (Get) Asking $t_j$ from each node j of a quorum
  Choosing a tag $t_n$ larger than any $t_j$ returned, e.g. $t_n = t_k + 1$
  (Set) Replicating $v_n$ and $t_n$ to all nodes of a quorum
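The two-phase Get/Set pattern of these operations can be sketched in a few lines of Python. This is only an illustration under strong assumptions: a single process runs operations one after another, no node fails, and tags are plain integers without writer identifiers. The names `Replica`, `get`, `set_`, `read` and `write` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    value: object = None  # local value v of the object
    tag: int = 0          # version number t of that value

def get(quorum):
    """Get phase: return the (tag, value) pair with the largest tag in the quorum."""
    best = max(quorum, key=lambda r: r.tag)
    return best.tag, best.value

def set_(quorum, tag, value):
    """Set phase: replicate (tag, value) on every quorum node holding an older tag."""
    for r in quorum:
        if tag > r.tag:
            r.tag, r.value = tag, value

def read(get_quorum, set_quorum):
    tag, value = get(get_quorum)
    set_(set_quorum, tag, value)  # write back so later reads cannot see older values
    return value

def write(get_quorum, set_quorum, value):
    tag, _ = get(get_quorum)
    set_(set_quorum, tag + 1, value)  # t_n = t_k + 1

nodes = [Replica(), Replica(), Replica()]
Q1, Q2, Q3 = nodes[0:2], nodes[1:3], [nodes[0], nodes[2]]
write(Q1, Q2, "v1")
print(read(Q3, Q1))  # v1
```

Because any two quorums intersect, the Get phase of the read always meets at least one replica updated by the preceding write, whichever quorums are picked.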

21 Quorum-based DSM: Reading a value
[Figure: quorums Q1, Q2, Q3; messages "value? tag?" and "v1, t1"]

22 Quorum-based DSM: Reading a value
[Figure: quorums Q1, Q2, Q3; message "v1, t1"]

23 Quorum-based DSM: Reading a value
[Figure: quorums Q1, Q2, Q3; Output: v1]

24 Quorum-based DSM: Writing a value v2
[Figure: quorums Q1, Q2, Q3; Input: v2]

25 Quorum-based DSM: Writing a value v2
[Figure: quorums Q1, Q2, Q3; messages "max tag?" and "t1"]

26 Quorum-based DSM: Writing a value v2
[Figure: quorums Q1, Q2, Q3; message "v2, t2" (with t2 > t1)]

27 Quorum-based DSM
Works well in static systems
– The number of failures f must satisfy f ≤ m - q
– All operations can access a quorum
[Figure: quorums Q1, Q2, Q3 with Q1 ∩ Q2 ≠ Ø and Q2 ∩ Q3 ≠ Ø]
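The bound f ≤ m - q can be checked by brute force on the running example. The sketch below assumes that every subset of q nodes is a quorum (true for the m = 3, q = 2 example, but an assumption in general); the node names and the helper `live_quorum_exists` are hypothetical.

```python
from itertools import combinations

def live_quorum_exists(quorums, failed):
    """True if at least one quorum contains no failed node."""
    return any(not (q & failed) for q in quorums)

nodes = ["a", "b", "c"]                              # m = 3
q = 2
quorums = [set(c) for c in combinations(nodes, q)]   # every q-subset is a quorum

for f in range(len(nodes) + 1):
    # Does a quorum survive whatever set of f nodes fails?
    ok = all(live_quorum_exists(quorums, set(failed))
             for failed in combinations(nodes, f))
    print(f, ok)   # True for f <= m - q = 1, False for f >= 2
```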

28 Quorum-based DSM
Does not work in dynamic systems
– All quorums may fail if failures are unbounded
[Figure: quorums Q1, Q2, Q3. Problem: Q1 ∩ Q2 = Ø and Q1 ∩ Q3 = Ø and Q2 ∩ Q3 = Ø]

29 RoadMap: Necessary? Communicating in Large-Scale Systems; An Example of Distributed Shared Memory; Difficult? Facing Dynamism is not trivial; Difficult? Facing Scalability is tricky too; Doable? Yes, here is a solution!; Conclusion

30 Reconfiguring
Dynamism produces an unbounded number of failures
Solution: Reconfiguration
– Replacing the quorum configuration periodically
[Figure: quorums Q1, Q2, Q3. Problem: Q1 ∩ Q2 = Ø and Q1 ∩ Q3 = Ø and Q2 ∩ Q3 = Ø]

31 Agreeing on the Configuration
All must agree on the next configuration
– Quorum-based consensus algorithm: Paxos
Previously, a consensus block complemented the DSM service:
Paxos, a 3-phase leader-based algorithm
1. Prepare a ballot (2 message delays)
2. Propose a configuration to install (2 message delays)
3. Propagate the decided configuration (1 message delay)
Reference: N. Lynch, A. Shvartsman, "RAMBO: Reconfigurable Atomic Memory Service for Dynamic Networks", DISC 2002

32 RDS: Reconfigurable Distributed Storage
RDS integrates the consensus service into the reconfigurable DSM
Fast version of Paxos:
– Remove the first phase (in some cases)
– Quorums also propagate the configuration
Ensuring Read/Write Atomicity:
– Piggyback object information onto Paxos messages
Parallelizing Obsolete Configuration Removal:
– Add an additional message to the propagate phase of Paxos

33 Contributions
Operations are fast (sometimes optimal)
– 1 to 2 message delays
Reconfiguration is fast (fault-tolerance)
– 3 to 5 message delays
While:
– Operation atomicity and
– Operation independence are preserved
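The "3 to 5 message delays" range for reconfiguration follows from the per-phase costs given on the Paxos slide (2 + 2 + 1), with the first phase removed in the fast case. A small arithmetic sketch, where the helper name `reconfiguration_delays` is hypothetical:

```python
# Message delays per Paxos phase, as listed on the Paxos slide of this talk.
PHASE_DELAYS = {"prepare": 2, "propose": 2, "propagate": 1}

def reconfiguration_delays(skip_prepare: bool) -> int:
    """Total message delays of one reconfiguration (illustrative helper)."""
    phases = ["propose", "propagate"]
    if not skip_prepare:
        phases.insert(0, "prepare")
    return sum(PHASE_DELAYS[p] for p in phases)

print(reconfiguration_delays(skip_prepare=False))  # 5
print(reconfiguration_delays(skip_prepare=True))   # 3
```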

34 Facing Dynamism
Reference: G. Chockler, S. Gilbert, V. Gramoli, P. Musial, A. Shvartsman, "Reconfigurable Distributed Storage", Proceedings of OPODIS 2005

35 RoadMap: Necessary? Communicating in Large-Scale Systems; An Example of Distributed Shared Memory; Difficult? Facing Dynamism is not trivial; Difficult? Facing Scalability is tricky too; Doable? Yes, here is a solution!; Conclusion

36 Facing Scalability is Difficult
Problems:
– Large-scale participation induces load
– When load is too high, requests can be lost
– Bandwidth resources are limited
Goal: Tolerate load by preventing communication overhead
Solution: A DSM that adapts to load variations and that restricts communication

37 Using Logical Overlay
Object replicas $r_1, \dots, r_k$ share a 2-dimensional coordinate space
[Figure: coordinate space divided among replicas $r_1, r_2, \dots, r_{k-1}, r_k$]

38 Benefiting from Locality
Each replica $r_i$ can communicate only with its nearest neighbors
[Figure: replica $r_i$ and its neighbors]

39 Repairing the Overlay
Topology takeover mechanism
If a node $r_i$ fails, a takeover node $r_j$ replaces it
[Figure: replicas $r_i$ and $r_j$]
Reference: S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker, "A Scalable Content-Addressable Network", SIGCOMM 2001

40 Dynamic Bi-Quorums
Bi-Quorums:
– Quorums of two types, where not all quorums intersect
– Quorums of different types intersect
Vertical Quorum: all replicas responsible for an abscissa x
Horizontal Quorum: all replicas responsible for an ordinate y
For any horizontal quorum H and any vertical quorum V: H ∩ V ≠ Ø
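The bi-quorum property is easy to see on a regular grid, where a horizontal quorum is a row and a vertical quorum is a column. The sketch below assumes such a 3×3 grid with hypothetical replica names; the overlay in the talk divides a coordinate space among replicas and need not be a regular grid.

```python
def horizontal_quorum(grid, y):
    """All replicas responsible for ordinate y (one row of the grid)."""
    return set(grid[y])

def vertical_quorum(grid, x):
    """All replicas responsible for abscissa x (one column of the grid)."""
    return {row[x] for row in grid}

# 3x3 grid of hypothetical replica names r1..r9.
grid = [[f"r{3 * y + x + 1}" for x in range(3)] for y in range(3)]

# Quorums of different types always intersect ...
print(horizontal_quorum(grid, 0) & vertical_quorum(grid, 1))    # {'r2'}
# ... but two distinct quorums of the same type do not.
print(horizontal_quorum(grid, 0) & horizontal_quorum(grid, 1))  # set()
```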

41 Operation Execution
Read Operation:
1) Get the up-to-date value and largest tag on a horizontal quorum,
2) Propagate this value and tag on a vertical quorum.
Write Operation:
1) Get the up-to-date value and largest tag on a horizontal quorum,
2) Propagate the value to write (and a higher tag) twice on the same vertical quorum.
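These two operations can be sketched on a regular grid of replicas, with rows as horizontal quorums and columns as vertical quorums. This is a sequential, failure-free illustration with hypothetical names (`Replica`, `consult`, `propagate`); in such a setting the second propagation of a write changes nothing, and it is kept only to mirror the two rounds described on the slide.

```python
class Replica:
    def __init__(self):
        self.tag, self.value = 0, None

def row(grid, y):
    return grid[y]                       # horizontal quorum

def col(grid, x):
    return [r[x] for r in grid]          # vertical quorum

def consult(quorum):
    """Return the largest tag in the quorum and its associated value."""
    best = max(quorum, key=lambda r: r.tag)
    return best.tag, best.value

def propagate(quorum, tag, value):
    """Install (tag, value) on every quorum replica holding an older tag."""
    for r in quorum:
        if tag > r.tag:
            r.tag, r.value = tag, value

def read(grid, x, y):
    tag, value = consult(row(grid, y))
    propagate(col(grid, x), tag, value)
    return value

def write(grid, x, y, value):
    tag, _ = consult(row(grid, y))
    for _ in range(2):                   # two propagation rounds, same vertical quorum
        propagate(col(grid, x), tag + 1, value)

grid = [[Replica() for _ in range(3)] for _ in range(3)]
write(grid, x=0, y=2, value="v1")
print(read(grid, x=1, y=0))  # v1
```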

42 Load Adaptation
Thwart: requests follow the diagonal until a non-overloaded node is found.
Expansion: a node is added to the memory if no non-overloaded node is found.
Shrink: if underloaded, a node leaves the memory after having notified its neighbors.

43 Contributions
SQUARE is a DSM that:
– Scales well by tolerating load variations
– Defines load-optimal quorums (under a reasonable assumption)
– Uses communication-efficient reconfiguration

44 Operation Latency
Bad News: The operation latency increases with the load (request rate)

Request rate   Memory size   Read latency   Write latency
100            10            479            733
125            14            622            812
250            24            1132           1396
500            46            1501           2173
1000           98            2408           3501

45 Facing Scalability is Difficult
References:
– E. Anceaume, M. Gradinariu, V. Gramoli, A. Virgillito, "P2P Architecture for Self-* Atomic Memory", Proceedings of ISPAN 2005
– V. Gramoli, E. Anceaume, A. Virgillito, "SQUARE: Scalable Quorum-Based Atomic Memory with Local Reconfiguration", Proceedings of ACM SAC 2007

46 RoadMap: Necessary? Communicating in Large-Scale Systems; An Example of Distributed Shared Memory; Difficult? Facing Dynamism is not trivial; Difficult? Facing Scalability is tricky too; Doable? Yes, here is a solution!; Conclusion

47 Probability for Modeling Reality
Motivations for Probabilistic Solutions:
– Trade-offs prevent deterministic solutions from being efficient
– Allowing more realistic models:
  Any node can fail independently
  Even if it is unlikely that many nodes fail at the same time

48 What is Churn?
Churn is the dynamism intensity!
Dynamic System:
– n interconnected nodes
– Nodes join/leave the system
– A joining node is new
– Here, we model the churn simply as c:
  At each time unit, cn nodes leave the network
  At each time unit, cn nodes enter the network
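A short simulation makes the effect of this churn model concrete. It assumes, in addition to the slide, that the cn departing nodes are chosen uniformly at random at each time unit; under that assumption the expected fraction of initial nodes still present after Δ time units is $(1-c)^{\Delta}$. The function name `surviving_fraction` is hypothetical.

```python
import random

def surviving_fraction(n, c, delta, seed=0):
    """Simulate delta time units of churn; return the fraction of initial nodes still present."""
    rng = random.Random(seed)
    nodes = list(range(n))           # ids 0..n-1 are the initial nodes
    next_id = n
    per_step = round(c * n)          # cn nodes leave and cn new nodes join per time unit
    for _ in range(delta):
        rng.shuffle(nodes)
        nodes = nodes[per_step:]     # cn uniformly chosen nodes leave
        nodes.extend(range(next_id, next_id + per_step))  # cn fresh nodes join
        next_id += per_step
    return sum(1 for i in nodes if i < n) / n

n, c, delta = 10_000, 0.01, 50
print(surviving_fraction(n, c, delta))  # simulated fraction
print((1 - c) ** delta)                 # expected fraction under the uniformity assumption
```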

49 Relaxing Consistency
Every operation verifies all atomicity rules with high probability!
Unsuccessful operation: an operation that violates at least one of those rules
Probabilistic Atomicity:
1. If an operation Op1 ends before another operation Op2 starts, then Op1 is ordered after Op2 with probability $\varepsilon = e^{-\beta^2}$ (with β a constant). (If this happens, operation Op2 is considered unsuccessful.)
2. Write operations are totally ordered and read operations are ordered w.r.t. write operations.
3. A read returns the last successfully written value (or the default one if none exists) with probability $1 - e^{-\beta^2}$ (with β a constant). (If this does not hold, then the read is unsuccessful.)

50 TQS: Timed Quorum System
Intersection is provided during a bounded period of time with high probability
Gossip-based algorithm in parallel
– Shuffle the set of neighbors using a gossip-based algorithm
Traditional read/write operations using two message round trips between the client and a quorum
– Consult value and tag from a quorum
– Create a new larger tag (if write)
– Propagate value and tag to a quorum

51 TQS: Timed Quorum System
Contacting a quorum:
– Disseminate a message with TTL l to k neighbors.
– Decrement the TTL of a received message if it is received for the first time.
– Forward received messages to k neighbors if their TTL is not null.
– So that at the end, we have #contacted nodes = [formula not preserved in the transcript], with Δ the maximum period of time between 2 successful operations.
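The dissemination rule above can be simulated to see how many distinct nodes end up contacted. The sketch assumes that "k neighbors" are k nodes drawn uniformly at random (in line with the uniformity the gossip protocol is meant to provide) and adds a `max_messages` cap purely so the simulation always terminates; both are assumptions, as is the name `contact_quorum`. It does not reproduce the closed-form count from the slide.

```python
import random
from collections import deque

def contact_quorum(n, k, ttl, seed=0, max_messages=100_000):
    """Return the set of nodes contacted by TTL-bounded dissemination among n nodes."""
    rng = random.Random(seed)
    contacted = set()
    # Each in-flight message is a (destination, remaining TTL) pair.
    in_flight = deque((rng.randrange(n), ttl) for _ in range(k))
    sent = k
    while in_flight and sent < max_messages:
        node, t = in_flight.popleft()
        if node not in contacted:
            contacted.add(node)
            t -= 1                      # decrement only on first reception
        if t > 0:                       # forward while the TTL is not null
            for _ in range(k):
                in_flight.append((rng.randrange(n), t))
            sent += k
    return contacted

quorum = contact_quorum(n=10_000, k=3, ttl=3)
print(len(quorum))  # at least k + k^2 + k^3 = 39 when the cap is not reached
```

Because the TTL is not decremented on duplicate receptions, duplicates are re-forwarded rather than wasted, which is why the number of distinct contacted nodes cannot fall below k + k² + … + k^l once dissemination completes.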

52 Complexity of our Implementation
Assumptions:
– At least one operation succeeds every Δ time units
– Gossip-based protocol provides uniformity
Operation Time Complexity (in expectation): [formula not preserved in the transcript]
…where $D = (1-c)^{-\Delta}$ is the dynamic parameter

53 Complexity of our Implementation
Operation Communication Complexity (in expectation): [formula not preserved in the transcript]
…where $D = (1-c)^{-\Delta}$ is the dynamic parameter

54 Complexity of our Implementation
Operation Communication Complexity (in expectation): [formula not preserved in the transcript]
…where $D = (1-c)^{-\Delta}$ is the dynamic parameter
If D is a constant, then it reaches the communication complexity of static systems presented in:
D. Malkhi, M. Reiter, A. Wool, R. Wright, "Probabilistic Quorum Systems", Information and Computation, 2001

55 Probability of Success
[Plot: probability of non-intersection versus quorum size, with curves for 10%, 30%, 50%, 70% and 90% of failures; n = 10,000]
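The shape of such a curve can be reproduced for the failure-free case. The sketch below computes the exact probability that two quorums of size q, each drawn uniformly at random among n nodes, do not intersect, and compares it with the upper bound $e^{-q^2/n}$. It ignores failures and churn, so it does not regenerate the plotted curves; the function name `p_disjoint` is hypothetical. Taking $q = \beta\sqrt{n}$ turns the bound into $e^{-\beta^2}$; linking this to the ε of the probabilistic atomicity slide is an assumption of this sketch.

```python
import math

def p_disjoint(n, q):
    """Exact probability that two independent uniform q-subsets of n nodes are disjoint."""
    if 2 * q > n:
        return 0.0                      # two such subsets must overlap
    p = 1.0
    for i in range(q):
        # i-th node of the second quorum must avoid the q nodes of the first one
        p *= (n - q - i) / (n - i)
    return p

n = 10_000
for q in (100, 200, 300):               # q = beta * sqrt(n) with beta = 1, 2, 3
    print(q, p_disjoint(n, q), math.exp(-q * q / n))
```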

56 Contributions
TQS relies on timely and probabilistic intersections
– Operation latency is low
– Operation communication complexity is low
– No reconfigurations are needed
– Replication is inherently done by the operations
– Atomicity is ensured with high probability

57 A DSM to face Scalability and Dynamism
References:
– V. Gramoli, A.-M. Kermarrec, A. Mostéfaoui, M. Raynal, B. Sericola, "Core Persistence in Peer-to-Peer Systems: Relating Size to Lifetime", Proceedings of RDDS 2006 (in conjunction with OTM 2006)
– V. Gramoli, M. Raynal, "Timed Quorum Systems for Large-Scale and Dynamic Environments", Proceedings of OPODIS 2007

58 RoadMap: Necessary? Communicating in Large-Scale Systems; An Example of Distributed Shared Memory; Difficult? Facing Dynamism is not trivial; Difficult? Facing Scalability is tricky too; Doable? Yes, here is a solution!; Conclusion

59 Conclusion
We have presented three DSMs:
– Dynamism: RDS
– Scalability: SQUARE
– Dynamism and Scalability: TQS

60 Conclusion

Solution   Latency   Communication   Guarantee
RDS        Low       High            Safe
SQUARE     High      Low             Safe
TQS        Low       Low             High probability

61 Open Questions
Could we still speed up operations?
– Disseminating continuously up-to-date values
– Consulting values that have already been aggregated
How to model dynamism?
– Differing results for P2P file sharing
– What would it be for different applications?

62 END

63 Load Balancing
Good News: The load is well-balanced over the replicas

64 Load Adaptation
Good News: The memory self-adapts well in the face of dynamism

65 Reconfigurable Distributed Storage
1. Prepare phase:
a) The leader creates a new ballot and sends it to quorums
b) A quorum of nodes send back their candidate configuration
The leader chooses the configuration for the ballot
2. Propose phase:
a) The leader sends the ballot and its configuration to quorums
The leader sends its tag and value and adds the current configuration
b) A quorum of nodes can send their ballot vote, their tag and value to quorums
These quorum nodes decide the next configuration
3. Propagate phase:
a) These quorum nodes propagate the decided configuration to quorums
b) These quorum nodes remove the old configuration, if not done already

