A Torus Quorum Protocol for Distributed Mutual Exclusion
S.D. Lang and L.J. Mao
School of Computer Science, University of Central Florida
Presented by Jun Zhang
The Problem of Mutual Exclusion
Exclusive access to a shared resource is sometimes essential for consistency.
The problem of mutual exclusion is to grant exclusive access to a shared resource whenever a requester asks for it.
Two Major Approaches
Token-based: The system uses a unique token to represent the privilege for the node possessing it to enter the critical section.
Permission-based: A node wishing to enter the critical section needs to request permission from a subset of other nodes, called a request set or quorum.
Intuition behind Quorums
Any pair of mutual exclusion requests must be arbitrated so that only one of the requesting nodes is given access.
The system consists entirely of identical nodes, which must share the responsibility for mutual exclusion.
Thus, any two requests must reach at least one common node.
This implies that the quorums of any two nodes i and j, denoted Q_i and Q_j, must have a non-empty intersection.
Algorithm Intuition
If node i can lock all members of its quorum, then no other node can lock all members of its own quorum, since that quorum shares at least one node with i's quorum.
If a node fails to lock all members of its quorum, it waits until all of them are freed and then locks them.
To prevent deadlocks, nodes get a priority based on the timestamp of their request.
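The timestamp priority can be illustrated with a minimal Python sketch of a single quorum member acting as an arbiter. The class name `Arbiter`, its methods, and the use of the node identifier as a tie-breaker are assumptions made for illustration and are not taken from the slides. The sketch deliberately omits the message exchange a full protocol needs to recover a lock that was already granted to a lower-priority request.

```python
import heapq


class Arbiter:
    """One quorum member: grants its single lock to one requester at a time."""

    def __init__(self):
        self.holder = None   # (timestamp, node_id) of the request holding the lock
        self.waiting = []    # min-heap of pending requests

    def request(self, timestamp, node_id):
        """Return True if the lock is granted immediately, else queue the request."""
        req = (timestamp, node_id)  # earlier timestamp wins; node_id breaks ties
        if self.holder is None:
            self.holder = req
            return True
        heapq.heappush(self.waiting, req)
        return False

    def release(self):
        """Free the lock and hand it to the highest-priority waiter, if any."""
        self.holder = heapq.heappop(self.waiting) if self.waiting else None
        return self.holder


a = Arbiter()
print(a.request(5, "B"))   # True: lock was free
print(a.request(3, "A"))   # False: queued behind the current holder
print(a.request(4, "C"))   # False: queued
print(a.release())         # (3, 'A'): the earliest timestamp is served first
```

A requesting node would enter the critical section only after every arbiter in its quorum has granted it the lock.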
Grid Quorum
Quorum size W + H - 1
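For comparison with the torus construction, here is a minimal Python sketch of a grid quorum. It assumes the usual construction in which a node's quorum is the union of its row and its column; the slide itself states only the size. The function name `grid_quorum` and the 0-based indexing are illustrative choices.

```python
def grid_quorum(h, w, i, j):
    """Quorum of node (i, j) in an h x w grid: all of row i plus all of column j.

    Assumes the usual row-plus-column construction; indices are 0-based.
    """
    row = {(i, c) for c in range(w)}
    col = {(r, j) for r in range(h)}
    return row | col  # node (i, j) is counted once, hence w + h - 1 nodes


h, w = 4, 5
q1 = grid_quorum(h, w, 0, 0)
q2 = grid_quorum(h, w, 3, 4)
print(len(q1))         # 8, i.e. w + h - 1
print(sorted(q1 & q2)) # [(0, 4), (3, 0)]: each row meets the other's column
```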
Torus Grid
Definition 1. A torus grid of size h x w consists of a rectangular array of h rows and w columns, in which for 1 ≤ i ≤ h - 1, row i is followed by row i + 1, and row h is followed by row 1 using wraparound. Similarly, for 1 ≤ j ≤ w - 1, column j is followed by column j + 1, and column w is followed by column 1 using wraparound.
Torus Quorum
Definition 2. A torus quorum in an h x w torus grid is a set of w + floor(h / 2) nodes, consisting of one entire row (say, row j), plus one node from each of the floor(h / 2) rows following row j, using wraparound at the end of the grid. We call the row portion of a quorum its head, and the portion consisting of one node from each of the floor(h / 2) succeeding rows the quorum's tail.
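Definition 2 can be turned into a short enumeration. The Python sketch below is illustrative only: the function name `torus_quorums` is an assumption, and rows and columns are numbered from 0 rather than 1, so that "the row following row r" is simply (r + 1) % h.

```python
from itertools import product


def torus_quorums(h, w):
    """Yield every torus quorum of an h x w torus grid as a frozenset of (row, col).

    Rows and columns are 0-based, so the row following row r is (r + 1) % h.
    """
    t = h // 2  # tail length, floor(h / 2)
    for head_row in range(h):
        head = [(head_row, c) for c in range(w)]
        tail_rows = [(head_row + k) % h for k in range(1, t + 1)]
        # One column choice per tail row.
        for cols in product(range(w), repeat=t):
            yield frozenset(head + list(zip(tail_rows, cols)))


quorums = list(torus_quorums(4, 3))
print(len(quorums))        # 36 = h * w**floor(h/2) = 4 * 3**2
print(sorted(quorums[0]))  # [(0, 0), (0, 1), (0, 2), (1, 0), (2, 0)]
```

Each quorum printed this way has w + floor(h / 2) nodes: the head row followed by one node in each of the next floor(h / 2) rows.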
Properties of Torus Quorums
1. (Non-empty intersection) Any two torus quorums have a non-empty intersection.
2. (Minimality) No quorum is a proper subset of another quorum.
3. (Equal-sized) All torus quorums have the same size, w + floor(h / 2).
4. (Equal-responsibility) Each node belongs to exactly (w + floor(h / 2)) * w^(floor(h / 2) - 1) quorums.
5. (Optimal quorum size) For n = h * w nodes, the optimal (minimum) quorum size is approximately sqrt(2n), which is obtained by choosing h = 2w.
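The intersection, minimality, and equal-size properties can be checked by brute force on small grids. The Python sketch below is a sanity check under stated assumptions, not a proof: the helper names `torus_quorums` and `check_properties` are illustrative, and indices are 0-based.

```python
from itertools import combinations, product


def torus_quorums(h, w):
    """Return all torus quorums of an h x w torus grid (0-based indices)."""
    t = h // 2
    out = []
    for r in range(h):
        head = [(r, c) for c in range(w)]
        rows = [(r + k) % h for k in range(1, t + 1)]
        for cols in product(range(w), repeat=t):
            out.append(frozenset(head + list(zip(rows, cols))))
    return out


def check_properties(h, w):
    """Return (non-empty intersection, minimality, equal-sized) as booleans."""
    qs = torus_quorums(h, w)
    intersect = all(a & b for a, b in combinations(qs, 2))
    minimal = not any(a < b or b < a for a, b in combinations(qs, 2))
    equal_sized = {len(q) for q in qs} == {w + h // 2}
    return intersect, minimal, equal_sized


print(check_properties(4, 3))  # (True, True, True)
print(check_properties(5, 2))  # (True, True, True)
```

The intersection holds because, for any two head rows, one of them lies within floor(h / 2) rows after the other, so one quorum's tail touches the other quorum's head.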
Analysis of Property 4
For any given node (i, j), 1 ≤ i ≤ h, 1 ≤ j ≤ w:
The number of quorums whose head is row i (each of which must contain node (i, j)) = w^floor(h / 2)   (1)
The number of quorums whose head is not row i but which contain node (i, j) = floor(h / 2) * w^(floor(h / 2) - 1)   (2)
Adding (1) and (2), we get
w * w^(floor(h / 2) - 1) + floor(h / 2) * w^(floor(h / 2) - 1) = (w + floor(h / 2)) * w^(floor(h / 2) - 1)
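The count can be confirmed numerically by enumerating every quorum of a small torus grid and tallying how often each node appears. This Python sketch is illustrative; the function name `membership_counts` and the 0-based indexing are assumptions.

```python
from collections import Counter
from itertools import product


def membership_counts(h, w):
    """Count, by enumeration, how many torus quorums contain each node."""
    t = h // 2
    counts = Counter()
    for r in range(h):
        rows = [(r + k) % h for k in range(1, t + 1)]
        for cols in product(range(w), repeat=t):
            counts.update((r, c) for c in range(w))  # head: the whole row r
            counts.update(zip(rows, cols))           # tail: one node per row
    return counts


h, w = 4, 3
t = h // 2
expected = (w + t) * w ** (t - 1)      # (3 + 2) * 3 = 15
counts = membership_counts(h, w)
print(expected, set(counts.values()))  # 15 {15}
```

For h = 4 and w = 3, each node lies in 3^2 = 9 quorums through a head and in 2 * 3 = 6 quorums through a tail, which gives 15 in total.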
Reliability
Definition 3. For a torus grid of size h x w in which each node has reliability p, let T(h, w) denote the system's availability, i.e., the probability that a torus quorum whose nodes are all up exists.
Similarly, for a non-wraparound rectangular grid of size h x w, let T1(h, w, j, k) denote the probability that there exists a non-wraparound torus quorum of head length w and tail length k, assuming that each of the bottom j rows of the grid contains at least one up node.
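For small grids, T(h, w) can be computed directly from the definition without the recurrence. The Python sketch below assumes that nodes fail independently, each being up with probability p; that independence assumption and the function name `torus_availability` are not taken from the slides. It classifies each row as fully up, partly up, or fully down, and sums the probability of every combination of row states in which a fully up row is followed, with wraparound, by floor(h / 2) rows that each contain at least one up node.

```python
from itertools import product


def torus_availability(h, w, p):
    """Probability that some torus quorum consists entirely of up nodes.

    Assumes nodes fail independently, each up with probability p.
    """
    t = h // 2
    p_full = p ** w               # every node in the row is up
    p_dead = (1 - p) ** w         # every node in the row is down
    p_some = 1 - p_full - p_dead  # at least one up and at least one down
    prob = {"full": p_full, "some": p_some, "dead": p_dead}

    total = 0.0
    for rows in product(prob, repeat=h):
        # A quorum exists if a fully up row is followed (with wraparound)
        # by t rows that each contain at least one up node.
        ok = any(
            rows[r] == "full"
            and all(rows[(r + k) % h] != "dead" for k in range(1, t + 1))
            for r in range(h)
        )
        if ok:
            weight = 1.0
            for state in rows:
                weight *= prob[state]
            total += weight
    return total


print(torus_availability(4, 3, 0.9))
```

The loop visits 3^h row-state combinations, so this is only practical for small h.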
Reliability (Cont.)
Availability Comparison
Conclusion
Torus quorums provide an equal-sized and equal-responsibility coterie; thus, they have the potential to lead to a more balanced load in distributed system control.
Thank you