Transactions with Nested Parallelism


1 Transactions with Nested Parallelism
(Adding Transactions to Cilk) Kunal Agrawal, Jeremy T. Fineman, and Jim Sukha MIT CSAIL TRANSACT August 16, 2007

2 A Sample Cilk Program

    int supply1[10000];
    int supply2[10000];
    int N = 4;

    cilk int main() {
        spawn buy_computer(supply1);
        spawn buy_computer(supply2);
        sync;
        return 0;
    }

    cilk void buy_computer(int* c) {
        int i, j;
        i = rand() % (10000 - N);
        for (j = 0; j < N; j++)
            spawn buy_part(c, i+j);
        sync;
    }

    cilk void buy_part(int* c, int i) {
        c[i]--;
    }

A Cilk program that updates inventory after buying two computers. Purchasing a computer decrements parts from the inventory arrays, supply1 and supply2, potentially in parallel.

3 Cilk Runtime and Performance
4 worker threads. T1 = 16, T∞ = 5, P = 4.

The Cilk runtime schedules the sample program (two calls to buy_computer(), one on supply1 and one on supply2) using work-stealing. Cilk executes a computation with work T1 and span (critical path) T∞ on P processors in time O(T1 / P + T∞), w.h.p.
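As a quick check of the numbers on this slide, the bound can be evaluated with the hidden constant dropped. This is only an illustrative sketch: the function name cilk_time_bound is hypothetical, and taking the big-O constant as 1 is an assumption.

```python
def cilk_time_bound(t1, t_inf, p):
    """Work-stealing bound T1/P + T_inf, with the big-O constant assumed to be 1."""
    return t1 / p + t_inf

# Values from the slide: T1 = 16, T_inf = 5, P = 4.
bound = cilk_time_bound(16, 5, 4)
print(bound)  # 9.0
```

With these values the span term (5) is already larger than the work term (16 / 4 = 4), so adding workers beyond P = 4 would help little.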

4 Transactions in Cilk?

    int supply1[10000];
    int N = 4;

    cilk int main() {
        spawn buy_computer(supply1); // X1
        spawn buy_computer(supply1); // X2
        sync;
        return 0;
    }

    cilk void buy_computer(int* c) {
        int i, j;
        atomic {
            i = rand() % (10000 - N);
            for (j = 0; j < N; j++)
                spawn buy_part(c, i+j);
            sync;
        }
    }

    cilk void buy_part(int* c, int i) {
        c[i]--;
    }

4 workers; transactions X1 and X2.

Suppose both computers require parts from the same inventory. Can we use transactions to ensure both calls to buy_computer() are atomic, even though X1 and X2 contain nested parallelism?

5 Transactions with Nested Parallelism and Nested Transactions?
    int supply1[10000];
    int N = 4;

    cilk int main() {
        spawn buy_computer(supply1); // X1
        spawn buy_computer(supply1); // X2
        sync;
        return 0;
    }

    cilk void buy_computer(int* c) {
        int i, j;
        atomic {
            for (j = 0; j < N; j++) {
                i = rand() % 10000;
                spawn buy_part(c, i);
            }
            sync;
        }
    }

    cilk void buy_part(int* c, int i) {
        atomic { c[i]--; }
    }

4 workers; transactions X1 and X2.

Suppose each computer can use more than one of the same part. Can we have parallel transactions nested inside X1 and X2?

6 Motivation: Library Functions
    int supply1[10000];
    int N = 4;

    cilk int main() {
        spawn buy_computer(supply1); // X1
        spawn buy_computer(supply1); // X2
        sync;
        return 0;
    }

    cilk void buy_computer(int* c) {
        atomic {
            spawn foo(c);
            sync;
        }
    }

    cilk void foo(int* c) {
        ??? // spawn?
    }

If transactions can have nested parallelism and nested transactions, then we can composably call library functions written in Cilk from inside a transaction without knowing their exact implementation.

7 XCilk Design
We describe XCilk, a theoretical design for a software transactional memory system for Cilk that supports transactions with nested parallelism and nested transactions, both of unbounded nesting depth. XCilk uses an XConflict data structure to efficiently check for transaction conflicts. XCilk lazily cleans up memory locations on aborts.

8 XCilk Bounds on Overhead
XCilk provides a provable bound on the overhead of TM in the following special case: for a computation with no transaction conflicts and no concurrent readers to a shared memory location, if the computation has work T1 and critical path T∞, XCilk executes the computation on P processors in time O(T1 / P + PT∞).

Linear speedup if P = O(√(T1/T∞)), vs. P = O(T1/T∞) for normal Cilk.

The XCilk runtime system still works correctly in the general case, with conflicts and parallel readers.

9 Outline
Definition of Conflict in XCilk
Efficient XConflict Queries

10 Summary of XCilk Semantics
XCilk performs eager conflict detection. Transactions in XCilk are closed-nested. These two conditions imply a prefix race-free execution [ALS06]. If the effects of aborted transactions can be “ignored”, then prefix race-freedom ≈ serializability.

11 XCilk Computation Tree
XCilk builds a computation tree as a transactional program executes. A program begins with a single root node (X0).

[Figure: computation tree rooted at X0 and executed by 3 workers, containing P-nodes (P1 to P6), S-nodes, transactions X1, X2, Y1, Y2, Z1, Z2, and memory operations u1, u2, v1, v2.]

12 XCilk Computation Tree (spawn)
A spawn creates a P-node (P1) with two S-nodes (S1, S2) as children. The worker then starts executing the left child.

13 XCilk Computation Tree (steal)
In XCilk, as in Cilk, a worker can steal an S-node (S2) from the deque of another worker.

14 XCilk Computation Tree (xbegin)
An xbegin creates a transactional S-node (X1).
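The tree-building steps on slides 11 to 14 can be sketched roughly as follows. This is an illustration only, not XCilk's runtime: the Node class and the spawn and xbegin helpers are hypothetical names, and attaching X1 directly under S1 is an assumption about the figure.

```python
class Node:
    """A computation-tree node: kind is "S", "P", or "X" (transactional S-node)."""
    def __init__(self, kind, name, parent=None):
        self.kind = kind
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

def spawn(current, p_name, left_name, right_name):
    """A spawn adds a P-node under `current` with two S-node children."""
    p = Node("P", p_name, current)
    left = Node("S", left_name, p)    # the spawning worker continues here
    right = Node("S", right_name, p)  # this child may be stolen by another worker
    return left, right

def xbegin(current, name):
    """An xbegin adds a transactional S-node under `current`."""
    return Node("X", name, current)

root = Node("X", "X0")                    # the program starts with a single root
s1, s2 = spawn(root, "P1", "S1", "S2")
x1 = xbegin(s1, "X1")                     # assumed placement of X1
```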

15 XCilk: Readsets and Writesets
Conceptually, every transaction X maintains a set of locations that the transaction has read from (the readset R(X)), and a set of locations that the transaction has written to (the writeset W(X)).* The root of the tree represents the world; we assume the writeset of the root contains a value for all memory locations L. When a memory operation u1 on a location L occurs, it reads the value from the closest ancestor transaction with L in its readset.

[Figure: the computation tree with 3 workers, annotated with W(X0) = {L}, and with W(X1) and W(Y1) empty.]

*We assume W(X) is a subset of R(X).

16 XCilk: write
u1: write to L. The operation u1 runs inside transaction Y1, so L is added to the writeset of Y1.

[Figure: the computation tree with 3 workers, annotated with W(X0) = {L}, W(X1) empty, and W(Y1) = {L}.]

17 XCilk: xend
An xend commits a transaction. With closed nesting, when a transaction (Y1) commits, it conceptually merges its readset and writeset into the readset and writeset of its transactional parent (X1).

[Figure: the computation tree with 3 workers, annotated with W(X0) = {L}, W(X1) = {L}, and W(Y1) = {L}.]
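The conceptual readset and writeset model of slides 15 to 17 can be sketched as below. This is a simplified illustration under stated assumptions: the Transaction class is hypothetical, only transactional parents are modeled (P-nodes and S-nodes are left out), and sets are stored as dictionaries from location to value.

```python
class Transaction:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent   # transactional parent (None for the root)
        self.readset = {}      # location -> value read or written
        self.writeset = {}     # location -> value written; kept a subset of readset

    def read(self, loc):
        # Take the value from the closest ancestor (or self) with loc in its readset.
        t = self
        while loc not in t.readset:
            t = t.parent
        value = t.readset[loc]
        self.readset[loc] = value
        return value

    def write(self, loc, value):
        self.writeset[loc] = value
        self.readset[loc] = value   # preserves W(X) subset of R(X)

    def commit(self):
        # Closed nesting: merge both sets into the transactional parent.
        self.parent.readset.update(self.readset)
        self.parent.writeset.update(self.writeset)

x0 = Transaction("X0")
x0.write("L", 10)              # the root holds a value for every location
x1 = Transaction("X1", x0)
y1 = Transaction("Y1", x1)
y1.write("L", y1.read("L") - 1)
y1.commit()                    # xend of Y1: W(X1) now contains L
```

After the commit, X1 holds the decremented value while the root still holds the original one, since X1 itself has not committed.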

18 XCilk: Conflicting write
If v1 tries to write to L, XCilk detects a conflict, because X1 is not an ancestor of v1. Since v1 conflicts with X1, XCilk can choose to abort Z1 immediately. Alternatively, XCilk can signal an abort of X1, wait for worker 1 to notice, and then finish v1.

[Figure: the computation tree with 3 workers, annotated with W(X0) = {L} and W(X1) = {L}; the access v1 inside Z1 is marked "Conflict".]

19 Invariant: Conflict-Free Execution
XCilk performs eager conflict detection and guarantees that the execution is always conflict-free. At any time, for any given memory location L:
1. All active transactions that have L in their writeset fall along a chain.
2. All active transactions with L in their readset are either along the chain or are descendants of the end of the chain.

[Figure: computation tree whose nodes are marked as inactive, aborted, write to L, or read from L; one transaction is labeled X*.]
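The writer half of this invariant can be checked with a short sketch. It relies on the fact that a set of tree nodes lies on one root-to-leaf chain exactly when every pair is ancestor-related. The function names and the child-to-parent dictionary are hypothetical, and the readset condition is not modeled.

```python
def is_ancestor(a, b, parent):
    """True if a is b or an ancestor of b, given a child -> parent map."""
    while b is not None:
        if b == a:
            return True
        b = parent.get(b)
    return False

def writers_form_chain(writers, parent):
    """Active writers to L fall along a chain iff each pair is ancestor-related."""
    ws = list(writers)
    return all(
        is_ancestor(a, b, parent) or is_ancestor(b, a, parent)
        for i, a in enumerate(ws)
        for b in ws[i + 1:]
    )

# A small transaction tree: X0 is the root, X1 and X2 are siblings, Y1 is inside X1.
parent = {"X0": None, "X1": "X0", "X2": "X0", "Y1": "X1"}
ok = writers_form_chain({"X0", "X1", "Y1"}, parent)   # chain X0 -> X1 -> Y1
bad = writers_form_chain({"X1", "X2"}, parent)        # siblings: not a chain
```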

20 Conflicts with the Last Writer to L
In the case where no transactions abort, XCilk reduces conflict detection to queries checking for conflicts against the id of the last transaction to write to L.

Let Y be the last transaction that wrote to L. X* must be an ancestor of Y, i.e., Y has "merged" into X* because of transaction commits.

If a transaction Z wants to perform a read from L, determine whether X* is an ancestor of Z. If not, report a conflict.

[Figure: computation tree with nodes marked as inactive, write to L, or read from L, showing X*, Y, and two candidate readers Z1 and Z2.]

21 The XConflict Oracle
For any running node Z, XConflict(Y, Z) answers this query:

    int XConflict_Oracle(Y, Z) {
        X ← Y's closest active ancestor transaction
        if (X is an ancestor of Z) return "no conflict"
        else return "conflict";
    }

[Figure: computation tree with inactive nodes, transactions Y0 to Y7, and running nodes Z1 to Z4, illustrating Case 1: No conflict and Case 2: Conflict for the query XConflict(*, Z1).]

The XConflict data structure is able to answer the query of XConflict_Oracle in O(1) time.
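For reference, the oracle can be written as a naive tree walk. This sketch is not the XConflict data structure: it takes time proportional to the tree depth rather than O(1). The Node class and helper names are hypothetical, and treating an active Y as its own closest active ancestor transaction is an assumption.

```python
class Node:
    def __init__(self, name, parent=None, is_transaction=False, active=True):
        self.name = name
        self.parent = parent
        self.is_transaction = is_transaction
        self.active = active

def closest_active_transaction(y):
    """Walk up from y (inclusive) to the first active transaction."""
    n = y
    while not (n.is_transaction and n.active):
        n = n.parent          # terminates because the root is always active
    return n

def is_ancestor(x, z):
    """True if x is z or an ancestor of z."""
    while z is not None:
        if z is x:
            return True
        z = z.parent
    return False

def xconflict_oracle(y, z):
    x = closest_active_transaction(y)
    return "no conflict" if is_ancestor(x, z) else "conflict"

x0 = Node("X0", is_transaction=True)
y_done = Node("Y_done", x0, is_transaction=True, active=False)  # committed into X0
y_live = Node("Y_live", x0, is_transaction=True)                # still running
z = Node("Z", x0)                                               # runs outside both
case1 = xconflict_oracle(y_done, z)   # X = X0, an ancestor of Z
case2 = xconflict_oracle(y_live, z)   # X = Y_live, not an ancestor of Z
```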

22 Outline
Definition of Conflict in XCilk
Efficient XConflict Queries

23 Sources of Overhead in XCilk
Assume we keep a history of accesses to each memory location L. The overhead in XCilk comes from two sources:
Updates to the XConflict data structure and to the histories after a transaction Y commits.
Queries to XConflict to check for conflicts on (potentially) every memory access.

24 Explicit Merges on Commit
Option 1: Explicit Merge
When we commit a transaction Y into its parent X, for every location L in Y's readset and writeset, we change the id from Y to X in L's history.

Advantage: Faster queries. On a query, the last transaction to write to L is always active.

[Figure: chain of nested transactions X1, X2, ..., Xd-2, Xd-1, Xd.]

25 Slow Commit with Explicit Merging
Explicitly merging writesets immediately on transaction commits can blow up work by a factor proportional to the nesting depth. For example, consider a chain of nested transactions with depth d, with each transaction Xi accessing a different memory location Li.

Writeset of each transaction just before it commits, and the cost of merging it into its parent:
Xd:   W(Xd) = {Ld}                      O(1)
Xd-1: W(Xd-1) = {Ld-1, Ld}              O(2)
Xd-2: W(Xd-2) = {Ld-2, Ld-1, Ld}        ...
...
X2:   W(X2) = {L2, L3, ..., Ld-1, Ld}   O(d-1)
X1:   W(X1) = {L1, L2, ..., Ld-1, Ld}   O(d)
After X1 commits, the enclosing writeset is {L0, L1, L2, ..., Ld-1, Ld}.

No nesting: O(d) work. Closed nesting: O(d²) work.
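The quadratic blow-up can be reproduced by counting how many writeset entries are relabeled as the chain commits from the inside out. A minimal sketch under the slide's assumptions (a chain of depth d, one distinct location per transaction); the function name explicit_merge_work is hypothetical.

```python
def explicit_merge_work(d):
    """Count writeset entries moved when X_d, X_(d-1), ..., X_1 commit in turn."""
    # X_i initially writes only its own location L_i (X_0 is the enclosing parent).
    writesets = {i: {f"L{i}"} for i in range(d + 1)}
    work = 0
    for i in range(d, 0, -1):
        work += len(writesets[i])          # relabel every entry of X_i
        writesets[i - 1] |= writesets[i]   # merge into the parent X_(i-1)
    return work

total = explicit_merge_work(4)   # 1 + 2 + 3 + 4
print(total)  # 10
```

The total is 1 + 2 + ... + d = d(d+1)/2, which matches the O(d²) figure for closed nesting, compared with d writes when there is no nesting.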

26 Implicit Merges
Option 2: Implicit Merge
Implicitly merge Y into its closest active transactional ancestor. On commit, do nothing to histories for L.

Advantage: Fast updates.

[Figure: computation tree with inactive nodes, transactions Y0 to Y8, and running nodes Z1 to Z4; each node is labeled with one of the sets a through i.]

Sets a through i represent groups of transactions which have merged together.

27 Implicit Merges with Slow Queries?
Disadvantage: Slow queries?
If the XConflict query must walk up the tree to determine which transaction Y has merged into, the query might require Ω(d) time.

XCilk potentially performs an XConflict query on every memory access. Therefore, we need queries to take O(1) time!
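The trade-off can be seen in a small sketch where commit only clears a flag and the query walks up the chain. This is the slow strawman the slide warns about, not XCilk's actual query; the Txn class and merged_into function are hypothetical names.

```python
class Txn:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.active = True

    def commit(self):
        # Implicit merge: O(1) work, the access histories are left untouched.
        self.active = False

def merged_into(y):
    """Walk up to the transaction y has merged into; also return the steps taken."""
    steps = 0
    while not y.active:
        y = y.parent
        steps += 1
    return y, steps

# Build a chain X0 -> X1 -> ... -> Xd, then commit Xd, ..., X1 (X0 stays active).
d = 5
chain = [Txn("X0")]
for i in range(1, d + 1):
    chain.append(Txn(f"X{i}", chain[-1]))
for t in reversed(chain[1:]):
    t.commit()

target, steps = merged_into(chain[d])
```

Here the query on Xd walks d links before reaching X0, which is the Ω(d) cost mentioned above.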

28 The XConflict Query
For any running node Z:

    int XConflict_Oracle(Y, Z) {
        X ← Y's closest active ancestor transaction
        if (X is an ancestor of Z) return "no conflict"
        else return "conflict";
    }

[Figure: computation tree with inactive nodes, transactions Y0 to Y7, and running nodes Z1 to Z4, illustrating Case 1: No conflict and Case 2: Conflict for the query XConflict(*, Z1).]

XConflict can run in O(1) time because it does not always need to find X to answer the oracle query.

29 Trace Construction: Updates
XCilk speeds up XConflict queries by dividing the computation tree into traces, with each trace executed by a single worker.*

In Cilk, traces are created by splitting on steals.
# traces = O(# steals) = O(PT∞)
Thus, we can afford to acquire a global lock on steals, and perform O(1) amortized work per trace.

Nested transactions are merged together by merging traces together when traces complete.

[Figure: the computation tree with 3 workers, divided into traces.]

*Trace construction is similar to the construction in [BFGL04, Fineman05], used for parallel race detection in Cilk.

30 Trace Construction: Queries
XCilk speeds up XConflict queries by dividing the computation tree into traces, with each trace executed by a single worker.*

An XConflict query involves a constant number of O(1)-time operations at two tiers: a global tier (queries between traces), and a local tier (between tree nodes).

When there are no transaction conflicts and no parallel readers to the same memory location, XCilk performs (at most) one XConflict query per memory access.

[Figure: the computation tree with 3 workers, divided into traces.]

*Trace construction is similar to the construction in [BFGL04, Fineman05], used for parallel race detection in Cilk.

31 XCilk Performance Bound
In the special case of a computation with no transaction conflicts and no concurrent readers to a shared memory location, XCilk performs (at most) one O(1)-time XConflict query per memory access. Maintaining the XConflict data structure introduces overhead of O(T1 / P + PT∞). Therefore, the entire program runs in time O(T1 / P + PT∞). Linear speedup if P = O(√(T1/T∞)), compared to P =O(T1/T∞) for normal Cilk.
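The gap between the two speedup ranges follows from comparing the terms of each bound: T1/P dominates PT∞ only while P ≤ √(T1/T∞), whereas T1/P dominates T∞ while P ≤ T1/T∞. A small sketch with the big-O constants assumed to be 1; the function names and the sample values of T1 and T∞ are illustrative assumptions, not figures from the talk.

```python
import math

def cilk_bound(t1, t_inf, p):
    """T1/P + T_inf, constants dropped."""
    return t1 / p + t_inf

def xcilk_bound(t1, t_inf, p):
    """T1/P + P*T_inf, constants dropped."""
    return t1 / p + p * t_inf

def xcilk_linear_speedup_limit(t1, t_inf):
    """Value of P at which the two terms of the XCilk bound are equal."""
    return math.sqrt(t1 / t_inf)

t1, t_inf = 1_000_000, 100          # hypothetical work and critical path
limit = xcilk_linear_speedup_limit(t1, t_inf)
print(limit)                         # 100.0
print(xcilk_bound(t1, t_inf, 100))   # 20000.0
print(cilk_bound(t1, t_inf, 100))    # 10100.0
```

At P = 100 the XCilk bound is within a factor of 2 of T1/P; pushing P further makes the PT∞ term dominate, while the plain Cilk bound keeps improving until P approaches T1/T∞ = 10000.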

32 Open Questions
Can we provide any performance guarantees on programs which are conflict-free, but which also allow parallel reads to the same location? What if there are conflicts?
Can we "garbage-collect" XCilk's metadata (e.g., transaction ids) in a provably efficient manner?
Can we simplify the XCilk data structures in special but possibly "common" cases? For example, what if the nesting depth is bounded by d?

