Presentation is loading. Please wait.

Presentation is loading. Please wait.

Load Balancing and Multithreaded Programming

Similar presentations


Presentation on theme: "Load Balancing and Multithreaded Programming"— Presentation transcript:

1 Load Balancing and Multithreaded Programming
Nir Shavit Multiprocessor Synchronization Spring 2003

2 How to write Parallel Apps?
Multithreaded Programming Programming model Programming language (Cilk) Well-developed theory Successful practice 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

3 Why We Care Interesting in its own right Scheduler
Ideal application for Lock-free data structures 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

4 Multithreaded Fibonacci
int fib(int n) { if (n < 2) { return n; } else { int x = spawn fib(n-1); int y = spawn fib(n-2); sync(); return x + y; }} *Cilk Code (Java Code in Notes) 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

5 Multithreaded Fibonacci
int fib(int n) { if (n < 2) { return n; } else { int x = spawn fib(n-1); int y = spawn fib(n-2); sync(); return x + y; }} Parallel method call 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

6 Multithreaded Fibonacci
int fib(int n) { if (n < 2) { return n; } else { int x = spawn fib(n-1); int y = spawn fib(n-2); sync(); return x + y; }} Wait for children to complete 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

7 Multithreaded Fibonacci
int fib(int n) { if (n < 2) { return n; } else { int x = spawn fib(n-1); int y = spawn fib(n-2); sync(); return x + y; }} Safe to use children’s values 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

8 Note Spawn & synch operators The scheduler Like Israeli traffic signs
Are purely advisory in nature The scheduler Like the Israeli driver Has complete freedom to decide 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

9 Dynamic Behavior Multithreaded program is A thread is
A directed acyclic graph (DAG) That unfolds dynamically A thread is Maximal sequence of instructions Without spawn, sync, or return 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

10 Fib DAG sync spawn fib(4) fib(3) fib(2) fib(1) 30-Nov-18
M. Herlihy & N. Shavit (c) 2003

11 Arrows Reflect Dependencies
fib(4) sync spawn fib(3) fib(2) fib(2) fib(1) 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

12 How Parallel is That? Define work: Define critical-path length:
Total time on one processor Define critical-path length: Longest dependency path Can’t beat that! 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

13 Fib Work fib(4) fib(3) fib(2) fib(2) fib(1) fib(1) 30-Nov-18
M. Herlihy & N. Shavit (c) 2003

14 Fib Work 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 work is 17 16 17 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

15 Fib Critical Path fib(4) 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

16 Critical path length is 8
Fib Critical Path fib(4) 1 8 2 7 3 4 6 Critical path length is 8 5 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

17 Notation Watch TP = time on P processors
T1 = work (time on 1 processor) T∞ = critical path length (time on ∞ processors) 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

18 Simple Bounds TP ≥ T1/P TP ≥ T∞ In one step, can’t do more than P work
Can’t beat infinite resources 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

19 More Notation Watch Speedup on P processors Linear speedup
Ratio T1/TP How much faster with P processors Linear speedup T1/TP = Θ(P) Max speedup (average parallelism) T1/T∞ 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

20 Remarks Graph nodes have out-degree ≤ 2 Unique Starting node
Ending node 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

21 Matrix Multiplication
30-Nov-18 M. Herlihy & N. Shavit (c) 2003

22 Matrix Multiplication
Each n-by-n matrix multiplication 8 multiplications 4 additions Of n/2-by-n/2 submatrices 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

23 Addition int add(Matrix C, Matrix T, int n) { if (n == 1) {
C[1,1] = C[1,1] + T[1,1]; } else { partition C, T into half-size submatrices; spawn add(C11,T11,n/2); spawn add(C12,T12,n/2); spawn add(C21,T21,n/2); spawn add(C22,T22,n/2) sync(); }} 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

24 Addition Let AP(n) be running time For example For n x n matrix
on P processors For example A1(n) is work A∞(n) is critical path length 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

25 Addition Work is Partition, synch, etc 4 spawned additions
A1(n) = 4 A1(n/2) + Θ(1) Partition, synch, etc 4 spawned additions 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

26 Same as double-loop summation
Addition Work is A1(n) = 4 A1(n/2) + Θ(1) = Θ(n2) Same as double-loop summation 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

27 spawned additions in parallel
Critical Path length is A∞(n) = A∞(n/2) + Θ(1) spawned additions in parallel Partition, synch, etc 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

28 Addition Critical Path length is A∞(n) = A∞(n/2) + Θ(1) = Θ(log n)
30-Nov-18 M. Herlihy & N. Shavit (c) 2003

29 Multiplication int mult(Matrix C, Matrix A, Matrix B, int n) {
if (n == 1) { C[1,1] = A[1,1]·B[1,1]; } else { allocate temporary n·n matrix T; partition A,B,C,T into half-size submatrices; 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

30 Multiplication (con’t)
spawn mult(C11,A11,B11,n/2); spawn mult(C12,A11,B12,n/2); spawn mult(C21,A21,B11,n/2); spawn mult(C22,A22,B12,n/2) spawn mult(T11,A11,B21,n/2); spawn mult(T12,A12,B22,n/2); spawn mult(T21,A21,B21,n/2); spawn mult(T22,A22,B22,n/2) sync(); spawn add(C,T,n); }} 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

31 8 spawned mulitplications
Multiplication Work is M1(n) = 8 M1(n/2) + A1(n) Final addition 8 spawned mulitplications 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

32 Same as serial triple-nested loop
Multiplication Work is M1(n) = 8 M1(n/2) + Θ(n2) = Θ(n3) Same as serial triple-nested loop 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

33 Half-size parallel multiplications
Critical path length is M∞(n) = M∞(n/2) + A∞(n) Final addition Half-size parallel multiplications 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

34 Multiplication Critical path length is M∞(n) = M∞(n/2) + A∞(n)
= M∞(n/2) + Θ(log n) = Θ(log2 n) 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

35 Parallelism M1(n)/ M∞(n) = Θ(n3/log2 n)
To multiply two 1000 x 1000 matrices 10003/102=107 Much more than number of processors on any real machine 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

36 Shared-Memory Multiprocessors
Parallel applications Java Cilk, etc. Mix of other jobs All run together Come & go dynamically 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

37 Scheduling Ideally, In real life, User-level scheduler
Maps threads to dedicated processors In real life, Maps threads to fixed number of processes Kernel-level scheduler Maps processes to dynamic pool of processors 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

38 For Example Initially, Serial computation
All P processors available for application Serial computation Takes over one processor Leaving P-1 for us Waits for I/O We get that processor back …. 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

39 Speedup Map threads onto P processes Cannot get P-fold speedup
What if the kernel doesn’t cooperate? Can try for PA-fold speedup PA is time-averaged number of processors the kernel gives us 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

40 8-processor Sun Ultra Enterprise 5000.
Static Load Balancing 8 7 6 5 speedup ideal mm(1024) lu(2048) barnes(16K,10) heat(4K,512,100) 4 8-processor Sun Ultra Enterprise 5000. 3 2 1 1 4 8 12 16 20 24 28 32 processes 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

41 Dynamic Load Balancing
8 7 6 ideal mm(1024) lu(2048) barnes(16K,10) heat(4K,512,100) msort(32M) ray() 5 speedup 4 8-processor Sun Ultra Enterprise 5000. 3 2 1 1 4 8 8 12 12 16 16 20 24 28 32 processes 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

42 Scheduling Hierarchy User-level scheduler Kernel-level scheduler
Tells kernel which processes are ready Kernel-level scheduler Synchronous (for analysis, not correctness!) Picks pi threads to schedule at step i Time-weighted average is: 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

43 Greed is Good Greedy scheduler Schedules as much as it can
At each time step 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

44 Theorem Greedy scheduler ensures actual time T ≤ T1/PA + T∞(P-1)/PA
30-Nov-18 M. Herlihy & N. Shavit (c) 2003

45 Proof Strategy Bound this! 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

46 Put Tokens in Buckets Thread scheduled and executed Thread scheduled
but not executed work idle 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

47 At the end …. Total #tokens = work idle 30-Nov-18
M. Herlihy & N. Shavit (c) 2003

48 At the end …. T1 tokens work idle 30-Nov-18
M. Herlihy & N. Shavit (c) 2003

49 Must Show ≤ T∞(P-1) tokens work idle 30-Nov-18
M. Herlihy & N. Shavit (c) 2003

50 Every Move You Make … Scheduler is greedy At least one node ready
Number of idle threads in one step At most pi-1 ≤ P-1 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

51 Every Step You Take … Consider longest path in unexecuted sub-DAG at step i At least one node in path ready Length of path shrinks by at least one at each step Initially, path is T∞ So there are at most T∞ idle steps 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

52 Counting Tokens At most P-1 idle threads per step At most T∞ steps
So idle bucket contains at most T∞(P-1) tokens Both buckets contain T1 + T∞(P-1) tokens 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

53 Recapitulating 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

54 Turns Out This bound is within a constant factor of optimal
Actual optimal is NP-complete 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

55 Work Sharing Process generates new threads Migrate them elsewhere
In hopes of balancing the load 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

56 Work Stealing If a process runs out of work
It steals work from another If everyone busy, no migration Idle process incurs synchronization cost 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

57 Lock-Free Work Stealing
Each process has a pool of ready threads Remove thread without synchronizing If you run out of threads, steal someone else’s Choose victim at random 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

58 Work DEQueue1 threads pushBottom popBottom 1. Double-Ended Queue
30-Nov-18 M. Herlihy & N. Shavit (c) 2003

59 Obtain Work popBottom Obtain work Run thread until
Blocks or terminates popBottom 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

60 New Work pushBottom Unblock node Spawn node 30-Nov-18
M. Herlihy & N. Shavit (c) 2003

61 Whatcha Gonna do When the Well Runs Dry?
@&%$!! empty 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

62 Steal this Thread! popTop 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

63 Never happen concurrently
Thread DEQueue Methods pushBottom popBottom popTop Never happen concurrently 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

64 Yield Processes spin trying to steal, but all DEQueues are empty
Each process yields processor between steal attempts Gives victims chance to do work 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

65 Performance Without Yield
8 7 6 ideal mm(1024) lu(2048) barnes(16K,10) heat(4K,512,100) msort(32M) ray() 5 speedup 4 3 2 1 1 4 8 12 16 20 24 28 32 processes 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

66 Ideal Wait-Free Linearizable Constant time
Fortune Cookie: “It is better to be young, rich and beautiful, than old, poor, and ugly! 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

67 Compromise Method popTop may signal abort if
Concurrent popTop succeeds Concurrent popBottom takes last thread Blame the victim! 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

68 Dreaded ABA Problem top 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

69 Dreaded ABA Problem top 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

70 Dreaded ABA Problem top 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

71 Dreaded ABA Problem top 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

72 Dreaded ABA Problem top 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

73 Dreaded ABA Problem top 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

74 Dreaded ABA Problem Uh-Oh … CAS Yes! top 30-Nov-18
M. Herlihy & N. Shavit (c) 2003

75 Dreaded ABA Fix tag top bottom 30-Nov-18
M. Herlihy & N. Shavit (c) 2003

76 half index & half tag to avoid ABA
Code public class DEQueue { longRMWregister top; // tag & top int bottom; // bottom thread index Thread[] deq; // array of threads } half index & half tag to avoid ABA 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

77 Dreaded ABA Problem Fix
// extract tag field from top private int TAG_MASK = 0xFFFF0000; private int TAG_SHIFT = 16; private int getTag(int i) { return ((i & TAG_MASK) >> TAG_SHIFT); } 0x index tag 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

78 Code public class DEQueue { … void pushBottom(Thread t){
this.deq[this.bottom] = t; this.bottom++; } 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

79 Code Thread popTop() throws Abort { long oldTop = this.top.read();
int bottom = this.bottom; if (bottom < getIndex(oldTop)) // empty return null; Thread t = this.deq[getIndex(oldTop)]; long newTop = setIndex(oldTop, getIndex(oldTop)+1); if (this.top.CAS(oldTop, newTop)) return t; throw new Abort(); }…} 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

80 Make sure queue non-empty
Code Thread popTop() throws Abort { int oldTop = this.top.read(); int bottom = this.bottom; if (bottom < getIndex(oldTop)) // empty return null; Thread t = this.deq[getIndex(oldTop)]; long newTop = setIndex(oldTop, getIndex(oldTop)+1); if (this.top.CAS(oldTop, newTop)) return t; throw new Abort(); }…} Make sure queue non-empty 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

81 Get old and new top values
Code Thread popTop() throws Abort { int oldTop = this.top.read(); int bottom = this.bottom; if (bottom < getIndex(oldTop)) // empty return null; Thread t = this.deq[getIndex(oldTop)]; long newTop = setIndex(oldTop, getIndex(oldTop)+1); if (this.top.CAS(oldTop, newTop)) return t; throw new Abort(); }…} Get old and new top values 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

82 Code Install new top value Thread popTop() throws Abort {
int oldTop = this.top; int bottom = this.bottom; if (bottom < getIndex(oldTop)) // empty return null; Thread t = this.deq[getIndex(oldTop)]; int newTop = oldTop; newTop = setIndex(oldTop, getIndex(oldTop)+1); if (this.top.CAS(oldTop, newTop)) return t; throw new Abort(); }…} Install new top value 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

83 Code Thread popBottom() { if (this.bottom == 0) return null;
Thread t = this.deq[this.bottom]; long oldTop = this.top.read(); if (this.bottom > getIndex(oldTop)) return t; long newTop = makeTop(getTag(oldTop),0); this.bottom = 0; if (this.bottom == getIndex(oldTop)) if (this.top.CAS(oldTop, newTop)) return t; this.top.write(newTop); } 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

84 Make sure queue non-empty
Code Thread popBottom() { if (this.bottom == 0) return null; this.bottom--; Thread t = this.deq[this.bottom]; long oldTop = this.top.read(); if (this.bottom > getIndex(oldTop)) return t; long newTop = makeTop(getTag(oldTop),0); this.bottom = 0; if (this.bottom == getIndex(oldTop)) if (this.top.CAS(oldTop, newTop)) return t; this.top.write(newTop); } Make sure queue non-empty 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

85 Code Grab bottom thread Thread popBottom() { if (this.bottom == 0)
return null; this.bottom--; Thread t = this.deq[this.bottom]; long oldTop = this.top.read(); if (this.bottom > getIndex(oldTop)) return t; long newTop = makeTop(getTag(oldTop),0); this.bottom = 0; if (this.bottom == getIndex(oldTop)) if (this.top.CAS(oldTop, newTop)) return t; this.top.write(newTop); } Grab bottom thread 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

86 If not near top, we’re done
Code Thread popBottom() { if (this.bottom == 0) return null; this.bottom--; Thread t = this.deq[this.bottom]; long oldTop = this.top.read(); if (this.bottom > getIndex(oldTop)) return t; long newTop = makeTop(getTag(oldTop),0); this.bottom = 0; if (this.bottom == getIndex(oldTop)) if (this.top.CAS(oldTop, newTop)) return t; this.top.write(newTop); } If not near top, we’re done 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

87 Code Reset top & bottom Thread popBottom() { if (this.bottom == 0)
return null; this.bottom--; Thread t = this.deq[this.bottom]; long oldTop = this.top.read(); if (this.bottom > getIndex(oldTop)) return t; long newTop = makeTop(getTag(oldTop),0); this.bottom = 0; if (this.bottom == getIndex(oldTop)) if (this.top.CAS(oldTop, newTop)) return t; this.top.write(newTop); } Reset top & bottom 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

88 Summary so Far Multithreaded structures Scheduling Work
Critical path length Parallelism Scheduling Work stealing Lock-free DEQueue 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

89 Lock-Free Work Stealing
OK even if the number of processes exceeds the number of processors or when the number of processors grows and shrinks over time. No need for “non-commercial” operating-system support, such as gang scheduling or process control. 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

90 Old English Proverb “May as well be hanged for stealing a sheep as a goat” From which we conclude Stealing was punished severely Sheep were worth more than goats 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

91 But Wait, There’s More! Stealing is expensive What if CAS
Only one thread taken What if We could steal more each time? Say, up to half? 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

92 Review Double-ended queue (DEQueue) Local thread
Remove/add thread without CAS If top and bottom > 1 apart 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

93 Consensus If top and bottom are close
Local thread and thief contend Need consensus to resolve In a sequence of k pushes or pops Number of CAS operations is Θ(1) 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

94 Consensus Stealing half increases uncertainty
Consensus on half the queue? In a sequence of k pushes or pops Number of CAS operations is Θ(k) 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

95 New Idea We can get down to Θ(log k)
How: limit uncertainty to when queue size passes a power of 2! Keep a “half-point” counter Thief resets counter Local thread changes counter at power-of-2 boundary 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

96 The Big Picture Steal-range Up to 2i can be stolen atomically
Previous-Steal-Range tag top last Up to 2i can be stolen atomically tag top At least 2i outside steal range last Bottom somewhere in group of 2i+1 bottom 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

97 Steal Range stealRange tag: defeats ABA problem
top: index of top-most item in DEQueue stealLast: last item to be stolen tag top last stealRange 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

98 When to Steal? Steal on empty Steal probabilistically
if (shouldBalance()) { Process victim = randomProcess(); tryToSteal(victim); } Steal on empty Steal probabilistically Probability decreases as queue increases Steal when queue size passes threshold 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

99 Before PushBottom tag top 3 last bottom 14 30-Nov-18
tag top 3 last bottom 14 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

100 After PushBottom tag top last 7 bottom 15 30-Nov-18
tag top last 7 bottom 15 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

101 Update stealRange boolean updateStealRange() {
if (size is a power of two || theft occurred) { // Try to update the stealRange int newSize = Math.max(1, power of 2 closest to half); long oldRange=this.stealRange; int tag = getTag(oldRange.stealRange); int top = getTop(oldRange.stealRange); long newRange= makeStealRange(tag+1, top, top+newSize-1)); boolean ok=this.stealRange.CAS(oldRange,newRange); if (ok) this.prevStealRange = newRange; return ok; } return true;} 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

102 Readjust when queue size is power of two
Update stealRange boolean updateStealRange() { if (size is a power of two || theft occurred) { // Try to update the stealRange int newSize = Math.max(1, power of 2 closest to half); long oldRange=this.stealRange; int tag = getTag(oldRange.stealRange); int top = getTop(oldRange.stealRange); long newRange= makeStealRange(tag+1, top, top+newSize-1)); boolean ok = this.stealRange.CAS(oldRange,newRange); if (ok) prevStealRange = newRange; return ok; } return true; Readjust when queue size is power of two 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

103 Readjust when thief has taken some threads
Update stealRange boolean updateStealRange() { if (size is a power of two || theft occurred) { // Try to update the stealRange int newSize = Math.max(1, power of 2 closest to half); long oldRange=this.stealRange; int tag = getTag(oldRange.stealRange); int top = getTop(oldRange.stealRange); long newRange= makeStealRange(tag+1, top, top+newSize-1)); boolean ok = this.stealRange.CAS(oldRange,newRange); if (ok) prevStealRange = newRange; return ok; } return true; Readjust when thief has taken some threads 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

104 New range size is roughly half
Update stealRange boolean updateStealRange() { if (size is a power of two || theft occurred) { // Try to update the stealRange int newSize = Math.max(1, power of 2 closest to half); long oldRange=this.stealRange; int tag = getTag(oldRange.stealRange); int top = getTop(oldRange.stealRange); long newRange= makeStealRange(tag+1, top, top+newSize)); boolean ok = this.stealRange.CAS(oldRange,newRange); if (ok) prevStealRange = newRange; return ok; } return true; New range size is roughly half 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

105 Try to update stealRange to reflect the new size
boolean updateStealRange() { if (size is a power of two || theft occurred) { // Try to update the stealRange int newSize = Math.max(1, power of 2 closest to half); long oldRange=this.stealRange; int tag = getTag(oldRange.stealRange); int top = getTop(oldRange.stealRange); long newRange= makeStealRange(tag+1, top, top+newSize-1)); boolean ok = this.stealRange.CAS(oldRange,newRange); if (ok) this.prevStealRange = newRange; return ok; } return true; Try to update stealRange to reflect the new size 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

106 Update stealRange If update succeeded, save a copy
boolean updateStealRange() { if (size is a power of two || theft occurred) { // Try to update the stealRange int newSize = Math.max(1, power of 2 closest to half); long oldRange=this.stealRange; int tag = getTag(oldRange.stealRange); int top = getTop(oldRange.stealRange); long newRange= makeStealRange(tag+1, top, top+newSize-1)); boolean ok=this.stealRange.CAS(oldRange,newRange); if (ok) this.prevStealRange = newRange; return ok; } return true; If update succeeded, save a copy of updated range, to identify future thefts 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

107 pushBottom Code public void pushBottom(Thread t, throws Full {
if (this.getSize() == QUEUE_SIZE) throw new Full(); this.deq[this.bottom] = t; this.bottom=(++this.bottom) % QUEUE_SIZE; updateStealRange(); } 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

108 pushBottom Code public void pushBottom(Thread t, throws Full {
if (this.getSize() == QUEUE_SIZE) throw new Full(); this.deq[this.bottom] = t; this.bottom=(++this.bottom) % QUEUE_SIZE; updateStealRange(); } Thread to push 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

109 pushBottom Code public void pushBottom(Thread t, throws Full {
if (this.getSize() == QUEUE_SIZE) throw new Full(); this.deq[this.bottom] = t; this.bottom=(++this.bottom) % QUEUE_SIZE; updateStealRange(); } Are we full? 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

110 pushBottom Code public void pushBottom(Thread t, throws Full {
if (this.getSize() == QUEUE_SIZE) throw new Full(); this.deq[this.bottom] = t; this.bottom=(++this.bottom) % QUEUE_SIZE; updateStealRange(); } Push thread 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

111 Update StealRange, if required
pushBottom Code public void pushBottom(Thread t, throws Full { if (this.getSize() == QUEUE_SIZE) throw new Full(); this.deq[this.bottom] = t; this.bottom=(++this.bottom) % QUEUE_SIZE; updateStealRange(); } Update StealRange, if required 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

112 Before PopBottom tag top last 7 bottom 15 30-Nov-18
tag top last 7 bottom 15 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

113 After PopBottom tag top 3 last bottom 14 30-Nov-18
tag top 3 last bottom 14 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

114 popBottom (Part One) public Object popBottom() throws Abort {
if (this.getSize() == 0) return null; if (!updateStealRange()) throw new Abort(); if (this.bottom == 0) this.bottom = QUEUE_SIZE-1; else --this.bottom; Object t = this.deq[this.bottom]; 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

115 popBottom (Part One) Bail if queue is empty public Object popBottom()
throws Abort { if (this.getSize() == 0) return null; if (!updateStealRange()) throw new Abort(); if (this.bottom == 0) this.bottom = QUEUE_SIZE-1; else --this.bottom; Object t = this.deq[this.bottom]; Bail if queue is empty 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

116 popBottom (Part One) Panic if unable to fix stealRange
public Object popBottom() throws Abort { if (this.getSize() == 0) return null; if (!updateStealRange()) throw new Abort(); if (this.bottom == 0) this.bottom = QUEUE_SIZE-1; else --this.bottom; Object t = this.deq[this.bottom]; Panic if unable to fix stealRange 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

117 popBottom (Part One) Tentatively pop a thread
public Object popBottom() throws Abort { if (this.getSize() == 0) return null; if (!updateStealRange()) throw new Abort(); if (this.bottom == 0) this.bottom = QUEUE_SIZE-1; else --this.bottom; Object t = this.deq[this.bottom]; Tentatively pop a thread 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

118 popBottom (Part Two) public Object popBottom() throws Abort { …
long oldStealRange = this.stealRange; int rangeTop = getTop(oldStealRange); int rangeBot = getLast(oldStealRange); if (rangeBot == EMPTY) { this.bottom = 0; // last thread already stolen return null; } else if (this.bottom != rangeBot) return t; // no need to synchronize else { … 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

119 Deconstruct stealRange
popBottom (Part Two) public Object popBottom() throws Abort { long oldStealRange = this.stealRange; int rangeTop = getTop(oldStealRange); int rangeBot = getLast(oldStealRange); if (rangeBot == EMPTY) { this.bottom = 0; // last thread already stolen return null; } else if (this.bottom != rangeBot) return t; // no need to synchronize else { … Deconstruct stealRange 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

120 If queue is empty, start over
popBottom (Part Two) public Object popBottom() throws Abort { long oldStealRange = this.stealRange; int rangeTop = getTop(oldStealRange); int rangeBot = getLast(oldStealRange); if (rangeBot == EMPTY) { this.bottom = 0; // last thread already stolen return null; } else if (this.bottom != rangeBot) return t; // no need to synchronize else { … If queue is empty, start over 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

121 popBottom (Part Two) If tentatively-popped thread not in
public Object popBottom() throws Abort { long oldStealRange = this.stealRange; int rangeTop = getTop(oldStealRange); int rangeBot = getLast(oldStealRange); if (rangeBot == null) { this.bottom = 0; // last thread already stolen return null; } else if (this.bottom != rangeBot) return t; // no need to synchronize else { … If tentatively-popped thread not in stealRange - no need to synchronize 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

122 popBottom (Part Three)
public Object popBottom() throws Abort { } else { // Try to make stealRange empty int rangeTag = getTag(oldStealRange); if (this.stealRange.CAS(oldStealRange, makeStealRange(tag+1,0,EMPTY))) { this.bottom=0; return t; // thread not stolen yet } else { this.bottom=0 return null; // thread stolen }}} 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

123 Queue has at most one thread
popBottom (Part Three) public Object popBottom() throws Abort { } else { // Try to make stealRange empty int rangeTag = getTag(oldStealRange); if (this.stealRange.CAS(oldStealRange, makeStealRange(tag+1,0,EMPTY))) { this.bottom=0; return t; // thread not stolen yet } else { this.bottom=0 return null; // thread stolen }}} Queue has at most one thread 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

124 Try to zero out steal range
popBottom (Part Three) public Object popBottom() throws Abort { } else { // Try to make stealRange empty int rangeTag = getTag(oldStealRange); if (this.stealRange.CAS(oldStealRange, makeStealRange(tag+1,0,EMPTY))) { this.bottom=0; return t; // thread not stolen yet } else { this.bottom=0 return null; // thread stolen }}} Try to zero out steal range 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

125 (and the deque is now empty)
popBottom (Part Three) public Object popBottom() throws Abort { } else { // Try to make stealRange empty int rangeTag = getTag(oldStealRange); if (this.stealRange.CAS(oldStealRange, makeStealRange(tag+1,0,EMPTY))) { this.bottom=0; return t; // thread not stolen yet } else { this.bottom=0 return null; // thread stolen }}} If we succeeded – the thread is ours! (and the deque is now empty) 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

126 popBottom (Part Three)
public Object popBottom() throws Abort { } else { // Try to make stealRange empty int rangeTag = getTag(oldStealRange); if (this.stealRange.CAS(oldStealRange, makeStealRange(tag+1,0,EMPTY))) { this.bottom=0; return t; // thread not stolen yet } else { this.bottom=0 return null; // thread stolen }}} If we failed – our last thread was stolen 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

127 stealTop (Part One) public int stealTop(EDEQueue victim) {
long oldStealRange = victim.stealRange; int oldLast = getLast(oldStealRange); int oldTop = getTop(oldStealRange); int oldTag = getTag(oldStealRange); int deqBot = victim.bot; int rangeLen = oldStealRange.getSize(); int diff = 2*rangeLen – this.deq.length; if (diff <= 1) return 0; else { int numToSteal = diff/2 for (int i = 0; i < numToSteal; i++) this.deq[this.bottom+i % QUEUE_SIZE] = victim.deq[oldTop+i % QUEUE_SIZE]; }… 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

128 stealTop (Part One) Victim DEQueue
public int stealTop(EDEQueue victim) { long oldStealRange = victim.stealRange; int oldLast = getLast(oldStealRange); int oldTop = getTop(oldStealRange); int oldTag = getTag(oldStealRange); int deqBot = victim.bot; int rangeLen = oldStealRange.getSize(); int diff = 2*rangeLen – this.deq.length; if (diff <= 1) return 0; else { int numToSteal = diff/2 for (int i = 0; i < numToSteal; i++) this.deq[this.bottom+i % QUEUE_SIZE] = victim.deq[oldTop+i % QUEUE_SIZE]; }… Victim DEQueue 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

129 stealTop (Part One) The number of threads Actually stolen
public int stealTop(EDEQueue victim) { long oldStealRange = victim.stealRange; int oldLast = getLast(oldStealRange); int oldTop = getTop(oldStealRange); int oldTag = getTag(oldStealRange); int deqBot = victim.bot; int rangeLen = oldStealRange.getSize(); int diff = 2*rangeLen – this.deq.length; if (diff <= 1) return 0; else { int numToSteal = diff/2 for (int i = 0; i < numToSteal; i++) this.deq[this.bottom+i % QUEUE_SIZE] = victim.deq[oldTop+i % QUEUE_SIZE]; }… The number of threads Actually stolen 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

130 Deconstruct victim’s steal range
stealTop (Part One) public int stealTop(EDEQueue victim) { long oldStealRange = victim.stealRange; int oldLast = getLast(oldStealRange); int oldTop = getTop(oldStealRange); int oldTag = getTag(oldStealRange); int deqBot = victim.bot; int rangeLen = oldStealRange.getSize(); int diff = 2*rangeLen – this.deq.length; if (diff <= 1) return 0; else { int numToSteal = diff/2 for (int i = 0; i < numToSteal; i++) this.deq[this.bottom+i % QUEUE_SIZE] = victim.deq[oldTop+i % QUEUE_SIZE]; }… Deconstruct victim’s steal range 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

131 stealTop (Part One) Compute length of victim’s stealRange, (victim’s
public int stealTop(EDEQueue victim) { long oldStealRange = victim.stealRange; int oldLast = getLast(oldStealRange); int oldTop = getTop(oldStealRange); int oldTag = getTag(oldStealRange); int deqBot = victim.bot; int rangeLen = oldStealRange.getSize(); int diff = 2*rangeLen – this.deq.length; if (diff <= 1) return 0; else { int numToSteal = diff/2 for (int i = 0; i < numToSteal; i++) this.deq[this.bottom+i % QUEUE_SIZE] = victim.deq[oldTop+i % QUEUE_SIZE]; }… Compute length of victim’s stealRange, (victim’s DEQueue length is at least twice as much) 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

132 stealTop (Part One) Diff is a minimal bound on the difference in
public int stealTop(EDEQueue victim) { long oldStealRange = victim.stealRange; int oldLast = getLast(oldStealRange); int oldTop = getTop(oldStealRange); int oldTag = getTag(oldStealRange); int deqBot = victim.bot; int rangeLen = oldStealRange.getSize(); int diff = 2*rangeLen – this.deq.length; if (diff <= 1) return 0; else { int numToSteal = diff/2 for (int i = 0; i < numToSteal; i++) this.deq[this.bottom+i % QUEUE_SIZE] = victim.deq[oldTop+i % QUEUE_SIZE]; }… Diff is a minimal bound on the difference in lengths between victim and thief 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

133 If we can’t equalize by stealing – don’t steal!!
stealTop (Part One) public int stealTop(EDEQueue victim, int thiefLen) { long oldStealRange = victim.stealRange; int oldLast = getLast(oldStealRange); int oldTop = getTop(oldStealRange); int oldTag = getTag(oldStealRange); int deqBot = victim.bot; int rangeLen = oldStealRange.getSize(); int diff = 2*rangeLen – thiefLen; if (diff <= 1) return 0; else { int numToSteal = diff/2 for (int i = 0; i < numToSteal; i++) this.deq[this.bottom+i % QUEUE_SIZE] = victim.deq[oldTop+i % QUEUE_SIZE]; }… If we can’t equalize by stealing – don’t steal!! 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

134 stealTop (Part One) public int stealTop(EDEQueue victim) { long oldStealRange = victim.stealRange; int oldLast = getLast(oldStealRange); int oldTop = getTop(oldStealRange); int oldTag = getTag(oldStealRange); int deqBot = victim.bot; int rangeLen = oldStealRange.getSize(); int diff = 2*rangeLen – this.deq.length; if (diff <= 1) return 0; else { int numToSteal = diff/2 for (int i = 0; i < numToSteal; i++) this.deq[this.bottom+i % QUEUE_SIZE] = victim.deq[oldTop+i % QUEUE_SIZE]; }… Try to steal half the guaranteed difference: Copy threads-to-be-stolen to thief’s deque 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

135 stealTop (Part Two) public int stealTop(EDEQueue victim) { …
int newRangeLen= max(1,power of 2 closest to half the remaining threads); newTop = (oldTop+numToSteal) % DEQUE_SIZE; newLast = (newTop + newRangeLen – 1) % DEQUE_SIZE; long newRange = makeStealRange(oldTag+1, newTop, newLast); if (victim.stealRange.CAS(oldStealRnage, newRange)) { this.bottom = (this.bottom + numToSteal) % DEQUE_SIZE; this.updateStealRange(); return numToSteal; } return 0; 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

136 stealTop (Part Two) public int stealTop(EDEQueue victim) { int newRangeLen= max(1,power of 2 closest to half the remaining threads); newTop = (oldTop+numToSteal) % DEQUE_SIZE; newLast = (newTop + newRangeLen – 1) % DEQUE_SIZE; long newRange = makeStealRange(oldTag+1, newTop, newLast); if (victim.stealRange.CAS(oldStealRnage, newRange)) { this.bottom = (this.bottom + numToSteal) % DEQUE_SIZE; this.updateStealRange(); return numToSteal; } return 0; The new length of the victim’s stealRange is about half the remaining number of threads 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

137 stealTop (Part Two) public int stealTop(EDEQueue victim) { int newRangeLen= max(1,power of 2 closest to half the remaining threads); newTop = (oldTop+numToSteal) % DEQUE_SIZE; newLast = (newTop + newRangeLen – 1) % DEQUE_SIZE; long newRange = makeStealRange(oldTag+1, newTop, newLast); if (victim.stealRange.CAS(oldStealRnage, newRange)) { this.bottom = (this.bottom + numToSteal) % DEQUE_SIZE; this.updateStealRange(); return numToSteal; } return 0; Try to update victim’s stealRange to reflect the theft and the new range length 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

138 stealTop (Part Two) If succeeded, update thief’s bottom and stealRange
public int stealTop(EDEQueue victim) { int newRangeLen= max(1,power of 2 closest to half the remaining threads); newTop = (oldTop+numToSteal) % DEQUE_SIZE; newLast = (newTop + newRangeLen – 1) % DEQUE_SIZE; long newRange = makeStealRange(oldTag+1, newTop, newLast); if (victim.stealRange.CAS(oldStealRnage, newRange)) { this.bottom = (this.bottom + numToSteal) % DEQUE_SIZE; this.updateStealRange(); return numToSteal; } return 0; If succeeded, update thief’s bottom and stealRange to include new threads, and return # stolen threads 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

139 Details Works even if someone steals from thief
Thief may fail to update own stealRange But will still update bottom, making theft happen 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

140 Big Picture This code steals as much as it can More sensible to
Split the difference? May depend on stealing strategy 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

141 Vulnerability If queue size hovers around power of 2, performance will be lousy Extra credit Can we avoid this problem? 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

142 Conclusions “Boutique” lock-free structures
Not general purpose Customized for work-stealing Non-trivial correctness issues 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

143 Alternative: Gang Scheduling
processor 1 processor 2 processor 3 processor 4 time Bad Example: 4-process computation with 1-process computation on 4-processor machine. Good Example: Data-parallel programs with large working sets. 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

144 Alternative: Process Control
processor 1 processor 2 processor 3 processor 4 time process killed new process created Each computation creates and kills processes dynamically to equal number of processors assigned to it. 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

145 Clip Art 30-Nov-18 M. Herlihy & N. Shavit (c) 2003

146 T O M M A R V O L O R I D D L E 30-Nov-18
M. Herlihy & N. Shavit (c) 2003


Download ppt "Load Balancing and Multithreaded Programming"

Similar presentations


Ads by Google