Download presentation
Presentation is loading. Please wait.
Published byἈλεξανδρεύς Μαυρογένης Modified over 6 years ago
1
Load Balancing and Multithreaded Programming
Nir Shavit Multiprocessor Synchronization Spring 2003
2
How to write Parallel Apps?
Multithreaded Programming Programming model Programming language (Cilk) Well-developed theory Successful practice 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
3
Why We Care Interesting in its own right Scheduler
Ideal application for Lock-free data structures 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
4
Multithreaded Fibonacci
int fib(int n) { if (n < 2) { return n; } else { int x = spawn fib(n-1); int y = spawn fib(n-2); sync(); return x + y; }} *Cilk Code (Java Code in Notes) 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
5
Multithreaded Fibonacci
int fib(int n) { if (n < 2) { return n; } else { int x = spawn fib(n-1); int y = spawn fib(n-2); sync(); return x + y; }} Parallel method call 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
6
Multithreaded Fibonacci
int fib(int n) { if (n < 2) { return n; } else { int x = spawn fib(n-1); int y = spawn fib(n-2); sync(); return x + y; }} Wait for children to complete 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
7
Multithreaded Fibonacci
int fib(int n) { if (n < 2) { return n; } else { int x = spawn fib(n-1); int y = spawn fib(n-2); sync(); return x + y; }} Safe to use children’s values 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
8
Note Spawn & synch operators The scheduler Like Israeli traffic signs
Are purely advisory in nature The scheduler Like the Israeli driver Has complete freedom to decide 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
9
Dynamic Behavior Multithreaded program is A thread is
A directed acyclic graph (DAG) That unfolds dynamically A thread is Maximal sequence of instructions Without spawn, sync, or return 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
10
Fib DAG sync spawn fib(4) fib(3) fib(2) fib(1) 30-Nov-18
M. Herlihy & N. Shavit (c) 2003
11
Arrows Reflect Dependencies
fib(4) sync spawn fib(3) fib(2) fib(2) fib(1) 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
12
How Parallel is That? Define work: Define critical-path length:
Total time on one processor Define critical-path length: Longest dependency path Can’t beat that! 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
13
Fib Work fib(4) fib(3) fib(2) fib(2) fib(1) fib(1) 30-Nov-18
M. Herlihy & N. Shavit (c) 2003
14
Fib Work 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 work is 17 16 17 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
15
Fib Critical Path fib(4) 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
16
Critical path length is 8
Fib Critical Path fib(4) 1 8 2 7 3 4 6 Critical path length is 8 5 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
17
Notation Watch TP = time on P processors
T1 = work (time on 1 processor) T∞ = critical path length (time on ∞ processors) 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
18
Simple Bounds TP ≥ T1/P TP ≥ T∞ In one step, can’t do more than P work
Can’t beat infinite resources 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
19
More Notation Watch Speedup on P processors Linear speedup
Ratio T1/TP How much faster with P processors Linear speedup T1/TP = Θ(P) Max speedup (average parallelism) T1/T∞ 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
20
Remarks Graph nodes have out-degree ≤ 2 Unique Starting node
Ending node 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
21
Matrix Multiplication
30-Nov-18 M. Herlihy & N. Shavit (c) 2003
22
Matrix Multiplication
Each n-by-n matrix multiplication 8 multiplications 4 additions Of n/2-by-n/2 submatrices 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
23
Addition int add(Matrix C, Matrix T, int n) { if (n == 1) {
C[1,1] = C[1,1] + T[1,1]; } else { partition C, T into half-size submatrices; spawn add(C11,T11,n/2); spawn add(C12,T12,n/2); spawn add(C21,T21,n/2); spawn add(C22,T22,n/2) sync(); }} 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
24
Addition Let AP(n) be running time For example For n x n matrix
on P processors For example A1(n) is work A∞(n) is critical path length 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
25
Addition Work is Partition, synch, etc 4 spawned additions
A1(n) = 4 A1(n/2) + Θ(1) Partition, synch, etc 4 spawned additions 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
26
Same as double-loop summation
Addition Work is A1(n) = 4 A1(n/2) + Θ(1) = Θ(n2) Same as double-loop summation 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
27
spawned additions in parallel
Critical Path length is A∞(n) = A∞(n/2) + Θ(1) spawned additions in parallel Partition, synch, etc 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
28
Addition Critical Path length is A∞(n) = A∞(n/2) + Θ(1) = Θ(log n)
30-Nov-18 M. Herlihy & N. Shavit (c) 2003
29
Multiplication int mult(Matrix C, Matrix A, Matrix B, int n) {
if (n == 1) { C[1,1] = A[1,1]·B[1,1]; } else { allocate temporary n·n matrix T; partition A,B,C,T into half-size submatrices; … 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
30
Multiplication (con’t)
spawn mult(C11,A11,B11,n/2); spawn mult(C12,A11,B12,n/2); spawn mult(C21,A21,B11,n/2); spawn mult(C22,A22,B12,n/2) spawn mult(T11,A11,B21,n/2); spawn mult(T12,A12,B22,n/2); spawn mult(T21,A21,B21,n/2); spawn mult(T22,A22,B22,n/2) sync(); spawn add(C,T,n); }} 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
31
8 spawned mulitplications
Multiplication Work is M1(n) = 8 M1(n/2) + A1(n) Final addition 8 spawned mulitplications 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
32
Same as serial triple-nested loop
Multiplication Work is M1(n) = 8 M1(n/2) + Θ(n2) = Θ(n3) Same as serial triple-nested loop 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
33
Half-size parallel multiplications
Critical path length is M∞(n) = M∞(n/2) + A∞(n) Final addition Half-size parallel multiplications 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
34
Multiplication Critical path length is M∞(n) = M∞(n/2) + A∞(n)
= M∞(n/2) + Θ(log n) = Θ(log2 n) 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
35
Parallelism M1(n)/ M∞(n) = Θ(n3/log2 n)
To multiply two 1000 x 1000 matrices 10003/102=107 Much more than number of processors on any real machine 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
36
Shared-Memory Multiprocessors
Parallel applications Java Cilk, etc. Mix of other jobs All run together Come & go dynamically 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
37
Scheduling Ideally, In real life, User-level scheduler
Maps threads to dedicated processors In real life, Maps threads to fixed number of processes Kernel-level scheduler Maps processes to dynamic pool of processors 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
38
For Example Initially, Serial computation
All P processors available for application Serial computation Takes over one processor Leaving P-1 for us Waits for I/O We get that processor back …. 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
39
Speedup Map threads onto P processes Cannot get P-fold speedup
What if the kernel doesn’t cooperate? Can try for PA-fold speedup PA is time-averaged number of processors the kernel gives us 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
40
8-processor Sun Ultra Enterprise 5000.
Static Load Balancing 8 7 6 5 speedup ideal mm(1024) lu(2048) barnes(16K,10) heat(4K,512,100) 4 8-processor Sun Ultra Enterprise 5000. 3 2 1 1 4 8 12 16 20 24 28 32 processes 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
41
Dynamic Load Balancing
8 7 6 ideal mm(1024) lu(2048) barnes(16K,10) heat(4K,512,100) msort(32M) ray() 5 speedup 4 8-processor Sun Ultra Enterprise 5000. 3 2 1 1 4 8 8 12 12 16 16 20 24 28 32 processes 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
42
Scheduling Hierarchy User-level scheduler Kernel-level scheduler
Tells kernel which processes are ready Kernel-level scheduler Synchronous (for analysis, not correctness!) Picks pi threads to schedule at step i Time-weighted average is: 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
43
Greed is Good Greedy scheduler Schedules as much as it can
At each time step 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
44
Theorem Greedy scheduler ensures actual time T ≤ T1/PA + T∞(P-1)/PA
30-Nov-18 M. Herlihy & N. Shavit (c) 2003
45
Proof Strategy Bound this! 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
46
Put Tokens in Buckets Thread scheduled and executed Thread scheduled
but not executed work idle 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
47
At the end …. Total #tokens = work idle 30-Nov-18
M. Herlihy & N. Shavit (c) 2003
48
At the end …. T1 tokens work idle 30-Nov-18
M. Herlihy & N. Shavit (c) 2003
49
Must Show ≤ T∞(P-1) tokens work idle 30-Nov-18
M. Herlihy & N. Shavit (c) 2003
50
Every Move You Make … Scheduler is greedy At least one node ready
Number of idle threads in one step At most pi-1 ≤ P-1 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
51
Every Step You Take … Consider longest path in unexecuted sub-DAG at step i At least one node in path ready Length of path shrinks by at least one at each step Initially, path is T∞ So there are at most T∞ idle steps 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
52
Counting Tokens At most P-1 idle threads per step At most T∞ steps
So idle bucket contains at most T∞(P-1) tokens Both buckets contain T1 + T∞(P-1) tokens 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
53
Recapitulating 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
54
Turns Out This bound is within a constant factor of optimal
Actual optimal is NP-complete 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
55
Work Sharing Process generates new threads Migrate them elsewhere
In hopes of balancing the load 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
56
Work Stealing If a process runs out of work
It steals work from another If everyone busy, no migration Idle process incurs synchronization cost 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
57
Lock-Free Work Stealing
Each process has a pool of ready threads Remove thread without synchronizing If you run out of threads, steal someone else’s Choose victim at random 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
58
Work DEQueue1 threads pushBottom popBottom 1. Double-Ended Queue
30-Nov-18 M. Herlihy & N. Shavit (c) 2003
59
Obtain Work popBottom Obtain work Run thread until
Blocks or terminates popBottom 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
60
New Work pushBottom Unblock node Spawn node 30-Nov-18
M. Herlihy & N. Shavit (c) 2003
61
Whatcha Gonna do When the Well Runs Dry?
@&%$!! empty 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
62
Steal this Thread! popTop 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
63
Never happen concurrently
Thread DEQueue Methods pushBottom popBottom popTop Never happen concurrently 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
64
Yield Processes spin trying to steal, but all DEQueues are empty
Each process yields processor between steal attempts Gives victims chance to do work 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
65
Performance Without Yield
8 7 6 ideal mm(1024) lu(2048) barnes(16K,10) heat(4K,512,100) msort(32M) ray() 5 speedup 4 3 2 1 1 4 8 12 16 20 24 28 32 processes 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
66
Ideal Wait-Free Linearizable Constant time
Fortune Cookie: “It is better to be young, rich and beautiful, than old, poor, and ugly! 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
67
Compromise Method popTop may signal abort if
Concurrent popTop succeeds Concurrent popBottom takes last thread Blame the victim! 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
68
Dreaded ABA Problem top 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
69
Dreaded ABA Problem top 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
70
Dreaded ABA Problem top 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
71
Dreaded ABA Problem top 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
72
Dreaded ABA Problem top 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
73
Dreaded ABA Problem top 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
74
Dreaded ABA Problem Uh-Oh … CAS Yes! top 30-Nov-18
M. Herlihy & N. Shavit (c) 2003
75
Dreaded ABA Fix tag top bottom 30-Nov-18
M. Herlihy & N. Shavit (c) 2003
76
half index & half tag to avoid ABA
Code public class DEQueue { longRMWregister top; // tag & top int bottom; // bottom thread index Thread[] deq; // array of threads … } half index & half tag to avoid ABA 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
77
Dreaded ABA Problem Fix
// extract tag field from top private int TAG_MASK = 0xFFFF0000; private int TAG_SHIFT = 16; private int getTag(int i) { return ((i & TAG_MASK) >> TAG_SHIFT); } 0x index tag 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
78
Code public class DEQueue { … void pushBottom(Thread t){
this.deq[this.bottom] = t; this.bottom++; } 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
79
Code Thread popTop() throws Abort { long oldTop = this.top.read();
int bottom = this.bottom; if (bottom < getIndex(oldTop)) // empty return null; Thread t = this.deq[getIndex(oldTop)]; long newTop = setIndex(oldTop, getIndex(oldTop)+1); if (this.top.CAS(oldTop, newTop)) return t; throw new Abort(); }…} 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
80
Make sure queue non-empty
Code Thread popTop() throws Abort { int oldTop = this.top.read(); int bottom = this.bottom; if (bottom < getIndex(oldTop)) // empty return null; Thread t = this.deq[getIndex(oldTop)]; long newTop = setIndex(oldTop, getIndex(oldTop)+1); if (this.top.CAS(oldTop, newTop)) return t; throw new Abort(); }…} Make sure queue non-empty 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
81
Get old and new top values
Code Thread popTop() throws Abort { int oldTop = this.top.read(); int bottom = this.bottom; if (bottom < getIndex(oldTop)) // empty return null; Thread t = this.deq[getIndex(oldTop)]; long newTop = setIndex(oldTop, getIndex(oldTop)+1); if (this.top.CAS(oldTop, newTop)) return t; throw new Abort(); }…} Get old and new top values 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
82
Code Install new top value Thread popTop() throws Abort {
int oldTop = this.top; int bottom = this.bottom; if (bottom < getIndex(oldTop)) // empty return null; Thread t = this.deq[getIndex(oldTop)]; int newTop = oldTop; newTop = setIndex(oldTop, getIndex(oldTop)+1); if (this.top.CAS(oldTop, newTop)) return t; throw new Abort(); }…} Install new top value 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
83
Code Thread popBottom() { if (this.bottom == 0) return null;
Thread t = this.deq[this.bottom]; long oldTop = this.top.read(); if (this.bottom > getIndex(oldTop)) return t; long newTop = makeTop(getTag(oldTop),0); this.bottom = 0; if (this.bottom == getIndex(oldTop)) if (this.top.CAS(oldTop, newTop)) return t; this.top.write(newTop); } 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
84
Make sure queue non-empty
Code Thread popBottom() { if (this.bottom == 0) return null; this.bottom--; Thread t = this.deq[this.bottom]; long oldTop = this.top.read(); if (this.bottom > getIndex(oldTop)) return t; long newTop = makeTop(getTag(oldTop),0); this.bottom = 0; if (this.bottom == getIndex(oldTop)) if (this.top.CAS(oldTop, newTop)) return t; this.top.write(newTop); } Make sure queue non-empty 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
85
Code Grab bottom thread Thread popBottom() { if (this.bottom == 0)
return null; this.bottom--; Thread t = this.deq[this.bottom]; long oldTop = this.top.read(); if (this.bottom > getIndex(oldTop)) return t; long newTop = makeTop(getTag(oldTop),0); this.bottom = 0; if (this.bottom == getIndex(oldTop)) if (this.top.CAS(oldTop, newTop)) return t; this.top.write(newTop); } Grab bottom thread 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
86
If not near top, we’re done
Code Thread popBottom() { if (this.bottom == 0) return null; this.bottom--; Thread t = this.deq[this.bottom]; long oldTop = this.top.read(); if (this.bottom > getIndex(oldTop)) return t; long newTop = makeTop(getTag(oldTop),0); this.bottom = 0; if (this.bottom == getIndex(oldTop)) if (this.top.CAS(oldTop, newTop)) return t; this.top.write(newTop); } If not near top, we’re done 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
87
Code Reset top & bottom Thread popBottom() { if (this.bottom == 0)
return null; this.bottom--; Thread t = this.deq[this.bottom]; long oldTop = this.top.read(); if (this.bottom > getIndex(oldTop)) return t; long newTop = makeTop(getTag(oldTop),0); this.bottom = 0; if (this.bottom == getIndex(oldTop)) if (this.top.CAS(oldTop, newTop)) return t; this.top.write(newTop); } Reset top & bottom 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
88
Summary so Far Multithreaded structures Scheduling Work
Critical path length Parallelism Scheduling Work stealing Lock-free DEQueue 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
89
Lock-Free Work Stealing
OK even if the number of processes exceeds the number of processors or when the number of processors grows and shrinks over time. No need for “non-commercial” operating-system support, such as gang scheduling or process control. 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
90
Old English Proverb “May as well be hanged for stealing a sheep as a goat” From which we conclude Stealing was punished severely Sheep were worth more than goats 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
91
But Wait, There’s More! Stealing is expensive What if CAS
Only one thread taken What if We could steal more each time? Say, up to half? 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
92
Review Double-ended queue (DEQueue) Local thread
Remove/add thread without CAS If top and bottom > 1 apart 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
93
Consensus If top and bottom are close
Local thread and thief contend Need consensus to resolve In a sequence of k pushes or pops Number of CAS operations is Θ(1) 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
94
Consensus Stealing half increases uncertainty
Consensus on half the queue? In a sequence of k pushes or pops Number of CAS operations is Θ(k) 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
95
New Idea We can get down to Θ(log k)
How: limit uncertainty to when queue size passes a power of 2! Keep a “half-point” counter Thief resets counter Local thread changes counter at power-of-2 boundary 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
96
The Big Picture Steal-range Up to 2i can be stolen atomically
Previous-Steal-Range tag top last Up to 2i can be stolen atomically tag top At least 2i outside steal range last Bottom somewhere in group of 2i+1 bottom 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
97
Steal Range stealRange tag: defeats ABA problem
top: index of top-most item in DEQueue stealLast: last item to be stolen tag top last stealRange 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
98
When to Steal? Steal on empty Steal probabilistically
if (shouldBalance()) { Process victim = randomProcess(); tryToSteal(victim); } Steal on empty Steal probabilistically Probability decreases as queue increases Steal when queue size passes threshold 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
99
Before PushBottom tag top 3 last bottom 14 30-Nov-18
tag top 3 last bottom 14 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
100
After PushBottom tag top last 7 bottom 15 30-Nov-18
tag top last 7 bottom 15 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
101
Update stealRange boolean updateStealRange() {
if (size is a power of two || theft occurred) { // Try to update the stealRange int newSize = Math.max(1, power of 2 closest to half); long oldRange=this.stealRange; int tag = getTag(oldRange.stealRange); int top = getTop(oldRange.stealRange); long newRange= makeStealRange(tag+1, top, top+newSize-1)); boolean ok=this.stealRange.CAS(oldRange,newRange); if (ok) this.prevStealRange = newRange; return ok; } return true;} 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
102
Readjust when queue size is power of two
Update stealRange boolean updateStealRange() { if (size is a power of two || theft occurred) { // Try to update the stealRange int newSize = Math.max(1, power of 2 closest to half); long oldRange=this.stealRange; int tag = getTag(oldRange.stealRange); int top = getTop(oldRange.stealRange); long newRange= makeStealRange(tag+1, top, top+newSize-1)); boolean ok = this.stealRange.CAS(oldRange,newRange); if (ok) prevStealRange = newRange; return ok; } return true; Readjust when queue size is power of two 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
103
Readjust when thief has taken some threads
Update stealRange boolean updateStealRange() { if (size is a power of two || theft occurred) { // Try to update the stealRange int newSize = Math.max(1, power of 2 closest to half); long oldRange=this.stealRange; int tag = getTag(oldRange.stealRange); int top = getTop(oldRange.stealRange); long newRange= makeStealRange(tag+1, top, top+newSize-1)); boolean ok = this.stealRange.CAS(oldRange,newRange); if (ok) prevStealRange = newRange; return ok; } return true; Readjust when thief has taken some threads 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
104
New range size is roughly half
Update stealRange boolean updateStealRange() { if (size is a power of two || theft occurred) { // Try to update the stealRange int newSize = Math.max(1, power of 2 closest to half); long oldRange=this.stealRange; int tag = getTag(oldRange.stealRange); int top = getTop(oldRange.stealRange); long newRange= makeStealRange(tag+1, top, top+newSize)); boolean ok = this.stealRange.CAS(oldRange,newRange); if (ok) prevStealRange = newRange; return ok; } return true; New range size is roughly half 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
105
Try to update stealRange to reflect the new size
boolean updateStealRange() { if (size is a power of two || theft occurred) { // Try to update the stealRange int newSize = Math.max(1, power of 2 closest to half); long oldRange=this.stealRange; int tag = getTag(oldRange.stealRange); int top = getTop(oldRange.stealRange); long newRange= makeStealRange(tag+1, top, top+newSize-1)); boolean ok = this.stealRange.CAS(oldRange,newRange); if (ok) this.prevStealRange = newRange; return ok; } return true; Try to update stealRange to reflect the new size 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
106
Update stealRange If update succeeded, save a copy
boolean updateStealRange() { if (size is a power of two || theft occurred) { // Try to update the stealRange int newSize = Math.max(1, power of 2 closest to half); long oldRange=this.stealRange; int tag = getTag(oldRange.stealRange); int top = getTop(oldRange.stealRange); long newRange= makeStealRange(tag+1, top, top+newSize-1)); boolean ok=this.stealRange.CAS(oldRange,newRange); if (ok) this.prevStealRange = newRange; return ok; } return true; If update succeeded, save a copy of updated range, to identify future thefts 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
107
pushBottom Code public void pushBottom(Thread t, throws Full {
if (this.getSize() == QUEUE_SIZE) throw new Full(); this.deq[this.bottom] = t; this.bottom=(++this.bottom) % QUEUE_SIZE; updateStealRange(); } 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
108
pushBottom Code public void pushBottom(Thread t, throws Full {
if (this.getSize() == QUEUE_SIZE) throw new Full(); this.deq[this.bottom] = t; this.bottom=(++this.bottom) % QUEUE_SIZE; updateStealRange(); } Thread to push 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
109
pushBottom Code public void pushBottom(Thread t, throws Full {
if (this.getSize() == QUEUE_SIZE) throw new Full(); this.deq[this.bottom] = t; this.bottom=(++this.bottom) % QUEUE_SIZE; updateStealRange(); } Are we full? 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
110
pushBottom Code public void pushBottom(Thread t, throws Full {
if (this.getSize() == QUEUE_SIZE) throw new Full(); this.deq[this.bottom] = t; this.bottom=(++this.bottom) % QUEUE_SIZE; updateStealRange(); } Push thread 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
111
Update StealRange, if required
pushBottom Code public void pushBottom(Thread t, throws Full { if (this.getSize() == QUEUE_SIZE) throw new Full(); this.deq[this.bottom] = t; this.bottom=(++this.bottom) % QUEUE_SIZE; updateStealRange(); } Update StealRange, if required 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
112
Before PopBottom tag top last 7 bottom 15 30-Nov-18
tag top last 7 bottom 15 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
113
After PopBottom tag top 3 last bottom 14 30-Nov-18
tag top 3 last bottom 14 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
114
popBottom (Part One) public Object popBottom() throws Abort {
if (this.getSize() == 0) return null; if (!updateStealRange()) throw new Abort(); if (this.bottom == 0) this.bottom = QUEUE_SIZE-1; else --this.bottom; Object t = this.deq[this.bottom]; … 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
115
popBottom (Part One) Bail if queue is empty public Object popBottom()
throws Abort { if (this.getSize() == 0) return null; if (!updateStealRange()) throw new Abort(); if (this.bottom == 0) this.bottom = QUEUE_SIZE-1; else --this.bottom; Object t = this.deq[this.bottom]; … Bail if queue is empty 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
116
popBottom (Part One) Panic if unable to fix stealRange
public Object popBottom() throws Abort { if (this.getSize() == 0) return null; if (!updateStealRange()) throw new Abort(); if (this.bottom == 0) this.bottom = QUEUE_SIZE-1; else --this.bottom; Object t = this.deq[this.bottom]; … Panic if unable to fix stealRange 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
117
popBottom (Part One) Tentatively pop a thread
public Object popBottom() throws Abort { if (this.getSize() == 0) return null; if (!updateStealRange()) throw new Abort(); if (this.bottom == 0) this.bottom = QUEUE_SIZE-1; else --this.bottom; Object t = this.deq[this.bottom]; … Tentatively pop a thread 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
118
popBottom (Part Two) public Object popBottom() throws Abort { …
long oldStealRange = this.stealRange; int rangeTop = getTop(oldStealRange); int rangeBot = getLast(oldStealRange); if (rangeBot == EMPTY) { this.bottom = 0; // last thread already stolen return null; } else if (this.bottom != rangeBot) return t; // no need to synchronize else { … 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
119
Deconstruct stealRange
popBottom (Part Two) public Object popBottom() throws Abort { … long oldStealRange = this.stealRange; int rangeTop = getTop(oldStealRange); int rangeBot = getLast(oldStealRange); if (rangeBot == EMPTY) { this.bottom = 0; // last thread already stolen return null; } else if (this.bottom != rangeBot) return t; // no need to synchronize else { … Deconstruct stealRange 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
120
If queue is empty, start over
popBottom (Part Two) public Object popBottom() throws Abort { … long oldStealRange = this.stealRange; int rangeTop = getTop(oldStealRange); int rangeBot = getLast(oldStealRange); if (rangeBot == EMPTY) { this.bottom = 0; // last thread already stolen return null; } else if (this.bottom != rangeBot) return t; // no need to synchronize else { … If queue is empty, start over 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
121
popBottom (Part Two) If tentatively-popped thread not in
public Object popBottom() throws Abort { … long oldStealRange = this.stealRange; int rangeTop = getTop(oldStealRange); int rangeBot = getLast(oldStealRange); if (rangeBot == null) { this.bottom = 0; // last thread already stolen return null; } else if (this.bottom != rangeBot) return t; // no need to synchronize else { … If tentatively-popped thread not in stealRange - no need to synchronize 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
122
popBottom (Part Three)
public Object popBottom() throws Abort { … } else { // Try to make stealRange empty int rangeTag = getTag(oldStealRange); if (this.stealRange.CAS(oldStealRange, makeStealRange(tag+1,0,EMPTY))) { this.bottom=0; return t; // thread not stolen yet } else { this.bottom=0 return null; // thread stolen }}} 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
123
Queue has at most one thread
popBottom (Part Three) public Object popBottom() throws Abort { … } else { // Try to make stealRange empty int rangeTag = getTag(oldStealRange); if (this.stealRange.CAS(oldStealRange, makeStealRange(tag+1,0,EMPTY))) { this.bottom=0; return t; // thread not stolen yet } else { this.bottom=0 return null; // thread stolen }}} Queue has at most one thread 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
124
Try to zero out steal range
popBottom (Part Three) public Object popBottom() throws Abort { … } else { // Try to make stealRange empty int rangeTag = getTag(oldStealRange); if (this.stealRange.CAS(oldStealRange, makeStealRange(tag+1,0,EMPTY))) { this.bottom=0; return t; // thread not stolen yet } else { this.bottom=0 return null; // thread stolen }}} Try to zero out steal range 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
125
(and the deque is now empty)
popBottom (Part Three) public Object popBottom() throws Abort { … } else { // Try to make stealRange empty int rangeTag = getTag(oldStealRange); if (this.stealRange.CAS(oldStealRange, makeStealRange(tag+1,0,EMPTY))) { this.bottom=0; return t; // thread not stolen yet } else { this.bottom=0 return null; // thread stolen }}} If we succeeded – the thread is ours! (and the deque is now empty) 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
126
popBottom (Part Three)
public Object popBottom() throws Abort { … } else { // Try to make stealRange empty int rangeTag = getTag(oldStealRange); if (this.stealRange.CAS(oldStealRange, makeStealRange(tag+1,0,EMPTY))) { this.bottom=0; return t; // thread not stolen yet } else { this.bottom=0 return null; // thread stolen }}} If we failed – our last thread was stolen 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
127
stealTop (Part One) public int stealTop(EDEQueue victim) {
long oldStealRange = victim.stealRange; int oldLast = getLast(oldStealRange); int oldTop = getTop(oldStealRange); int oldTag = getTag(oldStealRange); int deqBot = victim.bot; int rangeLen = oldStealRange.getSize(); int diff = 2*rangeLen – this.deq.length; if (diff <= 1) return 0; else { int numToSteal = diff/2 for (int i = 0; i < numToSteal; i++) this.deq[this.bottom+i % QUEUE_SIZE] = victim.deq[oldTop+i % QUEUE_SIZE]; }… 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
128
stealTop (Part One) Victim DEQueue
public int stealTop(EDEQueue victim) { long oldStealRange = victim.stealRange; int oldLast = getLast(oldStealRange); int oldTop = getTop(oldStealRange); int oldTag = getTag(oldStealRange); int deqBot = victim.bot; int rangeLen = oldStealRange.getSize(); int diff = 2*rangeLen – this.deq.length; if (diff <= 1) return 0; else { int numToSteal = diff/2 for (int i = 0; i < numToSteal; i++) this.deq[this.bottom+i % QUEUE_SIZE] = victim.deq[oldTop+i % QUEUE_SIZE]; }… Victim DEQueue 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
129
stealTop (Part One) The number of threads Actually stolen
public int stealTop(EDEQueue victim) { long oldStealRange = victim.stealRange; int oldLast = getLast(oldStealRange); int oldTop = getTop(oldStealRange); int oldTag = getTag(oldStealRange); int deqBot = victim.bot; int rangeLen = oldStealRange.getSize(); int diff = 2*rangeLen – this.deq.length; if (diff <= 1) return 0; else { int numToSteal = diff/2 for (int i = 0; i < numToSteal; i++) this.deq[this.bottom+i % QUEUE_SIZE] = victim.deq[oldTop+i % QUEUE_SIZE]; }… The number of threads Actually stolen 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
130
Deconstruct victim’s steal range
stealTop (Part One) public int stealTop(EDEQueue victim) { long oldStealRange = victim.stealRange; int oldLast = getLast(oldStealRange); int oldTop = getTop(oldStealRange); int oldTag = getTag(oldStealRange); int deqBot = victim.bot; int rangeLen = oldStealRange.getSize(); int diff = 2*rangeLen – this.deq.length; if (diff <= 1) return 0; else { int numToSteal = diff/2 for (int i = 0; i < numToSteal; i++) this.deq[this.bottom+i % QUEUE_SIZE] = victim.deq[oldTop+i % QUEUE_SIZE]; }… Deconstruct victim’s steal range 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
131
stealTop (Part One) Compute length of victim’s stealRange, (victim’s
public int stealTop(EDEQueue victim) { long oldStealRange = victim.stealRange; int oldLast = getLast(oldStealRange); int oldTop = getTop(oldStealRange); int oldTag = getTag(oldStealRange); int deqBot = victim.bot; int rangeLen = oldStealRange.getSize(); int diff = 2*rangeLen – this.deq.length; if (diff <= 1) return 0; else { int numToSteal = diff/2 for (int i = 0; i < numToSteal; i++) this.deq[this.bottom+i % QUEUE_SIZE] = victim.deq[oldTop+i % QUEUE_SIZE]; }… Compute length of victim’s stealRange, (victim’s DEQueue length is at least twice as much) 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
132
stealTop (Part One) Diff is a minimal bound on the difference in
public int stealTop(EDEQueue victim) { long oldStealRange = victim.stealRange; int oldLast = getLast(oldStealRange); int oldTop = getTop(oldStealRange); int oldTag = getTag(oldStealRange); int deqBot = victim.bot; int rangeLen = oldStealRange.getSize(); int diff = 2*rangeLen – this.deq.length; if (diff <= 1) return 0; else { int numToSteal = diff/2 for (int i = 0; i < numToSteal; i++) this.deq[this.bottom+i % QUEUE_SIZE] = victim.deq[oldTop+i % QUEUE_SIZE]; }… Diff is a minimal bound on the difference in lengths between victim and thief 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
133
If we can’t equalize by stealing – don’t steal!!
stealTop (Part One) public int stealTop(EDEQueue victim, int thiefLen) { long oldStealRange = victim.stealRange; int oldLast = getLast(oldStealRange); int oldTop = getTop(oldStealRange); int oldTag = getTag(oldStealRange); int deqBot = victim.bot; int rangeLen = oldStealRange.getSize(); int diff = 2*rangeLen – thiefLen; if (diff <= 1) return 0; else { int numToSteal = diff/2 for (int i = 0; i < numToSteal; i++) this.deq[this.bottom+i % QUEUE_SIZE] = victim.deq[oldTop+i % QUEUE_SIZE]; }… If we can’t equalize by stealing – don’t steal!! 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
134
stealTop (Part One) public int stealTop(EDEQueue victim) { long oldStealRange = victim.stealRange; int oldLast = getLast(oldStealRange); int oldTop = getTop(oldStealRange); int oldTag = getTag(oldStealRange); int deqBot = victim.bot; int rangeLen = oldStealRange.getSize(); int diff = 2*rangeLen – this.deq.length; if (diff <= 1) return 0; else { int numToSteal = diff/2 for (int i = 0; i < numToSteal; i++) this.deq[this.bottom+i % QUEUE_SIZE] = victim.deq[oldTop+i % QUEUE_SIZE]; }… Try to steal half the guaranteed difference: Copy threads-to-be-stolen to thief’s deque 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
135
stealTop (Part Two) public int stealTop(EDEQueue victim) { …
int newRangeLen= max(1,power of 2 closest to half the remaining threads); newTop = (oldTop+numToSteal) % DEQUE_SIZE; newLast = (newTop + newRangeLen – 1) % DEQUE_SIZE; long newRange = makeStealRange(oldTag+1, newTop, newLast); if (victim.stealRange.CAS(oldStealRnage, newRange)) { this.bottom = (this.bottom + numToSteal) % DEQUE_SIZE; this.updateStealRange(); return numToSteal; } return 0; 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
136
stealTop (Part Two) public int stealTop(EDEQueue victim) { … int newRangeLen= max(1,power of 2 closest to half the remaining threads); newTop = (oldTop+numToSteal) % DEQUE_SIZE; newLast = (newTop + newRangeLen – 1) % DEQUE_SIZE; long newRange = makeStealRange(oldTag+1, newTop, newLast); if (victim.stealRange.CAS(oldStealRnage, newRange)) { this.bottom = (this.bottom + numToSteal) % DEQUE_SIZE; this.updateStealRange(); return numToSteal; } return 0; The new length of the victim’s stealRange is about half the remaining number of threads 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
137
stealTop (Part Two) public int stealTop(EDEQueue victim) { … int newRangeLen= max(1,power of 2 closest to half the remaining threads); newTop = (oldTop+numToSteal) % DEQUE_SIZE; newLast = (newTop + newRangeLen – 1) % DEQUE_SIZE; long newRange = makeStealRange(oldTag+1, newTop, newLast); if (victim.stealRange.CAS(oldStealRnage, newRange)) { this.bottom = (this.bottom + numToSteal) % DEQUE_SIZE; this.updateStealRange(); return numToSteal; } return 0; Try to update victim’s stealRange to reflect the theft and the new range length 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
138
stealTop (Part Two) If succeeded, update thief’s bottom and stealRange
public int stealTop(EDEQueue victim) { … int newRangeLen= max(1,power of 2 closest to half the remaining threads); newTop = (oldTop+numToSteal) % DEQUE_SIZE; newLast = (newTop + newRangeLen – 1) % DEQUE_SIZE; long newRange = makeStealRange(oldTag+1, newTop, newLast); if (victim.stealRange.CAS(oldStealRnage, newRange)) { this.bottom = (this.bottom + numToSteal) % DEQUE_SIZE; this.updateStealRange(); return numToSteal; } return 0; If succeeded, update thief’s bottom and stealRange to include new threads, and return # stolen threads 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
139
Details Works even if someone steals from thief
Thief may fail to update own stealRange But will still update bottom, making theft happen 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
140
Big Picture This code steals as much as it can More sensible to
Split the difference? May depend on stealing strategy 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
141
Vulnerability If queue size hovers around power of 2, performance will be lousy Extra credit Can we avoid this problem? 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
142
Conclusions “Boutique” lock-free structures
Not general purpose Customized for work-stealing Non-trivial correctness issues 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
143
Alternative: Gang Scheduling
processor 1 processor 2 processor 3 processor 4 time Bad Example: 4-process computation with 1-process computation on 4-processor machine. Good Example: Data-parallel programs with large working sets. 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
144
Alternative: Process Control
processor 1 processor 2 processor 3 processor 4 time process killed new process created Each computation creates and kills processes dynamically to equal number of processors assigned to it. 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
145
Clip Art 30-Nov-18 M. Herlihy & N. Shavit (c) 2003
146
T O M M A R V O L O R I D D L E 30-Nov-18
M. Herlihy & N. Shavit (c) 2003
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.