Who's Afraid of a Big Bad Lock
Nir Shavit, Sun Labs at Oracle
Joint work with Danny Hendler, Itai Incze, and Moran Tzafrir
Copyright © 2010, Oracle and/or its affiliates. All rights reserved.

Multicore Software Scaling
[figure: speedup of user code on multicore hardware, e.g. 1.8x, 3.6x, 7x]
Unfortunately, it is not so simple…

Amdahl's Law
Speedup = 1 / (SequentialPart + ParallelPart/N)
Pay for N = 8 cores with SequentialPart = 25%, and the speedup is only 2.9x! Why? As the number of cores grows, the effect of the 25% sequential part becomes more acute: 2.3x on 4 cores, 2.9x on 8, 3.4x on 16, 3.7x on 32, … 4.0x at infinity.
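The numbers on this slide follow directly from the formula; a minimal sketch (Python is not the language of the original work, just a convenient calculator here):

```python
def amdahl_speedup(sequential_part, n_cores):
    """Amdahl's law: speedup = 1 / (sequential + parallel / N)."""
    parallel_part = 1.0 - sequential_part
    return 1.0 / (sequential_part + parallel_part / n_cores)

# A 25% sequential part caps speedup at 1/0.25 = 4x no matter how many cores:
for n in (4, 8, 16, 32):
    print(n, "cores:", round(amdahl_speedup(0.25, n), 1), "x")
```

Running this reproduces the slide's series: 2.3x, 2.9x, 3.4x, 3.7x, approaching 4.0x.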
Amdahl and Shared Data Structures
[figure: 75% of the work is unshared, 25% accesses a shared data structure, under coarse-grained vs. fine-grained locking]
The coarse-grained 25% shared part is the reason we get only 2.9x speedup. Fine-grained parallelism on the shared structure has a huge performance benefit.

But…
Can we always draw the right conclusions from Amdahl's law? Claim: sometimes the overhead of using fine-grained synchronization is so high that it is better to have a single thread do all the work sequentially in order to avoid it.

Software Combining Tree [Yew et al.]
n requests are combined in log n time on the shared object. But the tree requires a major coordination effort: multiple CAS operations, cache misses, etc.

Oyama et al.
[figure: a mutex guards the object; threads CAS their requests a, b, c, d onto a list at Head; the lock holder applies a, b, c, and d to the object, returns the responses, and releases the lock]
Every request involves a CAS.

Flat Combining
Have a single lock holder collect and perform the requests of all others:
– Without using CAS operations to coordinate requests
– With combining of requests (if the cost of k batched operations is less than that of k operations in sequence, we win)

Flat Combining
[figure: threads post requests (Enq(d), Deq(), …) as records on a publication list; one thread acquires the object lock with a CAS, becomes the combiner, traverses the list, applies the pending requests to the object, writes back the responses, and then tries to collect requests again]
Most requests do not involve a CAS; in fact, not even a memory barrier.
Flat-Combining Publication-List Cleanup
Every combiner increments a pass counter and updates a record's timestamp when returning its response. It then traverses the list and removes records with old timestamps. If a thread reappears later, it must add itself back to the publication list. Cleanup requires no CAS, only reads and writes.
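A single-threaded sketch of one combining pass with the cleanup rule folded in. The records-as-dicts shape, the `SeqCounter` object, and the `CLEANUP_AGE` threshold are all illustrative assumptions, not the paper's actual constants:

```python
class SeqCounter:
    """Trivial sequential object: apply(delta) adds and returns the total."""
    def __init__(self):
        self.value = 0
    def apply(self, delta):
        self.value += delta
        return self.value

CLEANUP_AGE = 10  # prune records idle for this many passes (arbitrary choice)

def combine(records, obj, pass_counter):
    """One combiner pass: apply pending requests, stamp records, prune stale ones."""
    pass_counter += 1
    for rec in records:
        if not rec["done"]:
            rec["response"] = obj.apply(rec["op"])
            rec["done"] = True
            rec["stamp"] = pass_counter    # plain write, no CAS
    # Cleanup: drop records that have been idle too long; the owning thread
    # must re-add itself to the publication list if it comes back.
    records[:] = [r for r in records if pass_counter - r["stamp"] <= CLEANUP_AGE]
    return pass_counter
```

Note that both the apply step and the pruning step are ordinary reads and writes by the one thread holding the combiner lock.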
Fine-Grained FIFO Queue
The lock-free algorithm by Michael and Scott, shipped in JDK 6.0 (on more than 10 million desktops).
[figure: Head → a → b → c → d ← Tail; P: Dequeue() ⇒ a; Q: Enqueue(d) via CAS]
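For contrast with flat combining, here is a sketch of the Michael-Scott algorithm. Python has no hardware CAS, so `AtomicRef.cas` simulates one with a lock; the control flow is the published algorithm's, but this is a teaching sketch, not a production implementation:

```python
import threading

class AtomicRef:
    """Simulated compare-and-set; real implementations use a hardware CAS."""
    def __init__(self, ref):
        self._ref = ref
        self._lock = threading.Lock()
    def get(self):
        return self._ref
    def cas(self, expected, new):
        with self._lock:
            if self._ref is expected:
                self._ref = new
                return True
            return False

class Node:
    def __init__(self, value=None):
        self.value = value
        self.next = AtomicRef(None)

class MSQueue:
    def __init__(self):
        dummy = Node()                       # sentinel node
        self.head = AtomicRef(dummy)
        self.tail = AtomicRef(dummy)

    def enqueue(self, value):
        node = Node(value)
        while True:
            tail = self.tail.get()
            if tail.next.cas(None, node):    # link the new node at the end
                self.tail.cas(tail, node)    # swing tail (others may help)
                return
            self.tail.cas(tail, tail.next.get())  # help a lagging enqueuer

    def dequeue(self):
        while True:
            head = self.head.get()
            tail = self.tail.get()
            nxt = head.next.get()
            if head is tail:
                if nxt is None:
                    return None              # queue is empty
                self.tail.cas(tail, nxt)     # help finish a pending enqueue
            elif self.head.cas(head, nxt):
                return nxt.value             # dequeued node becomes the new dummy
```

Every enqueue and dequeue costs at least one CAS (plus retries under contention), which is exactly the per-operation coordination cost flat combining avoids.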
Flat-Combining FIFO Queue
Plug a sequential FIFO queue in as the flat-combining object.
[figure: publication list of Enq(b), Deq(), … requests in front of a sequential queue Head → a → b → c → d ← Tail]
OK, but we can do better… combining: collect all items into a "fat node" and enqueue them in one step.

Flat-Combining FIFO Queue
[figure: the combiner collects the pending enqueues into one "fat node" and links it into the sequential queue in a single step]
A "fat node" is easy to maintain sequentially, but cannot be done in a concurrent algorithm without CAS.
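A sketch of the sequential fat-node queue the combiner would operate on. The capacity constant and class name are illustrative assumptions; the point is that a batch of k collected enqueues becomes one link operation:

```python
from collections import deque

FAT_NODE_CAPACITY = 64  # arbitrary per-node capacity for the sketch

class FatNodeQueue:
    """Sequential FIFO queue whose nodes each hold up to FAT_NODE_CAPACITY items."""
    def __init__(self):
        self._nodes = deque()   # each entry is a deque of items (a "fat node")

    def enqueue_batch(self, items):
        # The combiner collects all pending Enq() requests and appends them
        # as fat nodes: one link operation covers a whole batch of items.
        for i in range(0, len(items), FAT_NODE_CAPACITY):
            self._nodes.append(deque(items[i:i + FAT_NODE_CAPACITY]))

    def dequeue(self):
        while self._nodes:
            node = self._nodes[0]
            if node:
                return node.popleft()
            self._nodes.popleft()   # drop the emptied fat node
        return None
```

Only the single combiner ever touches `_nodes`, which is why this works here but would need CAS (and lose the batching) in a fully concurrent algorithm.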
Linearizable FIFO Queue
[throughput graph, higher is better: flat combining outperforms the combining tree, the MS queue, Oyama, and log-synch]

Benefits of Flat Combining
[log-scale graph, higher is better; flat combining in red]

Linearizable Stack
[throughput graph, higher is better: flat combining vs. the elimination stack and Treiber's lock-free stack]

Priority Queue
[throughput graph, higher is better: flat combining with a sequential pairing heap plugged in, vs. the Lotan-Shavit lock-based and lock-free SkipQueues]
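The structure plugged into flat combining here is an ordinary sequential pairing heap. A minimal sketch of that heap (two-pass delete-min); the class layout is my own, not taken from the paper's code:

```python
class PairingHeap:
    """Sequential min pairing heap, the kind of structure one can plug
    into flat combining as the underlying priority queue."""
    class Node:
        __slots__ = ("key", "children")
        def __init__(self, key):
            self.key = key
            self.children = []

    def __init__(self):
        self._root = None

    @staticmethod
    def _meld(a, b):
        if a is None: return b
        if b is None: return a
        if b.key < a.key:
            a, b = b, a
        a.children.append(b)       # larger root becomes a child of the smaller
        return a

    def insert(self, key):
        self._root = self._meld(self._root, self.Node(key))

    def delete_min(self):
        if self._root is None:
            return None
        out = self._root.key
        kids = self._root.children
        # Two-pass pairing: meld adjacent pairs left to right,
        # then fold the results right to left.
        paired = [self._meld(kids[i], kids[i + 1] if i + 1 < len(kids) else None)
                  for i in range(0, len(kids), 2)]
        root = None
        for h in reversed(paired):
            root = self._meld(root, h)
        self._root = root
        return out
```

Because the combiner is the only thread touching the heap, no concurrent priority-queue algorithm is needed at all, which is where the speedup over the SkipQueues comes from.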
Parallel FC Synchronous Queues
[throughput graph, higher is better: parallel flat combining vs. single flat combining, JDK 6.0 (with and without parks), and an elimination tree]
Parallel FC scales, and even single-thread FC performance is still better than the JDK.

Why?
[log-scale graph, higher is better; parallel flat combining in blue]

Summary
FC provides superior linearizable implementations of quite a few structures. Parallel FC, when applicable, adds scalability to FC. Both are a good fit for heterogeneous architectures. But FC and parallel FC are not always applicable, for example for search trees (we tried).

In the Future: Data Structures Will Have to Adapt

But How? Randomized, Relaxed, and Flat
A lot more randomization (hash tables, skip lists, randomized collections). Relaxed fairness (unordered pools instead of queues and stacks). A move away from comparison-based algorithms (tries and hash tables instead of trees).

Figuring out what these future structures will look like is part of what we do at ScaleSynch… Thanks