A Dynamic Elimination-Combining Stack Algorithm
Gal Bar-Nissan, Danny Hendler and Adi Suissa
Department of Computer Science, BGU, January 2011
Presented by: Ilya Mirsky, 28.03.2011
Outline
- Concurrent programming terms
- Motivation
- Introduction
- DECS: The Algorithm
- DECS Performance evaluation
- NB-DECS
- Summary
Concurrent programming terms
- Locks (coarse- and fine-grained)
- Non-blocking algorithms
  - Wait-freedom
  - Lock-freedom
  - Obstruction-freedom
- Linearizability
- Memory contention
- Latency
Motivation
Concurrent stacks are widely used in parallel applications and operating systems. A simple implementation using a coarse-grained locking mechanism creates a "hot spot" at the central stack object and poses a sequential bottleneck. There is a need for a scalable concurrent stack that performs well under low, medium, and high workloads, regardless of the ratio of operation types (push/pop).
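The coarse-grained approach the slide criticizes can be sketched as follows (class and member names are illustrative, not taken from the paper):

```cpp
#include <mutex>
#include <optional>
#include <vector>

// Minimal coarse-grained locked stack: every operation serializes on a
// single mutex, so under contention the lock becomes a hot spot and all
// threads make progress one at a time.
template <typename T>
class CoarseLockedStack {
public:
    void push(const T& v) {
        std::lock_guard<std::mutex> g(m_);  // every thread contends here
        data_.push_back(v);
    }
    std::optional<T> pop() {
        std::lock_guard<std::mutex> g(m_);
        if (data_.empty()) return std::nullopt;
        T v = data_.back();
        data_.pop_back();
        return v;
    }
private:
    std::mutex m_;
    std::vector<T> data_;
};
```

This is correct but not scalable: adding threads only adds queuing on the mutex, which is exactly the bottleneck DECS is designed to avoid.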
Introduction
Two key synchronization paradigms for the construction of scalable concurrent data structures are software combining and elimination. The most scalable concurrent stack algorithm previously known is the lock-free elimination-backoff stack (Hendler, Shavit, Yerushalmi). The HSY stack is highly efficient under low contention, as well as under high contention when the workload is symmetric. Unfortunately, when workloads are asymmetric, the performance of HSY deteriorates to that of a sequential stack. Flat combining (by Hendler et al.) significantly outperforms HSY at low and medium contention levels, but it does not scale, and even deteriorates at high contention levels.
Introduction - DECS
DECS employs both the combining and the elimination mechanisms. It scales well for all workload types and outperforms other stack implementations, while maintaining the simplicity and low overhead of the HSY stack. It uses a contention-reduction layer, the elimination-combining layer, as a backoff scheme for a central stack. A non-blocking implementation is also presented: NB-DECS, a lock-free variant of DECS in which threads that have waited for too long may cancel their "combining contract" and retry their operation on the central stack.
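The combining half of the mechanism can be sketched as list splicing over per-thread operation records. The field names follow the MultiOp structure shown later in the talk, but the code itself is an assumed sketch, not the paper's implementation:

```cpp
// Hypothetical sketch of combining: the active collider splices the passive
// collider's operation list onto its own, so that a single thread can later
// apply all the operations in one access to the central stack.
struct MultiOp {
    int length;     // how many operations this record carries
    MultiOp* next;  // next delegated record in the list
    MultiOp* last;  // tail of the list rooted at this record
};

void combine(MultiOp& active, MultiOp& passive) {
    active.last->next = &passive;     // append passive's whole list
    active.last = passive.last;       // new tail of the combined list
    active.length += passive.length;  // one larger multi-operation
}
```

The payoff is that operations with identical semantics (e.g. two pushes) still reduce contention: only one of the two threads goes on to fight for the central stack.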
Introduction - DECS
[Figure sequence: a central stack with an elimination-combining layer in front of it. Threads that fail on the central stack back off into the layer; a thread may wait there ("zzz…") until a partner arrives and wakes it, so matching operations meet in the layer instead of contending on the central stack.]
DECS - The Algorithm
The data structures:
- The elimination-combining layer: a location array and a collision array.
- MultiOp record: int id; int op; int length; int cStatus; Cell cell; MultiOp next; MultiOp last;
- Cell: Data data; Cell next;
- The central stack: a linked list of Cells.
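The fields above can be transcribed into compilable C++ (pointer types are made explicit; the concrete Data type and the status constants are assumptions inferred from later slides):

```cpp
// Transcription of the slide's data structures. Cell carries one pushed
// item; MultiOp describes one thread's pending (possibly combined) work.
struct Cell {
    void* data;   // the slide's "Data data" field, type left unspecified
    Cell* next;
};

struct MultiOp {
    int id;         // owning thread's id
    int op;         // operation type: push or pop
    int length;     // number of operations carried by this record
    int cStatus;    // collision status (e.g. INIT, FINISHED on later slides)
    Cell* cell;     // list of cells holding the pushed data
    MultiOp* next;  // next delegated operation record
    MultiOp* last;  // tail of the delegated-operation list
};
```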
DECS - The Algorithm
[Figure: threads concurrently invoke push(data1), push(data2), and pop() on the central stack; a contending thread hopes to find another thread in a similar situation to collide with.]
DECS - The Algorithm
pop: multiOp tInfo = initMultiOp();
push: multiOp tInfo = initMultiOp(data);
DECS - The Algorithm
[Figure: thread 6, a PUSH carrying data1, publishes its MultiOp (id = 6, op = PUSH, length = 1, cStatus = INIT) in the location array and waits as the passive collider ("I'll wait, maybe someone will arrive…"); thread 2, a POP with an empty MultiOp, reads the collision array, finds thread 6, and becomes the active collider ("Yay, I can collide with thread 6!").]
DECS - The Algorithm
Central Stack Functions
DECS - The Algorithm
[Figure: while thread 6 sleeps, thread 2 (the active collider) inspects both MultiOp records: "I see that T. 6 got PUSH, and I got POP - we can eliminate!"]
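The elimination step thread 2 describes can be sketched as a direct hand-off between the two records. The resulting field values match the FINISHED / length = 0 state shown on the next slide, but the code itself is an assumed sketch:

```cpp
struct Cell { int data; Cell* next; };
enum { STATUS_INIT, STATUS_FINISHED };

struct MultiOp {
    int id, op, length, cStatus;
    Cell* cell;
};

// Assumed elimination between a PUSH and a POP MultiOp: the active collider
// hands the pushed cell to the pop and marks both records finished, so
// neither operation ever touches the central stack.
void eliminate(MultiOp& pushOp, MultiOp& popOp) {
    popOp.cell = pushOp.cell;  // the pop returns the pushed data
    pushOp.cell = nullptr;
    pushOp.length = 0;
    popOp.length = 0;
    pushOp.cStatus = STATUS_FINISHED;
    popOp.cStatus = STATUS_FINISHED;
}
```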
DECS - The Algorithm
Elimination-Combining Layer Functions
DECS - The Algorithm
[Figure: thread 2 performs the elimination while thread 6 sleeps ("Working…"): it transfers data1 from thread 6's PUSH record to its own POP record, sets both records' length to 0 and cStatus to FINISHED, and announces "Done!"]
DECS - The Algorithm
[Figure: thread 2 wakes thread 6: "Wake up man, I've done your job!" - "Thank you T. 2, let's go have a beer; I'm buying!"]
DECS Performance Evaluation
Hardware:
- A 128-way UltraSPARC T2 Plus (T5140) server: a 2-chip system in which each chip contains 8 cores, and each core multiplexes 8 hardware threads.
- Running the Solaris 10 OS; the cores in each CPU share the same L2 cache.
- C++ code compiled with GCC with the -O3 flag.
Compared against:
- The Treiber stack
- The HSY elimination-backoff stack
- The flat-combining stack
DECS Performance Evaluation
Course of experiments:
- Threads repeatedly apply operations on the stack for a fixed duration of 1 second, and the resulting throughput is measured, varying the level of concurrency from 1 to 128 threads.
- Throughput is measured on both symmetric and asymmetric workloads.
- Stacks are pre-populated with enough cells so that pop operations do not operate on an empty stack.
- Each data point is the average of 3 runs.
DECS Performance Evaluation
[Graphs: throughput vs. number of threads under symmetric, moderately-asymmetric, and fully-asymmetric workloads.]
NB-DECS
DECS is blocking. For some applications, a non-blocking implementation may be preferable because it is more robust to thread failures. NB-DECS is a lock-free variant of DECS that allows threads that delegated their operations to another thread, and have waited for too long, to cancel their "combining contracts" and retry their operations.
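The cancellation idea can be sketched with a compare-and-swap on the record's status word. The status values and function name here are assumptions; the paper's actual protocol has more states and details:

```cpp
#include <atomic>

// Assumed sketch of NB-DECS cancellation: a waiter that has waited too long
// CASes its status from WAITING to CANCELLED. If its combiner already marked
// the operation FINISHED, the CAS fails and the waiter takes the result
// instead of retrying on the central stack.
enum Status { WAITING = 0, FINISHED = 1, CANCELLED = 2 };

bool tryCancel(std::atomic<int>& status) {
    int expected = WAITING;
    return status.compare_exchange_strong(expected, CANCELLED);
}
```

The single atomic transition is what makes the contract safe to break: exactly one of the waiter and the combiner wins the race on the status word.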
Summary
- DECS comprises a combining-elimination layer, and therefore benefits from collisions between operations with reverse semantics as well as between operations with identical semantics.
- Empirical evaluation showed that DECS outperforms the best known stack algorithms across all workload types.
- NB-DECS provides a lock-free variant for applications that require non-blocking progress.
- The idea of a combining-elimination layer could be used to efficiently implement other concurrent data structures.