Contention in shared memory multiprocessors Multiprocessor synchronization algorithms (20225241) Lecturer: Danny Hendler Definitions Lower bound for consensus.

1 Contention in shared-memory multiprocessors. Multiprocessor synchronization algorithms (20225241). Lecturer: Danny Hendler. Outline: definitions; lower bound for consensus; lower bounds for counters, stacks and queues.

2 Contention in shared-memory systems. Contention: the extent to which processes access the same memory locations simultaneously. When multiple processes write to the same memory location simultaneously, they are stalled. High contention hurts performance!

3 Memory stalls and write-contention (model: Dwork, Herlihy, Waarts, 1993). (Figure: processes p0, p1, p2, …, pj access the same variable simultaneously and are served one at a time, so their stall counts range from 0 to j.) Write-contention is the maximum number of processes that can be enabled to perform a write or read-modify-write operation to the same memory location simultaneously. What is the write-contention of the combining-tree counter?
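
A minimal sketch of the two quantities on this slide, under a simplified version of the model that is an assumption of this sketch. Each process is described only by the step it is about to take, and only writes and RMW operations count toward write-contention. Processes that access one variable at the same time are served one at a time, so the i-th one served waits behind i others. The function names and the `poised` dictionary format are hypothetical.

```python
from collections import Counter

WRITES = {"write", "rmw"}  # step kinds that count toward write-contention

def write_contention(poised):
    """poised: dict process -> (step_kind, location) for the step each process
    is about to take. Returns the largest number of processes poised to write
    or apply an RMW operation to the same location."""
    per_location = Counter(loc for kind, loc in poised.values() if kind in WRITES)
    return max(per_location.values(), default=0)

def stalls_for_batch(service_order):
    """Simplified stall accounting: processes that access one variable at the
    same time are served one by one; the i-th served (0-based) incurs i stalls."""
    return {proc: i for i, proc in enumerate(service_order)}

poised = {"p0": ("write", "x"), "p1": ("rmw", "x"), "p2": ("write", "x"),
          "p3": ("read", "x"), "p4": ("write", "y")}
print(write_contention(poised))              # 3 (p0, p1, p2 on x)
print(stalls_for_batch(["p0", "p1", "p2"]))  # {'p0': 0, 'p1': 1, 'p2': 2}
```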

4 Recall the consensus implementation we saw… We use a single object, C, that supports the compare&swap and read operations. Initially C = null.
Decide(v); code for process p_i
1. CAS(C, null, v)
2. return C
What is the write-contention of this algorithm? Answer: n, since all n processes may be poised to apply CAS to C at the same time. It can be shown that every wait-free consensus algorithm has write-contention n.
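
A sketch of the Decide(v) code above in Python. Python has no hardware compare&swap, so the hypothetical `CASRegister` class emulates an atomic CAS and read with a lock. That emulation is an assumption of the sketch, not part of the algorithm. `None` plays the role of null, so inputs must not be `None`. Eight threads run `decide` concurrently, and all of them may be poised to apply CAS to the same object C.

```python
import threading

class CASRegister:
    """Simulated base object supporting read and compare&swap.
    The lock only emulates the atomicity a hardware CAS would provide."""
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

    def read(self):
        with self._lock:
            return self._value

C = CASRegister(None)  # Initially C = null

def decide(v):
    C.compare_and_swap(None, v)  # 1. CAS(C, null, v): only the first CAS succeeds
    return C.read()              # 2. return C: everyone returns the winner's input

results = {}

def run(i):
    results[i] = decide(f"v{i}")

threads = [threading.Thread(target=run, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(set(results.values()))  # a set with exactly one decided value
```

Every thread returns the value written by whichever CAS succeeded first, which gives agreement; validity holds because that value is some thread's input.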

5 What can we say about the worst-case contention-aware time complexity of objects such as counters, stacks and queues?

6 Naïve counter implementation. (Figure: processes 1 to 6 all apply FAI (fetch&increment) to a single FAI object.) The last processes to succeed incur Θ(n) time complexity! Can we do much better?
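
The naïve counter can be sketched as a single round in which all n processes are poised to apply FAI to the one shared object. The memory serves them one at a time, a simplifying assumption of this sketch that matches the stall accounting of slide 3. The function `naive_counter_round` is hypothetical. It shows that the responses are 0, …, n-1 and that the last process served incurs n-1 stalls, which is Θ(n).

```python
def naive_counter_round(initial, service_order):
    """All processes in service_order are poised to apply FAI to the single
    shared counter at the same time. The k-th process served (0-based) gets
    initial + k and incurs k stalls (it waits behind k earlier accesses)."""
    value = initial
    responses, stalls = {}, {}
    for k, proc in enumerate(service_order):
        responses[proc] = value  # fetch&increment returns the old value
        value += 1
        stalls[proc] = k
    return value, responses, stalls

n = 6
final, responses, stalls = naive_counter_round(0, list(range(1, n + 1)))
print(final)                 # 6
print(max(stalls.values()))  # 5, i.e. n - 1
```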

7 We will see a time lower bound of Ω(√n) on lock-free implementations of counters, stacks, queues, … (Hendler, Shavit, 2003). Any algorithm either (a) suffers high contention or (b) suffers high latency (step complexity).

8 The memory-steps metric. #read-objects: the number of distinct base objects read by an operation. Memory stalls: the total number of memory stalls incurred by an operation. memory-steps = #read-objects + memory-stalls. We investigate the worst-case number of memory-steps incurred by a single high-level operation.
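
A minimal sketch of the metric, assuming a hypothetical trace format in which each step of one high-level operation is recorded as (kind, object, stalls incurred). A base object read several times still counts once toward #read-objects, while every stall counts.

```python
def memory_steps(trace):
    """trace: list of (kind, obj, stalls) for the steps of one operation, where
    kind is 'read', 'write' or 'rmw' (hypothetical format).
    memory-steps = #read-objects + memory-stalls."""
    read_objects = {obj for kind, obj, _ in trace if kind == "read"}
    total_stalls = sum(s for _, _, s in trace)
    return len(read_objects) + total_stalls

trace = [("read", "a", 0), ("read", "b", 2), ("read", "a", 0), ("rmw", "b", 3)]
print(memory_steps(trace))  # 2 distinct objects read + 5 stalls = 7
```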

9 Capturing influence between processes. Time complexity is determined by the extent to which operations by different processes influence each other.

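10 Influence level. (Figure: a shared counter holding 17. Process p, about to apply FAI, thinks: "Hmmm… I will soon request a value." The other processes reply: "Each of us may precede you and modify the value you will get!") The influence level (w.r.t. p) is the number of processes that can modify the value p will get.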
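
A sketch of the influence level. It assumes we work only with the object's sequential specification and treat each other process's pending operation as a single step linearized just before p's, which simplifies the definition used in the lower-bound proof. The helpers `fai`, `read` and `influence_level` are hypothetical names. For the counter, every pending FAI changes p's response, so every other process counts; pending reads change nothing.

```python
def fai(state):
    """Sequential fetch&increment: returns (new_state, response)."""
    return state + 1, state

def read(state):
    """Sequential read: leaves the state unchanged."""
    return state, state

def influence_level(state, p_op, other_ops):
    """Count the other processes whose pending operation, if linearized just
    before p's, changes the response p would get running solo from state."""
    _, solo_response = p_op(state)
    count = 0
    for q_op in other_ops:
        after_q, _ = q_op(state)
        _, response = p_op(after_q)
        if response != solo_response:
            count += 1
    return count

print(influence_level(17, fai, [fai] * 5))   # 5: each FAI would change p's value
print(influence_level(17, fai, [read] * 5))  # 0: reads never change it
```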

11 Modifying steps. (Figure: the shared counter holds 17; p thinks "Hmmm… I will soon request a value"; the other processes, among them q, reply "Each of us may precede you!"; q is about to apply FAI.)

14 Modifying steps. (Figure: q's FAI has returned 17 and the shared counter now holds 18.) There is an atomic step in which q modifies p's solo-execution response. We bring all the "influencers" to be on the verge of performing a modifying step.

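15 Space/write-contention tradeoff. We bring all I influencers to be on the verge of a modifying step. Each modifying step is necessarily a write or RMW operation, so at most C of them can be pending on the same base object, and the influencers must be poised on at least I/C distinct base objects: $S \geq I / C$, where S is the space complexity, I is the influence level and C is the write-contention.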
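
The counting step behind $S \geq I / C$ is a pigeonhole argument. In the sketch below, a hypothetical `assignment` dictionary maps each influencer to the base object it is poised to modify. The function checks the write-contention bound and counts the distinct objects. Even packing the influencers as tightly as the bound allows needs ⌈I/C⌉ objects.

```python
import math

def distinct_objects_needed(assignment, C):
    """assignment: dict influencer -> base object it is poised to modify.
    Checks that no object has more than C poised modifiers (write-contention C)
    and returns how many distinct objects are used."""
    per_object = {}
    for proc, obj in assignment.items():
        per_object[obj] = per_object.get(obj, 0) + 1
    assert max(per_object.values()) <= C, "write-contention exceeded"
    return len(per_object)

I, C = 10, 3
# Pack the I influencers into objects as tightly as write-contention C allows.
tight = {proc: proc // C for proc in range(I)}
print(distinct_objects_needed(tight, C), math.ceil(I / C))  # 4 4
```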

16 Latency/contention tradeoff. (Figure: the shared counter holds 17, p is about to apply FAI, and the influencers' outstanding modifying steps are pending on a set of base objects.) Process p can be made to read all these base objects in the course of its operation. Since there are at least I/C such objects, $L_R \geq I / C$, where $L_R$ is the number of base objects read, I is the influence level and C is the write-contention.

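17 Time lower bound. Multiplying out gives $L_R \cdot C \geq I$, so $\max(L_R, C) \geq \sqrt{I}$. Since both the read objects and the stalls count toward an operation's memory-steps, the time complexity is $\Omega(\sqrt{I})$.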
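
The step from $L_R \cdot C \geq I$ to a √I bound is plain arithmetic. The smallest possible value of max(L_R, C) over positive integers with L_R · C ≥ I is ⌈√I⌉. The brute-force sketch below checks this for a few values of I. The names `min_max_tradeoff` and `ceil_sqrt` are hypothetical helpers.

```python
import math

def min_max_tradeoff(I):
    """Smallest value of max(L_R, C) over positive integers with L_R * C >= I,
    found by brute force over all write-contention values C."""
    best = I
    for C in range(1, I + 1):
        L_R = -(-I // C)  # ceil(I / C): the fewest read objects allowed for this C
        best = min(best, max(L_R, C))
    return best

def ceil_sqrt(I):
    r = math.isqrt(I)
    return r if r * r == I else r + 1

for I in (1, 2, 10, 99, 100):
    assert min_max_tradeoff(I) == ceil_sqrt(I)
print(min_max_tradeoff(99))  # 10
```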

18 The Influence(n) class of objects. The above lower bound holds for Influence(n), a large class of objects that includes stacks, queues, hash tables, pools, linearizable counters, consensus, approximate agreement, … It also holds for one-time implementations of these objects. Finding the tight bound is a challenging open question.

19 A linear lower bound on the number of stalls for long-lived objects (Fich, Hendler, Shavit, 2005). The metric is slightly different: we count the total number of stalls incurred while accessing multiple objects.

21 Theorem: For any n-process implementation of an obstruction-free counter, the worst-case number of stalls incurred by a process as it performs a fetch&increment operation is at least n-1. An implementation is obstruction-free if every process is guaranteed to complete its operation if it runs solo for long enough.

22 Worst-case stalls number ≥ n-1 for any obstruction-free (OF) counter implementation. Start from an initial state. Fix a process p that is about to perform a fetch&increment operation. Consider the path it takes if it runs uninterrupted, counting only its first access to each shared word.

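26 Worst-case stalls number ≥ n-1. Let $O_1$ be the first word along p's path that is written by some other process in some p-free execution. There must be such a word: otherwise no other process could affect p's solo execution, and p would return the same value no matter how many fetch&increment operations preceded it.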
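
A sketch of how p's path and $O_1$ are defined. It assumes we are given p's solo trace as a list of the words it accesses and, as a hypothetical input, the set of words that other processes write in some p-free execution. Computing that set is the hard part of the proof and is not modeled here. The helper names are hypothetical.

```python
def first_accesses(solo_trace):
    """Keep only the first access to each shared word, preserving order:
    this is p's path in the proof."""
    seen, path = set(), []
    for word in solo_trace:
        if word not in seen:
            seen.add(word)
            path.append(word)
    return path

def find_O1(path, written_by_others):
    """First word on p's path that some other process writes in some p-free
    execution (written_by_others is a hypothetical, precomputed input)."""
    return next((w for w in path if w in written_by_others), None)

path = first_accesses(["w1", "w2", "w1", "w3", "w2", "w4"])
print(path)                         # ['w1', 'w2', 'w3', 'w4']
print(find_O1(path, {"w3", "w4"}))  # w3
```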

27 Worst-case stalls number ≥ n-1. Let $E_1$ be an execution that maximizes, over all p-free executions, the number of processes that are about to write to $O_1$. Let $G_1$ be the set of these processes and $K_1 = |G_1|$.

28 Worst-case stalls number ≥ n-1. If $K_1 = n-1$, we are done: p incurs n-1 stalls when it accesses $O_1$. Otherwise, we show that p must access yet another word that may be written by other processes.

29 Worst-case stalls number ≥ n-1. What happens if p incurs the stalls on $O_1$?

35 Worst-case stalls number ≥ n-1. If p incurs the stalls on $O_1$, the rest of its path may change…

38 Worst-case stalls number ≥ n-1. Assume that p, after incurring the stalls on $O_1$, gets value v.

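39 Worst-case stalls number ≥ n-1. v: the value returned by p if we let it run and incur the stalls. c: the number of fetch&increment operations completed before p starts its operation. Apart from those c operations, only the $K_1$ pending operations of $G_1$ can be linearized before p's, so we have $v \in \{c, \ldots, c + K_1\}$.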

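41 Worst-case stalls number ≥ n-1. Since $K_1 < n-1$, we can select some process $q \notin G_1 \cup \{p\}$. We let q perform $K_1 + 1$ fetch&increment operations. Now p must return a value of at least $c + K_1 + 1$, which lies outside $\{c, \ldots, c + K_1\}$, so p must notice q's operations. Words before $O_1$ on p's path are not written by other processes in p-free executions, and q's writes to $O_1$ are overwritten by the pending writes of $G_1$, so q must write to a word read by p after $O_1$.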
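
The counting behind the choice of $K_1 + 1$ operations can be checked directly. In this sketch, which uses hypothetical helper names, `possible_responses` is the set {c, …, c+K1} from slide 39. `min_response_after_q` is the smallest value p may return once q's K1+1 operations have completed before p starts. The two never meet, so p's execution must change.

```python
def possible_responses(c, K1):
    """Values p may return after incurring the K1 stalls on O1: besides the c
    completed operations, at most the K1 pending operations of G1 can be
    linearized before p's, so v is in {c, ..., c + K1}."""
    return set(range(c, c + K1 + 1))

def min_response_after_q(c, K1):
    """If q completes K1 + 1 fetch&increment operations before p starts,
    at least c + K1 + 1 operations precede p's, so p returns >= c + K1 + 1."""
    return c + (K1 + 1)

c, K1 = 17, 3
assert min_response_after_q(c, K1) not in possible_responses(c, K1)
print(sorted(possible_responses(c, K1)), min_response_after_q(c, K1))
# [17, 18, 19, 20] 21
```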

44 Worst-case stalls number ≥ n-1. Let $O_2$ be the first word accessed by p after it incurs the $K_1$ stalls that is written by some process $\notin G_1 \cup \{p\}$. Let $E_2$ be an execution that maximizes, over all $(G_1 \cup \{p\})$-free executions, the number of processes that are about to write to $O_2$. Let $G_2$ be the set of these processes and $K_2 = |G_2|$.

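45 Worst-case stalls number ≥ n-1. Continuing with this construction, we get words $O_1, O_2, \ldots, O_m$ on p's path and disjoint sets of processes $G_1, \ldots, G_m$ with $|G_i| = K_i$, where the processes of $G_i$ are poised to write to $O_i$. The construction stops only when every process other than p belongs to some $G_i$, so p incurs $\sum_{i=1}^{m} K_i = n-1$ stalls.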

46 Conclusion: the “naïve” implementation is the best possible! (In terms of worst-case execution.) (Figure: processes 1 to 6 applying FAI to a single FAI object.)

