Contention in shared memory multiprocessors
Multiprocessor synchronization algorithms (20225241)
Lecturer: Danny Hendler

Outline: definitions; lower bound for consensus; lower bounds for counters, stacks and queues.

Contention in shared-memory systems
Contention: the extent to which processes access the same memory locations simultaneously. When multiple processes simultaneously write to the same memory location, they are stalled. High contention hurts performance!

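Memory stalls & write-contention (model: Dwork, Herlihy, Waarts, 1993)
[Figure: processes p0, p1, p2, …, pj are all about to access the same variable. Their accesses are serialized, and the processes incur 0, 1, 2, …, j stalls respectively: the process served i-th waits for the i accesses ahead of it.]
Write-contention is the maximum number of processes that can be enabled to perform a write or read-modify-write operation on the same memory location simultaneously.
What is the write-contention of the combining-tree counter?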
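To make the stall accounting concrete, here is a minimal Python sketch of a single round of a simplified version of the model. It assumes every enabled process performs exactly one write or RMW, and that the adversary chooses the serialization order; the function names `stalls_in_round` and `write_contention` are hypothetical, not taken from the lecture.

```python
from collections import defaultdict


def stalls_in_round(enabled, order):
    """Stalls incurred when all enabled operations are serialized in `order`.

    enabled: dict mapping each process to the location it is about to write/RMW
    order:   the serialization order chosen by the adversary
    Simplified model: a process waits for every earlier operation on its location.
    """
    served = defaultdict(int)  # location -> operations already applied there
    stalls = {}
    for proc in order:
        loc = enabled[proc]
        stalls[proc] = served[loc]
        served[loc] += 1
    return stalls


def write_contention(enabled):
    """Largest number of processes simultaneously enabled on one location."""
    per_location = defaultdict(int)
    for loc in enabled.values():
        per_location[loc] += 1
    return max(per_location.values(), default=0)


enabled = {"p0": "X", "p1": "X", "p2": "X", "p3": "Y"}
print(stalls_in_round(enabled, ["p0", "p1", "p2", "p3"]))  # {'p0': 0, 'p1': 1, 'p2': 2, 'p3': 0}
print(write_contention(enabled))                            # 3
```

Processes on different locations (here p3 on Y) do not stall each other; only the processes queued on X accumulate stalls.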

Recall the consensus implementation we saw…
Initially C = null. We use a single object, C, that supports the compare&swap and read operations.
Decide(v) ; code for p_i, i = 0, 1
1. CAS(C, null, v)
2. return C
The same code solves consensus for any number n of processes. What is the write-contention of this algorithm? n: all n processes may be enabled to apply CAS to C at the same time. It can be shown that n is the write-contention of any wait-free n-process consensus algorithm.
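The Decide(v) pseudocode can be run as the following Python sketch. Python has no hardware compare&swap, so the hypothetical `CASRegister` class emulates an atomic CAS with a `threading.Lock`, and `None` plays the role of null (so proposals must not be `None`). Every thread is enabled to apply CAS to the same register, which is exactly why the write-contention is n.

```python
import threading


class CASRegister:
    """Shared register with read and compare&swap (a lock emulates atomicity)."""

    def __init__(self, initial=None):
        self._value = initial
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """If the value equals `expected`, replace it by `new`; return the old value."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old


def decide(C, v):
    C.compare_and_swap(None, v)  # 1. CAS(C, null, v): only the first CAS succeeds
    return C.read()              # 2. return C: the winner's proposal


def run_consensus(inputs):
    """Thread i proposes inputs[i]; returns the list of decisions."""
    C = CASRegister()  # initially C = null (None)
    decisions = [None] * len(inputs)

    def worker(i):
        decisions[i] = decide(C, inputs[i])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(inputs))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions


print(run_consensus([10, 20, 30, 40]))  # all entries equal one of the proposals
```

Which proposal wins depends on thread scheduling; agreement and validity hold in every run.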

What can we say about the worst-case contention-aware time complexity of objects such as counters, stacks and queues?

Naïve counter implementation
[Figure: processes 1 to 6 all apply FAI (fetch&increment) to a single FAI object.]
The last processes to succeed incur Θ(n) time complexity! Can we do much better?
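A small Python simulation shows where the Θ(n) comes from. It assumes all n processes are enabled to apply FAI to the one counter at the same moment and that the adversary serves them in index order; `naive_counter_round` is a hypothetical helper, not part of the lecture.

```python
def naive_counter_round(n, start=0):
    """All n processes are enabled to apply FAI to one counter at the same time.

    The process served k-th (0-based) fetches start + k and incurs k stalls,
    waiting for the k FAIs applied ahead of it. Returns (value, stalls) pairs.
    """
    counter = start
    results = []
    for k in range(n):
        value = counter  # fetch ...
        counter += 1     # ... and increment, applied as one atomic step
        results.append((value, k))
    return results


results = naive_counter_round(6, start=17)
print(results[-1])                 # (22, 5): the last process waits n - 1 = 5 times
print(sum(s for _, s in results))  # 15 = n(n - 1) / 2 stalls in total
```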

We will see a time lower bound of √n on lock-free implementations of counters, stacks, queues… (Hendler, Shavit, 2003). Any algorithm either (a) suffers high contention or (b) suffers high latency (step complexity).

The memory-steps metric
#read-objects: the number of distinct base objects read by an operation.
Memory stalls: the total number of memory stalls incurred by an operation.
memory-steps = #read-objects + memory-stalls
We investigate the worst-case number of memory steps incurred by a single high-level operation.
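As an illustration, the metric can be computed from a trace of one operation's accesses. This is a sketch: the `Access` record and the choice to count an RMW as a read of its base object (`READ_KINDS`) are assumptions made here, not definitions from the lecture.

```python
from dataclasses import dataclass

READ_KINDS = {"read", "rmw"}  # assumption: an RMW also reads its base object


@dataclass
class Access:
    kind: str    # "read", "write" or "rmw"
    obj: str     # name of the base object accessed
    stalls: int  # memory stalls incurred by this access


def memory_steps(trace):
    """memory-steps = #read-objects + memory-stalls for one high-level operation."""
    read_objects = {a.obj for a in trace if a.kind in READ_KINDS}
    memory_stalls = sum(a.stalls for a in trace)
    return len(read_objects) + memory_stalls


trace = [Access("read", "A", 0), Access("read", "A", 0),
         Access("rmw", "B", 2), Access("write", "C", 1)]
print(memory_steps(trace))  # 2 distinct read objects (A, B) + 3 stalls = 5
```

Rereading A costs nothing extra, since only distinct base objects count, but every stall is charged.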

Capture influence between processes
[Figure: six processes, numbered 1 to 6.]
Time complexity is determined by the extent to which operations by different processes influence each other.

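Influence level
[Figure: a shared counter holding 17. Process p: "Hmmm… I will soon request a value" (FAI). The other processes: "Each of us may precede you and modify the value you will get!"]
Influence level (w.r.t. p): the number of processes that may precede p and modify the value it will get.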

Modifying steps
[Figure: process p: "Hmmm… I will soon request a value" (FAI); the other processes: "Each of us may precede you!". Process q's FAI goes first: q gets 17 and the shared counter becomes 18.]
There is an atomic step in which q modifies p's solo-execution response. We bring all the "influencers" to be on the verge of performing a modifying step.
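The definition of a modifying step can be phrased as a check: run p solo from the current state, then run q's step first and run p solo again, and compare the responses. This Python sketch assumes a hypothetical naive counter whose FAI is a single atomic step on one word (`fai`), and treats p's whole solo execution as one function call; both are simplifications for illustration.

```python
import copy


def is_modifying_step(state, p_solo_op, q_step):
    """True if applying q_step first changes the response of p's solo operation."""
    before = p_solo_op(copy.deepcopy(state))  # p runs solo from `state`
    after_state = copy.deepcopy(state)
    q_step(after_state)                        # q takes one step first ...
    after = p_solo_op(after_state)             # ... then p runs solo
    return before != after


def fai(state):
    """Hypothetical naive counter: one atomic fetch&increment on a single word."""
    old = state["counter"]
    state["counter"] = old + 1
    return old


def read_counter(state):
    return state["counter"]


print(is_modifying_step({"counter": 17}, fai, fai))           # True: p gets 18, not 17
print(is_modifying_step({"counter": 17}, fai, read_counter))  # False: a read changes nothing
```

The read case shows why every modifying step must be a write or an RMW.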

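Space/write-contention tradeoff
We bring all influencers to be on the verge of a modifying step. Each modifying step is necessarily a write or RMW operation, and at most C of them can be pending on any single base object, so the influencers are poised on at least I/C distinct base objects:
$S \ge I / C$
where S is the space complexity, I the influence level and C the write-contention.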
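The pigeonhole step behind $S \ge I/C$ can be checked on a concrete configuration. In this sketch the map `poised` (influencer to the base object it is about to modify) is hypothetical input; S here counts only the objects with poised influencers, so the real space complexity can only be larger.

```python
from collections import Counter


def space_contention_tradeoff(poised):
    """poised: dict mapping each influencer to the base object it is about to modify.

    Returns (I, C, S): the number of influencers, the largest number of them
    poised on one object, and the number of distinct objects they are poised on.
    Since each object absorbs at most C influencers, S * C >= I.
    """
    per_object = Counter(poised.values())
    I = len(poised)
    C = max(per_object.values())  # requires at least one influencer
    S = len(per_object)
    return I, C, S


I, C, S = space_contention_tradeoff(
    {"q1": "A", "q2": "A", "q3": "A", "q4": "B", "q5": "B", "q6": "D", "q7": "D"})
print(I, C, S, S * C >= I)  # 7 3 3 True
```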

Latency/contention tradeoff
[Figure: the base objects on which there are outstanding modifying steps, and process p ("Hmmm… I will soon request a value") about to apply FAI to the shared counter, which holds 17.]
Process p can be made to read all these base objects in the course of its operation:
$L_R \ge I / C$
where L_R is the number of base objects read, I the influence level and C the write-contention.

Time lower bound
$L_R \cdot C \ge I$, so $L_R$ and $C$ cannot both be smaller than $\sqrt{I}$. Time complexity is at least $\sqrt{I}$.
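The arithmetic of the tradeoff can be checked numerically: whatever split between read objects and write-contention an algorithm chooses, the larger of the two stays at least $\sqrt{I}$. This Python sketch only illustrates the inequality; `min_worst_factor` is a hypothetical name.

```python
from math import sqrt


def min_worst_factor(I):
    """Smallest possible max(L_R, C) over positive integers with L_R * C >= I."""
    best = I
    for C in range(1, I + 1):
        L_R = (I + C - 1) // C  # fewest read objects compatible with contention C
        best = min(best, max(L_R, C))
    return best


for I in (10, 100, 1000):
    print(I, min_worst_factor(I), round(sqrt(I), 2))
# The minimum is ceil(sqrt(I)): balancing L_R and C is the best an algorithm can do.
```

With influence level I = n-1, this is where the √n bound for counters, stacks and queues comes from.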

Influence(n) objects class
The above lower bound holds for Influence(n), a large class of objects that includes stacks, queues, hash tables, pools, linearizable counters, consensus, approximate agreement… It also holds for one-time implementations of these objects. Finding the tight bound is a challenging open question.

A linear lower bound on the number of stalls for long-lived objects (Fich, Hendler, Shavit, 2005)
The metric is slightly different: we count the total number of stalls incurred while accessing multiple objects.

Theorem: Consider any n-process implementation of an obstruction-free counter. Then the worst-case number of stalls incurred by a process as it performs a fetch&increment operation is at least n-1.
An implementation is obstruction-free if every process is guaranteed to terminate its operation if it runs solo for long enough.
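For a concrete counter the theorem applies to, here is a sketch of a CAS retry loop in Python. It is lock-free, which is stronger than obstruction-free: a process running solo succeeds on its first CAS. The `CASWord` and `CASCounter` classes are hypothetical, and a `threading.Lock` stands in for a hardware compare&swap. Since all processes may be poised to CAS the same word, a process can be made to incur n-1 stalls, matching the bound.

```python
import threading


class CASWord:
    """Shared word with read and compare&swap (a lock emulates atomicity)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old


class CASCounter:
    def __init__(self):
        self.word = CASWord(0)

    def fetch_and_increment(self):
        while True:
            old = self.word.read()
            if self.word.compare_and_swap(old, old + 1) == old:
                return old  # our CAS took effect
            # Another process changed the word in between: retry.
            # A process running solo succeeds on its first attempt.


def run(counter, n_threads=4, ops_per_thread=100):
    results = [[] for _ in range(n_threads)]

    def worker(i):
        for _ in range(ops_per_thread):
            results[i].append(counter.fetch_and_increment())

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(v for r in results for v in r)


print(run(CASCounter()) == list(range(400)))  # True: every value handed out once
```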

Worst-case stalls number ≥ n-1 for any OF counter implementation
Start from an initial state. Fix a process p about to perform a fetch&increment operation. Consider the path it takes if it runs uninterrupted, when only first accesses to shared words are considered.
[Figure: p's path through the shared words it accesses.]
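"Only first accesses" means the path lists each shared word once, in the order p first touches it. A one-line Python sketch, applied to a hypothetical trace of word names from p's solo run:

```python
def first_access_path(accesses):
    """Keep only the first access to each shared word, preserving order."""
    return list(dict.fromkeys(accesses))  # dicts keep insertion order (Python 3.7+)


# Hypothetical trace of the words p accesses while running solo.
solo_trace = ["W1", "W2", "W1", "W3", "W2", "W4"]
print(first_access_path(solo_trace))  # ['W1', 'W2', 'W3', 'W4']
```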

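Worst-case stalls number ≥ n-1
Let O1 be the first word along p's path that is written by some other process in some p-free execution. There must be such a word: otherwise p, running solo, could not tell whether other processes had performed fetch&increment operations, and would return the same value whether or not they had.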

Worst-case stalls number ≥ n-1
Let E1 be an execution that maximizes the number of processes that are about to write to O1 over all p-free executions. Let G1 be the set of these processes and K1 = |G1|.

Worst-case stalls number ≥ n-1
If K1 = n-1, we are done: p incurs n-1 stalls when it accesses O1. Otherwise, we show that p must access yet another word that may be written by other processes.

Worst-case stalls number ≥ n-1
What happens if p incurs the stalls on O1? But now the rest of p's path may change... Assume p gets value v.

Worst-case stalls number ≥ n-1
v: the value returned by p if we let it run and incur the stalls.
c: the number of fetch&increment operations completed before p starts its operation.
We have: $v \in \{c, \ldots, c + K_1\}$.
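The range for v comes from linearizability: p's fetch&increment can be ordered after any number between 0 and K1 of the concurrent operations by G1. This Python sketch assumes those c completed and K1 concurrent operations are the only ones that can precede p's, and `possible_values` is a hypothetical helper. It also shows why the next step lets q perform K1+1 operations: that pushes every possible return value past c+K1.

```python
def possible_values(c, k):
    """Values a linearizable counter may return to p when c fetch&increments
    completed before p started and k other fetch&increments (those of G1)
    are concurrent with p's: p may be linearized after 0, 1, ..., k of them.
    """
    return set(range(c, c + k + 1))


c, K1 = 5, 3
before = possible_values(c, K1)            # {5, 6, 7, 8}
after_q = possible_values(c + K1 + 1, K1)  # q completed K1 + 1 more FAIs first
print(sorted(before), sorted(after_q), before.isdisjoint(after_q))
# [5, 6, 7, 8] [9, 10, 11, 12] True
```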

Worst-case stalls number ≥ n-1
We select some process q ∉ G1 ∪ {p}; one exists because K1 < n-1. We let q perform K1+1 fetch&increment operations. Now p must return a value greater than c+K1, so q must write to a word read by p after O1.

Worst-case stalls number ≥ n-1
Let O2 be the first word accessed by p after it incurs the K1 stalls that is written by some process ∉ G1 ∪ {p}. Let E2 be an execution that maximizes the number of processes that are about to write to O2 over all (G1 ∪ {p})-free executions; these processes form G2.

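Worst-case stalls number ≥ n-1
Continuing with this construction, we get words O1, O2, …, Om and disjoint sets of processes G1, G2, …, Gm with |G1| = K1, |G2| = K2, …, |Gm| = Km, where the processes of Gi are poised to write to Oi. The construction stops only when every process other than p belongs to some Gi, so p incurs $K_1 + K_2 + \cdots + K_m = n-1$ stalls.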

Conclusion: the "naïve" implementation is the best possible! (In terms of worst-case execution.)
[Figure: processes 1 to 6 all apply FAI to a single FAI object.]
