Contention in shared memory multiprocessors. Multiprocessor synchronization algorithms (20225241). Lecturer: Danny Hendler. Definitions; lower bound for consensus.


Contention in shared memory multiprocessors. Multiprocessor synchronization algorithms (20225241). Lecturer: Danny Hendler. Definitions; lower bound for consensus; lower bounds for counters, stacks and queues.

Contention in shared-memory systems. Contention: the extent to which processes access the same memory locations simultaneously. When multiple processes simultaneously write to the same memory location, all but one of them are stalled. High contention hurts performance!

Memory Stalls & Write-Contention. (Figure: processes p0, p1, p2, …, pj are all enabled to write the same variable; the last one to succeed incurs j stalls.) Write-contention is the maximum number of processes that can be enabled to perform a write or read-modify-write operation to the same memory location simultaneously.

Recall the consensus implementation we saw. We use a single object, C, that supports the compare&swap and read operations. Initially C = null.

Decide(v)          ; code for process p_i
1. CAS(C, null, v)
2. return C

What is the write-contention of this algorithm? It is n. It can be shown that this is the write-contention of any consensus algorithm.
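The slide's one-shot consensus object can be sketched as follows. Python has no hardware compare&swap, so the sketch simulates a CAS register with a lock; all class and function names here are illustrative, not from the slides.

```python
import threading

class CASRegister:
    """Simulates an atomic compare-and-swap register (lock stands in for hardware CAS)."""
    def __init__(self, initial=None):
        self._value = initial
        self._lock = threading.Lock()

    def cas(self, expected, new):
        # Atomically: if value == expected, store new; report success.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

    def read(self):
        with self._lock:
            return self._value

class Consensus:
    """One-shot consensus: all processes decide the same proposed value."""
    def __init__(self):
        self._c = CASRegister(None)   # initially C = null

    def decide(self, v):
        self._c.cas(None, v)          # only the first CAS succeeds
        return self._c.read()         # everyone returns the winner's value

# Usage: n threads propose distinct values; all must decide the same one.
cons = Consensus()
decisions = []
dec_lock = threading.Lock()

def worker(i):
    d = cons.decide(i)
    with dec_lock:
        decisions.append(d)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads: t.start()
for t in threads: t.join()
# Agreement and validity: one common decision, and it was actually proposed.
assert len(set(decisions)) == 1 and decisions[0] in range(8)
```

Note that all n threads may simultaneously attempt the CAS on the single object C, which is exactly the write-contention of n discussed above.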

What can we say about the worst-case time complexity of objects such as counters, stacks and queues?

Naïve Counter Implementation: a single fetch&increment (FAI) object, to which all n processes apply FAI. The last process to succeed incurs Θ(n) time complexity! Can we do much better?
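The naïve counter can be sketched as below. Since Python has no atomic fetch&increment, a lock stands in for the hardware primitive, so this only illustrates the interface and the single shared location, not the actual stall behaviour; names are illustrative.

```python
import threading

class FAIObject:
    """Simulated atomic fetch&increment object (lock stands in for hardware FAI)."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_and_increment(self):
        with self._lock:
            v = self._value
            self._value += 1
            return v

counter = FAIObject()
results = []
res_lock = threading.Lock()

def worker():
    v = counter.fetch_and_increment()
    with res_lock:
        results.append(v)

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
# Each of the 8 operations obtained a distinct value 0..7. With all 8
# processes hitting the same location, the last to succeed waits behind
# the other 7, which is the Theta(n) behaviour the slide describes.
assert sorted(results) == list(range(8))
```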

We will see a time lower bound of √n on non-blocking implementations of counters, stacks, queues, and more. Any algorithm either (a) suffers high contention or (b) suffers high latency.

The Memory-Steps Metric. #read-objects: the number of distinct base objects read by an operation. Memory stalls: the total number of memory stalls incurred by an operation. memory-steps = #read-objects + memory-stalls. We investigate the worst-case number of memory-steps incurred by a single high-level operation.
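As a toy illustration of the metric (the concrete numbers are made up for illustration, not from the slides): in the naïve counter, the slowest fetch&increment reads a single base object but can be stalled behind all n-1 other writers.

```python
def memory_steps(read_objects, stalls):
    """The slide's metric: memory-steps = #read-objects + memory-stalls."""
    return read_objects + stalls

# Naive counter with n = 8: the slowest operation reads the single FAI
# object once and is stalled behind the 7 other writers, so its cost is
# 1 + (n - 1) = n memory-steps, i.e. Theta(n).
n = 8
assert memory_steps(read_objects=1, stalls=n - 1) == n
```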

Capturing influence between processes. Time complexity is determined by the extent to which operations by different processes influence each other.

Influence-level. (Figure: a shared counter holding 17; process p: "Hmmm… I will soon request a value"; each of the other processes: "Each of us may precede you and modify the value you will get!" The influence level is defined w.r.t. p.)

Modifying Steps. (Figure, shown in stages: a shared counter holding 17; process p: "Hmmm… I will soon request a value"; the other processes: "Each of us may precede you!"; process q applies FAI and the counter changes from 17 to 18.) There is an atomic step in which q modifies p's return value: a modifying step. We bring all the "influencers" to be on the verge of performing a modifying step.

Space/Write-contention tradeoff. We bring all influencers to be on the verge of a modifying step. Each modifying step is necessarily a write/RMW operation. Hence S ≥ I/C, where S is the space complexity, I the influence-level, and C the write-contention.

Latency/Contention tradeoff. (Figure: the base objects on which there are outstanding modifying steps; a shared counter holding 17; process p: "Hmmm… I will soon request a value".) Process p can be made to read all these base objects in the course of its operation! Hence L_R ≥ I/C, where L_R is the number of base objects read, I the influence-level, and C the write-contention.

Time lower bound. From the previous tradeoff, L_R · C ≥ I, and therefore the time complexity (in memory-steps) is at least √I.
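Filling in a step the slides leave implicit: with L_R the number of base objects read, C the write-contention, and I the influence-level, the latency/contention tradeoff L_R ≥ I/C forces either many reads or many stalls, whichever of L_R and C is larger. A sketch of the derivation:

```latex
L_R \cdot C \;\ge\; I
\quad\Longrightarrow\quad
\max(L_R,\, C) \;\ge\; \sqrt{I}.
```

If C ≥ √I, the adversary can force about √I stalls on a single contended object; otherwise C < √I forces L_R ≥ I/C > √I distinct reads. Either way an operation incurs at least on the order of √I memory-steps, and for objects with influence-level Θ(n) this gives the √n bound stated earlier.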

The Influence(n) class of objects. Definition: the influence-function I_O(n) of a generic object O is defined as follows: I_O(n) = k if the influence-level of any n-process nonblocking implementation of O is at least k. Definition: Influence(n) is the class of generic objects whose influence-function is in Ω(n). Influence(n) includes: stacks, queues, hash-tables, pools, linearizable counters, consensus, approximate agreement…

A concurrent counter is in Influence(n). (Figure: a shared counter holding 17; process p: "Hmmm… I will soon request a value"; the other processes: "Each of us may precede you!") The influence-level is n-1: every q ≠ p can influence p.

A stack is in Influence(n). (Figure: a stack with n at the top; process p: "Hmmm… I will soon attempt to pop a value"; the other processes: "Each of us may precede you!") The influence-level is n-1, e.g. if every q ≠ p has a pending pop operation.

Approximate agreement is in Influence(n). In approximate agreement, each process proposes a value. Validity: each process must decide on a legal value (in the range of proposed values). Approximate agreement: the values decided by any two processes must be no more than ε apart. (Figure: processes p1, …, pn; p1 proposes 0, and the other proposals are spaced 2ε apart.) The influence-level is n-1: if p1 runs first, it must return 0; but if it is preceded by an execution in which some q ≠ p1 terminates, p1 must return a value no less than ε.
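A small checker pins down the two conditions from the slide. The helper function is hypothetical, written only to make the definitions concrete:

```python
def valid_approximate_agreement(proposed, decided, eps):
    """Check the two approximate-agreement conditions from the slide."""
    lo, hi = min(proposed), max(proposed)
    # Validity: every decided value lies in the range of proposed values.
    if not all(lo <= d <= hi for d in decided):
        return False
    # eps-agreement: any two decided values are at most eps apart.
    return max(decided) - min(decided) <= eps

eps = 0.5
assert valid_approximate_agreement([0.0, 1.0, 2.0], [0.9, 1.1], eps)
assert not valid_approximate_agreement([0.0, 1.0, 2.0], [0.0, 2.0], eps)  # too far apart
assert not valid_approximate_agreement([0.0, 1.0, 2.0], [3.0], eps)       # outside range
```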

The First-Generation Problem. Every process calls a First operation once. An operation is in the first generation of execution E if it is not preceded in E by any other operation (i.e., no other operation completes before it starts). All operations not in the first generation of the execution must return false. In quiescence, at least one operation from the first generation must have returned true. Lemma: the First-Generation object is in Influence(n), and for this problem the bound is tight.
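A simple correct (but non-optimal) first-generation object uses a single test-and-set-like flag: the first operation to claim the flag returns true, everyone else returns false. The winner is necessarily in the first generation, since any operation that completed before it would have claimed the flag first. This is not the slides' optimal mark-array algorithm; with all n processes hitting one word its write-contention is n, exactly what the tight √n construction avoids. As before, a lock simulates the atomic primitive.

```python
import threading

class FirstGeneration:
    """Single-flag first-generation object: correct, but write-contention n."""
    def __init__(self):
        self._won = False
        self._lock = threading.Lock()  # stands in for an atomic test&set

    def first(self):
        with self._lock:
            if not self._won:
                self._won = True
                return True    # the winner is necessarily first-generation
            return False       # some operation already won

obj = FirstGeneration()
results = []
res_lock = threading.Lock()

def worker():
    r = obj.first()
    with res_lock:
        results.append(r)

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
# Exactly one operation returns true; all others return false.
assert results.count(True) == 1 and results.count(False) == 7
```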

An Optimal Implementation for the First-Generation Problem. (Figure: a mark array of √n multi-reader multi-writer atomic variables; the n processes are partitioned into groups of √n processes, one group per array entry.)

A linear lower bound on the number of stalls for long-lived objects. (The following material is not required for the exam/assignments.)

Theorem: In any n-process implementation of an obstruction-free counter, the worst-case number of stalls incurred by a process as it performs a fetch&increment operation is at least n-1.

Worst-case stalls number ≥ n-1. Start from an initial state. Fix a process p about to perform a fetch&increment operation. Consider the path p takes if it runs uninterrupted, counting only its first access to each shared word. (Figure, shown in stages: p's path visiting shared words 1, 2, 3, 4, ….)

Worst-case stalls number ≥ n-1. Let O1 be the first word along p's path that is written by some other process in some p-free execution. There must be such a word: otherwise, other processes could complete fetch&increment operations without affecting p's response, and p would return an incorrect value.

Worst-case stalls number ≥ n-1. Let E1 be an execution that maximizes the number of processes that are about to write to O1, over all p-free executions. Let G1 denote this set of processes and let K1 = |G1|.

Worst-case stalls number ≥ n-1. If K1 = n-1 then we are done: p can be made to incur n-1 stalls on O1. Otherwise, we show that p must access yet another word that may be written by other processes.

Worst-case stalls number ≥ n-1. What happens if we let p run on and incur the K1 stalls on O1? The rest of p's path may then change. Assume p eventually gets value v.

Worst-case stalls number ≥ n-1. Let v be the value returned by p if we let it run and incur the stalls, and let c be the number of fetch&increment operations completed before p starts its operation. We have v ∈ {c, …, c+K1}. Now select some process q ∉ G1 ∪ {p} and let q perform K1+1 fetch&increment operations. Then q must write to a word read by p after O1: otherwise p would still return v ∈ {c, …, c+K1}, which is no longer a valid response.

Worst-case stalls number ≥ n-1. Let O2 be the first word accessed by p after it incurs the K1 stalls that is written by some process ∉ G1 ∪ {p}. Let E2 be an execution that maximizes the number of processes that are about to write to O2, over all (G1 ∪ {p})-free executions.

Worst-case stalls number ≥ n-1. Continuing with this construction, we obtain words O1, O2, …, Om and disjoint process groups G1, G2, …, Gm with |G2| = K2, …, |Gm| = Km and K1 + K2 + … + Km = n-1. Process p can be made to incur Ki stalls on each Oi, for a total of n-1 stalls.

Conclusion: the naïve FAI-based implementation is best possible (with respect to worst-case stalls)!