Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Slides:



Advertisements
Similar presentations
Cache Coherence. Memory Consistency in SMPs Suppose CPU-1 updates A to 200. write-back: memory and cache-2 have stale values write-through: cache-2 has.
Advertisements

CMPT 431 Dr. Alexandra Fedorova Lecture IV: OS Support.
The University of Adelaide, School of Computer Science
Synchronization. How to synchronize processes? – Need to protect access to shared data to avoid problems like race conditions – Typical example: Updating.
Multi-core systems System Architecture COMP25212 Daniel Goodman Advanced Processor Technologies Group.
ECE 454 Computer Systems Programming Parallel Architectures and Performance Implications (II) Ding Yuan ECE Dept., University of Toronto
Multiprocessor Architecture Basics The Art of Multiprocessor Programming Spring 2007.
Spin Locks and Contention Management The Art of Multiprocessor Programming Spring 2007.
Spin Locks and Contention Based on slides by by Maurice Herlihy & Nir Shavit Tomer Gurevich.
Parallel Processing (CS526) Spring 2012(Week 6).  A parallel algorithm is a group of partitioned tasks that work with each other to solve a large problem.
The University of Adelaide, School of Computer Science
The Performance of Spin Lock Alternatives for Shared-Memory Microprocessors Thomas E. Anderson Presented by David Woodard.
Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
1 Multiprocessors. 2 Idea: create powerful computers by connecting many smaller ones good news: works for timesharing (better than supercomputer) bad.
CS510 Concurrent Systems Class 1b Spin Lock Performance.
CMPT 401 Summer 2007 Dr. Alexandra Fedorova Lecture IV: OS Support.
1 Lecture 23: Multiprocessors Today’s topics:  RAID  Multiprocessor taxonomy  Snooping-based cache coherence protocol.
CPE 731 Advanced Computer Architecture Snooping Cache Multiprocessors Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University of.
1 Shared-memory Architectures Adapted from a lecture by Ian Watson, University of Machester.
Multiprocessor Cache Coherency
Memory Consistency Models Some material borrowed from Sarita Adve’s (UIUC) tutorial on memory consistency models.
More on Locks: Case Studies
Art of Multiprocessor Programming1 This work is licensed under a Creative Commons Attribution- ShareAlike 2.5 License.Creative Commons Attribution- ShareAlike.
Multiprocessor Architecture Basics Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Ahmed Khademzadeh Azad University.
The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors THOMAS E. ANDERSON Presented by Daesung Park.
ECE200 – Computer Organization Chapter 9 – Multiprocessors.
Spin Locks and Contention
Mutual Exclusion Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
MULTIVIE W Slide 1 (of 23) The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors Paper: Thomas E. Anderson Presentation: Emerson.
Lecture 13: Multiprocessors Kai Bu
Caltech CS184 Spring DeHon 1 CS184b: Computer Architecture (Abstractions and Optimizations) Day 12: May 3, 2003 Shared Memory.
Spin Locks Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
CALTECH cs184c Spring DeHon CS184c: Computer Architecture [Parallel and Multithreaded] Day 10: May 8, 2001 Synchronization.
Mutual Exclusion Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
Multiprocessor Architecture Basics Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
(Superficial!) Review of Uniprocessor Architecture Parallel Architectures and Related concepts CS 433 Laxmikant Kale University of Illinois at Urbana-Champaign.
SYNAR Systems Networking and Architecture Group CMPT 886: The Art of Scalable Synchronization Dr. Alexandra Fedorova School of Computing Science SFU.
Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
August 13, 2001Systems Architecture II1 Systems Architecture II (CS ) Lecture 11: Multiprocessors: Uniform Memory Access * Jeremy R. Johnson Monday,
The University of Adelaide, School of Computer Science
Multiprocessor  Use large number of processor design for workstation or PC market  Has an efficient medium for communication among the processor memory.
Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Modified by Rajeev Alur for CIS 640,
Ch4. Multiprocessors & Thread-Level Parallelism 4. Syn (Synchronization & Memory Consistency) ECE562 Advanced Computer Architecture Prof. Honggang Wang.
CS267 Lecture 61 Shared Memory Hardware and Memory Consistency Modified from J. Demmel and K. Yelick
Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
Queue Locks and Local Spinning Some Slides based on: The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
CMSC 611: Advanced Computer Architecture Shared Memory Most slides adapted from David Patterson. Some from Mohomed Younis.
The University of Adelaide, School of Computer Science
Multi Processing prepared and instructed by Shmuel Wimer Eng. Faculty, Bar-Ilan University June 2016Multi Processing1.
Lecture 13: Multiprocessors Kai Bu
Spin Locks and Contention
Atomic Operations in Hardware
The University of Adelaide, School of Computer Science
The University of Adelaide, School of Computer Science
Multiprocessor Cache Coherency
Spin Locks and Contention
The University of Adelaide, School of Computer Science
CMSC 611: Advanced Computer Architecture
The University of Adelaide, School of Computer Science
Spin Locks and Contention Management
Multiprocessors - Flynn’s taxonomy (1966)
The University of Adelaide, School of Computer Science
The University of Adelaide, School of Computer Science
Lecture 17 Multiprocessors and Thread-Level Parallelism
Lecture 24: Virtual Memory, Multiprocessors
Lecture 23: Virtual Memory, Multiprocessors
Shared Counters and Parallelism
Lecture 17 Multiprocessors and Thread-Level Parallelism
The University of Adelaide, School of Computer Science
Lecture 17 Multiprocessors and Thread-Level Parallelism
Presentation transcript:

Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming2 Kinds of Architectures SISD (Uniprocessor) –Single instruction stream –Single data stream SIMD (Vector) –Single instruction –Multiple data MIMD (Multiprocessors) –Multiple instruction –Multiple data.

Art of Multiprocessor Programming3 Kinds of Architectures SISD (Uniprocessor) –Single instruction stream –Single data stream SIMD (Vector) –Single instruction –Multiple data MIMD (Multiprocessors) –Multiple instruction –Multiple data. Our space (1)

Art of Multiprocessor Programming4 MIMD Architectures Memory Contention Communication Contention Communication Latency Shared Bus memory Distributed

Art of Multiprocessor Programming5 What Should you do if you can’t get a lock? Keep trying –“spin” or “busy-wait” –Good if delays are short Give up the processor –Good if delays are long –Always good on uniprocessor (1)

Art of Multiprocessor Programming6 What Should you do if you can’t get a lock? Keep trying –“spin” or “busy-wait” –Good if delays are short Give up the processor –Good if delays are long –Always good on uniprocessor our focus

Art of Multiprocessor Programming7 Basic Spin-Lock CS Resets lock upon exit spin lock critical section...

Art of Multiprocessor Programming8 Basic Spin-Lock CS Resets lock upon exit spin lock critical section... …lock introduces sequential bottleneck

Art of Multiprocessor Programming9 Basic Spin-Lock CS Resets lock upon exit spin lock critical section... …lock suffers from contention

Art of Multiprocessor Programming10 Basic Spin-Lock CS Resets lock upon exit spin lock critical section... …lock suffers from contention Seq Bottleneck  no parallelism

Art of Multiprocessor Programming11 Basic Spin-Lock CS Resets lock upon exit spin lock critical section... Contention  ??? …lock suffers from contention

Art of Multiprocessor Programming12 Test-and-Set Boolean value Test-and-set (TAS) –Swap true with current value –Return value tells if prior value was true or false Can reset just by writing false TAS aka “getAndSet”

Art of Multiprocessor Programming13 Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getAndSet(boolean newValue) { boolean prior = value; value = newValue; return prior; } (5)

Art of Multiprocessor Programming14 Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getAndSet(boolean newValue) { boolean prior = value; value = newValue; return prior; } Package java.util.concurrent.atomic

Art of Multiprocessor Programming15 Test-and-Set public class AtomicBoolean { boolean value; public synchronized boolean getAndSet(boolean newValue) { boolean prior = value; value = newValue; return prior; } Swap old and new values

Art of Multiprocessor Programming16 Test-and-Set AtomicBoolean lock = new AtomicBoolean(false) … boolean prior = lock.getAndSet(true)

Art of Multiprocessor Programming17 Test-and-Set AtomicBoolean lock = new AtomicBoolean(false) … boolean prior = lock.getAndSet(true) (5) Swapping in true is called “test-and-set” or TAS

Art of Multiprocessor Programming18 Test-and-Set Locks Locking –Lock is free: value is false –Lock is taken: value is true Acquire lock by calling TAS –If result is false, you win –If result is true, you lose Release lock by writing false

Art of Multiprocessor Programming19 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }}

Art of Multiprocessor Programming20 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }} Lock state is AtomicBoolean

Art of Multiprocessor Programming21 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }} Keep trying until lock acquired

Art of Multiprocessor Programming22 Test-and-set Lock class TASlock { AtomicBoolean state = new AtomicBoolean(false); void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }} Release lock by resetting state to false

Art of Multiprocessor Programming23 Performance Experiment –n threads –Increment shared counter 1 million times How long should it take? How long does it take?

Art of Multiprocessor Programming24 Graph ideal time threads no speedup because of sequential bottleneck

Art of Multiprocessor Programming25 Mystery #1 time threads TAS lock Ideal (1) What is going on?

Art of Multiprocessor Programming26 Bus-Based Architectures Bus cache memory cache

Art of Multiprocessor Programming27 Bus-Based Architectures Bus cache memory cache Random access memory (10s of cycles)

Art of Multiprocessor Programming28 Bus-Based Architectures cache memory cache Shared Bus Broadcast medium One broadcaster at a time Processors and memory all “snoop” Bus

Art of Multiprocessor Programming29 Bus-Based Architectures Bus cache memory cache Per-Processor Caches Small Fast: 1 or 2 cycles Address & state information

Art of Multiprocessor Programming30 Jargon Watch Cache hit –“I found what I wanted in my cache” –Good Thing™

Art of Multiprocessor Programming31 Jargon Watch Cache hit –“I found what I wanted in my cache” –Good Thing™ Cache miss –“I had to shlep all the way to memory for that data” –Bad Thing™

Art of Multiprocessor Programming32 Bus Processor Issues Load Request cache memory cache data

Art of Multiprocessor Programming33 Bus Processor Issues Load Request Bus cache memory cache data Gimme data

Art of Multiprocessor Programming34 cache Bus Memory Responds Bus memory cache data Got your data right here data

Art of Multiprocessor Programming35 Bus Processor Issues Load Request memory cache data Gimme data

Art of Multiprocessor Programming36 Bus Processor Issues Load Request Bus memory cache data Gimme data

Art of Multiprocessor Programming37 Bus Processor Issues Load Request Bus memory cache data I got data

Art of Multiprocessor Programming38 Bus Other Processor Responds memory cache data I got data data Bus

Art of Multiprocessor Programming39 Bus Other Processor Responds memory cache data Bus

Art of Multiprocessor Programming40 Modify Cached Data Bus data memory cachedata (1)

Art of Multiprocessor Programming41 Modify Cached Data Bus data memory cachedata (1)

Art of Multiprocessor Programming42 memory Bus data Modify Cached Data cachedata

Art of Multiprocessor Programming43 memory Bus data Modify Cached Data cache What’s up with the other copies? data

Art of Multiprocessor Programming44 Cache Coherence We have lots of copies of data –Original copy in memory –Cached copies at processors Some processor modifies its own copy –What do we do with the others? –How to avoid confusion?

Art of Multiprocessor Programming45 Write-Back Caches Accumulate changes in cache Write back when needed –Need the cache for something else –Another processor wants it On first modification –Invalidate other entries –Requires non-trivial protocol …

Art of Multiprocessor Programming46 Write-Back Caches Cache entry has three states –Invalid: contains raw seething bits –Valid: I can read but I can’t write –Dirty: Data has been modified Intercept other load requests Write back to memory before using cache

Art of Multiprocessor Programming47 Bus Invalidate memory cachedata

Art of Multiprocessor Programming48 Bus Invalidate Bus memory cachedata Mine, all mine!

Art of Multiprocessor Programming49 Bus Invalidate Bus memory cachedata cache Uh,oh

Art of Multiprocessor Programming50 cache Bus Invalidate memory cachedata Other caches lose read permission

Art of Multiprocessor Programming51 cache Bus Invalidate memory cachedata Other caches lose read permission This cache acquires write permission

Art of Multiprocessor Programming52 cache Bus Invalidate memory cachedata Memory provides data only if not present in any cache, so no need to change it now (expensive) (2)

Art of Multiprocessor Programming53 cache Bus Another Processor Asks for Data memory cachedata (2) Bus

Art of Multiprocessor Programming54 cache data Bus Owner Responds memory cachedata (2) Bus Here it is!

Art of Multiprocessor Programming55 Bus End of the Day … memory cachedata (1) Reading OK, no writing data

Art of Multiprocessor Programming56 Mutual Exclusion What do we want to optimize? –Bus bandwidth used by spinning threads –Release/Acquire latency –Acquire latency for idle lock

Art of Multiprocessor Programming57 Simple TASLock TAS invalidates cache lines Spinners –Miss in cache –Go to bus

Art of Multiprocessor Programming58 NUMA Architecturs Acronym: –Non-Uniform Memory Architecture Illusion: –Flat shared memory Truth: –No caches (sometimes) –Some memory regions faster than others

Art of Multiprocessor Programming59 NUMA Machines Spinning on local memory is fast

Art of Multiprocessor Programming60 NUMA Machines Spinning on remote memory is slow