Relaxed Consistency Part 2


Relaxed Consistency Part 2 David Devecsery

Administration!
Syllabus quiz error! Was labeled as due Aug 22nd. Your submission will not be counted as late. Still due Sep. 4th.
Reading for next class: 2 reviews.
Discussion sign-ups: Start thinking about what topics you'll want to lead the discussion for.

Can this dereference null?
Problem: Our out-of-order loads are speculating that data won't be changed by processor 1.
Initial: ready = false, a = null
Processor 1: a = new int[10]; ready = true;
Processor 2: if (ready) a[5] = 5;
[Diagram: Processor 1 and Processor 2, both out-of-order, each with a store queue and a load buffer above shared memory. Processor 1's store queue holds a, ready; Processor 2's load buffer holds ready, a, with a = null.]

Can this dereference null?
Goal: Identify when our speculation (that load data isn't changed by another processor) is incorrect.
Idea: Have memory notify the load buffer when a value changes.
Hold loads until all dependent (by program order) loads are done.
Memory alerts the load buffer of the change to "a". Can we now release a's value?
[Diagram: two out-of-order processors, each with a store queue and a load buffer above shared memory. The animation steps a = new… and ready = true from the store queue into memory while the load buffer holds ready = true and a = null.]

Introduction to Memory Models
Partial Consistency (PC) [x86]
Stores to the same location by a single processor core are ordered.
Every store in a thread to a variable happens before the next store in that thread to that variable.

Introduction to Memory Models
Remember, ordering is based on each location.
Can this code have a null pointer exception under PC?
Initial: a = null, ready1 = false
Thread 1: a = new int[10];
Thread 2: while (a == null) ; ready1 = true;
Thread 3: while (ready1 == false) ; print a[0];
Answer: Yes


Introduction to Memory Models
Can this segfault under PC?
Initial: ready = false, a = null
Thread 1: a = new int[10]; ready = true;
Thread 2: if (ready) a[5] = 5;
Answer: Absolutely!
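The "Absolutely!" answer can be checked by brute force. The following is a minimal sketch, assuming the only ordering rule is the one stated on the PC slide (stores by one core to the same location stay in program order, stores to different locations may become visible in either order). The names `visibility_orders` and `thread2_outcomes` are hypothetical, and Thread 2's two reads are simplified to a single snapshot of memory.

```python
from itertools import permutations


def visibility_orders(stores):
    """Yield every order in which one thread's stores may reach memory,
    keeping program order only between stores to the same location."""
    n = len(stores)
    for perm in permutations(range(n)):
        same_location_in_order = all(
            perm.index(i) < perm.index(j)
            for i in range(n)
            for j in range(i + 1, n)
            if stores[i][0] == stores[j][0]
        )
        if same_location_in_order:
            yield [stores[k] for k in perm]


def thread2_outcomes(stores):
    """Run `if (ready) a[5] = 5;` against every prefix of every visibility order."""
    outcomes = set()
    for order in visibility_orders(stores):
        for visible in range(len(order) + 1):
            memory = {"a": None, "ready": False}   # initial state from the slide
            for loc, val in order[:visible]:
                memory[loc] = val
            if not memory["ready"]:
                outcomes.add("skipped")
            elif memory["a"] is None:
                outcomes.add("null dereference")
            else:
                outcomes.add("wrote a[5]")
    return outcomes


# Thread 1 from the slide: a = new int[10]; ready = true;
THREAD1 = [("a", "int[10]"), ("ready", True)]
print(sorted(thread2_outcomes(THREAD1)))
```

Under this model the "null dereference" outcome appears because `ready = true` may become visible before the store to `a`.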

Memory systems for: Partial Consistency
Stores are queued, and execute in FIFO order.
Loads run in parallel, integrate with memory to guarantee store order is observed.
How can we create a partially consistent system?
[Diagram: out-of-order core (memory subsystem) with a store queue and a load buffer above memory.]
Store Queue: Stores can run in parallel, as long as other stores to the same address on this core have run.
Load Buffer: Loads run in parallel.

Can this segfault?
Initial: ready = false, a = null
Thread 1: a = new int[10]; ready = true;
Thread 2: if (ready) a[5] = 5;
[Diagram: two out-of-order processors, each with a parallel store queue and a load buffer above shared memory.]

Partial consistency: locking
Will this code enforce mutual exclusion on a PC MM?
Hint: Think of using the synchronization to lock another variable: lock(); y = 1; unlock()
xchg(x, y): atomic { old = x; x = y; return old }
Lock: do { tmp = xchg(unlocked, 0); if (tmp != 1) { while (unlocked != 1) ; } } while (tmp != 1)
Unlock: unlocked = 1

Partial consistency: locking Question: How do I build a lock on a partially consistent system? Answer: I need a way to stop load and store reordering! Provided via Fences (also called memory barriers)

Relaxed consistency: barriers
x86 [X]FENCEs: All X instructions before the fence will be globally visible to all X instructions following the fence.
MFENCE (loads and stores) – Handout provided
SFENCE (stores only) – Handout provided
LFENCE (loads only) – Handout provided
NOTE: x86 also has a LOCK prefix. It is needed and is related to fences, but it is different.

Partial consistency: barriers
Can we use fences to make this mutually exclusive?
xchg(x, y): atomic { old = x; x = y; return old }
Lock: do { tmp = xchg(unlocked, 0); if (tmp != 1) { while (unlocked != 1) ; } } while (tmp != 1)
Unlock: unlocked = 1

Introduction to Memory Models
What can this code print, under TSO?
Initial: a = 0, b = 0
Thread 1: a = 1; MFENCE; print b
Thread 2: b = 1; MFENCE; print a

Can this segfault?
Initial: a = 0, b = 0
Thread 1: a = 1; MFENCE; print b
Thread 2: b = 1; MFENCE; print a
[Diagram: two out-of-order processors, each with a parallel store queue and a load buffer above shared memory.]

Introduction to Memory Models
What can this code print, under TSO?
Initial: a = 0, b = 0
Thread 1: a = 1; LFENCE; print b
Thread 2: b = 1; LFENCE; print a

Introduction to Memory Models
What can this code print, under TSO?
Initial: a = 0, b = 0
Thread 1: a = 1; SFENCE; print b
Thread 2: b = 1; SFENCE; print a
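One way to explore the three fence variants of this litmus test is to enumerate every execution of a small abstract TSO machine. The sketch below is an illustration under stated assumptions, not a model of real hardware: each thread has a FIFO store buffer, loads forward from the thread's own buffer, MFENCE waits until the buffer is empty, and LFENCE and SFENCE are treated as no-ops because this abstract machine already keeps loads in order and drains stores in FIFO order. The names `tso_outcomes` and `litmus` are hypothetical.

```python
def replace(tup, i, value):
    """Return a copy of tuple `tup` with position i set to `value`."""
    return tup[:i] + (value,) + tup[i + 1:]


def tso_outcomes(programs):
    """Explore every execution; return the set of per-thread loaded values."""
    init = (
        tuple(0 for _ in programs),    # program counters
        tuple(() for _ in programs),   # FIFO store buffers
        (),                            # memory as sorted (location, value) pairs
        tuple(() for _ in programs),   # values loaded so far by each thread
    )
    seen, stack, results = {init}, [init], set()
    while stack:
        pcs, bufs, mem, loads = stack.pop()
        if all(pc == len(prog) for pc, prog in zip(pcs, programs)):
            results.add(loads)
        successors = []
        for t, prog in enumerate(programs):
            if bufs[t]:  # drain the oldest buffered store to memory
                loc, val = bufs[t][0]
                new_mem = tuple(sorted({**dict(mem), loc: val}.items()))
                successors.append((pcs, replace(bufs, t, bufs[t][1:]), new_mem, loads))
            if pcs[t] == len(prog):
                continue
            op = prog[pcs[t]]
            next_pcs = replace(pcs, t, pcs[t] + 1)
            if op[0] == "store":
                new_buf = bufs[t] + ((op[1], op[2]),)
                successors.append((next_pcs, replace(bufs, t, new_buf), mem, loads))
            elif op[0] == "load":
                val = dict(mem).get(op[1], 0)      # every location starts at 0
                for loc, buffered in bufs[t]:      # newest buffered store wins
                    if loc == op[1]:
                        val = buffered
                successors.append((next_pcs, bufs, mem, replace(loads, t, loads[t] + (val,))))
            elif op[0] == "fence":
                # Assumption: only MFENCE has to wait for an empty store buffer.
                if op[1] != "mfence" or not bufs[t]:
                    successors.append((next_pcs, bufs, mem, loads))
        for state in successors:
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return results


def litmus(fence):
    """Return the possible (b printed by Thread 1, a printed by Thread 2) pairs."""
    def thread(mine, other):
        ops = [("store", mine, 1)]
        if fence:
            ops.append(("fence", fence))
        ops.append(("load", other))
        return tuple(ops)
    programs = (thread("a", "b"), thread("b", "a"))
    return {(l[0][0], l[1][0]) for l in tso_outcomes(programs)}


for kind in ("mfence", "lfence", "sfence"):
    print(kind, sorted(litmus(kind)))
```

In this model only the MFENCE variant rules out both threads printing 0. With LFENCE or SFENCE, both stores can still sit in their store buffers while both loads read the initial values.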

Brand new architecture! Read question 5, Billy-Jean’s architecture.

In what ways can we reorder operations?
Data Reordering: Read-read, Read-write, Write-write
Memory Atomicity: When a store goes to memory, is it visible to all processors?

Reorderings happen at two layers
Within the core
Within the memory
[Diagram: Processor 1 and Processor 2, both out-of-order, each with a parallel store queue and a load buffer above shared memory.]

How do we handle these reorderings?
Introduction of barriers: Stop reordering of certain instructions around barriers.
Can programmers really be expected to reason about barriers and instruction reorderings throughout their programs?

“Weak Ordering – A New Definition” Adve and Hill
“[…] a description of memory should not require the specification of the performance enhancing features of the underlying hardware. Rather, such features should be camouflaged by defining the memory model in terms of constraints on software which, if obeyed, make the weaker system appear sequentially consistent.”
Idea: There should be a contract between the program and the hardware. If that contract is met, the program runs as if the hardware were SC.

DRF-0
Idea: If we can impose a happens-before order between two memory accesses on a weakly ordered processor, then the accesses appear sequentially consistent.
Happens-before orderings:
Program Order (PO): Order of program operations on a single core.
Synchronization Order (SO): Order of synchronizations (special operations synchronizing a memory location between two cores).

Ordering example
These operations all have well defined orderings:
[Diagram: per-processor chains of W, R, and S operations on x, y, z and synchronization variables a, b, c, linked by PO and SO edges.]
ex. W(y, P1) PO S(a, P1) SO S(a, P2) PO R(y, P2), so W(y, P1) HB R(y, P2)
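The chain in this example can be checked mechanically by treating happens-before as the transitive closure of the PO and SO edges. This is a small sketch under that assumption. The function name `happens_before` and the string labels for events are illustrative choices, and only the three edges spelled out in the example are encoded.

```python
def happens_before(edges):
    """Return the transitive closure of the given PO/SO edges as a set of pairs."""
    hb = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(hb):
            for c, d in list(hb):
                if b == c and (a, d) not in hb:
                    hb.add((a, d))   # a -> b and b -> d imply a -> d
                    changed = True
    return hb


# Edges taken from the example chain on the slide.
PO = [("W(y, P1)", "S(a, P1)"), ("S(a, P2)", "R(y, P2)")]
SO = [("S(a, P1)", "S(a, P2)")]

HB = happens_before(PO + SO)
print(("W(y, P1)", "R(y, P2)") in HB)   # the write is ordered before the read
print(("R(y, P2)", "W(y, P1)") in HB)   # but not the other way around
```

The closure contains the write-to-read pair only because the synchronization edge links the two processors' program orders.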

Are these accesses ordered?
[Diagram: processors P0 through P5 performing W(x), W(y), W(z), R(x), R(z) and synchronizations S(b), S(c), S(d), linked by PO and SO edges.]
Which of the above accesses are unordered?
What's the defined behavior between these unordered operations?
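Finding the unordered accesses amounts to looking for conflicting pairs with no happens-before path in either direction. The sketch below uses a small hypothetical trace, not the slide's diagram, whose edges are not recoverable from this transcript. It assumes two data accesses conflict when they touch the same location from different processors and at least one is a write. The names `reachable` and `data_races` are illustrative.

```python
from itertools import combinations


def reachable(edges, src, dst):
    """Depth-first search: is dst reachable from src along PO/SO edges?"""
    stack, visited = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in visited:
            continue
        visited.add(node)
        stack.extend(b for a, b in edges if a == node)
    return False


def data_races(events, edges):
    """Return conflicting data accesses that are unordered by happens-before."""
    races = []
    for e1, e2 in combinations(events, 2):
        (kind1, loc1, proc1), (kind2, loc2, proc2) = e1, e2
        conflict = (
            loc1 == loc2
            and proc1 != proc2
            and "W" in (kind1, kind2)       # at least one access is a write
            and "S" not in (kind1, kind2)   # synchronizations are not data accesses
        )
        if conflict and not reachable(edges, e1, e2) and not reachable(edges, e2, e1):
            races.append((e1, e2))
    return races


# Hypothetical trace: P0 writes x then synchronizes on b; P1 synchronizes on b
# then reads x; P2 reads x with no synchronization at all.
W_X_P0, S_B_P0 = ("W", "x", "P0"), ("S", "b", "P0")
S_B_P1, R_X_P1 = ("S", "b", "P1"), ("R", "x", "P1")
R_X_P2 = ("R", "x", "P2")

EVENTS = [W_X_P0, S_B_P0, S_B_P1, R_X_P1, R_X_P2]
EDGES = [
    (W_X_P0, S_B_P0),   # PO on P0
    (S_B_P0, S_B_P1),   # SO between P0 and P1
    (S_B_P1, R_X_P1),   # PO on P1
]

print(data_races(EVENTS, EDGES))
```

In this trace the read on P1 is ordered after the write through the synchronization on b, while the read on P2 is not ordered with the write in either direction.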

Trade-offs of DRF-0 What are the advantages and disadvantages of DRF0?

“Weak Ordering- A New Definition” What do you believe the advantages of this approach are?

Next Class
When do these high-level abstractions break down?
What does that look like?
How should we solve it?