Memory Consistency Arbob Ahmad, Henry DeYoung, Rakesh Iyer 15-740/18-740: Recent Research in Architecture October 14, 2009.

“Memory Model = Instruction Reordering + Store Atomicity”, Arvind and Jan-Willem Maessen
● “Memory consistency models exist to describe and constrain the behavior of [memory systems]”
● Gives a unifying framework for SC and relaxed models with an atomic memory

Instruction Reordering vs. Store Atomicity
● Instruction reordering rules:
  ● Consistency within a thread
  ● e.g.:
● Store atomicity rules:
  ● Ordering which must exist in every serialization
  ● Consistency across threads
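To make the "instruction reordering" half concrete, here is a minimal Python sketch of a per-thread reordering table. The model names ("SC", "TSO", "WEAK"), the `Op` type, and the `may_reorder` function are hypothetical, and the rules are deliberately simplified (for instance, every same-address pair is kept in order). They are not the exact tables from Arvind and Maessen's paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    kind: str                   # "L" (load), "S" (store) or "F" (fence)
    addr: Optional[str] = None  # memory address; None for fences


def may_reorder(model: str, first: Op, second: Op) -> bool:
    """May `second` be performed before `first`, which precedes it in program order?

    Hypothetical, simplified tables for illustration only.
    """
    if first.kind == "F" or second.kind == "F":
        return False  # nothing moves across a fence
    if first.addr == second.addr:
        return False  # same-address accesses keep program order (simplification)
    if model == "SC":
        return False  # SC: program order is preserved for every pair
    if model == "TSO":
        return first.kind == "S" and second.kind == "L"  # a load may bypass an earlier store
    if model == "WEAK":
        return True   # any two accesses to different addresses may swap
    raise ValueError(f"unknown model: {model}")


st_x, ld_y = Op("S", "x"), Op("L", "y")
for model in ("SC", "TSO", "WEAK"):
    # Prints: SC False False / TSO True False / WEAK True True
    print(model, may_reorder(model, st_x, ld_y), may_reorder(model, ld_y, st_x))
```

In this view a model is just a choice of table; the table says nothing about how stores from different threads are ordered, which is the job of the store atomicity rules.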

Store Atomicity (x ← v stores v to x; x → v is a load of x that returns v)
1. Predecessor Stores of a Load (to the same address) are ordered before its source, the Store it observes.
2. Successor Stores of a Store (to the same address) are ordered after its observers, the Loads that return its value.
3. Mutual ancestors of Loads are ordered before the mutual successors of the distinct Stores they observe.
Example: x ← 2, x → 2, x ← 1
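The three rules can be applied mechanically. The Python sketch below closes a set of ordering edges under transitivity and rules 1 to 3, and reports whether a cycle (an impossible execution) appears. The names `transitive_closure` and `store_atomic_closure`, the edge encoding, and the explicit "x0" store standing in for x's initial value are hypothetical; each load is assumed to name the exact store it observes.

```python
from itertools import product


def transitive_closure(nodes, order):
    """Warshall's algorithm: (a, b) in the result iff a reaches b in `order`."""
    reach = set(order)
    for k in nodes:
        for i in nodes:
            if (i, k) in reach:
                for j in nodes:
                    if (k, j) in reach:
                        reach.add((i, j))
    return reach


def store_atomic_closure(ops, order, source):
    """Close `order` under transitivity and store atomicity rules 1-3.

    ops:    name -> (kind, addr), with kind "S" (store) or "L" (load)
    order:  (a, b) pairs meaning a precedes b (program order, fences)
    source: load name -> name of the store it observes
    Returns (order, consistent); consistent is False if a cycle appears.
    """
    nodes = list(ops)
    stores = [n for n, (kind, _) in ops.items() if kind == "S"]
    # Observation edges: a load follows the store it reads from.
    order = set(order) | {(s, l) for l, s in source.items()}
    while True:
        order = transitive_closure(nodes, order)
        new = set()
        for l, s in source.items():
            for other in stores:
                if other == s or ops[other][1] != ops[l][1]:
                    continue
                if (other, l) in order:  # Rule 1: predecessor store -> before source
                    new.add((other, s))
                if (s, other) in order:  # Rule 2: successor store -> after observer
                    new.add((l, other))
        for (l1, s1), (l2, s2) in product(source.items(), repeat=2):
            if s1 == s2 or ops[s1][1] != ops[s2][1]:
                continue
            # Rule 3: mutual ancestors of the loads precede mutual
            # successors of the distinct stores they observe.
            anc = [a for a in nodes if (a, l1) in order and (a, l2) in order]
            succ = [b for b in nodes if (s1, b) in order and (s2, b) in order]
            new |= {(a, b) for a in anc for b in succ}
        if new <= order:
            return order, all((n, n) not in order for n in nodes)
        order |= new


# Hypothetical two-thread example; "x0" models the initial store x <- 0.
ops = {
    "x0": ("S", "x"),
    "a1": ("S", "x"),  # Thread A: x <- 1
    "a2": ("S", "y"),  # Thread A: y <- 2   (after a fence)
    "a3": ("L", "y"),  # Thread A: y -> 3
    "b1": ("S", "y"),  # Thread B: y <- 3
    "b2": ("L", "x"),  # Thread B: x -> ?   (after a fence)
}
local = {("x0", "a1"), ("x0", "b1"), ("a1", "a2"), ("a2", "a3"), ("b1", "b2")}

_, ok1 = store_atomic_closure(ops, local, {"a3": "b1", "b2": "a1"})  # x -> 1
_, ok0 = store_atomic_closure(ops, local, {"a3": "b1", "b2": "x0"})  # x -> 0
print(ok1, ok0)  # True False
```

In the x → 0 case, rule 1 orders y ← 2 before y ← 3 (Thread A saw its own store overwritten), which places x ← 1 ahead of Thread B's load of x, while rule 2 places that load, which saw the initial value, ahead of x ← 1. The resulting cycle marks the execution as impossible.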

Example with Thread A, Thread B, and Thread C: x ← 1 Fence y → 2 y → 4 y ← 2 Fence z ← 6 y ← 4 Fence z → 6 Fence x ← 8 x → ?
● Local ordering constraints
● Observation constraints
● Question: Are there any ordering constraints not represented?
● The two stores to y must be ordered one way or the other: either y ← 2 : y → 2 : y ← 4 : y → 4 or y ← 4 : y → 4 : y ← 2 : y → 2
  ● x ← 1 must precede both y → 2 and y → 4
  ● z → 6 must follow both y → 2 and y → 4
● Store atomicity constraint
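The "one way or the other" argument behind rule 3 can also be checked by brute force: enumerate every total order, keep those in which each load returns the most recent store to its address, and see which orderings hold in all of them. This Python sketch uses a hypothetical, stripped-down version of the pattern (an ancestor A of two loads of y that observe distinct stores, and a successor B of both stores); the operation names and edges are illustrative, not a transcription of the figure.

```python
from itertools import permutations

# Hypothetical rule-3 pattern. A and B stand for any operations with the
# edges below; None means "no memory effect modeled".
OPS = {
    "A": None,
    "S2": ("store", "y", 2),
    "S4": ("store", "y", 4),
    "L2": ("load", "y", 2),   # returns 2, so it observes S2
    "L4": ("load", "y", 4),   # returns 4, so it observes S4
    "B": None,
}
# A is a mutual ancestor of the loads; B is a mutual successor of the stores.
EDGES = [("A", "L2"), ("A", "L4"), ("S2", "B"), ("S4", "B")]


def respects_edges(seq):
    pos = {op: i for i, op in enumerate(seq)}
    return all(pos[a] < pos[b] for a, b in EDGES)


def loads_legal(seq):
    """Each load must return the most recent earlier store to its address."""
    mem = {}
    for op in seq:
        if OPS[op] is None:
            continue
        kind, addr, val = OPS[op]
        if kind == "store":
            mem[addr] = val
        elif mem.get(addr) != val:
            return False
    return True


ordered = [s for s in permutations(OPS) if respects_edges(s)]
legal = [s for s in ordered if loads_legal(s)]

# Store atomicity: every legal serialization puts A before B ...
print(all(s.index("A") < s.index("B") for s in legal))    # True
# ... although the explicit edges alone would allow B before A.
print(any(s.index("B") < s.index("A") for s in ordered))  # True
```

Whichever store to y comes first, the load that observes it must run before the other store, so A precedes that load, which precedes the second store, which precedes B.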

Sequential Consistency
● Programmer's gold standard
● Question: How can we have the clarity of SC without sacrificing performance?

Improving the Performance of SC
Key Idea: Rather than turning the switch (the SC memory's one-processor-at-a-time selector) at individual memory access boundaries, turn it only at chunk boundaries.
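A minimal Python sketch of the key idea, using a hypothetical store-buffering litmus test (Thread 0 stores x and then loads y; Thread 1 stores y and then loads x). Each enumerated interleaving stands for one way the switch could be turned: with a chunk size of 1 this is ordinary SC, and with larger chunks every outcome is still one that SC allows, because a chunk-granularity interleaving is also an access-granularity one. The test, the helper names, and the chunk sizes are assumptions for illustration. Each chunk runs in program order here; since a chunk executes atomically, reordering its independent accesses would not change what the other thread observes.

```python
from itertools import combinations

# Hypothetical store-buffering litmus test. Each op is ("st", var, value)
# or ("ld", var, register); all memory starts at 0.
T0 = [("st", "x", 1), ("ld", "y", "r1")]
T1 = [("st", "y", 1), ("ld", "x", "r2")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's own order."""
    n = len(a) + len(b)
    for picks in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in picks else next(ib) for i in range(n)]


def chunks(thread, size):
    """Split a thread into consecutive chunks of `size` operations."""
    return [thread[i:i + size] for i in range(0, len(thread), size)]


def run(schedule):
    """Execute a flat schedule against one memory and return (r1, r2)."""
    mem, regs = {"x": 0, "y": 0}, {}
    for kind, var, arg in schedule:
        if kind == "st":
            mem[var] = arg
        else:
            regs[arg] = mem[var]
    return regs["r1"], regs["r2"]


def outcomes(size):
    """Outcomes when the switch turns only between chunks of `size` ops."""
    results = set()
    for order in interleavings(chunks(T0, size), chunks(T1, size)):
        results.add(run([op for chunk in order for op in chunk]))
    return results


print(sorted(outcomes(1)))  # [(0, 1), (1, 0), (1, 1)]: per-access switching, plain SC
print(sorted(outcomes(2)))  # [(0, 1), (1, 0)]: one chunk per thread
```

The non-SC outcome (0, 0) never appears. Coarser switching only removes outcomes, so the results remain a subset of what SC permits.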

This is the topic of:
● “BulkSC: Bulk Enforcement of Sequential Consistency”, Luis Ceze, James Tuck, Pablo Montesinos, and Josep Torrellas
● “Mechanisms for Store-wait-free Multiprocessors”, Thomas Wenisch, Anastasia Ailamaki, Babak Falsafi, and Andreas Moshovos

Coarse-Grain Enforcement of SC
● Similar to tasks in thread-level speculation (TLS) and transactions in transactional memory (TM)
● But chunks are created dynamically by hardware, whereas tasks and transactions are specified statically in code

Common Ground
● Dynamically divide the program into ‘chunks’ or ‘atomic sequences’
  ● ASO begins an atomic sequence when an ordering constraint would stall instruction retirement
  ● BulkSC assumes chunks of around 1000 instructions
● Reordering is allowed within chunks/atomic sequences
● Updates are not visible to other processors until commit
● Evaluated on a full-system simulator (Simics/Flexus)

BulkSC: Bulk Enforcement of Sequential Consistency
● A chunk executes; its updates go to the L1 cache
● When the chunk commits, its R and W signatures are broadcast
● The Bulk Disambiguator computes the intersection with the local signatures; if it is non-empty, the local chunk restarts its computation
● Computes the minimum serialization requirement
● Enables BulkSC on machines without broadcast capabilities
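BulkSC summarizes a chunk's read and write addresses as R and W signatures. A common way to build such signatures is a Bloom-filter-like encoding whose intersection test may report false conflicts but never misses a real one. The Python sketch below assumes a hypothetical 2-field, 32-bits-per-field encoding, and assumes that a receiving chunk restarts when the committing chunk's W may overlap its R or its W. Field sizes, the address slicing, and the function names are illustrative, not BulkSC's actual parameters.

```python
FIELDS = 2        # hypothetical: signature made of 2 fields
FIELD_BITS = 32   # hypothetical: 32 bits per field


def signature(addrs):
    """Encode a set of integer line addresses; field f gets one bit per address."""
    sig = [0] * FIELDS
    for a in addrs:
        for f in range(FIELDS):
            bit = (a // FIELD_BITS ** f) % FIELD_BITS  # field f decodes one slice of the address
            sig[f] |= 1 << bit
    return tuple(sig)


def may_intersect(s1, s2):
    """Conservative test: the sets are disjoint if any field's AND is zero."""
    return all(a & b for a, b in zip(s1, s2))


def must_restart(committing_w, local_r, local_w):
    """Assumed squash condition for a local chunk receiving a committing chunk's W."""
    return may_intersect(committing_w, local_r) or may_intersect(committing_w, local_w)


w_commit = signature({7})
print(must_restart(w_commit, signature({7, 8}), signature(set())))  # True: real conflict
print(must_restart(w_commit, signature({9}), signature({10})))      # False: no overlap
print(may_intersect(signature({32, 1}), signature({0})))            # True: aliasing false positive
```

A false positive only costs an unnecessary restart; a real overlap always sets a common bit in every field, so no true conflict is missed.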

Atomic Sequence Ordering (ASO)
● Scalable Store Buffer
  ● Eliminates stalls caused by store buffer capacity
  ● No associative lookup required
● ASO implementation
  ● Eliminates stalls caused by ordering constraints
  ● Atomic sequence tracking
  ● Detecting atomicity violations
  ● Rollback on violation
  ● Committing atomic sequences
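The ASO steps (tracking an atomic sequence, detecting atomicity violations, rolling back, committing) can be sketched in Python as below. This is a software model under assumptions, not the paper's hardware: the class and method names are hypothetical, a dictionary keyed by address stands in for wherever speculative values live (the Scalable Store Buffer is not modeled), and a violation is assumed to be signaled when another processor invalidates an address the sequence has already read.

```python
class AtomicSequence:
    """Hypothetical ASO-style atomic sequence: checkpoint, speculate, then commit or roll back."""

    def __init__(self, memory, registers):
        self.memory = memory                # shared memory: addr -> value
        self.registers = registers          # architectural registers (updated in place)
        self.checkpoint = dict(registers)   # register checkpoint taken when the sequence begins
        self.spec = {}                      # speculative store values, keyed by address
        self.drain_order = []               # addresses of stores, in program order
        self.read_set = set()               # addresses read speculatively
        self.violated = False

    def load(self, addr):
        self.read_set.add(addr)
        return self.spec.get(addr, self.memory.get(addr, 0))

    def store(self, addr, value):
        self.spec[addr] = value
        self.drain_order.append(addr)

    def on_invalidate(self, addr):
        """Another processor wrote `addr`: a violation if this sequence already read it."""
        if addr in self.read_set:
            self.violated = True

    def finish(self):
        """Commit all stores at once, or restore the checkpoint after a violation."""
        if self.violated:
            self.registers.clear()
            self.registers.update(self.checkpoint)
            self.spec.clear()
            self.drain_order.clear()
            return False
        for addr in self.drain_order:
            self.memory[addr] = self.spec[addr]
        return True


mem, regs = {"x": 7, "y": 0}, {"r1": 0}
seq = AtomicSequence(mem, regs)
seq.store("y", 5)
regs["r1"] = seq.load("x")    # speculative read of x
seq.on_invalidate("x")        # a remote write to x arrives before commit
print(seq.finish(), mem["y"], regs)  # False 0 {'r1': 0}
```

Until `finish` succeeds, no speculative store reaches shared memory, so a rollback only has to discard private state and restore the register checkpoint.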

Performance Results
● BulkSC
● ASO: more realistic workloads

Open Research Questions in Memory Consistency
● The memory model framework was descriptive. What are the prescriptive consequences?
● Can the “big-step” semantics of transactions be explained with the “small-step” framework?
● Can the same hardware in a single system be used for all of coarse-grain SC, TLS, and TM?
● ...

Thank you!

Extra Slides

Example with Thread A and Thread B: x ← 1 Fence y ← 2 y → 3 y ← 3 Fence x ← 4 x → ?
● Local ordering constraints
● Observation constraint
● Question: We need one more edge to capture the ordering. Where should it go?
● Store atomicity constraint
Moral: When a store is observed to have been overwritten, the stores must be ordered.