…and region serializability for all JESSICA OUYANG, PETER CHEN, JASON FLINN & SATISH NARAYANASAMY UNIVERSITY OF MICHIGAN.

Presentation transcript:

…and region serializability for all JESSICA OUYANG, PETER CHEN, JASON FLINN & SATISH NARAYANASAMY UNIVERSITY OF MICHIGAN

Language-level guarantees for programs with races 2
Fewer possible program behaviors / More potential optimizations
◦ DRF0: no guarantees
◦ SC: global, serial order of all instructions
◦ RS: global, serial order of regions

JESSICA OUYANG 3-7
Initially: a = 0, b = 0
Thread 1: lock(A); a = 1; unlock(A)
Thread 2: lock(B); ...; lock(A); a = a + 1; b = a; unlock(A); ...; unlock(B)
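The properly locked two-thread example above can be tried out directly. What follows is a minimal sketch in Python, not code from the talk: the `run_once` helper and the `mem` dictionary are illustrative names, and the elided `...` statements are simply left out.

```python
import threading

def run_once():
    """Run the locked two-thread example once and return the final (a, b)."""
    mem = {"a": 0, "b": 0}          # shared variables, initially a = 0, b = 0
    lock_a = threading.Lock()       # stands in for lock A on the slide
    lock_b = threading.Lock()       # stands in for lock B on the slide

    def thread1():
        with lock_a:                # lock(A) ... unlock(A)
            mem["a"] = 1

    def thread2():
        with lock_b:                # lock(B) ... unlock(B)
            with lock_a:            # every access to a is protected by A
                mem["a"] = mem["a"] + 1
                mem["b"] = mem["a"]

    threads = [threading.Thread(target=thread1),
               threading.Thread(target=thread2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mem["a"], mem["b"]

if __name__ == "__main__":
    print(run_once())
```

Because every access to `a` happens while lock A is held, the two critical sections cannot overlap, so a run ends with either `(1, 1)` or `(2, 2)` depending on which thread acquires A first.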

JESSICA OUYANG 8-10
Initially: a = 0, b = 0
Thread 1: lock(A); a = 1; unlock(A)
Thread 2: lock(B); ...; a = a + 1; b = a; ...; unlock(B)

JESSICA OUYANG 11-13
Timeline (CPU 1, CPU 2):
CPU 1: lock(A); a = 1; unlock(A)
CPU 2: lock(B); ...; a = a + 1; b = a; ...; unlock(B)

JESSICA OUYANG 14
Timeline (CPU 1, CPU 2):
CPU 1: lock(A); a = 1; unlock(A)
CPU 2: lock(B); ...; a = NaN; b = yourPassword; ...; unlock(B)

JESSICA OUYANG 15

JESSICA OUYANG 16-18 DRF0

JESSICA OUYANG 19-20
Initially: a = 0, b = 0
Thread 1: lock(A); a = 1; unlock(A)
Thread 2: lock(B); ...; a = a + 1; b = a; ...; unlock(B)

Region serializability for all programs
Guarantees
◦ Atomic synchronization-free regions
◦ Global, serial order of all regions
Benefits
◦ Easy for programmers & tools to reason about
◦ Compilers can reorder freely within regions
JESSICA OUYANG 21
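To make the two guarantees concrete, here is a small sketch that enumerates every serial order of regions for the racy example from the earlier slides and collects the final values of `a` and `b`. It is an illustration under stated assumptions, not the authors' tooling: each thread is assumed to contribute exactly one synchronization-free region (so any permutation respects program order), the elided `...` statements are ignored, and the names `region_t1`, `region_t2`, and `rs_outcomes` are made up for this example.

```python
from itertools import permutations

def region_t1(state):
    # Thread 1's region, between lock(A) and unlock(A)
    state["a"] = 1

def region_t2(state):
    # Thread 2's region, between lock(B) and unlock(B)
    state["a"] = state["a"] + 1
    state["b"] = state["a"]

def rs_outcomes(regions):
    """Final (a, b) for every global serial order of atomic regions.

    Assumes one region per thread, so every permutation is a legal order.
    """
    outcomes = set()
    for order in permutations(regions):
        state = {"a": 0, "b": 0}    # initial values from the slides
        for region in order:
            region(state)           # each region runs atomically, start to end
        outcomes.add((state["a"], state["b"]))
    return outcomes

if __name__ == "__main__":
    print(sorted(rs_outcomes([region_t1, region_t2])))  # [(1, 1), (2, 2)]
```

Even though the program has a data race on `a`, only two outcomes are possible under this model, which is the sense in which region serializability stays easy to reason about.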

How to provide region serializability
Compiler
◦ Preserve regions from source code
Runtime
◦ Run one thread at a time
◦ Only preempt at synchronization boundaries
JESSICA OUYANG 22
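The runtime half of this recipe can be sketched with Python generators. This is a toy model under loud assumptions, not the runtime described in the talk: each thread is a generator that yields only at synchronization operations, the scheduler `run_serialized` (a hypothetical name) resumes one generator at a time, and lock ownership itself is not modeled, only the boundaries where preemption is allowed.

```python
import random

def thread1(mem):
    yield "lock(A)"                 # synchronization boundary: may be preempted
    mem["a"] = 1                    # region body runs without interruption
    yield "unlock(A)"

def thread2(mem):
    yield "lock(B)"
    mem["a"] = mem["a"] + 1         # racy accesses, but inside one atomic region
    mem["b"] = mem["a"]
    yield "unlock(B)"

def run_serialized(thread_fns, seed=0):
    """Run one thread at a time, switching only at synchronization points."""
    rng = random.Random(seed)       # seed picks the schedule (assumption)
    mem = {"a": 0, "b": 0}
    runnable = [fn(mem) for fn in thread_fns]
    trace = []
    while runnable:
        t = rng.choice(runnable)    # scheduling decision at a boundary
        try:
            trace.append(next(t))   # run up to the next synchronization op
        except StopIteration:
            runnable.remove(t)      # thread finished
    return mem, trace

if __name__ == "__main__":
    for seed in range(3):
        print(run_serialized([thread1, thread2], seed))
```

Since a generator cannot be interrupted between two `yield` points, each synchronization-free region executes atomically, and the order in which the scheduler resumes threads gives the global serial order of regions.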

JESSICA OUYANG 23
[Figure: timeline across CPU 1 to CPU 4 running the example program (lock(A); a = 1; unlock(A) and lock(B); ...; a = a + 1; b = a; ...; unlock(B)), annotated with a:1]

JESSICA OUYANG 24
[Figure: timeline across CPU 1 to CPU 4, labeled E0]

JESSICA OUYANG 25
[Figure: timeline across CPU 1 to CPU 4, labeled "Epoch-parallel execution" and "Thread-parallel execution"]

Uniparallel execution [Veeraraghavan ’11]
1. Checkpoint state
2. Start epoch
3. Check state (== ?)
4. Roll back & re-execute (!=)
[Figure: timeline of epochs E0, E1, E2, E3]
JESSICA OUYANG 26
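The four steps above can be mimicked in a few lines. The sketch below is a toy simulation under heavy assumptions, not the system cited on the slide: program state is any value comparable with `!=`, `thread_parallel_step` stands for the fast multi-CPU run that predicts the state at each epoch boundary, `epoch_parallel_step` stands for the one-thread-at-a-time run of a single epoch, and all function and parameter names are hypothetical.

```python
def uniparallel(initial_state, epochs, thread_parallel_step, epoch_parallel_step):
    """Toy checkpoint / check / roll-back loop over a list of epochs."""
    committed = initial_state       # last state confirmed by the serialized run
    next_epoch = 0
    rollbacks = 0
    while next_epoch < len(epochs):
        pending = epochs[next_epoch:]
        # Steps 1-2: the thread-parallel run checkpoints its state at the
        # start of every epoch, starting from the committed state.
        checkpoints = [committed]
        for epoch in pending:
            checkpoints.append(thread_parallel_step(checkpoints[-1], epoch))
        # Each epoch is re-executed from its own checkpoint; the runs are
        # independent, so in a real system they could use separate CPUs.
        results = [epoch_parallel_step(cp, epoch)
                   for cp, epoch in zip(checkpoints, pending)]
        # Step 3: compare each epoch's end state with the next checkpoint.
        for k, result in enumerate(results):
            committed = result
            next_epoch += 1
            if result != checkpoints[k + 1]:
                # Step 4: later epochs started from a wrong state, so
                # discard them and re-execute from the committed state.
                rollbacks += 1
                break
    return committed, rollbacks

if __name__ == "__main__":
    serial = lambda state, epoch: state + epoch
    # Hypothetical divergence: the thread-parallel run is off by one in epoch 2.
    racy = lambda state, epoch: state + epoch + (1 if epoch == 2 else 0)
    print(uniparallel(0, [1, 2, 3, 4], racy, serial))  # (10, 1)
```

In this model the committed state always comes from the serialized run, so the thread-parallel run only affects how much work is redone, never the final result.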

JESSICA OUYANG 27

JESSICA OUYANG 28

Conclusion
Strong guarantees for all programs
◦ Region serializability
One way of providing region serializability
◦ Uniparallelism
JESSICA OUYANG 29