DTHREADS: Efficient Deterministic Multithreading

DTHREADS: Efficient Deterministic Multithreading. Tongping Liu, Charlie Curtsinger, and Emery D. Berger, Dept. of Computer Science, University of Massachusetts, Amherst. Presented by: Lokesh Gidra

Concurrent Programming is hard! Prone to deadlocks and race conditions. Thread interleavings are non-deterministic → hard to debug! A Deterministic Multithreading (DMT) system eliminates this non-determinism: same program with same input → same result. Simplifies debugging. Simplifies record and replay (eliminates the need to track memory operations). Allows multiple replicated executions for fault tolerance.

Contributions DTHREADS guarantees deterministic execution. Straightforward deployment: replaces libpthread. No recompilation required. Eliminates cache-line false sharing (as a side effect). Makes printf debugging practical!

Basic Idea Isolate memory accesses of different threads from each other. Replace threads with processes: pthread_create() is replaced with the clone system call. Memory-mapped files are used to share memory (globals and the heap). (Figure labels: Heap, Thread 1, Thread 2.)
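A minimal sketch of the isolation idea, not the Dthreads implementation: it assumes isolation comes from mapping the same file privately (copy-on-write, which a later slide mentions) while a shared mapping stands in for the committed heap. The function name `private_vs_shared` is hypothetical, and the `flags=` argument of Python's `mmap` is Unix-only.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def private_vs_shared():
    """Return what each mapping sees after a private write, then a shared write."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, PAGE)  # the backing file stands in for the shared heap
        shared = mmap.mmap(fd, PAGE, flags=mmap.MAP_SHARED)
        private = mmap.mmap(fd, PAGE, flags=mmap.MAP_PRIVATE)

        private[0:5] = b"local"  # copy-on-write: the change stays in this mapping
        shared_after_private = bytes(shared[0:5])

        shared[0:5] = b"owner"  # this write reaches the backing file
        # The private page was already copied, so it keeps its own contents.
        private_after_shared = bytes(private[0:5])

        shared.close()
        private.close()
        return shared_after_private, private_after_shared
    finally:
        os.close(fd)
        os.unlink(path)
```

The private write is invisible through the shared mapping, which is the property that lets each process work on its own view of the heap until its changes are published.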

Fence and Global Token

Commit Protocol
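The body of this slide was a figure. The summary slide later in the deck says commits use twin pages and diffing, so the following is a rough sketch of that idea under simplifying assumptions: a "page" is a short byte string, the twin is a snapshot taken before the thread writes, and the names `commit` and `run_two_writers` are hypothetical.

```python
def commit(shared: bytearray, twin: bytes, working: bytes) -> int:
    """Copy into `shared` only the bytes that `working` changed relative to `twin`."""
    changed = 0
    for i, (old, new) in enumerate(zip(twin, working)):
        if old != new:
            shared[i] = new
            changed += 1
    return changed


def run_two_writers() -> bytes:
    shared = bytearray(b"........")
    twin = bytes(shared)  # unmodified snapshot of the page

    t1 = bytearray(twin)  # thread 1's private working copy
    t1[0:2] = b"AA"
    t2 = bytearray(twin)  # thread 2's private working copy
    t2[6:8] = b"BB"

    # Commits happen one at a time, in a fixed order.
    commit(shared, twin, bytes(t1))
    commit(shared, twin, bytes(t2))
    return bytes(shared)
```

Because only the changed bytes are written back, two threads that modified different parts of the same page both keep their updates. If both changed the same byte, the later commit wins, so a fixed commit order keeps the result repeatable.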

Deterministic Synchronization (Global token is the key!) Locks: if the lock is held by someone else, pass the token. Release the token only when the lock count is 0. Condition Variables: pthread_cond_wait removes the caller from the token's queue and adds it to the variable's queue. pthread_cond_signal removes the first thread from the variable's queue and adds it to the token's queue.
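The queue movements above can be written down as a small single-threaded model. This is a literal transcription of the slide's rules, not the Dthreads code: the class `TokenModel`, its method names, and the choice that the thread at the head of the token queue is the token holder are all assumptions made for illustration.

```python
from collections import deque


class TokenModel:
    def __init__(self, thread_ids):
        self.token_queue = deque(thread_ids)  # assumed: head of queue holds the token
        self.cond_queues = {}                 # condition variable -> waiting threads
        self.lock_owner = {}                  # lock name -> owning thread
        self.locks_held = {t: 0 for t in thread_ids}

    def holder(self):
        return self.token_queue[0]

    def pass_token(self):
        self.token_queue.rotate(-1)  # holder moves to the back of the queue

    def lock(self, name):
        t = self.holder()
        owner = self.lock_owner.get(name)
        if owner is not None and owner != t:
            self.pass_token()  # held by someone else: pass the token
            return False
        self.lock_owner[name] = t
        self.locks_held[t] += 1
        return True

    def unlock(self, name):
        t = self.holder()
        assert self.lock_owner.get(name) == t
        del self.lock_owner[name]
        self.locks_held[t] -= 1
        if self.locks_held[t] == 0:
            self.pass_token()  # release the token only when lock count is 0

    def cond_wait(self, cv):
        t = self.token_queue.popleft()  # leave the token's queue
        self.cond_queues.setdefault(cv, deque()).append(t)

    def cond_signal(self, cv):
        waiters = self.cond_queues.get(cv)
        if waiters:
            self.token_queue.append(waiters.popleft())  # first waiter rejoins
```

Since every transition depends only on the queue contents, replaying the same sequence of calls always yields the same order of token holders.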

Contd… Barriers (similar to condition variables): if not the last to enter, move self from the token queue to the barrier queue; otherwise, move all threads from the barrier queue to the token queue. Thread Creation: the child is placed on the token queue and waits for the parallel (||) phase. Thread Exit/Cancellation: remove from the queue, then call pthread_exit()/kill().
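A short sketch of the barrier rule, again as a single-threaded model rather than the real runtime. The function name `barrier_wait`, the `parties` parameter, and the choice to leave the last arriver at the head of the token queue are assumptions.

```python
from collections import deque


def barrier_wait(token_queue: deque, barrier_queue: deque, parties: int) -> bool:
    """The thread at the head of token_queue enters the barrier.

    Returns True if this thread was the last to arrive and released the others.
    """
    arrived = len(barrier_queue) + 1
    if arrived < parties:
        # Not last to enter: move self from the token queue to the barrier queue.
        barrier_queue.append(token_queue.popleft())
        return False
    # Last to enter: move everyone from the barrier queue back to the token queue.
    token_queue.extend(barrier_queue)
    barrier_queue.clear()
    return True
```

Waiters rejoin the token queue in arrival order, so the order in which threads resume after the barrier is a function of the order in which they arrived.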

Memory Allocation and OS Support Assigns a sub-heap to each thread using a deterministic thread index. Superblocks are allocated using locks → deterministic. Intercepts system calls that affect program execution (like sigwait). Intercepts read/write system calls: touches pages to trigger COW, to avoid segfaults.
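To illustrate why per-thread sub-heaps give repeatable addresses, here is a toy bump allocator. It is only a sketch: the class `SubHeapAllocator`, the sub-heap size, the base address, and the 16-byte alignment are hypothetical, and superblock management is left out entirely.

```python
SUBHEAP_SIZE = 1 << 20  # hypothetical size of each thread's sub-heap


class SubHeapAllocator:
    def __init__(self, base=0x10000000, nthreads=4):
        self.base = base
        self.next_offset = [0] * nthreads  # one bump pointer per thread index

    def malloc(self, thread_index: int, size: int) -> int:
        size = (size + 15) & ~15  # round up to 16 bytes (assumed alignment)
        offset = self.next_offset[thread_index]
        if offset + size > SUBHEAP_SIZE:
            raise MemoryError("sub-heap exhausted")
        self.next_offset[thread_index] = offset + size
        # The address depends only on the thread index and that thread's own history.
        return self.base + thread_index * SUBHEAP_SIZE + offset
```

Because an address is computed from the thread index and that thread's earlier requests alone, the interleaving of allocations across threads does not change what any thread receives.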

Performance On an 8-core machine with 16 GB RAM and a 4 MB L2 cache. Benchmarks from the PARSEC and Phoenix suites. For 9 of 14 benchmarks, Dthreads runs nearly as fast as or faster than pthreads, while providing determinism.

Scalability Scales nearly as well as, or better than, pthreads. Almost always scales as well as, or better than, CoreDet.

Limitations Incurs substantial overhead for apps with a large number of short-lived transactions or of modified pages per transaction. No control over external non-determinism. Apps using ad hoc synchronization are not supported. Sharing of stack variables is not supported. Increases the program's memory footprint. Will perform poorly if #threads > #cores.
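One way to see why ad hoc synchronization is a problem under memory isolation is the toy model below. It assumes that a thread's writes become visible to others only when they are committed at a synchronization point, and it models private memory as plain dictionaries. The function `spin_on_flag` and its parameters are hypothetical, and the reader's view is refreshed at the writer's commit purely to keep the example short.

```python
def spin_on_flag(commit_before_spin: bool, max_spins: int = 1000):
    """Return the spin count at which the reader sees the flag, or None if it never does."""
    shared = {"ready": False}
    writer_private = dict(shared)  # each "thread" works on a private copy
    reader_private = dict(shared)

    # Writer sets a plain flag, with no pthread synchronization call.
    writer_private["ready"] = True

    if commit_before_spin:
        # A synchronization point would publish the writer's changes.
        shared.update(writer_private)
        reader_private = dict(shared)

    # Reader busy-waits on its own view of memory.
    for spins in range(max_spins):
        if reader_private["ready"]:
            return spins
    return None
```

Without a commit, the reader's private copy never changes, so a hand-rolled spin loop on a shared flag cannot make progress.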

Personal Observations (side effects on NUMA systems) Substantially reduces TLB miss cost: for 64-bit apps, one TLB miss costs ~1500 cycles with pthreads versus ~500 cycles with Dthreads. Diff-ing will be too expensive: 4K as compared to just a few cache lines.

Take Away Deterministic multithreading systems are good. Dthreads: an easy-to-deploy DMT system. Supports all pthread APIs. Replaces threads with processes for memory isolation. Uses twin pages and diff-ing to commit changes. Avoids cache-line false sharing. Good for apps with fewer transactions. Or, can we say for scalable apps? Doesn't support ad hoc synchronization.

Optimizations Lazy commit. Lazy twin creation and diff elimination. Single-threaded execution. Lock ownership. Parallelization.