Tongping Liu, Charlie Curtsinger, Emery Berger. DTHREADS: Efficient Deterministic Multithreading

Presentation transcript:

Tongping Liu, Charlie Curtsinger, Emery Berger. DTHREADS: Efficient Deterministic Multithreading. Insanity: Doing the same thing over and over again and expecting different results.

2 In the Beginning…

3 There was the Core.

4 And it was Good.

5 It gave us our Daily Speed.

6 Until the Apocalypse.

7 And the Speed was no Moore.

8 And then came a False Prophet…

10 Want speed?

11 I BRING YOU THE GIFT OF PARALLELISM!

12 JUST USE THREADS…
    color = ; row = 0; // globals
    void nextStripe(){
      for (c = 0; c < Width; c++)
        drawBox (c,row,color);
      color = (color == )? : ;
      row++;
    }
    for (n = 0; n < 9; n++)
      pthread_create(t[n], nextStripe);
    for (n = 0; n < 9; n++)
      pthread_join(t[n]);
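The stripe-drawing slide can be written out as compilable C with the full pthreads signatures. This is a sketch under assumptions, not the presenters' code: the colors `RED` and `WHITE` stand in for the slide's lost color swatches, and `WIDTH`, the `canvas` array, `reset`, `draw_sequential`, and `draw_threaded` are hypothetical names added for illustration. It shows why the threaded version misbehaves: `color` and `row` are shared globals that every thread reads and writes without synchronization.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define WIDTH   8   /* assumed canvas width */
#define STRIPES 9   /* the slide starts 9 threads */

enum color { BLANK, RED, WHITE };   /* RED and WHITE are assumed colors */

/* Shared globals, as on the slide: read and written with no locking. */
static enum color color = RED;
static int row = 0;
static enum color canvas[STRIPES][WIDTH];

static void draw_box(int c, int r, enum color k) {
    assert(c >= 0 && c < WIDTH && r >= 0 && r < STRIPES);
    canvas[r][c] = k;
}

/* Paint the current row, flip the color, advance to the next row. */
static void *next_stripe(void *unused) {
    (void)unused;
    for (int c = 0; c < WIDTH; c++)
        draw_box(c, row, color);
    color = (color == RED) ? WHITE : RED;
    row++;
    return NULL;
}

static void reset(void) {
    color = RED;
    row = 0;
    memset(canvas, 0, sizeof canvas);   /* BLANK is 0 */
}

/* Sequential baseline: always produces alternating stripes. */
void draw_sequential(void) {
    reset();
    for (int n = 0; n < STRIPES; n++)
        next_stripe(NULL);
}

/* Threaded version from the slide: the picture depends on the schedule,
   because `color` and `row` are raced on. Returns 0 on success. */
int draw_threaded(void) {
    pthread_t t[STRIPES];
    int started = 0, rc = 0;
    reset();
    for (; started < STRIPES; started++) {
        if (pthread_create(&t[started], NULL, next_stripe, NULL) != 0) {
            rc = -1;
            break;
        }
    }
    for (int n = 0; n < started; n++)
        pthread_join(t[n], NULL);
    return rc;
}
```

Calling `draw_sequential()` always leaves rows 0, 2, 4, ... red and the rest white. Calling `draw_threaded()` may leave some rows blank, painted twice, or painted in the wrong color, and the result can differ from run to run.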

18 pthreads: race conditions, atomicity violations, deadlock, order violations

19 Salvation?

21 pthreads: race conditions, atomicity violations, deadlock, order violations. DTHREADS: deterministic

22 DTHREADS Enables… Race-free Executions; Replay Debugging w/o Logging; Replicated State Machines

23 DTHREADS: Efficient Determinism. Usually faster than the state of the art. [Chart callout: Overhead with CoreDet, 7.8]

24 DTHREADS: Efficient Determinism. Generally as fast or faster than pthreads. [Chart callout: Overhead with CoreDet, 7.8]

25 DTHREADS: Easy to Use. % g++ myprog.cpp -ldthread (in place of -lpthread)

26 Isolation: shared address space vs. disjoint address spaces

27 Performance: Processes vs. Threads [Plot: Normalized Execution Time against Thread Execution Time (ms); series: threads, processes]

28 Performance: Processes vs. Threads [Plot: Normalized Execution Time against Thread Execution Time (ms); series: threads, processes]

29 Performance: Processes vs. Threads [Plot: Normalized Execution Time against Thread Execution Time (ms); series: threads, processes]

30 “Shared Memory”

31 Snapshot pages before modifications “Shared Memory”

32 Write back diffs “Shared Memory”
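The two steps above (snapshot a page before it is modified, then write back only the diffs) can be sketched as follows. This is a minimal illustration under assumptions, not the DTHREADS implementation: plain byte arrays stand in for memory-mapped pages, `PAGE_SIZE` is assumed to be 4096, the comparison is byte by byte, and `snapshot_page` and `commit_diffs` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096   /* assumed page size */

/* Take the "twin": a private snapshot of the page before it is written. */
void snapshot_page(unsigned char *twin, const unsigned char *working) {
    assert(twin != NULL && working != NULL);
    memcpy(twin, working, PAGE_SIZE);
}

/* Commit: copy into the shared page only the bytes that differ from the
   twin, so untouched bytes never overwrite another writer's updates.
   Returns the number of bytes written back. */
size_t commit_diffs(unsigned char *shared,
                    const unsigned char *working,
                    const unsigned char *twin) {
    size_t changed = 0;
    assert(shared != NULL && working != NULL && twin != NULL);
    for (size_t i = 0; i < PAGE_SIZE; i++) {
        if (working[i] != twin[i]) {
            shared[i] = working[i];
            changed++;
        }
    }
    return changed;
}
```

If two isolated writers start from the same page and one changes byte 0 while the other changes byte 100, committing both leaves the shared page with both changes, because each commit touches only the bytes that writer modified.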

33 Update in Deterministic Time & Order. “Thread” 1, “Thread” 2, “Thread” 3. Phases: Parallel, Serial, Parallel. mutex_lock, cond_wait, pthread_create

34 DTHREADS performance analysis

35 The Culprit: False Sharing [Diagram: Thread 1 on Core 1, Thread 2 on Core 2, Main Memory; Invalidate]

36 The Culprit: False Sharing [Diagram: Thread 1 on Core 1, Thread 2 on Core 2, Main Memory; Invalidate; 20x]

37 DTHREADS: Eliminates False Sharing! [Diagram: Process 1 on Core 1, Process 2 on Core 2, Global State]
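For readers unfamiliar with the term, the layout problem behind false sharing can be sketched in a few lines. This does not show how DTHREADS removes it (the slides attribute that to running threads as processes with private memory); it shows a conventional manual fix, padding, purely to illustrate what "two threads on one cache line" means. The 64-byte `CACHE_LINE`, both struct names, and `same_cache_line` are assumptions for illustration, and the check assumes each struct starts on a line boundary.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed cache line size in bytes */

/* Two per-thread counters packed side by side: both land on one line,
   so a write by either thread invalidates the other core's copy. */
struct packed_counters {
    long thread1_count;
    long thread2_count;
};

/* Same counters, each aligned to its own cache line (C11 _Alignas). */
struct padded_counters {
    _Alignas(CACHE_LINE) long thread1_count;
    _Alignas(CACHE_LINE) long thread2_count;
};

/* Nonzero if two offsets inside a line-aligned object share a cache line. */
int same_cache_line(size_t off_a, size_t off_b) {
    return off_a / CACHE_LINE == off_b / CACHE_LINE;
}

static_assert(sizeof(struct padded_counters) == 2 * CACHE_LINE,
              "each padded counter occupies its own line");
```

Passing the `offsetof` values of the two fields to `same_cache_line` reports a shared line for `packed_counters` and separate lines for `padded_counters`.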

38 DTHREADS: Detailed Analysis

39 DTHREADS: Detailed Analysis

40 DTHREADS: Detailed Analysis

41 DTHREADS: Scalable Determinism [Chart: Scalability]

42 DTHREADS: Scalable Determinism [Chart: Scalability]

43 DTHREADS: Scalable Determinism [Chart: Scalability]

44 DTHREADS. % g++ myprog.cpp -ldthread (in place of -lpthread)

45 End

46 Scheduler Determinism

47 DTHREADS: Without Outliers. Just 5% slower than pthreads. [Chart: Excluding Outliers]

48 Commit Protocol [Diagram labels: Time, Twin Page, Diff, Global State, Local State]

49 DTHREADS Example Execution. Global State: a = 0, b = 0. Thread 1: if(a == 0) b = 1; Thread 2: if(b == 0) a = 1; Committed State: a = 1, b = 1
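The example execution above can be traced in a small simulation. This is a sketch under assumptions rather than DTHREADS code: ordinary function calls stand in for the two threads, each works on a private copy of the state, and changes are committed in a fixed order (thread 1, then thread 2). The names `struct state`, `thread1`, `thread2`, `commit`, and `run_isolated` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct state { int a, b; };

/* The two thread bodies from the slide, each run on a private copy. */
static void thread1(struct state *s) { if (s->a == 0) s->b = 1; }
static void thread2(struct state *s) { if (s->b == 0) s->a = 1; }

/* Write back only the fields that changed relative to the snapshot. */
static void commit(struct state *global,
                   const struct state *local,
                   const struct state *twin) {
    assert(global != NULL && local != NULL && twin != NULL);
    if (local->a != twin->a) global->a = local->a;
    if (local->b != twin->b) global->b = local->b;
}

/* One isolated parallel phase followed by a serial commit in fixed order. */
struct state run_isolated(struct state global) {
    const struct state twin = global;        /* snapshot both threads start from */
    struct state local1 = global, local2 = global;
    thread1(&local1);                        /* sees a == 0, sets b = 1 */
    thread2(&local2);                        /* sees b == 0, sets a = 1 */
    commit(&global, &local1, &twin);
    commit(&global, &local2, &twin);
    return global;
}
```

Starting from a = 0, b = 0, `run_isolated` returns a = 1, b = 1 every time, because neither thread can observe the other's write before the commit.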

50 No Problem. Start: a = 0, b = 0. if(a == 0) b = 1; if(b == 0) a = 1; Result: a = 1, b = 1

51 That’s Better. Start: a = 0, b = 0. lock(); if(a == 0) b = 1; unlock(); lock(); if(b == 0) a = 1; unlock(); Result: b = 1

52 Or is it? Start: a = 0, b = 0. lock(); if(a == 0) b = 1; unlock(); lock(); if(b == 0) a = 1; unlock(); Result: a = 1

53 Determinism Is this enough?

54 Robust Determinism

55 External Nondeterminism socket = open_socket(80); listen(socket);

56 Problem already solved

57 Overhead

58 Wrap-Up: Determinism, Robust Determinism, Internal Determinism

59 Wrap-Up: Threads to Processes, Commit Before Synch., Commit In Token Order

60 Performance: DTHREADS & CoreDet [ASPLOS 10] vs. pthreads. [Chart callout: Overhead with CoreDet, 7.8]

61 How DTHREADS Provides Determinism: Isolation, Deterministic Time, Deterministic Order

62 Evaluation Phoenix