Serialization Sets: A Dynamic Dependence-Based Parallel Execution Model
Matthew D. Allen, Srinath Sridharan, Gurindar S. Sohi
University of Wisconsin-Madison


Motivation
- Multicore processors ubiquitous
  - Performance via parallel execution
- Multithreaded programming is problematic
  - Dependences encoded statically
  - Difficult to reason about locks, synchronization
  - Many errors not found in sequential programs
  - Execution is nondeterministic
- Need better parallel execution models!
February 16, 2009, PPoPP 2009

Serialization Sets Overview
- Sequential program with annotations
  - Identify potentially independent methods
  - Associate a serializer with these methods
- Serializer groups dependent method invocations into serialization sets
  - Runtime executes each set in order, to honor dependences
- Serializer attempts to map independent method invocations into different sets
  - Runtime opportunistically parallelizes execution

Serialization Sets Overview
- Sequential program with no locks and no explicit synchronization
- Deterministic, race-free execution
- Comparable performance to multithreading
  - Sometimes better!

Outline
- Overview
- Serialization Sets Execution Model
- Prometheus: C++ Library for SS
- Experimental Evaluation
- Related Work & Conclusions

Running Example
    trans_t* trans;
    while ((trans = get_trans ()) != NULL) {
        account_t* account = trans->account;
        if (trans->type == DEPOSIT)
            account->deposit (trans->amount);
        if (trans->type == WITHDRAW)
            account->withdraw (trans->amount);
    }
Several static unknowns!
- # of transactions?
- Points to?
- Loop-carried dependence?

Multithreading Strategy
    trans_t* trans;
    while ((trans = get_trans ()) != NULL) {
        account_t* account = trans[i]->account;
        if (trans->type == DEPOSIT)
            account->deposit (trans->amount);
        if (trans->type == WITHDRAW)
            account->withdraw (trans->amount);
    }
1) Read all transactions into an array
2) Divide chunks of array among multiple threads
Oblivious to what accounts each thread may access!
→ Methods must lock account to ensure mutual exclusion
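The slide's loop mixes the original while loop with array indexing, so the two-step strategy is easier to see written out in full. The following is a minimal sketch under stated assumptions, not the authors' code: the definitions of `account_t` and `trans_t`, the helper names `process_chunk` and `run_multithreaded`, and the use of `std::thread` and `std::mutex` are all illustrative choices, and `num_threads` is assumed to be at least 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the slide's types; the slides do not define them.
enum trans_type { DEPOSIT, WITHDRAW };

struct account_t {
    long balance = 0;
    std::mutex lock;  // needed because any thread may touch any account

    void deposit(long amount) {
        std::lock_guard<std::mutex> guard(lock);
        balance += amount;
    }
    void withdraw(long amount) {
        std::lock_guard<std::mutex> guard(lock);
        balance -= amount;
    }
};

struct trans_t {
    account_t* account;
    trans_type type;
    long amount;
};

// Step 2: each thread processes one contiguous chunk of the array.
void process_chunk(const std::vector<trans_t>& trans,
                   std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        account_t* account = trans[i].account;
        if (trans[i].type == DEPOSIT)  account->deposit(trans[i].amount);
        if (trans[i].type == WITHDRAW) account->withdraw(trans[i].amount);
    }
}

// Step 1 is assumed done: `trans` already holds every transaction.
void run_multithreaded(const std::vector<trans_t>& trans,
                       std::size_t num_threads) {
    std::vector<std::thread> threads;
    std::size_t chunk = (trans.size() + num_threads - 1) / num_threads;
    for (std::size_t t = 0; t < num_threads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = std::min(begin + chunk, trans.size());
        if (begin >= end) break;
        threads.emplace_back(process_chunk, std::cref(trans), begin, end);
    }
    for (auto& th : threads) th.join();
}
```

Note that the locks only guarantee mutual exclusion. Two operations on the same account that land in different chunks may still run in either order, which is the nondeterminism the talk objects to.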

Serialization Sets
- Potentially independent methods
  - Modify only data owned by object
  - Fields / data members
  - Pointers to non-shared data
  - Consistent with OO practices (modularity, encapsulation, information hiding)
- Modifying methods for independence
  - Store return value in object, retrieve with accessor
  - Copy pointer data
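The "store return value in object, retrieve with accessor" rewrite can be illustrated with a small class. This is only a sketch: the overdraft check, the member names, and the accessor `last_withdraw_ok` are hypothetical and do not come from the slides.

```cpp
// Hypothetical account class illustrating the rewrite.
class account_t {
public:
    void deposit(long amount) { balance_ += amount; }

    // A conventional version might be `bool withdraw(long amount)`, which
    // forces the caller to wait for the result. Here the result is stored
    // in the object instead, so the method touches only object-owned data.
    void withdraw(long amount) {
        last_ok_ = amount <= balance_;
        if (last_ok_) balance_ -= amount;
    }

    // Accessors the caller can use later, once the result is needed.
    bool last_withdraw_ok() const { return last_ok_; }
    long balance() const { return balance_; }

private:
    long balance_ = 0;
    bool last_ok_ = true;
};
```

Because `withdraw` now returns nothing, the caller has no immediate dependence on its outcome and the invocation can be deferred.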

Serialization Sets
- Divide program into isolation epochs
  - Data partitioned into domains
- Privately writable: data that may be read or written by a single serialization set
  - Object or set of objects
  - Serializer dynamically identifies serialization set for each method invocation
- Shared read-only: data that may be read (but not written) by any method

Example with Serialization Sets
    writable pw_account_t;
    begin_isolation ();
    trans_t* trans;
    while ((trans = get_trans ()) != NULL) {
        pw_account_t* account = trans->account;
        if (trans->type == DEPOSIT)
            delegate(account, deposit, trans->amount);
        if (trans->type == WITHDRAW)
            delegate(account, withdraw, trans->amount);
    }
    end_isolation ();
Callouts:
- Declare privately-writable account
- Begin isolation epoch
- Delegate indicates potentially independent operations
- Serializer type: uses account number to compute serialization set
- End isolation epoch
At execution, delegate:
1) Executes serializer
2) Identifies serialization set
3) Inserts invocation in serialization set

delegate
Invocations shown in the program context:
    deposit  acct=100 $2000
    withdraw acct=300 $350
    withdraw acct=200 $1000
    withdraw acct=100 $50
    deposit  acct=300 $5000
    withdraw acct=100 $20
    withdraw acct=200 $1000
    deposit  acct=100 $300
Delegate context: SS #100, SS #200, SS #300
Serializer: computes SS with account number
    ss_t ss = account->get_number();
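The three steps a delegate performs (execute the serializer, identify the serialization set, insert the invocation) can be modeled in a few lines. This sketch is not the Prometheus API: `delegate_context`, its `sets` map, and `run_all` are hypothetical names, `ss_t` is assumed to be an integer, and the sets are drained sequentially only to keep the example small.

```cpp
#include <functional>
#include <map>
#include <vector>

using ss_t = int;  // serialization set identifier (integer type assumed)

// Hypothetical account type; only get_number() appears on the slide.
struct account_t {
    int number;
    long balance;
    int get_number() const { return number; }
    void deposit(long amount)  { balance += amount; }
    void withdraw(long amount) { balance -= amount; }
};

// Hypothetical model of the delegate context: one ordered list of
// pending invocations per serialization set.
struct delegate_context {
    std::map<ss_t, std::vector<std::function<void()>>> sets;

    void delegate(account_t* account, void (account_t::*method)(long),
                  long amount) {
        ss_t ss = account->get_number();  // 1) execute the serializer
        auto& set = sets[ss];             // 2) identify the serialization set
        set.push_back([account, method, amount] {  // 3) insert the invocation
            (account->*method)(amount);
        });
    }

    // Runs each set's invocations in insertion order. A real runtime
    // would hand different sets to different threads.
    void run_all() {
        for (auto& entry : sets) {
            for (auto& invocation : entry.second) invocation();
            entry.second.clear();
        }
    }
};
```

Feeding the eight invocations from the slide through `delegate` yields three sets, with the four operations on account 100 kept in their original order.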

Program thread and delegate threads
- Program context: the eight invocations above are delegated into SS #100, SS #200, and SS #300
- Delegate 0 executes:
    deposit  acct=100 $2000
    withdraw acct=100 $50
    withdraw acct=100 $20
    deposit  acct=100 $300
- Delegate 1 executes:
    withdraw acct=200 $1000
    withdraw acct=300 $350
    deposit  acct=300 $5000
    withdraw acct=200 $1000
Race-free, deterministic execution without synchronization!

Parallel Execution w/o Sharing
1. Vary data in privately-writable/read-only domains in alternating epochs
   - Outputs of one epoch become inputs of the next
2. Associative, commutative methods
   - Operate on local copy of state
   - Reduction to summarize result
3. Containers manipulated by program context
   - Delegate operations on underlying data
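Item 2 above (associative, commutative methods that operate on a local copy, followed by a reduction) can be sketched as follows. The class `reducible_sum`, the function `parallel_sum`, and the strided work split are assumptions made for illustration, and `num_delegates` is assumed to be at least 1.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical reducible: one private partial sum per delegate thread.
class reducible_sum {
public:
    explicit reducible_sum(std::size_t num_delegates)
        : local_(num_delegates, 0) {}

    // Each delegate updates only its own slot, so no locking is needed.
    void add(std::size_t delegate_id, long value) {
        local_[delegate_id] += value;
    }

    // Reduction: combine the local copies into a single result.
    long reduce() const {
        long total = 0;
        for (long v : local_) total += v;
        return total;
    }

private:
    std::vector<long> local_;
};

long parallel_sum(const std::vector<long>& data, std::size_t num_delegates) {
    reducible_sum sum(num_delegates);
    std::vector<std::thread> threads;
    for (std::size_t d = 0; d < num_delegates; ++d) {
        threads.emplace_back([&, d] {
            // Delegate d handles elements d, d + n, d + 2n, ...
            for (std::size_t i = d; i < data.size(); i += num_delegates)
                sum.add(d, data[i]);
        });
    }
    for (auto& t : threads) t.join();
    return sum.reduce();  // performed after all delegates have finished
}
```

Because addition is associative and commutative, the result does not depend on how elements are split across delegates.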

Outline
- Overview
- Serialization Sets Execution Model
- Prometheus: C++ Library for SS
- Experimental Evaluation
- Related Work & Conclusions

Prometheus: C++ Library for SS
- Template library
  - Compile-time instantiation of SS data structures
  - Metaprogramming for static type checking
- Runtime orchestrates parallel execution
- Portable
  - x86, x86_64, SPARC V9
  - Linux, Solaris

Prometheus Serializers
- Serializers
  - Subclass serializer base class and override method
  - Or use built-in serializer supplied by library
- Reducibles
  - Subclass reducible base class and override virtual reduce method
  - Reduction automatically performed on first use after isolation epoch ends
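The subclass-and-override pattern for serializers might look like the sketch below. The slides do not show the library's actual base class, so the `serializer` template, its `serialize` method, and `account_serializer` are hypothetical names chosen for illustration only.

```cpp
using ss_t = int;  // serialization set identifier (integer type assumed)

// Hypothetical object type with an account number.
struct account_t {
    int number;
    int get_number() const { return number; }
};

// Hypothetical base class: maps an object to its serialization set.
template <typename T>
class serializer {
public:
    virtual ~serializer() = default;
    virtual ss_t serialize(const T& object) const = 0;
};

// User-defined serializer: all operations on the same account number
// fall into the same serialization set.
class account_serializer : public serializer<account_t> {
public:
    ss_t serialize(const account_t& account) const override {
        return account.get_number();
    }
};
```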

Prometheus Runtime
[Figure: Program Thread, Delegate Thread 0, Delegate Thread 1, Delegate Thread 2]
- Delegate assignment: SS % NUM_THREADS
- Communication queues: Fast-Forward [PPoPP 2008] + polymorphic interface
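The assignment rule SS % NUM_THREADS can be sketched with one queue per delegate thread. This is a simplified model, not the Prometheus runtime: `delegate_pool` and its methods are hypothetical, plain vectors stand in for the Fast-Forward queues, and the delegate threads start only when the epoch ends rather than running concurrently with the program thread.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

using ss_t = unsigned;  // unsigned so that % never yields a negative index
using invocation_t = std::function<void()>;

// Hypothetical pool with one invocation queue per delegate thread.
class delegate_pool {
public:
    explicit delegate_pool(std::size_t num_threads) : queues_(num_threads) {}

    // Program thread: route the invocation by SS % NUM_THREADS, so every
    // invocation of a given set lands on the same delegate, in order.
    void delegate(ss_t ss, invocation_t invocation) {
        queues_[ss % queues_.size()].push_back(std::move(invocation));
    }

    // Simplification: delegate threads drain their queues at epoch end.
    void end_isolation() {
        std::vector<std::thread> threads;
        for (auto& queue : queues_) {
            threads.emplace_back([&queue] {
                for (auto& invocation : queue) invocation();
            });
        }
        for (auto& t : threads) t.join();
        for (auto& queue : queues_) queue.clear();
    }

private:
    std::vector<std::vector<invocation_t>> queues_;
};
```

Since a serialization set never spans two delegate threads, data owned by that set is only ever touched by one thread, which is why no locks are needed.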

Debugging Support
- Tag all data accessed by serialization set
  - Objects
  - Smart pointers
- Any data accessed by multiple serialization sets indicates programmer error
- Problem: can't detect some kinds of missing annotations
  - Future work: static checking of annotations
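The tagging check can be sketched as a wrapper that remembers which serialization set first touched a piece of data. The `tagged` template, the `NO_SET` sentinel, and the choice to throw `std::logic_error` are assumptions for illustration, and the slides do not describe how Prometheus implements its tags.

```cpp
#include <stdexcept>

using ss_t = int;         // serialization set identifier (integer type assumed)
const ss_t NO_SET = -1;   // hypothetical sentinel: data not yet tagged

// Hypothetical debug wrapper: tags data with the first serialization set
// that accesses it and reports any access from a different set.
template <typename T>
class tagged {
public:
    explicit tagged(T value) : value_(value) {}

    T& access(ss_t current_set) {
        if (owner_ == NO_SET) {
            owner_ = current_set;  // first access: record the owning set
        } else if (owner_ != current_set) {
            throw std::logic_error(
                "data accessed by multiple serialization sets");
        }
        return value_;
    }

    // Clears the tag, for example when an isolation epoch ends.
    void reset_tag() { owner_ = NO_SET; }

private:
    T value_;
    ss_t owner_ = NO_SET;
};
```

A check like this only fires for data that goes through the wrapper, which matches the slide's caveat that some missing annotations cannot be detected.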

Debugging Support
- Deterministic model means we can simulate SS execution in sequential program
  - Prometheus support for compiling debug version
  - Do all debugging on sequential program!
- Correct sequential → correct parallel (caveat: for a given input)

Outline
- Overview
- Serialization Sets Execution Model
- Prometheus: C++ Library for SS
- Experimental Evaluation
- Related Work & Conclusions

Evaluation Methodology
- Benchmarks
  - Lonestar, NU-MineBench, PARSEC, Phoenix
- Conventional parallelization
  - pthreads, OpenMP
- Prometheus versions
  - Port program to sequential C++ program
  - Idiomatic C++: OO, inheritance, STL
  - Parallelize with serialization sets

Results Summary
- 4-socket AMD Barcelona (4-way multicore) = 16 total cores

Results Summary

Outline
- Overview
- Serialization Sets Execution Model
- Prometheus: C++ Library for SS
- Experimental Evaluation
- Related Work & Conclusions

Related Work
- Actors / Active Objects
  - Hewitt [JAI 1977]
- MultiLisp
  - Halstead [ACM TOPLAS 1985]
- Inspector-Executor
  - Wu et al. [ICPP 1991]
- Jade
  - Rinard and Lam [ACM TOPLAS 1998]
- Cilk
  - Frigo et al. [PLDI 1998]
- OpenMP

Conclusions
- Sequential program with annotations
  - No explicit synchronization, no locks
- Programmers focus on keeping computation private to object state
  - Consistent with OO programming practices
- Dependence-based model
  - Deterministic race-free parallel execution
- Performance close to, and sometimes better than, multithreading

Questions