Helper Transactions: Enabling Thread Level Speculation via a Transactional Memory System
Richard M. Yoo (Georgia Tech), Hsien-Hsin Sean Lee (Georgia Tech)
In Workshop on Parallel Execution of Sequential Programs on Multi-core Architectures (PESPMA-08)

Helper Transactions: Yoo & Lee
Exploiting Multi-Core Performance
Thread Level Speculation (TLS)
  – Extract new threads from single-threaded applications
Transactional Memory (TM)
  – Help existing threads perform better
Where do ILP techniques fit?

TLS versus TM
TLS and TM share multiple hardware components:
  – TLS only: task spawning, context passing, sequential ordering
  – Shared by TLS and TM: checkpointing, dependency violation detection, result buffering, replay
  – TM only: contention management, transaction scheduling

Helper Transactions
Goal: enable TLS on a TM-ready system
  – Support “out-of-order procedure fall-thru speculation” on TM
  – Amortize the TLS implementation cost over a TM-ready system

Agenda
  – Thread-Level Speculation (TLS)
  – Mapping TLS onto a Transactional Memory (TM) System
  – Extending the TM System

Spawning Points of Thread Level Speculation
[Figure: loop speculation, if-then-else speculation, and procedure fall-thru speculation]

Out-of-Order Spawn
  – The spawn order of tasks disagrees with their sequential order
  – This complicates maintaining the sequential order
  – Helper Transactions focus on out-of-order procedure fall-through speculation
[Figure: out-of-order procedure fall-thru speculation, showing task spawning]

TLS||TM Basics
[Figure: execution timeline of main() calling foo() and foo() calling foo2(), with the fall-thru code of each running at nesting depths 0 to 2; buffered transactions wait for a "green light" before committing]
Unlike conventional TM:
  – Transactions execute different code (a function body versus its fall-through)
  – There is a sequential order among transactions

Procedure Fall-Thru Speculation on TM
Each task in TLS is a transaction
  – The function body is guarded with begin_transaction and commit_transaction
  – The spawned thread starts a transaction of its own as soon as it begins
  – The TM system detects memory dependency violations

Alternative Approach to Out-of-Order Spawn
Spawn the new thread with the function body
  – Reduces the traffic spent converting register dependencies into memory dependencies
  – Simplifies the compiler implementation
Requires partial abort support from the TM system
[Figure: out-of-order procedure fall-through speculation on TM (revised)]

Required Support
Compared to TLS, TM lacks:
  1. A thread spawning mechanism
  2. A context passing mechanism
  3. A sequential ordering mechanism
Compiler support
  – Thread spawning mechanism
  – Context passing mechanism
Hardware extension
  – Sequential ordering mechanism

Compiler Support

Sample code before the TLS transformation:

    int main(int argc, char* argv[]) {
        int a, b, c;
        …
        foo(a, b, c);   // encountering a function call
        …
    }

    int foo(int arg0, int arg1, int arg2) {
        …  // function foo body
    }

Sample code after the TLS transformation:

    // Register dependencies are changed into memory dependencies
    volatile int in_memory_a, in_memory_b, in_memory_c;

    int main(int argc, char* argv[]) {
        int a, b, c;
        …
        // Store the function call arguments in memory…
        in_memory_a = a; in_memory_b = b; in_memory_c = c;
        // The function call is replaced with a thread spawn to the clone function
        create_thread(_tls_foo);
        // The fall-through thread is also guarded with a transaction
        begin_transaction;
        …
    }

    // Clone function whose body is the function call
    void* _tls_foo(void* arg) {
        int arg0, arg1, arg2;
        // …and retrieve them back
        arg0 = in_memory_a; arg1 = in_memory_b; arg2 = in_memory_c;
        // Guard the function body with a transaction
        begin_transaction;
        foo(arg0, arg1, arg2);
        commit_transaction;
        return NULL;
    }

    int foo(int arg0, int arg1, int arg2) {
        …  // function foo body
    }
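The transformed code can be exercised with a minimal runnable sketch, assuming POSIX threads in place of create_thread and treating begin_transaction and commit_transaction as hypothetical no-op macros (a real TM system would supply them). The body of foo, foo_result, and call_foo_speculatively are illustrative names, not from the slides. The sketch shows only the thread spawn and the memory-based argument passing; conflict detection and ordering are left to the TM hardware.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the TM primitives: no-ops in this sketch,
   so nothing here detects conflicts or rolls anything back. */
#define begin_transaction  ((void)0)
#define commit_transaction ((void)0)

/* Context passing: arguments travel through memory, not registers. */
static volatile int in_memory_a, in_memory_b, in_memory_c;

/* Hypothetical result slot so the caller can observe foo()'s effect. */
static volatile int foo_result;

static int foo(int arg0, int arg1, int arg2) {
    foo_result = arg0 + arg1 + arg2;   /* placeholder function body */
    return foo_result;
}

/* Clone function: reload the arguments from memory, then run the
   original call inside its own transaction. */
static void *_tls_foo(void *arg) {
    (void)arg;
    int arg0 = in_memory_a, arg1 = in_memory_b, arg2 = in_memory_c;
    begin_transaction;
    foo(arg0, arg1, arg2);
    commit_transaction;
    return NULL;
}

/* Transformed call site: store the arguments, spawn the clone, and keep
   running the fall-through code. Returns 0 on success. */
int call_foo_speculatively(int a, int b, int c) {
    pthread_t tid;
    in_memory_a = a;
    in_memory_b = b;
    in_memory_c = c;
    if (pthread_create(&tid, NULL, _tls_foo, NULL) != 0)
        return -1;
    begin_transaction;
    /* ... fall-through code would run here ... */
    commit_transaction;
    /* Joining stands in for the ordered commit: the fall-through is not
       treated as complete until the function body has finished. */
    return pthread_join(tid, NULL);
}
```

Here pthread_join is only a stand-in; in the slides' design the fall-through transaction would instead stall at commit until the function-body transaction commits.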

Hardware Extension 1: Binary Tree to Encode the Sequential Ordering
Sequential order determines:
  1. Which transaction to abort upon a conflict
  2. Which transaction should stall on commit
Use a binary tree to represent the sequential ordering
  – A child executing the “function body” appends 1 to its parent’s encoding
  – A child executing the “fall-thru code” appends 0
[Figure: binary tree rooted at main() whose nodes are foo(), goo(), hoo() and their fall-thru (FT) continuations, labeled X0 to X3, with the resulting sequential ordering]
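A minimal sketch of the encoding, assuming each encoding is kept as a string of '0'/'1' characters (the slides do not specify a hardware representation). tx_order, spawn, precedes, and conflict_victim are hypothetical names. Comparing two live transactions reduces to finding the first differing bit: the side that took the function-body branch (1) comes first in sequential order, so on a conflict the other side is the one to abort.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 63

/* Hypothetical ordering encoding: a '0'/'1' string; the root is "". */
typedef struct {
    char enc[MAX_DEPTH + 1];
} tx_order;

/* At a call site the spawning thread keeps running the fall-through code
   (appends '0') and the spawned thread runs the function body ('1'). */
static void spawn(tx_order *parent, tx_order *child) {
    size_t n = strlen(parent->enc);
    assert(n < MAX_DEPTH);
    memcpy(child->enc, parent->enc, n);
    child->enc[n] = '1';
    child->enc[n + 1] = '\0';
    parent->enc[n] = '0';
    parent->enc[n + 1] = '\0';
}

/* Nonzero if a precedes b in sequential order (a is less speculative).
   Assumes neither encoding is a prefix of the other, which holds for live
   leaves: at the first differing bit, '1' sorts after '0' in strcmp, and
   the '1' side is the earlier one. */
static int precedes(const tx_order *a, const tx_order *b) {
    return strcmp(a->enc, b->enc) > 0;
}

/* On a conflict, abort the more speculative of the two transactions. */
static const tx_order *conflict_victim(const tx_order *a, const tx_order *b) {
    return precedes(a, b) ? b : a;
}
```

For a hypothetical call pattern in which main() calls foo(), foo() calls goo(), and main()'s fall-through then calls hoo(), the encodings become 11 (goo), 10 (foo's fall-through), 01 (hoo), and 00 (main's fall-through), which is also their sequential order.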

Hardware Extension 2: Aborting a Subtree of Transactions
Upon a transaction abort:
  – All more speculative transactions are aborted as well
  – Conservatively, an entire subtree of transactions is aborted
[Figure: a conflict in the tree causes the entire, more speculative subtree of transactions to be aborted]
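With the binary-tree encoding, every transaction in a subtree carries the subtree root's encoding as a prefix, so aborting a subtree reduces to a prefix match. The sketch below assumes string encodings and a flat table of live transactions; which node the hardware picks as the subtree root is not stated on the slide, so it is passed in as a parameter. tx_state and abort_subtree are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct {
    const char *enc;   /* hypothetical '0'/'1' ordering encoding */
    bool aborted;
} tx_state;

/* Abort every live transaction whose encoding starts with subtree_root,
   i.e. every transaction in that subtree. Returns how many were aborted. */
static int abort_subtree(tx_state *txs, int n, const char *subtree_root) {
    size_t len = strlen(subtree_root);
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (!txs[i].aborted && strncmp(txs[i].enc, subtree_root, len) == 0) {
            txs[i].aborted = true;
            count++;
        }
    }
    return count;
}
```

Matching on a prefix is what makes the conservative choice cheap: no per-pair ordering comparison is needed, at the cost of sometimes aborting transactions that did not strictly have to be aborted.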

Hardware Extension 3: Ordering the Commits
A central module serializes the commits
  – Similar to a reorder buffer (ROB)
  – A transaction consults it to determine whether to commit or stall
[Figure: a transaction asks the module “Can I commit?”; the module generates the stall signal]
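The commit-ordering module can be sketched as a check over a table of active transactions: a transaction may commit only when no active transaction precedes it, much as only the head of a reorder buffer retires. This assumes the string form of the binary-tree encoding and a software table; tx_entry, precedes, and can_commit are hypothetical names, and a real module would answer in hardware with a stall signal.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct {
    const char *enc;   /* hypothetical '0'/'1' ordering encoding */
    bool active;       /* still running or waiting to commit */
} tx_entry;

/* At the first differing bit, '1' (function body) precedes '0'
   (fall-through); strcmp sorts '1' after '0'. */
static bool precedes(const char *a, const char *b) {
    return strcmp(a, b) > 0;
}

/* "Can I commit?": transaction `self` may commit only if no other active
   transaction precedes it; otherwise it must stall. */
static bool can_commit(const tx_entry *table, int n, int self) {
    for (int i = 0; i < n; i++) {
        if (i != self && table[i].active &&
            precedes(table[i].enc, table[self].enc))
            return false;
    }
    return true;
}
```

Once the least speculative transaction commits and leaves the table, the next one in sequential order gets a "yes" on its next query.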

Summary
TM can be extended to support out-of-order TLS
Two-fold approach
  – Compiler support for thread spawning and context passing
  – Hardware support for sequential ordering
Extends the usage scope of a TM system
Amortizes the TLS implementation cost over a TM-ready system

Thank You! Georgia Tech ECE MARS Labs

TLS versus TM
Thread-Level Speculation
  – Divide a program into possibly non-conflicting tasks
  – Hardware speculatively executes the tasks in parallel
  – Inter-task dependencies are maintained by detecting, squashing, and rolling back conflicting tasks
Transactional Memory
  – Transaction: a sequence of instructions that executes atomically; the instructions either commit or abort as a single large operation
  – Speculatively execute transactions within a critical section
  – The underlying TM system detects and aborts transactions that violate memory dependencies
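To make “commit or abort as a single large operation” concrete, here is a minimal single-threaded sketch of one common way to buffer transactional results: speculative writes go to a private redo log and reach memory only at commit, while an abort simply discards the log. All names are hypothetical, and the sketch omits conflict detection and concurrency control; it illustrates all-or-nothing visibility, not a complete TM system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LOG_CAP 32

/* A transaction's speculative writes are held in a private redo log. */
typedef struct {
    int *addr[LOG_CAP];
    int val[LOG_CAP];
    size_t n;
} tx_log;

static void tx_begin(tx_log *tx) { tx->n = 0; }

/* Buffer a write; memory is untouched until commit. False if the log is full. */
static bool tx_write(tx_log *tx, int *addr, int val) {
    for (size_t i = 0; i < tx->n; i++) {
        if (tx->addr[i] == addr) { tx->val[i] = val; return true; }
    }
    if (tx->n == LOG_CAP) return false;
    tx->addr[tx->n] = addr;
    tx->val[tx->n] = val;
    tx->n++;
    return true;
}

/* Reads see the transaction's own buffered writes first. */
static int tx_read(const tx_log *tx, const int *addr) {
    for (size_t i = 0; i < tx->n; i++)
        if (tx->addr[i] == addr) return tx->val[i];
    return *addr;
}

/* Commit publishes every buffered write; abort discards them all. */
static void tx_commit(tx_log *tx) {
    for (size_t i = 0; i < tx->n; i++) *tx->addr[i] = tx->val[i];
    tx->n = 0;
}

static void tx_abort(tx_log *tx) { tx->n = 0; }
```

Buffering writes this way (lazy versioning) keeps an aborted transaction's effects invisible without any undo step; the alternative of writing in place and keeping an undo log trades cheaper commits for more expensive aborts.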

BACKUP FOILS

Helper Transactions
Goal: enable TLS on a TM-ready system
  – Support “out-of-order procedure fall-through speculation” on TM
  – Amortize the TLS implementation cost over a TM-ready system
Comparison of Various Parallelization Techniques (columns: Implementation, Category, Architecture, Operand Passing Network, Out-of-order Spawn, Lock Parallelization)
  – Multiscalar: TLS, Dedicated, Y
  – SVC: TLS, SMP, Y
  – Hydra: TLS, CMP, Y
  – PolyFlow: TLS, SMT, Y
  – TLS4OutOrder: TLS, CMP, Y
  – Voltron: TLS, CMP, Y, Y?
  – Helper Transactions: TM + TLS, CMP, Y, Y
  – TCC: TM, CMP, Y
  – UTM: TM, CMP, Y
  – LogTM: TM, CMP, Y

The Basics (Cont’d)
Unlike conventional TM:
  – Transactions execute different code
  – There is a sequential order among transactions
In this example, function_X sequentially precedes fallthru_X:
  1. On a conflict, the TM system should abort fallthru_X in favor of function_X
  2. Upon commit_transaction, fallthru_X should stall until function_X commits, OR the commit of function_X implicitly triggers the commit of fallthru_X (implicit commit)
An aborted transaction may still improve performance by warming up the cache
[Figure: procedure fall-thru speculation on TM, with transactions function_X and fallthru_X]

Spawning Points of Thread Level Speculation
Task boundaries are determined by the high-level programming language
  – E.g., loops, if-then-else statements, procedure fall-throughs, etc.
[Figure: loop speculation, if-then-else speculation, and procedure fall-thru speculation]

Supporting Out-of-Order Spawn
Map out-of-order spawning onto nested transactions
  – A transaction may have multiple concurrent nested transactions
  – At a spawn point, the spawning thread increments its nesting level
  – The spawned thread starts a transaction at the same level
Maintain the sequential order by:
  1. Aborting the more speculative transaction upon a conflict
  2. Stalling the explicit commit of the more speculative transaction until the less speculative transaction commits
[Figure: out-of-order procedure fall-thru speculation on TM]
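The nesting rule for out-of-order spawn can be written down directly. In this sketch, tls_thread, its level field, and spawn_nested are hypothetical names; it only tracks nesting levels and does not model the transactions themselves.

```c
#include <assert.h>

/* Hypothetical per-thread record of the current transaction nesting level. */
typedef struct {
    int level;
} tls_thread;

/* At a spawn point, the spawning thread enters a nested transaction
   (level + 1) and the spawned thread starts its transaction at that
   same level. */
static void spawn_nested(tls_thread *spawner, tls_thread *spawned) {
    spawner->level += 1;
    spawned->level = spawner->level;
}
```

Starting from main() at level 0, one spawn leaves both threads at level 1; a further spawn from the child raises that child and its own new child to level 2, while main() stays at level 1.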