TOWARDS A SOFTWARE TRANSACTIONAL MEMORY FOR GRAPHICS PROCESSORS Daniel Cederman, Philippas Tsigas and Muhammad Tayyab Chaudhry.

Slides:

Advertisements

Similar presentations

Concurrent programming for dummies (and smart people too) Tim Harris & Keir Fraser.

Advertisements

On Dynamic Load Balancing on Graphics Processors Daniel Cederman and Philippas Tsigas Chalmers University of Technology.

Optimistic Methods for Concurrency Control By : H.T. Kung & John T. Robinson Presenters: Munawer Saeed.

1 Concurrency Control Chapter Conflict Serializable Schedules  Two actions are in conflict if  they operate on the same DB item,  they belong.

Enabling Speculative Parallelization via Merge Semantics in STMs Kaushik Ravichandran Santosh Pande College.

Software Transactional Memory and Conditional Critical Regions Word-Based Systems.

Principles of Transaction Management. Outline Transaction concepts & protocols Performance impact of concurrency control Performance tuning.

CS492B Analysis of Concurrent Programs Lock Basics Jaehyuk Huh Computer Science, KAIST.

Concurrency Control II. General Overview Relational model - SQL  Formal & commercial query languages Functional Dependencies Normalization Physical Design.

5.1 Silberschatz, Galvin and Gagne ©2009 Operating System Concepts with Java – 8 th Edition Chapter 5: CPU Scheduling.

Concurrency The need for speed. Why concurrency? Moore’s law: 1. The number of components on a chip doubles about every 18 months 2. The speed of computation.

Scalable and Lock-Free Concurrent Dictionaries

Virendra J. Marathe, William N. Scherer III, and Michael L. Scott Department of Computer Science University of Rochester Presented by: Armand R. Burks.

Elad Gidron, Idit Keidar, Dmitri Perelman, Yonathan Perez

Locality-Conscious Lock-Free Linked Lists Anastasia Braginsky & Erez Petrank 1.

Ali Saoud Object Based Transactional Memory. Introduction Resent trends go towards object based SMT because it’s dynamic Word-based STM systems are more.

Transactional Memory (TM) Evan Jolley EE 6633 December 7, 2012.

Lock-free Cuckoo Hashing Nhan Nguyen & Philippas Tsigas ICDCS 2014 Distributed Computing and Systems Chalmers University of Technology Gothenburg, Sweden.

Transactional Memory Yujia Jin. Lock and Problems Lock is commonly used with shared data Priority Inversion –Lower priority process hold a lock needed.

More on transactions…. Dealing with concurrency (OR: how to handle the pressure!) Locking Timestamp ordering Multiversion protocols Optimistic protocols.

1 MetaTM/TxLinux: Transactional Memory For An Operating System Hany E. Ramadan, Christopher J. Rossbach, Donald E. Porter and Owen S. Hofmann Presenter:

EPFL - March 7th, 2008 Interfacing Software Transactional Memory Simplicity vs. Flexibility Vincent Gramoli.

Computer Laboratory Practical non-blocking data structures Tim Harris Computer Laboratory.

CS510 Concurrent Systems Class 2 A Lock-Free Multiprocessor OS Kernel.

CS510 Concurrent Systems Class 13 Software Transactional Memory Should Not be Obstruction-Free.

Language Support for Lightweight transactions Tim Harris & Keir Fraser Presented by Narayanan Sundaram 04/28/2008.

Database Systems: Design, Implementation, and Management Eighth Edition Chapter 10 Transaction Management and Concurrency Control.

Why The Grass May Not Be Greener On The Other Side: A Comparison of Locking vs. Transactional Memory Written by: Paul E. McKenney Jonathan Walpole Maged.

SUPPORTING LOCK-FREE COMPOSITION OF CONCURRENT DATA OBJECTS Daniel Cederman and Philippas Tsigas.

CuMAPz: A Tool to Analyze Memory Access Patterns in CUDA

Shared memory systems. What is a shared memory system Single memory space accessible to the programmer Processor communicate through the network to the.

Software Transactional Memory for Dynamic-Sized Data Structures Maurice Herlihy, Victor Luchangco, Mark Moir, William Scherer Presented by: Gokul Soundararajan.

CS510 Concurrent Systems Jonathan Walpole. A Lock-Free Multiprocessor OS Kernel.

Understanding Performance of Concurrent Data Structures on Graphics Processors Daniel Cederman, Bapi Chatterjee, Philippas Tsigas Distributed Computing.

Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory Alexander Matveev Nir Shavit MIT.

Håkan Sundell, Chalmers University of Technology 1 NOBLE: A Non-Blocking Inter-Process Communication Library Håkan Sundell Philippas.

A Qualitative Survey of Modern Software Transactional Memory Systems Virendra J. Marathe Michael L. Scott.

Evaluating FERMI features for Data Mining Applications Masters Thesis Presentation Sinduja Muralidharan Advised by: Dr. Gagan Agrawal.

CS5204 – Operating Systems Transactional Memory Part 2: Software-Based Approaches.

Chapter 11 Concurrency Control. Lock-Based Protocols  A lock is a mechanism to control concurrent access to a data item  Data items can be locked in.

Optimistic Design 1. Guarded Methods Do something based on the fact that one or more objects have particular states  Make a set of purchases assuming.

Maged M.Michael Michael L.Scott Department of Computer Science Univeristy of Rochester Presented by: Jun Miao.

On the Performance of Window-Based Contention Managers for Transactional Memory Gokarna Sharma and Costas Busch Louisiana State University.

Wait-Free Multi-Word Compare- And-Swap using Greedy Helping and Grabbing Håkan Sundell PDPTA 2009.

CS510 Concurrent Systems Why the Grass May Not Be Greener on the Other Side: A Comparison of Locking and Transactional Memory.

Software Transactional Memory Should Not Be Obstruction-Free Robert Ennals Presented by Abdulai Sei.

© 2008 Multifacet ProjectUniversity of Wisconsin-Madison Pathological Interaction of Locks with Transactional Memory Haris Volos, Neelam Goyal, Michael.

November 27, 2007 Verification of a Concurrent Priority Queue Bart Verzijlenberg.

MULTIVIE W Slide 1 (of 21) Software Transactional Memory Should Not Be Obstruction Free Paper: Robert Ennals Presenter: Emerson Murphy-Hill.

Transactional Memory Student Presentation: Stuart Montgomery CS5204 – Operating Systems 1.

Implementing Lock. From the Previous Lecture  The “too much milk” example shows that writing concurrent programs directly with load and store instructions.

CSC Multiprocessor Programming, Spring, 2012 Chapter 10 – Avoiding Liveness Hazards Dr. Dale E. Parson, week 11.

Maurice Herlihy and J. Eliot B. Moss, ISCA '93

EECE571R -- Harnessing Massively Parallel Processors ece

Minh, Trautmann, Chung, McDonald, Bronson, Casper, Kozyrakis, Olukotun

Part 2: Software-Based Approaches

Håkan Sundell Philippas Tsigas

Parallel Programming By J. H. Wang May 2, 2017.

Faster Data Structures in Transactional Memory using Three Paths

Concurrency Control.

Challenges in Concurrent Computing

A Lock-Free Algorithm for Concurrent Bags

Anders Gidenstam Håkan Sundell Philippas Tsigas

A Qualitative Survey of Modern Software Transactional Memory Systems

Chapter 15 : Concurrency Control

Hybrid Transactional Memory

Introduction of Week 13 Return assignment 11-1 and 3-1-5

Software Transactional Memory Should Not be Obstruction-Free

Lecture 23: Transactional Memory

CONCURRENCY Concurrency is the tendency for different tasks to happen at the same time in a system ( mostly interacting with each other ) . Parallel.

Presentation transcript:

TOWARDS A SOFTWARE TRANSACTIONAL MEMORY FOR GRAPHICS PROCESSORS Daniel Cederman, Philippas Tsigas and Muhammad Tayyab Chaudhry

Overview  Software Transactional Memory (STM) has long been suggested as a possible model for parallel programming  However, its practicality is debated  By exploring how to design an STM for GPUs, we want to bring this debate over to the graphics processor domain

Introduction

Problem Scenario Binary Tree 12

17 12 Problem Scenario Binary Tree ?

Pseudocode find position in tree; insert element; Step 1 Step 2 Tree might change here!

Software Transactional Memory STMs provides a construction that divides code into transactions that are guaranteed to be executed atomically

Modified Pseudocode find position in tree; insert element; Step 1 atomic { } Just one step!

Scenario with Transactions Concurrent Transaction ATransaction B

Scenario with Transactions Concurrent CommittedTransaction B 12 17

Scenario with Transactions Concurrent CommittedAborted 12 17

Progress Guarantees Lock Scheme Lock Granularity Log or Undo-Log Conflict Detection Time … STM Design

One Lock  No concurrency  Busy waiting  Convoying Done yet? Done yet? Done yet? Done yet? Done yet? Done yet?

Multiple Locks  Better concurrency  Difficult  Deadlocks C B A D N W X Q U S L Z T R Y Process A Process B

20 Dynamic Locks Transaction Log Read 20 Read 30 Read 30 Write 20 Write 15

Two STM Designs  Transactions that fail to acquire a lock are aborted Busy waits, but avoids deadlocks  Transactions that fail to acquire a lock can steal it or abort the other transaction  No busy wait  Based on STM by Tim Harris and Keir Fraser "Language support for lightweight transactions", OOPSLA 2003 A Blocking STMA Non-Blocking STM

Lock Granularity Object A int var1; int var2; int var3; int var4; int var5; int var1; int var2; int var3; int var4; int var5; Object B  Object  Word  Stripe  Tradeoff between false conflicts and the number of locks  Wide memory bus makes it quick to copy object

Log or Undo-Log  Better with high contention  No visibility problem  Optimistic  Faster when no conflict  Problem with visibility Log ChangesLog Original Content The visibility problem is hard to handle in a non-managed environment

Conflict Detection Time  Conflicts detected early  More false conflicts  Late conflict detection  Locks are held shorter time Acquire locks immediatelyAcquire locks at commit time Holding locks longer will lead to more transactions aborting long transactions in the non-blocking STM

CUDA Specifics  Minimal use of processor local memory  Better left to the main application  SIMD instruction used where possible  Mostly used to coalesce reads and writes  No Recursion  Rewritten to use stack

More CUDA Specifics  No Functions  Rewritten to use goto statements  Possible Scheduler Problems  Bad scheduling can lead to infinite waiting time  No Dynamic Memory during Runtime  Allocates much memory

Experiments

 Queue (enqueue/dequeue)  Many conflicts expected  Hash-map (inserts)  Fewer conflicts, but longer transactions as more items are inserted  Binary Tree (inserts)  Fewer conflicts as the tree grows

Experiments  Skip-List  Insert/Lookup/Remove  Scales similar to tree  Comparison with lock-free skip-list  H. Sundell and P. Tsigas, Scaleable and lock-free concurrent dictionaries, SAC04.

Contention Levels We performed the experiments using different levels of contention High Contention (No pause between transactions) Low Contention (500ms of work distributed randomly)

Backoff Strategies  Lowers contention by waiting before aborted transactions are tried again  Increases the probability that at least one transaction is successful

Hardware  Nvidia GTX 280 with 30 multiprocessors  1-60 thread blocks  32 threads in each block

Backoff Strategy (Queue, Full Con)

Blocking Queue (Full Contention)

Queue (Full Contention)

Binary Tree (Full Contention)

Hash-map (Full Contention)

Hash-Map Aborts (Full Contention) Long and expensive transactions!

Skip-List (Full Contention)

Lock-Free Skip-List (Full Contention)

Conclusion  Software Transactional Memory has attracted the interest of many researchers over the recent years  We have evaluated a blocking and a non- blocking STM design on a graphics processor  The non-blocking design scaled better than the blocking  For simplicity you pay in performance  The debate is still open!

For more information: Thank you!