Part 1: Concepts and Hardware- Based Approaches

Slides:



Advertisements
Similar presentations
Maurice Herlihy (DEC), J. Eliot & B. Moss (UMass)
Advertisements

Synchronization. How to synchronize processes? – Need to protect access to shared data to avoid problems like race conditions – Typical example: Updating.
1 Chapter 4 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Synchronization.
CS492B Analysis of Concurrent Programs Lock Basics Jaehyuk Huh Computer Science, KAIST.
ECE 454 Computer Systems Programming Parallel Architectures and Performance Implications (II) Ding Yuan ECE Dept., University of Toronto
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Snoopy Caches I Steve Ko Computer Sciences and Engineering University at Buffalo.
© Krste Asanovic, 2014CS252, Spring 2014, Lecture 12 CS252 Graduate Computer Architecture Spring 2014 Lecture 12: Synchronization and Memory Models Krste.
5.1 Silberschatz, Galvin and Gagne ©2009 Operating System Concepts with Java – 8 th Edition Chapter 5: CPU Scheduling.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Chapter 6: Process Synchronization.
Transactional Memory Part 1: Concepts and Hardware- Based Approaches 1Dennis Kafura – CS5204 – Operating Systems.
Parallel Processing (CS526) Spring 2012(Week 6).  A parallel algorithm is a group of partitioned tasks that work with each other to solve a large problem.
Transactional Memory: Architectural Support for Lock- Free Data Structures Herlihy & Moss Presented by Robert T. Bauer.
Synchron. CSE 471 Aut 011 Some Recent Medium-scale NUMA Multiprocessors (research machines) DASH (Stanford) multiprocessor. –“Cluster” = 4 processors on.
Transactional Memory Overview Olatunji Ruwase Fall 2007 Oct
PARALLEL PROGRAMMING with TRANSACTIONAL MEMORY Pratibha Kona.
Transactional Memory Yujia Jin. Lock and Problems Lock is commonly used with shared data Priority Inversion –Lower priority process hold a lock needed.
1 Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
Introduction to Lock-free Data-structures and algorithms Micah J Best May 14/09.
1 Lecture 23: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
1 Lecture 21: Synchronization Topics: lock implementations (Sections )
Synchron. CSE 4711 The Need for Synchronization Multiprogramming –“logical” concurrency: processes appear to run concurrently although there is only one.
1 Lecture 20: Protocols and Synchronization Topics: distributed shared-memory multiprocessors, synchronization (Sections )
Transactional Memory CDA6159. Outline Introduction Paper 1: Architectural Support for Lock-Free Data Structures (Maurice Herlihy, ISCA ‘93) Paper 2: Transactional.
1 Hardware Transactional Memory (Herlihy, Moss, 1993) Some slides are taken from a presentation by Royi Maimon & Merav Havuv, prepared for a seminar given.
A Qualitative Survey of Modern Software Transactional Memory Systems Virendra J. Marathe Michael L. Scott.
CS5204 – Operating Systems Transactional Memory Part 2: Software-Based Approaches.
Optimistic Design 1. Guarded Methods Do something based on the fact that one or more objects have particular states  Make a set of purchases assuming.
A Methodology for Implementing Highly Concurrent Data Objects by Maurice Herlihy Slides by Vincent Rayappa.
1 Lecture 19: Scalable Protocols & Synch Topics: coherence protocols for distributed shared-memory multiprocessors and synchronization (Sections )
Transactional Memory Student Presentation: Stuart Montgomery CS5204 – Operating Systems 1.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition Chapter 6: Process Synchronization.
Advanced Operating Systems (CS 202) Transactional memory Jan, 27, 2016 slide credit: slides adapted from several presentations, including stanford TCC.
Multiprocessors – Locks
Lecture 20: Consistency Models, TM
Maurice Herlihy and J. Eliot B. Moss,  ISCA '93
Outline Introduction Centralized shared-memory architectures (Sec. 5.2) Distributed shared-memory and directory-based coherence (Sec. 5.4) Synchronization:
Lecture 21 Synchronization
Part 2: Software-Based Approaches
G.Anuradha Reference: William Stallings
Lecture 19: Coherence and Synchronization
Atomic Operations in Hardware
The University of Adelaide, School of Computer Science
The University of Adelaide, School of Computer Science
Atomic Operations in Hardware
Lecture 18: Coherence and Synchronization
Challenges in Concurrent Computing
Anders Gidenstam Håkan Sundell Philippas Tsigas
The University of Adelaide, School of Computer Science
A Qualitative Survey of Modern Software Transactional Memory Systems
Designing Parallel Algorithms (Synchronization)
Lecture 5: Snooping Protocol Design Issues
Lecture 6: Transactions
Lecture 21: Synchronization and Consistency
Lecture 22: Consistency Models, TM
Lecture: Coherence and Synchronization
Hybrid Transactional Memory
Semaphores Chapter 6.
Lecture 25: Multiprocessors
Software Transactional Memory Should Not be Obstruction-Free
Lecture 4: Synchronization
The University of Adelaide, School of Computer Science
Lecture 17 Multiprocessors and Thread-Level Parallelism
Lecture 17 Multiprocessors and Thread-Level Parallelism
Lecture: Coherence and Synchronization
Lecture: Consistency Models, TM
Lecture 19: Coherence and Synchronization
Lecture 18: Coherence and Synchronization
The University of Adelaide, School of Computer Science
Advanced Operating Systems (CS 202) Memory Consistency and Transactional Memory Feb. 6, 2019.
Lecture 17 Multiprocessors and Thread-Level Parallelism
Presentation transcript:

Part 1: Concepts and Hardware- Based Approaches Transactional Memory Part 1: Concepts and Hardware- Based Approaches

Avoids problems with explicit locking Introduction Provide support for concurrent activity using transaction-style semantics without explicit locking Avoids problems with explicit locking Software engineering problems Priority inversion Convoying Deadlock Approaches Hardware (faster, size-limitations, platform dependent) Software (slower, unlimited size, platform independent) Word-based (fine-grain, complex data structures) Object-based ( course-grain, higher-level structures) CS 5204 – Fall, 2008

History * Lomet* proposed the construct: <identifier>: action( <parameter-list> ); <statement-list> end; where the statement-list is executed as an atomic action. The statement-list can include: await <test> then <statement-list>; so that execution of the process/thread does not proceed until test is true. * D.B. Lomet, “Process structuring, synchronization, and recovery using atomic actions,” In Proc. ACM Conf. on Language Design for Reliable Software, Raleigh, NC, 1977, pp. 128–137. CS 5204 – Fall, 2008

Transaction Pattern repeat { BeginTransaction(); /* initialize transaction */ <read input values> success = Validate(); /* test if inputs consistent */ if (success) { <generate updates> success = Commit(); /* attempt permanent update */ if (!success) Abort(); /* terminate if unable to commit */ } EndTransaction(); /* close transaction */ } until (success); CS 5204 – Fall, 2008

Guarantees Wait-freedom Lock-freedom Obstruction-freedom All processes make progress in a finite number of their individual steps Avoid deadlocks and starvation Strongest guarantee but difficult to provide in practice Lock-freedom At least one process makes progress in a finite number of steps Avoids deadlock but not starvation Obstruction-freedom At least one process makes progress in a finite number of its own steps in the absence of contention Avoids deadlock but not livelock Livelock controlled by: Exponential back-off Contention management CS 5204 – Fall, 2008

Hardware Instructions Compare-and-Swap (CAS): word CAS (word* addr, word test, word new) { atomic { if (*addr == test) { *addr = new; return test; } else return *addr; Usage: a spin-lock inuse = false; … while (CAS(&inuse, false, true); Examples: CMPXCHNG instruction on the x86 and Itaninium architectures CS 5204 – Fall, 2008

Hardware Instructions LL/SC: load-linked/store-conditional word LL(word* address) { return *address; } boolean SC(word* address, word value){ atomic { if (address updated since LL) return false; else { address = value; return true; Usage: repeat { while (LL(inuse)); done = SC(inuse, 1); } until (done); Examples: ldl_l/stl_c and ldq_l/stq_c (Alpha), lwarx/stwcx (PowerPC), ll/sc (MIPS), and ldrex/strex (ARM version 6 and above). CS 5204 – Fall, 2008

Hardware-based Approach Replace short critical sections Instructions Memory Load-transactional (LT) Load-transactional-exclusive (LTX) Store-transactional (ST) Transaction state Commit Abort Validate Usage pattern Use LT or LTX to read from a set of locations Use Validate to ensure consistency of read values Use ST to update memory locations Use Commit to make changes permanent Definitions Read set: locations read by LT Write set: locations accessed by LTX or ST Data set: union of Read set and Write set CS 5204 – Fall, 2008

Example typedef struct list_elem { struct list_elem *next; /* next to dequeue */ struct list_elem *prev; /* previously enqueued */ int value; } entry; shared entry *Head, *Tail; void list_enq(entry* new) { entry *old_tail; unsigned backoff = BACKOFF_MIN; unsigned wait; new->next = new->prev = NULL; while (TRUE) { old_tail = (entry*) LTX(&Tail); if (VALIDATE()) { ST(&new->prev, old_tail); if (old_tail == NULL) {ST(&Head, new); } else {ST(&old_tail->next, new); } ST(&Tail, new); if (COMMIT()) return; } wait = random() % (01 << backoff); /* exponential backoff */ while (wait--); if (backoff < BACKOFF_MAX) backoff++; CS 5204 – Fall, 2008

Hardware-based Approach CS 5204 – Fall, 2008

. . . Cache Implementation cache Bus Shared Memory address value state tag cache . . . Processor caches and shared memory connected via shared bus. Caches and shared memory “snoop” on the bus and react (by updating their contents) based on observed bus traffic. Each cache contains an (address, value) pair and a state; transactional memory adds a tag. Cache coherence: the (address, value) pairs must be consistent across the set of caches. Basic idea: “any protocol capable of detecting accessibility conflicts can also detect transaction conflict at no extra cost.” CS 5204 – Fall, 2008

. . . Line States cache Bus Shared Memory Name Access Shared? Modified? invalid none --- valid R yes no dirty R, W reserved Shared Memory Bus address value state tags cache . . . CS 5204 – Fall, 2008

. . . Transactional Tags cache Bus Shared Memory Name Meaning EMPTY contains no data NORMAL contains committed data XCOMMIT discard on commit XABORT discard on abort Shared Memory Bus address value state tags cache . . . CS 5204 – Fall, 2008

. . . Bus cycles cache Bus Shared Memory Name Kind Meaning New access address value state tags cache . . . Name Kind Meaning New access READ regular read value shared RFO exclusive WRITE both write back T_READ transaction T_WRITE BUSY refuse access unchanged CS 5204 – Fall, 2008

LT instruction LTX instruction ST instruction Scenarios If XABORT entry in transactional cache: return value If NORMAL entry Change NORMAL to XABORT Allocate second entry with XCOMMIT (same data) Return value Otherwise Issue T_READ bus cycle Successful: set up XABORT/XCOMMIT entries BUSY: abort transaction LTX instruction Same as LT instruction except that T_RFO bus cycle is used instead and cache line state is RESERVED ST instruction Same as LTX except that the XABORT value is updated CS 5204 – Fall, 2008

Performance Simulations TTS comparison methods TTS – test/test-and-set (to implement a spin lock) LL/SC – load-linked/store-conditional (to implement a spin lock) MCS – software queueing QOSB – hardware queueing Transactional Memory LL/SC MCS QOSB TM CS 5204 – Fall, 2008