Low-Overhead Software Transactional Memory with Progress Guarantees and Strong Semantics. Minjia Zhang, Jipeng Huang, Man Cao, Michael D. Bond.

Presentation transcript:


Do We Need Efficient STM?

Problem Solved! Blue Gene/Q

Problem Solved? HTM is limited…

Problem Solved?
- Best-effort HTM: no completion guarantee [1]
- Performance penalty: short transactions [2]
- Language-level support for atomic blocks: STM fallback
Example transaction: atomic { from.balance -= amount; to.balance += amount; }
[1] I. Calciu et al. Invyswell: A Hybrid Transactional Memory for Haswell's Restricted Transactional Memory. In PACT.
[2] R. M. Yoo et al. Performance Evaluation of Intel Transactional Synchronization Extensions for High-Performance Computing. In SC.
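To make the meaning of the atomic block on this slide concrete, here is a minimal Java sketch. Java has no `atomic` keyword, so the sketch assumes a single global lock stands in for the TM runtime; the `Account` class and the `transfer` method are hypothetical names, and only the `balance` field and the two updates come from the slide. It illustrates the intended all-or-nothing behaviour, not how LarkTM implements it.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical account type; only the `balance` field appears on the slide.
final class Account {
    long balance;

    Account(long balance) {
        this.balance = balance;
    }
}

final class AtomicTransferSketch {
    // Assumption: one global lock emulates the atomic block's isolation.
    private static final ReentrantLock GLOBAL = new ReentrantLock();

    // Emulates: atomic { from.balance -= amount; to.balance += amount; }
    static void transfer(Account from, Account to, long amount) {
        GLOBAL.lock();
        try {
            from.balance -= amount;
            to.balance += amount;
        } finally {
            GLOBAL.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Account a = new Account(1000);
        Account b = new Account(1000);
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) transfer(a, b, 1);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) transfer(b, a, 1);
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // The combined balance is preserved because each transfer is atomic.
        System.out.println(a.balance + b.balance); // prints 2000
    }
}
```

A real STM would let non-conflicting transfers run in parallel instead of serializing them on one lock, which is the overhead and scalability question the following slides address.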

Software Transactional Memory Is Slow
- Existing STMs add high overhead [1,2,3]
- Related challenges: scalability, progress guarantees, strong semantics
[1] C. Cascaval et al. Software Transactional Memory: Why Is It Only a Research Toy? In CACM, 2008.
[2] A. Dragojević et al. Why STM Can Be More than a Research Toy. In CACM, 2011.
[3] R. M. Yoo et al. Kicking the Tires of Software Transactional Memory: Why the Going Gets Tough. In SPAA, 2008.

Challenge: Expensive to detect conflicts
T1: atomic { … … = o.f; … = p.g; … o.f = …; p.g = …; … }
T2 (one access per animation frame): o.f = …, then p.g = …, then t.k = …
Final frame: what instrumentation does T2's access need?


LarkTM Contributions
- Adds very low overhead
- Achieves good scalability by using a hybrid approach
- Provides strong progress guarantees
- Provides strong atomicity

Key Insight: keep overhead low by minimizing instrumentation costs for non-conflicting accesses.

LarkTM Design
- Per-object biased reader-writer locks [1,2]
- Eager concurrency control
- Piggybacking conflict detection and conflict resolution on lock transfers
- Minimal instrumentation and synchronization for both transactional and non-transactional non-conflicting accesses
- Locks are not released even when transactions commit
[1] M. D. Bond et al. Octet: Capturing and Controlling Cross-Thread Dependences Efficiently. In OOPSLA.
[2] B. Hindman and D. Grossman. Atomicity via Source-to-Source Translation. In MSPC, 2006.

Biased Locks (diagram): object o carries a lock state next to its field f; lock state ∈ {WrEx T, RdEx T, RdSh}.
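The three lock states can be sketched as a small Java type. This is an illustration under stated assumptions: it reads WrEx T as write-exclusive for thread T, RdEx T as read-exclusive for thread T, and RdSh as read-shared, following the Octet work cited on the design slide. The class, enum, and method names are hypothetical. The point is that an access whose thread already holds a compatible state needs no state change, which is what keeps non-conflicting accesses cheap.

```java
// Lock-state kinds named on the slide; this Java encoding is an assumption.
enum LockKind { WR_EX, RD_EX, RD_SH }

final class LockState {
    final LockKind kind;
    // Owning thread id for WR_EX and RD_EX; ignored for RD_SH.
    final long ownerThreadId;

    LockState(LockKind kind, long ownerThreadId) {
        this.kind = kind;
        this.ownerThreadId = ownerThreadId;
    }

    // A read needs no state change if the object is read-shared
    // or already exclusively held (for read or write) by this thread.
    boolean readIsFastPath(long tid) {
        return kind == LockKind.RD_SH || ownerThreadId == tid;
    }

    // A write needs no state change only if this thread holds write-exclusive.
    boolean writeIsFastPath(long tid) {
        return kind == LockKind.WR_EX && ownerThreadId == tid;
    }
}
```

Any access that fails these checks would take a slow path that changes the lock state, which is where the later slides attach coordination and conflict detection.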

Multi-thread Execution (diagram: object o has field f, a lock state, and a last txn entry)
- Object o starts in lock state WrEx T1.
- T1 starts a transaction (txn id: 42) and writes o.f = 1: it updates o's last txn to 42, adds o.f to its undo log, and updates f to 1.
- T2 then performs o.f = 2.
- Problem: there is no synchronization on T1's accesses to o.
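The owner-side write shown in these frames (update last txn, add the old value to the undo log, then write the field) can be sketched in Java as follows. The class and method names are hypothetical, and the sketch is single-threaded: it assumes the writing thread already holds the object in WrEx state, so no synchronization appears. It is meant to illustrate eager writes with an undo log, not LarkTM's actual barrier code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical object with one field f and the per-object "last txn" metadata.
final class TxObject {
    int f;
    long lastTxn = -1; // id of the last transaction that accessed this object
}

final class UndoLogSketch {
    // One undo-log entry: which object was written and what f held before.
    private static final class Entry {
        final TxObject obj;
        final int oldValue;

        Entry(TxObject obj, int oldValue) {
            this.obj = obj;
            this.oldValue = oldValue;
        }
    }

    private final Deque<Entry> undoLog = new ArrayDeque<>();
    final long txnId;

    UndoLogSketch(long txnId) {
        this.txnId = txnId;
    }

    // Transactional write by the owning thread: the three steps from the diagram.
    void writeF(TxObject o, int value) {
        o.lastTxn = txnId;                 // update last txn
        undoLog.push(new Entry(o, o.f));   // add o.f to the undo log
        o.f = value;                       // eager in-place update
    }

    // Abort: restore old values in reverse order of the writes.
    void abort() {
        while (!undoLog.isEmpty()) {
            Entry e = undoLog.pop();
            e.obj.f = e.oldValue;
        }
    }

    // Commit: in-place values are already final, so just drop the log.
    void commit() {
        undoLog.clear();
    }
}
```

Because writes happen in place, commit is cheap and abort pays the cost of replaying the log, which fits the design goal of making the non-conflicting case fast.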

Coordination (diagram)
- T2 starts coordination for its write o.f = 2 by updating o's lock state from WrEx T1 to Int T2.
- T2 sends a request to T1.
- T1 reaches a safe point (shown before … = o.f), where conflicts are detected.
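The request and safe-point handshake in these frames can be approximated with a small two-thread Java sketch. Everything here is an assumption made for illustration: lock states are encoded as strings such as "WrEx T1" and "Int T2", the request and response are plain atomic flags, and the names `requestOwnership`, `safePoint`, and `demo` are hypothetical. Conflict detection is left as a comment; the sketch only shows the ordering of state change, request, safe point, response, and ownership transfer.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

final class CoordinationSketch {
    // Lock state of object o, starting in the state shown on the slide.
    final AtomicReference<String> lockState = new AtomicReference<>("WrEx T1");
    private final AtomicBoolean requestPending = new AtomicBoolean(false);
    private final AtomicBoolean responded = new AtomicBoolean(false);

    // Requester side (T2): intermediate state, request, wait, then take ownership.
    void requestOwnership(String me, String expectedState) {
        String current = lockState.get();
        if (!current.equals(expectedState)
                || !lockState.compareAndSet(current, "Int " + me)) {
            throw new IllegalStateException("unexpected lock state: " + current);
        }
        requestPending.set(true);          // send request to the owner
        while (!responded.get()) {         // wait for the owner's response
            Thread.yield();
        }
        lockState.set("WrEx " + me);       // ownership transferred
    }

    // Responder side (T1): called at each safe point.
    void safePoint() {
        if (requestPending.compareAndSet(true, false)) {
            // Conflict detection and resolution would run here.
            responded.set(true);
        }
    }

    // Runs the handshake once and returns the final lock state.
    static String demo() {
        CoordinationSketch o = new CoordinationSketch();
        Thread t1 = new Thread(() -> {
            while (!o.lockState.get().equals("WrEx T2")) {
                o.safePoint();
                Thread.yield();
            }
        });
        t1.start();
        o.requestOwnership("T2", "WrEx T1");
        try {
            t1.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return o.lockState.get();
    }
}
```

The owner pays nothing for its own accesses between safe points; the cost of synchronization falls on the thread that wants to take the lock away.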

A Transactional Conflict (diagram): T1 is still inside transaction 42, which wrote o.f = 1, when T2's request arrives. Detecting conflicts at the safe point reports a conflict, and contention management resolves it.

Not A Transactional Conflict (diagram): T1 reaches a safe point with no conflicting transaction on o, so conflict detection reports no conflict.

Coordination (diagram): T2 keeps waiting after its request until T1 sends a response from its safe point.
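One reading of the conflict and no-conflict diagrams is that the responding thread compares the object's last txn entry with the id of its own running transaction. The Java sketch below encodes that reading; it is an assumption drawn from the diagrams (last txn 42 against txn id 42), not a statement of LarkTM's exact rule, and the names `isTransactionalConflict` and `NO_TXN` are hypothetical.

```java
final class ConflictCheckSketch {
    // Sentinel meaning "the responder is not inside a transaction" (assumption).
    static final long NO_TXN = -1;

    // Responder-side check at a safe point: the requested object conflicts
    // only if the responder's still-running transaction was the last to access it.
    static boolean isTransactionalConflict(long objectLastTxn, long responderCurrentTxn) {
        return responderCurrentTxn != NO_TXN && objectLastTxn == responderCurrentTxn;
    }
}
```

For example, with last txn 42 the check reports a conflict while transaction 42 is still running, and no conflict once the responder has ended that transaction or moved on to a later one.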

Strong Progress Guarantees (diagram): when a conflict is detected, either T1 (at its safe point) or T2 (while waiting for the response) may abort. Result: starvation and livelock freedom.

Strong Atomicity Semantics (diagram)
- Transactional vs. transactional conflict: T2's o.f = 2 is a transactional access that conflicts with T1's transaction 42; the conflict ends in an abort, followed by a retry from transaction start.
- Transactional vs. non-transactional conflict: T2's o.f = 2 is a non-transactional access; conflicts are detected the same way at T1's safe point, and the conflict again ends in an abort and a retry.

Strong Atomicity Semantics (diagram): T2's non-transactional access o.f = 2 goes through the same request and response exchange with T1. Non-transactional accesses behave like short transactions, with no setup or teardown cost.

No Transactional Conflict (diagram)
- T1's transaction ends before it reaches the safe point, so detecting conflicts finds none and T1 sends its response.
- T2, which was waiting inside its own transaction (txn id: 51), acquires the lock: o's lock state changes from Int T2 to WrEx T2.
- T2 updates o's last txn to 51, adds o.f to its undo log, and writes o.f = 2.
- Note on slide: two versions of coordination protocol.

LarkTM-O: adds very low overhead and scales well for low-contention cases

High-Contention Applications (diagram): T1 (txn: 42, txn: 43) and T2 (txn: 51, txn: 52) repeatedly read and write o.f (… = o.f; o.f = …), so ownership of o keeps moving between the threads through repeated request/response coordination at safe points.

LarkTM-S: Handling High Contention

LarkTM-S: Hybrid with Traditional Locking (diagram): in the same execution (T1 runs txn: 42 and txn: 43, T2 runs txn: 51 and txn: 52, all accessing o.f), o causes high contention.
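The slides do not say how LarkTM-S decides that an object "causes high contention". As a purely hypothetical illustration of a hybrid policy, the Java sketch below counts lock transfers per object and switches the object from biased locking to traditional locking once a threshold is reached. The counter, the threshold value, and all names are assumptions, not the policy used by LarkTM-S.

```java
final class ContentionProfileSketch {
    // Which locking scheme currently protects the object.
    enum Mode { BIASED, TRADITIONAL }

    // Hypothetical cutoff; the real system's criterion is not given in the slides.
    static final int TRANSFER_THRESHOLD = 4;

    private int transfers = 0;
    private Mode mode = Mode.BIASED;

    // Called each time the object's biased lock moves to another thread.
    Mode onLockTransfer() {
        if (mode == Mode.BIASED) {
            transfers++;
            if (transfers >= TRANSFER_THRESHOLD) {
                mode = Mode.TRADITIONAL; // stop paying for coordination on this object
            }
        }
        return mode;
    }

    Mode mode() {
        return mode;
    }
}
```

The design intent suggested by the slides is that most objects stay on the cheap biased path while the few heavily contended ones fall back to an IntelSTM-style scheme.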

Comparison Of Concurrency Control
STM            | Write concurrency control | Read concurrency control
LarkTM-O       | Eager per-object biased reader–writer lock (both columns)
LarkTM-S       | IntelSTM–LarkTM-O hybrid (both columns)
IntelSTM [1,2] | Eager per-object lock     | Lazy version validation
NOrec [3]      | Lazy global seqlock       | Lazy value validation
[1] B. Saha et al. McRT-STM: A High Performance Software Transactional Memory System for a Multi-Core Runtime. In PPoPP.
[2] T. Shpeisman et al. Enforcing Isolation and Ordering in STM. In PLDI.
[3] L. Dalessandro et al. NOrec: Streamlining STM by Abolishing Ownership Records. In PPoPP.
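The comparison lists "lazy value validation" as NOrec's read concurrency control. As a rough illustration of that idea only, the Java sketch below logs each value a transaction reads and later checks that every logged location still holds the same value. It is single-threaded and omits the global seqlock and everything else NOrec does; `SharedCell`, `read`, and `validate` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shared memory location.
final class SharedCell {
    int value;

    SharedCell(int value) {
        this.value = value;
    }
}

final class ValueValidationSketch {
    // One read-log entry: the location and the value seen at read time.
    private static final class ReadEntry {
        final SharedCell cell;
        final int seen;

        ReadEntry(SharedCell cell, int seen) {
            this.cell = cell;
            this.seen = seen;
        }
    }

    private final List<ReadEntry> readLog = new ArrayList<>();

    // Transactional read: return the value and remember it for later validation.
    int read(SharedCell cell) {
        int v = cell.value;
        readLog.add(new ReadEntry(cell, v));
        return v;
    }

    // Lazy validation: the reads are still consistent if every value is unchanged.
    boolean validate() {
        for (ReadEntry e : readLog) {
            if (e.cell.value != e.seen) {
                return false;
            }
        }
        return true;
    }
}
```

Value-based checks accept a location that was changed and then changed back, whereas version-based validation (the IntelSTM row) would reject it; that is the practical difference between the two read schemes in the table.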

Comparison Of Instrumentation
STM      | Instrumented accesses
LarkTM-O | All accesses
LarkTM-S | All accesses
IntelSTM | All accesses
NOrec    | All transactional accesses
Note on slide: except redundant accesses.

Comparison Of Progress Guarantees
STM      | Progress guarantee
LarkTM-O | Livelock and starvation free
LarkTM-S | Livelock and starvation free
IntelSTM | None
NOrec    | Livelock free

Comparison Of Semantics
STM      | Semantics
LarkTM-O | Strong atomicity
LarkTM-S | Strong atomicity
IntelSTM | Strong atomicity
NOrec    | Single global lock atomicity (SLA)

Implementation
- LarkTM-O, LarkTM-S, IntelSTM (McRT), and NOrec
- Developed in Jikes RVM
- All STMs share features as much as possible (e.g., inlining decisions, redundant barrier analysis, name-mangling)
- Source code publicly available on the Jikes RVM Research Archive

Evaluation Methodology
- TM programs: STAMP benchmarks
- STM comparison: NOrec, IntelSTM, LarkTM-O, LarkTM-S
- Platform: eight 8-core processors (AMD Opteron 6272); four 8-core processors (Intel Xeon E5-4620)

Single-Thread Performance (chart sequence; the only value that survives in the transcript is 73%)

Speedup, Geomean (chart sequence)
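The speedup charts are labelled "Geomean", which presumably means the geometric mean of per-benchmark speedups. A minimal Java sketch of that calculation follows; the class and method names are hypothetical, and the input values in the example are made up for illustration and are not results from the talk.

```java
final class GeomeanSketch {
    // Geometric mean via logarithms, which avoids overflow from a long product.
    static double geomean(double[] speedups) {
        if (speedups.length == 0) {
            throw new IllegalArgumentException("need at least one value");
        }
        double logSum = 0.0;
        for (double s : speedups) {
            if (s <= 0.0) {
                throw new IllegalArgumentException("speedups must be positive");
            }
            logSum += Math.log(s);
        }
        return Math.exp(logSum / speedups.length);
    }
}
```

For instance, hypothetical speedups of 2.0 and 8.0 give a geometric mean of about 4.0, whereas the arithmetic mean would be 5.0; the geometric mean is the usual choice for ratios because one outlier benchmark moves it less.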

Toward Practical STM
- Low instrumentation overhead
- Scales well
- Strong progress guarantees
- Strong semantics
Thank you