Parallelizing Sequential Applications on Commodity Hardware Using a Low-Cost Software Transactional Memory


Parallelizing Sequential Applications on Commodity Hardware Using a Low-Cost Software Transactional Memory
Mojtaba Mehrara, Jeff Hao, Po-Chun Hsu, Scott Mahlke
Advanced Computer Architecture Lab, University of Michigan
Electrical Engineering and Computer Science

Multicore Architectures
Industry-wide move to multicore
  – Higher throughput
  – More power efficient
Great for parallel programs
Sequential programs see little benefit
[Figure: Intel 4-core Nehalem, AMD 4-core Shanghai, Sun Niagara 2, IBM Cell]

Loop Parallelization
Parallelizable loop: no cross-iteration register or memory dependences
[Figure: a loop over i = 0-39 divided between Core 0 and Core 1; one of the two halves is labeled i = 0-19] [Zhong ‘08]
Bad news: limited number of parallel loops in general-purpose applications
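As a rough illustration of the loop split on this slide, the sketch below divides a 40-iteration loop into two chunks and hands each chunk to a worker. The loop body, the array names `a` and `b`, and the second chunk's range (20-39) are assumptions made for the example; they are not taken from the slides. Python threads stand in for cores here, so this shows the structure of the transformation, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor

N = 40
a = list(range(N))   # hypothetical input array
b = [0] * N          # hypothetical output array


def run_chunk(lo, hi):
    # Iteration i reads a[i] and writes b[i] only, so no iteration
    # depends on a value produced by another iteration.
    for i in range(lo, hi):
        b[i] = 2 * a[i] + 1


# Two workers play the role of Core 0 and Core 1.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(run_chunk, 0, 20), pool.submit(run_chunk, 20, 40)]
    for f in futures:
        f.result()  # re-raises any exception from a worker
```

Because the chunks touch disjoint elements, the result is the same as running the loop sequentially, whichever worker finishes first.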

Loop Parallelization
[Figure: SPECfp] [Zhong ‘08]

Speculative Loop Parallelization
Speculatively parallelizable loop: memory address is unresolvable statically
[Figure: a loop over i = 0-39 containing a pointer access is split into loop chunks distributed across Core 0 and Core 1; one chunk is labeled i = 0-9]

Speculative Loop Parallelization
[Figure]

Supporting Thread Level Speculation
Execution of speculative loops requires
  – Conflict detection
  – Rollback mechanism
Speculation can be supported by transactional memory
  – Software is slow
  – Hardware needs complex structures
Previous TLS works require hardware
  – Hydra [Hammond ‘98], Stampede [Steffan ‘98], POSH [Liu ‘06]
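The two requirements named on this slide, conflict detection and rollback, can be sketched in a few lines. The code below is a simplified model, not the system described in the talk: it uses exact read sets and private write buffers, simulates two chunks that start from the same memory snapshot, and treats rollback as "discard the buffer and re-execute". The loop body `memory[idx[i]] += 1` and all function names are hypothetical.

```python
def run_chunk(memory, idx, lo, hi):
    """Speculatively run iterations lo..hi-1 of `memory[idx[i]] += 1`."""
    read_set = set()   # addresses this chunk read
    write_buf = {}     # private buffer: address -> new value
    for i in range(lo, hi):
        addr = idx[i]  # address is only known at run time
        read_set.add(addr)
        current = write_buf.get(addr, memory[addr])
        write_buf[addr] = current + 1
    return read_set, write_buf


def has_conflict(later_read_set, earlier_write_buf):
    # The later chunk read an address that the earlier chunk wrote.
    return any(addr in earlier_write_buf for addr in later_read_set)


def commit(memory, write_buf):
    for addr, value in write_buf.items():
        memory[addr] = value


def run_two_chunks(memory, idx):
    mid = len(idx) // 2
    # Both chunks execute before either commits, as if running in parallel.
    rd0, wr0 = run_chunk(memory, idx, 0, mid)
    rd1, wr1 = run_chunk(memory, idx, mid, len(idx))
    commit(memory, wr0)
    rolled_back = has_conflict(rd1, wr0)
    if rolled_back:
        # Rollback: drop the stale buffer and re-execute on committed state.
        rd1, wr1 = run_chunk(memory, idx, mid, len(idx))
    commit(memory, wr1)
    return rolled_back


mem_a = [0, 0, 0, 0]
rolled_a = run_two_chunks(mem_a, [0, 1, 2, 3])   # chunks touch disjoint addresses
mem_b = [0, 0, 0, 0]
rolled_b = run_two_chunks(mem_b, [0, 1, 1, 2])   # both chunks touch address 1
```

In the second call the later chunk reads address 1, which the earlier chunk wrote, so it is rolled back and re-executed; the final memory matches the sequential result.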

Objectives
Challenge
  – Can we get speedup supporting speculative loop parallelization without additional hardware?
Build a specialized software system
  – Provide functionality needed for speculation with software transactional memory
  – Leverage existing loop parallelization framework from [Zhong ‘08]
  – Tightly couple STM with compiler to ensure low overhead

Traditional STM Execution Flow
[Figure: Start TX -> Execute TX (builds RdSet and WrSet) -> End TX -> TX Commit: Consistency Check -> Abort, or Commit and Writeback WrSet to Memory]
  – High overhead: validating RdSet
  – High overhead: global locking

Ordering Transaction Commit
TMs typically have no way of controlling commit order
Loop iterations must commit in original order
  – Ensures proper rollback
Requires centralized control to enforce ordering
[Figure: TX 1 and TX 3 on Core 0, TX 2 and TX 4 on Core 1, each covering one loop chunk; visible chunk labels are i = 0-9 and i = 20-29]
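One way to picture the centralized ordering point is a small committer that buffers finished transactions and releases them strictly by sequence number. This is an illustrative sketch only; the class name `InOrderCommitter`, the 1-based numbering, and the dictionary used as "memory" are assumptions, not details from the slides.

```python
class InOrderCommitter:
    """Commits finished transactions in their original loop order."""

    def __init__(self):
        self.next_to_commit = 1   # TX 1 must commit first
        self.finished = {}        # sequence number -> pending write buffer
        self.commit_order = []    # record of the order actually committed

    def finish(self, seq, write_buf, memory):
        # A transaction reports completion; it may have to wait its turn.
        self.finished[seq] = write_buf
        # Drain every transaction whose predecessors have all committed.
        while self.next_to_commit in self.finished:
            buf = self.finished.pop(self.next_to_commit)
            memory.update(buf)
            self.commit_order.append(self.next_to_commit)
            self.next_to_commit += 1


memory = {}
committer = InOrderCommitter()
# Transactions finish out of order: TX 3, then TX 1, TX 2, TX 4.
committer.finish(3, {"x": 3}, memory)
pending_after_tx3 = list(committer.commit_order)   # nothing committed yet
committer.finish(1, {"x": 1}, memory)
committer.finish(2, {"x": 2}, memory)
committer.finish(4, {"x": 4}, memory)
```

TX 3 finishes first but is held until TX 1 and TX 2 have committed, so the last write to `x` comes from TX 4, as it would in sequential execution.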

STMlite
Dedicated thread to control commits
  – Called the Transaction Commit Manager (TCM)
  – Performs consistency checks for all transactions
  – Provides a single point at which to enforce in-order commit
Bloom-filter based signatures
  – Hash read and write sets
  – Similar technique used by HTMs like Bulk [Ceze ‘06]
  – Low-cost consistency checks during commit

Bloom-Filter Based Signatures
Constant-time insertion and find
Linear-time intersection (bitwise AND of the bit arrays; bitwise OR gives the union)
[Figure: Address -> Decode -> Signature (bit array)]
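A minimal sketch of such a signature follows, assuming a 64-bit array split into two 32-bit banks with one hash function per bank. The bank layout, the hash functions, and the function names are all hypothetical choices for illustration; the slides do not specify them. Insertion and lookup touch a fixed number of bits, and the intersection test is a bitwise AND over the whole array.

```python
BANK_BITS = 32  # assumed: two banks of 32 bits each, 64 bits in total
BANK_MASK = (1 << BANK_BITS) - 1


def _bit_positions(addr):
    # Two simple, hypothetical hash functions, one per bank.
    h0 = (addr >> 2) % BANK_BITS
    h1 = ((addr * 2654435761) >> 7) % BANK_BITS
    return h0, BANK_BITS + h1


def sig_insert(sig, addr):
    """Return the signature with addr added (sets one bit per bank)."""
    for pos in _bit_positions(addr):
        sig |= 1 << pos
    return sig


def sig_contains(sig, addr):
    """May return a false positive, never a false negative."""
    return all((sig >> pos) & 1 for pos in _bit_positions(addr))


def sig_may_intersect(a, b):
    """Bitwise AND; the sets can share an address only if both banks overlap."""
    both = a & b
    return (both & BANK_MASK) != 0 and (both >> BANK_BITS) != 0


wr_sig = sig_insert(sig_insert(0, 0x1000), 0x1008)   # write set {0x1000, 0x1008}
rd_shared = sig_insert(0, 0x1008)                     # read set {0x1008}
rd_other = sig_insert(0, 0x2004)                      # read set {0x2004}
```

A shared address always sets the same bit in each bank of both signatures, so the test cannot miss a real conflict. It can report a conflict that does not exist, which costs an unnecessary abort but not correctness.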

STMlite Execution Flow
[Figure: the transaction runs Start TX, Execute TX (recording RdSet, WrSet, RdSig, WrSig), End TX, then waits for a ready flag. The Transaction Commit Manager (TCM) performs the consistency check and signals Abort or Commit; on commit the transaction writes back its WrSet to memory.]
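The hand-off between a transaction and the TCM might look roughly like the sketch below. It is a simplified model under several assumptions: signatures are plain integers used as bit masks, the TCM keeps a commit log of (end time, write signature) pairs, a transaction is consistent if its read signature does not overlap any write signature committed after it started, and the ready flag is a `threading.Event`. None of these names or data layouts come from the slides, and in-order commit and the write-back step are left out to keep the example short.

```python
import queue
import threading


class Transaction:
    def __init__(self, name, start_time, rd_sig, wr_sig):
        self.name = name
        self.start_time = start_time      # logical time when the TX began
        self.rd_sig = rd_sig              # read signature (bit mask)
        self.wr_sig = wr_sig              # write signature (bit mask)
        self.ready = threading.Event()    # the "ready flag" set by the TCM
        self.committed = False


def commit_manager(requests, count):
    """Hypothetical TCM loop: one consistency check per finished transaction."""
    clock = 0
    commit_log = []  # (end_time, wr_sig) of committed transactions
    for _ in range(count):
        tx = requests.get()
        consistent = all(
            (tx.rd_sig & wr_sig) == 0
            for end_time, wr_sig in commit_log
            if end_time > tx.start_time   # only commits the TX could have missed
        )
        if consistent:
            clock += 1
            commit_log.append((clock, tx.wr_sig))
        tx.committed = consistent
        tx.ready.set()                    # wake the waiting transaction


def run_demo():
    txs = [
        Transaction("A", 0, 0b0001, 0b0010),
        Transaction("B", 0, 0b0010, 0b0100),  # reads what A writes
        Transaction("C", 1, 0b0010, 0b1000),  # starts after A committed
    ]
    requests = queue.Queue()
    tcm = threading.Thread(target=commit_manager, args=(requests, len(txs)))
    tcm.start()
    for tx in txs:
        requests.put(tx)                  # End TX: hand signatures to the TCM
    for tx in txs:
        tx.ready.wait()                   # Wait for Ready Flag
    tcm.join()
    return [(tx.name, tx.committed) for tx in txs]


results = run_demo()
```

Transaction B is aborted because A committed a write to a location B read while both were running. Transaction C reads the same location but started after A's commit, so it saw the new value and passes the check.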

Experimental Setup
Implemented framework in LLVM compiler
Benchmarks
  – Stanford STAMP transactional benchmarks
  – SPECfp benchmarks
Run on Sunfire T2000
  – 8-core UltraSPARC T1 processor
Baseline STM is Sun’s TL2 [Dice ‘06]

STAMP Benchmarks
[Figure]

SPECfp Benchmarks
[Figure]

Conclusion
STMlite
  – Customized for speculative loop parallelization
  – Transaction commit ordering
  – Centralized consistency checks
  – Hashing read/write sets with signatures
Parallelization of sequential applications is feasible on commodity hardware
  – Removes much of the slowdown traditionally associated with STM

Thank You! Questions?

Transaction Execution and Commit
Stale entries periodically removed from commit log
[Figure: timeline of a transaction and the Transaction Commit Manager (TCM). Transaction states: Executing, Waiting, Writeback. TCM states: Waiting, Ready, Checking, Consistent?, Commit. Data labels: RdSig, WrSig, Start, End; Commit Log entries with WrSig and End]