University of Michigan Electrical Engineering and Computer Science 1 Parallelizing Sequential Applications on Commodity Hardware Using a Low-Cost Software.


1 Parallelizing Sequential Applications on Commodity Hardware Using a Low-Cost Software Transactional Memory
Mojtaba Mehrara, Jeff Hao, Po-Chun Hsu, Scott Mahlke
Advanced Computer Architecture Lab
University of Michigan, Electrical Engineering and Computer Science

2 Multicore Architectures
Industry-wide move to multicore
–Higher throughput
–More power efficient
Great for parallel programs
Sequential programs see little benefit
[Figure: Intel 4-core Nehalem, AMD 4-core Shanghai, Sun Niagara 2, IBM Cell]

3 Loop Parallelization [Zhong ‘08]
Parallelizable loop: no cross-iteration register or memory dependences
[Figure: a loop with i = 0-39 split into i = 0-19 and i = 20-39 across Core 0 and Core 1]
Bad news: limited number of parallel loops in general-purpose applications
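The split shown on this slide can be sketched as follows. This is a minimal illustration, not the framework from [Zhong ‘08]: the helper names `split_iterations`, `run_chunk`, and `parallel_double` are hypothetical, and the loop body (doubling each element) is an assumed example of a loop with no cross-iteration dependences. In CPython the threads mainly illustrate the structure, since the interpreter lock limits real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def split_iterations(n_iters, n_cores):
    """Split range(n_iters) into n_cores contiguous chunks of near-equal size."""
    base, extra = divmod(n_iters, n_cores)
    chunks = []
    start = 0
    for core in range(n_cores):
        size = base + (1 if core < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks

def run_chunk(chunk, src, dst):
    # Each iteration touches only element i, so chunks are independent.
    for i in chunk:
        dst[i] = 2 * src[i]

def parallel_double(src, n_cores=2):
    """Run the (assumed) loop body over all chunks, one worker per chunk."""
    dst = [0] * len(src)
    with ThreadPoolExecutor(max_workers=n_cores) as pool:
        futures = [pool.submit(run_chunk, c, src, dst)
                   for c in split_iterations(len(src), n_cores)]
        for f in futures:
            f.result()  # propagate any exception from a worker
    return dst
```

With 40 iterations and 2 workers, `split_iterations(40, 2)` yields the ranges 0-19 and 20-39 from the slide.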

4 Loop Parallelization
[Figure: SPECfp, from [Zhong ‘08]]

5 Speculative Loop Parallelization
Speculatively parallelizable loop: memory address is unresolvable statically
[Figure: a loop with i = 0-39 containing a pointer access, split into loop chunks i = 0-9, i = 10-19, i = 20-29, and i = 30-39 across Core 0 and Core 1]
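To make "unresolvable statically" concrete, consider a hypothetical loop body `a[idx[i]] += 1`, where the index array `idx` is only known at run time. The sketch below (the function name and the loop body are assumptions for illustration, not taken from the talk) checks after the fact whether a later chunk touches an element that an earlier chunk writes, which is exactly the case speculation has to catch.

```python
def chunk_conflicts(idx, earlier_chunk, later_chunk):
    """Return True if later_chunk accesses an element that earlier_chunk writes.

    Models the assumed loop body  a[idx[i]] += 1,  in which iteration i
    both reads and writes a[idx[i]].
    """
    written = {idx[i] for i in earlier_chunk}
    return any(idx[i] in written for i in later_chunk)

# Identity indexing: every iteration touches its own element, so chunks are independent.
independent = list(range(40))

# One aliasing index: iteration 15 touches element 3, which chunk 0-9 also writes.
aliased = list(range(40))
aliased[15] = 3
```

A compiler cannot tell these two inputs apart, so it can only run the chunks in parallel speculatively and rely on run-time conflict detection.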

6 Speculative Loop Parallelization

7 Supporting Thread-Level Speculation
Execution of speculative loops requires
–Conflict detection
–Rollback mechanism
Speculation can be supported by transactional memory
–Software is slow
–Hardware needs complex structures
Previous TLS work requires hardware
–Hydra [Hammond ‘98], Stampede [Steffan ‘98], POSH [Liu ‘06]

8 Objectives
Challenge
–Can we get speedup from speculative loop parallelization without additional hardware?
Build a specialized software system
–Provide the functionality needed for speculation with software transactional memory
–Leverage the existing loop parallelization framework from [Zhong ‘08]
–Tightly couple the STM with the compiler to ensure low overhead

9 Traditional STM Execution Flow
[Figure: Start TX, Execute TX (building RdSet and WrSet), End TX, then TX Commit: consistency check, followed by either abort or commit with writeback of WrSet to memory]
High overhead: validating RdSet
High overhead: global locking
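The flow on this slide can be sketched roughly as below. This is a deliberately simplified model under stated assumptions, not TL2 or any particular STM: the `Memory` and `Transaction` classes, per-address version counters, and a single global commit lock are all hypothetical choices made to show where the two overheads come from (walking the whole RdSet at commit, and serializing commits on one lock).

```python
import threading

class Memory:
    """Shared memory with a per-address version counter (an assumed design)."""
    def __init__(self):
        self.values = {}
        self.versions = {}
        self.lock = threading.Lock()  # one global lock guarding every commit

class Transaction:
    def __init__(self, mem):
        self.mem = mem
        self.rd_set = {}  # addr -> version observed at first read
        self.wr_set = {}  # addr -> buffered value, invisible until commit

    def read(self, addr):
        if addr in self.wr_set:          # read our own buffered write
            return self.wr_set[addr]
        if addr not in self.rd_set:
            self.rd_set[addr] = self.mem.versions.get(addr, 0)
        return self.mem.values.get(addr, 0)

    def write(self, addr, value):
        self.wr_set[addr] = value

    def commit(self):
        """Validate RdSet, then write back WrSet. Returns False on abort."""
        with self.mem.lock:
            # Consistency check: every address read must be unchanged.
            for addr, seen in self.rd_set.items():
                if self.mem.versions.get(addr, 0) != seen:
                    return False  # abort: buffered writes are simply dropped
            # Writeback: publish buffered writes and bump their versions.
            for addr, value in self.wr_set.items():
                self.mem.values[addr] = value
                self.mem.versions[addr] = self.mem.versions.get(addr, 0) + 1
            return True
```

Because writes are buffered, rollback in this sketch costs nothing beyond discarding the transaction object; the expense is in the per-entry validation loop.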

10 Ordering Transaction Commit
TMs typically have no way of controlling commit order
Loop iterations must commit in original order
–Ensures proper rollback
Requires centralized control to enforce ordering
[Figure: TX 1 and TX 3 on Core 0, TX 2 and TX 4 on Core 1, covering loop chunks i = 0-9, i = 10-19, i = 20-29, and i = 30-39]
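One way to picture the centralized ordering is a small sequencer that holds finished transactions until every earlier one has committed. The class below is a hypothetical sketch (the name `InOrderCommitter` and its interface are assumptions), with transactions numbered from 1 as in the figure.

```python
class InOrderCommitter:
    """Commits transactions strictly in ID order, whatever order they finish in."""
    def __init__(self, first_id=1):
        self.next_id = first_id
        self.ready = {}       # tx_id -> payload, finished but not yet committed
        self.committed = []   # tx_ids in the order they actually committed

    def mark_ready(self, tx_id, payload=None):
        self.ready[tx_id] = payload
        # Drain every transaction whose predecessors have all committed.
        while self.next_id in self.ready:
            self.ready.pop(self.next_id)
            self.committed.append(self.next_id)
            self.next_id += 1

committer = InOrderCommitter()
for tx_id in (2, 1, 4, 3):   # an assumed out-of-order finishing sequence
    committer.mark_ready(tx_id)
```

If TX 2 finishes before TX 1, it simply waits in `ready`; once TX 1 arrives, both commit back to back, so a rollback of chunk k never has to undo effects of a later chunk.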

11 STMlite
Dedicated thread to control commits
–Called the Transaction Commit Manager (TCM)
–Performs consistency checks for all transactions
–Provides a single point at which to enforce in-order commit
Bloom-filter-based signatures
–Hash read and write sets
–Similar technique used by HTMs like Bulk [Ceze ‘06]
–Low-cost consistency checks during commit

12 Bloom-Filter-Based Signatures
Constant-time insertion and find
Linear-time intersection (bitwise AND)
[Figure: address bits (101010, 101100) are decoded to set bits in the signature, a bit array]
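A signature of this kind can be sketched in a few lines. The version below is an illustration under assumptions: the class name `Signature`, the 64-bit width, and the single hash that decodes the low-order address bits are all hypothetical choices, not STMlite's actual configuration. Intersection is taken as a bitwise AND of the two bit arrays (a bitwise OR would give the union instead), and its cost grows linearly with the signature width.

```python
class Signature:
    """Single-hash Bloom-filter signature stored as a Python integer bit array."""
    def __init__(self, n_bits=64):
        self.n_bits = n_bits
        self.bits = 0

    def _index(self, addr):
        # Assumed hash: decode the low-order bits of the address.
        return addr % self.n_bits

    def insert(self, addr):
        """Constant time: set one bit."""
        self.bits |= 1 << self._index(addr)

    def may_contain(self, addr):
        """Constant time: test one bit. May report false positives, never false negatives."""
        return ((self.bits >> self._index(addr)) & 1) == 1

    def intersects(self, other):
        """Bitwise AND over the whole bit array; nonzero means a possible conflict."""
        return (self.bits & other.bits) != 0

rd_sig = Signature()
rd_sig.insert(0b101010)
rd_sig.insert(0b101100)
```

A nonzero intersection only means the two sets might overlap; aliasing can cause spurious aborts, but an empty intersection guarantees the sets are disjoint.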

13 STMlite Execution Flow
[Figure: the transaction runs Start TX, Execute TX (building RdSet, WrSet, RdSig, and WrSig), End TX, then waits for a ready flag; the Transaction Commit Manager (TCM) performs the consistency check and signals abort or commit; on commit, the transaction writes back WrSet to memory]

14 Experimental Setup
Implemented framework in the LLVM compiler
Benchmarks
–Stanford STAMP transactional benchmarks
–SPECfp benchmarks
Run on Sun Fire T2000
–8-core UltraSPARC T1 processor
Baseline STM is Sun’s TL2 [Dice ‘06]

15 STAMP Benchmarks

16 SPECfp Benchmarks

17 Conclusion
STMlite
–Customized for speculative loop parallelization
–Transaction commit ordering
–Centralized consistency checks
–Hashing read/write sets with signatures
Parallelization of sequential applications is feasible on commodity hardware
–Removes much of the slowdown traditionally associated with STM

18 Thank You! Questions?

19 Transaction Execution and Commit
Stale entries periodically removed from commit log
[Figure: timelines of a transaction (Executing, Waiting, Writeback) and the Transaction Commit Manager (Waiting, Checking, Commit); labels include RdSig, WrSig, Start, End, Ready, Commit Log, and a "Consistent?" decision]
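The interaction between a transaction, the TCM, and the commit log might look roughly like the sketch below. Everything here is an assumption made for illustration: the class name `CommitManager`, the use of a logical clock for Start and End, the rule that a transaction conflicts when its RdSig intersects the WrSig of any transaction that committed after it started, and the pruning criterion. Signatures are plain integers used as bit arrays to keep the example self-contained.

```python
class CommitManager:
    """Hypothetical TCM: checks read signatures against a log of committed writes."""
    def __init__(self):
        self.clock = 0   # logical time, advanced on every commit
        self.log = []    # entries of (commit_time, wr_sig)

    def start(self):
        """A transaction records the logical time at which it starts."""
        return self.clock

    def try_commit(self, start_time, rd_sig, wr_sig):
        """Return True and log wr_sig if consistent, else False (abort)."""
        for commit_time, logged_wr in self.log:
            # Only writes committed after this transaction started can conflict.
            if commit_time > start_time and (rd_sig & logged_wr) != 0:
                return False
        self.clock += 1
        self.log.append((self.clock, wr_sig))
        return True

    def prune(self, oldest_active_start):
        """Drop stale entries that no active transaction can conflict with."""
        self.log = [(t, w) for (t, w) in self.log if t > oldest_active_start]

tcm = CommitManager()
s1 = tcm.start()
s2 = tcm.start()
ok1 = tcm.try_commit(s1, rd_sig=0b0001, wr_sig=0b1000)   # TX 1 writes bit 3
ok2 = tcm.try_commit(s2, rd_sig=0b1000, wr_sig=0b0010)   # TX 2 read bit 3: conflict
s2 = tcm.start()                                          # TX 2 restarts
ok2_retry = tcm.try_commit(s2, rd_sig=0b1000, wr_sig=0b0010)
```

Checking only read-after-write conflicts is a simplification that leans on buffered writes and in-order commit; a real system may need more than this.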