Published by Douglas Potter. Modified over 9 years ago.
1
Are New Languages Necessary for Manycore?
David I. August
Department of Computer Science, Princeton University
2
THIS is the Problem!
[Figure: SPEC CPU integer performance plotted over time; the trend after 2004 is marked "?"]
3
Why New Multicore Languages Will Fail
- Money is earned by relieving customer pain
- The Market
- Legacy, Legacy, Legacy
- Programmers adopt new programming models
- Parallel programming is more difficult
- Parallel programming models have longevity issues
- Automatic Thread Extraction (ATE)
4
Automatic Thread Extraction
- “That isn't to say we are parallelizing arbitrary C code, that's a fool's errand!” – Richard Lethin
- “Compiler can’t determine a tree from a graph…” – Burton Smith
- “Compiler can’t determine dependences without type information. Even then…” – Burton Smith
- “Decades of automatic parallelization work has been a failure…” – James Larus
- “All that icky pointer chasing code...” – Tim Mattson
5
How To Get Parallelism For Multicore?
Nine months ago, with an open mind…
- A priori select ALL C programs from SPEC CINT 2000
- Our objective function (in priority order):
  1. Extract meaningful parallelism
  2. Prefer automatic over manual
  3. Minimize impact to the programmer when manual
6
Our Results

Benchmark     Threads at Peak   Speedup   LOCs Changed
164.gzip      32+               29.91     26
175.vpr       15                3.59      1
176.gcc       16                5.06      17
181.mcf       32+               2.84      0
186.crafty    32+               25.18     9
197.parser    32+               24.50     2
253.perlbmk   5                 1.21      0
254.gap       10                1.94      1
255.vortex    32+               4.92      0
256.bzip2     12                6.72      0
300.twolf     8                 2.06      1
GEOMEAN       17                5.54
ARITHMEAN     20                9.81

M.L.O.P.: 5 Generations, 32 Cores, 5.3x Speedup
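As a quick check on the summary rows, the two means of the Speedup column can be recomputed from the per-benchmark values on this slide. This is only a sketch for reproducing the arithmetic; the names `SPEEDUPS`, `arithmetic_mean`, and `geometric_mean` are illustrative and do not come from the talk.

```python
import math

# Per-benchmark speedups copied from the "Our Results" slide.
SPEEDUPS = {
    "164.gzip": 29.91,
    "175.vpr": 3.59,
    "176.gcc": 5.06,
    "181.mcf": 2.84,
    "186.crafty": 25.18,
    "197.parser": 24.50,
    "253.perlbmk": 1.21,
    "254.gap": 1.94,
    "255.vortex": 4.92,
    "256.bzip2": 6.72,
    "300.twolf": 2.06,
}


def arithmetic_mean(values):
    """Plain average: sum divided by count."""
    values = list(values)
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product, computed via logs for numerical stability."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    print(f"GEOMEAN   {geometric_mean(SPEEDUPS.values()):.2f}")
    print(f"ARITHMEAN {arithmetic_mean(SPEEDUPS.values()):.2f}")
```

The geometric mean is the less flattering of the two because it damps the effect of the three benchmarks with speedups above 24x.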
7
Our Recipe
Recent Compiler Technology:
  - Decoupled Software Pipelining (DSWP) [MICRO 05]
  - Parallel-Stage DSWP (PS-DSWP)
  - Speculative DSWP (Spec-DSWP) [PACT 07]
Existing Technology:
  - Speculative DOALL, TLS
  - Targeted Memory Profiling
  - Procedure Boundary Elimination [PLDI 06]
Hardware Support:
  - Compiler-Controlled Speculation
  - Streaming Communication [MICRO 06]
8
Typical Example: 197.parser
Threads run on multicore model with Itanium 2 cores.
[Figure: three pipeline stages: Find English Sentences, Parse Sentences (95%), Emit Results. Labels: DSWP; PS-DSWP (Spec DOALL Middle Stage)]
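The shape of this pipeline can be sketched by hand with threads and queues: one thread finds sentences, several replicated workers parse them, and one thread emits results in the original order. This is a hand-written illustration of the stage structure only, not what the compiler generates. It assumes sentences are independent, so no speculation is modeled, and the "parse" step is a word-count stand-in. All function names are hypothetical, and Python threads will not show a real speedup here.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a stream


def find_sentences(text, out_q, n_workers):
    """Stage 1 (sequential): split text into numbered sentences."""
    sentences = (s.strip() for s in text.split(".") if s.strip())
    for idx, sentence in enumerate(sentences):
        out_q.put((idx, sentence))
    for _ in range(n_workers):  # one sentinel per parallel worker
        out_q.put(DONE)


def parse_worker(in_q, out_q):
    """Stage 2 (replicated): 'parse' each sentence independently."""
    while True:
        item = in_q.get()
        if item is DONE:
            out_q.put(DONE)
            return
        idx, sentence = item
        out_q.put((idx, len(sentence.split())))  # stand-in for real parsing


def emit_results(in_q, n_workers, results):
    """Stage 3 (sequential): restore the original order before emitting."""
    pending, next_idx, finished = {}, 0, 0
    while finished < n_workers:
        item = in_q.get()
        if item is DONE:
            finished += 1
            continue
        idx, value = item
        pending[idx] = value
        while next_idx in pending:  # emit every result that is now in order
            results.append(pending.pop(next_idx))
            next_idx += 1


def run_pipeline(text, n_workers=4):
    to_parse, to_emit, results = queue.Queue(), queue.Queue(), []
    threads = [threading.Thread(target=find_sentences,
                                args=(text, to_parse, n_workers))]
    threads += [threading.Thread(target=parse_worker, args=(to_parse, to_emit))
                for _ in range(n_workers)]
    threads.append(threading.Thread(target=emit_results,
                                    args=(to_emit, n_workers, results)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Data flows in one direction only, from stage 1 to stage 3, so the first and last stages never wait on each other directly; the reorder buffer in `emit_results` is what lets the middle stage finish sentences out of order.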
9
What We Learned
1. A new way of thinking about dependences: Go With the Flow
2. TLP is easier to extract than ILP
3. A holistic approach is better
4. A limitation exists in the sequential model: Determinism
10
Determinism: A Double Edged Sword

    while( ):
        x = Rand()

    int Rand():
        state = f2(state)
        return f1(state)

[Figure: iterations 1 to 4 scheduled under DOALL and under SEQUENTIAL execution]

- 56 LOCs in 11 programs: 22 annotations
- Only 2 programs needed more
- Most common culprit: Custom Allocators
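The dependence in this example can be made concrete with a small sketch. Each call to `Rand()` reads and writes `state`, so a strictly deterministic loop must run its iterations one after another. If the programmer states that the calls may happen in any order, as long as each call is atomic, the iterations can run concurrently: the same values are drawn, but possibly by different iterations. The bodies chosen for `f1` and `f2` below (a linear congruential step and a shift) are assumptions for illustration, as are the names `sequential_loop` and `reordered_loop`; the lock stands in for whatever mechanism keeps each call atomic.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Rand:
    """Stateful generator with the shape shown on the slide."""

    def __init__(self, seed=1):
        self.state = seed
        self._lock = threading.Lock()

    def __call__(self):
        # The lock makes each call atomic, so calls may be reordered
        # without corrupting the shared state.
        with self._lock:
            self.state = (1103515245 * self.state + 12345) % 2**31  # f2 (assumed)
            return self.state >> 16                                 # f1 (assumed)


def sequential_loop(n, seed=1):
    """Deterministic order: iteration i always receives the i-th value."""
    rand = Rand(seed)
    return [rand() for _ in range(n)]


def reordered_loop(n, seed=1, workers=4):
    """Iterations run concurrently; which iteration gets which value may vary."""
    rand = Rand(seed)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda _: rand(), range(n)))


if __name__ == "__main__":
    seq = sequential_loop(1000)
    par = reordered_loop(1000)
    # Same values overall, even if individual iterations saw different ones.
    print(sorted(seq) == sorted(par))
```

Whether the reordered result is acceptable depends on the program: for a random number generator or a custom allocator it usually is, which is the kind of non-determinism a short annotation can grant.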
11
What about Manycore?
Multicore:
  - New languages aren’t necessary
  - Legacy code easily adjusted
Manycore:
  - Implicitly Parallel Sequential Programming
  - No optimization for sequential (custom allocators)
  - Points of non-determinism specified
  - Parallel algorithms in sequential codes
  - Debuggability, Understandability, Sanity
12
The Answer Originates with ATE
The Old Way: PL folks would write languages, Architecture folks would make HW, and Compiler folks would dutifully connect the two.
This will fail for Manycore:
  - Unduly burden the programmer
  - Performance will suffer
There’s a New Way…
13
DO NOT POST ANYTHING AFTER THIS SLIDE
14
How Code Was Transformed

Benchmark     LOC (All)   LOC (Model)   Model Techniques   Compiler Techniques Applied
164.gzip      26          2             Y-Branch           TLS Memory, DSWP
175.vpr       1           1             PURE               Alias, Value, & Control Spec, TLS Mem, DSWP
176.gcc       17          7             PURE               Alias & Control Spec, TLS MEM, DSWP
181.mcf       0           0             -                  Alias, Silent Store, & Control Spec, TLS Mem, DSWP, Nested
186.crafty    9           9             PURE               TLS Mem, DSWP, Nested
197.parser    2           2             PURE               TLS Mem, DSWP
253.perlbmk   0           0             -                  Alias, Control, & Value Spec, DSWP
254.gap       1           1             PURE               TLS Memory, DSWP, Alias Spec
255.vortex    0           0             -                  Alias & Value Spec, TLS Mem, DSWP
256.bzip2     0           0             -                  TLS Memory, DSWP
300.twolf     1           1             PURE               Alias & Control Spec, TLS Mem, DSWP
15
PURE
16
Y-Branch
17
SPEC 2006: 403.gcc
Threads run on multicore model with Itanium 2 cores.