Published by Christine Barton. Modified over 8 years ago.
1
Dynamic Region Selection for Thread Level Speculation
Presented by: Jeff Da Silva, Stanley Fung, Martin Labrecque
Feb 6, 2004
Builds on research done by: Chris Colohan from CMU, Greg Steffan
2
Dynamic Region Selection for Thread Level Speculation, University of Toronto
Multithreading on a Chip is here TODAY!
Chip Multiprocessor (IBM Power4/5, Sun MAJC, UltraSPARC IV)
Simultaneous Multithreading (Alpha 21464, Intel Xeon, Pentium 4)
but what can we do with them?
[Figure: threads of execution, from supercomputers to desktops; processor and cache blocks for each design]
3
Improving Performance with a Chip Multiprocessor
With a bunch of independent applications: improves throughput (total work per second)
[Figure: applications mapped onto processors and caches; axis: execution time]
4
Improving Performance with a Chip Multiprocessor
With a single application: need parallel threads to reduce execution time
[Figure: one application spread across processors and caches; axis: execution time]
5
Thread-Level Speculation: the Basic Idea
Exploit available thread-level parallelism
[Figure: TLS execution timeline (Exec. Time) with accesses through *p and *q in different threads; a violation on *q triggers recovery]
6
Support for TLS: What Do We Need?
Break programs into speculative threads
»to maximize thread-level parallelism
Track data dependences
»to determine whether speculation was safe
Recover from failed speculation
»to ensure correct execution
Three key elements of every TLS system
7
Support for TLS: What Do We Need?
Lots of research has been done on TLS hardware
»Tracking data dependences
»Recovering from violations
We focus on how to select regions to run in parallel
»A region is any segment of code that you want to speculatively parallelize
»For this work, region == loop, iterations == speculative threads
8
Why is static region selection hard?
Requires extensive profiling information
Regions can be nested
for ( i = 1 to N ) {            <= 2x faster in parallel
    ….
    for ( j = 1 to N ) {        <= 3x faster in parallel
        ….
        for ( k = 1 to N ) {    <= 4x faster in parallel
            ….
        }
    }
}
Which loop should we parallelize?
Dynamic behaviour
Dynamic Region Selection is a potential solution
9
Dynamic Region Selection
Compiler transforms all candidate regions into parallel and sequential versions
Through dynamic profiling, we decide which regions are to be run in parallel
Key Questions:
»Is there any dynamic behaviour between region instances?
»What is a good algorithm for selecting regions?
»Are there performance trade-offs for doing dynamic profiling?
»Is there any dynamic behaviour within region instances? (not the focus of this research)
10
Outline
The role of the TLS compiler
Characterizing dynamic behaviour
Dynamic Region Selection (DRS) algorithms
Results
Conclusions
Open questions and future work
11
Current Compilation for TLS
[Figure: the same loop nest shown in two columns labelled Sequential and Parallel: LoopA LoopB EndB LoopC LoopD EndD EndC EndA LoopE LoopF EndF EndE LoopG LoopH EndH]
12
DRS Compilation
[Figure: the loop nest shown twice: LoopA LoopB EndB LoopC LoopD EndD EndC EndA LoopE LoopF EndF EndE LoopG LoopH EndH]
13
DRS Compilation
1 Extract candidate region
[Figure: region E]
14
DRS Compilation
1 Extract candidate region
2 Create sequential and parallel versions of the region (Clone)
[Figure: two copies of region E]
15
DRS Compilation
1 Extract candidate region
2 Create sequential and parallel versions of the region (Clone)
3 Add some extra overhead to monitor the region’s performance
[Figure: two copies of region E]
16
DRS Compilation
1 Extract candidate region
2 Create sequential and parallel versions of the region (Clone)
3 Add some extra overhead to monitor the region’s performance
4 Introduce a DRS algorithm to make the decision at runtime
[Figure: the DRS Algorithm choosing between two copies of region E]
17
DRS Compilation
1 Extract candidate region
2 Create sequential and parallel versions of the region (Clone)
3 Add some extra overhead to monitor the region’s performance
4 Introduce a DRS algorithm to make the decision at runtime
[Figure: the DRS Algorithm choosing between two copies of region E]
DRS Compilation by Colohan
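The four compilation steps can be pictured as a small wrapper placed where the original region used to be. The following is a minimal sketch in C, not the compiler's actual output: the names `region`, `run_region`, `drs_policy` and the `demo_*` stand-ins are hypothetical, and each cloned version is assumed to return the time that its monitoring code measured.

```c
#include <stddef.h>  /* NULL, for callers that pass no argument */

typedef enum { RUN_SEQUENTIAL, RUN_PARALLEL } drs_choice;

typedef struct region region;
typedef long (*region_fn)(void *arg);               /* runs a clone, returns measured time */
typedef drs_choice (*drs_policy)(const region *r);  /* step 4: runtime decision */

struct region {
    region_fn  sequential;  /* step 2: sequential clone */
    region_fn  parallel;    /* step 2: speculative (TLS) clone */
    drs_policy decide;      /* step 4: pluggable DRS algorithm */
    long       seq_time;    /* step 3: last measured time, -1 if never run */
    long       par_time;    /* step 3: last measured time, -1 if never run */
};

/* Entry point that stands in for the extracted region (step 1). */
long run_region(region *r, void *arg) {
    long t;
    if (r->decide(r) == RUN_PARALLEL) {
        t = r->parallel(arg);
        r->par_time = t;
    } else {
        t = r->sequential(arg);
        r->seq_time = t;
    }
    return t;
}

/* Hypothetical stand-ins so the sketch can run without a TLS simulator. */
long demo_sequential(void *arg) { (void)arg; return 5; }
long demo_parallel(void *arg)   { (void)arg; return 3; }

/* Placeholder policy: run sequentially once, then always try parallel. */
drs_choice demo_policy(const region *r) {
    return r->seq_time < 0 ? RUN_SEQUENTIAL : RUN_PARALLEL;
}
```

Keeping the policy behind a function pointer is only one possible design; it lets each DRS algorithm be swapped in without touching the cloned region bodies.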
18
Characterizing TLS Region Behaviour
19
Characterizing TLS Region Behaviour
20
DRS Algorithms
1) Sample Twice
2) Continuous Monitoring
3) Continuous Resample
4) Path Sensitive Sampling
21
Sample Twice Algorithm
Effective if behaviour is constant.
When a region is encountered:
»1st time: run the sequential version and record execution time t1
»2nd time: run the parallel version (if possible) and record execution time tp
»Subsequent instances: if tp < t1 then run the parallel version, else run the sequential version
Note: using execution time as the metric assumes that the amount of work done remains relatively constant from instance to instance. Using throughput (IPC) as the metric eliminates the need for this assumption but adds complexity.
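The sample twice rule can be sketched as a three-phase state machine per region. This is an illustrative C sketch, not the presenters' implementation: the names `sample_twice`, `st_choose` and `st_record` are hypothetical, and the "if possible" case is modelled by the caller reporting which version actually ran, so a region stays undecided until a parallel sample has really been taken.

```c
#include <assert.h>

typedef enum { VERSION_SEQUENTIAL, VERSION_PARALLEL } version;
typedef enum { ST_SAMPLE_SEQ, ST_SAMPLE_PAR, ST_DECIDED } st_phase;

typedef struct {
    st_phase phase;
    long t1;  /* sequential execution time, -1 until sampled */
    long tp;  /* parallel execution time, -1 until sampled */
} sample_twice;

void st_init(sample_twice *s) {
    s->phase = ST_SAMPLE_SEQ;
    s->t1 = -1;
    s->tp = -1;
}

/* Which version should this instance of the region run? */
version st_choose(const sample_twice *s) {
    switch (s->phase) {
    case ST_SAMPLE_SEQ: return VERSION_SEQUENTIAL;
    case ST_SAMPLE_PAR: return VERSION_PARALLEL;
    default:            return s->tp < s->t1 ? VERSION_PARALLEL
                                             : VERSION_SEQUENTIAL;
    }
}

/* Report which version actually ran and how long it took. */
void st_record(sample_twice *s, version ran, long time) {
    assert(time >= 0);
    if (s->phase == ST_SAMPLE_SEQ && ran == VERSION_SEQUENTIAL) {
        s->t1 = time;
        s->phase = ST_SAMPLE_PAR;
    } else if (s->phase == ST_SAMPLE_PAR && ran == VERSION_PARALLEL) {
        s->tp = time;
        s->phase = ST_DECIDED;
    }
    /* Otherwise nothing changes: either the decision is already final,
       or the parallel sample could not be taken this time. */
}
```

Once the phase reaches `ST_DECIDED`, later measurements are ignored, which is why this scheme only suits regions whose behaviour stays constant.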
22
Sample Twice Example
[Figure: region states Sample Sequential?, Sample Parallel?, Decided]
23
Continuous Monitoring
Extension to the sample twice method.
Continuously monitor all regions and re-evaluate the decision if the speedup changes.
»Little is done beyond the monitoring itself, so the added overhead is essentially free.
When a region is encountered:
»1st time: run the sequential version and record execution time t1
»2nd time: run the parallel version (if possible) and record execution time tp
»Subsequent instances: if tp < t1 then run the parallel version and update tp, else run the sequential version and update t1
Effective if behaviour is continuously degrading.
24
Continuous Monitoring Example
[Figure: region states Sample Sequential?, Sample Parallel?, Decided]
Recorded times as successive instances run:
t1 = NA, tp = NA
t1 = 5, tp = NA
t1 = 5, tp = 3
t1 = 5, tp = 4
t1 = 5, tp = 6
t1 = 4, tp = 6
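Continuous monitoring differs from sample twice only in that every measurement overwrites the stored time for the version that ran. The C sketch below is an illustration under assumed names (`cm_state`, `cm_choose`, `cm_record`, with `CM_UNKNOWN` standing for NA); feeding it the measured times 5, 3, 4, 6, 4 reproduces the t1/tp values in the example above.

```c
#include <assert.h>

#define CM_UNKNOWN (-1L)  /* stands for "NA" in the example trace */

typedef enum { CM_SEQUENTIAL, CM_PARALLEL } cm_version;

typedef struct {
    long t1;  /* most recent sequential time */
    long tp;  /* most recent parallel time */
} cm_state;

void cm_init(cm_state *s) {
    s->t1 = CM_UNKNOWN;
    s->tp = CM_UNKNOWN;
}

/* Pick the version for the next instance of the region. */
cm_version cm_choose(const cm_state *s) {
    if (s->t1 == CM_UNKNOWN) return CM_SEQUENTIAL;  /* 1st time */
    if (s->tp == CM_UNKNOWN) return CM_PARALLEL;    /* 2nd time */
    return s->tp < s->t1 ? CM_PARALLEL : CM_SEQUENTIAL;
}

/* Every instance updates the time of whichever version ran. */
void cm_record(cm_state *s, cm_version ran, long time) {
    assert(time >= 0);
    if (ran == CM_PARALLEL)
        s->tp = time;
    else
        s->t1 = time;
}
```

In the trace, the parallel version degrades from 3 to 6, so the fifth instance falls back to the sequential version, whose time is then refreshed to 4.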
25
Continuous Resample
Effective if behaviour is continuously changing.
Continuously resample by periodically flushing the values t1 and tp.
Adds new overhead.
This algorithm has not yet been explored.
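Since this algorithm was not explored, the following C sketch is only one possible reading of it. It assumes the flush is layered on top of continuous monitoring and happens every `period` region instances; both of these, and all names (`cr_state`, `cr_choose`, `cr_record`), are assumptions rather than details from the slides.

```c
#include <assert.h>

#define CR_UNKNOWN (-1L)

typedef enum { CR_SEQUENTIAL, CR_PARALLEL } cr_version;

typedef struct {
    long t1;               /* most recent sequential time */
    long tp;               /* most recent parallel time */
    unsigned period;       /* assumed: instances between flushes */
    unsigned since_flush;  /* instances recorded since the last flush */
} cr_state;

void cr_init(cr_state *s, unsigned period) {
    assert(period >= 2);  /* need room for one sample of each version */
    s->t1 = CR_UNKNOWN;
    s->tp = CR_UNKNOWN;
    s->period = period;
    s->since_flush = 0;
}

/* Pick the next version; flushes t1 and tp when the period has elapsed. */
cr_version cr_choose(cr_state *s) {
    if (s->since_flush >= s->period) {
        s->t1 = CR_UNKNOWN;
        s->tp = CR_UNKNOWN;
        s->since_flush = 0;
    }
    if (s->t1 == CR_UNKNOWN) return CR_SEQUENTIAL;
    if (s->tp == CR_UNKNOWN) return CR_PARALLEL;
    return s->tp < s->t1 ? CR_PARALLEL : CR_SEQUENTIAL;
}

/* Record the measured time of the version that ran. */
void cr_record(cr_state *s, cr_version ran, long time) {
    assert(time >= 0);
    if (ran == CR_PARALLEL)
        s->tp = time;
    else
        s->t1 = time;
    s->since_flush++;
}
```

The new overhead mentioned above shows up here as the forced re-run of the slower version after each flush, so a shorter `period` reacts faster but pays that cost more often.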
26
Path Sensitive Sampling
If the behaviour is periodic, a means of filtering is required.
One intuitive solution is to sample when the invocation path or region nesting path changes.
27
Path Sensitive Sampling
Sample when region nesting path changes
»Makes the assumption that state stays the same if the invocation path does not change
void foo() { while(cond) moo(); }
void bar() { while(cond) moo(); }
void moo() { while(cond) moo(); }
[Figure: region nodes foo_while and bar_while, both leading to moo_while]
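One way to detect a change in the region nesting path is to keep a stack of enclosing region IDs and compare a signature of that stack on each entry. The C sketch below is an assumption-laden illustration: the names (`ps_path`, `ps_region`, `ps_should_sample`), the integer region IDs, and the FNV-1a-style hash are all hypothetical choices, and a hash collision would cause a path change to be missed.

```c
#include <assert.h>
#include <stdint.h>

#define PS_MAX_DEPTH 64

/* Stack of enclosing region ids, e.g. foo_while or bar_while. */
typedef struct {
    int ids[PS_MAX_DEPTH];
    int depth;
} ps_path;

void ps_push(ps_path *p, int region_id) {
    assert(p->depth < PS_MAX_DEPTH);
    p->ids[p->depth++] = region_id;
}

void ps_pop(ps_path *p) {
    assert(p->depth > 0);
    p->depth--;
}

/* FNV-1a-style hash over the region ids on the path. */
uint32_t ps_signature(const ps_path *p) {
    uint32_t h = 2166136261u;
    for (int i = 0; i < p->depth; i++) {
        h ^= (uint32_t)p->ids[i];
        h *= 16777619u;
    }
    return h;
}

/* Per-region record of the path it was last sampled under. */
typedef struct {
    uint32_t last_sig;
    int seen;
} ps_region;

/* Returns 1 when the nesting path differs from the previous entry,
   meaning the region should be sampled again; 0 otherwise. */
int ps_should_sample(ps_region *r, const ps_path *enclosing) {
    uint32_t sig = ps_signature(enclosing);
    if (r->seen && r->last_sig == sig)
        return 0;
    r->seen = 1;
    r->last_sig = sig;
    return 1;
}
```

With the example above, entering moo_while from foo_while triggers a sample, repeated entries from foo_while do not, and the first entry from bar_while triggers a new sample.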
28
Results – Static analysis
[Figure: average number of per-path instances for all regions]
29
Interesting Region in IJPEG
[Figure: number of speculative threads per region instance, over program execution]
30
Interesting Region in Perl
[Figure: number of instructions per region instance, over program execution]
31
Experimental Framework
SPEC benchmarks
TLS compiler
MIPS architecture
TLS profiler and simulator
32
Outline
The role of the TLS compiler
Characterizing dynamic behaviour
Dynamic Region Selection (DRS) algorithms
Results
Conclusions
Open questions and future work
33
Is there any dynamic behaviour between region instances?
34
Results – Dynamic behaviour
Regions with high coverage have low instruction variance between instances
35
Results – Dynamic behaviour
Regions with high coverage have low violation variance between instances
36
Results – Dynamic behaviour
Regions with high coverage have low speculative thread count variance between instances
37
What is a good algorithm for selecting regions?
38
[Figure: performance relative to static optimal; axis labels: faster, slower]
Continuous monitoring is 1% better on average than sample twice
About 10% worse than static ‘optimal’ selection
39
How often did we agree with the ‘optimal’ selection?
40
[Figure: agreement with static optimal]
Sample twice agrees 57% of the time, on average
Continuous monitoring agrees 43% of the time, on average
Levels of agreement are close: no dynamic behaviour?
41
Agreeing with static ‘optimal’ gives better performance?
Another sign of no dynamic behaviour?
42
Sample twice often leaves regions undecided
Overall, undecided regions represent low coverage
43
Outline
The role of the TLS compiler
Characterizing dynamic behaviour
Dynamic Region Selection (DRS) algorithms
Results
Conclusions
Open questions and future work
44
Conclusions
This is an unexplored research topic (as far as we know)
Is there any dynamic behaviour between region instances?
»We have good indications that there isn’t much of it
What is the best algorithm for selecting regions?
»Continuous monitoring does 1% better than sample twice
»Within 10% of the static ‘optimal’ without any sampling done!
Any performance trade-offs for doing dynamic profiling?
»The code size is increased by at most 30%
»The runtime performance overhead is believed to be negligible
Is there any dynamic behaviour within a region instance?
»We don’t know yet
45
Open Questions
The dynamic optimal is the theoretical optimal
»How close are we to the dynamic optimal?
»How close is the static ‘optimal’ to the dynamic optimal?
How do the other proposed algorithms perform?
What should be implemented in hardware/software?
46
Questions?
47
AUXILIARY SLIDES
48
Results – Potential Study
[Figure: execution time versus invocation (IJPEG)]
49
Results – Potential Study
[Figure: execution time versus invocation (CRAFTY)]
50
Results – Potential Study
[Figure: execution time versus invocation (LI)]
51
Results – Potential Study
[Figure: execution time versus invocation (PERL)]
52
Results – Static analysis
53
Results – Dynamic behaviour
54
Results – Dynamic behaviour
55
Results – Dynamic behaviour