Dynamic Region Selection for Thread Level Speculation
University of Toronto

Presented by: Jeff Da Silva, Stanley Fung, Martin Labrecque
Feb 6, 2004

Builds on research done by: Chris Colohan from CMU, Greg Steffan

Multithreading on a Chip is here TODAY!
• Chip multiprocessor (IBM Power4/5, Sun MAJC, UltraSPARC IV)
• Simultaneous multithreading (Alpha 21464, Intel Xeon, Pentium 4)
→ but what can we do with them?
[Figure: threads of execution, from desktops to supercomputers; each design drawn as processors (Proc) and caches (Cache)]

Improving Performance with a Chip Multiprocessor
With a bunch of independent applications:
→ improves throughput (total work per second)
[Figure: applications mapped onto processors (P) and caches (C); axis: execution time]

Improving Performance with a Chip Multiprocessor
With a single application:
→ need parallel threads to reduce execution time
[Figure: one application spread across processors (P) and caches (C); axis: execution time]

Thread-Level Speculation: the Basic Idea
→ exploit available thread-level parallelism
[Figure: execution timeline with TLS; speculative threads access *q and *p, a violation on *q is detected, and the violating thread recovers]

Support for TLS: What Do We Need?
• Break programs into speculative threads
  » to maximize thread-level parallelism
• Track data dependences
  » to determine whether speculation was safe
• Recover from failed speculation
  » to ensure correct execution
→ three key elements of every TLS system

Support for TLS: What Do We Need?
• Lots of research has been done on TLS hardware
  » Tracking data dependences
  » Recovering from violations
• We focus on how to select regions to run in parallel
  » A region is any segment of code that you want to speculatively parallelize
  » For this work, region == loop, iterations == speculative threads

Why is static region selection hard?
• It requires extensive profiling information
• Regions can be nested

    for ( i = 1 to N ) {          <= 2x faster in parallel
      ….
      for ( j = 1 to N ) {        <= 3x faster in parallel
        ….
        for ( k = 1 to N ) {      <= 4x faster in parallel
          ….
        }
      }
    }

  Which loop should we parallelize?
• Dynamic behaviour
→ Dynamic Region Selection is a potential solution

Dynamic Region Selection
• Compiler transforms all candidate regions into parallel and sequential versions
• Through dynamic profiling, we decide which regions are to be run in parallel
• Key questions:
  » Is there any dynamic behaviour between region instances?
  » What is a good algorithm for selecting regions?
  » Are there performance trade-offs for doing dynamic profiling?
  » Is there any dynamic behaviour within region instances? (not the focus of this research)

Outline
• The role of the TLS compiler
• Characterizing dynamic behaviour
• Dynamic Region Selection (DRS) algorithms
• Results
• Conclusions
• Open questions and future work

Current Compilation for TLS
[Figure: the loop nest below, shown twice, with loops labelled Sequential or Parallel]

    LoopA
      LoopB EndB
      LoopC
        LoopD EndD
      EndC
    EndA
    LoopE
      LoopF EndF
    EndE
    LoopG
      LoopH EndH

DRS Compilation
[Figure: the same loop nest (LoopA through LoopH), shown twice]

DRS Compilation
1. Extract a candidate region
2. Create sequential and parallel versions of the region (clone)
3. Add instrumentation (some extra overhead) to monitor the region's performance
4. Introduce a DRS algorithm to make the decision at runtime
→ DRS compilation by Colohan
[Figure: region E cloned into two versions, with the DRS algorithm choosing between them]
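The four compilation steps can be pictured as a small runtime dispatcher. The following is only a sketch under assumptions: `drs_region_t`, `drs_run`, and the array-summing region E are hypothetical names invented for illustration, not the interface of the compiler described in the slides. The parallel clone is a plain stand-in, because real speculative threads need TLS hardware or simulator support.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* A region clone: the same work, compiled sequentially or for TLS. */
typedef void (*region_fn)(void *ctx);

typedef struct {
    region_fn sequential;  /* step 2: sequential clone */
    region_fn parallel;    /* step 2: parallel (speculative) clone */
} drs_region_t;

/* Steps 3 and 4: run the clone that the DRS algorithm picked and return
 * the elapsed CPU time in seconds, so the algorithm can use it later. */
double drs_run(const drs_region_t *r, bool use_parallel, void *ctx) {
    assert(r != NULL && r->sequential != NULL && r->parallel != NULL);
    clock_t start = clock();
    if (use_parallel) {
        r->parallel(ctx);
    } else {
        r->sequential(ctx);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Hypothetical region E: sum an array. */
typedef struct { const int *data; int n; long sum; } sum_ctx_t;

void region_e_sequential(void *p) {
    sum_ctx_t *c = (sum_ctx_t *)p;
    c->sum = 0;
    for (int i = 0; i < c->n; i++) {
        c->sum += c->data[i];
    }
}

/* Stand-in only: a real parallel clone would run loop iterations as
 * speculative threads, which plain C cannot express portably. */
void region_e_parallel(void *p) {
    region_e_sequential(p);
}
```

A caller would build a `drs_region_t` from the two clones and pass the algorithm's current choice as `use_parallel`; both clones must leave the same result in the context.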

Characterizing TLS Region Behaviour
[Figure]

DRS Algorithms
1) Sample Twice
2) Continuous Monitoring
3) Continuous Resample
4) Path Sensitive Sampling

Sample Twice Algorithm
• Effective if behaviour is constant.
• When a region is encountered:
  » 1st time: run the sequential version and record execution time t_1
  » 2nd time: run the parallel version (if possible) and record execution time t_p
  » Subsequent instances:
    • if t_p < t_1, run the parallel version
    • else run the sequential version
• Note: using execution time as the metric assumes that the amount of work done stays relatively constant from instance to instance. Using throughput (IPC) as the metric removes this assumption but adds complexity.
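The Sample Twice rule can be sketched as a tiny state machine per region. This is a minimal illustration, not the presenters' implementation: the names `sample_twice_t`, `st_choose`, and `st_record` are hypothetical, times are passed in by the caller, and the "if possible" case for the parallel sample is not modelled.

```c
#include <assert.h>

typedef enum { RUN_SEQUENTIAL, RUN_PARALLEL } version_t;

typedef struct {
    int    samples;  /* 0: nothing sampled, 1: t1 known, 2: t1 and tp known */
    double t1;       /* time of the sequential sample */
    double tp;       /* time of the parallel sample */
} sample_twice_t;

/* Which version should the next instance of this region run? */
version_t st_choose(const sample_twice_t *r) {
    if (r->samples == 0) return RUN_SEQUENTIAL;  /* 1st time */
    if (r->samples == 1) return RUN_PARALLEL;    /* 2nd time */
    return (r->tp < r->t1) ? RUN_PARALLEL : RUN_SEQUENTIAL;
}

/* Record the measured time of the instance that just ran. */
void st_record(sample_twice_t *r, version_t v, double elapsed) {
    assert(elapsed >= 0.0);
    if (r->samples == 0 && v == RUN_SEQUENTIAL) {
        r->t1 = elapsed;
        r->samples = 1;
    } else if (r->samples == 1 && v == RUN_PARALLEL) {
        r->tp = elapsed;
        r->samples = 2;
    }
    /* Later instances are ignored: the decision is final. */
}
```

Because nothing is recorded after the second sample, a region that later slows down in parallel keeps running in parallel, which is why the slide calls this effective only when behaviour is constant.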

Sample Twice Example
[Figure: states: Sample Sequential?, Sample Parallel?, Decided]

Continuous Monitoring
• Extension of the sample twice method: continuously monitor all regions and re-evaluate the decision if the speedup changes.
  » The monitoring is already in place, so continuing to monitor adds essentially no extra overhead.
• When a region is encountered:
  » 1st time: run the sequential version and record execution time t_1
  » 2nd time: run the parallel version (if possible) and record execution time t_p
  » Subsequent instances:
    • if t_p < t_1, run the parallel version and update t_p
    • else run the sequential version and update t_1
• Effective if behaviour is continuously degrading.

Continuous Monitoring Example
[Figure: states: Sample Sequential?, Sample Parallel?, Decided]
Recorded times after each successive instance:
    t_1 = NA   t_p = NA
    t_1 = 5    t_p = NA
    t_1 = 5    t_p = 3
    t_1 = 5    t_p = 4
    t_1 = 5    t_p = 6
    t_1 = 4    t_p = 6
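Continuous Monitoring differs from Sample Twice only in that every instance refreshes the time of the version it ran. A minimal sketch, with hypothetical names (`cm_region_t`, `cm_choose`, `cm_record`) and caller-supplied times:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { CM_SEQUENTIAL, CM_PARALLEL } cm_version_t;

typedef struct {
    bool   have_t1, have_tp;  /* false corresponds to "NA" */
    double t1, tp;            /* latest sequential / parallel times */
} cm_region_t;

/* Sample sequential first, then parallel, then compare latest times. */
cm_version_t cm_choose(const cm_region_t *r) {
    if (!r->have_t1) return CM_SEQUENTIAL;
    if (!r->have_tp) return CM_PARALLEL;
    return (r->tp < r->t1) ? CM_PARALLEL : CM_SEQUENTIAL;
}

/* Every instance updates the time of the version that just ran. */
void cm_record(cm_region_t *r, cm_version_t v, double elapsed) {
    assert(elapsed >= 0.0);
    if (v == CM_SEQUENTIAL) {
        r->t1 = elapsed;
        r->have_t1 = true;
    } else {
        r->tp = elapsed;
        r->have_tp = true;
    }
}
```

Feeding it the times 5, 3, 4, 6, 4 in order reproduces the values on the example slide: the region runs in parallel until t_p rises to 6, then switches back to the sequential version, whose next run updates t_1 to 4.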

Continuous Resample
• Effective if behaviour is continuously changing.
• Continuously resample by periodically flushing the values t_1 and t_p.
• Adds new overhead.
• This algorithm has not yet been explored.
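Since the slides leave this algorithm unexplored, the following is only one possible reading of "flushing periodically": forget both samples after a fixed number of region instances. The names and the instance-count `period` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool     have_t1, have_tp;
    double   t1, tp;
    unsigned since_flush;  /* instances recorded since the last flush */
    unsigned period;       /* assumed knob: flush every `period` instances */
} cr_region_t;

/* true means run the parallel version next. */
bool cr_use_parallel(const cr_region_t *r) {
    if (!r->have_t1) return false;  /* (re)sample sequential first */
    if (!r->have_tp) return true;   /* then (re)sample parallel */
    return r->tp < r->t1;
}

void cr_record(cr_region_t *r, bool ran_parallel, double elapsed) {
    assert(r->period > 0 && elapsed >= 0.0);
    if (ran_parallel) {
        r->tp = elapsed;
        r->have_tp = true;
    } else {
        r->t1 = elapsed;
        r->have_t1 = true;
    }
    /* Periodic flush: drop both samples so the region is sampled again. */
    if (++r->since_flush >= r->period) {
        r->have_t1 = false;
        r->have_tp = false;
        r->since_flush = 0;
    }
}
```

The new overhead mentioned on the slide shows up here as the forced sequential run after every flush, even for regions that are clearly faster in parallel.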

Path Sensitive Sampling
• If the behaviour is periodic, a means of filtering is required.
• One intuitive solution is to sample when the invocation path or region nesting path changes.

Path Sensitive Sampling
• Sample when the region nesting path changes
  » Makes the assumption that state stays the same if the invocation path does not change

    void foo() { while(cond) moo(); }
    void bar() { while(cond) moo(); }
    void moo() { while(cond) moo(); }

[Figure: region nesting paths: foo_while and bar_while each lead to moo_while]
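One way to make sampling path sensitive is to keep a separate sample record per nesting path, so that moo_while reached from foo_while is sampled independently of moo_while reached from bar_while. The sketch below is an assumption about how that could look; the path hash, the table layout, and all names are hypothetical, and hash collisions between distinct paths are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PATHS 64

typedef struct {
    uint32_t path;     /* identifier of the region nesting path */
    int      samples;  /* 0, 1 or 2, as in sample twice */
    double   t1, tp;
} path_entry_t;

typedef struct {
    path_entry_t entries[MAX_PATHS];
    size_t       count;
} path_table_t;

/* Derive the path id of a region from its enclosing region's path id. */
uint32_t path_extend(uint32_t parent_path, uint32_t region_id) {
    return parent_path * 31u + region_id;
}

/* Find the record for a path, creating a fresh (unsampled) one if the
 * path has not been seen before. */
path_entry_t *path_lookup(path_table_t *t, uint32_t path) {
    for (size_t i = 0; i < t->count; i++) {
        if (t->entries[i].path == path) return &t->entries[i];
    }
    assert(t->count < MAX_PATHS);
    path_entry_t *e = &t->entries[t->count++];
    e->path = path;
    e->samples = 0;
    e->t1 = 0.0;
    e->tp = 0.0;
    return e;
}
```

A new path yields a record with `samples == 0`, which restarts sampling for that path only; a path seen before reuses its earlier decision, matching the assumption that state stays the same when the invocation path does not change.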

Results – Static analysis
[Figure: average number of per-path instances for all regions]

Interesting Region in IJPEG
[Figure: number of speculative threads per region instance, over program execution]

Interesting Region in Perl
[Figure: number of instructions per region instance, over program execution]

Experimental Framework
• SPEC benchmarks
• TLS compiler
• MIPS architecture
• TLS profiler and simulator

Is there any dynamic behaviour between region instances?

Results – Dynamic behaviour
• Regions with high coverage have low instruction variance between instances

Results – Dynamic behaviour
• Regions with high coverage have low violation variance between instances

Results – Dynamic behaviour
• Regions with high coverage have low speculative thread count variance between instances

What is a good algorithm for selecting regions?

[Figure: performance relative to static optimal (faster / slower)]
• Continuous monitoring is 1% better on average than sample twice
• About 10% worse than static 'optimal' selection

How often did we agree with the 'optimal' selection?

[Figure: agreement with static optimal]
• Sample twice agrees 57% of the time, on average
• Continuous monitoring agrees 43% of the time, on average
• Levels of agreement are close: no dynamic behaviour?

• Agreeing with static 'optimal' gives better performance?
• Another sign of no dynamic behaviour?

• Sample twice often leaves regions undecided
• Overall, undecided regions represent low coverage

Conclusions
• This is an unexplored research topic (as far as we know)
• Is there any dynamic behaviour between region instances?
  » We have good indications that there is not much of it
• What is the best algorithm for selecting regions?
  » Continuous monitoring does 1% better than sample twice
  » Within 10% of the static 'optimal' without any sampling done!
• Any performance trade-offs for doing dynamic profiling?
  » The code size is increased by at most 30%
  » The runtime performance overhead is believed to be negligible
• Is there any dynamic behaviour within a region instance?
  » We don't know yet

Open Questions
• The dynamic optimal is the theoretical optimal
  » How close are we to the dynamic optimal?
  » How close is the static 'optimal' to the dynamic optimal?
• How do the other proposed algorithms perform?
• What should be implemented in hardware/software?

Questions?

AUXILIARY SLIDES

Results – Potential Study
[Figure: execution time versus invocation (IJPEG)]

Results – Potential Study
[Figure: execution time versus invocation (CRAFTY)]

Results – Potential Study
[Figure: execution time versus invocation (LI)]

Results – Potential Study
[Figure: execution time versus invocation (PERL)]

Results – Static analysis
[Figure]

Results – Dynamic behaviour
[Figure]
