1 Prospector: A Toolchain To Help Parallel Programming
Minjang Kim and Hyesoon Kim (HPArch Lab), Chi-Keung Luk (Intel)
This work will also be supported by Samsung.

2 Motivation (1/2)
- Parallel programming is hard
- What if there were a tool that helps with parallel programming?
  - We already have some tools, such as race detectors
  - However, few tools guide parallel programming itself
- A programmer wants to parallelize serial code:
  - Where to parallelize?
  - How to parallelize?

3 Motivation (2/2)
- We propose Prospector
  - A set of dynamic program analyzers that help parallelize serial code
- Goals
  - Provide information for finding the right parallelization targets
  - Provide advice on writing correct and optimized parallelized code

4 Overview of Prospector
[Figure: overview of Prospector. Input: source code or binary. Panels: a loop profile (Loop1: Invocation 8, Iteration 5,000, Max Iter 1,600, Min Iter 40); the call structure Func1() { Loop1; Loop2; Func2(); } and Func2() { Loop3 }; a lock pattern Loop3 { Statements; Lock(); Statements; Unlock(); Statements; }; speedup vs. number of cores (2, 4, 8); and a CPU vs. GPU speedup comparison.]

5 Prospector: Loop-Centric Profiler
- Q: Which code sections would be good for parallelization?
  - Typically, the most frequently executed loops
- Legacy profilers report only hot functions and instructions
- We provide details of loop execution:
  - Trip count (# of iterations): is there sufficient work?
  - # of invocations: is the fork/join overhead low?
  - Statistics of iteration length (min, max, stdev): is the work balanced?
[Figure: example loop profile. Loop1: Invocation 8, Iteration 5,000, Max Iter 1,600, Min Iter 40]
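As a rough illustration of the statistics listed above, the following Python sketch keeps a per-loop record of invocations, total iterations, and the min, max, mean, and standard deviation of iteration length. The `LoopStats` class and its methods are hypothetical names, not Prospector's interface, and the "length" of an iteration is assumed to be whatever cost the instrumentation measures (for example, instructions or cycles).

```python
import math
from dataclasses import dataclass


@dataclass
class LoopStats:
    """Hypothetical per-loop profile record (illustrative names, not Prospector's API)."""
    invocations: int = 0        # times the loop was entered (fork/join opportunities)
    iterations: int = 0         # iterations summed over all invocations
    min_len: float = math.inf   # shortest iteration seen
    max_len: float = 0.0        # longest iteration seen
    sum_len: float = 0.0
    sum_sq_len: float = 0.0

    def enter(self) -> None:
        """Call once each time execution enters the loop."""
        self.invocations += 1

    def iteration(self, length: float) -> None:
        """Call once per iteration with its measured cost (e.g., instructions)."""
        self.iterations += 1
        self.min_len = min(self.min_len, length)
        self.max_len = max(self.max_len, length)
        self.sum_len += length
        self.sum_sq_len += length * length

    def avg_trip_count(self) -> float:
        """Average iterations per invocation: is there enough work per fork/join?"""
        return self.iterations / self.invocations

    def mean_len(self) -> float:
        return self.sum_len / self.iterations

    def stdev_len(self) -> float:
        """Population standard deviation of iteration length: is the work balanced?"""
        var = self.sum_sq_len / self.iterations - self.mean_len() ** 2
        return math.sqrt(max(var, 0.0))


# Usage: two invocations of a loop whose iterations cost 10, 10, 40 and 10, 30 units.
stats = LoopStats()
for invocation in ([10, 10, 40], [10, 30]):
    stats.enter()
    for cost in invocation:
        stats.iteration(cost)
print(stats.invocations, stats.iterations, stats.min_len, stats.max_len)  # 2 5 10 40
```

For this input the sketch gives an average trip count of 2.5, a mean iteration length of 20, and a standard deviation of about 12.65; a stdev that is large relative to the mean is the kind of imbalance signal the "Balanced?" question refers to.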

6 Prospector: Parallel Speedup Predictor (1/2)
- Q: What speedup can we expect?
- Analytical models (e.g., Amdahl's Law) are not practical for predicting speedup in the presence of locks
- Our approach
  - Dynamically predict speedup based on lightweight profiling
- Challenges
  - How to model architectural factors (e.g., caches, memory)?
[Figure: predicted speedup vs. number of cores (2, 4, 8)]
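For reference, Amdahl's Law gives the ideal speedup on $n$ cores when a fraction $p$ of the serial execution time is perfectly parallelizable, shown here with a worked example for $p = 0.9$ and $n = 8$:

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}},
\qquad p = 0.9,\ n = 8:\quad
S(8) = \frac{1}{0.1 + 0.1125} = \frac{1}{0.2125} \approx 4.71
```

The formula has no term for synchronization: if part of $p$ runs inside a lock-protected critical section, the achievable speedup depends on how often threads contend for that lock at run time, which is why the slide argues for predicting speedup from profiling instead.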

7 Prospector: Parallel Speedup Predictor (2/2)
- Mechanisms
  - Programmers annotate the serial code
    - The annotations describe the behavior of the parallel execution, including locks
  - Fast, lightweight profiling
    - Measure the time between annotations
  - Emulation
    - Obtain an estimated parallel execution time, from which the speedup follows
  - Modeling architectural parameters
    - Sample memory accesses
    - Use an analytical model to predict cache hits and misses
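The emulation step above can be pictured with a small Python sketch. It assumes, purely for illustration, that annotations split each parallel task into three measured durations (work before the lock, work inside one global lock, and work after it), that tasks are handed to whichever core becomes idle first, and that the lock is granted in task order. The names `emulate_parallel_time` and `predicted_speedup` are hypothetical, and caches, memory bandwidth, and fork/join overhead are ignored; those are the architectural factors the slide says must be modeled separately.

```python
import heapq
from typing import List, Tuple

# Hypothetical per-task profile from annotations: (before lock, inside lock, after lock)
Task = Tuple[float, float, float]


def emulate_parallel_time(tasks: List[Task], cores: int) -> float:
    """Estimate parallel execution time on `cores` cores with one global lock."""
    free = [0.0] * cores       # min-heap: time at which each core next becomes idle
    heapq.heapify(free)
    lock_free = 0.0            # time at which the lock is next released
    finish = 0.0
    for pre, crit, post in tasks:
        start = heapq.heappop(free)            # dynamic scheduling: earliest-idle core
        acquire = max(start + pre, lock_free)  # wait if another task still holds the lock
        lock_free = acquire + crit             # lock granted in task order (simplification)
        end = lock_free + post
        heapq.heappush(free, end)
        finish = max(finish, end)
    return finish


def predicted_speedup(tasks: List[Task], cores: int) -> float:
    serial = sum(pre + crit + post for pre, crit, post in tasks)
    return serial / emulate_parallel_time(tasks, cores)


# 8 tasks with 1 unit before and 1 unit after the lock (empty critical section)
print(predicted_speedup([(1.0, 0.0, 1.0)] * 8, cores=4))            # 4.0
# 8 tasks with 1 unit before and 1 unit inside the lock: the lock serializes
print(round(predicted_speedup([(1.0, 1.0, 0.0)] * 8, cores=4), 2))  # 1.78
```

In the second case the eight critical sections (8 units in total) must run one after another, so the parallel time cannot drop below 9 units no matter how many cores are added; a plain Amdahl-style estimate that treats the whole task as parallel would miss this.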

8 Prospector: Parallelizable Section Finder (1/3)
- Q: Is this code section parallelizable?
  - Data dependences determine parallelizability
  - Compilers may not do well here because of pointers and complex control flow
- Our approach
  - Dynamic data-dependence profiling
  - Provides detailed dependence information for a given input
- Challenges
  - Too much overhead; a smart algorithm is needed
[Figure: Func1() { Loop1; Loop2; Func2(); } with a section marked "Parallelizable!"]

9 Prospector: Parallelizable Section Finder (2/3)
- Mechanisms
  - A dynamic profiler based on instrumentation
    - Instrumentation can be at either the binary or the source level
  - At instrumentation (static) time
    - Analyze control flow graphs and loop structures
  - At runtime
    - Observe actual memory addresses (no points-to analysis needed)
    - Store and analyze these addresses to discover data dependences
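A minimal Python sketch of the runtime half of such a profiler follows. It assumes the instrumentation has already produced a trace of `(iteration, kind, address, pc)` events for one invocation of one loop (a hypothetical format), keeps the last write and the reads since that write for every address, and reports dependences whose source and sink lie in different iterations. Real profilers also handle nested loops, multiple invocations, and memory limits; this only shows why observed addresses remove the need for a points-to analysis, and its results hold only for the profiled input.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical trace event: (iteration number, "R" or "W", memory address, instruction label)
Event = Tuple[int, str, int, str]
Dep = Tuple[str, str, str]  # (dependence type, source instruction, sink instruction)


def loop_carried_dependences(trace: Iterable[Event]) -> Set[Dep]:
    """Report RAW, WAR, and WAW dependences that cross loop iterations."""
    last_write: Dict[int, Tuple[int, str]] = {}                  # addr -> (iteration, pc)
    reads: Dict[int, List[Tuple[int, str]]] = defaultdict(list)  # addr -> reads since last write
    deps: Set[Dep] = set()
    for it, kind, addr, pc in trace:
        if kind == "R":
            w = last_write.get(addr)
            if w is not None and w[0] < it:   # value produced in an earlier iteration
                deps.add(("RAW", w[1], pc))
            reads[addr].append((it, pc))
        else:
            for r_it, r_pc in reads[addr]:
                if r_it < it:                 # overwriting a value read in an earlier iteration
                    deps.add(("WAR", r_pc, pc))
            w = last_write.get(addr)
            if w is not None and w[0] < it:   # overwriting an earlier iteration's write
                deps.add(("WAW", w[1], pc))
            last_write[addr] = (it, pc)
            reads[addr] = []                  # older reads are now ordered by this write
    return deps


# a[i] = a[i - 1] + 1 for i = 1..3, with a[k] at hypothetical address 100 + k
recurrence = []
for i in range(1, 4):
    recurrence += [(i, "R", 100 + i - 1, "load"), (i, "W", 100 + i, "store")]
print(loop_carried_dependences(recurrence))  # {('RAW', 'store', 'load')}
```

For an independent loop such as `b[i] = 2 * a[i]`, the same function returns an empty set, which marks the loop as a parallelization candidate for that input.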

10 Prospector: Parallelizable Section Finder (3/3)
- Mechanisms: scalability
  - Current tools require too much memory and time to analyze data dependences
  - Prospector implements a new, scalable algorithm for data-dependence profiling
  - Key ideas: compression and parallelization (MICRO '10)
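The slide does not spell out the compression scheme. One simple form, in the spirit of stride-based summaries and shown here as an assumption rather than as SD3's actual algorithm, exploits the fact that loops often touch memory with a fixed stride: a run of equally spaced addresses can be stored as a single `(start, stride, count)` descriptor instead of one entry per access. The Python helpers `compress` and `expand` and the `MIN_RUN` threshold are hypothetical.

```python
from typing import List, Tuple, Union

# Hypothetical helpers illustrating stride compression; not SD3's actual algorithm.
Stride = Tuple[int, int, int]        # (first address, stride, number of accesses)
Entry = Union[Stride, int]           # either a stride descriptor or one raw address
MIN_RUN = 3                          # hypothetical threshold: shorter runs stay raw


def compress(addresses: List[int]) -> List[Entry]:
    """Greedily fold runs of equally spaced addresses into stride descriptors."""
    out: List[Entry] = []
    i, n = 0, len(addresses)
    while i < n:
        j = i
        step = 0
        if i + 1 < n:
            step = addresses[i + 1] - addresses[i]
            j = i + 1
            while j + 1 < n and addresses[j + 1] - addresses[j] == step:
                j += 1
        run = j - i + 1                      # addresses[i..j] share one stride
        if run >= MIN_RUN:
            out.append((addresses[i], step, run))
            i = j + 1
        else:
            out.append(addresses[i])         # too short to be worth a descriptor
            i += 1
    return out


def expand(entries: List[Entry]) -> List[int]:
    """Inverse of compress, used here only to check that nothing was lost."""
    addrs: List[int] = []
    for e in entries:
        if isinstance(e, tuple):
            start, step, count = e
            addrs.extend(start + k * step for k in range(count))
        else:
            addrs.append(e)
    return addrs


# A loop over 1,000 8-byte elements collapses to one descriptor.
print(compress([0x1000 + 8 * k for k in range(1000)]))  # [(4096, 8, 1000)]
print(compress([0, 8, 16, 24, 100, 104, 200]))          # [(0, 8, 4), 100, 104, 200]
```

A dependence check can then compare descriptors (for example, asking whether two strided ranges can touch the same address) instead of individual addresses. The slide's second idea, parallelizing the profiler itself, is not shown.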

11 Prospector: Parallelism Pattern Advisor
- Q: How can I transform the serial code?
- If dependences are easily removable
  - I.e., embarrassingly parallel loops, possibly with some reductions
  - Guide the parallelization strategy directly, e.g., "Use an OpenMP pragma here"
- If severe dependences exist
  - Can we give advice on avoiding these dependences?
  - General solutions are extremely hard
  - Instead, analyze data-dependence patterns, e.g., pipeline parallelism or a certain form of locking
[Figure: Loop3 { Statements; Lock(); Statements; Unlock(); Statements; }]
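A toy version of the "guide the strategy directly" case might map a loop's carried dependences to an OpenMP suggestion. In this Python sketch, the input format, the `advise` function, and its rules are all hypothetical; it assumes a separate analysis has already identified which variables are updated only as `x = x op expr` with an associative operator. Only the generated `#pragma omp parallel for` and `reduction(op:var)` syntax is standard OpenMP.

```python
from typing import Dict, Set, Tuple

Dep = Tuple[str, str]  # (dependence type, variable), e.g. ("RAW", "sum")


def advise(carried: Set[Dep], reductions: Dict[str, str]) -> str:
    """Suggest a strategy for one loop from its loop-carried dependences (hypothetical rules).

    `reductions` maps a variable to its associative operator, e.g. {"sum": "+"};
    it is assumed to come from a separate (hypothetical) pattern analysis.
    """
    blocking = sorted({var for _, var in carried if var not in reductions})
    if blocking:
        # Severe dependences: no direct pragma; point at pattern-level options instead.
        return ("not directly parallelizable (carried dependences on "
                + ", ".join(blocking)
                + "); consider pipeline parallelism or a lock around the shared update")
    # Every carried dependence is on a reduction variable: express each as a clause.
    used = sorted({var for _, var in carried})
    clauses = "".join(f" reduction({reductions[v]}:{v})" for v in used)
    return "#pragma omp parallel for" + clauses


print(advise(set(), {}))                                        # #pragma omp parallel for
print(advise({("RAW", "sum"), ("WAW", "sum")}, {"sum": "+"}))  # #pragma omp parallel for reduction(+:sum)
print(advise({("RAW", "a")}, {}))
```

The last call falls into the "severe dependences" branch, where the slide suggests pattern-level advice (pipelining, locking) rather than a single pragma.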

12 Prospector: Parallel Architecture Advisor
- Q: Which parallel hardware would be better?
  - Can we predict performance on different hardware, e.g., speedups on multicore CPUs and GPGPUs?
- Challenges
  - More architectural factors need to be modeled
[Figure: speedup comparison, CPU vs. GPU]

13 Prospector: Parallel Performance Analyzer
- Q: What is the reason for poor speedup?
- A couple of profilers exist for this purpose
  - They analyze the degree of concurrency
  - They profile lock contention (wait time)
  - But their information is too low-level for understanding the problems
- Alternative
  - Macroscopic profiling of parallelized programs
  - Alternative forms of visualization
[Figure: Loop3 { Statements; Lock(); Statements; Unlock(); Statements; }]
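The "degree of concurrency" that such profilers report can be computed from per-thread activity intervals with a sweep over start and end events. The Python sketch below assumes a hypothetical input of `(start, end)` intervals during which some thread was doing useful work (time spent waiting on a lock excluded) and returns how long exactly k threads were busy; it illustrates the low-level metric, not the macroscopic profiling the slide proposes.

```python
from typing import Dict, List, Tuple


def concurrency_profile(busy: List[Tuple[float, float]]) -> Dict[int, float]:
    """Map k -> total time during which exactly k threads were busy.

    `busy` is a hypothetical list of (start, end) intervals of useful work.
    """
    events = []
    for start, end in busy:
        events.append((start, +1))
        events.append((end, -1))
    # At equal timestamps, -1 sorts before +1, so back-to-back intervals do not overlap.
    events.sort()
    profile: Dict[int, float] = {}
    active, prev = 0, None
    for t, delta in events:
        if prev is not None and t > prev:
            profile[active] = profile.get(active, 0.0) + (t - prev)
        active += delta
        prev = t
    return profile


def average_concurrency(profile: Dict[int, float]) -> float:
    """Time-weighted average number of busy threads."""
    total = sum(profile.values())
    return sum(k * t for k, t in profile.items()) / total if total else 0.0


# Thread 1 is busy from 0 to 10; thread 2 works 0-2, waits on a lock, then works 8-10.
prof = concurrency_profile([(0, 10), (0, 2), (8, 10)])
print(prof)                        # {2: 4.0, 1: 6.0}
print(average_concurrency(prof))   # 1.4
```

Here the two threads overlap for only 4 of 10 time units, so the average concurrency is 1.4 rather than 2. The number shows that cores are underused but not why, which is the gap the slide attributes to such low-level information.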

14 Related Work
- State-of-the-art tools
  - Parallel Advisor from Intel Parallel Studio 2011
    - Speedup Predictor: cannot model architectural factors
    - Parallelizable Section Finder: scalability issues
  - vfAnalyst from VectorFabric
    - Parallelizable Section Finder: scalability issues

15 Current Status and Timeline
- June 2010
  - The initial idea of Prospector was presented at HotPar '10
- Dec 2010
  - The scalable data-dependence profiling algorithm (for the Parallelizable Section Finder and the Pattern Advisor) will be presented at MICRO '10
  - A beta version will be released as open source, including:
    - Loop-centric profiler
    - Parallelizable Section Finder (i.e., data-dependence profiler)
    - Parallel speedup predictor
- Mar 2011
  - The Parallel Speedup Predictor will be released
- Aug 2011
  - The first Parallelism Pattern Advisor will be released

16 Conclusion
- We need a new type of tool to help parallel programming
- Prospector is a set of parallel programming advisors based on dynamic program analysis. It:
  - Finds good parallelization targets
  - Analyzes serial code to understand its behavior
  - Predicts speedup
  - Provides advice on code changes

17 Thank you!
- Q&A
- References
  - Overall tool architecture: Minjang Kim, Hyesoon Kim, Chi-Keung Luk, "Prospector: Helping Parallel Programming by A Data-Dependence Profiler," 2nd USENIX Workshop on Hot Topics in Parallelism (HotPar '10), June 2010.
  - Scalable data-dependence profiling: Minjang Kim, Hyesoon Kim, Chi-Keung Luk, "SD3: A Scalable Approach To Dynamic Data-Dependence Profiling," Proceedings of the 43rd IEEE/ACM International Symposium on Microarchitecture (MICRO), December 2010.

