Published by Laureen Whitehead. Modified over 9 years ago.
Slide 1: Prospector: A Toolchain To Help Parallel Programming
Minjang Kim and Hyesoon Kim (HPArch Lab); Chi-Keung Luk (Intel)
This work will also be supported by Samsung.
Slide 2: Motivation (1/2)
- Parallel programming is hard.
- What if there were a tool that helps with parallel programming?
- Some tools already exist, such as race detectors.
- However, few tools guide the parallelization process itself.
- A programmer who wants to parallelize serial code asks:
  - Where should I parallelize?
  - How should I parallelize?
Slide 3: Motivation (2/2)
- We propose Prospector, a set of dynamic program analyzers that help parallelize serial code.
- Goals:
  - Give information for finding the right parallelization targets.
  - Provide advice on writing correct and optimized parallel code.
Slide 4: Overview of Prospector
[Figure: the Prospector workflow. Input: source code or binary (example: Func1() { Loop1; Loop2; Func2(); } and Func2() { Loop3 }, where Loop3 contains a Lock()/Unlock() critical section between statements). Outputs shown: a loop profile (Loop1: Invocation 8, Iteration 5,000, Max Iter 1,600, Min Iter 40), a speedup vs. # of cores chart (2, 4, 8 cores), and a CPU vs. GPU speedup comparison.]
Slide 5: Prospector: Loop-Centric Profiler
- Q: Which code sections are good candidates for parallelization?
  - Usually, the most frequently executed loops.
- Legacy profilers report only hot functions and instructions.
- We provide details of loop execution:
  - Trip count: is there sufficient work?
  - # of invocations: is the fork/join overhead low?
  - Statistics of iteration length (min, max, stdev): are iterations balanced?
[Example: Loop1: Invocation 8; Iteration 5,000; Max Iter 1,600; Min Iter 40]
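The three questions on this slide map onto a handful of per-loop counters. The sketch below is a minimal illustration, assuming the instrumentation reports each loop invocation together with the measured length (for example, in cycles) of each of its iterations; the `LoopProfile` class and its field names are hypothetical, not Prospector's data structures.

```python
import statistics
from dataclasses import dataclass, field


@dataclass
class LoopProfile:
    """Hypothetical per-loop record kept by a loop-centric profiler."""

    invocations: int = 0  # times the loop was entered (one fork/join each if parallelized)
    iteration_lengths: list = field(default_factory=list)  # length of every iteration seen

    def record_invocation(self, lengths):
        """Record one entry into the loop; `lengths` holds each iteration's length."""
        self.invocations += 1
        self.iteration_lengths.extend(lengths)

    def summary(self):
        n = len(self.iteration_lengths)
        return {
            "invocations": self.invocations,
            "iterations": n,  # total trip count over all invocations
            "avg_trip_count": n / self.invocations if self.invocations else 0.0,
            "max_iter": max(self.iteration_lengths, default=0),
            "min_iter": min(self.iteration_lengths, default=0),
            # A large spread relative to the mean hints at load imbalance.
            "stdev_iter": statistics.pstdev(self.iteration_lengths) if n else 0.0,
        }


profile = LoopProfile()
profile.record_invocation([10, 20, 30])  # first entry: 3 iterations
profile.record_invocation([40])          # second entry: 1 iteration
print(profile.summary())
```

Few invocations with many iterations each suggest that fork/join overhead will be amortized, and a small standard deviation relative to the mean suggests the iterations are reasonably balanced.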
Slide 6: Prospector: Parallel Speedup Predictor (1/2)
- Q: What speedup can be expected?
- Analytical models (e.g., Amdahl's Law) are not practical for predicting speedup in the presence of locks.
- Our approach: dynamically predict speedup based on light profiling.
- Challenge: how to model architectural factors (e.g., caches, memory)?
[Figure: predicted speedup vs. # of cores (2, 4, 8)]
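To see why a single formula falls short, recall Amdahl's Law, S(n) = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the serial run time and n is the number of cores. The sketch below simply evaluates it, and it needs p as one fixed number. With locks, the time spent in critical sections does not map onto a fixed serial fraction: whether threads actually wait depends on contention, which changes with the thread count and the timing of each iteration.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's Law: S(n) = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


# Predicted speedups for a program whose run time is 90% parallelizable.
for n in (2, 4, 8):
    print(f"{n} cores: {amdahl_speedup(0.9, n):.2f}x")  # 1.82x, 3.08x, 4.71x
```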
Slide 7: Prospector: Parallel Speedup Predictor (2/2)
Mechanisms:
- Programmers annotate the serial code to describe the behavior of parallel execution and locks.
- Fast and light profiling measures the time between annotations.
- Emulation uses these measurements to estimate the parallel execution time, and thus the speedup.
- Architectural parameters are modeled by:
  - sampling memory accesses, and
  - using an analytical model for cache hit/miss prediction.
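The emulation step can be illustrated with a small discrete-event sketch. It assumes the annotations and profiling have already produced, for every loop iteration, a list of timed segments: `"work"` segments that can run concurrently and `"lock"` segments that execute under a single global lock. The segment format, the function names, the dynamic one-iteration-at-a-time scheduling, and the single lock are all assumptions made for illustration, and the sketch ignores the cache and memory effects that the slide says are modeled separately.

```python
import heapq


def emulate(iterations, threads):
    """Estimate the parallel run time of a loop on `threads` threads.

    `iterations` is a list; each entry is a list of (kind, duration) segments,
    where kind is "work" (runs concurrently) or "lock" (runs under one global lock).
    Idle threads take the next unstarted iteration (dynamic scheduling).
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    next_iter = 0        # index of the next iteration to hand out
    lock_free_at = 0.0   # time at which the global lock becomes available
    finish = 0.0         # latest completion time seen so far
    events = []          # min-heap of (time, thread_id, iteration, segment_index)
    for tid in range(min(threads, len(iterations))):
        events.append((0.0, tid, next_iter, 0))
        next_iter += 1
    heapq.heapify(events)

    # Events are popped in time order, so lock requests are granted first-come, first-served.
    while events:
        now, tid, it, seg = heapq.heappop(events)
        if seg == len(iterations[it]):
            finish = max(finish, now)  # this thread finished its iteration
            if next_iter < len(iterations):
                heapq.heappush(events, (now, tid, next_iter, 0))
                next_iter += 1
            continue
        kind, dur = iterations[it][seg]
        if kind == "lock":
            start = max(now, lock_free_at)  # wait while another thread holds the lock
            lock_free_at = start + dur
            heapq.heappush(events, (lock_free_at, tid, it, seg + 1))
        elif kind == "work":
            heapq.heappush(events, (now + dur, tid, it, seg + 1))
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
    return finish


def serial_time(iterations):
    """Serial run time is simply the sum of all measured segments."""
    return sum(dur for segments in iterations for _, dur in segments)


# Hypothetical profile: 8 iterations, each with 9 units of work and a 1-unit critical section.
loop = [[("work", 9.0), ("lock", 1.0)] for _ in range(8)]
for n in (1, 2, 4, 8):
    t = emulate(loop, n)
    print(f"{n} threads: time {t:.1f}, speedup {serial_time(loop) / t:.2f}")
```

On this toy input the emulator estimates 23 time units on 4 threads (a speedup of about 3.5), whereas treating the 10% lock time as Amdahl's serial fraction would predict about 3.1, because critical sections partly overlap with other threads' work.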
Slide 8: Prospector: Parallelizable Section Finder (1/3)
- Q: Is this code section parallelizable?
- Data dependences determine parallelizability.
- Compilers may not do well here because of pointers and complex control flow.
- Our approach: dynamic data-dependence profiling, which provides detailed dependence information for a given input.
- Challenge: the overhead is high, so a smart algorithm is needed.
[Figure: Func1() { Loop1; Loop2; Func2(); } with two sections marked "Parallelizable!"]
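The core idea of dynamic data-dependence profiling can be shown on a recorded memory trace. The sketch below assumes each access arrives in program order as `(iteration, op, address)` and that the trace covers a single loop; it flags a dependence as loop-carried when the conflicting earlier access came from a different iteration. The trace format and function names are hypothetical, and the result is only valid for the input that produced the trace. A real profiler must also handle nested loops and scale to very long traces, which is the overhead challenge noted on the slide.

```python
def loop_carried_dependences(trace):
    """Find dependences that cross loop iterations in a memory-access trace.

    `trace` yields (iteration, op, address) in program order, with op "R" or "W".
    Returns a set of (kind, address) pairs, kind being "RAW", "WAR" or "WAW".
    """
    last_write = {}  # address -> iteration of the most recent write
    readers = {}     # address -> iterations that read it since that write
    deps = set()
    for it, op, addr in trace:
        if op == "R":
            writer = last_write.get(addr)
            if writer is not None and writer != it:
                deps.add(("RAW", addr))  # true dependence on an earlier iteration
            readers.setdefault(addr, set()).add(it)
        elif op == "W":
            writer = last_write.get(addr)
            if writer is not None and writer != it:
                deps.add(("WAW", addr))  # output dependence
            if any(r != it for r in readers.get(addr, ())):
                deps.add(("WAR", addr))  # anti dependence
            last_write[addr] = it
            readers[addr] = set()
        else:
            raise ValueError(f"unknown op: {op!r}")
    return deps


def trace_recurrence(n):
    """Trace of: for i in 1..n-1: a[i] = a[i-1] + 1  (addresses modeled as ("a", index))."""
    for i in range(1, n):
        yield (i, "R", ("a", i - 1))
        yield (i, "W", ("a", i))


def trace_independent(n):
    """Trace of: for i in 0..n-1: b[i] = 2 * a[i]."""
    for i in range(n):
        yield (i, "R", ("a", i))
        yield (i, "W", ("b", i))


print(sorted(loop_carried_dependences(trace_recurrence(4))))  # RAW on a[1] and a[2]
print(loop_carried_dependences(trace_independent(4)))         # no loop-carried dependences
```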
Slide 9: Prospector: Parallelizable Section Finder (2/3)
Mechanisms:
- A dynamic profiler built on instrumentation, which can be either binary-level or source-level.
- At instrumentation (static) time, it analyzes control flow graphs and loop structures.
- At runtime, it observes actual memory addresses, so no points-to analysis is needed.
- The observed addresses are stored and analyzed to discover data dependences.
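Finding loops at instrumentation time is a classic control-flow analysis. One textbook approach, shown below as a sketch, computes dominators iteratively and then treats every edge whose target dominates its source as a back edge; the natural loop of that edge is the header plus every block that can reach the source without passing through the header. The dictionary-based CFG representation and the function names are assumptions for illustration; the slide does not say which algorithm Prospector uses.

```python
def dominators(cfg, entry):
    """Iterative dominator computation. `cfg` maps every block to its successors."""
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}  # start from "everything dominates everything"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            if preds[n]:
                new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            else:
                new = {n}  # unreachable block
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom, preds


def natural_loops(cfg, entry):
    """Map each loop header to the set of blocks in its natural loop(s)."""
    dom, preds = dominators(cfg, entry)
    loops = {}
    for src, succs in cfg.items():
        for header in succs:
            if header in dom[src]:  # back edge: the target dominates the source
                body = {header, src}
                stack = [src] if src != header else []
                while stack:  # walk backwards from the source, stopping at the header
                    block = stack.pop()
                    for p in preds[block]:
                        if p not in body:
                            body.add(p)
                            stack.append(p)
                loops.setdefault(header, set()).update(body)
    return loops


# Hypothetical CFG of `while (cond) { body }`.
cfg = {
    "entry": ["header"],
    "header": ["body", "exit"],
    "body": ["header"],
    "exit": [],
}
print({h: sorted(b) for h, b in natural_loops(cfg, "entry").items()})  # {'header': ['body', 'header']}
```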
Slide 10: Prospector: Parallelizable Section Finder (3/3)
Mechanisms: scalability
- Current tools require too much memory and time to analyze data dependences.
- Prospector implements a new, scalable algorithm for data-dependence profiling.
- Key ideas: compression and parallelization (MICRO '10).
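Compression pays off because many memory accesses in loops are strided. The sketch below illustrates only that compression idea, using a simple greedy stride detector that turns an address stream into `(base, stride, count)` runs. It is not the SD3 algorithm from the MICRO '10 paper, the parallelization half of the idea is not shown, and the function names are hypothetical.

```python
def compress_strides(addresses):
    """Greedily compress an address stream into (base, stride, count) runs."""
    runs = []
    for addr in addresses:
        if runs:
            base, stride, count = runs[-1]
            if count == 1:
                # The second address fixes the run's stride (0 means a repeated address).
                runs[-1] = (base, addr - base, 2)
                continue
            if addr == base + stride * count:
                runs[-1] = (base, stride, count + 1)  # continues the current pattern
                continue
        runs.append((addr, 0, 1))  # start a new run
    return runs


def run_contains(run, addr):
    """Check whether `addr` belongs to a run without expanding it."""
    base, stride, count = run
    if stride == 0:
        return addr == base
    offset = addr - base
    return offset % stride == 0 and 0 <= offset // stride < count


# a[i] for i in 0..99 with 4-byte elements starting at a hypothetical address 1000.
runs = compress_strides(range(1000, 1400, 4))
print(runs)                                                      # [(1000, 4, 100)]
print(run_contains(runs[0], 1396), run_contains(runs[0], 1002))  # True False
```

A loop that walks 100 consecutive 4-byte elements thus shrinks from 100 recorded addresses to one triple, and a later access can be checked against the run without expanding it.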
Slide 11: Prospector: Parallelism Pattern Advisor
- Q: How can I transform the serial code?
- If dependences are easily removable (i.e., embarrassingly parallel loops with some reductions):
  - Guide the parallelization strategy directly, e.g., "Use an OpenMP pragma here."
- If severe dependences exist:
  - Can we give advice on avoiding these dependences?
  - General solutions are extremely hard.
  - Instead, analyze data-dependence patterns, e.g., pipeline parallelism or a certain form of locking.
[Figure: Loop3 { Statements; Lock(); Statements; Unlock(); Statements; }]
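For the easy case on this slide, the advice can be as simple as choosing an OpenMP directive. The sketch below assumes a hypothetical summary produced by the dependence profiler: a mapping from each variable with a loop-carried dependence to the set of update operators seen on it (`None` marks an access that is not a recognized update). With no such variables, the loop is treated as embarrassingly parallel; if every carried variable is updated with a single operator usable in an OpenMP `reduction` clause (`+`, `*`, `min`, `max` here), a reduction clause is suggested; otherwise no simple advice is given. This is an illustration, not Prospector's actual pattern analysis.

```python
# Operators accepted in OpenMP reduction clauses that this sketch recognizes.
REDUCTION_OPS = {"+", "*", "min", "max"}


def advise(carried):
    """Suggest an OpenMP directive from a hypothetical dependence summary.

    `carried` maps each variable with a loop-carried dependence to the set of
    update operators observed on it (None marks an unrecognized access).
    Returns the suggested pragma, or None if no simple advice applies.
    """
    if not carried:
        return "#pragma omp parallel for"  # embarrassingly parallel
    clauses = []
    for var in sorted(carried):
        ops = carried[var]
        if len(ops) != 1:
            return None  # mixed updates: not a simple reduction
        (op,) = ops
        if op not in REDUCTION_OPS:
            return None  # a real dependence; needs pattern analysis instead
        clauses.append(f"reduction({op}:{var})")
    return "#pragma omp parallel for " + " ".join(clauses)


print(advise({}))                # #pragma omp parallel for
print(advise({"sum": {"+"}}))    # #pragma omp parallel for reduction(+:sum)
print(advise({"prev": {None}}))  # None
```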
Slide 12: Prospector: Parallel Architecture Advisor
- Q: Which parallel hardware would be better?
- Can we predict performance on different hardware, e.g., speedups on multicore CPUs and GPGPUs?
- Challenge: more architectural factors need to be modeled.
[Figure: speedup on CPU vs. GPU]
Slide 13: Prospector: Parallel Performance Analyzer
- Q: What causes poor speedup?
- A few profilers exist for this purpose. They:
  - analyze the degree of concurrency, and
  - profile lock contention (wait time).
- However, their information is too low-level for understanding the underlying problems.
- Alternative: macroscopic profiling of parallelized programs, with an alternative form of visualization.
[Figure: Loop3 { Statements; Lock(); Statements; Unlock(); Statements; }]
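As a point of reference for the low-level data such profilers report, the degree of concurrency can be computed from per-thread busy intervals with a simple sweep. The interval format and the function name are assumptions for illustration. The result, a histogram of how long exactly k threads were running plus the average, shows that concurrency is low but not why, which is the limitation the slide points out.

```python
from collections import defaultdict


def concurrency_profile(intervals):
    """Sweep over thread busy intervals.

    `intervals` is a list of (start, end) busy periods, from any number of threads.
    Returns (histogram, average): histogram maps k to the total time during which
    exactly k threads were busy; average is busy thread-time divided by the span.
    """
    events = []
    for start, end in intervals:
        if end > start:
            events.append((start, +1))
            events.append((end, -1))
    events.sort()
    hist = defaultdict(float)
    running = 0
    prev = None
    for t, delta in events:
        if prev is not None and t > prev:
            hist[running] += t - prev  # time spent at the current concurrency level
        running += delta
        prev = t
    span = events[-1][0] - events[0][0] if events else 0
    busy = sum(k * d for k, d in hist.items())
    return dict(hist), (busy / span if span else 0.0)


# Hypothetical timeline: thread A busy 0-10, thread B busy 0-4, thread C busy 2-6.
hist, avg = concurrency_profile([(0, 10), (0, 4), (2, 6)])
print(hist, avg)  # {2: 4.0, 3: 2.0, 1: 4.0} 1.8
```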
Slide 14: Related Work
State-of-the-art tools:
- Parallel Advisor from Intel Parallel Studio 2011
  - Speedup Predictor: cannot model architectures
  - Parallelizable Section Finder: scalability issues
- vfAnalyst from VectorFabric
  - Parallelizable Section Finder: scalability issues
Slide 15: Current Status and Timeline
- June 2010: The initial Prospector idea was presented at HotPar '10.
- Dec 2010: The scalable data-dependence profiling algorithm (for the Parallelizable Section Finder and Pattern Advisor) will be presented at MICRO '10. A beta version will be released as open source, including:
  - Loop-centric profiler
  - Parallelizable Section Finder (i.e., data-dependence profiler)
  - Parallel speedup predictor
- Mar 2011: Parallel Speedup Predictor will be released.
- Aug 2011: First Parallelism Pattern Advisor will be released.
Slide 16: Conclusion
- We need a new type of tool to help parallel programming.
- Prospector is a set of parallel programming advisors based on dynamic program analysis. It:
  - finds good parallelization targets,
  - analyzes serial code to understand its behavior,
  - predicts speedup, and
  - provides advice on code changes.
Slide 17: Thank you! Q&A
References:
- Overall tool architecture: Minjang Kim, Hyesoon Kim, and Chi-Keung Luk. "Prospector: Helping Parallel Programming by A Data-Dependence Profiler." 2nd USENIX Workshop on Hot Topics in Parallelism (HotPar '10), June 2010.
- Scalable data-dependence profiling: Minjang Kim, Hyesoon Kim, and Chi-Keung Luk. "SD3: A Scalable Approach To Dynamic Data-Dependence Profiling." Proceedings of the 43rd IEEE/ACM International Symposium on Microarchitecture (MICRO), December 2010.