Published by Jordon Stamer. Modified over 9 years ago.
1
Intel Parallel Advisor Workflow
David Valentine
Computer Science
Slippery Rock University
2
Parallel Advisor: the goal
Find the sections of your application that, when parallelized, will give you the best performance gains and scalability, while maintaining correct results.
3
Advisor Workflow (from .NET)
“Open Advisor Workflow”
Can also get here from Tools - Intel Advisor XE 2013 - Open Advisor XE Workflow
Simple, 5-step process
– all analysis is done on your serial code
4
Different Builds Needed at Different Steps in the Workflow

Workflow Stage              Release Build    Debug Build
Survey Target               X
(Annotate Sources)
Check Suitability           X
Check Correctness                            X
(Add Parallel Framework)

Release Build Options
– C/C++ > General tab: Debug Info Format > Program Database (/Zi)
– Compiler > Optimization: Optimization > Maximize Speed (/O2); Inline Function Expansion > Only __inline (/Ob1)
– Compiler > Code Generation: Runtime Library > Multi-threaded DLL (/MD) or Multi-threaded Debug DLL (/MDd)
– Linker Options: Generate Debug Info > Yes (/DEBUG)
5
Different Builds Needed at Different Steps in the Workflow

Debug Build Options
– C/C++ > General tab: Debug Info Format > Program Database (/Zi) or (/ZI)
– Compiler > Optimization: Disabled (/Od)
– Compiler > Code Generation: Runtime Library > Multi-threaded DLL (/MD) or Multi-threaded Debug DLL (/MDd)
– Linker > Debugging: Generate Debug Info > Yes (/DEBUG)
6
Workflow Step 1: Survey Target
This “hot spot” tool needs a “Release Mode” configuration along with these Project Properties:
– C/C++
  General tab: Debug info to /Zi or /ZI
  Optimization: Max Speed /O2 & Only __inline /Ob1
  Code Generation: Runtime Library to Multi-threaded DLL /MD or /MDd
– Linker
  Debugging: Generate Debug Info set to YES (/DEBUG)
Build Project
Click “Collect Survey Data”
7
Look at Survey Report
100% of the time is spent in the loop in function trap. Double click to see the code involved.
8
Summary Report
We can see all the time was spent in a single, time-consuming loop. We now have a target to parallelize.
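For orientation, the serial code being surveyed might look roughly like the sketch below. The slides show only the loop, so the integrand f, the upper bound xHi, and the handling of the end points are assumptions made for illustration; the names trap, xLo, width, numIntervals, and area follow the slides.

```cpp
#include <cassert>

// Hypothetical integrand: the slides never show the body of f.
// The integral of 4/(1+x^2) over [0,1] is pi, which makes results easy to check.
double f(double x) { return 4.0 / (1.0 + x * x); }

// Serial trapezoid rule over [xLo, xHi] using numIntervals strips.
// xHi and the end-point handling are assumptions, not taken from the slides.
double trap(double xLo, double xHi, int numIntervals) {
    assert(numIntervals > 0);
    double width = (xHi - xLo) / numIntervals;
    double area = (f(xLo) + f(xHi)) / 2.0;    // the two end points count half
    double x;
    for (int i = 1; i < numIntervals; i++) {  // get the interior points: the hot loop
        x = xLo + i * width;
        area += f(x);
    }
    return area * width;
}
```

Calling trap(0.0, 1.0, 100000) with this assumed integrand should return a value very close to pi, and nearly all of the run time goes to the interior-point loop.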
9
Step 2: Annotate Source
The Workflow Advisor gives us 5 tasks:
1. Specify the Intel Advisor XE Include directory
   a. I prefer to set the .NET IDE for this: Tools - Options - Projects & Solutions - VC++ Directories
   b. Then drop the box “Show directories for” to Include Files
   c. Browse to “C:\Program Files\Intel\Advisor XE 2013\include”
2. Include the annotation definitions
   a. Go to the top of the code (in the #includes) and right click
   b. Select Intel Advisor XE 2013 – Insert Annotation Definitions Reference and the #include will be inserted for you.
3. Specify the library name and directory
10
Checking Suitability
4. Insert the actual Annotations: highlight & right-click

ANNOTATE_SITE_BEGIN(MySite1);
for (int i = 1; i < numIntervals; i++) { // get the interior points
    ANNOTATE_TASK_BEGIN(MyTask1);
    x = xLo + i*width;
    area += f(x);
    ANNOTATE_TASK_END(MyTask1);
}
ANNOTATE_SITE_END(MySite1);
11
Checking Suitability
5. Rebuild project (Release configuration)
The Survey & Suitability tools take a RELEASE build.
The Correctness tool (when we get there) takes a DEBUG build.
12
Suitability Report
We can almost double speed on dual core
But the tasks are VERY small
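One common response to very small tasks is to make each task cover a block of iterations instead of a single one. The sketch below is only an illustration of that idea, not something the slides do: the function name interiorSumChunked, the chunkSize parameter, and the integrand f are hypothetical, and the comments mark where the task annotations would sit.

```cpp
#include <algorithm>
#include <cassert>

double f(double x) { return 4.0 / (1.0 + x * x); }  // hypothetical integrand

// Sums the interior points in blocks of chunkSize iterations, so that each
// block (rather than each iteration) would become one task.
double interiorSumChunked(double xLo, double width, int numIntervals, int chunkSize) {
    assert(chunkSize > 0);
    double area = 0.0;
    for (int start = 1; start < numIntervals; start += chunkSize) {
        // A task annotation would begin here and cover the whole block.
        const int stop = std::min(start + chunkSize, numIntervals);
        double partial = 0.0;
        for (int i = start; i < stop; i++) {
            partial += f(xLo + i * width);
        }
        area += partial;  // one update of the shared total per block
        // The task annotation would end here.
    }
    return area;
}
```

With a larger chunkSize each task does more work per unit of scheduling overhead, at the cost of fewer tasks to balance across cores.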
13
Check Correctness
Rebuild Project with Debug configuration
a. Compiler: Debug (/Zi)
b. Compiler: Optimization Disabled (/Od)
c. Compiler: Code Generation Runtime Library (/MD or /MDd)
d. Linker Debugging: Generate Debug info YES (/DEBUG)
And KNOCK DOWN THE ITERATIONS!
Correctness takes a LONG LONG time
14
We find a data race error
Each thread tries to update “area”; we have a data race. (There is also a bug in Advisor)
15
Fix data race with lock

ANNOTATE_SITE_BEGIN(MySite1);
for (int i = 1; i < numIntervals; i++) { // get the interior points
    ANNOTATE_TASK_BEGIN(MyTask1);
    x = xLo + i*width;
    ANNOTATE_LOCK_ACQUIRE(&area);
    area += f(x); // add the interior value
    ANNOTATE_LOCK_RELEASE(&area);
    ANNOTATE_TASK_END(MyTask1);
}
ANNOTATE_SITE_END(MySite1);
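The lock annotations describe a lock to the Correctness tool; they do not add one to the program. In real OpenMP code, one way to express the same guarded update is a critical section. The sketch below assumes that mapping for illustration; the function name interiorSumLocked and the integrand f are hypothetical.

```cpp
#include <cassert>

double f(double x) { return 4.0 / (1.0 + x * x); }  // hypothetical integrand

// Interior-point sum in which the update of the shared variable is guarded,
// mirroring the lock annotations placed around "area += f(x)".
double interiorSumLocked(double xLo, double width, int numIntervals) {
    assert(numIntervals > 0);
    double area = 0.0;
    #pragma omp parallel for default(none) shared(area, numIntervals, xLo, width)
    for (int i = 1; i < numIntervals; i++) {
        const double x = xLo + i * width;  // declared in the loop body, so private
        const double fx = f(x);            // evaluated outside the guarded region
        #pragma omp critical
        area += fx;                        // only one thread at a time updates area
    }
    return area;
}
```

This removes the race but serializes every update of area, which is costly when the tasks are this small. If the compiler is run without OpenMP support, the pragmas are ignored and the loop simply runs serially.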
16
Run Correctness again
Clean bill of health!
17
Now add Parallel Framework

// default(none): make the newbie list ALL variables
// private(x): each thread has its own x
// shared(numIntervals, xLo, width): all threads share these
// reduction(+:area): threads combine their areas at the end
#pragma omp parallel for default(none) \
    private(x) \
    shared(numIntervals, xLo, width) \
    reduction(+:area)
for (int i = 1; i < numIntervals; i++) { // get the interior points
    x = xLo + i*width; // makes each iteration independent of others
    area += f(x);      // add the interior value ***
}

Will also need to add:
– #include
– Properties - Configuration - C/C++ - Language - OpenMP Support > Yes (/openmp)
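A self-contained version of the loop above might look like the following sketch. The clauses are the ones from the slide; the wrapper function trapParallel, the upper bound xHi, the end-point handling, and the integrand f are assumptions added so that the fragment compiles on its own. The clause explanations sit in ordinary comments because a comment after a line-continuation backslash would break the pragma.

```cpp
#include <cassert>

double f(double x) { return 4.0 / (1.0 + x * x); }  // hypothetical integrand

// Trapezoid rule with the interior-point loop parallelized by OpenMP.
// xHi and the end-point handling are assumptions, not taken from the slides.
double trapParallel(double xLo, double xHi, int numIntervals) {
    assert(numIntervals > 0);
    double width = (xHi - xLo) / numIntervals;
    double area = (f(xLo) + f(xHi)) / 2.0;  // end points count half
    double x;

    // private(x): each thread has its own x.
    // shared(...): read-only inputs that all threads share.
    // reduction(+:area): each thread sums into a private copy, and the
    // copies are added to the original area when the loop ends.
    #pragma omp parallel for default(none) \
        private(x) \
        shared(numIntervals, xLo, width) \
        reduction(+:area)
    for (int i = 1; i < numIntervals; i++) {  // get the interior points
        x = xLo + i * width;                  // each iteration is independent of the others
        area += f(x);                         // add the interior value
    }
    return area * width;
}
```

Unlike a lock around the update, the reduction lets the threads accumulate without waiting on one another. Because the partial sums are combined in a different order than in the serial loop, the result can differ from the serial one in the last few digits.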
18
Watch it run!
100% core usage!
19
Now on to the Nifties…
Please respect the work of colleagues
DO NOT POST SOURCE CODE
Give credit back to the authors
DO NOT POST SOURCE CODE
Feel free to tweak the assignments
DO NOT POST SOURCE CODE