Published by Dustin Moore. Modified over 8 years ago.
1
Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
How Does The Intel® Parallel Advisor Estimate My Program’s Parallel Speedup?
Bevin R Brett, Intel Parallel Advisor team
31 Mar 2011
2
What is the Intel Parallel Advisor?
One of the tools bundled into Intel Parallel Studio.
Analyzes the execution behavior of either a serial or a partially parallelized program.
Helps the user introduce additional parallelism into the program by
1. helping them choose a set of tasks to do in parallel, and
2. helping them find the data races that must be fixed before attempting parallel execution
3
Advisor supports an easy approach to introducing parallelism
Step                    Details
Survey                  Choose a possible site
Suitability             Verify it is suitable
Create a Unit Test      Exercise just the site
Correctness             Any data races? Fix!
Introduce parallelism   TBB? Cilk? OpenMP? …
Measure the benefit     Use the unit test first, then use the real program
Find and fix problems   Data races & performance
4
How does the user interact with Suitability?
Suitability requires
1. Some minor source changes to annotate the proposed sites and tasks, and later to propose adding locks
2. A data collecting run of the program that is usually less than 5% slower than a normal run
3. Some post processing to analyze the collected data
From this, it presents the user with an estimate of whether the annotated sites, tasks, and locks will give a worthwhile speedup of the program on systems with a range of core counts
5
The result displayed within Visual Studio
6
Talking today about how this estimate is made…
This part of Advisor is very mysterious; people (even academics) wonder “how can it do that?” and hence “does it really do that?”.
The point of my talk today is to describe “…the technology behind this estimate, so you can better understand its capabilities and limitations”
7
Previous approaches
Amdahl’s law http://en.wikipedia.org/wiki/Amdahl%27s_law
  1 / ((1-P) + P/S), where P is the fraction of the run time that is sped up and S is the speedup of that fraction
Cilk
  UnloadedWork / LoadedCriticalPathWork
Neither coped with
1. Small work graphs on small numbers of cores
2. Effects of locks that cause partial interference
Observation: If you want to predict a complex system, you have to run a detailed model of it
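The two closed-form estimates on this slide can be written down in a few lines. This is only a sketch: the function names are made up for illustration, and the Cilk-style ratio simply takes the work and critical-path totals as given inputs, without modeling how the slide's "unloaded" and "loaded" figures are measured.

```cpp
#include <cassert>
#include <cmath>

// Amdahl's law: overall speedup when a fraction P of the serial run time
// is sped up by a factor S and the remaining (1 - P) is unchanged.
double amdahl_speedup(double P, double S) {
    assert(P >= 0.0 && P <= 1.0 && S > 0.0);
    return 1.0 / ((1.0 - P) + P / S);
}

// Work divided by critical path: an upper bound on the speedup that any
// number of cores could deliver. (Hypothetical helper name.)
double work_over_critical_path(double work, double critical_path) {
    assert(work > 0.0 && critical_path > 0.0);
    return work / critical_path;
}
```

For instance, amdahl_speedup(0.5, 2.0) is 4/3. Neither formula knows how many tasks there are, how they are packed onto a particular core count, or how locks interfere, which is the gap the slide points at.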
8
Example of simple repeatable case
Do_parallel work(10) | work(5) ;
Do_parallel work(3) | work(3)
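To see why this case is repeatable, it helps to compute its run time directly. The sketch below assumes a greedy scheduler that gives each task, in order, to the least-loaded core, with no framework overhead; both assumptions and all names are illustrative and are not taken from Advisor.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Time to finish one Do_parallel group on `cores` cores, assuming a greedy
// scheduler that hands each task, in order, to the least-loaded core.
double group_time(const std::vector<double>& tasks, int cores) {
    assert(cores >= 1);
    std::vector<double> load(cores, 0.0);
    for (double t : tasks) {
        *std::min_element(load.begin(), load.end()) += t;
    }
    return *std::max_element(load.begin(), load.end());
}

// Groups run one after another (the ';' on the slide), so their times add.
double program_time(const std::vector<std::vector<double>>& groups, int cores) {
    double total = 0.0;
    for (const auto& group : groups) {
        total += group_time(group, cores);
    }
    return total;
}
```

With groups {{10, 5}, {3, 3}} this gives 21 time units on one core and 13 on two or more, a speedup of about 1.6. Every task is parallel, so Amdahl's law with P = 1 would predict a 2x speedup on two cores.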
9
Overview
Building the Model
1. How the collector gets invoked
2. What the collector does
3. How the collected data is turned into a model
Running the Model
Accuracy and Limitations
10
How the collector gets invoked
The user annotates their source code, then rebuilds
  #include "advisor-annotation.h"
  …
  ANNOTATE_SITE_BEGIN(name)
  ..
The ANNOTATE_ macros expand to code that calls functions in libittnotify.dll
This trivial .dll does nothing…
It is only when the Advisor is collecting the data that something happens
11
The collector uses Pin to intercept the calls
The collector is started from within Advisor within Visual Studio, and it runs your startup program.
The collector uses Pin, a dynamic binary instrumentation tool, to instrument the called functions
Learn more about Pin
  http://www.pintool.org/
  http://en.wikipedia.org/wiki/Pin_(Computer_Program)
The called functions are instrumented to call into the Advisor’s collector in a different .dll
12
What the collector does
The collector looks at
1. the parameters to the call
   A static variable hidden in the macro expansion is used to number the annotation
2. the frames on the call stack (first call only)
   Used to provide a stack trace showing where the site/task/acquire is in the user code
3. the QueryPerformanceCounter and/or RDTSC timing information
   Used to collect timing information
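The "static variable hidden in the macro expansion" idea can be illustrated with a small stand-in. Everything here is hypothetical: SKETCH_ANNOTATE, the Event record and the in-memory log are invented for the example, std::chrono::steady_clock is swapped in for QueryPerformanceCounter/RDTSC, and the recording function is called directly rather than being reached through Pin interception.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// One recorded annotation hit: which call site, and when.
struct Event {
    int annotation_id;
    std::int64_t nanoseconds;
};

// In-memory stand-in for the collector's data file.
inline std::vector<Event>& event_log() {
    static std::vector<Event> log;
    return log;
}

// Hands out a fresh number the first time each call site is reached.
inline int next_annotation_id() {
    static int next = 0;
    return next++;
}

inline void record_event(int annotation_id) {
    assert(annotation_id >= 0);
    const auto since_epoch = std::chrono::steady_clock::now().time_since_epoch();
    const auto ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(since_epoch).count();
    event_log().push_back(Event{annotation_id, static_cast<std::int64_t>(ns)});
}

// The static local is initialized once per call site, so every execution of
// the same source line reports the same id.
#define SKETCH_ANNOTATE()                                       \
    do {                                                        \
        static const int sketch_site_id = next_annotation_id(); \
        record_event(sketch_site_id);                           \
    } while (0)
```

A SKETCH_ANNOTATE() inside a loop logs the same id on every iteration, while a second SKETCH_ANNOTATE() elsewhere in the source gets a different id, which is what lets a collector tell annotation sites apart cheaply.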
13
The collected data
A tree of site/task/acquire information is written to a data file
It contains counts and start and end times
The tree is carefully compressed, losing unimportant details
The compression increases exponentially, slowing the write rate as the data grows. This stops the data getting too big
The compression is decreased (the write rate speeds up again) when something new is seen
Aiming at less than 10% of the max disk write rate
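The slide does not say how the compression is implemented. One possible reading, shown purely as an assumption, is a writer that records the 1st, 2nd, 4th, 8th, … repeat of a pattern in full, folds the rest into a count, and resets when a different pattern appears. The DecayingWriter type and its fields are invented for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical writer: full records become exponentially rarer while the
// same kind of event keeps repeating, and frequent again on a new kind.
struct DecayingWriter {
    int last_kind = -1;            // kind of the most recent event
    std::uint64_t seen = 0;        // repeats of the current kind so far
    std::uint64_t next_write = 1;  // write in full when `seen` reaches this
    std::uint64_t folded = 0;      // events kept only as a count

    // Returns true if this event would be written out in full.
    bool observe(int kind) {
        if (kind != last_kind) {   // something new: speed the write rate up
            last_kind = kind;
            seen = 0;
            next_write = 1;
        }
        ++seen;
        if (seen == next_write) {
            next_write *= 2;       // double the gap before the next write
            return true;
        }
        ++folded;
        return false;
    }
};
```

Under this scheme 1000 identical events produce only 10 full records (at repeats 1, 2, 4, …, 512), so output grows logarithmically with the length of a repetitive run.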
14
What the estimator does
This data is used to make a model of the original program execution tree
The emulator runs the model under a variety of assumptions about number of cores, parallel framework overhead, etc.
The results of these emulator runs are used to populate the display
15
The emulator language
Statement ::=
– Compute Statement
– Site Statement
– Task Statement
Compute Statement ::= unlocked_interval {lock&lock_interval; unlock_interval}*repeat unlocked_interval
Site Statement ::= {Statement}
  The Statements are executed sequentially. Any Task Statements are started at the sequential time in the execution of the statement list, but run in parallel, and join at the end of the Site Statement
Task Statement ::= {Statement}
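A much-reduced evaluator for this grammar can make the Site and Task semantics concrete. The sketch below is not Advisor's emulator: it assumes unlimited cores and zero framework overhead, reduces a Compute Statement to a single duration (locks are ignored), and uses invented type and function names. It needs C++17 for the recursive vector member.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

struct Statement {
    enum Kind { Compute, Site, Task } kind;
    double duration;                  // used by Compute only
    std::vector<Statement> children;  // used by Site and Task
};

Statement compute(double d) { return {Statement::Compute, d, {}}; }
Statement site(std::vector<Statement> c) { return {Statement::Site, 0.0, std::move(c)}; }
Statement task(std::vector<Statement> c) { return {Statement::Task, 0.0, std::move(c)}; }

// Both times are measured from the start of the statement list.
struct Span {
    double sequential;  // when the thread running the list itself finishes
    double latest;      // when the last task spawned from the list finishes
};

Span run_body(const std::vector<Statement>& body) {
    Span out{0.0, 0.0};
    for (const Statement& s : body) {
        Span inner = run_body(s.children);
        double finish = std::max(inner.sequential, inner.latest);
        switch (s.kind) {
        case Statement::Compute:
            out.sequential += s.duration;
            break;
        case Statement::Site:   // runs in line; its tasks join at its end
            out.sequential += finish;
            break;
        case Statement::Task:   // starts now, runs beside the rest of the list
            out.latest = std::max(out.latest, out.sequential + finish);
            break;
        }
    }
    return out;
}

// Elapsed time of a Site: its own statements plus the join with its tasks.
double site_time(const Statement& s) {
    Span r = run_body(s.children);
    return std::max(r.sequential, r.latest);
}
```

Two sites run back to back, the first with tasks of 10 and 5 units and the second with two tasks of 3 units, evaluate to 13 units. A real emulator would also have to model a finite core count, scheduling overheads and lock contention, which this sketch leaves out.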
16
How fast is it?
Three stages of data processing
The collection
– Linear in the number of events executed
– Uses exponential decay to cope with long runs
– A few hundred extra instructions per annotation
– So, unless you have made a bad choice, there is no noticeable impact
The emulator program building
– Usually almost linear in the amount of data
– Usually just a few seconds
The emulation of the program under various assumptions
– Uses an exponential compression technique on large counts
– Usually just a few seconds
17
Accuracy
Some parallel programs have very repeatable times
Some parallel programs are chaotic
  Alternative choices by the scheduler can cause arbitrary variations
The Suitability estimator is designed to work well for
1. very repeatable times
2. the lots-of-small-non-interfering cases
3. simple locking interactions
and to get a plausible time in a chaotic case
18
How accurate is it?
We aim to get within 20% of the achieved speedup in the repeatable cases
– ongoing experiments to study accuracy support this number, but are not yet ready to be published
There are factors it is not currently modeling
– when those factors are important, it can miss the 20% target
Some of the factors
– exact parallel framework overheads
– the memory subsystem
19
Example
  #include "advisor-annotation.h"

  // burn(n) stands for n units of work.
  ANNOTATE_SITE_BEGIN(MySite);
  {
      ANNOTATE_TASK_BEGIN(MyTask1);
      burn(10);
      ANNOTATE_TASK_END(MyTask1);
      burn(5);
      for (int i = 0; i < 3; i++) {
          ANNOTATE_TASK_BEGIN(MyTask2);
          burn(5);
          ANNOTATE_TASK_END(MyTask2);
      }
  }
  ANNOTATE_SITE_END(MySite);
20
Summary
The Suitability feature shows the worth of a proposal before it is implemented
Not magic! Just applying modeling to a new realm
When combined
– with Survey to find a good site
– and Correctness to fix issues before going parallel
it makes Intel® Parallel Advisor a great tool for programmers adding parallelism
23
Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products.
BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Atom, Centrino Atom Inside, Centrino Inside, Centrino logo, Cilk, Core Inside, FlashFile, i960, InstantIP, Intel, the Intel logo, Intel386, Intel486, IntelDX2, IntelDX4, IntelSX2, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Inside logo, Intel. Leap ahead., Intel. Leap ahead. logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel Viiv, Intel vPro, Intel XScale, Itanium, Itanium Inside, MCS, MMX, Oplus, OverDrive, PDCharm, Pentium, Pentium Inside, skoool, Sound Mark, The Journey Inside, Viiv Inside, vPro Inside, VTune, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others. Copyright © 2011. Intel Corporation. http://intel.com/software/products