Presentation transcript:

Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
1 How Does The Intel® Parallel Advisor Estimate My Program’s Parallel Speedup?
Bevin R Brett, Intel Parallel Advisor team
31 Mar

2 What is the Intel Parallel Advisor?
One of the tools bundled into Intel Parallel Studio.
Analyzes the execution behavior of either a serial or partially parallelized program.
Helps the user introduce additional parallelism into the program by
1. helping them choose a set of tasks to do in parallel, and
2. helping them find the data races that must be fixed before attempting parallel execution.

3 Advisor supports an easy approach to introducing parallelism
Step                    Details
Survey                  Choose a possible site
Suitability             Verify it is suitable
Create a Unit Test      Exercise just the site
Correctness             Any data races? Fix!
Introduce parallelism   TBB? Cilk? OpenMP? …
Measure the benefit     Use the unit test first, then use the real program
Find and fix problems   Data races & performance

4 How does the user interact with Suitability?
Suitability requires
1. Some minor source changes to annotate the proposed sites and tasks, and later to propose adding locks
2. A data-collecting run of the program that is usually less than 5% slower than a normal run
3. Some post-processing to analyze the collected data
From this, it presents the user with an estimate of whether the annotated sites, tasks, and locks will give a worthwhile speedup of the program on systems with a range of core counts.

5 The result displayed within Visual Studio

6 Talking today about how this estimate is made…
This part of Advisor is very mysterious: people (even academics) wonder “how can it do that?” and hence “does it really do that?”
The point of my talk today is to describe “…the technology behind this estimate, so you can understand better its capabilities and limitations”.

7 Previous approaches
Amdahl’s law: 1 / ((1-P) + P/S), where P is the fraction of the run that is parallelized and S is the speedup of that fraction
Cilk: UnloadedWork / LoadedCriticalPathWork
Neither coped with
1. Small work graphs on small numbers of cores
2. Effects of locks that cause partial interference
Observation: If you want to predict a complex system, you have to run a detailed model of it.
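
As a point of reference, Amdahl's law from this slide can be written as a small function. This is only a sketch of the textbook formula, not anything taken from Advisor; the name `amdahl_speedup` is made up for illustration.

```cpp
#include <cassert>

// Amdahl's law: overall speedup when a fraction p of the serial run time
// is sped up by a factor s and the remaining (1 - p) is left unchanged.
// The function name is illustrative only.
double amdahl_speedup(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

For example, `amdahl_speedup(0.9, 4.0)` evaluates 1 / (0.1 + 0.225), which is about 3.08. The formula sees only two numbers, which is why it cannot account for small work graphs or lock interference.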

8 Example of a simple repeatable case
Do_parallel work(10) | work(5) ;
Do_parallel work(3) | work(3)
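
The arithmetic behind this example can be sketched as follows. The assumptions are mine, not the slide's: the numbers are durations in arbitrary units, there is no framework overhead, each Do_parallel joins before the next one starts, and tasks are handed to whichever core is free first. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Region = std::vector<double>;  // task durations inside one Do_parallel

// Time for one region on `cores` cores: each task goes to the core that
// becomes free first; the region ends when the busiest core finishes.
double region_time(const Region& tasks, int cores) {
    assert(cores >= 1);
    std::vector<double> busy_until(cores, 0.0);
    for (double t : tasks) {
        auto first_free = std::min_element(busy_until.begin(), busy_until.end());
        *first_free += t;
    }
    return *std::max_element(busy_until.begin(), busy_until.end());
}

// Regions run one after another, so their times add up.
double program_time(const std::vector<Region>& regions, int cores) {
    double total = 0.0;
    for (const Region& r : regions) total += region_time(r, cores);
    return total;
}

double speedup(const std::vector<Region>& regions, int cores) {
    return program_time(regions, 1) / program_time(regions, cores);
}
```

With the regions `{10, 5}` and `{3, 3}`, one core takes 10 + 5 + 3 + 3 = 21 and two cores take max(10, 5) + max(3, 3) = 13, a speedup of 21/13, roughly 1.62. Adding more cores does not help, because the longest task in each region sets the floor.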

9 Overview
Building the Model
1. How the collector gets invoked
2. What the collector does
3. How the collected data is turned into a model
Running the Model
Accuracy and Limitations

10 How the collector gets invoked
The user annotates their source code, then rebuilds:
#include "advisor-annotation.h"
…
ANNOTATE_SITE_BEGIN(name) …
The ANNOTATE_ macros expand to code that calls functions in libittnotify.dll.
This trivial .dll does nothing…
It is only when the Advisor is collecting the data that something happens.
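
The idea of a macro that calls a do-nothing function can be sketched as below. Everything here is a hypothetical stand-in: `SKETCH_SITE_BEGIN`, `sketch_site_begin`, and the hook pointer are not the contents of advisor-annotation.h or libittnotify.dll, and swapping a function pointer is only a simple substitute for a collector intercepting the call.

```cpp
#include <cassert>

// Hypothetical stand-ins; not the real annotation header or library API.
using SiteHook = void (*)(const char* site_name);

// Default behaviour: do nothing, like the trivial library in a normal run.
inline void noop_site_hook(const char*) {}

// A collector would redirect the call; here that is modeled by a pointer
// that can be replaced at run time (an assumption made for this sketch).
inline SiteHook& site_begin_hook() {
    static SiteHook hook = noop_site_hook;
    return hook;
}

inline void sketch_site_begin(const char* site_name) {
    assert(site_name != nullptr);
    site_begin_hook()(site_name);
}

// The macro turns the site name into a string and calls the stub.
#define SKETCH_SITE_BEGIN(name) sketch_site_begin(#name)
```

In a normal run `SKETCH_SITE_BEGIN(MySite);` costs one call to an empty function. Assigning a different function to `site_begin_hook()` makes the same annotation start reporting, without rebuilding the annotated program.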

11 The collector uses Pin to intercept the calls
The collector is started from within Advisor within Visual Studio, and it runs your startup program.
The collector uses Pin, a dynamic binary instrumentation tool, to instrument the called functions.
Learn more about Pin
The called functions are instrumented to call into the Advisor’s collector in a different .dll.

12 What the collector does
The collector looks at
1. the parameters to the call
   A static variable hidden in the macro expansion is used to number the annotation
2. the frames on the call stack (first call only)
   Used to provide a stack trace showing where the site/task/acquire is in the user code
3. the QueryPerformanceCounter and/or RDTSC timing information
   Used to collect timing information
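
A rough sketch of the first and third items: a static variable inside the macro expansion gives each annotation in the source its own number, and every execution is stamped with a time. The names are invented for illustration, and `std::chrono::steady_clock` stands in for QueryPerformanceCounter/RDTSC; the real macros and collector are not shown in the slides.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// One collected event: which annotation fired, and when.
struct Event {
    int annotation_id;
    std::chrono::steady_clock::time_point when;
};

inline std::vector<Event>& event_log() {
    static std::vector<Event> log;
    return log;
}

// Hands out 0, 1, 2, ... in order of first execution.
inline int next_annotation_id() {
    static int next = 0;
    return next++;
}

inline void record_event(int annotation_id) {
    assert(annotation_id >= 0);
    event_log().push_back({annotation_id, std::chrono::steady_clock::now()});
}

// Each textual use of the macro owns one static id, set the first time
// that use is executed and reused on every later execution.
#define SKETCH_ANNOTATE()                                   \
    do {                                                    \
        static const int sketch_id = next_annotation_id();  \
        record_event(sketch_id);                            \
    } while (0)
```

An annotation inside a loop logs the same id on every iteration, while a second annotation elsewhere in the source gets a different id. That is enough to tell the events apart without passing any name at run time.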

13 The collected data
A tree of site/task/acquire information is written to a data file.
It contains counts and start and end times.
The tree is carefully compressed, losing unimportant details.
The compression exponentially increases, slowing the write rate as the data grows. This stops the data getting too big.
The compression is decreased (the write rate speeds up again) when something new is seen.
Aiming at less than 10% of the max disk write rate.
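
The slides do not say how the compression works, so the following is only one possible reading, written to show the stated behaviour: repeated events are folded into ever larger records, and detail returns when something new shows up. The class `DecayingWriter`, the doubling stride, and the flat record list (instead of a tree) are all assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Record {
    int kind;            // which annotation the events came from
    std::size_t count;   // how many events were folded into this record
    double total_time;   // sum of their durations
};

class DecayingWriter {
public:
    void add(int kind, double duration) {
        assert(duration >= 0.0);
        if (seen_any_ && kind != pending_.kind) {
            flush();
            stride_ = 1;      // something new: go back to full detail
        }
        seen_any_ = true;
        pending_.kind = kind;
        pending_.count += 1;
        pending_.total_time += duration;
        if (pending_.count >= stride_) {
            flush();
            stride_ *= 2;     // more of the same: halve the write rate
        }
    }

    // Writes out whatever is still being accumulated.
    void flush() {
        if (pending_.count == 0) return;
        records_.push_back(pending_);
        pending_.count = 0;
        pending_.total_time = 0.0;
    }

    const std::vector<Record>& records() const { return records_; }

private:
    bool seen_any_ = false;
    std::size_t stride_ = 1;
    Record pending_{0, 0, 0.0};
    std::vector<Record> records_;
};
```

Under this scheme 1000 identical events become 10 records (holding 1, 2, 4, …, 256 and finally 489 events), so the output grows with the logarithm of the run length. The cost is that individual start and end times inside a folded record are lost.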

14 What the estimator does
This data is used to make a model of the original program execution tree.
The emulator runs the model under a variety of assumptions about number of cores, parallel framework overhead, etc.
The results of these emulator runs are used to populate the display.

15 The emulator language
Statement ::=
– Compute statement
– Site statement
– Task statement
Compute Statement ::= unlocked_interval {lock&lock_interval; unlock_interval}*repeat unlocked_interval
Site Statement ::= {Statement}
The Statements are executed sequentially. Any Task Statements are started at the sequential time in the execution of the statement list, but run in parallel, and join at the end of the Site Statement.
Task Statement ::= {Statement}
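
To make the Site/Task semantics concrete, here is a minimal sketch of an emulator for a cut-down version of this language. It is not Advisor's emulator. It assumes unlimited cores and zero framework overhead, collapses a Compute statement to a single duration (locks are left out), and joins spawned tasks at the end of whichever statement list contains them. All type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

struct Stmt {
    enum Kind { Compute, Site, Task } kind;
    double duration;          // used by Compute only
    std::vector<Stmt> body;   // used by Site and Task
};

Stmt compute(double d) { assert(d >= 0.0); return {Stmt::Compute, d, {}}; }
Stmt site(std::vector<Stmt> b) { return {Stmt::Site, 0.0, std::move(b)}; }
Stmt task(std::vector<Stmt> b) { return {Stmt::Task, 0.0, std::move(b)}; }

// Serial run: every statement's time simply adds up.
double serial_time(const Stmt& s) {
    if (s.kind == Stmt::Compute) return s.duration;
    double total = 0.0;
    for (const Stmt& c : s.body) total += serial_time(c);
    return total;
}

// Emulated finish time of a statement list that starts at `start`.
double run_body(const std::vector<Stmt>& body, double start) {
    double clock = start;       // clock of the thread walking the list
    double last_join = start;   // latest finish among spawned tasks
    for (const Stmt& s : body) {
        switch (s.kind) {
        case Stmt::Compute:
            clock += s.duration;
            break;
        case Stmt::Site:
            clock = run_body(s.body, clock);
            break;
        case Stmt::Task:  // starts now, does not advance the spawner's clock
            last_join = std::max(last_join, run_body(s.body, clock));
            break;
        }
    }
    return std::max(clock, last_join);  // join at the end of the list
}

double parallel_time(const Stmt& s) {
    return s.kind == Stmt::Compute ? s.duration : run_body(s.body, 0.0);
}
```

For a site holding one task of 10, serial work of 5, and then three tasks of 5, `serial_time` gives 30 and `parallel_time` gives 10: the first task runs from 0 to 10, and the three later tasks start at 5 and also end at 10. A model with a fixed core count, overheads, and locks would sit between those two numbers.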

16 How fast is it?
Three stages of data processing
The collection
– Linear in the number of events executed
– Uses exponential decay to cope with long runs
– A few hundred extra instructions per annotation
– So, unless you have made a bad choice, no impact
The emulator program building
– Usually almost linear in the amount of data
– Usually just a few seconds
The emulation of the program under various assumptions
– Uses an exponential compression technique on large counts
– Usually just a few seconds

17 Accuracy
Some parallel programs have very repeatable times.
Some parallel programs are chaotic: alternative choices by the scheduler can cause arbitrary variations.
The Suitability estimator is designed to work well for
1. very repeatable times
2. the lots-of-small-non-interfering cases
3. simple locking interactions
and to get a plausible time in a chaotic case.

18 How accurate is it?
We aim to get within 20% of the achieved speedup in the repeatable cases
– ongoing experiments to study accuracy support this number, not yet ready to be published
There are factors it is not currently modeling
– when those factors are important, it can miss the 20%
Some of the factors
– exact parallel framework overheads
– the memory subsystem
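
One way to check an estimate against this target after the fact is a relative-error test. The slide does not define "within 20%" precisely; this sketch assumes it means the error relative to the achieved speedup, and the function names are made up.

```cpp
#include <cassert>
#include <cmath>

// Error of an estimated speedup, relative to the speedup actually achieved.
double relative_error(double estimated, double achieved) {
    assert(achieved > 0.0);
    return std::fabs(estimated - achieved) / achieved;
}

// True when the estimate lands inside the tolerance (20% by default).
bool within_target(double estimated, double achieved, double tolerance = 0.20) {
    return relative_error(estimated, achieved) <= tolerance;
}
```

An estimate of 3.0 against an achieved 3.5 is off by about 14% and passes; the same estimate against an achieved 2.0 is off by 50% and does not.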

19 Example
ANNOTATE_SITE_BEGIN(MySite);
{
    ANNOTATE_TASK_BEGIN(MyTask1);
    burn(10);
    ANNOTATE_TASK_END(MyTask1);
    burn(5);
    for (int i = 0; i < 3; i++) {
        ANNOTATE_TASK_BEGIN(MyTask2);
        burn(5);
        ANNOTATE_TASK_END(MyTask2);
    }
}
ANNOTATE_SITE_END(MySite);

20 Summary
The Suitability feature shows the worth of a proposal before it is implemented.
Not magic! Just applying modeling to a new realm.
When combined
– with Survey to find a good site
– and Correctness to fix issues before going parallel
it makes Intel® Parallel Advisor a great tool for programmers adding parallelism.

23 Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference
BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Atom, Centrino Atom Inside, Centrino Inside, Centrino logo, Cilk, Core Inside, FlashFile, i960, InstantIP, Intel, the Intel logo, Intel386, Intel486, IntelDX2, IntelDX4, IntelSX2, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Inside logo, Intel. Leap ahead., Intel. Leap ahead. logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel Viiv, Intel vPro, Intel XScale, Itanium, Itanium Inside, MCS, MMX, Oplus, OverDrive, PDCharm, Pentium, Pentium Inside, skoool, Sound Mark, The Journey Inside, Viiv Inside, vPro Inside, VTune, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others.
Copyright © Intel Corporation.