Modeling Parallelism with Intel® Parallel Advisor Paul Petersen
Before We Start All webinars are recorded and will be available on Intel Learning Lab within a week http://www.intel.com/go/learninglab Use the Question Module to ask questions If you have audio issues, give it 5 seconds to resolve. If audio issues persist, we suggest: Drop and reenter the webinar Call into the phone bridge
Where should we begin? You either start with a blank sheet of paper … or you don’t. Almost no one starts from a blank sheet of paper: that approach is useful for writing explicitly parallel kernels and skeletons, but difficult when migrating legacy applications. Everyone else has to first get their ideas organized, usually by expressing a serial algorithm, and then figure out how to express it using parallelism. Almost everyone is worried about how to improve something that already has demonstrated value: if you need to parallelize it, it can’t already be parallel, so it must be serial. This is the assumption of this talk.
Do you really mean that?

Example 1:
    for (int I = 0; I < N; ++I)
        A[I] = B[I] + C[I];
This loop is equivalent to:
    I = 0;
    if (! (I < N)) goto done;
    A[0] = B[0] + C[0]; // I = 0
    ++I;
    A[1] = B[1] + C[1]; // I = 1
    …
    done:

Example 2:
    for (int I = 0; I < N; ++I)
        Work( &A[I] );
This loop is equivalent to:
    I = 0;
    if (! (I < N)) goto done;
    Work( &A[0] ); // I = 0
    ++I;
    Work( &A[1] ); // I = 1
    …
    done:
Or did you really mean this…

Example 1:
    for (int I = 0; I < N; ++I)
        A[I] = B[I] + C[I];
means:
    A[0…N-1] = B[0…N-1] + C[0…N-1]

Example 2:
    for (int I = 0; I < N; ++I)
        Work( &A[I] );
means:
    foreach X in A[0…N-1]
        Work( &X );
or even…
    Work( &A[0…N-1] )
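For concreteness, the aggregate form in Example 1 maps almost directly onto Intel® Cilk™ Plus array notation. A minimal sketch, assuming a compiler with Cilk Plus support (such as the Intel C++ compiler) and arrays A, B, C each of length N:

    // Whole-array add expressed as data intent, with no serial iteration order.
    // A[start:length] denotes the section of A starting at start with the given length.
    A[0:N] = B[0:N] + C[0:N];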
How do we bridge this gap? Say what you mean. Serial languages force you to specify more than what you want to do, especially “legacy” languages, which are the bulk of our existing software. Developers have an abstract concept of what they want to express, but existing implementation languages often don’t allow that concept to be expressed: foreach <VAR> in <COLLECTION>, random/parallel iteration, array notation, aggregate operations. This implies a new language to express what you want to say, yet all of your algorithms are coded in a “legacy” language. How do we bridge this gap?
Digression: debugging What is still claimed to be the #1 debugging tool in use today? A “print” statement. Inserting a “print” statement into your serial program typically does not change its behavior, but it does allow you to observe what is happening. We can use this same approach to understand the “parallelism potential” present in your existing serial implementation.
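As a minimal illustration of the idea (hypothetical code, not from the product), a trace print inserted into the serial loop records what each iteration touches without changing what the loop computes:

    #include <stdio.h>

    void Work( int *X );   /* as in the earlier examples */

    void Processing( int *A, int N ) {
        for (int I = 0; I < N; ++I) {
            /* Observation only: record which element this iteration uses. */
            fprintf( stderr, "iteration %d touches &A[%d]\n", I, I );
            Work( &A[I] );
        }
    }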
Intel® Parallel Advisor A product in Intel® Parallel Studio 2011, delivered as a plug-in for Microsoft* Visual Studio. A design tool that assists in making good decisions to transform a serial algorithm to use multi-core hardware. A serial modeling tool that uses annotated serial code to calculate what might happen if that code were executed in parallel as specified by the annotations. A methodology and workflow to help you learn where to use parallel programming.
Step: Survey Target & Annotate Sources
Step: Check Suitability
Step: Check Correctness
Bridging the gap Annotations are statements that mark up existing algorithms to express the parallel model requested. Intel® Parallel Advisor accomplishes this with a core set of modeling annotations: SITE (where should I focus), TASK (what should I do), LOCK (it really is serial). Annotations can be considered as if they are “print” statements to a special trace file.
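To make the “print statement” analogy concrete, you can think of each annotation as appending an event to a trace that a tool could later replay “as if parallel”. A purely hypothetical sketch (the MODEL_* macros and modelTrace below are illustrative inventions, not the actual advisor-annotate.h implementation):

    #include <stdio.h>

    // Hypothetical stand-ins for the modeling annotations: each one just
    // records an event; none of them changes the serial computation.
    static FILE *modelTrace;
    #define MODEL_SITE_BEGIN(name)  fprintf( modelTrace, "SITE_BEGIN %s\n", #name )
    #define MODEL_SITE_END(name)    fprintf( modelTrace, "SITE_END %s\n",   #name )
    #define MODEL_TASK_BEGIN(name)  fprintf( modelTrace, "TASK_BEGIN %s\n", #name )
    #define MODEL_TASK_END(name)    fprintf( modelTrace, "TASK_END %s\n",   #name )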
Annotation: SITE (where should I focus) Intel® Parallel Advisor uses a SITE to: Define a name for a section of the application. Declare that all interesting things to analyze occur inside this section. Declare that any parallelism declared inside will be finished before the section exits.
Example: SITE

Annotations are just statements with no visible side effects.

    #include <advisor-annotate.h>
    …
    ANNOTATE_SITE_BEGIN(MySite);
    Work( &A[0] );
    Work( &A[1] );
    ANNOTATE_SITE_END(MySite);
Annotation: TASK (what should I do) Intel® Parallel Advisor uses a TASK to: Define a name for a section of the application. Declare that the statements in this section could be executed immediately or deferred. Declare that the statements in this section can overlap (concurrent/parallel) execution with other statements in the enclosing SITE.
Example: TASK

Annotations can partition the serial execution to be a specification for parallel execution.

    #include <advisor-annotate.h>
    …
    ANNOTATE_TASK_BEGIN(MyTask0);
    Work( &A[0] );
    ANNOTATE_TASK_END(MyTask0);
    ANNOTATE_TASK_BEGIN(MyTask1);
    Work( &A[1] );
    ANNOTATE_TASK_END(MyTask1);
Annotation: LOCK (it really is serial) Intel® Parallel Advisor uses a LOCK to: Define a (non-unique) name for a section of the application. Declare that sections with this name are not allowed to execute in parallel; they must be serialized.
Example: LOCK

    #include <advisor-annotate.h>
    …
    ANNOTATE_LOCK_ACQUIRE(0);
    globalCounter = globalCounter + 1;
    ANNOTATE_LOCK_RELEASE(0);

A LOCK is a concept familiar to developers, but it really means a serialized section of code. This annotation may be implemented with other, non-lock-based synchronization mechanisms, such as atomic variables.
What about loops?

The combination of a SITE + TASK using a serial looping construct can be used to model a parallel loop.

    #include <advisor-annotate.h>
    …
    ANNOTATE_SITE_BEGIN(MyLoopSite);
    for (int I = 0; I < N; ++I) {
        ANNOTATE_TASK_BEGIN(MyLoopIteration);
        Work( &A[I] );
        ANNOTATE_TASK_END(MyLoopIteration);
    }
    ANNOTATE_SITE_END(MyLoopSite);
Step: Add Parallel Framework The final step in the Parallel Advisor workflow is to convert the modeling annotations into usage of a parallel framework. Examples in two parallel frameworks are used: Intel® Cilk™ Plus and Intel® Threading Building Blocks (Intel® TBB). Converting a parallel model into an explicitly parallel implementation is straightforward: each parallel framework you might use has concepts similar to a SITE, TASK, or LOCK, and learning how common parallel framework features match annotations allows easy conversion to an explicitly parallel implementation.
Parallel Framework: Intel® Cilk™ Plus

Specialized syntax can represent our parallel models efficiently.

Annotated model:
    #include <advisor-annotate.h>
    …
    ANNOTATE_SITE_BEGIN(MyLoopSite);
    for (int I = 0; I < N; ++I) {
        ANNOTATE_TASK_BEGIN(MyLoopIteration);
        Work( &A[I] );
        ANNOTATE_TASK_END(MyLoopIteration);
    }
    ANNOTATE_SITE_END(MyLoopSite);

Cilk Plus implementation:
    cilk_for (int I = 0; I < N; ++I)
        Work( &A[I] );
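A minimal self-contained sketch of the converted loop (assumptions: the cilk_for keyword comes from <cilk/cilk.h> and requires a compiler with Cilk Plus support; Work and Processing are the names used throughout these slides):

    #include <cilk/cilk.h>

    void Work( int *X );   /* as in the earlier examples */

    void Processing( int *A, int N ) {
        // Each iteration becomes a task; the loop as a whole plays the role
        // of the site, so all iterations finish before Processing returns.
        cilk_for (int I = 0; I < N; ++I)
            Work( &A[I] );
    }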
Parallel Framework: Intel® Cilk™ Plus (cont.)

Exploit unique parallel framework features.

Annotated model:
    #include <advisor-annotate.h>
    …
    ANNOTATE_LOCK_ACQUIRE(0);
    std::cout << A[I] << std::endl;
    ANNOTATE_LOCK_RELEASE(0);

Cilk Plus implementation:
    #include <cilk/reducer_ostream.h>
    …
    cilk::reducer_ostream cout_reducer(std::cout);
    cout_reducer << A[I] << std::endl;
Parallel Framework: Intel® Threading Building Blocks (Intel® TBB)

Libraries can also represent our parallel models efficiently.

Annotated model:
    #include <advisor-annotate.h>
    …
    ANNOTATE_SITE_BEGIN(MyLoopSite);
    for (int I = 0; I < N; ++I) {
        ANNOTATE_TASK_BEGIN(MyLoopIteration);
        Work( &A[I] );
        ANNOTATE_TASK_END(MyLoopIteration);
    }
    ANNOTATE_SITE_END(MyLoopSite);

Serial version with the loop body rewritten as a lambda (an intermediate step):
    #include <stdio.h>

    void Work( int *X ) { printf( "%p\n", (void *)X ); }

    void Processing( int *A, int N ) {
        for (int K = 0; K < N; ++K)
            [=](int I) { Work( &A[I] ); } (K);
    }

    int main( int argc, char **argv ) {
        const int N = 10;
        int A[N];
        for (int K = 0; K < N; ++K)
            Work( &A[K] );
        printf( "\n" );
        Processing( A, N );
    }

TBB implementation:
    #include "tbb/parallel_for.h"
    …
    tbb::parallel_for( 0, N, 1, [=](int I) { Work( &A[I] ); } );
Parallel Framework: Intel® TBB (cont.)

Generic models can translate to specialized implementations.

Annotated model:
    #include <advisor-annotate.h>
    …
    ANNOTATE_LOCK_ACQUIRE(0);
    globalCounter = globalCounter + 1;
    ANNOTATE_LOCK_RELEASE(0);

TBB implementation:
    #include "tbb/atomic.h"
    …
    tbb::atomic<int> globalCounter;
    ++globalCounter;
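Putting the two TBB conversions together, a minimal self-contained sketch (the main function, the value of N, and the final printf are illustrative additions, not from the slides):

    #include "tbb/parallel_for.h"
    #include "tbb/atomic.h"
    #include <cstdio>

    // Shared counter; tbb::atomic makes ++ a single atomic increment,
    // so no explicit lock is needed around the update.
    tbb::atomic<int> globalCounter;

    int main() {
        const int N = 1000;
        tbb::parallel_for( 0, N, 1, [](int) {
            ++globalCounter;   // replaces the LOCK-annotated update
        } );
        std::printf( "%d\n", (int)globalCounter );
        return 0;
    }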
Conclusion Sequential code can be understood in two ways: as an exact description of how a program must execute, or as a specification of the computations that must happen. Intel® Parallel Advisor uses the second interpretation by introducing modeling annotations that can be embedded into your sequential application. These annotations allow you to say precisely where and how the sequential execution of your application is over-constrained, and what flexibility you are willing to utilize to harness parallel execution.
Intel® Parallel Studio 2011 More information about Parallel Studio and Parallel Advisor is available online, including a 30-day free trial: www.intel.com/go/parallel Supports Microsoft Visual Studio* 2005, 2008 and 2010.
Optimization Notice Optimization Notice Intel compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the “Intel Compiler User and Reference Guides” under “Compiler Options." Many library routines that are part of Intel® compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors. Intel compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel Streaming SIMD Extensions 2 (Intel SSE2), Intel Streaming SIMD Extensions 3 (Intel SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not. Notice revision #20110228
Legal Disclaimer http://intel.com/software/products INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products. BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Atom, Centrino Atom Inside, Centrino Inside, Centrino logo, Cilk, Core Inside, FlashFile, i960, InstantIP, Intel, the Intel logo, Intel386, Intel486, IntelDX2, IntelDX4, IntelSX2, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Inside logo, Intel. Leap ahead., Intel. Leap ahead. logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel Viiv, Intel vPro, Intel XScale, Itanium, Itanium Inside, MCS, MMX, Oplus, OverDrive, PDCharm, Pentium, Pentium Inside, skoool, Sound Mark, The Journey Inside, Viiv Inside, vPro Inside, VTune, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Copyright © 2011. Intel Corporation. http://intel.com/software/products