Software & Services Group Developer Products Division Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.

Building Parallel Applications Using Guided Auto-parallelization
Om P Sachan
Intel Compiler and Languages

Optimization Notice
Intel compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the "Intel Compiler User and Reference Guides" under "Compiler Options." Many library routines that are part of Intel compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors. Intel compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.
While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not. Notice revision #
Agenda
Introduction to Guided Auto-parallelization
Run Guided Auto-parallelization
Analyze Guided Auto-parallelization reports
Implement Guided Auto-parallelization recommendations

4/6/2010

Parallelization in Mainstream
Performance gains coming from more cores per die
  – Increasing clock frequencies play a smaller role
Exposes parallelism to the programmer
Every computer is a parallel computer
  – Implies most programs must execute in parallel
Parallelism successful in HPC, servers, graphics, ...
  – Not widespread in the client domain
Client apps focused on
  – Quality user experience
  – Scalability
  – Programmer productivity (critical for time-to-market)
Development of multi-threaded apps is hard
Need for a low-cost and effective way of threading apps

Parallelization in Mainstream
Requires multi-pronged approach:
  – Simpler parallel programming models and abstractions
  – Domain-specific parallel libraries
  – Compiler auto-parallelization, auto-vectorization, and data-transformation
  – Advise user on how to parallelize
  – Good debugging tools
  – Easy-to-use tools for performance analysis
Tradeoffs between scalability and productivity
Compiler can play an important role in enabling parallelism

Workflow with Compiler as a Tool
[Workflow diagram]
Application Source (C/C++/Fortran) -> Compiler -> Application Binary + Opt Reports
Application Binary + Opt Reports -> Performance Tools (identify hotspots, problems) -> Application Source + Hotspots
Application Source + Hotspots -> Compiler in advice-mode -> Advice messages
Advice messages -> Modified Application Source
Modified Application Source -> Compiler (extra options) -> Improved Application Binary
Simplifies programmer effort in application tuning

Compiler as a Tool
Use compiler as a tool to give selective advice
Initially targets:
  – Automatic parallelization of loop-nests
  – Automatic vectorization of inner-loops
  – Data transformation suggestions
Programmer writes serial code, then follows the compiler advice to assert new properties
  – Does not require a lot of extra time and effort from user
Code remains performance-portable
Programmer reasons about application properties
Tool based on expertise of common pitfalls
  – Conservative disambiguation assumptions
  – Compiler assumes upper-bound is changing inside loop
  – ...

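One kind of data-transformation suggestion is splitting a structure so that the fields a hot loop reads sit apart from rarely used ones. The sketch below is an illustration only: `Body`, `BodyHot`, `BodyCold`, and both functions are hypothetical names, and it makes no claim about which code the compiler would actually flag or what performance change would result.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the hot loop reads only `mass`, but each element also carries
   56 bytes of rarely used data that travel through the cache with it. */
typedef struct {
    double mass;
    char   label[56];
} Body;

double total_mass_before(const Body *b, size_t n)
{
    assert(b != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += b[i].mass;
    return sum;
}

/* After: the structure is split into two. The hot field is stored
   contiguously; the cold field lives in a parallel array with the
   same index. */
typedef struct { double mass; } BodyHot;
typedef struct { char label[56]; } BodyCold;

double total_mass_after(const BodyHot *hot, size_t n)
{
    assert(hot != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += hot[i].mass;
    return sum;
}
```

Both functions compute the same sum; only the memory layout differs. Whether the split is acceptable depends on how the rest of the application uses the structure, which is the part the programmer has to reason about.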
How it Works
Targeted for Mainstream and HPC Users
Advice may involve
  – suggestions for source-change
  – adding pragmas
  – adding new options
Simple source changes that assert new properties
  – Add a new pragma for loop if semantics are satisfied
  – Use a local-variable for the upper-bound of a loop
  – Initialize scalar variable unconditionally at top of loop
  – Reorder fields of a structure (or split into two)
Desired behavior
  – Each advice is specific using source-level variable names
  – User does semantic analysis: apply or reject each advice
  – Advice should be as localized as possible
  – Following the advice should result in better optimizations

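As an illustration of "initialize scalar variable unconditionally at top of loop", here is a minimal sketch. The function names and the `flag` array are hypothetical, and the rewrite is only semantics-preserving under the stated assumption that `flag[i]` is nonzero only where `a[i] > 0`; checking that assumption is the programmer's job.

```c
#include <assert.h>
#include <stddef.h>

/* Before: t is assigned only on some iterations, so as far as the
   compiler can tell, out[i] may receive a value of t left over from an
   earlier iteration. That possible carry-over blocks parallelization. */
void scale_before(const float *a, const int *flag, float *out, size_t n)
{
    assert(a != NULL && flag != NULL && out != NULL);
    float t = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        if (a[i] > 0.0f)
            t = 2.0f * a[i];
        if (flag[i])
            out[i] = t;
    }
}

/* After: t is initialized unconditionally at the top of every iteration,
   so no value can flow from one iteration to the next.
   ASSUMPTION (must be verified by the programmer): flag[i] is nonzero
   only where a[i] > 0, so the stale value was never actually used. */
void scale_after(const float *a, const int *flag, float *out, size_t n)
{
    assert(a != NULL && flag != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        float t = 0.0f;
        if (a[i] > 0.0f)
            t = 2.0f * a[i];
        if (flag[i])
            out[i] = t;
    }
}
```

If the assumption does not hold, the two versions produce different results, which is why each advice has to be applied or rejected after semantic analysis.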
Vectorization Example

    void mul(NetEnv* ne, Vector* rslt, Vector* den,
             Vector* flux1, Vector* flux2, Vector* num)
    {
        float *r, *d, *n, *s1, *s2;
        int i;
        r = rslt->data;
        d = den->data;
        n = num->data;
        s1 = flux1->data;
        s2 = flux2->data;
        for (i = 0; i < ne->len; ++i) {
            r[i] = s1[i] * s2[i] + n[i] * d[i];
        }
    }

Advice messages:

Create an assignment statement to store the upper-bound (ne->len) of loop at line 29 to a local variable if this does not alter program semantics. [VERIFY] Make sure that the upper-bound does not change during the execution of the loop.

Use "#pragma ivdep" to vectorize the loop at line 29, if these arrays in the loop do not have unsafe cross-iteration dependencies: r, s1, s2, n, d. [VERIFY] A cross-iteration dependency exists if a memory location is modified in an iteration of a loop and accessed (a read or a write) in another iteration of the loop. Make sure that there are no such dependencies, or that any cross-iteration dependencies can be safely ignored.

The compiler guides the user on the source change, on which pragma to insert, and on how to determine whether that pragma is correct for this case.

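Applying both pieces of advice to the example could look like the sketch below. The `NetEnv` and `Vector` definitions are hypothetical stand-ins that contain only the fields the loop uses, since the slide does not show the real types. `#pragma ivdep` is an Intel compiler pragma; other compilers typically ignore it, possibly with a warning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the slide's types (assumption). */
typedef struct { int len; } NetEnv;
typedef struct { float *data; } Vector;

void mul(NetEnv *ne, Vector *rslt, Vector *den,
         Vector *flux1, Vector *flux2, Vector *num)
{
    assert(ne != NULL && ne->len >= 0);

    float *r  = rslt->data;
    float *d  = den->data;
    float *n  = num->data;
    float *s1 = flux1->data;
    float *s2 = flux2->data;
    int i;

    /* Advice 1: store the upper bound in a local variable so it is
       visibly loop-invariant. Valid only if nothing executed inside
       the loop changes ne->len. */
    const int len = ne->len;

    /* Advice 2: assert that r, s1, s2, n, d have no unsafe
       cross-iteration dependencies. This is a promise made by the
       programmer, not something the compiler checks. */
#pragma ivdep
    for (i = 0; i < len; ++i) {
        r[i] = s1[i] * s2[i] + n[i] * d[i];
    }
}
```

Both changes are assertions about the program. If `rslt->data` overlaps one of the input arrays at a shifted offset, the pragma is incorrect and the vectorized loop may compute different results.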
Activity 1
Prepare and run sample code
Use lab document

Usage Model
Two main usage models:
  – Users compiling with auto-parallelization enabled
  – Users compiling with no auto-parallelization, who can still gain from improved vectorization
User can specify regions of a file or routine that are considered hot
  – Advice will be restricted to the hot region
  – Default is to provide advice on entire compilation-unit
Under guide-mode, no executable code is generated
  – Only output is a set of advice messages
User not required to use advanced options (IPO, PGO), but advice may change based on options
User may apply all (or a subset) of the advice
  – Recompile in normal-mode enables better optimizations

Usage Model (contd.)
Advice targeted only for improving application performance
  – Use tool during the perf-tuning part of the software development cycle
Each advice has a VERIFY part
  – User is responsible for checking whether it is safe to apply each suggestion
User not required to use advanced options (IPO, PGO)
  – When IPO is ON in guide-mode, advice is emitted as part of the link step
There may be multiple messages targeting the same loop
  – User has to apply ALL of them to get the desired optimization
Default debug mode generates no GAP messages
  – /Zi implies /Od; override by adding /O2 explicitly

Limitations
User may have to deal with lots of messages
  – Duplicate messages
  – If no hot region is specified
User is responsible for semantic verification, so bugs are possible
  – Adding an ivdep pragma in a loop is an assertion by the user
  – May lead to errors if user is not diligent with the verification
  – Good documentation with examples can help mitigate this
More vector/par-loops do not always guarantee perf gains
Tool does not guide the user on how to write parallel code
Not a general purpose mechanism to achieve maximum perf
  – Turning on GAP will not vectorize EVERY loop
  – Only a subset where compiler can do an intelligent workaround
Not a panacea for all problems related to parallelization

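To make the verification burden concrete, the sketch below contrasts a loop where the ivdep assertion can hold with one where it cannot. All names are hypothetical. `ranges_overlap` is one possible debug-build check, not part of GAP, and it relies on comparing pointers converted to `uintptr_t`, which is implementation-defined but works on common flat-memory platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the n-float ranges starting at p and q share any byte. */
int ranges_overlap(const float *p, const float *q, size_t n)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t b = (uintptr_t)q;
    uintptr_t bytes = (uintptr_t)(n * sizeof(float));
    return a < b + bytes && b < a + bytes;
}

/* Each dst[i] depends only on x[i] and y[i]. If dst does not overlap
   the inputs, no iteration touches memory used by another one, so the
   assertion made by the pragma is true. */
void add_arrays(float *dst, const float *x, const float *y, size_t n)
{
    /* Debug-build check of the property the pragma asserts
       (conservative: it also rejects exact in-place use). */
    assert(!ranges_overlap(dst, x, n));
    assert(!ranges_overlap(dst, y, n));
#pragma ivdep
    for (size_t i = 0; i < n; ++i)
        dst[i] = x[i] + y[i];
}

/* a[i] reads a[i-1], which the previous iteration wrote: a real
   cross-iteration dependency. Adding ivdep here would be an error. */
void running_sum(float *a, size_t n)
{
    for (size_t i = 1; i < n; ++i)
        a[i] += a[i - 1];
}
```

The check in `add_arrays` disappears when `NDEBUG` is defined, so it documents and tests the assumption during development without affecting the tuned build.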
How to Use GAP
Targeting Windows and Linux (IA32 & Intel64)
With normal options for the app (-O2 and above), add:
  – -Qguide:3 (Mainstream)
  – -Qguide:4 (HPC)
No code generation in gap-mode (no executable generated)
Can be used with and without the -Qparallel option

Activity 2
Implementing Guided Auto-parallelization recommendations, using sample code
Use lab document

Summary
Learned about Guided Auto-parallelization
Analyzed Guided Auto-parallelization reports
Implemented Guided Auto-parallelization recommendations

Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference

BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Atom, Centrino Atom Inside, Centrino Inside, Centrino logo, Cilk, Core Inside, FlashFile, i960, InstantIP, Intel, the Intel logo, Intel386, Intel486, IntelDX2, IntelDX4, IntelSX2, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Inside logo, Intel. Leap ahead., Intel. Leap ahead. logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel Viiv, Intel vPro, Intel XScale, Itanium, Itanium Inside, MCS, MMX, Oplus, OverDrive, PDCharm, Pentium, Pentium Inside, skoool, Sound Mark, The Journey Inside, Viiv Inside, vPro Inside, VTune, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others.
Copyright © Intel Corporation. Intel Confidential