Intel® Composer XE for HPC customers July 2010 Denis Makoshenko, Intel, SSG.

Copyright © 2010, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.

Intel® C++ and Fortran Composer XE
–Most comprehensive multicore and standards support: OpenMP* 3.0, auto-vectorization, auto-parallelization, parallel valarray, “parallel lint”
–Advanced compilers and libraries support Intel and compatible processors in a single binary

Intel® C++ Composer XE components
–Intel® C++ Compiler XE 12.0
–Intel® Debugger with parallel debugging support (Linux*)
–Intel® Parallel Debugger Extension (Windows*) 12.0
–Intel® Math Kernel Library (Intel® MKL) 10.3
–Intel® Integrated Performance Primitives (Intel® IPP) 7.0
–Intel® Threading Building Blocks (Intel® TBB) 3.0

Intel® Composer XE for Fortran components
–Intel® Fortran Compiler XE (Linux*, Mac OS*)
–Intel® Visual Fortran Compiler XE (Windows*) 12.0
–Intel® Debugger with parallel debugging support (Linux*)
–Intel® Parallel Debugger Extension (Windows*) 12.0
–Intel® Math Kernel Library (Intel® MKL) 10.3

Important Notes
–Some of the names used here for features of the future compiler and library products are not finalized yet
–Feature set and functionality might change (slightly) for the final product version

Intel® Compiler Architecture
–C++ Front End and FORTRAN Front End
–Profiler
–Disambiguation: types, array, pointer, structure, directives
–Interprocedural analysis and optimizations: inlining, constant prop, whole program detect, mod/ref, points-to
–Loop optimizations: data deps, prefetch, vectorizer, unroll/interchange/fusion/dist, auto-parallel/OpenMP
–Global scalar optimizations: partial redundancy elim, dead store elim, strength reduction, dead code elim
–Code generation: instruction selection, scheduling, register allocation, code generation

Interprocedural Optimization
Extends optimizations across file boundaries
–Without IPO: file1.c, file2.c, file3.c and file4.c are each compiled and optimized separately
–With IPO: file1.c, file2.c, file3.c and file4.c are compiled and optimized together
–-ip: only between modules of one source file
–-ipo: modules of multiple files/whole application

Profile-Guided Optimizations (PGO)
Use execution-time feedback to guide (final) optimization
Helps I-cache, paging, branch-prediction
Enabled optimizations:
–Basic block ordering
–Better register allocation
–Better decision on which functions to inline
–Function ordering
–Switch-statement optimization

PGO Usage: Three Step Process
–Step 1, instrumented compilation: icc -prof_gen prog.c
  Result: instrumented executable prog.exe
–Step 2, instrumented execution: prog.exe (on a typical dataset)
  Result: DYN file containing dynamic info (.dyn)
–Step 3, feedback compilation: icc -prof_use prog.c
  Uses: merged DYN summary file (.dpi)
Delete old .dyn files unless you want their info included.
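As a rough illustration of where the three steps pay off, the sketch below shows a switch whose case frequencies are only known at run time, which is the kind of code the switch-statement and basic-block-ordering optimizations target. The function names and the opcode values are hypothetical; the build commands in the comment are the ones from the slide.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical hot path. Build with PGO following the three steps above:
 *   step 1: icc -prof_gen prog.c   (instrumented compilation)
 *   step 2: run the instrumented executable on a typical dataset
 *   step 3: icc -prof_use prog.c   (feedback compilation)
 */
static int weight_of(int opcode)
{
    /* Which case dominates depends on the input data, not on the source. */
    switch (opcode) {
    case 0:  return 1;
    case 1:  return 4;
    case 2:  return 9;
    default: return 0;
    }
}

long total_weight(const int *opcodes, size_t n)
{
    long sum = 0;
    assert(opcodes != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        sum += weight_of(opcodes[i]);
    return sum;
}
```

The profile only reflects the dataset used in step 2, so a workload that is not representative can steer the feedback compilation the wrong way.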

Some Generic Features
–Compatibility with standards (ANSI C, ISO C++, ANSI C99, Fortran95, Fortran2003)
–Compatibility with leading open-source tools (ICC vs. GCC, IDB vs. GDB, ICL to CL, …)
–OpenMP support and Automatic Parallelization
–Sophisticated optimizations: profile-guided optimization, multi-file inter-procedural optimization
–Detailed compilation report generation
–Support of other Intel® tools

A few General Switches

Linux* switch   Functionality
-O0             Disable optimization
-O1             Optimize for speed (no code size increase)
-O2             Optimize for speed (default)
-O3             High-level optimizer (e.g. loop unroll)
-fast           Aggressive optimizations (e.g. -ipo, -O3, -no-prec-div, -static, -xHost for x86 Linux*)
-g              Create symbols for debugging
-S              Generate assembly files
-opt-report     Optimization report generation
-openmp         OpenMP support
-parallel       Automatic parallelization for OpenMP* threading

Vectorization for x86: -xSSE2 is default.

Optimization Report Options
–opt_report: generate an optimization report to stderr (or file)
–opt_report_file: specify the filename for the generated report
–opt_report_phase: specify the phase that reports are generated against
–opt_report_routine: report on routines containing the given name
–opt_report_help: display the optimization phases available for reporting
–vec-report: generate a vectorization report

New Features of Intel® Composer XE
–GAP – Guided Automatic Parallelization
–SIMD Directives: provide additional information to the compiler to enable vectorization of loops
–Loop Profiler. The report file contains:
  –Average, minimum, maximum iteration count of loops
  –Call count of routines
  –Self-time and total-time of functions / loops
–Static Security Analyzer: tool based on compiler technology to statically verify a program
–C/C++ specific: Vector Notation, Cilk, C++0x Features
–Fortran specific: F2003 Status, CAF, DO-CONCURRENT

Memory Reference Disambiguation
Options/Directives related to Aliasing
–-alias_args[-]
–-ansi-alias[-]
–-fno-alias: no aliasing in whole program
–-fno-fnalias: no aliasing within single units
–-restrict (C99): -restrict and the restrict attribute enable selective pointer disambiguation
There are many more options, different for Windows* and Linux*, different for C/C++ and Fortran.
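A minimal sketch of selective pointer disambiguation with the C99 restrict qualifier follows. The function scale_add is a hypothetical example, not taken from the slides; the qualifier tells the compiler that the two arrays do not overlap, so the loop carries no dependence through memory.

```c
#include <assert.h>
#include <stddef.h>

/*
 * restrict is a promise by the caller that dst and src do not overlap.
 * Without it the compiler has to assume a store to dst[i] might change
 * a later src[j], which can block vectorization of the loop.
 */
void scale_add(float *restrict dst, const float *restrict src,
               float k, size_t n)
{
    /* Cheap sanity check of the promise; full overlap is not detected. */
    assert(dst != src || n == 0);
    for (size_t i = 0; i < n; ++i)
        dst[i] += k * src[i];
}
```

Unlike -fno-alias, which asserts the absence of aliasing for the whole program, the qualifier applies only to the pointers that carry it, so the promise can be reviewed function by function. Calling such a function with overlapping arrays is undefined behavior.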

GAP – Guided Automatic Parallelization
Key design ideas:
–Use the compiler infrastructure to help the developer detect what is blocking certain optimizations (in particular vectorization, parallelization and data transformations) and change the code correspondingly
–Very specific hints to fix the problem
–Not a separate tool but a feature of the C/C++ and Fortran compilers
–Exploit the multi-year experience brought into compiler development: performance tuning knowledge based on dealing with numerous applications, benchmarks and compute kernels
It is not:
–An automatic vectorizer or parallelizer; in fact, no code is generated, which accelerates the analysis

GAP – How it Works
Selection of most relevant switches
–Multiple compiler switches activate and fine-tune the guidance analysis
–Activate messages individually for vectorization, parallelization, data transformations, or all three:
  -guide[=level]
  -guide-vec[=level]
  -guide-par[=level]
  -guide-data-trans[=level]
–The optional argument level=1,2,3,4 controls the extent of the analysis; Intel Composer only supports up to level 3
–Control the source code part for which the analysis is done:
  -guide-opts=
  Sample: -guide-opts="convert.c,'funca(int)'"

Vectorization Example

    void mul(NetEnv* ne, Vector* rslt, Vector* den,
             Vector* flux1, Vector* flux2, Vector* num)
    {
        float *r, *d, *n, *s1, *s2;
        int i;
        r = rslt->data; d = den->data; n = num->data;
        s1 = flux1->data; s2 = flux2->data;
        for (i = 0; i < ne->len; ++i)
            r[i] = s1[i]*s2[i] + n[i]*d[i];
    }

GAP Messages (simplified):
1. “Use a local variable to hoist the upper-bound of loop at line 29 (variable:ne->len) if the upper-bound does not change during execution of the loop”
2. “Use "#pragma ivdep" to help vectorize the loop at line 29, if these arrays in the loop do not have cross-iteration dependencies: r, s1, s2, n, d”
-> Upon recompilation, the loop will be vectorized

The compiler guides the user on the source change, on what pragma to insert, and on how to determine whether that pragma is correct for this case.
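The sketch below shows what the function might look like once both GAP messages have been applied. The NetEnv and Vector definitions are assumptions, reduced to the fields the loop touches; the slides do not show the real types. The pragma is guarded so that other compilers simply skip it.

```c
#include <assert.h>

/* Hypothetical minimal types; only the fields used by mul() are modeled. */
typedef struct { int len; } NetEnv;
typedef struct { float *data; } Vector;

void mul(NetEnv *ne, Vector *rslt, Vector *den,
         Vector *flux1, Vector *flux2, Vector *num)
{
    float *r = rslt->data, *d = den->data, *n = num->data;
    float *s1 = flux1->data, *s2 = flux2->data;
    int i;
    /* Message 1: hoist the upper bound into a local variable. */
    const int len = ne->len;

    assert(len >= 0);
    /* Message 2: only valid if r, s1, s2, n, d have no
       cross-iteration dependencies. */
#if defined(__INTEL_COMPILER)
#pragma ivdep
#endif
    for (i = 0; i < len; ++i)
        r[i] = s1[i] * s2[i] + n[i] * d[i];
}
```

Both changes are assertions by the programmer: the hoist is wrong if the loop body can modify ne->len, and the pragma is wrong if r overlaps any of the input arrays.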

Data Transformation Example

    struct S3 {
        int a;
        int b;              // hot
        double c[100];
        struct S2 *s2_ptr;
        int d;
        int e;
        struct S1 *s1_ptr;
        char *c_p;
        int f;              // hot
    };
    …
    for (ii = 0; ii < N; ii++) {
        sp->b = ii;
        sp->f = ii + 1;
        sp++;
    }
    …

peel.c(22): remark #30756: (DTRANS) Splitting the structure 'S3' into two parts will improve data locality and is highly recommended. Frequently accessed fields are 'b, f'; performance may improve by putting these fields into one structure and the remaining fields into another structure. Alternatively, performance may also improve by reordering the fields of the structure. Suggested field order:'b, f, s2_ptr, s1_ptr, a, c, d, e, c_p'. [VERIFY] The suggestion is based on the field references in current compilation …
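One way to follow the splitting advice is sketched below. The names S3_hot, S3_cold and fill_hot are hypothetical, and the remark itself does not prescribe how the two parts are linked; here they are simply kept in parallel arrays indexed the same way.

```c
#include <assert.h>
#include <stddef.h>

struct S1;   /* opaque, as in the slide */
struct S2;

/* Frequently accessed fields, packed together: 8 bytes per element
   on typical platforms instead of more than 800. */
struct S3_hot {
    int b;
    int f;
};

/* Remaining fields, touched rarely. */
struct S3_cold {
    int a;
    double c[100];
    struct S2 *s2_ptr;
    int d;
    int e;
    struct S1 *s1_ptr;
    char *c_p;
};

/* The loop from the slide, now striding over the hot part only. */
void fill_hot(struct S3_hot *sp, int n)
{
    assert(sp != NULL || n <= 0);
    for (int ii = 0; ii < n; ii++) {
        sp->b = ii;
        sp->f = ii + 1;
        sp++;
    }
}
```

With the original layout every iteration touches two cache lines far apart inside one large element; after the split, several consecutive elements share a single cache line. As the [VERIFY] tag in the remark indicates, the advice is based only on the field references seen in the current compilation, so other translation units need to be checked before changing the structure.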

Privatization Directives for “pragma parallel”
Mark variables in loops as ‘private’ for automatic parallelization, similar to what the OpenMP private clause does
–Available for Fortran and C/C++
–Used by GAP as advice to add when appropriate
Syntax for C/C++:
    #pragma parallel [ clause [ [,] clause ]…]
where clause can be one of the following:
    always [ assert ]
    private( var [ :expr ] [, var [ :expr ] ]...)
    lastprivate( var [ :expr ] [, var [ :expr ] ]...)
where var is a variable name.
Note: In case expr is missing, the semantics correspond to OpenMP 3.0.
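A small sketch of the private clause on a loop with a scalar temporary follows. The function smooth is a hypothetical example; the pragma is guarded because it is specific to the Intel compiler, and it is assumed to matter only when automatic parallelization (-parallel) is enabled.

```c
#include <assert.h>

/*
 * tmp is declared outside the loop, so without further information each
 * iteration appears to read and write the same variable. Marking it
 * private tells the auto-parallelizer that every thread may use its own
 * copy.
 */
void smooth(float *out, const float *in, int n)
{
    float tmp;
    int i;

    assert(out != in);
#if defined(__INTEL_COMPILER)
#pragma parallel private(tmp)
#endif
    for (i = 1; i < n - 1; ++i) {
        tmp = in[i - 1] + in[i + 1];
        out[i] = 0.5f * tmp;
    }
}
```

If the value of tmp were needed after the loop, lastprivate would be the clause to use instead, as with the OpenMP clause of the same name.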

Concurrency-Safe Function Attribute
Used by GAP too
–Windows syntax: __declspec(concurrency_safe[(profitable | cost(cycle-count))])
–Linux syntax: __attribute__((concurrency_safe[(profitable | cost(cycle-count))]))
–(In Fortran, similar functionality via directive)
Semantics:
–__attribute__((concurrency_safe)): the function has no “unacceptable” side effects when invoked in parallel
–profitable clause: the loops or blocks that contain calls to this function are profitable to parallelize
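The attribute could be applied as in the sketch below. The CONCURRENCY_SAFE macro and the functions square and square_all are hypothetical; the macro expands to the Windows or Linux spelling from the slide under the Intel compiler and to nothing elsewhere, so the code still builds with compilers that do not know the attribute.

```c
#include <assert.h>

/* Hypothetical portability wrapper around the syntax shown above. */
#if defined(__INTEL_COMPILER) && defined(_WIN32)
#  define CONCURRENCY_SAFE __declspec(concurrency_safe(profitable))
#elif defined(__INTEL_COMPILER)
#  define CONCURRENCY_SAFE __attribute__((concurrency_safe(profitable)))
#else
#  define CONCURRENCY_SAFE
#endif

/* No global state and no I/O: safe to call from several threads. */
CONCURRENCY_SAFE int square(int x);

int square(int x)
{
    return x * x;
}

/* The call inside the loop no longer has to block auto-parallelization. */
void square_all(int *out, const int *in, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        out[i] = square(in[i]);
}
```

The attribute is a promise rather than something the compiler verifies, so it belongs only on functions whose side effects have been reviewed.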

Intel® Code Coverage Tool
Example of a code coverage summary for a project. The workload applied in this test exercised 34 of 143 blocks, representing 5 of 19 functions in 2 of 3 modules. In the file SAMPLE.C, 4 of 5 functions were exercised.
Clicking on SAMPLE.C produces a listing that highlights the code that was exercised. In this example, the pink-highlighted code was never exercised, the yellow was run but not exercised by any of the tests set up by the developer, and the beige was partially covered.
