Intel Compilers 9.x on the Intel® Core Duo™ Processor Windows version Intel Software College.

Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

Objectives
At the successful completion of this module, you will be able to:
- Use key compiler optimization switches
- Optimize software for the architecture
- Enhance performance with vectorization and other techniques

Agenda
- Introduction
- Compiler Switches
- Dual Core
- Vectorization

Key to optimizing: Intel® Core™ Duo
Exploiting architectural power requires sophisticated compilers that make optimal use of:
- Registers and functional units
- Dual-core/multi-processor
- SSE instructions
- Cache architecture

C++ Compatibility with Microsoft
- Source and binary compatible with VC 2003 under /Qvc71
- Source and binary compatible with VC 2005 under /Qvc8
- Microsoft* and Intel OpenMP binaries are not compatible; use one compiler for all modules compiled with OpenMP
For more information, refer to the User’s Guide.

Use the Intel C++ Compiler in the Microsoft IDE

Agenda
- Introduction
- Compiler Switches
  - Intel® C++ compiler
- Dual Core
- Vectorization

General Optimizations
Windows*   Linux*/Mac*   Description
/Od        -O0           Disables optimizations
/Zi        -g            Creates symbols
/O1        -O1           Optimizes for binary size: server code
/O2        -O2           Optimizes for speed (default)
/O3        -O3           Optimizes for data cache: loopy floating-point code

Multi-pass Optimization: Interprocedural Optimizations (IPO)
- ip: enables interprocedural optimizations for single-file compilation
- ipo: enables interprocedural optimizations across files
  - Can inline functions in separate files
  - Enhances optimization when used in combination with other compiler features
Windows*   Linux*/Mac*
/Qip       -ip
/Qipo      -ipo

Multi-pass Optimization - IPO Usage: Two-Step Process
Pass 1: Compiling (produces virtual .o files)
    Windows*  icl -c /Qipo main.c func1.c func2.c
    Linux*    icc -c -ipo main.c func1.c func2.c
    Mac*      icc -c -ipo main.c func1.c func2.c
Pass 2: Linking (produces the executable)
    Windows*  icl /Qipo main.o func1.o func2.o
    Linux*    icc -ipo main.o func1.o func2.o
    Mac*      icc -ipo main.o func1.o func2.o

Profile Guided Optimizations (PGO)
Use execution-time feedback to guide many other compiler optimizations. Helps I-cache, paging, and branch prediction.
Enabled optimizations:
- Basic block ordering
- Better register allocation
- Better decision of functions to inline
- Function ordering
- Switch-statement optimization
- Better vectorization decisions

Multi-pass Optimization - PGO: Three-Step Process
Step 1: Instrumented compilation (produces the instrumented executable)
    (Mac*/Linux*)  icc -prof_gen[x] prog.c
    (Windows*)     icl -Qprof_gen[x] prog.c
Step 2: Instrumented execution (produces a DYN file containing dynamic info: .dyn)
    Run the program on a typical dataset
Step 3: Feedback compilation (uses the merged DYN summary file: .dpi)
    (Mac/Linux)    icc -prof_use prog.c
    (Windows)      icl -Qprof_use prog.c
Delete old .dyn files if you do not want their info included.
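
As a rough illustration of code where the feedback above could matter, the sketch below has a switch statement whose cases are hit very unevenly. The packet kinds, costs, and function names are hypothetical and are not taken from the slides. The compiler cannot tell from the source which case is hot; a training run in Step 2 would record that information.

```c
#include <stddef.h>

/* Hypothetical packet kinds; in the assumed dataset almost every packet is DATA. */
enum kind { KIND_DATA, KIND_ACK, KIND_RESET, KIND_ERROR };

/* Returns a cost per packet kind. Without a profile the compiler has to
   guess which case is hot; with feedback it may, for example, test the
   common case first and move the rare paths out of the hot code. */
static int handle(enum kind k) {
    switch (k) {
    case KIND_DATA:  return 1;    /* hot path in the assumed dataset */
    case KIND_ACK:   return 2;
    case KIND_RESET: return 50;
    case KIND_ERROR: return 100;  /* rare path */
    }
    return 0;
}

/* Sums the cost of n packets; this is the loop a training run would exercise. */
long total_cost(const enum kind *packets, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += handle(packets[i]);
    return total;
}
```

Whether a given compiler version actually reorders this switch depends on its heuristics, so treat the example as a candidate for PGO rather than a guaranteed win.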

Agenda
- Introduction
- Compiler Switches
- Dual Core
  - Auto Parallelization
  - OpenMP
  - Threading Diagnostics
- Vectorization

Auto-parallelization
Auto-parallelization: automatic threading of loops without having to manually insert OpenMP* directives.
The compiler can identify “easy” candidates for parallelization, but large applications are difficult to analyze.
Windows*           Linux*/Mac*
/Qparallel         -parallel
/Qpar_report[n]    -par_report[n]

OpenMP* Threading Technology
Pragma-based approach to parallelism.
Usage:
- OpenMP switches: -openmp : /Qopenmp
- OpenMP reports: -openmp-report : /Qopenmp-report

    #pragma omp parallel for
    for (i=0;i<MAX;i++)
        A[i]= c*A[i] + B[i];
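
A self-contained version of the loop above might look like the following sketch. The function name scale_add and its parameters are assumptions made for illustration. When the file is built without the OpenMP switch, the pragma is ignored and the loop runs serially with the same result.

```c
/* Computes a[i] = c * a[i] + b[i] for i in [0, n).
   Each iteration reads and writes only its own index, so the
   iterations are independent and can be split across threads. */
void scale_add(float *a, const float *b, float c, int n) {
    int i;
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        a[i] = c * a[i] + b[i];
}
```

The loop variable is declared outside the for statement to match the slide; OpenMP makes the iteration variable of a parallel for loop private to each thread.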

OpenMP: Workqueueing Extension Example
The Intel compiler’s workqueueing extension creates a queue of tasks. It works on:
- Recursive functions
- Linked lists, etc.

    #pragma intel omp parallel taskq shared(p)
    {
        while (p != NULL) {
            #pragma intel omp task captureprivate(p)
            do_work1(p);
            p = p->next;
        }
    }

Parallel Diagnostics
Source instrumentation for Intel Thread Checker:
- Allows Thread Checker to diagnose threading correctness bugs
- To use tcheck//Qtcheck you must have Intel Thread Checker installed
- See thread checker documentation mancetools/sb/CS htm
Windows*    Linux*     Mac*
/Qtcheck    -tcheck    No support

Agenda
- Introduction
- Compiler Switches
- Dual Core
- Vectorization
  - SSE & Vectorization
  - Vectorization Reports
  - Explanations of a few specific vectorization inhibitors

SIMD – SSE, SSE2, SSE3 Support
Packed data types: 16x bytes, 8x words, 4x dwords, 2x qwords, 1x dqword, 4x floats, 2x doubles
Instruction sets: MMX*, SSE, SSE2, SSE3
* MMX actually used the x87 floating-point registers; SSE, SSE2, and SSE3 use the new SSE registers.

SSE3 Instructions
Use                         Instructions
SIMD FP using AOS format*   HADDPD, HSUBPD, HADDPS, HSUBPS
Thread synchronization      MONITOR, MWAIT
Video encoding              LDDQU
Complex arithmetic          ADDSUBPD, ADDSUBPS, MOVDDUP, MOVSHDUP, MOVSLDUP
FP to integer conversions   FISTTP
* Also benefits Complex and Vectorization

Using SSE3 - Your Task: Convert This…
[Figure: 128-bit registers holding A[0], B[0], C[0], then A[1], B[1], C[1]; the rest of each register is marked “not used”]

    for (i=0;i<=MAX;i++)
        c[i]=a[i]+b[i];

… Into This …
[Figure: 128-bit registers holding A[3] A[2] + B[3] B[2] = C[3] C[2], and A[1] A[0] + B[1] B[0] = C[1] C[0]]

    for (i=0;i<=MAX;i++)
        c[i]=a[i]+b[i];
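
To make the two pictures concrete, here is a hand-written sketch of roughly what a vectorizer would generate for this loop, assuming the arrays hold doubles (two per 128-bit register, as drawn). The function name add_arrays is hypothetical. Packed double addition is an SSE2 operation, so the sketch uses the SSE2 intrinsics from emmintrin.h and falls back to the plain scalar loop on targets without SSE2.

```c
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>   /* SSE2 intrinsics: __m128d, _mm_add_pd, ... */
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* c[i] = a[i] + b[i] for n doubles, two elements per 128-bit register. */
void add_arrays(const double *a, const double *b, double *c, size_t n) {
    size_t i = 0;
#if HAVE_SSE2
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);           /* a[i], a[i+1] */
        __m128d vb = _mm_loadu_pd(b + i);           /* b[i], b[i+1] */
        _mm_storeu_pd(c + i, _mm_add_pd(va, vb));   /* two sums in one add */
    }
#endif
    for (; i < n; i++)                              /* remainder or scalar fallback */
        c[i] = a[i] + b[i];
}
```

The unaligned load and store forms are used so the sketch makes no assumption about array alignment; a compiler that can prove alignment may choose the aligned forms instead.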

Compiler Based Vectorization: Processor Specific
Use W: Windows* /QxW; Linux* -xW; Mac* does not apply.
    Generate instructions and optimize for Intel® Pentium® 4 compatible processors, including MMX, SSE and SSE2.
Use P: Windows* /QxP, /QaxP; Linux* -xP, -axP; Mac* vectorization occurs by default.
    Generate instructions and optimize for Intel® processors with SSE3 capability, including Core Duo. These processors support SSE3 as well as MMX, SSE and SSE2.

Compiler Based Vectorization: Automatic Processor Dispatch - ax[?]
- Single executable: code optimized for Intel® Core Duo processors plus generic code that runs on all IA32 processors
- For each target processor it uses:
  - Processor-specific instructions
  - Vectorization
- Low overhead: some increase in code size

Why Loops Don’t Vectorize
Independence: loop iterations generally must be independent.
Some relevant qualifiers:
- Some dependent loops can be vectorized.
- Most function calls cannot be vectorized.
- Some conditional branches prevent vectorization.
- Loops must be countable.
- Outer loop of nest cannot be vectorized.
- Mixed data types cannot be vectorized.

Why Didn’t My Loop Vectorize?
Windows*: -Qvec_reportn    Linux*: -vec_reportn    Macintosh*: -vec_reportn
Sets the diagnostic level dumped to stdout:
    n=0: No diagnostic information
    n=1: (Default) Loops successfully vectorized
    n=2: Loops not vectorized, and the reason why not
    n=3: Adds dependency information
    n=4: Reports only non-vectorized loops
    n=5: Reports only non-vectorized loops and adds dependency info

Why Loops Don’t Vectorize
- “Existence of vector dependence”
- “Nonunit stride used”
- “Mixed Data Types”
- “Unsupported Loop Structure”
- “Contains unvectorizable statement at line XX”
There are more reasons loops don’t vectorize, but we will discuss the reasons above.

“Existence of Vector Dependency”
Usually indicates a real dependency between iterations of the loop, as shown here:

    for (i = 0; i < 100; i++)
        x[i] = A * x[i + 1];
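
To show how a dependence between iterations can change results, the sketch below uses a variant of the loop that reads x[i - 1] instead of x[i + 1]. That variant, and both function names, are assumptions chosen for illustration, because they make the effect easy to observe. The second function mimics a 2-wide vector operation by loading both inputs before storing either result, which is what a SIMD load, multiply, and store would do.

```c
#include <stddef.h>

/* Serial reference: iteration i reads x[i-1], which iteration i-1 just wrote. */
void scale_prev_serial(double *x, size_t n, double a) {
    for (size_t i = 1; i < n; i++)
        x[i] = a * x[i - 1];
}

/* Emulates a 2-wide vector version: both inputs are loaded before either
   result is stored, so the second lane sees the OLD value of x[i]. */
void scale_prev_pairwise(double *x, size_t n, double a) {
    size_t i = 1;
    for (; i + 1 < n; i += 2) {
        double in0 = x[i - 1];   /* lane 0 input */
        double in1 = x[i];       /* lane 1 input, read before x[i] is updated */
        x[i]     = a * in0;
        x[i + 1] = a * in1;
    }
    for (; i < n; i++)           /* scalar remainder */
        x[i] = a * x[i - 1];
}
```

Starting from five ones with a = 2, the serial loop produces 1, 2, 4, 8, 16, while the pairwise version produces 1, 2, 2, 4, 2. A compiler must not apply a transformation that changes results in this way, which is why it reports a dependence when it cannot rule one out.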

Defining Loop Independence
Iteration Y of a loop is independent of when (or whether) iteration X occurs.

    int a[MAX], b[MAX];
    for (j=0;j<MAX;j++) {
        a[j] = b[j];
    }

“Nonunit stride used”

    for (I=0;I<=MAX;I++)
        for (J=0;J<=MAX;J++) {
            c[I][J]+=1;                  // Unit stride
            c[J][I]+=1;                  // Non-unit
            A[J*J]+=1;                   // Non-unit
            A[B[J]]+=1;                  // Non-unit
            if (A[MAX-J]==1) last1=J;    // Non-unit
        }

End result: loading the vector may take more cycles than executing the operation sequentially.
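
The difference between the first two accesses comes from C’s row-major array layout. The sketch below measures the distance between the elements touched by consecutive inner-loop iterations; the dimension N and the helper name stride_in_elements are hypothetical.

```c
#include <stddef.h>

#define N 8   /* hypothetical matrix dimension */

/* Distance between two int addresses inside the same array object,
   measured in elements. Going through char pointers keeps the
   subtraction valid across rows of a 2D array. */
size_t stride_in_elements(const int *from, const int *to) {
    return (size_t)((const char *)to - (const char *)from) / sizeof(int);
}
```

For an array declared as int c[N][N], stride_in_elements(&c[0][0], &c[0][1]) is 1, so walking the second index touches adjacent memory that one vector load can fetch. stride_in_elements(&c[0][0], &c[1][0]) is N, so walking the first index jumps a whole row each iteration and the elements have to be gathered one by one.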

“Mixed Data Types”
An example:

    int howmany_close(double *x, double *y)
    {
        int withinborder=0;
        double dist;
        for(int i=0;i<MAX;i++) {
            dist=sqrtf(x[i]*x[i] + y[i]*y[i]);
            if (dist<5) withinborder++;
        }
        return withinborder;
    }

Mixed data types are possible, but they complicate things:
- For example, 2 doubles vs. 4 ints per SIMD register
- Some operations with specific data types won’t work
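
One way to sidestep the width mismatch, sketched here under the assumption that single precision is accurate enough for the data, is to keep every value in the loop 4 bytes wide: float coordinates and an int counter both fit four to a 128-bit register. The function name, the explicit length n, and the radius parameter are hypothetical. The sketch also compares squared distances, which is a different formulation from the slide’s sqrtf call but selects the same points for non-negative radii.

```c
/* Counts the points (x[i], y[i]) that lie strictly within `radius`
   of the origin. All loop values are 4 bytes wide (float and int). */
int howmany_close_f(const float *x, const float *y, int n, float radius) {
    const float r2 = radius * radius;   /* compare squared distances */
    int withinborder = 0;
    for (int i = 0; i < n; i++) {
        float d2 = x[i] * x[i] + y[i] * y[i];
        if (d2 < r2)
            withinborder++;
    }
    return withinborder;
}
```

Whether this version actually vectorizes still depends on the compiler and target; the vectorization report is the way to confirm it.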

“Unsupported Loop Structure”
Example:

    struct _xx {
        int data;
        int bound;
    };

    void doit1(int *a, struct _xx *x)
    {
        for (int i=0; i<x->bound; i++)
            a[i] = 0;
    }

An unsupported loop structure means the loop is not countable, or the compiler for whatever reason can’t construct a run-time expression for the trip count.
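
A common workaround, sketched below with the hypothetical name doit1_hoisted, is to copy the bound into a local variable before the loop. The compiler then no longer has to consider that a store through a might modify x->bound, and the trip count becomes a plain loop-invariant value.

```c
struct _xx {
    int data;
    int bound;
};

/* Zeroes the first x->bound elements of a. The bound is read once,
   before the loop, so the trip count cannot change while it runs. */
void doit1_hoisted(int *a, const struct _xx *x) {
    const int n = x->bound;
    for (int i = 0; i < n; i++)
        a[i] = 0;
}
```

This changes behavior only in the unusual case where a really does overlap x->bound, so it is worth confirming that such aliasing is not intended before applying it.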

“Contains unvectorizable statement”

    for (i=1;i<nx;i++) {
        B[i] = func(A[i]);
    }

[Figure: 128-bit registers holding A[3] A[2] and A[1] A[0]; func is applied to produce B[3] B[2] and B[1] B[0]]
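
If the called function is small and its body is visible to the compiler, the call can be inlined and the loop body becomes ordinary arithmetic. The sketch below assumes a trivial stand-in for func, since the slide does not show its definition, and the wrapper name apply is hypothetical.

```c
/* Hypothetical stand-in for the slide's func(); defined in the same
   file so the compiler can inline it into the loop. */
static inline float func(float v) {
    return 2.0f * v + 1.0f;
}

/* Applies func to A[1..nx-1], writing the results to B.
   The loop starts at 1, as on the slide, so B[0] is left untouched. */
void apply(const float *A, float *B, int nx) {
    for (int i = 1; i < nx; i++)
        B[i] = func(A[i]);
}
```

When the function lives in another source file, the interprocedural optimization switch (/Qipo or -ipo) described earlier is the mechanism the slides give for inlining across files.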

Reference
- Web-based and classroom training
- White papers and technical notes
- Product support resources

Activity 1 - raytrace2: Initial Compilation
Set up the environment and compile with both Microsoft* Visual C++ .NET (MSVC*) and the Intel® C++ Compiler (icl).

Activity 2 - raytrace2: O3 Compilation
Use the Intel compiler’s High Level Optimizer (-O3) for loop-centric codes.

Activity 3 - raytrace2: IPO Compilation
Use the Intel compiler’s Inter-procedural Optimization (-Qipo).

Activity 4 - raytrace2: PGO Compilation
Use the Intel compiler’s Profile-guided Optimization.

Activity 5 - raytrace2: Vectorization
Use the Intel compiler’s vectorization optimization (-QxP).

Activity 6 - raytrace2: Putting it all together
Use all previous optimizations in tandem (-O3, -QxP, IPO and PGO).