Intel® Parallel Advisor 2011 Shows its Stuff on the Duplo Application
Hello everyone. I'm Mark Davis from Intel. Today I want to tell you about a methodology and a set of tools that make it easier to add parallelism to your programs. I'll be using a program called Duplo to demonstrate Advisor's effectiveness. Mark Davis, Ph.D., Senior Principal Engineer, Performance, Analysis, and Threading. June 22, 2011
Before We Start: All webinars are recorded and will be available on Intel Learning Lab within a week. Use the Question Module to ask questions. If you have audio issues, give them 5 seconds to resolve; if they persist, we suggest you drop and re-enter the webinar, or call into the phone bridge.
Executive Summary & Agenda
Challenge: Parallel programming can be rewarding, but daunting! Solution: Intel® Parallel Advisor is a methodology and set of tools to help you easily add correct and effective parallelism to your program, in this case Duplo. Agenda: Introduction; Advisor Workflow; Survey; Add Annotations; Model Suitability; Check Correctness; Add Parallel Framework; Conclusion.
Agenda: Introduction; Advisor Workflow; Survey; Add Annotations; Model Suitability; Check Correctness; Add Parallel Framework; Conclusion
But why should you care about Parallelism?
In a word: "Performance." Serial optimizations may achieve gains of less than 25%. Data parallelism, e.g., vectorization, may gain 2-4X; the gains depend on the vector length and the amount of code that can vectorize. Task parallelism may provide speed-ups proportional to the number of cores, e.g., 4-8X. Don't leave all that potential performance on the table!
Suppose you had a magical tool that
Lets you quickly write a serial program to implement your algorithm, Causes your program to run correctly even in the presence of coding bugs, Helps you find and fix the bugs, And also tells you the best performance to expect from your algorithm. Would this make you more productive? Of course it would! Suppose you have a correct algorithm and you want to be able to quickly code it up and run it to see the results and performance. Also suppose there is a magical tool that can run the program correctly in spite of coding problems, like off-by-one errors, and can also estimate the best performance you can get from your algorithm. Would you want to use such a tool? I know I would!
I don't have such a tool for serial programs, but this fanciful dream is actually similar to what Parallel Advisor can do when it helps you move your serial program to parallelism.
Intel® Parallel Advisor 2011
A product in last year's release of Intel® Parallel Studio 2011, which is a plug-in for Microsoft* Visual Studio. A design tool that assists in making good decisions to transform a serial algorithm to use multi-core hardware. A parallel modeling tool that uses annotations in the serial code to calculate what might happen if that code were to execute in parallel as specified by the annotations. A methodology and workflow that teaches an effective approach to parallel programming. Now that I have your attention, the tool I want to talk about is Intel Parallel Advisor. It's part of the Intel Parallel Studio suite of parallelism tools, which are all plug-ins to Visual Studio on Windows. Advisor has a number of useful properties. While you are adding parallelism to serial code, it guides you to make good decisions; in this way it works as a design tool. Several of the tools in Advisor use parallel modeling of serial programs to determine what effects the parallel regions that you specify using annotations might have; but your program is still serial. Finally, Advisor is based on a time-tested parallelization methodology, and can be used to learn the best approaches to parallel programming.
Digression – Intel® Parallel Studio
Decide where to add the parallelism Analyze the serial program Prepare it for parallelism Test the preparations Add the parallelism Intel® Cilk Plus, Intel® Threading Building Blocks, … Find logic problems Only fails sometimes Place of failure changes Find performance problems A brief digression to explain Intel Parallel Studio, a suite of development tools which works on C and C++ programs. The first tool, Advisor, today’s topic, helps you find opportunities for parallelism, and prepare your program for correct parallel execution. The second, Intel Parallel Composer, lets you express the parallelism – with a compiler, parallel language extensions, and parallel libraries. ((? How many of you know about Cilk? What about Threading Building Blocks?)) The next, Intel Parallel Inspector, helps you find memory problems and race conditions in your parallel program And finally, Intel Parallel Amplifier, helps you tune the performance of your parallel program
The Advisor Workflow. Transforming many serial algorithms into parallel form takes 5 easy high-level steps. Often existing algorithms are over-constrained by serial language semantics, and the underlying mathematics has a natural parallel expression if you can just find it. The objective of parallelization is to find the parallel program lurking within your serial program! The parallelism may be hiding because the serial program is over-constrained; for example, having read-write global variables that cause no problems for serial code, but inhibit parallelism. Advisor is NOT an automatic parallelization tool; it is aimed at code that is larger and messier than loop nests. Instead it guides you through the set of decisions you must make, and provides data about your program at each step. The intent is that Advisor provide a lightweight methodology that allows you to easily experiment with parallelism in different places. The 5 basic steps of Advisor's methodology are displayed in the Advisor Workflow pane in Visual Studio, here on the right; they help you find that hidden parallel program. First, you use the Survey tool to determine where your program spends most of its time (why is that important?). Second, you insert annotations that indicate to Advisor where you might like to use parallelism. Third, you use the Suitability tool to determine if these locations will provide the desired parallel speedups. Fourth, you use the Correctness tool, discovering what data dependencies and race conditions would occur with this parallelism, and fix them: can you correctly tease a parallel program out of the serial one? Finally, you actually convert your serial program to a parallel one by replacing annotations with parallel constructs. And once you have a parallel program, you can apply the rest of Parallel Studio. (Advisor toolbar)
Parallel Modeling for Serial Programs
Your application can't fail due to bugs caused by incorrect parallel execution (it's running serially). You can experiment with several different proposals before committing to a specific implementation. You can refactor your code to prepare it to incorporate parallel frameworks more easily later. Many transformations that make your code easier to parallelize also make it easier to read and maintain. All of your test suites should still pass when validating the correctness of your transformations. Modeling parallelism in your serial program is Advisor's key technology. You don't actually add parallelism to your code; you just indicate where you want it, and the Advisor tools model how that parallelism would behave. This is a huge advantage over having to add parallel constructs: your still-serial program doesn't crash due to race conditions, your test suites run the same as always, and you don't get different answers due to tasks running in different orders. This also allows you to refactor your program to remove race conditions and make it parallel-ready while it is still serial. Are there any questions about why using parallel modeling is an advantage?
Duplo Application Used for Demonstration
Open-source application for finding duplicate blocks of code in a set of source files. Usage: Duplo input_filelist output_file. Standard output: one line per file with its number of duplicate blocks, plus the total execution time. output_file: for each duplicate block, the two file names and starting line numbers, followed by the duplicated source code. Duplo takes a list of file names as input and generates two outputs: standard output prints one line per file with the number of duplicate blocks, with a summary at the end; the duplicate blocks are written to the output file along with the two filenames for each block.
Agenda: Introduction; Advisor Workflow; Survey; Add Annotations; Model Suitability; Check Correctness; Add Parallel Framework; Conclusion. Are there questions before we move on to Survey, the first step of Advisor's workflow?
Amdahl's Law (paraphrased) “The benefit from parallelism is limited by the computation which remains serial” If you perfectly execute ½ of your application in parallel you will achieve < 2x speedup The implication of this is that you must focus your attention where your application spends its time Most of you have heard of Amdahl’s Law – a program’s parallel speedup is limited by the amount of code that remains serial. The obvious conclusion is that you need to discover where your program spends the most time and focus there to find parallelism.
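The limit Amdahl's Law imposes is easy to compute directly. A minimal sketch (the function name is mine, not from the talk):

```cpp
#include <cassert>
#include <cmath>

// Amdahl's Law: if a fraction p of the runtime is parallelized across
// n workers (perfectly, with no overhead), the overall speedup is
//   1 / ((1 - p) + p / n)
double amdahl_speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}
```

For p = 0.5 and n = 4 this gives 1 / (0.5 + 0.125) = 1.6X, and even as n grows without bound the speedup never reaches 2X, matching the claim on the slide.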
Survey Find the places that are important to your application
So you want to run your program under profiling using the Survey tool to find the hot regions. You may already have an idea where to try parallelism, and that is valuable; profiling will give you quantitative data about those sites. If you were applying serial optimizations to a program, you would look for hotspots that have the highest Self time: the only way to speed them up is to reduce the time spent there. In contrast, for parallel optimization, you want to find regions with large TOTAL time, including time spent in called routines, and try to distribute that time over as many tasks as possible. Are there any questions about this point? So you want to look along the call tree from the main routine to the hotspots for candidate regions for parallelism. This is exactly what the Survey tool displays: chains of loops and function calls, ordered by highest Total time. This screenshot shows the Survey Report for Duplo. Focus on the Total Time percentage column. Two hot chains are circled in red, the first covering 85% of the time, and the second the remaining 15%.
Two Candidate loops in Duplo::run()
15%: the first loop reads the source lines from all files into memory, hashing each line. Complexity: O(n). 85%: the second, doubly-nested loop compares each file with all preceding files using Duplo::process(), writing output for matching code blocks. Complexity: O(n^2). These are the two hot chains. The 15% loop is the first loop in Duplo: it reads all of the files. The 85% loop is a doubly-nested loop, comparing a file against all preceding files for duplicate source code. Not only does the second loop take more time in this sample run, but it also has n-squared computational complexity. Both facts make the second loop more important for parallelization than the first, but of course we can parallelize both.
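The O(n^2) claim is easy to quantify. A small sketch (the helper function is mine, not part of Duplo): with the i > j guard, the compare loop does real work for exactly n(n-1)/2 file pairs.

```cpp
#include <cassert>

// Number of file pairs (i, j) with i > j examined by the compare loop:
// n * (n - 1) / 2, i.e. quadratic growth in the number of input files.
long long pair_comparisons(long long n) {
    return n * (n - 1) / 2;
}
```

Doubling the input from 100 to 200 files roughly quadruples the comparisons (4,950 vs. 19,900), which is why the second loop dominates the profile as inputs grow.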
Side Benefit: Inlining Opportunities Revealed
The two circled methods consume 25% of the time, and they are "leaf" methods, i.e., Total time = Self time. They are good candidates for inlining: either move each method definition into the class definition, or just use Parallel Composer's multi-file interprocedural inlining (/Qipo). Survey also helps us discover an opportunity for serial optimization. These two leaf functions in the red box consume 25% of the time. If you look at the code you see that they are each a single line and each is called from a single site. They are great candidates for inlining, either by moving their definitions to the header file or by using Parallel Composer's interprocedural inlining optimization. Doing so actually reduces the run time by the full 25%, so that time was being spent on call overhead and on optimizations the calls inhibited. Inlining: serial time reduced by 25%!
Agenda: Introduction; Advisor Workflow; Survey; Add Annotations; Model Suitability; Check Correctness; Add Parallel Framework; Conclusion. Are there questions before we move on to adding Advisor annotations?
Advisor Annotation Concepts
Advisor uses 3 primary concepts to create a model SITE A region of code in your application you want to transform into parallel code TASK The region of code in a SITE you want to execute in parallel with the rest of the code in the SITE LOCK Mark regions of code in a TASK which must be serialized All of these regions may be nested You may create more than one SITE Just macros, so work with any C/C++ compiler Having decided on some candidate parallel regions, you now insert Advisor annotations into your code to tell Advisor where you want to pretend to have parallelism These are the three most important annotations: A site is where you can have parallel tasks. The important part is that execution waits at the end of the region until all created tasks complete A task “pretends” to run in parallel with other tasks in the same parallel site. A Lock region “pretends” to limit execution to one task at a time, to protect against simultaneous access to the same object by different tasks. Annotations are actually macros that expand into calls to specially named dummy functions – the advisor tools recognize the names and model the corresponding parallel behavior. It’s almost like adding print statements to generate a trace of your program, and like printf statements, annotations don’t affect your program’s results. Since they are simple macros, annotations can be compiled by any compiler: like Microsoft Visual C++, as well as Intel Parallel Composer.
Second Loop (Compare) with Annotations
// Compare each file with each other
ANNOTATE_SITE_BEGIN( MySite2 );
for (int i = 0; i < (int)sourceFiles.size(); i++) {
    ANNOTATE_TASK_BEGIN( MyTask2 );
    std::cout << sourceFiles[i]->getFilename();
    int blocks = 0;
    for (int j = 0; j < (int)sourceFiles.size(); j++) {
        if (i > j && !isSameFilename(sourceFiles[i]->getFilename(),
                                     sourceFiles[j]->getFilename())) {
            blocks += process(sourceFiles[i], sourceFiles[j], outfile);
        }
    }
    if (blocks > 0) {
        std::cout << " found " << blocks << " block(s)" << std::endl;
    } else {
        std::cout << " nothing found" << std::endl;
    }
    blocksTotal += blocks;
    ANNOTATE_TASK_END( MyTask2 );
}
ANNOTATE_SITE_END( MySite2 );
Here's an example of how to pretend to make the second loop a parallel loop, with SITE annotations enclosing the loop and TASK annotations enclosing the loop body. The names MySite2 and MyTask2 in the annotations are arbitrary. Note that YOU make the decisions about parallel regions and you insert the annotations; Advisor has a wizard that helps you get the syntax right. Propose how you would like to partition your algorithm.
Annotations - LOCK Here’s Advisor’s Annotation Wizard in action. It is being used to insert lock_acquire and lock_release around a region of selected code in Duplo.
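The lock-annotation pattern the wizard inserts can be sketched as follows. In a real project these macros come from Advisor's advisor-annotate.h header; here they are defined as no-ops purely so the snippet is self-contained, and the function and counter names are mine:

```cpp
#include <cassert>

// Stand-ins for the real Advisor macros from advisor-annotate.h; under
// Advisor's Correctness and Suitability tools they indicate that this
// region would be protected by a lock in the parallel version.
#define ANNOTATE_LOCK_ACQUIRE(addr) ((void)0)
#define ANNOTATE_LOCK_RELEASE(addr) ((void)0)

static int blocksTotal = 0;  // shared counter, as in Duplo

// Pretend-serialize the update of the shared counter inside a task.
void record_blocks(int blocks) {
    ANNOTATE_LOCK_ACQUIRE(0);
    blocksTotal += blocks;
    ANNOTATE_LOCK_RELEASE(0);
}
```

The program stays serial either way; the annotations only change what the Advisor tools model.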
Agenda: Introduction; Advisor Workflow; Survey; Add Annotations; Model Suitability; Check Correctness; Add Parallel Framework; Conclusion. Are there questions before we move on to modeling Suitability?
Suitability - Data Collection
Now that you have added annotations to both loops to indicate your parallel experiment, you can run the annotated Duplo under the Suitability tool to model what performance you can expect. Analyze your proposal to see if you made a suitable choice
Suitability Estimated Overall Speed-up Scalability Graph
The two sites seem very suitable: the estimated overall speed-up is 3X on 4 CPUs. Both parallel loops are estimated to get about 3X speed-ups. However, notice that the first, file-reading loop only contributes 1.1X to the overall program speed-up, whereas the second, compare loop contributes 2.35X. We expected this because of the times taken by each loop in the serial program from Survey. There are many pieces of information on this display, but the scalability graph summarizes most of it. It's a log-log graph showing speed-up for different numbers of CPUs. If you're in the green, you're doing great. If you're in the yellow, you may need to make some adjustments. Red means you're slowing down; maybe you need to pick a different location. See if your SITE meets your performance expectations.
Suitability - Improvements
This slide shows how Suitability can also help improve your parallel-related performance. I tried adding a lock to Duplo in the second loop, but Suitability tells me it was a bad idea. The top "before" display is with locks; the circles are "in the red" and show no current speed-up. However, the bars indicate that there can be a range of performance depending on the threading properties listed on the right. Advisor shows that significant speed-up can be achieved by reducing lock contention, and recommends doing so. In the "after" chart, by clicking to promise that you will reduce lock contention, you see a much better estimated speed-up! Note that Advisor does not reduce the contention; you have to implement your decision when you convert to parallel code. Try alternatives to see how much they improve your model.
Agenda: Introduction; Advisor Workflow; Survey; Add Annotations; Model Suitability; Check Correctness; Add Parallel Framework; Conclusion. Are there questions before we move on to checking Correctness?
Correctness – Data Collection
By now you’re feeling pretty good about the parallel performance you may achieve on Duplo. But you still have to check if there are any problems that will arise when you go parallel, and if so, you need to fix them. That’s the purpose of Correctness Modeling. Build a debug version of the still serial Duplo containing annotations, and run it under the Correctness tool, using a cut-down data set. Correctness can run 100 times slower than the program, so observations are listed as they are encountered. There have already been some errors found, so you can stop the run early if you want. Analyze your annotations to see if you made a correct choice
Correctness - Problems
10 race conditions found! Observations help identify each problem. Correctness modeling tracks all memory references and annotations as your program runs. It models what could go wrong if the tasks were run in parallel, given the constraints of synchronization points such as locks. When Correctness modeling finishes, it combines related observations into problems, as displayed here. Although Duplo runs normally and generates the correct output, Correctness checking reveals that there would be ten sharing errors if it were parallel. Ouch! The first error, P1, is Memory Reuse: several tasks use the same object, but they always initialize it before using it, so no values flow between the tasks. This incidental sharing can be fixed by giving each task its own copy of the object. Problem P4, a Data Communication error, is highlighted, and the source for several of its observations is displayed below. It means that values are flowing between tasks; this can often be fixed with a lock. Note that distinguishing these two problems would be very difficult in a true parallel program, but with parallel modeling, Advisor knows the EXACT order of memory references in the serial program. You may have Data Communication or Memory Reuse problems to fix!
Correctness Source View: problem in std::vector – not my mistake!?
Double-clicking on a problem takes you to the source view so you can investigate further, for example by examining the call stack on the right that was used to reach a particular observation. You can click on a stack item to see the corresponding source code. This shows that the error is in the Standard Template Library's vector – this isn't even my error, is it? Each problem shows how the observations in each TASK relate.
Examine Source along Call Stack – Oops! Unprotected call to push_back in my code! Click on the entry on the call stack circled in green: it shows the source code from the run function, and reveals that the problem actually IS in Duplo's code (circled in red), which calls push_back; the fundamental problem is not in the Standard Template Library. In fact, all of the problems from the first loop occur in this IF statement, which updates global variables for each appropriate file. They can all be fixed by using lock annotations around the body of the IF. That's the example used a few slides ago when demonstrating lock annotations!
Sharing Problems in Second Loop
Two problems involve m_pMatrix, a single matrix used as a work area while comparing two files. Fix: give each task its own copy of m_pMatrix. The other two are global counters incremented by the tasks. Fix: lock annotations around the increments. Advanced fix: indicate the occurrence of a "reduction" via ANNOTATE_REDUCTION_USE(…). Success! The ten problems were not so hard to find and fix after all! The four problems in the second loop, MySite2, turn out to be relatively easy to fix, such as the incidental sharing on the matrix. Shared counters are commonly fixed using locks, or with a more specific reduction mechanism. So we have been able to fix all ten problems. As a recap: Correctness modeling finds sharing problems and provides detailed information about them, but you have to decide whether to fix them and what changes to make, and then you edit in the changes.
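The privatization fix for m_pMatrix can be sketched in simplified form (hypothetical types and names; Duplo's real matrix feeds a longest-common-substring computation). Each call owns its work area, so parallel tasks can no longer race on it:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified sketch of the privatization fix: rather than sharing one
// m_pMatrix member across tasks, each comparison builds its own local
// match matrix, eliminating the incidental-sharing race.
int count_matching_lines(const std::vector<std::string>& a,
                         const std::vector<std::string>& b) {
    std::vector<int> matrix(a.size() * b.size(), 0);  // task-local work area
    int matches = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            if (a[i] == b[j]) {
                matrix[i * b.size() + j] = 1;
                ++matches;
            }
    return matches;
}
```

The extra allocation per task is the usual cost of this fix; it trades a little memory for the removal of a data race.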
and then Repeat… You do not have to choose the perfect answer the first time, so you can go back and modify your choices Iterative refinement will either Create a suitable and correct annotation proposal Conclude no viable sites are possible Efficiently arriving at either answer is valuable You can fix the Correctness problems in the serial program, refactoring it so it is more parallel ready. Now repeat the last two steps: check that your performance is still Suitable, and if so, check Correctness to make sure you have eliminated all problems. You iteratively reach a program that is parallel-ready – or that is not suitable for parallelism. In either case, you have quickly determined if you have a good parallel site, so you’ve been more productive.
Correctness checking only tracks memory dependencies between tasks. But what about the I/O in Duplo? If we go parallel, the output will be garbage: if the candidate loops are actually made parallel, the output lines from different tasks comparing different pairs of files will be interspersed, creating meaningless results. Why didn't Correctness checking warn us of this problem? Because Correctness analysis only looks at memory dependencies between tasks. Here is a bonus from parallel modeling for serial programs: if we had gone parallel immediately after Survey, the jumbled output would have distracted us from determining the performance potential and from finding and fixing the memory-sharing problems.
Fixing Duplo’s I/O with Hyperobjects
Even if a lock protects function Duplo::reportSeq() (which outputs a block of duplicate code), the blocks will appear in arbitrary orders. Intel® Cilk™ Plus hyperobject types to the rescue! reducer_ostream: "<<" is guaranteed to keep output in sequential program order, even when called by multiple tasks in different orders. reducer_opadd<int>: "++" safely performs the increment "reduction" on shared counters. reducer_list_append: replaces std::vector in the first loop to keep candidate files in sequential order, but that's another story… We can at least make each duplicate block contiguous by using a lock around function reportSeq, but the blocks will be output in different orders. That can be confusing, particularly if Duplo is used by QA folks in nightly testing: the output changes even though the blocks are the same. The elegant solution is to use hyperobjects from Intel Cilk Plus! The key to keeping output in order is the type reducer_ostream. And if we're using Cilk Plus, we might as well use the other hyperobjects, reducer_opadd and reducer_list_append.
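Cilk Plus reducers need a Cilk-enabled compiler, but the underlying idea of reducer_opadd (each parallel strand accumulates into a private view, and the views are combined afterwards) can be illustrated in standard C++. This is an analogy, not the Cilk API, and the names are mine:

```cpp
#include <numeric>
#include <thread>
#include <vector>
#include <cassert>

// Reduction without locks: each thread owns a private partial sum (like
// a reducer's per-strand view); the partials are combined serially at
// the end, so there is no race on the shared total.
int total_blocks(const std::vector<int>& per_file_blocks, int nthreads) {
    std::vector<int> partial(nthreads, 0);
    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t)
        pool.emplace_back([&partial, &per_file_blocks, nthreads, t] {
            for (std::size_t i = t; i < per_file_blocks.size(); i += nthreads)
                partial[t] += per_file_blocks[i];  // private slot, race-free
        });
    for (auto& th : pool) th.join();
    return std::accumulate(partial.begin(), partial.end(), 0);
}
```

Note this analogy only covers the counting reducers; unlike reducer_ostream, it does not preserve sequential output order, which is exactly the extra guarantee the Cilk hyperobject provides.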
Add Parallel Framework
Agenda: Introduction; Advisor Workflow; Survey; Add Annotations; Model Suitability; Check Correctness; Add Parallel Framework; Conclusion. Are there questions before we move on to adding parallel framework code?
Summary View; and pick Parallel Framework
The Summary view gives you a high-level overview of your annotations and analysis results After fixing any problems you will have an annotated parallel model which is suitable and correct. Next you choose a Parallel Framework; and replace the annotations listed in Summary with parallel constructs Advisor’s Summary view, which you reach by clicking on the Summary rectangle, shows for each parallel site: the modeled parallel performance, and the number of correctness errors. For example, the green box shows the estimated speedups for the loops in Duplo, and the red box shows the problems for each. For a large program, this is where you can decide which sites are worth working on, and which are too hard to fix or give too little speedup to bother with. After your program is parallel-ready, pick a parallel framework, and convert the annotations to parallel constructs. The Survey view shows where all the annotations are: double-click to enter the editor. Advisor documentation shows typical transformations from annotations to parallel constructs.
Intel® Parallel Building Blocks Tools to optimize app performance for the latest platform features
Intel Parallel Building Blocks provides several parallel frameworks you can use. Intel Cilk Plus is a simple, 3-keyword language extension for task, data, and vector parallelism. Intel Threading Building Blocks is a popular library using C++ templates that provides both task and data parallelism, as well as concurrent containers: versions of some of the container classes from the Standard Template Library that are protected against race conditions. And Intel Array Building Blocks is a C++ library for data and vector parallelism that dynamically adjusts to the chip it is running on; Array Building Blocks is currently in beta test. You can mix and match these parallel models within an application to suit your environment, application, and algorithms. We cheated and already chose Cilk Plus so we could take advantage of hyperobjects.
Parallel Framework – Intel® Cilk™ Plus Loops
This slide shows the Advisor documentation for translating from annotations for a parallel loop, to a Cilk Plus for loop.
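The documented translation is mechanical. A sketch of the before/after (Cilk Plus syntax, which requires a Cilk-enabled compiler such as the Intel compiler of that era, so it will not build with a plain C++ toolchain):

```
// Before (serial program + Advisor annotations):
ANNOTATE_SITE_BEGIN( MySite2 );
for (int i = 0; i < n; i++) {
    ANNOTATE_TASK_BEGIN( MyTask2 );
    body(i);
    ANNOTATE_TASK_END( MyTask2 );
}
ANNOTATE_SITE_END( MySite2 );

// After (Cilk Plus): cilk_for makes each iteration a task and
// implicitly joins all tasks at the end of the loop, playing the
// roles of TASK and SITE respectively.
cilk_for (int i = 0; i < n; i++) {
    body(i);
}
```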
Duplo: Parallel Results on 4 cores
This chart shows respectable speed-up of Duplo on 4 cores. The top group is the standard Duplo, the lower group has equals and getLine inlined. The Yellow bar is the measured speed-up; Orange shows the Suitability estimate of Gain. The pairs of bars have very similar lengths, which means the Suitability estimates are surprisingly accurate compared to the actual speed-up. The first Duplo result shows only parallelizing MySite2, the compare loop, which achieves 2.5X Next we also parallelize MySite1, the first loop, which gets us to 3X. Finally, we also parallelize MySite3, the inner of the nested compare loops, getting us to 3.5X . Note that parallelizing both nested loops provides lots of tasks, which improves load balancing.
Agenda: Introduction; Advisor Workflow; Survey; Add Annotations; Model Suitability; Check Correctness; Add Parallel Framework; Conclusion. Are there questions before we move on to the conclusion?
Conclusion The Intel Parallel Advisor is a unique tool
It assists you in working smarter through detailed modeling, guides you through the necessary steps, leaves you in full control of your code and architectural choices, and lets you transform serial algorithms into parallel form faster. The parallel modeling methodology maintains your original application's semantics and behavior, and helps find the natural opportunities to exploit parallel execution. In conclusion: I hope I have been able to demonstrate the value of Advisor for adding parallelism to your programs. The modeling provides information about your parallel experiments. Advisor's methodology takes you through the necessary steps, but you remain in control: Advisor does not automatically change your program. You progressively refactor your serial program into a parallel solution. I also hope I have convinced you of the advantage of using parallel modeling so you can stay with your serial program as long as possible.
Intel® Parallel Studio 2011
More information about Parallel Studio and Parallel Advisor is available online, including a 30-day free trial Supports Microsoft Visual Studio* 2005, 2008 and 2010. This URL is where you can get more information about parallel studio and parallel advisor.
Questions? Acknowledgement
Don't forget to check for the next Parallel Webinar: Modeling The Parallelism Inherent in Applications by Paul Petersen, tentatively July 21st, 2011, 9am PDT. Acknowledgement: see the Advisor and Duplo chapters in Parallel Programming with Intel Parallel Studio by Stephen Blair-Chappell and Andrew Stokes, Wiley, due to be published November 2011.
Optimization Notice
Intel compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the “Intel Compiler User and Reference Guides” under “Compiler Options." Many library routines that are part of Intel compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors. Intel compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. 
While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not. Notice revision #
Legal Disclaimer http://intel.com/software/products
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Atom, Centrino Atom Inside, Centrino Inside, Centrino logo, Cilk, Core Inside, FlashFile, i960, InstantIP, Intel, the Intel logo, Intel386, Intel486, IntelDX2, IntelDX4, IntelSX2, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Inside logo, Intel. Leap ahead., Intel. Leap ahead. logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel Viiv, Intel vPro, Intel XScale, Itanium, Itanium Inside, MCS, MMX, Oplus, OverDrive, PDCharm, Pentium, Pentium Inside, skoool, Sound Mark, The Journey Inside, Viiv Inside, vPro Inside, VTune, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Copyright © Intel Corporation.