Copyright © 2002, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners www.intel.com/software/products.


Improve Application Performance on Windows*

What is the world’s biggest semiconductor company doing building software products?

Intel® Software Development Products
• Intel® Compilers: best way to get application performance on Intel processors
• Intel® VTune™ Analyzers: quickly identify “hot spots” and how to fix them
• Intel® Performance Libraries: highly optimized, ready-to-use building-block functions
• Intel® Threading Tools: speed and simplify development and maintenance of threaded apps
• Intel® Cluster Tools: create, analyze, optimize and deploy cluster-based applications
Intel Software Development Products for Intel® Personal Internet Client Architecture processors, Pentium® M, Pentium® 4, Intel® Xeon™ and Itanium® 2 Processors

Intel® Software Development Products
• Performance
  – Enable developers to deliver higher performance software
• Compatibility
  – Compatible with the leading tools and development environments already used by many software developers
  – Easy to incorporate into the development process
• Support
  – Premier Customer Support
  – Technical training offered through Intel Software College

Intel Compilers

Compilers for Intel PCA, Intel® 32-bit, EM64T & Itanium® 2 Processors
• Intel compilers for the Intel PCA processor line support Intel® Wireless MMX™ technology
• Intel 32-bit processor support: SSE3, Intel NetBurst® microarchitecture, Hyper-Threading Technology
• Itanium® 2 processor support: software pipelining, improved branch prediction, branch reduction through predication
• Advanced optimization features of Intel compilers
  – Profile Guided Optimization, Inter-Procedural Optimization
  – Parallelism: auto-parallelization, vectorization, OpenMP* support
  – Data prefetching
  – Processor dispatch on IA-32 processors
• Intel® Premier Support: compiler updates, support, expertise, customer interaction via compiler forums, architectural information, white papers and more

Intel Compilers: Optimize for Specific Processors
Instruction Scheduling
  – Schedules instructions optimally for a specific processor
  – How? On Windows*: /G1, /G2, /G5, /G7…
Build Target for a Specific Processor
  – Uses processor-specific opcodes and features of the target processor, such as SSE, SSE2 and vectorization
  – Runs only on the target processor
  – How? On Windows*: /QxK, /QxW, /QxB…
Automatic Processor Dispatch
  – Runs on all x86 processors
  – How? On Windows*: /QaxK, /QaxW, /QaxB…

Intel Compilers: High-Level Optimizations
High-Level Optimizer
  – Performs loop-level optimizations, aids optimal memory access
  – How? On Windows*: /O3
Inter-Procedural Optimization
  – Enables inter-procedural optimizations for single or multiple files
  – How? On Windows*: /Qip, /Qipo
Profile Guided Optimization
  – Uses execution-time feedback to guide optimization
  – Aids paging, branch prediction, basic block reordering
  – How? On Windows*: /Qprof_gen, /Qprof_use

Intel Compilers: Using Parallel Programming Directives
Auto-Parallelization
  – Automatically converts loops to use multiple processors
  – How? On Windows*: /Qparallel
OpenMP Support
  – Intel compilers support multi-platform shared-memory parallel programming in C/C++ and FORTRAN on all platforms & OS
  – How? On Windows*: /Qopenmp
OpenMP usage example
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        dy[i] = dy[i] + da*dx[i];
    }
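
The slide's OpenMP loop can be wrapped into a complete function. The following is a minimal sketch, not code from the deck: the function name `daxpy` and its signature are assumptions made for illustration. When compiled without an OpenMP switch the pragma is ignored and the loop simply runs serially, which gives the same result.

```c
#include <stddef.h>

/* Hypothetical helper (name and signature are illustrative):
 * computes dy[i] = dy[i] + da * dx[i] for every i in [0, n).
 * Each iteration reads and writes only its own element, so there is
 * no cross-iteration dependence and the loop can be split across threads. */
void daxpy(size_t n, double da, const double *dx, double *dy)
{
    long i; /* signed loop index, as required by older OpenMP versions */

    #pragma omp parallel for
    for (i = 0; i < (long)n; i++) {
        dy[i] = dy[i] + da * dx[i];
    }
}
```

Calling `daxpy(4, 2.0, x, y)` with `x = {1, 2, 3, 4}` and `y = {10, 20, 30, 40}` leaves `y` as `{12, 24, 36, 48}` whether or not OpenMP is enabled.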

Intel® Code Coverage Tool
Example of code coverage summary for a project. The workload applied in this test exercised 34 of 143 blocks, representing 5 of 19 functions in 2 of 3 modules. In the file SAMPLE.C, 4 of 5 functions were exercised.
Clicking on SAMPLE.C produces a listing that highlights the code that was exercised. In this example, the pink-highlighted code was never exercised, the yellow was run but not exercised by any of the tests set up by the developer, and the beige was partially covered.

Intel® Test Prioritization Tool
• Helps guide and speed software testing
  – Helps produce better code more quickly
  – Helps improve programmer productivity
• Example:
  – These 3 tests achieve 52.17% block and 50.00% function coverage
  – Test 3 alone covers 45.65% of basic blocks, or 87.50% of the total block coverage from all tests
  – By adding Test 2, cumulative block coverage goes to 52.17%, or 100% of the total block coverage of Test 1, Test 2, and Test 3
  – Eliminating Test 1 has no negative impact on block coverage and saves time
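
The idea behind the example (run the test that adds the most coverage first, and drop tests that add nothing) can be sketched as a greedy selection. This is an illustration only: the slides do not describe the algorithm the Intel tool uses, and the bitmask representation, `count_blocks` and `prioritize_tests` are hypothetical names introduced here.

```c
/* Count set bits; in this sketch each bit stands for one covered basic block. */
static int count_blocks(unsigned mask)
{
    int n = 0;
    while (mask) {
        n += (int)(mask & 1u);
        mask >>= 1;
    }
    return n;
}

/* Greedy ordering (an assumption, not the documented Intel algorithm):
 * repeatedly pick the test that covers the most blocks not yet covered.
 * coverage[t] is the block bitmask of test t. The chosen test indices are
 * written to order[], and the number of tests kept is returned.
 * Tests that add no new blocks are left out. */
int prioritize_tests(const unsigned *coverage, int num_tests, int *order)
{
    unsigned covered = 0u;
    int kept = 0;

    for (;;) {
        int best = -1, best_gain = 0, t;
        for (t = 0; t < num_tests; t++) {
            int gain = count_blocks(coverage[t] & ~covered);
            if (gain > best_gain) {
                best_gain = gain;
                best = t;
            }
        }
        if (best < 0) {
            break; /* no remaining test adds coverage */
        }
        covered |= coverage[best];
        order[kept++] = best;
    }
    return kept;
}
```

With three hypothetical tests whose masks are `0x03`, `0x30` and `0x0F`, the third test is picked first, the second adds two more blocks, and the first is dropped because everything it covers is already covered. This mirrors the slide, where Test 1 can be eliminated without losing block coverage.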

Intel® Compilers 8.1
• C++ and Fortran
• IA-32, Intel® Itanium® 2, EM64T & Intel® PCA processor-based systems
• Intel® Code-Coverage & Intel® Test-Prioritization tools
• Threaded application support (Hyper-Threading Technology)
  – OpenMP* 2.0 standard support
  – Auto-Parallel feature that automatically generates threaded code
• Windows specific:
  – Integrates into MS Visual Studio.NET* IDE
  – Support for MSVC.NET* language features (no support for C# or managed code)
  – Compaq Visual Fortran* language features with Intel code generation and optimization technology

Intel VTune Performance Analyzer

Performance Tuning
• Detecting common issues
  – Where to add threads, what to optimize?
  – Load imbalance?
  – Wait, blocked, or idle time?
  – Excessive overhead?
  – Processor architecture issues?
  – Application issues?
No particular order: address issues as needed

Intel® VTune™ Performance Analyzer
• VTune analyzer’s intimate knowledge of the processor enables it to provide extensive insights into how software utilizes CPU resources
• Allows you to identify and locate performance bottlenecks in your code
  – Collects and displays software performance data
  – Features that help you identify and address performance issues:
    • Sampling that uses non-intrusive technologies
    • Call Graph that displays graphically the program’s flow of control
    • Analyzer that has detailed knowledge of the processor’s microarchitecture
    • Intel Tuning Assistant that suggests optimization techniques for your Windows code
“The Intel VTune Performance Analyzer took a multi-day task and turned it into a sub-day task.”
– Randy Camp, V.P. Software Research and Development, MUSICMATCH, Inc.

Sampling – Identifying Performance Bottlenecks
• “Sample” the CPU’s execution context
• As the program runs, gather occasional CPU context snapshots triggered by the CPU’s performance monitoring registers
  – Interrupt-based sampling using CPU registers
  – Low intrusion: doesn’t change performance of the software
  – No special builds required
• Sample rate set to provide statistically meaningful data
  – Based on CPU clock speed or can be auto-calibrated
• Can measure performance-sensitive CPU events
  – Cache misses, branch mispredictions, etc.

How to Use Intel VTune Performance Analyzer
Build the application
  – Build the application in Release mode with compiler optimizations
Find “Hotspots” using VTune
  – A “hotspot” in an application or a system is a section of code where there is a significant amount of activity.
  – Finding hotspots helps you determine which compiler and code optimizations are needed to improve performance.
Symbols required for VTune Analyzer
  – Required Intel compiler switch (on Windows*): /Zi

Start New Project Using Sampling Wizard (Intel VTune Performance Analyzer)
[Screenshot with callouts: Select Application Type to Profile; Select Application to Launch]

Understanding VTune Interface
[Screenshot with callouts:]
• Choose Project/Activity/Run
• Different Views
• System-wide performance data
• Most Instructions Retired
• Statistics Summary
• Events Measured
• Sampling Analysis
• Per CPU Analysis
• Status Output

Hotspot Drill Down
[Screenshot with callouts: Function Statistics; LINPACK performance data; Symbols required for Hotspot Drill-down; Events Measured]
Is this the Hotspot? More analysis needed. Use the VTune Call Graph feature to obtain flow info!

Source Level View
[Screenshot with callouts: “Hotspot” source; Efficiency (CPI); View Assembly]

Using Sampling & Call Graph Together
Why?
• Use sampling to find which functions have hotspots.
• Use call graph to find out who is calling these functions.

What Are Users Saying
“SGI develops applications for its computers that employ many levels of parallelism, demanding the highest level of performance. The VTune Performance Analyzer for Windows provided invaluable insights to the correction of performance bottlenecks in these applications at the process, thread, and basic block levels.”
– Arthur Raefsky, Technical Lead, SGI, Mountain View, CA

Intel Threading Tools

Threads Defined (Threading Overview)
• OS creates a process for each program loaded
  – Each process executes as a separate thread
• Additional threads can be created within the process
• All threads share code and data
  – Each thread has its own Stack and Instruction Pointer
[Figure: a Process holding shared Code and Data, plus thread1(), thread2(), … threadN(), each with its own Stack and IP]

Amdahl’s Law (Threading Overview)
If only 1/2 of the code is parallel, 2X speedup is unlikely
P = parallel portion of process
N = number of processors (cores)
O = parallel overhead time
[Figure: bar diagram of total time T with portions labeled P and (1-P)]
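
The slide's claim can be checked numerically. The sketch below uses the common textbook form of Amdahl's Law with an added overhead term, speedup = 1 / ((1 - P) + P/N + O). This is an assumption: the formula on the original slide did not survive extraction, and here O is expressed as a fraction of the serial run time. The function name `amdahl_speedup` is illustrative.

```c
/* Estimated speedup under Amdahl's Law (textbook form, assumed here).
 *   p        : fraction of the serial run time that can run in parallel (0..1)
 *   n        : number of processors (cores), n >= 1
 *   overhead : parallel overhead as a fraction of the serial run time
 *              (assumption: the slide does not say how O is normalized) */
double amdahl_speedup(double p, int n, double overhead)
{
    /* Serial run time is normalized to 1.0. */
    double parallel_time = (1.0 - p) + p / (double)n + overhead;
    return 1.0 / parallel_time;
}
```

For P = 0.5 and N = 2 with no overhead, the parallel run takes 0.5 + 0.25 = 0.75 of the serial time, a speedup of about 1.33X rather than 2X. Any overhead lowers it further.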

Correctness Bugs: Data Races (Threading Overview: Challenges Unique to Threading)
• Suppose: a = 1, b = 2
    Thread1: x = a + b
    Thread2: b = 42
• What is the value of x if:
  – Thread1 runs before Thread2? x = 3
  – Thread2 runs before Thread1? x = 43
• Data race: concurrent read, modify, write of the same address
Outcome depends on thread execution order
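
The two outcomes on this slide can be reproduced deterministically by replaying each ordering on a single thread. This is only an illustration of why the result depends on scheduling. The code below contains no real race, and the function names are hypothetical.

```c
/* Ordering 1: Thread1's statement runs before Thread2's. */
int x_if_thread1_first(void)
{
    int a = 1, b = 2, x;
    x = a + b;   /* Thread1: reads the original b */
    b = 42;      /* Thread2: too late to affect x */
    (void)b;     /* b is not used again in this replay */
    return x;
}

/* Ordering 2: Thread2's statement runs before Thread1's. */
int x_if_thread2_first(void)
{
    int a = 1, b = 2, x;
    b = 42;      /* Thread2: overwrites b first */
    x = a + b;   /* Thread1: reads the updated b */
    return x;
}
```

The first function returns 3 and the second returns 43, matching the two values on the slide. In a real threaded program nothing selects between these orderings, which is what makes the bug intermittent.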

Solving Data Races: Synchronization (Threading Overview: Challenges Unique to Threading)
    Thread1:            Thread2:
      Acquire(L)          Acquire(L)
      a = 1               b = 42
      b = 2               Release(L)
      x = a + b
      Release(L)
• Acquisition of mutex L ensures atomic access
  – Only one thread can hold the lock at a time
• Example APIs:
  – EnterCriticalSection(), LeaveCriticalSection()
  – pthread_mutex_lock(), pthread_mutex_unlock()
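
A minimal sketch of this slide using the Pthreads calls it names. The variable and function names (`lock_l`, `thread1`, `thread2`, `run_synchronized`) are chosen for illustration. Because Thread1 assigns `b = 2` and computes `x = a + b` inside one critical section, Thread2's write cannot land between them.

```c
#include <pthread.h>
#include <stddef.h>

/* Shared state protected by mutex L (names are illustrative). */
static pthread_mutex_t lock_l = PTHREAD_MUTEX_INITIALIZER;
static int a, b, x;

static void *thread1(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_l);     /* Acquire(L) */
    a = 1;
    b = 2;
    x = a + b;                       /* b cannot change while L is held */
    pthread_mutex_unlock(&lock_l);   /* Release(L) */
    return NULL;
}

static void *thread2(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_l);     /* Acquire(L) */
    b = 42;
    pthread_mutex_unlock(&lock_l);   /* Release(L) */
    return NULL;
}

/* Runs both threads to completion and returns x, or -1 if a thread
 * could not be created. */
int run_synchronized(void)
{
    pthread_t t1, t2;

    if (pthread_create(&t1, NULL, thread1, NULL) != 0) {
        return -1;
    }
    if (pthread_create(&t2, NULL, thread2, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return x;
}
```

`run_synchronized()` returns 3 regardless of which thread acquires the lock first. The final value of `b` still depends on ordering (2 or 42), so the mutex fixes the race on `x` without making the whole program order-independent.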

Performance Penalty: Synchronization (Threading Overview: Challenges Unique to Threading)
• Thread blocked waiting for mutex
  – Thread not running, so no parallelism
• Mutex release and acquire take time
  – Release marks mutex free
  – Acquire must check for free
    • If free, mark as in use
    • If not free, thread put to sleep
  – Costs context switch out and in of processor

Problem Statement (Threading Overview)
• Developing threaded applications is hard
• A new class of problems is caused by the interaction between concurrent threads
  – Correctness problems (data races, deadlocks, etc.)
  – Performance problems (contention, imbalance, etc.)

Software Development Cycle (Scope of the Tools)
Introduce Threads
  – Intel® Performance Libraries: IPP and MKL
  – OpenMP* (supports incremental threading)
  – Explicit threading (Win32*, Pthreads*)
Debug for correctness
  – Intel® Thread Checker
  – Intel Debugger
Tune for performance
  – Thread Profiler
  – VTune™ Performance Analyzer

Intel® Software Development Products
• Intel® Thread Checker and Thread Profiler
• VTune™ Performance Analyzer
  – Prerequisite for Intel® Threading Tools
  – VTune analyzer has thread support
• Intel® Compilers support OpenMP* and the Threading Tools
  – More detailed results are generated with the Intel compilers
• Intel Performance Libraries are thread safe
  – Many functions are threaded

Common Threading Errors/Bugs
• Race conditions
  – Unprotected concurrent access to shared variables by multiple threads
  – Most common error
• Deadlocks
  – Multiple threads waiting on resources that are held by other threads
• Thread stalls
  – Threads waiting on resources indefinitely
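
One common way to avoid the deadlock pattern described here is to make every thread acquire locks in the same global order. The sketch below is a general technique, not something the slides prescribe. `account_t`, `account_init` and `transfer` are hypothetical names, and ordering by address is just one possible convention.

```c
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical shared resource guarded by its own mutex. */
typedef struct {
    pthread_mutex_t lock;
    long balance;
} account_t;

void account_init(account_t *acc, long balance)
{
    pthread_mutex_init(&acc->lock, NULL);
    acc->balance = balance;
}

/* Moves amount between two accounts. Both locks are always taken in
 * address order, so two threads transferring in opposite directions
 * cannot each hold one lock while waiting for the other. */
void transfer(account_t *from, account_t *to, long amount)
{
    account_t *first = from;
    account_t *second = to;

    if (from == to) {
        return; /* nothing to move, and locking the same mutex twice would hang */
    }
    if ((uintptr_t)from > (uintptr_t)to) {
        first = to;
        second = from;
    }

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    from->balance -= amount;
    to->balance += amount;
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}
```

Without the ordering step, one thread calling `transfer(&p, &q, ...)` and another calling `transfer(&q, &p, ...)` could each take its first lock and then wait forever for the second.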

Intel® Thread Checker Intro
• Identifies threading bugs in applications threaded with:
  – Windows* threads on Windows* systems
  – OpenMP* on Windows* systems
• Plugs into VTune™ environment
  – Windows* for IA-32 systems

Intel® Thread Checker Analysis
• Dynamic monitoring as software runs
  – Data (workload)-driven execution
• Includes monitoring of:
  – Thread and Sync APIs used
  – Thread execution order
    • Scheduler impacts results
  – Memory accesses between threads
Only the executed code path is analyzed

Thread Checker Usage (Intel® Thread Checker)
• Dynamic correctness tool
  – Dataset selection is important
    • Must touch all code paths
  – Multiple runs exercising different data paths yield best results
  – Use a small data set for each path
    • Monitoring of all memory references is time consuming

Starting Thread Checker (Intel® Thread Checker)
• Start VTune™ Performance Analyzer
[Screenshot with numbered steps 1 and 2]

Diagnostics List (Intel® Thread Checker)

Location in Source Code (Intel® Thread Checker)
Each entry in the diagnostics list links to its source code line(s)

Common Performance Issues
• Parallel overhead
  – Due to thread creation, scheduling, etc.
• Synchronization
  – Excessive use of global data, contention for the same synchronization object
  – Implicit synchronization
• Load balance
  – Improper distribution of parallel work
• Granularity
  – Insufficient parallel work

Thread Profiler (Intel® Threading Tools)
• Plugs in to the VTune™ performance environment
• Identifies performance issues in OpenMP* or unstructured threaded applications that use Win32* threads
• Pinpoints performance bottlenecks that directly affect execution time
• Uses binary instrumentation technology

Thread Profiler (Intel® Threading Tools)
• Uses critical path analysis
• Provides a breakdown of execution time along the critical path
  – Provides insight into system utilization
    • Under-subscribed vs. over-subscribed
  – Thread state transitions
    • Blocked->Running, call stack information
Allows comparison of multiple runs

Execution Flows and Critical Path (Intel® Threading Tools: Thread Profiler)
• Multiple execution flows in applications
• Flow splits when a thread creates new threads or signals another thread to continue
• Flow ends when a thread stalls or terminates
• Longest flow is the critical path
[Figure: timeline of Thread 1, Thread 2 and Thread 3 over T0 to T15, annotated with “Acquire lock L”, “Wait for L”, “Release L” and “Wait for Threads 2 & 3”, with the critical path marked]

Why Use Critical Path? (Intel® Threading Tools: Thread Profiler)
• Goal is to shorten the execution time
• Shorten the critical path and you shorten the total execution time
• Events recorded are events that impact the critical path
  – Lock/Unlock
  – Thread creation, suspension, resume, termination
  – Blocking calls, external events

Critical Path Analysis (Intel® Threading Tools: Thread Profiler)
• System utilization
  – Idle, serial, parallel and oversubscribed
  – This is relative to the system the application is running on
• Time categories along critical path (CP)
  – Cruise, overhead, blocking and impact time
• Resulting view is a combination of utilization and execution time along CP

System Utilization (Thread Profiler: Critical Path Analysis)
• Examines processor utilization to determine parallel activity of the application
• Concurrency is the number of threads that are active
[Figure: timeline of Thread 1, Thread 2 and Thread 3 over T0 to T15, annotated with “Acquire lock L”, “Wait for L”, “Release L” and “Wait for Threads 2 & 3”. Legend: Idle, Serial, Parallel, Under-subscribed, Over-subscribed. Categorization shown for a system configuration with 2 processors.]
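
The legend categories can be read as a function of concurrency and processor count. The thresholds below are an assumption based on the category names and the note that the categorization is relative to a 2-processor system. The slides do not spell out the exact rules, and `utilization_t` and `classify_utilization` are hypothetical names.

```c
typedef enum {
    UTIL_IDLE,             /* no thread active */
    UTIL_SERIAL,           /* exactly one thread active */
    UTIL_UNDER_SUBSCRIBED, /* more than one thread, fewer than processors */
    UTIL_PARALLEL,         /* as many active threads as processors */
    UTIL_OVER_SUBSCRIBED   /* more active threads than processors */
} utilization_t;

/* Classifies one instant of execution. The thresholds are assumed for
 * illustration and are relative to the machine the application runs on. */
utilization_t classify_utilization(int active_threads, int num_processors)
{
    if (active_threads <= 0) {
        return UTIL_IDLE;
    }
    if (active_threads == 1) {
        return UTIL_SERIAL;
    }
    if (active_threads < num_processors) {
        return UTIL_UNDER_SUBSCRIBED;
    }
    if (active_threads == num_processors) {
        return UTIL_PARALLEL;
    }
    return UTIL_OVER_SUBSCRIBED;
}
```

On the 2-processor configuration used in the figure, three runnable threads count as over-subscribed. The same three threads on a 4-processor system would count as under-subscribed, which is why the slides stress that utilization is relative to the system.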

Execution Time Categories (Thread Profiler: Critical Path Analysis)
• Analyze critical path by “colorizing” the time spent along it
• Associate spans of time with the objects that caused the critical path transitions
[Figure: timeline of Thread 1, Thread 2 and Thread 3 over T0 to T15, annotated with “Acquire lock L”, “Wait for L”, “Release L” and “Wait for Threads 2 & 3”. Legend: Cruise time, Overhead, Blocking time, Impact time.]

Critical Path View (Thread Profiler: Critical Path Analysis)
• Start with the critical path
• Break down by system utilization
• Add overhead
• Further categorize by behavior
[Figure: timeline of Thread 1, Thread 2 and Thread 3 over T0 to T15, with the resulting Critical Path View plotted against time. Utilization legend: Idle, Serial, Parallel, Under-subscribed, Over-subscribed. Time legend: Cruise time, Overhead, Blocking time, Impact time. Categorization shown for a system configuration with 2 processors.]

Thread Profiler Views (Intel® Threading Tools: Thread Profiler)
• Critical Path View
  – Shows breakdown of the critical path
• Profile View
  – Shows the breakdown of selected critical paths
  – User can select other views of the selected profile
  – Concurrency level, threads, objects, etc.
• Timeline View
  – Shows thread activity and critical path transitions for the entire application
• Source View
  – Transition source view, creation source view

Intel® Thread Checker
• Locates threading bugs:
  – Data races (storage conflicts)
  – Deadlocks (potential and actual)
• Isolates bugs to source code line
• Describes possible causes of errors and suggests resolutions
• Categorizes errors by severity level
• Identifies threading bugs in applications threaded with:
  – Windows* threads on Windows* systems
  – OpenMP* on Windows* systems
• Plugs into VTune™ environment
  – Windows* for IA-32 systems

Thread Profiler 2.1
• Plugs in to the VTune™ performance environment
• Identifies performance issues in OpenMP* or unstructured threaded applications that use Win32* threads
• Pinpoints performance bottlenecks that directly affect execution time
• Uses binary instrumentation technology

Intel Software College

Intel® Software College
• High-quality training by expert trainers worldwide
  – Take advantage of the latest Intel processors, platforms, tools and technologies
• Flexible training offerings
  – On-line, on-site, or at an Intel facility
• Classroom-based or online, self-paced or custom course offerings
Visit the Intel Software College website:
“I attended the VTune and Compiler courses at the ISC … I am able to apply what I learned at the ISC to optimizing applications that matter to my company's business. The ISC courses were probably the best that I have had as a professional in terms of delivering on what they said they would teach.”
– Keith Fish, ISV Technical Consultant, Hewlett-Packard Company

Intel Premier Support
• Every purchase of an Intel software development product includes a year of support services
• Provides access to Intel® Premier Support and all product updates during that time
• Premier Support includes online access to Intel’s Premier Support website
  – Primary support for all Intel Software products
  – Issue submission & tracking
  – Product updates & related downloads
  – FAQs & other proactive notices
  – 128-bit encrypted communication protects confidentiality
  – Dedicated expert staff review submissions and respond within 4 Intel business hours
“Registering for support was easy, and we value the security of knowing that Intel is there to help, even though we haven’t needed it so far.”
– Rob Hoffmann, Director of Marketing, NewTek, Inc.

Intel® Software Development Products
From Supercomputers to Cell Phones, Intel Software Development Products Enable Application Development Across Intel Processors
[Table: product availability matrix; the cell layout did not survive extraction. Recoverable labels: Compilers, Performance Analyzers, VTune™ Performance Analyzer, Libraries, Math Kernel Library, Integrated Performance Primitives, Threading Tools, Thread Checker, Cluster Tools, Trace Analyzer / Collector, Debuggers. Platforms: MS Windows*, Win CE, Palm*, Symbian*, Nucleus*. Cell values: C++, Fortran, NA. Legend: Shipping, Future.]

Next Steps
• Evaluate the Products
  – Download at:
• Contact Vivek Venkatesh with questions