VTune: Intel’s Visual Tuning Environment

VTune: Intel’s Visual Tuning Environment
K. Sridharan, VTune Development Manager

What is VTune?
VTune is Intel’s performance tuning environment for Windows™ developers. VTune is now bundled with:
- Intel C/C++ and FORTRAN compilers
- Intel Performance Library Suite
- Intel Architecture tutorials, processor manuals and computer-based training materials
Monday, April 22, 2019 Intel Corporation

Overview of VTune
VTune is a performance tuning tool that lets Windows 95 and NT* developers:
- Monitor the performance of all active software.
- Identify “HotSpots” in a program and analyze its performance as it executes on an Intel Architecture microprocessor platform.
- Examine each instruction and uncover problems at the machine-code level.
- Optimize code using context-sensitive online tuning suggestions.

Time-Based Sampling
- A periodic interrupt drives the sampling. Interrupt sources can be VTD, RTC (real-time clock) or NMI* (non-maskable interrupt).
- Sample data is stored in a buffer until the buffer is full. Sampling is disabled while the buffer is being written out.
- Analyze the sampling data:
  - Each sample consists of the cs:eip address and the module name.
  - Addresses are matched with an application or OS routine.
- Sampling data is stored in a Microsoft Access 7.0 database and is easy to import into Excel.
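
The sampling loop described above can be sketched in C. This is a minimal illustration under stated assumptions, not VTune’s implementation: the names `Sampler`, `on_timer_interrupt`, `flush_samples` and `module_for` are hypothetical, the buffer holds only four samples so that the flush is easy to see, and the "flush" merely counts samples instead of writing them to a database. Attributing a sample to a module is a range check of its address against each module’s load address and size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of time-based sampling; not VTune's actual code. */

#define SAMPLE_BUF_CAP 4  /* deliberately tiny so the flush is visible */

typedef struct {
    uint16_t cs;   /* code segment selector at interrupt time */
    uint32_t eip;  /* instruction pointer at interrupt time */
} Sample;

typedef struct {
    Sample buf[SAMPLE_BUF_CAP];
    size_t count;
    int sampling_enabled;
    size_t flushed_total;  /* samples "written out" so far */
} Sampler;

/* Stand-in for writing the full buffer to storage. A real driver would keep
   sampling disabled until the write actually completes. */
static void flush_samples(Sampler *s) {
    s->sampling_enabled = 0;
    s->flushed_total += s->count;  /* pretend the write happened */
    s->count = 0;
    s->sampling_enabled = 1;
}

/* Called from the periodic interrupt handler with the interrupted cs:eip. */
static void on_timer_interrupt(Sampler *s, uint16_t cs, uint32_t eip) {
    if (!s->sampling_enabled) return;
    s->buf[s->count].cs = cs;
    s->buf[s->count].eip = eip;
    s->count++;
    if (s->count == SAMPLE_BUF_CAP) flush_samples(s);
}

typedef struct {
    const char *name;
    uint32_t base;  /* load address */
    uint32_t size;  /* image size in bytes */
} Module;

/* Attribute a sampled address to the module whose range contains it. */
static const char *module_for(const Module *mods, size_t n, uint32_t eip) {
    for (size_t i = 0; i < n; i++)
        if (eip >= mods[i].base && eip - mods[i].base < mods[i].size)
            return mods[i].name;
    return "unknown";
}
```

In a real sampling driver the handler runs in interrupt context and the module table comes from the operating system’s list of loaded modules; both are outside the scope of this sketch.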

Event-Based Sampling on Pentium® Pro Processor
- CPU events report on internal CPU states.
- Event sampling is enabled by the CPU and the APIC.
- VTune helps manage the choices:
  - Data cache misses
  - Partial stall counts
  - Branch statistics: mispredictions and total branches taken
  - Clock count statistics
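
A common way to sample on events rather than time is to preload an event counter with minus N, so that it overflows after N occurrences of the selected event; the overflow interrupt then records the current instruction address and the counter is re-armed. The sketch below simulates that scheme in software under stated assumptions: a 32-bit counter, and hypothetical names (`EventSampler`, `arm`, `on_event`, the sample-after value N) that are not VTune settings or hardware register names.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SAMPLES 16

/* Software model of counter-overflow sampling; names are hypothetical. */
typedef struct {
    uint32_t counter;       /* models a hardware event counter */
    uint32_t sample_after;  /* take one sample every N events */
    uint32_t samples[MAX_SAMPLES];  /* addresses recorded on overflow */
    size_t nsamples;
} EventSampler;

/* Preload with -N so the counter wraps to zero after exactly N events. */
static void arm(EventSampler *es) {
    es->counter = (uint32_t)0 - es->sample_after;
}

/* Called once per occurrence of the monitored event (e.g. a data cache miss);
   eip is the instruction address associated with that event. */
static void on_event(EventSampler *es, uint32_t eip) {
    es->counter++;
    if (es->counter == 0) {  /* overflow: real hardware raises an interrupt here */
        if (es->nsamples < MAX_SAMPLES) es->samples[es->nsamples++] = eip;
        arm(es);             /* re-arm for the next N events */
    }
}
```

With N = 3, a stream of ten events records the addresses of the 3rd, 6th and 9th events, so code that causes many events accumulates proportionally many samples.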

HotSpot Analysis
- From the system-wide view, you can zoom into a specific module of interest.
- Display all of the HotSpots in the module, organized by function, memory location, class name or source file.
- For each HotSpot, VTune displays the symbol name, address and number of samples collected.
- For detailed information, double-click a HotSpot to open a source code or assembly view.
The HotSpot view helps you identify the sections of your code that take the most CPU time and that may have performance problems.
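
Building a HotSpot list amounts to attributing each sampled address to the function that contains it and ranking functions by sample count. A minimal C sketch, assuming a symbol table sorted by start address with non-overlapping ranges; `Symbol`, `find_symbol`, `attribute_samples` and `top_hotspot` are hypothetical names. The lookup uses binary search, so attributing m samples against n symbols costs O(m log n).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table entry for one function. */
typedef struct {
    const char *name;
    uint32_t start;    /* first address of the function */
    uint32_t end;      /* one past the last address */
    unsigned samples;  /* filled in by attribute_samples */
} Symbol;

/* syms must be sorted by start and non-overlapping. Returns index or -1. */
static long find_symbol(const Symbol *syms, size_t n, uint32_t addr) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < syms[mid].start) hi = mid;
        else if (addr >= syms[mid].end) lo = mid + 1;
        else return (long)mid;
    }
    return -1;  /* address falls in a gap between known functions */
}

/* Count how many sampled addresses land in each function. */
static void attribute_samples(Symbol *syms, size_t n, const uint32_t *addrs, size_t m) {
    for (size_t i = 0; i < m; i++) {
        long k = find_symbol(syms, n, addrs[i]);
        if (k >= 0) syms[k].samples++;
    }
}

/* Index of the function with the most samples (the top HotSpot). */
static size_t top_hotspot(const Symbol *syms, size_t n) {
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (syms[i].samples > syms[best].samples) best = i;
    return best;
}
```

Samples that fall outside every known range stay unattributed here; a real tool would typically report them against the module or show them as raw addresses.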

Static Analysis
- Shows performance information by instruction.
- The target machine can be different from the host machine.
- Indicates instruction pairing on the Pentium® processor and decoder groups on the Pentium® Pro processor.
- Shows the penalties incurred by each instruction.
- Shows execution clocks or micro-ops.
- The disassembly fits into a Portal, which can be moved or enlarged. The size of the Portal affects the speed of analysis.

Dynamic Analysis
- Assists in pinpointing the issues identified by sampling or static analysis.
- Uses advanced simulator technology to run, trace and simulate your actual code in one step.
- Helps fine-tune performance attributes of key sections of your code, such as:
  - Cache behavior analysis
  - Branch prediction results
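
Cache behavior analysis rests on simulating a cache over the addresses your code touches. The sketch below models only the simplest case, a direct-mapped cache with assumed parameters (32-byte lines, 256 lines) that are not meant to describe any particular processor; VTune’s simulator is far more detailed, and `DirectMappedCache` and `cache_access` are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 32u   /* assumed line size */
#define NUM_LINES  256u  /* assumed: 256 lines x 32 B = 8 KB, direct-mapped */

typedef struct {
    uint32_t tag[NUM_LINES];
    unsigned char valid[NUM_LINES];
    unsigned long hits, misses;
} DirectMappedCache;

/* Simulate one access: each address maps to exactly one cache line. */
static void cache_access(DirectMappedCache *c, uint32_t addr) {
    uint32_t line = addr / LINE_BYTES;  /* which memory line */
    uint32_t index = line % NUM_LINES;  /* which cache slot it maps to */
    uint32_t tag = line / NUM_LINES;    /* distinguishes lines sharing a slot */
    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
    } else {
        c->misses++;                    /* fetch the line, evicting the old one */
        c->valid[index] = 1;
        c->tag[index] = tag;
    }
}
```

A unit-stride walk over 1 KB of 4-byte elements yields one miss per 32-byte line (32 misses, 224 hits), while alternating between two addresses exactly 8 KB apart misses every time, because both map to the same slot. That kind of conflict is what a cache analysis is meant to expose.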

VTune’s Code Coaches
The C and Fortran Code Coaches offer specific suggestions for modifications that improve performance. Examples:
- Loop interchange and loop-invariant code motion
- Converting scalar code to MMX™ code using intrinsic calls
- Traversing an array or list: binary search, hash tables
- Consider using MMX™ technology
- Consider using Intel’s Performance Libraries
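
As an illustration of the first suggestion, here is a hypothetical C routine before and after loop interchange and loop-invariant code motion; the function names are made up for the example. C stores arrays in row-major order, so making `j` the inner loop turns a strided walk into a unit-stride one, and since `scale * bias` does not change inside the loops, it can be computed once.

```c
#include <stddef.h>

#define N 4

/* Before: the inner loop walks down a column, so consecutive iterations touch
   addresses N * sizeof(double) apart, and scale * bias is recomputed on
   every iteration. */
static void scale_before(double a[N][N], double scale, double bias) {
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            a[i][j] = a[i][j] * (scale * bias);
}

/* After: loops interchanged so the inner loop is unit-stride, and the
   loop-invariant product hoisted out of both loops. */
static void scale_after(double a[N][N], double scale, double bias) {
    const double factor = scale * bias;  /* computed once */
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            a[i][j] *= factor;
}
```

Both versions compute the same values; only the memory access order and the amount of redundant arithmetic differ.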

Intel Compiler Strategy
- Intel’s compilers track Intel microprocessor improvements.
- Available as a plug-in extension to Microsoft Developer Studio.
Intel’s compiler optimizations and code generation are tuned to each new processor. This includes support for new instructions such as the Pentium® Pro (PP) CMOV and MMX™ instructions. This compiler technology is available as a drop-in product in Microsoft Developer Studio.

Compiler Performance Features
- Profile-guided optimizations
- Floating-point optimizations
- Floating-point alignment
- MMX™ technology
Today I’ll be talking about a few of the performance features in the Intel compilers.

Intel Performance Library Suite
- Signal Processing Library V3.0: DSP, filtering, transforms and telephony
- Recognition Primitives Library V3.0: voice/speech recognition and processing
- Image Processing Library V1.0 Beta: photo editing, enhancement and transforms
- Math Kernel Library V2.0: scientific and technical computation
Thank you for coming and listening to our presentation. My name is so and so. I am going to talk about the Intel Performance Library Suite, another software product offered by Intel. It is designed to speed up some of the computationally intensive operations in typical media and scientific applications. There are two different types of libraries we will talk about. The first set consists of three C-callable multimedia libraries optimized for MMX™ technology: the Signal Processing Library, the Recognition Primitives Library and the Image Processing Library. The second is the Math Kernel Library, a Fortran-callable floating-point library for scientific applications on workstations. Each library is designed to address certain media market segments. For example, we have been quite successful in getting speech vendors such as IBM, purespeech and NEC to use Intel’s splib and rplib in their voice recognition software, and companies such as MGI have already used iplib in their imaging products. You can visit our demo station to see some of these applications.

Why Use Intel’s Performance Libraries?
- Highly optimized for performance
- An alternative to in-house code development
- Stay current with the latest Intel architectures
- High-level programming language interface
- Reduced development effort
Why use the libraries? Our libraries are optimized at the assembly level, with tuned cache usage, to achieve the highest possible performance. Our goal is to provide an alternative to code development that lets developers access the performance benefits of the latest Intel architecture, specifically MMX™ technology, even before development tools are available. Basically, the performance libraries keep you current with present and future Intel architectures. As a developer, you can either read the new instruction and cache specifications for each new processor that comes out and start another cycle of hand-tuned assembly coding, OR just get on the performance library train for an optimized ride. You’ll stay current and optimized without recoding a single line. Or at least your competitors will. Our libraries are constantly updated to support new Intel microprocessors, and their function interfaces stay the same across all platforms. We provide a high-level programming language interface to our hand-tuned, optimized code, so that you can concentrate on your application. This can potentially eliminate some or all of your assembly coding effort. All of these benefits translate into less development effort and therefore faster time to market.

VTune Coming Attractions
VTune support for Java:
- First release is VTune 2.5
- View performance in every component, including Java class files
- Call graph
- Working with major vendors
- Beta this month

VTune Coming Attractions
Beta later this year:
- All OS events (Perfmon) and events over time
- Process and processor views
- Call graph
- Memory pattern analysis
- C++/ASM coach
- Dynamic analysis for the Pentium® II (PII)

Backup Slides

Key Features: Code Analysis of Binary Files
- Perform code analysis without having to build an executable and collect samples.
- Accepts .exe, .dll or .obj files.
- Zoom into the source code by double-clicking the desired function in the summary.
- If no source code is available, the binary is disassembled.

Source View
- C/C++, FTN, assembly or ?? source is displayed.
- Sample times are shown by source line.
- Source and disassembly can be intermixed.
- Any of these file types can supply symbol and line-number information: .SYM, .HDR, .DMP, .DBG, .MAP, .PDB, C7/CV (NB09), NB10, NB11, FB09, FB0A, DWARF 2.

Libraries Platform Support
- 32-bit operating systems (Microsoft* Win 95 and Microsoft* Win NT)
- Compiler support:
  - Multimedia libraries (DLLs and static libraries): Intel C/C++ Compiler plug-in, MSVC/C++ V4.x, Borland V5.0
  - Math Kernel Library (static libraries): Intel Fortran 77 Compiler, Microsoft PowerStation Fortran V4.0, Watcom Fortran 77 V10.6
The library suite is designed for 32-bit operating systems only, specifically Microsoft Win95 and Win NT. All the multimedia libraries are available in DLL and static library formats, which support the Intel, Microsoft and Borland C/C++ compilers. The Math Kernel Library currently comes only in static library form, which supports the Intel, Microsoft and Watcom Fortran compilers.

Multi-Media Library Features
- High performance: optimized, processor-specific DLLs and static libraries for:
  - Intel 486™ processor
  - Pentium® processor
  - Pentium® processor with MMX™ technology
  - Pentium® Pro processor
  - Pentium® II processor
- Scalable architecture: processor detection and processor-specific DLL loading
- C-callable programming interface
- Support for integer and floating-point data
- Custom DLL Builder for minimum memory footprint
Let’s move on to some of the common features of these three multimedia libraries, which together comprise more than 350 functions. We distribute a full set of DLLs and static libraries optimized for each processor listed here, from the Intel 486 to the latest Pentium II processor, which will be launched in a couple of months. The libraries implement a scalable architecture that contains code to detect which processor an application is currently running on and dynamically load the processor-specific DLL for optimal performance. Dynamic DLL loading works only with DLLs, not with static libraries. These libraries provide a C-callable function interface. Most library functions support both floating-point and integer data types, but only the integer functions can take advantage of MMX™ technology. We have also provided a utility that allows developers to custom-build their own DLLs; we will describe it in more detail on a later slide.
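
The detect-then-dispatch idea can be sketched with function pointers chosen once at startup. This sketch rests on clearly hypothetical assumptions: the detected processor is passed in as a `CpuLevel` value rather than queried (real detection would typically use the CPUID instruction where available), the "MMX-style" routine is plain C that merely processes four 16-bit elements per iteration, and all names are invented. A DLL-based library on Windows would instead load the processor-specific DLL with `LoadLibrary` and look up its entry points with `GetProcAddress`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical processor levels, mirroring the list on the slide. */
typedef enum {
    CPU_486, CPU_PENTIUM, CPU_PENTIUM_MMX, CPU_PENTIUM_PRO, CPU_PENTIUM_II
} CpuLevel;

typedef void (*AddVecFn)(int16_t *dst, const int16_t *a, const int16_t *b, size_t n);

/* Generic version: runs on any processor. */
static void add_vec_generic(int16_t *dst, const int16_t *a, const int16_t *b, size_t n) {
    for (size_t i = 0; i < n; i++) dst[i] = (int16_t)(a[i] + b[i]);
}

/* Stand-in for an MMX-optimized version: plain C handling four 16-bit
   elements per iteration to mimic a packed add, plus a scalar tail. */
static void add_vec_mmx_style(int16_t *dst, const int16_t *a, const int16_t *b, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = (int16_t)(a[i]     + b[i]);
        dst[i + 1] = (int16_t)(a[i + 1] + b[i + 1]);
        dst[i + 2] = (int16_t)(a[i + 2] + b[i + 2]);
        dst[i + 3] = (int16_t)(a[i + 3] + b[i + 3]);
    }
    for (; i < n; i++) dst[i] = (int16_t)(a[i] + b[i]);
}

/* Pick an implementation once, based on the detected processor. */
static AddVecFn select_add_vec(CpuLevel cpu) {
    switch (cpu) {
    case CPU_PENTIUM_MMX:
    case CPU_PENTIUM_II:
        return add_vec_mmx_style;  /* processors with MMX technology */
    default:
        return add_vec_generic;
    }
}
```

Only the Pentium with MMX technology and the Pentium II select the packed path here, since the Pentium Pro has no MMX instructions. Callers go through the selected pointer, so the application code is identical on every processor.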

Math Kernel Library
- Workstation technical computing library:
  - Basic Linear Algebra Subroutines (BLAS)
  - Single- and double-precision FFTs
- Main features:
  - Optimized kernels at every level of BLAS
  - Level 3 BLAS is multithreaded; performance scales on SMP systems
  - Optimized for the Pentium® Pro processor
  - Interfaces to several FORTRAN compilers
  - Static libraries; a DLL will be available
The Math Kernel Library is different from the previous three multimedia libraries. It has a Fortran programming interface and is targeted at technical computation on Intel workstation platforms. The Math Kernel Library implements an optimized version of the Basic Linear Algebra Subroutines (BLAS), a public standard covering vector and matrix operations. We have also included single- and double-precision FFTs for additional functionality. Our BLAS routines are highly optimized for the Intel Pentium Pro at both the source level and the assembly level. Level 3 BLAS is also multithreaded for scalable performance on SMP systems, and matrix operations are blocked to make optimal use of the cache memory. Currently, the Math Kernel Library is available only in static library form; DLL versions will be available in the future. For the availability date, please check our Web site for updates.
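
Cache blocking of matrix operations can be illustrated with a simplified matrix multiply. This C sketch computes C += A*B for square matrices in BS×BS tiles. It is a simplification under stated assumptions: it omits the alpha/beta scaling and transpose options of the BLAS DGEMM interface, uses row-major (C) storage whereas the Fortran BLAS interface is column-major, and uses a fixed block size of 4 where a tuned library would choose the block size from the cache size. `dgemm_blocked` is a hypothetical name, not an MKL entry point.

```c
#include <stddef.h>

#define BS 4  /* assumed block size; tuned libraries size tiles to the cache */

/* C += A * B for n x n row-major matrices, processed in BS x BS tiles so
   each tile of A, B and C is reused while it is still in cache. */
static void dgemm_blocked(size_t n, const double *A, const double *B, double *C) {
    for (size_t ii = 0; ii < n; ii += BS)
        for (size_t kk = 0; kk < n; kk += BS)
            for (size_t jj = 0; jj < n; jj += BS) {
                /* Clip tile bounds when n is not a multiple of BS. */
                size_t i_end = ii + BS < n ? ii + BS : n;
                size_t k_end = kk + BS < n ? kk + BS : n;
                size_t j_end = jj + BS < n ? jj + BS : n;
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = A[i * n + k];  /* reused across the j loop */
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += aik * B[k * n + j];
                    }
            }
}
```

Inside each tile the i-k-j loop order walks B and C along rows, so the innermost loop is unit-stride. The result matches a straightforward triple loop; only the order in which the partial products are accumulated changes.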