*Pentium is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States and other countries Performance Monitoring.

Presentation transcript:

Performance Monitoring on Pentium® 4* Processor
IA-32 Performance Architect

Outline
  – Pentium® 4* Processor Performance Monitoring features
  – Implementation
  – How Intel® uses Performance Monitors
  – Limitations
  – Open issues

Feature Overview
Counters
  – bit programmable counters
Events
  – 45 events in various parts of the machine
Counter increment control
  – qualification by current privilege level (OS, user)
  – qualification by hardware thread ID
  – edge detection
  – threshold comparison
  – interrupt on counter overflow
Interface (x86 instructions to set/read counters)
  – WRMSR (write model-specific register)
  – RDMSR (read model-specific register)
  – RDPMC (read performance-monitoring counter)
  – RDTSC (read time-stamp counter)
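
The increment controls listed above can be illustrated with a small software model. This is a simplified sketch, not the hardware's actual behavior: the class name `SimCounter`, the 40-bit default width, and the exact threshold and edge semantics (count a cycle when the event count exceeds the threshold; with edge detection, count only the cycles where that condition becomes true) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimCounter:
    """Toy model of one programmable counter (all semantics are assumptions)."""
    width: int = 40                  # assumed counter width in bits
    count_os: bool = True            # count events seen at ring 0
    count_user: bool = True          # count events seen at other rings
    threshold: Optional[int] = None  # None means: add the raw event count
    edge: bool = False               # count only false-to-true transitions
    value: int = 0
    overflow_pending: bool = False   # stands in for the overflow interrupt
    _prev_hit: bool = False

    def tick(self, events: int, ring: int) -> None:
        """Advance one cycle in which `events` events occurred at `ring`."""
        qualified = self.count_os if ring == 0 else self.count_user
        if not qualified:
            events = 0
        if self.threshold is None:
            increment = events
        else:
            hit = events > self.threshold
            increment = int(hit and not (self.edge and self._prev_hit))
            self._prev_hit = hit
        total = self.value + increment
        if total >> self.width:
            self.overflow_pending = True
        self.value = total & ((1 << self.width) - 1)


def run(counter: SimCounter, trace) -> int:
    """Feed a list of (events, ring) cycles to the counter; return its value."""
    for events, ring in trace:
        counter.tick(events, ring)
    return counter.value


# Hypothetical six-cycle trace: (events this cycle, privilege ring).
TRACE = [(0, 3), (2, 3), (3, 3), (0, 3), (1, 0), (2, 3)]
```

Running the same trace through differently configured counters shows how each control changes what is counted: user-only qualification drops the ring-0 cycle, a threshold turns the counter into a count of "busy" cycles, and edge detection counts how many times a busy period starts.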

Features Overview, cont.
Cascading
  – Second counter begins counting when first counter overflows
  – For instance, to measure cycles elapsed after the first counter overflowed
Tagging
  – Used to get non-speculative event counts
  – Tags micro-ops when they incur an event
  – Counts tagged micro-ops at retirement
  – Three tagging mechanisms: front-end, execution, and replay
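
Both ideas on this slide can be sketched in a few lines of software. The model below is illustrative only: the function names, the `MicroOp` record, and the small counter width are hypothetical, and real hardware differs in detail. `run_cascade` starts a second counter (counting cycles) once the first overflows, and `count_events` contrasts a speculative count with a count of tagged micro-ops that actually retire.

```python
from typing import NamedTuple


def run_cascade(event_stream, first_preload, width=8):
    """Return (cycle of first overflow, cycles counted by the second counter).

    event_stream holds the number of events in each cycle. The second
    counter stays idle until the first one overflows, then counts cycles.
    """
    limit = 1 << width
    first = first_preload
    second = 0
    second_enabled = False
    overflow_cycle = None
    for cycle, events in enumerate(event_stream):
        if second_enabled:
            second += 1
        first += events
        if first >= limit:
            first -= limit
            if not second_enabled:
                second_enabled = True
                overflow_cycle = cycle
    return overflow_cycle, second


class MicroOp(NamedTuple):
    pc: int
    had_event: bool   # the micro-op was tagged because it incurred the event
    retired: bool     # False for micro-ops squashed on a mispredicted path


def count_events(uops):
    """Return (speculative count, count of tagged micro-ops at retirement)."""
    speculative = sum(u.had_event for u in uops)
    at_retirement = sum(u.had_event and u.retired for u in uops)
    return speculative, at_retirement
```

Counting at retirement excludes events caused by micro-ops that were later squashed, which is why the second number returned by `count_events` can be lower than the first.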

Precise Event Based Sampling
Mechanism
  – User allocates a PEBS buffer in memory
  – User programs a counter to tag micro-ops and count them as they retire
  – When the counter overflows, the Pentium® 4 Processor's retirement logic forces a microcode assist just before the next tagged micro-op
  – Microcode assist copies the program counter and GPRs into the PEBS buffer in memory
Advantages
  – Precise: the sample is taken at the instruction that had the event
  – Enables the creation of data address profiles, which help locate cache lookup patterns and data relocation opportunities
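
The mechanism above can be mimicked in software to show which micro-ops end up in the buffer. This is a minimal sketch under stated assumptions: `pebs_sample`, its parameters, and the record layout are hypothetical, the counter is modeled as a negative preload that overflows on reaching zero, and the sampled micro-op is assumed to count toward the next sampling interval.

```python
def pebs_sample(retired_uops, sample_after, buffer_capacity):
    """Simulate precise sampling over a stream of retiring micro-ops.

    retired_uops: iterable of (pc, regs, tagged) tuples in retirement order.
    sample_after: number of tagged micro-ops between counter overflows.
    Returns the list of records written to the (bounded) sample buffer.
    """
    buffer = []
    counter = -sample_after   # preload so the counter overflows at zero
    armed = False             # True once the counter has overflowed
    for pc, regs, tagged in retired_uops:
        if not tagged:
            continue
        if armed and len(buffer) < buffer_capacity:
            # Stand-in for the microcode assist: capture state just before
            # this tagged micro-op, so the record points at the exact PC.
            buffer.append({"pc": pc, "regs": dict(regs)})
            armed = False
        counter += 1
        if counter == 0:
            armed = True
            counter = -sample_after
    return buffer
```

Because the record is taken at a tagged micro-op rather than at whatever instruction happens to be executing when an interrupt is delivered, every sampled PC belongs to an instruction that incurred the event.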

Implementation Overview

How Intel® Uses Performance Monitors
Intel® uses Performance Monitoring for:
  – Performance analysis
  – Compiler optimizations
  – System-level optimizations
  – Performance and functional debug
Many tools built for collecting and analyzing performance monitoring counters
  – Interval Sampler
  – Profiler

Performance Analysis
Interval Sampler
  – Gives the characteristics of the system
VTune™ Performance Analyzer
  – Event profiler
  – Gives the distribution of events for the system over the whole application run
  – Available at:
The Interval Sampler points out which events to look for; VTune™ event profiles then help find the function, basic block, or instruction pointers (IPs) that have the performance problem.
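
The profiling step described above, going from sampled IPs to the functions responsible, amounts to a histogram over a symbol table. The sketch below is a generic illustration, not how VTune™ is implemented; `profile_by_function` and the symbol-table format are assumptions.

```python
from bisect import bisect_right
from collections import Counter


def profile_by_function(sample_ips, symbols):
    """Attribute sampled instruction pointers to functions.

    symbols: list of (start_address, name) pairs sorted by start address.
    Each function is assumed to extend up to the next symbol's start.
    Returns (name, sample_count) pairs, most frequently sampled first.
    """
    starts = [start for start, _ in symbols]
    histogram = Counter()
    for ip in sample_ips:
        index = bisect_right(starts, ip) - 1
        name = symbols[index][1] if index >= 0 else "<unknown>"
        histogram[name] += 1
    return histogram.most_common()
```

The function at the top of the returned list is where the sampled event occurs most often, which makes it the natural starting point for tuning.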

Limitations
  – Not all counters can count all events.
  – With Hyper-Threading, the counters may get divided among the logical processors.
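
The first limitation means a tool must check whether a requested set of events can be measured in the same run. One way to do that is a small backtracking search over a compatibility table. The sketch below is illustrative only: `assign_counters` is hypothetical, and the event names and counter IDs in the usage are made up, not taken from the processor's documentation.

```python
def assign_counters(compat):
    """Assign each event to a distinct counter that is able to count it.

    compat: dict mapping event name -> set of counter IDs that support it.
    Returns a dict event -> counter ID, or None if no assignment exists.
    """
    # Place the most constrained events first to cut down on backtracking.
    events = sorted(compat, key=lambda event: len(compat[event]))
    assignment = {}
    used = set()

    def place(index):
        if index == len(events):
            return True
        event = events[index]
        for counter_id in sorted(compat[event]):
            if counter_id in used:
                continue
            used.add(counter_id)
            assignment[event] = counter_id
            if place(index + 1):
                return True
            used.remove(counter_id)
            del assignment[event]
        return False

    return assignment if place(0) else None
```

When the function returns `None`, the events cannot all be counted at once and have to be split across several runs. With counters divided among logical processors, the sets in `compat` shrink further.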

Open Questions
Centralized vs. distributed?
  – Distributed is simpler but less flexible
Add new events?
  – New usage models
  – Multicore / multithread scenarios
Feedback is welcome!