The PinPoints Toolkit for Finding Representative Regions of Large Programs Harish Patil Platform Technology & Architecture Development Enterprise Platform.

Presentation transcript:

The PinPoints Toolkit for Finding Representative Regions of Large Programs Harish Patil Platform Technology & Architecture Development Enterprise Platform Group Intel Corporation Presented as part of the Pin tutorial at ASPLOS 2004, Boston, MA 10/09/2004

ASPLOS’04 PinPoints
People
– PinPoints: Harish Patil, Robert Cohn, Mark Charney, Andrew Sun, Rajiv Kapoor, Anand Karunanidhi
– Pin: Robert Cohn, Artur Klauser, Geoff Lowney, CK Luk, Robert Muth, Harish Patil, Vijay Janapa Reddi, Steven Wallace
– Acknowledgements: Brad Calder, Michael Greenfield, Geoff Lowney, Joel Emer, Chris Weaver, Michael Adler, Kim Hazelwood, James Vash, Ram Ramanujam, Roger Golliver, Timothy Prince, Allan Knies, Youngsoo Choi, Nechama Katan, Chris Gianos, Hideki Saito, Mahesh Madhav …

Representative Regions of Programs
– Automatically chosen
– Validated (represent whole-program behavior)
– For trace-driven or execution-driven simulation
Pin (Intel) + SimPoint (UCSD)
Found and validated PinPoints for long-running programs (trillions of instructions) [IPF & x86]

Outline of the Talk
– Why PinPoints?
– PinPoints methodology: How to find and validate representative regions for simulation
– Reference: Paper in MICRO-37: “Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation”, Patil et al.
– PinPoints [download]

Motivation: Simulating Large Programs
Problem: Whole-program simulation is very slow (can take months)
Solution: Find representative simulation points
– Programs have phases: random/blind selection may miss them
– SimPoint approach: Find phases using the basic block profile; one simulation point (PinPoint) per phase
PinPoints: < 1% of program execution
– Capture whole-program behavior

Motivation: Simulating Large Programs (continued)
Problem: Porting programs to simulators is often not practical
– License issues
– Extra resources (disks etc.)
Solution: Drive simulation from the native environment
– Run under Pin
Pin runs programs “out-of-the-box” (no porting required)

The PinPoints Methodology
Phase Detection + PinPoint Selection
– isimpoint: Generate dynamic basic block profile
– SimPoint tools: Analyze basic block profile to find phases
– Scripts: Generate PinPoints files
PinPoints file
– H/W counters-based validation: sample counters for the whole program and compute the weighted sum for PinPoints. Match?
– Trace generation/simulation

Phases in gzip’s Execution
[Figure: plots against instructions executed of performance (IPC), energy used per interval, instruction cache misses, data cache misses, 2nd-level cache misses, and branch mispredictions]

SimPoint: You are what you execute
Goal: Track the behavior of a program
– Behavior is caused by the path through the code
How: Track the code that is executing
– Detect changes and similarities in code
Basic Block Distribution Analysis
– Generate and compare code signatures
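As a rough illustration of generating and comparing code signatures, the sketch below builds one basic block vector per slice from a hypothetical trace of (block id, instruction count) pairs and compares two vectors with Manhattan distance. The function names, the trace format, and the tiny slice size are assumptions made for illustration; this is not isimpoint's implementation or output format.

```python
from collections import Counter

def build_bbvs(trace, slice_size):
    """Split a trace of (block_id, instruction_count) pairs into slices of
    about slice_size instructions; return one normalized vector per slice."""
    vectors = []
    counts = Counter()
    executed = 0
    for block_id, n_insts in trace:
        # Weight each block by the instructions it contributes to the slice.
        counts[block_id] += n_insts
        executed += n_insts
        if executed >= slice_size:
            vectors.append({b: c / executed for b, c in counts.items()})
            counts = Counter()
            executed = 0
    if executed:  # keep a trailing partial slice
        vectors.append({b: c / executed for b, c in counts.items()})
    return vectors

def manhattan(u, v):
    """Manhattan distance between two sparse vectors stored as dicts."""
    return sum(abs(u.get(k, 0.0) - v.get(k, 0.0)) for k in set(u) | set(v))

# Hypothetical trace: a phase looping over blocks A and B, then one over C and D.
trace = [("A", 4), ("B", 6)] * 5 + [("C", 5), ("D", 5)] * 5
bbvs = build_bbvs(trace, slice_size=50)
print(len(bbvs), manhattan(bbvs[0], bbvs[1]))
```

Slices that execute the same code in the same proportions get a distance near 0, while slices with no blocks in common reach the maximum distance of 2 for normalized vectors.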

Basic-Block Distribution Analysis
[Figure: basic blocks A, B, C, D, E]

Basic-Block Distribution Analysis
[Figure: basic blocks A, B, C, D, E]
– Capture using isimpoint
– Compare vectors
– Group similar vectors in clusters
– Choose one PinPoint per cluster

Phase Detection + PinPoint Selection
[Figure: profile the program with isimpoint to obtain profiles (vectors) for program slices (100 million instructions each); analyze them with SimPoint to produce pinpoints.pp, for example PinPoint 1: weight 30%, PinPoint 2: weight 70%]
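The selection step can be sketched as clustering the per-slice vectors and keeping one representative slice per cluster, weighted by the cluster's share of all slices. This is a simplified stand-in rather than the SimPoint tool itself: the number of clusters is fixed by hand, the distance is squared Euclidean, and all function names and the example vectors are hypothetical.

```python
def distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def kmeans(vectors, k, iterations=20):
    # Deterministic farthest-first initialization (an assumption of this sketch).
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors,
                           key=lambda v: min(distance(v, c) for c in centers)))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every slice to its nearest center, then recompute the centers.
        labels = [min(range(k), key=lambda j: distance(v, centers[j]))
                  for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return labels, centers

def select_pinpoints(vectors, k):
    """Return (slice_index, weight) pairs, one per non-empty cluster."""
    labels, centers = kmeans(vectors, k)
    points = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            # Representative: the slice closest to the cluster centroid.
            rep = min(members, key=lambda i: distance(vectors[i], centers[j]))
            points.append((rep, len(members) / len(vectors)))
    return points

# Hypothetical profile: 3 slices from one phase, 7 from another.
slices = ([[0.6, 0.4, 0.0], [0.5, 0.5, 0.0], [0.55, 0.45, 0.0]]
          + [[0.0, 0.1, 0.9]] * 3 + [[0.0, 0.2, 0.8]] * 4)
points = select_pinpoints(slices, k=2)
print(points)
```

With this made-up input the two representatives carry weights of 30% and 70%, mirroring the example weights on the slide.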

PinPoints Generated for Some Programs (Commercial and SPEC2000)
[Table: columns are Program, # Retired Instructions (billions), # Slices (250 million insts.), # PinPoints; rows are AMBER-rt, Fluent-m, LS-DYNA, SPECINT, SPECFP]
PinPoints: < 1% of program execution

PinPoints: Validation
Do PinPoints capture whole-program behavior?
– Whole-program CPI: Actual-CPI
– Predict using CPI for PinPoints: Predicted-CPI
– $\text{Predicted-CPI} = \sum_i \text{Weight}_i \times \text{CPI}_i$
– $\%\,\text{Delta} = (\text{Actual-CPI} - \text{Predicted-CPI}) \times 100 / \text{Actual-CPI}$
Do they work across micro-architectures?
– Predict performance on different configurations with the same binary/PinPoints; compare with actual performance
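The validation arithmetic is small enough to work through directly. The sketch below computes the weighted-sum prediction and the percentage delta as defined on the slide; the CPI values and the function names are made up for illustration, and the weights reuse the 30%/70% example.

```python
def predicted_cpi(pinpoints):
    """Weighted sum of per-PinPoint CPI; pinpoints is a list of (weight, cpi)."""
    return sum(weight * cpi for weight, cpi in pinpoints)

def percent_delta(actual_cpi, predicted):
    """Prediction error as a percentage of the actual whole-program CPI."""
    return (actual_cpi - predicted) * 100 / actual_cpi

# Hypothetical measurements: two PinPoints and a whole-program CPI.
pinpoints = [(0.30, 2.0), (0.70, 1.0)]
actual = 1.25

predicted = predicted_cpi(pinpoints)
delta = percent_delta(actual, predicted)
print(f"predicted CPI = {predicted:.2f}, delta = {delta:.2f}%")
```

A negative delta means the PinPoints overestimate the whole-program CPI; a positive one means they underestimate it.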

Predicting Whole-program CPI with PinPoints (Itanium 2: 1.3 GHz)

Predicting Whole-program CPI with PinPoints (Pentium 4: 2.8 GHz)

Predicting Whole-program L2 Misses with PinPoints (Itanium 2: 900 MHz)

Speedup Prediction with PinPoints (Itanium 1, 2 varying Frequency)
Same binaries / same set of PinPoints: different microarchitectures

Relevant Pin Tools
isimpoint: generates basic block vectors in a format suitable for SimPoint analysis
controller: allows fast-forwarding until a region of interest is reached
Specifying a region of interest:
– Skip N instructions
– Specific code address + count
– PinPoints file + PinPoint number
Available as “class CONTROL” in a Pin kit

Summary
Finding simulation points: The Pin Advantage
– No special compiler/link flags or porting required
– Allows analysis of programs as they run
PinPoints: < 1% of program execution
– Predict whole-program behavior
– Work across microarchitectures

Resources
– Timothy Sherwood, Erez Perelman, Greg Hamerly, and Brad Calder. “Automatically Characterizing Large Scale Program Behavior”, ASPLOS’02.
– SimPoint toolkit
– Harish Patil, Robert Cohn, Mark Charney, Rajiv Kapoor, Andrew Sun, and Anand Karunanidhi. “Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation”, MICRO-37 (2004).
– PinPoints toolkit: To be released soon (available upon request)