Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation

Presentation transcript:

1 Enterprise Platforms Group
Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation
Harish Patil, Robert Cohn, Mark Charney, Rajiv Kapoor, Andrew Sun, Anand Karunanidhi
Enterprise Platform Group, Intel Corporation
Presented at MICRO-37: Portland, OR, Dec. 6th, 2004
IA32/EM64T/IPF

2 Target: LARGE Applications
With little/no manual intervention
Within reasonable time
Goal: Accurate Performance Prediction

3 Instruction Counts: Some Itanium Applications
[Chart; applications shown: SPECINT (average), SPECFP (average), RenderMan magic, Fluent L2, Amber rt, Ls-Dyna 3cars]

4 Whole-Program Simulation is Slow
[Chart; applications shown: SPECINT (average), SPECFP (average), RenderMan magic, Fluent L2, Amber rt, Ls-Dyna 3cars]

5 Solution: Select Simulation Points
Manually
Randomly
– Anywhere
– From uniform regions
Fine-grain sampling (SMARTS: CMU)
By program-phase analysis (SimPoint: UCSD, iPart: Intel/MRL)

6 Running Commercial Applications on Simulators is Hard
Resource requirements: disks etc.
– Need to modify/re-configure the simulator
OS dependencies
– Need support for specific kernel and device drivers
License checking
– Need special action

7 Solution: Native Execution with Instrumentation
Use PIN to select simulation points (PinPoints) and generate traces
PIN: A dynamic-instrumentation system
+ A tool for writing tools
+ No special compiler/linker flags required

8 PIN-Tools: Profiling, Trace Generation and more...
[Diagram labels: PIN-based profiler; Simulation Point Selection; Profile; PinPoints; PIN-based Trace Generator; PIN-based Branch Predictor; Your Simulator Here]

9 Simulation Point Selection with SimPoint [UCSD]
Why SimPoint?
Instrumentation based
Microarchitecture independent
Works well (results later)
Applied to multi-threaded programs
[Diagram labels: PIN-based profiler; Basic Block Vectors; SimPoint Tools; PinPoints]
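
The slide names Basic Block Vectors as the profile that is passed from the PIN-based profiler to the SimPoint tools. As a rough illustration only, the Python sketch below builds one vector per fixed-size slice of execution and compares slices by distance. The trace format, the function names, and the use of Manhattan distance are assumptions made for this example; it does not reproduce the actual Pin tool or SimPoint's clustering.

```python
from collections import Counter

def basic_block_vectors(trace, slice_size):
    """Split a trace into fixed-size slices and build one BBV per slice.

    trace: iterable of (block_id, n_instructions) pairs, one per executed
           basic block (a hypothetical input format for this sketch).
    slice_size: number of instructions per slice.
    Returns a list of dicts mapping block_id to the fraction of the
    slice's instructions executed in that block.
    """
    vectors = []
    current = Counter()
    executed = 0
    for block_id, n_instructions in trace:
        current[block_id] += n_instructions
        executed += n_instructions
        if executed >= slice_size:
            vectors.append({b: c / executed for b, c in current.items()})
            current = Counter()
            executed = 0
    if executed:  # keep the final, possibly short, slice
        vectors.append({b: c / executed for b, c in current.items()})
    return vectors

def manhattan_distance(bbv_a, bbv_b):
    """Sum of absolute differences over all blocks seen in either slice."""
    blocks = set(bbv_a) | set(bbv_b)
    return sum(abs(bbv_a.get(b, 0.0) - bbv_b.get(b, 0.0)) for b in blocks)

# Toy trace with two phases: blocks A/B first, then blocks C/D.
toy_trace = [("A", 4), ("B", 6)] * 2 + [("C", 5), ("D", 5)] * 2
bbvs = basic_block_vectors(toy_trace, slice_size=10)
```

Slices from the same phase end up with identical vectors (distance 0), while slices from different phases are far apart. That separation is what lets a clustering step pick one representative slice per phase.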

10 Multiple Sources of Error
Goal: Accurate Performance Prediction
[Diagram labels: PinPoints; Traces; Simulation Stats (CPI)]
Error Source: Phase detection
Error Source: Non-repeatability
Error Source: Warm-up, Modeling
Phase-detection is not enough! Need Trace Generation and Simulation

11 Main Contributions
A toolkit that automatically:
– Profiles, finds phases/simulation regions (PinPoints)
– Validates that PinPoints are representative
– Generates traces for simulators
Available for Itanium/IA32/EM64T
Evaluations in a production environment

12 The PinPoints Toolkit
[Diagram labels: Phase Detection + PinPoint Selection; PinPoints file; H/W counters-based Validation (pfmon: Itanium, PAPI: IA32); Compute CPI: Whole Program; Compute CPI: Weighted Sum for PinPoints; Match?; Trace Generation/Simulation]
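
The validation step on this slide compares the whole-program CPI with a weighted sum over the PinPoints. The Python sketch below shows that comparison under stated assumptions: each PinPoint carries a weight (the share of execution its phase represents) and a CPI measured for that region. The function names and every number are hypothetical and are not taken from the toolkit or the paper.

```python
def predicted_cpi(pinpoints):
    """Weighted-average CPI over PinPoints.

    pinpoints: list of (weight, cpi) pairs; weights are normalized here,
    so they do not need to sum to exactly 1.
    """
    total_weight = sum(weight for weight, _ in pinpoints)
    return sum(weight * cpi for weight, cpi in pinpoints) / total_weight

def prediction_error(predicted, measured):
    """Relative error of the prediction against the measured CPI."""
    return abs(predicted - measured) / measured

# Made-up example: three PinPoints and a made-up whole-program CPI,
# standing in for a hardware-counter measurement.
example_pinpoints = [(0.5, 1.0), (0.3, 2.0), (0.2, 1.5)]
whole_program_cpi = 1.45

estimate = predicted_cpi(example_pinpoints)
error = prediction_error(estimate, whole_program_cpi)
print(f"predicted CPI = {estimate:.2f}, relative error = {error:.1%}")
```

A small relative error suggests that the selected regions are representative. A large one signals that phase detection or region selection should be revisited before any traces are generated.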

13 Evaluations
Applications: Built w/ Intel’s compilers (high opt)
HPC: Fluent, AMBER, LS-Dyna, RenderMan
SPEC2000: Processed 8-9 times
Test Configurations: Linux (RedHat)
Merced     Itanium (1)   800 MHz   L3: 2 MB
McKinley   Itanium-2     900 MHz   L3: 1.5 MB
Madison    Itanium-2     1.3 GHz   L3: 3-6 MB

14 PinPoints Generated
PinPoints << 1% of program execution
Turnaround time (Traces): Few days
Program              # Retired Instructions (billions)   # PinPoints (250 million insts. EACH)
AMBER-rt             3,994                               6
Fluent-m3            2,625                               8
LS-DYNA              4,932                               6
SPECINT2000 (avg.)   142                                 4
SPECFP2000 (avg.)    373                                 5
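
The claim that PinPoints cover well under 1% of execution can be checked with simple arithmetic on the table. The Python sketch below does so. The row values are read from the flattened slide table, so the column split is an assumption (in particular "Fluent-m3" with 2,625 billion instructions and 8 PinPoints), and the variable names are made up for this example.

```python
PINPOINT_SIZE_BILLIONS = 0.25  # 250 million instructions per PinPoint

# (retired instructions in billions, number of PinPoints), as read from
# the slide's table; the column boundaries are an assumption.
runs = {
    "AMBER-rt": (3994, 6),
    "Fluent-m3": (2625, 8),
    "LS-DYNA": (4932, 6),
    "SPECINT2000 (avg.)": (142, 4),
    "SPECFP2000 (avg.)": (373, 5),
}

def pinpoint_fraction(retired_billions, n_pinpoints):
    """Fraction of retired instructions covered by the PinPoints."""
    return n_pinpoints * PINPOINT_SIZE_BILLIONS / retired_billions

fractions = {name: pinpoint_fraction(*row) for name, row in runs.items()}

for name, fraction in fractions.items():
    print(f"{name:20s} {fraction:.4%}")
```

With these values the SPECINT2000 average has the highest coverage, at roughly 0.7%, and the three HPC runs each stay below 0.1%.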

15 Results: Overview
PinPoints: Whole-Program CPI prediction (SPEC2000 and HPC applications):
– Average CPI prediction error ~5%
– PinPoints better than random selection
Predicting speedup between microarchitectures
– PinPoints can be used to evaluate microarchitecture variations
PinPoints Traces: Prediction of native SPEC2000 ratios
– INT within 8%, FP within 3%
More results in the paper

16 CPI: Actual vs. Predicted
SPEC2000: Itanium-Madison

17 SPEC2000 CPI Prediction
Average Error: Madison: 2.8%, Merced: 3.2%, McKinley: 2.7%

18 HPC Applications CPI Prediction
Average Error: Madison: 5.0%

19 Comparison With Random Selection
[48 unique program runs]

20 Comparison With Random Selection
[18 unique program runs]

21 Speedup: Merced → McKinley
SPEC2000

22 PinPoints Speedup Prediction
SPEC2000: Merced → McKinley

23 PinPoints: Speedup Prediction Across Multiple Microarchitectures
Same Binaries/PinPoints
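
Because the same binaries and PinPoints are used on each machine, the instruction count cancels out of a speedup calculation, leaving only CPI and clock frequency. The Python sketch below works through that under the standard relation time = instructions × CPI / clock. The clock rates are the Merced and McKinley figures from the test-configuration slide. The CPI values and function names are hypothetical, and the slides do not state that speedup was computed exactly this way.

```python
def execution_time(instructions, cpi, clock_hz):
    """Execution time in seconds for a given instruction count."""
    return instructions * cpi / clock_hz

def speedup(cpi_old, clock_old_hz, cpi_new, clock_new_hz):
    """Speedup of the new machine over the old one for the same binary.

    With an identical instruction count on both machines, the count
    cancels and only CPI and clock frequency remain.
    """
    return (cpi_old / clock_old_hz) / (cpi_new / clock_new_hz)

MERCED_HZ = 800e6    # 800 MHz, from the test configurations
MCKINLEY_HZ = 900e6  # 900 MHz, from the test configurations

# Hypothetical CPI values, e.g. weighted sums over the same PinPoints.
cpi_merced, cpi_mckinley = 2.0, 1.2

predicted = speedup(cpi_merced, MERCED_HZ, cpi_mckinley, MCKINLEY_HZ)
print(f"predicted speedup = {predicted:.3f}")
```

Comparing a value predicted this way from PinPoints CPIs with the speedup measured on whole-program runs is one way to judge whether the selected regions track microarchitectural changes.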

24 Putting it All Together: From PinPoints to Projections
[Diagram labels: PinPoints; Traces; Simulation Stats (CPI)]
Does simulation of traces for PinPoints predict native performance?
Error Source: Phase detection
Error Source: Non-repeatability
Error Source: Warm-up, Modeling
Error: Cumulative

25 CPI Prediction with Simulation
SPEC2000: Itanium Madison

26 Native SPEC2000 Ratios [Spring 2004]
Itanium: Madison 1.5GHz/6MB L3

27 Performance Prediction from PinPoints Traces
Itanium: Madison 1.5GHz/6MB L3

28 Summary
PinPoints toolkit: Automatic simulation region selection, tracing, and validation
Dynamic instrumentation (PIN) → LARGE programs
PinPoints: << 1% of execution
Capture whole-program CPI
– Average error < 5% for SPEC2000, HPC apps.
– Better than random selection
PinPoints traces: Predict SPEC2000 Ratios
– INT within 8%, FP within 3%

29 Try it out!
(PIN + PinPoints) toolkit: New

30 Backup: Simulator Warm-up
Strategy 1: Large slice-size (250 million instructions)
– Too coarse-grain for phase detection
– Too much simulation time
Strategy 2: 7 warm-up traces per simulation trace (30 million instructions)
Art (SPECFP2000): First pinpoint touches most of the working set
– Simulate all pinpoint traces in succession