Code Coverage Testing Using Hardware Performance Monitoring Support Alex Shye, Matthew Iyer, Vijay Janapa Reddi and Daniel A. Connors University of Colorado.

Presentation transcript:

Code Coverage Testing Using Hardware Performance Monitoring Support
Alex Shye, Matthew Iyer, Vijay Janapa Reddi and Daniel A. Connors
University of Colorado at Boulder
Department of Electrical and Computer Engineering
DRACO Architecture Research Group

Introduction
Code coverage is simple but useful for software testing
The common method of code coverage analysis is program instrumentation
– Insertion of software probes statically or dynamically
– Incurs a high overhead (~50-200%)
Modern processors contain a hardware Performance Monitoring Unit (PMU)
– Itanium, Pentium 4, PowerPC
– Allows low-overhead sampling of low-level information
The PMU represents a low-overhead alternative to full instrumentation

PMU
PMUs are becoming more advanced
– Coarse-grained and fine-grained features
DCPI, OProfile: PC sampling
– But the PMU can do more…
– For example, branch vectors on Itanium
Obstacles to PMU profiling
– Non-deterministic (sampling)
– Sample aliasing
– Sampling = less information
Offline analysis can extend PMU information!

Itanium-2 PMU Features
Feature | Description
Event Counters | Counts of coarse-grained events, e.g. CPU cycles, flushes, etc.
Branch Trace Buffer (BTB) | Records the branch vector of the last 4 branches executed. Filters: T/NT, predicted correct/mispredicted, etc.
Instruction Event Address Registers (IEAR) | Sample Icache/ITLB misses. Addresses and latency
Data Event Address Registers (DEAR) | Sample Dcache, DTLB, ALAT misses. Addresses and latency

Goal: Explore PMU-based code coverage by sampling branch vectors and performing offline compiler analysis

Code Coverage Framework
[Figure: framework diagram. Online: the PMU, configured to sample only taken branches, produces branch vectors; these go to a kernel buffer, then (interrupt on kernel buffer overflow) to a branch vector hash table, then to an intermediate file. Offline: branch vectors, address map, annotated binary, partial paths, dominator analysis, code coverage.]
Terminology
Branch Vector: Series of addresses from BTB
Partial Path: Path of ops in compiler IR
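A minimal sketch of how a sampled branch vector could be expanded into a partial path. It is not the authors' tool. The assumptions are: each BTB entry is modeled as a hypothetical (branch address, target address) pair, addresses are small integers with one op per address, and only taken branches are recorded, so everything between one branch's target and the next branch's address ran sequentially. The names `covered_from_vector` and `record` are illustrative only.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical sample format: (branch_address, target_address) per taken
# branch, oldest first. Real BTB encodings differ; this is for illustration.
BranchVector = List[Tuple[int, int]]

def covered_from_vector(vec: BranchVector) -> Set[int]:
    """Return the addresses known to have executed for one branch vector."""
    covered: Set[int] = set()
    for (src, tgt), (next_src, _) in zip(vec, vec[1:]):
        covered.add(src)
        # Only taken branches are sampled, so execution fell through
        # from this target up to the next recorded branch.
        if tgt <= next_src:
            covered.update(range(tgt, next_src + 1))
    if vec:
        last_src, last_tgt = vec[-1]
        covered.add(last_src)
        covered.add(last_tgt)
    return covered

# Stand-in for the branch vector hash table: identical vectors are
# stored once, with a count.
table: Counter = Counter()

def record(vec: BranchVector) -> None:
    table[tuple(vec)] += 1

sample = [(10, 20), (23, 5), (8, 40)]
record(sample)
record(sample)
ops = covered_from_vector(sample)
```

Here the three sampled branches yield two fall-through ranges (20 to 23 and 5 to 8) in addition to the branch and target addresses themselves, which is why sampling only taken branches makes each vector span more code.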

Dominator Analysis
– Finds all blocks guaranteed to execute
Cannot be performed effectively online
But is standard in any compiler infrastructure
[Figure: partial path from a BTB branch vector, with basic blocks added by dominator analysis]
Terminology
Dominator: u dominates v if all paths from Entry to v include u
Post Dominator: u post-dominates v if all paths from v to Exit include u
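The definitions above can be sketched with the classic iterative dominator computation. This is an illustrative sketch, not the OpenIMPACT module. The control flow graph, block names and the `extend_coverage` helper are made up for the example, and the post-dominator step assumes the run reaches Exit normally.

```python
from typing import Dict, List, Set

def dominators(succ: Dict[str, List[str]], entry: str) -> Dict[str, Set[str]]:
    """Iterative dominator sets. Every block must appear as a key of succ."""
    nodes = set(succ)
    preds: Dict[str, Set[str]] = {n: set() for n in nodes}
    for n, outs in succ.items():
        for m in outs:
            preds[m].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def reverse(succ: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Reverse every edge, so post-dominators become dominators from Exit."""
    rev: Dict[str, List[str]] = {n: [] for n in succ}
    for n, outs in succ.items():
        for m in outs:
            rev[m].append(n)
    return rev

def extend_coverage(succ, entry, exit_, sampled):
    """Add every dominator and post-dominator of each sampled block."""
    dom = dominators(succ, entry)
    pdom = dominators(reverse(succ), exit_)
    covered = set(sampled)
    for block in sampled:
        covered |= dom[block] | pdom[block]
    return covered

# Hypothetical diamond-shaped CFG: A branches to B or C, both rejoin at D.
cfg = {"entry": ["A"], "A": ["B", "C"], "B": ["D"],
       "C": ["D"], "D": ["exit"], "exit": []}
covered = extend_coverage(cfg, "entry", "exit", {"B"})
```

With only block B sampled, the analysis adds entry, A, D and exit, but not C, since C lies on a path that is not guaranteed to run.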

Methodology
Experiments run on Itanium-2 with kernel
Developed tool using perfmon kernel interface and libpfm-3.1 to interface with PMU
– Only sample taken branches to elongate branch vectors
Set of SPEC2000 benchmarks
– Compiled with the OpenIMPACT Research Compiler, with annotations
OpenIMPACT module for offline analysis
Compared to full code coverage information from a Pin code coverage tool

Number of Instructions and Actual Code Covered
Benchmark | # Ops | # Covered Ops
164.gzip | 6,466 | 3,063 (47%)
175.vpr | 23,573 | 12,229 (52%)
177.mesa | 89,006 | 7,390 (8%)
179.art | 2,201 | 1,515 (69%)
181.mcf | 1,973 | 1,401 (71%)
183.equake | 3,033 | 2,265 (75%)
188.ammp | 19,562 | 5,835 (30%)
197.parser | 17,541 | 11,271 (64%)
256.bzip2 | 5,095 | 3,138 (62%)
300.twolf | 40,490 | 15,705 (39%)

Coverage percentage is the percent of actually covered code discovered with PMU sampling and offline analysis

Effect of Sampling Period
Sampling overhead is due to:
– Copying the BTB to the kernel buffer
– Interrupt on kernel buffer overflow
– Copying from the kernel buffer into the hash table

PMU vs Actual Instruction Distribution
Kullback-Leibler Divergence
– Relative entropy of p with respect to q
– $d = \sum_{k=0} p_k \log_2 (p_k / q_k)$
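A small sketch of the divergence above, assuming p and q are given as per-block execution counts that are first normalized into distributions. The function names and the example counts are made up, and the slide does not say which of the two distributions (PMU or actual) plays the role of p.

```python
import math
from typing import List, Sequence

def normalize(counts: Sequence[float]) -> List[float]:
    """Turn raw counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """d = sum_k p_k * log2(p_k / q_k), in bits.

    Terms with p_k == 0 contribute nothing. A bin with p_k > 0 and
    q_k == 0 makes the divergence undefined and raises ZeroDivisionError.
    """
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)

# Made-up counts for two blocks, for illustration only.
p = normalize([50, 50])
q = normalize([25, 75])
d = kl_divergence(p, q)
```

A value of 0 means the two distributions match exactly. Larger values mean they diverge more, and the measure is not symmetric in p and q.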

Code Coverage

Multiple Runs
Regular Sampling: 1) gzip, parser, twolf improve greatly
Randomized Sampling may discover code regular sampling cannot
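A toy simulation of why a fixed sampling period can alias with program behavior while a randomized period need not. The slides do not describe how randomization was done, so the setup is entirely hypothetical: a loop that repeats every 4 branches, a fixed period of 100 branches, and a randomized period drawn uniformly from 90 to 110.

```python
import random
from typing import Callable, Set

def sample_positions(loop_length: int, n_samples: int,
                     next_interval: Callable[[], int]) -> Set[int]:
    """Return which positions inside the repeating loop were ever sampled."""
    seen: Set[int] = set()
    position = 0
    for _ in range(n_samples):
        position += next_interval()
        seen.add(position % loop_length)
    return seen

# Fixed period: 100 is a multiple of the loop length 4, so every sample
# lands on the same position in the loop.
regular = sample_positions(4, 1000, lambda: 100)

# Randomized period (seeded for repeatability): samples drift across the loop.
rng = random.Random(0)
randomized = sample_positions(4, 1000, lambda: rng.randint(90, 110))
```

In this setup the regular sampler only ever sees one of the four loop positions, no matter how many samples it takes, whereas the randomized sampler reaches positions the regular one cannot.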

Conclusion
Motivates and presents initial results and rationale for PMU-based code coverage
An example of using an advanced PMU feature with branch vectors
Illustrates how simple offline analysis can extend PMU information
Indicates the PMU could be very useful for low-overhead profiling and program understanding
Could be promising for profiling of released software
Questions?