Does Training Input Selection Matter for Feedback-Directed Optimizations? Paul Berube University of Alberta CDP05, October 17, 2005.

Slides:

Advertisements

Similar presentations

1 Lecture 2: Metrics to Evaluate Performance Topics: Benchmark suites, Performance equation, Summarizing performance with AM, GM, HM Video 1: Using AM.

Advertisements

Software & Services Group PinPlay: A Framework for Deterministic Replay and Reproducible Analysis of Parallel Programs Harish Patil, Cristiano Pereira,

Dynamic Optimization using ADORE Framework 10/22/2003 Wei Hsu Computer Science and Engineering Department University of Minnesota.

ECE 454 Computer Systems Programming Compiler and Optimization (I) Ding Yuan ECE Dept., University of Toronto

DBMSs on a Modern Processor: Where Does Time Go? Anastassia Ailamaki Joint work with David DeWitt, Mark Hill, and David Wood at the University of Wisconsin-Madison.

Control Flow Analysis (Chapter 7) Mooly Sagiv (with Contributions by Hanne Riis Nielson)

1 S. Tallam, R. Gupta, and X. Zhang PACT 2005 Extended Whole Program Paths Sriraman Tallam Rajiv Gupta Xiangyu Zhang University of Arizona.

Testing Without Executing the Code Pavlina Koleva Junior QA Engineer WinCore Telerik QA Academy Telerik QA Academy.

Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Project 4 U-Pick – A Project of Your Own Design Proposal Due: April 14 th (earlier ok) Project Due: April 25 th.

CSCE 212 Chapter 4: Assessing and Understanding Performance Instructor: Jason D. Bakos.

Evaluation & Assessment Practice ROE – Return on Expectations Joann Halle - VP, XNA Learning & Development John Leutner – Head, Global Learning.

Code Coverage Testing Using Hardware Performance Monitoring Support Alex Shye, Matthew Iyer, Vijay Janapa Reddi and Daniel A. Connors University of Colorado.

1 Feedback-directed optimizations with estimated edge profiles from hardware event sampling Open64 workshop, CGO 2008 April 6, 2008 Vinodha Ramasamy, Robert.

By- Jaideep Moses, Ravi Iyer , Ramesh Illikkal and

An Overview of Virtual Machine Architectures by J.E. Smith and Ravi Nair presented by Sebastian Burckhardt University of Pennsylvania CIS 700 – Virtualization.

Taming Hardware Event Samples for FDO Compilation Dehao Chen (Tsinghua University) Neil Vachharajani, Robert Hundt, Shih-wei Liao (Google) Vinodha Ramasamy.

Multi-core Programming Thread Profiler. 2 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Topics Look at Intel® Thread Profiler features.

Background (Floating-Point Representation 101)  Floating-point represents real numbers as (± sig × 2 exp )  Sign bit  Significand (“mantissa” or “fraction”)

Microsoft Research Asia Ming Wu, Haoxiang Lin, Xuezheng Liu, Zhenyu Guo, Huayang Guo, Lidong Zhou, Zheng Zhang MIT Fan Long, Xi Wang, Zhilei Xu.

SYNAR Systems Networking and Architecture Group Scheduling on Heterogeneous Multicore Processors Using Architectural Signatures Daniel Shelepov and Alexandra.

SEARCHING, SORTING, AND ASYMPTOTIC COMPLEXITY Lecture 12 CS2110 – Fall 2009.

Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor Mark Gebhart 1,2 Stephen W. Keckler 1,2 Brucek Khailany 2 Ronny Krashinsky.

© 2003, Carla Ellis Experimentation in Computer Systems Research Why: “It doesn’t matter how beautiful your theory is, it doesn’t matter how smart you.

Speculative Software Management of Datapath-width for Energy Optimization G. Pokam, O. Rochecouste, A. Seznec, and F. Bodin IRISA, Campus de Beaulieu

PMaC Performance Modeling and Characterization Performance Modeling and Analysis with PEBIL Michael Laurenzano, Ananta Tiwari, Laura Carrington Performance.

LIPO: Feedback Directed Cross-Module Optimization

Current and Future Applications of the Generic Statistical Business Process Model at Statistics Canada Laurie Reedman and Claude Julien May 5, 2010.

The HipHop Compiler from Facebook By Megha Gupta & Nikhil Kapoor.

ACMSE’04, ALDepartment of Electrical and Computer Engineering - UAH Execution Characteristics of SPEC CPU2000 Benchmarks: Intel C++ vs. Microsoft VC++

Monitoring and Evaluation Management of a Training Program.

CISC Machine Learning for Solving Systems Problems Presented by: Alparslan SARI Dept of Computer & Information Sciences University of Delaware

© 2011 Lantana Consulting Group, 1 Open Health Tools Membership Presentation July Lantana Consulting Group Transforming healthcare.

Autonomic scheduling of tasks from data parallel patterns to CPU/GPU core mixes Published in: High Performance Computing and Simulation (HPCS), 2013 International.

Title of Selected Paper: IMPRES: Integrated Monitoring for Processor Reliability and Security Authors: Roshan G. Ragel and Sri Parameswaran Presented by:

Information Systems for Production and Operations Management Perspectives on European Quality Initiatives and Programs and their Impacts and Implications.

University of Maryland Dynamic Floating-Point Error Detection Mike Lam, Jeff Hollingsworth and Pete Stewart.

This is a work of the U.S. Government and is not subject to copyright protection in the United States. The OWASP Foundation OWASP AppSec DC October 2005.

Chapter 1 Introduction. Chapter 1 - Introduction 2 The Goal of Chapter 1 Introduce different forms of language translators Give a high level overview.

Profiling Where does my application spend the time? Profiling1.

SB How ScriptBasic works Introduction to ScriptBasic Modules.

Practical Path Profiling for Dynamic Optimizers Michael Bond, UT Austin Kathryn McKinley, UT Austin.

DynamicMR: A Dynamic Slot Allocation Optimization Framework for MapReduce Clusters Nanyang Technological University Shanjiang Tang, Bu-Sung Lee, Bingsheng.

Performance and Energy Efficiency Evaluation of Big Data Systems Presented by Yingjie Shi Institute of Computing Technology, CAS

BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs Socrates Demetriades and Sangyeun Cho Computer Frontiers.

Guiding Ispike with Instrumentation and Hardware (PMU) Profiles CGO’04 Tutorial 3/21/04 CK. Luk Massachusetts Microprocessor Design.

1 Lecture: Metrics to Evaluate Performance Topics: Benchmark suites, Performance equation, Summarizing performance with AM, GM, HM  Video 1: Using AM.

Understanding Assessment = Understanding Learning Barry O’Sullivan British Council Assessment Research Group Conference - September 12 th 2015 Hosei University,

CISC Machine Learning for Solving Systems Problems Presented by: Eunjung Park Dept of Computer & Information Sciences University of Delaware Solutions.

Workload Design: Selecting Representative Program-Input Pairs Lieven Eeckhout Hans Vandierendonck Koen De Bosschere Ghent University, Belgium PACT 2002,

Enabling Control over Adaptive Program Transformation for Dynamically Evolving Mobile Software Validation Mike Jochen, Anteneh Anteneh, Lori Pollock University.

Troubleshooting Dennis Shasha and Philippe Bonnet, 2013.

October 20-23rd, 2015 FEEBO: A Framework for Empirical Evaluation of Malware Detection Resilience Against Behavior Obfuscation Sebastian Banescu Tobias.

Closing the Query Processing Loop in Oracle 11g Allison Lee, Mohamed Zait.

Computer Architecture & Operations I

CSCI206 - Computer Organization & Programming

October 19th 2016 Meeting Minutes.

The “Understanding Performance!” team in CERN IT

How to write a Strategic Plan

Benchmarks Breakout.

CSCE 212 Chapter 4: Assessing and Understanding Performance

Feedback directed optimization in Compaq’s compilation tools for Alpha

CSCI206 - Computer Organization & Programming

Moab® Automation Intelligence Overview

Adaptive Code Unloading for Resource-Constrained JVMs

Calpa: A Tool for Automating Dynamic Compilation

Searching, Sorting, and Asymptotic Complexity

Dynamic Program Analysis

CMSC 611: Advanced Computer Architecture

Closing – What Comes Next

Presentation transcript:

Does Training Input Selection Matter for Feedback-Directed Optimizations? Paul Berube University of Alberta CDP05, October 17, 2005

September 28, 2005Paul Berube2 Outline Background and motivation Aestimo: an FDO evaluation tool Workload Selection Results

September 28, 2005Paul Berube3 What Is FDO? compiletrain Feedback-Directed Optimization: compileevaluate

September 28, 2005Paul Berube4 What Is FDO? compiletrain Feedback-Directed Optimization: compileevaluate training input profile eval input

September 28, 2005Paul Berube5 Performance Evaluation Space Programs Evaluation Inputs Static optimization

September 28, 2005Paul Berube6 Performance Evaluation Space Programs Evaluation Inputs Training Inputs FDO

September 28, 2005Paul Berube7 Performance Evaluation Space Programs Evaluation Inputs Training Inputs Only 1 Train input SPEC Usually 1 Ref input

September 28, 2005Paul Berube8 The Big Question Does the selection of training inputs matter for feedback-directed optimization? –Different transformation decisions? –Different performance?

September 28, 2005Paul Berube9 Aestimo An FDO evaluation tool Automates training and evaluating on a large number of inputs Isolates individual transformations –Fewer experiment variables –Results vary by transformation Measures: –Differences in transformation decisions –Performance differences

September 28, 2005Paul Berube10 An Overview of Aestimo Compile Binaries Optimization Logs Execute Analyze Program Workload

September 28, 2005Paul Berube11 An Overview of Aestimo Compile Binaries Optimization Logs Execute Analyze Program Workload One Per Input

September 28, 2005Paul Berube12 An Overview of Aestimo Compile Binaries Optimization Logs Execute Analyze Program Workload Binary X Input 5 times each

September 28, 2005Paul Berube13 An Overview of Aestimo Compile Binaries Optimization Logs Execute Analyze Program Workload Performanc e

September 28, 2005Paul Berube14 An Overview of Aestimo Compile Binaries Optimization Logs Execute Analyze Program Workload Performanc e Transformatio n Differences

September 28, 2005Paul Berube15 An Overview of Aestimo Compile Binaries Optimization Logs Execute Analyze Program Workload Performanc e Transformatio n Differences FDO vs. Static

September 28, 2005Paul Berube16 An Overview of Aestimo Compile Binaries Optimization Logs Execute Analyze Program Workload Performanc e Transformatio n Differences FDO vs. Static Resubstitutio n

September 28, 2005Paul Berube17 Compilation Process Static Binary Optimization Log Source Static Compile

September 28, 2005Paul Berube18 Compilation Process Source Static Compile Static Binary Optimization Log Training Input Instrumented Binary Instr. Compile Training Run Profile FDO Compile FDO Binary

September 28, 2005Paul Berube19 Compilation Process Source Static Compile Static Binary Optimization Log Instr. Compile Training Run Profile FDO Compile FDO Binary Training Input Optimization Log Static Compile Final Binary Instrumented Binary

September 28, 2005Paul Berube20 Compilation Process Source Static Compile Static Binary Optimization Log Instr. Compile Instrumented Binary Training Run Profile FDO Compile FDO Binary Training Input Optimization Log Static Compile Final Binary

September 28, 2005Paul Berube21 Workload Selection SPEC CINT2000 Benchmark inputs –8 programs, 32 input 84 Additional Inputs –Contacted benchmark authors –Varied representative inputs –Existing collections –Synthetic input generator

September 28, 2005Paul Berube22 Results ORC compiler Inlining and if conversion Itanium and Itanium 2 processors

September 28, 2005Paul Berube23 Workload Performance: bzip2 Inlining Itanium

September 28, 2005Paul Berube24 Workload Performance: bzip2 Training Input Selection Matters! Inlining Itanium

September 28, 2005Paul Berube25 Summary of Contributions Training input selection does impact optimization decisions and performance Aestimo: –Automates training and evaluating on a large number of inputs –Isolates individual transformations A large collection of representative inputs for SPEC CINT2000 programs

September 28, 2005Paul Berube26 Thank You Questions?

September 28, 2005Paul Berube27 Performance: bzip2 trained on xml Inlining Itanium

September 28, 2005Paul Berube28 Performance: bzip2.combined Inlining Itanium