1 Performance Prediction Using Program Similarity Aashish Phansalkar & Lizy K. John The University of Texas at Austin
2 Outline Motivation and Objectives Methodology Experimental results Conclusion Future work
3 Motivation (1): Simulation is costly A computer architect or designer has to simulate many customer applications Simulations take a very long time due to the complexity of modern microprocessor designs
4 Motivation (2): Making a decision based on benchmark scores Customers often use benchmark scores to decide which computer systems to buy The application programs they actually use may not be part of the benchmark suite Customers can use benchmarks as representatives of their application programs Predict the performance of their application from the already-available performance data of the benchmarks
5 Objective A quantitative method to estimate performance without running cycle-accurate simulation Use the knowledge of similarity between a customer's application program and known benchmark programs to develop a quantitative approach to predict performance
6 Outline Motivation and Objectives Methodology Experimental results Conclusion Future work
7 Overview Customer application (new case) Repository of benchmarks (known cases) Measure similarity Predicted performance
8 Program characterization Instruction mix: percentage of different types of instructions, e.g. percentage of memory references, percentage of branch instructions Control flow: % taken branches, % forward branches, % forward taken branches Basic block size (number of instructions between two branches) Register dependency distance Data and instruction temporal locality Data and instruction spatial locality
9 Register dependency distance Read-after-write (RAW) dependency distance = number of instructions between a register's producer and the consumer that reads it Example: ADD R1,R3,R4 / MUL R5,R3,R2 / ADD R5,R3,R6 / LD R4,(R8) / SUB R8,R2,R1; the SUB reads R1 written by the first ADD, so the dependency distance is 4 Measure the distribution of dependency distances over the ranges 1, 2, 3-4, 5-8, 9-16, 17-32, greater than 32 The normalized count for each range of dependency distance forms a metric
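The measurement above can be sketched in a few lines of Python. The instruction encoding below (tuples of destination register and source registers) is a hypothetical representation chosen for illustration, not the slides' actual tooling.

```python
from collections import Counter

# Distance buckets from the slide: 1, 2, 3-4, 5-8, 9-16, 17-32, >32.
BUCKETS = [("1", 1, 1), ("2", 2, 2), ("3-4", 3, 4),
           ("5-8", 5, 8), ("9-16", 9, 16), ("17-32", 17, 32)]

def dependency_distance_histogram(instructions):
    """instructions: list of (dest_reg, [src_regs]) tuples (hypothetical format)."""
    last_write = {}          # register -> index of its most recent producer
    counts = Counter()
    for i, (dest, srcs) in enumerate(instructions):
        for src in srcs:
            if src in last_write:
                dist = i - last_write[src]   # RAW dependency distance
                for label, lo, hi in BUCKETS:
                    if lo <= dist <= hi:
                        counts[label] += 1
                        break
                else:
                    counts[">32"] += 1
        last_write[dest] = i
    total = sum(counts.values())
    # The normalized count per range is one workload metric.
    return {k: v / total for k, v in counts.items()} if total else {}

# The slide's example: the SUB at index 4 reads R1 written by the ADD at index 0,
# so the only RAW dependency in this short trace has distance 4.
trace = [("R1", ["R3", "R4"]), ("R5", ["R3", "R2"]), ("R5", ["R3", "R6"]),
         ("R4", ["R8"]), ("R8", ["R2", "R1"])]
```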
10 Data and instruction temporal locality Memory reuse distance = number of distinct addresses touched between two accesses to the same address Example trace: 2004, 2008, 4008, 2000, 1080, 2004, 4008; the second access to 2004 has reuse distance 4 and the second access to 4008 has reuse distance 3 Computing reuse distance for a trace of byte addresses is very computation- and space-intensive, so reuse distance is computed for blocks of 16, 64, 256, and 4096 bytes Temporal locality metric (tlocality) = weighted average reuse distance
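A minimal sketch of the reuse-distance computation, reproducing the slide's example trace. This naive O(n²) version is exactly the cost problem the slide mentions; block granularity (16, 64, 256, 4096 bytes) shrinks the address space being tracked.

```python
def reuse_distances(addresses, block_size=1):
    """Reuse distance per repeat access: the number of distinct blocks
    touched between two accesses to the same block.
    block_size=1 reproduces the slide's per-byte-address example."""
    last_seen = {}   # block -> index of its most recent access
    distances = []
    for i, addr in enumerate(addresses):
        block = addr // block_size
        if block in last_seen:
            # Distinct blocks accessed since the previous access to this block.
            between = {a // block_size
                       for a in addresses[last_seen[block] + 1 : i]}
            distances.append(len(between))
        last_seen[block] = i
    return distances

# The slide's trace: 2004 repeats with 4 distinct addresses in between,
# 4008 repeats with 3 distinct addresses in between.
trace = [2004, 2008, 4008, 2000, 1080, 2004, 4008]
# reuse_distances(trace) -> [4, 3]
```

A weighted average of these distances gives the tlocality metric for each block size.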
11 Data and instruction spatial locality Spatial locality metrics are derived from the temporal locality metrics As the block size increases, programs with good spatial locality show lower tlocality values Spatial locality metrics: tlocality64 / tlocality16, tlocality256 / tlocality16, tlocality4096 / tlocality16
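The derivation is just a set of ratios; the tlocality values below are hypothetical numbers standing in for measured weighted-average reuse distances.

```python
# Hypothetical weighted-average reuse distances at each block size, as would
# be produced by the temporal-locality measurement on some program.
tlocality = {16: 40.0, 64: 20.0, 256: 10.0, 4096: 5.0}

# Spatial-locality metrics: tlocality at larger block sizes relative to 16B.
# Lower ratios indicate better spatial locality (larger blocks capture reuse).
spatial = {b: tlocality[b] / tlocality[16] for b in (64, 256, 4096)}
# spatial -> {64: 0.5, 256: 0.25, 4096: 0.125}
```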
12 Methodology overview Microarchitecture-independent metrics for the known benchmarks and for the customer application Measure program similarity Similarity information feeds the prediction of the target metric for the new application (2 methods) Output: predicted value of the target metric
13 Measuring Similarity (1) Distance between two programs in the workload space is the measure of their similarity We assume that similarity between two programs is inversely proportional to the Euclidean distance between them
14 Measuring similarity (2) The workload space is made up of many workload characteristics, so its dimensionality is very high The inherent characteristics are highly correlated, so Euclidean distance measured directly on them is biased: correlated variables contribute the same underlying behavior to the distance more than once, unlike independent variables Use Principal Components Analysis (PCA) to remove the correlation
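A sketch of the distance computation in pure Python. It standardizes each characteristic (so no metric dominates by scale) and then takes Euclidean distance; the PCA step the slides apply on top of this is omitted here (a library such as scikit-learn's PCA would supply it), and the metric vectors are hypothetical.

```python
import math

def standardize(matrix):
    """Z-score each column (workload characteristic) across all programs."""
    cols = list(zip(*matrix))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]     # guard constant columns
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in matrix]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical metric vectors (rows = programs, columns = characteristics).
benchmarks = [[0.30, 12.0], [0.35, 11.0], [0.90, 45.0]]
x = [0.32, 12.5]                              # customer application
z = standardize(benchmarks + [x])
dists = [euclidean(z[-1], z[i]) for i in range(len(benchmarks))]
```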
15 Method 1: Predicting performance using weights Compute the distance from program X to each benchmark program (dx1, dx2, dx3, …, dxn) in the PC space Calculate weights w1, w2, …, wn from these distances, so that closer benchmarks get larger weights Predict the target metric as a weighted mean of the benchmarks' known values
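The slides do not give the exact weight formula, so the sketch below assumes a common choice: inverse-distance weights normalized to sum to 1, combined with the weighted arithmetic, harmonic, and geometric means that the results slides compare. The distances and speedups are hypothetical.

```python
import math

def inverse_distance_weights(distances):
    # Assumed weighting (not stated on the slide): closer benchmarks
    # contribute more; weights are normalized to sum to 1.
    inv = [1.0 / d for d in distances]
    s = sum(inv)
    return [w / s for w in inv]

def weighted_means(values, weights):
    """Weighted AM, HM, and GM of positive target-metric values."""
    am = sum(w * v for w, v in zip(weights, values))
    hm = 1.0 / sum(w / v for w, v in zip(weights, values))
    gm = math.exp(sum(w * math.log(v) for w, v in zip(weights, values)))
    return {"AM": am, "HM": hm, "GM": gm}

# Hypothetical distances from application X to 3 benchmarks, and the
# benchmarks' known speedups on the target machine.
d = [1.0, 2.0, 4.0]
speedups = [10.0, 12.0, 20.0]
w = inverse_distance_weights(d)          # [4/7, 2/7, 1/7]
pred = weighted_means(speedups, w)       # predicted speedup for X
```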
16 Method 2: Predicting performance using clustering Measure all the inherent characteristics for the benchmarks and user program X Cluster all the programs based on the inherent characteristics and find the optimal number of clusters Predict the target metric for X from the benchmarks that fall in the same cluster
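The slides do not name the clustering algorithm or the criterion for the optimal number of clusters, so the following is a minimal k-means sketch in pure Python with hypothetical 2-D workload-space coordinates; the point is only to show X landing in a cluster whose benchmark members supply the prediction.

```python
import random

def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _mean(points):
    n = len(points)
    return tuple(sum(xs) / n for xs in zip(*points))

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means; returns a cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: _dist2(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:                  # keep old center if cluster empties
                centers[j] = _mean(members)
    return assign

# Hypothetical 2-D coordinates: X sits near benchmarks b1 and b2, far from b3.
coords = {"b1": (0.0, 0.0), "b2": (0.2, 0.1), "b3": (5.0, 5.0), "X": (0.1, 0.0)}
names = list(coords)
assign = kmeans([coords[n] for n in names], k=2)
cluster_of_x = assign[names.index("X")]
peers = [n for n, a in zip(names, assign) if a == cluster_of_x and n != "X"]
# X's predicted metric would be averaged over peers (here b1 and b2)
```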
17 Outline Motivation and Objectives Methodology Experimental results Conclusion Future work
18 Experiments Used integer programs from the SPEC CPU2000 suite to demonstrate Method 1 and Method 2 described above Prediction of speedup: used all the workload characteristics to form the workload space Prediction of cache miss rates: used only the data locality characteristics to form the workload space
19 Predicting speedup (1) Experiment: predict the performance (speedup) of bzip2 using benchmarks from the SPEC CPU2000 suite Assume that bzip2 is the customer application; the performance of the SPEC CPU2000 benchmarks is known SPECint2000 benchmarks used for prediction: 164.gzip, parser, twolf, gcc, eon, crafty, vortex, vpr, mcf Speedup for each benchmark program on the target machine, SGI Altix (1500 MHz, Itanium 2), taken from the scores reported on the SPEC website
20 Predicting speedup (2) Method 1: predicting speedup using weights Machine: SGI Altix 3000 (1500 MHz, Itanium 2)
Mean used | % error in predicting speedup
Weighted GM | 4.69
Weighted HM | 2.5
Weighted AM | 6.87
GM | 8.68
HM | 6.53
AM | 10.77
Clustering | 12.08
21 Predicting speedup (3) Method 1: predicting speedup using weights For 50 different machines, the % error in predicted speedup using weighted GM, weighted HM, weighted AM, and GM Statistics reported: average, lower 95% CI, upper 95% CI
22 Predicting speedup (4) Method 2: predicting speedup using clustering
Cluster 1: parser, twolf, vortex
Cluster 2: bzip2, gzip
Cluster 3: eon, vpr
Cluster 4: mcf
Cluster 5: crafty
Cluster 6: gcc
The average error in predicting the speedup of bzip2 over all machines is 20.29%
23 Prediction of data cache miss rates (1) Method 1: using weights for prediction Note: each program is treated as the customer application, one at a time
24 Prediction of data cache miss rates (2) Method 2: using clustering for prediction Note: each program is treated as the customer application, one at a time
25 Outline Motivation and Objectives Methodology Experimental results Conclusion Future work
26 Conclusion Demonstrated two simple methods to predict performance Used SPEC CPU2000 as an example to predict performance The accuracy of prediction depends on two factors: how well the workload characteristics correlate with performance, and whether there is a program similar to the customer application in the repository of known programs
27 Future Work Two main items on the to-do list: Add more programs to the repository and validate the results Calibrate the measure of similarity (distance) in the workload space against the error in the target-metric space
28 Thank you !!