Benchmarking and Performance Evaluations
Todd Mytkowicz, Microsoft Research

Let’s poll for an upcoming election
I ask 3 of my co-workers who they are voting for. My approach does not deal with:
– Variability
– Bias

Issues with my approach
Source of variability: my approach is not reproducible

Issues with my approach (II)
Source of bias: my approach is not generalizable

Take Home Message
Variability and Bias are two different things – the difference between reproducible and generalizable! Do we have to worry about Variability and Bias when we benchmark?

Let’s evaluate the speedup of my whizbang idea
What do we do about Variability? Statistics to the rescue:
– mean
– confidence interval
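As a concrete illustration (our addition, not from the slides), here is a minimal R sketch that computes both statistics for a set of repeated measurements; the file name runtimes.txt is a hypothetical placeholder:

    # Hypothetical file: one runtime measurement per line
    x <- scan('runtimes.txt')
    n <- length(x)
    m <- mean(x)
    s <- sd(x)
    # 95% confidence interval for the mean: m ± t(0.975, n-1) * s / sqrt(n)
    m + c(-1, 1) * qt(0.975, df = n - 1) * s / sqrt(n)
    t.test(x)$conf.int   # R computes the same interval for us

If the interval is wide, more repetitions are needed before drawing any conclusion.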

Intuition for T-Test
Each face 1–6 is uniformly likely (p = 1/6). Throw the die 10 times and calculate the mean; repeat over many trials.
[Table: trial number vs. mean of 10 throws; the values are elided in the transcript]
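A short R sketch (our addition) reproduces the experiment: the means of repeated 10-throw trials cluster around the true mean of 3.5, and the t-test quantifies how far a sample mean can plausibly stray from it:

    set.seed(42)                                   # make the trials reproducible
    trial_mean <- function() mean(sample(1:6, 10, replace = TRUE))
    means <- replicate(1000, trial_mean())         # 1000 trials of 10 throws each
    mean(means)                                    # close to the true mean, 3.5
    sd(means)                                      # spread of the trial means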

Back to our Benchmark: Managing Variability

> x = scan('file')
Read 20 items
> t.test(x)

        One Sample t-test

data:  x
t = …, df = 19, p-value < 2.2e-16
95 percent confidence interval:
 …  …
sample estimates:
mean of x
 …

So we can handle Variability. What about Bias?

Evaluating compiler optimizations
System = gcc -O2 perlbench
System + Innovation = gcc -O3 perlbench
Madan: speedup = 1.18 ± …  Conclusion: O3 is good
Todd: speedup = 0.84 ± …  Conclusion: O3 is bad
Why does this happen?
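A hedged R sketch of how such a speedup and its uncertainty might be computed; o2_times.txt and o3_times.txt are hypothetical files of repeated runtimes for each configuration:

    o2 <- scan('o2_times.txt')   # runtimes under gcc -O2
    o3 <- scan('o3_times.txt')   # runtimes under gcc -O3
    mean(o2) / mean(o3)          # point estimate of the speedup
    t.test(o2, o3)               # is the difference statistically significant?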

Differences in our experimental setup
Madan: HOME=/home/madan
Todd: HOME=/home/toddmytkowicz
[Diagram: process memory layout with environment, stack, and text segments]
The environment is copied onto the stack at process startup, so a different length of $HOME shifts stack addresses, and with them alignment and performance.
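As a rough illustration (our addition), you can estimate from within R how large the process environment is, and hence how much the stack start can differ between users:

    env <- Sys.getenv()   # the process environment, NAME=VALUE pairs
    # Approximate bytes it occupies: name, '=', value, and a terminating NUL
    sum(nchar(names(env)) + nchar(env) + 2)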

Runtime of SPEC CPU 2006 perlbench depends on who runs it!

Bias from linking order
32 randomly generated linking orders: the order of the .o files alone can lead to contradictory conclusions.
[Plot: speedup for each of the 32 link orders]
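One way to set up such an experiment, sketched in R (the object file names are hypothetical): generate shuffled link lines, then build and time each resulting binary:

    objs <- c("main.o", "eval.o", "parse.o", "regex.o")   # hypothetical .o files
    set.seed(1)
    link_cmds <- replicate(32,
      paste("gcc -O2", paste(sample(objs), collapse = " "), "-o perlbench"))
    head(link_cmds, 3)   # run each with system() and time the resulting binary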

Where exactly does Bias come from?

Interactions with hardware buffers
[Diagram sequence: the -O2 binary laid out across Page N and Page N+1, with regions marked as dead code, code affected by O3, and hot code. Depending on where the hot code lands relative to the page boundary – and likewise relative to a cache line boundary (Cacheline N vs. Cacheline N+1) – the -O3 layout comes out either better than -O2 or worse.]

Other Sources of Bias
JIT, Garbage Collection, CPU Affinity, Domain specific (e.g. size of input data). How do we manage these?
– JIT: ngen to remove the impact of the JIT; a “warmup” phase to JIT code before measurement (see the sketch after this list)
– Garbage Collection: try different heap sizes (JVM); a “warmup” phase to build data structures; ensure the program is not “leaking” memory
– CPU Affinity: try to bind threads to CPUs (SetProcessAffinityMask)
– Domain Specific: up to you!
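A minimal R sketch of the warmup-then-measure pattern (our addition; run_benchmark is a hypothetical stand-in for the workload under test):

    run_benchmark <- function() sum(sort(runif(1e6)))   # stand-in workload
    for (i in 1:5) run_benchmark()                      # warmup: JIT, caches, data structures
    times <- replicate(20, system.time(run_benchmark())["elapsed"])
    t.test(times)$conf.int                              # 95% CI for the mean runtime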

R for the T-Test
– Where to download
– A simple intro to getting data into R
– A simple intro to t.test

Some Conclusions
Performance evaluations are hard!
– Variability and Bias are not easy to deal with
Other experimental sciences go to great effort to work around variability and bias.
– We should too!