Variability in Architectural Simulations of Multi-threaded Workloads Alaa R. Alameldeen and David A. Wood University of Wisconsin-Madison

HPCA 2003, Alaa Alameldeen and David Wood

Motivation
- Experimental scientists use statistics
- Computer architects running simulation experiments don't!
- Why ignore statistics?
  - Simulations are deterministic
- This can lead to wrong conclusions!

Workload Variability
[Figure: OLTP]

Workload Variability
[Figure: OLTP]
- Slower memory is better!

What Went Wrong?
- Many possible executions for each configuration
- Why? Different timing effects:
  - OS scheduling decisions
  - Different orders of lock acquisition
  - Different transaction mixes
- This is magnified by short simulations
- Variability can lead to wrong conclusions

Overview
- Variability is a real phenomenon for multi-threaded workloads
  - Runs from the same initial state can be different
- Variability is a challenge for simulations
  - Simulations are short
- Our solution accounts for variability
  - Multiple runs, statistical techniques

Outline
- Motivation and Overview
- Variability in Real Systems
  - Time and Space Variability
- Variability in Simulations
- Accounting for Variability
- Conclusions

What is Variability?
- Differences between multiple estimates of a workload's performance
- Time variability:
  - Performance changes during different phases of a single run
- Space variability:
  - Runs starting from the same state follow different execution paths

Time Variability in Real Systems
[Figure: OLTP, one-second intervals]

Time Variability Example (Cont'd)
- How is this handled in real experiments?
  - Solution: run your experiment long enough!
[Figure: OLTP, one-minute intervals]

Space Variability in Real Systems
[Figure: OLTP, one-second averages, 5 runs]

Space Variability Example (Cont'd)
- How is this handled in real experiments?
  - Same solution: run your experiment long enough!
[Figure: OLTP, one-minute averages, 5 runs; annotation: "16-day simulation"]

Outline
- Motivation and Overview
- Variability in Real Systems
- Variability in Simulations
  - Simulation Infrastructure
  - Injecting Randomness
  - The Wrong Conclusion Ratio
- Accounting for Variability
- Conclusions

Simulation Infrastructure
- Workloads
  - Two scientific and five commercial benchmarks
- Target system: E10000-like 16-node system
- Full-system simulation
  - Virtutech Simics running Solaris 8 on SPARC V9
  - A blocking processor model (Simics)
  - An out-of-order (OoO) processor model (TFSim; Mauer et al., SIGMETRICS'02)
- Memory system simulator
  - MOSI invalidation-based broadcast coherence protocol (Martin et al., HPCA'02)

Simulating Space Variability?
- Simulations are deterministic
- Variability cannot be ignored for multi-threaded applications
  - One execution may not be representative
  - Execution paths affect simulation conclusions
- We need to obtain a space of results

Injecting Randomness
- We introduce artificial random perturbations in each simulation run
- For each memory access, the latency in nanoseconds becomes Latency + r, where r is drawn uniformly from {-2, -1, 0, 1, 2} ns
- Roughly models contention due to DMA traffic
- Other methods are possible
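The perturbation rule above can be sketched as follows. This is a minimal sketch, not the authors' simulator code. The base latency of 80 ns and the use of one seed per run are hypothetical assumptions; the slide only fixes the set of offsets r and their uniform distribution.

```python
import random
from typing import List

# Hypothetical base memory latency; the slides do not give a value.
BASE_LATENCY_NS = 80

# Offsets from the slide: r in {-2, -1, 0, 1, 2} ns, uniformly distributed.
PERTURBATIONS_NS = (-2, -1, 0, 1, 2)


def perturbed_latency(base_ns: int, rng: random.Random) -> int:
    """Return base_ns + r, with r drawn uniformly from PERTURBATIONS_NS."""
    return base_ns + rng.choice(PERTURBATIONS_NS)


def run_latencies(seed: int, n_accesses: int, base_ns: int = BASE_LATENCY_NS) -> List[int]:
    """Latencies for one run; a different seed (assumed) gives a different run."""
    rng = random.Random(seed)
    return [perturbed_latency(base_ns, rng) for _ in range(n_accesses)]


if __name__ == "__main__":
    for seed in range(3):
        print(seed, run_latencies(seed, 8))
```

A fixed seed keeps each individual run reproducible, while different seeds produce the distinct execution paths that make up the "space" of results.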

Simulated Space Variability
- Space variability exists in our benchmarks
[Figure: 20 runs, ~10 hrs sim.]

Quantifying Variability: The Wrong Conclusion Ratio (WCR)
- WCR(16,32) = 18%
- WCR(16,64) = 7.5%
- WCR(32,64) = 26%
[Figure: OLTP, 20 runs, 50 transactions]

Outline
- Motivation and Overview
- Variability in Real Systems
- Variability in Simulations
- Accounting for Variability
- Conclusions

Confidence Intervals
- Definition:
  - Range of values expected to include a population parameter (e.g., the mean)
- Confidence probability:
  - Probability that the confidence interval contains the true mean
- For the same confidence probability:
  - Sample size ↑ → confidence interval width ↓
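A small sketch of a confidence interval for the mean of several run results. It illustrates how the interval narrows as the number of runs grows.

The sketch assumes the normal quantile z from Python's `statistics.NormalDist`. This is an approximation: for small numbers of runs, a Student-t quantile with n - 1 degrees of freedom gives a somewhat wider interval.

```python
import math
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, confidence=0.95):
    """Approximate CI for the mean: x_bar +/- z * s / sqrt(n).

    Uses the normal quantile z (an approximation for small n, where a
    Student-t quantile would be slightly larger).
    """
    n = len(samples)
    x_bar = mean(samples)
    s = stdev(samples)  # sample standard deviation (n - 1 denominator)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


if __name__ == "__main__":
    runs = [10, 12, 11, 13, 9]  # illustrative cycles-per-transaction values
    lo, hi = confidence_interval(runs)
    lo4, hi4 = confidence_interval(runs * 4)  # same spread, 4x the runs
    print(hi - lo, hi4 - lo4)  # the second width is smaller
```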

Accounting for Space Variability
[Figure: OLTP]

Accounting for Space Variability
- Simple solution: estimate the number of runs needed so that the confidence intervals do not overlap
- Tests of hypotheses can also be used (see paper)
[Figure: OLTP]
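One way to put the "simple solution" into practice is sketched below. It is an assumption-laden sketch, not the paper's exact procedure.

The approach takes a small pilot set of runs and estimates how many runs bring the confidence-interval half-width down to a relative error r of the mean. It then checks whether the two configurations' intervals overlap. It uses the normal-quantile approximation, and the pilot-run approach is an assumption.

```python
import math
from statistics import NormalDist, mean, stdev


def runs_needed(pilot, rel_error, confidence=0.95):
    """Runs needed so that z * s / sqrt(n) <= rel_error * x_bar.

    The pilot sample's mean and standard deviation stand in for the
    unknown population values (an assumption).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    x_bar, s = mean(pilot), stdev(pilot)
    return math.ceil((z * s / (rel_error * x_bar)) ** 2)


def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    pilot = [10, 12, 11, 13, 9]  # illustrative pilot runs
    print(runs_needed(pilot, rel_error=0.05))
    print(intervals_overlap((9.6, 12.4), (12.6, 14.0)))
```

As a rough heuristic (assumed, not from the slides), choosing r below about half the relative gap between the two configurations' means makes non-overlapping intervals likely once that many runs are collected. Re-check the overlap with the final data rather than relying on the estimate alone.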

Conclusions
- Short runs of multi-threaded workloads exhibit variability
- Variability can lead to wrong simulation conclusions
- Our solution:
  - Inject randomness
  - Perform multiple runs
  - Apply statistical techniques

Backup Slides

Effects of OS Scheduling

WCR Definition
- Percentage of comparison simulation experiments that reach a wrong conclusion
- The correct conclusion is the relationship between the averages of the two populations
- WCR can be used to estimate the probability that a single experiment reaches a wrong conclusion
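The definition above can be computed directly from two sets of run results. The slides do not say how comparison experiments are enumerated, so this sketch makes two assumptions:

- **Pairing:** it forms every pair (one run of configuration A, one run of configuration B) and counts a pair as wrong when its ordering disagrees with the ordering of the two averages.
- **Ties:** ties between individual runs are also counted as wrong, which is a hypothetical choice.

```python
from itertools import product
from statistics import mean


def wrong_conclusion_ratio(runs_a, runs_b):
    """Fraction of single-run comparisons whose ordering contradicts the averages.

    Assumptions: all (a, b) pairs are compared, and a tie between two
    individual runs counts as a wrong conclusion.
    """
    correct = mean(runs_a) - mean(runs_b)  # sign gives the "correct" ordering
    if correct == 0:
        raise ValueError("averages are equal; no correct ordering to compare against")
    pairs = list(product(runs_a, runs_b))
    # A pair is wrong when its difference does not share the sign of `correct`.
    wrong = sum(1 for a, b in pairs if (a - b) * correct <= 0)
    return wrong / len(pairs)


if __name__ == "__main__":
    print(wrong_conclusion_ratio([10, 12, 14], [11, 13, 15]))
```

Only the sign of the difference matters here, so the same function works whether lower or higher values of the metric are better.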

Confidence Intervals: Equations
- The confidence interval for the mean of a normally distributed infinite population:
- Sample size needed to limit the relative error of the mean to r:
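The equations on this backup slide did not survive in the transcript. For reference, the standard textbook forms of the two quantities are given below. They are an assumption, not necessarily the exact expressions on the original slide.

Here, x̄ is the sample mean, s is the sample standard deviation over n runs, and 1 - α is the confidence probability.

```latex
% Confidence interval for the mean (Student t with n - 1 degrees of freedom)
\bar{x} \pm t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}

% Runs needed so the half-width is at most r\,\bar{x} (normal approximation)
n \ge \left( \frac{z_{1-\alpha/2}\, s}{r\,\bar{x}} \right)^{2}
```

The sample-size form uses the normal quantile z because the t quantile itself depends on n, which would otherwise require iterating.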

Hypothesis Testing
- Tests whether there is no difference between two population means
  - The hypothesis μ_32 = μ_64 tests whether the means of the 32- and 64-entry ROB configurations differ
- The hypothesis is tested using sample means and variances
- If the hypothesis is rejected, our conclusion is significant
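The test above uses only sample means and variances, as in the sketch below. This sketch uses Welch's two-sample t statistic, which does not assume equal variances. That is a substituted choice: the slides do not say whether the paper used Welch's or a pooled-variance test.

The default critical value of 1.96 is the large-sample (normal) 5% two-sided value. With around 20 runs per configuration, a Student-t table gives a slightly larger threshold.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for H0: mu_a == mu_b (variances not assumed equal)."""
    se = math.sqrt(variance(sample_a) / len(sample_a)
                   + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se


def reject_equal_means(sample_a, sample_b, critical=1.96):
    """Two-sided test: reject H0 when |t| exceeds the critical value.

    1.96 is the large-sample normal approximation at the 5% level.
    """
    return abs(welch_t(sample_a, sample_b)) > critical


if __name__ == "__main__":
    rob32 = [10, 12, 11, 13, 9]   # illustrative per-run results
    rob64 = [20, 22, 21, 23, 19]
    print(welch_t(rob32, rob64), reject_equal_means(rob32, rob64))
```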

Accounting for Time Variability
- Is time variability caused by the same effects that cause space variability?
  - Use analysis of variance (ANOVA)
- If time variability is caused by different effects, we need to obtain a time sample
  - Observations obtained from different starting points
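ANOVA compares variation between groups of observations with variation within groups. Below is a minimal one-way ANOVA F statistic.

The slides do not describe the ANOVA design the authors used. Grouping measurements by starting point is shown here only as a hypothetical example. The resulting F would be compared with an F-table critical value with (k - 1, N - k) degrees of freedom.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


if __name__ == "__main__":
    # Hypothetical: each inner list holds runs started from one checkpoint.
    by_start_point = [[1, 2, 3], [4, 5, 6]]
    print(one_way_anova_f(by_start_point))
```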

Multi-threaded Workloads and Simulation
- Multi-threaded workloads are important
  - Workloads for commercial servers
  - New architectures support multi-threading
- Performance metrics are different from traditional benchmarks
  - Throughput-oriented (transactions)
  - IPC is not appropriate (idle time!)
- Simulation challenge: comparing systems running multi-threaded applications

Simulation of Multi-threaded Workloads
- Simulation is slow!
  - We cannot simulate the whole workload
- Solution:
  - Run for a fixed number of transactions
  - Measure the per-transaction runtime (cycles per transaction)
  - Use it to compare different systems
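Cycles per transaction is the elapsed cycles for a run divided by the fixed number of transactions it completed. Per-configuration results are then averaged over runs. Unlike IPC, this metric is not inflated by instructions the system executes while idle.

A minimal sketch follows. Representing a run as a hypothetical `(start_cycle, end_cycle, transactions)` tuple is an assumption for illustration.

```python
from statistics import mean


def cycles_per_transaction(start_cycle, end_cycle, transactions):
    """Per-transaction runtime of one run that completed a fixed number of transactions."""
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    return (end_cycle - start_cycle) / transactions


def mean_cycles_per_transaction(runs):
    """Average over runs, each given as a hypothetical (start, end, transactions) tuple."""
    return mean(cycles_per_transaction(s, e, t) for s, e, t in runs)


if __name__ == "__main__":
    runs = [(1000, 51000, 50), (0, 52500, 50)]  # illustrative numbers only
    print(mean_cycles_per_transaction(runs))
```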