A Mathematical Model for Balancing Co-Phase Effects in Simulated Multithreaded Systems Joshua L. Kihm, Tipp Moseley, and Dan Connors University of Colorado at Boulder

Exploiting Phase Behavior for Efficient Architecture Simulation
Program behavior patterns, or phases, can be exploited for efficient simulation [SimPoint, Sherwood et al., PACT ’01]
–Capture repeating phases to reduce simulation time or to direct detailed simulation
Industry trends toward multithreaded processors
–In a multithreaded system, execution is characterized by the combination of phases of co-resident threads, called a co-phase [Van Biesbrouck et al., ISPASS ’04]
–Phase exploitation is more difficult for the simulation and design of multithreaded systems because the individual phases interact in unique ways

Program Execution Terminology
[Figure: a sequence of periods labeled by phase (A, B, C), marking one period, one phase, and one interval]
PERIOD – A segment of program execution of a given length (one OS scheduling period in this work)
PHASE – A set of periods with similar behavior
INTERVAL – A set of consecutive periods with the same phase (one occurrence of a phase)
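The PERIOD / PHASE / INTERVAL distinction can be illustrated with a minimal Python sketch. It assumes each period has already been assigned a single-letter phase label; the trace below is hypothetical (not the figure's data), and `phase_intervals` is an illustrative name. Consecutive periods with the same label are grouped into one interval, so the same phase can occur as several separate intervals.

```python
from itertools import groupby


def phase_intervals(periods: str) -> list[tuple[str, int]]:
    """Collapse a per-period phase trace into (phase, number_of_periods) intervals.

    Each character of `periods` is the phase label of one period (assumed
    already classified). An interval is a maximal run of consecutive periods
    sharing the same phase.
    """
    return [(phase, len(list(run))) for phase, run in groupby(periods)]


# Hypothetical trace: phase A occurs as two separate intervals.
print(phase_intervals("AACAAABBBCC"))
# [('A', 2), ('C', 1), ('A', 3), ('B', 3), ('C', 2)]
```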

Effects of Co-Phases
[Figure: co-phase behavior of 181.mcf and 186.crafty; graph format from Van Biesbrouck; callouts ask "What if we start here? Or here?"]
Relative progress of threads is determined by individual thread behavior and inter-thread interference
As the co-phase changes, so do interference and performance
Data from a Pentium-4 (Northwood) system illustrates co-phase effects and transitions

Problem Statement
Variation in the offset between threads causes variation in which co-phases are encountered and in their relative importance
–Over 15% standard deviation in IPC for some combinations
Offset is caused by:
–Start times
–OS scheduling
–Simulation error
The average performance must be determined in order to reflect real system performance, where the relative position of threads will be randomized

Example Analysis of Pentium-4 Data
[Figure: annotated with the total single-threaded (ST) runtime]
Hyper-Threading (HT) performs below ST!
Best performance at high offset

Motivation (Methodology)
Tested on real hardware
–Intel Pentium-4 Northwood with Hyper-Threading
Used 5 SPEC CPU 2000 benchmarks
–188.ammp, 179.art, 186.crafty, 252.eon, 181.mcf
–Long-running benchmarks
Offsets of −100 s, −90 s, −80 s, …, +80 s, +90 s, +100 s (21 tests per pairing)
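The test matrix can be enumerated as a short sketch. The offsets follow the slide (−100 s to +100 s in 10 s steps, giving 21 tests per pairing). Treating pairings as unordered and including self-pairings is an assumption made here, since the slides do not list the pairings explicitly.

```python
from itertools import combinations_with_replacement

# Benchmarks listed on the slide.
BENCHMARKS = ["188.ammp", "179.art", "186.crafty", "252.eon", "181.mcf"]

# Start-time offsets between the two threads: -100 s to +100 s in 10 s steps.
OFFSETS_S = list(range(-100, 101, 10))

# Assumption: unordered pairings, including self-pairings (e.g. mcf with mcf).
PAIRINGS = list(combinations_with_replacement(BENCHMARKS, 2))

print(len(OFFSETS_S), len(PAIRINGS))  # 21 15
```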

Performance Variance Due to Offset
[Figure: percent standard deviation across offsets]
Variation is high for many metrics
Self-pairings have high variation
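"Percent standard deviation" is read here as the standard deviation of a metric across offsets divided by its mean, times 100 (the coefficient of variation). This is a minimal sketch that assumes the population standard deviation; the slides do not say which estimator was used, and the IPC values below are hypothetical.

```python
import statistics


def percent_std_dev(values: list[float]) -> float:
    """Standard deviation as a percentage of the mean (coefficient of variation x 100).

    Assumption: population standard deviation (statistics.pstdev).
    """
    mean = statistics.fmean(values)
    return 100.0 * statistics.pstdev(values) / mean


# Hypothetical IPC of one pairing measured at two different offsets.
print(round(percent_std_dev([0.9, 1.1]), 6))  # 10.0
```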

Co-Phase Variance
[Figure: difference in the portion of time spent in each co-phase]
The co-phase mix changes with offset
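To see why the co-phase mix depends on offset, a deliberately naive sketch can pair two per-period phase traces shifted by a whole number of periods. It assumes both threads advance exactly one period per period regardless of interference, which is the simplification the model on the following slides avoids; the traces and the `cophase_mix` name are hypothetical.

```python
from collections import Counter


def cophase_mix(trace_x: str, trace_y: str, offset: int) -> dict[tuple[str, str], float]:
    """Fraction of overlapping periods spent in each co-phase (phase_x, phase_y).

    Naive assumption: no interference, so each thread advances one period per
    period, and thread Y starts `offset` periods after thread X.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative; swap the traces instead")
    pairs = [
        (trace_x[t], trace_y[t - offset])
        for t in range(offset, min(len(trace_x), len(trace_y) + offset))
    ]
    counts = Counter(pairs)
    return {cophase: n / len(pairs) for cophase, n in counts.items()}


print(cophase_mix("AABB", "CCDD", 0))  # {('A', 'C'): 0.5, ('B', 'D'): 0.5}
print(cophase_mix("AABB", "CCDD", 1))  # co-phase (B, C) now appears as well
```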

Conceptual Model
The time spent in each co-phase interval determines overall performance
The amount of time in a co-phase interval depends on each thread’s:
–Performance in the co-phase
–Length of the interval
–Number of operations already completed in its current interval
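These three dependencies can be combined in a small sketch: each thread needs (interval length − operations completed) / co-phase performance time units to leave its current interval, and the co-phase interval ends when the first thread does. The function and parameter names are hypothetical, and ties are arbitrarily credited to thread X.

```python
def time_in_cophase_interval(
    perf_x: float, length_x: float, done_x: float,
    perf_y: float, length_y: float, done_y: float,
) -> tuple[float, str]:
    """Time spent in the current co-phase interval before one thread changes phase.

    perf_*   : operations per unit time of each thread in this co-phase
    length_* : operations in the thread's current phase interval
    done_*   : operations already completed in that interval
    Returns (time, "X" or "Y"), naming the thread that changes phase first.
    """
    t_x = (length_x - done_x) / perf_x  # time for X to finish its interval
    t_y = (length_y - done_y) / perf_y  # time for Y to finish its interval
    return (t_x, "X") if t_x <= t_y else (t_y, "Y")  # tie credited to X


print(time_in_cophase_interval(2.0, 100, 20, 1.0, 50, 0))  # (40.0, 'X')
```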

Determining the Time in a Co-Phase Interval
Interval length and co-phase performance are constant, but need to be determined ahead of time
*Assumption of phase-based simulation
The number of operations already completed is a function of previous performance and the co-phase profile

Determining the Time in a Co-Phase Interval
[Figure: time in interval versus offset for thread X, with regions labeled "Interval i runs in its entirety", "Interval is not encountered", and "Part of interval occurs (monotonic)"; a similar case holds for thread Y]
The overall case is the minimum of the two threads’ cases
–Thread Y changes phase first: the next interval is (i, j+1)
–Thread X changes phase first: the next interval is (i+1, j)
The area under the curve is proportional to the average length of the interval
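The transition rule (X first leads to (i+1, j), Y first leads to (i, j+1)) suggests stepping through the co-phase profile one interval at a time while carrying each thread's completed operations forward. This is a minimal sketch, assuming per-co-phase performance is known ahead of time as a table keyed by the two phase labels, and that on an exact tie both threads advance; the names and numbers are hypothetical.

```python
def walk_cophases(intervals_x, intervals_y, perf, done_x=0.0, done_y=0.0):
    """Step through co-phase intervals (i, j) until either thread runs out of intervals.

    intervals_x, intervals_y : lists of (phase, length_ops) for threads X and Y
    perf : perf[(phase_x, phase_y)] -> (ops/time of X, ops/time of Y) in that co-phase
    done_x, done_y : ops already completed in each thread's first interval
    Returns a list of ((i, j), time_spent) entries.
    """
    i = j = 0
    timeline = []
    while i < len(intervals_x) and j < len(intervals_y):
        (phase_x, len_x), (phase_y, len_y) = intervals_x[i], intervals_y[j]
        px, py = perf[(phase_x, phase_y)]
        t_x = (len_x - done_x) / px  # time until X leaves interval i
        t_y = (len_y - done_y) / py  # time until Y leaves interval j
        t = min(t_x, t_y)            # overall case is the minimum
        timeline.append(((i, j), t))
        done_x += px * t             # both threads progress for time t
        done_y += py * t
        if t_x <= t_y:               # X changes phase first -> (i+1, j)
            i, done_x = i + 1, 0.0
        if t_y <= t_x:               # Y changes phase first -> (i, j+1); both on a tie
            j, done_y = j + 1, 0.0
    return timeline


# Hypothetical two-interval threads and co-phase performance table.
x = [("A", 100), ("B", 100)]
y = [("C", 60), ("D", 200)]
perf = {("A", "C"): (2.0, 1.0), ("A", "D"): (1.5, 1.0),
        ("B", "C"): (1.0, 2.0), ("B", "D"): (2.0, 2.0)}
print(walk_cophases(x, y, perf))
# [((0, 0), 50.0), ((1, 0), 5.0), ((1, 1), 47.5)]
```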

Mathematical Model
[Equation: two cases, "Thread X finishes first" and "Thread Y finishes first", with terms labeled "Performance in co-phase", "Number of operations in interval", "Number of operations yet to complete in interval", and "Number of operations already completed in interval"]
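Only the term labels of this equation survive. One form consistent with those labels and with the transition rule on the previous slide, using notation assumed here rather than taken from the slides: P for performance (operations per unit time) in co-phase (i, j), N for the number of operations in an interval, and C for the number already completed, so that N − C is the number yet to complete.

```latex
\begin{align*}
T_{i,j} &= \min\!\left(
  \frac{N^{X}_{i} - C^{X}_{i}}{P^{X}_{i,j}},\;
  \frac{N^{Y}_{j} - C^{Y}_{j}}{P^{Y}_{i,j}} \right) \\
\text{X finishes first:}\quad & (i,j) \to (i+1,\,j), \quad
  C^{X}_{i+1} = 0, \quad C^{Y}_{j} \leftarrow C^{Y}_{j} + P^{Y}_{i,j}\,T_{i,j} \\
\text{Y finishes first:}\quad & (i,j) \to (i,\,j+1), \quad
  C^{Y}_{j+1} = 0, \quad C^{X}_{i} \leftarrow C^{X}_{i} + P^{X}_{i,j}\,T_{i,j}
\end{align*}
```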

Start-up Intervals
Interval lengths depend on previous intervals (the total number of retired operations) all the way back to the start of execution of the thread
–Some model is needed to simulate the difference in the number of operations completed between threads
Model based on single-threaded behavior
*Assumption: single-threaded phase behavior is indicative of average co-phase behavior
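One way to realize a model based on single-threaded behavior is to run the leading thread X alone for the offset, at its single-threaded performance in each phase, and report which interval it has reached and how many operations of that interval it has completed when thread Y starts. This is a sketch under that assumption; the function name, the `st_perf` table, and the example numbers are hypothetical, and the offset is in the same time unit as the performance values.

```python
def startup_position(intervals, st_perf, offset):
    """Where the leading thread X is when thread Y starts `offset` time units later.

    intervals : list of (phase, length_ops) for thread X
    st_perf   : st_perf[phase] -> assumed single-threaded ops per unit time
    Returns (interval index i, ops already completed in interval i), or
    (len(intervals), 0.0) if X finishes every listed interval within the offset.
    """
    remaining = offset
    for i, (phase, length) in enumerate(intervals):
        t_full = length / st_perf[phase]  # ST time to run interval i entirely
        if remaining < t_full:            # partial completion of interval i
            return i, remaining * st_perf[phase]
        remaining -= t_full               # interval i runs in its entirety
    return len(intervals), 0.0


print(startup_position([("A", 100), ("B", 200)], {"A": 2.0, "B": 1.0}, 80))  # (1, 30.0)
```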

Deriving Start-up Intervals
[Figure: time in a start-up interval versus offset, with regions labeled "Interval is never entered" / "Interval doesn't occur", "Interval length equals offset" (slope M=1), and "Length of phase i"; transitions labeled "Co-phase i,1" and "Next start-up interval i+1,0", with neighboring intervals i−1,0 and i+1,0]

Mathematical Model (Start-up Interval)
[Equation: three cases, "Interval not encountered", "Partial completion", and "Full completion", with terms labeled "Length of intervals up to i" and "Length of intervals up to and including i"]
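Only the term labels of this equation survive. A piecewise form consistent with those labels and with the slope-1 region of the start-up figure, assuming (notation not from the slides) that ℓ_k is the single-threaded time of thread X's k-th interval, o ≥ 0 is the offset, and S denotes the cumulative length of intervals:

```latex
\begin{align*}
S_{i-1} &= \sum_{k<i} \ell_k, \qquad S_i = \sum_{k \le i} \ell_k = S_{i-1} + \ell_i \\
T^{\mathrm{start}}_{i,0}(o) &=
\begin{cases}
0, & o \le S_{i-1} \quad \text{(interval not encountered)} \\
o - S_{i-1}, & S_{i-1} < o < S_i \quad \text{(partial completion)} \\
\ell_i, & o \ge S_i \quad \text{(full completion)}
\end{cases} \\
&= \min\bigl(\max(o - S_{i-1},\, 0),\, \ell_i\bigr)
\end{align*}
```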

Example Analysis of Pentium-4 Data
[Figure: first two intervals of 186.crafty and 179.art]

Example Analysis of Pentium-4 Data
[Figure: run time of crafty, run time of art, and total ST runtime]
Art phase 2 causes heavy interference
HT performs below ST!
Best performance at high offset

Extensions to More Threads
One thread is the “reference”
–Arbitrarily chosen
–Number of variables grows linearly
Concepts and equations easily extend to more threads
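With one reference thread, the two-thread rule generalizes directly: the co-phase interval ends when the first of N threads leaves its current interval, and only N − 1 offsets (one per non-reference thread) need to be varied, so the number of variables grows linearly. This is a sketch with hypothetical function names and example values.

```python
def time_in_cophase_interval_n(perf, lengths, done):
    """N-thread co-phase interval: it ends when the first thread changes phase.

    perf, lengths, done are per-thread sequences of equal length
    (ops/time in this co-phase, ops in current interval, ops already completed).
    Returns (time, index of the thread that changes phase first).
    """
    times = [(n - d) / p for p, n, d in zip(perf, lengths, done)]
    first = min(range(len(times)), key=times.__getitem__)
    return times[first], first


def relative_offsets(start_times, reference=0):
    """Offsets of every other thread relative to the (arbitrarily chosen)
    reference thread: N threads give N - 1 offset variables."""
    t0 = start_times[reference]
    return [t - t0 for k, t in enumerate(start_times) if k != reference]


print(time_in_cophase_interval_n([1.0, 2.0, 4.0], [100, 100, 100], [0, 0, 20]))  # (20.0, 2)
print(relative_offsets([5, 0, 30]))  # [-5, 25]
```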

Conclusions
Offset causes variations in the co-phase mix and therefore in performance
–Average 6.7% standard deviation in IPC
A complete picture of performance can only be gained by looking at offsets other than zero
The relative importance of co-phases can be determined mathematically