ECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture) Evaluation – Metrics, Simulation, and Workloads Copyright 2004 Daniel.

Slides:



Advertisements
Similar presentations
Managing Wire Delay in Large CMP Caches Bradford M. Beckmann David A. Wood Multifacet Project University of Wisconsin-Madison MICRO /8/04.
Advertisements

Our approach! 6.9% Perfect L2 cache (hit rate 100% ) 1MB L2 cache Cholesky 47% speedup BASE: All cores are used to execute the application-threads. PB-GS(PB-LS)
DBMSs on a Modern Processor: Where Does Time Go? Anastassia Ailamaki Joint work with David DeWitt, Mark Hill, and David Wood at the University of Wisconsin-Madison.
PERFORMANCE ANALYSIS OF MULTIPLE THREADS/CORES USING THE ULTRASPARC T1 (NIAGARA) Unique Chips and Systems (UCAS-4) Dimitris Kaseridis & Lizy K. John The.
G. Alonso, D. Kossmann Systems Group
TU/e Processor Design 5Z032 1 Processor Design 5Z032 The role of Performance Henk Corporaal Eindhoven University of Technology 2009.
ECE 4100/6100 Advanced Computer Architecture Lecture 3 Performance Prof. Hsien-Hsin Sean Lee School of Electrical and Computer Engineering Georgia Institute.
Silberschatz, Galvin and Gagne  2002 Modified for CSCI 399, Royden, Operating System Concepts Operating Systems Lecture 19 Scheduling IV.
(C) 2002 Daniel SorinDuke Architecture Why Computer Architecture is Exciting and Challenging Daniel Sorin Department of Electrical & Computer Engineering.
Variability in Architectural Simulations of Multi-threaded Workloads Alaa R. Alameldeen and David A. Wood University of Wisconsin-Madison
ECE669 L12: Interconnection Network Performance March 9, 2004 ECE 669 Parallel Computer Architecture Lecture 12 Interconnection Network Performance.
OS Fall ’ 02 Performance Evaluation Operating Systems Fall 2002.
(C) 2002 Milo MartinHPCA, Feb Bandwidth Adaptive Snooping Milo M.K. Martin, Daniel J. Sorin Mark D. Hill, and David A. Wood Wisconsin Multifacet.
Introduction What is Parallel Algorithms? Why Parallel Algorithms? Evolution and Convergence of Parallel Algorithms Fundamental Design Issues.
Performance Evaluation
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 6810: Hennessy and.
Evaluating Non-deterministic Multi-threaded Commercial Workloads Computer Sciences Department University of Wisconsin—Madison
Chapter 4 Assessing and Understanding Performance
Adaptive Cache Compression for High-Performance Processors Alaa R. Alameldeen and David A.Wood Computer Sciences Department, University of Wisconsin- Madison.
(C) 2004 Daniel SorinDuke Architecture Using Speculation to Simplify Multiprocessor Design Daniel J. Sorin 1, Milo M. K. Martin 2, Mark D. Hill 3, David.
OS Fall ’ 02 Performance Evaluation Operating Systems Fall 2002.
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 6810: Hennessy and.
1 Chapter 4. 2 Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation.
Presented by Deepak Srinivasan Alaa Aladmeldeen, Milo Martin, Carl Mauer, Kevin Moore, Min Xu, Daniel Sorin, Mark Hill and David Wood Computer Sciences.
Lecture 2: Technology Trends and Performance Evaluation Performance definition, benchmark, summarizing performance, Amdahl’s law, and CPI.
1 Computer Performance: Metrics, Measurement, & Evaluation.
An Analytical Performance Model for Co-Management of Last-Level Cache and Bandwidth Sharing Taecheol Oh, Kiyeon Lee, and Sangyeun Cho Computer Science.
CPU Cache Prefetching Timing Evaluations of Hardware Implementation Ravikiran Channagire & Ramandeep Buttar ECE7995 : Presentation.
Computer Architecture Challenges Shriniwas Gadage.
Introduction and Overview Questions answered in this lecture: What is an operating system? How have operating systems evolved? Why study operating systems?
Multi Core Processor Submitted by: Lizolen Pradhan
Uncovering the Multicore Processor Bottlenecks Server Design Summit Shay Gal-On Director of Technology, EEMBC.
Performance Evaluation of Computer Systems Introduction
1 Performance Evaluation of Computer Systems and Networks Introduction, Outlines, Class Policy Instructor: A. Ghasemi Many thanks to Dr. Behzad Akbari.
Test Loads Andy Wang CIS Computer Systems Performance Analysis.
CDA 3101 Fall 2013 Introduction to Computer Organization Computer Performance 28 August 2013.
1 CS/EE 362 Hardware Fundamentals Lecture 9 (Chapter 2: Hennessy and Patterson) Winter Quarter 1998 Chris Myers.
Simulating a $2M Commercial Server on a $2K PC Alaa R. Alameldeen, Milo M.K. Martin, Carl J. Mauer, Kevin E. Moore, Min Xu, Daniel J. Sorin, Mark D. Hill.
How to Read Research Papers? Xiao Qin Department of Computer Science and Software Engineering Auburn University
ICOM 6115: Computer Systems Performance Measurement and Evaluation August 11, 2006.
A Model for Computational Science Investigations Supercomputing Challenge
CEN 316 Computer Organization and Design Assessing and Understanding Performance Mansour AL Zuair.
Welcome to CPS 210 Graduate Level Operating Systems –readings, discussions, and programming projects Systems Quals course –midterm and final exams Gateway.
Chapter 3 System Performance and Models Introduction A system is the part of the real world under study. Composed of a set of entities interacting.
MIAO ZHOU, YU DU, BRUCE CHILDERS, RAMI MELHEM, DANIEL MOSSÉ UNIVERSITY OF PITTSBURGH Writeback-Aware Bandwidth Partitioning for Multi-core Systems with.
Simics: A Full System Simulation Platform Synopsis by Jen Miller 19 March 2004.
© Michel Dubois, Murali Annavaram, Per Strenstrom All rights reserved Embedded Computer Architecture 5SAI0 Simulation - chapter 9 - Luc Waeijen 16 Nov.
OPERATING SYSTEMS CS 3530 Summer 2014 Systems and Models Chapter 03.
September 10 Performance Read 3.1 through 3.4 for Wednesday Only 3 classes before 1 st Exam!
Sunpyo Hong, Hyesoon Kim
Performance Computer Organization II 1 Computer Science Dept Va Tech January 2009 © McQuain & Ribbens Defining Performance Which airplane has.
1 How will execution time grow with SIZE? int array[SIZE]; int sum = 0; for (int i = 0 ; i < ; ++ i) { for (int j = 0 ; j < SIZE ; ++ j) { sum +=
1 Evaluation of Cooperative Web Caching with Web Polygraph Ping Du and Jaspal Subhlok Department of Computer Science University of Houston presented at.
CS203 – Advanced Computer Architecture Performance Evaluation.
Test Loads Andy Wang CIS Computer Systems Performance Analysis.
CMSC 611: Advanced Computer Architecture
CS203 – Advanced Computer Architecture
OPERATING SYSTEMS CS 3502 Fall 2017
Lecture 2: Performance Evaluation
September 2 Performance Read 3.1 through 3.4 for Tuesday
Network Performance and Quality of Service
Andy Wang CIS 5930 Computer Systems Performance Analysis
Using Destination-Set Prediction to Improve the Latency/Bandwidth Tradeoff in Shared-Memory Multiprocessors Milo Martin, Pacia Harper, Dan Sorin§, Mark.
A Quantitative Analysis of Stream Algorithms on Raw Fabrics
Many-core Software Development Platforms
Performance of computer systems
Performance of computer systems
Architecture & System Performance
Performance of computer systems
Presentation transcript:

ECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture) Evaluation – Metrics, Simulation, and Workloads Copyright 2004 Daniel J. Sorin Duke University

2 (C) 2004 Daniel J. Sorin ECE 259 / CPS 221 Outline Metrics Methodologies –Modeling –Simulation Workloads

3 (C) 2004 Daniel J. Sorin ECE 259 / CPS 221 Performance Metrics How do we tell if our design is good? Performance metrics –Clock speed (gigahertz)? No! Why not? –Instructions per cycle? No! Why not? (tougher question) –Database transactions per second? What’s important? Depends on workload … –Latency  interactive computing –Throughput  batch jobs/queries –Availability  enterprise applications –Power  mobile computing –Cost-efficiency  everything but supercomputing

4 (C) 2004 Daniel J. Sorin ECE 259 / CPS 221 Metrics and Units Latency (an aspect of performance) –Reponse time Throughput (another aspect of performance) –Transactions per cycle Availability –How many “nines” (e.g., 5 nines = % available) Power (energy) –Watts (joules) Hybrid metrics capture more than one aspect –Cost-efficiency: dollars-seconds –Power-delay (energy-delay): watts-seconds (joules-seconds) –Performability: combines performance with availability

5 (C) 2004 Daniel J. Sorin ECE 259 / CPS 221 Secondary Metrics Metrics that we can use for insight, debugging, etc. –Quantify specific aspects of system, not holistic behavior Examples –Instructions per cycle (IPC) –Cache hit rates –Average memory request latency –Average network link utilization –Fraction of directory requests that require 3 hops –Etc. You can use these metrics to explain results –Otherwise, results are just inscrutable, unjustified numbers

6 (C) 2004 Daniel J. Sorin ECE 259 / CPS 221 Comparing to Prior Work How does your idea compare to prior work? –This is how we show that our idea is worthy of publication –E.g., 50% better throughput on TPC-C, but with 20% more power Why is comparison difficult? –Impossible to exactly reproduce experimental setup Example differences in experimental setup –Different system model »Different ISA, microarchitecture, network, etc. –Different workloads (or same workloads compiled differently) »Different OS »Even for same exact application, can have different jobs running in the background (e.g., kernel daemons) –Different simulator (or different configuration of simulator) »Assumptions about latencies, bandwidths, etc.

7 (C) 2004 Daniel J. Sorin ECE 259 / CPS 221 Fair Comparisons Ideally, we’d make perfectly fair comparisons –Compare “apples and apples” If impossible, then give benefit of doubt to prior work –Assumptions about prior work should be optimistic Assumptions about our work should be pessimistic –Don’t assume that our 4MB cache can be accessed in 1 cycle –Find the worst case scenario for our system –Assume that future trends will be less favorable than is likely Show that, even in our worst case, we still do well –Otherwise, readers will be less convinced

8 (C) 2004 Daniel J. Sorin ECE 259 / CPS 221 Speedup (Wood & Hill) DISCUSSION

9 (C) 2004 Daniel J. Sorin ECE 259 / CPS 221 Outline Metrics Evaluation Methodologies –Modeling –Simulation Workloads

10 (C) 2004 Daniel J. Sorin ECE 259 / CPS 221 Ways to Evaluate New Architectures Simulating Modeling Building Tradeoff between three desired features

11 (C) 2004 Daniel J. Sorin ECE 259 / CPS 221 Building Construct a hardware prototype Advantages +Way cool to show off hardware to friends +Runs quickly Disadvantages –Takes long time (grad student time!) to build –Expensive –Not flexible Generally too labor intensive for research studies

12 (C) 2004 Daniel J. Sorin ECE 259 / CPS 221 Modeling Mathematically model the system –Use probabilities and/or queuing models Advantages +Very flexible +Very quick to develop +Runs quickly Disadvantages –Cannot capture effects of system details –Architects are skeptical of models Generally OK for back of the envelope estimates

13 (C) 2004 Daniel J. Sorin ECE 259 / CPS 221 Simulating Write a program that mimics system behavior Advantages +Very flexible +Relatively quick to develop Disadvantages –Runs slowly (e.g., 30,000 times slower than hardware) Method of choice for most architectural research

14 (C) 2004 Daniel J. Sorin ECE 259 / CPS 221 Simulation Challenges Simulator Workload System description Performance metrics Tough problems associated with each arrow!

15 (C) 2004 Daniel J. Sorin ECE 259 / CPS 221 Applications to Simulate We care how system does on important applications We’ll talk about this in a few slides …

16 (C) 2004 Daniel J. Sorin ECE 259 / CPS 221 Describing Simulated System How detailed must our simulator be? Model every transistor in the processor? –Would take too long Abstract away details of processor organization? –Could miss important effects of processor features –Could achieve wrong conclusion Need balance –Model in detail only where necessary –E.g., model memory system in detail, but abstract disks

17 (C) 2004 Daniel J. Sorin ECE 259 / CPS 221 Analytic Model of Shared Memory System Queuing model can capture behavior of system PRESENTATION

18 (C) 2004 Daniel J. Sorin ECE 259 / CPS 221 Simics Full System Simulator Simics is a full-system commercial simulator PRESENTATION

19 (C) 2004 Daniel J. Sorin ECE 259 / CPS 221 Outline Metrics Methodologies –Modeling –Simulation Workloads

20 (C) 2004 Daniel J. Sorin ECE 259 / CPS 221 Workloads We care how system does on important applications But who defines “important”? (I do!) Types of applications –Scientific (genomics, weather simulation, protein folding) –Commercial (database, web serving, application serving) –Desktop (office productivity software, multimedia) –Portable (voice recognition) –???

21 (C) 2004 Daniel J. Sorin ECE 259 / CPS 221 DEC/Compaq/Intel (?) Workload Analysis Commercial workloads are different from scientific PRESENTATION

22 (C) 2004 Daniel J. Sorin ECE 259 / CPS 221 Wisconsin Workload Analysis Commercial workloads are different from scientific Simulating them requires extra work PRESENTATION