CMSC 611: Advanced Computer Architecture


Getting Data: Benchmarks, Simulation & Profiling
Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides
Some material adapted from Hennessy & Patterson / © 2003 Elsevier Science

Performance Variations
  Performance is dependent on workload
    Task dependent
  Need a measure of workload
    Best = run your program
    Often cannot
    Benchmark = “typical” workload
    Standardized for comparison

Synthetic Benchmarks
  Synthetic benchmarks are artificial programs constructed to match the characteristics of a large set of programs
  Examples: Whetstone (scientific programs), Dhrystone (systems programs), LINPACK (linear algebra), …

Synthetic Benchmark Drawbacks
  They may not reflect the user's interests, since they are not real applications
  They may not reflect real program behavior (e.g. memory access patterns)
  Compilers and hardware can inflate the performance of these programs far beyond what the same optimizations can achieve for real programs

Application Benchmarks
  Real applications typical of the expected workload
  Both the applications and their mix are important

The SPEC Benchmarks
  SPEC: System Performance Evaluation Cooperative (now the Standard Performance Evaluation Corporation)
  Suite of benchmarks
    Created by a set of companies to improve the measurement and reporting of CPU performance
  SPEC CPU 2017 is the latest suite
    SPECspeed and SPECrate
    Integer and floating point
    10 programs per set
  Since SPEC requires running applications on real hardware, the memory system has a significant effect on performance
    Reported with results

Performance Reports
  Hardware
    CPU model, speed, cores, cache
    Memory, storage
  Software (with versions)
    OS, compiler
    Firmware, filesystem
  Results
    3 reported per test, use median
    Time and speedup vs. reference platform
  Guiding principle is reproducibility (report environment & experimental setup)

The SPEC Benchmarks
  Bigger numeric values of the SPEC ratio indicate a faster machine
  “Historical” reference machine
    Sun Fire V490 w/ 2.1 GHz UltraSPARC IV+
    2006 update of the 1997 300 MHz UltraSPARC II
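
As a rough illustration of the reporting rule (three runs per test, take the median, report a ratio against the reference machine), here is a minimal Python sketch. All timings, including the reference time, are made-up numbers, and `spec_style_ratio` is a hypothetical helper name, not part of any SPEC tooling.

```python
import statistics


def spec_style_ratio(reference_seconds, measured_runs):
    """Return the reference time divided by the median measured time.

    A larger ratio means the measured machine is faster than the reference.
    """
    median_seconds = statistics.median(measured_runs)
    return reference_seconds / median_seconds


# Hypothetical numbers: the reference machine needs 1000 s for one benchmark.
reference_seconds = 1000.0
machine_a_runs = [52.0, 50.0, 51.0]  # three runs, median 51.0 s
machine_b_runs = [24.0, 25.0, 30.0]  # three runs, median 25.0 s

ratio_a = spec_style_ratio(reference_seconds, machine_a_runs)
ratio_b = spec_style_ratio(reference_seconds, machine_b_runs)
print(f"machine A ratio: {ratio_a:.1f}")
print(f"machine B ratio: {ratio_b:.1f}")
```

Using the median rather than the mean keeps one unusually slow run (the 30 s run on machine B here) from distorting the reported time.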

Comparing & Summarizing Performance
  The wrong summary can present a confusing picture
    A is 10 times faster than B for program 1
    B is 10 times faster than A for program 2
  Total execution time is a consistent summary measure
    Relative execution times for the same workload
    Assumes that programs 1 and 2 are each executed the same number of times on computers A and B
  Execution time is the only valid and unimpeachable measure of performance
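
The conflicting "10 times faster" claims can be made concrete with a small Python sketch. The execution times are hypothetical values chosen only so that each machine wins one program by a factor of 10; they are not from the slides.

```python
# Hypothetical execution times in seconds for two programs on two machines.
times = {
    "A": {"program 1": 1.0, "program 2": 1000.0},
    "B": {"program 1": 10.0, "program 2": 100.0},
}


def total_time(per_program_times):
    """Total execution time, assuming each program runs exactly once."""
    return sum(per_program_times.values())


total_a = total_time(times["A"])
total_b = total_time(times["B"])
speedup_b_over_a = total_a / total_b

print(f"A total: {total_a} s, B total: {total_b} s")
print(f"B is {speedup_b_over_a:.1f}x faster than A on this workload")
```

With these numbers the per-program ratios contradict each other, but the totals give a single answer for this workload: B finishes in 110 s against 1001 s for A.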

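Performance Summary
  Weighted arithmetic mean (WAM) of execution times:
    $\mathrm{WAM} = \sum_{i=1}^{n} w_i \times \mathrm{Time}_i$
  Where:
    $n$ is the number of programs executed
    $w_i$ is a weighting factor that indicates the frequency of executing program $i$, with $\sum_{i=1}^{n} w_i = 1$ and $0 \le w_i \le 1$
  Weighted arithmetic means summarize performance while tracking execution time
  Never use the arithmetic mean (AM) for times normalized relative to a reference machine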
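
A minimal Python sketch of the weighted arithmetic mean follows, together with a demonstration of why the arithmetic mean of normalized times is unreliable. The execution times and the equal weights are hypothetical, and the function names are illustrative only.

```python
def weighted_arithmetic_mean(times, weights):
    """Sum of w_i * time_i, where the weights must add up to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * t for w, t in zip(weights, times))


def arithmetic_mean(values):
    return sum(values) / len(values)


# Hypothetical execution times in seconds for programs 1 and 2.
times_a = [1.0, 1000.0]
times_b = [10.0, 100.0]
weights = [0.5, 0.5]  # both programs run equally often

wam_a = weighted_arithmetic_mean(times_a, weights)
wam_b = weighted_arithmetic_mean(times_b, weights)

# Normalize each machine's times to machine A, then to machine B.
b_relative_to_a = arithmetic_mean([b / a for a, b in zip(times_a, times_b)])
a_relative_to_b = arithmetic_mean([a / b for a, b in zip(times_a, times_b)])

print(f"WAM A = {wam_a}, WAM B = {wam_b}")
print(f"AM of B normalized to A = {b_relative_to_a:.2f} (A scores 1.0)")
print(f"AM of A normalized to B = {a_relative_to_b:.2f} (B scores 1.0)")
```

Both normalized means come out above 1, so each machine looks slower than whichever one is chosen as the reference. The weighted arithmetic mean of the raw times does not have this problem, because it stays proportional to total execution time.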

Effect of Compilation
  Application- and architecture-specific optimization can dramatically impact performance

Price-Performance Metric
  [Figure: SPECbase CINT2000 and SPEC CINT2000 per $1000 in price]
  Prices reflect those of July 2001
  Different results are obtained for other benchmarks, e.g. SPEC CFP2000
  With the exception of the Sunblade, price-performance metrics were consistent with performance

Simulation
  Model effects of hardware
  Limitations
    Not real hardware
    Only as accurate as the model
    Runs 10-100x slower
  Bonus
    Can compare options and test new ones
    Overall picture likely pretty close
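
To show what "compare options" can look like in practice, here is a toy Python simulation of a direct-mapped cache that counts misses for an address trace at two cache sizes. The cache organization, the 64-byte line size, and the trace are all assumptions chosen for illustration; a real simulator models far more detail.

```python
def count_misses(addresses, num_lines, line_bytes=64):
    """Simulate a direct-mapped cache and return the number of misses."""
    tags = [None] * num_lines  # one stored tag per cache line
    misses = 0
    for address in addresses:
        block = address // line_bytes  # memory block holding this address
        index = block % num_lines      # cache line the block maps to
        tag = block // num_lines       # identifies which block is resident
        if tags[index] != tag:
            misses += 1
            tags[index] = tag
    return misses


# Hypothetical trace: sweep a 16 KiB array of 4-byte words four times.
trace = [offset for _ in range(4) for offset in range(0, 16 * 1024, 4)]

misses_8k = count_misses(trace, num_lines=128)   # 128 x 64 B = 8 KiB cache
misses_16k = count_misses(trace, num_lines=256)  # 256 x 64 B = 16 KiB cache

print(f"accesses: {len(trace)}")
print(f"8 KiB cache misses:  {misses_8k}")
print(f"16 KiB cache misses: {misses_16k}")
```

In this model the 8 KiB cache misses on every block in every sweep, because the array is twice the cache size, while the 16 KiB cache misses only during the first sweep. The result is only as trustworthy as the model, which is the limitation the slide points out.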

Valgrind
  Open-source profiler (Linux and macOS; Windows is not officially supported)
  Runs unmodified x86 programs
  JIT compiles x86 to intermediate code
  Tools add tracking code
  Compiled back to x86 to run

Valgrind tools
  Tools for cache, branches, memory, …
  Cache
    2 levels of cache: L1 & last level (e.g. L3)
    Compare cache sizes, strategies
  Branching
    Conditional & indirect
  Cycle counts

Instrumented Profiling
  Modify program when compiling
    gprof compiler flags
  Manual modifications
    Add timers to code
    Add simulation to class members
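
The "add timers to code" approach can be sketched in Python with a decorator that records call counts and accumulated wall-clock time per function. The names `timed`, `call_counts`, `elapsed_seconds`, and `sum_of_squares` are hypothetical and exist only for this example.

```python
import functools
import time
from collections import defaultdict

call_counts = defaultdict(int)        # function name -> number of calls
elapsed_seconds = defaultdict(float)  # function name -> total time spent


def timed(func):
    """Wrap func so that every call is counted and timed."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Record the time even if func raises an exception.
            elapsed_seconds[func.__name__] += time.perf_counter() - start
            call_counts[func.__name__] += 1
    return wrapper


@timed
def sum_of_squares(n):
    return sum(i * i for i in range(n))


for _ in range(3):
    sum_of_squares(10_000)

for name, count in call_counts.items():
    print(f"{name}: {count} calls, {elapsed_seconds[name]:.6f} s total")
```

The timer calls themselves add overhead to every invocation, which is the usual cost of instrumentation: the measured program is no longer quite the program you wanted to measure.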

Statistical Profiling
  Periodically interrupt program
    See where it is and what’s happening
  Hardware counters help
    Get real data for cache, branch, CPI, …
  Need to run longer to get valid data
  Can start & stop mid-run
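
The sampling idea can be sketched in Python with a background thread that periodically looks at where the main thread is executing. This is a simplified stand-in: real statistical profilers use timer interrupts and hardware counters, while this sketch relies on the CPython-specific `sys._current_frames()` and records only the name of the innermost function. The names `sample_stacks`, `profile`, and `busy_loop` are hypothetical.

```python
import collections
import sys
import threading
import time


def sample_stacks(thread_id, counts, stop_event, interval):
    """Periodically record which function the target thread is running."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)


def profile(func, *args, interval=0.001):
    """Run func(*args) while sampling the calling thread."""
    counts = collections.Counter()
    stop_event = threading.Event()
    sampler = threading.Thread(
        target=sample_stacks,
        args=(threading.get_ident(), counts, stop_event, interval),
        daemon=True,
    )
    sampler.start()
    try:
        result = func(*args)
    finally:
        stop_event.set()
        sampler.join()
    return result, counts


def busy_loop(seconds):
    """Spin for roughly the given number of seconds."""
    deadline = time.perf_counter() + seconds
    iterations = 0
    while time.perf_counter() < deadline:
        iterations += 1
    return iterations


result, counts = profile(busy_loop, 0.3)
for name, hits in counts.most_common():
    print(f"{name}: {hits} samples")
```

The sample counts vary from run to run, and a short run yields only a handful of samples, which is why the program has to run long enough before the distribution can be trusted.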

Statistical Profilers
  VTune (Intel)
    Windows only
    $$$, but educational trials
  CodeAnalyst / CodeXL (AMD)
    Windows & Linux
  Xcode Instruments (Apple)
    Mac only
  gprof (anything using gcc)
    Statistical and instrumented modes