CS203 – Advanced Computer Architecture Performance Evaluation.

Presentation transcript:

CS203 – Advanced Computer Architecture Performance Evaluation

PERFORMANCE TRENDS

Clock Rate
- Historically, the clock rates of microprocessors have increased exponentially.
- Figure: highest clock rate of Intel processors in each year from 1990 to 2008 (regions labeled "Process Improvement" and "Arch. & Circuit Improvement").
- The increase was due to:
  - Process improvements
  - Deeper pipelines
  - Circuit design techniques
- If the trend had kept up, today's clock rate would be > 30 GHz!

Single Processor Performance
- Figure annotations: RISC; Move to multi-processor (Power Wall); Multicore processor; Instruction-Level Parallelism (ILP)

Classes of Computers
- Personal Mobile Device (PMD)
  - e.g. smart phones, tablet computers
  - Emphasis on energy efficiency and real-time performance
- Desktop Computing
  - Emphasis on price-performance
- Servers
  - Emphasis on availability, scalability, throughput
- Clusters / Warehouse-Scale Computers
  - Used for "Software as a Service (SaaS)"
  - Emphasis on availability and price-performance
  - Sub-class: supercomputers, with emphasis on floating-point performance and fast internal networks
- Embedded Computers
  - Emphasis: price

Current Trends in Architecture
- Cannot continue to leverage Instruction-Level Parallelism (ILP)
  - The era of rapid single-processor performance improvement ended around 2003 due to the power wall
- New models for performance:
  - Data-Level Parallelism (DLP)
  - Thread-Level Parallelism (TLP)
  - Request-Level Parallelism (RLP)
- These require explicit restructuring of the application

Parallelism
- Classes of parallelism in applications:
  - Data-Level Parallelism
  - Task-Level Parallelism
- Classes of architectural parallelism:
  - Instruction-Level Parallelism (ILP)
  - Vector architectures / Graphics Processing Units (GPUs)
  - Thread-Level Parallelism (multiprocessors)
  - Request-Level Parallelism (server clusters)

Flynn’s Taxonomy
- Single instruction stream, single data stream (SISD)
- Single instruction stream, multiple data streams (SIMD)
  - Vector architectures
  - Multimedia extensions
  - Graphics processor units
- Multiple instruction streams, single data stream (MISD)
  - No commercial implementation
- Multiple instruction streams, multiple data streams (MIMD)
  - Tightly-coupled MIMD
  - Loosely-coupled MIMD

MEASURING PERFORMANCE

Measuring Performance
- Typical performance metrics:
  - Response time
  - Throughput
- Speedup of X relative to Y
  - Execution time of Y / Execution time of X
- Execution time
  - Wall clock time: includes all system overheads
  - CPU time: only computation time
- Benchmarks
  - Kernels (e.g. matrix multiply)
  - Toy programs (e.g. sorting)
  - Synthetic benchmarks (e.g. Dhrystone)
  - Benchmark suites (e.g. SPEC06fp, TPC-C, SPLASH)
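The speedup definition above can be sketched in a few lines of Python. The function name and the timings are hypothetical and chosen only for illustration.

```python
def speedup(time_x: float, time_y: float) -> float:
    """Speedup of X relative to Y = execution time of Y / execution time of X."""
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# Hypothetical timings in seconds (illustrative, not from the slides).
print(speedup(time_x=10.0, time_y=15.0))  # 1.5: X is 1.5 times as fast as Y
```

A value above 1 means X is faster than Y; a value below 1 means X is slower.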

Fundamental Equations of Performance
- We typically use IPC (instructions per cycle)
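As a minimal sketch, assuming the standard CPU performance equation from Hennessy and Patterson (CPU time = instruction count × CPI × clock cycle time, with IPC = 1 / CPI), the relationship can be written as follows. The function names and the sample numbers are illustrative assumptions.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = instruction count * CPI / clock rate.

    Dividing by the clock rate is the same as multiplying by the cycle time.
    """
    return instruction_count * cpi / clock_rate_hz


def ipc(cpi: float) -> float:
    """Instructions per cycle is the reciprocal of cycles per instruction."""
    return 1.0 / cpi


# Hypothetical workload: 2 billion instructions, CPI of 1.5, 3 GHz clock.
print(cpu_time(2e9, 1.5, 3e9))  # execution time in seconds
print(ipc(1.5))                 # instructions completed per cycle
```

Halving CPI (doubling IPC) at a fixed instruction count and clock rate halves CPU time, which is why IPC is a convenient figure of merit.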

More Equations
- Different instruction types have different CPIs
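When instruction classes have different CPIs, total cycles are usually computed as the sum over classes of (instruction count × class CPI), and the average CPI is that sum divided by the total instruction count. The sketch below assumes this standard formulation; the instruction mix is hypothetical.

```python
def average_cpi(mix):
    """Average CPI for a list of (instruction_count, cpi) pairs, one per class."""
    total_instructions = sum(count for count, _ in mix)
    total_cycles = sum(count * cpi for count, cpi in mix)
    return total_cycles / total_instructions


# Hypothetical mix: (instruction count, CPI) for ALU, load/store, and branch classes.
mix = [(50, 1), (30, 2), (20, 3)]
print(average_cpi(mix))  # (50*1 + 30*2 + 20*3) / 100 cycles per instruction
```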

BENCHMARKS

What to Use as a Benchmark?
- Real programs: porting problem; too complex to understand
- Kernels
  - Computationally intense piece of a real program
- Benchmark suites
  - SPEC: Standard Performance Evaluation Corporation
    - Scientific/engineering/general purpose
    - Integer and floating point
    - New set every so many years (95, 98, 2000, 2006)
  - TPC benchmarks:
    - For commercial systems
    - TPC-B, TPC-C, TPC-H, and TPC-W
  - Embedded benchmarks
  - Media benchmarks
- Others, poor choices
  - Toy benchmarks (e.g. quicksort, matrix multiply)
  - Synthetic benchmarks (not real)

Aggregate Performance Measures
- For a group of programs (a test suite)
- Execution time: weighted arithmetic mean
  - The program with the longest execution time dominates
- Speedup: geometric mean
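The two aggregate measures can be sketched as below. The function names, the equal weights, and the sample values are assumptions made for illustration only.

```python
import math


def weighted_arithmetic_mean(times, weights):
    """Weighted arithmetic mean of execution times."""
    if len(times) != len(weights):
        raise ValueError("times and weights must have the same length")
    return sum(w * t for w, t in zip(weights, times)) / sum(weights)


def geometric_mean(speedups):
    """Geometric mean: n-th root of the product of n speedups."""
    return math.prod(speedups) ** (1.0 / len(speedups))


# Hypothetical suite: one short program and one long program, equal weights.
times = [1.0, 99.0]
print(weighted_arithmetic_mean(times, [1.0, 1.0]))  # dominated by the 99 s program

# Hypothetical per-program speedups over some reference machine.
print(geometric_mean([4.0, 9.0]))
```

With equal weights, the long-running program contributes almost the entire mean, which is the "longest execution time dominates" effect noted on the slide.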

Geometric Mean Example
Comparison independent of reference machine

               Program A    Program B    Arithmetic Mean
Machine 1      10 sec       100 sec      55 sec
Machine 2      1 sec        200 sec      100.5 sec
Reference 1    100 sec      10000 sec    5050 sec
Reference 2    100 sec      1000 sec     550 sec

Speedup table (columns: Program A, Program B, Arithmetic, Geometric; rows: Machine 1 and Machine 2 with respect to Reference 1, then Machine 1 and Machine 2 with respect to Reference 2)
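The speedups in this example follow directly from the execution times listed for the two machines and two references. The sketch below recomputes them and compares Machine 2 against Machine 1 under each reference; the helper names are hypothetical.

```python
import math

# Execution times in seconds (Program A, Program B), from the example's table.
times = {"Machine 1": (10, 100), "Machine 2": (1, 200)}
references = {"Reference 1": (100, 10000), "Reference 2": (100, 1000)}


def speedups(reference, machine):
    """Per-program speedup = reference time / machine time."""
    return [r / m for r, m in zip(reference, machine)]


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return math.prod(values) ** (1.0 / len(values))


def machine2_over_machine1(mean, reference):
    """Ratio of Machine 2's mean speedup to Machine 1's, for a given reference."""
    m1 = mean(speedups(reference, times["Machine 1"]))
    m2 = mean(speedups(reference, times["Machine 2"]))
    return m2 / m1


for name, ref in references.items():
    print(name,
          "arithmetic ratio:", machine2_over_machine1(arithmetic_mean, ref),
          "geometric ratio:", machine2_over_machine1(geometric_mean, ref))
```

The geometric ratio comes out the same for both references, while the arithmetic ratio changes with the reference chosen. That is the sense in which the geometric mean gives a comparison independent of the reference machine.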

PRINCIPLES OF COMPUTER DESIGN

Principles of Computer Design
- Take Advantage of Parallelism
  - e.g. multiple processors, disks, memory banks, pipelining, multiple functional units
- Principle of Locality
  - Reuse of data and instructions
- Focus on the Common Case
  - Amdahl’s Law

Amdahl’s Law
- Speedup is due to an enhancement (E)
- Let F be the fraction of execution time to which the enhancement is applied
  - F is also called the parallel fraction, and (1-F) the serial fraction
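In its usual statement, Amdahl's Law relates the overall speedup to F and to the speedup of the enhanced portion. The symbol S_E for that speedup is an assumption introduced here for illustration.

```latex
\[
\text{Speedup}_{\text{overall}}
  = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}}
  = \frac{1}{(1 - F) + \dfrac{F}{S_E}},
\qquad
\lim_{S_E \to \infty} \text{Speedup}_{\text{overall}} = \frac{1}{1 - F}
\]
```

The limit shows that the serial fraction (1-F) bounds the achievable speedup no matter how large S_E becomes.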

Amdahl’s Law
- Law of diminishing returns
  - Program with execution time T
  - Fraction f of the program can be sped up by a factor s
  - The new (enhanced) execution time is T_e
- Optimize the common case!
  - Execute the rare case in software (e.g. exceptions)
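A minimal sketch of the diminishing-returns behavior, assuming the standard form T_e = T × ((1 - f) + f / s). The function names and the sample values of T, f, and s are hypothetical.

```python
def enhanced_time(t: float, f: float, s: float) -> float:
    """New execution time T_e when a fraction f of T is sped up by a factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return t * ((1.0 - f) + f / s)


def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup T / T_e; independent of T itself."""
    return 1.0 / ((1.0 - f) + f / s)


# Hypothetical program: T = 100 s, of which 90% can be enhanced.
for s in (2, 10, 100, 1000):
    print(s, enhanced_time(100.0, 0.9, s), amdahl_speedup(0.9, s))
```

Each tenfold increase in s buys less than the previous one, and the speedup never exceeds 1 / (1 - f), which is 10 for f = 0.9.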

Gustafson’s Law
- As more cores are integrated, the workloads are also growing!
- Let s be the serial time of a program and p the time that can be done in parallel
- Let f = p/(s+p)
- Figure labels: s, p, C cores
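A minimal sketch of the scaled speedup, assuming the usual reading of Gustafson's Law in which s and p are measured on the C-core machine. Under that assumption, one core would need s + C × p for the same workload, so the speedup is (s + C × p) / (s + p) = (1 - f) + f × C. The function name and sample values are hypothetical.

```python
def gustafson_speedup(s: float, p: float, cores: int) -> float:
    """Scaled speedup (s + cores * p) / (s + p), with s and p timed on the parallel machine."""
    if s < 0 or p < 0 or s + p == 0:
        raise ValueError("s and p must be non-negative and not both zero")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    f = p / (s + p)  # parallel fraction, as defined on the slide
    return (1.0 - f) + f * cores


# Hypothetical run: 1 s serial, 9 s parallel (f = 0.9), on 16 cores.
print(gustafson_speedup(s=1.0, p=9.0, cores=16))
```

Unlike the fixed-workload bound of Amdahl's Law, this speedup grows linearly with the core count because the parallel part of the workload is assumed to grow with it.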