Performance David Monismith Jan. 16, 2015 Based on notes from Dr. Bill Siever and from the Patterson and Hennessy Text.

Slides:



Advertisements
Similar presentations
CMSC 611: Advanced Computer Architecture Performance Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Advertisements

The Central Processing Unit (CPU) Understanding Computers.
CS1104: Computer Organisation School of Computing National University of Singapore.
Performance What differences do we see in performance? Almost all computers operate correctly (within reason) Most computers implement useful operations.
Lecture: Pipelining Basics
Performance Evaluation of Architectures Vittorio Zaccaria.
TU/e Processor Design 5Z032 1 Processor Design 5Z032 The role of Performance Henk Corporaal Eindhoven University of Technology 2009.
100 Performance ENGR 3410 – Computer Architecture Mark L. Chang Fall 2006.
CSCE 212 Chapter 4: Assessing and Understanding Performance Instructor: Jason D. Bakos.
Performance D. A. Patterson and J. L. Hennessey, Computer Organization & Design: The Hardware Software Interface, Morgan Kauffman, second edition 1998.
Copyright © 1998 Wanda Kunkle Computer Organization 1 Chapter 2.1 Introduction.
Fall 2001CS 4471 Chapter 2: Performance CS 447 Jason Bakos.
Lecture: Pipelining Basics
Lecture 3: Computer Performance
CS430 – Computer Architecture Lecture - Introduction to Performance
1 Chapter 4. 2 Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation.
1 Measuring Performance Chris Clack B261 Systems Architecture.
Lecture 24: CPU Design Today’s topic –Multi-Cycle ALU –Introduction to Pipelining 1.
CMSC 611: Advanced Computer Architecture Performance Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
1/18/02CSE Performance I Measuring Performance Part I.
RISC and CISC. Dec. 2008/Dec. and RISC versus CISC The world of microprocessors and CPUs can be divided into two parts:
1 CHAPTER 2 THE ROLE OF PERFORMANCE. 2 Performance Measure, Report, and Summarize Make intelligent choices Why is some hardware better than others for.
B0111 Performance Anxiety ENGR xD52 Eric VanWyk Fall 2012.
10/19/2015Erkay Savas1 Performance Computer Architecture – CS401 Erkay Savas Sabanci University.
1 CS/EE 362 Hardware Fundamentals Lecture 9 (Chapter 2: Hennessy and Patterson) Winter Quarter 1998 Chris Myers.
1 Acknowledgements Class notes based upon Patterson & Hennessy: Book & Lecture Notes Patterson’s 1997 course notes (U.C. Berkeley CS 152, 1997) Tom Fountain.
Performance.
Stored Programs In today’s lesson, we will look at: what we mean by a stored program computer how computers store and run programs what we mean by the.
1  1998 Morgan Kaufmann Publishers How to measure, report, and summarize performance (suorituskyky, tehokkuus)? What factors determine the performance.
Performance Performance
Computer Organization Instruction Set Architecture (ISA) Instruction Set Architecture (ISA), or simply Architecture, of a computer is the.
Computer Organization CS345 David Monismith Based upon notes by Dr. Bill Siever and notes from the Patterson and Hennessy Text.
Lecture 5: 9/10/2002CS170 Fall CS170 Computer Organization and Architecture I Ayman Abdel-Hamid Department of Computer Science Old Dominion University.
Computer Organization CS345 David Monismith Based upon notes by Dr. Bill Siever and notes from the Patterson and Hennessy Text.
Chapter 4. Measure, Report, and Summarize Make intelligent choices See through the marketing hype Understanding underlying organizational aspects Why.
Computer Engineering Rabie A. Ramadan Lecture 2. Table of Contents 2 Architecture Development and Styles Performance Measures Amdahl’s Law.
Introduction Computer Organization Spring 1436/37H (2015/16G) Dr. Mohammed Sinky Computer Architecture
RISC / CISC Architecture by Derek Ng. Overview CISC Architecture RISC Architecture  Pipelining RISC vs CISC.
The Processor & its components. The CPU The brain. Performs all major calculations. Controls and manages the operations of other components of the computer.
Computer Organization CS345 David Monismith Based upon notes by Dr. Bill Siever and notes from the Patterson and Hennessy Text.
Performance Computer Organization II 1 Computer Science Dept Va Tech January 2009 © McQuain & Ribbens Defining Performance Which airplane has.
COMPUTER ARCHITECTURE & OPERATIONS I Instructor: Yaohang Li.
B0111 Performance Anxiety ENGR xD52 Eric VanWyk Fall 2012.
History of Computers and Performance David Monismith Jan. 14, 2015 Based on notes from Dr. Bill Siever and from the Patterson and Hennessy Text.
Lecture 3. Performance Prof. Taeweon Suh Computer Science & Engineering Korea University COSE222, COMP212, CYDF210 Computer Architecture.
BITS Pilani, Pilani Campus Today’s Agenda Role of Performance.
Computer Organization CS345 David Monismith Based upon notes by Dr. Bill Siever and from the Patterson and Hennessy Text.
Performance. Moore's Law Moore's Law Related Curves.
Computer Organization Exam Review CS345 David Monismith.
Two notions of performance
Compilers can have a profound impact on the performance of an application on given a processor. This problem will explore the impact compilers have on.
The Central Processing Unit (CPU)
Computer Architecture & Operations I
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
How do we evaluate computer architectures?
Defining Performance Which airplane has the best performance?
Computer Architecture & Operations I
Lecture: Pipelining Basics
CSCE 212 Chapter 4: Assessing and Understanding Performance
CS2100 Computer Organisation
CMSC 611: Advanced Computer Architecture
Central Processing Unit
Morgan Kaufmann Publishers Computer Performance
Performance Cycle time of a computer CPU speed speed = 1 / cycle time
CMSC 611: Advanced Computer Architecture
Parameters that affect it How to improve it and by how much
Performance.
Lecture: Pipelining Basics
Chapter 2: Performance CS 447 Jason Bakos Fall 2001 CS 447.
CS2100 Computer Organisation
Presentation transcript:

Performance David Monismith Jan. 16, 2015 Based on notes from Dr. Bill Siever and from the Patterson and Hennessy Text

Outline Last time – Finished history and a beginning on performance evaluation. This time – Continued Performance evaluation

Recall Performance = how good is it? But need to know what to measure and how to measure it. Discussed Speed, Operations per second, Efficiency (e.g. Power) Other measures include bandwidth, latency, and cost – Bandwidth - How much data can be transmitted over a line per unit time – Latency (Memory, Disk) - How long does it take to retrieve data (e.g. disk access time) – Resource Cost - $/GB, $/unit, $/GHz

Recall Performance = 1/execution time Speedup = performance_of_computer_1/performance_o f_computer_2

Dividing Up Execution Time Execution time can be divided into two parts The whole is called user time or wall time – This is the time the user spends waiting for the program to finish. – The time the program actually runs is CPU time – The time used for the system services (OS, IO, waits, etc.) is called System CPU time Try running top on a Linux or Mac machine at the terminal to see how much CPU time is used in running programs For this class we will only be concerned with user time as we will assume System CPU time is small enough to be negligible

Clock Time Hz - Hertz = cycles per second MHz Hz, GHz - 1 billion Hz A Clock signal can be thought of as a square wave (an on/off pulse) – The CPU clock governs how quickly data passes through CPU – Data is part of assembly instructions – Instruction delegates the data path

Clock Time The Clock signal delays data from moving from component to component in CPU (i.e. register to ALU/CU, ALU to CU, data to memory, data from memory) Recall from math, waveforms have a period and frequency – frequency = clock_speed – period = 1/frequency (i.e. seconds per cycle) For a 1MHz processor frequency, period is 1 microsecond (1e-6) For a 1GHz processor frequency, period is 1 nanosecond (1e-9) What is the period for a 250MHz processor (4 ns)? 500MHz (2 ns)?

Clock Waveforms | | High in the waveform above represents on and low represents off. Note that the wave form above could use zero/off to represent time in doing work in a CPU component and one/on to represent transition time. Transition time and computation time are not necessarily equivalent.

Instructions Instructions can be very different from machine to machine – RISC vs. CISC machines Some processors use same # of cycles per instr Others have differing instruction counts Simple computation for time CPU performs processing CPU Time = total CPU cycles * cycle time

Analytical performance calculations may answer one of a few questions: How long will a program run? What happens if we improve performance/want a certain performance? (e.g. what if we want a program to run in 1.0s) How can we make two machines that require a different number of cycles per instruction (CPI) to run a program in the same amount of time?

Data for such problems may include: A list of instructions and cycles required by each A clock speed or period Average CPI (Clocks/Instruction) for a specific program Sample code with a CPI table Answer might require run time, instruction count, etc.

Run Time What is the run time of a program requiring 1,250,000 cycles on a 50MHz machine? 1 cycle = 1/50,000,000 = 20ns 20ns*1,250,000 = 0.025s

CPI Equation Execution time = # of instructions * cycles/instruction * time/cycle This is tool for understanding tradeoffs in the design of instruction sets, processor pipelines, and memory system organizations.

Cycles Per Instruction Example Given a program with an average CPI of 2.5 for 20,000 instructions that runs on a processor with a 10ns clock, find the clock frequency and program runtime.

Solution Frequency = 1/1e-8s --> 1e8 Hz --> 100 MHz Total cycles = 2.5 cycles/instr * 20,000 instr = 50,000 cycles Run time = 50,000 cycles * 10ns/cycle = 500,000ns = 0.5ms

Finding the Runtime Example Given a program with 1,000,000 instructions costing 6 cycles each and 2,000,000 instructions costing 3 clocks cycles each, find the frequency needed to complete the program in 1s and 0.5s.

Solution 6*1,000,000+3*2,000,000 = 12,000,000 cycles To complete in 1s, divide by 1s to get 12,000,000 cycles/s = 12 MHz To complete in 0.5s, divide by 0.5s to get 24,000,000 cycles/s = 24 MHz