Read Section 1.4, Section 1.7 (pp )

Slides:



Advertisements
Similar presentations
CMSC 611: Advanced Computer Architecture Performance Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Advertisements

ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
Assessing and Understanding Performance Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly made available.
810:142 Lecture 2: Performance Fall 2006 Chapter 4: Performance Adapted from Mary Jane Irwin at Penn State University for Computer Organization and Design,
ECE-C355 Computer Structures Winter 2008 Chapter 04: Understanding Performance Slides are adapted from Professor Mary Jane Irwin (
CSE431 L04 Understanding Performance.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 04: Understanding Performance Mary Jane Irwin (
TU/e Processor Design 5Z032 1 Processor Design 5Z032 The role of Performance Henk Corporaal Eindhoven University of Technology 2009.
Evaluating Performance
Computer Organization and Architecture 18 th March, 2008.
Chapter 1 CSF 2009 Computer Performance. Defining Performance Which airplane has the best performance? Chapter 1 — Computer Abstractions and Technology.
CSCE 212 Chapter 4: Assessing and Understanding Performance Instructor: Jason D. Bakos.
Chapter 4 Assessing and Understanding Performance Bo Cheng.
Computer ArchitectureFall 2007 © September 17, 2007 Karem Sakallah CS-447– Computer Architecture.
Copyright © 1998 Wanda Kunkle Computer Organization 1 Chapter 2.1 Introduction.
EET 4250: Chapter 1 Performance Measurement, Instruction Count & CPI Acknowledgements: Some slides and lecture notes for this course adapted from Prof.
Chapter 4 Assessing and Understanding Performance
1 Chapter 4. 2 Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation.
Introduction to Computer Architecture SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING SUMMER 2015 RAMYAR SAEEDI.
CMSC 611: Advanced Computer Architecture Performance Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
CMSC 611: Advanced Computer Architecture Benchmarking Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Chapter 1 Section 1.4 Dr. Iyad F. Jafar Evaluating Performance.
1 Computer Performance: Metrics, Measurement, & Evaluation.
CSE 340 Computer Architecture Summer 2014 Understanding Performance
Lecture 2: Computer Performance
CS3350B Computer Architecture Winter 2015 Performance Metrics I Marc Moreno Maza
EET 4250: Chapter 1 Computer Abstractions and Technology Acknowledgements: Some slides and lecture notes for this course adapted from Prof. Mary Jane Irwin.
Sogang University Advanced Computing System Chap 1. Computer Architecture Hyuk-Jun Lee, PhD Dept. of Computer Science and Engineering Sogang University.
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.
10/19/2015Erkay Savas1 Performance Computer Architecture – CS401 Erkay Savas Sabanci University.
1 CS/EE 362 Hardware Fundamentals Lecture 9 (Chapter 2: Hennessy and Patterson) Winter Quarter 1998 Chris Myers.
Performance.
1 CS/COE0447 Computer Organization & Assembly Language CHAPTER 4 Assessing and Understanding Performance.
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
CEN 316 Computer Organization and Design Assessing and Understanding Performance Mansour AL Zuair.
Morgan Kaufmann Publishers
1  1998 Morgan Kaufmann Publishers How to measure, report, and summarize performance (suorituskyky, tehokkuus)? What factors determine the performance.
Performance Performance
CS35101 – Computer Architecture Week 9: Understanding Performance Paul Durand ( ) [Adapted from M Irwin (
TEST 1 – Tuesday March 3 Lectures 1 - 8, Ch 1,2 HW Due Feb 24 –1.4.1 p.60 –1.4.4 p.60 –1.4.6 p.60 –1.5.2 p –1.5.4 p.61 –1.5.5 p.61.
September 10 Performance Read 3.1 through 3.4 for Wednesday Only 3 classes before 1 st Exam!
4. Performance 4.1 Introduction 4.2 CPU Performance and Its Factors
DR. SIMING LIU SPRING 2016 COMPUTER SCIENCE AND ENGINEERING UNIVERSITY OF NEVADA, RENO Session 2 Computer Organization.
Lec2.1 Computer Architecture Chapter 2 The Role of Performance.
EGRE 426 Computer Organization and Design Chapter 4.
CMSC 611: Advanced Computer Architecture Performance & Benchmarks Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some.
Performance Computer Organization II 1 Computer Science Dept Va Tech January 2009 © McQuain & Ribbens Defining Performance Which airplane has.
Computer Architecture CSE 3322 Web Site crystal.uta.edu/~jpatters/cse3322 Send to Pramod Kumar, with the names and s.
Chapter 1 Performance & Technology Trends. Outline What is computer architecture? Performance What is performance: latency (response time), throughput.
CSE 340 Computer Architecture Summer 2016 Understanding Performance.
Lecture 3. Performance Prof. Taeweon Suh Computer Science & Engineering Korea University COSE222, COMP212, CYDF210 Computer Architecture.
Measuring Performance and Benchmarks Instructor: Dr. Mike Turi Department of Computer Science and Computer Engineering Pacific Lutheran University Lecture.
Computer Architecture & Operations I
Measuring Performance II and Logic Design
Computer Organization
Lecture 2: Performance Evaluation
Computer Architecture & Operations I
CS161 – Design and Architecture of Computer Systems
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
Assessing and Understanding Performance
Defining Performance Which airplane has the best performance?
Morgan Kaufmann Publishers
CSCE 212 Chapter 4: Assessing and Understanding Performance
Chapter 1 Computer Abstractions & Technology Performance Evaluation
Computer Architecture
CMSC 611: Advanced Computer Architecture
CMSC 611: Advanced Computer Architecture
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
Performance.
CS161 – Design and Architecture of Computer Systems
Computer Organization and Design Chapter 4
Presentation transcript:

Read Section 1.4, Section 1.7 (pp. 48-49) Performance Read Section 1.4, Section 1.7 (pp. 48-49) Adapted from Slides by Prof. Mary Jane Irwin, Penn State University And Slides Supplied by the Publisher

Outline What is computer architecture? Classes of Computers Performance Performance Metrics The performance equation Measuring performance Improving performance: parallelism, locality, Amdahl's law

What is Computer Architecture? Old definition of computer architecture = Instruction set architecture, ISA

Instruction set architecture, ISA Organization of Programmable Storage Data Types & Data Structures: Encodings & Representations Instruction Formats Instruction Set Modes of Addressing and Accessing Data Items and Instructions How are “Exceptional Conditions” Handled

ISA vs. Computer Architecture Architect’s job much more than instruction set design technical hurdles today more challenging than those in instruction set design

Definition of Computer Architecture The science and art of selecting and interconnecting hardware components to create computers that meet functional, performance and cost goals. Computer architecture >> ISA

Classes of Computers Desktop computers Servers Supercomputers Designed to deliver good performance to a single user at low cost usually executing 3rd party software, usually incorporating a graphics display, a keyboard, and a mouse Servers Used to run larger programs for multiple, simultaneous users typically accessed only via a network and that places a greater emphasis on dependability and security Supercomputers A high performance, high cost class of servers with hundreds to thousands of processors, terabytes (240 ≈1012 ) of memory and petabytes (250 ≈1015 ) of storage that are used for high-end scientific and engineering applications Embedded computers (processors) A computer inside another device used for running one predetermined application

Review: Some Basic Definitions Kilobyte – 210 or 1,024 bytes Megabyte– 220 or 1,048,576 bytes sometimes “rounded” to 106 or 1,000,000 bytes Gigabyte – 230 or 1,073,741,824 bytes sometimes rounded to 109 or 1,000,000,000 bytes Terabyte – 240 or 1,099,511,627,776 bytes sometimes rounded to 1012 or 1,000,000,000,000 bytes Petabyte – 250 or 1024 terabytes sometimes rounded to 1015 or 1,000,000,000,000,000 bytes Exabyte – 260 or 1024 petabytes Sometimes rounded to 1018 or 1,000,000,000,000,000,000 bytes

Outline What is computer architecture? Performance Performance metrics The performance equation Measuring performance Improving performance: parallelism, locality, Amdahl's law

Performance Metrics Purchasing perspective Design perspective given a collection of computers, which has the best performance ? least cost ? best cost/performance? Design perspective faced with design options, which has the best performance improvement ? Both require basis for comparison metric for evaluation Or smallest/lightest Longest battery life Most reliable/durable (in space)

Response Time versus Throughput Response time (execution time): the time between the start and the completion of a task Important to individual users Throughput (bandwidth) – the total amount of work done in a given time Important to computer center managers Will need different performance metrics as well as a different set of applications to benchmark embedded and desktop computers, which are more focused on response time, versus servers, which are more focused on throughput

Defining (Speed) Performance To maximize performance, we need to minimize execution time Consider a computer X performanceX = 1 / execution_timeX If computer X is n times faster than computer Y, then performanceX execution_timeY -------------------- = --------------------- = n performanceY execution_timeX Increasing performance requires decreasing execution time Decreasing response time almost always improves throughput

Relative Performance Example If computer A runs a program in 10 seconds and computer B runs the same program in 15 seconds, how much faster is A than B? We know that A is n times faster than B if performanceA execution_timeB -------------------- = --------------------- = n performanceB execution_timeA The performance ratio is 15 ------ = 1.5 10 So A is 1.5 times faster than B

Performance Factors CPU execution time of a program (CPU time): time the CPU spends running the program Does not include time waiting for Input or output (I/O) operations or running other programs or Can improve performance by reducing either the length of the clock cycle or the number of clock cycles required for a program or both. or the number of clock cycles required for a program or both.

Review: Machine Clock Rate Clock rate (clock cycles per second in MHz or GHz) is inverse of clock cycle time (clock period) CC = 1 / CR one clock period 10 nsec clock cycle => 100 MHz clock rate 5 nsec clock cycle => 200 MHz clock rate 2 nsec clock cycle => 500 MHz clock rate 1 nsec (10-9) clock cycle => 1 GHz (109) clock rate 500 psec clock cycle => 2 GHz clock rate 250 psec clock cycle => 4 GHz clock rate 200 psec clock cycle => 5 GHz clock rate A clock cycle is the basic unit of time to execute one operation/pipeline stage/etc.

Improving Performance Example A program runs on computer A with a 2 GHz clock in 10 seconds. What clock rate must computer B have to run this program in 6 seconds? Note: Computer B will require 1.2 times as many clock cycles as computer A to run the program. For lecture

Improving Performance Example A program runs on computer A with a 2 GHz clock in 10 seconds. What clock rate must computer B have to run this program in 6 seconds? Note: Computer B will require 1.2 times as many clock cycles as computer A to run the program. CPU timeA CPU clock cyclesA clock rateA = ------------------------------- For lecture CPU clock cyclesA = 10 sec x 2 x 109 cycles/sec = 20 x 109 cycles CPU timeB 1.2 x 20 x 109 cycles clock rateB = ------------------------------- clock rateB 1.2 x 20 x 109 cycles 6 seconds = ------------------------------- = 4 GHz

Clock Cycles Per Instruction, CPI Not all instructions take the same amount of time to execute A program execution time is equals the number of instructions executed multiplied by the average time per instruction Clock cycles per instruction (CPI) – the average number of clock cycles each instruction takes to execute CPI for this instruction class A B C CPI 1 2 3

Using the Performance Equation Computers A and B implement the same ISA. Computer A has a clock cycle time of 250 ps and an effective CPI of 2.0 for some program Computer B has a clock cycle time of 500 ps and an effective CPI of 1.2 for the same program. Which computer is faster and by how much? For class handout

Using the Performance Equation Computer A clock cycle time = 250 ps ; CPIA = 2.0 Computer B clock cycle time = 500 ps ; CPIB = 1.2 Which computer is faster and by how much? Both A and B execute the same number of instructions, I, so CPU timeA = I x 2.0 x 250 ps = 500 x I ps CPU timeB = I x 1.2 x 500 ps = 600 x I ps For lecture Clearly, A is faster … by the ratio of execution times Clearly, A is faster … by the ratio of execution times performanceA execution_timeB 600 x I ps ------------------- = --------------------- = ---------------- = 1.2 performanceB execution_timeA 500 x I ps

Morgan Kaufmann Publishers April 14, 2017 CPI in More Detail If n different instruction types (classes) are executed by a program Each instruction of type i takes a different number of cycles to execute (CPIi) Hence, the average # of clocks per instruction, CPI Relative frequency of instruction type i Dr. W. Abu-Sufah 21 Chapter 1 — Computer Abstractions and Technology

Effective (Average) CPI Hence, computing the overall effective CPI is done by looking at the different types (classes) of instructions executed by the program and their individual cycle counts and then averaging Let n be the number of different instruction classes executed by a program Freqi is the relative frequency {percentage} of class i instructions executed CPIi is the average number of clock cycles per instruction for instruction class i Then we have: Overall effective CPI =  (CPIi x Freqi) i = 1 n The overall effective CPI varies by instruction mix – a measure of the dynamic frequency of instructions across one or many programs

THE Performance Equation Our basic performance equation is then CPU time = Instruction_count x CPI x clock_cycle or Instruction_count x CPI clock_rate CPU time = ----------------------------------------------- These equations separate the three key factors that affect performance: Instruction_coun, CPI, and clock_rate Can measure the CPU execution time by running the program The clock rate is usually given Can measure overall instruction count by using profilers/ simulators without knowing all of the implementation details CPI varies by instruction type and ISA implementation for which we must know the implementation details Note that instruction count is dynamic – i.e., its not the number of lines in the code, but THE NUMBER OF INSTRUCTIONS EXECUTED

Determinates of CPU Performance CPU time = Instruction_count x CPI x clock_cycle Instruction _count CPI clock_cycl e Algorithm Programming language Compiler ISA Core organization Technology For class handout

Determinates of CPU Performance CPU time = Instruction_count x CPI x clock_cycle Instruction _count CPI clock_cycle Algorithm Programming language Compiler ISA Core organization Technology X X X X X X For lecture X X X X X X

A Simple Example Op Freq CPIi Freq x CPIi ALU 50% 1 . Load 20% 5 Store 10% 3 Branch 2  = How much faster would the machine be if a better data cache reduced the average load time to 2 cycles? How does this compare with using branch prediction to shave a cycle off the branch time? What if two ALU instructions could be executed at once?

A Simple Example Op Freq CPIi Freq x CPIi ALU 50% 1 Load 20% 5 Store 10% 3 Branch 2  = .5 1.0 .3 .4 .5 .4 .3 .5 1.0 .3 .2 .25 1.0 .3 .4 2.2 1.6 2.0 1.95 How much faster would the machine be if a better data cache reduced the average load time to 2 cycles? How does this compare with using branch prediction to shave a cycle off the branch time? What if two ALU instructions could be executed at once? For lecture CPU time new = 1.6 x IC x CC so 2.2/1.6 means 37.5% faster CPU time new = 2.0 x IC x CC so 2.2/2.0 means 10% faster CPU time new = 1.95 x IC x CC so 2.2/1.95 means 12.8% faster

Before We Continue Lecture notes are posted on the class web-site usually before the lecture. Review them and try to get some understanding if possible before class. Read assigned textbook sections Homework1 has been posted since Friday, February 8. There are 10 problems to solve, several with many parts. The homework is time consuming and can't be solved in one night. You should have started working on them by now. Print your solution of the homework and bring it to class next Tuesday, February 19. Do not cheat. Cheating will be pursued and it defeats the purpose of trying to solve the homework.

Workloads and Benchmarks Benchmarks – a set of programs that form a “workload” specifically chosen to measure performance SPEC (System Performance Evaluation Cooperative) creates standard sets of benchmarks starting with SPEC89. The latest is SPEC CPU2006 which consists of 12 integer benchmarks (CINT2006) and 17 floating- point benchmarks (CFP2006). www.spec.org There are also benchmark collections for power workloads (SPECpower_ssj2008), for mail workloads (SPECmail2008), for multimedia workloads (mediabench), …

SPEC CINT2006 on Barcelona (CC = 0.4 x 109) See Slide notes Name ICx109 CPI ExTime RefTime SPEC ratio perl 21,118 0.75 637 9,770 15.3 bzip2 2,389 0.85 817 9,650 11.8 gcc 1,050 1.72 724 8,050 11.1 mcf 336 10.00 1,345 9,120 6.8 go 1,658 1.09 721 10,490 14.6 hmmer 2,783 0.80 890 9,330 10.5 sjeng 2,176 0.96 837 12,100 14.5 libquantum 1,623 1.61 1,047 20,720 19.8 h264avc 3,102 993 22,130 22.3 omnetpp 587 2.94 690 6,250 9.1 astar 1,082 1.79 773 7,020 xalancbmk 1,058 2.70 1,143 6,900 6.0 Geometric Mean 11.7 Perl – interpreted string processing Bzip2 – block-sorting compression Gcc – GNU C compiler Mcf – combinatorial optimization Go – Go came (AI) Hmmer – search gene sequence Sjeng – chess game (AI) Libquantum – quantum computer simulation H264avc – video compression Omnetpp – discret event simulation library Astar – games/path finding Xalancbmk – XML parsing High cache miss rates

Comparing and Summarizing Performance How do we summarize the performance for a benchmark set (e.g. SPEC CINT2006 ) with a single number? First the execution times are normalized giving the “SPEC ratio” (bigger is faster, i.e., SPEC ratio is the inverse of execution time) The SPEC ratios are then “averaged” using the geometric mean (GM) GM = n  SPEC ratioi i = 1 n

Comparing and Summarizing Performance (cont.) Guiding principle in reporting performance measurements is reproducibility List everything another experimenter would need to duplicate the experiment version of the operating system compiler settings input set used specific computer configuration (clock rate, cache sizes and speed, memory size and speed, etc.)

Summary: Evaluating ISAs Design-time metrics: Can the ISA be implemented, in how long, at what cost? Can the ISA be programmed? Ease of compilation? Static metrics: How many bytes does the program occupy in memory? Dynamic (run time) metrics: How many instructions are executed? How many bytes does the processor fetch to execute the program? How many clocks are required per instruction? How small a clock cycle is practical? Best Metric: Time to execute the program! CPI Inst. Count Cycle Time depends on the instructions set, the processor organization, and compilation techniques.

Morgan Kaufmann Publishers April 14, 2017 Performance Summary Performance depends on Algorithm: affects IC, possibly CPI Programming language: affects IC, CPI Compiler: affects IC, CPI Instruction set architecture: affects IC, CPI, Tclock Chapter 1 — Computer Abstractions and Technology