Computer Performance Computer Engineering Department.


Case Study
A company wants to redesign its computer M_BASE (5 GHz) to beat the competition, using a hardware team and a compiler team.

M_BASE (5 GHz):

| Instruction class | CPI_i | Frequency |
|---|---|---|
| A | 2 | 40% |
| B | 3 | 25% |
| C | 3 | 25% |
| D | 5 | 10% |

After optimizing the hardware and raising the clock to 6 GHz (M_OPT):

| Instruction class | CPI_i | Frequency |
|---|---|---|
| A | 2 | 40% |
| B | 2 | 25% |
| C | 3 | 25% |
| D | 4 | 10% |

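Case Study - continued
The average CPI of each machine is the frequency-weighted sum of its class CPIs:

$$\mathrm{CPI}_{M_{BASE}} = 2 \times 0.4 + 3 \times 0.25 + 3 \times 0.25 + 5 \times 0.1 = 2.8 \ \text{cycles/instr.}$$

$$\mathrm{CPI}_{M_{OPT}} = 2 \times 0.4 + 2 \times 0.25 + 3 \times 0.25 + 4 \times 0.1 = 2.45 \ \text{cycles/instr.}$$

The MIPS rating of each machine follows from

$$\mathrm{MIPS} = \frac{\#\,\text{Instructions}}{\text{Execution time} \times 10^{6}} = \frac{\#\,\text{Instructions}}{(\#\,\text{CPU cycles} / \text{frequency}) \times 10^{6}} = \frac{\text{Clock frequency (million cycles/sec)}}{\mathrm{CPI}}$$

$$\mathrm{MIPS}_{M_{BASE}} = \frac{5 \times 10^{3}}{2.8} \approx 1{,}786 \ \text{MIPS}$$

$$\mathrm{MIPS}_{M_{OPT}} = \frac{6 \times 10^{3}}{2.45} \approx 2{,}449 \ \text{MIPS}$$

$$\frac{\mathrm{MIPS}_{M_{OPT}}}{\mathrm{MIPS}_{M_{BASE}}} = \frac{2449}{1786} \approx 1.37$$

Because the hardware change leaves the instruction count unchanged, this MIPS ratio is also the speedup: a 37% improvement.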
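The CPI and MIPS arithmetic above can be checked with a short Python sketch. The instruction mix and per-class CPIs come from the case-study tables; the function and variable names are illustrative choices, not part of the slides.

```python
# Instruction mix (fraction of executed instructions) and per-class CPI,
# taken from the case-study tables.
FREQ = {"A": 0.40, "B": 0.25, "C": 0.25, "D": 0.10}
CPI_BASE = {"A": 2, "B": 3, "C": 3, "D": 5}   # M_BASE, 5 GHz
CPI_OPT = {"A": 2, "B": 2, "C": 3, "D": 4}    # M_OPT, 6 GHz


def average_cpi(cpi, freq):
    """Weighted average CPI: sum over classes of CPI_i times frequency_i."""
    return sum(cpi[c] * freq[c] for c in freq)


def mips(clock_hz, cpi):
    """MIPS rating: clock rate divided by (CPI x 10^6)."""
    return clock_hz / (cpi * 1e6)


cpi_base = average_cpi(CPI_BASE, FREQ)
cpi_opt = average_cpi(CPI_OPT, FREQ)
mips_base = mips(5e9, cpi_base)
mips_opt = mips(6e9, cpi_opt)

print(f"CPI  M_BASE = {cpi_base:.2f}, M_OPT = {cpi_opt:.2f}")
print(f"MIPS M_BASE = {mips_base:.0f}, M_OPT = {mips_opt:.0f}")
print(f"MIPS ratio  = {mips_opt / mips_base:.2f}")
```

Comparing MIPS ratings is only meaningful here because both machines execute the same instruction count for the same program.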

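Case Study - continued
The compiler team will leave the architecture unchanged (5 GHz clock), but wants to reduce the number of instructions generated when the high-level code is converted to assembly language. Call the resulting machine M_COMP.

| Instruction class | % of instructions executed vs. base |
|---|---|
| A | 90% |
| B | 90% |
| C | 85% |
| D | 95% |

So the overall ratio of instructions is

$$0.9 \times 0.4 + 0.9 \times 0.25 + 0.85 \times 0.25 + 0.95 \times 0.1 = 0.8925$$

Because the instruction mix changes, the new CPI is the cycle count per base instruction divided by that ratio:

$$\mathrm{CPI}_{M_{COMP}} = \frac{2 \times 0.4 \times 0.9 + 3 \times 0.25 \times 0.9 + 3 \times 0.25 \times 0.85 + 5 \times 0.1 \times 0.95}{0.8925} = \frac{2.5075}{0.8925} \approx 2.81$$

Case Study - continued
The resulting speedup from compiler optimization is

$$\text{CPU time}_{M_{BASE}} = \frac{\text{Inst. Count} \times \mathrm{CPI}}{\text{Clock frequency}} = \frac{\text{Inst. Count} \times 2.8}{\text{Clock frequency}}$$

$$\text{CPU time}_{M_{COMP}} = \frac{\text{Inst. Count} \times 0.8925 \times 2.81}{\text{Clock frequency}} \approx \frac{\text{Inst. Count} \times 2.5}{\text{Clock frequency}}$$

So the speedup is

$$\frac{\text{CPU time}_{M_{BASE}}}{\text{CPU time}_{M_{COMP}}} = \frac{2.8}{2.5} = 1.12 \quad \text{(a 12% improvement)}$$

If BOTH hardware and software are optimized,

$$\mathrm{CPI}_{M_{BOTH}} = \frac{2 \times 0.4 \times 0.9 + 2 \times 0.25 \times 0.9 + 3 \times 0.25 \times 0.85 + 4 \times 0.1 \times 0.95}{0.8925} = \frac{2.1875}{0.8925} \approx 2.45 \ \text{cycles/instruction}$$

Case Study - continued
The resulting speedup from optimizing BOTH hardware and software is

$$\frac{\text{CPU time}_{M_{BASE}}}{\text{CPU time}_{M_{BOTH}}} = \frac{\text{Clock frequency}_{BOTH} \times \mathrm{CPI}_{BASE}}{0.8925 \times \text{Clock frequency}_{BASE} \times \mathrm{CPI}_{BOTH}} = \frac{6 \times 10^{9} \times 2.8}{0.8925 \times 5 \times 10^{9} \times 2.45} \approx 1.54 \quad \text{(a 54% improvement)}$$

The improvements take time, and the competition advances too.

| Optimization method | Time taken | Improvement |
|---|---|---|
| Hardware | 6 months | 37% |
| Compiler | 6 months | 12% |
| Both | 8 months | 54% |

We know that CPU performance grows 50%/year, or about 3.8%/month.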
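The three speedups can be reproduced with the CPU time equation, as in the sketch below. It relies on one observation: the instruction-count ratio multiplied by the new CPI equals the cycles spent per instruction of the original program, so the ratio itself cancels out of the CPU time. All names are illustrative, not from the slides.

```python
FREQ = {"A": 0.40, "B": 0.25, "C": 0.25, "D": 0.10}
CPI_BASE = {"A": 2, "B": 3, "C": 3, "D": 5}
CPI_HW = {"A": 2, "B": 2, "C": 3, "D": 4}
# Fraction of each class still executed after compiler optimization.
KEEP = {"A": 0.90, "B": 0.90, "C": 0.85, "D": 0.95}
NO_CHANGE = {c: 1.0 for c in FREQ}


def cycles_per_base_instruction(cpi, freq, keep):
    """Clock cycles spent per instruction of the original program.

    Equals (instruction-count ratio) x (new average CPI).
    """
    return sum(cpi[c] * freq[c] * keep[c] for c in freq)


def cpu_time(cpi, keep, clock_hz, base_instructions=1.0):
    """CPU time = instruction count x CPI / clock frequency."""
    cycles = base_instructions * cycles_per_base_instruction(cpi, FREQ, keep)
    return cycles / clock_hz


t_base = cpu_time(CPI_BASE, NO_CHANGE, 5e9)
speedup = {
    "hardware": t_base / cpu_time(CPI_HW, NO_CHANGE, 6e9),
    "compiler": t_base / cpu_time(CPI_BASE, KEEP, 5e9),
    "both": t_base / cpu_time(CPI_HW, KEEP, 6e9),
}
for name, s in speedup.items():
    print(f"{name:8s} speedup = {s:.2f}")
```

The base instruction count defaults to 1.0 because it cancels in every speedup ratio.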

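Case Study - conclusions
In six months the competition's CPU performance will grow by a factor of $(1.038)^{6} = 1.25$.
In eight months it will grow by a factor of $(1.038)^{8} = 1.35$.
Optimizing only the compiler (12% in six months) therefore falls behind the competition's 25%. The hardware optimization (37% in six months) and the combined optimization (54% in eight months) both stay ahead, so either M_OPT or M_BOTH is the way to go!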
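The comparison against the competition can be written as a small Python check. The monthly growth factor and the improvement figures come from the slides; the function names are illustrative.

```python
MONTHLY_GROWTH = 1.038  # competition's growth factor per month, from the slides

# (improvement factor, months needed) for each option in the case study.
OPTIONS = {
    "hardware": (1.37, 6),
    "compiler": (1.12, 6),
    "both": (1.54, 8),
}


def competition_gain(months, monthly_growth=MONTHLY_GROWTH):
    """Compound growth of the competition's performance over `months`."""
    return monthly_growth ** months


def beats_competition(improvement, months):
    """True if our improvement exceeds what the competition gains meanwhile."""
    return improvement > competition_gain(months)


for name, (improvement, months) in OPTIONS.items():
    verdict = "ahead" if beats_competition(improvement, months) else "behind"
    print(f"{name:8s}: {improvement:.2f} vs {competition_gain(months):.2f} -> {verdict}")
```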

Another way to judge performance - Benchmarks
- These are libraries of programs that designers and consumers run on various computers to compare their performance.
- They emulate a workload similar to the application that the consumer intends to use the computer for, or that the designer wants to optimize for.
- One advantage of benchmarks is reproducibility, so that two or more designs can be compared before a computer hits the market.
- To assure objectivity, benchmarks are established by an independent committee.

Benchmarks - continued
- The organization that establishes them is the Standard Performance Evaluation Corporation (SPEC).
- SPEC publishes benchmark results for CPUs, as well as graphics cards, web servers and other architectures.
- Because this is a fast-changing field, the benchmarks change too: for CPUs, SPEC CPU95 was replaced by SPEC CPU2000 and then by SPEC CPU2006.
- For web servers, SPECweb99 has been replaced by SPECweb2005.

Benchmarks - continued
- Regardless of version and targeted hardware, a benchmark is a collection of programs, not just one. Since each benchmark program (within a given benchmark library) is different, results need to be summarized.
- How is execution time used with benchmarks?
- Example:

| | Machine A | Machine B |
|---|---|---|
| Benchmark program | | |
| Benchmark program | | |
| Benchmark program | | |
| Total execution time (sec) | 1510 | 650 |

Benchmarks - continued
- Performance A / Performance B = Exec. Time B / Exec. Time A = 650/1510 = 0.43, or Performance B = 2.32 x Performance A.
- Thus Machine B is more than twice as fast as Machine A overall, even though Machine A was faster in two of the benchmark programs.
- Thus total execution time is an indicator of performance if each of the benchmark programs is executed once (or an equal number of times).
- Another measure is the arithmetic mean,

$$\text{Arithmetic mean} = \frac{1}{n} \sum_{i=1}^{n} \text{Time}_i$$

  where $\text{Time}_i$ is the time taken to execute program $i$ and $n$ is the total number of programs in the benchmark.
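A minimal Python sketch of the total-time comparison, using only the two totals quoted on the slide (1510 s for Machine A, 650 s for Machine B). The function name is an illustrative choice.

```python
def relative_performance(time_x, time_y):
    """Performance_X / Performance_Y = time_Y / time_X (performance is 1/time)."""
    return time_y / time_x


TOTAL_A = 1510.0  # total execution time of Machine A, seconds
TOTAL_B = 650.0   # total execution time of Machine B, seconds

perf_a_over_b = relative_performance(TOTAL_A, TOTAL_B)
perf_b_over_a = relative_performance(TOTAL_B, TOTAL_A)

print(f"Performance A / Performance B = {perf_a_over_b:.2f}")
print(f"Performance B / Performance A = {perf_b_over_a:.2f}")
```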

Benchmarks - continued
- If not all programs in the benchmark are executed the same number of times, then we need to use a weighted arithmetic mean,

$$\text{Weighted arithmetic mean} = \frac{1}{n} \sum_{i=1}^{n} W_i \, \text{Time}_i$$

  where $W_i$ is the weight assigned to program $i$ of the benchmark. The weights are scaled to sum to $n$, so that equal weights $W_i = 1$ give back the plain arithmetic mean.
- A normalized execution time is the ratio of the time taken to execute a given program on a given computer to the time taken by the same program on a "reference" computer.
- A better way to gauge performance is to use the geometric mean of the normalized execution times,

$$\text{Geometric mean} = \sqrt[n]{a_1 \times a_2 \times \cdots \times a_n}$$

  where $a_i$ is the execution time ratio for program $i$ out of $n$ programs.
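The three summary statistics can be sketched in Python as follows. The execution times and the two reference machines are hypothetical, made up for illustration. The final lines illustrate one commonly cited reason for preferring the geometric mean: the ratio of two machines' geometric means does not depend on which reference machine is used for normalization.

```python
import math


def arithmetic_mean(times):
    return sum(times) / len(times)


def weighted_arithmetic_mean(times, weights):
    """Sum(W_i * Time_i) / n, with the weights rescaled to sum to n."""
    n = len(times)
    scale = n / sum(weights)
    return sum(w * scale * t for w, t in zip(weights, times)) / n


def geometric_mean(ratios):
    """n-th root of the product of the n ratios."""
    return math.prod(ratios) ** (1.0 / len(ratios))


def normalized(times, reference_times):
    """Execution time ratios relative to a reference machine."""
    return [t / r for t, r in zip(times, reference_times)]


# Hypothetical execution times in seconds for three programs.
times_x = [2.0, 40.0, 300.0]
times_y = [4.0, 20.0, 150.0]
ref_1 = [1.0, 10.0, 100.0]   # hypothetical reference machine 1
ref_2 = [10.0, 5.0, 50.0]    # hypothetical reference machine 2

ratio_ref_1 = (geometric_mean(normalized(times_x, ref_1))
               / geometric_mean(normalized(times_y, ref_1)))
ratio_ref_2 = (geometric_mean(normalized(times_x, ref_2))
               / geometric_mean(normalized(times_y, ref_2)))
print(f"X/Y with reference 1: {ratio_ref_1:.3f}")
print(f"X/Y with reference 2: {ratio_ref_2:.3f}")
```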

Benchmarks - continued
- The number of programs has grown in SPEC CPU2000 to 12 integer programs and 14 floating-point programs.
- Additional reading

Benchmarks - continued

Benchmark Comparison (on SPEC CPU2000)
Comparison of the Pentium III and the Pentium 4:
- Both scale linearly with clock rate (aggressive caching reduces the memory penalty).
- The Pentium 4 uses a different pipeline and instructions that boost floating-point computations.

Benchmarks and Energy efficiency
- Reducing power means reducing voltage and/or reducing clock frequency, a technique used in laptops and other mobile applications.
- Processors then have three modes: max clock, adaptive clock, and minimum clock (minimum power).
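Why voltage and frequency are the two knobs can be seen from the standard first-order model of CMOS dynamic power, P = activity x C x V^2 x f. This model is not stated on the slides and is included here as an assumption; the scaling factors in the example are hypothetical.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz, activity=1.0):
    """First-order CMOS dynamic power in watts: activity * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


def relative_power(voltage_scale, frequency_scale):
    """Power relative to the unscaled design after scaling V and f."""
    return voltage_scale ** 2 * frequency_scale


# Hypothetical low-power mode: 80% of nominal voltage, half the clock rate.
print(f"Relative dynamic power: {relative_power(0.8, 0.5):.2f}")
```

Under this model voltage has a quadratic effect, so lowering voltage together with frequency saves more power than lowering frequency alone.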

Benchmarks and Energy efficiency
- Energy efficiency = performance / average power consumption (watts).
- The Pentium M (part of Centrino), designed from the start for mobile computing, has superior energy efficiency compared with the Pentium III-M and Pentium 4-M, which are modified versions of the standard processors.

1 GHz to 2.26 GHz depending on voltage
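The energy-efficiency definition translates directly into code. In the sketch below the processor names, benchmark scores and power figures are hypothetical placeholders, not measurements from the slides.

```python
def energy_efficiency(performance, avg_power_watts):
    """Energy efficiency = performance / average power consumption."""
    if avg_power_watts <= 0:
        raise ValueError("average power must be positive")
    return performance / avg_power_watts


# Hypothetical (benchmark score, average watts) pairs.
processors = {
    "mobile_design": (100.0, 20.0),
    "adapted_desktop_design": (120.0, 40.0),
}

for name, (score, watts) in processors.items():
    print(f"{name}: {energy_efficiency(score, watts):.1f} score/W")
```

With these made-up numbers the adapted desktop design has the higher raw score but the lower efficiency, which is the kind of trade-off the metric is meant to expose.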

Dual-core Architecture
Places two processors on a single chip (e.g., Intel Core Duo).

Benchmarks - continued
SPEC CPU2006 (from the Standard Performance Evaluation Corporation) has 12 integer benchmarks (CINT2006) and 17 floating-point benchmarks (CFP2006). The elapsed time in seconds is measured for each benchmark in the CINT2006 or CFP2006 suite, and its ratio to the time on the reference machine (a Sun UltraSPARC II system at 296 MHz) is calculated. The SPECint_base2006 and SPECfp_base2006 metrics are calculated as the geometric mean of the individual ratios, where each ratio is based on the median execution time from three runs.
SPEC CPU2006 Benchmark Descriptions
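The scoring procedure described above (median of three runs, ratio to the reference time, geometric mean of the ratios) can be sketched in Python. The benchmark names and all times are hypothetical, and the ratio is taken as reference time divided by measured time so that a higher score means a faster machine.

```python
import math
import statistics


def spec_ratio(reference_seconds, run_seconds):
    """Reference time divided by the median of the measured runs."""
    return reference_seconds / statistics.median(run_seconds)


def spec_score(ratios):
    """Geometric mean of the per-benchmark ratios."""
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical reference times and three measured runs per benchmark (seconds).
REFERENCE = {"bench_1": 1000.0, "bench_2": 800.0, "bench_3": 500.0}
RUNS = {
    "bench_1": [100.0, 110.0, 90.0],
    "bench_2": [410.0, 400.0, 395.0],
    "bench_3": [100.0, 99.0, 101.0],
}

ratios = [spec_ratio(REFERENCE[name], RUNS[name]) for name in REFERENCE]
score = spec_score(ratios)
print("ratios:", [round(r, 2) for r in ratios])
print(f"score: {score:.2f}")
```

Using the median rather than the mean of the three runs keeps a single unusually slow or fast run from shifting the ratio.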

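SPEC CPU2006 for Multi-core CPUs
Results are compared to a reference machine (296 MHz UltraSPARC II processor).

| System name | Processor speed | Cores | Chips | Cores/chip | Threads/core | Base result | Peak result (optimized compiler) |
|---|---|---|---|---|---|---|---|
| AMD Opteron 890 | 2.8 GHz | | | | | | |
| Intel Dual-Core Itanium 2 | 1.4 GHz | | | | | | |
| Intel Xeon 5160 | 3.00 GHz | | | | | | |
| Intel Xeon processor X5365 | 3.0 GHz | | | | | | |
| UltraSPARC II processor (reference) | 296 MHz | | | | | | |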

Multi-core Benchmarks

Evaluation Summary

| | Pros | Cons |
|---|---|---|
| Actual target workload | representative | very specific; non-portable; difficult to run or measure |
| Full application benchmarks | portable; widely used; improvements useful in reality | less representative |
| Small "kernel" benchmarks | easy to run, early in design cycle | easy to "fool" |
| Microbenchmarks | identify peak capability and potential bottlenecks | "peak" may be a long way from application performance |

Additional readings
- The Efficeon product sheet at ocessor.pdf
- Multi-Core Processor Architecture Explained: na/eng/ htm?page=2&=prn
- Performance Scaling in the Multi-Core Era: na/eng/dc/threading/ htm