EEL5708/Bölöni Lec 1.1 August 21, 2006 Lotzi Bölöni Fall 2006 EEL 5708 High Performance Computer Architecture Lecture 1 Introduction.

Presentation transcript:

EEL5708/Bölöni Lec 1.1 August 21, 2006 Lotzi Bölöni Fall 2006 EEL 5708 High Performance Computer Architecture Lecture 1 Introduction

EEL5708/Bölöni Lec 1.2 Acknowledgements All the lecture slides were adapted from the slides of David Patterson (1998, 2001) and David E. Culler (2001), Copyright , University of California Berkeley

EEL5708/Bölöni Lec 1.3 Case 1: VIA KT266 chipset for the Athlon processors

EEL5708/Bölöni Lec 1.4 Take 1: April 4, 2001 Tom’s Hardware: Web site for hardware enthusiasts. Review of the VIA Apollo KT266 chipset. The website’s conclusion: KT266 is still way too slow to challenge or even replace AMD's 760 chipset. As a conclusion, I could maybe say the typical words always used in early reviews "let's hope VIA will finally improve KT266". However, I have my doubts if this will happen any time soon. My advice to you is to either forget about DDR altogether for the time being, or to go for Athlon plus AMD760 and NOTHING ELSE.

EEL5708/Bölöni Lec 1.5 Take 2: One week later… Article title: “VIA Apollo KT266 revisited: Much Ado About Nothing”. Another website obtains different results: an additional resistor (!) mounted on the motherboard and a different BIOS. Tom’s Hardware concludes that there are indeed improvements, but they are not significant enough to change the conclusion.

EEL5708/Bölöni Lec 1.6 Take 3: Five months later (September 2001) VIA KT266A is launched. Tom’s Hardware: “’A’ stands for vastly improved performance”. Changes: “improvements” to the memory controller. Processor frequency, bus frequency, etc. stay the same. Pin-by-pin compatible with the predecessors! Conclusion: “The performance of Apollo KT266A is nothing short of impressive.”

EEL5708/Bölöni Lec 1.7 Synthetic benchmarks:

EEL5708/Bölöni Lec 1.8 Real world benchmarks

EEL5708/Bölöni Lec 1.9 Some conclusions “Architecture” matters. Real-world benchmarks show less improvement than synthetic ones: Amdahl’s Law. Which benchmark do I care about? (This time, at least, they were consistent…)
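
Amdahl's Law, mentioned in the conclusions above, can be illustrated with a short sketch. The function name and the sample fractions are illustrative assumptions, not measurements from the KT266 reviews: a synthetic memory benchmark is assumed to spend 90% of its time in the improved memory path, and a real-world application only 30%.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only a fraction of the run time is accelerated.

    fraction_enhanced: share of the original run time that benefits (0..1).
    speedup_enhanced:  how many times faster that share becomes (> 0).
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    # New time = untouched part + accelerated part, relative to old time = 1.
    new_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time


# Hypothetical numbers: the memory controller becomes 2x faster.
synthetic = amdahl_speedup(fraction_enhanced=0.9, speedup_enhanced=2.0)
real_world = amdahl_speedup(fraction_enhanced=0.3, speedup_enhanced=2.0)
print(f"synthetic benchmark speedup:  {synthetic:.2f}x")
print(f"real-world benchmark speedup: {real_world:.2f}x")
```

Under these assumed fractions, the same hardware improvement yields a much smaller overall gain for the real-world workload, which is the pattern the slide describes.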

EEL5708/Bölöni Lec 1.10 Case 2: Video compression performance in Intel Pentium 4 vs. AMD Athlon

EEL5708/Bölöni Lec 1.11 Take 1 (11/20/00): First impressions Intel Pentium 4 is launched. The initial measurements show that it greatly outperforms the AMD Athlon for MPEG-4 video compression.

EEL5708/Bölöni Lec 1.12 Take 1 (11/20/00): First impressions (cont’d)

EEL5708/Bölöni Lec 1.13 Take 2: New results force new conclusions Concerns are raised about the fact that the measurement was done with a low quality setting (MMX arithmetic). Repeating the measurements with floating point arithmetic, the relative performance was reversed.

EEL5708/Bölöni Lec 1.14 Take 2 : New results force new conclusions (cont’d)

EEL5708/Bölöni Lec 1.15 Take 3: Intel engineers create an optimized version of the software
As a response, Intel engineers created a modified version of the software:
–recompiled it with higher optimizations
–rewrote parts of the code to use the new instruction set extensions (SSE2)
The higher optimizations benefited both Intel and AMD processors (but Intel more).
The SSE2 options reversed the performance ranking again.
Note: AMD engineers created an AMD-optimized version, too, with significant improvements, but this did not change the rankings.

EEL5708/Bölöni Lec 1.16 Take 3: Intel engineers create an optimized version of the software

EEL5708/Bölöni Lec 1.17 Take 3 (cont’d)

EEL5708/Bölöni Lec 1.18 Case 2: Conclusions
Real world benchmark, huge differences
–Why?
Software solution to a hardware problem?
–Optimizing for the architecture
–So, what if it is not open source?
–Software development cycles…
Picking the right architecture + understanding the architecture we have

EEL5708/Bölöni Lec 1.19 Review: Measuring performance

EEL5708/Bölöni Lec 1.20 Performance measures
–Time to execute a given program
–Number of programs which can be run in parallel
–Responsiveness (user interfaces)
–Predictable execution time (for real time systems)
–Energy consumption (mostly for portables, but check the new Google and Microsoft data centers…)
–And so on…

EEL5708/Bölöni Lec 1.21 Which is faster? (Latency vs. throughput)
Time to run the task (ExTime) –Execution time, response time, latency
Tasks per day, hour, week, sec, ns … (Performance) –Throughput, bandwidth

Plane             Speed     DC to Paris  Passengers  Throughput (pmph)
Boeing 747        610 mph   6.5 hours                286,
BAC/Sud Concorde  1350 mph  3 hours                  ,200
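
The latency/throughput distinction on this slide can be sketched in a few lines. The passenger counts did not survive on the slide; the values used below (470 and 132) are assumptions, chosen because they are consistent with the listed speeds and the surviving throughput digits. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plane:
    name: str
    speed_mph: float
    trip_hours: float   # DC to Paris: the latency of one trip
    passengers: int     # ASSUMED value, not present in the transcript

    def throughput_pmph(self) -> float:
        """Throughput in passenger-miles per hour (pmph)."""
        return self.passengers * self.speed_mph


PLANES = [
    Plane("Boeing 747", speed_mph=610, trip_hours=6.5, passengers=470),
    Plane("Concorde", speed_mph=1350, trip_hours=3.0, passengers=132),
]

# "Faster" depends on the metric: lowest latency vs. highest throughput.
fastest = min(PLANES, key=lambda p: p.trip_hours)
highest_throughput = max(PLANES, key=lambda p: p.throughput_pmph())

print(f"lowest latency:     {fastest.name}")
print(f"highest throughput: {highest_throughput.name}")
```

With these assumed passenger counts, the two metrics pick different planes, which is the point of the slide's question.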

EEL5708/Bölöni Lec 1.22 Definitions
Performance is in units of things per sec –bigger is better
If we are primarily concerned with response time:
$$\text{performance}(x) = \frac{1}{\text{execution\_time}(x)}$$
"X is n times faster than Y" means
$$n = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = \frac{\text{Execution\_time}(Y)}{\text{Execution\_time}(X)}$$
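
A minimal sketch of these definitions, assuming execution times are given in any consistent unit. The function names are hypothetical; the sample times reuse the DC-to-Paris trip durations from the previous slide.

```python
def performance(execution_time: float) -> float:
    """Performance as the reciprocal of execution time (bigger is better)."""
    if execution_time <= 0:
        raise ValueError("execution_time must be positive")
    return 1.0 / execution_time


def times_faster(execution_time_x: float, execution_time_y: float) -> float:
    """Return n such that 'X is n times faster than Y'."""
    # n = Performance(X) / Performance(Y) = Execution_time(Y) / Execution_time(X)
    return performance(execution_time_x) / performance(execution_time_y)


# Trip times from the plane example: X takes 3 hours, Y takes 6.5 hours.
n = times_faster(execution_time_x=3.0, execution_time_y=6.5)
print(f"X is {n:.2f} times faster than Y")
```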

EEL5708/Bölöni Lec 1.23 Computer Performance
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

              Inst Count  CPI  Clock
Program       X
Compiler      X           (X)
Inst. Set.    X           X
Organization              X    X
Technology                     X
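
The CPU time equation can be evaluated directly. The sketch below assumes the clock is given as a rate in Hz, so seconds per cycle is its reciprocal; the function name and the sample workload are hypothetical.

```python
def cpu_time_seconds(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instructions/program x cycles/instruction x seconds/cycle."""
    if clock_rate_hz <= 0:
        raise ValueError("clock_rate_hz must be positive")
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cpi * seconds_per_cycle


# Hypothetical workload: 1 billion instructions, CPI of 1.5, 1 GHz clock.
t = cpu_time_seconds(instruction_count=1_000_000_000, cpi=1.5, clock_rate_hz=1e9)
print(f"CPU time: {t:.3f} s")
```

Each of the three factors can be influenced by different layers of the system, which is what the table on the slide summarizes.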

EEL5708/Bölöni Lec 1.24 Cycles Per Instruction (Throughput)
“Average Cycles per Instruction”:
$$\text{CPI} = \frac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}} = \frac{\text{Cycles}}{\text{Instruction Count}}$$
“Instruction Frequency”
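
The “Instruction Frequency” label suggests the usual per-class decomposition of CPI, which is also what the bottom-up example on the next slide computes. The formula itself did not survive in the transcript, so the form and symbols below ($\text{CPI}_i$ for the cycles of instruction class $i$, $I_i$ for its instruction count, $F_i$ for its frequency) are an assumption, written to agree with the definition above:

```latex
\text{CPI} = \frac{\text{Cycles}}{\text{Instruction Count}}
           = \frac{\sum_{i=1}^{n} \text{CPI}_i \times I_i}{\text{Instruction Count}}
           = \sum_{i=1}^{n} \text{CPI}_i \times F_i,
\qquad
F_i = \frac{I_i}{\text{Instruction Count}}
```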

EEL5708/Bölöni Lec 1.25 Example: Calculating CPI bottom up
Typical mix of instruction types in program, base machine (Reg / Reg)

Op      Freq  Cycles  CPI(i)  (% Time)
ALU     50%   1       .5      (33%)
Load    20%   2       .4      (27%)
Store   10%   2       .2      (13%)
Branch  20%   2       .4      (27%)
Total                 1.5
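
The bottom-up calculation can be reproduced from the frequencies and cycle counts on this slide. This is a minimal sketch; the dictionary layout and function name are hypothetical, while the numbers are the ones listed on the slide.

```python
# Instruction mix from the slide: op -> (frequency, cycles per instruction)
MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}


def cpi_breakdown(mix):
    """Return (total CPI, per-op CPI contribution, per-op share of time)."""
    contributions = {op: freq * cycles for op, (freq, cycles) in mix.items()}
    total_cpi = sum(contributions.values())
    time_share = {op: c / total_cpi for op, c in contributions.items()}
    return total_cpi, contributions, time_share


total_cpi, contributions, time_share = cpi_breakdown(MIX)
for op in MIX:
    print(f"{op:<7} CPI(i)={contributions[op]:.1f}  time={time_share[op]:.0%}")
print(f"Total CPI = {total_cpi:.1f}")
```

The per-op share of time is each contribution divided by the total CPI, which is how the "% Time" column follows from the "CPI(i)" column.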