CS 203A Advanced Computer Architecture
Instructor: L. N. Bhuyan
Lecture 1
Instructor Information
Laxmi Narayan Bhuyan
Office: Engg. II, Room …
Tel: (951) …
Office Times: W, … pm
Course Syllabus
– Introduction, Performance, Instruction Set, Pipelining – Appendix A
– Instruction-level parallelism, Dynamic Scheduling, Branch Prediction and Speculation – Ch 2 of Text
– Limits on ILP and Software Approaches – Ch 3
– Multiprocessors, Thread-Level Parallelism – Ch 4
– Memory Hierarchy – Ch 5
– I/O Architectures – Papers
Text: Hennessy and Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers, Fourth Edition
Prerequisite: CS 161
Course Details
Grading: based on a curve
– Test 1: 35 points
– Test 2: 35 points
– Project: 30 points
What is *Computer Architecture*?
Computer Architecture = Instruction Set Architecture + Organization + Hardware + …
The Instruction Set: a Critical Interface
The instruction set sits at the boundary between software and hardware – it is the actual programmer-visible instruction set.
Instruction-Set Processor Design
Architecture (ISA) – programmer/compiler view
– “Functional appearance to its immediate user/system programmer”
– Opcodes, addressing modes, architected registers, IEEE floating point
Implementation (µarchitecture) – processor designer view
– “Logical structure or organization that performs the architecture”
– Pipelining, functional units, caches, physical registers
Realization (chip) – chip/system designer view
– “Physical structure that embodies the implementation”
– Gates, cells, transistors, wires
Hardware
Trends in Technology (Section 1.4 and Figure 1.9):
– Feature size (10 microns in 1971 to 0.18 microns in 2001, to … in 2010!!!) – the minimum size of a transistor or a wire in either the x or y dimension
– Logic designs
– Packaging technology
– Clock rate
– Supply voltage
Moore’s Law – the number of transistors doubles every 1.5 years (due to smaller feature size and larger die size)
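As a rough sanity check on what “doubles every 1.5 years” implies, here is a minimal back-of-the-envelope sketch in Python. The starting count of roughly 2,300 transistors (Intel 4004, 1971) is an assumed illustrative value, not taken from the slides, and the law is only a trend, not a prediction of any specific chip.

# Rough Moore's Law projection: transistor count doubling every 1.5 years.
# The 1971 starting count (~2,300 transistors) is an assumed illustrative value.
def transistors(start_count, start_year, year, doubling_period=1.5):
    doublings = (year - start_year) / doubling_period
    return start_count * 2 ** doublings

for year in (1971, 1981, 1991, 2001):
    print(year, int(transistors(2300, 1971, year)))
# 30 years at a 1.5-year doubling period is 20 doublings, i.e. about a millionfold growth.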
Relationship Between the Three Aspects
– Processors with identical ISAs may be very different in organization.
– Processors with identical ISAs and nearly identical organizations are still not identical.
  e.g., the Pentium II and Celeron are nearly identical but differ in clock rates and memory systems
– Architecture covers all three aspects.
Applications and Requirements
Scientific/numerical: weather prediction, molecular modeling
– Need: large memory, floating-point arithmetic
Commercial: inventory, payroll, web serving, e-commerce
– Need: integer arithmetic, high I/O
Embedded: automobile engines, microwaves, PDAs
– Need: low power, low cost, interrupt driven
Network computing: Web, security, multimedia, games, entertainment
– Need: high data bandwidth, application processing, graphics
Network Bandwidth Outpaces Moore’s Law
[Figure: GHz and Gbps vs. time – network bandwidth and TCP processing requirements grow faster than Moore’s Law]
Rule of thumb: 1 GHz of CPU for 1 Gbps of network bandwidth
Classes of Computers
High performance (supercomputers)
– Supercomputers – Cray T-90, SGI Altix
– Massively parallel computers – Cray T3E
Balanced cost/performance
– Workstations – SPARCstations
– Servers – SGI Origin, UltraSPARC
– High-end PCs – Pentium quads
Low cost/power
– Low-end PCs, laptops, PDAs – mobile Pentiums
Why Study Computer Architecture?
Aren’t they fast enough already?
– Are they?
– Fast enough to do everything we will EVER want? (AI, protein sequencing, graphics)
– Is speed the only goal? Power (heat dissipation + battery life), cost, reliability, etc.
Answer #1: requirements are always changing
Why Study Computer Architecture?
Annual technology improvements (approx.):
– Logic: density +25%, speed +20%
– DRAM (memory): density +60%, speed +4%
– Disk: density +25%, speed +4%
Designs change even if requirements are fixed. But the requirements are not fixed.
Answer #2: the technology playing field is always changing
Example of Changing Designs
Having, or not having, caches
– 1970: 10K transistors on a single chip, DRAM faster than logic -> having a cache is bad
– 1990: 1M transistors, logic is faster than DRAM -> having a cache is good
– 2000: 600M transistors -> multiple levels of cache and multiple CPUs -> multicore CPUs
– Will software ever catch up?
Performance Growth in Perspective
Same absolute increase in computing power:
– Big Bang – 2001
– 2001 – …
… – 2001: performance improved 35,000X!!!
What if cars or planes improved at this rate?
Measuring Performance
Latency (response time, execution time)
– Minimize the time to wait for a computation
Energy/power consumption
Throughput (tasks completed per unit time, bandwidth)
– Maximize the work done in a given interval
– = 1/latency when there is no overlap among tasks
– > 1/latency when there is overlap; in real processors there is always overlap (pipelining) – see the sketch below
Both are important: latency matters most for general-purpose architectures, power consumption for embedded systems, and throughput for networks.
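A minimal sketch, in plain Python with hypothetical numbers, of why throughput equals 1/latency with no overlap but exceeds it once tasks are overlapped in a pipeline:

# Hypothetical 5-stage pipeline, 1 ns per stage (illustrative numbers only).
stage_time_ns = 1.0
num_stages = 5
latency_ns = num_stages * stage_time_ns      # 5 ns to finish one task

# Without overlap: one task finishes before the next starts.
throughput_serial = 1.0 / latency_ns         # 0.2 tasks/ns = 1/latency

# With full pipelining: in steady state a task completes every stage time.
throughput_pipelined = 1.0 / stage_time_ns   # 1.0 tasks/ns > 1/latency

print(throughput_serial, throughput_pipelined)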
Performance Terminology
“X is n times faster than Y” means:
    Execution time_Y / Execution time_X = n
“X is m% faster than Y” means:
    (Execution time_Y - Execution time_X) / Execution time_X x 100 = m
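A tiny worked check of these two definitions in plain Python; the execution times are hypothetical, chosen only for illustration:

# Hypothetical execution times (illustrative only).
time_X = 10.0   # seconds
time_Y = 15.0   # seconds

n = time_Y / time_X                      # X is 1.5 times faster than Y
m = (time_Y - time_X) / time_X * 100.0   # X is 50% faster than Y

print(n, m)   # 1.5  50.0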
Compute Speedup – Amdahl’s Law
Speedup due to an enhancement E:
    Speedup(E) = Execution time without E (before) / Execution time with E (after) = Time_before / Time_after
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. What are the execution time after and Speedup(E)?
Amdahl’s Law
    ExTime_after = ExTime_before x [ (1 - F) + F/S ]
    Speedup(E) = ExTime_before / ExTime_after = 1 / [ (1 - F) + F/S ]
Amdahl’s Law – An Example
Q: Floating-point instructions are improved to run 2X faster, but only 10% of execution time is FP ops. What are the execution time and speedup after the improvement?
Ans: F = 0.1, S = 2
    ExTime_after = ExTime_before x [ (1 - 0.1) + 0.1/2 ] = 0.95 x ExTime_before
    Speedup = ExTime_before / ExTime_after = 1 / 0.95 ≈ 1.053
Read the examples in the book!
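A minimal sketch of Amdahl’s Law in plain Python that reproduces the numbers above; the function name and structure are mine, not from the slides:

# Amdahl's Law: speedup when a fraction F of execution time is accelerated by a factor S.
def amdahl_speedup(F, S):
    new_time_fraction = (1.0 - F) + F / S   # ExTime_after / ExTime_before
    return 1.0 / new_time_fraction

# Example from the slide: 10% of execution time is FP ops, FP made 2x faster.
F, S = 0.1, 2.0
print((1 - F) + F / S)        # 0.95  -> ExTime_after = 0.95 * ExTime_before
print(amdahl_speedup(F, S))   # ~1.053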
CPU Performance – The Fundamental Law
    CPU time = Instruction count x CPI x Clock cycle time
Three components of CPU performance:
– Instruction count
– CPI (cycles per instruction)
– Clock cycle time
CPI – Cycles per Instruction
Let F_i be the frequency of type i instructions in a program. Then,
    Average CPI = Σ_i (CPI_i x F_i)
Example: average CPI = … = 1.57 cycles/instruction
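A small sketch of the weighted-CPI computation in plain Python. The instruction mix below is assumed for illustration (it matches the RISC mix used in the next example, not the mix behind the 1.57 figure above):

# Average CPI = sum over instruction types of (CPI_i * F_i).
# The mix below is illustrative; frequencies must sum to 1.0.
mix = {
    "ALU":    (0.50, 1),   # (frequency F_i, cycles CPI_i)
    "Load":   (0.20, 2),
    "Store":  (0.10, 2),
    "Branch": (0.20, 2),
}
avg_cpi = sum(F * cpi for F, cpi in mix.values())
print(avg_cpi)   # 1.5 cycles/instruction for this assumed mix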
Example (RISC vs. CISC)
Start from the instruction mix of a RISC architecture. Should we add a register-memory ALU instruction format (one operand in a register, one operand in memory)?
The new instruction will take 2 clock cycles, but will also increase the branch CPI to 3 clock cycles.
Q: What fraction of loads must be eliminated for this to pay off?
Solution
Exec Time = Instr. Cnt. x CPI x Cycle time

Instr.    | F_i | CPI_i | CPI_i x F_i | I_i (new) | CPI_i | CPI_i x I_i
ALU       | .5  | 1     | .5          | .5 - X    | 1     | .5 - X
Load      | .2  | 2     | .4          | .2 - X    | 2     | .4 - 2X
Store     | .1  | 2     | .2          | .1        | 2     | .2
Branch    | .2  | 2     | .4          | .2        | 3     | .6
Reg/Mem   |     |       |             | X         | 2     | 2X
Total     | 1.0 |       | CPI = 1.5   | 1 - X     |       | CPI = (1.7 - X)/(1 - X)

Instr. Cnt_old x CPI_old x Cycle time_old >= Instr. Cnt_new x CPI_new x Cycle time_new
1.0 x 1.5 >= (1 - X) x (1.7 - X)/(1 - X)  ->  X >= 0.2
ALL loads must be eliminated for this to be a win!
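A quick numeric check of the break-even point in plain Python; the helper names are mine and the cycle time is assumed unchanged, as in the slide:

# Break-even: old exec time >= new exec time, with the same cycle time.
# Old: 1.0 instructions at CPI 1.5.  New: (1 - X) instructions at CPI (1.7 - X)/(1 - X).
def old_time():
    return 1.0 * 1.5

def new_time(X):
    instr_count = 1.0 - X
    cpi = (1.7 - X) / (1.0 - X)
    return instr_count * cpi          # simplifies to 1.7 - X

for X in (0.1, 0.2, 0.3):
    print(X, old_time() >= new_time(X))
# X = 0.2 is the break-even point; since loads are only 20% of the mix,
# ALL loads would have to be replaced for the change to pay off.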
Improve the Memory System
All instructions require an instruction fetch; only a fraction require a data fetch/store.
– Optimize instruction access over data access
Programs exhibit locality
– Spatial locality
– Temporal locality
Access to small memories is faster
– Provide a storage hierarchy such that the most frequent accesses go to the smallest (closest) memories:
  Registers -> Cache -> Memory -> Disk/Tape
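A tiny sketch of the two kinds of locality in code, in plain Python with a hypothetical array, purely illustrative:

# Spatial locality: consecutive elements of `data` are touched in order,
# so neighboring addresses are accessed close together in time.
# Temporal locality: `total` is reused on every iteration of the loop.
data = list(range(1024))   # hypothetical working set

total = 0
for x in data:             # sequential sweep -> spatial locality
    total += x             # repeated reuse of `total` -> temporal locality
print(total)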
Benchmarks
A “program” is the unit of work
– There are millions of programs
– Not all are the same; most are very different
– Which ones to use?
Benchmarks
– Standard programs for measuring or comparing performance
– Representative of programs people care about
– Repeatable!!
Choosing Programs to Evaluate Performance
Toy benchmarks
– e.g., quicksort, puzzle
– No one really runs them. Scary fact: used to prove the value of RISC in the early ’80s
Synthetic benchmarks
– Attempt to match the average frequencies of operations and operands in real workloads
– e.g., Whetstone, Dhrystone
– Often slightly more complex than kernels, but do not represent real programs
Kernels
– Most frequently executed pieces of real programs
– e.g., Livermore loops
– Good for focusing on individual features, not the big picture
– Tend to over-emphasize the target feature
Real programs
– e.g., gcc, spice, SPEC89, 92, 95, SPEC2000 (Standard Performance Evaluation Corporation), TPC-C, TPC-D
Networking
Benchmarks: NetBench, CommBench
Applications: IP forwarding, TCP/IP, SSL, Apache, SPECweb
Execution-driven simulators: SimpleScalar, NepSim
MIPS and MFLOPS
MIPS: millions of instructions per second
– MIPS = Instruction count / (CPU time x 10^6) = Clock rate / (CPI x 10^6)
– Easy to understand and to market
– Instruction-set dependent, so it cannot be used to compare across machines
– Program dependent
– Can vary inversely with performance! (why? read the book)
MFLOPS: millions of FP operations per second
– Less compiler dependent than MIPS
– Not all FP ops are implemented in hardware on all machines
– Not all FP ops have the same latency
– Normalized MFLOPS: uses an equivalence table to even out the varying latencies of FP ops
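A small sketch of the MIPS formula and of how the rating can disagree with real performance across instruction sets; the two machines below are invented for illustration:

# MIPS = instruction_count / (cpu_time * 1e6) = clock_rate / (CPI * 1e6)
def mips(clock_rate_hz, cpi):
    return clock_rate_hz / (cpi * 1e6)

# Two hypothetical machines running the same source program (invented numbers):
# A executes more, simpler instructions; B executes fewer, higher-CPI instructions.
instr_A, cpi_A, clock_A = 2.0e9, 1.0, 1.0e9
instr_B, cpi_B, clock_B = 1.0e9, 1.5, 1.0e9

time_A = instr_A * cpi_A / clock_A    # 2.0 s
time_B = instr_B * cpi_B / clock_B    # 1.5 s
print(mips(clock_A, cpi_A), time_A)   # 1000.0 MIPS, 2.0 s
print(mips(clock_B, cpi_B), time_B)   # ~666.7 MIPS, 1.5 s
# B has the lower MIPS rating yet the shorter execution time:
# MIPS can vary inversely with real performance across instruction sets.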
Performance (contd.)
SPEC CINT2000, SPEC CFP2000, and TPC-C figures are plotted in Figs. 1.19, 1.20, and 1.22 for various machines.
EEMBC performance of five different embedded processors (Table 1.24) is plotted in Fig. …, and performance/watt is plotted in Fig. ….
Fig. 1.30 lists the programs and the changes across the SPEC89, SPEC92, SPEC95, and SPEC2000 benchmarks.