Chapter 1 Fundamentals of Computer Design


Computer Architecture (計算機結構), Chapter 1: Fundamentals of Computer Design. Ping-Liang Lai (賴秉樑)

Outline 1.1 Introduction 1.2 Classes of Computers 1.3 Defining Computer Architecture 1.4 Trends in Technology 1.5 Trends in Power in Integrated Circuits 1.6 Trends in Cost 1.7 Dependability 1.8 Measuring, Reporting, and Summarizing Performance 1.9 Quantitative Principles of Computer Design 1.10 Putting It All Together: Performance and Price-Performance

1.1 Introduction
Old Conventional Wisdom (CW): Power is free, transistors are expensive.
New CW ("power wall"): Power is expensive, transistors are free (we can put more on a chip than we can afford to turn on).
Old CW: Instruction-level parallelism can be increased sufficiently via compilers and innovation (out-of-order execution, speculation, VLIW, ...).
New CW ("ILP wall"): law of diminishing returns on more hardware for ILP.
Old CW: Multiplies are slow, memory access is fast.
New CW ("memory wall"): Memory is slow, multiplies are fast (about 200 clock cycles to DRAM, 4 clocks for a multiply).
Old CW: Uniprocessor performance 2X / 1.5 yrs.
New CW: Power wall + ILP wall + memory wall = brick wall; uniprocessor performance now 2X / 5(?) yrs.
Sea change in chip design: multiple "cores" (2X processors per chip every ~2 years); more, simpler processors are more power efficient.

Crossroads: Uniprocessor Performance
[Figure: growth in uniprocessor performance, from Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, October 2006]
VAX: 25%/year, 1978 to 1986
RISC + x86: 52%/year, 1986 to 2002
RISC + x86: ??%/year, 2002 to present

Outline 1.1 Introduction 1.2 Classes of Computers 1.3 Defining Computer Architecture 1.4 Trends in Technology 1.5 Trends in Power in Integrated Circuits 1.6 Trends in Cost 1.7 Dependability 1.8 Measuring, Reporting, and Summarizing Performance 1.9 Quantitative Principles of Computer Design 1.10 Putting It All Together: Performance and Price-Performance

1.2 Classes of Computers
Desktop computing: optimize price-performance.
Servers: provide larger-scale and more reliable file and computing services.
Embedded computers: real-time performance requirement.
Price of system: Desktop $500-$5,000; Server $5,000-$5,000,000; Embedded $10-$100,000 (including network routers at the high end).
Price of microprocessor module: Desktop $50-$500 per processor; Server $200-$10,000 per processor; Embedded $0.01-$100 per processor.
Critical system design issues: Desktop: price-performance, graphics performance; Server: throughput, availability, scalability; Embedded: price, power consumption, application-specific performance.

Outline 1.1 Introduction 1.2 Classes of Computers 1.3 Defining Computer Architecture 1.4 Trends in Technology 1.5 Trends in Power in Integrated Circuits 1.6 Trends in Cost 1.7 Dependability 1.8 Measuring, Reporting, and Summarizing Performance 1.9 Quantitative Principles of Computer Design 1.10 Putting It All Together: Performance and Price-Performance

Instruction Set Architecture: Critical Interface
The instruction set is the interface between software and hardware.
Properties of a good abstraction:
Lasts through many generations (portability);
Used in many different ways (generality);
Provides convenient functionality to higher levels;
Permits an efficient implementation at lower levels.

Instruction Set Architecture (ISA)
The ISA is the actual programmer-visible instruction set. Its major dimensions: class of ISA; memory addressing; addressing modes; types and sizes of operands; operations; control flow instructions; encoding of the ISA.

Organization, Hardware, and Architecture
Organization: the high-level aspects of a computer's design, such as the memory system, the memory interconnect, and the design of the internal processor or CPU (arithmetic, logic, branching, and data transfer). For example, the AMD Opteron 64 and the Intel P4 have the same ISA but different internal pipeline and cache organizations.
Hardware: the detailed logic design and the packaging technology. For example, the P4 and the Mobile P4 have the same ISA and organization but different clock frequencies and memory systems.
Architecture: covers all three aspects of computer design: instruction set architecture, organization, and hardware. The designer must meet functional requirements as well as price, power, performance, and availability goals.

Outline 1.1 Introduction 1.2 Classes of Computers 1.3 Defining Computer Architecture 1.4 Trends in Technology 1.5 Trends in Power in Integrated Circuits 1.6 Trends in Cost 1.7 Dependability 1.8 Measuring, Reporting, and Summarizing Performance 1.9 Quantitative Principles of Computer Design 1.10 Putting It All Together: Performance and Price-Performance

1.4 Trends in Technology
A successful new ISA may last decades, for example, the IBM mainframe.
Four critical implementation technologies:
Integrated circuit logic technology: transistor density increases by about 35% per year, quadrupling in somewhat over four years;
Semiconductor DRAM (Dynamic Random-Access Memory): capacity increases by about 40% per year, doubling roughly every two years;
Magnetic disk technology: a roller coaster of rates; disks are 50-100 times cheaper per bit than DRAM (Chapter 6);
Network technology: network performance depends on the performance of both switches and the transmission medium.

Performance Trends: Bandwidth over Latency
Bandwidth or throughput: the total amount of work done in a given time, such as megabytes per second for a disk transfer.
Latency or response time: the time between the start and the completion of an event, such as milliseconds for a disk access.

Scaling of Transistor Performance and Wires
Feature size: the minimum size of a transistor or a wire in either the x or y dimension; from 10 microns in 1971 to 0.09 microns (90 nm) in 2006.
The density of transistors increases quadratically with a linear decrease in feature size.
Transistor performance improves linearly with decreasing feature size.
Because transistor density improved so quickly, CPUs moved rapidly from 4-bit to 8-bit, to 16-bit, to 32-bit microprocessors.
However, the signal delay of a wire increases in proportion to the product of its resistance and capacitance.

Outline 1.1 Introduction 1.2 Classes of Computers 1.3 Defining Computer Architecture 1.4 Trends in Technology 1.5 Trends in Power in Integrated Circuits 1.6 Trends in Cost 1.7 Dependability 1.8 Measuring, Reporting, and Summarizing Performance 1.9 Quantitative Principles of Computer Design 1.10 Putting It All Together: Performance and Price-Performance

Power in IC (1/3)
Power also provides challenges as devices are scaled.
Dynamic power (watts, W) in a CMOS chip: the traditionally dominant energy consumption has been in switching transistors:
Power_dynamic = 1/2 × Capacitive load × Voltage^2 × Frequency switched
For mobile devices, battery life matters more than power, so energy is the proper metric, measured in joules:
Energy_dynamic = Capacitive load × Voltage^2
In modern VLSI, the total power is the sum Power_total = Power_dynamic + Power_static + Power_leakage.
Hence, lowering the voltage reduces Power_dynamic and Energy_dynamic greatly (over the past 20 years, supply voltages have dropped from 5 V to about 1 V).
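As an illustration, here is a small Python sketch of the dynamic power and energy relations above; the capacitance, voltage, and frequency values are made-up example numbers, not figures from the slides.

```python
# Sketch of the CMOS dynamic power/energy relations.
# The numeric values below are illustrative assumptions, not data from the slides.

def dynamic_power(cap_load_farads, voltage_volts, freq_hz):
    """Power_dynamic = 1/2 * C * V^2 * f (watts)."""
    return 0.5 * cap_load_farads * voltage_volts ** 2 * freq_hz

def dynamic_energy(cap_load_farads, voltage_volts):
    """Energy_dynamic = C * V^2 (joules per switching transition)."""
    return cap_load_farads * voltage_volts ** 2

# Example: a hypothetical 1 nF switched capacitance at 1.0 V and 2 GHz.
print(dynamic_power(1e-9, 1.0, 2e9))   # ~1 W
print(dynamic_energy(1e-9, 1.0))       # 1e-9 J per transition
```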

Power in IC (2/3)
Example 1 (p.22): Some microprocessors today are designed to have adjustable voltage, so a 15% reduction in voltage may result in a 15% reduction in frequency. What would be the impact on dynamic power?
Answer: Since the capacitance is unchanged, the answer is the ratio of the voltages and frequencies:
Power_new / Power_old = (0.85 × Voltage)^2 × (0.85 × Frequency switched) / (Voltage^2 × Frequency switched) = 0.85^3 ≈ 0.61,
thereby reducing power to about 60% of the original.
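A quick way to check this scaling result (a throwaway sketch, not anything from the text):

```python
# Dynamic power scales as V^2 * f; cut both V and f by 15%:
scale = 0.85 ** 2 * 0.85
print(f"new power is {scale:.3f} of the original")  # ~0.614, i.e. about 60%
```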

Power in IC (3/3)
As we move from one process generation to the next (60 nm or 45 nm ...):
Transistor switching and frequency increase;
Capacitance and voltage decrease;
However, overall power consumption and energy increase.
Static power is an important issue because leakage current flows even when a transistor is off:
Power_static = Current_static × Voltage
Thus, more transistors mean more static power, and shrinking feature size increases leakage power (why? You can find out in the VLSI area).

Outline 1.1 Introduction 1.2 Classes of Computers 1.3 Defining Computer Architecture 1.4 Trends in Technology 1.5 Trends in Power in Integrated Circuits 1.6 Trends in Cost 1.7 Dependability 1.8 Measuring, Reporting, and Summarizing Performance 1.9 Quantitative Principles of Computer Design 1.10 Putting It All Together: Performance and Price-Performance

Silicon Wafer and Dies
Exponential cost decrease while the technology stays basically the same: a wafer is tested and chopped into dies that are packaged.
[Figure: an AMD K8 wafer (晶圓) diced into dies (晶粒); partial dies along the edge are wasted. Source: http://www.amd.com]

Cost of an Integrated Circuit (IC)
The IC is a greater portion of the cost that varies between machines:
Cost of IC = (Cost of die + Cost of testing die + Cost of packaging and final test) / Final test yield
Cost of die = Cost of wafer / (Dies per wafer × Die yield)
Dies per wafer = π × (Wafer diameter / 2)^2 / Die area − π × Wafer diameter / sqrt(2 × Die area)   (the second term subtracts the partial dies along the edge)
Die yield = Wafer yield × (1 + Defects per unit area × Die area / α)^(−α)   (sensitive to die size)
Today's technology: α ≈ 4.0, defect density 0.4 ~ 0.8 per cm².

Examples of Cost of an IC
Example 1 (p.22): Find the number of dies per 300 mm (30 cm) wafer for a die that is 1.5 cm on a side. The total die area is 2.25 cm². Thus
Dies per wafer = π × (30 / 2)^2 / 2.25 − π × 30 / sqrt(2 × 2.25) = 706.9 / 2.25 − 94.2 / 2.12 ≈ 270.
Example 2 (p.24): Find the die yield for dies that are 1.5 cm on a side and 1.0 cm on a side, assuming a defect density of 0.4 per cm² and α is 4. The total die areas are 2.25 cm² and 1.00 cm². For the large die the yield is
(1 + 0.4 × 2.25 / 4)^(−4) ≈ 0.44.
For the small die, it is
(1 + 0.4 × 1.00 / 4)^(−4) ≈ 0.68.
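The same calculations can be sketched in a few lines of Python, as a check of the arithmetic above using the formulas from the previous slide (function names are my own):

```python
import math

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """pi*(d/2)^2 / A, minus the partial dies around the edge."""
    return (math.pi * (wafer_diameter_cm / 2) ** 2 / die_area_cm2
            - math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2))

def die_yield(defects_per_cm2, die_area_cm2, alpha=4.0, wafer_yield=1.0):
    """Wafer yield * (1 + defect density * die area / alpha)^(-alpha)."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)

print(round(dies_per_wafer(30, 2.25)))     # ~270 dies per 300 mm wafer
print(round(die_yield(0.4, 2.25), 2))      # ~0.44 for the 1.5 cm die
print(round(die_yield(0.4, 1.00), 2))      # ~0.68 for the 1.0 cm die
```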

Outline 1.1 Introduction 1.2 Classes of Computers 1.3 Defining Computer Architecture 1.4 Trends in Technology 1.5 Trends in Power in Integrated Circuits 1.6 Trends in Cost 1.7 Dependability 1.8 Measuring, Reporting, and Summarizing Performance 1.9 Quantitative Principles of Computer Design 1.10 Putting It All Together: Performance and Price-Performance

Response Time, Throughput, and Performance
Response time (反應時間): the time between the start and the completion of an event, also referred to as execution time. This is what the individual computer user is interested in.
Throughput (流通量): the total amount of work done in a given time. The administrator of a large data processing center may be interested in this.
In comparing design alternatives, the phrase "X is faster than Y" is used here to mean that the response time or execution time is lower on X than on Y. In particular, "X is n times faster than Y" or "the throughput of X is n times higher than Y" will mean
n = Execution time_Y / Execution time_X = Performance_X / Performance_Y

Measuring Performance
Execution time is the reciprocal of performance:
Execution time_X = 1 / Performance_X

Reliable Measure: User CPU Time
Response time may include disk accesses, memory accesses, input/output activities, CPU execution, and operating system overhead: everything.
To get an accurate measure of performance, we use CPU time instead of response time. CPU time is the time the CPU spends computing a program and does not include time spent waiting for I/O or running other programs.
CPU time can be further divided into user CPU time (the program) and system CPU time (the OS).
Typing the UNIX command time gives, for example: 90.7s 12.9s 2:39 65% (user CPU time, system CPU time, total elapsed time, and the percentage of elapsed time that is CPU time).
In our performance measures, we use user CPU time because it is independent of the OS and other factors.

Outline 1.1 Introduction 1.2 Classes of Computers 1.3 Defining Computer Architecture 1.4 Trends in Technology 1.5 Trends in Power in Integrated Circuits 1.6 Trends in Cost 1.7 Dependability 1.8 Measuring, Reporting, and Summarizing Performance 1.9 Quantitative Principles of Computer Design 1.10 Putting It All Together: Performance and Price-Performance

Four Useful Principles of Computer Architecture Design
Take advantage of parallelism: one of the most important methods for improving performance; system-level parallelism and individual-processor-level parallelism.
Principle of locality: a property of programs; temporal locality and spatial locality.
Focus on the common case: for power, resource allocation, and performance.
Amdahl's Law: "The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used."

Two Equations to Evaluate Alternatives
Amdahl's Law: the performance gain that can be obtained by improving some portion of a computer can be calculated using Amdahl's Law, which defines the speedup that can be gained by using a particular feature.
The CPU performance equation: essentially all computers are constructed using a clock running at a constant rate, so CPU time can be expressed in terms of the number of clock cycles.

Amdahl's Law (1/5)
Speedup is the ratio
Speedup = Performance using the enhancement (when possible) / Performance without the enhancement.
Alternatively,
Speedup = Execution time without the enhancement / Execution time using the enhancement (when possible).
Speedup depends on two factors:
Fraction_enhanced: the fraction of the execution time in the original machine that can be converted to take advantage of the enhancement (≤ 1).
Speedup_enhanced: the improvement gained by the enhanced execution mode (≥ 1).

Amdahl's Law (2/5)
Thus, Execution time_new = the time spent in the unenhanced portion of the machine + the time spent using the enhancement:
Execution time_new = Execution time_old × [(1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]
That is,
Speedup_overall = Execution time_old / Execution time_new = 1 / [(1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]

Amdahl's Law (3/5)
Example 3 (p.40): Suppose that we want to enhance the processor used for Web serving. The new processor is 10 times faster on computation in the Web serving application than the original processor. Assuming that the original processor is busy with computation 40% of the time and is waiting for I/O 60% of the time, what is the overall speedup gained by incorporating the enhancement?
Answer: Fraction_enhanced = 0.4, Speedup_enhanced = 10, so
Speedup_overall = 1 / (0.6 + 0.4 / 10) = 1 / 0.64 ≈ 1.56.
Amdahl's Law can serve as a guide to how much an enhancement will improve performance and how to distribute resources to improve cost-performance.
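A small Python sketch of Amdahl's Law applied to this example; the function name amdahl_speedup is my own, not something from the text:

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Web-serving example: 40% of time is computation, which becomes 10x faster.
print(round(amdahl_speedup(0.4, 10), 2))  # ~1.56
```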

Amdahl's Law (4/5)
Example 4 (p.40): A common transformation required in graphics processors is square root. Implementations of floating-point (FP) square root vary significantly in performance, especially among processors designed for graphics. Suppose FP square root (FPSQR) is responsible for 20% of the execution time of a critical graphics benchmark. One proposal is to enhance the FPSQR hardware and speed up this operation by a factor of 10. The other alternative is to make all FP instructions in the graphics processor run faster by a factor of 1.6; FP instructions are responsible for half of the execution time for the application. The design team believes that they can make all FP instructions run 1.6 times faster with the same effort as required for the fast square root. Compare these two design alternatives.
Answer: We can compare the two alternatives by comparing the speedups:
Speedup_FPSQR = 1 / ((1 − 0.2) + 0.2 / 10) = 1 / 0.82 ≈ 1.22
Speedup_FP = 1 / ((1 − 0.5) + 0.5 / 1.6) = 1 / 0.8125 ≈ 1.23
Improving the performance of the FP operations overall is slightly better because of the higher frequency.
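The same comparison in a self-contained Python sketch (variable names are my own):

```python
# Compare the two design alternatives with Amdahl's Law: 1 / ((1 - F) + F / S).
speedup_fpsqr = 1 / ((1 - 0.20) + 0.20 / 10)   # FPSQR 10x faster, 20% of time
speedup_fp    = 1 / ((1 - 0.50) + 0.50 / 1.6)  # all FP ops 1.6x faster, 50% of time
print(round(speedup_fpsqr, 2), round(speedup_fp, 2))  # ~1.22 vs ~1.23
```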

Amdahl's Law (5/5)
Example 5 (p.41): The calculation of the failure rate of the disk subsystem (summing the failure rates of its components) gave 23 failures per million hours, of which the component to be improved contributed 5 per million hours. Therefore, the fraction of the failure rate that could be improved is 5 out of 23, or 0.22.
Answer: The reliability improvement would be
Improvement = 1 / ((1 − 0.22) + 0.22 / 4150) = 1 / 0.78005 ≈ 1.28.
Despite an impressive 4150X improvement in the reliability of one module, from the system's perspective the change has a measurable but small benefit.
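As a quick check of this number (a throwaway sketch, not part of the text):

```python
# Reliability improvement via Amdahl's Law: 22% of the failure rate improves 4150x.
improvement = 1 / ((1 - 0.22) + 0.22 / 4150)
print(round(improvement, 2))  # ~1.28: a large module improvement, small system benefit
```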

CPU Performance (1/5)
Essentially all computers are constructed using a clock (also called ticks, clock ticks, clock periods, clocks, cycles, or clock cycles) running at a constant rate.
Clock rate: today expressed in GHz.
Clock cycle time = 1 / clock rate. For example, a 1 GHz clock rate gives a 1 ns cycle time.
Thus, the CPU time for a program can be expressed two ways:
CPU time = CPU clock cycles for a program × Clock cycle time
or
CPU time = CPU clock cycles for a program / Clock rate

CPU Performance (2/5)
We can also count the number of instructions executed: the instruction path length, or instruction count (IC).
If we know the number of clock cycles and the IC, we can calculate the average number of clock cycles per instruction (CPI):
CPI = CPU clock cycles for a program / Instruction count
Thus, CPU clock cycles can be written as IC × CPI, which allows us to use CPI in the execution time formula:
CPU time = Instruction count × CPI × Clock cycle time
This figure provides insight into different styles of instruction sets and implementations.
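A minimal Python sketch of this CPU performance equation, with made-up example numbers rather than figures from the slides:

```python
# CPU time = instruction count * CPI * clock cycle time.
# Illustrative numbers only: 1 billion instructions, CPI of 2.0, 1 GHz clock.
instruction_count = 1_000_000_000
cpi = 2.0
clock_cycle_time_s = 1e-9  # 1 ns cycle time

cpu_time_s = instruction_count * cpi * clock_cycle_time_s
print(cpu_time_s)  # 2.0 seconds
```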

CPU Performance (3/5)
How the pieces of CPU time fit together:
CPU time = (Instructions / Program) × (Clock cycles / Instruction) × (Seconds / Clock cycle)
An α% improvement in any one of the three pieces leads to an α% improvement in CPU time.
Unfortunately, it is difficult to change one parameter in complete isolation from the others, because the underlying technologies are interdependent:
Clock cycle time: hardware technology and organization;
CPI: organization and instruction set architecture;
Instruction count: instruction set architecture and compiler technology.
Processor performance therefore depends on three characteristics: instruction count, clock cycles per instruction, and clock cycle time (or clock rate).
Computer architecture focuses on the CPI and IC parameters.

CPU Performance (4/5)
The total number of processor clock cycles can be calculated as
CPU clock cycles = Σ_i (IC_i × CPI_i)
so CPU time can be expressed again as
CPU time = Σ_i (IC_i × CPI_i) × Clock cycle time
and the overall CPI as
CPI = Σ_i (IC_i × CPI_i) / Instruction count = Σ_i (IC_i / Instruction count) × CPI_i
IC_i: the number of times instruction i is executed in a program.
CPI_i: the average number of clocks per instruction for instruction i.
This form is useful in designing the processor. Hint: CPI_i should be measured, because of pipeline effects, cache misses, and any other memory system inefficiencies.
IC_i / Instruction count represents the fraction of occurrences of instruction i in the program.
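A short Python sketch of this weighted-CPI summation; the instruction classes and their fractions below are illustrative assumptions, not measurements from the slides:

```python
# Overall CPI = sum over instruction classes of (IC_i / IC) * CPI_i.
# Hypothetical instruction mix: (class, fraction of instructions, CPI of class).
mix = [
    ("ALU",    0.50, 1.0),
    ("load",   0.20, 2.0),
    ("store",  0.10, 2.0),
    ("branch", 0.20, 1.5),
]

overall_cpi = sum(fraction * cpi for _, fraction, cpi in mix)
print(overall_cpi)  # 0.5 + 0.4 + 0.2 + 0.3 = 1.4
```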

CPU Performance (5/5)
Example 6 (p.43): Suppose we have made the following measurements: frequency of FP operations = 25%, average CPI of FP operations = 4.0, average CPI of other instructions = 1.33, frequency of FPSQR = 2%, CPI of FPSQR = 20. Assume that the two design alternatives are to decrease the CPI of FPSQR to 2 or to decrease the average CPI of all FP operations to 2.5. Compare these two design alternatives using the processor performance equation.
Answer: First, observe that only the CPI changes; the clock rate and instruction count remain identical. We start by finding the original CPI with neither enhancement:
CPI_original = 0.25 × 4.0 + 0.75 × 1.33 ≈ 2.0
We can compute the CPI for the enhanced FPSQR by subtracting the cycles saved from the original CPI:
CPI_with new FPSQR = 2.0 − 2% × (20 − 2) = 2.0 − 0.36 = 1.64
The CPI of the enhanced FP alternative is
CPI_new FP = 0.75 × 1.33 + 0.25 × 2.5 ≈ 1.62
Since the overall FP enhancement gives the slightly lower CPI, it is marginally better; its speedup over the original is 2.0 / 1.62 ≈ 1.23.
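A quick Python check of these CPI numbers (a sketch only; the rounding follows the arithmetic above):

```python
# Original CPI: 25% FP ops at CPI 4.0, 75% other instructions at CPI 1.33.
cpi_original = 0.25 * 4.0 + 0.75 * 1.33                 # ~2.0

# Alternative 1: FPSQR (2% of instructions) drops from CPI 20 to CPI 2.
cpi_new_fpsqr = cpi_original - 0.02 * (20 - 2)          # ~1.64

# Alternative 2: all FP operations drop to CPI 2.5.
cpi_new_fp = 0.75 * 1.33 + 0.25 * 2.5                   # ~1.62

print(round(cpi_original, 2), round(cpi_new_fpsqr, 2), round(cpi_new_fp, 2))
print(round(cpi_original / cpi_new_fp, 2))              # speedup ~1.23
```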

Amdahl's Law vs. the CPU Performance Equation
The CPU performance equation is often more useful than Amdahl's Law because its constituent parts can be measured directly, whereas the fraction of execution time for which a set of instructions is responsible is hard to measure.
For an existing processor, measuring execution time and clock speed is easy; the challenge lies in discovering the instruction count or the CPI.
Most new processors include counters for both instructions executed and clock cycles.