1
Chapter 1 Fundamentals of Computer Design
EEF011 Computer Architecture (計算機結構)
Chapter 1 Fundamentals of Computer Design
吳俊興, Department of Computer Science and Information Engineering, National University of Kaohsiung
September 2004
2
Chapter 1. Fundamentals of Computer Design
1.1 Introduction
1.2 The Changing Face of Computing and the Task of the Computer Designer
1.3 Technology Trends
1.4 Cost, Price, and Their Trends
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together
1.8 Another View
1.9 Fallacies and Pitfalls
1.10 Concluding Remarks
3
Computer Architecture – 1.1 Introduction
- Rapid rate of improvement in the roughly 55 years of the field, driven both by advances in the technology used to build computers and by innovation in computer design
- Beginning in about 1970, computer designers became largely dependent upon IC technology
- Emergence of the microprocessor: late 1970s
- C: virtual elimination of assembly language programming (portability)
- UNIX: standardized, vendor-independent OS
- Together, these lowered the cost and risk of bringing out a new architecture
4
1.1 Introduction (cont.)
- Development of RISC (Reduced Instruction Set Computer) architectures: early 1980s
- Focused the attention of designers on:
  - The exploitation of instruction-level parallelism, initially through pipelining and later through multiple instruction issue
  - The use of caches, initially in simple forms and later using more sophisticated organizations and optimizations
5
Significantly enhanced the capability available to users
- Figure 1.1: 20 years of sustained growth in performance at an annual rate of over 50% (measured with two different versions of SPEC, i.e., SPEC92 and SPEC95)
- Effects:
  - Significantly enhanced the capability available to users
  - Dominance of microprocessor-based computers across the entire range of computer design
  - Minicomputers and mainframes replaced; supercomputers built with collections of microprocessors
6
1.2 Changing Face of Computing
- 1960s: dominated by mainframes (business data processing, large-scale scientific computing)
- 1970s: the birth of minicomputers (scientific laboratories, time-sharing, multiple-user environments)
- 1980s: the rise of the desktop computer (personal computer and workstation)
- 1990s: the emergence of the Internet and the World Wide Web, handheld computing devices, and high-performance digital consumer electronics
7
Segmented Markets
- Desktop computing: optimize price-performance
  - The largest market in dollar terms
  - Web-centric, interactive applications
- Servers: web-based services, replacing the traditional mainframe
  - Availability, scalability, throughput
- Embedded computers
  - Features: the presence of the computer is not immediately obvious; the application can usually be carefully tuned for the processor and system; the widest range of processing power and cost
  - Price is a key factor in the design of these computers; real-time performance requirements
  - Key characteristics: the need to minimize memory and power; the use of processor cores together with application-specific circuitry
9
Task of Computer Designer
- Tasks: determine which attributes are important for a new machine, then design a machine to maximize performance while staying within cost and power constraints
- Efforts: instruction set design, functional organization, logic design, and implementation (IC design, packaging, power, and cooling)
- Design a computer to meet functional requirements as well as price, power, and performance goals (see Figure 1.4)
10
Task of a Computer Designer
The design cycle:
- Evaluate existing systems for bottlenecks
- Simulate new designs and organizations
- Implement the next-generation system
Inputs to the cycle include technology trends, benchmarks, workloads, and implementation complexity.
Effect of trends: the designer often needs to design for the next technology, because during a single product cycle (typically 4-5 years) key technologies, such as memory, change sufficiently that the designer must plan for these changes.
12
1.3 Technology Trends
- The designer must be especially aware of rapidly occurring changes in implementation technology
- Integrated circuit logic technology
  - Transistor density increases by about 35% per year
  - Growth in die size is less predictable and slower
  - Combined effect: growth rate in transistor count of about 55% per year
- Semiconductor DRAM
  - Density increases by between 40% and 60% per year
  - Cycle time has decreased by only about one-third in 10 years
  - Bandwidth per chip increases about twice as fast as latency decreases
- Magnetic disk technology
- Network technology
13
Moore’s Law Gordon Moore (Founder of Intel) observed in 1965 that the number of transistors that could be crammed on a chip doubles every year.
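As a rough editorial sketch (not from the original slides), the doubling time implied by a compound annual growth rate follows directly from the rates quoted on the Technology Trends slide above:

import math

def doubling_time_years(annual_growth_rate):
    # Years for a quantity to double at a compound annual growth rate
    return math.log(2) / math.log(1 + annual_growth_rate)

print(doubling_time_years(0.55))  # transistor count at 55%/year: ~1.6 years
print(doubling_time_years(0.35))  # density alone at 35%/year:   ~2.3 years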
14
Technology Trends
In terms of speed, CPUs have improved dramatically, while memory and disk have improved only a little. This has led to dramatic changes in architecture, operating systems, and programming practices.

            Capacity        Speed (latency)
Logic       2x in 3 years   2x in 3 years
DRAM        4x in 3 years   2x in 10 years
Disk        4x in 3 years   2x in 10 years
15
1.4 Cost, Price, and Their Trends
Price: what you sell a finished good for
- 1999: more than half the PCs sold were priced at less than $1000
Cost: the amount spent to produce it, including overhead
16
Cost and Its Trend
- Time
  - Learning curve: manufacturing cost decreases over time
  - Yield: the percentage of manufactured devices that survive the testing procedure
  - As the technology matures over time, the yield improves and things get cheaper
- Volume
  - Increasing volume decreases cost: it shortens the time needed to get down the learning curve and increases purchasing and manufacturing efficiency
- Commodities
  - Products sold by multiple vendors in large volumes that are essentially identical (low-end businesses such as DRAMs, disks, monitors)
  - Competition among suppliers reduces cost
17
Prices of six generations of DRAMs
Between the start of a project and the shipping of a product, say, two years, the cost of a new DRAM drops by a factor of between 5 and 10 in constant dollars.
18
Price of an Intel Pentium III at a Given Frequency
It decreases over time as yield enhancements decrease the cost of a good die and competition forces price reductions.
19
Cost and Its Trend (cont.)
- Time: the learning curve lowers manufacturing cost, and yield (the percentage of manufactured devices that survive testing) improves as the technology matures, so things get cheaper
- Volume: increasing volume decreases cost through the learning curve and through purchasing and manufacturing efficiency; 熟能生巧 (practice makes perfect)
- Commodity components: essentially identical products sold by multiple vendors in large volumes (DRAMs, disks, monitors); competition reduces cost
20
Wafer and Dies
- Exponential cost decrease with the technology basically the same: a wafer is tested and chopped into dies that are packaged
- Figure 1.7: Intel Pentium 4 microprocessor die
- Figure 1.8: a wafer containing 564 MIPS64 R20K processors in 0.18-micron technology
21
Cost of an Integrated Circuit (IC)
Cost of IC = (Cost of die + Cost of testing die + Cost of packaging and final test) / Final test yield

Cost of die = Cost of wafer / (Dies per wafer x Die yield)

The die cost is the portion of the cost that varies most between machines and is very sensitive to die size:

Dies per wafer = [pi x (Wafer diameter / 2)^2] / Die area - [pi x Wafer diameter] / sqrt(2 x Die area)
(the second term subtracts the partial dies along the edge of the wafer)

Die yield = Wafer yield x [1 + (Defects per unit area x Die area) / alpha]^(-alpha)

Today's technology: alpha = 4.0, defect density between 0.4 and 0.8 per cm^2
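A small Python sketch (an editorial illustration, not from the text) chaining the formulas above; the wafer cost, diameter, and defect density below are made-up example values:

import math

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    # Usable wafer area over die area, minus dies lost along the edge
    return (math.pi * (wafer_diameter_cm / 2) ** 2) / die_area_cm2 \
         - (math.pi * wafer_diameter_cm) / math.sqrt(2 * die_area_cm2)

def die_yield(wafer_yield, defects_per_cm2, die_area_cm2, alpha=4.0):
    # Empirical yield model; alpha ~ 4.0 for today's processes
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)

def die_cost(wafer_cost, wafer_diameter_cm, die_area_cm2,
             wafer_yield=1.0, defects_per_cm2=0.6):
    good_dies = dies_per_wafer(wafer_diameter_cm, die_area_cm2) \
              * die_yield(wafer_yield, defects_per_cm2, die_area_cm2)
    return wafer_cost / good_dies

# Illustrative numbers only: a $5000, 30-cm wafer and a 1 cm^2 die
print(round(die_cost(5000, 30, 1.0), 2))  # ~ $13.66 per good die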
22
Cost of an IC (cont.)
23
Cost Example in a $1000 PC in 2001
24
1.5 Measuring and Reporting Performance
Designing high-performance computers is one of the major goals of any computer architect. As a result, assessing the performance of computer hardware is at the heart of computer design and greatly affects the demand for and market value of a computer. However, measuring the performance of a computer system is not a straightforward task:
- Metrics: how do we describe the performance of a computer numerically?
- What tools do we use to find those metrics?
- How do we summarize the performance?
25
What is the computer user interested in?
- Reduce the time to run a certain task
  - Execution time (response time): the time between the start and the completion of an event
- Increase the number of tasks done per week, day, hour, second, ns, ...
  - Throughput: the total amount of work done in a given time
26
Example: Do the following changes to a computer system increase throughput, reduce response time, or both?
1. Replacing the processor in a computer with a faster version
2. Adding additional processors to a system that uses multiple processors for separate tasks, for example, handling an airline reservation system
Answer:
1. Both response time and throughput are improved
2. Only throughput increases
27
Execution Time and Performance Comparison
In this subject, we will be primarily interested in execution time as a measure of performance. Performance and execution time on a computer X are reciprocals of one another:

Performance_X = 1 / ExecutionTime_X

To compare design alternatives, we use the following equation:

n = ExecutionTime_Y / ExecutionTime_X = Performance_X / Performance_Y

"X is n times faster than Y" or "the throughput of X is n times higher than Y" means that the execution time is n times less on X than on Y. To maximize the performance of an application, we need to minimize its execution time.
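A minimal sketch of the relation above (editorial, not from the slides):

def performance(execution_time):
    return 1.0 / execution_time

def times_faster(time_y, time_x):
    # "X is n times faster than Y" when n = ExecutionTime_Y / ExecutionTime_X
    return time_y / time_x

print(times_faster(15.0, 10.0))  # X (10 s) is 1.5 times faster than Y (15 s)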
28
Measure Performance – user CPU time
- Response time may include disk accesses, memory accesses, input/output activities, CPU events, and operating system overhead: everything
- To get a more precise measure of processor performance, we use CPU time instead of response time
- CPU time is the time the CPU spends computing a program and does not include time spent waiting for I/O or running other programs
- CPU time can be further divided into user CPU time (the program) and system CPU time (the OS)
- The UNIX time command reports these; the book's sample output is: 90.7u 12.9s 2:39 65% (user CPU time, system CPU time, total elapsed time, and the percentage of elapsed time that was CPU time)
- In our performance measures, we use user CPU time because of its independence from the OS and other factors
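A hedged illustration of the same distinction in Python: time.process_time() counts CPU time only, while time.perf_counter() counts elapsed (response) time, so a sleeping program accumulates elapsed time but almost no CPU time.

import time

cpu_start = time.process_time()    # CPU time (user + system) of this process
wall_start = time.perf_counter()   # wall-clock (response) time

sum(i * i for i in range(10**6))   # CPU-bound work: both clocks advance
time.sleep(1.0)                    # waiting: only the wall clock advances

print("CPU time: %.2f s" % (time.process_time() - cpu_start))
print("Elapsed : %.2f s" % (time.perf_counter() - wall_start))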
29
Programs for Measuring Performance
- Real applications: text processing software (Word), compilers (C), and other applications like Photoshop; they have inputs, outputs, and options when the user wants to use them
  - One major downside: real applications often encounter portability problems arising from dependences on the OS or compiler
- Modified (or scripted) applications: modified to enhance portability or to focus on one particular aspect of system performance; scripts are used to simulate interaction behavior
- Kernels: small, key pieces of real programs; typically used to evaluate individual features of the machine
- Toy benchmarks: typically between 10 and 100 lines of code, producing a known result
- Synthetic benchmarks: artificially created code that matches an average execution profile
30
Benchmark Suites
- A collection of programs (a workload) that tries to explore and capture the strengths and weaknesses of a computer system (real programs, kernels)
- A key advantage of such suites is that the weakness of any one benchmark is lessened by the presence of the other benchmarks
- Good vs. bad benchmarks: improving the product for real programs vs. improving the product for benchmarks to get more sales
- If the benchmarks are inadequate, then sales wins!
31
SPEC: Standard Performance Evaluation Corporation

SPEC Benchmarks
- SPEC: Standard Performance Evaluation Corporation; the most successful and widely adopted benchmarking effort
- First generation, 1989: 10 programs yielding a single number ("SPECmarks")
- Second generation, 1992: SPECint92 (6 integer programs) and SPECfp92 (14 floating-point programs); unlimited compiler flags
- Third generation, 1995: a new set of programs, SPECint95 (8 integer programs) and SPECfp95 (10 floating-point programs); a single flag setting for all programs (SPECint_base95, SPECfp_base95); "benchmarks useful for 3 years"
- SPEC CPU2000: CINT2000 (12 integer programs) and CFP2000 (14 floating-point programs)
32
CINT2000 (Integer Component of SPEC CPU2000)

Program       Language   What It Is
164.gzip      C          Compression
175.vpr       C          FPGA Circuit Placement and Routing
176.gcc       C          C Programming Language Compiler
181.mcf       C          Combinatorial Optimization
186.crafty    C          Game Playing: Chess
197.parser    C          Word Processing
252.eon       C++        Computer Visualization
253.perlbmk   C          PERL Programming Language
254.gap       C          Group Theory, Interpreter
255.vortex    C          Object-oriented Database
256.bzip2     C          Compression
300.twolf     C          Place and Route Simulator
33
CFP2000 (Floating-Point Component of SPEC CPU2000)

Program        Language     What It Is
168.wupwise    Fortran 77   Physics / Quantum Chromodynamics
171.swim       Fortran 77   Shallow Water Modeling
172.mgrid      Fortran 77   Multi-grid Solver: 3D Potential Field
173.applu      Fortran 77   Parabolic / Elliptic Differential Equations
177.mesa       C            3-D Graphics Library
178.galgel     Fortran 90   Computational Fluid Dynamics
179.art        C            Image Recognition / Neural Networks
183.equake     C            Seismic Wave Propagation Simulation
187.facerec    Fortran 90   Image Processing: Face Recognition
188.ammp       C            Computational Chemistry
189.lucas      Fortran 90   Number Theory / Primality Testing
191.fma3d      Fortran 90   Finite-element Crash Simulation
200.sixtrack   Fortran 77   High Energy Physics Accelerator Design
301.apsi       Fortran 77   Meteorology: Pollutant Distribution
34
More Benchmarks
- TPC: Transaction Processing Performance Council
  - Measures the ability of a system to handle transactions, which consist of database accesses and updates
  - Many variants depending on transaction complexity: TPC-A (simple bank-teller transaction style), TPC-C (complex database query environment)
- EDN Embedded Microprocessor Benchmark Consortium (EEMBC, pronounced "embassy")
  - 34 kernels in 5 classes: 16 automotive/industrial, 5 consumer, 3 networking, 4 office automation, 6 telecommunications
35
How to Summarize Performance
- Management would like to have one number
- Technical people want more: evidence of reproducibility (enough information so that you or someone else can repeat the experiment) and consistency when doing the measurements multiple times
- How would you report these results?

                     Computer A   Computer B   Computer C
Program P1 (secs)        1            10           20
Program P2 (secs)       1000         100           20
Total time (secs)       1001         110           40
36
Comparing and Summarizing Performance
- Comparing the performance by looking at individual programs is not fair
- Total execution time: a consistent summary measure
- The arithmetic mean provides a simple average:

  AM = (1/n) x Sum_{i=1..n} Time_i

  where Time_i is the execution time for program i in the workload
- It doesn't account for weight: all programs are treated as equal
37
Weighted Variants
- What is the proper mixture of programs for the workload? Attach a weighting factor to each program to indicate its relative frequency in the workload (% of use)
- Weighted arithmetic mean:

  WAM = Sum_{i=1..n} Weight_i x Time_i

  where Weight_i is the frequency of program i in the workload
- May be better, but beware of a dominant program time
38
Example: Figure 1.16

Weighted arithmetic mean: WAM = Sum_i Weight_i x Time_i

                           A       B       C      W(1)   W(2)    W(3)
Program P1 (secs)          1       10      20     0.50   0.909   0.999
Program P2 (secs)          1000    100     20     0.50   0.091   0.001
Arithmetic mean            500.5   55      20
Weighted arith. mean (1)   500.5   55      20
Weighted arith. mean (2)   91.82   18.18   20
Weighted arith. mean (3)   2.00    10.09   20
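A short Python sketch (editorial) that reproduces the arithmetic and weighted means in the table above:

times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}
weights = {1: (0.5, 0.5), 2: (10/11, 1/11), 3: (1000/1001, 1/1001)}

def arithmetic_mean(ts):
    return sum(ts) / len(ts)

def weighted_mean(ws, ts):
    return sum(w * t for w, t in zip(ws, ts))

for machine, ts in times.items():
    wams = [round(weighted_mean(weights[k], ts), 2) for k in (1, 2, 3)]
    print(machine, arithmetic_mean(ts), wams)
# A 500.5 [500.5, 91.82, 2.0]
# B 55.0  [55.0, 18.18, 10.09]
# C 20.0  [20.0, 20.0, 20.0]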
39
Normalized Time Metrics
- Measure the performance by normalizing it to a reference machine:

  ExecutionTimeRatio_i = ExTime_i on the measured machine / ExTime_i on the reference machine

- Geometric mean of the ratios:

  GM = (Product_{i=1..n} ExecutionTimeRatio_i)^(1/n)

- The geometric mean is consistent no matter which machine is the reference
- The arithmetic mean should not be used to average normalized execution times
- However, the geometric mean still doesn't form an accurate prediction model (it doesn't predict execution time), and it encourages designers to improve the easiest benchmarks rather than the slowest ones (due to no weighting)
40
Example:

                       A       B       C      W(1)
Program P1 (secs)      1       10      20     100/101
Program P2 (secs)      1000    100     20     1/101
P1 normalized to A     1       10      20
P2 normalized to A     1       0.1     0.02
Geometric mean         1       1       0.63
Arithmetic mean        500.5   55      20
Weighted AM (1)        10.89   10.89   20

Machines A and B have the same performance according to the geometric mean measure, yet this would be true only for a workload in which P1 runs 100 times for every run of P2, as the weighted arithmetic mean with weights W(1) shows.
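A companion Python sketch (editorial) for the normalized geometric means in this table:

import math

times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}
reference = times["A"]

def geometric_mean(xs):
    return math.prod(xs) ** (1 / len(xs))

for machine, ts in times.items():
    ratios = [t / r for t, r in zip(ts, reference)]  # normalize to machine A
    print(machine, round(geometric_mean(ratios), 2))
# A 1.0, B 1.0, C 0.63: A and B look identical under the geometric mean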
41
1.6 Quantitative Principles of Computer Design
Now that we know how to define, measure, and summarize performance, we can explore some of the principles and guidelines for the design and analysis of computers.
Make the common case fast:
- In making a design trade-off, favor the frequent case over the infrequent case
- Improving the frequent event, rather than the rare event, will obviously help performance
- The frequent case is often simpler and can be done faster than the infrequent case
- We have to decide what the frequent case is and how much performance can be improved by making that case faster
42
Two equations to evaluate design alternatives
- Amdahl's Law
  - States that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used
  - Defines the speedup that can be gained by using a particular feature
  - The performance gain obtained by improving some portion of a computer can be calculated using Amdahl's Law
- The CPU Performance Equation
  - Essentially all computers are constructed using a clock running at a constant rate
  - CPU time can then be expressed in terms of the number of clock cycles
43
Amdahl's Law
Speedup due to enhancement E: suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.
- Fraction_enhanced: the fraction of the execution time in the original machine that can be converted to take advantage of the enhancement
- Speedup_enhanced: the improvement gained by the enhanced execution mode
44
ExTime_new = ExTime_old x [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]

Speedup_overall = ExTime_old / ExTime_new = 1 / [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]
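A direct transcription of the formula into Python (editorial sketch):

def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    # Overall speedup when only a fraction of execution time is enhanced
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)

print(round(amdahl_speedup(0.1, 2.0), 3))  # 1.053, the FP example below
print(round(amdahl_speedup(0.5, 1e9), 3))  # ~2.0: even an "infinite" speedup
                                           # of half the time caps at 2x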
45
Amdahl's Law
Example: Floating-point (FP) instructions are improved to run faster by a factor of 2, but only 10% of execution time is due to FP instructions. What is the overall speedup gained by this improvement?
Answer:
ExTime_new = ExTime_old x (0.9 + 0.1/2) = 0.95 x ExTime_old
Speedup_overall = 1 / 0.95 = 1.053
- Amdahl's Law can serve as a guide to how much an enhancement will improve performance and how to distribute resources to improve cost-performance
- It is particularly useful for comparing the overall system and CPU performance of two design alternatives
- More examples: pages 41 and 42
46
CPU Performance
- All computers are constructed using a clock, which operates the circuits and is typically characterized by two basic metrics:
  - Clock rate: today in MHz and GHz
  - Clock cycle time: clock cycle time = 1 / clock rate (e.g., a 1 GHz clock rate corresponds to a 1 ns cycle time)
- Thus the CPU time for a program is given by:

  CPU time = CPU clock cycles for the program x Clock cycle time

  or, equivalently,

  CPU time = CPU clock cycles for the program / Clock rate
47
CPU Performance
- More typically, we count the number of instructions executed, known as the instruction count (IC)
- CPI: the average number of clock cycles per instruction

  CPI = CPU clock cycles for the program / IC

  CPI depends on the instruction set of the computer and its compiler
- Hence, an alternative method to get the CPU time:

  CPU time = IC x CPI x Clock cycle time

- CPU performance is equally dependent upon three characteristics: clock cycle time, CPI, and IC. It is difficult to improve one in isolation, and making one better often makes another worse, because the basic technologies involved in changing each characteristic are interdependent:
  - Clock cycle time: hardware technology and organization
  - CPI: organization and instruction set architecture
  - IC: instruction set architecture and compiler technology
48
CPU Performance
Example: Suppose we have two implementations of the same instruction set architecture. Computer A has a clock cycle time of 10 ns and a CPI of 2.0 for some program, and computer B has a clock cycle time of 20 ns and a CPI of 1.2 for the same program. Which machine is faster for this program?
Answer: Assume the program requires N instructions to be executed:
CPU clock cycles_A = N x 2.0
CPU clock cycles_B = N x 1.2
CPU time_A = N x 2.0 x 10 ns = 20 x N ns
CPU time_B = N x 1.2 x 20 ns = 24 x N ns
So computer A is faster than computer B for this program.
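The same comparison as a Python sketch (editorial):

def cpu_time_ns(instruction_count, cpi, cycle_time_ns):
    # CPU time = IC x CPI x clock cycle time
    return instruction_count * cpi * cycle_time_ns

N = 1_000_000  # any instruction count; it cancels out in the comparison
print(cpu_time_ns(N, 2.0, 10))  # computer A: 20 ns per instruction on average
print(cpu_time_ns(N, 1.2, 20))  # computer B: 24 ns per instruction on average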
49
CPU Performance
- It is often easier to work with the overall performance on a per-instruction-class basis
- The total clock cycles and the overall CPI can be expressed as:

  CPU clock cycles = Sum_{i=1..n} IC_i x CPI_i

  CPI = (Sum_{i=1..n} IC_i x CPI_i) / Instruction count

  where IC_i is the number of times instruction class i is executed and CPI_i is the CPI for instruction class i
50
Cycles Per Instruction
Example: Suppose we have a machine where we can count the frequency with which instructions are executed. We also know how many cycles each instruction type takes.

Base machine (Reg / Reg):
Instruction   Freq   CPI   (% Time)
ALU           50%    1     (33%)
Load          20%    2     (27%)
Store         10%    2     (13%)
Branch        20%    2     (27%)
Total CPI: 1.5

How do we get the total CPI? How do we get the % time? (See the sketch below.)
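One way to answer both questions, as a Python sketch (editorial): the total CPI is the frequency-weighted sum of the per-class CPIs, and each class's share of time is its cycle contribution over the total.

mix = {  # instruction class -> (frequency, CPI)
    "ALU":    (0.50, 1),
    "Load":   (0.20, 2),
    "Store":  (0.10, 2),
    "Branch": (0.20, 2),
}

total_cpi = sum(f * c for f, c in mix.values())
print("Total CPI:", total_cpi)  # 1.5
for name, (f, c) in mix.items():
    print(name, "%.0f%% of time" % (100 * f * c / total_cpi))
# ALU 33%, Load 27%, Store 13%, Branch 27%, matching the table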
51
CPU Performance
Example: Suppose we have made the following measurements:
- Frequency of FP operations (other than FPSQR) = 25%
- Average CPI of FP operations = 4.0
- Average CPI of other instructions = 1.33
- Frequency of FPSQR = 2%
- CPI of FPSQR = 20
Compare two designs: (1) decrease the CPI of FPSQR to 2; (2) decrease the average CPI of all FP operations to 2.5.
Answer: First, find the original CPI:
CPI_original = (4.0 x 25%) + (1.33 x 75%) = 2.0
The CPI with design 1: CPI_1 = CPI_original - 2% x (20 - 2) = 2.0 - 0.36 = 1.64
The CPI with design 2: CPI_2 = (2.5 x 25%) + (1.33 x 75%) = 1.62
So design 2 is better.
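The same comparison in Python (editorial sketch, using the textbook numbers above):

def overall_cpi(classes):
    # classes: iterable of (frequency, CPI) pairs covering all instructions
    return sum(f * c for f, c in classes)

original = overall_cpi([(0.25, 4.0), (0.75, 1.33)])
design1 = original - 0.02 * (20 - 2)                 # FPSQR CPI: 20 -> 2
design2 = overall_cpi([(0.25, 2.5), (0.75, 1.33)])   # all FP CPI -> 2.5
print("%.2f %.2f %.2f" % (original, design1, design2))  # 2.00 1.64 1.62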
52
Principle of Locality
- Other important fundamental observations come from the properties of programs
- Principle of locality: programs tend to reuse data and instructions they have used recently. There are two different types of locality (see Chapter 5):
  - Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon (loops, reuse, etc.)
  - Spatial locality (locality in space/location): if an item is referenced, items whose addresses are close to it tend to be referenced soon (straight-line code, array accesses, etc.)
- We can predict with reasonable accuracy what instructions and data a program will use in the near future based on its accesses in the recent past
- Rule of thumb: a program spends 90% of its execution time in only 10% of the code
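A rough, machine-dependent demonstration of spatial locality (editorial sketch; it assumes NumPy is installed, and the exact ratio depends on the cache hierarchy): summing a large row-major matrix along rows walks memory contiguously, while summing along columns strides across it and misses the cache more often.

import time
import numpy as np

a = np.random.rand(4000, 4000)  # stored row-major (C order)

def timed_sum(by_rows):
    start = time.perf_counter()
    total = 0.0
    if by_rows:
        for i in range(a.shape[0]):   # contiguous accesses: good locality
            total += a[i, :].sum()
    else:
        for j in range(a.shape[1]):   # strided accesses: poor locality
            total += a[:, j].sum()
    return time.perf_counter() - start

print("by rows   : %.3f s" % timed_sum(True))
print("by columns: %.3f s" % timed_sum(False))  # typically noticeably slower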
53
Some “Misleading” Performance Measures
There are certain computer performance measures that are popular with computer manufacturers and sellers but that may be misleading.
MIPS (Million Instructions Per Second):

  MIPS = Instruction count / (Execution time x 10^6) = Clock rate / (CPI x 10^6)

- MIPS depends on the instruction set, which makes it difficult to compare the MIPS of computers with different instruction sets
- MIPS varies between programs on the same computer: different programs use different instruction mixes
- Most importantly, MIPS can vary inversely to performance (see the sketch below)
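A hypothetical illustration (editorial; the instruction counts and CPIs are made up) of MIPS varying inversely to performance: an optimizing compiler removes mostly simple one-cycle instructions, so the optimized program has a higher average CPI and a lower MIPS rating, yet it finishes sooner.

clock_hz = 100e6  # a 100 MHz machine

def mips(instruction_count, exec_time_s):
    return instruction_count / (exec_time_s * 1e6)

for label, ic, cpi in [("unoptimized", 10_000_000, 1.0),
                       ("optimized  ",  5_000_000, 1.5)]:
    t = ic * cpi / clock_hz
    print("%s time %.3f s  MIPS %.0f" % (label, t, mips(ic, t)))
# unoptimized: 0.100 s, 100 MIPS; optimized: 0.075 s, 67 MIPS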
54
Some “Misleading” Performance Measures
MFLOPS: focus on one type of work
- MFLOPS (Million Floating-point Operations Per Second) depends on the program: it must be FP-intensive
- MFLOPS depends on the computer as well; floating-point operations vary in complexity (e.g., add vs. divide)
Peak performance: the performance the manufacturer guarantees you won't exceed
- The difference between peak performance and average performance is huge
- Peak performance is not useful in predicting observed performance
55
Summary
1.1 Introduction
1.2 The Changing Face of Computing and the Task of the Computer Designer
1.3 Technology Trends
1.4 Cost, Price, and Their Trends
1.5 Measuring and Reporting Performance
  - Benchmarks
  - (Weighted) Arithmetic Mean, Normalized Geometric Mean
1.6 Quantitative Principles of Computer Design
  - Amdahl's Law
  - CPU Performance: CPU Time, CPI
  - Locality