CET520 -- Gannod1 Chapter 1 Fundamentals of Computer Design.

CET520 -- Gannod2 A bit of history
1945 – no stored-program computers
Today, less than $1,000 will buy a personal computer more powerful than a computer bought in 1980 for $1 million.
What has contributed to this rapid increase?
–Technology
–Computer Design

CET520 -- Gannod4 Course Emphasis
“by 2001, the difference between the highest-performance microprocessors and what would have been obtained by relying solely on technology, including improved circuit design, was about a factor of 15”
In this course we will discuss:
–Architectural design techniques used
–Associated compiler improvements
–Quantitative approach to computer design and analysis

CET520 -- Gannod5 Figure 1.3
Feature                       Desktop               Server                  Embedded
Price of system               $1000-$10,000         $10,000-$10,000,000     $10-$100,000
Price of microproc. module    $100-$1000            $200-$2000              $0.20-$200
Number sold per year          150,000,000           4,000,000               300,000,000
System design issues          price-performance,    throughput,             price, power consumption,
                              graphics performance  availability,           application-specific
                                                    scalability             performance

CET520 -- Gannod6 Definitions
Instruction Set Architecture (ISA) refers to the programmer-visible instruction set.
Implementation has 2 components:
–Organization
–Hardware
Organization includes the high-level aspects of a design.
–SPARC-2 and SPARC-20 have the same ISA but different organizations.
Hardware refers to the specifics of a machine.
–Two versions of the Silicon Graphics Indy have the same ISA and the same organization, but different hardware (clock rate, cache structure).
Architecture covers all 3 aspects of computer design: ISA, organization, and hardware.

CET520 -- Gannod7 Computer Design
Architects must design a computer under several constraints:
–Price
–Power
–Performance
–Functional requirements
Application software often drives functional requirements (Figure 1.4, pg. 10).
The architect must prioritize requirements and try to optimize the design in light of all constraints.

CET520 -- Gannod8 Functional Requirements
Application Area
–General purpose
–Scientific
–Commercial servers
–Embedded Computing
Level of Software Compatibility
–Programming Lang
–Binary
Operating System requirements
–Addr. Space
–Memory management
–Protection
Standards
–Floating Point
–I/O bus
–OS
–Networks
–Programming Lang

CET520 -- Gannod9 Cost in a System
Cabinet 6%
Processor board 37%
I/O devices 37%
Software 20%
-- The processor is the single most expensive item (22%)
-- The monitor is the second most costly item (19%)
Cost vs. Price
-- Cost is not the same as price
-- Price includes direct costs (labor, scrap, warranty), gross margin (R&D, marketing, sales, building rental, etc.), and discounts
-- Changing cost by $1000 could change price by $3000-$4000

CET520 -- Gannod10 Performance
When we say one computer has better performance than another, what do we mean?
Different people may mean different things:
–Single user
–Manager of large, multi-user system

CET520 -- Gannod11 Performance – key terms
Execution time (response time)
–time to execute a job from start to finish
Wall-clock time (elapsed time)
CPU time
–user time
–system time
Throughput
–number of jobs processed per time unit
System performance refers to elapsed time on an unloaded system.
CPU performance refers to user CPU time on an unloaded system.

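CET520 -- Gannod12 Performance – key formulas
Performance for completing task X is:
Performance_X = 1 / ExT_X
This allows us to compare the performance of different machines on the same task.
Execution time for a program:
ExT = IC × CPI × CP
where IC is the instruction count, CPI is the average clock cycles per instruction, and CP is the clock period.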
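
As a rough illustration of the two formulas on this slide, the sketch below assumes the usual forms Performance = 1 / ExT and ExT = IC × CPI × CP. The function names and the two sample machines are made up for illustration and do not come from the slides.

```python
def execution_time(ic, cpi, clock_period_s):
    """ExT = IC x CPI x CP, in seconds."""
    return ic * cpi * clock_period_s


def performance(ext_s):
    """Performance is taken as the reciprocal of execution time."""
    return 1.0 / ext_s


# Two hypothetical machines running the same program (illustrative numbers).
ext_a = execution_time(ic=2_000_000, cpi=1.5, clock_period_s=2e-9)
ext_b = execution_time(ic=2_000_000, cpi=2.0, clock_period_s=1e-9)

# Performance_A / Performance_B is the same ratio as ExT_B / ExT_A.
ratio = performance(ext_a) / performance(ext_b)
print(f"ExT_A = {ext_a:.6f} s, ExT_B = {ext_b:.6f} s, perf A / perf B = {ratio:.3f}")
```

A ratio below 1 means machine B finishes the task sooner, even though its CPI is higher, because its clock period is shorter.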

CET520 -- Gannod13 Improving Performance
Improving performance is a hardware designer's main goal.
Given the previous formulas, how can a designer improve performance?

CET520 -- Gannod14 If only it were that simple...
Unfortunately, these factors are NOT independent:
–Changing the instruction set to lower the instruction count may lead to an organization with a slower clock cycle time.
–A small IC may not give the fastest program, because complex instructions require more clock cycles.
There are many tradeoffs when designing for better performance.

CET520 -- Gannod15 Performance Equation
ExT = IC × CPI × CP
IC: is a function of the instruction set architecture and compiler technology.
CPI: is primarily a function of implementation.
CP: is primarily a function of the hardware technology.

CET520 -- Gannod16 Frequency vs. Period
Execution time is based on the clock period.
We are often given the clock rate (frequency) in MHz.
What is MHz? 1 MHz = 10^6 cycles per second.
What is the relationship between rate (f) and clock period (cp)? cp = 1 / f
Example
–Clock rate 500 MHz
–What’s the clock period? cp = 1 / (500 × 10^6 Hz) = 2 ns
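
The conversion between clock rate and clock period can be sketched in a few lines. The helper name `clock_period_s` and the `MHZ` constant are assumptions introduced here for illustration.

```python
MHZ = 1e6  # 1 MHz = one million cycles per second


def clock_period_s(freq_hz):
    """Clock period in seconds is the reciprocal of the clock rate in Hz."""
    return 1.0 / freq_hz


period = clock_period_s(500 * MHZ)
print(f"500 MHz -> {period * 1e9:.1f} ns per cycle")
```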

CET520 -- Gannod17 Benchmarking -- key terms
Workload: set of programs (instructions) that run on the computer system.
Benchmarks: programs chosen specifically to measure performance.
Workload is meant to predict typical performance.
–Real applications
–Modified (or scripted) applications
–Kernels – pieces of real programs
–Toy benchmarks – small, well-known programs
–Synthetic benchmarks – match average frequency of operations in real programs.

CET520 -- Gannod18 Benchmark suites
A benchmark suite is a collection of benchmark programs that covers a variety of applications.
–E.g., SPEC92
Advantage: the weakness of one benchmark is lessened by the presence of the other benchmarks.
When we have several benchmarks, we need to summarize the performance of the entire suite to determine which system has better performance.

CET520 -- Gannod19 Summarizing Performance
Typically we measure several different applications (not just 1).
People like to have a single number to measure performance:
–Total Execution Time
–Weighted Execution Time
–Normalized Execution Time

CET520 -- Gannod20 Total Execution Time
Simply add the execution times of all benchmarks in the suite.
Prev Example:
Arithmetic mean is closely related:
AM = (1/n) × (ExT_1 + ExT_2 + ... + ExT_n)

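CET520 -- Gannod21 Weighted Execution Time
Are all programs run an equal number of times?
Weighted arithmetic mean:
WAM = w_1 × ExT_1 + w_2 × ExT_2 + ... + w_n × ExT_n, with w_1 + w_2 + ... + w_n = 1
Prev Example:
–w1 = 0.5, w2 = 0.5
–w1 = 0.909, w2 = 0.091
–w1 = 0.999, w2 = 0.001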

CET520 -- Gannod22 Normalized ExT
Normalize times to a reference machine and then summarize.
Geometric mean of normalized times:
GM = (R_1 × R_2 × ... × R_n)^(1/n), where R_i = ExT_i / ExT_i on the reference machine
Prev Example:
–Normalize to A
–Normalize to C
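
The three summaries (total or arithmetic mean, weighted arithmetic mean, geometric mean of normalized times) can be compared with a small sketch. The slide's earlier example data is not reproduced in this transcript, so the execution times below are assumed values for three machines A, B, C running two programs; only the weight pairs come from the slides. The function names are likewise illustrative.

```python
import math

# Assumed execution times in seconds for programs P1 and P2 (illustrative only).
times = {
    "A": [1.0, 1000.0],
    "B": [10.0, 100.0],
    "C": [20.0, 20.0],
}


def total_time(ts):
    return sum(ts)


def arithmetic_mean(ts):
    return sum(ts) / len(ts)


def weighted_mean(ts, weights):
    # Weights are expected to sum to 1.
    return sum(w * t for w, t in zip(weights, ts))


def geometric_mean_normalized(ts, ref):
    # Normalize each program's time to the reference machine, then take the
    # n-th root of the product of the ratios.
    ratios = [t / r for t, r in zip(ts, ref)]
    return math.prod(ratios) ** (1.0 / len(ratios))


for name, ts in times.items():
    print(
        name,
        f"total={total_time(ts):.1f}",
        f"AM={arithmetic_mean(ts):.2f}",
        f"WAM(.909,.091)={weighted_mean(ts, [0.909, 0.091]):.2f}",
        f"GM vs A={geometric_mean_normalized(ts, times['A']):.3f}",
        f"GM vs C={geometric_mean_normalized(ts, times['C']):.3f}",
    )
```

With these assumed times, the ranking from the weighted mean shifts as the weights change, while the ratio between two machines' geometric means stays the same whichever machine is used as the reference.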

CET520 -- Gannod23 Pros and Cons of GM
Pros:
–GM is independent of the running times of individual programs.
–GM is independent of the base machine.
Cons:
–Does not predict execution time.
–Encourages hardware and software designers to focus on the benchmarks that are easiest to improve rather than the slowest ones.

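CET520 -- Gannod24 Amdahl’s Law
One important principle in computer design is: Make the common case fast.
Amdahl’s Law defines the speedup that can be gained by using a particular feature:
Speedup_overall = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
It is not always possible to use the enhancement: Fraction_enhanced is the fraction of the original execution time that can use it, and Speedup_enhanced is the gain while it is in use.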

CET520 -- Gannod25 Speedup
Speedup = old ExT / new ExT

CET520 -- Gannod26 Example 1
Implementations of FP square root vary in performance.
Suppose FPSQR is responsible for 20% of execution time, and all FP operations are responsible for 50% of execution time.
–Proposal 1: add hardware to speed up FPSQR by a factor of 10.
–Proposal 2: make all FP instructions run 2 times faster.
Which proposal should we accept?

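CET520 -- Gannod27 Solution
Speedup FPSQR = 1 / ((1 - 0.2) + 0.2 / 10) = 1 / 0.82 ≈ 1.22
Speedup FP = 1 / ((1 - 0.5) + 0.5 / 2) = 1 / 0.75 ≈ 1.33
Proposal 2 gives the larger overall speedup.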
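
Example 1 can be checked numerically with a small helper that applies Amdahl's Law in its standard form. The function name `amdahl_speedup` is an assumption made for this sketch.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is enhanced."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Proposal 1: FPSQR is 20% of execution time and becomes 10x faster.
fpsqr = amdahl_speedup(0.20, 10.0)
# Proposal 2: all FP operations are 50% of execution time and become 2x faster.
all_fp = amdahl_speedup(0.50, 2.0)

print(f"FPSQR hardware: {fpsqr:.3f}x, faster FP overall: {all_fp:.3f}x")
```

Even an unbounded speedup of FPSQR could not exceed 1 / (1 - 0.2) = 1.25, which is why the more modest improvement to the larger fraction wins.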

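CET520 -- Gannod28 Calculating CPI
Calculate from the Performance Equation:
CPI = ExT / (IC × CP)
During design, we don’t know what ExT is…
Calculate CPI from a detailed understanding of the architecture:
CPI = (CPI_1 × IC_1 + CPI_2 × IC_2 + ... + CPI_n × IC_n) / IC
where IC_i is the number of executed instructions of class i and CPI_i is the cycles per instruction for that class.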

CET520 -- Gannod29 Example 2
Consider 3 compilers (1, 2, 3) for the same machine.
The machine has 3 classes (A, B, C) of instructions with the following characteristics:
Class  CPI
A      1
B      2
C      3
The clock rate is f = 100 MHz.
For a particular program, the compilers generate code with the following IC values (in millions of instructions):
Compiler  IC A  IC B  IC C
1         6     2     1
2         2     2     2
3         10    1     1
Which compiler generates the fastest code?

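CET520 -- Gannod30 Solution
First, calculate the CPI for each compiler:
–CPI 1 = (6×1 + 2×2 + 1×3) / (6 + 2 + 1) = 13 / 9 ≈ 1.44
–CPI 2 = (2×1 + 2×2 + 2×3) / (2 + 2 + 2) = 12 / 6 = 2.00
–CPI 3 = (10×1 + 1×2 + 1×3) / (10 + 1 + 1) = 15 / 12 = 1.25
Execution time for the 3 compilers (ExT = IC × CPI / f):
–ExT 1 = 13 × 10^6 cycles / 100 MHz = 0.13 s
–ExT 2 = 12 × 10^6 cycles / 100 MHz = 0.12 s
–ExT 3 = 15 × 10^6 cycles / 100 MHz = 0.15 s
Compiler 2 generates the fastest code, even though it has the highest CPI.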
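
The per-class CPI calculation for Example 2 can be sketched as follows. The instruction counts are read from the flattened table on the Example 2 slide as compiler 1 = (6, 2, 1), compiler 2 = (2, 2, 2), compiler 3 = (10, 1, 1) million instructions for classes A, B, C; that reading, and all identifier names, are assumptions of this sketch.

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}
CLOCK_HZ = 100e6  # f = 100 MHz

# Instruction counts in millions, per compiler and instruction class
# (assumed reading of the flattened slide table).
IC_MILLIONS = {
    1: {"A": 6, "B": 2, "C": 1},
    2: {"A": 2, "B": 2, "C": 2},
    3: {"A": 10, "B": 1, "C": 1},
}


def cpi_and_time(ic_millions):
    """Return (average CPI, execution time in seconds) for one compiler."""
    instructions = sum(ic_millions.values()) * 1e6
    cycles = sum(CPI_BY_CLASS[cls] * n for cls, n in ic_millions.items()) * 1e6
    return cycles / instructions, cycles / CLOCK_HZ


for compiler, ic in IC_MILLIONS.items():
    cpi, seconds = cpi_and_time(ic)
    print(f"compiler {compiler}: CPI = {cpi:.2f}, ExT = {seconds:.2f} s")
```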

CET520 -- Gannod31 Measuring Performance factors
CP:
–Easy to do after the machine is built!
–During design, use timing estimators to measure critical paths in the design
IC:
–Compilers, profilers
–Often interested in finding the instruction mix as well (simulators/execution-based monitoring)
CPI:
–The simplistic example we did previously is not very accurate for modern processors (pipeline/cache effects)
–CPI_i = Pipeline CPI_i + Memory CPI_i

CET520 -- Gannod32 MIPS
Million Instructions Per Second
Suppose we want to compare the performance of a task on two different implementations of the SAME architecture.
–Same architecture => same IC for the same program
–Smaller sec/instr => larger instr/sec

CET520 -- Gannod33 (native) MIPS
Native MIPS = IC / (ExT × 10^6) = f / (CPI × 10^6)
Note: this measure is ONLY valid for comparing the SAME program on the SAME architecture!!!
Consider two implementations:
–CPI A = 1.5, freq A = 400 MHz
–CPI B = 1.8, freq B = 500 MHz
Which has the better native MIPS rating?

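CET520 -- Gannod34 Solution
Native MIPS A = (400 × 10^6) / (1.5 × 10^6) ≈ 266.7
Native MIPS B = (500 × 10^6) / (1.8 × 10^6) ≈ 277.8
Implementation B has the better native MIPS rating.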
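
A minimal sketch of the native MIPS comparison, assuming the standard definition MIPS = f / (CPI × 10^6). The function name `native_mips` is illustrative.

```python
def native_mips(clock_hz, cpi):
    """Native MIPS rating: instructions per second, in millions."""
    return clock_hz / (cpi * 1e6)


mips_a = native_mips(400e6, 1.5)
mips_b = native_mips(500e6, 1.8)
print(f"A: {mips_a:.1f} MIPS, B: {mips_b:.1f} MIPS")
```

Because both implementations share one architecture, the same program has the same IC on each, so the higher rating here also means the shorter execution time.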

CET520 -- Gannod35 MIPS, MOPS and other FLOPS
MIPS:
–Millions of Instructions Per Second
MOPS:
–Millions of Operations Per Second
FLOPS:
–FLoating point Operations Per Second

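CET520 -- Gannod36 MIPS is unreliable
Recall Example #2
–f = 100 MHz
Compiler  IC A  IC B  IC C
1         6     2     1
2         2     2     2
3         10    1     1
Find the MIPS rating (MIPS = IC / (ExT × 10^6)):
–C1: 9 × 10^6 / (0.13 × 10^6) ≈ 69.2
–C2: 6 × 10^6 / (0.12 × 10^6) = 50.0
–C3: 12 × 10^6 / (0.15 × 10^6) = 80.0
Conclusions using MIPS: compiler 3 has the highest rating and compiler 2 the lowest, yet compiler 2 produces the fastest code (0.12 s) and compiler 3 the slowest (0.15 s).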
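
The mismatch between MIPS ranking and execution-time ranking can be reproduced with a short sketch. It assumes the same reading of the Example 2 table as compiler 1 = (6, 2, 1), compiler 2 = (2, 2, 2), compiler 3 = (10, 1, 1) million instructions; all names are illustrative.

```python
CLOCK_HZ = 100e6
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}
IC_MILLIONS = {
    1: {"A": 6, "B": 2, "C": 1},
    2: {"A": 2, "B": 2, "C": 2},
    3: {"A": 10, "B": 1, "C": 1},
}


def time_and_mips(ic_millions):
    """Return (execution time in seconds, MIPS rating) for one compiler."""
    instructions = sum(ic_millions.values()) * 1e6
    cycles = sum(CPI_BY_CLASS[cls] * n for cls, n in ic_millions.items()) * 1e6
    seconds = cycles / CLOCK_HZ
    return seconds, instructions / (seconds * 1e6)


results = {c: time_and_mips(ic) for c, ic in IC_MILLIONS.items()}
fastest = min(results, key=lambda c: results[c][0])
highest_mips = max(results, key=lambda c: results[c][1])
print(f"fastest code: compiler {fastest}; highest MIPS: compiler {highest_mips}")
```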

CET520 -- Gannod37 Memory Hierarchy
Computer systems usually have several different types of memory organized in a hierarchy. WHY?
–CPU registers
–Cache
–Memory
–Disk

CET520 -- Gannod38 Locality
Locality of reference: programs tend to reuse data and instructions that they have used recently.
Two types of locality:
–Temporal: recently accessed items are likely to be accessed again in the near future.
–Spatial: items whose addresses are near one another tend to be referenced close together in time.
Do we expect instructions or data to have a higher degree of locality?

CET520 -- Gannod39 Key Terms
Cache hit: CPU finds requested data in cache.
Cache miss: requested data not in cache.
Miss rate: fraction of cache accesses that result in a miss.
Block: amount of data transferred between cache and memory.
Miss penalty: extra time taken to get the requested block into the cache.
Page fault: requested data is not in memory.

CET520 -- Gannod40 Example 4
Suppose the cache is 10 times faster than main memory and that the cache can be used 90% of the time.
How much speedup do we gain from using the cache?
Using Amdahl’s Law:
Speedup = 1 / ((1 - 0.9) + 0.9 / 10) = 1 / 0.19 ≈ 5.3
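
Example 4 can also be worked from first principles by accounting for time directly, which gives the same result as Amdahl's Law. The sketch normalizes the original memory time to 1.0; the function name and parameter names are assumptions.

```python
def time_with_cache(t_memory, cache_speedup, fraction_cached):
    """Time after adding a cache, given the original all-memory time."""
    uncached = (1.0 - fraction_cached) * t_memory           # still served by memory
    cached = fraction_cached * t_memory / cache_speedup      # served by the cache
    return uncached + cached


t_old = 1.0  # normalized time without a cache
t_new = time_with_cache(t_old, cache_speedup=10.0, fraction_cached=0.90)
speedup = t_old / t_new
print(f"new time = {t_new:.2f} of original, speedup = {speedup:.2f}x")
```

The 10% of time that cannot use the cache dominates the new total, so the overall gain is roughly half of the cache's raw 10x advantage.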

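CET520 -- Gannod41 CPU ExT and the Memory Hierarchy
Need to expand our definition of CPU ExT:
CPU ExT = (CPU clock cycles + Memory stall cycles) × CP
Memory stall cycles = IC × Memory accesses per instruction × Miss rate × Miss penalty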

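CET520 -- Gannod42 Example 5
A machine has CPI 2 when all memory accesses hit the cache. 40% of the instructions access data.
If the miss penalty is 25 clock cycles and the miss rate is 2%, how much faster would the machine be if the hit rate were 100%?
Assuming every instruction also makes one instruction-fetch access, there are 1 + 0.4 = 1.4 memory accesses per instruction.
ExT ideal = IC × 2 × CP
ExT cache = IC × (2 + 1.4 × 0.02 × 25) × CP = IC × 2.7 × CP
ExT cache / ExT ideal = 2.7 / 2 = 1.35, so the machine with a 100% hit rate would be 1.35 times faster.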
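
A sketch of Example 5 under one explicit assumption: each instruction makes one instruction-fetch access in addition to any data access, giving 1.4 memory accesses per instruction. If only data accesses were counted, the figure would be 0.4 and the result would differ. The function name is illustrative.

```python
def cpi_with_misses(base_cpi, accesses_per_instr, miss_rate, miss_penalty):
    """Effective CPI = base CPI + memory stall cycles per instruction."""
    return base_cpi + accesses_per_instr * miss_rate * miss_penalty


BASE_CPI = 2.0
# Assumption: 1 instruction fetch per instruction + 0.4 data accesses.
ACCESSES_PER_INSTR = 1.0 + 0.40

real_cpi = cpi_with_misses(BASE_CPI, ACCESSES_PER_INSTR, miss_rate=0.02, miss_penalty=25)

# IC and CP are the same in both cases, so the ExT ratio reduces to the CPI ratio.
ratio = real_cpi / BASE_CPI
print(f"CPI with misses = {real_cpi:.2f}, ideal machine is {ratio:.2f}x faster")
```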
