Published by Anabel Bryan. Modified over 9 years ago.
1
Compsci 001 4.1 Today’s topics
• Performance & Computer Architecture
  - Notes from David A. Patterson and John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann, 1997.
  - http://computer.howstuffworks.com/pc.htm
• Slides from
  - Alvy Lebeck, Duke CS
  - Marti Hearst, UC Berkeley SIMS
  - David Patterson, UC Berkeley CS
  - Mounir Hamdi, HKUST CS
• Upcoming
  - Complexity
2
Compsci 001 4.2 Performance
• Performance = 1/Time
  - The goal for all software and hardware developers is to increase performance
• Metrics for measuring performance (pros/cons?)
  - Elapsed time
  - CPU time
  - Instruction count (RISC vs. CISC)
  - Clock cycles per instruction
  - Clock cycle time
  - MIPS vs. MFLOPS
  - Throughput (tasks/time)
  - Other more subjective metrics?
• What kind of workload to be used?
  - Applications, kernels and benchmarks (toy or synthetic)
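The metrics above combine into the usual textbook relation CPU time = instruction count x cycles per instruction x clock cycle time. The sketch below is only an illustration of that relation: the two machines and all of their numbers are hypothetical, and the function names are made up for this example.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds: instructions x cycles/instruction / (cycles/second)."""
    return instruction_count * cpi / clock_rate_hz


def mips(instruction_count, seconds):
    """Millions of instructions executed per second."""
    return instruction_count / seconds / 1e6


def performance(seconds):
    """Performance as defined on the slide: 1/Time."""
    return 1.0 / seconds


# Hypothetical numbers: the same program compiled for two 2 GHz machines.
IC_A, IC_B = 2_000_000_000, 1_200_000_000   # A needs more, simpler instructions
time_a = cpu_time(IC_A, cpi=1.5, clock_rate_hz=2e9)
time_b = cpu_time(IC_B, cpi=3.0, clock_rate_hz=2e9)

if __name__ == "__main__":
    print("time A:", time_a, "s   time B:", time_b, "s")
    print("MIPS A:", mips(IC_A, time_a), "  MIPS B:", mips(IC_B, time_b))
    print("speedup of A over B:", performance(time_a) / performance(time_b))
```

With these made-up numbers, machine A's MIPS rating is twice machine B's, yet A finishes the program only about 1.2 times faster. That gap is one of the "cons" of MIPS as a metric: it ignores how much work each instruction does.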
3
Compsci 001 4.3 What is Realtime?
• Response time
  - Panic
  - How to tell “I am still computing”
  - Progress bar
• Flicker fusion frequency
• Update rate vs. refresh rate
  - Movie film standards (24 fps projected at 48 fps)
• Interactive media
  - Interactive vs. non-interactive graphics
  - Computer games vs. movies
  - Animation tools vs. animation
• Interactivity
  - Real-time systems: the system must respond to user inputs without any perceptible delay (a primary challenge in VR)
4
Compsci 001 4.4 The Big Picture
• Since 1946 all computers have had 5 components
  - The Von Neumann Machine
• What is computer architecture?
  - Computer Architecture = Machine Organization + Instruction Set Architecture + ...
[Diagram labels: Control, Datapath, Memory, Processor, Input, Output]
5
Compsci 001 4.5 Fetch, Decode, Execute Cycle
• Computer instructions are stored (as bits) in memory
• A program’s execution is a loop
  - Fetch instruction from memory
  - Decode instruction
  - Execute instruction
• Cycle time
  - Clock rate (1/cycle time) is measured in hertz (cycles per second)
  - A 2 GHz processor can execute this cycle up to 2 billion times a second
  - Not all cycles are the same though…
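The fetch, decode, execute loop above can be sketched as a toy interpreter. Everything here is an assumption made for illustration: the 8-bit instruction word (4-bit opcode, 4-bit operand), the five opcodes, and the single accumulator register do not correspond to any real instruction set.

```python
# Hypothetical 8-bit instruction word: high 4 bits = opcode, low 4 bits = operand.
HALT, LOADI, ADDI, SUBI, JNZ = range(5)


def encode(opcode, operand=0):
    """Pack an opcode and a 4-bit operand into one instruction word."""
    return (opcode << 4) | (operand & 0xF)


def run(memory):
    """Run the program in `memory`; return (accumulator, instructions executed)."""
    pc, acc, executed = 0, 0, 0
    while True:
        word = memory[pc]                          # fetch
        pc += 1
        opcode, operand = word >> 4, word & 0xF    # decode
        executed += 1
        if opcode == HALT:                         # execute
            return acc, executed
        elif opcode == LOADI:
            acc = operand
        elif opcode == ADDI:
            acc += operand
        elif opcode == SUBI:
            acc -= operand
        elif opcode == JNZ:
            if acc != 0:
                pc = operand                       # jump changes what is fetched next
        else:
            raise ValueError(f"unknown opcode {opcode}")


# Count down from 3 to 0: the instructions are just numbers (bits) in memory.
program = [encode(LOADI, 3), encode(SUBI, 1), encode(JNZ, 1), encode(HALT)]

if __name__ == "__main__":
    print(run(program))   # (0, 8)
```

The four-word program executes eight instructions because the loop body is fetched three times. On real hardware those instructions would not all take the same number of clock cycles, which is the point of the slide's last bullet.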
6
Compsci 001 4.6 Organization
• Capabilities & performance characteristics of principal functional units (FUs) (e.g., registers, ALU, shifters, logic units, ...)
• Ways in which these components are interconnected
• Information flows between components
• Logic and means by which such information flow is controlled
• Choreography of FUs to realize the ISA
[Diagram labels: Logic Designer's View; ISA Level; FUs & Interconnect]
7
Compsci 001 4.7 Instruction Set Architecture
"... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation." (Amdahl, Blaauw, and Brooks, 1964)
• Organization of programmable storage
• Data types & data structures: encodings & representations
• Instruction set
• Instruction formats
• Modes of addressing and accessing data items and instructions
• Exceptional conditions
[Diagram label: SOFTWARE]
8
Compsci 001 4.8 The Instruction Set: a Critical Interface
[Diagram label: instruction set]
• What is an example of an Instruction Set architecture?
9
Compsci 001 4.9 Forces on Computer Architecture
[Diagram: Computer Architecture, surrounded by the forces acting on it]
• Technology
• Programming Languages
• Operating Systems
• History
• Applications
• Cleverness
10
Compsci 001 4.10 Technology
• In ~1985 the single-chip processor (32-bit) and the single-board computer emerged
  - => workstations, personal computers, multiprocessors have been riding this wave since
• Now, we have multicore processors

DRAM chip capacity
Year    Size
1980    64 Kb
1983    256 Kb
1986    1 Mb
1989    4 Mb
1992    16 Mb
1996    64 Mb
1999    256 Mb
2002    1 Gb
2007    2 Gb
2009    4 Gb

[Figure titles: Microprocessor Logic Density; DRAM chip capacity]
11
Compsci 001 4.11 Technology => dramatic change
• Processor
  - Logic capacity: about 30% per year
  - Clock rate: about 20% per year
• Memory
  - DRAM capacity: about 60% per year (4x every 3 years)
  - Memory speed: about 10% per year
  - Cost per bit: improves about 25% per year
• Disk
  - Capacity: about 60% per year
  - Total use of data: 100% per 9 months!
• Network bandwidth
  - Bandwidth increasing more than 100% per year!
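The yearly rates above compound, which is why "60% per year" and "4x every 3 years" describe the same trend. A minimal sketch of that arithmetic follows; the function names are invented for this example and the rates are the ones quoted on the slide.

```python
import math


def growth_factor(rate_per_year, years):
    """Total multiplier after compounding `rate_per_year` for `years` years."""
    return (1.0 + rate_per_year) ** years


def doubling_time_years(rate_per_year):
    """Years needed to double at a constant yearly growth rate."""
    return math.log(2.0) / math.log(1.0 + rate_per_year)


def annual_factor(factor, months):
    """Convert 'grows by `factor` every `months` months' into a yearly multiplier."""
    return factor ** (12.0 / months)


if __name__ == "__main__":
    print("DRAM capacity over 3 years:", growth_factor(0.60, 3))      # about 4x
    print("DRAM capacity doubling time:", doubling_time_years(0.60))  # about 1.5 years
    print("Memory speed doubling time:", doubling_time_years(0.10))   # about 7.3 years
    print("Data use per year (doubling every 9 months):", annual_factor(2.0, 9))
```

The contrast between the capacity and speed doubling times is the reason memory becomes a bottleneck: capacity doubles roughly five times while speed doubles once.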
12
Compsci 001 4.12 Performance Trends
13
Compsci 001 4.13 Processor Transistor Count (from http://en.wikipedia.org/wiki/Transistor_count)

Processor                   Transistor count   Date of introduction   Manufacturer
Intel 4004                  2 300              1971                   Intel
Intel 8008                  2 500              1972                   Intel
Intel 8080                  4 500              1974                   Intel
Intel 8088                  29 000             1978                   Intel
Intel 80286                 134 000            1982                   Intel
Intel 80386                 275 000            1985                   Intel
Intel 80486                 1 200 000          1989                   Intel
Pentium                     3 100 000          1993                   Intel
AMD K5                      4 300 000          1996                   AMD
Pentium II                  7 500 000          1997                   Intel
AMD K6                      8 800 000          1997                   AMD
Pentium III                 9 500 000          1999                   Intel
AMD K6-III                  21 300 000         1999                   AMD
AMD K7                      22 000 000         1999                   AMD
Pentium 4                   42 000 000         2000                   Intel
Itanium                     25 000 000         2001                   Intel
Barton                      54 300 000         2003                   AMD
AMD K8                      105 900 000        2003                   AMD
Itanium 2                   220 000 000        2003                   Intel
Itanium 2 with 9MB cache    592 000 000        2004                   Intel
Cell                        241 000 000        2006                   Sony/IBM/Toshiba
Core 2 Duo                  291 000 000        2006                   Intel
Core 2 Quad                 582 000 000        2006                   Intel
Dual-Core Itanium 2         1 700 000 000      2006                   Intel
Quad-Core Itanium           2 000 000 000      200                    Intel
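As a rough check on the growth rate implied by the table, the sketch below computes the average doubling time between two of its rows, the Intel 4004 (2 300 transistors, 1971) and the Pentium 4 (42 000 000 transistors, 2000). The function name is made up for this example, and picking only two rows is a simplification rather than a proper fit.

```python
import math


def doubling_time_years(count_start, year_start, count_end, year_end):
    """Average years per doubling between two (count, year) data points."""
    doublings = math.log2(count_end / count_start)
    return (year_end - year_start) / doublings


if __name__ == "__main__":
    # Values taken from the table: Intel 4004 and Pentium 4.
    years = doubling_time_years(2_300, 1971, 42_000_000, 2000)
    print(f"transistor count doubled about every {years:.2f} years")
```

For these two rows the result is a little over two years per doubling.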
14
Compsci 001 4.14 Processor-Memory Speed Gap
[Figure: Processor-Memory Performance Gap. Performance (log scale, 1 to 1000) vs. year, 1980 to 2000. CPU curve (µProc, “Moore’s Law”): 50%/yr. DRAM curve: 9%/yr. (2X/10 yrs). Gap grows 50% / year.]
15
Compsci 001 4.15 Latency vs. Throughput
16
Compsci 001 4.16 Memory bottleneck
• CPU can execute dozens of instructions in the time it takes to retrieve one item from memory
• Solution: Memory Hierarchy
  - Use fast memory
  - Registers
  - Cache memory
  - Rule: small memory is fast, large memory is slow
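One common way to quantify why a small, fast memory in front of a large, slow one helps is the average memory access time: hit time + miss rate x miss penalty. That formula is not on the slide; it is the standard textbook one, and every number below is hypothetical.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same unit as hit_time and miss_penalty."""
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    # Hypothetical machine: 1-cycle cache hit, 100-cycle trip to main memory.
    no_cache = 100
    with_cache = average_access_time(hit_time=1, miss_rate=0.05, miss_penalty=100)
    print("cycles per access without a cache:", no_cache)
    print("cycles per access with a 95%-hit cache:", with_cache)
```

Under these assumed numbers the cache cuts the average access cost from 100 cycles to about 6, even though it holds only a small fraction of the data.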
17
Compsci 001 4.17 A great idea in computer science
• Temporal locality
  - Programs tend to access data that has been accessed recently (i.e. close in time)
• Spatial locality
  - Programs tend to access data at an address near recently referenced data (i.e. close in space)
• Useful in graphics and virtual reality as well
  - Realistic images require significant computational power
  - Don’t need to represent distant objects as well
• Efficient distributed systems rely on locality
  - Memory access time increases over a network
  - Want to access data on the local machine
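The effect of both kinds of locality can be seen in a tiny cache simulation. This is a sketch under stated assumptions: a direct-mapped cache with 64 lines of 8 words each, a design and set of sizes chosen for illustration only, and three synthetic access patterns.

```python
def hit_rate(addresses, num_lines=64, block_size=8):
    """Fraction of accesses that hit in a simulated direct-mapped cache."""
    lines = [None] * num_lines          # block number currently held by each line
    hits = total = 0
    for addr in addresses:
        block = addr // block_size      # neighbouring addresses share a block
        index = block % num_lines       # each block maps to exactly one line
        if lines[index] == block:
            hits += 1
        else:
            lines[index] = block        # miss: load the whole block
        total += 1
    return hits / total


# Spatial locality: walk through consecutive addresses.
sequential = range(4096)
# No locality: jump a full block each time and never come back.
strided = range(0, 4096 * 8, 8)
# Temporal locality: reuse the same 16 addresses 100 times.
repeated = [a for _ in range(100) for a in range(16)]

if __name__ == "__main__":
    print("sequential:", hit_rate(sequential))   # 0.875
    print("strided:   ", hit_rate(strided))      # 0.0
    print("repeated:  ", hit_rate(repeated))     # 0.99875
```

The sequential walk misses once per 8-word block and hits on the other seven words. The strided walk touches a new block on every access and never hits. The repeated working set misses only on its first pass.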
18
Compsci 001 4.18 Microprocessor Generations
• First generation: 1971-78
  - Behind the power curve (16-bit, <50k transistors)
• Second generation: 1979-85
  - Becoming “real” computers (32-bit, >50k transistors)
• Third generation: 1985-89
  - Challenging the “establishment” (Reduced Instruction Set Computer/RISC, >100k transistors)
• Fourth generation: 1990-
  - Architectural and performance leadership (64-bit, >1M transistors, Intel/AMD translate into RISC internally)
19
Compsci 001 4.19 In the beginning (8-bit): Intel 4004
• First general-purpose, single-chip microprocessor
• Shipped in 1971
• 8-bit architecture, 4-bit implementation
• 2,300 transistors
• Performance < 0.1 MIPS (Million Instructions Per Second)
• 8008: 8-bit implementation in 1972
  - 3,500 transistors
  - First microprocessor-based computer (Micral)
  - Targeted at laboratory instrumentation
  - Mostly sold in Europe
All chip photos in this talk courtesy of Michael W. Davidson and The Florida State University
20
Compsci 001 4.20 1st Generation (16-bit): Intel 8086
• Introduced in 1978
  - Performance < 0.5 MIPS
• New 16-bit architecture
  - “Assembly language” compatible with 8080
  - 29,000 transistors
  - Includes memory protection, support for floating-point coprocessor
• In 1981, IBM introduces PC
  - Based on 8088, an 8-bit bus version of 8086
21
Compsci 001 4.21 2nd Generation (32-bit): Motorola 68000
• Major architectural step in microprocessors:
  - First 32-bit architecture (initial 16-bit implementation)
  - First flat 32-bit address
  - Support for paging
  - General-purpose register architecture
  - Loosely based on PDP-11 minicomputer
• First implementation in 1979
  - 68,000 transistors
  - < 1 MIPS (Million Instructions Per Second)
• Used in
  - Apple Mac
  - Sun, Silicon Graphics, & Apollo workstations
22
Compsci 001 4.22 3rd Generation: MIPS R2000
• Several firsts:
  - First (commercial) RISC microprocessor
  - First microprocessor to provide integrated support for instruction & data cache
  - First pipelined microprocessor (sustains 1 instruction/clock)
• Implemented in 1985
  - 125,000 transistors
  - 5-8 MIPS (Million Instructions Per Second)
23
Compsci 001 4.23 4th Generation (64-bit): MIPS R4000
• First 64-bit architecture
• Integrated caches
  - On-chip
  - Support for off-chip, secondary cache
• Integrated floating point
• Implemented in 1991:
  - Deep pipeline
  - 1.4M transistors
  - Initially 100 MHz
  - > 50 MIPS
• Intel translates 80x86/Pentium X instructions into RISC internally
24
Compsci 001 4.24 Key Architectural Trends
• Increase performance at 1.6x per year (2X/1.5yr)
  - True from 1985-present
• Combination of technology and architectural enhancements
  - Technology provides faster transistors (proportional to 1/lithographic feature size) and more of them
  - Faster transistors lead to higher clock rates
  - More transistors (“Moore’s Law”): architectural ideas turn transistors into performance
    – Responsible for about half the yearly performance growth
• Two key architectural directions
  - Sophisticated memory hierarchies
  - Exploiting instruction level parallelism
25
Compsci 001 4.25 Where have all the transistors gone?
• Superscalar (multiple instructions per clock cycle)
• Branch prediction (predict outcome of decisions)
• 3 levels of cache
• Out-of-order execution (executing instructions in a different order than the programmer wrote them)
[Die photo: Intel Pentium III (10M transistors). Region labels: Execution, Icache, D cache, branch, TLB, 2 Bus Intf, Out-Of-Order, SS]
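To make "predict outcome of decisions" concrete, here is a sketch of one classic textbook scheme, a 2-bit saturating counter. The slide does not say which predictor the Pentium III uses, so this illustrates the general idea and is not a description of that chip. The branch patterns are synthetic.

```python
def prediction_accuracy(outcomes, state=2):
    """Accuracy of a 2-bit saturating-counter predictor on a list of outcomes.

    `outcomes` holds True for each taken branch and False for each not-taken one.
    States 0-1 predict "not taken"; states 2-3 predict "taken".
    """
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)


# A loop branch: taken 9 times, then falls through once; repeated 10 times.
loop_branch = ([True] * 9 + [False]) * 10
# A branch that alternates every time it is executed.
alternating = [True, False] * 50

if __name__ == "__main__":
    print("loop branch:", prediction_accuracy(loop_branch))   # 0.9
    print("alternating:", prediction_accuracy(alternating))   # 0.5
```

The counter handles loop-style branches well because a single surprise does not flip its prediction. It does no better than a coin flip on the alternating pattern.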
26
Compsci 001 4.26 Laws?
• Define each of the following. What has its effect been on the advancement of computing technology?
  - Moore’s Law
  - Amdahl’s Law
  - Metcalfe’s Law
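As a starting point for the question above, the three laws can be written as small formulas in their commonly stated forms. Treat this as a sketch: the function names are invented, the two-year doubling period for Moore's Law is an assumed default (other periods are often quoted), and Metcalfe's Law is modelled here simply as the number of possible pairwise links.

```python
def moore_projection(initial_count, years, doubling_period_years=2.0):
    """Moore's Law: transistor count after `years`, doubling every period (assumed 2 years)."""
    return initial_count * 2 ** (years / doubling_period_years)


def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Amdahl's Law: overall speedup when only a fraction of the work gets faster."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def metcalfe_links(n):
    """Metcalfe's Law: possible pairwise links among n users, which grows like n squared."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    print("2 300 transistors, 4 years later:", moore_projection(2_300, 4))
    print("80% of work made 4x faster:", amdahl_speedup(0.8, 4.0))
    print("80% of work made 1000x faster:", amdahl_speedup(0.8, 1000.0))
    print("links among 10 users:", metcalfe_links(10))    # 45
    print("links among 100 users:", metcalfe_links(100))  # 4950
```

The two Amdahl lines show the law's main lesson: the 20% of the work that is left untouched caps the overall speedup below 5x, however fast the other 80% becomes.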