1 Introduction to Hardware/Architecture David A. Patterson EECS, University of California Berkeley, CA
2 Technology Trends: Microprocessor Capacity 2X transistors/Chip Every 1.5 years Called “Moore’s Law”: Alpha 21264: 15 million Pentium Pro: 5.5 million PowerPC 620: 6.9 million Alpha 21164: 9.3 million Sparc Ultra: 5.2 million Moore’s Law
3 Technology Trends: Processor Performance 1.54X/yr Processor performance increase/yr mistakenly referred to as Moore’s Law (transistors/chip)
4 5 components of any Computer Processor (active) Computer Control (“brain”) Datapath (“brawn”) Memory (passive) (where programs, data live when running) Devices Input Output Keyboard, Mouse Display, Printer Disk, Network
5 Computer Technology =>Dramatic Change n Processor m 2X in speed every 1.5 years; 1000X performance in last 15 years n Memory m DRAM capacity: 2x / 1.5 years; 1000X size in last 15 years m Cost per bit: improves about 25% per year n Disk m capacity: > 2X in size every 1.5 years m Cost per bit: improves about 60% per year m 120X size in last decade n State-of-the-art PC “when you graduate” ( ) m Processor clock speed: 1500 MegaHertz (1.5 GigaHertz) m Memory capacity: 500 MegaByte(0.5 GigaBytes) m Disk capacity: 100 GigaBytes(0.1 TeraBytes) m New units! Mega => Giga, Giga => Tera
6 Integrated Circuit Costs Die cost = Wafer cost Dies per Wafer * Die yield Die Cost is goes roughly with the cube of the area: fewer dies per wafer * yield worse with die area Flaws Dies
7 Die Yield (1993 data) Raw Dices Per Wafer wafer diameterdie area (mm 2 ) ”/15cm ”/20cm ”/25cm die yield23%19%16%12%11%10% typical CMOS process: =2, wafer yield=90%, defect density=2/cm2, 4 test sites/wafer Good Dices Per Wafer (Before Testing!) 6”/15cm ”/20cm ”/25cm typical cost of an 8”, 4 metal layers, 0.5um CMOS wafer: ~$2000
Real World Examples ChipMetalLineWaferDefectAreaDies/YieldDie Cost layerswidthcost/cm 2 mm 2 wafer 386DX20.90$ %$4 486DX230.80$ %$12 PowerPC $ %$53 HP PA $ %$73 DEC Alpha30.70$ %$149 SuperSPARC30.70$ %$272 Pentium30.80$ %$417 From "Estimating IC Manufacturing Costs,” by Linley Gwennap, Microprocessor Report, August 2, 1993, p. 15
9 Processor Trends/ History n History of innovations to 2X / 1.5 yr m Pipelining (helps seconds / clock, or clock rate) m Out-of-Order Execution (helps clocks / instruction) m Superscalar (helps clocks / instruction)
10 Pipelining is Natural! °Laundry Example °Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, fold, and put away °Washer takes 30 minutes °Dryer takes 30 minutes °“Folder” takes 30 minutes °“Stasher” takes 30 minutes to put clothes into drawers ABCD
11 Sequential Laundry Sequential laundry takes 8 hours for 4 loads 30 TaskOrderTaskOrder B C D A Time 30 6 PM AM
12 Pipelined Laundry: Start work ASAP Pipelined laundry takes 3.5 hours for 4 loads! TaskOrderTaskOrder 12 2 AM 6 PM Time B C D A 30
13 Pipeline Hazard: Stall A depends on D; stall since folder tied up TaskOrderTaskOrder 12 2 AM 6 PM Time B C D A E F bubble 30
14 Out-of-Order Laundry: Don’t Wait A depends on D; rest continue; need more resources to allow out-of-order TaskOrderTaskOrder 12 2 AM 6 PM Time B C D A 30 E F bubble
15 Superscalar Laundry: Parallel per stage More resources, HW match mix of parallel tasks? TaskOrderTaskOrder 12 2 AM 6 PM Time B C D A E F (light clothing) (dark clothing) (very dirty clothing) (light clothing) (dark clothing) (very dirty clothing) 30
16 Superscalar Laundry: Mismatch Mix Task mix underutilizes extra resources TaskOrderTaskOrder 12 2 AM 6 PM Time 30 (light clothing) (dark clothing) (light clothing) A B D C
17 State of the Art: Alpha n 15M transistors n 2 64KB caches on chip; 16MB L2 cache off chip n Clock 600 MHz n 90 watts n Superscalar: fetch up to 6 instructions/clock cycle, retires up to 4 instruction/clock cycle n Execution out-of-order
18 Other example: Sony Playstation 2 n Emotion Engine: 6.2 GFLOPS, 75 million polygons per second (Microprocessor Report, 13:5) m Superscalar MIPS core + vector coprocessor + graphics/DRAM m Claim: “Toy Story” realism brought to games
19 The Goal: Illusion of large, fast, cheap memory n Fact: Large memories are slow, fast memories are small n How do we create a memory that is large, cheap and fast (most of the time)? n Hierarchy of Levels m Similar to Principle of Abstraction: hide details of multiple levels
20 Hierarchy Analogy: Term Paper n Working on paper in library at a desk n Option 1: Every time need a book m Leave desk to go to shelves (or stacks) m Find the book m Bring one book back to desk m Read section interested in m When done with section, leave desk and go to shelves carrying book m Put the book back on shelf m Return to desk to work m Next time need a book, go to first step
21 Hierarchy Analogy: Library n Option 2: Every time need a book m Leave some books on desk after fetching them m Only go to shelves when need a new book m When go to shelves, bring back related books in case you need them; sometimes you’ll need to return books not used recently to make space for new books on desk m Return to desk to work m When done, replace books on shelves, carrying as many as you can per trip n Illusion: whole library on your desktop n Buzzword “cache” from French for hidden treasure
22 Why Hierarchy works: Natural Locality n The Principle of Locality: m Program access a relatively small portion of the address space at any instant of time. Address Space 02^n - 1 Probability of reference n What programming constructs lead to Principle of Locality?
23 Memory Hierarchy: How Does it Work? n Temporal Locality (Locality in Time): Keep most recently accessed data items closer to the processor m Library Analogy: Recently read books are kept on desk m Block is unit of transfer (like book) n Spatial Locality (Locality in Space): Move blocks consists of contiguous words to the upper levels m Library Analogy: Bring back nearby books on shelves when fetch a book; hope that you might need it later for your paper
24 Memory Hierarchy Pyramid Levels in memory hierarchy Central Processor Unit (CPU) Size of memory at each level Level 1 Level 2 Level n Increasing Distance from CPU, Decreasing cost / MB “Upper” “Lower” Level 3... (data cannot be in level i unless also in i+1)
25 Big Idea of Memory Hierarchy n Temporal locality: keep recently accessed data items closer to processor n Spatial locality: moving contiguous words in memory to upper levels of hierarchy n Uses smaller and faster memory technologies close to the processor m Fast hit time in highest level of hierarchy m Cheap, slow memory furthest from processor n If hit rate is high enough, hierarchy has access time close to the highest (and fastest) level and size equal to the lowest (and largest) level
26 Disk Description / History 1973: 1. 7 Mbit/sq. in 140 MBytes 1979: 7. 7 Mbit/sq. in 2,300 MBytes source: New York Times, 2/23/98, page C3, “Makers of disk drives crowd even more data into even smaller spaces” Sector Track Cylinder Head Platter Arm Embed. Proc. (ECC, SCSI) Track Buffer
27 Disk History 1989: 63 Mbit/sq. in 60,000 MBytes 1997: 1450 Mbit/sq. in 2300 Mbytes (2.5” diameter) source: N.Y. Times, 2/23/98, page C3 1997: 3090 Mbit/s. i Mbytes (3.5” diameter) 2000: 10,100 Mb/s. i. 25,000 MBytes 2000: 11,000 Mb/s. i. 73,400 MBytes
28 State of the Art: Ultrastar 72ZX m 73.4 GB, 3.5 inch disk m 2¢/MB m 16 MB track buffer m 11 platters, 22 surfaces m 15,110 cylinders m 7 Gbit/sq. in. areal density m 17 watts (idle) m 0.1 ms controller time m 5.3 ms avg. seek (seek 1 track => 0.6 ms) m 3 ms = 1/2 rotation m 37 to 22 MB/s to media source: 2/14/00 Latency = Queuing Time + Controller time + Seek Time + Rotation Time + Size / Bandwidth per access per byte { + Sector Track Cylinder Head Platter Arm Embed. Proc. Track Buffer
29 A glimpse into the future? n IBM microdrive for digital cameras m 340 Mbytes n Disk target in 5-7 years?
30 Questions? Contact us if you’re interested: