1 Introduction to Hardware/Architecture David A. Patterson EECS, University of California.

Presentation transcript:

1 Introduction to Hardware/Architecture David A. Patterson EECS, University of California Berkeley, CA

2 Technology Trends: Microprocessor Capacity
- 2X transistors/chip every 1.5 years, called “Moore’s Law”
- Transistor counts from the chart:
  - Alpha 21264: 15 million
  - Pentium Pro: 5.5 million
  - PowerPC 620: 6.9 million
  - Alpha 21164: 9.3 million
  - Sparc Ultra: 5.2 million

3 Technology Trends: Processor Performance
- Processor performance increases about 1.54X per year
- This yearly performance increase is often mistakenly referred to as Moore’s Law, which is about transistors/chip

4 The 5 Components of Any Computer
- Processor (active)
  - Control (“brain”)
  - Datapath (“brawn”)
- Memory (passive): where programs and data live when running
- Devices
  - Input: keyboard, mouse
  - Output: display, printer
  - Disk, network

5 Computer Technology => Dramatic Change
- Processor
  - 2X in speed every 1.5 years; 1000X performance in last 15 years
- Memory
  - DRAM capacity: 2X / 1.5 years; 1000X size in last 15 years
  - Cost per bit: improves about 25% per year
- Disk
  - Capacity: > 2X in size every 1.5 years
  - Cost per bit: improves about 60% per year
  - 120X size in last decade
- State-of-the-art PC “when you graduate” ( )
  - Processor clock speed: 1500 MegaHertz (1.5 GigaHertz)
  - Memory capacity: 500 MegaBytes (0.5 GigaBytes)
  - Disk capacity: 100 GigaBytes (0.1 TeraBytes)
  - New units! Mega => Giga, Giga => Tera
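The link between "2X every 1.5 years" and "1000X in 15 years" is plain compounding. The sketch below works it through; the function names are illustrative and not from the slides.

```python
def growth_factor(years, doubling_period_years=1.5):
    """Total improvement after `years` if a quantity doubles every period."""
    return 2 ** (years / doubling_period_years)


def annual_rate(doubling_period_years=1.5):
    """Equivalent per-year multiplier for a given doubling period."""
    return 2 ** (1 / doubling_period_years)


# 15 years at 2X per 1.5 years is ten doublings: 2**10 = 1024, roughly 1000X.
print(growth_factor(15))

# The same trend expressed as a yearly multiplier.
print(round(annual_rate(), 2))

# One decade at exactly 2X per 1.5 years.
print(round(growth_factor(10), 1))
```

One decade at exactly 2X per 1.5 years comes out near 100X, so the 120X quoted for disks implies doubling slightly faster than every 1.5 years, which matches the "> 2X" on the slide.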

6 Integrated Circuit Costs
- Die cost = Wafer cost / (Dies per wafer × Die yield)
- Die cost goes roughly with the cube of the die area: a larger die means fewer dies per wafer, and yield gets worse as die area grows
- [Figure: wafer showing dies and flaws]
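The die cost relation can be checked with a few lines of arithmetic. This is a minimal sketch: the $2000 wafer cost is the figure quoted in the deck for an 8" wafer, while the 100 dies per wafer and 20% yield are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def die_cost(wafer_cost, dies_per_wafer, die_yield):
    """Cost of one good die: wafer cost spread over the dies that work."""
    good_dies = dies_per_wafer * die_yield
    return wafer_cost / good_dies


# Hypothetical example: 100 dies on a $2000 wafer, 20% of them good.
print(die_cost(wafer_cost=2000, dies_per_wafer=100, die_yield=0.20))

# Halving the yield doubles the cost of each good die.
print(die_cost(wafer_cost=2000, dies_per_wafer=100, die_yield=0.10))
```

Only good dies carry the wafer cost, which is why fewer dies per wafer and lower yield both push die cost up.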

7 Die Yield (1993 data)
- Raw dies per wafer, by wafer diameter (6”/15 cm, 20 cm, 25 cm) and die area (mm^2): [table values not preserved in the transcript]
- Die yield across the six die areas: 23%, 19%, 16%, 12%, 11%, 10%
- Typical CMOS process: α=2, wafer yield=90%, defect density=2/cm^2, 4 test sites/wafer
- Good dies per wafer (before testing!), same wafer diameters: [table values not preserved in the transcript]
- Typical cost of an 8”, 4 metal layer, 0.5 um CMOS wafer: ~$2000
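The slide lists the process parameters (α, wafer yield, defect density) but not the yield formula they feed. The sketch below assumes the common textbook model, die yield = wafer yield × (1 + defect density × die area / α)^(-α). Both that formula and the example die areas are assumptions, not values taken from the slide.

```python
def die_yield(die_area_cm2, wafer_yield=0.90, defects_per_cm2=2.0, alpha=2.0):
    """Assumed textbook yield model; defaults are the slide's process parameters."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


# Hypothetical die areas of 1 cm^2 and 2 cm^2.
print(round(die_yield(1.0), 3))
print(round(die_yield(2.0), 3))
```

Under this assumed model, yield falls quickly as the die grows, which matches the downward trend in the yields listed on the slide.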

8 Real World Examples
Chip        Metal layers  Line width  Die cost
386DX       2             0.90        $4
486DX2      3             0.80        $12
PowerPC     -             -           $53
HP PA       -             -           $73
DEC Alpha   3             0.70        $149
SuperSPARC  3             0.70        $272
Pentium     3             0.80        $417
(The wafer cost, defects/cm^2, area in mm^2, dies/wafer, and yield columns, and the entries marked "-", were not preserved in the transcript.)
From “Estimating IC Manufacturing Costs,” by Linley Gwennap, Microprocessor Report, August 2, 1993, p. 15

9 Processor Trends / History
- History of innovations to achieve 2X / 1.5 yr
  - Pipelining (helps seconds / clock, or clock rate)
  - Out-of-Order Execution (helps clocks / instruction)
  - Superscalar (helps clocks / instruction)

10 Pipelining is Natural!
- Laundry example
- Ann, Brian, Cathy, and Dave (A, B, C, D) each have one load of clothes to wash, dry, fold, and put away
- Washer takes 30 minutes
- Dryer takes 30 minutes
- “Folder” takes 30 minutes
- “Stasher” takes 30 minutes to put clothes into drawers

11 Sequential Laundry
- Sequential laundry takes 8 hours for 4 loads
- [Figure: task order A, B, C, D against time, starting at 6 PM, in 30-minute stages]

12 Pipelined Laundry: Start Work ASAP
- Pipelined laundry takes 3.5 hours for 4 loads!
- [Figure: task order A, B, C, D against time, 6 PM to 2 AM axis, in 30-minute stages]
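The 8-hour and 3.5-hour figures follow directly from the four 30-minute stages (washer, dryer, folder, stasher). A minimal sketch of the arithmetic, with illustrative function names:

```python
def sequential_minutes(loads, stages, stage_minutes):
    """Each load runs through every stage before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads, stages, stage_minutes):
    """The first load fills the pipeline; each later load finishes one stage time later."""
    return (stages + loads - 1) * stage_minutes


loads, stages, stage_minutes = 4, 4, 30
print(sequential_minutes(loads, stages, stage_minutes) / 60)  # hours, sequential
print(pipelined_minutes(loads, stages, stage_minutes) / 60)   # hours, pipelined
```

Pipelining does not shorten any single load, which still takes 2 hours. It raises throughput by keeping every stage busy.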

13 Pipeline Hazard: Stall
- A depends on D; stall since folder is tied up
- [Figure: tasks A through F against time, 6 PM to 2 AM axis, in 30-minute stages, with a bubble in the pipeline]

14 Out-of-Order Laundry: Don’t Wait
- A depends on D; the rest continue; more resources are needed to allow out-of-order execution
- [Figure: tasks A through F against time, 6 PM to 2 AM axis, in 30-minute stages, with a bubble for the stalled task only]

15 Superscalar Laundry: Parallel per Stage
- More resources; does the HW match the mix of parallel tasks?
- [Figure: tasks A through F against time, 6 PM to 2 AM axis, in 30-minute stages; loads labeled light clothing, dark clothing, and very dirty clothing run in parallel]

16 Superscalar Laundry: Mismatch Mix
- Task mix underutilizes extra resources
- [Figure: tasks A through D against time, 6 PM to 2 AM axis, in 30-minute stages; loads labeled light clothing, dark clothing, light clothing]

17 State of the Art: Alpha
- 15M transistors
- Two 64KB caches on chip; 16MB L2 cache off chip
- Clock 600 MHz
- 90 watts
- Superscalar: fetches up to 6 instructions/clock cycle, retires up to 4 instructions/clock cycle
- Execution out-of-order

18 Other Example: Sony Playstation 2
- Emotion Engine: 6.2 GFLOPS, 75 million polygons per second (Microprocessor Report, 13:5)
  - Superscalar MIPS core + vector coprocessor + graphics/DRAM
  - Claim: “Toy Story” realism brought to games

19 The Goal: Illusion of Large, Fast, Cheap Memory
- Fact: large memories are slow, fast memories are small
- How do we create a memory that is large, cheap, and fast (most of the time)?
- Hierarchy of levels
  - Similar to the Principle of Abstraction: hide details of multiple levels

20 Hierarchy Analogy: Term Paper
- Working on a paper in the library at a desk
- Option 1: every time you need a book
  - Leave desk to go to shelves (or stacks)
  - Find the book
  - Bring one book back to desk
  - Read the section you are interested in
  - When done with the section, leave desk and go to shelves carrying the book
  - Put the book back on the shelf
  - Return to desk to work
  - Next time you need a book, go to the first step

21 Hierarchy Analogy: Library
- Option 2: every time you need a book
  - Leave some books on the desk after fetching them
  - Only go to the shelves when you need a new book
  - When you go to the shelves, bring back related books in case you need them; sometimes you’ll need to return books not used recently to make space for new books on the desk
  - Return to desk to work
  - When done, replace books on shelves, carrying as many as you can per trip
- Illusion: whole library on your desktop
- Buzzword “cache” from French for hidden treasure

22 Why Hierarchy Works: Natural Locality
- The Principle of Locality:
  - Programs access a relatively small portion of the address space at any instant of time
- [Figure: probability of reference plotted over the address space, from 0 to 2^n - 1]
- What programming constructs lead to the Principle of Locality?

23 Memory Hierarchy: How Does it Work?
- Temporal Locality (Locality in Time): ⇒ keep most recently accessed data items closer to the processor
  - Library analogy: recently read books are kept on the desk
  - Block is the unit of transfer (like a book)
- Spatial Locality (Locality in Space): ⇒ move blocks consisting of contiguous words to the upper levels
  - Library analogy: bring back nearby books on the shelves when you fetch a book; hope that you might need them later for your paper
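Both kinds of locality show up in a small simulation. The sketch below models the "desk" as a few blocks of contiguous words and evicts the least recently used block when space runs out, echoing the library analogy's "return books not used recently". The block size, capacity, and access patterns are all illustrative assumptions.

```python
from collections import OrderedDict


def hit_rate(addresses, block_words=4, capacity_blocks=8):
    """Fraction of accesses found in a small cache of blocks with LRU eviction."""
    cache = OrderedDict()  # block number -> None, ordered oldest to newest use
    hits = 0
    for addr in addresses:
        block = addr // block_words  # a block holds contiguous words
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            cache[block] = None
            if len(cache) > capacity_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(addresses)


array_walk = list(range(64))              # spatial locality: neighbors share a block
small_loop = list(range(16)) * 4          # temporal locality: same words reused
scattered = [i * 4 for i in range(64)]    # every access touches a new block

print(hit_rate(array_walk))   # 0.75: one miss per 4-word block
print(hit_rate(small_loop))   # 0.9375: only the first touch of each block misses
print(hit_rate(scattered))    # 0.0: no reuse and no shared blocks
```

Loops and sequential array walks are the programming constructs that produce these favorable patterns.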

24 Memory Hierarchy Pyramid
- [Figure: pyramid of levels in the memory hierarchy below the Central Processor Unit (CPU): Level 1 (“upper”), Level 2, Level 3, ..., Level n (“lower”); width shows the size of memory at each level]
- Moving down: increasing distance from CPU, decreasing cost / MB
- Data cannot be in level i unless it is also in level i+1

25 Big Idea of Memory Hierarchy
- Temporal locality: keep recently accessed data items closer to the processor
- Spatial locality: move contiguous words in memory to upper levels of the hierarchy
- Uses smaller and faster memory technologies close to the processor
  - Fast hit time in highest level of hierarchy
  - Cheap, slow memory furthest from processor
- If the hit rate is high enough, the hierarchy has an access time close to that of the highest (and fastest) level and a size equal to that of the lowest (and largest) level
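The claim that a high hit rate brings access time close to the fastest level can be illustrated with the usual two-level average. In this sketch the 1 ns and 100 ns access times are hypothetical, and the model (every access pays the upper-level time, misses also pay the lower-level time) is a simplifying assumption.

```python
def average_access_ns(hit_rate, upper_ns=1.0, lower_ns=100.0):
    """Average access time for a two-level hierarchy under the assumed model."""
    miss_rate = 1.0 - hit_rate
    return upper_ns + miss_rate * lower_ns


for rate in (0.50, 0.90, 0.99):
    print(rate, round(average_access_ns(rate), 2))
```

With these hypothetical numbers, a 99% hit rate gives an average of about 2 ns, close to the fast level, while a 50% hit rate gives about 51 ns.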

26 Disk Description / History
- 1973: 1.7 Mbit/sq. in., 140 MBytes
- 1979: 7.7 Mbit/sq. in., 2,300 MBytes
- Source: New York Times, 2/23/98, page C3, “Makers of disk drives crowd even more data into even smaller spaces”
- [Figure: disk drive with labeled sector, track, cylinder, head, platter, arm, embedded processor (ECC, SCSI), and track buffer]

27 Disk History
- 1989: 63 Mbit/sq. in., 60,000 MBytes
- 1997: 1450 Mbit/sq. in., 2300 MBytes (2.5” diameter)
- Source: N.Y. Times, 2/23/98, page C3
- 1997: 3090 Mbit/sq. in., [capacity missing] MBytes (3.5” diameter)
- 2000: 10,100 Mbit/sq. in., 25,000 MBytes
- 2000: 11,000 Mbit/sq. in., 73,400 MBytes

28 State of the Art: Ultrastar 72ZX
- 73.4 GB, 3.5 inch disk
- 2¢/MB
- 16 MB track buffer
- 11 platters, 22 surfaces
- 15,110 cylinders
- 7 Gbit/sq. in. areal density
- 17 watts (idle)
- 0.1 ms controller time
- 5.3 ms avg. seek (seek 1 track => 0.6 ms)
- 3 ms = 1/2 rotation
- 37 to 22 MB/s to media
- Source: 2/14/00
- Latency = Queuing time + Controller time + Seek time + Rotation time + Size / Bandwidth
  - Queuing, controller, seek, and rotation times are per access; Size / Bandwidth is per byte
- [Figure: disk drive with labeled sector, track, cylinder, head, platter, arm, embedded processor, and track buffer]
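Plugging the drive's numbers into the latency formula shows which terms dominate. In this sketch the controller, seek, and rotation times and the 22 to 37 MB/s media rates come from the slide. The 4096-byte request size, the zero queuing time, and the reading of MB as 10^6 bytes are assumptions for illustration.

```python
def disk_latency_ms(size_bytes, bandwidth_mb_per_s, queuing_ms=0.0,
                    controller_ms=0.1, seek_ms=5.3, rotation_ms=3.0):
    """Latency = queuing + controller + seek + rotation + size / bandwidth."""
    # Transfer time in ms, assuming 1 MB = 1,000,000 bytes.
    transfer_ms = size_bytes / (bandwidth_mb_per_s * 1_000_000) * 1000
    return queuing_ms + controller_ms + seek_ms + rotation_ms + transfer_ms


slow = disk_latency_ms(4096, 22)  # slowest media rate on the slide
fast = disk_latency_ms(4096, 37)  # fastest media rate on the slide
print(f"{slow:.2f} ms at 22 MB/s, {fast:.2f} ms at 37 MB/s")
```

For a small request the per-access terms (8.4 ms here) dwarf the transfer time, so the media bandwidth barely changes the total.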

29 A Glimpse into the Future?
- IBM microdrive for digital cameras
  - 340 MBytes
- Disk target in 5-7 years?

30 Questions? Contact us if you’re interested: