Five Components of a Computer

Five Components of a Computer
The five classic components are Control, Datapath, Memory, Input, and Output.
Processor: Control + Datapath
Memory (passive): where programs and data live when running
Input devices: e.g., keyboard, mouse
Output devices: e.g., display, printer
Disk: where programs and data live when not running

Processor-Memory Performance Gap
Processor performance improved about 55% per year (2X/1.5 yr, "Moore's Law"), while DRAM latency improved only about 7% per year (2X/10 yrs), so the processor-memory performance gap grows about 50% per year. The memory baseline is a 64 KB DRAM in 1980, with three years to the next generation until 1996 and two years thereafter, at a 7% per year improvement in latency. The processor curve assumes a 35% improvement per year until 1986, then 55% until 2003, then 5%. The processor needs an instruction and a data word every clock cycle. In 1980 there were no caches (and no need for them); by 1995 most systems had two-level caches (e.g., 60% of the transistors on the Alpha 21164 were in the caches).

The Memory Hierarchy Goal
Fact: large memories are slow, and fast memories are small. How do we create a memory that gives the illusion of being large, cheap, and fast (most of the time)? With hierarchy and with parallelism.

Memory Caching
The mismatch between processor and memory speeds leads us to add a new level: a memory cache. It is implemented with the same IC processing technology as the CPU (usually integrated on the same chip): faster but more expensive than DRAM memory. The cache holds a copy of a subset of main memory. Most processors have separate caches for instructions and data.

Memory Technology
Static RAM (SRAM):  0.5 ns - 2.5 ns,  $2000 - $5000 per GB
Dynamic RAM (DRAM): 50 ns - 70 ns,    $20 - $75 per GB
Magnetic disk:      5 ms - 20 ms,     $0.20 - $2 per GB
Ideal memory: the access time of SRAM with the capacity and cost/GB of disk.

Principle of Locality
Programs access a small proportion of their address space at any time. Temporal locality: items accessed recently are likely to be accessed again soon (e.g., instructions in a loop). Spatial locality: items near those accessed recently are likely to be accessed soon (e.g., sequential instruction access, array data).
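
Both kinds of locality are easy to see in ordinary code. A minimal, hypothetical C example (not from the slides):

    #include <stdio.h>

    #define N 1024

    int main(void) {
        int a[N];
        int sum = 0;
        for (int i = 0; i < N; i++)
            a[i] = i;          /* sequential writes: spatial locality */
        for (int i = 0; i < N; i++)
            sum += a[i];       /* a[i], a[i+1], ... are adjacent in memory
                                  (spatial locality); i, sum, and the loop
                                  instructions are reused every iteration
                                  (temporal locality) */
        printf("%d\n", sum);
        return 0;
    }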

Taking Advantage of Locality
Memory hierarchy: store everything on disk; copy recently accessed (and nearby) items from disk to a smaller DRAM memory (main memory); copy more recently accessed (and nearby) items from DRAM to a smaller SRAM memory (the cache attached to the CPU).

Memory Hierarchy Levels

Memory Hierarchy Analogy: Library
You're writing a term paper (you are the processor) at a table in the library. The library is the disk: essentially limitless capacity, but very slow to retrieve a book. The table is main memory: smaller capacity, so you must return a book when the table fills up, but it is easier and faster to find a book there once you've already retrieved it.

Memory Hierarchy Analogy
Open books on the table are the cache: even smaller capacity (only a few open books fit on the table, and again, when the table fills up you must close a book), but much, much faster to retrieve data from. The illusion created: the whole library open on the tabletop. Keep as many recently used books open on the table as possible, since you are likely to use them again; also keep as many books on the table as possible, since that is faster than going back to the library.

Memory Hierarchy Levels
Block (aka line): the unit of copying; may be multiple words. If accessed data is present in the upper level, it is a hit: the access is satisfied by the upper level; hit ratio = hits/accesses. If accessed data is absent, it is a miss: the block is copied from the lower level (the time taken is the miss penalty), and then the accessed data is supplied from the upper level; miss ratio = misses/accesses = 1 - hit ratio.

Cache Memory
Cache memory: the level of the memory hierarchy closest to the CPU. Given accesses X1, ..., Xn-1, Xn: how do we know if the data is present? Where do we look?

Direct Mapped Cache
Location determined by address. Direct mapped: only one choice: (Block address) modulo (#Blocks in cache). Since #Blocks is a power of 2, this means using the low-order address bits.

Tags and Valid Bits
How do we know which particular block is stored in a cache location? Store the block address as well as the data; actually, only the high-order bits are needed. These are called the tag. What if there is no data in a location? A valid bit: 1 = present, 0 = not present; initially 0.
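
Putting the last two slides together, here is a sketch of how a direct-mapped lookup splits a byte address. The sizes are illustrative assumptions (8 one-word blocks, matching the example that follows), not fixed by the slides:

    #include <stdint.h>

    #define OFFSET_BITS 2   /* one 4-byte word per block */
    #define INDEX_BITS  3   /* 8 blocks                  */

    static inline uint32_t offset_of(uint32_t addr) {
        return addr & ((1u << OFFSET_BITS) - 1);
    }
    static inline uint32_t index_of(uint32_t addr) {
        return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    }
    static inline uint32_t tag_of(uint32_t addr) {
        return addr >> (OFFSET_BITS + INDEX_BITS);
    }

    /* A hit requires both conditions:
       hit = valid[index_of(a)] && (tag[index_of(a)] == tag_of(a));  */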

Cache Example
8 blocks, 1 word/block, direct mapped. Initial state:

Index | V | Tag | Data
000   | N |     |
001   | N |     |
010   | N |     |
011   | N |     |
100   | N |     |
101   | N |     |
110   | N |     |
111   | N |     |

Cache Example
Word addr 22, binary addr 10 110: miss, placed in cache block 110.

Index | V | Tag | Data
000   | N |     |
001   | N |     |
010   | N |     |
011   | N |     |
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem[10110]
111   | N |     |

Cache Example
Word addr 26, binary addr 11 010: miss, placed in cache block 010.

Index | V | Tag | Data
000   | N |     |
001   | N |     |
010   | Y | 11  | Mem[11010]
011   | N |     |
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem[10110]
111   | N |     |

Cache Example
Word addr 22 (binary 10 110): hit in block 110. Word addr 26 (binary 11 010): hit in block 010. Cache contents are unchanged:

Index | V | Tag | Data
000   | N |     |
001   | N |     |
010   | Y | 11  | Mem[11010]
011   | N |     |
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem[10110]
111   | N |     |

Cache Example
Word addr 16 (binary 10 000): miss, block 000. Word addr 3 (binary 00 011): miss, block 011. Word addr 16 again: hit in block 000.

Index | V | Tag | Data
000   | Y | 10  | Mem[10000]
001   | N |     |
010   | Y | 11  | Mem[11010]
011   | Y | 00  | Mem[00011]
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem[10110]
111   | N |     |

Cache Example
Word addr 18, binary addr 10 010: miss; index 010 already holds tag 11 (Mem[11010]), so that block is replaced.

Index | V | Tag | Data
000   | Y | 10  | Mem[10000]
001   | N |     |
010   | Y | 10  | Mem[10010]
011   | Y | 00  | Mem[00011]
100   | N |     |
101   | N |     |
110   | Y | 10  | Mem[10110]
111   | N |     |

Address Subdivision

Bits in a Cache
Example: how many total bits are required for a direct-mapped cache with 16 KB of data and 4-word blocks, assuming a 32-bit address? (DONE IN CLASS) In general, with a 32-bit address, a cache of 2^n blocks, and a block size of 2^m words: what is the size of the tag? What is the total number of bits in the cache?
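
The general derivation, sketched here (the class fills in the arithmetic): the 32-bit byte address splits into 2 byte-offset bits, m word-offset bits, n index bits, and a tag of the remaining bits, so

    tag bits   = 32 - (n + m + 2)
    total bits = 2^n × (2^m × 32 + (32 - n - m - 2) + 1)

For the 16 KB / 4-word example: 16 KB = 2^12 words, so with 2^2 words per block there are 2^10 blocks (n = 10, m = 2). The tag is 32 - 14 = 18 bits, and the total is 2^10 × (128 + 18 + 1) = 147 Kbits, about 18.4 KB for 16 KB of data.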

Problem 1 (DONE IN CLASS)
For a direct-mapped cache design with a 32-bit address, the following address bits are used to access the cache:

Tag: bits 31-10    Index: bits 9-4    Offset: bits 3-0

What is the cache line size (in words)? How many entries does the cache have? How big is the data in the cache?
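
Since the field widths are given, the answers follow directly (a sketch of what is worked in class): the 4 offset bits address 2^4 = 16 bytes = 4 words per line; the 6 index bits give 2^6 = 64 entries; and the data capacity is 64 × 16 bytes = 1 KB.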

Problem 2
Below is a list of 32-bit memory addresses, given as WORD addresses: 1, 134, 212, 1, 135, 213, 162, 161, 2, 44, 41, 221. For each of these references, identify the binary address, the tag, and the index for a direct-mapped cache with 16 one-word blocks; also list whether each reference is a hit or a miss. Then do the same for a direct-mapped cache with two-word blocks and a total size of 8 blocks. (DONE IN CLASS)

Problem 3
Below is a list of 32-bit memory addresses, given as BYTE addresses: 1, 134, 212, 1, 135, 213, 162, 161, 2, 44, 41, 221. For each of these references, identify the binary address, the tag, and the index for a direct-mapped cache with 16 one-word blocks; also list whether each reference is a hit or a miss. Then do the same for a direct-mapped cache with two-word blocks and a total size of 8 blocks. (DONE IN CLASS)
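
A minimal checker for the first part of Problem 2, offered as a hypothetical helper: it assumes word addresses and the 16-entry, one-word-block direct-mapped cache, so index = address mod 16 and the tag is the rest:

    #include <stdio.h>
    #include <stdint.h>

    #define ENTRIES 16

    int main(void) {
        uint32_t refs[] = {1, 134, 212, 1, 135, 213, 162, 161, 2, 44, 41, 221};
        uint32_t tag[ENTRIES];
        int valid[ENTRIES] = {0};
        int n = sizeof refs / sizeof refs[0];

        for (int i = 0; i < n; i++) {
            uint32_t idx = refs[i] % ENTRIES;   /* low-order 4 bits */
            uint32_t t   = refs[i] / ENTRIES;   /* remaining bits   */
            int hit = valid[idx] && tag[idx] == t;
            printf("addr %3u -> index %2u, tag %2u: %s\n",
                   refs[i], idx, t, hit ? "hit" : "miss");
            valid[idx] = 1;                     /* fill on miss, refresh on hit */
            tag[idx]   = t;
        }
        return 0;
    }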

Associative Caches
Fully associative: a given block can go in any cache entry; all entries must be searched at once; one comparator per entry (expensive).
n-way set associative: each set contains n entries; the block number determines the set: (Block number) modulo (#Sets in cache); all entries in the given set are searched at once; n comparators (less expensive).

Associative Cache Example

Spectrum of Associativity For a cache with 8 entries

Misses and Associativity in Caches
Example: assume there are three small caches (direct mapped, two-way set associative, fully associative), each consisting of four one-word blocks. Find the number of misses for each cache organization given the following sequence of block addresses: 0, 8, 0, 6, 8. (DONE IN CLASS)

Associativity Example
Compare 4-block caches: direct mapped, 2-way set associative, fully associative. Block access sequence: 0, 8, 0, 6, 8.

Direct mapped (index = block address modulo 4):

Block addr | Cache index | Hit/miss | Cache content after access
0          | 0           | miss     | Mem[0]
8          | 0           | miss     | Mem[8]
0          | 0           | miss     | Mem[0]
6          | 2           | miss     | Mem[0], Mem[6]
8          | 0           | miss     | Mem[8], Mem[6]

5 misses.

Associativity Example
2-way set associative (set = block address modulo 2, LRU replacement):

Block addr | Set | Hit/miss | Set 0 content
0          | 0   | miss     | Mem[0]
8          | 0   | miss     | Mem[0], Mem[8]
0          | 0   | hit      | Mem[0], Mem[8]
6          | 0   | miss     | Mem[0], Mem[6]
8          | 0   | miss     | Mem[8], Mem[6]

4 misses.

Fully associative:

Block addr | Hit/miss | Cache content after access
0          | miss     | Mem[0]
8          | miss     | Mem[0], Mem[8]
0          | hit      | Mem[0], Mem[8]
6          | miss     | Mem[0], Mem[8], Mem[6]
8          | hit      | Mem[0], Mem[8], Mem[6]

3 misses.

Replacement Policy
Direct mapped: no choice. Set associative: prefer a non-valid entry, if there is one; otherwise choose among the entries in the set. Least-recently used (LRU): choose the one unused for the longest time; simple for 2-way, manageable for 4-way, too hard beyond that. Most-recently used (MRU). Random: gives approximately the same performance as LRU for high associativity.
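
For 2-way set associative, LRU needs only one bit per set, flipped on every access. A sketch with illustrative names and sizes (not from the slides):

    #define NUM_SETS 64   /* illustrative */

    /* lru[s] holds the way (0 or 1) of set s that was used least
       recently, i.e., the replacement victim. */
    static int lru[NUM_SETS];

    /* Call on every hit or fill: the other way becomes the LRU one. */
    static void touch(int set, int way) { lru[set] = 1 - way; }

    /* Way to evict on a miss when both ways are valid. */
    static int victim(int set) { return lru[set]; }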

Set Associative Cache Organization

Problem 4 (DONE IN CLASS)
Identify the index bits, the tag bits, and the block-offset bits for a 3-way set-associative cache with 2-word blocks and a total size of 24 words. How about a 3-way set-associative cache with 8-byte blocks and a total size of 96 bytes?
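
A sketch of the first part, assuming byte-addressed 32-bit addresses (the problem does not fix this): 24 words / 2 words per block = 12 blocks; 12 blocks / 3 ways = 4 sets, so 2 index bits; a 2-word (8-byte) block needs 3 offset bits; the tag is the remaining 32 - 2 - 3 = 27 bits.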

Problem 5 (DONE IN CLASS)
Identify the index bits, the tag bits, and the block-offset bits for a 3-way set-associative cache with 4-word blocks and a total size of 24 words. How about a 3-way set-associative cache with 16-byte blocks and 2 sets? How about a fully associative cache with 1-word blocks and a total size of 8 words? How about a fully associative cache with 2-word blocks and a total size of 8 words?

Problem 6 (DONE IN CLASS)
Identify the index bits, the tag bits, and the block-offset bits for a 3-way set-associative cache with 4-word blocks and a total size of 24 words. How about a 3-way set-associative cache with 16-byte blocks and 2 sets?

Problem 7
Below is a list of 32-bit memory addresses, given as WORD addresses: 1, 134, 212, 1, 135, 213, 162, 161, 2, 44, 41, 221. For each of these references, identify the index bits, the tag bits, and the block-offset bits for a 3-way set-associative cache with 2-word blocks and a total size of 24 words. Show whether each reference is a hit or a miss, assuming LRU replacement, and show the final cache contents. How about a fully associative cache with 1-word blocks and a total size of 8 words? (DONE IN CLASS)

Problem 8
Below is a list of 32-bit memory addresses, given as WORD addresses: 1, 134, 212, 1, 135, 213, 162, 161, 2, 44, 41, 221. What is the miss rate of a fully associative cache with 2-word blocks and a total size of 8 words, using LRU replacement? What is the miss rate using MRU replacement? (DONE IN CLASS)

Replacement Algorithms (1): Direct Mapping
No choice: each block maps to only one line, so replace that line.

Replacement Algorithms (2): Associative & Set Associative
Hardware-implemented algorithms (for speed). Least recently used (LRU): e.g., in a 2-way set-associative cache, which of the 2 blocks is the LRU? First in first out (FIFO): replace the block that has been in the cache longest. Least frequently used: replace the block that has had the fewest hits. Random.

Write Policy
A cache block must not be overwritten unless main memory is up to date. Complications: multiple CPUs may have individual caches, and I/O may address main memory directly.

Write through
All writes go to main memory as well as the cache. Multiple CPUs can monitor main-memory traffic to keep their local caches up to date. Disadvantages: lots of traffic, and writes are slowed down.

Write back
Updates are initially made in the cache only; an update (dirty) bit for the cache slot is set when an update occurs. If a block is to be replaced, it is written to main memory only if its update bit is set. Drawbacks: other caches can get out of sync, and I/O must access main memory through the cache. N.B. about 15% of memory references are writes.
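
The two policies differ only in when main memory is updated. A side-by-side sketch; cache_data, dirty, memory_write, and block_addr are hypothetical helpers, not from the slides:

    #include <stdint.h>

    extern uint32_t cache_data[];                        /* one word per block   */
    extern int      dirty[];                             /* write-back update bit */
    void     memory_write(uint32_t addr, uint32_t val);  /* hypothetical          */
    uint32_t block_addr(int idx);                        /* hypothetical          */

    /* Write-through: memory is updated on every store, so an evicted
       block can simply be dropped. */
    void write_through(int idx, uint32_t addr, uint32_t val) {
        cache_data[idx] = val;
        memory_write(addr, val);   /* traffic on every write */
    }

    /* Write-back: only the cache is updated; the update (dirty) bit
       defers the memory write until the block is evicted. */
    void write_back(int idx, uint32_t val) {
        cache_data[idx] = val;
        dirty[idx] = 1;
    }

    void evict(int idx) {
        if (dirty[idx]) {          /* write back only if modified */
            memory_write(block_addr(idx), cache_data[idx]);
            dirty[idx] = 0;
        }
    }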

Block / line sizes
How much data should be transferred from main memory to the cache in a single memory reference? There is a complex relationship between block size and hit ratio, as well as with the operation of the system bus itself. As block size increases, locality of reference predicts that the additional information transferred will likely be used, which increases the hit ratio (good).

Block / line sizes
But the number of blocks in the cache goes down, limiting the total number of blocks in the cache (bad), and as the block size gets big, the probability of referencing all the data in it goes down, so the hit ratio drops (bad). A size of 4-8 addressable units seems about right for current systems.

Number of Caches (Single vs. 2-Level)
Modern CPU chips have on-board cache (L1, internal cache): 80486: 8 KB; Pentium: 16 KB; PowerPC: up to 64 KB. L1 provides the best performance gains. A secondary, off-chip cache (L2) provides higher-speed access to main memory; L2 is generally 512 KB or less, since more than this is not cost-effective.

Unified Cache
A unified cache stores data and instructions in one cache, so there is only one cache to design and operate. The cache is flexible and can balance the allocation of space between instructions and data to best fit the execution of the program, giving a higher hit ratio.

Split Cache
A split cache uses two caches: one for instructions and one for data. Two caches must be built and managed, and the allocation of cache sizes is static. It can outperform a unified cache in systems that support parallel execution and pipelining, since it reduces cache contention.

Some Cache Architectures

Virtual Memory

Virtual Memory
For instructions to be executed or data to be accessed, the corresponding segment of the program first has to be loaded into main memory; if memory is full, it has to replace another segment already in memory. The movement of programs and data between main memory and secondary storage is performed automatically by the operating system. These techniques are called virtual-memory techniques.

Virtual Memory Organization
The virtual program space (instructions + data) is divided into equal, fixed-size chunks called pages. Physical main memory is organized as a sequence of frames; a page can be assigned to an available frame to be stored (page size = frame size). The page is the basic unit of information moved between main memory and disk by the virtual memory system.

Demand Paging
The program consists of a large number of pages, which are stored on disk; at any one time, only a few pages have to be in main memory. The operating system is responsible for loading and replacing pages so that the number of page faults is minimized.

Demand Paging
A page fault occurs when the CPU refers to a location in a page that is not in main memory; that page then has to be loaded and, if there is no available frame, it must replace a page that was previously in memory.

Address Translation
Accessing a word in memory involves the translation of a virtual address into a physical one:
- virtual address: page number + offset
- physical address: frame number + offset
Address translation is performed by the MMU using a page table.
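
A sketch of the translation step itself, assuming 32-bit virtual addresses and 4 KB pages (the page size is an assumption; the slides do not fix one). page_fault stands in for the OS handler:

    #include <stdint.h>

    #define PAGE_BITS 12                    /* 4 KB pages: 12-bit offset */

    typedef struct {
        unsigned valid : 1;                 /* page loaded in memory?    */
        uint32_t frame;                     /* frame number if valid     */
    } pte_t;

    extern pte_t page_table[1u << (32 - PAGE_BITS)];   /* 2^20 entries            */
    uint32_t page_fault(uint32_t vaddr);               /* hypothetical OS handler */

    uint32_t translate(uint32_t vaddr) {
        uint32_t vpn = vaddr >> PAGE_BITS;              /* page number */
        uint32_t off = vaddr & ((1u << PAGE_BITS) - 1); /* offset      */
        if (!page_table[vpn].valid)
            return page_fault(vaddr);                   /* load page, then retry */
        return (page_table[vpn].frame << PAGE_BITS) | off;
    }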

Example

Address Translation

The Page Table
The page table has one entry for each page of the virtual memory space. Each entry holds the address of the memory frame that stores the respective page, if that page is in main memory. If every page table entry is around 4 bytes, how big is the page table?
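
A worked instance of that question, under two stated assumptions (32-bit virtual addresses, 4 KB pages): there are 2^32 / 2^12 = 2^20 pages, so the table has 2^20 entries × 4 bytes = 4 MB per process.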

The Page Table
Each entry of the page table also includes some control bits that describe the status of the page: whether the page is actually loaded into main memory; whether the page has been modified since it was last loaded; information concerning the frequency of access; etc.

Memory Reference with Virtual Memory

Memory Reference with Virtual Memory
A memory access is handled by hardware, except for the page-fault sequence, which is executed by OS software. The hardware unit responsible for translating a virtual address into a physical one is the Memory Management Unit (MMU).

Translation Lookaside Buffer
Every virtual memory reference causes two physical memory accesses: one to fetch the page table entry, and one to fetch the data. The fix is a special cache for page table entries: the translation lookaside buffer (TLB).
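
A direct-mapped TLB lookup is just the earlier cache lookup applied to page numbers. A sketch with illustrative sizes and names (not from the slides):

    #include <stdint.h>

    #define TLB_ENTRIES 64   /* illustrative */

    typedef struct {
        int      valid;
        uint32_t vpn;        /* virtual page number (acts as the tag) */
        uint32_t frame;      /* cached translation                    */
    } tlb_entry_t;

    static tlb_entry_t tlb[TLB_ENTRIES];

    /* Returns 1 and sets *frame on a hit; 0 means walk the page table
       (and then fill this TLB entry). */
    int tlb_lookup(uint32_t vpn, uint32_t *frame) {
        tlb_entry_t *e = &tlb[vpn % TLB_ENTRIES];
        if (e->valid && e->vpn == vpn) {
            *frame = e->frame;
            return 1;
        }
        return 0;
    }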

Fast Translation Using a TLB

TLB and Cache Interaction

Pentium II Address Translation Mechanism

Page Replacement
When a new page is loaded into main memory and there is no free memory frame, an existing page has to be replaced. The decision on which page to replace is based on the same considerations as those for the replacement of blocks in cache memory; the LRU strategy is often used to decide which page to replace.

Page Replacement
When the content of a page loaded into main memory has been modified as a result of a write, it has to be written back to disk when it is replaced. One of the control bits in the page table is used to signal that the page has been modified.