Memories and the Memory Subsystem; The Memory Hierarchy; Caching; ROM.

Presentation transcript:

Memories and the Memory Subsystem; The Memory Hierarchy; Caching; ROM

Memory:
Some embedded systems require large amounts of memory; others have small memory requirements.
Often must use a hierarchy of memory devices.
Memory allocation may be static or dynamic.
Main concerns [in embedded systems]:
--make sure allocation is safe
--minimize overhead
Main points to remember:
--choose the appropriate memory for the task at hand
--make appropriate use of dynamic memory management: caching (can be multiple level), virtual storage (paging)
--if building hardware, choose an appropriate busing strategy, including bus width
--may need to add extra bits for error detection, error correction
--may need to add extra bits for security
--may need to use compression of data or of code

Memory types:
RAM
--DRAM—asynchronous; needs refreshing
--SRAM—asynchronous; no refreshing
--Semistatic RAM
--SDRAM—synchronous DRAM
ROM—read-only memory
--PROM—one-time programmable
--EPROM—reprogrammable (erased with UV light)
--EEPROM—electrical reprogramming
--FLASH—reprogrammable without removing from the circuit
Altera chips: memory blocks with a parity bit (supports error checking); synchronous, can emulate asynchronous; can be used as:
--single-port—nonsimultaneous read/write
--simple dual-port—simultaneous read/write
--"true" dual-port (bidirectional)—2 reads; 2 writes; one read and one write at different frequencies
--shift register
--ROM
--FIFO
--flash memory

fig_04_01 Standard memory configuration:
--memory as a "virtual array"
--address decoder
--signals: address, data, control

fig_04_02 ROM—usually read-only (some are programmable; holds "firmware"); each cell is a transistor storing a 0 or 1

fig_04_04 SRAM—similar to ROM; in this example, 6 transistors per cell (compare to a flip-flop?)

fig_04_06 Dynamic RAM: only 1 transistor per cell
--a READ causes the transistor to discharge; the cell must be restored each time
--refresh cycle time is determined by the part specification

fig_04_08 Comparison—SRAM / DRAM

fig_04_11 Two important time intervals: access time and cycle time

fig_04_12 Terminology for memory systems:
--Block: logical unit of transfer; block size
--Page—logical unit; a collection of blocks
--Bandwidth—word transfer rate on the I/O bus (memory can be organized in bits, bytes, words)
--Latency—time to access the first word in a sequence
--Block access time—time to access the entire block
--"virtual" storage

Memory interface—restrictions which must be dealt with:
--size of RAM or ROM
--width of address and data I/O lines

fig_04_14 Memory example: 4K x 16 SRAM
--uses 2 8-bit SRAMs (to achieve the desired 16-bit word size)
--uses 4 1K blocks (to achieve the desired number of words)
Address: 10 bits select a word within a block; 2 bits specify the block—CS (chip select)
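As an illustration of the address decoding this implies, here is a minimal sketch (the 12-bit address width, variable names, and sample value are assumptions, not from the text): the two high-order address bits select one of the four 1K blocks via chip select, and the low ten bits address a word within that block.

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch: decode a 12-bit address for a 4K x 16 memory built from four
 * 1K blocks; the upper 2 bits drive the chip selects (CS0..CS3), the
 * lower 10 bits address a word within the selected block. */
int main(void)
{
    uint16_t addr   = 0x9A3;                /* example 12-bit address (assumed) */
    unsigned block  = (addr >> 10) & 0x3;   /* which 1K block: chip select */
    unsigned offset =  addr & 0x3FF;        /* word within the block */
    printf("address 0x%03X -> CS%u, offset 0x%03X\n",
           (unsigned)addr, block, offset);
    return 0;
}
```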

If there are insufficient I/O lines, signals must be multiplexed and held in registers until the data is accumulated (common in embedded system applications). This typically requires an MAR/MDR (memory address register / memory data register) configuration.

DRAM variations available: EDO, SDRAM, FPM—basically DRAMs trying to accommodate ever-faster processors
Techniques:
--synchronize the DRAM to the system clock
--improve block accessing
--allow pipelining
As with SRAM, there are typically insufficient I/O pins and multiplexing must be used

fig_04_30 Memory organization: typical "memory map"; provision for power loss

Issue in embedded systems design: stack overflow. Example: should recursion be used? (See the sketch below.)
Control structures ("sequential" plus):
  Primitive            Structured programming
  GOTO                 choice (if-else, case)
  Conditional GOTO     iteration (pre-test, post-test)
  ?                    recursion?
[Functions, macros: how do these fit into the list of control structures?]
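A small hedged illustration of the recursion question (the factorial function and the value 10 are invented for the example): every recursive call consumes a stack frame, so on a small embedded stack the iterative form, with constant stack usage, is generally the safer choice.

```c
#include <stdint.h>
#include <stdio.h>

/* Recursive form: stack depth grows with n, which can overflow a small
 * embedded stack if n is not bounded. */
static uint32_t factorial_recursive(uint32_t n)
{
    return (n <= 1) ? 1 : n * factorial_recursive(n - 1);
}

/* Iterative form: constant stack usage regardless of n. */
static uint32_t factorial_iterative(uint32_t n)
{
    uint32_t result = 1;
    while (n > 1)
        result *= n--;
    return result;
}

int main(void)
{
    printf("%u %u\n", (unsigned)factorial_recursive(10),
                      (unsigned)factorial_iterative(10));
    return 0;
}
```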

fig_04_31 Memory hierarchy

fig_04_32 Paging / Caching
Why it typically works: locality of reference (spatial/temporal); the "working set"
Notes:
1. In real-time embedded systems, behavior may be atypical, but caching may still be a useful technique: how do you decide whether behavior is "typical" / "atypical"?
2. In all cases, be careful to prevent thrashing

fig_04_33 Typical memory system with cache: hit rate (miss rate) important
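One standard way to quantify why the hit rate matters is the effective (average) access time, t_eff = t_hit + miss_rate * t_miss_penalty. A small sketch with assumed, purely illustrative numbers:

```c
#include <stdio.h>

/* Effective access time = hit time + miss rate * miss penalty.
 * The figures below are illustrative assumptions, not values from the text. */
int main(void)
{
    double hit_time_ns     = 2.0;    /* cache access on a hit            */
    double miss_penalty_ns = 60.0;   /* main-memory access on a miss     */
    double miss_rate       = 0.05;   /* i.e., a 95% hit rate             */

    double t_eff = hit_time_ns + miss_rate * miss_penalty_ns;
    printf("effective access time = %.1f ns\n", t_eff);   /* prints 5.0 ns */
    return 0;
}
```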

Basic caching strategies:
--Direct-mapped
--Associative
--Block-set associative
Questions: what is "associative memory"? what is the overhead? what is the efficiency (hit rate)? is a bigger cache better?

Associative memory: the storage location is related to the data stored.
Example—hashing:
--When a software program is compiled or assembled, a symbol table must be created to link addresses with symbolic names
--The table may be large; even a binary search of the names may be too slow
--Convert each name to a number associated with the name; this number will be the symbol table index
For example, let a = 1, b = 2, c = 3, ...
Then "cab" has value 3 + 1 + 2 = 6, "ababab" has value 3 * (1 + 2) = 9, and "vvvvv" has value 5 * 22 = 110.
The address will be taken modulo a prime p; if we expect about 50 unique identifiers, we can take p = 101 (make the storage about twice as large as the number of items to be stored, to reduce collisions).
Now the array of names in the symbol table will look like:
0—>
1—>
2—>
...
6—>cab
...
9—>ababab—>vvvvv
...
Here there is one collision, at address 9; the two items are stored in a linked list.
Access time for an identifier <= (time to compute the address) + (time to search the longest linked list) ~ constant
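A minimal sketch of the hashing scheme just described (the function name and TABLE_SIZE macro are invented; collisions would be chained in linked lists as in the example):

```c
#include <ctype.h>
#include <stdio.h>

#define TABLE_SIZE 101   /* prime, roughly twice the expected ~50 identifiers */

/* Sum of letter values (a = 1, ..., z = 26), reduced modulo the table size.
 * Names are assumed to be alphabetic. */
static unsigned hash_name(const char *name)
{
    unsigned sum = 0;
    for (; *name; name++)
        sum += (unsigned)(tolower((unsigned char)*name) - 'a' + 1);
    return sum % TABLE_SIZE;
}

int main(void)
{
    /* "cab" -> 6, "ababab" -> 9, "vvvvv" -> 110 mod 101 = 9 (the collision) */
    printf("%u %u %u\n", hash_name("cab"), hash_name("ababab"), hash_name("vvvvv"));
    return 0;
}
```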

Caching: the basic process—note the OVERHEAD for each task
--the program needs information M that is not in the CPU
--the cache is checked for M (how do we know if M is in the cache?)
--hit: M is in the cache and can be retrieved and used by the CPU
--miss: M is not in the cache (M is in RAM or in secondary memory—where is M?)
  * M must be brought into the cache
  * if there is room, M is copied into the cache (how do we know if there is room?)
  * if there is no room, some information M' must be overwritten (how do we select M'?)
    ++ if M' has not been modified, overwrite it (how do we know if M' has been modified?)
    ++ if M' has been modified, its changes must be saved first (how do we save changes to M'?)

fig_04_34 Example: direct mapping
--32-bit words; cache holds 64K words, organized as 128 blocks; memory addresses are 32 bits
--main memory: 128M words; 2K pages, each holding 128 blocks (matching the cache)
fig_04_35 fig_04_36
Address breakdown: 2 bits—byte; 9 bits—word address within the block; 7 bits—block address (index); 11 (of 15)—tag (which page the block is from)
Tag table: 128 entries (one for each block in the cache). Each entry contains:
--Tag: the page the block came from
--Valid bit: does this block contain data?
Write policies:
--write-through: any change is propagated immediately to main memory
--delayed write: since this data may change again soon, do not propagate the change to main memory immediately—this saves overhead; instead, set the dirty bit
--intermediate: use a queue and update main memory periodically
When a new block is brought in, if the valid bit is true and the dirty bit is true, the old block must first be copied back to main memory.
Replacement algorithm: none; each block has only one valid cache location.
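A small sketch of the address-field extraction this breakdown implies (the sample address and variable names are assumptions):

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch of this example's address fields: 2-bit byte offset, 9-bit
 * word-within-block, 7-bit cache block index, 11-bit tag (the page the
 * block came from). The sample address is arbitrary. */
int main(void)
{
    uint32_t addr  = 0x01234ABCu;
    uint32_t byte  =  addr        & 0x3;    /* 2 bits: byte within word    */
    uint32_t word  = (addr >> 2)  & 0x1FF;  /* 9 bits: word within block   */
    uint32_t index = (addr >> 11) & 0x7F;   /* 7 bits: cache block index   */
    uint32_t tag   = (addr >> 18) & 0x7FF;  /* 11 bits: tag (page number)  */
    printf("byte=%u word=%u index=%u tag=0x%X\n",
           (unsigned)byte, (unsigned)word, (unsigned)index, (unsigned)tag);
    return 0;
}
```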

fig_04_37 Problem with direct mapping: two frequently used parts of the code can both be "block 0" of different pages and therefore compete for the same cache block—so repeated swapping would be necessary; this can degrade performance unacceptably, especially in real-time systems (similar to "thrashing" in an operating system's virtual memory system).
Another method—associative mapping: put a new block anywhere in the cache; now we need an algorithm to decide which block should be removed if the cache is full.

fig_04_38 Associative mapping:
Step 1: locate the desired block within the cache; the tag table must be searched, and a linear search may be too slow—search all entries in parallel or use hashing.
Step 2: on a miss, decide which block to replace.
a. Add the time of last access to the tag table information and use temporal locality:
   --Least recently used (LRU)—a FIFO-type algorithm
   --Most recently used (MRU)—a LIFO-type algorithm
b. Choose a block at random
Drawbacks: long search times; complexity and cost of the supporting logic
Advantages: more flexibility in managing cache contents
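A hedged sketch of LRU victim selection (the array of last-use timestamps and its values are invented for illustration; a real cache would update them on every access):

```c
#include <stdio.h>

#define NUM_LINES 8

/* Sketch: pick the replacement victim in an associative cache by
 * least-recently-used order (smallest last-use timestamp). */
static int choose_victim_lru(const unsigned long last_used[], int n)
{
    int victim = 0;
    for (int i = 1; i < n; i++)
        if (last_used[i] < last_used[victim])
            victim = i;
    return victim;
}

int main(void)
{
    /* illustrative timestamps only */
    unsigned long last_used[NUM_LINES] = {40, 7, 55, 23, 90, 3, 61, 18};
    printf("replace line %d\n", choose_victim_lru(last_used, NUM_LINES)); /* line 5 */
    return 0;
}
```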

fig_04_39 Intermediate method: block-set associative cache (the figure shows a two-way set-associative scheme)
--Each index now specifies a set of blocks
--Main memory is divided into m blocks organized into n groups; group number = m mod n
--Cache set number ~ main memory group number: a block from main memory group j can go into cache set j
--Search time is less, since the search space is smaller
How many blocks per set? Simulation gives the answer (one rule of thumb: doubling the associativity is roughly equivalent to doubling the cache size; more than 4-way is probably not efficient)
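A minimal sketch of the set-index calculation (the 64-set, two-way figures and the sample block number are assumed for illustration):

```c
#include <stdio.h>

#define NUM_SETS 64   /* number of groups/sets (assumed figure) */
#define WAYS      2   /* two-way set-associative */

/* Sketch: a main-memory block may only be placed in the set given by
 * (block number mod NUM_SETS); a lookup then searches only the WAYS
 * tags of that set. */
int main(void)
{
    unsigned block = 200;                 /* example block number */
    unsigned set   = block % NUM_SETS;    /* 200 mod 64 = 8 */
    printf("block %u -> set %u (search %d ways)\n", block, set, WAYS);
    return 0;
}
```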

Example: 256K memory, organized as 512 blocks in 64 groups; block m belongs to group (m mod 64), e.g., blocks 0, 64, 128, ... all map to group 0.

fig_04_40 Dynamic memory allocation ("virtual storage"):
--for programs larger than main memory
--for multiple processes in main memory
--for multiple programs in main memory
General strategies may not work well because of the hard deadlines of real-time systems in embedded applications—general strategies are nondeterministic.
Simple setup: can swap processes/programs and their contexts
--need storage (may be in firmware)
--need a small swap time compared to the run time
--need determinism
Ex: chemical processing, thermal control

fig_04_41 Overlays ("pre-virtual storage"):
--Segment the program into one main section and a set of overlays (kept in ROM?)
--Swap overlays as needed
--Choose the segmentation carefully to prevent thrashing

fig_04_42 Multiprogramming: similar to paging
Fixed partition size: can get memory fragmentation
Example: if each partition is 2K and we have 3 jobs, J1 = 1.5K, J2 = 0.5K, J3 = 2.1K, allocated to successive partitions (4 in all):
--J2 is using only 0.5K of its partition
--J3 is using 2 partitions, one holding only 0.1K
If a new job of size 1K enters the system, there is no place for it, even though there is actually enough unused memory for it. (See the sketch below.)
Variable partition size: use a scheme like paging; include compaction; choose parameters carefully to prevent thrashing.
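A small sketch of the fragmentation arithmetic in this example (variable names are invented; the sizes are the ones above):

```c
#include <stdio.h>

/* Sketch: internal fragmentation with fixed 2K partitions and the three
 * jobs from the example (sizes in KB). */
int main(void)
{
    const double partition_kb = 2.0;
    const double jobs_kb[] = {1.5, 0.5, 2.1};   /* J1, J2, J3 */
    double wasted = 0.0;
    int used = 0;

    for (int i = 0; i < 3; i++) {
        int n = (int)(jobs_kb[i] / partition_kb);   /* whole partitions...   */
        if (n * partition_kb < jobs_kb[i])
            n++;                                    /* ...rounded up         */
        used   += n;
        wasted += n * partition_kb - jobs_kb[i];
    }
    printf("%d partitions used, %.1f KB wasted inside them\n", used, wasted);
    /* prints: 4 partitions used, 3.9 KB wasted -- yet a new 1 KB job cannot be placed */
    return 0;
}
```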

Error checking: simple examples
1. Detect a one-bit error: add a parity bit
2. Correct a one-bit error: Hamming code
Example: send m message bits + r parity bits. The number of possible error positions is m + r + 1, so we need 2^r >= m + r + 1. If m = 8, we need r = 4. Parity bit ri checks the parity of the bits whose position number has a 1 in bit i of its binary representation.
Pattern:
Bit #: 1  2  3  4  5  6  7  8  9  10 11 12
Info:  r0 r1 m1 r2 m2 m3 m4 r3 m5 m6 m7 m8
Set the parity of each group to 0 (even parity):
--r0 covers bits 1, 3, 5, 7, 9, 11; for the example message, r0 = 1
--r1 covers bits 2, 3, 6, 7, 10, 11; r1 = 1
--r2 covers bits 4, 5, 6, 7, 12; r2 = 0
--r3 covers bits 8, 9, 10, 11, 12; r3 = 1
Exercise: suppose the message is sent and 1 bit is flipped in the received message; recompute the parity bits to see which bit is incorrect.
Addition: add an overall parity bit to the end of the message to also detect two-bit errors.
Notes:
a. This is just one example; a more general formulation of Hamming codes using finite-field arithmetic can also be given.
b. This is one example of how error-correcting codes can be obtained; there are many more complex examples, e.g., the Reed-Solomon codes used in CD players.
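A hedged sketch of the (12,8) encoding described above (even parity; the message-bit ordering, function name, and sample message value are assumptions, not from the text):

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch: compute the four check bits of a (12,8) Hamming code.
 * Layout follows the slide: positions 1..12, with r0,r1,r2,r3 at
 * positions 1, 2, 4, 8 and message bits m1..m8 in the remaining slots. */
static uint16_t hamming12_encode(uint8_t m)
{
    /* place message bits m1..m8 (m1 = MSB here; this ordering is an assumption) */
    const int msg_pos[8] = {3, 5, 6, 7, 9, 10, 11, 12};
    uint16_t code = 0;
    for (int i = 0; i < 8; i++) {
        int bit = (m >> (7 - i)) & 1;
        code |= (uint16_t)((uint16_t)bit << (msg_pos[i] - 1));
    }
    /* each check bit at position p = 1, 2, 4, 8 covers the positions whose
     * binary representation contains p; choose it so the group parity is even */
    for (int p = 1; p <= 8; p <<= 1) {
        int parity = 0;
        for (int pos = 1; pos <= 12; pos++)
            if ((pos & p) && pos != p)
                parity ^= (code >> (pos - 1)) & 1;
        code |= (uint16_t)((uint16_t)parity << (p - 1));
    }
    return code;
}

int main(void)
{
    uint16_t c = hamming12_encode(0xA7);   /* arbitrary example message */
    for (int pos = 1; pos <= 12; pos++)
        printf("bit %2d = %d\n", pos, (c >> (pos - 1)) & 1);
    return 0;
}
```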

In an embedded system we may need to COMPRESS data and/or code. To do this, give more frequent symbols shorter encodings (e.g., Morse code).
Simple example—Huffman coding: assign a frequency to each character based on the data to be compressed, then use the frequencies to build a binary tree that determines the encoding.
Ex: in typical English text, the most frequent letters (in order) are ETAOINSHRDLU ..., so we should give shorter encodings to E and T, for example, and longer encodings to letters like X, Y, Z.
Huffman coding does this by using the character frequencies to build a tree; the more frequent the character, the closer it is to the root and the shorter its code will be.

Let's look at an example. For symbols A, B, C, D, E, F, G, and H, we can choose a fixed-length code with three bits per character. With such a code, the message BACADAEAFABBAAAGAH needs 54 bits.
But suppose we know that the relative frequencies in our data are 8 for A, 3 for B, and 1 each for C-H. Then we can build a tree representing these differences. We get the code for a symbol by tracing its position starting from the root, assigning 0 if we go left and 1 if we go right.
From this tree we compute the codes:
A 0       B 100     C 1010    D 1011
E 1100    F 1101    G 1110    H 1111
With this code we can encode the above message in only 42 bits.
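A small sketch that encodes the message with the codes above and counts the bits (it should report 42, versus 54 for the fixed 3-bit code; the table representation is an implementation choice, not from the text):

```c
#include <stdio.h>
#include <string.h>

/* Sketch: encode the slide's message with the codes derived from the tree
 * and count the resulting bits (the symbol/code pairs come from the slide). */
int main(void)
{
    const char *symbols = "ABCDEFGH";
    const char *codes[] = {"0", "100", "1010", "1011",
                           "1100", "1101", "1110", "1111"};
    const char *msg = "BACADAEAFABBAAAGAH";

    size_t bits = 0;
    for (const char *p = msg; *p; p++) {
        const char *idx = strchr(symbols, *p);   /* look up the symbol's code */
        printf("%s", codes[idx - symbols]);
        bits += strlen(codes[idx - symbols]);
    }
    printf("\n%zu bits (vs %zu with a fixed 3-bit code)\n",
           bits, 3 * strlen(msg));
    return 0;
}
```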