Memory Hierarchies Sonish Shrestha October 3, 2013

Memory Hierarchies Buses – The wires that move data around in a computer, from memory to the CPU, or to the disk controller or screen, are called buses. Front-Side Bus (FSB): connects the processor to memory – Typically slower than the processor, one reason caches are needed.

Latency and Bandwidth – Latency: the delay between the processor issuing a request for a memory item and the item actually arriving (whether from memory to cache, from cache to register, or the whole path combined). Measured in nanoseconds or clock periods. – Bandwidth: the rate at which data arrives at its destination, in kilobytes, megabytes, or gigabytes per second (or per clock cycle). Time that a message of n bytes takes from start to finish: T(n) = α + βn, where α is the latency, β is the inverse of the bandwidth (time per byte), and n is the number of bytes.
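
As a concrete illustration of T(n) = α + βn, the sketch below plugs hypothetical numbers into the formula; the 100 ns latency and 10 GB/s bandwidth are assumptions chosen only to show that latency dominates small transfers while bandwidth dominates large ones.

#include <stdio.h>

/* Hypothetical numbers: 100 ns latency, 10 GB/s bandwidth (0.1 ns per byte). */
#define ALPHA_NS 100.0   /* latency (alpha)                      */
#define BETA_NS    0.1   /* time per byte (beta = 1 / bandwidth) */

/* Transfer time for n bytes according to T(n) = alpha + beta * n. */
static double transfer_time_ns(double n_bytes) {
    return ALPHA_NS + BETA_NS * n_bytes;
}

int main(void) {
    double sizes[] = { 8, 64, 4096, 1 << 20 };   /* word, cache line, page, 1 MB */
    for (int i = 0; i < 4; i++)
        printf("%9.0f bytes -> %12.1f ns\n", sizes[i], transfer_time_ns(sizes[i]));
    return 0;
}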

Registers: – The registers are what the processor actually operates on – High bandwidth and low latency because they are part of the processor. A = B + C – Load the value of B from memory into a register – Load the value of C from memory into another register – Compute the sum and write it into another register – Write the sum back to memory location A
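
A minimal sketch of how A = B + C maps onto those steps in C; the variables are assumed to be globals living in memory, and the comments only indicate where the loads, the add, and the store happen, since the exact instructions depend on the compiler and instruction set.

/* Globals, so b and c live in memory rather than in registers. */
int a, b = 2, c = 3;

void add_example(void) {
    a = b + c;   /* 1. load b into a register
                    2. load c into another register
                    3. add the two registers into a third
                    4. store the result to a's memory location */
}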

Cache – Cache is a small, high-speed memory that contains the most recently accessed pieces of main memory. Example: a library. Let's give the librarian a backpack into which he will be able to store 10 books (in computer terms, the librarian now has a 10-book cache). In this backpack, he will put the books the clients return to him, up to a maximum of 10. Let's use the prior example, but now with our new-and-improved caching librarian. The day starts. The backpack of the librarian is empty. Our first client arrives and asks for Moby Dick. No magic here -- the librarian has to go to the storeroom to get the book. He gives it to the client. Later, the client returns and gives the book back to the librarian. Instead of going back to the storeroom to return the book, the librarian puts the book in his backpack and stands there (he checks first to see if the bag is full -- more on that later). Another client arrives and asks for Moby Dick. Before going to the storeroom, the librarian checks to see if this title is in his backpack. He finds it! All he has to do is take the book from the backpack and give it to the client. There's no journey into the storeroom, so the client is served more efficiently. What if the client asked for a title not in the cache (the backpack)? In this case, the librarian is less efficient with a cache than without one, because the librarian takes the time to look for the book in his backpack first. One of the challenges of cache design is to minimize the impact of cache searches, and modern hardware has reduced this time delay to practically zero. Even in our simple librarian example, the latency (the waiting time) of searching the cache is so small compared to the time to walk back to the storeroom that it is irrelevant. The cache is small (10 books), and the time it takes to notice a miss is only a tiny fraction of the time that a journey to the storeroom takes.

Typical hierarchy and approximate access latencies: Core → L1 (~1-2 cycles) → L2 (~10-15 cycles) → L3 (~30 cycles, say; shared) → Main Memory (~300 cycles). In a multicore processor, L1 is private to each core and split into L1 D (data) and L1 I (instruction) caches.

Locality When instructions or data are accessed, they usually show locality. Temporal locality: – Temporal locality is the property that the same instructions or data are accessed multiple times over time (X at time t, X again at time t+i). In a loop, instructions and data are accessed repeatedly. Spatial locality: – Spatial locality is the property that instructions or data at consecutive addresses are accessed close together in time (X at time t, X+1 at time t+i).
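
A minimal C sketch showing both kinds of locality at once; the array size is an arbitrary assumption, chosen only to make the access pattern explicit.

#define N 1024

double a[N];

double sum_with_locality(void) {
    double sum = 0.0;            /* sum is reused on every iteration: temporal locality */
    for (int i = 0; i < N; i++)
        sum += a[i];             /* a[0], a[1], a[2], ... are consecutive addresses:    */
    return sum;                  /* spatial locality, so each cache line is fully used  */
}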

Cache misses Compulsory miss: – The first time a piece of data is accessed, it cannot yet be in the cache. This is called a compulsory miss. Conflict miss: – Suppose a piece of data is loaded and later evicted because another address maps to the same cache line; when it is accessed again, it is no longer in the cache. This is called a conflict miss. Capacity miss: – In a fully associative cache, we reduce the possibility of conflict misses by using all the available cache lines. But if all the lines are in use, one of them must still be replaced, and a miss occurs when the replaced data is accessed again. This is called a capacity miss.
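
As a hedged illustration of conflict misses, the sketch below alternates between two addresses that are exactly STRIDE bytes apart; assuming a direct-mapped cache whose total size equals STRIDE (64 KB here, an assumption), the two accesses map to the same cache line and keep evicting each other.

#define STRIDE (64 * 1024)            /* assumed to equal the cache size in bytes */

char buf[2 * STRIDE];

long ping_pong(int reps) {
    long sum = 0;
    for (int r = 0; r < reps; r++) {
        for (int i = 0; i < 64; i++) {
            sum += buf[i];            /* maps to some cache line ...                 */
            sum += buf[i + STRIDE];   /* ... and, in a direct-mapped cache of size   */
        }                             /* STRIDE, to the same line, so each access    */
    }                                 /* can evict the other: repeated conflict miss */
    return sum;
}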

Cache line and TLB Cache line: – Data is moved from memory to cache in consecutive chunks called cache lines. TLB (Translation Look-aside Buffer): – The TLB is a cache of frequently used Page Table Entries: it provides fast address translation for a number of pages. If a program needs a memory location, the TLB is consulted to see whether this location is in fact on a page that is remembered in the TLB. – The case where the page is not remembered in the TLB is called a TLB miss.

Replacement policies LRU (Least Recently Used), FIFO (First In, First Out), MRU (Most Recently Used)
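
A minimal sketch of LRU replacement for a single cache set, assuming a small fixed associativity; the 4-way size and the age counters are illustrative assumptions, and real hardware usually approximates LRU rather than tracking exact ages like this.

#include <stdint.h>
#include <stdbool.h>

#define WAYS 4   /* assumed associativity of one set */

struct way {
    bool     valid;
    uint32_t tag;
    uint32_t age;   /* 0 = most recently used */
};

/* Look up a tag in one set; returns true on a hit.
   On a miss, an invalid way (if any) or the least recently used way is replaced. */
bool access_set(struct way set[WAYS], uint32_t tag) {
    /* 1. hit check */
    for (int i = 0; i < WAYS; i++) {
        if (set[i].valid && set[i].tag == tag) {
            for (int j = 0; j < WAYS; j++)
                if (set[j].valid && set[j].age < set[i].age)
                    set[j].age++;        /* more recently used ways grow older */
            set[i].age = 0;              /* this way is now the most recent    */
            return true;
        }
    }
    /* 2. miss: choose a victim, then install the new tag as most recently used */
    int victim = 0;
    for (int i = 1; i < WAYS; i++) {
        if (!set[victim].valid)
            break;
        if (!set[i].valid || set[i].age > set[victim].age)
            victim = i;
    }
    for (int j = 0; j < WAYS; j++)
        if (set[j].valid)
            set[j].age++;
    set[victim].valid = true;
    set[victim].tag   = tag;
    set[victim].age   = 0;
    return false;
}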

Cache mapping (Direct Mapping) – Figure: a byte address is split into a tag, an index, and a byte offset. Example: a 64-byte direct-mapped cache holding 8 8-byte words; the index bits select one entry (set) in the data array, and the corresponding entry of the tag array is compared with the address's tag. Note: All the addresses whose index bits and byte offset bits are the same are mapped to the same cache line. To tell these addresses apart, we keep the remaining upper bits of the address as a tag.

Set Associativity – Figure: in a 2-way set-associative cache, the index bits of the byte address select a set, and the address's tag is compared against the tags stored in both Way 1 and Way 2 of that set; a match in either way is a hit.

Full Associativity Assume you have a parking lot where many parking permits have been handed out. In fact, there are more parking permits than parking spots, which is not uncommon at a college; when a lot fills up, the students park in an overflow lot. Suppose there are 1000 parking spots but 5000 students. With a fully associative scheme, a student can park in any of the 1000 parking spots. A lookup must search the entire cache for an address. If a main memory block can be placed in any of the cache slots, the cache is said to be fully associative.

Calculation Metric
Cache size = number of cache lines * cache line size (or block size)
Number of cache lines = 2^(index bits)
Cache line size = 2^(byte offset bits)
Tag bits = number of bits in an address – index bits – offset bits
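
A small sketch of these formulas in C; the helper for powers of two and the example parameters (32-bit addresses, 16 KB of data, 16-byte lines, direct mapped) are assumptions for illustration and mirror the first exercise below.

#include <stdio.h>

/* log2 of a power of two, e.g. bits_for(16) == 4 */
static unsigned bits_for(unsigned long x) {
    unsigned n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

int main(void) {
    unsigned long cache_size = 16 * 1024;   /* 16 KB of data      */
    unsigned long line_size  = 16;          /* 4 words * 4 bytes  */
    unsigned long ways       = 1;           /* 1 = direct mapped  */
    unsigned      addr_bits  = 32;          /* bits in an address */

    unsigned long lines = cache_size / line_size;
    unsigned long sets  = lines / ways;

    unsigned offset_bits = bits_for(line_size);
    unsigned index_bits  = bits_for(sets);
    unsigned tag_bits    = addr_bits - index_bits - offset_bits;

    printf("offset = %u, index = %u, tag = %u bits\n",
           offset_bits, index_bits, tag_bits);
    return 0;
}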

Exercise 1. Suppose we have 16KB of data in a direct-mapped cache with 4-word blocks. 2. Suppose we have 16KB of data in a 2-way set-associative cache with 4-word blocks. Find the size of the index, offset, and tag bits.

Answers
1) Cache size = 16KB = 16 * 2^10 bytes
Cache line size = 4 words = 4 * 4 bytes = 16 bytes
Number of cache lines = 16 * 2^10 bytes / 16 bytes = 2^10
Index bits = 10
Offset bits = 4
Tag bits = 32 – 10 – 4 = 18
2) Cache size = 16 * 2^10 bytes
Cache line size = 16 bytes
Set size = cache line size * set associativity = 16 bytes * 2 = 32 bytes
Number of sets = 16 * 2^10 bytes / 32 bytes = 2^9
Index bits = 9
Offset bits = 4
Tag bits = 32 – 9 – 4 = 19

Data Reuse Exercise

for (i = 0; i <= 10000; i = i + 1000) {
    for (j = 0; j < 1000; j++)
        C[i + j] = A[j];
}

When and which data is being reused in the above example?
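
For reference, a self-contained version of the loop that compiles as written; the array element type, the array sizes, and the initialization are assumptions added only so the snippet can run, since the slide leaves them unspecified.

#include <stdio.h>

#define M 11000            /* C must hold indices up to 10000 + 999 */

double A[1000];            /* assumed element type and size */
double C[M];

int main(void) {
    for (int j = 0; j < 1000; j++)
        A[j] = (double)j;  /* arbitrary initialization */

    for (int i = 0; i <= 10000; i = i + 1000) {
        for (int j = 0; j < 1000; j++)
            C[i + j] = A[j];
    }

    printf("C[10500] = %f\n", C[10500]);
    return 0;
}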