Cache Organization of Pentium


Instruction & Data Cache of Pentium Both caches are organized as 2-way set-associative caches with 128 sets (256 lines in total). Each line holds 32 bytes (8K / 256 lines). An LRU algorithm is used to select victims in each cache.
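The geometry above fixes how a 32-bit address is carved up: 32-byte lines give 5 offset bits, 128 sets give 7 index bits, and the remaining 20 bits form the tag. A minimal sketch (the address value is an arbitrary example, not from the slides):

```python
# Sketch of how a 32-bit address maps onto the Pentium's 8 KB,
# 2-way set-associative cache: 8K / 32 B lines = 256 lines,
# 256 lines / 2 ways = 128 sets.

LINE_SIZE = 32                              # bytes per line -> 5 offset bits
NUM_SETS = 128                              # -> 7 index bits
OFFSET_BITS = LINE_SIZE.bit_length() - 1    # 5
INDEX_BITS = NUM_SETS.bit_length() - 1      # 7

def split_address(addr: int):
    """Return (tag, set_index, byte_offset) for a 32-bit address."""
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = split_address(0x12345)
print(tag, index, offset)  # tag selects one of 2 ways within set `index`
```

On a lookup, the 7-bit set index picks one of the 128 sets and the 20-bit tag is compared against both ways of that set.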

Structure of the 8KB instruction and data caches Each entry in a set has its own tag. Tags in the data cache are triple-ported, serving the U pipeline, the V pipeline, and bus snooping.

Data Cache of Pentium Bus snooping is used to maintain consistent data in a multiprocessor system where each processor has a separate cache. Each entry in the data cache can be configured for write-through or write-back operation.

Instruction Cache of Pentium The instruction cache is write-protected to prevent self-modifying code. Tags in the instruction cache are also triple-ported: two ports for split-line accesses and a third port for bus snooping.

Split-line Access Since the Pentium is a CISC processor, instructions are of variable length (1-15 bytes). A multibyte instruction may therefore straddle two sequential lines stored in the code cache. Fetching it would then require two sequential cache accesses, which degrades performance. Solution: split-line access.

Split-line Access

Split-line Access It permits the upper half of one line and the lower half of the next line to be fetched from the code cache in one clock cycle. When a split line is read, the information is not correctly aligned; the bytes must be rotated so that the prefetch queue receives the instruction bytes in the proper order.
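The rotation step can be illustrated with a toy model (this is an illustration of the idea, not the Pentium's actual datapath): the single-cycle fetch returns a 32-byte block whose first half comes from line N+1 and whose second half comes from line N, so the two 16-byte halves must be swapped before entering the prefetch queue.

```python
# Toy model of split-line byte rotation: the fetched 32-byte block
# holds the lower half of line N+1 in positions 0-15 and the upper
# half of line N in positions 16-31 (line-offset order), but the
# prefetch queue needs sequential instruction order: line N's upper
# half first, then line N+1's lower half.

HALF = 16  # half of a 32-byte cache line

def rotate_split_fetch(raw: bytes) -> bytes:
    """Swap the two 16-byte halves so the prefetch queue sees the
    bytes in sequential instruction-stream order."""
    assert len(raw) == 2 * HALF
    return raw[HALF:] + raw[:HALF]

# Positions 0-15: lower half of line N+1; positions 16-31: upper
# half of line N. After rotation, line N's bytes come first.
raw = bytes(range(100, 116)) + bytes(range(0, 16))
ordered = rotate_split_fetch(raw)
```

In hardware this is a fixed 16-byte rotation done by alignment logic in the same fetch path, not a separate pass over the data.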

Instruction & Data Cache of Pentium Parity bits are used to maintain data integrity. Each tag and every byte in the data cache has its own parity bit; the instruction cache has one parity bit for every 8 bytes of data.

Translation Lookaside Buffers TLBs translate virtual addresses to physical addresses. Data cache: the data cache contains two TLBs. The first is 4-way set-associative with 64 entries and translates addresses for the 4KB pages of main memory.

Translation Lookaside Buffers First TLB: the lower 12 bits of the address (the page offset) pass through unchanged. The upper 20 bits of the virtual address are checked against four tags and, on a hit, translated into the upper 20 bits of the physical address. Since translation needs to be quick, the TLB is kept small. The second TLB is 4-way set-associative with 8 entries and is used to handle 4MB pages.

Translation Lookaside Buffers Both TLBs are parity-protected and dual-ported. Instruction cache: uses a single 4-way set-associative TLB with 32 entries. Both 4KB and 4MB pages are supported (4MB pages are cached in 4KB chunks). Parity bits are used on tags and data to maintain data integrity. Entries are placed in all three TLBs through the use of a 3-bit LRU counter stored in each set.
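The 4KB-page translation described above can be sketched as follows. The dictionary stands in for the 4-way, 64-entry hardware structure, and the entry values are made-up examples:

```python
# Minimal sketch of 4 KB-page TLB translation: the low 12 bits
# (page offset) pass through unchanged; the high 20 bits (virtual
# page number) are looked up to obtain the physical page number.
# A dict stands in for the 4-way, 64-entry associative hardware.

PAGE_BITS = 12
tlb = {0x00012: 0x7F3A0}   # hypothetical VPN -> PPN entry

def translate(vaddr: int):
    """Return the physical address on a TLB hit, None on a miss."""
    vpn = vaddr >> PAGE_BITS
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    if vpn in tlb:                         # hit: splice PPN and offset
        return (tlb[vpn] << PAGE_BITS) | offset
    return None                            # miss: hardware walks page tables

paddr = translate(0x12ABC)   # VPN 0x12 hits; offset 0xABC is unchanged
```

A miss in the real hardware triggers a page-table walk, after which the new translation is installed in the TLB (evicting an entry chosen by the LRU counter mentioned above).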

Cache Coherency in Multiprocessor Systems When multiple processors are used in a single system, there must be a mechanism by which all processors agree on the contents of shared cache information. For example, two or more processors may use data from the same memory location, X. Each processor may change the value of X; which value of X should then be considered correct?

Cache Coherency in Multiprocessor Systems If each processor changes the value of the data item, we have different (incoherent) values of X in each cache. Solution: a cache coherency mechanism.

A multiprocessor system with incoherent cache data

Cache Coherency The Pentium's mechanism is called the MESI (Modified/Exclusive/Shared/Invalid) protocol. This protocol uses two bits stored with each line of data to keep track of the state of the cache line.

Cache Coherency The four states are defined as follows: Modified: the current line has been modified and is available only in a single cache. Exclusive: the current line has not been modified and is available only in a single cache; writing to this line changes its state to Modified.

Cache Coherency Shared: copies of the current line may exist in more than one cache. A write to this line causes a write-through to main memory and may invalidate the copies in the other caches. Invalid: the current line is empty. A read from this line will generate a miss; a write will cause a write-through to main memory.

Cache Coherency Only the Shared and Invalid states are used in the code cache. The MESI protocol requires the Pentium to monitor all accesses to main memory in a multiprocessor system; this monitoring is called bus snooping.
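The state transitions described on the previous slides can be sketched as a lookup table. This is a simplified model of the behavior the slides describe, not the Pentium's exact logic (the real transitions also depend on bus signals such as WB/WT# that are not modeled here):

```python
# Simplified MESI transition table for one cache line, from the
# viewpoint of the local processor. In hardware the state is two
# bits per line; strings are used here for readability.
# Events: local_write, local_read, snoop_read (another CPU reads
# the line), snoop_write (another CPU writes the line).

TRANSITIONS = {
    ("E", "local_write"): "M",  # exclusive copy modified in place
    ("M", "local_write"): "M",
    ("S", "local_write"): "S",  # write-through; other copies may be invalidated
    ("I", "local_read"):  "S",  # miss -> refill (simplified: E if sole copy)
    ("M", "snoop_read"):  "S",  # dirty data written back, then shared
    ("E", "snoop_read"):  "S",
    ("S", "snoop_write"): "I",  # another CPU's write invalidates our copy
    ("E", "snoop_write"): "I",
    ("M", "snoop_write"): "I",
}

def next_state(state: str, event: str) -> str:
    """Return the new MESI state; unlisted (state, event) pairs
    leave the state unchanged in this simplified model."""
    return TRANSITIONS.get((state, event), state)

# Example: another processor writes a line we hold as Shared.
state = next_state("S", "snoop_write")   # line becomes Invalid
```

Note that the code cache needs only the S and I rows of this table, which is why it stores just those two states.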

Cache Coherency Consider the earlier incoherent-cache example. If processor 3 writes its local copy of X (30) back to memory, the memory write cycle will be detected by the other three processors. Each processor then runs an internal inquire cycle to determine whether its data cache contains the address of X. Processors 1 and 2 then update their caches based on their individual MESI states.

Cache Coherency Inquire cycles examine the code cache as well (since the code cache supports bus snooping). The Pentium's address lines are used as inputs during an inquire cycle to accomplish bus snooping.