Cache Organization of Pentium

Instruction & Data Cache of Pentium
Both caches are organized as 2-way set-associative caches with 128 sets (256 lines in total). Each line holds 32 bytes (8 KB / 256 lines). An LRU algorithm selects the victim line in each cache.
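The geometry above fixes how an address is split up: 32-byte lines give 5 offset bits, and 128 sets give 7 index bits, with the remaining upper bits forming the tag. A minimal sketch (the function name and example address are illustrative, not from the text):

```python
# Sketch of how an address maps onto an 8 KB, 2-way set-associative
# cache with 128 sets and 32-byte lines, as described above.
LINE_SIZE = 32   # bytes per line -> 5 offset bits
NUM_SETS  = 128  # sets           -> 7 index bits

def decompose(addr: int):
    """Split an address into (tag, set index, byte offset)."""
    offset = addr & (LINE_SIZE - 1)        # bits 0-4
    index  = (addr >> 5) & (NUM_SETS - 1)  # bits 5-11
    tag    = addr >> 12                    # remaining upper bits
    return tag, index, offset
```

For example, `decompose(0x12345)` yields tag `0x12`, set `0x1A`, offset `0x05`.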

Structure of the 8 KB instruction and data caches
Each entry in a set has its own tag. Tags in the data cache are triple-ported, serving the U pipeline, the V pipeline, and bus snooping.

Data Cache of Pentium
Bus snooping maintains consistent data in a multiprocessor system where each processor has its own cache. Each entry in the data cache can be configured for write-through or write-back operation.

Instruction Cache of Pentium
The instruction cache is write-protected to guard against self-modifying code. Its tags are also triple-ported: two ports serve split-line accesses and the third serves bus snooping.

Split-line Access
Because the Pentium is a CISC processor, its instructions are of variable length (1-15 bytes). A multibyte instruction may straddle two sequential lines in the code cache, which would normally require two sequential accesses and degrade performance. The solution is the split-line access.

Split-line Access

Split-line Access
A split-line access permits the upper half of one line and the lower half of the next to be fetched from the code cache in a single clock cycle. When a split line is read, the bytes are not correctly aligned; they must be rotated so that the prefetch queue receives the instruction in the proper order.
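The rotation step can be pictured with a small sketch. This is an illustration of the idea only, not the actual hardware datapath; the function and the half-line split are assumptions for demonstration:

```python
# Illustrative sketch of a split-line fetch: the cache delivers the upper
# half of line N and the lower half of line N+1 in one cycle, but the two
# halves arrive in the wrong order and must be rotated.
HALF = 16  # half of a 32-byte line

def split_line_fetch(line_n: bytes, line_n1: bytes) -> bytes:
    """Return the 32 bytes of a split-line access, rotated so the
    prefetch queue receives them in sequential instruction order."""
    raw = line_n1[:HALF] + line_n[HALF:]  # as read: halves are swapped
    return raw[HALF:] + raw[:HALF]        # rotate into sequential order
```

With `line_n = bytes(range(32))` and `line_n1 = bytes(range(32, 64))`, the result is bytes 16 through 47 in order, i.e. the straddling region in proper sequence.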

Instruction & Data Cache of Pentium
Parity bits are used to maintain data integrity. In the data cache, each tag and every data byte has its own parity bit; in the instruction cache, there is one parity bit for every 8 bytes of data.
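The idea behind a parity bit is simple: store one extra bit so that the total number of 1s is even, and flag an error whenever a stored value no longer matches its bit. A minimal sketch (even parity is assumed; the text does not specify the polarity):

```python
# Sketch of even-parity generation: the parity bit is chosen so that the
# protected value plus its parity bit contain an even number of 1 bits.
def parity_bit(value: int) -> int:
    """Return the even-parity bit for an integer value."""
    return bin(value).count("1") & 1

def parity_ok(value: int, stored_parity: int) -> bool:
    """Check a stored value against its stored parity bit."""
    return parity_bit(value) == stored_parity
```

A single flipped bit changes the parity and is detected; two flipped bits in the same protected unit would go unnoticed, which is why the data cache protects every byte individually.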

Translation Lookaside Buffers
TLBs translate virtual addresses to physical addresses. The data cache contains two TLBs. The first is 4-way set-associative with 64 entries and translates addresses for 4 KB pages of main memory.

Translation Lookaside Buffers
For the first TLB, the lower 12 address bits pass through unchanged; on a hit, the upper 20 bits of the virtual address are checked against four tags and translated into the upper 20 bits of the physical address. Since translation must be fast, the TLB is kept small. The second TLB is 4-way set-associative with 8 entries and handles 4 MB pages.
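The 20/12 split for 4 KB pages can be sketched directly. A plain dictionary stands in for the real 4-way set-associative lookup; the function names are illustrative:

```python
# Sketch of the 4 KB-page translation described above: the low 12 bits
# pass through unchanged, and the upper 20-bit virtual page number is
# looked up to obtain a 20-bit physical frame number.
PAGE_BITS = 12

def translate(vaddr: int, tlb: dict):
    """tlb maps virtual page number -> physical frame number.
    Returns the physical address, or None on a TLB miss."""
    vpn    = vaddr >> PAGE_BITS
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    if vpn not in tlb:
        return None  # TLB miss: the page tables must be walked
    return (tlb[vpn] << PAGE_BITS) | offset
```

For example, with a TLB entry mapping virtual page `0x12` to frame `0xABCDE`, `translate(0x12345, ...)` yields `0xABCDE345`: the offset `0x345` is carried through unchanged.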

Translation Lookaside Buffers
Both data-cache TLBs are parity-protected and dual-ported. The instruction cache uses a single 4-way set-associative TLB with 32 entries; both 4 KB and 4 MB pages are supported (4 MB pages are handled in 4 KB chunks). Parity bits on tags and data maintain data integrity. Entries are replaced in all three TLBs using a 3-bit LRU counter stored with each set.

Cache Coherency in Multiprocessor Systems
When multiple processors are used in a single system, there must be a mechanism by which all processors agree on the contents of shared cache information. For example, two or more processors may use data from the same memory location, X. If each processor can change the value of X, which value of X should be considered correct?

Cache Coherency in Multiprocessor Systems
If each processor changes the value of the data item, we end up with different (incoherent) values of X in the caches. The solution is a cache coherency mechanism.

A multiprocessor system with incoherent cache data

Cache Coherency
The Pentium's mechanism is the MESI (Modified/Exclusive/Shared/Invalid) protocol. It uses two bits stored with each line of data to keep track of the state of that cache line.

Cache Coherency
The four states are defined as follows:
Modified: the current line has been modified and is available in only a single cache.
Exclusive: the current line has not been modified and is available in only a single cache. Writing to this line changes its state to Modified.

Cache Coherency
Shared: copies of the current line may exist in more than one cache. A write to this line causes a write-through to main memory and may invalidate the copies in the other caches.
Invalid: the current line is empty. A read from this line generates a miss; a write causes a write-through to main memory.
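The state descriptions above can be collected into a small transition table for processor-side reads and writes. This is a simplified sketch: snoop-side transitions are omitted, and the post-write state of a Shared line (Exclusive, once the write-through has invalidated other copies) is an assumption consistent with the write-through behaviour described, not something the text states explicitly:

```python
# Simplified sketch of processor-side MESI transitions, per the state
# definitions above. Each entry maps (state, event) to
# (next state, bus activity).
TRANSITIONS = {
    ("M", "read"):  ("M", "hit"),
    ("M", "write"): ("M", "hit"),            # already modified, no bus cycle
    ("E", "read"):  ("E", "hit"),
    ("E", "write"): ("M", "hit"),            # modify silently, no bus cycle
    ("S", "read"):  ("S", "hit"),
    ("S", "write"): ("E", "write-through"),  # write-through; other copies
                                             # may be invalidated (assumed)
    ("I", "read"):  ("I", "miss"),           # line must be (re)filled
    ("I", "write"): ("I", "write-through"),  # no allocation on write
}

def mesi(state: str, event: str):
    """Look up the next MESI state and bus activity for a local access."""
    return TRANSITIONS[(state, event)]
```

For example, a write to an Exclusive line moves it to Modified without any bus traffic, whereas a write to a Shared or Invalid line always reaches main memory.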

Cache Coherency
Only the Shared and Invalid states are used in the code cache. The MESI protocol requires the Pentium to monitor all accesses to main memory in a multiprocessor system; this monitoring is called bus snooping.

Cache Coherency
Consider the earlier example. If processor 3 writes its local copy of X (30) back to memory, the memory write cycle is detected by the other three processors. Each of them then runs an internal inquire cycle to determine whether its data cache contains the address of X; processors 1 and 2 then update their caches according to their individual MESI states.
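The inquire cycle amounts to a tag check against the snooped address, followed by invalidation of a stale copy. A hedged sketch (the dictionary-based cache model and function name are illustrative, not the real hardware interface):

```python
# Illustrative sketch of the inquire (snoop) cycle described above: when
# another processor's memory write is observed on the bus, each cache
# checks whether it holds that line and invalidates its copy if so.
def inquire(cache: dict, addr: int, line_size: int = 32) -> bool:
    """cache maps aligned line address -> MESI state.
    Returns True on a snoop hit (the copy is then invalidated)."""
    line = addr & ~(line_size - 1)  # align to the 32-byte line boundary
    if line in cache and cache[line] != "I":
        cache[line] = "I"  # another processor wrote: our copy is stale
        return True
    return False
```

In the four-processor example, the write by processor 3 would trigger this check in processors 1, 2, and 4; only the caches actually holding X's line change state.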

Cache Coherency
Inquire cycles examine the code cache as well, since the code cache also supports bus snooping. The Pentium's address lines are used as inputs during an inquire cycle to accomplish the snoop.