Main Memory Update Policy

Update: about 8~16% of memory accesses are writes.

• Write-through: write data through to memory as soon as they are placed in any cache. Reliable, but poor performance.
• Write-back (copy-back): modifications are written to the cache and written to memory later. Fast: some data may be overwritten before they are written back, and so need never be written at all. Poor reliability: unwritten data will be lost whenever the machine crashes.

Clean and dirty blocks?
• Dirty bit: indicates whether a line has been modified while in the cache. When a "dirty" line is replaced, it must be written back to main memory.
• Write buffer: a queue that holds data while the data are waiting to be written to memory.
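The dirty-bit mechanism can be made concrete with a small sketch. The C fragment below is illustrative only: the line layout and the names `write_hit` and `replace_line` are hypothetical, not from the slides. It shows a write hit marking the line dirty, and replacement writing a dirty line back before the line is reused.

```c
#include <stdint.h>
#include <string.h>

#define LINE_WORDS 4

/* One cache line with a dirty bit (layout is illustrative). */
typedef struct {
    uint32_t tag;
    uint32_t data[LINE_WORDS];
    int      valid;
    int      dirty;   /* set on any write hit; checked when the line is replaced */
} cache_line_t;

/* A write hit under write-back touches only the cache and marks the line dirty. */
void write_hit(cache_line_t *line, int word, uint32_t value)
{
    line->data[word] = value;
    line->dirty = 1;  /* main memory is now stale until the line is written back */
}

/* On replacement, a clean line can simply be dropped; a dirty line must be
 * written back to main memory first. */
void replace_line(cache_line_t *line, uint32_t *memory, uint32_t block_word_addr)
{
    if (line->valid && line->dirty)
        memcpy(&memory[block_word_addr], line->data, sizeof line->data);
    line->valid = line->dirty = 0;
}
```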

Main Memory Fetch Policy

• Demand fetch: fetch a block when it is needed and not already in the cache, i.e. fetch the required block on a miss.
• Prefetch: fetch blocks before they are requested. A simple prefetch strategy is to prefetch the (i+1)th block when the ith block is first referenced, on the expectation that it is likely to be needed soon if the ith block is needed.
• Selective fetch: do not always fetch blocks; depending on some defined criterion, use main memory rather than the cache to hold the information.
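The simple (i+1)-block prefetch strategy fits in a few lines of C. In this sketch, `cache_contains` and `fetch_block` are hypothetical stubs standing in for the real cache machinery:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real cache machinery. */
bool cache_contains(int block) { (void)block; return false; }
void fetch_block(int block)    { printf("fetching block %d\n", block); }

/* Demand-fetch block i on a miss, then speculatively prefetch block i+1,
 * on the expectation that sequential blocks are likely to be needed. */
void reference_block(int i)
{
    if (!cache_contains(i))
        fetch_block(i);          /* demand fetch: only when actually needed */
    if (!cache_contains(i + 1))
        fetch_block(i + 1);      /* simple (i+1)-block prefetch */
}

int main(void)
{
    reference_block(7);          /* fetches block 7, prefetches block 8 */
    return 0;
}
```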

How to Handle Read/Write

Read is easy:
• Send the address to the appropriate cache. The address comes either from the PC (for an instruction read) or from the ALU (for a data access).
• If the cache signals hit, the requested word is available on the data lines.
• If the cache signals miss, we send the full address to main memory. When the memory returns the data, we write it into the cache.
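For concreteness, here is a minimal C sketch of this read path for a direct-mapped cache. The geometry (16-byte lines, 1024 sets) and the `memory_read` stub are assumptions for illustration, not taken from the slides:

```c
#include <stdint.h>
#include <stdbool.h>

#define OFFSET_BITS 4                     /* assumed: 16-byte (4-word) lines   */
#define INDEX_BITS  10                    /* assumed: 1024 sets, direct-mapped */
#define NUM_SETS    (1u << INDEX_BITS)

typedef struct { uint32_t tag; uint32_t data[4]; bool valid; } line_t;
static line_t cache[NUM_SETS];

/* Hypothetical stand-in for a main-memory read of one word. */
static uint32_t memory_read(uint32_t addr) { return addr ^ 0xdeadbeefu; }

/* Read path: index the set with the middle bits, compare the tag, and on a
 * miss send the full (block-aligned) address to memory and refill the line. */
uint32_t cache_read(uint32_t addr)
{
    uint32_t index = (addr >> OFFSET_BITS) & (NUM_SETS - 1);
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    uint32_t word  = (addr >> 2) & 0x3;   /* which word within the line */
    line_t  *line  = &cache[index];

    if (line->valid && line->tag == tag)
        return line->data[word];          /* hit: word available immediately */

    /* miss: fetch the whole block from memory, then write it into the cache */
    for (uint32_t w = 0; w < 4; w++)
        line->data[w] = memory_read((addr & ~0xFu) + 4u * w);
    line->valid = true;
    line->tag   = tag;
    return line->data[word];
}
```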

A read is easy to handle quickly: reading the tag and reading the block can be done simultaneously, before we know whether the access is a hit.

A write is usually slower: reading the tag and writing the block cannot be done simultaneously, except in caches with one-word lines (e.g. the DEC 3100).

For a multiple-word line, a write on a write miss becomes a read-modify-write cycle: read the original block, modify the written portion, then write the block back. The tag comparison cannot be done in parallel with the data write, so writes are slower.

Example

Assume blocks x and y map to the same cache set, and the cache initially holds x.

[Diagram: main memory holds block x (words x1, x2, x3, x4) and block y (words y1, y2, y3, y4); the single cache line shows Tag = x with Data = x1..x4.]

Before the write, the cache contains line x. Now a write to y3 occurs, a miss; suppose the write proceeds anyway, storing a value z into word 3. If we are not careful, the line ends up as Tag = y, Data = x4, z, x2, x1: only the written word was updated, and the other three words still belong to x. A later write-back of this line will destroy y1, y2, and y4 in main memory!

There are two options on a write miss:

• Write allocate (also called fetch on write): the block is loaded into the cache, followed by the write-hit actions above. This is similar to a read miss.
• No write allocate (also called write around): the word is modified in the lower level of the hierarchy and the block is not loaded into the cache.

Think about what you would do on a write miss! A sketch of both policies follows.
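Both policies can be shown side by side in a short C sketch. The helpers `refill` and `memory_write` and the address layout are hypothetical; note that the tag is checked before any data word is written, which avoids the corrupted-line problem illustrated in the example above.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct { uint32_t tag; uint32_t data[4]; bool valid, dirty; } line_t;

/* Hypothetical helpers: refill loads the whole block (write-back of a dirty
 * victim is elided), memory_write stores one word in the lower level. */
static void refill(line_t *line, uint32_t tag)
{
    line->tag = tag; line->valid = true; line->dirty = false;
    /* ... fetch all 4 words of the block from memory here ... */
}
static void memory_write(uint32_t addr, uint32_t v)
{
    printf("mem[0x%08x] <- %u\n", addr, v);
}

/* Both write-miss policies side by side (index bits elided for brevity). */
void cache_write(line_t *line, uint32_t addr, uint32_t value, bool write_allocate)
{
    uint32_t tag  = addr >> 4;            /* assumed: 16-byte lines */
    uint32_t word = (addr >> 2) & 0x3;

    if (line->valid && line->tag == tag) {
        line->data[word] = value;         /* write hit */
        line->dirty = true;
    } else if (write_allocate) {
        refill(line, tag);                /* fetch on write: read-modify-write, */
        line->data[word] = value;         /* then proceed as a write hit        */
        line->dirty = true;
    } else {
        memory_write(addr, value);        /* write around: cache left untouched */
    }
}
```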

When May Write-Through Be Better?

When a two-level cache is used (a small on-chip CPU cache plus a large off-chip cache):
• Consistency is easier to maintain.
• Memory traffic is absorbed by the second-level cache.

Write-back vs. Write-through

• Speed: write-back is fast.
• Traffic: in general, copy-back is better, provided the line receives more than one write hit before replacement; this makes it attractive for multiprocessors.
• Cache consistency: write-through is better.
• Logic: copy-back is more complicated.

Write-back vs. Write-through, Cont'd

• Buffering: needed for both, but a depth of 4 is best for write-through, while copy-back needs only 1. Buffer management is complicated because every reference must consult the buffer.
• Reliability: write-through is better, because main memory has error detection.

"There is no clear choice" in terms of performance ...
