Cache and Virtual Memory Replacement Algorithms
\course\cpeg324-08F\Topic7c

Cache Parameters

Cache size: Scache (lines)
Set number: N (sets)
Lines per set: K (lines/set)
Scache = K * N (lines) = K * N * L (bytes), where L is the line size in bytes.
Such a cache is K-way set-associative.
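As a quick sanity check on these relationships, here is a minimal sketch; the parameter values are illustrative assumptions, not taken from the slides:

```python
# Minimal sketch: relate the cache parameters on this slide.
# N, K, L values are illustrative assumptions only.
N = 128          # number of sets
K = 4            # lines per set (4-way set-associative)
L = 64           # line size in bytes

S_cache_lines = K * N          # total cache size in lines
S_cache_bytes = K * N * L      # total cache size in bytes

print(f"{S_cache_lines} lines, {S_cache_bytes} bytes")  # 512 lines, 32768 bytes
```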

Trade-offs in Set-Associativity

Fully associative:
- Higher hit ratio and concurrent search, but slow access when the associativity is large.

Direct mapping:
- Fast access (on a hit) and simple comparison hardware.
- Trivial replacement algorithm.
- Problem with the hit ratio. In the extreme case, if two blocks that map to the same cache block frame are accessed alternately, thrashing may occur.

Note

Main memory size: Smain (blocks)
Cache memory size: Scache (blocks)

Let P = Smain / Scache. Since P >> 1, the average search length is much greater than 1; you need to search!

Set-associativity provides a trade-off between:
- Concurrency in search.
- Average search/access time per block.

[Figure: the associativity spectrum. As the number of sets N goes from 1 to Scache, the organization goes from fully associative (N = 1), through set-associative (1 < N < Scache), to direct mapped (N = Scache).]

Important Factors in Cache Design

- Address partitioning strategy (3-dimensional freedom).
- Total cache size / memory size.
- Workload.

Address Partitioning

Assume byte addressing with M-bit addresses. Each address is partitioned into three fields:
- Tag: M - log2(N) - log2(L) bits (held in the directory, one entry per line).
- Set number: log2(N) bits.
- Byte address within a line: log2(L) bits.

Data part of the cache = N * K * L bytes.
Directory size (per entry) = M - log2(N) - log2(L) bits.
Choosing the set-index bits well reduces clustering (randomizes accesses across sets).
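A minimal sketch of this partitioning for a concrete configuration; the address width, N, L, and the example address are assumptions for illustration only:

```python
# Minimal sketch: split an M-bit byte address into tag / set number / byte offset.
# M, N, L and the example address are illustrative assumptions.
M = 32            # address width in bits
N = 128           # number of sets        -> log2(N) = 7 index bits
L = 64            # line size in bytes    -> log2(L) = 6 offset bits

index_bits  = N.bit_length() - 1
offset_bits = L.bit_length() - 1
tag_bits    = M - index_bits - offset_bits   # directory size per entry

def split(addr):
    offset = addr & (L - 1)                  # byte address within the line
    setno  = (addr >> offset_bits) & (N - 1) # set number
    tag    = addr >> (offset_bits + index_bits)
    return tag, setno, offset

print(f"tag bits per directory entry: {tag_bits}")
print(split(0x12345678))   # -> (tag, set number, byte offset within line)
```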

[Figure: general curve describing cache behavior. Miss ratio (y-axis) versus cache size (x-axis). Note: there exists a knee in the curve, beyond which further increases in cache size yield diminishing reductions in the miss ratio.]

"...the data are sketchy and highly dependent on the method of gathering..."

"...the designer must make critical choices using a combination of 'hunches, skills, and experience' as a supplement..."

(Hunch: "a strong intuitive feeling concerning a future event or result.")

Basic Principle

Typical workload study + intelligent estimates for the rest.
Good engineering: a small degree of over-design.

"30% rule": each doubling of the cache size reduces misses by about 30%.
(Alan J. Smith, "Cache Memories," ACM Computing Surveys, Vol. 14, No. 3, Sep. 1982.)
It is a rough estimate only.
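A quick illustration of what the rule predicts; the starting miss ratio and cache sizes are assumptions made up for the example:

```python
# Minimal sketch of the "30% rule": each doubling of cache size
# cuts misses by roughly 30%. Starting values are illustrative assumptions.
miss_ratio = 0.10     # assumed miss ratio at the baseline size
size_kb = 8           # assumed baseline cache size in KB

for _ in range(4):
    size_kb *= 2
    miss_ratio *= 0.70          # ~30% fewer misses per doubling
    print(f"{size_kb:4d} KB -> estimated miss ratio {miss_ratio:.3f}")
```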

K: Associativity

Bigger K -> smaller miss ratio.
Smaller K is better in: speed (faster), cost (cheaper), and simplicity.
K of 4 ~ 8 gets close to the best miss ratio.

L: Line Size

The line is the atomic unit of transmission between memory and cache.
With a larger L:
- Smaller miss ratio (workload dependent).
- Larger average delay per transfer.
- Less traffic (fewer separate transfers).
- Larger average hardware cost for associative search.
- Larger possibility of "line crossers": memory references spanning the boundary between two cache lines.
Typical line sizes: 16 ~ 128 bytes.
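A minimal sketch of how a "line crosser" can be detected for an access of a given size; the line size and example accesses are assumptions for illustration:

```python
# Minimal sketch: detect a "line crosser", i.e. an access whose bytes
# span two cache lines. Line size and example accesses are assumptions.
L = 64   # line size in bytes

def crosses_line(addr, size):
    # The access touches bytes [addr, addr + size - 1]; compare their line numbers.
    return (addr // L) != ((addr + size - 1) // L)

print(crosses_line(0x100, 4))   # False: fits inside one 64-byte line
print(crosses_line(0x13E, 4))   # True: bytes 0x13E..0x141 span two lines
```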

Cache Replacement Policy

FIFO (first-in, first-out): replace the block that was loaded furthest in the past.
LRU (least recently used): replace the block that was used furthest in the past.
OPT (optimal): replace the block whose next use lies furthest in the future; do not retain lines whose next occurrence is in the most distant future.

Note: LRU performance is close to OPT for frequently encountered program structures. (A sketch of the three policies follows.)
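A minimal sketch of FIFO, LRU, and OPT for a single K-way set; the set size and the reference string are illustrative assumptions:

```python
# Minimal sketch of FIFO, LRU and OPT replacement for one K-way set.
# The set size K and the reference string are illustrative assumptions.
def simulate(trace, K, policy):
    blocks, misses = [], 0                   # blocks currently in the set
    for t, ref in enumerate(trace):
        if ref in blocks:
            if policy == "LRU":              # on a hit, mark as most recently used
                blocks.remove(ref)
                blocks.append(ref)
            continue
        misses += 1
        if len(blocks) == K:                 # set full: pick a victim
            if policy in ("FIFO", "LRU"):
                victim = blocks[0]           # loaded longest ago / used longest ago
            else:                            # OPT: next use furthest in the future
                future = trace[t + 1:]
                victim = max(blocks, key=lambda b:
                             future.index(b) if b in future else len(future) + 1)
            blocks.remove(victim)
        blocks.append(ref)
    return misses

trace = ["A", "B", "C", "B", "A", "D", "A", "B"]
for p in ("FIFO", "LRU", "OPT"):
    print(p, simulate(trace, K=3, policy=p), "misses")  # FIFO 6, LRU 4, OPT 4
```

On this particular reference string FIFO misses more often than LRU, while LRU matches OPT, which is the kind of behavior the note above alludes to.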

Example: Misses and Associativity

Small cache with four one-word blocks; reference sequence 0, 8, 0, 6, 8. Direct-mapped cache. (On the slide, blue text marks the data used at time t and black text the data used at time t-1.) Result: 5 misses for the 5 accesses.

Example: Misses and Associativity (cont'd)

Same cache and reference sequence (0, 8, 0, 6, 8), now two-way set-associative with LRU replacement. (Blue text: data used at time t; black text: data used at time t-1.) Result: 4 misses for the 5 accesses.

Example: Misses and Associativity (cont'd)

Same cache and reference sequence (0, 8, 0, 6, 8), now fully associative: any memory block can be stored in any cache block. (Blue text: time t; black text: time t-1; red text: time t-2.) Result: 3 misses for the 5 accesses. A simulation sketch reproducing all three cases follows.
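A minimal sketch that reproduces the miss counts of these three examples with a 4-block cache and LRU within each set; the code structure and the modulo mapping are assumptions, not the slides' notation:

```python
# Minimal sketch: 4-block cache, LRU within each set, varying the number of
# sets: direct mapped (4 sets), two-way (2 sets), fully associative (1 set).
def misses(trace, num_sets, ways):
    sets = [[] for _ in range(num_sets)]     # each set is an LRU list, oldest first
    count = 0
    for block in trace:
        s = sets[block % num_sets]           # block address modulo number of sets
        if block in s:
            s.remove(block)                  # hit: move to most-recent position
        else:
            count += 1
            if len(s) == ways:
                s.pop(0)                     # evict the least recently used block
        s.append(block)
    return count

trace = [0, 8, 0, 6, 8]
print(misses(trace, 4, 1))   # direct mapped      -> 5 misses
print(misses(trace, 2, 2))   # two-way set-assoc. -> 4 misses
print(misses(trace, 1, 4))   # fully associative  -> 3 misses
```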

Program Structure

for i = 1 to n
    for j = 1 to n
        ...
    endfor
endfor

The last-in-first-out character of such nested loops makes the recent past resemble the near future, which is why LRU approximates OPT well here.

Problem with LRU

LRU is not good at mimicking sequential/cyclic access patterns.
Example: the cyclic reference stream A B C D E F, A B C ..., A B C ...

Exercise: with a set size of 3, what is the miss ratio, assuming all 6 addresses map to the same set?
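The single-set sketch from the replacement-policy slide can be reused to check your answer to the exercise; the number of repetitions below is an assumption:

```python
# Reuses simulate() from the replacement-policy sketch above.
cyclic = ["A", "B", "C", "D", "E", "F"] * 10      # cyclic reference stream
m = simulate(cyclic, K=3, policy="LRU")
print(m / len(cyclic))                            # LRU miss ratio for a 3-entry set
```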

Performance Evaluation Methods for Workload

- Analytical modeling.
- Simulation.
- Measurement.

Cache Analysis Methods

Hardware monitoring:
- Fast and accurate.
- But not fast enough for high-performance machines.
- Costly.
- Limited flexibility/repeatability.

Cache Analysis Methods (cont'd)

Address traces driving a machine simulator:
- Slow.
- Accuracy/fidelity concerns.
- Cost advantage.
- Good flexibility/repeatability.
- OS and other system impacts: how to include them?

Trace-Driven Simulation for Cache

Workload dependence:
- Difficulty in characterizing the load.
- No generally accepted workload model.
Effectiveness:
- Many parameters can be explored by simulation.
- Repeatability.

Problems with Address Traces

- Representativeness of the actual workload (hard): traces cover only a small fraction of the real workload, and user programs are diverse.
- Initialization transient: use traces long enough to absorb the impact of cold misses.
- Inability to properly model multiprocessor effects.