Basic Performance Parameters in Computer Architecture:

Levels of Transformation: Problem  Algorithm  Program/Language  Instruction Set Architecture (ISA)  Microarchitecture  Circuits  Devices.

Good Old Moore's Law (Technology vs. Architects): Every 18-24 months, the number of transistors that fit on the same chip area doubles. Historically this has also meant:
 Processor speed doubles every 18-24 months
 Energy per operation halves every 18-24 months
 Memory capacity doubles every 18-24 months

Parameters for Metrics and Evaluation: What does "better" mean in computer architecture? Is it clock speed (GHz) or memory size (GB)? Latency and throughput are the two key performance parameters.
 Latency: the time taken from start to end of one task
 Throughput: the number of tasks completed per second (#/second)
Note that throughput is not simply 1/latency: with pipelining, several tasks can be in flight at once, as the sketch below illustrates.
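A minimal sketch of the distinction in C, assuming a hypothetical 4-stage pipelined processor with a 1 GHz clock (all numbers are illustrative, not from the lecture):

    #include <stdio.h>

    int main(void) {
        double clock_ghz    = 1.0;  /* assumed clock rate     */
        int pipeline_stages = 4;    /* assumed pipeline depth */

        /* Latency: one instruction must pass through every stage. */
        double latency_ns = pipeline_stages / clock_ghz;

        /* Throughput: once the pipeline is full, one instruction
           completes every cycle, regardless of pipeline depth.   */
        double throughput_ginstr = clock_ghz;

        printf("Latency:    %.1f ns per instruction\n", latency_ns);
        printf("Throughput: %.1f billion instructions per second\n", throughput_ginstr);
        return 0;
    }

This is why a processor can have both a multi-cycle latency per instruction and a throughput of one instruction per cycle.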

Comparing CPU Performance: For the same program, machine X is n times faster than machine Y if Speedup = Execution Time on Y / Execution Time on X = n.

Introduction to Caches:

Locality Principle: Things that will happen soon are likely to be close to things that just happened. Which of these is not a good example of locality?
 It rained 3 times today, so it is likely to rain again  locality
 We ate dinner at 7pm every day last week, so we will probably eat dinner around 7pm this week  locality
 It was New Year's Eve yesterday, so it will probably be New Year's Eve today  not locality: the event just happened, yet it will not happen again for a year

Memory Locality:

If we accessed address X recently, we are likely to access X again soon (temporal locality), and we are likely to access addresses close to X too (spatial locality).

Temporal & Spatial Locality Implementation: a simple loop exhibits both kinds of locality.

    for (int j = 0; j < 1000; j++)  /* j and the loop code are reused every iteration: temporal locality */
        printf("%d\n", arr[j]);     /* consecutive elements are accessed in order: spatial locality      */

Locality and Data Access: A library is a repository for storing data: large, but slow to access. Library accesses have temporal and spatial locality. A student can:
1. Go to the library for each piece of information, then go home  every access is slow
2. Borrow the book  keep the data likely to be needed again close by
3. Take all the books and build a library at home  impractical; the home "library" becomes as large and slow as the original

Cache Lookups: A cache is fast and small, so not everything will fit. On each access:
 Cache Hit: found in the cache  FAST
 Cache Miss: not in the cache  access slow main memory (RAM), then copy this location into the cache

Cache Performance: Average Memory Access Time (AMAT):
AMAT = Hit Time + Miss Rate × Miss Penalty
 Hit Time  should be low; wants a small and fast cache
 Miss Rate  should be low; wants a large and/or smart cache
 Miss Penalty  the main memory access time; large (10s-100s of cycles)
Miss Time = Hit Time + Miss Penalty (the total RAM access time on a cache miss, since the cache is checked first). A worked example follows.
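A worked example of the AMAT formula in C, using illustrative numbers (the 1-cycle hit time, 10% miss rate, and 100-cycle miss penalty are assumptions, not figures from the lecture):

    #include <stdio.h>

    int main(void) {
        double hit_time     = 1.0;    /* cycles, assumed */
        double miss_rate    = 0.10;   /* 10%, assumed    */
        double miss_penalty = 100.0;  /* cycles, assumed */

        /* AMAT = Hit Time + Miss Rate x Miss Penalty */
        double amat = hit_time + miss_rate * miss_penalty;

        /* An access that misses pays the full miss time. */
        double miss_time = hit_time + miss_penalty;

        printf("AMAT      = %.1f cycles\n", amat);       /* prints 11.0  */
        printf("Miss time = %.1f cycles\n", miss_time);  /* prints 101.0 */
        return 0;
    }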

Cache Size in Real Processors: Complication: there are several caches in the processor. The L1 cache directly services all RD/WR requests from the processor. Size: 16-64 KB  large enough to get a ~90% hit rate, yet small enough to hit in 1-3 cycles.

Cache Organization: How do we determine HIT or MISS? How do we determine what to kick out? The cache is a table: part of the address selects an entry, and each entry holds a block of data (the bytes in each entry = block size / line size, typically 32 to 128 bytes). The block has to be large enough to exploit spatial locality, but block size can't be as large as 1 KB, since much of that precious cache memory would remain unused.

Blocks in Cache and Main Memory: A line is a cache slot into which a memory block can fit. [Figure: main memory divided into fixed-size BLOCKs at addresses 4, 8, 12, ..., 44; each block maps onto a same-sized cache LINE.]

Block Offset and Block Number: With a block size of 16 bytes (2^4), a 32-bit address splits into two fields:
 Bits 31-4: Block Number  tells which block we are trying to find in the cache
 Bits 3-0: Block Offset  once the block is found, selects the correct data within it
A sketch of the split follows.
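A minimal sketch of the split in C, assuming the 16-byte blocks and 32-bit addresses above (the example address is arbitrary):

    #include <stdint.h>
    #include <stdio.h>

    #define OFFSET_BITS 4  /* log2(16-byte block size) */

    int main(void) {
        uint32_t addr         = 0x12345678;                       /* arbitrary example */
        uint32_t block_number = addr >> OFFSET_BITS;              /* bits 31-4         */
        uint32_t block_offset = addr & ((1u << OFFSET_BITS) - 1); /* bits 3-0          */
        printf("block number = 0x%X, offset = %u\n", block_number, block_offset);
        return 0;
    }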

Cache Block Number Quiz: 32-byte block size; the processor generates the 16-bit address 1111 0000 1010 0101. What is the block number corresponding to this address? What is the block offset? (Worked answer below.)
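Working it through with the split above: a 32-byte block means log2(32) = 5 offset bits. The low 5 bits of 1111 0000 1010 0101 are 0 0101 = 5, the block offset; the remaining high 11 bits are 1111 0000 101 = 0x785 = 1925, the block number.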

Cache Tag (compares the block number against the cache): Each cache line has a tag that records the block # of the block currently held in that line. The block # from the processor-generated address (Block # | Offset) is compared against every tag; if a comparison produces 1, it is a cache hit, and the line whose tag matched contains the data. The offset then selects the data within that line to supply to the processor. On a cache miss, the data is put in the cache and the block # is put in the corresponding tag.

Hit  (Tag == Block#) and Valid = 1 Valid Bit: During Boot up, no data from Cache needed. Garbage Data accessed if Memory Block and TAG match. Cache Data Garbage Data not brought from RAM. Cache TAG 000000 (Initial) VALID Any initial value at the Cache Tag will be problematic, not just zero. Therefore Valid bit = 0 Hit  (Tag == Block#) and Valid = 1 0X 0000 0000 001C

Types of Caches:
 Fully Associative: any block can be in any cache line; N lines need N comparisons (the most flexible extreme of set associative)
 Set Associative: a block can be in any of the N lines of its set (the middle ground)
 Direct Mapped: a block can go into exactly 1 line (the most rigid extreme of set associative)

Direct Mapped Cache: [Figure: memory blocks map onto cache lines in sequence and wrap around, so multiple memory blocks share each cache line.] Each memory block maps to exactly one cache line (block # modulo the number of lines), so many memory blocks compete for the same line. The processor-generated address splits into:
 TAG: identifies which of the competing blocks currently occupies the line
 Index: where in the cache the block can be found (e.g., 2 bits for 4 lines)
 Block Offset: where the data is within the cache block, once the block is found
A lookup sketch follows.
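A minimal direct mapped lookup sketch in C, assuming a hypothetical 4-line cache with 16-byte blocks (geometry and names are illustrative):

    #include <stdint.h>
    #include <stdbool.h>

    #define OFFSET_BITS 4                  /* log2(16-byte block) */
    #define INDEX_BITS  2                  /* log2(4 lines)       */
    #define NUM_LINES   (1u << INDEX_BITS)

    struct line { bool valid; uint32_t tag; uint8_t data[1u << OFFSET_BITS]; };
    static struct line cache[NUM_LINES];

    /* Split the address and check the one line the block can live in. */
    static bool lookup(uint32_t addr, uint8_t *out) {
        uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
        uint32_t index  = (addr >> OFFSET_BITS) & (NUM_LINES - 1);
        uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

        struct line *l = &cache[index];
        if (l->valid && l->tag == tag) {   /* hit: a single comparison */
            *out = l->data[offset];
            return true;
        }
        return false;                      /* miss: go to memory       */
    }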

Advantages/Disadvantages of a Direct Mapped Cache: It looks in only one place (1:1 mapping):
 Fast: only one location is checked  good (low) hit time
 Cheap: less complex design; only one tag comparator and valid bit
 Energy efficient: less power dissipation thanks to the smaller design
But each block must go in that one place:
 Frequent alternating accesses A, B, A, B that map to the same place in the cache keep kicking each other out, conflicting over one spot
 Hence it suffers a higher miss rate

Set Associative Caches: N-way set associative  a block can be in one of the N lines of its set. Example: Lines 0-7 organized as SET 0 through SET 3 give a 2-way set associative cache (N = 2, 2 lines per set). A few bits of the block address (the index) select which set the block goes in; within that set, either of the 2 lines can contain the block. A lookup sketch follows.
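A sketch of the N-way lookup, assuming the 2-way, 4-set arrangement above (names are illustrative; hardware compares all ways of the selected set in parallel, shown here as a loop):

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_SETS 4
    #define WAYS     2

    struct line { bool valid; uint32_t tag; uint8_t data[16]; };
    static struct line cache[NUM_SETS][WAYS];

    static bool lookup(uint32_t block_number, struct line **hit_line) {
        uint32_t set = block_number % NUM_SETS;  /* index bits of the block #   */
        uint32_t tag = block_number / NUM_SETS;  /* remaining bits form the tag */
        for (int way = 0; way < WAYS; way++) {
            struct line *l = &cache[set][way];
            if (l->valid && l->tag == tag) {
                *hit_line = l;
                return true;
            }
        }
        return false;
    }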

Fully Associative Cache: No index bits, since the block can be anywhere in the cache; the address is just TAG | Offset.

Cache Summary: Direct Mapped  1-way set associative. Fully Associative  N-way set associative with N = the number of lines (no sets). The address splits as TAG | INDEX | OFFSET, where:
Index = log2(number of sets)
Offset = log2(block size)
A sketch computing the field widths follows.
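A small sketch computing the field widths from the cache geometry (the sizes are assumptions for illustration):

    #include <stdio.h>

    /* Integer log2 for power-of-two values. */
    static unsigned ilog2(unsigned x) {
        unsigned bits = 0;
        while (x > 1) { x >>= 1; bits++; }
        return bits;
    }

    int main(void) {
        unsigned addr_bits  = 32;   /* assumed address width */
        unsigned block_size = 64;   /* bytes, assumed        */
        unsigned num_sets   = 128;  /* assumed               */

        unsigned offset_bits = ilog2(block_size);  /* log2(block size) */
        unsigned index_bits  = ilog2(num_sets);    /* log2(sets)       */
        unsigned tag_bits    = addr_bits - index_bits - offset_bits;

        printf("tag = %u, index = %u, offset = %u\n", tag_bits, index_bits, offset_bits);
        return 0;
    }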

Cache Replacement: A cache miss in a full set  a new block needs a line in that set. Which block do we kick out?
 Random
 FIFO: kick out the block that has been in the cache the longest
 LRU: kick out the block that has not been used for the longest time

Implementing LRU: LRU exploits locality, but maintaining the ordering is complicated. For an N-way set associative cache we keep N counters of log2(N) bits each; here, 4 two-bit counters count from 0 to 3. On each access, the accessed line's counter becomes 3 (most recently used) and every counter above its old value is decremented; the line whose counter is 0 is the least recently used victim. [Table: counter states for the access sequence A, B, C, D, E, B, C, D in a 4-way set; when E arrives, A (counter 0) is evicted.] A sketch of the update follows.
Cost: N counters of log2(N) bits each.
Energy: all N counters can change on each access (even on cache hits).
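A sketch of the counter update described above (hardware updates all counters in parallel; here it is a loop):

    #define WAYS 4

    /* lru[w] in 0..WAYS-1; higher = more recently used. The counters
       always form a permutation of 0..WAYS-1.                        */
    static void lru_touch(unsigned lru[WAYS], unsigned accessed_way) {
        unsigned old = lru[accessed_way];
        for (unsigned w = 0; w < WAYS; w++)
            if (lru[w] > old)           /* every counter above the old value */
                lru[w]--;               /* slides down by one                */
        lru[accessed_way] = WAYS - 1;   /* the accessed line becomes MRU (3) */
    }

    /* On a miss, the victim is the way whose counter is 0 (the LRU line). */
    static unsigned lru_victim(const unsigned lru[WAYS]) {
        for (unsigned w = 0; w < WAYS; w++)
            if (lru[w] == 0)
                return w;
        return 0;  /* unreachable while the permutation invariant holds */
    }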

Write Policy of Caches: Do we insert blocks we write (on a write miss)?
 Write Allocate: bring the block into the cache (helps when reads/writes to it have locality)
 No Write Allocate: do not bring the block into the cache
Do we write just to the cache, or also to memory?
 Write Through: update memory immediately
 Write Back: write to the cache, and write to RAM only when the cache block is replaced (writes with high locality then update only the cache most of the time)

Write Back Caches: When a block is replaced, how do we know whether it must be written back to RAM? Add a dirty bit to each cache block:
 Dirty bit = 0  block is clean (not written since it was brought from RAM); no need to write it to RAM when replaced
 Dirty bit = 1  block is dirty; write it back to memory (RAM) when replaced
A sketch tying the pieces together follows.
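A minimal write-back, write-allocate sketch (function and field names, including the mem_* helpers, are illustrative assumptions):

    #include <stdint.h>
    #include <stdbool.h>

    struct line { bool valid, dirty; uint32_t tag; uint8_t data[16]; };

    /* Hypothetical memory-side helpers, assumed to be defined elsewhere. */
    void mem_read_block(uint32_t block_number, uint8_t *buf);
    void mem_write_block(uint32_t block_number, const uint8_t *buf);

    /* Write one byte through the cache; the tag here is simply the
       block number, keeping the sketch independent of associativity. */
    static void cache_write(struct line *l, uint32_t block_number,
                            uint32_t offset, uint8_t value) {
        if (!(l->valid && l->tag == block_number)) {    /* write miss           */
            if (l->valid && l->dirty)
                mem_write_block(l->tag, l->data);       /* write back old block */
            mem_read_block(block_number, l->data);      /* write allocate       */
            l->tag = block_number;
            l->valid = true;
            l->dirty = false;
        }
        l->data[offset] = value;  /* write only to the cache ...              */
        l->dirty = true;          /* ... and remember it must be written back */
    }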