Basic Performance Parameters in Computer Architecture:
Levels of Transformation:
Good Old Moore’s Law: (Technology vs Architects)
Every 18–24 months: 2x transistors on the same chip area. Processor speed doubles. Energy per operation halves. Memory capacity doubles.
Parameters for Metric and Evaluation:
What does "better" mean in computer architecture? Is it clock speed (GHz) or memory size (GB)? Latency and throughput are the two key performance parameters. Latency: the time taken from start to end of a task. Throughput: the number of tasks completed per second (#/second).
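The distinction can be made concrete with a pipelining sketch. The numbers below are assumptions chosen for illustration, not from the slides: a task whose latency is 4 seconds can still be completed at a rate of one per second if a new task starts every second.

```python
# Sketch: latency vs. throughput, with assumed illustrative numbers.
latency_s = 4.0          # time from start to end of one task (latency)
start_interval_s = 1.0   # a new task can start every second (pipelined)

# In steady state, one task finishes each second regardless of its latency.
throughput = 1.0 / start_interval_s   # tasks per second

print(latency_s, throughput)  # -> 4.0 1.0
```

So improving latency and improving throughput are different goals: overlap (pipelining) raises throughput without shortening any single task.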
Comparing CPU Performance:
Introduction to Caches:
Locality Principle: Which of these are not good examples of locality? It rained 3 times today, so it is likely to rain again. I ate dinner at 7pm last week, so I will probably eat dinner around 7pm this week. It was New Year's Eve yesterday, so it will probably be New Year's Eve today. The principle: things that will happen soon are likely to be close to things that just happened.
Memory Locality:
Accessed address X recently?
Likely to access X again soon (temporal locality). Likely to access addresses close to X too (spatial locality).
Temporal & Spatial Locality Implementation:
for (j = 0; j < 1000; j++) print arr[j];
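The loop above exhibits both kinds of locality, which a small sketch can make visible. Assuming (for illustration) 4-byte elements starting at address 0, consecutive iterations touch consecutive addresses:

```python
# Sketch: the slide's loop, recording each access's "address" to show locality.
# Assumes 4-byte array elements starting at address 0 (illustrative values).
arr = list(range(1000))
addresses = []
for j in range(1000):          # j is reused every iteration: temporal locality
    addresses.append(j * 4)    # arr[j] addresses are consecutive: spatial locality
    _ = arr[j]

# Consecutive accesses differ by exactly the element size, so one cache block
# holding several elements services several accesses in a row.
strides = [b - a for a, b in zip(addresses, addresses[1:])]
print(all(s == 4 for s in strides))  # -> True
```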
Locality and Data Access:
Library: a repository that stores data; large, but slow to access. Library accesses have temporal and spatial locality. A student can: 1. Go to the library, find the information, and go home. 2. Borrow the book. 3. Take all the books and build a library at home.
Cache Lookups: the cache is fast but small, so not everything will fit. On an access:
Cache hit: found in the cache (fast). Cache miss: not in the cache; access slow main memory (RAM) and copy that location into the cache.
Cache Performance: Average Memory Access Time (AMAT)
AMAT = Hit Time + Miss Rate x Miss Penalty
Hit time: should be low; wants a small and fast cache. Miss rate: should be low; wants a large and/or smart cache. Miss penalty: main-memory access time, large (10s–100s of cycles). Miss Time = Hit Time + Miss Penalty (total RAM access time on a cache miss).
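The AMAT formula above can be sketched directly; the hit time, miss rate, and miss penalty values below are assumed example numbers, not from the slides.

```python
# Sketch: AMAT = Hit Time + Miss Rate x Miss Penalty (in cycles).
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time in cycles."""
    return hit_time + miss_rate * miss_penalty

# Assumed values: 2-cycle hit, 10% miss rate, 100-cycle miss penalty.
print(amat(hit_time=2, miss_rate=0.10, miss_penalty=100))  # -> 12.0
```

Note how a modest miss rate dominates: even at 10%, misses add 10 cycles to a 2-cycle hit time.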
Cache Size in Real Processors:
Complication: a real processor contains several caches. L1 cache: directly services all RD/WR requests from the processor. Size: KB range; large enough to get a ~90% hit rate, yet small enough to hit in 1–3 cycles.
Cache Organization: How do we determine HIT or MISS? How do we determine what to kick out?
The cache is a table looked up by (part of) the address; on a hit, it returns the data bytes of the matching entry. Block size / line size (bytes in each entry): 32 to 128 bytes. It has to be large enough to exploit spatial locality, but block size can't be as large as 1 KB, because much of that precious cache memory would remain unused.
Blocks in Cache and Main Memory:
A line is a cache slot into which a memory block fits. [Diagram: main memory divided into fixed-size blocks at byte addresses 4, 8, 12, …, 44; each cache LINE holds one memory BLOCK.]
Block Offset and Block Number:
An address splits into a block number (upper bits) and a block offset (lower bits). The block number tells which block we try to find in the cache; once the block is found, the block offset selects the correct data within it. Example: block size = 16 bytes = 2^4, so the offset is the low 4 bits.
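The split above is just a shift and a mask. This sketch assumes the slide's 16-byte (2^4) block size:

```python
# Sketch: splitting an address into block number and block offset,
# assuming a 16-byte (2^4) block size as on the slide.
BLOCK_SIZE = 16
OFFSET_BITS = 4                             # log2(16)

def split_address(addr):
    block_number = addr >> OFFSET_BITS      # upper bits
    block_offset = addr & (BLOCK_SIZE - 1)  # low 4 bits
    return block_number, block_offset

print(split_address(0x1234))  # address 0x1234 -> block 0x123, offset 0x4
```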
Cache Block Number Quiz:
32-byte block size; a 16-bit address produced by the processor. What is the block number corresponding to the above address? What is the block offset? (With 32-byte blocks, the offset is the low log2(32) = 5 bits and the block number is the remaining 11 bits.)
Cache Tag (compares the address's block # against the cache):
Each cache line stores a tag alongside its data; the tag holds the block # of the block currently in that line. On an access, the block # from the processor-generated address (block # + offset) is compared with every tag; a comparator output of 1 means a cache hit. The offset then selects which data within that line is supplied to the processor. On a cache miss, the data is brought into the cache and the block # is written into the corresponding tag.
Valid Bit: during boot-up, the cache holds garbage, and no data from it should be used. Without extra state, garbage data (never brought from RAM) would be returned whenever a memory block # happened to match a garbage tag. Any initial value in the cache tag would be problematic, not just zero; therefore each line gets a valid bit, initialized to 0, and:
Hit = (Tag == Block #) AND (Valid == 1)
Types of Caches: Fully associative: any block can be in any cache line; N lines require N comparisons (the fully flexible extreme of set-associative). Set-associative: a block can be in any of the N lines of one set (the middle ground). Direct-mapped: a block can go into exactly 1 line (the fully rigid extreme of set-associative).
Direct Mapped Cache: each memory block maps to exactly one cache line (blocks match the lines cyclically), so many memory blocks share the same line.
Offset: where the data is within the cache block, if the block is found. Index: which line of the cache the block must be in (e.g. 2 bits for a 4-line cache). Processor-generated address: | TAG | Index | Block Offset |
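A direct-mapped lookup splits the address into tag, index, and offset with shifts and masks. The parameters below (16-byte blocks, 4 lines) are assumed for illustration:

```python
# Sketch of a direct-mapped address split, assuming 16-byte blocks
# and a 4-line cache (4 offset bits, 2 index bits).
BLOCK_SIZE, NUM_LINES = 16, 4
OFFSET_BITS = 4   # log2(16)
INDEX_BITS = 2    # log2(4)

def split(addr):
    offset = addr & (BLOCK_SIZE - 1)               # within-block byte
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)  # which line
    tag = addr >> (OFFSET_BITS + INDEX_BITS)       # rest of the block #
    return tag, index, offset

# Two addresses 64 bytes apart get the same index -> they conflict.
print(split(0x40), split(0x80))  # -> (1, 0, 0) (2, 0, 0)
```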
Adv./Disadvantages of Direct Mapped Cache:
It looks in only one place (1:1 mapping): Fast: only one location is checked, so less traffic and a good hit time. Cheap: less complex design; only one tag comparator and valid bit. Energy efficient: less power dissipation due to the smaller design. But each block must go in one place: frequent accesses A, B, A, B that map to the same place in the cache keep kicking each other out, conflicting over one spot. Hence a direct-mapped cache suffers a high miss rate.
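The A, B, A, B conflict pattern can be sketched with a tiny direct-mapped model. The block numbers 1 and 5 are hypothetical choices that happen to share an index in an assumed 4-line cache:

```python
# Sketch: conflict misses in a direct-mapped cache for an A, B, A, B pattern.
# Assumed 4-line cache; blocks 1 and 5 both map to index 1, so they conflict.
NUM_LINES = 4
lines = [None] * NUM_LINES   # each entry holds the tag of the resident block

def access(block_number):
    index = block_number % NUM_LINES
    tag = block_number // NUM_LINES
    hit = lines[index] == tag
    if not hit:
        lines[index] = tag   # kick out whatever was there
    return hit

pattern = [1, 5, 1, 5, 1, 5]         # A, B, A, B, A, B
print([access(b) for b in pattern])  # every access misses
```

Even though only two blocks are in use and the cache has four lines, every access misses, because both blocks fight over the same line.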
Set Associative Caches:
N-way set associative: a block can be in one of N lines. Example: 2-way set associative (N = 2, 2 lines per set): Lines 0–7 are grouped into Sets 0–3. A few bits of the block number select which set the block will go into; within that set, either of the 2 lines can contain the block.
Fully Associative Cache:
No index bits, since the block can be anywhere in the cache; the address is just | TAG | Offset |.
Cache Summary: Direct mapped = 1-way set associative. Fully associative = N-way set associative with N = # of lines (one set, no index bits). Address: | TAG | INDEX | OFFSET |, where Index = log2(# of sets) and Offset = log2(block size).
Cache Replacement: on a cache miss in a full set, we need room for a new block; which block do we kick out? Random. FIFO: kick out the block that has been in the cache the longest. LRU: kick out the block that has not been used for the longest time.
Implementing LRU : Implements Locality
Maintaining the counts is complicated. An N-way set-associative cache needs N counters of log2(N) bits each per set; here, a 4-way set has four 2-bit counters counting from 0 to 3. On each access, the touched line's counter becomes the maximum and counters above its old value are decremented. Cost: N counters of log2(N) bits. Energy: N counters may change on every access, even on cache hits.
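The counter scheme above can be sketched for one 4-way set. The update rule (accessed line's counter set to N-1, larger counters decremented) is the standard counter-based LRU; the access sequence is an assumed example:

```python
# Sketch: counter-based LRU for one 4-way set.
N = 4
counters = list(range(N))   # line i starts with counter value i

def touch(line):
    """Record an access to `line`: it becomes most recently used."""
    old = counters[line]
    for i in range(N):
        if counters[i] > old:
            counters[i] -= 1    # everyone "ahead" ages by one
    counters[line] = N - 1      # accessed line is now most recent

def lru_victim():
    return counters.index(0)    # counter 0 marks the least recently used line

for line in [0, 1, 2]:          # access lines 0, 1, 2; line 3 goes untouched
    touch(line)
print(lru_victim())  # -> 3
```

Note that `touch` may modify all N counters, which is exactly the energy cost the slide points out: the bookkeeping runs even on hits.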
Write Policy of Caches:
Do we insert blocks we write (on a write miss)? Write-allocate: bring the block into the cache (helps RD/WR locality). No-write-allocate: do not bring the block into the cache. Do we write just to the cache, or also to memory? Write-through: update memory immediately. Write-back: write to the cache, and write to RAM only when the cache block is replaced (with high write locality, most writes update only the cache).
Write-Back Caches: how do we know whether a replaced block must be written back? Add a dirty bit to each cache line. Dirty bit = 1: the block is dirty (was written; must be written back to RAM on replacement). Dirty bit = 0: the block is clean (not written since it was brought from RAM; no need to write to RAM when replaced).