Review CPSC 321 Andreas Klappenecker
Announcements
Midterm exam: Tuesday, November 30
Cache
Placement strategies: direct mapped, fully associative, set-associative
Replacement strategies: random, FIFO, LRU
Direct Mapped Cache
Mapping: address modulo the number of blocks in the cache, x -> x mod B
Set Associative Caches
Each block maps to a unique set; the block can be placed into any element of that set.
The position is given by (block number) modulo (# of sets in the cache).
If the sets contain n elements, then the cache is called n-way set associative.
Direct Mapped Cache
Cache with 1024 = 2^10 words
The index is determined by (word address) mod 1024; the lowest two bits of the address are the byte offset
The tag from the cache is compared against the upper portion of the address
If the tag equals the upper 20 bits and the valid bit is set, then we have a cache hit; otherwise it is a cache miss
What kind of locality are we taking advantage of?
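The hit/miss check on this slide can be summarized in a few lines of C. The following is a minimal sketch of a direct-mapped lookup for the 1024-word cache above (the struct layout and names are illustrative, not from the slides): extract the index with a modulo (equivalently, a bit mask), then compare the stored tag against the upper 20 address bits and check the valid bit.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_LINES 1024              /* 2^10 one-word cache lines              */

/* One cache line: valid bit, 20-bit tag, and one 32-bit data word.
   (Illustrative layout; real hardware stores these as raw bits.)             */
struct line {
    bool     valid;
    uint32_t tag;                   /* upper 20 bits of the word address      */
    uint32_t data;
};

/* Returns true on a cache hit for a 32-bit byte address.
   Bits [1:0]  : byte offset (ignored for word accesses)
   Bits [11:2] : index = word address mod 1024
   Bits [31:12]: tag                                                           */
bool lookup(const struct line cache[NUM_LINES], uint32_t addr, uint32_t *word)
{
    uint32_t word_addr = addr >> 2;             /* drop the byte offset        */
    uint32_t index     = word_addr % NUM_LINES; /* same as word_addr & 0x3FF   */
    uint32_t tag       = word_addr / NUM_LINES; /* remaining upper 20 bits     */

    if (cache[index].valid && cache[index].tag == tag) {
        *word = cache[index].data;              /* hit                         */
        return true;
    }
    return false;                               /* miss: go to main memory     */
}
```

With one word per block, this design exploits temporal locality; the multiword blocks on the next slide add spatial locality.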
Direct Mapped Cache
Taking advantage of spatial locality: blocks of several words, with a block offset selecting the word within a block
Address Determination
Reconstruction of the memory address = tag bits || set index bits || block offset || byte offset
Example: 32-bit words, cache capacity 2^12 = 4096 words, blocks of 8 words, direct mapped:
byte offset = 2 bits, block offset = 3 bits, set index = 9 bits, tag = 18 bits
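As a cross-check of the example above, the field widths follow directly from the capacity and block size. A small sketch in C (the parameter names are mine, not from the slides):

```c
#include <stdio.h>

/* Integer log2 for exact powers of two. */
static int lg2(unsigned x) { int n = 0; while (x > 1) { x >>= 1; n++; } return n; }

int main(void)
{
    const int addr_bits       = 32;
    const int bytes_per_word  = 4;
    const int words_per_block = 8;
    const int capacity_words  = 1 << 12;                  /* 4096 words             */

    int byte_offset  = lg2(bytes_per_word);               /* 2 bits                 */
    int block_offset = lg2(words_per_block);               /* 3 bits                 */
    int num_blocks   = capacity_words / words_per_block;   /* 512 blocks             */
    int index_bits   = lg2(num_blocks);                    /* 9 bits (direct mapped) */
    int tag_bits     = addr_bits - index_bits - block_offset - byte_offset; /* 18    */

    printf("byte offset=%d block offset=%d index=%d tag=%d\n",
           byte_offset, block_offset, index_bits, tag_bits);
    return 0;
}
```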
Example
Suppose you want to realize a cache with a capacity for 8 KB of data (32-bit addresses). Assume that the block size is 4 words and a word consists of 4 bytes.
How many bits are needed to realize a direct mapped cache?
8 KB = 2K words = 512 blocks = 2^9 blocks; direct mapped => # index bits = log2(2^9) = 9
2^9 x (128 + (32 - 9 - 2 - 2) + 1) = 2^9 x 148 bits = number of blocks x (data bits per block + tag bits + valid bit)
How many bits are needed to realize an 8-way set associative cache? The number of tag bits increases by 3. Why?
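The slide's 2^9 x 148 figure, and the 8-way variant, can be reproduced with the same arithmetic. In the 8-way case the 512 blocks form 512/8 = 64 = 2^6 sets, so the index shrinks from 9 to 6 bits and the tag grows from 19 to 22 bits, which is why the tag gains exactly 3 bits. A sketch (function and variable names are mine):

```c
#include <stdio.h>

static int lg2(unsigned x) { int n = 0; while (x > 1) { x >>= 1; n++; } return n; }

/* Bits needed for a cache: blocks x (data bits + tag bits + valid bit). */
static long cache_bits(int num_blocks, int words_per_block, int associativity)
{
    const int addr_bits = 32, byte_offset = 2;
    int block_offset = lg2(words_per_block);               /* 2 bits for 4 words  */
    int num_sets     = num_blocks / associativity;
    int index_bits   = lg2(num_sets);
    int tag_bits     = addr_bits - index_bits - block_offset - byte_offset;
    int data_bits    = words_per_block * 32;                /* 128 bits            */
    return (long)num_blocks * (data_bits + tag_bits + 1);   /* +1 for valid bit    */
}

int main(void)
{
    /* 8 KB of data = 512 blocks of 4 words each. */
    printf("direct mapped: %ld bits\n", cache_bits(512, 4, 1)); /* 512 x 148 = 75776 */
    printf("8-way:         %ld bits\n", cache_bits(512, 4, 8)); /* 512 x 151 = 77312 */
    return 0;
}
```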
Typical Questions
Show the evolution of a cache
Determine the number of bits needed in an implementation of a cache
Know the placement and replacement strategies
Be able to design a cache according to specifications
Determine the number of cache misses
Measure cache performance (see the sketch after this list)
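For "measure cache performance", one metric worth knowing is the average memory access time, AMAT = hit time + miss rate x miss penalty. This formula is my assumption about what the question targets; the course may also ask for CPI including memory stall cycles. A minimal sketch with illustrative numbers:

```c
#include <stdio.h>

/* Average memory access time: hit time plus the expected miss cost. */
static double amat(double hit_time, double miss_rate, double miss_penalty)
{
    return hit_time + miss_rate * miss_penalty;
}

int main(void)
{
    /* Illustrative numbers only: 1-cycle hit, 5% miss rate, 100-cycle penalty. */
    printf("AMAT = %.1f cycles\n", amat(1.0, 0.05, 100.0));   /* 6.0 cycles */
    return 0;
}
```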
Typical Questions
What kind of placement is typically used in virtual memory systems?
What is a translation lookaside buffer?
Why is a TLB used?
Pages: virtual memory blocks
Page faults: if the data is not in memory, retrieve it from disk
huge miss penalty, thus pages should be fairly large (e.g., 4 KB)
reducing page faults is important (LRU is worth the price)
the faults can be handled in software instead of hardware
write-through takes too long, so we use write-back
Example: page size 2^12 = 4 KB; 2^18 physical pages; main memory <= 1 GB; virtual memory <= 4 GB
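The example's numbers are consistent: 2^18 physical pages of 2^12 bytes give 2^30 bytes = 1 GB of physical memory, and a 4 GB virtual address space holds 2^32 / 2^12 = 2^20 virtual pages. A quick check (variable names are mine):

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t page_size      = 1ull << 12;   /* 4 KB pages                   */
    uint64_t physical_pages = 1ull << 18;
    uint64_t virtual_space  = 1ull << 32;   /* 4 GB virtual address space   */

    printf("physical memory = %llu MB\n",
           (unsigned long long)(physical_pages * page_size >> 20)); /* 1024 MB = 1 GB */
    printf("virtual pages   = %llu\n",
           (unsigned long long)(virtual_space / page_size));        /* 2^20 = 1048576 */
    return 0;
}
```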
Page Faults
Incredibly high penalty for a page fault
Reduce the number of page faults by optimizing page placement
Use fully associative placement
a full search of all pages is impractical
pages are located by a full table that indexes the memory, called the page table
the page table resides in main memory
Page Tables
The page table maps each virtual page either to a page in main memory or to a page stored on disk
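One way to picture the page table is as an array indexed by virtual page number, where each entry either names a physical page or marks the page as residing on disk. The following is a simplified sketch; the entry layout and names are illustrative, not the course's definition.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_OFFSET_BITS 12                     /* 4 KB pages                    */

/* One page table entry: valid => the page is in main memory at phys_page;
   otherwise the page lives on disk and an access triggers a page fault.        */
struct pte {
    bool     valid;
    uint32_t phys_page;   /* physical page number (a disk address if !valid)    */
};

/* Translate a 32-bit virtual address; returns false on a page fault. */
bool translate(const struct pte *page_table, uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn    = vaddr >> PAGE_OFFSET_BITS;            /* virtual page number */
    uint32_t offset = vaddr & ((1u << PAGE_OFFSET_BITS) - 1);

    if (!page_table[vpn].valid)
        return false;                     /* page fault: OS loads the page from disk */

    *paddr = (page_table[vpn].phys_page << PAGE_OFFSET_BITS) | offset;
    return true;
}
```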
Page Tables
Making Memory Access Fast
Page tables slow us down: memory access will take at least twice as long
access the page table in memory
access the page itself
What can we do? Memory accesses are local => use a cache that keeps track of recently used address translations, called the translation lookaside buffer
Making Address Translation Fast
A cache for address translations: the translation lookaside buffer (TLB)
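A TLB is just a small cache whose "data" is a page translation. The sketch below models a direct-mapped TLB sitting in front of a page table like the one sketched earlier; the size and field names are assumptions for illustration, and real TLBs are usually fully or highly associative.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_OFFSET_BITS 12
#define TLB_ENTRIES      64            /* small, for illustration only          */

struct tlb_entry {
    bool     valid;
    uint32_t vpn;                      /* virtual page number (acts as the tag) */
    uint32_t phys_page;                /* cached translation                    */
};

/* Look up a virtual address in the TLB; on a miss the caller walks the
   page table in memory and then refills the TLB entry.                         */
bool tlb_lookup(struct tlb_entry tlb[TLB_ENTRIES], uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn    = vaddr >> PAGE_OFFSET_BITS;
    uint32_t offset = vaddr & ((1u << PAGE_OFFSET_BITS) - 1);
    uint32_t slot   = vpn % TLB_ENTRIES;          /* direct-mapped placement     */

    if (tlb[slot].valid && tlb[slot].vpn == vpn) {
        *paddr = (tlb[slot].phys_page << PAGE_OFFSET_BITS) | offset;
        return true;                              /* TLB hit: no page table access */
    }
    return false;                                 /* TLB miss: consult the page table */
}
```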
MIPS Processor and Variations
Datapath for MIPS instructions Note the seven control signals!
Single Cycle Datapath
Pipelined Version
Obstacles to Pipelining
Structural hazards: the hardware cannot support the given combination of instructions in the same clock cycle
Control hazards: need to make a decision based on the result of one instruction while others are still executing
Data hazards: an instruction depends on the result of an instruction still in the pipeline
Control Hazard Resolution (for branches)
stall the pipeline
predict the result
delayed branch
Stall on Branch
Assume that all branch computations are done in stage 2
Delay by one cycle to wait for the result
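To see what a one-cycle branch stall costs, a back-of-the-envelope calculation helps: every branch adds one bubble, so CPI grows from 1 to 1 + branch fraction x 1. The branch frequency below is an assumed illustrative value, not from the slides.

```c
#include <stdio.h>

int main(void)
{
    double base_cpi        = 1.0;   /* ideal pipelined CPI                      */
    double branch_fraction = 0.15;  /* assumed: 15% of instructions are branches */
    double stall_cycles    = 1.0;   /* branch resolved in stage 2 => 1 bubble    */

    double cpi = base_cpi + branch_fraction * stall_cycles;
    printf("effective CPI = %.2f\n", cpi);       /* 1.15 with these numbers      */
    return 0;
}
```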
Branch Prediction
Predict the branch result
For example, always predict that the branch is not taken (e.g., reasonable for while loops)
if the prediction is correct, then the pipeline runs at full speed
if the prediction is incorrect, then the pipeline stalls
Branch Prediction
Delayed Branch
Data Hazards
A data hazard results if an instruction depends on the result of a previous instruction:
add $s0, $t0, $t1
sub $t2, $s0, $t3   # $s0 still to be determined
These dependencies happen often, so it is not possible to avoid them completely.
Use forwarding to get the missing data from internal resources as soon as it is available.
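The hazard in this example is a read-after-write dependence: sub reads $s0 before add has written it back. Detecting it amounts to comparing destination and source register numbers of nearby instructions, which is essentially what the forwarding unit does. A simplified sketch in C (the helper and struct are mine; register numbers follow the standard MIPS convention):

```c
#include <stdbool.h>
#include <stdint.h>

/* Registers of interest for one instruction: destination and two sources. */
struct instr {
    uint8_t rd;        /* register written (0 = none; $zero is never forwarded) */
    uint8_t rs, rt;    /* registers read                                        */
};

/* Does 'younger' need a value produced by 'older' that is still in the
   pipeline?  If so, forwarding (or a stall) is required.                       */
bool raw_hazard(struct instr older, struct instr younger)
{
    return older.rd != 0 &&
           (older.rd == younger.rs || older.rd == younger.rt);
}

int main(void)
{
    /* add $s0, $t0, $t1   ($s0 = 16, $t0 = 8, $t1 = 9 in MIPS numbering)       */
    struct instr add_i = { .rd = 16, .rs = 8,  .rt = 9  };
    /* sub $t2, $s0, $t3   (reads $s0 before the add writes it back)            */
    struct instr sub_i = { .rd = 10, .rs = 16, .rt = 11 };

    return raw_hazard(add_i, sub_i) ? 0 : 1;     /* hazard detected => forward  */
}
```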
Forwarding
add $s0, $t0, $t1
sub $t2, $s0, $t3
Typical Questions
Given a brief specification of the processor and a sequence of instructions, determine all pipeline hazards.
Most typical question: fill in some steps in a timing diagram (almost every exam has such a question; google for examples).
Example
add $1, $2, $3     _ _ _ _ _
add $4, $5, $6     _ _ _ _ _
add $7, $8, $9     _ _ _ _ _
add $10, $11, $12  _ _ _ _ _
add $13, $14, $1   _ _ _ _ _   (data arrives early, OK)
add $15, $16, $7   _ _ _ _ _   (data arrives on time, OK)
add $17, $18, $13  _ _ _ _ _   (uh, oh)
add $19, $20, $17  _ _ _ _ _   (uh, oh)
Verilog
Mixed Questions