CMPT 886: Computer Architecture Primer
Dr. Alexandra Fedorova, School of Computing Science, SFU
Outline
- Caches
- Branch prediction
- Out-of-order execution
- Instruction-level parallelism
Caches
Level 1 / Level 2 / Level 3; instruction, data, or unified.
Direct-Mapped Cache
Each memory address maps to exactly one cache line. Line size = 32 bytes. A conflicting access evicts the line currently occupying that slot (cache eviction).
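A direct-mapped cache splits each address into a tag, an index, and a line offset. The sketch below assumes a 32 KB cache (1024 lines of the 32-byte line size mentioned on the slide); the cache size is an assumption for illustration, not from the slides.

```python
# Sketch of how a direct-mapped cache splits an address.
# Assumed parameters: 32-byte lines, 1024 lines -> 32 KB cache
# (the cache size is illustrative, not from the slides).
LINE_SIZE = 32        # bytes per cache line
NUM_LINES = 1024      # number of line slots in the cache

def split_address(addr):
    """Return (tag, index, offset) for a byte address."""
    offset = addr % LINE_SIZE                # byte within the line
    index = (addr // LINE_SIZE) % NUM_LINES  # which line slot
    tag = addr // (LINE_SIZE * NUM_LINES)    # identifies the memory block
    return tag, index, offset

# Two addresses exactly one cache-size apart map to the same slot,
# so in a direct-mapped cache they keep evicting each other:
a, b = 0x0000, 0x8000                        # 32 KB apart
assert split_address(a)[1] == split_address(b)[1]
```

This conflict behavior is exactly what set associativity (next slide) is designed to soften.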
Set-Associative Cache
4-way set associative: the data can go into any of the four locations (ways) in its set. When the entire set is full, which line should we replace? LRU, least recently used, tracked with an LRU stack.
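The LRU stack idea can be sketched as a small list per set: the most recently used tag sits at the front, and the victim is taken from the back. A minimal, illustrative model (not a cycle-accurate one):

```python
# Toy 4-way set with LRU replacement: an LRU "stack" with the most
# recently used tag at the front, victim taken from the back.
class LRUSet:
    def __init__(self, ways=4):
        self.ways = ways
        self.lines = []                    # tags, most- to least-recent

    def access(self, tag):
        """Return True on hit, False on miss (filling/evicting as needed)."""
        if tag in self.lines:
            self.lines.remove(tag)
            self.lines.insert(0, tag)      # move to MRU position
            return True
        if len(self.lines) == self.ways:
            self.lines.pop()               # evict the LRU line
        self.lines.insert(0, tag)
        return False

s = LRUSet()
for t in ["A", "B", "C", "D"]:
    s.access(t)                            # fill the set: all misses
s.access("A")                              # hit; "A" becomes most recent
s.access("E")                              # set full: evicts "B", the LRU line
assert "B" not in s.lines
```

Real hardware approximates this ordering with a few bits per set (e.g. pseudo-LRU) rather than a full stack, but the policy is the same.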
Cache Hit/Miss
Cache hit: the data is found in the cache.
Cache miss: the data is not in the cache.
Miss rate can be measured as misses per instruction, misses per cycle, or misses per access (the last is also called the miss ratio). Hit rate is the complement: hits per access = 1 - miss ratio.
Cache Miss Latency
How long you have to wait if you miss in the cache.
Miss in L1 → pay the L2 latency (~20 cycles). Miss in L2 → pay the memory latency (~300 cycles, if there is no L3).
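These latencies combine into an average memory access time (AMAT). Using the slide's rough numbers, plus an assumed L1 hit time of 2 cycles and assumed miss ratios (neither is from the slides):

```python
# Average memory access time with the slide's rough latencies.
# The L1 hit time (2 cycles) and the miss ratios below are assumed
# for illustration; they are not from the slides.
L1_HIT = 2           # cycles
L2_LATENCY = 20      # cycles paid on an L1 miss
MEM_LATENCY = 300    # cycles paid on an L2 miss (no L3)

def amat(l1_miss_ratio, l2_miss_ratio):
    return L1_HIT + l1_miss_ratio * (L2_LATENCY + l2_miss_ratio * MEM_LATENCY)

# Even a 5% L1 miss ratio with a 20% L2 miss ratio triples the
# average access cost: 2 + 0.05 * (20 + 0.2 * 300) = 6.0 cycles.
print(amat(0.05, 0.20))
```

The ~300-cycle memory term dominates, which is why even small L2 miss ratios hurt so much.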
Writing in Cache
Write through: every write goes directly to memory.
Write back: write to memory later, when the (dirty) line is evicted.
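The difference shows up as memory traffic. A minimal sketch for a single line, with counters standing in for bus writes (the class and its fields are illustrative, not from the slides):

```python
# Sketch contrasting the two write policies for one cache line.
# mem_writes counts writes reaching memory; illustrative only.
class Line:
    def __init__(self, write_back):
        self.write_back = write_back
        self.dirty = False
        self.mem_writes = 0

    def write(self, value):
        if self.write_back:
            self.dirty = True       # defer the memory update
        else:
            self.mem_writes += 1    # write through: memory updated now

    def evict(self):
        if self.write_back and self.dirty:
            self.mem_writes += 1    # one write covers all earlier stores
            self.dirty = False

wt, wb = Line(write_back=False), Line(write_back=True)
for v in range(10):                 # ten stores to the same line
    wt.write(v)
    wb.write(v)
wt.evict(); wb.evict()
assert (wt.mem_writes, wb.mem_writes) == (10, 1)
```

Write back wins when a line is written many times before eviction; write through keeps memory always up to date.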
Caches on Multiprocessor Systems
[Figure: several processors, each with its own cache, connected to memory over a shared bus. © Herlihy-Shavit 2007]
Processor Issues Load Request
[Figure: one processor requests data over the bus; memory supplies it and a copy lands in the requesting cache.]
Another Processor Issues Load Request
[Figure: a second processor requests the same data and receives a copy, so two caches now share the line.]
Processor Modifies Data
[Figure: one processor writes to its cached copy; now the copies in the other caches are invalid.]
Send Invalidation Message to Others
[Figure: the writer broadcasts "Invalidate!" on the bus; the other caches lose read permission and drop their copies. No need to update memory now: the writer's cache can provide the valid data.]
Processor Asks for Data
[Figure: a reader requests the line; the cache holding the modified copy supplies the valid data over the bus.]
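The sequence on the preceding slides can be sketched as a tiny invalidation protocol for one cache line. The states and function names below are illustrative (in the spirit of MSI, not a complete model of the protocol in the figures):

```python
# Minimal sketch of invalidation-based coherence for one cache line,
# following the bus protocol sketched on the slides. States and names
# are illustrative (MSI-like), not a full coherence model.
INVALID, SHARED, MODIFIED = "I", "S", "M"

class Cache:
    def __init__(self):
        self.state = INVALID
        self.value = None

def load(caches, i, memory_value):
    # A cache holding a modified copy supplies the data; else memory does.
    value = memory_value
    for c in caches:
        if c.state == MODIFIED:
            value = c.value
            c.state = SHARED        # writer downgrades on a remote read
    caches[i].state = SHARED
    caches[i].value = value
    return value

def store(caches, i, value):
    for j, c in enumerate(caches):
        if j != i:
            c.state = INVALID       # bus invalidation: others lose the line
    caches[i].state = MODIFIED
    caches[i].value = value

caches = [Cache(), Cache(), Cache()]
load(caches, 0, 7)                  # P0 reads from memory
load(caches, 1, 7)                  # P1 reads: two shared copies
store(caches, 0, 42)                # P0 writes: P1's copy is invalidated
assert caches[1].state == INVALID
assert load(caches, 2, 7) == 42     # P2 gets the modified data from P0
```

The final load shows the last slide's point: memory still holds the stale value 7, but the protocol routes the read to the cache with the valid copy.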
Shared Caches
Filled on demand; no control over cache shares. An aggressive thread can grab a large cache share and hurt the others.
[Figure: a shared cache in which Thread 1 occupies most of the lines and Thread 2 only a few.]
Branching and CPU Pipeline
Branching Hurts Pipelining
Branch Prediction
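The figures for these slides are not preserved. As an illustration of the idea, here is a classic 2-bit saturating-counter predictor, a common dynamic scheme (not necessarily the one shown in the deck): two bits of state per branch, so a single surprise does not flip the prediction.

```python
# 2-bit saturating-counter branch predictor (illustrative; the deck's
# figures for this slide are not preserved).
# Counter values 0,1 predict not-taken; 2,3 predict taken.
class TwoBitPredictor:
    def __init__(self):
        self.counter = 1                   # start weakly not-taken

    def predict(self):
        return self.counter >= 2           # True = predict taken

    def update(self, taken):
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

# A loop branch: taken nine times, then not taken once at loop exit.
p = TwoBitPredictor()
outcomes = [True] * 9 + [False]
correct = 0
for taken in outcomes:
    correct += (p.predict() == taken)
    p.update(taken)
print(correct, "of", len(outcomes))        # mispredicts only first and last
```

Because the counter saturates at "strongly taken", the single not-taken exit does not derail predictions for the next execution of the loop.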
Out-of-order Execution
Modern CPUs are superscalar: they can issue more than one instruction per clock cycle. If consecutive instructions depend on each other, instruction-level parallelism is limited. To keep the processor going at full speed, issue instructions out of order.
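The benefit can be sketched with a toy issue model: count the cycles a 2-wide machine needs for a small program in order vs. out of order. The model is purely illustrative (single-cycle latencies, no renaming, no structural hazards):

```python
# Toy issue model: cycles to issue a program in order vs. out of order
# on a 2-wide machine. Illustrative only: single-cycle latencies,
# no register renaming, no structural hazards.
WIDTH = 2

def issue_cycles(deps, in_order):
    """deps[i] = set of instructions that instruction i depends on."""
    done, cycles = set(), 0
    while len(done) < len(deps):
        cycles += 1
        snapshot = set(done)       # results become visible next cycle
        issued = 0
        for i in range(len(deps)):
            if i in done:
                continue
            if deps[i] <= snapshot:
                done.add(i)
                issued += 1
                if issued == WIDTH:
                    break
            elif in_order:
                break              # in-order: stall at the first blocked op
    return cycles

# Instruction 1 depends on 0; instructions 2 and 3 are independent.
deps = [set(), {0}, set(), set()]
print(issue_cycles(deps, in_order=True),    # 3 cycles: chain blocks 2 and 3
      issue_cycles(deps, in_order=False))   # 2 cycles: 2 and 3 fill the slot
```

In order, the dependent pair 0→1 stalls the independent instructions behind it; out of order, the machine slides instructions 2 and 3 into the otherwise wasted slots.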
Speculative Execution
Out-of-order execution is limited to basic blocks. To go beyond basic blocks, use speculative execution: predict the branch, execute past it, and discard the results if the prediction turns out wrong.
Instruction-Level Parallelism
Many programs fail to keep the processor busy: code with lots of loads, or code with frequent and unpredictable branches. CPU cycles are wasted: power is consumed, but no useful work is done. Running multiple threads on the chip helps hide these stalls.