Slide 1: CMPT 886: Computer Architecture Primer
SYNAR: Systems, Networking and Architecture Group
Dr. Alexandra Fedorova, School of Computing Science, SFU
Slide 2: Outline
- Caches
- Branch prediction
- Out-of-order execution
- Instruction-Level Parallelism
Slide 3: Caches
- Level 1 / Level 2 / Level 3
- Instruction, data, or unified
Slide 4: Direct-Mapped Cache
- Each memory address maps to exactly one location in the cache
- Line size = 32 bytes
- Cache eviction: a new line that maps to an occupied location displaces the old one
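To make the direct mapping concrete, here is a sketch of how an address is split into offset, index, and tag for the slide's 32-byte lines. The 8 KB total capacity (256 lines) is an illustrative assumption; the slides only fix the line size.

```c
#include <stdint.h>

/* Hypothetical 8 KB direct-mapped cache with 32-byte lines (256 lines).
 * The capacity is an assumption; the slide only specifies the line size. */
#define LINE_SIZE 32u   /* bytes per line -> 5 offset bits */
#define NUM_LINES 256u  /* 8 KB / 32 B    -> 8 index bits  */

/* Byte offset within the line: the low 5 bits of the address. */
static uint32_t cache_offset(uint32_t addr) { return addr % LINE_SIZE; }

/* Line index: the next 8 bits select one of the 256 lines. */
static uint32_t cache_index(uint32_t addr) { return (addr / LINE_SIZE) % NUM_LINES; }

/* Tag: the remaining high bits, stored so a lookup can tell which memory
 * block currently occupies the line. */
static uint32_t cache_tag(uint32_t addr) { return addr / (LINE_SIZE * NUM_LINES); }
```

Note that two addresses exactly 8 KB apart (e.g. 0x0000 and 0x2000) get the same index but different tags, so in this cache they evict each other: that is the conflict eviction the slide refers to.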
Slide 5: Set-Associative Cache
- In a 4-way set-associative cache, the data can go into any of the four locations of its set
- When the entire set is full, which line should we replace?
- LRU: replace the least recently used line (tracked with an LRU stack)
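The LRU stack the slide mentions can be sketched as follows for a single 4-way set: line IDs are kept ordered from most recently used (front) to least recently used (back), and the victim is whatever sits at the back. This is an illustrative model, not a real hardware implementation.

```c
/* Minimal LRU stack for one 4-way set. Assumes every touched way is
 * already resident in the stack. */
#define WAYS 4

typedef struct { int stack[WAYS]; } lru_set;

/* On an access to `way`, move it to the front of the stack. */
static void lru_touch(lru_set *s, int way) {
    int i = 0;
    while (s->stack[i] != way) i++;                   /* find its position */
    for (; i > 0; i--) s->stack[i] = s->stack[i - 1]; /* shift others down */
    s->stack[0] = way;
}

/* When the set is full, the replacement victim is the back of the stack. */
static int lru_victim(const lru_set *s) { return s->stack[WAYS - 1]; }
```

Real caches typically approximate this ordering (e.g. with pseudo-LRU bits) because maintaining an exact stack per set is expensive in hardware.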
Slide 6: Cache Hit/Miss
- Cache hit: the data is found in the cache
- Cache miss: the data is not in the cache
- Miss rate can be expressed as misses per instruction, misses per cycle, or misses per access (the last is also called the miss ratio)
- Hit rate: the complement of the miss rate
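The per-access variant of these rates is simple arithmetic; a small sketch of the two complementary ratios:

```c
/* Miss ratio = misses per access; the hit ratio is its complement. */
static double miss_ratio(long misses, long accesses) {
    return (double)misses / (double)accesses;
}

static double hit_ratio(long misses, long accesses) {
    return 1.0 - miss_ratio(misses, accesses);
}
```

For example, 5 misses over 100 accesses gives a miss ratio of 0.05 and a hit ratio of 0.95.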
Slide 7: Cache Miss Latency
- How long you have to wait if you miss in the cache
- Miss in L1: pay the L2 latency (~20 cycles)
- Miss in L2: pay the memory latency (~300 cycles), if there is no L3
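These latencies combine into the standard average memory access time (AMAT) formula. The ~20-cycle L2 and ~300-cycle memory latencies come from the slide; the L1 hit time and the miss rates used in the example are illustrative assumptions.

```c
/* Average memory access time for a two-level hierarchy:
 *   AMAT = L1_hit + L1_miss_rate * (L2_latency + L2_miss_rate * mem_latency)
 * The latencies (~20 cycles L2, ~300 cycles memory) follow the slide;
 * hit time and miss rates are assumed for illustration. */
static double amat(double l1_hit, double l1_miss_rate,
                   double l2_lat, double l2_miss_rate, double mem_lat) {
    return l1_hit + l1_miss_rate * (l2_lat + l2_miss_rate * mem_lat);
}
```

With an assumed 2-cycle L1 hit, a 5% L1 miss rate, and a 20% L2 miss rate, amat(2, 0.05, 20, 0.2, 300) works out to 2 + 0.05 * (20 + 0.2 * 300) = 6 cycles per access on average.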
Slide 8: Writing in Cache
- Write-through: write directly to memory
- Write-back: write to memory later, when the line is evicted
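The key mechanism behind write-back is the dirty bit: stores only mark the cached line as modified, and memory is updated once, at eviction. The sketch below illustrates this; `write_to_memory` and the `memory_writes` counter are hypothetical stand-ins for the memory bus, not a real interface.

```c
#include <stdint.h>
#include <stdbool.h>

/* Sketch of the write-back policy. A write-through cache would instead
 * call write_to_memory() on every store. */
typedef struct { uint32_t tag; uint32_t data; bool dirty; } cache_line;

static uint32_t memory_writes = 0;                /* stand-in for the bus */
static void write_to_memory(cache_line *l) { (void)l; memory_writes++; }

static void store(cache_line *l, uint32_t value) {
    l->data  = value;
    l->dirty = true;                  /* defer the memory update          */
}

static void evict(cache_line *l) {
    if (l->dirty) write_to_memory(l); /* write back only if modified      */
    l->dirty = false;
}
```

Three stores followed by one eviction cost a single memory write here, which is exactly why write-back reduces bus traffic for write-heavy code.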
Slide 9: Caches on Multiprocessor Systems
[Figure: two processors, each with a private cache, connected to memory over a shared bus] © Herlihy-Shavit 2007
Slide 10: Processor Issues Load Request
[Figure: a processor requests data over the bus; the data is delivered and filled into its cache] © Herlihy-Shavit 2007
Slide 11: Another Processor Issues Load Request
[Figure: a second processor announces "I want data" on the bus; the first cache responds "I got data", and both caches now hold a copy] © Herlihy-Shavit 2007
Slide 12: Processor Modifies Data
[Figure: one processor writes its cached copy; now the other copies are invalid] © Herlihy-Shavit 2007
Slide 13: Send Invalidation Message to Others
[Figure: the writing cache broadcasts "Invalidate!" on the bus]
- Other caches lose read permission
- No need to update memory now: the cache holding the modified data can provide valid data on demand
© Herlihy-Shavit 2007
Slide 14: Processor Asks for Data
[Figure: an invalidated processor announces "I want data" on the bus; the cache holding the modified line supplies it] © Herlihy-Shavit 2007
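The bus sequence on slides 10 through 14 can be summarized as a tiny MSI-style state machine: loads fill lines in a shared state, a write moves the writer to a modified state and invalidates the other copies, and a later remote read is served by the modified cache, which downgrades back to shared. This is an illustrative model of the slides' protocol sketch, not a full coherence specification.

```c
/* Minimal MSI-style coherence sketch for a single line shared by two
 * caches. Illustrative model of the invalidation protocol in the slides. */
typedef enum { INVALID, SHARED, MODIFIED } state;
#define NCACHES 2

static state caches[NCACHES];

static void load_line(int c) {  /* processor c reads the line  */
    if (caches[c] == INVALID) caches[c] = SHARED;
    for (int i = 0; i < NCACHES; i++)        /* a modified copy elsewhere */
        if (i != c && caches[i] == MODIFIED) /* supplies the data and     */
            caches[i] = SHARED;              /* downgrades to shared      */
}

static void store_line(int c) { /* processor c writes the line */
    for (int i = 0; i < NCACHES; i++)        /* broadcast "Invalidate!"   */
        if (i != c) caches[i] = INVALID;
    caches[c] = MODIFIED;
}
```

Tracing the slides' scenario: both processors load (both SHARED), processor 0 stores (it becomes MODIFIED, processor 1 becomes INVALID), then processor 1 loads again and both end up SHARED.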
Slide 15: Shared Caches
- Filled on demand
- No control over cache shares
- An aggressive thread can grab a large cache share and hurt the others
[Figure: cache occupancy shifting from an even split between Thread 1 and Thread 2 to one thread dominating]
Slide 16: Outline
- Caches
- Branch prediction
- Out-of-order execution
- Instruction-Level Parallelism
Slide 17: Branching and the CPU Pipeline
[Figure: the stages of the CPU pipeline]
Slide 18: Branching Hurts Pipelining
Slide 19: Branch Prediction
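One classic predictor, shown here as a sketch, is the two-bit saturating counter: counter values 0 and 1 predict not-taken, 2 and 3 predict taken, and the counter moves one step toward the actual outcome after each branch. The slide does not name a particular algorithm, so this scheme is an illustrative choice.

```c
/* Two-bit saturating-counter branch predictor (one common scheme).
 * 0 = strongly not taken, 1 = weakly not taken,
 * 2 = weakly taken,       3 = strongly taken.   */
static int counter = 2;                        /* start weakly taken */

static int predict_taken(void) { return counter >= 2; }

static void train(int taken) {                 /* update with the real outcome */
    if (taken  && counter < 3) counter++;      /* saturate at 3 */
    if (!taken && counter > 0) counter--;      /* saturate at 0 */
}
```

The two-bit hysteresis is the point of the design: a loop branch that is taken hundreds of times and then falls through once stays in a taken state, so the single exit mispredict does not flip the prediction for the next entry into the loop.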
Slide 20: Outline
- Caches
- Branch prediction
- Out-of-order execution
- Instruction-Level Parallelism
Slide 21: Out-of-Order Execution
- Modern CPUs are superscalar: they can issue more than one instruction per clock cycle
- If consecutive instructions depend on each other, instruction-level parallelism is limited
- To keep the processor going at full speed, issue instructions out of order
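To see why dependences cap what out-of-order issue can achieve, consider an idealized machine with unlimited issue width and 1-cycle instruction latency: execution time collapses to the length of the longest dependence chain. The toy model below (each instruction depends on at most one earlier instruction) is an illustrative assumption, not how real schedulers work.

```c
/* Idealized out-of-order timing model: an instruction completes one cycle
 * after the instruction it depends on; independent instructions all finish
 * in cycle 1. deps[i] is the index instruction i depends on, or -1. */
static int completion_cycle(const int *deps, int i) {
    if (deps[i] < 0) return 1;                    /* no dependence: cycle 1 */
    return completion_cycle(deps, deps[i]) + 1;   /* one cycle after source */
}

/* Total time = the critical path, i.e. the latest completion cycle. */
static int total_cycles(const int *deps, int n) {
    int max = 0;
    for (int i = 0; i < n; i++) {
        int c = completion_cycle(deps, i);
        if (c > max) max = c;
    }
    return max;
}
```

Four fully independent instructions finish in 1 cycle on this ideal machine, while a four-instruction dependence chain still takes 4 cycles no matter how wide the issue logic is.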
Slide 22: Speculative Execution
- Out-of-order execution alone is limited to basic blocks
- To find parallelism beyond basic blocks, the processor executes speculatively past predicted branches
Slide 23: Outline
- Caches
- Branch prediction
- Out-of-order execution
- Instruction-Level Parallelism
Slide 24: Instruction-Level Parallelism
- Many programs fail to keep the processor busy:
  - code with lots of loads
  - code with frequent and unpredictable branches
- CPU cycles are wasted: power is consumed, but no useful work is done
- Running multiple hardware threads on the chip helps hide these stalls