CMPT 886: Computer Architecture Primer


1 CMPT 886: Computer Architecture Primer
Dr. Alexandra Fedorova, School of Computing Science, SFU

2 Outline
Caches
Branch prediction
Out-of-order execution
Instruction-level parallelism

3 Caches
Level 1 / Level 2 / Level 3
Instruction, data, or unified

4 Direct-Mapped Cache
Line size = 32 bytes
Each address maps to exactly one cache line, so two addresses with the same index bits conflict: loading one evicts the other (cache eviction; see the sketch below)
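As a concrete illustration of the mapping, here is a minimal C sketch of how an address splits into tag, index, and offset, assuming a 32 KB direct-mapped cache (the slide gives only the 32-byte line size; the total size is an assumption):

    #include <stdint.h>

    #define LINE_SIZE 32        /* from the slide */
    #define NUM_LINES 1024      /* assumed: 32 KB / 32-byte lines */

    /* With 32-byte lines, the low 5 bits of an address select a byte
     * within the line, the next 10 bits select one of 1024 lines, and
     * the remaining bits form the tag. Two addresses with equal index
     * bits compete for the same line: loading one evicts the other. */
    void decompose(uint64_t addr,
                   uint64_t *tag, uint64_t *index, uint64_t *offset) {
        *offset = addr % LINE_SIZE;
        *index  = (addr / LINE_SIZE) % NUM_LINES;
        *tag    = addr / ((uint64_t)LINE_SIZE * NUM_LINES);
    }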

5 Set-Associative Cache
In a 4-way set-associative cache, the data can go into any of the four lines of its set
When the entire set is full, which line should we replace? LRU – least recently used, tracked with an LRU stack (see the sketch below)
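A minimal sketch of the LRU stack for one 4-way set; the array representation is illustrative, and real hardware uses compact approximations:

    #define WAYS 4

    /* stack[0] holds the most recently used way, stack[WAYS-1] the
     * least recently used one. On every access, move the touched way
     * to the front; on a miss, replace the way at the back. */
    void touch(int stack[WAYS], int way) {
        int pos = 0;
        while (stack[pos] != way)       /* find the way's position */
            pos++;
        for (; pos > 0; pos--)          /* shift the others down */
            stack[pos] = stack[pos - 1];
        stack[0] = way;                 /* now most recently used */
    }

    int victim(const int stack[WAYS]) {
        return stack[WAYS - 1];         /* least recently used way */
    }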

6 Cache Hit/Miss
Cache hit – the data is found in the cache
Cache miss – the data is not in the cache
Miss rate can be measured as misses per instruction, misses per cycle, or misses per access (the last is also called the miss ratio)
Hit rate is the complement: hit ratio = 1 − miss ratio (e.g., 50 misses in 1000 accesses means a 5% miss ratio and a 95% hit ratio)

7 Cache Miss Latency
How long you have to wait if you miss in the cache
Miss in L1 → pay the L2 latency (~20 cycles)
Miss in L2 → pay the memory latency (~300 cycles, if there is no L3)
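These numbers support a back-of-the-envelope average memory access time (AMAT) estimate; the 2-cycle L1 hit time and the miss rates below are illustrative assumptions, not from the slides:

    AMAT = L1 hit time + L1 miss rate × (L2 latency + L2 miss rate × memory latency)
         = 2 + 0.05 × (20 + 0.10 × 300)
         = 2 + 0.05 × 50
         = 4.5 cycles per access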

8 Writing in Cache
Write through – every write goes directly to memory
Write back – write to memory later, when the line is evicted (see the sketch below)
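A minimal C sketch of the write-back store path, assuming a direct-mapped cache with a dirty bit per line; the names and sizes are illustrative:

    #include <stdint.h>
    #include <string.h>

    #define LINE_SIZE 32
    #define NUM_LINES 1024

    struct cache_line {
        uint64_t tag;
        int      valid;
        int      dirty;             /* set on store, checked on eviction */
        uint8_t  data[LINE_SIZE];
    };

    static struct cache_line cache[NUM_LINES];

    /* Stand-ins for the DRAM interface. */
    static void mem_write_line(uint64_t addr, const uint8_t *d) { (void)addr; (void)d; }
    static void mem_read_line(uint64_t addr, uint8_t *d) { (void)addr; memset(d, 0, LINE_SIZE); }

    void store_byte(uint64_t addr, uint8_t value) {
        uint64_t index = (addr / LINE_SIZE) % NUM_LINES;
        uint64_t tag   = addr / ((uint64_t)LINE_SIZE * NUM_LINES);
        struct cache_line *line = &cache[index];

        if (!line->valid || line->tag != tag) {         /* store miss */
            if (line->valid && line->dirty)             /* write back the victim */
                mem_write_line((line->tag * NUM_LINES + index) * LINE_SIZE,
                               line->data);
            mem_read_line(addr - addr % LINE_SIZE, line->data);
            line->tag = tag;
            line->valid = 1;
            line->dirty = 0;
        }
        line->data[addr % LINE_SIZE] = value;  /* write hits only the cache */
        line->dirty = 1;                       /* memory is stale until eviction */
    }

A write-through cache would instead call mem_write_line on every store and would not need the dirty bit.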

9 Caches on Multiprocessor Systems
[Figure: several processors, each with its own cache, sharing memory over a bus. © Herlihy-Shavit 2007]

10 Processor Issues Load Request
[Figure: a processor issues a load, the request goes over the bus, and memory supplies the data to its cache. © Herlihy-Shavit 2007]

11 Another Processor Issues Load Request
[Figure: a second processor requests the same data over the bus ("I want data" / "I got data"); two caches now hold copies. © Herlihy-Shavit 2007]

12 Processor Modifies Data
[Figure: a processor modifies its cached copy; now the other copies are invalid. © Herlihy-Shavit 2007]

13 Send Invalidation Message to Others
Other caches lose read permission
No need to update memory yet: the cache holding the modified line can provide valid data
[Figure: the writer broadcasts "Invalidate!" over the bus. © Herlihy-Shavit 2007]

14 Processor Asks for Data
[Figure: another processor requests the data ("I want data"); the cache with the valid copy supplies it over the bus. © Herlihy-Shavit 2007]
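Slides 9–14 walk through an invalidation-based coherence protocol. The deck does not name the protocol; here is a minimal per-line state machine assuming MSI-style states, with the slide each transition corresponds to noted in the comments:

    enum state { INVALID, SHARED, MODIFIED };
    enum event { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE };

    enum state next_state(enum state s, enum event e) {
        switch (s) {
        case INVALID:
            if (e == LOCAL_READ)   return SHARED;    /* slides 10-11: load fills the cache */
            if (e == LOCAL_WRITE)  return MODIFIED;  /* gain write permission */
            return INVALID;
        case SHARED:
            if (e == LOCAL_WRITE)  return MODIFIED;  /* slide 13: invalidate other copies */
            if (e == REMOTE_WRITE) return INVALID;   /* slide 12: our copy becomes invalid */
            return SHARED;
        case MODIFIED:
            if (e == REMOTE_READ)  return SHARED;    /* slide 14: we supply the data */
            if (e == REMOTE_WRITE) return INVALID;
            return MODIFIED;
        }
        return INVALID;
    }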

15 Shared Caches
Filled on demand
No control over cache shares
An aggressive thread can grab a large cache share and hurt others (see the sketch below)
[Figure: a shared cache in which Thread 1 occupies almost all lines and Thread 2 only a few.]
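As an illustration of the aggressive-thread point, compare a co-runner with a small, reused working set against one that streams through a buffer far larger than the shared cache; the sizes and stride are illustrative:

    /* Cache-friendly: a small working set with high reuse. */
    long polite(const int *small, int n, int iters) {
        long sum = 0;
        for (int it = 0; it < iters; it++)
            for (int i = 0; i < n; i++)    /* n * 4 bytes fits in cache */
                sum += small[i];
        return sum;
    }

    /* Cache-hostile: touches each 32-byte line of a huge buffer once,
     * constantly allocating fresh lines and evicting the co-runner's. */
    long aggressive(const int *big, long nbytes) {
        long sum = 0;
        for (long i = 0; i < nbytes / 4; i += 8)   /* one int per line */
            sum += big[i];
        return sum;
    }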

16 Outline
Caches
Branch prediction
Out-of-order execution
Instruction-level parallelism

17 Branching and CPU Pipeline

18 Branching Hurts Pipelining

19 Branch Prediction
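Slides 17–19 are figures in the original deck. As a concrete illustration of what a branch predictor buys, the classic experiment below runs the same loop over unsorted and then sorted data: the branch outcome is near-random in the first case and perfectly predictable in the second, so the sorted run is typically much faster on real hardware (the array size and threshold are illustrative):

    #include <stdio.h>
    #include <stdlib.h>

    #define N (1 << 20)

    /* The branch `data[i] >= 128` is taken essentially at random on
     * unsorted data (frequent mispredictions) but becomes long runs of
     * not-taken followed by long runs of taken once the data is sorted. */
    long sum_above(const int *data, int n) {
        long sum = 0;
        for (int i = 0; i < n; i++)
            if (data[i] >= 128)
                sum += data[i];
        return sum;
    }

    static int cmp_int(const void *a, const void *b) {
        return *(const int *)a - *(const int *)b;
    }

    int main(void) {
        int *data = malloc(N * sizeof *data);
        for (int i = 0; i < N; i++)
            data[i] = rand() % 256;

        long u = sum_above(data, N);          /* many mispredictions */
        qsort(data, N, sizeof *data, cmp_int);
        long s = sum_above(data, N);          /* few mispredictions */

        printf("%ld %ld\n", u, s);            /* same result, different speed */
        free(data);
        return 0;
    }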

20 Outline
Caches
Branch prediction
Out-of-order execution
Instruction-level parallelism

21 Out-of-order Execution
Modern CPUs are superscalar: they can issue more than one instruction per clock cycle
If consecutive instructions depend on each other, instruction-level parallelism is limited
To keep the processor going at full speed, issue instructions out of order (see the sketch below)
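A sketch of the dependence point: both loops below perform the same additions, but the first forms one long dependent chain while the second exposes four independent chains that a superscalar, out-of-order core can keep in flight at once (the unroll factor is illustrative):

    /* One accumulator: every add depends on the previous add's result,
     * so the adds complete one at a time regardless of issue width. */
    double sum_dependent(const double *a, int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* Four independent accumulators: up to four adds can be in flight. */
    double sum_independent(const double *a, int n) {
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        int i;
        for (i = 0; i + 4 <= n; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        for (; i < n; i++)      /* leftover elements */
            s0 += a[i];
        return (s0 + s1) + (s2 + s3);
    }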

22 Speculative Execution
Out-of-order execution by itself is limited to a basic block (the straight-line code between branches)
To reorder instructions across branches, use speculative execution: predict the branch, execute past it, and discard the results if the prediction was wrong (see the sketch below)
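A small example of what speculation buys, with the hardware behavior described in comments; the scenario is illustrative:

    /* The load of table[i] sits in a different basic block from the
     * bounds check, so pure out-of-order execution could not start it
     * until the branch resolved. A speculative CPU predicts the branch,
     * issues the load early, and squashes it on a misprediction. */
    int lookup(const int *table, int size, int i) {
        if (i < size)          /* predicted, say, taken */
            return table[i];   /* load can issue before the compare resolves */
        return -1;
    }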

23 Outline
Caches
Branch prediction
Out-of-order execution
Instruction-level parallelism

24 Instruction-Level Parallelism
Many programs fail to keep the processor busy:
Code with lots of loads
Code with frequent, unpredictable branches
CPU cycles are wasted: power is consumed, but no useful work is done
Running multiple hardware threads on the chip helps hide these stalls (see the sketch below)
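A sketch of the "lots of loads" case: pointer chasing serializes its loads, since each address depends on the previous load's result, so the core stalls for a full miss latency on every hop; a second hardware thread could run during those idle cycles (the data structure is illustrative):

    #include <stddef.h>

    struct node {
        struct node *next;
        long payload;
    };

    /* Only one of these loads can be in flight at a time. If each hop
     * misses in cache (~300 cycles, per slide 7), the pipeline sits
     * idle for most of the traversal. */
    long chase(const struct node *p) {
        long sum = 0;
        while (p != NULL) {
            sum += p->payload;
            p = p->next;       /* serialized, potentially missing load */
        }
        return sum;
    }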

