Published by Pauline Wade. Modified over 9 years ago.
1
Processor Level Parallelism
2
Improving the Pipeline
Pipelined processor
– Ideal speedup = number of stages
– Branches and conflicts (hazards) limit the returns past a certain pipeline depth
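The diminishing returns can be put into numbers with a rough model. The sketch below is not from the slides: `pipeline_speedup` and `flush_fraction` are hypothetical names, and it assumes that every flush (for example a mispredicted branch) costs `num_stages - 1` cycles, which is a simplification.

```python
def pipeline_speedup(num_stages, flush_fraction=0.0):
    """Estimated speedup of a pipelined design over an unpipelined one.

    flush_fraction is the assumed share of instructions that flush the
    pipeline (e.g. mispredicted branches). Each flush is assumed to cost
    num_stages - 1 cycles; real penalties depend on the design.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    if not 0.0 <= flush_fraction <= 1.0:
        raise ValueError("flush_fraction must be between 0 and 1")
    # Average extra cycles per instruction caused by flushes.
    stall_cycles = flush_fraction * (num_stages - 1)
    # Ideal CPI is 1; stalls add to it and eat into the ideal speedup.
    return num_stages / (1.0 + stall_cycles)


if __name__ == "__main__":
    for depth in (5, 10, 20, 40):
        print(depth, round(pipeline_speedup(depth, flush_fraction=0.05), 2))
```

With no flushes the function returns the ideal speedup (the number of stages). Under this model, a 5% flush fraction caps the speedup below 20 no matter how deep the pipeline gets.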
3
ILP Instruction Level Parallelism – Ability to run multiple instructions at the same time
4
Superscalar
Superscalar: capable of running multiple instructions at a time
– Multiple execution units
– Widen the slowest part of the pipeline
5
Superscalar Multi-issue : Start multiple instructions per clock – Parallel pipes
6
Superscalar Multi-issue pipeline feeding multiple execution units
7
Superscalar Issue: Dependency issues just got MUCH harder…
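To see why dependencies get harder, consider the check needed before two instructions can issue in the same cycle. This is a minimal sketch with a hypothetical `Instr` record (one destination register, a tuple of sources). It is deliberately conservative: real superscalar hardware typically uses register renaming to remove the WAR and WAW cases.

```python
from collections import namedtuple

# Hypothetical instruction record, for illustration only.
Instr = namedtuple("Instr", ["op", "dest", "srcs"])


def hazards(first, second):
    """Return the hazards that stop `second` issuing alongside `first`."""
    found = set()
    if first.dest is not None and first.dest in second.srcs:
        found.add("RAW")  # second reads a register that first writes
    if second.dest is not None and second.dest in first.srcs:
        found.add("WAR")  # second overwrites a register that first reads
    if first.dest is not None and first.dest == second.dest:
        found.add("WAW")  # both write the same register
    return found


def can_dual_issue(first, second):
    """True if the pair has no register dependency between them."""
    return not hazards(first, second)


if __name__ == "__main__":
    add = Instr("add", "r1", ("r2", "r3"))
    sub = Instr("sub", "r4", ("r1", "r5"))  # needs r1 from the add
    mul = Instr("mul", "r6", ("r7", "r8"))  # independent of the add
    print(hazards(add, sub), can_dual_issue(add, mul))
```

An N-wide machine has to make this comparison for every pair of instructions in the issue group, every cycle, so the checking logic grows quickly with width.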
8
Superscalar Pro/Con
Good
– The hardware solves everything
  Hardware handles scheduling, registers, etc.
  Compiler can still help matters
– Binary compatibility
  New hardware issues old instructions in a more efficient way
Bad
– Complex hardware
– Limit to scale
9
VLIW VLIW : Very Long Instruction Word – One instruction contains multiple ops
10
VLIW
Instructions VERY large
– 240 bits?
– Wasted space addressed by bundles
  No dependencies within a bundle
11
Who does the work?
Compiler assembles long instructions
– Reorders at compile time
Compiler has more time and more information than the hardware
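The compiler's bundling job can be sketched as follows. This is an illustrative in-order packer, not a real VLIW scheduler: ops are hypothetical `(name, dest, srcs)` tuples, `pack_bundles` and the bundle width of 3 are assumptions, and no reordering is attempted (a real compiler would also move independent ops earlier to fill empty slots).

```python
NOP = ("nop", None, ())


def depends(earlier, later):
    """True if `later` may not share a bundle with `earlier`."""
    _, e_dest, e_srcs = earlier
    _, l_dest, l_srcs = later
    raw = e_dest is not None and e_dest in l_srcs
    war = l_dest is not None and l_dest in e_srcs
    waw = e_dest is not None and e_dest == l_dest
    return raw or war or waw


def pack_bundles(ops, width=3):
    """Pack ops, in program order, into bundles with no internal dependencies."""
    bundles, current = [], []
    for op in ops:
        conflict = any(depends(prev, op) for prev in current)
        if conflict or len(current) == width:
            bundles.append(current)  # close the bundle and start a new one
            current = []
        current.append(op)
    if current:
        bundles.append(current)
    # Unused slots are padded with NOPs: this is the wasted space.
    return [b + [NOP] * (width - len(b)) for b in bundles]


if __name__ == "__main__":
    program = [
        ("load", "r1", ("r9",)),
        ("load", "r2", ("r10",)),
        ("add", "r3", ("r1", "r2")),  # needs both loads, so starts a new bundle
        ("mul", "r4", ("r5", "r6")),
    ]
    for bundle in pack_bundles(program):
        print([name for name, _, _ in bundle])
```

Because this all happens at compile time, the hardware can issue every op in a bundle without checking for dependencies.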
12
VLIW Uses
Itanium:
– EPIC: Explicitly Parallel Instruction Computing
– 3-instruction bundles
13
VLIW Pro/Con
Good
– Simple hardware
  Add new functional units with no new scheduling hardware
– Better optimization in compiler
Bad
– Binary compatibility: compiler builds for one specific hardware
– Good compilers are HARD to write
14
Modern CPU: ARM 15
15
Processor Parallelism Process Parallelism : Run multiple instruction streams simultaneously
16
Process vs Thread Process : Program – Own memory space – Has at least one thread
17
Process vs Thread Thread : Instruction sequence – Own registers/stack – Share memory with other threads in process
18
Threaded Code Demo…
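The original demo is not part of this transcript. As a stand-in, here is a minimal sketch of the process/thread distinction above: each thread keeps a private local variable on its own stack, while all threads update one object in the process's shared memory. The function name and parameters are illustrative.

```python
import threading


def run_workers(num_threads=4, increments=10_000):
    """Start several threads that all add to one shared counter."""
    counter = {"value": 0}  # shared: visible to every thread in the process
    lock = threading.Lock()

    def worker():
        local_total = 0  # private: lives on this thread's own stack
        for _ in range(increments):
            local_total += 1
        # The lock keeps two threads from updating the shared value at once.
        with lock:
            counter["value"] += local_total

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


if __name__ == "__main__":
    print(run_workers())
```

In standard CPython the global interpreter lock makes these threads interleave and not run truly in parallel, so the example shows the shared memory model, not a speedup.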
19
Context Switching Four threads running in 4-wide pipeline – Can't always fill all 4 issue slots – Have bubbles from memory access, page faults, etc…
20
Context Switching Threads often have bubbles…
21
Multithreading
Alternate threads to maximize hardware use
– Coarse: run until stall, then switch
– Fine: switch every cycle
– Either one needs extra hardware
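The two switching policies can be compared with a toy single-issue simulation. Everything here is an assumption made for illustration: a thread is a list of stall counts (one per instruction, meaning how many cycles the thread must wait after issuing it), switching is free, and `simulate` is a hypothetical name.

```python
def simulate(threads, policy):
    """Return the per-cycle issue timeline (thread name, or None for a bubble).

    threads: dict mapping a thread name to a list of stall counts.
    policy:  "fine" (rotate every cycle) or "coarse" (switch only on a stall).
    """
    if policy not in ("fine", "coarse"):
        raise ValueError("policy must be 'fine' or 'coarse'")
    names = list(threads)
    next_index = {n: 0 for n in names}  # next instruction of each thread
    ready_at = {n: 0 for n in names}    # first cycle each thread may issue
    timeline, current, cycle = [], 0, 0
    while any(next_index[n] < len(threads[n]) for n in names):
        # Try the preferred thread first, then the others in order.
        order = names[current:] + names[:current]
        chosen = None
        for n in order:
            if next_index[n] < len(threads[n]) and ready_at[n] <= cycle:
                chosen = n
                break
        timeline.append(chosen)
        if chosen is not None:
            stall = threads[chosen][next_index[chosen]]
            next_index[chosen] += 1
            ready_at[chosen] = cycle + 1 + stall
            idx = names.index(chosen)
            # Fine: move on to the next thread. Coarse: stay on this one.
            current = (idx + 1) % len(names) if policy == "fine" else idx
        cycle += 1
    return timeline


if __name__ == "__main__":
    work = {"A": [0, 0, 2, 0], "B": [0, 0, 0]}
    print(simulate(work, "fine"))
    print(simulate(work, "coarse"))
```

Run back to back, the two threads would need 9 cycles in this model (A's two stall cycles are wasted). Either policy lets thread B cover some or all of that stall.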
22
Multithreading Superscalar
A 2-instruction-wide pipeline with multithreading:
– Still only one thread issues per cycle
Fine grained / Coarse grained
23
SMT
SMT: Simultaneous Multithreading
– AKA Hyper-Threading (Intel's name for it)
Issue ops from multiple threads in one cycle
Maximize use of functional units
– But need to track which thread's registers each instruction goes with…
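A rough sketch of the slot-filling idea: in one cycle, issue slots left empty by one thread are handed to the next. The function name, the 4-wide default, and the simple first-come priority order are assumptions. Each slot is tagged with its thread, standing in for the bookkeeping that tells the hardware whose registers an op uses.

```python
def fill_issue_slots(ready_ops, width=4):
    """Fill one cycle's issue slots from several threads.

    ready_ops: dict mapping a thread name to the number of ops it has
               ready this cycle (threads are tried in dict order).
    Returns a list of length `width`; each entry is the owning thread's
    name, or None for a slot that stays empty.
    """
    slots = []
    for thread, count in ready_ops.items():
        take = min(count, width - len(slots))
        slots.extend([thread] * take)  # tag each op with its thread
        if len(slots) == width:
            break
    slots.extend([None] * (width - len(slots)))
    return slots


if __name__ == "__main__":
    # One thread alone leaves bubbles; several threads fill the width.
    print(fill_issue_slots({"T0": 2}))
    print(fill_issue_slots({"T0": 2, "T1": 1, "T2": 3}))
```

A single thread with two ready ops leaves half of a 4-wide machine idle, while three threads together fill every slot in the same cycle.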
24
SMT Challenges Resources must be duplicated or split – Split too thin hurts performance… – Duplicate everything and you aren't maximizing use of hardware…
25
Intel vs AMD Variations on SMT
26
Getting Faster
Pipelining helps to a point
Superscalar/VLIW helps to a point
SMT helps a bit
Chips getting faster
27
Getting Faster
28
Power Density Prediction circa 2000 Core 2 Adapted from UC Berkeley "The Beauty and Joy of Computing"
29
Moore's Law Related Curves Adapted from UC Berkeley "The Beauty and Joy of Computing"
30
Moore's Law Related Curves Adapted from UC Berkeley "The Beauty and Joy of Computing"
31
Going Multi-core Helps Energy Efficiency William Holt, HOT Chips 2005 Adapted from UC Berkeley "The Beauty and Joy of Computing"