

1 Processor Level Parallelism

2 Improving the Pipeline Pipelined processor – Ideal speedup = number of stages – Branches/conflicts mean limited returns after a certain point
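A minimal Python sketch of the speedup arithmetic. It assumes every stage takes one cycle, that an unpipelined machine spends one cycle per stage on every instruction, and (in the second function) that branches and conflicts add an average number of stall cycles per instruction. The function names are illustrative, not from the slides.

```python
def ideal_speedup(stages: int, n_instructions: int) -> float:
    """Speedup of a `stages`-deep pipeline over an unpipelined machine.

    Assumes each stage takes one cycle, so the unpipelined machine spends
    `stages` cycles per instruction, while the pipeline needs `stages` cycles
    to fill and then finishes one instruction per cycle.
    """
    if stages < 1 or n_instructions < 1:
        raise ValueError("need at least one stage and one instruction")
    unpipelined_cycles = n_instructions * stages
    pipelined_cycles = stages + (n_instructions - 1)
    return unpipelined_cycles / pipelined_cycles


def speedup_with_stalls(stages: int, stalls_per_instruction: float) -> float:
    """Steady-state speedup when branches/conflicts add an average number of
    stall cycles per instruction (pipelined CPI = 1 + stalls)."""
    return stages / (1.0 + stalls_per_instruction)


if __name__ == "__main__":
    print(ideal_speedup(5, 1_000_000))   # just under 5: approaches the number of stages
    print(speedup_with_stalls(20, 1.0))  # 10.0: one stall per instruction halves it
```

Every stall cycle per instruction divides directly into the ideal speedup, which is why adding stages eventually stops paying off.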

3 ILP Instruction Level Parallelism – Ability to run multiple instructions at the same time
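One rough way to put a number on ILP for a straight-line snippet is to divide the instruction count by the length of its longest chain of true (read-after-write) dependences. The sketch below assumes unit latency, unlimited execution units, register renaming (so only RAW dependences matter), and a hypothetical `(destination, sources)` encoding for each instruction.

```python
def estimate_ilp(program: list[tuple[str, tuple[str, ...]]]) -> float:
    """Instructions per cycle achievable if only RAW dependences limit issue.

    Each instruction is a hypothetical (destination register, source registers)
    pair; every instruction is assumed to take exactly one cycle. Rewriting a
    register simply starts a new value, as if the hardware renamed registers.
    """
    produced_in = {}  # register -> cycle (1-based) producing its latest value
    longest_chain = 0
    for dest, srcs in program:
        # Run one cycle after the latest-produced source; inputs not written
        # inside the snippet count as available from the start (cycle 0).
        cycle = 1 + max((produced_in.get(s, 0) for s in srcs), default=0)
        produced_in[dest] = cycle
        longest_chain = max(longest_chain, cycle)
    return len(program) / longest_chain if longest_chain else 0.0


if __name__ == "__main__":
    independent = [("r1", ("a",)), ("r2", ("b",)), ("r3", ("c",)), ("r4", ("d",))]
    chain = [("r1", ("a",)), ("r2", ("r1",)), ("r3", ("r2",)), ("r4", ("r3",))]
    print(estimate_ilp(independent))  # 4.0: all four can run together
    print(estimate_ilp(chain))        # 1.0: each waits for the previous one
```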

4 Superscalar Superscalar : capable of running multiple instructions at a time – Multiple execution units – Widen the slowest part of the pipeline

5 Superscalar Multi-issue : Start multiple instructions per clock – Parallel pipes

6 Superscalar Multi-issue pipeline feeding multiple execution units

7 Superscalar Issue: Dependency issues just got MUCH harder…
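To see why dependences get harder once several instructions issue together, here is a sketch of the pairwise check a multi-issue front end has to make every cycle. It assumes a hypothetical three-register instruction format and conservatively treats all three hazard types as blocking, as a design without register renaming might. A real design would also check things like whether two instructions need the same execution unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical three-register instruction: dest <- op(srcs)."""
    dest: str
    srcs: tuple[str, ...]


def hazards(first: Instr, second: Instr) -> list[str]:
    """Hazards preventing `second` from issuing in the same cycle as `first`
    (`first` comes earlier in program order)."""
    found = []
    if first.dest in second.srcs:
        found.append("RAW")  # second reads what first writes
    if first.dest == second.dest:
        found.append("WAW")  # both write the same register
    if second.dest in first.srcs:
        found.append("WAR")  # second overwrites a register first still reads
    return found


def can_issue_together(group: list[Instr]) -> bool:
    """True if no later instruction in the group conflicts with an earlier one."""
    return all(not hazards(group[i], group[j])
               for i in range(len(group))
               for j in range(i + 1, len(group)))
```

For an issue group of w instructions, every later instruction is compared against every earlier one, w(w-1)/2 pairs per cycle, so this logic grows quickly as the machine gets wider.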

8 Superscalar Pro/Con Good – The hardware solves everything: scheduling, registers, etc.; the compiler can still help matters – Binary compatibility: new hardware issues old instructions in a more efficient way Bad – Complex hardware – Limits to scaling

9 VLIW VLIW : Very Long Instruction Word – One instruction contains multiple ops

10 VLIW Instructions are VERY large – 240 bits? – Wasted space is addressed by bundles – No dependencies within a bundle

11 Who does the work? The compiler assembles the long instructions – Reorders at compile time – The compiler has more time and more information
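The compiler's side of the bargain can be sketched as a simple list scheduler: walk the operations in program order and drop each one into the earliest bundle that comes after every bundle holding an operation it depends on and that still has a free slot; leftover slots become NOPs. The `Op` encoding, the greedy policy, and the bundle width are assumptions for illustration (memory dependences are ignored); real VLIW compilers use more elaborate scheduling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical operation: dest <- name(srcs)."""
    name: str
    dest: str
    srcs: tuple[str, ...]


NOP = Op("nop", "", ())


def depends(later: Op, earlier: Op) -> bool:
    """True if `later` may not share a bundle with, or move above, `earlier`."""
    return (earlier.dest in later.srcs      # RAW: reads earlier's result
            or earlier.dest == later.dest   # WAW: same destination
            or later.dest in earlier.srcs)  # WAR: overwrites earlier's input


def schedule(ops: list[Op], width: int) -> list[list[Op]]:
    """Pack ops into bundles of `width` slots with no dependences inside a bundle."""
    bundles: list[list[Op]] = []
    placed: list[int] = []  # placed[i] = index of the bundle holding ops[i]
    for i, op in enumerate(ops):
        # Earliest legal bundle: strictly after every earlier op we depend on.
        earliest = max((placed[j] + 1 for j in range(i) if depends(op, ops[j])),
                       default=0)
        b = earliest
        while b < len(bundles) and len(bundles[b]) >= width:
            b += 1              # bundle full: try the next one
        if b == len(bundles):
            bundles.append([])  # open a new bundle at the end
        bundles[b].append(op)
        placed.append(b)
    # Pad unused slots with NOPs: the wasted space a VLIW encoding has to carry.
    return [bundle + [NOP] * (width - len(bundle)) for bundle in bundles]


if __name__ == "__main__":
    ops = [Op("add", "r1", ("r2", "r3")),
           Op("mul", "r4", ("r1", "r5")),   # needs add's result
           Op("sub", "r6", ("r7", "r8")),   # independent
           Op("ld", "r9", ("r10",))]        # independent
    for bundle in schedule(ops, 2):
        print([op.name for op in bundle])   # ['add', 'sub'] then ['mul', 'ld']
```

Here `sub` is hoisted into the first bundle, ahead of `mul`, which has to wait for `add`'s result.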

12 VLIW Uses Itanium : – EPIC : Explicitly Parallel Instruction Computing – 3-instruction bundles
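Itanium fixes the bundle at 128 bits: a 5-bit template, which tells the hardware what kind of execution unit each slot needs (and where instruction-group stops fall), followed by three 41-bit instruction slots (5 + 3 × 41 = 128). The sketch below packs and unpacks those fields with plain integer shifts. It assumes the template occupies the lowest bits with slots 0 to 2 above it, and it does not encode real instructions.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOTS = 3
SLOT_MASK = (1 << SLOT_BITS) - 1
BUNDLE_BITS = TEMPLATE_BITS + SLOTS * SLOT_BITS  # 128


def pack_bundle(template: int, slots: tuple[int, int, int]) -> int:
    """Combine a template and three raw 41-bit slot values into one 128-bit integer.

    Assumed layout: template in bits 0-4, slot 0 in bits 5-45,
    slot 1 in bits 46-86, slot 2 in bits 87-127.
    """
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != SLOTS:
        raise ValueError("a bundle holds exactly three slots")
    word = template
    for i, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError(f"slot {i} must fit in 41 bits")
        word |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return word


def unpack_bundle(word: int) -> tuple[int, tuple[int, ...]]:
    """Inverse of pack_bundle: split a 128-bit word back into its fields."""
    template = word & ((1 << TEMPLATE_BITS) - 1)
    slots = tuple((word >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
                  for i in range(SLOTS))
    return template, slots
```

Because the template is shared, each operation costs about 42.7 bits (128 / 3) of instruction memory.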

13 VLIW Pro/Con Good – Simple hardware: add new functional units with no new scheduling hardware – Better optimization in the compiler Bad – Binary compatibility : the compiler builds for one specific hardware implementation – Good compilers are HARD to write

14 ARM 15 Modern CPU:

15 Processor Parallelism Process Parallelism : Run multiple instruction streams simultaneously

16 Process vs Thread Process : a running program – Own memory space – Has at least one thread

17 Process vs Thread Thread : Instruction sequence – Own registers/stack – Shares memory with the other threads in its process
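A small sketch of the process/thread distinction using Python's standard `threading` module (an illustration, not the demo from the lecture): all threads see the same module-level `counter` because they share the process's memory, while each thread's `local_count` lives on its own stack. In CPython the global interpreter lock limits how much these threads truly run in parallel, but the sharing behaviour is the same.

```python
import threading

counter = 0                      # module-level: one copy, shared by every thread
counter_lock = threading.Lock()  # shared data needs synchronization


def worker(n_increments: int, results: list[int], slot: int) -> None:
    global counter
    local_count = 0              # local variable: private to this thread's stack
    for _ in range(n_increments):
        local_count += 1
        with counter_lock:       # without the lock, increments could be lost
            counter += 1
    results[slot] = local_count  # threads can also write into shared objects


def run(n_threads: int = 4, n_increments: int = 10_000) -> tuple[int, list[int]]:
    """Start the threads, wait for them, and return (shared total, per-thread counts)."""
    global counter
    counter = 0
    results = [0] * n_threads
    threads = [threading.Thread(target=worker, args=(n_increments, results, i))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter, results
```

Calling `run(4, 1000)` returns a shared total of 4000 and per-thread counts of `[1000, 1000, 1000, 1000]`.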

18 Threaded Code Demo…

19 Context Switching Four threads running in a 4-wide pipeline – Can't always fill all 4 issue slots – Bubbles from memory accesses, page faults, etc…

20 Context Switching Threads often have bubbles…

21 Multithreading Multithreading : Alternate threads to maximize hardware use – Coarse : run until a stall, then switch – Fine : switch every cycle – Either one needs extra hardware
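A toy cycle-count model makes the two policies concrete. It assumes a single-issue pipeline, a hypothetical per-instruction stall count (for example a cache miss), and, for coarse-grained switching only, an optional refill penalty in cycles whenever the pipeline changes threads. All of these numbers are illustrative, not measurements.

```python
def simulate(threads: list[list[int]], policy: str, switch_penalty: int = 0) -> int:
    """Cycles needed to finish every thread on a single-issue pipeline.

    threads[t][i] is the number of stall cycles thread t suffers after its
    i-th instruction (0 = no stall, e.g. 3 = a hypothetical cache miss).
    policy: "fine"   -> round-robin among ready threads every cycle
            "coarse" -> keep running one thread until it stalls, then switch,
                        paying `switch_penalty` cycles to refill the pipeline
    """
    if policy not in ("fine", "coarse"):
        raise ValueError("policy must be 'fine' or 'coarse'")
    n = len(threads)
    pc = [0] * n          # index of each thread's next instruction
    ready_at = [0] * n    # first cycle in which each thread may issue again
    last = -1             # thread that issued most recently
    cycle = 0
    while any(pc[t] < len(threads[t]) for t in range(n)):
        runnable = [t for t in range(n)
                    if pc[t] < len(threads[t]) and ready_at[t] <= cycle]
        if not runnable:
            cycle += 1    # bubble: every thread is stalled
            continue
        if policy == "coarse" and last in runnable:
            t = last      # no stall yet: stay on the same thread
        else:
            # next ready thread after `last` in round-robin order
            t = min(runnable, key=lambda x: (x - last - 1) % n)
            if policy == "coarse" and last != -1 and t != last:
                cycle += switch_penalty
        ready_at[t] = cycle + 1 + threads[t][pc[t]]
        pc[t] += 1
        last = t
        cycle += 1
    return cycle
```

For two threads that each stall 3 cycles after their second instruction, `simulate([[0, 3, 0, 0]] * 2, "fine")` takes 10 cycles and `"coarse"` takes 9, versus 14 if the threads simply ran back to back (7 cycles each); with `switch_penalty=1` the coarse-grained run grows to 11.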

22 Multithreading Superscalar A 2-instruction-wide pipeline with multithreading: – Still only one thread issues per cycle [figure panels: fine grained, coarse grained]

23 SMT SMT : Simultaneous Multithreading – AKA Hyper-Threading – Issue ops from multiple threads in one cycle – Maximize use of functional units – But need to track which registers each instruction goes with…
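The difference between one-thread-per-cycle issue and SMT shows up in how many issue slots get filled. The sketch below uses a deliberately simple model: a hypothetical table `ready[t][c]` giving how many independent instructions thread t could issue in cycle c (0 means it is stalled), with no carry-over of unissued work between cycles. It ignores exactly the bookkeeping this slide warns about (tracking which registers each instruction belongs to).

```python
def slots_filled(ready: list[list[int]], width: int, policy: str) -> list[int]:
    """Issue slots filled per cycle on a `width`-wide machine.

    ready[t][c]: instructions thread t could issue in cycle c (0 = stalled).
    policy: "single" -> only thread 0 runs (no multithreading)
            "fine"   -> one thread per cycle, round-robin, skipping stalled threads
            "smt"    -> fill slots from any thread, rotating priority each cycle
    """
    if policy not in ("single", "fine", "smt"):
        raise ValueError("unknown policy")
    n = len(ready)
    filled_per_cycle = []
    for c in range(len(ready[0])):
        order = [(c + k) % n for k in range(n)]  # round-robin priority this cycle
        if policy == "single":
            filled = min(width, ready[0][c])
        elif policy == "fine":
            # first thread in priority order that has anything ready
            filled = next((min(width, ready[t][c]) for t in order if ready[t][c] > 0), 0)
        else:
            filled = 0
            for t in order:
                filled += min(width - filled, ready[t][c])
        filled_per_cycle.append(filled)
    return filled_per_cycle


def utilization(filled_per_cycle: list[int], width: int) -> float:
    """Fraction of all issue slots that did useful work."""
    return sum(filled_per_cycle) / (width * len(filled_per_cycle))
```

With `ready = [[2, 0, 3, 1], [1, 4, 0, 2]]` on a 4-wide machine, utilization is 0.375 for a single thread, 0.6875 for fine-grained switching, and 0.8125 for SMT: switching threads fills the cycles where one thread is stalled, while SMT also fills the slots a single thread leaves empty within a cycle.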

24 SMT Challenges Resources must be duplicated or split – Split them too thinly and performance suffers… – Duplicate everything and you aren't making full use of the hardware…

25 Intel vs AMD Variations on SMT

26 Getting Faster Pipelining helps to a point – Superscalar/VLIW helps to a point – SMT helps a bit – Chips getting faster

27 Getting Faster

28 Power Density Prediction circa 2000 [chart; Core 2 marked] Adapted from UC Berkeley "The Beauty and Joy of Computing"

29 Moore's Law Related Curves Adapted from UC Berkeley "The Beauty and Joy of Computing"

30 Moore's Law Related Curves Adapted from UC Berkeley "The Beauty and Joy of Computing"

31 Going Multi-core Helps Energy Efficiency William Holt, HOT Chips 2005 Adapted from UC Berkeley "The Beauty and Joy of Computing"
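The energy argument behind this slide can be sketched with the standard dynamic-power relation for CMOS logic, P = C·V²·f (switching activity folded into C). The scenario below rests on two loud assumptions: supply voltage can be scaled in proportion to frequency, and the work splits perfectly across cores. Real chips only approximate both, so treat the numbers as illustrative, not as the data in Holt's chart.

```python
def dynamic_power(capacitance: float, voltage: float, frequency: float,
                  cores: int = 1) -> float:
    """Switching power P = C * V^2 * f per core, summed over `cores`."""
    return cores * capacitance * voltage ** 2 * frequency


def scenario(freq_scale: float, cores: int) -> tuple[float, float]:
    """Relative (power, throughput) versus one core at nominal voltage and clock.

    ASSUMPTIONS: voltage scales linearly with frequency, and the workload
    splits perfectly across cores.
    """
    voltage = freq_scale                          # assumption: V proportional to f
    power = dynamic_power(1.0, voltage, freq_scale, cores)
    throughput = cores * freq_scale               # assumption: perfect parallelism
    return power, throughput


if __name__ == "__main__":
    for label, f, n in [("1 core, nominal", 1.0, 1),
                        ("1 core, +20% clock", 1.2, 1),
                        ("2 cores, 80% clock", 0.8, 2)]:
        p, t = scenario(f, n)
        print(f"{label}: power {p:.3f}, throughput {t:.2f}, per watt {t / p:.2f}")
```

Under these assumptions, raising one core's clock by 20% buys 1.2× throughput for about 1.73× the power, while two cores at 80% clock and voltage give 1.6× throughput for about 1.02× the power (roughly 1.56 versus 0.69 units of throughput per unit of power).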

