COMP-25212 Multithreading. Coarse Grain Multithreading Minimal pipeline changes – Need to abort instructions in “shadow” of miss – Resume instruction.

1 COMP-25212 Multithreading

2 Coarse Grain Multithreading
Minimal pipeline changes
- Need to abort instructions in “shadow” of miss
- Resume instruction stream to recover
Good to compensate for infrequent, but expensive pipeline disruption
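A rough way to see why switching on a miss pays off for infrequent but expensive disruptions is to compare pipeline utilisation with and without the switch. This is only a back-of-envelope sketch: the function name and all the numbers are hypothetical, and it assumes another thread is always ready to run when a miss occurs.

```python
def utilisation(useful_cycles, misses, cost_per_miss):
    """Fraction of cycles doing useful work when each miss wastes cost_per_miss cycles."""
    return useful_cycles / (useful_cycles + misses * cost_per_miss)

# Hypothetical numbers, chosen only for illustration.
USEFUL = 1000        # cycles of useful work
MISSES = 10          # infrequent...
MISS_PENALTY = 100   # ...but expensive: cycles the pipeline would idle per miss
SWITCH_COST = 5      # cycles lost aborting instructions in the shadow of the miss

single = utilisation(USEFUL, MISSES, MISS_PENALTY)   # wait out every miss
coarse = utilisation(USEFUL, MISSES, SWITCH_COST)    # switch thread on every miss
print(f"single thread: {single:.2f}, coarse-grain MT: {coarse:.2f}")
```

Under these assumptions the benefit comes entirely from the gap between the miss penalty and the switch cost; if the two were similar, switching would gain nothing.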

3 CS25212 Fine Grain Multithreading
Learning Objectives:
- To be able to describe a fine grain multithreading implementation
- To be able to describe performance characteristics
- To be able to describe Simultaneous Multithreading implementations

4 Fine-Grain Multithreading
Switch CPU Threads with minimal (zero?) overhead
Multithreading now helps resolve fine-grain dependencies (e.g. forwarding?)

Cycle    1    2    3    4    5    6    7
Inst a   IF   ID   EX   MEM  WB
Inst M        IF   ID   EX   MEM  WB
Inst b             IF   ID   EX   MEM  WB
Inst N                  IF   ID   EX   MEM
Inst c                       IF   ID   EX
Inst P                            IF   ID
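The interleaving in this diagram (one instruction per cycle, alternating between a thread issuing a, b, c and a thread issuing M, N, P) can be reproduced with a few lines of Python. This is a minimal sketch with no stalls or hazards modelled; the function names are made up for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def interleave(thread_a, thread_b):
    """Round-robin issue order: one instruction from each thread in turn."""
    order = []
    for x, y in zip(thread_a, thread_b):
        order += [x, y]
    return order

def pipeline_rows(order, cycles):
    """Map each instruction to its stage in cycles 1..cycles ('' = not in the pipeline)."""
    rows = {}
    for start, inst in enumerate(order):      # instruction i enters IF in cycle i + 1
        row = [""] * cycles
        for s, stage in enumerate(STAGES):
            if start + s < cycles:            # drop stages beyond the last cycle shown
                row[start + s] = stage
        rows[inst] = row
    return rows

rows = pipeline_rows(interleave(["a", "b", "c"], ["M", "N", "P"]), cycles=7)
for inst, row in rows.items():
    print(f"Inst {inst}: " + " ".join(f"{stage:>3}" for stage in row))
```

Because adjacent pipeline stages always hold instructions from different threads, an instruction's nearest neighbour in the pipeline never depends on it, which is the sense in which interleaving helps with fine-grain dependencies.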

5 Fine Grain Multithreading

Cycle    1    2    3    4    5    6    7
Inst a   IF   ID   EX   MEM  WB
Inst M        IF   ID   EX   MEM  WB
Inst b             IF   ID   EX   MEM  WB
Inst N                  IF   ID   EX   MEM
Inst c                       IF   ID   EX
Inst P                            IF   ID

What about cache misses?
This has the advantage of simplicity
[Two diagram overlays for cycles 4 to 7: Inst a misses in MEM (M MISS); in the second overlay its WB is delayed and the stalled stages are shown in parentheses, (ID) and (IF)]

6 Fine Grain Multithreading
Alternatively, if 1 CPU thread stalled, issue every clock from alternate thread

Cycles 1 to 7, stages listed in order:
Inst a: IF ID EX M-MISS Miss WB
Inst M: IF ID EX MEM WB
Inst b: IF ID (ID) EX
Inst N: IF ID EX MEM
Inst P: IF ID EX
Inst Q: IF ID

Fine-grain dependency assistance?
Other comments?
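The issue policy described here (alternate between threads, but give the slot to the other thread whenever one is stalled) can be sketched as a small scheduler. The function name, the two thread labels and the choice of stalled cycles are assumptions for illustration, not taken from the slides.

```python
def issue_schedule(n_cycles, stalled, threads=("A", "B")):
    """Pick the issuing thread for each cycle (1-based).

    stalled maps a thread name to the set of cycles in which it cannot issue.
    Threads take turns; a stalled thread's turn goes to the next ready thread.
    None means no thread could issue in that cycle.
    """
    schedule = []
    nxt = 0                                    # index of the thread whose turn it is
    for cycle in range(1, n_cycles + 1):
        chosen = None
        for k in range(len(threads)):
            idx = (nxt + k) % len(threads)
            if cycle not in stalled.get(threads[idx], set()):
                chosen = threads[idx]
                nxt = (idx + 1) % len(threads)
                break
        schedule.append(chosen)
    return schedule

# Hypothetical case: thread A cannot issue in cycles 5 and 6.
print(issue_schedule(7, {"A": {5, 6}}))
```

With thread A blocked in cycles 5 and 6, thread B issues in four consecutive cycles (4 to 7 would be A's turn again at 7), so no issue slot is wasted while A waits.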

7 CPU Support for Fine Grain MT
[Block diagram: Fetch Logic, Decode Logic, Exec Logic, Mem Logic, Write Logic; Inst Cache, Data Cache; PC A, PC B; GPRs A, GPRs B; VA Mapping A, VA Mapping B; Address Translation]

8 Simultaneous Multi-Threading
“permit different threads to occupy the same pipeline stage at the same time”
This makes most sense with superscalar issue
[Block diagram: PC A, PC B, Inst Cache, Fetch Logic, Decode+Registers, Inst Issue Logic, Mem Logic, Write Logic, Data Cache]

9 Simultaneous MultiThreading
Let’s look simply at instruction issue:
[Pipeline diagram, cycles 1 to 10: Inst a, b, M, N, c, P, Q, d, e, R, in that order, each passing through IF ID EX MEM WB]

10 SMT issues
Asymmetric pipeline stall
- One part of pipeline stalls
- we want other pipeline to continue
Overtaking
- want unstalled thread to make progress
Pipeline overcrowding
- may need extra wide pipeline registers (why?)
Existing implementations (mainly) on O-o-O, register renamed architectures

11 How Far Can SMT go? From Intel Core i7 description: From Intel publication 248966-020

12 Core i7 Instruction Issue Logic
Alternate clock cycles to alternate CPU threads
Out-of-Order engine supports up to 128 uOps

13 SMT: Glimpse Into The Future?
Scout threads?
- A thread to prefetch memory
- reduce cache miss overhead
Speculative threads?
- Allow a thread to execute speculatively way past branch/jump/call/miss/etc
- Needs revised O-o-O logic
- Needs extra memory support
- See Transactional Memory

14 CPU Multithreading Summary
A cost-effective way of finding additional parallelism for the CPU pipeline
Available in x86, Itanium, Power and SPARC (Most architectures)
Present additional CPU thread as additional CPU to Operating System
Operating Systems Beware!!! (why?)
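One visible consequence of presenting each CPU thread to the operating system as an additional CPU is that software asking for the CPU count gets the number of logical CPUs. A minimal check in Python (the variable name is arbitrary):

```python
import os

# Number of logical CPUs the OS reports; None if it cannot be determined.
logical_cpus = os.cpu_count()
print("logical CPUs seen by the OS:", logical_cpus)
```

On a machine with hardware multithreading enabled this figure counts hardware threads, so it can exceed the number of physical cores. Two logical CPUs on the same core share one pipeline, which is one reason a scheduler cannot treat them as fully independent processors.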

15 But… Performance problems with multithreading? a)……………….. b)……………….. c)………………..

