CS25212 Coarse Grain Multithreading

Learning objectives:
– To be able to describe a coarse-grain multithreading implementation
– To be able to estimate the performance of this implementation
– To be able to state the important assumptions of this performance model
CPU Support for Multithreading

[Diagram: a five-stage pipeline (Fetch, Decode, Exec, Mem, Write logic) sharing the instruction cache, data cache and address translation, with per-thread copies of the program counter (PC A, PC B), the VA mapping (VA Mapping A, VA Mapping B) and the register file (GPRs A, GPRs B)]

Design issue: when to switch threads
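The diagram's key point is that only the thread-visible state is duplicated, while the caches and pipeline logic are shared. A minimal sketch of that replicated state (my own illustration, not code from the lecture; the class and field names are assumptions):

```python
# Sketch of the per-thread state a coarse-grain multithreaded CPU
# replicates: the PC, the VA mapping and the GPRs exist once per
# thread; caches and pipeline stages are shared.
from dataclasses import dataclass, field

@dataclass
class ThreadContext:
    pc: int = 0                                            # per-thread program counter
    gprs: list = field(default_factory=lambda: [0] * 32)   # per-thread register file
    va_mapping_id: int = 0                                 # per-thread address-space id

class CoarseGrainCPU:
    def __init__(self, n_threads=2):
        self.contexts = [ThreadContext(va_mapping_id=i) for i in range(n_threads)]
        self.active = 0            # index of the thread currently issuing

    def switch_thread(self):
        # The design issue from the slide is *when* to call this;
        # coarse grain answers: only on an expensive event (cache miss).
        self.active = (self.active + 1) % len(self.contexts)
        return self.active

cpu = CoarseGrainCPU()
cpu.switch_thread()        # miss detected: hand the pipeline to the other thread
```

Because only `active` changes on a switch, the hardware cost of switching can be a single cycle, which is what the later performance estimate assumes.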
Coarse-Grain Multithreading

Switch threads on an "expensive" operation:
– e.g. I-cache miss
– e.g. D-cache miss

Some are easier to handle than others!
Switch Threads on I-cache Miss

Inst a   IF  ID  EX  MEM WB
Inst b       IF  ID  EX  MEM WB
Inst c           IF(MISS)        -- switch to the other thread --
Inst X               IF  ID  EX  MEM WB
Inst Y                   IF  ID  EX  MEM WB
Inst Z                       IF  ID  EX  ...
  ...                (other thread runs while the miss is serviced)
Inst c   (restarts when the miss completes)  IF  ID  EX  MEM WB
Inst d                                           IF  ID  EX  MEM
Inst e                                               IF  ID  EX
Inst f                                                   IF  ID
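The fetch behaviour in the diagram can be sketched as a few lines of Python (my own illustration; the instruction streams and miss latency are assumed parameters, not from the slide): the missing instruction's thread is suspended, the other thread's instructions fill the miss latency, and the suspended thread resumes at the instruction that missed.

```python
# Sketch of switch-on-I-cache-miss fetch: while a miss on `miss_at`
# is serviced, instructions from the other thread enter the pipeline.
def fetch_stream(thread_a, thread_b, miss_at, miss_latency):
    """Return the sequence of instructions entering the pipeline, in order."""
    issued = []
    b = iter(thread_b)
    for inst in thread_a:
        if inst == miss_at:
            # Miss detected at fetch: switch to the other thread for the
            # duration of the miss ("--" marks cycles with nothing to issue).
            for _ in range(miss_latency):
                issued.append(next(b, "--"))
            issued.append(inst)       # miss serviced: resume thread A here
        else:
            issued.append(inst)
    return issued

print(fetch_stream(list("abcdef"), list("XYZ"), miss_at="c", miss_latency=3))
# ['a', 'b', 'X', 'Y', 'Z', 'c', 'd', 'e', 'f']
```

This matches the diagram: a and b issue normally, c misses, X/Y/Z hide the latency, then c restarts followed by d, e, f.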
Performance of Coarse Grain

Assume (conservatively):
– 1 GHz clock (1 ns clock tick!), 20 ns memory (= 20 clocks)
– 1 I-cache miss per 100 instructions
– 1 instruction per clock otherwise

Then, time to execute 100 instructions without multithreading:
– 100 + 20 = 120 clock cycles
– Instructions per clock = 100 / 120 = 0.83

With multithreading, time to execute 100 instructions:
– 100 [+ 1] clock cycles (the other thread's work hides the miss; roughly one cycle is lost to the switch)
– Instructions per clock = 100 / 101 = 0.99
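The arithmetic above, written out as a quick check under the slide's stated assumptions (1 instruction/clock, one I-cache miss per 100 instructions, 20-clock miss penalty):

```python
# Worked check of the coarse-grain performance estimate.
insts = 100
miss_penalty = 20                        # 20 ns memory at 1 GHz = 20 clocks

# Without multithreading: the one miss stalls the pipeline for 20 clocks.
cycles_single = insts + miss_penalty     # 120 cycles
ipc_single = insts / cycles_single       # = 0.83

# With coarse-grain multithreading: the other thread's instructions hide
# the 20 miss clocks; only about 1 cycle is lost to the switch itself.
cycles_mt = insts + 1                    # 101 cycles
ipc_mt = insts / cycles_mt               # = 0.99

print(round(ipc_single, 2), round(ipc_mt, 2))   # 0.83 0.99
```

Note the hidden assumption the learning objectives ask you to state: the other thread must always have useful, miss-free instructions ready to issue during the 20 miss clocks.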
Switch Threads on D-cache Miss

Inst a   IF  ID  EX  MEM(MISS) ............ WB
Inst b       IF  ID  EX  MEM  WB   ┐
Inst c           IF  ID  EX  MEM WB│
Inst d               IF  ID  EX    │  abort these
Inst e                   IF  ID    │
Inst f                       IF    ┘
Inst X   (other thread)   IF  ID  EX  ...
Inst Y                        IF  ID  ...

Performance: similar calculation (STATE ASSUMPTIONS!)

Where to restart after the memory cycle? I suggest instruction "a" – why?
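A sketch of the "similar calculation", with the assumptions stated explicitly as the slide demands. The restart cost of 6 cycles is my own assumption: restarting at instruction "a" means re-executing "a" plus the five aborted shadow instructions (b–f); the miss rate and penalty are reused from the I-cache slide.

```python
# Back-of-envelope estimate for switch-on-D-cache-miss.
# Assumed: 1 GHz clock, 20-clock miss penalty, 1 D-cache miss per
# 100 instructions, 1 instruction/clock otherwise, and a 6-cycle
# abort-and-restart overhead per miss (restart at "a", redo b..f).
def ipc(insts=100, miss_rate=0.01, restart_cost=6, miss_penalty=20,
        multithreaded=True):
    misses = insts * miss_rate
    if multithreaded:
        # The other thread hides the memory latency; we pay only
        # the overhead of aborting and re-executing the shadow.
        cycles = insts + misses * restart_cost
    else:
        cycles = insts + misses * miss_penalty
    return insts / cycles

print(round(ipc(multithreaded=False), 2), round(ipc(), 2))
```

Restarting at "a" is attractive precisely because it keeps the hardware simple: no need to save partial pipeline state for b–f, at the price of those re-executed cycles, which is what `restart_cost` models.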
Coarse Grain Multithreading

Minimal pipeline changes:
– Need to abort instructions in the "shadow" of a miss
– Resume the instruction stream to recover

Good for compensating for infrequent but expensive pipeline disruptions
But…

Performance problems with multithreading?
a) ………………..
b) ………………..
c) ………………..