Presentation on theme: "Multithreading Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike."— Presentation transcript:

1 Multithreading
Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

2 Fine-Grain Multithreading
Fine-grain multithreading performs a switch between threads EVERY cycle. What is the primary goal of such an approach?
Selection ("Best" argument):
A. Service each thread equally
B. Hide instruction latencies
C. Reduce replicated resources
D. Exploit idle resources
E. None of the above

3 Fine-Grain Multithreading
Fine-grain multithreading performs a switch between threads EVERY cycle. What is the primary drawback of such an approach?
Selection ("Best" argument):
A. Poor scalability (benefits for 8 threads exceed benefits for 64 threads)
B. Extra hardware
C. Need for a large number of threads
D. 1 and 2
E. 2 and 3
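
To make the cycle-by-cycle switching concrete, here is a minimal C++ sketch of a fine-grain scheduler; the four hardware contexts, single issue slot, and stall latencies are invented for illustration and are not from the original slides. It only shows how a stalled thread's latency is hidden because the other threads take the intervening cycles.

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Toy model of fine-grain multithreading: a different hardware thread is
    // considered every cycle (round-robin), so a thread stalled on a
    // long-latency instruction simply loses its turns while the others keep
    // the single issue slot busy. Thread count and latencies are invented.
    struct HwThread {
        int  remaining_stall = 0;  // cycles left on a long-latency instruction
        long issued = 0;           // instructions issued so far
    };

    int main() {
        std::vector<HwThread> threads(4);  // assume 4 hardware contexts
        std::size_t next = 0;              // round-robin starting point

        for (int cycle = 0; cycle < 1000; ++cycle) {
            // Outstanding latencies drain in the background every cycle.
            for (auto& t : threads)
                if (t.remaining_stall > 0) --t.remaining_stall;

            // Switch threads EVERY cycle: try each context once, starting at 'next'.
            for (std::size_t k = 0; k < threads.size(); ++k) {
                HwThread& t = threads[(next + k) % threads.size()];
                if (t.remaining_stall == 0) {
                    ++t.issued;
                    if (t.issued % 5 == 0) t.remaining_stall = 10;  // pretend cache miss
                    break;  // one issue slot per cycle in this toy model
                }
            }
            next = (next + 1) % threads.size();
        }
        for (std::size_t i = 0; i < threads.size(); ++i)
            std::printf("thread %zu issued %ld instructions\n", i, threads[i].issued);
        return 0;
    }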

4 Coarse-Grain Multithreading
Coarse-grain multithreading performs a switch between threads whenever one thread encounters a high-latency event. What is the primary goal of such an approach?
Selection ("Best" argument):
A. Service each thread equally
B. Hide instruction latencies
C. Reduce replicated resources
D. Exploit idle resources
E. None of the above

5 Coarse-Grain Multithreading
Coarse-grain multithreading performs a switch between threads whenever one thread encounters a high-latency event. What is the primary drawback of such an approach?
Selection ("Best" argument):
A. Poor scalability (benefits for 8 threads exceed benefits for 64 threads)
B. Context switch times are slow
C. Extra hardware
D. 1 and 2
E. 2 and 3
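
For contrast with the fine-grain sketch above, here is a matching C++ toy model of coarse-grain switching, again with invented thread counts, latencies, and penalties. The resident thread runs until a pretend long-latency event, then a small switch penalty models draining the pipeline; short stalls are simply not hidden.

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Toy model of coarse-grain multithreading: stay on one hardware thread
    // until it hits a long-latency event (a pretend cache miss), then pay a
    // switch penalty (pipeline drain) before another thread runs. All numbers
    // are invented for illustration.
    struct HwThread {
        int  remaining_stall = 0;
        long issued = 0;
    };

    int main() {
        std::vector<HwThread> threads(4);
        std::size_t current = 0;
        int switch_penalty = 0;  // cycles left draining/refilling the pipeline

        for (int cycle = 0; cycle < 1000; ++cycle) {
            for (auto& t : threads)
                if (t.remaining_stall > 0) --t.remaining_stall;

            if (switch_penalty > 0) { --switch_penalty; continue; }  // pipeline drain

            HwThread& t = threads[current];
            if (t.remaining_stall == 0) {
                ++t.issued;
                if (t.issued % 8 == 0) {   // long-latency event: time to switch
                    t.remaining_stall = 20;
                    current = (current + 1) % threads.size();
                    switch_penalty = 3;
                }
            }
            // Short stalls on the resident thread are NOT hidden: the cycle is wasted.
        }
        for (std::size_t i = 0; i < threads.size(); ++i)
            std::printf("thread %zu issued %ld instructions\n", i, threads[i].issued);
        return 0;
    }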

6 Context Switch
What happens on a context switch?
–Transfer of register state
–Transfer of PC
–Draining of the pipeline
Additionally:
–Warm up caches
–Warm up branch predictors
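
As a rough illustration of which state actually moves, here is a small C++ sketch; the register count and structure names are assumptions, not a real ISA. Only architectural state (registers and the PC) is saved and restored, while caches and branch predictors are not part of the context and must be re-warmed by the incoming thread.

    #include <array>
    #include <cstdint>

    // Only architectural state is transferred on a context switch.
    // Microarchitectural state (cache contents, branch predictor history)
    // is left behind and re-warmed by whoever runs next.
    struct ArchContext {
        std::array<std::uint64_t, 32> gpr{};  // general-purpose registers (assumed 32)
        std::uint64_t pc = 0;                 // program counter
    };

    void context_switch(ArchContext& save_to, const ArchContext& restore_from,
                        ArchContext& cpu_regs) {
        save_to  = cpu_regs;       // transfer register state + PC of the old thread out
        cpu_regs = restore_from;   // transfer register state + PC of the new thread in
        // Unmodeled costs: draining the pipeline before the switch, and the
        // new thread starting with cold caches and cold predictor entries.
    }

    int main() {
        ArchContext cpu_regs, saved_a, saved_b;
        cpu_regs.pc = 0x1000;                         // thread A is running
        context_switch(saved_a, saved_b, cpu_regs);   // save A, restore B
        return 0;
    }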

7 Multithreading
[Figure: issue-width diagram comparing Coarse Grain, Fine Grain, and SMT]

8 Simultaneous Multithreading
1. More functional units
2. Larger instruction queue
3. Larger reorder buffer
4. Means to differentiate between threads in the instruction queue, register rename, and reorder buffer
5. Ability to fetch from multiple programs
Selection (Required Resources):
A. 1, 2, 3, 4, 5
B. 1, 3, 5
C. 1, 4, 5
D. 4, 5
E. None of the above
Given a modern out-of-order processor with register renaming, instruction queue, reorder buffer, etc. – what is REQUIRED to perform simultaneous multithreading?
Point is – if you can just fetch from multiple streams, the processor is usually over-provisioned anyway.
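
A minimal C++ sketch of the two additions the choices point at – a per-entry thread ID (item 4) so the shared instruction queue, rename tables, and reorder buffer can keep threads apart, and per-thread fetch PCs (item 5). The structure and field names are illustrative assumptions, not a real design.

    #include <cstddef>
    #include <cstdint>
    #include <queue>
    #include <vector>

    struct InFlightInst {
        std::uint8_t  thread_id;   // (4) tag carried through IQ / rename / ROB
        std::uint64_t pc;
        // ... opcode, renamed operands, etc.
    };

    struct SmtFrontEnd {
        std::vector<std::uint64_t> per_thread_pc;     // (5) one PC per hardware thread
        std::queue<InFlightInst>   instruction_queue; // shared, entries tagged by thread
        std::size_t fetch_turn = 0;

        void fetch_one() {
            std::uint8_t tid = static_cast<std::uint8_t>(fetch_turn);
            instruction_queue.push({tid, per_thread_pc[fetch_turn]});
            per_thread_pc[fetch_turn] += 4;                        // pretend fixed-size ISA
            fetch_turn = (fetch_turn + 1) % per_thread_pc.size();  // round-robin fetch
        }
    };

    int main() {
        SmtFrontEnd fe;
        fe.per_thread_pc = {0x1000, 0x2000};   // two hardware threads
        for (int i = 0; i < 4; ++i) fe.fetch_one();
        return 0;
    }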

9 Modern OOO Processor
[Figure: pipeline block diagram – Fetch, Decode, Instruction Queue, Register Rename, 3 INT ALUs, 2 FP ALUs, Load Queue, Store Queue, L1, Reorder Buffer]
Draw just the need to fetch more instructions.

10 SMT vs. early multi-core
The argument was between a single aggressive SMT out-of-order processor and a number of simpler processors.
At the time, the advantage of the simpler processors was a higher clock rate.
The disadvantages of the simpler processors were the lack of functional units, in-order execution, smaller caches, etc.

11 SMT vs. MP

12 SMT vs. early CMP
SMT – 4-issue, 4 int ALUs, 4 FP ALUs
CMP – 2 cores, each 2-issue with 2 int ALUs and 2 FP ALUs
Say you have 4 threads.
Say you have 2 threads – one is floating-point intensive and the other is integer intensive.
Say you have 1 thread.
Point out that single-thread performance drives benchmark tests – no one buys a processor that does worse!
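
A back-of-the-envelope C++ sketch of how peak resources compare in those three scenarios, using only the hypothetical configurations on this slide; it reasons about peak capacity, not measured performance.

    #include <cstdio>

    // Hypothetical machines from the slide:
    //   SMT: one 4-issue core, 4 int ALUs, 4 FP ALUs, shared by all threads.
    //   CMP: two 2-issue cores, each with 2 int ALUs and 2 FP ALUs.
    int main() {
        // 1 thread: SMT hands it the whole 4-issue core; CMP strands one core.
        std::printf("1 thread : SMT peak issue = 4, CMP peak issue = 2 (other core idle)\n");

        // 2 threads, one integer-heavy and one FP-heavy: on SMT each can draw on
        // all 4 ALUs of its preferred type; on CMP each is capped at its core's 2.
        std::printf("2 threads: SMT peak ALUs/thread = 4, CMP peak ALUs/thread = 2\n");

        // 4 threads: both machines expose 4 issue slots per cycle in total, so
        // aggregate peak throughput matches and contention decides the winner.
        std::printf("4 threads: SMT total issue = 4, CMP total issue = 4\n");
        return 0;
    }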

13 Multi-core recently
Instruction queues were taking up 20% of core area for 4-issue; how complex would they be for 8-issue?
Simpler hardware does not mean a faster clock rate.
Tons of die space.
Larger caches weren't helping performance that much.
Why not just replicate a single advanced processor (core)?

14 SMT vs. CMP - Revised
SMT – 4-issue, 4 int ALUs, 4 FP ALUs
CMP – 2 cores, each 4-issue with 4 int ALUs and 4 FP ALUs
Say you have 4 threads.
Say you have 2 threads – one is floating-point intensive and the other is integer intensive.
Say you have 1 thread…

15 Multi-core Today
4-8 cores per chip: the "multi-core era."
Throughput scales well with the number of cores.
Each core is frequently SMT as well (for more throughput).
Great when you have 4-8 threads (most of us have a fair number at any given time).
What to do when we get 128 cores (the "many-core era")?
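
To connect this to what software sees, here is a small C++ sketch using only the standard library; on typical systems std::thread::hardware_concurrency() reports logical processors, i.e. cores times SMT threads per core, so a 4-core, 2-way-SMT chip usually reports 8. The worker behavior is a placeholder.

    #include <cstdio>
    #include <thread>
    #include <vector>

    int main() {
        // Logical processors visible to software (cores x SMT contexts per core).
        unsigned n = std::thread::hardware_concurrency();
        if (n == 0) n = 4;   // the call may return 0 if the count is unknown

        // A throughput-oriented program typically spawns about that many workers.
        std::vector<std::thread> workers;
        for (unsigned i = 0; i < n; ++i)
            workers.emplace_back([i] {
                std::printf("worker %u running on one logical processor\n", i);
            });
        for (auto& w : workers) w.join();
        return 0;
    }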

16 Multithreading Key Points
Simultaneous Multithreading
–Inexpensive addition to increase throughput for multiple threads
–Enables good throughput for multiple threads
–Does not impact single-thread performance
Single-Chip Multiprocessors
–ILP wall / memory wall / power wall – all point to multi-core
–Enables excellent throughput for multiple threads
Where do we find all these threads? The "Field of Dreams" argument: build it and they will come.

