Published by Clifton Sutton. Modified over 9 years ago.
1
Multithreaded Microprocessors and Multiprocessor SoCs
Sam Sandbote
CSE 8383 Advanced Computer Architecture
February 23, 2006
2
Topics
1. Technical Drivers
2. Simultaneous Multithreading
3. Alternative Perspectives
4. What is an SoC, Anyway?
4
Why Consider MP on Chip?
- The industry does not fundamentally change unless it is forced against a wall
  - We would prefer to scale as we always have, if we could
  - Most programmers are not skilled in the art of parallel programming
- Confluence of 3 trends has forced the industry to go MP
  1. Architectural tricks to speed up single programs have limits
     - Locality of reference (cache size)
     - ILP (superscalar issue width, window size)
  2. Building faster clocked logic is getting exponentially harder
  3. Process tech still shrinking designs… must “use” that area!
5
Topics
1. Technical Drivers
2. Simultaneous Multithreading
3. Alternative Perspectives
4. What is an SoC, Anyway?
6
Simultaneous MT
- Concept: multiplex the execution of 2 or more threads
- Each maintains its own architectural register state
  - PC, R0-Rn, SP, CC, etc.: these are maintained per-thread
- What happens when we mix two instruction streams?
  - They are guaranteed not to have any data dependencies between them
  - Even for memory addresses! Only register dependencies are considered by out-of-order machines
- Conceptually, available ILP is doubled
  - We have enough unrelated instructions from a second thread to fill in pipeline bubbles left by the first
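The bubble-filling idea can be illustrated with a toy model. The sketch below is not from the slides: it assumes a single issue slot per cycle, a rotating thread priority, and a hypothetical per-instruction stall count standing in for pipeline bubbles. The names `Thread`, `run`, and `workload` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    """One hardware context with private state (here just a PC and a ready time)."""
    name: str
    stalls: list        # stalls[i] = bubble cycles the thread waits after instruction i
    pc: int = 0         # index of the next instruction to issue
    ready_at: int = 0   # first cycle at which the thread may issue again


def run(threads):
    """Issue at most one instruction per cycle from whichever thread is ready.

    Returns (instructions_issued, cycles_until_last_issue).
    """
    cycle = issued = 0
    while any(t.pc < len(t.stalls) for t in threads):
        start = cycle % len(threads)              # rotate priority each cycle
        for t in threads[start:] + threads[:start]:
            if t.pc < len(t.stalls) and t.ready_at <= cycle:
                t.ready_at = cycle + 1 + t.stalls[t.pc]
                t.pc += 1
                issued += 1
                break                             # only one issue slot in this toy model
        cycle += 1
    return issued, cycle


# Hypothetical workload: every second instruction leaves a 2-cycle bubble.
workload = [0, 2, 0, 2]

one = run([Thread("A", list(workload))])
two = run([Thread("A", list(workload)), Thread("B", list(workload))])

for label, (n, cycles) in (("1 thread", one), ("2 threads", two)):
    print(f"{label}: {n} instructions in {cycles} cycles, IPC = {n / cycles:.2f}")
```

With this particular workload the second thread absorbs most of the first thread's bubbles, so aggregate IPC rises even though neither thread finishes sooner on its own.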
7
Multithreaded Usage Models
- Coarse: Application-Level Parallelism
  - Each context corresponds to a process under OS control
  - Make the OS believe two processors exist
  - Still hard to implement: Intel took 2 years to get the bugs out of Pentium 4 HT
- Fine: Native Multithreaded ISA Constructs
  - fork, join, quit are machine instructions
  - What happens when we fork more threads than hardware supports?
- Ultra-Fine: Well, basically same as ILP
8
What Can Be Shared, at What Cost?

Resource                 Impact on Single-Thread Performance   Notes
Fetch Bandwidth          High
Instruction Cache        Medium                                Must support hit-under-miss
Branch Predictor State   Medium
Exec Units               None                                  Small and cheap to replicate
Data Cache               Very High                             Must support hit-under-miss
9
Athlon64 Die Photo
[Figure: Athlon64 die photo]
10
Proliferation of Context Arbitration
- Sharing implies:
  - Programmer declares a QoS for a thread upon its startup
  - This QoS must be distributed
- Arbitration must exist for:
  - Fetch bandwidth
  - Dispatch and/or Issue
  - Cache and/or Branch Predictor Utilization
    - This is a very good area for research
  - External Access BW/latency
- Additional Pipeline Cycles for Arbitration Introduced
  - This is BAD!
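One way to picture QoS-driven arbitration of a shared resource such as fetch bandwidth is a weighted round-robin arbiter. The slides do not name a policy, so the sketch below is only an assumption: QoS is modeled as a hypothetical integer weight per thread, and the arbiter uses the well-known "smooth weighted round-robin" scheme. The names `FetchArbiter` and `grant` are illustrative.

```python
from collections import Counter


class FetchArbiter:
    """Grants one fetch slot per cycle, in proportion to per-thread QoS weights."""

    def __init__(self, qos):
        self.qos = dict(qos)                  # thread name -> integer weight (assumed QoS encoding)
        self.credit = {t: 0 for t in self.qos}
        self.total = sum(self.qos.values())

    def grant(self):
        # Every thread earns credit equal to its weight each cycle.
        for t, w in self.qos.items():
            self.credit[t] += w
        # The thread with the most credit wins the slot (ties go to the first declared).
        winner = max(self.credit, key=self.credit.get)
        # The winner pays the total weight, which spreads its grants out over time.
        self.credit[winner] -= self.total
        return winner


# Hypothetical QoS declaration: thread A asks for three times B's share.
arbiter = FetchArbiter({"A": 3, "B": 1})
grants = [arbiter.grant() for _ in range(8)]
share = Counter(grants)
print(grants)
print(dict(share))
```

Even this small arbiter needs per-thread state and a comparison every cycle, which hints at why arbitration can cost extra pipeline stages in real hardware.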
11
Why Simultaneous MT, Then?
- Most efficient in terms of aggregate IPC
- Consider 4 threads, each with a typical instruction mix
  - 20% loads, 10% stores
  - 20% branching
  - 50% in-CPU instructions (ADD, MOV, etc.)
- Using 4 superscalar speculative processors
  - 4 processors, each with IPC around 0.8
- Using a 4-way multithreaded processor
  - 1 (larger) processor with IPC 1.4 or better
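To see how an instruction mix like this can lead to a per-thread IPC below 1, the sketch below applies the textbook stall model: CPI = base CPI + sum of (instruction fraction × event rate × penalty). Only the mix comes from the slide. The base CPI, miss rate, misprediction rate, and penalties are hypothetical values picked for illustration so that the result lands near the slide's 0.8; they are not measurements.

```python
# Instruction mix from the slide (fractions of all instructions).
MIX = {"load": 0.20, "store": 0.10, "branch": 0.20, "alu": 0.50}

# ASSUMPTIONS (not from the slides): illustrative stall parameters.
BASE_CPI = 1.0                 # cycles per instruction with no stalls
STALL_EVENTS = {
    # kind: (probability of a stall event, penalty in cycles)
    "load": (0.05, 20),        # assumed cache miss rate and miss penalty
    "branch": (0.05, 5),       # assumed misprediction rate and flush penalty
}


def effective_cpi(mix, base_cpi, stall_events):
    """Base CPI plus the average stall cycles contributed per instruction."""
    stall = sum(mix[kind] * rate * penalty
                for kind, (rate, penalty) in stall_events.items())
    return base_cpi + stall


cpi = effective_cpi(MIX, BASE_CPI, STALL_EVENTS)
ipc = 1.0 / cpi
print(f"CPI = {cpi:.2f}, IPC = {ipc:.2f}")
```

Under these assumed numbers the stalls add 0.25 cycles per instruction, and those stall cycles are the bubbles a multithreaded core tries to fill with work from other threads.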
12
Topics
1. Technical Drivers
2. Simultaneous Multithreading
3. Alternative Perspectives
4. What is an SoC, Anyway?
13
Alternative: The Fast Context Switch
- Argument: Arbitration does not add value and detracts from performance
- Only fill in the giant bubbles: when a cache misses or when a process is swapped out
- Support register state for two or more processes
  - OS may see that 2 processes may be running at the same time
- No arbitration at all: only executing one at a time
- These are the first commercial attempts at multithreading, because verification is much easier
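A switch-on-miss core can be sketched in a few lines. The model below is an assumption-laden toy, not the design of any particular product: one context executes at a time, a `"miss"` instruction blocks its context for a hypothetical `MISS_LATENCY` cycles, and the core only switches when the active context cannot run. All names are illustrative.

```python
MISS_LATENCY = 4  # ASSUMPTION: cycles a context stays blocked after a cache miss


def run_switch_on_miss(streams, miss_latency=MISS_LATENCY):
    """Execute one context at a time, switching only when the active one blocks.

    streams: one list per context, containing "op" (1 cycle) or "miss".
    Returns (busy_cycles, total_cycles).
    """
    pcs = [0] * len(streams)        # per-context program counter
    ready_at = [0] * len(streams)   # cycle at which each context can run again
    active = 0
    cycle = busy = 0

    def runnable(i, now):
        return pcs[i] < len(streams[i]) and ready_at[i] <= now

    while any(pc < len(s) for pc, s in zip(pcs, streams)):
        if not runnable(active, cycle):
            # Giant bubble: hand the pipeline to another ready context, if any.
            candidates = [i for i in range(len(streams)) if runnable(i, cycle)]
            if candidates:
                active = candidates[0]
        if runnable(active, cycle):
            op = streams[active][pcs[active]]
            pcs[active] += 1
            busy += 1
            if op == "miss":
                ready_at[active] = cycle + 1 + miss_latency
        cycle += 1
    return busy, cycle


# Hypothetical instruction stream with one cache miss.
stream = ["op", "miss", "op", "op"]
single = run_switch_on_miss([stream])
dual = run_switch_on_miss([stream, stream])
print("1 context :", single)
print("2 contexts:", dual)
```

There is no per-cycle choice between ready threads here, which is the point of the argument: the only decision is whether the active context is blocked.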
14
Alternative: Multiprocessor-on-Chip
- Argument: Benefit of resource sharing does not outweigh cost of performance degradation on a single thread
- CTO of Intel plans to step-and-repeat a smaller, simpler core such as the one in Centrino
- Each processor will have independent L1 D$ and I$
- May or may not share very large central L2
- DRAM controller has long since been integrated
- For the foreseeable future the model will be tight SMP
  - Processors connected to DRAM controller via their old “front side bus,” which is morphed into a cache-coherent switch fabric
15
Mainstream CPU of 2008/2009 (45nm)
[Figure: block diagram with labels P0-P3 (each with I$ and D$), four L2 blocks, PCIX PHY 0, PCIX PHY 1, DDR2 CTL, and a central block labeled X]
16
Alternative: Heterogeneous MP
- Argument: Most systems can benefit from having several different types of processors
- TI wireless OMAP™ chips are necessarily heterogeneous
- Multiple tasks are very different:
  - QoS requirements
  - MIPS requirements
  - Memory bandwidth and access patterns
  - Word width (some custom hardware for deframing)
  - …and some analog sugar sprinkles, too
17
Topics
1. Technical Drivers
2. Simultaneous Multithreading
3. Alternative Perspectives
4. What is an SoC, Anyway?
18
BYOD – Bring Your Own Definition
- Embedded memory?
- Embedded processor?
- Just a “big ASIC”?
- Just another buzz-word
  - SSI MSI LSI VLSI ULSI… we’re tired, here. Let’s just call them SoCs.