1
CMP-MSI, Feb. 11th 2007
Core to Memory Interconnection Implications for Forthcoming On-Chip Multiprocessors
Carmelo Acosta (1), Francisco J. Cazorla (2), Alex Ramírez (1,2), Mateo Valero (1,2)
(1) UPC-Barcelona, (2) Barcelona Supercomputing Center
2
Overview
Introduction, Simulation Methodology, Results, Conclusions
3
Introduction
As process technology advances, deciding what to do with the available transistors becomes increasingly important. The current trend is to replicate cores:
Intel: Pentium 4, Core Duo, Core 2 Duo, Core 2 Quad
AMD: Opteron Dual-Core, Opteron Quad-Core
IBM: POWER4, POWER5
Sun Microsystems: Niagara T1, Niagara T2
4
Introduction
[Figure: POWER4 (CMP) and POWER5 (CMP+SMT) chips]
The memory subsystem (green) spreads over more than half of the chip area.
5
Introduction
Each L1 cache is connected to each L2 bank through a bus-based interconnection network.
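The slides do not say how addresses are distributed across the L2 banks. A common scheme, assumed here purely for illustration, interleaves consecutive cache lines across banks. The 64-byte line size and the `l2_bank` helper are hypothetical; the four banks match the b0 to b3 in the diagrams.

```python
LINE_BYTES = 64  # hypothetical cache-line size
NUM_BANKS = 4    # banks b0..b3, as drawn in the slides


def l2_bank(addr: int, line_bytes: int = LINE_BYTES, num_banks: int = NUM_BANKS) -> int:
    """Return the L2 bank index for an address under line interleaving (assumed scheme)."""
    line = addr // line_bytes  # which cache line the address falls in
    return line % num_banks    # consecutive lines map to consecutive banks
```

With these parameters, addresses 0x00, 0x40, 0x80, and 0xC0 land in banks b0 to b3, and 0x100 wraps back to b0. Interleaving spreads independent streams across banks, but nothing stops requests from several cores from landing in the same bank at once.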
6
Goal
Is prior SMT research directly applicable to the new CMP+SMT scenario? No: well-known SMT ideas must be revisited, starting with the instruction fetch policy.
7
ICOUNT
[Diagram: Fetch, ROB]
8
ICOUNT
[Diagram: Fetch, ROB; L2 miss; fetch stalled]
ICOUNT balances the processor's resources among the running threads. After an L2 miss, however, all resources held by the blue thread sit unused until the miss is resolved.
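ICOUNT, as it is usually described in the SMT literature, gives fetch priority to the thread with the fewest instructions in the pre-issue pipeline stages. The sketch below shows only that selection step, simplified to one thread per cycle; `ThreadState` and `icount_pick` are hypothetical names. Nothing here reclaims the entries a thread already holds, which is the weakness the slide points out for a thread waiting on an L2 miss.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThreadState:
    tid: int
    in_flight: int               # instructions in the pre-issue stages (ICOUNT's counter)
    fetch_blocked: bool = False  # e.g. an I-cache miss or a policy-imposed stall


def icount_pick(threads: List[ThreadState]) -> Optional[int]:
    """Return the tid to fetch from this cycle, or None if every thread is blocked."""
    candidates = [t for t in threads if not t.fetch_blocked]
    if not candidates:
        return None
    # Fewest in-flight instructions wins; ties go to the lower tid.
    return min(candidates, key=lambda t: (t.in_flight, t.tid)).tid
```

A thread stuck behind an L2 miss accumulates a high count and stops being chosen, but the instructions it already fetched keep occupying shared resources.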
9
FLUSH
[Diagram: Fetch, ROB; L2 miss; FLUSH triggered]
When an L2 miss triggers FLUSH, all resources held by the blue thread's pending instructions are freed.
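A minimal sketch of the FLUSH action, assuming (as in the usual description of the policy) that the instructions younger than the missing load are squashed and the thread is barred from fetching until the miss returns. The ROB is modeled as a plain list; `RobEntry`, `Core`, and the method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class RobEntry:
    tid: int  # owning thread
    seq: int  # per-thread program-order sequence number


@dataclass
class Core:
    rob: List[RobEntry] = field(default_factory=list)
    stalled: Set[int] = field(default_factory=set)  # threads barred from fetch

    def flush_on_l2_miss(self, tid: int, load_seq: int) -> int:
        """Squash thread `tid`'s instructions younger than the missing load,
        stall its fetch, and return how many ROB entries were freed."""
        before = len(self.rob)
        self.rob = [e for e in self.rob if not (e.tid == tid and e.seq > load_seq)]
        self.stalled.add(tid)
        return before - len(self.rob)

    def on_miss_resolved(self, tid: int) -> None:
        """Let the thread fetch again; it must refetch the squashed instructions."""
        self.stalled.discard(tid)
```

The design choice is a trade: the flushed thread pays to refetch its squashed instructions later, while the other threads get the freed entries immediately.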
10
FLUSH
[Diagram: Fetch, ROB; L2 miss; thread stalled]
The freed resources let the remaining threads make additional forward progress while the blue thread is stalled. Because L2 misses are detected late, FLUSH relies on L2 miss prediction.
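One simple form of L2 miss prediction, assumed here for illustration, flags a load as an L2 miss once it has been outstanding for a fixed number of cycles; the results later report 90 cycles as the best value in one four-core example. `predicted_l2_miss` is a hypothetical helper, and `threshold=None` models not predicting at all.

```python
from typing import Optional


def predicted_l2_miss(issue_cycle: int, now: int, threshold: Optional[int]) -> bool:
    """Treat a still-pending load as an L2 miss once it has waited `threshold` cycles.

    threshold=None means no prediction: FLUSH waits for the real miss signal.
    """
    if threshold is None:
        return False
    return now - issue_cycle >= threshold
```

A small threshold triggers FLUSH early but risks flushing on slow L2 hits; a large one is safer but wastes cycles before the flush.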
11
Single vs. Multi-Core
[Diagram: cores, each with private I$ and D$, connected to shared L2 banks b0 to b3]
More cores put more pressure on both the interconnection network and the shared L2 banks.
12
Single vs. Multi-Core
[Diagram: cores, each with private I$ and D$, connected to shared L2 banks b0 to b3]
The result is a less predictable L2 access latency, which is bad for FLUSH.
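A toy model, not the simulator used in the study, shows how sharing banks turns a fixed latency into a variable one: requests issued in the same cycle to different banks proceed in parallel, while requests to the same bank queue behind each other. The `base` and `service` cycle counts are hypothetical, and bus arbitration is ignored.

```python
from collections import Counter
from typing import List


def batch_latencies(banks: List[int], base: int, service: int) -> List[int]:
    """Latency of each request in a batch issued in the same cycle.

    Each request waits `service` extra cycles for every earlier request
    to the same bank (simple FIFO per bank).
    """
    seen: Counter = Counter()
    out = []
    for b in banks:
        out.append(base + service * seen[b])  # position in bank b's queue
        seen[b] += 1
    return out
```

With `base=20` and `service=8`, a lone request always takes 20 cycles, but if three cores happen to hit the same bank, the same kind of L2 hit takes 20, 28, or 36 cycles depending on arrival order.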
13
Overview
Introduction, Simulation Methodology, Results, Conclusions
14
Simulation Methodology
Trace-driven SMT simulator derived from SMTsim. Multicore configurations: C2T2, C3T2, and C4T2 (CXTY, where X = number of cores and Y = threads per core).
[Diagram: cores with private I$ and D$ sharing L2 banks b0 to b3]
[Table: core details; * = per thread]
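The CXTY naming makes the total thread count easy to derive: C2T2 runs 4 threads and C4T2 runs 8. The parser below is a hypothetical helper for scripting experiments around these names; it is not part of SMTsim.

```python
import re
from typing import Tuple

_CONFIG = re.compile(r"C(\d+)T(\d+)")


def parse_config(name: str) -> Tuple[int, int]:
    """Split a CXTY name into (cores, threads_per_core)."""
    m = _CONFIG.fullmatch(name)
    if m is None:
        raise ValueError(f"not a CXTY configuration name: {name!r}")
    return int(m.group(1)), int(m.group(2))


def total_threads(name: str) -> int:
    """Threads running concurrently on the whole chip."""
    cores, threads_per_core = parse_config(name)
    return cores * threads_per_core
```

In the same notation, the single-core, two-thread baseline in the results would read C1T2.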
15
Simulation Methodology
Instruction fetch policies: ICOUNT and FLUSH.
Workloads are classified by type:
ILP: all threads have good memory behavior.
MEM: all threads have bad memory behavior.
MIX: a mix of both types of threads.
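The workload classes follow directly from a per-thread label. The slides do not state how a thread's memory behavior is judged (for example, an L2 miss-rate cutoff), so this sketch takes that label as an input; `Workload` and `classify` are hypothetical names.

```python
from enum import Enum
from typing import Iterable


class Workload(Enum):
    ILP = "ILP"  # all threads have good memory behavior
    MEM = "MEM"  # all threads have bad memory behavior
    MIX = "MIX"  # both kinds of threads present


def classify(thread_is_memory_bound: Iterable[bool]) -> Workload:
    """Classify a workload from per-thread 'bad memory behavior' flags."""
    flags = list(thread_is_memory_bound)
    if not flags:
        raise ValueError("workload has no threads")
    if all(flags):
        return Workload.MEM
    if not any(flags):
        return Workload.ILP
    return Workload.MIX
```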
16
Overview
Introduction, Simulation Methodology, Results, Conclusions
17
Results: Single-Core (2 threads)
FLUSH yields a 22% average speedup over ICOUNT in MIX workloads; its gains come mainly from MEM and MIX workloads.
18
Results: Multi-Core (2 threads/core)
On a four-core configuration, FLUSH drops to a 9% average slowdown relative to ICOUNT: the more cores, the smaller the speedup.
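The slides do not say how per-workload results are averaged. Assuming speedup is the ratio of FLUSH to ICOUNT throughput per workload, averaged arithmetically, a +22% result corresponds to an average ratio of 1.22 and a 9% slowdown to 0.91. `relative_change` is a hypothetical helper for that arithmetic.

```python
from statistics import mean
from typing import Sequence


def relative_change(flush_ipc: Sequence[float], icount_ipc: Sequence[float]) -> float:
    """Average FLUSH-vs-ICOUNT change in percent (+ speedup, - slowdown).

    Assumes paired per-workload throughput values and an arithmetic mean of ratios.
    """
    if len(flush_ipc) != len(icount_ipc) or not flush_ipc:
        raise ValueError("need equal-length, non-empty result lists")
    ratios = [f / i for f, i in zip(flush_ipc, icount_ipc)]
    return (mean(ratios) - 1.0) * 100.0
```

Other averaging choices (harmonic mean, weighted speedup) can shift the reported percentages, so the mapping above holds only under the stated assumption.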
19
Results: L2 Hit Latency on Multi-Core
[Plot: L2 hit latency (cycles)]
More cores mean higher L2 hit latency and greater dispersion.
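To see why dispersion matters to FLUSH, the sketch below summarizes a set of L2 hit latencies and reports the fraction of those hits that a fixed-threshold miss predictor would wrongly flag as misses (and therefore flush). `hit_latency_summary` is a hypothetical helper; any latencies you feed it come from your own simulation, not from the plot.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def hit_latency_summary(hit_latencies: Sequence[int], threshold: int) -> Dict[str, float]:
    """Mean, population std. dev., and false-miss rate for a threshold predictor."""
    if not hit_latencies:
        raise ValueError("no samples")
    # Hits that wait at least `threshold` cycles would be mispredicted as misses.
    false_miss = sum(1 for lat in hit_latencies if lat >= threshold)
    return {
        "mean": mean(hit_latencies),
        "stdev": pstdev(hit_latencies),
        "false_miss_rate": false_miss / len(hit_latencies),
    }
```

With a tight latency distribution every hit stays under the threshold; as the spread grows, a growing share of hits crosses it and triggers unnecessary flushes.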
20
Results: L2 Miss Prediction
In this four-core example, the best choice is to predict an L2 miss after 90 cycles.
21
Results: L2 Miss Prediction
But in this other four-core example, the best choice is not to predict L2 misses at all.
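Because the best threshold differs between the two four-core examples, one way to choose it, assumed here for illustration, is an offline sweep: simulate each candidate threshold, including no prediction, and keep the one with the highest throughput. `best_threshold` is a hypothetical helper; `None` stands for "do not predict", and the throughput values are inputs you supply.

```python
from typing import Dict, Optional


def best_threshold(throughput_by_threshold: Dict[Optional[int], float]) -> Optional[int]:
    """Return the threshold in cycles (or None for no prediction) with the highest throughput."""
    if not throughput_by_threshold:
        raise ValueError("empty sweep")
    return max(throughput_by_threshold, key=throughput_by_threshold.__getitem__)
```

A static sweep like this fixes one threshold per configuration; the two examples suggest that a single fixed value may not suit every workload on the same chip.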
22
Overview
Introduction, Simulation Methodology, Results, Conclusions
23
Conclusions
Future high-degree CMPs open challenging new research topics in CMP+SMT cooperation. The characteristics of the CMP's outermost cache level and interconnect can strongly affect intra-core SMT performance. For example, FLUSH relies on a predictable L2 hit latency, which is heavily affected in a CMP+SMT scenario: FLUSH goes from a 22% average speedup to a 9% average slowdown when moving from a single-core to a quad-core configuration.
24
Thank you
Questions?