Core to Memory Interconnection Implications for Forthcoming On-Chip Multiprocessors
CMP-MSI, Feb. 11th, 2007
Carmelo Acosta (1), Francisco J. Cazorla (2), Alex Ramírez (1,2), Mateo Valero (1,2)
(1) UPC-Barcelona, (2) Barcelona Supercomputing Center

Overview
- Introduction
- Simulation Methodology
- Results
- Conclusions

Introduction
- As process technology advances, deciding what to do with the available transistors becomes more important.
- The current trend is to replicate cores.
  - Intel: Pentium4, Core Duo, Core 2 Duo, Core 2 Quad
  - AMD: Opteron Dual-Core, Opteron Quad-Core
  - IBM: POWER4, POWER5
  - Sun Microsystems: Niagara T1, Niagara T2

Introduction
[Figure: Power4 (CMP) and Power5 (CMP+SMT), memory subsystem shown in green]
- The memory subsystem (green) spreads over more than half the chip area.

Introduction
- Each L1 is connected to each L2 bank with a bus-based interconnection network.

Goal
- Is prior research in the SMT field directly applicable to the new CMP+SMT scenario?
- No: we have to revisit well-known SMT ideas.
  - Instruction Fetch Policy

ICOUNT
[Figure: Fetch and ROB under ICOUNT]

ICOUNT
[Figure: Fetch and ROB under ICOUNT, with an L2 miss and fetch stalled]
- The processor's resources are balanced between the running threads.
- All resources devoted to the blue thread remain unused until the L2 miss is resolved.
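The ICOUNT selection step can be illustrated with a minimal sketch. It assumes the usual formulation of ICOUNT, in which fetch priority goes to the thread with the fewest instructions in the pre-issue stages; the function name `icount_select`, the dictionary layout, and the tie-break on thread id are hypothetical and are not taken from the slides or from SMTsim.

```python
def icount_select(pre_issue_counts, stalled=frozenset()):
    """Pick the thread to fetch from under an ICOUNT-style policy.

    pre_issue_counts maps a thread id to the number of instructions that
    thread currently holds in the pre-issue stages (hypothetical layout).
    Threads listed in `stalled` are skipped. Ties go to the lower id.
    Returns None when no thread is eligible.
    """
    candidates = [t for t in pre_issue_counts if t not in stalled]
    if not candidates:
        return None
    return min(candidates, key=lambda t: (pre_issue_counts[t], t))


# A thread blocked on an L2 miss keeps its entries, so its count stays
# high and the other thread keeps winning fetch priority.
print(icount_select({0: 12, 1: 5}))  # 1
```

The sketch shows why a thread waiting on an L2 miss holds on to resources under ICOUNT: nothing removes its entries, it simply stops being chosen.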

FLUSH
[Figure: Fetch and ROB under FLUSH, with an L2 miss and FLUSH triggered]
- All resources devoted to the pending instructions of the blue thread are freed.

FLUSH
[Figure: Fetch and ROB under FLUSH, with an L2 miss and the thread stalled]
- Freed resources allow additional forward progress.
- L2 misses are detected late, which motivates L2 miss prediction.
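A simplified sketch of the FLUSH reaction follows, assuming the L2 miss is predicted with a latency threshold: once a load has been outstanding for at least `threshold_cycles`, it is treated as an L2 miss. The class `SmtThread`, both function names, and the choice to free every in-flight instruction of the thread are illustrative assumptions, not the simulator's implementation.

```python
from dataclasses import dataclass


@dataclass
class SmtThread:
    tid: int
    in_flight: int = 0          # instructions currently holding resources
    fetch_stalled: bool = False


def flush_if_predicted_miss(thread, load_age_cycles, threshold_cycles):
    """Flush the thread once its oldest pending load looks like an L2 miss.

    Assumption: a load outstanding for at least threshold_cycles is
    predicted to miss in L2. Returns the number of freed entries.
    """
    if thread.fetch_stalled or load_age_cycles < threshold_cycles:
        return 0
    freed = thread.in_flight
    thread.in_flight = 0         # resources go back to the other threads
    thread.fetch_stalled = True  # no fetch until the miss is resolved
    return freed


def resolve_miss(thread):
    """Re-enable fetch when the missing data arrives."""
    thread.fetch_stalled = False


blue = SmtThread(tid=0, in_flight=40)
print(flush_if_predicted_miss(blue, load_age_cycles=50, threshold_cycles=90))  # 0
print(flush_if_predicted_miss(blue, load_age_cycles=90, threshold_cycles=90))  # 40
```

A lower threshold frees resources sooner but risks flushing on slow L2 hits; a higher one detects true misses later.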

Single vs Multi Core
[Figure: a single core and a multicore, each core with its own I$ and D$, connected to L2 banks b0 to b3]
More pressure on both:
- Interconnection network
- Shared L2 banks

Single vs Multi Core
[Figure: a single core and a multicore, each core with its own I$ and D$, connected to L2 banks b0 to b3]
- More unpredictable L2 access latency: bad for FLUSH.

Simulation Methodology
- Trace-driven SMT simulator derived from SMTsim.
- C2T2, C3T2 and C4T2 multicore configurations (CXTY, where X = number of cores and Y = number of threads per core).
[Figure: cores, each with its own I$ and D$, connected to L2 banks b0 to b3]
[Table: Core Details (* per thread)]
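The CXTY naming can be decoded mechanically. The helper below is only a sketch of that convention; the name `parse_config` and the returned tuple layout are hypothetical.

```python
import re


def parse_config(name):
    """Decode a CXTY label into (cores, threads_per_core, total_threads)."""
    match = re.fullmatch(r"C(\d+)T(\d+)", name)
    if match is None:
        raise ValueError(f"not a CXTY configuration name: {name!r}")
    cores, threads_per_core = int(match.group(1)), int(match.group(2))
    return cores, threads_per_core, cores * threads_per_core


for label in ("C2T2", "C3T2", "C4T2"):
    print(label, parse_config(label))
# C2T2 (2, 2, 4)
# C3T2 (3, 2, 6)
# C4T2 (4, 2, 8)
```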

Simulation Methodology
- Instruction Fetch Policies:
  - ICOUNT
  - FLUSH
- Workloads classified by type:
  - ILP: all threads have good memory behavior.
  - MEM: all threads have bad memory behavior.
  - MIX: mixes both types of threads.

Results: Single-Core (2 threads)
- FLUSH yields a 22% average speedup over ICOUNT in MIX workloads.
- The gains come mainly on MEM/MIX workloads.

Results: Multi-Core (2 threads/core)
- FLUSH drops to a 9% average slowdown relative to ICOUNT in a four-core configuration.
- More cores, lower speedup.

Results: L2 Hit Latency on Multi-Core
[Figure: L2 hit latency (cycles)]
- More cores: higher latency and more dispersion.

Results: L2 miss prediction
- In this four-core example, the best choice is to predict an L2 miss after 90 cycles.

Results: L2 miss prediction
- But in this other four-core example, the best choice is not to predict L2 misses at all.
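One way to see why a single threshold cannot suit every case is to count how many L2 hits a given threshold would wrongly treat as misses. The sketch below assumes a threshold-based predictor; the function name `false_flush_rate` and all latency samples are made up for illustration and are not measurements from the study.

```python
def false_flush_rate(hit_latencies, threshold_cycles):
    """Fraction of L2 hits that a latency threshold would flag as misses."""
    if not hit_latencies:
        return 0.0
    flagged = sum(1 for lat in hit_latencies if lat >= threshold_cycles)
    return flagged / len(hit_latencies)


# Hypothetical L2 hit latencies in cycles, not data from the slides.
narrow = [20, 22, 21, 23]    # tightly clustered hit latency
wide = [20, 45, 95, 120]     # dispersed hit latency under contention

print(false_flush_rate(narrow, 90))  # 0.0
print(false_flush_rate(wide, 90))    # 0.5
```

With dispersed hit latencies, a threshold that works for one workload flushes on real hits in another, which is consistent with the two examples above calling for different choices.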

Conclusions
- Future high-degree CMPs open challenging new research topics in CMP+SMT cooperation.
- The characteristics of the CMP outer cache level and interconnection may heavily affect SMT intra-core performance.
- For example, FLUSH relies on a predictable L2 hit latency, which is heavily affected in a CMP+SMT scenario.
- FLUSH drops from a 22% average speedup to a 9% average slowdown when moving from a single-core to a quad-core configuration.

Thank you
Questions?
