
1 Core to Memory Interconnection Implications for Forthcoming On-Chip Multiprocessors
CMP-MSI, Feb. 11th, 2007
Carmelo Acosta (1), Francisco J. Cazorla (2), Alex Ramírez (1,2), Mateo Valero (1,2)
(1) UPC-Barcelona; (2) Barcelona Supercomputing Center

2 Overview
• Introduction
• Simulation Methodology
• Results
• Conclusions

3 Introduction
• As process technology advances, deciding what to do with the available transistors becomes more important.
• The current trend is to replicate cores:
  • Intel: Pentium 4, Core Duo, Core 2 Duo, Core 2 Quad
  • AMD: Opteron Dual-Core, Opteron Quad-Core
  • IBM: POWER4, POWER5
  • Sun Microsystems: Niagara T1, Niagara T2

4 Introduction
[Figure: POWER4 (CMP) and POWER5 (CMP+SMT)]
• The memory subsystem (green) spreads over more than half the chip area.

5 Introduction
• Each L1 is connected to each L2 bank with a bus-based interconnection network.

6 Goal
• Is prior research in the SMT field directly applicable to the new CMP+SMT scenario?
• No: we have to revisit well-known SMT ideas.
  • Instruction fetch policy

7 ICOUNT
[Figure: Fetch, ROB]

8 ICOUNT
[Figure: Fetch, ROB; L2 miss, fetch stalled]
• The processor's resources are balanced between the running threads.
• All resources devoted to the blue thread remain unused until the L2 miss is resolved.
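
As an illustration of the policy on these two slides, here is a minimal sketch of an ICOUNT-style selection step, based on the usual description of ICOUNT (fetch priority goes to the thread with the fewest instructions in the pre-issue stages). It is not the simulator used in this work; `ThreadState`, its fields, and `icount_pick` are hypothetical names introduced only for the example.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    tid: int                      # hardware thread id (hypothetical field)
    pre_issue_count: int          # instructions in decode/rename/issue queues
    fetch_stalled: bool = False   # True while the front end cannot fetch for it


def icount_pick(threads):
    """Return the tid to fetch from this cycle, or None if no thread can fetch."""
    candidates = [t for t in threads if not t.fetch_stalled]
    if not candidates:
        return None
    # Fewest pre-issue instructions wins; ties go to the lower tid (assumption).
    return min(candidates, key=lambda t: (t.pre_issue_count, t.tid)).tid


threads = [ThreadState(tid=0, pre_issue_count=12),
           ThreadState(tid=1, pre_issue_count=5)]
print(icount_pick(threads))  # thread 1 holds fewer pre-issue instructions
```

Note that the sketch only decides who fetches next. It does nothing about the entries a thread already holds, which is the weakness the slide points out for a thread waiting on an L2 miss.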

9 FLUSH
[Figure: Fetch, ROB; L2 miss, FLUSH triggered]
• All resources devoted to the pending instructions of the blue thread are freed.

10 FLUSH
[Figure: Fetch, ROB; L2 miss, thread stalled]
• Freed resources allow additional forward progress.
• L2 miss late detection
• L2 miss prediction
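
The FLUSH behaviour shown on these two slides can be sketched roughly as follows. This is a toy model under stated assumptions, not the authors' implementation: the ROB is a flat list, the squash is assumed to cover only instructions younger than the missing load, and `Entry`, `Core`, `flush`, and `miss_resolved` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    tid: int   # owning thread
    seq: int   # per-thread program order (larger means younger)


@dataclass
class Core:
    rob: list = field(default_factory=list)     # shared ROB entries
    stalled: set = field(default_factory=set)   # threads with fetch stalled

    def flush(self, tid, load_seq):
        """Squash entries of `tid` younger than the missing load, stall its fetch.

        Returns the number of ROB entries freed for the other threads.
        """
        before = len(self.rob)
        self.rob = [e for e in self.rob
                    if not (e.tid == tid and e.seq > load_seq)]
        self.stalled.add(tid)
        return before - len(self.rob)

    def miss_resolved(self, tid):
        """Let the thread fetch again once its L2 miss has been serviced."""
        self.stalled.discard(tid)


core = Core(rob=[Entry(0, 1), Entry(1, 1), Entry(0, 2), Entry(0, 3), Entry(1, 2)])
freed = core.flush(tid=0, load_seq=1)   # thread 0's load (seq 1) misses in L2
print(freed, sorted(core.stalled))
core.miss_resolved(0)
```

The earlier the miss is detected or predicted, the sooner the entries are released, which is why the slide pairs late detection with prediction.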

11 Single vs Multi Core
[Figure: a single core (I$, D$) and several cores (I$, D$ each), connected to L2 banks b0 to b3]
• More pressure on both:
  • Interconnection network
  • Shared L2 banks

12 Single vs Multi Core
[Figure: a single core (I$, D$) and several cores (I$, D$ each), connected to L2 banks b0 to b3]
• More unpredictable L2 access latency: bad for FLUSH

13 Overview
• Introduction
• Simulation Methodology
• Results
• Conclusions

14 Simulation Methodology
• Trace-driven SMT simulator derived from SMTsim.
• C2T2, C3T2, C4T2 multicore configurations (CXTY, where X = number of cores and Y = number of threads per core).
[Figure: cores (I$, D$ each) connected to L2 banks b0 to b3]
[Table: Core details (* per thread)]
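
The CXTY naming convention is easy to decode mechanically. A minimal sketch follows; `parse_config` is a hypothetical helper, not part of SMTsim.

```python
import re


def parse_config(name):
    """Decode a CXTY label into (cores, threads_per_core, total_threads)."""
    match = re.fullmatch(r"C(\d+)T(\d+)", name)
    if match is None:
        raise ValueError(f"not a CXTY configuration name: {name!r}")
    cores, threads_per_core = int(match.group(1)), int(match.group(2))
    return cores, threads_per_core, cores * threads_per_core


for name in ("C2T2", "C3T2", "C4T2"):
    print(name, parse_config(name))
```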

15 Simulation Methodology
• Instruction fetch policies:
  • ICOUNT
  • FLUSH
• Workloads classified by type:
  • ILP: all threads have good memory behavior.
  • MEM: all threads have bad memory behavior.
  • MIX: mixes both types of threads.
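
The workload taxonomy above amounts to a small rule over per-thread labels. The sketch below assumes each thread has already been tagged "ILP" (good memory behavior) or "MEM" (bad memory behavior). How a thread earns its tag is not stated on the slide, so that step is left out, and `classify_workload` is a hypothetical name.

```python
def classify_workload(thread_kinds):
    """Classify a workload from per-thread tags ("ILP" or "MEM")."""
    kinds = set(thread_kinds)
    if not kinds or not kinds <= {"ILP", "MEM"}:
        raise ValueError(f"expected only 'ILP'/'MEM' tags, got {thread_kinds!r}")
    if kinds == {"ILP"}:
        return "ILP"   # every thread has good memory behavior
    if kinds == {"MEM"}:
        return "MEM"   # every thread has bad memory behavior
    return "MIX"       # both kinds are present


print(classify_workload(["ILP", "MEM"]))
```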

16 Overview
• Introduction
• Simulation Methodology
• Results
• Conclusions

17 Results: Single-Core (2 threads)
• FLUSH yields 22% average speedup over ICOUNT, in MIX workloads.
• Mainly on MEM/MIX workloads

18 Results: Multi-Core (2 threads/core)
• FLUSH drops to a 9% average slowdown relative to ICOUNT in a four-core configuration.
• More cores, less speedup.

19 Results: L2 Hit Latency on Multi-Core
[Figure: L2 hit latency (cycles)]
• More cores: higher latency and higher dispersion.
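
Why adding cores raises both the mean and the spread of L2 hit latency can be seen in a toy queueing sketch. It assumes a single L2 bank that serves one request at a time, in arrival order, with a fixed service time. The arrival times and the 10-cycle service time are made-up illustration values, not measurements from this study, and the model ignores the bus itself.

```python
from statistics import mean, pstdev


def bank_hit_latencies(arrivals, service_cycles):
    """Per-request latency (queueing + service) at one FIFO L2 bank."""
    latencies = []
    bank_free_at = 0
    for arrival in sorted(arrivals):
        start = max(arrival, bank_free_at)      # wait if the bank is busy
        bank_free_at = start + service_cycles
        latencies.append(bank_free_at - arrival)
    return latencies


# One core: requests are spaced out, so none of them ever queues.
one_core = bank_hit_latencies([0, 20, 40], service_cycles=10)
# Several cores: the same bank now sees overlapping requests.
four_cores = bank_hit_latencies([0, 0, 5, 20, 22, 40], service_cycles=10)

print(mean(one_core), pstdev(one_core))
print(mean(four_cores), pstdev(four_cores))
```

In the contended case some hits complete in the bare service time while others wait behind requests from other cores, so a hit can no longer be recognised by a single fixed latency.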

20 Results: L2 miss prediction
• In this four-core example, the best choice is to predict an L2 miss after 90 cycles.

21 Results: L2 miss prediction
• But in this other four-core example, the best choice is not to predict L2 misses at all.
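
One way to read the two examples above is as a threshold predictor: a load that has been outstanding for at least N cycles is assumed to be an L2 miss and triggers FLUSH. The sketch below is written under that assumption. `predicts_l2_miss` and `count_false_flushes` are hypothetical names, the latency lists are invented for illustration, and `threshold=None` stands for "never predict".

```python
def predicts_l2_miss(cycles_outstanding, threshold):
    """Predict an L2 miss once a load has waited `threshold` cycles or more."""
    if threshold is None:          # prediction disabled: wait for real detection
        return False
    return cycles_outstanding >= threshold


def count_false_flushes(l2_hit_latencies, threshold):
    """Count L2 hits that would be mispredicted as misses, causing needless flushes."""
    return sum(1 for latency in l2_hit_latencies
               if predicts_l2_miss(latency, threshold))


tight_hits = [20, 22, 25, 24]        # made-up, low-dispersion hit latencies
dispersed_hits = [20, 45, 95, 130]   # made-up, high-dispersion hit latencies

print(count_false_flushes(tight_hits, 90))
print(count_false_flushes(dispersed_hits, 90))
print(count_false_flushes(dispersed_hits, None))
```

With tightly clustered hit latencies a fixed threshold separates hits from misses cleanly. Once hit latencies spread out, the same threshold flushes threads whose loads were about to hit, which is consistent with one example favouring 90 cycles and the other favouring no prediction.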

22 Overview
• Introduction
• Simulation Methodology
• Results
• Conclusions

23 Conclusions
• Future high-degree CMPs open challenging new research topics in CMP+SMT cooperation.
• The characteristics of the CMP's outer cache level and interconnection may heavily affect SMT intra-core performance.
• For example, FLUSH relies on a predictable L2 hit latency, which is heavily affected in a CMP+SMT scenario.
• FLUSH drops from a 22% average speedup to a 9% average slowdown when moving from a single-core to a quad-core configuration.

24 Thank you
Questions?

