Core-Selectability in Chip-Multiprocessors. Hashem H. Najaf-abadi, Niket K. Choudhary, Eric Rotenberg
Dividing the Design: A definition. Processing Cores; All levels of cache; Interconnect; Ports to Memory and IO.
What this Talk is About: How to improve the performance of a CMP by improving the processing, that is, by enabling exploitation of the full potential of the interconnection. The interconnect is not fully utilized by all workloads; if it is, there's nothing to gain here.
The Provisioning Factor: Balance in provisioned resources. Cores need ports to the interconnect. If the same interconnect is enough for a quad-core, then it was over-provisioned for a dual-core.
The Provisioning Factor: Balance in provisioned resources. Consider some technique that boosts general performance: if the design is well provisioned with the same interconnect, then it must have been over-provisioned in the baseline.
The Underutilization Factor: The interconnect is not fully utilized by all applications. Workloads that depend the most on the interconnect have a louder say in what constitutes a well-provisioned design.
The One-size-fits-all Factor: A single solution has limited performance. Trade-offs: RISC v. CISC; wide v. narrow issuing; deep v. shallow pipelining; large v. small issue queue. Changing these trade-offs will improve performance for some workloads and degrade it for others. He's not much for a conversation. But if he was, it would be a conversation about saving you execution time.
The Shrinking Factor: Progressively less die area for the cores, hence a better return on increasing the interconnection resources.
The Shrinking Factor: Progressively less die area for the cores. [Figure: chart with a percentage axis (10% to 100%) over processors: Intel 8086, Intel 8088, Intel 80286, Intel 486DX, Intel Pentium, Intel Pentium III, Intel Pentium IV, Intel Core Duo, IBM Power3, IBM Power4, IBM Power5, IBM Power6, Niagara-1, Niagara-2.]
The Diversity Factor: Can provide diversity in the core designs. [Figure: Program 1 and Program 2 on a Single Core Design: Optimized for all workloads.]
The Diversity Factor: Can provide diversity in the core designs. [Figure: Code 1 and Code 2 on Heterogeneous Cores: Optimized for workload.]
Core-Selectability. [Figure: Program 1 and Program 2 under Core-Selectability: Optimized for workload.]
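The idea of matching each program to the core that suits it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the core names, program names, and execution times below are hypothetical placeholders, not results from the talk, and `select_core` is an invented helper rather than any mechanism described on the slides.

```python
# Hypothetical normalized execution times (lower is better).
# These numbers are made up for illustration; they are NOT measured data.
EXEC_TIME = {
    "program_1": {"core_A": 0.80, "core_B": 1.10},
    "program_2": {"core_A": 1.20, "core_B": 0.70},
}


def select_core(program, exec_time=EXEC_TIME):
    """Return the name of the core with the lowest execution time for `program`."""
    times = exec_time[program]
    return min(times, key=times.get)


def total_time_fixed(core, exec_time=EXEC_TIME):
    """Total execution time when every program runs on the same fixed core."""
    return sum(times[core] for times in exec_time.values())


def total_time_selectable(exec_time=EXEC_TIME):
    """Total execution time when each program runs on its best core."""
    return sum(times[select_core(p, exec_time)] for p, times in exec_time.items())
```

With these placeholder numbers, `select_core("program_1")` picks `core_A` and `select_core("program_2")` picks `core_B`, so the selectable total is lower than the total for either fixed core. Real gains depend entirely on how differently the workloads respond to the core designs.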
Recap: Core-Selectability can reduce verification effort by splitting up the workload space; can improve performance without increasing power density; results in a homogeneous design. Factors: Provisioning Factor, One-size-fits-all Factor, Shrinking Factor, Underutilization Factor, Diversity Factor. Core-Selectability; Port Sharing.
Core-Selectability: Remains homogeneous at a high level. [Figure: CMP.]
Empirical Evaluation: Based on FabScalar, a library of synthesized implementations of different configurations of the microarchitectural units of a contemporary superscalar processor.
The selection of cores

                 Core-U   Core-A   Core-B
FETCH STAGES     4        3        5
DECODE STAGES    1        1        1
RETIRE STAGES    2        2        2
ISSUE WIDTH      3        2        5
ROB SIZE
IWINDOW SIZE
Clock period     .6ns

[Figure: normalized exec. time]
On Individual Benchmarks. [Figure: normalized execution time.]
The Effect of Selectability. [Figure: normalized exec. time.]
Under Different Task Arrival Patterns. [Figure: Average task turnaround time for (a) normal traffic, and (b) bursty traffic.]
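As a rough illustration of the metric on this slide, average task turnaround time (finish time minus arrival time) can be computed for a given arrival pattern as below. This is a minimal sketch under assumptions that do not come from the talk: tasks are dispatched first-come-first-served to whichever core frees up first, the arrival lists are sorted, and the traffic values are hypothetical. Evenly spaced arrivals stand in here for "normal" traffic, which the slide does not define.

```python
import heapq


def average_turnaround(arrivals, service_times, n_cores=1):
    """Mean (finish - arrival) under FIFO dispatch to the earliest-free core.

    `arrivals` must be sorted in non-decreasing order.
    """
    free_at = [0.0] * n_cores  # time at which each core becomes free
    heapq.heapify(free_at)
    total = 0.0
    for arrival, service in zip(arrivals, service_times):
        # A task starts when it has arrived AND the earliest-free core is idle.
        start = max(arrival, heapq.heappop(free_at))
        finish = start + service
        heapq.heappush(free_at, finish)
        total += finish - arrival
    return total / len(arrivals)


# Hypothetical traffic: four tasks, each needing 2 time units of service.
service = [2.0, 2.0, 2.0, 2.0]
evenly_spaced = [0.0, 3.0, 6.0, 9.0]  # stand-in for "normal" traffic
bursty = [0.0, 0.0, 0.0, 9.0]         # three tasks arrive at once
```

On a single core, the evenly spaced pattern never queues, while the burst forces the second and third tasks to wait, so the bursty average is higher for the same total work. Passing `n_cores=2` shows how extra capacity absorbs part of the burst.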
Overhead of Reconfigurability

Issue-Q size   Wakeup Delay   Select Delay   Wake & Select Delay   Reconfig. Delay
               ns             0.54ns         1.09ns                1.55ns
               ns             0.59ns         1.38ns                1.89ns
               ns             0.65ns         1.62ns                2.10ns
               ns             0.76ns         2.00ns                2.30ns
Implementation of Port Sharing. [Figure: Core A and Core B connected to the L1 Data Cache through core-selection; labels: extra switching, extra wire (100fF), 26ps added propagation delay.]
Overhead of Reconfigurability: With reconfigurability, change is implemented within a core, with complex coupling between pipeline stages. With Core-Selectability, change is implemented at the core level, with less complex coupling between core and interconnect.
Thank you. It's as if he knows you like to save execution time.