CMP Design Space Exploration Subject to Physical Constraints Yingmin Li, Benjamin Lee, David Brooks, Zhigang Hu, Kevin Skadron HPCA’06 01/27/2010.



Issues
– Power and thermal issues are critical to architectural design
– Design space exploration under physical constraints: core count, pipeline depth, superscalar width, L2 cache, and voltage and frequency, under area and thermal constraints
– Prior work focuses exclusively on performance or on single-core designs

Contributions
– New observations on CMP design under physical constraints
– An experimental methodology that greatly reduces the cost of design space exploration

Approach
There are many design parameters to optimize and co-optimize. The paper uses several methods:
– Modeling and approximation: performance, power, and area scaling; temperature
– Decoupled core and interconnect/cache simulations; the simulation infrastructure is modular
– SimPoint for representative simulation points

Approach
Modeling
– Formulas model how power, performance, and area scale with pipeline width and depth
– Temperature is modeled at the granularity of a core
Decoupled simulation
– Use IBM’s Turandot/PowerTimer to generate L2 cache-access traces (a one-time cost)
– Feed the traces to Zauber, a cache simulator
– Interpolation
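The interpolation step listed above can be pictured with a minimal sketch. The slide does not say which quantity is interpolated or how, so piecewise-linear interpolation over L2 size is an assumption here, and the function name `interpolate` and the sample numbers are hypothetical.

```python
from bisect import bisect_right

def interpolate(samples, x):
    """Piecewise-linear estimate at x from (x, y) samples sorted by x.

    Values outside the sampled range are clamped to the nearest sample.
    """
    xs = [p[0] for p in samples]
    ys = [p[1] for p in samples]
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)          # first sample strictly to the right of x
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Hypothetical data: a metric measured at three simulated L2 sizes (MB).
simulated = [(2, 0.10), (4, 0.06), (8, 0.04)]

# Estimate the metric at an L2 size that was not simulated.
estimate = interpolate(simulated, 6)
```

The point of the sketch is only that a small number of expensive simulations can stand in for a denser grid of design points.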


Approaches
DVFS
Workloads
– SPEC 2000
– CPU-bound and memory-bound
Constraints
– 200 + LR + MEMORY (area + thermal + CPU/memory)
Performance and power/performance efficiency
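To see how DVFS can interact with a thermal limit, here is a minimal sketch under textbook assumptions that are not taken from the slides: dynamic power scales as V²f, voltage scales linearly with frequency, leakage is ignored, and steady-state temperature is ambient plus thermal resistance times power. All function names and numbers are hypothetical.

```python
def dynamic_power(p_nominal, v_scale, f_scale):
    """Dynamic power after scaling voltage and frequency (P ~ V^2 * f)."""
    return p_nominal * v_scale ** 2 * f_scale

def steady_temp(power, r_th, t_amb):
    """Steady-state temperature for a lumped thermal resistance r_th (K/W)."""
    return t_amb + r_th * power

def highest_feasible_scale(p_nominal, r_th, t_amb, t_limit, steps):
    """Return the largest V/f scale factor that keeps temperature within t_limit.

    Assumes voltage is scaled by the same factor as frequency.
    Returns None if no candidate step satisfies the limit.
    """
    for s in sorted(steps, reverse=True):
        temp = steady_temp(dynamic_power(p_nominal, s, s), r_th, t_amb)
        if temp <= t_limit:
            return s
    return None

# Hypothetical chip: 100 W at nominal V/f, 45 C ambient, 85 C limit.
steps = [1.0, 0.9, 0.8, 0.7]
scale_high_r = highest_feasible_scale(100.0, 0.5, 45.0, 85.0, steps)
scale_low_r = highest_feasible_scale(100.0, 0.3, 45.0, 85.0, steps)
```

With these made-up numbers, the package with higher thermal resistance forces a lower operating point than the one with lower thermal resistance, which is the kind of trade-off a thermal constraint introduces.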

Results
Without constraints:
– CPU-bound benchmarks favor deeper pipelines
– Memory-bound benchmarks favor shallower pipelines

With Area Constraints
To meet the area constraints:
– Workloads: decrease the cache size for CPU-bound workloads; decrease the number of cores for memory-bound workloads
– Pipeline dimensions: shifting to narrower widths has the greater impact on area
CPU-bound and memory-bound workloads have different, incompatible optima
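Exploration under an area constraint can be sketched as: enumerate configurations, discard those over the area budget, and pick the best remaining one. The area model, the scoring function, and the parameter ranges below are placeholders invented for illustration. They are not the paper's models and will not reproduce its results.

```python
from itertools import product
from math import sqrt

# Hypothetical parameter ranges for the sweep.
CORE_COUNTS = [2, 4, 8, 16]
WIDTHS = [2, 4, 8]
L2_SIZES_MB = [2, 4, 8, 16]

def chip_area(cores, width, l2_mb):
    """Placeholder area model (mm^2): cores grow with width, cache with size."""
    return cores * (4.0 * width) + 6.0 * l2_mb

def feasible_configs(area_budget):
    """All (cores, width, l2_mb) tuples whose area fits the budget."""
    return [c for c in product(CORE_COUNTS, WIDTHS, L2_SIZES_MB)
            if chip_area(*c) <= area_budget]

def toy_score(config, mem_intensity):
    """Placeholder throughput proxy; higher mem_intensity rewards more cache."""
    cores, width, l2_mb = config
    return cores * sqrt(width) / (1.0 + mem_intensity / l2_mb)

def best_config(area_budget, mem_intensity):
    """Highest-scoring configuration among those that fit the area budget."""
    candidates = feasible_configs(area_budget)
    return max(candidates, key=lambda c: toy_score(c, mem_intensity))

best = best_config(area_budget=200.0, mem_intensity=4.0)
```

Swapping in a different `mem_intensity` shows how the preferred trade-off between cores and cache can shift with the workload class, even in a toy model.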

Results Optimal Configurations with Varying Pipeline Width, Fixed Depth (18FO4)

Results Optimal Configurations with Varying Pipeline Depth, Fixed Width (4D)

With Thermal Constraints
To meet the thermal constraints:
– Decrease the cache size for CPU-bound workloads
– Decrease the number of cores for memory-bound workloads

Thermal Constraints
– Thermal constraints exert great influence on the optimal design configurations
– Thermal constraints should be considered early in the design process

Conclusions
– Joint optimization across multiple design variables is necessary
– Thermal constraints appear to dominate other physical constraints and tend to favor shallower pipelines and narrower cores