McPAT: An Integrated Power, Area, and Timing Modeling Framework for Multicore and Manycore Architectures Runjie Zhang Dec.3 S. Li et al. in MICRO’09.

Slides:



Advertisements
Similar presentations
An International Technology Roadmap for Semiconductors
Advertisements

Multiprocessors— Large vs. Small Scale Multiprocessors— Large vs. Small Scale.
Lecture 6: Multicore Systems
CML Efficient & Effective Code Management for Software Managed Multicores CODES+ISSS 2013, Montreal, Canada Ke Bai, Jing Lu, Aviral Shrivastava, and Bryce.
University of Michigan Electrical Engineering and Computer Science University of Michigan Electrical Engineering and Computer Science University of Michigan.
Better than the Two: Exceeding Private and Shared Caches via Two-Dimensional Page Coloring Lei Jin and Sangyeun Cho Dept. of Computer Science University.
University of Michigan Electrical Engineering and Computer Science University of Michigan Electrical Engineering and Computer Science August 20, 2009 Enabling.
Digital Integrated Circuits© Prentice Hall 1995 Timing ISSUES IN TIMING.
OCIN Workshop Wrapup Bill Dally. Thanks To Funding –NSF - Timothy Pinkston, Federica Darema, Mike Foster –UC Discovery Program Organization –Jane Klickman,
CS 7810 Lecture 12 Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors D. Brooks et al. IEEE Micro, Nov/Dec.
Yefu Wang and Kai Ma. Project Goals and Assumptions Control power consumption of multi-core CPU by CPU frequency scaling Assumptions: Each core can be.
1 Multi - Core fast Communication for SoPC Multi - Core fast Communication for SoPC Technion – Israel Institute of Technology Department of Electrical.
1 Link Division Multiplexing (LDM) for NoC Links IEEE 2006 LDM Link Division Multiplexing Arkadiy Morgenshtein, Avinoam Kolodny, Ran Ginosar Technion –
Cluster Prefetch: Tolerating On-Chip Wire Delays in Clustered Microarchitectures Rajeev Balasubramonian School of Computing, University of Utah July 1.
CAD and Design Tools for On- Chip Networks Luca Benini, Mark Hummel, Olav Lysne, Li-Shiuan Peh, Li Shang, Mithuna Thottethodi,
1 E. Bolotin – The Power of Priority, NoCs 2007 The Power of Priority : NoC based Distributed Cache Coherency Evgeny Bolotin, Zvika Guz, Israel Cidon,
Orion: A Power-Performance Simulator for Interconnection Networks Presented by: Ilya Tabakh RC Reading Group4/19/2006.
Temperature-Aware Design Presented by Mehul Shah 4/29/04.
Author: D. Brooks, V.Tiwari and M. Martonosi Reviewer: Junxia Ma
Research Directions for On-chip Network Microarchitectures Luca Carloni, Steve Keckler, Robert Mullins, Vijay Narayanan, Steve Reinhardt, Michael Taylor.
University of Utah 1 The Effect of Interconnect Design on the Performance of Large L2 Caches Naveen Muralimanohar Rajeev Balasubramonian.
Network-on-Chip: Communication Synthesis Department of Computer Science Texas A&M University.
Multi-core processors. History In the early 1970’s the first Microprocessor was developed by Intel. It was a 4 bit machine that was named the 4004 The.
Designing Memory Systems for Tiled Architectures Anshuman Gupta September 18,
Slide 1 U.Va. Department of Computer Science LAVA Architecture-Level Power Modeling N. Kim, T. Austin, T. Mudge, and D. Grunwald. “Challenges for Architectural.
1 University of Utah & HP Labs 1 Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0 Naveen Muralimanohar Rajeev Balasubramonian.
Low Power Techniques in Processor Design
Power Issues in On-chip Interconnection Networks Mojtaba Amiri Nov. 5, 2009.
SYNAR Systems Networking and Architecture Group Scheduling on Heterogeneous Multicore Processors Using Architectural Signatures Daniel Shelepov and Alexandra.
International Symposium on Low Power Electronics and Design NoC Frequency Scaling with Flexible- Pipeline Routers Pingqiang Zhou, Jieming Yin, Antonia.
Low-Power Wireless Sensor Networks
Crossbar switches By Alejandro Ayala. Hardware design Show hardware design of several modern crossbar switches used for multiprocessing system on chip.
Energy saving in multicore architectures Assoc. Prof. Adrian FLOREA, PhD Prof. Lucian VINTAN, PhD – Research.
Feb. 19, 2008 Multicore Processor Technology and Managing Contention for Shared Resource Cong Zhao Yixing Li.
1 Provided By: Ali Teymouri Based on article “Jaguar: A Next-Generation Low-Power x86-64 Core ” Coarse: Custom Implementation of DSP Systems University.
Dong Hyuk Woo Nak Hee Seong Hsien-Hsin S. Lee
1 Computer Architecture Research Overview Rajeev Balasubramonian School of Computing, University of Utah
An Efficient Clustering Algorithm For Low Power Clock Tree Synthesis Rupesh S. Shelar Enterprise Microprocessor Group Intel Corporation, Hillsboro, OR.
High Performance Computing Processors Felix Noble Mirayma V. Rodriguez Agnes Velez Electric and Computer Engineer Department August 25, 2004.
Offline Coordinators  CMSSW_7_1_0 release: 17 June 2014  Usage:  Generation and Simulation samples for run 2 startup  Limited digitization and reconstruction.
Taking the Complexity out of Cluster Computing Vendor Update HPC User Forum Arend Dittmer Director Product Management HPC April,
Enabling Multi-threaded Applications on Hybrid Shared Memory Manycore Architectures Tushar Rawat and Aviral Shrivastava Arizona State University, USA CML.
Design and Evaluation of Hierarchical Rings with Deflection Routing Rachata Ausavarungnirun, Chris Fallin, Xiangyao Yu, ​ Kevin Chang, Greg Nazario, Reetuparna.
1 Some Limits of Power Delivery in the Multicore Era Runjie Zhang, Brett H. Meyer, Wei Huang, Kevin Skadron and Mircea R. Stan University of Virginia,
Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing Barroso, Gharachorloo, McNamara, et. Al Proceedings of the 27 th Annual ISCA, June.
RF network in SoC1 SoC Test Architecture with RF/Wireless Connectivity 1. D. Zhao, S. Upadhyaya, M. Margala, “A new SoC test architecture with RF/wireless.
P-GAS: Parallelizing a Many-Core Processor Simulator Using PDES Huiwei Lv, Yuan Cheng, Lu Bai, Mingyu Chen, Dongrui Fan, Ninghui Sun Institute of Computing.
A few issues on the design of future multicores André Seznec IRISA/INRIA.
Replicating Memory Behavior for Performance Skeletons Aditya Toomula PC-Doctor Inc. Reno, NV Jaspal Subhlok University of Houston Houston, TX By.
Authors – Jeahyuk huh, Doug Burger, and Stephen W.Keckler Presenter – Sushma Myneni Exploring the Design Space of Future CMPs.
Analysis of Cache Tuner Architectural Layouts for Multicore Embedded Systems + Also Affiliated with NSF Center for High- Performance Reconfigurable Computing.
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation Sangyeun Cho and Lei Jin Dept. of Computer Science University of Pittsburgh.
Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs? Wasim Shaikh Date: 10/29/2015.
By Islam Atta Supervised by Dr. Ihab Talkhan
Oct 31 st 2007University of Utah1 Multi-Cores: Architecture/VLSI Perspective The Hardware-Software Relationship: Date or Dump?
Application Domains for Fixed-Length Block Structured Architectures ACSAC-2001 Gold Coast, January 30, 2001 ACSAC-2001 Gold Coast, January 30, 2001.
A Low-Area Interconnect Architecture for Chip Multiprocessors Zhiyi Yu and Bevan Baas VLSI Computation Lab ECE Department, UC Davis.
Boris Grot, Joel Hestness, Stephen W. Keckler 1 The University of Texas at Austin 1 NVIDIA Research Onur Mutlu Carnegie Mellon University.
Multi-Core CPUs Matt Kuehn. Roadmap ► Intel vs AMD ► Early multi-core processors ► Threads vs Physical Cores ► Multithreading and Multi-core processing.
NoCVision: A Network-on-Chip Dynamic Visualization Solution
Fall 2012 Parallel Computer Architecture Lecture 4: Multi-Core Processors Prof. Onur Mutlu Carnegie Mellon University 9/14/2012.
Presented by: Nick Kirchem Feb 13, 2004
Lynn Choi School of Electrical Engineering
Sam Xi. +, Hans Jacobson+, Pradip Bose+, gu-Yeon Wei
Multi-core processors
Pablo Abad, Pablo Prieto, Valentin Puente, Jose-Angel Gregorio
‘99 ACM/IEEE International Symposium on Computer Architecture
Multi-core processors
Energy Efficient Power Distribution on Many-Core SoC
Seminar Tittles 1-Modeling and Optimization of soft-error reliability of Sequential circuits. 2-Statistical Estimation of Sequential Circuit Activities.
Presentation transcript:

McPAT: An Integrated Power, Area, and Timing Modeling Framework for Multicore and Manycore Architectures Runjie Zhang Dec.3 S. Li et al. in MICRO’09

Motivation “Tools both limit and drive research directions” New demands ◦ Multicore/manycore ◦ Evaluate power, timing and area ◦ Different source of power dissipation ◦ Deep-submicron

Related Work Wattch: ◦ Enabling a tremendous surge in power-related research Orion: ◦ Combined Wattch’s core power model with a router power model CACTI: ◦ First tool in rapid power, area, and timing estimation

What’s wrong with previous work? Wattch ◦ No timing and area models ◦ Only models dynamic power consumption ◦ Use simple linear scaling model Orion2 ◦ No short circuit power or timing ◦ “Incomplete” CACTI ◦ ???

Contributions First integrated power, area, and timing modeling framework Model all three types of power dissipation Complete, integrated solution for multithreaded and multicore processor power Deep-submicron tech. that no longer linear

Overview

Integrated Approach Power Dynamic: Similar to Wattch Short Circuit: IEEE TCAD’00 Leakage: MASTAR & Intel Timing CACTI with extension Area CACTI Empirical modeling for complex logic

Hierarchical Approach

Architecture Level Core Divided into several main units: e.g. IFU, EXU, LSU, OOO issue/dispatch unit NoC Signal links and routers On-Chip Caches Coherent cache Memory Controller 3 main hardware structure Empirical model for physical interface (PHY) Clocking PLL and clock distribution network Empirical model for PLL power

Circuit Level Wires Short wires as one-section Pi-RC model Long wires as a buffered wire model Arrays Based on CACTI with extensions Logic Highly regular: CACTI Less regular: Model from Intel, AMD and Sun Highly customized: Empirical, from Intel and Sun Clock Distribution Network Separate circuit model Global, Domain, and Local

Validation

Validation Average error1 st Contributor Error / % in total pwr Niagara1.47W / 23%74% / 1% Niagara21.87W / 26%47% / 5% Alpha W / 26%45% / 3% Xeon Tulsa4.2W / 17%29% / 3%

Scaling & Clustering New generation: Double # of cores Keep the same micro-architecture

Parameters and Benchmarks

Area and Power

Performance and Efficiency

In-order vs. OOO with M5