Exploiting Dark Silicon for Energy Efficiency Nikos Hardavellas Northwestern University, EECS.

Slides:



Advertisements
Similar presentations
Galaxy: High-Performance Energy-Efficient Multi-Chip Architectures Using Photonic Interconnects Nikos Hardavellas – Parallel Architecture Group.
Advertisements

Hopkins Storage Systems Lab, Department of Computer Science Automated Physical Design in Database Caches T. Malik, X. Wang, R. Burns Johns Hopkins University.
Multiprocessors— Large vs. Small Scale Multiprocessors— Large vs. Small Scale.
Optimizing Shared Caches in Chip Multiprocessors Samir Sapra Athula Balachandran Ravishankar Krishnaswamy.
Nikos Hardavellas, Northwestern University
1 Database Servers on Chip Multiprocessors: Limitations and Opportunities Nikos Hardavellas With Ippokratis Pandis, Ryan Johnson, Naju Mancheril, Anastassia.
1 A Self-Tuning Cache Architecture for Embedded Systems Chuanjun Zhang*, Frank Vahid**, and Roman Lysecky *Dept. of Electrical Engineering Dept. of Computer.
Some Opportunities and Obstacles in Cross-Layer and Cross-Component (Power) Management Onur Mutlu NSF CPOM Workshop, 2/10/2012.
From Yale-45 to Yale-90: Let Us Not Bother the Programmers Guri Sohi University of Wisconsin-Madison Celebrating September 19, 2014.
Microarchitectural Approaches to Exceeding the Complexity Barrier © Eric Rotenberg 1 Microarchitectural Approaches to Exceeding the Complexity Barrier.
Galaxy: High-Performance Energy-Efficient Multi-Chip Architectures Using Photonic Interconnects Nikos Hardavellas – Parallel Architecture Group.
Presented by Euiwoong Lee Accelerators/Specialization/ Emerging Architectures.
Toward Energy-Efficient Computing Nikos Hardavellas – Parallel Architecture Group Northwestern University.
CCNoC: On-Chip Interconnects for Cache-Coherent Manycore Server Chips CiprianSeiculescu Stavros Volos Naser Khosro Pour Babak Falsafi Giovanni De Micheli.
March 18, 2008SSE Meeting 1 Mary Hall Dept. of Computer Science and Information Sciences Institute Multicore Chips and Parallel Programming.
- Sam Ganzfried - Ryan Sukauye - Aniket Ponkshe. Outline Effects of asymmetry and how to handle them Design Space Exploration for Core Architecture Accelerating.
Dual Voltage Design for Minimum Energy Using Gate Slack Kyungseok Kim and Vishwani D. Agrawal ECE Dept. Auburn University Auburn, AL 36849, USA IEEE ICIT-SSST.
Brain-Implantable Computing Platforms for Emerging Neuroscience Applications Ken Mai Electrical and Computer Engineering Carnegie Mellon University.
Datacenter Power State-of-the-Art Randy H. Katz University of California, Berkeley LoCal 0 th Retreat “Energy permits things to exist; information, to.
Trevor Burton6/19/2015 Multiprocessors for DSP SYSC5603 Digital Signal Processing Microprocessors, Software and Applications.
Scaling the Bandwidth Wall: Challenges in and Avenues for CMP Scalability 36th International Symposium on Computer Architecture Brian Rogers †‡, Anil Krishna.
Institute of Digital and Computer Systems 1 Fabio Garzia / Finding Peak Performance in a Process23/06/2015 Chapter 5 Finding Peak Performance in a Process.
Power Management in Multicores Minshu Zhao. Outline Introduction Review of Power management technique Power management in Multicore ◦ Identify Multicores.
Veynu Narasiman The University of Texas at Austin Michael Shebanow
1 EE 587 SoC Design & Test Partha Pande School of EECS Washington State University
Single-Chip Multi-Processors (CMP) PRADEEP DANDAMUDI 1 ELEC , Fall 08.
Utility-Based Acceleration of Multithreaded Applications on Asymmetric CMPs José A. Joao * M. Aater Suleman * Onur Mutlu ‡ Yale N. Patt * * HPS Research.
Operating Systems Should Manage Accelerators Sankaralingam Panneerselvam Michael M. Swift Computer Sciences Department University of Wisconsin, Madison,
1 VLSI and Computer Architecture Trends ECE 25 Fall 2012.
Pınar Tözün Anastasia Ailamaki SLICC Self-Assembly of Instruction Cache Collectives for OLTP Workloads Islam Atta Andreas Moshovos.
An Efficient Algorithm for Dual-Voltage Design Without Need for Level-Conversion SSST 2012 Mridula Allani Intel Corporation, Austin, TX (Formerly.
Variation Aware Application Scheduling in Multi-core Systems Lavanya Subramanian, Aman Kumar Carnegie Mellon University {lsubrama,
Sogang University Advanced Computing System Chap 1. Computer Architecture Hyuk-Jun Lee, PhD Dept. of Computer Science and Engineering Sogang University.
Thread Criticality Predictors for Dynamic Performance, Power, and Resource Management in Chip Multiprocessors Abhishek Bhattacharjee and Margaret Martonosi.
Η Μικροαρχιτεκτονική Πέθανε. Ζήτω η Μικροαρχιτεκτονική! Η Μικροαρχιτεκτονική Πέθανε. Ζήτω η Μικροαρχιτεκτονική! Christos Kozyrakis Stanford University.
Ioana Burcea * Stephen Somogyi §, Andreas Moshovos*, Babak Falsafi § # Predictor Virtualization *University of Toronto Canada § Carnegie Mellon University.
Nikos Hardavellas – Parallel Architecture Group
Parallelism: A Serious Goal or a Silly Mantra (some half-thought-out ideas)
Background Gaussian Elimination Fault Tolerance Single or multiple core failures: Single or multiple core additions: Simultaneous core failures and additions:
Understanding Sources of Inefficiency in General-Purpose Chips R.Hameed, W. Qadeer, M. Wachs, O. Azizi, A. Solomatnikov, B. Lee, S. Richardson, C. Kozyrakis,
Slow Wires, Hot Chips, and Leaky Transistors: New Challenges in the New Millenium Norm Jouppi Compaq - WRL Disclaimer: The views expressed herein are the.
1 Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly Koushik Chakraborty Philip Wells Gurindar Sohi
Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K.
Implications of Emerging Hardware Tom Wenisch (University of Michigan) Nikos Hardavellas (Northwestern University) Sangyeun Cho (University of Pittsburgh)
A few issues on the design of future multicores André Seznec IRISA/INRIA.
Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs? Wasim Shaikh Date: 10/29/2015.
Simultaneous Supply, Threshold and Width Optimization for Low-Power CMOS Circuits With an aside on System based shutdown. Gord Allan PhD Candidate ASIC.
Analyzing the Impact of Data Prefetching on Chip MultiProcessors Naoto Fukumoto, Tomonobu Mihara, Koji Inoue, Kazuaki Murakami Kyushu University, Japan.
Computer Science and Engineering Power-Performance Considerations of Parallel Computing on Chip Multiprocessors Jian Li and Jose F. Martinez ACM Transactions.
1 November 11, 2015 A Massively Parallel, Hybrid Dataflow/von Neumann Architecture Yoav Etsion November 11, 2015.
The Rise of Dark Silicon
Computing Systems: Next Call for Proposals Dr. Panagiotis Tsarchopoulos Computing Systems ICT Programme European Commission.
Shouqing Hao Institute of Computing Technology, Chinese Academy of Sciences Processes Scheduling on Heterogeneous Multi-core Architecture.
University of Michigan Electrical Engineering and Computer Science 1 Embracing Heterogeneity with Dynamic Core Boosting Hyoun Kyu Cho and Scott Mahlke.
Lecture 27 Multiprocessor Scheduling. Last lecture: VMM Two old problems: CPU virtualization and memory virtualization I/O virtualization Today Issues.
1 Efficient Mixed-Platform Clouds Phillip B. Gibbons, Intel Labs Michael Kaminsky, Michael Kozuch, Padmanabhan Pillai (Intel Labs) Gregory Ganger, David.
Penn ESE534 Spring DeHon 1 ESE534 Computer Organization Day 19: March 28, 2012 Minimizing Energy.
3D-ASICS for X-ray Science
Simultaneous Multi-Layer Access Improving 3D-Stacked Memory Bandwidth at Low Cost Donghyuk Lee, Saugata Ghose, Gennady Pekhimenko, Samira Khan, Onur Mutlu.
A Low-Area Interconnect Architecture for Chip Multiprocessors Zhiyi Yu and Bevan Baas VLSI Computation Lab ECE Department, UC Davis.
Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University.
1 Scaling Soft Processor Systems Martin Labrecque Peter Yiannacouras and Gregory Steffan University of Toronto FCCM 4/14/2008.
Fall 2012 Parallel Computer Architecture Lecture 4: Multi-Core Processors Prof. Onur Mutlu Carnegie Mellon University 9/14/2012.
Computer Architecture: Parallel Task Assignment
Simultaneous Multithreading
Database Servers on Chip Multiprocessors: Limitations and Opportunities Nikos Hardavellas With Ippokratis Pandis, Ryan Johnson, Naju Mancheril, Anastassia.
Energy Efficient Power Distribution on Many-Core SoC
ISCA 2000 Panel Slow Wires, Hot Chips, and Leaky Transistors: New Challenges in the New Millennium Moderator: Shubu Mukherjee VSSAD, Alpha Technology Compaq.
Artificial Intelligence: Driving the Next Generation of Chips and Systems Manish Pandey May 13, 2019.
Prof. Onur Mutlu Carnegie Mellon University
Presentation transcript:

Exploiting Dark Silicon for Energy Efficiency Nikos Hardavellas Northwestern University, EECS

© Nikos Hardavellas 2 NSF SEEDM Workshop, May 2011 Technology Scaling Changes the HW Landscape Transistor counts increase exponentially, but… Traditional HW power-aware techniques inadequate (e.g., voltage-freq. scaling) Dark Silicon !!! Can no longer power the entire chip (voltages, cooling do not scale) [Watanabe et al., ISCA’10]

© Nikos Hardavellas 3 NSF SEEDM Workshop, May 2011 The Rise of Dark Silicon Exponentially-large area is left unutilized (dark). Should we waste it?

© Nikos Hardavellas 4 NSF SEEDM Workshop, May 2011 Design for Dark Silicon Don’t waste it; harness it instead!  Use dark silicon to implement specialized cores Applications cherry-pick few cores, rest of chip is powered off Vast unused area  many cores  likely to find good matches

© Nikos Hardavellas 5 NSF SEEDM Workshop, May x LOWER ENERGY compared to best conventional alternative First-Order Core Specialization Model Modeling of physically-constrained CMPs across technologies Model of specialized cores based on ASIC implementation of H.264:  Implementations on custom HW (ASICs), FPGAs, multicores (CMP)  Wide range of computational motifs, extensively studied Frames per sec Energy per frame (mJ) Performance gap of CMP vs. ASIC Energy gap of CMP vs. ASIC ASIC304 CMP IME x707x FME x468x Intra x157x CABAC x261x [Hameed et al., ISCA’10]

© Nikos Hardavellas 6 NSF SEEDM Workshop, May 2011 Research Directions Which candidates are best for off-loading to specialized cores? What should these cores look like?  Exploit commonalities to avoid core over-specialization  Impact on ISA? What are the appropriate language/compiler/runtime techniques to drive execution migration?  Impact on scheduler? How to restructure software/algorithms for heterogeneity?

© Nikos Hardavellas 7 NSF SEEDM Workshop, May 2011 Thank You! Questions? References: N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki. Toward Dark Silicon in Servers. IEEE Micro, Vol. 31, No. 4, July/August N. Hardavellas. Chip multiprocessors for server workloads. PhD thesis, Carnegie Mellon University, Dept. of Computer Science, August N. Hardavellas, M. Ferdman, A. Ailamaki, and B. Falsafi. Power scaling: the ultimate obstacle to 1K-core chips. Technical Report NWU-EECS-10-05, Northwestern University, Evanston, IL, March R. Hameed, W. Qadeer, M. Wachs, O. Azizi, A. Solomatnikov, B. C. Lee, S. Richardson, C. Kozyrakis, and M. Horowitz. Understanding sources of inefficiency in general-purpose chips. In Proc. of ISCA, June E. S. Chung, P. A. Milder, J. C. Hoe, and K. Mai. Single-chip heterogeneous computing: does the future include custom logic, FPGAs, and GPGPUs? In Proc. of MICRO, December 2010.