Attacking the Power-Wall by Using Near-threshold Cores
Liang Wang


Power Wall
The end of classical scaling.
– Vdd: almost constant
– Power density: increases roughly exponentially
– Utilization: decreases roughly exponentially
Dark silicon: we can fabricate more cores than we can power up.
* From Venkatesh et al., ASPLOS’10
Liang Wang, ECE6332 Final

Near-threshold Cores (NVt. Cores)
Pros
– Low power per core.
– More cores per chip.
Limitations
– Low per-core frequency, which reduces the throughput gains from parallelization.
– Variations, which harm both performance and functionality.
Will NVt. cores be a viable solution to push down the power wall?

Outline
– Performance Model
– Analyses and Results
– Conclusion

System Modeling
Symmetric multi-core system
– Chip: area A, power P
– Single core at supply voltage v: area a, power p(v), frequency f(v)
– Number of active cores
– Amdahl's Law, for an application with a given parallel ratio
– Dynamic power, static power, and frequency are fitted to circuit simulation
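The quantities on this slide can be tied together in a short sketch. The Python below is a minimal illustration, not the author's model. It assumes that the number of active cores is the smaller of the area limit A/a and the power limit P/p(v), and that every core runs at the same relative frequency f(v), so the Amdahl's Law speedup is scaled by that frequency. The function names and all numbers are hypothetical.

```python
def active_cores(chip_area, chip_power, core_area, core_power):
    """Cores that can run at once under both budgets (assumed min rule)."""
    by_area = int(chip_area / core_area)     # how many cores fit on the die
    by_power = int(chip_power / core_power)  # how many cores can be powered
    return min(by_area, by_power)


def amdahl_speedup(n_cores, parallel_ratio, rel_freq=1.0):
    """Amdahl's Law speedup versus one core at nominal frequency.

    rel_freq is f(v) divided by the nominal frequency. Assumption: all
    cores, including the one running the serial part, share it.
    """
    serial = 1.0 - parallel_ratio
    return rel_freq / (serial + parallel_ratio / n_cores)


# Hypothetical budget: 400 mm^2 and 100 W chip, 10 mm^2 cores.
n_nominal = active_cores(400, 100, 10, 5.0)  # power-limited
n_low_vdd = active_cores(400, 100, 10, 1.0)  # area-limited
```

With these made-up numbers, lowering per-core power from 5 W to 1 W moves the chip from power-limited to area-limited, which is the trade-off the rest of the deck explores: more cores can be switched on, but each runs at a lower `rel_freq`.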

Simulation Setup
Circuit
– A single inverter
– Ripple-carry adder (32-bit, 16-bit, 8-bit, and 4-bit)
Technology library
– A modified version of the Predictive Technology Model (PTM)
Technology nodes
– 45nm, 32nm, 22nm, 16nm
Process variants
– HKMGS: high-performance, high-k metal gate and stress effect
– LP: low-power process
CAD tools
– RC Compiler
– Spectre driven by Ocean

Voltage-Frequency Scaling
[Figure: frequency versus Vdd, with annotated frequency drops of ~8x, ~400x, ~15x, and ~10^3x]
– LP shows a much larger frequency drop than HP for the same change in Vdd.
– 16nm shows a larger frequency drop than 45nm for the same change in Vdd.

Design Space Exploration (Area)
Setup: 45nm, HKMGS, IO cores, 100W, parallel ratio = 0.99
[Figure annotations: "saturating"; "Peak is capped by total area"; "2x"; "Peak from 200 to 6.4K"]

Cross-technology Study
[Figure panels: 500mm², 80W; 400mm², 100W]

Comparison to Dark Silicon
– NVt. cores alleviate the issue of low utilization.
– NVt. cores have better performance (up to 2x).
[Figure: 500mm², 80W, HKMGS; axis label: available cores on-chip]

Variation
NVt. cores are very sensitive to variations
– Functionality (ratioed circuits)
– Performance (the focus of this project)
Monte-Carlo simulation
– Performed at every Vdd setting
– 100 iterations per Vdd
– Process and mismatch variation
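The Monte-Carlo procedure on this slide can be mimicked with a toy model. The sketch below is not the Spectre flow used in the project. It swaps in a simple alpha-power-law frequency model (a crude stand-in, especially near threshold) and lumps process and mismatch variation into a single Gaussian threshold-voltage spread. All parameter values and function names are hypothetical.

```python
import random


def rel_frequency(vdd, vth, alpha=1.5):
    """Toy alpha-power-law frequency in arbitrary units (assumed model)."""
    overdrive = max(vdd - vth, 1e-6)  # guard against non-positive overdrive
    return overdrive ** alpha / vdd


def worst_case_slowdown(vdd, vth_nominal=0.30, vth_sigma=0.02,
                        iterations=100, seed=0):
    """Nominal frequency divided by the slowest of the sampled cores."""
    rng = random.Random(seed)  # fixed seed keeps the sketch repeatable
    nominal = rel_frequency(vdd, vth_nominal)
    slowest = min(
        rel_frequency(vdd, rng.gauss(vth_nominal, vth_sigma))
        for _ in range(iterations)
    )
    return nominal / slowest


# Sweep hypothetical Vdd settings, 100 iterations each.
slowdowns = {vdd: worst_case_slowdown(vdd) for vdd in (1.0, 0.7, 0.45)}
```

In this toy model the worst-case slowdown grows as Vdd approaches the threshold voltage, which matches the qualitative point of the slide: the same threshold-voltage spread costs much more performance near threshold.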

Voltage-Frequency Scaling Revisited
[Two figure panels; panel titles not recoverable]
First panel
– HKMGS: up to 5x slowdown
– LP: up to 10x slowdown
Second panel
– HKMGS: up to 10x slowdown
– LP: up to 100x slowdown

Impact of Variation
Setup: 400mm², 100W, IO cores
[Figure annotations: "Lower Utilization"; "Worse Perf."; "Flatten Vdd"]

Conclusion
In terms of performance
– Simple core (IO) is better.
– HP process (HKMGS) is better.
Lowering Vdd reduces dark silicon and improves throughput.
NVt. cores are vulnerable to process variation.