Changbo Long ECE Department, UW-Madison Lei He EDA Research Group EE Department, UCLA Distributed Sleep Transistor Network.

Presentation transcript:

Distributed Sleep Transistor Network for Power Reduction*
Changbo Long, ECE Department, UW-Madison
Lei He, EDA Research Group, EE Department, UCLA
*Partially sponsored by NSF CAREER Award, SRC grant HJ-1008, and Intel Corporation

Outline
- Motivation
- Background
- Distributed sleep transistor network (DSTN)
  - Structure, advantages, modeling, and sizing algorithm
- Experiment results
- Conclusion and future work

Motivation
- Leakage power will become the dominant power component
  - Reduced feature size
  - Increased system integration → more idle modules
- Leakage reduction techniques
  - To reduce leakage for active modules
    - Dual threshold voltage assignment for sub-threshold leakage [Mahesh et al., ICCAD'02]
    - Pin reordering for gate leakage [Lee et al., DAC'03]
  - To reduce leakage for idle modules
    - Input vector control [Johnson et al., DAC'99]
    - Power gating [Kao et al., DAC'98] [Anis et al., DAC'02]

Motivation
- System level: use a power management processor (PMP) to generate control signals [Mutoh et al., JSSC'96]
  - The PMP can be distributed
- Gate level: use sleep transistors to turn off the power supply
  - Concerned with performance loss and area overhead
[Figure: PMP, Sleep signal, gates g1 ... gn, V_dd, virtual GND, sleep transistor]

Performance Loss
- Performance loss
  - Increase in the propagation delay
- Performance loss is proportional to V_st
- Worst case for i_st: the Maximum Simultaneous Switching Current (MSSC)
[Figure: gates g1 ... gn, V_dd, i_st]

MSSC
- MSSC: maximum current in the time domain and the input vector domain
[Figure: for each input vector, the current waveforms i_g1, i_g2, i_g3 of gates g1, g2, g3 over time t add up to i_total; the MSSC is the peak of i_total]
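The definition above can be sketched in a few lines of Python. The waveform data, the names `currents` and `mssc`, and the sampling on a common time axis are illustrative assumptions, not part of the presentation.

```python
from typing import Dict, List

# Hypothetical data: currents[vector][gate] is a list of current samples
# (arbitrary units) taken on a common time axis for that input vector.
Waveforms = Dict[str, Dict[str, List[float]]]


def total_current(per_gate: Dict[str, List[float]]) -> List[float]:
    """Sum the gate currents sample by sample: i_total(t) = sum_g i_g(t)."""
    return [sum(samples) for samples in zip(*per_gate.values())]


def mssc(currents: Waveforms) -> float:
    """Maximum of i_total over both the time axis and the input vectors."""
    return max(max(total_current(per_gate)) for per_gate in currents.values())


currents: Waveforms = {
    # The gates switch at different instants: the peak of i_total is 3.
    "vec_a": {"g1": [0, 2, 0, 0], "g2": [0, 0, 3, 0], "g3": [0, 0, 0, 1]},
    # All three gates switch at the same instant: the peak of i_total is 4.
    "vec_b": {"g1": [0, 1, 0, 0], "g2": [0, 2, 0, 0], "g3": [0, 1, 0, 0]},
}

print(mssc(currents))  # 4
```

Note that the MSSC is not the sum of the per-gate peaks (2 + 3 + 1 = 6 here), because the gates do not necessarily peak at the same time.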

Area Overhead
- Area overhead: the sleep transistor area and the routing area of virtual ground wires
- Design convention: given a performance loss constraint, minimize the area overhead
[Figure: gates g1 ... gn, V_dd, MSSC]

Related Work
- Module-based design methodology [Mutoh et al., JSSC'95, '96] [Kao et al., DAC'98]
  - A single large sleep transistor accommodates the entire module [JSSC'96]
  - Manual sizing → automatic sizing considering discharge patterns [Kao et al., DAC'98]
  - Voltage drop on long virtual ground wires is nontrivial and results in large area

Related Work (continued)
- Cluster-based design methodology [Anis et al., DAC'02]
  - Group gates into clusters and minimize the peak current in each cluster with clustering algorithms
  - Insert a sleep transistor for each cluster to avoid long virtual ground wires
  - Clustering may conflict with timing-driven placement

Sleep Transistor Area
- Area*: the sleep transistor area ignoring the resistance of virtual ground wires
  - MSSC_module ≤ ∑_i MSSC_cluster_i ⇒ area*_module ≤ area*_cluster
  - When MSSC_module < ∑_i MSSC_cluster_i, area*_module < area*_cluster
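The relation between the module MSSC and the sum of the cluster MSSCs follows from the fact that the peak of a sum never exceeds the sum of the peaks. A minimal sketch, assuming made-up waveforms for a single input vector and a hypothetical two-cluster partition:

```python
from typing import Dict, List

# Hypothetical per-gate current samples for one input vector (arbitrary units).
gate_current: Dict[str, List[float]] = {
    "g1": [3, 0, 0],
    "g2": [0, 2, 0],
    "g3": [0, 0, 4],
    "g4": [1, 1, 0],
}
# Hypothetical partition of the gates into two clusters.
clusters: Dict[str, List[str]] = {"c1": ["g1", "g2"], "c2": ["g3", "g4"]}


def peak(gates: List[str]) -> float:
    """Peak over time of the summed current of the given gates."""
    return max(sum(s) for s in zip(*(gate_current[g] for g in gates)))


mssc_module = peak(list(gate_current))
mssc_clusters = {name: peak(gates) for name, gates in clusters.items()}

print(mssc_module, sum(mssc_clusters.values()))  # 4 7
```

Here the clusters peak at different instants, so the module peak (4) is strictly smaller than the sum of the cluster peaks (3 + 4 = 7). Taking the maximum over several input vectors as well does not change the direction of the inequality.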

Sleep Transistor Area (continued)
- Considering the resistance of virtual ground wires, Area_mod > Area_clu [Anis et al., DAC'02]
- DSTN has the smallest area
  - Area_DSTN ≈ Area*_mod

DSTN: Distributed Sleep Transistor Network
- DSTN enhances cluster-based design by connecting clusters with extra virtual ground wires
[Figure: cluster-based design vs. DSTN]

Current Discharging Balance Reduces Size
- Cluster-based design
  - Current discharges through its private sleep transistor → large transistor size
- DSTN
  - Current discharges through both private and neighboring sleep transistors → small transistor size

Additional Advantages of DSTN
- DSTN introduces NO constraint on placement
- Wire overhead of DSTN is small
[Figure: cluster-based design vs. DSTN, with clusters, sleep transistors, and additional wires marked]

Modeling of DSTN
- Entire module → resistance network plus current sources
[Figure: switching current sources, resistances R_i and R_st]
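A resistance network driven by current sources can be solved with nodal analysis (KCL at every virtual ground node). The sketch below assumes a chain topology in which neighboring clusters are joined by a wire resistance and each cluster has its own sleep transistor resistance to ground; the topology, the function names, and all numeric values are illustrative assumptions.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def virtual_ground_voltages(
    r_st: List[float], r_wire: List[float], i_src: List[float]
) -> List[float]:
    """Node voltages of a chain: r_wire[k] joins node k and node k + 1."""
    n = len(r_st)
    g = [[0.0] * n for _ in range(n)]
    for k in range(n):
        g[k][k] += 1.0 / r_st[k]           # sleep transistor to ground
    for k, r in enumerate(r_wire):
        gw = 1.0 / r                       # wire between neighboring nodes
        g[k][k] += gw
        g[k + 1][k + 1] += gw
        g[k][k + 1] -= gw
        g[k + 1][k] -= gw
    return solve(g, i_src)                 # KCL: G v = i


r_st = [10.0, 10.0, 10.0]     # ohms, hypothetical
r_wire = [1.0, 1.0]           # ohms, hypothetical
i_src = [0.03, 0.0, 0.0]      # amperes: only the first cluster is switching
v = virtual_ground_voltages(r_st, r_wire, i_src)
print("isolated drop:", i_src[0] * r_st[0], "networked drops:", v)
```

With these numbers, the switching cluster alone would see about 0.3 V across its private transistor, while in the connected chain part of its current flows through the neighboring transistors and every node voltage stays well below that.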

DSTN Sizing Problem (DSTN/SP)
- Given the DSTN topology, DSTN/SP finds the size of every sleep transistor such that the total transistor area of DSTN is minimized and the performance loss constraint is satisfied for every cluster
[Figure: switching current sources driving a network of sleep transistors, each with unknown R_st and W, subject to V_st < ε]

Difficulties of DSTN/SP
- Primary challenge: current source
  - Dependency between the current sources
  - Current varies w.r.t. time
- Secondary challenge: resistance network
  - Given the current sources, size R_st to minimize transistor area while satisfying the performance loss constraints
- Do any algorithms exist in the literature?
  - No exact solution
  - Close solution for power/ground network sizing [Boyd et al., ISPD'01]
  - We have developed an algorithm based on special properties of DSTN/SP

Properties of DSTN/SP Solutions
- P1: Assuming R_i = 0, [equation in terms of ε and MSSC]
  - ε: performance loss constraint; MSSC: maximum current
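One way to see what the R_i = 0 assumption implies is the short derivation below. It rests on assumptions that the slide does not state explicitly: that R_i is the resistance of the virtual ground wires, that ε acts as the bound on V_st, and that the k-th sleep transistor behaves as a linear resistor R_{st,k}. With R_i = 0 all virtual ground nodes are shorted together, so the sleep transistors act in parallel:

```latex
V_{st} \;=\; \frac{i_{total}}{\sum_k 1/R_{st,k}}
\;\le\; \frac{\mathrm{MSSC}}{\sum_k 1/R_{st,k}}
\;\le\; \varepsilon
\quad\Longrightarrow\quad
\sum_k \frac{1}{R_{st,k}} \;\ge\; \frac{\mathrm{MSSC}}{\varepsilon}
```

Under this reading, the minimum total sleep transistor conductance depends only on the module MSSC and on ε, not on how the conductance is split among the individual transistors.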

Properties of DSTN/SP Solutions
- P2: Given the current source, Area_DSTN increases when R_i increases
  - The increase is limited because R_i << R_st
  - When R_i = ∞, Area_DSTN = Area_cluster

Properties of DSTN/SP Solutions
- P3: Assuming the cluster current and Area_DSTN to be constant, to achieve minimum performance loss, [equation]

Algorithm for DSTN/SP
- P1, P2: The total sleep transistor area of DSTN is determined by [equation]
  - Empirical parameter in [0.05, 0.5]; it increases when R_i increases
- P3: The size of each individual sleep transistor is [equation]
- Key is to estimate MSSC_module and MSSC_cluster
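The sizing equations themselves are not legible in this transcript, so the sketch below is only one plausible reading of the two steps, not the authors' formulas. It assumes that the total sleep transistor conductance is (1 + alpha) * MSSC_module / eps, with alpha standing in for the empirical parameter in [0.05, 0.5], and that this total is split among clusters in proportion to MSSC_cluster. Both forms, and every name below, are assumptions.

```python
from typing import List


def size_sleep_transistors(
    mssc_module: float,
    mssc_clusters: List[float],
    eps: float,
    alpha: float,
) -> List[float]:
    """Return assumed per-cluster sleep transistor conductances (siemens).

    Assumed total:  g_total = (1 + alpha) * mssc_module / eps
    Assumed split:  g_k = g_total * mssc_clusters[k] / sum(mssc_clusters)
    """
    if not 0.05 <= alpha <= 0.5:
        raise ValueError("alpha is expected to lie in [0.05, 0.5]")
    g_total = (1.0 + alpha) * mssc_module / eps
    weight = sum(mssc_clusters)
    return [g_total * m / weight for m in mssc_clusters]


# Hypothetical numbers: currents in amperes, eps as a voltage bound in volts.
g = size_sleep_transistors(
    mssc_module=0.04, mssc_clusters=[0.03, 0.02, 0.01], eps=0.05, alpha=0.1
)
print(g, sum(g))
```

Whatever the exact equations are, the inputs are the same two quantities the slide singles out: the MSSC of the whole module and the MSSC of each cluster.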

Maximum Current Estimation
- Estimate MSSC_module
  - Circuit current strongly depends on the input vector
  - The space of input vectors grows exponentially with the number of primary inputs
  - A genetic algorithm (GA) based algorithm is used [Jiang et al., TVLSI'00]
- An efficient algorithm to estimate MSSC_cluster has been proposed in the paper
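To illustrate why a search heuristic is used when the input space is exponential, here is a generic genetic algorithm over input bit vectors. It is not the algorithm of the cited work: the selection scheme, the parameters, and the toy `toy_peak_current` function, which stands in for a circuit simulator, are all assumptions.

```python
import random
from typing import Callable, Tuple

Vector = Tuple[int, ...]


def ga_max_current(
    peak_current: Callable[[Vector], float],
    n_inputs: int,
    pop_size: int = 20,
    generations: int = 40,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Search for an input vector with a large peak current (needs n_inputs >= 2)."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_inputs)) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=peak_current, reverse=True)
        parents = ranked[: pop_size // 2]        # truncation selection
        children = [ranked[0]]                   # elitism: keep the best vector
        while len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            cut = rng.randrange(1, n_inputs)     # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = tuple(bit ^ (rng.random() < mutation_rate) for bit in child)
            children.append(child)
        pop = children
    best = max(pop, key=peak_current)
    return best, peak_current(best)


def toy_peak_current(vector: Vector) -> float:
    """Hypothetical stand-in for simulation: counts differing neighbor bits."""
    return float(sum(a != b for a, b in zip(vector, vector[1:])))


best, value = ga_max_current(toy_peak_current, n_inputs=12)
print(best, value)
```

Because the search visits only some of the 2^n vectors, the value it returns is a lower bound on the true maximum (11 for this toy function with 12 inputs).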

Base-line Case: Cluster-based Design
- Cluster-based design without considering placement constraints
  - Given a circuit and cluster size, partition gates into clusters such that ∑_i MSSC_cluster_i is minimized, which in turn minimizes Area_cluster
- Clustering algorithm
  - Simulated annealing (SA)
- Sizing algorithm
  - Each individual sleep transistor
  - Total area
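The clustering objective can be illustrated with a small simulated annealing loop that swaps gates between clusters of fixed size. This is a generic sketch, not the authors' implementation: the swap move, the cooling schedule, and the single-vector waveforms used as the cost model are assumptions.

```python
import math
import random
from typing import Dict, List, Tuple


def cluster_cost(clusters: List[List[str]], waveform: Dict[str, List[float]]) -> float:
    """Sum over clusters of the peak of the cluster's summed current."""
    return sum(
        max(sum(s) for s in zip(*(waveform[g] for g in cluster)))
        for cluster in clusters
    )


def anneal_clusters(
    waveform: Dict[str, List[float]],
    cluster_size: int,
    steps: int = 2000,
    t0: float = 1.0,
    cooling: float = 0.995,
    seed: int = 0,
) -> Tuple[List[List[str]], float]:
    """Partition gates into fixed-size clusters (needs at least two clusters)."""
    rng = random.Random(seed)
    gates = sorted(waveform)
    clusters = [gates[i:i + cluster_size] for i in range(0, len(gates), cluster_size)]
    cost = cluster_cost(clusters, waveform)
    best, best_cost = [c[:] for c in clusters], cost
    temp = t0
    for _ in range(steps):
        a, b = rng.sample(range(len(clusters)), 2)
        i, j = rng.randrange(len(clusters[a])), rng.randrange(len(clusters[b]))
        clusters[a][i], clusters[b][j] = clusters[b][j], clusters[a][i]
        new_cost = cluster_cost(clusters, waveform)
        # Accept improvements always, and worse moves with Boltzmann probability.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = [c[:] for c in clusters], cost
        else:
            clusters[a][i], clusters[b][j] = clusters[b][j], clusters[a][i]  # undo
        temp *= cooling
    return best, best_cost


# Hypothetical waveforms: g1 and g2 peak together, as do g3 and g4.
waveform = {"g1": [3, 0], "g2": [3, 0], "g3": [0, 3], "g4": [0, 3]}
best_clusters, best_cost = anneal_clusters(waveform, cluster_size=2)
print(best_clusters, best_cost)
```

In this toy case the initial partition puts the two simultaneously switching gates together (cost 12), and any swap across clusters pairs gates that peak at different times (cost 6), which is the kind of grouping the objective rewards.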

Experiment Setup
- Gate level synthesis
  - Sizing
    - Estimate the maximum current for the clusters and the entire module
    - Apply the sizing algorithms
  - Verification
    - Simulate the circuit and obtain the current source with 10,000 random input vectors
    - Obtain the performance loss by solving the resistance network with circuit KCL and KVL equations
    - Find the maximum performance loss among the performance losses of all input vectors
- Custom layout
  - Implement a four-bit CLA using 0.35μm technology
  - Determine sizes by SPICE simulation
    - Cluster-based design: each cluster satisfies the performance loss constraint
    - DSTN: the entire module satisfies the performance loss constraint

Result of Gate Level Synthesis
- On average, DSTN reduces total W/L by 49.8% with smaller performance loss
[Figure: W/L of sleep transistors and maximum performance loss for cluster-based design and DSTN on benchmarks C432, C499, C880, C1355, C1908, C2670, C3540, C5315, C6288, C7552]

Custom Layout in 0.35μm
- Each cluster is accommodated by a sleep transistor
- Sleep transistors are connected by virtual ground wires
[Figure: layouts of the cluster-based design and DSTN, with sleep transistors and virtual ground wires marked]

Custom Layout Comparison
- DSTN reduces runtime leakage by 50x and 5x compared to no sleep transistor and cluster-based design, respectively
- DSTN reduces sleep transistor area by 6.83x with 6.6% smaller performance degradation compared to the cluster-based design
[Table: leakage current, delay, sleep transistor area, and total area for no sleep transistor, cluster-based design, and DSTN]

Conclusion and Future Work
- We have proposed DSTN and the sizing algorithm
  - DSTN has reduced area, less leakage current and supply voltage drop
- Future work
  - An ideal power/ground network is assumed in this paper
  - Investigate the co-design of DSTN and the power/ground network