Decomposition of Instruction Decoder for Low Power Design
TingTing Hwang
Department of Computer Science, Tsing Hua University


Power Dissipation
- Static dissipation due to leakage current
- Short-circuit dissipation
- Charge and discharge of output load capacitor

[Figure: circuit with input V_in, output V_out, supply V_DD, and GND]

Dynamic Power Dissipation Model
- P: power dissipation
- C: load capacitance
- E: average transition count of the gate per clock cycle
- V_dd: supply voltage
- T_cyc: clock period

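The symbols listed for the dynamic power model fit the standard switching-power model for a CMOS gate. Assuming that is the model intended here, it can be written as:

```latex
P = \frac{1}{2} \cdot C \cdot V_{dd}^{2} \cdot \frac{E}{T_{cyc}}
```

Under this model, power scales linearly with the transition count E, so logic that is held idle (E close to 0) contributes almost no dynamic power.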

Motivation
- Execution frequency of instructions is uneven
- Take the MOV class as an example: three instructions account for 22% of execution frequency
- Profiling from Powerstone

Coupling Sub-decoders
- Partition an instruction decoder into two coupling sub-decoders
- The smaller decoder decodes only a small number of instructions
- When the smaller decoder is active, the larger decoder is turned off
- The smaller decoder is active frequently

Architecture of Coupling Sub-decoders
Controls to turn on/off sub-decoders:
- Activate-Control
- Input AND-OR
- Output OR
[Figure: block diagram with flip-flops FF1 ... FFn holding the instruction, I-Decoder0 and I-Decoder1 with I-Activate Control producing I-Control0, I-Control1, ..., and S-Decoder0 and S-Decoder1 with S-Activate Control producing S-Control0, S-Control1, ...]
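A minimal behavioral sketch of the gating idea, in Python rather than hardware: an activate signal selects one sub-decoder, the inputs of the other are forced to zero with an AND mask so it does not switch, and the two outputs are combined with OR. The opcode tables, the 4-bit width, and all function names are hypothetical; the sketch also assumes a sub-decoder outputs all zeros for an all-zero input.

```python
# Hypothetical control-word tables; the real sub-decoders are synthesized logic.
SMALL_TABLE = {0b0001: 0b0011, 0b0010: 0b0101}  # few, frequently executed opcodes
LARGE_TABLE = {0b0100: 0b1000, 0b1000: 0b1100}  # all remaining opcodes
ALL_ONES = 0b1111                               # assumed 4-bit opcode width


def sub_decoder(table, gated_opcode):
    """Look up the control word; a gated (all-zero) input yields all zeros."""
    return table.get(gated_opcode, 0)


def decode(opcode):
    """Decode one opcode with only one sub-decoder seeing a live input."""
    # Activate control: true when the opcode belongs to the small sub-decoder.
    small_active = opcode in SMALL_TABLE
    mask_small = ALL_ONES if small_active else 0
    mask_large = 0 if small_active else ALL_ONES
    # Input AND gating: the inactive sub-decoder sees a constant zero input.
    small_out = sub_decoder(SMALL_TABLE, opcode & mask_small)
    large_out = sub_decoder(LARGE_TABLE, opcode & mask_large)
    # Output OR: the inactive sub-decoder contributes zeros only.
    return small_out | large_out


print(bin(decode(0b0001)), bin(decode(0b0100)))
```

Because the inactive sub-decoder's inputs stay constant, its gates see no transitions, which is where the power saving comes from in this sketch.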

Instruction Grouping Problem
How to decompose the decoder so that
- the smaller sub-decoder is small
- the smaller sub-decoder is executed frequently
- the activate logic is small

Weighted Graph Model of Execution Sequence
- Node: instruction type
- Edge (U,V): instruction U (V) executed after V (U)
- Weights on nodes and edges: execution frequency
[Figure: example graph over the instruction types mov, ldr, b, mul, and cmp]
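The graph model can be sketched as follows, assuming the execution sequence is available as a list of instruction-type names. The function name and the sample trace are hypothetical, and normalizing counts to frequencies is one possible choice of weight.

```python
from collections import Counter


def build_transition_graph(trace):
    """Return (node_weight, edge_weight) for an executed-instruction trace.

    node_weight[op]     : fraction of executed instructions of type op
    edge_weight[(u, v)] : fraction of consecutive pairs that are u then v
                          or v then u (undirected, key sorted by name)
    """
    if not trace[1:]:
        raise ValueError("need at least two instructions in the trace")
    total = len(trace)
    steps = total - 1
    node_weight = {op: n / total for op, n in Counter(trace).items()}
    pairs = Counter()
    for prev, cur in zip(trace, trace[1:]):
        pairs[tuple(sorted((prev, cur)))] += 1
    edge_weight = {pair: n / steps for pair, n in pairs.items()}
    return node_weight, edge_weight


# Hypothetical trace, for illustration only.
sample = ["mov", "ldr", "mov", "b", "mul", "mov", "mul", "cmp"]
nodes, edges = build_transition_graph(sample)
print(nodes["mov"], edges[("ldr", "mov")])
```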

Power Model
- SF_i: transition frequency from M_i to M_i
- CF_ij: transition frequency between M_i and M_j
- Power_i: power of M_i estimated by Synopsys
[Figure: instruction graph (mov, ldr, b, mul, cmp) partitioned into groups M_i and M_j]
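The two frequencies defined above can be measured from a trace once each instruction is assigned to a group. The sketch below covers only SF_i and CF_ij; how they combine with Power_i into a total is not shown here. The function name, the trace, and the grouping are hypothetical.

```python
from collections import Counter


def group_transition_frequencies(trace, group_of):
    """Return (SF, CF) as fractions of all consecutive instruction pairs.

    SF[i]      : pairs where both instructions belong to group i
    CF[(i, j)] : pairs that cross between groups i and j (i sorted before j)
    """
    if not trace[1:]:
        raise ValueError("need at least two instructions in the trace")
    steps = len(trace) - 1
    same, cross = Counter(), Counter()
    for prev, cur in zip(trace, trace[1:]):
        gi, gj = group_of[prev], group_of[cur]
        if gi == gj:
            same[gi] += 1
        else:
            cross[tuple(sorted((gi, gj)))] += 1
    sf = {g: n / steps for g, n in same.items()}
    cf = {pair: n / steps for pair, n in cross.items()}
    return sf, cf


# Hypothetical trace and two-way grouping, for illustration only.
sample = ["mov", "ldr", "mov", "b", "mul", "mov", "mul", "cmp"]
grouping = {"mov": 0, "ldr": 0, "b": 1, "mul": 1, "cmp": 1}
sf, cf = group_transition_frequencies(sample, grouping)
print(sf, cf)
```

A low CF means the design rarely switches between sub-decoders, so the activate logic toggles less often.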

Instruction Grouping Problem: Graph Partitioning
- Generation of transition graph
- Initial clustering by random walk
- Initial partition of clusters
- Iterative improvement by moving clusters among groups
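The last step, iterative improvement by moving clusters among groups, can be sketched as a greedy loop. The cost function below is a stand-in proxy, not the power model of these slides: it assumes a sub-decoder's power grows with the number of instructions it decodes, weights that by how often the group is active, and adds a penalty for transitions that cross groups. All names and numbers are hypothetical.

```python
from collections import Counter


def proxy_cost(group_of, freq, edges, switch_penalty=1.0):
    """Stand-in cost: sum of (group activity x group size) + crossing penalty."""
    size = Counter(group_of.values())
    active = Counter()
    for instr, g in group_of.items():
        active[g] += freq[instr]
    decode = sum(active[g] * size[g] for g in size)
    crossing = sum(w for (u, v), w in edges.items() if group_of[u] != group_of[v])
    return decode + switch_penalty * crossing


def improve(clusters, assignment, freq, edges, n_groups=2):
    """Greedily move one cluster at a time while the proxy cost decreases."""
    assignment = list(assignment)

    def expand(assign):
        return {i: assign[k] for k, c in enumerate(clusters) for i in c}

    best = proxy_cost(expand(assignment), freq, edges)
    improved = True
    while improved:
        improved = False
        for k in range(len(clusters)):
            for g in range(n_groups):
                if g == assignment[k]:
                    continue
                trial = assignment[:k] + [g] + assignment[k + 1:]
                cost = proxy_cost(expand(trial), freq, edges)
                if best - cost > 1e-12:  # accept strictly better moves only
                    assignment, best, improved = trial, cost, True
    return assignment, best


# Hypothetical clusters, execution frequencies, and edge weights.
clusters = [["mov", "ldr"], ["b"], ["mul", "cmp"]]
freq = {"mov": 0.4, "ldr": 0.2, "b": 0.1, "mul": 0.2, "cmp": 0.1}
edges = {("mov", "ldr"): 0.3, ("mov", "b"): 0.1, ("b", "mul"): 0.1,
         ("mul", "cmp"): 0.2, ("mov", "mul"): 0.1}
final_assignment, final_cost = improve(clusters, [0, 0, 0], freq, edges)
print(final_assignment, round(final_cost, 3))
```

Each accepted move strictly lowers the cost, so the loop terminates; like any greedy scheme it can stop at a local minimum, which is why the quality of the initial partition matters.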

Experimental Process
- ARM7TDMI
- Circuit described in Verilog
- Circuit synthesized by Synopsys Design Compiler
- Power estimated by PrimePower: switching activities are collected by simulating the Powerstone benchmark set

Results on Two-way Decomposition

Power Consumption Comparisons
- Lower power consumption

Power (W)            Orig.     Decomp.   Improve
Instruction Decoder  4.01E-4   2.81E     %
Control Unit         1.03E-3   8.35E     %
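The improvement column can be recomputed from the original and decomposed power values. The exponents of the decomposed values are truncated in the table; the sketch below assumes both are E-4, which is an assumption, chosen because it reproduces the 30% and 19% reductions quoted in the Conclusions.

```python
def percent_reduction(original, decomposed):
    """Relative power reduction in percent."""
    return 100.0 * (original - decomposed) / original


# ASSUMPTION: the truncated decomposed values are 2.81E-4 and 8.35E-4.
decoder = percent_reduction(4.01e-4, 2.81e-4)
control_unit = percent_reduction(1.03e-3, 8.35e-4)
print(f"Instruction decoder: {decoder:.1f}%")   # about 29.9%
print(f"Control unit:        {control_unit:.1f}%")  # about 18.9%
```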

Critical Path and Area Comparisons
- Shorter critical path timing
- Area overhead

                     Critical Path Timing (ns)     Area
                     Orig.   Decomp.   Improve     Orig.   Decomp.   Overhead
Instruction Decoder                    %                             %
Control Unit                           %                             %

Results on Multiple-way Decomposition

Power Consumption for Different Multi-way Grouping
- Two-way decomposition has the best power reduction
- More groups → more overhead
[Figure: bar chart of power (W), split into decoder and overhead, for the original, 2-way, 3-way, and 4-way designs; axis from 0 to 5.E-04]

Critical Path Timing for Different Multi-way Grouping
- Four-way decomposition has the best timing reduction
[Figure: bar chart of timing (ns), split into decoder and overhead, for the original, 2-way, 3-way, 4-way, and 5-way designs]

Area Comparisons
- Area for different multi-way grouping
[Figure: bar chart of area for the original, 2-way, 3-way, 4-way, and 5-way designs]

Conclusions
- Two-way partitioning has the best results for the 142-instruction set
- Compared to the un-decomposed decoder:
  - 30% reduction in power consumption
  - 13% improvement in critical path timing
- Compared to the un-decomposed control unit:
  - 19% reduction in power consumption
  - 12% improvement in critical path timing

Thank You