
Fast Synthesis of Clock Gating from Existing Logic
Aaron P. Hurst, Univ. of California, Berkeley
Portions in collaboration with Artur Quiring and Andreas Kuehlmann, Cadence Berkeley Labs
IWLS 2007

Motivation
- Dynamic power of the clock network accounts for 30-40% of total power in current designs
  - Every register clock input is switched every cycle
- A large fraction of these transitions can be avoided
- Clock gating inserts combinational logic on the clock path to conditionally block switching
  - Capacitive load is "hidden" behind the gates

Implementation
- G is the gating condition for the registers R1-R3
  - The clock is not propagated when G is active
- Glitches in G may propagate without a latch
[Figure: Clock Gate and Glitch-Safe Clock Gate, each combining clk and G to drive registers R1-R3]

Clock Gating Problem
Problem: how to produce gating conditions that...
1. Are functionally correct
2. Meet timing and physical constraints
3. Result in maximal dynamic power savings
4. Require minimal additional area and power to generate
- Combinational versus sequential
  - Combinational gating conditions are functions of signals available within the same cycle
- This work addresses the combinational problem

Previous Approaches
- Human effort
  - Worthwhile investment at the architectural level
  - Automatic approaches are needed at the netlist level
- Symbolic computation of gating functions
  - Problem 1: Symbolic functional manipulation is not scalable
  - Problem 2: The quality of the logic that implements the computed function is unpredictable

New Algorithm
- Computation is scalable and bounded
  - Combines simulation and SAT solving
- Existing logic is heavily reused
  - Limits the difficulty of the synthesis problem
  - Minimizes design perturbation

Terminology
- Let x be the set of combinational inputs (PIs and register states)
- Given a register R
  - Let x_R be its current state
  - Let F_R(x) be its next state
- The register doesn't switch when F_R(x) = x_R
- A function G is a valid combinational clock gating function for R if the validity condition is met: whenever G(x) is active, F_R(x) = x_R
[Figure: gating functions G1(x), G2(x), G3(x), G4(x)]

Nodes
- Assume the existing logic network is some sort of DAG
- Each node in the network implements some function f
  - Can be used "for free": less additional load and wiring
- Nodes of interest are collected for each register
- Not all node / register pairs need to be enumerated. Candidates are constrained by...
1. Physical location
2. Timing information
3. Functionality
4. Potential power savings

Nodes: Constraints
- Timing constraints
  - Reason: the gating condition must be available before the clock
  - Late-arriving signals are discarded
- Physical constraints
  - Reason: length of the clock gating nets
  - Limit nodes to a local region around register R
[Figure: candidate nodes labeled "discarded" or "kept"]
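The two filters above can be sketched as a single pass over the candidate nodes. This is only an illustration: the `Node` fields, the Manhattan-distance measure, and the threshold parameters are assumptions, not details given in the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str
    arrival: float  # signal arrival time (arbitrary units, hypothetical)
    x: float        # placement coordinates (hypothetical)
    y: float

def candidate_nodes(nodes, reg_xy, latest_arrival, max_distance):
    """Keep nodes that arrive early enough and sit near the register."""
    rx, ry = reg_xy
    kept = []
    for n in nodes:
        on_time = n.arrival <= latest_arrival
        # Manhattan distance is an assumed proxy for gating-net length.
        nearby = abs(n.x - rx) + abs(n.y - ry) <= max_distance
        if on_time and nearby:
            kept.append(n)
    return kept

nodes = [Node("f1", 0.2, 1, 1), Node("f2", 0.9, 1, 2), Node("f3", 0.3, 40, 40)]
kept = [n.name for n in candidate_nodes(nodes, (0, 0), 0.5, 10)]
```

Here `f2` is dropped for arriving late and `f3` for being too far from the register, leaving only `f1`.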

Nodes: Pruning
- Multiple simulation vectors are pushed through the circuit and node / register pairs are examined
1. Search for counterexamples to functional validity
  - If one is found, the node is marked as proven invalid
2. Accumulate probabilistic information about the actual coverage of a gating function
  - The size of the Boolean ON-set is not an effective estimate
  - If available, use simulation traces reflective of actual operation
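A minimal sketch of this pruning step for one register, assuming each candidate node and the next-state function are available as Python callables over an input tuple x, with x[0] standing for the register's current state x_R. The example register (next state is d when en is set, else x_R) and the candidate names are hypothetical.

```python
def prune_by_simulation(candidates, next_state, vectors):
    """Drop candidates refuted by a vector; estimate coverage of the rest.

    candidates: name -> function of x returning 0/1
    next_state: function of x returning the register's next value
    vectors:    iterable of input tuples, with x[0] == x_R (assumed layout)
    """
    alive = dict(candidates)
    active = {name: 0 for name in candidates}
    total = 0
    for x in vectors:
        total += 1
        holds = next_state(x) == x[0]  # register would keep its value
        for name in list(alive):
            if alive[name](x):
                if holds:
                    active[name] += 1  # gate active and safe on this vector
                else:
                    del alive[name]    # counterexample: proven invalid
    if total == 0:
        return {name: 0.0 for name in alive}
    return {name: active[name] / total for name in alive}

# Hypothetical register with enable: x = (x_R, en, d)
def next_state(x):
    x_r, en, d = x
    return d if en else x_r

candidates = {"not_en": lambda x: 1 - x[1], "d_high": lambda x: x[2]}
vectors = [(0, 0, 0), (1, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 1)]
coverage = prune_by_simulation(candidates, next_state, vectors)
```

The vector (0, 1, 1) refutes `d_high`, while `not_en` survives with an estimated coverage of 2 out of 5 vectors. Survivors are only unrefuted, not proven.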

Nodes: Proving
- Functional validity of nodes must be established conclusively
- The problem is formulated as Boolean satisfiability using a simple test structure
- Two powerful speed-ups
1. The solver can be run in incremental mode
2. Counterexamples can be used for further simulation
[Figure: test structure over g(x), F_R(x), and x_R for register R, checked for satisfiability]
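The proof loop with counterexample reuse might look like the sketch below. To stay self-contained it replaces the SAT solver with exhaustive enumeration, which is only a stand-in and does not scale. The query asked is whether some x makes g(x) active while F_R(x) differs from x_R. Function names and the example register are assumptions.

```python
from itertools import product

def solve_miter(gate, next_state, num_inputs):
    """Stand-in for a SAT call on: gate(x) AND (next_state(x) != x_R).

    Returns a satisfying assignment (a counterexample) or None if none exists.
    A real flow would call a SAT solver here instead of enumerating.
    """
    for x in product((0, 1), repeat=num_inputs):
        if gate(x) and next_state(x) != x[0]:  # x[0] is x_R (assumed layout)
            return x
    return None

def prove_candidates(candidates, next_state, num_inputs):
    pending = dict(candidates)
    proven, solver_calls = [], 0
    while pending:
        name, gate = next(iter(pending.items()))
        solver_calls += 1
        cex = solve_miter(gate, next_state, num_inputs)
        if cex is None:
            proven.append(name)  # unsatisfiable: candidate is valid
            del pending[name]
            continue
        # Re-simulate the counterexample: every pending candidate that is
        # active on it is refuted without another solver call.
        for other in [n for n, g in pending.items() if g(cex)]:
            del pending[other]
    return proven, solver_calls

# Hypothetical register with enable: x = (x_R, en, d)
def next_state(x):
    x_r, en, d = x
    return d if en else x_r

candidates = {
    "d_high": lambda x: x[2],
    "en_and_d": lambda x: x[1] & x[2],
    "not_en": lambda x: 1 - x[1],
}
proven, solver_calls = prove_candidates(candidates, next_state, 3)
```

In this example the counterexample found for `d_high` also refutes `en_and_d`, so three candidates are resolved with two solver calls.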

Covers
- Nodes are merged together into disjunctive covers
  - A cover G is a set of nodes { f_i }
  - G is a valid gating function for register R iff all f_i are valid for R
- Functional coverage of the clock gate is increased with little additional hardware
  - Only an additional input is required on the AND-gate
[Figure: nodes f1(x), f2(x), f3(x) combined into G(x)]
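A small sketch of the two properties of a cover stated above, assuming per-node simulation traces are stored as lists of 0/1 values (one entry per simulated cycle). The trace data and function names are hypothetical.

```python
def cover_is_valid(cover, proven_valid):
    """A disjunctive cover is valid for R iff every member node is valid for R."""
    return all(f in proven_valid for f in cover)

def cover_coverage(cover, traces):
    """Fraction of simulated cycles in which the OR of the cover's nodes is 1."""
    columns = zip(*(traces[f] for f in cover))
    values = [int(any(col)) for col in columns]
    if not values:
        return 0.0
    return sum(values) / len(values)

# Hypothetical traces over four simulated cycles
traces = {"f1": [1, 0, 0, 0], "f2": [0, 1, 0, 0], "f3": [1, 1, 0, 0]}
proven_valid = {"f1", "f2"}

single = cover_coverage(["f1"], traces)
pair = cover_coverage(["f1", "f2"], traces)
triple = cover_coverage(["f1", "f2", "f3"], traces)
```

Adding `f2` to `f1` doubles the coverage, while adding `f3` gains nothing because it is active only where `f1` or `f2` already is. This kind of overlap is visible only when the nodes are simulated jointly.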

Covers: Problem
- Information collected thus far
  - Possible nodes for each register
  - Validity and probability
- Selecting gates is an instance of rectangle covering
- But the correlation between nodes is not known
- Split rectangle covering into two phases
  - Cover generation and selection
  - Redo simulation between the two phases to capture correlation
[Figure: matrix of nodes (f1 ... f7, f104, f233) against registers (R4, R5, R66 ... R87), with each entry marked by functional validity: proven, unknown, or disproven]

Covers: Generation & Selection
- A heuristic is used to generate interesting covers
- The problem is maximum set covering
  - NP-complete
  - Efficient heuristics exist
  - Problem instance is sparse
- Covers are greedily selected in order of power savings
  - Power cost of the additional logic is included
- Negative power savings are ignored
[Figure: matrix of covers (G1 ... G7, G104, G233) against registers (R4, R5, ...), with an example entry of 0.25]
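The greedy selection could be sketched as follows. Two assumptions go beyond the slide: each register is gated by at most one selected cover, and per-register savings add up linearly. All numbers and names are illustrative.

```python
def select_covers(savings, cost):
    """Greedily pick covers by net power saving.

    savings: cover name -> {register: estimated saving if gated by this cover}
    cost:    cover name -> power cost of the additional gating logic
    """
    chosen = []
    gated = set()  # registers already assigned to a chosen cover (assumption)
    remaining = set(savings)

    def net_saving(name):
        gain = sum(s for reg, s in savings[name].items() if reg not in gated)
        return gain - cost[name]

    while remaining:
        # sorted() only makes tie-breaking deterministic
        best = max(sorted(remaining), key=net_saving)
        if net_saving(best) <= 0:
            break  # negative or zero net savings are ignored
        chosen.append(best)
        gated.update(savings[best])
        remaining.remove(best)
    return chosen

savings = {
    "G1": {"R4": 0.25, "R5": 0.25},
    "G7": {"R5": 0.5, "R66": 0.25},
    "G104": {"R4": 0.125},
}
cost = {"G1": 0.125, "G7": 0.125, "G104": 0.25}
chosen = select_covers(savings, cost)
```

`G7` is taken first with a net saving of 0.625. `G1` follows with the remaining 0.125 from R4, and `G104` is skipped because its logic costs more than it saves.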

There's More…
- When the clock is gated, the register keeps its value, so its input is irrelevant
  - An observability don't care (ODC) is produced
- In general, these are difficult to utilize in logic minimization
- A purely structural simplification can be applied
- Rule: any immediate fanout of f can be rewired to a constant if its transitive fanout contains only registers gated by a cover containing f
[Figure: a fanout of f feeding a gated register is rewired from f to the constant 0]
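One way to read the structural rule is as a reachability check over the netlist graph. The sketch below assumes the netlist is a dict from each node to its fanout nodes, that traversal stops at registers, and that the fanout of f into the gating logic itself is left out of the graph. Any node without fanouts that is not a register is treated as a primary output.

```python
def endpoints(node, fanouts, registers):
    """Registers and primary outputs reachable from node through logic."""
    seen, stack, ends = set(), [node], set()
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        if n in registers or not fanouts.get(n):
            ends.add(n)  # stop at registers and primary outputs
        else:
            stack.extend(fanouts[n])
    return ends

def rewirable_fanouts(f, fanouts, registers, gated_by_f):
    """Immediate fanouts of f whose f-input may be tied to a constant.

    gated_by_f: registers gated by a cover that contains f.
    """
    return [n for n in fanouts.get(f, [])
            if endpoints(n, fanouts, registers) <= gated_by_f]

# Hypothetical netlist: f drives a and b; a feeds R1; b feeds R1 and R2
fanouts = {"f": ["a", "b"], "a": ["R1"], "b": ["R1", "R2"]}
registers = {"R1", "R2"}
rewirable = rewirable_fanouts("f", fanouts, registers, gated_by_f={"R1"})
```

Only `a` qualifies, because `b` also reaches R2, which is not gated by a cover containing f. The constant is 0: whenever f is 1 the reached registers are gated and ignore their inputs, and whenever f is 0 the constant equals f.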

Results
- The procedure can be applied both pre- and post-mapping
  - Post-mapping includes physical and timing information
  - Pre-mapping exposes non-physical signals; if useful for clock gating, they are marked to be mapped
- Preliminary experimental results: technology-independent netlist without pair-wise compounds
  - Applied to OpenCores benchmarks
  - Pre-synthesized using the ABC logic synthesis package
  - Implemented in OpenAccess / OpenAccess Gear

Results
- Average reduction in register switching: 14.4%
- Average reduction in size of circuit: 7.7%

Completed Next Steps
- Technology-dependent version
- New scheme for delaying SAT-based proof
- Creating new simple functions of existing nodes
  - Additional candidates for improving coverage
[Figure: gating functions G1(x), G2(x), G3(x)]

Future Next Steps
- Hierarchical clock gating
- Bounded sequential gating selection
  - Reaching forward, to identify more unobservable transitions
  - Reaching backward, to generate more signals
[Figure: clock gates G1, G2, G3 on clk arranged hierarchically; sequential gating with g(x_{i-2}) computed from inputs x_i, x_{i-1}, x_{i-2}]