A. Stammermann, D. Helms, M. Schulte OFFIS Research Institute

Presentation transcript:

Binding, Allocation and Floorplanning in Low Power High-Level Synthesis. A. Stammermann, D. Helms, M. Schulte (OFFIS Research Institute); A. Schulz, W. Nebel (Univ. of Oldenburg)

Outline: Motivation, Background, Optimization Algorithm, Evaluation, Results, Conclusion. 11.11.2003 ICCAD 2003

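Motivation: In current CMOS technologies (0.1 µm), 80-95% of the power consumption arises from charging and discharging capacitances. The dynamic power is $P_{dyn} = \alpha \, C \, V_{DD}^2 \, f$, where $\alpha$ is the switching activity, $C$ is the capacitance of the gates at the output and of the connecting wires, and $f$ is the clock frequency. [Figure: CMOS stage between VDD and GND with input, output and load capacitance C; the formula is annotated "Optimization target".]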
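As a worked illustration of the formula above, the following minimal sketch evaluates the dynamic power for two load capacitances. All numeric values and the function name `dynamic_power` are assumptions chosen for illustration; none of them come from the slides.

```python
def dynamic_power(alpha: float, capacitance_f: float, vdd_v: float, freq_hz: float) -> float:
    """Return P_dyn = alpha * C * VDD^2 * f in watts."""
    return alpha * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical operating point (not taken from the slides).
base = dynamic_power(alpha=0.2, capacitance_f=50e-15, vdd_v=1.2, freq_hz=500e6)

# Same node with half the capacitance, e.g. because the connecting wire is shorter.
halved_c = dynamic_power(alpha=0.2, capacitance_f=25e-15, vdd_v=1.2, freq_hz=500e6)

print(f"base: {base:.3e} W, halved C: {halved_c:.3e} W")
```

Because the capacitance enters linearly, halving it halves the dynamic power at an otherwise unchanged operating point.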

Capacitance Distribution (ITRS Roadmap 2001): The share of interconnect capacitance is increasing, so interconnect has to be considered during high-level synthesis.

Summary: Capacitance contributes linearly to power dissipation. Wire capacitance is increasing. Wire capacitance is dominated by wire length. It is therefore important that accurate physical information is used during high-level synthesis.

Chip Design: [Figure: design flow with increasing level of detail. RT-netlist generation (adder, multiplier) is paired with rough floorplanning; gate-netlist generation (ASIC cells) is paired with detailed floorplanning; interconnect.]

RT-Netlist Generation: Scheduling: when are operations executed? Allocation: how many resources are used? Binding: which operation is executed on which resource? [Figure: additions +1 to +5 placed in control steps cstep 1 to cstep 4.]

RT-Netlist Generation: [Figure: two alternative bindings of the additions +1 to +5 with registers Reg1 to Reg3. The first uses the adders ADD1,2, ADD3 and ADD4,5; the second uses ADD1, ADD2,3 and ADD4,5.]

Impact on Power Dissipation: Binding/allocation has two effects. First, interleaving changes the characteristics of the input streams: the switching activity $\alpha$ at the inputs of resources (and on the corresponding wires) can increase or decrease. Second, it influences the netlist topology: the wire length, and hence the capacitance, can increase or decrease. [Figure: operations +1 and +2 bound to a single adder ADD.]
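The effect of interleaving on switching activity can be made concrete with a small sketch. It assumes a simple activity measure (average fraction of bits that toggle between consecutive input words) and two made-up 4-bit input streams; the measure, the streams and the function names are illustrative assumptions, not the estimation method used in the talk.

```python
from typing import List, Sequence


def toggles(a: int, b: int) -> int:
    """Number of bit positions that differ between two words (Hamming distance)."""
    return bin(a ^ b).count("1")


def switching_activity(stream: Sequence[int], width: int) -> float:
    """Average fraction of bits that toggle between consecutive words."""
    if len(stream) < 2:
        return 0.0
    total = sum(toggles(x, y) for x, y in zip(stream, stream[1:]))
    return total / (width * (len(stream) - 1))


def interleave(s1: Sequence[int], s2: Sequence[int]) -> List[int]:
    """Alternate the words of two streams (truncates to the shorter stream)."""
    out: List[int] = []
    for x, y in zip(s1, s2):
        out.extend((x, y))
    return out


# Hypothetical input streams of two additions, each seen on its own adder.
op1 = [0b0000, 0b0001, 0b0000, 0b0001]
op2 = [0b1111, 0b1110, 0b1111, 0b1110]

separate_1 = switching_activity(op1, width=4)
separate_2 = switching_activity(op2, width=4)

# Sharing one adder makes its input see both streams in alternation.
shared = switching_activity(interleave(op1, op2), width=4)

print(separate_1, separate_2, shared)
```

With these particular streams the shared adder sees far more toggling than either dedicated adder. Streams that are similar to each other would show the opposite trend, which is why the effect of a binding decision can go either way.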

Power Estimation: The switching activity is obtained by simulating the behavioural description. The power consumption of resources (e.g. an adder) comes from power models. The wire capacitance comes from a capacitance model and the floorplan.

Floorplanning: Slicing floorplans are standard, support softmacros (leaves are flexible in their aspect ratio) and can be represented efficiently as a binary tree. [Figure: slicing floorplan of modules 1 to 5 in the x-y plane and the corresponding slicing tree with vertical and horizontal cut nodes (+, *).]
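The binary-tree representation can be sketched as follows. This is a minimal illustration with fixed-size leaves; the class names, the `"V"`/`"H"` labels and the module sizes are assumptions, and the flexible aspect ratio of softmacros is not modelled.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class Leaf:
    name: str
    width: float
    height: float


@dataclass
class Cut:
    direction: str  # "V": children side by side, "H": children stacked
    left: "Node"
    right: "Node"


Node = Union[Leaf, Cut]


def bounding_box(node: Node) -> Tuple[float, float]:
    """Return (width, height) of the floorplan described by a slicing tree."""
    if isinstance(node, Leaf):
        return node.width, node.height
    lw, lh = bounding_box(node.left)
    rw, rh = bounding_box(node.right)
    if node.direction == "V":
        return lw + rw, max(lh, rh)
    if node.direction == "H":
        return max(lw, rw), lh + rh
    raise ValueError(f"unknown cut direction: {node.direction!r}")


# Hypothetical three-module floorplan: modules 1 and 2 stacked, module 3 beside them.
tree = Cut("V", Cut("H", Leaf("1", 2, 1), Leaf("2", 2, 3)), Leaf("3", 1, 4))
print(bounding_box(tree))
```

A single bottom-up traversal yields the chip dimensions, which is one reason the tree form is convenient inside an iterative optimizer.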

Optimization Algorithm: A power optimization algorithm for RT-level netlists that takes interconnect power into account. It simultaneously performs slicing-tree-based floorplanning and functional unit binding and allocation. Changes in the netlist topology are repaired immediately in the current floorplan; in contrast to previous approaches, regenerating the floorplan from scratch is not necessary. The non-deterministic, iterative optimization technique is simulated annealing. Cost function: $P_{FE} + P_{Inter} + \mathrm{Area}$.

Floorplan-Annealing: A simulated-annealing-based floorplanner with five floorplan moves. F1: exchange two leaves. F2: exchange a leaf and a node. F3: change direction. F4: exchange two nodes. F5: displace a leaf or node. [Figure: example of move F2 on a slicing tree with leaves 1 to 4.]

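Floorplan-Annealing (flowchart): Start from an initial floorplan and apply a floorplan move Fi. If the new cost is lower than the old cost, keep the move. Otherwise keep it only if a random number is smaller than $e^{-\Delta Cost/T}$; if not, undo move Fi. Repeat until the stopping criteria are met, which yields the optimized floorplan.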
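The accept/undo loop of the flowchart can be sketched on a deliberately simplified problem. The sketch below assumes a one-dimensional row of modules instead of a slicing tree, a wirelength-only cost, a geometric cooling schedule and a single move type (exchanging two modules, loosely analogous to F1). All of these, including the function names and parameter values, are assumptions for illustration only.

```python
import math
import random
from typing import List, Sequence, Tuple


def accept(delta_cost: float, temperature: float, rng: random.Random) -> bool:
    """Keep improvements; keep other moves with probability exp(-delta/T)."""
    if delta_cost < 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)


def wirelength(order: Sequence[str], nets: Sequence[Tuple[str, str]]) -> int:
    """Sum of slot distances between connected modules in a single row."""
    pos = {module: i for i, module in enumerate(order)}
    return sum(abs(pos[a] - pos[b]) for a, b in nets)


def anneal(start: Sequence[str], nets: Sequence[Tuple[str, str]],
           t_start: float = 5.0, t_end: float = 0.01, cooling: float = 0.95,
           moves_per_t: int = 50, seed: int = 0) -> Tuple[List[str], int]:
    rng = random.Random(seed)
    order = list(start)
    cost = wirelength(order, nets)
    best_order, best_cost = list(order), cost
    t = t_start
    while t > t_end:  # stopping criterion: temperature floor
        for _ in range(moves_per_t):
            i, j = rng.sample(range(len(order)), 2)
            order[i], order[j] = order[j], order[i]      # apply move
            new_cost = wirelength(order, nets)
            if accept(new_cost - cost, t, rng):
                cost = new_cost
                if cost < best_cost:
                    best_order, best_cost = list(order), cost
            else:
                order[i], order[j] = order[j], order[i]  # undo move
        t *= cooling
    return best_order, best_cost


# Hypothetical netlist: a chain a-b-c-d-e, started from a scrambled row.
nets = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
start = ["a", "c", "e", "b", "d"]
best_order, best_cost = anneal(start, nets)
print(best_order, best_cost)
```

Accepting some cost-increasing moves at high temperature is what lets the search leave local minima; as the temperature falls the loop behaves more and more like greedy descent.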

Architecture-Annealing: Architecture moves mutate the binding/allocation. A1 (Share): merge two resources res1 and res2 into one single resource; res1 and res2 must be instances of the same type (e.g. ripple adder). A2 (Split): the inverse of Share; split a single resource into two resources. A3 (Swap): swap the inputs of commutative operations. In combination, A1-A3 can create every possible binding solution.
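The three moves can be sketched on a small binding data structure. The representation (a dictionary from resource name to a `Resource` record), the function signatures and the example schedule are assumptions for illustration. The check that shared operations occupy different control steps is also an assumption added here; the slide itself only states the same-type requirement.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Resource:
    kind: str        # e.g. "ripple_adder"
    ops: List[str]   # operations bound to this resource


Binding = Dict[str, Resource]


def _copy(binding: Binding) -> Binding:
    return {name: Resource(r.kind, list(r.ops)) for name, r in binding.items()}


def share(binding: Binding, schedule: Dict[str, int], res1: str, res2: str) -> Binding:
    """A1: merge res2 into res1 and return the new binding."""
    a, b = binding[res1], binding[res2]
    if a.kind != b.kind:
        raise ValueError("resources must be instances of the same type")
    # Assumption: operations in the same control step cannot use one resource.
    if {schedule[o] for o in a.ops} & {schedule[o] for o in b.ops}:
        raise ValueError("operations overlap in a control step")
    result = _copy(binding)
    result[res1].ops.extend(result.pop(res2).ops)
    return result


def split(binding: Binding, res: str, moved_ops: Set[str], new_name: str) -> Binding:
    """A2: move some operations of res to a new resource of the same type."""
    if new_name in binding:
        raise ValueError("resource name already in use")
    src = binding[res]
    moved = [o for o in src.ops if o in moved_ops]
    kept = [o for o in src.ops if o not in moved_ops]
    if not moved or not kept:
        raise ValueError("split must leave operations on both resources")
    result = _copy(binding)
    result[res].ops = kept
    result[new_name] = Resource(src.kind, moved)
    return result


def swap(operands: Dict[str, Tuple[str, str]], op: str) -> Dict[str, Tuple[str, str]]:
    """A3: swap the two inputs of a commutative operation."""
    result = dict(operands)
    left, right = result[op]
    result[op] = (right, left)
    return result


# Hypothetical schedule and fully parallel start binding for five additions.
schedule = {"+1": 1, "+2": 2, "+3": 3, "+4": 3, "+5": 4}
parallel = {f"ADD{i}": Resource("ripple_adder", [f"+{i}"]) for i in range(1, 6)}

shared = share(parallel, schedule, "ADD1", "ADD2")
restored = split(shared, "ADD1", {"+2"}, "ADD2")
print(shared["ADD1"].ops, restored == parallel)
```

Each function returns a new binding rather than mutating its argument, so an annealing loop can discard a rejected move simply by keeping the previous binding.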

Architecture-Annealing: Changes in the architecture are repaired immediately in the current floorplan. Moves A1-A3 may cause resources to vanish or appear, and they change the netlist topology, so the floorplan is no longer optimal after a move A1-A3. Solution: each move A1-A3 is followed by a floorplan update (annealing), which supports softmacros and inserts new resources into the floorplan at the optimal position.

Algorithm (flowchart): Start from an initial architecture/floorplan solution. Each architecture move Aj is followed by a floorplan annealing. If the new cost is lower than the old cost, or a random number is smaller than $e^{-\Delta Cost/T}$, the move is kept; otherwise both move Aj and the floorplan update are undone. Repeat until the stopping criteria are met, which yields the optimized architecture/floorplan solution.
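The control flow of this flowchart, in which a rejected architecture move also discards the floorplan update that followed it, can be sketched with callbacks. The function `combined_annealing`, its parameters, the cooling schedule and the toy stand-ins below are all hypothetical; in particular the toy cost terms are arbitrary numbers that only mimic the shape of the three-term cost function.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

A = TypeVar("A")  # architecture (binding/allocation) state
F = TypeVar("F")  # floorplan state


def combined_annealing(arch: A, floorplan: F,
                       arch_move: Callable[[A, random.Random], A],
                       floorplan_update: Callable[[A, F, random.Random], F],
                       cost: Callable[[A, F], float],
                       t_start: float = 1.0, t_end: float = 0.01,
                       cooling: float = 0.9, seed: int = 0) -> Tuple[A, F, float]:
    rng = random.Random(seed)
    current = cost(arch, floorplan)
    t = t_start
    while t > t_end:  # stopping criterion: temperature floor
        new_arch = arch_move(arch, rng)                      # architecture move Aj
        new_fp = floorplan_update(new_arch, floorplan, rng)  # floorplan annealing
        new = cost(new_arch, new_fp)
        delta = new - current
        if delta < 0 or rng.random() < math.exp(-delta / t):
            arch, floorplan, current = new_arch, new_fp, new
        # On rejection the old (arch, floorplan) pair is kept, which undoes
        # both the architecture move and the floorplan update.
        t *= cooling
    return arch, floorplan, current


# Toy stand-ins: the "architecture" is a resource count, the "floorplan" a row of slots.
def toy_arch_move(n: int, rng: random.Random) -> int:
    return max(1, min(5, n + rng.choice((-1, 1))))


def toy_floorplan_update(n: int, old_fp: Tuple[int, ...], rng: random.Random) -> Tuple[int, ...]:
    return tuple(range(n))


def toy_cost(n: int, fp: Tuple[int, ...]) -> float:
    p_fe = 4.0 / n                  # arbitrary placeholder term
    p_inter = 0.5 * (len(fp) - 1)   # arbitrary placeholder term
    area = 1.0 * n                  # arbitrary placeholder term
    return p_fe + p_inter + area


n, fp, total = combined_annealing(5, tuple(range(5)), toy_arch_move,
                                  toy_floorplan_update, toy_cost)
print(n, fp, total)
```

Working on fresh copies instead of mutating state in place keeps the undo step trivial; an implementation that updates a large floorplan incrementally would need explicit undo operations instead.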

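Inserting new resources: [Figure: a new resource is tried at candidate positions in a floorplan of modules 1 to 4 that contains softmacros and hardmacros; the candidates are annotated with 8.2 LE and 4.6 LU, and the best position is selected.]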

Evaluation: Three experiments are performed. Full parallel: each operation is mapped onto its own resource (no resource sharing). Consecutive: binding optimization and floorplan optimization are executed one after the other; this interconnect-unaware optimization is similar to the traditional procedure in high-level synthesis. Simultaneous: binding and floorplanning are optimized simultaneously.

Results

Conclusion: Compared to the consecutive (traditional) procedure, the proposed technique reduces interconnect power by almost half and increases the functional unit power only insignificantly. The CPU times vary from 6 seconds (diffeq) to 138 seconds (turbo_decoder) on a 1.0 GHz Athlon™-based PC with 256 MB memory.

Thank You!

Capacitance model: Capacitance extracted from layout vs. capacitance from our model (0.35 µm).

ITRS Roadmap 2001: International Technology Roadmap for Semiconductors

Cross Section on Die