
ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Overview Logic synthesis; LUT clustering; LUT capacity; Chortle – example technology mapper; Architecture-specific optimization

Boolean network A Boolean network is the main representation of the logic functions for technology-independent optimizations. Each node can be represented as sum-of-products (or product-of-sums). Provides multi-level structure, but functions in the network need not correspond to logic gates.

Boolean network example
out1 = k2 + x2’
out2 = k3 + x1
k2 = x1’ x2 x4 + k1
k3 = k1 x4’
k1 = x2 + x3
[Figure: network drawn from the primary inputs x1, x2, x3, x4 to the primary outputs]

Terms Support: set of variables used by a function. Transitive fanout: all the primary outputs and intermediate variables that depend on a function. Transitive fanin: all the primary inputs and intermediate variables used by a function. Transitive fanin determines a cone of logic. [Figure: cone of logic from primary inputs to an output]
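The terms above can be illustrated on the Boolean network example from the earlier slide. This is a minimal sketch: the `NETWORK` dictionary and both function names are illustrative choices, not part of the lecture.

```python
# Each node maps to its support: the signals its function reads directly.
# The entries follow the example equations (out1 = k2 + x2', and so on).
NETWORK = {
    "k1": {"x2", "x3"},
    "k2": {"x1", "x2", "x4", "k1"},
    "k3": {"k1", "x4"},
    "out1": {"k2", "x2"},
    "out2": {"k3", "x1"},
}


def transitive_fanin(node):
    """Collect every primary input and intermediate variable feeding `node`."""
    seen = set()
    stack = [node]
    while stack:
        # Primary inputs have no entry in NETWORK, so they contribute nothing further.
        for signal in NETWORK.get(stack.pop(), ()):
            if signal not in seen:
                seen.add(signal)
                stack.append(signal)
    return seen


def transitive_fanout(node):
    """Collect every intermediate variable and output that depends on `node`."""
    return {other for other in NETWORK if node in transitive_fanin(other)}


cone_of_k3 = transitive_fanin("k3")
users_of_k1 = transitive_fanout("k1")
```

The transitive fanin of `k3` is the cone of logic rooted at `k3`; its primary inputs are the members that have no entry in `NETWORK`.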

Partially-specified function [Figure: function of x1, x2, x3 with entries marked 1 and don’t care]

Optimizations Simplification: changing the way a function is represented. Network restructuring: adding and removing nodes. Delay restructuring: optimizations that reduce the height of critical paths.

Partial collapsing [Figure: network before and after partial collapsing, with nodes labeled f1, f2, f3, f4 and F]

Technology mapping Cover the function:

FPGA tech mapping Cost (number of inputs) doesn’t always increase with added functions:

FPGAs vs. custom logic Cost metric for static gates is literal: ax + bx’ has four literals, requires 8 transistors. Cost metric for FPGAs is logic element: all functions that fit in an LE have the same cost.

LUT-based logic synthesis Find the largest logic cone that will fit into the LUT:
r = q + s’
s = d’
q = g’ + h
d = a + b
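As a rough illustration, the whole cone rooted at r depends only on a, b, g and h, so it fits in one four-input LUT. The sketch below computes the 16-bit LUT contents; the bit ordering (a is bit 0 of the index) and the function names are assumptions made for this example.

```python
def cone_r(a, b, g, h):
    """Evaluate r through the intermediate nodes d, s and q."""
    d = a or b
    s = not d
    q = (not g) or h
    return q or (not s)


def lut_mask(func, n_inputs):
    """Build the LUT contents: bit `index` holds the output for that input pattern."""
    mask = 0
    for index in range(2 ** n_inputs):
        # Input i takes the value of bit i of the index.
        bits = [bool((index >> i) & 1) for i in range(n_inputs)]
        if func(*bits):
            mask |= 1 << index
    return mask


mask = lut_mask(cone_r, 4)
```

Only one entry is 0 (a = 0, b = 0, g = 1, h = 0), since r collapses to g’ + h + a + b.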

How much fits in a LUT? One 2-input NAND gate frequently used for comparison. Approximately 12 ~ 15 gates per four-input LUT. 2^16 functions -> 80 after IO swapping -> 14 after IO inversion. 4-input determined to be optimal [Rose 1990]. [Figure: inputs A, B, C, D]

Technology-Independent Logic Optimization Improve circuit based on cost. Keep same functionality. Boolean evaluation/decomposition. Simple factoring -> minimizing literals.
Before: f = ac + ad + bc + bd; g = a + b + c
After: e = a + b; g = e + c; f = e(c + d)
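The factoring example can be checked exhaustively. A small sketch, with function names chosen for illustration, compares both forms over all 16 input patterns and counts literals by counting letters in each expression.

```python
from itertools import product


def original(a, b, c, d):
    f = (a and c) or (a and d) or (b and c) or (b and d)
    g = a or b or c
    return f, g


def factored(a, b, c, d):
    e = a or b                      # shared intermediate node
    return (e and (c or d)), (e or c)


def literal_count(expressions):
    """Count literals as the number of variable letters in the expressions."""
    return sum(ch.isalpha() for expr in expressions for ch in expr)


equivalent = all(
    original(*values) == factored(*values)
    for values in product([False, True], repeat=4)
)
before = literal_count(["ac + ad + bc + bd", "a + b + c"])
after = literal_count(["a + b", "e + c", "e(c + d)"])
```

Under this count the factored network uses 7 literals against 11 for the original, with identical outputs.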

Factorization Based on division: formulate candidate divisor; test how it divides into the function; if g = f/c, we can use c as an intermediate function for f. Algebraic division: doesn’t take into account Boolean simplification. Less expensive than Boolean division.
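A minimal sketch of algebraic (weak) division follows, assuming a sum-of-products is stored as a set of cubes and each cube as a frozenset of literal names. The representation and the names `cube` and `algebraic_divide` are illustrative, not taken from the lecture.

```python
def cube(text):
    """Helper: 'ac' -> frozenset({'a', 'c'})."""
    return frozenset(text)


def algebraic_divide(f, divisor):
    """Return (quotient, remainder) with f = quotient * divisor + remainder.

    Cubes are treated purely as sets of literals, so no Boolean identities
    (such as x x' = 0) are applied.
    """
    quotient = None
    for d in divisor:
        # Cubes of f that contain d, with d's literals stripped off.
        partial = {c - d for c in f if d <= c}
        quotient = partial if quotient is None else quotient & partial
    quotient = quotient or set()
    covered = {q | d for q in quotient for d in divisor}
    remainder = set(f) - covered
    return quotient, remainder


f = {cube("ac"), cube("ad"), cube("bc"), cube("bd")}
quotient, remainder = algebraic_divide(f, {cube("a"), cube("b")})
```

Dividing f = ac + ad + bc + bd by a + b gives the quotient c + d with no remainder, which matches the factored form f = e(c + d) with e = a + b.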

Library-based Technology Mapping – MIS II Three steps: decomposition, matching, covering. Circuit first decomposed into NAND representations. Different collections of NANDs can be implemented differently in VLSI. Inv, cost 2; NAND2, cost 3; AOI-21, cost 4.

MIS II Cost = Decompose into NAND-2 using Boolean techniques Use dynamic programming to match subtrees with libraries Choose lowest cost implementation that covers all primitives.
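The matching and covering steps can be sketched as dynamic programming over a NAND2/INV tree. This is a simplified illustration, not the MIS II implementation: the tuple encoding, the pattern used for AOI-21, and the function names are assumptions; only the three cell costs come from the slides.

```python
from functools import lru_cache

# Trees are nested tuples ("INV", x) or ("NAND", x, y); leaves are signal names.
# "?" in a pattern matches any subtree, which then has to be covered itself.
LIBRARY = [
    ("INV", 2, ("INV", "?")),
    ("NAND2", 3, ("NAND", "?", "?")),
    # AOI-21 computes (ab + c)', which decomposes to INV(NAND(NAND(a, b), INV(c))).
    ("AOI21", 4, ("INV", ("NAND", ("NAND", "?", "?"), ("INV", "?")))),
]


def match(pattern, node):
    """Return every way the pattern fits at node, as lists of leftover subtrees."""
    if pattern == "?":
        return [[node]]
    if isinstance(node, str) or node[0] != pattern[0]:
        return []
    if pattern[0] == "INV":
        return match(pattern[1], node[1])
    results = []
    # NAND is commutative, so try both orders of the children.
    for left, right in ((node[1], node[2]), (node[2], node[1])):
        for a in match(pattern[1], left):
            for b in match(pattern[2], right):
                results.append(a + b)
    return results


@lru_cache(maxsize=None)
def best_cover(node):
    """Return (cost, cells) of the cheapest cover of the tree rooted at node."""
    if isinstance(node, str):
        return 0, ()
    best = None
    for name, cost, pattern in LIBRARY:
        for leaves in match(pattern, node):
            total, cells = cost, (name,)
            for leaf in leaves:
                sub_cost, sub_cells = best_cover(leaf)
                total += sub_cost
                cells += sub_cells
            if best is None or total < best[0]:
                best = (total, cells)
    return best


subject = ("INV", ("NAND", ("NAND", "a", "b"), ("INV", "c")))
cost, cells = best_cover(subject)
```

For this subject tree a single AOI-21 (cost 4) beats the cover built from two inverters and two NAND2 cells (cost 10).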

Tech Mapping for LUTs Minimize total number of LUTs. Minimize the number of levels of LUTs. Many different approaches: Partitioning -> Flowmap; BDDs -> XMAP; Covering -> Chortle. Basic Xilinx tech mapping follows Chortle with modification to handle registers.

Chortle-crf Dynamic programming approach. Minimize # LUTs: primary goal. Minimize # inputs the circuit root uses: secondary goal. Operates on AND-OR circuits. Locate boundaries. [Figure: circuit with signals A through M and w, x, y, z]

Chortle-crf Major innovation is bin packing. Simultaneously addresses decomposition and matching. Goal: find a decomposition of every node in the network that minimizes # LUTs in the final circuit. [Figure: without decomposition, 4-LUTs; with decomposition, 2-LUTs]

Mapping Each Tree Dynamically visit each node in the graph. Fanin nodes drive the node under evaluation. Boxes -> fanin LUTs, cost is number of inputs. Bins -> N-input LUT (in this case 5).
First Fit Decreasing:
/* construct 2-level decomp */
box list <- fanin LUTs sorted by size
bin list <- 0
while (box list is not 0) {
    box <- largest LUT remaining in box list
    find bin that will contain LUT
    if bin doesn’t exist
        new bin <- box /* create new bin */
    else
        bin <- box /* pack in existing */
}
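The First Fit Decreasing step can be written as a short runnable sketch. It assumes each box is described only by its input count and ignores shared inputs; the function name and the example sizes are illustrative.

```python
def first_fit_decreasing(box_sizes, capacity):
    """Pack boxes (fanin LUT input counts) into bins (LUTs with `capacity` inputs)."""
    bins = []
    for size in sorted(box_sizes, reverse=True):   # largest box first
        if size > capacity:
            raise ValueError("box does not fit in any bin")
        for current in bins:
            if sum(current) + size <= capacity:    # first bin with enough room
                current.append(size)
                break
        else:
            bins.append([size])                    # no bin fits: create a new one
    return bins


bins = first_fit_decreasing([2, 3, 1, 2, 2], capacity=5)
```

With 5-input bins, the five boxes pack into two completely filled bins.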

Multi-Level Decomposition Chain LUTs together. Output of largest second-level LUT connected to LUT with unused input. May need to add a new LUT. Leads to min LUTs and fanout LUT with smallest # inputs. This fanout LUT is used as input to the next stage.
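One way to picture the chaining step is the sketch below. It is a simplified reading of the slide rather than the actual Chortle-crf procedure: bins are reduced to their input counts, the fullest bin is finalized first, and its output goes to the first bin with a free input. The name `chain_bins` is hypothetical.

```python
def chain_bins(bin_fill, capacity):
    """Return the number of LUTs after chaining a non-empty list of bins into one tree.

    bin_fill: used-input count of each second-level bin.
    capacity: number of inputs per LUT (must be at least 2).
    """
    if capacity < 2:
        raise ValueError("chaining needs LUTs with at least 2 inputs")
    bins = list(bin_fill)
    luts = 0
    while len(bins) > 1:
        fullest = max(range(len(bins)), key=lambda i: bins[i])
        bins.pop(fullest)                 # this bin becomes a finished LUT
        luts += 1
        for i, used in enumerate(bins):
            if used < capacity:           # unused input available
                bins[i] += 1              # connect the finished LUT's output here
                break
        else:
            bins.append(1)                # no room anywhere: add a new LUT
    return luts + 1                       # the last remaining bin is the fanout LUT


total = chain_bins([5, 5], capacity=5)
```

Two full 5-input bins need a third LUT to combine their outputs, so the result is 3 LUTs.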

Examples [Figure: a) Fanin LUTs u, v, w, x, y; b) Two-level decomposition into z.1 and z.2; c) Multi-level decomposition]

Optimality For LUTs with fewer than 6 inputs, Chortle will create an optimal result for a subtree. Combination of sub-trees is not optimized. Local optimizations needed to ensure global optimality. Reconvergent paths -> net drives multiple gates. Replicating logic -> creating additional fanout.

Translating a Design to an FPGA Improve 2-level decomposition to take fanout into account. Replace FFD with an exhaustive search that repeatedly invokes FFD. Try both with and without reconvergent path and select best mapping (forced merging). Inputs must reconverge at node being decomposed.

Reconvergent Paths Frequently, more than one pair of fan-in LUTs share inputs. For each combination of pairs that share inputs, perform FFD. Two-level decomp with fewest bins and smallest least-filled bin retained.
Reconverge:
pair list <- all pairs of fanin LUTs with shared inputs
best LUTs <- 0
for all possible pairs from pair list {
    merged LUTs <- copy of fanin LUTs with forced merge
    FFD(merged LUTs)
    if result is better than best LUTs, best LUTs <- result /* best combo */
}
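The forced-merge search might look like the sketch below. It is simplified on purpose: only one pair is merged per trial rather than every combination of pairs, each fanin LUT is a frozenset of input names, and all function names are illustrative.

```python
from itertools import combinations


def ffd(boxes, k):
    """First Fit Decreasing on boxes that are sets of input names."""
    bins = []
    for box in sorted(boxes, key=len, reverse=True):
        for current in bins:
            if sum(len(b) for b in current) + len(box) <= k:
                current.append(box)
                break
        else:
            bins.append([box])
    return bins


def score(bins):
    """Fewest bins first, then the smallest least-filled bin."""
    return (len(bins), min(sum(len(b) for b in current) for current in bins))


def best_with_forced_merge(fanin_luts, k):
    best = ffd(fanin_luts, k)                        # mapping without any merge
    for i, j in combinations(range(len(fanin_luts)), 2):
        a, b = fanin_luts[i], fanin_luts[j]
        if not (a & b) or len(a | b) > k:
            continue                                 # no shared input, or too big
        merged = [x for n, x in enumerate(fanin_luts) if n not in (i, j)]
        merged.append(a | b)                         # shared inputs counted once
        candidate = ffd(merged, k)
        if score(candidate) < score(best):
            best = candidate
    return best


luts = [frozenset("abc"), frozenset("abd"), frozenset("efg")]
best = best_with_forced_merge(luts, k=5)
```

Merging the two LUTs that share a and b gives a 4-input box, so the three fanin LUTs fit in two 5-input bins instead of three.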

Maximum Share Decreasing Exhaustive search prohibitive. Select box using following criteria:
1. Greatest # inputs
2. Shares greatest # inputs with any existing bin
3. Shares greatest # of inputs with existing (remaining) boxes
Reduces to FFD for no input sharing. Points 2 and 3 optimize network sharing.
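The selection criteria can be sketched as a lexicographic key. How the chosen box is then placed (into the fitting bin that shares the most inputs with it) is an assumption of this sketch, as are the names used.

```python
def max_share_decreasing(boxes, k):
    """Pack boxes (sets of input names) into bins of at most k distinct inputs."""
    remaining = [set(b) for b in boxes]
    bins = []
    while remaining:
        def key(i):
            box = remaining[i]
            others = remaining[:i] + remaining[i + 1:]
            share_bin = max((len(box & b) for b in bins), default=0)
            share_box = max((len(box & o) for o in others), default=0)
            # Criterion 1, then 2, then 3.
            return (len(box), share_bin, share_box)

        box = remaining.pop(max(range(len(remaining)), key=key))
        fitting = [b for b in bins if len(b | box) <= k]
        if fitting:
            # Assumed placement rule: prefer the bin with the most shared inputs.
            target = max(fitting, key=lambda b: len(b & box))
            target |= box
        else:
            bins.append(box)
    return bins


packed = max_share_decreasing([set("abc"), set("abd"), set("efg")], k=5)
```

With no shared inputs every share count is 0, so the key reduces to box size and the placement falls back to the first fitting bin, as in FFD.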

Node Replication Apply replication to fanout nodes. Map without replication first. Locally decompose fanout nodes to determine savings. Ordering important. [Figure: without replication vs. with replication]

Results – Chortle-crf 20 netlists mapped to 5-input LUTs. Reconvergence reduced LUTs by 2.7%. Replication reduced LUTs by 3.7%. Combined 14% reduction achieved. Replication exposes reconvergent paths, creating additional opportunities for optimization.

Chortle-d Minimize delay through circuit. Generally increases hardware required. Reduced logic levels by 38%. Increased # LUTs by 79%. Note: most delay in an FPGA is in the interconnect.

Other Approaches MIS-PGA: groups inputs into LUTs; decompose into 4-LUTs (Roth-Karp); 47 times slower than Chortle; 14% fewer LUTs. XMAP: represent circuit as BDDs; effective for multiplexer-based devices. Also, BDS-PGA.

Flowmap 1. Use network flow to partition circuit. 2. Determine point where minimum flow achieved for minimum cut 3. Cut until LUTs of size N achieved.

Taking Flip-flops into Account FPGA devices contain fixed resources – FFs. Technology mapping should take these into account. Consider fanout nodes.

LUT Packing - VPACK Seed BLE: choose BLE with most inputs. Select next BLE -> BLE which shares most inputs and outputs with cluster. Continue until cluster is full or adding any BLE will overflow I (I -> # cluster inputs). Hill climbing: exceed I limit temporarily to find better minimum.
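The greedy part of this packing loop can be sketched as follows. The sketch leaves out hill climbing, and the data layout (a dict from BLE name to its input nets and output net), the attraction measure and the name `vpack` are assumptions made for illustration, not the VPACK source.

```python
def vpack(bles, n, i_max):
    """Greedily group BLEs into clusters of at most n BLEs and i_max external inputs.

    bles: dict mapping BLE name -> (set of input nets, output net).
    """
    unclustered = dict(bles)
    clusters = []
    while unclustered:
        # Seed: the unclustered BLE with the most inputs.
        seed = max(unclustered, key=lambda b: len(unclustered[b][0]))
        ins, out = unclustered.pop(seed)
        cluster, nets_in, nets_out = [seed], set(ins), {out}
        while len(cluster) < n:
            def external_inputs(b):
                b_in, b_out = unclustered[b]
                # Nets generated inside the cluster do not use a cluster input.
                return (nets_in | b_in) - (nets_out | {b_out})

            def attraction(b):
                b_in, b_out = unclustered[b]
                return len(b_in & (nets_in | nets_out)) + (b_out in nets_in)

            candidates = [b for b in unclustered if len(external_inputs(b)) <= i_max]
            if not candidates:
                break                      # any further BLE would overflow I
            pick = max(candidates, key=attraction)
            b_in, b_out = unclustered.pop(pick)
            cluster.append(pick)
            nets_in |= b_in
            nets_out.add(b_out)
        clusters.append(cluster)
    return clusters


example = {
    "A": ({"a", "b", "c", "d"}, "n1"),
    "B": ({"n1", "a", "e"}, "n2"),
    "C": ({"x", "y"}, "n3"),
    "D": ({"n3", "z"}, "n4"),
}
clusters = vpack(example, n=2, i_max=5)
```

Here B joins A because it reuses net a and consumes A's output n1, keeping the cluster at five external inputs.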

Summary Many tech mapping algorithms exist to minimize delay/area. Chortle uses a dynamic programming heuristic to perform mapping. Largely a solved problem. More sophisticated techniques evaluated recently.