CS294-6 Reconfigurable Computing Day 15 October 13, 1998 LUT Mapping.

Slides:



Advertisements
Similar presentations
Technology Mapping. Perform the final gate selection from a particular library Two basic approaches 1. ruled based technique 2. graph covering technique.
Advertisements

CALTECH CS137 Fall DeHon 1 CS137: Electronic Design Automation Day 19: November 21, 2005 Scheduling Introduction.
ECE 667 Synthesis and Verification of Digital Circuits
EDA (CS286.5b) Day 10 Scheduling (Intro Branch-and-Bound)
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 14: March 3, 2004 Scheduling Heuristics and Approximation.
1 EE5900 Advanced Embedded System For Smart Infrastructure Static Scheduling.
1 Maximum flow sender receiver Capacity constraint Lecture 6: Jan 25.
Timing Optimization. Optimization of Timing Three phases 1globally restructure to reduce the maximum level or longest path Ex: a ripple carry adder ==>
ECE 667 Synthesis & Verificatioin - FPGA Mapping 1 ECE 667 Synthesis and Verification of Digital Systems Technology Mapping for FPGAs D.Chen, J.Cong, DAOMap.
FPGA Technology Mapping Dr. Philip Brisk Department of Computer Science and Engineering University of California, Riverside CS 223.
Balancing Interconnect and Computation in a Reconfigurable Array Dr. André DeHon BRASS Project University of California at Berkeley Why you don’t really.
ECE 667 Synthesis and Verification of Digital Systems
HW2 Solutions. Problem 1 Construct a bipartite graph where, every family represents a vertex in one partition, and table represents a vertex in another.
Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification.
1 DAOmap: A Depth-optimal Area Optimization Mapping Algorithm for FPGA Designs Deming Chen, Jacon Cong ICCAD 2004 Presented by: Wei Chen.
CS294-6 Reconfigurable Computing Day 8 September 17, 1998 Interconnect Requirements.
Technology Mapping.
EDA (CS286.5b) Day 2 Covering. Why covering now? Nice/simple cost model problem can be solved well (somewhat clever solution) general/powerful technique.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 2: January 23, 2008 Covering.
EDA (CS286.5b) Day 5 Partitioning: Intro + KLFM. Today Partitioning –why important –practical attack –variations and issues.
CS294-6 Reconfigurable Computing Day 10 September 24, 1998 Interconnect Richness.
EDA (CS286.5b) Day 6 Partitioning: Spectral + MinCut.
Maximum Flows Lecture 4: Jan 19. Network transmission Given a directed graph G A source node s A sink node t Goal: To send as much information from s.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 7: February 11, 2008 Static Timing Analysis and Multi-Level Speedup.
EDA (CS286.5b) Day 11 Scheduling (List, Force, Approximation) N.B. no class Thursday (FPGA) …
EDA (CS286.5b) Day 3 Clustering (LUT Map and Delay) N.B. no lecture Thursday.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 16: March 23, 2009 Covering.
EDA (CS286.5b) Day 19 Covering and Retiming. “Final” Like Assignment #1 –longer –more breadth –focus since assignment #2 –…but ideas are cummulative –open.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 3: January 27, 2008 Clustering (LUT Mapping, Delay) Please work preclass example.
Graph-Cut Algorithm with Application to Computer Vision Presented by Yongsub Lim Applied Algorithm Laboratory.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 17: March 30, 2009 Clustering (LUT Mapping, Delay)
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 15: March 18, 2009 Static Timing Analysis and Multi-Level Speedup.
FPGA Technology Mapping Algorithms
FPGA Technology Mapping. 2 Technology mapping:  Implements the optimized nodes of the Boolean network to the target device library.  For FPGA, library.
CS774. Markov Random Field : Theory and Application Lecture 13 Kyomin Jung KAIST Oct
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 12: February 13, 2002 Scheduling Heuristics and Approximation.
Maximization of Network Survivability against Intelligent and Malicious Attacks (Cont’d) Presented by Erion Lin.
05/04/06 1 Integrating Logic Synthesis, Tech mapping and Retiming Presented by Atchuthan Perinkulam Based on the above paper by A. Mishchenko et al, UCAL.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 24: April 18, 2011 Covering and Retiming.
Combinational and Sequential Mapping with Priority Cuts Alan Mishchenko Sungmin Cho Satrajit Chatterjee Robert Brayton UC Berkeley.
FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimisation in Lookup- Table Based FPGA Designs 04/06/ Presented by Qiwei Jin.
DAOmap: A Depth-optimal Area Optimization Mapping Algorithm for FPGA Designs Deming Chen and Jason Cong Computer Science Department University of California,
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 6: January 23, 2002 Partitioning (Intro, KLFM)
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 23: April 20, 2015 Static Timing Analysis and Multi-Level Speedup.
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 3: January 12, 2004 Clustering (LUT Mapping, Delay)
Lecture 6: Mapping to Embedded Memory and PLAs September 27, 2004 ECE 697F Reconfigurable Computing Lecture 6 Mapping to Embedded Memory and PLAs.
CS223 Advanced Data Structures and Algorithms 1 Maximum Flow Neil Tang 3/30/2010.
Technology Mapping. 2 Technology mapping is the phase of logic synthesis when gates are selected from a technology library to implement the circuit. Technology.
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 9: February 9, 2004 Partitioning (Intro, KLFM)
CALTECH CS137 Spring DeHon 1 CS137: Electronic Design Automation Day 5: April 12, 2004 Covering and Retiming.
Give qualifications of instructors: DAP
1 EE5900 Advanced Embedded System For Smart Infrastructure Static Scheduling.
1 WireMap FPGA Technology Mapping for Improved Routability Stephen Jang, Xilinx Inc. Billy Chan, Xilinx Inc. Kevin Chung, Xilinx Inc. Alan Mishchenko,
DAOmap: A Depth-optimal Area Optimization Mapping Algorithm for FPGA Designs Deming Chen, Jason Cong , Computer Science Department , UCLA Presented.
CS 312: Algorithm Design & Analysis Lecture #29: Network Flow and Cuts This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2015 Clustering (LUT Mapping, Delay)
CALTECH CS137 Fall DeHon 1 CS137: Electronic Design Automation Day 2: September 28, 2005 Covering.
Min-Register Retiming Under Simultaneous Timing and Initial State Constraints Aaron Hurst Dec
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 25: April 17, 2013 Covering and Retiming.
CS137: Electronic Design Automation
ESE535: Electronic Design Automation
Reconfigurable Computing
SAT-Based Optimization with Don’t-Cares Revisited
ECE 667 Synthesis and Verification of Digital Systems
Technology Mapping I based on tree covering
ESE535: Electronic Design Automation
Maximum Flow Neil Tang 4/8/2008
CS137: Electronic Design Automation
Fast Min-Register Retiming Through Binary Max-Flow
CS137: Electronic Design Automation
Presentation transcript:

CS294-6 Reconfigurable Computing Day 15 October 13, 1998 LUT Mapping

Last Time Saw that small LUTs good compute block –Area K=3--4 –Delay K=5--6 –in this range interconnect dominate not much savings available to go to non-LUT Larger blocks promising –hardwired composition of small LUTs

Today How do we map to LUTs? –get designs into LUT netlist –…do studies like last time Lessons… –for non-LUTs –for delay-oriented partitioning

LUT Mapping Problem: Map logic netlist to LUTs –minimizing area –minimizing delay Old problem? –Technology mapping? Library approach require 2 2 K gates in library Exploit fact that K-LUT implement any K-input function?

LUT Mapping NP-Hard in general Fanout-free -- can solve optimally given decomposition –(but which one?) Delay optimal mapping achievable in Polynomial time Area w/ fanout NP-complete

Preliminaries What matters/makes this interesting? –Area / Delay target –Decomposition –Fanout replication reconvergent

Area vs. Delay

Decomposition

Fanout: Replication

Fanout: Reconvergence

Dynamic Programming Optimal covering of a logic cone is: –Minimum cost (all possible coverings) Evaluate costs of each node based on: –cover node –cones covering each fanin to node cover Evaluate node costs in topological order Key: are calculating optimal solutions to subproblems –only have to evaluate covering options at each node

Dynamic Programming

Flowmap Key Idea: –LUT holds anything with K inputs –Use network flow to find cuts  logic can pack into LUT including reconvergence …allows replication –Optimal depth arise from optimal depth solution to subproblems

Delay objective: –minimum height, K-feasible cut –I.e. cut no more than K edges –start by bounding fanin  K Height of node will be: –height of predecessors or –one greater than height of predecessors Check shorter first Flowmap

Construct flow problem –sink  target node being mapped –source  start set (primary inputs) –flow infinite into start set –flow of one on each link –to see if height same as predecessors collapse all predecessors of maximum height into sink (single node, cut must be above) height +1 case is trivially true

Flowmap Max-flow Min-cut algorithm to find cut Use augmenting paths to until discover max flow > K O(K|e|) time to discover K-feasible cut –(or that does not exist) Depth identification: O(KN|e|)

Flowmap Min-cut may not be unique To minimize area achieving delay optimum –find max volume min-cut Compute max flow => find min cut remove edges consumed by max flow DFS from source Compliment set is max volume set

Flowmap Covering from labeling is straightforward –process in reverse topological order –allocate identified K-feasible cut to LUT –remove node, nodes covered no fanning out –postprocess to minimize LUT count Notes: –replication implicit (covered multiple places) –nodes purely internal to one or more covers may not get their own LUTs

DF-Map Duplication Free Mapping –can find optimal area under this constraint –(but optimal area may not be duplication free)

Maximum Fanout Free Cones

DF-Map Partition into graph into MFFCs Optimally map each MFFC In dynamic programming –for each node examine each K-feasible cut pick cut to minimize cost –1 +  MFFCs for fanins

Composing Don’t need minimum delay off the critical path Don’t always want/need minimum delay Composite: –map with flowmap –Greedy decomposition of “most promising” non-critical nodes –DF-map these nodes

Applicability to Non-LUTs? E.g. LUT Cascade –can handle some functions of K inputs How apply?

Adaptable to Non-LUTs Sketch: –Initial decomposition to nodes that will fit (e.g. primitive K on cascaded LUTs) –Find max volume, min-height K-feasible cut –ask if logic block will cover yes => done no => exclude one (or more) nodes from block and repeat –exclude == collapse into start set nodes –this makes heuristic

Partitioning? Effectively partitioning logic into clusters –LUT cluster unlimited internal “gate” capacity limited I/O (K) simple cost model –1 cross between clusters –0 inside cluster

Partitioning Clustering –if strongly I/O limited, same basic idea works for partitioning to components typically: partitioning onto multiple FPGAs assumption: inter-FPGA delay >> intra-FPGA delay –w/ area constraints similar to non-LUT case –make min-cut –will it fit? –Exclude some LUTs and repeat

Partitioning To date: –primarily used for 2-level hierarchy I.e. intra-FPGA, inter-FPGA Open/promising –adapt to multi-level for delay-optimized partitioning/placement on fixed-wire schedule localize critical paths to smallest subtree possible?

Summary Optimal LUT mapping NP-hard in general –fanout, replication, …. K-LUTs makes delay optimal feasible –single constraint: IO capacity –trick: max-flow/min-cut Heuristic adaptations of basic idea to capacity constrained problem –promising area for interconnect delay optimization