Penn ESE535 Spring 2015 -- DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2015 Clustering (LUT Mapping, Delay)

Slides:



Advertisements
Similar presentations
CALTECH CS137 Fall DeHon 1 CS137: Electronic Design Automation Day 19: November 21, 2005 Scheduling Introduction.
Advertisements

ECE 667 Synthesis & Verificatioin - FPGA Mapping 1 ECE 667 Synthesis and Verification of Digital Systems Technology Mapping for FPGAs D.Chen, J.Cong, DAOMap.
FPGA Technology Mapping Dr. Philip Brisk Department of Computer Science and Engineering University of California, Riverside CS 223.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 11: March 4, 2008 Placement (Intro, Constructive)
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 21: April 2, 2007 Time Multiplexing.
Penn ESE525 Spring DeHon 1 ESE535: Electronic Design Automation Day 10: February 27, 2008 Partitioning 2 (spectral, network flow, replication)
Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 15: March 12, 2007 Interconnect 3: Richness.
EDA (CS286.5b) Day 2 Covering. Why covering now? Nice/simple cost model problem can be solved well (somewhat clever solution) general/powerful technique.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 2: January 23, 2008 Covering.
EDA (CS286.5b) Day 5 Partitioning: Intro + KLFM. Today Partitioning –why important –practical attack –variations and issues.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 21: April 15, 2009 Routing 1.
EDA (CS286.5b) Day 6 Partitioning: Spectral + MinCut.
Penn ESE Fall DeHon 1 ESE (ESE534): Computer Organization Day 19: March 26, 2007 Retime 1: Transformations.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 7: February 11, 2008 Static Timing Analysis and Multi-Level Speedup.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2009 Architecture Synthesis (Provisioning, Allocation)
EDA (CS286.5b) Day 3 Clustering (LUT Map and Delay) N.B. no lecture Thursday.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 16: March 23, 2009 Covering.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 12: March 2, 2009 Sequential Optimization (FSM Encoding)
CS294-6 Reconfigurable Computing Day 15 October 13, 1998 LUT Mapping.
Penn ESE525 Spring DeHon 1 ESE535: Electronic Design Automation Day 10: February 18, 2009 Partitioning 2 (spectral, network flow)
EDA (CS286.5b) Day 19 Covering and Retiming. “Final” Like Assignment #1 –longer –more breadth –focus since assignment #2 –…but ideas are cummulative –open.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 3: January 27, 2008 Clustering (LUT Mapping, Delay) Please work preclass example.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 17: March 30, 2009 Clustering (LUT Mapping, Delay)
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 15: March 18, 2009 Static Timing Analysis and Multi-Level Speedup.
FPGA Technology Mapping Algorithms
FPGA Technology Mapping. 2 Technology mapping:  Implements the optimized nodes of the Boolean network to the target device library.  For FPGA, library.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2009 Architecture Synthesis (Provisioning, Allocation)
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 8: February 13, 2008 Retiming.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 4: January 28, 2015 Partitioning (Intro, KLFM)
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 2: January 26, 2015 Covering Work preclass exercise.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 2: January 21, 2015 Heterogeneous Multicontext Computing Array Work Preclass.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 9: February 16, 2015 Scheduling Variants and Approaches.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 9: February 14, 2011 Placement (Intro, Constructive)
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 24: April 18, 2011 Covering and Retiming.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 10: February 18, 2015 Architecture Synthesis (Provisioning, Allocation)
Penn ESE525 Spring DeHon 1 ESE535: Electronic Design Automation Day 6: February 4, 2014 Partitioning 2 (spectral, network flow)
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 7: February 3, 2002 Retiming.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 23: April 20, 2015 Static Timing Analysis and Multi-Level Speedup.
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 3: January 12, 2004 Clustering (LUT Mapping, Delay)
Lecture 6: Mapping to Embedded Memory and PLAs September 27, 2004 ECE 697F Reconfigurable Computing Lecture 6 Mapping to Embedded Memory and PLAs.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 2: January 20, 2010 Universality, Gates, Logic Work Preclass Exercise.
Technology Mapping. 2 Technology mapping is the phase of logic synthesis when gates are selected from a technology library to implement the circuit. Technology.
CALTECH CS137 Spring DeHon 1 CS137: Electronic Design Automation Day 5: April 12, 2004 Covering and Retiming.
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 10: February 11, 2002 Partitioning 2 (spectral, network flow, replication)
Give qualifications of instructors: DAP
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 6: January 26, 2004 Sequential Optimization (FSM Encoding)
1 WireMap FPGA Technology Mapping for Improved Routability Stephen Jang, Xilinx Inc. Billy Chan, Xilinx Inc. Kevin Chung, Xilinx Inc. Alan Mishchenko,
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 6: January 30, 2013 Partitioning (Intro, KLFM)
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 20: April 4, 2011 Static Timing Analysis and Multi-Level Speedup.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 15: March 13, 2013 High Level Synthesis II Dataflow Graph Sharing.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 11: February 25, 2015 Placement (Intro, Constructive)
CALTECH CS137 Fall DeHon 1 CS137: Electronic Design Automation Day 2: September 28, 2005 Covering.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 25: April 17, 2013 Covering and Retiming.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: January 31, 2011 Scheduling Variants and Approaches.
CS137: Electronic Design Automation
ESE535: Electronic Design Automation
ESE535: Electronic Design Automation
Reconfigurable Computing
ESE534: Computer Organization
ESE534: Computer Organization
ESE535: Electronic Design Automation
ESE535: Electronic Design Automation
ESE535: Electronic Design Automation
ESE535: Electronic Design Automation
ESE535: Electronic Design Automation
ESE535: Electronic Design Automation
CS137: Electronic Design Automation
Fast Min-Register Retiming Through Binary Max-Flow
CS137: Electronic Design Automation
Presentation transcript:

Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2015 Clustering (LUT Mapping, Delay)

Penn ESE535 Spring DeHon 2 Today How do we map to LUTs? What happens when –IO dominates –Delay dominates? Lessons… –for non-LUTs –for delay-oriented partitioning Behavioral (C, MATLAB, …) RTL Gate Netlist Layout Masks Arch. Select Schedule FSM assign Two-level, Multilevel opt. Covering Retiming Placement Routing

Penn ESE535 Spring DeHon 3 LUT Mapping Problem: Map logic netlist to LUTs –minimizing area –minimizing delay Old problem? –Technology mapping? (Day 3) –How big is the library for K-input LUT? 2 2 K gates in library

Penn ESE535 Spring DeHon 4 Simplifying Structure K-LUT can implement any K-input function

Preclass: Cover in 4-LUT? Penn ESE535 Spring DeHon 5

Preclass: Cover in 4-LUT? Penn ESE535 Spring DeHon 6

Preclass: Cover in 4-LUT? Penn ESE535 Spring DeHon 7

Preclass: Cover in 4-LUT? Penn ESE535 Spring DeHon 8

Preclass: Cover in 4-LUT? Penn ESE535 Spring DeHon 9

10 Cost Function Delay: number of LUTs in critical path –doesn’t say delay in LUTs or in wires –does assume uniform interconnect delay Area: number of LUTs –Assumes adequate interconnect to use LUTs

Penn ESE535 Spring DeHon 11 LUT Mapping NP-Hard in general Fanout-free -- can solve optimally given decomposition –(but which one?) Delay optimal mapping achievable in Polynomial time Area w/ fanout NP-complete

Penn ESE535 Spring DeHon 12 Preliminaries What matters/makes this interesting? –Area / Delay target –Decomposition –Fanout replication reconvergent

Penn ESE535 Spring DeHon 13 Costs: Area vs. Delay

Penn ESE535 Spring DeHon 14 Decomposition

Penn ESE535 Spring DeHon 15 Decomposition

Penn ESE535 Spring DeHon 16 Fanout: Replication

Penn ESE535 Spring DeHon 17 Fanout: Replication

Penn ESE535 Spring DeHon 18 Fanout: Reconvergence

Penn ESE535 Spring DeHon 19 Fanout: Reconvergence

What makes it hard? Cost does not monotonically increase as cover more of graph. Not clear when to stop? We say cost does not have a monotone property Penn ESE535 Spring DeHon 20

Preclass Revisited Penn ESE535 Spring DeHon 21

Penn ESE535 Spring DeHon 22 Definition Cone: set of nodes in the recursive fanin of a node

Penn ESE535 Spring DeHon 23 Example Cones

Penn ESE535 Spring DeHon 24 Delay

Delay of Preclass Circuit? Poll: Delay of preclass circuit Penn ESE535 Spring DeHon 25

Penn ESE535 Spring DeHon 26 Dynamic Programming Optimal covering of a logic cone is: –Minimum cost (all possible coverings) Evaluate costs of each node based on: –cover node –cones covering each fanin to node cover Evaluate node costs in topological order Key: are calculating optimal solutions to subproblems –only have to evaluate covering options at each node

Penn ESE535 Spring DeHon 27 Flowmap Key Idea: –LUT holds anything with K inputs –Use network flow to find cuts  logic can pack into LUT including reconvergence …allows replication –Optimal depth arise from optimal depth solution to subproblems

Max-Flow / Min-Cut The maximum flow in a network is equal to the minimum cut –…the bottleneck We can find the mincut by computing the maxflow. Conceptually, how would we determine the maximum flow? Penn ESE535 Spring DeHon 28

Example Flow cut Penn ESE535 Spring DeHon 29 s t

Example Flow cut Penn ESE535 Spring DeHon 30 s t Visually, what is min-cut?

Penn ESE535 Spring DeHon 31 MaxFlow Set all edge flows to zero –F[u,v]=0 While there is a path from s,t –(breadth-first-search) –for each edge in path f[u,v]=f[u,v]+1 –f[v,u]=-f[u,v] –When c[v,u]=f[v,u] remove edge from search O(|E|*cutsize)

Example Flow cut Penn ESE535 Spring DeHon 32 A B C D E F G H I J s t Find a path?

Example Flow cut Penn ESE535 Spring DeHon 33 A B C D E F G H I J s t

Example Flow cut Penn ESE535 Spring DeHon 34 A B C D E F G H I J s t Find a path?

Example Flow cut Penn ESE535 Spring DeHon 35 A B C D E F G H I J s t

Example Flow cut Penn ESE535 Spring DeHon 36 A B C D E F G H I J s t Find a path?

Penn ESE535 Spring DeHon 37 Delay objective: –minimum height, K-feasible cut –I.e. cut no more than K edges –start by bounding fanin  K Height of node will be: –height of predecessors or –one greater than height of predecessors Check shorter first Flowmap Examples are K=4

Penn ESE535 Spring DeHon 38 Flowmap Construct flow problem –sink  target node being mapped –source  start set (primary inputs) –flow infinite into start set –flow of one on each link –to see if height same as predecessors collapse all predecessors of maximum height into sink (single node, cut must be above) height +1 case is trivially true

Penn ESE535 Spring DeHon Example Subgraph Target: K=4

Penn ESE535 Spring DeHon Trivial: Height +1

Penn ESE535 Spring DeHon Collapse at max height

Penn ESE535 Spring DeHon Collapse at max height Collapsed Node

Penn ESE535 Spring DeHon 43 Augmenting Flows Collapsed Node

Penn ESE535 Spring DeHon 44 Augmenting Flows Collapsed Node

Penn ESE535 Spring DeHon 45 Augmenting Flows Collapsed Node

Penn ESE535 Spring DeHon 46 Augmenting Flows Collapsed Node

Penn ESE535 Spring DeHon 47 Augmenting Flows Collapsed Node Collapsed Node

Penn ESE535 Spring DeHon Collapse at max height: works for K=4 Collapsed Node

Penn ESE535 Spring DeHon Collapse not work (K still 4) (different/larger graph) Forced to label height+1

Penn ESE535 Spring DeHon Reconvergent fanout (yet different graph) Can label at height 1

Penn ESE535 Spring DeHon 51 Flowmap Max-flow Min-cut algorithm to find cut Use augmenting paths until discover max flow > K O(K|E|) time to discover K-feasible cut –(or that does not exist) Depth identification: O(KN|E|)

Penn ESE535 Spring DeHon Mincut may not be unique Collapsed Node

Penn ESE535 Spring DeHon 53 Flowmap Min-cut may not be unique To minimize area achieving delay optimum –find max volume min-cut Compute max flow  find min cut remove edges consumed by max flow DFS from source Compliment set is max volume set

Penn ESE535 Spring DeHon Collapse at max height: works for K=4 Collapsed Node

Penn ESE535 Spring DeHon BFS from Source Collapsed Node

Penn ESE535 Spring DeHon BFS from Source Collapsed Node

Penn ESE535 Spring DeHon BFS from Source Collapsed Node

Penn ESE535 Spring DeHon BFS from Source Collapsed Node Does not find rest.

Penn ESE535 Spring DeHon Max-Volume Mincut Collapsed Node

Penn ESE535 Spring DeHon 60 Flowmap Covering from labeling is straightforward –process in reverse topological order –allocate identified K-feasible cut to LUT –remove node –postprocess to minimize LUT count Notes: –replication implicit (covered multiple places) –nodes purely internal to one or more covers may not get their own LUTs

Penn ESE535 Spring DeHon 61 Flowmap Roundup Label –Work from inputs to outputs –Find max label of predecessors –Collapse new node with all predecessors at this label –Can find flow cut  K? Yes: mark with label (find max-volume cut extent) No: mark with label+1 Cover –Work from outputs to inputs –Allocate LUT for identified cluster/cover –Recurse covering selection on inputs to identified LUT

Penn ESE535 Spring DeHon 62 Area Changing Cost Functions Now (previous was delay)

Penn ESE535 Spring DeHon 63 DF-Map Duplication Free Mapping –can find optimal area under this constraint –(but optimal area may not be duplication free) [Cong+Ding, IEEE TR VLSI Sys. V2n2p137]

Penn ESE535 Spring DeHon 64 Maximum Fanout Free Cones MFFC: bit more general than trees

Penn ESE535 Spring DeHon 65 MFFC Follow cone backward end at node that fans out (has output) outside the cone

Penn ESE535 Spring DeHon 66 MFFC example A CD FG I B E H Identify FFC

Penn ESE535 Spring DeHon 67 MFFC example A CD FG I B E H

Penn ESE535 Spring DeHon 68 DF-Map Partition into graph into MFFCs Optimally map each MFFC In dynamic programming –for each node examine each K-feasible cut –note: this is very different than flowmap where only had to examine a single cut –Example to follow pick cut to minimize cost –1 +  cones for fanins

Penn ESE535 Spring DeHon 69 DF-Map Example Cones?

Penn ESE535 Spring DeHon 70 DF-Map Example

Penn ESE535 Spring DeHon 71 DF-Map Example

Penn ESE535 Spring DeHon 72 DF-Map Example

Penn ESE535 Spring DeHon 73 DF-Map Example

Penn ESE535 Spring DeHon 74 DF-Map Example

Penn ESE535 Spring DeHon 75 DF-Map Example Start mapping cone

Penn ESE535 Spring DeHon 76 DF-Map Example 11 11

Penn ESE535 Spring DeHon 77 DF-Map Example ? 11 11

Penn ESE535 Spring DeHon 78 DF-Map Example ? 11 11

Penn ESE535 Spring DeHon 79 DF-Map Example ? 11 11

Penn ESE535 Spring DeHon 80 DF-Map Example ? 11 11

Penn ESE535 Spring DeHon 81 DF-Map Example

Penn ESE535 Spring DeHon 82 DF-Map Example Similar to previous

Penn ESE535 Spring DeHon 83 DF-Map Example ?

Penn ESE535 Spring DeHon 84 DF-Map Example ?

Penn ESE535 Spring DeHon 85 DF-Map Example ?

Penn ESE535 Spring DeHon 86 DF-Map Example ?

Penn ESE535 Spring DeHon 87 DF-Map Example ?

Penn ESE535 Spring DeHon 88 DF-Map Example ?

Penn ESE535 Spring DeHon 89 DF-Map Example ?

Penn ESE535 Spring DeHon 90 DF-Map Example

Penn ESE535 Spring DeHon 91 DF-Map Example 2?

Penn ESE535 Spring DeHon 92 DF-Map Example 2?

Penn ESE535 Spring DeHon 93 DF-Map Example 2?

Penn ESE535 Spring DeHon 94 DF-Map Example 2?

Penn ESE535 Spring DeHon 95 DF-Map Example 2?

Penn ESE535 Spring DeHon 96 DF-Map Example 2?

Penn ESE535 Spring DeHon 97 DF-Map Example 2?

Penn ESE535 Spring DeHon 98 DF-Map Example

Penn ESE535 Spring DeHon 99 DF-Map Example ? 1 11

Penn ESE535 Spring DeHon 100 DF-Map Example ? 1 11

Penn ESE535 Spring DeHon 101 DF-Map Example ?

Penn ESE535 Spring DeHon 102 DF-Map Example ?

Penn ESE535 Spring DeHon 103 DF-Map Example ?

Penn ESE535 Spring DeHon 104 DF-Map Example ?

Penn ESE535 Spring DeHon 105 DF-Map Example ?

Penn ESE535 Spring DeHon 106 DF-Map Example ?

Penn ESE535 Spring DeHon 107 DF-Map Example ?

Penn ESE535 Spring DeHon 108 DF-Map Example ?

Penn ESE535 Spring DeHon 109 DF-Map Example ?

Penn ESE535 Spring DeHon 110 DF-Map Example

Penn ESE535 Spring DeHon 111 DF-Map Example ?

Penn ESE535 Spring DeHon 112 DF-Map Example ?

Penn ESE535 Spring DeHon 113 DF-Map Example ?

Penn ESE535 Spring DeHon 114 DF-Map Example ?

Penn ESE535 Spring DeHon 115 DF-Map Example ?

Penn ESE535 Spring DeHon 116 DF-Map Example ?

Penn ESE535 Spring DeHon 117 DF-Map Example ?

Penn ESE535 Spring DeHon 118 DF-Map Example

Penn ESE535 Spring DeHon 119 DF-Map Example

Penn ESE535 Spring DeHon 120 Composing Don’t need minimum delay off the critical path Don’t always want/need minimum delay Composite: –map with flowmap –Greedy decomposition of “most promising” non-critical nodes –DF-map these nodes

Penn ESE535 Spring DeHon 121 Variations on a Theme

Penn ESE535 Spring DeHon 122 Applicability to Non-LUTs? E.g. LUT Cascade –can handle some functions of K inputs How apply?

Penn ESE535 Spring DeHon 123 Adaptable to Non-LUTs Sketch: –Initial decomposition to nodes that will fit –Find max volume, min-height K-feasible cut –ask if logic block will cover yes  done no  exclude one (or more) nodes from block and repeat –exclude == collapse into start set nodes –this makes heuristic

Penn ESE535 Spring DeHon 124 Partitioning? Effectively partitioning logic into clusters –LUT cluster unlimited internal “gate” capacity limited I/O (K) simple delay cost model –1 cross between clusters –0 inside cluster

Penn ESE535 Spring DeHon 125 Partitioning Clustering –if strongly I/O limited, same basic idea works for partitioning to components typically: partitioning onto multiple FPGAs assumption: inter-FPGA delay >> intra-FPGA delay –w/ area constraints similar to non-LUT case –make min-cut –will it fit? –Exclude some LUTs and repeat

Penn ESE535 Spring DeHon 126 Clustering for Delay W/ no IO constraint area is monotone property DP-label forward with delays –grab up largest labels (greatest delays) until fill cluster size Work backward from outputs creating clusters as needed

Penn ESE535 Spring DeHon 127 Area and IO? Real problem: –FPGA/chip partitioning Doing both optimally is NP-hard Heuristic around IO cut first should do well –(e.g. non-LUT slide) –[Yang and Wong, FPGA’94]

Penn ESE535 Spring DeHon 128 Partitioning To date: –primarily used for 2-level hierarchy I.e. intra-FPGA, inter-FPGA Open/promising –adapt to multi-level for delay-optimized partitioning/placement on fixed-wire schedule localize critical paths to smallest subtree possible?

Penn ESE535 Spring DeHon 129 Summary Optimal LUT mapping NP-hard in general –fanout, replication, …. K-LUTs makes delay optimal feasible –single constraint: IO capacity –technique: max-flow/min-cut Heuristic adaptations of basic idea to capacity constrained problem –promising area for interconnect delay optimization

Penn ESE535 Spring DeHon 130 Today’s Big Ideas: IO may be a dominant cost –limiting capacity, delay Exploit structure: K-LUTs Mixing dominant modes –multiple objectives Define optimally solvable subproblem –duplication free mapping

Penn ESE535 Spring DeHon 131 Admin Reading Wednesday on web Assignment 3 due Thursday