Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 6: January 30, 2013 Partitioning (Intro, KLFM)
Preclass Warmup What cut size were you able to achieve? Penn ESE535 Spring DeHon 2
3 Today Partitioning –why important –can be used as a tool at many levels –practical attack –variations and issues [Design-flow sidebar figure: Behavioral (C, MATLAB, …) → RTL → Gate Netlist → Layout → Masks; steps: Arch. Select, Schedule, FSM assign, Two-level/Multilevel opt., Covering, Retiming, Placement, Routing]
Penn ESE535 Spring DeHon 4 Motivation (1) Divide-and-conquer –trivial case: decomposition –smaller problems are easier to solve –net win if solve time is superlinear: Part(n) + 2·T(n/2) < T(n) –works on problems with sparse connections or interactions Exploit structure –limited cutsize is a common structural property –random graphs would not have such small cuts
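To make the win condition concrete, here is a hedged worked instance; the quadratic solve time T(n) = c·n² is an assumption for illustration (any superlinear T behaves similarly):

```latex
% Worked instance, assuming a quadratic monolithic solver T(n) = c n^2:
2\,T(n/2) \;=\; 2\,c\left(\tfrac{n}{2}\right)^{2} \;=\; \tfrac{c\,n^{2}}{2}
\qquad\Longrightarrow\qquad
\mathrm{Part}(n) + 2\,T(n/2) \;<\; T(n)
\;\;\text{whenever}\;\;
\mathrm{Part}(n) \;<\; \tfrac{c\,n^{2}}{2}.
```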
Penn ESE535 Spring DeHon 5 Motivation (2) Cut size (bandwidth) can determine –area, energy Minimizing cuts –minimizes interconnect requirements –increases signal locality Chip (board) partitioning –minimize IO Direct basis for placement [Design-flow sidebar figure, as on the Today slide]
Penn ESE535 Spring DeHon 6 Bisection Width Partition design into two equal-size halves –minimize wires (nets) with ends in both halves Number of wires crossing is the bisection width –lower bisection width = more locality [Figure: two halves of N/2 cells each, with "cutsize" wires crossing between them]
Penn ESE535 Spring DeHon 7 Interconnect Area Bisection width is a lower bound on IC width –when wire dominated, may be a tight bound (applied recursively)
Penn ESE535 Spring DeHon 8 Classic Partitioning Problem Given: netlist of interconnected cells Partition into two (roughly) equal halves (A,B) Minimize the number of nets shared by the halves “Roughly equal” –balance condition: (0.5−δ)N ≤ |A| ≤ (0.5+δ)N
Penn ESE535 Spring DeHon 9 Balanced Partitioning NP-complete for general graphs –[ND17: Minimum Cut into Bounded Sets, Garey and Johnson] –reduce from SIMPLE MAX CUT –reduce MAXIMUM 2-SAT to SIMPLE MAX CUT –unbalanced partitioning is poly time Many heuristics/attacks
Penn ESE535 Spring DeHon 10 KLFM Partitioning Heuristic Greedy, iterative –pick a cell whose move decreases the cut and move it –repeat Small amount of non-greediness: –look past moves that make things locally worse –randomization
Penn ESE535 Spring DeHon 11 Fiduccia-Mattheyses (Kernighan-Lin refinement) Start with two halves (random split?) Repeat until no updates –Start with all cells free –Repeat until no cells free Move the cell with the largest gain (that balance allows) Update costs of neighbors Lock the cell in place (record current cost) –Pick the least-cost point in the previous sequence and use it as the next starting position Repeat for different random starting points (pass structure sketched below)
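A minimal Python sketch of this pass structure; best_free_cell and apply_move are assumed helpers, not the slides' code:

```python
def fm_pass(cells, partition, cost):
    """One FM inner pass (sketch): move every cell once, best gain first,
    then roll back to the lowest-cost prefix of the move sequence."""
    free = set(cells)
    history = []                                      # (cell, cost after its move)
    while free:
        cell, gain = best_free_cell(free, partition)  # largest gain that keeps balance
        apply_move(cell, partition)                   # move cell, update neighbors' gains
        free.remove(cell)                             # lock the cell for the rest of the pass
        cost -= gain                                  # newcost = cost - gain
        history.append((cell, cost))
    # undo moves past the lowest-cost point in the sequence
    best = min(range(len(history)), key=lambda i: history[i][1])
    for cell, _ in reversed(history[best + 1:]):
        apply_move(cell, partition)                   # move back
    return history[best][1]
```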
Penn ESE535 Spring DeHon 12 Efficiency Tricks to make efficient: Expend little work picking move candidate –Constant work ≡ O(1) –Means amount of work not dependent on problem size Update costs on move cheaply [O(1)] Efficient data structure –update costs cheap –cheap to find next move
Penn ESE535 Spring DeHon 13 Ordering and Cheap Update Keep track of the net gain on each node == change in net crossings if the node is moved –cut cost after move = cost − gain Calculate a node's gain as the sum of net gains over all nets at that node –each node is involved in several nets Sort nodes by gain –avoid a full re-sort on every move
Penn ESE535 Spring DeHon 14 FM Cell Gains Gain = Delta in number of nets crossing between partitions = Sum of net deltas for nets on the node
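A hedged sketch of this gain computation as a sum of per-net deltas, assuming each net tracks how many of its cells sit on each side; structure names (cell.nets, net_count) are illustrative:

```python
def cell_gain(cell, side, net_count):
    """Gain of moving `cell` from partition `side` (0 or 1) to the other side.
    net_count[net][s] = number of cells of `net` currently in partition s."""
    gain = 0
    for net in cell.nets:
        if net_count[net][side] == 1:      # cell is the last one on its side: net becomes uncut
            gain += 1
        if net_count[net][1 - side] == 0:  # net currently entirely on cell's side: becomes cut
            gain -= 1
    return gain
```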
Penn ESE535 Spring DeHon 15 After moving a node? Update cost –newcost = cost − gain Also need to update gains –on all nets attached to the moved node –but moves are of nodes, so push the updates to all nodes on those nets
Penn ESE535 Spring DeHon 16 Composability of Net Gains [Figure: per-net deltas on an example node summing to its total gain]
Penn ESE535 Spring DeHon 17–22 FM Recompute Cell Gain For each net, keep track of the number of cells in each partition [F(net), T(net)] Move update (for each net on the moved cell): –if T(net)==0, increment gain on F side of net (think -1 => 0) –if T(net)==1, decrement gain on T side of net (think 1 => 0) –decrement F(net), increment T(net) –if F(net)==1, increment gain on the F cell –if F(net)==0, decrement gain on all cells (now on T) (update rules sketched in code below)
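The five rules above written out as a hedged Python sketch; cell.nets, net.cells, the gain map, and the locked map are assumed structures, and the moving cell is assumed already locked by the caller:

```python
def update_gains_on_move(cell, side, net_count, gain, locked):
    """Apply the FM delta rules when `cell` moves from F=side to T=1-side."""
    F, T = side, 1 - side
    for net in cell.nets:
        # before the move
        if net_count[net][T] == 0:            # net was uncut on F: every free F cell gains
            for c in net.cells:
                if not locked[c]:
                    gain[c] += 1
        elif net_count[net][T] == 1:          # the lone T cell no longer saves the cut
            for c in net.cells:
                if not locked[c] and c.side == T:
                    gain[c] -= 1
        net_count[net][F] -= 1
        net_count[net][T] += 1
        # after the move
        if net_count[net][F] == 0:            # net now uncut on T: every free cell loses
            for c in net.cells:
                if not locked[c]:
                    gain[c] -= 1
        elif net_count[net][F] == 1:          # lone remaining F cell could now uncut the net
            for c in net.cells:
                if not locked[c] and c.side == F:
                    gain[c] += 1
```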
Penn ESE535 Spring DeHon 23–28 FM Recompute (example) [Figure sequence: gain recomputation worked through on an example; note the markings here are deltas — the earlier pictures were absolutes]
Penn ESE535 Spring DeHon 29 FM Data Structures Partition counts A, B Two gain arrays –one per partition –key: constant-time cell update Cells –successors (consumers) –inputs –locked status Binned by cost –constant-time update (bucket structure sketched below)
Use FM to Partition Preclass Example Allow partition of size 5 Penn ESE535 Spring DeHon 30
Penn ESE535 Spring DeHon 31 FM Optimization Sequence (example) [figure]
Penn ESE535 Spring DeHon 32 FM Running Time? Randomly partition into two halves Repeat until no updates –Start with all cells free –Repeat until no cells free Move cell with largest gain Update costs of neighbors Lock cell in place (record current cost) –Pick least cost point in previous sequence and use as next starting position Repeat for different random starting points
Penn ESE535 Spring DeHon 33 FM Running Time Assume: –constant number of passes to converge –constant number of random starts N cell moves per pass Each move does O(K²) update work (assume at most K inputs per node, average fanout K) –for every net attached to the moved cell (K+1 nets) –for every node attached to those nets (O(K) per net) Maintain ordered list: O(1) per move –each affected gain moves up or down by 1 bin Running time: O(K²N) –algorithm significant for its speed (more than its quality)
Penn ESE535 Spring DeHon 34 FM Starts? [Figure: cut sizes over 21K random starts, 3K network — Alpert/Kahng] So, FM gives a not-bad solution quickly
Penn ESE535 Spring DeHon 35 Weaknesses? Local, incremental moves only –hard to move clusters –no lookahead –Stuck in local minima? Looks only at local structure
Penn ESE535 Spring DeHon 36 Improving FM Clustering Initial partitions Runs Partition size freedom Following comparisons from Hauck and Borriello ’96
Penn ESE535 Spring DeHon 37 Clustering Group several leaf cells together into a cluster Run partitioning on the clusters Uncluster (keep partitions) –iteratively Run partitioning again –using the prior result as the starting point instead of a random start (flow sketched below)
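A hedged sketch of this cluster/partition/uncluster flow; cluster, random_partition, fm_refine, and project are assumed helpers, not named in the slides:

```python
def multilevel_partition(netlist, levels):
    """Coarsen, partition the coarsest graph, then uncluster and refine at each level."""
    hierarchy = [netlist]
    for _ in range(levels):                      # coarsen by clustering
        hierarchy.append(cluster(hierarchy[-1]))
    part = random_partition(hierarchy[-1])       # partition the coarsest graph
    part = fm_refine(hierarchy[-1], part)
    for level in reversed(hierarchy[:-1]):       # uncluster, keep partitions, refine
        part = project(part, level)              # expand clusters back to their members
        part = fm_refine(level, part)            # prior result as the starting point
    return part
```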
Penn ESE535 Spring DeHon 38 Clustering Benefits Catch local connectivity which FM might miss –moving one element at a time, it is hard to move whole connected groups across the partition Faster (smaller N) –METIS, the fastest research partitioner, exploits this heavily
Penn ESE535 Spring DeHon 39 How to Cluster? Random –cheap, some benefit to speed Greedy “connectivity” –examine cells in random order –cluster each to its most highly connected neighbor –30% better cut, 16% faster than random (sketched below) Spectral (next week) –look for clusters in the placement –(ratio-cut like) Brute-force connectivity (can be O(N²))
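A hedged sketch of greedy connectivity clustering: visit cells in random order and merge each into the neighboring cluster it shares the most nets with. The size cap and data structures are assumptions, not the Hauck/Borriello formulation:

```python
import random

def greedy_connectivity_cluster(cells, max_size=4):
    """Return a set of clusters (frozensets of cells)."""
    cluster = {c: frozenset([c]) for c in cells}   # cell -> its current cluster
    for c in random.sample(list(cells), len(cells)):
        # tally how many nets c shares with each other cluster
        shared = {}
        for net in c.nets:
            for n in net.cells:
                if cluster[n] is not cluster[c]:
                    shared[cluster[n]] = shared.get(cluster[n], 0) + 1
        if not shared:
            continue
        best = max(shared, key=shared.get)         # most highly connected cluster
        if len(best) + len(cluster[c]) <= max_size:
            merged = best | cluster[c]
            for m in merged:
                cluster[m] = merged                # merge the two clusters
    return set(cluster.values())
```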
Penn ESE535 Spring DeHon 40 Initial Partitions? Random Pick a random node for one side –start imbalanced –run FM from there Pick a random node and breadth-first search to fill one half Pick a random node and depth-first search to fill one half Start with a spectral partition
Penn ESE535 Spring DeHon 41 Initial Partitions If run several times –pure random tends to win out –more freedom / variety of starts –more variation from run to run –others get trapped in local minima
Penn ESE535 Spring DeHon 42–43 Number of Runs [Figure residue; legible fragments: “20 <20%”, “50 < 22%”; 21K random starts, 3K network — Alpert/Kahng] …but?
Penn ESE535 Spring DeHon 44 Unbalanced Cuts Increasing slack in partitions –may allow lower cut size
Penn ESE535 Spring DeHon 45 Unbalanced Partitions Following comparisons from Hauck and Borriello ’96 Small/large refers to benchmark size, not small/large partition IO.
Penn ESE535 Spring DeHon 46 Partitioning Summary Decompose problem Find locality NP-complete problem –linear-time heuristic (KLFM) –many ways to tweak –Hauck/Borriello, Karypis
Penn ESE535 Spring DeHon 47 Today’s Big Ideas: Divide-and-Conquer Exploit Structure –Look for sparsity/locality of interaction Techniques: –greedy –incremental improvement –randomness avoid bad cases, local minima –incremental cost updates (time cost) –efficient data structures
Penn ESE535 Spring DeHon 48 Admin Reading for Monday online Assignment 2A due on Monday