CALTECH CS137 Winter DeHon 1 CS137: Electronic Design Automation Day 8: January 27, 2006 Cellular Placement
Today: Problem; Parallelism; Cellular Automata; Idea; Details (avoid local minima, update locations); Results; Directions. Primary sources: Wrighton & DeHon, FPGA 2003; Wrighton, MS Thesis, 2003.
Placement. Problem: pick locations for all building blocks, minimizing energy, delay, and area. Really: minimize wire length and minimize channel density. Surrogates: minimize squared wire length; minimize bounding box.
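As a small illustration of the two surrogates, the sketch below scores one net, given as a list of (x, y) pin locations, both by squared source-to-sink wire length and by the half-perimeter of its bounding box. Treating the first pin as the source and the function names themselves are assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[int, int]

def squared_wirelength(pins: List[Point]) -> int:
    """Sum of squared source-to-sink distances; pins[0] is assumed to be the source."""
    sx, sy = pins[0]
    return sum((sx - x) ** 2 + (sy - y) ** 2 for x, y in pins[1:])

def bbox_half_perimeter(pins: List[Point]) -> int:
    """Half-perimeter of the bounding box enclosing every pin of the net."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

net = [(0, 0), (3, 1), (1, 4)]
print(squared_wirelength(net))   # (9 + 1) + (1 + 16) = 27
print(bbox_half_perimeter(net))  # 3 + 4 = 7
```

The squared form rewards pulling distant sinks in hardest. The bounding box depends only on the extreme pins of the net.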
Parallelism: What parallelism exists in placement? Evaluate costs of prospective moves: one set to many prospective locations, or many moves each to a single location. Perform moves.
Cellular Automata. Basic idea: a regular array of identical cells with nearest-neighbor communication.
CA Model: On each cycle, each cell exchanges values with its neighbors, then updates its state/value based on its own state and that of its neighbors (e.g., Conway’s LIFE).
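The synchronous update rule can be sketched with Conway’s LIFE itself. Each cell reads its eight neighbors from the previous grid, and then every cell's new value is written at once. Treating cells beyond the edge of a finite grid as dead is an assumption of this sketch.

```python
from typing import List

Grid = List[List[int]]

def life_step(grid: Grid) -> Grid:
    """One synchronous CA cycle: every cell reads its 8 neighbors, then all cells update at once."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count live neighbors; cells beyond the edge are treated as dead.
            live = sum(
                grid[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols
            )
            # Conway's rules: survive with 2 or 3 neighbors, birth with exactly 3.
            new[r][c] = 1 if live == 3 or (grid[r][c] and live == 2) else 0
    return new

blinker = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(life_step(blinker))  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```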
Cellular Automata. Physical advantages: no long wires; area linear in the number of nodes; minimum delay, so a small cycle time; good scaling properties.
System Architecture Taxonomy (Subject to continuing refinement and embellishment)
CA Placement: Can we perform placement in a CA?
Mapping: Each cell is a physical placement location; its state is the logical node assigned to that cell. Assume: the cell knows its own location; the state knows the locations of its connected nodes.
Costs. Assume: the cell knows its own location; the state knows the locations of its connected nodes. The cell computes the cost of its node at that location.
Moves: Two adjacent cells can exchange graph nodes.
Moves: Evaluate the goodness of a proposed swap. Each cell considers the impact of its graph node being in the other cell; keep the swap if it reduces cost.
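Here is a minimal sketch of the swap test between two adjacent cells. It assumes squared wire length as the cost and assumes each node carries a list of its neighbors' (x, y) locations. Each cell prices its own node at home and at the partner cell, and the swap is kept only if the total drops. For simplicity, this sketch gives no special treatment to a net that connects the two swapping nodes to each other.

```python
from typing import List, Tuple

Point = Tuple[int, int]

def node_cost(loc: Point, neighbor_locs: List[Point]) -> int:
    """Squared wire length from a node at `loc` to each node it connects to."""
    x, y = loc
    return sum((x - nx) ** 2 + (y - ny) ** 2 for nx, ny in neighbor_locs)

def should_swap(loc_a: Point, nbrs_a: List[Point], loc_b: Point, nbrs_b: List[Point]) -> bool:
    """Cells A and B each price their node at home and at the partner's location;
    keep the swap only if the combined cost drops."""
    before = node_cost(loc_a, nbrs_a) + node_cost(loc_b, nbrs_b)
    after = node_cost(loc_b, nbrs_a) + node_cost(loc_a, nbrs_b)
    return after < before

# Node in cell (0,0) is pulled toward (5,0); node in cell (1,0) toward (-3,0).
print(should_swap((0, 0), [(5, 0)], (1, 0), [(-3, 0)]))  # True
```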
Move Costs: Only really need to evaluate the delta cost. With cost $(\mathrm{src}.x - \mathrm{sink}.x)^2$, moving the sink gives $\frac{d}{d\,\mathrm{sink}.x} = -2(\mathrm{src}.x - \mathrm{sink}.x)$, so the delta move cost is linear in distance.
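The same conclusion holds for a discrete one-square move. Write $d = \mathrm{src}.x - \mathrm{sink}.x$ and assume $d > 0$, so that moving the sink one square toward the source reduces the separation to $d - 1$:

```latex
\begin{aligned}
C(d) &= d^2 \\
\Delta C &= C(d-1) - C(d) = (d-1)^2 - d^2 = -2d + 1
\end{aligned}
```

The change for a unit move therefore grows only linearly with the current distance. It matches the derivative $-2(\mathrm{src}.x - \mathrm{sink}.x)$ up to the constant $+1$.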
Parallel Swaps: Pair up cells and perform N/2 swaps in parallel.
Movement: Alternate pairings among N, S, E, W neighbors so nodes can move in any direction.
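One way to generate alternating pairings is a hypothetical four-phase schedule: even-x east-west pairs, odd-x east-west pairs, then even-y and odd-y north-south pairs. Within a phase the pairs are disjoint, so all of their swaps can proceed at once. The phase ordering is an assumption of this sketch, not necessarily the one used in the original hardware.

```python
from typing import List, Tuple

Cell = Tuple[int, int]

def pairings(width: int, height: int, phase: int) -> List[Tuple[Cell, Cell]]:
    """Disjoint neighbor pairs for one phase of a hypothetical 4-phase schedule.

    Phase 0: (even x, x+1) east-west pairs; phase 1: (odd x, x+1);
    phase 2: (even y, y+1) north-south pairs; phase 3: (odd y, y+1).
    """
    pairs = []
    horizontal = phase in (0, 1)
    offset = phase % 2
    for y in range(height):
        for x in range(width):
            if horizontal and x % 2 == offset and x + 1 < width:
                pairs.append(((x, y), (x + 1, y)))
            elif not horizontal and y % 2 == offset and y + 1 < height:
                pairs.append(((x, y), (x, y + 1)))
    return pairs

print([len(pairings(4, 4, p)) for p in range(4)])  # [8, 4, 8, 4]
```

Cycling through all four phases lets a node drift one square per phase in any direction.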
Basic Idea: Pair up PEs; compute the impact of swaps in parallel; perform swaps in parallel; repeat until converged.
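The loop above is sketched below for a one-dimensional row of PEs. This is a simplification: the 1-D row, the squared-distance cost, and the round limit are all assumptions. Every pair is evaluated against the same snapshot of positions, and all beneficial swaps are then applied together. When a pair evaluates its swap, the partner node is viewed at its post-swap slot.

```python
from typing import Dict, List, Optional, Tuple

def cost_of(node: int, slot: int, pos: Dict[int, int], adj: Dict[int, List[int]],
            override: Optional[Dict[int, int]] = None) -> int:
    """Squared distance from `slot` to every neighbor of `node`.

    `override` replaces snapshot positions, so the swap partner is seen at
    its post-swap slot."""
    override = override or {}
    return sum((slot - override.get(m, pos[m])) ** 2 for m in adj[node])

def place_1d(order: List[int], edges: List[Tuple[int, int]], max_rounds: int = 100) -> List[int]:
    """Greedy parallel-swap placement on a 1-D row of PEs (illustrative sketch).

    order[i] is the node held by PE i. Each phase pairs PE i with PE i+1
    (even i, then odd i), evaluates all swaps against one shared snapshot,
    then applies every beneficial swap at once.
    """
    adj: Dict[int, List[int]] = {n: [] for n in order}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    order = list(order)
    for _ in range(max_rounds):
        moved = False
        for offset in (0, 1):
            pos = {n: i for i, n in enumerate(order)}  # snapshot shared by all pairs
            swaps = []
            for i in range(offset, len(order) - 1, 2):
                a, b = order[i], order[i + 1]
                before = cost_of(a, i, pos, adj) + cost_of(b, i + 1, pos, adj)
                after = cost_of(a, i + 1, pos, adj, {b: i}) + cost_of(b, i, pos, adj, {a: i + 1})
                if after < before:
                    swaps.append(i)
            for i in swaps:  # perform all accepted swaps "in parallel"
                order[i], order[i + 1] = order[i + 1], order[i]
            moved = moved or bool(swaps)
        if not moved:
            break  # converged: a full round produced no swaps
    return order

print(place_1d([0, 2, 1, 3], [(0, 1), (1, 2), (2, 3)]))  # [0, 1, 2, 3]
```

Because swaps in different pairs are judged on the same snapshot, this greedy version can still get stuck or oscillate. The round limit guards against the second case.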
Problems/Details: Greedy swaps: local minima? How do we update the locations of neighbors, when they are moving, too?
Avoid Greedy: Insert randomness in swaps (as in simulated annealing) to shake up the system and get out of local minima. Swap if we randomly decide to swap OR the swap is beneficial. Change swap thresholds over time.
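The swap rule can be written as a two-clause test: swap when beneficial, or when a random draw falls below a threshold. In this sketch the threshold is a hypothetical probability `p_random` that decays linearly over the run. The real threshold schedule is not given here, so this linear decay is only a stand-in.

```python
import random

def accept_swap(delta_cost: float, p_random: float, rng: random.Random) -> bool:
    """Swap if beneficial, OR if a random draw says to swap anyway."""
    return delta_cost < 0 or rng.random() < p_random

def p_random_schedule(step: int, total_steps: int, p_start: float = 0.3) -> float:
    """Hypothetical linear schedule: random-swap probability decays to 0 over the run."""
    return p_start * max(0.0, 1.0 - step / total_steps)

rng = random.Random(42)
# Count how often a harmful swap (delta_cost = +1) is taken early vs. late in the run.
uphill_early = sum(accept_swap(1.0, p_random_schedule(0, 100), rng) for _ in range(1000))
uphill_late = sum(accept_swap(1.0, p_random_schedule(99, 100), rng) for _ in range(1000))
print(uphill_early, uphill_late)
```

Early in the run, many harmful swaps are accepted, which shakes the system out of local minima. Late in the run, the rule becomes almost purely greedy.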
Swap?
Impact of Randomness
Range Limiting: Eguro, Hauck, & Sharma, DAC 2005
Local Swaps Only: Assume there’s an ideal location. Each node takes a biased random walk away from its minimum-cost location, which gives the node a distribution function around that location. If it wanders into a better “minimum cost” home, it then wanders around the new center point. Decreasing temperature restricts the effective radius of the walk.
Local Swap Random Walk: Decreasing temperature restricts the effective radius of the walk.
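Here is a rough simulation of the random-walk picture. It assumes a single node on an integer line whose cost is $x^2$ around an ideal spot at 0. It also assumes the standard Metropolis rule (always take downhill steps; take uphill steps with probability $e^{-\Delta/T}$) in place of the slides' swap thresholds. The mean distance from the ideal spot should shrink as the temperature drops.

```python
import math
import random

def mean_distance(temperature: float, steps: int = 20000, seed: int = 1) -> float:
    """Metropolis-style walk of one node on an integer line around its ideal spot x = 0.

    Cost is x**2 (an assumed stand-in for wire cost). Downhill steps are always
    taken; uphill steps are taken with probability exp(-delta / T).
    """
    rng = random.Random(seed)
    x, total = 0, 0
    for _ in range(steps):
        x_new = x + rng.choice((-1, 1))  # local move: one square left or right
        delta = x_new * x_new - x * x
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x = x_new
        total += abs(x)
    return total / steps

for t in (100.0, 10.0, 1.0):
    print(t, round(mean_distance(t), 2))
```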
How to update locations? Broadcast? Pipelined ring? Send to neighbors (routing network? tree?)? For whom: everyone? Only things moved? Only things moved a lot?
Simple Solution: Ring. Drop the value in the ring, shift it around the entire array, and everyone listens for updates.
Simple Solution: Ring. Weakness? It is serial: N cycles to complete. N/2 swaps take O(1), then O(N) to update?
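A small simulation makes the linear cost visible. In this sketch each PE drops one (node, new location) update into a shift ring. Every cycle the ring advances one position, and every PE copies whatever update passes by into its local table. The data layout is an assumption for illustration, and node names are assumed to be distinct.

```python
from typing import Dict, List, Tuple

def ring_broadcast(updates: List[Tuple[str, int]]) -> Tuple[int, List[Dict[str, int]]]:
    """Each PE i drops updates[i] = (node, new_location) into a shift ring.

    Every cycle the ring shifts by one position and each PE copies the update
    now sitting at it into its local location table. Returns the number of
    shift cycles needed before every PE has heard every update.
    """
    n = len(updates)
    tables: List[Dict[str, int]] = [dict([u]) for u in updates]  # each PE knows its own update
    ring = list(updates)  # ring[i] is the value currently sitting at PE i
    cycles = 0
    while any(len(t) < n for t in tables):
        ring = [ring[-1]] + ring[:-1]  # shift one position around the ring
        cycles += 1
        for i, (node, loc) in enumerate(ring):
            tables[i][node] = loc
    return cycles, tables

cycles, tables = ring_broadcast([("a", 3), ("b", 7), ("c", 1), ("d", 0)])
print(cycles)  # 3, i.e. N - 1 shifts for N = 4 PEs
```

The number of shifts grows linearly with N, while the swaps themselves take O(1).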
Simple Solution: Ring. Linear update is bad. Idea: allow staleness. Things move slowly, so an estimate of position is not that bad, and continued operation will correct it.
Algorithm
Algorithm: Update Locations
Algorithm: Try Moves
Quality vs. Parameters
Iso-Quality: Pick the point on the iso-quality curve that minimizes time.
FPGA Implementation: Virtex-E (180 nm); 10 ns cycle (100 MHz); 150 cycles for a 4-phase swap (~40 cycles/swap); 400 LUTs per placement engine. Compared against a 2.2 GHz Intel Xeon (512 KB L2).
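The per-swap figure follows from dividing the 4-phase cost by four. The wall-clock time assumes the 10 ns cycle applies throughout:

```latex
\frac{150~\text{cycles}}{4~\text{phases}} = 37.5~\text{cycles/swap} \approx 40,
\qquad
150~\text{cycles} \times 10~\text{ns} = 1.5~\mu\text{s per 4-phase swap round}
```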
Results
Tuning Quality
Scaling: Processor (VPR) cycles: $O(N^{4/3})$. Systolic cycles: $O(N^{1/2})$, assuming geometric refinement and $O(N^{1/2})$ update; $O(N^{5/6})$ with mesh sort and the same number of swaps as VPR ($N^{4/3}/N^{1/2}$).
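The exponent arithmetic behind the mesh-sort estimate can be checked directly. One reading, which is an interpretation rather than something stated above, is that each round performs $O(N)$ swaps and pays $O(N^{1/2})$ for the mesh-sort update. Under that reading, $N^{4/3}$ total swaps cost:

```latex
\underbrace{\frac{N^{4/3}}{N}}_{\text{rounds}} \cdot \underbrace{N^{1/2}}_{\text{time per round}}
= N^{4/3 - 1 + 1/2} = N^{5/6} = \frac{N^{4/3}}{N^{1/2}}
```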
Scaling: Also includes technology scaling.
Variations: Update schemes; cost functions; larger bins than PEs.
Update Scheme: Tree. Build a reduce tree (H-tree). Route to the root in $O(N^{1/2})$ time; route from the root to the leaves in $O(N^{1/2})$ time. Pipeline it: same bandwidth as the ring (one update per cycle), but less staleness (only $O(N^{1/2})$).
Reducing Broadcast (Idea 1): Don’t update things that haven’t moved (much), or things that move and move back before the broadcast. Keep track of staleness (how far a node has moved since its last broadcast) and give priority to the stalest data: max staleness wins at each tree stage, with ties broken randomly.
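The priority rule can be modeled as a pairwise reduction up a binary tree. In this sketch each leaf holds either nothing, for a node that has not moved, or a (node, staleness) pair. Each stage keeps the staler of its two inputs and breaks ties with a coin flip. Representing staleness as a plain distance number is an assumption.

```python
import random
from typing import List, Optional, Tuple

Update = Tuple[int, float]  # (node id, staleness = distance moved since last broadcast)

def pick(a: Optional[Update], b: Optional[Update], rng: random.Random) -> Optional[Update]:
    """One tree-stage comparator: keep the stalest update, break ties randomly."""
    if a is None or b is None:
        return a if b is None else b
    if a[1] != b[1]:
        return a if a[1] > b[1] else b
    return a if rng.random() < 0.5 else b

def reduce_tree(leaves: List[Optional[Update]], rng: random.Random) -> Optional[Update]:
    """Pairwise reduction up a binary tree; the root value is broadcast next."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(None)  # pad odd levels
        level = [pick(level[i], level[i + 1], rng) for i in range(0, len(level), 2)]
    return level[0] if level else None

staleness = [0.0, 2.0, 0.0, 5.0, 1.0, 5.0]
leaves = [(n, s) if s > 0 else None for n, s in enumerate(staleness)]  # unmoved nodes stay silent
winner = reduce_tree(leaves, random.Random(0))
print(winner)  # node 3 or node 5, both with staleness 5.0
```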
Reducing Broadcast (Idea 2): Update locally. We don’t need to know if someone far away moved by one square, but we do need to know if a near neighbor did. Multigrid/multiscale scheme: only alert nodes in the same subtree; when a node changes subtrees at a level, alert all nodes underneath.
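One possible reading of the multiscale rule uses quadtree addressing. At level k, a cell belongs to the 2^k x 2^k block (x >> k, y >> k). A move then alerts the nodes in the smallest block that contains both the old and the new location. Both the addressing and this alert rule are assumptions of the sketch.

```python
from typing import Tuple

Cell = Tuple[int, int]

def alert_region(old: Cell, new: Cell) -> Tuple[int, Cell]:
    """Return (level, subtree id) of the smallest quadtree subtree containing both
    the old and new locations; nodes inside it are the ones to alert.

    Level k groups cells into 2^k x 2^k blocks identified by (x >> k, y >> k).
    """
    level = 0
    while (old[0] >> level, old[1] >> level) != (new[0] >> level, new[1] >> level):
        level += 1
    return level, (new[0] >> level, new[1] >> level)

def in_region(cell: Cell, level: int, subtree: Cell) -> bool:
    """True if `cell` lies inside the given quadtree block."""
    return (cell[0] >> level, cell[1] >> level) == subtree

print(alert_region((0, 0), (1, 0)))  # (1, (0, 0)): alert only the 2x2 block
print(alert_region((1, 1), (2, 1)))  # (2, (0, 0)): crossed a boundary, alert the 4x4 block
```

Small moves inside a block stay local. A move that crosses a higher-level boundary reaches more nodes.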
Update Scheme: Mesh Route. A permutation can be routed in $O(N^{1/2})$ time on a mesh. Build mesh switching; make $O(N)$ swaps, then take $O(N^{1/2})$ time moving/updating. This becomes full simulated annealing (i.e., not just local swaps).
Cost Functions
Cost Functions: Bounding box, 2-phase update. Phase 1: alert the source to the locations of all sinks. Phase 2: the source communicates the bbox extents to all sinks.
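The two phases can be sketched as a gather followed by a scatter. In phase 1 the source collects the sink locations and forms the box. In phase 2 it hands the extents back so each sink can price its own moves. The helper `hpwl_if_moved` is hypothetical. It only stretches the box, because extents alone cannot tell a sink whether the box would shrink if it moved inward.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
BBox = Tuple[int, int, int, int]  # (xmin, xmax, ymin, ymax)

def phase1_collect(source: Cell, sinks: List[Cell]) -> BBox:
    """Phase 1: every sink reports its location to the source, which forms the net's bounding box."""
    xs = [source[0]] + [x for x, _ in sinks]
    ys = [source[1]] + [y for _, y in sinks]
    return min(xs), max(xs), min(ys), max(ys)

def phase2_distribute(bbox: BBox, sinks: List[Cell]) -> Dict[Cell, BBox]:
    """Phase 2: the source sends the extents back, so each sink holds a copy."""
    return {s: bbox for s in sinks}

def hpwl_if_moved(bbox: BBox, new: Cell) -> int:
    """Half-perimeter after stretching the box to cover `new`.
    Upper bound only: extents alone cannot reveal whether the box would shrink."""
    xmin, xmax, ymin, ymax = bbox
    x, y = new
    return (max(xmax, x) - min(xmin, x)) + (max(ymax, y) - min(ymin, y))

sinks = [(2, 1), (1, 3)]
bbox = phase1_collect((0, 0), sinks)
copies = phase2_distribute(bbox, sinks)
print(bbox, hpwl_if_moved(copies[(2, 1)], (4, 1)))  # (0, 2, 0, 3) 7
```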
Timing. Linear update: topologically order the netlist, use the tree to distribute updates, and send updates in netlist order to get delay in one pass. Mesh: compute directly with dataflow-style spreading activation (wait for all inputs, then send output).
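Spreading activation can be sketched in software as a counter per node. A node fires once all of its inputs have arrived, and its output time is then forwarded to its fanout. This sketch assumes Manhattan distance for the wire delay and assumes a fixed `gate_delay` added at every node, including primary inputs. Both are simplifications chosen for illustration.

```python
from collections import deque
from typing import Dict, List, Tuple

Cell = Tuple[int, int]

def arrival_times(loc: Dict[str, Cell], fanin: Dict[str, List[str]], gate_delay: int = 1) -> Dict[str, int]:
    """Dataflow-style spreading activation: a node fires once all of its inputs
    have arrived; its output time is max(input time + wire delay) + gate_delay.
    Wire delay is assumed to be the Manhattan distance between placed locations."""
    fanout: Dict[str, List[str]] = {n: [] for n in loc}
    for n, ins in fanin.items():
        for i in ins:
            fanout[i].append(n)
    waiting = {n: len(fanin.get(n, [])) for n in loc}
    arrive = {n: 0 for n in loc}
    ready = deque(n for n, k in waiting.items() if k == 0)  # primary inputs fire first
    done: Dict[str, int] = {}
    while ready:
        n = ready.popleft()
        done[n] = arrive[n] + gate_delay
        for m in fanout[n]:
            wire = abs(loc[n][0] - loc[m][0]) + abs(loc[n][1] - loc[m][1])
            arrive[m] = max(arrive[m], done[n] + wire)
            waiting[m] -= 1
            if waiting[m] == 0:  # all inputs present: m may fire
                ready.append(m)
    return done

loc = {"a": (0, 0), "b": (3, 0), "c": (1, 1)}
print(arrival_times(loc, {"c": ["a", "b"]}))  # {'a': 1, 'b': 1, 'c': 5}
```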
Bins
Node Bins: Keep more than one graph node per PE. Each step, locally swap one node from each PE’s node set: the one with the largest benefit? Or select randomly based on cost/benefit, like rejectionless annealing?
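The two selection options can be contrasted in a few lines. The input is a hypothetical table mapping each node in a PE's bin to the benefit of moving it to the neighbor PE. The greedy option takes the best node. The randomized option samples nodes with weight $e^{\text{benefit}/T}$. That weighting is an assumed choice in the spirit of rejectionless annealing, not the exact rule from the slides.

```python
import math
import random
from typing import Dict

def pick_greedy(benefit: Dict[str, float]) -> str:
    """Option 1: offer the node whose move to the neighbor PE helps most."""
    return max(benefit, key=benefit.get)

def pick_weighted(benefit: Dict[str, float], temperature: float, rng: random.Random) -> str:
    """Option 2: sample a node with probability proportional to exp(benefit / T),
    so better candidates are favored but any node can still be chosen."""
    nodes = list(benefit)
    top = max(benefit.values())
    weights = [math.exp((benefit[n] - top) / temperature) for n in nodes]  # shifted to avoid overflow
    return rng.choices(nodes, weights=weights, k=1)[0]

bin_benefit = {"n1": -2.0, "n2": 3.0, "n3": 0.5}
rng = random.Random(7)
picks = [pick_weighted(bin_benefit, 1.0, rng) for _ in range(2000)]
print(pick_greedy(bin_benefit), {n: picks.count(n) for n in bin_benefit})
```

Because every candidate in the bin gets a nonzero chance, no proposal is wasted on a rejected move. That is the appeal of rejectionless schemes.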
Admin: Parallel prefix familiarity? Due today: literature review. There is class on Monday.