CALTECH CS137 Winter2006 -- DeHon 1 CS137: Electronic Design Automation Day 8: January 27, 2006 Cellular Placement.

Presentation transcript:

CALTECH CS137 Winter DeHon 1 CS137: Electronic Design Automation Day 8: January 27, 2006 Cellular Placement

CALTECH CS137 Winter DeHon 2 Today Problem Parallelism Cellular Automata Idea Details –Avoid Local Minima –Update locations Results Directions Primary Sources –Wrighton & DeHon, FPGA 2003 –Wrighton, MS Thesis, 2003

CALTECH CS137 Winter DeHon 3 Placement Problem: Pick locations for all building blocks –minimizing energy, delay, area –really: minimize wire length, minimize channel density –surrogates: minimize squared wire length, minimize bounding box

CALTECH CS137 Winter DeHon 4 Parallelism What parallelism exists in placement? –Evaluate costs of prospective moves One set to many prospective locations Many moves each to single location –Perform moves

CALTECH CS137 Winter DeHon 5 Cellular Automata Basic idea: regular array of identical cells with nearest-neighbor communication

CALTECH CS137 Winter DeHon 6 CA Model On each cycle: –Each cell exchanges values with neighbors –Updates state/value based on own state and that of neighbors –E.g. Conway’s LIFE
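
To make the update rule concrete, here is a minimal sketch of one synchronous CA cycle, using Conway's LIFE as the example. The function name `life_step` and the choice to treat cells beyond the array edge as dead are assumptions for illustration. Note that LIFE reads all eight surrounding cells, slightly more than strict N/S/E/W communication.

```python
def life_step(grid):
    """One synchronous update of Conway's LIFE on a fixed grid of 0/1 values.

    Cells beyond the edge are treated as dead (an assumed boundary rule).
    """
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Each cell reads only the state of its eight adjacent cells.
            live = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (dr, dc) != (0, 0):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols:
                            live += grid[rr][cc]
            # New state depends on own state plus neighbor count.
            survives = grid[r][c] == 1 and live == 2
            nxt[r][c] = 1 if live == 3 or survives else 0
    return nxt


blinker = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(life_step(blinker))
```

Every cell computes its next state from the old grid, so all cells could update in the same cycle in hardware.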

CALTECH CS137 Winter DeHon 7 Cellular Automata Physical Advantage: –No long wires Area linear in number of nodes Minimum delay → small cycle time Good scaling properties

CALTECH CS137 Winter DeHon 8 System Architecture Taxonomy (Subject to continuing refinement and embellishment)

CALTECH CS137 Winter DeHon 9 CA Placement Can we perform placement in a CA?

CALTECH CS137 Winter DeHon 10 Mapping Each cell is a physical placement location State is a logical node assigned to the cell Assume: –Cell knows own location –State knows location of connected nodes

CALTECH CS137 Winter DeHon 11 Costs Assume: –Cell knows own location –State knows location of connected nodes Cell computes: its cost at that location

CALTECH CS137 Winter DeHon 12 Moves Two adjacent cells can exchange graph nodes

CALTECH CS137 Winter DeHon 13 Moves Evaluate goodness of proposed swap –Each cell considers impact of its graph node being in the other cell –Keep if swap reduces cost

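CALTECH CS137 Winter DeHon 14 Move Costs Only really need to evaluate delta cost. Cost: $(\text{src}.x - \text{sink}.x)^2$. Moving sink: $\frac{d}{d(\text{sink}.x)}(\text{src}.x - \text{sink}.x)^2 = -2(\text{src}.x - \text{sink}.x)$. Delta move cost is linear in distance.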
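
A small worked sketch of the delta-cost idea for squared wire length. The helper names `sq_cost` and `delta_cost_x`, and the representation of connected nodes as a list of `(x, y)` tuples, are assumptions for illustration. For a unit step `s` in x, the exact change per connection is `2*s*(x - nx) + s*s`, which is linear in the distance to the connected node.

```python
def sq_cost(pos, neighbors):
    """Squared wire length from a node at pos to each connected node."""
    x, y = pos
    return sum((x - nx) ** 2 + (y - ny) ** 2 for nx, ny in neighbors)


def delta_cost_x(pos, neighbors, step):
    """Change in squared wire length if the node moves by `step` in x.

    Uses (x + s - nx)^2 - (x - nx)^2 = 2*s*(x - nx) + s^2, so no squaring
    of distances is needed to evaluate a move.
    """
    x, _ = pos
    return sum(2 * step * (x - nx) + step * step for nx, _ in neighbors)


pos, nbrs = (3, 2), [(0, 0), (5, 2)]
print(delta_cost_x(pos, nbrs, +1), sq_cost((4, 2), nbrs) - sq_cost(pos, nbrs))
```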

CALTECH CS137 Winter DeHon 15 Parallel Swaps Pair up and perform N/2 swaps in parallel

CALTECH CS137 Winter DeHon 16 Movement Alternate pairings with N, S, E, W neighbors → nodes can move in any direction
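
One possible way to generate the alternating pairings, sketched under the assumption of four phases (two horizontal offsets, two vertical offsets). The function name `pairings` and the phase numbering are hypothetical. Within a phase every cell appears in at most one pair, so all swaps in that phase can be evaluated and performed in parallel.

```python
def pairings(rows, cols, phase):
    """Return disjoint pairs of adjacent cells for one of four phases.

    Phases 0 and 1 pair each cell with its east neighbor, starting at an
    even or odd column. Phases 2 and 3 do the same with south neighbors.
    Cells left unpaired at an edge simply idle for that phase.
    """
    pairs = []
    if phase in (0, 1):
        for r in range(rows):
            for c in range(phase, cols - 1, 2):
                pairs.append(((r, c), (r, c + 1)))
    else:
        for r in range(phase - 2, rows - 1, 2):
            for c in range(cols):
                pairs.append(((r, c), (r + 1, c)))
    return pairs


for phase in range(4):
    print(phase, len(pairings(4, 4, phase)))
```

Cycling through the four phases lets a node drift east, west, north, or south over successive steps.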

CALTECH CS137 Winter DeHon 17 Basic Idea Pair up PEs Compute impact of swaps in parallel Perform swaps in parallel Repeat until converge

CALTECH CS137 Winter DeHon 18 Problems/Details Greedy swaps → local minima? How update location of neighbors? –…they are moving, too

CALTECH CS137 Winter DeHon 19 Avoid Greedy Insert randomness in swaps → Simulated Annealing Shake up system to get out of local minima Swap if –Randomly decide to swap –OR beneficial to swap Change swap thresholds over time
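
The swap rule above can be sketched as follows. The names `should_swap` and `threshold_schedule`, and the geometric decay of the random-swap probability, are assumptions for illustration. The slide only says that the thresholds change over time, not how.

```python
import random


def should_swap(delta_cost, threshold, rng=random):
    """Swap if beneficial, OR with probability `threshold` regardless of cost."""
    return delta_cost < 0 or rng.random() < threshold


def threshold_schedule(start, decay, steps):
    """Yield a geometrically decreasing random-swap probability.

    The geometric form is an assumed schedule, not taken from the slides.
    """
    t = start
    for _ in range(steps):
        yield t
        t *= decay


for t in threshold_schedule(0.5, 0.5, 3):
    print(t, should_swap(delta_cost=4, threshold=t))
```

With a threshold of 0 the rule reduces to the purely greedy swap. Larger thresholds accept more cost-increasing swaps.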

CALTECH CS137 Winter DeHon 20 Swap?

CALTECH CS137 Winter DeHon 21 Impact of Randomness

CALTECH CS137 Winter DeHon 22 Range Limiting Eguro, Hauck, & Sharma DAC 2005

CALTECH CS137 Winter DeHon 23 Local Swaps Only Assume there’s an ideal location Each node takes a biased Random Walk away from minimum cost location Gives node a distribution function around the minimum cost location If wander into a better “minimum cost” home, then wanders around new centerpoint Decreasing temperature restricts effective radius of walk

CALTECH CS137 Winter DeHon 24 Local Swap Random Walk Decreasing temperature restricts effective radius of walk

CALTECH CS137 Winter DeHon 25 How update locations? Broadcast? Pipelined Ring? Send to neighbors? –Routing network? Tree? For whom? –Everyone? Only things moved? Only things moved a lot?

CALTECH CS137 Winter DeHon 26 Simple Solution: Ring Drop value in ring Shift around entire array Everyone listens for updates

CALTECH CS137 Winter DeHon 27 Simple Solution: Ring Weakness? –Serial –N cycles to complete –N/2 swaps in O(1) –Then O(N) to update?

CALTECH CS137 Winter DeHon 28 Simple Solution: Ring Linear update bad Idea: allow staleness –Things move slowly –Estimate of position not that bad… –…and continued operation will correct…
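
A toy model of the ring update, to show where the linear cost comes from. The function name `ring_broadcast` and the `(id, value)` slot representation are hypothetical. Each PE drops its value into its own ring slot, all slots shift one position per cycle, and every PE records whatever passes by.

```python
def ring_broadcast(values):
    """Shift every PE's value around a ring until all PEs have heard all values.

    Returns (heard, cycles), where heard[i] maps source PE -> value as seen
    by PE i, and cycles is the number of shift steps taken.
    """
    n = len(values)
    slots = list(enumerate(values))           # slot i starts with PE i's value
    heard = [{i: values[i]} for i in range(n)]
    cycles = 0
    for _ in range(n - 1):
        slots = [slots[-1]] + slots[:-1]      # every slot advances one PE
        for pe, (src, val) in enumerate(slots):
            heard[pe][src] = val              # everyone listens for updates
        cycles += 1
    return heard, cycles


heard, cycles = ring_broadcast([10, 20, 30, 40])
print(cycles, heard[0])
```

In this model a full update takes N - 1 shifts, which is why the slides allow swapping to continue on stale positions instead of waiting for the ring to drain.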

CALTECH CS137 Winter DeHon 29 Algorithm

CALTECH CS137 Winter DeHon 30 Algorithm Update Locations

CALTECH CS137 Winter DeHon 31 Algorithm Try Moves

CALTECH CS137 Winter DeHon 32 Quality vs. Parameters

CALTECH CS137 Winter DeHon 33 Iso-Quality Pick point on Iso-Quality Curve that minimizes time

CALTECH CS137 Winter DeHon 34 FPGA Implementation Virtex E (180nm) 10ns cycle (100MHz) 150 cycles for 4-phase swap –(~40 cycles/swap) 400 LUTs / Placement Engine Comparing –2.2GHz Intel Xeon (L2 512KB)

CALTECH CS137 Winter DeHon 35 Results

CALTECH CS137 Winter DeHon 36 Tuning Quality

CALTECH CS137 Winter DeHon 37 Scaling Processor cycles $O(N^{4/3})$ –VPR Systolic cycles –$O(N^{1/2})$ – assume geometric refinement; $O(N^{1/2})$ update –$O(N^{5/6})$ – mesh sort, same number of swaps as VPR ($N^{4/3} / N^{1/2}$)
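
One way to read the mesh-sort figure, assuming each round performs $O(N)$ swaps and then spends $O(N^{1/2})$ cycles routing the resulting moves across the mesh. This round structure is an interpretation, not something stated on this slide.

```latex
\[
\underbrace{\frac{N^{4/3}}{N}}_{\text{rounds}}
\times
\underbrace{N^{1/2}}_{\text{cycles per round}}
= N^{1/3} \cdot N^{1/2}
= N^{5/6}
= \frac{N^{4/3}}{N^{1/2}}
\]
```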

CALTECH CS137 Winter DeHon 38 Scaling Also includes technology scaling

CALTECH CS137 Winter DeHon 39 Variations Update Schemes Cost Functions Larger bins than PEs

CALTECH CS137 Winter DeHon 40 Update Scheme: Tree Build Reduce Tree (H-Tree) Route to root in $O(N^{1/2})$ time Route from root to leaves in $O(N^{1/2})$ time Pipeline Same bandwidth as Ring (1/cycle) But less staleness (only $O(N^{1/2})$)

CALTECH CS137 Winter DeHon 41 Reducing Broadcast (Idea 1) Don’t update things that haven’t moved (much) –…or things that move and move back before broadcast Keep track of staleness –How far moved from last broadcast Give priority to stalest data Max staleness wins at each tree stage –Break ties with randomness
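
A minimal sketch of the "max staleness wins at each tree stage" selection, modeled as a pairwise reduction up a binary tree. The function name `stalest` and the `(staleness, node)` tuple format are assumptions for illustration, and the input must be non-empty.

```python
import random


def stalest(updates, rng=random):
    """Reduce (staleness, node) candidates pairwise up a binary tree.

    At each stage the candidate with larger staleness advances. Ties are
    broken randomly. An unpaired candidate advances unchallenged.
    """
    level = list(updates)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            a, b = level[i], level[i + 1]
            if a[0] != b[0]:
                nxt.append(a if a[0] > b[0] else b)
            else:
                nxt.append(rng.choice((a, b)))
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


print(stalest([(1, "a"), (5, "b"), (3, "c"), (2, "d"), (4, "e")]))
```

The winner at the root is the update that gets broadcast next, so nodes that have not moved since their last broadcast (staleness 0) lose to any node that has.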

CALTECH CS137 Winter DeHon 42 Reducing Broadcast (Idea 2) Update locally Don’t need to know if someone far away moved by 1 square …but need to know if near neighbor did Multigrid/multiscale scheme –Only alert nodes in same subtree –When change subtrees at a level, alert all nodes underneath

CALTECH CS137 Winter DeHon 43 Update Scheme: Mesh Route Can route a permutation in $O(N^{1/2})$ time on a mesh Build mesh switching Make $O(N)$ swaps Then take $O(N^{1/2})$ time moving/updating Becomes full simulated annealing –i.e. not just local swaps

CALTECH CS137 Winter DeHon 44 Cost Functions

CALTECH CS137 Winter DeHon 45 Cost Functions Bounding Box → 2 phase update –Phase 1: alert source to location of all sinks –Phase 2: source communicates bbox extents to all sinks
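
The two phases can be sketched for a single net as below. The function name `bbox_two_phase`, the `(xmin, ymin, xmax, ymax)` layout, and the use of half-perimeter as the bounding-box cost are assumptions for illustration.

```python
def bbox_two_phase(source, sinks):
    """Model the 2-phase bounding-box update for one net.

    Phase 1: the source learns every sink location.
    Phase 2: the source sends the bbox extents back to every sink.
    Returns (per-sink view of the bbox, half-perimeter cost).
    """
    pts = [source] + list(sinks)              # phase 1: gathered at the source
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    bbox = (min(xs), min(ys), max(xs), max(ys))
    half_perimeter = (bbox[2] - bbox[0]) + (bbox[3] - bbox[1])
    views = {sink: bbox for sink in sinks}    # phase 2: extents sent to sinks
    return views, half_perimeter


views, cost = bbox_two_phase((2, 2), [(0, 5), (4, 1)])
print(views, cost)
```

Once a sink holds the extents, it can tell locally whether a proposed move would grow or shrink the box.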

CALTECH CS137 Winter DeHon 46 Timing Linear Update: –Topological ordering of netlist –Use tree to distribute updates –Send updates in netlist order –→ get delay in one pass Mesh: –Compute directly with dataflow-style spreading activation Wait for all inputs; then send output
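
A sketch of why sending updates in topological (netlist) order yields the delay in a single pass. The function name `arrival_times`, the `preds` and `delay` dictionaries, and the fixed per-wire delays are assumptions for illustration. In a placer the wire delays would come from the current node locations. The example uses Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter


def arrival_times(preds, delay):
    """Compute the latest arrival time at each node in one ordered pass.

    preds maps node -> list of fan-in nodes.
    delay maps (src, dst) -> wire delay for that connection.
    """
    arrival = {}
    # static_order() yields every node after all of its predecessors, so
    # each node's inputs are final by the time the node is visited.
    for node in TopologicalSorter(preds).static_order():
        fanin = preds.get(node, ())
        arrival[node] = max(
            (arrival[p] + delay[(p, node)] for p in fanin), default=0
        )
    return arrival


preds = {"c": ["a", "b"], "d": ["c"]}
delay = {("a", "c"): 2, ("b", "c"): 5, ("c", "d"): 1}
print(arrival_times(preds, delay))
```

The "wait for all inputs; then send output" rule in the mesh variant computes the same values, with the ordering enforced by data arrival instead of by a global schedule.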

CALTECH CS137 Winter DeHon 47 Bins

CALTECH CS137 Winter DeHon 48 Node Bins Keep more than one graph node per PE Local swap of one node from each PE node set each step –One with largest benefit? –Randomly select based on cost/benefit? Like rejectionless annealing

CALTECH CS137 Winter DeHon 49 Admin Parallel Prefix familiarity? Due today: literature review There is class on Monday