CALTECH CS137 Winter2006 -- DeHon 1 CS137: Electronic Design Automation Day 2: January 6, 2006 Spatial Routing.

Presentation transcript:

CALTECH CS137 Winter DeHon 1 CS137: Electronic Design Automation Day 2: January 6, 2006 Spatial Routing

CALTECH CS137 Winter DeHon 2 Today Idea Challenges –Path Selection –Victimization –Allocation Methodology Quality, Timing Parallelism Mesh FPGA Implementation

CALTECH CS137 Winter DeHon 3 Global/Detail With limited switching (e.g. FPGA) –can represent routing graph exactly CS137a: Day22

CALTECH CS137 Winter DeHon 4 Pathfinder Review Key step: find shortest path from src to sink –Mark links by usage –Used links cost more –Shortest path tries to avoid them Negotiated Congestion w/ History –Increase cost of congested nodes –Adaptive cost … makes historically congested nodes expensive, so routes try to avoid them
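A minimal software sketch of the negotiated-congestion loop reviewed above, assuming a small directed graph given as an adjacency dict, unit base cost, and a capacity of one net per interior node. The cost form (base + history) × present-sharing penalty follows the published Pathfinder formulation; the function names, the `hist_gain` parameter, and the exact penalty expression are illustrative assumptions, not code from the lecture.

```python
import heapq

def shortest_path(graph, cost, src, dst):
    """Dijkstra over node costs; returns the node list src..dst, or None."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            path = [u]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v in graph.get(u, ()):
            nd = d + cost(v)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None

def negotiate(graph, nets, max_iters=20, hist_gain=1.0):
    """Re-route every net each iteration; congested nodes grow more expensive."""
    history = {}
    for _ in range(max_iters):
        usage = {}
        routes = {}
        for name, (src, dst) in nets.items():
            def cost(n):
                # (base + history) * present-sharing penalty
                return (1.0 + history.get(n, 0.0)) * (1.0 + usage.get(n, 0))
            path = shortest_path(graph, cost, src, dst)
            if path is None:
                return None
            routes[name] = path
            for n in path[1:-1]:  # interior nodes are the shared resources
                usage[n] = usage.get(n, 0) + 1
        congested = [n for n, u in usage.items() if u > 1]
        if not congested:
            return routes
        for n in congested:
            history[n] = history.get(n, 0.0) + hist_gain
    return None
```

Every iteration walks the whole graph once per net, which is the per-route search cost the hardware approach tries to avoid.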

CALTECH CS137 Winter DeHon 5 Slow? Why is routing slow? –Each route: search all possible paths from source to sink Number of paths expands as distance^2 Graph of network is MBs large –Large complicated data structure to walk –Won’t all fit in cache –Number of nets = Number of edges –Perform many iterations to converge

CALTECH CS137 Winter DeHon 6 Parallelism? Search all paths in parallel for a single route Search routes for multiple nets in parallel –Don’t overlap –Overlap?

CALTECH CS137 Winter DeHon 7 Initial Key Ideas Augment existing static network structure to route itself Use hardware to exploit parallelism in routing –Search all paths in parallel –Route multiple nets in parallel –Avoid walking irregular graph –Specialized/pipelined hardware at each switch Hardware can perform a route trial in 10s of cycles vs. 10K-100K cycles for software

CALTECH CS137 Winter DeHon 8 Hardware Route Search in Action (figure)

CALTECH CS137 Winter DeHon 9 Path Search Hardware

CALTECH CS137 Winter DeHon 10 Path Search Hardware Idea Existing paths already allocated Drive a one into search paths All free paths pass up

CALTECH CS137 Winter DeHon 11 Challenges How select among paths? What if there are no free paths? Can we work without Pathfinder’s history? How handle fanout? How handle allocation and victimization?

CALTECH CS137 Winter DeHon 12 Select Among Paths? Easy: Randomly –Use PRNG at xover switchbox Otherwise, need to represent costs…
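A small sketch of random selection at a crossover, assuming the PRNG is a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11). The slides only say "PRNG"; the LFSR choice, the function names, and the list-of-flags interface are assumptions made for illustration.

```python
def lfsr16(state):
    """One step of a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def pick_free_path(free, state):
    """Pick one index whose free flag is set.

    Returns (index or None, new LFSR state). The state must be nonzero.
    """
    candidates = [i for i, ok in enumerate(free) if ok]
    state = lfsr16(state)
    if not candidates:
        return None, state
    return candidates[state % len(candidates)], state
```

An LFSR is cheap in LUTs, which is why it is a plausible stand-in here; any other per-switch random source would fit the same interface.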

CALTECH CS137 Winter DeHon 13 No Paths? Try stealing a path (rip-up) → victimize existing path Which one? –Randomly select victim –History-free Pathfinder suggests: one with fewest nets shared with other routes → CountCost –CountNet: one which intersects fewest existing nets

CALTECH CS137 Winter DeHon 14 CountNet vs. CountCost CountCost: 6 CountNetCost: 1

CALTECH CS137 Winter DeHon 15 Implement Counting? Idea: Delay congested signal Free paths not delayed. Least congested signal arrives at xover first.

CALTECH CS137 Winter DeHon 16 CountNet Approximation Keeping track of which net uses a switch would be much more state/complicated Approximate CountNet by only delaying at conflicting switches

CALTECH CS137 Winter DeHon 17 Implement CountNet Approximation Allow signal to pass if it agrees with the switch setting.

CALTECH CS137 Winter DeHon 18 Cost is max of sides Also note: –Actual cost is max(src→xover, sink→xover) instead of sum
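The delay-based counting idea can be modeled in a few lines, assuming each candidate crossover is described by two lists of booleans (one per side) marking which switches conflict, and that a conflicting switch holds the search signal for one extra cycle. The names `side_delay`, `crossover_ready`, `first_crossover` and the `hop` and `penalty` values are hypothetical.

```python
def side_delay(switch_conflicts, hop=1, penalty=1):
    """Cycles for the search signal to climb one side of the tree."""
    return sum(hop + (penalty if c else 0) for c in switch_conflicts)

def crossover_ready(src_side, sink_side):
    """A crossover sees both signals only when the slower side arrives."""
    return max(side_delay(src_side), side_delay(sink_side))

def first_crossover(candidates):
    """candidates: dict name -> (src_side, sink_side); earliest-ready name wins."""
    return min(candidates, key=lambda k: crossover_ready(*candidates[k]))
```

Because both signals travel concurrently, two candidates with the same total number of conflicts can rank differently: the one whose conflicts are split evenly across the sides is ready sooner.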

CALTECH CS137 Winter DeHon 19 Algorithm Comparison – Random Netlist (plot: Total Channels vs. HSRA Array Size)

CALTECH CS137 Winter DeHon 20 How Improve? Apologize for lack of history? –Exploit fast –Try multiple starts and exploit randomness –Like multiple starts of FM

CALTECH CS137 Winter DeHon 21 Trading Routing Time for Quality

CALTECH CS137 Winter DeHon 22 Choosing the Right Victims

CALTECH CS137 Winter DeHon 23 CountNet CountNet  best of 20 starts.

CALTECH CS137 Winter DeHon 24 Hypergraphs (Fanout) Sequentially route each two-point net, trying to reuse as much as possible from existing allocated paths.

CALTECH CS137 Winter DeHon 27 Hypergraphs (Fanout) Sequentially route each two-point net, trying to reuse as much as possible from existing allocated paths.
Add a state bit at every switch
–Set when the switch is allocated during the current net search.
–Clear when we begin to route a new net.
Order the destinations associated with a single source.
For each destination,
–Search from sink as before (only from sink).
–At the switch, if the state bit is set and the sink side is congestion free, we have found an available path.
–Otherwise, drive ones into all available source paths and allocate a new path, like a standard route search.
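A simplified sketch of the state-bit reuse, assuming a single-channel binary tree with heap-style numbering (root is 1, the parent of node `i` is `i // 2`) and ignoring congestion entirely. The `marked` set plays the role of the per-switch state bits; `route_fanout` and the tree encoding are hypothetical and only illustrate how later sinks join an already allocated path.

```python
def up_path(node):
    """Nodes from `node` up to the root of a heap-numbered binary tree."""
    while node >= 1:
        yield node
        node //= 2

def route_fanout(source, sinks):
    """Route one multi-sink net; returns the set of tree nodes it occupies."""
    marked = set()  # state bits, cleared at the start of each net
    src_path = list(up_path(source))
    for sink in sinks:
        for node in up_path(sink):
            if node in marked:
                break  # state bit set: join the path allocated earlier
            marked.add(node)
            if node in src_path:
                # Crossover reached: allocate the source side below it.
                marked.update(src_path[:src_path.index(node)])
                break
    return marked
```

For example, with leaves numbered 8 to 15, `route_fanout(8, [12, 13])` allocates a full path for sink 12 and then adds only node 13 for the second sink, since its parent already has the state bit set.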

CALTECH CS137 Winter DeHon 30 High Fanout Nets Victimizing high fanout net will cause considerable re-route work Might want to penalize victimizing high fanout nets CountNetFanout? –Requires more state…expensive… Simple hack: lock high fanout nets against victimization –What’s a high fanout net? >10?

CALTECH CS137 Winter DeHon 31 Toronto20: Quality (table columns: Pathfinder, CountNet) alu apex apex bigkey98.01 clma des diffeq89.84 dsip98.10 elliptic ex ex5p frisc misex pdc s s s seq spla tseng Total

CALTECH CS137 Winter DeHon 32 So far All Quality –…haven’t dealt with all performance details Had basis for confidence in performance Wanted to make sure worthwhile first

CALTECH CS137 Winter DeHon 33 Hardware Allocation
Add all nets to R
While nets in R > 0 and routeTrial < RT_max
  For each unrouted net
    Find all possible routes
    If found possible routes
      Randomly select and allocate a route
    Else
      Select a route to victimize and allocate the route
  Endfor
  Adjust R
Endwhile
Idea: send a one down the selected path

CALTECH CS137 Winter DeHon 34 With Victimization
Add all nets to R
While nets in R > 0 and routeTrial < RT_max
  For each unrouted net
    Find all possible routes
    If found possible routes
      Randomly select and allocate a route
    Else
      Randomly select a route to victimize and allocate the route
  Endfor
  Adjust R
Endwhile
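The loop above can be exercised in software with a small sequential model, assuming each net comes with an explicit list of candidate routes (each a set of resource ids) and that victimizing a route rips up every net holding one of its resources. The function name `route_all`, the data layout, and the use of Python's `random` module in place of hardware PRNGs are assumptions for illustration.

```python
import random

def route_all(candidates, rt_max=100, seed=0):
    """candidates: dict net -> list of routes, each a set of resource ids.

    Returns dict net -> allocated route, or None if rt_max trials run out.
    """
    rng = random.Random(seed)
    owner = {}   # resource id -> net currently holding it
    chosen = {}  # net -> allocated route
    unrouted = list(candidates)  # R
    trial = 0
    while unrouted and trial < rt_max:
        trial += 1
        ripped = []
        for net in unrouted:
            routes = candidates[net]
            free = [r for r in routes if all(x not in owner for x in r)]
            if free:
                route = rng.choice(free)
            else:
                # No free route: pick one at random and rip up its holders.
                route = rng.choice(routes)
                for victim in {owner[x] for x in route if x in owner}:
                    for x in chosen.pop(victim):
                        del owner[x]
                    ripped.append(victim)
            for x in route:
                owner[x] = net
            chosen[net] = route
        unrouted = ripped  # "Adjust R": victims must be routed again
    return chosen if not unrouted else None
```

When two nets have only one shared route between them, the model thrashes until `rt_max` and returns `None`, which is the case the trial limit exists for.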

CALTECH CS137 Winter DeHon 35 Analysis Methodology Sequential version that does effectively the same thing (perhaps inefficiently) Count key operations/variables –Number of net searches –Number of victims Timing model for key operations Calculate Performance under various timing assumptions

CALTECH CS137 Winter DeHon 36 Timing Models
Hardware Timing
–Tpath = length of path ~= log(N)
–Tallocate ~= Tpath
–Tvictim ~= 4*Tpath
Software Timing
–Tallocate ~= Npathsw*(Tm+Tc+Twb+Ta)
–Tvictim ~= Npathsw*(Tm+Tc) + V*Tallocate
Tm = main memory ref; Tc = cache ref; Twb = write buffer; Ta = bit alloc
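The timing models translate directly into two small functions. This is a sketch under stated assumptions: the logarithm is taken base 2, times are in cycles, and every numeric value passed in the usage note is a made-up placeholder rather than a measurement from the lecture.

```python
import math

def hw_times(n):
    """Hardware model: every operation is a multiple of the path length."""
    t_path = math.log2(n)  # assumption: log base 2 of the array size N
    return {"path": t_path, "allocate": t_path, "victim": 4 * t_path}

def sw_times(n_path_sw, t_m, t_c, t_wb, t_a, v):
    """Software model from the slide; v is the V factor in the Tvictim term."""
    t_allocate = n_path_sw * (t_m + t_c + t_wb + t_a)
    t_victim = n_path_sw * (t_m + t_c) + v * t_allocate
    return {"allocate": t_allocate, "victim": t_victim}
```

With placeholder values such as `hw_times(1024)` and `sw_times(10, 100, 2, 1, 1, 3)`, the hardware allocate time is 10 cycles against 1040 for software, which shows how the main-memory term dominates the software estimate.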

CALTECH CS137 Winter DeHon 37 Route Time N_try – number of route starts N_RT – number of path searches N_RO – number of rip ups N_FO – number of fanout searches N_FOA – number of fanout allocations

CALTECH CS137 Winter DeHon 38 Raw Data

CALTECH CS137 Winter DeHon 39

CALTECH CS137 Winter DeHon 40 Making comparisons There is a quality/time tradeoff Want to compare at iso-quality

CALTECH CS137 Winter DeHon 41

CALTECH CS137 Winter DeHon 42 More Parallelism Only exploiting parallelism in path search Subtrees are independent Route root Then route next two channels in parallel Then route next 4…

CALTECH CS137 Winter DeHon 43

CALTECH CS137 Winter DeHon 44 Still Not Exploiting Multiple path searches in parallel that overlap routing resources…

CALTECH CS137 Winter DeHon 45 Extension to Mesh Networks No well defined crossover point. Path back to the source is not implied directly by the topology of the routing network. Paths of different length – and non-minimal length paths may be important components of a good solution.

CALTECH CS137 Winter DeHon 46 Mesh Approach Single-ended search from source Larger delay on congestion → allow non-minimal length paths Breadcrumb approach → leave state in switches pointing back to source
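A software model of the mesh approach, assuming a rectangular grid of switches, one cycle per hop, and a fixed extra delay at congested switches. The wavefront is simulated with a time-ordered queue; `crumb` stores, for each switch, the neighbor that reached it first, which stands in for the breadcrumb state. The function name and the `delay` value are hypothetical.

```python
import heapq

def mesh_search(width, height, congested, src, dst, delay=3):
    """Single-ended wavefront search from src; returns the path or None."""
    arrive = {src: 0}
    crumb = {}  # switch -> neighbor pointing back toward the source
    heap = [(0, src)]
    while heap:
        t, (x, y) = heapq.heappop(heap)
        if (x, y) == dst:
            break
        if t > arrive[(x, y)]:
            continue  # a faster wave already passed through here
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height:
                nt = t + 1 + (delay if (nx, ny) in congested else 0)
                if nt < arrive.get((nx, ny), float("inf")):
                    arrive[(nx, ny)] = nt
                    crumb[(nx, ny)] = (x, y)
                    heapq.heappush(heap, (nt, (nx, ny)))
    if dst not in arrive:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(crumb[path[-1]])  # follow breadcrumbs back to source
    return path[::-1]
```

On a 3x3 grid with the center switch congested, the wave that goes around the center reaches the far side first, so the returned path is longer than the minimal one but avoids the congested switch.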

CALTECH CS137 Winter DeHon 47 Extension to Mesh Networks

CALTECH CS137 Winter DeHon 48 Extension to Mesh Networks - Results (table columns: Design; VPR 4.3: Quality, Fast (ms); Hardware Router: Rnd, atomic (µs)) 5xp c ex mm9a s (ms) s526n (Simulator too slow to run larger)

CALTECH CS137 Winter DeHon 49 BFT FPGA Implementation 21 4-LUTs to implement switch logic + 9 4-LUTs to manage PRNG/allocation = 30 4-LUTs/T-switch 13/3 switches/PE/domain → LUTs/PE/domain C=10 → LUTs / PE
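The area estimate can be worked through as plain arithmetic, assuming the per-switch LUT count simply multiplies by the number of switches per PE per domain, and that C is the number of routing domains. The resulting totals are derived here from the figures on the slide under those assumptions; they are not quoted from it.

```python
from fractions import Fraction

luts_per_switch = 21 + 9                      # switch logic + PRNG/allocation
switches_per_pe_per_domain = Fraction(13, 3)  # from the slide
luts_per_pe_per_domain = luts_per_switch * switches_per_pe_per_domain
domains = 10                                  # C = 10
luts_per_pe = luts_per_pe_per_domain * domains

print(luts_per_pe_per_domain, luts_per_pe)
```

Under these assumptions the routing logic costs 130 4-LUTs per PE per domain and 1300 4-LUTs per PE at C=10.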

CALTECH CS137 Winter DeHon 50 Mesh FPGA Implementation

CALTECH CS137 Winter DeHon 51 FPGA Implementation Slow clock –3ns vs. 0.3ns? FPGAs Speedup/FPGA –0.5  1

CALTECH CS137 Winter DeHon 52 Saving Area Allocation and Victimization occur in a single domain –All other domains idle Maybe only implement a single physical domain? –Pipeline (C-slow) path search through other domains Slight slowdown, big area savings???

CALTECH CS137 Winter DeHon 53 Admin Read GraphStep for Monday –(paper not due until next Friday, so feedback welcomed…) Monday: GraphStep talk Friday: Project Selection Due

CALTECH CS137 Winter DeHon 54 Big Ideas Parallelism Avoiding bad memory hierarchy Specialization Simple/Lightweight algorithm –Fast “dumb” alg. vs. slow/stateful alg.