ESE Spring DeHon 1 ESE534: Computer Organization Day 20: April 9, 2014 Interconnect 6: Direct Drive, MoT
Previously Tree-based network –O(N) switches with linear population Mesh Interconnect –O(N p+0.5 ) switches with linear population Segmentation in Mesh ESE Spring DeHon 2
Today Directional Drive Mesh-of-Trees (MoT) Multi-level metal Heterogeneous Blocks Unified Network Design (time permit) ESE Spring DeHon 3
4 Bidirectional Any wire can send data any direction. Horizontal –Can route right or left Vertical –Can route up or down
Buffered Bidirectional Wires Logic C Block S Block 5 ESE Spring DeHon
6 Bidirectional Any wire can send data any direction. Horizontal –Can route right or left Vertical –Can route up or down Pros? Cons?
Tristate vs. Buffer Which is faster? ESE Spring DeHon 7
Buffered Switch Composition ESE Spring DeHon 8 Making a buffer tristateable makes it larger and diminishes its drive strength.
Directional Drive Slides and Study from Guy Lemieux (Paper FPT2004 – Tutorial FPT 2009) ESE Spring DeHon 9
Bidirectional Wires Problem Half of tristate buffers left unused Buffers + input muxes dominate interconnect area 10 ESE Spring DeHon
Bidirectional vs Directional 11 ESE Spring DeHon
Bidirectional vs Directional 12 ESE Spring DeHon
Bidirectional vs Directional 13 ESE Spring DeHon
Bidirectional Switch Block 14 ESE Spring DeHon
Directional Switch Block 15 ESE Spring DeHon
Bidirectional vs Directional Switch Element Same quantity and type of circuit elements, twice the wiring Switch Block Directional half as many Switch Elements 16 ESE Spring DeHon
Multi-driver Wiring Logic outputs use tristate buffers (C Block) Directional & multi-driver wiring C Block S Block CLB 17 ESE Spring DeHon
Single-driver Wiring Logic outputs use muxes (S Block) Directional & single-driver wiring New connectivity constraint S Block CLB 18 ESE Spring DeHon
Long Wires in Directional Long wires, span L tiles –Example L = CLB 19 ESE Spring DeHon
Directional, Single-driver Benefits Average improvements 0% channel width (most surprising?) 9% delay 14% tile length of physical layout 25% transistor count 32% area-delay product 37% wiring capacitance Any reason to use bidirectional? 20 ESE Spring DeHon
Preclass Complete Table ESE Spring DeHon 21
MoT ESE Spring DeHon 22
ESE Spring DeHon 23 Recall: Mesh Switches Switches per switchbox: –6w/L seg Switches into network: –(K+1) w Switches per PE: –6w/L seg + Fc (K+1) w –w = cN p-0.5 –Total N p-0.5 Total Switches: N*(Sw/PE) N p+0.5 > N
ESE Spring DeHon 24 Recall: Mesh Switches Switches per PE: –6w/L seg + Fc (K+1) w –w = cN p-0.5 –Total N p-0.5 Not change for –Any constant Fc –Any constant L seg
ESE Spring DeHon 25 Mesh of Trees Hierarchical Mesh Build Tree in each column [Leighton/FOCS 1981]
ESE Spring DeHon 26 Mesh of Trees Hierarchical Mesh Build Tree in each column …and each row [Leighton/FOCS 1981]
ESE Spring DeHon 27 MoT Parameterization Support C with additional trees –(like BFT) C=1 C=2
ESE Spring DeHon 28 MoT Parameterization: P P=0.5 P=0.75
ESE Spring DeHon 29 Mesh of Trees Logic Blocks –Only connect at leaves of tree Connect to the C trees –Per side –4C total C<W
ESE Spring DeHon 30 Switches How many switches per tree? How many trees? How many total switches?
ESE Spring DeHon 31 Switches Total Tree switches –2 C N× (switches/tree) Sw/Tree:
ESE Spring DeHon 32 Switches Only connect to leaves of tree Leaf switches: C (K+1) Total switches Leaf + Tree O(N) Compare Mesh O(N p+0.5 )
ESE Spring DeHon 33 Wires Design: O(N p ) in top level Total wire width of channels: O(N p ) –Another geometric sum No detail route guarantee (at present)
ESE Spring DeHon 34 Empirical Results Benchmark: Toronto 20 Compare to L seg =1, L seg =4 –CLMA ~ 8K LUTs Mesh(L seg =4): w=14 122 switches/LB MoT(p=0.67,arity=2): C=4 89 switches/LB –Benchmark wide: 10% less CLMA largest Asymptotic advantage [Rubin,DeHon/FPGA2003]
MoT Parameters C, P Arity Staggering ESE Spring DeHon 35
ESE Spring DeHon 36 Staggering With multiple Trees –Offset relative to each other –Avoids worst-case discrete breaks
ESE Spring DeHon 37 Staggering With multiple Trees –(not offset)
ESE Spring DeHon 38 Staggering With multiple Trees –Offset relative to each other –Avoids worst-case discrete breaks
ESE Spring DeHon 39 Staggering With multiple Trees –Offset relative to each other –Avoids worst-case discrete breaks
ESE Spring DeHon 40 Flattening Can use arity other than two
ESE Spring DeHon 41 [Rubin&DeHon/TRVLSI2004] Overall 26% fewer than mesh
ESE Spring DeHon 42 Day 5
ESE Spring DeHon 43 Wire Layers = More Wiring Day 5
ESE Spring DeHon 44 Mesh Segmentation Wires of uniform length L seg Allow wires to bypass switchboxes
For Uniform Length Wires How does Z relate to X? ESE Spring DeHon 45 XX ZZ
Use of Metal Layers in Mesh Number of metal layers we can use is ultimately limited Uniform wires saturate vias from active layer at bottom ESE Spring DeHon 46
ESE Spring DeHon 47 MoT Layout Main issue is layout 1D trees in multilayer metal
ESE Spring DeHon 48 Row/Column Layout Geometric Progression does not saturate via space!
ESE Spring DeHon 49 Row/Column Layout
ESE Spring DeHon 50 Composite Logic Block Tile
ESE Spring DeHon 51 P=0.75 Row/Column Layout
ESE Spring DeHon 52 P=0.75 Row/Column Layout
ESE Spring DeHon 53 MoT Layout Easily laid out in Multiple metal layers –Minimal O(N p-0.5 ) layers Contain constant switching area per LB –Even with p>0.5
Compare Mesh Switch area per LB grows O(N p-0.5 ) Ability to use metal layers limited –Saturate vias MoT Switch area per LB constant O(1) Can use growing number of metal layers –While keeping switch area constant ESE Spring DeHon 54
Heterogeneous Blocks ESE Spring DeHon 55
Heterogeneous How integrate heterogeneous blocks? –Memory blocks alongside 4-LUTs –Processors –Custom Units ESE Spring DeHon 56
Replace Clusters Cluster LUTs Replace LUT clusters with other units Requires –same/similar I/O –Network support If replace in row (column) –Change height (width) to adjust for different size ESE Spring DeHon 57
Stratix V Floorplan TODO: commercial example ESE Spring DeHon 58 Source:
Replace Subtrees ESE Spring DeHon 59
ESE Spring DeHon Heterogeneous Example Unified Network Fine-grained spatial logic Memory Blocks VLIW temporal processors 60
ESE Spring DeHon 61 Big Ideas [MSB Ideas] Benefits to Single Driver Hierarchy structure –allows to save switches O(N) vs. (N p+0.5 ) –reduces worst-case switches in path –Can exploit multi-level metal
Admin HW8 due today HW9 next Wednesday –Must run tools; will take time (plan for it) Reading for Monday on Web ESE Spring DeHon 62
ESE Spring DeHon 63 Relation?
ESE Spring DeHon 64 How Related? What lessons translate amongst networks? Once understand design space –Get closer together Ideally –One big network design we can parameterize
ESE Spring DeHon 65 MoT HSRA (P=0.5)
ESE Spring DeHon 66 MoT HSRA (p=0.75)
ESE Spring DeHon 67 MoT BFT A C MoT maps directly onto a 2C HSRA –Same p’s BFT can route anything MoT can
ESE Spring DeHon 68 BFT MoT Decompose and look at rows Add homogeneous, upper-level corner turns
ESE Spring DeHon 69 BFT MoT
ESE Spring DeHon 70 BFT MoT
ESE Spring DeHon 71 BFT MoT
ESE Spring DeHon 72 BFT MoT BFT + BFT T = MoT w/ H-UL-CT –Same C, P –H-UL-CT: Homogeneous, Upper-Level, Corner Turns
ESE Spring DeHon 73 HSRA MoT (p=0.75)
ESE Spring DeHon 74 HSRA MoT (p=0.75) Can organize HSRA as MoT P>0.5 MoT layout –Tells us how to layout p>0.5 HSRA
ESE Spring DeHon 75
ESE Spring DeHon 76 MoT vs. Mesh MoT has Geometric Segment Lengths Mesh has flat connections MoT must climb tree –Parameterize w/ arity MoT has O(N p-0.5 ) fewer switches
ESE Spring DeHon 77 MoT vs. Mesh Wires –Asymptotically the same (p>0.5) –Cases where Mesh requires constant less –Cases where require same number
ESE Spring DeHon 78 [DeHon/TRVLSI2004]