ESE534 -- Spring 2012 -- DeHon 1 ESE534: Computer Organization Day 18: March 26, 2012 Interconnect 5: Meshes (and MoT)

Slides:



Advertisements
Similar presentations
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 14: March 19, 2014 Compute 2: Cascades, ALUs, PLAs.
Advertisements

Balancing Interconnect and Computation in a Reconfigurable Array Dr. André DeHon BRASS Project University of California at Berkeley Why you don’t really.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day17: November 20, 2000 Time Multiplexing.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 21: April 2, 2007 Time Multiplexing.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 9: February 20, 2008 Partitioning (Intro, KLFM)
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 15: March 12, 2007 Interconnect 3: Richness.
CS294-6 Reconfigurable Computing Day 8 September 17, 1998 Interconnect Requirements.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day8: October 18, 2000 Computing Elements 1: LUTs.
DeHon March 2001 Rent’s Rule Based Switching Requirements Prof. André DeHon California Institute of Technology.
CS294-6 Reconfigurable Computing Day 10 September 24, 1998 Interconnect Richness.
Lecture 3: Field Programmable Gate Arrays II September 10, 2013 ECE 636 Reconfigurable Computing Lecture 3 Field Programmable Gate Arrays II.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 13: February 26, 2007 Interconnect 1: Requirements.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 26: April 18, 2007 Et Cetera…
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 11: February 14, 2007 Compute 1: LUTs.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 19: April 9, 2008 Routing 1.
CS294-6 Reconfigurable Computing Day 2 August 27, 1998 FPGA Introduction.
CS294-6 Reconfigurable Computing Day 12 October 1, 1998 Interconnect Population.
CS294-6 Reconfigurable Computing Day 14 October 7/8, 1998 Computing with Lookup Tables.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 18: March 21, 2007 Interconnect 6: MoT.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 16: March 14, 2007 Interconnect 4: Switching.
Penn ESE Spring DeHon 1 FUTURE Timing seemed good However, only student to give feedback marked confusing (2 of 5 on clarity) and too fast.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 14: February 28, 2007 Interconnect 2: Wiring Requirements and Implications.
HARP: Hard-Wired Routing Pattern FPGAs Cristinel Ababei , Satish Sivaswamy ,Gang Wang , Kia Bazargan , Ryan Kastner , Eli Bozorgzadeh   ECE Dept.
CSET 4650 Field Programmable Logic Devices
EGRE 427 Advanced Digital Design Figures from Application-Specific Integrated Circuits, Michael John Sebastian Smith, Addison Wesley, 1997 Chapter 7 Programmable.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 13: February 4, 2005 Interconnect 1: Requirements.
Lecture 2: Field Programmable Gate Arrays September 13, 2004 ECE 697F Reconfigurable Computing Lecture 2 Field Programmable Gate Arrays.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 15: February 12, 2003 Interconnect 5: Meshes.
ESE Spring DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes.
Global Routing.
Un/DoPack: Re-Clustering of Large System-on-Chip Designs with Interconnect Variation for Low-Cost FPGAs Marvin Tom* Xilinx Inc.
Channel Width Reduction Techniques for System-on-Chip Circuits in Field-Programmable Gate Arrays Marvin Tom University of British Columbia Department of.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 2: January 21, 2015 Heterogeneous Multicontext Computing Array Work Preclass.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day11: October 30, 2000 Interconnect Requirements.
FPGA Global Routing Architecture Dr. Philip Brisk Department of Computer Science and Engineering University of California, Riverside CS 223.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 18: February 18, 2005 Interconnect 6: MoT.
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #4 – FPGA.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 14: February 10, 2003 Interconnect 4: Switching.
Topics Architecture of FPGA: Logic elements. Interconnect. Pins.
CBSSS 2002: DeHon Interconnect André DeHon Thursday, June 20, 2002.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 16: February 14, 2003 Interconnect 6: MoT.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 18: April 2, 2014 Interconnect 4: Switching.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 16: February 14, 2005 Interconnect 4: Switching.
Directional and Single-Driver Wires in FPGA Interconnect Guy Lemieux Edmund LeeMarvin TomAnthony Yu Dept. of ECE, University of British Columbia Vancouver,
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 13: February 6, 2003 Interconnect 3: Richness.
FPGA Logic Cluster Design Dr. Philip Brisk Department of Computer Science and Engineering University of California, Riverside CS 223.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 11: January 31, 2005 Compute 1: LUTs.
ECE 506 Reconfigurable Computing Lecture 5 Logic Block Architecture Ali Akoglu.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day14: November 10, 2000 Switching.
ESE Spring DeHon 1 ESE534: Computer Organization Day 20: April 9, 2014 Interconnect 6: Direct Drive, MoT.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 12: February 5, 2003 Interconnect 2: Wiring Requirements.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 17: March 29, 2010 Interconnect 2: Wiring Requirements and Implications.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day12: November 1, 2000 Interconnect Requirements and Richness.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 25: April 17, 2013 Covering and Retiming.
Oleg Petelin and Vaughn Betz FPL 2016
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
ESE532: System-on-a-Chip Architecture
CS184a: Computer Architecture (Structure and Organization)
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
Topics Circuit design for FPGAs: Logic elements. Interconnect.
ESE534: Computer Organization
CS184a: Computer Architecture (Structures and Organization)
ESE534: Computer Organization
Give qualifications of instructors: DAP
CS184a: Computer Architecture (Structure and Organization)
Reconfigurable Computing (EN2911X, Fall07)
Presentation transcript:

ESE Spring DeHon 1 ESE534: Computer Organization Day 18: March 26, 2012 Interconnect 5: Meshes (and MoT)

ESE Spring DeHon 2 Previously Saw – need to exploit locality/structure in interconnect –a mesh might be useful –Rent’s Rule as a way to characterize structure

ESE Spring DeHon 3 Today Mesh: –Channel width bounds –Linear population –Switch requirements –Routability –Segmentation –Clusters –Directional Wires Mesh-of-Trees (time permitting)

ESE Spring DeHon 4 Mesh Street Analogy

ESE Spring DeHon 5 Manhattan

ESE Spring DeHon 6 Mesh switchbox

Mesh Strengths? ESE Spring DeHon 7

8 Mesh Channels Lower Bound on w? Bisection Bandwidth –BW  N p –channels in bisection =N 0.5 Channel width grows with N.

ESE Spring DeHon 9 Straight-forward Switching Requirements Total Switches? Switching Delay?

ESE Spring DeHon 10 Switch Delay Switching Delay: –Manhattan distance |X i -X j |+|Y i -Y j | – 2  (N subarray ) worst case: N subarray = N

ESE Spring DeHon 11 Total Switches Switches per switchbox: –4  (3w  w)/2 = 6w 2 –Bidirectional switches (N  W same as W  N) double count

ESE Spring DeHon 12 Total Switches Switches per switchbox: –6w 2 Switches into network: –(K+1) w Switches per PE: –6w 2 +(K+1) w –w = cN p-0.5 –Total  w 2  N 2p-1 Total Switches: N×(Sw/PE)  N 2p

ESE Spring DeHon 13 Routability? Asking if you can route in a given channel width is: –NP-complete Contrast with Beneŝ, Beneš-crossover tree….

Linear Population Switchbox ESE Spring DeHon 14

ESE Spring DeHon 15 Traditional Mesh Population: Linear Switchbox contains only a linear number of switches in channel width

ESE Spring DeHon 16 Linear Mesh Switchbox Each entering channel connect to: –One channel on each remaining side (3) –4 sides –W wires –Bidirectional switches (N  W same as W  N) double count –3  4  W/2=6W switches vs. 6w 2 for full population

ESE Spring DeHon 17 Total Switches Switches per switchbox: –6w Switches into network: –(K+1) w Switches per PE: –6w +(K+1) w –w = cN p-0.5 –Total  N p-0.5 Total Switches: N×(Sw/PE)  N p+0.5 > N

ESE Spring DeHon 18 Total Switches (linear population) Total Switches  N p+0.5 N < N p+0.5 < N 2p Switches grow faster than nodes Wires grow faster than switches

ESE Spring DeHon 19 Checking Constants (Preclass 3) When do linear population designs become wire dominated? Wire pitch = 4 F switch area = 625 F 2 wire area: (4w) 2 switch area: 6  625 w Crossover?

ESE Spring DeHon 20 Checking Constants: Full Population Does full population really use all the wire physical tracks? Wire pitch = 4F switch area = 625 F 2 wire area: (4w) 2 switch area: 6  625 w 2 effective wire pitch: 60F  15 times pitch

ESE Spring DeHon 21 Practical Full population is always switch dominated –doesn’t really use all the potential physical tracks –…even with only two metal layers Just showed: –would take 15  Mapping Ratio for linear population to take same area as full population (once crossover to wire dominated) Can afford to not use some wires perfectly –to reduce switches (area)

ESE Spring DeHon 22 Diamond Switch Typical linear switchbox pattern: –Used by Xilinx

ESE Spring DeHon 23 Mapping Ratio? How bad is it? How much wider do channels have to be? Mapping Ratio: –detail channel width required / global ch width

ESE Spring DeHon 24 Mapping Ratio Empirical: –Seems plausibly, constant in practice Theory/provable: –There is no Constant Mapping Ratio At least detail/global –can be arbitrarily large!

ESE Spring DeHon 25 Domain Structure Once enter network (choose color) can only switch within domain

ESE Spring DeHon 26 Detail Routing as Coloring Global Route channel width = 2 Detail Route channel width = N –Can make arbitrarily large difference

ESE Spring DeHon 27 Routability Domain Routing is NP-Complete –can reduce coloring problem to domain selection i.e. map adjacent nodes to same channel Previous example shows basic shape –(another reason routers are slow)

ESE Spring DeHon 28 Routing Lack of detail/global mapping ratio –Says detail can be arbitrarily worse than global –Doesn’t necessarily say domain routing is bad Maybe can avoid this effect by changing global route path? –Says global not necessarily predict detail –Argument against decomposing mesh routing into global phase and detail phase Modern FPGA routers do not VLSI routers and earliest FPGA routers did

Buffering and Segmentation ESE Spring DeHon 29

Buffered Bidirectional Wires Logic C Block S Block 30 ESE Spring DeHon

31 Segmentation To improve speed (decrease delay) Allow wires to bypass switchboxes Maybe save switches? Certainly cost more wire tracks

ESE Spring DeHon 32 Segmentation Segment of Length L seg –6 switches per switchbox visited –Only enters a switchbox every L seg –SW/sbox/track of length Lseg = 6/L seg

ESE Spring DeHon 33 Segmentation Reduces switches on path  N/L seg May get fragmentation Another cause of unusable wires

ESE Spring DeHon 34 Segmentation: Corner Turn Option Can you corner turn in the middle of a segment? If can, need one more switch SW/sbox/track = 5/L seg + 1

Buffered Switch Composition ESE Spring DeHon 35

Buffered Switch Composition ESE Spring DeHon 36

Delay of Segment ESE Spring DeHon 37

Segment R and C What contributes to? R seg ? C seg ? ESE Spring DeHon 38

Preclass 4 Fillin Tseg table together. ESE Spring DeHon 39

Preclass 4 What L seg minimizes delay for: Distance=1? Distance=2? Distance=6? Distance=10? Distance=20? ESE Spring DeHon 40

ESE Spring DeHon 41 VPR L seg =4 Pix

ESE Spring DeHon 42 VPR L seg =4 Route

Effect of Segment Length? Experiment with on HW9 ESE Spring DeHon 43

Connection Boxes ESE Spring DeHon 44

ESE Spring DeHon 45 C-Box Depopulation Not necessary for every input to connect to every channel Saw last time: –K  (N-K+1) switches Maybe use fewer?

ESE Spring DeHon 46 IO Population Toronto Model –Fc fraction of tracks which an input connects to IOs spread over 4 sides Maybe show up on multiple –Shown here: 2

ESE Spring DeHon 47 IO Population

Clustering ESE Spring DeHon 48

ESE Spring DeHon 49 Leaves Not LUTs Recall cascaded LUTs Often group collection of LUTs into a Logic Block

ESE Spring DeHon 50 Logic Block [Betz+Rose/IEEE D&T 1998] What does clustering do for delay?

Delay versus Cluster Size ESE Spring DeHon 51 [Lu et al., FPGA 2009]

ESE Spring DeHon 52 Area versus Cluster Size [Lu et al., FPGA 2009]

ESE Spring DeHon 53 Review: Mesh Design Parameters Cluster Size –Internal organization LB IO (Fc, sides) Switchbox Population and Topology Segment length distribution –and staggering Switch rebuffering

Directional Drive Slides and Study from Guy Lemieux (Paper FPT2004 – Tutorial FPT 2009) ESE Spring DeHon 54

Bidirectional Wires Problem Half of tristate buffers left unused Buffers + input muxes dominate interconnect area 55 ESE Spring DeHon

Bidirectional vs Directional 56 ESE Spring DeHon

Bidirectional vs Directional 57 ESE Spring DeHon

Bidirectional vs Directional 58 ESE Spring DeHon

Bidirectional Switch Block 59 ESE Spring DeHon

Directional Switch Block 60 ESE Spring DeHon

Bidirectional vs Directional Switch Element Same quantity and type of circuit elements, twice the wiring Switch Block Directional half as many Switch Elements 61 ESE Spring DeHon

Multi-driver Wiring Logic outputs use tristate buffers (C Block) Directional & multi-driver wiring C Block S Block CLB 62

Single-driver Wiring Logic outputs use muxes (S Block) Directional & single-driver wiring New connectivity constraint S Block CLB 63

Directional, Single-driver Benefits Average improvements 0% channel width (most surprising?) 9% delay 14% tile length of physical layout 25% transistor count 32% area-delay product 37% wiring capacitance Any reason to use bidirectional? 64 ESE Spring DeHon

MoT ESE Spring DeHon 65

ESE Spring DeHon 66 Recall: Mesh Switches Switches per switchbox: –6w/L seg Switches into network: –(K+1) w Switches per PE: –6w/L seg + Fc  (K+1) w –w = cN p-0.5 –Total  N p-0.5 Total Switches: N*(Sw/PE)  N p+0.5 > N

ESE Spring DeHon 67 Recall: Mesh Switches Switches per PE: –6w/L seg + Fc  (K+1) w –w = cN p-0.5 –Total  N p-0.5 Not change for –Any constant Fc –Any constant L seg

ESE Spring DeHon 68 Mesh of Trees Hierarchical Mesh Build Tree in each column [Leighton/FOCS 1981]

ESE Spring DeHon 69 Mesh of Trees Hierarchical Mesh Build Tree in each column …and each row [Leighton/FOCS 1981]

ESE Spring DeHon 70 MoT Parameterization Support C with additional trees –(like BFT) C=1 C=2

ESE Spring DeHon 71 MoT Parameterization: P P=0.5 P=0.75

ESE Spring DeHon 72 Mesh of Trees Logic Blocks –Only connect at leaves of tree Connect to the C trees –Per side –4C total C<W

ESE Spring DeHon 73 Switches Total Tree switches –2 C  N× (switches/tree) Sw/Tree:

ESE Spring DeHon 74 Switches Only connect to leaves of tree Leaf switches: C  (K+1) Total switches  Leaf + Tree  O(N)  Compare Mesh O(N p+0.5 )

ESE Spring DeHon 75 Empirical Results Benchmark: Toronto 20 Compare to L seg =1, L seg =4 –CLMA ~ 8K LUTs Mesh(L seg =4): w=14  122 switches/LB MoT(p=0.67,arity=2): C=4  89 switches/LB –Benchmark wide: 10% less CLMA largest Asymptotic advantage [Rubin,DeHon/FPGA2003]

MoT Parameters C, P Arity Staggering Upper-Level Corner Turns Leaf IO Population/topology ESE Spring DeHon 76

ESE Spring DeHon 77 [Rubin&DeHon/TRVLSI2004] Overall 26% fewer than mesh

ESE Spring DeHon 78 Admin HW9 out Reading for Wednesday on web

ESE Spring DeHon 79 Big Ideas [MSB Ideas] Mesh natural 2D topology –Channels grow as  (N p-0.5 ) –Wiring grows as  (N 2p ) –Linear Population: Switches grow as  (N p+0.5 ) –Worse than shown for hierarchical Unbounded global  detail mapping ratio Detail routing NP-complete But, seems to work well in practice…

ESE Spring DeHon 80 Big Ideas [MSB-1 Ideas] Segmented/bypass routes –can reduce switching delay –costs more wires (fragmentation of wires) Hierarchy structure allows to save switches –O(N) vs.  (N p+0.5 )