ESE534: Computer Organization


ESE534: Computer Organization Day 20: April 7, 2010 Interconnect 5: Meshes (and MoT) ESE534 -- Spring 2010 -- DeHon

Previously. Saw the need to exploit locality/structure in interconnect; a mesh might be useful. Question: how does w grow? Rent's Rule as a way to characterize structure.

Today. Mesh: channel width bounds; linear population; switch requirements; routability; segmentation; clusters; directional wires. Mesh-of-Trees (time permitting).

Mesh ESE534 -- Spring 2010 -- DeHon

Manhattan ESE534 -- Spring 2010 -- DeHon

Mesh switchbox ESE534 -- Spring 2010 -- DeHon

Mesh Strengths? ESE534 -- Spring 2010 -- DeHon

Mesh Channels. Lower bound on w? Bisection bandwidth: $BW \propto N^{p}$. Channels in bisection: $N^{0.5}$. Hence $w \ge BW / N^{0.5} \propto N^{p-0.5}$: channel width grows with N whenever $p > 0.5$.
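
A minimal sketch of the bisection argument above, assuming a square array whose bisection is crossed by $N^{0.5}$ channels. The function name and the constant `c` are illustrative assumptions, not values from the lecture.

```python
import math

def channel_width_lower_bound(n, p, c=1.0):
    """Lower bound on mesh channel width w from bisection bandwidth.

    Assumes bisection bandwidth BW = c * n**p (Rent's Rule) is shared by the
    sqrt(n) channels crossing the bisection of a square array.
    c is a hypothetical constant chosen for illustration.
    """
    bisection_bw = c * n ** p
    channels_in_bisection = math.sqrt(n)
    return bisection_bw / channels_in_bisection  # equals c * n**(p - 0.5)

# Channel width keeps growing with N when p > 0.5.
for n in (1024, 4096, 16384):
    print(n, round(channel_width_lower_bound(n, p=0.75), 2))
```

With p = 0.5 the bound stays constant, which is the boundary case where a fixed channel width could suffice.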

Straight-forward Switching Requirements Total Switches? Switching Delay? ESE534 -- Spring 2010 -- DeHon

Switch Delay. Switching delay follows the Manhattan distance: $|X_i - X_j| + |Y_i - Y_j| \le 2\sqrt{N_{subarray}}$. Worst case: $N_{subarray} = N$.
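
The bound above can be sketched as follows, assuming one switchbox hop per unit of Manhattan distance and a subarray whose size is a perfect square. Both function names are hypothetical.

```python
import math

def manhattan_switch_hops(src, dst):
    """Switchbox hops between two grid positions (x, y)."""
    (xi, yi), (xj, yj) = src, dst
    return abs(xi - xj) + abs(yi - yj)

def worst_case_hops(n_subarray):
    """Corner-to-corner hops in a square subarray of n_subarray nodes.

    Assumes n_subarray is a perfect square; the result is
    2 * (sqrt(n_subarray) - 1), which is below 2 * sqrt(n_subarray).
    """
    side = math.isqrt(n_subarray)
    return manhattan_switch_hops((0, 0), (side - 1, side - 1))

print(worst_case_hops(64), "<=", 2 * math.isqrt(64))
```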

Total Switches Switches per switchbox: 4(3ww)/2 = 6w2 Bidirectional switches (NW same as WN) double count ESE534 -- Spring 2010 -- DeHon

Total Switches (full population). Switches per switchbox: $6w^{2}$. Switches into network: $(K+1)w$. Switches per PE: $6w^{2} + (K+1)w$. With $w = cN^{p-0.5}$, switches per PE $\propto w^{2} \propto N^{2p-1}$. Total switches: $N \times (\text{Sw/PE}) \propto N^{2p}$.

Routability? Asking if you can route in a given channel width is: NP-complete ESE534 -- Spring 2010 -- DeHon

Linear Population Switchbox ESE534 -- Spring 2010 -- DeHon

Traditional Mesh Population: Linear Switchbox contains only a linear number of switches in channel width ESE534 -- Spring 2010 -- DeHon

Linear Mesh Switchbox. Each entering wire connects to one wire on each of the 3 remaining sides; there are 4 sides with w wires each. Switches are bidirectional (NW is the same as WN), so halve to remove the double count: $3 \times 4 \times w / 2 = 6w$ switches, vs. $6w^{2}$ for full population.

Total Switches (linear population). Switches per switchbox: $6w$. Switches into network: $(K+1)w$. Switches per PE: $6w + (K+1)w$. With $w = cN^{p-0.5}$, switches per PE $\propto N^{p-0.5}$. Total switches: $N \times (\text{Sw/PE}) \propto N^{p+0.5} > N$.

Total Switches (linear population) $\propto N^{p+0.5}$, with $N < N^{p+0.5} < N^{2p}$. Switches grow faster than nodes. Wires grow faster than switches.
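
A small sketch comparing the two switch counts from the slides: $6w^{2} + (K+1)w$ per PE for full population and $6w + (K+1)w$ for linear population, with $w = cN^{p-0.5}$. The function names, `k=4`, and `c=1.0` are assumptions made only for illustration.

```python
def channel_width(n, p, c=1.0):
    """Channel width w = c * n**(p - 0.5); c is a hypothetical constant."""
    return c * n ** (p - 0.5)

def switches_per_pe(w, k, population):
    """Switchbox switches plus (k + 1) * w switches into the network."""
    if population == "full":
        switchbox = 6 * w ** 2
    elif population == "linear":
        switchbox = 6 * w
    else:
        raise ValueError("population must be 'full' or 'linear'")
    return switchbox + (k + 1) * w

def total_switches(n, p, k=4, c=1.0, population="linear"):
    """Total switches: N * (switches per PE)."""
    return n * switches_per_pe(channel_width(n, p, c), k, population)

# Per-node switch cost grows with N in both cases, faster for full population.
for n in (2 ** 10, 2 ** 14, 2 ** 18):
    lin = total_switches(n, 0.75, population="linear")
    full = total_switches(n, 0.75, population="full")
    print(f"N={n:>7}  linear/N={lin / n:8.1f}  full/N={full / n:10.1f}")
```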

Checking Constants (Preclass 2). When do linear population designs become wire dominated? Wire pitch $= 8\lambda$; switch area $= 2500\lambda^{2}$. Wire area: $(8w)^{2}\lambda^{2}$. Switch area: $6 \times 2500\,w\,\lambda^{2}$. Crossover with $2500\lambda^{2}$ switches: $w \approx 234$. Crossover with $5000\lambda^{2}$ switches?
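
The crossover can be checked numerically by setting the wire area $(8w)^{2}$ equal to the switch area $6 \times 2500\,w$ (both in units of $\lambda^{2}$) and solving for w. This is a sketch of that arithmetic only; the function and parameter names are hypothetical.

```python
def linear_crossover_width(wire_pitch=8, switch_area=2500, switches_per_track=6):
    """Channel width where wire area equals switch area (linear population).

    Solves (wire_pitch * w)**2 == switches_per_track * switch_area * w for w > 0.
    Lengths are in lambda, areas in lambda squared.
    """
    return switches_per_track * switch_area / wire_pitch ** 2

print(linear_crossover_width())                  # 2500 lambda^2 switches
print(linear_crossover_width(switch_area=5000))  # 5000 lambda^2 switches
```

Doubling the switch area doubles the crossover width, so larger switches keep a design switch dominated up to wider channels.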

Checking Constants: Full Population. Does full population really use all the physical wire tracks? Wire pitch $= 8\lambda$; switch area $= 2500\lambda^{2}$. Wire area: $(8w)^{2}\lambda^{2}$. Switch area: $6 \times 2500\,w^{2}\lambda^{2}$. Effective wire pitch: about $120\lambda$, roughly 15 times the physical pitch.
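
A short check of the effective pitch, assuming the tile is a square whose area is set entirely by the $6 \times 2500\,w^{2}$ switch area, so its side per track is $\sqrt{6 \times 2500}\,\lambda$. The slide rounds this to $120\lambda$; the names below are hypothetical.

```python
import math

def effective_wire_pitch(switch_area=2500, switches_per_track_sq=6):
    """Side length per track (in lambda) of a switch-dominated square tile.

    Tile area = switches_per_track_sq * switch_area * w**2, so the side is
    w * sqrt(switches_per_track_sq * switch_area).
    """
    return math.sqrt(switches_per_track_sq * switch_area)

pitch = effective_wire_pitch()
print(round(pitch, 1), "lambda;", round(pitch / 8, 1), "times the 8-lambda wire pitch")
```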

Practical. Full population is always switch dominated: as just shown, it does not really use all the potential physical tracks, even with only two metal layers. Also just shown: linear population would need a mapping ratio of 15 to take the same area as full population (once past the crossover to wire dominated). So we can afford to use some wires imperfectly in order to reduce switches (area).

Diamond Switch. Typical switchbox pattern, used by Xilinx. Many fewer switches, but cannot guarantee that all the wires will be usable. May need more wires than implied by Rent, since not all wires can be used (this was already true; now more so).

Universal SwitchBox. Same number of switches as diamond. Locally, can guarantee to satisfy any set of requests (request = direction through the switchbox) as long as channel capacities are met; the order on the channels is irrelevant. Not a global property: no guarantees between switchboxes.

Diamond vs. Universal? Universal routes strictly more configurations ESE534 -- Spring 2010 -- DeHon

Inter-Switchbox Constraints Channels connect switchboxes For valid route, must satisfy all adjacent switchboxes ESE534 -- Spring 2010 -- DeHon

Mapping Ratio? How bad is it? How much wider do channels have to be? Mapping ratio = detail channel width required / global channel width.

Mapping Ratio. Empirical: seems plausibly constant in practice. Theory/provable: there is no constant mapping ratio; detail/global can be arbitrarily large!

Domain Structure. Once a route enters the network (chooses a color), it can only switch within that domain.

Detail Routing as Coloring. Global route channel width = 2; detail route channel width = N. The difference can be made arbitrarily large.

Routability. Domain routing is NP-complete: the coloring problem reduces to domain selection (i.e., map adjacent nodes to the same channel). The previous example shows the basic shape. (Another reason routers are slow.)
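
The coloring view can be illustrated with an abstract sketch. It assumes a hypothetical set of routes in which every pair of nets shares one private channel segment, so no segment carries more than 2 nets (global channel width 2), yet every net conflicts with every other and needs its own domain. Whether such routes can be laid out on a real mesh is not checked here, and the greedy assignment is only an upper bound in general, not an optimal solver.

```python
from itertools import combinations

def build_pairwise_routes(n_nets):
    """Map each net to the set of segments it uses; each pair shares one segment."""
    routes = {net: set() for net in range(n_nets)}
    for seg, (a, b) in enumerate(combinations(range(n_nets), 2)):
        routes[a].add(seg)
        routes[b].add(seg)
    return routes

def global_channel_width(routes):
    """Largest number of nets using any single segment."""
    load = {}
    for segs in routes.values():
        for seg in segs:
            load[seg] = load.get(seg, 0) + 1
    return max(load.values())

def greedy_domain_assignment(routes):
    """Give each net the lowest domain unused by any net sharing a segment."""
    domain = {}
    for net, segs in routes.items():
        taken = {domain[other] for other in domain if routes[other] & segs}
        d = 0
        while d in taken:
            d += 1
        domain[net] = d
    return domain

routes = build_pairwise_routes(6)
domains = greedy_domain_assignment(routes)
print("global width:", global_channel_width(routes))
print("domains used:", len(set(domains.values())))
```

Because a net keeps one domain along its whole route, the number of domains needed is the chromatic number of the conflict graph, which here is the number of nets.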

Routing. The lack of a detail/global mapping ratio says detail can be arbitrarily worse than global. It doesn't necessarily say domain routing is bad: maybe this effect can be avoided by changing the global route path? It does say global does not necessarily predict detail, which is an argument against decomposing mesh routing into a global phase and a detail phase. Modern FPGA routers do not decompose this way; VLSI routers and the earliest FPGA routers did.

Buffering and Segmentation ESE534 -- Spring 2010 -- DeHon

Buffered Bidirectional Wires [figure labels: C Block, S Block, Logic] Lemieux 2009

Segmentation. To improve speed (decrease delay), allow wires to bypass switchboxes. Maybe save switches? Certainly costs more wire tracks.

Segmentation. Segment of length $L_{seg}$: 6 switches per switchbox visited, but a segment only enters a switchbox every $L_{seg}$. SW/sbox/track for length $L_{seg}$ $= 6/L_{seg}$.

Segmentation. Reduces switches on path: $N/L_{seg}$. May get fragmentation: another cause of unusable wires.

Segmentation: Corner Turn Option. Can you corner turn in the middle of a segment? If so, need one more switch: SW/sbox/track $= 5/L_{seg} + 1$.
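
The two per-track switch counts from the segmentation slides, $6/L_{seg}$ without mid-segment corner turns and $5/L_{seg} + 1$ with them, can be tabulated with a small sketch. The function name and flag are hypothetical.

```python
def switches_per_sbox_per_track(l_seg, corner_turn_mid_segment=False):
    """Average switches per switchbox per track for segments of length l_seg."""
    if l_seg < 1:
        raise ValueError("l_seg must be at least 1")
    if corner_turn_mid_segment:
        return 5 / l_seg + 1
    return 6 / l_seg

# Both formulas agree at l_seg = 1; the corner-turn option costs more as l_seg grows.
for l_seg in (1, 2, 4, 8):
    print(l_seg,
          switches_per_sbox_per_track(l_seg),
          switches_per_sbox_per_track(l_seg, corner_turn_mid_segment=True))
```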

Buffered Switch Composition ESE534 -- Spring 2010 -- DeHon

Buffered Switch Composition ESE534 -- Spring 2010 -- DeHon

Delay of Segment ESE534 -- Spring 2010 -- DeHon

Segment R and C. What contributes to $R_{seg}$? To $C_{seg}$?

Preclass 3 What Lseg minimizes delay for: Distance=1? Distance=2? ESE534 -- Spring 2010 -- DeHon

VPR Lseg=4 Pix ESE534 -- Spring 2010 -- DeHon

VPR Lseg=4 Route ESE534 -- Spring 2010 -- DeHon

Effect of Segment Length? Experiment with on HW8 ESE534 -- Spring 2010 -- DeHon

Connection Boxes ESE534 -- Spring 2010 -- DeHon

C-Box Depopulation Not necessary for every input to connect to every channel Saw last time: K(N-K+1) switches Maybe use fewer? ESE534 -- Spring 2010 -- DeHon

IO Population. Toronto Model: Fc = fraction of tracks to which an input connects. IOs are spread over 4 sides and may show up on multiple sides (shown here: 2).

IO Population ESE534 -- Spring 2010 -- DeHon

Clustering ESE534 -- Spring 2010 -- DeHon

Leaves Not LUTs. Recall cascaded LUTs. Often group a collection of LUTs into a Logic Block.

Logic Block What does clustering do for delay? [Betz+Rose/IEEE D&T 1998] ESE534 -- Spring 2010 -- DeHon

Delay versus Cluster Size [Lu et al., FPGA 2009] ESE534 -- Spring 2010 -- DeHon

Area versus Cluster Size [Lu et al., FPGA 2009] ESE534 -- Spring 2010 -- DeHon

Review: Mesh Design Parameters. Cluster size; internal organization; LB IO (Fc, sides); switchbox population and topology; segment length distribution and staggering; switch rebuffering.

Directional Drive. Slides and study from Guy Lemieux (paper FPT 2004, tutorial FPT 2009). Only cursory coverage in lecture. See FPT 2004 paper (recommended reading).

Bidirectional Wires. Problem: half of the tristate buffers are left unused; buffers + input muxes dominate interconnect area. Lemieux 2009

Bidirectional vs Directional Lemieux 2009

Bidirectional vs Directional Lemieux 2009

Bidirectional vs Directional Lemieux 2009

Bidirectional Switch Block Lemieux 2009

Directional Switch Block Lemieux 2009

Bidirectional vs Directional Switch Block. Directional: half as many switch elements. Switch element: same quantity and type of circuit elements, twice the wiring. Lemieux 2009

Directional, Single-driver Benefits. Average improvements: 0% channel width (most surprising?); 9% delay; 14% tile length of physical layout; 25% transistor count; 32% area-delay product; 37% wiring capacitance. Any reason to use bidirectional? Lemieux 2009

MoT. Did not cover in lecture. See journal article on recommended reading.

Recall: Mesh Switches. Switches per switchbox: $6w/L_{seg}$. Switches into network: $(K+1)w$. Switches per PE: $6w/L_{seg} + F_c(K+1)w$. With $w = cN^{p-0.5}$, switches per PE $\propto N^{p-0.5}$. Total switches: $N \times (\text{Sw/PE}) \propto N^{p+0.5} > N$. Penn ESE680-002 Spring 2007 -- DeHon

Recall: Mesh Switches. Switches per PE: $6w/L_{seg} + F_c(K+1)w$. With $w = cN^{p-0.5}$, switches per PE $\propto N^{p-0.5}$. This growth does not change for any constant $F_c$ or any constant $L_{seg}$.
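
A sketch of why constant $F_c$ and $L_{seg}$ do not change the asymptotic growth: they rescale the per-PE switch count $6w/L_{seg} + F_c(K+1)w$, but the ratio between two array sizes stays $(N_2/N_1)^{p-0.5}$. The function name and the defaults `k=4`, `c=1.0` are illustrative assumptions.

```python
def mesh_switches_per_pe(n, p, k=4, c=1.0, l_seg=1, fc=1.0):
    """Per-PE switches 6*w/l_seg + fc*(k+1)*w with w = c * n**(p - 0.5)."""
    w = c * n ** (p - 0.5)
    return 6 * w / l_seg + fc * (k + 1) * w

def growth_ratio(n, p, **params):
    """Factor by which per-PE switches grow when the array size quadruples."""
    return mesh_switches_per_pe(4 * n, p, **params) / mesh_switches_per_pe(n, p, **params)

# Same growth factor for different constant Fc and Lseg choices.
print(growth_ratio(1024, 0.75, l_seg=1, fc=1.0))
print(growth_ratio(1024, 0.75, l_seg=4, fc=0.5))
```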

Mesh of Trees Hierarchical Mesh Build Tree in each column [Leighton/FOCS 1981] Penn ESE680-002 Spring 2007 -- DeHon

Mesh of Trees Hierarchical Mesh Build Tree in each column …and each row [Leighton/FOCS 1981] Penn ESE680-002 Spring 2007 -- DeHon

MoT Parameterization. Support C with additional trees (like BFT). Shown: C=1, C=2.

MoT Parameterization: P Penn ESE680-002 Spring 2007 -- DeHon

Mesh of Trees Logic Blocks. Only connect at leaves of tree. Connect to the C trees per side: 4C total. C<W.

Switches. Sw/Tree: [expression missing from transcript]. Total tree switches: $2C\sqrt{N} \times$ (switches/tree).

Switches. Only connect to leaves of tree. Leaf switches: $C(K+1)$. Total switches: leaf + tree $= O(N)$. Compare mesh: $O(N^{p+0.5})$.

Empirical Results. Benchmark: Toronto 20. Compare to $L_{seg}=1$, $L_{seg}=4$. CLMA (~8K LUTs): Mesh($L_{seg}=4$): w=14 → 122 switches/LB; MoT(p=0.67, arity=2): C=4 → 89 switches/LB. Benchmark wide: 10% less. CLMA is the largest: asymptotic advantage. [Rubin, DeHon/FPGA 2003]

MoT Parameters. C, P; arity; staggering; upper-level corner turns; leaf IO population/topology.

Overall 26% fewer than mesh [Rubin&DeHon/TRVLSI2004]

Admin HW8 due Monday Reading for Monday ESE534 -- Spring 2010 -- DeHon

Big Ideas [MSB Ideas]. Mesh: natural 2D topology. Channels grow as $\Omega(N^{p-0.5})$. Wiring grows as $\Omega(N^{2p})$. Linear population: switches grow as $\Omega(N^{p+0.5})$, worse than shown for hierarchical. Unbounded global→detail mapping ratio. Detail routing is NP-complete. But it seems to work well in practice…

Big Ideas [MSB-1 Ideas]. Segmented/bypass routes can reduce switching delay, but cost more wires (fragmentation of wires). Hierarchical structure allows saving switches: $O(N)$ vs. $\Omega(N^{p+0.5})$.