ESE534: Computer Organization

ESE534: Computer Organization Day 19: April 5, 2010 Interconnect 4: Switching Penn ESE534 Spring 2010 -- DeHon

Previously: used Rent’s Rule characterization to understand wire growth, $IO = c N^p$. Top bisections will be on the order of $N^p$, so 2D wiring area will be on the order of $(N^p)(N^p) = N^{2p}$.

We know: how we avoid $O(N^2)$ wire growth for “typical” designs; how to characterize locality; how we might exploit that locality to reduce wire growth; the wire growth implied by a characterized design.

Today: switching; exploiting multiple metal layers; implications; options.

Switching: How can we use the locality captured by Rent’s Rule to reduce switching requirements? (How much?)

Observation: the locality that saved us wiring also saves us switching: $IO = c N^p$.

Consider the crossbar case. To exploit wiring locality, split into two halves and connect them with limited wires: an $N/2 \times N/2$ crossbar in each half, plus an $N/2 \times c(N/2)^p$ crossbar in each half to connect to the bisection wires.

Preclass 1: Crosspoints? Symbolic ratio? Ratio for $N = 2^{15}$, $p = 2/3$?
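A rough sketch of the crosspoint comparison, under one reading of the split structure: each half gets an N/2 x N/2 crossbar plus an N/2 x c(N/2)^p crossbar onto the bisection wires. The function names and the choice c = 1 are assumptions made for illustration, not values from the lecture.

```python
def full_crossbar(n):
    """Crosspoints in a flat n x n crossbar."""
    return n * n


def split_crossbar(n, p, c=1.0):
    """Crosspoints after one split into two halves.

    Per half: an (n/2 x n/2) crossbar plus an (n/2 x c*(n/2)**p) crossbar
    onto the bisection wires. c = 1 is an assumed Rent constant.
    """
    half = n / 2
    bisection_wires = c * half ** p
    return 2 * (half * half + half * bisection_wires)


if __name__ == "__main__":
    n, p = 2 ** 15, 2 / 3
    print("flat :", full_crossbar(n))
    print("split:", split_crossbar(n, p))
    print("ratio:", full_crossbar(n) / split_crossbar(n, p))
```

With these assumptions a single split saves a bit less than a factor of two, which is why the next step is to apply the split recursively.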

What next? When something works once… we try it again…

Recurse: repeat at each level to form a tree.

Result: if we use a crossbar at each tree node, we get $O(N^{2p})$ wiring area (for $p > 0.5$, direct from the bisection) and $O(N^{2p})$ switches. The top switch box is $O(N^{2p})$: $2\,W_{top} \times W_{bot} + (W_{bot})^2 = 2\,(N^p \times (N/2)^p) + (N/2)^{2p} = N^{2p}(2/2^p + 1/2^{2p})$. (Correction from lecture, which had $N^{2p}(1/2^p + 1/2^{2p})$.)

Result: if we use a crossbar at each tree node, the top switch box has $N^{2p}(2/2^p + 1/2^{2p})$ switches. The switches at one level down number $2\,(N/2)^{2p}(2/2^p + 1/2^{2p}) = 2\,(1/2^p)^2 \left(N^{2p}(2/2^p + 1/2^{2p})\right)$, i.e., $2\,(1/2^p)^2 \times$ the previous level.

Result: if we use a crossbar at each tree node, the top switch box has $N^{2p}(2/2^p + 1/2^{2p})$ switches, and the switches at one level down are $2\,(1/2^p)^2 \times$ the previous level. The coefficient $2/2^{2p} = 2^{(1-2p)}$ is less than 1 for $p > 0.5$. Example: $p = 2/3$ gives $2^{(1 - 2 \times 2/3)} = 2^{-1/3} \approx 0.7937$. (Now believe this is correct; just got confused in lecture.)

Result: if we use a crossbar at each tree node, the top switch box is $O(N^{2p})$ and the switches at one level down are $2^{(1-2p)} \times$ the previous level. Total switches: $N^{2p}\,(1 + 2^{(1-2p)} + 2^{2(1-2p)} + 2^{3(1-2p)} + \dots)$. This is a geometric series that sums to $O(1)$ for $p > 0.5$: $N^{2p} \cdot \frac{1}{1 - 2^{(1-2p)}} = \frac{2^{(2p-1)}}{2^{(2p-1)} - 1} \times N^{2p}$.
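As a quick numerical check of the series above, the sketch below sums the per-level ratio 2^(1-2p) and compares it with the closed form 2^(2p-1) / (2^(2p-1) - 1). The function names are made up for this illustration.

```python
def level_ratio(p):
    """Switch-count ratio between adjacent tree levels: 2 * (1/2**p)**2."""
    return 2.0 ** (1.0 - 2.0 * p)


def series_sum(p, levels):
    """Partial sum 1 + r + r**2 + ... over the given number of levels."""
    r = level_ratio(p)
    return sum(r ** k for k in range(levels))


def closed_form(p):
    """Limit of the geometric series; only valid for p > 0.5."""
    top = 2.0 ** (2.0 * p - 1.0)
    return top / (top - 1.0)


if __name__ == "__main__":
    p = 2 / 3
    print("ratio per level:", level_ratio(p))
    print("15-level sum   :", series_sum(p, 15))
    print("closed form    :", closed_form(p))
```

For p = 2/3 the partial sums stay below the closed-form constant, so the total switch count remains a constant multiple of N^(2p).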

Good news: asymptotically optimal. Even without switches, the area is $O(N^{2p})$, so adding $O(N^{2p})$ switches does not change the asymptotic area.

Bad news: switch area >> wire-crossing area. Consider an $8\lambda$ wire pitch: a crossing is $64\lambda^2$. A typical (passive) switch is $2500\lambda^2$. Passive only: about a 40× area difference, and worse once we rebuffer or latch signals. …and switches are limited to the substrate, whereas we can use additional metal layers for wiring area.

Additional structure: this motivates us to look beyond crossbars. Can we depopulate crossbars on the up-down connections without loss of functionality?

Can we do better? Is a crossbar too powerful? Does the specific down channel matter? What do we want to do? Connect to any channel on the lower level; choose a subset of wires from the upper level; order is not important.

N choose K: exploit this freedom to depopulate the switchbox. Can do with $K(N-K+1)$ switches vs. $K \times N$; save ~$K^2$.
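One depopulated pattern that meets the K(N-K+1) count is sketched below: output j is given switches to inputs j through j+(N-K). This particular pattern and the helper names are assumptions for illustration (the slide only states the count); the brute-force check confirms that every K-subset of the N inputs can still be delivered when order does not matter.

```python
from itertools import combinations


def sparse_concentrator(n, k):
    """Switch pattern: output j can reach inputs j .. j + (n - k)."""
    return [set(range(j, j + n - k + 1)) for j in range(k)]


def switch_count(pattern):
    """Total number of switches in the pattern."""
    return sum(len(inputs) for inputs in pattern)


def can_select(pattern, chosen):
    """Send the j-th smallest chosen input to output j; True if all switches exist."""
    return all(inp in pattern[j] for j, inp in enumerate(sorted(chosen)))


def routes_every_subset(n, k):
    """Brute force: every k-subset of the n inputs can be concentrated."""
    pattern = sparse_concentrator(n, k)
    return all(can_select(pattern, s) for s in combinations(range(n), k))


if __name__ == "__main__":
    n, k = 8, 3
    pattern = sparse_concentrator(n, k)
    print("switches:", switch_count(pattern), "vs full crossbar:", n * k)
    print("all subsets routable:", routes_every_subset(n, k))
```

The sorted assignment works because the j-th smallest of K chosen inputs always lies between j and j+(N-K).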

N-choose-M: up-down connections only require concentration: choose M things out of N, i.e., the order of the subset is irrelevant. Consequence: can save a constant factor ~$2^p/(2^p - 1)$: $(N/2)^p \times N^p$ vs. $(N^p - (N/2)^p + 1)(N/2)^p$. For $p = 2/3$, $2^p/(2^p - 1) \approx 2.7$. Similarly, left-right order is not important, which reduces switches.
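The constant-factor saving can be checked numerically. The sketch below evaluates both switch counts from the slide with the Rent constant c taken as 1 (an assumption) and compares their ratio with the limiting factor 2^p / (2^p - 1); the function names are illustrative only.

```python
def full_updown(n, p):
    """Fully populated up-down box: (N/2)^p x N^p switches."""
    return (n / 2) ** p * n ** p


def concentrator_updown(n, p):
    """N-choose-M box: (N^p - (N/2)^p + 1) * (N/2)^p switches."""
    lower = (n / 2) ** p
    upper = n ** p
    return (upper - lower + 1) * lower


def limiting_factor(p):
    """Large-N limit of the ratio: 2^p / (2^p - 1)."""
    return 2 ** p / (2 ** p - 1)


if __name__ == "__main__":
    n, p = 2 ** 15, 2 / 3
    print("ratio at N=2^15:", full_updown(n, p) / concentrator_updown(n, p))
    print("limiting factor:", limiting_factor(p))
```

The finite-N ratio sits slightly below the limit because of the +1 term in the concentrator count.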

Multistage Switching

Multistage switching: we can route any permutation with fewer switches than a crossbar if we allow switching in stages. Trade an increase in switches in the path for a decrease in total switches.

Butterfly: log stages; resolve one bit per stage (bit3, bit2, bit1, bit0). [Figure: 16-endpoint butterfly network, endpoints labeled 0000 through 1111.]

What can a butterfly route? Example routes: 0000 → 0001 and 1000 → 0010. [Figure: 16-endpoint butterfly network, endpoints labeled 0000 through 1111.]

Butterfly routing: cannot route all permutations; we get internal blocking. What is required for a non-blocking network?
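A minimal sketch of internal blocking, assuming the butterfly resolves destination bits from bit3 down to bit0 and that each internal wire can carry only one route. Under that model a route sits, after each stage, on the wire whose resolved bits come from the destination and whose remaining bits come from the source. The function names are hypothetical.

```python
def butterfly_path(src, dst, bits=4):
    """Wire index occupied after each stage (most significant bit resolved first)."""
    pos = src
    path = []
    for b in reversed(range(bits)):
        mask = 1 << b
        pos = (pos & ~mask) | (dst & mask)  # replace bit b with the destination's bit
        path.append(pos)
    return path


def blocks(routes, bits=4):
    """True if two different routes need the same wire after the same stage."""
    used = {}
    for route in routes:
        src, dst = route
        for stage, wire in enumerate(butterfly_path(src, dst, bits)):
            owner = used.setdefault((stage, wire), route)
            if owner != route:
                return True
    return False


if __name__ == "__main__":
    pair = [(0b0000, 0b0001), (0b1000, 0b0010)]
    print("paths:", [butterfly_path(s, d) for s, d in pair])
    print("blocked:", blocks(pair))
```

In this model the two routes 0000 → 0001 and 1000 → 0010 both land on wire 0000 after the first stage, so they cannot be routed together even though their sources and destinations are distinct.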

Decomposition

Decomposed routing: pick a link to route. Route it to its destination over the red network. At the destination, what can we say about the link that shares the final-stage switch with this one? What can we do with that link? Route that link. What constraint does this impose? So what do we do?
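The questions above lead to a looping assignment, sketched here for a single level of decomposition. The sketch assumes inputs 2i and 2i+1 share a first-stage switch, outputs 2j and 2j+1 share a final-stage switch, and the two middle subnetworks are called 0 (standing in for the red network) and 1. These conventions and names are assumptions for illustration, not taken from the slides.

```python
def decompose(perm):
    """Assign each input of a permutation to middle subnetwork 0 or 1.

    perm[i] is the destination of input i; len(perm) must be even.
    """
    n = len(perm)
    inv = [0] * n
    for src, dst in enumerate(perm):
        inv[dst] = src
    side = [None] * n
    for start in range(n):
        src = start
        while side[src] is None:
            side[src] = 0                  # route this link over subnetwork 0
            other = inv[perm[src] ^ 1]     # link sharing the final-stage switch
            side[other] = 1                # it is forced onto the other subnetwork
            src = other ^ 1                # its first-stage partner is forced back to 0
    return side


def is_valid(perm, side):
    """Check that links sharing a first- or final-stage switch use different subnetworks."""
    n = len(perm)
    inv = [0] * n
    for src, dst in enumerate(perm):
        inv[dst] = src
    inputs_ok = all(side[i] != side[i ^ 1] for i in range(0, n, 2))
    outputs_ok = all(side[inv[o]] != side[inv[o ^ 1]] for o in range(0, n, 2))
    return inputs_ok and outputs_ok


if __name__ == "__main__":
    perm = [3, 7, 0, 4, 1, 6, 2, 5]
    side = decompose(perm)
    print(side, is_valid(perm, side))
```

Each forced choice follows a cycle that closes back on its starting link, so the assignment never contradicts itself; the same step can then be repeated inside each half-size subnetwork.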

Decomposition: switches: $N/2 \times 2 \times 4 + (N/2)^2 < N^2$.

Recurse: if it works once, try it again…

Result: Beneš network. $2\log_2(N) - 1$ stages (switches in path); each stage made of $N/2$ $2 \times 2$ switchpoints (4 switches each); $4N\log_2(N)$ total switches; compute route in $O(N \log(N))$ time.
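The counts above can be tabulated with a few lines of arithmetic. This sketch multiplies out stages × switchpoints per stage × switches per switchpoint, which gives 4N log2(N) − 2N, slightly below the rounded 4N log2(N) figure on the slide. The function names are illustrative, and N is assumed to be a power of two.

```python
def benes_stages(n):
    """2*log2(n) - 1 stages for a power-of-two n."""
    return 2 * (n.bit_length() - 1) - 1


def benes_switches(n):
    """stages x (n/2 switchpoints per stage) x (4 switches per 2x2 switchpoint)."""
    return benes_stages(n) * (n // 2) * 4


def crossbar_switches(n):
    """A full n x n crossbar."""
    return n * n


if __name__ == "__main__":
    n = 32
    print("stages  :", benes_stages(n))       # 9
    print("Benes   :", benes_switches(n))     # 576
    print("crossbar:", crossbar_switches(n))  # 1024
    print("ratio   :", crossbar_switches(n) / benes_switches(n))
```

At N = 32 the advantage over a crossbar is under 2×; it widens with N because N^2 grows much faster than N log(N).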

Preclass 2: Switches in a Beneš 32? Ratio to a 32×32 crossbar?

Beneš network wiring: bisection is $N$; wiring requires $O(N^2)$ area (fixed wire layers).

Beneš switching: Beneš reduced switches from $N^2$ to $N \log(N)$ using a multistage network. Replace the crossbars in the tree with Beneš switching networks.

Beneš switching: implications. Still require $O(W^2)$ wiring per tree node, or a total of $O(N^{2p})$ wiring. Now $O(W \log(W))$ switches per tree node, which converges to $O(N)$ total switches! $O(\log^2(N))$ switches in the path across the network. Strictly speaking, delay is dominated by wire delay ~$O(N^p)$, but constants make this of little practical interest except for very large networks.

Better yet… believe we do not need Beneš on the up paths: a single switch on the up path, Beneš for the crossover. Switches in path: $\log(N)$ up + $\log(N)$ down + $2\log(N)$ crossover = $4\log(N) = O(\log(N))$.

Linear Switch Population

Linear switch population: can further reduce switches. Connect each lower channel to $O(1)$ channels in each tree node; end up with $O(W)$ switches per tree node.

Linear Switch (p=0.5)

Linear population and Beneš: the top-level crossover of p=1 is Beneš switching.

Beneš compare: can permute stage switches so that the local shuffles are on the outside and the big shuffle is in the middle.

Linear consequences, good news: linear switches; $O(\log(N))$ switches in path; $O(N^{2p})$ wire area; $O(N)$ switches. More practical than the Beneš crossover case.
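To see how the three switch populations scale, the sketch below sums per-node switch counts over a binary tree in which a node spanning n leaves has W = n^p channels. The per-node costs W^2 (crossbar), W log2(W) (Beneš-style), and W (linear) take all constants as 1, and the tree model and names are simplifying assumptions for illustration only.

```python
import math


def tree_total(n_leaves, p, per_node):
    """Sum per_node(W) over all tree nodes; a node spanning 2**k leaves has W = (2**k)**p."""
    levels = n_leaves.bit_length() - 1  # log2(N) for power-of-two N
    total = 0.0
    for k in range(1, levels + 1):
        span = 2 ** k
        w = span ** p
        total += (n_leaves // span) * per_node(w)
    return total


def crossbar(w):
    return w * w


def benes(w):
    return w * math.log2(w)


def linear(w):
    return w


if __name__ == "__main__":
    p = 2 / 3
    for exp in (10, 20, 30):
        n = 2 ** exp
        print(exp,
              tree_total(n, p, crossbar) / n ** (2 * p),
              tree_total(n, p, benes) / n,
              tree_total(n, p, linear) / n)
```

In this model the crossbar total stays a bounded multiple of N^(2p), while the Beneš-style and linear totals stay bounded multiples of N, matching the asymptotic claims on the slides.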

Preclass 3: Route sets: T(L & R), L(T & R), R(T & L); TL, RT, LR; TL, LR, RL.

Mapping ratio: the mapping ratio says that if I have W channels, I may only be able to use W/MR wires for a particular design’s connection pattern. To accommodate any design: for all channels, physical wires ≥ MR × logical wires.

Mapping ratio example: shows MR = 3/2 for linear population, 1:1 switchbox.

Linear consequences, bad news: lacks a guarantee that we can use all wires. As shown, the mapping ratio is at least > 1; there are likely cases where even a constant does not suffice; expect no worse than logarithmic. Finding routes is harder: no longer linear time or deterministic; open as to exactly how hard.

Area comparison (both: p=0.67, N=1024): M-choose-N with perfect map vs. linear with MR=2.

Area comparison: M-choose-N with perfect map vs. linear with MR=2. Since switch >> wire, we may be able to tolerate MR > 1: it reduces switches for a net area savings. Empirical: never seen MR greater than 1.5.

Multilayer Metal

Wire Layers = More Wiring (Day 7)

Opportunity: multiple layers of metal allow us to increase effective pitch and potentially route in a 3D volume.

Larger “Cartoon” (Day 18): 1024-LUT network, P=0.67; LUT area 3%.

Challenge: spread or balance wires; homogenize switches; avoid via saturation.

Linear Population Tree (P=0.5)

BFT Folding

Invariants: lower folds leave both diagonals free; the current level consumes one, leaving the other free.

Wiring

Avoid via Saturation

Compact, multilayer linear population tree layout: can lay out a BFT in $O(N)$ 2D area with $O(\log(N))$ wiring layers. Can be extended for p>0.5 as well; wire layers then grow as $O(N^{(p-0.5)})$.

Admin: HW6 graded. HW8 due 4/12. Parts still need Wednesday’s lecture. Reading for Wednesday on web.

Big Ideas [MSB Ideas]: in addition to wires, we must have switches, and switches have significant area and delay. Rent’s Rule locality reduces both wiring and switching requirements. Naïve switches match wires at $O(N^{2p})$, but switch area >> wire area, which prevents benefit from multiple layers of metal.

Big Ideas [MSB Ideas]: can achieve $O(N)$ switches; plausibly $O(N)$ area with sufficient metal layers. Switchbox depopulation saves considerably on area (and delay) but will waste wires; may still come out ahead (evidence to date).