Penn ESE680-002 Spring2007 -- DeHon 1 ESE680-002 (ESE534): Computer Organization Day 16: March 14, 2007 Interconnect 4: Switching.

Slides:



Advertisements
Similar presentations
PERMUTATION CIRCUITS Presented by Wooyoung Kim, 1/28/2009 CSc 8530 Parallel Algorithms, Spring 2009 Dr. Sushil K. Prasad.
Advertisements

Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 14: March 19, 2014 Compute 2: Cascades, ALUs, PLAs.
Balancing Interconnect and Computation in a Reconfigurable Array Dr. André DeHon BRASS Project University of California at Berkeley Why you don’t really.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 11: March 4, 2008 Placement (Intro, Constructive)
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 9: February 20, 2008 Partitioning (Intro, KLFM)
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 15: March 12, 2007 Interconnect 3: Richness.
CS294-6 Reconfigurable Computing Day 8 September 17, 1998 Interconnect Requirements.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day8: October 18, 2000 Computing Elements 1: LUTs.
EDA (CS286.5b) Day 5 Partitioning: Intro + KLFM. Today Partitioning –why important –practical attack –variations and issues.
DeHon March 2001 Rent’s Rule Based Switching Requirements Prof. André DeHon California Institute of Technology.
CS294-6 Reconfigurable Computing Day 10 September 24, 1998 Interconnect Richness.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 21: April 15, 2009 Routing 1.
EDA (CS286.5b) Day 6 Partitioning: Spectral + MinCut.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 13: February 26, 2007 Interconnect 1: Requirements.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 26: April 18, 2007 Et Cetera…
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 11: February 14, 2007 Compute 1: LUTs.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 19: April 9, 2008 Routing 1.
CS294-6 Reconfigurable Computing Day 12 October 1, 1998 Interconnect Population.
1 Lecture 24: Interconnection Networks Topics: communication latency, centralized and decentralized switches (Sections 8.1 – 8.5)
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 18: March 21, 2007 Interconnect 6: MoT.
Balancing Interconnect and Computation in a Reconfigurable Array Dr. André DeHon BRASS Project University of California at Berkeley Why you don’t really.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 15: March 18, 2009 Static Timing Analysis and Multi-Level Speedup.
Penn ESE Spring DeHon 1 FUTURE Timing seemed good However, only student to give feedback marked confusing (2 of 5 on clarity) and too fast.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 14: February 28, 2007 Interconnect 2: Wiring Requirements and Implications.
Interconnect Network Topologies
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 13: February 4, 2005 Interconnect 1: Requirements.
Interconnect Networks
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 15: February 12, 2003 Interconnect 5: Meshes.
ESE Spring DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 2: January 21, 2015 Heterogeneous Multicontext Computing Array Work Preclass.
1 Dynamic Interconnection Networks Miodrag Bolic.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day11: October 30, 2000 Interconnect Requirements.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 24: April 18, 2011 Covering and Retiming.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 18: February 18, 2005 Interconnect 6: MoT.
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 6: January 23, 2002 Partitioning (Intro, KLFM)
Caltech CS184b Winter DeHon 1 CS184b: Computer Architecture [Single Threaded Architecture: abstractions, quantification, and optimizations] Day9:
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 23: April 20, 2015 Static Timing Analysis and Multi-Level Speedup.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 14: February 10, 2003 Interconnect 4: Switching.
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 13: February 20, 2002 Routing 1.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 13: March 3, 2015 Routing 1.
CBSSS 2002: DeHon Interconnect André DeHon Thursday, June 20, 2002.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 16: February 14, 2003 Interconnect 6: MoT.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 18: April 2, 2014 Interconnect 4: Switching.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 16: February 14, 2005 Interconnect 4: Switching.
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 9: February 9, 2004 Partitioning (Intro, KLFM)
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 13: February 6, 2003 Interconnect 3: Richness.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 11: January 31, 2005 Compute 1: LUTs.
ESE Spring DeHon 1 ESE534: Computer Organization Day 18: March 26, 2012 Interconnect 5: Meshes (and MoT)
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day14: November 10, 2000 Switching.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 15: March 13, 2013 High Level Synthesis II Dataflow Graph Sharing.
ESE Spring DeHon 1 ESE534: Computer Organization Day 20: April 9, 2014 Interconnect 6: Direct Drive, MoT.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 12: February 5, 2003 Interconnect 2: Wiring Requirements.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 17: March 29, 2010 Interconnect 2: Wiring Requirements and Implications.
CALTECH CS137 Fall DeHon 1 CS137: Electronic Design Automation Day 21: November 28, 2005 Routing 1.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day12: November 1, 2000 Interconnect Requirements and Richness.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 25: April 17, 2013 Covering and Retiming.
ESE534: Computer Organization
ESE534: Computer Organization
Refer example 2.4on page 64 ACA(Kai Hwang) And refer another ppt attached for static scheduling example.
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
Indirect Networks or Dynamic Networks
Interconnection Network Design Lecture 14
ESE534: Computer Organization
CS184a: Computer Architecture (Structures and Organization)
ESE534: Computer Organization
CS184a: Computer Architecture (Structure and Organization)
Design Principles of Scalable Switching Networks
Presentation transcript:

Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 16: March 14, 2007 Interconnect 4: Switching

Penn ESE Spring DeHon 2 Previously Used Rent’s Rule characterization to understand wire growth IO = c N p Top bisections will be  (N p ) 2D wiring area  (N p )  (N p ) =  (N 2p )

Penn ESE Spring DeHon 3 We Know How we avoid O(N 2 ) wire growth for “typical” designs How to characterize locality How we might exploit that locality to reduce wire growth Wire growth implied by a characterized design

Penn ESE Spring DeHon 4 Today Switching –Implications –Options

Penn ESE Spring DeHon 5 Switching: How can we use the locality captured by Rent’s Rule to reduce switching requirements? (How much?)

Penn ESE Spring DeHon 6 Observation Locality that saved us wiring, also saves us switching IO = c N p

Penn ESE Spring DeHon 7 Consider Crossbar case to exploit wiring:  split into two halves, connect with limited wires  N/2 x N/2 crossbar each half  N/2 x (N/2) p connect to bisection wires  2(N 2 /4) + 2(N/2) p+1 < N 2

Penn ESE Spring DeHon 8 Recurse Repeat at each level –form tree

Penn ESE Spring DeHon 9 Result If use crossbar at each tree node –O(N 2p ) wiring area for p>0.5, direct from bisection –O(N 2p ) switches top switch box is O(N 2p ) –2 W top ×W bot +(W bot ) 2 –2 (N p ×(N/2) p )+(N/2) 2p –N 2p (1/2 p + 1/2 2p )

Penn ESE Spring DeHon 10 Result If use crossbar at each tree node –O(N 2p ) wiring area for p>0.5, direct from bisection –O(N 2p ) switches top switch box is O(N 2p ) –N 2p (1/2 p + 1/2 2p ) switches at one level down is –2  (N/2) 2p (1/2 p + 1/2 2p ) –2  (1/2 p ) 2  ( N 2p (1/2 p + 1/2 2p )) –2  (1/2 p ) 2  previous level

Penn ESE Spring DeHon 11 Result If use crossbar at each tree node –O(N 2p ) wiring area for p>0.5, direct from bisection –O(N 2p ) switches top switch box is O(N 2p ) –N 2p (1/2 p + 1/2 2p ) switches at one level down is –2  (1/2 p ) 2  previous level –(2/2 2p )=2 (1-2p) –coefficient 0.5

Penn ESE Spring DeHon 12 Result If use crossbar at each tree node –O(N 2p ) switches top switch box is O(N 2p ) switches at one level down is  2 (1-2p)  previous level Total switches:  N 2p  (1+2 (1-2p) +2 2(1-2p) +2 3(1-2p) +…)  get geometric series; sums to O(1)  N 2p  (1/(1-2 (1-2p) ))  = 2 (2p-1) /(2 (2p-1) -1)  N 2p

Penn ESE Spring DeHon 13 Good News Good news –asymptotically optimal –Even without switches, area O(N 2p ) so adding O(N 2p ) switches not change

Penn ESE Spring DeHon 14 Bad News Switches area >> wire crossing area –Consider 8 wire pitch  crossing 64  2 –Typical (passive) switch  –Passive only: 40× area difference worse once rebuffer or latch signals. …and switches limited to substrate –whereas can use additional metal layers for wiring area

Penn ESE Spring DeHon 15 Additional Structure This motivates us to look beyond crossbars –can depopulate crossbars on up-down connection without loss of functionality?

Penn ESE Spring DeHon 16 Can we do better? Crossbar too powerful? –Does the specific down channel matter? What do we want to do? –Connect to any channel on lower level –Choose a subset of wires from upper level order not important

Penn ESE Spring DeHon 17 N choose K Exploit freedom to depopulate switchbox Can do with: –K  (N-K+1) switches –Vs. K  N –Save ~K 2

Penn ESE Spring DeHon 18 N-choose-M Up-down connections –only require concentration choose M things out of N –i.e. order of subset irrelevant Consequent: –can save a constant factor ~ 2 p /(2 p -1) (N/2) p x N p vs (N p - (N/2) p +1)(N/2) p P=2/3  2 p /(2 p -1)  2.7 Similary, Left-Right –order not important  reduces switches

Penn ESE Spring DeHon 19 Multistage Switching

Penn ESE Spring DeHon 20 Multistage Switching We can route any permutation w/ less switches than a crossbar If we allow switching in stages –Trade increase in switches in path –For decrease in total switches

Penn ESE Spring DeHon 21 Butterfly Log stages Resolve one bit per stage bit3bit2bit1bit0

Penn ESE Spring DeHon 22 What can a Butterfly Route? 0000  

Penn ESE Spring DeHon 23 What can a Butterfly Route? 0000  

Penn ESE Spring DeHon 24 Butterfly Routing Cannot route all permutations –Get internal blocking What required for non-blocking network?

Penn ESE Spring DeHon 25 Decomposition

Penn ESE Spring DeHon 26 Decomposed Routing Pick a link to route. Route to destination over red network At destination, –What can we say about the link which shares the final stage switch with this one? –What can we do with link? Route that link –What constraint does this impose? –So what do we do?

Penn ESE Spring DeHon 27 Decomposition Switches: N/2  2  4 + (N/2) 2 < N 2

Penn ESE Spring DeHon 28 Recurse If it works once, try it again…

Penn ESE Spring DeHon 29 Result: Beneš Network 2log 2 (N)-1 stages (switches in path) Made of N/2 2  2 switchpoints –(4 switches) 4N  log 2 (N) total switches Compute route in O(N log(N)) time

Penn ESE Spring DeHon 30 Beneš Network Wiring Bisection: N Wiring  O(N 2 ) area (fixed wire layers)

Penn ESE Spring DeHon 31 Beneš Switching Beneš reduced switches –N 2 to N(log(N)) –using multistage network Replace crossbars in tree with Beneš switching networks

Penn ESE Spring DeHon 32 Beneš Switching Implication of Beneš Switching –still require O(W 2 ) wiring per tree node or a total of O(N 2p ) wiring –now O(W log(W)) switches per tree node converges to O(N) total switches! –O(log 2 (N)) switches in path across network strictly speaking, dominated by wire delay ~O(N p ) but constants make of little practical interest except for very large networks 

Penn ESE Spring DeHon 33 Better yet… Believe do not need Beneš on the up paths Single switch on up path Beneš for crossover Switches in path: log(N) up + log(N) down + 2log(N) crossover = 4 log(N) = O(log (N))

Penn ESE Spring DeHon 34 Linear Switch Population

Penn ESE Spring DeHon 35 Linear Switch Population Can further reduce switches –connect each lower channel to O(1) channels in each tree node –end up with O(W) switches per tree node

Penn ESE Spring DeHon 36 Linear Switch (p=0.5)

Penn ESE Spring DeHon 37 Linear Population and Beneš Top-level crossover of p=1 is Beneš switching

Penn ESE Spring DeHon 38 Beneš Compare Can permute stage switches so local shuffles on outside and big shuffle in middle

Penn ESE Spring DeHon 39 Linear Consequences: Good News Linear Switches –O(log(N)) switches in path –O(N 2p ) wire area –O(N) switches –More practical than Beneš crossover case

Penn ESE Spring DeHon 40 Linear Consequences: Bad News Lacks guarantee can use all wires –as shown, at least mapping ratio > 1 –likely cases where even constant not suffice expect no worse than logarithmic Finding Routes is harder –no longer linear time, deterministic –open as to exactly how hard

Penn ESE Spring DeHon 41 Mapping Ratio Mapping ratio says –if I have W channels may only be able to use W/MR wires –for a particular design’s connection pattern to accommodate any design –forall channels physical wires  MR  logical

Penn ESE Spring DeHon 42 Mapping Ratio Example: –Shows MR=3/2 –For Linear Population, 1:1 switchbox

Penn ESE Spring DeHon 43 Area Comparison Both: p=0.67 N=1024 M-choose-N perfect map Linear MR=2

Penn ESE Spring DeHon 44 Area Comparison M-choose-N perfect map Linear MR=2 Since –switch >> wire may be able to tolerate MR>1 reduces switches –net area savings Empirical: –Never seen greater than 1.5

Penn ESE Spring DeHon 45 Expanders

Penn ESE Spring DeHon 46 Expander Theory ,  expansion –Any group of size k=  N will expand connect to a group of size  k  N in each logical direction [Arora, Leighton, Maggs SIAM Journal of Comp. v25n3p ]

Penn ESE Spring DeHon 47 Expander Idea IF we can achieve expansion –Can guarantee non-blocking at each stage i.e. –Guarantee use less than  N –Guarantee connections to more stuff in next level –Since  N>  N available in next level Guaranteed to be an available switch

Penn ESE Spring DeHon 48 Dilated Switches Have multiple outputs per logical direction –Dilation: number of outputs per direction –E.g. radix 2 switch w/ 4 outputs 2 per direction Dilation 2 Up (0) Down (1)

Penn ESE Spring DeHon 49 Dilated Switches allow Expansion On Right –Any pair of nodes connects to 3 switches Strictly speaking must have d>2 for expansion

Penn ESE Spring DeHon 50 Random Wiring Random, dilated wiring for butterfly can achieve –For tree…2  2 p (?) [Upfal/STOC 1989, Leighton/Leiserson/Kravets MIT/LCS/RSS ]

Penn ESE Spring DeHon 51 Constraints Total load should not exceed  of net –L=mapping ratio (light loading factor) –  LW = number into each subtree –  LW  W/2 p  L  1/(2 p  ) Cannot expand past the size of subtree –  N  N/2 p –  1/2 p

Penn ESE Spring DeHon 52 Extra Switches Extra switch factor: d×L Try: –  =2,  =1/10 –d=8 –dL  40 (p=1) Try: –  =1.01,  =1/4  d=6, L  2 dL  12 (p=1) –  =1.01,  =1/4  d=6, L  2.8 dL  17 (p=0.5)

Penn ESE Spring DeHon 53 Putting it Together Base, linear-population trees have O(N) switches Make larger by a factor of L (linear factor) Dilated version have a factor of d more switches Randomly wired expander –Can have O(N) switches –Guarantee routes –Constants < 100 (looks like < 20) –Open: how tight can make it?

Penn ESE Spring DeHon 54 Big Ideas [MSB Ideas] In addition to wires, must have switches –Have significant area and delay Rent’s Rule locality reduces –both wiring and switching requirements Naïve switches match wires at O(N 2p ) –switch area >> wire area –prevent benefit from multiple layers of metal

Penn ESE Spring DeHon 55 Big Ideas [MSB Ideas] Can achieve O(N) switches –plausibly O(N) area with sufficient metal layers Switchbox depopulation –save considerably on area (delay) –will waste wires –May still come out ahead (evidence to date)