Caltech CS184a Fall2000 -- DeHon1 CS184a: Computer Architecture (Structures and Organization) Day11: October 30, 2000 Interconnect Requirements.

Presentation transcript:

Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day11: October 30, 2000 Interconnect Requirements

Caltech CS184a Fall DeHon2 Last Time Saw various compute blocks Role of automated mapping in exploring design space To exploit structure in typical designs we need programmable interconnect All reasonable, scalable structures: –small to moderate sized logic blocks –connected via programmable interconnect Have been saying that delay across programmable interconnect is a big factor

Caltech CS184a Fall DeHon3 Today Interconnect Design Space Dominance of Interconnect Simple things –and why they don’t work Interconnect Implications Characterizing Interconnect Requirements

Caltech CS184a Fall DeHon4 Dominant Area

Caltech CS184a Fall DeHon5 Dominant Time

Caltech CS184a Fall DeHon6 Dominant Time

Caltech CS184a Fall DeHon7 Dominant Power XC4003A data from Eric Kusse (UCB MS 1997)

Caltech CS184a Fall DeHon8 For Spatial Architectures Interconnect dominant –area –power –time …so need to understand in order to optimize architectures

Caltech CS184a Fall DeHon9 Interconnect Problem –Thousands of independent (bit) operators producing results true of FPGAs today …true for *LIW, multi-uP, etc. in future –Each taking as inputs the results of other (bit) processing elements –Interconnect is late bound don’t know until after fabrication

Caltech CS184a Fall DeHon10 Design Issues Flexibility -- route “anything” –(w/in reason?) Area -- wires, switches Delay -- switches in path, stubs, wire length Power -- switch, wire capacitance Routability -- computational difficulty finding routes

Caltech CS184a Fall DeHon11 (1) Shared Bus Familiar case Use single interconnect resource Reuse in Time Consequence?

Caltech CS184a Fall DeHon12 Shared Bus Consider operation: y = Ax^2 + Bx + C –3 mpys –2 adds –~5 values need to be routed from producer to consumer Performance lower bound if have design w/: –m multipliers –u madd units –a adders –i simultaneous interconnection busses

Caltech CS184a Fall DeHon13 Viewpoint Interconnect is a resource Bottleneck for design can be in availability of any resource Lower Bound on Delay: Logical Resource / Physical Resources May be worse –dependencies –ability to use resource
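The bound "Logical Resource / Physical Resources" can be sketched for the y = Ax^2 + Bx + C example as follows. This is a minimal illustration, not the lecture's own code: the function name and the dictionary keys are hypothetical, madd units are left out for simplicity, and dependencies (which the slide notes can make things worse) are ignored.

```python
from math import ceil

def delay_lower_bound(required, available):
    """Cycles needed if each resource type were the only bottleneck.

    required:  number of operations (or transfers) of each type
    available: physical units of each type usable per cycle
    """
    return max(ceil(required[r] / available[r]) for r in required)

# y = Ax^2 + Bx + C: 3 multiplies, 2 adds, about 5 routed values
poly = {"mpy": 3, "add": 2, "route": 5}

# One multiplier, one adder, one shared bus
single_bus = delay_lower_bound(poly, {"mpy": 1, "add": 1, "route": 1})

# More compute units but still one bus: the bus remains the bottleneck
more_compute = delay_lower_bound(poly, {"mpy": 3, "add": 2, "route": 1})

# Five busses as well: no single resource forces more than one cycle
five_busses = delay_lower_bound(poly, {"mpy": 3, "add": 2, "route": 5})

print(single_bus, more_compute, five_busses)
```

The second case is the point of the shared-bus slides: adding multipliers and adders does not help while a single bus carries all five values.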

Caltech CS184a Fall DeHon14 Shared Bus Flexibility (+) –routes everything (given enough time) –can be tricky to schedule use optimally Delay (Power) (--) –wire length O(kn) –parasitic stubs: kn+n –series switch: 1 –O(kn) –sequentialize I/B Area (++) –kn switches –O(n)

Caltech CS184a Fall DeHon15 Term: Bisection Bandwidth Partition design into two equal size halves Minimize wires (nets) with ends in both halves Number of wires crossing is bisection bandwidth
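The definition above can be made concrete with a brute-force sketch that tries every equal-size split of a small netlist and keeps the smallest cut. The two example graphs are hypothetical, nets are simplified to two-point edges, and exhaustive search is only workable for a handful of nodes; real partitioners use heuristics.

```python
from itertools import combinations

def cut_size(edges, half):
    """Number of edges with exactly one endpoint in `half`."""
    return sum((u in half) != (v in half) for u, v in edges)

def bisection_bandwidth(nodes, edges):
    """Minimum cut over all equal-size splits (exhaustive, small graphs only)."""
    nodes = list(nodes)
    best = None
    for half in combinations(nodes, len(nodes) // 2):
        size = cut_size(edges, set(half))
        if best is None or size < best:
            best = size
    return best

# Hypothetical local design: two 4-node rings joined by a single wire
ring_a = [(0, 1), (1, 2), (2, 3), (3, 0)]
ring_b = [(4, 5), (5, 6), (6, 7), (7, 4)]
local_design = ring_a + ring_b + [(3, 4)]

# Hypothetical less-local design: one 8-node ring plus chords across it
ring8 = [(i, (i + 1) % 8) for i in range(8)]
chords = [(i, i + 4) for i in range(4)]
global_design = ring8 + chords

print(bisection_bandwidth(range(8), local_design))
print(bisection_bandwidth(range(8), global_design))
```

Both designs have eight nodes, but the one with more long-range connections needs more wires across the middle.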

Caltech CS184a Fall DeHon16 (2) Crossbar Avoid bottleneck Every output gets its own interconnect channel

Caltech CS184a Fall DeHon17 Crossbar

Caltech CS184a Fall DeHon18 Crossbar

Caltech CS184a Fall DeHon19 Crossbar Flexibility (++) –routes everything (guaranteed) Delay (Power) (-) –wire length O(kn) –parasitic stubs: kn+n –series switch: 1 –O(kn) Area (-) –Bisection bandwidth n –kn^2 switches –O(n^2)

Caltech CS184a Fall DeHon20 Crossbar Too expensive –Switch Area = k*n^2*2.5Kλ^2 –Switch Area/LUT = k*n*2.5Kλ^2 –n=1024, k=4 => 10Mλ^2 What can we do?
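The crossbar arithmetic can be checked with a few lines. This sketch assumes the 2.5K figure is the area of one switch in λ^2 units; the function and constant names are illustrative only.

```python
SWITCH_AREA = 2500  # area of one switch (2.5K), assumed to be in lambda^2

def crossbar_switch_area(n, k, switch_area=SWITCH_AREA):
    """Total switch area for n k-input LUTs: k*n^2 crosspoints."""
    return k * n * n * switch_area

def crossbar_area_per_lut(n, k, switch_area=SWITCH_AREA):
    """Switch area charged to each LUT: k*n crosspoints."""
    return k * n * switch_area

per_lut = crossbar_area_per_lut(n=1024, k=4)
total = crossbar_switch_area(n=1024, k=4)
print(f"per LUT: {per_lut:,}  total: {total:,}")
```

The per-LUT figure grows linearly with n, so every doubling of the array doubles the interconnect cost attached to each logic block.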

Caltech CS184a Fall DeHon21 Avoiding Crossbar Costs Typical architecture trick: –exploit expected problem structure

Caltech CS184a Fall DeHon22 Avoiding Crossbar Costs Typical architecture trick: –exploit expected problem structure We have freedom in operator placement Designs have spatial locality =>place connected components “close” together –don’t need full interconnect?

Caltech CS184a Fall DeHon23 Exploit Locality Wires expensive Local interconnect cheap 1D versions? (explore on hmwrk)

Caltech CS184a Fall DeHon24 Exploit Locality Wires expensive Local interconnect cheap Use 2D to make more things closer Mesh?

Caltech CS184a Fall DeHon25 Mesh Analysis Can we place everything close?

Caltech CS184a Fall DeHon26 Mesh “Closeness” Try placing “everything” close

Caltech CS184a Fall DeHon27 Mesh Analysis Flexibility - ? –Ok w/ large w Delay (Power) –Series switches 1--√n –Wire length w--√n –Stubs O(w)--O(w√n) Area –Bisection BW -- w√n –Switches -- O(nw) –O(w^2 n) [linear pop] – larger on homework
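A rough sketch of the mesh figures, taking the array as a square with √n blocks on a side and w wires per channel. All hidden constants are set to 1, so the numbers only indicate scaling; the function name and dictionary keys are hypothetical.

```python
from math import isqrt

def mesh_figures(n, w):
    """Order-of-magnitude mesh figures for n blocks and channel width w.

    Assumes n is a perfect square and drops all constant factors.
    """
    side = isqrt(n)
    return {
        "series_switches": (1, side),  # best case to worst case, as on the slide
        "bisection_bw": w * side,      # w wires in each of sqrt(n) channels cut
        "switches": n * w,             # O(nw)
    }

mesh = mesh_figures(n=1024, w=8)
crossbar_bisection = 1024              # crossbar bisection bandwidth is n
print(mesh, crossbar_bisection)
```

With these example numbers the mesh offers a quarter of the crossbar's bisection bandwidth, which is why the next question is how large w has to be.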

Caltech CS184a Fall DeHon28 Mesh Plausible …but What’s w …and how does it grow?

Caltech CS184a Fall DeHon29 Characterize Locality Want to exploit locality How much locality do we have? Impact on resources required?

Caltech CS184a Fall DeHon30 Bisection Bandwidth Bisect design Bisection bandwidth of design –=> lower bound on network bisection bandwidth Design with more locality –=> lower bisection bandwidth Enough? N/2 cutsize

Caltech CS184a Fall DeHon31 Characterizing Locality Single cut not capture locality within halves Cut again –=> recursive bisection

Caltech CS184a Fall DeHon32 Regularizing Growth How do bisection bandwidths shrink (grow) at different levels of bisection hierarchy? Basic assumption: Geometric –1 –1/α –1/α^2

Caltech CS184a Fall DeHon33 Geometric Growth (F,α)-bifurcator –F bandwidth at root –geometric regression α at each level
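The geometric assumption can be tabulated directly. In this sketch the regression factor is written `alpha`, the root bandwidth is `F`, and the example values are hypothetical.

```python
def bifurcator_bandwidths(F, alpha, levels):
    """Bandwidth at each level of the bisection hierarchy, root first.

    Level d has bandwidth F / alpha**d: F at the root, then F/alpha, F/alpha^2, ...
    """
    return [F / alpha ** d for d in range(levels)]

# Hypothetical example: root bandwidth 64, alpha = sqrt(2)
bw = bifurcator_bandwidths(F=64, alpha=2 ** 0.5, levels=5)
for depth, b in enumerate(bw):
    print(depth, round(b, 2))
```

With alpha = √2 the bandwidth halves every two levels while the subtree size halves every level, so bandwidth shrinks more slowly than node count.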

Caltech CS184a Fall DeHon34 Good Model? Log-log plot ==> straight lines represent geometric growth

Caltech CS184a Fall DeHon35 Rent’s Rule Long standing empirical relationship –IO = C*N^P –0 ≤ P ≤ 1.0 –compare (F,α)-bifurcator α = 2^P Captures notion of locality –some signals generated and consumed locally –reconvergent fanout
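Since IO = C*N^P is a straight line on a log-log plot, C and P can be estimated with an ordinary least-squares fit to (log N, log IO). The sketch below uses synthetic data generated from made-up parameters, not measurements; the function names are illustrative.

```python
from math import exp, log

def rent_io(N, C, P):
    """Rent's Rule: IO = C * N^P."""
    return C * N ** P

def fit_rent(samples):
    """Least-squares line through (log N, log IO); returns (C, P)."""
    xs = [log(n) for n, _ in samples]
    ys = [log(io) for _, io in samples]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope /= sum((x - mx) ** 2 for x in xs)
    return exp(my - slope * mx), slope

# Synthetic points from hypothetical parameters C=5, P=0.7
samples = [(n, rent_io(n, 5, 0.7)) for n in (4, 16, 64, 256, 1024)]
C_fit, P_fit = fit_rent(samples)

# Halving N scales IO by 2^(-P), so the matching bifurcator factor is 2^P
alpha = 2 ** P_fit
print(C_fit, P_fit, alpha)
```

In practice the (N, IO) pairs would come from recursively bisecting a real netlist, and the points scatter around the line rather than lying on it.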

Caltech CS184a Fall DeHon36 Monday class stopped here

Caltech CS184a Fall DeHon37 Rent’s Rule Typically consider –0.5 ≤ P ≤ 0.75 “High-Speed” Logic P=0.67 Memory (P~ ) Example (i10) –max C=7, P=0.68 –avg C=5, P=0.72

Caltech CS184a Fall DeHon38 What tell us about design? Recursive bandwidth requirements in network

Caltech CS184a Fall DeHon39 What tell us about design? Recursive bandwidth requirements in network –lower bound on resource requirements N.B. necessary but not sufficient condition on network design – I.e. design must also be able to use the wires

Caltech CS184a Fall DeHon40 What tell us about design? Interconnect lengths –Intuition if p>0.5, everything cannot be nearest neighbor as p grows, so do wire distances

Caltech CS184a Fall DeHon41 What tell us about design? Interconnect lengths –IO=(n^2)^P cross distance n –dIO/dn end at exactly distance n –E(l)=Integral from 0 to n=√N of n*(dIO/dn)/n^2 dn assume iid sources –E(l)=O(N^(p-0.5)) p>0.5
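The integral can be worked through explicitly. This is a reconstruction from the quantities named on the slide, with the Rent constant dropped; it is a sketch of the scaling argument rather than an exact wire-length model.

```latex
\begin{aligned}
IO(n) &= (n^2)^p = n^{2p}, \qquad \frac{dIO}{dn} = 2p\,n^{2p-1} \\
E(l) &= \int_0^{\sqrt{N}} n \cdot \frac{1}{n^2}\,\frac{dIO}{dn}\,dn
      = 2p \int_0^{\sqrt{N}} n^{2p-2}\,dn \\
     &= \frac{2p}{2p-1}\,\bigl(\sqrt{N}\bigr)^{2p-1}
      = \frac{2p}{2p-1}\,N^{\,p-0.5}
      = O\!\left(N^{\,p-0.5}\right), \qquad p > 0.5
\end{aligned}
```

The condition p > 0.5 is what makes the exponent 2p-2 greater than -1, so the integral converges at the lower limit and is dominated by the upper limit √N.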

Caltech CS184a Fall DeHon42 What Tell us about design? IO ∝ N^P Bisection BW ∝ N^P side length ∝ N^P –√N if p<0.5 Area ∝ N^(2p) –p>0.5 N.B. 2D VLSI world has “natural” Rent of P=0.5 (area vs. perimeter)

Caltech CS184a Fall DeHon43 Rent’s Rule Caveats Modern “systems” on a chip -- likely to contain subcomponents of varying Rent complexity Less I/O at certain “natural” boundaries System close –(Rent’s Rule apply to workstation, PC, PDA?)

Caltech CS184a Fall DeHon44 Area/Wire Length Bad news –Area ~ O(N^(2p)) faster than N –Avg. Wire Length ~ O(N^(p-0.5)) grows with N Can designers/CAD control p (locality) once appreciate its effects? I.e. maybe this cost changes design style/criteria so we mitigate effects?
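The two growth rates can be tabulated to see how quickly the "bad news" sets in. This sketch drops all constants, so only the ratios between rows are meaningful; the function names and the value p = 0.7 are hypothetical.

```python
def area_growth(N, p):
    """Area scaling: N^(2p) when p > 0.5, otherwise linear in N (constants dropped)."""
    return N ** (2 * p) if p > 0.5 else float(N)

def avg_wire_length_growth(N, p):
    """Average wire length scaling N^(p - 0.5); only the p > 0.5 case is covered."""
    if p <= 0.5:
        raise ValueError("this sketch only covers p > 0.5")
    return N ** (p - 0.5)

p = 0.7  # hypothetical Rent exponent
for N in (1_000, 10_000, 100_000):
    per_element = area_growth(N, p) / N  # interconnect-driven area per compute element
    print(N, round(per_element, 1), round(avg_wire_length_growth(N, p), 1))
```

The area per element grows as N^(2p-1), which is the sense in which interconnect outgrows the compute it serves.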

Caltech CS184a Fall DeHon45 What Rent didn’t tell us Bisection bandwidth purely geometrical No constraint for delay –I.e. a partition may leave critical path weaving between halves

Caltech CS184a Fall DeHon46 Critical Path and Bisection Minimum cut may cross critical path multiple times. Minimizing long wires in critical path => increase cut size.
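The tension between cut size and critical-path crossings can be shown on a small made-up netlist: an 8-node chain standing in for the critical path, with the even-numbered nodes fully connected to each other and likewise the odd-numbered ones. The graph and both partitions are hypothetical and chosen only to make the effect visible.

```python
from itertools import combinations

def cut_size(edges, half):
    """Number of edges with exactly one endpoint in `half`."""
    return sum((u in half) != (v in half) for u, v in edges)

def path_crossings(path, half):
    """Number of hops along `path` that cross between the two halves."""
    return sum((a in half) != (b in half) for a, b in zip(path, path[1:]))

evens, odds = [0, 2, 4, 6], [1, 3, 5, 7]
critical_path = list(range(8))  # 0 -> 1 -> ... -> 7
edges = (list(combinations(evens, 2))
         + list(combinations(odds, 2))
         + list(zip(critical_path, critical_path[1:])))

weaving = set(evens)       # smaller cut, but the path weaves across it
contiguous = {0, 1, 2, 3}  # keeps the path together at the price of a larger cut

for name, half in (("weaving", weaving), ("contiguous", contiguous)):
    print(name, cut_size(edges, half), path_crossings(critical_path, half))
```

The partition with the smaller cut forces every hop of the critical path across the boundary, while the partition that keeps the path to a single crossing pays for it in cut size.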

Caltech CS184a Fall DeHon47 Rent Weakness Not account for path topology ? Can we define a “Temporal” Rent which takes into consideration? –Promising research topic

Caltech CS184a Fall DeHon48 Finishing Up...

Caltech CS184a Fall DeHon49 Big Ideas [MSB Ideas] Interconnect Dominant –power, delay, area Can be bottleneck for designs Can’t afford full crossbar Need to exploit locality Can’t have everything close

Caltech CS184a Fall DeHon50 Big Ideas [MSB Ideas] Rent’s rule characterizes locality => Area growth O(N^(2p)) p>0.5 => interconnect growing faster than compute elements –expect interconnect to dominate other resources