Download presentation
Presentation is loading. Please wait.
1
ESE534 Computer Organization
Day 7: February 17, 2014 Interconnect Introduction Penn ESE534 Spring DeHon
2
Previously Universal building blocks Computations with gates
Penn ESE534 Spring DeHon
3
Today Universal Spatially Programmable Crossbar
Programmable compute blocks Start thinking about optimizing interconnect Penn ESE534 Spring DeHon
4
Spatial Programmable Penn ESE534 Spring DeHon
5
Spatially Programmable
Program up “any” function Not sequentialize in time E.g. Want to build any FSM or any arithmetic operation Penn ESE534 Spring DeHon
6
Needs? Need a collection of gates. What else will we need?
Penn ESE534 Spring DeHon
7
Needs Need some registers Need way to programmably wire gates together
Penn ESE534 Spring DeHon
8
Multiplexer Interconnect
Use a multiplexer for programmable interconnect Can select any source to be an input for a gate How big is an N-input multiplexer? Penn ESE534 Spring DeHon
9
Sources? What are potential sources? Inputs to circuit -- I
Outputs of gates -- G Outputs of registers – R N=I+G+R Penn ESE534 Spring DeHon
10
Sinks Which things need programmable inputs? (and how many?)
Circuit outputs -- O Gate – needs one per input -- kG Assuming k-input gates Registers – R M = O+R+kG Penn ESE534 Spring DeHon
11
N-input, M-output Multiplexing
Area? Bits to control the multiplexers? Data input switching Capacitance Switched? Delay? Penn ESE534 Spring DeHon
12
Mux Programmable Interconnect
Area M×N = (I+G+R) × (O+kG+R) = kG2 + … Scales faster than gates! Penn ESE534 Spring DeHon
13
Interconnect Costs We can do better than this Even when we do better
Touch on a little later in lecture Dig into details later in term Even when we do better Interconnect can be dominate Area, delay, energy Particularly for Spatial Architectures Penn ESE534 Spring DeHon
14
Dominant Time LUT is the gate. Definition coming soon…
Penn ESE534 Spring DeHon
15
Dominant Power [Energy]
XC4003A data from Eric Kusse (UCB MS 1997) [Virtex II, Shang et al., FPGA 2002] [Tuan et al./FPGA 2006] Penn ESE534 Spring DeHon
16
Crossbar Penn ESE534 Spring DeHon
17
Crossbar Allows us to connect any of a set of inputs to any of the outputs. This is functionality provided with our muxes Penn ESE534 Spring DeHon
18
Crossbar Structure Can be more efficient
Penn ESE534 Spring DeHon
19
Crossbar Costs Area still goes as M×N Delay proportional to M + N
More realistic even for mux implementation Energy still goes as M×N Penn ESE534 Spring DeHon
20
Crossbar Notation Penn ESE534 Spring DeHon
21
Gates with Crossbar Interconnect
Penn ESE534 Spring DeHon
22
Programmable, Universal
Program to any function of N gates Limitation we only use nand2 gates? What can we say about functions of arbitrary 2-input gates? Penn ESE534 Spring DeHon
23
Universal Computation with Fixed Compute Operator
Being minimalists, this shows: do not need compute to be programmable Just use fixed nor2 or nand2 Penn ESE534 Spring DeHon
24
Programmable Functions
Penn ESE534 Spring DeHon
25
Mux can be a programmable gate
bool mux4(bool a, b, c, d, s0, s1) { return(mux2( mux2(a,b,s0), mux2(c,d,s0), s1)); } Penn ESE534 Spring DeHon
26
Program AND2? How do we program to behave as and2?
Penn ESE534 Spring DeHon
27
Mux as Logic bool and2(bool x, y)
{return (mux4(false,false,false,true,x,y));} bool or2(bool x, y) {return (mux4(false,true,true,true,x,y));} Just by routing “data” into this mux4, Can select any two input function Penn ESE534 Spring DeHon
28
LUT – LookUp Table When use a mux as programmable gate
Call it a LookUp Table (LUT) Implementing the Truth Table for small # of inputs # of inputs =k (need mux-2k) Just lookup the output result in the table Penn ESE534 Spring DeHon
29
Programmable Compute Can use programmable gate in place of nor gate
Penn ESE534 Spring DeHon
30
Programmable Compute How do we “program” to perform a particular function? Penn ESE534 Spring DeHon
31
Instructions Set of bits that tell the programmable units how to behave Penn ESE534 Spring DeHon
32
Is an Adder Universal? Assuming interconnect: A: 001a What’s c?
(big assumption as we have just seen) Consider: What’s c? A: 001a B: 000b S: 00cd Penn ESE534 Spring DeHon
33
Practically To reduce (some) interconnect,
and to reduce number of operations, do tend to build a bit more general “universal” computing function Penn ESE534 Spring DeHon
34
Arithmetic Logic Unit (ALU)
Observe: with small tweaks can get many functions with basic adder components Penn ESE534 Spring DeHon
35
Arithmetic and Logic Unit
Penn ESE534 Spring DeHon
36
ALU Functions A+B w/ Carry B-A A xor B (squash carry)
A*B (squash carry) /A Penn ESE534 Spring DeHon
37
Optimization and Design Space
Interconnect Optimization and Design Space Penn ESE534 Spring DeHon
38
Locality Maybe we don’t need to connect everything to everything?
Cluster groups of C things at leaves CG gates, CR registers Limit cluster I/O – CI, CO Crossbar within cluster Crossbar among clusters Penn ESE534 Spring DeHon
39
Compare Switches Penn ESE534 Spring DeHon
40
8×16=128 8×4+18×4=104 Penn ESE534 Spring DeHon
41
Comparing Full Crossbar needs: kG2 switches
How many switches needed for: CG gates per cluster CI inputs to cluster CO outputs from cluster (ignore registers and circuit input/output) Penn ESE534 Spring DeHon
42
Costs Cluster Input Crossbar: Cluster Output Crossbar:
Inputs: CI+CG Outputs: kCG Cluster Output Crossbar: Input: CG Output: CO Master Crossbar: Inputs: (G/CG) × CO Outputs: (G/CG) × CI (G/CG)×CO×(G/CG)×CI +(G/CG) ×(k×CG2+k×CG×CI+CG×CO) Penn ESE534 Spring DeHon
43
Costs Cluster: G2×(CI×CO/CG2)+k×G×(CG+CI+CO) Full Crossbar: kG2
Compare at: CG=8, CI=CO=2, G=256, k=2 Cluster case? Full crossbar case? 216×(2×2/82)+28×(8+2+2)×2 = ×29= ×211= 5×213 217 Penn ESE534 Spring DeHon
44
More? Can we do it again? Discuss Generalization? Issues?
Penn ESE534 Spring DeHon
45
Optimizing Interconnect
What other interconnect optimizations/structures might we use? Penn ESE534 Spring DeHon
46
Interconnect Design Space
Large interconnect design space We will be exploring systematically Day16—20+25 Penn ESE534 Spring DeHon
47
Big Ideas Interconnect can be programmable
Interconnect area/delay/energy can dominate compute area Exploiting structure can reduce area Locality Penn ESE534 Spring DeHon
48
Admin HW2 HW3 Wednesday Drop date is Friday HW4 out
On-time submissions graded (late assignments will be graded later) HW3 Wednesday Drop date is Friday HW4 out Reading for Wed. on Web Penn ESE534 Spring DeHon
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.