ESE534 Computer Organization Day 7: February 17, 2014 Interconnect Introduction Penn ESE534 Spring2014 -- DeHon
Previously Universal building blocks Computations with gates Penn ESE534 Spring2014 -- DeHon
Today Universal Spatially Programmable Crossbar Programmable compute blocks Start thinking about optimizing interconnect Penn ESE534 Spring2014 -- DeHon
Spatial Programmable Penn ESE534 Spring2014 -- DeHon
Spatially Programmable Program up “any” function Not sequentialize in time E.g. Want to build any FSM or any arithmetic operation Penn ESE534 Spring2014 -- DeHon
Needs? Need a collection of gates. What else will we need? Penn ESE534 Spring2014 -- DeHon
Needs Need some registers Need way to programmably wire gates together Penn ESE534 Spring2014 -- DeHon
Multiplexer Interconnect Use a multiplexer for programmable interconnect Can select any source to be an input for a gate How big is an N-input multiplexer? Penn ESE534 Spring2014 -- DeHon
Sources? What are potential sources? Inputs to circuit -- I Outputs of gates -- G Outputs of registers – R N=I+G+R Penn ESE534 Spring2014 -- DeHon
Sinks Which things need programmable inputs? (and how many?) Circuit outputs -- O Gate – needs one per input -- kG Assuming k-input gates Registers – R M = O+R+kG Penn ESE534 Spring2014 -- DeHon
N-input, M-output Multiplexing Area? Bits to control the multiplexers? Data input switching Capacitance Switched? Delay? Penn ESE534 Spring2014 -- DeHon
Mux Programmable Interconnect Area M×N = (I+G+R) × (O+kG+R) = kG2 + … Scales faster than gates! Penn ESE534 Spring2014 -- DeHon
Interconnect Costs We can do better than this Even when we do better Touch on a little later in lecture Dig into details later in term Even when we do better Interconnect can be dominate Area, delay, energy Particularly for Spatial Architectures Penn ESE534 Spring2014 -- DeHon
Dominant Time LUT is the gate. Definition coming soon… Penn ESE534 Spring2014 -- DeHon
Dominant Power [Energy] XC4003A data from Eric Kusse (UCB MS 1997) [Virtex II, Shang et al., FPGA 2002] [Tuan et al./FPGA 2006] Penn ESE534 Spring2014 -- DeHon
Crossbar Penn ESE534 Spring2014 -- DeHon
Crossbar Allows us to connect any of a set of inputs to any of the outputs. This is functionality provided with our muxes Penn ESE534 Spring2014 -- DeHon
Crossbar Structure Can be more efficient Penn ESE534 Spring2014 -- DeHon
Crossbar Costs Area still goes as M×N Delay proportional to M + N More realistic even for mux implementation Energy still goes as M×N Penn ESE534 Spring2014 -- DeHon
Crossbar Notation Penn ESE534 Spring2014 -- DeHon
Gates with Crossbar Interconnect Penn ESE534 Spring2014 -- DeHon
Programmable, Universal Program to any function of N gates Limitation we only use nand2 gates? What can we say about functions of arbitrary 2-input gates? Penn ESE534 Spring2014 -- DeHon
Universal Computation with Fixed Compute Operator Being minimalists, this shows: do not need compute to be programmable Just use fixed nor2 or nand2 Penn ESE534 Spring2014 -- DeHon
Programmable Functions Penn ESE534 Spring2014 -- DeHon
Mux can be a programmable gate bool mux4(bool a, b, c, d, s0, s1) { return(mux2( mux2(a,b,s0), mux2(c,d,s0), s1)); } Penn ESE534 Spring2014 -- DeHon
Program AND2? How do we program to behave as and2? Penn ESE534 Spring2014 -- DeHon
Mux as Logic bool and2(bool x, y) {return (mux4(false,false,false,true,x,y));} bool or2(bool x, y) {return (mux4(false,true,true,true,x,y));} Just by routing “data” into this mux4, Can select any two input function Penn ESE534 Spring2014 -- DeHon
LUT – LookUp Table When use a mux as programmable gate Call it a LookUp Table (LUT) Implementing the Truth Table for small # of inputs # of inputs =k (need mux-2k) Just lookup the output result in the table Penn ESE534 Spring2014 -- DeHon
Programmable Compute Can use programmable gate in place of nor gate Penn ESE534 Spring2014 -- DeHon
Programmable Compute How do we “program” to perform a particular function? Penn ESE534 Spring2014 -- DeHon
Instructions Set of bits that tell the programmable units how to behave Penn ESE534 Spring2014 -- DeHon
Is an Adder Universal? Assuming interconnect: A: 001a What’s c? (big assumption as we have just seen) Consider: What’s c? A: 001a B: 000b S: 00cd Penn ESE534 Spring2014 -- DeHon
Practically To reduce (some) interconnect, and to reduce number of operations, do tend to build a bit more general “universal” computing function Penn ESE534 Spring2014 -- DeHon
Arithmetic Logic Unit (ALU) Observe: with small tweaks can get many functions with basic adder components Penn ESE534 Spring2014 -- DeHon
Arithmetic and Logic Unit Penn ESE534 Spring2014 -- DeHon
ALU Functions A+B w/ Carry B-A A xor B (squash carry) A*B (squash carry) /A Penn ESE534 Spring2014 -- DeHon
Optimization and Design Space Interconnect Optimization and Design Space Penn ESE534 Spring2014 -- DeHon
Locality Maybe we don’t need to connect everything to everything? Cluster groups of C things at leaves CG gates, CR registers Limit cluster I/O – CI, CO Crossbar within cluster Crossbar among clusters Penn ESE534 Spring2014 -- DeHon
Compare Switches Penn ESE534 Spring2014 -- DeHon
8×16=128 8×4+18×4=104 Penn ESE534 Spring2014 -- DeHon
Comparing Full Crossbar needs: kG2 switches How many switches needed for: CG gates per cluster CI inputs to cluster CO outputs from cluster (ignore registers and circuit input/output) Penn ESE534 Spring2014 -- DeHon
Costs Cluster Input Crossbar: Cluster Output Crossbar: Inputs: CI+CG Outputs: kCG Cluster Output Crossbar: Input: CG Output: CO Master Crossbar: Inputs: (G/CG) × CO Outputs: (G/CG) × CI (G/CG)×CO×(G/CG)×CI +(G/CG) ×(k×CG2+k×CG×CI+CG×CO) Penn ESE534 Spring2014 -- DeHon
Costs Cluster: G2×(CI×CO/CG2)+k×G×(CG+CI+CO) Full Crossbar: kG2 Compare at: CG=8, CI=CO=2, G=256, k=2 Cluster case? Full crossbar case? 216×(2×2/82)+28×(8+2+2)×2 = 212 + 12×29= 212 + 3×211= 5×213 217 Penn ESE534 Spring2014 -- DeHon
More? Can we do it again? Discuss Generalization? Issues? Penn ESE534 Spring2014 -- DeHon
Optimizing Interconnect What other interconnect optimizations/structures might we use? Penn ESE534 Spring2014 -- DeHon
Interconnect Design Space Large interconnect design space We will be exploring systematically Day16—20+25 Penn ESE534 Spring2014 -- DeHon
Big Ideas Interconnect can be programmable Interconnect area/delay/energy can dominate compute area Exploiting structure can reduce area Locality Penn ESE534 Spring2014 -- DeHon
Admin HW2 HW3 Wednesday Drop date is Friday HW4 out On-time submissions graded (late assignments will be graded later) HW3 Wednesday Drop date is Friday HW4 out Reading for Wed. on Web Penn ESE534 Spring2014 -- DeHon