Penn ESE534 Spring2012 -- DeHon 1 ESE534 Computer Organization Day 9: February 13, 2012 Interconnect Introduction.

Slides:



Advertisements
Similar presentations
Part 4: combinational devices
Advertisements

Control path Recall that the control path is the physical entity in a processor which: fetches instructions, fetches operands, decodes instructions, schedules.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 14: March 19, 2014 Compute 2: Cascades, ALUs, PLAs.
1 KU College of Engineering Elec 204: Digital Systems Design Lecture 9 Programmable Configurations Read Only Memory (ROM) – –a fixed array of AND gates.
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 24: November 4, 2011 Synchronous Circuits.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 21: April 2, 2007 Time Multiplexing.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 15: March 12, 2007 Interconnect 3: Richness.
CS294-6 Reconfigurable Computing Day 8 September 17, 1998 Interconnect Requirements.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day10: October 25, 2000 Computing Elements 2: Cascades, ALUs,
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day8: October 18, 2000 Computing Elements 1: LUTs.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 9: February 7, 2007 Instruction Space Modeling.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 3: January 26, 2009 Scheduled Operator Sharing.
Penn ESE Fall DeHon 1 ESE (ESE534): Computer Organization Day 19: March 26, 2007 Retime 1: Transformations.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 13: February 26, 2007 Interconnect 1: Requirements.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day4: October 4, 2000 Memories, ALUs, and Virtualization.
1 EECS Components and Design Techniques for Digital Systems Lec 21 – RTL Design Optimization 11/16/2004 David Culler Electrical Engineering and Computer.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 11: February 14, 2007 Compute 1: LUTs.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 4: January 22, 2007 Memories.
Trends toward Spatial Computing Architectures Dr. André DeHon BRASS Project University of California at Berkeley.
Penn ESE Spring DeHon 1 FUTURE Timing seemed good However, only student to give feedback marked confusing (2 of 5 on clarity) and too fast.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 14: February 28, 2007 Interconnect 2: Wiring Requirements and Implications.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 5: January 24, 2007 ALUs, Virtualization…
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 12: February 21, 2007 Compute 2: Cascades, ALUs, PLAs.
More Basics of CPU Design Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 13: February 4, 2005 Interconnect 1: Requirements.
Digital Computer Concept and Practice Copyright ©2012 by Jaejin Lee Logic Circuits I.
ESE Spring DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 9: February 24, 2014 Operator Sharing, Virtualization, Programmable Architectures.
Digital Computer Concept and Practice Copyright ©2012 by Jaejin Lee Logic Circuits I.
CBSSS 2002: DeHon Costs André DeHon Wednesday, June 19, 2002.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 7: February 6, 2012 Memories.
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 24: November 5, 2010 Memory Overview.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day11: October 30, 2000 Interconnect Requirements.
Caltech CS184b Winter DeHon 1 CS184b: Computer Architecture [Single Threaded Architecture: abstractions, quantification, and optimizations] Day9:
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 25: April 28, 2014 Interconnect 7: Dynamically Switched Interconnect.
CDA 3101 Fall 2013 Introduction to Computer Organization The Arithmetic Logic Unit (ALU) and MIPS ALU Support 20 September 2013.
Chapter 4 MARIE: An Introduction to a Simple Computer.
Computer Organization. This module surveys the physical resources of a computer system.  Basic components  CPU  Memory  Bus  I/O devices  CPU structure.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 5: February 1, 2010 Memories.
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 28: November 7, 2014 Memory Overview.
Penn ESE534 Spring DeHon 1 ESE534 Computer Organization Day 19: March 28, 2012 Minimizing Energy.
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 28: November 8, 2013 Memory Overview.
Caltech CS184 Winter DeHon CS184a: Computer Architecture (Structure and Organization) Day 4: January 15, 2003 Memories, ALUs, Virtualization.
Lecture 17: Dynamic Reconfiguration I November 10, 2004 ECE 697F Reconfigurable Computing Lecture 17 Dynamic Reconfiguration I Acknowledgement: Andre DeHon.
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 4: September 12, 2012 Transistor Introduction.
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 24: November 5, 2012 Synchronous Circuits.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 11: January 31, 2005 Compute 1: LUTs.
CBSSS 2002: DeHon Universal Programming Organization André DeHon Tuesday, June 18, 2002.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 15: March 13, 2013 High Level Synthesis II Dataflow Graph Sharing.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 11: February 20, 2012 Instruction Space Modeling.
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 5: September 8, 2014 Transistor Introduction.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 8: February 19, 2014 Memories.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 22: April 16, 2014 Time Multiplexing.
ESE532: System-on-a-Chip Architecture
ESE534: Computer Organization
ESE532: System-on-a-Chip Architecture
CS184a: Computer Architecture (Structure and Organization)
ESE534 Computer Organization
ESE532: System-on-a-Chip Architecture
ESE532: System-on-a-Chip Architecture
CS184a: Computer Architecture (Structure and Organization)
ESE534: Computer Organization
ESE534: Computer Organization
CSE 370 – Winter 2002 – Comb. Logic building blocks - 1
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
FIGURE 5-1 MOS Transistor, Symbols, and Switch Models
ESE 150 – Digital Audio Basics
Presentation transcript:

Penn ESE534 Spring DeHon 1 ESE534 Computer Organization Day 9: February 13, 2012 Interconnect Introduction

Previously Universal building blocks Programmable Universal Temporal Architecture Penn ESE534 Spring DeHon 2

3 Today Universal Spatially Programmable Crossbar Programmable compute blocks Hybrid Spatial/Temporal Bus Ring Mesh

Spatial Programmable Penn ESE534 Spring DeHon 4

Temporal Programmable Day 8 5

Spatially Programmable Program up “any” function Not sequentialize in time E.g. Want to build any FSM Penn ESE534 Spring DeHon 6

Needs? Need a collection of gates. What else will we need? Penn ESE534 Spring DeHon 7

Needs Need some registers Need way to programmably wire gates together Penn ESE534 Spring DeHon 8

Multiplexer Interconnect Use a multiplexer for programmable interconnect Can select any source to be an input for a gate How big is an N-input multiplexer? Penn ESE534 Spring DeHon 9

Sources? What are potential sources? –Inputs to circuit -- I –Outputs of gates -- G –Outputs of registers – R –N=I+G+R Penn ESE534 Spring DeHon 10

Sinks Which things need programmable inputs? (and how many?) –Circuit outputs -- O –Gate – needs one per input -- kG Assuming k-input gates –Registers – R –M = O+R+kG Penn ESE534 Spring DeHon 11

N-input, M-output Multiplexing Area? Instruction Bits? Data input switching –Capacitance Switched? –Delay? Control input switching –Capacitance switched? –Delay? Penn ESE534 Spring DeHon 12

Mux Programmable Interconnect Area M×N = (I+G+R) × (O+kG+R) = kG 2 + … Scales faster than gates! Penn ESE534 Spring DeHon 13

Interconnect Costs We can do better than this –Touch on a little later in lecture –Dig into details later in term Even when we do better Interconnect can be dominate –Area, delay, energy –Particularly for Spatial Architectures –(saw in HW4, memory can dominate for Temporal Architectures) Penn ESE534 Spring DeHon 14

Penn ESE534 Spring DeHon 15 Dominant Time

Penn ESE534 Spring DeHon 16 Dominant Power [Energy] XC4003A data from Eric Kusse (UCB MS 1997) [Virtex II, Shang et al., FPGA 2002]

Crossbar Penn ESE534 Spring DeHon 17

Crossbar Allows us to connect any of a set of inputs to any of the outputs. This is functionality provided with our muxes Penn ESE534 Spring DeHon 18

Crossbar Structure Can be more efficient Penn ESE534 Spring DeHon 19

Crossbar Costs Area still goes as M×N Delay proportional to M + N –More realistic even for mux implementation Energy still goes as M×N Penn ESE534 Spring DeHon 20

Crossbar Notation Penn ESE534 Spring DeHon 21

Gates with Crossbar Interconnect Penn ESE534 Spring DeHon 22

Programmable Functions Penn ESE534 Spring DeHon 23

Universal Computation with Fixed Compute Operator Being minimalists, show do not need compute to be programmable –Just use fixed nor2 or nand2 Penn ESE534 Spring DeHon 24

Penn ESE534 Spring DeHon Mux can be a programmable gate bool mux4(bool a, b, c, d, s0, s1) { return(mux2( mux2(a,b,s0), mux2(c,d,s0), s1)); } 25

Penn ESE534 Spring DeHon Mux as Logic bool and2(bool x, y) {return (mux4(false,false,false,true,x,y));} bool or2(bool x, y) {return (mux4(false,true,true,true,x,y));} Just by routing “data” into this mux4, –Can select any two input function 26

Penn ESE534 Spring DeHon Programmable Compute Can use programmable gate in place of nor gate 27 Specifying the function of the gate becomes part of the instruction.

Penn ESE534 Spring DeHon 28 Is an Adder Universal? Assuming interconnect: –(big assumption as we have just seen) –Consider: What’s c? A: 001a B: 000b S: 00cd

Penn ESE534 Spring DeHon 29 Practically To reduce (some) interconnect, and to reduce number of operations, do tend to build a bit more general “universal” computing function

Penn ESE534 Spring DeHon 30 Arithmetic Logic Unit (ALU) Observe: –with small tweaks can get many functions with basic adder components

ALU Size Adder took 6 2-input gates. How many 2-input gates did your ALU bitslice require? (HW4.1d?) Penn ESE534 Spring DeHon 31

Penn ESE534 Spring DeHon 32 Arithmetic and Logic Unit

Penn ESE534 Spring DeHon 33 ALU Functions A+B w/ Carry B-A A xor B (squash carry) A*B (squash carry) /A

Penn ESE534 Spring DeHon 34 Slightly more conventional Programmable Architecture

Penn ESE534 Spring DeHon 35 Instructions Identify the bits which control the function of our programmable device as: –Instructions

Multibit Word Ops What are we doing when we make the ALU (and register file) width > 1 –E.g. w=16 on HW4 Benefit? Limitation? Penn ESE534 Spring DeHon 36

Interconnect Optimization and Design Space Penn ESE534 Spring DeHon 37

Switching w-bit words Consider grouping outputs (inputs) into w-bit words –E.g. maybe operators are 16-bit ALUs How does this change switching requirements? –Don’t need to switch bit 3 to bit 7 –Reduces switching needed Penn ESE534 Spring DeHon 38

Switching words (w=2 shown) Penn ESE534 Spring DeHon 39

Also share memories Penn ESE534 Spring DeHon 40

Switching w-bit words N/w w-bit inputs, M/w w-bit outputs Instruction Bits –Factor of w fewer outputs to switch –Factor of w fewer inputs  M/w(log 2 (N/w)) Area: –Factor of w fewer switches, w 2 memories Delay: –Factor of w fewer sources Energy: –Factor of w fewer switches Penn ESE534 Spring DeHon 41

Locality Maybe we don’t need to connect everything to everything? Cluster groups of C things at leaves –CG gates, CR registers –Limit cluster I/O – CI, CO –Crossbar within cluster –Crossbar among clusters Penn ESE534 Spring DeHon 42

Compare Switches Penn ESE534 Spring DeHon 43

Penn ESE534 Spring DeHon 44 8×16=128 8 × 4+18 × 4=104

Comparing Full Crossbar needs: kG 2 switches How many switches needed for: –CG gates per cluster –CI inputs to cluster –CO outputs from cluster –(ignore registers and circuit input/output) Penn ESE534 Spring DeHon 45

Costs Cluster Input Crossbar: –Inputs: CI+CG –Outputs: kCG Cluster Output Crossbar: –Input: CG –Output: CO Master Crossbar: –Inputs: (G/CG) × CO –Outputs: (G/CG) × CI (G/CG)×CO×(G/CG)×CI +(G/CG) ×(k×CG 2 +k×CG×CI+CG×CO) Penn ESE534 Spring DeHon 46

Costs Cluster: G 2 ×(CI×CO/CG 2 )+k×G×(CG+CI+CO) Full Crossbar: kG 2 Compare at: CG=8, CI=CO=2, G=256, k=2 –Cluster case? –Full crossbar case? Penn ESE534 Spring DeHon 47

Hybrid Temporal/Spatial Penn ESE534 Spring DeHon 48

Extremes Fully SpatialFully Temporal Penn ESE534 Spring DeHon 49

General Case Between How many concurrent operators? How much serialization? Penn ESE534 Spring DeHon 50

Separate Data Memory and Compute Memory banks and compute Penn ESE534 Spring DeHon 51

Crossbar Generalized What’s different about this crossbar? –Compared to one we used in purely sequential case in first part of lecture? Penn ESE534 Spring DeHon 52

Dynamic Crossbar Need to switch crossbar configuration on each cycle Penn ESE534 Spring DeHon 53

Dynamic Crossbar Switching time matters Must also supply crossbar controls –More wires into array –How many? Area –Bit-level switching case? –W-bit word case? Penn ESE534 Spring DeHon 54 N (1+log(N)) M ~N (1+log(N)/W) M

Note on Remainder Rest of lecture to introduce issues –Be illustrative Not intended to be comprehensive Will return to interconnect and address systematically starting on Day 15 Penn ESE534 Spring DeHon 55

Class Ended Here Penn ESE534 Spring DeHon 56

Local Memory Case Put memory local to compute Penn ESE534 Spring DeHon 57

Reduce Interconnect? How can we reduce interconnect? –Maybe don’t need to deliver a non-local value to every bank on every cycle? –Maybe don’t need to communicate everywhere? Penn ESE534 Spring DeHon 58

Single Global Bus Penn ESE534 Spring DeHon 59

Single Global Bus Pros and Cons? Penn ESE534 Spring DeHon 60

Multiple Global Busses Penn ESE534 Spring DeHon 61

Interconnect Resource Can communication B values per cycle Could be dominant area/energy –Don’t want too large Could be bottleneck in computation? –Don’t want too small Example of an architectural parameter Penn ESE534 Spring DeHon 62

Nearest Neighbor Interconnect Penn ESE534 Spring DeHon 63

Nearest Neighbor Interconnect Compare to crossbar? –Number can transmit per cycle? –Area? Penn ESE534 Spring DeHon 64

Ring Interconnect Penn ESE534 Spring DeHon 65

Ring Interconnect Compare to bus? –Area? –Cycle time? –Data transfers/cycle? Penn ESE534 Spring DeHon 66

Ring Interconnect Compare to xbar? –Area? –Cycle time? –Data transfers/cycle? Penn ESE534 Spring DeHon 67

Mesh Interconnect Nearest neighbors in 2D Penn ESE534 Spring DeHon 68

Mesh Interconnect Compare to ring? –Area? –Latency? –Throughput? Penn ESE534 Spring DeHon 69

Interconnect Design Space Large interconnect design space We will be exploring systematically –Day15—18+24 Penn ESE534 Spring DeHon 70

Admin Drop Date Friday HW5 out – 1 problem due Monday –Next due following Monday No class next Wednesday (2/23) –Class this Wednesday and Monday –Office hours this Tuesday (not next) Reading for Wednesday, Monday on Blackboard Penn ESE534 Spring DeHon 71

Penn ESE534 Spring DeHon 72 Big Ideas Interconnect can be programmable Interconnect area/delay/energy can dominate compute area Exploiting structure can reduce area –Word structure –Locality