ESE534: Computer Organization

Slides:



Advertisements
Similar presentations
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 14: March 19, 2014 Compute 2: Cascades, ALUs, PLAs.
Advertisements

Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day3: October 2, 2000 Arithmetic and Pipelining.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day17: November 20, 2000 Time Multiplexing.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 21: April 2, 2007 Time Multiplexing.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 9: February 7, 2007 Instruction Space Modeling.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 3: January 26, 2009 Scheduled Operator Sharing.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day4: October 4, 2000 Memories, ALUs, and Virtualization.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 11: February 14, 2007 Compute 1: LUTs.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 4: January 22, 2007 Memories.
Penn ESE Spring DeHon 1 ESE : Computer Organization Day 3: January 17, 2007 Arithmetic and Pipelining.
Penn ESE Spring DeHon 1 FUTURE Timing seemed good However, only student to give feedback marked confusing (2 of 5 on clarity) and too fast.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 5: January 24, 2007 ALUs, Virtualization…
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 9: February 24, 2014 Operator Sharing, Virtualization, Programmable Architectures.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 7: February 9, 2015 Scheduled Operator Sharing.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 7: February 6, 2012 Memories.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 5: February 1, 2010 Memories.
Caltech CS184 Winter DeHon CS184a: Computer Architecture (Structure and Organization) Day 3: January 10, 2005 Arithmetic and Pipelining.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 4: January 25, 2012 Sequential Logic (FSMs, Pipelining, FSMD)
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 3: January 16, 2013 Scheduled Operator Sharing.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 5: January 17, 2003 What is required for Computation?
Penn ESE534 Spring DeHon 1 ESE534 Computer Organization Day 9: February 13, 2012 Interconnect Introduction.
Caltech CS184 Winter DeHon CS184a: Computer Architecture (Structure and Organization) Day 4: January 15, 2003 Memories, ALUs, Virtualization.
Lecture 17: Dynamic Reconfiguration I November 10, 2004 ECE 697F Reconfigurable Computing Lecture 17 Dynamic Reconfiguration I Acknowledgement: Andre DeHon.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 4: January 27, 2010 Sequential Logic (FSMs, Pipelining, FSMD)
Caltech CS184 Winter DeHon CS184a: Computer Architecture (Structure and Organization) Day 3: January 13, 2003 Arithmetic and Pipelining.
CBSSS 2002: DeHon Universal Programming Organization André DeHon Tuesday, June 18, 2002.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 15: March 13, 2013 High Level Synthesis II Dataflow Graph Sharing.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 8: February 19, 2014 Memories.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 22: April 16, 2014 Time Multiplexing.
ESE532: System-on-a-Chip Architecture
Dr.Ahmed Bayoumi Dr.Shady Elmashad
ESE532: System-on-a-Chip Architecture
ESE532: System-on-a-Chip Architecture
ESE534: Computer Organization
ESE532: System-on-a-Chip Architecture
ESE532: System-on-a-Chip Architecture
ESE534 Computer Organization
ESE534: Computer Organization
Computer Organization & Design Microcode for Control Sec. 5
ESE532: System-on-a-Chip Architecture
ESE532: System-on-a-Chip Architecture
ESE532: System-on-a-Chip Architecture
CS184a: Computer Architecture (Structure and Organization)
ESE534: Computer Organization
ESE534: Computer Organization
Objective of This Course
ESE534: Computer Organization
ESE534: Computer Organization
Chapter 1 Introduction.
Guest Lecturer TA: Shreyas Chand
ESE534: Computer Organization
Recall: ROM example Here are three functions, V2V1V0, implemented with an 8 x 3 ROM. Blue crosses (X) indicate connections between decoder outputs and.
Branch instructions We’ll implement branch instructions for the eight different conditions shown here. Bits 11-9 of the opcode field will indicate the.
ESE534: Computer Organization
ESE 150 – Digital Audio Basics
Control units In the last lecture, we introduced the basic structure of a control unit, and translated our assembly instructions into a binary representation.
ECE 352 Digital System Fundamentals
ESE534: Computer Organization
ESE534: Computer Organization
ESE534: Computer Organization
ESE535: Electronic Design Automation
ESE534: Computer Organization
Review: The whole processor
ECE 352 Digital System Fundamentals
ESE 150 – Digital Audio Basics
ESE532: System-on-a-Chip Architecture
ESE532: System-on-a-Chip Architecture
Presentation transcript:

ESE534: Computer Organization Day 8: February 8, 2012 Operator Sharing, Virtualization, Programmable Architectures Penn ESE534 Spring2012 -- DeHon

Preclass Parity How many gates? Draw solutions Penn ESE534 Spring2012 -- DeHon

Previously Pipelining – reuse in time for same operation Memory Memories pack state compactly densely Penn ESE534 Spring2012 -- DeHon

What is Importance of Memory? Day 7 What is Importance of Memory? Radical Hypothesis: Memory is simply a very efficient organization which allows us to store data compactly (at least, in the technologies we’ve seen to date) A great engineering trick to optimize resources Alternative: memory is a primary Penn ESE534 Spring2012 -- DeHon

Today Operator Sharing (from Day 4) Datapath Operation Virtualization Memory …continue unpacking the role of memory… Penn ESE534 Spring2012 -- DeHon

Design Space for Computation (from Day 4) Penn ESE534 Spring2012 -- DeHon

Compute Function Assume y=Ax2 +Bx +C D(Mpy) > D(Add) E.g. D(Mpy)=24, D(Add)=8 A(Mpy) > A(Add) E.g. A(Mpy)=64, A(Add)=8 Penn ESE534 Spring2012 -- DeHon

Spatial Quadratic D(Quad) = 2*D(Mpy)+D(Add) = 56 Throughput 1/(2*D(Mpy)+D(Add)) = 1/56 A(Quad) = 3*A(Mpy) + 2*A(Add) = 208 Penn ESE534 Spring2012 -- DeHon

Pipelined Spatial Quadratic A(Reg)=4 D(Quad) = 3*D(Mpy) = 72 Throughput 1/D(Mpy) = 1/24 A(Quad) = 3*A(Mpy) + 2*A(Add)+6A(Reg) = 232 Penn ESE534 Spring2012 -- DeHon

Quadratic with Single Multiplier and Adder? We’ve seen reuse to perform the same operation pipelining bit-serial, homogeneous datapath We can also reuse a resource in time to perform a different role. Penn ESE534 Spring2012 -- DeHon

Quadratic Datapath Start with one of each operation HW2.4 showed could just user adders Penn ESE534 Spring2012 -- DeHon

Quadratic Datapath Multiplier serves multiple roles x*x A*(x*x) B*x Will need to be able to steer data (switch interconnections) Penn ESE534 Spring2012 -- DeHon

Quadratic Datapath Multiplier serves multiple roles Inputs x*x A*(x*x) B*x Inputs x, x*x x,A,B Penn ESE534 Spring2012 -- DeHon

Quadratic Datapath Multiplier serves multiple roles Inputs x*x A*(x*x) B*x Inputs x, x*x x,A,B Penn ESE534 Spring2012 -- DeHon

Quadratic Datapath Adder serves multiple roles Inputs (Bx)+c (A*x*x)+(Bx+c) Inputs one always mpy output C, Bx+C Penn ESE534 Spring2012 -- DeHon

Quadratic Datapath Penn ESE534 Spring2012 -- DeHon

Quadratic Datapath Add input register for x Penn ESE534 Spring2012 -- DeHon

Quadratic Control Now, we just need to control the datapath What control? Control: LD x LD x*x MA Select MB Select AB Select LD Bx+C LD Y Penn ESE534 Spring2012 -- DeHon

FSMD FSMD = FSM + Datapath Stylization for building controlled datapaths such as this (a pattern) Of course, an FSMD is just an FSM it’s often easier to think about as a datapath synthesis, place and route tools have been notoriously bad about discovering/exploiting datapath structure Penn ESE534 Spring2012 -- DeHon

Quadratic FSMD Penn ESE534 Spring2012 -- DeHon

Quadratic FSMD Control S0: if (go) LD_X; goto S1 else goto S0 S1: MA_SEL=x,MB_SEL[1:0]=x, LD_x*x goto S2 S2: MA_SEL=x,MB_SEL[1:0]=B goto S3 S3: AB_SEL=C,MA_SEL=x*x, MB_SEL=A goto S4 S4: AB_SEL=Bx+C, LD_Y goto S0 Penn ESE534 Spring2012 -- DeHon

Quadratic FSMD Control S0: if (go) LD_X; goto S1 else goto S0 S1: MA_SEL=x,MB_SEL[1:0]=x, LD_x*x goto S2 S2: MA_SEL=x,MB_SEL[1:0]=B goto S3 S3: AB_SEL=C,MA_SEL=x*x, MB_SEL=A goto S4 S4: AB_SEL=Bx+C, LD_Y goto S0 Penn ESE534 Spring2012 -- DeHon

Quadratic FSM D(mux3)=D(mux2)=1 A(mux2)=2 A(mux3)=3 A(QFSM) ~= 10 Latency/Throughput/Area? Latency: 5*(D(MPY)+D(mux3)) = 125 Throughput: 1/Latency = 1/125 Area: A(Mpy)+A(Add)+5*A(Reg) +2*A(Mux2)+A(Mux3)+A(QFSM) = 109 Penn ESE534 Spring2012 -- DeHon

Universal Sharing Penn ESE534 Spring2012 -- DeHon

Review Given a task: y=Ax2 +Bx +C Saw how to share primitive operators Got down to one of each Penn ESE534 Spring2012 -- DeHon

Very naively Might seem we need one of each different type of operator Penn ESE534 Spring2012 -- DeHon

..But Doesn’t fool us We already know that nand gate …. are universal (and many other things—HW1.3) …. are universal So, we know, we can build a universal compute operator Penn ESE534 Spring2012 -- DeHon

Temporal Composition Penn ESE534 Spring2012 -- DeHon

Temporal Don’t have to implement all the gates at once Can reuse one gate over time Penn ESE534 Spring2012 -- DeHon

Temporal Decomposition Take Set of gates Sort topologically All predecessors before successors Give a unique number to each gate Hold value of its outputs Use a memory to hold the gate values Sequence through gates Penn ESE534 Spring2012 -- DeHon

Example Logic Penn ESE534 Spring2012 -- DeHon

Numbered Gates Penn ESE534 Spring2012 -- DeHon

Preclass Number gates Penn ESE534 Spring2012 -- DeHon

nor2 Memory/Datapath Penn ESE534 Spring2012 -- DeHon

Programming? How do we program this netlist? Penn ESE534 Spring2012 -- DeHon

Programming? Program gates This is the instruction for the gate Tell each gate where to get its input Tell gate n where its two inputs come from Specify the memory location for the output of the associated gate Each gate operation specified with two addresses (the input sources for gate) This is the instruction for the gate Penn ESE534 Spring2012 -- DeHon

nor2 Memory/Datapath Instruction Penn ESE534 Spring2012 -- DeHon

Supply Instruction How can we supply the sequence of instructions to program this operation? Penn ESE534 Spring2012 -- DeHon

Simplest Programmable Control Use a memory to “record” control instructions “Play” control with sequence Penn ESE534 Spring2012 -- DeHon

Temporal Gate Architecture Penn ESE534 Spring2012 -- DeHon

How program preclass computation? How would we program the preclass computation? Complete the memory Penn ESE534 Spring2012 -- DeHon

Simulate the Logic For Preclass Go around the room calling out: Identify PC Identify instruction Perform nor2 on slot __ and slot ___ Result is ___ Store into slot ___ Penn ESE534 Spring2012 -- DeHon

What does this mean? With only one active component nor gate Can implement any function given appropriate state (memory) muxes (interconnect) Control Penn ESE534 Spring2012 -- DeHon

Defining Terms Fixed Function: Programmable: Computes “any” computable function (e.g. Processor, DSPs, FPGAs) Function defined after fabrication Computes one function (e.g. FP-multiply, divider, DCT) Function defined at fabrication time Penn ESE534 Spring2012 -- DeHon

Result Can sequence together primitive operations in time Communicating state through memory Memory as interconnect To perform “arbitrary” operations Penn ESE534 Spring2012 -- DeHon

“Any” Computation? (Universality) Any computation which can “fit” on the programmable substrate Limitations: hold entire computation and intermediate data Penn ESE534 Spring2012 -- DeHon

Temporal-Spatial Variation Can have any number of gates Tradeoff Area for Reduce Time…. Penn ESE534 Spring2012 -- DeHon

Use of Memory? What did we use memory for here? State Instructions Interconnect Penn ESE534 Spring2012 -- DeHon

“Stored Program” Computer/Processor Can build a datapath that can be programmed to perform any computation. Can be built with limited hardware that is reused in time. Historically: this was a key contribution from Penn’s Moore School Computer Engineers: Eckert and Mauchly ENIACEDVAC (often credited to Von Neumann) Penn ESE534 Spring2012 -- DeHon

What have we done? Taken a computation: y=Ax2 +Bx +C Turned it into operators and interconnect Decomposed operators into a basic primitive: nor, adds Penn ESE534 Spring2012 -- DeHon

What have we done? Said we can implement it on as few as one of compute unit (nor2) Added a unit for state Added an instruction to tell single, universal unit how to act as each operator in original graph Penn ESE534 Spring2012 -- DeHon

Virtualization We’ve virtualized the computation No longer need one physical compute unit for each operator in original computation Can suffice with: shared operator(s) a description of how each operator behaved a place to store the intermediate data between operators Penn ESE534 Spring2012 -- DeHon

Virtualization Penn ESE534 Spring2012 -- DeHon

Why Interesting? Memory compactness This works and was interesting because the area to describe a computation, its interconnect, and its state is much smaller than the physical area to spatially implement the computation e.g. traded multiplier for few memory slots to hold state few memory slots to describe operation time on a shared unit (adder, gate) Penn ESE534 Spring2012 -- DeHon

Questions? Penn ESE534 Spring2012 -- DeHon

Admin HW4 due Monday No new reading for Monday Penn ESE534 Spring2012 -- DeHon

Big Ideas [MSB Ideas] Memory: efficient way to hold state …and allows us to describe/implement computations of unbounded size State can be << computation [area] Resource sharing: key trick to reduce area Memory key tool for Area-Time tradeoffs “configuration” signals allow us to generalize the utility of a computational operator Penn ESE534 Spring2012 -- DeHon

Big Ideas [MSB-1 Ideas] First programmable computing unit Two key functions of memory retiming (interconnect in time) instructions description of computation Penn ESE534 Spring2012 -- DeHon