Penn ESE535 Spring 2009 -- DeHon 1 ESE535: Electronic Design Automation Day 3: January 26, 2009 Scheduled Operator Sharing.

Slides:



Advertisements
Similar presentations
PipelineCSCE430/830 Pipeline: Introduction CSCE430/830 Computer Architecture Lecturer: Prof. Hong Jiang Courtesy of Prof. Yifeng Zhu, U of Maine Fall,
Advertisements

Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 14: March 19, 2014 Compute 2: Cascades, ALUs, PLAs.
Pipelining Hwanmo Sung CS147 Presentation Professor Sin-Min Lee.
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 24: November 4, 2011 Synchronous Circuits.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day3: October 2, 2000 Arithmetic and Pipelining.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day17: November 20, 2000 Time Multiplexing.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 21: April 2, 2007 Time Multiplexing.
CS-447– Computer Architecture Lecture 12 Multiple Cycle Datapath
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 9: February 7, 2007 Instruction Space Modeling.
1 Lecture 5: Pipeline Wrap-up, Static ILP Basics Topics: loop unrolling, VLIW (Sections 2.1 – 2.2) Assignment 1 due at the start of class on Thursday.
Penn ESE Fall DeHon 1 ESE (ESE534): Computer Organization Day 19: March 26, 2007 Retime 1: Transformations.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 13: February 26, 2007 Interconnect 1: Requirements.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 26: April 18, 2007 Et Cetera…
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day4: October 4, 2000 Memories, ALUs, and Virtualization.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 11: February 14, 2007 Compute 1: LUTs.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 4: January 22, 2007 Memories.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2009 Architecture Synthesis (Provisioning, Allocation)
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 8: February 11, 2009 Dataflow.
CS294-6 Reconfigurable Computing Day 19 October 27, 1998 Multicontext.
Penn ESE Spring DeHon 1 ESE : Computer Organization Day 3: January 17, 2007 Arithmetic and Pipelining.
Penn ESE Spring DeHon 1 FUTURE Timing seemed good However, only student to give feedback marked confusing (2 of 5 on clarity) and too fast.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 5: January 24, 2007 ALUs, Virtualization…
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2009 Architecture Synthesis (Provisioning, Allocation)
9.2 Pipelining Suppose we want to perform the combined multiply and add operations with a stream of numbers: A i * B i + C i for i =1,2,3,…,7.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 9: February 24, 2014 Operator Sharing, Virtualization, Programmable Architectures.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 7: February 9, 2015 Scheduled Operator Sharing.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 7: February 6, 2012 Memories.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 24: April 18, 2011 Covering and Retiming.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 10: February 18, 2015 Architecture Synthesis (Provisioning, Allocation)
Caltech CS184b Winter DeHon 1 CS184b: Computer Architecture [Single Threaded Architecture: abstractions, quantification, and optimizations] Day5:
1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 5: February 1, 2010 Memories.
Caltech CS184 Winter DeHon CS184a: Computer Architecture (Structure and Organization) Day 3: January 10, 2005 Arithmetic and Pipelining.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 4: January 25, 2012 Sequential Logic (FSMs, Pipelining, FSMD)
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 3: January 16, 2013 Scheduled Operator Sharing.
CSIE30300 Computer Architecture Unit 04: Basic MIPS Pipelining Hsin-Chou Chi [Adapted from material by and
Penn ESE534 Spring DeHon 1 ESE534 Computer Organization Day 19: March 28, 2012 Minimizing Energy.
Penn ESE534 Spring DeHon 1 ESE534 Computer Organization Day 9: February 13, 2012 Interconnect Introduction.
Caltech CS184 Winter DeHon CS184a: Computer Architecture (Structure and Organization) Day 4: January 15, 2003 Memories, ALUs, Virtualization.
CS 141 ChienMay 3, 1999 Concepts in Pipelining u Last Time –Midterm Exam, Grading in progress u Today –Concepts in Pipelining u Reminders/Announcements.
Lecture 17: Dynamic Reconfiguration I November 10, 2004 ECE 697F Reconfigurable Computing Lecture 17 Dynamic Reconfiguration I Acknowledgement: Andre DeHon.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 4: January 27, 2010 Sequential Logic (FSMs, Pipelining, FSMD)
Csci 136 Computer Architecture II – Superscalar and Dynamic Pipelining Xiuzhen Cheng
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 24: November 5, 2012 Synchronous Circuits.
Caltech CS184 Winter DeHon CS184a: Computer Architecture (Structure and Organization) Day 3: January 13, 2003 Arithmetic and Pipelining.
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 20: October 25, 2010 Pass Transistors.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 15: March 13, 2013 High Level Synthesis II Dataflow Graph Sharing.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 11: February 20, 2012 Instruction Space Modeling.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 8: February 19, 2014 Memories.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 22: April 16, 2014 Time Multiplexing.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 25: April 17, 2013 Covering and Retiming.
ESE532: System-on-a-Chip Architecture
ESE532: System-on-a-Chip Architecture
ESE532: System-on-a-Chip Architecture
ESE532: System-on-a-Chip Architecture
ESE534: Computer Organization
Lecturer: Alan Christopher
Serial versus Pipelined Execution
ESE534: Computer Organization
An Introduction to pipelining
ESE535: Electronic Design Automation
ESE532: System-on-a-Chip Architecture
8086 processor.
ESE534: Computer Organization
Addressing mode summary
ESE534: Computer Organization
ESE535: Electronic Design Automation
ESE532: System-on-a-Chip Architecture
ESE532: System-on-a-Chip Architecture
Presentation transcript:

Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 3: January 26, 2009 Scheduled Operator Sharing

Penn ESE535 Spring DeHon 2 Last Time How to construct a dataflow graph from a high-level language –From C

Penn ESE535 Spring DeHon 3 Today Sharing Resources Area-Time Tradeoffs Throughput vs. Latency VLIW Architectures

Penn ESE535 Spring DeHon 4 Compute Function Compute: y=Ax 2 +Bx +C Assume –D(Mpy) > D(Add) –A(Mpy) > A(Add)

Penn ESE535 Spring DeHon 5 Spatial Quadratic A(Quad) = 3*A(Mpy) + 2*A(Add)

Penn ESE535 Spring DeHon 6 Latency vs. Throughput Latency: Delay from inputs to output(s) Throughput: Rate at which can introduce new set of inputs

Penn ESE535 Spring DeHon 7 Washer/Dryer Example 1 Washer Takes 30 minutes 1 Dryer Takes 45 minutes How long to do one load of wash? –  Wash latency How long to do 5 loads of wash? Wash Throughput?

Penn ESE535 Spring DeHon 8 Spatial Quadratic D(Quad) = 2*D(Mpy)+D(Add) Throughput 1/(2*D(Mpy)+D(Add)) A(Quad) = 3*A(Mpy) + 2*A(Add)

Penn ESE535 Spring DeHon 9 Pipelined Spatial Quadratic D(Quad) = 3*D(Mpy) Throughput 1/D(Mpy) A(Quad) = 3*A(Mpy) + 2*A(Add)+6A(Reg)

Penn ESE535 Spring DeHon 10 Quadratic with Single Multiplier and Adder? We’ve seen reuse to perform the same operation –pipelining We can also reuse a resource in time to perform a different role. –Here: x*x, A*(x*x), B*x –also: (Bx)+c, (A*x*x)+(Bx+c)

Penn ESE535 Spring DeHon 11 Quadratic Datapath Start with one of each operation

Penn ESE535 Spring DeHon 12 Quadratic Datapath Multiplier serves multiple roles –x*x –A*(x*x) – B*x Will need to be able to steer data (switch interconnections)

Penn ESE535 Spring DeHon 13 Quadratic Datapath Multiplier serves multiple roles –x*x –A*(x*x) – B*x x, x*x x,A,B

Penn ESE535 Spring DeHon 14 Quadratic Datapath Multiplier serves multiple roles –x*x –A*(x*x) – B*x x, x*x x,A,B

Penn ESE535 Spring DeHon 15 Quadratic Datapath Adder serves multiple roles –(Bx)+c –(A*x*x)+(Bx+c) one always mpy output C, Bx+C

Penn ESE535 Spring DeHon 16 Quadratic Datapath

Penn ESE535 Spring DeHon 17 Quadratic Datapath Add input register for x

Penn ESE535 Spring DeHon 18 Cycle Impact? Add mux delay Register setup/hold time, clock skew Limited by slowest operation

Penn ESE535 Spring DeHon 19 Quadratic Control Now, we just need to control the datapath What control? Control: –LD x –LD x*x –MA Select –MB Select –AB Select –LD Bx+C –LD Y

Penn ESE535 Spring DeHon 20 Quadratic Control 1.LD_X 2.MA_SEL=x,MB_SEL[1:0]=x, LD_x*x 3.MA_SEL=x,MB_SEL[1:0]=B 4.AB_SEL=C,MA_SEL=x*x, MB_SEL=A, LD_Bx+C 5.AB_SEL=Bx+C, LD_Y

Penn ESE535 Spring DeHon 21 Quadratic Memory Control 1.LD_X 2.MA_SEL=x, MB_SEL[1:0]=x, LD_x*x 3.MA_SEL=x, MB_SEL[1:0]=B 4.AB_SEL=C, MA_SEL=x*x, MB_SEL=A, LD_Bx+C 5.AB_SEL=Bx+C, LD_Y

Penn ESE535 Spring DeHon 22 Quadratic Datapath Latency/Throughput/Area? Latency: 5*(D(MPY)+D(mux3)) Throughput: 1/Latency Area: A(Mpy)+A(Add)+5*A(Reg) +2*A(Mux2)+A(Mux3)+A(Imem)

Penn ESE535 Spring DeHon 23 Registers  Memory Generally can see many registers If # registers >> physical operators –Only need to access a few at a time Group registers into memory banks

Penn ESE535 Spring DeHon 24 Memory Bank Quadratic Store x x*x B*x A*x 2 ; B*x+c (A*x 2 )+(B*x+c) X +

Penn ESE535 Spring DeHon 25 Memory Bank Quadratic Store x x*x B*x A*x 2 ; B*x+c (A*x 2 )+(B*x+c) X + x x2x2 x B A Bx Ax 2 c Bx+c

Penn ESE535 Spring DeHon 26 Cycle Impact? Add mux delay Register setup/hold time, clock skew Memory read/write –Could pipeline –Impact? Latency Througput? Limited by slowest operation X +

Penn ESE535 Spring DeHon 27 VLIW Very Long Instruction Word Set of operators –Parameterize number, distribution –Gives rise to Area-Time tradeoff More operators  less time, more area Fewer operators  more time, less area Memories for intermediate state Memory for “long” instructions Schedule compute task General, potentially more expensive than customized –Wiring, memories get expensive –Opportunity for further optimizations

Penn ESE535 Spring DeHon 28 VLIW + X X Address Instruction Memory

Penn ESE535 Spring DeHon 29 Summary Reuse physical operators in time Share operators in different roles Allows us to reduce area at expense of increasing time Area-Time tradeoff Pay some sharing overhead –Muxes, memory VLIW – general formulation for shared datapaths

Penn ESE535 Spring DeHon 30 Admin Reading for Wednesday

Penn ESE535 Spring DeHon 31 Big Ideas: Scheduled Operator Sharing