Lecture 17: Dynamic Reconfiguration I November 10, 2004 ECE 697F Reconfigurable Computing Lecture 17 Dynamic Reconfiguration I Acknowledgement: Andre DeHon.

Slides:



Advertisements
Similar presentations
CALTECH CS137 Fall DeHon 1 CS137: Electronic Design Automation Day 19: November 21, 2005 Scheduling Introduction.
Advertisements

Presenter MaxAcademy Lecture Series – V1.0, September 2011 Stream Scheduling.
Pipeline Computer Organization II 1 Hazards Situations that prevent starting the next instruction in the next cycle Structural hazards – A required resource.
EDA (CS286.5b) Day 10 Scheduling (Intro Branch-and-Bound)
Implementation Approaches with FPGAs Compile-time reconfiguration (CTR) CTR is a static implementation strategy where each application consists of one.
Pipelining Hwanmo Sung CS147 Presentation Professor Sin-Min Lee.
1/1/ /e/e eindhoven university of technology Microprocessor Design Course 5Z008 Dr.ir. A.C. (Ad) Verschueren Eindhoven University of Technology Section.
ECE 506 Reconfigurable Computing Lecture 6 Clustering Ali Akoglu.
Reconfigurable Computing S. Reda, Brown University Reconfigurable Computing (EN2911X, Fall07) Lecture 10: RC Principles: Software (3/4) Prof. Sherief Reda.
Lecture 9: Coarse Grained FPGA Architecture October 6, 2004 ECE 697F Reconfigurable Computing Lecture 9 Coarse Grained FPGA Architecture.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day17: November 20, 2000 Time Multiplexing.
Reconfigurable Computing: What, Why, and Implications for Design Automation André DeHon and John Wawrzynek June 23, 1999 BRASS Project University of California.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 21: April 2, 2007 Time Multiplexing.
SCORE - Stream Computations Organized for Reconfigurable Execution Eylon Caspi, Michael Chu, Randy Huang, Joseph Yeh, Yury Markovskiy Andre DeHon, John.
ENGIN112 L38: Programmable Logic December 5, 2003 ENGIN 112 Intro to Electrical and Computer Engineering Lecture 38 Programmable Logic.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day8: October 18, 2000 Computing Elements 1: LUTs.
Penn ESE Fall DeHon 1 ESE (ESE534): Computer Organization Day 19: March 26, 2007 Retime 1: Transformations.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 11: February 14, 2007 Compute 1: LUTs.
Pipelined Processor II CPSC 321 Andreas Klappenecker.
CS294-6 Reconfigurable Computing Day 16 October 15, 1998 Retiming.
CS294-6 Reconfigurable Computing Day 3 September 1, 1998 Requirements for Computing Devices.
CS294-6 Reconfigurable Computing Day 19 October 27, 1998 Multicontext.
ICS 252 Introduction to Computer Design
Penn ESE Spring DeHon 1 FUTURE Timing seemed good However, only student to give feedback marked confusing (2 of 5 on clarity) and too fast.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 5: January 24, 2007 ALUs, Virtualization…
CS 151 Digital Systems Design Lecture 38 Programmable Logic.
February 12, 1998 Aman Sareen DPGA-Coupled Microprocessors Commodity IC’s for the Early 21st Century by Aman Sareen School of Electrical Engineering and.
Lecture 2: Field Programmable Gate Arrays September 13, 2004 ECE 697F Reconfigurable Computing Lecture 2 Field Programmable Gate Arrays.
Coarse and Fine Grain Programmable Overlay Architectures for FPGAs
LOPASS: A Low Power Architectural Synthesis for FPGAs with Interconnect Estimation and Optimization Harikrishnan K.C. University of Massachusetts Amherst.
A RISC ARCHITECTURE EXTENDED BY AN EFFICIENT TIGHTLY COUPLED RECONFIGURABLE UNIT Nikolaos Vassiliadis N. Kavvadias, G. Theodoridis, S. Nikolaidis Section.
ECE 465 Introduction to CPLDs and FPGAs Shantanu Dutt ECE Dept. University of Illinois at Chicago Acknowledgement: Extracted from lecture notes of Dr.
Efficient Mapping onto Coarse-Grained Reconfigurable Architectures using Graph Drawing based Algorithm Jonghee Yoon, Aviral Shrivastava *, Minwook Ahn,
Modern VLSI Design 4e: Chapter 8 Copyright  2008 Wayne Wolf Topics Basics of register-transfer design: –data paths and controllers; –ASM charts. Pipelining.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 7: January 24, 2003 Instruction Space.
Lecture 10: Logic Emulation October 8, 2013 ECE 636 Reconfigurable Computing Lecture 13 Logic Emulation.
Lecture 13: Logic Emulation October 25, 2004 ECE 697F Reconfigurable Computing Lecture 13 Logic Emulation.
CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 7: February 3, 2002 Retiming.
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #4 – FPGA.
1 - CPRE 583 (Reconfigurable Computing): Reconfigurable Computing Architectures Iowa State University (Ames) Reconfigurable Architectures Forces that drive.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day16: November 15, 2000 Retiming Structures.
FPGA-Based System Design: Chapter 6 Copyright  2004 Prentice Hall PTR Topics n Low power design. n Pipelining.
1 Advanced Digital Design Reconfigurable Logic by A. Steininger and M. Delvai Vienna University of Technology.
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #23 – Function.
Caltech CS184 Winter DeHon CS184a: Computer Architecture (Structure and Organization) Day 4: January 15, 2003 Memories, ALUs, Virtualization.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 11: January 31, 2005 Compute 1: LUTs.
CBSSS 2002: DeHon Universal Programming Organization André DeHon Tuesday, June 18, 2002.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 15: March 13, 2013 High Level Synthesis II Dataflow Graph Sharing.
Retiming EECS 290A Sequential Logic Synthesis and Verification.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 22: April 16, 2014 Time Multiplexing.
Lecture 4: Contrasting Processors: Fixed and Configurable September 20, 2004 ECE 697F Reconfigurable Computing Lecture 4 Contrasting Processors: Fixed.
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #22 – Multi-Context.
Pipelining and Retiming 1
ESE534 Computer Organization
CS184a: Computer Architecture (Structure and Organization)
Architecture & Organization 1
Instructor: Dr. Phillip Jones
Lecture 10 Sequential Circuit ATPG Time-Frame Expansion
ESE534: Computer Organization
Architecture & Organization 1
Day 26: November 1, 2013 Synchronous Circuits
Serial versus Pipelined Execution
Lesson 4 Synchronous Design Architectures: Data Path and High-level Synthesis (part two) Sept EE37E Adv. Digital Electronics.
CprE / ComS 583 Reconfigurable Computing
CS184a: Computer Architecture (Structures and Organization)
ICS 252 Introduction to Computer Design
ESE534: Computer Organization
Reconfigurable Computing (EN2911X, Fall07)
Presentation transcript:

Lecture 17: Dynamic Reconfiguration I November 10, 2004 ECE 697F Reconfigurable Computing Lecture 17 Dynamic Reconfiguration I Acknowledgement: Andre DeHon

Lecture 17: Dynamic Reconfiguration I November 10, 2004 Overview Motivation for Dynamic Reconfiguration Limitations of Reconfigurable Approaches Hardware support for reconfiguration Examples application -ASCII to Hex conversion (Dehon)

Lecture 17: Dynamic Reconfiguration I November 10, 2004 DPGA Configuration selects operation of computation unit Context identifier changes over time to allow change in functionality DPGA – Dynamically Programmable Gate Array in Computation Unit (LUT) out Address Inputs (Inst. Store) Context Identifier Programming may differ for each element

Lecture 17: Dynamic Reconfiguration I November 10, 2004 Computations that Benefit from Reconfiguration Low throughput tasks Data dependent operations Effective if not all resources active simultaneously. Possible to time-multiplex both logic and routing resources. A B F0F0 F1F1 F2F2 Non-pipelined example

Lecture 17: Dynamic Reconfiguration I November 10, 2004 Resource Reuse Example circuit: Part of ASCII -> Hex design Computation can be broken up by considering this a data flow graph.

Lecture 17: Dynamic Reconfiguration I November 10, 2004 Resource Reuse Resources must be directed to do different things at different times through instructions. Different local configurations can be thought of as instructions Minimizing the number and size of instructions a key to successfully achieving efficient design. What are the implications for the hardware?

Lecture 17: Dynamic Reconfiguration I November 10, 2004 Previous Study (DeHon) Interconnect Mux Logic Reuse °A ctxt  80K 2 dense encoding °A base  800K 2 * Each context no overly costly compared to base cost of wire, switches, IO circuitry Question: How does this effect scale?

Lecture 17: Dynamic Reconfiguration I November 10, 2004 Exploring the Tradeoffs Assume ideal packing: N active = N total /L Reminder: c*A ctxt = A base Difficult to exactly balance resources/demands Needs for contexts may vary across applications Robust point where critical path length equals # contexts.

Lecture 17: Dynamic Reconfiguration I November 10, 2004 Implementation Choices Both require same amount of execution time Implementation #1 more resource efficient. Implementation #1Implementation #2 N A = 3N A = 4

Lecture 17: Dynamic Reconfiguration I November 10, 2004 Scheduling Limitations N A = size of largest stage in terms of active LUTs Precedence -> a LUT can only be evaluated after predecessors have been evaluated. Need to assign design LUTs to device -LUTs at specific contexts. Consider formulation for scheduling. What are the choices?

Lecture 17: Dynamic Reconfiguration I November 10, 2004 Scheduling ASAP (as soon as possible) -Propagate depth forward from primary inputs -Depth = 1 + max input length ALAP (as late as possible) -Propagate distance from outputs backwards towards inputs -Level = 1 + max output consumption level Slack -Slack = L + 1 – (depth + level) -PI depth = 0, PO level = 0

Lecture 17: Dynamic Reconfiguration I November 10, 2004 Slack Example Note connection from C1 to O1 Critical path will have 0 slack Admittedly small example

Lecture 17: Dynamic Reconfiguration I November 10, 2004 Sequentialization Adding time slots allows for potential increase in hardware efficiency This comes at the cost of increased latency Adding slack allows better balance. -L=4 N A = 2 (4 or 3 contexts)

Lecture 17: Dynamic Reconfiguration I November 10, 2004 Full ASCII -> Hex Circuit Logically three levels of dependence * Single Context: K 2 =18.5M 2

Lecture 17: Dynamic Reconfiguration I November 10, 2004 Time-multiplexed version Three contexts: K 2 =12.5M 2 Pipelining need for dependent paths.

Lecture 17: Dynamic Reconfiguration I November 10, 2004 Context Optimization With enough contexts only one LUT needed. Leads to poor latency. Increased LUT area due to additional stored configuration information Eventually additional interconnect savings taken up by LUT configuration overhead Ideal = perfect scheduling spread + no retime overhead

Lecture 17: Dynamic Reconfiguration I November 10, 2004 General Throughput Mapping Useful if only limited throughput is desired. Target produces new result every t cycles (e.g. a t LUT path) Spatially pipeline every t stages. -Cycle = t Retime to minimize register requirement Multi-context evaluation within a spatial stage. Retime to minimize resource usage Map for depth, i, and contexts, C

Lecture 17: Dynamic Reconfiguration I November 10, 2004 Dharma Architecture (UC Berkeley) Allows for levelized circuit to be executed Design parameters - #DLM -K -> number of DLM inputs -L -> number of levels

Lecture 17: Dynamic Reconfiguration I November 10, 2004 Example Dharma Circuit

Lecture 17: Dynamic Reconfiguration I November 10, 2004 Levelization of Circuit Levelization performed on basis od dependency graph. Functions implemented as 3 input LUTs

Lecture 17: Dynamic Reconfiguration I November 10, 2004 Detailed View of Dharma

Lecture 17: Dynamic Reconfiguration I November 10, 2004 Example: DPGA Prototype

Lecture 17: Dynamic Reconfiguration I November 10, 2004 Example: DPGA Area

Lecture 17: Dynamic Reconfiguration I November 10, 2004 Summary Multiple contexts can be used to combat wire inactivity and logic latency Too many contexts lead to inefficiencies due to retiming registers and extra LUTs Architectures such as DPGA and Dharma address these issues through contexts Run-time system needed to handle dynamic reconfiguration.