1
ECE 697F Reconfigurable Computing
Lecture 17: Dynamic Reconfiguration I
November 10, 2004
Acknowledgement: Andre DeHon
2
Overview
Motivation for dynamic reconfiguration
Limitations of reconfigurable approaches
Hardware support for reconfiguration
Example application
- ASCII to Hex conversion (DeHon)
3
DPGA – Dynamically Programmable Gate Array
Configuration selects the operation of the computation unit.
The context identifier changes over time to allow a change in functionality.
Programming may differ for each element.
[Figure: computation unit (LUT) whose instruction store is addressed by the context identifier]
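To make the context mechanism concrete, here is a minimal Python sketch (my own illustration, not from the slides) of a DPGA-style element: each LUT keeps several stored configurations, and a context identifier selects which one is active on a given cycle. The class and field names are hypothetical.

    class MultiContextLUT:
        """4-input LUT with a small instruction store: one truth table per context."""
        def __init__(self, truth_tables):
            # truth_tables[c] is a 16-bit integer holding the LUT contents for context c
            self.instruction_store = list(truth_tables)

        def evaluate(self, context_id, inputs):
            # inputs: tuple of four 0/1 values; context_id selects the stored configuration
            table = self.instruction_store[context_id]
            index = inputs[0] | (inputs[1] << 1) | (inputs[2] << 2) | (inputs[3] << 3)
            return (table >> index) & 1

    # Example: context 0 implements AND of all four inputs, context 1 implements XOR of inputs 0 and 1
    lut = MultiContextLUT([0x8000, 0x6666])
    print(lut.evaluate(0, (1, 1, 1, 1)))  # 1
    print(lut.evaluate(1, (1, 0, 0, 0)))  # 1

The same physical LUT thus performs a different function on each cycle, which is exactly the reuse the following slides quantify.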
4
Computations that Benefit from Reconfiguration
Low-throughput tasks
Data-dependent operations
Effective if not all resources are active simultaneously.
Possible to time-multiplex both logic and routing resources.
[Figure: non-pipelined example with inputs A, B and functions F0, F1, F2]
5
Resource Reuse
Example circuit: part of the ASCII -> Hex design.
The computation can be broken up by treating the circuit as a dataflow graph.
6
Resource Reuse
Resources must be directed to do different things at different times through instructions.
Different local configurations can be thought of as instructions.
Minimizing the number and size of instructions is key to achieving an efficient design.
What are the implications for the hardware?
7
Previous Study (DeHon)
A_ctxt ≈ 80 Kλ² with dense instruction encoding; A_base ≈ 800 Kλ².
Each context is not overly costly compared to the base cost of wire, switches, and I/O circuitry.
Question: how does this effect scale?
[Figure: interconnect mux and logic reused across contexts]
8
Exploring the Tradeoffs
Assume ideal packing: N_active = N_total / L
Reminder: c · A_ctxt = A_base (with the areas above, c ≈ 10)
Difficult to balance resources and demands exactly.
The need for contexts may vary across applications.
A robust point is where the critical path length equals the number of contexts.
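As a rough illustration of this tradeoff (my own sketch, not from the slides), the following uses the A_ctxt ≈ 80 Kλ² and A_base ≈ 800 Kλ² figures quoted above together with the ideal-packing assumption to show how total area changes with the number of contexts for a hypothetical 64-LUT design:

    A_CTXT = 80_000    # approx. area of one stored context, in lambda^2 (from the previous slide)
    A_BASE = 800_000   # approx. base area of wire, switches, and I/O per element, in lambda^2

    def array_area(n_total, contexts):
        # Ideal packing: only n_total / contexts LUTs must be physically present,
        # but each physical LUT stores `contexts` configurations.
        n_active = -(-n_total // contexts)   # ceiling division
        return n_active * (A_BASE + contexts * A_CTXT)

    for c in (1, 2, 4, 8, 16, 32, 64, 128):
        print(c, round(array_area(64, c) / 1e6, 2), "Mlambda^2")

Under these idealized assumptions the area falls quickly for the first few contexts and eventually turns back up once context storage dominates, which is the overhead the context-optimization slide below points out.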
9
Implementation Choices
Both implementations require the same amount of execution time.
Implementation #1 is more resource efficient.
[Figure: Implementation #1 (N_A = 3) vs. Implementation #2 (N_A = 4)]
10
Scheduling Limitations
N_A = size of the largest stage, in terms of active LUTs.
Precedence: a LUT can only be evaluated after its predecessors have been evaluated.
Design LUTs need to be assigned to device LUTs in specific contexts.
Consider a formulation for scheduling. What are the choices?
11
Scheduling
ASAP (as soon as possible)
- Propagate depth forward from the primary inputs
- Depth = 1 + max(input depth)
ALAP (as late as possible)
- Propagate distance from the outputs backwards toward the inputs
- Level = 1 + max(consuming output level)
Slack
- Slack = L + 1 - (depth + level)
- PI depth = 0, PO level = 0
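As a concrete illustration of these definitions (my own sketch, using a hypothetical four-LUT netlist rather than the example in the slides), the following computes ASAP depth, ALAP level, and slack:

    # Hypothetical netlist: fanin[n] lists the LUTs that drive LUT n (primary inputs omitted).
    fanin = {"a": [], "c": ["a"], "b": [], "o1": ["c", "b"]}
    fanout = {n: [m for m, ins in fanin.items() if n in ins] for n in fanin}

    def depth(n):   # ASAP: 1 + max depth over inputs; nodes fed only by PIs get depth 1
        return 1 + max((depth(p) for p in fanin[n]), default=0)

    def level(n):   # ALAP: 1 + max level over consumers; nodes feeding only POs get level 1
        return 1 + max((level(s) for s in fanout[n]), default=0)

    L = max(depth(n) for n in fanin)    # critical path length in LUTs
    for n in fanin:
        print(n, "depth:", depth(n), "level:", level(n), "slack:", L + 1 - (depth(n) + level(n)))

In this toy netlist, LUT b sits off the critical path and gets slack 1, so it can be scheduled in either of two contexts; the other LUTs are critical and have zero slack.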
12
Slack Example
Note the connection from C1 to O1.
The critical path will have 0 slack.
Admittedly a small example.
13
Sequentialization
Adding time slots allows a potential increase in hardware efficiency.
This comes at the cost of increased latency.
Exploiting slack allows better balance.
- L = 4, N_A = 2 (4 or 3 contexts)
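One way to exploit that slack, sketched below under my own assumptions (a simple greedy list scheduler, not necessarily the formulation used in the original work), is to place each LUT into the least-loaded context inside its [ASAP, ALAP] window while respecting precedence:

    def schedule(fanin, depths, levels, n_contexts):
        """Greedy context assignment: visit LUTs in ASAP order and place each
        into the least-loaded context within its [ASAP, ALAP] window."""
        load = {c: 0 for c in range(1, n_contexts + 1)}
        assign = {}
        for n in sorted(fanin, key=lambda x: depths[x]):
            earliest = max([assign[p] + 1 for p in fanin[n]] + [depths[n]])
            latest = n_contexts + 1 - levels[n]
            best = min(range(earliest, latest + 1), key=lambda c: load[c])
            assign[n] = best
            load[best] += 1
        return assign, max(load.values())   # context per LUT, and the resulting N_A

    # Same hypothetical netlist as the scheduling sketch above
    fanin = {"a": [], "c": ["a"], "b": [], "o1": ["c", "b"]}
    depths = {"a": 1, "c": 2, "b": 1, "o1": 3}
    levels = {"a": 3, "c": 2, "b": 2, "o1": 1}
    print(schedule(fanin, depths, levels, n_contexts=3))

On this toy netlist the result is N_A = 2 with three contexts; this is the same balancing effect the slide describes for the ASCII-to-hex fragment (L = 4, N_A = 2).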
14
Full ASCII -> Hex Circuit
Logically three levels of dependence.
Single context: 21 LUTs @ 880 Kλ² = 18.5 Mλ²
15
Time-Multiplexed Version
Three contexts: 12 LUTs @ 1040 Kλ² = 12.5 Mλ²
Pipelining is needed for dependent paths.
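The arithmetic behind these two area figures works out as follows (the per-LUT areas come from the slides; the percentage saving is my own calculation):

    single_context = 21 * 880e3     # 21 LUTs at ~880 Klambda^2 each  -> ~18.5 Mlambda^2
    three_contexts = 12 * 1040e3    # 12 LUTs at ~1040 Klambda^2 each -> ~12.5 Mlambda^2
    print(single_context / 1e6, three_contexts / 1e6)      # 18.48  12.48
    print(round(1 - three_contexts / single_context, 2))   # ~0.32: roughly a third of the area saved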
16
Context Optimization
With enough contexts only one LUT is needed, but this leads to poor latency.
LUT area increases due to the additional stored configuration information.
Eventually the additional interconnect savings are taken up by LUT configuration overhead.
Ideal = perfect scheduling spread + no retiming overhead
17
General Throughput Mapping
Useful if only limited throughput is desired.
Target produces a new result every t cycles (e.g. a t-LUT path).
Spatially pipeline every t stages.
- Cycle = t
Retime to minimize the register requirement.
Multi-context evaluation within a spatial stage.
Retime to minimize resource usage.
Map for depth (i) and contexts (C).
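A minimal sketch of the stage/context split this mapping implies (my own reading of the slide, with hypothetical numbers): for a network of depth D and a target of one result every t cycles, pipeline into ceil(D/t) spatial stages and evaluate the t LUT levels inside each stage across t contexts.

    import math

    def throughput_mapping(depth, t):
        """Split a depth-`depth` LUT network for one new result every `t` cycles."""
        stages = math.ceil(depth / t)     # spatial pipeline stages
        return {"spatial_stages": stages, "contexts_per_stage": t, "cycle": t}

    print(throughput_mapping(depth=12, t=4))
    # -> {'spatial_stages': 3, 'contexts_per_stage': 4, 'cycle': 4}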
18
Dharma Architecture (UC Berkeley)
Allows a levelized circuit to be executed.
Design parameters
- number of DLMs
- K -> number of DLM inputs
- L -> number of levels
19
Example Dharma Circuit
20
Levelization of Circuit
Levelization is performed on the basis of the dependency graph.
Functions are implemented as 3-input LUTs.
21
Detailed View of Dharma
22
Example: DPGA Prototype
23
Example: DPGA Area
24
Summary
Multiple contexts can be used to combat wire inactivity and logic latency.
Too many contexts lead to inefficiencies due to retiming registers and extra LUTs.
Architectures such as DPGA and Dharma address these issues through contexts.
A run-time system is needed to handle dynamic reconfiguration.