Download presentation
Presentation is loading. Please wait.
1
CS294-6 Reconfigurable Computing Day 16 October 15, 1998 Retiming
2
So Far W/ HSRA –…lots of emphasis on retiming, already
3
Today Formalizing the transformations Retiming Algorithm C-slow
4
Motivation FPGAs (spatial computing) –run efficiently when all resources reused rapidly cycle time minimized HSRA –design to embody/require this –correct operation depends on match basic cycle components in architecture “Everything in the right place at the right time.”
5
Task Move registers to: –Minimize path length between registers –(make path length 1 for HSRA) –Maximize reuse rate –…while minimizing number of registers required
6
Simple Example Path Length (L) = 4 Can we do better?
7
Legal Register Moves Retiming Lag/Lead
8
Canonical Graph Representation Separate arch for each path Weight edges by number of registers (weight nodes by delay through node)
9
Critical Path Length Critical Path: Length of longest path of zero weight nodes Compute in O(|E|) time by levelizing network: Topological sort, push path lengths forward until find register.
10
Retiming Lag/Lead Retiming: Assign a lag to every vertex weight(e) = weight(e) + lag(head(e))-lag(tail(e))
11
Valid Retiming Retiming is valid as long as: – e in graph weight(e) = weight(e) + lag(head(e))-lag(tail(e)) 0 Assuming original circuit was a valid synchronous circuit, this guarantees: –non-negative register weights on all edges no travel backward in time :-) –all cycles have strictly positive register counts –propagation delay on each vertex is non- negative (assumed 1 for today)
12
Retiming Task Move registers assign lags to nodes –lags define all locally legal moves Preserving non-negative edge weights –(previous slide) –guarantees collection of lags remains consistent globally
13
Retiming Transformation N.B. -- unchanged by retiming –number of registers around a cycle –delay along a cycle Cycle of length P must have –at least P/c registers on it –to be retimeable to cycle c
14
Optimal Retiming There is a retiming of –graph G –w/ clock cycle c –iff G-1/c has no cycles with negative edge weights G- subtract from each edge weight
15
G-1/c
16
Compute Retiming Lag(v) = shortest path to I/O in G-1/c Compute shortest paths in O(|V||E|) –Bellman-Ford –also use to detect negative weight cycles when c too small
17
Bellman Ford For I 0 to N –u i (except u i =0 for IO) For k 0 to N –for e i,j E u i min(u i, u j +w(e i,j )) for e i,j E if u i >u j +w(e i,j ) –cycles detected
18
Apply to Example
19
Apply: Find Lags
20
Apply: Lags
21
Apply: Move Registers weight(e) = weight(e) + lag(head(e))-lag(tail(e))
22
Apply: Retimed
23
Apply: Retimed Design
24
Revise Example (fanout delay)
25
Revised: Graph
27
Revised: C=1?
28
Revised: C=2?
29
Revised: Lag
30
Take ceiling to convert to integer lags:
31
Revised: Apply Lag
32
Revised: Retimed
33
C>1 ==> Pipeline
34
Add Registers
35
Pipeline Retiming: Lag
36
Pipelined Retimed
37
Real Cycle
39
Cycle C=1?
40
Cycle C=2?
41
Cycle: C-slow Cycle=c C-slow network has Cycle=1
42
2-slow Cycle C=1
43
2-Slow Lags
44
2-Slow Retime
45
Retimed 2-Slow Cycle
46
C-Slow applicable? Available parallelism –solve C identical, independent problems e.g. process packets (blocks) separately e.g. independent regions in images Commutative operators –e.g. max example
47
HSRA Retiming One additional twist –long, pipelined interconnect need more than one register on paths
48
Accommodating HSRA Interconnect Delays Add buffers to LUT LUT path to match interconnect register requirements Retime to C=1 as before Buffer chains force enough registers to cover interconnect delays
49
Accommodating HSRA Interconnect Delays
50
Summary Retiming important to –minimize cycles –efficiently utilize spatial architectures Optimally solvable in O(|V||E|) time Tells us –pipelining required –C-slow –where to move registers
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.