Presentation is loading. Please wait.

Presentation is loading. Please wait.

Penn ESE680-002 Fall2007 -- DeHon 1 ESE680-002 (ESE534): Computer Organization Day 19: March 26, 2007 Retime 1: Transformations.

Similar presentations


Presentation on theme: "Penn ESE680-002 Fall2007 -- DeHon 1 ESE680-002 (ESE534): Computer Organization Day 19: March 26, 2007 Retime 1: Transformations."— Presentation transcript:

1 Penn ESE680-002 Fall2007 -- DeHon 1 ESE680-002 (ESE534): Computer Organization Day 19: March 26, 2007 Retime 1: Transformations

2 Penn ESE680-002 Fall2007 -- DeHon 2 Previously Reviewed Pipelining –basic assignments on Saw spatial designs efficient –when reuse logic at maximum frequency Interconnect is dominant delay –and dominant area –heavy call to reuse to use efficiently

3 Penn ESE680-002 Fall2007 -- DeHon 3 Today Systematic transformation for retiming –preserve semantics (meaning)

4 Penn ESE680-002 Fall2007 -- DeHon 4 Motivation

5 Penn ESE680-002 Fall2007 -- DeHon 5 Motivation FPGAs (spatial computing) –run efficiently when all resources reused rapidly cycle time minimized “Everything in the right place at the right time.”

6 Penn ESE680-002 Fall2007 -- DeHon 6 Motivating Questions Can I build a fixed-frequency (fixed clock) programmable architecture? Can I always make a design run at maximum clock rate? How do we systematically transform any computation to –Operate on fixed-frequency array? –Coordinate around mandatory registers in design?

7 Penn ESE680-002 Fall2007 -- DeHon 7 Interconnect Retiming Long Paths Slow Could limit cycle Add registers to long distance interconnect –At each switch? –In the middle of long wires? How justify these registers?

8 Penn ESE680-002 Fall2007 -- DeHon 8 Spatial Quadratic How do we pipeline a design? Day 3

9 Penn ESE680-002 Fall2007 -- DeHon 9 Pipelined Spatial Quadratic Day 3

10 Penn ESE680-002 Fall2007 -- DeHon 10 How do you use?

11 Penn ESE680-002 Fall2007 -- DeHon 11

12 Penn ESE680-002 Fall2007 -- DeHon 12 How do you use? To compute A*B+C*D+E

13 Penn ESE680-002 Fall2007 -- DeHon 13 Compute A*B+C*D+E

14 Penn ESE680-002 Fall2007 -- DeHon 14 How Compute? Y i =Y i-1 xor X i With pipelined nand2 gates?

15 Penn ESE680-002 Fall2007 -- DeHon 15 want have

16 Penn ESE680-002 Fall2007 -- DeHon 16

17 Penn ESE680-002 Fall2007 -- DeHon 17 Retiming Algorithm

18 Penn ESE680-002 Fall2007 -- DeHon 18 Task Move registers to: –Preserve semantics –Minimize path length between registers –i.e. Make path length 1 for maximum throughput or reuse –…while minimizing number of registers required

19 Penn ESE680-002 Fall2007 -- DeHon 19 Simple Example Path Length (L) = 4 Can we do better?

20 Penn ESE680-002 Fall2007 -- DeHon 20 Legal Register Moves Retiming Lag/Lead

21 Penn ESE680-002 Fall2007 -- DeHon 21 Canonical Graph Representation Separate arc for each path Weight edges by number of registers (weight nodes by delay through node)

22 Penn ESE680-002 Fall2007 -- DeHon 22 Critical Path Length Critical Path: Length of longest path of zero weight nodes Compute in O(|E|) time by levelizing network: Topological sort, push path lengths forward until find register.

23 Penn ESE680-002 Fall2007 -- DeHon 23 Retiming Lag/Lead Retiming: Assign a lag to every vertex weight(e) = weight(e) + lag(head(e))-lag(tail(e))

24 Penn ESE680-002 Fall2007 -- DeHon 24 Valid Retiming Retiming is valid as long as: –  e in graph weight(e) = weight(e) + lag(head(e))-lag(tail(e))  0 Assuming original circuit was a valid synchronous circuit, this guarantees: –non-negative register weights on all edges no travel backward in time :-) –all cycles have strictly positive register counts –propagation delay on each vertex is non-negative (assumed 1 for today)

25 Penn ESE680-002 Fall2007 -- DeHon 25 Retiming Task Move registers  assign lags to nodes –lags define all locally legal moves Preserving non-negative edge weights –(previous slide) –guarantees collection of lags remains consistent globally

26 Penn ESE680-002 Fall2007 -- DeHon 26 Retiming Transformation N.B.: unchanged by retiming –number of registers around a cycle –delay along a cycle Cycle of length P must have –at least P/c registers on it to be retimeable to cycle c –Can be computed from invariant above

27 Penn ESE680-002 Fall2007 -- DeHon 27 Optimal Retiming There is a retiming of –graph G –w/ clock cycle c –iff G-1/c has no cycles with negative edge weights G-   subtract  from each edge weight

28 Penn ESE680-002 Fall2007 -- DeHon 28 1/c Intuition Want to place a register every c delay units Each register adds one Each delay subtracts 1/c As long as remains more positives than negatives around all cycles –can move registers to accommodate –Captures the regs=P/c constraints

29 Penn ESE680-002 Fall2007 -- DeHon 29 G-1/c

30 Penn ESE680-002 Fall2007 -- DeHon 30 Compute Retiming Lag(v) = shortest path to I/O in G-1/c Compute shortest paths in O(|V||E|) –Bellman-Ford –also use to detect negative weight cycles when c too small

31 Penn ESE680-002 Fall2007 -- DeHon 31 Bellman Ford For I  0 to N –u i  (except u i =0 for IO) For k  0 to N –for e i,j  E u i  min(u i, u j +w(e i,j )) For e i,j  E //still update  negative cycle if u i >u j +w(e i,j ) –cycles detected

32 Penn ESE680-002 Fall2007 -- DeHon 32 Apply to Example

33 Penn ESE680-002 Fall2007 -- DeHon 33 Try c=1

34 Penn ESE680-002 Fall2007 -- DeHon 34 Apply: Find Lags Negative weight cycles? Shortest paths?

35 Penn ESE680-002 Fall2007 -- DeHon 35 Apply: Lags

36 Penn ESE680-002 Fall2007 -- DeHon 36 Apply: Move Registers weight(e) = weight(e) + lag(head(e))-lag(tail(e)) 1 1 1 11 11 1 2 1 2 3 4 Animation Seq. #’s

37 Penn ESE680-002 Fall2007 -- DeHon 37 Apply: Retimed

38 Penn ESE680-002 Fall2007 -- DeHon 38 Apply: Retimed Design

39 Penn ESE680-002 Fall2007 -- DeHon 39 Revise Example (fanout delay)

40 Penn ESE680-002 Fall2007 -- DeHon 40 Revised: Graph

41 Penn ESE680-002 Fall2007 -- DeHon 41 Revised: Graph

42 Penn ESE680-002 Fall2007 -- DeHon 42 Revised: C=1?

43 Penn ESE680-002 Fall2007 -- DeHon 43 Revised: C=2?

44 Penn ESE680-002 Fall2007 -- DeHon 44 Revised: Lag

45 Penn ESE680-002 Fall2007 -- DeHon 45 Revised: Lag Take ceiling to convert to integer lags: 0 0

46 Penn ESE680-002 Fall2007 -- DeHon 46 Revised: Apply Lag 0 0

47 Penn ESE680-002 Fall2007 -- DeHon 47 Revised: Apply Lag 1 101 1 0 1 10 0 1 1 0 0 0 1 243 7 5 11 108 9 6 12 13

48 Penn ESE680-002 Fall2007 -- DeHon 48 Revised: Retimed 1 101 10 1 1 0 0 1 10

49 Penn ESE680-002 Fall2007 -- DeHon 49 Pipelining We can use this retiming to pipeline Assume we have enough (infinite supply) registers at edge of circuit Retime them into circuit

50 Penn ESE680-002 Fall2007 -- DeHon 50 C>1 ==> Pipeline

51 Penn ESE680-002 Fall2007 -- DeHon 51 Add Registers G n 1 11 1 1 0000 000

52 Penn ESE680-002 Fall2007 -- DeHon 52 Add Registers n 1 11 1 1 0000 000 G G-1/1

53 Penn ESE680-002 Fall2007 -- DeHon 53 Pipeline Retiming: Lag

54 Penn ESE680-002 Fall2007 -- DeHon 54 Pipelined Retimed

55 Penn ESE680-002 Fall2007 -- DeHon 55 Real Cycle

56 Penn ESE680-002 Fall2007 -- DeHon 56 Real Cycle

57 Penn ESE680-002 Fall2007 -- DeHon 57 Cycle C=1?

58 Penn ESE680-002 Fall2007 -- DeHon 58 Cycle C=2?

59 Penn ESE680-002 Fall2007 -- DeHon 59 Cycle: C-slow Cycle=c  C-slow network has Cycle=1

60 Penn ESE680-002 Fall2007 -- DeHon 60 2-slow Cycle  C=1

61 Penn ESE680-002 Fall2007 -- DeHon 61 2-Slow Lags

62 Penn ESE680-002 Fall2007 -- DeHon 62 2-Slow Retime

63 Penn ESE680-002 Fall2007 -- DeHon 63 Retimed 2-Slow Cycle

64 Penn ESE680-002 Fall2007 -- DeHon 64 C-Slow applicable? Available parallelism –solve C identical, independent problems Data-level parallelism e.g. process packets (blocks) separately e.g. independent regions in images Commutative operators –e.g. max example

65 Penn ESE680-002 Fall2007 -- DeHon 65 Max Example

66 Penn ESE680-002 Fall2007 -- DeHon 66 Max Example

67 Penn ESE680-002 Fall2007 -- DeHon 67 HSRA Retiming HSRA –adds mandatory pipelining to interconnect One additional twist –long, pipelined interconnect  need more than one register on paths

68 Penn ESE680-002 Fall2007 -- DeHon 68 Accommodating HSRA Interconnect Delays Add buffers to LUT  LUT path to match interconnect register requirements Retime to C=1 as before Buffer chains force enough registers to cover interconnect delays

69 Penn ESE680-002 Fall2007 -- DeHon 69 Accommodating HSRA Interconnect Delays

70 Penn ESE680-002 Fall2007 -- DeHon 70 Admin Retiming Assignment Due Wed. Reading for today includes retiming algorithm –(handed out last week) Retiming Structures on Wed. –(swap from original syllabus)

71 Penn ESE680-002 Fall2007 -- DeHon 71 Big Ideas [MSB Ideas] Retiming transformations important to –minimize cycles –efficiently utilize spatial architectures Optimally solvable in O(|V||E|) time Tells us –pipelining required –C-slow –where to move registers Can accommodate mandatory delays


Download ppt "Penn ESE680-002 Fall2007 -- DeHon 1 ESE680-002 (ESE534): Computer Organization Day 19: March 26, 2007 Retime 1: Transformations."

Similar presentations


Ads by Google