Soha Hassoun Tufts University Medford, MA Thanks to: Carl Ebeling University of Washington Seattle, WA Fine Grain Incremental Rescheduling Via Architectural.


1 Fine Grain Incremental Rescheduling Via Architectural Retiming. Soha Hassoun, Tufts University, Medford, MA. Thanks to: Carl Ebeling, University of Washington, Seattle, WA.

2 Example. Problem: clock period is too large. [Diagram labels: RAM, Offset, Write Address, Read Address]

3 Pipelining. Problems with consecutive dependent operations. [Diagram labels: RAM, Write Address, Read Address, Offset]

4 Performance Bottleneck. Latency-constrained paths (Latency = n).

5 Performance Bottleneck. Latency-constrained paths (Latency = n). Approach: apply architectural retiming at the RT level.

6 Architectural Retiming. Problem: too much work, too little time. [Diagram label: y_k]

7 Architectural Retiming. Problem: too much work, too little time. [Diagram labels: D, pipeline register, y_k]

8 Architectural Retiming. Problem: too much work, too little time. [Diagram labels: N (negative register), pipeline register, D, C, y_k]

9 Architectural Retiming. Problem: too much work, too little time. Precomputation; prediction. [Diagram labels: N (negative register), pipeline register, D, C, y_k]

10 Outline. Precomputation: incremental rescheduling without resource constraints. Prediction: incremental rescheduling with resource constraints. Results.

11 Precomputation Function. $D^t = C^{t+1}$. [Diagram labels: h, D, C, x_i, f, g, y_k, x'_i, N]

12 Precomputation Function. $D^t = C^{t+1} = f(\ldots, x_i^{t+1}, \ldots)$. [Diagram labels: h, D, C, x_i, f, g, y_k, x'_i, N]

13 Precomputation Function. $D^t = C^{t+1} = f(\ldots, x_i^{t+1}, \ldots)$; $x_i^{t+1} = {x'_i}^{t} = g(\ldots, y_k^t, \ldots)$. [Diagram labels: h, D, C, x_i, f, g, y_k, x'_i, N]

14 Precomputation Function. $D^t = C^{t+1} = f(\ldots, x_i^{t+1}, \ldots)$; $x_i^{t+1} = {x'_i}^{t} = g(\ldots, y_k^t, \ldots)$; $D^t = f(\ldots, g(\ldots, y_k^t, \ldots), \ldots) = f'(\ldots, y_k^t, \ldots)$. [Diagram labels: f', h, D, C, x_i, f, g, y_k, x'_i, N]
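
The precomputation derivation on the slides above can be checked with a small simulation. This is only a sketch: the bodies of `f` and `g`, the single input `y`, and the 4-bit arithmetic are hypothetical stand-ins, not the circuits from the talk. It shows that composing `f` with `g` yields a function whose output at time t equals C at time t+1.

```python
def g(y):
    # Hypothetical logic driving the register input x'_i from y_k.
    return (3 * y + 1) % 16


def f(x):
    # Hypothetical logic on the path from the register output x_i to C.
    return (x * x + 5) % 16


def f_prime(y):
    # Precomputation function: f composed with g, so D^t = f'(y_k^t).
    return f(g(y))


def simulate_original(ys, x0):
    """Return C^t for each cycle of the original circuit."""
    x = x0
    cs = []
    for y in ys:
        cs.append(f(x))  # C^t = f(x_i^t)
        x = g(y)         # x_i^{t+1} = x'_i^t = g(y_k^t)
    return cs


def simulate_precomputed(ys):
    """Return D^t for each cycle, computed directly from y_k^t."""
    return [f_prime(y) for y in ys]


ys = [2, 7, 1, 9, 4]
C = simulate_original(ys, x0=0)
D = simulate_precomputed(ys)
# D^t matches C^{t+1} for every cycle where C^{t+1} is defined.
matches = all(D[t] == C[t + 1] for t in range(len(ys) - 1))
```

The check holds for any choice of `f` and `g`, because `C[t + 1]` is by construction `f(g(ys[t]))`.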

15 Incremental Rescheduling. Schedule: time n: g; time n+1: f, h. [Diagram labels: h, f, g, y_k, N]

16 Incremental Rescheduling. Schedule before: time n: g; time n+1: f, h. Schedule after: time n: f'; time n+1: h. [Diagram labels: f', h, f, g, y_k, N]

17 Precomputing With Register Arrays. [Diagram labels: Write Address, Read Address, Write Data, Read Data]

18 Precomputing With Register Arrays. [Diagram labels: Write Address, Read Address, Write Data, Read Data, Out, N, F]

19 Precomputing With Register Arrays. $F^t = \mathrm{Out}^{t+1}$. [Diagram labels: Write Address, Read Address, Write Data, Read Data, Out, N, F]

20 Precomputing With Register Arrays. $F^t = \mathrm{Out}^{t+1} = \mathrm{Array}^{t+1}[\mathrm{ReadAddress}^{t+1}]$. [Diagram labels: Write Address, Read Address, Write Data, Read Data, Out, N, F]

21 Synthesizing Bypass Paths. [Diagram labels: Write Address, Precomputed Read Address, Write Data, Read Data, =? (equality comparison), Read Address]
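
The register-array relation and the bypass comparison above can be sketched in software. The model below is an assumption made for illustration: one write and one read per cycle, with a write at cycle t becoming visible at cycle t+1. The function names are hypothetical. The bypass forwards the write data whenever the write address equals the precomputed (next-cycle) read address.

```python
def reference_outputs(array, writes, read_addrs):
    """Out^t = Array^t[ReadAddress^t]; the write lands after the read."""
    arr = list(array)
    outs = []
    for (write_addr, write_data), read_addr in zip(writes, read_addrs):
        outs.append(arr[read_addr])
        arr[write_addr] = write_data  # visible from the next cycle on
    return outs


def precomputed_outputs(array, writes, read_addrs):
    """F^t = Array^{t+1}[ReadAddress^{t+1}], produced during cycle t."""
    arr = list(array)
    fs = []
    for t in range(len(writes) - 1):
        write_addr, write_data = writes[t]
        next_read_addr = read_addrs[t + 1]  # precomputed read address
        if write_addr == next_read_addr:
            fs.append(write_data)           # bypass path
        else:
            fs.append(arr[next_read_addr])  # ordinary array read
        arr[write_addr] = write_data
    return fs


initial = [10, 11, 12, 13]
writes = [(1, 50), (2, 60), (1, 70), (3, 80)]  # (address, data) per cycle
reads = [0, 1, 1, 1]                            # read address per cycle
outs = reference_outputs(initial, writes, reads)
fs = precomputed_outputs(initial, writes, reads)
```

With these inputs the bypass path is taken at cycles 0 and 2, where the location being written is the one read in the following cycle.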

22 Precomputing RAM Output. [Diagram labels: RAM, N]

23 Prediction. What if we can't precompute, precomputing takes too many additional resources, or performance is unsatisfactory? [Diagram labels: D, C, f, g_i, Z, N]

24 Prediction. What if we can't precompute, precomputing takes too many additional resources, or performance is unsatisfactory? Predict C one cycle before its arrival. [Diagram labels: D, C, f, g_i, Z, N]

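25 Schedule with Mispredictions. [Diagram labels: C, H, R1, R2. Timeline columns: t-1, t, t+1; row C: c1, c2; row H: h1, h2]

26 Schedule with Mispredictions. [Diagram labels: C, H, R1, R2, Verify, Negative Register. Timeline columns: t-1, t, t+1; row C: c1, c2; row H: h1, h2]

27 Schedule with Mispredictions. [Diagram labels: C, H, R1, R2, Verify, Negative Register. Timeline columns: t-1, t, t+1; row C: c1]

28 Schedule with Mispredictions. [Diagram labels: C, H, R1, R2, Verify, Negative Register. Timeline columns: t-1, t, t+1; actual values c1, c2; predicted values c1*, c2*; outputs h1, h2; verification checks c1* =? c1 and c2* =? c2]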
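
The predict-then-verify schedule above can be sketched as a simple loop. The last-value predictor and the function `h` are assumptions chosen for illustration; the talk describes the negative register as a predicting FSM without fixing a particular predictor. The sketch computes the downstream result from the predicted value, compares the prediction with the actual value once it arrives, and discards the speculative result on a mismatch.

```python
def h(c):
    # Hypothetical downstream logic that consumes C.
    return 2 * c + 1


def last_value_predictor(history):
    # Assumed predictor: guess that C repeats its previous value.
    return history[-1] if history else 0


def run_with_prediction(actual_cs, predict):
    """Return the committed results and the number of mispredictions."""
    results = []
    mispredictions = 0
    history = []
    for c in actual_cs:
        guess = predict(history)    # c*: available one cycle early
        speculative = h(guess)      # work started before C arrives
        if guess == c:              # verify: c* =? c
            results.append(speculative)
        else:
            mispredictions += 1
            results.append(h(c))    # nullify, then recompute with actual C
        history.append(c)
    return results, mispredictions


cs = [0, 0, 1, 1, 1, 0]
results, misses = run_with_prediction(cs, last_value_predictor)
```

The committed results always equal the non-speculative ones; only the number of recomputations depends on the quality of the predictor.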

29 Synthesis Issues in Prediction. Negative register as predicting FSM: use signal transition probabilities; incorporate don't-care conditions. Nullifying mispredictions. Two correction strategies: As-Soon-As-Possible restoration; As-Late-As-Possible correction. Add handshaking signals to coordinate with interface.

30 Related Work. Precomputation: bypass synthesis; lookahead [Kogge '81, ...]. Prediction / Speculative Execution: most likely path, arbitrarily deep [Holtmann & Ernst '93, '95]; pre-execution [Radivojevic & Brewer '94]; possible multiple paths and arbitrarily deep [Lakshminarayana et al. '98]; percolation scheduling [Potasman et al. '90].

31 Results

32 Architectural Retiming. Improves throughput while preserving functionality and sometimes latency. Bridge gap between HLS and logic optimizations. Unifies several sequential optimizations: bypass synthesis; lookahead transformation; branch prediction; fine-grain cross-register optimizations.

33 Ph.D. Forum at DAC '99. Goal: increase interaction between academia and industry. Format: students present work at poster session at DAC; researchers give feedback. Who's eligible? Students within 1 or 2 years of finishing Ph.D. thesis. www.cs.washington.edu/homes/soha/forum

34 The End

35 Precomputing in Single-Register Cycles. Original circuit. [Diagram labels: B, A]

36 Precomputing in Single-Register Cycles. Original circuit. [Diagram labels: N, B, A]

37 Precomputing in Single-Register Cycles. Lookahead: A(n) is a function of B(n-2) [Kogge '81], [Parhi & Messerschmitt '89]. [Diagram labels: N, B, A, A', B']
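
The lookahead idea on the slide above, where a value is expressed in terms of a value two steps back, can be illustrated with a first-order linear recurrence. The recurrence y(n) = a*y(n-1) + u(n) is an assumed textbook example, not the circuit shown in the talk. Substituting the recurrence into itself once gives y(n) = a^2*y(n-2) + a*u(n-1) + u(n), so the feedback loop spans two steps instead of one.

```python
def direct(a, us, y0=0):
    """y(n) = a*y(n-1) + u(n), evaluated step by step."""
    y = y0
    out = []
    for u in us:
        y = a * y + u
        out.append(y)
    return out


def lookahead(a, us, y0=0):
    """Same sequence, but each y(n) depends on y(n-2), not y(n-1)."""
    out = []
    for n, u in enumerate(us):
        if n == 0:
            out.append(a * y0 + u)
        elif n == 1:
            # y(-1) is the initial value y0.
            out.append(a * a * y0 + a * us[0] + u)
        else:
            # y(n) = a^2*y(n-2) + a*u(n-1) + u(n)
            out.append(a * a * out[n - 2] + a * us[n - 1] + u)
    return out


inputs = [1, 2, 3, 4, 5]
seq_direct = direct(3, inputs)
seq_lookahead = lookahead(3, inputs)
```

Integer inputs are used so the two sequences can be compared exactly.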

38 Precomputing RAM Output. [Diagram label: RAM]

39 Precomputing RAM Output. [Diagram label: RAM]

40 Speculative Execution: Scope and Depth. [Diagram labels: c1, c2, c3, c4, c5, c6]

41 Speculative Execution: Scope and Depth.

