1
Fine Grain Incremental Rescheduling Via Architectural Retiming
Soha Hassoun, Tufts University, Medford, MA
Thanks to: Carl Ebeling, University of Washington, Seattle, WA
2
Example
Problem: the clock period is too large
[Diagram labels: RAM, Offset, Write Address, Read Address]
3
Pipelining
Problems with consecutive dependent operations
[Diagram labels: RAM, Write Address, Read Address, Offset]
4
Performance Bottleneck
Latency-constrained paths (Latency = n)
5
Performance Bottleneck
Latency-constrained paths (Latency = n)
Approach: apply architectural retiming at the RT level
6
Architectural Retiming
Problem: too much work, too little time
[Diagram label: $y_k$]
7
Architectural Retiming
Problem: too much work, too little time
[Diagram labels: pipeline register D, $y_k$]
8
Architectural Retiming
Problem: too much work, too little time
[Diagram labels: negative register N, pipeline register D, C, $y_k$]
9
Architectural Retiming
Problem: too much work, too little time
[Diagram labels: negative register N, pipeline register D, C, $y_k$]
Precomputation, prediction
10
Outline
Precomputation: incremental rescheduling without resource constraints
Prediction: incremental rescheduling with resource constraints
Results
11
Precomputation Function
$D^t = C^{t+1}$
[Diagram labels: h, D, C, $x_i$, f, g, $y_k$, $x'_i$, N]
12
Precomputation Function
$D^t = C^{t+1} = f(\ldots, x_i^{t+1}, \ldots)$
[Diagram labels: h, D, C, $x_i$, f, g, $y_k$, $x'_i$, N]
13
Precomputation Function
$D^t = C^{t+1} = f(\ldots, x_i^{t+1}, \ldots)$
$x_i^{t+1} = {x'_i}^{t} = g(\ldots, y_k^t, \ldots)$
[Diagram labels: h, D, C, $x_i$, f, g, $y_k$, $x'_i$, N]
14
Precomputation Function
$D^t = C^{t+1} = f(\ldots, x_i^{t+1}, \ldots)$
$x_i^{t+1} = {x'_i}^{t} = g(\ldots, y_k^t, \ldots)$
$D^t = f(\ldots, g(\ldots, y_k^t, \ldots), \ldots) = f'(\ldots, y_k^t, \ldots)$
[Diagram labels: f', h, D, C, $x_i$, f, g, $y_k$, $x'_i$, N]
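The derivation above can be checked with a small cycle-level model. This is only a sketch under stated assumptions: the bodies of `f` and `g` are hypothetical stand-ins, each takes a single input (the other arguments shown as "..." on the slide are omitted), and the register $x_i$ simply latches $x'_i$ every cycle.

```python
def g(y):
    # Hypothetical logic feeding register x_i (not from the slides).
    return 3 * y + 1

def f(x):
    # Hypothetical logic producing C from x_i (not from the slides).
    return x * x - 2

def f_prime(y):
    # Precomputation function: f composed with g, evaluated one cycle early.
    return f(g(y))

def simulate(y_stream, x0):
    """Return the traces of C and D, cycle by cycle.

    C^t = f(x^t), x^{t+1} = g(y^t), D^t = f'(y^t).
    """
    x = x0
    c_trace, d_trace = [], []
    for y in y_stream:
        c_trace.append(f(x))        # original signal C at time t
        d_trace.append(f_prime(y))  # precomputed signal D at time t
        x = g(y)                    # register x_i latches x'_i
    return c_trace, d_trace

c_trace, d_trace = simulate([2, 5, 1, 7, 4], x0=0)
# D at time t matches C at time t+1 for every cycle that has a successor.
print(d_trace[:-1] == c_trace[1:])
```

The comparison drops the last entry of D because the trace of C has no cycle t+1 to compare it with.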
15
Incremental Rescheduling
Schedule: time n: g; time n+1: f, h
[Diagram labels: h, f, g, $y_k$, N]
16
Incremental Rescheduling
Schedule before precomputation: time n: g; time n+1: f, h
Schedule after precomputation: time n: f'; time n+1: h
[Diagram labels: f', h, f, g, $y_k$, N]
17
Precomputing With Register Arrays
[Diagram labels: Write Address, Read Address, Write Data, Read Data]
18
Precomputing With Register Arrays
[Diagram labels: Write Address, Read Address, Write Data, Read Data, Out, N, F]
19
Precomputing With Register Arrays
$F^t = \mathit{Out}^{t+1}$
[Diagram labels: Write Address, Read Address, Write Data, Read Data, Out, N, F]
20
Precomputing With Register Arrays
$F^t = \mathit{Out}^{t+1} = \mathit{Array}^{t+1}[\mathit{ReadAddress}^{t+1}]$
[Diagram labels: Write Address, Read Address, Write Data, Read Data, Out, N, F]
21
Synthesizing Bypass Paths
[Diagram labels: Write Address, Precomputed Read Address, Write Data, Read Data, comparator (=?)]
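A bypass path for a precomputed register-array read can be sketched as follows. The model is an assumption, not taken from the slides: a write issued in cycle t becomes visible in cycle t+1, a read in cycle t sees the contents before that cycle's write, and the next read address is already known one cycle early. The function names are hypothetical.

```python
def reference_reads(initial, writes, read_addrs):
    """Out^t = Array^t[ReadAddress^t]; the write of cycle t lands at t+1."""
    array = list(initial)
    out = []
    for (write_addr, write_data), read_addr in zip(writes, read_addrs):
        out.append(array[read_addr])
        array[write_addr] = write_data
    return out

def precomputed_reads(initial, writes, read_addrs):
    """F^t = Out^{t+1}, produced in cycle t with a bypass path."""
    array = list(initial)
    f = []
    for t in range(len(writes) - 1):
        write_addr, write_data = writes[t]
        next_read_addr = read_addrs[t + 1]  # precomputed read address
        if write_addr == next_read_addr:
            # Bypass: the value being written is the one read next cycle.
            f.append(write_data)
        else:
            f.append(array[next_read_addr])
        array[write_addr] = write_data
    return f

initial = [100, 101, 102, 103]
writes = [(0, 10), (1, 11), (2, 12), (0, 13)]  # (address, data) per cycle
read_addrs = [3, 0, 2, 2]

out = reference_reads(initial, writes, read_addrs)
f = precomputed_reads(initial, writes, read_addrs)
print(out)  # [103, 10, 102, 12]
print(f)    # [10, 102, 12]
```

In this run the comparator fires in cycles 0 and 2, where the write address equals the next read address, so the write data is forwarded instead of the stale array entry.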
22
Precomputing RAM Output
[Diagram labels: RAM, N]
23
Prediction
What if we can't precompute, precomputation needs too many additional resources, or performance is unsatisfactory?
[Diagram labels: D, C, f, $g_i$, Z, N]
24
Prediction
What if we can't precompute, precomputation needs too many additional resources, or performance is unsatisfactory?
Predict C one cycle before its arrival
[Diagram labels: D, C, f, $g_i$, Z, N]
25
Schedule with Mispredictions
[Timing diagram: cycles t-1, t, t+1; rows C and H; registers R1, R2; values c1, c2, h1, h2]
26
Schedule with Mispredictions
[Timing diagram: cycles t-1, t, t+1; rows C and H; registers R1, R2; Verify; Negative Register; values c1, c2, h1, h2]
27
Schedule with Mispredictions
[Timing diagram: cycles t-1, t, t+1; rows C and H; registers R1, R2; Verify; Negative Register; value c1]
28
Schedule with Mispredictions
[Timing diagram: cycles t-1, t, t+1; rows C and H; registers R1, R2; Verify; Negative Register; predictions c1*, c2* checked against c1, c2 (c1* =? c1, c2* =? c2); values h1, h2]
29
Synthesis Issues in Prediction
Negative register as predicting FSM
  use signal transition probabilities
  incorporate don't-care conditions
Nullifying mispredictions
Two correction strategies
  As-Soon-As-Possible restoration
  As-Late-As-Possible correction
Add handshaking signals to coordinate with interface
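The predict, verify, and nullify loop can be illustrated with a toy model. Everything specific here is an assumption made for illustration: the last-value predictor stands in for the predicting FSM, `h` is a hypothetical downstream function, and a misprediction is charged one extra cycle. The slides do not specify any of these, and neither correction strategy named above is modeled in detail.

```python
def h(c):
    # Hypothetical downstream logic H that consumes C.
    return 2 * c + 1

def last_value_predictor(history):
    # Hypothetical predictor: guess that C repeats its previous value.
    return history[-1] if history else 0

def run(actual_c, predictor):
    """Return (outputs, cycles, mispredictions) for a stream of C values."""
    history, outputs = [], []
    cycles = 0
    mispredictions = 0
    for c in actual_c:
        guess = predictor(history)  # c*, available one cycle early
        result = h(guess)           # H starts working on the guess
        cycles += 1
        if guess != c:              # verify: c* =? c
            mispredictions += 1
            result = h(c)           # nullify the wrong result and redo it
            cycles += 1             # assumed one-cycle correction penalty
        outputs.append(result)
        history.append(c)
    return outputs, cycles, mispredictions

outputs, cycles, mispredictions = run([0, 0, 1, 1, 1, 0], last_value_predictor)
print(outputs)         # [1, 1, 3, 3, 3, 1]
print(cycles)          # 8
print(mispredictions)  # 2
```

The outputs always equal `h` applied to the actual values, so functionality is preserved; only the cycle count depends on how often the predictor is right.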
30
Related Work
Precomputation
  Bypass synthesis
  Lookahead [Kogge '81, …]
Prediction / Speculative Execution
  Most likely path, arbitrarily deep [Holtmann & Ernst '93, '95]
  Pre-execution [Radivojevic & Brewer '94]
  Possible multiple paths and arbitrarily deep [Lakshminarayana et al. '98]
  Percolation scheduling [Potasman et al. '90]
31
Results
32
Architectural Retiming
Improves throughput while preserving functionality and sometimes latency
Bridges the gap between HLS and logic optimizations
Unifies several sequential optimizations
  bypass synthesis
  lookahead transformation
  branch prediction
  fine-grain cross-register optimizations
33
Ph.D. Forum at DAC '99
Goal: increase interaction between academia and industry
Format: students present work at a poster session at DAC; researchers give feedback
Who's eligible? Students within 1 or 2 years of finishing their Ph.D. thesis
www.cs.washington.edu/homes/soha/forum
34
The End
35
Precomputing in Single-Register Cycles
Original circuit
[Diagram labels: B, A]
36
Precomputing in Single-Register Cycles
Original circuit
[Diagram labels: N, B, A]
37
Precomputing in Single-Register Cycles
Lookahead: A(n) is a function of B(n-2)
[Diagram labels: N, B, A, A', B']
[Kogge '81], [Parhi & Messerschmitt '89]
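The lookahead transformation can be illustrated on a first-order linear recurrence. This particular recurrence is a hypothetical example and does not appear on the slides: with s(n) = a*s(n-1) + u(n), substituting the recurrence into itself once gives s(n) = a^2*s(n-2) + a*u(n-1) + u(n), so each state depends on the state two steps back.

```python
def direct(a, u, s0):
    """s(n) = a*s(n-1) + u(n): one dependent operation per step."""
    s = [s0]
    for n in range(1, len(u)):
        s.append(a * s[n - 1] + u[n])
    return s

def lookahead(a, u, s0):
    """s(n) = a^2*s(n-2) + a*u(n-1) + u(n): depends on the state two steps back.

    Assumes len(u) >= 2; u[0] is unused, matching the indexing of direct().
    """
    s = [s0, a * s0 + u[1]]
    for n in range(2, len(u)):
        s.append(a * a * s[n - 2] + a * u[n - 1] + u[n])
    return s

u = [0, 1, 2, 3, 4, 5]
print(direct(3, u, s0=1))     # [1, 4, 14, 45, 139, 422]
print(lookahead(3, u, s0=1))  # [1, 4, 14, 45, 139, 422]
```

Both forms produce the same sequence; the lookahead form loosens the cycle so that two registers' worth of delay are available around the feedback loop, at the cost of extra arithmetic on the inputs.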
38
Precomputing RAM Output
[Diagram label: RAM]
39
Precomputing RAM Output
[Diagram label: RAM]
40
Speculative Execution
Scope and Depth
[Diagram labels: c1, c2, c3, c4, c5, c6]
41
Speculative Execution
Scope and Depth