Chapter 5 Unfolding
Definitions
Unfolding transforms a loop so that several consecutive iterations of the original loop are computed within a single iteration of the new loop.
Also known as (a.k.a.):
  - Loop unrolling (in compilers for parallel programs)
  - Block processing
Applications:
  - Reducing the sampling period to achieve the iteration bound (desired throughput rate) T.
  - Parallel (block) processing to execute several iterations concurrently.
  - Digit-serial or bit-serial processing.
(C) 1997-2006 by Yu Hen Hu
An example
Before unfolding:
  For n = 0 to N-1,
    y(n) = a*y(n-9) + x(n)
  end
Unfolding once (J = 2):
  For k = 0 to N/2-1,
    y(2k) = a*y(2k-9) + x(2k)
    y(2k+1) = a*y(2k-8) + x(2k+1)
  end
Unfolding twice (J = 3):
  For k = 0 to N/3-1,
    y(3k) = a*y(3k-9) + x(3k)
    y(3k+1) = a*y(3k-8) + x(3k+1)
    y(3k+2) = a*y(3k-7) + x(3k+2)
  end
Block processing formulation
J = 3, 9/J = 3 (an integer):
  X(k) = [x(3k) x(3k+1) x(3k+2)]^T
  Y(k) = [y(3k) y(3k+1) y(3k+2)]^T
  Y(k) = a*Y(k-3) + X(k)
J = 2, 9/J = 4.5 (not an integer):
  X(k) = [x(2k) x(2k+1)]^T
  Y(k) = [y(2k) y(2k+1)]^T
  No single block delay applies here: y(2k) = a*y(2(k-5)+1) + x(2k) uses the second element of Y(k-5), while y(2k+1) = a*y(2(k-4)) + x(2k+1) uses the first element of Y(k-4).
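The equivalence of the original and unfolded loops can be checked numerically. The following is a minimal sketch, assuming zero initial conditions (y(n) = 0 for n < 0) and an input length N that is a multiple of J; the function names `original` and `unfolded` are illustrative and not part of the slides.

```python
def original(x, a, w=9):
    """Direct form: y(n) = a*y(n-w) + x(n), assuming y(n) = 0 for n < 0."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        prev = y[n - w] if n >= w else 0.0
        y[n] = a * prev + x[n]
    return y


def unfolded(x, a, J, w=9):
    """J-unfolded form: each outer iteration k computes J consecutive outputs."""
    assert len(x) % J == 0, "sketch assumes N is a multiple of J"
    y = [0.0] * len(x)
    for k in range(len(x) // J):
        # The J statements below do not depend on each other when J <= w,
        # so they could be evaluated concurrently.
        for i in range(J):
            n = J * k + i
            prev = y[n - w] if n >= w else 0.0
            y[n] = a * prev + x[n]
    return y


x = [float(n + 1) for n in range(24)]   # arbitrary test input, N = 24
print(original(x, 0.5) == unfolded(x, 0.5, J=2))
print(original(x, 0.5) == unfolded(x, 0.5, J=3))
```

Both forms perform the same arithmetic in the same order for each output sample, so the results agree exactly; only the loop structure differs.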
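Implementation with J=3
[Figure: the input stream x(0), x(1), x(2), x(3), x(4), x(5), ... arrives at sampling period Ts and passes through a serial-to-parallel converter into three parallel branches, each built from an adder (+), a multiplier (X), and a delay (D), and clocked with period 3Ts. A parallel-to-serial converter produces the output stream y(0), y(1), y(2), y(3), y(4), y(5), ... at period Ts.]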
Unfolding the DFG
Rewrite the algorithm formulation (J = 2):
  y(2k) = a*y(2k-9) + x(2k)
  y(2k+1) = a*y(2k-8) + x(2k+1)
as
  y(2k) = a*y(2(k-5)+1) + x(2k)
  y(2k+1) = a*y(2(k-4)) + x(2k+1)
After J-fold unfolding, the clock period is T = J Ts, where Ts is the data sampling period.
[Figure: original DFG (T = Ts) and unfolded DFG (T = J Ts)]
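The index rewriting above generalizes to any J and delay w. The sketch below computes, for the output y(Jk+i), which earlier block and which element within that block it depends on. The helper name `unfolded_dependency` is hypothetical and introduced only for illustration.

```python
def unfolded_dependency(i, w, J):
    """For y(J*k + i) = a*y(J*k + i - w) + x(J*k + i), return (d, j) such that
    J*k + i - w == J*(k - d) + j with 0 <= j < J.
    d is the delay in unfolded iterations; j is the position within the block."""
    j = (i - w) % J          # Python's % is non-negative for positive J
    d = (w - i + j) // J     # exact division by construction
    return d, j


for J in (2, 3):
    for i in range(J):
        d, j = unfolded_dependency(i, 9, J)
        print(f"J={J}: y({J}k+{i}) depends on y({J}(k-{d})+{j})")
```

For J = 2 and w = 9 this reproduces the two rewritten equations: y(2k) depends on y(2(k-5)+1) and y(2k+1) depends on y(2(k-4)).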
Timing Diagram
[Figure: timing diagram. With T = Ts, the outputs y(0), y(1), ..., y(13) appear on one line. With T = 2Ts, the even outputs y(0), y(2), ..., y(12) and the odd outputs y(1), y(3), ..., y(13) appear on two parallel lines. Dependency spans are labelled 9T, 4T, and 5T.]
The timing diagram above assumes that the sampling period Ts remains unchanged, so the clock period T increases J-fold.
Since 9/2 is not an integer, the outputs of one iteration, (y(0), y(1)), are needed by two different future iterations: y(0) is used 4T later (by y(9)) and y(1) is used 5T later (by y(10)).
General DFG Unfolding Method
Define: ⌊x⌋ is the largest integer less than or equal to x, and a % b is the remainder of a divided by b.
Step 1. For each node U in the original DFG, draw J nodes {U_i; 0 ≤ i ≤ J-1} in the unfolded DFG.
Step 2. For each edge from U to V with w delays, draw J edges from U_i to V_((i+w)%J), each with ⌊(i+w)/J⌋ delays, for 0 ≤ i ≤ J-1.
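The two steps above can be sketched in a few lines. The representation is an assumption made for illustration: a DFG is a list of (U, V, w) edges, an unfolded node U_i is the tuple (U, i), and the recursion y(n) = a*y(n-9) + x(n) is collapsed into a single node "A" with a 9-delay self-loop.

```python
def unfold(edges, J):
    """J-unfold a DFG given as a list of (U, V, w) edges.

    Returns a list of ((U, i), (V, j), delays) edges, where
    j = (i + w) % J and delays = floor((i + w) / J).
    """
    unfolded_edges = []
    for u, v, w in edges:
        for i in range(J):
            unfolded_edges.append(((u, i), (v, (i + w) % J), (i + w) // J))
    return unfolded_edges


# Hypothetical single-node model of y(n) = a*y(n-9) + x(n): one self-loop with 9 delays.
for src, dst, d in unfold([("A", "A", 9)], J=2):
    print(src, "->", dst, "with", d, "delays")
```

With J = 2 the self-loop becomes an edge from A_0 to A_1 with 4 delays and an edge from A_1 to A_0 with 5 delays, which matches the 4T and 5T dependencies of the earlier example.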
Another DFG Unfolding Example (J = 2)
[Figure: original DFG with nodes Q, R, S, T and edges carrying 3D and 2D, labelled T = 3; unfolded DFG with nodes Q0, Q1, R0, R1, S0, S1, T0, T1, labelled T = 6; a table with columns i, w, and (i+w)%J.]
Step 1. Draw J copies of each node.
Step 2. Add all edges with 0 delays on them.
Step 3. Use the table to determine the edges with delays.
Properties of Unfolding
  - Unfolding preserves the number of registers (delays) in a DFG.
  - J-unfolding a loop with w delays produces gcd(w, J) loops in the unfolded DFG. Each of these loops contains w/gcd(w, J) delays and J/gcd(w, J) copies of each node that appears in the original loop.
  - Unfolding a DFG with iteration bound T results in a J-unfolded DFG with iteration bound JT.
  - A path with w (< J) delays in the original DFG leads to J-w paths with no delays and w paths with 1 delay each in the J-unfolded DFG.
  - Any path in the original DFG containing J or more delays leads to J paths, each with 1 or more delays. Such a path therefore cannot create a critical path in the J-unfolded DFG.
  - Any clock period that can be achieved by retiming a J-unfolded DFG can also be achieved by retiming the original DFG and then J-unfolding it.
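The first two properties can be checked on a small case. The sketch below is illustrative only: it assumes a hypothetical two-node loop A -> B (0 delays), B -> A (w delays), repeats the edge rule of the general method in a helper `unfold`, and traces loops with a helper `loops_in_unfolded` that is valid only when the original DFG is a single simple loop.

```python
from math import gcd


def unfold(edges, J):
    """J-unfold a DFG given as (U, V, w) edges, using U_i -> V_((i+w)%J) with (i+w)//J delays."""
    return [((u, i), (v, (i + w) % J), (i + w) // J)
            for u, v, w in edges for i in range(J)]


def loops_in_unfolded(edges, J):
    """For a DFG that is one simple loop, return [(delays, edge_count), ...],
    one entry per loop in the J-unfolded DFG."""
    succ = {src: (dst, d) for src, dst, d in unfold(edges, J)}
    seen, loops = set(), []
    for start in succ:
        if start in seen:
            continue
        node, delays, length = start, 0, 0
        while node not in seen:          # follow the unique successor until the loop closes
            seen.add(node)
            node_next, d = succ[node]
            delays += d
            length += 1
            node = node_next
        loops.append((delays, length))
    return loops


w, J = 6, 4
loop = [("A", "B", 0), ("B", "A", w)]    # hypothetical two-node loop with w delays
found = loops_in_unfolded(loop, J)
print("loops:", len(found), "expected gcd(w, J) =", gcd(w, J))
print("delays per loop:", [d for d, _ in found], "expected", w // gcd(w, J))
print("total delays:", sum(d for _, _, d in unfold(loop, J)), "expected", w)
```

For w = 6 and J = 4, gcd(w, J) = 2, so the unfolded DFG contains two loops with 3 delays each, and the total number of delays stays at 6.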