CMPUT429/CMPE382 Amaral 1/17/01
CMPUT429/CMPE382 Winter 2001
Topic 9: Software Pipelining
(Some slides from David A. Patterson’s CS252, Spring 2001 Lecture Slides)

Another possibility: Software Pipelining

Observation: if the iterations of a loop are independent, then we can get more ILP by scheduling instructions from different iterations together.

Software pipelining: reorganizes a loop so that each iteration of the new loop is made from instructions chosen from different iterations of the original loop.

Software Pipelining Example

Before: Unrolled 3 times
    LOOP:   L.D     F0,0(R1)
            ADD.D   F4,F0,F2
            S.D     0(R1),F4
            L.D     F6,-8(R1)
            ADD.D   F8,F6,F2
            S.D     -8(R1),F8
            L.D     F10,-16(R1)
            ADD.D   F12,F10,F2
            S.D     -16(R1),F12
            DSUBUI  R1,R1,#24
            BNEZ    R1,LOOP

After: Software Pipelined
    LOOP:   S.D     0(R1),F4      ; Stores M[i]
            ADD.D   F4,F0,F2      ; Adds to M[i-1]
            L.D     F0,-16(R1)    ; Loads M[i-2]
            DSUBUI  R1,R1,#8
            BNEZ    R1,LOOP

Symbolic Loop Unrolling
 – Maximize result-use distance
 – Less code space than unrolling
 – Fill & drain pipe only once per loop vs. once per each unrolled iteration in loop unrolling

[Figure: overlapped ops vs. time, comparing the SW pipeline with the unrolled loop; annotation: 5 cycles per iteration]
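
The "symbolic unrolling" idea can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name `kernel_schedule` and the 0-based iteration numbering are assumptions made here. For each pass through the software-pipelined kernel, it lists which iteration of the original loop the store, the add, and the load belong to.

```python
def kernel_schedule(n_iters):
    """Return one (store_iter, add_iter, load_iter) tuple per kernel pass.

    Hypothetical helper: iterations of the original loop are numbered
    0 .. n_iters-1. In each kernel pass the store finishes the oldest
    iteration in flight, the add works on the next one, and the load
    starts the newest one, so three iterations overlap.
    """
    passes = []
    # The first two loads and the first add are done by start-up code,
    # so the kernel begins when iteration 2 is ready to be loaded.
    for newest in range(2, n_iters):
        passes.append((newest - 2, newest - 1, newest))
    return passes


if __name__ == "__main__":
    for store_it, add_it, load_it in kernel_schedule(5):
        print(f"store iter {store_it} | add iter {add_it} | load iter {load_it}")
```

With five original iterations the kernel runs three times; the remaining work is left to the start-up and wind-down code, which is why the pipe is filled and drained only once per loop.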

Software Pipelining Example (with start-up and wind-down code)

After: Software Pipelined
            L.D     F0,0(R1)      ; start-up code
            ADD.D   F4,F0,F2
            L.D     F0,-8(R1)
    L:      S.D     0(R1),F4      ; Stores M[i]
            ADD.D   F4,F0,F2      ; Adds to M[i-1]
            L.D     F0,-16(R1)    ; Loads M[i-2]
            DSUBUI  R1,R1,#8
            BNEZ    R1,L
            S.D     -8(R1),F4     ; wind-down code
            ADD.D   F4,F0,F2
            S.D     -16(R1),F4
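
The start-up, kernel, and wind-down structure above can be checked with a small Python model. This is a sketch under stated assumptions, not the slides' code: the function names are hypothetical, the array is walked with an ascending index `i` instead of a descending address in R1, and the variables `f0` and `f4` merely stand in for the registers F0 and F4. The wind-down stores are indexed relative to the final value of `i`, so their offsets are not meant to mirror the assembly listing literally.

```python
def add_scalar_reference(x, s):
    """Plain loop: one load, one add, one store per iteration."""
    return [v + s for v in x]


def add_scalar_pipelined(x, s):
    """Software-pipelined version of the same loop (illustrative model)."""
    n = len(x)
    if n < 2:
        # Too short to fill the pipeline; fall back to the plain loop.
        return add_scalar_reference(x, s)
    out = list(x)

    # Start-up code: load element 0, add it, load element 1.
    f0 = out[0]
    f4 = f0 + s
    f0 = out[1]

    # Kernel: each pass stores element i, adds element i+1, loads element i+2.
    i = 0
    while i < n - 2:
        out[i] = f4          # S.D   (store result for element i)
        f4 = f0 + s          # ADD.D (element i+1)
        f0 = out[i + 2]      # L.D   (element i+2)
        i += 1

    # Wind-down code: drain the two iterations still in flight.
    out[i] = f4
    f4 = f0 + s
    out[i + 1] = f4
    return out


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(add_scalar_pipelined(data, 10.0))
```

Comparing the two functions on arrays of different lengths is a quick way to confirm that the reordering preserves the result of the original loop.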

Software Pipelining Example: execution trace

[Animation frames: the software-pipelined code from the previous slide is stepped through one instruction at a time. Memory holds X[1000], X[999], X[998], X[997], ... at descending addresses; R1 initially points at X[1000]; F2 holds the scalar s. T1 and T2 denote the first two sums, X[1000] + s and X[999] + s.]

    Step  Instruction          Effect shown in the frame
    1     L.D    F0,0(R1)      F0 = X[1000]
    2     ADD.D  F4,F0,F2      F4 = T1
    3     L.D    F0,-8(R1)     F0 = X[999]
    4     S.D    0(R1),F4      X[1000] in memory is replaced by T1
    5     ADD.D  F4,F0,F2      F4 = T2
    6     L.D    F0,-16(R1)    F0 = X[998]
    7     DSUBUI R1,R1,#8      R1 points at X[999]
    8     S.D    0(R1),F4      X[999] in memory is replaced by T2

Software Pipelining Example in the IA-64

    loop:   (p16) ldl r32 = [r12], 1
            (p17) add r34 = 1, r33
            (p18) stl [r13] = r35, 1
                  br.ctop loop

[Animation frames: the loop is stepped through with memory holding x1 ... x5. The frames show the physical and logical general registers, the predicate registers, the loop count LC, the epilog count EC, and the register rename base RRB. Results y1 ... y5 appear in memory as the stores execute.]

State at the start of each pass through the loop body, as shown in the frames:

    Pass   LC   EC   RRB   Results in memory after the pass
    1      4    3     0    (none)
    2      3    3    -1    (none)
    3      2    3    -2    y1
    4      1    3    -3    y1 y2
    5      0    3    -4    y1 y2 y3
    6      0    2    -5    y1 y2 y3 y4
    7      0    1    -6    y1 y2 y3 y4 y5
    exit   0    0    -7    y1 y2 y3 y4 y5

 – While LC > 0, br.ctop decrements LC and writes 1 into the predicate that becomes p16 after rotation.
 – Once LC = 0, br.ctop decrements EC and writes 0 instead, so the load stage shuts off first and the pipeline drains.
 – Every br.ctop in this example decrements RRB, so a value written to r32 is read as r33 in the next pass, and r34 is read as r35.
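
The LC/EC/RRB bookkeeping in this example can be reproduced with a short Python model. It is a simplified sketch, not a description of real hardware: the function name `run_ctop_loop`, the sizes of the modeled register files, and the use of a single `rrb` counter for both general and predicate registers are assumptions made for illustration (the slide shows only one RRB value, and `ldl`/`stl` are the slide's own mnemonics). The model covers only the cases this example reaches, with LC = n - 1, EC = 3, and p16 = 1 on entry.

```python
def run_ctop_loop(x, gr_size=8, pr_size=48):
    """Simulate the three-stage rotating-register loop on input list x.

    Returns (y, trace), where y[k] == 1 + x[k] and trace holds the
    (LC, EC, RRB) values after each br.ctop.
    """
    n = len(x)
    if n == 0:
        return [], []
    gr = [0] * gr_size          # physical rotating general registers
    pr = [0] * pr_size          # physical rotating predicates (p16..p63)
    rrb = 0                     # register rename base (assumed shared)
    lc, ec = n - 1, 3           # loop count and epilog count

    def g(logical):             # logical r32.. -> physical index
        return (logical - 32 + rrb) % gr_size

    def p(logical):             # logical p16..p63 -> physical index
        return (logical - 16 + rrb) % pr_size

    pr[p(16)] = 1               # enable the load stage for the first pass
    y = [None] * n
    load_ptr = store_ptr = 0
    trace = []
    while True:
        if pr[p(16)]:           # (p16) ldl r32 = [r12], 1
            gr[g(32)] = x[load_ptr]
            load_ptr += 1
        if pr[p(17)]:           # (p17) add r34 = 1, r33
            gr[g(34)] = 1 + gr[g(33)]
        if pr[p(18)]:           # (p18) stl [r13] = r35, 1
            y[store_ptr] = gr[g(35)]
            store_ptr += 1
        # br.ctop: pick the next p16 value and decide whether to loop.
        if lc > 0:
            lc -= 1
            next_p16, taken = 1, True
        elif ec > 1:
            ec -= 1
            next_p16, taken = 0, True
        else:                   # ec == 1 here: last drain pass, fall through
            ec -= 1
            next_p16, taken = 0, False
        pr[p(63)] = next_p16    # p63 becomes p16 after the rotation
        rrb -= 1                # rotate registers by one position
        trace.append((lc, ec, rrb))
        if not taken:
            return y, trace


if __name__ == "__main__":
    ys, states = run_ctop_loop([10, 20, 30, 40, 50])
    print(ys)
    print(states)
```

For five inputs the model performs seven passes and ends with LC = 0, EC = 0, RRB = -7, which agrees with the final frame of the animation.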