Interprocedural Symbolic Range Propagation for Optimizing Compilers
Hansang Bae and Rudolf Eigenmann
Purdue University
2005. 10. 22
Outline
- Motivation
- Symbolic Range Propagation
- Interprocedural Symbolic Range Propagation
- Experiments
- Conclusions
4/8/2019 Interprocedural Symbolic Range Propagation
Motivation
- Symbolic analysis is key to static analysis
  - X=Y+1 is better than X=?
- Range analysis has been successful
  - X=[0,10] is better than X=?
- Relevant questions
  - How much can we achieve interprocedurally?
  - Can interprocedural range propagation outperform other alternatives?
Symbolic Range Propagation (Background)
- Has been effective for compiler analyses
- Abstract interpretation
- Provides lower/upper bounds for variables
- Sources of information
  - Variable definitions
  - IF conditionals
  - Loop variables
- Intersect with each new source, union at merge points
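The intersect and union operations named above can be sketched as follows. This is a minimal illustration that assumes purely numeric bounds; the `Range` class and its method names are hypothetical, and the symbolic bounds used by SRP (expressions such as N or U+N) are not modelled.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Range:
    """Numeric value range [lb, ub]; symbolic bounds are left out of this sketch."""
    lb: float
    ub: float

    def intersect(self, other):
        # A new source of information can only narrow what is already known.
        return Range(max(self.lb, other.lb), min(self.ub, other.ub))

    def union(self, other):
        # At a control-flow merge, keep the smallest range covering both inputs.
        return Range(min(self.lb, other.lb), max(self.ub, other.ub))


UNKNOWN = Range(-math.inf, math.inf)

known = UNKNOWN.intersect(Range(0, 10))          # e.g. learned from a loop header
narrowed = known.intersect(Range(-math.inf, 5))  # e.g. inside IF (X.LE.5)
merged = Range(2, 2).union(Range(3, 3))          # two branches meet

print(known, narrowed, merged)
```

The union here is a convex hull, so it may over-approximate: merging [1,1] and [3,3] would give [1,3] even though 2 is never taken.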
SRP – Example
Example Code            Ranges
                        X=[-INF,INF]
X = 1                   X=[1,1]
IF (X.LE.N) THEN
  X = 2*X               X=[2,2]
ELSE
  X = X+2               X=[3,3]
ENDIF                   X=[2,3]
…
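The ranges in this example can be reproduced with a few lines of interval arithmetic. The sketch below is an illustration only: the helper names are made up, ranges are plain tuples, and the IF condition on the symbolic bound N is ignored because it does not change the numeric result here.

```python
import math


def scale(r, k):
    """Range of k*X for a non-negative constant k."""
    return (r[0] * k, r[1] * k)


def shift(r, c):
    """Range of X+c for a constant c."""
    return (r[0] + c, r[1] + c)


def union(a, b):
    """Merge of two ranges at a control-flow join."""
    return (min(a[0], b[0]), max(a[1], b[1]))


x_entry = (-math.inf, math.inf)  # nothing known before the first definition
x_def = (1, 1)                   # X = 1
x_then = scale(x_def, 2)         # X = 2*X
x_else = shift(x_def, 2)         # X = X+2
x_merge = union(x_then, x_else)  # after ENDIF

print(x_then, x_else, x_merge)
```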
Interprocedural Symbolic Range Propagation (ISRP)
- Propagates ranges across procedure calls
- Collects interprocedural symbolic ranges (ISRs) at important program points
  - Entry to a subroutine
  - After a call site
- SRP as the source of information
- Iterative algorithm
- Context sensitivity by procedure cloning
ISRP – Terminology
- Symbolic Range
  - Mapping from a variable to its value range, V = [LB, UB], where LB and UB are expressions
- Interprocedural Symbolic Range (ISR)
  - Symbolic range valid at relevant program points: subroutine entries and call sites (forward/backward)
- Jump Function
  - Set of symbolic ranges expressed in terms of input variables to a called subroutine (actual parameters, global variables)
- Return Jump Function
  - Set of symbolic ranges expressed in terms of return variables to a calling subroutine (formal parameters, global variables)
[Diagram: Caller and Callee; Jump Function and Forward ISR lead from caller to callee, Return Jump Function and Backward ISR lead from callee back to caller]
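As a rough sketch of how a jump function turns into a forward ISR, the snippet below renames actual parameters to the callee's formals and drops everything else. The function name `forward_isr` is hypothetical, global variables are ignored, and bounds are assumed numeric so that no renaming inside bound expressions is needed. The data is the call `CALL B(T, U, N)` with `SUBROUTINE B(V, W, M)`.

```python
def forward_isr(jump_function, actuals, formals):
    """Translate ranges on actual parameters into ranges on formal parameters.

    Variables that are not passed to the callee are discarded.
    """
    rename = dict(zip(actuals, formals))
    return {rename[var]: rng
            for var, rng in jump_function.items()
            if var in rename}


# Jump function at the call site in A: U=[2], N=[10,40]
jump_at_call = {"U": (2, 2), "N": (10, 40)}

isr_entry_b = forward_isr(jump_at_call,
                          actuals=["T", "U", "N"],
                          formals=["V", "W", "M"])
print(isr_entry_b)
```

A return jump function would be handled the other way round, mapping ranges on formals back to the actuals at the call site.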
ISRP – Algorithm
Propagate_Interprocedural_Ranges() {
  Initialize_Call_Graph()
  while (there is any change in ISR) {
    foreach Subroutine (bottom-up) {
      Get_Backward_Interprocedural_Ranges()
      Compute_Jump_Functions()
      Compute_Return_Jump_Functions()
    }
    Get_Forward_Interprocedural_Ranges()
  }
}
ISRP – Algorithm
Get_Backward_Interprocedural_Ranges()
- Transforms return jump functions to ISRs
- Does nothing for leaf nodes of the call graph
Compute_Jump_Functions()
- Computes intraprocedural ranges
- Discards non-input variables to the callee
Compute_Return_Jump_Functions()
- Discards non-return variables to the caller
Get_Forward_Interprocedural_Ranges()
- Transforms jump functions to ISRs
- Clones procedures if necessary
- Keeps track of any changes
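The iterate-until-no-change structure can be illustrated with a much reduced sketch. It covers only the forward direction (jump functions to entry ISRs) and makes several assumptions that are not part of the algorithm above: numeric ranges only, one call site per callee, an acyclic call graph, no cloning, and a hand-written `killed` set naming variables that the caller may overwrite before the call. All names are hypothetical.

```python
def propagate_forward(call_sites, formals, local_ranges, killed):
    """Repeat forward propagation until no entry ISR changes."""
    entry = {proc: {} for proc in formals}  # forward ISR at each procedure entry
    iterations = 0
    changed = True
    while changed:
        changed = False
        iterations += 1
        for caller, callee, actuals in call_sites:
            # Ranges valid at the call site: surviving entry ISRs plus local facts.
            known = {v: r for v, r in entry[caller].items()
                     if v not in killed.get(caller, set())}
            known.update(local_ranges.get(caller, {}))
            # Jump function renamed to the callee's formal parameters.
            new_isr = {f: known[a]
                       for a, f in zip(actuals, formals[callee]) if a in known}
            if new_isr != entry[callee]:
                entry[callee] = new_isr
                changed = True
    return entry, iterations


# MAIN calls A(X, Y); A calls B(T, U, N) inside DO N = 10, 40.
formals = {"MAIN": [], "A": ["T", "U"], "B": ["V", "W", "M"]}
call_sites = [("A", "B", ["T", "U", "N"]), ("MAIN", "A", ["X", "Y"])]  # bottom-up
local_ranges = {"MAIN": {"X": (1, 1), "Y": (2, 2)}, "A": {"N": (10, 40)}}
killed = {"A": {"T"}}  # T is passed to B inside the loop and may be modified there

entry_isr, iterations = propagate_forward(call_sites, formals, local_ranges, killed)
print(entry_isr, iterations)
```

Visiting call sites bottom-up means facts from MAIN reach B only in the second pass, and a third pass is needed to confirm that nothing changes.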
ISRP – Example (1st iteration)

foreach subroutine (bottom-up)
  Get_Backward_ISRs
  Compute_Jump_Functions
  Compute_Return_Jump_Functions
Get_Forward_ISRs (for call graph)

      PROGRAM MAIN
      INTEGER X, Y
      X = 1
      Y = 2
α     CALL A(X, Y)
      END

      SUBROUTINE A(T, U)
      INTEGER N, T, U
      DO N = 10, 40
β       CALL B(T, U, N)
      ENDDO
      END

      SUBROUTINE B(V, W, M)
      INTEGER M, V, W
      V = W + M
      END

Ranges computed by SRP, in program order
  MAIN: X=[1]; X=[1],Y=[2]
  A:    N=[10,40]; N=[10,40],T=[U+N]
  B:    V=[W+M]

Jump functions (J), return jump functions (RJ), and ISRs
  MAIN: J (at α): X=[1],Y=[2]
  A:    ISR (entry): T=[1],U=[2]; J (at β): N=[10,40]; ISR (after β): T=[U+N]
  B:    ISR (entry): M=[10,40]; RJ: V=[W+M]
ISRP – Example (2nd iteration)

Same steps and program as in the 1st iteration.

Ranges computed by SRP, in program order
  MAIN: X=[1]; X=[1],Y=[2]; Y=[2]
  A:    T=[1],U=[2]; T=[1],U=[2],N=[10,40]; N=[10,40],U=[2]; U=[2]
  B:    M=[10,40]; M=[10,40],V=[W+M]

Jump functions (J), return jump functions (RJ), and ISRs
  MAIN: J (at α): X=[1],Y=[2]; ISR (after α): Y=[2]
  A:    ISR (entry): T=[1],U=[2]; J (at β): U=[2],N=[10,40]; ISR (after β): N=[10,40],T=[U+N]; RJ: U=[2]
  B:    ISR (entry): M=[10,40]; RJ: M=[10,40],V=[W+M]

Forward ISRs after Get_Forward_ISRs
  A: T=[1],U=[2]
  B: M=[10,40],W=[2]
ISRP – Example (3rd iteration)

Same steps and program as in the 1st iteration.

Ranges computed by SRP, in program order
  MAIN: X=[1]; X=[1],Y=[2]; Y=[2]
  A:    T=[1],U=[2]; T=[1],U=[2],N=[10,40]; N=[10,40],U=[2]; U=[2]
  B:    M=[10,40],W=[2]; M=[10,40],W=[2],V=[W+M]

Jump functions (J), return jump functions (RJ), and ISRs
  MAIN: J (at α): X=[1],Y=[2]; ISR (after α): Y=[2]
  A:    ISR (entry): T=[1],U=[2]; J (at β): U=[2],N=[10,40]; ISR (after β): N=[10,40],U=[2],T=[U+N]; RJ: U=[2]
  B:    ISR (entry): M=[10,40],W=[2]; RJ: M=[10,40],W=[2],V=[W+M]

Forward ISRs after Get_Forward_ISRs (unchanged from the 2nd iteration)
  A: T=[1],U=[2]
  B: M=[10,40],W=[2]
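Once the ISRs have converged, a client analysis could evaluate a symbolic range against them. The slide keeps V=[W+M] symbolic; the sketch below is an added illustration, not part of the example, that substitutes the final ISR of B using interval addition. The helper `add_ranges` and the tuple encoding of the symbolic range are assumptions.

```python
def add_ranges(a, b):
    """Interval addition: [a_lb + b_lb, a_ub + b_ub]."""
    return (a[0] + b[0], a[1] + b[1])


# Forward ISR at the entry of B after the 3rd iteration.
isr_b = {"M": (10, 40), "W": (2, 2)}

# Symbolic range V=[W+M], stored here as the names of its two operands.
v_symbolic = ("W", "M")

v_numeric = add_ranges(isr_b[v_symbolic[0]], isr_b[v_symbolic[1]])
print(v_numeric)
```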
Experiments
- Efficacy of ISRP for an optimizing compiler (Polaris)
  - Test elision and dead-code elimination
  - Data dependence analysis
  - Other optimizations for parallelization
- 21 Fortran codes from SPEC CFP and Perfect
- Best available optimizations in Polaris as Base
  - Interprocedural expression propagation
  - Forward substitution
  - Intraprocedural symbolic range propagation
  - Automatic partial inlining
Result – Test Elision
Columns: Codes, Base, ISRP
Perfect codes: ARC2D, BDNA, DYFESM, FLO52Q, MDG, MIGRATION, OCEAN, QCD2, SPEC77, TRACK, TRFD
Values as listed: 4 15 18 2 67 3 5 26 1 6 72 10 8
SPEC CFP codes and TOTAL row: applu, apsi, fpppp, hydro2d, mgrid, su2cor, swim, tomcatv, turb3d, wupwise, TOTAL
Values as listed: 12 9 7 19 176 27 242
- ISRP found as many or more cases for 20 codes
- Base made an aggressive decision with hard-wired test elision for fpppp
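To illustrate what test elision asks of range information, the sketch below decides a comparison of the form X .LE. c from a numeric range of X. It is a simplified stand-in, not the Polaris implementation; the function name and the three-valued result are assumptions.

```python
def fold_le_test(var_range, bound):
    """Decide 'X .LE. bound' from the range of X.

    Returns True if the test always holds (the test can be elided),
    False if it never holds (the guarded code is dead),
    and None if the range does not settle the question.
    """
    lb, ub = var_range
    if ub <= bound:
        return True
    if lb > bound:
        return False
    return None


x_range = (2, 3)
print(fold_le_test(x_range, 5))  # always true
print(fold_le_test(x_range, 1))  # never true
print(fold_le_test(x_range, 2))  # undecided
```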
Result – Data Dependence Analysis
Codes         Base    ISRP
ARC2D          801     459
BDNA          1124    1081
DYFESM        1388    1108
FLO52Q         263     279
MDG           1120     908
MIGRATION     7180    6219
OCEAN          433     370
QCD2         17257    3579
SPEC77        2392    1652
TRACK         2003    1781
TRFD            93      40
SPEC CFP codes and TOTAL row: applu, apsi, fpppp, hydro2d, mgrid, su2cor, swim, tomcatv, turb3d, wupwise, TOTAL
Values as listed (Base column, then ISRP column): 677 11395 22064 110 83 10915 45 371 2013 81727 8222 20006 66 19 8712 342 1071 56636
- ISRP disproved more data dependences for 20 codes
- Base benefits from forward substitution for FLO52Q
- Data dependence analysis benefits from other improved optimizations
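One way ranges can disprove a dependence is by showing that two array references never touch the same element. The sketch below is a deliberately simple overlap check on numeric index ranges; the loop bounds, the offset M, and the helper names are hypothetical and do not come from the benchmark codes.

```python
def index_range(loop_range, offset_range):
    """Range of I+OFFSET given ranges for I and OFFSET."""
    return (loop_range[0] + offset_range[0], loop_range[1] + offset_range[1])


def may_overlap(a, b):
    """True if two closed ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


i_range = (1, 10)    # hypothetical loop: DO I = 1, 10
m_range = (10, 40)   # hypothetical range known for M

write_idx = index_range(i_range, (0, 0))  # A(I) is written
read_idx = index_range(i_range, m_range)  # A(I+M) is read

dependent = may_overlap(write_idx, read_idx)
print(write_idx, read_idx, dependent)
```

With only M >= 5 known instead, the read range would start at 6 and the check could no longer rule out a dependence.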
Result – Other Optimizations

      IF (((-num)+(-num**2))/2.LE.0.AND.(-num).LE.0) THEN
       ALLOCATE (xrsiq00(1:morb, 1:num, 1:numthreads))
!$OMP PARALLEL
!$OMP+IF(6+((-1)*num+(-1)*num**2)/2.LE.0)
!$OMP+DEFAULT(SHARED)
!$OMP+PRIVATE(MY_CPU_ID,MRS,MRSIJ1,MI0,MJ0,VAL,MQ,MP,XIJ00,MI,MJ)
       my_cpu_id = omp_get_thread_num()+1
!$OMP DO
       DO mrs = 1, (num*(1+num))/2, 1
        IF ((num*(1+num))/2.NE.mrs) THEN
         DO mq = 1, num, 1
          DO mi0 = 1, num, 1
10         CONTINUE
           xrsiq00(mi0, mq, my_cpu_id) = zero
          ENDDO
         ENDDO
         DO mp = 1, num, 1
          DO mq = 1, mp, 1
           val = xrspq(mq+(mp**2+(-mp)+(-num)+(-num**2)+mrs*num+mrs*num*
     **2)/2)
           IF (zero.NE.val) THEN
            DO mi0 = 1, num, 1
             xrsiq00(mi0, mq, my_cpu_id) = xrsiq00(mi0, mq, my_cpu_id)+v
     *(mp, mi0)*val
             xrsiq00(mi0, mp, my_cpu_id) = xrsiq00(mi0, mp, my_cpu_id)+v
     *(mq, mi0)*val
20           CONTINUE
            ENDDO
           ENDIF
30         CONTINUE
          ENDDO
40        CONTINUE
         ENDDO
         mrsij1 = ((-num)+(-num**2)+mrs*num+mrs*num**2)/2
         DO mi0 = 1, num, 1
          DO mj0 = 1, mi0, 1
50         CONTINUE
           xij00(mj0) = zero
          ENDDO
          DO mq = 1, num, 1
           val = xrsiq00(mi0, mq, my_cpu_id)
           IF (zero.NE.val) THEN
            DO mj0 = 1, mi0, 1
60           CONTINUE
             xij00(mj0) = xij00(mj0)+v(mq, mj0)*val
            ENDDO
           ENDIF
70         CONTINUE
          ENDDO
          DO mj0 = 1, mi0, 1
80         CONTINUE
           xrsij(mj0+(mi0**2+(-mi0))/2+((-num)+(-num**2)+mrs*num+mrs*num
     ***2)/2) = xij00(mj0)
          ENDDO
90        CONTINUE
         ENDDO
100      CONTINUE
        ELSE
         DO mq = 1, num, 1
          DO mi = 1, num, 1
316        CONTINUE
           xrsiq(mi, mq) = zero
          ENDDO
         ENDDO
         DO mp = 1, num, 1
          DO mq = 1, mp, 1
           val = xrspq(mq+(mp**2+(-mp)+(-num)+(-num**2)+mrs*num+mrs*num*
     **2)/2)
           IF (zero.NE.val) THEN
            DO mi = 1, num, 1
             xrsiq(mi, mq) = xrsiq(mi, mq)+v(mp, mi)*val
             xrsiq(mi, mp) = xrsiq(mi, mp)+v(mq, mi)*val
317          CONTINUE
            ENDDO
           ENDIF
318        CONTINUE
          ENDDO
319       CONTINUE
         ENDDO
         mrsij1 = ((-num)+(-num**2)+mrs*num+mrs*num**2)/2
         DO mi = 1, num, 1
          DO mj = 1, mi, 1
320        CONTINUE
           xij(mj) = zero
          ENDDO
          DO mq = 1, num, 1
           val = xrsiq(mi, mq)
           IF (zero.NE.val) THEN
            DO mj = 1, mi, 1
321          CONTINUE
             xij(mj) = xij(mj)+v(mq, mj)*val
            ENDDO
           ENDIF
322        CONTINUE
          ENDDO
          DO mj = 1, mi, 1
323        CONTINUE
           xrsij(mj+(mi**2+(-mi))/2+((-num)+(-num**2)+mrs*num+mrs*num**2
     *)/2) = xij(mj)
          ENDDO
324       CONTINUE
         ENDDO
325      CONTINUE
        ENDIF
       ENDDO
!$OMP END DO NOWAIT
!$OMP END PARALLEL
       DEALLOCATE (xrsiq00)
      ELSE
       DO mrs = 1, (num*(1+num))/2, 1
!$OMP PARALLEL
!$OMP+IF(6+(-1)*num.LE.0)
!$OMP+DEFAULT(SHARED)
!$OMP+PRIVATE(MI)
!$OMP DO
        DO mq = 1, num, 1
         DO mi = 1, (num), 1
306       CONTINUE
          xrsiq(mi, mq) = zero
         ENDDO
        ENDDO
!$OMP END DO NOWAIT
!$OMP END PARALLEL
        DO mp = 1, num, 1
         DO mq = 1, mp, 1
          mrspq = 1+mrspq
          val = xrspq(mrspq)
          IF (zero.NE.val) THEN
!$OMP PARALLEL
!$OMP+IF(6+(-1)*num.LE.0)
!$OMP+DEFAULT(SHARED)
!$OMP DO
           DO mi = 1, (num), 1
            xrsiq(mi, mq) = xrsiq(mi, mq)+v(mp, mi)*val
            xrsiq(mi, mp) = xrsiq(mi, mp)+v(mq, mi)*val
307         CONTINUE
           ENDDO
!$OMP END DO NOWAIT
!$OMP END PARALLEL
          ENDIF
308       CONTINUE
         ENDDO
309      CONTINUE
        ENDDO
        mrsij = mrsij0
        DO mi = 1, (num), 1
!$OMP PARALLEL
!$OMP+IF(6+(-1)*mi.LE.0)
!$OMP+DEFAULT(SHARED)
!$OMP DO
         DO mj = 1, mi, 1
310       CONTINUE
          xij(mj) = zero
         ENDDO
!$OMP END DO NOWAIT
!$OMP END PARALLEL
         ALLOCATE (xij1(1:mi, 1:numthreads))
!$OMP PARALLEL
!$OMP+IF(6+(-1)*num.LE.0)
!$OMP+DEFAULT(SHARED)
!$OMP+PRIVATE(MY_CPU_ID,MQ,TPINIT,VAL,MJ)
         my_cpu_id = omp_get_thread_num()+1
         DO tpinit = 1, mi, 1
          xij1(tpinit, my_cpu_id) = 0.0
         ENDDO
!$OMP DO
         DO mq = 1, num, 1
          val = xrsiq(mi, mq)
          IF (zero.NE.val) THEN
           DO mj = 1, mi, 1
311         CONTINUE
            xij1(mj, my_cpu_id) = xij1(mj, my_cpu_id)+v(mq, mj)*val
           ENDDO
          ENDIF
312       CONTINUE
         ENDDO
!$OMP END DO NOWAIT
!$OMP CRITICAL
         DO tpinit = 1, mi, 1
          xij(tpinit) = xij(tpinit)+xij1(tpinit, my_cpu_id)
         ENDDO
!$OMP END CRITICAL
!$OMP END PARALLEL
         DEALLOCATE (xij1)
         DO mj = 1, mi, 1
          mrsij = mrsij+1
313       CONTINUE
          xrsij(mrsij) = xij(mj)
         ENDDO
314      CONTINUE
        ENDDO
        mrsij0 = mrsij0+(num*(num+1))/2
315     CONTINUE
       ENDDO
      ENDIF

!$OMP PARALLEL
!$OMP+DEFAULT(SHARED)
!$OMP+PRIVATE(MRSIJ1,MI0,MJ0,VAL,MQ,MP,XIJ00,XRSIQ00)
!$OMP DO
      DO mrs = 1, (num+num**2)/2, 1
       IF ((num+num**2)/2.NE.mrs) THEN
        DO mq = 1, num, 1
         DO mi0 = 1, num, 1
10        CONTINUE
          xrsiq00(mi0, mq) = zero
         ENDDO
        ENDDO
        DO mp = 1, num, 1
         DO mq = 1, mp, 1
          val = xrspq(mq+(mp**2+(-mp)+(-num)+(-num**2)+mrs*num+mrs*num**
     *2)/2)
          IF (zero.NE.val) THEN
           DO mi0 = 1, num, 1
            xrsiq00(mi0, mq) = xrsiq00(mi0, mq)+v(mp, mi0)*val
            xrsiq00(mi0, mp) = xrsiq00(mi0, mp)+v(mq, mi0)*val
20          CONTINUE
           ENDDO
          ENDIF
30        CONTINUE
         ENDDO
40       CONTINUE
        ENDDO
        mrsij1 = ((-num)+(-num**2)+mrs*num+mrs*num**2)/2
        DO mi0 = 1, num, 1
         DO mj0 = 1, mi0, 1
50        CONTINUE
          xij00(mj0) = zero
         ENDDO
         DO mq = 1, num, 1
          val = xrsiq00(mi0, mq)
          IF (zero.NE.val) THEN
           DO mj0 = 1, mi0, 1
60          CONTINUE
            xij00(mj0) = xij00(mj0)+v(mq, mj0)*val
           ENDDO
          ENDIF
70        CONTINUE
         ENDDO
         DO mj0 = 1, mi0, 1
80        CONTINUE
          xrsij(mj0+(mi0**2+(-mi0)+(-num)+(-num**2)+mrs*num+mrs*num**2)/
     *2) = xij00(mj0)
         ENDDO
90       CONTINUE
        ENDDO
100     CONTINUE
       ELSE
        DO mq = 1, num, 1
         DO mi = 1, num, 1
306       CONTINUE
          xrsiq(mi, mq) = zero
         ENDDO
        ENDDO
        DO mp = 1, num, 1
         DO mq = 1, mp, 1
          val = xrspq(mq+(mp**2+(-mp)+(-num)+(-num**2)+mrs*num+mrs*num**
     *2)/2)
          IF (zero.NE.val) THEN
           DO mi = 1, num, 1
            xrsiq(mi, mq) = xrsiq(mi, mq)+v(mp, mi)*val
            xrsiq(mi, mp) = xrsiq(mi, mp)+v(mq, mi)*val
307         CONTINUE
           ENDDO
          ENDIF
308       CONTINUE
         ENDDO
309      CONTINUE
        ENDDO
        mrsij1 = ((-num)+(-num**2)+mrs*num+mrs*num**2)/2
        DO mi = 1, num, 1
         DO mj = 1, mi, 1
310       CONTINUE
          xij(mj) = zero
         ENDDO
         DO mq = 1, num, 1
          val = xrsiq(mi, mq)
          IF (zero.NE.val) THEN
           DO mj = 1, mi, 1
311         CONTINUE
            xij(mj) = xij(mj)+v(mq, mj)*val
           ENDDO
          ENDIF
312       CONTINUE
         ENDDO
         DO mj = 1, mi, 1
313       CONTINUE
          xrsij(mj+(mi**2+(-mi)+(-num)+(-num**2)+mrs*num+mrs*num**2)/2)
     *= xij(mj)
         ENDDO
314      CONTINUE
        ENDDO
315     CONTINUE
       ENDIF
      ENDDO
!$OMP END DO NOWAIT
!$OMP END PARALLEL

- Induction Variable Substitution
- Reduction Translation
- "Yes" to the questions helps the compiler generate better code
- ISRP helped the compiler make better decisions for 5 codes
Conclusions
- Interprocedural analysis of symbolic ranges
  - Based on intraprocedural analysis
  - Iterative algorithm
- ISRP enhances other optimizations
- Compilation time increases by up to 150%
  - Exceptions: OCEAN and TRACK
Thank you.