The Impact of Data Dependence Analysis on Compilation and Program Parallelization
  Original research by Kleanthis Psarris & Konstantinos Kyriakopoulos
  Year of publication: 2003
  Presentation by Jamie Perkins
Data Dependence Analysis
  Key to optimization and to detecting implicit parallelism in sequential code.
  Helps the compiler improve memory use, improve load balancing, and determine efficient scheduling.
  Different data dependence tests offer different trade-offs.
    – Accuracy vs. efficiency
About this research…
  Sun UltraSPARC-IIi with a 440 MHz CPU and 512 MB of main memory.
  2 different applications tested
    – Perfect Club Benchmarks
    – Lapack
  4 different tests applied
    – Greatest Common Divisor Test (GCD)
    – Banerjee Test
    – I-Test
    – Omega Test
Polaris Compiler
  Developed at the University of Illinois at Urbana-Champaign & Purdue University.
  Parallelizes Fortran 77 programs for execution on shared-memory multiprocessors.
Applications
  Perfect Club Benchmark (PCB)
    – Collection of 13 scientific & engineering Fortran 77 programs.
  Lapack (LP)
    – A Fortran 77 library of subroutines for solving linear algebra problems.
Tests Applied
  Greatest Common Divisor Test (GCD)
    – Based on a theorem of elementary number theory.
  Banerjee Test
    – Based on the Intermediate Value Theorem.
  These two tests are applied together.
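To make the two tests concrete, here is a minimal sketch for a single linear dependence equation a1*x1 + ... + an*xn = c with loop bounds on each variable. The function names (`gcd_test`, `banerjee_test`, `gcd_banerjee`) and the input layout are assumptions made for illustration; this is not the Polaris implementation, and it ignores direction vectors.

```python
from functools import reduce
from math import gcd


def gcd_test(coeffs, c):
    """Return True if the GCD test proves independence.

    An integer solution of sum(a_i * x_i) = c exists only if
    gcd(a_1, ..., a_n) divides c (loop bounds are ignored here).
    """
    g = reduce(gcd, (abs(a) for a in coeffs), 0)
    if g == 0:
        return c != 0  # all coefficients are zero: solvable only if c == 0
    return c % g != 0


def banerjee_test(coeffs, bounds, c):
    """Return True if the bounds check proves independence.

    Computes the minimum and maximum of sum(a_i * x_i) over the box
    given by bounds = [(low, high), ...]. If c lies outside that range,
    no solution exists, not even a real-valued one.
    """
    lo = sum(a * (l if a > 0 else u) for a, (l, u) in zip(coeffs, bounds))
    hi = sum(a * (u if a > 0 else l) for a, (l, u) in zip(coeffs, bounds))
    return not (lo <= c <= hi)


def gcd_banerjee(coeffs, bounds, c):
    """Apply both tests together: 'independent' or 'maybe'."""
    if gcd_test(coeffs, c) or banerjee_test(coeffs, bounds, c):
        return "independent"
    return "maybe"


if __name__ == "__main__":
    loop = [(1, 10), (1, 10)]  # 1 <= i1, i2 <= 10
    # A(2*i) written, A(2*i + 1) read: 2*i1 - 2*i2 = 1
    print(gcd_banerjee([2, -2], loop, 1))   # independent (GCD test)
    # A(i) written, A(i + 20) read: i1 - i2 = 20
    print(gcd_banerjee([1, -1], loop, 20))  # independent (bounds check)
    # A(i) written, A(i + 1) read: i1 - i2 = 1
    print(gcd_banerjee([1, -1], loop, 1))   # maybe
```

Neither test can prove dependence in this sketch. When both fail to prove independence the answer stays "maybe", which is the category a more accurate test tries to shrink.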
Tests Applied (cont.)
  I-Test
    – Based on & enhances the Banerjee test and the GCD test.
    – Adds “accuracy conditions” to the previous tests.
  Omega Test
    – Based on a combination of the Least Remainder Algorithm and Fourier-Motzkin Variable Elimination.
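As a rough illustration of the Fourier-Motzkin step only, the sketch below eliminates variables one at a time from a system of inequalities of the form sum(a_i * x_i) <= b and checks whether the system has a real-valued solution. The names `eliminate` and `feasible_over_reals` and the (coefficients, bound) tuple layout are assumptions for this example. It is not the Omega test itself, which adds integer-specific handling that is not shown here.

```python
from fractions import Fraction


def eliminate(ineqs, k):
    """Eliminate variable k from inequalities given as (coeffs, b),
    each meaning sum(coeffs[i] * x[i]) <= b."""
    lower, upper, rest = [], [], []
    for a, b in ineqs:
        if a[k] > 0:
            upper.append((a, b))  # yields an upper bound on x[k]
        elif a[k] < 0:
            lower.append((a, b))  # yields a lower bound on x[k]
        else:
            rest.append((a, b))   # does not involve x[k]
    for al, bl in lower:
        for au, bu in upper:
            # Scale so the x[k] coefficients become -1 and +1, then add
            # the two inequalities so that x[k] cancels.
            sl = Fraction(1) / -al[k]
            su = Fraction(1) / au[k]
            a = [sl * x + su * y for x, y in zip(al, au)]
            rest.append((a, sl * bl + su * bu))
    return rest


def feasible_over_reals(ineqs, nvars):
    """Return True if the system has a real-valued solution."""
    for k in range(nvars):
        ineqs = eliminate(ineqs, k)
    # Every remaining inequality now reads 0 <= b.
    return all(b >= 0 for _, b in ineqs)


def dependence_system(a1, a2, c, low, high):
    """Hypothetical helper: a1*i1 + a2*i2 = c with low <= i1, i2 <= high."""
    return [
        ([a1, a2], c), ([-a1, -a2], -c),  # the equality, as two inequalities
        ([1, 0], high), ([-1, 0], -low),  # bounds on i1
        ([0, 1], high), ([0, -1], -low),  # bounds on i2
    ]


if __name__ == "__main__":
    # i1 - i2 = 20 with 1 <= i1, i2 <= 10: no solution at all
    print(feasible_over_reals(dependence_system(1, -1, 20, 1, 10), 2))  # False
    # i1 - i2 = 1: solutions exist
    print(feasible_over_reals(dependence_system(1, -1, 1, 1, 10), 2))   # True
```

A real-valued solution does not guarantee an integer one: the system for 2*i1 - 2*i2 = 1 passes this check even though it has no integer solution. Handling that gap is where an exact integer test needs more machinery than plain elimination, which is one reason for the higher cost.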
Data Dependence Problems for PCB
  [Figure: one panel each for the Banerjee Test, I-Test, and Omega Test. Key: independent, dependent, maybe. 100% is equal to 59,936.]
Data Dependence Problems for LP
  [Figure: one panel each for the Banerjee Test, I-Test, and Omega Test. Key: independent, dependent, maybe. 100% is equal to 293,718.]
Avg. Cost per Data Dependence in PCB
  [Figure: axis shows time (msec).]
Avg. Cost per Data Dependence in LP
  [Figure: axis shows time (msec).]
Total Compilation Time
  [Figure: two panels, Perfect Club Benchmark and Lapack Library. Axis shows time in minutes.]
Parallelizable Loops
  [Figure: two panels, Perfect Club Benchmark and Lapack Library. Axis shows number of loops.]
Execution Time
  Perfect Club Benchmark
    – Only 4 out of the 11 programs could be effectively parallelized.
  Lapack Library
    – Much better results: the execution time of 7 of the programs was cut in half.
Perfect Club Benchmark
  [Table: columns Prog., Test, Serial Time, 2-p, 4-p, 6-p, 8-p. Rows for OCEAN and BDNA, each under the Banerjee, I-Test, and Omega tests.]
Lapack Library
  [Table: columns Prog., Test, Serial Time, 2-p, 4-p, 6-p, 8-p. Program labels GEP, EIN, RECT, LIN; tests Banerjee, I-Test, and Omega.]
Conclusions
  – Data dependence accuracy
      Depending on the program, differences in accuracy may not be substantial (PCB vs. LP).
  – Efficiency
      Often a trade-off (efficiency vs. accuracy); Omega proved more accurate, at a high cost.
  – Effectiveness
      All 3 tests found a similar number of parallelizable loops.
  – Execution performance
      Again, all 3 tests produced similar execution results.
Thank You
  Any Questions?