
The Impact of Data Dependence Analysis on Compilation and Program Parallelization
Original research by Kleanthis Psarris & Konstantinos Kyriakopoulos (2003)
Presentation by Jamie Perkins

Data Dependence Analysis
Key to optimization and to detecting implicit parallelism in sequential code. Helps the compiler improve memory usage, improve load balancing, and determine efficient scheduling. Different data dependence tests provide different trade-offs:
–Accuracy vs. Efficiency
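As a minimal illustration (not from the original slides), the contrast below shows why dependence analysis matters: the first loop carries a flow dependence across iterations and must run serially, while the second loop's iterations are independent and could be run in parallel.

# Loop-carried flow dependence: iteration i writes a[i + 1],
# which iteration i + 1 then reads, so iterations must run in order.
a = list(range(10))
for i in range(9):
    a[i + 1] = a[i] + 1

# No cross-iteration dependence: each iteration touches only its
# own element, so a compiler may execute the iterations in parallel.
b = [0] * 10
c = list(range(10))
for i in range(10):
    b[i] = c[i] * 2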

About this research…
Platform: a Sun UltraSPARC-IIi with a 440 MHz CPU and 512 MB of main memory.
Two application suites tested:
–Perfect Club Benchmarks
–Lapack
Four dependence tests applied:
–Greatest Common Divisor Test (GCD)
–Banerjee Test
–I-Test
–Omega Test

Polaris Compiler
Developed at the University of Illinois at Urbana-Champaign and Purdue University. Parallelizes Fortran 77 programs for execution on shared-memory multiprocessors.

Applications
Perfect Club Benchmark (PCB)
–A collection of 13 scientific and engineering Fortran 77 programs.
Lapack (LP)
–A library of Fortran 77 subroutines for solving linear algebra problems.

Tests Applied
Greatest Common Divisor Test (GCD)
–Based on a theorem of elementary number theory: the linear dependence equation has an integer solution only if the GCD of its coefficients divides the constant term.
Banerjee Test
–Based on the Intermediate Value Theorem: computes the extreme values of the dependence expression over the loop bounds.
These two tests are applied together; a sketch of both follows.
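Below is a minimal sketch of both tests, assuming the dependence equation is already in the linear form a1*i1 + … + an*in = c with rectangular loop bounds; it illustrates the underlying theorems, not the authors' implementation, and the function names and representation are my own.

from math import gcd
from functools import reduce

def gcd_test(coeffs, c):
    """GCD test: a1*i1 + ... + an*in = c has an integer solution
    iff gcd(a1, ..., an) divides c. Returns False only when
    independence is proven; True means a dependence may exist."""
    g = reduce(gcd, (abs(a) for a in coeffs), 0)
    if g == 0:                    # all coefficients zero
        return c == 0
    return c % g == 0

def banerjee_test(coeffs, c, bounds):
    """Banerjee test: bound a1*i1 + ... + an*in over the rectangular
    region bounds = [(lo1, hi1), ...]. By the Intermediate Value
    Theorem a real solution exists iff c lies between the
    expression's minimum and maximum."""
    lo = sum(a * (l if a > 0 else h) for a, (l, h) in zip(coeffs, bounds))
    hi = sum(a * (h if a > 0 else l) for a, (l, h) in zip(coeffs, bounds))
    return lo <= c <= hi

# Example: A(2*i) written, A(2*i + 1) read, 1 <= i <= 100, gives
# 2*i1 - 2*i2 = 1; gcd(2, 2) = 2 does not divide 1 => independent.
print(gcd_test([2, -2], 1))                        # False: independent
print(banerjee_test([2, -2], 1, [(1, 100)] * 2))   # True: inconclusive

The example shows why the two tests complement each other: the Banerjee test alone is inconclusive here, while the GCD test proves independence.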

Tests Applied (cont.)
I-Test
–Builds on and enhances the Banerjee test and the GCD test.
–Adds "accuracy conditions" to the previous tests.
Omega Test
–Based on a combination of the Least Remainder Algorithm and Fourier-Motzkin Variable Elimination (see the sketch below).
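The Omega test's exact integer reasoning is beyond a slide-sized example, but the Fourier-Motzkin step it builds on fits in a few lines. The sketch below is an illustration under my own representation, not the Omega library: it projects one variable out of a system of linear inequalities, and repeating it until no variables remain decides satisfiability over the rationals.

def fourier_motzkin(constraints, var):
    """Eliminate `var` from a system of inequalities. Each constraint
    is (coeffs, bound), a dict and a number, meaning
        sum(coeffs[v] * v for v in coeffs) <= bound.
    The projected system is satisfiable iff the original is, over the
    rationals; the Omega test adds exact integer reasoning on top."""
    lowers, uppers, rest = [], [], []
    for coeffs, bound in constraints:
        a = coeffs.get(var, 0)
        if a > 0:
            uppers.append((coeffs, bound, a))   # gives var <= ...
        elif a < 0:
            lowers.append((coeffs, bound, a))   # gives var >= ...
        else:
            rest.append((coeffs, bound))
    # Pair every lower bound with every upper bound, scaling by the
    # (positive) factors ua and -la so the `var` terms cancel.
    for lc, lb, la in lowers:
        for uc, ub, ua in uppers:
            combined = {}
            for v in set(lc) | set(uc):
                s = ua * lc.get(v, 0) - la * uc.get(v, 0)
                if v != var and s != 0:
                    combined[v] = s
            rest.append((combined, ua * lb - la * ub))
    return rest

# Example: 1 <= x, x <= 5, x <= y; eliminate x.
system = [({'x': -1}, -1), ({'x': 1}, 5), ({'x': 1, 'y': -1}, 0)]
print(fourier_motzkin(system, 'x'))
# [({}, 4), ({'y': -1}, -1)]: y >= 1, plus a trivially true 0 <= 4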

Data Dependence Problems for PCB
[Chart: fraction of dependence problems classified as Independent, Dependent, or Maybe by the Banerjee Test, I-Test, and Omega Test; 100% = 59,936 problems.]

Data Dependence Problems for LP
[Chart: fraction of dependence problems classified as Independent, Dependent, or Maybe by the Banerjee Test, I-Test, and Omega Test; 100% = 293,718 problems.]

Avg. Cost per Data Dependence in PCB
[Chart: average time per dependence problem, in msec, for each test.]

Avg. Cost per Data Dependence in LP
[Chart: average time per dependence problem, in msec, for each test.]

Total Compilation Time
[Charts: total compilation time, in minutes, for the Perfect Club Benchmark and the Lapack Library under each test.]

Parallelizable Loops
[Charts: number of parallelizable loops found in the Perfect Club Benchmark and the Lapack Library under each test.]

Execution Time
Perfect Club Benchmark
–Only 4 out of the 11 programs could be effectively parallelized.
Lapack Library
–Much better results: the execution times of 7 of the programs were cut in half.

Perfect Club Benchmark
[Table: serial execution time and times on 2, 4, 6, and 8 processors (2-p through 8-p) for OCEAN and BDNA, each parallelized using the Banerjee, I-Test, and Omega tests.]

Lapack Library
[Table: serial execution time and times on 2, 4, 6, and 8 processors (2-p through 8-p) for GEP, EIN, RECT, and LIN, each parallelized using the Banerjee, I-Test, and Omega tests.]

Conclusions
–Data dependence accuracy: depending on the programs, the differences may not be substantial (PCB vs. LP).
–Efficiency: there is a trade-off between efficiency and accuracy; the Omega test proved more accurate, but at a high cost.
–Effectiveness: all three tests found a similar number of parallelizable loops.
–Execution performance: again, all three tests produced similar execution times.

Thank You Any Questions?