1
Idiom Recognition in the Polaris Parallelizing Compiler
Bill Pottenger and Rudolf Eigenmann
Presented by Vincent Yau
2
Induction Variable Substitution
Most compilers are unable to transform some general forms of induction variables, such as those arising in triangular nested loops and those involving multiplicative expressions.
3
General Induction Variable Algorithm
Step 1: Recognize the induction variable pattern
    iv = iv + inc_expression
    where inc_expression may contain outer loop indices, other induction variables, or loop-invariant terms
Step 2: Compute 3 definitions (next slide), then compute closed forms
Step 3: Substitute the closed forms directly
4
Example:
    iv = 0
    do i = 1, n
      do j = 1, i
        a(iv) = …
        iv = iv + 1
      enddo
    enddo
5
Example:
    iv = 0
    do i = 1, n
      do j = 1, i
        a(iv) = …
        iv = iv + 1
      enddo
    enddo
Annotation on the slide:
    = j
6
Example:
    iv = 0
    do i = 1, n
      do j = 1, i
        a(iv) = …
        iv = iv + 1
      enddo
    enddo
Annotations on the slide:
    = j
    = i*(i - 1)/2
7
Example:
    iv = 0
    do i = 1, n
      do j = 1, i
        a(iv) = …
        iv = iv + 1
      enddo
    enddo
Annotations on the slide:
    = j
    = i*(i - 1)/2
After substitution:
    do i = 1, n
      do j = 1, i
        a(j + (i**2 - i)/2 - 1) = …
      enddo
    enddo
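The substitution in this example can be checked mechanically. The following Python sketch is not part of the slides, and its function names are made up for illustration. It runs the original loop nest, records the value of iv at each a(iv) reference, and compares the result with the closed form j + (i**2 - i)/2 - 1.

```python
def original_subscripts(n):
    """Run the original loop nest and record iv at each a(iv) reference."""
    subs = []
    iv = 0
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            subs.append(iv)   # a(iv) = ...
            iv += 1           # iv = iv + 1
    return subs


def substituted_subscripts(n):
    """Same loop nest with iv replaced by its closed form (no carried state)."""
    return [j + (i * i - i) // 2 - 1
            for i in range(1, n + 1)
            for j in range(1, i + 1)]


if __name__ == "__main__":
    for n in range(6):
        assert original_subscripts(n) == substituted_subscripts(n)
    print(substituted_subscripts(3))  # [0, 1, 2, 3, 4, 5]
```

Because the substituted subscript depends only on i and j, no value is carried from one iteration to the next, which is what allows the loop nest to be run in parallel.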
8
Symbolic sum function based on Bernoulli numbers
Bernoulli numbers are defined as a special case of the Bernoulli polynomials.
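The slide does not show the formulas themselves. As a rough illustration of how Bernoulli numbers yield closed-form sums, the Python sketch below uses the standard recurrence for the Bernoulli numbers and Faulhaber's formula for sums of powers. It is an assumption that the symbolic sum function in Polaris works along these lines; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(count):
    """Return B_0 .. B_{count-1} (convention B_1 = -1/2) via the standard recurrence."""
    b = []
    for m in range(count):
        if m == 0:
            b.append(Fraction(1))
        else:
            acc = sum(comb(m + 1, k) * b[k] for k in range(m))
            b.append(-acc / (m + 1))
    return b


def power_sum_coefficients(p):
    """Coefficients c[e] with sum_{k=1..n} k**p == sum_e c[e] * n**e (Faulhaber)."""
    b = bernoulli_numbers(p + 1)
    coeffs = [Fraction(0)] * (p + 2)
    for j in range(p + 1):
        bj = -b[j] if j == 1 else b[j]   # Faulhaber's formula uses B_1 = +1/2
        coeffs[p + 1 - j] = Fraction(comb(p + 1, j)) * bj / (p + 1)
    return coeffs


def power_sum(p, n):
    """Evaluate the closed form of 1**p + 2**p + ... + n**p."""
    return sum(c * n**e for e, c in enumerate(power_sum_coefficients(p)))


if __name__ == "__main__":
    # Sum of k for k = 1..i-1 is i*(i - 1)/2, the term seen in the earlier example.
    i = 5
    assert power_sum(1, i - 1) == i * (i - 1) // 2
    assert all(power_sum(3, n) == sum(k**3 for k in range(1, n + 1)) for n in range(10))
```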
9
Wrap-Around Variables
Definition: a variable that takes on the value of an induction variable after one iteration of a loop.
10
Example:
    m = 0
    do i = 1, n
      do j = 1, i
        lb = j
        ub = i
        do k = i, n
          do l = lb, ub
            m = m + 1
            a(m) = …
          enddo
          lb = 1
          ub = k + 1
        enddo
        m = m + i
      enddo
    enddo
11
Example:
    m = 0
    do i = 1, n
      do j = 1, i
        lb = j
        ub = i
        do k = i, n
          do l = lb, ub
            m = m + 1
            a(m) = …
          enddo
          lb = 1
          ub = k + 1
        enddo
        m = m + i
      enddo
    enddo
Step 1: Recognize the wrap-around variables.
Step 2: Remove the wrap-around variables (lb, ub) by peeling the first iteration of the k loop (next slide).
Step 3: Apply induction variable substitution.
Powerful symbolic manipulation is needed.
12
Example:
    m = 0
    do i = 1, n
      do j = 1, i
        lb = j
        ub = i
        do k = i, n
          do l = lb, ub
            m = m + 1
            a(m) = …
          enddo
          lb = 1
          ub = k + 1
        enddo
        m = m + i
      enddo
    enddo
After step 2 (first iteration of the k loop peeled):
    m = 0
    do i = 1, n
      do j = 1, i
        do l = j, i
          m = m + 1
          a(m) = …
        enddo
        do k = 1 + i, n
          do l = 1, k
            m = m + 1
            a(m) = …
          enddo
        enddo
        m = m + i
      enddo
    enddo
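To see that peeling the first k iteration preserves the sequence of a(m) references, both versions can be executed side by side. The Python sketch below is an illustration added here, with hypothetical function names; it transcribes the two loop nests and compares the subscripts they generate.

```python
def original(n):
    """Loop nest with wrap-around variables lb and ub; returns the subscripts of a(m)."""
    subs = []
    m = 0
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            lb, ub = j, i
            for k in range(i, n + 1):
                for l in range(lb, ub + 1):
                    m += 1
                    subs.append(m)
                lb, ub = 1, k + 1      # wrap-around: take effect in the next k iteration
            m += i
    return subs


def peeled(n):
    """Same computation with the first iteration of the k loop peeled off."""
    subs = []
    m = 0
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            for l in range(j, i + 1):          # peeled iteration k = i
                m += 1
                subs.append(m)
            for k in range(i + 1, n + 1):      # remaining iterations: lb = 1, ub = k
                for l in range(1, k + 1):
                    m += 1
                    subs.append(m)
            m += i
    return subs


if __name__ == "__main__":
    for n in range(7):
        assert original(n) == peeled(n)
    print(peeled(2))  # [1, 2, 3, 5, 6, 9]
```

After peeling, lb and ub no longer appear, so every loop bound is expressed directly in terms of loop indices and the remaining variable m is an ordinary induction variable.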
13
Example:
    m = 0
    do i = 1, n
      do j = 1, i
        do l = j, i
          m = m + 1
          a(m) = …
        enddo
        do k = 1 + i, n
          do l = 1, k
            m = m + 1
            a(m) = …
          enddo
        enddo
        m = m + i
      enddo
    enddo
After step 3 (induction variable substitution):
    m = 0
    do i = 1, n
      do j = 1, i
        do l = j, i
          m = l + (i + (-9*i**2 - 3*i**4 + 6*i + 6*i**3 - 6*i*n - 6*i*n**2 + 6*i**2*n**2)/4 - 3*n - 3*j**2 - n**2 + 2*i**3 + 3*j + 3*i**2 - 3*i*j - 3*j*i**2 + 3*j*n + 3*j*n**2)/6 - 2*i + 2*i*j
          a(m) = …
        enddo
        do k = 1 + i, n
          do l = 1, k
            m = l + ((-9*i**2 - 3*i**4 + 6*i + 6*i**3 - 6*i*n - 6*i*n**2 + 6*i**2*n**2)/4 - 3*k - 3*j**2 - 3*n**2 - 2*i + 3*j**3 + 3*j + 3*k**2 - 3*i*j - 3*j*i**2 + 3*j*n + 3*j*n**2)/6 - 2*i + 2*i*j
            a(m) = …
          enddo
        enddo
      enddo
    enddo
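A closed form of this kind can be validated by running the loop nest and comparing. The Python sketch below is an illustration added here, with hypothetical function names. Its closed form was derived independently for this sketch and is written in factored form, rather than being copied from the expanded polynomial on the slide. It checks that m at every a(m) reference in the peeled loop nest is a function of i, j, k, l and n alone.

```python
def simulate(n):
    """Run the peeled loop nest; return (loop, i, j, k, l, m) for each a(m) reference."""
    refs = []
    m = 0
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            for l in range(j, i + 1):
                m += 1
                refs.append(("first", i, j, None, l, m))
            for k in range(i + 1, n + 1):
                for l in range(1, k + 1):
                    m += 1
                    refs.append(("second", i, j, k, l, m))
            m += i
    return refs


def m_before(i, j, n):
    """Value of m on entry to iteration (i, j): all increments of earlier iterations."""
    # Increments from complete outer iterations 1 .. i-1 (sums of powers of i).
    outer = (-3 * i**2 * (i - 1)**2
             + 4 * (i - 1) * i * (2 * i - 1)
             + 6 * (1 + n + n**2) * i * (i - 1)) // 24
    # Increments from iterations 1 .. j-1 of the j loop within outer iteration i.
    inner = (j - 1) * (2 * (2 * i + 1) - j + n**2 + n - i**2 - i) // 2
    return outer + inner


def m_first(i, j, l, n):
    """Closed form of m in the first (peeled) l loop."""
    return m_before(i, j, n) + (l - j + 1)


def m_second(i, j, k, l, n):
    """Closed form of m in the l loop nested inside the k loop."""
    return m_before(i, j, n) + (i - j + 1) + (k * (k - 1) - i * (i + 1)) // 2 + l


if __name__ == "__main__":
    for n in range(7):
        for loop, i, j, k, l, m in simulate(n):
            expected = m_first(i, j, l, n) if loop == "first" else m_second(i, j, k, l, n)
            assert expected == m
```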
14
Reduction Recognition
Step 1: Recognition Pass
    A(x1, x2, x3, …) = A(x1, x2, x3, …) + B
    Sets the reduction variable flag for A( ).
Step 2: Data Dependence Pass
    Analyzes candidate reduction variables.
    Removes the reduction flag if the candidate can be proven independent.
Step 3: Transformation Pass
    3 different types of parallel reduction transformation: blocked, privatized, expanded
15
Transformation Pass
Inserts synchronization primitives around each reduction statement.
Synchronization overhead was high.
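The slides name the blocked, privatized and expanded transformations without showing code. The Python sketch below is an interpretation of the general idea behind the privatized and expanded forms, not the code Polaris generates. The "processors" are simulated by sequential passes, and all function names are hypothetical.

```python
def serial_reduction(size, idx, b):
    """Reference loop: a(idx(it)) = a(idx(it)) + b(it)."""
    a = [0] * size
    for s, x in zip(idx, b):
        a[s] += x
    return a


def privatized_reduction(size, idx, b, n_procs):
    """Each simulated processor accumulates into a private copy, then merges it."""
    a = [0] * size
    for p in range(n_procs):                      # one pass stands in for one processor
        private = [0] * size                      # processor-private copy of a
        for it in range(p, len(idx), n_procs):    # iterations assigned to processor p
            private[idx[it]] += b[it]
        for s in range(size):                     # merge; would need synchronization
            a[s] += private[s]
    return a


def expanded_reduction(size, idx, b, n_procs):
    """The reduction array gets an extra processor dimension, summed after the loop."""
    expanded = [[0] * size for _ in range(n_procs)]
    for p in range(n_procs):
        for it in range(p, len(idx), n_procs):
            expanded[p][idx[it]] += b[it]         # no sharing between processors
    return [sum(expanded[p][s] for p in range(n_procs)) for s in range(size)]


if __name__ == "__main__":
    idx = [0, 2, 1, 2, 0, 2]
    b = [1, 2, 3, 4, 5, 6]
    expected = serial_reduction(3, idx, b)
    assert privatized_reduction(3, idx, b, n_procs=4) == expected
    assert expanded_reduction(3, idx, b, n_procs=4) == expected
    print(expected)  # [6, 3, 12]
```

In both variants the per-statement synchronization is replaced by a single combining step per processor, which is one way the overhead noted above could be reduced.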
16
Performance Results
Overall program speedups (running on an 8-processor set of an SGI Challenge R4400)
17
Performance Results