1 Lecture 5 PRAM Algorithm: Parallel Prefix Parallel Computing Fall 2008.

Slides:



Advertisements
Similar presentations
YOSI! BLUE (30 sec play) Ball moves to X1 from X2
Advertisements

An Introduction to Calculus. Calculus Study of how things change Allows us to model real-life situations very accurately.
robot con 6 gradi di mobilità
The Derivative in Graphing and Application
Chapter 4 Section 3 Matrix Multiplication. Scalar Multiplication Scalar Product - multiplying each element in a matrix by a scalar (real number) –Symbol:
Constraint Programming Peter van Beek University of Waterloo.
Maximal Independent Subsets of Linear Spaces. Whats a linear space? Given a set of points V a set of lines where a line is a k-set of points each pair.
Rule Learning – Overview Goal: learn transfer rules for a language pair where one language is resource-rich, the other is resource-poor Learning proceeds.
Quid-Pro-Quo-tocols Strengthening Semi-Honest Protocols with Dual Execution Yan Huang 1, Jonathan Katz 2, David Evans 1 1. University of Virginia 2. University.
COMPANY CONFIDENTIAL PM43C Doors Media Door Used to protect media and print mechanism Front access Door Used for front access to tear bar NOT INCLUDED.
PROLOG MORE PROBLEMS.
General algorithmic techniques: Balanced binary tree technique Doubling technique: List Ranking Problem Divide and concur Lecture 6.
Bank Accounts Management System - p. 448
1 Machine Learning: Lecture 1 Overview of Machine Learning (Based on Chapter 1 of Mitchell T.., Machine Learning, 1997)
1 Parallel Algorithms (chap. 30, 1 st edition) Parallel: perform more than one operation at a time. PRAM model: Parallel Random Access Model. p0p0 p1p1.
BASED ON THE TUTORIAL ON THE BOOK CD Verilog: Gate Level Design 6/11/
Inverting a Singly Linked List Instructor : Prof. Jyh-Shing Roger Jang Designer : Shao-Huan Wang The ideas are reference to the textbook “Fundamentals.
Rahnuma Islam Nishat Debajyoti Mondal Md. Saidur Rahman Graph Drawing and Information Visualization Laboratory Department of Computer Science and Engineering.
LA TABLA DEL 5 5X1= 5X2= 5X3= 5X4= 5X5= 5X6= 5X7= 5X8= 5X9= 5X10=
Daniel Cederman and Philippas Tsigas Distributed Computing and Systems Chalmers University of Technology.
2 x0 0 12/13/2014 Know Your Facts!. 2 x1 2 12/13/2014 Know Your Facts!
2 x /18/2014 Know Your Facts!. 11 x /18/2014 Know Your Facts!
2 x /10/2015 Know Your Facts!. 8 x /10/2015 Know Your Facts!
2.4 – Factoring Polynomials Tricky Trinomials The Tabletop Method.
Strategies – Multiplication and Division
Properties of Multiplication Zero property of multiplication Identity property of multiplication Commutative property of multiplication Associative property.
16.410: Eric Feron / MIT, Spring 2001 Introduction to the Simplex Method to solve Linear Programs Eric Feron Spring 2001.
5 x4. 10 x2 9 x3 10 x9 10 x4 10 x8 9 x2 9 x4.
MS 101: Algorithms Instructor Neelima Gupta
Parallel algorithms for expression evaluation Part1. Simultaneous substitution method (SimSub) Part2. A parallel pebble game.
DIRECTIONAL ARC-CONSISTENCY ANIMATION Fernando Miranda 5986/M
Linear Programming – Simplex Method: Computational Problems Breaking Ties in Selection of Non-Basic Variable – if tie for non-basic variable with largest.
CS 478 – Tools for Machine Learning and Data Mining Clustering: Distance-based Approaches.
The Problem of K Maximum Sums and its VLSI Implementation Sung Eun Bae, Tadao Takaoka University of Canterbury Christchurch New Zealand.
Multiplication Facts Practice
Parameter identifiability, constraints, and equifinality in data assimilation with ecosystem models Dr. Yiqi Luo Botany and microbiology department University.
A Case Study  Some Advice on How to Go about Solving Problems in Prolog  A Program to Play Connect-4 Game.
Variational Inference Amr Ahmed Nov. 6 th Outline Approximate Inference Variational inference formulation – Mean Field Examples – Structured VI.
Chapter 8: The Solver and Mathematical Programming Spreadsheet-Based Decision Support Systems Prof. Name Position (123) University.
Probabilistic sequence modeling II: Markov chains Haixu Tang School of Informatics.
Computational Facility Layout
Shannon Expansion Given Boolean expression F = w 2 ’ + w 1 ’w 3 ’ + w 1 w 3 Shannon Expansion of F on a variable, say w 2, is to write F as two parts:
Graeme Henchel Multiples Graeme Henchel
CSE202: Lecture 3The Ohio State University1 Assignment.
1.Print and cut round outside of cootie catcher 2.Fold in half and in half again 3.Open out, turn over so.
The Project Problem formulation (one page) Literature review –“Related work" section of final paper, –Go to writing center, –Present paper(s) to class.
0 x x2 0 0 x1 0 0 x3 0 1 x7 7 2 x0 0 9 x0 0.
C2 Part 4: VLSI CAD Tools Problems and Algorithms Marcelo Johann EAMTA 2006.
MULTIPLICATION OF INTEGERS
Simplex (quick recap). Replace all the inequality constraints by equalities, using slack variables.
Adding & Subtracting Mixed Numbers. Objective: To develop fluency in +, –, x, and ÷ of non-negative rational numbers. Essential Question: How.
The Mechanical Cryptographer (Tolerant Algebraic Side-Channel Attacks using pseudo-Boolean Solvers) 1.
BINARY/MIXED-INTEGER PROGRAMMING ( A SPECIAL TYPE OF INTEGER PROGRAMMING)
Count to 20. Count reliably at least 10 objects. Use ‘more’ and ‘less’ to compare two numbers. Count reliably at least 10 objects. Estimate number of objects.
7x7=.
Lecture 3: Parallel Algorithm Design
1 Parallel Parentheses Matching Plus Some Applications.
Partitioning and Divide-and-Conquer Strategies ITCS 4/5145 Parallel Computing, UNC-Charlotte, B. Wilkinson, Jan 23, 2013.
Advanced Topics in Algorithms and Data Structures Lecture 7.1, page 1 An overview of lecture 7 An optimal parallel algorithm for the 2D convex hull problem,
Advanced Topics in Algorithms and Data Structures Lecture 6.1 – pg 1 An overview of lecture 6 A parallel search algorithm A parallel merging algorithm.
Parallel prefix sum computation Lecture 7. 2 Prefix sum.
Advanced Topics in Algorithms and Data Structures 1 Lecture 4 : Accelerated Cascading and Parallel List Ranking We will first discuss a technique called.
Parallel Merging Advanced Algorithms & Data Structures Lecture Theme 15 Prof. Dr. Th. Ottmann Summer Semester 2006.
Data Structures and Algorithms in Parallel Computing Lecture 8.
Concurrency and Performance Based on slides by Henri Casanova.
3/12/2013Computer Engg, IIT(BHU)1 PRAM ALGORITHMS-1.
Lecture 3: Parallel Algorithm Design
Solve more difficult number problems mentally
PRAM Algorithms.
Lecture 5 PRAM Algorithms (cont.)
Presentation transcript:

1 Lecture 5 PRAM Algorithm: Parallel Prefix Parallel Computing Fall 2008

2 Parallel Operations with Multiple Outputs – Parallel Prefix Problem definition: Given a set of n values x0, x1,..., xn−1 and an associative operator, say +, the parallel prefix problem is to compute the following n results/“sums”. 0: x0, 1: x0 + x1, 2: x0 + x1 + x2,... n − 1: x0 + x xn−1. Parallel prefix is also called prefix sums or scan. It has many uses in parallel computing such as in load-balancing the work assigned to processors and compacting data structures such as arrays. We shall prove that computing ALL THE SUMS is no more difficult that computing the single sum x0 +...xn−1.

3 Parallel Prefix Algorithm1: divide- and-conquer x0 x1 x2 x3 x4 x5 x6 x7 <<Paralel Prefix "Box" for 8 inputs | | | | | 1 | | 2 | <<< 2 PP Boxes for 4 inputs each | | | | | | | | | | | | Take rightmost output of Box 1 and | | | | | | | | combine it with the outputs of Box2 | | | | x0+...+x3 x0+..+x7 x0+...+x2 x0+...+x6 x0+x1 x0+...+x5 x0 x0+...+x4

4 Parallel Prefix Algorithm 2: An algorithm for parallel prefix on an EREW PRAM would require lg n phases. In phase i, processor j reads the contents of cells j and j − 2 i (if it exists) combines them and stores the result in cell j. The EREW PRAM algorithm that solves the parallel prefix problem has performance P = O(n), T = O(lg n), and W = O(n lg n), W2 = O(n).

5 Parallel Prefix Algorithm 2: Example For visualization purposes, the second step is written in two different lines. When we write x x5 we mean x1 + x2 + x3 + x4 + x5. x1 x2 x3 x4 x5 x6 x7 x8 1. x1+x2 x2+x3 x3+x4 x4+x5 x5+x6 x6+x7 x7+x8 2. x1+(x2+x3) (x2+x3)+(x4+x5) (x4+x5)+(x6+x7) 2. (x1+x2)+(x3+x4) (x3+x4)+(x5+x6) (x5+x6+x7+x8) 3. x1+...+x5 x1+...+x7 3. x1+...+x6 x1+...+x8 Finally F. x1 x1+x2 x1+...+x3 x1+...+x4 x1+...+x5 x1+...+x6 x1+...+x7 x1+...+x8

6 Parallel Prefix Algorithm 2: Example 2 For visualization purposes, the second step is written in two different lines. When we write [1 : 5] we mean x1 +x2 + x3 + x4 + x5. We write below [1:2] to denote x1+x2 [i:j] to denote xi x5 [i:i] is xi NOT xi+xi! [1:2][3:4]=[1:2]+[3:4]= (x1+x2) + (x3+x4) = x1+x2+x3+x4 A * indicates value above remains the same in subsequent steps 0 x1 x2 x3 x4 x5 x6 x7 x8 0 [1:1] [2:2] [3:3] [4:4] [5:5] [6:6] [7:7] [8:8] 1 * [1:1][2:2] [2:2][3:3] [3:3][4:4] [4:4][5:5] [5:5][6:6] [6:6][7:7] [7:7][8:8] 1. * [1:2] [2:3] [3:4] [4:5] [5:6] [6:7] [7:8] 2. * * [1:1][2:3] [1:2][3:4] [2:3][4:5] [3:4][5:6] [4:5][6:7] [5:6][7:8] 2. * * [1:3] [1:4] [2:5] [3:6] [4:7] [5:8] 3. * * * * [1:1][2:5] [1:2][3:6] [1:3][4:7] [1:4][5:8] 3. * * * * [1:5] [1:6] [1:7] [1:8] [1:1] [1:2] [1:3] [1:4] [1:5] [1:6] [1:7] [1:8] x1 x1+x2 x1+x2+x3 x1+...+x4 x1+...+x5 x1+...+x6 x1+...+x7 x1+...+x8

7 Parallel Prefix Algorithm 2: // We write below[1:2] to denote X[1]+X[2] // [i:j] to denote X[i]+X[i+1]+...+X[j] // [i:i] is X[i] NOT X[i]+X[i] // [1:2][3:4]=[1:2]+[3:4]= (X[1]+X[2])+(X[3]+X[4])=X[1]+X[2]+X[3]+X[4] // Input : M[j]= X[j]=[j:j] for j=1,...,n. // Output: M[j]= X[1]+...+X[j] = [1:j] for j=1,...,n. ParallelPrefix(n) 1. i=1; // At this step M[j]= [j:j]=[j+1-2**(i-1):j] 2. while (i < n ) { 3. j=pid(); 4. if (j-2**(i-1) >0 ) { 5. a=M[j]; // Before this stepM[j] = [j+1-2**(i-1):j] 6.b=M[j-2**(i-1)]; // Before this stepM[j-2**(i-1)]= [j-2**(i-1)+1-2**(i-1):j-2**(i-1)] 7. M[j]=a+b; // After this step M[j]= M[j]+M[j-2**(i-1)]=[j-2**(i-1)+1-2**(i-1):j-2**(i-1)] // [j+1-2**(i-1):j] = [j-2**(i-1)+1-2**(i-1):j]=[j+1-2**i:j] 8. } 9. i=i*2; } At step 5, memory location j − 2 i−1 is read provided that j − 2 i−1 ≥ 1. This is true for all times i ≤ t j = lg(j − 1) + 1. For i > t j the test of line 4 fails and lines 5-8 are not executed.

8 Parallel Prefix Algorithm based on Complete Binary Tree Consider the following variation of parallel prefix on n inputs that works on a complete binary tree with n leaves (assume n is a power of two). Action by nodes 1. Non-leaf : If it receives l and r from left and right children, computes l + r and sends it up and send down to its right child the l. 2. Root : Step [1] except nothing is sent up. 3. Non-leaf : If it gets p from parent it transmits it to its left/right children. 4. Leaf : If it holds l and receives p from its parent it sets l = p + l (this order) [note p is the left argument, l is the right one, order matters]

9 Parallel Prefix Algorithm based on Complete Binary Tree: Example x1 X1+x2+x3+x4+X5+x6+x7+x8 X1+x2+x3+x4X5+x6+x7+x8 X1+x2 x3+x4X5+x6 x7+x8 x2x3x4X5x6 x7 x8 \x1+x2+x3+x4 \x1+x2 \x5+x6 \x1+x2+x3+x4 \x5+x6 \x1+x2+x3+x4 \x1 \x3 \x1+x2 \x7 \x5+x6 \x1+x2+x3+x4 \x5 \x1+x2+x3+x4 after recving: x1+x2 x3+x4 x5+x6 x7+x8 x1+.+x3 x1+..+x4 x1+.+x5 x1+.+x6 x5+.+x7 x5+.+x8 x1+.+x3 x1+..+x4 x1+.+x5 x1+.+x6 x1+.+x7 x1+.+x8

10 Parallel Prefix Algorithm based on Complete Binary Tree: A recursive version The parallel prefix algorithm of the previous page (tree-based) requires about 2 lg n+1 parallel steps, P = n processors and work W = Θ(n lg n), and W2 = Θ(n). One could describe that version due to Ladner and Fischer as follows. By rescheduling the computation and using P = n/ lg n processors, the work can be reduced to linear. begin PPF recursive (In[0..n − 1],Out[0..n − 1],p = 0..n − 1) 1. Out[0] = In[0]; 2. if n > 1 then 3. ∀ i = 0,..., n − 1 dopar 4. X[i] = In[2i] + In[2i+1]; 5. enddo 6. Y=PPF recursive(X[0..n/2 − 1],Y [0..n/2 − 1],p = 0..n/2 − 1); 7. ∀ i = 0,..., n/2 − 1 dopar 8. Out[2i+1]=Y[i]; 9. enddo 10. ∀ i = 1,..., n/2 − 1 dopar 11. Out[2i]=Y[i-1]+A[2i]; 12. enddo 13. endif end PPF recursive

11 Parallel Prefix Algorithm based on Complete Binary Tree: An iterative version An iterative version of that algorithm is depicted below. begin PPF iterative (In[0..n − 1],Out[0..n − 1],p = 0..n − 1) 1. for i = 0,..., n −d1opar 2. T[0,i] = In[i]; 3. enddo 4. for j = 1,..., lg ndo 5. for i = 0,..., n/2 j − 1 dopar 6. T[j,i] = T[j-1,2i] + T[j-1,2i+1]; 7.enddo 8. for j = lgn,..., 0do 9. for i = 0dopar 10. V[j,0] = T[j,0]; //Processor 0 executes only 11. for odd(i), 0 ≤ i ≤ n/2 j − 1dopar 12. V[j,i] = V[j+1,i/2]; //Processor odd(i) executes only 11.for even(i), 2 ≤ i ≤ n/2 j −d1opar 12. V[j,i] = V[j+1,(i-1)/2]+T[j,i]; //Processor even(i) executes only 13. enddo 14. Out[i]=V[0,i]; end PPF iterative

12 End Thank you!