1 PRAM Algorithms Sums Prefix Sums by Doubling List Ranking.

Presentation transcript:

1 PRAM Algorithms: Sums, Prefix Sums by Doubling, List Ranking

2 Definition: Prefix Sums
• Given a set of n values a1, a2, …, an and an associative operation ⊕, the Prefix Sums problem is to compute the n quantities a1, a1 ⊕ a2, …, a1 ⊕ a2 ⊕ … ⊕ an.
• Example (⊕ = +): {2, 7, 9, 4} → {2, 9, 18, 22}
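As a sequential reference point for the definition, here is a minimal Python sketch. The function name `prefix_sums` is made up for illustration and is not from the slides. It relies on the standard library's `itertools.accumulate` and accepts any associative two-argument operation.

```python
from itertools import accumulate
import operator

def prefix_sums(values, op=operator.add):
    """Return [a1, a1 op a2, ..., a1 op ... op an] for an associative op."""
    return list(accumulate(values, op))

print(prefix_sums([2, 7, 9, 4]))       # [2, 9, 18, 22]
print(prefix_sums([2, 7, 9, 4], max))  # [2, 7, 9, 9]
```

The second call shows that the operation need not be addition; `max` is associative as well.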

3 Doubling
• A processing technique in which accesses or actions are governed by increasing powers of 2
• That is, processing proceeds by 1, 2, 4, 8, 16, etc., doubling on each iteration

4 Prefix Sum by Doubling
• Overview
  1. Each a(i) is added to a(i+1)
  2. Each a(i) is added to a(i+2)
  3. Each a(i) is added to a(i+4)
  4. Each a(i) is added to a(i+8)
  etc.
• At any time, if an index exceeds n, the operation is suppressed

5 Prefix Sums by Doubling Example
[Figure: worked numeric example, one row per doubling step; most cell values were lost in extraction. * = operation suppressed]
T1 = O(n), Tp = O(log n)

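6 Prefix Sums by Doubling Example (n = 8; each cell lists the input indices summed into it so far)
Initial:       0#   1     2       3         4           5             6               7*
After step 1:  0#   0,1#  1,2     2,3       3,4         4,5           5,6*            6,7*
After step 2:  0#   0,1#  0,1,2#  0,1,2,3#  1,2,3,4     2,3,4,5       3,4,5,6         4,5,6,7
After step 3:  0    0,1   0,1,2   0,1,2,3   0,1,2,3,4   0,1,2,3,4,5   0,1,2,3,4,5,6   0,1,2,3,4,5,6,7
# = contains final sum, * = operation suppressed
T1 = O(n), Tp = O(log n)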

7 Time Complexity: O(log n)
• At each step, the number of sums that are complete doubles: 1, 2, 4, 8, …
• Thus, the number of steps is log n

8 Total Operations: Work/Cost
• For N data values:
  Step 1 = N - 1 additions ($N - 2^0$) → 1 PC is suppressed
  Step 2 = N - 2 additions ($N - 2^1$) → 2 PCs are suppressed
  Step 3 = N - 4 additions ($N - 2^2$) → 4 PCs are suppressed
  etc.

9 Consider the case of N = 8
• Step 1 = N - 1 = 7
• Step 2 = N - 2 = 6
• Step 3 = N - 4 = 4
  TOTAL = 17
• Generalize: (N - 1) + (N - 2) + (N - 4) = 3N - 7
• Sequential = N - 1

10 Generalize the Work
$\sum_{i=0}^{\log N - 1} (N - 2^i) = (N - 1) + (N - 2) + (N - 4) + \dots + (N - 2^{\log N - 1})$
$= N \log N - (2^0 + 2^1 + \dots + 2^{\log N - 1})$
$2^0 + 2^1 + \dots + 2^{\log N - 1} = \;???$

11 Total Work for Prefix Sums
$2^0 + 2^1 + \dots + 2^{\log N - 1} = 2^{\log N} - 1$
Size $= N \log N - (2^{\log N} - 1) = N \log N - 2^{\log N} + 1 = N \log N - N + 1$
T1 = N
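The closed form can be checked numerically. The sketch below is illustrative only: `doubling_work` is a made-up name, and it assumes that step i (counting from 0) performs N - 2^i additions, as in the step counts on the earlier slides. It compares the summed step counts with N log N - N + 1 for a few powers of two.

```python
def doubling_work(n):
    """Count the additions performed by prefix sums by doubling on n values."""
    steps = (n - 1).bit_length() if n > 0 else 0  # ceil(log2 n)
    return sum(n - 2**i for i in range(steps))

for n in (2, 4, 8, 16, 1024):
    log_n = n.bit_length() - 1                    # exact log2 for powers of two
    assert doubling_work(n) == n * log_n - n + 1

print(doubling_work(8))  # 17
```

For N = 8 this reproduces the total of 7 + 6 + 4 = 17 additions.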

12 Prefix Sums - Comparisons
Algorithm     Work/Cost                  Depth
Sequential    N - 1
Doubling      N log N - 2^(log N) + 1    log N
Upper/Lower   (N/2) log N                log N
Odd/Even      2N - log N - 2             2 log N - 2

13 Parallel Strategies - PRAM
• Broadcast, Fan-out, Expand: O(log n)
• Reduction, Combination, Fan-in: O(log n)
• These are basically opposites

14 PRAM Algorithm Instructions: spawn, for all
• Step 1 of all PRAM algorithms is to activate P processors (Broadcast)
  - One processor starts activation
  - Activation takes O(log P) time
• Instruction: spawn (processor names)
  e.g. spawn (P0, P1, …, Pn)
• for all (processor list) do {stmts} endfor
  e.g. for all Pi, 0 <= i <= n-1, do {…}
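One way to see where the O(log P) activation time comes from is to assume that, in each round, every processor that is already active wakes up one more. That mechanism is an assumption made for this sketch, since the slide only states the bound, and `activation_rounds` is a hypothetical name.

```python
def activation_rounds(p):
    """Rounds needed to activate p processors when the active set doubles each round."""
    active = 1            # one processor starts the activation
    rounds = 0
    while active < p:
        active = min(2 * active, p)  # each active processor wakes one more
        rounds += 1
    return rounds

print(activation_rounds(8))     # 3
print(activation_rounds(1000))  # 10
```

Under this assumption the number of rounds is ceil(log2 P), which is the same fan-out pattern as the broadcast strategy mentioned earlier.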

15 Sum of elements: EREW PRAM
Given: n elements in A[0 … n-1]
Var: A and j are global, i is local
spawn (P0, P1, P2, …, P(n/2 - 1))   // P = n/2
for all Pi, 0 <= i <= n/2 - 1, do
  for j = 0 to log n - 1 do
    if (i mod 2^j = 0) and (2i + 2^j < n) then
      A[2i] = A[2i] + A[2i + 2^j]
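The reduction can be simulated sequentially to see that the total ends up in A[0]. This is only a sketch: `pram_sum` is a made-up name, the inner loop stands in for the n/2 processors acting in one parallel step, and log n is taken as ceil(log2 n), which is an assumption for sizes that are not powers of two.

```python
def pram_sum(values):
    """Simulate the EREW PRAM sum: after ceil(log2 n) steps, a[0] holds the total."""
    a = list(values)
    n = len(a)
    if n == 0:
        return 0
    steps = (n - 1).bit_length()       # ceil(log2 n)
    for j in range(steps):
        for i in range(n // 2):        # one iteration per simulated processor P_i
            if i % (2 ** j) == 0 and 2 * i + 2 ** j < n:
                # The cell read here is never written during the same step,
                # so updating in place matches the parallel semantics.
                a[2 * i] += a[2 * i + 2 ** j]
    return a[0]

print(pram_sum([2, 7, 9, 4]))  # 22
```

No two processors touch the same cell in a step, which is why the algorithm fits the exclusive-read, exclusive-write model.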

16 Trace for P0 and P1 (j = 0 to 2)
P0, i = 0 for all operations
  j = 0: (0 mod 2^0 = 0) and (2*0 + 2^0 < n): yes, so A[0] = A[0] + A[1]
P1, i = 1 for all operations
  j = 0: (1 mod 2^0 = 0) and (2*1 + 2^0 < n): yes, so A[2] = A[2] + A[3]

17 Prefix Sum by Doubling: CREW PRAM
Given: n elements in A[0 … n-1]
Var: A and j are global, i is local
spawn (P1, P2, …, P(n-1))   // note the number of PCs
for all Pi, 1 <= i <= n-1, do
  for j = 0 to log n - 1 do
    if (i - 2^j >= 0) then
      A[i] = A[i] + A[i - 2^j]
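A sequential simulation of this pseudocode has to respect the synchronous PRAM step, in which every processor reads its operands before any processor writes. The sketch below (with the hypothetical name `prefix_sums_doubling`) models that by snapshotting the array at the start of each step. It takes log n as ceil(log2 n), an assumption for sizes that are not powers of two.

```python
def prefix_sums_doubling(values):
    """Simulate CREW PRAM prefix sums by doubling; returns the list of prefix sums."""
    a = list(values)
    n = len(a)
    steps = (n - 1).bit_length() if n > 0 else 0  # ceil(log2 n)
    for j in range(steps):
        d = 2 ** j
        old = a[:]                # snapshot: all reads happen before any write
        for i in range(1, n):     # one iteration per simulated processor P_i
            if i - d >= 0:
                a[i] = old[i] + old[i - d]
    return a

print(prefix_sums_doubling([2, 7, 9, 4]))  # [2, 9, 18, 22]
```

Without the snapshot, a left-to-right loop would reuse values already updated in the same step and produce wrong sums.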

18 List Packing An Application of Prefix Sums  Consider an array of upper and lower case letters. Delete the lower case letters and compact the upper case to the low-order end of the array.

19 List Packing - Demonstration aGHinWbN GHWN

20 List Packing Implementation via Prefix Sums
• Assign 1 to items to be packed and 0 to items to be deleted.
• Perform the prefix sums on the 1/0 flags.
• If upper case, store in location = sum; otherwise do nothing.
• The last sum provides the number of packed items.
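The packing steps above can be sketched as follows. The function name `pack_uppercase` is made up for illustration, the prefix sums are computed sequentially with `itertools.accumulate` rather than in parallel, and the slide's "location = sum" is shifted by one because Python lists are 0-based.

```python
from itertools import accumulate

def pack_uppercase(chars):
    """Keep only upper-case letters, placing each at the slot given by its prefix sum."""
    flags = [1 if c.isupper() else 0 for c in chars]  # 1 = keep, 0 = delete
    sums = list(accumulate(flags))                    # prefix sums of the flags
    packed = [None] * (sums[-1] if sums else 0)       # last sum = number packed
    for c, flag, s in zip(chars, flags, sums):
        if flag:
            packed[s - 1] = c                         # location = sum, 0-based
    return packed

print("".join(pack_uppercase("aGHinWbN")))  # GHWN
```

Each kept item's destination depends only on its own prefix sum, so on a PRAM the final stores could all happen in one parallel step.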

21 List Packing Implementation via Prefix Sums
Input:        a G H i n W b N
Flags:        0 1 1 0 0 1 0 1
Prefix sums:  0 1 2 2 2 3 3 4
Packed:       G H W N
Suppose we don't know how the 0/1 flags were assigned. Can we still pack?

22 Linked List Ranking
• Given a linked list, stored in an array, compute the distance of each element from the end (either end) of the list.
• The problem is similar to prefix sums, using all 1's to sum.
• Called Pointer Jumping (not doubling) when using pointers.
• Don't destroy the original list!

23 Linked List Ranking - demo AQFCDTBP

24 List Ranking - Demonstration
[Figure: array representation with rows data, next, and rank; data = A D Q B C F P T; Nil marks the end of the list]

25 List Ranking: Pointer Jumping (similar to Doubling)
[Figure: the list after initialization and after each pointer-jumping step; pointers that reach the end become nil]
Note: Must copy pointers to preserve the original linked list.

26 List Ranking Code
// Copy next and set 1's for all elements
for all Pi, 0 ≤ i ≤ n-1, pardo
  N(i) = next(i)
  Rank(i) = 1
  // Doubling on the linked list
  for j = 1 to log n do
    if N(i) ≠ N(N(i)) then
      Rank(i) = Rank(i) + Rank(N(i))
      N(i) = N(N(i))
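A sequential simulation of pointer jumping might look like the sketch below. Several details are assumptions rather than something the slides specify: the list is given as an array of successor indices, the last element points to itself, and its rank starts at 0 so that every rank is the distance to the end of the list (the slide code starts every rank at 1). The name `list_rank` is hypothetical.

```python
def list_rank(next_index):
    """Return each element's distance to the tail; the tail must point to itself."""
    n = len(next_index)
    nxt = list(next_index)        # work on a copy so the original list is preserved
    rank = [0 if nxt[i] == i else 1 for i in range(n)]  # assumption: tail rank is 0
    steps = (n - 1).bit_length() if n > 0 else 0        # ceil(log2 n)
    for _ in range(steps):
        old_nxt, old_rank = nxt[:], rank[:]   # synchronous step: read, then write
        for i in range(n):                    # one iteration per simulated processor
            if old_nxt[i] != old_nxt[old_nxt[i]]:
                rank[i] = old_rank[i] + old_rank[old_nxt[i]]
                nxt[i] = old_nxt[old_nxt[i]]
    return rank

# List order 3 -> 0 -> 2 -> 1, where element 1 is the tail.
print(list_rank([2, 1, 1, 0]))  # [2, 0, 1, 3]
```

The copies taken at the start of each round play the role of the PRAM's read-before-write step, and the input array is never modified.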

27 What are we really doing?
• If your location and your next do not point to the same location, then:
  - Add the rank of next to yourself
  - Change your next to next(next)
• Each step doubles the distance of your next pointer
• Progression of the solution:
  Initially: 1 rank correct
  Step 1: 2 ranks correct
  Step 2: 4 ranks correct
  Step 3: 8 ranks correct
  O(log n) steps

28 Work Analysis
• Number of steps: Tp = O(log N)
• Number of processors: N
• Work = O(N log N)
• T1 = O(N)
• Work optimal??

29 Applications of List Ranking  Expression Tree Evaluation  Parentheses Matching  Tree Traversals  Ear–Decomposition of Graphs  Euler tour of trees  many others