PRAM ALGORITHMS-3
Computer Engg, IIT(BHU), 3/12/2013

Euler Tours
● A technique for fast, work-optimal processing of tree data
● An Euler circuit of a directed graph is a directed cycle that traverses each edge exactly once
● A (rooted) tree is represented by the Euler circuit of its directed version

Trees (Balanced Parentheses)
● Key property: the parenthesis subsequence corresponding to a subtree is balanced.
( ( ( ) ( ) ) ( ) ( ( ) ( ) ( ) ) )
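As an illustration (not from the slides), the parenthesis sequence of a rooted tree can be produced serially by a DFS that emits '(' on entering a node and ')' on leaving it; every subtree then contributes a balanced subsequence. The dict-based tree layout is an assumption for this sketch.

```python
def paren_sequence(tree, root):
    """tree maps each node to its list of children (assumed layout)."""
    out = []
    def dfs(v):
        out.append('(')          # enter v
        for c in tree.get(v, []):
            dfs(c)
        out.append(')')          # leave v
    dfs(root)
    return ''.join(out)

# A small tree: root 0 with children 1 and 2; node 1 has child 3.
tree = {0: [1, 2], 1: [3]}
print(paren_sequence(tree, 0))   # -> ((())())
```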

Computing the Depth
● Problem definition
➢ Given a binary tree with n nodes, compute the depth of each node
● A serial algorithm takes O(n) time
● A simple parallel algorithm
➢ Starting from the root, compute the depths level by level
➢ Still O(n) in the worst case, because the height of the tree can be as large as n
● The Euler tour algorithm
➢ Uses a parallel prefix computation to achieve O(log n) time

Computing the Depth
● An Euler tour is a cycle that traverses each edge of a graph exactly once
➢ Here we use a directed version of the tree: regard each undirected edge as two anti-parallel directed edges
➢ Any directed version of a tree has an Euler tour, obtained by traversing the tree in DFS order and forming a linked list
● Employ 3n processors
➢ Each node i has fields i.parent, i.left, i.right
➢ Each node i is assigned three processors: i.A, i.B, and i.C

Computing the Depth
● The three processors of each node are linked into the tour as follows:
➢ i.A = i.left.A if i.left ≠ nil; otherwise i.B
➢ i.B = i.right.A if i.right ≠ nil; otherwise i.C
➢ i.C = i.parent.B if i is the left child; i.parent.C if i is the right child; nil if i.parent = nil
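The linking rules above can be sketched as a successor function. This is a serial illustration under an assumed `Node` layout (fields `parent`, `left`, `right`); each tour position is a pair (node, tag) with tag A, B, or C.

```python
class Node:
    """Binary tree node with parent links (assumed layout for this sketch)."""
    def __init__(self, key, parent=None):
        self.key, self.parent = key, parent
        self.left = self.right = None

def succ(i, tag):
    """Successor of tour position (i, tag), following the linking rules."""
    if tag == 'A':
        return (i.left, 'A') if i.left else (i, 'B')
    if tag == 'B':
        return (i.right, 'A') if i.right else (i, 'C')
    # tag == 'C': climb back through the parent
    if i.parent is None:
        return None                      # end of the tour (root's C)
    if i is i.parent.left:
        return (i.parent, 'B')
    return (i.parent, 'C')

# Three-node tree: the tour visits all 3n = 9 positions exactly once.
r = Node('r'); l = Node('l', r); q = Node('q', r)
r.left, r.right = l, q
pos, count = (r, 'A'), 0
while pos is not None:
    count += 1
    pos = succ(*pos)
print(count)  # -> 9
```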

Computing the Depth
● Algorithm
➢ Construct the Euler tour for the tree: O(1) time
➢ Assign +1 to all A processors, 0 to all B processors, and -1 to all C processors
➢ Perform a parallel prefix computation
➢ The depth of each node then resides in its C processor
● Total time: O(log n)
➢ More precisely log(3n), since the tour has 3n elements
● EREW, because there is no concurrent read or write
● Speedup
➢ S = n / log n
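The steps above can be checked with a serial stand-in: generate the Euler-tour tag sequence (A on entry, B between children, C on exit), assign the values +1/0/-1, take prefix sums, and read each node's depth at its C position. The dict-based tree layout `node -> (left, right)` is an assumption of this sketch; the prefix sum is done serially where the PRAM would use parallel prefix.

```python
def depths(tree, root):
    """tree maps node -> (left, right); None means no child."""
    tags, value = [], {'A': 1, 'B': 0, 'C': -1}
    def tour(v):                         # serial stand-in for tour links
        left, right = tree.get(v, (None, None))
        tags.append((v, 'A'))
        if left: tour(left)
        tags.append((v, 'B'))
        if right: tour(right)
        tags.append((v, 'C'))
    tour(root)
    depth, s = {}, 0
    for v, t in tags:                    # the parallel prefix step, serially
        s += value[t]
        if t == 'C':
            depth[v] = s                 # depth lands in the C position
    return depth

tree = {1: (2, 3), 2: (4, None)}
print(depths(tree, 1))  # root 1 at depth 0, nodes 2 and 3 at 1, node 4 at 2
```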

Broadcasting on a PRAM
● A broadcast can be done on a CREW PRAM in O(1) steps:
➢ The broadcaster writes the value to shared memory
➢ All processors then read it from shared memory concurrently
● On an EREW PRAM it requires O(log p) steps, since the number of processors holding the value can at most double in each step
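The EREW bound can be illustrated with a doubling scheme (a serial sketch, not from the slides): in each round, every processor that already holds the value copies it into one distinct fresh memory cell, so no cell is read or written concurrently.

```python
def erew_broadcast(value, p):
    """Simulate EREW broadcast to p cells; returns (cells, rounds)."""
    cell = [None] * p
    cell[0] = value                      # broadcaster writes once
    have, rounds = 1, 0
    while have < p:
        # processors 0..have-1 each copy into a distinct fresh cell:
        # exclusive reads and exclusive writes in every round
        for i in range(min(have, p - have)):
            cell[have + i] = cell[i]
        have *= 2
        rounds += 1
    return cell, rounds

cells, rounds = erew_broadcast(42, 8)
print(rounds)  # -> 3, i.e. log2(8) rounds
```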

Concurrent Write - Finding Max
● The finding-max problem
➢ Given an array of n elements, find the maximum(s)
➢ The sequential algorithm is O(n)
● Data structures for the parallel algorithm
➢ Array A[1..n]
➢ Array m[1..n], where m[i] is true if A[i] is a maximum
➢ Use n² processors

Concurrent Write - Finding Max
● Fast_max(A, n)
    for i = 1 to n do, in parallel
        m[i] = true            // A[i] is potentially a maximum
    for i = 1 to n, j = 1 to n do, in parallel
        if A[i] < A[j] then m[i] = false
    for i = 1 to n do, in parallel
        if m[i] = true then max = A[i]
    return max
● Time complexity: O(1)
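A serial Python simulation of Fast_max: the n×n comparison grid stands in for the n² processors, and the concurrent writes of false into m[i] are harmless because every writer writes the same value (common-CRCW semantics).

```python
def fast_max(A):
    """Serial simulation of the O(1) CRCW Fast_max algorithm."""
    n = len(A)
    m = [True] * n                       # step 1: everyone is a candidate
    for i in range(n):                   # steps 2-3: all n^2 pairs,
        for j in range(n):               # done in parallel on the PRAM
            if A[i] < A[j]:
                m[i] = False             # common concurrent write
    for i in range(n):                   # step 4: a survivor is the max
        if m[i]:
            return A[i]

print(fast_max([3, 7, 2, 7, 5]))  # -> 7
```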

Concurrent Write - Finding Max
● Concurrent write
➢ In the comparison step, all processors that find A[i] < A[j] write the same value, false, into the same location m[i]
➢ This effectively computes m[i] = (A[i] ≥ A[1]) ∧ (A[i] ≥ A[2]) ∧ … ∧ (A[i] ≥ A[n])
● Is this work-efficient?
➢ No: n² processors running for O(1) time perform O(n²) work, whereas the sequential algorithm is O(n)

Concurrent Write - Finding Max
● What is the time complexity with exclusive writes?
➢ Initially, all n elements are candidates for the maximum
➢ First iteration: compare n/2 disjoint pairs; n/2 candidates remain
➢ Second iteration: n/4 candidates remain
➢ After the (log n)-th iteration, a single candidate, the maximum, remains
➢ So finding the maximum with exclusive writes takes O(log n) time
● O(1) (CRCW) vs. O(log n) (EREW)
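The exclusive-write tournament above can be sketched serially: each round compares survivors in disjoint pairs (no location is read or written by two processors), halving the candidate set, so an 8-element input finishes in log2(8) = 3 rounds.

```python
def erew_max(A):
    """Serial simulation of the O(log n) EREW tournament maximum.

    Returns (maximum, number_of_parallel_rounds).
    """
    cand, rounds = list(A), 0
    while len(cand) > 1:
        # disjoint pairs -> exclusive reads and writes within a round
        cand = [max(cand[i], cand[i + 1]) if i + 1 < len(cand) else cand[i]
                for i in range(0, len(cand), 2)]
        rounds += 1                      # one parallel step per round
    return cand[0], rounds

print(erew_max([3, 7, 2, 9, 5, 1, 8, 4]))  # -> (9, 3)
```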

Simulating CRCW with EREW
● CRCW algorithms can be faster than EREW algorithms
➢ But how much faster?
● Theorem
➢ A p-processor CRCW algorithm can be no more than O(log p) times faster than the best p-processor EREW algorithm for the same problem

Simulating CRCW with EREW
● Proof: simulate each CRCW step with O(log p) EREW steps
➢ Assumption: p elements can be sorted in O(log p) time with p processors
➢ When CRCW processor p_i would write a datum x_i into location l_i, EREW processor p_i instead writes the pair (l_i, x_i) into a separate location A[i]. These writes are exclusive, whereas the CRCW writes may be concurrent
➢ Sort A by the locations l_i: O(log p) time by assumption
➢ Compare adjacent elements of A
➢ For each group of pairs with the same location, only one processor (say the first) writes x_i into the global memory location l_i. This write is also exclusive
➢ Total time per simulated step: O(log p)
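The write phase of the simulation can be sketched serially: collect each processor's (location, value) pair, sort by location (the step that would take O(log p) in parallel by assumption), and let only the first processor in each run of equal locations perform the real, exclusive write.

```python
def simulate_crcw_write(memory, writes):
    """Serial sketch of one simulated CRCW write step.

    writes: list of (location, value) pairs, one per processor.
    """
    A = sorted(writes)                   # stand-in for the parallel sort
    for k, (loc, val) in enumerate(A):
        # compare with the adjacent element: only group leaders write
        if k == 0 or A[k - 1][0] != loc:
            memory[loc] = val            # exclusive write per location
    return memory

# Three processors write concurrently to location 5, one to location 2.
mem = simulate_crcw_write({}, [(5, 'x'), (2, 'y'), (5, 'x'), (5, 'x')])
print(mem)  # -> {2: 'y', 5: 'x'}
```

With common-CRCW semantics all writers to a location carry the same value, so picking the group leader reproduces the CRCW result exactly.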

CRCW vs. EREW
● CRCW
➢ Easier to program, runs in fewer steps, more powerful
➢ But hardware implementations are expensive, so it is used infrequently
➢ Hardware implementing concurrent access is slower than EREW hardware: in reality one cannot find the maximum in O(1) time
● EREW
➢ Cheaper, simpler hardware
➢ But the programming model is more restrictive, and some powerful algorithms cannot be implemented directly