Simulating a CRCW algorithm with an EREW algorithm Lecture 4 Efficient Parallel Algorithms COMP308.

CRCW algorithms can solve some problems more quickly than EREW algorithms can. The problem of finding the MAX element of an array can be solved in O(1) time by a CRCW algorithm with n² processors. An EREW algorithm for this problem takes Θ(log n) time, and no CREW algorithm does any better. Why?
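
As an illustration, here is a minimal sketch of that constant-time algorithm. It is ordinary sequential Python in which loops stand in for the parallel processors; the function name crcw_max and the index-based tie-breaking rule are my own choices, not part of the lecture.

def crcw_max(a):
    # Simulate the O(1)-time common-CRCW MAX algorithm that uses n^2 processors.
    # Step 1: n processors mark every element as a non-loser.
    # Step 2: processor (i, j) compares a[i] and a[j]; if a[i] loses, it writes
    #         loser[i] = True.  Several processors may write to the same cell,
    #         but they all write the same value, so this is a legal concurrent
    #         write in the common CRCW model.
    # Step 3: the unique non-loser returns its value as the maximum.
    # Each step is one parallel time unit, so the whole algorithm is O(1).
    n = len(a)
    loser = [False] * n                                  # step 1: n processors
    for i in range(n):                                   # step 2: n^2 processors (i, j)
        for j in range(n):
            if a[i] < a[j] or (a[i] == a[j] and i < j):  # index breaks ties
                loser[i] = True
    for i in range(n):                                   # step 3: n processors
        if not loser[i]:
            return a[i]

print(crcw_max([7, 3, 9, 9, 1]))  # prints 9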

Any EREW algorithm can be executed directly on a CRCW PRAM, and the MAX example shows that the converse fails, so the CRCW model is strictly more powerful than the EREW model. But how much more powerful is it? We now give a theoretical bound on the power of a CRCW PRAM over an EREW PRAM.

Theorem. A p-processor CRCW algorithm can be no more than O(log p) times faster than the best p-processor EREW algorithm for the same problem. Proof. The proof is a simulation argument: we simulate each step of the CRCW algorithm with an O(log p)-time EREW computation. Because the processing power of the two machines is the same, we need only focus on memory access. We present the simulation of concurrent writes here; the simulation of concurrent reads is left as an exercise.

The p processors in the EREW PRAM simulate a concurrent write of the CRCW algorithm using an auxiliary array A of length p.
1. When CRCW processor P_i, for i = 0, 1, …, p-1, wants to write a datum x_i to location l_i, the corresponding EREW processor P_i instead writes the ordered pair (l_i, x_i) to location A[i].
2. These writes are exclusive, since each processor writes to a distinct memory location.
3. The array A is then sorted by the first coordinate of the ordered pairs in O(log p) time, which brings all data written to the same location together in the output.

[Figure: simulating one concurrent-write step on an EREW PRAM. Processors P_0, …, P_5 write pairs such as (29,43), (8,12), (92,26) into the auxiliary array A; A is then sorted by location before the writes to the simulated CRCW global memory are performed.]
4. Each EREW processor P_i now inspects A[i] = (l_j, x_j) and A[i-1] = (l_k, x_k), where j and k are values in the range 0 ≤ j, k ≤ p-1. If l_j ≠ l_k or i = 0, then P_i writes the datum x_j to location l_j in the global memory. Otherwise, the processor does nothing.

End of the proof. Since the array A is sorted by its first coordinate, only one of the processors writing to any given location actually performs the write, and thus each write is exclusive. This process therefore implements each step of concurrent writing in the common CRCW model in O(log p) time.
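
The whole simulated step is short enough to sketch. The following is a minimal illustration in sequential Python, where loops stand in for the p EREW processors; the name simulate_concurrent_write is my own, and Python's built-in sort stands in for an O(log p) EREW parallel sort.

def simulate_concurrent_write(requests, memory):
    # Simulate one common-CRCW concurrent-write step on an EREW PRAM.
    # requests[i] is either None or the pair (l_i, x_i) that CRCW processor P_i
    # wants to write; memory is the simulated global memory.
    # 1. Each EREW processor P_i writes its pair into A[i] (exclusive writes).
    A = [r for r in requests if r is not None]
    # 2. Sort A by the first coordinate.  A p-processor EREW PRAM can do this
    #    in O(log p) time; Python's sort stands in for that parallel sort.
    A.sort(key=lambda pair: pair[0])
    # 3. P_i compares A[i] with A[i-1] and writes only if it heads a run of
    #    equal locations, so every location receives exactly one write.
    for i, (loc, x) in enumerate(A):
        if i == 0 or A[i - 1][0] != loc:
            memory[loc] = x
    return memory

mem = {}
simulate_concurrent_write([(29, 43), (8, 12), None, (29, 43), (92, 26), (8, 12)], mem)
print(mem)  # prints {8: 12, 29: 43, 92: 26}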

Optimal sorting in log(n) steps: Cole's algorithm. Suppose we know how to merge two increasing sequences in log(log(n)) steps. Then we can climb up the merging tree, spending only log(log(n)) steps per level, and obtain a parallel sorting technique running in log(n)·log(log(n)) steps. Merges at the same level are performed in parallel.
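
A minimal sketch of the merging-tree idea, in sequential Python: the merge procedure is treated as a black box (here heapq.merge) standing in for the doubly-logarithmic parallel merge described on the next slide, and the name tree_sort is my own.

import heapq

def tree_sort(items, merge):
    # Sort by merging pairs of runs level by level up a balanced merging tree.
    # All merges on one level are independent, so a PRAM runs them in parallel;
    # with an O(log log n)-time merge, the O(log n) levels give a parallel
    # sorting time of O(log n * log log n).
    runs = [[x] for x in items]                  # leaves: runs of length 1
    while len(runs) > 1:                         # one tree level per pass
        runs = [merge(runs[i], runs[i + 1]) if i + 1 < len(runs) else runs[i]
                for i in range(0, len(runs), 2)]
    return runs[0] if runs else []

# heapq.merge stands in for the doubly-logarithmic parallel merge.
print(tree_sort([5, 2, 8, 1, 9, 3], lambda a, b: list(heapq.merge(a, b))))
# prints [1, 2, 3, 5, 8, 9]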

How to merge in log(log(n)) time with n processors. Let A and B be two sorted sequences of size n. Divide A and B into blocks of length √n. Compare the first element of each block in A with the first element of each block in B. Then compare the first element of each block in A with each element in the one "suitable" block of B. At this point we know where the first element of each block of A fits into B.
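
A small sketch of these first two comparison rounds, again in sequential Python. The helper locate_block_heads, its parameters, and the interpretation of the "suitable" block (the last block of B whose first element does not exceed the head) are my own illustration, not part of the lecture.

def locate_block_heads(A, B, s):
    # For the first element ("head") of every length-s block of A, find where
    # it fits into B, using the two comparison rounds described above.
    # On a PRAM each comparison is done by its own processor, so each round
    # takes O(1) parallel time; sequentially we simply loop over them.
    b_heads = list(range(0, len(B), s))          # start indices of B's blocks
    positions = {}
    for i in range(0, len(A), s):                # one processor group per A-block
        head = A[i]
        # Round 1: compare head with the first element of every block of B and
        # keep the last block whose first element is <= head (block 0 if none).
        j = max((b for b in b_heads if B[b] <= head), default=0)
        # Round 2: compare head with every element of that "suitable" block.
        k = j
        while k < min(j + s, len(B)) and B[k] <= head:
            k += 1
        positions[i] = k                         # head fits just before B[k]
    return positions

print(locate_block_heads([1, 4, 10, 12], [2, 3, 7, 11], 2))
# prints {0: 0, 2: 3}: A[0]=1 fits before B[0]=2, and A[2]=10 fits before B[3]=11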

Thus the problem has been reduced to a set of disjoint subproblems, each of which merges one block of √n elements of A with some consecutive piece of B. We solve these subproblems recursively. The parallel time t(n) satisfies t(n) ≤ 2 + t(√n), implying t(n) = O(log(log(n))).
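
To see why this recurrence gives a doubly-logarithmic bound, note that each application of the recurrence replaces the problem size n by √n. Writing n = 2^(2^k), so that k = log(log(n)), and unrolling the recurrence k times gives

t(n) ≤ 2 + t(n^(1/2)) ≤ 4 + t(n^(1/4)) ≤ … ≤ 2k + t(2) = O(log(log(n))).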

The issue arises, therefore, of which model is preferable: CRCW or EREW? Advocates of the CRCW models point out that they are easier to program than the EREW model and that their algorithms run faster. Critics contend that hardware implementing concurrent memory operations is slower than hardware implementing exclusive memory operations, and thus the faster running time of a CRCW algorithm is fictitious; in reality, they say, one cannot find the maximum of n values in O(1) time. Others say that the PRAM is the wrong model entirely: processors must be interconnected by a communication network, and the communication network should be part of the model. It is quite clear that the issue of the "right" parallel model is not going to be easily settled in favour of any one model. The important thing to realize, however, is that these models are just that: models!