Optimal PRAM algorithms: Efficiency of concurrent writing
"Computer science is no more about computers than astronomy is about telescopes." Edsger Dijkstra (11 May 1930 - 6 August 2002)

One more time about the PRAM model
- N synchronized processors
- Shared memory: EREW, ERCW, CREW, CRCW
- Constant-time access to the memory
- Standard multiplication/addition
- Communication (implemented via access to shared memory)

Covered so far in COMP308:
- Metrics: efficiency, cost, limits, speed-up
- PRAM models and basic algorithms
- Simulation of Concurrent Write (CW) with Exclusive Write (EW)
- Parallel sorting in logarithmic time

TODAY's topic: how fast can we compute with many processors, and how can we reduce the number of processors?

Two problems for PRAM
Problem 1. Finding the min of n numbers.
Problem 2. Computing the position of the first one in a sequence of 0's and 1's.

Min of n numbers
- Input: an array A with n numbers
- Output: the minimal number in the array A

Sequential algorithm: at least n-1 comparisons must be performed.
COST = (number of processors) × (time)
Sequential cost = 1 × O(n) = O(n)
Sequential vs. parallel: an optimal parallel algorithm should also have cost O(n).

Mission: Impossible... computing in constant time
- Archimedes: "Give me a lever long enough and a place to stand, and I will move the Earth."
- Nowadays: give me a parallel machine with enough processors and I will find the smallest number in any giant set in constant time!

Parallel solution 1: Min of n numbers
- Comparisons between numbers can be done independently.
- The second part is to find the result using the concurrent-write mode.
- For n numbers we have ~n² pairs. For [a_1, a_2, a_3, a_4] the pairs are (a_1,a_2), (a_1,a_3), (a_1,a_4), (a_2,a_3), (a_2,a_4), (a_3,a_4).
- For a pair (a_i, a_j): if a_i > a_j, then a_i cannot be the minimal number.
- The outcomes of the comparisons are collected in a marker array M[1..n].

The following program computes the MIN of n numbers stored in the array C[1..n] in O(1) time with n² processors (CRCW PRAM).

Algorithm A1
for each 1 ≤ i ≤ n do in parallel
    M[i] := 0
for each 1 ≤ i, j ≤ n do in parallel
    if i ≠ j and C[i] < C[j] then M[j] := 1
for each 1 ≤ i ≤ n do in parallel
    if M[i] = 0 then output := i

(M[j] = 1 marks every element that is larger than some other element, so only the minimum keeps M = 0; all processors writing into the same cell M[j] write the same value 1, which is exactly what concurrent write permits.)
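
A minimal Python simulation of Algorithm A1 (the function name and the sequential loops are illustrative; on a CRCW PRAM each body of the doubly-nested loop would be executed by one of the n² processors in a single parallel step):

```python
def min_a1_position(C):
    """Simulates Algorithm A1: returns the index of the minimum of C (0-indexed)."""
    n = len(C)
    M = [0] * n                      # M[i] = 1 means C[i] cannot be the minimum
    for i in range(n):               # all n^2 comparisons are independent,
        for j in range(n):           # so a PRAM performs them in one O(1) step
            if i != j and C[i] < C[j]:
                M[j] = 1             # concurrent writes of the same value 1
    for i in range(n):
        if M[i] == 0:
            return i                 # position of the minimum

print(min_a1_position([7, 3, 9, 1, 4]))  # -> 3 (C[3] = 1 is the minimum)
```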

From n² processors to n^(1+1/2)
Step 1: Partition the input into disjoint blocks of size √n each.
Step 2: Apply A1 to each block (a block of size √n needs (√n)² = n processors).
Step 3: Apply A1 to the √n block minima obtained in Step 2.

From n^(1+1/2) processors to n^(1+1/4)
Step 1: Partition the input into disjoint blocks of size √n each.
Step 2: Apply A2 to each block.
Step 3: Apply A2 to the √n block minima obtained in Step 2.

n² -> n^(1+1/2) -> n^(1+1/4) -> n^(1+1/8) -> n^(1+1/16) -> … -> n^(1+1/2^(k-1)) ~ n

Assume that we have an algorithm Ak working in O(1) time with n^(1+1/2^(k-1)) processors.

Algorithm A(k+1)
1. Let α = 1/2.
2. Partition the input array C of size n into disjoint blocks of size n^α each.
3. Apply, in parallel, algorithm Ak to each of these blocks.
4. Apply algorithm Ak to the array C' consisting of the n / n^α block minima.
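
A Python sketch of this block-partitioning recursion (it also covers the A2 and A3 constructions on the previous two slides; the helper names are illustrative). The recursive calls on the blocks would run in parallel on a PRAM, so the parallel depth satisfies T(n) = T(√n) + O(1) = O(log log n):

```python
import math

def min_a1(values):
    """The O(1)-time, ~n^2-processor step A1, here returning the minimum value."""
    n = len(values)
    not_min = [False] * n
    for i in range(n):
        for j in range(n):
            if i != j and values[i] < values[j]:
                not_min[j] = True
    for i in range(n):
        if not not_min[i]:
            return values[i]

def min_partitioned(C):
    """Partition into blocks of size ~sqrt(n), solve blocks, combine their minima."""
    n = len(C)
    if n <= 2:
        return min(C)
    b = max(2, math.isqrt(n))                              # block size ~ n^(1/2)
    blocks = [C[i:i + b] for i in range(0, n, b)]
    minima = [min_partitioned(block) for block in blocks]  # in parallel on a PRAM
    return min_a1(minima)                                  # one O(1) combining round

print(min_partitioned([5, 8, 2, 9, 7, 1, 6, 3]))           # -> 1
```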

Complexity  We can compute minimum of n numbers using CRCW PRAM model in O(log log n) with n processors by applying a strategy of partitioning the input ParCost = n  log log n

Mission: Impossible (Part 2)
Computing the position of the first one in a sequence of 0's and 1's in constant time.

Problem 2. Computing the position of the first one in a sequence of 0's and 1's.
Example: FIRST-ONE-POSITION(C) = 4 for the input array C = [0,0,0,1,0,0,0,1,1,1,0,0,0,1].

Algorithm A (2 parallel steps and n² processors)
for each 1 ≤ i < j ≤ n do in parallel
    if C[i] = 1 and C[j] = 1 then C[j] := 0
for each 1 ≤ i ≤ n do in parallel
    if C[i] = 1 then FIRST-ONE-POSITION := i

After the first parallel step C will contain a single 1 (only the leftmost 1 survives).
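
A Python simulation of Algorithm A (the function name is illustrative; it is 0-indexed, so it returns 3 for the example above rather than 4):

```python
def first_one_alg_a(C):
    """Simulates Algorithm A: n^2 processors, 2 parallel steps on a CRCW PRAM."""
    C = list(C)                       # work on a copy
    n = len(C)
    # Parallel step 1: every pair (i, j) with i < j is examined at once;
    # a 1 that has another 1 somewhere to its left is overwritten with 0.
    for i in range(n):
        for j in range(i + 1, n):
            if C[i] == 1 and C[j] == 1:
                C[j] = 0
    # Parallel step 2: at most one 1 is left, so at most one processor writes.
    pos = None
    for i in range(n):
        if C[i] == 1:
            pos = i
    return pos

print(first_one_alg_a([0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1]))  # -> 3
```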

Reducing the number of processors
Algorithm B reports whether there is any 1 in the table (n processors, one parallel step):
There-is-one := 0
for each 1 ≤ i ≤ n do in parallel
    if C[i] = 1 then There-is-one := 1

Now we can merge the two algorithms A and B:
1. Partition the table C into segments of size √n.
2. In each segment, apply algorithm B; this yields a sequence of √n flags marking which segments contain a 1.
3. Find the position of the first 1 in this sequence of flags by applying algorithm A.
4. Apply algorithm A to that single segment and compute the final value.
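
A Python sketch of the merged algorithm (names are illustrative; the per-segment work and the two applications of A would each be one parallel phase on the PRAM):

```python
import math

def alg_a_first_one(C):
    """Algorithm A from the earlier slide, simulated sequentially (0-indexed)."""
    C = list(C)
    n = len(C)
    for i in range(n):
        for j in range(i + 1, n):
            if C[i] == 1 and C[j] == 1:
                C[j] = 0
    for i in range(n):
        if C[i] == 1:
            return i
    return None

def first_one_n_processors(C):
    """B on segments of size ~sqrt(n), then A on the flags, then A in one segment."""
    n = len(C)
    s = max(1, math.isqrt(n))                              # segment size ~ sqrt(n)
    segments = [C[i:i + s] for i in range(0, n, s)]
    flags = [1 if any(seg) else 0 for seg in segments]     # algorithm B per segment
    k = alg_a_first_one(flags)                             # A on ~sqrt(n) flags
    if k is None:
        return None                                        # no 1 anywhere
    return k * s + alg_a_first_one(segments[k])            # A inside that segment

print(first_one_n_processors([0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1]))  # -> 3
```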

Complexity  We apply an algorithm A twice and each time to the array of length which need only ( ) 2 = n processors  The time is O(1) and number of processors is n.

Homework:
- Construct several algorithms A1, A2, A3, A4, ... for the MIN problem, reducing the number of processors step by step.
- Define the algorithm Ak for the MIN problem in terms of A1 only [not in terms of A(k-1)].
- How can the algorithm Ak be formulated in a direct way?