Unit VIII: PRAM Algorithms


Classification of the PRAM model
In the PRAM model, processors communicate by reading from and writing to shared memory locations. The power of a PRAM depends on the kind of access to the shared memory that is allowed.

Classification of the PRAM model
In every clock cycle:
- In the Exclusive Read Exclusive Write (EREW) PRAM, each memory location can be accessed by only one processor.
- In the Concurrent Read Exclusive Write (CREW) PRAM, multiple processors can read from the same memory location, but only one processor can write to it.
- In the Concurrent Read Concurrent Write (CRCW) PRAM, multiple processors can read from or write to the same memory location.

Handling write conflicts in the CRCW PRAM:
- In the Common CRCW PRAM, all the processors writing to a location must write the same value.
- In the Arbitrary CRCW PRAM, one of the writing processors arbitrarily succeeds.
- In the Priority CRCW PRAM, processors have priorities associated with them, and the highest-priority processor succeeds in writing.

The relative powers of the different PRAM models
The EREW PRAM is the weakest and the Priority CRCW PRAM is the strongest PRAM model. The relative powers of the different PRAM models are:
EREW ≤ CREW ≤ Common CRCW ≤ Arbitrary CRCW ≤ Priority CRCW

We say model A is less powerful than model B if either:
- the time complexity for solving a problem is asymptotically smaller in model B than in model A, or
- the time complexities are the same, but the processor or work complexity is asymptotically smaller in model B than in model A.
An algorithm designed for a stronger PRAM model can be simulated on a weaker model either with asymptotically more processors (work) or with asymptotically more time.

Adding n numbers on a PRAM
[Slide figure: n numbers at the leaves of a balanced binary tree are summed pairwise; each parallel step halves the number of partial sums, so the total is obtained in log n steps.]

Adding n numbers on a PRAM
This algorithm works on the EREW PRAM model, as there are no read or write conflicts: in every step, each active processor reads two distinct partial sums and writes to its own location. We will use this algorithm to design a matrix multiplication algorithm on the EREW PRAM.
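
A minimal sketch of this tree-based addition in Python, simulating each synchronous PRAM round as a loop over the (virtual) processors; the function name and array representation are illustrative, not from the slides.

    def pram_sum(a):
        # Simulate the EREW addition tree: in round k, the processor at
        # position i adds the partial sum 2**k positions away into its own.
        x = list(a)
        n = len(x)
        step = 1
        while step < n:
            # One synchronous round: every active processor reads a
            # distinct pair of cells, so all reads and writes are exclusive.
            for i in range(0, n - step, 2 * step):
                x[i] = x[i] + x[i + step]
            step *= 2
        return x[0]

For example, pram_sum([3, 1, 4, 1, 5, 9, 2, 6]) performs three rounds (log 8 = 3) and returns 31.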

Matrix multiplication
We want to compute C = A x B, where A and B are n x n matrices and c_{i,j} = a_{i,1} b_{1,j} + a_{i,2} b_{2,j} + … + a_{i,n} b_{n,j}. For simplicity, we assume that n = 2^p for some integer p.

Matrix multiplication
Each c_{i,j} can be computed in parallel. We allocate n processors for computing c_{i,j}; suppose these processors are P_1, P_2, …, P_n. In the first time step, processor P_m computes the product a_{i,m} x b_{m,j}. We now have n numbers, and we use the addition algorithm to sum them in O(log n) time.

Matrix multiplication
Computing each c_{i,j} takes n processors and O(log n) time. Since there are n^2 such c_{i,j}'s, we need O(n^3) processors and O(log n) time overall. The processor requirement can be reduced to O(n^3 / log n); hence, the work complexity is O(n^3).
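
A sketch of this stage of the algorithm (the CREW version, before the read conflicts are removed), reusing pram_sum from above; simulating the n^2 processor groups with nested loops is illustrative only.

    def pram_matmul(A, B):
        # For each (i, j): one parallel step forms the n products
        # A[i][m] * B[m][j], then the addition tree sums them in
        # O(log n) rounds. All (i, j) pairs run concurrently on the PRAM.
        n = len(A)
        C = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                products = [A[i][m] * B[m][j] for m in range(n)]
                C[i][j] = pram_sum(products)
        return C

Note that A[i][m] is read by the n processor groups computing c_{i,1}, …, c_{i,n}; this is exactly the concurrent reading that the next slides eliminate.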

Each element a_{i,m} is read concurrently by the n processors that compute c_{i,1}, …, c_{i,n}. Hence our algorithm runs on the CREW PRAM, and we need to avoid these read conflicts to make it run on the EREW PRAM. We will create n copies of each of the elements a_{i,j} (and b_{i,j}); then a separate copy can be used for computing each c_{i,j}. We can create n copies of a number in O(log n) time using O(n) processors on the EREW PRAM: in the first step, one processor reads the number and creates a copy, so there are now two copies; in the second step, two processors read these two copies and create four copies.

Since the number of copies doubles in every step, n copies are created in O(log n) steps. Though we need n processors, the processor requirement can be reduced to O(n / log n). Since there are n^2 elements in the matrix A (and in B), we need O(n^3 / log n) processors and O(log n) time to create n copies of each element. After this, there are no read conflicts in our algorithm. The overall matrix multiplication algorithm now takes O(log n) time and O(n^3 / log n) processors on the EREW PRAM.
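
A minimal sketch of the copy-doubling step, assuming the copies are kept in a simple array; in each round every existing copy is read by exactly one new processor, so the scheme stays exclusive-read.

    def make_copies(value, n):
        # EREW broadcast: the number of copies doubles each round,
        # so n copies exist after ceil(log2(n)) rounds.
        copies = [value]
        while len(copies) < n:
            # Each existing copy is read once and duplicated into a
            # fresh location; no cell is read or written twice in a round.
            copies.extend(copies[:n - len(copies)])
        return copies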

Parallel Algorithms
Parallel: perform more than one operation at a time.
PRAM model: Parallel Random Access Machine.
- Multiple processors connected to a shared memory.
- Each processor can access any memory location in unit time.
- All processors can access memory in parallel.
- All processors can perform operations in parallel.
[Slide figure: processors p0, p1, …, pn-1 connected to a shared memory.]

Concurrent vs. Exclusive Access
Four models:
- EREW: exclusive read and exclusive write
- CREW: concurrent read and exclusive write
- ERCW: exclusive read and concurrent write
- CRCW: concurrent read and concurrent write
Handling write conflicts:
- Common-write model: concurrent writes succeed only if all processors write the same value.
- Arbitrary-write model: an arbitrary one succeeds.
- Priority-write model: the one with the smallest index succeeds.
EREW and CRCW are the most popular models.

Synchronization and Control
Synchronization is one of the most important and complicated issues. We suppose all processors are inherently tightly synchronized:
- All processors execute the same statements at the same time.
- There are no races among processors; i.e., all run at the same pace.
Termination control of a parallel loop:
- depends on the state of all processors;
- can be tested in O(1) time.

Pointer Jumping – list ranking
Given a singly linked list L with n objects, compute, for each object in L, its distance from the end of the list. Formally, if next is the pointer field:
  d[i] = 0                if next[i] = nil
  d[i] = d[next[i]] + 1   if next[i] ≠ nil
A serial algorithm takes Θ(n) time.

List ranking – EREW algorithm
LIST-RANK(L)    (runs in O(log n) time)
  1 for each processor i, in parallel
  2   do if next[i] = nil
  3        then d[i] ← 0
  4        else d[i] ← 1
  5 while there exists an object i such that next[i] ≠ nil
  6   do for each processor i, in parallel
  7        do if next[i] ≠ nil
  8             then d[i] ← d[i] + d[next[i]]
  9                  next[i] ← next[next[i]]

List ranking – EREW algorithm
[Slide figure, panels (a)-(d): a six-object list traced through the pointer-jumping rounds; each round doubles the jump length, and the d values converge to the final ranks 5, 4, 3, 2, 1, 0.]
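
A sequential simulation of LIST-RANK, assuming the list is given as an array of successor indices with None marking the end; the snapshot copies model the PRAM convention that all reads in a round happen before all writes.

    def list_rank(next_):
        # d[i] will hold the distance of object i from the end of the list.
        n = len(next_)
        nxt = list(next_)
        d = [0 if nxt[i] is None else 1 for i in range(n)]
        while any(p is not None for p in nxt):
            # One synchronous round: read from snapshots, then write.
            d_old, nxt_old = list(d), list(nxt)
            for i in range(n):
                if nxt_old[i] is not None:
                    d[i] = d_old[i] + d_old[nxt_old[i]]
                    nxt[i] = nxt_old[nxt_old[i]]
        return d

For the list 0 -> 1 -> 2, list_rank([1, 2, None]) returns [2, 1, 0].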

List ranking – correctness of the EREW algorithm
Loop invariant: for each i, the sum of the d values in the sub-list headed by i is the correct distance from i to the end of the original list L.
Parallel memory must be synchronized: the reads on the right-hand sides must occur before the writes on the left-hand sides; moreover, each processor reads d[i] first and then d[next[i]].
This is an EREW algorithm: every read and write is exclusive. For an object i, its processor reads d[i], and only then does the processor of its predecessor read d[i] (as its d[next]); the writes all go to distinct locations.

List ranking – EREW algorithm running time
O(log n):
- The initialization for loop runs in O(1) time.
- Each iteration of the while loop runs in O(1) time.
- There are exactly ⌈log n⌉ iterations: each iteration transforms every list into two interleaved lists, one consisting of the objects in even positions and the other of those in odd positions. Thus each iteration doubles the number of lists but halves their lengths.
- The termination test in line 5 runs in O(1) time.
Define work = (number of processors) x (running time); here the work is O(n log n).

Parallel prefix on a list
A prefix computation is defined as follows:
Input: <x1, x2, …, xn> and a binary associative operation ⊗
Output: <y1, y2, …, yn> such that
  y1 = x1
  yk = yk-1 ⊗ xk for k = 2, 3, …, n; i.e., yk = x1 ⊗ x2 ⊗ … ⊗ xk.
Suppose x1, x2, …, xn are stored in order in a linked list. Define the notation [i, j] = xi ⊗ xi+1 ⊗ … ⊗ xj.

Prefix computation
LIST-PREFIX(L)
  for each processor i, in parallel
    do y[i] ← x[i]
  while there exists an object i such that next[i] ≠ nil
    do for each processor i, in parallel
         do if next[i] ≠ nil
              then y[next[i]] ← y[i] ⊗ y[next[i]]
                   next[i] ← next[next[i]]

Prefix computation – EREW algorithm
[Slide figure, panels (a)-(d): a six-element list in which node k is labelled [k, k] initially; after each pointer-jumping round the interval covered by y[k] doubles, ending with y[k] = [1, k] for every k.]
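
A sketch of LIST-PREFIX under the same array representation used for list ranking above; op stands for the associative operation ⊗, and the snapshots again model reads-before-writes within a round.

    def list_prefix(x, next_, op):
        # After the rounds finish, y[i] = x1 op x2 op ... op xi
        # for the object in position i of the original list.
        n = len(x)
        y, nxt = list(x), list(next_)
        while any(p is not None for p in nxt):
            y_old, nxt_old = list(y), list(nxt)
            for i in range(n):
                if nxt_old[i] is not None:
                    # Distinct objects have distinct successors,
                    # so these writes never collide.
                    y[nxt_old[i]] = op(y_old[i], y_old[nxt_old[i]])
                    nxt[i] = nxt_old[nxt_old[i]]
        return y

For example, list_prefix([1, 2, 3, 4], [1, 2, 3, None], lambda a, b: a + b) returns the prefix sums [1, 3, 6, 10].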

Find root – CREW algorithm
Suppose we have a forest of binary trees in which each node i has a pointer parent[i]. We want each node to find the identity of the root of its tree. Assume that each node is associated with a processor and that each node i has a field root[i].

CREW algorithm
FIND-ROOTS(F)
  1 for each processor i, in parallel
  2   do if parent[i] = nil
  3        then root[i] ← i
  4 while there exists a node i such that parent[i] ≠ nil
  5   do for each processor i, in parallel
  6        do if parent[i] ≠ nil
  7             then root[i] ← root[parent[i]]
  8                  parent[i] ← parent[parent[i]]

Running time: O(log d), where d is the height of the deepest tree in the forest. All the writes are exclusive, but the read in line 7 (root[parent[i]]) is concurrent, since several nodes may have the same node as parent.
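
A sequential simulation of FIND-ROOTS, assuming the forest is given as an array of parent indices with None for roots; the concurrent reads correspond to several children consulting the same parent's fields in one round.

    def find_roots(parent):
        # Pointer jumping up the trees: the distance to the root
        # halves each round, so O(log d) rounds suffice.
        n = len(parent)
        par = list(parent)
        root = [i if par[i] is None else None for i in range(n)]
        while any(p is not None for p in par):
            par_old, root_old = list(par), list(root)
            for i in range(n):
                if par_old[i] is not None:
                    # Several i may share the same par_old[i]: a
                    # concurrent read, which is why this is CREW.
                    root[i] = root_old[par_old[i]]
                    par[i] = par_old[par_old[i]]
        return root

For the chain 2 -> 1 -> 0 with node 0 as root, find_roots([None, 0, 1]) returns [0, 0, 0].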

CREW vs. EREW
Q: How fast can n nodes in a forest determine their roots using only exclusive reads?
A: Ω(log n).
Argument: with exclusive reads, a given piece of information can be copied to only one other memory location in each step, so the number of locations containing a given piece of information at most doubles per step. Consider a forest consisting of one tree of n nodes: the root's identity is stored in one place initially; after the first step it is stored in at most two places, after the second step in at most four places, and so on, so lg n steps are needed before it can be stored in all n places.

Find maximum – CRCW algorithm
FAST-MAX(A)
  n ← length[A]
  for i ← 0 to n-1, in parallel
    do m[i] ← true
  for i ← 0 to n-1 and j ← 0 to n-1, in parallel
    do if A[i] < A[j]
         then m[i] ← false
  for i ← 0 to n-1, in parallel
    do if m[i] = true
         then max ← A[i]
  return max
Example, for A = (5, 6, 9, 2, 9) (an entry is T iff A[i] < A[j]):
  A[i] \ A[j] |  5  6  9  2  9 | m[i]
       5      |  F  T  T  F  T |  F
       6      |  F  F  T  F  T |  F
       9      |  F  F  F  F  F |  T
       2      |  T  T  T  F  T |  F
       9      |  F  F  F  F  F |  T
  max = 9
The running time is O(1).
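
A sketch of FAST-MAX as a sequential simulation; the double loop stands in for the n^2 processors that act in a single parallel step on the common-CRCW PRAM, and the function name is illustrative.

    def fast_max(A):
        # m[i] remains true only if A[i] is never beaten by any A[j].
        n = len(A)
        m = [True] * n
        # One parallel step: processor (i, j) writes false into m[i]
        # when A[i] < A[j]. Concurrent writes are legal here because
        # every writer stores the same value (common-write model).
        for i in range(n):
            for j in range(n):
                if A[i] < A[j]:
                    m[i] = False
        # One more parallel step: every i with m[i] true writes A[i]
        # into max; if the maximum occurs several times, all writers
        # agree on the value, so the common-write rule is satisfied.
        return next(A[i] for i in range(n) if m[i])

fast_max([5, 6, 9, 2, 9]) returns 9, matching the table above.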

CRCW vs. EREW
Finding the maximum on an EREW PRAM takes Ω(lg n) time.
Argument: consider how many elements "think" they might be the maximum. Initially there are n; after the first step at most n/2; after the second step at most n/4; and so on: each step can at most halve the count. Moreover, even on a CREW PRAM, finding the maximum takes Ω(log n) time.

CRCW vs. EREW
- Some say CRCW is easier to program and faster.
- Others say the hardware needed for CRCW is slower than that for EREW, so one cannot really find the maximum in O(1) time.
- Still others say both EREW and CRCW are the wrong models: processors must be connected by a network and can communicate with each other only via that network, so the network should be part of the model.

Thank You.