PRAM (Parallel Random Access Machine)


PRAM (Parallel Random Access Machine)
David Rodriguez-Velazquez, Spring 2009, CS-6260, Dr. Elise de Doncker

Overview
- What is a machine model?
- Why do we need a model?
- RAM
- PRAM
- Steps in a computation
- Write conflicts
- Examples

A Parallel Machine Model
What is a machine model?
- It describes a "machine": it specifies which operations the machine supports and assigns a cost to each of them.
Why do we need a model?
- It makes it easier to reason about algorithms.
- It lets us derive complexity bounds.
- It lets us analyze the maximum parallelism available in a problem.

RAM (Random Access Machine)
- An unbounded number of local memory cells.
- Each memory cell can hold an integer of unbounded size.
- The instruction set includes simple arithmetic and data operations, comparisons, and branches.
- Every operation takes unit time.
- Time complexity = number of instructions executed.
- Space complexity = number of memory cells used.

PRAM (Parallel Random Access Machine)
Definition: an abstract machine for designing algorithms for parallel computers.
- A PRAM M' is a system <M, X, Y, A> of infinitely many RAMs M1, M2, …; each Mi is called a processor of M'. All processors are assumed to be identical, and each one can recognize its own index i.
- Input cells X(1), X(2), …
- Output cells Y(1), Y(2), …
- Shared memory cells A(1), A(2), …

PRAM (Parallel RAM)
- An unbounded collection of RAM processors P0, P1, …; the processors have no tapes.
- Each processor has an unbounded set of local registers.
- An unbounded collection of shared memory cells.
- Every processor can access any shared memory cell in unit time.
- All communication goes through the shared memory.

PRAM (one step of a computation)
A step consists of 5 phases, carried out in parallel by all processors. Each processor:
1. Reads a value from one of the input cells X(1), …, X(N)
2. Reads one of the shared memory cells A(1), A(2), …
3. Performs some internal computation
4. May write into one of the output cells Y(1), Y(2), …
5. May write into one of the shared memory cells A(1), A(2), …
Example: for all i, do A[i] = A[i-1] + 1. The read of A[i-1], the addition, and the write of A[i] happen synchronously across all processors, so every processor reads the old value of A[i-1] before any processor writes.
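A minimal Python sketch (not from the original slides) of what "synchronously" means for this example; the function name and the use of a plain list for the shared cells are illustrative assumptions:

```python
def synchronous_step(A):
    """One PRAM step of 'for all i: A[i] = A[i-1] + 1'.

    Read phase: every processor i reads A[i-1] at the same time.
    Write phase: every processor i then writes A[i].
    Because all reads use the OLD values, the result differs from a
    sequential left-to-right loop over the same assignment.
    """
    old = [A[i - 1] for i in range(1, len(A))]   # simultaneous READ substep
    for i in range(1, len(A)):                   # simultaneous WRITE substep
        A[i] = old[i - 1] + 1
    return A

# synchronous_step([0, 0, 0, 0]) -> [0, 1, 1, 1]
# (a sequential loop would instead give [0, 1, 2, 3])
```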

PRAM (Parallel RAM)
- Some subset of the processors can remain idle during a step.
- [Figure: processors P0, …, PN connected to the shared memory cells.]
- Two or more processors may read simultaneously from the same cell.
- A write conflict occurs when two or more processors try to write simultaneously into the same cell.

Shared Memory Access Conflicts
PRAMs are classified by their read/write abilities (trading realism against power):
- Exclusive Read (ER): processors may read simultaneously only from distinct memory locations.
- Exclusive Write (EW): processors may write simultaneously only to distinct memory locations.
- Concurrent Read (CR): processors may simultaneously read from any memory location.
- Concurrent Write (CW): processors may simultaneously write to any memory location.
The standard combinations are EREW, CREW, and CRCW.
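The classification can be illustrated with a small, hypothetical helper that looks at the shared-memory addresses touched in one step and reports the weakest variant on which that step is legal (a sketch, not part of the slides):

```python
def classify_step(read_addrs, write_addrs):
    """Given the addresses read and written by all processors in one step,
    return the weakest PRAM variant on which the step is allowed."""
    exclusive_read = len(read_addrs) == len(set(read_addrs))
    exclusive_write = len(write_addrs) == len(set(write_addrs))
    if exclusive_read and exclusive_write:
        return "EREW"   # also legal on CREW and CRCW
    if exclusive_write:
        return "CREW"   # concurrent reads, but all writes go to distinct cells
    return "CRCW"       # concurrent writes require a write-conflict rule

# Example: three processors all read cell 7 and write distinct cells 1, 2, 3.
# classify_step([7, 7, 7], [1, 2, 3]) -> "CREW"
```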

Concurrent Write (CW)
Which value gets written in the end?
- Priority CW: processors have fixed priorities; the highest-priority writer is allowed to complete its WRITE.
- Common CW: all processors are allowed to complete the WRITE iff all the values being written are equal.
- Arbitrary/Random CW: one arbitrarily (or randomly) chosen processor is allowed to complete its WRITE.
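A hedged sketch of how a PRAM simulator might resolve a write conflict under each policy; the function name and the priority convention (lower processor id = higher priority) are assumptions made for illustration:

```python
import random

def resolve_concurrent_write(requests, policy):
    """requests: list of (processor_id, value) pairs, all targeting the SAME cell.
    Returns the value that ends up in the cell under the given CW policy."""
    if policy == "priority":
        # Assumed convention: the lowest processor id has the highest priority.
        return min(requests, key=lambda pv: pv[0])[1]
    if policy == "common":
        values = {v for _, v in requests}
        if len(values) != 1:
            raise ValueError("Common CW: processors tried to write different values")
        return values.pop()
    if policy == "arbitrary":
        return random.choice(requests)[1]
    raise ValueError(f"unknown policy: {policy}")

# resolve_concurrent_write([(3, 42), (0, 99)], "priority") -> 99
```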

Strengths of PRAM
PRAM is an attractive and important model for designers of parallel algorithms. Why?
- It is natural: the number of operations executed per cycle on p processors is at most p.
- It is strong: any processor can read or write any shared memory cell in unit time.
- It is simple: it abstracts away all communication and synchronization overhead, which makes the complexity and correctness of PRAM algorithms easier to establish.
- It can be used as a benchmark: if a problem has no feasible/efficient solution on a PRAM, it has no feasible/efficient solution on any parallel machine.

Computational Power
Model A is computationally stronger than model B (A >= B) iff any algorithm written for B runs unchanged on A in the same parallel time and with the same basic properties.
Priority >= Arbitrary >= Common >= CREW >= EREW
(left end: most powerful, least realistic; right end: least powerful, most realistic)

An Initial Example
How do you add N numbers residing in memory locations M[0], M[1], …, M[N-1]?
- Serial algorithm: O(N).
- PRAM algorithm using N processors P0, P1, P2, …, PN-1?

PRAM Algorithm (Parallel Addition)
[Figure: a binary tree of additions. In step 1 each active processor adds a pair of adjacent elements; in step 2 half as many processors (e.g. P0, P2) add the partial sums; after the final step P0 holds the total.]

PRAM Algorithm (Parallel Addition)
- log(n) steps = time needed
- n/2 processors needed
- Speed-up = n / log(n)
- Efficiency = 1 / log(n)
- The same scheme applies to other operations as well: +, *, min/max (via <, >), etc.
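For concreteness, a small Python simulation of the tree-style parallel addition described above; the names and data layout are illustrative, not taken from the slides:

```python
def pram_parallel_sum(values):
    """Simulate the logarithmic-depth PRAM summation.

    In round d, each active processor adds the partial sum located 2**d
    positions to its right. After about log2(n) rounds the total sits in
    position 0. A real PRAM performs each round in one parallel step using
    at most n/2 processors."""
    a = list(values)
    n = len(a)
    rounds = 0
    stride = 1
    while stride < n:
        # READ substep: all active processors fetch both operands first.
        pending = [(i, a[i], a[i + stride])
                   for i in range(0, n - stride, 2 * stride)]
        # WRITE substep: then all of them store their partial sums.
        for i, x, y in pending:
            a[i] = x + y
        stride *= 2
        rounds += 1
    return a[0], rounds

# pram_parallel_sum([3, 1, 4, 1, 5, 9, 2, 6]) -> (31, 3): sum 31 in 3 rounds.
```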

Example 2
A p-processor PRAM and n numbers (p <= n). Does x occur among the n numbers? P0 holds x initially, and P0 must know the answer at the end.
Algorithm (see the cost table and sketch below):
1. Inform every processor what x is.
2. Every processor checks its block of ceil(n/p) numbers and sets a flag.
3. Check whether any of the flags is set to 1.

Example 2 (cost of each phase)

  Phase                                         EREW     CREW     CRCW (common)
  1. Inform everyone what x is                  log(p)   1        1
  2. Each processor checks ceil(n/p) numbers
     and sets a flag                            n/p      n/p      n/p
  3. Check whether any flag is set to 1         log(p)   log(p)   1

Total: O(log p + n/p) on EREW and CREW, O(n/p) on common CRCW.
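A sketch of the three phases in Python, with sequential code standing in for the parallel steps; the function pram_search and its parameters are assumptions made for illustration:

```python
def pram_search(numbers, x, p):
    """Follow the three phases of the search example on a simulated p-processor PRAM."""
    n = len(numbers)
    chunk = -(-n // p)                     # ceil(n / p) numbers per processor

    # Phase 1: broadcast x. On CREW/CRCW every processor reads x from one
    # shared cell in a single step; on EREW it takes about log2(p) doubling
    # rounds. Here we only model the final state.
    local_x = [x] * p

    # Phase 2: each processor scans its own block and sets a flag (n/p steps).
    flags = [any(v == local_x[i] for v in numbers[i * chunk:(i + 1) * chunk])
             for i in range(p)]

    # Phase 3: OR the p flags together. Common CRCW does this in one step
    # (every processor with flag 1 writes 1 into the same cell); EREW and
    # CREW use a log2(p)-step reduction tree instead.
    return any(flags)

# pram_search([5, 8, 1, 9, 3, 7], 9, p=3) -> True
```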

Some Variants of PRAM
- Bounded number of shared memory cells: small-memory PRAM. (If the input data set exceeds the capacity of the shared memory, input/output values can be distributed evenly among the processors.)
- Bounded number of processors: small PRAM. (If the number of threads of execution is higher, processors may interleave several threads.)
- Bounded size of a machine word: the word size of the PRAM.
- Handling access conflicts: constraints on simultaneous access to shared memory cells.

Lemma Assume p’<p. Any problem that can be solved for a p processor PRAM in t steps can be solved in a p’ processor PRAM in t’ = O(tp/p’) steps (assuming same size of shared memory) Proof: Partition p is simulated processors into p’ groups of size p/p’ each Associate each of the p’ simulating processors with one of these groups Each of the simulating processors simulates one step of its group of processors by: executing all their READ and local computation substeps first executing their WRITE substeps then

Lemma Assume m’<m. Any problem that can be solved for a p processor and m- cell PRAM in t steps can be solved on a max(p,m’)-processors m’-cell PRAM in O(tm/m’) steps Proof: Partition m simulated shared memory cells into m’ continuous segments Si of size m/m’ each Each simulating processor P’i 1<=i<=p, will simulate processor Pi of the original PRAM Each simulating processor P’i 1<=i<=m’, stores the initial contents of Si into its local memory and will use M’[i] as an auxiliary memory cell for simulation of accesses to cell of Si Simulation of one original READ operation Each P’i i=1,…,max(p,m’) repeats for k=1,…,m/m’ write the value of the k-th cell of Si into M’[i] i=1…,m’, read the value which the simulated processor Pi i=1,…,,p, would read in this simulated substep, if it appeared in the shared memory The local computation substep of Pi i=1..,p is simulated in one step by P’i Simulation of one original WRITE operation is analogous to that of READ

Conclusions
- We need some model in order to reason about, compare, analyze, and design algorithms.
- PRAM is simple and easy to understand, and comes with a rich set of theoretical results.
- It is over-simplistic and often not realistic.
- Programs written for these machines are, in general, of type MIMD; special cases such as SIMD can also be handled in this framework.

Question
Why is PRAM an attractive and important model for designers of parallel algorithms?
- It is natural: the number of operations executed per cycle on p processors is at most p.
- It is strong: any processor can read or write any shared memory cell in unit time.
- It is simple: it abstracts away all communication and synchronization overhead, which makes the complexity and correctness of PRAM algorithms easier to establish.
- It can be used as a benchmark: if a problem has no feasible/efficient solution on a PRAM, it has no feasible/efficient solution on any parallel machine.