Parallel computation models

Parallel computation models
Taxonomy: SIMD, MIMD, Single Program Multiple Data (SPMD).
Communication models:
  Shared-variable communication. Shared memory: the PRAM model.
  Message-passing communication.
  Systolic array: regularly connected, special-purpose hardware for a specific problem. Input is pipelined in one item at a time and synchronized with the clock.
Interconnection networks: bus, crossbar, tree, multistage networks, hypercube, de Bruijn graph, cube-connected cycles.
Convergence: logically shared memory, physically an interconnection network.
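
A minimal sketch of the systolic idea, assuming a linear array of identical compare-and-exchange cells used for sorting (the problem and the name `systolic_sort` are illustrative choices, not taken from the slides). Inputs enter the left end one per tick, and all cells update in lock step on each clock edge, which is the pipelining and clock synchronization described above.

```python
def systolic_sort(values):
    """Simulate a linear systolic array that sorts values in descending order.

    One cell per input value. On every clock tick, each cell that receives a
    value keeps the larger of (stored, incoming) and forwards the smaller one
    to its right-hand neighbour on the next tick.
    """
    values = list(values)
    n = len(values)
    stored = [None] * n    # value latched in each cell
    incoming = [None] * n  # value arriving at each cell on this tick
    next_input = 0
    while next_input < n or any(v is not None for v in incoming):
        if next_input < n:
            incoming[0] = values[next_input]  # pipeline the next input into cell 0
            next_input += 1
        outgoing = [None] * n
        for c in range(n):  # conceptually, all cells act in the same tick
            v = incoming[c]
            if v is None:
                continue
            if stored[c] is None:
                stored[c] = v
            else:
                stored[c], outgoing[c] = max(stored[c], v), min(stored[c], v)
        # clock edge: every cell's output moves one cell to the right at once
        incoming = [None] + outgoing[:-1]
    return stored
```

Cell c always keeps the largest value it has seen, and it only ever sees values smaller than those kept by cells 0..c-1, so cell c ends up holding the (c+1)-th largest input.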

Timing issues
Synchronous: every step is synchronized by a central clock, so execution results are predictable. The PRAM is a special case.
Asynchronous (no central clock): execution results are unpredictable. Example: distributed algorithms.
Partially synchronous: BSP (Bulk Synchronous Parallel) and the LogP model (Latency, overhead, gap, and P).
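
To illustrate the synchronous case, here is a sketch that emulates the central clock with Python's `threading.Barrier`, one thread per PE. The problem (an inclusive prefix sum by recursive doubling) and the name `synchronous_prefix_sum` are assumptions for illustration. Because every PE finishes its read phase before any PE writes, the result does not depend on thread scheduling; without the barriers, reads and writes of different threads could interleave and the output would become unpredictable, as in the asynchronous case.

```python
import threading


def synchronous_prefix_sum(values):
    """Inclusive prefix sum, one thread per PE, kept in lock step by a barrier."""
    a = list(values)
    n = len(a)
    if n == 0:
        return a
    barrier = threading.Barrier(n)  # plays the role of the central clock

    def pe(i):
        d = 1
        while d < n:  # every PE runs the same number of rounds, so waits match
            left = a[i - d] if i >= d else 0  # read phase
            barrier.wait()
            a[i] += left                      # write phase
            barrier.wait()
            d *= 2

    threads = [threading.Thread(target=pe, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a
```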

LogP model (Karp93)
L: upper bound on the latency incurred in sending a one-word message from a source PE to a destination PE.
o: overhead, the length of time a PE is engaged in the transmission or reception of each message; during this time the PE cannot perform other operations.
g: gap, the minimum time interval between consecutive message transmissions or receptions at a PE; 1/g is the per-PE communication bandwidth.
P: the number of PE/memory modules.
A message of length W takes o + gW + L to reach its destination, and processing the reception takes o + gW.
The network has finite capacity: at most ⌈L/g⌉ messages can be in transit from any PE or to any PE at any time.
Example, broadcasting from 1 PE to n PEs:
  PRAM, concurrent read: d units of time, where d is the delay of a memory access.
  PRAM, exclusive read: O(d log_2 n) time.
  LogP: O(d log_d n) time, where d = L + o + g.
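
A small calculator for the broadcast example, assuming a greedy schedule in which every PE that already holds the datum keeps sending it to uninformed PEs as fast as the gap allows. The cost accounting follows the parameter definitions above for one-word messages: a send started at time t occupies the sender for o, spends L in the network, and costs the receiver another o. The function name and the greedy policy are assumptions of this sketch.

```python
import heapq


def logp_broadcast_time(P, L, o, g):
    """Completion time of a greedy one-to-all broadcast under LogP parameters."""
    if P <= 1:
        return 0
    ready = [0]  # min-heap: times at which informed PEs can start their next send
    informed = 1
    finish = 0
    while informed < P:
        t = heapq.heappop(ready)      # earliest available sender
        arrival = t + o + L + o       # send overhead + latency + receive overhead
        informed += 1
        finish = max(finish, arrival)
        heapq.heappush(ready, t + max(g, o))  # sender's next send slot
        heapq.heappush(ready, arrival)        # the new PE can forward from now on
    return finish
```

For example, with P = 8, L = 6, o = 2, g = 4 this schedule finishes at time 24; the first receiver has the datum at 2o + L = 10.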

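CRCW model variations (how concurrent writes to the same cell are resolved)
Common (same value): if multiple PEs attempt to write to a cell, they must all write the same value.
Priority: the PE with the highest priority succeeds.
Random (arbitrary): any one of the values may be written.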
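
The three write rules can be sketched as a single resolution function for one memory cell. The name `resolve_concurrent_write` is hypothetical, and "lower PE id means higher priority" is an assumed convention; the slide only says "highest priority".

```python
import random


def resolve_concurrent_write(writes, rule, rng=random):
    """Return the value stored in one cell after a single CRCW step.

    writes: list of (pe_id, value) pairs aimed at the same cell.
    Assumed convention: a lower pe_id means a higher priority.
    """
    if not writes:
        return None  # nobody wrote; the caller keeps the old contents
    if rule == "common":
        values = {value for _, value in writes}
        if len(values) != 1:
            raise ValueError("Common CRCW: concurrent writers must agree")
        return values.pop()
    if rule == "priority":
        return min(writes, key=lambda w: w[0])[1]
    if rule == "arbitrary":
        return rng.choice(writes)[1]
    raise ValueError(f"unknown rule: {rule!r}")
```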

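Examples of shared-memory algorithms
Adding n numbers.
Maximum of n numbers: O(n), O(log n), and O(1) algorithms.
O(1) algorithm (CRCW, one PE[i,j] for every pair i, j, i.e. n^2 PEs):
1. Initially r[i] = 0, 1 <= i <= n.
2. For all i, j: PE[i,j] reads a[i] and a[j].
3. For all i, j: PE[i,j] sets r[i] = 1 if a[i] < a[j].
4. For all i: PE[i] does {if r[i] = 0 then max = a[i]}.
Step 2 needs concurrent reads. In step 3 every PE that writes r[i] writes the same value 1 (and tied maxima write equal values in step 4), so the Common variant is enough. After step 3, r[i] = 0 exactly when no element exceeds a[i], i.e. a[i] is a maximum.
Applications: Boolean matrix multiplication in O(1) time using O(n^3) PEs.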
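
A sequential simulation of the O(1) maximum and of the Boolean matrix multiplication application. The loops stand in for what would be a single parallel step of n^2 (respectively n^3) PEs; the function names and the assumption of square n x n matrices with C initialized to 0 are illustrative.

```python
def crcw_max(a):
    """Simulate the O(1) Common CRCW maximum with one PE[i, j] per pair."""
    n = len(a)
    r = [0] * n  # step 1
    # steps 2-3: conceptually one parallel step of all PE[i, j]; several PEs
    # may write 1 into the same r[i], which Common CRCW allows (same value)
    for i in range(n):
        for j in range(n):
            if a[i] < a[j]:
                r[i] = 1
    # step 4: every PE[i] with r[i] == 0 holds a maximum; ties write equal values
    result = None
    for i in range(n):
        if r[i] == 0:
            result = a[i]
    return result


def crcw_boolean_matmul(A, B):
    """Boolean product of n x n matrices; conceptually one step of PE[i, j, k]."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if A[i][k] and B[k][j]:
                    C[i][j] = 1  # concurrent writes of 1 compute the OR over k
    return C
```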

Work-optimal maximum
N data items, P PEs.
1. Partition the N data items so that each PE has N/P items.
2. Each PE finds the maximum of its partition.
3. Find the maximum among the P partial results.
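
A sketch of the three steps, assuming step 3 is done with a pairwise (EREW) tree. Time is then O(N/P + log P); with P = N / log N PEs both terms are O(log N) and the total work is O(N), the same as a sequential scan, which is what makes the algorithm work-optimal. The name `work_optimal_max` is hypothetical, and the function also returns the number of tree rounds to make the log P term visible.

```python
def work_optimal_max(data, p):
    """Maximum of N items on p PEs. Returns (maximum, number_of_tree_rounds)."""
    n = len(data)
    if n == 0 or p < 1:
        raise ValueError("need at least one item and one PE")
    size = -(-n // p)  # ceil(N / P) items per PE
    partitions = [data[k * size:(k + 1) * size] for k in range(p)]
    # step 2: each PE scans its own partition sequentially, O(N/P) time
    partial = [max(part) for part in partitions if part]
    # step 3: EREW tree; in each round PE 2k combines its value with PE 2k+1's
    rounds = 0
    while len(partial) > 1:
        partial = [max(partial[k:k + 2]) for k in range(0, len(partial), 2)]
        rounds += 1
    return partial[0], rounds
```

For example, `work_optimal_max(list(range(100)), 8)` returns `(99, 3)`: eight partial maxima are combined in log2 8 = 3 rounds.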

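Simulation of the CRCW model using EREW
Theorem: each step of a Priority CRCW PRAM with p PEs can be simulated by an EREW PRAM in O(log p) steps.
Proof sketch: each CRCW step can be simulated by a tournament of EREW steps that selects, for each memory cell, the highest-priority request; the tournament takes O(log p) time.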
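
One common way to realize the tournament for a write step is by sorting; this is an assumption of the sketch, since the slide only says "tournament". The name `simulate_priority_crcw_write` is hypothetical, lower PE id again means higher priority, and Python's built-in sort stands in for an O(log p) EREW sorting algorithm.

```python
def simulate_priority_crcw_write(requests, memory):
    """Simulate one Priority CRCW write step using exclusive-access phases.

    requests[pid] is (address, value) or None; a lower pid has higher priority.
    """
    # Phase 1: PE pid writes (address, pid) into its own slot (exclusive writes).
    slots = [(req[0], pid) for pid, req in enumerate(requests) if req is not None]
    # Phase 2: sort by (address, pid). An EREW PRAM can do this in O(log p)
    # time (e.g. Cole's merge sort); Python's sort stands in for it here.
    slots.sort()
    # Phase 3: slot k wins iff it is the first slot for its address. Reading
    # slots[k - 1] and slots[k] can be split into two sub-steps to keep reads
    # exclusive; winners target distinct addresses, so the writes are exclusive.
    for k, (address, pid) in enumerate(slots):
        if k == 0 or slots[k - 1][0] != address:
            memory[address] = requests[pid][1]
    return memory
```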