PRAM and Parallel Computing


PRAM and Parallel Computing
Jie Liu, Ph.D.
Professor, Computer Science Division
Western Oregon University, Monmouth, Oregon, USA
liuj@wou.edu

Outline
- The fastest computers
- The PRAM model
- The O(1) algorithm that finds the max
- An elegant parallel merge sorting algorithm
- A practical parallel sorting algorithm we developed
- Amdahl's Law and Gustafson-Barsis' Law
- Technologies we should pay attention to
- Q&A session

World top 10 fastest computers

Multi-Core Programming: from sequential to parallel

PRAM

More About PRAM
Each PRAM processor can, in each step, either:
- perform the prescribed operation (the same for all processors),
- carry out an I/O operation,
- idle, or
- activate another processor.
So n active processors can activate another n processors in one step, after which we have 2n active processors.
Now, two questions:
- What happens if two processors write to the same memory location?
- How many steps does it take to activate p processors?

Handling Write Conflicts in PRAM
- EREW (Exclusive Read, Exclusive Write)
- CREW (Concurrent Read, Exclusive Write)
- CRCW (Concurrent Read, Concurrent Write), with three ways to resolve concurrent writes:
  - Common: all the written values must be the same
  - Arbitrary: pick one of the values and store it
  - Priority: the processor with the highest priority wins
Which one of the above is a multi-core computer?
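
As an illustration, here is a minimal C sketch (mine, not from the slides; the function resolve_write and the Policy enum are hypothetical) that simulates p processors all writing to one shared cell in the same step under the three CRCW rules:

    #include <stdio.h>
    #include <stdlib.h>

    typedef enum { COMMON, ARBITRARY, PRIORITY } Policy;

    /* vals[i] is the value processor i tries to write in this step;
     * processor 0 has the highest priority. Returns the stored value. */
    int resolve_write(Policy policy, const int *vals, int p) {
        switch (policy) {
        case COMMON:
            /* Legal only if every writer supplies the same value. */
            for (int i = 1; i < p; i++)
                if (vals[i] != vals[0]) {
                    fprintf(stderr, "COMMON rule violated\n");
                    exit(1);
                }
            return vals[0];
        case ARBITRARY:
            /* Any writer may win; model that as a random choice. */
            return vals[rand() % p];
        case PRIORITY:
            /* The highest-priority (lowest-numbered) processor wins. */
            return vals[0];
        }
        return 0; /* unreachable */
    }

    int main(void) {
        int vals[] = {0, 0, 0, 0};
        printf("cell = %d\n", resolve_write(COMMON, vals, 4)); /* 0 */
        return 0;
    }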

Activating n Processors
What is the complexity? O(log n): in each step every active processor activates one more, so the number of active processors doubles each step, forming a binomial tree.
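
The doubling pattern is easy to see in a short C sketch (mine, not from the slides):

    #include <stdio.h>

    /* One processor is active initially; in each step every active
     * processor activates one more, doubling the count until it
     * reaches p. Returns the number of steps, i.e. ceil(log2(p)). */
    int activation_steps(int p) {
        int active = 1, steps = 0;
        while (active < p) {
            active *= 2;
            steps++;
        }
        return steps;
    }

    int main(void) {
        printf("p = 1000 needs %d steps\n", activation_steps(1000)); /* 10 */
        return 0;
    }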

Finding Max in Constant Time, O(1)
Input: an array of n integers, arrA[0..n-1]
Output: the largest of the numbers in arrA[0..n-1]
Global variables: arrB[0..n-1], i, j
Assume the computer is a CRCW/Common PRAM with n² processors.

    FindingMax(arrA[0..n-1]) {
        for all i where 0 <= i <= n-1
            arrB[i] = 1
        for all i, j where 0 <= i, j <= n-1
            if (arrA[i] < arrA[j])
                arrB[i] = 0
        for all i where 0 <= i <= n-1
            if (arrB[i] == 1)
                return arrA[i]
    }
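
Below is a runnable C simulation of the algorithm (my sketch, not the author's code): the nested loops stand in for the n² processors, and since every conflicting write into arrB stores the same value 0, the Common rule is respected.

    #include <stdio.h>

    /* Sequential simulation of the O(1) CRCW/Common max algorithm;
     * each loop iteration stands in for one PRAM processor. */
    int finding_max(const int *arrA, int n) {
        int arrB[256];                 /* demo buffer; assumes n <= 256 */

        for (int i = 0; i < n; i++)    /* step 1: n processors set flags */
            arrB[i] = 1;

        for (int i = 0; i < n; i++)    /* step 2: n*n processors compare */
            for (int j = 0; j < n; j++)
                if (arrA[i] < arrA[j])
                    arrB[i] = 0;       /* all writers store 0: Common-legal */

        for (int i = 0; i < n; i++)    /* step 3: the surviving flag wins */
            if (arrB[i] == 1)
                return arrA[i];
        return arrA[0];                /* not reached for n >= 1 */
    }

    int main(void) {
        int a[] = {5, 17, 42, 3, 9};
        printf("max = %d\n", finding_max(a, 5)); /* prints 42 */
        return 0;
    }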

Finding Max: How Does It Work?
After the first for-all step, every arrB[i] is 1:

    for all i where 0 <= i <= n-1
        arrB[i] = 1

The second step writes a 0 to arrB[i] whenever arrA[i] is smaller than some element of arrA. Many processors may write to the same arrB[i] at once, but they all write the same value 0, which is exactly what CRCW/Common permits:

    for all i, j where 0 <= i, j <= n-1
        if (arrA[i] < arrA[j]) arrB[i] = 0

The only arrB[i] left holding 1 belongs to the maximum, and the last step returns it:

    for all i where 0 <= i <= n-1
        if (arrB[i] == 1) return arrA[i]

Finding Max: Questions
- How would we do it sequentially, and what is the complexity then?
- How do we do it in parallel, and what is the complexity? How many processors are needed?
- On the PRAM, what is the minimum amount of time required to run the algorithm, assuming only one processor is active at the start?
- What is the cost? The cost of a parallel algorithm is defined as number of processors × execution time. For our algorithm the cost is O(n²), or even O(n² log n) counting processor activation, while the sequential algorithm costs O(n), so ours is NOT cost optimal.

Merging Two Sorted Arrays
The problem: n is an even number, and an array of size n stores two sorted sequences of integers, each of size n/2. We need to merge the two sorted segments in O(log n) steps.

Merging Two Sorted Arrays (2)
The sequential approach uses two yardsticks and has no concurrency to exploit, so a new algorithm is called for.
Key idea: if we know there are k elements smaller than A[i], we can copy A[i] directly to its final position in one step. If i <= n/2, then there are i-1 elements smaller than A[i] in its own half (assuming the array is 1-based). How can we find the number of elements in the second half of A that are also smaller than A[i]? Binary search, an O(log n) algorithm!
The sequential algorithm picks a spot and finds the element to occupy it; the parallel algorithm picks an element and finds the spot it needs to occupy.

Merging Two Sorted Arrays in Parallel

    // A[1] to A[n/2] and A[n/2 + 1] to A[n] are two sorted sections
    MergeArray(A[1..n]) {
        int x, low, high, index
        for all i where 1 <= i <= n {
            // the lower half searches the upper half and vice versa
            low = 1                  // assume A[i] is in the upper half
            high = n/2
            if (i <= n/2) {          // A[i] is actually in the lower half
                low = n/2 + 1
                high = n
            }
            x = A[i]
            repeat {                 // binary search for x's rank
                index = (low + high) / 2
                if (x < A[index]) high = index - 1
                else low = index + 1
            } until low > high
            A[high + i - n/2] = x
        }
    }
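
Here is a runnable C version of the same idea (my reconstruction, run sequentially): each element binary-searches the opposite half for its rank and is scattered into a separate buffer, because a sequential program cannot overwrite A in place the way lockstep PRAM processors can. It assumes all elements are distinct; with duplicates, one half would need a <= comparison to break ties.

    #include <stdio.h>
    #include <string.h>

    /* Merge sorted A[0..n/2-1] and A[n/2..n-1] by rank computation.
     * The loop over i simulates n PRAM processors in lockstep; writes
     * go to B since the PRAM's read-then-write step cannot be done
     * in place sequentially. Assumes distinct elements and n <= 256. */
    void merge_by_ranking(int *A, int n) {
        int B[256];
        int half = n / 2;
        for (int i = 0; i < n; i++) {
            int x = A[i];
            /* binary-search the opposite half */
            int low  = (i < half) ? half : 0;
            int high = (i < half) ? n - 1 : half - 1;
            while (low <= high) {
                int mid = (low + high) / 2;
                if (x < A[mid]) high = mid - 1;
                else            low = mid + 1;
            }
            /* rank in own half + count of smaller elements elsewhere */
            int own   = (i < half) ? i : i - half;
            int other = (i < half) ? high - half + 1 : high + 1;
            B[own + other] = x;
        }
        memcpy(A, B, n * sizeof(int));
    }

    int main(void) {
        int A[] = {3, 9, 22, 51, 5, 17, 26, 42};
        merge_by_ranking(A, 8);
        for (int i = 0; i < 8; i++) printf("%d ", A[i]);
        printf("\n");                /* 3 5 9 17 22 26 42 51 */
        return 0;
    }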

A Practical Parallel Sorting Algorithm
Sorting on a real shared-memory parallel computer has its own characteristics:
- The entire array is accessible to every processor.
- The number of processors is much, much smaller than the number of elements, so generally the data must be partitioned.
- Data movement distance is irrelevant to cost.
We developed a practical algorithm that also uses this "move to" idea. My students call our algorithm the Jie-Sort; I call it the J-Sort.

J-Sort Through an Example
The array: 5 17 42 3 9 22 51 26 15 32 19 99
[Worked example on the slide: mark S1, prefix-sum S1, mark S2, prefix-sum S2, then the partitioned array]
What if you have only 4 processors?
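
The partition step behind the example can be sketched in C as follows (my illustration; the slide's own S1/S2 values are not reproduced here, and the pivot choice of 22 is an assumption). S1 marks elements at or below the pivot, S2 marks the rest, and exclusive prefix sums over the marks give every element its destination index, so the copy is a single parallel step.

    #include <stdio.h>

    /* Partition a[0..n-1] around a pivot via mark + prefix-sum +
     * scatter. Each phase is one "for all" step on the PRAM; the
     * prefix sums take O(log n) in parallel but are written
     * sequentially here. Demo assumes n <= 64. */
    void partition_scatter(const int *a, int *out, int n, int pivot) {
        int s1[64], s2[64];
        for (int i = 0; i < n; i++) {     /* phase 1: mark both sets */
            s1[i] = (a[i] <= pivot);
            s2[i] = (a[i] > pivot);
        }
        int n1 = 0, n2 = 0;               /* phase 2: exclusive prefix sums */
        for (int i = 0; i < n; i++) {
            int m1 = s1[i], m2 = s2[i];
            s1[i] = n1; n1 += m1;
            s2[i] = n2; n2 += m2;
        }
        for (int i = 0; i < n; i++)       /* phase 3: scatter in one step */
            out[a[i] <= pivot ? s1[i] : n1 + s2[i]] = a[i];
    }

    int main(void) {
        int a[] = {5, 17, 42, 3, 9, 22, 51, 26, 15, 32, 19, 99}, out[12];
        partition_scatter(a, out, 12, 22);
        for (int i = 0; i < 12; i++) printf("%d ", out[i]);
        printf("\n");  /* 5 17 3 9 22 15 19 42 51 26 32 99 */
        return 0;
    }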

When the Number of Processors Is Fixed
- First, divide the array into p = 4 chunks.
- Find the sizes of S1 and S2 for each chunk.
- Perform prefix sums on the size arrays.
- Copy the elements (see the sketch below).
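
A C sketch of these four steps (mine, not the authors' code; the pivot is again an assumption): each chunk counts its own S1/S2 sizes, two length-p prefix sums turn the sizes into write offsets, and each chunk then copies its elements independently.

    #include <stdio.h>

    #define P 4                        /* fixed number of processors */

    /* Chunked partition around a pivot: each of the P "processors"
     * owns one contiguous chunk. Counting and copying are
     * chunk-parallel; only the length-P prefix sums coordinate. */
    void partition_chunked(const int *a, int *out, int n, int pivot) {
        int c1[P + 1] = {0}, c2[P + 1] = {0};
        int lo[P], hi[P];
        for (int k = 0; k < P; k++) {        /* steps 1-2: chunk and count */
            lo[k] = k * n / P;
            hi[k] = (k + 1) * n / P;
            for (int i = lo[k]; i < hi[k]; i++)
                if (a[i] <= pivot) c1[k + 1]++; else c2[k + 1]++;
        }
        for (int k = 1; k <= P; k++) {       /* step 3: prefix-sum the sizes */
            c1[k] += c1[k - 1];
            c2[k] += c2[k - 1];
        }
        for (int k = 0; k < P; k++) {        /* step 4: each chunk copies */
            int d1 = c1[k], d2 = c1[P] + c2[k];
            for (int i = lo[k]; i < hi[k]; i++)
                out[a[i] <= pivot ? d1++ : d2++] = a[i];
        }
    }

    int main(void) {
        int a[] = {5, 17, 42, 3, 9, 22, 51, 26, 15, 32, 19, 99}, out[12];
        partition_chunked(a, out, 12, 22);
        for (int i = 0; i < 12; i++) printf("%d ", out[i]);
        printf("\n");  /* same result as the per-element version above */
        return 0;
    }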

J-Sort Is Cost Optimal
We proved that the expected parallel running time with p processors is O((n log n) / p). That means the cost is p × O((n log n) / p) = O(n log n), which is the cost of merge sort and the lower bound for comparison-based sorting algorithms, so J-Sort is cost optimal!

J-Sort Performance

Amdahl's Law and Gustafson-Barsis' Law
Amdahl's Law: let s be the fraction of operations in a computation that must be performed sequentially, where 0 <= s <= 1. The maximum speedup achievable by a parallel computer with p processors performing the computation is

    speedup <= 1 / (s + (1 - s) / p)

Gustafson-Barsis' Law: given a parallel program solving a problem using p processors, let s denote the fraction of the total execution time performed sequentially. The maximum speedup achievable by this program is

    speedup <= p + (1 - p) * s

These two laws seem to contradict each other. How can we explain this apparent contradiction?
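
A quick C sketch (mine, not from the slides) makes the contrast concrete: with s = 0.1 and p = 16, Amdahl's Law caps the speedup at 6.4 while Gustafson-Barsis' Law predicts up to 14.5, because the former fixes the problem size and the latter fixes the execution time and lets the problem scale.

    #include <stdio.h>

    /* Amdahl: fixed problem size; s = sequential fraction of the work. */
    double amdahl(double s, int p) { return 1.0 / (s + (1.0 - s) / p); }

    /* Gustafson-Barsis: fixed time; s = sequential fraction of execution. */
    double gustafson(double s, int p) { return p + (1 - p) * s; }

    int main(void) {
        double s = 0.1;
        int p = 16;
        printf("Amdahl:           %.2f\n", amdahl(s, p));    /* 6.40  */
        printf("Gustafson-Barsis: %.2f\n", gustafson(s, p)); /* 14.50 */
        return 0;
    }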

10 Technologies We Should Pay Attention To
- 5G + cloud + the "as a service" model
- Big data / BI / ML + deep learning
- DBMSs for analytics
- Autonomous vehicles
- Blockchain
- Artificial intelligence
- Virtual & augmented reality
- Internet of Things
- Parallel processing
- Mobile software development (Android surpassed Windows as the most popular OS)