Presentation is loading. Please wait.

Presentation is loading. Please wait.

PRAM and Parallel Computing

Similar presentations


Presentation on theme: "PRAM and Parallel Computing"— Presentation transcript:

1 PRAM and Parallel Computing
Jie Liu, Ph.D. Professor Computer Science Division Western Oregon University Monmouth, Oregon, USA

2 outline The fastest computers The PRAM model
The O(1) algorithm that finds the max An elegant parallel merge sorting algorithm A practical parallel sorting algorithm we developed Amdahl’s Law and Gustafson-Barsis’ Law Technologies we should pay attention Q&A session

3 World top 10 fastest computers

4 Multi-Core Programming
Sequential  Parallel 

5 PRAM

6 More About PRAM Each PRAM processor can either
Perform the prescribed operation (the same for all processors), Carry out an I/O operation, Idle, or Activate another processor So, it takes n processors to activate another n processors, then we have 2n active processors Now two questions What happens if two processors write to the same memory location? How many steps does it take to activate p processors

7 Handling Writing Conflicts in PRAM
EREW (Exclusive Read Exclusive Write) CREW (Concurrent Read Exclusive Write) CRCW (Concurrent Read Concurrent Write) Common– all the values are the same Arbitrary – pick a value and set it Priority – the processors with the highest priority is the winner A multi-core computer is which one of the above?

8 Activating n Processors
What is the complexity – O(log n)? It forms a binomal tree

9 Finding Max in a constant time O(1)
Input: an array of n integers arrA[0..n-1] Output: the largest of numbers in arrA[0..n-1] Global variable arrB[0..n-1], i, j Assume the computer is a CRCW/Common with n2 CPUs FindignMax(arrA[0..n-1]) { for all where 0 <= i < n-1 arrB[i] = 1 for all where 0 <= i, j < n-1 if (arrA[i] < arrA[j]) arrB[i] = 0 for all where 0 <= i < n-1 if (arrB[i] = 1) return arrA[i] }

10 Finding Max – how does it work
After line 2, every B[i] is 1 for all where 0 <= i < n-1 arrB[i] = 1 for all where 0 <= i, j < n-1 if (arrA[i] < arrA[j]) arrB[i] = 0 for all where 0 <= i < n-1 if (arrB[i] = 1) return arrA[i] Write a 0 to B[i] if A[i] is smaller than any element in A because it is CRCW/Common

11 Finding Max questions How to do it sequentially, and what is the complexity then? How to do it in parallel, and what is the complexity? How many processors are needed? On the PRAM, what is the min amount of time required to run the algorithm, assuming only is activated? What is the cost? Cost of a parallel algorithm is defined to be the Number of processors X execution time For our algorithm, the cost is O(n2), or even O(n2 log n) while the sequential one is O(n), so ours is NOT cost optimal

12 Merging Two Sorted Arrays
The problem: n is an even number. An array of size n stores two sorted sequence of integers of size n/2, we need to merge the two sorted segment in O(log (n)) steps.

13 Merging Two Sorted Arrays (2)
The sequential approach uses two yardsticks and has no concurrency to exploit Calling for new algorithms Key idea: if we know there are k elements smaller than A[i], we can copy A[i] to A[k] in one step. If i<n/2, then there are i -1 elements smaller than A[i] (assuming array is 1 based). Now how can we find the number of elements in the second half of A that is also smaller than A[i]  binary search (a log (n) algorithm)! The sequential algorithm identifies a spot and find the element to occupy the spot. The parallel algorithm find identifies an element and find the spot it needs to occupy

14 Merging Two Sorted Arrays In Parallel
//A[1] to A[n/2] and A[n/2 +1] to A[n] are two sorted sections MergeArray(A[1..n]) { int x, low, high, index for all where 1 <= i <= n // The lower half search the upper half, the upper half search for the lower half { high = 1 // assuming it is the upper half low = n/2 If i <= (n/2) { high = n low = (n/2) + 1} x = A[i] // perform binary search Repeat { index = If x < A[index] high = index – 1 else low = index + 1 } until low > high A[high + I – n/2] = x

15 A practical parallel sorting algorithm
Sorting on a real shared memory parallel computers has its uniqueness The entire array is accessible The number of processors is much much less than the number of elements Generally there must be some partitioning of data Data move distance is irrelevant to costs We developed a practical algorithms also used this “Move to” idea My students called our algorithm the Jie-Sort, I call it the J-Sort

16 J-Sort through an example
The array 5 17 42 3 9 22 51 26 15 32 19 99 Marking S1 1 Prefix Sum S1 2 4 6 7 Marking S2 Prefix Sum S2 Partitioned Array 52 What if you have only 4 processors

17 When fix the number of processors
First, divide in to the p = 4 chunks Find sizes of the S1 and S2 for each chunk Perform prefix sum on size arrays, Copy the elements

18 J-Sort is cost optimal We proved that
Which means the cost is O(n log n), which is the cost of merge sort, and the lower bound of comparison bases sorting algorithms, so J-Sort is cost optimal!!!

19 J-Sort Perform

20 Amdahl’s Law and Gustafson-Barsis’ Law
Amdahl’s Law: Let s be the fraction of operations in a computation that must be performed sequentially, where 0≤ s ≤ 1. The maximum speedup  achievable by a parallel computer with p processors performing the computation is Gustafson-Barsis’s Law: Given a parallel program solving a problem using p processors, let s denote the fraction of the total execution performed sequentially. The maximum speedup  achievable by this program is These two laws contradict with each other. How can we explain this contradiction?

21 10 technologies we should pay attention
5G + Cloud + “As a service” model Big Data/BI/ML + deep learning DBMS for analytics Autonomous Vehicles Block chain Artificial Intelligence Virtual & Augmented Reality Internet of Things Parallel Processing Mobile software development Android surpassed Windows and is the most popular OS


Download ppt "PRAM and Parallel Computing"

Similar presentations


Ads by Google