Computer Science and Engineering Parallel and Distributed Processing CSE 8380 February 3, 2005 Session 7
Contents
  Abstract Models
  PRAM Model
  Complexity Analysis
  Introduction to Parallel Algorithms
  Sorting
What is a Model?
  According to Webster’s Dictionary, a model is “a description or analogy used to help visualize something that cannot be directly observed.”
  According to The Oxford English Dictionary, a model is “a simplified or idealized description or conception of a particular system, situation or process.”
Why Models?
  In general, the purpose of modeling is to capture the salient characteristics of phenomena with clarity and the right degree of accuracy to facilitate analysis and prediction.
  Maggs, Matheson and Tarjan (1995)
Models in Problem Solving
  Computer scientists use models to help design problem-solving tools such as:
    Fast algorithms
    Effective programming environments
    Powerful execution engines
An Interface
  A model is an interface separating high-level properties from low-level ones.
  [Figure: the MODEL drawn as an interface between Applications and Architectures, labeled "Provides operations" and "Requires implementation".]
Models in this class
  Shared Memory Model
  Distributed Memory Model
PRAM Model
  Synchronized Read, Compute, Write cycle
  Variants: EREW, ERCW, CREW, CRCW
  Complexity: T(n), P(n), C(n)
  [Figure: a Control unit and processors P1, P2, ..., Pp, each with its own Private Memory, all attached to a Global Memory.]
The PRAM model and its variations (cont.)
  There are different modes for read and write operations in a PRAM:
    Exclusive read (ER)
    Exclusive write (EW)
    Concurrent read (CR)
    Concurrent write (CW): Common, Arbitrary, Minimum, Priority
  Based on these modes, the PRAM can be further divided into the following four subclasses:
    EREW-PRAM model
    CREW-PRAM model
    ERCW-PRAM model
    CRCW-PRAM model
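The four concurrent-write modes listed above differ only in how they settle a conflict when several processors write to the same cell in one step. A minimal Python sketch of that idea follows. The function name `resolve_concurrent_write` and the exact semantics are assumptions made for illustration: "Minimum" is taken to mean that the smallest value wins (some texts define it by smallest processor index instead), and "Priority" assumes that a lower processor index means higher priority.

```python
import random


def resolve_concurrent_write(writes, policy):
    """Decide which value lands in one memory cell.

    writes: list of (processor_index, value) pairs that target the same
            cell during one PRAM step (hypothetical representation).
    policy: "common", "arbitrary", "minimum", or "priority".
    """
    if not writes:
        raise ValueError("no writes to resolve")
    values = [value for _, value in writes]

    if policy == "common":
        # All processors must agree on the value being written.
        if any(value != values[0] for value in values):
            raise ValueError("common CW requires identical values")
        return values[0]
    if policy == "arbitrary":
        # Any one of the competing writes may succeed.
        return random.choice(values)
    if policy == "minimum":
        # Assumption: the smallest written value is stored.
        return min(values)
    if policy == "priority":
        # Assumption: the lowest processor index has the highest priority.
        return min(writes, key=lambda write: write[0])[1]
    raise ValueError(f"unknown policy: {policy}")
```

For example, with writes `[(3, 7), (1, 9), (2, 5)]` the "minimum" policy stores 5, while the "priority" policy stores 9 because processor 1 wins.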
Analysis of Algorithms
  Sequential Algorithms
    Time complexity
    Space complexity
  An algorithm whose time complexity is bounded by a polynomial is called a polynomial-time algorithm. An algorithm is considered to be efficient if it runs in polynomial time.
Analysis of Sequential Algorithms
  [Figure: the relationships among P, NP, NP-complete, and NP-hard.]
Analysis of parallel algorithms
  The performance of a parallel algorithm is expressed in terms of how fast it is and how many resources it uses when it runs:
    Run time: the time elapsed during the execution of the algorithm
    Number of processors: how many processors the algorithm uses to solve a problem
    Cost: the product of the run time and the number of processors
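Written with the symbols T(n), P(n), and C(n) from the PRAM Model slide, the cost definition is the following. The second line is only an illustrative instance, not a result from these slides: a hypothetical algorithm that uses n processors and runs in O(log n) time.

```latex
C(n) = T(n) \times P(n)
% Illustrative instance (assumed, not from the slides):
% T(n) = O(\log n),\; P(n) = n \;\Rightarrow\; C(n) = O(n \log n)
```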
Analysis of parallel algorithms: the NC class and P-completeness
  [Figure: the relationships among P, NP, NP-complete, NP-hard, NC, and P-complete.]
Simulating multiple accesses on an EREW PRAM
  Broadcasting mechanism:
    P1 reads x and makes it known to P2.
    P1 and P2 make x known to P3 and P4, respectively, in parallel.
    P1, P2, P3 and P4 make x known to P5, P6, P7 and P8, respectively, in parallel.
    These eight processors make x known to another eight processors, and so on.
Simulating multiple accesses on an EREW PRAM (cont.)
  [Figure: simulating concurrent read on an EREW PRAM with eight processors using Algorithm Broadcast_EREW. Panels (a) to (d) show the array L filling with copies of x as it reaches P1, then P2, then P3 and P4, then P5 to P8.]
Parallel Algorithm Constructs
  Processor Pi
  Forall ... Where ... Do in Parallel
  Others
Simulating multiple accesses on an EREW PRAM (cont.)
  Algorithm Broadcast_EREW
    Processor P1
      y (in P1's private memory) ← x
      L[1] ← y
    for i = 0 to log p - 1 do
      forall Pj, where 2^i + 1 ≤ j ≤ 2^(i+1) do in parallel
        y (in Pj's private memory) ← L[j - 2^i]
        L[j] ← y
      endfor
    endfor
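A sequential Python simulation can make the doubling pattern of Broadcast_EREW easier to follow. This is a sketch under stated assumptions rather than a PRAM implementation: the function name `broadcast_erew` is hypothetical, p is assumed to be a power of two, and each parallel step is emulated by reading from a snapshot of L before any writes happen. The assertion checks that no two processors read the same cell in a round, which is the exclusive-read requirement.

```python
def broadcast_erew(x, p):
    """Simulate Broadcast_EREW for p processors (p a power of two).

    Returns (L, rounds), where L[k] is the copy of x held for
    processor k + 1 and rounds is the number of parallel steps used.
    """
    if p < 1 or p & (p - 1):
        raise ValueError("p must be a positive power of two")

    L = [None] * (p + 1)  # 1-based indexing; L[0] is unused
    L[1] = x              # P1 reads x and stores it in L[1]

    rounds = p.bit_length() - 1  # log2(p) for a power of two
    for i in range(rounds):
        low, high = 2**i + 1, 2**(i + 1)
        sources = [j - 2**i for j in range(low, high + 1)]
        # Exclusive read: every processor in this round reads a distinct cell.
        assert len(set(sources)) == len(sources)
        snapshot = list(L)  # all reads happen before all writes
        for j in range(low, high + 1):
            L[j] = snapshot[j - 2**i]
    return L[1:], rounds
```

With p = 8 the loop runs three rounds, filling L[2], then L[3..4], then L[5..8], which matches the broadcasting mechanism described for eight processors.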
Enumeration Sort
  Given a list of n numbers a_1, a_2, …, a_n, we find the position of each element a_i in the sorted list by computing the number of elements smaller than it.
  If c_i elements are smaller than a_i, then a_i is the (c_i + 1)th element in the sorted list.
  If two or more elements have the same value, the element with the largest index in the unsorted list is considered the largest in the sorted list.
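The counting rule, including the tie-break for equal values, can be sketched sequentially in Python. The function name `enumeration_sort` is an assumption for illustration, and indices are 0-based, so an element with c smaller elements is placed at index c rather than at position c + 1.

```python
def enumeration_sort(a):
    """Sort by counting, for each element, how many elements precede it.

    An element a[j] counts as smaller than a[i] if its value is smaller,
    or if the values are equal and j < i (the larger index is treated
    as the larger element, as in the tie-break rule).
    """
    n = len(a)
    result = [None] * n
    for i in range(n):
        c = sum(
            1
            for j in range(n)
            if a[j] < a[i] or (a[j] == a[i] and j < i)
        )
        result[c] = a[i]  # 0-based index c is the (c + 1)th position
    return result
```

Because of the tie-break, equal values receive distinct counts, so no two elements are ever written to the same output slot.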
Sort-CRCW
  Assumptions
    To sort n elements, we use n^2 processors (n rows and n columns).
    P_{i,j} is the processor in row i, column j.
    Concurrent write: the sum of all values written to a cell is stored.
    A[1..n]: array of elements in global memory.
    C[1..n]: array storing, for every element in A, the number of elements smaller than it.
Sort-CRCW
  Two steps
    1. Each row of processors i computes C[i], the number of elements smaller than A[i]. Each processor P_{i,j} compares A[i] and A[j], then updates C[i] appropriately.
    2. The first processor in each row, P_{i,1}, places A[i] in its proper position in the sorted list (C[i] + 1).
Algorithm Details
  Detail of the two-step algorithm
    /* step 1 */
    forall P_{i,j}, where 1 ≤ i, j ≤ n do in parallel
      if A[i] > A[j] or (A[i] = A[j] and i > j) then
        C[i] ← 1
      else
        C[i] ← 0
      endif
    endfor
    /* step 2 */
    forall P_{i,1}, where 1 ≤ i ≤ n do in parallel
      A[C[i] + 1] ← A[i]
    endfor
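The two steps can be emulated sequentially to see how the sum-combining concurrent write produces C. The following Python sketch is an illustration under assumptions, not a parallel implementation: the name `sort_crcw` is hypothetical, the n^2 processors are modeled as a double loop, indices are 0-based, and step 2 writes into a separate output list to mimic all processors reading A before any of them writes.

```python
def sort_crcw(A):
    """Emulate Sort-CRCW with n*n virtual processors and a SUM write.

    Returns (sorted_list, C), where C[i] is the number of elements
    that count as smaller than A[i].
    """
    n = len(A)

    # Step 1: each virtual processor P(i, j) writes 1 or 0 to C[i].
    writes = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(n):
            is_larger = A[i] > A[j] or (A[i] == A[j] and i > j)
            writes[i].append(1 if is_larger else 0)

    # Concurrent writes to the same cell are combined by summation.
    C = [sum(writes[i]) for i in range(n)]

    # Step 2: P(i, 1) places A[i] at position C[i] (0-based).
    result = [None] * n
    for i in range(n):
        result[C[i]] = A[i]
    return result, C
```

On a real CRCW PRAM both steps take constant time; the loops here exist only because the emulation runs on one processor.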
Complexity
  Run time: T(n) = O(1)
  Number of processors: P(n) = n^2
  Cost: C(n) = O(n^2)
  Is it cost optimal? No! (A sequential sort can be done in O(n log n).)
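To get a feel for how far the O(n^2) cost is from the O(n log n) sequential bound, the ratio of the two can be tabulated. This Python sketch assumes all hidden constants are 1 and uses a base-2 logarithm, so the numbers are only indicative; the function name `cost_ratio` is hypothetical.

```python
import math


def cost_ratio(n):
    """Ratio of Sort-CRCW cost to sequential sorting work.

    Assumes T(n) = 1 step and P(n) = n**2 processors for the parallel
    algorithm, and n * log2(n) operations for a sequential sort.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    parallel_cost = 1 * n**2
    sequential_work = n * math.log2(n)
    return parallel_cost / sequential_work


if __name__ == "__main__":
    for n in (16, 1024, 2**20):
        print(n, round(cost_ratio(n), 1))
```

The ratio simplifies to n / log2(n), which grows without bound, so the gap between the parallel cost and the sequential work widens as n increases.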
Example: sort (9, 4, 6)
  A = (9, 4, 6)
  Comparison made by each processor P_{i,j} (A[i] & A[j]):
    P_{1,1}: 9 & 9    P_{1,2}: 9 & 4    P_{1,3}: 9 & 6
    P_{2,1}: 4 & 9    P_{2,2}: 4 & 4    P_{2,3}: 4 & 6
    P_{3,1}: 6 & 9    P_{3,2}: 6 & 4    P_{3,3}: 6 & 6
  Concurrent write (SUM) gives C = (2, 0, 1), so the sorted list is A = (4, 6, 9).
  T(n) = O(1), P(n) = n^2, C(n) = T(n) * P(n) = O(n^2)