Instructor Neelima Gupta Table of Contents Parallel Algorithms.


1 Instructor Neelima Gupta ngupta@cs.du.ac.in

2 Table of Contents Parallel Algorithms

3 Thanks to: Tejinder Kaur (35, MCS '09) Instructor: Ms Neelima Gupta

4 • A parallel algorithm solves a problem on multiple processors.
• S(n) is the sequential time to solve a problem of size n.
• T(n,p) is the parallel time to solve the problem on p processors.
• W(n) is the work done by a parallel algorithm: W(n) = p · T(n,p).
• A parallel algorithm is optimal if its work equals that of the best known sequential algorithm, i.e. if W(n) = S(n).
• Speedup is how much time is gained by using more processors: speedup = S(n) / T(n,p).

5 Take the problem of computing the sum of n numbers. Sequential time = Θ(n).
We have 2 processors, p1 and p2, and the numbers are 2, 3, 4, 5, 1, 11, 13, 10, 7, 8.
Initially all the numbers are with p1, which sends half of them to p2. Both p1 and p2 compute the sum of their half and send these sums, s1 and s2, to each other, so both end up with the final sum.
p1: 2, 3, 4, 5, 1 → (s1 + s2)
p2: 11, 13, 10, 7, 8 → (s1 + s2)
Communication time = Θ(1)
Computation time = Θ(n/2)
T(n,2) = Θ(n/2)
W(n) = 2 · (n/2) = n
Hence this algorithm is optimal.
Speedup = n / (n/2) = 2
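The two-processor example can be checked with a short simulation. This is only a sketch: the function names (`two_processor_sum`, `work`, `speedup`) are hypothetical, the two "processors" are simulated sequentially, and time is counted in abstract unit steps rather than measured.

```python
def two_processor_sum(values):
    """Simulate the split: p1 keeps the first half, p2 receives the second half."""
    half = len(values) // 2
    p1_values, p2_values = values[:half], values[half:]
    s1 = sum(p1_values)  # computed by p1 in about n/2 steps
    s2 = sum(p2_values)  # computed by p2 in about n/2 steps, at the same time
    # After exchanging s1 and s2, both processors hold s1 + s2.
    return s1, s2, s1 + s2


def work(parallel_time, processors):
    """W(n) = p * T(n, p)."""
    return processors * parallel_time


def speedup(sequential_time, parallel_time):
    """speedup = S(n) / T(n, p)."""
    return sequential_time / parallel_time


numbers = [2, 3, 4, 5, 1, 11, 13, 10, 7, 8]
n = len(numbers)
s1, s2, total = two_processor_sum(numbers)
print(s1, s2, total)        # partial sums and the final sum
print(work(n / 2, 2))       # equals n, so the algorithm is optimal
print(speedup(n, n / 2))    # equals 2
```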

6 PARALLEL MODELS: Distributed Computing
There are several independent machines. They communicate with each other by passing messages. The final result is assembled from all the independent machines.
[Figure: machines M1, M2, M3, M4, M5]

7 SHARED MEMORY MODEL
• All the processors read from and write to the same memory.
• There is no direct communication between them.
• Processors cannot write to the same location at the same time, but they can read it at the same time.
[Figure: processors p1, p2, p3, ..., pn attached to a shared memory]

8 Models for concurrency in the shared memory model
• EREW (Exclusive Read, Exclusive Write)
• CREW (Concurrent Read, Exclusive Write)
• CRCW (Concurrent Read, Concurrent Write)
EREW is the weakest. CREW is stronger than EREW but weaker than CRCW. Going from CRCW to CREW costs a slowdown by a factor of log(n).

9 CRCW Model wrt searching (Made by: Deepika Kamboj, Roll No. 7, MSc '11)
CRCW variants:
• Common: all processors must write the same value.
• Arbitrary: processors can write different values, but any one of the values gets written.
• Priority: processors can write different values, but the processor with the highest priority gets to write its value.

10 Searching for a key (Thanks to 'PREETI')
[Figure: processors p1, p2, p3, ..., pi, ..., pn each compare one element x1, x2, x3, ..., xi, ..., xn with the key. The COMPARISON row shows the tests xi == Key; the shared OUTPUT cell starts at 0.]

11 CRCW
[Figure: same layout; a processor whose comparison succeeds reports "Match found". OUTPUT is still 0.]

12 CRCW
[Figure: same layout; after the match is found, OUTPUT is 1.]

13 VERSION 1 OF SEARCHING
To find whether the given KEY exists.
Model used: CRCW Common, Priority, Arbitrary

14-16 Example for version 1 (Key = 7)
Processor:   p1     p2     p3     p4     p5     p6
Element:     12     7      30     15     7      22
Comparison:  12≠7   7=7    30≠7   15≠7   7=7    22≠7
OUTPUT is initially 0. After the comparisons, the matching processors p2 and p5 write 1, so OUTPUT = 1.
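The existence search can be sketched as a sequential simulation of one CRCW Common step. The function name `crcw_common_search` is hypothetical, and the loop over elements stands in for n processors that would each do one comparison at the same time.

```python
def crcw_common_search(values, key):
    """Return 1 if key occurs in values, else 0, mimicking a CRCW Common write."""
    output = 0  # shared OUTPUT cell, initially 0

    # Conceptually parallel: processor i compares values[i] with the key
    # and, on a match, requests to write 1 into OUTPUT.
    write_requests = [1 for x in values if x == key]

    # Common model: concurrent writers must all write the same value.
    assert len(set(write_requests)) <= 1

    if write_requests:
        output = write_requests[0]
    return output


print(crcw_common_search([12, 7, 30, 15, 7, 22], 7))  # two processors match
print(crcw_common_search([12, 7, 30, 15, 7, 22], 9))  # no processor matches
```

Because every matching processor writes the same value 1, the Common variant is enough here; no conflict resolution is needed.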

17 VERSION 2 OF SEARCHING
To find the id of a processor that holds the KEY.
Model used: CRCW Common, Priority, Arbitrary

18-21 Example for version 2 (Key = 7)
Processor:   p1     p2     p3     p4     p5     p6
Element:     12     7      30     15     7      22
Comparison:  12≠7   7=7    30≠7   15≠7   7=7    22≠7
OUTPUT is initially 0. Both p2 and p5 find a match and try to write their own id, so either p2 or p5 gets written: OUTPUT = p5, or OUTPUT = p2.

22 VERSION 3 OF SEARCHING
To find the LEFTMOST OCCURRENCE of the given KEY.
Model used: CRCW Priority (Common ×, Arbitrary ×)

23-25 Example for version 3 (Key = 7)
Processor:   p1     p2     p3     p4     p5     p6
Element:     12     7      30     15     7      22
Comparison:  12≠7   7=7    30≠7   15≠7   7=7    22≠7
OUTPUT is initially 0. Both p2 and p5 find a match, but P2 has the highest priority, so OUTPUT = p2.
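The difference between the CRCW variants when processors write different values can be sketched as follows. The names `concurrent_write` and `matching_ids` are hypothetical. The sketch assumes the lowest processor id has the highest priority, and models the Arbitrary choice with a random pick.

```python
import random


def matching_ids(values, key):
    """Each matching processor i (1-based) requests to write its own id."""
    return [(i, f"p{i}") for i, x in enumerate(values, start=1) if x == key]


def concurrent_write(requests, policy):
    """Resolve concurrent write requests given as (processor_id, value) pairs."""
    if not requests:
        return None  # nobody writes; the cell keeps its old content
    if policy == "common":
        distinct = {value for _, value in requests}
        if len(distinct) != 1:
            raise ValueError("Common CRCW requires all writers to agree")
        return distinct.pop()
    if policy == "arbitrary":
        return random.choice(requests)[1]  # any one writer succeeds
    if policy == "priority":
        # Assumption: lower processor id means higher priority.
        return min(requests, key=lambda request: request[0])[1]
    raise ValueError(f"unknown policy: {policy}")


requests = matching_ids([12, 7, 30, 15, 7, 22], 7)
print(concurrent_write(requests, "arbitrary"))  # p2 or p5
print(concurrent_write(requests, "priority"))   # leftmost match
```

With this priority order, the Priority variant always returns the leftmost occurrence, while Arbitrary may return any matching processor and Common rejects the conflicting writes.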

26 SUM PROBLEM
Find the sum of n numbers using n processors.
[Figure: binary tree over the inputs a1, a2, a3, a4, ..., an; successive levels use n processors, n/2 processors, n/4 processors, ..., 1 processor.]

27 The height of this tree is log n and each step takes constant time, hence this algorithm takes O(log n) time.
W(n) = n · log n = n log n.
Speedup = n / log n.
This algorithm is not optimal, as half of the processors are idle in the first step and the number of idle processors increases in further steps.
What if we use n / log n processors?
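The tree-based sum can be sketched as a sequential simulation in which each pass of the loop stands for one parallel step. The function name `tree_sum` is hypothetical, and an unpaired element is simply carried to the next level.

```python
def tree_sum(values):
    """Return (total, rounds), where rounds counts the parallel steps used."""
    level = list(values)
    if not level:
        return 0, 0
    rounds = 0
    while len(level) > 1:
        # One parallel step: each active processor adds one adjacent pair.
        next_level = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # unpaired element moves up unchanged
        level = next_level
        rounds += 1
    return level[0], rounds


total, rounds = tree_sum([1, 2, 3, 4, 5, 6, 7, 8])
print(total, rounds)  # 8 inputs need log2(8) = 3 rounds
print(8 * rounds)     # work with n = 8 processors: n log n
```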

28 With n / log n processors, each processor gets log n values.
Take m = n / log n. Each processor sums its log n values sequentially, so m partial sums s1, s2, ..., sm are generated.

29 The tree over the m partial sums has height log m, so combining them takes log m <= log n time.
Adding the log n time for the local sums, T(n,p) <= 2 log n = O(log n).
W(n) = (n / log n) · O(log n) = O(n) = O(S(n)), as the sequential time is O(n).
Hence this algorithm is optimal.
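The version with n / log n processors can be sketched in two phases: local sums over blocks of about log n values, followed by a tree over the partial sums. The function name `blocked_sum` is hypothetical, logarithms are taken base 2, and time is counted in abstract steps (one step per value in the local phase, one per tree level).

```python
import math


def blocked_sum(values):
    """Return (total, processors, time) for the n / log n processor scheme."""
    n = len(values)
    if n == 0:
        return 0, 0, 0
    block = max(1, math.ceil(math.log2(n)))  # values per processor

    # Phase 1: every processor sums its own block sequentially (about log n steps).
    partial = [sum(values[i:i + block]) for i in range(0, n, block)]
    processors = len(partial)  # m, roughly n / log n
    time = block

    # Phase 2: tree over the m partial sums (log m <= log n rounds).
    while len(partial) > 1:
        partial = [sum(partial[i:i + 2]) for i in range(0, len(partial), 2)]
        time += 1
    return partial[0], processors, time


total, m, t = blocked_sum(list(range(16)))
print(total, m, t)  # n = 16: 4 processors, 4 local steps + 2 tree rounds
print(m * t)        # work stays within a constant factor of n
```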

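30 SORTING
Sort n numbers in parallel with n processors. Initially each processor has one element.
[Figure: merge tree over a1, a2, a3, a4, ..., an; the levels perform n/2 merges producing runs of length 2, then n/4 merges producing runs of length 4, ..., and finally 1 merge producing a run of length n.]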

31 The last step takes n units of time.
n + n/2 + n/4 + ... + 2 <= 2n
So sorting takes O(n) time.
W(n) = n · O(n) = O(n²)
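The merge tree can be sketched as a sequential simulation. The function name `parallel_merge_sort` is hypothetical. The sketch assumes each merge is done by a single processor, so one level costs as much as its longest output run, and it uses the standard-library `heapq.merge` for the two-way merge.

```python
from heapq import merge


def parallel_merge_sort(values):
    """Return (sorted_list, time), charging each level the length of its longest run."""
    runs = [[v] for v in values]  # initially each processor holds one element
    time = 0
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            if i + 1 < len(runs):
                merged.append(list(merge(runs[i], runs[i + 1])))
            else:
                merged.append(runs[i])  # unpaired run moves up unchanged
        # All merges of a level run in parallel; the level takes as long as
        # its longest output run.
        time += max(len(run) for run in merged)
        runs = merged
    return (runs[0] if runs else []), time


result, time = parallel_merge_sort([5, 2, 8, 1, 9, 3, 7, 4])
print(result)
print(time)  # 2 + 4 + 8 for n = 8, which is below 2n
```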

32 Thanks to: Surbhi Tripathi (27, MCS '09) Instructor: Ms Neelima Gupta

33 Definition: Prefix Sums
Given: a set of n values A = {a0, a1, ..., a_{n-1}}.
We want to find the prefix sums S0, S1, ..., S_{n-1}, where
S0 = a0
S1 = a1 + a0
...
S_{n-1} = a_{n-1} + ... + a1 + a0

34 STEP - II
Inputs: a0 a1 a2 a3 a4 a5 a6 a7
Before this step: P1: s1, P2: a2 o a3, P3: a4 o a5, P4: a6 o a7
After this step:
p1: s2 (s1 o a2)
p2: s3 (s1 o a2 o a3)
p3: a4 o a5 o a6
p4: a4 o a5 o a6 o a7

35 STEP - III
Inputs: a0 a1 a2 a3 a4 a5 a6 a7
After Step I: P1: s1, P2: a2 o a3, P3: a4 o a5, P4: a6 o a7
After Step II: p1: s2 (s1 o a2), p2: s3 (s1 o a2 o a3), p3: a4 o a5 o a6, p4: a4 o a5 o a6 o a7
After Step III:
p1: s4 (s3 o a4)
p2: s5 (s3 o a4 o a5)
p3: s6 (s3 o a4 o a5 o a6)
p4: s7 (s3 o a4 o a5 o a6 o a7)

36 CREW Model
Computing prefix sums does not require any concurrent writes.

37 TIME COMPLEXITY
To compute the prefix sums of n numbers: the number of prefix sums computed doubles at each step, so computing n prefix sums gives a tree of height log n. Each step takes constant time. So computing n prefix sums using n processors in parallel takes O(log n) time.
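The doubling scheme can be sketched as a sequential simulation, taking addition as the operator o. The function name `parallel_prefix_sums` is hypothetical. In each step, every position in the right half of a block reads the last value of the left half (a concurrent read) and updates only its own cell (an exclusive write), which matches the CREW model.

```python
def parallel_prefix_sums(values):
    """Return (prefix_sums, steps), where steps counts the parallel rounds."""
    x = list(values)
    n = len(x)
    steps = 0
    size = 2  # block size handled in the current step
    while size // 2 < n:
        half = size // 2
        for start in range(0, n, size):
            mid = start + half
            if mid >= n:
                break  # this block has no right half
            carry = x[mid - 1]  # read concurrently by the whole right half
            for i in range(mid, min(start + size, n)):
                x[i] += carry  # each processor writes only its own cell
        steps += 1
        size *= 2
    return x, steps


sums, steps = parallel_prefix_sums([1, 2, 3, 4, 5, 6, 7, 8])
print(sums)
print(steps)  # log2(8) = 3 steps
```

The left half of each block is not modified during a step, so updating the list in place gives the same result as a truly simultaneous update would.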

