1
Sieve of Eratosthenes
2
Sieve of Eratosthenes
The Sieve of Eratosthenes is an algorithm to find the prime numbers between 2 and n.
Start with an array of booleans from 2 to n, initially all set to true.
For each known prime, starting with 2, mark all the multiples (composites) of that prime.
Stop when the next prime > √n.
The numbers left unmarked are the primes.
3
Sieve of Eratosthenes
[Figure: grid of the integers 2 to 64, shown twice with Next Prime = 2]
4
Sieve of Eratosthenes
[Figure: grid of the integers 2 to 64, shown for Next Prime = 3 and Next Prime = 5]
5
Sieve of Eratosthenes
[Figure: grid of the integers 2 to 64, shown for Next Prime = 7 and Next Prime = 11]
11² = 121 > 65, so stop.
6
Sequential Sieve of Eratosthenes Algorithm (pseudo-code)
Sieve_Eratosthenes(int n) {
    boolean marked[n] = { true, ... };   // true = still a candidate prime
    prime = 2;
    while (prime * prime < n) {          // or prime < sqrt(n)
        composite = prime * prime;       // start with prime^2
        while (composite < n) {
            marked[composite] = false;   // cross out a multiple of prime
            composite = composite + prime;
        }
        do {                             // find next prime
            prime++;
        } while (!marked[prime]);        // skip entries already crossed out
    }
}
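The pseudo-code can be turned into compilable C along the following lines. This is only a sketch: the function name count_primes_below, the use of long, and the choice to return a count are assumptions made for illustration and do not come from the slides.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Count the primes p with 2 <= p < n; returns -1 on allocation failure.
   count_primes_below is a hypothetical name, not taken from the slides. */
long count_primes_below(long n)
{
    if (n <= 2)
        return 0;                         /* no primes below 2 */
    bool *is_candidate = malloc((size_t)n * sizeof *is_candidate);
    if (is_candidate == NULL)
        return -1;
    for (long i = 0; i < n; i++)
        is_candidate[i] = true;
    is_candidate[0] = is_candidate[1] = false;

    for (long prime = 2; prime * prime < n; prime++) {
        if (!is_candidate[prime])
            continue;                     /* crossed out earlier: not prime */
        for (long composite = prime * prime; composite < n; composite += prime)
            is_candidate[composite] = false;
    }

    long count = 0;
    for (long i = 2; i < n; i++)
        if (is_candidate[i])
            count++;
    free(is_candidate);
    return count;
}
```

Calling count_primes_below(65) covers the same range, 2 to 64, as the grids on the earlier slides.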
7
Sequential Complexity
The outermost loop will iterate at most √n times.
The 1st inner loop could iterate up to n/2 times.
The 2nd loop will iterate √n times total (for all iterations of the outermost loop).
Complexity =
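Taking the three loop bounds above at face value, one loose upper bound can be worked out as follows. This is a sketch derived only from those bounds and is not necessarily the formula shown on the original slide. The second line is the tighter, well-known bound obtained by summing n/p over the primes p actually used.

```latex
\[
T_{\text{seq}}(n) \;\le\;
\underbrace{\sqrt{n}}_{\text{outer loop}} \cdot
\underbrace{\frac{n}{2}}_{\text{marking loop}}
\;+\;
\underbrace{\sqrt{n}}_{\text{next-prime loop, total}}
\;=\; O\!\left(n\sqrt{n}\right)
\]
\[
\sum_{\substack{p \le \sqrt{n} \\ p\ \text{prime}}} \frac{n}{p} \;=\; O(n \log\log n)
\]
```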
8
Parallelizing Sieve of Eratosthenes
An obvious approach is to parallelize the loop that marks the multiples of prime:
    composite = prime * prime;
    while (composite < n) {
        marked[composite] = false;
        composite = composite + prime;
    }
We can rewrite this as a for loop:
    for (composite = prime * prime; composite < n; composite += prime) {
        marked[composite] = false;
    }
9
Parallelizing Sieve of Eratosthenes
Now we have a Scatter/Gather pattern.
We only need to broadcast n.
We don’t really need to broadcast the initial array “marked”, because every processor can initialize it itself.
The difficulty is in selecting the next prime.
We have to do a reduction and a broadcast of the marked array so that all processors can continue updating it.
10
Parallelizing Sieve of Eratosthenes
Sieve_Eratosthenes(int n) {
    int *marked;
    #pragma paraguin begin_parallel
    #pragma paraguin bcast n
    // Allocate an array of n ints
    marked = malloc(n * sizeof(int));
    for (int i = 0; i < n; i++)
        marked[i] = 1;   // True
    prime = 2;
    while (prime * prime < n) {
11
Parallelizing Sieve of Eratosthenes
        composite = prime * prime;
        numIterations = (n - 1) / prime - prime + 1;   // multiples of prime in [prime^2, n)
        #pragma paraguin forall
        for (i = 0; i < numIterations; i++) {
            marked[composite + i * prime] = false;     // index depends only on i
        }
        #pragma paraguin reduce land marked temp
        // copy temp back into marked
        #pragma paraguin bcast marked( n )
        do {
            prime++;
        } while (!marked[prime]);
    }
}
We need the loop to be a simple loop, so it counts iterations with i instead of stepping composite directly.
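To see what the simple-loop form has to compute, the sketch below puts the stepping loop next to a counted loop in plain C. The names mark_stepping and mark_counted are hypothetical, and the iteration count (n - 1) / prime - prime + 1 is an assumption chosen so that the last multiple below n is included. No Paraguin pragmas are used here.

```c
#include <stdbool.h>

/* Reference version: step composite by prime, as in the while loop. */
void mark_stepping(bool *marked, long n, long prime)
{
    for (long composite = prime * prime; composite < n; composite += prime)
        marked[composite] = false;
}

/* Counted version: every index is computed from i alone, so the
   iterations do not depend on one another. */
void mark_counted(bool *marked, long n, long prime)
{
    if (prime * prime >= n)
        return;                                   /* nothing to mark */
    long num_iterations = (n - 1) / prime - prime + 1;
    for (long i = 0; i < num_iterations; i++)
        marked[prime * prime + i * prime] = false;
}
```

Because each iteration of the counted loop touches an index that depends only on i, the iterations can be handed to different processors in any order.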
12
Parallelizing Sieve of Eratosthenes
A better approach is to partition the array instead of the loop.
With p processors, each processor is responsible for about n/p elements of the array.
It is the master process’ responsibility to determine the next prime.
This needs to be broadcast, but we can eliminate the reduction.
Also, only a single integer needs to be broadcast instead of an array.
13
Sieve of Eratosthenes
Only the primes up to √n will be used.
These need to be within the master process’ region.
[Figure: the integers 2 to 64 split into four blocks: 2 to 16 on p0, 17 to 32 on p1, 33 to 48 on p2, 49 to 64 on p3]
14
Parallelizing Sieve of Eratosthenes
First we need to make sure the master process’ block covers the values up to √n.
The master process needs to be able to determine all values used for prime.
    #pragma paraguin begin_parallel
    proc0_size = (n-1)/__guin_NP;
    if (2 + proc0_size < (int) sqrt((double) n)) {
        if (__guin_rank == 0)
            printf("Too many processors\n");
        return 1;
    }
15
Parallelizing Sieve of Eratosthenes
    low_value  = (__guin_rank * n) / __guin_NP;
    high_value = ((__guin_rank + 1) * n) / __guin_NP;
    prime = 2;
    do {
        if (low_value % prime == 0)
            first = low_value;
        else
            first = low_value + prime - (low_value % prime);
        if (first < prime * prime)
            first = prime * prime;          // never cross out prime itself
        for (i = first; i < high_value; i += prime)
            marked[i] = false;
Partition the array: each processor is responsible for the values from low_value up to, but not including, high_value.
Compute the first multiple of prime greater than or equal to low_value.
Mark the multiples of prime in each processor’s range.
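The block arithmetic can be checked in isolation. The helpers below are a sketch under stated assumptions: the array has n entries indexed directly by value, blocks are half-open ranges, and the names block_low, block_high and first_multiple are hypothetical.

```c
/* First index owned by `rank` when n entries are split over nprocs blocks. */
long block_low(long rank, long nprocs, long n)
{
    return rank * n / nprocs;
}

/* One past the last index owned by `rank`. */
long block_high(long rank, long nprocs, long n)
{
    return (rank + 1) * n / nprocs;
}

/* Smallest multiple of prime that is >= low and >= prime * prime. */
long first_multiple(long low, long prime)
{
    long first = (low % prime == 0) ? low : low + prime - low % prime;
    if (first < prime * prime)
        first = prime * prime;            /* keeps prime itself unmarked */
    return first;
}
```

With these definitions each block ends where the next one begins, so every index from 0 to n-1 belongs to exactly one process.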
16
Parallelizing Sieve of Eratosthenes
        if (__guin_rank == 0) {
            do {
                prime++;
            } while (!marked[prime]);
        }
        #pragma paraguin bcast prime
    } while (prime * prime < n);
    #pragma paraguin gather marked( n )
Master determines the next prime and broadcasts it.
No reduction is needed until we finish the computation.
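The flow of the partitioned algorithm can be tried out without a cluster by simulating the processes one after another in ordinary C. The sketch below is not Paraguin or MPI code. It assumes n >= 2, half-open blocks over an array indexed directly by value, and a hypothetical function name, simulated_block_sieve. The broadcast is modelled by a shared variable and the gather by a copy loop.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Simulate the block-partitioned sieve with nprocs "processes" run one
   after another. result must have room for n entries; on return
   result[v] is true exactly when v is prime. Returns 0 on success. */
int simulated_block_sieve(long n, long nprocs, bool *result)
{
    /* One private copy of the array per simulated process. */
    bool *copies = malloc((size_t)(n * nprocs) * sizeof *copies);
    if (copies == NULL)
        return -1;
    for (long k = 0; k < n * nprocs; k++)
        copies[k] = true;

    long prime = 2;                   /* the value the master "broadcasts" */
    while (prime * prime < n) {
        for (long rank = 0; rank < nprocs; rank++) {
            bool *marked = copies + rank * n;
            long low = rank * n / nprocs;
            long high = (rank + 1) * n / nprocs;
            long first = (low % prime == 0) ? low : low + prime - low % prime;
            if (first < prime * prime)
                first = prime * prime;
            for (long i = first; i < high; i += prime)
                marked[i] = false;    /* each process marks only its block */
        }
        /* Master (rank 0) scans only its own copy for the next prime. */
        do {
            prime++;
        } while (prime < n && !copies[prime]);
    }

    /* "Gather": each process contributes the block it owns. */
    for (long rank = 0; rank < nprocs; rank++)
        for (long i = rank * n / nprocs; i < (rank + 1) * n / nprocs; i++)
            result[i] = i >= 2 && copies[rank * n + i];
    free(copies);
    return 0;
}
```

If the master's block is too small, its scan may pick a composite as the next "prime". That only wastes work, since the multiples of a composite are themselves composite, but it is one reason to check the number of processes up front.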
17
Parallel Complexity
The outermost loop will iterate at most √n times.
The 1st inner loop could iterate up to n/(2p) times with p processors.
The 2nd loop will iterate √n times total (for all iterations of the outermost loop).
The broadcast takes log(p) time.
Complexity = communication + computation
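Plugging the stated bounds into a communication term and a computation term gives the estimate below. It is a sketch built only from those bounds: it ignores the final gather and is not necessarily the formula on the original slide.

```latex
\[
T_{\text{par}}(n,p) \;\le\;
\underbrace{\sqrt{n}\,\log p}_{\text{communication}}
\;+\;
\underbrace{\sqrt{n}\cdot\frac{n}{2p} + \sqrt{n}}_{\text{computation}}
\;=\; O\!\left(\sqrt{n}\,\log p + \frac{n\sqrt{n}}{p}\right)
\]
```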
18
Improving the Parallel Algorithm
We still have a broadcast within the outermost loop.
How can we eliminate that?
Can we have all processors determine what the next prime is?
19
Improving the Parallel Algorithm
The primes used in the Sieve of Eratosthenes are those up to √n.
All processors compute the primes from 2 … √n.
Now all processors have their own private copy of the primes used.
We can eliminate the broadcast,
as well as the requirement that the master process’ section be at least √n.
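One way to picture the improved scheme is a function that sieves a single block with no communication at all: it first computes the small primes privately, then crosses out their multiples inside the block. The sketch below is plain sequential C that each process could run on its own range. The name sieve_block and the half-open range [low, high) are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Sieve the values in [low, high) without any communication: the
   primes up to about sqrt(high) are first computed locally, then used
   to cross out composites in the block. block[v - low] ends up true
   exactly when v is prime. Returns 0 on success, -1 on failure. */
int sieve_block(long low, long high, bool *block)
{
    if (low < 0 || high < low)
        return -1;
    long limit = 0;                   /* smallest value with limit^2 >= high */
    while (limit * limit < high)
        limit++;

    /* Private sequential sieve over [0, limit]. */
    bool *base = malloc((size_t)(limit + 1) * sizeof *base);
    if (base == NULL)
        return -1;
    for (long v = 0; v <= limit; v++)
        base[v] = v >= 2;
    for (long p = 2; p * p <= limit; p++)
        if (base[p])
            for (long c = p * p; c <= limit; c += p)
                base[c] = false;

    /* Cross out multiples of each base prime inside the block. */
    for (long v = low; v < high; v++)
        block[v - low] = v >= 2;
    for (long p = 2; p <= limit; p++) {
        if (!base[p])
            continue;
        long first = (low % p == 0) ? low : low + p - low % p;
        if (first < p * p)
            first = p * p;
        for (long c = first; c < high; c += p)
            block[c - low] = false;
    }
    free(base);
    return 0;
}
```

Since the function never looks outside its own block and its private base array, no process has to own the first √n values.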
20
Complexity
The complexity is essentially the same for most processors, except:
Communication is eliminated until the end.
There is added complexity to compute the primes up to √n sequentially.
Complexity to compute the primes up to √n:
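Applying the same loose loop-counting argument used for the sequential sieve to an input of size m = √n gives the estimate below. This is a sketch under that assumption, not necessarily the slide's formula. The tighter known bound for sieving up to √n is O(√n log log n).

```latex
\[
T_{\text{base}}(n) \;\le\; \sqrt{m}\cdot\frac{m}{2} + \sqrt{m}
\;=\; O\!\left(m\sqrt{m}\right) \;=\; O\!\left(n^{3/4}\right),
\qquad m = \sqrt{n}
\]
```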
21
Complexity
Final Complexity:
Implementation left as an exercise.
22
Questions
23
Discussion Question
Would it be better to implement this algorithm in shared memory (using OpenMP) rather than in distributed memory (using MPI)?
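As one possible starting point for the comparison, the marking loop could be written for shared memory roughly as follows. This is only a sketch: the function name mark_multiples_shared is hypothetical, and the code does not settle which model is better. It simply shows that with a single shared array there is nothing to broadcast, reduce or gather.

```c
#include <stdbool.h>

/* Cross out the multiples of prime in marked[0..n-1]. With OpenMP
   enabled the iterations are shared among threads; without it the
   pragma is ignored and the loop runs sequentially. */
void mark_multiples_shared(bool *marked, long n, long prime)
{
    #pragma omp parallel for
    for (long composite = prime * prime; composite < n; composite += prime)
        marked[composite] = false;    /* each iteration writes a distinct entry */
}
```

Each iteration writes a different array element, so the threads never write the same entry.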