Sieve of Eratosthenes.

Sieve of Eratosthenes

Sieve of Eratosthenes The Sieve of Eratosthenes is an algorithm to find the prime numbers between 2 and n. Start with an array of booleans from 2 to n, initially all set to true. For each known prime, starting with 2, mark all the multiples (composites) of that prime. Stop when the next prime > √n. What is left unmarked are the primes.

Sieve of Eratosthenes Next Prime = 2 [slide animation: the grid 2..64 is shown before and after crossing out the multiples of 2, i.e. 4, 6, 8, …, 64]

Sieve of Eratosthenes Next Prime = 3, then Next Prime = 5 [slide animation: the grid 2..64 with the multiples of 3 (from 9) and then the multiples of 5 (from 25) crossed out]

Sieve of Eratosthenes Next Prime = 7, then Next Prime = 11 [slide animation: the grid 2..64 with the multiples of 7 (from 49) crossed out]. 11² = 121 > 65, so stop.

Sequential Sieve of Eratosthenes Algorithm (pseudo-code)

    Sieve_Eratosthenes(int n) {
        boolean marked[n] = { true, ... };
        prime = 2;
        while (prime * prime < n) {              // or: prime < sqrt(n)
            composite = prime * prime;           // start with prime^2
            while (composite < n) {
                marked[composite] = false;
                composite = composite + prime;   // multiples of prime
            }
            do {                                 // find next prime
                prime++;
            } while (!marked[prime]);            // next still-true entry is prime
        }
    }

Sequential Complexity The outermost loop will iterate at most √n times. The 1st inner loop could iterate up to n/2 times. The 2nd loop will iterate √n times total (over all iterations of the outermost loop). Complexity = O(n ln ln n).

Parallelizing Sieve of Eratosthenes An obvious approach is to parallelize the loop marking multiples of prime:

    composite = prime * prime;
    while (composite < n) {
        marked[composite] = false;
        composite = composite + prime;
    }

We can rewrite this as a for loop:

    for (composite = prime * prime; composite < n; composite += prime)
        marked[composite] = false;

Parallelizing Sieve of Eratosthenes Now we have a Scatter/Gather pattern. We only need to broadcast n; we don't really need to broadcast the initial array "marked" because all processors can initialize it themselves. The difficulty is in selecting the next prime: we have to do a reduction and broadcast of the marked array so all processors can continue updating the array.

Parallelizing Sieve of Eratosthenes

    Sieve_Eratosthenes(int n) {
        boolean *marked;
        #pragma paraguin begin_parallel
        #pragma paraguin bcast n
        // Allocate an array of n ints
        marked = malloc(n * sizeof(int));
        for (int i = 0; i < n; i++)
            marked[i] = 1;   // true
        prime = 2;
        while (prime * prime < n) {

Parallelizing Sieve of Eratosthenes

            composite = prime * prime;
            numIterations = n / prime - prime;
            #pragma paraguin forall
            for (i = 0; i < numIterations; i++) {
                marked[composite] = false;
                composite += prime;
            }
            #pragma paraguin reduce land marked temp
            // copy temp back into marked
            #pragma paraguin bcast marked( n )
            do {
                prime++;
            } while (!marked[prime]);

We need the marking loop to be a simple loop (no loop-carried update of composite) for the forall to divide its iterations among processors.

Parallelizing Sieve of Eratosthenes A better approach is to partition the array instead of the loop. With p processors, each processor is responsible for roughly n/p elements of the array. It is the master process's responsibility to determine the next prime. This needs to be broadcast, but we can eliminate the reduction; also, only a single integer needs to be broadcast instead of an array.

Sieve of Eratosthenes √n primes will be used. These need to be within the master process's region. [slide figure: the values 2..64 partitioned over 4 processes: p0 owns 2..16, p1 owns 17..32, p2 owns 33..48, p3 owns 49..64]

Parallelizing Sieve of Eratosthenes First we need to make sure the master process will compute the first √n values, so that it can determine all values used for prime:

    #pragma paraguin begin_parallel
    proc0_size = (n - 1) / __guin_NP;
    if (2 + proc0_size < (int) sqrt((double) n)) {
        if (__guin_rank == 0)
            printf("Too many processors\n");
        return 1;
    }

Parallelizing Sieve of Eratosthenes

    low_value  = 2 + __guin_rank * (n - 1) / __guin_NP;
    high_value = 1 + (__guin_rank + 1) * (n - 1) / __guin_NP;
    prime = 2;
    do {
        if (low_value % prime == 0)
            first = low_value;
        else
            first = low_value + prime - (low_value % prime);
        if (first < prime * prime)
            first = prime * prime;   // never unmark the prime itself
        for (i = first; i <= high_value; i += prime)
            marked[i] = false;

Partition the array: each processor is responsible for the values from low_value to high_value. Compute first, the smallest multiple of prime greater than or equal to low_value, then mark the multiples of prime in each processor's range.

Parallelizing Sieve of Eratosthenes

        if (__guin_rank == 0) {
            do {
                prime++;
            } while (!marked[prime]);
        }
        #pragma paraguin bcast prime
    } while (prime * prime < n);
    #pragma paraguin gather marked( n )

The master determines the next prime and broadcasts it. No reduction is needed until we finish the computation.

Parallel Complexity The outermost loop will iterate at most √n times. The 1st inner loop could iterate up to n/(2p) times with p processors. The 2nd loop will iterate √n times total (over all iterations of the outermost loop). Each broadcast takes log p time. Complexity = O(n ln ln n / p) computation + O(√n log p) communication.

Improving the Parallel Algorithm Still, we have a broadcast within the outermost loop. How can we eliminate it? Can we have all processors determine what the next prime is?

Improving the Parallel Algorithm The number of primes used in the Sieve of Eratosthenes is at most √n. All processors compute the primes from 2 … √n, so each has its own private copy of the sieving primes. We can eliminate the broadcast, as well as the requirement that the master process's section contain at least √n elements.

Complexity The complexity is essentially the same for most processors, except: communication is eliminated until the end, and there is added complexity to compute the first √n primes sequentially on every processor. Complexity to compute the primes up to √n: O(√n ln ln √n).

Complexity Final complexity: O(n ln ln n / p) for the parallel marking plus O(√n ln ln √n) for the redundant sequential sieve, with communication only in the final gather. Implementation left as an exercise.

Questions

Discussion Question Question: Would it be better to implement this algorithm in shared memory (using OpenMP) than in distributed memory (using MPI)?