
Parallel Programming Project
Parallel Implementation and Evaluation of QuickSort using Open MPI
By: Soulemane Moumie

1. Objective Sorting is ubiquitous in human activities and in devices such as personal computers and smartphones, and it continues to play a crucial role in the development of recent technologies. QuickSort is known as one of the fastest and most efficient sorting algorithms. It was invented by C. A. R. Hoare in 1961 and uses the divide-and-conquer strategy for solving problems [3]. Its partitioning step makes QuickSort amenable to parallelization using task parallelism. MPI is a message passing interface library that enables parallel computing by exchanging messages among processes, and it can therefore be easily used on most multi-core computers available today. The main aim of this study is to implement the QuickSort algorithm using the Open MPI library and to compare the sequential with the parallel execution.

Presentation Plan
2. Related Works
3. Theory of Experiment
4. Algorithm and Implementation of Quicksort using MPI
5. Experimental Setup and Results
6. Conclusion
7. Future Works
References

2. Related Works Existing implementations use OpenMP, Pthreads, Open MPI, and more [9]. The authors of [9] (Prifti, Bala et al.) took advantage of the simplicity of OpenMP to considerably reduce the execution time of Quicksort in parallel compared with its sequential version; the main problem in that study was the unpredictability of the parallel execution time. Another similar experiment on parallel Quicksort [7] (Kataria) used multiprocessors on clusters with Open MPI and Pthreads, highlighting different benchmarking and optimization techniques. The same implementation of the algorithm is used in [8] (Perera), this time focusing merely on performance analysis. In short, [7, 8] present almost a common implementation of parallel Quicksort with a tree-based merging approach. In contrast, we use an MPI collective routine and an additional sorting step to replace the merge functionality, implemented in the C programming language.

3. Theory of Experiment

3.a. Basic idea behind Quicksort
Quicksort uses the divide-and-conquer paradigm:
Given an initial array of elements to be sorted,
choose any element of the array to be the pivot;
divide all other elements (except the pivot) into two partitions:
all elements less than the pivot go into the first partition,
all elements greater than the pivot go into the second partition;
sort both partitions using recursion (or an explicit stack);
join the first sorted partition, the pivot, and the second sorted partition.
The output is a sorted array.

3.b. Pseudocode (1)
Adapted from Rosetta Code, available at http://rosettacode.org/wiki/Sorting_algorithms/Quicksort [1].
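The pseudocode on this slide appeared as an image; as a stand-in, the following is a minimal recursive quicksort in C that follows the scheme of section 3.a, using a Lomuto-style in-place partition with the last element as pivot (a sketch, not the project's exact code):

#include <stdio.h>

/* Swap two array elements. */
static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Lomuto partition: uses arr[hi] as the pivot and returns its final index. */
static int partition(int arr[], int lo, int hi) {
    int pivot = arr[hi];
    int i = lo;
    for (int j = lo; j < hi; j++)
        if (arr[j] < pivot)
            swap(&arr[i++], &arr[j]);
    swap(&arr[i], &arr[hi]);
    return i;
}

/* Sort arr[lo..hi] recursively: partition, then sort both sides. */
static void quicksort(int arr[], int lo, int hi) {
    if (lo < hi) {
        int p = partition(arr, lo, hi);
        quicksort(arr, lo, p - 1);
        quicksort(arr, p + 1, hi);
    }
}

int main(void) {
    int a[] = {9, 4, 6, 2, 7, 3};
    quicksort(a, 0, 5);
    for (int i = 0; i < 6; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}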

3.c. Time complexity Running times are influenced by the input distribution (uniform, sorted or semi-sorted, unsorted, with duplicates) and by the choice of the pivot element. Many studies have revealed that QuickSort takes an average running time of O(N log N), while the worst-case running time on N items is O(N²) and occurs when the pivot is the unique minimum or maximum element [2]. * P. Tsigas and Y. Zhang. A Simple, Fast Parallel Implementation of Quicksort and its Performance Evaluation on SUN Enterprise 10000
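A sketch of the standard recurrences behind these bounds (added for clarity, not from the original slides): an even split at every level gives

\[ T_{\mathrm{avg}}(n) \approx 2\,T(n/2) + \Theta(n) = \Theta(n \log n), \]

while a unique minimum or maximum pivot leaves a partition of size n-1 at every step:

\[ T_{\mathrm{worst}}(n) = T(n-1) + \Theta(n) = \Theta(n^2). \]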

3.d. Open MPI Open MPI is the library used for parallelizing QuickSort. Learning the Message Passing Interface (MPI) strengthens fundamental knowledge of parallel programming, given that MPI is lower level than equivalent libraries such as OpenMP. The basic idea behind MPI is that messages can be passed or exchanged among different processes in order to perform a given task. An illustration is a master process that splits a huge task into chunks, distributes them among its slave processes, and coordinates their work. As elaborated in [4], MPI has two types of communication routines: point-to-point communication routines and collective communication routines. Collective routines, as explained in the implementation section, have been used in this study. MPI gives the programmer more granular control than other equivalent libraries.
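As a minimal sketch of the collective style used in this study (assuming a standard Open MPI installation; this is not the project's code), rank 0 acts as the master and broadcasts a value to every process:

/* Minimal MPI example: the master (rank 0) broadcasts a value to all
   processes using a collective routine. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        value = 42;                 /* only the master holds the data */
    /* Collective routine: every process receives the master's value. */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("process %d sees value %d\n", rank, value);
    MPI_Finalize();
    return 0;
}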

4. Algorithm and Implementation

4. Implementation (1)

4. Implementation (2)

4. Implementation (3)

4. Implementation (4): Recursive Quicksort, Iterative Quicksort, Compile & Run
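The implementation slides appeared as code screenshots; the following condensed sketch captures the approach described in Related Works — scatter the input with a collective routine, sort each chunk locally, gather, and run one final sort on the master in place of merging. Names and sizes are illustrative, the input size is assumed divisible by the number of processes, and the C library qsort stands in for the project's own recursive/iterative quicksort:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Comparator for qsort: ascending order of ints. */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(int argc, char *argv[]) {
    int rank, size, n = 1 << 20;             /* illustrative input size */
    int *data = NULL, *chunk;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int local_n = n / size;                  /* n assumed divisible by size */
    chunk = malloc(local_n * sizeof(int));
    if (rank == 0) {                         /* master creates the input */
        data = malloc(n * sizeof(int));
        for (int i = 0; i < n; i++) data[i] = rand();
    }
    double t0 = MPI_Wtime();
    /* Collective routine: distribute equal chunks to every process. */
    MPI_Scatter(data, local_n, MPI_INT, chunk, local_n, MPI_INT, 0, MPI_COMM_WORLD);
    qsort(chunk, local_n, sizeof(int), cmp_int);      /* local quicksort */
    /* Collective routine: collect the locally sorted chunks on the master. */
    MPI_Gather(chunk, local_n, MPI_INT, data, local_n, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        qsort(data, n, sizeof(int), cmp_int);         /* final sort replaces merging */
        printf("sorted %d ints in %.3f s\n", n, MPI_Wtime() - t0);
        free(data);
    }
    free(chunk);
    MPI_Finalize();
    return 0;
}

Typical Open MPI compile-and-run commands for such a file (file name illustrative):

mpicc quicksort_mpi.c -o quicksort_mpi
mpirun -np 4 ./quicksort_mpi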

5. Experimental Setup and Results
Hardware specifications
Software specifications
Installation
Problems
Results
Visualization

5.a. Hardware specifications
Computer: ProBook-4440s
Memory: 3.8 GiB
Processor: Intel® Core™ i5-3210M CPU @ 2.50GHz × 4
Graphics: Intel® Ivybridge Mobile x86/MMX/SSE2
OS type: 32-bit
Disk: 40.1 GB

5.b. Software specifications
Operating system: Ubuntu 14.04 LTS
Open MPI: 1.10.1 release
Compiler: gcc 4.8.4
Plotting: Microsoft Excel 2007

5.c. Installation: Cygwin, Virtual Machine, Linux (Ubuntu)

5.d. Problems This project was initially set to be done on Microsoft Windows using Cygwin. After updating Cygwin we faced an unresolved issue and a critical error: the system repeatedly crashed (a sample screenshot of the error message was shown on this slide). Running Open MPI on Cygwin in a virtual machine was also less user-friendly and sometimes extremely slow due to the limited hardware configuration. The situation improved after we moved to Ubuntu as a native operating system. Regarding our current implementation of QuickSort using Open MPI, certain undesirable features and limitations remain, opening new scope for future work.

5.e. Results

5.f. Visualization
Sample output file

6. Conclusion To conclude this project, we successfully implemented the sequential QuickSort algorithm, both the recursive and the iterative version, using the C programming language, and then produced a parallel implementation using the Open MPI library. Through this project we explored different aspects of MPI and of the QuickSort algorithm, as well as their respective limitations. This study has revealed that in general the sequential QuickSort is faster on small inputs while the parallel QuickSort excels on large inputs. The study also shows that in general the iterative version of QuickSort performs slightly better than the recursive version. Certain undesirable features and limitations could not be overcome due to time constraints; the next section opens scope for future improvements.

7. Future Works
This parallel implementation of QuickSort using MPI can be extended by addressing its current limitations in the following ways:
Make the input size dynamic instead of requiring it to be divisible by the number of processes.
Replace collective communication routines with point-to-point communication routines.
Replace gathering locally sorted data and performing a final sort with a merge function (see the sketch below).
Minimize the variation observed in the execution time.
Re-implement the proposed algorithm in other programming languages with MPI support, such as C++, Java, or Fortran.
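For instance, the final sort on the master could be replaced by repeatedly merging pairs of locally sorted chunks; a minimal two-way merge in C might look like this (a sketch; the function name is illustrative):

#include <stdlib.h>

/* Merge two sorted int arrays a[0..na) and b[0..nb) into a newly
   allocated sorted array of length na + nb. */
int *merge_sorted(const int *a, int na, const int *b, int nb) {
    int *out = malloc((na + nb) * sizeof(int));
    int i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];   /* copy the tail of a */
    while (j < nb) out[k++] = b[j++];   /* copy the tail of b */
    return out;
}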

References
[1] Rosetta Code, http://rosettacode.org/wiki/Sorting_algorithms/Quicksort, December 2015.
[2] P. Tsigas and Y. Zhang. A Simple, Fast Parallel Implementation of Quicksort and its Performance Evaluation on SUN Enterprise 10000. In Proceedings of the Eleventh Euromicro Conference on Parallel, Distributed and Network-Based Processing (Euro-PDP'03), IEEE, 2003.
[3] C. A. R. Hoare. Algorithm 64: Quicksort. Commun. ACM 4, 7 (1961), 321.
[4] Blaise Barney, Lawrence Livermore National Laboratory, https://computing.llnl.gov/tutorials/mpi/, December 2015.
[5] Rer. nat. Siegmar Groß, http://www2.hsfulda.de/~gross/cygwin/cygwin_2.0/inst_conf_cygwin_2.0.x_x11r7.x.pdf, December 2015.
[6] D. G. Martinez, S. R. Lumley. Installation of Open MPI: Parallel and Distributed Programming.
[7] P. Kataria. Parallel Quicksort Implementation Using MPI and Pthreads, December 2008.
[8] P. Perera. Parallel Quicksort Using MPI and Performance Analysis.
[9] V. Prifti, R. Bala, I. Tafa, D. Saatciu and J. Fejzaj. The Time Profit Obtained by Parallelization of Quicksort Algorithm Used for Numerical Sorting. In Science and Information Conference 2015, June 28-30, 2015, IEEE.

Questions & Discussion

Thank you!