Adaptive Parallel Sorting Algorithms in STAPL
Olga Tkachyshyn, Gabriel Tanase, Nancy M. Amato


Parasol Lab, Department of Computer Science, Texas A&M University

STAPL: Standard Template Adaptive Parallel Library

The Standard Template Adaptive Parallel Library (STAPL) is a parallel library designed as a superset of the ISO C++ Standard Template Library (STL). It executes on uni- or multiprocessor systems that use shared or distributed memory. The goal of STAPL is to let users work at a high level of abstraction by insulating them from the complexity of parallel programming, such as problem decomposition, problem mapping, scheduling, and execution, while still providing scalable performance.

[Figure: STL components and their STAPL counterparts. Container corresponds to pContainer, Iterator to pRange, and Algorithms to pAlgorithms. STAPL adds a runtime system (aRMI) and toolboxes for performance optimization and system profiling.]

STAPL Design Goals
• Ease of use
  – STAPL emulates shared-memory programming. Users can program assuming a single address space on both shared- and distributed-memory systems.
• Efficiency
  – STAPL provides building blocks equivalent to STL containers, iterators, and algorithms that are automatically tuned for parallel and distributed systems.
• Portability
  – STAPL has its own runtime that hides machine-specific details and provides a uniform and efficient communication interface.

STAPL Main Components
• pContainer
  – Distributed data structures.
• pRange
  – Presents an abstract view of a scoped data space, which allows random access to a partition or subrange of the data in a pContainer.
  – Stores data dependence information.
• pAlgorithms
  – Parallel algorithms that provide basic functionality, bound to the pContainer by a pRange.
• Adaptive Runtime System
  – The Adaptive Remote Method Invocation (aRMI) communication library hides machine specifics and provides a uniform communication interface.
  – An adaptive performance optimization toolbox, including a scheduler, a load balancer, and system profiling tools.
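As a rough illustration of the subrange idea behind pRange, the sketch below splits the index space of an ordinary `std::vector` into p contiguous subranges and applies an STL algorithm to each one independently. It uses only standard C++. The names `split_into_subranges` and `partial_sums` are hypothetical helpers invented for this example; they are not part of STAPL, whose actual interface is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper (not STAPL API): split the index space [0, n) into p
// contiguous half-open subranges whose sizes differ by at most one element.
std::vector<std::pair<std::size_t, std::size_t>>
split_into_subranges(std::size_t n, std::size_t p) {
    assert(p >= 1);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    ranges.reserve(p);
    const std::size_t base = n / p;
    const std::size_t extra = n % p;  // the first `extra` subranges get one more element
    std::size_t begin = 0;
    for (std::size_t i = 0; i < p; ++i) {
        const std::size_t size = base + (i < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + size);
        begin += size;
    }
    return ranges;
}

// Hypothetical helper: apply an STL algorithm to each subrange on its own.
// Here the subranges are processed one after another; in a parallel setting
// each subrange could be handled by a different processor.
std::vector<long> partial_sums(const std::vector<int>& data, std::size_t p) {
    std::vector<long> sums;
    for (const auto& r : split_into_subranges(data.size(), p)) {
        const auto first = data.begin() + static_cast<std::ptrdiff_t>(r.first);
        const auto last = data.begin() + static_cast<std::ptrdiff_t>(r.second);
        sums.push_back(std::accumulate(first, last, 0L));
    }
    return sums;
}
```

Calling `partial_sums(data, p)` returns one partial result per subrange. The point of the sketch is only that an algorithm can be written against subranges without knowing how the underlying container is stored.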
Parallel Sorting Algorithms

Goal
To adaptively select the best sorting algorithm based on the data provided and the system information available.

Performance of parallel sorts depends on:
• Machine architecture
• Number of processors
• Type of elements to sort
• How presorted the elements are

Sample Sort
• Works for all types of elements that can be compared
• Sequential complexity: O(n log n)
• Parallel complexity: O((n/p) log(n/p))
• Sequential algorithm
  1. Select p-1 splitters.
  2. Sort the splitters; the splitters are the upper and lower bounds that define p buckets.
  3. Compare each element to the splitters and place it in the appropriate bucket.
  4. Sort the contents of each bucket.
  5. Copy the values from the buckets back into the original container.
• Parallelization
  – If each processor is responsible for one bucket, the steps can be done in parallel.
  – The running time depends on the maximum number of elements in any bucket, that is, on how evenly the elements are distributed among the buckets.
  – Thus all buckets should contain roughly the same number of elements.
  – The technique used to achieve this is oversampling.

Radix Sort
• Works only for integers
• Sequential complexity: O(n)
• Parallel complexity: O(n/p)
• Sequential algorithm
  – Radix sort is not a comparison sort, so it is not subject to the O(n log n) lower bound for comparison-based sorting.
  – Each element is represented by b bits (e.g., 32-bit integers).
  – The algorithm performs a number of passes. Each pass considers only r bits of each element, with the i-th pass sorting on the i-th group of r bits, starting from the least significant bits.
  – The algorithm used to sort on the r bits must be stable: if two elements have the same value, they appear in the output in the same order as in the input. Counting sort is usually used here, as in the parallel example shown on the poster.

Bitonic Sort
• Works for all types of elements that can be compared
• Sequential complexity: O(n log n)
• Parallel complexity: O((n/p) log(n/p) + (n/p) log p) (sort + merge)
• Parallel algorithm
  1. Locally sort the elements on each thread.
  2. Form a bitonic sequence (a sequence that is first increasing and then decreasing, or that can be circularly shifted to become so).
  3. Sort in increasing order.
• Note: each step of bitonic sort consists of two threads exchanging data, merging the two sequences, and each keeping its corresponding half.
• Heuristic applied: the threads exchange their minimum and maximum first, then trade only the elements necessary for the merge.

Performance Comparison: Sample Sort and Radix Sort
• Random data
  – Radix sort is faster on up to 8 processors.
  – Sample sort outperforms radix sort as the number of processors increases.
• Nearly sorted data
  – Radix sort is faster.
  – The difference in performance shrinks as the number of processors increases.
• Sample sort
  – Scales better than radix sort.
  – Performs well on various data types.
• Radix sort
  – The fastest sort for integers.
  – Scalability is poor for random data.
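The sequential sample sort and radix sort descriptions above can be sketched in standard C++ roughly as follows. This is a minimal illustration under stated assumptions, not the STAPL implementation. The function names, the `oversample` factor, the fixed random seed, and the restriction of radix sort to unsigned 32-bit keys are all choices made for this example. Splitters are picked by drawing a random sample with `oversample` candidates per bucket and keeping every `oversample`-th sorted candidate, which is one common way to apply oversampling.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Sequential sample sort with p buckets (illustrative names and defaults).
void sample_sort(std::vector<int>& data, std::size_t p,
                 std::size_t oversample = 8, unsigned seed = 42) {
    assert(p >= 1 && oversample >= 1);
    if (data.size() < 2 || p == 1) {
        std::sort(data.begin(), data.end());
        return;
    }
    // Steps 1-2: draw p * oversample random candidates, sort them, and keep
    // p-1 evenly spaced candidates as the sorted splitters.
    std::mt19937 gen(seed);
    std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
    std::vector<int> sample(p * oversample);
    for (int& s : sample) s = data[pick(gen)];
    std::sort(sample.begin(), sample.end());
    std::vector<int> splitters;
    for (std::size_t i = 1; i < p; ++i) splitters.push_back(sample[i * oversample]);

    // Step 3: an element goes to the bucket whose index equals the number of
    // splitters that are less than or equal to it.
    std::vector<std::vector<int>> buckets(p);
    for (int x : data) {
        const auto it = std::upper_bound(splitters.begin(), splitters.end(), x);
        buckets[static_cast<std::size_t>(it - splitters.begin())].push_back(x);
    }
    // Steps 4-5: sort each bucket, then copy the buckets back in order.
    auto out = data.begin();
    for (auto& b : buckets) {
        std::sort(b.begin(), b.end());
        out = std::copy(b.begin(), b.end(), out);
    }
}

// Least-significant-digit radix sort for unsigned 32-bit keys (b = 32),
// processing r bits per pass with a stable counting sort.
void radix_sort(std::vector<std::uint32_t>& data, unsigned r = 8) {
    assert(r >= 1 && r <= 16);
    const std::size_t radix = std::size_t{1} << r;
    const std::uint32_t mask = static_cast<std::uint32_t>(radix - 1);
    std::vector<std::uint32_t> buffer(data.size());
    for (unsigned shift = 0; shift < 32; shift += r) {
        // Count how often each r-bit digit occurs (stored one slot to the right).
        std::vector<std::size_t> count(radix + 1, 0);
        for (std::uint32_t x : data) ++count[((x >> shift) & mask) + 1];
        // Prefix sums turn the counts into the first output position per digit.
        for (std::size_t d = 0; d < radix; ++d) count[d + 1] += count[d];
        // Scatter in input order, which keeps equal digits in their original order.
        for (std::uint32_t x : data) buffer[count[(x >> shift) & mask]++] = x;
        data.swap(buffer);
    }
}
```

In a parallel version of sample sort, each of the p buckets would be sorted by a different processor, so the slowest processor is the one holding the largest bucket; this is why evenly sized buckets matter. In `radix_sort`, the scatter loop walks the input in order, which is what makes each counting-sort pass stable.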