Parallel Processing (CS526), Spring 2012 (Week 5)

- There are no rules, only intuition, experience and imagination!
- We consider design techniques, particularly top-down approaches, in which we find the main structure first, using a set of useful conceptual paradigms.
- We also look for useful primitives to compose in a bottom-up approach.

- A parallel algorithm emerges from a process in which a number of interlocked issues must be addressed.
- Where do the basic units of computation (tasks) come from?
  - This is sometimes called "partitioning" or "decomposition".
  - Sometimes it is natural to think in terms of partitioning the input or output data (or both).
  - On other occasions a functional decomposition may be more appropriate (i.e. thinking in terms of a collection of distinct, interacting activities).

- How do the tasks interact?
  - We have to consider the dependencies between tasks (dependency and interaction graphs). Dependencies will be expressed in implementations as communication, synchronisation and sharing (depending upon the machine model).
- Are the natural tasks of a suitable granularity?
  - Depending upon the machine, too many small tasks may incur high overheads in their interaction. Should they be agglomerated (collected together) into super-tasks? This is related to scaling down.

- How should we assign tasks to processors?
  - Again, in the presence of more tasks than processors, this is related to scaling down. The "owner computes" rule is natural for some algorithms which have been devised with a data-oriented partitioning. We need to ensure that tasks which interact can do so as quickly as possible.
- These issues are often in tension with each other.

- Use recursive problem decomposition.
- Create sub-problems of the same kind, which are solved recursively.
- Combine sub-solutions to solve the original problem.
- Define a base case for direct solution of simple instances.
- Well-known examples include quick-sort, merge-sort and matrix multiplication.

(© 2010 Goodrich, Tamassia)
- Merge-sort on an input sequence S with n elements consists of three steps:
  - Divide: partition S into two sequences S1 and S2 of about n/2 elements each
  - Recur: recursively sort S1 and S2
  - Conquer: merge S1 and S2 into a unique sorted sequence

Algorithm mergeSort(S, C)
    Input: sequence S with n elements, comparator C
    Output: sequence S sorted according to C
    if S.size() > 1
        (S1, S2) ← partition(S, n/2)
        mergeSort(S1, C)
        mergeSort(S2, C)
        S ← merge(S1, S2)

(© 2010 Goodrich, Tamassia)
- The conquer step of merge-sort consists of merging two sorted sequences A and B into a sorted sequence S containing the union of the elements of A and B.
- Merging two sorted sequences, each with n/2 elements, takes O(n) time.
- Merge-sort therefore has a sequential complexity of Ts = O(n log n).

Algorithm merge(A, B)
    Input: sorted sequences A and B with n/2 elements each
    Output: sorted sequence of A ∪ B
    S ← empty sequence
    while ¬A.isEmpty() ∧ ¬B.isEmpty()
        if A.first().element() < B.first().element()
            S.addLast(A.remove(A.first()))
        else
            S.addLast(B.remove(B.first()))
    while ¬A.isEmpty()
        S.addLast(A.remove(A.first()))
    while ¬B.isEmpty()
        S.addLast(B.remove(B.first()))
    return S
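As a concrete illustration of this pseudocode, a minimal sequential merge-sort in Java might look as follows. This is only a sketch of the same divide/recur/conquer structure; the class and method names, and the array-based (rather than sequence-based) formulation, are our own and not taken from the course material.

import java.util.Arrays;

public class MergeSortDemo {
    // Sort a[lo..hi) recursively: divide, recur, conquer.
    static void mergeSort(int[] a, int lo, int hi) {
        if (hi - lo <= 1) return;               // base case: 0 or 1 element
        int mid = lo + (hi - lo) / 2;
        mergeSort(a, lo, mid);                  // sort left half
        mergeSort(a, mid, hi);                  // sort right half
        merge(a, lo, mid, hi);                  // merge the two sorted halves
    }

    // Merge the sorted runs a[lo..mid) and a[mid..hi) back into a[lo..hi).
    static void merge(int[] a, int lo, int mid, int hi) {
        int[] tmp = new int[hi - lo];
        int i = lo, j = mid, k = 0;
        while (i < mid && j < hi)
            tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i < mid) tmp[k++] = a[i++];
        while (j < hi)  tmp[k++] = a[j++];
        System.arraycopy(tmp, 0, a, lo, tmp.length);
    }

    public static void main(String[] args) {
        int[] a = {7, 2, 9, 4, 3, 8, 6, 1};
        mergeSort(a, 0, a.length);
        System.out.println(Arrays.toString(a)); // [1, 2, 3, 4, 6, 7, 8, 9]
    }
}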

(© 2010 Goodrich, Tamassia)
- An execution of merge-sort is depicted by a binary tree:
  - each node represents a recursive call of merge-sort and stores the unsorted sequence before the execution (and its partition) and the sorted sequence at the end of the execution
  - the root is the initial call
  - the leaves are calls on subsequences of size 0 or 1
[Figure: merge-sort execution tree for the example sequence 7 2 9 4.]

- There is an obvious tree of processes to be mapped to available processors.
- There may be a sequential bottleneck at the root.
- Producing an efficient algorithm may require us to parallelize it, producing nested parallelism.
- Small problems may not be worth distributing: there is a trade-off between distribution costs and re-computation costs.

Algorithm Parallel_mergeSort(S, C)
    Input: sequence S with n elements, comparator C
    Output: sequence S sorted according to C
    if S.size() > 1
        (S1, S2) ← partition(S, n/2)
        initiate a process to invoke mergeSort(S1, C)
        mergeSort(S2, C)
        ============ sync all invoked processes ============
        S ← merge(S1, S2)
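A minimal sketch of this idea in Java, using the fork/join framework to stand in for "initiate a process" and "sync"; the class name and the decision to recurse in parallel at every level are our own assumptions, not part of the course slides.

import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ParallelMergeSort extends RecursiveAction {
    private final int[] a;
    private final int lo, hi;

    ParallelMergeSort(int[] a, int lo, int hi) {
        this.a = a; this.lo = lo; this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= 1) return;                       // base case
        int mid = lo + (hi - lo) / 2;
        ParallelMergeSort left  = new ParallelMergeSort(a, lo, mid);
        ParallelMergeSort right = new ParallelMergeSort(a, mid, hi);
        left.fork();                                    // "initiate a process" for S1
        right.compute();                                // sort S2 in the current thread
        left.join();                                    // "sync all invoked processes"
        merge();                                        // sequential merge of the two halves
    }

    // Merge the sorted runs a[lo..mid) and a[mid..hi).
    private void merge() {
        int mid = lo + (hi - lo) / 2;
        int[] tmp = new int[hi - lo];
        int i = lo, j = mid, k = 0;
        while (i < mid && j < hi) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i < mid) tmp[k++] = a[i++];
        while (j < hi)  tmp[k++] = a[j++];
        System.arraycopy(tmp, 0, a, lo, tmp.length);
    }

    public static void main(String[] args) {
        int[] a = {7, 2, 9, 4, 3, 8, 6, 1};
        new ForkJoinPool().invoke(new ParallelMergeSort(a, 0, a.length));
        System.out.println(Arrays.toString(a));
    }
}

In practice one would fall back to a sequential sort below some threshold, since spawning a task per element is far too fine-grained: this is exactly the granularity issue discussed earlier.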

- The serial runtime for merge-sort can be expressed as Ts(n) = n log n + n = O(n log n).
- The parallel runtime for merge-sort on n processors is Tp(n) = log n + n = O(n): the recursive splitting contributes O(log n), but the final merge at the root is still sequential and takes O(n).
- Speedup: S = Ts/Tp = O(log n).
- Cost = p × Tp(n) = n × O(n) = O(n²). Is it cost optimal?

- We have a parallelism bottleneck: the merge. Can we parallelize it?
- We have two sorted lists.
- Searching in a sorted list is best done using divide-and-conquer (binary search), which takes O(log n).
- If we partition the two lists on the middle element and merge them using a binary-search technique, we could reduce the merge operation to O(log n²).

List A = Parallel_Merge(S1, S2) {
    if (the length of S1 or S2 == 1) {
        add the element of the shorter list into the other list and return the resulting list
    } else {
        TS1 ← all elements of S1 from 0 to S1.length/2
        TS2 ← all elements of S1 from S1.length/2 to S1.length-1
        TS3 ← all elements of S2 from 0 to S2.length/2
        TS4 ← all elements of S2 from S2.length/2 to S2.length-1
        in parallel, merge these lists according to a comparison between the splitting elements,
        calling the Parallel_Merge function recursively in each process; this returns the two
        partitions S1, S2 sorted such that S1 < S2
        Sync
        A ← S1 + S2
    }
}
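The slide's pseudocode is fairly loose; one common way to realise the idea (a sketch under our own assumptions, not necessarily the exact scheme intended by the slide) is to split the longer list at its median, binary-search that median's position in the other list, and then merge the two left parts and the two right parts recursively in parallel:

import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Merge two sorted runs a[aLo..aHi) and b[bLo..bHi) into out[outLo..).
public class ParallelMerge extends RecursiveAction {
    private final int[] a, b, out;
    private final int aLo, aHi, bLo, bHi, outLo;

    ParallelMerge(int[] a, int aLo, int aHi, int[] b, int bLo, int bHi, int[] out, int outLo) {
        this.a = a; this.aLo = aLo; this.aHi = aHi;
        this.b = b; this.bLo = bLo; this.bHi = bHi;
        this.out = out; this.outLo = outLo;
    }

    @Override
    protected void compute() {
        int aLen = aHi - aLo, bLen = bHi - bLo;
        if (aLen < bLen) {                              // keep the longer run in a
            new ParallelMerge(b, bLo, bHi, a, aLo, aHi, out, outLo).compute();
            return;
        }
        if (aLen == 0) return;                          // both runs empty
        if (aLen == 1) {                                // base case: at most one element each
            if (bLen == 0) {
                out[outLo] = a[aLo];
            } else {
                out[outLo]     = Math.min(a[aLo], b[bLo]);
                out[outLo + 1] = Math.max(a[aLo], b[bLo]);
            }
            return;
        }
        int aMid = (aLo + aHi) / 2;                     // median of the longer run
        int split = lowerBound(b, bLo, bHi, a[aMid]);   // binary search: where a[aMid] falls in b
        int outMid = outLo + (aMid - aLo) + (split - bLo);
        out[outMid] = a[aMid];                          // the median lands in its final position
        invokeAll(                                      // merge both halves in parallel, then sync
            new ParallelMerge(a, aLo, aMid, b, bLo, split, out, outLo),
            new ParallelMerge(a, aMid + 1, aHi, b, split, bHi, out, outMid + 1));
    }

    // First index in x[lo..hi) whose value is >= key.
    private static int lowerBound(int[] x, int lo, int hi, int key) {
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (x[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        int[] a = {2, 4, 7, 9}, b = {1, 3, 6, 8};
        int[] out = new int[a.length + b.length];
        new ForkJoinPool().invoke(new ParallelMerge(a, 0, a.length, b, 0, b.length, out, 0));
        System.out.println(Arrays.toString(out));       // [1, 2, 3, 4, 6, 7, 8, 9]
    }
}

The two sub-merges are independent, so with enough processors they proceed in parallel; a real implementation would again cut over to a sequential merge below some size threshold.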

- The serial runtime for merge-sort is still Ts(n) = n log n + n = O(n log n).
- With the parallel merge, the parallel runtime for merge-sort on n processors becomes Tp(n) = log n + log n = O(log n).
- Speedup: S = Ts/Tp = O(n).
- Cost = p × Tp(n) = n × O(log n) = O(n log n). Is it cost optimal?
- Read the paper on the website for another parallel merge-sort algorithm.

- We have ignored a very important overhead: communication cost.
- Think about the implementation of this algorithm in a message-passing environment.
- The analysis of an algorithm must take into account the underlying platform on which it will operate.
- What about merge-sort: what would its cost be on a message-passing parallel architecture?

- In reality it is sometimes difficult or inefficient to assign tasks to processing elements at design time.
- The Bag of Tasks pattern suggests an approach which may be able to reduce overhead, while still providing the flexibility to express such dynamic, unpredictable parallelism.
- In bag of tasks, a fixed number of worker processes/threads maintain and process a dynamic collection of homogeneous "tasks". Execution of a particular task may lead to the creation of more task instances.

place initial task(s) in bag;
co [w = 1 to P] {
    while (all tasks not done) {
        get a task;
        execute the task;
        possibly add new tasks to the bag;
    }
}

- The pattern is naturally load-balanced: each worker will probably complete a different number of tasks, but will do roughly the same amount of work overall.
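A minimal shared-memory sketch of the pattern in Java; the task itself (splitting an interval until it is small enough, then summing it) is purely illustrative and not from the slides, and the simple pending-counter termination test is our own choice.

import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class BagOfTasks {
    record Task(int lo, int hi) {}                          // illustrative, hypothetical task

    static final ConcurrentLinkedQueue<Task> bag = new ConcurrentLinkedQueue<>();
    static final AtomicInteger pending = new AtomicInteger(0);  // tasks in the bag or in progress
    static final AtomicLong result = new AtomicLong(0);

    static void put(Task t) { pending.incrementAndGet(); bag.add(t); }

    public static void main(String[] args) throws InterruptedException {
        put(new Task(0, 1_000_000));                        // place initial task(s) in bag
        int P = Runtime.getRuntime().availableProcessors();
        Thread[] workers = new Thread[P];
        for (int w = 0; w < P; w++) {                       // co [w = 1 to P]
            workers[w] = new Thread(() -> {
                while (pending.get() > 0) {                 // "all tasks not done"
                    Task t = bag.poll();                    // get a task
                    if (t == null) continue;                // bag momentarily empty; retry
                    if (t.hi() - t.lo() > 1000) {           // large task: split, add new tasks
                        int mid = (t.lo() + t.hi()) / 2;
                        put(new Task(t.lo(), mid));
                        put(new Task(mid, t.hi()));
                    } else {                                // small task: execute it
                        long sum = 0;
                        for (int i = t.lo(); i < t.hi(); i++) sum += i;
                        result.addAndGet(sum);
                    }
                    pending.decrementAndGet();              // this task is done
                }
            });
            workers[w].start();
        }
        for (Thread t : workers) t.join();
        System.out.println("sum = " + result.get());
    }
}

The pending counter is a simple way to detect that the bag is empty and no running task can add more work; real implementations often use more refined termination detection or a work-stealing pool.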

- The Producers-Consumers pattern arises when a group of activities generates data which is consumed by another group of activities.
- The key characteristic is that the conceptual data flow is all in one direction, from producer(s) to consumer(s).
- In general, we want to allow production and consumption to be loosely synchronized, so we will need some buffering in the system.
- The programming challenges are to ensure that no producer overwrites a buffer entry before a consumer has used it, and that no consumer tries to consume an entry which doesn't really exist (or re-use an already consumed entry).

- Depending upon the model, these challenges motivate the need for various facilities. For example, with a buffer in shared address space, we may need atomic actions and condition synchronization.
- Similarly, in a distributed implementation we want to avoid tight synchronization between sends to the buffer and receives from it.
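For the shared-address-space case, Java's bounded BlockingQueue already packages the required atomic actions and condition synchronisation. The small sketch below is illustrative only; the class name, the data being produced, and the "poison pill" end-of-stream convention are our own assumptions.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumer {
    static final int POISON = -1;                            // hypothetical end-of-stream marker

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(8);   // bounded buffer

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++)
                    buffer.put(i);                           // blocks while the buffer is full
                buffer.put(POISON);                          // signal that production is over
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                long sum = 0;
                while (true) {
                    int item = buffer.take();                // blocks while the buffer is empty
                    if (item == POISON) break;
                    sum += item;                             // "consume" the item
                }
                System.out.println("consumed sum = " + sum);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}

Here put blocks when the buffer is full and take blocks when it is empty, which is exactly the pair of conditions described above: nothing is overwritten before it is consumed, and nothing non-existent is consumed.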

- When one group of consumers becomes the producers for yet another group, we have a pipeline.
- Items of data flow from one end of the pipeline to the other, being transformed by and/or transforming the state of the pipeline stage processes as they go.

- The Sieve of Eratosthenes provides a simple pipeline example, with the additional factor that we build the pipeline dynamically.
- The object is to find all prime numbers in the range 2 to N. The gist of the original algorithm was to write down all integers in the range, then repeatedly remove all multiples of the smallest remaining number. After each removal phase, the new smallest remaining number is guaranteed to be prime.
- We will implement a message-passing pipelined parallel version by creating a generator process and a sequence of sieve processes, each of which does the work of one removal phase. The pipeline grows dynamically, creating new sieves on demand, as unsieved numbers emerge from the pipeline.
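A shared-memory approximation of this message-passing design is sketched below: each sieve stage is a thread that keeps the first number it receives as its prime, filters that prime's multiples out of its input stream, and creates the next stage on demand when an unsieved number emerges. The channel representation (blocking queues), the END marker and all names are our own assumptions, not the course's implementation.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PipelinedSieve {
    static final int END = -1;                               // end-of-stream marker

    // One pipeline stage: owns a prime, forwards non-multiples to the next stage.
    static void sieve(BlockingQueue<Integer> in) throws InterruptedException {
        int prime = in.take();
        if (prime == END) return;
        System.out.println(prime);                           // the first number received is prime
        BlockingQueue<Integer> out = null;
        Thread next = null;
        for (int n = in.take(); ; n = in.take()) {
            if (n == END) break;
            if (n % prime != 0) {                            // an unsieved number emerges...
                if (out == null) {                           // ...so grow the pipeline on demand
                    out = new LinkedBlockingQueue<>();
                    BlockingQueue<Integer> channel = out;
                    next = new Thread(() -> {
                        try { sieve(channel); } catch (InterruptedException e) { }
                    });
                    next.start();
                }
                out.put(n);
            }
        }
        if (out != null) { out.put(END); next.join(); }      // propagate shutdown, then sync
    }

    public static void main(String[] args) throws InterruptedException {
        int N = 50;
        BlockingQueue<Integer> first = new LinkedBlockingQueue<>();
        Thread generator = new Thread(() -> {                // generator process: 2..N, then END
            try {
                for (int i = 2; i <= N; i++) first.put(i);
                first.put(END);
            } catch (InterruptedException e) { }
        });
        Thread firstSieve = new Thread(() -> {
            try { sieve(first); } catch (InterruptedException e) { }
        });
        generator.start();
        firstSieve.start();
        generator.join();
        firstSieve.join();
    }
}

Each queue plays the role of a message channel; in a true message-passing setting the stages would be separate processes and the put/take calls would be sends and receives.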