CSE 332: Parallel Algorithms


CSE 332: Parallel Algorithms
Richard Anderson
Spring 2016

Announcements
- Project 2: due tonight
- Project 3: available soon

Analyzing Parallel Programs
Let TP be the running time on P processors.
Two key measures of run-time:
- Work: how long it would take on 1 processor = T1
- Span: how long it would take on infinitely many processors = T∞
Speed-up on P processors: T1 / TP
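A concrete illustration (a sketch of mine, not from the slides): divide-and-conquer summation with Java's fork/join framework. The work is O(n), since every element is touched once, and the span is O(log n), the depth of the recursion tree, so speed-up can scale until P approaches n / log n. The class name and CUTOFF value are arbitrary choices.

  import java.util.concurrent.ForkJoinPool;
  import java.util.concurrent.RecursiveTask;

  class SumTask extends RecursiveTask<Long> {
      static final int CUTOFF = 1000;  // below this, recursion overhead dominates
      final int[] arr; final int lo, hi;
      SumTask(int[] arr, int lo, int hi) { this.arr = arr; this.lo = lo; this.hi = hi; }

      protected Long compute() {
          if (hi - lo < CUTOFF) {            // base case: sequential sum
              long sum = 0;
              for (int i = lo; i < hi; i++) sum += arr[i];
              return sum;
          }
          int mid = lo + (hi - lo) / 2;
          SumTask left = new SumTask(arr, lo, mid);
          SumTask right = new SumTask(arr, mid, hi);
          left.fork();                       // run the left half in parallel...
          long rightSum = right.compute();   // ...while this thread takes the right
          return left.join() + rightSum;
      }
  }
  // Usage: long total = ForkJoinPool.commonPool().invoke(new SumTask(a, 0, a.length));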

Amdahl's Fairly Trivial Observation
Most programs have
- parts that parallelize well
- parts that don't parallelize at all
Let S = the proportion that can't be parallelized, and normalize T1 to 1:
  1 = T1 = S + (1 - S)
Suppose we get perfect linear speedup on the parallel portion:
  TP = S + (1 - S)/P
So the overall speed-up on P processors is (Amdahl's Law):
  T1 / TP = 1 / (S + (1 - S)/P)
  T1 / T∞ = 1 / S
Examples?
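One worked example (the numbers are mine, the formula is the one above): if S = 10% of a program is inherently sequential, 10 processors give a speed-up of 1/(0.1 + 0.9/10) ≈ 5.3, 100 processors give ≈ 9.2, and no number of processors can beat 1/S = 10. A one-line helper (name hypothetical) tabulates this:

  // Amdahl's Law: overall speed-up on p processors when a fraction s is sequential.
  static double amdahlSpeedup(double s, double p) {
      return 1.0 / (s + (1.0 - s) / p);  // T1 / TP = 1 / (S + (1-S)/P)
  }
  // amdahlSpeedup(0.10, 10)   ≈ 5.26
  // amdahlSpeedup(0.10, 100)  ≈ 9.17
  // amdahlSpeedup(0.10, 1e9)  ≈ 10.0, approaching T1 / T∞ = 1/S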

Results from Friday
- Parallel prefix: O(N) work, O(log N) span
- Quicksort: the partition step can be solved with parallel prefix
- Overall result: O(N log N) work, O(log² N) span

Prefix sum
Prefix-sum:
  input:  6  3 11 10  8  2  7  8
  output: 6  9 20 30 38 40 47 55
(The slide's tree diagram: the root holds Sum[0,7] and splits into Sum[0,3] and Sum[4,7], then Sum[0,1], Sum[2,3], Sum[4,5], Sum[6,7]; each subtree also receives the sum of everything to its left: Sum<0, Sum<2, Sum<4, Sum<6.)
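Below is a minimal sketch of the two-pass algorithm behind that diagram (class and method names are mine; the recursion is written sequentially, with comments marking the calls a fork/join implementation would fork). Pass 1 builds the tree of range sums bottom-up; pass 2 pushes each node's sum-of-everything-to-its-left (the Sum< values above) down to the leaves. Both passes have O(n) work and O(log n) span.

  class PrefixSum {
      static class Node {
          int lo, hi;        // this node covers the range [lo, hi)
          long sum;          // pass 1 result: sum of in[lo..hi)
          Node left, right;  // children split the range in half
      }

      // Pass 1 ("up"): build the tree of range sums from the leaves up.
      static Node buildSums(int[] in, int lo, int hi) {
          Node n = new Node();
          n.lo = lo; n.hi = hi;
          if (hi - lo == 1) { n.sum = in[lo]; return n; }
          int mid = lo + (hi - lo) / 2;
          n.left = buildSums(in, lo, mid);   // forkable
          n.right = buildSums(in, mid, hi);  // forkable
          n.sum = n.left.sum + n.right.sum;
          return n;
      }

      // Pass 2 ("down"): fromLeft is the sum of everything to the left of
      // this node's range (the Sum< values in the diagram).
      static void fillPrefixes(Node n, int[] in, long[] out, long fromLeft) {
          if (n.left == null) { out[n.lo] = fromLeft + in[n.lo]; return; }
          fillPrefixes(n.left, in, out, fromLeft);                // forkable
          fillPrefixes(n.right, in, out, fromLeft + n.left.sum);  // forkable
      }

      static long[] inclusivePrefix(int[] in) {
          long[] out = new long[in.length];
          if (in.length > 0) fillPrefixes(buildSums(in, 0, in.length), in, out, 0);
          return out;
      }
  }
  // inclusivePrefix({6,3,11,10,8,2,7,8}) -> {6,9,20,30,38,40,47,55}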

Parallel Partition
Input: 8 1 4 9 3 5 2 7 6; pick pivot = 6
- Pack (test: < 6):  1 4 3 5 2
- Right pack (test: >= 6):  8 9 7
- Result: 1 4 3 5 2 6 8 9 7
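A sketch of pack built on the prefix-sum sketch above (helper name mine): map each element to a 0/1 keep bit, prefix-sum the bits to learn each kept element's output slot, then scatter. The map and scatter loops have independent iterations, so all three steps parallelize; the right pack is identical with the test flipped.

  // Pack: keep the elements less than the pivot, preserving their order.
  static int[] packLess(int[] in, int pivot) {
      int n = in.length;
      int[] bits = new int[n];
      for (int i = 0; i < n; i++)                    // parallel map: 1 = keep
          bits[i] = (in[i] < pivot) ? 1 : 0;
      long[] pos = PrefixSum.inclusivePrefix(bits);  // parallel prefix over the bits
      int[] out = new int[n == 0 ? 0 : (int) pos[n - 1]];
      for (int i = 0; i < n; i++)                    // parallel scatter
          if (bits[i] == 1) out[(int) pos[i] - 1] = in[i];
      return out;
  }
  // packLess({8,1,4,9,3,5,2,7,6}, 6) -> {1,4,3,5,2}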

Parallel Quicksort
Complexity (average case):
- Pick a pivot: O(1)
- Partition into two sub-arrays, A (values less than the pivot) and B (values greater than the pivot): O(log n) span via parallel prefix
- Recursively sort A and B in parallel: T(n/2) on average
Recurrence: T(n) = O(log n) + T(n/2), with T(0) = T(1) = 1
Span: O(log² n)
Parallelism (work/span): O(n / log n), much better
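Unrolling the span recurrence shows where the O(log² n) comes from; in LaTeX:

  T(n) = c\log n + T(n/2)
       = c\log n + c\log\frac{n}{2} + c\log\frac{n}{4} + \cdots + c
       = c\sum_{i=0}^{\log_2 n} (\log_2 n - i) = O(\log^2 n)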

Sequential Mergesort
Mergesort (review):
- Sort left and right halves: 2T(n/2)
- Merge results: O(n)
Complexity (worst case): T(n) = n + 2T(n/2), T(0) = T(1) = 1, which gives O(n log n)
How to parallelize?
- Doing left + right in parallel improves the span to O(n)
- To do better, we need to parallelize the merge
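Why only O(n)? Sorting the halves in parallel removes the factor of 2 from the recurrence, but the sequential merge at each level still dominates; in LaTeX:

  T_\infty(n) = cn + T_\infty(n/2) = cn + \frac{cn}{2} + \frac{cn}{4} + \cdots < 2cn = O(n)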

Parallel Merge
How do we merge two sorted lists in parallel?
  4 6 8 9
  1 2 3 5 7

Parallel Merge
- Choose the median M of the left half: O(1)
- Split both arrays into < M and >= M: O(log n). How? The left half is already split around its median; splitting the right half requires a binary search for M.
  4 6 8 9    (M = 6)
  1 2 3 5 7
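That binary search is a standard lower-bound search: find the first element of the sorted right half that is >= M. A sketch (method name mine):

  static int lowerBound(int[] a, int lo, int hi, int m) {
      while (lo < hi) {                  // search the half-open range [lo, hi)
          int mid = lo + (hi - lo) / 2;
          if (a[mid] < m) lo = mid + 1;  // a[mid] belongs to the "< M" part
          else hi = mid;                 // a[mid] could be the first ">= M"
      }
      return lo;  // index of the first element >= m (hi if none): O(log n)
  }
  // lowerBound({1,2,3,5,7}, 0, 5, 6) -> 4, so 1 2 3 5 fall in the "< M" part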

Parallel Merge
- Choose the median M of the left half
- Split both arrays into < M and >= M
- Do the two submerges in parallel
  4 6 8 9
  1 2 3 5 7

(Recursion tree from the slide: merge(4 6 8 9 | 1 2 3 5 7) splits into the parallel submerges merge(4 | 1 2 3 5) and merge(6 8 9 | 7); each submerge splits again until the base case, and the results combine into 1 2 3 4 5 6 7 8 9.)

When we do each merge in parallel:
- we split the bigger array in half
- we use binary search to split the smaller array
- and in the base case we copy to the output array

Parallel Mergesort Pseudocode
Merge(arr[], left1, left2, right1, right2, out[], out1, out2)
  // merges the sorted runs arr[left1..left2] and arr[right1..right2] into out[out1..out2]
  int leftSize = left2 - left1 + 1
  int rightSize = right2 - right1 + 1
  // Assert: out2 - out1 + 1 == leftSize + rightSize
  // We will assume leftSize >= rightSize without loss of generality
  // (otherwise, swap the roles of the two runs)
  if (leftSize + rightSize < CUTOFF)
    sequentially merge the two runs into out[out1..out2] and return
  int mid = left1 + (left2 - left1)/2   // index of the left run's median
  binarySearch arr[right1..right2] to find the largest j with arr[j] <= arr[mid]
    // (j = right1 - 1 if every right element exceeds arr[mid])
  int split = out1 + (mid - left1) + (j - right1) + 1   // last output index of the first submerge
  in parallel:
    Merge(arr[], left1, mid, right1, j, out[], out1, split)
    Merge(arr[], mid+1, left2, j+1, right2, out[], split+1, out2)
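For concreteness, here is one way that pseudocode could be rendered with Java's fork/join framework. This is a sketch under my own conventions (half-open ranges, a small CUTOFF, a hypothetical class name), not the course's reference solution:

  import java.util.concurrent.ForkJoinPool;
  import java.util.concurrent.RecursiveAction;

  class MergeTask extends RecursiveAction {
      static final int CUTOFF = 16;
      final int[] arr, out;
      final int l1, l2, r1, r2, o;  // merge arr[l1,l2) and arr[r1,r2) into out starting at o

      MergeTask(int[] arr, int l1, int l2, int r1, int r2, int[] out, int o) {
          this.arr = arr; this.l1 = l1; this.l2 = l2;
          this.r1 = r1; this.r2 = r2; this.out = out; this.o = o;
      }

      protected void compute() {
          int leftSize = l2 - l1, rightSize = r2 - r1;
          if (leftSize < rightSize) {                  // keep the bigger run on the left
              new MergeTask(arr, r1, r2, l1, l2, out, o).compute();
              return;
          }
          if (leftSize + rightSize < CUTOFF) {         // base case: sequential merge
              int i = l1, j = r1, k = o;
              while (i < l2 && j < r2) out[k++] = (arr[i] <= arr[j]) ? arr[i++] : arr[j++];
              while (i < l2) out[k++] = arr[i++];
              while (j < r2) out[k++] = arr[j++];
              return;
          }
          int mid = l1 + leftSize / 2;                 // median of the left run
          int j = lowerBound(arr, r1, r2, arr[mid]);   // first right element >= the median
          int split = o + (mid + 1 - l1) + (j - r1);   // output slots used by the first submerge
          invokeAll(new MergeTask(arr, l1, mid + 1, r1, j, out, o),       // the two submerges
                    new MergeTask(arr, mid + 1, l2, j, r2, out, split));  // run in parallel
      }

      // The same lower-bound binary search sketched earlier.
      static int lowerBound(int[] a, int lo, int hi, int key) {
          while (lo < hi) {
              int mid = lo + (hi - lo) / 2;
              if (a[mid] < key) lo = mid + 1; else hi = mid;
          }
          return lo;
      }
  }
  // Usage: ForkJoinPool.commonPool().invoke(
  //            new MergeTask(a, 0, leftLen, leftLen, a.length, out, 0));

A parallel mergesort would then fork on the two halves and call MergeTask on the sorted results, alternating between arr and out as scratch space.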

Analysis
Parallel Merge (worst case):
- Height of the merge call tree with n elements: O(log n)
- Complexity of each thread (ignoring recursive calls), dominated by the binary search: O(log n)
- Span: O(log² n)
Parallel Mergesort (worst case):
- Span: O(log³ n), since the sort performs O(log n) levels of merges
- Parallelism (work / span): O(n / log² n)
Subtlety: the splits are uneven, but even in the worst case a submerge gets at most a 3/4 to 1/4 split, which still gives O(log n) height.
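The same analysis as recurrences (constants elided), using the 3/4 bound from the subtlety above; in LaTeX:

  S_{\mathrm{merge}}(n) = S_{\mathrm{merge}}(3n/4) + O(\log n) = O(\log^2 n)
  S_{\mathrm{sort}}(n)  = S_{\mathrm{sort}}(n/2) + O(\log^2 n) = O(\log^3 n)
  \mathrm{Parallelism}  = \frac{O(n \log n)}{O(\log^3 n)} = O(n / \log^2 n)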

Parallel Quicksort vs. Mergesort
Parallelism (work / span):
- Quicksort: O(n / log n), average case
- Mergesort: O(n / log² n), worst case