Lecture 8 Sorting. Sorting (Chapter 7) We have a list of real numbers. Need to sort the real numbers in increasing order (smallest first). Important points.


Lecture 8 Sorting

Sorting (Chapter 7)
We have a list of real numbers and need to sort them in increasing order (smallest first).
Important points to remember:
We have O(N^2) algorithms.
We have O(N log N) algorithms (heap sort).
In general, any comparison-based sorting algorithm must take Ω(N log N) time in the worst case (we will prove it).
In special cases, we can have O(N) sorting algorithms.

Insertion Sort (Section 7.2, Weiss)
(1) Initially P = 1.
(2) Let the first P elements be sorted.
(3) Insert the (P+1)th element into its proper place in the list, so that now the first P+1 elements are sorted.
(4) Increment P and go to step (3).

The (P+1)th element is inserted as follows. Store the (P+1)th element in a temporary variable, temp. If the Pth element is greater than temp, set the (P+1)th element equal to the Pth one. If the (P-1)th element is greater than temp, set the Pth element equal to the (P-1)th one, and so on. Continue like this until some kth element is less than or equal to temp, or you run past the first position (in which case k = 0). Then set the (k+1)th element equal to temp.

Extended Example
Need to sort: 34 8 64 51 32 21
P = 1: looking at the first element only, we change nothing.
P = 2: Temp = 8. 34 > Temp, so the second element is set to 34. We have reached the start of the list, so we stop there and set the first position equal to Temp.

After the second pass: 8 34 64 51 32 21. Now the first two elements are sorted.
P = 3: Temp = 64. 34 < 64, so we stop at the 2nd position and set the 3rd position = 64.
After the third pass: 8 34 64 51 32 21.
P = 4: Temp = 51. 51 < 64, so we have 8 34 64 64 32 21. 34 < 51, so we stop at the 2nd position and set the 3rd position = Temp. The list is 8 34 51 64 32 21.

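P = 5: Temp = 32. 32 < 64, so we have 8 34 51 64 64 21. 32 < 51, so we have 8 34 51 51 64 21. 32 < 34, so we have 8 34 34 51 64 21. 32 > 8, so we stop at the first position and set the second position = 32. We have 8 32 34 51 64 21.
P = 6: Temp = 21. We have 8 21 32 34 51 64.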

Pseudo Code
Assume that the list is stored in an array A (a linked list works as well).
InsertionSort(A[], int N) {
    for (P = 1; P < N; P++) {
        Temp = A[P];
        for (j = P; j > 0 and A[j-1] > Temp; j--)
            A[j] = A[j-1];
        A[j] = Temp;
    }
}
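The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code; the function name `insertion_sort` and the sample input are assumptions chosen for the example.

```python
def insertion_sort(a):
    """Sort the list a in place in increasing order and return it."""
    for p in range(1, len(a)):
        temp = a[p]  # element to insert into the sorted prefix a[0:p]
        j = p
        # Shift elements larger than temp one slot to the right.
        # The strict '>' leaves equal elements in their original order.
        while j > 0 and a[j - 1] > temp:
            a[j] = a[j - 1]
            j -= 1
        a[j] = temp
    return a


print(insertion_sort([34, 8, 64, 51, 32, 21]))
```

Because the inner loop stops as soon as it meets an element that is not larger than `temp`, an already sorted input needs only one comparison per element.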

Complexity Analysis
The inner loop is executed at most P times for each P. Because of the outer loop, P ranges from 1 to N-1. Thus overall we have at most 1 + 2 + ... + (N-1) = N(N-1)/2 steps. This is O(N^2).
Space requirement: O(N).

Heap Sort (Section 7.5)
Complexity: O(N log N).
Suppose we need to store the sorted elements; in this case we need an additional array, and the total space is 2N. This is O(N), but can we make do with a space of N positions only? Whenever we do a deletemin, the heap size shrinks by one, so we can store the retrieved element in the position that was just freed.

We have a heap of size N. After the first deletemin, the last heap element is at position N-1, so we store the retrieved element at position N. After the second deletemin, the last heap element is at position N-2, so the retrieved element is stored at position N-1, and so on. With deletemin on a min-heap, the array finally holds the elements in decreasing order; to get increasing order, build a max-heap and use deletemax in the same way.
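The in-place idea can be sketched as below. Note the swap of technique: this sketch uses a max-heap with a deletemax step rather than deletemin, so that the array ends up in increasing order. The names `sift_down` and `heap_sort` are illustrative assumptions, and the array is 0-indexed.

```python
def sift_down(a, i, size):
    """Restore the max-heap property below index i within a[0:size]."""
    while True:
        child = 2 * i + 1              # left child
        if child >= size:
            return
        if child + 1 < size and a[child + 1] > a[child]:
            child += 1                 # pick the larger child
        if a[i] >= a[child]:
            return
        a[i], a[child] = a[child], a[i]
        i = child


def heap_sort(a):
    """Sort a in place using only the array itself as storage."""
    n = len(a)
    # Build a max-heap bottom-up.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
    # Repeated deletemax: the maximum goes into the slot freed at the end.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)           # the heap now occupies a[0:end]
    return a


print(heap_sort([34, 8, 64, 51, 32, 21]))
```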

Merge Sort
Divide and conquer: divide the list into two smaller lists, sort the smaller lists, and merge the sorted lists to get an overall sorted list. How do we merge 2 sorted lists?

Consider 2 sorted arrays A and B. We will merge them into a sorted array C.
(1) Keep three variables: posA, posB and posC.
(2) Initially, all three are at the first positions of their respective arrays.
(3) Compare the elements at posA and posB. Whichever is smaller goes into position posC of C, and the position is advanced in the array it came from (e.g., if the element of A goes into C, then posA is advanced). posC is also advanced.
(4) Go to step (3) until all the elements of one array are exhausted.
(5) Copy the remaining elements of the other array into C.

Suppose we want to merge A = 1, 13, 24, 26 and B = 2, 15, 27, 38. Each step shows the elements currently at posA and posB and the element that goes into C.
At posA: 1, at posB: 2, into C: 1. Merged list: 1
At posA: 13, at posB: 2, into C: 2. Merged list: 1, 2
At posA: 13, at posB: 15, into C: 13. Merged list: 1, 2, 13
At posA: 24, at posB: 15, into C: 15. Merged list: 1, 2, 13, 15
At posA: 24, at posB: 27, into C: 24. Merged list: 1, 2, 13, 15, 24
At posA: 26, at posB: 27, into C: 26. Merged list: 1, 2, 13, 15, 24, 26
A ends, so the remainder of B is added: 1, 2, 13, 15, 24, 26, 27, 38
Complexity of the merging? O(m + n), where m and n are the sizes of A and B.

Pseudocode for merging
Merge(A, B, C) {
    posA = 0; posB = 0; posC = 0;
    while (posA < sizeA && posB < sizeB) {
        if (A[posA] <= B[posB]) {   /* <= takes ties from A first, which keeps the merge stable */
            C[posC] = A[posA]; posA = posA + 1;
        } else {
            C[posC] = B[posB]; posB = posB + 1;
        }
        posC = posC + 1;
    }
    /* at most one of the two loops below does any work */
    while (posB < sizeB) { C[posC] = B[posB]; posC = posC + 1; posB = posB + 1; }
    while (posA < sizeA) { C[posC] = A[posA]; posC = posC + 1; posA = posA + 1; }
}
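The merging procedure can be sketched in Python as follows. It is a minimal illustration; the function name `merge`, the choice to return a new list instead of filling a preallocated C, and the sample inputs are assumptions made for the example. The comparison uses `<=` so that ties are taken from the first list.

```python
def merge(a, b):
    """Merge two sorted lists a and b into a new sorted list c."""
    c = []
    pos_a = pos_b = 0
    while pos_a < len(a) and pos_b < len(b):
        if a[pos_a] <= b[pos_b]:   # ties come from a first (stable)
            c.append(a[pos_a])
            pos_a += 1
        else:
            c.append(b[pos_b])
            pos_b += 1
    # One list is exhausted; copy whatever remains of the other.
    c.extend(a[pos_a:])
    c.extend(b[pos_b:])
    return c


print(merge([1, 13, 24, 26], [2, 15, 27, 38]))
```

Each element of either input is appended exactly once, which is where the O(m + n) bound for merging comes from.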

Overall approach
Divide the array into 2 parts, recursively sort the 2 parts (redivide and recursively sort; sorting one element is easy), then merge the 2 parts.
Mergesort(A, left, right) {
    if (left >= right) return;
    center = ⌊(left + right)/2⌋;
    Mergesort(A, left, center);
    Mergesort(A, center+1, right);
    B = left half of A;
    C = right half of A;
    Merge(B, C, D);
    Set left to right of A = D;
}

Need to sort: 34 8 64 51 32 21
Sort the left half 34 8 64: 8, 34, 64
Sort the right half 51 32 21: 21, 32, 51
Merge the two: we have 8, 21, 32, 34, 51, 64

T(n) = 2T(n/2) + cn, T(1) = 1.
Using the master theorem, we get T(n) = O(n log n).
Any problem with merge sort? It needs additional space for merging.
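A compact Python sketch of the whole recursion is given below, with the merge step written inline so the block stands alone. The function name `merge_sort` and its default arguments are assumptions for illustration. The two temporary copies `b` and `c` are the additional space mentioned above.

```python
def merge_sort(a, left=0, right=None):
    """Sort a[left..right] (inclusive) in place and return a."""
    if right is None:
        right = len(a) - 1
    if left >= right:                  # zero or one element: already sorted
        return a
    center = (left + right) // 2
    merge_sort(a, left, center)
    merge_sort(a, center + 1, right)

    b = a[left:center + 1]             # copy of the sorted left half
    c = a[center + 1:right + 1]        # copy of the sorted right half
    i = j = 0
    k = left
    while i < len(b) and j < len(c):
        if b[i] <= c[j]:               # ties from the left half first
            a[k] = b[i]
            i += 1
        else:
            a[k] = c[j]
            j += 1
        k += 1
    while i < len(b):                  # leftovers of the left half
        a[k] = b[i]
        i += 1
        k += 1
    while j < len(c):                  # leftovers of the right half
        a[k] = c[j]
        j += 1
        k += 1
    return a


print(merge_sort([34, 8, 64, 51, 32, 21]))
```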

Quick Sort (Section 7.7)
We choose a pivot element in the list and partition the array into two parts: one contains the elements smaller than the pivot, the other contains the elements larger than the pivot. We then recursively sort the two partitions and merge them. How do we merge? The partitions already sit side by side in the array, so we do not need any additional space for merging. Thus storage is O(n).

How can the approach become bad?
Want to sort 1, 2, 3, …, n. Choose 1 as pivot. The partitions are 1 and 2, 3, …, n. To get these partitions we spent how much? O(n-1).
Let the next pivot be 2. The partitions are 2 and 3, …, n. Spent how much? O(n-2).
Overall, how much do we spend on partitioning? O(n^2).

Moral of the story: we would like the partitions to be as equal as possible. This depends on the choice of the pivot. Equal partitions are assured if we can choose as our pivot the element which is in the middle of the sorted list, but it is not easy to find that. Instead, pivots are chosen randomly. In the worst case, the pivot choice can make one partition have n-1 elements and the other partition 1 element. On average this will not happen. Worst case complexity? O(n^2).

The pivot can be chosen as a random element in the list. Alternatively, take the first, the last and the center (middle position) elements and use the median of the three as the pivot. We will do the latter. We discuss the partitioning now.

Interchange the pivot with the last element. Have a pointer at the first element (P1) and one at the second last element (P2). Move P1 to the right, skipping elements which are less than the pivot, and stop P1 when we encounter an element greater than or equal to the pivot. Move P2 to the left, skipping elements which are greater than the pivot, and stop P2 when we encounter an element less than or equal to the pivot.

If P1 is still to the left of P2, interchange the elements pointed to by P1 and P2 and move P1 and P2 as before until they stop again. Once P1 is no longer to the left of P2, stop, and swap the element at P1 with the last element, which is the pivot.
Example: First = 8, Last = 0, Median = 6, so Pivot = 6.

[Figure: positions of P1 and P2 at successive steps of the partitioning, ending with Partition 1 and Partition 2]

At any time, can you say anything about the elements to the left of P1? Elements to the left of P1 are less than or equal to the pivot. And for the right of P2? Elements to the right of P2 are greater than or equal to the pivot. When P1 and P2 cross, what can you say about the elements in between P1 and P2? They are all equal to the pivot.

Suppose P1 and P2 have crossed and stopped, and the pivot has been interchanged with the element at P1. How do we form the partitions? Everything from P1 to its right is in one partition (greater); the remaining elements are in the left partition. The pivot itself, now at P1, is already in its final position, so it can be left out of the recursive calls. We can also do some local optimizations in length.
Complexity? O(n). Space? O(n).

Procedure Summary
Partition the array, then sort each partition recursively.
[Figure: the array split into Partition 1 and Partition 2, followed by Partition 1 sorted and Partition 2 sorted]
Need to do anything more? No, the merge is automatic.

Pseudocode
Quicksort(A, left, right) {
    if (left >= right) return;          /* zero or one element */
    Find pivot;
    Interchange pivot and A[right];
    P1 = left; P2 = right - 1;
    newP1 = Partition(A, P1, P2, pivot);
    Interchange A[newP1] and A[right];  /* pivot is now in its final position */
    Quicksort(A, left, newP1-1);
    Quicksort(A, newP1+1, right);
}

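Partition(A, P1, P2, pivot) {
    while (true) {
        while (A[P1] < pivot) increment P1;              /* stops at the pivot in the last position at the latest */
        while (P2 > P1 and A[P2] > pivot) decrement P2;
        if (P1 >= P2) break;                             /* pointers have met or crossed */
        Swap A[P1] and A[P2];
        increment P1; decrement P2;
    }
    newP1 = P1;
    return(newP1);
}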
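The procedure can be sketched in runnable Python as follows. This is one possible reading of the slides, not a definitive implementation: the helper name `median_of_three_index`, the base case, the bounds check on P2, and the exclusion of the pivot from both recursive calls are assumptions added so that the code is correct on its own.

```python
def median_of_three_index(a, left, right):
    """Index of the median of a[left], a[center], a[right] (hypothetical helper)."""
    center = (left + right) // 2
    candidates = sorted([(a[left], left), (a[center], center), (a[right], right)])
    return candidates[1][1]


def partition(a, p1, p2, pivot):
    """Partition around pivot, which the caller has stored just right of p2."""
    while True:
        while a[p1] < pivot:               # the stored pivot stops this scan
            p1 += 1
        while p2 > p1 and a[p2] > pivot:
            p2 -= 1
        if p1 >= p2:                       # pointers met or crossed
            return p1
        a[p1], a[p2] = a[p2], a[p1]
        p1 += 1
        p2 -= 1


def quicksort(a, left=0, right=None):
    """Sort a[left..right] (inclusive) in place and return a."""
    if right is None:
        right = len(a) - 1
    if left >= right:
        return a
    m = median_of_three_index(a, left, right)
    a[m], a[right] = a[right], a[m]        # park the pivot in the last slot
    pivot = a[right]
    new_p1 = partition(a, left, right - 1, pivot)
    a[new_p1], a[right] = a[right], a[new_p1]   # pivot to its final position
    quicksort(a, left, new_p1 - 1)
    quicksort(a, new_p1 + 1, right)
    return a


print(quicksort([8, 1, 4, 9, 6, 3, 5, 2, 7, 0]))
```

Stopping both scans on elements equal to the pivot costs a few extra swaps, but it keeps the two partitions balanced when the input contains many duplicates.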

Worst Case Analysis
T(n) = T(n1) + T(n2) + cn, T(1) = 1, n1 + n2 = n.
In the good case, n1 = n2 = n/2 always. Thus T(n) = 2T(n/2) + cn = O(n log n).

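In the bad case, n1 = 1 and n2 = n-1 always. Then T(n) = T(1) + T(n-1) + cn, which is at most pn + T(n-1) for a suitable constant p.
T(n) = pn + T(n-1) = pn + p(n-1) + T(n-2) = … = p(n + (n-1) + … + 1) = pn(n+1)/2.
Thus T(n) is O(n^2) in the worst case. Average case complexity is O(n log n).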
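Written out step by step, the bad-case recurrence unrolls as sketched below. The derivation assumes, as on the slide, that p is a constant large enough for pn to cover both the partitioning cost cn and the constant T(1) of the one-element partition.

```latex
\begin{align*}
T(n) &\le T(n-1) + pn \\
     &\le T(n-2) + p(n-1) + pn \\
     &\;\;\vdots \\
     &\le T(1) + p\sum_{i=2}^{n} i \\
     &= 1 + p\left(\frac{n(n+1)}{2} - 1\right) \\
     &= O(n^2)
\end{align*}
```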

Quicksort performs well for large inputs, but not so good for small inputs. When the divisions become small, we can use insertion sort to sort the small divisions instead.

General Lower Bound For Sorting (Section 7.9)
Suppose a list of size n must be sorted. How many orderings can we have for n members? Depending on the values of the n numbers, we can have n! possible orders, e.g., the sorted output for a, b, c can be (a,b,c), (b,a,c), (a,c,b), (c,a,b), (c,b,a) or (b,c,a).

Any comparison based sorting process can be represented as a binary decision tree. A node represents a comparison, and a branch represents the outcome of a comparison. The possible orders must be the leaves of the decision tree, i.e., depending on certain orders we will have certain comparison sequences, and we will reach a certain leaf.

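[Figure: decision tree for sorting a, b, c. The comparison nodes are labeled C1 to C6, one of them "Compare a and b", and the six leaves are the orderings a b c, b a c, a c b, b c a, c b a, c a b.]
Different sorting algorithms will have different comparison orders, but the leaves are the same for all sorting trees.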

Thus any sorting algorithm for sorting n inputs can be represented as a binary decision tree with n! leaves. Such a tree has depth at least log n!. This means that any comparison based sorting algorithm needs to perform at least log n! comparisons in the worst case.
log n! = log n + log (n-1) + … + log 2 + log 1
       ≥ log n + log (n-1) + … + log (n/2)
       ≥ (n/2) log (n/2)
       = (n/2) log n - (n/2)
       = Ω(n log n)
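The bound can be checked numerically with a short Python sketch. The function name `min_comparisons` is an assumption for illustration; it evaluates the ceiling of log2(n!), the smallest possible depth of a binary tree with n! leaves, next to the (n/2) log2(n/2) term from the derivation.

```python
import math


def min_comparisons(n):
    """ceil(log2(n!)): lower bound on worst-case comparisons for n items."""
    return math.ceil(math.log2(math.factorial(n)))


for n in (3, 4, 5, 8, 16, 32):
    lower = (n / 2) * math.log2(n / 2)   # the (n/2) log(n/2) term
    print(n, min_comparisons(n), round(lower, 1))
```

For n = 3 this gives 3 comparisons, which matches the depth of a decision tree with 6 leaves.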

Special Case Sorting
Now we will present some linear time sorting algorithms. These apply only when the input has a special structure, e.g., inputs are integers.
Counting sort
Radix sort
Bucket sort
Please follow class notes.

Counting Sort
Suppose we know that the list A to be sorted consists of n integers in the range 1 to M. We declare an array B with M positions. Initially all positions of B contain 0. Scan through list A: A[j] is inserted in B[A[j]]. B is then scanned once more, and the nonzero elements are read out.

What do we do with equal elements? B becomes an array of pointers. Each position in the array has 2 pointers, head and tail: head points to the beginning of a linked list and tail points to its end. A[j] is inserted at the end of the list B[A[j]]. Again, array B is traversed sequentially and each nonempty list is printed out.
[Example: M = 10; the list to sort and the output are missing from the transcript]


Complexity? O(n + M). If M is O(n), then the complexity is O(n).
Storage? O(n + M). Storage could be large for large ranges.
Suppose we have a list of elements such that every element has 2 fields, one an integer and the other something else. During sorting we just look at the integer field. Suppose element a occurs before element b in the input list, and a and b have the same integer component. Can their positions be reversed in the output if counting sort is used?
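The list-per-position version of counting sort can be sketched as below. Python lists stand in for the head/tail linked lists, and the function name `counting_sort`, its `key` parameter and the sample records are assumptions made for illustration.

```python
def counting_sort(items, m, key=lambda x: x):
    """Sort items whose integer keys lie in 1..m; equal keys keep input order."""
    buckets = [[] for _ in range(m + 1)]   # index 0 is unused; keys are 1..m
    for item in items:
        buckets[key(item)].append(item)    # append at the tail of the list
    out = []
    for bucket in buckets:                 # traverse B sequentially
        out.extend(bucket)
    return out


records = [(2, "a"), (1, "b"), (2, "c"), (1, "d")]
print(counting_sort(records, 2, key=lambda r: r[0]))
```

Because each element is appended at the tail of its list, two elements with the same integer field leave the sort in the order they entered it.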

Stability Property
The relative position of 2 equal elements remains the same in the output as in the input.
Is merge sort stable? Yes, provided ties are taken from the left list when merging.
Quick sort? No.
Insertion sort? Yes.
The stability property of counting sort will be used in designing a new sorting algorithm, radix sort.

Radix Sort
There are n integers. Every integer is represented by at most d digits (e.g., in decimal representation). Every digit has k possible values (k = 10 for decimal, k = 2 for binary).

We sort the integers by their least significant digit first, using counting sort. Next we sort them by the second least significant digit, and so on. Should it work?
[Example: a list sorted by least significant digit, then by second least significant digit, then by most significant digit; the numbers are missing from the transcript]

Note that if the most significant digit of one number a is less than that of another number b, then a comes before b, because the last pass sorts by the most significant digit. However, if the most significant digits are the same for a and b and the difference is in the second most significant digit (that of b is less than that of a), then b comes before a. Why? After the pass on the second most significant digit, b is already ahead of a, and the stability property of counting sort keeps that order in the final pass.

Complexity Analysis
There are d counting sorts. Each counting sort works on n numbers with digit values in the range 0 to k-1. The complexity of each counting sort is O(n + k), so overall it is O(d(n+k)).
Storage? O(n + k).
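A least-significant-digit radix sort along these lines can be sketched as follows. It assumes non-negative integers with at most d digits in base k; the function name `radix_sort` and the sample numbers are illustrative choices, not taken from the lecture. Each pass is a stable counting sort on one digit.

```python
def radix_sort(nums, d, k=10):
    """Sort non-negative integers with at most d base-k digits."""
    for pos in range(d):                   # least significant digit first
        buckets = [[] for _ in range(k)]   # one list per digit value 0..k-1
        for x in nums:
            digit = (x // k ** pos) % k
            buckets[digit].append(x)       # appending keeps each pass stable
        nums = [x for bucket in buckets for x in bucket]
    return nums


print(radix_sort([329, 457, 657, 839, 436, 720, 355], d=3))
```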

If just counting sort were used for these numbers, what is the worst case complexity? The values range over k^d possibilities, so it is O(k^d + n) for both storage and running time.

Bucket Sort
We have sorted integers in worst case linear time. Can we do anything similar for real numbers? Real numbers can be sorted in average case linear complexity under a certain assumption: that the real numbers are uniformly distributed. If we know that the real numbers belong to a certain range, then uniform distribution means any two equal size subranges contain roughly the same number of real numbers in the list. Say the numbers are between 0 and 10; then the number of them between 1 and 3 is roughly equal to the number between 7 and 9.

Bucket sort can be used to sort such real numbers in average case linear complexity. The idea is similar to direct sequence hashing and counting sort. We have a list of n real numbers and an array of n pointers. Each position has a pointer pointing to a linked list. We have a function which generates an integer from a real number, and we add the real number to the linked list at that position.

The function is chosen such that the elements at position j of the array are less than those at position k if j < k. The function must also be such that the number of elements in each linked list is roughly the same. Thus we just sort the linked list at every position. Next we start from position 0 and print out its sorted linked list, go to position 1 and print out its sorted linked list (assuming it is not empty), and so on. The output is a sorted list.

So if there are n real numbers and n positions in the array, then each linked list has roughly a constant number c of elements. Thus the sorting complexity for a linked list is a constant. What is the printing complexity? Every element is printed once, so it is O(n). Thus the overall complexity is O(n).

How do we find such a function? We can if the numbers are uniformly distributed. Suppose we know the numbers are in an interval of size M. We divide the interval M into n equal size subintervals, and name them 0, 1,….n-1. Any real number in subinterval j is added to the list at position j in the array. Given any real number, the function actually finds the number of the subinterval where it belongs.

First, why should such a function put a roughly equal number of elements in each list? Because subranges of equal size contain roughly equal numbers of elements under the uniform distribution assumption. Second, the evaluation time for the function must be a constant; that is, the function must find the interval number in constant time. We will give an example function which applies to a specific case, but such functions can be designed for more general cases as well.

Suppose the real numbers are in the interval [0, 1]. The function ⌊nx⌋ gets the correct interval number for any real number x in the interval [0, 1], where n is the number of subintervals we consider (the endpoint x = 1 would give n, so it is placed in the last subinterval, n-1).
What is ⌊nx⌋ for any number x in the interval [0, 1/n)? 0.
For any number x in the interval [1/n, 2/n)? 1.
For any number x in the interval [2/n, 3/n)? 2.
And so on.
Evaluation complexity for the function? O(1).

Thus we have divided the interval [0, 1] into n equal sized subintervals, and we have a function which returns the subinterval number of a real number in constant time.

Procedure Summary
We have a list of n real numbers. Given any number x, we add it to the list at A[⌊nx⌋]. We sort all of the individual linked lists. We output the sorted linked list at A[0], then at A[1], and so on. The output is sorted.
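The summary above can be sketched in Python as follows. It assumes inputs in [0, 1]; Python lists stand in for the linked lists, each list is sorted with the built-in `list.sort` rather than a hand-written routine, and the function name `bucket_sort` and the clamp for x = 1 are assumptions added for the example.

```python
import random


def bucket_sort(xs):
    """Sort real numbers from [0, 1] using n buckets for n inputs."""
    n = len(xs)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    for x in xs:
        j = min(int(n * x), n - 1)   # floor(n*x); x == 1.0 goes to the last bucket
        buckets[j].append(x)
    out = []
    for bucket in buckets:           # positions 0, 1, ..., n-1 in order
        bucket.sort()                # each bucket is expected to be small
        out.extend(bucket)
    return out


sample = [random.random() for _ in range(10)]
print(bucket_sort(sample))
```

The linear average-case bound depends on the uniform distribution assumption; if most inputs fall into a few subintervals, the cost is dominated by sorting those large buckets.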

On average, every linked list contains a constant number of elements: a linked list contains the elements from one subinterval, all subintervals are equal sized, and hence they hold roughly equal numbers of elements because of the uniform distribution. This happens if we choose n, the number of subintervals, roughly equal to the number of inputs. Thus the sorting complexity is constant for each linked list, and the overall sorting and printing complexity is O(n).