TDDB56 DALGOPT-D DALG-C Lecture 8 – Sorting (part I) Jan Maluszynski - HT
Sorting:
– Intro: aspects of sorting, different strategies
– Insertion Sort, Selection Sort
– Quick Sort
– Heap Sort
– Merge Sort
– Theoretical lower bound for comparison-based sorting
– Digital Sorting: BucketSort, RadixSort
The Sorting Problem
Input: A list L of data items with keys
Output: A list L’ of the same data items in increasing order of keys
Caution! Don’t overuse sorting! Do you really need the data sorted, or will a dictionary do fine instead of a sorted array?
Aspects of Sorting:
– Internal / External
– In-place / With additional memory
– Stable / Unstable
– Comparison-based / Digital
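The stable / unstable distinction can be made concrete with a small sketch. The records and the helper name selection_sort_by_key below are hypothetical and chosen only for illustration. Python's built-in sorted is documented to be stable, while a straight selection sort with long-distance swaps is not.

```python
# Records are (key, label) pairs; only the key takes part in comparisons.
records = [(2, "a"), (2, "b"), (1, "c")]


def selection_sort_by_key(items):
    """Straight selection sort on the key; its long-distance swap can reorder equal keys."""
    a = list(items)
    for i in range(len(a) - 1):
        s = i
        for j in range(i + 1, len(a)):
            if a[j][0] < a[s][0]:
                s = j
        a[i], a[s] = a[s], a[i]
    return a


# Stable: the two records with key 2 keep their original relative order (a before b).
stable = sorted(records, key=lambda rec: rec[0])

# Unstable: swapping (2, "a") with (1, "c") moves "a" behind "b".
unstable = selection_sort_by_key(records)
```

Here stable is [(1, "c"), (2, "a"), (2, "b")], whereas unstable is [(1, "c"), (2, "b"), (2, "a")].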
Strategies
Insertion sorts: For each new element to add to the sorted set, look for the right place in that set to put the element.
    Linear insertion, Shell sort, ...
Selection sorts: In each iteration, search the unsorted set for the smallest (largest) remaining item to add to the end of the sorted set.
    Straight selection, Heap sort, ...
Exchange sorts: Browse back and forth in some pattern, and whenever we are looking at a pair with the wrong relative order, swap them.
    Quick sort, Merge sort, ...
(Linear) insertion sort
"In each iteration, insert the first item from the unsorted part into its proper place in the sorted part"
In-place!
Data stored in A[0..n-1]
For i = 1 to n-1:
– Sorted data in A[0..i-1]
– Unsorted data in A[i..n-1]
– Scan the sorted part for the index s at which to insert the selected item A[i]
– Increase i
Analysis of Insertion Sort
t1: n-1 passes
t2: n-1 passes
t3: I passes, where I = worst-case no. of iterations of the inner loop: I = 1+2+…+(n-1) = n(n-1)/2
t4: I passes
t5: n-1 passes
T = t1+t2+t3+t4+t5 = 3(n-1) + 2·n(n-1)/2 = 3n-3 + n²-n = n²+2n-3
Thus O(n²) in the worst case, but good if the file is almost sorted.
Procedure InsertionSort (table A[0..n-1]):
1 for i from 1 to n-1 do
2     j ← i; x ← A[i]
3     while j ≥ 1 and A[j-1] > x do
4         A[j] ← A[j-1]; j ← j-1
5     A[j] ← x
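The InsertionSort procedure above might be rendered in Python roughly as follows. This is a minimal sketch: the shift counter is an addition for illustration only, so that the worst-case count I = n(n-1)/2 can be observed on a reversed input.

```python
def insertion_sort(a):
    """Sort the list a in place; return the number of inner-loop iterations (shifts)."""
    shifts = 0  # illustrative counter, not part of the slide's procedure
    for i in range(1, len(a)):
        j, x = i, a[i]
        # Shift larger items of the sorted part A[0..i-1] one step to the right.
        while j >= 1 and a[j - 1] > x:
            a[j] = a[j - 1]
            j -= 1
            shifts += 1
        a[j] = x
    return shifts


worst = [5, 4, 3, 2, 1]          # reverse order: the worst case
worst_shifts = insertion_sort(worst)

best = [1, 2, 3, 4, 5]           # already sorted: the inner loop never runs
best_shifts = insertion_sort(best)
```

For n = 5 the reversed input needs 5·4/2 = 10 shifts, while the sorted input needs none.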
(Straight) selection sort
"In each iteration, search the unsorted set for the smallest remaining item to add to the end of the sorted set"
In-place!
Data stored in A[0..n-1]
For i = 0 to n-2:
– Sorted data in A[0..i-1]
– Unsorted data in A[i..n-1]
– Find the index s of the smallest key in the unsorted part
– Swap A[i] and A[s]
Analysis of selection sort
t1: n-1 passes
t2: n-1 passes
t3: I passes, where I = no. of iterations of the inner loop: I = (n-1)+(n-2)+…+1 = n(n-1)/2
t4: I passes
t5: n-1 passes
T = t1+t2+t3+t4+t5 = 3(n-1) + 2·n(n-1)/2 = 3n-3 + n²-n = n²+2n-3
...thus O(n²) for every input... rather bad!
Procedure SelectionSort (table A[0..n-1]):
1 for i from 0 to n-2 do
2     s ← i
3     for j from i+1 to n-1 do
4         if A[j] < A[s] then s ← j
5     A[i] ↔ A[s]
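A possible Python version of the SelectionSort procedure is sketched below. The comparison counter is added only for illustration; it shows that the inner loop runs n(n-1)/2 times no matter how the input is ordered.

```python
def selection_sort(a):
    """Sort the list a in place; return the number of key comparisons."""
    comparisons = 0  # illustrative counter, not part of the slide's procedure
    n = len(a)
    for i in range(n - 1):
        s = i  # index of the smallest key seen so far in A[i..n-1]
        for j in range(i + 1, n):
            comparisons += 1
            if a[j] < a[s]:
                s = j
        a[i], a[s] = a[s], a[i]
    return comparisons


already_sorted = [1, 2, 3, 4, 5]
reversed_input = [5, 4, 3, 2, 1]
c_sorted = selection_sort(already_sorted)
c_reversed = selection_sort(reversed_input)
```

Both calls perform 5·4/2 = 10 comparisons, which is why selection sort gains nothing from almost sorted data.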
Divide-and-conquer principle
1. Divide a problem into smaller, independent sub-problems
2. Conquer: solve the sub-problems recursively (or directly if trivial)
3. Combine the solutions of the sub-problems
Quick Sort Example – basic idea
Procedure QuickSort (table A[l : r]):
1. If l ≥ r, return.
2. Select some element of A, e.g. A[l], as the so-called pivot element: p ← A[l]
3. Partition A in-place into two disjoint sub-arrays A_L, A_R: m ← partition(A[l : r], p)
   { determines m, l ≤ m < r, and reorders A[l : r] such that all elements in A[l : m] are now ≤ p and all in A[m+1 : r] are now ≥ p }
4. Apply the algorithm recursively to A_L and A_R:
   quicksort(A[l : m])    { sorts A_L }
   quicksort(A[m+1 : r])  { sorts A_R }
In-place Partition
int partition (array A[l : r], key p)
{ the pivot element p is A[l] }
    i ← l-1; j ← r+1
    while (true) do
        do i ← i+1 while A[i] < p
        do j ← j-1 while A[j] > p
        if (i < j) A[i] ↔ A[j]
        else return j
This code scans through the entire set once and moves each element at most once!
...thus: running time of partition: Θ(r – l + 1)
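The partition scheme and the recursive procedure might be combined in Python as in the following sketch. It assumes the leftmost element as pivot and recursion on A[l : m] and A[m+1 : r]; the default arguments of quicksort are a convenience added here.

```python
def partition(a, l, r):
    """Partition a[l..r] around the pivot p = a[l]; return m with l <= m < r."""
    p = a[l]
    i, j = l - 1, r + 1
    while True:
        # Python has no do-while: advance once, then keep advancing while the test holds.
        i += 1
        while a[i] < p:
            i += 1
        j -= 1
        while a[j] > p:
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j


def quicksort(a, l=0, r=None):
    """Sort a[l..r] in place; with no bounds given, sort the whole list."""
    if r is None:
        r = len(a) - 1
    if l >= r:
        return
    m = partition(a, l, r)
    quicksort(a, l, m)        # sorts A_L, all elements <= pivot
    quicksort(a, m + 1, r)    # sorts A_R, all elements >= pivot


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
quicksort(data)
```

Because partition always returns m < r, both recursive calls work on strictly smaller ranges, so the recursion terminates even when many keys equal the pivot.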
Warning – details matter!
Book: rightmost element as pivot, swaps it in at the end, recurses on either side excluding the old pivot.
Slides: leftmost element as pivot, includes it in the area to partition, returns one position containing an element of size equal to the pivot, and recurses on both halves including the pivot.
...and it matters how the i's and j's are compared (< or ≤), whether they are incremented (decremented) before or after the comparison, etc.
Randomized Quicksort
Randomization: an algorithmic design principle, applicable when choosing among several alternative directions, to avoid long sequences of bad decisions with high probability, independently of the input.
Select the pivot randomly: p ← A[random(l, r)]
One cannot construct bad input data...
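Random pivot selection could be sketched in Python as below. One assumption is made here that the slide does not state: the randomly chosen element is first swapped into position l, so that a partition routine expecting the pivot at A[l] can be reused unchanged. The name randomized_quicksort is hypothetical.

```python
import random


def partition(a, l, r):
    """Partition a[l..r] around the pivot p = a[l]; return m with l <= m < r."""
    p = a[l]
    i, j = l - 1, r + 1
    while True:
        i += 1
        while a[i] < p:
            i += 1
        j -= 1
        while a[j] > p:
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j


def randomized_quicksort(a, l=0, r=None):
    """Quicksort with a uniformly random pivot from a[l..r]."""
    if r is None:
        r = len(a) - 1
    if l >= r:
        return
    k = random.randint(l, r)   # both bounds are inclusive
    a[l], a[k] = a[k], a[l]    # assumption: move the pivot to the front
    m = partition(a, l, r)
    randomized_quicksort(a, l, m)
    randomized_quicksort(a, m + 1, r)


# An already sorted input is a bad case for a fixed leftmost pivot,
# but not for a random one.
data = list(range(200))
randomized_quicksort(data)
```

The result is the same sorted list on every run; only the sequence of pivots, and hence the running time, varies.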
Quick sort – fine tuning
When only a few elements remain (e.g., |A| < 4):
– Overhead for recursion becomes significant
– Entire A is almost sorted (except for small, locally unsorted sections)
Stop sorting by QuickSort and perform one global sort using Linear InsertionSort: although O(n²) in the worst case, it is much better on almost sorted data, which is the case now!
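The fine-tuning idea might look as follows in Python. This is a sketch under assumptions: the names CUTOFF, coarse_quicksort and hybrid_sort are hypothetical, and the cutoff value 4 is taken from the example |A| < 4 above rather than from any measurement.

```python
CUTOFF = 4  # sub-arrays with fewer elements than this are left for insertion sort


def partition(a, l, r):
    """Partition a[l..r] around the pivot p = a[l]; return m with l <= m < r."""
    p = a[l]
    i, j = l - 1, r + 1
    while True:
        i += 1
        while a[i] < p:
            i += 1
        j -= 1
        while a[j] > p:
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j


def coarse_quicksort(a, l, r):
    """Quicksort that stops recursing once a sub-array is smaller than CUTOFF."""
    if r - l + 1 < CUTOFF:
        return
    m = partition(a, l, r)
    coarse_quicksort(a, l, m)
    coarse_quicksort(a, m + 1, r)


def insertion_sort(a):
    """Linear insertion sort, fast when every item is close to its final place."""
    for i in range(1, len(a)):
        j, x = i, a[i]
        while j >= 1 and a[j - 1] > x:
            a[j] = a[j - 1]
            j -= 1
        a[j] = x


def hybrid_sort(a):
    """Coarse quicksort followed by one global insertion sort pass."""
    coarse_quicksort(a, 0, len(a) - 1)
    insertion_sort(a)


data = [9, 3, 7, 1, 8, 2, 6, 0, 5, 4, 3, 9]
hybrid_sort(data)
```

After coarse_quicksort, every item already sits inside a block of fewer than CUTOFF positions that contains its final place, so the inner loop of insertion_sort moves each item only a few steps.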
Straight Insertion – the good case?
If the table is almost sorted? E.g., at most 3 items are unsorted at a time, and all remaining items are bigger?
t1: n-1 passes over this "constant speed" code
t2: n-1 passes
t3, t4, t5: I passes each, where I = no. of iterations of the inner loop (max 3 elements "totally unsorted"): I = 3(n-1) in the worst case, all three always in reverse order
t6: n-1 passes
T = t1+t2+…+t6 = 3(n-1) + 3·3(n-1) = 12(n-1) = 12n-12
...thus we have an algorithm in O(n)... rather good!
Procedure InsertionSort (table A[0..n-1]):
1 for i from 1 to n-1 do
2     j ← i; tmp ← A[i]
3     while j > 0 and tmp < A[j-1] do
4         j ← j-1
5         A[j+1] ← A[j]
6     A[j] ← tmp
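The linear behaviour on almost sorted data can be checked with a small experiment. The sketch below follows the six-line procedure above; the iteration counter and the helper blocks_reversed, which builds ascending blocks of 3 items that are each internally reversed, are hypothetical additions for illustration.

```python
def insertion_sort_count(a):
    """Sort a in place; return the total number of inner-loop iterations I."""
    iterations = 0
    for i in range(1, len(a)):
        j, tmp = i, a[i]
        while j > 0 and tmp < a[j - 1]:
            j -= 1
            a[j + 1] = a[j]   # shift the larger item one step to the right
            iterations += 1
        a[j] = tmp
    return iterations


def blocks_reversed(n_blocks, size=3):
    """Build e.g. [2, 1, 0, 5, 4, 3, ...]: blocks ascend, each block is reversed."""
    out = []
    for b in range(n_blocks):
        out.extend(range(b * size + size - 1, b * size - 1, -1))
    return out


data = blocks_reversed(3)            # [2, 1, 0, 5, 4, 3, 8, 7, 6]
n = len(data)
count = insertion_sort_count(data)   # 0 + 1 + 2 iterations per block
```

For this input the inner loop runs 9 times in total, well below the bound I = 3(n-1) = 24 and far below the general worst case n(n-1)/2 = 36.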