SORTING 2014-T2 Lecture 13 School of Engineering and Computer Science, Victoria University of Wellington COMP 103 Marcus Frean
RECAP ArraySet algorithms and costs Binary Search is O(log n) TODAY Two algorithms for sorting a list: Selection Sort Insertion Sort Announcements: Test is getting closer – keep up to date with your work so it’s not stressful next week. 2 RECAP-TODAY
Why Sort ? we can find items in an array much faster if they are sorted finding1-in-a-billion takes as long as the ArraySet takes to find1-in-30 But... constructing a sorted array one item at a time is very slow : O(n 2 ) for 10 million items you need 50,000,000,000,000 steps at 10nS per step ⇒ 500,000 seconds = 58 days. There are sorting algorithms that are much faster if you can sort whole array at once: O(n log (n) ) for 10 million items you need 250,000,000 steps ⇒ 2.5 seconds 3
Ways of sorting: Selecting sorts: Find the next largest/smallest item and put in place Builds the correct list in order Inserting sorts: For each item, insert it into an ordered sublist Builds a sorted list, but keeps changing it Compare-and-Swap sorts: Find two items that are out of order, and swap them Keeps “improving” the list 4
Ways to rate sorting algorithms Efficiency What is the (worst-case) order of the algorithm ? Is the average case much faster than worst-case ? Requirements on Data Does the algorithm need random-access data? Does it need anything more than “compare” and “swap” ? Space Usage Can the algorithm sort in-place, or does it need extra space ? Stability Is the algorithm “stable” (will it ever reverse the order of equivalent items?) Performance on Nearly Sorted Data Is the algorithm faster when given sorted (or nearly sorted) data ? (All items close to where they should be, or only a few out of order.) 5
Selecting Sorts Selection Sort (slow) HeapSort (fast) search for minimum here 6
Insertion Sort (slow) Shell Sort (pretty fast) Merge Sort (fast) (Divide and Conquer) Inserting Sorts
Compare and Swap Sorts Bubble Sort (easy but terrible) QuickSort (the fastest) (Divide and Conquer) things bubble up quickly, but bubble down slowly things bubble up quickly, but bubble down slowly and so on... 8
Other Sorts Radix Sort (only works on some data) Permutation Sort(very slow) Random Sort(Generate and Test)
Coding it: some design decisions... We could sort Lists ⇒ general and flexible but efficiency depends on how the List is implemented or just sort Arrays ⇒ less general efficiency is well defined NB: method toArray() converts any Collection to an array We could require items to be Comparable i.e. any item can call compareTo(otherObj)...on another ("natural ordering") OR provide a Comparator i.e. the sorter can call compare(obj1, obj2)...on any two items. We will sort an Array, using a Comparator: public void …Sort(E[] data, int size, Comparator comp) number of items 10 comparator array
Selection Sort public void selectionSort(E[ ] data, int size, Comparator comp){ // for each position, from 0 up, find the next smallest item // and swap it into place for (int i=0; i<size-1; i++){ int minIndex = i; for (int j=i+1; j<size; j++) if (comp.compare(data[j], data[minIndex]) < 0) minIndex=j; swap(data, i, minIndex); }
Efficiency of Selection Sort Cost: step ? best case: what is it ? cost: worst case: what is it ? cost: average case: what is it ? cost: 12
Selection Sort Analysis Efficiency worst-case: O(n 2 ) average case exactly the same. Requirements on Data Needs random-access data, but easy to modify for files Needs compare and swap Space Usage in-place Stability ?? Performance on Nearly Sorted Data No faster when given nearly sorted data 13
Insertion Sort public void insertionSort(E[] data, int size, Comparator comp){ // for each item, from 0, insert into place in the sorted region (0..i-1) for (int i=1; i<size; i++){ E item = data[i]; int place = i; while (place > 0 && comp.compare(item, data[place-1]) < 0){ data[place] = data[place-1]; place--; } data[place]= item; }
Efficiency of Insertion Sort first: what is the step ? best case: what is it ? cost: worst case: what is it ? cost: average case: what is it ? cost: almost sorted cost: 15
Insertion Sort Analysis Efficiency worst-case: O(n 2 ) average case: half the cost of worst case. Requirements on Data Needs random-access data Needs compare and swap Space Usage in-place Stability ?? Performance on Nearly Sorted Data Very fast (O(n)) when given nearly sorted data 16
InsertionSort, SelectionSort and BubbleSort are all slow (apart from Insertion on nearly-sorted lists) Problem: Selection compares every pair of items, ie. it ignores the results of previous comparisons. Insertion (and Bubble) only compare adjacent items only move items one step at a time Doing better is possible, but we will need recursion... Why so slow? Can we do better? 17