ISOM MIS 215 Module 7 – Sorting
Where are we?
[Figure: MIS215 course roadmap]
- Skill levels: Newbie, Programmers, Developers, Professionals, Designers
- Introduction: Intro to Java, Course
- Java lang. basics, Arrays
- Search Techniques: Binary Search
- Basic Algorithms: Sorting Techniques, Bubblesort
- Fast Sorting algos (quicksort, mergesort)
- List Structures: Linked Lists, Stacks, Queues
- Advanced structures: Hashtables, Graphs, Trees
Today’s buzzwords
- Bubblesort: The sorting method in which the big elements “bubble” up to the top
- Selection sort: The sorting method where you “select” the smallest item to be placed in the appropriate place
- Insertion sort: The sorting method where an item from the unsorted part of the array is “inserted” in the sorted part
- Divide-and-conquer techniques: Techniques where the problem space is divided, solved separately, and merged. Typically results in more efficient solutions
- Shellsort: An advanced version of insertion sort where elements are inserted into their appropriate intervals
- Mergesort: The divide-and-conquer sorting technique that divides the array into two, sorts them separately and merges them
- Partitioning: The process of dividing an array into two parts based on the value of a “pivot” element
- Quicksort: The sorting technique where the array is partitioned and sorted separately
Basic Sorting Techniques
- Bubble Sort
- Selection Sort
- Insertion Sort
- Sorting Objects
- Comparing the simple sorts
Bubble sort – what is it?
- The simplest sorting strategy: at most 5-6 lines of code
- The large elements “bubble up” to the end of the list
Bubble sort – basic strategy
- Start at the left end
- Compare two adjacent items; put the bigger one in the right-hand position
  - Any reason why not the smaller one?
  - What to do if they’re equal?
- Move to the right; repeat the comparison till you get to the (right) end
- What have you got? The biggest one at the right end
- Now start at the left end again, but move how far? Only to the right end minus 1
- Repeat...
- Why is this called a “bubble” sort?
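The strategy above can be sketched in Java roughly as follows. This is a minimal illustration, not the course's reference code: the class name `BubbleSortSketch` and the sample data are assumptions made for the example, and it uses a strict `>` comparison so that equal neighbours are left alone.

```java
import java.util.Arrays;

class BubbleSortSketch {
    // Sorts the array into ascending order, in place.
    static void bubbleSort(int[] a) {
        // After each pass the largest remaining item sits at index 'end',
        // so the next pass stops one position earlier.
        for (int end = a.length - 1; end > 0; end--) {
            for (int i = 0; i < end; i++) {
                // Strict '>' means equal neighbours are never swapped.
                if (a[i] > a[i + 1]) {
                    int tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] data = {26, 33, 35, 29, 19, 12, 22};
        bubbleSort(data);
        System.out.println(Arrays.toString(data)); // [12, 19, 22, 26, 29, 33, 35]
    }
}
```

Bubbling the smaller item to the left instead would work equally well; the choice only decides which end of the array becomes sorted first.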
Bubble sort - analysis
- How many comparisons?
- How many swaps?
- What’s the best case?
- What’s the worst case?
Selection sort
- Start at the left end
- Keep track of the smallest; when you find a new smaller one, swap it to the left end
  - Any reason why not the biggest?
  - What to do if equal?
- Move to the right, repeating...
- What have you got? The smallest at the left end
- Now start at the left end plus 1, repeating…
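A minimal Java sketch of these steps, under one stated assumption: instead of swapping every time a new smaller item is found, this version only remembers the index of the smallest item and performs a single swap at the end of each pass (the usual textbook formulation). The class name `SelectionSortSketch` is hypothetical.

```java
class SelectionSortSketch {
    // Sorts the array into ascending order, in place.
    static void selectionSort(int[] a) {
        for (int left = 0; left < a.length - 1; left++) {
            // Track the index of the smallest item seen in a[left..end].
            int smallest = left;
            for (int i = left + 1; i < a.length; i++) {
                // Strict '<' keeps the first of several equal items.
                if (a[i] < a[smallest]) {
                    smallest = i;
                }
            }
            // One swap per pass moves the smallest item to the left end.
            int tmp = a[left];
            a[left] = a[smallest];
            a[smallest] = tmp;
        }
    }
}
```

Because there is at most one swap per pass, this variant does far fewer copies than bubble sort, even though the number of comparisons is the same.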
Selection sort - analysis
- How many comparisons?
- How many copies?
- Best case?
- Worst case?
Insertion sort
- Start at the left end plus one
- Put this item in its proper place among the ones on its left
- Now move one position to the right, put this item in its proper place among the ones on its left, and repeat...
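One way to express these steps in Java is sketched below. The class name `InsertionSortSketch` and the variable names `out`, `hole`, and `item` are assumptions chosen for the illustration; larger items on the left are shifted right by one position to open a hole for the item being inserted.

```java
class InsertionSortSketch {
    // Sorts the array into ascending order, in place.
    static void insertionSort(int[] a) {
        // Everything to the left of 'out' is already sorted.
        for (int out = 1; out < a.length; out++) {
            int item = a[out];   // the item to insert
            int hole = out;      // where the item will eventually go
            // Shift larger items one place right until the proper place is found.
            while (hole > 0 && a[hole - 1] > item) {
                a[hole] = a[hole - 1];
                hole--;
            }
            a[hole] = item;
        }
    }
}
```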
Insertion sort - analysis
- How many comparisons?
- How many swaps?
- Best case?
- Worst case?
Efficiencies
- For random data (which means what?), all 3 run in O(N^2)
  - So if it took 20 sec to sort 10,000 items, how long will it take to sort 20,000?
- Remember this ignores the constants...
  - So, under the right conditions, one runs faster than another
  - For smaller N, selection sort may be faster than bubble sort
  - For data already nearly in order, insertion sort runs in (nearly) O(N)
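Under an O(N^2) model, doubling N multiplies the work by about four, so the 20-second sort of 10,000 items would be expected to take roughly 80 seconds for 20,000 items (ignoring constants and lower-order terms). The claim about nearly sorted data can be checked by counting comparisons. The sketch below is an illustrative experiment, not a benchmark; the class name `ComparisonCountSketch` and the input size are assumptions.

```java
class ComparisonCountSketch {
    // Insertion sort that returns how many key comparisons it performed.
    static long countComparisons(int[] a) {
        long comparisons = 0;
        for (int out = 1; out < a.length; out++) {
            int item = a[out];
            int hole = out;
            while (hole > 0) {
                comparisons++;                 // one comparison of item with a[hole - 1]
                if (a[hole - 1] <= item) {
                    break;                     // proper place found
                }
                a[hole] = a[hole - 1];         // shift the larger item right
                hole--;
            }
            a[hole] = item;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] sorted = new int[n];
        int[] reversed = new int[n];
        for (int i = 0; i < n; i++) {
            sorted[i] = i;
            reversed[i] = n - i;
        }
        System.out.println(countComparisons(sorted));   // 999    = n - 1
        System.out.println(countComparisons(reversed)); // 499500 = n(n - 1)/2
    }
}
```

Already sorted input costs one comparison per item, which is the O(N) behaviour; reversed input costs the full N(N - 1)/2.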
Divide-and-Conquer techniques
- Take a problem space, divide it into two, solve each part, merge the solutions
- Why is this better?
  - Often you can disregard one of the partitions!
  - You can only divide in two lg n times
  - So, if you discard one half each time and merging takes O(1), we get total O(lg n)!
  - And if you solve both halves and merging takes O(n) at each level, we get total O(n lg n)!
- O(lg n) is way better than O(n)
- O(n lg n) is way better than O(n^2)
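Binary search is a familiar instance of the "disregard one of the partitions" case: each step halves the range and there is nothing to merge, giving O(lg n). A minimal sketch, assuming an ascending `int` array; the class name `BinarySearchSketch` is hypothetical.

```java
class BinarySearchSketch {
    // Returns an index of 'key' in the ascending array, or -1 if it is absent.
    static int binarySearch(int[] sorted, int key) {
        int low = 0;
        int high = sorted.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;   // avoids int overflow of (low + high)
            if (sorted[mid] == key) {
                return mid;
            }
            if (sorted[mid] < key) {
                low = mid + 1;                  // disregard the left half
            } else {
                high = mid - 1;                 // disregard the right half
            }
        }
        return -1;
    }
}
```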
Mergesort – a divide-and-conquer sorting strategy
- Divide the array into two (almost) equal halves
- Sort each half separately
  - Can call mergesort recursively!
- Merge the two halves
  - O(n) time complexity if we use O(n) extra space
  - O(n) complexity even if we do not use extra space – how?
- Total runtime is O(n lg n) in the best, average, and worst cases
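The sketch below shows the O(n)-extra-space variant: one scratch array is allocated up front and reused by every merge. It is an illustrative version rather than the course's implementation; the class and helper names (`MergeSortSketch`, `sort`, `merge`) are assumptions.

```java
class MergeSortSketch {
    // Sorts the array into ascending order using O(n) extra space.
    static void mergeSort(int[] a) {
        if (a.length < 2) {
            return;
        }
        int[] scratch = new int[a.length];   // shared by all merges
        sort(a, scratch, 0, a.length - 1);
    }

    private static void sort(int[] a, int[] scratch, int low, int high) {
        if (low >= high) {
            return;                          // zero or one item: already sorted
        }
        int mid = low + (high - low) / 2;
        sort(a, scratch, low, mid);          // sort the left half
        sort(a, scratch, mid + 1, high);     // sort the right half
        merge(a, scratch, low, mid, high);   // merge the two sorted halves
    }

    // Merges the sorted runs a[low..mid] and a[mid+1..high] in O(n) time.
    private static void merge(int[] a, int[] scratch, int low, int mid, int high) {
        int left = low;
        int right = mid + 1;
        int out = low;
        while (left <= mid && right <= high) {
            // '<=' takes from the left run on ties, which keeps the sort stable.
            scratch[out++] = (a[left] <= a[right]) ? a[left++] : a[right++];
        }
        while (left <= mid) {
            scratch[out++] = a[left++];
        }
        while (right <= high) {
            scratch[out++] = a[right++];
        }
        System.arraycopy(scratch, low, a, low, high - low + 1);
    }
}
```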
Recursion Tree for Mergesort: An Example of 7 Numbers
[Figure: recursion tree, running from Start to Finish]
Divide and Conquer Sorting: Quicksort
In quicksort, we first choose some key from the list for which, we hope, about half the keys will come before and half after. Call this key the pivot. Then we partition the items so that all those with keys less than the pivot come in one sublist, and all those with greater keys come in another. Then we sort the two reduced lists separately, put the sublists together, and the whole list will be in order.
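The description above can be sketched almost literally with lists. Two assumptions are made here that the text does not state: the pivot is simply the first key, and keys equal to the pivot go into the "greater" sublist. The class name `ListQuicksortSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ListQuicksortSketch {
    // Returns a new list holding the keys in ascending order.
    static List<Integer> quicksort(List<Integer> keys) {
        if (keys.size() <= 1) {
            return new ArrayList<>(keys);      // nothing to sort
        }
        int pivot = keys.get(0);               // assumption: first key is the pivot
        List<Integer> smaller = new ArrayList<>();
        List<Integer> larger = new ArrayList<>();
        // Partition the remaining keys around the pivot.
        for (int key : keys.subList(1, keys.size())) {
            if (key < pivot) {
                smaller.add(key);
            } else {
                larger.add(key);
            }
        }
        // Sort the two sublists separately, then put them together.
        List<Integer> result = quicksort(smaller);
        result.add(pivot);
        result.addAll(quicksort(larger));
        return result;
    }
}
```

For the keys (26, 33, 35, 29, 19, 12, 22), the first partition yields (19, 12, 22) and (33, 35, 29) with pivot 26.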
Trace of Quicksort: Example of the Same 7 Numbers
Sort (26, 33, 35, 29, 19, 12, 22)
  Partition into (19, 12, 22) and (33, 35, 29); pivot = 26
  Sort (19, 12, 22)
    Partition into (12) and (22); pivot = 19
    Sort (12)
    Sort (22)
    Combine into (12, 19, 22)
  Sort (33, 35, 29)
    Partition into (29) and (35); pivot = 33
    Sort (29)
    Sort (35)
    Combine into (29, 33, 35)
  Combine into (12, 19, 22, 26, 29, 33, 35)
Recursion Tree for Quicksort: Example of the Same 7 Numbers
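Quicksort: Contiguous Implementation
- Goal (Postcondition): | < p | p | >= p |, with low at the left end, p at the pivot location, and high at the right end
- Loop Invariant: | p | < p | >= p | ? |, with p at low, lastsmall at the last entry < p, and i at the first unexamined entry
Quicksort Implementation: (Contd.)
- Restore the Invariant: when the entry at i is < p, swap it with the first entry >= p (just past lastsmall) and advance lastsmall
- Final Position: swap p with the entry at lastsmall, giving | < p | p | >= p | with p at lastsmall, between low and high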
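A contiguous partition that maintains this invariant can be sketched as follows. It assumes the pivot is already stored at `low`; a different pivot choice would first be swapped into that position. The names `low`, `high`, `lastsmall`, and `i` follow the slide labels, while the class and method names are assumptions.

```java
class ContiguousQuicksortSketch {
    // Sorts the array into ascending order, in place.
    static void quicksort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void sort(int[] a, int low, int high) {
        if (low >= high) {
            return;
        }
        int pivotLocation = partition(a, low, high);
        sort(a, low, pivotLocation - 1);
        sort(a, pivotLocation + 1, high);
    }

    // Partitions a[low..high] around p = a[low] and returns the pivot location.
    // Invariant: a[low+1..lastsmall] < p and a[lastsmall+1..i-1] >= p.
    static int partition(int[] a, int low, int high) {
        int pivot = a[low];
        int lastsmall = low;
        for (int i = low + 1; i <= high; i++) {
            if (a[i] < pivot) {
                // Restore the invariant: grow the "< p" region by one entry.
                lastsmall++;
                swap(a, lastsmall, i);
            }
        }
        // Final position: the pivot goes between the two regions.
        swap(a, low, lastsmall);
        return lastsmall;
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

Unlike a list-based partition, the in-place swaps do not preserve the original order inside each region, so intermediate sublists may differ from a hand trace even though the final result is the same.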
Analysis of Quicksort
- Worst-case analysis: If the pivot is chosen poorly, one of the partitioned sublists may be empty and the other reduced by only one entry. In this case, quicksort performs O(n^2) comparisons and is slower than either insertion sort or selection sort.
- Average-case analysis: In its average case, quicksort performs O(n lg n) comparisons of keys in sorting a list of n entries in random order.
Quicksort: Choice of Pivot
- First or last entry: Worst case appears for a list already sorted or in reverse order.
- Median-of-three entry: Poor cases are rare.
- Random entry: Poor cases are very unlikely to occur. Ensure that the random number generation algorithm is fast.
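As a rough illustration of the median-of-three option, the helper below picks the median of the first, middle, and last entries of a range and returns its index. Sampling exactly those three positions is an assumption (other choices of three entries are possible), and the class and method names are hypothetical.

```java
class MedianOfThreeSketch {
    // Returns the index (low, mid, or high) that holds the median
    // of the three sampled keys a[low], a[mid], and a[high].
    static int medianOfThree(int[] a, int low, int high) {
        int mid = low + (high - low) / 2;
        int x = a[low];
        int y = a[mid];
        int z = a[high];
        if ((x <= y && y <= z) || (z <= y && y <= x)) {
            return mid;    // middle entry lies between the other two
        }
        if ((y <= x && x <= z) || (z <= x && x <= y)) {
            return low;    // first entry lies between the other two
        }
        return high;       // otherwise the last entry is the median
    }
}
```

For an already sorted range this returns the middle index, which avoids the worst case that a first-entry or last-entry pivot would hit on such input.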
Pointers and Pitfalls
- Many computer systems have a general-purpose sorting utility. If you can access this utility and it proves adequate for your application, then use it rather than writing a sorting program from scratch.
- In choosing a sorting method, take into account the ways in which the keys will usually be arranged before sorting, the size of the application, the amount of time available for programming, the need to save computer time and space, the way in which the data structures are implemented, the cost of moving data, and the cost of comparing keys.
- Divide-and-conquer is one of the most widely applicable and most powerful methods for designing algorithms. When faced with a programming problem, see if its solution can be obtained by first solving two (or more) problems of the same general form but of a smaller size. If so, you may be able to formulate an algorithm that uses the divide-and-conquer method and program it using recursion.
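In Java, the general-purpose utility is `java.util.Arrays.sort` (and `Collections.sort` for lists). The sketch below sorts primitive keys and then an array of objects by a chosen key. The `Student` type, its fields, and the sample data are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

class LibrarySortSketch {
    // Hypothetical record type used only to illustrate sorting objects.
    static class Student {
        final String name;
        final int score;

        Student(String name, int score) {
            this.name = name;
            this.score = score;
        }
    }

    // Sorts students by ascending score using the library utility.
    static void sortByScore(Student[] students) {
        Arrays.sort(students, Comparator.comparingInt((Student s) -> s.score));
    }

    public static void main(String[] args) {
        int[] keys = {26, 33, 35, 29, 19, 12, 22};
        Arrays.sort(keys);                           // primitives: natural order
        System.out.println(Arrays.toString(keys));   // [12, 19, 22, 26, 29, 33, 35]

        Student[] roster = {
            new Student("Ann", 88), new Student("Bob", 72), new Student("Cy", 95)
        };
        sortByScore(roster);
        for (Student s : roster) {
            System.out.println(s.name + " " + s.score);
        }
    }
}
```

Passing a `Comparator` keeps the choice of key separate from the sorting code, so the same call can sort by name or by score without touching the algorithm.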