
1 Sorting

2 We live in a world obsessed with keeping information, and to find it, we must keep it in some sensible order. You learned in the last chapter that, in the worst case, the time to search a list is proportional to the size of the list: $O(n)$. The only way to reduce the searching time is to keep the list ordered, which makes binary search possible: $O(\log_2 n)$.
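To make the payoff of ordering concrete, here is a minimal sketch of binary search on a list that is already sorted. It assumes std::vector stands in for a contiguous ordered list; the function name binary_search_index is hypothetical.

```cpp
#include <vector>

// Hypothetical helper: binary search on a vector sorted in nondecreasing order.
// Returns the index of some entry equal to target, or -1 if there is none.
// Each iteration halves the range [low, high], so roughly log2(n) + 1
// iterations suffice.
template <class T>
int binary_search_index(const std::vector<T>& list, const T& target) {
    int low = 0;
    int high = static_cast<int>(list.size()) - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;        // avoids overflow of low + high
        if (list[mid] == target) return mid;
        if (list[mid] < target) low = mid + 1;   // target can only be in the upper half
        else high = mid - 1;                     // target can only be in the lower half
    }
    return -1;                                   // range is empty: not found
}
```

On an unsorted list this would give wrong answers, which is exactly why sorting comes first.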

3 Sorting Sorting is the process of creating some sensible order. Sorting is closely related to searching in that we must sift through an unordered list a number of times looking for a particular element or a particular place to put an element.

4 Ordered List An ordered list is a list in which each entry contains a key, such that the keys are in order. That is, if entry i comes before entry j in the list, then the key of entry i is less than or equal to the key of entry j.

5 Sorting It was once estimated that more than half the time on many commercial computers was spent sorting. Because sorting is so important, a great many algorithms have been devised for it. Knuth treats about twenty-five sorting methods in Volume 3 of The Art of Computer Programming and claims that they are “only a fraction of the algorithms that have been devised so far.”

6 Sorting Your text describes only a few of them:
– Insertion Sort
– Selection Sort
– Shell Sort
– Divide-and-Conquer Sorting
– Mergesort for Linked Lists
– Quicksort for Contiguous Lists
– Heaps and Heapsort

7 Evaluate sorting methods We will evaluate sorting methods using “Big Oh” notation. In searching, the total amount of work done was clearly related to the number of comparisons of keys. The same observation is true for sorting algorithms, but sorting algorithms must also move their entries around the list or change pointers.

8 Required tasks when Sorting
– Compare the target item to other items
– Rearrange unordered items
The work done depends on:
– the number of comparisons
– the number of moves

9 Analysis As before, both the worst-case performance and the average performance of a sorting algorithm are of interest. To find the average, we shall consider what would happen if the algorithm were run on all possible orderings of the list (with n entries, there are n! such orderings altogether) and take the average of the results.

10 Sortable Lists We shall be particularly concerned with the performance of our sorting algorithms. In order to optimize performance of a program for sorting a list, we shall need to take advantage of any special features of the list’s implementation. For example, we shall see that some sorting algorithms work very efficiently on contiguous lists, but different implementations and different algorithms are needed to sort linked lists efficiently. Hence, to write efficient sorting programs, we shall need access to the private data members of the lists being sorted. Therefore, we shall add sorting functions as methods of our basic List data structures. The augmented list structure forms a new ADT that we shall call a Sortable_list.

11 Class definition of Sortable Lists The class definition for a Sortable_list takes the following form:

    template <class Record>
    class Sortable_list: public List<Record> {
    public:   // Add prototypes for sorting methods here.
    private:  // Add prototypes for auxiliary functions here.
    };

This definition shows that a Sortable_list is a List with extra sorting methods. The base list class can be any of the List implementations of Chapter 6.

12 Record and Key We use a template parameter class called Record to stand for entries of the Sortable_list. As in Chapter 7, we assume that the class Record has the following properties: Every Record has an associated key of type Key. A Record can be implicitly converted to the corresponding Key. Moreover, the keys (hence also the records) can be compared under the operations <, >, >=, <=, ==, and !=.
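One way to satisfy these assumptions is a conversion operator from Record to Key. The sketch below is hypothetical: Key is taken to be int, and the member other_data is invented for illustration. Because a Record converts to an int, the built-in comparison operators on int apply to Records without any extra overloads.

```cpp
#include <string>
#include <utility>

// Hypothetical key type for illustration.
typedef int Key;

// Hypothetical Record: a key plus satellite data.
class Record {
public:
    // explicit: an int does not silently turn into a Record.
    explicit Record(Key k = 0, std::string d = "") : key(k), data(std::move(d)) {}
    // Implicit Record -> Key conversion, so <, >, >=, <=, ==, != compare keys.
    operator Key() const { return key; }
    const std::string& other_data() const { return data; }
private:
    Key key;
    std::string data;
};
```

With this Record, a Sortable_list<Record> would order entries by key while carrying the rest of each record along.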

13 Instance of Sortable List A program for testing our Sortable_list might simply declare: Sortable_list<int> test_list; Here, the client uses the type int to represent both records and their keys.

14 INSERTION SORT The name of this algorithm comes from the fact that as we build an ordered list from an unordered one, we do so by choosing an element from the unordered list and “inserting” it into its correct place in the ordered list.

15 Sortable Lists

16 Algorithm
1. Take the first item in the unsorted list.
2. Insert it into the correct position in the sorted list.
3. Repeat until the unsorted list is empty.

17 Implementation To implement this algorithm, we must be more specific:
– What data structure will be used?
– Where does the sorted list begin and end?
– How do we carry out steps 1 and 2 above?

18 Ordered insertion An ordered list is an abstract data type, defined as a list in which each entry has a key, and such that the keys are in order; that is, if entry i comes before entry j in the list, then the key of entry i is less than or equal to the key of entry j. For ordered lists, we shall often use two new operations that have no counterparts for other lists, since they use keys rather than positions to locate the entry. One operation retrieves an entry with a specified key from the ordered list. Retrieval by key from an ordered list is exactly the same as searching. The second operation, ordered insertion, inserts a new entry into an ordered list by using the key in the new entry to determine where in the list to insert it. Note that ordered insertion is not uniquely specified if the list already contains an entry with the same key as the new entry, since the new entry could go into more than one position.

19 Ordered insertion

20 ordered insertion We begin with the ordered list shown in part (a) of the figure and wish to insert the new entry hen. In contrast to the implementation-independent version of insert from Section 7.3, we shall start comparing keys at the end of the list, rather than at its beginning. Hence we first compare the new key hen with the last key ram shown in the coloured box in part (a). Since hen comes before ram, we move ram one position down, leaving the empty position shown in part (b). We next compare hen with the key pig shown in the coloured box in part (b). Again, hen belongs earlier, so we move pig down and compare hen with the key dog shown in the coloured box in part (c). Since hen comes after dog, we have found the proper location and can complete the insertion as shown in part (d).
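The backward scan just described can be sketched as follows. This assumes std::vector stands in for the contiguous List; the name ordered_insert is hypothetical. The vector grows by one slot, and larger entries shift down one position until the gap reaches the place where the new entry belongs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: insert new_entry into a vector kept in nondecreasing
// order, comparing keys from the end of the list toward the front.
template <class Record>
void ordered_insert(std::vector<Record>& list, const Record& new_entry) {
    list.push_back(new_entry);                   // make room at the end
    std::size_t position = list.size() - 1;      // the currently empty slot
    // Move each larger entry one position down (toward the end).
    while (position > 0 && new_entry < list[position - 1]) {
        list[position] = list[position - 1];
        --position;
    }
    list[position] = new_entry;                  // fill the gap
}
```

Because the scan stops at the first entry that is not larger, a new entry with a key equal to an existing one lands after it; this is one of the positions the ordered-insertion definition allows.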

21 Sorting by Insertion To sort an unordered list, we think of removing its entries one at a time and then inserting each of them into an initially empty new list, always keeping the entries in the new list in the proper order according to their keys. This method is illustrated in Figure 8.2, which shows the steps needed to sort a list of six words. At each stage, the words that have not yet been inserted into the sorted list are shown in coloured boxes, and the sorted part of the list is shown in white boxes.

22 Sorting by Insertion

23 In the initial diagram, the first word hen is shown as sorted, since a list of length 1 is automatically ordered.

24 The main step of contiguous insertion sort

25 Sorting by Insertion The main step required to insert an entry denoted current into the sorted part of the list is shown in Figure 8.3. In the method that follows, we assume that the class Sortable_list is based on the contiguous List implementation of Section 6.2.2. Both the sorted list and the unsorted list occupy the same List member array, which we recall from Section 6.2.2 is called entry. The variable first_unsorted marks the division between the sorted and unsorted parts of this array.

26 insertion_sort( )

    template <class Record>
    void Sortable_list<Record>::insertion_sort()
    /* Post: The entries of the Sortable_list have been rearranged so that
             the keys in all the entries are sorted into nondecreasing order.
       Uses: Methods for the class Record; the contiguous List implementation of Chapter 6. */
    {
        int first_unsorted;  // position of first unsorted entry
        int position;        // searches sorted part of list
        Record current;      // holds the entry temporarily removed from list
        for (first_unsorted = 1; first_unsorted < count; first_unsorted++)
            if (entry[first_unsorted] < entry[first_unsorted - 1]) {
                position = first_unsorted;
                current = entry[first_unsorted];  // Pull unsorted entry out of the list.
                do {  // Shift all entries until the proper position is found.
                    entry[position] = entry[position - 1];
                    position--;                   // position is empty.
                } while (position > 0 && entry[position - 1] > current);
                entry[position] = current;
            }
    }
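To try the method outside the Sortable_list class, here is the same loop as a free function. This is a sketch under an assumption: std::vector replaces the List's private array entry and its size count, and the function takes the vector by reference instead of being a member.

```cpp
#include <vector>

// Free-function version of the contiguous insertion sort (assumes std::vector
// in place of the List's entry array and count).
template <class Record>
void insertion_sort(std::vector<Record>& entry) {
    const int count = static_cast<int>(entry.size());
    for (int first_unsorted = 1; first_unsorted < count; ++first_unsorted) {
        if (entry[first_unsorted] < entry[first_unsorted - 1]) {
            int position = first_unsorted;
            Record current = entry[first_unsorted];   // pull unsorted entry out
            do {                                       // shift larger entries down
                entry[position] = entry[position - 1];
                --position;                            // position is now empty
            } while (position > 0 && entry[position - 1] > current);
            entry[position] = current;                 // drop current into the gap
        }
    }
}
```

Usage: `std::vector<int> v{5, 2, 9, 1}; insertion_sort(v);` leaves v as 1, 2, 5, 9. Lists of length 0 or 1 skip the loop entirely.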

27 insertion_sort( ) Since a list with only one entry is automatically sorted, the loop on first_unsorted starts at 1, with the second entry.
– If that entry is already in the correct position, nothing needs to be done.
– Otherwise, the new entry is pulled out of the list into the variable current,
– the do ... while loop pushes entries one position down the list until the correct position is found, and
– finally current is inserted there.
– The case when current belongs in the first position of the list must be detected specially, since in this case there is no entry with a smaller key that would terminate the search. We handle this special case with the first clause in the condition of the do ... while loop, position > 0.

28 Analysis of Insertion Sort Analyze the performance of the contiguous version of the program.

29 Analysis of Insertion Sort Assumptions: We restrict our attention to the case when the list is initially in random order (meaning that all possible orderings of the keys are equally likely). When we deal with entry i, how far back must we go to insert it? There are i possibilities:
– not moving it at all,
– moving it one position,
– ...
– moving it i - 1 positions, to the front of the list.
Since the order is random, these are equally likely. The probability that the entry need not be moved is thus 1/i, in which case only one comparison of keys is done, with no moving of entries.

30 Analysis of Insertion Sort: inserting one entry The remaining case, in which entry i must be moved, occurs with probability (i - 1)/i. Let us begin by counting the average number of iterations of the do ... while loop. Since all of the i - 1 possible positions are equally likely, the average number of iterations is (using Theorem A.1, p. 647) $\frac{1 + 2 + \cdots + (i-1)}{i-1} = \frac{(i-1)\,i}{2\,(i-1)} = \frac{i}{2}$.

31 Analysis of Insertion Sort One key comparison and one assignment are done for each of these iterations, with one more key comparison done outside the loop, along with two assignments of entries. Hence, in this second case, entry i requires, on average, $i/2 + 1$ comparisons and $i/2 + 2$ assignments.

32 Analysis of Insertion Sort When we combine the two cases with their respective probabilities, we have
$\frac{1}{i}\cdot 1 + \frac{i-1}{i}\left(\frac{i}{2} + 1\right) = \frac{i+1}{2}$ comparisons and
$\frac{1}{i}\cdot 0 + \frac{i-1}{i}\left(\frac{i}{2} + 2\right) = \frac{i+3}{2} - \frac{2}{i}$ assignments.

33 Analysis of Insertion Sort: inserting all entries We wish to add these numbers from $i = 2$ to $i = n$, but to avoid complications in the arithmetic, we first use big-O notation to approximate each of these expressions by suppressing the terms bounded by a constant, that is, terms that are $O(1)$. We thereby obtain $i/2 + O(1)$ for both the number of comparisons and the number of assignments of entries. In making this approximation, we are really concentrating on the actions within the main loop and suppressing any concern about operations done outside the loop or variations in the algorithm that change the amount of work only by some bounded amount.

34 Analysis of Insertion Sort To add $\tfrac12 i + O(1)$ from $i = 2$ to $i = n$, we apply Theorem A.1 on page 647. We also note that adding n terms, each of which is $O(1)$, produces a result that is $O(n)$. We thus obtain
$\sum_{i=2}^{n}\left(\tfrac12 i + O(1)\right) = \tfrac12 \sum_{i=2}^{n} i + O(n) = \tfrac14 n^2 + O(n)$
for both the number of comparisons of keys and the number of assignments of entries.

35 Analysis of Insertion Sort Both the number of comparisons of keys and the number of assignments of entries are $\tfrac14 n^2 + O(n)$. As n becomes larger, the contribution from the $n^2$ term becomes much larger than the remaining terms collected as $O(n)$. Hence, as the size of the list grows, the time needed by insertion sort grows like the square of this size: $O(n^2)$.

36 Analysis of Insertion Sort The worst case for the contiguous version of insertion sort is when the keys are input in reverse order. Then the ith entry requires i - 1 comparisons and i + 1 assignments, so over all n keys the worst-case comparison count is
$\sum_{i=2}^{n} (i-1) = \tfrac12 n(n-1)$.

37 Analysis of Insertion Sort Worst case:
54321 (unsorted) → 2 moves → 45321 → 3 moves → 34521 → ... → n - 1 moves → 23451 → n moves → 12345 (sorted)
Total moves $= 2 + 3 + \cdots + n < 1 + 2 + \cdots + n = \tfrac12 n(n+1) = O(n^2)$
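For small n, both the worst case and the all-orderings average described on slide 9 can be checked by brute force. The sketch below is a hypothetical instrumented copy of the contiguous insertion sort (count_comparisons and average_comparisons are illustrative names, and the keys are assumed to be ints). It counts key comparisons exactly as the code performs them, including the fact that && in the do ... while condition skips the key comparison once position reaches 0; std::next_permutation enumerates all n! orderings.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: runs the contiguous insertion sort on a copy of the
// keys and returns the number of key comparisons it performs.
long count_comparisons(std::vector<int> entry) {
    long comparisons = 0;
    const int count = static_cast<int>(entry.size());
    for (int first_unsorted = 1; first_unsorted < count; ++first_unsorted) {
        ++comparisons;  // entry[first_unsorted] < entry[first_unsorted - 1]
        if (entry[first_unsorted] < entry[first_unsorted - 1]) {
            int position = first_unsorted;
            int current = entry[first_unsorted];
            bool keep_going;
            do {
                entry[position] = entry[position - 1];
                --position;
                keep_going = false;
                if (position > 0) {  // && short-circuits: no key comparison at 0
                    ++comparisons;
                    keep_going = entry[position - 1] > current;
                }
            } while (keep_going);
            entry[position] = current;
        }
    }
    return comparisons;
}

// Hypothetical helper: average comparison count over all n! orderings of 1..n.
double average_comparisons(int n) {
    std::vector<int> keys(n);
    for (int i = 0; i < n; ++i) keys[i] = i + 1;  // start from the sorted ordering
    long total = 0, orderings = 0;
    do {
        total += count_comparisons(keys);
        ++orderings;
    } while (std::next_permutation(keys.begin(), keys.end()));
    return static_cast<double>(total) / orderings;
}
```

On keys already in order it reports n - 1 comparisons, and on reversed keys ½n(n - 1), matching slide 36. Because it counts the short-circuited test exactly, the enumerated average can differ slightly from the slides' per-entry estimates, but only by a bounded amount per entry, which the ¼n² + O(n) result absorbs.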

38 Linked Version of Insertion Sort For a linked version of insertion sort, since there is no movement of data, there is no need to start searching at the end of the sorted sublist. Instead, we shall traverse the original list, taking one entry at a time and inserting it in the proper position in the sorted list. The pointer variable last_sorted will reference the end of the sorted part of the list, and last_sorted->next will reference the first entry that has not yet been inserted into the sorted sublist. We shall let first_unsorted also point to this entry and use a pointer current to search the sorted part of the list to find where to insert *first_unsorted. If *first_unsorted belongs before the current head of the list, then we insert it there. Otherwise, we move current down the list until first_unsorted->entry <= current->entry and then insert *first_unsorted before *current. To enable insertion before *current, we keep a second pointer trailing in lock step one position closer to the head than current. A sentinel is an extra entry added to one end of a list to ensure that a loop will terminate without having to include a separate check. Since we have last_sorted->next == first_unsorted,

39 Analysis of Insertion Sort the node *first_unsorted is already in position to serve as a sentinel for the search, and the loop moving current is simplified. Finally, let us note that a list with 0 or 1 entry is already sorted, so that we can check these cases separately and thereby avoid trivialities elsewhere. The details appear in the following function and are illustrated in Figure 8.4.

40 Insertion Sort function

    template <class Record>
    void Sortable_list<Record>::insertion_sort()
    /* Post: The entries of the Sortable_list have been rearranged so that
             the keys in all the entries are sorted into nondecreasing order.
       Uses: Methods for the class Record. The linked List implementation of Chapter 6. */
    {
        Node<Record> *first_unsorted,  // the first unsorted node to be inserted
                     *last_sorted,     // tail of the sorted sublist
                     *current,         // used to traverse the sorted sublist
                     *trailing;        // one position behind current
        if (head != NULL) {            // Otherwise, the empty list is already sorted.
            last_sorted = head;        // The first node alone makes a sorted sublist.

41 Insertion Sort function

            while (last_sorted->next != NULL) {
                first_unsorted = last_sorted->next;
                if (first_unsorted->entry < head->entry) {
                    // Insert *first_unsorted at the head of the sorted list:
                    last_sorted->next = first_unsorted->next;
                    first_unsorted->next = head;
                    head = first_unsorted;
                }
                else {
                    // Search the sorted sublist to insert *first_unsorted:
                    trailing = head;
                    current = trailing->next;
                    while (first_unsorted->entry > current->entry) {
                        trailing = current;
                        current = trailing->next;
                    }

42 Insertion Sort function

                    // *first_unsorted now belongs between *trailing and *current.
                    if (first_unsorted == current)
                        last_sorted = first_unsorted;  // already in right position
                    else {
                        last_sorted->next = first_unsorted->next;
                        first_unsorted->next = current;
                        trailing->next = first_unsorted;
                    }
                }
            }
        }
    }


44 Analysis of Insertion Sort

45 Linked Insertion Sort With no movement of data, there is no need to search from the end of the sorted sublist, as for the contiguous case. Traverse the original list, taking one entry at a time and inserting it in the proper position in the sorted list. Pointer last_sorted references the end of the sorted part of the list. Pointer first_unsorted == last_sorted->next references the first entry that has not yet been inserted into the sorted sublist.

46 Linked Insertion Sort Pointer current searches the sorted part of the list to find where to insert *first_unsorted. If *first_unsorted belongs before the head of the list, then insert it there. Otherwise, move current down the list until first_unsorted->entry <= current->entry and then insert *first_unsorted before *current. To enable insertion before *current, keep a second pointer trailing in lock step one position closer to the head than current.

47 Linked Insertion Sort A sentinel is an extra entry added to one end of a list to ensure that a loop will terminate without having to include a separate check. Since last_sorted->next == first_unsorted, the node *first_unsorted is already in position to serve as a sentinel for the search, and the loop moving current is simplified. A list with 0 or 1 entry is already sorted, so by checking these cases separately we avoid trivialities elsewhere.
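The linked method can also be exercised without the full List class. The sketch below assumes a minimal hypothetical Node struct (entry and next only) in place of the Chapter 6 Node<Record>, and returns the new head instead of updating a member. The inner while loop has no null check: *first_unsorted acts as the sentinel, so the search stops there at the latest.

```cpp
// Hypothetical minimal node; the slides use Node<Record> from Chapter 6.
template <class Record>
struct Node {
    Record entry;
    Node* next;
};

// Linked insertion sort following slides 40-42; returns the new head.
template <class Record>
Node<Record>* linked_insertion_sort(Node<Record>* head) {
    if (head == nullptr) return head;             // empty list is already sorted
    Node<Record>* last_sorted = head;             // tail of the sorted sublist
    while (last_sorted->next != nullptr) {
        Node<Record>* first_unsorted = last_sorted->next;
        if (first_unsorted->entry < head->entry) {
            // Insert *first_unsorted at the head of the sorted sublist.
            last_sorted->next = first_unsorted->next;
            first_unsorted->next = head;
            head = first_unsorted;
        } else {
            // Search the sorted sublist; *first_unsorted is the sentinel.
            Node<Record>* trailing = head;
            Node<Record>* current = trailing->next;
            while (first_unsorted->entry > current->entry) {
                trailing = current;
                current = trailing->next;
            }
            if (first_unsorted == current) {
                last_sorted = first_unsorted;     // already in the right position
            } else {
                last_sorted->next = first_unsorted->next;
                first_unsorted->next = current;
                trailing->next = first_unsorted;
            }
        }
    }
    return head;
}
```

No entry is ever copied; only next pointers change, which is why the search can run from the head instead of from the end.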

48 Sorting Algorithms and Average Case Number of Comparisons
Simple sorts, $O(N^2)$:
– Straight Selection Sort
– Bubble Sort
– Insertion Sort
More complex sorts, $O(N \log N)$:
– Quick Sort
– Merge Sort
– Heap Sort

49 Selection Sort We can analyze the performance of function selection_sort in the same way that it is programmed. The main function does nothing except some bookkeeping and calling the subprograms. The function swap is called n - 1 times, and each call does 3 assignments of entries, for a total count of 3(n - 1). The function max_key is called n - 1 times, with the length of the sublist ranging from n down to 2. If t is the number of entries on the part of the list for which it is called, then max_key does exactly t - 1 comparisons of keys to determine the maximum. Hence, altogether, there are $(n-1) + (n-2) + \cdots + 1 = \tfrac12 n(n-1)$ comparisons of keys, which we approximate as $\tfrac12 n^2 + O(n)$.
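The slides analyze selection_sort without showing it, so here is a sketch consistent with that analysis: for each position from the end of the list down to 1, max_key finds the largest key among entries 0 through position, and a swap moves it into place. Assumptions: std::vector holds the entries, std::swap stands in for the text's swap, and this version returns the number of key comparisons so the ½n(n - 1) count can be checked.

```cpp
#include <utility>
#include <vector>

// Returns the position of the largest key in entry[low..high]; makes exactly
// (high - low) key comparisons, i.e. t - 1 for a sublist of t entries.
template <class Record>
int max_key(const std::vector<Record>& entry, int low, int high, long& comparisons) {
    int largest = low;
    for (int current = low + 1; current <= high; ++current) {
        ++comparisons;
        if (entry[largest] < entry[current]) largest = current;
    }
    return largest;
}

// Selection sort sketch; returns the total number of key comparisons.
template <class Record>
long selection_sort(std::vector<Record>& entry) {
    long comparisons = 0;
    for (int position = static_cast<int>(entry.size()) - 1; position > 0; --position) {
        int largest = max_key(entry, 0, position, comparisons);
        std::swap(entry[largest], entry[position]);   // one swap per pass: n - 1 in all
    }
    return comparisons;
}
```

For an 8-entry list this makes 7 + 6 + ... + 1 = 28 comparisons whatever the initial order, which is why selection sort has no better best case.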

50 Analysis and comparison:

51 Shell Sort Selection sort moves the entries very efficiently but does many redundant comparisons. In its best case, insertion sort does the minimum number of comparisons, but it is inefficient in moving entries only one position at a time. Our goal now is to derive another method that avoids, as much as possible, the problems with both of these. Let us start with insertion sort and ask how we can reduce the number of times it moves an entry.

52 The reason insertion sort can move entries only one position at a time is that it compares only adjacent keys. If we were to modify it so that it first compares keys far apart, then it could sort the entries far apart. Afterward, the entries closer together would be sorted, and finally the increment between keys being compared would be reduced to 1, to ensure that the list is completely in order. This is the idea implemented in 1959 by D. L. Shell in the sorting method bearing his name. This method is also sometimes called diminishing-increment sort.

53 Example of Shell Sort

54 Shell Sort We first sort all names that are at distance 5 from each other (so there will be only two or three names on each such list), then re-sort the names using increment 3, and finally perform an ordinary insertion sort (increment 1). You can see that, even though we make three passes through all the names, the early passes move the names close to their final positions, so that at the final pass (which does an ordinary insertion sort), all the entries are very close to their final positions and the sort goes rapidly.

55 Shell Sort We start with increment equal to count, where we recall that count is the size of the List being sorted, and at each pass reduce the increment with the statement increment = increment/3 + 1; continuing until a pass has been made with increment 1.
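Putting the pieces together, here is a sketch of Shell sort using the increment rule above. Assumptions: std::vector holds the entries, and sort_interval is a hypothetical helper that insertion-sorts the sublist entry[start], entry[start + increment], entry[start + 2*increment], and so on. The do ... while loop stops right after the pass with increment 1, which is an ordinary insertion sort.

```cpp
#include <vector>

// Insertion sort on the entries start, start + increment, start + 2*increment, ...
template <class Record>
void sort_interval(std::vector<Record>& entry, int start, int increment) {
    const int count = static_cast<int>(entry.size());
    for (int first_unsorted = start + increment; first_unsorted < count;
         first_unsorted += increment) {
        Record current = entry[first_unsorted];
        int position = first_unsorted;
        // Shift larger entries of this sublist one step (increment) down.
        while (position >= start + increment && entry[position - increment] > current) {
            entry[position] = entry[position - increment];
            position -= increment;
        }
        entry[position] = current;
    }
}

template <class Record>
void shell_sort(std::vector<Record>& entry) {
    int increment = static_cast<int>(entry.size());    // start with increment == count
    do {
        increment = increment / 3 + 1;                 // diminishing increment
        for (int start = 0; start < increment; ++start)
            sort_interval(entry, start, increment);
    } while (increment > 1);                           // last pass used increment 1
}
```

Note that this rule generates its own increments from count: for a 13-entry list it gives 5, 2, 1, rather than the 5, 3, 1 of the worked example.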

56 Analysis of Shell Sort Very large empirical studies have been made of Shell sort, and it appears that the number of moves, when n is large, is in the range of $n^{1.25}$ to $1.6\,n^{1.25}$. This constitutes a substantial improvement over insertion sort.

57 Merge Sort Algorithm
1. Cut the array in half.
2. Sort the left half.
3. Sort the right half.
4. Merge the two sorted halves into one sorted array.
(Figure: values[first..middle] = 74 36 ... 95 and values[middle + 1..last] = 75 29 ... 52; after the halves are sorted: 36 74 ... 95 and 29 52 ... 75.)

58

    // Recursive merge sort algorithm
    template <class ItemType>
    void MergeSort(ItemType values[], int first, int last)
    // Pre:  first <= last
    // Post: Array values[first..last] sorted into ascending order.
    {
        if (first < last) {  // general case
            int middle = (first + last) / 2;
            MergeSort(values, first, middle);
            MergeSort(values, middle + 1, last);
            // now merge two subarrays
            // values[first..middle] with
            // values[middle + 1..last].
            Merge(values, first, middle, middle + 1, last);
        }
    }
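The slides call Merge without showing it. The sketch below keeps the parameter order of the call Merge(values, first, middle, middle + 1, last), but its body is an assumption: it merges the two adjacent sorted runs into a temporary std::vector and then copies the result back, the two O(N) steps mentioned on slide 60. MergeSort is repeated so the block compiles on its own.

```cpp
#include <vector>

// Assumed implementation: merge sorted runs values[leftFirst..leftLast] and
// values[rightFirst..rightLast] (adjacent) via a temporary array.
template <class ItemType>
void Merge(ItemType values[], int leftFirst, int leftLast,
           int rightFirst, int rightLast) {
    std::vector<ItemType> temp;
    temp.reserve(rightLast - leftFirst + 1);
    int left = leftFirst, right = rightFirst;
    while (left <= leftLast && right <= rightLast) {
        // On ties take from the left run, which keeps the sort stable.
        if (values[right] < values[left]) temp.push_back(values[right++]);
        else temp.push_back(values[left++]);
    }
    while (left <= leftLast) temp.push_back(values[left++]);     // leftovers
    while (right <= rightLast) temp.push_back(values[right++]);
    for (int i = 0; i < static_cast<int>(temp.size()); ++i)      // copy back
        values[leftFirst + i] = temp[i];
}

template <class ItemType>
void MergeSort(ItemType values[], int first, int last) {
    if (first < last) {
        int middle = (first + last) / 2;
        MergeSort(values, first, middle);
        MergeSort(values, middle + 1, last);
        Merge(values, first, middle, middle + 1, last);
    }
}
```

Usage: `int a[] = {74, 36, 95, 75, 29, 52}; MergeSort(a, 0, 5);` leaves a as 29, 36, 52, 74, 75, 95.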

59 Using Merge Sort Algorithm with N = 16: the array is split into subarrays of sizes
16
8 8
4 4 4 4
2 2 2 2 2 2 2 2
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

60 Merge Sort of N elements: How many comparisons? The entire array can be subdivided into halves only $\log_2 N$ times. Each time it is subdivided, function Merge is called to re-combine the halves. Function Merge uses a temporary array to store the merged elements. Merging is O(N) because it compares each element in the subarrays, and copying elements back from the temporary array to the values array is also O(N). Since each of the $\log_2 N$ levels of subdivision does O(N) work in total, merge sort is $O(N \log_2 N)$.
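The same conclusion can be written as a recurrence. Assume N is a power of 2, say $N = 2^k$, and that merging and copying back N elements costs at most $cN$ for some constant c; T(N) and c are notation introduced here for the total work. Unrolling gives

$$
\begin{aligned}
T(N) &= 2\,T(N/2) + cN \\
     &= 4\,T(N/4) + 2cN \\
     &\;\;\vdots \\
     &= 2^k\,T(N/2^k) + k\,cN \\
     &= N\,T(1) + cN\log_2 N = O(N \log_2 N).
\end{aligned}
$$

Each line of the unrolling corresponds to one level of the N = 16 diagram: every level contributes cN in total, and there are $\log_2 N$ levels.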

61 Figure 11-24

62 Figure 11-25

63 Mergesort


