Download presentation
Presentation is loading. Please wait.
Published byRose Potter Modified over 9 years ago
2
CHAPTER 8 SORTING 【学习内容】 【学习内容】 BASIC CONCEPTS 1. Introduction and Notation 2. Insertion Sort 3. Selection Sort 4. Shell Sort 5. Lower Bounds 6. Divide-and-Conquer Sorting 7. Mergesort for Linked Lists 8. Quicksort for Contiguous Lists 9. Heaps and Heapsort 10. Review: Comparison of Methods
3
Definitions of Sorting Problem Description: Given a list of records ( R 0, R 1, , R n 1 ) where each R i has a key value K i. There is a transitive ordering relation “<” on the keys such that for any two key values K i and K j, either K i = K j or K i < K j or K j < K i. Sorting is to find a permutation such that K (i 1) K (i) for all 0 < i n 1. The desired ordering is then ( R (0), R (1), , R (n 1) ). Remarks: No single sorting technique is the best in all cases. If there are identical key values, then is not unique. If s is a permutation which is not only sorted, but also stable — that is, if K i = K j for i < j, then R i precedes R j in the sorted list — then such a s is unique. A sorting method is stable if it generates s. An internal sort is to sort the list entirely in main memory. An external sort is to sort the file piece by piece in main memory.
4
概述 排序:将一组杂乱无章的数据按一定的规律 顺次排列起来。 排序:将一组杂乱无章的数据按一定的规律 顺次排列起来。 数据表 (datalist): 它是待排序数据对象的有 限集合。 数据表 (datalist): 它是待排序数据对象的有 限集合。 排序码 (key): 通常数据对象有多个属性域, 即多个数据成员组成, 其中有一个属性域可 用来区分对象, 作为排序依据。该域即为排 序码。每个数据表用哪个属性域作为排序码, 要视具体的应用需要而定。 排序码 (key): 通常数据对象有多个属性域, 即多个数据成员组成, 其中有一个属性域可 用来区分对象, 作为排序依据。该域即为排 序码。每个数据表用哪个属性域作为排序码, 要视具体的应用需要而定。
5
排序算法的稳定性 : 如果在对象序列中有两 个对象 r [i] 和 r [j], 它们的排序码 k [i] == k [j], 且 在排序之前, 对象 r [i] 排在 r [j] 前面。如果在排 序之后, 对象 r [i] 仍在对象 r [j] 的前面, 则称这 个排序方法是稳定的, 否则称这个排序方法 是不稳定的。 排序算法的稳定性 : 如果在对象序列中有两 个对象 r [i] 和 r [j], 它们的排序码 k [i] == k [j], 且 在排序之前, 对象 r [i] 排在 r [j] 前面。如果在排 序之后, 对象 r [i] 仍在对象 r [j] 的前面, 则称这 个排序方法是稳定的, 否则称这个排序方法 是不稳定的。 内排序与外排序 : 内排序是指在排序期间 数据对象全部存放在内存的排序;外排序 是指在排序期间全部对象个数太多,不能 同时存放在内存,必须根据排序过程的要 求,不断在内、外存之间移动的排序。 内排序与外排序 : 内排序是指在排序期间 数据对象全部存放在内存的排序;外排序 是指在排序期间全部对象个数太多,不能 同时存放在内存,必须根据排序过程的要 求,不断在内、外存之间移动的排序。
6
排序的时间开销 : 排序的时间开销是衡量 算法好坏的最重要的标志。排序的时间开销 可用算法执行中的数据比较次数与数据移动 次数来衡量。 排序的时间开销 : 排序的时间开销是衡量 算法好坏的最重要的标志。排序的时间开销 可用算法执行中的数据比较次数与数据移动 次数来衡量。 算法运行时间代价的大略估算一般都按平均 情况进行估算。对于那些受对象排序码序列 初始排列及对象个数影响较大的,需要按最 好情况和最坏情况进行估算。 算法运行时间代价的大略估算一般都按平均 情况进行估算。对于那些受对象排序码序列 初始排列及对象个数影响较大的,需要按最 好情况和最坏情况进行估算。 算法执行时所需的附加存储 : 评价算法好 坏的另一标准。 算法执行时所需的附加存储 : 评价算法好 坏的另一标准。
7
Sortable_lists template class Sortable_list: public List { public: // Add prototypes for sorting methods here. private: // Add prototypes for auxiliary functions here. };
8
Insertion Sort template void Sortable_list :: insertion sort( ) /* Post: The entries of the Sortable_list have been rearranged so that the keys in all the entries are sorted into nondecreasing order. Uses: Methods for the class Record; the contiguous List implementation of Chapter 6 */ { int first_unsorted; // position of first_unsorted entry int position; // searches sorted part of list Record current; // holds the entry temporarily removed from list for (first_unsorted = 1; first_unsorted < count; first_unsorted++) if (entry[first_unsorted] < entry[first_unsorted − 1]) { position = first_unsorted; current = entry[first_unsorted]; // Pull unsorted entry out of the list. do { // Shift all entries until the proper position is found. entry[position] = entry[position − 1]; position−−; // position is empty. } while (position > 0 && entry[position − 1] > current); entry[position] = current; } Worst case T p = O( ? )= O( n 2 ) The pivot key is placed at the right position with respect to the sorted sub-list.
9
Insertion Sort Linked Version
11
Insertion Sort A record R i is LOO ( Left Out of Order ) If there are k records that are LOO, then the worst case Good for k << n and n is not too large. If k = n, T p = O( n 2 ). Average T p = O( n 2 ). Variations: Replace the sequential search by the binary search. Search faster but still have to move the records. Use linked list for representation. Now insertion becomes simple, but we have to use the sequential search.
12
Selection Sort
14
Shell Sort
15
希尔排序 (Shell Sort) 希尔排序方法又称为缩小增量排序。该方法 的基本思想是 : 设待排序对象序列有 n 个对 象, 首先取一个整数 gap < n 作为间隔, 将 全部对象分为 gap 个子序列, 所有距离为 gap 的对象放在同一个子序列中, 在每一个 子序列中分别施行直接插入排序。然后缩小 间隔 gap, 例如取 gap = gap/2 ,重复上述 的子序列划分和排序工作。直到最后取 gap == 1, 将所有对象放在同一个序列中排序为 止。 希尔排序方法又称为缩小增量排序。该方法 的基本思想是 : 设待排序对象序列有 n 个对 象, 首先取一个整数 gap < n 作为间隔, 将 全部对象分为 gap 个子序列, 所有距离为 gap 的对象放在同一个子序列中, 在每一个 子序列中分别施行直接插入排序。然后缩小 间隔 gap, 例如取 gap = gap/2 ,重复上述 的子序列划分和排序工作。直到最后取 gap == 1, 将所有对象放在同一个序列中排序为 止。
16
Shell Sort
17
Lower Bounds of Sort
18
Optimal Sorting Time 【 Theorem 】 Any algorithm that sorts by comparisons only must have a worst case computing time of ( n log 2 n ). Proof: K 0 K 1 K 1 K 2 K 0 K 2 stop [0,1,2] stop [0,2,1] stop [2,0,1] TF TF K 0 K 2 K 1 K 2 stop [1,0,2] stop [1,2,0] stop [2,1,0] TF TF TF Decision tree for insertion sort on R 0, R 1, and R 2 When sorting n distinct elements, there are n! different possible results. Thus any decision tree must have at least n! leaves. If the height of the tree is k, then n! 2 k 1 (# of leaves in a complete binary tree) k log 2 n! + 1 Since n! (n/2) n/2 and log 2 n! (n/2)log 2 (n/2) = ( n log 2 n ) Therefore T p = k c n log 2 n.
19
Divide and Conquer Sorting void Sortable_list :: sort( ) { if the list has length greater than 1 { partition the list into lowlist, highlist; lowlist.sort( ); highlist.sort( ); combine(lowlist, highlist); }
20
Merge Sort Iterative Merge Sort Sketch of the idea: list 0123 …… n4n4n3n3n2n2n1n1 …… ……
21
21 25 49 25* 16 08 21 25 49 25* 16 08 21 25 49 25 49 21 25* 16 08 25* 16 08 21 25 49 25* 16 08 16 08 25* 25 49 21 递归递归 25* 16 08 49 25 25* 16 08 21 25 49
22
Merge Sort
24
Divide Linked List in Half
25
Analysis of Mergesort
26
Quick Sort —— gives the best average T p Sketch of the idea: The pivot K i is placed at the right position with respect to the whole list. R 0 R 1 R 2 R i R j R n 2 R n 1 pivot < K 0 > K 0 < K 0 > K 0 …… K 0 K 0 …… i < j R j R i+1 R j 1 R i Continue for sublist until... R j R i Right position for R 0 [ R j R 1 R 2 R j 1 ] R 0 [ R i R n 2 R n 1 ] Smaller Universes
27
Quicksort
28
Quick Sort
30
Partitioning the List
31
Quick Sort Worst case T p = O( ? ) n2n2 if the list is already in sorted order. Lucky case: [...... ] [...... ] T ( n ) = O( n ) + 2 T ( n / 2 ) = O( n ) + 2 [ O( n / 2 ) + 2 T ( n / 2 2 ) ] = 2 O( n ) + 2 2 T ( n / 2 2 ) =...... = k O( n ) + 2 k T ( n / 2 k ) n / 2 k = 1 k = log 2 n = O( n log 2 n ) + n T ( 1 ) = O( n log 2 n ) 【 Lemma 7.1 】 Let T avg ( n ) be the expected time for quicksort to sort a file with n records. Then there exists a constant k such that T avg ( n ) k n log e n for n 2. Proof: By induction. See p.331~332. Space: S lucky = O( ln n ); S worst = O( n ).
32
Trees, Lists, and Heaps If the root of the tree is in position 0 of the list, then the left and right children of the node in position k are in positions 2k + 1 and 2k + 2 of the list, respectively. If these positions are beyond the end of the list, then these children do not exist.
33
The Definition of Heaps DEFINITION A heap is a list in which each entry contains a key, and, for all positions k in the list, the key at position k is at least as large as the keys in positions 2k and 2k + 1, provided these positions exist in the list. 完全二叉树顺序表示 K i K 2i+1 K i K 2i+2 完全二叉树顺序表示 K i K 2i+1 && K i K 2i+2
34
Heap Sort b da c List [1] [2][3] [4] Adjust the list into a max heap d c b Exchange the 1st record with the last one d b Decrement the heap size and go to step 1 c bc a a b b a
35
Trace of heap_sort
37
Insertion into a heap
38
Heap Sort Analysis of heapsort : Therefore T p = O( n ln n ). With a fix amount of additional storage, it is slightly slower than merge sort using O( n ) additional space, but is faster than merge sort using O( 1 ) additional space.
39
Radix Sort 基数排序 — sorting records that have several keys K i j ::= the j-th key of record R i K i 0 ::= the most significant key of record R i Suppose that the record R i has r keys. K i r 1 ::= the least significant key of record R i A list of records R 0,..., R n 1 is lexically sorted with respect to the keys K 0, K 1,..., K r 1 iff That is, K i 0 = K i+1 0,..., K i l = K i+1 l, K i l+1 < K i+1 l+1 for some l < r 1. 〖 Example 〗 A deck of cards sorted on 2 keys K 0 [Suit] < < < K 1 [Face value]2 < 3 < 4 < 5 < 6 < 7 < 8 < 9 < 10 < J < Q < K < A Sorting result : 2 ... A 2 ... A 2... A 2 ... A
40
§8 Radix Sort 1. MSD ( Most Significant Digit ) Sort 最高位优先 Sort on K 0 : for example, create 4 bins for the suits 33 33 55 55 AA AA 4 4 Sort each bin independently (using insertion sort) Stack the 4 bins
41
§8 Radix Sort 2. LSD ( Least Significant Digit ) Sort 最低位优先 Sort on K 1 : for example, create 13 bins for the face values 22 22 33 33 4 4 55 55 AA AA ... Reform them into a single pile AA AA 33 33 22 22 Create 4 bins and resort using any stable method Note: If the number of the least significant keys is O( n ), then a bin sort requires only O( n ) time, thus making it a very fast sorting technique.
42
Radix Sort Remark: MSD or LSD can be used to sort a single key, if we interpret this key as a composite of several keys. For example, for 0 K 999 we can break it into 3 keys: K = K 0 100 + K 1 10 + K 2 1. 3. Radix Sort —— decompose the (integer) key into digits using a radix r 〖 Example 〗 K = 12, r = 10 K 0 = 1, K 1 = 2 K = 1100, r = 10 K 0 = K 1 = 1, K 2 = K 3 = 0
44
基数排序的 “ 分配 ” 与 “ 收集 ” 过程 第一趟 614 921 485 637 738 101 215 530 790 306 第一趟分配(按最低位 i = 3 ) re[0] re[1] re[2] re[3] re[4] re[5] re[6] re[7] re[8] re[9] 614 738 921 485 637 101 215 530 790 306 fr[0] fr[1] fr[2] fr[3] fr[4] fr[5] fr[6] fr[7] fr[8] fr[9] 第一趟收集 530 790 921 101 614 485 215 306 637 738
45
基数排序的 “ 分配 ” 与 “ 收集 ” 过程 第二趟 614 921 485 637 738 101 215 530 790 306 第二趟分配(按次低位 i = 2 ) re[0] re[1] re[2] re[3] re[4] re[5] re[6] re[7] re[8] re[9] 614 738 921 485 637 101 215 530 790 306 fr[0] fr[1] fr[2] fr[3] fr[4] fr[5] fr[6] fr[7] fr[8] fr[9] 第二趟收集 530 790 921 101 614 485 215 306 637 738
46
基数排序的 “ 分配 ” 与 “ 收集 ” 过程 第三趟 614 921 485 637 738 101 215 530 790 306 第三趟分配(按最高位 i = 1 ) re[0] re[1] re[2] re[3] re[4] re[5] re[6] re[7] re[8] re[9] 614 738 921 485 637 101 215 530 790 306 fr[0] fr[1] fr[2] fr[3] fr[4] fr[5] fr[6] fr[7] fr[8] fr[9] 第三趟收集 530 790 921 101 614 485 215 306 637 738
48
Linked Implementation of Radix Sort
49
Sorting Method, Linked Radix Sort
50
Auxiliary Functions, Linked Radix Sort
51
The time used by radix sort is O(nk), where n is the number of items being sorted and k is the number of characters in a key. The relative performance of radix sort to other methods will relate to the relative sizes of nk and n lg n; that is, of k and lg n. If the keys are long but there are relatively few of them, then k is large and lg n relatively small, and other methods (such as mergesort) will outperform radix sort. If k is small (the keys are short) and there are a large number of keys, then radix sort will be faster than any other method we have studied. Analysis of Radix Sort
52
若每个排序码有 d 位, 需要重复执行 d 趟 “ 分 配 ” 与 “ 收集 ” 。每趟对 n 个对象进行 “ 分配 ” , 对 radix 个队列进行 “ 收集 ” 。总时间复杂度为 O ( d ( n+radix ) ) 。 若每个排序码有 d 位, 需要重复执行 d 趟 “ 分 配 ” 与 “ 收集 ” 。每趟对 n 个对象进行 “ 分配 ” , 对 radix 个队列进行 “ 收集 ” 。总时间复杂度为 O ( d ( n+radix ) ) 。 若基数 radix 相同, 对于对象个数较多而排序 码位数较少的情况, 使用链式基数排序较好。 若基数 radix 相同, 对于对象个数较多而排序 码位数较少的情况, 使用链式基数排序较好。 基数排序需要增加 n+2radix 个附加链接指针。 基数排序需要增加 n+2radix 个附加链接指针。 基数排序是稳定的排序方法。 基数排序是稳定的排序方法。
53
各种排序方法的比较
54
Homework P327 Exercises 8.2 E1(d), E2, E3, E4, E5 P332 Exercises 8.3 E1(d), E2 P335 Exercises 8.4 E2 P338 Exercises 8.5 E3 P343 Exercises 8.6 E1, E2 P360 Exercises 8.8 E1, E3, E4, E8 P370 Exercises 8.9 E1(f)(h), E2(d) P396 Exercises 9.5 E2
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.