6-1 Chapter 6 Sorting
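6-2 Sorting
A file of size n is a sequence of n items. Each item in the file is called a record. Each record consists of a key and other fields.
Original file (key, other fields): Record 1 = (4, DDD), Record 2 = (2, BBB), Record 3 = (1, AAA), Record 4 = (5, EEE), Record 5 = (3, CCC)
Sorted file: (1, AAA), (2, BBB), (3, CCC), (4, DDD), (5, EEE)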
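6-3
Instead of moving the records themselves, a sort can rearrange a table of pointers to them; the file stays in its original order.
File: Record 1 = (4, DDD), Record 2 = (2, BBB), Record 3 = (1, AAA), Record 4 = (5, EEE), Record 5 = (3, CCC)
Original pointer table: Record 1, Record 2, Record 3, Record 4, Record 5
Sorted pointer table: Record 3, Record 2, Record 5, Record 1, Record 4 (keys 1, 2, 3, 4, 5)
It is easier to search for a particular element after sorting (e.g. by binary search).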
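The pointer-table idea can be sketched in C with the standard `qsort`: we sort an array of pointers, so only pointers move and the records keep their places. The record layout (`struct record` with an `int` key and a string standing in for "other fields") and the names `sort_pointer_table` and `cmp_record_ptr` are hypothetical, chosen for illustration only.

```c
#include <stdlib.h>   /* qsort, size_t */

/* Hypothetical record layout: an integer key plus other fields. */
struct record {
    int key;
    const char *other;   /* stands in for the "other fields" */
};

/* qsort hands us pointers to the table elements; each element is itself
   a pointer to a record, so a and b are really (struct record **). */
static int cmp_record_ptr(const void *a, const void *b)
{
    const struct record *ra = *(const struct record *const *)a;
    const struct record *rb = *(const struct record *const *)b;
    return (ra->key > rb->key) - (ra->key < rb->key);
}

/* Sort the pointer table by key; the records themselves never move. */
void sort_pointer_table(struct record *table[], size_t n)
{
    qsort(table, n, sizeof table[0], cmp_record_ptr);
}
```

With the file above stored as `file[0..4]` and `table[i] = &file[i]`, the sorted table points at `file[2], file[1], file[4], file[0], file[3]`, matching the sorted pointer table on the slide.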
6-4 Types of sorting
Internal sorting: the data are stored in main memory (there are more than 20 such algorithms).
External sorting: the data are stored in auxiliary storage.
Stable sorting: records with the same key keep the same relative order they had before sorting.
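Stability can be seen with a small sketch: an insertion sort on records that moves an element left only past strictly larger keys, so equal keys never overtake each other. The `struct item` type (a key plus a tag recording input order) and the name `stable_sort_by_key` are hypothetical.

```c
/* Hypothetical record type: a key plus a tag that records input order. */
struct item {
    int key;
    char tag;
};

/* Insertion sort by key. An element moves left only past elements with a
   strictly larger key, so records with equal keys keep their original
   relative order: the sort is stable. */
void stable_sort_by_key(struct item a[], int n)
{
    for (int k = 1; k < n; k++) {
        struct item y = a[k];
        int i;
        for (i = k - 1; i >= 0 && a[i].key > y.key; i--)
            a[i + 1] = a[i];
        a[i + 1] = y;
    }
}
```

Changing `a[i].key > y.key` to `>=` would still sort by key but would let equal keys jump over each other, which breaks stability.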
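6-5 Time and space efficiency
n | a = 0.01n² | b = 10n | a + b | (a + b)/n²
100 | 100 | 1,000 | 1,100 | 0.11
500 | 2,500 | 5,000 | 7,500 | 0.03
1,000 | 10,000 | 10,000 | 20,000 | 0.02
5,000 | 250,000 | 50,000 | 300,000 | 0.012
10,000 | 1,000,000 | 100,000 | 1,100,000 | 0.011
50,000 | 25,000,000 | 500,000 | 25,500,000 | 0.0102
100,000 | 100,000,000 | 1,000,000 | 101,000,000 | 0.0101
500,000 | 2,500,000,000 | 5,000,000 | 2,505,000,000 | 0.01002
As n grows, (a + b)/n² approaches 0.01: for large n the 0.01n² term dominates, so a + b grows like n².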
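The table's arithmetic can be reproduced with a short sketch; it recomputes each column from the two formulas, so the ratio column visibly approaches 0.01 as n grows. The function names are hypothetical.

```c
#include <stdio.h>   /* printf, size_t */

/* (a + b) / n^2 with a = 0.01 n^2 and b = 10 n, which equals 0.01 + 10/n. */
double ratio_to_n_squared(double n)
{
    double a = 0.01 * n * n;
    double b = 10.0 * n;
    return (a + b) / (n * n);
}

/* Print one row per value of n used in the table above. */
void print_efficiency_table(void)
{
    static const double ns[] = {100, 500, 1000, 5000, 10000,
                                50000, 100000, 500000};
    printf("%10s %15s %10s %15s %10s\n",
           "n", "a=0.01n^2", "b=10n", "a+b", "(a+b)/n^2");
    for (size_t i = 0; i < sizeof ns / sizeof ns[0]; i++) {
        double n = ns[i];
        double a = 0.01 * n * n, b = 10.0 * n;
        printf("%10.0f %15.0f %10.0f %15.0f %10.5f\n",
               n, a, b, a + b, ratio_to_n_squared(n));
    }
}
```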
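6-6 O notation
f(n) is O(g(n)) if there exist positive integers a and b such that $f(n) \le a \cdot g(n)$ for all $n \ge b$.
e.g. $4n^2 + 100n = O(n^2)$, because $4n^2 + 100n \le 5n^2$ for all $n \ge 100$ (since $100n \le n^2$ when $n \ge 100$).
$4n^2 + 100n = O(n^3)$, because $4n^2 + 100n \le 2n^3$ for all $n \ge 10$.
$f(n) = c_1 n^k + c_2 n^{k-1} + \dots + c_k n + c_{k+1} = O(n^{k+j})$ for any $j \ge 0$.
$f(n) = c = O(1)$, where c is a constant.
$\log_m n = \log_m k \cdot \log_k n$ for constants m and k, so $\log_m n = O(\log_k n) = O(\log n)$: the base of the logarithm does not matter inside O.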
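6-7 Time complexity
Polynomial order: $O(n^k)$ for some constant k.
Exponential order: $O(d^n)$ for some $d > 1$.
NP-complete (intractable) problems: no polynomial-time algorithm is known; all known algorithms require exponential time.
Sorting by comparisons: the best algorithms take $O(n \log n)$ time, and no comparison-based sort can do better in the worst case.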
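6-8 $n \log_{10} n$ and $n^2$
n | n log10 n | n²
1 × 10^1 | 1.0 × 10^1 | 1.0 × 10^2
5 × 10^1 | 8.5 × 10^1 | 2.5 × 10^3
1 × 10^2 | 2.0 × 10^2 | 1.0 × 10^4
5 × 10^2 | 1.3 × 10^3 | 2.5 × 10^5
1 × 10^3 | 3.0 × 10^3 | 1.0 × 10^6
5 × 10^3 | 1.8 × 10^4 | 2.5 × 10^7
1 × 10^4 | 4.0 × 10^4 | 1.0 × 10^8
5 × 10^4 | 2.3 × 10^5 | 2.5 × 10^9
1 × 10^5 | 5.0 × 10^5 | 1.0 × 10^10
5 × 10^5 | 2.8 × 10^6 | 2.5 × 10^11
1 × 10^6 | 6.0 × 10^6 | 1.0 × 10^12
5 × 10^6 | 3.3 × 10^7 | 2.5 × 10^13
1 × 10^7 | 7.0 × 10^7 | 1.0 × 10^14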
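6-9 Bubble sort
Compare each pair of adjacent items; if they are out of order, exchange them.
e.g. (sorting from largest to smallest): pass 1, pass 2, pass 3 [figure not preserved]
Largest-to-smallest order is called decreasing order, or nonincreasing order when equal keys may occur.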
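6-10 Time complexity of bubble sort
If no pair of adjacent items is exchanged during a pass, the data are already sorted and the algorithm can stop.
Best case: the data are already in order before sorting; only 1 pass (n-1 comparisons) is needed.
Worst case: n-1 passes are needed (n is the number of items).
The number of comparisons is at most $(n-1) + (n-2) + \dots + 1 = \frac{n(n-1)}{2} = O(n^2)$.
Time complexity: $O(n^2)$.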
6-11
#define TRUE  1
#define FALSE 0

/* sorts x[0..n-1] into nondecreasing order */
void bubble(int x[], int n)
{
    int hold, j, pass;
    int switched = TRUE;

    /* outer loop controls the number of passes */
    for (pass = 0; pass < n-1 && switched == TRUE; pass++) {
        switched = FALSE;       /* initially no interchanges have */
                                /* been made on this pass */
        /* inner loop governs each individual pass */
        for (j = 0; j < n-pass-1; j++)
            if (x[j] > x[j+1]) {
                /* elements out of order: */
                /* an interchange is necessary */
                switched = TRUE;
                hold = x[j];
                x[j] = x[j+1];
                x[j+1] = hold;
            } /* end if */
    } /* end for */
} /* end bubble */
6-12 Quicksort (partition exchange sort)
e.g. sorting from smallest to largest (nondecreasing order) [step-by-step partition figure not preserved]
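6-13 Quicksort
Method: the first item of each group is used as the pivot. Items smaller than the pivot are placed to its left and larger items to its right, so the pivot splits the group into two parts.
Worst case: the pivot is the largest or the smallest item every time (e.g. when the input is already sorted).
Number of comparisons: $(n-1) + (n-2) + \dots + 1 = \frac{n(n-1)}{2} = O(n^2)$.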
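6-14 Best case of quicksort
Best case: every partition splits the group into two parts of about the same size.
Level 0: 1 group of n items → n
Level 1: 2 groups of n/2 items → $\frac{n}{2} \times 2 = n$
Level 2: 4 groups of n/4 items → $\frac{n}{4} \times 4 = n$
...
There are about $\log_2 n$ levels, each costing about n comparisons in total.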
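6-15 Mathematical analysis of the best case
T(n): the time needed to sort n items.
$T(n) \le cn + 2T(n/2)$, for some constant c
$\le cn + 2\left(c \cdot \frac{n}{2} + 2T(n/4)\right) = 2cn + 4T(n/4)$
$\le \dots \le k\,cn + 2^k T(n/2^k)$ after k levels
$= cn \log_2 n + nT(1)$, taking $k = \log_2 n$ (so $2^k = n$)
$= O(n \log n)$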
6-16
void partition(int x[], int lb, int ub, int *pj)
{
    int a, down, temp, up;

    a = x[lb];          /* a is the element whose final position */
                        /* is sought */
    up = ub;
    down = lb;
    while (down < up) {
        while (x[down] <= a && down < ub)
            down++;     /* move up the array */
        while (x[up] > a)
            up--;       /* move down the array */
        if (down < up) {
            /* interchange x[down] and x[up] */
            temp = x[down];
            x[down] = x[up];
            x[up] = temp;
        } /* end if */
    } /* end while */
    x[lb] = x[up];
    x[up] = a;
    *pj = up;
} /* end partition */
6-17 Main routine (recursive quicksort; call quick(x, 0, n-1) to sort x[0..n-1]):
void quick(int x[], int lb, int ub)
{
    int j;

    if (lb >= ub)
        return;                 // array is sorted
    partition(x, lb, ub, &j);   // partition the elements of the
                                // subarray such that one of the
                                // elements (possibly x[lb]) is
                                // now at x[j] (j is an output
                                // parameter) and:
                                // 1. x[i] <= x[j] for lb <= i < j
                                // 2. x[i] >= x[j] for j < i <= ub
                                // x[j] is now at its final position
    quick(x, lb, j-1);          // recursively sort the subarray
                                // between positions lb and j-1
    quick(x, j+1, ub);          // recursively sort the subarray
                                // between positions j+1 and ub
} /* end quick */
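6-18 Selection sort
e.g. sorting from largest to smallest: pass 1, pass 2, pass 3, pass 4 [figure not preserved]
Method: on each pass, find the largest (or smallest) of the items not yet sorted, then exchange it into its final position.
Number of comparisons: $(n-1) + (n-2) + \dots + 1 = \frac{n(n-1)}{2} = O(n^2)$.
Time complexity: $O(n^2)$.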
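The slide gives no code for selection sort, so here is a minimal sketch that follows its example (largest to smallest): pass i finds the largest of x[i..n-1] and swaps it into position i. The name `selection_sort_desc` is hypothetical.

```c
/* Selection sort into nonincreasing order (largest first).
   Pass i scans x[i..n-1] for the largest item and swaps it into x[i];
   n-1 passes and n(n-1)/2 comparisons in total. */
void selection_sort_desc(int x[], int n)
{
    for (int i = 0; i < n - 1; i++) {
        int largest = i;                 /* index of largest seen so far */
        for (int j = i + 1; j < n; j++)
            if (x[j] > x[largest])
                largest = j;
        if (largest != i) {              /* exchange it into position i */
            int hold = x[i];
            x[i] = x[largest];
            x[largest] = hold;
        }
    }
}
```

Flipping the comparison to `x[j] < x[largest]` (and renaming the index) gives the smallest-first variant.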
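6-19 Binary tree sort
e.g. input data: 8, 2, 9, 5, 6
Build a binary search tree from the input [tree figure not preserved], then traverse it in inorder.
Output: 2, 5, 6, 8, 9
Worst case: the input is already sorted, e.g. 2, 5, 6, 8, 9, and the tree degenerates into a chain.
Number of comparisons: $1 + 2 + \dots + (n-1) = \frac{n(n-1)}{2} = O(n^2)$.
Best case: the tree is balanced, with depth $d \approx \log_2 n$; level i holds $2^i$ nodes, each needing about i comparisons to insert, so the time complexity is $\sum_{i=1}^{d} i \cdot 2^i = O(n \log n)$.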
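The two steps (build a binary search tree, then traverse it in inorder) can be sketched as follows. The node type and helper names are hypothetical, and sending equal keys to the right subtree is our own choice; it keeps equal keys in input order.

```c
#include <stdlib.h>   /* malloc, free, abort */

/* Hypothetical node of a binary search tree. */
struct node {
    int key;
    struct node *left, *right;
};

/* Insert key into the tree; equal keys go to the right subtree. */
static struct node *bst_insert(struct node *root, int key)
{
    if (root == NULL) {
        struct node *p = malloc(sizeof *p);
        if (p == NULL)
            abort();                     /* out of memory */
        p->key = key;
        p->left = p->right = NULL;
        return p;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else
        root->right = bst_insert(root->right, key);
    return root;
}

/* Inorder traversal: write keys to out[] in ascending order,
   freeing each node after both of its subtrees are done. */
static void inorder_and_free(struct node *root, int out[], int *pos)
{
    if (root == NULL)
        return;
    inorder_and_free(root->left, out, pos);
    out[(*pos)++] = root->key;
    inorder_and_free(root->right, out, pos);
    free(root);
}

/* Sort x[0..n-1] into nondecreasing order with a binary search tree. */
void tree_sort(int x[], int n)
{
    struct node *root = NULL;
    int pos = 0;
    for (int i = 0; i < n; i++)
        root = bst_insert(root, x[i]);
    inorder_and_free(root, x, &pos);
}
```

On already sorted input the recursion in `bst_insert` goes n levels deep, which is the chain-shaped worst case described above.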
6-20 Heapsort
e.g. input data: [not preserved]
Store the input data in an almost complete binary tree.
Step 1: Construct a heap [figure not preserved].
6-21
Step 2: Adjust the heap
(a) x[7] = largest value
(b) x[6] = second largest
6-22
(c) x[5] = third largest
(d) x[4] = fourth largest
(e) x[3] = fifth largest
(f) x[2] = sixth largest
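6-23
(g) x[1] = seventh largest
Final: x[0] through x[7] are in nondecreasing order [figure not preserved].
Heapsort should be implemented with an array, not with a linked binary tree: the children of node x[i] are x[2i+1] and x[2i+2], so no pointers are needed.
Time complexity: $O(n \log n)$, even in the worst case.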
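An array-based heapsort in the spirit of the slides (a max-heap in x[0..n-1], children of i at 2i+1 and 2i+2) might look like the sketch below. Note that it builds the heap bottom-up, which may differ from the construction shown in Step 1; the names `sift_down` and `heap_sort` are hypothetical.

```c
/* Sift x[i] down within the max-heap x[0..n-1].
   The children of node i are 2i+1 and 2i+2. */
static void sift_down(int x[], int i, int n)
{
    int y = x[i];
    int child;
    while ((child = 2 * i + 1) < n) {
        if (child + 1 < n && x[child + 1] > x[child])
            child++;                     /* pick the larger child */
        if (y >= x[child])
            break;                       /* heap property holds */
        x[i] = x[child];                 /* move the larger child up */
        i = child;
    }
    x[i] = y;
}

/* Sort x[0..n-1] into nondecreasing order. */
void heap_sort(int x[], int n)
{
    /* Step 1: construct a heap, bottom-up from the last internal node. */
    for (int i = n / 2 - 1; i >= 0; i--)
        sift_down(x, i, n);
    /* Step 2: swap the maximum to the end, then readjust the smaller heap.
       With n = 8 this fills x[7], x[6], ..., x[1] as in steps (a) to (g). */
    for (int last = n - 1; last > 0; last--) {
        int hold = x[0];
        x[0] = x[last];
        x[last] = hold;
        sift_down(x, 0, last);
    }
}
```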
6-24 Insertion sort
e.g. (sorting from largest to smallest): pass 1, pass 2, pass 3, pass 4 [figure not preserved]
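6-25 Insertion sort
Method: each new item is inserted into its proper position among the items already sorted. n-1 passes are needed.
Best case: the data are already in order before sorting; each pass needs only one comparison, so (n-1) comparisons in total.
Worst case: the data are in reverse order; the number of comparisons is $1 + 2 + \dots + (n-1) = \frac{n(n-1)}{2}$.
Time complexity: $O(n^2)$.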
6-26
void insertsort(int x[], int n)
{
    int i, k, y;

    // initially x[0] may be thought of as a sorted
    // file of one element. After each repetition of
    // the following loop, the elements x[0] through
    // x[k] are in order
    for (k = 1; k < n; k++) {
        /* Insert x[k] into the sorted file */
        y = x[k];
        /* Move down 1 position all elements greater */
        /* than y */
        for (i = k-1; i >= 0 && y < x[i]; i--)
            x[i+1] = x[i];
        /* Insert y at proper position */
        x[i+1] = y;
    } /* end for */
} /* end insertsort */
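6-27 Shell sort (diminishing increment sort)
Method: insertion sort compares adjacent items and then decides whether to exchange them. Shell sort instead applies "compare and exchange" to pairs of items that are d positions apart. In the earlier passes d may be any integer greater than 1, but on the last pass d must be 1.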
6-28 e.g. sorting from largest to smallest [figure not preserved]
pass 1: $d_1 = \lceil n/2 \rceil$
pass 2: $d_2 = \lceil d_1/2 \rceil = 3$
pass 3: $d_3 = \lceil d_2/2 \rceil = 2$
pass 4: $d_4 = \lceil d_3/2 \rceil = 1$
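6-29
Each pass performs several insertion sorts, one for each group of items that are d positions apart. If d = 1 from the start, Shell sort is exactly insertion sort.
Knuth showed that the increments $d_{i-1} = 3d_i + 1$ (…, 40, 13, 4, 1), i.e. $d_i = \lfloor d_{i-1}/3 \rfloor$, work well.
Time complexity: between $O(n(\log_2 n)^2)$ and $O(n^{3/2})$, depending on the increments.
Suitable for sorting a few hundred items.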
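Knuth's increments can be generated from the recurrence above. This sketch keeps every increment smaller than n (that cutoff is our assumption) and stores them largest first, ending in 1. The name `knuth_increments` is hypothetical; its output has the form expected by an increment array such as `incrmnts` in `shellsort`.

```c
/* Store Knuth's increments 1, 4, 13, 40, ... (d_{i-1} = 3 d_i + 1) that
   are smaller than n into incrmnts[], largest first, so the last pass
   uses d = 1. Returns how many were stored (at most max). */
int knuth_increments(int n, int incrmnts[], int max)
{
    int count = 0;
    for (int d = 1; d < n && count < max; d = 3 * d + 1)
        incrmnts[count++] = d;
    /* reverse so that the largest increment comes first */
    for (int i = 0, j = count - 1; i < j; i++, j--) {
        int t = incrmnts[i];
        incrmnts[i] = incrmnts[j];
        incrmnts[j] = t;
    }
    return count;
}
```

For n = 100 this yields 40, 13, 4, 1.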
6-30
/* incrmnts[0..numinc-1] holds the increments, largest first; */
/* the last increment must be 1 */
void shellsort(int x[], int n, int incrmnts[], int numinc)
{
    int incr, j, k, span, y;

    for (incr = 0; incr < numinc; incr++) {
        /* span is the size of the increment */
        span = incrmnts[incr];
        for (j = span; j < n; j++) {
            /* Insert element x[j] into its proper */
            /* position within its subfile */
            y = x[j];
            for (k = j-span; k >= 0 && y < x[k]; k -= span)
                x[k+span] = x[k];
            x[k+span] = y;
        } /* end for j */
    } /* end for incr */
} /* end shellsort */
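6-31 Address calculation sort (sorting by hashing)
e.g. sorting from smallest to largest; input data: [not preserved]
The data are divided into 10 subfiles. Each subfile is a linked list whose items are kept in increasing order. The hash function must preserve order, so that every key in one subfile is smaller than every key in the next.
Suppose there are n items and m subfiles.
Best case: $n/m \approx 1$ and the data are uniformly distributed; time complexity $O(n)$.
Worst case: $n/m \gg 1$, or the data are not uniformly distributed; time complexity $O(n^2)$.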
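A sketch of address calculation sort with m = 10 subfiles, each a sorted linked list. The slide's own hash function is not preserved, so the order-preserving hash `key * 10 / (maxkey + 1)` is our assumption, as is the restriction to nonnegative keys. The names `address_calc_sort`, `subfile_of`, and `struct lnode` are hypothetical.

```c
#include <stdlib.h>   /* malloc, free, abort */

#define NSUBFILES 10                     /* m = 10 subfiles */

struct lnode {
    int key;
    struct lnode *next;
};

/* Assumed order-preserving hash: maps keys in [0, maxkey] to 0..9 so that
   a smaller key never lands in a later subfile than a larger key. */
static int subfile_of(int key, int maxkey)
{
    return (int)((long long)key * NSUBFILES / ((long long)maxkey + 1));
}

/* Sort nonnegative keys x[0..n-1] into nondecreasing order. */
void address_calc_sort(int x[], int n)
{
    struct lnode *head[NSUBFILES] = {NULL};
    int maxkey = 0;
    for (int i = 0; i < n; i++)
        if (x[i] > maxkey)
            maxkey = x[i];

    for (int i = 0; i < n; i++) {
        struct lnode *p = malloc(sizeof *p);
        if (p == NULL)
            abort();
        p->key = x[i];
        /* walk the subfile's list and insert p, keeping the list sorted */
        struct lnode **link = &head[subfile_of(x[i], maxkey)];
        while (*link != NULL && (*link)->key <= x[i])
            link = &(*link)->next;
        p->next = *link;
        *link = p;
    }

    /* concatenate subfiles 0..9 back into x[], freeing the nodes */
    int pos = 0;
    for (int s = 0; s < NSUBFILES; s++) {
        struct lnode *p = head[s];
        while (p != NULL) {
            struct lnode *next = p->next;
            x[pos++] = p->key;
            free(p);
            p = next;
        }
    }
}
```

If the keys cluster into one subfile, each insertion walks a long list, which is the $O(n^2)$ worst case above.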
6-32 Two-way merge
Merge two sorted sequences into a single sorted sequence.
e.g. [figure not preserved]
If the two sorted lists have lengths m and n, the time complexity is $O(m+n)$.
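A two-way merge of two sorted arrays can be sketched as follows; every step copies exactly one element, which is where the $O(m+n)$ bound comes from. The name `merge2` is hypothetical.

```c
/* Merge sorted a[0..m-1] and sorted b[0..n-1] into c[0..m+n-1].
   Each step copies one element, so the total work is O(m + n).
   On equal keys the element from a is taken first (a stable merge). */
void merge2(const int a[], int m, const int b[], int n, int c[])
{
    int i = 0, j = 0, k = 0;
    while (i < m && j < n)
        c[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < m)                        /* copy what is left of a */
        c[k++] = a[i++];
    while (j < n)                        /* copy what is left of b */
        c[k++] = b[j++];
}
```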
6-33 Merge sort
e.g. (smallest to largest)
input: [25] [57] [48] [37] [12] [92] [86] [33]
pass 1: [25 57] [37 48] [12 92] [33 86]
pass 2: [25 37 48 57] [12 33 86 92]
pass 3: [12 25 33 37 48 57 86 92]
$\lceil \log_2 n \rceil$ passes are needed; time complexity: $O(n \log n)$.
It can be implemented by a recursive function.
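A recursive merge sort, as the slide suggests, might look like this sketch: split at the midpoint, sort both halves, then two-way merge them through a scratch array. For n = 8 it performs the same merges as the passes above, only in a different order. The names `merge_sort` and `merge_sort_rec` are hypothetical.

```c
#include <stdlib.h>   /* malloc, free, abort */
#include <string.h>   /* memcpy */

/* Recursively sort x[lo..hi-1], using tmp[] as scratch space. */
static void merge_sort_rec(int x[], int tmp[], int lo, int hi)
{
    if (hi - lo < 2)
        return;                          /* 0 or 1 element: already sorted */
    int mid = lo + (hi - lo) / 2;
    merge_sort_rec(x, tmp, lo, mid);
    merge_sort_rec(x, tmp, mid, hi);
    /* two-way merge of x[lo..mid-1] and x[mid..hi-1] into tmp[lo..hi-1] */
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (x[i] <= x[j]) ? x[i++] : x[j++];
    while (i < mid)
        tmp[k++] = x[i++];
    while (j < hi)
        tmp[k++] = x[j++];
    memcpy(x + lo, tmp + lo, (size_t)(hi - lo) * sizeof x[0]);
}

/* Sort x[0..n-1] into nondecreasing order. */
void merge_sort(int x[], int n)
{
    if (n < 2)
        return;
    int *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL)
        abort();
    merge_sort_rec(x, tmp, 0, n);
    free(tmp);
}
```

The scratch array is the price of merging in $O(n)$ per level; unlike heapsort, merge sort needs $O(n)$ extra memory.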
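6-34 Radix sort
e.g. sorting from smallest to largest
Pass 1 (distribute by units digit):
0) (empty)
1) 01, 31, 11, 21
2) 02
3) 13
4) (empty)
5) 05
6) 26, 16
7) 27
8) (empty)
9) 19, 09
Merge: 01, 31, 11, 21, 02, 13, 05, 26, 16, 27, 19, 09
Pass 2 (distribute by tens digit):
0) 01, 02, 05, 09
1) 11, 13, 16, 19
2) 21, 26, 27
3) 31
4) to 9) (empty)
Merge: 01, 02, 05, 09, 11, 13, 16, 19, 21, 26, 27, 31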
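6-35
Method:
(1) Items are never compared with one another; each item only determines which bucket it goes into.
(2) Pass 1 processes the units digit: an item whose units digit is 1 goes into bucket 1, and so on.
(3) Pass 2 processes the tens digit, and so on for each further digit.
Advantage: fast; time complexity $O(n \log_p k)$, where k is the largest input value, p is the base (radix), and $\log_p k$ is the number of digits.
Disadvantage: needs extra memory (linked lists can reduce the memory needed to a minimum, but at the cost of extra time).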
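The method can be sketched as a base-10 (p = 10) radix sort for nonnegative integers, units digit first. Instead of the linked-list buckets mentioned above, this sketch locates each bucket inside one scratch array by counting bucket sizes; that is a swapped-in technique, chosen here for simplicity. Filling the buckets from the back keeps each pass stable, which the later passes rely on. The name `radix_sort` is hypothetical.

```c
#include <stdlib.h>   /* malloc, free, abort */
#include <string.h>   /* memcpy */

/* Sort nonnegative x[0..n-1] into nondecreasing order, one decimal digit
   per pass (units digit first). Keys are never compared with each other. */
void radix_sort(int x[], int n)
{
    if (n < 2)
        return;
    int *out = malloc((size_t)n * sizeof *out);
    if (out == NULL)
        abort();
    int maxkey = x[0];
    for (int i = 1; i < n; i++)
        if (x[i] > maxkey)
            maxkey = x[i];

    /* one pass per digit of the largest key: log_10 k passes */
    for (long long place = 1; maxkey / place > 0; place *= 10) {
        int count[10] = {0};
        for (int i = 0; i < n; i++)      /* size of each bucket 0..9 */
            count[(x[i] / place) % 10]++;
        for (int d = 1; d < 10; d++)     /* end position of each bucket */
            count[d] += count[d - 1];
        for (int i = n - 1; i >= 0; i--) /* fill from the back: stable */
            out[--count[(x[i] / place) % 10]] = x[i];
        memcpy(x, out, (size_t)n * sizeof x[0]);  /* "merge" the buckets */
    }
    free(out);
}
```

Applied to the twelve two-digit values from the example, it makes two passes and produces 1, 2, 5, 9, 11, 13, 16, 19, 21, 26, 27, 31.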