Data Structures Chapter 7: Sorting 7-1. Sorting (1) List: a collection of records –Each record has one or more fields. –Key: used to distinguish among.

Slides:



Advertisements
Similar presentations
Homework - Chapter 03 Solution.
Advertisements

2010/12/12
Chapter 7 Sorting Part I. 7.1 Motivation list: a collection of records. keys: the fields used to distinguish among the records. One way to search for.
Sorting Chapter Sorting Consider list x 1, x 2, x 3, … x n We seek to arrange the elements of the list in order –Ascending or descending Some O(n.
Divide-and-Conquer. 什麼是 divide-and-conquer ? Divide 就是把問題分割 Conquer 則是把答案結合起來.
第七章 抽樣與抽樣分配 蒐集統計資料最常見的方式是抽查。這 牽涉到兩個問題: 抽出的樣本是否具有代表性?是否能反應出母體的特徵?
資料的搜尋 搜尋的基本概念 循序搜尋法(Sequential Search) 二元搜尋法(Binary Search)
: A-Sequence 星級 : ★★☆☆☆ 題組: Online-judge.uva.es PROBLEM SET Volume CIX 題號: Problem D : A-Sequence 解題者:薛祖淵 解題日期: 2006 年 2 月 21 日 題意:一開始先輸入一個.
:Word Morphing ★★☆☆☆ 題組: Problem Set Archive with Online Judge 題號: 10508:word morphing 解題者:楊家豪 解題日期: 2006 年 5 月 21 日 題意: 第一行給你兩個正整數, 第一個代表下面會出現幾個字串,
1 Q10276: Hanoi Tower Troubles Again! 星級 : ★★★ 題組: Online-judge.uva.es PROBLEM SET Volume CII 題號: Q10276: Hanoi Tower Troubles Again! 解題者:薛祖淵 解題日期: 2006.
Section 1.2 Describing Distributions with Numbers 用數字描述分配.
Instructor: Ching-Chi Lin 林清池 助理教授
指導教授:陳淑媛 學生:李宗叡 李卿輔.  利用下列三種方法 (Edge Detection 、 Local Binary Pattern 、 Structured Local Edge Pattern) 來判斷是否為場景變換,以方便使用者來 找出所要的片段。
五小專案 黃詩晴 章乃云. 目錄 計算機 智慧盤 拼圖 記憶大挑戰 數學題庫 心得 參考文獻.
Lecture 8 Median and Order Statistics. Median and Order Statistics2 Order Statistics 問題敘述 在 n 個元素中,找出其中第 i 小的元素。 i = 1 ,即為找最小值。 i = n ,即為找最大值。 i = 或 ,即為找中位數。
: ShellSort ★★☆☆☆ 題組: Problem D 題號: 10152: ShellSort 解題者:林一帆 解題日期: 2006 年 4 月 10 日 題意:烏龜王國的烏龜總是一隻一隻疊在一起。唯一改變烏龜位置 的方法為:一隻烏龜爬出他原來的位置,然後往上爬到最上方。給 你一堆烏龜原來排列的順序,以及我們想要的烏龜的排列順序,你.
STAT0_sampling Random Sampling  母體: Finite population & Infinity population  由一大小為 N 的有限母體中抽出一樣本數為 n 的樣 本,若每一樣本被抽出的機率是一樣的,這樣本稱 為隨機樣本 (random sample)
: Matrix Decompressing ★★★★☆ 題組: Contest Volumes with Online Judge 題號: 11082: Matrix Decompressing 解題者:蔡權昱、劉洙愷 解題日期: 2008 年 4 月 18 日 題意:假設有一矩陣 R*C,
JAVA 程式設計與資料結構 第十四章 Linked List. Introduction Linked List 的結構就是將物件排成一列, 有點像是 Array ,但是我們卻無法直接經 由 index 得到其中的物件 在 Linked List 中,每一個點我們稱之為 node ,第一個 node.
期中考參考解答 Date: 2005/12/14 Multimedia Information Systems.
8.1 何謂高度平衡二元搜尋樹 8.2 高度平衡二元搜尋樹的加入 8.3 高度平衡二元搜尋樹的刪除
: The Playboy Chimp ★★☆☆☆ 題組: Problem Set Archive with Online Judge 題號: 10611: The Playboy Chimp 解題者:蔡昇宇 解題日期: 2010 年 2 月 28 日 題意:給一已排序的數列 S( 升冪.
Monte Carlo Simulation Part.2 Metropolis Algorithm Dept. Phys. Tunghai Univ. Numerical Methods C. T. Shih.
Greedy Algorithms. 2 Greedy Methods ( 描述 1) * 解最佳化問題的演算法, 其解題過程可看成是由一 連串的決策步驟所組成, 而每一步驟都有一組選擇 要選定. * 一個 greedy method 在每一決策步驟總是選定那目 前看來最好 的選擇. *Greedy.
Introduction to Java Programming Lecture 17 Abstract Classes & Interfaces.
:Problem D: Bit-wise Sequence ★★★☆☆ 題組: Problem Set Archive with Online Judge 題號: 10232: Problem D: Bit-wise Sequence 解題者:李濟宇 解題日期: 2006 年 4 月 16.
: The largest Clique ★★★★☆ 題組: Contest Archive with Online Judge 題號: 11324: The largest Clique 解題者:李重儀 解題日期: 2008 年 11 月 24 日 題意: 簡單來說,給你一個 directed.
計算機概論 - 排序 1 排序 (Sorting) 李明山 編撰 ※手動換頁.
Chapter 2 Getting Started Insertion Sort: 能有效率地排序小數字的演算法 範例 :
: Multisets and Sequences ★★★★☆ 題組: Problem Set Archive with Online Judge 題號: 11023: Multisets and Sequences 解題者:葉貫中 解題日期: 2007 年 4 月 24 日 題意:在這個題目中,我們要定義.
Department of Computer Eng. & IT Amirkabir University of Technology (Tehran Polytechnic) Data Structures Lecturer: Abbas Sarraf Internal.
:Nuts for nuts..Nuts for nuts.. ★★★★☆ 題組: Problem Set Archive with Online Judge 題號: 10944:Nuts for nuts.. 解題者:楊家豪 解題日期: 2006 年 2 月 題意: 給定兩個正整數 x,y.
6 -1 Chapter 6 Sorting Sorting A file of size n is a sequence of n items. Each item in the file is called a record. KeyOther fields Record 14DDD.
資料結構實習-一 參數傳遞.
第六章 陣列.
Lecture 7 Sorting in Linear Time. Sorting in Linear Time2 7.1 Lower bounds for sorting 本節探討排序所耗用的時間複雜度下限。 任何一個以比較為基礎排序的演算法,排序 n 個元 素時至少耗用 Ω(nlogn) 次比較。
1 Introduction to Java Programming Lecture 2: Basics of Java Programming Spring 2008.
公用品.  該物品的數量不會因一人的消費而受到 影響,它可以同時地被多人享用。 角色分配  兩位同學當我的助手,負責:  其餘各人是投資者,每人擁有 $100 , 可以投資在兩種資產上。  記錄  計算  協助同學討論.
: Beautiful Numbers ★★★★☆ 題組: Problem Set Archive with Online Judge 題號: 11472: Beautiful Numbers 解題者:邱經達 解題日期: 2011 年 5 月 5 日 題意: 若一個 N 進位的數用到該.
Section 4.2 Probability Models 機率模式. 由實驗看機率 實驗前先列出所有可能的實驗結果。 – 擲銅板:正面或反面。 – 擲骰子: 1~6 點。 – 擲骰子兩顆: (1,1),(1,2),(1,3),… 等 36 種。 決定每一個可能的實驗結果發生機率。 – 實驗後所有的實驗結果整理得到。
Teacher : Ing-Jer Huang TA : Chien-Hung Chen 2015/6/25 Course Embedded Systems : Principles and Implementations Weekly Preview Question CH 2.4~CH 2.6 &
JAVA 程式設計與資料結構 第二十章 Searching. Sequential Searching Sequential Searching 是最簡單的一種搜尋法,此演 算法可應用在 Array 或是 Linked List 此等資料結構。 Sequential Searching 的 worst-case.
資料結構實習-二.
演算法 8-1 最大數及最小數找法 8-2 排序 8-3 二元搜尋法.
Review of Chapter 9 張啟中. Inheritance Hierarchy Min PQ Max PQ Min Heap Mergeable Min PQ DeapMin-Max Min-LeftistMin-SkewMinFHeap MinBHeap DEPQMax Heap Symmetric.
: Flip Sort ★★☆☆☆ 題組: Problem Set Archive with Online Judge 題號: 10327: Flip Sort 解題者:歐子揚 解題日期: 2010 年 2 月 26 日 題意:在這個問題中使用一種排序方式 (Flip) ,意思就是 只能交換相鄰的.
845: Gas Station Numbers ★★★ 題組: Problem Set Archive with Online Judge 題號: 845: Gas Station Numbers. 解題者:張維珊 解題日期: 2006 年 2 月 題意: 將輸入的數字,經過重新排列組合或旋轉數字,得到比原先的數字大,
Chapter 10 m-way 搜尋樹與B-Tree
1 Introduction to Java Programming Lecture 2: Basics of Java Programming Spring 2009.
2005/7 Linear system-1 The Linear Equation System and Eliminations.
INFORMATION RETRIEVAL AND EXTRACTION 作業: Program 1 第十四組 組員:林永峰、洪承雄、謝宗憲.
: Problem E Antimatter Ray Clearcutting ★★★★☆ 題組: Problem Set Archive with Online Judge 題號: 11008: Problem E Antimatter Ray Clearcutting 解題者:林王智瑞.
資料結構實習-六.
1 Introduction to Java Programming Lecture 3 Mathematical Operators Spring 2008.
1 Introduction to Java Programming Lecture 2: Basics of Java Programming Spring 2010.
:Count the Trees ★★★☆☆ 題組: Problem Set Archive with Online Judge 題號: 10007:Count the Trees 解題者:楊家豪 解題日期: 2006 年 3 月 題意: 給 n 個點, 每一個點有自己的 Label,
: Finding Paths in Grid ★★★★☆ 題組: Contest Archive with Online Judge 題號: 11486: Finding Paths in Grid 解題者:李重儀 解題日期: 2008 年 10 月 14 日 題意:給一個 7 個 column.
:Problem E.Stone Game ★★★☆☆ 題組: Problem Set Archive with Online Judge 題號: 10165: Problem E.Stone Game 解題者:李濟宇 解題日期: 2006 年 3 月 26 日 題意: Jack 與 Jim.
第 1 章 PC 的基本構造. 本章提要 PC 系統簡介 80x86 系列 CPU 及其暫存器群 記憶體: Memory 80x86 的分節式記憶體管理 80x86 的 I/O 結構 學習組合語言的基本工具.
: How many 0's? ★★★☆☆ 題組: Problem Set Archive with Online Judge 題號: 11038: How many 0’s? 解題者:楊鵬宇 解題日期: 2007 年 5 月 15 日 題意:寫下題目給的 m 與 n(m
Data Structures Chapter 1: Basic Concepts 1-1. bit: basic unit of information. data type: interpretation of a bit pattern. e.g integer 65 ASCII.
單元7: Heap 定義 for priority queues Leftist trees Binomial heap
Data Structures (1st Exam). 1.[5] Suppose there are only two constructors for a class, one that passes in a single integer parameter called amount, for.
The 2-Way Merge Problem 學生:曾羽銘.
Sorting Chapter Sorting Consider list x 1, x 2, x 3, … x n We seek to arrange the elements of the list in order –Ascending or descending Some O(n.
Review 1 Selection Sort Selection Sort Algorithm Time Complexity Best case Average case Worst case Examples.
Data Structures -3 rd test- 2015/06/01 授課教授:李錫智. Question 1 Suppose we use an array to implement a sorted list. Please answer the following questions.
Data Structures -3 st exam- 授課教師 : 李錫智 教授. 1. [5] Assume we have a binary tree which is implemented in a pointer-based scheme. Describe how to know the.
Advanced Sorting 7 2  9 4   2   4   7
Presentation transcript:

Data Structures Chapter 7: Sorting 7-1

Sorting (1) List: a collection of records –Each record has one or more fields. –Key: used to distinguish among the records. KeyOther fields Record 14DDD Record 22BBB Record 31AAA Record 45EEE Record 53CCC KeyOther fields 1AAA 2BBB 3CCC 4DDD 5EEE original list sorted list 7-2

Sorting (2) It is easier to search a particular element after sorting. (e.g. binary search) best sorting algorithm with comparisons: O(nlogn) 7-3 Original pointer list 4DDD 2BBB 1AAA 5EEE 3CCC Record 1 Record 2 Record 3 Record 4 Record 5 File Sorted pointer list

Sequential Search Applied to an array or a linked list. Data are not sorted. Example: – search 6: successful search – search 4: unsuccessful search # of comparisons for a successful search on record key i is i. Time complexity –successful search: (n+1)/2 comparisons = O(n) –unsuccessful search: n comparisons = O(n) 7-4

Code for Sequential Search template int SeqSearch (E *a, const int n, const K& k) {// K: key, E: node element // Search a[1:n] from left to right. Return least i such that // the key of a[i] equals k. If there is no i, return 0. int i; for (i = 1 ; i <= n && a[i] != k ; i++ ); if (i > n) return 0; return i; } 7-5

Motivation of Sorting A binary search needs O(log n) time to search a key in a sorted list with n records. Verification problem: To check if two lists are equal. – – Sequential searching method: O(mn) time, where m and n are the lengths of the two lists. Compare after sort: O(max{nlogn, mlogm}) –After sort: and –Then, compare one by one. 7-6

Categories of Sorting Methods stable sorting : the records with the same key have the same relative order as they have before sorting. –Example: Before sorting a b –After sorting a 7 b 9 internal sorting: All data are stored in main memory (more than 20 algorithms). external sorting: Some of data are stored in auxiliary storage. 7-7

Insertion Sort 7-8 e.g. ( 由小而大 sort , nondecreasing order ) pass 1 pass 2 pass 3 pass 4

Insertion Sort 方法 : 每次處理一個新的資料時,由右而左 insert 至其適當的位置才停止。 需要 n-1 個 pass best case: 未 sort 前,已按順序排好。每個 pass 僅需一次比較, 共需 (n-1) 次比較 worst case: 未 sort 前, 按相反順序排好。比 較次數為: Time complexity: O(n 2 ) 7-9

Insertion into a Sorted List template void Insert(const T& e, T* a, int i) // Insert e into the nondecreasing sequence a[1], …, a[i] such that the resulting sequence is also ordered. Array a must have space allocated for at least i+2 elements { a[0] = e; // Avoid a test for end of list (i<1) while (e < a[i]) { a[i+1] = a[i]; i--; } a[i+1] = e; } 7-10

Insertion Sort template void InsertionSort(T* a, const int n) // Sort a[1:n] into nondecreasing order { for (int j = 2; j <= n; j++) { T temp = a[j]; Insert(list[j], a, j-1); } 7-11

Quick Sort Input: 26, 5, 37, 1, 61, 11, 59, 15, 48, 19 R1R1 R2R2 R3R3 R4R4 R5R5 R6R6 R7R7 R8R8 R9R9 R 10 LeftRight [ ]110 [ ]110 [ ]110 [ ]26[ ]15 [15]11[1915]26[ ] [1915]26[ ] [ ] [4837]59[61] [61]

Quick Sort Quick sort 方法: 以每組的第一個資料為 基準 (pivot) ,把比它小的資料放在左邊, 比它大的資料放在右邊,之後以 pivot 中心, 將這組資料分成兩部份。然後,兩部分資 料各自 recursively 執行相同方法。 平均而言, Quick sort 有很好效能。 7-13

Code for Quick Sort void QuickSort(Element* a, const int left, const int right) // Sort a[left:right] into nondecreasing order. // Key pivot = a[left]. // i and j are used to partition the subarray so that // at any time a[m] = pivot, m > j. // It is assumed that a[left] <= a[right+1]. { if (left < right) { int i = left, j = right + 1, pivot = a[left]; do { do i++; while (a[i] < pivot); do j--; while (a[j] > pivot); if (i<j) swap(a[i], a[j]); } while (i < j); swap(a[left], a[j]); QuickSort(a, left, j–1); QuickSort(a, j+1, right); } 7-14

Time Complexity of Quick Sort Worst case time complexity: 每次的基準恰為最大, 或最小。所需比較次數: Best case time complexity : O(nlogn) – 每次分割 (partition) 時, 都分成大約相同數量的 兩部份 。 log 2 n n ×2 = n n 2 ×4 = n n 4

Mathematical Analysis of Best Case T(n): Time required for sorting n data elements. T(1) = b, for some constant b T(n) ≤ cn + 2T(n/2), for some constant c ≤ cn + 2(cn/2 +2T(n/4)) ≤ 2cn + 4T(n/4) : ≤ cn log 2 n + T(1) = O(n log n) 7-16

Variations of Quick Sort Quick sort using a median of three: –Pick the median of the first, middle, and last keys in the current sublist as the pivot. Thus, pivot = median{K l, K (l+n)/2, K n }. Use the selection algorithm to get the real median element. –Time complexity of the selection algorithm: O(n). 7-17

Two-way Merge Merge two sorted sequences into a single one. 設兩個 sorted lists 長度各為 m, n Time complexity: O(m+n) 7-18 [ ][ ] merge [ ]

Iterative Merge Sort

Iterative Merge Sort template void MergeSort(T* a, const int n) // Sort a[1:n] into nondecreasing order. { T* tempList = new T[n+1]; // l: length of the sublist currently being merged. for (int l = 1; l < n; l *= 2) { MergePass(a, tempList, n, l); l *= 2; MergePass(tempList, a, n, l); //interchange role of a and tempList } delete[ ] tempList; // free an array } 7-20

Code for Merge Pass template void MergePass(T *initList, T *resultList, const int n, const int s) {// Adjacent pairs of sublists of size s are merged from // initList to resultList. n: # of records in initList. for (int i = 1; // i is first position in first of the sublists being merged i <= n-2*s+1; // enough elements for two sublists of length s? i+ = 2*s) Merge(initList, resultList, i, i+s-1, i+2*s-1); // merge [i:i+s-1] and [i+s:i+2*s-1] // merge remaining list of size < 2 * s if ((i + s -1) < n ) Merge(initList, resultList, i, i + s -1, n); else copy(initList + i, initList + n + 1, resultList + i); } 7-21

Analysis of Merge Sort Merge sort is a stable sorting method. Time complexity: O(n log n) – passes are needed. –Each pass takes O(n) time. 7-22

Recursive Merge Sort Divides the list to be sorted into two roughly equal parts: –left sublist [left : (left+right)/2] –right sublist [(left+right)/2 +1 : right] These two sublists are sorted recursively. Then, the two sorted sublists are merged. To eliminate the record copying, we associate an integer pointer (instead of real link) with each record. 7-23

Recursive Merge Sort

Code for Recursive Merge Sort template int rMergeSort(T* a, int* link, const int left, const int right) {// a[left:right] is to be sorted. link[i] is initially 0 for all i. // rMergSort returns the index of the first element in the sorted chain. if (left >= right) return left; int mid = (left + right) /2; return ListMerge(a, link, rMergeSort(a, link, left, mid), // sort left half rMergeSort(a, link, mid + 1, right)); // sort right half } 7-25

Merging Sorted Chains tamplate int ListMerge(T* a, int* link, const int start1, const int start2) {// The Sorted chains beginning at start1 and start2, respectively, are merged. // link[0] is used as a temporary header. Return start of merged chain. int iResult = 0; // last record of result chain for (int i1 = start1, i2 =start2; i1 && i2; ) if (a[i1] <= a[i2]) { link[iResult] = i1; iResult = i1; i1 = link[i1]; } else { link[iResult] = i2; iResult = i2; i2 =link[i2]; } // attach remaining records to result chain if (i1 = = 0) link[iResult] = i2; else link[iResult] = i1; return link[0]; } 7-26

Heap Sort (1) [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] (a) Input array 7-27 Phase 1: Construct a max heap [1] [2] [3] [5] [6] [7] [8] [9] [10] Step 1Step 2 [4] (b) After Adjusting 61, 1

Heap Sort (2) [2] [3] [4] [5] [6] [7] [8] [9] [10] [1] (d) Max heap after constructing [1] [2] [3] [5] [6] [7] [8] [9] [10] Step 4 [4] Step 3 19 (c) After Adjusting 77, 5

Heap Sort (3) [2] [3] [4] [5] [6] [7] [8] [9] [1] Heap size = 9 a[10] = [ 2] [ 3] [5] [6] [7] [8] [1] Heap size = 8 a[9]=61, a[10]= Phase 2: Output and adjust the heap. Time complexity: O(nlogn)

Adjusting a Max Heap template void Adjust(T *a, const int root, const int n) {// Adjust binary tree with root root to satisfy heap property. The left and right // subtrees of root already satisfy the heap property. No node index is > n. T e = a[root]; // find proper place for e for (int j =2*root; j <= n; j *=2) { if (j < n && a[j] < a[j+1]) j++; // j is max child of its parent if (e >= a[j]) break; // e may be inserted as parent of j a[j / 2] = a[j]; // move jth record up the tree } a[j / 2] = e; } 7-30

Heap Sort template void HeapSort(T *a, const int n) {// Sort a[1:n] into nondecreasing order for (int i = n/2; i >= 1; i--) // heapify Adjust(a, i, n); for (i = n-1; i >= 1; i--) // sort { swap(a[1], a[i+1]); // swap first and last of current heap Adjust(a, 1, i); // heapify } 7-31

Radix Sort 基數排序 : Pass 1 (nondecreasing) a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] e[0]e[1]e[2]e[3]e[4]e[5]e[6]e[7]e[8]e[9] f[0]f[1]f[2]f[3]f[4]f[5]f[6]f[7]f[8]f[9] a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] 7-32

Radix Sort: Pass 2 e[0]e[1]e[2]e[3]e[4]e[5]e[6]e[7]e[8]e[9] f[0]f[1]f[2]f[3]f[4]f[5]f[6]f[7]f[8]f[9] a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] 7-33

Radix Sort: Pass 3 e[0]e[1]e[2]e[3]e[4]e[5]e[6]e[7]e[8]e[9] f[0]f[1]f[2]f[3]f[4]f[5]f[6]f[7]f[8]f[9] a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10]

Radix Sort 方法: least significant digit first (LSD) (1) 每個資料不與其它資料比較,只看自己放 在何處 (2)pass 1 :從個位數開始處理。若是個位數 為 1 ,則放在 bucket 1 ,以此類推 … (3)pass 2 :處理十位數, pass 3 :處理百位數.. 好處:若以 array 處理,速度快 Time complexity: O((n+r)log r k) –k: input data 之最大數 –r: 以 r 為基數 (radix) , log r k: 位數之長度 缺點 : 若以 array 處理需要較多記憶體。使用 linked list ,可減少所需記憶體,但會增加時間 7-35

List Sort All sorting methods require excessive data movement. The physical data movement tends to slow down the sorting process. When sorting lists with large records, we have to modify the sorting methods to minimize the data movement. Methods such as insertion sort or merge sort can be modified to work with a linked list rather than a sequential list. Instead of physically moving the record, we change its additional link field to reflect the change in the position of the record in the list. After sorting, we may physically rearrange the records in place. 7-36

Rearranging Sorted Linked List (1) iR1R1 R2R2 R3R3 R4R4 R5R5 R6R6 R7R7 R8R8 R9R9 R 10 key linka iR1R1 R2R2 R3R3 R4R4 R5R5 R6R6 R7R7 R8R8 R9R9 R 10 key linka linkb Sorted linked list, first=4 Add backward links to become a doubly linked list, first=4

Rearranging Sorted Linked List (2) iR1R1 R2R2 R3R3 R4R4 R5R5 R6R6 R7R7 R8R8 R9R9 R 10 key linka linkb iR1R1 R2R2 R3R3 R4R4 R5R5 R6R6 R7R7 R8R8 R9R9 R 10 key linka linkb R 1 is in place. first=2 R 1, R 2 are in place. first=6

Rearranging Sorted Linked List (3) iR1R1 R2R2 R3R3 R4R4 R5R5 R6R6 R7R7 R8R8 R9R9 R 10 key linka linkb iR1R1 R2R2 R3R3 R4R4 R5R5 R6R6 R7R7 R8R8 R9R9 R 10 key linkb linkb R 1, R 2, R 3 are in place. first=8 R 1, R 2, R 3, R 4 are in place. first=10

Rearranging Records Using A Doubly Linked List template void List1(T* a, int *lnka, const int n, int first) { int *linkb = new int[n]; // backward links int prev = 0; for (int current = first; current; current = linka[current]) { // convert chain into a doubly linked list linkb[current] = prev; prev = current; } for (int i = 1; i < n; i++) {// move a[first] to position i if (first != i) { if (linka[i]) linkb[linka[i]] = first; linka[linkb[i]] = first; swap(a[first], a[i]); swap(linka[first], linka[i]); swap(linkb[first], linkb[i]); } first = linka[i]; } 7-40

Table Sort The list-sort technique is not well suited for quick sort and heap sort. One can maintain an auxiliary table, t, with one entry per record, an indirect reference to the record. Initially, t[i] = i. When a swap are required, only the table entries are exchanged. After sorting, the list a[t[1]], a[t[2]], a[t[3]]…are sorted. Table sort is suitable for all sorting methods. 7-41

Permutation Cycle After sorting: Permutation [ ] Every permutation is made up of disjoint permutation cycles: –(1, 3, 8, 6) nontrivial cycle R1 now is in position 3, R3 in position 8, R8 in position 6, R6 in position 1. –(4, 5, 7) nontrivial cycle –(2) trivial cycle 7-42 R1R1 R2R2 R3R3 R4R4 R5R5 R6R6 R7R7 R8R8 key t

Table Sort Example R1R1 R2R2 R3R3 R4R4 R5R5 R6R6 R7R7 R8R8 key t key t key t Initial configuration after rearrangement of first cycle after rearrangement of second cycle

Code for Table Sort template void Table(T* a, const int n, int *t) { for (int i = 1; i < n; i++) { if (t[i] != i) {// nontrivial cycle starting at i T p = a[i]; int j = i; do { int k = t[j]; a[j] = a[k]; t[j] = j; j = k; } while (t[j] != i) a[j] = p; // j is the position for record p t[j] = j; } 7-44

Summary of Internal Sorting No one method is best under all circumstances. –Insertion sort is good when the list is already partially ordered. And it is the best for small n. –Merge sort has the best worst-case behavior but needs more storage than heap sort. –Quick sort has the best average behavior, but its worst-case behavior is O(n 2 ). –The behavior of radix sort depends on the size of the keys and the choice of r. 7-45

Complexity Comparison of Sort Methods MethodWorstAverage Insertion Sortn2n2 n2n2 Heap Sortn log n Merge Sortn log n Quick Sortn2n2 n log n Radix Sort(n+r)log r k 46 k: input data 之最大數 r: 以 r 為基數 (radix)

Average Execution Time Average execution time, n = # of elements, t=milliseconds 7-47 n t

External Sorting The lists to be sorted are too large to be contained totally in the internal memory. So internal sorting is impossible. The list (or file) to be sorted resides on a disk. Block: unit of data read from or written to a disk at one time. A block generally consists of several records. read/write time of disks: –seek time 搜尋時間:把讀寫頭移到正確磁軌 (track, cylinder) –latency time 延遲時間:把正確的磁區 (sector) 轉到讀寫頭下 –transmission time 傳輸時間:把資料區塊傳入 / 讀出磁碟 7-48

Merge Sort as External Sorting The most popular method for sorting on external storage devices is merge sort. Phase 1: Obtain sorted runs (segments) by internal sorting methods, such as heap sort, merge sort, quick sort or radix sort. These sorted runs are stored in external storage. Phase 2: Merge the sorted runs into one run with the merge sort method. 7-49

Merging the Sorted Runs run1 run2 run3 run4 run5 run6 7-50

Optimal Merging of Runs weighted external path length = 2*3 + 4*3 + 5*2 + 15*1 = 43 weighted external path length = 2*2 + 4*2 + 5*2 + 15*2 = In the external merge sort, the sorted runs may have different lengths. If shorter runs are merged first, the required time is reduced.

Huffman Algorithm External path length: sum of the distances of all external nodes from the root. Weighted external path length: Huffman algorithm: to solve the problem of finding a binary tree with minimum weighted external path length. Huffman tree: –Solve the 2-way merging problem –Generate Huffman codes for data compression 7-52

Construction of Huffman Tree (a) [2,3,5,7,9,13] (b) [5,5,7,9,13] (c) [7, 9, 10,13] (d) [10,13,16] (e) [16, 23] 7-53 Min heap is used. Time: O(n log n)

Huffman Code (1) Each symbol is encoded by 2 bits (fixed length) symbol code A 00 B 01 C 10 D 11 Message A B A C C D A would be encoded by 14 bits:

Huffman Code (2) Huffman codes (variable-length codes) symbol code A 0 B 110 C 10 D 111 Message A B A C C D A would be encoded by 13 bits: A frequently used symbol is encoded by a short bit string. 7-55

Huffman Tree 7-56 ACBD,7 B,1 BD,2 CBD,4 A,3 C,2 D,

IHFBDEGCA,91 IHFBD, 38EGCA, 53 B, 6HF, 5 D, 12HFB, 11 HFBD, 23I, 15 F, 4H, 1 E, 25GCA, 28 G, 6 A, 15GC, 13 C, 7 Sym Freq Code Sym Freq Code Sym Freq Code A D G B E H C F I A H E A D encode decode