6.3.1 Heapsort


Heapsort
Idea: two phases:
1. Construction of the heap
2. Output of the heap
For ordering numbers in an ascending sequence: use a heap with reverse order, i.e. the maximum number should be at the root (not the minimum).
Heapsort is an in-situ procedure (it sorts in place).

2 Remembering heaps: change the definition
Heap with reverse order:
For each node x and each descendant y of x the following holds: m(x) ≥ m(y);
left-complete, which means the levels are filled starting from the root and each level from left to right;
implementation in an array, where the nodes are stored in this order (level by level, from left to right).
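As a small illustration (a Python sketch, not from the slides), the array layout makes parent/child navigation pure index arithmetic; with the 1-based convention of the slides, the children of node j sit at indices 2j and 2j+1:

# Max-heap ("reverse order") stored in a 1-based array; a[0] is left unused
# so that the index formulas of the slides carry over directly.
def parent(j): return j // 2
def left(j):   return 2 * j
def right(j):  return 2 * j + 1

def is_max_heap(a, n):
    """Check m(x) >= m(y) for every node x and child y, i.e. a[j // 2] >= a[j]."""
    return all(a[j // 2] >= a[j] for j in range(2, n + 1))

# Example: a = [None, 99, 79, 78, 64, 17, 3]; is_max_heap(a, 6) -> True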

3 Second phase
2. Output of the heap: n times, take the maximum (at the root, deletemax) and exchange it with the element at the end of the heap. ⇒ The heap shrinks by one element and the subsequence of ordered elements at the end of the array grows by one element.
Cost: O(n log n).
[Figure: the array is split into a shrinking heap prefix and a growing suffix of ordered elements.]

4 First phase
1. Construction of the heap:
Simple method: n times insert. Cost: O(n log n).
Making it better: consider the array a[1 … n] as an already left-complete binary tree and let the elements sink in the following sequence: a[n div 2], …, a[2], a[1]. (The elements a[n], …, a[n div 2 + 1] are already at the leaves.)
[Figure: the second half of the array forms the leaves of the heap.]

5 Formally: heap segment
An array segment a[i..k] (1 ≤ i ≤ k ≤ n) is said to be a heap segment when the following holds for all j in {i, …, k}:
m(a[j]) ≥ m(a[2j]) if 2j ≤ k, and
m(a[j]) ≥ m(a[2j+1]) if 2j+1 ≤ k.
If a[i+1..n] is already a heap segment, we can convert a[i..n] into a heap segment by letting a[i] sink.
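A minimal Python sketch (not from the slides) tying slides 3–5 together, using the 1-based array convention (a[0] unused): sink lets a[i] sink until the segment rooted at i is a heap segment, phase 1 sinks a[n div 2], …, a[1], and phase 2 repeatedly exchanges the maximum with the last heap element:

def sink(a, i, k):
    """Let a[i] sink so that the segment a[i..k] becomes a heap segment."""
    while 2 * i <= k:
        j = 2 * i                       # left child
        if j + 1 <= k and a[j + 1] > a[j]:
            j = j + 1                   # take the larger of the two children
        if a[i] >= a[j]:
            return                      # heap condition already holds
        a[i], a[j] = a[j], a[i]
        i = j

def heapsort(a, n):
    # Phase 1: construction -- sink a[n div 2], ..., a[2], a[1].
    for i in range(n // 2, 0, -1):
        sink(a, i, n)
    # Phase 2: output -- n times, swap the maximum a[1] with the last heap element.
    for k in range(n, 1, -1):
        a[1], a[k] = a[k], a[1]         # the ordered suffix grows by one element
        sink(a, 1, k - 1)               # the heap shrinks by one element

data = [None, 64, 17, 3, 99, 79, 78, 19, 13]   # a[0] unused
heapsort(data, 8)
# data[1:] is now [3, 13, 17, 19, 64, 78, 79, 99]

Phase 1 costs only O(n), as computed on the next slide; phase 2 costs O(n log n).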

6 Cost calculation
Let k = ⌊log n⌋ + 1 (the height of the complete portion of the heap).
Cost for sinking an element at level j (counted from the root): k − j.
Altogether: Σ_{j=0,…,k} (k−j)·2^j = 2^k · Σ_{i=0,…,k} i/2^i ≤ 2·2^k = O(n).
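Spelled out (a standard derivation, not on the slide), with the substitution i = k − j:

\[
\sum_{j=0}^{k} (k-j)\,2^{j}
  = \sum_{i=0}^{k} i\,2^{k-i}
  = 2^{k}\sum_{i=0}^{k}\frac{i}{2^{i}}
  \le 2^{k}\sum_{i\ge 0}\frac{i}{2^{i}}
  = 2\cdot 2^{k}
  = O(n),
\]

using \(\sum_{i\ge 0} i\,x^{i} = x/(1-x)^{2}\), which at \(x = 1/2\) evaluates to 2, and \(2^{k} = O(n)\) for a heap with n nodes.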

7 Advantage
The new construction strategy is more efficient!
Usage: when only the m biggest elements are required:
1. construction in O(n) steps,
2. output of the m biggest elements in O(m log n) steps.
Total cost: O(n + m log n).

8 Addendum: Sorting with search trees
Algorithm:
1. Construction of a search tree (e.g. an AVL tree) from the elements to be sorted, by n insert operations.
2. Output of the elements in in-order sequence. ⇒ Ordered sequence.
Cost: 1. O(n log n) with AVL trees, 2. O(n). In total: O(n log n). Optimal!
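A small Python sketch of this scheme (not from the slides); for brevity the tree is a plain binary search tree without AVL rebalancing, so the O(n log n) bound only holds once balancing (rotations) is added:

def tree_sort(seq):
    """Sort by inserting into a binary search tree and reading it back in-order."""
    root = None
    for x in seq:
        root = _insert(root, x)
    out = []
    _inorder(root, out)
    return out

def _insert(node, x):
    if node is None:
        return [x, None, None]          # node = [key, left subtree, right subtree]
    if x < node[0]:
        node[1] = _insert(node[1], x)
    else:                               # duplicates go into the right subtree
        node[2] = _insert(node[2], x)
    return node

def _inorder(node, out):
    if node is not None:
        _inorder(node[1], out)
        out.append(node[0])
        _inorder(node[2], out)

# tree_sort([64, 17, 3, 99, 79]) -> [3, 17, 64, 79, 99]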

9 7.2 External Sorting
Problem: sorting big amounts of data that, as in external searching, are stored in blocks (pages).
Efficiency: the number of page accesses should be kept low!
Strategy: a sorting algorithm that processes the data sequentially (no frequent page exchanges): MergeSort!

10 General form of MergeSort
mergesort(S)   # returns the set S in sorted order
{
  if (S is empty or has only 1 element)
    return(S);
  else {
    divide S into two halves A and B;
    A' = mergesort(A);
    B' = mergesort(B);
    return(merge(A', B'));
  }
}

13 MergeSort on files
Start: there are n data items in a file g1, divided into pages of size b:
Page 1: s1, …, sb
Page 2: sb+1, …, s2b
…
Page k: s(k−1)b+1, …, sn   (k = ⌈n/b⌉)
If the data are processed sequentially, only k page accesses are made, not n.

14 Variation of MergeSort for external sorting
MergeSort is a divide-and-conquer algorithm. For external sorting we drop the divide step and only merge.
Definition: run := sorted subsequence within a file.
Strategy: merge increasingly bigger runs until everything is sorted.
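To make the definition concrete, a small Python sketch (not from the slides) that splits a sequence into its runs:

def runs(seq):
    """Split a sequence into its maximal sorted subsequences (runs)."""
    result = []
    current = []
    for x in seq:
        if current and x < current[-1]:   # a descent ends the current run
            result.append(current)
            current = []
        current.append(x)
    if current:
        result.append(current)
    return result

# runs([64, 17, 3, 99, 79, 78]) -> [[64], [17], [3, 99], [79], [78]]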

15 Algorithm
1st step: generate "starting runs" from the input file g1 and distribute them onto two files f1 and f2, with the same number of runs (±1) in each (for this there are several strategies, discussed later).
From now on: use 4 files f1, f2, g1, g2.

16 2nd step (main step)
while (number of runs > 1) {
  Merge pairs of runs, one from f1 and one from f2, into double-sized runs written alternately to g1 and g2, until there are no more runs in f1 and f2.
  Merge pairs of runs, one from g1 and one from g2, into double-sized runs written alternately to f1 and f2, until there are no more runs in g1 and g2.
}
Each loop iteration = two phases. (A sketch of the main step follows below.)
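A compact Python sketch of the main step (not from the slides), with each of the four "files" modeled as an in-memory list of runs; a real implementation would read and write pages instead of lists (the names f1, f2, g1, g2 follow the slides):

import heapq

def merge_two(run_a, run_b):
    """Merge two sorted runs into one sorted run."""
    return list(heapq.merge(run_a, run_b))

def merge_phase(f1, f2, g1, g2):
    """One phase: merge the i-th run of f1 with the i-th run of f2,
    writing the results alternately to g1 and g2."""
    outputs = (g1, g2)
    for i in range(max(len(f1), len(f2))):
        a = f1[i] if i < len(f1) else []
        b = f2[i] if i < len(f2) else []
        outputs[i % 2].append(merge_two(a, b))

def external_mergesort(runs_f1, runs_f2):
    """Repeat phases, swapping the roles of (f1, f2) and (g1, g2), until one run is left."""
    f1, f2 = runs_f1, runs_f2
    while len(f1) + len(f2) > 1:
        g1, g2 = [], []
        merge_phase(f1, f2, g1, g2)
        f1, f2 = g1, g2
    remaining = f1 + f2
    return remaining[0] if remaining else []

# Example from the next slide (starting runs of length 1):
# f1 = [[64], [3], [79], [19], [67], [8], [50]]
# f2 = [[17], [99], [78], [13], [34], [12]]
# external_mergesort(f1, f2) -> the 13 numbers in ascending order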

17 Example
Start: g1: 64, 17, 3, 99, 79, 78, 19, 13, 67, 34, 8, 12, 50
1st step (length of starting runs = 1):
f1: 64 | 3 | 79 | 19 | 67 | 8 | 50
f2: 17 | 99 | 78 | 13 | 34 | 12
Main step, 1st loop, part 1 (1st phase):
g1: 17, 64 | 78, 79 | 34, 67 | 50
g2: 3, 99 | 13, 19 | 8, 12
1st loop, part 2 (2nd phase):
f1: 3, 17, 64, 99 | 8, 12, 34, 67 |
f2: 13, 19, 78, 79 | 50 |

18 Example, continuation
1st loop, part 2 (2nd phase):
f1: 3, 17, 64, 99 | 8, 12, 34, 67 |
f2: 13, 19, 78, 79 | 50 |
2nd loop, part 1 (3rd phase):
g1: 3, 13, 17, 19, 64, 78, 79, 99 |
g2: 8, 12, 34, 50, 67 |
2nd loop, part 2 (4th phase):
f1: 3, 8, 12, 13, 17, 19, 34, 50, 64, 67, 78, 79, 99 |
f2: (empty)

19 Implementation
For each of the files f1, f2, g1, g2 at least one page is kept in main memory (RAM); even better, a second page may be kept as a buffer. Read/write operations are performed page-wise.

24 Costs
Page accesses during the 1st step and during each phase: O(n/b).
Each phase halves the number of runs, thus the total number of page accesses is O((n/b) log n) when starting with runs of length 1.
Internal computing time in the 1st step and in each phase: O(n). Total internal computing time: O(n log n).

25 Two variants of the first step: creation of the starting runs
A) Direct mixing: sort as many data as possible in primary memory ("internally"), for example m data sets. ⇒ Starting runs of a (fixed!) length m, thus r := n/m starting runs.
Then the total number of page accesses is O((n/b) log(r)). (A small numeric example follows below.)
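For intuition, a small worked example with made-up numbers (the concrete values are assumptions, not from the slides):

\[
n = 10^{6},\; b = 10^{3},\; m = 10^{4}
\;\Rightarrow\; r = \tfrac{n}{m} = 100,\quad
\lceil\log_2 r\rceil = 7,\quad
\tfrac{n}{b}\,\log_2 r \approx 7000 \text{ page accesses},
\]

compared with roughly \(\tfrac{n}{b}\,\log_2 n \approx 20000\) page accesses when starting from runs of length 1 (constant factors ignored).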

26 Two variants of the first step: creation of the starting runs
B) Natural mixing: creates starting runs of variable length.
Advantage: we can take advantage of ordered subsequences that the file may already contain.
Noteworthy: starting runs can be made even longer by using the replacement-selection method, as if one had a bigger primary storage!

27 Replacement selection
Read m data items from the input file into primary memory (an array).
repeat {
  mark all data in the array as "now";
  start a new run;
  while there is "now"-marked data in the array {
    select the smallest (smallest key) of all "now"-marked data and write it to the output file;
    replace it in the array with a number read from the input file (if there are any left);
    mark the new number "now" if it is bigger than or equal to the last output datum, else mark it "not now";
  }
} until there are no data left in the input file.
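A Python sketch of replacement selection (not from the slides), using a min-heap for the "now" data as slide 29 suggests; the input file is modeled as an iterator and the output as a list of runs:

import heapq

def replacement_selection(data, m):
    """Generate starting runs from `data` using an in-memory array of capacity m."""
    it = iter(data)
    now = [x for _, x in zip(range(m), it)]   # fill the array with the first m items
    heapq.heapify(now)                        # min-heap of the "now" elements
    not_now = []                              # elements that must wait for the next run
    runs = []
    current = []
    while now or not_now:
        if not now:                           # current run finished; "not now" data start the next run
            runs.append(current)
            current = []
            now = not_now
            heapq.heapify(now)
            not_now = []
        smallest = heapq.heappop(now)
        current.append(smallest)              # output to the current run
        nxt = next(it, None)                  # refill from the input, if possible
        if nxt is not None:
            if nxt >= smallest:
                heapq.heappush(now, nxt)      # may still go into the current run
            else:
                not_now.append(nxt)           # must wait for the next run
    if current:
        runs.append(current)
    return runs

# Example from the next slide (array capacity 3):
# replacement_selection([64, 17, 3, 99, 79, 78, 19, 13, 67, 34, 8, 12, 50], 3)
# -> [[3, 17, 64, 78, 79, 99], [13, 19, 34, 67], [8, 12, 50]]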

28 Example: array in primary storage with capacity of 3
The input file has the following data: 64, 17, 3, 99, 79, 78, 19, 13, 67, 34, 8, 12, 50
Contents of the array after each output/replacement ("not now" data written in parentheses):
1st run: 64 17 3 → 64 17 99 → 64 79 99 → 78 79 99 → (19) 79 99 → (19) (13) 99 → (19) (13) (67)
2nd run: 19 13 67 → 19 34 67 → (8) 34 67 → (8) (12) 67 → (8) (12) (50)
3rd run: 8 12 50 (input exhausted, the array is emptied)
Runs: 3, 17, 64, 78, 79, 99 | 13, 19, 34, 67 | 8, 12, 50

29 Implementation
In an array:
at the front: a heap for the "now"-marked data,
at the back: the refilled "not now" data.
Note: all "now" elements go into the currently generated run.

30 Expected length of the starting runs using the replacement-selection method: 2m (m = size of the array in primary storage = number of data items that fit into primary storage), assuming a uniformly random distribution of the keys. Even bigger if the input is already partially sorted!

31 Multi-way merging
Instead of using two input and two output files (alternating f1, f2 and g1, g2), use k input and k output files, in order to be able to merge k runs into one at a time.
In each step: take the smallest number among the k current runs and output it to the current output file.
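A Python sketch of the k-way merge step (not from the slides), keeping the k current run heads in a heap so the smallest can be taken in O(log k) time; the standard-library function heapq.merge does the same job:

import heapq

def k_way_merge(runs):
    """Merge k sorted runs into one sorted run using a heap of size k."""
    heap = []
    for r, run in enumerate(runs):
        if run:
            # store (key, run index, position) so ties are broken deterministically
            heapq.heappush(heap, (run[0], r, 0))
    out = []
    while heap:
        key, r, pos = heapq.heappop(heap)   # smallest among the k current run heads
        out.append(key)
        if pos + 1 < len(runs[r]):
            heapq.heappush(heap, (runs[r][pos + 1], r, pos + 1))
    return out

# k_way_merge([[3, 17, 64], [8, 12, 34], [13, 19]]) -> [3, 8, 12, 13, 17, 19, 34, 64]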

32 Cost
In each phase the number of runs is divided by k. Thus, if we have r starting runs, we need only log_k(r) phases (instead of log_2(r)).
Total number of page accesses: O((n/b) · log_k(r)).
Internal computing time per phase: O(n · log_2(k)).
Total internal computing time: O(n · log_2(k) · log_k(r)) = O(n · log_2(r)).
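The last equality is just the change-of-base rule for logarithms:

\[
\log_2(k)\cdot\log_k(r) = \log_2(k)\cdot\frac{\log_2(r)}{\log_2(k)} = \log_2(r),
\]

so increasing k reduces the number of phases (and hence page accesses), while the total internal computing time stays O(n log_2(r)).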