Phil Tayco Slide version 1.0 May 7, 2018


Data Structures: Heaps

Heaps: Binary trees revisited again. We’ve seen how binary trees can be used to sort data and store it dynamically. In a traditional linked list, searches perform at O(n); with a fairly balanced binary tree we can improve that to O(log n), matching a binary search on a sorted array. Binary trees can also be used to handle other types of data and order. Perhaps this structure can be used to revisit other concepts we’ve discussed.

Heaps: Back to priority queues. Recall the tradeoffs between inserts and removes with priority queues. O(1) insert meant there was no sort order in the array or linked list, so the remove is O(n). O(1) remove meant the element to remove was always at the top of the queue, but maintaining that during insert meant O(n). If we are to improve on this performance, the side that works at O(n) will at least need to go to O(log n). The challenge will be to see if we can keep the other side at O(1).
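The tradeoff recalled above can be sketched with two list-based priority queues. This is a minimal illustration, not code from the slides: the class names `UnsortedPQ` and `SortedPQ` are hypothetical, and higher values are assumed to mean higher priority.

```python
import bisect


class UnsortedPQ:
    """Insert is O(1) (append); remove is O(n) (scan for the highest value)."""

    def __init__(self):
        self.items = []

    def insert(self, value):
        self.items.append(value)

    def remove(self):
        # Linear scan to find the position of the highest value.
        best = max(range(len(self.items)), key=self.items.__getitem__)
        # Move it to the end so the pop itself does no shifting.
        self.items[best], self.items[-1] = self.items[-1], self.items[best]
        return self.items.pop()


class SortedPQ:
    """Insert is O(n) (shift elements to keep order); remove is O(1)."""

    def __init__(self):
        self.items = []

    def insert(self, value):
        # Keeps the list in ascending order; shifting makes this O(n).
        bisect.insort(self.items, value)

    def remove(self):
        # The highest value is always the last element.
        return self.items.pop()
```

Both classes return values in the same order; only the side that pays the O(n) cost differs.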

Heaps: How about those tree structures? Tree structures, when balanced, lead to divide and conquer algorithms that suggest O(log n) performance. With a traditional binary search tree, finding the highest value (the highest priority) uses a max() function that is O(log n), which could serve as the priority queue remove. Inserting records also performs at O(log n). This means that while the O(n) remove improves to O(log n), the O(1) insert degrades to O(log n). This also only applies if the tree is balanced: if the tree becomes imbalanced, both insert and remove work at O(n).

Heaps: What about O(1) remove? 2-3-4 trees can address the performance by guaranteeing balance, but remove is not a simple function, and locating the highest value still means walking down the tree rather than looking in one fixed place. O(1) access means the highest priority value is in a predictable location; in a tree, that’s the root. A standard binary search tree’s root value on a balanced tree is the middle sorted value. What leads to a divide and conquer solution is the tree’s structure, so we can still use it but change the definitions of how values are maintained.

Heaps: The root is the highest. To target O(1) remove, the root node needs to have the highest value. With parent and child nodes in a tree, this suggests an order where nodes higher in the tree have higher values than their children. The challenge is that when a node is removed, you have to delete the root node and set up the new root as the remaining element with the highest value. In addition, to keep the benefit of the structure, balance must also be maintained. The stage is set. Given these concepts, we can define a new tree-like structure.

Heaps: Define the conditions. A heap is a complete tree structure where every node has a higher value than its immediate children. A complete tree means levels are filled left to right and no new level starts until the current level is full. Nodes can have a variety of values that appear to have no order to them, but as long as the parent-child value rule is maintained, the structure is considered a heap. Note: when studying heaps, first look at the structural theory before thinking about implementation.

Heaps: Example 1: Incorrect heap. The lowest level is not filled left to right (nodes 6 and 7 should be children of 5). Nodes as shown: root 50; then 5, 20; then 6, 7.

Heaps: Example 2: Incorrect heap. Complete, but values are incorrect (17 and 7 should switch places). Tree by level: 50 / 7, 2 / 6, 17.

Heaps: Example 3: Correct heap. Complete, with nodes following the value rules. Tree by level: 10 / 9, 4 / 2, 1.

Heaps: Insert. In order to maintain the level completeness rule of a heap, new nodes are added at the next available spot on a level. If the level is full, the new node is added on a new level on the far left.

Heaps: Insert 15. Tree by level: 10 / 9, 4 / 2, 1, 15.

Heaps: Insert example. The addition of node 15 is in the correct location, maintaining the completeness rule of the heap. The value rules, however, are violated because its parent, node 4, is less than 15. Assuming the heap is already correctly formed, when a new node is added, the only concern with the value rules is between the new node and its parent. The new node will either be correctly positioned or not; if it is correct, no further action is required. When the pair is incorrect, the solution is to trade places.

Heaps: Swap 15 and 4. Tree by level: 10 / 9, 15 / 2, 1, 4.

Heaps: Just keep swapping. If a swap occurs, then the next pair of nodes to investigate is the next parent-child pair. In this example, that means checking 15 against its parent, 10. Here, we need to do another swap.

Heaps: Swap 15 and 10. Tree by level: 15 / 9, 10 / 2, 1, 4.

Heaps: Eventually we’ll stop. As swapping occurs, the higher values travel up to where they should be in the heap. This swapping up the tree is called “trickling up”. Trickling up stops when either no further swap with a parent is needed or the last swap is with the root node.

Heaps: Insert 12. The new node maintains completeness but requires swapping with 10. Tree by level: 15 / 9, 10 / 2, 1, 4, 12.

Heaps: Insert 12. No further swaps are needed since 12 in its new position does not need to swap with 15, so the insert is done. Tree by level: 15 / 9, 12 / 2, 1, 4, 10.

Heaps: Insert 14. 14 starts a new level under 2. Tree by level: 15 / 9, 12 / 2, 1, 4, 10 / 14.

Heaps: Insert 14. 14 will swap with 2 and then swap with 9 to complete its trickle up. Tree by level: 15 / 14, 12 / 9, 1, 4, 10 / 2.
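The insert walkthrough above can be reproduced with a short sketch. It assumes the tree is stored in a Python list in level order (the array layout the slides introduce later) and that the parent of index i sits at (i - 1) // 2, which is the inverse of the child formulas given later. The function names are illustrative, not taken from the slides.

```python
def parent(i):
    # Assumed inverse of the child formulas 2n + 1 and 2n + 2.
    return (i - 1) // 2


def trickle_up(heap, i):
    """Swap the node at index i with its parent until the value rule holds."""
    while i > 0 and heap[i] > heap[parent(i)]:
        p = parent(i)
        heap[i], heap[p] = heap[p], heap[i]
        i = p


def insert(heap, value):
    """Add at the next available spot (end of the list), then trickle up."""
    heap.append(value)
    trickle_up(heap, len(heap) - 1)


heap = [10, 9, 4, 2, 1]        # Example 3, a correct heap
insert(heap, 15)
after_15 = list(heap)          # [15, 9, 10, 2, 1, 4]
insert(heap, 12)
after_12 = list(heap)          # [15, 9, 12, 2, 1, 4, 10]
insert(heap, 14)
after_14 = list(heap)          # [15, 14, 12, 9, 1, 4, 10, 2]
```

Each insert touches at most one node per level, which is where the O(log n) bound for trickling up comes from.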

Heaps: Insert efficiency. Assuming the next location for adding a new node is known (more on that when we discuss implementation), placing the node is O(1). Trickling up in the worst case follows one path up from the leaf level to the root. Because of the nature of the tree structure, that path in the worst case is O(log n). For a priority queue, this actually makes insert performance go from O(1) to O(log n). Is this an acceptable price to pay for the improvement on remove?

Heaps: Remove efficiency goals. Before accepting O(1) insert degrading to O(log n), we need to understand the performance for remove. The root node is clearly the node to remove since it will have the highest value, and finding it is an O(1) operation. However, the heap must then be rebuilt, so an overall target of O(1) remove performance will be a challenge. The target then is O(log n), since O(n) would be no improvement over the earlier priority queue implementations.

Heaps: Remove algorithm. With the root node removed, the vacant spot needs to be filled with the correct node. The node with the next best potential value will be one of its children. The completeness rule, however, must also be supported: this means the position to truly remove is the rightmost node on the lowest level. From this perspective, we can start by swapping the root node to be removed with the node in the position that is designated to be deleted.

Heaps: Remove. 15 is the node to be eliminated and 2 is the rightmost node on the lowest level. Tree by level: 15 / 14, 12 / 9, 1, 4, 10 / 2.

Heaps: Remove. 15 and 2 swap places. 15 can now be removed and the heap is still complete. Tree by level: 2 / 14, 12 / 9, 1, 4, 10 / 15.

Heaps: Fix the heap. Since the heap is complete in structure, all that remains is to fix the values. In this example, 2 is out of position. Its current children were children of node 15 and, because of the heap rules, one of them has the next highest value of the entire heap. The process now is the opposite of the trickle up: in this case, we trickle down. In trickling down, the child node that is the higher of the 2 children and greater than the parent swaps with the parent. This process continues until either no swaps are needed or the lowest level is reached.

Heaps: Remove. 14 is greater than 12, so it swaps with 2. Tree by level: 2 / 14, 12 / 9, 1, 4, 10.

Heaps: Remove. After 14 and 2 swap, the next level is checked. Here 9 will swap with 2. Tree by level: 14 / 2, 12 / 9, 1, 4, 10.

Heaps: Remove. After 9 and 2 swap, the heap is correct. Tree by level: 14 / 9, 12 / 2, 1, 4, 10.

Heaps: Remove. On the next remove, 14 will swap with 10 and be deleted. 10 will then trickle down from the root and just swap with 12. Tree by level: 12 / 9, 10 / 2, 1, 4.
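The two removals walked through above can be sketched as follows. This is a minimal illustration that assumes the heap is held in a Python list in level order, with the children of index n at 2n + 1 and 2n + 2 (the formulas the slides give later). The names `trickle_down` and `remove` are illustrative.

```python
def trickle_down(heap, i, size):
    """Push the value at index i down until both children are not larger."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < size and heap[left] > heap[largest]:
            largest = left
        if right < size and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            return  # value rule holds, or the lowest level was reached
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest


def remove(heap):
    """Swap the root with the rightmost lowest node, drop it, fix the heap."""
    if not heap:
        raise IndexError("remove from an empty heap")
    last = len(heap) - 1
    heap[0], heap[last] = heap[last], heap[0]
    top = heap.pop()
    trickle_down(heap, 0, len(heap))
    return top


heap = [15, 14, 12, 9, 1, 4, 10, 2]
first = remove(heap)           # 15
after_first = list(heap)       # [14, 9, 12, 2, 1, 4, 10]
second = remove(heap)          # 14
after_second = list(heap)      # [12, 9, 10, 2, 1, 4]
```

As with trickling up, the loop visits at most one node per level.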

Heaps: Remove efficiency. Again assuming the last spot on the heap is known, the swap with the root is O(1). Trickle down from there is similar to trickle up in efficiency and is therefore O(log n). This makes both insert and remove perform at O(log n). With the completeness rule in place as well, the O(log n) is guaranteed (no risk of imbalance).

Heaps: Heaps as priority queues. The remove function always takes the highest value, so it performs well as a priority queue. Insert into the heap still ensures the highest value is at the root, so it works for a priority queue as well. Both efficiencies are O(log n); the earlier implementations have one operation at O(1) and the other at O(n). You get a significant improvement with one and a significant degradation with the other. The question is how significant is the overall change?
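For comparison, Python's standard library ships a heap in the `heapq` module. It is a min-heap (the smallest value is at the root), not the max-heap described in these slides, so the sketch below stores negated priorities to get highest-first behavior. The negation trick is an assumption of this example, not something the slides cover.

```python
import heapq

pq = []
for priority in [10, 15, 4, 12]:
    # heappush is O(log n); negating turns the min-heap into a max-first queue.
    heapq.heappush(pq, -priority)

# heappop is O(log n) and always returns the smallest stored value,
# which is the highest original priority after negating back.
order = [-heapq.heappop(pq) for _ in range(len(pq))]
```

After the loop, `order` holds the priorities from highest to lowest.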

Heaps: Run the numbers. Recall that O() is a measure of performance as the structure grows in size. At smaller values, O(1), O(log n), and O(n) are relatively close. Larger values show the significance, which is what O() demonstrates. With 1000 elements: O(n) means a value relative to 1000, O(1) means a constant value as low as 1, and O(log n) could still be a value as low as 10. The key observation is the rate of increase: as n doubles in size, O(n) performance also doubles while O(log n) performance only increments.
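The figures quoted above can be checked with a few lines of arithmetic. This sketch treats each category as a rough step count (1, log2 of n rounded up, and n), which is a simplification for illustration only; the helper name `step_estimates` is hypothetical.

```python
import math


def step_estimates(n):
    """Rough step counts for each growth category at size n."""
    return {
        "O(1)": 1,
        "O(log n)": math.ceil(math.log2(n)),
        "O(n)": n,
    }


# Doubling n doubles the O(n) figure but adds only 1 to the O(log n) figure.
table = {n: step_estimates(n) for n in (1000, 2000, 4000)}
for n, row in table.items():
    print(n, row)
```

For 1000, 2000, and 4000 elements the O(log n) column reads 10, 11, and 12.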

Heaps: Heaps reign supreme. With both operations at O(log n), note that the gap between O(1) and O(log n) does not widen as much as the gap between O(log n) and O(n) as n gets larger. Thus, the hit for going from O(1) to O(log n) is worth it to gain the benefit of going from O(n) to O(log n). The O(log n) performance depends on knowing the exact location of the rightmost lowest level node. This is where the implementation comes into play. As with stacks and queues, static or dynamic structures can be used.

Heaps: Heaps as arrays. It turns out heaps are generally implemented as arrays because direct access to any element simplifies the code and caters to the O(1) part of finding that rightmost, lowest node. Direct element access also allows for easy swaps between a parent and a child node. The question now is how to implement a tree structure using an array. We can start by taking a heap tree and “flattening” it into an array.

Heaps: Heap as an array (note color and index numbers). The parent and child positions on the heap show a mathematical relationship in the array indexes. Tree by level: 14 / 9, 12 / 2, 1, 4, 10. Array (index: value): 0: 14, 1: 9, 2: 12, 3: 2, 4: 1, 5: 4, 6: 10.

Heaps: Do the math. Looking at children and parents at each level: Node 14 = index 0, left child (9) = 1, right child (12) = 2. Node 9 = index 1, left child (2) = 3, right child (1) = 4. Node 12 = index 2, left child (4) = 5, right child (10) = 6. If node 2 had children, their indexes would be 7 and 8. Generalizing, every node at index n has children at the following array locations: left child = 2n + 1, right child = 2n + 2. This is useful for the coding and also shows how insert and remove can still be handled in an array.
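The index formulas above are enough to check the value rule on a flattened heap. The sketch below is illustrative: `is_max_heap` is a hypothetical helper, and it accepts equal parent and child values, which is slightly looser than the "higher value" wording of the definition. A list with no gaps is automatically a complete tree, so only the values need checking.

```python
def left_child(n):
    return 2 * n + 1


def right_child(n):
    return 2 * n + 2


def is_max_heap(values):
    """Return True if no child is larger than its parent."""
    size = len(values)
    for n in range(size):
        for child in (left_child(n), right_child(n)):
            if child < size and values[child] > values[n]:
                return False
    return True


example_2 = [50, 7, 2, 6, 17]         # 17 sits under 7, so the rule is broken
example_3 = [10, 9, 4, 2, 1]          # the correct heap from Example 3
flattened = [14, 9, 12, 2, 1, 4, 10]  # the heap flattened in the slide above
```

Here `is_max_heap(example_2)` is False while the other two lists pass.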

Heaps: Insert 11. 11 will go in as a new node on level 4, which will be index 7 in the array. Tree by level: 14 / 9, 12 / 2, 1, 4, 10 / 11. Array (index: value): 0: 14, 1: 9, 2: 12, 3: 2, 4: 1, 5: 4, 6: 10, 7: 11.

Heaps: Insert 11. Trickle up will swap 11 with 2 and then with 9. The array follows suit. Tree by level: 14 / 11, 12 / 9, 1, 4, 10 / 2. Array (index: value): 0: 14, 1: 11, 2: 12, 3: 9, 4: 1, 5: 4, 6: 10, 7: 2.

Heaps: Remove example. Given this tree, remove first starts with swapping 14 with 10. Again, this is a simple swap of index 0 and the last array element. Tree by level: 14 / 9, 12 / 2, 1, 4, 10. Array (index: value): 0: 14, 1: 9, 2: 12, 3: 2, 4: 1, 5: 4, 6: 10.

Heaps: Now 10 will have to trickle down and swap with 12. 14 can be considered removed as well. Tree by level: 12 / 9, 10 / 2, 1, 4 (14 removed). Array (index: value): 0: 12, 1: 9, 2: 10, 3: 2, 4: 1, 5: 4, 6: 14 (removed).

Heaps: Array considerations. The array example here suggests the size of the array matches the size of the heap. In reality, the heap array size has to be treated as static and, as such, allocating an appropriate maximum size is still necessary. The size in use (minus 1) also gives you the index location of the rightmost, lowest level node. A swap between parent and child nodes is a simple array element swap using the index formulas discussed. The array heap implementation supports the O(log n) efficiency, but the price is static memory.

Heaps: Did you also notice this? When an element is “removed”, it is actually placed in the last active spot in the array. While the current active size is reduced with each removal, you can also see this as an opportunity to place the highest value at the end of the array, so long as the remaining active elements follow the heap rules. What would repeated “removals” end up doing to the array?

Heaps: Here’s the array after the last remove example. 14 is supposed to be “removed”, so 4 represents the rightmost, lowest level node. Tree by level: 12 / 9, 10 / 2, 1, 4 (14 removed). Array (index: value): 0: 12, 1: 9, 2: 10, 3: 2, 4: 1, 5: 4, 6: 14 (removed).

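Heaps: The next remove puts 12 where 4 is, and 4 trickles down to swap with 10. Node 1 is the next rightmost, lowest node. Tree by level: 10 / 9, 4 / 2, 1 (12, 14 removed). Array (index: value): 0: 10, 1: 9, 2: 4, 3: 2, 4: 1, 5: 12 (removed), 6: 14 (removed).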

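Heaps: The remove process continues with 10, with 1 swapping with 9 and then with 2. Tree by level: 9 / 2, 4 / 1 (10, 12, 14 removed). Array (index: value): 0: 9, 1: 2, 2: 4, 3: 1, 4: 10 (removed), 5: 12 (removed), 6: 14 (removed).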

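Heaps: And again, with 9 moving down and 1 swapping with 4. Tree by level: 4 / 2, 1 (9, 10, 12, 14 removed). Array (index: value): 0: 4, 1: 2, 2: 1, 3: 9 (removed), 4: 10 (removed), 5: 12 (removed), 6: 14 (removed).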

Heaps: Heapsort. With the heap structure implemented as an array, running the remove operation for all elements results in sorting the array! Remove performs at O(log n) and would be done n times, which leads to a consistent O(n log n) sorting performance. Quicksort is O(n log n) but can degrade to O(n^2) for arrays with nearly sorted data. Mergesort is O(n log n) and is consistent, but requires double the memory space. Heapsort works well, but its drawback is that the array elements first need to form a heap. How do you turn any array into a heap?
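The repeated removals traced in the preceding slides can be sketched as a loop over a shrinking active region. The code assumes the input list already satisfies the heap rules; `sort_heap_in_place` is a hypothetical name, and the snapshots it returns exist only so the intermediate arrays can be compared with the slides.

```python
def trickle_down(heap, i, size):
    """Push the value at index i down within the first `size` elements."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < size and heap[left] > heap[largest]:
            largest = left
        if right < size and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            return
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest


def sort_heap_in_place(heap):
    """Repeatedly 'remove' the root into the last active slot.

    Assumes `heap` already follows the heap rules. Returns a copy of the
    array after each removal.
    """
    snapshots = []
    for last in range(len(heap) - 1, 0, -1):
        heap[0], heap[last] = heap[last], heap[0]  # root goes to the end
        trickle_down(heap, 0, last)                # active size is now `last`
        snapshots.append(list(heap))
    return snapshots


data = [14, 9, 12, 2, 1, 4, 10]
steps = sort_heap_in_place(data)
```

The first four snapshots match the arrays shown in the slides, and `data` ends up in ascending order.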

Heaps: Twice the fun. The simplest way to turn an array into a heap is to walk through each element and perform the insert function. The insert is handled within the array using swaps, so there is no need for a second array of equal size as with mergesort. Once all the inserts are done, the removes can be done to sort the array. Since insert and remove each perform at O(log n) and the process requires going through the array twice, overall performance is O(2(n log n)). This is still in the O(n log n) category, but in the long run it is less effective than quicksort (as long as the data is randomly distributed).
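The two passes described above can be sketched in one function. This is a minimal illustration under the same assumptions as before (level-order list, children at 2n + 1 and 2n + 2, parent at (n - 1) // 2); the function names are illustrative.

```python
def trickle_up(heap, i):
    """Swap the node at index i with its parent until the value rule holds."""
    while i > 0:
        p = (i - 1) // 2
        if heap[i] <= heap[p]:
            return
        heap[i], heap[p] = heap[p], heap[i]
        i = p


def trickle_down(heap, i, size):
    """Push the value at index i down within the first `size` elements."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < size and heap[left] > heap[largest]:
            largest = left
        if right < size and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            return
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest


def heapsort(values):
    """Sort `values` in place, ascending, with no second array."""
    # Pass 1: "insert" each element into the heap growing at the front.
    for i in range(1, len(values)):
        trickle_up(values, i)
    # Pass 2: "remove" the root into the last active slot, n - 1 times.
    for last in range(len(values) - 1, 0, -1):
        values[0], values[last] = values[last], values[0]
        trickle_down(values, 0, last)


sample = [67, 33, 49, 21, 25, 94]
heapsort(sample)
```

Both passes work inside the original list, which is the memory advantage over mergesort noted above.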

Heaps: Recursive heapify. Turning an array into a heap can also be done recursively. While the overall sort stays in the O(n log n) category, the algorithm is clever and is another opportunity to practice understanding recursive algorithms. Base case idea: a node with no children is already a heap. Recursive case: a node with children must heapify its right subtree and then heapify its left subtree. Once the children are valid heaps, the parent node must trickle down.
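The base case and recursive case above translate almost directly into code. This sketch assumes the same level-order list layout as the array slides; `heapify` and `trickle_down` are illustrative names rather than the author's implementation.

```python
def trickle_down(heap, i, size):
    """Push the value at index i down within the first `size` elements."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < size and heap[left] > heap[largest]:
            largest = left
        if right < size and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            return
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest


def heapify(values, i=0):
    """Turn the subtree rooted at index i into a heap, in place."""
    size = len(values)
    if 2 * i + 1 >= size:
        return  # base case: no children, already a heap
    heapify(values, 2 * i + 2)     # right subtree first
    heapify(values, 2 * i + 1)     # then the left subtree
    trickle_down(values, i, size)  # children are heaps; place the parent


numbers = [1, 2, 4, 9, 10, 12, 14]
heapify(numbers)
```

The recursion depth equals the height of the tree, so it grows with log n rather than n.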

Heaps: Summary. The heap structure takes advantage of binary tree concepts to get to consistent O(log n) insert and remove. This is prime for a priority queue and best used for large amounts of data. The implementation is typically done as an array, so its drawback is static memory usage. The same structure can be leveraged for the heapsort algorithm, which is comparable to mergesort and quicksort.

Summary: Worst Case

Operation       Priority Queue as Array or Linked List             Priority Queue as Heap
Insert (Push)   O(1) or O(n)                                       O(log n)
Remove (Pop)    O(n) or O(1)                                       O(log n)
Notes           If Insert is O(1), Remove is O(n) and vice versa   Typically implemented as an array so memory usage is static

Advanced Sorts

Sort    Comparisons   Notes
Merge   O(n log n)    Requires 2x memory space because of temp array
Quick   O(n log n)    Faster and more space efficient than Merge, but degrades to O(n^2) when the array is near or reverse sorted
Heap    O(n log n)    Consistent for any data arrangement, though slightly worse than Merge and Quick