Heaps and basic data structures David Kauchak cs161 Summer 2009
Administrative Homework 2 due date extended to Fri. 7/10 at 5pm Midterm 7/20 in class. Closed book, etc. Review sessions SCPD students Discussion board – thanks
Quicksort partitions – the good vs. the bad
Quicksort average case: take 2 cn “good” 50/50 split “bad” split We absorb the “bad” partition. In general, we can absorb any constant number of “bad” partitions
Quicksort partitions – the good vs. the bad For Quicksort to “absorb” the cost of bad partitions, the proportion of bad to good partitions cannot grow as n grows Why? If, as we increase n, the numbers of good and bad partitions increase proportionately, then there is still a constant number of “bad” partitions to be absorbed by a given “good” partition If, however, the proportion of “bad” partitions increases as we increase n, then we can no longer absorb the cost of the “bad” partitions, since it depends on n
Decision-tree model Full binary tree representing the comparisons between elements made by a sorting algorithm Internal nodes contain the indices to be compared, e.g. 1:3 compares A[1] with A[3] and branches on ≤ or > Leaves contain a complete permutation of the input Tracing a path from root to leaf gives the correct reordering/permutation for a given input Examples: leaf |1,3,2| reorders [3, 12, 7] into [3, 7, 12]; leaf |2,1,3| reorders [7, 3, 12] into [3, 7, 12]
Comparison-based sorting Sorted order is determined based only on comparisons between input elements A[i] < A[j] A[i] > A[j] A[i] = A[j] A[i] ≤ A[j] A[i] ≥ A[j] This is why most built-in sorting approaches only require you to define the comparison operator (e.g. compareTo in Java) Can we do better than O(n log n)?
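A minimal Python sketch of the same idea: the sort below sees the data only through a comparison function. The names `make_counting_comparator` and `compare` are illustrative and not from the lecture; `functools.cmp_to_key` is the standard-library adapter that turns a compareTo-style function into a sort key.

```python
from functools import cmp_to_key


def make_counting_comparator():
    """Return a three-way comparator and a counter of how often it is called."""
    calls = {"count": 0}

    def compare(a, b):
        calls["count"] += 1
        # Negative, zero, or positive, in the style of Java's compareTo.
        return (a > b) - (a < b)

    return compare, calls


compare, calls = make_counting_comparator()
data = [12, 7, 3, 9, 1]
result = sorted(data, key=cmp_to_key(compare))
print(result, "sorted using", calls["count"], "comparisons")
```

The counter makes the cost model of the lower-bound argument visible: the only thing being counted is comparisons.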
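A decision tree model (three elements; node i:j compares A[i] with A[j], leaves are permutations of the indices)
1:2
  ≤: 2:3
    ≤: |1,2,3|
    >: 1:3
      ≤: |1,3,2|
      >: |3,1,2|
  >: 1:3
    ≤: |2,1,3|
    >: 2:3
      ≤: |2,3,1|
      >: |3,2,1|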
A decision tree model: tracing the input [12, 7, 3]
1:2 – Is 12 ≤ 7 or is 12 > 7? 12 > 7, so follow the > branch to 1:3
1:3 – Is 12 ≤ 3 or is 12 > 3? 12 > 3, so follow the > branch to 2:3
2:3 – Is 7 ≤ 3 or is 7 > 3? 7 > 3, so follow the > branch to the leaf |3,2,1|
Leaf |3,2,1| – output A[3], A[2], A[1] = [3, 7, 12]
A decision tree model: tracing the input [7, 12, 3]
1:2 – 7 ≤ 12, so follow the ≤ branch to 2:3
2:3 – 12 > 3, so follow the > branch to 1:3
1:3 – 7 > 3, so follow the > branch to the leaf |3,1,2|
Leaf |3,1,2| – output A[3], A[1], A[2] = [3, 7, 12]
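The three-element decision tree can be written out directly as nested comparisons. This is a sketch for illustration only; the function name `decision_tree_sort3` is an assumption, and the leaves use the same 1-based index permutations as the slides.

```python
def decision_tree_sort3(a):
    """Walk the 3-element decision tree; return (leaf permutation, sorted list)."""
    a1, a2, a3 = a
    if a1 <= a2:                 # node 1:2, <= branch
        if a2 <= a3:             # node 2:3
            leaf = (1, 2, 3)
        elif a1 <= a3:           # node 1:3
            leaf = (1, 3, 2)
        else:
            leaf = (3, 1, 2)
    else:                        # node 1:2, > branch
        if a1 <= a3:             # node 1:3
            leaf = (2, 1, 3)
        elif a2 <= a3:           # node 2:3
            leaf = (2, 3, 1)
        else:
            leaf = (3, 2, 1)
    # The leaf lists which input positions to output, in order (1-based).
    return leaf, [a[i - 1] for i in leaf]


print(decision_tree_sort3([12, 7, 3]))
print(decision_tree_sort3([7, 12, 3]))
```

Every input takes at most three comparisons, which is the height of the tree.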
How many leaves are in a decision tree? Leaves must have all possible permutations of the input Input of size n, n! leaves What if decision tree model didn’t? Some input would exist that didn’t have a correct reordering
A lower bound What is the worst-case number of comparisons for a tree? The longest path in the tree, i.e. the height
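A lower bound What is the maximum number of leaves a binary tree of height h can have? A complete binary tree has 2^h leaves The tree needs at least n! leaves, so 2^h ≥ n! and therefore h ≥ log(n!) (log is monotonically increasing) From hw1, log(n!) = Θ(n log n), so h = Ω(n log n)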
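To get a feel for the bound, the sketch below computes the smallest height h whose 2^h leaves can hold all n! permutations, i.e. ceil(log2(n!)). The function name `min_worst_case_comparisons` is an assumption made for this example.

```python
import math


def min_worst_case_comparisons(n):
    """Smallest h with 2**h >= n!, computed exactly with integers."""
    leaves = math.factorial(n)
    # For an integer x >= 1, (x - 1).bit_length() equals ceil(log2(x)).
    return (leaves - 1).bit_length()


for n in (3, 4, 5, 10, 100):
    bound = min_worst_case_comparisons(n)
    print(n, bound, round(n * math.log2(n), 1))
```

The last column prints n log2 n for comparison; the two grow at the same rate.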
Can we do better than O(n log n) for sorting? What if I told you the maximum value k that any number could take and k = O(n) In some situations (like above) we can sort in Θ(n) counting sort radix sort bucket sort Leverage additional knowledge about the data besides comparisons
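As a sketch of how the extra knowledge helps, here is a simplified counting sort for non-negative integers no larger than k. It never compares two input elements. This is the bare-keys variant, not the stable version needed when keys carry attached records, and the function name is an assumption.

```python
def counting_sort(values, k):
    """Sort integers in the range 0..k in Theta(n + k) time."""
    counts = [0] * (k + 1)
    for v in values:
        counts[v] += 1              # tally how often each value occurs
    out = []
    for value, count in enumerate(counts):
        out.extend([value] * count)  # emit each value as often as it occurred
    return out


print(counting_sort([3, 0, 2, 3, 1, 0], k=3))
```

When k = O(n), the total work is Θ(n).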
Why don’t we hear about these more? Constants can be large and running times therefore may be larger for modest input sizes Cache friendliness Memory (Quicksort sorts in place) Hardware considerations
Data Structures What is a data structure? Way of storing data that facilitates particular operations Dynamic set operations: For a set S Search(S,k) – Does k exist in S? Insert(S,k) – Add k to S Delete(S,x) – Given a pointer/reference, x, to an element, delete it from S Min(S) – Return the smallest element of S Max(S) – Return the largest element of S
Array Sequential locations in memory in linear order Elements are accessed via index Cost of operations: Search(S,k) – O(n) Insert(S,k) – Θ(n) InsertIndex(S,k) – Θ(1) Delete(S,x) – O(n) Min(S) – Θ(n) Max(S) – Θ(n)
Array Uses? constant time access of particular indices
Linked list Elements are arranged linearly. An element in the list points to the next element in the list Cost of operations: Search(S,k) – O(n) Insert(S,k) – Θ(1) InsertIndex(S,k) – O(n) Delete(S,x) – O(n) Min(S) – Θ(n) Max(S) – Θ(n)
Linked list Uses? constant time insertion at the cost of linear time access
Doubly linked list Elements are arranged linearly. An element in the list points to both the next element and the previous element in the list What does the back link get us? Θ(1) deletion
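A minimal sketch of why the back link gives Θ(1) deletion: given a reference to a node, both neighbours are reachable directly, so no traversal is needed. The class and method names are assumptions for this example.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


class DoublyLinkedList:
    def __init__(self):
        self.head = None

    def insert(self, value):
        """Insert at the front in constant time and return the new node."""
        node = Node(value)
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        self.head = node
        return node

    def delete(self, node):
        """Unlink the given node in constant time via its prev/next links."""
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next       # node was the head
        if node.next is not None:
            node.next.prev = node.prev
        node.prev = node.next = None

    def to_list(self):
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.value)
            cur = cur.next
        return out


lst = DoublyLinkedList()
n1, n2, n3 = lst.insert(1), lst.insert(2), lst.insert(3)
lst.delete(n2)
print(lst.to_list())
```

With only a next pointer, `delete` would first have to walk from the head to find the predecessor of the node.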
Stack LIFO (last in, first out) Picture the stack of plates at a buffet Can implement with an array or a linked list Example: push(1) push(2) push(3) pop() returns 3, and “top” then refers to 2
Stack Empty – check if stack is empty Array: check if “top” is at index 0 Linked list: check if “top” pointer is null Runtime: Θ(1)
Stack Pop – removes the top element from the stack check if empty, if so, “underflow” Array: return element at “top” and decrement “top” Linked list: return and remove at front of linked list Runtime: Θ(1) Push – add an element to the stack Array: increment “top” and insert element. Must check for overflow! Linked list: insert element at front of linked list Runtime: Θ(1)
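The array version of these operations might look like the sketch below. It is an illustration under one assumption that differs slightly from the wording above: `top` counts the stored elements, so an element is written at index `top` and then `top` is incremented. The class name `ArrayStack` is hypothetical.

```python
class ArrayStack:
    def __init__(self, capacity):
        self.items = [None] * capacity
        self.top = 0                     # number of stored elements; 0 means empty

    def empty(self):
        return self.top == 0

    def push(self, value):
        if self.top == len(self.items):
            raise OverflowError("stack overflow")
        self.items[self.top] = value
        self.top += 1

    def pop(self):
        if self.empty():
            raise IndexError("stack underflow")
        self.top -= 1
        value = self.items[self.top]
        self.items[self.top] = None      # drop the reference
        return value


s = ArrayStack(capacity=3)
s.push(1)
s.push(2)
s.push(3)
print(s.pop())
```

Each operation touches a single array slot, which is where the Θ(1) running time comes from.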
Stack Array or linked list? Array: more memory efficient Linked list: don’t have to worry about “overflow” Uses? runtime “stack” graph search algorithms (depth first search) syntactic parsing (e.g. compilers)
Queue FIFO (first in, first out) Picture a line at the grocery store Can implement with an array or a doubly linked list Example: Enqueue(1) Enqueue(2) Enqueue(3) Dequeue() returns 1; “head” marks the front of the queue and “tail” the end
Queue Operations Empty – Θ(1) Enqueue – add element to end of queue - Θ(1) Dequeue – remove element from the front of the queue - Θ(1) Uses? scheduling graph traversal (breadth first search)
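One way to get Θ(1) for all three operations with an array is to treat it as circular. The sketch below assumes a fixed capacity and tracks the head index plus a size; the class name `ArrayQueue` and this particular bookkeeping are assumptions, not taken from the slides.

```python
class ArrayQueue:
    def __init__(self, capacity):
        self.items = [None] * capacity
        self.head = 0                    # index of the front element
        self.size = 0                    # number of stored elements

    def empty(self):
        return self.size == 0

    def enqueue(self, value):
        if self.size == len(self.items):
            raise OverflowError("queue overflow")
        tail = (self.head + self.size) % len(self.items)   # wrap around
        self.items[tail] = value
        self.size += 1

    def dequeue(self):
        if self.empty():
            raise IndexError("queue underflow")
        value = self.items[self.head]
        self.items[self.head] = None
        self.head = (self.head + 1) % len(self.items)
        self.size -= 1
        return value


q = ArrayQueue(capacity=3)
q.enqueue(1)
q.enqueue(2)
q.enqueue(3)
print(q.dequeue())
```

Without the wrap-around, dequeuing from the front of an array would require shifting every remaining element.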
Binary heap A binary tree where the value of a parent is greater than or equal to the value of its children Additional restriction: all levels of the tree are complete except the last Max heap vs. min heap
Binary heap - operations Maximum(S) - return the largest element in the set ExtractMax(S) – Return and remove the largest element in the set Insert(S, val) – insert val into the set IncreaseElement(S, x, val) – increase the value of element x to val BuildHeap(A) – build a heap from an array of elements
Binary heap - pointers parent ≥ child complete tree level does not indicate size all nodes in a heap are themselves heaps
Binary heap - array Store the tree level by level in an array A[1…n], with the root at A[1] Left child of A[i] is A[2i], right child is A[2i+1], parent is A[floor(i/2)] Left child of A[3]? 2*3 = 6, i.e. A[6] Parent of A[8]? floor(8/2) = 4, i.e. A[4]
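The index arithmetic for the array layout, using the 1-based indexing of the slides (root at index 1), can be sketched as three one-line helpers. The function names are assumptions for illustration.

```python
def parent(i):
    """Index of the parent of node i (1-based; the root is 1)."""
    return i // 2


def left(i):
    """Index of the left child of node i."""
    return 2 * i


def right(i):
    """Index of the right child of node i."""
    return 2 * i + 1


print(left(3), right(3), parent(8))
```

In a 0-based language array the same layout uses 2i+1, 2i+2 and (i-1)//2 instead.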
Identify the valid heaps 8 [15, 12, 3, 11, 10, 2, 1, 7, 8] [20, 18, 10, 17, 16, 15, 9, 14, 13]
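The array candidates can be checked mechanically. The sketch below stores the heap in a 0-based Python list, so the parent of index i is (i - 1) // 2; the function name `is_max_heap` is an assumption.

```python
def is_max_heap(a):
    """True if every element is no larger than its parent (0-based layout)."""
    return all(a[(i - 1) // 2] >= a[i] for i in range(1, len(a)))


print(is_max_heap([15, 12, 3, 11, 10, 2, 1, 7, 8]))
print(is_max_heap([20, 18, 10, 17, 16, 15, 9, 14, 13]))
```

The second array fails because 15 is stored as a child of 10.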
Heapify Assume left and right children are heaps, turn current set into a valid heap find out which is largest: current, left or right if a child is larger, swap and recurse
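A minimal recursive sketch of Heapify, assuming a max heap stored in a 0-based Python list (children of index i at 2i+1 and 2i+2). The optional parameter `n`, the number of list slots that belong to the heap, is an addition made for this example.

```python
def heapify(a, i, n=None):
    """Sift a[i] down until the subtree rooted at i is a max heap.

    Assumes the subtrees rooted at both children of i are already max heaps.
    """
    if n is None:
        n = len(a)
    left, right = 2 * i + 1, 2 * i + 2
    largest = i
    if left < n and a[left] > a[largest]:
        largest = left
    if right < n and a[right] > a[largest]:
        largest = right
    if largest != i:
        # A child is larger: swap it up and repair the subtree we swapped into.
        a[i], a[largest] = a[largest], a[i]
        heapify(a, largest, n)


a = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]
heapify(a, 1)    # only the 4 at index 1 violates the heap property
print(a)
```

Each call does constant work and recurses into at most one child, so the depth of the recursion is bounded by the height of the tree.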
Correctness of Heapify Remember both the children are valid heaps Three cases: Case 1: A[i] (current node) is the largest parent is greater than both children both children are heaps current node is a valid heap
Correctness of heapify Case 2: left child is the largest When Heapify returns: Left child is a valid heap Right child is unchanged and therefore a valid heap Current node is larger than both children since we selected the largest node of current, left and right current node is a valid heap Case 3: right child is largest similar to above
Running time of Heapify What is the cost of each call to Heapify? Θ(1) How many calls are made to Heapify? O(height of the tree) What is the height of the tree? Complete binary tree, except for the last level O(log n)
Binary heap - operations Maximum(S) - return the largest element in the set ExtractMax(S) – Return and remove the largest element in the set Insert(S, val) – insert val into the set IncreaseElement(S, x, val) – increase the value of element x to val BuildHeap(A) – build a heap from an array of elements
Maximum Return the largest element from the set Return A[1]
ExtractMax Return and remove the largest element in the set Save the root, move the last element of the heap into the root, shrink the heap by one, then call Heapify on the root
ExtractMax running time Constant amount of work plus one call to Heapify – O(log n)
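A sketch of ExtractMax under the same assumptions as before: a max heap in a 0-based Python list. The helper `heapify` is repeated so the block runs on its own; both function names are illustrative.

```python
def heapify(a, i):
    """Sift a[i] down in a 0-based max heap."""
    n = len(a)
    left, right = 2 * i + 1, 2 * i + 2
    largest = i
    if left < n and a[left] > a[largest]:
        largest = left
    if right < n and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        heapify(a, largest)


def extract_max(a):
    """Remove and return the largest element of the heap."""
    if not a:
        raise IndexError("heap underflow")
    largest = a[0]
    last = a.pop()          # shrink the heap by one
    if a:
        a[0] = last         # move the last element into the root
        heapify(a, 0)       # one call to Heapify restores the heap
    return largest


heap = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
print(extract_max(heap), heap)
```

Everything except the single `heapify` call is constant work.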
IncreaseElement Increase the value of element x to val
Correctness of IncreaseElement Why is it ok to swap values with parent?
Correctness of IncreaseElement Stop when heap property is satisfied
Running time of IncreaseElement Follows a path from a node to the root Worst case O(height of the tree) O(log n)
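A sketch of IncreaseElement for a max heap in a 0-based Python list, where the element is identified by its index `i`. The function name and the choice to reject smaller values are assumptions for this example.

```python
def increase_element(a, i, val):
    """Raise a[i] to val, then swap upward until the parent is at least as large."""
    if val < a[i]:
        raise ValueError("new value is smaller than the current value")
    a[i] = val
    while i > 0 and a[(i - 1) // 2] < a[i]:
        parent = (i - 1) // 2
        a[i], a[parent] = a[parent], a[i]
        i = parent


heap = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
increase_element(heap, 8, 15)    # the 4 at index 8 becomes 15
print(heap)
```

The loop follows a single path from the node toward the root and stops as soon as the heap property holds.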
Insert Insert val into the set Add val as the last element of the heap, then propagate the value up
Running time of Insert Constant amount of work plus one call to IncreaseElement – O(log n)
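Insert can be sketched as "constant work plus one call to IncreaseElement" by first appending a placeholder that is smaller than any value. Using `float("-inf")` as that placeholder is an assumption of this sketch, as are the function names; the heap is a 0-based Python list.

```python
def increase_element(a, i, val):
    """Raise a[i] to val and swap upward while the parent is smaller."""
    a[i] = val
    while i > 0 and a[(i - 1) // 2] < a[i]:
        parent = (i - 1) // 2
        a[i], a[parent] = a[parent], a[i]
        i = parent


def insert(a, val):
    """Add val to the max heap a."""
    a.append(float("-inf"))               # new last leaf, smaller than everything
    increase_element(a, len(a) - 1, val)  # propagate the real value up


heap = []
for v in [3, 1, 4, 1, 5, 9, 2, 6]:
    insert(heap, v)
print(heap[0], heap)
```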
Building a heap Can we build a heap using the functions we have so far? Maximum(S) ExtractMax(S) Insert(S, val) IncreaseElement(S, x, val)
Running time of BuildHeap1 n calls to Insert – O(n log n) Can we get a better bound? …
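A sketch of the first approach: start from an empty heap and call Insert once per element. The code assumes a 0-based Python list and writes the sift-up loop directly inside `insert`; the function names are illustrative.

```python
def insert(heap, val):
    """Append val and swap it upward while its parent is smaller."""
    heap.append(val)
    i = len(heap) - 1
    while i > 0 and heap[(i - 1) // 2] < heap[i]:
        parent = (i - 1) // 2
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent


def build_heap1(values):
    """Build a max heap with n calls to insert, using a second array."""
    heap = []
    for v in values:
        insert(heap, v)
    return heap


print(build_heap1([4, 1, 3, 2, 16, 9, 10, 14, 8, 7]))
```

Each of the n inserts may climb the full height of the heap, which gives the O(n log n) bound.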
Building a heap: take 2 Start with n/2 “simple” heaps (the leaves) call Heapify on elements floor(n/2), floor(n/2)-1, floor(n/2)-2, …, 1 all children have larger indices building from the bottom up makes sure that all the children are heaps
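The bottom-up construction can be sketched as follows for a 0-based Python list, where the last node that has a child is at index n // 2 - 1 and the root is at index 0. The function names are assumptions, and `heapify` is repeated so the block is self-contained.

```python
def heapify(a, i):
    """Sift a[i] down; assumes both child subtrees are already max heaps."""
    n = len(a)
    left, right = 2 * i + 1, 2 * i + 2
    largest = i
    if left < n and a[left] > a[largest]:
        largest = left
    if right < n and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        heapify(a, largest)


def build_heap2(a):
    """Turn the list a into a max heap in place, working bottom-up."""
    for i in range(len(a) // 2 - 1, -1, -1):
        heapify(a, i)     # children have larger indices, so they are heaps already
    return a


print(build_heap2([4, 1, 3, 2, 16, 9, 10, 14, 8, 7]))
```

No second array is needed, and only about n/2 calls to Heapify are made.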
Correctness of BuildHeap2 Invariant: elements A[i+1…n] are all heaps Base case: i = floor(n/2). All elements i+1, i+2, …, n are “simple” heaps Inductive case: We know i+1, i+2,.., n are all heaps, therefore the call to Heapify(A,i) generates a heap at node i Termination?
Running time of BuildHeap2 n/2 calls to Heapify – O(n log n) Can we get a tighter bound?
Running time of BuildHeap2 all nodes at the same level will have the same cost How many nodes are at level d? 2^d
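Nodes at height h h=0: at most ceil(n/2) nodes h=1: at most ceil(n/4) nodes h=2: at most ceil(n/8) nodes height h: at most ceil(n/2^(h+1)) nodes
Running time of BuildHeap2 Heapify on a node at height h costs O(h), so the total cost is the sum over h = 0 … floor(log n) of ceil(n/2^(h+1)) · O(h) = O(n · sum over h ≥ 0 of h/2^h) = O(n), since the sum over h ≥ 0 of h/2^h equals 2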
BuildHeap1 vs. BuildHeap2 Runtime BuildHeap1 is O(n log n) in the worst case (n calls to Insert) BuildHeap2 is O(n) and makes only n/2 calls to Heapify Memory Both O(n) BuildHeap1 requires an additional array, i.e. 2n memory Complexity/Ease of implementation
Heap uses Heapsort Build a heap Call ExtractMax for all the elements O(n log n) running time Priority queues scheduling tasks: jobs, processes, network traffic A* search algorithm
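A sketch of Heapsort in its common in-place form, assuming a 0-based Python list: build a max heap, then repeatedly swap the maximum to the end of the shrinking heap, which plays the role of ExtractMax. The function names and the extra heap-size parameter `n` are assumptions of this example.

```python
def heapify(a, i, n):
    """Sift a[i] down within the first n slots of a (0-based max heap)."""
    left, right = 2 * i + 1, 2 * i + 2
    largest = i
    if left < n and a[left] > a[largest]:
        largest = left
    if right < n and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        heapify(a, largest, n)


def heapsort(a):
    """Sort the list a in place in ascending order and return it."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):     # build the heap bottom-up
        heapify(a, i, n)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]          # move the current maximum to its final slot
        heapify(a, 0, end)                   # restore the heap on the remaining prefix
    return a


print(heapsort([4, 1, 3, 2, 16, 9, 10, 14, 8, 7]))
```

Building the heap is O(n) and each of the n extractions is O(log n), giving O(n log n) overall.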
Other heaps