Data Structures Dynamic Sets Heaps Binary Trees & Sorting Hashing
Dynamic Sets: Data structures that hold elements indexed by (usually unique) keys. They support basic operations such as: search for an element with a given key; insert a new element; delete an element; find the minimum-key element; find the maximum-key element. Elements can have unique or non-unique keys, depending on the application. Keys can be ordered (e.g., integers).
Some operations on dynamic sets
  SEARCH(S, k): given set S and key k, return x such that key(x) = k, or NIL if not found
  INSERT(S, x): augment S by adding x to it
  DELETE(S, x): delete x from S
  MINIMUM(S), MAXIMUM(S): return the element x in S with minimum (maximum) key(x)
  SUCCESSOR(S, x): given x, return y in S with minimum key(y) > key(x), or NIL if key(x) is maximum
  PREDECESSOR(S, x): given x, return y in S with maximum key(y) < key(x), or NIL if key(x) is minimum
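To make the SUCCESSOR and PREDECESSOR definitions concrete, here is a minimal Python sketch. It assumes the set is stored as a sorted Python list of keys and that the functions take a key rather than an element; both choices, and the function names, are illustrative assumptions and not part of the slides.

```python
import bisect

def successor(keys, k):
    """Return the smallest key > k in the sorted list `keys`, or None (NIL)."""
    i = bisect.bisect_right(keys, k)   # first position whose key is > k
    return keys[i] if i < len(keys) else None

def predecessor(keys, k):
    """Return the largest key < k in the sorted list `keys`, or None (NIL)."""
    i = bisect.bisect_left(keys, k)    # first position whose key is >= k
    return keys[i - 1] if i > 0 else None

keys = [1, 4, 6, 9, 16]                # hypothetical example set
print(successor(keys, 6))              # 9
print(predecessor(keys, 6))            # 4
print(successor(keys, 16))             # None, since 16 is the maximum
```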
Data Structures for Sets: Several data structures can support sets: arrays, linked lists, trees, heaps, hash tables, etc. Depending on the required set of operations, and on their frequency, different data structures are preferable.
Example: Linked Lists. A list L consists of a head, head[L], and a set of values. Every list ends with NIL. Lists can additionally point to objects indexed by the keys. [Figure: head[L] pointing to the list 9, 16, 4, 1, NIL]
Variants of Linked Lists: simple; doubly linked; circular. Notice the sentinel NIL. [Figure: the three list variants, showing head[L] and the sentinel nil[L]]
Implementing Sets as Lists
  HEAD(L): return nil[L].next
  TAIL(L): return nil[L].prev
  INSERT(L, x)
    x.next = HEAD(L)
    HEAD(L).prev = x
    nil[L].next = x
    x.prev = nil[L]
Implementing Sets as Lists
  LIST-DELETE(L, x)
    x.prev.next = x.next
    x.next.prev = x.prev
  LIST-SEARCH(L, k)
    x = HEAD(L)
    while x != nil[L] and x.key != k
      x = x.next
    return x
How fast do INSERT and DELETE run? How fast does LIST-SEARCH run?
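The list operations above can be sketched in Python as a circular doubly linked list with a sentinel. The class names Node and SentinelList and the helper keys() are assumptions made for this illustration; search returns None where the pseudocode returns the sentinel.

```python
class Node:
    def __init__(self, key=None):
        self.key = key
        self.prev = self   # a fresh node points at itself
        self.next = self

class SentinelList:
    def __init__(self):
        self.nil = Node()  # sentinel: nil.next is the head, nil.prev is the tail

    def head(self):
        return self.nil.next

    def tail(self):
        return self.nil.prev

    def insert(self, x):
        """Insert node x at the front in constant time."""
        x.next = self.nil.next
        self.nil.next.prev = x
        self.nil.next = x
        x.prev = self.nil

    def delete(self, x):
        """Unlink node x in constant time (x must be in the list)."""
        x.prev.next = x.next
        x.next.prev = x.prev

    def search(self, k):
        """Linear scan: O(N) in the worst case. Returns None if k is absent."""
        x = self.nil.next
        while x is not self.nil and x.key != k:
            x = x.next
        return None if x is self.nil else x

    def keys(self):
        out, x = [], self.nil.next
        while x is not self.nil:
            out.append(x.key)
            x = x.next
        return out

L = SentinelList()
for k in (1, 4, 16, 9):
    L.insert(Node(k))
print(L.keys())            # [9, 16, 4, 1]
L.delete(L.search(16))
print(L.keys())            # [9, 4, 1]
```

Because the sentinel stands in for both ends of the list, insert and delete need no special cases for an empty list or for the head and tail.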
Sorted Lists
  INSERT x: search for y such that y.key ≤ x.key ≤ y.next.key, then insert x between y and y.next
  DELETE x: same as unsorted lists
  MINIMUM: return HEAD(L)
  MAXIMUM: return TAIL(L)
  LIST-SEARCH x: same as unsorted lists
  EXTRACT-MAX: DELETE the MAXIMUM element and return it
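A rough Python sketch of the sorted INSERT follows, assuming an ascending circular list with a sentinel. The names Node, sorted_insert, and to_keys are hypothetical helpers for this example only.

```python
class Node:
    def __init__(self, key=None):
        self.key = key
        self.prev = self
        self.next = self

def sorted_insert(nil, x):
    """Insert node x into the ascending circular list whose sentinel is nil."""
    y = nil
    # Walk forward while the next real node has a key smaller than x's: O(N).
    while y.next is not nil and y.next.key < x.key:
        y = y.next
    # Splice x between y and y.next.
    x.next = y.next
    x.prev = y
    y.next.prev = x
    y.next = x

def to_keys(nil):
    out, x = [], nil.next
    while x is not nil:
        out.append(x.key)
        x = x.next
    return out

nil = Node()                           # empty list: the sentinel points at itself
for k in (9, 1, 16, 4):
    sorted_insert(nil, Node(k))
print(to_keys(nil))                    # [1, 4, 9, 16]
print(nil.next.key, nil.prev.key)      # 1 16  (MINIMUM and MAXIMUM in constant time)
```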
Running Times of Basic Operations
                UNSORTED LIST   SORTED LIST
  INSERT        constant        O(N)
  DELETE        constant        constant
  SEARCH        O(N)            O(N)
  MINIMUM       O(N)            constant
  MAXIMUM       O(N)            constant
  EXTRACT-MAX   O(N)            constant
Heaps Heap: a very efficient binary tree data structure Heap operations INSERT DELETE EXTRACT-MAX Heapsort A priority queue based on heaps Heap operations DECREASE-KEY
Heaps: Definition. A heap is an array A[1, …, length(A)]. heap_size(A) ≤ length(A) is the size of the heap. A[1], …, A[heap_size(A)] are the elements in the heap. [Figure: array A, with the contents of the heap in positions 1 to heap_size(A), out of length(A) positions]
Heaps: Definition. A heap is also a binary tree: PARENT(i): return ⌊i/2⌋; LEFT(i): return 2i; RIGHT(i): return 2i + 1
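The index arithmetic can be tried out directly. This sketch assumes the usual 1-indexed layout, with PARENT(i) = floor(i/2), LEFT(i) = 2i, and RIGHT(i) = 2i + 1; the sample array and the unused slot at index 0 are illustrative assumptions.

```python
def parent(i):
    return i // 2        # floor(i / 2)

def left(i):
    return 2 * i

def right(i):
    return 2 * i + 1

# Hypothetical heap; index 0 is an unused placeholder so that A[1] is the root.
A = [None, 16, 14, 10, 8, 7, 9, 3]
i = 2
print(A[parent(i)], A[left(i)], A[right(i)])   # 16 8 7
```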
Heaps: Definition. The heap property: for every i with 1 < i ≤ heap_size(A), A[PARENT(i)] ≥ A[i]
Maintaining the Heap Property. Example: the heap property is violated at the root, but the left and right subtrees are heaps. HEAPIFY(A, 1) will fix this.
Maintaining the Heap Property. HEAPIFY: propagate the problem down. At each step, exchange the problem node with its largest child.
Maintaining the Heap Property
  HEAPIFY(A, i)
    l = LEFT(i)
    r = RIGHT(i)
    if l ≤ heap_size(A) and A[l] > A[i]
      then max = l
      else max = i
    if r ≤ heap_size(A) and A[r] > A[max]
      then max = r
    if max != i
      then exchange(A[i], A[max])
           HEAPIFY(A, max)
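A possible Python rendering of HEAPIFY is sketched below. It assumes a 1-indexed array (A[0] is an unused placeholder) and passes heap_size as an explicit argument instead of storing it on the array; the sample data is hypothetical.

```python
def heapify(A, i, heap_size):
    """Sift A[i] down until the subtree rooted at i is a max-heap.

    Assumes the subtrees rooted at 2i and 2i + 1 are already max-heaps.
    """
    l, r = 2 * i, 2 * i + 1
    largest = i
    if l <= heap_size and A[l] > A[largest]:
        largest = l
    if r <= heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]   # move the problem one level down
        heapify(A, largest, heap_size)

# The heap property is violated only at the root (4 is smaller than its children).
A = [None, 4, 16, 10, 14, 7, 9, 3, 2, 8, 1]
heapify(A, 1, 10)
print(A[1:])   # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```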
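Heaps are balanced
Claim: The size of each child heap is < 2N/3, where N is the size of the parent heap.
Proof: Let N be the parent heap size and A, B the child heap sizes. Each child has k full levels, plus a (left child) and b (right child) nodes in the bottom level:
  $A = 1 + \dots + 2^{k-1} + a$, $B = 1 + \dots + 2^{k-1} + b$
  $N = 1 + 2 + \dots + 2^{k} + a + b = 1 + A + B$
Worst case: $a = 2^{k}$, $b = 0$. Then
  $B = 1 + \dots + 2^{k-1} = 2^{k} - 1$
  $A = 1 + \dots + 2^{k} = 2B + 1$
  $N = 3B + 2$
  $A/N = [2(B+1) - 1] / [3(B+1) - 1] < 2(B+1) / 3(B+1) = 2/3$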
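As a sanity check on the claim (not a replacement for the proof), the following sketch counts the nodes in the left child heap for small heap sizes and confirms the strict 2/3 bound. It assumes the 1-indexed array layout; subtree_size is a hypothetical helper, and the upper limit of 300 is arbitrary.

```python
def subtree_size(i, n):
    """Number of nodes in the subtree rooted at index i of an n-node heap."""
    if i > n:
        return 0
    return 1 + subtree_size(2 * i, n) + subtree_size(2 * i + 1, n)

# The left child (index 2) is never smaller than the right child,
# so checking it covers both children.
for n in range(2, 300):
    assert 3 * subtree_size(2, n) < 2 * n   # i.e. size < 2N/3

# Worst case from the proof with k = 1: B = 1, A = 3, N = 5.
print(subtree_size(2, 5), subtree_size(3, 5))   # 3 1
```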
HEAPIFY runs in time O(log n)
$T(n) \le T(2n/3) + \Theta(1)$, so $T(n) = O(\log n)$ by the Master Theorem
[[ Case 2: $T(n) = 1 \cdot T(n/(3/2)) + c$, with $f(n) = c$, $a = 1$, $b = 3/2$; $n^{\log_b a} = n^{\log_{3/2} 1} = n^{0} = 1$; $f(n) = \Theta(n^{\log_b a})$, therefore $T(n) = \Theta(n^{\log_b a} \log n) = \Theta(\log n)$ ]]
Alternatively, T(n) = O(h), where h is the height of the heap
Building a Heap. Build a heap starting from an unordered array. The leaves are already heaps of size 1. Go up one by one to the root, fixing the heap property by running HEAPIFY. Each HEAPIFY takes time O(height).
Building a Heap
  BUILD-HEAP(A)
    heap_size(A) = length(A)
    for i = ⌊length(A)/2⌋ downto 1
      HEAPIFY(A, i)
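BUILD-HEAP might look as follows in Python. This is a sketch under the same assumptions as before: a 1-indexed list with an unused slot at index 0, heap_size passed explicitly, and hypothetical sample data.

```python
def heapify(A, i, heap_size):
    """Sift A[i] down within A[1..heap_size] (1-indexed max-heap)."""
    l, r = 2 * i, 2 * i + 1
    largest = i
    if l <= heap_size and A[l] > A[largest]:
        largest = l
    if r <= heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        heapify(A, largest, heap_size)

def build_heap(A):
    """Turn A[1..n] into a max-heap in place and return n."""
    n = len(A) - 1
    # Indices n//2 + 1 .. n are leaves, so start at the last internal node.
    for i in range(n // 2, 0, -1):
        heapify(A, i, n)
    return n

A = [None, 4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
build_heap(A)
print(A[1:])   # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```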
Running Time of BUILD-HEAP. How fast does BUILD-HEAP run? Here is a bound: at most N = heap_size(A) calls to HEAPIFY; each call takes O(log N) time; therefore, the running time is O(N log N). Are we done?
Two lemmas left as exercises
Lemma 1: An N-element heap has height $\lfloor \lg N \rfloor$
Lemma 2: An N-element heap has at most $\lceil N / 2^{h+1} \rceil$ nodes of height h
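Running Time of BUILD-HEAP
Each HEAPIFY takes time O(h), and HEAPIFY is called at most once per node:
  $T(N) = \sum_{h=0}^{\lfloor \lg N \rfloor} \lceil N / 2^{h+1} \rceil \, O(h) \le O\left( N \sum_{h=0}^{\lfloor \lg N \rfloor} h / 2^{h} \right)$
  $\sum_{h=0}^{\lfloor \lg N \rfloor} h / 2^{h} < \sum_{h=0}^{\infty} h \, (1/2)^{h} = (1/2) / (1 - 1/2)^{2} = 2$, by [A.8]
Therefore T(N) = O(2N) = O(N)!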
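The linear bound can be checked numerically. Since each HEAPIFY call costs time proportional to the height of its node, the sketch below sums the node heights of an N-node heap and confirms the total stays within N. The function total_height is a hypothetical helper, and the 1-indexed layout is assumed.

```python
def total_height(n):
    """Sum of node heights in an n-node heap stored in A[1..n]."""
    total = 0
    for i in range(1, n + 1):
        h, j = 0, i
        # In a heap the leftmost path from a node is a longest path to a leaf.
        while 2 * j <= n:
            j *= 2
            h += 1
        total += h
    return total

print(total_height(10))   # 8
for n in range(1, 2000):
    assert total_height(n) <= n   # linear, well within the O(2N) bound
```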
Heapsort
  HEAPSORT(A)
    BUILD-HEAP(A)
    for i = length(A) downto 2
      exchange(A[1], A[i])
      heap_size(A)--
      HEAPIFY(A, 1)
Heapsort: running time? BUILD-HEAP takes O(N), and each of the N − 1 loop iterations makes one HEAPIFY call costing O(log N), so HEAPSORT runs in O(N log N).
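For reference, here is one way HEAPSORT could be written in Python. It is a sketch that assumes a 1-indexed list with an unused slot at index 0 and tracks heap_size in a local variable; the input data is hypothetical.

```python
def heapify(A, i, heap_size):
    """Sift A[i] down within A[1..heap_size] (1-indexed max-heap)."""
    l, r = 2 * i, 2 * i + 1
    largest = i
    if l <= heap_size and A[l] > A[largest]:
        largest = l
    if r <= heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        heapify(A, largest, heap_size)

def heapsort(A):
    """Sort A[1..n] in ascending order, in place."""
    n = len(A) - 1
    for i in range(n // 2, 0, -1):     # BUILD-HEAP
        heapify(A, i, n)
    heap_size = n
    for i in range(n, 1, -1):
        A[1], A[i] = A[i], A[1]        # move the current maximum to its final slot
        heap_size -= 1                 # the sorted suffix is no longer part of the heap
        heapify(A, 1, heap_size)

A = [None, 4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
heapsort(A)
print(A[1:])   # [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]
```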
Priority Queues
A priority queue S is a data structure supporting:
  MAXIMUM(S): returns the maximum key in S
  EXTRACT-MAX(S): removes the maximum key in S and returns it
  INCREASE-KEY(S, x_ptr, x_new): increases the key x_old, stored at x_ptr, to x_new > x_old
  INSERT(S, x): S = S ∪ {x}
Heaps as Priority Queues
Heaps can be efficient priority queues
MAXIMUM is implemented in O(1)
  MAXIMUM(A)
    return A[1]
EXTRACT-MAX is implemented in ??
  EXTRACT-MAX(A)
    if heap_size(A) < 1
      then error(“underflow”)
    max = A[1]
    A[1] = A[heap_size(A)]
    heap_size(A)--
    HEAPIFY(A, 1)
    return max
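MAXIMUM and EXTRACT-MAX could be sketched in Python as below. The sketch assumes the heap occupies the whole 1-indexed list (so heap_size is len(A) - 1, with index 0 unused) and shrinks the list with pop() instead of decrementing a separate counter; those choices are assumptions of this example.

```python
def heapify(A, i, heap_size):
    """Sift A[i] down within A[1..heap_size] (1-indexed max-heap)."""
    l, r = 2 * i, 2 * i + 1
    largest = i
    if l <= heap_size and A[l] > A[largest]:
        largest = l
    if r <= heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        heapify(A, largest, heap_size)

def maximum(A):
    return A[1]                        # the root holds the largest key

def extract_max(A):
    heap_size = len(A) - 1
    if heap_size < 1:
        raise IndexError("heap underflow")
    top = A[1]
    A[1] = A[heap_size]                # move the last leaf to the root
    A.pop()                            # heap_size shrinks by one
    heapify(A, 1, heap_size - 1)       # one sift-down restores the heap property
    return top

A = [None, 16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
print(extract_max(A))   # 16
print(A[1:])            # [14, 8, 10, 4, 7, 9, 3, 2, 1]
```

The only non-constant step is the single HEAPIFY call, so the cost is bounded by the height of the heap.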
Heaps as Priority Queues
INCREASE-KEY is implemented in O(log n)
  INCREASE-KEY(A, i, key)
    if key < A[i]
      then error(“key too small”)
    A[i] = key
    while i > 1 and A[PARENT(i)] < A[i]
      exchange(A[i], A[PARENT(i)])
      i = PARENT(i)
Example of INCREASE-KEY A INCREASE-KEY(A, i, key) if key < A[i] then error(“key too small”) A[i] = key while i > 1 and A[PARENT(i)] < A[i] exchange (A[i], A[PARENT(i)]) i = PARENT(i)
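Since the worked figure did not survive, here is a small Python sketch of INCREASE-KEY with a concrete run. The 1-indexed list with an unused slot at index 0 and the sample heap are assumptions of this example, not the data from the original figure.

```python
def increase_key(A, i, key):
    """Raise A[i] to key and restore the max-heap property by sifting up."""
    if key < A[i]:
        raise ValueError("new key is smaller than the current key")
    A[i] = key
    # Swap with the parent while the parent is smaller: at most one swap per level.
    while i > 1 and A[i // 2] < A[i]:
        A[i], A[i // 2] = A[i // 2], A[i]
        i //= 2

A = [None, 16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
increase_key(A, 9, 15)    # the key 4 at index 9 becomes 15
print(A[1:])              # [16, 15, 10, 14, 7, 9, 3, 2, 8, 1]
```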
Heaps as Priority Queues
INSERT is implemented in O(log n)
  INSERT(A, key)
    heap_size(A)++
    A[heap_size(A)] = -Infinity
    INCREASE-KEY(A, heap_size(A), key)
[Figure: example of INSERT with key 11]
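INSERT can be sketched in Python by appending a minus-infinity placeholder and then calling INCREASE-KEY. As in the other sketches, the list is 1-indexed with an unused slot at index 0 and heap_size is taken to be len(A) - 1; the starting heap is hypothetical, with 11 as the inserted key.

```python
import math

def increase_key(A, i, key):
    """Raise A[i] to key and sift it up toward the root."""
    if key < A[i]:
        raise ValueError("new key is smaller than the current key")
    A[i] = key
    while i > 1 and A[i // 2] < A[i]:
        A[i], A[i // 2] = A[i // 2], A[i]
        i //= 2

def insert(A, key):
    A.append(-math.inf)                 # new last leaf, smaller than any key
    increase_key(A, len(A) - 1, key)    # raise it to key and sift up

A = [None, 16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
insert(A, 11)
print(A[1:])   # [16, 14, 10, 8, 11, 9, 3, 2, 4, 1, 7]
```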
Summary
                UNSORTED LIST   SORTED LIST   HEAP
  INSERT        c               O(N)          O(log N)
  DELETE        c               c             O(log N)
  SEARCH        O(N)            O(N)          O(N)
  MAXIMUM       O(N)            c             c
  EXTRACT-MAX   O(N)            c             O(log N)
  INCREASE-KEY  c               O(N)          O(log N)
(c = constant time)