1 Jeff Edmonds, York University. COSC 2011, Lecture 6: Balanced Trees. Outline: Dictionary/Map ADT, Binary Search Trees, Insertions and Deletions, AVL Trees, Rebalancing AVL Trees, Union-Find Partition, Heaps & Priority Queues, Communication & Huffman Codes, (Splay Trees).

2 Dictionary/Map ADT. Problem: Store value/data associated with keys. Input: ⟨key, value⟩ pairs ⟨k1,v1⟩, ⟨k2,v2⟩, ⟨k3,v3⟩, ⟨k4,v4⟩. Examples: key = word, value = definition; key = social insurance number, value = person's data.

3 Dictionary/Map ADT. Problem: Store value/data associated with keys. Input: ⟨k1,v1⟩, ⟨k2,v2⟩, ⟨k3,v3⟩, ⟨k4,v4⟩; now insert ⟨k5,v5⟩ at the end.
Implementations:
                   Insert   Search
  Unordered Array  O(1)     O(n)

4 Dictionary/Map ADT. Problem: Store value/data associated with keys. Input: ⟨2,v3⟩, ⟨4,v4⟩, ⟨7,v1⟩, ⟨9,v2⟩; now insert ⟨6,v5⟩ in order.
Implementations:
                   Insert   Search
  Unordered Array  O(1)     O(n)
  Ordered Array    O(n)     O(log n)

5 Dictionary/Map ADT. Problem: Store value/data associated with keys. (Figure: a doubly linked list with header and trailer sentinels; nodes/positions hold the entries.)
Implementations:
                       Insert   Search
  Unordered Array      O(1)     O(n)
  Ordered Array        O(n)     O(log n)
  Ordered Linked List  O(n)     O(n)
Inserting into the linked list is O(1) if you have the spot, but O(n) to find the spot.

6 Dictionary/Map ADT. Problem: Store value/data associated with keys.
Implementations:
                      Insert     Search
  Unordered Array     O(1)       O(n)
  Ordered Array       O(n)       O(log n)
  Binary Search Tree  O(log n)   O(log n)

7 Dictionary/Map ADT. Problem: Store value/data associated with keys.
Implementations:
                      Insert              Search     Max
  Unordered Array     O(1)                O(n)
  Ordered Array       O(n)                O(log n)
  Binary Search Tree  O(log n)            O(log n)
  Heaps               O(log n) (faster)   O(n)       O(1)
Heaps are good for Priority Queues.

8 Dictionary/Map ADT. Problem: Store value/data associated with keys.
Implementations:
                      Insert          Search          Next in Order
  Unordered Array     O(1)            O(n)
  Ordered Array       O(n)            O(log n)        O(1)
  Binary Search Tree  O(log n)        O(log n)        O(log n)
  Heaps               O(log n)        O(n)            Max O(1)
  Hash Tables         O(1) (avg)      O(1) (avg)      O(n)
Hash Tables are very fast, but keys have no order.

9 Summary:
                   Search     Insert     Delete     Find Max   Find Next in Order
  Unsorted List    O(n)       O(1)       O(n)       O(n)       O(n)
  Sorted List      O(log n)   O(n)       O(n)       O(1)       O(1)
  Balanced Trees   O(log n)   O(log n)   O(log n)   O(log n)   O(log n)
  Splay Trees      O(log n) amortized for all operations (worst case O(n); better in practice)
  Heap             O(n)       O(log n)   O(log n)   O(1)       O(n)       (Priority Queue)
  Hash Tables      O(1)       O(1)       O(1)       O(n)       O(n)       (Dictionary)
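
To make the comparison concrete, here is a minimal Java sketch of the Dictionary/Map ADT being implemented above (the interface and method names are illustrative assumptions, not the course's actual API):

  // A hypothetical Dictionary/Map ADT; each implementation above trades off
  // the costs of these operations differently.
  public interface Dictionary<K extends Comparable<K>, V> {
      void insert(K key, V value);  // store value under key
      V search(K key);              // return the value for key, or null if absent
      V delete(K key);              // remove key and return its value, or null if absent
  }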

I learned AVL trees from slides from Andy Mirzaian and James Elder and then reworked them

11 From binary search to Binary Search Trees.

Binary Search Tree: All nodes in left subtree ≤ Any node ≤ All nodes in right subtree.

Iterative Algorithm. Move down the tree. Loop Invariant: If the key is contained in the original tree, then the key is contained in the subtree rooted at the current node.
  Algorithm TreeSearch(k, v):
    v = T.root()
    loop:
      if T.isExternal(v): return "not there"
      if k < key(v): v = T.left(v)
      else if k = key(v): return v
      else { k > key(v) }: v = T.right(v)
    end loop

Recursive Algorithm. If the key is not at the root, ask a friend to look for it in the appropriate subtree.
  Algorithm TreeSearch(k, v):
    if T.isExternal(v): return "not there"
    if k < key(v): return TreeSearch(k, T.left(v))
    else if k = key(v): return v
    else { k > key(v) }: return TreeSearch(k, T.right(v))
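
A runnable Java sketch of both searches, assuming a bare-bones Node class (a hypothetical helper; null plays the role of an external node):

  class Node {
      int key;
      Node left, right;
      Node(int key) { this.key = key; }
  }

  class BST {
      Node root;

      // Iterative search. Loop invariant: if the key is in the original tree,
      // it is in the subtree rooted at v.
      Node search(int k) {
          Node v = root;
          while (v != null) {               // v == null means "not there"
              if (k < v.key) v = v.left;
              else if (k > v.key) v = v.right;
              else return v;                // k == v.key
          }
          return null;
      }

      // Recursive search: ask a "friend" to look in the appropriate subtree.
      Node search(Node v, int k) {
          if (v == null) return null;
          if (k < v.key) return search(v.left, k);
          if (k > v.key) return search(v.right, k);
          return v;
      }
  }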

Insertions/Deletions. To insert(key, data): we search for key; not being there, we end up in an empty tree; insert the key there. (Example: Insert 10.)

Insertions/Deletions. To Delete(key_del, data): if the node does not have two children, point its one child at its parent. (Example: Delete 4.)

Insertions/Deletions. To Delete(key_del, data) when the node has two children: find the next key in order, key_next (go right once, then left, left, left, ... until reaching an empty tree). Replace key_del with key_next, then point key_next's one child at its parent. (Example: Delete 3.)
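
A Java sketch of this deletion rule, reusing the hypothetical Node class from the search sketch (returns the new root of the subtree):

  Node delete(Node v, int k) {
      if (v == null) return null;                  // key not found
      if (k < v.key) { v.left = delete(v.left, k); return v; }
      if (k > v.key) { v.right = delete(v.right, k); return v; }
      // Found key_del at v.
      if (v.left == null) return v.right;          // at most one child:
      if (v.right == null) return v.left;          // point it at the parent
      // Two children: key_next = leftmost key of the right subtree
      // (right once, then left, left, ... to the end).
      Node next = v.right;
      while (next.left != null) next = next.left;
      v.key = next.key;                            // replace key_del with key_next
      v.right = delete(v.right, next.key);         // splice out key_next's old node
      return v;
  }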

Performance: find, insert and remove take O(height) time. In a balanced tree, the height is O(log n); in the worst case, it is O(n). Thus it is worthwhile to balance the tree (next topic)!

AVL Trees. The AVL tree is the first balanced binary search tree ever invented. It is named after its two inventors, G.M. Adelson-Velskii and E.M. Landis, who published it in their 1962 paper "An algorithm for the organization of information."

AVL trees are "mostly" balanced. height(v) = height of the subtree rooted at v. balanceFactor(v) = height(rightChild(v)) - height(leftChild(v)). A tree is said to be an AVL Tree if and only if the heights of siblings differ by at most 1, i.e. ∀v: balanceFactor(v) ∈ {-1, 0, +1}. (Figure: subtree heights 2 and 3 give balanceFactor = 2-3 = -1.)
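
These definitions translate directly to Java; a sketch using the Node class from earlier (heights recomputed recursively here for clarity, whereas real AVL code caches a height field in each node):

  int height(Node v) {
      if (v == null) return 0;  // empty subtree
      return 1 + Math.max(height(v.left), height(v.right));
  }

  int balanceFactor(Node v) {
      // AVL invariant: this must lie in {-1, 0, +1} for every node v.
      return height(v.right) - height(v.left);
  }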

Height of an AVL Tree. Claim: The height of an AVL tree storing n keys is O(log n). Proof: Let N(h) be the minimum number of nodes of an AVL tree of height h. Observe that N(0) = 0 and N(1) = 1. For h ≥ 2, the minimal AVL tree contains the root node, one minimal AVL subtree of height h-1 (at least one subtree must have height h-1, and balanceFactor = (h-1)-(h-2) ≤ 1 allows the other to be h-2), and another of height h-2. That is, N(h) = 1 + N(h-1) + N(h-2) > N(h-1) + N(h-2) = Fibonacci(h) ≈ 1.62^h. So n ≥ 1.62^h, giving h ≤ log(n)/log(1.62) ≈ 4.78 log(n). Thus the height of an AVL tree is O(log n).

Rebalancing  Changes heights/balanceFactors  Subtree [..,5] raises up one  Subtree [5,10] height does not change  Subtree [10,..] lowers one  Does not change Binary Tree Ordering.

Rebalancing after an Insertion. Inserting new leaf 2 into an AVL tree may create an imbalance - Problem! The balanceFactor reaches ±2 and it is no longer an AVL Tree. rotateR(7) rebalances it into an AVL tree again, with every balanceFactor back in {-1, 0, +1}. (Figure: subtrees T1, T2, T3.)

Rebalancing after an Insertion. Try another example: inserting new leaf 6 into the AVL tree. After rotateR(7), balanceFactor = -2. Oops! Not an AVL Tree - a single rotation does not fix this case.

Rebalancing after an Insertion. When a node has balanceFactor ∈ {-2, +2} there are 6 cases. Two are easier: a single rotation such as rotateR(7) restores balanceFactor ∈ {-1, 0, +1}.

There are 6 cases. Two are easier. Half are symmetrically the same. This leaves two. (Figures: the four symmetric variants with nodes x, y, z and subtrees T0..T3; each has height h with subtrees of heights h-1, h-2, h-3, where one of x's subtrees is h-3 and one is h-4.)

Rebalancing after an Insertion  Inserting new leaf 2 in to AVL tree may create an imbalance in path from leaf to root Problem! Increases heights along path from leaf to root. 2 balanceFactor

Rebalancing after an Insertion  The repair strategy called trinode restructuring Problem! 2 +2  Denote  z = the lowest imbalanced node  y = the child of z with highest subtree  x = the child of y with highest subtree

Rebalancing after an Insertion. The repair strategy called trinode restructuring. Defn: h = height(z). Since y is the child of z with the highest subtree, at least one of z's subtrees has height h-1.

Rebalancing after an Insertion. The repair strategy called trinode restructuring. Defn: h = height(z). Assume z = the lowest imbalanced node, so balanceFactor(z) ∈ {-2, +2} (it was in {-1, 0, +1} before and only got one worse with the insertion), while every node below z has balanceFactor ∈ {-1, 0, +1}. By way of symmetry assume balanceFactor(z) = 2. x = the child of y with the highest subtree. Cases on balanceFactor(y): +1 we do now; -1 we will do later; 0 is the same as -1. (Figure heights: h-1, h-2, h-3.)

Rebalancing after an Insertion. The repair strategy called trinode restructuring: rotateR(z). Since y ≤ z, the BST ordering T1 ≤ y ≤ T2 ≤ z ≤ T3 is preserved, and the resulting subtree (root y, subtrees of heights h-2 and h-3, total height h-1) is balanced.
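
A Java sketch of rotateR on the Node class from earlier: only the middle subtree T2 changes parents, which is exactly what keeps the ordering T1 ≤ y ≤ T2 ≤ z ≤ T3 intact:

  Node rotateR(Node z) {
      Node y = z.left;
      z.left = y.right;  // T2 re-attaches under z
      y.right = z;
      return y;          // y is the new root of this subtree
  }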

Rebalancing after an Insertion. Is the rest of the tree sufficiently balanced to make it an AVL tree? Before the insert it was. The insert made this subtree one higher; our restructuring brought it back to the original height. Hence the whole tree is an AVL Tree.

Rebalancing after an Insertion  Try another example. Inserting new leaf 6 in to AVL tree

Rebalancing after an Insertion. Try another example: inserting new leaf 6 into the AVL tree, then rotateR(z). Now balanceFactor = 1-3 = -2. Oops! Not an AVL Tree.

Rebalancing after an Insertion. Inserting new leaf 6 increases heights along the path from the leaf to the root - Problem!

Rebalancing after an Insertion. The repair strategy called trinode restructuring. Denote: z = the lowest imbalanced node; y = the child of z with the highest subtree; x = the child of y with the highest subtree.

Rebalancing after an Insertion. The repair strategy called trinode restructuring. Defn: h = height(z). z = the lowest imbalanced node (balanceFactor +2); y = the child of z with the highest subtree; now assume the second case, balanceFactor(y) = -1; x = the child of y with the highest subtree. (Figure: subtrees T1..T4 with heights h-1, h-2, h-3; of x's two subtrees, one is h-3 and one maybe h-4.)

Rebalancing after an Insertion. The repair strategy called trinode restructuring: first rotateL(y). Since y ≤ x ≤ z, the BST ordering is preserved.

Rebalancing after an Insertion. The repair strategy called trinode restructuring: rotateL(y) and then rotateR(z), with the ordering y ≤ x ≤ z preserved throughout.

Rebalancing after an Insertion. After rotateL(y) and rotateR(z), x is the root of the subtree, of height h-1, with the ordering T1 ≤ y ≤ T2 ≤ x ≤ T3 ≤ z ≤ T4 preserved. This subtree is balanced, and shorter by one, i.e. back to its height before the insertion. Hence the whole tree is an AVL Tree.
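
A Java sketch of this double rotation for the case shown (y = z.left, x = y.right; the mirror case swaps lefts and rights), reusing rotateR from the single-rotation sketch:

  // Mirror of rotateR: x = y.right becomes the root of the subtree.
  Node rotateL(Node y) {
      Node x = y.right;
      y.right = x.left;  // x's old left subtree re-attaches under y
      x.left = y;
      return x;
  }

  Node restructure(Node z) {
      z.left = rotateL(z.left);  // first rotation straightens the zig-zag
      return rotateR(z);         // second rotation lifts x to the subtree root
  }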

Rebalancing after an Insertion Example: Insert 12

Rebalancing after an Insertion. Example: Insert 12. Step 1.1: top-down search, ending at external node w.

Rebalancing after an Insertion. Example: Insert 12. The new node is placed at w.

Rebalancing after an Insertion. Example: Insert 12. An imbalance is detected on the path back up from w.

Rebalancing after an Insertion. Example: Insert 12. Step 2.2: trinode x, y, z discovered (needs double rotation).

Rebalancing after an Insertion. Example: Insert 12. Step 2.3: trinode restructured; balance restored. DONE!

Rebalancing after a deletion  Very similar to before.  Unfortunately, trinode restructuring may reduce the height of the subtree, causing another imbalance further up the tree.  Thus this search and repair process must in the worst case be repeated until we reach the root.  See text for implementation.

Running Times for AVL Trees.
- A single restructure is O(1), using a linked-structure binary tree.
- find is O(log n): the height of the tree is O(log n), and no restructures are needed.
- insert is O(log n): the initial find is O(log n); restructuring is O(1).
- remove is O(log n): the initial find is O(log n); restructuring up the tree, maintaining heights, is O(log n).

Other Similar Balanced Trees.
- Red-Black Trees: balanced because of rules about red and black nodes.
- (2,4) Trees: balanced by having between 2 and 4 children per node.
- Splay Trees: move recently used nodes to the root.

50 Union-Find Partition Structures. (Slides: Andy Mirzaian; Last Update: Dec 4, 2014.)

51 Partitions with Union-Find Operations.
- makeSet(x): Create a singleton set containing the element x and return the position storing x in this set.
- union(A,B): Return the set A ∪ B, destroying the old A and B.
- find(p): Return the set containing the element at position p.

52 List-based Implementation. Each set is stored in a sequence represented with a linked list. Each node should store an object containing the element and a reference to the set name.

53 Analysis of List-based Representation. When doing a union, always move elements from the smaller set to the larger set. Each time an element is moved, it goes to a set of size at least double its old set. Thus, an element can be moved at most O(log n) times, and the total time needed to do n unions and finds is O(n log n).

54 Tree-based Implementation. Each set is stored as a rooted tree of its elements: each element points to its parent. The root is the "name" of the set. (Example figure: the sets "1", "2", and "5".)

55 Union-Find Operations. To do a union, simply make the root of one tree point to the root of the other. To do a find, follow set-name pointers from the starting node until reaching a node whose set-name pointer refers back to itself.

56 Union-Find Heuristic 1. Union by size: when performing a union, make the root of the smaller tree point to the root of the larger. This implies O(n log n) time for performing n union-find operations: each time we follow a pointer, we are going to a subtree of size at least double the size of the previous subtree, so we follow at most O(log n) pointers for any find.

57 Union-Find Heuristic 2. Path compression: after performing a find, compress all the pointers on the path just traversed so that they all point to the root. This implies O(n log* n) time for performing n union-find operations. [The proof is somewhat involved and is covered in EECS 4101.]
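
A self-contained Java sketch combining both heuristics, using array indices 0..n-1 as the positions (a simplifying assumption; the course code may use node objects instead):

  class UnionFind {
      private final int[] parent, size;

      UnionFind(int n) {               // makeSet for each of 0..n-1
          parent = new int[n];
          size = new int[n];
          for (int i = 0; i < n; i++) { parent[i] = i; size[i] = 1; }
      }

      int find(int p) {
          int root = p;
          while (parent[root] != root) root = parent[root];
          while (parent[p] != root) {  // path compression: repoint the
              int next = parent[p];    // whole path directly at the root
              parent[p] = root;
              p = next;
          }
          return root;
      }

      void union(int a, int b) {       // union by size: small under large
          int ra = find(a), rb = find(b);
          if (ra == rb) return;
          if (size[ra] < size[rb]) { int t = ra; ra = rb; rb = t; }
          parent[rb] = ra;
          size[ra] += size[rb];
      }
  }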

58 Java Implementation. (The code figure was not transcribed; the Union-Find sketch above covers the same ground.)

59 Heaps, Heap Sort, & Priority Queues J. W. J. Williams, 1964

60 Abstract Data Types. Restricted Data Structure: sometimes we limit what operations can be done, for efficiency and understanding. Stack: a list, but elements can only be pushed onto and popped from the top. Queue: a list, but elements can only be added at the end and removed from the front; important in handling jobs. Priority Queue: the "highest priority" element is handled next.

61 Priority Queues.
                                               Sorted List   Unsorted List   Heap
  Items arrive with a priority (insert):       O(n)          O(1)            O(log n)
  Item removed is that with highest priority:  O(1)          O(n)            O(log n)

62 Heap Definition. A completely balanced binary tree in which the value of each node ≥ each of the node's children. Either the left or the right child could be the larger sibling. The maximum is at the root. (Quiz from the figure: where can 1 go? Where can 8 go? Where can 9 go?)

63 Heap Data Structure. A completely balanced binary tree implemented by an array.
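
The array stores no pointers; a node's relatives are computed from its index. A sketch of the standard 0-indexed arithmetic (the root sits at index 0):

  int parent(int i) { return (i - 1) / 2; }
  int left(int i)   { return 2 * i + 1; }
  int right(int i)  { return 2 * i + 2; }

Because the tree is completely balanced, the nodes fill a contiguous prefix of the array with no gaps.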

64 Make Heap Get help from friends

65 Heapify. The maximum needs to be at the root. Where is the maximum?

66 Heapify. Find the maximum. Put it in place. Repeat. The 5 "bubbles" down until it finds its spot.

67 Heap. Heapify: the 5 "bubbles" down until it finds its spot.

68 Heapify Running Time: the value bubbles down at most the height of the tree, i.e. O(log n).

69 Iterative Heapify. (Code figure not transcribed; see the Java sketch below.)

70 Recursive Heapify. (Code figure not transcribed; see the Java sketch below.)
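
A Java sketch of Heapify in both styles, on an int array holding a max-heap of size n (0-indexed; the names are illustrative):

  void heapifyRecursive(int[] a, int i, int n) {
      int l = 2 * i + 1, r = 2 * i + 2, largest = i;
      if (l < n && a[l] > a[largest]) largest = l;
      if (r < n && a[r] > a[largest]) largest = r;
      if (largest != i) {                        // max of the three moves up
          int t = a[i]; a[i] = a[largest]; a[largest] = t;
          heapifyRecursive(a, largest, n);       // the value bubbles down a level
      }
  }

  void heapifyIterative(int[] a, int i, int n) {
      while (true) {
          int l = 2 * i + 1, r = 2 * i + 2, largest = i;
          if (l < n && a[l] > a[largest]) largest = l;
          if (r < n && a[r] > a[largest]) largest = r;
          if (largest == i) return;              // found its spot
          int t = a[i]; a[i] = a[largest]; a[largest] = t;
          i = largest;
      }
  }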

71 Make Heap (Recursive). Get help from friends: recursively make each half a heap, then Heapify the root. Running time: T(n) = 2T(n/2) + log(n) = Θ(n).

72-76 Make Heap (Iterative). (Figure sequence: Heapify is run on each subtree, bottom-up, until the whole array is a heap.)

77 Iterative Make Heap, Running Time: a node at depth i bubbles down at most log(n) - i levels, and there are 2^i nodes at depth i, so the total is Σ_i 2^i · (log(n) - i) = Θ(n).
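
The iterative make-heap in Java (a sketch using heapifyIterative from above): the leaves are already heaps, so start at the last internal node and work backward to the root.

  void buildHeap(int[] a) {
      // O(n) total, per the sum above.
      for (int i = a.length / 2 - 1; i >= 0; i--)
          heapifyIterative(a, i, a.length);
  }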

78 Heap Pop/Push/Changes. With Pop, a Priority Queue returns the highest priority data item. This is at the root.

79 Heap Pop/Push/Changes. But this is now the wrong shape! To keep the shape of the tree, which space should be deleted? (The last position on the bottom level.)

80 Heap Pop/Push/Changes. What do we do with the element that was there? Move it to the root.

81 Heap Pop/Push/Changes. But now it is not a heap! The left and right subtrees still are heaps.

82 Heap Pop/Push/Changes. But now it is not a heap! The 3 "bubbles" down until it finds its spot: at each level, the max of the three (node and its two children) moves up. Time = O(log n).

83 Heap Pop/Push/Changes. When inserting a new item, to keep the shape of the tree, which new space should be filled? (The next free position at the end of the bottom level.)

84 Heap Pop/Push/Changes. But now it is not a heap! The 21 "bubbles" up until it finds its spot: at each level, the max of the two (node and its parent) moves up. Time = O(log n).
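
A Java sketch of this insert ("bubble up") on the array heap (assumes the array has spare room; n is the heap size before the push):

  void push(int[] a, int n, int key) {
      a[n] = key;                                // fill the next free spot, keeping the shape
      int i = n;
      while (i > 0 && a[(i - 1) / 2] < a[i]) {   // parent is smaller: swap up
          int t = a[i]; a[i] = a[(i - 1) / 2]; a[(i - 1) / 2] = t;
          i = (i - 1) / 2;                       // at most O(log n) levels
      }
  }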

85 Adaptable Heap Pop/Push/Changes. Suppose some outside user knows about some data item c and remembers where it is in the heap, and changes its priority from 21 to 39. But now it is not a heap! The 39 "bubbles" down or up until it finds its spot.

86 Adaptable Heap Pop/Push/Changes. Suppose some outside user also knows about data item f, whose location in the heap just changed. The Heap must be able to find this outside user and tell him it moved. Time = O(log n).

87 Heap Implementation. A location-aware heap entry is an object storing: its key, its value, and the position of the entry in the underlying heap. In turn, each heap position stores an entry. Back pointers are updated during entry swaps.

88 Selection Sort. Loop invariant: the largest i values (e.g. 6, 7, 8, 9) are sorted on the side; the remaining values are off to the side. The max is easier to find if the remaining values form a heap.

89-90 Heap Sort. Loop invariant: the largest i values are sorted on the side; the remaining values are in a heap.

91 Heap Data Structure. (Figure: the same heap shown both as a tree and as its array.)

92 Heap Sort. The largest i values are sorted on the side; the remaining values are in a heap. Put the next value where it belongs: pop the max of the heap into the next sorted spot, then re-establish the heap.

93 Heap Sort

94 Heap Sort. (Figure: the array after each pop; the heap prefix shrinks as the Sorted suffix grows.)

95 Heap Sort Running Time: Θ(n) to make the heap, plus n pops at O(log n) each, for Θ(n log n) total.
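
Putting the pieces together, a Java sketch of heap sort built from the buildHeap and heapifyIterative sketches above:

  void heapSort(int[] a) {
      buildHeap(a);                                // Θ(n)
      for (int end = a.length - 1; end > 0; end--) {
          int t = a[0]; a[0] = a[end]; a[end] = t; // max joins the sorted side
          heapifyIterative(a, 0, end);             // O(log n) to re-heapify
      }                                            // total: Θ(n log n)
  }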

96 Communication & Entropy. Lazare Carnot (1803): In thermodynamics, entropy is a measure of disorder. It is measured as the logarithm of the number of specific ways in which the micro world may be arranged, given the macro world. Tell Uncle Lazare the location and the velocity of each particle: the log of the number of possibilities equals the number of bits to communicate it. Few bits needed = low entropy; lots of bits needed = high entropy.

97 Communication & Entropy. Claude Shannon (1948): Tell Uncle Shannon which toy you want. "Bla bla bla bla bla bla." No. Please use the minimum number of bits to communicate it. Great, but we need a code. 011... Oops. Was that one code word or two?

98 Communication & Entropy. Claude Shannon (1948): Use a Huffman Code, described by a binary tree. I follow the path given by the bits and get the toy at the leaf.

99 Communication & Entropy. Claude Shannon (1948): Use a Huffman Code described by a binary tree. I first get one toy's code word, then I start over to get the next.

100 Communication & Entropy. Claude Shannon (1948): Objects that are more likely will have shorter codes. I get it: the toy I am likely to ask for gets a 1-bit code.

101 Communication & Entropy. Claude Shannon (1948): P_i is the probability of the i-th toy; L_i is the length of the code for the i-th toy. The expected number of bits sent is Σ_i p_i · L_i. We choose the code lengths L_i to minimize this; then we call it the Entropy of the distribution on toys.

102 Communication & Entropy. Ingredients: Instances: the probabilities of the objects. Solutions: a Huffman code tree. Cost of Solution: the expected number of bits sent, Σ_i p_i · L_i. Goal: given the probabilities, find a code with the minimum expected number of bits.

103 Communication & Entropy. Greedy Algorithm: Put the objects in a priority queue sorted by probabilities. Take the two objects with the smallest probabilities; they should have the longest codes. Put them in a little tree, joining them into one object with the sum probability. Repeat.

104-113 Communication & Entropy. (The greedy step - take the two smallest-probability objects, join them, and re-insert the sum - is repeated in the figures.)

114 Communication & Entropy. Greedy Algorithm: done when one object (of probability 1) remains.
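
A Java sketch of the greedy construction using java.util.PriorityQueue (HuffNode is an illustrative helper class; real leaves would also carry the toy's identity):

  import java.util.PriorityQueue;

  class HuffNode {
      double p;              // probability of this object
      HuffNode left, right;  // children; null for a leaf (an original toy)
      HuffNode(double p) { this.p = p; }
      HuffNode(HuffNode l, HuffNode r) { p = l.p + r.p; left = l; right = r; }
  }

  class Huffman {
      static HuffNode build(double[] probs) {
          PriorityQueue<HuffNode> pq =
              new PriorityQueue<>((a, b) -> Double.compare(a.p, b.p));
          for (double p : probs) pq.add(new HuffNode(p));
          while (pq.size() > 1) {                     // repeat until one object remains
              HuffNode a = pq.poll(), b = pq.poll();  // two smallest probabilities
              pq.add(new HuffNode(a, b));             // join them; sum the probability
          }
          return pq.poll();  // the root: one object of probability 1
      }
  }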

115 Communication & Entropy. Claude Shannon (1948): P_i is the probability of the i-th toy; L_i is the length of the code for the i-th toy. The expected number of bits sent is Σ_i p_i · L_i. Huffman's algorithm says how to choose the code lengths L_i to minimize the expected number of bits sent. We want a nice equation for this number. What if we relax the condition that L_i is an integer?

116 Communication & Entropy. The expected number of bits sent, Σ_i p_i · L_i, is minimized by setting L_i = log(1/p_i). Why? Suppose all toys had probability p_i = 0.031 ≈ 1/32. Then there would be 1/p_i = 32 toys, and the codes would have length L_i = log(1/p_i) = 5.

117 Communication & Entropy. The expected number of bits sent is Σ_i p_i · L_i. This is minimized by setting L_i = log(1/p_i), giving the expected number of bits H(p) = Σ_i p_i · log(1/p_i) (the Entropy). (The answer given by Huffman Codes is at most one bit longer.)

118 Communication & Entropy. Let X, Y, and Z be random variables, i.e. they take on random values according to some probability distributions. Once the values are chosen, the expected number of bits needed to communicate the value of X is H(X) = Σ_x pr(X=x) · log(1/pr(X=x)), where X = the toy chosen by this distribution.

119 Entropy The Entropy H(X) is the expected number of bits to communicate the value of X. It can be drawn as the area of this circle.

120 Entropy H(XY) then is the expected number of bits to communicate the value of both X and Y.

121 Entropy If I tell you the value of Y, then H(X|Y) is the expected number of bits to communicate the value of X. Note that if X and Y are independent, then knowing Y does not help and H(X|Y) = H(X)

122 Entropy. I(X;Y) is the number of bits that are revealed about X by me telling you Y, or about Y by me telling you X. Note that if X and Y are independent, then knowing Y does not help and I(X;Y) = 0.

123 Entropy

Splay Trees  Self-balancing BST  Invented by Daniel Sleator and Bob Tarjan  Allows quick access to recently accessed elements  Bad: worst-case O(n)  Good: average (amortized) case O(log n)  Often perform better than other BSTs in practice D. Sleator R. Tarjan

Splaying  Splaying is an operation performed on a node that iteratively moves the node to the root of the tree.  In splay trees, each BST operation (find, insert, remove) is augmented with a splay operation.  In this way, recently searched and inserted elements are near the top of the tree, for quick access.

3 Types of Splay Steps. Each splay operation on a node consists of a sequence of splay steps. Each splay step moves the node up toward the root by 1 or 2 levels. There are 3 types of step: Zig-Zig, Zig-Zag, and Zig. These steps are iterated until the node is moved to the root.

Zig-Zig  Performed when the node x forms a linear chain with its parent and grandparent.  i.e., right-right or left-left y x T1T1 T2T2 T3T3 z T4T4 zig-zig y z T4T4 T3T3 T2T2 x T1T1

Zig-Zag  Performed when the node x forms a non-linear chain with its parent and grandparent  i.e., right-left or left-right zig-zag y x T2T2 T3T3 T4T4 z T1T1 y x T2T2 T3T3 T4T4 z T1T1

Zig  Performed when the node x has no grandparent  i.e., its parent is the root zig x w T1T1 T2T2 T3T3 y T4T4 y x T2T2 T3T3 T4T4 w T1T1

Splay Trees & Ordered Dictionaries. Which nodes are splayed after each operation?
  method       splay node
  find(k)      if the key is found, use that node; if the key is not found, use the parent of the external node where the search terminated
  insert(k,v)  use the new node containing the entry inserted
  remove(k)    use the parent of the internal node w that was actually removed from the tree (the parent of the node that the removed item was swapped with)

Recall BST Deletion. Now consider the case where the key k to be removed is stored at a node v whose children are both internal: we find the internal node w that follows v in an inorder traversal; we copy key(w) into node v; we remove node w and its left child z (which must be a leaf) by means of operation removeExternal(z). Example: remove 3 - which node will be splayed?

Note on Deletion  The text (Goodrich, p. 463) uses a different convention for BST deletion in their splaying example  Instead of deleting the leftmost internal node of the right subtree, they delete the rightmost internal node of the left subtree.  We will stick with the convention of deleting the leftmost internal node of the right subtree (the node immediately following the element to be removed in an inorder traversal).

Performance  Worst-case is O(n)  Example:  Find all elements in sorted order  This will make the tree a left linear chain of height n, with the smallest element at the bottom  Subsequent search for the smallest element will be O(n)

Performance  Average-case is O(log n)  Proof uses amortized analysis  We will not cover this  Operations on more frequently-accessed entries are faster.  Given a sequence of m operations, the running time to access entry i is: where f(i) is the number of times entry i is accessed.