Balanced Search Trees Robert Tarjan, Princeton University & HP Labs (Joint work with Bernhard Haeupler and Siddhartha Sen)

Searching: the Dictionary Problem. Maintain a set of items so that Access (find a given item), Insert (add a new item), and Delete (remove an item) are efficient. Assumption: the items are totally ordered, so that binary comparison is possible.

Balanced Search Trees. Binary: AVL trees, red-black trees, weight-balanced trees, binary B-trees. Multiway: 2,3 trees, B-trees, etc.

Topics. Rank-balanced trees [WADS 2009]: an example of exploring the design space. Ravl trees [SODA 2010]: an example of an idea from practice. Splay trees [Sleator & Tarjan 1983].

Rank-Balanced Trees: Exploring the design space… Joint work with B. Haeupler and S. Sen

Problem with BSTs: Imbalance. How to bound the height? Either maintain a local balance condition and rebalance after each insert or delete (a balanced tree), or restructure after each access (a self-adjusting tree). Balanced trees store balance information in the nodes to guarantee O(log n) height; after an insert or delete, balance is restored bottom-up (or top-down) by updating the balance information and restructuring along the access path. [Figure: a degenerate tree formed by nodes a, b, c, d, e, f.]
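To make the imbalance problem concrete, here is a minimal sketch (an illustration, not from the slides) of a plain, unbalanced BST: inserting keys in increasing order produces a path, so the height grows linearly with n instead of logarithmically.

```python
# A minimal (unbalanced) BST: inserting keys in sorted order degenerates
# into a path, so access cost grows to Theta(n) instead of O(log n).
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def bst_insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    elif key > root.key:
        root.right = bst_insert(root.right, key)
    return root

def height(node):
    return -1 if node is None else 1 + max(height(node.left), height(node.right))

root = None
for k in range(1, 8):          # insert 1..7 in increasing order
    root = bst_insert(root, k)
print(height(root))            # 6: the tree is a single right path
```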

Restructuring primitive: Rotation. [Figure: rotating right at y makes its left child x the root of the subtree and moves x's right subtree B under y; rotating left at x reverses this.] Preserves symmetric order. Changes heights. Takes O(1) time.
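A minimal sketch of the two rotation primitives, assuming the Node fields (key, left, right) from the sketch above; both preserve symmetric (in-order) order and take O(1) time.

```python
# Rotation primitives on a plain BST node with key/left/right fields.
def rotate_right(y):
    x = y.left           # x becomes the new subtree root
    y.left = x.right     # subtree B moves from x to y
    x.right = y
    return x

def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    return y
```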

Known Balanced BSTs: AVL trees, red-black trees, weight-balanced trees, binary B-trees, etc. Goal: small height, little rebalancing, simple algorithms.

Ranked Binary Trees. Each node has an integer rank. Convention: leaves have rank 0, missing nodes have rank −1. Rank difference of a child = rank of parent − rank of child. i-child: a node of rank difference i. i,j-node: a node whose children have rank differences i and j.

Example of a ranked binary tree. [Figure: a ranked binary tree with a rank-2 root, rank-1 nodes b and e, and rank-0 leaves a, c, f; every rank difference is 1.] If all rank differences are positive, rank ≥ height.

Rank-Balanced Trees AVL trees: every node is a 1,1- or 1,2-node Rank-balanced trees: every node is a 1,1-, 1,2-, or 2,2-node (rank differences are 1 or 2) Red-black trees: all rank differences are 0 or 1, no 0-child is the parent of another Each needs one balance bit per node.
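A small sketch (not from the slides) of checking these rank rules, assuming each node carries an integer rank field and using the convention rank(None) = -1 from the previous slide.

```python
# Check the AVL and rank-balanced rank rules on a ranked binary tree.
def rank(x):
    return x.rank if x is not None else -1     # missing nodes have rank -1

def rank_diffs(x):
    return (x.rank - rank(x.left), x.rank - rank(x.right))

def is_avl(x):
    # every node is a 1,1- or 1,2-node
    return x is None or (sorted(rank_diffs(x)) in ([1, 1], [1, 2])
                         and is_avl(x.left) and is_avl(x.right))

def is_rank_balanced(x):
    # every rank difference is 1 or 2 (1,1-, 1,2-, or 2,2-nodes)
    return x is None or (all(d in (1, 2) for d in rank_diffs(x))
                         and is_rank_balanced(x.left) and is_rank_balanced(x.right))
```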

Basic height bounds. n_k = minimum n for rank k. AVL trees: n_0 = 1, n_1 = 2, n_k = n_{k−1} + n_{k−2} + 1 ⇒ n_k = F_{k+3} − 1 ⇒ k ≤ log_φ n ≈ 1.44 lg n. Rank-balanced trees: n_0 = 1, n_1 = 2, n_k ≥ 2n_{k−2} ⇒ n_k ≥ 2^{k/2} ⇒ k ≤ 2 lg n. Same height bound for red-black trees. (F_k = kth Fibonacci number, φ = (1 + √5)/2, F_{k+2} > φ^k.)
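A quick numeric illustration of these recurrences (a sketch under assumptions, not from the slides): it uses the standard Fibonacci indexing F_0 = 0, F_1 = 1, under which n_k = F_{k+3} − 1 holds, and the exact rank-balanced minimum 2·n_{k−2} + 1 (the slide's bound drops the +1).

```python
# Numeric check of the minimum tree sizes n_k for AVL and rank-balanced trees.
def fib(k):                      # standard Fibonacci: F_0 = 0, F_1 = 1
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

avl = {0: 1, 1: 2}
rb  = {0: 1, 1: 2}
for k in range(2, 12):
    avl[k] = avl[k - 1] + avl[k - 2] + 1      # minimum size of an AVL tree of rank k
    rb[k]  = 2 * rb[k - 2] + 1                # minimum size of a rank-balanced tree of rank k

for k in range(12):
    assert avl[k] == fib(k + 3) - 1           # n_k = F_{k+3} - 1
    assert rb[k] >= 2 ** (k / 2)              # n_k >= 2^(k/2)
```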

Rank-balanced trees: Insertion A new leaf q has a rank of zero If the parent p of q was a leaf before, q is a 0-child and violates the rank rule

Insertion Rebalancing. [Figure: the insertion rebalancing cases, with the non-terminal case marked.]
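To make the cases concrete, here is a self-contained sketch of bottom-up insertion rebalancing for rank-balanced trees (an illustration of the cases described above, not the authors' code): a 0,1-node is promoted and the violation moves up; a 0,2-node is repaired by one or two rotations and rebalancing stops.

```python
# Sketch of rank-balanced insertion with bottom-up rebalancing.
# Promotion (the 0,1-node case) is non-terminal and propagates upward via the
# recursion; the 0,2-node case is resolved by one or two rotations and stops.
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None
        self.rank = 0                       # new leaves have rank 0

def rank(x):
    return x.rank if x is not None else -1  # missing nodes have rank -1

def rotate_right(z):
    x = z.left
    z.left, x.right = x.right, z
    return x

def rotate_left(z):
    x = z.right
    z.right, x.left = x.left, z
    return x

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    else:
        return root                         # key already present
    return _rebalance_after_insert(root)

def _rebalance_after_insert(z):
    dl, dr = z.rank - rank(z.left), z.rank - rank(z.right)
    if dl != 0 and dr != 0:
        return z                            # no 0-child: rank rule holds here
    if 1 in (dl, dr):                       # 0,1-node: promote, recheck at parent
        z.rank += 1
        return z
    if dl == 0:                             # 0,2-node, 0-child on the left
        x = z.left
        if x.rank - rank(x.right) == 2:     # outer child heavy: single rotation
            z.rank -= 1                     # demote z
            return rotate_right(z)
        y = x.right                         # inner child heavy: double rotation
        z.left = rotate_left(x)
        x.rank -= 1; z.rank -= 1; y.rank += 1
        return rotate_right(z)
    else:                                   # mirror image: 0-child on the right
        x = z.right
        if x.rank - rank(x.left) == 2:
            z.rank -= 1
            return rotate_left(z)
        y = x.left
        z.right = rotate_right(x)
        x.rank -= 1; z.rank -= 1; y.rank += 1
        return rotate_left(z)
```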

Insertion example: insert e. [Figure: step-by-step rebalancing after the insertion, with promotion, demotion, and rotation steps labeled.]

Insertion example (continued): insert f. [Figure: step-by-step rebalancing after inserting f, ending with a promotion at e.]

Insertion example. [Figure: the final tree after inserting e and f, with root d of rank 2, children b and e of rank 1, and leaves a, c, f of rank 0.]

Rank-balanced trees: Deletion If node has two children, swap with symmetric-order successor or predecessor Becomes a leaf (just delete) or node with one child (replace with child) If node q replaces the deleted node and p is its parent, a violation occurs if p is a leaf of rank one or q is a 3-child

Deletion Rebalancing. [Figure: the deletion rebalancing cases, with the non-terminal case marked.]

Deletion example: delete f, then d, then a. [Figure: step-by-step rebalancing, including a swap with the successor, a demotion at b, a double demotion at e, and a double rotation with a double promotion at c.]

Deletion example. [Figure: the final tree, with root c of rank 2.]

Rebalancing Time Theorem. A rank-balanced tree built by m insertions and d deletions does at most 3m + 6d rebalancing steps.

Proof idea: Make non-terminating cases release potential.

Proof. Define the potential of a node: 1 if it is a 1,1-node, 2 if it is a 2,2-node, zero otherwise. Potential of the tree = sum of the potentials of its nodes. Non-terminating steps are free; terminating steps increase the potential by O(1).

Rank-Balanced Trees vs. Red-Black Trees. Rank-balanced trees: height ≤ 2 lg n, 2 rotations per rebalancing, O(1) amortized rebalancing time. Red-black trees: height ≤ 2 lg n, 3 rotations per rebalancing, O(1) amortized rebalancing time.

Tree Height. Sequential insertions: rank-balanced trees have height = lg n (best); red-black trees have height = 2 lg n (worst).

Tree Height. Theorem 1. A rank-balanced tree built by m insertions intermixed with arbitrary deletions has height at most log_φ m. If m = n, this is the same height as AVL trees. Overall, the height is min{2 lg n, log_φ m}.

Proof idea: Exponential potential function. Exploit the exponential structure of the tree.

Proof. Give a node a count of 1 when inserted. Define the potential of a node as the total count in its subtree. When a node is deleted, add its count to its parent. Let Φ_k = minimum potential of a node of rank k. Claim: Φ_0 = 1, Φ_1 = 2, Φ_k = 1 + Φ_{k−1} + Φ_{k−2} for k > 1 ⇒ m ≥ F_{k+3} − 1 ≥ φ^k.

Show that k = 1 + k-1 + k-2 for k > 1 Easy to show for 1,1- and 1,2-nodes Harder for 2,2-nodes (created by deletions) But counts are inherited Add proof idea: exponential potential function (using discrete version, but can use continous)

Rebalancing Frequency. How high does rebalancing propagate? O(m + d) rebalancing steps total, which implies ≤ O((m + d)/k) insertions/deletions at rank k. Actually, we can show something much stronger.

Rebalancing Frequency. Theorem. In a rank-balanced tree built by m insertions and d deletions, the number of rebalancing steps of rank k is at most O((m + d)/2^{k/3}). Good for concurrent workloads.

Proof. Define the potential of a node of rank k: b^k if it is a 1,1- or 2,2-node, b^{k−2} if it is a 1,2-node, where b = 2^{1/3}. The potential change in non-terminal steps telescopes. Combine this effect with initialization and the terminal step.

Telescoping potential: Δ = −b^{k+3} + … [Figure: the access path, where successive non-terminal steps contribute b^{k+2} − b^{k+3}, b^{k+1} − b^{k+2}, b^k − b^{k+1}, b^{k−1} − b^k, …]

Truncate the growth of the potential at rank k − 3: nodes of rank < k − 3 keep the same potential; nodes of rank ≥ k − 3 have the potential they would have at rank k − 3. A rebalancing step of rank k then reduces the potential by b^{k−3}. The same idea should work for red-black trees (we think).

Summary. Rank-balanced trees are a relaxation of AVL trees with behavior theoretically as good as red-black trees and better in important ways, especially the height bound of min{2 lg n, log_φ m}. Exponential potential functions yield new insights into the efficiency of rebalancing.

Ravl Trees: An idea from practice… Joint work with S. Sen

Balanced Search Trees. Binary: AVL trees, rank-balanced trees, red-black trees, weight-balanced trees, binary B-trees. Multiway: 2,3 trees, B-trees, etc. Common problem: deletion is a pain!

Deletion in balanced search trees. Deletion is problematic: it may need to swap the item with its successor/predecessor, rebalancing is more complicated than during insertion, and synchronization reduces available parallelism [Gray and Reuter].

Example: Rank-balanced trees. [Figure: the deletion rebalancing cases, with the non-terminal case and the synchronization it requires highlighted.]

Deletion rebalancing: solutions? Don't discuss it! (textbooks) Don't do it! (Berkeley DB and other database systems; an unnamed database provider…)

Storytime…

Deletion Without Rebalancing. Is this a good idea? Empirical and average-case analysis suggests yes for B+ trees (database systems). How about binary trees? It failed miserably in a real application with red-black trees. There was no worst-case analysis, probably because of the assumption that it is very bad.

Deletion Without Rebalancing. We present such balanced search trees, where: the height remains logarithmic in m, the number of insertions; the amortized time per insertion or deletion is O(1); rebalancing affects nodes exponentially infrequently in their heights. Binary trees use O(log log m) bits of balance information per node (red-black, AVL, and rank-balanced trees use only one bit!). Similar results hold for B+ trees, and are easier [ISAAC 2009].

Ravl (relaxed AVL) Trees. AVL trees: every node is a 1,1- or 1,2-node. Rank-balanced trees: every node is a 1,1-, 1,2-, or 2,2-node (rank differences are 1 or 2). Red-black trees: all rank differences are 0 or 1, and no 0-child is the parent of another. Ravl trees: every rank difference is positive. Any tree is a ravl tree; efficiency comes from the design of the operations.

Ravl trees: Insertion Same as rank-balanced trees (AVL trees)!

Insertion Rebalancing. [Figure: the insertion rebalancing cases (same as for rank-balanced trees), with the non-terminal case marked.]

Ravl trees: Deletion. If the node has two children, swap it with its symmetric-order successor or predecessor. Delete. Replace by child. Swapping is not needed if all the data is in leaves (external representation).
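A minimal sketch of ravl deletion, assuming the Node class (key/left/right/rank) from the insertion sketch above: it is just plain BST deletion, with ranks left untouched and no rebalancing at all.

```python
# Ravl deletion: plain BST deletion, no rank changes, no rebalancing.
def delete(root, key):
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        if root.left is None:                # zero or one child: splice out
            return root.right
        if root.right is None:
            return root.left
        succ = root.right                    # two children: swap with the
        while succ.left is not None:         # symmetric-order successor ...
            succ = succ.left
        root.key = succ.key
        root.right = delete(root.right, succ.key)   # ... then delete it below
    return root                              # ranks are left untouched
```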

Deletion example: delete f, then d, then a. [Figure: d is swapped with its successor and the nodes are removed; no rebalancing is performed.]

Deletion example (continued): insert g. [Figure: the resulting tree, with root e of rank 2.]

Tree Height. Theorem 1. A ravl tree built by m insertions intermixed with arbitrary deletions has height at most log_φ m. Compared to standard AVL trees: if m = n, the height is the same; if m = O(n), the height is within an additive constant; if m = poly(n), the height is within a constant factor.

Proof idea: exponential potential function. Exploit the exponential structure of the tree.

Proof. Let F_k be the kth Fibonacci number. Define the potential of a node of rank k: F_{k+2} if it is a 0,1-node; F_{k+1} if it has a 0-child but is not a 0,1-node; F_k if it is a 1,1-node; zero otherwise. Potential of the tree = sum of the potentials of its nodes. Recall: F_0 = 1, F_1 = 1, F_k = F_{k−1} + F_{k−2} for k > 1; F_{k+2} > φ^k.

Proof (continued). With the same potential as above: deletion does not increase the potential; insertion increases the potential by ≤ 1, so the total potential is ≤ m − 1; rebalancing steps do not increase the potential.

Consider a rebalancing step of rank k: [Figure: the rebalancing cases, annotated with node potentials before and after: F_{k+1} + F_{k+2}, F_{k+3} + 0; 0 + F_{k+2}, F_{k+2} + 0; F_{k+2} + 0, 0 + 0.]

Consider a rebalancing step of rank k: [Figure: another case, annotated with node potentials F_{k+1} + 0 and F_k + F_{k−1}.]

Consider a rebalancing step of rank k: [Figure: another case, annotated with node potentials F_{k+1} + 0 + 0 and F_k + F_{k−1} + 0.]

If the rank of the root is r, there was a promotion of rank k that did not create a 1,1-node, for each 0 < k < r − 1. Total decrease in potential: … Since the potential is always non-negative: …

Rebalancing Frequency. Theorem 2. In a ravl tree built by m insertions intermixed with arbitrary deletions, the number of rebalancing steps of rank k is at most (m − 1)/F_k ⇒ O(1) amortized rebalancing steps.

Proof. Truncate the potential function: nodes of rank < k have the same potential; nodes of rank ≥ k have zero potential (with one exception for rank = k). Deletion does not increase the potential; insertion increases the potential by ≤ 1, so the total potential is ≤ m − 1; rebalancing steps do not increase the potential.

Proof (continued). Truncate the potential function as above. A step of rank k is preceded by a promotion of rank k − 1, which reduces the potential by: F_{k+1} if it stops or promotes at rank k; F_{k+1} − F_{k−1} = F_k if it (double) rotates at rank k. The potential can decrease by at most m − 1, so there are at most (m − 1)/F_k rebalancing steps of rank k.

Disadvantage of Ravl Trees? The tree height may be ω(log n). This only happens when the ratio of deletions to insertions approaches 1, but it may be a concern for some applications. Address it by periodically rebuilding the tree.

Periodic Rebuilding. Rebuild the tree (all at once or incrementally) when the rank r of the root (≥ tree height) is too high. Rebuild when r > log_φ n + c for fixed c > 0: the rebuilding time is O(1/(φ^c − 1)) per deletion, and the tree height is then always log_φ n + O(1).
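A sketch of one way to implement the rebuilding trigger (an illustration under assumptions, not the paper's scheme): Node and rank() are taken from the earlier sketches, C is a hypothetical slack constant, and the whole tree is rebuilt at once into a perfectly balanced shape.

```python
# Rebuild the whole tree when the root's rank exceeds log_phi(n) + C.
import math

PHI = (1 + math.sqrt(5)) / 2
C = 2                                    # hypothetical slack constant c

def inorder(x, out):
    if x is not None:
        inorder(x.left, out); out.append(x.key); inorder(x.right, out)

def build_balanced(keys, lo, hi):        # rebuild from sorted keys
    if lo > hi:
        return None
    mid = (lo + hi) // 2
    node = Node(keys[mid])
    node.left = build_balanced(keys, lo, mid - 1)
    node.right = build_balanced(keys, mid + 1, hi)
    node.rank = 1 + max(rank(node.left), rank(node.right))   # rank = height
    return node

def maybe_rebuild(root, n):
    # n is the current number of nodes in the tree
    if root is not None and root.rank > math.log(n, PHI) + C:
        keys = []
        inorder(root, keys)
        return build_balanced(keys, 0, len(keys) - 1)
    return root
```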

Constant bits? A ravl tree stores O(log log n) balance bits per node. Various methods that use O(1) bits fail (see the counterexamples in the paper). Main problem: deletion can increase the ranks of nodes; if we force all deletions to occur at leaves, then an O(1)-bit scheme exists, but then a deletion may require multiple swaps.

Summary. Deletion without rebalancing in binary trees has good worst-case properties, including a logarithmic height bound and exponentially infrequent node updates. With periodic rebuilding, the height can be kept logarithmic in n. Open problem: are Ω(log log n) balance bits per node required?

Experiments

Preliminary Experiments Compared three trees that achieve O(1) amortized rebalancing time Red-black trees Rank-balanced trees Ravl trees Performance in practice depends on the workload!

Preliminary Experiments (2^13 nodes, 2^26 operations; no periodic rebuilding in ravl trees). Columns per tree: # rots (× 10^6), # bals, avg. pLen, max. pLen.

Test           Red-black trees                Rank-balanced trees            Ravl trees
               rots    bals    avg    max     rots    bals    avg    max     rots    bals    avg    max
Random         26.44   116.07  10.47  15.63   29.55   133.74  10.39  15.09   14.32   80.61   11.11  16.75
Queue          50.32   285.13  11.38  22.50   50.33   184.53  11.20  14.00   33.55   134.22
Working set    41.71   185.35  10.51  16.18   43.69   159.69  10.45  15.35   28.00   119.92         16.64
Static Zipf    25.24   112.86  10.41  15.46   28.27   130.93  10.34  15.05   13.48   78.03   11.12  17.68
Dynamic Zipf   23.18   103.48  10.48  15.66   26.04   125.99  10.40  15.16   12.66   74.28          16.84

Preliminary Experiments (same table as above): rank-balanced: 8.2% more rots, 0.77% more bals; ravl: 42% fewer rots, 35% fewer bals.

Preliminary Experiments (same table as above): rank-balanced: 0.87% shorter apl, 10% shorter mpl; ravl: 5.6% longer apl, 4.3% longer mpl.

Ongoing/future experiments Trees: AVL trees Binary B-trees (Sedgewick’s implementation) Deletion schemes: Lazy deletion (avoids swapping, uses extra space) Tests: Real workloads! Degradation over time

The End