New Balanced Search Trees
Siddhartha Sen, Princeton University
Joint work with Bernhard Haeupler and Robert E. Tarjan

Research Agenda
– Elegant solutions to fundamental problems: systematically explore the design space; keep the design simple, allow complexity in the analysis
– Theoretical justification for elegant solutions: look at what people do in practice

Searching: The Dictionary Problem
Maintain a set of items so that the following operations are efficient:
– Access: find a given item
– Insert: add a new item
– Delete: remove an item
Assumption: items are totally ordered, so binary comparison is possible.
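
To make the interface concrete, here is a minimal, unbalanced binary-search-tree sketch of the dictionary operations above (access and insert only; delete is discussed later in the talk). The class and function names are illustrative, not from the talk, and no balancing is done yet.

    class Node:
        def __init__(self, key, value):
            self.key, self.value = key, value
            self.left = self.right = None

    def access(root, key):
        """Return the value stored with key, or None if the key is absent."""
        while root is not None:
            if key < root.key:
                root = root.left
            elif key > root.key:
                root = root.right
            else:
                return root.value
        return None

    def insert(root, key, value):
        """Insert (key, value) and return the (possibly new) subtree root."""
        if root is None:
            return Node(key, value)
        if key < root.key:
            root.left = insert(root.left, key, value)
        elif key > root.key:
            root.right = insert(root.right, key, value)
        else:
            root.value = value   # key already present: overwrite its value
        return root

Without rebalancing, a sorted insertion order degenerates this tree into a path, which is exactly the imbalance problem the next slides address.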

Balanced Search Trees
– Binary: AVL trees, red-black trees, weight-balanced trees, LLRB trees, AA trees, etc.
– Multiway: 2,3 trees, B trees, etc.

Agenda
– Rank-balanced trees [WADS 2009]: proof technique
– Ravl trees [SODA 2010]: proofs
– Experiments

Problem with BSTs: Imbalance
How to bound the height?
– Maintain a local balance condition and rebalance after each insert/delete: a balanced tree
– Restructure after each access: a self-adjusting tree
Balanced trees store balance information in the nodes and rebalance bottom-up (or top-down): update the balance information and restructure along the access path.
[Figure: an unbalanced search tree that has degenerated into a path on nodes a–f]

Restructuring primitive: Rotation
– Preserves symmetric order
– Changes heights
– Takes O(1) time
[Figure: a right rotation at y makes its left child x the subtree root; a left rotation is the mirror image]
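
A rotation is easy to state in code. The sketch below (illustrative names; plain nodes with left/right fields and no parent pointers) performs the right rotation pictured on the slide and its mirror-image left rotation; each touches O(1) pointers and preserves symmetric order.

    def rotate_right(y):
        """Right rotation at y: y's left child x becomes the subtree root."""
        x = y.left
        y.left = x.right    # subtree B moves from under x to under y
        x.right = y         # y becomes x's right child
        return x            # caller re-links x in place of y

    def rotate_left(x):
        """Left rotation at x: the mirror image of rotate_right."""
        y = x.right
        x.right = y.left
        y.left = x
        return y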

Known Balanced BSTs
AVL trees, red-black trees, weight-balanced trees, LLRB trees, AA trees, etc.
Goal: small height, little rebalancing, simple algorithms.

Ranked Binary Trees
– Each node has an integer rank
– Convention: leaves have rank 0, missing nodes have rank -1
– Rank difference of a child = rank of parent - rank of child
– i-child: node of rank difference i
– i,j-node: children have rank differences i and j
– The rank serves as an estimate of the height
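
The rank conventions above translate directly into code. This is a small illustrative sketch (field and function names are mine, not the talk's): missing nodes get rank -1, and a child's rank difference is the parent's rank minus the child's.

    class RankedNode:
        def __init__(self, key, rank=0):
            self.key, self.rank = key, rank
            self.left = self.right = None

    def rank(node):
        return -1 if node is None else node.rank   # missing node: rank -1

    def rank_diff(parent, child):
        """Rank difference of child (possibly missing) with respect to parent."""
        return parent.rank - rank(child)

    def is_i_j_node(node, i, j):
        """True if node's children have rank differences i and j (in either order)."""
        diffs = sorted((rank_diff(node, node.left), rank_diff(node, node.right)))
        return diffs == sorted((i, j))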

Example of a ranked binary tree
If all rank differences are positive, then rank ≥ height.
[Figure: a ranked tree on nodes a–f with the rank written beside each node]

Rank-Balanced Trees
– AVL trees: every node is a 1,1- or 1,2-node
– Rank-balanced trees: every node is a 1,1-, 1,2-, or 2,2-node (all rank differences are 1 or 2)
– Red-black trees: all rank differences are 0 or 1, and no 0-child is the parent of another
All need only one balance bit per node.
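
As a sanity check, the AVL and rank-balanced rules can be verified over a whole subtree in a few lines. A sketch under the rank conventions of the previous snippet; the red-black rule, which also constrains consecutive 0-children, is omitted here.

    def rank(node):
        return -1 if node is None else node.rank

    def child_diffs(node):
        return (node.rank - rank(node.left), node.rank - rank(node.right))

    def is_avl(node):
        """AVL rule: every node is a 1,1- or 1,2-node."""
        if node is None:
            return True
        return (sorted(child_diffs(node)) in ([1, 1], [1, 2])
                and is_avl(node.left) and is_avl(node.right))

    def is_rank_balanced(node):
        """Rank-balanced rule: every rank difference is 1 or 2."""
        if node is None:
            return True
        return (all(d in (1, 2) for d in child_diffs(node))
                and is_rank_balanced(node.left) and is_rank_balanced(node.right))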

Basic height bounds
Let n_k = minimum number of nodes in a tree of rank k.
– Rank-balanced trees: n_0 = 1, n_1 = 2, n_k = 2n_{k-2} + 1, so n_k ≥ 2^⌈k/2⌉ ⇒ k ≤ 2 lg n
– Red-black trees: same
– AVL trees: k ≤ log_Φ n ≈ 1.44 lg n, where Φ = (1 + √5)/2
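
The recurrences above are easy to tabulate. A small script (mine, not the talk's) computes the minimum number of nodes n_k of a tree of rank k for rank-balanced and AVL trees, which is where the 2 lg n and 1.44 lg n height bounds come from.

    def min_size_rank_balanced(k):
        """n_0 = 1, n_1 = 2, n_k = 2*n_{k-2} + 1."""
        n = [1, 2]
        for i in range(2, k + 1):
            n.append(2 * n[i - 2] + 1)
        return n[k]

    def min_size_avl(k):
        """n_0 = 1, n_1 = 2, n_k = n_{k-1} + n_{k-2} + 1 (Fibonacci-like)."""
        n = [1, 2]
        for i in range(2, k + 1):
            n.append(n[i - 1] + n[i - 2] + 1)
        return n[k]

    # Rank 10 requires at least 63 nodes in a rank-balanced tree but at least
    # 232 nodes in an AVL tree, reflecting k <= 2 lg n versus k <= ~1.44 lg n.
    print(min_size_rank_balanced(10), min_size_avl(10))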

Rank-Balanced Trees: height ≤ 2 lg n, ≤ 2 rotations per rebalancing, O(1) amortized rebalancing time.
Red-Black Trees: height ≤ 2 lg n, ≤ 3 rotations per rebalancing, O(1) amortized rebalancing time.

Rank-Balanced Trees: height ≤ min{2 lg n, log_Φ m}, ≤ 2 rotations per rebalancing, O(1) amortized rebalancing time. (I win)
Red-Black Trees: height ≤ 2 lg n, ≤ 3 rotations per rebalancing, O(1) amortized rebalancing time.

Tree Height
Theorem. A rank-balanced tree built by m insertions intermixed with arbitrary deletions has height at most log_Φ m.
– If m = n, this is the same height bound as for AVL trees
– Overall height bound: min{2 lg n, log_Φ m}

Rebalancing Frequency
Theorem. In a rank-balanced tree built by m insertions and d deletions, the number of rebalancing steps of rank k is at most O((m + d)/2^(k/3)).
– Exponentially better than O((m + d)/k)
– Good for concurrent workloads
– A similar result holds for red-black trees (with base b = 2^(1/2))

Exponential analysis
Exploit the exponential structure of the tree… use an exponential potential function!

Proof idea
– Define the potential of a node of rank k to be b^(k ± c), where b is a fixed constant and c depends on the node
– Each insertion/deletion increases the potential by O(1), so the total potential is O(m)
– Choose c so that the potential change during rebalancing telescopes ⇒ no net increase

Show that a rebalancing step of rank k reduces the potential by b^(k ± c):
– At the root, this happens automatically
– At a non-root, we need to truncate the potential function
Tree height: b^(k ± c) ≤ O(m) ⇒ k ≤ log_b m ± c
Rebalancing frequency: (#steps of rank k) · b^(k ± c) ≤ O(m) ⇒ #steps ≤ m/b^(k ± c)

Summary
– Rank-balanced trees achieve an AVL-type height bound with exponentially infrequent rebalancing
– Exponential analysis yields new insights into the efficiency of rebalancing
– Bounds are in terms of m only, not n… can we exploit this flexibility?

Where's the pain?
– Binary: AVL trees, rank-balanced trees, red-black trees, weight-balanced trees, LLRB trees, AA trees, etc.
– Multiway: 2,3 trees, B trees, etc.
Common problem: deletion is a pain!

Deletion is problematic
– More complicated than insertion
– May need to swap the item with its successor/predecessor
– Synchronization reduces available parallelism [Gray and Reuter]

Example: Rank-balanced trees
[Figure: a deletion rebalancing example; the steps can be non-terminal, requiring synchronization up the access path]

Solutions?
– Don't discuss it! (textbooks)
– Don't do it! (Berkeley DB and other database systems; an unnamed database provider…)

Deletion Without Rebalancing
– Good idea? Yes for B+ trees (database systems), based on empirical and average-case analysis
– How about binary trees? It failed miserably in a real application with red-black trees

Deletion Without Rebalancing
Yes! We can apply exponential analysis:
– Height logarithmic in m, the number of insertions
– Rebalancing exponentially infrequent in height
Binary trees: use O(log log m) bits of balance information per node (red-black, AVL, and rank-balanced trees use only one bit).
Similar results hold for B+ trees, and are easier [ISAAC 2009].

Ravl Trees
– AVL trees: every node is a 1,1- or 1,2-node
– Rank-balanced trees: every node is a 1,1-, 1,2-, or 2,2-node (rank differences are 1 or 2)
– Red-black trees: all rank differences are 0 or 1, and no 0-child is the parent of another
– Ravl trees: every rank difference is positive
Any tree is a ravl tree; efficiency comes from the design of the operations.

Ravl trees: Insertion
– A new leaf q gets rank 0
– If the parent p of q was a leaf before the insertion, q is a 0-child and violates the rank rule

Insertion Rebalancing
Same as in rank-balanced trees and AVL trees; rebalancing steps can be non-terminal.
[Figure: the rebalancing cases]
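
Here is a sketch (my code, not the talk's) of the bottom-up promotion loop that this shared insertion rebalancing uses: while the current node is a 0-child and its parent is a 0,1-node, promote the parent and continue upward; otherwise a single or double rotation, omitted here, finishes the job. Nodes are assumed to carry parent pointers and integer ranks.

    class Node:
        def __init__(self, key, rank=0, parent=None):
            self.key, self.rank, self.parent = key, rank, parent
            self.left = self.right = None

    def rank(node):
        return -1 if node is None else node.rank

    def sibling(x):
        p = x.parent
        return p.right if x is p.left else p.left

    def rebalance_after_insert(x):
        """x is the newly added leaf (rank 0); restore the rank rule bottom-up."""
        p = x.parent
        while p is not None and p.rank == x.rank:       # x is a 0-child
            if p.rank - rank(sibling(x)) == 1:          # p is a 0,1-node
                p.rank += 1                             # promote p
                x, p = p, p.parent                      # violation may move up
            else:                                       # p is a 0,2-node
                # One or two rotations plus rank adjustments (see the rotation
                # primitive earlier) repair the tree here; rebalancing stops.
                break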

Ravl trees: Deletion
If the node has two children, swap it with its symmetric-order successor or predecessor, then delete it; no rebalancing is done afterwards.
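
Deletion in a ravl tree is thus ordinary BST deletion: swap with the symmetric-order successor when necessary, splice the node out, and change no ranks. A recursive sketch (illustrative, keys only):

    def delete(root, key):
        """Delete key from the subtree at root; no rebalancing, no rank changes."""
        if root is None:
            return None
        if key < root.key:
            root.left = delete(root.left, key)
        elif key > root.key:
            root.right = delete(root.right, key)
        else:
            if root.left is None:
                return root.right          # zero or one child: splice the node out
            if root.right is None:
                return root.left
            succ = root.right              # two children: find the symmetric-order
            while succ.left is not None:   # successor (leftmost in right subtree)
                succ = succ.left
            root.key = succ.key            # move the successor's item up
            root.right = delete(root.right, succ.key)
        return root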

Example
[Figures: insert f into the running example tree on a–e (promotions at d and e, a demotion at b, and a left rotation at d); then delete a, delete f, and delete d, swapping d with its symmetric-order successor; then insert g]

Tree Height
Theorem 1. A ravl tree built by m insertions intermixed with arbitrary deletions has height at most log_Φ m.
Compared to standard AVL trees:
– If m = n, the height is the same
– If m = O(n), the height is within an additive constant
– If m = poly(n), the height is within a constant factor
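
A quick numeric illustration of the regimes above (my numbers, not the talk's): with n = 2^20 keys, the bound log_Φ m is about 28.8 when m = n and about 57.6 when m = n^2, versus the rank-balanced bound 2 lg n = 40.

    import math

    PHI = (1 + math.sqrt(5)) / 2

    def log_phi(x):
        return math.log(x, PHI)

    n = 2 ** 20
    print(log_phi(n))         # m = n:   ~28.8, the usual 1.44 lg n AVL-style bound
    print(log_phi(n ** 2))    # m = n^2: ~57.6, within a constant factor of lg n
    print(2 * math.log2(n))   # rank-balanced bound 2 lg n = 40.0, for comparison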

Proof. Let F_k be the k-th Fibonacci number; recall F_0 = 1, F_1 = 1, F_k = F_{k-1} + F_{k-2} for k > 1, and F_{k+2} > Φ^k.
Define the potential of a node of rank k as:
– F_{k+2} if it is a 0,1-node
– F_{k+1} if it is not a 0,1-node but has a 0-child
– F_k if it is a 1,1-node
– zero otherwise
The potential of the tree is the sum of the potentials of its nodes.
– Deletion does not increase the potential
– Insertion increases the potential by at most 1, so the total potential is at most m - 1
– Rebalancing steps do not increase the potential
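
For concreteness, the potential of a single node can be computed from its rank and its children's ranks as follows (a sketch; missing children have rank -1, and F_0 = F_1 = 1 as above):

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def fib(k):
        """F_0 = 1, F_1 = 1, F_k = F_{k-1} + F_{k-2}."""
        return 1 if k < 2 else fib(k - 1) + fib(k - 2)

    def node_potential(rank, left_rank, right_rank):
        diffs = sorted((rank - left_rank, rank - right_rank))
        if diffs == [0, 1]:
            return fib(rank + 2)           # 0,1-node
        if 0 in diffs:
            return fib(rank + 1)           # has a 0-child but is not a 0,1-node
        if diffs == [1, 1]:
            return fib(rank)               # 1,1-node
        return 0                           # all other nodes carry no potential

    # Example: a leaf (rank 0, both children missing) is a 1,1-node of potential 1.
    assert node_potential(0, -1, -1) == 1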

Consider a rebalancing step of rank k:
[Figures: the potential before and after each rebalancing case, written as Fibonacci numbers F_{k-1}, F_k, F_{k+1}, F_{k+2} at the affected nodes; in every case the potential does not increase]

If the rank of the root is r, then the increase to rank k did not create a 1,1-node, for 0 < k < r - 1.
Total decrease in potential: [equation lost; a sum of Fibonacci numbers over these ranks].
Since the potential is always non-negative: [equation lost], which yields the height bound r ≤ log_Φ m of Theorem 1.

Rebalancing Frequency
Theorem 2. In a ravl tree built by m insertions intermixed with arbitrary deletions, the number of rebalancing steps of rank k is at most (m - 1)/F_k = O(m/Φ^k).
⇒ O(1) amortized rebalancing steps.

Proof. Truncate the potential function:
– Nodes of rank < k keep the same potential as before
– Nodes of rank ≥ k have zero potential (with one exception at rank k)
A rebalancing step of rank k then reduces the potential by F_{k+1}, or by F_{k+1} - F_{k-1} = F_k.
Hence there are at most (m - 1)/F_k such steps.

Disadvantage of Ravl Trees?
– The tree height may be ω(log n)
– This only happens when the deletion/insertion ratio approaches 1, but it may be a concern for some applications
– Remedy: periodically rebuild the tree

Periodic Rebuilding
– Rebuild the tree (all at once or incrementally) when the rank r of the root gets too high
– Rebuilding when r > log_Φ n + c for a fixed c > 0 costs O(1/(Φ^c - 1)) rebuilding time per deletion
– The tree height then stays at log_Φ n + O(1)
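
The trigger is simple to state in code. A sketch follows; the constant c and how the rebuild itself is carried out (all at once or incrementally) are application choices, not fixed by the talk.

    import math

    PHI = (1 + math.sqrt(5)) / 2

    def needs_rebuild(root_rank, n, c=2):
        """Rebuild once the root's rank exceeds log_phi(n) + c."""
        return root_rank > math.log(n, PHI) + c

Per the slide, rebuilding under this rule costs O(1/(Φ^c - 1)) time per deletion and keeps the height at log_Φ n + O(1).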

Summary
– Exponential analysis gives good worst-case properties for deletion without rebalancing: a height bound logarithmic in m and exponentially infrequent node updates
– Periodic rebuilding keeps the height logarithmic in n

Open problems
– Do binary trees require Ω(log log n) balance bits per node?
– Other applications of exponential analysis? For example, average-case behavior.

Teach rank-balanced trees and ravl trees!

Experiments

Preliminary Experiments
Compared three trees with O(1) amortized rebalancing time:
– Red-black trees
– Rank-balanced trees
– Ravl trees
Performance in practice depends on the workload!

Preliminary Experiments
2^13 nodes, 2^26 operations; no periodic rebuilding in the ravl trees.
[Table: for each test workload (Random, Queue, Working set, Static Zipf, Dynamic Zipf) and each tree (red-black, rank-balanced, ravl): number of rotations (× 10^6), number of balance updates (× 10^6), average path length, and maximum path length; the numeric entries did not survive extraction]

Preliminary Experiments
Relative to red-black trees: rank-balanced trees performed 8.2% more rotations and 0.77% more balance updates; ravl trees performed 42% fewer rotations and 35% fewer balance updates.
[Same table as above; numeric entries did not survive extraction]

Preliminary Experiments
Relative to red-black trees: rank-balanced trees had a 0.87% shorter average path length and a 10% shorter maximum path length; ravl trees had a 5.6% longer average path length and a 4.3% longer maximum path length.
[Same table as above; numeric entries did not survive extraction]