Fast Trie Data Structures

Presentation transcript:

Fast Trie Data Structures
Seminar on Advanced Topics in Data Structures
Jacob Katz, December 1, 2001
Dan E. Willard, 1981, “New Trie Data Structures Which Support Very Fast Search Operations”

Agenda
- Problem statement
- Existing solutions and motivation for a new one
- P-Fast tries and their complexity
- Q-Fast tries and their complexity
- X-Fast tries and their complexity
- Y-Fast tries and their complexity
JK 11/16/2018

Problem statement
Let S be a set of N records with distinct integer keys in the range [0, M], supporting the following operations:
- MEMBER(K): does the key K belong to S?
- SUCCESSOR(K): find the least element of S that is greater than K
- PREDECESSOR(K): find the greatest element of S that is less than K
- SUBSET(K1, K2): produce a list of the elements whose keys lie between K1 and K2
The problem: find an efficient data structure that supports these operations.
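
To pin down what the four operations are expected to return, here is a minimal reference sketch on a sorted list. It is only an O(log N) baseline, not one of the tries discussed later. The class name SortedKeySet is made up for illustration, and treating SUBSET as inclusive of K1 and K2 is an assumption.

```python
from bisect import bisect_left, bisect_right


class SortedKeySet:
    """Baseline for the four queries on a sorted list (hypothetical helper)."""

    def __init__(self, keys):
        self.keys = sorted(set(keys))

    def member(self, k):
        i = bisect_left(self.keys, k)
        return i < len(self.keys) and self.keys[i] == k

    def successor(self, k):
        # Least stored key strictly greater than k, or None.
        i = bisect_right(self.keys, k)
        return self.keys[i] if i < len(self.keys) else None

    def predecessor(self, k):
        # Greatest stored key strictly less than k, or None.
        i = bisect_left(self.keys, k)
        return self.keys[i - 1] if i > 0 else None

    def subset(self, k1, k2):
        # Assumption: both endpoints are included.
        return self.keys[bisect_left(self.keys, k1):bisect_right(self.keys, k2)]
```

Each query costs one binary search, which is the O(log N) behaviour the faster structures aim to beat for bounded integer keys.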

Existing solutions
- AVL trees and 2-3 trees use O(N) space and O(log N) time in the worst case
  - With no restriction on the keys, better worst-case performance is impossible
  - Expected O(log log N) time is possible when keys are uniformly distributed
- Stratified trees use O(M * log log M) space and O(log log M) time in the worst case for integer keys in the range [0, M]
  - Disadvantage: O(M * log log M) space is much larger than O(N) when M >> N

Motivation for another solution
A more space-efficient data structure is wanted for restricted keys, one that still maintains the time efficiency.

The way to the solution
- First define the P-Fast Trie: O(sqrt(log M)) time; O(N * sqrt(log M) * 2^sqrt(log M)) space
- Then show how the Q-Fast Trie improves the space requirement to O(N)
- Then show the X-Fast Trie: O(log log M) time; O(N * log M) space; no dynamic operations
- Then show the Y-Fast Trie: O(log log M) time; O(N) space; no dynamic operations

What is a trie
- A trie of size (h, b) is a tree of height h and branching factor b
- All keys can be regarded as integers in the range [0, b^h - 1]
- Each key K can be represented as an h-digit number in base b: K_1 K_2 K_3 … K_h
- Keys are stored at the leaf level; the path from the root follows the decomposition of the key into digits
[Figure: example trie whose leaves hold the keys 20, 22, 24, 31, 32, 42, 43]
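
The digit decomposition that defines a root-to-leaf path can be sketched as follows. The function name to_digits is hypothetical; it assumes keys lie in [0, b^h - 1] so that exactly h digits suffice.

```python
def to_digits(key, h, b):
    """Return the h base-b digits of key, most significant first."""
    if not 0 <= key < b ** h:
        raise ValueError("key does not fit in h digits of base b")
    digits = []
    for _ in range(h):
        key, d = divmod(key, b)   # peel off the least significant digit
        digits.append(d)
    return digits[::-1]           # root-to-leaf order
```

For example, in a trie of size (2, 10) the key 31 follows branch 3 from the root and then branch 1.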

Trivial Trie
- In each node, store a vector of branches
- MEMBER(K): O(h)
  - visits O(h) nodes and spends O(1) time in each
- SUCCESSOR(K)/PREDECESSOR(K): O(h*b)
  - visits O(h) nodes and spends O(b) time in each node
  - this is too much time
- Observation: increasing b (the base of the key representation, the branching factor) decreases h (the number of digits required to represent a key, the height of the tree), and vice versa
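
A minimal sketch of the trivial trie, assuming each node is a plain Python list of b branches and leaves store the key itself. The class name TrivialTrie and the insert method are illustrative additions, not taken from the slides. The linear scan in successor is where the O(b) cost per node comes from.

```python
class TrivialTrie:
    """Trie of size (h, b) with a vector of branches per node (sketch)."""

    def __init__(self, h, b):
        self.h, self.b = h, b
        self.root = [None] * b

    def _digits(self, key):
        return [(key // self.b ** i) % self.b for i in range(self.h - 1, -1, -1)]

    def insert(self, key):
        node = self.root
        digits = self._digits(key)
        for d in digits[:-1]:
            if node[d] is None:
                node[d] = [None] * self.b
            node = node[d]
        node[digits[-1]] = key            # leaf slot holds the key

    def member(self, key):
        node = self.root
        for d in self._digits(key):       # O(1) work in each of h nodes
            node = node[d]
            if node is None:
                return False
        return True

    def successor(self, key):
        digits = self._digits(key)
        path, node = [], self.root
        for d in digits:                  # follow the key as far as it exists
            path.append(node)
            if node[d] is None:
                break
            node = node[d]
        for level in range(len(path) - 1, -1, -1):
            for e in range(digits[level] + 1, self.b):   # O(b) linear scan
                child = path[level][e]
                if child is not None:
                    for _ in range(self.h - level - 1):  # walk to leftmost leaf
                        child = next(c for c in child if c is not None)
                    return child
        return None
```

PREDECESSOR is symmetric: scan downward from the digit and walk to the rightmost leaf.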

Example for worst case complexity
[Figure: trie illustrating the worst case]

P-Fast Trie Idea
Improve SUCCESSOR(K)/PREDECESSOR(K) time by overcoming the linear search in every intermediate node.

P-Fast Trie
- Each internal node v has additional fields:
  - LOWKEY(v): the leaf node containing the smallest key descending from v
  - HIGHKEY(v): the leaf node containing the largest key descending from v
  - INNERTREE(v): a binary tree of worst-case height O(log b) representing the set of digits directly descending from v
- Each leaf node points to its immediate neighbors on the left and on the right
- CLOSEMATCH(K): a query returning the node with key K if it exists in the trie, and PREDECESSOR(K) or SUCCESSOR(K) otherwise

CLOSEMATCH(K) Algorithm Intuitively
- Starting from the root, look for K = K_1 K_2 … K_h
- If found, return it
- If not, let v be the node at depth j from which there is no way further down: K_j ∉ INNERTREE(v)
- Looking for K_j in INNERTREE(v), find D, an existing digit in INNERTREE(v) that is either:
  - the least digit greater than K_j, or
  - the greatest digit less than K_j
- If D > K_j, return LOWKEY(D's child of v); otherwise, if D < K_j, return HIGHKEY(D's child of v)
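
The CLOSEMATCH steps can be sketched as below. This is an illustration under simplifying assumptions, not Willard's implementation: a sorted Python list searched with bisect stands in for INNERTREE(v) (so the lookup is O(log b) but insertion is not), a dict stands in for the branch vector, and the leaf neighbor pointers are omitted. The names PNode and PFastTrieSketch are made up.

```python
from bisect import bisect_left, insort


class PNode:
    def __init__(self):
        self.children = {}   # digit -> PNode, or the key itself at leaf level
        self.digits = []     # sorted digits: stand-in for INNERTREE(v)
        self.low = None      # LOWKEY(v)
        self.high = None     # HIGHKEY(v)


class PFastTrieSketch:
    def __init__(self, h, b):
        self.h, self.b = h, b
        self.root = PNode()

    def _digits(self, key):
        return [(key // self.b ** i) % self.b for i in range(self.h - 1, -1, -1)]

    def insert(self, key):
        node = self.root
        for depth, d in enumerate(self._digits(key)):
            node.low = key if node.low is None else min(node.low, key)
            node.high = key if node.high is None else max(node.high, key)
            if d not in node.children:
                insort(node.digits, d)
                node.children[d] = key if depth == self.h - 1 else PNode()
            node = node.children[d]

    def closematch(self, key):
        node = self.root
        for depth, d in enumerate(self._digits(key)):
            if d in node.children:            # the path continues
                node = node.children[d]
                continue
            if not node.digits:               # empty trie
                return None
            at_leaf_level = depth == self.h - 1
            i = bisect_left(node.digits, d)   # binary search among digits
            if i < len(node.digits):          # D > K_j: smallest key below D
                child = node.children[node.digits[i]]
                return child if at_leaf_level else child.low
            child = node.children[node.digits[-1]]   # D < K_j: largest key below D
            return child if at_leaf_level else child.high
        return node                           # exact match: the leaf key
```

Only one digit search happens per query, at the node where the path breaks, which is why the bound is O(h + log b) rather than O(h * log b).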

P-Fast Trie Complexities
- CLOSEMATCH(K) time complexity is O(h + log b)
- Other queries require an O(1) addition to the CLOSEMATCH(K) complexity
- Space complexity of such a trie is O(h*b*N)
- Representing the input keys in base 2^sqrt(log M) requires sqrt(log M) digits; therefore, with h = sqrt(log M) and b = 2^sqrt(log M), the desired complexities are achieved
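
A small worked calculation may help, assuming the parameter choice h = ceil(sqrt(log2 M)) and b = 2^h (rounding up is an assumption made here so that h digits always suffice). The function name p_fast_parameters is hypothetical.

```python
import math


def p_fast_parameters(log_m):
    """Given log2(M), return (h, b) with h = ceil(sqrt(log_m)) and b = 2**h."""
    h = math.isqrt(log_m)
    if h * h < log_m:        # round the square root up
        h += 1
    return h, 2 ** h
```

For 64-bit keys (log M = 64) this gives h = 8 and b = 256: a query touches 8 levels plus one search among at most 256 digits, so O(h + log b) is about 16 steps, while the space factor h * b is 2048 per key.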

Q-Fast Trie Idea
- Improve space by splitting the set of keys into subsets
- How to split is the problem:
  - To preserve the time complexity
  - To decrease the space complexity

Q-Fast Trie
- Let S’ denote an ordered list of keys from S: 0 = K_1 < K_2 < K_3 < … < K_L < M
- Define:
  - S_i = {K ∈ S | K_i ≤ K < K_(i+1)} for i < L
  - S_L = {K ∈ S | K ≥ K_L}
- S’ is a c-partition of S iff each S_i has cardinality in the range [c, 2c-1]
- A Q-Fast Trie of size (h, b, c) is a two-level structure:
  - Upper part: a P-fast trie T of size (h, b) representing the set S’, which is a c-partition of S
  - Lower part: a forest of 2-3 trees, where the ith tree represents S_i
  - The leaves of the 2-3 trees are connected to form an ordered list
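
One simple way to build a c-partition from scratch is sketched below: cut the sorted keys into runs of c and fold a short final run into its neighbor. This static construction is an illustration only; the function name c_partition is made up, and the first separator is fixed at 0 to match the definition above.

```python
def c_partition(keys, c):
    """Split keys into consecutive groups of size in [c, 2c-1].

    Returns (separators, groups); separators[i] is the lower bound K_i
    of group i, with separators[0] == 0.
    """
    keys = sorted(keys)
    if len(keys) < c:
        raise ValueError("need at least c keys")
    groups = [keys[i:i + c] for i in range(0, len(keys), c)]
    if len(groups[-1]) < c:          # leftover of size < c joins the previous group
        tail = groups.pop()
        groups[-1].extend(tail)      # resulting size is at most 2c - 1
    separators = [0] + [g[0] for g in groups[1:]]
    return separators, groups
```

Because only about N/c separators go into the upper trie, its O(h*b) per-key overhead is divided by c.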

Example of Q-Fast Trie
[Figure: example Q-fast trie; node labels 71, 35, 10, 17, 33, 70, 77, 81, 95, 99]

CLOSEMATCH(K) Algorithm Intuitively (Q-Fast Trie)
- Look for D = PREDECESSOR(K) in the upper part: O(h + log b)
- Then search D's 2-3 tree for K: O(log c)
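
The two-level lookup can be sketched with sorted lists standing in for both levels: a separator list replaces the upper P-fast trie and one sorted list per subset replaces each 2-3 tree. These stand-ins and the name q_closematch are assumptions for illustration; the separator found is the greatest one not exceeding K.

```python
from bisect import bisect_left, bisect_right


def q_closematch(separators, groups, k):
    """Return k if stored, else a neighboring key from k's group.

    separators[0] must be 0 and k must be non-negative, so that some
    separator is always <= k.
    """
    i = bisect_right(separators, k) - 1   # upper part: find D
    group = groups[i]                     # lower part: D's tree
    j = bisect_left(group, k)
    if j < len(group):
        return group[j]                   # k itself, or its successor in the group
    return group[-1]                      # otherwise its predecessor
```

With the partition [[10, 17, 33], [35, 70, 71], [77, 81, 95, 99]] and separators [0, 35, 77], a query for 34 lands in the first group and returns its predecessor 33.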

Q-Fast Trie Complexities
- CLOSEMATCH(K) time complexity is O(h + log b + log c)
- Other queries require an O(1) addition to the CLOSEMATCH(K) complexity
- Space complexity is O(N + N*h*b/c)
- By choosing h = sqrt(log M), b = 2^sqrt(log M), c = h*b, the desired complexities are achieved

P/Q-Fast Trie Insertion/Deletion
- P-fast trie
  - Use AVL trees for the INNERTREEs
  - O(h + log b) for insertion/deletion
- Q-fast trie
  - O(h + log b + log c) for insertion/deletion
  - The c-partition property is maintained by splitting/merging trees in O(log c) time

X-Fast Trie Idea
- A P/Q-fast trie searches top-down to reach the wanted level, making a binary search in each node on the way. It therefore relies on the balance between the height of the tree and the branching factor
- X-fast trie idea: binary-search for the wanted level
  - This requires being able to find the wanted node from its level alone, without a top-down pass
  - For worst-case complexity the branching factor no longer matters, since it only affects the base of the logarithm

X-Fast Trie, Part 1
- A trie of height h and branching factor 2 (all keys represented in binary)
- Each node v has an additional field DESCENDANT(v):
  - If v has only a right branch, it points to the smallest leaf descending from v (through the right branch)
  - If v has only a left branch, it points to the largest leaf descending from v (through the left branch)
- All leaves form a doubly-linked list
- A node v at height j may have descending leaves only in the range [(i-1)*2^j + 1, i*2^j] for some integer i; this i is called ID(v)
- A node v at height j is called an ancestor of key K if ceil(K / 2^j) = ID(v)
- BOTTOM(K) is the lowest ancestor of K that exists in the trie

X-Fast Trie, Part 2
- h+1 Level Search Structures (LSS), each of which uses perfect hashing, as we have seen in the first lecture: linear space and constant time

BOTTOM(K) Algorithm Intuitively
- Binary-search among the h+1 different LSSs
- Searching each LSS is O(1)
- h = log M, therefore a binary search over h+1 LSSs is O(log log M)
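
The binary search over levels can be sketched as follows. Two assumptions differ from the slides: Python sets stand in for the perfect-hashing LSSs, and nodes are identified by the 0-based prefix k >> j for keys in [0, 2^h - 1] rather than by the ID numbering. The names build_levels and bottom are hypothetical.

```python
def build_levels(keys, h):
    """levels[j] holds the prefixes (k >> j) of all keys: the nodes at height j."""
    return [{k >> j for k in keys} for j in range(h + 1)]


def bottom(levels, k):
    """Return (height, prefix) of the lowest existing ancestor of k, or None."""
    lo, hi = 0, len(levels) - 1
    if (k >> hi) not in levels[hi]:      # root missing: the trie is empty
        return None
    # If k's ancestor exists at height j it exists at every greater height,
    # so the lowest such height can be found by binary search.
    while lo < hi:
        mid = (lo + hi) // 2
        if (k >> mid) in levels[mid]:    # O(1) expected lookup per level
            hi = mid
        else:
            lo = mid + 1
    return lo, k >> lo
```

The search is valid because ancestor existence is monotone in the height, and it probes only O(log h) = O(log log M) levels.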

X-Fast Trie Complexities
- BOTTOM(K) is O(log log M)
- All queries require an O(1) addition to BOTTOM(K), with the assistance of the DESCENDANT field and the doubly-linked list:
  - BOTTOM(K) is either K itself, or its DESCENDANT is PREDECESSOR(K)/SUCCESSOR(K)
- Space is O(N * log M)
  - No more than h * N nodes in the trie (h = log M)
  - log M LSSs, each using O(N) space

Y-Fast Trie Idea
- Apply a partitioning technique similar to the one used to move from the P-Fast trie to the Q-Fast trie:
  - c-partitioning of all the keys into L subsets, each containing between c and 2c-1 keys
  - Upper part: an X-Fast trie representing S’
  - Lower part: a forest of balanced binary trees of height O(log c)

Y-Fast Trie Complexities
- The upper part can be searched in O(log log M) time and occupies no more than O((N/c) * log M) space
- Each binary tree can be searched in O(log c) time, and together they occupy O(N) space
- Choosing c = log M gives O(N) space and O(log log M) time

X/Y-Fast Trie Insertion/Deletion
- LSSs have practically uncontrolled time complexity for dynamic operations
  - At least at the time the article was presented
- Therefore, X/Y-Fast tries inherit this limitation