
Honors Track: Competitive Programming & Problem Solving Tries Frank Maurix

Contents
What is a trie?
When to use tries
Implementation and some operations
Alternatives for implementation
Compression
Suffix tree

What is a trie?
A data structure
Also called digital tree, radix tree, or prefix tree
Stores a set of strings (a dictionary)
Characters as nodes
A node's position reflects the prefix it represents

When to use tries
Pros
  O(L) and O(L*A) operations
  Form of radix sort
  Operations on all words sharing a prefix (e.g. prefix deletion)
  Suffix tree
Cons
  O(N*A) space complexity
  Horrible for floating point numbers
  Not in the standard library
N = number of nodes, L = length of the word, A = size of the alphabet

Implementation
Keep track of the root
Array of children
Store the number of children
Alphabet = A, B, …, Z (uppercase only)
Change each character into a value 0, …, 25:
    int c = someChar - 'A'; // use - 'a' for lowercase
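The character-to-index mapping above can be sketched as a small helper. This is only an illustration: the class name `AlphabetIndex` and the methods `indexOf` and `charOf` are hypothetical names, not part of the slides, and the range check is an added assumption about how invalid input should be handled.

```java
final class AlphabetIndex {
    static final int ALPHABET_SIZE = 26; // 'A'..'Z', as on the slide

    // Hypothetical helper: maps 'A'..'Z' to 0..25 and rejects anything else.
    static int indexOf(char c) {
        int index = c - 'A';
        if (index < 0 || index >= ALPHABET_SIZE) {
            throw new IllegalArgumentException("Not an uppercase letter: " + c);
        }
        return index;
    }

    // Inverse mapping, handy when reporting words during a traversal.
    static char charOf(int index) {
        return (char) ('A' + index);
    }
}
```

Without such a check, a lowercase letter or a digit would index outside the 26-element children array.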

Implementation
    import java.util.*;

    public class ScaryProblem {
        TrieNode root; // Root of the trie

        void solve() {
            root = new TrieNode(null, false, null);
            // here is the place where you should do some magic with tries
        }

        public static void main(String[] args) {
            new ScaryProblem().solve();
        }
    }

    class TrieNode {
        TrieNode[] children = new TrieNode[26];
        Character ch;          // last char of prefix, null for root
        TrieNode parent;       // pointer to parent
        boolean inDictionary;  // Prefix of this node in the dictionary?
        int nOC = 0;           // Number of children

        TrieNode(Character ch, boolean used, TrieNode newParent) {...}
    }

Operations
Searching
Insertion
Word deletion
Prefix deletion
Retrieving in sorted order

Searching
    char[] word = {'N', 'A', 'S', 'A'};
    root.search(word, 0);
    // Alternative: word as String and use word.charAt(index)

    class TrieNode {
        // Returns the node for word, or null if it does not exist.
        // The caller still has to check inDictionary on the result.
        TrieNode search(char[] word, int index) { // index should be 0 on initial call
            if (index == word.length - 1) {
                // Node found or final node doesn't exist
                return children[word[index] - 'A'];
            } else if (children[word[index] - 'A'] == null) {
                // Node doesn't exist
                return null;
            } else {
                // Keep searching
                return children[word[index] - 'A'].search(word, index + 1);
            }
        }
    }
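Because the search returns the node for a prefix, a membership test also has to look at `inDictionary`. The following is a minimal iterative sketch of that idea, not the slides' code: `SearchSketch`, `find`, `contains` and the small `insert` helper are hypothetical names, and the node is reduced to the two fields the example needs.

```java
final class SearchSketch {
    static final class Node {
        final Node[] children = new Node[26];
        boolean inDictionary;
    }

    // Iterative variant of the search: returns the node for the prefix, or null.
    static Node find(Node root, String prefix) {
        Node node = root;
        for (int i = 0; i < prefix.length() && node != null; i++) {
            node = node.children[prefix.charAt(i) - 'A'];
        }
        return node;
    }

    // A word is in the trie only if its node exists AND is marked as a word.
    static boolean contains(Node root, String word) {
        Node node = find(root, word);
        return node != null && node.inDictionary;
    }

    // Minimal insert, included only so the example can be run on its own.
    static void insert(Node root, String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            int c = word.charAt(i) - 'A';
            if (node.children[c] == null) {
                node.children[c] = new Node();
            }
            node = node.children[c];
        }
        node.inDictionary = true;
    }
}
```

After inserting only "NASA", `find(root, "NAS")` returns a node but `contains(root, "NAS")` is false, which is the distinction between a stored prefix and a stored word.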

Insertion
Inserting a word may require creating new nodes, but not always
Example 1: inserting “SPACES”

Insertion
Inserting a word may require creating new nodes, but not always
Example 2: inserting “NSA”

Insertion
    char[] word = {'N', 'S', 'A'};
    root.insert(word, 0);

    class TrieNode {
        void insert(char[] word, int index) { // index 0 on initial call
            if (children[word[index] - 'A'] == null) {
                // Next node doesn't exist
                nOC++;
                if (index == word.length - 1) {
                    children[word[index] - 'A'] = new TrieNode(word[index], true, this);
                } else {
                    children[word[index] - 'A'] = new TrieNode(word[index], false, this);
                    children[word[index] - 'A'].insert(word, index + 1);
                }
            } else if (index == word.length - 1) {
                // Last node already exists: just mark it as a word
                children[word[index] - 'A'].inDictionary = true;
            } else {
                // Next node exists: keep descending
                children[word[index] - 'A'].insert(word, index + 1);
            }
        }
    }
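The same insertion can also be written as a loop, which avoids deep recursion on long words. This is a sketch under assumptions, not the presenter's code: `InsertSketch` and `childCount` (standing in for `nOC`) are hypothetical names, and returning a boolean that says whether the word is new is an addition for illustration.

```java
final class InsertSketch {
    static final class Node {
        final Node[] children = new Node[26];
        final Node parent;   // null for the root
        final char ch;       // last character of the prefix (unused for the root)
        boolean inDictionary;
        int childCount;      // plays the role of nOC on the slides

        Node(char ch, Node parent) {
            this.ch = ch;
            this.parent = parent;
        }
    }

    // Returns true if the word was not in the dictionary before.
    static boolean insert(Node root, String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            int c = word.charAt(i) - 'A';
            if (node.children[c] == null) {
                // Missing node: create it and update the parent's child count.
                node.children[c] = new Node(word.charAt(i), node);
                node.childCount++;
            }
            node = node.children[c];
        }
        boolean added = !node.inDictionary;
        node.inDictionary = true; // mark the final node as a word
        return added;
    }
}
```

Inserting "SPACES" into an empty trie creates six nodes. Inserting "SPACE" afterwards creates none and only sets the flag on the existing node, which is the "not always" case from the slide.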

Deletion
Search for the corresponding node
Set inDictionary for that node to false
If the node isn't a leaf, you're done
Else, one or more nodes can be removed
Removing the nodes
  Don't delete the root
  If the current node is a leaf and not in the dictionary
    Remove the node
    Make a recursive call to the parent and repeat

Deletion Example 1: deleting “SPACES”

Deletion Example 2: deleting “NSA”

Deletion
    void removeWord(char[] word) {
        TrieNode result = root.search(word, 0);
        if (result == null) {
            // word not in trie
        } else if (result.nOC == 0) {
            // node is a leaf
            result.trieCleanup();
        } else {
            // node isn't a leaf
            result.inDictionary = false;
        }
    }

    class TrieNode {
        // Delete current node & check if parent should be deleted
        void trieCleanup() {
            if (parent != null) { // Never delete the root
                parent.children[ch - 'A'] = null;
                parent.nOC--;
                if (parent.nOC == 0 && !parent.inDictionary) {
                    parent.trieCleanup();
                }
            }
        }
    }
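A self-contained sketch of word deletion with upward pruning follows. It is one possible arrangement rather than the slides' exact method: `DeleteSketch`, `slot` and `childCount` are hypothetical names, the pruning is a loop instead of a recursive call, and the extra check that the node really is a word is an added assumption.

```java
final class DeleteSketch {
    static final class Node {
        final Node[] children = new Node[26];
        final Node parent;  // null for the root
        final int slot;     // index of this node in parent.children
        boolean inDictionary;
        int childCount;

        Node(Node parent, int slot) {
            this.parent = parent;
            this.slot = slot;
        }
    }

    static void insert(Node root, String word) {
        Node node = root;
        for (char ch : word.toCharArray()) {
            int c = ch - 'A';
            if (node.children[c] == null) {
                node.children[c] = new Node(node, c);
                node.childCount++;
            }
            node = node.children[c];
        }
        node.inDictionary = true;
    }

    static Node find(Node root, String word) {
        Node node = root;
        for (int i = 0; i < word.length() && node != null; i++) {
            node = node.children[word.charAt(i) - 'A'];
        }
        return node;
    }

    // Returns true if the word was present and has been removed.
    static boolean remove(Node root, String word) {
        Node node = find(root, word);
        if (node == null || !node.inDictionary) {
            return false; // unknown word, or only a prefix of other words
        }
        node.inDictionary = false;
        // Prune upward while the node is a non-root leaf that ends no word.
        while (node.parent != null && node.childCount == 0 && !node.inDictionary) {
            node.parent.children[node.slot] = null;
            node.parent.childCount--;
            node = node.parent;
        }
        return true;
    }
}
```

With "SPACE", "SPACES" and "NSA" stored, removing "SPACES" deletes only the final S node because its parent still ends the word "SPACE", while removing "NSA" prunes the whole N branch.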

Prefix deletion Example: deleting all words with prefix ‘SPA’

Prefix deletion
    void removePrefix(char[] word) {
        TrieNode result = root.search(word, 0);
        if (result != null) {
            result.trieCleanup();
        }
    }
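Prefix deletion detaches the whole subtree below the prefix node in one step and then prunes ancestors that no longer serve any word. A runnable sketch, assuming a node that stores its parent and its index in the parent's array (`PrefixDeleteSketch`, `slot` and `childCount` are hypothetical names):

```java
final class PrefixDeleteSketch {
    static final class Node {
        final Node[] children = new Node[26];
        final Node parent;  // null for the root
        final int slot;     // index of this node in parent.children
        boolean inDictionary;
        int childCount;

        Node(Node parent, int slot) {
            this.parent = parent;
            this.slot = slot;
        }
    }

    static void insert(Node root, String word) {
        Node node = root;
        for (char ch : word.toCharArray()) {
            int c = ch - 'A';
            if (node.children[c] == null) {
                node.children[c] = new Node(node, c);
                node.childCount++;
            }
            node = node.children[c];
        }
        node.inDictionary = true;
    }

    static Node find(Node root, String prefix) {
        Node node = root;
        for (int i = 0; i < prefix.length() && node != null; i++) {
            node = node.children[prefix.charAt(i) - 'A'];
        }
        return node;
    }

    // Removes every word that starts with the given non-empty prefix.
    static boolean removePrefix(Node root, String prefix) {
        Node node = find(root, prefix);
        if (node == null || node.parent == null) {
            return false; // unknown prefix, or the root itself
        }
        // Detach the subtree first, then prune ancestors left without a purpose.
        do {
            node.parent.children[node.slot] = null;
            node.parent.childCount--;
            node = node.parent;
        } while (node.parent != null && node.childCount == 0 && !node.inDictionary);
        return true;
    }
}
```

The cost is dominated by the search for the prefix node, since the detached subtree is simply left to the garbage collector.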

Retrieving in alphabetical order
Pre-order tree traversal
Only report prefixes in dictionary
    traverse(root);

    void traverse(TrieNode node) {
        if (node.inDictionary) {
            report(node); // Or other fancy stuff
        }
        for (int i = 0; i < 26; i++) {
            if (node.children[i] != null) {
                traverse(node.children[i]);
            }
        }
    }
To get the prefix without too much extra time: when inserting, store the prefix only for the node you insert (to stay in O(L)/O(A*L))
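A runnable sketch of the pre-order traversal that collects the words in alphabetical order. Note the swapped-in technique: instead of storing the prefix in each inserted node as the slide suggests, this version rebuilds it on the fly in a shared `StringBuilder`. The names `TraversalSketch`, `sortedWords` and `collect` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class TraversalSketch {
    static final class Node {
        final Node[] children = new Node[26];
        boolean inDictionary;
    }

    static void insert(Node root, String word) {
        Node node = root;
        for (char ch : word.toCharArray()) {
            int c = ch - 'A';
            if (node.children[c] == null) {
                node.children[c] = new Node();
            }
            node = node.children[c];
        }
        node.inDictionary = true;
    }

    static List<String> sortedWords(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, new StringBuilder(), out);
        return out;
    }

    // Pre-order: report the node first, then visit children in index order.
    private static void collect(Node node, StringBuilder prefix, List<String> out) {
        if (node.inDictionary) {
            out.add(prefix.toString());
        }
        for (int i = 0; i < 26; i++) {
            if (node.children[i] != null) {
                prefix.append((char) ('A' + i));
                collect(node.children[i], prefix, out);
                prefix.setLength(prefix.length() - 1); // undo before the next sibling
            }
        }
    }
}
```

Visiting a node before its children is what places "SPACE" ahead of "SPACES" in the result.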

Alphabetic successor of a node
If the node isn't a leaf
  Find the minimum in the subtree rooted at the node
  Stop as soon as you find a word in the dictionary
Else
  If the parent has a child with a higher index than the current node, find the minimum as before in that child's subtree
  Else repeat for the parent
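The successor rule can be sketched as follows. This is an illustration under two assumptions that are not stated on the slide: every leaf is a dictionary word (true if words are only inserted, or if deletion prunes as described earlier), and each node knows its parent and its index in the parent's array. `SuccessorSketch`, `slot`, `minimum` and `wordOf` are hypothetical names.

```java
final class SuccessorSketch {
    static final class Node {
        final Node[] children = new Node[26];
        final Node parent;  // null for the root
        final int slot;     // index in parent.children, -1 for the root
        boolean inDictionary;

        Node(Node parent, int slot) {
            this.parent = parent;
            this.slot = slot;
        }
    }

    // Inserts a word and returns its final node.
    static Node insert(Node root, String word) {
        Node node = root;
        for (char ch : word.toCharArray()) {
            int c = ch - 'A';
            if (node.children[c] == null) {
                node.children[c] = new Node(node, c);
            }
            node = node.children[c];
        }
        node.inDictionary = true;
        return node;
    }

    // First word node in the subtree rooted at node (node itself counts):
    // keep following the lowest-index child until a word is found.
    static Node minimum(Node node) {
        while (node != null && !node.inDictionary) {
            Node next = null;
            for (int i = 0; i < 26 && next == null; i++) {
                next = node.children[i];
            }
            node = next;
        }
        return node;
    }

    static Node successor(Node node) {
        // Not a leaf: the successor is the minimum below the lowest-index child.
        for (int i = 0; i < 26; i++) {
            if (node.children[i] != null) {
                return minimum(node.children[i]);
            }
        }
        // Leaf: climb until some ancestor has a child with a higher index.
        while (node.parent != null) {
            for (int i = node.slot + 1; i < 26; i++) {
                if (node.parent.children[i] != null) {
                    return minimum(node.parent.children[i]);
                }
            }
            node = node.parent;
        }
        return null; // the node held the last word
    }

    // Rebuilds the word of a node by walking up to the root.
    static String wordOf(Node node) {
        StringBuilder sb = new StringBuilder();
        for (Node n = node; n.parent != null; n = n.parent) {
            sb.append((char) ('A' + n.slot));
        }
        return sb.reverse().toString();
    }
}
```

If the trie may contain leaves that are not words, `minimum` would need to backtrack to later siblings instead of following only the first child.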

Alternatives for storing children
A is the size of the alphabet, L is the length of the word, N is the number of nodes
Operation: Array | HashMap | LinkedList
Insertion: O(A·L) | O(L)* | O(A·L) sorted; O(L) unsorted
Deletion: O(L) O(L) (search already done)
Search
Trie traversal: O(N)** O(N)** sorted, O(A·log(A)·N)** unsorted
* Assuming simple uniform hashing: expected O(L) time, worst case O(A·L)
** Assuming the report function takes O(1) time

Array vs HashMap vs LinkedList
Array
  + Simple
  + Small constants
  + Good for simple alphabet
  - A pain with complex alphabet
  - Always a lot of space
HashMap
  + Works with complex alphabet
  + Fast expected time
  - Worst case still slow
  - Worst case more space than array
  - Constants worse than array
LinkedList
  + Low space usage
  + Works with complex alphabet
  + Small constants
  - Very slow
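As an illustration of the HashMap alternative, here is a minimal node that stores its children in a `java.util.HashMap` keyed by character. `MapTrieSketch` and its methods are hypothetical names chosen for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

final class MapTrieSketch {
    static final class Node {
        // Only characters that actually occur get an entry, whatever the alphabet.
        final Map<Character, Node> children = new HashMap<>();
        boolean inDictionary;
    }

    static void insert(Node root, String word) {
        Node node = root;
        for (char ch : word.toCharArray()) {
            node = node.children.computeIfAbsent(ch, k -> new Node());
        }
        node.inDictionary = true;
    }

    static boolean contains(Node root, String word) {
        Node node = root;
        for (int i = 0; i < word.length() && node != null; i++) {
            node = node.children.get(word.charAt(i));
        }
        return node != null && node.inDictionary;
    }
}
```

No character-to-index conversion is needed, so digits, punctuation and mixed case work unchanged. The price is that a sorted traversal has to sort each node's keys first, or use an ordered map such as `java.util.TreeMap` instead.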

Compression
Merge nodes
Adapt operations appropriately
Why compress: less space used!
How to adapt search?

Compression
After deletion, more compression may be possible
Insertion after compression: possibly split a node, insert, and compress
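One way to adapt search and insertion to a compressed trie is to let every node carry a string label instead of a single character. The sketch below is an assumption-laden illustration, not the presenter's design: `CompressedTrieSketch`, `label` and `commonPrefixLength` are hypothetical names, children are kept in a `HashMap` keyed by the first character of the child's label, and re-merging nodes after deletion is left out.

```java
import java.util.HashMap;
import java.util.Map;

final class CompressedTrieSketch {
    static final class Node {
        String label; // characters on the edge leading into this node
        final Map<Character, Node> children = new HashMap<>(); // keyed by first label char
        boolean inDictionary;

        Node(String label) {
            this.label = label;
        }
    }

    static void insert(Node root, String word) {
        Node node = root;
        int pos = 0;
        while (pos < word.length()) {
            Node child = node.children.get(word.charAt(pos));
            if (child == null) {
                // No edge starts with this character: add the rest as one node.
                Node leaf = new Node(word.substring(pos));
                leaf.inDictionary = true;
                node.children.put(word.charAt(pos), leaf);
                return;
            }
            int common = commonPrefixLength(child.label, word, pos);
            if (common < child.label.length()) {
                // The word leaves the edge halfway: split the node in two.
                Node middle = new Node(child.label.substring(0, common));
                node.children.put(middle.label.charAt(0), middle);
                child.label = child.label.substring(common);
                middle.children.put(child.label.charAt(0), child);
                child = middle;
            }
            pos += common;
            node = child;
        }
        node.inDictionary = true;
    }

    // Search compares a whole label per step instead of a single character.
    static boolean contains(Node root, String word) {
        Node node = root;
        int pos = 0;
        while (pos < word.length()) {
            Node child = node.children.get(word.charAt(pos));
            if (child == null || !word.startsWith(child.label, pos)) {
                return false;
            }
            pos += child.label.length();
            node = child;
        }
        return node.inDictionary;
    }

    private static int commonPrefixLength(String label, String word, int pos) {
        int n = 0;
        while (n < label.length() && pos + n < word.length()
                && label.charAt(n) == word.charAt(pos + n)) {
            n++;
        }
        return n;
    }
}
```

Inserting "SPACES", "SPACE" and "SPAM" in that order splits the original single node twice and leaves a node labelled "SPA" under the root with the children "CE" and "M".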

Suffix tree
Take all suffixes of a word
Insert all of them into a trie
Offers many fast string operations
Worst case O(L^2) nodes
All suffixes of BANANA: BANANA, ANANA, NANA, ANA, NA, A
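The construction described above, inserting every suffix into an uncompressed trie, can be sketched in a few lines. The structure built this way is often called a suffix trie, and its quadratic worst-case node count is what the compression step would reduce. `SuffixTrieSketch`, `build` and `countNodes` are hypothetical names, and the 26-letter uppercase alphabet is carried over from the earlier slides.

```java
final class SuffixTrieSketch {
    static final class Node {
        final Node[] children = new Node[26];
        boolean inDictionary; // true if a suffix ends at this node
    }

    // Inserts every suffix of text into one trie: O(L^2) time and nodes in the worst case.
    static Node build(String text) {
        Node root = new Node();
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            for (int i = start; i < text.length(); i++) {
                int c = text.charAt(i) - 'A';
                if (node.children[c] == null) {
                    node.children[c] = new Node();
                }
                node = node.children[c];
            }
            node.inDictionary = true;
        }
        return root;
    }

    // Counts all nodes in the subtree, including the given node.
    static int countNodes(Node node) {
        int total = 1;
        for (Node child : node.children) {
            if (child != null) {
                total += countNodes(child);
            }
        }
        return total;
    }
}
```

For BANANA this produces 16 nodes: the root plus one node for each of the 15 distinct non-empty substrings. A word with all letters different, such as ABCD, hits the worst case of L(L+1)/2 nodes below the root.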

Applications of suffix trees
Number of occurrences of a pattern in a text
  Search for the pattern, only consider that subtree
  Result = number of nodes in that subtree with inDictionary = true
Longest Common Substring of two strings
  Insert both strings into a suffix tree
  For each node, store which of the strings contain its prefix
  Find the deepest node represented by both strings
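Both applications can be sketched on top of an uncompressed suffix trie. This is a hedged illustration: `SuffixTrieApps`, `owners` and the method names are hypothetical, the alphabet is assumed to be A to Z, and `owners` is a bit mask with bit i set when string i passes through the node.

```java
final class SuffixTrieApps {
    static final class Node {
        final Node[] children = new Node[26];
        boolean inDictionary; // a suffix ends here
        int owners;           // bit i set if string i passes through this node
    }

    // Inserts all suffixes of text, tagging every visited node with the string id.
    static void addSuffixes(Node root, String text, int id) {
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            for (int i = start; i < text.length(); i++) {
                int c = text.charAt(i) - 'A';
                if (node.children[c] == null) {
                    node.children[c] = new Node();
                }
                node = node.children[c];
                node.owners |= 1 << id;
            }
            node.inDictionary = true;
        }
    }

    // Occurrences of pattern = suffix ends in the subtree below the pattern's node.
    // Intended for a trie that holds the suffixes of a single text.
    static int countOccurrences(Node root, String pattern) {
        Node node = root;
        for (int i = 0; i < pattern.length() && node != null; i++) {
            node = node.children[pattern.charAt(i) - 'A'];
        }
        return node == null ? 0 : countSuffixEnds(node);
    }

    private static int countSuffixEnds(Node node) {
        int total = node.inDictionary ? 1 : 0;
        for (Node child : node.children) {
            if (child != null) {
                total += countSuffixEnds(child);
            }
        }
        return total;
    }

    // Longest common substring = path to the deepest node owned by both strings.
    static String longestCommonSubstring(String a, String b) {
        Node root = new Node();
        addSuffixes(root, a, 0);
        addSuffixes(root, b, 1);
        StringBuilder best = new StringBuilder();
        deepestShared(root, new StringBuilder(), best);
        return best.toString();
    }

    private static void deepestShared(Node node, StringBuilder path, StringBuilder best) {
        if (path.length() > best.length()) {
            best.setLength(0);
            best.append(path);
        }
        for (int i = 0; i < 26; i++) {
            Node child = node.children[i];
            if (child != null && child.owners == 3) { // reached by both strings
                path.append((char) ('A' + i));
                deepestShared(child, path, best);
                path.setLength(path.length() - 1);
            }
        }
    }
}
```

Counting works because every occurrence of the pattern is the beginning of exactly one suffix of the text. In the text BANANA, the pattern ANA is a prefix of the suffixes ANANA and ANA, so it occurs twice.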

Questions