Tries A trie is another type of tree structure. The word “trie” comes from the word “retrieval,” but is usually pronounced like “try.” For our purposes,

Slides:



Advertisements
Similar presentations
SYMBOL TABLES &CODE GENERATION FOR EXECUTABLES. SYMBOL TABLES Compilers that produce an executable (or the representation of an executable in object module.
Advertisements

CS50 WEEK 6 Kenny Yu.
Searching Kruse and Ryba Ch and 9.6. Problem: Search We are given a list of records. Each record has an associated key. Give efficient algorithm.
Design a Data Structure Suppose you wanted to build a web search engine, a la Alta Vista (so you can search for “banana slugs” or “zyzzyvas”) index say.
Dictionaries CS 105. L11: Dictionaries Slide 2 Definition The Dictionary Data Structure structure that facilitates searching objects are stored with search.
Dictionaries CS /02/05 L7: Dictionaries Slide 2 Copyright 2005, by the authors of these slides, and Ateneo de Manila University. All rights reserved.
Sets of Digital Data CSCI 2720 Fall 2005 Kraemer.
Compsci 201 Recitation 10 Professor Peck Jimmy Wei 11/1/2013.
1 Lexicographic Search:Tries All of the searching methods we have seen so far compare entire keys during the search Idea: Why not consider a key to be.
Dictionaries CS 110: Data Structures and Algorithms First Semester,
Contents What is a trie? When to use tries
Advanced Data Structures Lecture 8 Mingmin Xie. Agenda Overview Trie Suffix Tree Suffix Array, LCP Construction Applications.
CS203 Lecture 14. Hashing An object may contain an arbitrary amount of data, and searching a data structure that contains many large objects is expensive.
Generic Trees—Trie, Compressed Trie, Suffix Trie (with Analysi
CSE332: Data Abstractions Lecture 7: AVL Trees
Searching, Maps,Tries (hashing)
Winter 2009 Tutorial #6 Arrays Part 2, Structures, Debugger
Data Structures and Analysis (COMP 410)
Pointers and Linked Lists
MA/CSSE 473 Day 26 Student questions Boyer-Moore B Trees.
Pointers and Linked Lists
Week 7 - Friday CS221.
Higher Order Tries Key = Social Security Number.
May 17th – Comparison Sorts
Azita Keshmiri CS 157B Ch 12 indexing and hashing
Hash table CSC317 We have elements with key and satellite data
Big-O notation.
Mark Redekopp David Kempe
Data Structures Interview / VIVA Questions and Answers
Cinda Heeren / Geoffrey Tien
Cse 373 April 24th – Hashing.
Disjoint Sets Chapter 8.
Week 7 Part 2 Kyle Dewey.
COSC160: Data Structures Linked Lists
Search by Hashing.
Radix search trie (RST) R-way trie (RT) De la Briandias trie (DLB)
Hash Tables Part II: Using Buckets
Strings: Tries, Suffix Trees
Prof. Neary Adapted from slides by Dr. Katherine Gibson
Introduction to Linked Lists
CMSC 341 Lecture 10 B-Trees Based on slides from Dr. Katherine Gibson.
Design and Analysis of Algorithms
Circular Buffers, Linked Lists
Simplifying Flow of Control
Searching.
Lesson Objectives Aims
Multi-Way Search Trees
B- Trees D. Frey with apologies to Tom Anastasio
CSE 373: Data Structures and Algorithms
B- Trees D. Frey with apologies to Tom Anastasio
CSE 373 Data Structures and Algorithms
CSE 373: Data Structures and Algorithms
Hash Tables Chapter 12 discusses several ways of storing information in an array, and later searching for the information. Hash tables are a common.
Higher Order Tries Key = Social Security Number.
Data Structures and Analysis (COMP 410)
A Simple Two-Pass Assembler
Sorting "There's nothing in your head the sorting hat can't see. So try me on and I will tell you where you ought to be." -The Sorting Hat, Harry Potter.
B- Trees D. Frey with apologies to Tom Anastasio
Advanced Implementation of Tables
Hash Tables Chapter 12 discusses several ways of storing information in an array, and later searching for the information. Hash tables are a common.
How to use hash tables to solve olympiad problems
Data Structures Unsorted Arrays
Lists.
Data Structures & Algorithms
Strings: Tries, Suffix Trees
Richard Anderson Spring 2016
Hash Maps Introduction
Tree A tree is a data structure in which each node is comprised of some data as well as node pointers to child nodes
CSE 326: Data Structures Lecture #14
Lecture-Hashing.
Presentation transcript:

Tries A trie is another type of tree structure. The word “trie” comes from the word “retrieval,” but is usually pronounced like “try.” For our purposes, the nodes in a trie are arrays. We might use a trie to store a dictionary of words, as this diagram suggests. In this trie, each index in the array stands for a letter of the alphabet. Each of those indices also points to another array of letters. In the above, we’re storing the names of famous scientists, e.g. Mendel, Mendeleev, Pavlov, Pasteur, and Turing. The Δ symbol denotes the end of a name. We have to keep track of where words end so that if one word actually contains another word (e.g. Mendeleev and Mendel), we know that both words exist. In code, the Δ symbol could be a Boolean flag in each node.

// marker for end of word bool is_word; // pointers to other nodes typedef struct node { // marker for end of word bool is_word; // pointers to other nodes struct node* children[27]; } node; Here’s a definition of a node. Note that we’re not actually storing words in the trie, we’re simply storing a series of pointers and a Boolean value.

// pointers to other nodes struct node* children[26]; }; is_word struct node { /* data */ // pointers to other nodes struct node* children[26]; }; b z a o t o This simple trie contains two words: “bat” and “zoom”. Let’s walk through an example of looking up the key “bat” in this trie. m

// pointers to other nodes struct node* children[26]; }; { /* data */ // pointers to other nodes struct node* children[26]; }; b z a o t o We’ll begin our lookup at the root node. We’ll take the first letter of our key, “b,” and find the corresponding spot in our children array. We’ll look at the second index -- index 1 -- for “b”. In general, if we have an alphabetic character c, we could determine the corresponding spot in the children array using a calculation like this: int index = tolower(c) - ‘a’; In this case, the pointer in our children array at index 1 is not NULL, so we’ll continue looking up the key “bat”. If we ever encountered a NULL pointer at the proper spot in the children array while looking up a key, we’d have to say that we couldn’t find anything for that key. m

// pointers to other nodes struct node* children[26]; }; { /* data */ // pointers to other nodes struct node* children[26]; }; b z a o t o Now we’ll take the second letter of our key, “a”, and continue following pointers in this way, until we reach the end of our key. m

// pointers to other nodes struct node* children[26]; }; { /* data */ // pointers to other nodes struct node* children[26]; }; b z a o t o Now we’ll take the third letter of our key, “t”, and continue following pointers in this way, until we reach the end of our key. m

// pointers to other nodes struct node* children[26]; }; { /* data */ // pointers to other nodes struct node* children[26]; }; b z a o t o If we reach the end of the key without hitting any dead ends (NULL pointers), as is the case here, then we only have to check one more thing: was that key actually stored in the trie? In our diagram, this is represented by a check mark signifying that bool is_word is true. m

// pointers to other nodes struct node* children[26]; }; { /* data */ // pointers to other nodes struct node* children[26]; }; b z a o t o Note how the key “zoo” is not in the dictionary, even though we could reach the end of this key without hitting a dead end? m

// pointers to other nodes struct node* children[26]; }; { /* data */ // pointers to other nodes struct node* children[26]; }; b z a o t o Similarly, if we tried to look up the key “bath”, the second-to-last node’s array index corresponding to the letter H would have held a NULL pointer, so “bath” isn’t in the dictionary. m

// pointers to other nodes struct node* children[26]; }; { /* data */ // pointers to other nodes struct node* children[26]; }; b z a o t o So how do we insert something into a trie? Let’s insert “zoo”. In some ways, the process to insert something is similar to looking something up. We’ll start with the root node again, following pointers corresponding to the letters of our key. m

// pointers to other nodes struct node* children[26]; }; { /* data */ // pointers to other nodes struct node* children[26]; }; b z a o t o Luckily, we’re able to follow pointers all the way until we reach the end of the key. m

// pointers to other nodes struct node* children[26]; }; { /* data */ // pointers to other nodes struct node* children[26]; }; b z a o t o Luckily, we’re able to follow pointers all the way until we reach the end of the key. m

// pointers to other nodes struct node* children[26]; }; { /* data */ // pointers to other nodes struct node* children[26]; }; b z a o t o Since “zoo” is a prefix of the word “zoom”, which is stored in our dictionary, we don’t need to allocate any new nodes. We can just set is_word to true at the appropriate node. m

// pointers to other nodes struct node* children[26]; }; { /* data */ // pointers to other nodes struct node* children[26]; }; b z a o t o If we wanted to insert the key “bath” into the trie, we’d start at the root node and follow the pointers again. In this situation, we hit a dead end before we’re able to get to the end of the key. m

// pointers to other nodes struct node* children[26]; }; { /* data */ // pointers to other nodes struct node* children[26]; }; b z a o t o So, we’ll need to allocate one new node for each remaining letter of our key. In this case, we’ll just need to allocate one new node. m

// pointers to other nodes struct node* children[26]; }; { /* data */ // pointers to other nodes struct node* children[26]; }; b z a o t o Then we’ll need to make the h index of the previous node reference this new node. h m

// pointers to other nodes struct node* children[26]; }; { /* data */ // pointers to other nodes struct node* children[26]; }; b z a o t o And lastly, we set the new node’s is_word bool to true. So, if we assume that key length is bounded by a fixed constant, both insertion and lookup are constant time operations for a trie. Notice that the number of items stored in the trie doesn’t affect the lookup or insertion time; it’s only impacted by the length of the key. By contrast, adding entries to a hash table tends to make future lookups slower. While this may sound appealing at first, we should keep in mind that a favorable asymptotic complexity doesn’t mean that in practice, the data structure is necessarily beyond reproach. We must also consider that to store a word in a trie, we need, in the worst case, a number of nodes proportional to the length of the word itself. Tries tend to use a lot of space. That’s in contrast to a data structure like a hash table, where we only need one new node to store some key-value pair. h m