Patricia Tries CMSC 420
Acronyms in Computer Science
PATRICIA stands for Practical Algorithm to Retrieve Information Coded In Alphanumeric.
Pop quiz: What was Donald R. Morrison, inventor of PATRICIA tries back in 1968, smoking when he came up with this horrendous name? Weed / Crack / LSD / Meth
Speaking of acronyms, what does DEBIAN stand for? The late Ian Murdock, founder of the project, and his ex-wife, Deborah!
Problems with tries
“Non-key” nodes with just one (non-null) child are about as useful as a Greek State Employee.
[Image source: Google Image Search for «δημόσιος υπάλληλος», which is Greek for “state employee”.]
Look at this example of a trie with two long keys, and two not so long ones, but which do share a prefix:
[Figure: a trie with one character per link, storing the keys “arc”, “arid”, “spoon” and “knife”.]
The problem stems from the fact that every link is associated with a character…
Problems with tries
But if we were somehow able to associate entire paths with a string, compressing links…
[Figure: the same trie with compressed links; link labels recovered from the figure: ar, c, id, spoon, knife, wit.]
Note: All non-key-containing nodes with just one child node have now vanished, and the paths traversing them have been compressed to a single link!
So how can we associate links with entire strings?
Patricia node structure
We will add an integer index to every Patricia node. This index will give us the following information: “In terms of all the keys that have already been inserted in the trie, at which character of the input key should I be splitting into different paths?”
Some trie-less examples:
If our keys are just the words “tree” and “sun”, then a single root node with a disambiguating index of 0 will split into two nodes.
If we add the strings “treemap” and “treeset”, we need another node that disambiguates at index 4, i.e. the fifth character (0-indexing employed).
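The two examples above boil down to finding the first position at which two keys differ. A minimal sketch of that computation follows; the class and method names (SplitIndexSketch, firstDifference) are hypothetical and only serve this illustration.

```java
class SplitIndexSketch {
    // Returns the first index at which a and b differ, i.e. the length of
    // their longest common prefix. If one key is a prefix of the other,
    // the result is the length of the shorter key.
    static int firstDifference(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        int i = 0;
        while (i < limit && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(firstDifference("tree", "sun"));        // prints 0
        System.out.println(firstDifference("treemap", "treeset")); // prints 4
    }
}
```

The index 0 for “tree”/“sun” and the index 4 for “treemap”/“treeset” are exactly the disambiguating indices from the examples.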
In a trie… (alphabet: lowercase English characters)
[Figure: two panels, “Conceptual Image” and “Representation in memory”. The root branches on 's' to the node that contains “sun” and on 't' to the node that contains “tree”; the “tree” node branches on 'm' to the node that contains “treemap” and on 's' to the node that contains “treeset”. Index labels shown in the final build: 3, 4, 6, 6.]
Make sure you understand why the indices are such!
Quiz: Is the root’s disambiguating index ever going to be greater than 0? Yes (example?) / No (why?)
Root node of a Patricia trie always dummy! See examples on Piazza.
Pop quiz
Which keys are stored inside this Patricia Trie?
[Figure: a Patricia trie whose nodes carry only end-of-string flags and splitting indices; index labels 3, 4, 8 and 7.]
NO CLUE! The 't' branch could be storing “tree” or “test”. The 's' branch might store “sun” or “say”. Any string of the form s?? is fair game!
End-of-string flag insufficient; we need actual references to the stored keys themselves!
[Figure: the same trie with key references attached to the nodes: “test”, “sun”, “treeset” and “testable”.]
Make sure you are convinced that these strings are all valid choices given our branchings!
New check to see if a node has a key: keyRef!=null
Searching
Searching a key in a Patricia Trie should make use of the disambiguating index of every node. This will speed up search considerably!
The node’s “disambiguating” (or “splitting”) index is named splitInd.
Suppose we are at a node n. We have some cases:
1. n.splitInd < key.length(). Subcases:
   a. n.next[key.charAt(n.splitInd)]==null. Key not in trie!
   b. n.next[key.charAt(n.splitInd)]!=null. Recurse into that node and keep searching!
2. n.splitInd == key.length(). Subcases:
   a. n.keyRef==null. This means that the key is not in the trie!
   b. n.keyRef!=null && n.keyRef.equals(key). This means that the key was found!
   c. n.keyRef!=null && !n.keyRef.equals(key). This means that the key was not found!
3. n.splitInd > key.length(). This means that the key is not in the trie: every key stored at or below this node is longer than it (at best, our key is a prefix of another stored key)!
Let’s code it up!
    class ASCIIPatriciaTrie {
        private class Node {
            Node[] next = new Node[128];   // one link per ASCII character
            String keyRef;                 // reference to the stored key, or null
            int splitInd;                  // index of the character this node branches on
        }
        private Node root;

        public String search(String key) { // returns the key itself or null
            Node keyContNode = search(root, key);
            return (keyContNode != null) ? keyContNode.keyRef : null;
        }

        private Node search(Node n, String key) {
            if (n == null) {               // empty trie
                return null;
            }
            if (n.splitInd < key.length()) {
                // Jump straight to the character at splitInd.
                Node appropriateChild = n.next[key.charAt(n.splitInd)];
                return (appropriateChild != null) ? search(appropriateChild, key) : null;
            } else if (n.splitInd == key.length()) {
                return (n.keyRef != null && n.keyRef.equals(key)) ? n : null;
            } else {
                return null;               // every key below this node is longer
            }
        }
    }
This could work! Ternary operator used just to conserve space; Jason does not necessarily recommend its widespread use.
Main implementation difference with classic tries, where we just increment our counter by 1: the child lookup indexes the key at splitInd. Linear Probing, anyone?
IS THIS IMPLEMENTATION TAIL RECURSIVE? Yes / No
Yes: the returns are the last calls before end of execution, in spite of the if/else if!
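Because the recursive call is in tail position, the same search can be written as a loop. The sketch below does that and runs it on a hand-built trie for the keys “sun”, “tree”, “treemap” and “treeset”. The class name, the Node constructor and sampleTrie() are hypothetical helpers for this illustration, and the indices follow the splitInd == key.length() convention that the search cases rely on for key-holding nodes.

```java
class AsciiPatriciaSearchDemo {
    static final class Node {
        final Node[] next = new Node[128]; // one link per ASCII character
        final int splitInd;                // character index this node branches on
        String keyRef;                     // stored key, or null

        Node(int splitInd, String keyRef) {
            this.splitInd = splitInd;
            this.keyRef = keyRef;
        }
    }

    // Iterative form of the tail-recursive search: keep descending while
    // the key still has a character at the node's splitting index.
    static String search(Node root, String key) {
        Node n = root;
        while (n != null && n.splitInd < key.length()) {
            n = n.next[key.charAt(n.splitInd)];
        }
        // Here n is null, or n.splitInd >= key.length(); only an exact
        // match on the stored key counts as a hit.
        return (n != null && key.equals(n.keyRef)) ? n.keyRef : null;
    }

    // Hand-built trie (assumed layout): a dummy root that branches on
    // character 0, and a "tree" node that branches on character 4.
    static Node sampleTrie() {
        Node root = new Node(0, null);
        Node tree = new Node(4, "tree");
        tree.next['m'] = new Node(7, "treemap");
        tree.next['s'] = new Node(7, "treeset");
        root.next['s'] = new Node(3, "sun");
        root.next['t'] = tree;
        return root;
    }

    public static void main(String[] args) {
        Node root = sampleTrie();
        System.out.println(search(root, "treeset")); // found
        System.out.println(search(root, "test"));    // reaches the "tree" node, keys differ
        System.out.println(search(root, "tre"));     // shorter than the node's splitInd
    }
}
```

Note how the lookup for “test” lands on the node holding “tree”: only the comparison against keyRef rejects it, which is why a plain end-of-string flag would not be enough.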
Let’s change our notation a bit
Since your project involves Binary Patricia Tries, which use the minimal alphabet {0,1}, we can actually simplify our node structure as follows:
    private class BPTNode {
        BPTNode left, right; // Instead of an array of size 2
        int splitInd;
        String keyRef;
    }
We will use this notation to simplify our subsequent examples.
[Figure: example binary Patricia trie; recovered labels: a keyless node (x) with index 2, and nodes storing 01010 (index 5) and 1001 (index 4).]
Note: neither numerical nor lexicographical order is necessarily preserved in an inorder traversal of a BPT, or a PT, or even a general trie, unless the strings all have the same length.
Insertion
We have some cases to consider. Let’s begin with the easiest ones.
1. Currently examined node is null.
   Allocate a new node with the key and store the key’s length as its splitInd.
   Return this node to the parent (the parent will have made the recursive insertion call on either its left or its right child).
Insertion
2. Currently examined node is not null.
   a. Length of key equal to splitInd.
      i. No key stored in node. Then simply set the key stored in node to your input key.
      ii. (Different) key stored in node. Then, I have to split the node into 3 nodes:
          A parent node, without a key and with splitInd equal to the length of the maximal common prefix between the node’s key and the insertion key.
          Two child nodes, one with the key of the old node and one with the key to be inserted. Which child will be left and which one will be right depends on the value of key.charAt(splitInd), evaluated at the new parent’s splitInd!
Insertion
2. Currently examined node is not null.
   b. Length of key smaller than splitInd. This means that our key is the prefix of at least one existing key in the trie!
      i. Node has a (different) key stored. Split the node into two nodes:
         A parent, which will store the new key, whose length is now its splitInd.
         A child (left or right) with the previously existing key and its own splitInd. The other child of the parent will be null.
      ii. Node does not have a key stored (most expensive case).
         Recurse into children to find the maximal common prefix with the key to be inserted. This is necessary because we no longer have the string itself to easily compute the common prefix, as in case 2.a.ii!
         Make a new node without a key, whose splitInd will be the length of this maximal common prefix.
         Connect the new node with the node where the problem came up in the first place through either the left or the right link, depending on the value of key.charAt(node.splitInd). The other link will be the node with our key!
Some people are interested in optimizing this by storing a bit in their node class that will give information about which link to use for connecting… While this wastes a bit of space, it’s A-ok with us!
Insertion
2. Currently examined node is not null.
   c. Length of key larger than splitInd. Recurse appropriately, based on the value of key.charAt(node.splitInd).
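The insertion cases can be sketched in code as follows. This is a minimal sketch under stated assumptions, not the project’s reference solution: the class and helper names (BinaryPatriciaSketch, anyKey, attach, commonPrefixLength) are hypothetical, keys are assumed to be strings over '0' and '1', there is no dummy root, and no keys are ever deleted. To decide where to split, it compares the new key against a representative key of the current subtree at every node (an assumption made here for simplicity), so a key that diverges from the path before splitInd is also handled.

```java
class BinaryPatriciaSketch {
    static final class Node {
        Node left, right;
        int splitInd;
        String keyRef;
        Node(int splitInd, String keyRef) { this.splitInd = splitInd; this.keyRef = keyRef; }
    }

    Node root; // null while the trie is empty (no dummy root in this sketch)

    void insert(String key) { root = insert(root, key); }

    boolean contains(String key) {
        Node n = root;
        while (n != null && n.splitInd < key.length()) {
            n = (key.charAt(n.splitInd) == '0') ? n.left : n.right;
        }
        return n != null && key.equals(n.keyRef);
    }

    private static Node insert(Node n, String key) {
        if (n == null) {
            return new Node(key.length(), key);       // case 1: fell off the trie
        }
        String rep = anyKey(n);                       // some key stored at or below n
        int lcp = commonPrefixLength(rep, key);
        if (lcp < n.splitInd) {
            // The key leaves this subtree's shared prefix before splitInd:
            // put a new parent above n. It stores the key when the key is a
            // proper prefix, and stays keyless when the keys diverge at lcp.
            Node parent = new Node(lcp, (lcp == key.length()) ? key : null);
            attach(parent, n, rep.charAt(lcp));
            if (lcp < key.length()) {
                attach(parent, new Node(key.length(), key), key.charAt(lcp));
            }
            return parent;
        }
        if (key.length() == n.splitInd) {
            n.keyRef = key;                           // empty slot, or the same key again
            return n;
        }
        // Key is longer than splitInd and agrees with the shared prefix.
        if (key.charAt(n.splitInd) == '0') n.left = insert(n.left, key);
        else n.right = insert(n.right, key);
        return n;
    }

    private static void attach(Node parent, Node child, char bit) {
        if (bit == '0') parent.left = child; else parent.right = child;
    }

    // Without deletions, every keyless node has two children, so this
    // descent always ends at a node that stores a key.
    private static String anyKey(Node n) {
        while (n.keyRef == null) n = (n.left != null) ? n.left : n.right;
        return n.keyRef;
    }

    private static int commonPrefixLength(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        int i = 0;
        while (i < limit && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }
}
```

Looking up a representative key at every level keeps the sketch short at the cost of extra work per insertion; a tighter version would search first and compare only once.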
Deletion Recall: In classic tries, we did not like blue nodes that only had one child. That’s why we came up with Patricia tries in the first place; we want to collapse those to improve search efficiency. Only complex case in deletion: If you delete a key from a node that only has a single child, that node is useless and you can erase it. This looks like it might contradict the “soft deletion” strategy that Jason suggested in class, but it really doesn’t; there is still at least one node with a key in the path towards the leaves! In fact, this strategy implements dynamic collapsing of freshly created blue nodes, which, as we’ve stated, make our search inefficient!
Deletion Once again, we have some cases. Current node is null. Then, this means that the key is not in the trie. We searched as much as we could for it, but we fell off the trie.
Deletion
Current node is not null.
key.length() < splitInd
Then this means that the key is not in the trie, since all the possible keys that follow this path are guaranteed to be longer.
Deletion
Current node is not null.
key.length() == splitInd
node.keyRef.compareTo(key) == 0
Key found! Make keyRef null.
If you have only one child, you are like one of those pesky classic trie blue nodes with only one child that we agreed we want to collapse with the Patricia trick; return your only child!
Otherwise, if you have 0 or two children, there is nothing you need to do after setting keyRef to null.
node.keyRef.compareTo(key) != 0 (or the node stores no key at all)
Key not in the trie. We know this because traversing the trie further down can only lead to lengthier keys, and to get where we were, we followed pointers that examined chunks of our existing key to guide us. All other paths are invalid for our key; we are sure it’s not in the trie.
Deletion Current node is not null. key.length() > splitInd Have to recurse appropriately depending on the value of key.charAt(node.splitInd)!
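The deletion cases can be sketched as below. This is a hedged sketch, not the reference solution: the names BptDeleteSketch and collapse are hypothetical, keys are assumed to be strings over '0' and '1', and the trie is built by hand. One choice goes beyond the cases above and is an assumption of this sketch: the collapse step runs at every node on the way back up, so a keyless node that loses a child (or is left with no children at all) is removed as well.

```java
class BptDeleteSketch {
    static final class Node {
        Node left, right;
        int splitInd;
        String keyRef;
        Node(int splitInd, String keyRef) { this.splitInd = splitInd; this.keyRef = keyRef; }
    }

    // Deletes key from the subtree rooted at n and returns the new subtree root.
    static Node delete(Node n, String key) {
        if (n == null) {
            return null;                  // fell off the trie: key not present
        }
        if (key.length() < n.splitInd) {
            return n;                     // every key down here is longer
        }
        if (key.length() == n.splitInd) {
            if (key.equals(n.keyRef)) {
                n.keyRef = null;          // key found: drop the reference
            }
        } else if (key.charAt(n.splitInd) == '0') {
            n.left = delete(n.left, key);
        } else {
            n.right = delete(n.right, key);
        }
        return collapse(n);
    }

    // A keyless node with fewer than two children is useless: replace it
    // by its only child, or by null when it has no children.
    static Node collapse(Node n) {
        if (n.keyRef != null) return n;
        if (n.left == null) return n.right;
        if (n.right == null) return n.left;
        return n;
    }

    public static void main(String[] args) {
        // Hand-built trie storing "0", "00", "0000", "0011" and "0100".
        Node root = new Node(1, "0");
        root.left = new Node(2, "00");
        root.left.left = new Node(4, "0000");
        root.left.right = new Node(4, "0011");
        root.right = new Node(4, "0100");

        root = delete(root, "00");        // keyless node keeps both children
        root = delete(root, "0011");      // keyless node now collapses into "0000"
        System.out.println(root.left.keyRef);
    }
}
```

After the two deletions in main, the left child of the root is the node that stores 0000, because the keyless node above it was left with a single child and was collapsed.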