Data Structures & Algorithms Radix Search Richard Newman based on slides by S. Sahni and book by R. Sedgewick
Radix-based Keys Key has multiple parts Each part is an element of some set Character Numeral Key parts can be accessed (e.g., string s[i]) Size of set is radix
Advantages of Radix-based Search Good worst-case performance Simpler than balanced trees, etc. Fast access to data Easy way to handle variable-length keys Save space (part of key in structure)
Disadvantages of Radix-based Search May be space-inefficient Performance depends on access to bytes of keys Must have distinct keys, or other way to handle duplicate keys
Digital Search Trees Similar to binary search trees Difference is that we use bits of the key to determine subtree to search Path in tree = prefix of key
Digital Search Trees Insert A-S-E-R-C-H-I-N-G Key Repr A00001 S10011 E00101 R10010 C00011 H01000 I01001 N01110 G00111 A S 1 E R 10 C 1 0 H 10 I 10 N 10 G 10 Note that binary tree is not sorted in BST sense
Digital Search Trees Prop 15.1: A search or insertion into a DST takes about lg N comparisons on average, and about 2 lg N comparisons in the worst case, in a tree built from N keys. The number of comparisons is never more than the number of bits in the search key.
Tries Use bits of key to guide search like DST But keep keys in order like BST Allow recursive sort, etc. Pronounced “try-ee” or “try” Keys kept at leaves of a binary tree
Tries Defn. 15.1: A trie is a binary tree that has keys associated with each leaf, defined as follows: a trie for an empty set is a null link a trie for a single key is a leaf w/key a trie for > 1 key is an internal node with left link referring to trie for keys that start with 0, right for keys 1xxx
Tries Insert A-S-E-R-C-H-I-N-G Key Repr A00001 S10011 E00101 R10010 C00011 H01000 I01001 N01110 G00111 A S 1 A 0 Construct tree to point where prefixes match
Tries Insert A-S-E-R-C-H-I-N-G Key Repr A00001 S10011 E00101 R10010 C00011 H01000 I01001 N01110 G00111 A 10 AE RS S 10 A Construct tree to point where prefixes match
Tries Insert A-S-E-R-C-H-I-N-G Key Repr A00001 S10011 E00101 R10010 C00011 H01000 I01001 N01110 G A RS A 10 C E H
Tries Insert A-S-E-R-C-H-I-N- G Key Repr A00001 S10011 E00101 R10010 C00011 H01000 I01001 N01110 G RS A 10 C E H HI
Tries Prop. 15.2: The structure of a trie is independent of key insertion order; there is one unique trie for any given set of distinct keys. Prop. 15.3: Insertion or search for a random key in a trie built from N random keys takes about lg N bit comparisons on average, in the worst case, bounded by bits in key
Tries Annoying feature of tries: One-way branching when keys have common prefix Prop. 15.4: A trie built from N random w-bit keys has about N/lg 2 nodes on the average (about 1.44 N)
Patricia Tries Annoying feature of tries: One-way branching when keys have common prefix Two different types of nodes in trie Patricia tries: fix both of these Practical Algorithm To Retrieve Information Coded In Alphanumeric
Patricia Tries Avoid one-way branching: Keep at each node the index of the next bit to test Skip over common prefix! Avoid two types of nodes: Store data in internal nodes Replace external links with back links
Patricia Tries S R 4 H 0 1 E 2 3 C 4 A Key Repr A00001 S10011 E00101 R10010 C00011 H01000 I01001 N01110 G00111
Patricia Tries S R 4 H 0 1 E 2 3 C 4 A Key Repr A00001 S10011 E00101 R10010 C00011 H01000 I01001 N01110 G00111
Patricia Tries Prop 15.5: Insertion or search in a patricia trie built from N random bitstrings takes about lg N bit comparisons on average, and about 2 lg N in the worst case, but never more than the length of the key.
Map Radix search Digital Search Trees Tries Patricia Tries Multiway tries and TSTs Text string algorithms
Multiway Tries Like radix sort, can get benefit from comparing more than one bit at a time Compare r bits, speed up search by a factor of r What could possibly be bad? Number of links is now R=2 r Can waste a lot of space!
Multiway Tries Structure is (almost) the same as binary tries Except there are R branches Search: start at root, leftmost digit Follow i th link if next R-ary digit is i If null link, then miss If reach leaf, it contains only key with prefix matching path to it - compare
Existence Tries Only keys, no records Insert/search Defn. 15.2: The existence trie for a set of keys is: Empty set: null link Non-empty set: internal node with links for each possible digit to tries built with the leading digit omitted
Existence Tries Convenient to return null on miss, dummy record on hit Convenient to have no duplicate keys and no key a prefix of another key Keys of fixed length, or Use termination character with value NULLdigit, only used as sentinel
Existence Tries No need to store any data All keys captured in trie structure If reach NULLdigit at the same time we run out of key digits, search hit Otherwise, search miss Insert: search until find null link, then add nodes for each of the remaining digits in the key
Existence Tries t the time is now for n h e i i m e s o w f o r
Multi-way Tries R-ary branching Keys stored at leaves Path to leaf defines prefix of key stored at leaf Only build tree downward until prefixes become distinct
Multi-way Tries Defn. 15.3: The multiway trie for a set of keys associated with leaves is: Set empty: null link Singleton set: leaf with key Larger set: internal node with links for each possible digit to tries built with the leading digit omitted
Multi-way Tries Def
Summary Radix search Digital Search Trees Tries Patricia Tries Multiway tries and TSTs Text string algorithms