Data Structures and Analysis (COMP 410) David Stotts Computer Science Department UNC Chapel Hill
Design Problem
Real Problem
Type-ahead, like on Google search or phone typing: you type a few chars and the program fills in a list of possible choices for you, based on the prefix you have typed.
Keep typing more chars, and the choices narrow and change.
Design a data structure that will let you do this.
Describe the time complexity of using it: searching it as typing is done, generating alternatives, etc.
Take some time
Discuss an approach with your neighbor.
In 5-10 mins we will discuss ideas as a class.
Let’s not use a node to store a whole word.
Use a child link to represent a char typed.
The path is then the word.
[Figure: "Basic idea", a partial tree under <root> with links labeled by characters; the paths shown spell a, an, as, tar, tea, to, new]
Basic idea…
This tree encodes (stores) these words: tar, tan, tea, to, ton, toe, a, an, ant, as, net, nest, new, no
[Figure: the tree under <root>, with first-level links t, a, n; each stored word is spelled by the links on the path from <root> to its node]
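The "path is the word" idea can be sketched in a few lines. This is only an illustration, not the course's implementation: nested dicts stand in for the nodes, the `"$"` end-of-word marker is an assumption, and `build_trie` and `completions` are hypothetical names. It stores the 14 words from the slide and answers the type-ahead question for a prefix.

```python
# Sketch only: nested dicts stand in for the slide's nodes.
END = "$"  # assumed marker key meaning "the path so far is a stored word"

def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})  # follow or create the link for ch
        node[END] = True
    return root

def completions(root, prefix):
    """Return every stored word that starts with prefix, in sorted order."""
    node = root
    for ch in prefix:            # walk down the path spelled by the prefix
        if ch not in node:
            return []
        node = node[ch]
    found = []

    def walk(n, path):
        if END in n:
            found.append(path)
        for ch in sorted(k for k in n if k != END):
            walk(n[ch], path + ch)

    walk(node, prefix)
    return found

WORDS = ["tar", "tan", "tea", "to", "ton", "toe", "a", "an", "ant",
         "as", "net", "nest", "new", "no"]
trie = build_trie(WORDS)
print(completions(trie, "to"))  # ['to', 'toe', 'ton']
print(completions(trie, "ne"))  # ['nest', 'net', 'new']
```

Typing one more char just moves one link further down, so the list of choices narrows as the prefix grows.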
This has a name: Trie
Pronounced “try” or “tree”, both ways. Or “trie tree”: tree-tree, try-tree.
Comes from “reTRIEval”.
Used for prefix-based retrieval of strings formed over an alphabet.
Representation
How many children at each node? As many as there are chars you can type. Let’s say 26 for this example.
node {
  string val = null;
  node[26] child = new [null,null,…,null];
  boolean isWord = false;
}
Representation
[Figure: one node drawn as a record: val empty, isWord false, and a child array with slots 0 through 25]
Representation
[Figure: trie holding the words “a” and “be”. The root node (val empty, isWord false) has child links labeled a and b. The a link leads to a node with val “a”, isWord true. The b link leads to a node with val “b”, isWord false, whose e link leads to a node with val “be”, isWord true.]
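The node record can be written out as a small runnable sketch. Assumptions here: Python rather than the slide's pseudocode, a hypothetical `slot` helper that maps 'a' to index 0 through 'z' to index 25, and a hypothetical `insert` function. It builds the two-word trie holding "a" and "be".

```python
class Node:
    """Mirrors the slide's node record: val, 26 child slots, isWord flag."""

    def __init__(self, val=None):
        self.val = val             # string spelled by the path to this node
        self.child = [None] * 26   # one slot per lowercase letter
        self.is_word = False       # True only if val is a stored word


def slot(ch):
    # Assumed mapping: 'a' -> 0, 'b' -> 1, ..., 'z' -> 25
    return ord(ch) - ord("a")


def insert(root, word):
    node = root
    for i, ch in enumerate(word):
        k = slot(ch)
        if node.child[k] is None:
            node.child[k] = Node(word[:i + 1])  # create the missing node
        node = node.child[k]
    node.is_word = True


root = Node()
for w in ("a", "be"):
    insert(root, w)

b = root.child[slot("b")]
e = b.child[slot("e")]
print(b.val, b.is_word)  # b False
print(e.val, e.is_word)  # be True
```

Note that "b" gets a node even though it is not a stored word; the isWord flag is what separates prefixes from words.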
Analysis
Big-Oh time complexity is always expressed in terms of some problem size.
Here the problem size is not the number of words encoded in the tree, as we say for a BST.
Rather we choose M, the length of a word being inserted or searched for.
Analysis
The worst-case time needed to find a word of length M is O(M).
This is true whether the tree contains 10 words or 10 million words.
The length of the longest path in the tree is the length of the longest word stored in the tree.
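A quick way to see the O(M) claim is to count the links followed during a find. The sketch below makes the same assumptions as any toy version (nested dicts as nodes, `"$"` as an assumed end-of-word marker, hypothetical `insert` and `find` names) and compares a 2-word trie with an 8,000-word trie.

```python
from itertools import product


def insert(root, word):
    node = root
    for ch in word:
        node = node.setdefault(ch, {})
    node["$"] = True  # assumed end-of-word marker


def find(root, word):
    """Return (found, links_followed)."""
    node, steps = root, 0
    for ch in word:
        if ch not in node:
            return False, steps
        node = node[ch]
        steps += 1
    return "$" in node, steps


small = {}
for w in ["tea", "ton"]:
    insert(small, w)

big = {}
for letters in product("abcdefghijklmnopqrst", repeat=3):  # 20^3 = 8,000 words
    insert(big, "".join(letters))

print(find(small, "tea"))  # (True, 3)
print(find(big, "tea"))    # (True, 3)
print(find(small, "te"))   # (False, 2)
```

The number of links followed depends only on the length of the word searched for, not on how many words are stored.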
Analysis
If a word of length M can be made from N different characters (like 26 in the alphabet), then the number of possible nodes in the data structure is on the order of N^M: there are N choices for each of the M positions.
A trie to store words 20 characters long over an alphabet of 52 chars (upper and lower case) has on the order of 52^20 possible nodes.
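Counting level by level makes the size concrete: a trie in which every string of length 0 through M is present has N^i nodes at depth i, so the deepest level alone contributes N^M. The function name `full_trie_nodes` below is hypothetical, and the count assumes that completely filled trie.

```python
def full_trie_nodes(n_chars, max_len):
    """Nodes in a trie containing every string of length 0..max_len."""
    return sum(n_chars ** depth for depth in range(max_len + 1))


print(full_trie_nodes(2, 3))                 # 15  (1 + 2 + 4 + 8)
print(full_trie_nodes(26, 1))                # 27  (root plus 26 children)
print(full_trie_nodes(52, 20) > 52 ** 20)    # True
```

For 52 chars and 20-character words the total is on the order of 10^34 nodes, dominated by the last level.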
Analysis
Note that if we store 26-character words and limit ourselves to lower case, we get 26^26 possible nodes…
This is even worse than 26!
26 * 26 * 26 * … * 26 is worse than 26 * 25 * 24 * … * 2 * 1
Analysis
How bad is N! ? Let’s compare. Let N = 20.
2^N is 2^20, which is about a million.
N! is 20!, which is 2.432902e+18, or about 2,432,902,000,000,000,000.
That is 2,432,902,000,000 * a million: 2.4 trillion millions.
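The comparison is easy to check directly, since Python integers do not overflow. This is just the slide's arithmetic, using the standard `math.factorial`.

```python
import math

n = 20
print(2 ** n)                          # 1048576
print(math.factorial(n))               # 2432902008176640000
print(math.factorial(n) // 10 ** 6)    # 2432902008176
print(26 ** 26 > math.factorial(26))   # True
```

The third line is the "trillions of millions" figure: 20! divided by a million is still about 2.4 trillion.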
So what?
Take a trie made to hold 20-character words, made from 20 lower case characters (so N = 20 for both).
Worst case find operation is O(20), or O(N).
Worst case space, with every possible word stored, grows at least as fast as N!
So
-- it's very fast to use
-- impossible (very impractical) to build in time and space in that worst case