String Abstract Data Type
Timothy J. Ham
What is it?
Quite literally, the string abstract data type is "an array of [characters]. The array consists of the string characters followed by the null character which signals the end of the string." (Textbook, p104, 2.6 para 3)
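The definition above can be illustrated with a minimal sketch. Python strings are not null-terminated, so this example models the textbook layout with a list of characters; the names `NUL`, `make_string`, and `string_length` are hypothetical and chosen only for illustration.

```python
NUL = "\0"  # the null character that marks the end of the string


def make_string(text):
    """Store text as an array of characters followed by the null character."""
    return list(text) + [NUL]


def string_length(chars):
    """Count the characters that come before the null terminator."""
    length = 0
    while chars[length] != NUL:
        length += 1
    return length


s = make_string("seek")
print(s)                 # the four characters, then the terminator
print(string_length(s))  # the terminator itself is not counted
```

Note that finding the length requires walking the array until the terminator is reached, which is the usual cost of this representation.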
Where is it found?
Strings are used in a number of places:
– Page / window titles
– Web page / document titles
– Web page / document body text
What can I do with it?
One of the most common operations performed on strings is the search. There are a number of places in which this operation is employed:
– Operating systems -> File names
– Word processors -> Text documents
– Web pages -> Body text (anyone ever use the "Find" feature in Firefox that searches as you type?)
– Databases -> Miscellaneous data
How does this occur?
Searches are performed using one of several algorithms. This presentation covers three of the many string search algorithms:
– Exhaustive - easy to code, but very inefficient
– Aho-Corasick - used in the original UNIX command fgrep, whose function is now provided by grep -F
– Knuth-Morris-Pratt - much more efficient than the simple algorithm
How will we test this?
The environment for testing these algorithms will feature a string to search:
– “I hope I can find that for which I seek.”
The pattern for which we will search is:
– “that for which I seek”
In addition to providing the theoretical run time, we will measure the actual time (in milliseconds) required to run the algorithm.
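One way the measurement described above might be set up is sketched here. The helper `time_search` and its `repeats` parameter are assumptions made for this example, and Python's built-in `str.find` stands in for whichever algorithm is being measured. A single search over such a short string takes far less than a millisecond, so the sketch averages over many runs.

```python
import time

TEXT = "I hope I can find that for which I seek."
PATTERN = "that for which I seek"


def time_search(search, text, pattern, repeats=10000):
    """Return (result, average milliseconds per call) for search(text, pattern)."""
    start = time.perf_counter()
    for _ in range(repeats):
        result = search(text, pattern)
    elapsed = time.perf_counter() - start
    return result, elapsed * 1000.0 / repeats


# str.find is only a placeholder for the algorithm under test.
index, avg_ms = time_search(lambda t, p: t.find(p), TEXT, PATTERN)
print(index, avg_ms)
```

The measured time depends on the machine and the interpreter, so it is best read as a relative comparison between algorithms rather than an absolute figure.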
Simple / Exhaustive
Easy to code; largely inefficient
Theoretical run time: O(nm) in the worst case, for a text of length n and a pattern of length m
Actual run time
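A minimal sketch of the exhaustive approach follows: try every starting position in the text and compare the pattern character by character. The function name `exhaustive_search` is hypothetical, and returning -1 when nothing is found is a convention assumed for this example.

```python
def exhaustive_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):
        offset = 0
        # Compare the pattern against the text beginning at this position.
        while offset < m and text[start + offset] == pattern[offset]:
            offset += 1
        if offset == m:  # every pattern character matched
            return start
    return -1


text = "I hope I can find that for which I seek."
print(exhaustive_search(text, "that for which I seek"))
```

After a mismatch the search restarts one position later and may re-read characters it has already compared, which is where the inefficiency comes from.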
Aho-Corasick
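Used in the original UNIX command fgrep, whose function is now provided by grep -F
Theoretical run time: O(n + m + z), for a text of length n, patterns of total length m, and z reported matches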
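Aho-Corasick searches for a whole set of patterns in a single pass by building a trie with failure links. The sketch below is one compact way to write it, not the fgrep implementation; the names `build_automaton` and `search_all` are hypothetical, and all patterns are assumed to be non-empty.

```python
from collections import deque


def build_automaton(patterns):
    """Build the trie (goto), failure links (fail), and per-state outputs."""
    goto, fail, output = [{}], [0], [[]]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(pattern)
    # Breadth-first pass: states one step from the root keep fail = 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            fallback = fail[state]
            while fallback and ch not in goto[fallback]:
                fallback = fail[fallback]
            fail[nxt] = goto[fallback].get(ch, 0)
            # Inherit patterns that end at the failure state.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def search_all(text, patterns):
    """Return (start_index, pattern) for every occurrence of any pattern."""
    goto, fail, output = build_automaton(patterns)
    matches, state = [], 0
    for index, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in output[state]:
            matches.append((index - len(pattern) + 1, pattern))
    return matches


text = "I hope I can find that for which I seek."
print(search_all(text, ["that for which I seek", "hope", "find"]))
```

With a single pattern the automaton behaves much like Knuth-Morris-Pratt; the benefit appears when many patterns are searched at once.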
Knuth-Morris-Pratt
Theoretical run time: O(n + m), for a text of length n and a pattern of length m
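Knuth-Morris-Pratt avoids re-reading text characters by precomputing, for each prefix of the pattern, the length of the longest proper prefix that is also a suffix. The following is a minimal sketch under that description; the names `failure_table` and `kmp_search` are hypothetical, and returning -1 for no match is an assumed convention.

```python
def failure_table(pattern):
    """table[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    table = failure_table(pattern)
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = table[k - 1]  # fall back without moving backwards in text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


text = "I hope I can find that for which I seek."
print(kmp_search(text, "that for which I seek"))
```

Each text character is examined once going forward, and the fallback steps are bounded by the number of matches made so far, which gives the linear run time.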