CSC 212 – Data Structures Lecture 34: Strings and Pattern Matching
Problem of the Day
  You drive a bus from Rotterdam to Delft. At the 1st stop, 33 people get in. At the 2nd stop, 7 more people get in and 11 passengers leave. At the 3rd stop, 5 people leave and 2 get in. After one hour, the bus arrives in Delft. What is the name of the driver?
  Read the question: You are the driver!
Strings
  Algorithmically, a string is just a sequence of concatenated data:
    “CSC212 STUDENTS IN DA HOUSE”
    “I can’t believe this is a String!”
    Java programs
    HTML documents
    Digitized images
    DNA sequences
Strings In Java
  Java Strings are immutable
  Java maintains a pool (conceptually a Map) from text to String objects
    Each time a String literal is used, the pool is checked
    If the text exists, Java uses the String object to which it is mapped
    Otherwise, it makes a new String & adds the text and object to the pool
    Strings built at run time (new String(...), run-time concatenation) are pooled only by calling intern()
  Happens “under the hood”
    Makes String work like a primitive type
    Also makes it cheap to do lots of text processing
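A small sketch of the pooling behavior described on this slide. The class name InternDemo and the helper sameObject are made up for illustration; only ==, equals, new String(...) and intern() come from Java itself.

```java
// Illustrative only: InternDemo and sameObject are hypothetical names.
class InternDemo {
    // True when both references point to the very same String object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String a = "lizard";              // literal: taken from the pool
        String b = "lizard";              // same literal: same pooled object
        String c = new String("lizard");  // explicitly creates a new object

        System.out.println(sameObject(a, b));          // true
        System.out.println(sameObject(a, c));          // false
        System.out.println(a.equals(c));               // true: same text
        System.out.println(sameObject(a, c.intern())); // true: intern() returns the pooled object
    }
}
```

This is why text should be compared with equals() rather than ==, even though == happens to work for literals.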
String Terminology
  A string is drawn from the elements of an alphabet
    ASCII or Unicode
    Bits
    Pixels
    DNA bases
  Substring P[i...j] contains the characters from P[i] through P[j]
    A substring starting at rank 0 is called a prefix
    A substring ending at the string’s last rank is called a suffix
Suffixes and Prefixes
  “I am the Lizard King!”

  Prefixes                      Suffixes
  “I”                           “!”
  “I ”                          “g!”
  “I a”                         “ng!”
  “I am”                        “ing!”
  …                             …
  “I am the Lizard Kin”         “am the Lizard King!”
  “I am the Lizard King”        “ am the Lizard King!”
  “I am the Lizard King!”       “I am the Lizard King!”
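The prefix and suffix lists can be generated with String.substring. This is a minimal sketch; the class PrefixSuffixDemo and its methods sub, prefixes and suffixes are hypothetical names. Note that the slide notation P[i...j] includes P[j], while Java's substring takes an exclusive end index.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: all names in this class are made up for the example.
class PrefixSuffixDemo {
    // P[i...j] in the slide notation (both ends inclusive).
    static String sub(String p, int i, int j) {
        return p.substring(i, j + 1); // Java's end index is exclusive
    }

    // All non-empty prefixes, shortest first: P[0...0], P[0...1], ...
    static List<String> prefixes(String p) {
        List<String> out = new ArrayList<>();
        for (int j = 0; j < p.length(); j++) {
            out.add(sub(p, 0, j));
        }
        return out;
    }

    // All non-empty suffixes, shortest first: each one ends at the last rank.
    static List<String> suffixes(String p) {
        List<String> out = new ArrayList<>();
        int last = p.length() - 1;
        for (int i = last; i >= 0; i--) {
            out.add(sub(p, i, last));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(prefixes("King!")); // [K, Ki, Kin, King, King!]
        System.out.println(suffixes("King!")); // [!, g!, ng!, ing!, King!]
    }
}
```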
Pattern Matching Problem
  Given strings T & P, find the first substring of T matching P
    T is the “text”
    P is the “pattern”
  Has many, many, many applications
    Search engines
    Database queries
    Biological research
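For reference, Java's standard library already answers this exact question: String.indexOf returns the rank where the first match starts, or -1 if there is none. The wrapper class FirstMatchDemo and method firstMatch below are hypothetical names used only to show the call.

```java
// Illustrative only: FirstMatchDemo and firstMatch are made-up names.
class FirstMatchDemo {
    // Rank in text where the first occurrence of pattern starts, or -1.
    static int firstMatch(String text, String pattern) {
        return text.indexOf(pattern);
    }

    public static void main(String[] args) {
        String t = "I am the Lizard King!";
        System.out.println(firstMatch(t, "Lizard")); // 9
        System.out.println(firstMatch(t, "Queen"));  // -1
    }
}
```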
Brute-Force Approach
  Common method of solving problems
    Easy to develop
    Often requires little coding
    Needs little brain power to figure out
  Uses the computer’s speed for analysis
    Examines every possible option
    Can be painfully slow and use lots of memory
    Generally good only with small problems
Brute-Force Pattern Matching
  Compare P with every substring of T, until we
    find a substring of T equal to P -or-
    reject all possible substrings of T
  If | P | = m and | T | = n, takes O(nm) time
  Worst case:
    T = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    P = aaag
  Common case for images & DNA data
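To see where O(nm) comes from, one can count character comparisons on the worst-case input above. In this sketch (class WorstCaseCount and methods countComparisons and repeatA are hypothetical names), every one of the n - m + 1 starting ranks matches m - 1 characters before failing on the last, so the count should come out to (n - m + 1) * m.

```java
// Illustrative only: all names in this class are made up for the example.
class WorstCaseCount {
    // Runs brute-force matching and returns how many character
    // comparisons were made before finding P or giving up.
    static long countComparisons(String t, String p) {
        long comparisons = 0;
        for (int i = 0; i <= t.length() - p.length(); i++) {
            int j = 0;
            while (j < p.length()) {
                comparisons++;
                if (t.charAt(i + j) != p.charAt(j)) {
                    break; // mismatch: move on to the next starting rank
                }
                j++;
            }
            if (j == p.length()) {
                return comparisons; // found a match, stop counting
            }
        }
        return comparisons;
    }

    // Builds a text of n copies of 'a'.
    static String repeatA(int n) {
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < n; k++) {
            sb.append('a');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String p = "aaag";
        int m = p.length();
        for (int n : new int[] {10, 100, 1000}) {
            long counted = countComparisons(repeatA(n), p);
            long predicted = (long) (n - m + 1) * m;
            System.out.println(n + ": " + counted + " comparisons, predicted " + predicted);
        }
    }
}
```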
Brute-Force Pattern Matching
  Algorithm BruteForceMatch(String T, String P)
    // Check if each rank of T starts a matching substring
    for i ← 0 to T.length() - P.length()
      // Compare substring starting at T[i] with P
      j ← 0
      while j < P.length() && T.charAt(i + j) == P.charAt(j)
        j ← j + 1
      if j == P.length()
        return i    // Return 1st place in T we find P
    return -1       // No matching substring exists
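The pseudocode above translates almost line for line into Java. The following is one possible rendering, not the course's official solution; the class name BruteForceMatch and method name match are assumptions.

```java
// One possible Java rendering of the slide's pseudocode (names are assumptions).
class BruteForceMatch {
    // Returns the first rank in t where p occurs, or -1 if p never occurs.
    static int match(String t, String p) {
        // Check whether each rank of t starts a matching substring.
        for (int i = 0; i <= t.length() - p.length(); i++) {
            // Compare the substring starting at t[i] with p.
            int j = 0;
            while (j < p.length() && t.charAt(i + j) == p.charAt(j)) {
                j++;
            }
            if (j == p.length()) {
                return i; // every character of p matched
            }
        }
        return -1; // no matching substring exists
    }

    public static void main(String[] args) {
        System.out.println(match("abacaabaccabacabaabb", "abacab")); // 10
        System.out.println(match("aaaaaaaaag", "aaag"));             // 6
        System.out.println(match("aaaaaaaaaa", "aaag"));             // -1
    }
}
```

When p is longer than t, the loop bound is negative, the loop body never runs, and the method returns -1.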
Your Turn
  Get back into groups and do the activity
Before Next Lecture…
  Keep up with your reading!
    Cannot stress this enough
  Get ready for the Lab Mastery Exam
  Start thinking about questions for the Final