CS1110, Nate Brunelle. Today: Regular Expressions
Questions?
String.find()
Takes a string as an argument and, if exactly that string appears, gives its index (or -1 if it does not appear).
Mystring.find("Purple Elephant")
"purple elephant".find("Purple Elephant")
"the elephant was purple"
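A minimal sketch of the exact-match behavior described on this slide. The text in my_string is a made-up example, not something from the lecture; the other two strings are the ones on the slide.

```python
# str.find returns the index of the first exact occurrence, or -1 if there is none.
my_string = "I saw a Purple Elephant today"  # hypothetical example text

found_at = my_string.find("Purple Elephant")
wrong_case = "purple elephant".find("Purple Elephant")
wrong_order = "the elephant was purple".find("Purple Elephant")

print(found_at)     # 8
print(wrong_case)   # -1
print(wrong_order)  # -1
```

Capitalization and word order both matter to find(), which is the limitation regular expressions are meant to address.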
Wildcards
[Rr]ugs?[^a-zA-Z]
Match on / find:
Will not match on / find: Rugged, rugged
We might want:
  (1) A way of saying r or R
  (2) Maybe there's an s
  (3) Something that's not a letter
(1)ug(2)(3) becomes [Rr]ugs?[^a-zA-Z]
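One way to try the pattern from this slide with Python's re module. The sample strings other than "Rugged" and "rugged" are assumptions chosen for illustration.

```python
import re

rug = re.compile(r"[Rr]ugs?[^a-zA-Z]")

# Hypothetical test strings, plus the two non-matches from the slide.
samples = ["the rug.", "Rugs are soft", "Rugged", "rugged", "rug"]
results = {text: rug.search(text) is not None for text in samples}

for text, matched in results.items():
    print(f"{text!r}: {matched}")
```

The last sample, "rug" with nothing after it, is not found because [^a-zA-Z] has to consume one actual character.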
Goal: he / she / it / they went to the store
First attempt: s?h?e?i?t? (too loose: it also matches "sit")
Alternation (or): |
s?he|it|they
(she|he|it|they) went to the store
Matches:
  she went to the store
  he went to the store
  it went to the store
  they went to the store
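A short sketch comparing the two approaches on this slide: the all-optional first attempt and the grouped alternation. Using fullmatch (the whole string must match) is a choice made here for clarity, not something the slide specifies.

```python
import re

sentence = re.compile(r"(she|he|it|they) went to the store")
loose = re.compile(r"s?h?e?i?t?")  # the first attempt: every letter optional

subjects = ["she", "he", "it", "they"]
all_match = all(sentence.fullmatch(f"{s} went to the store") for s in subjects)
rejects_sit = sentence.fullmatch("sit went to the store") is None
loose_accepts_sit = loose.fullmatch("sit") is not None
loose_rejects_they = loose.fullmatch("they") is None

print(all_match)           # True
print(rejects_sit)         # True
print(loose_accepts_sit)   # True: the first attempt is too permissive
print(loose_rejects_they)  # True: and it cannot spell "they" at all
```

The parentheses matter: without them, | would split the whole pattern, so only "they" would be followed by " went to the store".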
Star vs plus vs ?
Spo?ky: zero or one o
Spo*ky: zero or more o's
Spo+ky: one or more o's
Strings to try: Spky, Spoky, Spooky, Spoooky, Spooooky, Spoooooooooooooky, …
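The three quantifiers can be compared side by side with a small helper. The function name matching is hypothetical, and only the first four strings from the slide are used.

```python
import re

words = ["Spky", "Spoky", "Spooky", "Spoooky"]

def matching(pattern):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

print(matching("Spo?ky"))  # ['Spky', 'Spoky']
print(matching("Spo*ky"))  # ['Spky', 'Spoky', 'Spooky', 'Spoooky']
print(matching("Spo+ky"))  # ['Spoky', 'Spooky', 'Spoooky']
```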
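R string (raw string)
"\"" is one character: "
r"\"" is two characters: \ and "
r"\" -> error (a raw string cannot end in a single backslash)
r"\"this" -> \"this
r"\n" -> \n (a backslash and the letter n, not a newline)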
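A small sketch of how Python treats backslashes in raw and ordinary string literals. The compile() call is only a way to show the syntax error without stopping the script; the variable names are made up for this example.

```python
plain = "\""        # escape sequence: one character, a double quote
raw_quote = r"\""   # raw: two characters, a backslash and a double quote
raw_n = r"\n"       # raw: two characters, a backslash and the letter n
newline = "\n"      # not raw: one newline character

print(len(plain), len(raw_quote), len(raw_n), len(newline))  # 1 2 2 1

# A raw string cannot end in a single backslash. The source text compiled
# below is r"\" and Python rejects it as an unterminated string.
try:
    compile('r"\\"', "<example>", "eval")
    ends_in_backslash_ok = True
except SyntaxError:
    ends_in_backslash_ok = False
print(ends_in_backslash_ok)  # False
```

Raw strings are convenient for regexes because sequences such as \b and \( reach the re module unchanged.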
Regex Pieces

Operation                 Example        Meaning
Character class           [Rr] or [rR]   R or r
                          [abcd]         Exactly one of a, b, c, or d
                          [\^A]          Just caret (^) or A
Character range           [a-z]          Exactly one character "between" a and z
                          [a-zA-Z]       "Between" a and z or "between" A and Z
                          [0-9]          Any one digit
Negative character class  [^a]           Any one character that's not an a
                          [^a-zA-Z]      Any one character that's not a letter
                          [^\^]          Any one character that's not a caret
Optional quantifier       s?             Maybe there's an s: 0 or 1 s
                          [Rr]?          Either one of R or r, or neither
OR, alternation           wx|xyz         One of the strings wx or xyz
                          s?he|it        Matches one of the two regexes
Star                      [abc]*         Any number of a's, b's, and c's at all: 0 or more copies of…
Plus                      [abc]+         At least one of a's, b's, and c's: 1 or more copies of…
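The table entries can be spot-checked with re.fullmatch. The test texts and expected results below are assumptions written for this sketch, not part of the slide.

```python
import re

# (pattern, text, expected result of a full match); texts are hypothetical examples.
checks = [
    (r"[Rr]", "R", True),
    (r"[abcd]", "e", False),
    (r"[\^A]", "^", True),
    (r"[a-zA-Z]", "q", True),
    (r"[0-9]", "7", True),
    (r"[^a-zA-Z]", "7", True),
    (r"[^\^]", "^", False),
    (r"[Rr]?", "", True),
    (r"wx|xyz", "xyz", True),
    (r"s?he|it", "she", True),
    (r"[abc]*", "", True),
    (r"[abc]+", "", False),
    (r"[abc]+", "cabba", True),
]

outcomes = [re.fullmatch(p, t) is not None for p, t, _ in checks]
for (pattern, text, expected), got in zip(checks, outcomes):
    print(f"{pattern!r} on {text!r}: {got} (expected {expected})")
```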
Regex Pieces, Cont.

Operation          Example      Meaning
Count range        {3,5}        Between 3 and 5 (inclusive) copies of…
                   [ab]{2,3}    aa, ab, ba, bb, aaa, aab, abb, baa, …
                   [abc]{5}     Exactly 5 characters, each an a, b, or c
End of text        $            This is some text#
Beginning of text  ^            #This is some text
Word boundary      \b           #This# #is# #some# #text#
Anything           .            Any one character
                   .*           Any number of characters

(# marks each position where the anchor matches.)

All UVA computing IDs: 2-3 letters, number, 1-3 letters
[a-z]{2,3}[2-9][a-z]{1,3}
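A sketch of the count range, the two anchors, and the word boundary, using the slide's sample sentence. The variable names and the short test strings are made up for this example.

```python
import re

text = "This is some text"

# Counts: {2,3} must be written without a space, or Python's re module
# reads the braces as literal characters instead of a count.
two_or_three = re.fullmatch(r"[ab]{2,3}", "aab") is not None
too_short = re.fullmatch(r"[ab]{2,3}", "a") is not None
exactly_five = re.fullmatch(r"[abc]{5}", "abcab") is not None
print(two_or_three, too_short, exactly_five)  # True False True

# ^ and $ match positions rather than characters.
start_pos = re.search(r"^This", text).start()
end_pos = re.search(r"text$", text).end()
print(start_pos, end_pos)  # 0 17

# \b matches at both edges of every word.
boundaries = [m.start() for m in re.finditer(r"\b", text)]
print(boundaries)  # [0, 4, 5, 7, 8, 12, 13, 17]
```

The eight boundary positions correspond to the eight # marks in the word boundary row of the table.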
Give an expression to match all UVA computing IDs
2-3 letters, number, 1-3 letters
[a-z][a-z][a-z]?[2-9][a-z][a-z]?[a-z]?
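The spelled-out expression on this slide should accept the same strings as the count-range form [a-z]{2,3}[2-9][a-z]{1,3}. A quick check under that assumption, using sample strings that are partly made up:

```python
import re

spelled_out = re.compile(r"[a-z][a-z][a-z]?[2-9][a-z][a-z]?[a-z]?")
with_counts = re.compile(r"[a-z]{2,3}[2-9][a-z]{1,3}")

# Sample strings; the last two are hypothetical near-misses.
candidates = ["njb2b", "mst3k", "aaa8bbb", "a2b", "njb2", "abcd2e", "ab1cd"]

agree = all(
    (spelled_out.fullmatch(c) is None) == (with_counts.fullmatch(c) is None)
    for c in candidates
)
accepted = [c for c in candidates if with_counts.fullmatch(c)]

print(agree)     # True
print(accepted)  # ['njb2b', 'mst3k', 'aaa8bbb']
```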
What does a for loop look like?
for [variable] in [collection]:
Variable: [a-zA-Z]+
Collection, for example: [0, 1, 5, 9]
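One possible regex for this shape of for-loop header. The slide only gives the variable part, [a-zA-Z]+; the sub-pattern for the collection is an assumption that handles only list literals of non-negative integers such as [0, 1, 5, 9].

```python
import re

# Assumption: the collection is a list literal of integers separated by ", ".
for_header = re.compile(r"for [a-zA-Z]+ in \[[0-9]+(, [0-9]+)*\]:")

slide_example = for_header.fullmatch("for x in [0, 1, 5, 9]:") is not None
single_item = for_header.fullmatch("for item in [42]:") is not None
bad_variable = for_header.fullmatch("for 9 in [0, 1]:") is not None
missing_colon = for_header.fullmatch("for x in [0, 1]") is not None

print(slide_example, single_item, bad_variable, missing_colon)
# True True False False
```

The square brackets of the list are escaped as \[ and \] because unescaped brackets would start a character class.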
import re
finder = re.compile(...)

Use the finder:
  search: similar to string.find(), gives just the first matching instance (as a match object)
  finditer: gives a collection of match objects
  findall: a list containing, for each match:
    0 parentheses: m.group()
    1 paren: m.group(1)
    2+ parens: m.groups()

Match object:
  group: the text we matched on
  start
  end
  groups
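A sketch of the three finder methods and the match object, using a computing-ID pattern with three groups. The text being searched is a made-up example.

```python
import re

finder = re.compile(r"([a-z]{2,3})([2-9])([a-z]{1,3})")
text = "Email njb2b or mst3k for help."  # hypothetical text

# search: only the first match, returned as a match object.
first = finder.search(text)
print(first.group(), first.start(), first.end())  # njb2b 6 11
print(first.groups())                             # ('njb', '2', 'b')

# finditer: one match object per match.
all_ids = [m.group() for m in finder.finditer(text)]
print(all_ids)  # ['njb2b', 'mst3k']

# findall: what the list holds depends on the number of parenthesized groups.
no_groups = re.findall(r"[a-z]{2,3}[2-9][a-z]{1,3}", text)
one_group = re.findall(r"([a-z]{2,3})[2-9][a-z]{1,3}", text)
many_groups = finder.findall(text)
print(no_groups)    # ['njb2b', 'mst3k']
print(one_group)    # ['njb', 'mst']
print(many_groups)  # [('njb', '2', 'b'), ('mst', '3', 'k')]
```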
Writing a regex
Write down some examples of strings you want to match, and some examples of similar strings that you don't want to match.
Want to match: njb2b, mst3k, aaa8bbb, aa4aa
Don't want to match: a2b, njb2, 7bb
Going left to right through your examples, try to come up with the rules that will match or not match on the correct strings.
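The two example lists can double as a small test harness for a candidate regex. This sketch assumes the computing-ID pattern [a-z]{2,3}[2-9][a-z]{1,3} and assumes the whole string must be an ID, hence fullmatch.

```python
import re

pattern = re.compile(r"[a-z]{2,3}[2-9][a-z]{1,3}")

want = ["njb2b", "mst3k", "aaa8bbb", "aa4aa"]
dont_want = ["a2b", "njb2", "7bb"]

# Both lists should come out empty if the rules are right.
missed = [s for s in want if not pattern.fullmatch(s)]
wrongly_matched = [s for s in dont_want if pattern.fullmatch(s)]

print(missed)           # []
print(wrongly_matched)  # []
```

Any string that shows up in either list points at the rule that still needs adjusting.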
Regex for phone numbers
Shape: ((3n) ?|3n-)? 3n- 4n   (3n = three digits, 4n = four digits)
Area = (\([0-9]{3}\) ?|[0-9]{3}-)?
Office = [0-9]{3}-
Rest = [0-9]{4}
Want to match: 555-1234, 434-555-1234, (434) 555-1234
Don't want to match: 555-123, 5551234, 5555-1234, 111-1234, 123-234-5678
Also handle parentheses:
[2-9][0-9]{2}\-([2-9][0-9]{2}\-)?[0-9]{4}|\([2-9][0-9]{2}\) ?[2-9][0-9]{2}\-[0-9]{4}
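A sketch that runs the final phone-number pattern against both example lists. It assumes the closing parenthesis of the area code is escaped as \) and that the whole string must be a phone number (fullmatch).

```python
import re

phone = re.compile(
    r"[2-9][0-9]{2}\-([2-9][0-9]{2}\-)?[0-9]{4}"      # 555-1234 or 434-555-1234
    r"|\([2-9][0-9]{2}\) ?[2-9][0-9]{2}\-[0-9]{4}"    # (434) 555-1234
)

want = ["555-1234", "434-555-1234", "(434) 555-1234"]
dont_want = ["555-123", "5551234", "5555-1234", "111-1234", "123-234-5678"]

missed = [s for s in want if not phone.fullmatch(s)]
wrongly_matched = [s for s in dont_want if phone.fullmatch(s)]
print(missed)           # []
print(wrongly_matched)  # []

# search is more lenient: it finds a valid number inside an invalid one.
inside = phone.search("123-234-5678").group()
print(inside)  # 234-5678
```

The last line shows why the choice between search and fullmatch matters when a "don't want to match" string contains a shorter valid number.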