CS 330 Programming Languages 10 / 10 / 2006 Instructor: Michael Eckmann
Michael Eckmann - Skidmore College - CS Fall 2006
Today's Topics
  - Questions / comments?
  - Anyone try the tutorial?
  - Homework assignment to be assigned tonight.
  - Perl: continue with pattern matching / regular expressions
Perl
=~ (matches), !~ (doesn't match)
m/ / (this is the format of a match regular expression)
Variables can be put inside the / / search pattern and they are interpolated.
=~ can be omitted if matching the $_ special default variable
^ forces the match to be at the very beginning of the string
$ forces the match to be at the very end of the string
[ ] square brackets denote a character class where ONE character in the class will match
  - ^ (not) matches a char NOT inside the character class; ^ must be right after the [
  - - (hyphen) specifies a range of characters
$` $& and $' (left, matched, right) are special variables that are set after a match
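The operators and special variables on this slide can be tried out with a short script. This is only a sketch: the sample string and the variable names ($line, $word, and so on) are made up for illustration and are not from the lecture.

```perl
use strict;
use warnings;

my $line = "The quick brown fox";   # hypothetical sample string

# =~ tests for a match, !~ tests for the absence of a match
my $has_quick = ($line =~ m/quick/) ? 1 : 0;
my $no_cat    = ($line !~ m/cat/)   ? 1 : 0;

# A variable inside the pattern is interpolated before matching
my $word     = "brown";
my $has_word = ($line =~ m/$word/) ? 1 : 0;

# ^ anchors the match at the beginning, $ at the end
my $starts_the = ($line =~ m/^The/) ? 1 : 0;
my $ends_fox   = ($line =~ m/fox$/) ? 1 : 0;

# Character classes: a range, and a negated class
my $starts_upper    = ($line =~ m/^[A-Z]/)  ? 1 : 0;
my $starts_nondigit = ($line =~ m/^[^0-9]/) ? 1 : 0;

# When matching against $_, the =~ can be omitted
my $default_hits = 0;
for ("fox", "dog") {            # each element is placed in $_
    $default_hits++ if m/o/;
}

# $` $& $' hold the text left of, inside, and right of the last match
my ($left, $matched, $right);
if ($line =~ m/quick/) {
    ($left, $matched, $right) = ($`, $&, $');
}
print join("|", $left, $matched, $right), "\n";
```

The three special variables are copied into ordinary variables right away because the next successful match overwrites them.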
Perl
There are ways to specify common character classes
\d (any digit)
\s (any whitespace: space \t \r \n \f)
\w (any "word" character: a digit, letter or underscore)
\D (any non-digit)
\S (any non-whitespace)
\W (any non-word character)
. (any character other than newline \n)
The backslash classes can be used within the square brackets or without. The . is different: inside square brackets it is a literal period.
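A small script makes the shorthand classes concrete. It is a sketch with an invented sample string ($text) and invented variable names; a match in list context returns the captured groups, which is how the values are collected here.

```perl
use strict;
use warnings;

my $text = "Room 42, floor_3\tEast";   # hypothetical sample string

my ($digits)  = $text =~ m/(\d\d)/;       # first two consecutive digits
my ($word)    = $text =~ m/(\w\w\w\w)/;   # first four word characters
my ($nonword) = $text =~ m/(\W)/;         # first non-word character
my $has_space = ($text =~ m/\s/) ? 1 : 0; # the space and the tab both count

# A shorthand class inside square brackets: a digit or an underscore
my ($first) = $text =~ m/([\d_])/;

# . matches any character except newline ...
my $dot_newline = ("a\nb" =~ m/a.b/) ? 1 : 0;
# ... but inside square brackets it is a literal period
my $dot_literal = ("a.b" =~ m/a[.]b/) ? 1 : 0;
my $dot_other   = ("axb" =~ m/a[.]b/) ? 1 : 0;

print "$digits $word $first\n";
```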
Perl
Modifiers are characters that go after the second forward slash
i is a modifier for ignore case.
The behaviour for no modifier (the default) is that
  - . matches any non-newline character
  - ^ matches at beginning of string
  - $ matches at end of string (or before a newline at end)
s modifier: treats the string as a single long line, so . matches any character including newline
m modifier: treats the string as multiple lines, so ^ and $ match the beginning or end of any line
  - With m, use \A to match the beginning of the whole string and \Z to match the end of the whole string (or before a newline at the end).
Let's look at this webpage's examples under Using Character Classes for some more examples:
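The effect of each modifier is easiest to see on a string that contains a newline. The following sketch uses a made-up two-line string and made-up variable names.

```perl
use strict;
use warnings;

my $text = "Hello\nWorld";   # hypothetical two-line string

# i: ignore case
my $plain  = ($text =~ m/hello/)  ? 1 : 0;
my $nocase = ($text =~ m/hello/i) ? 1 : 0;

# s: . also matches newline
my $dot_default = ($text =~ m/Hello.World/)  ? 1 : 0;
my $dot_s       = ($text =~ m/Hello.World/s) ? 1 : 0;

# m: ^ and $ match at the start and end of every line
my $caret_default  = ($text =~ m/^World/)  ? 1 : 0;
my $caret_m        = ($text =~ m/^World/m) ? 1 : 0;
my $dollar_default = ($text =~ m/Hello$/)  ? 1 : 0;
my $dollar_m       = ($text =~ m/Hello$/m) ? 1 : 0;

# \A and \Z still refer to the whole string, even with m
my $string_start = ($text =~ m/\AWorld/m) ? 1 : 0;
my $string_end   = ($text =~ m/World\Z/m) ? 1 : 0;

print "$plain$nocase $dot_default$dot_s $caret_default$caret_m\n";
```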
Perl
| alternation character (acts sort of like a logical or)
Grouping characters using the parentheses
Getting the "submatches" by using the $1, $2, $3, etc. variables, which are set via the parentheses.
Using \1, \2, \3, etc. WITHIN the match expression allows earlier subgroup matches to be part of the match string!
  - These \1, \2, \3, etc. are called backreferences.
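Here is one way alternation, grouping, submatches, and a backreference might look in practice. The strings and variable names are invented for illustration.

```perl
use strict;
use warnings;

my $course = "cs330 meets Tuesday";   # hypothetical sample string

# Alternation inside a group; each pair of parentheses sets $1, $2, ...
my ($dept, $num);
if ($course =~ m/(cs|math)(\d+)/) {
    ($dept, $num) = ($1, $2);
}

# \1 inside the pattern must match the same text that group 1 matched,
# so this finds a word that is immediately repeated
my $doubled   = ("this is is a test" =~ m/\b(\w+) \1\b/) ? $1 : "";
my $no_repeat = ("this is a test"    =~ m/\b(\w+) \1\b/) ? 1 : 0;

print "$dept $num $doubled\n";
```

Note the difference in where each form is used: \1 appears inside the pattern, while $1 is read after the match has succeeded.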
Perl
Let's continue looking at this site for examples of using the alternation character, grouping using parentheses and the backtracking mechanism, and extracting matches using parentheses
Perl
Repetition quantifiers are put immediately after the
  - character,
  - character class, or
  - grouping
The repetition quantifiers and their meanings are:
? - 0 or 1 time
* - 0 or more times
+ - 1 or more times
{ } - min and max, at least, or exactly
  {min,max} - match >= min times and at most max times
  {min,} - match >= min times
  {n} - match n times exactly
These are GREEDY, that is, they match as much of the string as possible while still allowing the whole regular expression to match
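Each quantifier can be checked against a few short strings. The patterns below are anchored with ^ and $ so that the quantifier has to account for the whole string; the test strings and variable names are made up for this sketch.

```perl
use strict;
use warnings;

# ? : 0 or 1 time
my $optional = ("color" =~ m/^colou?r$/ && "colour" =~ m/^colou?r$/) ? 1 : 0;

# * : 0 or more times
my $star = ("ac" =~ m/^ab*c$/ && "abbbc" =~ m/^ab*c$/) ? 1 : 0;

# + : 1 or more times, so zero b's is not enough
my $plus_none = ("ac"  =~ m/^ab+c$/) ? 1 : 0;
my $plus_one  = ("abc" =~ m/^ab+c$/) ? 1 : 0;

# {min,max}, {min,} and {n}
my $in_range   = ("aaa"   =~ m/^a{2,4}$/) ? 1 : 0;
my $over_range = ("aaaaa" =~ m/^a{2,4}$/) ? 1 : 0;
my $at_least   = ("aaaaa" =~ m/^a{2,}$/)  ? 1 : 0;
my $exactly    = ("aaa"   =~ m/^a{3}$/)   ? 1 : 0;

print "$optional $star $plus_none$plus_one $in_range$over_range $at_least $exactly\n";
```

The braces are written without spaces, as in {2,4}, because older versions of Perl do not treat a brace expression containing spaces as a quantifier.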
Perl
Curly braces { } - min and max, at least, or exactly
{min,max} - match >= min times and at most max times
  - {5,10} matches between 5 and 10 times inclusive
{min,} - match >= min times
  - {3,} matches 3 or more times
{n} - match n times exactly
  - {6} matches exactly 6 times
Examples of a repetition quantifier after a grouping and after a character
  - m/(the){3}/ will match thethethe, all consecutively.
  - m/the{3}/ will match theee (only the e is repeated 3 times)
  - m/the.*the.*the/ will match 3 the's with any characters (except \n) between them
Any other way to write it?
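The three example patterns can be run directly. The last two lines show one possible answer to "any other way to write it?", namely m/(the.*){3}/; that answer is a suggestion of this sketch, not necessarily the one given in class, and the test strings are invented.

```perl
use strict;
use warnings;

# Quantifier after a grouping: the whole group repeats
my $group_the3  = ("thethethe" =~ m/(the){3}/) ? 1 : 0;
my $group_theee = ("theee"     =~ m/(the){3}/) ? 1 : 0;

# Quantifier after a character: only the e repeats
my $char_theee = ("theee"     =~ m/the{3}/) ? 1 : 0;
my $char_the3  = ("thethethe" =~ m/the{3}/) ? 1 : 0;

# Three the's with anything (except newline) between them
my $three = "the cat, the dog, the bird";   # hypothetical sample strings
my $two   = "the cat, the dog";
my $long_form  = ($three =~ m/the.*the.*the/) ? 1 : 0;

# A possible shorter form: repeat the group "the.*" three times
my $short_form = ($three =~ m/(the.*){3}/) ? 1 : 0;
my $short_two  = ($two   =~ m/(the.*){3}/) ? 1 : 0;

print "$group_the3$group_theee $char_theee$char_the3 $long_form$short_form$short_two\n";
```

The shorter form allows extra characters after the third "the", but since .* can match nothing, it accepts and rejects the same strings as the longer pattern.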
Perl
In terms of regular expression repetition quantifiers, what does greedy mean again?
Perl
A quantifier is greedy if it matches as much of the string as possible while still allowing the whole regular expression to match.
We'll see that greediness in action now.
Let's continue looking at this site for examples of matching repetitions and the 4 principles that are followed:
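Capturing what a greedy quantifier actually matched shows the behaviour directly. The sample string is made up. The last pattern uses the non-greedy form .*?, which is not covered on these slides and is included only as a contrast.

```perl
use strict;
use warnings;

my $s = "the cat in the hat";   # hypothetical sample string

# Greedy .* runs from the first t to the LAST t in the string
my ($greedy) = $s =~ m/(t.*t)/;

# Greedy .* takes everything up to the LAST space
my ($up_to_space) = $s =~ m/^(.*) /;

# For contrast (beyond the slides): .*? stops at the first t it can
my ($minimal) = $s =~ m/(t.*?t)/;

print "greedy:  $greedy\n";
print "minimal: $minimal\n";
```

In both greedy cases the quantifier first grabs as much as it can and then gives characters back only as far as needed for the rest of the pattern to match.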
Perl
Recap on the special variables we learned
$_ (the default variable)
$` $& and $' (left, match, right)
$0 (program name)
$1, $2, $3, ... (the submatches)
Perl
Let's write a few regular expressions.
Match any signed or unsigned integer of arbitrary length.
e.g. it should match
  -22, 4567, 1, +43
but not things like:
  -, +, 4.56, abcd, etc.
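One possible solution, offered as a sketch and not as the answer given in class, is an optional sign followed by one or more digits, anchored at both ends. The helper name is_integer is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the string is a signed or unsigned integer.
# [+-]? allows an optional sign, \d+ requires at least one digit, and
# the ^ and $ anchors keep anything else out of the string.
sub is_integer {
    my ($s) = @_;
    return $s =~ m/^[+-]?\d+$/ ? 1 : 0;
}

my @should_match = ("-22", "4567", "1", "+43");
my @should_fail  = ("-", "+", "4.56", "abcd");

my $matched  = grep { is_integer($_) } @should_match;
my $rejected = grep { !is_integer($_) } @should_fail;

print "matched $matched, rejected $rejected\n";
```

Because $ also matches just before a trailing newline, this pattern would accept "42\n" as well, which is usually what is wanted when reading lines of input.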
Perl
Let's try these:
1) Ignore beginning whitespace if there is any, match the word program, and store the rest of the string (after the word program) into some variable.
2) Now what if there were \n's in the string? What might we change?
3) Match cs330 or cs106 or CS106 or CS330 but not Cs330, or cS106, etc.
Perl
1) Ignore beginning whitespace if there is any, match the word program, and store the rest of the string (after the word program) into some variable.
   m/\s*program(.*)/
2) Now what if there were \n's in the string? What might we change?
   m/\s*program(.*)/s
3) Match cs330 or cs106 or CS106 or CS330 but not Cs330, or cS106, etc.
   m/cs330|cs106|CS330|CS106/
   OR
   m/(cs|CS)(106|330)/
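The answers above can be tried on sample input. The strings and variable names here are invented. The final two lines add an anchored variant, m/^\s*program(.*)/, as a suggestion of this sketch: without the ^, the pattern also matches when other text comes before "program".

```perl
use strict;
use warnings;

my $input = "   program hello\nworld";   # hypothetical input with a newline

# 1) Without s, .* stops at the newline
my ($rest_default) = $input =~ m/\s*program(.*)/;

# 2) With s, .* runs through the newline to the end of the string
my ($rest_s) = $input =~ m/\s*program(.*)/s;

# 3) Only the all-lowercase and all-uppercase prefixes are accepted
my $accepted  = grep { m/(cs|CS)(106|330)/ } qw(cs330 cs106 CS106 CS330);
my $mixedcase = grep { m/(cs|CS)(106|330)/ } qw(Cs330 cS106);

# Suggested variant: anchoring with ^ rejects text before "program"
my $unanchored = ("my program x" =~ m/\s*program(.*)/)  ? 1 : 0;
my $anchored   = ("my program x" =~ m/^\s*program(.*)/) ? 1 : 0;

print "accepted $accepted, mixed case $mixedcase\n";
```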
Perl
Let's look at a larger parsing example using many of the features we just learned. We'll read the problem and try to solve it ourselves before looking at the solution. The "doing string selections" section of:
The following page is a good page for reference. It is a nice summary of the different characters and their meanings with succinct examples: