Grouping Things & Hierarchical Matching
By: Andrew Cory


1 By: Andrew Cory

2 Grouping Things & Hierarchical Matching
Grouping characters: ( and )
Parentheses allow part of a regular expression to be treated as a single unit.
Useful for matching several words or phrases that share a common base.
Ex. /house(cat|keeper)/ is equivalent to /housecat|housekeeper/
Ex. /(a|[bc])d/ matches 'ad', 'bd', or 'cd'
Ex. /(19|20|)\d\d/ matches 19xx, 20xx, or xx (the empty third alternative makes the prefix optional)
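The three grouping examples can be tried against a small word list. This is a minimal sketch: the word list is made up for illustration, and the ^ and $ anchors are added here (they are not on the slide) so that each pattern has to cover the whole word.

```perl
use strict;
use warnings;

# Hypothetical word list, chosen only for illustration.
my @words = qw(housecat housekeeper house 1999 2024 42 ad bd cd dd);

# Grouping limits the alternation to the suffix only.
my @house = grep { /^house(cat|keeper)$/ } @words;

# The empty third alternative makes the century prefix optional.
my @years = grep { /^(19|20|)\d\d$/ } @words;

# A group may mix a literal with a character class.
my @xd = grep { /^(a|[bc])d$/ } @words;

print "house: @house\n";   # house: housecat housekeeper
print "years: @years\n";   # years: 1999 2024 42
print "xd:    @xd\n";      # xd:    ad bd cd
```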

3 Continued
Backtracking: the step-by-step process of trying an alternative, checking whether it matches, and moving on to the next alternative if it doesn't.
A regular expression with alternatives can be satisfied along several different paths, each matching a different string.
Backtracking is a trial-and-error method that works through the string one character at a time.

4 Continued
Backtracking example: "abcd" =~ /(af|ab)(ce|c|cd)/;
1. Start with the first letter of the string, 'a'.
2. Try the 1st alternative of the first group, 'af'.
3. 'a' matches, but 'f' doesn't match 'b'; backtrack to 'a' and try the 2nd alternative, 'ab'.
4. 'a' and 'b' match the first 2 characters; the first group is satisfied, so move on to the next group.
5. In 'ce', 'c' matches but 'e' doesn't match 'd'; backtrack to 'c' and try the 2nd alternative, 'c'.
6. 'c' matches; the second group is satisfied, so the whole expression matches "abcd" (the matched text is 'abc').
Note: the 3rd alternative in the 2nd group, 'cd', would match too, but it is never tried because the regular expression is already satisfied.
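The outcome of the walkthrough can be checked directly. A short sketch, with variable names chosen here for illustration, that records which alternatives actually won:

```perl
use strict;
use warnings;

my ($whole, $first, $second);
if ("abcd" =~ /(af|ab)(ce|c|cd)/) {
    # $& is the full matched text; $1 and $2 are the two groups.
    ($whole, $first, $second) = ($&, $1, $2);
}

# The second group stops at 'c', so the trailing 'd' is not part of the match.
print "matched '$whole': group 1 = '$first', group 2 = '$second'\n";
# matched 'abc': group 1 = 'ab', group 2 = 'c'
```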

5 Extracting Matches
Parentheses not only group; they also extract the parts of the string that matched each group into the variables $1, $2, $3, and so on.
Ex.
    if ($time =~ /(\d\d):(\d\d):(\d\d)/) {
        $hours = $1;
        $minutes = $2;
        $seconds = $3;
    }
In list context, the match returns the captured values directly:
    ($hours, $minutes, $seconds) = ($time =~ /(\d\d):(\d\d):(\d\d)/);
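The list-context form is convenient inside a small helper. In the sketch below, parse_time is a hypothetical name (it is not part of the slides), and the ^ and $ anchors are an added assumption so that malformed input is rejected outright.

```perl
use strict;
use warnings;

# parse_time is a hypothetical helper name used only for this sketch.
sub parse_time {
    my ($time) = @_;
    # In list context a successful match returns ($1, $2, $3);
    # a failed match returns the empty list.
    my @parts = $time =~ /^(\d\d):(\d\d):(\d\d)$/;
    return @parts;
}

my ($hours, $minutes, $seconds) = parse_time("12:34:56");
print "$hours h, $minutes m, $seconds s\n";   # 12 h, 34 m, 56 s

my @bad = parse_time("noon");
print "no match for 'noon'\n" unless @bad;
```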

6 Continued
Nested groupings are numbered by the position of their opening parenthesis, counting from left to right.
Ex. /(ab(cd|ef)((gi)|j))/;
    $1 = the whole outer group: ab, followed by (cd|ef) and ((gi)|j)
    $2 = cd|ef
    $3 = gi|j
    $4 = gi
Backreferences \1, \2, etc. are related to the matching variables $1, $2, etc., but can only be used inside the regular expression.
Useful for matching repeated phrases.
Ex. /(\w\w\w)\1/ matches 'booboo' or 'murmur'
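To see the group numbering in practice, the nested pattern can be run against two sample strings. This is a sketch; the sample strings and the $show helper are made up for illustration. A group that takes no part in the match comes back as undef.

```perl
use strict;
use warnings;

my $re = qr/(ab(cd|ef)((gi)|j))/;

# In list context the match returns ($1, $2, $3, $4).
my @first  = "abcdgi" =~ $re;
my @second = "abefj"  =~ $re;

# Unset groups come back as undef; print them explicitly.
my $show = sub { join ", ", map { defined $_ ? "'$_'" : "undef" } @_ };
print $show->(@first), "\n";    # 'abcdgi', 'cd', 'gi', 'gi'
print $show->(@second), "\n";   # 'abefj', 'ef', 'j', undef

# \1 must repeat exactly what group 1 captured.
my @doubled = grep { /^(\w\w\w)\1$/ } qw(booboo murmur banana);
print "@doubled\n";             # booboo murmur
```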

7 Continued
The positions of the matched portions of the string are also stored, in the @- (start offsets) and @+ (end offsets) arrays.
Ex.
    $x = "Mmm...donut";
    $x =~ /^(Mmm)\.\.\.(donut)/;
    foreach $expr (1..$#-) {
        print "$expr: '${$expr}' at ($-[$expr],$+[$expr])\n";
    }
Output:
    1: 'Mmm' at (0,3)
    2: 'donut' at (6,11)
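The slide's loop reads each group through the symbolic reference ${$expr}, which does not compile under "use strict". A strict-safe sketch of the same idea, assuming it is acceptable to recover each group's text from its offsets with substr:

```perl
use strict;
use warnings;

my $x = "Mmm...donut";
my @report;

if ($x =~ /^(Mmm)\.\.\.(donut)/) {
    for my $i (1 .. $#-) {
        # $-[$i] is where group $i starts; $+[$i] is the offset just past its end.
        my $text = substr($x, $-[$i], $+[$i] - $-[$i]);
        push @report, sprintf("%d: '%s' at (%d,%d)", $i, $text, $-[$i], $+[$i]);
    }
}

print "$_\n" for @report;
# 1: 'Mmm' at (0,3)
# 2: 'donut' at (6,11)
```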

8 Continued
Even when a regexp has no groupings, Perl still stores the parts of the string around the match in separate variables:
    $` is the string before the match
    $& is the string that matched
    $' is the string after the match
Ex.
    $x = "I like chips";
    $x =~ /like/;
    $` = "I "
    $& = "like"
    $' = " chips"
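A runnable version of the same example, as a sketch; the three values are copied into ordinary variables (names chosen here) right after the match, since the next successful match would overwrite them:

```perl
use strict;
use warnings;

my $x = "I like chips";
my ($before, $matched, $after);

if ($x =~ /like/) {
    # Copy the special variables immediately after the match.
    ($before, $matched, $after) = ($`, $&, $');
}

print "'$before' | '$matched' | '$after'\n";   # 'I ' | 'like' | ' chips'
```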

9 Matching Repetitions
The quantifier characters ?, *, +, and {} match repeated characters, syllables, or words without writing the pattern out many times.
Definitions
    a?     = matches 'a' zero or one times
    a*     = matches 'a' zero or more times (any number of times)
    a+     = matches 'a' one or more times (at least once)
    a{n,m} = matches at least n times, but not more than m times
    a{n,}  = matches at least n times
    a{n}   = matches exactly n times

10 Continued
Examples
    /[a-z]+\s+\d*/ = a lowercase word, some whitespace, and any number of digits (ajc 93, jgro 843986)
    /(\w+)\s\1/ = a doubled word of any length with a whitespace character in between (jon jon, hidalgo hidalgo)
    /y(es)?/i = 'y' or 'yes' in any letter case ('y', 'Y', 'yes', 'YES')
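A quick sketch of the quantifiers at work. The input lists are invented for illustration, the \d{2,4} pattern is an extra example of the {n,m} form, and ^ and $ are added so that each pattern must cover the whole string.

```perl
use strict;
use warnings;

# A doubled word: \1 must repeat whatever (\w+) captured.
my @doubled = grep { /^(\w+)\s\1$/ } ("jon jon", "hidalgo hidalgo", "jon jim");

# Optional group plus the case-insensitive modifier.
my @yes = grep { /^y(es)?$/i } qw(y Y yes YES yep no);

# Between two and four digits.
my @digits = grep { /^\d{2,4}$/ } qw(7 42 2024 12345);

print join(", ", @doubled), "\n";   # jon jon, hidalgo hidalgo
print "@yes\n";                     # y Y yes YES
print "@digits\n";                  # 42 2024
```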

11 Continued
Perl always tries to match as much of the string as possible to a quantified element, as long as the regular expression as a whole still matches.
E.g. with a?, Perl first tries to match with the 'a' present; if that fails, it tries again without the 'a'.
Ex.
    $x = "the cat in the hat";
    $x =~ /^(.*)(at)(.*)$/;
    $1 = 'the cat in the h'
    $2 = 'at'
    $3 = ''

12 Continued
Quantifiers that grab as much of the string as possible are known as 'maximal match' or 'greedy' quantifiers.
4 important regular expression principles
Principle 1: A regexp is matched at the earliest possible position in the string.
Principle 2: In an alternation such as (a|b|c), the leftmost alternative that allows a match is the one used.
Principle 3: Greedy quantifiers match as much of the string as possible while still allowing the whole regexp to match.
Principle 4: When a regexp has several greedy quantifiers, the leftmost one takes priority over the others.

13 Continued
Examples
    $x = "The programming republic of Perl";
    $x =~ /^(.+)(e|r)(.*)$/;
    $1 = 'The programming republic of Pe'
    $2 = 'r'
    $3 = 'l'
    $x =~ /.*(m{1,2})(.*)$/;
    $1 = 'm'
    $2 = 'ing republic of Perl'
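Both results can be reproduced directly. The sketch below also adds a third pattern without the leading .* (an addition, not on the slide) to show how much that leftmost greedy quantifier takes away from m{1,2}.

```perl
use strict;
use warnings;

my $x = "The programming republic of Perl";

# The leftmost greedy .+ grabs as much as it can and gives back one character.
my @a = $x =~ /^(.+)(e|r)(.*)$/;

# The leading .* swallows the first 'm', leaving only one for m{1,2}.
my @b = $x =~ /.*(m{1,2})(.*)$/;

# Without the leading .*, m{1,2} is the leftmost quantifier and takes both.
my @c = $x =~ /(m{1,2})(.*)$/;

print join("|", @a), "\n";   # The programming republic of Pe|r|l
print join("|", @b), "\n";   # m|ing republic of Perl
print join("|", @c), "\n";   # mm|ing republic of Perl
```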

14 Continued
Sometimes matching the smallest possible piece of a string is essential; for this, the 'minimal match' or 'non-greedy' quantifiers ??, *?, +?, and {}? exist.
Definitions
    a??     = match 'a' 0 or 1 times; try 0 first, then 1
    a*?     = match 'a' 0 or more times, as few as possible
    a+?     = match 'a' 1 or more times, as few as possible
    a{n,m}? = match at least n times, not more than m, as few as possible
    a{n,}?  = match at least n times, as few as possible
    a{n}?   = match exactly n times; same thing as a{n}

15 Continued
Examples: same string as above, different operators!
    $x = "The programming republic of Perl";
    $x =~ /^(.+?)(e|r)(.*)$/;
    $1 = 'Th'
    $2 = 'e'
    $3 = ' programming republic of Perl'
    $x =~ /.*?(m{1,2})(.*)$/;
    $1 = 'mm'
    $2 = 'ing republic of Perl'
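The non-greedy results can be checked the same way. The third pattern, m{1,2}?, is an extra illustration (not on the slide) of a non-greedy counted quantifier.

```perl
use strict;
use warnings;

my $x = "The programming republic of Perl";

# .+? stops at the first place where (e|r) can match.
my @a = $x =~ /^(.+?)(e|r)(.*)$/;

# .*? takes as little as possible, so the greedy m{1,2} gets both m's.
my @b = $x =~ /.*?(m{1,2})(.*)$/;

# A non-greedy m{1,2}? is satisfied with a single 'm'.
my @c = $x =~ /(m{1,2}?)(.*)$/;

print join("|", @a), "\n";   # Th|e| programming republic of Perl
print join("|", @b), "\n";   # mm|ing republic of Perl
print join("|", @c), "\n";   # m|ming republic of Perl
```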

16 Continued
Note: the principles for greedy quantifiers carry over to non-greedy quantifiers in mirrored form: a non-greedy quantifier matches as little of the string as possible while still allowing the whole regexp to match, and the leftmost quantifier takes priority.

17 Continued
Quantifiers are susceptible to backtracking.
Ex.
    $x = "the cat in the hat";
    $x =~ /^(.*)(at)(.*)$/;
    $1 = 'the cat in the h'
    $2 = 'at'
    $3 = ''
1. Start with the first letter, 't'.
2. The first quantifier, .*, starts out by matching the whole string.
3. 'a' cannot match at the end of the string; backtrack one character.
4. 'a' does not match the last letter, 't'; backtrack one more character.
5. Now 'a' matches, and then 't' matches.
6. Move on to the 3rd element. We are already at the end of the string, so it is assigned the empty string.
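Running the pattern confirms where the backtracking ends up. For contrast, the sketch adds a non-greedy variant (not on the slide), which expands forward from the left instead of backing off from the right.

```perl
use strict;
use warnings;

my $x = "the cat in the hat";

# Greedy: .* takes everything, then gives back two characters for 'at'.
my @greedy = $x =~ /^(.*)(at)(.*)$/;

# Non-greedy: .*? grows one character at a time until 'at' can match.
my @lazy = $x =~ /^(.*?)(at)(.*)$/;

print join("|", @greedy), "\n";   # the cat in the h|at|
print join("|", @lazy), "\n";     # the c|at| in the hat
```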

18 Continued
Error alert! Nested indeterminate quantifiers are dangerous things.
Ex. /(a|b+)*/;
In the above example, the inner b+ can match a run of b's of any length, and the outer * can then repeat the group any number of times, so the same run of b's can be split up in many different ways.
If a match is not found early in the process, Perl will try EVERY possibility before giving up, which can take an enormous amount of time and resources.
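One common way to sidestep the problem, offered here as a suggestion rather than something from the slides, is to flatten the nesting: any string of a's and b's that (a|b+)* can match is also matched by the single character class [ab]*. The sketch below checks that claim on a few short, made-up samples; (?:...) is a non-capturing group, used only to keep the capture numbering simple.

```perl
use strict;
use warnings;

# Short illustrative inputs; none is long enough to trigger heavy backtracking.
my @samples = ("abba", "bbbab", "aabbc", "cab", "");
my @mismatch;

for my $s (@samples) {
    my ($nested) = $s =~ /^((?:a|b+)*)/;   # nested quantifiers
    my ($flat)   = $s =~ /^([ab]*)/;       # flattened equivalent
    push @mismatch, $s if $nested ne $flat;
}

print @mismatch
    ? "differences: @mismatch\n"
    : "both patterns matched the same prefix\n";
# both patterns matched the same prefix
```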

19 Building a Regexp
Step one: decide what we want to match and what we want to exclude.
Ex. A regexp that matches numbers should reject any string that is not a number, and accept both integers and floating-point numbers.
Step two: break the problem down into smaller parts; smaller parts are easier to work with.
Ex. Any integer: /[+-]?\d+/
    \d+ represents one or more digits
    [+-]? represents the number's optional sign (positive/negative)

20 Continued
Ex. Floating point
A floating-point number has a sign, a decimal point, a fractional part, and an exponent, e.g. 25.4E-72
    /^[+-]?(\d+\.\d+|\d+\.|\.\d+|\d+)([eE][+-]?\d+)?$/;
1st part, [+-]?, is the optional sign of the number.
2nd part, (\d+\.\d+|\d+\.|\.\d+|\d+), covers the different ways the digits can be written (2.54, 346., .395, 500).
3rd part, ([eE][+-]?\d+)?, is the optional exponent: e or E, followed by an optional sign, then an integer of any size (e-5, E9000).

21 Continued
The //x modifier in Perl allows one to write complex regexps with as much spacing as the programmer wants.
    /^
        [+-]?
        (
            \d+\.\d+
           |\d+\.
           |\.\d+
           |\d+
        )
        ([eE][+-]?\d+)?
    $/x;
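The spaced-out pattern can be stored with qr// and tested against a handful of inputs. A sketch: the variable names and the list of test strings are invented for illustration, and the comments inside the pattern are additions.

```perl
use strict;
use warnings;

my $float = qr/^
    [+-]?                 # optional sign
    (
        \d+\.\d+          # digits on both sides of the point: 2.54
      | \d+\.             # trailing point: 346.
      | \.\d+             # leading point: .395
      | \d+               # plain integer: 500
    )
    ([eE][+-]?\d+)?       # optional exponent
$/x;

my @inputs   = qw(25.4E-72 2.54 346. .395 500 -7e3 abc 1.2.3 e5 +);
my @accepted = grep { $_ =~ $float } @inputs;

print "@accepted\n";   # 25.4E-72 2.54 346. .395 500 -7e3
```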

22 Continued
The downside to the //x modifier: certain symbols must be typed differently.
Spaces: since //x ignores whitespace in the pattern, a literal space must be typed as '\ ' or '[ ]'.
Pound signs: since //x treats # as the start of a comment, a literal pound sign must be typed as '\#' or '[#]'.
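A small sketch of both escapes. The "item #42" label format is made up for illustration; the second pattern is the same regexp written without //x, for comparison.

```perl
use strict;
use warnings;

# Under //x, "[ ]" is a literal space and "\#" is a literal pound sign;
# all other whitespace in the pattern is ignored.
my $label = qr/^ item [ ] \# (\d+) $/x;

# The same pattern written without //x.
my $plain = qr/^item #(\d+)$/;

my ($with_x)    = "item #42" =~ $label;
my ($without_x) = "item #42" =~ $plain;

print "$with_x $without_x\n";   # 42 42
```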

23 Continued
Example:
    /^
        [+-]?\ *         # any number of spaces is now allowed between
        (                # the sign and the floating-point number
            \d+
            (            # the floating-point alternatives have been
                \.\d*    # reworked, since most of them
            )?           # started the same way
           |\.\d+
        )
        ([eE][+-]?\d+)?
    $/x;
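The reworked pattern can be exercised the same way. A sketch: the test strings are invented, and the comments inside the pattern are rewritten for clarity. Note that the escaped space means "- 12.5" is now accepted, while "1 2" is still rejected.

```perl
use strict;
use warnings;

my $float = qr/^
    [+-]?\ *          # optional sign, then any number of literal spaces
    (
        \d+           # leading digits, optionally followed by
        ( \.\d* )?    # a decimal point and more digits
      | \.\d+         # or a bare fraction such as .5
    )
    ([eE][+-]?\d+)?   # optional exponent
$/x;

my @inputs   = ("- 12.5", "3.", ".5e10", "12", "1 2", "#5");
my @accepted = grep { $_ =~ $float } @inputs;

print join(", ", @accepted), "\n";   # - 12.5, 3., .5e10, 12
```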

