Published by Malcolm Shepherd. Modified over 9 years ago.
1
Regular Expressions in Perl, Part I
Alan Gold
2
Basic syntax
=~ is the matching operator
!~ is the negated matching operator
// are the default delimiters
Prefixing the expression with "m" allows for arbitrary delimiters, e.g. m%Don't use this%
Modifiers follow the closing delimiter
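The operators and delimiters on this slide can be tried out with a minimal sketch like the following. The sample string and the variable names are made up for illustration and are not from the slides.

```perl
use strict;
use warnings;

# Hypothetical sample string, chosen because it contains slashes
my $text = "path/to/file";

# =~ is true when the pattern matches somewhere in the string
my $has_file = $text =~ /file/;

# !~ is true when the pattern does NOT match
my $no_dir = $text !~ /directory/;

# m with braces as delimiters avoids having to escape the slashes
my $has_path = $text =~ m{path/to};

# A modifier (here i, case-insensitive) follows the closing delimiter
my $has_upper = $text =~ /FILE/i;

print "file: ",      ($has_file  ? "yes" : "no"), "\n";
print "no dir: ",    ($no_dir    ? "yes" : "no"), "\n";
print "path: ",      ($has_path  ? "yes" : "no"), "\n";
print "FILE with i: ", ($has_upper ? "yes" : "no"), "\n";
```

All four lines should report "yes".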
3
Simple matching
"Hello World" =~ /Hello/
Matches the literal string "Hello"
"Superman" =~ /Kal-El/
Unfortunately does not match
4
Metacharacters
Metacharacters are {}[]()^$.|*+?\
These must be escaped with a "\" to match their literal characters
"Spoon+fork" =~ /Spoon+/ will match, but not how you want it to: the "+" applies to the "n", so only "Spoon" is matched
"Spoonnnnnn" =~ /Spoon+/ will also match
"Spoon+fork" =~ /Spoon\+/ matches properly
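To see what each Spoon pattern actually consumes, a small sketch can record the matched text using Perl's $& variable. The quotemeta call at the end is a Perl built-in that the slides do not mention; it is shown only as one possible way to escape metacharacters automatically.

```perl
use strict;
use warnings;

# Collect the text matched by the unescaped and the escaped pattern
my @matched;
for my $pattern (qr/Spoon+/, qr/Spoon\+/) {
    if ("Spoon+fork" =~ $pattern) {
        push @matched, $&;    # $& holds the text of the last successful match
    }
}

# Unescaped: + quantifies the "n", so the match stops at "Spoon"
# Escaped:   \+ is a literal plus sign, so the match is "Spoon+"
print "unescaped matched: $matched[0]\n";
print "escaped matched:   $matched[1]\n";

# quotemeta backslash-escapes every non-word character in a string
my $quoted = quotemeta("Spoon+");
print "quotemeta gives:   $quoted\n";
```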
5
Escape sequences
Several characters can't be printed directly
They are matched using an escape sequence
\t is a tab character (ASCII code 9)
\n is a newline character (ASCII code 10)
\r is a carriage return (ASCII code 13)
\0.. is an octal character code, e.g. \033
\x.. is a hexadecimal character code, e.g. \x1B
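As a quick check that the octal and hexadecimal forms name the same character, the sketch below builds the ASCII escape character (code 27) with chr and matches it both ways. The variable names are illustrative only.

```perl
use strict;
use warnings;

# ASCII 27 (ESC) is octal 033 and hexadecimal 1B
my $escape = chr(27);

my $octal_ok = $escape =~ /\033/;
my $hex_ok   = $escape =~ /\x1B/;

# \t and \n work inside a pattern just as they do in a double-quoted string
my $tab_ok     = "a\tb"   =~ /a\tb/;
my $newline_ok = "line\n" =~ /line\n/;

print "octal: ",   ($octal_ok   ? "match" : "no match"), "\n";
print "hex: ",     ($hex_ok     ? "match" : "no match"), "\n";
print "tab: ",     ($tab_ok     ? "match" : "no match"), "\n";
print "newline: ", ($newline_ok ? "match" : "no match"), "\n";
```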
6
Variables
Variables are interpolated in regular expressions much as they are in double-quoted strings
$something = "cool";
'cool cruel pool' =~ /$something/
Will match just fine
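A short sketch of interpolation follows. One consequence worth seeing is that metacharacters inside the variable stay active; the \Q...\E quoting shown here is standard Perl but is not covered on this slide, and the $literal example is invented for illustration.

```perl
use strict;
use warnings;

my $something = "cool";
my $found = 'cool cruel pool' =~ /$something/;

# Metacharacters inside an interpolated variable are still metacharacters:
# the "." in $literal matches any character, including the "b" in "abc"
my $literal  = "a.c";
my $dot_wild = "abc" =~ /$literal/;

# \Q...\E quotes the interpolated text, so "." must now match a real period
my $dot_literal = "abc" =~ /\Q$literal\E/;

print "cool found: ",        ($found       ? "yes" : "no"), "\n";
print "a.c as pattern: ",    ($dot_wild    ? "yes" : "no"), "\n";
print "a.c quoted with \\Q: ", ($dot_literal ? "yes" : "no"), "\n";
```

The first two lines should report "yes" and the last one "no".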
7
Anchors
^ anchors the pattern to the beginning of the string
$ anchors to the end of the string (or just before a trailing newline)
"Speaker" =~ /^peak/
Will not match
"Rabbit" =~ /bit$/
Will match
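The two slide examples can be run directly. The last two checks are additions for illustration: anchoring both ends, and the behaviour of $ in front of a trailing newline.

```perl
use strict;
use warnings;

# "peak" occurs in "Speaker", but not at the very beginning
my $speaker = "Speaker" =~ /^peak/;

# "bit" sits at the very end of "Rabbit"
my $rabbit = "Rabbit" =~ /bit$/;

# Using both anchors requires the whole string to be "bit"
my $exact = "bit" =~ /^bit$/;

# $ also matches just before a newline that ends the string
my $trailing = "Rabbit\n" =~ /bit$/;

print "Speaker =~ /^peak/: ", ($speaker  ? "match" : "no match"), "\n";
print "Rabbit =~ /bit\$/: ",  ($rabbit   ? "match" : "no match"), "\n";
print "bit =~ /^bit\$/: ",    ($exact    ? "match" : "no match"), "\n";
print "with trailing newline: ", ($trailing ? "match" : "no match"), "\n";
```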
8
Character classes
A character class matches any single character listed in [brackets]
/tin[yas]/ will match tiny, tina, and tins
"-" can be used to represent a range
/[a-zA-Z0-9]/ will match a single alphanumeric character
The literal "-" character can be matched if it is the first or last character in the class, e.g. /[-0-9]/
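A minimal sketch of these classes, filtering a few made-up words with grep. The anchors and the + quantifier are added here only so that each test string has to match in full.

```perl
use strict;
use warnings;

# Only words whose fourth letter is y, a, or s pass the filter
my @words = grep { /^tin[yas]$/ } qw(tiny tina tins tint);

# A range inside the class: exactly one alphanumeric character
my $alnum = "x" =~ /^[a-zA-Z0-9]$/;

# A leading "-" in the class is a literal hyphen, so "-42" is accepted
my $signed = "-42" =~ /^[-0-9]+$/;

print "tin[yas] matched: @words\n";
print "x is alphanumeric: ", ($alnum  ? "yes" : "no"), "\n";
print "-42 fits [-0-9]+: ",  ($signed ? "yes" : "no"), "\n";
```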
9
Negated character classes
The "^" character negates a character class when it is the first character inside the brackets
/200[^7]/ will not match 2007 but will match 2008, 200q, etc.
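The slide's pattern can be checked against a few candidate strings. The bare "200" case is an addition, included to show that a negated class still has to consume one character.

```perl
use strict;
use warnings;

my @candidates = ("2007", "2008", "200q", "200");

# [^7] means "one character that is not 7", so "2007" fails, and so does
# "200" because there is no fourth character left to consume
my @hits = grep { /200[^7]/ } @candidates;

print "matched: @hits\n";
```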
10
Shortcut character classes
\d is a digit, equivalent to [0-9] for ASCII text
\s is any whitespace, equivalent to [\ \t\r\n\f] (Perl 5.18 and later also include the vertical tab)
\w is a word character, equivalent to [0-9a-zA-Z_] for ASCII text
\D is any non-digit, equivalent to [^0-9]
\S is any non-whitespace, equivalent to [^\s]
\W is any non-word character, equivalent to [^\w]
The period "." matches any character but "\n"
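The shortcuts combine naturally into small validators. This is a sketch with invented sample data; the + quantifier and split are used for brevity and are not introduced on this slide.

```perl
use strict;
use warnings;

# Four digits, a dash, two digits, a dash, two digits
my $is_date = "2007-05-14" =~ /^\d\d\d\d-\d\d-\d\d$/;

# Runs of any whitespace (spaces or tabs here) act as one field separator
my @fields = split /\s+/, "alpha  beta\tgamma";

# Letters, digits, and underscore are all word characters
my $is_ident = "my_var1" =~ /^\w+$/;

# A string with no digits at all consists only of \D characters
my $no_digits = "abc" =~ /^\D+$/;

# Without the s modifier, "." refuses to match a newline
my $dot_newline = "\n" =~ /./;

print "date ok: ",       ($is_date     ? "yes" : "no"), "\n";
print "fields: ",        scalar(@fields), "\n";
print "identifier ok: ", ($is_ident    ? "yes" : "no"), "\n";
print "no digits: ",     ($no_digits   ? "yes" : "no"), "\n";
print "dot matches newline: ", ($dot_newline ? "yes" : "no"), "\n";
```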
11
Word anchors
The word anchor "\b" matches the boundary between a word character and a non-word character (or the start or end of the string)
/\bpen/ matches "penitentiary", not "open"
/\bpen\b/ only matches "pen" as a whole word, e.g. "this pen is blue"
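Both word-anchor patterns can be compared on the slide's examples. The bare string "pen" is added to the list to show that the start and end of the string also count as boundaries.

```perl
use strict;
use warnings;

my @samples = ("penitentiary", "open", "this pen is blue", "pen");

# \bpen: "pen" must begin a word
my @starts = grep { /\bpen/ } @samples;

# \bpen\b: "pen" must be a complete word on its own
my @whole = grep { /\bpen\b/ } @samples;

print "starts a word: ", join(" | ", @starts), "\n";
print "whole word:    ", join(" | ", @whole),  "\n";
```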
12
Modifiers
Modifiers change the behavior of the engine
// is the default, "." doesn't match newlines
//s causes "." to match newlines
//m lets ^ and $ match at the start and end of each line, not just of the whole string
//i matches case-insensitively
Modifiers can be combined, e.g. //sim
/^car.$/im matches "not a car\nCAR!"
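To see why the slide's last example needs both modifiers, the sketch below runs the same pattern with each combination. The "a\nb" test for the s modifier is an added illustration.

```perl
use strict;
use warnings;

my $text = "not a car\nCAR!";

# No modifiers: ^ can only match at the start of the string ("not...")
my $plain = $text =~ /^car.$/;

# i alone: case no longer matters, but ^ still cannot reach the second line
my $i_only = $text =~ /^car.$/i;

# m alone: ^ can match after the newline, but "CAR" is not "car"
my $m_only = $text =~ /^car.$/m;

# i and m together: matches "CAR!" on the second line
my $both = $text =~ /^car.$/im;

# s lets "." match the newline between "a" and "b"
my $s_off = "a\nb" =~ /a.b/;
my $s_on  = "a\nb" =~ /a.b/s;

printf "%-8s %s\n", "plain:", $plain  ? "match" : "no match";
printf "%-8s %s\n", "i:",     $i_only ? "match" : "no match";
printf "%-8s %s\n", "m:",     $m_only ? "match" : "no match";
printf "%-8s %s\n", "im:",    $both   ? "match" : "no match";
printf "%-8s %s\n", "a.b:",   $s_off  ? "match" : "no match";
printf "%-8s %s\n", "a.b s:", $s_on   ? "match" : "no match";
```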
13
Or
The pipe character "|" can be used to match any one of the given choices
/lumber|wood/ will match "My desk is made of spare lumber" and "My desk is made of 100,000 year old petrified wood"
/0|1|2/ is equivalent to /[0-2]/
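A short sketch of alternation on the slide's sentences, plus a check that the digit alternation and the character class select the same values. The third sentence is invented, and the (?:...) grouping is standard Perl that this part of the deck does not cover; it is used here only so the anchors apply to the whole alternation.

```perl
use strict;
use warnings;

my @desks = (
    "My desk is made of spare lumber",
    "My desk is made of 100,000 year old petrified wood",
    "My desk is made of steel",    # hypothetical non-matching sentence
);

# Either alternative is enough for a match
my @wooden = grep { /lumber|wood/ } @desks;

# (?:...) groups the alternatives so ^ and $ surround all of them
my @by_alternation = grep { /^(?:0|1|2)$/ } 0 .. 5;
my @by_class       = grep { /^[0-2]$/ }     0 .. 5;

print "wooden desks: ", scalar(@wooden), "\n";
print "alternation:  @by_alternation\n";
print "class:        @by_class\n";
```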
14
A blank slide