Published by Asher Blair. Modified over 9 years ago.
1
Pattern Matching: Simple Patterns
2
Introduction
Programmers often need to scan a file, directory, etc. for a specific substring.
–Find all files that begin with “A”.
–Find all files that end in “txt”.
This capability is provided by a variety of tools.
–e.g. egrep, grep, awk
It is useful to include this functionality in a programming language.
3
Perl’s Pattern Matcher
Perl has a built-in pattern matcher.
–Motivation: system administrators frequently use regular expressions. They also use Perl.
Syntax is borrowed from the grep utility in Unix.
Based on regular expressions from computer science.
4
Perl’s Pattern Matcher (cont.)
Operates over a single string ($_ by default, or the string bound with =~).
Contexts:
–Scalar: Returns true or false.
–List: Matching substrings are returned in a list.
The syntax is: m dl pattern dl [modifiers], where dl is a delimiter character.
Slash (/) is the most common delimiter.
–With slashes, the m operator is unnecessary: /pattern/
Other delimiters can be used with m: m~pattern~
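The two contexts and the alternative delimiter can be sketched as follows. The sample string and variable names are made up for illustration, and the parentheses used to capture substrings are an assumption here, since the slides only introduce them later as grouping.

```perl
use strict;
use warnings;

# Hypothetical sample string, not from the slides.
my $line = "error 404 at 10:32";

# Scalar context: the match yields true or false.
my $found = ($line =~ /error/) ? 1 : 0;

# List context: the substrings matched by the parenthesized
# parts of the pattern are returned as a list.
my ($hour, $minute) = $line =~ /(\d\d):(\d\d)/;

# With an explicit m, another delimiter such as ~ can be used.
my $tilde = ($line =~ m~404~) ? 1 : 0;

print "found=$found hour=$hour minute=$minute tilde=$tilde\n";
```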
5
Simple Patterns
Simple patterns match individual characters or character classes.
A pattern is an abstract representation of a set of strings.
A pattern “matches” when the string it’s compared with is in the set.
Matching is done from left to right.
6
Three Categories of Characters
Normal characters:
–Match themselves.
–Include escape sequences, e.g. \t, \cC
Metacharacters:
–Have special meanings in patterns.
–\ | ( ) [ ] { } ^ $ * + ?
–Must be preceded by a backslash to match literally.
Period:
–Matches any character except newline.
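A minimal sketch of the three categories, using made-up test strings. It shows a normal character sequence, a metacharacter with and without its escaping backslash, and the period.

```perl
use strict;
use warnings;

# Normal characters match themselves.
my $normal = ("cat" =~ /cat/) ? 1 : 0;

# A metacharacter is escaped with a backslash to match literally.
my $literal_plus = ("2+2" =~ /2\+2/) ? 1 : 0;

# Unescaped, + is a quantifier: /2+2/ means "one or more 2s, then a 2",
# which the string "2+2" does not contain.
my $quantified = ("2+2" =~ /2+2/) ? 1 : 0;

# The period matches any character except newline.
my $dot_any     = ("a-b"  =~ /a.b/) ? 1 : 0;
my $dot_newline = ("a\nb" =~ /a.b/) ? 1 : 0;

print "$normal $literal_plus $quantified $dot_any $dot_newline\n";
```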
7
An Example
$_ = "It's snowing today.";
if (/snow/) {
    print "There was snow somewhere in $_\n";
} else {
    print "$_ was snowless\n";
}
8
Character Classes
Character classes specify collections of characters in patterns.
Defined by placing the set in [ ]
–e.g. /[<>=]/
Dashes are used to specify ranges of characters:
–/[A-Za-z]/
–/[0-7]/
–/[0-3-]/ (a dash at the end of the class is a literal dash)
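The classes above can be tried out with a short sketch. The input strings are invented for illustration, and grep is used only as a convenient way to filter a list by a pattern.

```perl
use strict;
use warnings;

# A class matches exactly one character from the set.
my $has_comparison = ("a >= b" =~ /[<>=]/) ? 1 : 0;

# A range: /[0-7]/ accepts one octal digit.
my @octal = grep { /[0-7]/ } ("5", "8", "0");

# A dash placed last in the class is a literal dash, not a range.
my @low_or_dash = grep { /[0-3-]/ } ("7", "2", "-", "x");

print "comparison=$has_comparison octal=@octal low_or_dash=@low_or_dash\n";
```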
9
Exclusion From a Class
Characters can be excluded from a class by placing a caret (^) first inside the brackets.
The class then matches any character except the specified ones.
For example:
–/[^A-Za-z]/
–/[^01]/
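One common use of an excluded class is validation: if a string contains any character outside the allowed set, it is rejected. A small sketch with invented inputs:

```perl
use strict;
use warnings;

# /[^A-Za-z]/ succeeds as soon as one non-letter is found.
my $has_nonletter = ("abc1" =~ /[^A-Za-z]/) ? 1 : 0;

# No character outside A-Za-z, so the match fails.
my $all_letters = ("abc" =~ /[^A-Za-z]/) ? 0 : 1;

# /[^01]/ finds any character that is not a binary digit.
my $is_binary  = ("0110" =~ /[^01]/) ? 0 : 1;
my $not_binary = ("0120" =~ /[^01]/) ? 1 : 0;

print "$has_nonletter $all_letters $is_binary $not_binary\n";
```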
10
Useful Abbreviations
Abbreviation   Pattern          Matches
\d             [0-9]            A digit
\D             [^0-9]           A nondigit
\w             [A-Za-z0-9_]     A word char
\W             [^A-Za-z0-9_]    A nonword char
\s             [ \r\t\n\f]      A white-space char
\S             [^ \r\t\n\f]     A non-white-space char
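A short sketch of the abbreviations on an invented string. The g modifier, which is not covered in these slides, is assumed here so that a list-context match returns every match rather than only the first.

```perl
use strict;
use warnings;

# Hypothetical sample: a space, a tab, digits, and an underscore.
my $text = "Room 42\tis_open";

my @digits  = $text =~ /\d/g;    # each digit: 4, 2
my @words   = $text =~ /\w+/g;   # runs of word chars: Room, 42, is_open
my @blanks  = $text =~ /\s/g;    # the space and the tab
my @nonword = $text =~ /\W/g;    # here, the same two white-space chars

# Digits count as word characters.
my $digit_is_word = ("7" =~ /\w/) ? 1 : 0;

print scalar(@digits), " digits, ", scalar(@words), " words, ",
      scalar(@blanks), " blanks\n";
```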
11
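Some Examples
–/[A-Z]"\s/ (an uppercase letter, a double quote, then a white-space char)
–/[\dA-Fa-f]/ (a hex digit)
–/\w\w:\d\d/ (two word chars, a colon, then two digits)
–/0x\d/ (0x followed by a digit)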
12
Variables in Patterns
A variable in a pattern is interpolated.
For example,
$hexpat = "\\s[\\dA-Fa-f]\\s";
if (/$hexpat/) {
    print "$_ has a hex digit.\n";
}
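Because a double-quoted string processes backslashes before the pattern matcher sees them, every backslash meant for the pattern has to be doubled. The sketch below compares that form with qr//, which is not mentioned in the slides but is a standard Perl operator that avoids the doubling. The input lines are invented.

```perl
use strict;
use warnings;

# In a double-quoted string each pattern backslash is written twice.
my $from_string = "\\s[\\dA-Fa-f]\\s";

# qr// builds the pattern directly, so no doubling is needed.
my $precompiled = qr/\s[\dA-Fa-f]\s/;

# Hypothetical input lines.
my @lines = ("x = f ;", "x = g ;", "size 7 ok");

# Both variables are interpolated into the match.
my @via_string = grep { /$from_string/ } @lines;
my @via_qr     = grep { /$precompiled/ } @lines;

print "$_ has a hex digit.\n" for @via_string;
```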
13
Quantifiers
Quantifiers can make a pattern more powerful.
A quantifier allows a pattern to be repeated a specified number of times.
Perl has four kinds of quantifier:
–*, +, ?, {m,n}
A quantifier immediately follows the pattern it quantifies.
14
{m,n}
{n} – exactly n repetitions.
{m,} – at least m repetitions.
{m,n} – at least m, but not more than n repetitions.
15
{m,n} Examples
/a{1,3}b/ - ab, aab, aaab
/ab{3}c/ - abbbc
/ab{2,}c/ - abbc, abbbc, abbbbc, …
/c{3} z{5}/ - ccc zzzzz
/[abc]{1,2}/ - a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc
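The bounds can be checked with a small sketch. The anchors ^ and $ (listed among the metacharacters but not explained in these slides) are assumed here to force the pattern to cover the whole string. Without them a pattern may match anywhere inside a longer string, as the second filter shows.

```perl
use strict;
use warnings;

my @candidates = ("b", "ab", "aab", "aaab", "aaaab");

# Anchored: the whole string must be one to three a's, then b.
my @exact = grep { /^a{1,3}b$/ } @candidates;

# Unanchored: "aaaab" also matches, because it contains "aaab".
my @anywhere = grep { /a{1,3}b/ } @candidates;

# {2,} has a lower bound only.
my @two_or_more = grep { /^ab{2,}c$/ } ("abc", "abbc", "abbbbc");

print "exact: @exact\nanywhere: @anywhere\ntwo or more: @two_or_more\n";
```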
16
Asterisk (*)
(*) means zero or more repetitions.
Equivalent to {0,}
For example,
–/0\d\d*/
–/\w\w*/
–/bob.*cat/
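A brief sketch of the asterisk on invented strings. The zero-repetition case is the one that is easy to overlook: /bob.*cat/ accepts "bobcat" with nothing in between. The ^ and $ anchors in the second filter are an addition not covered in the slides.

```perl
use strict;
use warnings;

# .* may match zero characters, so "bobcat" qualifies.
my @bob_cat = grep { /bob.*cat/ } ("bobcat", "bob the cat", "cat bob");

# A 0, one required digit, then zero or more further digits.
my @zero_led = grep { /^0\d\d*$/ } ("0", "07", "0755", "755");

print "bob..cat: ", join(", ", @bob_cat), "\n";
print "zero-led: @zero_led\n";
```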
17
Plus (+)
(+) means one or more repetitions.
Equivalent to {1,}
For example,
–/\w+/
–/[A-Za-z][A-Za-z\d_]+/
–/\d+\.\d+/
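A sketch of the plus quantifier on invented inputs. Note that /[A-Za-z][A-Za-z\d_]+/ requires at least two characters, because the second class must match at least once. The anchors and the capturing parentheses are additions not covered at this point in the slides.

```perl
use strict;
use warnings;

# Pull a decimal number out of a longer string.
my ($number) = "pi is about 3.14159" =~ /(\d+\.\d+)/;

# Digits are required on both sides of the period.
my @decimals = grep { /^\d+\.\d+$/ } ("3.14", "3.", ".5", "10.0");

# A letter followed by one or more letters, digits, or underscores.
my @idents = grep { /^[A-Za-z][A-Za-z\d_]+$/ } ("x", "x1", "total_sum", "9lives");

print "number=$number decimals=@decimals idents=@idents\n";
```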
18
Question Mark (?)
(?) means either zero or one.
Equivalent to {0,1}.
For example,
–/\d+\.?/
–/\$?\d+\.\d\d/
–/"?\w+"?/
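A sketch of the optional element, with invented inputs. The sample strings are single-quoted so that Perl does not try to interpolate $4 as a variable. The capturing parentheses and anchors are additions not covered in the slides.

```perl
use strict;
use warnings;

# The dollar sign is optional, so both prices are found.
my ($with)    = 'cost: $4.99' =~ /(\$?\d+\.\d\d)/;
my ($without) = 'cost: 4.99'  =~ /(\$?\d+\.\d\d)/;

# Each quote may appear at most once on either side.
my @quoted = grep { /^"?\w+"?$/ } ('"perl"', 'perl', '"perl', '""perl"');

print "with=$with without=$without quoted=", scalar(@quoted), "\n";
```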
19
Subpatterns
A quantifier modifies only the item immediately before it, which by default is a single character.
–e.g. /ball*/ matches bal, ball, balll, …
() can be used to group parts of patterns.
The quantifier then modifies the whole group.
For example,
–/(ball)*/
–/(boo! ){3}/
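The difference between quantifying one character and quantifying a group can be seen in a short sketch. The inputs are invented, and the anchors ^ and $ are an addition used to make the whole string count. An unanchored /(ball)*/ would match any string at all, since zero repetitions are allowed.

```perl
use strict;
use warnings;

# * applies to the final l only: "bal" plus zero or more l's.
my @last_char = grep { /^ball*$/ } ("ba", "bal", "ball", "balll", "ballball");

# * applies to the whole group: zero or more copies of "ball".
my @whole_group = grep { /^(ball)*$/ } ("", "ball", "ballball", "balll");

# Exactly three copies of "boo! " (with the trailing space).
my $three = ("boo! boo! boo! " =~ /(boo! ){3}/) ? 1 : 0;
my $two   = ("boo! boo! "      =~ /(boo! ){3}/) ? 1 : 0;

print "last char: @last_char\n";
print "whole group: ", scalar(@whole_group), " matches\n";
print "three=$three two=$two\n";
```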
20
Alternation
(|) is the logical OR operator in a pattern.
/a|e|i|o|u/ is equivalent to /[aeiou]/
For example,
–/(Bob|Tom|Pussy|Scaredy)cat/
–/t(oo?|wo)/
Be careful!
–/Tom|Tommie/ (alternatives are tried left to right, so Tom always wins and Tommie is never matched in full)
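The "be careful" case can be made concrete. In this sketch the parentheses capture whichever alternative matched, which shows that putting the longer alternative first changes the result. The test strings are invented.

```perl
use strict;
use warnings;

# Alternatives are tried left to right; Tom succeeds first.
my ($first) = "Tommie" =~ /(Tom|Tommie)/;

# Listing the longer alternative first lets it match in full.
my ($longest) = "Tommie" =~ /(Tommie|Tom)/;

# /t(oo?|wo)/ covers to, too, and two.
my @t_words = grep { /^t(oo?|wo)$/ } qw(to too two tw);

print "first=$first longest=$longest t_words=@t_words\n";
```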
21
Precedence
The precedence of the operators, from highest to lowest, is:
–Parentheses
–Quantifiers
–Character sequence
–Alternation
For example,
–/#|-+/ (a single #, or one or more -)
–/(#|-)+/ (one or more characters, each # or -)
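A sketch of how precedence changes what the two example patterns match, using an invented input string. An extra outer pair of parentheses is added only to capture the matched text.

```perl
use strict;
use warnings;

my $input = "##--";

# + binds to the dash only, so the first alternative matches one #.
my ($loose) = $input =~ /(#|-+)/;

# Grouping first: + repeats the whole (#|-) choice.
my ($grouped) = $input =~ /((#|-)+)/;

print "loose=$loose grouped=$grouped\n";
```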