Published by Bernice Hodge. Modified over 9 years ago.
1
Regular Expressions in Perl
CS/BIO 271 – Introduction to Bioinformatics
2
Types & Regular Expressions
Regular Expressions
  Regular expressions are a powerful tool for matching patterns against strings
  Available in many languages (AWK, sed, Perl, Python, Ruby, C/C++, others)
  Matching strings with regular expressions is usually fast, although patterns that backtrack heavily can be slow
3
RegExp basics
  A regular expression is a pattern that can be compared to a string
  A regular expression is created using the / / delimiters: /^[abc].*f$/
  A regular expression is matched using the =~ (binding) operator
  A regular expression match returns true or false:
    if ($mystring =~ /^[abc].*f$/) { }
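A minimal sketch of the match-and-test idiom from this slide. The sample value in $mystring and the $matched flag are assumptions added for illustration; only the pattern and the if statement come from the slide.

```perl
use strict;
use warnings;

# Hypothetical sample string; any scalar would do.
my $mystring = "abcdef";

# =~ binds the string to the pattern; the result is true or false.
my $matched = 0;
if ($mystring =~ /^[abc].*f$/) {
    $matched = 1;
    print "'$mystring' starts with a, b, or c and ends with f\n";
}
```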
4
String Matching
  Examples of a few simple regular expressions:
    $a = "Fats Waller";
    $a =~ /a/  » 1 (true)
    $a =~ /z/  » "" (empty string, false)
    $a =~ /ll/ » 1 (true)
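The return values can be checked directly by storing each match result in a scalar. This sketch uses $name instead of $a (an assumed rename, since sort() treats $a specially). In Perl a failed match in scalar context gives the empty string, not nil.

```perl
use strict;
use warnings;

my $name = "Fats Waller";

# In scalar context a match yields 1 on success and "" on failure.
my $has_a  = ($name =~ /a/);
my $has_z  = ($name =~ /z/);
my $has_ll = ($name =~ /ll/);

printf "a: '%s', z: '%s', ll: '%s'\n", $has_a, $has_z, $has_ll;
# prints: a: '1', z: '', ll: '1'
```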
5
Regular Expression Patterns
  Most characters match themselves
  Wildcard: . (period) = any character except newline (unless the /s modifier is used)
  Anchors:
    ^ = "start of line"
    $ = "end of line"
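A short sketch of the wildcard and the two anchors, using a made-up word list (the words and variable names are assumptions, not from the slides).

```perl
use strict;
use warnings;

my @words = ("cat", "cut", "scatter", "ct");   # made-up sample words

# . stands for exactly one character, so "ct" cannot match.
my @anywhere = grep { /c.t/ } @words;     # unanchored: may match inside a word

# ^ and $ pin the pattern to the start and end of the string.
my @whole    = grep { /^c.t$/ } @words;   # the whole word must be c, any char, t

print "anywhere: @anywhere\n";   # anywhere: cat cut scatter
print "whole:    @whole\n";      # whole:    cat cut
```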
6
Character Classes
  Character classes appear within [] pairs
  Most special regexp characters (^, $, etc.) are turned off
  Escape sequences (\n etc.) still work
  Examples: [aeiou] [0-9]
  ^ as the first character negates the class
  A literal ] must appear first, and a literal - first or last: []abn-z-]
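As an illustration in a bioinformatics setting, the sketch here uses character classes to check a DNA string. The sequence and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $seq = "ACGTNNAC";   # hypothetical DNA string with ambiguous bases

# [^ACGT] is a negated class: any character that is not A, C, G, or T.
my $has_non_base = ($seq =~ /[^ACGT]/) ? 1 : 0;   # 1, because of the Ns

# [0-9] is a range class.
my $has_digit    = ($seq =~ /[0-9]/) ? 1 : 0;     # 0

# Inside [] the period loses its wildcard meaning.
my $dot_literal  = ("a.b" =~ /a[.]b/ && "axb" !~ /a[.]b/) ? 1 : 0;   # 1

print "$has_non_base $has_digit $dot_literal\n";   # prints: 1 0 1
```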
7
Predefined character classes
  These work inside or outside []'s:
    \d = digit = [0-9]
    \D = non-digit = [^0-9]
    \s = whitespace, \S = non-whitespace
    \w = word character = [a-zA-Z0-9_]
    \W = non-word character
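A small sketch of the predefined classes, both outside and inside brackets. The "chr7 1024" record format is an invented example, not something the slides define.

```perl
use strict;
use warnings;

# Three word characters, a digit, one whitespace character, four digits.
my $pattern = qr/^\w\w\w\d\s\d\d\d\d$/;

my $spaced = ("chr7 1024" =~ $pattern) ? 1 : 0;   # 1
my $joined = ("chr7_1024" =~ $pattern) ? 1 : 0;   # 0: _ is \w, not \s

# \d used inside a bracketed class together with a literal underscore.
my $in_class = ("7" =~ /^[\d_]$/ && "_" =~ /^[\d_]$/) ? 1 : 0;   # 1

print "$spaced $joined $in_class\n";   # prints: 1 0 1
```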
8
Repetition in Regexps
  These quantify the preceding character or class:
    * = zero or more
    + = one or more
    ? = zero or one
    {m,n} = at least m and at most n (no space after the comma)
    {m,} = at least m
  High precedence: a quantifier applies to only one character or class, unless grouped:
    /^ran*$/ vs. /^r(an)*$/
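The difference between the two patterns on this slide can be seen by running both over the same candidates. The test strings are assumptions chosen to separate the two cases.

```perl
use strict;
use warnings;

my @tests = ("r", "ra", "rannn", "ranan");

# * applies only to the n: r, a, then zero or more n.
my @single  = grep { /^ran*$/ } @tests;

# * applies to the group: r, then zero or more copies of "an".
my @grouped = grep { /^r(an)*$/ } @tests;

# {2,3} means at least two and at most three of the preceding A.
my @bounded = grep { /^A{2,3}$/ } ("A", "AA", "AAA", "AAAA");

print "single:  @single\n";    # single:  ra rannn
print "grouped: @grouped\n";   # grouped: r ranan
print "bounded: @bounded\n";   # bounded: AA AAA
```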
9
Alternation
  | is like "or": it matches either the regexp before the | or the one after
  Low precedence: alternates entire regexps unless grouped
    /red ball|angry sky/ has the alternatives "red ball" and "angry sky", not "ball" and "angry"
    /red (ball|angry) sky/ matches "red ball sky" or "red angry sky"
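To see what each form of the alternation actually consumes, this sketch wraps the patterns in capturing parentheses and, for the last check, adds anchors. Both additions are assumptions made for the demonstration.

```perl
use strict;
use warnings;

# Ungrouped: the alternatives are the whole phrases on each side of |.
my ($ungrouped) = "red ball sky" =~ /(red ball|angry sky)/;

# Grouped: only "ball" and "angry" alternate; "red " and " sky" are fixed.
my ($grouped) = "red angry sky" =~ /(red (ball|angry) sky)/;

# Anchored, the ungrouped form cannot cover "red angry sky" as a whole.
my $whole = ("red angry sky" =~ /^(red ball|angry sky)$/) ? 1 : 0;

print "$ungrouped | $grouped | $whole\n";   # prints: red ball | red angry sky | 0
```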
10
Side Effects (Perl Magic)
  After a successful match, some "special" Perl variables are automatically set:
    $& = the part of the string that matched the pattern
    $` = the part of the string before the match
    $' = the part of the string after the match
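A brief sketch of the three match variables after one successful match. The sample sentence and the names $pre, $match and $post are illustrative assumptions. Copying the variables right away guards against a later match overwriting them.

```perl
use strict;
use warnings;

my $text = "The moon is made of cheese";
my ($pre, $match, $post) = ("", "", "");

if ($text =~ /moon/) {
    # $` = text before the match, $& = the match, $' = text after the match
    ($pre, $match, $post) = ($`, $&, $');
}

print "[$pre][$match][$post]\n";   # prints: [The ][moon][ is made of cheese]
```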
11
Side effects and grouping
  When you use ()'s for grouping, Perl assigns the text matched by the first () pair to:
    \1 within the pattern
    $1 outside the pattern
  "mississippi" =~ /^.*(iss)+.*$/ » $1 = "iss"
  /([aeiou][aeiou]).*\1/ matches a pair of vowels that appears again later in the string
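Both uses of the first group can be tried as follows. The strings "moon balloon" and "moon beam" are made-up test inputs, and the variable names are assumptions.

```perl
use strict;
use warnings;

# $1 holds what the first () pair matched once the match succeeds.
my $first_group = "";
if ("mississippi" =~ /^.*(iss)+.*$/) {
    $first_group = $1;
}

# \1 inside the pattern demands the same two vowels again later on.
my $repeat = "";
if ("moon balloon" =~ /([aeiou][aeiou]).*\1/) {
    $repeat = $1;
}

# "oo" and "ea" each occur only once here, so the backreference fails.
my $no_repeat = ("moon beam" =~ /([aeiou][aeiou]).*\1/) ? 1 : 0;

print "$first_group $repeat $no_repeat\n";   # prints: iss oo 0
```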
12
Repetition and greediness
  By default, repetition is greedy, meaning that it matches as many characters as possible
  You can make a repetition modifier non-greedy by adding '?'
  (showRE is a helper, not a Perl built-in; it marks the matched text with << >>)
    $a = "The moon is made of cheese";
    showRE($a, qr/\w+/)            » <<The>> moon is made of cheese
    showRE($a, qr/\s.*\s/)         » The<< moon is made of >>cheese
    showRE($a, qr/\s.*?\s/)        » The<< moon >>is made of cheese
    showRE($a, qr/[aeiou]{2,99}/)  » The m<<oo>>n is made of cheese
    showRE($a, qr/mo?o/)           » The <<moo>>n is made of cheese
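Perl has no built-in showRE, so the sketch here defines a hypothetical stand-in called show_re that wraps the matched text in << >>. The helper name, its "no match" return value, and the use of qr// to pass patterns are all assumptions made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the string with the match marked as <<...>>.
sub show_re {
    my ($str, $re) = @_;
    return "no match" unless $str =~ $re;
    my ($pre, $match, $post) = ($`, $&, $');
    return "$pre<<$match>>$post";
}

my $s = "The moon is made of cheese";

my $word   = show_re($s, qr/\w+/);      # <<The>> moon is made of cheese
my $greedy = show_re($s, qr/\s.*\s/);   # The<< moon is made of >>cheese
my $lazy   = show_re($s, qr/\s.*?\s/);  # The<< moon >>is made of cheese

print "$word\n$greedy\n$lazy\n";
```

The greedy .* runs to the last whitespace character it can reach, while .*? stops at the first one that lets the rest of the pattern succeed.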
13
RegExp Substitutions
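The body of this slide did not survive in the transcript, so the following is only an assumed illustration of Perl's s/pattern/replacement/ operator, not a reconstruction of the original content. The DNA string and the whitespace example are invented.

```perl
use strict;
use warnings;

# Copy the string, then substitute on the copy; /g replaces every match.
my $dna = "ATGCGT";
(my $rna = $dna) =~ s/T/U/g;

# s/// changes the variable in place and returns the number of replacements.
my $line  = "gene   42\t chr7";
my $count = ($line =~ s/\s+/ /g);

print "$rna\n";             # prints: AUGCGU
print "$line ($count)\n";   # prints: gene 42 chr7 (2)
```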
14
Using RegExps
  Repeated regexps with list context and /g
  Single matches
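A hedged sketch of the two usages named on this slide. In list context a match with /g returns every match (or every captured group), and without /g it returns the groups of the single first match. The sequence and the "chr7:1024-2048" string are made-up inputs.

```perl
use strict;
use warnings;

my $seq = "ATGAAATGCCATG";

# Repeated match: list context plus /g collects every occurrence.
my @starts = $seq =~ /ATG/g;
my $n      = scalar @starts;

# With a capture group, /g returns each captured piece instead.
my @numbers = "chr7:1024-2048" =~ /(\d+)/g;

# Single match: no /g, so only the first group match is returned.
my ($first) = "chr7:1024-2048" =~ /(\d+)/;

print "$n\n";         # prints: 3
print "@numbers\n";   # prints: 7 1024 2048
print "$first\n";     # prints: 7
```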