Chapter 8 – Regular Expressions

Outline
8.1 Introduction
8.2 Matching Operator m//
8.3 Substitution Operator s///
8.4 Special Characters and Character Classes
8.5 Alternation
8.6 Quantifiers
8.7 Quantifier Greediness
8.8 Assertions
8.9 Backreferences
8.10 More Regular-Expression Modifiers
8.11 Global Searching and the /g Modifier
8.12 Example: Form Verification
8.13 Internet and World Wide Web Resources

    #!/usr/bin/perl
    # Fig. 8.1: fig08_01.pl
    # Simple matching example.

    use strict;
    use warnings;

    my $string = 'It is winter and there is snow on the roof.';
    my $pattern = 'and';

    print "String is: '$string'\n\n";

    print "Found 'snow'\n" if $string =~ m/snow/;

    print "Found 'SNOW'\n" if $string =~ m/SNOW/;

    print "Found 'on the'\n" if $string =~ m/on the/;

    print "Found '$pattern'\n" if $string =~ m/$pattern/;

    print "Found '$pattern there'\n" if $string =~ m/$pattern there/;

The matching operator takes two operands. The first is the regular expression (or matching pattern) to search for, which is placed between the slashes of the m// operator. The second operand is the string in which to search, which is bound to the match operator using =~.

The first test uses the matching operator, m//, to search for the string snow inside variable $string.

By default, regular expressions are case sensitive. Thus, a similar search for the string SNOW returns false, and the associated print statement does not execute.

Rather than searching for the literal characters "$pattern", the matching operator interpolates the value of $pattern (the string and) into the search pattern.

Output:

    String is: 'It is winter and there is snow on the roof.'

    Found 'snow'
    Found 'on the'
    Found 'and'
    Found 'and there'
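The case-sensitivity point above can be illustrated with a minimal sketch. It assumes the /i (ignore case) modifier, which Fig. 8.1 itself does not use, and the helper name found() is hypothetical, introduced only for this illustration.

```perl
#!/usr/bin/perl
# Sketch: the same search with and without the /i modifier.
use strict;
use warnings;

my $string = 'It is winter and there is snow on the roof.';

# found() is a hypothetical helper: returns 1 if $pattern matches $text,
# 0 otherwise. A true third argument turns on case-insensitive matching.
sub found {
    my ( $text, $pattern, $ignoreCase ) = @_;

    if ( $ignoreCase ) {
        return $text =~ m/$pattern/i ? 1 : 0;
    }

    return $text =~ m/$pattern/ ? 1 : 0;
}

my $exact   = found( $string, 'SNOW', 0 );   # case-sensitive search
my $relaxed = found( $string, 'SNOW', 1 );   # same search with /i

print "m/SNOW/  matched: $exact\n";
print "m/SNOW/i matched: $relaxed\n";
```

The pattern is passed in as a string and interpolated into m//, just as $pattern is in Fig. 8.1.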

    #!/usr/bin/perl
    # Fig. 8.2: fig08_02.pl
    # Substitution example.

    use strict;
    use warnings;

    my $string = "Hello to the world";

    print "The original string is: \"$string\"\n";
    $string =~ s/world/planet/;
    print "s/world/planet/ changes string: $string \n";

    $_ = $string;
    print "The original string is: \"$_\"\n";
    s/planet/world/;
    print "s/planet/world/ changes string: $_ \n";

    print "The original string is: \"$_\"\n";
    s(world)(planet);
    print "s(world)(planet) changes string: $_ \n";

    $string = "This planet is our planet.";
    print "$string\n";
    my $matches = $string =~ s/planet/world/g;
    print "$matches occurrences of planet were changed to world.\n";
    print "The new string is: $string\n";

Place the pattern, denoting the string to be replaced, between the first two slashes of the substitution operator, s///. Between the second two slashes, place the replacement string that will be substituted for whatever the pattern matched.

When no string is bound with =~, the substitution operator works on the string currently stored in $_.

The substitution operator can use delimiters other than /.

The global modifier, /g, at the end of the regular expression causes the substitution operator to replace every occurrence of the first pattern (planet) with the replacement (world). The assignment operator takes the value returned from the s/// operator, which is the number of substitutions made, and assigns it to $matches.

Output:

    The original string is: "Hello to the world"
    s/world/planet/ changes string: Hello to the planet
    The original string is: "Hello to the planet"
    s/planet/world/ changes string: Hello to the world
    The original string is: "Hello to the world"
    s(world)(planet) changes string: Hello to the planet
    This planet is our planet.
    2 occurrences of planet were changed to world.
    The new string is: This world is our world.
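The effect of /g on both the result and the return value of s/// can be seen side by side in a short sketch. The variable names are chosen for illustration only, and braces are used as delimiters in the second substitution to show another alternative to /.

```perl
#!/usr/bin/perl
# Sketch: s/// with and without /g, and the value s///g returns.
use strict;
use warnings;

my $text = 'This planet is our planet.';

# Without /g only the first occurrence is replaced.
( my $first = $text ) =~ s/planet/world/;

# With /g every occurrence is replaced; the operator returns
# the number of substitutions it made.
my $all   = $text;
my $count = ( $all =~ s{planet}{world}g );   # braces as delimiters

print "$first\n";
print "$all\n";
print "$count substitutions\n";
```

Copying the string before substituting keeps $text unchanged, so both variants start from the same input.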

    #!/usr/bin/perl
    # Fig. 8.3: fig08_03.pl
    # Determine if a string has a digit.

    use strict;
    use warnings;

    my $string1 = "hello there";
    my $string2 = "this one has a 2";

    number1( $string1 );
    number1( $string2 );
    number2( $string1 );
    number2( $string2 );

    sub number1
    {
       my $string = shift();

       if ( $string =~ /\d/ ) {
          print "'$string' has a digit.\n";
       }
       else {
          print "'$string' has no digit.\n";
       }
    }

This pattern (\d) is called a special character. It matches any digit.

    sub number2
    {
       my $string = shift();

       if ( $string =~ /[0-9]/ ) {
          print "'$string' has a digit.\n";
       }
       else {
          print "'$string' has no digit.\n";
       }
    }

Brackets ([]) enclose the character class to separate it from the surrounding pattern. Inside the brackets, a dash indicates a range. So, [0-9] matches any digit, like \d.

Output:

    'hello there' has no digit.
    'this one has a 2' has a digit.
    'hello there' has no digit.
    'this one has a 2' has a digit.
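Character classes can hold more than one range and can be negated. The following sketch assumes a hypothetical helper, classify(), that is not part of the figure. It also uses a leading ^ inside the brackets, which matches any character not in the class.

```perl
#!/usr/bin/perl
# Sketch: ranges and negation in character classes.
use strict;
use warnings;

# classify() is a hypothetical helper that lists which classes
# match somewhere in the string.
sub classify {
    my $string = shift;
    my @found;

    push @found, 'digit'     if $string =~ /[0-9]/;    # a range of digits
    push @found, 'non-digit' if $string =~ /[^0-9]/;   # ^ negates the class
    push @found, 'lowercase' if $string =~ /[a-z]/;    # a range of letters

    return join ', ', @found;
}

print classify( '2024' ), "\n";
print classify( 'this one has a 2' ), "\n";
```

Each test only asks whether at least one character of the class occurs anywhere in the string, as number1 and number2 do.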

    #!/usr/bin/perl
    # Fig. 8.5: fig08_05.pl
    # Using alternation.

    use strict;
    use warnings;

    my $string1 = "i think we should stop";
    my $string2 = "lets continue";
    my $string3 = "i don't want to end";

    finish( $string1 );
    finish( $string2 );
    finish( $string3 );

    sub finish
    {
       my $string = shift();
       print "$string\n";

       if ( $string =~ /stop|quit|end/ &&
            $string !~ /not|don't/ ) {
          print "alright, we're finished.\n";
       }
       else {
          print "ok, lets keep going.\n";
       }
    }

The condition searches $string to determine whether it contains one of the strings stop, quit or end, and does not contain not or don't. If both tests succeed, the condition is true and "alright, we're finished." is displayed; otherwise, "ok, lets keep going." is displayed.

Output:

    i think we should stop
    alright, we're finished.
    lets continue
    ok, lets keep going.
    i don't want to end
    ok, lets keep going.

    #!/usr/bin/perl
    # Fig. 8.6: fig08_06.pl
    # Showing the dangers of using alternation without parentheses.

    use strict;
    use warnings;

    my $string1 = "hello";
    my $string2 = "hello there";
    my $string3 = "hi there";

    print "$string1\n$string2\n$string3\n";

    print "watch this:\n";

    print "1: how are you?\n" if ( $string1 =~ m/hello|hi there/ );
    print "2: how are you?\n" if ( $string2 =~ m/hello|hi there/ );
    print "3: how are you?\n" if ( $string3 =~ m/hello|hi there/ );

    print "now watch this:\n";

    print "1: how are you?\n"
       if ( $string1 =~ m/(hello|hi) there/ );
    print "2: how are you?\n"
       if ( $string2 =~ m/(hello|hi) there/ );
    print "3: how are you?\n"
       if ( $string3 =~ m/(hello|hi) there/ );

We want to search for "hello" or "hi", and then "there". However, Perl treats the space in the pattern the same as any other character, so "hi there" is considered one whole string and therefore one complete option for the alternation operator. The first pattern thus matches either "hello" or "hi there".

In the second part of this example, the alternation expression hello|hi is separated from the rest of the pattern with parentheses. This pattern is the one that we wanted to match in the first place.

Output:

    hello
    hello there
    hi there
    watch this:
    1: how are you?
    2: how are you?
    3: how are you?
    now watch this:
    2: how are you?
    3: how are you?
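The difference between the two patterns can also be counted directly. This sketch is not from the figure: it uses grep to collect the matching strings, and it groups with (?:...), the non-capturing form of parentheses, on the assumption that the captured text is not needed.

```perl
#!/usr/bin/perl
# Sketch: alternation with and without grouping.
use strict;
use warnings;

my @greetings = ( 'hello', 'hello there', 'hi there' );

# Without grouping the alternatives are "hello" and "hi there".
my @loose = grep { /hello|hi there/ } @greetings;

# With grouping the alternatives are "hello" and "hi",
# and either one must be followed by " there".
my @grouped = grep { /(?:hello|hi) there/ } @greetings;

print scalar( @loose ),   " strings match without grouping\n";
print scalar( @grouped ), " strings match with grouping\n";
```

The string "hello" on its own satisfies only the ungrouped pattern, which is why the two counts differ.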

    #!/usr/bin/perl
    # Fig. 8.7: fig08_07.pl
    # Some quantifiers.

    use strict;
    use warnings;

    my $string = "11000";

    change1( $string );
    change2( $string );
    change3( $string );

    $string = " ";

    change1( $string );
    change2( $string );
    change3( $string );

    sub change1
    {
       my $string = shift();
       print " Original string: $string\n";
       $string =~ s/1\d*1/22/;
       print "After s/1\\d*1/22/: $string\n\n";
    }

The asterisk (*) quantifier tells the regular-expression engine to match any number of occurrences (including zero) of the preceding pattern.

    sub change2
    {
       my $string = shift();
       print " Original string: $string\n";
       $string =~ s/1\d+1/22/;
       print "After s/1\\d+1/22/: $string\n\n";
    }

    sub change3
    {
       my $string = shift();
       print " Original string: $string\n";
       $string =~ s/1\d?1/22/;
       print "After s/1\\d?1/22/: $string\n\n";
    }

The plus (+) quantifier tells the engine to match one or more instances of a pattern.

The question mark (?) quantifier tells the engine to match 0 or 1 instances of a pattern.

Output:

     Original string: 11000
    After s/1\d*1/22/: 22000

     Original string: 11000
    After s/1\d+1/22/: 11000

     Original string: 11000
    After s/1\d?1/22/: 22000

     Original string:
    After s/1\d*1/22/: 22

     Original string:
    After s/1\d+1/22/: 22

     Original string:
    After s/1\d?1/22/:
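To see exactly which text each quantifier consumes, the match itself can be captured instead of replaced. The sketch below assumes a hypothetical helper, matched(), and its own test strings (11000 and 1001). It also includes a counted quantifier, {2}, which the figure does not use.

```perl
#!/usr/bin/perl
# Sketch: what *, +, ? and {n} actually match.
use strict;
use warnings;

# matched() is a hypothetical helper: returns the text the pattern
# matched, or undef if the pattern does not match at all.
sub matched {
    my ( $string, $pattern ) = @_;
    return $string =~ /($pattern)/ ? $1 : undef;
}

my @tests = (
    [ '11000', '1\d*1'   ],   # zero or more digits between the 1s
    [ '11000', '1\d+1'   ],   # at least one digit between the 1s
    [ '1001',  '1\d?1'   ],   # at most one digit between the 1s
    [ '1001',  '1\d{2}1' ],   # exactly two digits between the 1s
);

for my $test ( @tests ) {
    my ( $string, $pattern ) = @$test;
    my $hit = matched( $string, $pattern );
    printf "%-8s on %-6s: %s\n", $pattern, $string,
        defined $hit ? "matched '$hit'" : 'no match';
}
```

Only the first and last tests match: 1\d*1 finds "11" at the start of 11000, and 1\d{2}1 matches all of 1001.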

    #!/usr/bin/perl
    # Fig. 8.9: fig08_09.pl
    # Greedy and non-greedy quantifiers.

    use strict;
    use warnings;

    my $string1 =
       "Hello there. Nothing here. There could be something here.";
    my $string2 = $string1;

    print "$string1\n";
    $string1 =~ s/N.*here\.//;
    print "$string1\n";
    print "$string2\n";
    $string2 =~ s/N.*?here\.//;
    print "$string2\n\n";

When the quantifier is greedy (no ? after the quantifier), the dot matches as many characters as it possibly can, leaving the here to match at the end of the third sentence.

When the quantifier is not greedy (.*?), the dot matches as few characters as possible, leaving the here to match at the end of the second sentence.

Output:

    Hello there. Nothing here. There could be something here.
    Hello there. 
    Hello there. Nothing here. There could be something here.
    Hello there.  There could be something here.
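Capturing the matched text, rather than deleting it, makes the difference between the two quantifiers visible. This is a minimal sketch on the same sentence; the variable names $greedy and $lazy are chosen here for illustration.

```perl
#!/usr/bin/perl
# Sketch: capture what .* and .*? match in the same string.
use strict;
use warnings;

my $string = 'Hello there. Nothing here. There could be something here.';

# In list context a match returns the captured substrings.
my ( $greedy ) = $string =~ /(N.*here\.)/;    # runs to the last "here."
my ( $lazy )   = $string =~ /(N.*?here\.)/;   # stops at the first "here."

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

Both patterns start matching at the N of Nothing; they differ only in how far the dot is allowed to run.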

    #!/usr/bin/perl
    # Fig. 8.10: fig08_10.pl
    # Testing the look-behind assertion.

    use strict;
    use warnings;

    my $string1 = "i be hungry.";
    my $string2 = "we be here.";
    my $string3 = "he be where?";

    conjugate( $string1 );
    conjugate( $string2 );
    conjugate( $string3 );

    sub conjugate
    {
       my $string = shift;
       print "$string\n";
       $string =~ s/(?<=i )be/am/;
       $string =~ s/(?<=we )be/are/;
       $string =~ s/(?<=he )be/is/;
       print "$string\n";
    }

The look-behind assertion takes the form (?<=value1)value2, where we check whether value1 occurs right before value2. The text matched by the assertion is not part of the match itself, so it is not replaced.

The first look-behind assertion, (?<=i ), tests whether "i " (the letter i followed by a space) occurs right before "be". If so, "be" is replaced with "am".

Output:

    i be hungry.
    i am hungry.
    we be here.
    we are here.
    he be where?
    he is where?
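Because a look-behind assertion does not consume text, the replacement does not have to put the preceding word back. The sketch below compares it with an ordinary capturing group that must be reinserted with $1. The sample sentence and variable names are assumptions made for this illustration.

```perl
#!/usr/bin/perl
# Sketch: look-behind versus capture-and-reinsert.
use strict;
use warnings;

my $sentence = 'i think we be here.';

# Look-behind: "we " must precede "be" but is not part of the match.
( my $lookBehind = $sentence ) =~ s/(?<=we )be/are/;

# Capturing group: "we " is part of the match, so the
# replacement has to write it back using $1.
( my $captured = $sentence ) =~ s/(we )be/${1}are/;

print "$lookBehind\n";
print "$captured\n";
```

Both substitutions produce the same sentence; the look-behind version simply keeps the replacement text shorter.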

    #!/usr/bin/perl
    # Fig. 8.11: fig08_11.pl
    # Using backreferencing to find palindromes.

    use strict;
    use warnings;

    my $string1 = "madam im adam";
    my $string2 = "the motto means something";
    my $string3 = "no palindrome here";

    findPalindrome( $string1 );
    findPalindrome( $string2 );
    findPalindrome( $string3 );

    sub findPalindrome
    {
       my $string = shift();

       if ( $string =~ /(\w)\W*(\w)\W*(\w)\W*(\w)\W*\4\W*\3\W*\2\W*\1/ or
            $string =~ /(\w)\W*(\w)\W*(\w)\W*(\w)\W*\3\W*\2\W*\1/ ) {
          print "$string - ",
                "has a palindrome of at least 7 characters.\n";
       }
       else {
          print "$string - has no long palindromes.\n";
       }
    }

This expression (\4) is called a backreference. In regular expressions, parentheses capture bits of a string that can be referenced later in the pattern with a \ followed by a number that indicates the set of parentheses that captured the value.

The first pattern looks for an eight-character palindrome (four characters followed by the same four in reverse order); the second looks for a seven-character palindrome, in which the fourth character stands alone in the middle. The \W* between the characters allows spaces and punctuation to be skipped.

Output:

    madam im adam - has a palindrome of at least 7 characters.
    the motto means something - has a palindrome of at least 7 characters.
    no palindrome here - has no long palindromes.
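A smaller use of backreferences is finding a word that is accidentally repeated. This is a sketch under assumptions, not part of the figure: the sample text is invented, and the pattern uses the \b word-boundary assertion so that only whole words are compared.

```perl
#!/usr/bin/perl
# Sketch: using \1 to find doubled words.
use strict;
use warnings;

my $text = 'this is is a test of the the pattern';
my @doubled;

# (\w+) captures a word; \1 requires the same word to appear
# again after some whitespace. \b anchors at word boundaries.
while ( $text =~ /\b(\w+)\s+\1\b/g ) {
    push @doubled, $1;
}

print "doubled word: $_\n" for @doubled;
```

Without the leading \b, the "is" at the end of "this" could pair with the following "is" and produce a false report.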

    #!/usr/bin/perl
    # Fig. 8.12: fig08_12.pl
    # Capitalize all sentences.

    use strict;
    use warnings;

    my $string1 = "lets see. there should be two things capitalized.";
    my $string2 = "This string is fine.";
    my $string3 = "this could use some work. what needs to be fixed?";
    my $string4 = "yes! another string to be capitalized.";
    my $string5 = "all done? yes.";

    capitalize( $string1 );
    capitalize( $string2 );
    capitalize( $string3 );
    capitalize( $string4 );
    capitalize( $string5 );

    sub capitalize
    {
       my $string = shift();
       print "$string\n";
       $string =~ s/(([.!?]|\A)\s*)([a-z])/$1\u$3/g;
       print "$string\n";
    }

This regular expression searches for the places where a letter might need to be capitalized (the information that gets captured in $1), finds the letter to be capitalized (stored in $3) and capitalizes it (using \u).

The character class [.!?] tells the regular-expression engine to search for a period, an exclamation point or a question mark. The character class is alternated with \A, which matches the beginning of a string.

Next, we have the character class [a-z], which matches any lowercase letter. So, the whole pattern tells the engine to find a sentence-ending punctuation mark or the start of the string, followed by any amount of whitespace, followed by a lowercase letter.

The first part is captured in $1 and the letter in $3. The punctuation or \A is captured in $2 and, because that group is nested inside the first set of parentheses, it is also part of $1.

Output:

    lets see. there should be two things capitalized.
    Lets see. There should be two things capitalized.
    This string is fine.
    This string is fine.
    this could use some work. what needs to be fixed?
    This could use some work. What needs to be fixed?
    yes! another string to be capitalized.
    Yes! Another string to be capitalized.
    all done? yes.
    All done? Yes.
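The \u escape used in the replacement also works in ordinary double-quoted strings, where it can be combined with \L. The following sketch uses an invented sample name and compares the escapes with the built-in functions ucfirst and lc, which give the same result.

```perl
#!/usr/bin/perl
# Sketch: case-changing escapes in interpolated strings.
use strict;
use warnings;

my $name = 'mCDONALD';

# \L lowercases what follows; \u uppercases the next character.
my $viaEscapes = "\L\u$name";

# The same result using built-in functions.
my $viaFunctions = ucfirst lc $name;

print "$viaEscapes\n";
print "$viaFunctions\n";
```

Either form is a reasonable way to normalize a name typed in mixed case.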

    #!/usr/bin/perl
    # Fig. 8.13: fig08_13.pl
    # Using the x modifier.

    use strict;
    use warnings;

    my $string = "hello there. i am looking for a talking dog.";

    print "$string\n";

    $string =~ s/            # start the pattern
                 talking     # match talking
                 \           # here is a space
                 dog\.       # and then dog and a period
                /what?/x;    # replace it with 'what?'
    print "$string\n";

The /x modifier allows the programmer to add comments and extra whitespace to a pattern in the program's source code. Because whitespace in the pattern is ignored under /x, a literal space must be escaped with a backslash.

The substitution pattern is split over multiple lines. This format allows a programmer to use comments in the middle of a regular expression to explain complicated matching patterns.

Output:

    hello there. i am looking for a talking dog.
    hello there. i am looking for a what?
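The same layout helps with validation patterns. As a sketch, here is a time pattern in HH:MM:SS form written with /x; the helper name parseTime() is hypothetical and the comments are one possible way of annotating it.

```perl
#!/usr/bin/perl
# Sketch: a commented time pattern using the x modifier.
use strict;
use warnings;

# parseTime() is a hypothetical helper: returns the hours, minutes
# and seconds on success, or an empty list if the string is invalid.
sub parseTime {
    my $time = shift;

    my @parts = $time =~ /
        ^ ( 1[012] | [1-9] )   # hours 1 through 12
        : ( [0-5]\d )          # minutes 00 through 59
        : ( [0-5]\d )          # seconds 00 through 59
        $                      # nothing may follow
    /x;

    return @parts;
}

for my $candidate ( '9:05:30', '13:00:00' ) {
    my @parts = parseTime( $candidate );
    print "$candidate: ", ( @parts ? "@parts" : 'invalid' ), "\n";
}
```

Note that the comments inside the pattern must not contain the delimiter character, because the first unescaped / ends the pattern.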

    #!/usr/bin/perl
    # Fig. 8.14: fig08_14.pl
    # Search Perl code for variables.

    use strict;
    use warnings;

    my $string = '$one $six

    findScalar( $string );
    findArray( $string );

    sub findScalar
    {
       my $string = shift();

       while ( $string =~ m/\$(\w+)/g ) {
          print "scalar name: $1\n";
       }

       print "\n";
    }

This regular expression looks for a dollar sign (which needs to be escaped in the pattern) followed by one or more word characters.

In scalar context, the /g modifier makes each match begin where the previous successful match ended. Each time the loop executes, the matching operator therefore finds the next substring that matches, and the loop ends when no further match is found.

    sub findArray
    {
       my $string = shift();

       while ( $string =~ m/\@(\w+)/g ) {
          print "array name: $1\n";
       }

       print "\n";
    }

This regular expression looks for an at sign (@) followed by one or more word characters.

Output:

    scalar name: one
    scalar name: two
    scalar name: four
    scalar name: six
    scalar name: seven

    array name: three
    array name: five
    array name: eight
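The behavior of /g depends on context. The sketch below uses an invented line of Perl code as input. It shows that in list context m//g returns every capture at once, while in a scalar-context loop the built-in pos function reports where the next search will start.

```perl
#!/usr/bin/perl
# Sketch: m//g in list context and in a scalar-context loop.
use strict;
use warnings;

my $code = 'my $total = $price * $quantity;';

# List context: all captured names are returned in one step.
my @names = $code =~ /\$(\w+)/g;

# Scalar context: one match per iteration; pos() gives the
# offset just past the most recent match.
my @positions;
while ( $code =~ /\$(\w+)/g ) {
    push @positions, pos( $code );
}

print "names: @names\n";
print "positions: @positions\n";
```

The list-context form is convenient when only the matched names are needed; the loop form is useful when something must be done at each match.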

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

    <!-- A simple HTML form -->

    <html>
       <head>
          <title>form page</title>
       </head>

       <body>
          <p>here's my test form</p>
          <form method = "post" action = "/cgi-bin/fig08_16.pl">

             <p>First name:
             <input name = "firstName" type = "text" size = "20"></p>

             <p>Last name:
             <input name = "lastName" type = "text" size = "20"></p>

             <p>Phone number:
             <input name = "phone" type = "text" size = "20"></p>

             <p>Date (MM/DD/YY):
             <input name = "date" type = "text" size = "20"></p>

             <p>Time (HH:MM:SS):
             <input name = "time" type = "text" size = "20"></p>

             <input type = "submit" value = "submit">
             <input type = "reset" value = "reset">

          </form>
       </body>

    </html>

The form element specifies the form's method as POST, and the action is to run the Perl script fig08_16.pl, a CGI script that processes the information sent from the form to the Web server.

The last two input elements specify a submit and a reset button for the form.

    #!/usr/bin/perl
    # Fig. 8.16: fig08_16.pl
    # Form processing CGI program.

    use strict;
    use warnings;
    use CGI ':standard';

    my $firstName = param( "firstName" );
    my $lastName = param( "lastName" );
    my $phone = param( "phone" );
    my $date = param( "date" );
    my $time = param( "time" );

    print header();
    print start_html( -title => "form page" );

    if ( $firstName =~ /^\w+$/ ) {
       print "<p>Hello there \L\u$firstName.</p>";
    }

    if ( $lastName =~ /^\w+$/ ) {
       print "<p>Hello there Mr./Ms. \L\u$lastName.</p>";
    }

The parameters from the Web page are stored in variables that are used later in the code to formulate the part of the Web page that will be returned to the client.

The calls to header and start_html begin the document that will be returned to the client.

The condition in each if structure is true if the entire string consists of one or more word characters (the match must span the whole string because of the ^ and $ assertions).

The \L in the print statement puts the string that follows in lowercase letters, and the \u makes the first letter of that string uppercase.

    if ( $phone =~ /^        # beginning of line
          (?:1-?)?           # optional 1-
          (?:                # start alternate
             \(              # left paren
             (\d{3})         # capture three digits
             \)              # right paren
             |               # or
             (\d{3})         # capture three digits
          )                  # end alternate
          -?                 # optional dash
          (\d{3})            # capture three more digits
          -?                 # optional dash
          (\d{4})            # capture the final four digits
          $/x )              # end of line, with x modifier
    {
       print "<p>Your phone number is ", $1 || $2 , " - $3 - $4.</p>";
    }

    if ( $date =~ m#^(1[012]|0?[1-9])/([012]?\d|3[01])/(\d\d)$# ) {
       print "<p>The date is $1 / $2 / $3.</p>";
    }

    if ( $time =~ m#^(1[012]|[1-9]):([0-5]\d):([0-5]\d)$# ) {
       print "<p>The time is $1 : $2 : $3.</p>";
    }

    print end_html();

Phone number. We use ?: so that the value in a set of parentheses is not captured. The ?: does not apply to the nested parentheses around each \d{3} inside the alternation, which still capture. The first part of the alternation captures three digits and stores them in $1 if the three digits are in parentheses. Otherwise, an attempt is made to match the other case, where the first three digits are captured and stored in $2. Each -? checks for a dash, which the user may or may not enter. The next two groups capture the following three digits and the last four digits of the phone number, storing them in $3 and $4, respectively. The print statement formats and outputs the area code ($1 or $2, whichever was set) and the phone number.

Date. For the month, the pattern first checks for a one followed by one of the digits 0, 1 or 2 (i.e., months 10, 11 and 12: October, November and December). If one of these numbers is found, its value is stored in $1. Otherwise, the pattern determines whether the user input an optional 0 followed by a digit from 1 through 9, denoting the first nine months of the year; that result is also stored in $1. For the day of the month, the ? allows at most one leading digit, which must be 0, 1 or 2 and which may not occur at all; the day is stored in $2. For the year, we store two digits (\d\d) in $3. The print statement formats and outputs the date.

Time. The final if structure works similarly to parse the time and format it for output.
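The date check can be tried outside a CGI program with a small sketch. The helper name parseDate() and the sample inputs are assumptions for illustration; the pattern is the one used in Fig. 8.16.

```perl
#!/usr/bin/perl
# Sketch: exercising the date pattern from Fig. 8.16 on sample input.
use strict;
use warnings;

# parseDate() is a hypothetical helper: returns the formatted date
# if the string looks like MM/DD/YY, or undef otherwise.
sub parseDate {
    my $date = shift;

    if ( $date =~ m#^(1[012]|0?[1-9])/([012]?\d|3[01])/(\d\d)$# ) {
        return "$1 / $2 / $3";
    }

    return undef;
}

for my $candidate ( '12/25/99', '7/4/76', '13/01/00', '01/00/99' ) {
    my $result = parseDate( $candidate );
    print "$candidate: ", ( defined $result ? $result : 'rejected' ), "\n";
}
```

The last input shows a limit of checking dates by pattern alone: the day part [012]?\d also accepts 00, so a form that needs real calendar validation would have to test the numeric values after the match.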
