
1 Regular Expressions CISC/QCSE 810

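2 Recognizing Matching Strings
ls *.exe translates to "any set of characters, followed by the exact string '.exe'".
The "*.exe" is a pattern in the same spirit as a regular expression (strictly it is a shell wildcard; as a regexp the same idea is written ".*\.exe").
Conceptually, ls gets a list of all files and then returns only those that match the pattern "*.exe" (in practice the shell expands the wildcard before ls runs).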
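To make the comparison concrete, here is a minimal sketch of the same selection done with a Perl regular expression. The file names and the helper name exe_files are made up for illustration; nothing here comes from the original slides.

```perl
use strict;
use warnings;

# Hypothetical file names, standing in for a directory listing.
my @files = ('setup.exe', 'notes.txt', 'run.exe', 'exe.dat');

# Return only the names that end in the literal string ".exe".
# ".*" is "any set of characters", "\." is a literal dot, "$" is end of string.
sub exe_files {
    my @names = @_;
    return grep { /^.*\.exe$/ } @names;
}

print join(' ', exe_files(@files)), "\n";   # prints: setup.exe run.exe
```

Note that "exe.dat" is rejected: the letters "exe" appear in it, but not as a ".exe" ending.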

3 In Perl
In Perl, you can test whether a string matches a pattern using the =~ operator:
    $s = "Cat In the Hat";
    if ($s =~ /Cat/)  { print "Matches Cat"; }    # prints: "Cat" occurs in $s
    if ($s =~ /Chat/) { print "Matches Chat"; }   # prints nothing: "Chat" does not occur

4 Common references
    \w   Word character              \W   Non-word character
    \s   Whitespace (space, tab)     \S   Non-whitespace character
    \d   Digit                       \D   Non-digit
    \n   Newline                     \t   Tab
    .    Any character (except newline, by default)
    ^    Start of string             $    End of string
Modifiers
    *      0 or more occurrences     {n}    Exactly n matches
    {n,}   n or more matches         {n,m}  n to m matches
Character Groups
    [a-z]      [xyz]
    [0-9A-Z]   [\w_]
    [^a-z]     NOT a-z
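A few of these pieces combined, as a small sketch. The helper names and the sample strings are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Each helper returns 1 if the whole string fits the pattern, 0 otherwise.

# One or more word characters, one whitespace character, exactly three digits.
sub is_word_then_digits { return $_[0] =~ /^\w+\s\d{3}$/ ? 1 : 0 }

# Zero or more characters, none of them in the range a-z.
sub has_no_lowercase    { return $_[0] =~ /^[^a-z]*$/ ? 1 : 0 }

# Between two and four digits.
sub is_short_number     { return $_[0] =~ /^\d{2,4}$/ ? 1 : 0 }

print is_word_then_digits("CISC 810"), "\n";   # prints: 1
print is_word_then_digits("CISC 81"),  "\n";   # prints: 0
print has_no_lowercase("ABC 123"),     "\n";   # prints: 1
print is_short_number("2003"),         "\n";   # prints: 1
print is_short_number("20035"),        "\n";   # prints: 0
```

The ^ and $ anchors force the pattern to cover the entire string; without them each test would succeed as soon as any substring matched.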

5 Exercise 1
Write a regexp that matches only Canadian postal codes
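One possible answer, offered as a sketch rather than the intended solution. It assumes the format letter-digit-letter, an optional single space, then digit-letter-digit (for example "K7L 3N6"), and it does not check which letters are actually in use. The helper name is_postal_code is hypothetical.

```perl
use strict;
use warnings;

# Returns 1 if the string has the shape of a Canadian postal code, else 0.
# /i accepts lower-case letters; " ?" allows the space to be omitted.
sub is_postal_code {
    my ($text) = @_;
    return $text =~ /^[A-Z]\d[A-Z] ?\d[A-Z]\d$/i ? 1 : 0;
}

print is_postal_code("K7L 3N6"), "\n";   # prints: 1
print is_postal_code("k7l3n6"),  "\n";   # prints: 1
print is_postal_code("K7L 3N"),  "\n";   # prints: 0
print is_postal_code("12345"),   "\n";   # prints: 0
```

The anchors are what make it match "only" postal codes: without ^ and $ the pattern would also accept a longer string that merely contains one.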

6 Exercise 2
Write a regexp that matches typical intermediate files (.o, .dvi, .tmp); helpful if you want a systematic way to delete them
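A possible sketch for this exercise, using alternation inside a group. The file names and the helper name is_intermediate are made up, and the example only lists the candidates instead of deleting anything.

```perl
use strict;
use warnings;

# Returns 1 if the name ends in .o, .dvi or .tmp, else 0.
sub is_intermediate {
    my ($name) = @_;
    return $name =~ /\.(o|dvi|tmp)$/ ? 1 : 0;
}

# Hypothetical directory contents.
my @files  = ('main.c', 'main.o', 'paper.tex', 'paper.dvi', 'scratch.tmp', 'foo.out');
my @doomed = grep { is_intermediate($_) } @files;

print "Would delete: @doomed\n";   # prints: Would delete: main.o paper.dvi scratch.tmp
```

The $ anchor keeps "foo.out" safe: it contains ".o", but not at the end of the name. Passing the resulting list to Perl's built-in unlink would perform the actual deletion.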

7 String Substitution
Found an input file (*.dat), looking for a matching output file (.out):
    @input_files = glob("*.dat");   # assumed: the right-hand side is missing in the source
    foreach $input_file (@input_files) {
        # Copy to output name
        $output_file = $input_file;
        # Replace .dat with .out (dot escaped, match anchored at the end of the name)
        $output_file =~ s/\.dat$/.out/;
        if (! -f $output_file) {
            print "Need to create output for $output_file\n";
        }
    }
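In a substitution like this the dot deserves attention: unescaped, it matches any character, and without an end anchor the first match wins wherever it occurs. A small sketch of the difference follows; the helper names and the file name "update.dat" are assumptions for illustration.

```perl
use strict;
use warnings;

# Bare dot, no anchor: "." matches any character, anywhere in the name.
sub rename_loose {
    my ($name) = @_;
    $name =~ s/.dat/.out/;
    return $name;
}

# Dot escaped and match anchored to the end of the name.
sub rename_strict {
    my ($name) = @_;
    $name =~ s/\.dat$/.out/;
    return $name;
}

print rename_loose('update.dat'),  "\n";   # prints: u.oute.dat
print rename_strict('update.dat'), "\n";   # prints: update.out
```

In the loose version the pattern finds "pdat" inside "update" first, so the extension is never touched. For ordinary names such as "run1.dat" both versions give the same result, which is why the problem is easy to miss.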

8 Translating
    $s = "Alternate Ending";
    $s =~ tr/a-z/A-Z/;   # $s is now "ALTERNATE ENDING"; tr takes plain ranges, no brackets
Can also use 'uc' and 'lc' (more generic for non-English languages)

9 Grabbing Substrings
Get root URL:
    $url = "http://www.mast.queensu.ca/~math224/Slides/Week_09/driven_spring2.m";
    print "Full URL: $url\n";
    if ($url =~ /(www[\w.]*)/) {    # $1 is only meaningful after a successful match
        $short_url = $1;
        print "Site URL: $short_url\n";
    }
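The pattern above depends on the host name starting with "www". A slightly more general sketch, assuming the URL has the form scheme://host/path, captures everything between "://" and the next slash. The helper name site_of is made up here.

```perl
use strict;
use warnings;

# Returns the host part of a URL, or undef if the string does not look like one.
# m{...} is used as the delimiter so the slashes need no escaping.
sub site_of {
    my ($url) = @_;
    return $url =~ m{^\w+://([^/]+)} ? $1 : undef;
}

my $url = "http://www.mast.queensu.ca/~math224/Slides/Week_09/driven_spring2.m";
print "Site URL: ", site_of($url), "\n";   # prints: Site URL: www.mast.queensu.ca
```

Returning undef on failure avoids a common trap: after a failed match, $1 still holds whatever the last successful match captured.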

10 End options
s/a/A/g – global; swap all matches: changes "aaaba" to "AAAbA"
Compare with s/a/A/: changes "aaaba" to "Aaaba"
/tmp/i – case insensitive: recognizes "tmp", "Tmp", "tMP", "TMP"…
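The same cases as runnable Perl, as a quick check of the behaviour described above. The helper names are hypothetical.

```perl
use strict;
use warnings;

# Replace every "a" (the /g option).
sub replace_all {
    my ($text) = @_;
    $text =~ s/a/A/g;
    return $text;
}

# Replace only the first "a" (no option).
sub replace_first {
    my ($text) = @_;
    $text =~ s/a/A/;
    return $text;
}

# Case-insensitive search for "tmp" (the /i option).
sub mentions_tmp {
    my ($text) = @_;
    return $text =~ /tmp/i ? 1 : 0;
}

print replace_all("aaaba"),   "\n";   # prints: AAAbA
print replace_first("aaaba"), "\n";   # prints: Aaaba
print mentions_tmp("tMP"),    "\n";   # prints: 1
print mentions_tmp("temp"),   "\n";   # prints: 0
```

Options can be combined, so s/a/x/gi would replace every "a" and every "A".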

11 Exercise
Write a regexp line that returns all the integers in the text.
Can it be extended to handle floating point values?
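One way to approach this, as a sketch under simple assumptions: an integer is an optional minus sign followed by digits, and a floating point value may add a decimal point and more digits (no exponents). The helper names and sample sentences are made up.

```perl
use strict;
use warnings;

# In list context, a /g match with no capture groups returns every match.
sub integers_in {
    my ($text) = @_;
    my @found = $text =~ /-?\d+/g;
    return @found;
}

# Extension: an optional fractional part, grouped with (?:...) so that
# the group does not capture and whole matches are still returned.
sub numbers_in {
    my ($text) = @_;
    my @found = $text =~ /-?\d+(?:\.\d+)?/g;
    return @found;
}

print join(',', integers_in("Run 3 took 42 steps and ended at -7")), "\n";
# prints: 3,42,-7
print join(',', numbers_in("Loaded 12 files in 23.5 s at -0.25 offset")), "\n";
# prints: 12,23.5,-0.25
```

Two limitations are worth knowing: the integer pattern splits "23.5" into 23 and 5, and a hyphenated word such as "K-12" yields -12.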

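12 Functions with Regex
split
    split /\s+/, $line;   # split on runs of whitespace
    split /,/, $line;     # split on commas
    split /\t/, $line;    # split on tabs
    split //, $line;      # split into individual characters
grep
    @v = qw( aaa bba bbc);
    @matches = grep /bb/, @v;   # keeps "bba" and "bbc"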
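A short sketch showing what these calls return on concrete input. The wrapper names and the sample strings are assumptions for illustration.

```perl
use strict;
use warnings;

# Thin wrappers so each call can be tried on its own.
sub words_in      { return split /\s+/, $_[0] }   # split on runs of whitespace
sub fields_in     { return split /,/,   $_[0] }   # split on commas
sub chars_in      { return split //,    $_[0] }   # split into single characters
sub containing_bb { return grep { /bb/ } @_ }     # keep elements that match /bb/

print join('|', words_in("alpha  beta\tgamma")), "\n";        # prints: alpha|beta|gamma
print join('|', fields_in("1,2,3")), "\n";                    # prints: 1|2|3
print join('|', chars_in("abc")), "\n";                       # prints: a|b|c
print join('|', containing_bb(qw( aaa bba bbc ))), "\n";      # prints: bba|bbc
```

Because the separator is itself a regexp, /\s+/ treats two spaces or a tab as a single break between words.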

13 Longer example – Log files
Parsing log files
195.5.23.103 - - [25/Mar/2003:02:22:11 -0800] "GET /gcs/new.gif HTTP/1.1" 200 926
195.5.23.103 - - [25/Mar/2003:02:22:11 -0800] "GET /gcs/update.gif HTTP/1.1" 200 971
proxy.skynet.be - - [25/Mar/2003:02:40:54 -0800] "GET /gcs/gc1hint.html HTTP/1.1" 200 16358
j3194.inktomisearch.com - - [25/Mar/2003:03:13:12 -0800] "GET /~gcs/K-12.html HTTP/1.0" 200 3235
kittyhawk.hhmi.org - - [25/Mar/2003:03:17:20 -0800] "HEAD /gcs/ HTTP/1.0" 200 0
j3104.inktomisearch.com - - [25/Mar/2003:03:54:43 -0800] "GET /gcs/pa.html HTTP/1.0" 200 5614
crawl11-public.alexa.com - - [25/Mar/2003:04:51:41 -0800] "GET /gcs/clinical.html HTTP/1.0" 200 20132
…
livebot-65-55-208-64.search.live.com - - [24/Jul/2007:22:16:58 -0700] "GET /gcs/webstats/usage_200602.html HTTP/1.0" 200 128720
203.129.234.42 - - [24/Jul/2007:22:22:39 -0700] "GET /gcs/status/statuscheck.html HTTP/1.1" 200 1522624
livebot-65-55-208-65.search.live.com - - [24/Jul/2007:22:47:32 -0700] "GET /gcs/webstats/usage_200610.html HTTP/1.0" 200 132580
…
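A sketch of how lines like these might be taken apart with a single regexp. It assumes every line has exactly the layout shown above (host, two placeholder fields, bracketed timestamp, quoted request, status, byte count); real logs can deviate, for example with "-" in place of the byte count. The helper names are made up.

```perl
use strict;
use warnings;

# Split one log line into named fields; returns a hash reference,
# or nothing if the line does not have the expected layout.
sub parse_log_line {
    my ($line) = @_;
    if ($line =~ /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\d+)$/) {
        return { host => $1, time => $2, method => $3,
                 path => $4, status => $5, bytes => $6 };
    }
    return;
}

# Add up the bytes served to each host.
sub bytes_per_host {
    my @lines = @_;
    my %bytes_for;
    for my $line (@lines) {
        my $hit = parse_log_line($line) or next;   # skip lines that do not parse
        $bytes_for{ $hit->{host} } += $hit->{bytes};
    }
    return %bytes_for;
}

my %total = bytes_per_host(
    '195.5.23.103 - - [25/Mar/2003:02:22:11 -0800] "GET /gcs/new.gif HTTP/1.1" 200 926',
    '195.5.23.103 - - [25/Mar/2003:02:22:11 -0800] "GET /gcs/update.gif HTTP/1.1" 200 971',
    'proxy.skynet.be - - [25/Mar/2003:02:40:54 -0800] "GET /gcs/gc1hint.html HTTP/1.1" 200 16358',
);
print "$_ $total{$_}\n" for sort keys %total;
# prints:
# 195.5.23.103 1897
# proxy.skynet.be 16358
```

Capturing each field in its own pair of parentheses is what makes the later bookkeeping easy: the fields arrive as $1 through $6 in the order they appear in the line.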

14 Alternate uses
If you write your own program with many print statements, you can:
1. make the print statements meaningful, e.g. "Time spent on loading: 23.5s"
2. parse the output afterwards to process/store values:
    $line =~ m/: ([\d.]+)s/;
    $time = $1;
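A sketch of that second step applied to several lines of program output. Only the "loading" line comes from the slide; the other output lines and the helper name seconds_in are invented for illustration.

```perl
use strict;
use warnings;

# Returns the number of seconds reported in a line such as
# "Time spent on loading: 23.5s", or undef if the line has no timing.
sub seconds_in {
    my ($line) = @_;
    return $line =~ /: ([\d.]+)s/ ? $1 : undef;
}

# Hypothetical program output.
my @output = (
    "Time spent on loading: 23.5s",
    "Reading configuration",
    "Time spent on solving: 102.25s",
);

my $total = 0;
for my $line (@output) {
    my $seconds = seconds_in($line);
    $total += $seconds if defined $seconds;   # ignore lines without a timing
}
print "Total: ${total}s\n";   # prints: Total: 125.75s
```

Note that the + sits inside the parentheses. Written as ([\d.])+ the group would capture only the last character of the number.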

15 Resources
Any web search for "perl regular expression tutorial"
Perl reg exp by example: http://www.somacon.com/p127.php
Reference card: http://www.erudil.com/preqr.pdf
Perl site reference: http://perldoc.perl.org/perlre.html

