LING 388: Language and Computers Sandiway Fong Lecture 3: 8/28
Today’s Lecture
regexp: recap
hands-on introduction to Perl
–follow along with your laptop
–do the background reading
practice writing Perl
–Homework 1 will be out on Thursday
Background Reading
Perl Quick Intro
Perl Regular Expressions (RE)
–perlrequick - Perl regular expressions quick start
–perlretut - Perl regular expressions tutorial
regexp: Recap
Repetition abbreviations:
–a exactly one a
–a? a optional
–a* zero or more a’s
–a+ one or more a’s
–a{n,m} between n and m a’s
–a{n,} at least n a’s
–a{n} exactly n a’s
Metacharacters:
–{}[]()^$.|*+?\
–may be escaped by prefixing the metacharacter with a backslash (\)
Concatenation
–two regexps may be concatenated to form a new regexp
Disjunction
–infix operator: | (vertical bar)
–[set of characters] match one of the characters
–[^set of characters] match one character that is not in the set
–[char1-char2] dash (-) shorthand for a range of characters (ASCII)
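The repetition and disjunction operators above can be tried out with a short Perl sketch. The word list and the helper `matches` are hypothetical names chosen for illustration; the `^` and `$` anchors are added so that each pattern has to match the whole string.

```perl
use strict;
use warnings;

# Hypothetical test strings: zero to four a's followed by b.
my @words = ("b", "ab", "aab", "aaab", "aaaab");

# Return the candidates that match the regexp $re.
sub matches {
    my ($re, @candidates) = @_;
    return grep { $_ =~ $re } @candidates;
}

print join(" ", matches(qr/^a?b$/,      @words)), "\n";  # a optional:     b ab
print join(" ", matches(qr/^a*b$/,      @words)), "\n";  # zero or more:   b ab aab aaab aaaab
print join(" ", matches(qr/^a+b$/,      @words)), "\n";  # one or more:    ab aab aaab aaaab
print join(" ", matches(qr/^a{2,3}b$/,  @words)), "\n";  # between 2 and 3: aab aaab
print join(" ", matches(qr/^a{2,}b$/,   @words)), "\n";  # at least 2:     aab aaab aaaab
print join(" ", matches(qr/^a{2}b$/,    @words)), "\n";  # exactly 2:      aab
print join(" ", matches(qr/^(ab|aab)$/, @words)), "\n";  # disjunction:    ab aab
print join(" ", matches(qr/^[^a]$/,     @words)), "\n";  # one non-a char: b
```

Without the anchors, a pattern such as a{2}b would also match aaab, because the regexp only needs to be found somewhere inside the string.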
regexp: Recap
Range Abbreviations:
–period (.) stands for any character (except newline)
–\d (digit) = [0-9]
–\s (whitespace character) = space (SP), tab (HT), carriage return (CR), newline (LF) or form feed (FF)
–\w (word character) = [0-9a-zA-Z_]
–uppercase versions, e.g. \D and \W, denote negation
Line-oriented metacharacters:
–caret (^) at the beginning of a regexp string matches the “beginning of a line”
–dollar sign ($) at the end of a regexp string matches the “end of the line”
Word-oriented metacharacters:
–a word is any sequence of digits [0-9], underscores (_) and letters [a-zA-Z]
–\b matches a word boundary
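As a rough illustration of the abbreviations and anchors above, the following Perl sketch applies them to a made-up line of text. The string in `$line` and all variable names are assumptions introduced for this example only.

```perl
use strict;
use warnings;

# Hypothetical input line, chosen only for illustration.
my $line = "Room 388 opens at 9am, the_end";

# \d+ collects runs of digits; \w+ collects runs of word characters.
my @digits = $line =~ /\d+/g;
my @words  = $line =~ /\w+/g;

# ^ and $ tie the match to the beginning and end of the line.
my $starts_with_room = ($line =~ /^Room/) ? 1 : 0;
my $ends_with_end    = ($line =~ /end$/)  ? 1 : 0;

# \b requires a word boundary on each side of "the".
my $the_in_phrase = ("on the mat" =~ /\bthe\b/) ? 1 : 0;
my $the_in_other  = ("Otherwise"  =~ /\bthe\b/) ? 1 : 0;
my $the_in_line   = ($line        =~ /\bthe\b/) ? 1 : 0;

print "digits: @digits\n";   # digits: 388 9
print "words: @words\n";     # words: Room 388 opens at 9am the_end
print "$starts_with_room $ends_with_end\n";              # 1 1
print "$the_in_phrase $the_in_other $the_in_line\n";     # 1 0 0
```

The last test fails on the_end because the underscore counts as a word character, so there is no word boundary between "the" and "_".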
Perl we’re going to use the regexp facility built into Perl
Perl
Run from the command line in Windows
–Start > Run...
–cmd (brings up command line interpreter)
Running a Perl program:
–perl -help (gives options)
–perl filename.pl (runs Perl command file filename.pl )
–perl filename.pl inputfile.txt (runs Perl command file filename.pl; inputfile.txt is supplied to filename.pl ), e.g. filename.pl reads and processes input file inputfile.txt
Perl
Example Perl program ( match.pl ) to read in a text file and print lines matching a regexp enclosed by /.../
    open (F,$ARGV[0]) or die "$ARGV[0] not found!\n";
    while (<F>) { print $_ if (/The/); }
Example input file ( text.txt )
    This is a test.
    The cat sat on the mat.
    These shoes are made for walking.
    Otherwise, I thought it was cold.
Command
    perl match.pl text.txt
Perl
Program:
    open (F,$ARGV[0]) or die "$ARGV[0] not found!\n";
    while (<F>) { print $_ if (/The/); }
Program explained:
–open the file referenced in $ARGV[0] for input
–$ARGV[0] is the first command line argument following the program name
–F is the filehandle associated with the opened file
–if there is a problem opening the file, e.g. the file doesn’t exist, program execution dies and prints the value of the string enclosed in double quotes "$ARGV[0] not found!\n"
–while (<F>) first evaluates <F>
–<F> reads in a line from the file referenced by F and places the line in the program variable $_
–then it executes the program code between the curly braces
–then it goes back and reads another line
–it does this repeatedly while <F> produces a valid line
–if we reach the end of the file, the while loop stops
–print $_ if (/.../); is conditional code that means print the contents of variable $_ if the regexp between the /.../ can be found in $_
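The same read-and-match loop can also be written as a reusable subroutine. This is a sketch of one possible variant, not the program from the slide: the name `matching_lines`, the lexical filehandle `$fh`, the named loop variable `$line`, and the three-argument form of open are all choices made for this example.

```perl
use strict;
use warnings;

# Return the lines of the file $path that contain a match for the regexp $re.
sub matching_lines {
    my ($path, $re) = @_;
    # '<' opens the file for reading; die with a message if that fails.
    open(my $fh, '<', $path) or die "$path not found!\n";
    my @hits;
    # Read one line per iteration; the loop stops at end of file.
    while (my $line = <$fh>) {
        push @hits, $line if $line =~ $re;
    }
    close($fh);
    return @hits;
}

# Run as: perl match.pl text.txt
# Prints every line of the input file that contains "The".
print matching_lines($ARGV[0], qr/The/) if @ARGV;
```

With the example input file text.txt, this variant would print the second and third lines: /The/ is case-sensitive, so the "the" inside "Otherwise" does not count, while "These" does contain "The".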
More Perl Reference: – /perlintro.html
More Perl
Variables:
–always prefixed by $
–e.g. $count, $i
Assignment and arithmetic expressions:
–$count = 0;
–$count = $count + 1;
–$count++; (auto-increment)
Arithmetic operators:
+ addition
- subtraction
* multiplication
** exponentiation
/ division
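A small sketch of the assignment and arithmetic operators above. The variable names other than `$count` are made up for this example, and the `my` declarations are an addition here: they are needed because the sketch turns on `use strict`.

```perl
use strict;
use warnings;

my $count = 0;
$count = $count + 1;          # $count is now 1
$count++;                     # auto-increment: $count is now 2

my $sum   = $count + 3;       # addition
my $diff  = $count - 5;       # subtraction
my $prod  = $count * 4;       # multiplication
my $power = $count ** 10;     # exponentiation
my $quot  = 7 / $count;       # division (not truncated to an integer)

print "$count $sum $diff $prod $power $quot\n";   # 2 5 -3 8 1024 3.5
```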
More Perl
Variables and strings:
–$i = "this";
–$i = $i . " moment";
–. is the string concatenation operator
Printing:
–print $count;
–print "Count: ", $count, "\n"; means print the string "Count: " followed by the value of the variable $count followed by a newline
–, is a separator
–\n is the newline character
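The string operations above, put together as a runnable sketch. The value 3 for `$count` and the variable `$msg` are assumptions made for illustration. Note that Perl needs plain ASCII double quotes ("), not typographic quotes, around strings.

```perl
use strict;
use warnings;

my $i = "this";
$i = $i . " moment";             # concatenation: $i is now "this moment"

my $count = 3;                   # hypothetical value
# The same output line built with the concatenation operator.
my $msg = "Count: " . $count . "\n";

print $i, "\n";                  # this moment
print "Count: ", $count, "\n";   # Count: 3   (comma-separated list of items)
print $msg;                      # Count: 3   (one concatenated string)
```

Both print statements produce the same line; the comma version passes print a list of three items, while the dot version builds a single string first.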
More Perl
Conditionals:
–if ($count < 1000) {... }
–statement modifier version (the condition follows a single statement, not a {... } block)
–... if ($count != 0);
–if-then-else version
–if ($count == 1000) {... } else {...}
Numeric comparisons:
== equality
!= inequality
< less than
> greater than
<= less than or equal
>= greater than or equal
String comparisons:
eq equality
ne inequality
lt less than
gt greater than
le less than or equal
ge greater than or equal
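The sketch below exercises the conditional forms and shows why the numeric and string comparison operators are kept separate. The subroutine `describe` and the variables `@log`, `$numeric` and `$string` are hypothetical names used only in this example.

```perl
use strict;
use warnings;

# if-then-else: classify a count against the threshold 1000.
sub describe {
    my ($count) = @_;
    if ($count < 1000) {
        return "small";
    } else {
        return "large";
    }
}

# Statement modifier form: the condition follows a single statement.
my @log;
my $count = 5;
push @log, "nonzero" if ($count != 0);

# Numeric versus string comparison of the same two values.
my $numeric = "no";
my $string  = "no";
if (10 < 9)      { $numeric = "yes"; }   # 10 is not less than 9 as a number
if ("10" lt "9") { $string  = "yes"; }   # "10" sorts before "9" as a string

print describe(999), " ", describe(1000), "\n";   # small large
print "@log\n";                                   # nonzero
print "$numeric $string\n";                       # no yes
```

Using lt where < was intended (or the reverse) does not raise an error in this case; it quietly gives a different answer, so the operator has to be chosen to fit the kind of data being compared.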