LING/C SC/PSYC 438/538 Lecture 5 9/8 Sandiway Fong
Administrivia Homework 1 (from lecture 3) – was due last night (at midnight)
Today’s Topics
Review
– Homework 1
– We’ll go through it in class today
Chapter 2 of JM
– Section 2.1 on regular expressions
– (which you’ve already read…)
Safari Book available online (Thanks! Don Merson) UA Library has been given access to the full Safari Books Online service. This allows you to read a vast number of technical books via your browser. However, it is currently only a trial.
Homework Review
Question 1: 438 and 538 (7 points)
– Given
    @sentence1 = (I, saw, the, the, cat, on, the, mat);
    @sentence2 = (the, cat, sat, on, the, mat);
– Write a simple Perl program that detects repeated words (many spell checker/grammar programs have this capability)
– It should print a message stating the repeated word and its position if one exists
– e.g. word 3 “the” is repeated in the case of sentence1
– No repeated words found in the case of sentence2
– Note: output multiple messages if there are multiple repeated words
– Hint: use a loop
– Submit your Perl code and show examples of your program working
Homework Review
Thinking algorithmically…
w1 w2 w3 w4 w5
– Compare w1 with w2
– Compare w2 with w3
– Compare w3 with w4
– Compare w4 with w5
Homework Review
Turning an algorithm into Perl code:
– Compare w(1) with w(1+1)
– Compare w(2) with w(2+1)
– …
– Compare w(n-2) with w(n-2+1)
– Compare w(n-1) with w(n)
Array indices start from 0 and end at $#words, so the n words are stored as words[0], words[1], …, words[n-1].
“for” loop implementation:
    for ($i=0; $i<$#words; $i++) {
        compare word indexed by $i to word indexed by $i+1
        if same string, print message
    }
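Homework Review
First iteration (there are many ways to do this…)
– (the basic for-loop)
    @sentence1 = qw(I saw the the cat on the mat);
    @sentence2 = qw(the cat sat on the mat);
    @words = @sentence1;    # assumed: the loop reads @words; use @sentence2 for the second test
    for ($i=0; $i<$#words; $i++) {
        if ($words[$i] eq $words[$i+1]) {
            # $i is the 0-based index of the first word of the repeated pair
            print "word $i \"$words[$i]\" is repeated\n"
        }
    }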
Homework Review
2nd iteration
– (setting a flag when a repeated word is found)
– (condition the output based on the value of the flag)
    my $flag = 0;
    for ($i=0; $i<$#words; $i++) {
        if ($words[$i] eq $words[$i+1]) {
            print "word $i \"$words[$i]\" is repeated\n";
            $flag = 1
        }
    }
    print "No words repeated\n" unless $flag
Homework Review
3rd iteration
– (encapsulating the loop in a subroutine)
    sub check_repeated {
        my $flag = 0;
        # the loop reads the global array @words
        for ($i=0; $i<$#words; $i++) {
            if ($words[$i] eq $words[$i+1]) {
                print "word $i \"$words[$i]\" is repeated\n";
                $flag = 1
            }
        }
        print "No words repeated\n" unless $flag
    }
    print print print print
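One possible next step, sketched here and not taken from the slides, is to pass the word list to the subroutine as an argument so it no longer depends on a global @words. The name find_repeated is hypothetical. The sketch reports the 1-based position of the first word of each repeated pair, which is one reading of the homework example ("word 3"); the slide code prints the 0-based $i instead.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): returns one message per
# adjacent repeated pair instead of printing directly.
sub find_repeated {
    my @words = @_;
    my @messages;
    # $#words is the last index, so $i+1 never runs past the end of the array.
    for (my $i = 0; $i < $#words; $i++) {
        if ($words[$i] eq $words[$i+1]) {
            # $i + 1 is the 1-based position of the first word of the pair.
            push @messages, sprintf('word %d "%s" is repeated', $i + 1, $words[$i]);
        }
    }
    return @messages;
}

my @sentence1 = qw(I saw the the cat on the mat);
my @sentence2 = qw(the cat sat on the mat);

for my $sentence (\@sentence1, \@sentence2) {
    my @found = find_repeated(@$sentence);
    print "@$sentence\n";
    print @found ? join("\n", @found) . "\n" : "No repeated words found\n";
}
```

Returning the messages keeps the "nothing found" decision with the caller, so the flag variable is no longer needed.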
Homework Review
Question 2: 438 and 538 (3 points)
– Describe what it would take to stop a repeated-word program from flagging legitimate examples of repeated words in a sentence
– (No spell checker/grammar program that I know of has this capability)
– Examples of legitimately repeated words:
    I wish that that question had an answer
    Because he had had too many beers already, he skipped the Friday office happy hour
Homework Review
Question 3: 538 (10 points), (438 extra credit)
– Write a simple Perl program that outputs word frequencies for a sentence
– E.g. given the word list (I, saw, the, cat, on, the, mat, by, the, saw, table);
– output a summary that looks something like:
– the occurs 3 times
– saw occurs twice
– I, cat, mat, on, by, table occur once only
– Hint: build a hash keyed by word with value frequency
– Submit your Perl code and show examples of your program working
Homework Review
Thinking algorithmically…
w0 w1 w2 w3 w4 w5
– foreach $word: w0, then w1, and so on
– hash data structure = “labeled medicine cabinet”
Homework Review
Sample
    # the array name @words is assumed; it was lost from the slide text
    @words = qw(the cat sat on the mat that the cat likes most);
    %freq = ();
    foreach $word (@words) {
        if (exists $freq{$word}) {
            $freq{$word}++;
        } else {
            $freq{$word} = 1;
        }
    }
    foreach $word (keys %freq) {
        print "$word occurs $freq{$word} time(s)\n";
    }
Output (hash key order is not fixed, so the line order can differ between runs):
    perl e2.prl
    on occurs 1 time(s)
    the occurs 3 time(s)
    cat occurs 2 time(s)
    most occurs 1 time(s)
    sat occurs 1 time(s)
    likes occurs 1 time(s)
    that occurs 1 time(s)
    mat occurs 1 time(s)
Further simplifications to the code are possible but the basic logic remains
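As one illustration of such a simplification (a sketch, not the lecturer's code), the exists test can be dropped because ++ on a missing hash key treats it as 0. The helper name word_freq and the sort order are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: count occurrences of each word in a list.
sub word_freq {
    my %freq;
    # A missing key starts out undefined; ++ turns it into 1 without a warning.
    $freq{$_}++ for @_;
    return %freq;
}

my @words = qw(the cat sat on the mat that the cat likes most);
my %freq  = word_freq(@words);

# Sort by descending count, then alphabetically, so the output order is stable.
for my $word (sort { $freq{$b} <=> $freq{$a} or $a cmp $b } keys %freq) {
    print "$word occurs $freq{$word} time(s)\n";
}
```

Sorting the keys is optional; it only makes repeated runs print the lines in the same order.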
Chapter 2: JM Today – using your Perl skills on – Section 2.1 Regular Expressions – Online tutorials
Pattern Matching JM, Chapter 2, pg 17 Merriam-Webster online
Chapter 2: JM
Perl regular expression (re) matching:
– $a =~ /foo/
– /…/ contains a regular expression
– evaluates to true if the pattern matches somewhere in the string held in $a, and to false otherwise
Perl regular expression (re) match and substitute:
– $a =~ s/foo/bar/
– s/…match…/…substitute…/ contains two expressions
– modifies $a by finding the first occurrence of match and replacing it with substitute
– s/…match…/…substitute…/g global match and substitute (replaces every occurrence)
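A minimal sketch of the three forms above. The variable names and the sample string are made up for illustration; $text is used instead of $a because $a is also the special variable that sort uses.

```perl
use strict;
use warnings;

my $text = "foo and more foo";

# Match: true if the pattern occurs anywhere in $text.
my $found = ($text =~ /foo/) ? 1 : 0;

# s/// modifies its target, so work on copies to compare the two forms.
(my $first_only = $text) =~ s/foo/bar/;    # replaces the first occurrence only
(my $all        = $text) =~ s/foo/bar/g;   # /g replaces every occurrence

print "matched: $found\n";
print "$first_only\n";
print "$all\n";
```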
Chapter 2: JM
Most useful with code for reading in a file line-by-line:
    open($txtfile, $ARGV[0]) or die "$ARGV[0] not found!\n";
    while ($line = <$txtfile>) {
        do RE stuff with $line
    }
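A runnable sketch of the read-and-match loop, under the assumption that "RE stuff" means keeping the lines that match a pattern. The helper name matching_lines and the pattern /cat/ are hypothetical, and an in-memory string stands in for the file named by $ARGV[0] so the example does not need a file on disk.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of an open filehandle that match /cat/.
sub matching_lines {
    my ($fh) = @_;
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;                          # drop the trailing newline
        push @hits, $line if $line =~ /cat/;  # the "RE stuff"
    }
    return @hits;
}

# Opening a reference to a scalar gives a filehandle over the string's contents.
my $data = "the cat sat\non the mat\ncatalog\n";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!\n";
print "$_\n" for matching_lines($fh);
close($fh);
```

To read a real file, replace \$data with $ARGV[0] in the open call.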
Sheeptalk
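The sheep language in JM Section 2.1 consists of the strings baa!, baaa!, baaaa!, and so on, which the book describes with /baa+!/. The sketch below anchors that pattern so the whole string must be sheep talk; the helper name is_sheeptalk and the test strings are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string is b, then two or more a's, then !.
# ^ and $ anchor the pattern so nothing else may surround the sheep talk.
sub is_sheeptalk {
    my ($s) = @_;
    return $s =~ /^baa+!$/ ? 1 : 0;
}

for my $s ('baa!', 'baaaa!', 'ba!', 'baa') {
    printf "%-8s %s\n", $s, is_sheeptalk($s) ? 'sheep talk' : 'not sheep talk';
}
```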
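Chapter 2: JM
Precedence of operators
– Example: Column 1 Column 2 Column 3 …
– /Column [0-9]+ */
– /(Column [0-9]+ *)*/
– /house(cat(s|)|)/
Perl:
– In a regular expression, the text matched by the pattern within a pair of parentheses is stored in $1 (and $2 and so on)
Precedence Hierarchy (highest to lowest, as given in JM Section 2.1):
– Parentheses: ( )
– Counters: * + ? { }
– Sequences and anchors: the ^my end$
– Disjunction: |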
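A small sketch of how the parentheses in the Column and housecat patterns change what is matched and captured. The variable names and the extra outer capture group are assumptions added so the matched text can be inspected; (?: … ) groups without capturing.

```perl
use strict;
use warnings;

my $header = "Column 1 Column 2 Column 3";

# Without grouping, * applies only to the space before it,
# so the pattern covers a single "Column N " unit.
my ($one) = $header =~ /^(Column [0-9]+ *)/;

# With grouping, * repeats the whole "Column N " unit.
my ($all) = $header =~ /^((?:Column [0-9]+ *)*)/;

# Captures are numbered by opening parenthesis: $1 is the outer group, $2 the inner.
my ($outer, $inner) = ('', '');
if ("housecats" =~ /house(cat(s|)|)/) {
    ($outer, $inner) = ($1, $2);
}

print "one:   '$one'\n";
print "all:   '$all'\n";
print "outer: '$outer'  inner: '$inner'\n";
```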
Chapter 2: JM A shortcut: list context for matching
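The slide's own example is not recoverable here, so the following is only a sketch of the general idiom: in list context a successful match returns its captured groups, which can be assigned directly instead of copying $1, $2 by hand. The sample line and variable names are made up.

```perl
use strict;
use warnings;

my $line = "Column 12 Row 7";

# In list context a successful match returns the captured groups ($1, $2, ...).
my ($col, $row) = $line =~ /Column ([0-9]+) Row ([0-9]+)/;

# With /g, no capture groups, and list context, every match of the pattern is returned.
my @numbers = $line =~ /[0-9]+/g;

print "col=$col row=$row\n";
print "numbers: @numbers\n";
```

If the match fails, the list is empty and the variables on the left are left undefined, so it is worth testing the result before using them.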
Chapter 2: JM s/([0-9]+)/ /
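A sketch of a substitution that reuses the captured number in its replacement. The replacement text on the slide is not recoverable, so wrapping each number in angle brackets is an assumption chosen for illustration, as are the sample string and variable names.

```perl
use strict;
use warnings;

my $text = "the 35 boxes hold 120 items";

# $1 in the replacement refers to whatever ([0-9]+) matched.
# The angle brackets are an illustrative choice of replacement text.
(my $marked = $text) =~ s/([0-9]+)/<$1>/g;

print "$marked\n";
```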