A Stroll through Perl (R L Schwartz & T Christiansen, O’Reilly)


A Stroll through Perl (R L Schwartz & T Christiansen, O’Reilly)

Perl stands for Practical Extraction and Report Language. A major strength of Perl is the recognition and substitution of text sequences described by regular expressions. This is useful for:
- Web searching: are the query keywords in this web page?
- Computing frequencies in a document collection, e.g. to produce a stoplist, or mid-frequency terms for automatic indexing.
- Making finite state transducers, e.g. a pluraliser, stemmer or americanizer.
- Dialogue systems, e.g. ELIZA.

“Hello World” Program

    #!/usr/bin/perl -w
    print "Hello, world!\n";

The first line means “this is a Perl program”; -w tells Perl to generate warning messages. Apart from the first line, all Perl statements end with a semicolon (;).

To run a Perl program from UNIX: perl programname.pl

Comments: anything from a hash sign (#) to the end of the line is a comment.

Scalar Variables

Now get the “Hello, world” program to call you by your name. To do this, we need a place to hold the name, a way to ask for the name, and a way to get a response. One place to hold a value (like a name) is a scalar variable. Here we will use the scalar variable $name to hold your name. A scalar variable starts with $ and can hold either a single number or a string (a sequence of characters).

print, <STDIN>, chomp

The program needs to ask for the name (prompt): use the print function. The way to get a line from the terminal is with the <STDIN> construct, which grabs one line of input. We assign this input to the $name variable. This gives us the program:

    print "What is your name?";
    $name = <STDIN>;

The value of $name has a terminating newline \n. To get rid of that, we use the chomp function:

    chomp($name);

Now we can reply with:

    print "Hello, $name!\n";

(What does this do?)

Putting it all together we get:

    #!/usr/bin/perl -w
    print "What is your name?";
    $name = <STDIN>;
    chomp($name);
    print "Hello, $name!\n";

Adding Choices

Let’s say we have a special greeting for Randal, but we want an ordinary greeting for anyone else. To do this, we need to compare the name that was entered with the string Randal, and if it’s the same, do something special. Let’s add a C-like if-then-else branch and a comparison to the program:

    #!/usr/bin/perl -w
    print "What is your name?";
    $name = <STDIN>;
    chomp($name);
    if ($name eq "Randal") {
        print "Hello Sir Randal!\n";
    } else {
        print "Hello, $name!\n";
    }

Guessing the Secret Password

What does this code do?

    #!/usr/bin/perl -w
    $secretword = "llama";   # the secret word
    print "What is the secret password?";
    $guess = <STDIN>;
    chomp($guess);
    while ($guess ne $secretword) {
        print "Wrong, try again:\n";
        $guess = <STDIN>;
        chomp($guess);
    }

First, we define the secret word by putting it into another scalar variable, $secretword. The person is asked (using print) for a guess, which goes into $guess. The guess is compared with the secret word using the ne operator, which returns true if the strings are not equal (this is the logical opposite of the eq operator). The result of the comparison controls a while loop, which executes the block as long as the ne comparison remains true.
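The loop on this slide can be tried without a terminal. The following is a minimal sketch, assuming a hypothetical helper count_wrong_guesses (not part of the slides) that reads guesses from any filehandle, here an in-memory one, instead of STDIN.

```perl
use strict;
use warnings;

# Hypothetical helper: read guesses from $fh until one equals $secret.
# Returns the number of wrong guesses, or -1 if the input runs out first.
sub count_wrong_guesses {
    my ($fh, $secret) = @_;
    my $wrong = 0;
    my $guess = <$fh>;
    while (defined $guess) {
        chomp($guess);
        return $wrong if $guess eq $secret;   # eq: the strings are equal
        $wrong++;                             # otherwise ne would be true
        $guess = <$fh>;
    }
    return -1;
}

# Simulate a user typing three lines.
my $typed = "camel\nalpaca\nllama\n";
open(my $in, '<', \$typed) or die "cannot open in-memory input: $!";
my $wrong = count_wrong_guesses($in, "llama");
close($in);

print "wrong guesses before success: $wrong\n";
```

Checking defined on each line read is what stops the loop at end of input; the slide's version would keep comparing an undefined value if STDIN were closed.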

Arrays

We can store several secret words in a sort of list, a data structure called an array. Each element of the array is a separate scalar variable that can be independently set or accessed. The entire array can also be given a value in one fell swoop. We can assign a value to the entire array so that it contains three possible good passwords:

    @words = ("camel", "llama", "alpaca");
    @words = qw(camel llama alpaca);    # equivalent shorthand

Note that array names begin with @ while scalar variables begin with $. Once the array is assigned, we can access each element using a subscript reference. So $words[0] is camel, $words[1] is llama, and $words[2] is alpaca. The subscript can be an expression as well, so if we set $i = 2 then $words[$i] is alpaca. Note: array elements start with $ rather than @ because they refer to a single element of an array rather than the whole array.

More than one Secret Word

    #!/usr/bin/perl
    @words = qw(camel llama alpaca);
    print "What is the secret password?";
    $guess = <STDIN>;
    chomp($guess);
    $i = 0;
    $correct = "maybe";
    while ($correct eq "maybe") {
        if ($words[$i] eq $guess) {
            $correct = "yes";
        } elsif ($i < 2) {
            $i = $i + 1;
        } else {
            print "Wrong, try again:";
            $guess = <STDIN>;
            chomp($guess);
            $i = 0;
        }
    }

This program also shows the elsif block of the if-then-else statement. Perl doesn’t have C’s switch statement, so in Perl we tend to compare a set of conditions in an if-elsif-elsif-elsif-else type chain.
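The index bookkeeping in the program above ($i, $correct) can also be written as a scan over the list. This is only a sketch: is_secret is a hypothetical helper name, and it uses foreach, which the slides introduce later.

```perl
use strict;
use warnings;

my @words = qw(camel llama alpaca);

# Hypothetical helper: true (1) if $guess equals any element of @words.
sub is_secret {
    my ($guess) = @_;
    foreach my $word (@words) {
        return 1 if $word eq $guess;
    }
    return 0;
}

print "llama: ", (is_secret("llama") ? "accepted" : "rejected"), "\n";
print "horse: ", (is_secret("horse") ? "accepted" : "rejected"), "\n";
```

Unlike the slide's version, this does not hard-code the number of words (the $i < 2 test), so the list can grow without touching the loop.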

Hashes

Giving each person a different secret word: the easiest way to store such a table in Perl is with a hash. Each element of the hash holds a separate scalar value (just like an array), but the elements are referenced by a key, which can be any scalar value (string or number). To create a hash called %words (notice the % rather than @) we can write:

    %words = qw(
        fred    camel
        barney  llama
        betty   alpaca
        wilma   alpaca
    );

To find the secret word for Betty, we need to use betty as the key in a reference to the hash %words: $words{"betty"} will return alpaca. Equally, after $person = "betty"; the expression $words{$person} will also return alpaca.

Trying to look up a word not in the hash

When we look up someone’s secret word and their name is not one of the hash keys, the value of $secretword will be undef, which compares equal to the empty string (and produces a warning under -w), e.g.:

    # set up %words and read $name first, then:
    $secretword = $words{$name};
    if ($secretword eq "") {
        print "secret word not found\n";
    } else {
        print "your secret word is $secretword";
    }
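One way to avoid comparing an undefined value is to ask the hash whether the key exists before reading it. A minimal sketch, assuming a hypothetical helper secret_for; exists and defined are standard Perl built-ins.

```perl
use strict;
use warnings;

my %words = qw(
    fred    camel
    barney  llama
    betty   alpaca
    wilma   alpaca
);

# Hypothetical helper: the secret word for $name, or undef if $name is unknown.
sub secret_for {
    my ($name) = @_;
    return exists $words{$name} ? $words{$name} : undef;
}

foreach my $name (qw(betty dino)) {
    my $secret = secret_for($name);
    if (defined $secret) {
        printf "%s: your secret word is %s\n", $name, $secret;
    } else {
        printf "%s: secret word not found\n", $name;
    }
}
```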

Handling Varying Input Formats

How do we make our password checker accept Randal, randal, or Randal L. Schwartz?

    if ($name =~ /^Randal\b/i) {
        # yes, it matches
    } else {
        # no, it doesn't
    }

Notes:
- eq is for exact equality, =~ for pattern matching.
- The regular expression is delimited by forward slashes.
- /^Randal/ matches any string starting with Randal.
- /^Randal\b/ requires a word boundary after Randal (a non-word character such as a space, or the end of the string), so Randall is excluded.
- /^Randal\b/i ignores case, so randal is accepted.
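The effect of each part of the pattern can be checked against a few inputs. A small sketch; the input list is made up for illustration, and the pattern is stored with qr// so it can be reused.

```perl
use strict;
use warnings;

# ^ anchors at the start, \b demands a word boundary, /i ignores case.
my $pattern = qr/^Randal\b/i;

my @inputs = ("Randal", "randal", "Randal L. Schwartz", "Randall", "Sir Randal");

foreach my $name (@inputs) {
    my $verdict = ($name =~ $pattern) ? "matches" : "does not match";
    printf "%-20s %s\n", $name, $verdict;
}
```

The first three inputs match; Randall fails the \b test and Sir Randal fails the ^ anchor.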

Two Text Converters

We can write a case converter by using the translate operator:

    $name =~ tr/A-Z/a-z/;

The slashes delimit the searched-for and replacement character lists. The hyphen stands for all the characters between A and Z, so the two lists are the same length (26 characters). Note that tr, like s, is bound to its target with =~ rather than =.

We can replace the word Eurasia with Eastasia using the substitution operator:

    $temp  =~ s/Eastasia/XXXX/;
    $enemy =~ s/Eurasia/Eastasia/;
    $ally  =~ s/XXXX/Eurasia/;
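Both operators can be exercised on sample strings. In this sketch the sample sentence is made up, and the three substitutions are applied to a single string so that the XXXX placeholder visibly swaps the two names.

```perl
use strict;
use warnings;

# Lower-case a copy of the name, leaving the original untouched.
my $name = "Randal L. Schwartz";
(my $lower = $name) =~ tr/A-Z/a-z/;

# Swap two names in one string by going through a placeholder.
my $text = "Eastasia is at war with Eurasia";
$text =~ s/Eastasia/XXXX/;
$text =~ s/Eurasia/Eastasia/;
$text =~ s/XXXX/Eurasia/;

print "$lower\n";
print "$text\n";
```

Without the placeholder, the second substitution would turn both names into Eastasia and the swap would be lost.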

Making it Modular

Perl provides subroutines that have parameters and return values. A subroutine is defined once in a program, and can be used repeatedly by being invoked from any expression. Let’s create a subroutine called good_word that takes a name and a guessed word, and returns true if the word is correct and false if not:

    sub good_word {
        my($somename, $someguess) = @_;    # name the parameters
        if ($words{$somename} eq $someguess) {
            return 1;    # true
        } else {
            return 0;    # false
        }
    }

Subroutines

First, the definition of a subroutine consists of the reserved word sub followed by the subroutine name followed by a block of code { delimited by curly braces }. The definition can go anywhere in the program file, though most people put it at the end. The first line within this particular definition is an assignment that copies the values of the two parameters of this subroutine into two local variables named $somename and $someguess. The my() defines the two variables as private to the enclosing block (in this case the whole subroutine), and the parameters are initially in a special local array called @_. A return statement can be used to make the subroutine immediately return to its caller with the supplied value. Note that the subroutine assumes that the value of the %words hash is set by the main program.

Let’s Integrate this with the Rest of the Program

    #!/usr/bin/perl
    %words = qw{
        fred    camel
        barney  llama
        betty   alpaca
        wilma   alpaca
    };
    print "What is your name? ";
    $name = <STDIN>;
    chomp($name);
    print "What is the secret word? ";
    $guess = <STDIN>;
    chomp($guess);
    while (! good_word($name, $guess)) {
        print("Wrong, try again: ");
        $guess = <STDIN>;
        chomp($guess);
    }
    # insert definition of good_word here …

While, !

The condition of the while loop calls the subroutine good_word. Here we see an invocation of the subroutine, passing it two parameters, $name and $guess. Inside the subroutine, the value of $somename is set from the first parameter, $name, and the value of $someguess is set from the second parameter, $guess. The value returned by the subroutine (either 1 or 0) is logically inverted with the prefix ! (logical not) operator. This operator returns true if the expression following it is false, and returns false if the expression following it is true. The overall meaning is “while it’s not a good word …”
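The pieces from the last few slides fit together as follows. This is a sketch rather than the slides' exact program: the lines typed at STDIN are replaced by a made-up list of guesses so that it runs unattended, and good_word gains an exists check (an addition, not in the slides) so that an unknown name never counts as correct.

```perl
use strict;
use warnings;

my %words = qw(fred camel barney llama betty alpaca wilma alpaca);

sub good_word {
    my ($somename, $someguess) = @_;
    # An unknown name has no entry, so it can never be a good word.
    return 0 unless exists $words{$somename};
    return $words{$somename} eq $someguess ? 1 : 0;
}

my $name    = "betty";
my @guesses = qw(camel llama alpaca);   # stands in for lines typed at STDIN
my $tries   = 0;
my $guess   = shift @guesses;

while (!good_word($name, $guess)) {     # "while it's not a good word"
    $tries++;
    print "Wrong, try again\n";
    last unless @guesses;               # stop if the canned input runs out
    $guess = shift @guesses;
}

print "$name stopped after $tries wrong tries\n";
```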

Moving the Secret Word List into a separate file

Suppose we wanted to share the secret word list among three programs, e.g. for simultaneous updating. We can put the word list into a file and then read the file to get the word list into the program. To do this, we need to create an I/O channel called a filehandle. Your Perl program automatically gets three filehandles called STDIN, STDOUT and STDERR. Now we want another handle attached to a file of our own choice.

    sub init_words {
        open(WORDSLIST, "wordslist") || die "can't open wordslist: $!";
        while (defined($name = <WORDSLIST>)) {
            chomp($name);
            $word = <WORDSLIST>;
            chomp($word);
            $words{$name} = $word;
        }
        close(WORDSLIST) || die "couldn't close wordslist: $!";
    }

The (arbitrary) form of the word list

Each name is on one line, followed by that person's secret word on the next line:

    fred
    camel
    barney
    llama
    betty
    alpaca
    wilma
    alpaca

The open function initialises a filehandle named WORDSLIST by associating it with a file named wordslist in the current directory.

    while (defined($name = <WORDSLIST>)) {

means “while there are still values in the data file to read”.

The die function is frequently used to exit the program with an error message in case something goes wrong, e.g. the word list file is not found. $! contains the system error message explaining what went wrong.
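The reading loop can be tested without creating a file by opening a filehandle on a string. A minimal sketch, assuming a hypothetical variant read_words that takes an already opened filehandle and returns the hash, rather than filling a global %words as init_words does.

```perl
use strict;
use warnings;

# Hypothetical variant of init_words: build a hash from alternating
# name / word lines read from the filehandle $fh.
sub read_words {
    my ($fh) = @_;
    my %words;
    while (defined(my $name = <$fh>)) {
        chomp($name);
        my $word = <$fh>;
        last unless defined $word;    # a trailing name without a word is ignored
        chomp($word);
        $words{$name} = $word;
    }
    return %words;
}

# The same data as the word list file, held in a string.
my $data = "fred\ncamel\nbarney\nllama\nbetty\nalpaca\nwilma\nalpaca\n";
open(my $fh, '<', \$data) or die "can't open in-memory word list: $!";
my %words = read_words($fh);
close($fh) or die "couldn't close word list: $!";

print "$_ => $words{$_}\n" for sort keys %words;
```

To read a real file instead, only the open call changes, e.g. open(my $fh, '<', "wordslist").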

Three More Loops

1. To print out scalar variables. This example prints the numbers 1 to 10, each followed by a space:

    for ($i = 1; $i <= 10; $i++) {
        print "$i ";
    }

The above code is very similar to C.

2. To print out the contents of an array:

    foreach $i (0 .. $#somelist) {
        print "$somelist[$i]\n";
    }

The foreach statement takes a list of values and assigns them one at a time to a scalar variable, executing a block of code for each successive assignment.

3. To print out the contents of a hash:

    foreach $key (keys(%freqhash)) {
        print "$key $freqhash{$key}\n";
    }

Regular Expressions

See Chapter 7 of “Learning Perl”, by R L Schwartz & T Christiansen, O’Reilly.

A regular expression is a pattern to be matched against a string, e.g.
- Is put found in computer? Succeeds.
- Is michael found in computer? Fails.

Sometimes match success or failure is all you are concerned about. Other times you want to match and replace, e.g. find put in computer and replace it with pil (giving compiler). If the match is unsuccessful, nothing happens.

$_ is Perl’s default variable; we don’t have to declare it.

Search, Substitution

Search. Print out every line in the file specified on the command line which contains abc:

    while (<>) {
        if (/abc/) {
            print $_;
        }
    }

Substitution. If abc is found in $_, replace it with def (g means every time):

    s/abc/def/g;

Patterns

A regular expression is a pattern. Some parts of the pattern match single characters, others match multiple characters.

Single-character patterns:
    .                 any single character except \n (newline)
    /a./              any two-character sequence that starts with a, other than a\n
    /[abcde]/         matches a, b, c, d, or e (“character class”)
    /[a-zA-Z0-9_]/    matches a Perl “word” character
    /[^0-9]/          any non-digit (“negated character class”)

Character class abbreviations:
    \d    digit
    \D    non-digit
    \w    Perl “word” character
    \W    not a Perl “word” character
    \s    space character (\r, \t, \n, \f or " ")

All of the above match one character. We now look at “grouping patterns”:
    *    zero or more of the immediately previous character or character class
    +    one or more of the immediately previous character
    ?    zero or one of the immediately previous character

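Patterns are greedy by default

    $_ = "fred xxxxxx barney";
    s/x+/boom/;
    # now $_ is "fred boom barney"

Because matching is greedy, x+ consumes all six x characters rather than stopping after the first. /x{3}/ matches exactly three x characters in a row (xxx).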

Parentheses as memory, anchoring patterns, alternation

Parentheses as memory:
    abc*      matches ab, abc, abcc, abccc, abcccc etc.
    (abc)*    matches "", abc, abcabc, abcabcabc etc.

Anchoring patterns:
    /fred\b/      matches fred and alfred but not frederick
    /\bfred/      matches fred and frederick but not alfred
    /\bfred\b/    matches fred but neither frederick nor alfred

Alternation:
    (song|blue)bird    matches songbird or bluebird
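The claims in the table above can be checked mechanically. In this sketch each row pairs a pattern with a string and the expected outcome; the ^ and $ anchors in the last three rows are additions that force the pattern to cover the whole string.

```perl
use strict;
use warnings;

# Each row: [ pattern, string, expected result (1 = match, 0 = no match) ].
my @cases = (
    [ qr/fred\b/,          "alfred",    1 ],
    [ qr/fred\b/,          "frederick", 0 ],
    [ qr/\bfred/,          "frederick", 1 ],
    [ qr/\bfred/,          "alfred",    0 ],
    [ qr/\bfred\b/,        "fred",      1 ],
    [ qr/(song|blue)bird/, "bluebird",  1 ],
    [ qr/^(abc)*$/,        "abcabc",    1 ],
    [ qr/^abc*$/,          "abccc",     1 ],
    [ qr/^abc*$/,          "abcabc",    0 ],
);

my $failures = 0;
foreach my $case (@cases) {
    my ($pattern, $string, $expected) = @$case;
    my $got = ($string =~ $pattern) ? 1 : 0;
    $failures++ if $got != $expected;
    printf "%-10s %-22s %s\n", $string, $pattern, ($got ? "match" : "no match");
}
print "unexpected results: $failures\n";
```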

Selecting a different target (the =~ operator)

    $a = "hello world";
    if ($a =~ /he/) {
        # do something …
    }
    $a =~ s/hello/goodbye/;

Special read-only variables

    $_ = "this is a sample string";
    /sam.le/;    # matches "sample" within the string
    # $` is now "this is a " (the text before the match, including the space)
    # $& is now "sample"
    # $' is now " string" (the text after the match, including the space)

More substitutions

    $_ = "this is a test";
    $new = "quiz";
    s/test/$new/;
    # now $_ is "this is a quiz"
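The three special variables can be inspected directly. A short sketch, using =~ on a named variable instead of $_; the brackets in the output make the surrounding spaces visible.

```perl
use strict;
use warnings;

my $text = "this is a sample string";
my ($before, $matched, $after);

if ($text =~ /sam.le/) {
    # Copy at once: the next successful match overwrites these variables.
    ($before, $matched, $after) = ($`, $&, $');
}

print "before  = [$before]\n";
print "matched = [$matched]\n";
print "after   = [$after]\n";
```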

Basic Data Structures

$scalar: a single value.

@array: a list, e.g.

    @flintstones = qw(fred barney betty wilma);
    # $flintstones[2] is "betty"
    foreach $member (0 .. $#flintstones) {
        print "$flintstones[$member]\n";
    }

%hash, e.g. a frequency list %freq built up by:

    $freq{"the"} = 100;
    $freq{"chandelier"} = 1;
    $freq{$string} = 5;
    foreach $key (keys(%freq)) {    # once for each key of %freq
        print "$key was found $freq{$key} times\n";    # show key and value
    }

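Sorting: arrays

    @list = qw(small medium large);
    @sorted = sort @list;
    # @sorted is (large medium small)

    @list = (15, 27, 9, 49, 14);
    @sorted = sort @list;
    # @sorted is (14, 15, 27, 49, 9): the default sort compares strings

    @list = (15, 27, 9, 49, 14);
    @sorted = sort { $a <=> $b } @list;
    # @sorted is (9, 14, 15, 27, 49)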
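The difference between the default string comparison and the numeric <=> comparison is easy to see side by side. A minimal sketch; the descending sort is an extra variation obtained by swapping $a and $b.

```perl
use strict;
use warnings;

my @numbers = (15, 27, 9, 49, 14);

my @as_strings = sort @numbers;                  # default: string comparison
my @ascending  = sort { $a <=> $b } @numbers;    # numeric, smallest first
my @descending = sort { $b <=> $a } @numbers;    # numeric, largest first

print "string order: @as_strings\n";
print "ascending:    @ascending\n";
print "descending:   @descending\n";
```

In string order 9 comes last because the character 9 sorts after the leading 1, 2 and 4 of the other numbers.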

Sorting: hashes

Sort by alphabetic order of keys, or numeric order of values.

    @sortedkeys = sort by_names keys(%freqhash);
    sub by_names { return $a cmp $b; }
    foreach (@sortedkeys) {
        print "$_ is found $freqhash{$_} times\n";
    }

    @sortedkeys = sort by_number keys(%freqhash);
    sub by_number { return $freqhash{$a} <=> $freqhash{$b}; }
    foreach (@sortedkeys) {
        print "$_ is found $freqhash{$_} times\n";
    }
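Sorting a frequency hash by value is the core of the frequency counting mentioned at the start of these slides. The sketch below builds %freqhash from a made-up sentence and lists the words from most to least frequent; the alphabetical tie-break is an added assumption that keeps the output order stable.

```perl
use strict;
use warnings;

my $document = "the cat sat on the mat and the dog sat too";   # made-up sample text

# Build the frequency hash: one entry per distinct word.
my %freqhash;
$freqhash{$_}++ for split ' ', $document;

# Most frequent first; ties are broken alphabetically.
my @by_frequency = sort {
    $freqhash{$b} <=> $freqhash{$a} or $a cmp $b
} keys %freqhash;

print "$_ is found $freqhash{$_} times\n" for @by_frequency;
```

The words at the top of such a list are candidates for a stoplist.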

Array of arrays (2D arrays)

    @AoA = (
        [ "fred", "barney" ],
        [ "george", "jane", "elroy" ],
        [ "homer", "marge", "bart" ],
    );
    print $AoA[2][1];    # prints "marge"

    for $x (0 .. 9) {
        for $y (0 .. 9) {
            $AoA[$x][$y] = $x * $y;
        }
    }

    while (<>) {                # read in a line of input
        @tmp = split;           # split elements into a 1D array
        push @AoA, [ @tmp ];    # add 1D array as the next row of a 2D array
    }

    for $i (0 .. $#AoA) {           # for each row in AoA
        $row = $AoA[$i];            # $row refers to one row (note the $, although it stands for an array)
        for $j (0 .. $#{$row}) {    # for each element of that row
            print "element $i $j is $AoA[$i][$j]\n";
        }
    }
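The three operations on this slide (filling by subscript, appending a row, walking the rows) can be combined into one small runnable sketch. The table size and the sample input line are made up; a real program would read the line with <>.

```perl
use strict;
use warnings;

my @AoA;

# Fill a 3 x 4 multiplication table by subscript.
for my $x (0 .. 2) {
    for my $y (0 .. 3) {
        $AoA[$x][$y] = $x * $y;
    }
}

# Append one more row built from a whitespace-separated line of text.
my $line = "10 20 30 40";
push @AoA, [ split ' ', $line ];

# Walk the rows; each $row is a reference to a 1D array.
for my $i (0 .. $#AoA) {
    my $row = $AoA[$i];
    print join(" ", @$row), "\n";
}
```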

Hashes of Hashes

    %HoH = (
        flintstones => {
            husband => "fred",
            pal     => "barney",
        },
        jetsons => {
            husband   => "george",
            wife      => "jane",
            "his boy" => "elroy",
        },
        simpsons => {
            husband => "homer",
            wife    => "marge",
            kid     => "bart",
        },
    );

To add another hash to the hash of hashes, you can simply say:

    $HoH{mash} = {
        captain  => "pierce",
        major    => "burns",
        corporal => "radar",
    };

Populating a Hash of Hashes

Here is one technique for populating a hash of hashes. To read from a file with the following format:

    flintstones: husband=fred pal=barney wife=wilma pet=dino

    while (<>) {
        next unless s/^(.*?):\s//;    # look for characters from start of line to colon
        $who = $1;                    # $1 is the first parenthesised part of the regular expression
        for $field (split) {          # for each remaining whitespace-separated field in the line
            ($key, $value) = split /=/, $field;    # cut each key=value pair at =
            $HoH{$who}{$key} = $value;
        }
    }

To set a key/value pair, and print out a hash of hashes

You can set a key/value pair of a hash of hashes as follows:

    $HoH{flintstones}{wife} = "wilma";

To print out the families, loop through all the keys of the outer hash and then loop through the keys of the inner hash:

    for $family (keys %HoH) {
        print "$family: ";
        for $role (keys %{ $HoH{$family} }) {
            print "$role=$HoH{$family}{$role} ";
        }
        print "\n";
    }
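Populating and printing can be tried together without an input file. In this sketch the input lines are held in an array (the second and third lines are made up), and the keys are sorted before printing so the output order is repeatable, which plain keys does not guarantee.

```perl
use strict;
use warnings;

my @lines = (
    "flintstones: husband=fred pal=barney wife=wilma pet=dino",
    "jetsons: husband=george wife=jane",
    "this line has no family prefix",
);

my %HoH;
foreach my $line (@lines) {
    # Strip "name: " from the front and remember the name; skip other lines.
    next unless $line =~ s/^(.*?):\s*//;
    my $who = $1;
    foreach my $field (split ' ', $line) {
        my ($key, $value) = split /=/, $field;
        $HoH{$who}{$key} = $value;
    }
}

foreach my $family (sort keys %HoH) {
    print "$family: ";
    foreach my $role (sort keys %{ $HoH{$family} }) {
        print "$role=$HoH{$family}{$role} ";
    }
    print "\n";
}
```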

More advanced data structures Also possible: Arrays of hashes, hashes of arrays, hashes of functions and more elaborate records. See chapter 9 of “Programming Perl” by Larry Wall, Tom Christiansen & Jon Orwant, O’Reilly, 3rd edition.

ELIZA (1)

Substitutions may use memory, e.g.

    /the (.*)er they were, the \1er they will be/

will match “the bigger they were, the bigger they will be” but not “the bigger they were, the faster they will be”.

Substitutions using memory are very useful in implementing a simple natural-language understanding program like ELIZA (Weizenbaum, 1966), which could carry on conversations like the following:

ELIZA (2)

    User:  Men are all alike.
    ELIZA: IN WHAT WAY
    User:  They’re always bugging us about something or other.
    ELIZA: CAN YOU THINK OF A SPECIFIC EXAMPLE
    User:  Well, my boyfriend made me come here.
    ELIZA: YOUR BOYFRIEND MADE YOU COME HERE
    User:  He says I’m depressed much of the time.
    ELIZA: I AM SORRY TO HEAR THAT YOU ARE DEPRESSED.

ELIZA works by searching the user’s sentence for regular expressions and substituting them, e.g. s/my/YOUR/ and s/I’m/YOU ARE/, and then:

    s/.* YOU ARE (depressed|sad).*/I AM SORRY TO HEAR YOU ARE \1/
    s/.* always.*/CAN YOU THINK OF A SPECIFIC EXAMPLE/
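The substitutions on this slide can be chained into a tiny responder. This is a rough sketch, not Weizenbaum's program: eliza_reply is a hypothetical helper, the \b word boundaries, the upper-casing with \U and the IN WHAT WAY fallback are assumptions added here, and the input must use a plain ASCII apostrophe.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the slide's substitutions in order and
# return the first response whose pattern matches.
sub eliza_reply {
    my ($sentence) = @_;
    my $reply = $sentence;

    # Pronoun rewriting first, so later rules see YOUR / YOU ARE.
    $reply =~ s/\bmy\b/YOUR/gi;
    $reply =~ s/\bI'm\b/YOU ARE/g;

    # Response rules; $1 is the remembered (depressed|sad).
    return $reply
        if $reply =~ s/.*\bYOU ARE (depressed|sad).*/I AM SORRY TO HEAR YOU ARE \U$1/;
    return $reply
        if $reply =~ s/.*\balways\b.*/CAN YOU THINK OF A SPECIFIC EXAMPLE/;

    return "IN WHAT WAY";    # assumed fallback when no rule applies
}

foreach my $input ("He says I'm depressed much of the time.",
                   "They're always bugging us about something or other.") {
    print "User:  $input\n";
    print "ELIZA: ", eliza_reply($input), "\n";
}
```

Only the two response rules shown on the slide are implemented, so any other sentence receives the fallback.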
