Introduction to Perl Pawel Sirotkin 28.11-01.12.2008, Riga.


1 Introduction to Perl Pawel Sirotkin 28.11-01.12.2008, Riga

2 Overview
Introduction to Perl, NLL Riga 2008, by Pawel Sirotkin
- About programming
- Why Perl?
- How to write, how to run
- Variables
- Operations
- Basic input and output
- Conditionals and loops
- Regular expressions

3 About programming
- Working with algorithms
- A program needs to contain exact commands
  - (Mostly) not: Go buy some bread
  - But: Put on your coat and shoes, open the door, go through it, close the door, go down the stairs…
- Has a certain input
- Processes it
- Produces a certain output

4 Why Perl?
- Easy to learn
- Simple syntax
- Good at manipulating text
- Good at dealing with regular expressions

5 How to write a Perl program
- Perl programs can be written in any text editor
  - Notepad, vim, even Word…
  - Recommended: a simple text editor with syntax highlighting
- Write the program code
- Save the file as xxx.pl
  - The .pl extension is not necessary, but useful

6 What is a Perl program like?

    # This *very* simple program prints "Hello World!"
    print "Hello World!";

7 What is a Perl program like?
- The content of a line after the # is commentary. It is ignored by the program
- What are comments for, then?
  - They are for you, and for others who will have to read the code
  - Imagine looking at a complex program in a few months and trying to figure out what it does
  - Write as much commentary as you can

    # This *very* simple program prints "Hello World!"
    print "Hello World!";

8 What is a Perl program like?
- The print line is a Perl command
  - In this case, for printing text on the screen
- Every command should start on a new line
  - Not a Perl requirement, but crucial for readability
- Every command should end with a semicolon;
- Many commands take arguments
  - Here: "Hello World!"

    # This *very* simple program prints "Hello World!"
    print "Hello World!";

9 What to do with the program?
- Perl works from the command line
  - Windows: "Start" → "Run…"
- Go to the directory where you saved the program
  - E.g.: cd C:\Perl\MyPrograms
- Run the program:
  - perl myprogram.pl
- See the results of your labours!

10 Exercise (1)
- Create a folder for your Perl programs
- Open the editor of your choice and write the "Hello World" program
  - The command is print "Hello World!";
  - Don't forget the commentary!
- Save the program
- Run it!
- What happens if you misspell the print command?

11 Variables
- The "Hello World" program always has the same output
  - Not a very useful program, as such
- We need to be able to change the output
- Variables are objects that can hold different values

12 Defining variables
- To define a variable, write a dollar sign followed by the variable's name
- Names should consist of letters, numbers and the underscore
  - They should start with a letter
- Variable names are case-sensitive!
  - $a and $A are different variables!
- Generally, a variable's name should tell you what the variable does

    # We define a variable "a" and assign it a value of 42
    $a = 42;

13 Defining variables
- Variables can be assigned values
  - Strings: text (a character sequence) in single or double quotes
  - Numbers
- $a = 42;
- $a = "some text";

    # We define a variable "a" and assign it a value of 42
    $a = 42;

14 Changing variables
- Arithmetic operations
  - $a = 42 / 2;      # division
  - $a = 42 + 5;      # addition
  - $a = $b * 2;      # multiplication
  - $a = $a - $b;     # subtraction
- Also useful:
  - $a += 42;         # the same as $a = $a + 42;
  - The same works for -, * and /
- String operations
  - $a = "some" . " text";     # concatenation
  - $a = $a . " more text";
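The operations on this slide can be tried out in one short script. This is only a sketch: the variable names and starting values are made up for illustration, and `use strict`, `use warnings` and `my` are additions that the slides themselves do not use.

```perl
use strict;
use warnings;

my $step  = 4;               # hypothetical starting value
my $total = 42 / 2;          # division: 21
$total = $total - $step;     # subtraction: 17
$total += 3;                 # same as $total = $total + 3, giving 20
$total *= 2;                 # same as $total = $total * 2, giving 40

my $text = "some" . " text"; # concatenation with the dot operator
$text .= " and more";        # .= appends, just as += adds

print "$total\n";            # prints 40
print "$text\n";             # prints some text and more
```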

15 Basic output
- We have already seen an output command
  - print "text";
  - print $a;
  - print "text $a";
  - print "text " . ($a + $b) . " more text.";
    - The parentheses are needed: + and . have the same precedence, so without them Perl would first join "text " and $a and then try to add $b to the result
- Special characters:
  - \n: new line
  - \t: tabulator
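A small sketch of the output forms above, with made-up variable names and values. It builds the strings first and prints them at the end, which makes it easy to see what each form produces.

```perl
use strict;
use warnings;

my $apples = 3;    # hypothetical values
my $pears  = 4;

# Variables are interpolated inside double quotes.
my $line1 = "I have $apples apples.\n";

# Parentheses make sure the addition happens before the concatenation.
my $line2 = "Fruit in total: " . ($apples + $pears) . ".\n";

# \t inserts a tab, \n a line break.
my $line3 = "apples\t$apples\npears\t$pears\n";

print $line1, $line2, $line3;
```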

16 Exercise (2)
- Define a variable
- Assign it a value of 15
- Print it
- Double the value
- Print it again
- Define another variable with the string "apples"
- Print both variables
- Change the first variable to its square and the second to "pears"
- Print both variables

17 Basic input
- The <> operator returns a line of input from the standard source (usually, the keyboard)
- Syntax:
  - $a = <>;
- Don't forget to tell the user what to enter!
- Try the following program:

    # This program asks the user for a name and greets them
    print "What is your name? ";
    $name = <>;
    print "Hello $name!";

18 Input, output and new lines
- As the user input is followed by the [Enter] key, the string in $name ends in a new line
- The chomp function deletes the new line at the end of a string
- Try the following, modified program:

    # This program asks the user for a name and greets them
    print "What is your name? ";
    $name = <>;
    chomp($name);
    print "Hello $name!";
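To see what chomp does without typing anything, the input can be simulated with a string that already ends in a new line. The name and the variable names below are made up for this sketch.

```perl
use strict;
use warnings;

my $name   = "Anna\n";          # what <> returns after typing Anna and [Enter]
my $before = length($name);     # 5: four letters plus the new line
chomp($name);                   # removes the trailing new line, if there is one
my $after  = length($name);     # 4

print "Hello $name! ($before characters before chomp, $after after)\n";
```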

19 Exercise (3)
- Let the user enter the radius of a circle
- Print the diameter (2r), circumference (2πr) and area (πr²) of the circle
- Try doing this using one variable for each measure
- Try doing this using only one variable

20 If, else
- Until now, our programs have always followed one fixed course
- The if clause allows us to take different actions in different circumstances

    # Let's try out a conditional clause
    print "Please enter password: ";
    $password = <>;
    if ($password == 42) {
        print "Correct password! Welcome.";
    } else {
        print "Wrong password! Access denied.";
    }

21 If, else
- Note: = is the assignment operator, == is the (numeric) comparison operator
- else is an optional branch; its block runs when the if condition is false

    # Let's try out a conditional clause
    print "Please enter password: ";
    $password = <>;
    if ($password == 42) {
        print "Correct password! Welcome.";
    } else {
        print "Wrong password! Access denied.";
    }

22 Exercise (4)
- Try out the password program.
  - Why doesn't it work correctly? Fix it.
- Tell the user if the number entered is too large or too small
  - Hint: The comparison operators you'll need are < and >
- Ask the user for a geometrical form (circle or square), and then for a radius or side length. Return the area and perimeter.

23 While
- What if we want to do checks until something happens?
- The while loop repeats its commands as long as its condition is true, and stops as soon as the condition becomes false
- Note: in the example below, $password has no value at first, so in particular it doesn't have the value 42

    # Now on to a "while" loop
    while ($password != 42) {
        print "Access denied.\n";
        print "Please enter password: ";
        $password = <>;
        chomp($password);
    }
    print "Correct password! Welcome.";
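A while loop does not need user input. As a minimal sketch with made-up variable names, the countdown below repeats its block as long as the condition holds and leaves the loop once it no longer does.

```perl
use strict;
use warnings;

my $count  = 3;     # hypothetical starting value
my $output = "";

while ($count > 0) {            # checked before every round
    $output .= "$count... ";    # collect the text for this round
    $count = $count - 1;        # without this line the loop would never end
}

print $output, "Liftoff!\n";    # prints 3... 2... 1... Liftoff!
```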

24 Exercise (5)
- Write a small game: take a number, and make the user guess it. Say whether the guess is too high or too low. If the user gets it right, the program terminates.
- If you like, you can take a random number (here, a whole number from 0 to 9):
  - $random = int(rand(10));

25 Perl regular expressions
- Regular expressions are very useful for text processing
- Perl matching operator: =~
- Perl non-matching operator: !~
- The regular expression must be enclosed in slashes: /regex/
- The program below accepts any password that contains the characters "42" anywhere

    # A "while" loop with regular expressions
    while ($password !~ /42/) {    # While the entered line doesn't contain "42"
        print "Access denied.\n";
        print "Please enter password: ";
        $password = <>;
        chomp($password);
    }
    print "Correct password! Welcome.";

26 Perl regular expressions
- Simple string: some text
- One of a number of symbols: [aA]
  - Matches a or A
  - Also possible: [tT]he, matching the or The
- One of a continuous range of symbols: [a-h][1-8]
  - Matches any two-character string from a1 to h8
- Special characters
  - ^ matches the beginning of a line
  - $ matches the end of a line
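The character sets and anchors above can be checked against a few test strings. This is a sketch with made-up words; the foreach loop, which walks through a list one element at a time, is not covered in the slides.

```perl
use strict;
use warnings;

my $the_matches    = 0;
my $square_matches = 0;

# ^ and $ force the pattern to cover the whole string,
# so "other" does not count although it contains "the".
foreach my $word ("The", "the", "other") {
    if ($word =~ /^[tT]he$/) {
        $the_matches = $the_matches + 1;
    }
}

# A letter from a to h followed by a digit from 1 to 8.
foreach my $square ("e4", "h8", "i9", "a0") {
    if ($square =~ /^[a-h][1-8]$/) {
        $square_matches = $square_matches + 1;
    }
}

print "$the_matches words and $square_matches squares matched\n";
```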

27 Perl regular expressions
- More special characters
- Wildcard: the dot. Matches any single character
  - b.d matches bad, bed, bid, bud…
  - Don't forget: it also matches forbid, badly…
- + matches one or more occurrences of the previous character
  - re+d matches red and reed (and also reeed and so on!)
- * matches zero or more occurrences of the previous character
  - bel* matches be, bel and bell (and belll…)
- ? matches zero or one occurrence of the previous character
  - soo?n matches son or soon
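A quick way to get a feeling for these special characters is to test them on strings that should and should not match. The test strings below are made up for this sketch.

```perl
use strict;
use warnings;

my $result = "";

# The dot matches any character, also inside a longer word.
if ("forbid" =~ /b.d/)    { $result .= "forbid matches b.d\n"; }

# + needs at least one e between r and d.
if ("rd" !~ /re+d/)       { $result .= "rd does not match re+d\n"; }

# * also accepts zero occurrences of l.
if ("be" =~ /^bel*$/)     { $result .= "be matches bel*\n"; }

# ? allows at most one extra o, so three o's are too many.
if ("sooon" !~ /^soo?n$/) { $result .= "sooon does not match soo?n\n"; }

print $result;
```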

28 Perl regular expressions
- Character classes
- \d: digits
  - Rule \d+ matches Rule 1, Rule 2, ..., Rule 334...
- \w: "word characters" (letters, digits, _)
  - \w+ \w+ matches any two "words" separated by a blank
- \s: any whitespace (blanks, tabs)
  - ^\s*\d matches any line where the first non-whitespace character is a digit
- Capitalize the symbols to get the opposite
  - \S is anything but whitespace, \D are non-digits…
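The character classes can be tried on a few sample strings. The strings and the counter variable are made up for this sketch.

```perl
use strict;
use warnings;

my $hits = 0;

if ("Rule 334"    =~ /Rule \d+/) { $hits += 1; }   # one or more digits after "Rule "
if ("hello world" =~ /\w+ \w+/)  { $hits += 1; }   # two words separated by a blank
if ("   7 dwarfs" =~ /^\s*\d/)   { $hits += 1; }   # optional whitespace, then a digit
if ("no digits"   =~ /^\D+$/)    { $hits += 1; }   # only non-digit characters

print "$hits of 4 patterns matched\n";
```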

29 Exercise (6)
- Write a program which asks the user for an e-mail address.
- Check if the address is syntactically correct.
- Possible rules:
  - Must contain an @ character
  - At least one symbol before it
  - Must contain a dot
  - At least two symbols between the @ and the dot
  - At least two symbols after the dot
  - No fancy symbols like {§*
- Do you accept addresses with more than one dot?

30 Perl regular expressions
- Switches
  - Tell Perl how to deal with the regular expression
- /regex/i: ignore lower/upper case
  - /wiebke/i matches Wiebke and wiebke
- s/regex/replacement/: substitute whatever regex matches with the replacement text
  - $text =~ s/Mark/Euro/
- /regex/g: repeat the match until the end of the string

    # What the //g switch does
    $text = "The meat costs 10 Mark, the fish costs 15 Mark.";
    $text2 = $text;
    $text =~ s/Mark/Euro/;      # "The meat costs 10 Euro, the fish costs 15 Mark."
    $text2 =~ s/Mark/Euro/g;    # "The meat costs 10 Euro, the fish costs 15 Euro."
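Switches can be combined. As a sketch with a made-up sentence, the substitution below uses both g and i; it also shows that a substitution with /g returns the number of replacements, and that a careless pattern can change more than intended.

```perl
use strict;
use warnings;

my $text = "Mark paid 10 Mark; mark my words.";   # hypothetical example sentence
my $copy = $text;

# g: replace every match, i: ignore upper/lower case.
my $count = ($copy =~ s/mark/Euro/gi);

print "$count replacements: $copy\n";
# The name "Mark" and the verb "mark" were replaced as well.
```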

31 Perl regular expressions
- Grouping
  - Allows us to reuse the matched string
  - /(text)/ matches text and stores it in a variable
  - The first group is stored in $1, the second in $2...

    # Substitution and grouping
    $sum = 0;    # initializing the variable with zero
    $text = "The meat costs 10 Mark, the fish costs 15 Mark.";
    while ($text =~ s/(\d+) Mark/$1 Euro/) {    # digits, a space, "Mark"
        $sum = $sum + $1;    # adding amount to $sum value
    }
    print "Substituted $sum Mark for Euro!";
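Groups are also useful without substitution, simply to pull pieces out of a line. A minimal sketch, assuming a made-up line format of the form item, colon, amount, currency:

```perl
use strict;
use warnings;

my $entry = "fish: 15 Mark";    # hypothetical input line
my ($item, $amount, $currency) = ("", 0, "");

# Three groups: a word, a number and another word.
if ($entry =~ /(\w+): (\d+) (\w+)/) {
    $item     = $1;
    $amount   = $2;
    $currency = $3;
}

print "$item costs $amount $currency\n";    # prints fish costs 15 Mark
```

Copying $1, $2 and $3 into named variables right after the match is a common habit, because the next successful match overwrites them.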

32 Reading files
- What if we want to have input from a file, not from the user?
- Open file for reading:
  - open(INPUT, "<file.ext");
- Read a line:
  - $line = <INPUT>;
  - $line = <>;    # is just a special case

33 Writing files
- What if we want to print to a file, not to the screen?
- Open file for writing:
  - open(OUTPUT, ">file.ext");
- Write:
  - print OUTPUT "Some text...";
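Writing and reading can be combined in one round trip. The sketch below is a variant of the slide code, not the same form: it uses the three-argument open with a variable as file handle and stops with an error message if the file cannot be opened. The file name is made up, and the file is deleted again at the end.

```perl
use strict;
use warnings;

my $file = "perl_intro_demo.txt";    # hypothetical file name

# Write two lines to the file.
open(my $out, ">", $file) or die "Cannot write $file: $!";
print $out "first line\n";
print $out "second line\n";
close($out);

# Read the file back, line by line.
open(my $in, "<", $file) or die "Cannot read $file: $!";
my $count = 0;
while (my $line = <$in>) {
    chomp($line);
    $count = $count + 1;
    print "$count: $line\n";
}
close($in);

unlink($file);    # remove the demo file again
```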

34 Reading files
- A program for testing e-mail addresses
- Note: If we want to use a special character literally, we need to escape it with a backslash
  - In strings: "
  - In regular expressions: . + * ^ $ and the backslash \ itself

    open(INPUT, "<test.txt");
    while ($line = <INPUT>) {
        chomp($line);
        if ($line =~ /^.+@..+\...+$/) {    # testing for e-mail: x@xx.xx
            print "\"$line\" is a valid e-mail address.\n";
        } else {
            print "E-mail address \"$line\" not valid.\n";
        }
    }

35 Exercise (7)
- Make a text file and fill it with a Wikipedia article
- Count the number of definite and indefinite articles
- Count the number of numbers and digits
- Insert a tag before every number

36 Arrays
- Arrays contain lists of values
- Syntax:
  - @days = ("Monday", "Tuesday", "Friday");
  - $days[0] = "Saturday";
  - $day = $days[2];
- Useful for storing linear sequences of values
- Note: @ for whole lists, $ for single elements

37 Arrays
- Useful array commands
- push(@array, "element");
  - Adds a new element to the end of the array
  - Creates the array if necessary
- $element = pop(@array);
  - Moves the last value of @array to $element

    # Trying out arrays
    @tags = ("N", "V", "Adj");
    $tag1 = pop(@tags);           # $tag1 is now "Adj", @tags is ("N", "V")
    $tag2 = pop(@tags);           # $tag2 is now "V", @tags is ("N")
    push(@tags, $tag2, $tag1);    # @tags is now again ("N", "V", "Adj")
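A short sketch that combines element access with push and pop. The variable names are made up, and scalar(@days), which returns the number of elements, is an addition not mentioned on the slides.

```perl
use strict;
use warnings;

my @days = ("Monday", "Tuesday", "Friday");

push(@days, "Saturday");         # the array now has four elements
my $last    = pop(@days);        # takes "Saturday" off the end again
my $first   = $days[0];          # counting starts at 0
my $howmany = scalar(@days);     # number of elements left

print "First: $first, removed: $last, left: $howmany\n";
```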

38 Hashes
- Hashes are associative arrays
- They are lists where the elements are not ordered, but identified by a "name"
- Syntax:
  - %probability = ("verb", 0.32, "adjective", 0.02, "adverb", 0);
  - $probability{"noun"} = 0.52;
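To look at every entry of a hash, one can walk through its names (keys). The sketch below reuses the values from the slide; keys, sort and foreach are standard Perl but are not introduced in the slides.

```perl
use strict;
use warnings;

my %probability = ("verb", 0.32, "adjective", 0.02, "adverb", 0);
$probability{"noun"} = 0.52;

my $total = 0;

# keys returns the names in no particular order, so we sort them.
foreach my $tag (sort keys %probability) {
    print "$tag\t$probability{$tag}\n";
    $total = $total + $probability{$tag};
}

print "Sum of all probabilities: $total\n";
```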

39 Exercise (8)
- What happens if you try to print an array?
- What about a hash?
- What happens if you convert an array into a hash, or the other way round?

40 Practical: Tokenizer
- Take a Wikipedia article and put it into a text file
  - Clean it up if necessary
- Tokenize it!
  - We only want one word per line
  - Insert a "sentence boundary" symbol where appropriate
  - The output should be another file
- Think about what choices you make and why!

41 Practical: Tagger
- Take the POS-annotated corpus from treebank.txt
  - Clean and tokenize it
- Count the tag-token probabilities
- Count the transition probabilities
  - For a first attempt, I strongly recommend bigrams
- Apply the Viterbi algorithm and tag an input file of your choice!

42 Practical: Tagger++
- If it's still too easy, or if you want a long-term aim:
- Implement smoothing: words can have tags you haven't seen them with, or appear in contexts you have never seen them in before
- Try to figure out a way to guess the tags for unknown words better
- Write a program to train on 9/10 of the corpus, and test it on the rest.
  - Compare your results to the actual annotations
  - Do this 10 times, once for each choice of the 9/10
- Still too easy? Implement trigrams and compare the results.

