Published by Nyah Atwater, modified over 9 years ago
1
An Introduction to Perl with Applications in Web Page Scraping
2
What is Perl?
- Practical Extraction and Report Language
- High level
- General purpose
- Interpreted, dynamic programming language
- Borrows from Unix shell scripting languages
- Ideal for "small" tasks that involve text processing
3
What is going to be taught during this workshop?
Most of this presentation draws on the www.perl.com introduction.
- Perl language constructs
  - Variables
  - Flow control
  - String processing
  - File I/O
  - Subroutines
- Object-oriented Perl
- Application: Web page scraping
4
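Hello World
    > perl -e 'print "hello world\n"'
    hello world
    > perl -e 'print "hello ", "world\n"'
    hello world
    > perl -e "print 'hello ', 'world\n'"
    hello world\n>
Single-quoted Perl strings do not interpret \n, so the last command prints a literal backslash and n with no newline, leaving the prompt on the same line.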
5
Scalars
Single things: a number or a string
    $fruitCount = 5;
    $fruitType = 'apples';
    $countReport = "> There are $fruitCount $fruitType";
    print "$countReport\n";
    > There are 5 apples
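Without declarations, a misspelled variable name (for example `$count_report` instead of `$countReport`) is silently treated as a new, empty variable. A minimal sketch of the same example as a script, assuming the common `use strict; use warnings;` preamble, under which every variable is declared with `my` and such a typo becomes a compile-time error. It also contrasts double quotes (which interpolate) with single quotes (which do not):

```perl
#!/usr/bin/perl
use strict;     # require every variable to be declared with my
use warnings;   # warn about suspicious constructs

my $fruitCount = 5;
my $fruitType  = 'apples';

# Double quotes interpolate scalars; single quotes keep the text literally.
my $countReport = "There are $fruitCount $fruitType";
my $literal     = 'There are $fruitCount $fruitType';

print "> $countReport\n";   # > There are 5 apples
print "> $literal\n";       # > There are $fruitCount $fruitType
```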
6
Scalars continued
    $a = "8";
    $b = $a + "1";    # + is numeric addition
    print "> $b\n";
    > 9
    $c = $a . "1";    # . is string concatenation
    print "> $c\n";
    > 81
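Perl chooses the operation from the operator, not from the operands: `+` always adds numerically and `.` always concatenates strings. A small sketch that checks both results from the slide (the variable names are illustrative):

```perl
use strict;
use warnings;

my $eight = "8";            # a string that looks like a number
my $sum   = $eight + "1";   # numeric context: 8 + 1
my $cat   = $eight . "1";   # string context: "8" followed by "1"

print "> $sum\n";   # > 9
print "> $cat\n";   # > 81
```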
7
Even more scalar examples
(Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html)
    $a = 5;
    $a++;        # $a is now 6; we added 1 to it.
    $a += 10;    # Now it's 16; we added 10.
    $a /= 2;     # And divided it by 2, so it's 8.
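The same compound-assignment pattern (`op=`) also works for the other arithmetic operators and for strings (`.=` appends, `x=` repeats). A short sketch tracing one variable through several of them; the extra steps beyond the slide are illustrative:

```perl
use strict;
use warnings;

my $n = 5;
$n++;        # 6
$n += 10;    # 16
$n /= 2;     # 8
$n *= 3;     # 24
$n -= 4;     # 20
$n %= 7;     # 6 (remainder of 20 / 7)

my $s = "Perl";
$s .= " rocks";   # "Perl rocks"
$s x= 2;          # "Perl rocksPerl rocks"

print "$n\n$s\n";
```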
8
Arrays
(Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html)
Lists of scalars
    @months = ("July", "August", "September");
    print $months[0];         # This prints "July".
    $months[2] = "Smarch";
If an array doesn't exist, you'll create it when you assign a value to one of its elements.
    $winterMonths[0] = "December";    # This implicitly creates @winterMonths.
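A sketch of the slide's array operations under `use strict`, adding `scalar(@array)` for the element count and a negative index (`-1` is the last element). Note that under strict the array itself still has to be declared, even though assigning to an element grows it:

```perl
use strict;
use warnings;

my @months = ("July", "August", "September");
print "$months[0]\n";          # July
$months[2] = "Smarch";         # replace the third element

my @winterMonths;              # under strict, declare before use
$winterMonths[0] = "December"; # assigning to an element grows the array

my $count = scalar(@months);   # 3 elements
my $last  = $months[-1];       # "Smarch"
```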
9
Arrays continued
(Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html)
To find the last index of an array, use $#arrayname:
    print "> $#months\n";
    > 2
If the array is empty or doesn't exist, -1 is returned.
You can also resize an array by assigning to $#arrayname:
    $#months = 0;    # Now @months only contains "July"
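`$#array` is always one less than the element count, and assigning to it truncates (or extends) the array. A quick sketch confirming the three facts on the slide:

```perl
use strict;
use warnings;

my @months = ("July", "August", "Smarch");
my @empty;

my $lastIndex  = $#months;   # 2
my $emptyIndex = $#empty;    # -1 for an empty array

$#months = 0;                # truncate: only "July" remains
my @remaining = @months;

print "> $lastIndex $emptyIndex @remaining\n";   # > 2 -1 July
```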
10
Hashes
(Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html)
Map a key to a value
    %daysInMonth = ( "July" => 31, "August" => 31, "September" => 30 );
    print "> $daysInMonth{'September'}\n";
    > 30
To add a new key and value:
    $daysInMonth{"February"} = 28;
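A sketch of the slide's hash under `use strict`, also using `exists` to ask whether a key is present. This is useful because looking up a missing key simply returns `undef` rather than failing:

```perl
use strict;
use warnings;

my %daysInMonth = (
    "July"      => 31,
    "August"    => 31,
    "September" => 30,
);

print "> $daysInMonth{'September'}\n";   # > 30

$daysInMonth{"February"} = 28;           # add a new key/value pair

my $hasFeb = exists $daysInMonth{"February"} ? 1 : 0;   # 1
my $hasMay = exists $daysInMonth{"May"}      ? 1 : 0;   # 0
```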
11
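Hashes continued
(Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html)
Counting the keys: in scalar context, such as string concatenation, keys returns the number of keys (in list context it returns the keys themselves).
    print "> " . keys(%daysInMonth) . "\n";
    > 3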
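Because hash order is not defined, loops over a hash usually go through `sort keys`. A sketch showing `keys` in both scalar context (a count) and list context (the keys themselves):

```perl
use strict;
use warnings;

my %daysInMonth = ("July" => 31, "August" => 31, "September" => 30);

my $numKeys = keys %daysInMonth;        # scalar context: 3
my @names   = sort keys %daysInMonth;   # list context, sorted alphabetically

for my $month (@names) {
    print "> $month has $daysInMonth{$month} days\n";
}
```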
12
For loops
    print "> ";
    for ($i = 0; $i <= 5; $i++) {
        print "$i ";
    }
    print "\n";
    > 0 1 2 3 4 5
13
For loops
(Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html)
Iterating over a list
    print "> ";
    for $i (5, 4, 3, 2, 1) {
        print "$i ";
    }
    print "\n";
    > 5 4 3 2 1
14
For loops continued
(Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html)
    @one_to_ten = (1 .. 10);
    $top_limit = 25;
    for $i (@one_to_ten, 15, 20 .. $top_limit) {
        print "$i\n";
    }
This prints 1 through 10, then 15, then 20 through 25, one per line.
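To see exactly which values the combined list produces, the same list can be collected into an array instead of printed. A sketch (the `@values` array is illustrative):

```perl
use strict;
use warnings;

my @one_to_ten = (1 .. 10);
my $top_limit  = 25;

# The list flattens into 1..10, then 15, then 20..25.
my @values;
for my $i (@one_to_ten, 15, 20 .. $top_limit) {
    push @values, $i;
}

print scalar(@values), " values: @values\n";
```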
15
One more for loop
(Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html)
    for $marx ('Groucho', 'Harpo', 'Zeppo', 'Karl') {
        print "> $marx is my favorite Marx brother.\n";
    }
    > Groucho is my favorite Marx brother.
    > Harpo is my favorite Marx brother.
    > Zeppo is my favorite Marx brother.
    > Karl is my favorite Marx brother.
16
While loop
(Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html)
    my $count = 0;
    print "> ";
    while ($count != 3) {
        $count++;
        print "$count ";
    }
    print "\n";
    > 1 2 3
17
Until loop
(Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html)
    $count = 3;
    print "> ";
    until ($count == 0) {
        $count--;
        print "$count ";
    }
    print "\n";
    > 2 1 0
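`until (COND)` runs its body while COND is false, so it behaves like `while (not COND)`. A sketch that records each loop's output in an array so the results of the two slides can be checked side by side:

```perl
use strict;
use warnings;

# while: run as long as the condition is true
my $count = 0;
my @up;
while ($count != 3) {
    $count++;
    push @up, $count;
}

# until: run as long as the condition is false
$count = 3;
my @down;
until ($count == 0) {
    $count--;
    push @down, $count;
}

print "> @up\n";     # > 1 2 3
print "> @down\n";   # > 2 1 0
```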
18
if/elsif/else
(Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html)
    if ($a == 5) {
        print "It's five!\n";
    } elsif ($a == 6) {
        print "It's six!\n";
    } else {
        print "It's something else.\n";
    }
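Wrapping the chain in a subroutine makes it easy to try several values. `classify` is a hypothetical name used only for this sketch; it returns the message instead of printing it:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the message the slide's if/elsif/else would print.
sub classify {
    my ($n) = @_;
    if ($n == 5) {
        return "It's five!";
    } elsif ($n == 6) {
        return "It's six!";
    } else {
        return "It's something else.";
    }
}

print classify($_), "\n" for (5, 6, 7);
```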
19
Unless
(Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html)
    unless ($pie eq 'apple') {
        print "Ew, I don't like $pie flavored pie.\n";
    } else {
        print "Apple! My favorite!\n";
    }
20
Comparing unless and if
These two statements are equivalent; each prints unless $day is 'Friday':
    print "I'm burning the 7pm oil\n" unless $day eq 'Friday';
    print "I'm burning the 7pm oil\n" if not ($day eq 'Friday');
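Both statement-modifier forms skip the action only on Fridays. A sketch that checks this for a few sample days by recording whether each form would have fired (the `%printed` hash and the day list are illustrative):

```perl
use strict;
use warnings;

my %printed;   # day => [fired via unless, fired via if not]
for my $day ('Monday', 'Thursday', 'Friday', 'Saturday') {
    my ($viaUnless, $viaIfNot) = (0, 0);
    $viaUnless = 1 unless $day eq 'Friday';
    $viaIfNot  = 1 if not($day eq 'Friday');
    $printed{$day} = [$viaUnless, $viaIfNot];
}

for my $day (sort keys %printed) {
    print "$day: unless=$printed{$day}[0] if-not=$printed{$day}[1]\n";
}
```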
21
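String operations
    $yes_no = 'no';
    print "> affirmative\n" if $yes_no == 'yes';
    > affirmative
Strings are automatically converted to numbers for numeric operators like ==. Neither 'no' nor 'yes' looks like a number, so both become 0 and the test is true.
Use eq instead of == to compare strings:
    print "> affirmative\n" if $yes_no eq 'yes';    # prints nothing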
22
More string comparisons
    my $five = 5;
    print "> Numeric equality!\n" if $five == " 5 ";
    print "> String equality!\n" if $five eq "5";
    > Numeric equality!
    > String equality!
    print "> No string equality!\n" if not($five eq " 5");
    > No string equality!
== converts " 5 " to the number 5; eq compares characters, so the leading space makes " 5" differ from "5".
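With `use warnings` enabled, Perl reports a comparison like `$yes_no == 'yes'` at run time instead of silently treating both strings as 0. A sketch that collects those warnings through the `$SIG{__WARN__}` hook so the difference between `==` and `eq` is visible in one place:

```perl
use strict;
use warnings;

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };   # collect warnings instead of printing them

my $yes_no      = 'no';
my $numericSays = ($yes_no == 'yes') ? 1 : 0;   # both sides numify to 0, so this is true
my $stringSays  = ($yes_no eq 'yes') ? 1 : 0;   # compares characters, so this is false

my $five      = 5;
my $trimmedEq = ($five eq "5")  ? 1 : 0;   # true: same characters
my $paddedEq  = ($five eq " 5") ? 1 : 0;   # false: the leading space matters to eq

print "== says $numericSays, eq says $stringSays, ", scalar(@warnings), " warning(s)\n";
```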
23
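substr
    $greeting = "Welcome to Perl!\n";
    print "> " . substr($greeting, 0, 7) . "\n";
    > Welcome
    print ">", substr($greeting, 7);
    > to Perl!
    print "> ", substr($greeting, -6, 6), ">";
    > Perl!
    >
substr($greeting, 7) starts with the space before "to" and includes the newline. substr($greeting, -6, 6) is the last six characters, "Perl!\n", so the final ">" lands on the next line.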
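Offsets count from 0, a negative offset counts back from the end, and the trailing newline is a character like any other. A sketch checking the slide's three calls:

```perl
use strict;
use warnings;

my $greeting = "Welcome to Perl!\n";   # 17 characters, including the trailing newline

my $first = substr($greeting, 0, 7);   # "Welcome"
my $rest  = substr($greeting, 7);      # " to Perl!\n" (starts with a space)
my $tail  = substr($greeting, -6, 6);  # "Perl!\n" (the last 6 characters)

print length($greeting), "\n";         # 17
```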
24
substr continued
    my $greeting = "Welcome to Java!\n";
    substr($greeting, 11, 4) = 'Perl';    # $greeting is now "Welcome to Perl!\n";
    substr($greeting, 7, 3) = '';         # ... "Welcome Perl!\n";
    substr($greeting, 0, 0) = 'Hello. ';  # ... "Hello. Welcome Perl!\n";
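`substr` also accepts the replacement text as an optional fourth argument, which performs the same edit as assigning to it. A sketch repeating the slide's three edits in that style:

```perl
use strict;
use warnings;

my $greeting = "Welcome to Java!\n";

substr($greeting, 11, 4, 'Perl');    # "Welcome to Perl!\n"
substr($greeting, 7, 3, '');         # "Welcome Perl!\n"
substr($greeting, 0, 0, 'Hello. ');  # "Hello. Welcome Perl!\n"

print $greeting;
```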
25
split
    my $greeting = "Hello. Welcome Perl!\n";
    my @words = split(/ /, $greeting);
    # Three items: "Hello.", "Welcome", "Perl!\n"
With a limit of 2:
    @words = split(/ /, $greeting, 2);
    # Two items: "Hello.", "Welcome Perl!\n"
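Splitting on the regex `/ /` keeps the newline on the last word. `split ' '` (a one-space string, not a regex) is a special case that splits on any run of whitespace, newlines included, and discards leading whitespace. A sketch comparing the three forms:

```perl
use strict;
use warnings;

my $greeting = "Hello. Welcome Perl!\n";

my @words   = split(/ /, $greeting);      # ("Hello.", "Welcome", "Perl!\n")
my @limited = split(/ /, $greeting, 2);   # ("Hello.", "Welcome Perl!\n")
my @clean   = split(' ', $greeting);      # ("Hello.", "Welcome", "Perl!")

print scalar(@words), " ", scalar(@limited), " ", scalar(@clean), "\n";   # 3 2 3
```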
26
join
    my @words = ("Hello.", "Welcome", "Perl!\n");
    my $greeting = join(' ', @words);            # "Hello. Welcome Perl!\n";
    my $andy_greeting = join(' and ', @words);   # "Hello. and Welcome and Perl!\n";
    my $jam_greeting = join('', @words);         # "Hello.WelcomePerl!\n";
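`join` undoes `split` when both use the same single-character separator, as long as the string does not end with that separator (`split` drops trailing empty fields). A sketch of the round trip, plus a comma-joined variant for comparison:

```perl
use strict;
use warnings;

my $greeting = "Hello. Welcome Perl!\n";

my @words   = split(/ /, $greeting);
my $rebuilt = join(' ', @words);   # puts the spaces back
my $csv     = join(',', @words);   # "Hello.,Welcome,Perl!\n"

print $rebuilt eq $greeting ? "round trip ok\n" : "round trip changed\n";
```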
27
Reading from a file
Contents of test.txt:
    This
    is
    a
    test
28
Reading from a file continued
    open(my $testfile, '<', 'test.txt') or die "I couldn't get at test.txt: $!";
    while ($line = <$testfile>) {
        print "> ", $line;
    }
    > This
    > is
    > a
    > test
29
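chomp
chomp removes the trailing newline from a line. Call it inside the loop body rather than in the loop condition: chomp returns the number of characters removed, so a last line without a newline would end the loop before it is printed.
    open(my $testfile, '<', 'test.txt') or die "I couldn't get at test.txt: $!";
    print "> ";
    while ($line = <$testfile>) {
        chomp($line);
        print "$line ";
    }
    print "\n";
    > This is a test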
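A self-contained sketch of the read loop, assuming the core `File::Temp` module is available to create a throwaway copy of test.txt, so the example does not depend on a file in the current directory:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a temporary file with the same four lines as test.txt.
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp "This\nis\na\ntest\n";
close $tmp or die "close failed: $!";

# Three-argument open with an explicit read mode.
open(my $testfile, '<', $path) or die "I couldn't open $path: $!";

my @words;
while (my $line = <$testfile>) {   # loop ends when <...> returns undef at end of file
    chomp $line;                   # remove the trailing newline, if any
    push @words, $line;
}
close $testfile;

print "> @words\n";                # > This is a test
```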
30
Writing to a file
    open my $overwrite, '>', 'overwrite.txt' or die "error trying to overwrite: $!";
    # Wave goodbye to the original contents.
    open my $append, '>>', 'append.txt' or die "error trying to append: $!";
    # Original contents still there; add to the end of the file.
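A sketch of the difference between the two modes: write a line with `>`, append a line with `>>`, then read the file back. The temporary file again comes from the core `File::Temp` module, an assumption made to keep the example self-contained:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;                                   # only the file name is needed

open(my $overwrite, '>', $path) or die "error trying to overwrite: $!";
print $overwrite "first line\n";              # the file now holds exactly this line
close $overwrite;

open(my $append, '>>', $path) or die "error trying to append: $!";
print $append "second line\n";                # added after the existing content
close $append;

open(my $in, '<', $path) or die "error trying to read: $!";
my @lines = <$in>;                            # list context reads every line
close $in;

print "> $_" for @lines;
```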
31
Subroutines
    sub multiply {
        my (@ops) = @_;
        my $ret = 1;
        for my $val (@ops) {
            $ret *= $val;
        }
        return $ret;
    }
    print "> ", multiply(2 .. 5), "\n";
    > 120
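Arguments arrive in the special array `@_`; copying them into `my` variables, as `multiply` does, is the usual idiom. A sketch that adds a second, hypothetical subroutine `min_max`, which unpacks a fixed number of arguments and returns a list:

```perl
use strict;
use warnings;

sub multiply {
    my (@ops) = @_;     # copy all arguments
    my $ret = 1;        # the empty product is 1
    for my $val (@ops) {
        $ret *= $val;
    }
    return $ret;
}

# Hypothetical example: unpack two arguments and return them in ascending order.
sub min_max {
    my ($x, $y) = @_;
    return $x < $y ? ($x, $y) : ($y, $x);
}

my ($lo, $hi) = min_max(9, 4);
print "> ", multiply(2 .. 5), "\n";   # > 120
print "> $lo $hi\n";                  # > 4 9
```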
32
Programming with objects
An object is a programmer-defined data structure that encapsulates:
- Data
- Behavior (methods)
A web browser object may have:
- Data
  - The current page
  - A history of recently visited URLs
- Behavior
  - Can navigate to a page
  - Can display a page
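In Perl, an object is usually a hash reference that has been `bless`ed into a package, and its methods are subroutines in that package. A minimal sketch of the slide's web-browser object; the `Browser` class and its methods are hypothetical and do no real networking (a real scraper would use a library such as WWW::Mechanize):

```perl
use strict;
use warnings;

package Browser;   # hypothetical class, for illustration only

sub new {
    my ($class) = @_;
    my $self = { current => undef, history => [] };   # the object's data
    return bless $self, $class;
}

# Behavior: "navigate" records the URL instead of fetching it.
sub navigate {
    my ($self, $url) = @_;
    push @{ $self->{history} }, $self->{current} if defined $self->{current};
    $self->{current} = $url;
    return $self;
}

# Behavior: "display" returns a description of the current page.
sub display {
    my ($self) = @_;
    return "Showing " . ($self->{current} // 'nothing');
}

sub history { my ($self) = @_; return @{ $self->{history} } }

package main;

my $browser = Browser->new;
$browser->navigate('http://www.perl.com/');
$browser->navigate('http://search.cpan.org/');
print $browser->display, "\n";   # Showing http://search.cpan.org/
```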
33
An Application: Scraping Web Pages
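A minimal sketch of the core scraping step, assuming the HTML has already been downloaded (the references point to WWW::Mechanize for fetching pages). It pulls link targets out of a hard-coded, hypothetical HTML string with a regular expression; this is a simplification, since regexes are fragile on real-world HTML:

```perl
use strict;
use warnings;

# Stand-in for a downloaded page (hypothetical content).
my $html = <<'HTML';
<html><body>
<a href="http://www.perl.com/">Perl.com</a>
<p>See also <a href='http://search.cpan.org/'>CPAN</a>.</p>
</body></html>
HTML

# Capture the quoted value of every href attribute, in document order.
my @links;
while ($html =~ /href\s*=\s*["']([^"']+)["']/gi) {
    push @links, $1;
}

print "$_\n" for @links;
```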
34
References
- Beginner's introduction to Perl: http://www.perl.com/pub/a/2000/10/begperl1.html
- Perl Mechanize library documentation: http://search.cpan.org/dist/WWW-Mechanize/
- Schwartz, R. L. and Phoenix, T., Learning Perl, 3rd Edition, 2001.