2001 Prentice Hall, Inc. All rights reserved.

Chapter 17 - Web Automation and Networking

Outline
17.1 Introduction
17.2 Introduction to LWP
17.3 LWP Commands
17.4 The LWP::Simple Module
17.5 HTML Parsing
17.6 Introduction to Advanced Networking
17.7 Protocols
17.8 Transmission Control Protocol (TCP)
17.9 Simple Mail Transfer Protocol (SMTP)
17.10 Post Office Protocol (POP)
17.11 Searching the World Wide Web

Introduction
Perl
  – Internet-based language
  – Used to create CGI scripts
  – Web-related modules
  – Automated tasks

Introduction to LWP
LWP
  – Library for the WWW in Perl
Common use: mimic a browser's request for a Web page
  – Request object
      » method: one of GET, PUT, POST or HEAD
      » URL: address of the requested item
      » headers: key-value pairs that provide extra information
      » content: data sent from client to server
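
As a rough sketch of the four request components, the following builds a request object without sending it. It uses HTTP::Request, the request class distributed with LWP; the URL, the header value and the form data are placeholders chosen for illustration and do not come from the slides.

```perl
use strict;
use warnings;
use HTTP::Request;

# Placeholder URL and form data, chosen only for illustration.
my $request = HTTP::Request->new( POST => 'http://www.example.com/form' );

# headers: key-value pairs that describe the request
$request->header( 'Accept-Language' => 'en' );
$request->content_type( 'application/x-www-form-urlencoded' );

# content: data sent from client to server
$request->content( 'type=another' );

print( "method:  ", $request->method(), "\n" );
print( "URL:     ", $request->uri(), "\n" );
print( "content: ", $request->content(), "\n" );

# The complete request as it would be sent.
print( $request->as_string() );
```

Nothing is transmitted until the request is handed to a user agent, so the object can be inspected or adjusted first.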

Introduction to LWP (II)
  – Response object
      » code: status indicator for the outcome of the request
      » message: string that corresponds to the code
      » headers: additional information about the response, including a description of the content
      » content: data associated with the response
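
The response fields can be explored without a server by building a response by hand. This is only a sketch: it assumes the HTTP::Response class that LWP uses for responses, and the status, header and body values are invented for the example.

```perl
use strict;
use warnings;
use HTTP::Response;

# A hand-built response standing in for what a server might return.
my $response = HTTP::Response->new(
   404, 'Not Found',
   [ 'Content-Type' => 'text/html' ],
   '<h1>Page not found</h1>' );

print( "code:    ", $response->code(), "\n" );
print( "message: ", $response->message(), "\n" );
print( "type:    ", $response->header( 'Content-Type' ), "\n" );
print( "content: ", $response->content(), "\n" );

# is_success() is true only for codes in the 200 range.
print( "status:  ", $response->status_line(), "\n" )
   unless $response->is_success();
```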

Introduction to LWP (III)
  – User agent
      Usually a Web browser
      » timeout: how long the user agent waits for a response before giving up
      » agent: name of the user agent
      » from: e-mail address of the person using the browser
      » credentials: any usernames or passwords needed to obtain the response
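
A minimal sketch of setting these attributes on an LWP::UserAgent object follows. Every value (agent name, e-mail address, host, realm, user name, password) is a placeholder assumed for illustration; no request is sent.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# All values below are placeholders for illustration.
my $agent = LWP::UserAgent->new(
   timeout => 10,                     # seconds to wait for a response
   agent   => 'ExampleAgent/0.1',     # name sent in the User-Agent header
   from    => 'user@example.com' );   # e-mail address of the user

# User name and password to offer for one server and authentication realm.
$agent->credentials( 'www.example.com:80', 'Members', 'someuser', 'secret' );

print( "timeout: ", $agent->timeout(), "\n" );
print( "agent:   ", $agent->agent(), "\n" );
print( "from:    ", $agent->from(), "\n" );
```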

LWP Commands
LWP
  – Lets a Perl program interact programmatically with a Web server

fig17_01.pl

    #!/usr/bin/perl
    # Fig 17.1: fig17_01.pl
    # Simple LWP commands.

    use strict;
    use warnings;
    use LWP::UserAgent;

    my $url = "...";    # URL missing from the source
    open( OUT, ">response.txt" ) or
       die( "Cannot open OUT file: $!" );

    # Create a new user agent object.
    my $agent = LWP::UserAgent->new();

    # Create a new request object: a GET request for $url.
    my $request = HTTP::Request->new( 'GET' => $url );
    my $response = $agent->request( $request );

    # If the request succeeded, output the content;
    # otherwise output the status of the response.
    if ( $response->is_success() ) {
       print( OUT $response->content() );
    }
    else {
       print( OUT "Error: " . $response->status_line() . "\n" );
    }

    print( OUT "\n \n" );

    $url = "...";       # URL missing from the source

    # Create a new POST request and state how its content is encoded.
    $request = HTTP::Request->new( 'POST', $url );
    $request->content_type( 'application/x-www-form-urlencoded' );
    $request->content( 'type=another' );

    # Send the request and print the entire response as a string.
    $response = $agent->request( $request );

    print( OUT $response->as_string() );
    print( OUT "\n" );
    close( OUT ) or die( "Cannot close out file : $!" );

fig17_01.pl Program Output
This is my home page.
This is my home page.
I enjoy programming, swimming, and dancing.
Here are some of my favorite links:
programming
swimming
dancing

fig17_01.pl Program Output (continued)
HTTP/ OK
Connection: close
Date: Tue, 21 Nov :20:19 GMT
Server: Apache/ (Win32)
Content-Type: text/html
Client-Date: Tue, 21 Nov :20:19 GMT
Client-Peer: :80
Title: Your Style Page

Your Style Page
This is your style page. You chose the colors.
Choose a new style.

LWP Commands
Fig. 17.2 Contents of home.html.
This is my home page.
This is my home page.
I enjoy programming, swimming, and dancing.
Here are some of my favorite links:
programming
swimming
dancing

The LWP::Simple Module
LWP::Simple module
  – Provides a procedural interface to LWP

fig17_03.pl

    #!/usr/bin/perl
    # Fig 17.3: fig17_03.pl
    # A program that uses LWP::Simple.

    use strict;
    use warnings;
    use LWP::Simple;

    my $url = "...";    # URL missing from the source

    # Retrieve a Web page and store its contents in a scalar.
    my $page = get( $url );
    print( "\n$page\n\n" );

    # Retrieve the Web page and print it; the status code is returned.
    my $status = getprint( $url );
    print( "\n\n$status\n" );

    # Retrieve the Web page and store it in a file.
    $status = getstore( $url, "page.txt" );
    print( "\n$status\n" );

fig17_03.pl Program Output
This is my home page.
This is my home page.
I enjoy programming, swimming, and dancing.
Here are some of my favorite links:
programming
swimming
dancing

fig17_03.pl Program Output (continued)
This is my home page.
This is my home page.
I enjoy programming, swimming, and dancing.
Here are some of my favorite links:
programming
swimming
dancing
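
The LWP::Simple functions can be tried without a Web server by pointing them at a file: URL. The sketch below is an assumption-laden variant of the figure: it writes a small page into a temporary directory (file names and page text are invented) and assumes a Unix-style absolute path when forming the URL. It also checks the numeric status with is_success, which LWP::Simple exports.

```perl
use strict;
use warnings;
use LWP::Simple;
use File::Temp qw( tempdir );

# Create a small local page so the sketch does not need a Web server.
my $dir    = tempdir( CLEANUP => 1 );
my $source = "$dir/home.html";
open( my $out, '>', $source ) or die( "Cannot write $source: $!" );
print( $out "<html><body>This is my home page.</body></html>\n" );
close( $out ) or die( "Cannot close $source: $!" );

# Assumes a Unix-style absolute path.
my $url = "file://$source";

# get returns the content, or undef on failure.
my $page = get( $url );
die( "Cannot fetch $url" ) unless defined( $page );
print( $page );

# getstore returns the numeric status code of the response.
my $status = getstore( $url, "$dir/page.txt" );
print( is_success( $status ) ? "stored ($status)\n" : "failed ($status)\n" );
```

Checking the return value matters because get gives no detail about why a request failed, while getstore and getprint report the status code.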

HTML Parsing
HTML::TokeParser
  – Makes it easy to extract content from HTML
  – A document can be walked through manually, but TokeParser is simpler
Token
  – Array references
  – 5 types
      » Start token ( S ): a starting HTML tag
      » End token ( E ): array holding the token type, the tag name, and the original text
      » Text token ( T )
      » Comment token ( C )
      » Declaration token ( D )
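
To make the token layout concrete, here is a small sketch that parses an in-memory string and sorts tokens by type. The HTML fragment and the variable names are invented for the example; the array layouts in the comments follow the HTML::TokeParser documentation.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Invented fragment; a reference to a scalar is parsed as the document itself.
my $html = '<html><body><h1>Hi</h1><!-- note --><p>Bye</p></body></html>';
my $parser = HTML::TokeParser->new( \$html )
   or die( "Cannot create parser" );

my ( @start_tags, @text );

while ( my $token = $parser->get_token() ) {
   my $type = $token->[ 0 ];

   if ( $type eq "S" ) {        # [ "S", $tag, \%attr, \@attrseq, $text ]
      push( @start_tags, $token->[ 1 ] );
   }
   elsif ( $type eq "T" ) {     # [ "T", $text, $is_data ]
      push( @text, $token->[ 1 ] );
   }
   # "E", "C" and "D" tokens are skipped in this sketch.
}

print( "start tags: @start_tags\n" );
print( "text: ", join( " ", @text ), "\n" );
```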

HTML Parsing
Fig. 17.4 Resulting page.txt file.
This is my home page.
This is my home page.
I enjoy programming, swimming, and dancing.
Here are some of my favorite links:
programming
swimming
dancing

fig17_05.pl

    #!/usr/bin/perl
    # Fig 17.5: fig17_05.pl
    # A program to strip tags from an HTML document.

    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTML::TokeParser;

    # Get a Web page and store its contents in $document.
    my $url = "...";    # URL missing from the source
    my $agent = LWP::UserAgent->new();
    my $request = HTTP::Request->new( 'GET' => $url );
    my $response = $agent->request( $request );
    my $document = $response->content();

    # Create a new TokeParser object.
    my $page = HTML::TokeParser->new( \$document );

    # Go through the tokens and display the text tokens.
    while ( my $token = $page->get_token() ) {
       my $type = shift( @{ $token } );
       my $text = shift( @{ $token } );

       if ( $type eq "T" ) {
          print( "$text" );
       }
    }

fig17_05.pl Program Output
This is my home page.
This is my home page.
I enjoy programming, swimming, and dancing.
Here are some of my favorite links:
programming
swimming
dancing

Introduction to Advanced Networking
Sockets
  – All network communications are done with sockets
  – 1 connection = 2 sockets
  – Allow data to be passed
Streams
  – Sequenced
  – Reliable
Datagrams
  – Less reliable
  – Not sequenced
  – Require fewer system resources
      » Connection is not permanent
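
A datagram exchange can be sketched on the loopback interface with two UDP sockets in one program. The address, the message text and the two-second wait are assumptions made for the example; the point is that no connection is accepted and delivery is not guaranteed, so the receiver waits only a limited time.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# Receiver: bind to any free port on the loopback address.
my $receiver = IO::Socket::INET->new(
   Proto     => 'udp',
   LocalAddr => '127.0.0.1',
   LocalPort => 0 )
   or die( "Cannot create receiver: $@" );

# Sender: PeerAddr and PeerPort only name the default destination;
# no handshake with the receiver takes place.
my $sender = IO::Socket::INET->new(
   Proto    => 'udp',
   PeerAddr => '127.0.0.1',
   PeerPort => $receiver->sockport() )
   or die( "Cannot create sender: $@" );

$sender->send( "hello" ) or die( "Cannot send: $!" );

# A datagram may be lost, so wait a limited time instead of blocking forever.
my $datagram = '';
if ( IO::Select->new( $receiver )->can_read( 2 ) ) {
   $receiver->recv( $datagram, 1024 );
}

print( "received: $datagram\n" );
```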

Introduction to Advanced Networking (II)
Server
  – One endpoint / socket
  – Listens for a connection
  – Knows how to process requests
Client
  – Other endpoint / socket
  – Knows the server
  – Initiates the connection
  – Sends a request
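
The two roles can be sketched inside a single program on the loopback interface. This is a simplified illustration, not a realistic server: the address, the messages and the one-request exchange are assumptions, and a real server would normally run as its own process. Note that the server ends up with two sockets, one that listens and one per accepted connection.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Server endpoint: listen on a free loopback port.
my $server = IO::Socket::INET->new(
   LocalAddr => '127.0.0.1',
   LocalPort => 0,
   Proto     => 'tcp',
   Listen    => 5 )
   or die( "Cannot listen: $@" );

# Client endpoint: knows the server's address and initiates the connection.
my $client = IO::Socket::INET->new(
   PeerAddr => '127.0.0.1',
   PeerPort => $server->sockport(),
   Proto    => 'tcp' )
   or die( "Cannot connect: $@" );

# The server accepts, which gives it a socket for this one connection.
my $connection = $server->accept() or die( "Cannot accept: $!" );

# Send each line immediately rather than buffering it.
$_->autoflush( 1 ) for ( $client, $connection );

print( $client "hello\n" );              # client sends a request
my $request = <$connection>;             # server reads it
print( $connection "got: $request" );    # server replies
my $reply = <$client>;                   # client reads the reply

print( $reply );
close( $_ ) for ( $client, $connection, $server );
```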

Protocols
Standardization
Protocols
  – Need to be standardized, or else a server would have to know how to process each individual request
  – HTTP (Chapter 7)
  – POP: receiving e-mail
  – SMTP: sending e-mail

Transmission Control Protocol (TCP)
Internet connections
  – TCP
      » Most common way for computers to communicate over the Internet
      » Connection-oriented

fig17_06.pl

    #!/usr/bin/perl
    # Fig 17.6: fig17_06.pl
    # TCP chat client.

    use strict;
    use warnings;
    use IO::Socket;

    # Location of the server.
    my $host = '...';    # host missing from the source
    my $port = 5833;

    # Create the Internet connection: makes a socket and
    # connects automatically if the server is found.
    my $socket = IO::Socket::INET->new(
       PeerAddr => $host,
       PeerPort => $port,
       Proto    => "tcp",
       Type     => SOCK_STREAM )
       or die( "Cannot connect to $host:$port : $@" );

    # Enable autoflush so that output is not buffered.
    local $| = 1;
    print( $socket "What is your name?\n" );
    print( "What is your name?\n" );

    my $response = <$socket>;
    print( "From server: $response" );

    my $input = <STDIN>;

    chomp( $input );

    # The user enters 'q' to close the connection.
    while ( $input ne "q" ) {
       print( $socket "$input\n" );
       $response = <$socket>;
       print( "From server: $response" );

       $input = <STDIN>;
       chomp( $input );
    }

    print( "done\n" );
    print( $socket "$input\n" );

    close ( $socket ) or die( "Cannot close socket: $!" );

fig17_07.pl

    #!/usr/bin/perl
    # Fig 17.7: fig17_07.pl
    # TCP chat server.

    use strict;
    use warnings;
    use IO::Socket;

    # Port on which to wait for a client.
    my $port = 5833;

    # Create a new socket object. Listen makes the server wait for a
    # connection; up to 10 clients can be waiting to connect.
    my $server = IO::Socket::INET->new(
       LocalPort => $port,
       Type      => SOCK_STREAM,
       Listen    => 10 )
       or die( "Cannot be a server on $port: $@" );

    local $| = 1;

    my $client = $server->accept();
    my $response = <$client>;

    chomp $response;
    print( "From client: $response\n" );

    while ( $response ne "q" ) {
       my $input = <STDIN>;
       print( $client "$input" );

       $response = <$client>;
       chomp( $response );
       print( "From client: $response\n" );
    }

    close ( $server ) or die( "Cannot end connection: $!" );

[Figure: fig17_07.pl program output]

Simple Mail Transfer Protocol (SMTP)
Net::SMTP module

fig17_08.pl

    #!/usr/bin/perl
    # Fig. 17.8: fig17_08.pl
    # Form to send an e-mail message.

    use strict;
    use warnings;
    use CGI qw( :standard );

    print( header() );
    print( start_html( "Send e-mail!" ) );

    print( h1( "The e-mail home page." ) );

    print( start_form( -action => "fig17_09.pl" ) );

    # Get the SMTP server.
    print( "Enter the SMTP server to connect to: " );
    print( textfield( "server" ), br() );

    print( "Enter what you want to appear in the \"from\" header: " );
    print( textfield( "from" ), br() );

    # Get the address to send the e-mail to.
    print( "Enter where you would like to send this e-mail: " );
    print( textfield( "address" ), br() );

    print( "Enter what you want to appear in the \"to\" header: " );
    print( textfield( "to" ), br() );

    print( "Enter what you want to appear in the \"subject\" header: " );
    print( textfield( "subject" ), br() );

    print( "Enter the message you want to send in the e-mail: " );
    print( br() );
    print( textarea( -name => "message", -rows => 5, -columns => 50,
                     -wrap => 1 ), br() );

    print( br(), submit( "submit" ), end_form() );

    print( end_html() );

[Figure: fig17_08.pl program output]

fig17_09.pl

    #!/usr/bin/perl
    # Fig 17.9: fig17_09.pl
    # Send an e-mail message.

    use strict;
    use warnings;
    use Net::SMTP;
    use CGI qw( :standard );

    my $server = param( "server" );
    my $from = param( "from" );
    my $address = param( "address" );
    my $to = param( "to" );
    my $subject = param( "subject" );
    my $message = param( "message" );
    my $my_address = 'my_address.smtp';

    # Create a new Net::SMTP object.
    my $smtp = Net::SMTP->new( $server, Hello => $server )
       or die( "Cannot send e-mail: $!" );

    # mail() starts an e-mail message and takes the address of the sender;
    # to() names the receiver of the e-mail.
    $smtp->mail( $my_address );
    $smtp->to( $address );

    # data() and dataend() start and stop the transfer of data.
    $smtp->data();
    $smtp->datasend( "From: $from\n" );
    $smtp->datasend( "To: $to\n" );
    $smtp->datasend( "Subject: $subject\n\n" );
    $smtp->datasend( "$message\n" );
    $smtp->dataend();
    $smtp->quit();

    print( header() );
    print( start_html( "Send e-mail!" ) );
    print( h1( "Your e-mail has been sent." ) );
    print( end_html() );
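
The datasend calls in fig17_09.pl write form values straight into the message headers. The sketch below shows one way the header block could be assembled first, with line breaks removed from each value so that a form field cannot add extra header lines. The helper names header_value and build_message, and all sample addresses, are hypothetical and not part of the figure or of Net::SMTP.

```perl
use strict;
use warnings;

# Replace carriage returns and line feeds so a value stays on one header line.
sub header_value {
   my ( $value ) = @_;
   $value = '' unless defined( $value );
   $value =~ s/[\r\n]+/ /g;
   return $value;
}

# Assemble the headers, a blank separator line, and the body.
sub build_message {
   my ( %field ) = @_;
   my $text = '';

   for my $name ( qw( From To Subject ) ) {
      $text .= "$name: " . header_value( $field{ $name } ) . "\n";
   }

   return $text . "\n" . $field{ Body } . "\n";
}

print( build_message(
   From    => 'sender@example.com',
   To      => 'receiver@example.com',
   Subject => "Hello\nBcc: someone\@example.com",
   Body    => 'This is the message.' ) );
```

The resulting string could then be passed to a single datasend call between data() and dataend().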

Post Office Protocol (POP)
POP
  – Created to make the storage and retrieval of e-mail easier
  – Allows checking, reading, storing and deleting of mail

fig17_10.pl

    #!/usr/bin/perl
    # Fig 17.10: fig17_10.pl

    use strict;
    use warnings;
    use CGI qw( :standard );

    print( header() );
    print( start_html( -title => 'Please Login' ) );

    # Create an HTML page that asks for a username, a password and the
    # IP address of the server. Only the field labels of the form survive
    # in the source; the form markup is missing.
    print <<FORM;
    Username:
    Password:
    Server:
    FORM

    print( end_html() );

fig17_11.pl

    #!/usr/bin/perl
    # Fig 17.11: fig17_11.pl

    use strict;
    use warnings;
    use MD5;
    use Mail::POP3Client;
    use CGI qw( :standard );

    # Get the parameters from the data the user entered on the Web page.
    my $user = param( "userName" );
    my $password = param( "password" );
    my $server = param( "server" );
    my $offset = param( "offset" ) || 0;   # no offset on the first request

    print( header() );
    print( start_html( -title => "Check your mail!" ) );

    my $pop = Mail::POP3Client->new( USER => $user,
       PASSWORD => $password, HOST => $server ) or
       print( h1( "Cannot connect: $!" ) );

    # A tally of the messages in the inbox.
    my $messages = $pop->Count();
    print( " You have $messages messages in your inbox. " );

    # Only 5 messages are displayed at once.
    my $offset1 = $offset - 5;
    my $offset2 = $offset + 5;
    my $start = 1 + $offset;
    my $end = ( $offset2 < $messages ? $offset2 : $messages );

    for ( $start .. $end ) {
       print( " $_: " );

       # Go through the headers of each message.
       foreach ( $pop->Head( $_ ) ) {
          /^(From|subject):\s+/i and print $_, " ";
       }

       print( " \n" );
    }

    # Link to the previous 5 messages (form markup missing from the source).
    print <<FORM1 if ( $offset );
    FORM1

    # Link to the next 5 messages (form markup missing from the source).
    print <<FORM2 if ( $end != $messages );
    FORM2

    print( end_html() );

    $pop->Close();
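
The paging arithmetic in fig17_11.pl can be isolated and checked without a mail server. The following sketch wraps it in a function; the name page_bounds and the $per_page parameter are hypothetical (the figure hard-codes 5), and the inbox size of 12 is an invented example.

```perl
use strict;
use warnings;

# Return the first and last message numbers for one page.
# $offset is the number of messages already shown on earlier pages.
sub page_bounds {
   my ( $offset, $messages, $per_page ) = @_;
   $offset   ||= 0;    # no offset on the first visit
   $per_page ||= 5;    # the figure displays 5 messages at a time

   my $start = 1 + $offset;
   my $next  = $offset + $per_page;
   my $end   = ( $next < $messages ? $next : $messages );

   return ( $start, $end );
}

# An inbox of 12 messages is shown as three pages.
for my $offset ( 0, 5, 10 ) {
   my ( $start, $end ) = page_bounds( $offset, 12 );
   print( "offset $offset: messages $start..$end\n" );
}
```

The last page is shorter than the others because the end of the range is capped at the message count.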

[Figure: fig17_11.pl program output]

Searching the World Wide Web
Searching
  – A major application of the Web
  – Perl has several modules for searching

fig17_12.pl

    #!/usr/bin/perl
    # Fig 17.12: fig17_12.pl
    # Program to begin a Web search.

    use strict;
    use warnings;
    use CGI qw( :standard );

    print( header(), start_html( "Web Search" ) );
    print( h1( "Search the Web!" ) );

    print( start_form( -method => "post", -action => "fig17_13.pl" ) );

    # The topic to be searched for.
    print( "Enter query: " );
    print( textfield( "query" ), br(), br() );

    # How many results the user wants returned.
    print( "Enter number of sites you want " );
    print( br() );
    print( "from each search engine, 1-50: " );
    print( textfield( "amount" ), br(), br() );

    # Let the user check which of the 4 engines to use.
    print( "<input type = \"checkbox\" " );
    print( "name = \"AltaVista\" value = \"1\">" );
    print( "AltaVista", br() );

    print( "<input type = \"checkbox\" " );
    print( "name = \"HotBot\" value = \"1\">" );
    print( "HotBot", br() );

    print( "<input type = \"checkbox\" " );
    print( "name = \"WebCrawler\" value = \"1\">" );
    print( "WebCrawler", br() );

    print( "<input type = \"checkbox\" " );
    print( "name = \"NorthernLight\" value = \"1\">" );
    print( "NorthernLight", br() );

    print( br(), submit( "Search!" ), end_form() );

    print( end_html() );

[Figure: fig17_12.pl program output]

fig17_13.pl

    #!/usr/bin/perl
    # Fig 17.13: fig17_13.pl
    # A program that collects search results.

    use strict;
    use warnings;
    use WWW::Search;    # gives access to a large number of search engines
    use CGI qw( :standard );

    my @engines;        # array name assumed; it is missing from the source
    my $search;

    my $query = param( "query" );
    my $amount = param( "amount" );

    # Displayed if the user did not enter a query.
    if ( !$query ) {
       print( header(), start_html() );
       print( h1( "Please try again." ) );
       print( " Go back " );
       print( end_html() );
       exit();
    }

    # If there is no amount, or it is greater than 50, set it to 5.
    if ( !$amount || $amount > 50 ) {
       $amount = 5;
    }

    my $value;

    # Insert the engines into the array if the user checked them.
    push( @engines, "HotBot" ) if ( param( "HotBot" ) );
    push( @engines, "AltaVista" ) if ( param( "AltaVista" ) );
    push( @engines, "WebCrawler" ) if ( param( "WebCrawler" ) );
    push( @engines, "NorthernLight" ) if ( param( "NorthernLight" ) );

    print( header() );
    print( start_html( "Web Search" ) );

    foreach ( @engines ) {

       # Search the Web for results.
       my $search = WWW::Search->new( $_ );
       $search->native_query( WWW::Search::escape_query( $query ) );
       print( b( i( "Web sites found by $_:" ) ), br() );

       # Display the results; stop early if the engine runs out.
       for ( 1 .. $amount ) {
          my $result = $search->next_result() or last;
          $value = $result->url();
          print( " $value " );
          print( br() );
       }

       print( br() );
    }

    print( end_html() );
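
The check on $amount in fig17_13.pl accepts any true value up to 50, which includes negative numbers and non-numeric text. A stricter variant might look like the sketch below; the function name clean_amount and its $default and $max parameters are hypothetical, with defaults matching the 5 and 50 used in the figure.

```perl
use strict;
use warnings;

# Return a usable result count, falling back to $default for missing,
# non-numeric or out-of-range input.
sub clean_amount {
   my ( $amount, $default, $max ) = @_;
   $default = 5  unless defined( $default );
   $max     = 50 unless defined( $max );

   return $default unless defined( $amount ) && $amount =~ /^\d+$/;
   return $default if $amount < 1 || $amount > $max;
   return $amount;
}

print( clean_amount( $_ ), "\n" ) for ( "10", "", "abc", "75" );
```

Only the first sample value is kept; the other three fall back to the default of 5.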