© 2006 KDnuggets 152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200 15140 "http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N"


Module 4b: Perl for Web Log Analysis

[16/Nov/2005:16:32: ] "GET /jobs/ HTTP/1.1" " "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;.NET CLR )"
[16/Feb/2006:00:06: ] "GET / HTTP/1.1" " 740_1006" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"
[16/Feb/2006:00:06: ] "GET /kdr.css HTTP/1.1" " "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"
[16/Feb/2006:00:06: ] "GET /images/KDnuggets_logo.gif HTTP/1.1" " "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MyIE2)"

Perl - introduction
• A full-featured, fast, and easy-to-use scripting language
• Very powerful pattern-matching facilities
• More powerful than gawk; very popular for web programming and CGI files
• Many Perl tutorials, e.g. learn.perl.org/

Perl - historical note
• PERL stands for Practical Extraction and Report Language
• Developed by Larry Wall
• Perl 1.0 was released to Usenet's alt.comp.sources in 1987
• Perl is among the most popular web programming languages, owing to its powerful text manipulation and quick development.
• Perl is widely known as "the duct-tape of the Internet".

Perl - running
• First Perl script (on Unix), file1.pl:

    #!/usr/local/bin/perl -w
    print "Hi there!\n";

Note: On Windows, the first line is usually #!c:/Perl/bin/perl.exe -w

Running it:

    % file1.pl

Result:

    Hi there!

Perl for Windows
• ActivePerl: a ready-to-install Perl distribution
• Runs on Windows, Linux, Mac OS, and other operating systems
• Free download

Perl basics
• Two data types: numbers and strings
• Perl uses many special characters, such as $, @, and %, as part of its syntax
• Perl variables:
  • Scalars (simple variables, things) start with $, e.g. $count
  • Arrays (lists) start with @
  • Hashes (associative arrays) start with %
• Usual control structures
• A full introduction to Perl is beyond the scope of this module

What does this code do?

    xinU / lreP rehtona tsuJ";sub =!fork;map{$P=$P[$f^ord ($p{$_})&6];$p{$_}=/ ^$P/ix?$P:close$_}keys%p}p;p;p;p;p;map{$p{$_}=~/^[P.]/ && close$_}%p;wait until$?;map{/^r/&& }%p;$_=$d[$q];sleep rand(2)if/\S/;print

Answer: We do NOT want to know!

The Tao of Coding
• Human time is MUCH more precious than computer time
• It is much better (and faster) to develop programs using methods that AVOID mistakes than to hunt for bugs in badly written programs

Perl style: understandability first
• Perl allows you to write tricky programs to save a few lines of text
• AVOID this approach
• Use careful, step-by-step development
• Test after every step
• A good program should be easy to understand
• Improve efficiency only after you have an understandable program, and only if you need to

Perl coding
• Variables can be declared implicitly by their first use, e.g. $oldvar=$nevar+27
• If $nevar was not declared before, it is created on the spot with an undefined value, which counts as zero in arithmetic
• Danger! This can lead to hard-to-find errors (what if the variable was misspelled and was supposed to be $newvar?)
• Much better to declare variables explicitly, e.g. my $newvar = 0;
• Explicit declaration is enforced by the pragma use strict
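
A minimal sketch of the misspelling problem described above. It compiles the same one-line snippet twice with eval, once with strict checking switched off and once with it on. The helper name compile_error is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: compile and run a snippet of Perl source,
# returning the error message, or an empty string if it ran cleanly.
sub compile_error {
    my ($source) = @_;
    my $ok = eval "$source; 1";
    return $ok ? '' : $@;
}

# Without strict, the misspelled $nevar is silently created
# (undefined, so it counts as 0 in arithmetic).
my $loose = compile_error('no strict; no warnings; my $newvar = 5; my $oldvar = $nevar + 27');

# With strict, the same typo is rejected at compile time.
my $strict = compile_error('use strict; my $newvar = 5; my $oldvar = $nevar + 27');
chomp $strict;

print "without strict: ", ($loose  eq '' ? "no error reported" : $loose),  "\n";
print "with strict:    ", ($strict eq '' ? "no error reported" : $strict), "\n";
```

With strict enabled, Perl refuses to compile the snippet and names the undeclared variable, so the typo surfaces before the program processes any data.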

Sample log file
• We will again use the file d100.log: the first 100 lines from the Nov 16, 2005 KDnuggets log file.
• We will give useful code examples. You are encouraged to try the code examples in this lecture on this file.
• You should get the same answers!

Perl for parsing a web log file
Program 0: logparse0.pl - read and print log file

    #!c:/Perl/bin/perl.exe -w
    use strict;
    while (<>) {
        my $line = $_;   # current line
        print $line;
    }

Perl regular expressions, 1
• Usage: $var =~ /regex/ where regex is a regular expression.
• E.g. $line =~ /google/ will match all lines containing "google"
• Note: the slashes / delimit the regular expression, so / can't be used inside it (unless escaped like this: \/ )

Perl log parsing, 1
Check how many lines refer to google:

    #!c:/Perl/bin/perl.exe -w
    use strict;
    my $cnt=0;
    while (<>) {
        my $line = $_;
        if ($line =~ /google/) { $cnt++; }
    }
    print " $cnt lines matched google\n";

Applying this code to d100.log, you get: 2 lines matched google

Perl regular expressions, 2
Special characters:
• .  : matches any one character
• a* : matches zero or more repeats of "a"
• a+ : matches 1 or more repeats of "a"
• \S : matches any non-white space character
• ^  : anchor, matches beginning of string
• $  : anchor, matches end of string
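
The special characters above can be tried out one at a time. The following is a small sketch; the test strings and the helper name matches are made up for illustration and do not come from the log file.

```perl
use strict;
use warnings;

# Each entry: [ description, string to test, compiled regex ]
my @cases = (
    [ '.  any one character',       'cat',        qr/c.t/  ],
    [ 'a* zero or more "a"',        'bt',         qr/ba*t/ ],
    [ 'a+ one or more "a"',         'baaat',      qr/ba+t/ ],
    [ '\S one non-whitespace char', ' x ',        qr/\S/   ],
    [ '^  start of string',         'GET /jobs/', qr/^GET/ ],
    [ '$  end of string',           'logo.gif',   qr/gif$/ ],
);

# Hypothetical helper: 1 if the string matches the pattern, else 0.
sub matches {
    my ($string, $regex) = @_;
    return $string =~ $regex ? 1 : 0;
}

for my $case (@cases) {
    my ($what, $string, $regex) = @$case;
    printf "%-30s '%s' %s\n", $what, $string,
        matches($string, $regex) ? 'matches' : 'does not match';
}
```

Note the difference between * and +: the string "bt" matches ba*t (zero repeats of "a" are allowed) but would not match ba+t.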

Log parse 2: IP address
• The IP address is the first item on the log line.
• In almost all log files it is followed by " - - ", representing the missing "ident_user" and "auth_user" fields
• Regular expression for matching these 3 fields:
    $line =~ /^(\S+) - - /;

Perl regex: parentheses capture match variables
• Perl regex items enclosed in parentheses () correspond to special match variables.
• Variable $1 contains the value matched by the regular expression in the first pair of parentheses, $2 the value from the second pair, and so on.

Perl regex: match variables
This program shows how to assign the IP address to the variable $ip; it also shows error processing when the match is not successful.

    #!c:/Perl/bin/perl.exe -w
    use strict;
    my $cnt=0;
    while (<>) {
        my $line = $_;
        if ($line =~ /^(\S+) - - /) {
            my $ip = $1;
            print "ip $ip\n";
            $cnt++;
        } else {
            print "bad line $line\n";
        }
    }
    print " processed $cnt log lines\n";

Note: The first line (the path to Perl) is probably different on your machine.

Perl regular expression 4: brackets
• Brackets [ ] allow you to match any one of the characters inside
• Example: [cmt]an
  • will match can, man or tan,
  • will not match ban or dan.

Perl regular expression 4b: brackets [^ ]
• [^x] will match any character except x
  (note: here ^ is not the beginning-of-text anchor)
• Example: [^:]* will match any string that does not include a colon.
• Example: if $date is 16/Nov/2005:031415, then after
    $date =~ /([^:]*):.*/
  [^:]* will have matched 16/Nov/2005
• Because it was enclosed in (), the match result is stored in $1
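
Both kinds of bracket expression can be checked with a short script. This is a sketch only: the helper name before_colon is hypothetical, and the patterns are anchored with ^ and $ (an addition not on the slide) so that they test the whole word or the start of the string.

```perl
use strict;
use warnings;

# [cmt]an matches exactly one of c, m, or t followed by "an".
for my $word (qw(can man tan ban dan)) {
    my $verdict = $word =~ /^[cmt]an$/ ? 'matches' : 'does not match';
    print "$word $verdict [cmt]an\n";
}

# Hypothetical helper: return everything before the first colon,
# or undef if the string has no colon at all.
sub before_colon {
    my ($text) = @_;
    return $text =~ /^([^:]*):/ ? $1 : undef;
}

my $date = '16/Nov/2005:16:32:50';   # date and time as they appear in the log
print before_colon($date), "\n";     # prints 16/Nov/2005
```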

Parsing log: Date, Time
• Date, Time is specified in the log as [DD/Mon/YYYY:HH:MM:SS timezone]
• Matching regular expression:
    \[([^:]+):(..):(..):(..) -0500\]

Parsing log: Date, Time
Matching regular expression in detail:
    \[([^:]+):(..):(..):(..) -0500\]
• \[ and \] match the literal brackets
• [^:] matches any single character other than :
• ([^:]+) will match DD/Mon/YYYY; value in $1
• first (..) will match HH (hours); value in $2
• second (..) will match MM (minutes); value in $3
• third (..) will match SS (seconds); value in $4
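
As a sketch, the date and time expression can be wrapped in a small subroutine. The name parse_timestamp and the hash keys are hypothetical; the regular expression itself is the one shown above, including the hard-coded -0500 time zone.

```perl
use strict;
use warnings;

# Hypothetical helper: split a bracketed log timestamp into its parts.
# The time zone is hard-coded to -0500, as in the slide's expression.
sub parse_timestamp {
    my ($field) = @_;
    if ($field =~ /\[([^:]+):(..):(..):(..) -0500\]/) {
        return { date => $1, hour => $2, min => $3, sec => $4 };
    }
    return;    # no match
}

my $t = parse_timestamp('[16/Nov/2005:16:32:50 -0500]');
print "date $t->{date}, time $t->{hour}:$t->{min}:$t->{sec}\n" if $t;
```

Because the offset is written literally, a timestamp with any other time zone does not match and the subroutine returns nothing.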

Parsing log: Time Zone
• The time zone is relative to GMT
• The time zone in the log file is for the SERVER, not for the visitor, so it is nearly always the same throughout the log
• but it changes during daylight saving time
• In our test log file the time zone is -0500, the US Eastern time zone

Parsing log: Request
Regular expression for parsing the Request field:
    "(GET|HEAD|POST|OPTIONS) (\S+) HTTP(\S+)"
• " ... " : opening and closing quotes
• (GET|HEAD|POST|OPTIONS) : method
• (\S+) : URL, captures any string of 1 or more non-blanks
• HTTP(\S+) : HTTP version, usually ignored
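
A sketch of the Request expression in use. The subroutine name parse_request is hypothetical; the pattern is the one given above, so only the four listed methods are recognized.

```perl
use strict;
use warnings;

# Hypothetical helper: extract method, URL and HTTP version from the
# quoted request field, using the regular expression shown above.
sub parse_request {
    my ($field) = @_;
    if ($field =~ /"(GET|HEAD|POST|OPTIONS) (\S+) HTTP(\S+)"/) {
        return ($1, $2, $3);
    }
    return;    # empty list when the field does not match
}

my ($method, $url, $version) = parse_request('"GET /jobs/ HTTP/1.1"');
print "method $method, url $url, version $version\n";
```

The third capture includes the slash that follows HTTP (for example /1.1), which is harmless given that the version is usually ignored.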

Parsing log: Status code and Object size
• The Status (Response) code is always a 3-digit number, followed by a space, so it can be matched with (\d\d\d)
• The Object size is either a number or "-", followed by a space. The simplest regex to match it is (\S+)
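
A sketch of matching these two fields together. The subroutine name parse_status_size is hypothetical, and converting a size of "-" to 0 is an assumption made here for convenience, not something the slides prescribe.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the status code and object size that follow
# the closing quote of the request field. A size of "-" is reported as 0
# (an assumption made for this illustration).
sub parse_status_size {
    my ($text) = @_;
    if ($text =~ /" (\d\d\d) (\S+) /) {
        my ($status, $size) = ($1, $2);
        $size = 0 if $size eq '-';
        return ($status, $size);
    }
    return;
}

my ($status, $size) = parse_status_size('"GET /jobs/ HTTP/1.1" 200 15140 "-"');
print "status $status, size $size bytes\n";
```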

Parsing log: Referrer
• The Referrer is a string enclosed in double quotes "…"
• It can have anything inside except for a double quote
• It can also be "-" in case of a direct request.
• Not documented, but it can be "" (nothing between the quotes).
• The Referrer can be matched by:
    "([^"]*)"
  • " ... " : opening and closing quotes
  • [^"]* : anything except a double quote, appearing zero or more times

Parsing log: User agent
• The User agent is also a string enclosed in double quotes "…" that can have anything inside except for a double quote.
• It can also be "-".
• The User agent can be matched by:
    "([^"]+)"
  • " ... " : opening and closing quotes
  • [^"]+ : anything except a double quote, appearing one or more times
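
The two quoted fields at the end of a log line can be matched together. This is a sketch; the subroutine name parse_referrer_agent and the shortened user agent string are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the last two quoted fields of a log line.
# The referrer may be empty (""), so it uses *; the user agent uses +.
sub parse_referrer_agent {
    my ($text) = @_;
    if ($text =~ /"([^"]*)" "([^"]+)"\s*$/) {
        return ($1, $2);
    }
    return;
}

my ($referrer, $agent) = parse_referrer_agent('"-" "Mozilla/4.0 (compatible; MSIE 6.0)"');
print "direct request\n" if $referrer eq '-' or $referrer eq '';
print "agent: $agent\n";
```

The only difference between the two patterns is * versus +: an empty referrer still matches, while an empty user agent does not.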

Parsing a web log line: putting all together
The matching is done by the following (should be all on one line):

    if ($line =~ /^(\S+) - - \[([^:]+):(..):(..):(..) -0500\] "(GET|HEAD|POST|OPTIONS) (\S+) HTTP(\S+)" (\d\d\d) (\S+) "([^"]*)" "([^"]+)"/ ) { … }

Full code is in program weblog_parse.pl
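
The combined expression is easier to read when split over several lines with the /x modifier. The following is a sketch under stated assumptions, not the contents of weblog_parse.pl: the names $LOG_LINE and parse_log_line and the hash keys are hypothetical, and the user agent in the sample line is an assumed value (the rest of the sample comes from the log entry in the presentation title).

```perl
use strict;
use warnings;

# The combined expression, split over lines with /x. Under /x, whitespace
# in the pattern is ignored, so each literal space is written as "\ ".
my $LOG_LINE = qr{
    ^(\S+)\ -\ -\                                  # IP address, ident_user, auth_user
    \[([^:]+):(..):(..):(..)\ -0500\]\             # date, hours, minutes, seconds
    "(GET|HEAD|POST|OPTIONS)\ (\S+)\ HTTP(\S+)"\   # method, URL, HTTP version
    (\d\d\d)\ (\S+)\                               # status code, object size
    "([^"]*)"\ "([^"]+)"                           # referrer, user agent
}x;

# Hypothetical helper: return a hash reference of named fields, or undef.
sub parse_log_line {
    my ($line) = @_;
    my @f = $line =~ $LOG_LINE or return;
    my %rec;
    @rec{qw(ip date hour min sec method url http status size referrer agent)} = @f;
    return \%rec;
}

# Sample entry; the final user agent string is an assumed value.
my $sample = '152.152.98.11 - - [16/Nov/2005:16:32:50 -0500] "GET /jobs/ HTTP/1.1" 200 15140 '
           . '"http://www.google.com/search?q=salary+for+data+mining&hl=en&lr=&start=10&sa=N" '
           . '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"';

my $rec = parse_log_line($sample);
print "$rec->{ip} requested $rec->{url} at $rec->{hour}:$rec->{min}, status $rec->{status}\n"
    if $rec;
```

Returning a hash keyed by field name means later code can refer to $rec->{status} rather than remembering that the status code is the ninth capture.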

Perl arrays
• A Perl array is an ordered list of items
• Array names begin with @
• Array initialization: @days = ("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

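Perl arrays, number of items
• When referring to a single array item, the name begins with "$". E.g. we print the first array item (index 0) using print $days[0] ;
• The index of the last item in an array is $#array, so the number of items is $#array + 1 (also available as scalar(@array))
• For the seven-item @days, $#days is 6 and the number of items is 7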

Perl array iteration
• Iterating over the entire array:
    foreach $day (@days) { print $day,"\n" } ;
• is the same as:
    for ($n=0; $n<7; $n++) { print $days[$n],"\n" } ;
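
A runnable sketch of the array operations from the last three slides, assuming @days holds the seven day names starting with "Sun". It runs under use strict, so the loop variables are declared with my.

```perl
use strict;
use warnings;

my @days = ("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat");

print "first item: ", $days[0], "\n";       # index 0
print "last index: ", $#days, "\n";         # one less than the item count
print "item count: ", scalar(@days), "\n";

# foreach visits every item in order, without an explicit index.
foreach my $day (@days) {
    print $day, "\n";
}

# The equivalent C-style loop; $#days avoids hard-coding the length.
for (my $n = 0; $n <= $#days; $n++) {
    print $days[$n], "\n";
}
```

Using $#days as the loop bound keeps the second loop correct if items are later added to or removed from the array.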

Perl hash
• A hash is an unordered collection of key-value pairs.
• Hash names begin with %
• Hash initialization:
    %capitals=("USA", "Washington D.C.", "France", "Paris", "China", "Beijing") ;

Perl hash reference
• When referring to a single hash item, the name begins with "$".
• To get the capital of China from %capitals we use $capitals{"China"}
• To add the capital of the UK, we use
    $capitals{"UK"} = "London" ;

Perl hash iteration
Iteration over the entire hash:

    foreach $country (keys %capitals) {
        print "$country capital $capitals{$country}\n";
    }
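
A sketch that puts the hash operations together and then applies the same idea to log analysis by counting repeated values. The helper name count_items is hypothetical, and the address 10.0.0.1 is a made-up value used only to give the counter a second key.

```perl
use strict;
use warnings;

my %capitals = ("USA", "Washington D.C.", "France", "Paris", "China", "Beijing");
$capitals{"UK"} = "London";    # add a new key-value pair

# keys returns the keys in no particular order, so sort for stable output.
foreach my $country (sort keys %capitals) {
    print "$country capital $capitals{$country}\n";
}

# Hypothetical helper: count how often each value occurs in a list,
# e.g. hits per IP address taken from a log file.
sub count_items {
    my %count;
    $count{$_}++ for @_;
    return \%count;
}

my $hits = count_items(qw(152.152.98.11 10.0.0.1 152.152.98.11));
print "$_: $hits->{$_}\n" for sort keys %$hits;
```

A hash keyed by IP address, URL, or referrer is a natural way to tally whatever field a log-parsing loop extracts.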

Additional tools for Web log analysis
• Perl for web log analysis
• Some web log analysis tools:
  • Analog
  • AWStats, awstats.sourceforge.net/
  • Webalizer
  • FTPweblog