Introduction to Perl Part II By: Bridget Thomson McInnes 22 January 2004.


File Handles
Very simple compared to C/C++!
Are not prefixed with a symbol ($, @, %, etc.)
Opening a file:
    open(SRC, "my_file.txt");
Reading from a file:
    $line = <SRC>;    # reads up to and including a newline character
Closing a file:
    close(SRC);

File Handles cont...
Opening a file for output:
    open(DST, ">my_file.txt");
Opening a file for appending:
    open(DST, ">>my_file.txt");
Writing to a file:
    print DST "Printing my first line.\n";
Safeguarding against opening a nonexistent file:
    open(SRC, "file.txt") || die "Could not open file.\n";
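The open modes above can be sketched with the three-argument form of open and a lexical filehandle, a style many Perl programmers prefer today. This is a minimal sketch, not part of the original slides; the helper name append_and_read is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: append one line to $path, then return every line in the file.
sub append_and_read {
    my ($path, $text) = @_;

    # '>>' opens for appending; $! holds the operating system's error message.
    open(my $out, '>>', $path) or die "Could not open $path for appending: $!\n";
    print {$out} "$text\n";
    close($out) or die "Could not close $path: $!\n";

    # '<' opens for reading.
    open(my $in, '<', $path) or die "Could not open $path for reading: $!\n";
    my @lines = <$in>;    # in list context, <$in> reads all remaining lines
    close($in);

    chomp @lines;
    return @lines;
}
```

Calling append_and_read("notes.txt", "hello") twice would return one line the first time and two lines the second, because '>>' adds to the file instead of overwriting it.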

File Test Operators
Check to see if a file exists:
    if (-e "file.txt") {
        # The file exists!
    }
Other file test operators:
    -r    readable
    -x    executable
    -d    is a directory
    -T    is a text file
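As a rough illustration, several file tests can be combined to classify a path. The helper name describe_path and its return strings are assumptions made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: describe a path using Perl's file test operators.
sub describe_path {
    my ($path) = @_;
    return 'missing'    unless -e $path;    # -e: the path exists
    return 'directory'  if     -d $path;    # -d: the path is a directory
    return 'unreadable' unless -r $path;    # -r: the path is readable by this process
    return (-T $path) ? 'text file' : 'other file';    # -T: contents look like text
}

print describe_path('.'), "\n";    # the current directory
```

Note that -T is a heuristic: Perl inspects the start of the file and guesses whether it is text.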

Quick Program with File Handles
Program to copy a file to a destination file:
    #!/usr/local/bin/perl -w
    open(SRC, "file.txt") || die "Could not open source file.\n";
    open(DST, ">newfile.txt");
    while ($line = <SRC>) {
        print DST $line;
    }
    close SRC;
    close DST;

Some Default File Handles
STDIN: standard input
    $line = <STDIN>;    # takes input from stdin
STDOUT: standard output
    print STDOUT "File handling in Perl is sweet!\n";
STDERR: standard error
    print STDERR "Error!!\n";

The <> File Handle
The "empty" file handle reads from the file(s) named on the command line, or from STDIN if none are given:
    $line = <>;
If the program is run as ./prog.pl file.txt, this automatically opens file.txt and reads the first line.
If the program is run as ./prog.pl file1.txt file2.txt, this reads file1.txt first and then file2.txt. The lines arrive as one continuous stream, so you will not know where one file ends and the next begins unless the program checks for it.

The <> File Handle cont...
If the program is run as ./prog.pl with no arguments, it waits for you to enter text at the prompt and continues until you enter the EOF character
– CTRL-D in UNIX
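When file boundaries do matter, Perl's built-in $ARGV variable holds the name of the file that <> is currently reading. The following is a minimal sketch; the helper name count_lines_per_file is hypothetical, and it sets @ARGV locally so it can be called like an ordinary function.

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines in each named file using the <> operator.
# If called with no file names, <> would fall back to reading STDIN.
sub count_lines_per_file {
    local @ARGV = @_;         # <> reads the files listed in @ARGV, in order
    my %count;
    while (my $line = <>) {
        $count{$ARGV}++;      # $ARGV is the name of the file currently being read
    }
    return %count;
}
```

In a script run as ./prog.pl file1.txt file2.txt, the same loop body works directly on the command-line arguments without the local assignment.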

Example Program with STDIN
Suppose you want to determine whether you are one of the Three Stooges:
    #!/usr/local/bin/perl
    %stooges = (larry => 1, moe => 1, curly => 1);
    print "Enter your name: ? ";
    $name = <STDIN>;
    chomp $name;
    if ($stooges{lc($name)}) {
        print "You are one of the Three Stooges!!\n";
    } else {
        print "Sorry, you are not a Stooge!!\n";
    }

Chomp and Chop
Chomp: function that deletes a trailing newline from the end of a string.
    $line = "this is the first line of text\n";
    chomp $line;    # removes the newline character
    print $line;    # prints "this is the first line of text" without a trailing newline
Chop: function that chops off the last character of a string, whatever it is.
    $line = "this is the first line of text";
    chop $line;
    print $line;    # prints "this is the first line of tex"
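One detail worth seeing in code is what the two functions return: chomp returns the number of characters it removed, while chop returns the character it removed. Both modify their argument in place. The helper name chomp_and_chop below is an assumption made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: apply chomp and chop to copies of the same text.
sub chomp_and_chop {
    my ($text) = @_;

    my $chomped = $text;
    my $count   = chomp $chomped;    # number of characters removed (0 if no trailing newline)

    my $chopped = $text;
    my $char    = chop $chopped;     # the character that was removed

    return ($chomped, $count, $chopped, $char);
}

my ($chomped, $count, $chopped, $char) = chomp_and_chop("text");
print "$chomped $count $chopped $char\n";    # text 0 tex t
```

This is why $line = chomp($line) is a common mistake: it replaces the line with a count instead of the trimmed text.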

Regular Expressions
What are regular expressions? A few definitions:
– Formally, a regular expression specifies a set of strings (a regular language).
– In other words, a formula for matching strings that follow a specified pattern.
Some things you can do with regular expressions:
– Parse the text
– Add and/or replace subsections of text
– Remove pieces of the text

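Regular Expressions cont..
A regular expression characterizes a regular language.
A familiar pattern notation in UNIX is the shell wildcard. Strictly speaking these are glob patterns rather than regular expressions (in a regex, * means "zero or more of the preceding item"), but the idea of describing many strings with one pattern is the same:
– ls *.c
   Lists all the files in the current directory whose names end in '.c'
– ls *.txt
   Lists all the files in the current directory whose names end in '.txt'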

Simple Example for ... Clarity
In its simplest form, a regular expression is a string of characters that you are looking for.
We want to find all the words that contain the string 'ing' in our text.
The regular expression we would use:
    /ing/

Simple Example cont...
What would our program then look like?
    #!/usr/local/bin/perl
    while (<>) {
        chomp;
        @words = split / /;
        foreach $word (@words) {
            if ($word =~ m/ing/) { print "$word\n"; }
        }
    }

Regular Expression Types
Regular expressions are composed of two types of characters:
– Literals
   Normal text characters
   Like what we saw in the previous program ( /ing/ )
– Metacharacters
   Special characters
   Add a great deal of flexibility to your search

Metacharacters
Match more than just characters
Match line position:
– ^    start of a line (caret)
– $    end of a line (dollar sign)
Match any one character in a list: [...]
Example:
– /[Bb]ridget/    matches Bridget or bridget
– /Mc[Ii]nnes/    matches McInnes or Mcinnes

Our Simple Example Revisited
Now suppose we only want to match words that end in 'ing' rather than just contain 'ing'.
How would we change our regular expression to accomplish this?
– Previous regular expression:
    $word =~ m/ing/
– New regular expression:
    $word =~ m/ing$/
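The difference between the two patterns can be checked on a sample line. This is a small sketch; the helper names and the sample sentence are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers: split a line on whitespace and filter the words.
sub words_containing_ing {
    my ($line) = @_;
    return grep { /ing/ } split ' ', $line;     # 'ing' anywhere in the word
}

sub words_ending_in_ing {
    my ($line) = @_;
    return grep { /ing$/ } split ' ', $line;    # 'ing' only at the end of the word
}

my $line = "the singer kept singing to the kings";
print join(' ', words_containing_ing($line)), "\n";    # singer singing kings
print join(' ', words_ending_in_ing($line)),  "\n";    # singing
```

Trailing punctuation is still part of a word here, so "singing." would not count as ending in 'ing'.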

Ranges in Regular Expressions
Ranges can be specified in regular expressions.
Valid ranges:
– [A-Z]       upper-case Roman alphabet
– [a-z]       lower-case Roman alphabet
– [A-Za-z]    upper- or lower-case Roman alphabet
– [A-F]       upper-case A through F
– [A-z]       valid, but be careful: in ASCII it also matches [ \ ] ^ _ and `
Invalid ranges (the start comes after the end in character order):
– [a-Z]       not valid
– [F-A]       not valid
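The warning about [A-z] is easy to verify: in ASCII, six punctuation characters sit between 'Z' and 'a'. The sketch below uses a hypothetical helper, chars_matching, and an arbitrary set of probe characters.

```perl
use strict;
use warnings;

# Hypothetical helper: return the characters from the list that the pattern matches.
sub chars_matching {
    my ($pattern, @chars) = @_;
    return grep { $_ =~ $pattern } @chars;
}

my @probe = ('Q', 'q', '_', '^', '[', '5');
print join(' ', chars_matching(qr/^[A-z]$/,    @probe)), "\n";    # Q q _ ^ [
print join(' ', chars_matching(qr/^[A-Za-z]$/, @probe)), "\n";    # Q q
```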

Ranges cont...
Ranges of digits can also be specified:
– [0-9]    valid
– [9-0]    invalid
Negating ranges:
– /[^0-9]/
   Matches any single character except a digit
– /[^a]/
   Matches any single character except an a
– /^[^A-Z]/
   Matches a line whose first character is anything other than an upper-case letter
   First ^: start of line
   Second ^: negation

Our Simple Example Again
Now suppose we want to create a list of all the words in our text that do not end in 'ing'.
How would we change our regular expression to accomplish this?
– Previous regular expression:
    $word =~ m/ing$/
– New test, using the negated match operator !~ :
    $word !~ m/ing$/
Note that a negated character class is not enough here: m/[^ing]$/ matches words whose last character is not i, n, or g, which is a different condition.
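A short comparison shows why negating the whole match differs from using a negated character class. The helper names and word list are assumptions made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: words that do NOT end in 'ing' (negated match operator).
sub not_ending_in_ing {
    return grep { $_ !~ /ing$/ } @_;
}

# Hypothetical helper: words whose LAST CHARACTER is not i, n, or g.
sub last_char_not_i_n_g {
    return grep { /[^ing]$/ } @_;
}

my @words = qw(sing sang pin walked);
print join(' ', not_ending_in_ing(@words)),   "\n";    # sang pin walked
print join(' ', last_char_not_i_n_g(@words)), "\n";    # walked
```

The character class looks at a single character, so it wrongly rejects 'sang' and 'pin' even though neither ends in 'ing'.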

Literal Metacharacters
Suppose that you actually want to look for every '^' in your text.
– Use the \ symbol to escape it
– /\^/    regular expression to search for
What does the following regular expression match?
    /[A-Z^]\^/
– Matches any line that contains (A-Z or ^) followed by ^
– Inside the brackets, a ^ that is not the first character is a literal; outside the brackets it must be escaped, otherwise it is read as the start-of-line anchor

Patterns Provided in Perl
Some patterns:
– \d    [0-9]
– \w    [a-zA-Z0-9_]
– \s    [ \r\t\n\f]    (whitespace pattern)
– \D    [^0-9]
– \W    [^a-zA-Z0-9_]
– \S    [^ \r\t\n\f]
Example: /19\d\d/
– Looks for any year in the 1900s
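As a sketch of the year example, the pattern can be wrapped in word boundaries so that longer digit strings are not mistaken for years. The helper name find_1900s_years and the sample text are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return every four-digit year from 1900 to 1999 found in the text.
sub find_1900s_years {
    my ($text) = @_;
    # \b on each side keeps the pattern from matching inside a longer number.
    # In list context, /g returns the captured text of every match.
    return $text =~ /\b(19\d\d)\b/g;
}

my @years = find_1900s_years("Born 1969, moved 2004, part 19999, retired 1998.");
print "@years\n";    # 1969 1998
```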

Using Patterns in our Example
Commonly, words are not separated by just a single space but by tabs, returns, etc.
Let's modify our split function to handle runs of whitespace:
    #!/usr/local/bin/perl
    while (<>) {
        @words = split /\s+/, $_;
        foreach $word (@words) {
            if ($word =~ m/ing/) { print "$word\n"; }
        }
    }

Word Boundary Metacharacter
Regular expression to match the start or the end of a 'word': \b
Examples:
– /Jeff\b/      matches Jeff but not Jefferson
– /Carol\b/     matches Carol but not Caroline
– /Rollin\b/    matches Rollin but not Rolling
– /\bform/      matches form or formation but not information
– /\bform\b/    matches form but neither information nor formation
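The boundary behaviour can be checked with a small whole-word test. This is a sketch only; has_word is a hypothetical helper, and \Q...\E is used so that any metacharacters in the search word are treated literally.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $word appears in $text as a whole word.
sub has_word {
    my ($word, $text) = @_;
    return $text =~ /\b\Q$word\E\b/ ? 1 : 0;
}

print has_word('form', 'fill in the form.'), "\n";    # 1
print has_word('form', 'more information'),  "\n";    # 0
print has_word('form', 'rock formation'),    "\n";    # 0
```

A boundary falls between a \w character and a \W character (or the start or end of the string), which is why the full stop after 'form' does not prevent the match.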

DOT Metacharacter
The DOT metacharacter, '.', stands for any single character except a newline.
    /b.bble/
– Could match: bobble, babble, bubble
    /.oat/
– Could match: boat, coat, goat
Note: '.*' means "any run of characters, as long as possible". This can be handy, but because it is greedy it can match far more than intended.

PIPE Metacharacter
The PIPE metacharacter, '|', is used for alternation.
    /Bridget (Thomson|McInnes)/
– Matches Bridget Thomson or Bridget McInnes; the match never spans all of Bridget Thomson McInnes
    /B|bridget/
– Matches B or bridget
    /^(B|b)ridget/
– Matches Bridget or bridget at the beginning of a line

Our Simple Example
Now with our example, suppose that we want to get not only all words that end in 'ing' but also those that end in 'ed'.
How would we change our regular expression to accomplish this?
– Previous regular expression:
    $word =~ m/ing$/
– New regular expression:
    $word =~ m/(ing|ed)$/

The ? Metacharacter
The metacharacter ? indicates that the character immediately preceding it occurs zero or one time.
Examples:
– /worl?ds/
   Matches either 'worlds' or 'words'
– /m?ethane/
   Matches either 'methane' or 'ethane'

The * Metacharacter
The metacharacter * indicates that the character immediately preceding it occurs zero or more times.
Example:
– /ab*c/    matches 'ac', 'abc', 'abbc', 'abbbc', etc.
– Matches an a, optionally followed by a sequence of b's, and then a c
Sometimes called the Kleene star

Our Simple Example Again
Now suppose we want to create a list of all the words in our text that end in 'ing' or 'ings'.
How would we change our regular expression to accomplish this?
– Previous regular expression:
    $word =~ m/ing$/
– New regular expression:
    $word =~ m/ings?$/
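Both quantifiers can be tried against short word lists. In this sketch the helper name matching_words and the word lists are made up; the first pattern is anchored at both ends so that the whole word must fit the pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: return the words that match a precompiled pattern.
sub matching_words {
    my ($pattern, @words) = @_;
    return grep { $_ =~ $pattern } @words;
}

# b* allows zero or more b's between the a and the c.
print join(' ', matching_words(qr/^ab*c$/, qw(ac abc abbbc adc abcb))), "\n";      # ac abc abbbc

# s? allows an optional s before the end of the word.
print join(' ', matching_words(qr/ings?$/, qw(ring rings ringed kingship))), "\n"; # ring rings
```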

Modifying Text
Match
– Up to this point, we have only tested whether text matches a given regular expression
– Example: $variable =~ m/regex/
Substitution
– Takes the match one step further: if there is a match, replace it with the given string
– Example: $variable =~ s/regex/replacement/
    $var =~ s/Thomson/McInnes/;
    $var =~ s/Bridgette/Bridget/;

Substitution Example
Suppose that when we find a word that ends in 'ing' we want to replace the 'ing' with 'ed'.
    #!/usr/local/bin/perl -w
    while (<>) {
        chomp $_;
        @words = split /\s+/, $_;
        foreach $word (@words) {
            if ($word =~ s/ing$/ed/) { print "$word\n"; }
        }
    }
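The same substitution can also be applied to a whole line at once with the /g modifier, which replaces every match rather than only the first. This is a sketch with a hypothetical helper, replace_ing_with_ed; it uses \b instead of $ because the words are no longer split apart.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every word-final 'ing' in a line with 'ed'.
sub replace_ing_with_ed {
    my ($line) = @_;
    # s///g returns the number of substitutions made, or a false value if there were none.
    my $count = ($line =~ s/ing\b/ed/g) || 0;
    return ($line, $count);
}

my ($new, $n) = replace_ing_with_ed("walking and talking all morning");
print "$new ($n changes)\n";    # walked and talked all morned (3 changes)
```

The result 'morned' shows the limit of a purely textual rule: the regex knows nothing about which words are verbs.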

Special Variables Modified by a Match
$&
– A copy of the text matched by the regex
$`
– A copy of the target text in front of (before) the match
$'
– A copy of the target text after the match
$1, $2, $3, etc.
– The text matched by the 1st, 2nd, etc. set of parentheses. Note: $0 is not part of this series; it holds the name of the program
$+
– A copy of the text matched by the highest-numbered set of parentheses that took part in the match

Our Simple Example Once Again
Now let's revise our program to find words that end in 'ing' without splitting our line of text into an array of words:
    #!/usr/local/bin/perl -w
    while (<>) {
        chomp $_;
        if ($_ =~ /([A-Za-z]*ing\b)/) { print "$&\n"; }
    }
Because the test is an if, this prints only the first such word on each line.
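To collect every matching word on a line rather than only the first, one option is the /g modifier in list context. This is a minimal sketch; ing_words is a hypothetical helper and the sample sentence is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: return every word in the line that ends in 'ing'.
sub ing_words {
    my ($line) = @_;
    # In list context, /g returns the text of the capture group for every match.
    return $line =~ /\b([A-Za-z]*ing)\b/g;
}

print join(', ', ing_words("Singing and dancing, then resting.")), "\n";    # Singing, dancing, resting
```

Because the pattern relies on \b rather than on splitting at whitespace, trailing punctuation such as the comma and full stop is not included in the results.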

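Example
    #!/usr/local/bin/perl
    $exp = <STDIN>;
    chomp $exp;
    if ($exp =~ /^([A-Za-z]+\s)*\bcrave\b(\s[A-Za-z]+)*/) {
        print "$1\n";
        print "$2\n";
    }
– Run the program with the string: I crave to rule the world!
– Results:
   $1 is 'I ' (the word before 'crave', with its trailing space)
   $2 is ' world' (only the last word after 'crave')
– A repeated group keeps only its last repetition, so $2 does not hold everything after 'crave'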
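A parenthesized group that is repeated with * keeps only the text of its last repetition. To capture the whole run of words on each side of 'crave', one approach is to put the repetition inside a non-capturing group, (?: ... ), and wrap that in a single capturing group. The helper name around_crave is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the words before and after 'crave', or an empty list if absent.
sub around_crave {
    my ($text) = @_;
    # (?: ... ) groups without capturing; the outer parentheses capture the whole repeated run.
    if ($text =~ /^((?:[A-Za-z]+\s)*)\bcrave\b((?:\s[A-Za-z]+)*)/) {
        return ($1, $2);
    }
    return;
}

my ($before, $after) = around_crave("I crave to rule the world!");
print "[$before] [$after]\n";    # [I ] [ to rule the world]
```

The exclamation mark is not captured because the pattern only allows letters and whitespace.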

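Example
    #!/usr/local/bin/perl
    $exp = <STDIN>;
    chomp $exp;
    if ($exp =~ /\bcrave\b/) {
        print "$`\n";
        print "$&\n";
        print "$'\n";
    }
– Run the program with the string: I crave to rule the world!
– Results:
   $` is 'I ' (the text before the match)
   $& is 'crave' (the match itself)
   $' is ' to rule the world!' (the text after the match)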

Thank you