
Pattern Matching II

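Greedy Matching
When dealing with quantifiers, Perl's pattern matcher is greedy by default: each quantifier matches as many characters as it can while still allowing the whole pattern to match. For example,
– $_ = "Bob sat next to the Bobcat and listened to the Bobolink"; /.*Bob/ matches through the last "Bob" (the one in "Bobolink").
– $_ = "Freddie's hot dogs"; /Fred+/ matches "Fredd".
– $_ = "Freddie's hot dogs are really hot!"; /.*hot/ matches through the second "hot".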
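A minimal sketch of the three greedy examples. The capture parentheses and the variable names are added here for illustration only, so that the matched text can be printed.

```perl
use strict;
use warnings;

# Each capture group shows how far the greedy quantifier ran.
my ($bob)  = "Bob sat next to the Bobcat and listened to the Bobolink" =~ /(.*Bob)/;
my ($fred) = "Freddie's hot dogs" =~ /(Fred+)/;
my ($hot)  = "Freddie's hot dogs are really hot!" =~ /(.*hot)/;

print "$bob\n";   # Bob sat next to the Bobcat and listened to the Bob
print "$fred\n";  # Fredd
print "$hot\n";   # Freddie's hot dogs are really hot
```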

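Minimal Matching
Minimal (non-greedy) mode is specified by placing (?) after the quantifier; the quantifier then matches as few characters as possible. For example,
– $_ = "Freddie's hot dogs"; /Fred+?/ matches "Fred".
– $_ = "Freddie's hot dogs are really hot!"; /.*?hot/ matches "Freddie's hot".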
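The same strings with minimal quantifiers, as a short sketch. The capture groups and variable names are assumptions added to make the result visible.

```perl
use strict;
use warnings;

# +? and *? stop as soon as the rest of the pattern can match.
my ($fred_min) = "Freddie's hot dogs" =~ /(Fred+?)/;
my ($hot_min)  = "Freddie's hot dogs are really hot!" =~ /(.*?hot)/;

print "$fred_min\n";  # Fred
print "$hot_min\n";   # Freddie's hot
```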

Multiple Quantifiers
When a pattern contains several quantifiers, the leftmost quantifier is the greediest. For example,
– $_ = "Bob sat next to the Bobcat and listened to the Bobolink"; /Bob.*Bob.*link/
The first .* matches:
– " sat next to the Bobcat and listened to the "
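One way to see what each .* consumed is to wrap both in capture groups. The groups and the variable names below are illustrative additions, not part of the slide's pattern.

```perl
use strict;
use warnings;

my $sentence = "Bob sat next to the Bobcat and listened to the Bobolink";

# The leftmost .* takes everything it can; the second gets what is left.
my ($first_gap, $second_gap) = $sentence =~ /Bob(.*)Bob(.*)link/;

print "[$first_gap]\n";   # [ sat next to the Bobcat and listened to the ]
print "[$second_gap]\n";  # [o]
```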

Anchors
More complicated patterns can be created with anchors. An anchor requires a pattern to match at a specific place in a string: it aligns a particular position in the pattern with a particular position in the string, without matching any characters itself.

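(^) Anchor
(^) requires the pattern to match at the beginning of the string. For example,
– /^Shelley/ matches "Shelley has red hair" but not "What color is Shelley's hair?"
– /^[^!]^/
The meaning of (^) depends on the context: outside brackets it is the anchor, while as the first character inside brackets it negates the character class. A literal caret is written \^.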

($) Anchor
($) requires the pattern to match at the end of the string. For example,
– /hair$/ matches "Shelley has red hair" but not "What color is Shelley's hair?"
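A small sketch that applies the (^) and ($) examples to the two sample strings. The array and variable names are assumptions made for the illustration.

```perl
use strict;
use warnings;

my @strings = ("Shelley has red hair", "What color is Shelley's hair?");

# ^ pins the match to the start of the string, $ to the end.
my @starts_with_shelley = grep { /^Shelley/ } @strings;
my @ends_with_hair      = grep { /hair$/ } @strings;

print "$_\n" for @starts_with_shelley;  # Shelley has red hair
print "$_\n" for @ends_with_hair;       # Shelley has red hair
```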

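(\b) Anchor
(\b) matches at a word boundary: the position between a word character and a non-word character, or between a word character and the start or end of the string. For example,
– /\bwear\b/ matches "I wear shoes" but not "Swimwear for sale." or "Molly wears green sweaters."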
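The word-boundary example can be checked with a short filter. The names used here are illustrative.

```perl
use strict;
use warnings;

my @sentences = (
    "I wear shoes",
    "Swimwear for sale.",
    "Molly wears green sweaters.",
);

# \b on both sides means "wear" must stand alone as a word.
my @whole_word = grep { /\bwear\b/ } @sentences;

print "$_\n" for @whole_word;  # I wear shoes
```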

Binding Operators
A pattern can be matched against any string with the binding operators (=~) and (!~). The left operand must evaluate to a string, and the return value is a Boolean; (!~) returns the negation of (=~). For example,
– $string =~ /[,;:]/
– $string !~ /[,;:]/
– if (<STDIN> =~ /^[Yy]/) { … }
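A sketch of both binding operators. The sample strings are assumptions, and a fixed $reply stands in for a line of user input so that the example runs without a terminal.

```perl
use strict;
use warnings;

my $string = "red, green; blue";

# =~ is true when the pattern matches; !~ is true when it does not.
my $has_separator   = ($string =~ /[,;:]/) ? "yes" : "no";
my $lacks_separator = ($string !~ /[,;:]/) ? "yes" : "no";

# Hypothetical reply standing in for a line read from input.
my $reply     = "Yes\n";
my $confirmed = ($reply =~ /^[Yy]/) ? 1 : 0;

print "$has_separator $lacks_separator $confirmed\n";  # yes no 1
```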

Pattern Modifiers
A pattern can be followed by a modifier. The modifier changes how:
– The pattern is interpreted.
– The pattern matcher works while using the pattern.
The most common modifiers are:
– i, m, s, o, x

(i) Modifier
The (i) modifier tells the pattern matcher to ignore case. For example, /apples/i matches
– "apples"
– "Apples"
– "APPLES"
– "ApPlEs"
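A quick check of the (i) modifier against the listed spellings. The extra word "oranges" is an added control value that should not match.

```perl
use strict;
use warnings;

my @words = ("apples", "Apples", "APPLES", "ApPlEs", "oranges");

# /i makes the comparison case-insensitive.
my @matched = grep { /apples/i } @words;

print scalar(@matched), "\n";  # 4
```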

(m) and (s) Modifiers
(m) treats a string as multiple lines:
– (^) matches at the start of the string and just after any newline.
– ($) matches at the end of the string and just before any newline.
(s) treats a string as a single line:
– (.) will also match newline characters.
If both (m) and (s) are specified:
– (.) matches any character.
– (^) and ($) match positions after and before a newline.
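A sketch contrasting the default behavior with (m) and (s) on a two-line string. The sample text and variable names are assumptions.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Without /m, ^ matches only at the start of the string.
my @plain = $text =~ /^(\w+)/g;
# With /m, ^ also matches just after each newline.
my @multi = $text =~ /^(\w+)/mg;

# Without /s, . stops at a newline; with /s it crosses it.
my ($dot_plain)  = $text =~ /(first.*)/;
my ($dot_single) = $text =~ /(first.*)/s;

print "@plain\n";      # first
print "@multi\n";      # first second
print "$dot_plain\n";  # first line
```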

(o) Modifier
Patterns can include scalar variables:
– The variables are interpolated.
Patterns containing variables may be recompiled each time they are used. This provides dynamic patterns, but it can be very expensive. Include the (o) modifier if the variable never changes.
– Tells Perl not to recompile the pattern.
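A sketch of what "not recompiled" means in practice: with (o), the pattern keeps the value the variable had the first time the match ran, even if the variable changes later. The loop, data, and variable names are assumptions made for the demonstration.

```perl
use strict;
use warnings;

my @hits;
my $word = "apple";

for my $line ("apple pie", "cherry pie") {
    # /o compiles the pattern once, using the first value of $word.
    push @hits, $line if $line =~ /$word/o;
    $word = "cherry";   # has no effect on the already compiled pattern
}

print "@hits\n";  # apple pie
```

Without (o), the second iteration would match against "cherry" instead. A precompiled qr// pattern is another common way to avoid repeated compilation.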

(x) Modifier
(x) tells the pattern matcher to ignore whitespace in the pattern. For example, /\d+ \. \d+/x is equivalent to /\d+\.\d+/
It also allows comments to be included in patterns:
/\d+   # digits before the decimal.
 \.    # The decimal point.
 \d+   # digits after the point.
/x
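A short check that the commented (x) form behaves like the compact form. The test inputs and the use of qr// to hold the two patterns are assumptions made for the illustration.

```perl
use strict;
use warnings;

my $compact = qr/\d+\.\d+/;
my $spaced  = qr/
    \d+   # digits before the decimal
    \.    # the decimal point
    \d+   # digits after the point
/x;

my @inputs = ("3.14", "42", "x1.5y");

my @by_compact = grep { $_ =~ $compact } @inputs;
my @by_spaced  = grep { $_ =~ $spaced } @inputs;

print "@by_compact\n";  # 3.14 x1.5y
print "@by_spaced\n";   # 3.14 x1.5y
```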

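Remembering Matches
Sometimes a pattern needs to reference a part of the string it matched earlier. This is done by parenthesizing the parts of interest. Inside the pattern they are referenced by implicitly defined variables:
– e.g. \1, \2, \3, …
For example,
– /(\w+).*\1/ matches "jo likes joanne."
– /(.)\1/ matches any doubled character.
– /(['"])(.*?)\1/ matches text enclosed in a matching pair of single or double quotes.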
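A sketch of the three patterns in use. The strings "hello" and the quoted sentence are assumptions chosen for the demonstration; only "jo likes joanne." comes from the slide.

```perl
use strict;
use warnings;

# \1 must repeat exactly what the first group captured.
my ($repeated) = "jo likes joanne." =~ /(\w+).*\1/;
my ($doubled)  = "hello" =~ /(.)\1/;

# The closing quote must be the same character as the opening quote,
# so the apostrophe inside the double quotes does not end the match.
my ($quote, $inside) = q{He said "it's fine" and left} =~ /(['"])(.*?)\1/;

print "$repeated\n";  # jo
print "$doubled\n";   # l
print "$inside\n";    # it's fine
```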

References Outside a Pattern
Parts of a match are sometimes needed outside the pattern. They can be referenced by implicit variables:
– e.g. $1, $2, $3, …
For example,
"VY ran for 267 yards Saturday" =~ /(\d+) (\w+) (\w+)/;
print "$1 $2 $3 \n";
This prints "267 yards Saturday".
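A slightly more defensive sketch of the same example: $1, $2, and $3 are only meaningful after a successful match, so the match is tested first. The variable names are illustrative.

```perl
use strict;
use warnings;

my $report = "VY ran for 267 yards Saturday";
my $summary;

# Read the capture variables only if the match succeeded.
if ($report =~ /(\d+) (\w+) (\w+)/) {
    $summary = "$1 $2 $3";
}

print "$summary\n";  # 267 yards Saturday
```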

Nested Parentheses
Patterns can have nested parentheses. Groups are related to variables by counting opening parentheses from the left. For example:
$_ = "31 Oct 2005";
/((\d+) (\w+) (\d+))/;
print "$1 \n $2 $3 $4 \n";
Here $1 is "31 Oct 2005", $2 is "31", $3 is "Oct", and $4 is "2005".
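In list context the match returns the groups in the same left-to-right order, which gives a compact way to confirm the numbering. The variable names are assumptions.

```perl
use strict;
use warnings;

my $date = "31 Oct 2005";

# Groups are numbered by the position of their opening parenthesis.
my ($whole, $day, $month, $year) = $date =~ /((\d+) (\w+) (\d+))/;

print "$whole\n";             # 31 Oct 2005
print "$day $month $year\n";  # 31 Oct 2005
```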

Backreferences
\n and $n (where n is a group number) are called backreferences.
– They refer to the result of the previous match.
Perl also includes 3 implicit variables:
– $` – part before the match.
– $& – part that matched.
– $' – part after the match.
It is costly for the matcher to save these for every match.
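A sketch showing the three implicit variables after one match. The sample string is reused from the earlier slide; the variable names are assumptions.

```perl
use strict;
use warnings;

my $line = "VY ran for 267 yards Saturday";
my ($before, $matched, $after);

if ($line =~ /\d+/) {
    # Copy the implicit variables right away; the next match overwrites them.
    ($before, $matched, $after) = ($`, $&, $');
}

print "[$before][$matched][$after]\n";  # [VY ran for ][267][ yards Saturday]
```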

RegEx Extensions
Perl includes several extensions to previous versions of its regular expression syntax. The general form is:
(?xPattern)
where x is a one- or two-character code.

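Look Ahead
Sometimes a pattern should match only if it is (or is not) followed by a subpattern, without making the subpattern part of the match. (?=) and (?!) provide this look-ahead behavior. For example,
– /\d+(?=\.)/ matches digits that are followed by a period.
– /\d+(?!\.)/ matches digits that are not followed by a period.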
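A sketch of both look-ahead forms on one string. The input "12.50" and the capture groups are assumptions. The negative case shows a common surprise: the matcher backtracks until the condition holds.

```perl
use strict;
use warnings;

my $price = "12.50";

# Positive look-ahead: digits that are immediately followed by a period.
my ($followed) = $price =~ /(\d+)(?=\.)/;

# Negative look-ahead: "12" is followed by a period, so \d+ backtracks
# to "1", which is followed by "2" rather than a period.
my ($not_followed) = $price =~ /(\d+)(?!\.)/;

print "$followed\n";      # 12
print "$not_followed\n";  # 1
```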

Look Behind
Perl also allows look-behinds. (?<=) and (?<!) provide this behavior. For example,
– /(?<=\.)\d+/ matches digits that are preceded by a period.
– /(?<!\.)\d+/ matches digits that are not preceded by a period.
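A matching sketch for look-behind. The input "12.50" and the capture groups are illustrative assumptions.

```perl
use strict;
use warnings;

my $amount = "12.50";

# Positive look-behind: digits that come right after a period.
my ($after_dot) = $amount =~ /(?<=\.)(\d+)/;

# Negative look-behind: digits whose preceding character is not a period.
my ($no_dot_before) = $amount =~ /(?<!\.)(\d+)/;

print "$after_dot\n";      # 50
print "$no_dot_before\n";  # 12
```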

Substitution
Often a substring needs to be found and replaced with another. Perl has a substitution operator for this. The general form is:
– s dl Pattern dl New_string dl Modifiers
where dl is the delimiter. The common form is:
– s/Pattern/New_string/
The return value is the number of substitutions made.

Examples
Example 1:
$_ = "No more apples!";
s/apples/applets/;
Example 2:
$_ = "Who are Jack and Jill?";
s/(\w+) and (\w+)/$2 & $1/;
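A runnable sketch of both examples, using named variables with =~ instead of $_ so the results and the return value can be inspected. The variable names are assumptions.

```perl
use strict;
use warnings;

my $fruit = "No more apples!";
# The substitution returns the number of replacements made.
my $count = ($fruit =~ s/apples/applets/);

my $names = "Who are Jack and Jill?";
# $1 and $2 in the replacement refer to the captured words.
$names =~ s/(\w+) and (\w+)/$2 & $1/;

print "$fruit ($count)\n";  # No more applets! (1)
print "$names\n";           # Who are Jill & Jack?
```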

Substitution with Modifiers
Modifiers can be used with the substitution operator. i, o, m, s, and x have the same effect as in matching. There are two common modifiers for substitutions:
– g: perform the substitution everywhere it applies.
– e: the substitution part is treated as a Perl expression.

(g) and (e) Examples
Example 1:
$_ = " ";
s/0//g;
Example 2:
$_ = "Molly and Mary were cold.";
s/(\w+)/"$1"/g;
Example 3:
$_ = "Is it Sum, SUM, sum, or suM?";
s/sum/sum/ig;
Example 4:
s/(\w+)/uc($1)/e;
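A sketch of the (g) and (e) modifiers with printable results. The inputs "10203" and "hello world" are hypothetical values chosen for the demonstration; the other two strings come from the slide.

```perl
use strict;
use warnings;

# g: remove every zero (hypothetical input).
my $digits = "10203";
$digits =~ s/0//g;

# g: wrap every word in double quotes.
my $quoted = "Molly and Mary were cold.";
$quoted =~ s/(\w+)/"$1"/g;

# i and g together: normalize every spelling of "sum".
my $sums    = "Is it Sum, SUM, sum, or suM?";
my $changed = ($sums =~ s/sum/sum/ig);

# e: the replacement is evaluated as Perl code; without g only the
# first word is replaced (hypothetical input).
my $shout = "hello world";
$shout =~ s/(\w+)/uc($1)/e;

print "$digits\n";           # 123
print "$quoted\n";           # "Molly" "and" "Mary" "were" "cold".
print "$sums ($changed)\n";  # Is it sum, sum, sum, or sum? (4)
print "$shout\n";            # HELLO world
```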