CPTG286K Programming - Perl Chapter 7: Regular Expressions.

Regular Expressions (aka regex)
– Regular expressions are patterns that are matched against a string
– In Perl, a regular expression is usually written between slashes, e.g. /abc/
– The outcome is either a successful match or a failure to match
– Regexes also drive substitution (s///) and the split function; join, the counterpart of split, takes a plain glue string instead
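Here is a minimal sketch of the success/failure outcome, using a couple of made-up input strings. A match used as a condition is simply true or false.

```perl
use strict;
use warnings;

my @inputs = ("abcdef", "xyz");

# Inside map, each string is in $_, so /abc/ is matched against it.
# A successful match is true and a failed one is false.
my @results = map { /abc/ ? "match" : "no match" } @inputs;

for my $i (0 .. $#inputs) {
    print "$inputs[$i]: $results[$i]\n";
}
```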

Simple Uses of regex

    while (<>) {          # similar to: grep "abc" filename
        if (/abc/) {      # regex /abc/ is matched against $_
            print;        # prints $_ if it contains abc
        }
    }

Replacing regex /abc/ with:
– /ab*c/ matches an a, followed by 0 or more b's, followed by a c; same as /ab{0,}c/
– /ab+c/ matches an a, followed by 1 or more b's, followed by a c; same as /ab{1,}c/
– /ab?c/ matches an a, followed by 0 or 1 b, followed by a c; same as /ab{0,1}c/
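The same quantifier patterns can be tried without keyboard input. This sketch uses hypothetical sample strings and replaces the while (<>) loop with Perl's built-in grep, which makes the results easy to compare side by side:

```perl
use strict;
use warnings;

# These sample strings stand in for the lines while (<>) would read.
my @lines = ("ac", "abc", "abbbc", "xyz");

# grep keeps each element for which the regex matches $_,
# the same test the while/if loop performs line by line.
my @star = grep { /ab*c/ } @lines;   # a, zero or more b's, c
my @plus = grep { /ab+c/ } @lines;   # a, one or more b's, c
my @opt  = grep { /ab?c/ } @lines;   # a, zero or one b, c

print "/ab*c/: @star\n";   # ac abc abbbc
print "/ab+c/: @plus\n";   # abc abbbc
print "/ab?c/: @opt\n";    # ac abc
```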

Quantifiers
Symbol   Meaning
+        Match 1 or more times
*        Match 0 or more times
?        Match 0 or 1 time
{n}      Match exactly n times
{n,}     Match at least n times
{n,m}    Match at least n but not more than m times
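This sketch applies each quantifier from the table to a single made-up string and captures what it matched. Capture groups and =~ are covered later in the chapter.

```perl
use strict;
use warnings;

my $s = "baaaaab";
my %got;

for my $q ('a+', 'a*', 'a?', 'a{3}', 'a{2,}', 'a{2,4}') {
    # The parentheses capture the leftmost text the quantified "a" matched;
    # quantifiers are greedy, so they take as many a's as they are allowed.
    my ($m) = $s =~ /($q)/;
    $got{$q} = $m;
    print "/$q/ matched '$m'\n";
}
```

The empty results for a* and a? are expected. Both can match zero characters, so they succeed immediately at position 0, before the first b, without consuming anything.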

Patterns
– Single-character patterns
  – Character class
  – Negated character class
– Grouping patterns
  – Parentheses
  – Multipliers
  – Sequence and anchoring
  – Alternation

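Single-Character Patterns
– Specific single-character match: /a/
– Any character except newline: /./
– Character class: /[valid_list]/
    /[0-9]/           # or \d, any single digit
    /[a-zA-Z0-9_]/    # or \w, any single word character
    /[ \r\t\n\f]/     # or \s, any single whitespace character
– Negated class: /[^valid_list]/
    /[^0-9]/          # or \D, any single non-digit
    /[^a-zA-Z0-9_]/   # or \W, any single non-word character
    /[^ \r\t\n\f]/    # or \S, any single non-whitespace character
These equivalences are approximate: on Unicode text, \d, \w, and \s also match non-ASCII digits, word characters, and whitespace.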
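This sketch collects every matching character from a sample string so you can see what each shorthand class matches. It relies on the g modifier in list context, which these slides do not cover.

```perl
use strict;
use warnings;

my $text = "Room 42, floor_3!";

# In list context, a match with the g modifier returns every
# character the class matched.
my @digits  = $text =~ /\d/g;   # same as /[0-9]/ for ASCII text
my @word    = $text =~ /\w/g;   # letters, digits, and underscore
my @spaces  = $text =~ /\s/g;   # whitespace
my @nonword = $text =~ /\W/g;   # everything \w does not match

print "digits:   @digits\n";                       # 4 2 3
print "word:     ", scalar(@word), "\n";           # 13
print "spaces:   ", scalar(@spaces), "\n";         # 2
print "non-word: [", join("", @nonword), "]\n";    # [ , !]
```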

Parentheses grouping
Parentheses "memorize" (capture) the text that a subpattern matched so it can be referenced later. Inside the same regex, a memorized part is referenced with a backslash followed by the group number (\1, \2, ...), counting opening parentheses from the left.
Examples:
    /(a)(b)c\2d\1/;    # matches abcbda
    /a(.*)b\1c/;       # matches aFREDbFREDc but
                       # does not match aXXbXXXc
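This sketch first checks the two slide examples, then applies a backreference to a common task: finding a doubled word. The sample sentence and the \s+ between the words are illustrative choices.

```perl
use strict;
use warnings;

# \1 must repeat exactly the text that group 1 matched.
my $fred = ("aFREDbFREDc" =~ /a(.*)b\1c/) ? "match" : "no match";
my $xx   = ("aXXbXXXc"    =~ /a(.*)b\1c/) ? "match" : "no match";
print "aFREDbFREDc: $fred\n";   # match
print "aXXbXXXc:    $xx\n";     # no match

# A common practical use: spotting an accidentally doubled word.
my $sentence = "Paris in the the spring";
my ($doubled) = $sentence =~ /\b(\w+)\s+\1\b/;
print "doubled word: $doubled\n";   # the
```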

Multiplier grouping
    /x{5}/                  # matches exactly 5 x's
    /x{5,10}/               # matches 5 to 10 x's
    /fo+ba?r*/              # matches f followed by one or more o's, a b,
                            # an optional a, and zero or more r's
    /fo{1,}ba{0,1}r{0,}/    # same as /fo+ba?r*/ using general multipliers
By default, multipliers are greedy: they match as many characters as possible. Appending ? (as in *? or +?) makes a multiplier non-greedy, so it matches as few as possible:
    $_ = "Nuts sold here. Come here!";
    /N.*here/     # greedy: matches "Nuts sold here. Come here"
    /N.*?here/    # non-greedy: matches "Nuts sold here"
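Here is a short check of the greedy and non-greedy results from the slide. A capture group pulls out exactly what each pattern matched.

```perl
use strict;
use warnings;

$_ = "Nuts sold here. Come here!";

# The parentheses capture what each pattern matched.
my ($greedy) = /(N.*here)/;    # .* takes as much as it can, then backs off
my ($lazy)   = /(N.*?here)/;   # .*? takes as little as it can

print "greedy: '$greedy'\n";   # 'Nuts sold here. Come here'
print "lazy:   '$lazy'\n";     # 'Nuts sold here'
```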

Anchor grouping
– \b requires a word boundary for a match
– \B requires NO word boundary for a match
– ^ matches at the beginning of the string
– $ matches at the end of the string (or just before a final newline)
Examples:
    /\bFred\b/;   # matches Fred, not Frederick or alFred
    /\bFred\B/;   # matches Frederick, not Fred Flintstone
    /^a/;         # matches strings beginning with a
    /c$/;         # matches strings ending in c (optionally followed by \n)
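This sketch runs the anchor examples over a few made-up names using grep. The last line shows that $ also matches just before a trailing newline.

```perl
use strict;
use warnings;

my @names = ("Fred", "Frederick", "alFred", "Fred Flintstone");

my @whole  = grep { /\bFred\b/ } @names;   # Fred as a complete word
my @prefix = grep { /\bFred\B/ } @names;   # Fred starting a longer word
my @starts = grep { /^a/ }       @names;   # begins with a
my @ends   = grep { /k$/ }       @names;   # ends with k

print "whole word:  @whole\n";    # Fred Fred Flintstone
print "prefix only: @prefix\n";   # Frederick

# $ still matches when the string ends with a newline.
my $with_nl = ("abc\n" =~ /c$/) ? 1 : 0;   # 1
```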

Alternatives grouping
    /al|bert|c/;           # matches al or bert or c
    /^x|y/;                # x at beginning of string,
                           # or y anywhere
    /^(x|y)/;              # either x or y at
                           # beginning of string
    /songbird|bluebird/;   # songbird or bluebird
    /(song|blue)bird/;     # same, using parentheses
    /(a|b)(c|d)/;          # ac, ad, bc, or bd
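This sketch shows why the parentheses matter in /^(x|y)/, using hypothetical sample words.

```perl
use strict;
use warnings;

my @samples = ("xray", "yak", "bay", "box");

# ^ binds tighter than |, so /^x|y/ means (^x) or (y anywhere).
my @loose  = grep { /^x|y/ }   @samples;
# Parentheses make the anchor apply to both alternatives.
my @strict = grep { /^(x|y)/ } @samples;

print "/^x|y/:   @loose\n";    # xray yak bay
print "/^(x|y)/: @strict\n";   # xray yak
```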

Regex Grouping Precedence
Arranged from highest to lowest precedence:
Name                     Representation
Parentheses              ( )  (?: )
Multipliers              ?  +  *  {m,n}  ??  +?  *?  {m,n}?
Sequence and anchoring   abc  ^  $  \A  \Z  (?= )  (?! )
Alternation              |
Example:
    /a|b*/;        # interpreted as /a|(b*)/, not /(a|b)*/
    /a|(?:b*)/;    # same, but (?: ) does not capture
                   # into \1
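The sketch below shows two consequences of the precedence table. First, /a|b*/ succeeds on any string because b* may match nothing. Second, (?: ) groups without taking a capture number. The list-context return value of a match (one element per capture group) is used to show the numbering.

```perl
use strict;
use warnings;

# * binds tighter than |, so /a|b*/ means "a" or "zero or more b's".
# b* can match the empty string, so this pattern succeeds on ANY string.
my $any = ("xyz" =~ /a|b*/) ? "match" : "no match";
print "xyz against /a|b*/: $any\n";   # match (an empty match)

# A capturing group is counted; a (?: ) group is not.
my @parts  = "songbird" =~ /(song|blue)(bird)/;     # ("song", "bird")
my @parts2 = "songbird" =~ /(?:song|blue)(bird)/;   # ("bird"), so $1 is "bird"
print "@parts | @parts2\n";
```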

The pattern binding =~ operator
Use =~ to bind a pattern to a scalar variable other than the default $_ variable. To match the regex against $name read from the keyboard:
    print "Proceed (y/Y)? ";     # produce prompt
    chomp($name = <STDIN>);      # read a line and chomp it
    if ($name =~ /^[yY]/) {      # test both cases
        print "Proceeding.\n";   # display decision
    }
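The prompt example reads STDIN, which makes it awkward to test. This sketch moves the same check into a hypothetical helper, wants_to_proceed, and feeds it fixed replies.

```perl
use strict;
use warnings;

# Hypothetical helper: does this reply mean "proceed"?
# =~ applies the regex to $reply instead of the default $_.
sub wants_to_proceed {
    my ($reply) = @_;
    return $reply =~ /^[yY]/ ? 1 : 0;
}

for my $reply ("yes", "Y", "no", " y") {
    print "'$reply' -> ", (wants_to_proceed($reply) ? "Proceeding." : "Stopping."), "\n";
}
```

Note that " y" is rejected. The ^ anchors the check to the very first character, so a leading space defeats it.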

Ignoring case & other delimiters
Append an i to the regex to ignore case:
    print "Proceed (y/Y)? ";     # produce prompt
    chomp($name = <STDIN>);      # chomp input
    if ($name =~ /^y/i) {        # accept either case
        print "Proceeding.\n";   # display decision
    }
To use a different delimiter:
– Write m followed by the new delimiter character in place of the slashes (e.g. #)
    print "Proceed (y/Y)? ";     # produce prompt
    chomp($name = <STDIN>);      # chomp input
    if ($name =~ m#^y#i) {       # new # delimiter
        print "Proceeding.\n";   # display decision
    }
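Alternative delimiters are most useful when the pattern itself contains slashes, as in file paths. This sketch compares an escaped pattern with the m{ } and m# # forms. The path is only sample data.

```perl
use strict;
use warnings;

my $path = "/usr/bin/perl";

# With slashes as delimiters, every / in the pattern must be escaped.
my $slashes = ($path =~ /^\/usr\/bin\//) ? 1 : 0;
# m{...} avoids the escaping; the i modifier ignores case.
my $braces  = ($path =~ m{^/USR/bin/}i) ? 1 : 0;
# m#...# works the same way with # as the delimiter.
my $hashes  = ($path =~ m#^/usr/bin/#) ? 1 : 0;

print "$slashes $braces $hashes\n";   # 1 1 1
```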

Variable Interpolation
A regex can be constructed from computed strings rather than literals:
    $sentence = "Every good bird does fly.";
    print "What should I look for? ";     # prompt
    $what = <STDIN>;                      # read keyboard
    chomp($what);                         # chomp input
    if ($sentence =~ /$what/) {           # $what is used as a regex, so
                                          # input like [bw]ird matches bird
        print "I saw $what in $sentence.\n";
    } else {
        print "Nope... didn't find it.\n";
    }
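Interpolated text is treated as regex syntax, so user input such as [bw]ird acts as a pattern. This sketch contrasts that behavior with \Q...\E (Perl's quotemeta escape, not covered on the slide), which makes the input match literally.

```perl
use strict;
use warnings;

my $sentence = "Every good bird does fly.";

# Interpolated text is treated as a regex: [bw]ird is a character class.
my $what = "[bw]ird";
my $as_regex = ($sentence =~ /$what/) ? 1 : 0;         # 1: matches "bird"

# \Q...\E escapes metacharacters so the text must appear literally.
my $as_literal = ($sentence =~ /\Q$what\E/) ? 1 : 0;   # 0: no "[bw]ird" in the text

# Literal matching matters for input containing metacharacters like "."
my $dot = "fly.";
my $found_dot = ($sentence =~ /\Q$dot\E/) ? 1 : 0;     # 1
```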

Special Read-only Variables
After a successful pattern match, $1, $2, $3, ... hold the same values that \1, \2, \3, ... held inside the regex. These read-only variables can be used in later parts of the program:
    $_ = "This is a test";
    /(\w+)\W+(\w+)/;     # match first two words
                         # $1 is now "This" and
                         # $2 is now "is"
    ($first, $second) = /(\w+)\W+(\w+)/;
                         # $first is now "This" and $second is now "is"
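The usual way to use $1 and $2 safely is to read them only after testing that the match succeeded, because a failed match does not reset them. In this sketch, the second sample string is made up to force a failure.

```perl
use strict;
use warnings;

my @lines = ("This is a test", "!!!");
my @report;

for my $line (@lines) {
    # Only trust $1 and $2 inside the branch where the match succeeded.
    if ($line =~ /(\w+)\W+(\w+)/) {
        push @report, "first=$1 second=$2";
    } else {
        push @report, "no two words in '$line'";
    }
}
print "$_\n" for @report;
```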

More Read-only Variables
– $& is the part of the string that matched the regex
– $` is the part of the string before the matching part
– $' is the part of the string after the matching part
    $_ = "This is a sample string";
    /sa.*le/;     # matches "sample"
                  # $` is now "This is a "
                  # $& is now "sample"
                  # $' is now " string"
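This sketch checks the slide's $`, $&, and $' values, then obtains the same three pieces with capture groups. The capture form is worth knowing because, in Perl versions before 5.20, merely mentioning $`, $&, or $' anywhere in a program slowed down every regex match in it.

```perl
use strict;
use warnings;

$_ = "This is a sample string";

my ($before, $matched, $after);
if (/sa.*le/) {
    ($before, $matched, $after) = ($`, $&, $');
    print "before:  '$before'\n";    # 'This is a '
    print "matched: '$matched'\n";   # 'sample'
    print "after:   '$after'\n";     # ' string'
}

# The same three pieces using capture groups instead of the special variables.
my ($pre, $mid, $post) = /^(.*?)(sa.*le)(.*)$/;
```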

Substitutions
– Use the substitution operator: s/regex/new-string/
– The replacement string is variable-interpolated
– The regex can use any of the pattern features above, and the replacement can use the special read-only variables ($1, $&, ...)
– The i (ignore case) modifier and custom delimiters work as they do for matches
– The pattern binding =~ operator applies s/// to a variable other than $_
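This sketch shows several substitution forms on made-up strings. The (my $copy = $original) =~ s/// idiom copies the string first so the original is left unchanged. The g modifier, which replaces every match rather than just the first, does not appear on the slide.

```perl
use strict;
use warnings;

my $line = "Hello world, hello Perl";

(my $first = $line) =~ s/hello/Goodbye/i;    # i: ignore case; first match only
(my $all   = $line) =~ s/hello/Goodbye/gi;   # g: replace every match

my $name = "Randal";
(my $greeting = "Hi, NAME!") =~ s/NAME/$name/;   # replacement is interpolated

# Captures can be used in the replacement; here { } are the delimiters.
(my $swapped = "Fred Flintstone") =~ s{(\w+) (\w+)}{$2, $1};

print "$first\n$all\n$greeting\n$swapped\n";
```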

Split Function
The split function splits a string into fields separated by a regex:
    $line = "merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl";
    @fields = split(/:/, $line);   # split $line using
                                   # : as delimiter
    # @fields is now
    # ("merlyn", "", "118", "10", "Randal", "/home/merlyn",
    #  "/usr/bin/perl")
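Because the split delimiter is a regex rather than a fixed character, it can describe runs of separators. Here is a sketch on the same password line:

```perl
use strict;
use warnings;

my $line = "merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl";

# One colon per separator: the empty password field is kept.
my @fields = split(/:/, $line);
# /:+/ treats a run of colons as ONE separator, so the empty field disappears.
my @squeezed = split(/:+/, $line);

print scalar(@fields), " fields vs ", scalar(@squeezed), " fields\n";   # 7 fields vs 6 fields
```

Squeezing like this is only appropriate when empty fields carry no meaning. In a password line, the empty second field is significant.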

Splitting in list context
    $line = "merlyn::118:10:Randal:/home/merlyn:";
    ($name, $password, $uid, $gid, $gcos, $home, $shell)
        = split(/:/, $line);   # split $line using : as delimiter
    # $name is now "merlyn",
    # $password is now "",
    # $uid is now "118",
    # $gid is now "10",
    # $gcos is now "Randal",
    # $home is now "/home/merlyn",
    # $shell is now undef (split drops trailing empty fields)
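The undef in $shell comes from split discarding trailing empty fields. This sketch shows that behavior alongside Perl's negative LIMIT argument, which keeps those fields.

```perl
use strict;
use warnings;

my $line = "merlyn::118:10:Randal:/home/merlyn:";

my @dropped = split(/:/, $line);       # trailing empty field removed
my @kept    = split(/:/, $line, -1);   # negative limit keeps it

print scalar(@dropped), " vs ", scalar(@kept), "\n";   # 6 vs 7
```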

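The "Default" Split
    $_ = "some string";
    @words = split;   # like @words = split(/\s+/, $_);
                      # where \s+ means 1 or more whitespace characters
    # @words is now ("some", "string")
Unlike split(/\s+/, $_), a split with no arguments also ignores leading whitespace instead of producing an empty first field.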
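The default split and split(/\s+/, $_) differ only when the string has leading whitespace. This sketch illustrates the difference on a padded sample string.

```perl
use strict;
use warnings;

$_ = "   some string  ";

my @default = split;              # leading whitespace is ignored
my @regex   = split(/\s+/, $_);   # leading whitespace yields an empty first field

print scalar(@default), " vs ", scalar(@regex), "\n";   # 2 vs 3
```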

Join Function
The join function joins a list of values, placing a glue string between the list elements. The $line from the split example can be reconstructed from @fields using:
    $line = join(":", @fields);   # the glue string ":" is not a regex
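This round-trip sketch splits the password line on the regex /:/, then joins it back together with the plain string ":".

```perl
use strict;
use warnings;

my $line   = "merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl";
my @fields = split(/:/, $line);

# join puts the plain string ":" (not a regex) between elements.
my $rebuilt = join(":", @fields);
print $rebuilt eq $line ? "round trip ok\n" : "changed\n";

# The glue can be any string, including several characters.
my $pretty = join(" | ", @fields[0, 2, 3]);   # "merlyn | 118 | 10"
```

The round trip is exact here only because the line has no trailing empty fields. For a line ending in ":", you would need split(/:/, $line, -1) to keep them.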

Glue Ahead & Trailing Glue
    $_ = "some string";                 # initialize default
    @words = split;                     # perform default split
    print "@words\n";                   # show split result
    $result = join("+", "", @words);    # glue ahead
    print "$result\n";                  # $result is "+some+string"
    $output = join("\n", @words, "");   # trailing glue
    print $output;                      # $output is "some\nstring\n"
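The glue-ahead and trailing-glue tricks work because join places glue only between elements. An empty string at either end of the list therefore produces glue at that end. This sketch stores the results so they can be checked:

```perl
use strict;
use warnings;

my @words = split(' ', "some string");

my $result = join("+", "", @words);       # empty first element: glue ahead
my $output = join("\n", @words, "");      # empty last element: trailing glue
my $both   = join(",", "", @words, "");   # glue at both ends

print "$result\n";   # +some+string
print $output;       # some, then string, each on its own line
print "$both\n";     # ,some,string,
```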