Programming in Perl: regular expressions and the m and s operators. Peter Verhás, January 2002.

Pattern Matching Operator
    expression =~ m/regexp/options;
    $a = "apple";
    print "yes!" if $a =~ m/pp/;
In scalar context the result is true (1) or false (an empty string, which counts as 0 in numeric context).

M operator options
    g  global search
    i  case insensitive search
    m  multi-line string
    s  single line string
    o  evaluate once only
    x  extended regular expression
Now let’s see what a regular expression is; then we will return to the finer points of the m operator.

Regular Expressions
A regular expression is a string with joker characters and joker expressions. We will look at examples to explain it.

Regular Expression to Verify
    = ( );
    ){
      if( ){
        print "$_ seems to be a good \n";
      }else{
        print "$_ bad address\n";
      }
OUTPUT:
    seems to be a good
    bad address
NOTES:
    $_ is used as default
    m/ is default when / is used
    $_ would also work instead of but is safe

Regular Expression to Verify (2)
    ^    at the start of the string
    .*   zero or more any-character
         – * means zero or more of what stands before
         a character
    \w+  one or more word characters (letters, digits or underscore)
         – + means one or more of what stands before
    \.   one . (dot) character
         – special regexp character is escaped with \
    .+   one or more any character
    $    until end of string
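The pieces listed above can be tried out in a small sketch. The pattern here is an assumption: it joins the listed pieces in order and puts a literal @ between .* and \w+. The two sample addresses are hypothetical, not the ones from the slide.

```perl
use strict;
use warnings;

# Assumed pattern: the pieces from the slide joined together, with a
# literal @ (escaped so that Perl does not try to interpolate an array).
my $pattern = qr/^.*\@\w+\..+$/;

# Hypothetical sample addresses, not the ones used on the original slide.
my @candidates = ('user@example.com', 'user&example.com');

my %verdict;
for my $address (@candidates) {
    $verdict{$address} = ($address =~ $pattern) ? 'good' : 'bad';
    print "$address seems to be a $verdict{$address} address\n";
}
```

A pattern this loose only checks the rough shape of an address; it is a teaching example rather than a real validator.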

Search and Replace Example of Regular Expressions
    $text = 'JavaScript is not used on island Java.';
    $text =~ s/Java(?!Script)/Borneo/;
    print $text;
OUTPUT:
    JavaScript is not used on island Borneo.
NOTES:
    Operator s will be discussed later in detail
    (?! ) is a zero-length negative look forward, detailed later

Meta (joker) Character
    .   any character but new line
    ^   start of string
    $   end of string
    \   escaping the next character
    \w  any word character (letter, digit or underscore)
    \W  any non-word character
    \s  any white space
    \S  any non-white space
These are only examples; there are other meta characters, see the Perl manual.
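A minimal sketch of how these meta characters behave on one short string. The sample line and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample line, ending in a newline.
my $line = "id_42 = ok\n";

# \w+ takes the leading word characters, \s the white space after them.
my ($word) = $line =~ /^(\w+)\s/;

# . matches any character except the newline, so .* stops before "\n".
my ($rest) = $line =~ /=\s(.*)/;

# \W is the complement of \w; this finds the first non-word character.
my ($non_word) = $line =~ /(\W)/;

# \. is an escaped dot and matches only a literal dot.
my $has_dot = ($line =~ /\./) ? 1 : 0;

print "word=[$word] rest=[$rest] non_word=[$non_word] has_dot=$has_dot\n";
```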

Parentheses (1)
    $text = 'Hook is not used on island Java.';
    $text =~ /(Ho(ok))\s(is?).*\3((l|s)(a|l))/;
    print "$1 $2 $3 $4 $5 $6\n";
    #
    $text = 'Hook i not used on island Java.';
    $text =~ /(Ho(ok))\s(is?).*\3((l|s)(a|l))/;
    print "$1 $2 $3 $4 $5 $6\n";
OUTPUT:
    Hook ok is la l a
    Hook ok i sl s l
NOTES:
    Numbering is in the order of the opening parentheses

Parentheses without $n
    $text = 'Hook is not used on island Java.';
    $text =~ /(Ho(ok))\s(is?).*\3((?:l|s)(a|l))/;
    print "$1 $2 $3 $4 $5.$6.\n";
    $text = 'Hook i not used on island Java.';
    $text =~ /(Ho(ok))\s(is?).*\3((?:l|s)(a|l))/;
    print "$1 $2 $3 $4 $5.$6.\n";
OUTPUT:
    Hook ok is la a..
    Hook ok i sl l..
NOTES:
    (?: ) groups a sub-expression without creating a reference
    $6 is not set here and prints as an empty string

Character classes
    List of characters between [ and ]
    Interval, e.g. [a-f]
    Negative character set, e.g. [^a-f]
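A short sketch of the three forms of character class; the sample string is hypothetical.

```perl
use strict;
use warnings;

my $text = 'cafe 2002';

# Interval: the first run of letters between a and f.
my ($in_range) = $text =~ /([a-f]+)/;

# Negative set: the first run of characters that are NOT between a and f.
my ($not_range) = $text =~ /([^a-f]+)/;

# Plain list: the first run made up only of the characters 0 and 2.
my ($listed) = $text =~ /([02]+)/;

print "[$in_range] [$not_range] [$listed]\n";
```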

Repetitions
    *      zero or more times
    +      one or more times
    ?      zero or one time
    {n}    exactly n times
    {n,}   at least n times
    {n,m}  at least n times, at most m times
NOTES:
    There is {n,} but there is not {,m}. Why? (hint: {0,m} works, but {n, ??? } ??)
    (Perl 5.34 and later also accept {,m} as a synonym for {0,m}.)
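The quantifiers can be compared side by side with a sketch like the following. The sample strings and the anchored patterns are chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical samples: zero to four 'a' characters followed by 'b'.
my @samples = ('b', 'ab', 'aab', 'aaab', 'aaaab');

# One pattern per quantifier from the list above.
my @patterns = ('a*b', 'a+b', 'a?b', 'a{2}b', 'a{2,}b', 'a{1,3}b');

my %matched;
for my $p (@patterns) {
    # Anchor with ^ and $ so that the whole sample has to match.
    $matched{$p} = join ' ', grep { /^$p$/ } @samples;
    printf "%-8s %s\n", $p, $matched{$p};
}
```

Each output line lists the samples that a given pattern accepts, which makes the lower and upper bounds of every quantifier visible.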

Greedy repetition
Repetitions are greedy: they eat as many characters as possible. Appending ? to a repetition makes it match as few characters as possible.
    $text = 'Hook is not used on island Java.';
    $text =~ /(.*)is/;      #1
    print "$1.\n";
    $text =~ /(.*?)is/;     #2
    print "$1.\n";
    $text =~ /(.*?)is.*n/;  #3
    print "$1.\n";
OUTPUT:
    Hook is not used on .
    Hook .
    Hook .

Other extensions
    Other UNIX tools also use similar, simpler regular expressions
    Perl regular expressions are more powerful
    A list of some extensions follows on the next slides

Regular expression comment
    (?# comment comes here)
Use comments! Use comments! Use comments! Use comments! Use comments! Use comments!
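A minimal sketch of the comment construct in use. The date string and the variable names are hypothetical; \d (a digit) is standard Perl but does not appear in the meta character list of this presentation.

```perl
use strict;
use warnings;

my $date = 'January 2002';

# The (?# ...) groups are skipped by the regexp engine; only the parts
# outside them take part in the match. The comment text must not
# contain a closing parenthesis.
my ($month, $year) =
    $date =~ /(\w+)(?# month name)\s(\d{4})(?# four-digit year)/;

print "$month / $year\n";
```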

Regular Expression Parentheses
    (?: sub expression w/o $n)
    (?: we have already discussed it, as it came up in an example, but this is the proper place to discuss this construct.)

Positive look forward
    (?= subregexp)
    $t = 'jamaica rum rum kingston rum';
    $t =~ s/([aeoui])(?=\w)/uc($1)/ge;
    print $t;
OUTPUT:
    jAmAIca rUm rUm kIngstOn rUm
Example: convert to upper case every vowel that is followed by another word character.

Negative look forward
    (?! subregexp)
    $t = 'jamaica rum rum kingston rum';
    $t =~ s/([aeoui])(?!\w)/uc($1)/ge;
    print $t;
OUTPUT:
    jamaicA rum rum kingston rum
Example: convert to upper case every vowel standing at the end of a word.

Option change inside the regular expression
    (?imsx)
This can be used inside the m/ or s/ operator. Only the i, m, s and x options can be switched this way; options such as g cannot.
Now we go back to operator m/ and discuss some details.
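A small sketch of the inline option, assuming the sample sentence used elsewhere in this presentation. The variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = 'Hook is not used on island Java.';

# (?i) switches on case-insensitive matching inside the pattern.
my $inline = ($text =~ /(?i)hook/) ? 1 : 0;

# The same effect with the option after the closing delimiter.
my $trailing = ($text =~ /hook/i) ? 1 : 0;

# Without either, the lower-case pattern does not match "Hook".
my $plain = ($text =~ /hook/) ? 1 : 0;

# The inline form travels with the pattern when it is kept in a string.
my $from_string = '(?i)JAVA';
my $embedded = ($text =~ /$from_string/) ? 1 : 0;

print "inline=$inline trailing=$trailing plain=$plain embedded=$embedded\n";
```

The inline form is mostly useful when the pattern arrives as data (for example from a configuration file) and the code that applies it cannot add options of its own.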

M operator array
    @k = "abbabaa" =~ m/(bb).+(a.)/;
    print $#k;
    print ' ',$k[0],' ',$k[1],"\n";
OUTPUT:
    1 bb aa
NOTES:
    Parts of the expression are enclosed in ( )
    $1, $2... are the default variables where the substrings are put
    In list context the match returns the same substrings as a list

M operator option g
    @k = "abbabaa" =~ m/(b)(a)/g;
    print $#k,' ',$k[0],' ',$k[1],' ',$k[2],' ',$k[3],"\n";
OUTPUT:
    3 b a b a
NOTES:
    $_ is used as default
    m/ is default when / is would also work instead of but it is safe

M operator option g (2)
    $t = "abbabaa";
    while( $t =~ m/(ab)(b|a)/g ){
      print pos($t)," $1 $2\n";
    }
OUTPUT:
    3 ab b
    6 ab a

M operator option i
Case insensitive match
    print '.',"apple" =~ /AppLe/,".\n";
    print '.',"apple" =~ /AppLe/i,".\n";
OUTPUT:
    ..
    .1.

M operator options m and s
    $t = "mah\na\nb";
    while( $t =~ /(.?.)$/mg ){ print '.',$1; } print ".\n";
    while( $t =~ /(.?.)$/sg ){ print '.',$1; } print ".\n";
    while( $t =~ /(.?.)$/g ){ print '.',$1; } print ".\n";
OUTPUT:
    .ah.a.b.
    .
    b.
    .b.
NOTES:
    m lets $ match before every \n in the string
    s lets . match \n (otherwise . is any character but \n)

M operator option o
Interpolate and compile the regular expression only once to save processor time
    $t = "al brab"; $a = 'al'; $b = 'rab';
    &q;&p;
    $b = 'fe';
    &q;&p;
    sub q { print ' q',$t =~ /$a\sb$b/o }
    sub p { print ' p',$t =~ /$a\sb$b/ }
OUTPUT:
     q1 p1 q1 p

M operator option x
    @k = "abbabaa" =~ m/(bb)  #exactly two 'b' get into $1
                        .+    #one or more any-character
                        (a.)  #a letter 'a' and exactly one any-character
                        /x;   #space and comment allowed
    print $#k;
    print ' ',$k[0],' ',$k[1],"\n";
OUTPUT:
    1 bb aa
This option allows spaces and comments inside the expression to ease readability (an escaped space, "\ ", still matches a space).

Operator s
    $text =~ s/regexp/replace/egimosx
Options:
    – e replace is interpreted as expression
    – g global search and replace
    – i case insensitive search
    – m string is treated as multi-line
    – o regular expression is evaluated only once
    – s string is treated as single-line
    – x extended syntax for the regexp
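A minimal sketch of the e, g and i options of operator s working together. The sample string and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $text = 'apple 3, Apple 4';

# e: the replacement is Perl code; g: apply it to every match.
(my $doubled = $text) =~ s/(\d+)/$1 * 2/ge;

# i: ignore case; g: replace every occurrence.
(my $pears = $text) =~ s/apple/pear/gi;

# Without g only the first match is replaced.
(my $first_only = $text) =~ s/apple/pear/i;

print "$doubled\n$pears\n$first_only\n";
```

The `(my $copy = $text) =~ s/.../.../` form copies the string first, so the original `$text` stays unchanged between the three substitutions.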

Global Search and Replace
    $t = "abbab";
    $t =~ s/ab/aa/g;
    print $t;
OUTPUT:
    aabaa
Replaces all occurrences of the search regular expression with the replacement string.

m and s operators with different delimiters
    / is the default, but you can use ' to have a non-interpolated string
    Other non-alphanumeric characters can also be used
    () {} [] are used as matching character pairs
    – In this case s{search}{replace}
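The delimiter rules above can be sketched as follows. The strings and variable names are made up for illustration and are not taken from the original example slide.

```perl
use strict;
use warnings;

my $island = 'Java';

# Default delimiter: the replacement is interpolated like a "..." string.
(my $interpolated = 'Hook on X') =~ s/X/$island/;

# Single quotes as delimiter: nothing is interpolated on either side,
# so the replacement is the literal text $island.
(my $literal = 'Hook on X') =~ s'X'$island';

# Matching pairs: pattern and replacement each get their own pair,
# which avoids escaping the slashes in a path.
(my $paired = '/usr/local/bin') =~ s{/usr/local}{/opt};

# Any other non-alphanumeric character also works, for example #.
my $hash_delim = ('/usr/local/bin' =~ m#^/usr#) ? 1 : 0;

print "$interpolated\n$literal\n$paired\n$hash_delim\n";
```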

m and s operators with different delimiters example
    $text = ('bba'); $text =~ print "$text\n"; $text = $text =~ print "$text\n";
OUTPUT: is evaluated in the first search but not in the second

Thank you for your kind attention.