Scripting Languages Chapter 8 More About Regular Expressions.

Character Classes
• A character class is a list of possible characters inside square brackets ([ ]).
• It matches exactly one character, which may be any of the ones listed.

Square Brackets
• Square brackets are similar to the period, except that the match is limited to the characters listed inside them.
• Example: /h[aeiou]t/ matches hat, het, hit, hot, and hut, but not ht or hrt.
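As a quick check of the /h[aeiou]t/ example, here is a minimal Perl sketch. The sub name vowel_matches and the word list are illustrative assumptions, not part of the slides.

```perl
use strict;
use warnings;

# vowel_matches is a hypothetical helper: it keeps the words that
# contain an h, one character from the class [aeiou], and then a t.
sub vowel_matches {
    return grep { /h[aeiou]t/ } @_;
}

my @hits = vowel_matches(qw(hat het hit hot hut ht hrt));
print "@hits\n";    # prints: hat het hit hot hut
```

Because the pattern is not anchored, a longer word such as "that" would also be kept, since it contains "hat".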

Shortcuts
• Character classes are used so often that shortcuts help.
• A dash (-) inside the brackets specifies a range:
    [0-9] is the same as [0123456789]
    [a-z] is the same as [abcdefghijklmnopqrstuvwxyz]
    [a-fA-F] is the same as [abcdefABCDEF]
• /[a-z][0-9][A-Z]/ matches a string that contains a lowercase letter, followed by a digit, followed by an uppercase letter.
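The range pattern from this slide can be tried out with a short sketch. The helper name lower_digit_upper and the test strings are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the string contains a lowercase letter,
# a digit, and an uppercase letter, side by side and in that order.
sub lower_digit_upper {
    my ($s) = @_;
    return $s =~ /[a-z][0-9][A-Z]/ ? 1 : 0;
}

for my $s ("a1B", "xxa1Bxx", "A1b", "ab1") {
    printf "%-8s %s\n", $s, lower_digit_upper($s) ? "match" : "no match";
}
# a1B and xxa1Bxx match; A1b and ab1 do not.
```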

( - ) Limitations
• The dash has this special meaning only inside square brackets, where it specifies a range.
• Outside the brackets it is an ordinary character: /-[0-9]/ matches any string that contains a dash followed by a digit.
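A small sketch of the /-[0-9]/ pattern, where the first dash is a literal character and the second one marks a range. The helper name and sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true when a literal dash is directly followed by a digit.
sub dash_then_digit {
    my ($s) = @_;
    return $s =~ /-[0-9]/ ? 1 : 0;
}

for my $s ("room-7", "room7", "7-room") {
    printf "%-7s %s\n", $s, dash_then_digit($s) ? "match" : "no match";
}
# Only room-7 matches: room7 has no dash, and in 7-room the dash is followed by a letter.
```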

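Caret Character
• The caret has a special meaning when it is the first character inside square brackets.
• It then represents every character except those that follow it in the class.
• This program prints the lines that contain at least one non-digit character. (To print only the lines that contain no digit at all, use: print "$_\n" unless /[0-9]/;)

    while (<STDIN>) {
        chomp;
        print "$_\n" if /[^0-9]/;   # caret inside the brackets negates the class
    }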
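The position of the caret changes what a pattern tests, so it can help to compare the variants side by side. The sketch below is only an illustration; the helper digit_tests and its key names are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper contrasting three tests on one line of text.
sub digit_tests {
    my ($line) = @_;
    return (
        starts_with_digit => ($line =~ /^[0-9]/ ? 1 : 0),  # caret outside: anchor
        has_non_digit     => ($line =~ /[^0-9]/ ? 1 : 0),  # caret inside: negated class
        has_no_digit      => ($line !~ /[0-9]/  ? 1 : 0),  # no digit anywhere
    );
}

for my $line ("abc", "123", "a1") {
    my %t = digit_tests($line);
    print "$line: ", join(", ", map { "$_=$t{$_}" } sort keys %t), "\n";
}
```

Note that "a1" contains a non-digit and also contains a digit, so /[^0-9]/ alone does not mean "no digits".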

Multiple-character Matches
• Use a pair of curly braces to match multiple occurrences of a character.
• The numbers inside the braces apply to whatever is immediately to the left of the braces.
• /x[0-9]{2}x/ matches exactly two digits surrounded by the “x” character.

Examples
• /x[0-9]{2,5}x/ matches two to five digits surrounded by the “x” character.
• /a[0-9]{5,}s/ matches at least five digits preceded by an “a” and followed by an “s”.
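The three curly-brace patterns can be checked with a sketch like the following. The sub names and sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Each hypothetical sub wraps one of the curly-brace patterns.
sub exactly_two  { return $_[0] =~ /x[0-9]{2}x/   ? 1 : 0 }
sub two_to_five  { return $_[0] =~ /x[0-9]{2,5}x/ ? 1 : 0 }
sub five_or_more { return $_[0] =~ /a[0-9]{5,}s/  ? 1 : 0 }

printf "{2}:   x12x=%d x123x=%d\n",      exactly_two("x12x"),     exactly_two("x123x");
printf "{2,5}: x123x=%d x123456x=%d\n",  two_to_five("x123x"),    two_to_five("x123456x");
printf "{5,}:  a12345s=%d a1234s=%d\n",  five_or_more("a12345s"), five_or_more("a1234s"));
```

In each line the first string satisfies the count and the second does not, because the characters on both sides of the digits must also match.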

Multiple-character Patterns
• Some multiple-character patterns occur more often than others.
• Perl has special regular expression characters for these situations:
    ? matches zero or one of whatever is at its left
    * matches zero or more of whatever is at its left
    + matches one or more of whatever is at its left
• Remember that these symbols represent themselves when enclosed in square brackets.
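To see how ?, *, and + differ, the sketch below tries three made-up patterns (ab?c, ab*c, ab+c) on the same strings. The patterns, the helper quantifier_row, and the inputs are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: reports, in order, whether the string matches
# ab?c (zero or one b), ab*c (zero or more b), and ab+c (one or more b).
sub quantifier_row {
    my ($s) = @_;
    return join " ", map { $s =~ $_ ? "yes" : "no" } (qr/ab?c/, qr/ab*c/, qr/ab+c/);
}

for my $s ("ac", "abc", "abbc") {
    printf "%-5s %s\n", $s, quantifier_row($s);
}
# ac   -> yes yes no
# abc  -> yes yes yes
# abbc -> no yes yes

# Inside square brackets the same symbols are ordinary characters.
print "literal plus found\n" if "2+2" =~ /[+*?]/;
```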

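Example
• /[-+]?[0-9]+/ matches a run of optionally signed digits, that is, one or more digits possibly preceded by either a + or a -.
• Note: the two +'s have different meanings. The first, inside the square brackets, is a literal plus sign; the second means “one or more”.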

Other Special Symbols
• Remember (from last lecture) that | lets you test whether a string contains one of a set of alternatives. More examples:
    print if /10|15|19/;
    print if /1(0|5|9)/;
• Note: the parentheses in the second pattern group a set of alternatives, so both patterns match the same strings.
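A short sketch comparing an alternation written out in full with the grouped form 1(0|5|9). The sub names flat and grouped and the sample inputs are hypothetical.

```perl
use strict;
use warnings;

# Two hypothetical helpers: the same alternatives written two ways.
sub flat    { return $_[0] =~ /10|15|19/ ? 1 : 0 }
sub grouped { return $_[0] =~ /1(0|5|9)/ ? 1 : 0 }

for my $s (qw(10 15 19 12 a19b 1)) {
    printf "%-5s flat=%d grouped=%d\n", $s, flat($s), grouped($s);
}
# 10, 15, 19, and a19b match both patterns; 12 and 1 match neither.
```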

Other Shortcuts
• \w matches a word character (a-z, A-Z, 0-9, or _).
• \W matches a non-word character.
• \s matches a whitespace character (such as a blank, tab, or newline).
• \S matches a non-whitespace character.
• \d matches a digit character.
• \D matches a non-digit character.
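The following sketch shows which of the shortcut classes a single character falls into. The helper name kinds and its labels are assumptions; only plain ASCII input is considered here.

```perl
use strict;
use warnings;

# Hypothetical helper: lists the shortcut classes that a character matches.
sub kinds {
    my ($ch) = @_;
    my @k;
    push @k, "word"       if $ch =~ /\w/;
    push @k, "digit"      if $ch =~ /\d/;
    push @k, "whitespace" if $ch =~ /\s/;
    return @k ? join("+", @k) : "none";
}

for my $ch ("a", "7", "_", "\t", "-") {
    my $shown = $ch eq "\t" ? "tab" : $ch;
    printf "%-4s %s\n", $shown, kinds($ch);
}
# a -> word, 7 -> word+digit, _ -> word, tab -> whitespace, - -> none
```

The "7" row is the one worth noticing: a digit counts as a word character as well.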

More About Anchoring Patterns
• Earlier we saw that this expression matches a run of optionally signed digits: /[-+]?[0-9]+/
• Each of these strings matches the pattern: -256hello, hello+256, 1yes, the2ndone
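To see which part of each string the unanchored pattern actually picks up, one can wrap it in capturing parentheses and print $1. This is a sketch only; the helper first_number is hypothetical, and capturing with $1 goes slightly beyond what the slide itself covers.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first optionally signed run of digits
# found anywhere in the string, or undef when there is none.
sub first_number {
    my ($s) = @_;
    return $s =~ /([-+]?[0-9]+)/ ? $1 : undef;
}

for my $s ("-256hello", "hello+256", "the2ndone") {
    printf "%-10s matched on '%s'\n", $s, first_number($s);
}
# -256hello -> -256, hello+256 -> +256, the2ndone -> 2
```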

• If you wanted to match a string that contains only digits, then the pattern on the previous slide is probably not what you intended. For example, if you asked the user for a number, you would expect a response made up only of digits with an optional sign, yet the pattern also accepts input such as the2ndone.
• To solve this, we need to anchor the match to certain boundaries.

The Caret and Dollar Sign, Again
• Remember that the caret (outside square brackets) lets you match a pattern only at the beginning of a string.
• The $ lets you match a pattern only at the end of a string.
• Note: $ also matches just before a newline that ends the string; a newline anywhere else must be matched explicitly with \n.
• The \b sequence lets you match a string at a word boundary.

Examples
    /^this/      # at the beginning of the string
    /this$/      # at the end of the string
    /this/       # anywhere in the string
    /\bthis\b/   # only as a whole word
    /^this$/     # only if the string is exactly "this"
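The five anchored variants can be compared on a few inputs with a sketch like this one. The helper anchor_report, its labels, and the sample strings are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: names every anchored variant that the string matches.
sub anchor_report {
    my ($s) = @_;
    my @hits;
    push @hits, "start" if $s =~ /^this/;      # at the beginning
    push @hits, "end"   if $s =~ /this$/;      # at the end
    push @hits, "any"   if $s =~ /this/;       # anywhere
    push @hits, "word"  if $s =~ /\bthis\b/;   # as a whole word
    push @hits, "exact" if $s =~ /^this$/;     # the entire string
    return join(",", @hits);
}

for my $s ("this", "thistle", "do this", "not this one") {
    printf "%-13s %s\n", $s, anchor_report($s);
}
# this          start,end,any,word,exact
# thistle       start,any
# do this       end,any,word
# not this one  any,word
```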

This code asks the user for an integer and then checks the input with a regular expression. If the input is not an integer, the program asks the user to re-enter it. Finally, the number of attempts needed for a correct match is printed.

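    #!/usr/bin/perl
    print "Enter a number: ";
    $count = 1;                          # number of attempts so far
    while (1) {
        defined($_ = <STDIN>) or die "no more input\n";   # read one line
        chomp;
        last if /^[-+]?[0-9]+$/;         # stop once the input is an integer
        print "$_ is not a number, re-enter: ";
        $count++;
    }
    print "$count tries to enter a number\n";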
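The same validation logic can be exercised without typing at the keyboard by feeding it a list of answers. This is a sketch under that assumption; the sub count_tries is a hypothetical stand-in for the interactive loop, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: given the answers a user would type, in order,
# return how many tries it took to enter a valid integer.
sub count_tries {
    my @answers = @_;
    my $count = 0;
    for my $answer (@answers) {
        $count++;
        return $count if $answer =~ /^[-+]?[0-9]+$/;   # anchored at both ends
    }
    return;    # no valid integer among the answers
}

printf "%d tries to enter a number\n", count_tries("abc", "12x", "-42");
# prints: 3 tries to enter a number
```

Because the pattern is anchored with ^ and $, an answer such as "12x" is rejected even though it contains digits.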

Exercises
• Get the previous script working in your account and make sure you understand its contents.
• Complete 2 & 3 Pages