Regular Expressions
What is this line all about?
while (!($search =~ /^\s*$/)) {
It's a string search just like before, but with a huge twist: a regular expression search.
– ^\s*$ is a regular expression that says "look for a line with nothing but white space" (or nothing at all), so the loop keeps running as long as $search is NOT blank
– Whitespace: space ( ), tab (\t), formfeed (\f), newline (\n), carriage return (\r)
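A minimal sketch of the loop above. It assumes the input arrives as a hypothetical array @input rather than being read from the user, and the trimming step and @seen list are illustrative additions. The loop condition is the slide's own test.

```perl
use strict;
use warnings;

# Hypothetical input standing in for lines typed by the user.
my @input = ("apple\n", "  banana \n", " \t\n", "cherry\n");

# Same condition as the slide's loop: keep going while the line is NOT blank.
# The index check only stops the loop if the input runs out first.
my @seen;
my $i = 0;
while ($i < @input && !($input[$i] =~ /^\s*$/)) {
    (my $word = $input[$i]) =~ s/^\s+|\s+$//g;   # trim surrounding whitespace
    push @seen, $word;
    $i++;
}
print join(",", @seen), "\n";   # prints "apple,banana"
```

The third line (" \t\n") holds only whitespace, so the loop stops there and "cherry" is never processed.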

Regular Expressions
A "convenient" way to describe patterns of characters
– Characters include "printable" (literal) characters and "meta" characters, which have a special meaning
Three primary concepts:
– Concatenation: adjacent characters in the search pattern must be adjacent in the data string
– Alternation: specify a choice of alternatives that can match at a given position
– Repetition: specify how many times a given character (or group) must match

Concatenation
if ($data =~ /abcdef/) { … }
The characters "abcdef" must appear in that order, next to each other, somewhere within the variable $data
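A short sketch of concatenation, run against a few hypothetical sample values for $data:

```perl
use strict;
use warnings;

# Hypothetical sample values for $data.
my @samples = ("xxabcdefyy", "abcdef", "abc def", "fedcba");

my @matched;
for my $data (@samples) {
    # Concatenation: a, b, c, d, e, f must be adjacent and in this order.
    if ($data =~ /abcdef/) {
        push @matched, $data;
    }
}
print join(", ", @matched), "\n";   # prints "xxabcdefyy, abcdef"
```

"abc def" fails because the space breaks the adjacency. "fedcba" has the right letters in the wrong order.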

Alternation
if ($data =~ /a(b|c|d|e)f/) { … }
The pattern "a(b|c|d|e)f" matches an 'a', followed by one of 'b', 'c', 'd' or 'e', followed by an 'f', anywhere within the variable $data
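A sketch of the alternation pattern applied to some hypothetical test strings. It also shows that the parentheses capture whichever alternative matched into $1.

```perl
use strict;
use warnings;

my %result;
# Hypothetical test strings for $data.
for my $data ("abf", "aef", "af", "agf", "xxacfxx") {
    if ($data =~ /a(b|c|d|e)f/) {
        # The parentheses also capture the alternative that matched into $1.
        $result{$data} = $1;
    }
    else {
        $result{$data} = "no match";
    }
}
printf "%-8s => %s\n", $_, $result{$_} for sort keys %result;
```

"af" fails because exactly one of b, c, d or e must sit between the 'a' and the 'f'. "agf" fails because 'g' is not one of the alternatives.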

Repetition
if ($data =~ /ab*f/) { … }
The pattern "ab*f" matches an 'a', followed by zero or more 'b's, followed by an 'f', anywhere within the variable $data
– * : zero or more instances of the previous character
– + : one or more instances of the previous character
– {n} : exactly n instances of the previous character
– {m,n} : between m and n instances (inclusive) of the previous character
– {n,} : n or more instances of the previous character
– ? : zero or one instance of the previous character
A quantifier applies to the single item just before it: a character, a character class, or a parenthesized group.
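A small sketch that compares each quantifier from the list above. It runs them on the same hypothetical strings, each with a different number of 'b's between 'a' and 'f'.

```perl
use strict;
use warnings;

# Hypothetical inputs with 0, 1, 2 and 3 b's.
my @strings = ("af", "abf", "abbf", "abbbf");
my %patterns = (
    'ab*f'     => qr/ab*f/,      # zero or more b
    'ab+f'     => qr/ab+f/,      # one or more b
    'ab?f'     => qr/ab?f/,      # zero or one b
    'ab{2}f'   => qr/ab{2}f/,    # exactly two b
    'ab{1,2}f' => qr/ab{1,2}f/,  # one or two b
    'ab{2,}f'  => qr/ab{2,}f/,   # two or more b
);

my %matches;
for my $name (sort keys %patterns) {
    my $re = $patterns{$name};
    # Keep, in order, the strings this pattern matches.
    $matches{$name} = join ",", grep { $_ =~ $re } @strings;
    printf "%-9s %s\n", $name, $matches{$name};
}
```

Because 'a' and 'f' bracket the b's, each pattern only matches when the count of b's falls in the quantifier's range.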

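Meta-characters
– \ : gives the following character a special meaning (e.g. \t is a tab) or, before a meta-character, makes it literal (e.g. \. matches a dot)
– Alternation (choice): |
– Grouping: within ( and )
– Character classes: within [ and ]
   – e.g. [A-Za-z] all upper and lower case letters
   – e.g. [abc] a or b or c, which matches the same strings as (a|b|c)
   – e.g. [^0-9] anything that is not a digit 0 through 9
– Match any: . (the dot) matches any single character except a newline
   – e.g. .* zero or more of any character (note that [.*] is a character class that matches only a literal '.' or '*')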
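A sketch of the meta-character rules above. It covers the difference between .* and [.*], the dot's treatment of newlines, and escaping with a backslash. The /s modifier is standard Perl but is not mentioned on the slide. The sample strings are hypothetical.

```perl
use strict;
use warnings;

# [.*] is a character class: it matches ONE character, a literal '.' or '*'.
my $class_abc = ("abc" =~ /[.*]/) ? 1 : 0;   # 0: "abc" has no '.' or '*'
my $class_dot = ("a.b" =~ /[.*]/) ? 1 : 0;   # 1: "a.b" contains a '.'

# .* means "any character, zero or more times".
my ($run) = "x123y" =~ /x(.*)y/;             # captures "123"

# The dot does not match a newline unless the /s modifier is added.
my $dot_plain = ("a\nb" =~ /a.b/)  ? 1 : 0;  # 0
my $dot_s     = ("a\nb" =~ /a.b/s) ? 1 : 0;  # 1

# A backslash turns a meta-character back into an ordinary character.
my $unescaped = ("3x14" =~ /^[0-9]+.[0-9]+$/)  ? 1 : 0;  # 1: '.' matched the 'x'
my $escaped   = ("3x14" =~ /^[0-9]+\.[0-9]+$/) ? 1 : 0;  # 0: needs a real '.'
my $decimal   = ("3.14" =~ /^[0-9]+\.[0-9]+$/) ? 1 : 0;  # 1

print "$class_abc $class_dot $run $dot_plain $dot_s $unescaped $escaped $decimal\n";
```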

Meta-characters
Beginning and end of a string
– ^ : what follows must start the string
– $ : what precedes must end the string (a single trailing newline is allowed)
– \^ matches a literal ^
– \$ matches a literal $
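A sketch of the anchors and their escaped forms, using hypothetical sample strings. Single quotes keep Perl from interpolating the $ in 'cost: $5'.

```perl
use strict;
use warnings;

my $starts    = ("hello world"   =~ /^hello/) ? 1 : 0;  # 1
my $not_start = ("say hello"     =~ /^hello/) ? 1 : 0;  # 0: "hello" is not at the start
my $ends      = ("hello world"   =~ /world$/) ? 1 : 0;  # 1
my $ends_nl   = ("hello world\n" =~ /world$/) ? 1 : 0;  # 1: $ allows one trailing newline

# Backslash-escaped, ^ and $ are ordinary characters.
my $dollar = ('cost: $5' =~ /\$5$/)  ? 1 : 0;  # 1
my $caret  = ('2^10'     =~ /2\^10/) ? 1 : 0;  # 1
my $anchor = ('2^10'     =~ /2^10/)  ? 1 : 0;  # 0: unescaped ^ demands the start of the string

print "$starts $not_start $ends $ends_nl $dollar $caret $anchor\n";
```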

Character Classes
Use square brackets to denote classes (sets) of characters to be matched; each class matches exactly one character
– [A-Z] match any single uppercase letter
– [a-z] match any single lowercase letter
– [0-9] match any single digit
– [A-Za-z0-9] match any single letter or digit
– [^0-9] match any single character that is NOT a digit
Note that there are no spaces inside these classes; a space inside the brackets would match a literal space.
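A sketch that applies each class from the list above to a hypothetical sample string. The /g modifier in list context returns every single-character match, which is an illustrative choice rather than something the slide uses.

```perl
use strict;
use warnings;

my $text = "Room 101, Floor B2";   # hypothetical sample string

# Each class matches exactly ONE character; with /g in list context
# the match returns every character that fits the class.
my @upper  = $text =~ /[A-Z]/g;         # R F B
my @digits = $text =~ /[0-9]/g;         # 1 0 1 2
my @alnum  = $text =~ /[A-Za-z0-9]/g;   # all 10 letters and 4 digits
my @other  = $text =~ /[^0-9]/g;        # everything except digits (incl. spaces and ',')

# A space inside the brackets is matched literally.
my @with_space = "a b" =~ /[a-z ]/g;    # 'a', ' ', 'b'

print "upper=@upper digits=@digits alnum=", scalar(@alnum),
      " other=", scalar(@other), " with_space=", scalar(@with_space), "\n";
```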

Matching
By default, quantifiers such as * are "greedy": they match the longest possible string that still lets the overall match succeed
– e.g. "hear ye hear ye" =~ /hear.*ye/ matches the entire string
To get the shortest (minimal) match, put a ? after the quantifier
– e.g. "hear ye hear ye" =~ /hear.*?ye/ matches only the first "hear ye"
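A sketch of the two examples above with the matched text captured so it can be printed. The /g variant at the end is an illustrative addition, not part of the slide.

```perl
use strict;
use warnings;

my $s = "hear ye hear ye";

# Greedy: .* first grabs the rest of the string, then backs off only as far
# as needed to find a final "ye".
my ($greedy) = $s =~ /(hear.*ye)/;     # "hear ye hear ye"

# Non-greedy: .*? grabs as little as possible, stopping at the first "ye".
my ($lazy) = $s =~ /(hear.*?ye)/;      # "hear ye"

# With /g, the non-greedy pattern finds each "hear ye" separately.
my @each = $s =~ /hear.*?ye/g;         # ("hear ye", "hear ye")

print "greedy: '$greedy'\nlazy:   '$lazy'\ncount:  ", scalar(@each), "\n";
```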