Regular Expression Original Notes by Song Guo

What Regular Expressions Are Exactly: Terminology. A regular expression is a pattern describing a certain amount of text. A "match" is the piece of text, or sequence of bytes or characters, that the regex processing software found to correspond to the pattern.
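To make the pattern/match distinction concrete, here is a minimal sketch using Python's `re` module. Python is an assumption on our part: the notes are not tied to any one language, and other flavors report matches through different APIs.

```python
import re

# The pattern is the regular expression itself.
pattern = re.compile(r"cat")
text = "About cats and dogs"

# search() returns a Match object describing the matched text, or None.
match = pattern.search(text)
if match:
    print(match.group())  # the match: the text the pattern corresponded to
    print(match.span())   # start and end offsets of that text in the string
```

Here the match is the three characters `cat` at offsets 6 to 9, inside the word "cats".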

Different Regular Expression Engines. A regular expression "engine" is a piece of software that can process regular expressions, trying to match the pattern to the given string. Usually, the engine is part of a larger application and you do not access the engine directly. Different regular expression engines are not fully compatible with each other, and it is not possible to describe every kind of engine and regular expression syntax (or "flavor") here.

Literal Characters. The most basic regular expression consists of a single literal character, e.g. a. Applied to “Jack is a boy”, it matches the a after the J. It can match the second a too, but it will only do so when you tell the regex engine to start searching through the string after the first match. Likewise, cat will match “cat” in “About cats and dogs”, but not “Cat” in “About Cats and dogs”, because regex engines are case-sensitive by default.
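A small sketch of literal matching, again assuming Python's `re` module: `search()` stops at the first match, while `finditer()` keeps restarting the search after each match, which is how the second a is found.

```python
import re

text = "Jack is a boy"

# search() reports only the first match: the a right after the J.
first = re.search(r"a", text)
print(first.start())

# finditer() restarts the search after each match, so it also finds the second a.
positions = [m.start() for m in re.finditer(r"a", text)]
print(positions)

# Literal matching is case-sensitive by default.
print(re.search(r"cat", "About cats and dogs") is not None)  # True
print(re.search(r"cat", "About Cats and dogs") is not None)  # False
```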

Special Characters. Metacharacters: there are 11 characters with special meanings: [, \, ^, $, ., |, ?, *, +, ( and ). To use one of them as a literal character, escape it with a backslash. For example, if you want to match 1+1=2, the correct regex is 1\+1=2; otherwise, the plus sign has its special meaning. Most regular expression flavors treat the brace { as a literal character, unless it is part of a repetition operator like {1,3}. All other characters should not be escaped with a backslash, because the backslash is itself a special character: a backslash in combination with a literal character can create a regex token with a special meaning. E.g. \d matches a single digit from 0 to 9. Non-Printable Characters: \r stands for carriage return (0x0D) and \n for line feed (0x0A). More exotic non-printables include \a (bell, 0x07).
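The escaping rule can be checked with a short sketch, assuming Python's `re` module. `re.escape()` is a real `re` function that inserts the backslashes for you; its exact output shown here assumes Python 3.7 or later, which escapes only characters that are special in a regex.

```python
import re

# Unescaped, + is a quantifier: 1+1=2 means "one or more 1s, then 1=2",
# which needs at least "11" in the subject and therefore finds nothing here.
print(re.search(r"1+1=2", "1+1=2"))  # None

# Escaped, \+ is a literal plus sign.
print(re.search(r"1\+1=2", "1+1=2").group())

# re.escape() adds the backslashes when the literal text comes from elsewhere.
escaped = re.escape("1+1=2")
print(escaped)

# A backslash plus a letter forms a token: \d is a digit, \r\n a CR LF pair.
# (In Python str patterns, \d also accepts non-ASCII Unicode digits.)
digits = re.findall(r"\d", "a1b2")
crlf = re.findall(r"\r\n", "x\r\ny")
print(digits, crlf)
```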

Character Classes or Character Sets. With a "character class", also called "character set", you can tell the regex engine to match only one out of several characters. For example, gr[ae]y uses the class [ae] to match either “gray” or “grey”, but not “graay”, “graey” or any such thing. Negated Character Classes: q[^u] does not mean "a q not followed by a u". It means "a q followed by a character that is not a u". It will not match the q in the string “Iraq”, because no character follows that q. It will match the q and the space after the q in “Iraq is a country”.
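A quick sketch of both kinds of class, assuming Python's `re` module; `fullmatch()` is used so that the whole word must fit the pattern.

```python
import re

# [ae] consumes exactly one character, which must be an a or an e.
words = ["gray", "grey", "graay", "graey"]
matching = [w for w in words if re.fullmatch(r"gr[ae]y", w)]
print(matching)

# q[^u] needs a real character after the q; it cannot match at the end of a string.
q_none = re.search(r"q[^u]", "Iraq")
q_match = re.search(r"q[^u]", "Iraq is a country")
print(q_none)             # None
print(repr(q_match.group()))  # the q plus the following space
```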

Metacharacters Inside Character Classes. The only special characters or metacharacters inside a character class are the closing bracket (]), the backslash (\), the caret (^) and the hyphen (-). The usual metacharacters are normal characters inside a character class and do not need to be escaped by a backslash: [+*] matches a plus sign or a star. The backslash itself must still be escaped: [\\x] matches a backslash or an x.
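The rules above can be tried out with a minimal sketch, assuming Python's `re` module. The last line relies on the common convention (which Python follows) that ^ is literal when it is not the first character of the class and - is literal when it is the last.

```python
import re

# Inside a class, + and * are ordinary characters.
ops = re.findall(r"[+*]", "2+3*4")
print(ops)

# The backslash stays special, so \\ inside the class means one literal backslash.
slashes = re.findall(r"[\\x]", "a\\xb")  # the subject is: a, backslash, x, b
print(slashes)

# ^ is only special right after [, and - is literal at the end of the class.
literals = re.findall(r"[x^-]", "a^b-x")
print(literals)
```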

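Shorthand Character Classes. Common sets built from ranges like [A-Z], [a-z] and [0-9] have shorthands: \d matches a digit ([0-9]), \w matches a word character (in ASCII terms, [A-Za-z0-9_]), and \s matches a whitespace character such as a space, tab or line break. Negated Shorthand Character Classes: \D is the same as [^\d], \W is short for [^\w], and \S is the equivalent of [^\s]. Repeating Character Classes: if you repeat a character class by using the ?, * or + operators, you repeat the entire character class, not just the character it matched, so [0-9]+ can match 837 as well as 222. The Dot Matches (Almost) Any Character: in most flavors, the dot . matches any single character except a line break.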
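A sketch of the shorthands, repetition and the dot, assuming Python's `re` module. Two Python-specific details are flagged in the comments: `re.DOTALL` lets the dot match line breaks, and in str patterns `\d` matches any Unicode decimal digit unless `re.ASCII` is given, which is one of the flavor differences mentioned earlier.

```python
import re

# \d+ collects runs of digits; \D+ collects runs of everything else.
digits = re.findall(r"\d+", "ab12cd345")
non_digits = re.findall(r"\D+", "ab12cd345")
print(digits, non_digits)

# The + repeats the whole class, so different digits are fine.
repeated = [bool(re.fullmatch(r"[0-9]+", s)) for s in ("837", "222")]
print(repeated)

# The dot skips the line break unless re.DOTALL is set (Python flag).
dot_default = re.findall(r".", "a\nb")
dot_all = re.findall(r".", "a\nb", re.DOTALL)
print(dot_default, dot_all)

# Python-specific: \d also matches U+0663 (ARABIC-INDIC DIGIT THREE) unless re.ASCII is used.
unicode_digit = bool(re.fullmatch(r"\d", "\u0663"))
ascii_digit = bool(re.fullmatch(r"\d", "\u0663", re.ASCII))
print(unicode_digit, ascii_digit)
```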

Anchors. The caret ^ matches the position before the first character in the string. Applying ^a to “abc” matches a; ^b does not match “abc” at all. The dollar sign $ matches right after the last character in the string: c$ matches the c in “abc”, while a$ does not match at all. ^\s+ matches leading whitespace and \s+$ matches trailing whitespace. ^\d+$ will not match “qsdf4ghjk”, because the string does not consist solely of digits, but \d+ will (it matches the 4).
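The anchor examples translate directly into a sketch using Python's `re` module (an assumption, as before). Note one Python detail: without `re.MULTILINE`, `$` also matches just before a single trailing newline, so the subjects below avoid trailing newlines.

```python
import re

starts_a = re.search(r"^a", "abc")
starts_b = re.search(r"^b", "abc")
ends_c = re.search(r"c$", "abc")
ends_a = re.search(r"a$", "abc")
print(starts_a.group(), starts_b, ends_c.group(), ends_a)

# Trim leading, then trailing, whitespace with the two anchored patterns.
s = "  padded text \t"
trimmed = re.sub(r"\s+$", "", re.sub(r"^\s+", "", s))
print(repr(trimmed))

# ^\d+$ demands that the whole string be digits; \d+ is satisfied by any digit run.
whole = re.search(r"^\d+$", "qsdf4ghjk")
partial = re.search(r"\d+", "qsdf4ghjk")
print(whole, partial.group())
```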

Word Boundaries. \b is an anchor like the caret and the dollar sign. It matches at a position that is called a "word boundary", so the match is zero-length. There are three different positions that qualify as word boundaries:
- before the first character in the string, if the first character is a word character;
- after the last character in the string, if the last character is a word character;
- between two characters in the string, where one is a word character and the other is not a word character.

Word Boundaries (continued). \bword\b matches word only as a whole word. \B is the negated version of \b: it matches at every position where \b does not.
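A sketch of \b and \B, assuming Python's `re` module. The first line lists where \b matches in "hi there": before the h (start of string), after the i, before the t, and after the final e (end of string), which covers all three kinds of boundary.

```python
import re

# \b is zero-length, so only the positions are interesting.
boundaries = [m.start() for m in re.finditer(r"\b", "hi there")]
print(boundaries)

# \bcat\b accepts "cat" only as a whole word: not inside "concat" or "cats".
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")
print(whole_words)

# \Bcat requires a non-boundary before the c, e.g. the cat inside "concat".
inner = [m.start() for m in re.finditer(r"\Bcat", "concat cat")]
print(inner)
```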

Alternation with the Vertical Bar or Pipe Symbol. If you want to search for the literal text cat or dog, separate both options with a vertical bar or pipe symbol: cat|dog. If you want more options, simply expand the list: cat|dog|mouse|fish.
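A minimal sketch of alternation, assuming Python's `re` module. The second part shows a point worth keeping in mind: the bar has the lowest precedence, so an anchor attached to one alternative applies only to that alternative. The non-capturing group `(?:...)` is Python syntax shared by many other flavors.

```python
import re

text = "The cat chased the dog; the fish watched."
animals = re.findall(r"cat|dog|mouse|fish", text)
print(animals)

# ^cat|dog means (^cat) or (dog), so "dog" may appear anywhere.
loose = re.search(r"^cat|dog", "hotdog")
# Grouping the alternatives makes the anchor apply to both.
grouped = re.search(r"^(?:cat|dog)", "hotdog")
print(loose.group(), grouped)
```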

Quantifiers: Repetition with Star and Plus. The star * repeats the preceding token zero or more times, and the plus + repeats it one or more times. Limiting Repetition: the syntax is {min,max}, so {0,} is the same as *, and {1,} is the same as +. Use \b[1-9][0-9]{3}\b to match a number between 1000 and 9999; \b[1-9][0-9]{2,4}\b matches a number between 100 and 99999.
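The quantifier equivalences and both number patterns can be checked with a short sketch, assuming Python's `re` module. The test strings include values just outside each range to show that the \b anchors reject partial numbers.

```python
import re

# {0,} behaves like *, and {1,} like +, on every sample string.
for s in ["", "a", "aaa"]:
    assert bool(re.fullmatch(r"a{0,}", s)) == bool(re.fullmatch(r"a*", s))
    assert bool(re.fullmatch(r"a{1,}", s)) == bool(re.fullmatch(r"a+", s))

four_digit = r"\b[1-9][0-9]{3}\b"       # 1000 to 9999
three_to_five = r"\b[1-9][0-9]{2,4}\b"  # 100 to 99999

four = re.findall(four_digit, "999 1000 9999 10000")
three_five = re.findall(three_to_five, "99 100 99999 100000")
print(four, three_five)
```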

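Watch Out for the Greediness! Suppose you try to match an HTML tag with <.+> in the string “This is a <EM>first</EM>”. You might expect it to match <EM>, but it matches “<EM>first</EM>”. The reason is that the plus is greedy: it repeats as many times as possible and gives characters back only when the rest of the regex cannot match otherwise. To make the greedy quantifier lazy, put a question mark behind the plus in the regex: <.+?> matches <EM> and, on the next attempt, </EM>.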
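Greedy versus lazy matching, sketched with Python's `re` module (assumed flavor). The last pattern is a commonly used alternative, not something the notes mention: a negated class that cannot run past a >, so no backtracking is needed.

```python
import re

html = "This is a <EM>first</EM>"

# Greedy: .+ runs to the end of the string, then backtracks to the last >.
greedy = re.search(r"<.+>", html).group()
print(greedy)

# Lazy: .+? stops at the first > that lets the match succeed.
lazy = re.findall(r"<.+?>", html)
print(lazy)

# Alternative: [^>]+ can never cross a >, so each tag is matched on its own.
negated = re.findall(r"<[^>]+>", html)
print(negated)
```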

Regex in Perl. There are three things you can do with regular expressions in Perl:
- match: m/ /
- substitute: s/ / /
- transliterate (translate): tr/ / /

    $v =~ s/a/b/;   # replace the first a in $v with b; returns true if a substitution was made
    $v =~ m/a/;     # returns true if $v contains an a
    $v !~ s/a/b/;   # the same substitution, but the result is negated: false if a substitution was made
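For comparison, here is a rough Python counterpart of the three Perl operations. This is a sketch under our own assumptions, not Perl semantics reproduced exactly: `re.subn()` with `count=1` stands in for `s///` without the `/g` modifier (its second return value plays the role of Perl's true/false result), and `str.translate()` with `str.maketrans()` stands in for `tr///`, which works character by character rather than with a regex.

```python
import re

v = "banana"

# m/a/ : does v contain an a?
contains_a = re.search(r"a", v) is not None

# s/a/b/ : replace only the first a; n counts the substitutions made.
new_v, n = re.subn(r"a", "b", v, count=1)

# tr/a/b/ : map every a to b, character by character.
transliterated = v.translate(str.maketrans("a", "b"))

print(contains_a, new_v, n > 0, transliterated)
```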