Regular Expressions (in Python)



Python or egrep: We will use Python. In some scripting languages you can call the command "grep" or "egrep":
    egrep pattern file.txt
E.g.
    egrep "^A" file.txt
will print all the lines of file.txt that start with (^) the letter A (capital A).
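For comparison, here is a minimal Python sketch of what the egrep "^A" command does. The helper name grep_lines and the sample lines are made up for illustration; a real script would read the lines from an open file instead.

```python
import re

def grep_lines(pattern, lines):
    """Return the lines in which the pattern is found, like egrep does."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Hypothetical sample data standing in for the contents of file.txt.
sample = ["Apple", "banana", "Avocado", "an Apple"]

for line in grep_lines(r"^A", sample):
    print(line)
```

Only "Apple" and "Avocado" are printed; "an Apple" contains a capital A but does not start with one.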

Regular expression (abbreviated regex or regexp): a search pattern, mainly for use in pattern matching with strings, i.e. "find and replace"-like operations. Each character in a regular expression is either understood to be a metacharacter with its special meaning, or a regular character with its literal meaning. We ask the question: does a given string match a certain pattern?

List of metacharacters:
    .
    +
    ?
    *
    ^
    $
    [...]
    [^...]
    |
    ()
    {m,n}

. (dot) Matches any single character. Many applications exclude newlines, and exactly which characters are considered newlines is flavor-, character-encoding-, and platform-specific, but it is safe to assume that the line feed character is included. Within POSIX bracket expressions, the dot character matches a literal dot. For example, a.c matches "abc", etc., but [a.c] matches only "a", ".", or "c".
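The difference between a.c and [a.c] can be checked in Python with a short sketch like the following. It uses re.fullmatch (Python 3.4+) so that the whole string has to match; the test strings are chosen only for illustration.

```python
import re

# Outside brackets the dot stands for any single character
# (by default, any character except a newline).
print(bool(re.fullmatch(r"a.c", "abc")))   # True
print(bool(re.fullmatch(r"a.c", "a.c")))   # True
print(bool(re.fullmatch(r"a.c", "ac")))    # False: the dot needs one character

# Inside brackets the dot is just a literal dot.
print(bool(re.fullmatch(r"[a.c]", ".")))   # True
print(bool(re.fullmatch(r"[a.c]", "b")))   # False
```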

Example: . (dot)
    import re
    string1 = "Hello, world."
    # Five dots match any five consecutive characters.
    if re.search(r".....", string1):
        print(string1 + " has length >= 5")

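Example: [.] is literally a dot
    import re
    string1 = "Hello, world."
    # Four arbitrary characters followed by a literal dot (not anchored to the end).
    if re.search(r"....[.]", string1):
        print(string1 + " has length >= 5 and contains a literal '.'")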

+ Matches the preceding element one or more times. For example, ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac".
    import re
    string1 = "Hello, world."
    if re.search(r"l+", string1):
        print('There are one or more consecutive letter "l"' + "'s in " + string1)

? Matches the preceding pattern element zero or one times.
    import re
    string1 = "Hello, world."
    if re.search(r"H.?e", string1):
        print("There is an 'H' and an 'e' separated by 0-1 characters (e.g. He, Hoe)")

* Matches the preceding element zero or more times. For example, ab*c matches "ac", "abc", "abbbc", etc. [xyz]* matches "", "x", "y", "z", "zx", "zyx", "xyzzy", and so on. (ab)* matches "", "ab", "abab", "ababab", and so on.
    import re
    string1 = "Hello, world."
    if re.search(r"e(ll)*o", string1):
        print("'e' followed by zero or more 'll' followed by 'o' (eo, ello, ellllo)")
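To see the three quantifiers side by side, the sketch below checks the same candidate strings against ab+c, ab?c and ab*c. The helper matching and the candidate list are illustrative only; re.fullmatch is used so that each candidate has to match as a whole.

```python
import re

# Strings to test against each quantifier pattern.
candidates = ["ac", "abc", "abbc"]

def matching(pattern):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

print(matching(r"ab+c"))   # ['abc', 'abbc']        one or more b
print(matching(r"ab?c"))   # ['ac', 'abc']          zero or one b
print(matching(r"ab*c"))   # ['ac', 'abc', 'abbc']  zero or more b
```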

^ Matches the beginning of a line or string.
    import re
    string1 = "Hello World"
    if re.search(r"^He", string1):
        print(string1, "starts with the characters 'He'")

$ Matches the end of a line or string.
    import re
    string1 = "Hello World"
    if re.search(r"rld$", string1):
        print(string1, "is a line or string that ends with 'rld'")
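Whether ^ and $ anchor to a line or to the whole string depends on the tool. In Python's re module they anchor to the whole string by default and to every line only when the re.MULTILINE flag is given. A small sketch, with a made-up three-line string:

```python
import re

text = "Hello World\nHe said hi\nGoodbye World"

# Default: ^ is the start of the string, $ is the end of the string.
print(re.findall(r"^He", text))                    # ['He']
print(re.findall(r"World$", text))                 # ['World']

# re.MULTILINE: ^ and $ also match at the start and end of each line.
print(re.findall(r"^He", text, re.MULTILINE))      # ['He', 'He']
print(re.findall(r"World$", text, re.MULTILINE))   # ['World', 'World']
```

This is the behaviour to keep in mind when porting an egrep pattern, which is applied line by line, to a Python script that holds a whole file in one string.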

[ ] A bracket expression. Matches a single character that is contained within the brackets. For example, [abc] matches "a", "b", or "c". [a-z] specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: [abcx-z] matches "a", "b", "c", "x", "y", or "z", as does [a-cx-z]. The - character is treated as a literal character if it is the last or the first (after the ^, if present) character within the brackets: [abc-], [-abc]. Note that backslash escapes are not allowed in POSIX bracket expressions (Python's re module does accept them inside brackets). The ] character can be included in a bracket expression if it is the first (after the ^) character: []abc].
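The bracket rules above can be tried out in Python with re.findall, which returns every single character that the bracket expression matches. The input strings are invented for illustration, and the last line shows the backslash form that Python accepts but POSIX does not.

```python
import re

# Single characters and a range mixed in one bracket expression.
print(re.findall(r"[abcx-z]", "a quick zebra"))   # ['a', 'c', 'z', 'b', 'a']

# A hyphen placed last is a literal hyphen, not a range.
print(re.findall(r"[abc-]", "a-b"))               # ['a', '-', 'b']

# A closing bracket placed first is a literal ']'.
print(re.findall(r"[]abc]", "a]b"))               # ['a', ']', 'b']

# Python-specific alternative: escape the special characters.
print(re.findall(r"[\]\-]", "a]-b"))              # [']', '-']
```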

Example: [] denotes a set of possible character matches.
    import re
    string1 = "Hello, world."
    if re.search(r"[aeiou]+", string1):
        print(string1 + " contains one or more vowels.")

[^ ] Matches a single character that is not contained within the brackets. For example, [^abc] matches any character other than "a", "b", or "c". [^a-z] matches any single character that is not a lowercase letter from "a" to "z". Likewise, literal characters and ranges can be mixed.

Example: [^...] matches every character except the ones inside the brackets.
    import re
    string1 = "Hello World\n"
    if re.search(r"[^abc]", string1):
        print(string1 + " contains a character other than a, b, and c")

Example: | separates alternate possibilities.
    import re
    string1 = "Hello, world."
    if re.search(r"(Hello|Hi|Pogo)", string1):
        print("At least one of Hello, Hi, or Pogo is contained in " + string1)

() Defines a marked subexpression. The string matched within the parentheses can be recalled later (see the next entry, \n). A marked subexpression is also called a block or capturing group.

Example: ()
    import re
    string1 = "Hello, world."
    m_obj = re.search(r"(H..).(o..)(...)", string1)
    # Reuse the match object instead of running the same search twice.
    if m_obj:
        print("We matched '" + m_obj.group(1) + "' and '" + m_obj.group(2) + "' and '" + m_obj.group(3) + "'")

\n and {m,n}
\n Matches what the nth marked subexpression matched, where n is a digit from 1 to 9. This construct is vaguely defined in the POSIX.2 standard. Some tools allow referencing more than nine capturing groups.
{m,n} Matches the preceding element at least m and not more than n times. For example, a{3,5} matches only "aaa", "aaaa", and "aaaaa". This is not found in a few older instances of regular expressions. BRE mode requires \{m,n\}.
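Neither construct has an example in the slides, so here is a short Python sketch of both. The pattern (\w)\1 and the test strings are illustrative choices, not taken from the original deck.

```python
import re

# \1 refers back to whatever group 1 matched, so (\w)\1 finds a doubled character.
m = re.search(r"(\w)\1", "Hello, world.")
print(m.group(0))   # ll

# a{3,5} accepts three to five consecutive a's and nothing else.
tests = ["aa", "aaa", "aaaaa", "aaaaaa"]
accepted = [s for s in tests if re.fullmatch(r"a{3,5}", s)]
print(accepted)     # ['aaa', 'aaaaa']
```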

-v option: In Python, both the search and match functions return a Match object or None. For a grep -v equivalent (print the lines that do NOT match), you might use:
    import re
    import sys
    for line in sys.stdin:
        # Keep only the lines that contain no lowercase letter.
        if re.search(r'[a-z]', line) is None:
            sys.stdout.write(line)

e.g. Username: /^[a-z0-9_-]{3,16}$/ The whole string, from start (^) to end ($), must consist of 3 to 16 characters, each of which is a lowercase letter (a-z), a digit (0-9), an underscore, or a hyphen. Matches e.g. my-us3r_n4m3 but not th1s1s-wayt00_l0ngt0beausername (too long).
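In Python the same check might look like the sketch below. The surrounding slashes are delimiters from other languages and are dropped; the function name is_valid_username is hypothetical.

```python
import re

# The slide's pattern without the /.../ delimiters.
USERNAME = re.compile(r"^[a-z0-9_-]{3,16}$")

def is_valid_username(name):
    """True if name is 3-16 characters from a-z, 0-9, underscore, hyphen."""
    return USERNAME.search(name) is not None

print(is_valid_username("my-us3r_n4m3"))                      # True
print(is_valid_username("th1s1s-wayt00_l0ngt0beausername"))   # False
```

One Python detail: $ also matches just before a trailing newline, so re.fullmatch (or \Z in place of $) is the stricter choice when input may end in "\n".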

e.g. Password: /^[a-z0-9_-]{6,18}$/ The whole string must consist of 6 to 18 lowercase letters, digits, underscores, or hyphens. Matches e.g. myp4ssw0rd but not mypa$$w0rd (the dollar sign is not in the allowed set).
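To report why a candidate is rejected, the pattern can be paired with the negated bracket expression [^a-z0-9_-], which finds every character outside the allowed set. This is only a sketch; check_password is a made-up helper name.

```python
import re

PASSWORD = re.compile(r"^[a-z0-9_-]{6,18}$")
NOT_ALLOWED = re.compile(r"[^a-z0-9_-]")   # any character outside the allowed set

def check_password(candidate):
    """Return (is_valid, list of characters that are not allowed)."""
    return (PASSWORD.search(candidate) is not None,
            NOT_ALLOWED.findall(candidate))

print(check_password("myp4ssw0rd"))   # (True, [])
print(check_password("mypa$$w0rd"))   # (False, ['$', '$'])
```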

e.g. Hex Value: /^#?([a-f0-9]{6}|[a-f0-9]{3})$/ Starts with an optional # followed by exactly six or exactly three hex digits (a-f, 0-9). Matches e.g. #a3c113 but not #4d82h4 (h is not a hex digit).
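A quick way to try the hex pattern in Python, printing what the capturing group holds for a few sample values (the values other than the two from the slide are made up):

```python
import re

HEX_COLOR = re.compile(r"^#?([a-f0-9]{6}|[a-f0-9]{3})$")

def hex_digits(value):
    """Return the captured hex digits, or None if value is not a hex colour."""
    m = HEX_COLOR.search(value)
    return m.group(1) if m else None

for value in ["#a3c113", "a3c", "#4d82h4", "#a3c1"]:
    print(value, "->", hex_digits(value))
```

The first two values are accepted (with and without the leading #); the last two are rejected because of the letter h and the four-digit length.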

e.g. String that matches: String that doesn't match: (TLD is too long)

Match n characters
    egrep.exe "^...$" data1.txt
will match any line with exactly 3 characters: ^ anchors the start of the line, ... matches any 3 characters, and $ anchors the end of the line. Or just
    egrep.exe "^.{3}$" data1.txt
What about egrep.exe "(..){2}" data1.txt?
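One way to explore the three patterns above is to run them over a handful of lines in Python. The list lines is a made-up stand-in for data1.txt, and grep is a hypothetical helper that mimics egrep's "print the lines that match" behaviour.

```python
import re

# Stand-in for the lines of data1.txt (hypothetical contents).
lines = ["ab", "abc", "abcd", "abcde"]

def grep(pattern):
    """Return the lines in which the pattern is found somewhere."""
    return [line for line in lines if re.search(pattern, line)]

print(grep(r"^...$"))     # ['abc']
print(grep(r"^.{3}$"))    # ['abc']
print(grep(r"(..){2}"))   # ['abcd', 'abcde']
```

Because (..){2} has no ^ or $, it is satisfied by any four consecutive characters, so it selects every line with at least 4 characters rather than exactly 4.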