Regular Expressions

Presentation transcript:

Regular Expressions

Regular Expressions
• Regular expressions are a powerful string manipulation tool
• All modern languages have similar library packages for regular expressions
• Use regular expressions to:
  - Search a string (search and match)
  - Replace parts of a string (sub)
  - Break strings into smaller pieces (split)
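
The three uses listed above can be sketched with Python's re module. This is a minimal illustration, assuming Python 3; the sample text and variable names are made up for the example.

```python
import re

text = "The quick brown fox"

# Search: look for a pattern anywhere in the string.
found = re.search(r"quick", text)
print(found is not None)        # True

# Replace: substitute every match of the pattern.
replaced = re.sub(r"quick", "slow", text)
print(replaced)                 # The slow brown fox

# Split: break the string wherever the pattern matches.
pieces = re.split(r"\s+", text)
print(pieces)                   # ['The', 'quick', 'brown', 'fox']
```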

Regular Expression Python Syntax
• Most characters match themselves
  - The regular expression "test" matches the string 'test', and only that string
• [x] matches any one of a list of characters
  - "[abc]" matches 'a', 'b', or 'c'
• [^x] matches any one character that is not included in x
  - "[^abc]" matches any single character except 'a', 'b', or 'c'
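
A quick way to check these rules is to test single characters against each class. The sketch below assumes Python 3.4 or later and uses re.fullmatch (not mentioned on the slides) so that the whole string has to match.

```python
import re

# A literal pattern matches only that exact text.
print(bool(re.fullmatch(r"test", "test")))   # True
print(bool(re.fullmatch(r"test", "tent")))   # False

# [abc] matches exactly one of the listed characters.
in_class = [c for c in "abcd" if re.fullmatch(r"[abc]", c)]
print(in_class)                              # ['a', 'b', 'c']

# [^abc] matches one character that is NOT listed.
not_in_class = [c for c in "abcd" if re.fullmatch(r"[^abc]", c)]
print(not_in_class)                          # ['d']
```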

Regular Expressions Syntax
• "." matches any single character (except a newline, by default)
• Parentheses can be used for grouping
  - "(abc)+" matches 'abc', 'abcabc', 'abcabcabc', etc.
• x|y matches x or y
  - "this|that" matches 'this' or 'that', but not 'thisthat'
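
The following sketch tries each of these constructs against a few candidate strings. It assumes Python 3.4 or later and uses re.fullmatch so that partial matches do not count; the candidate lists are illustrative.

```python
import re

# "." matches any one character, but not a newline unless re.DOTALL is given.
dot_hits = [s for s in ["a", "7", " ", "\n"] if re.fullmatch(r".", s)]
print(dot_hits)      # ['a', '7', ' ']

# Parentheses group "abc" so that "+" repeats the whole group.
group_hits = [s for s in ["abc", "abcabc", "ab", ""] if re.fullmatch(r"(abc)+", s)]
print(group_hits)    # ['abc', 'abcabc']

# "|" chooses between alternatives.
alt_hits = [s for s in ["this", "that", "thisthat"] if re.fullmatch(r"this|that", s)]
print(alt_hits)      # ['this', 'that']
```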

Regular Expression Syntax
• x* matches zero or more x's
  - "a*" matches '', 'a', 'aa', etc.
• x+ matches one or more x's
  - "a+" matches 'a', 'aa', 'aaa', etc.
• x? matches zero or one x
  - "a?" matches '' or 'a'
• x{m,n} matches i x's, where m ≤ i ≤ n
  - "a{2,3}" matches 'aa' or 'aaa'
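
To see how the quantifiers differ, the sketch below runs each one against the same list of candidate strings. It assumes Python 3.4 or later; full_matches is a helper name invented for this example.

```python
import re

candidates = ["", "a", "aa", "aaa", "aaaa"]

def full_matches(pattern):
    """Return the candidates that the pattern matches in their entirety."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

print(full_matches(r"a*"))       # ['', 'a', 'aa', 'aaa', 'aaaa']
print(full_matches(r"a+"))       # ['a', 'aa', 'aaa', 'aaaa']
print(full_matches(r"a?"))       # ['', 'a']
print(full_matches(r"a{2,3}"))   # ['aa', 'aaa']
```

Note that the bounds in {m,n} are inclusive and are written without a space after the comma.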

Regular Expression Syntax
• "\d" matches any digit; "\D" matches any non-digit
• "\s" matches any whitespace character; "\S" matches any non-whitespace character
• "\w" matches any alphanumeric character or underscore; "\W" matches any other character
• "^" matches the beginning of the string; "$" matches the end of the string
• "\b" matches a word boundary; "\B" matches a position that is not a word boundary
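
A short sketch of the shorthand classes, anchors, and word boundaries in action, assuming Python 3. The sample strings are made up for illustration.

```python
import re

text = "Room 42, floor 7"

digits = re.findall(r"\d+", text)   # runs of digits
words = re.findall(r"\w+", text)    # runs of word characters
gaps = re.findall(r"\s", text)      # individual whitespace characters
print(digits)   # ['42', '7']
print(words)    # ['Room', '42', 'floor', '7']
print(gaps)     # [' ', ' ', ' ']

# Anchors tie the pattern to the start or end of the string.
starts_with_room = bool(re.search(r"^Room", text))
ends_with_digit = bool(re.search(r"\d$", text))
print(starts_with_room, ends_with_digit)   # True True

# \b keeps "cat" from matching inside "concat" or "cats".
whole_word = re.findall(r"\bcat\b", "cat concat cats cat.")
print(whole_word)   # ['cat', 'cat']
```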

Search and Match
• The two basic functions are re.search and re.match
  - Search looks for a pattern anywhere in a string
  - Match looks for a match starting at the beginning
• Both return None if the pattern is not found (logical false) and a "match object" if it is found
    >>> import re
    >>> pat = "a*b"
    >>> re.search(pat, "fooaaabcde")   # returns a match object
    >>> re.match(pat, "fooaaabcde")    # returns None: no match at the start
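
The difference between the two functions can be checked directly. This sketch reuses the slide's pattern and string, assumes Python 3, and adds one extra input ("aabxyz") purely for illustration.

```python
import re

pat = "a*b"
text = "fooaaabcde"

searched = re.search(pat, text)   # scans forward until the pattern fits
matched = re.match(pat, text)     # only tries at position 0

print(searched.group())   # aaab
print(matched)            # None

# Because None is false and a match object is true, the result
# can be used directly in a condition.
starts_ok = bool(re.match(pat, "aabxyz"))
print(starts_ok)          # True
```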

Python's raw string notation
• Use Python's raw string notation for regular expression patterns: backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
• Raw string notation (r"text") keeps regular expressions sane. Without it, every backslash ('\') in a regular expression would have to be prefixed with another one to escape it. For example, the two following lines of code are functionally identical:
    >>> re.match(r"\\", r"\\")
    >>> re.match("\\\\", r"\\")
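
The claims about string lengths and the two equivalent patterns can be verified with a few lines. This is a small sketch assuming Python 3; the variable names are invented for the example.

```python
import re

# Raw string: backslash and 'n' stay as two separate characters.
print(len(r"\n"))   # 2
# Normal string: "\n" is a single newline character.
print(len("\n"))    # 1

# The two pattern spellings are the very same two-character string.
same_pattern = (r"\\" == "\\\\")
print(same_pattern)   # True

# Either spelling matches one literal backslash.
backslash = "\\"      # a string holding a single backslash
m1 = re.match(r"\\", backslash)
m2 = re.match("\\\\", backslash)
print(m1.group() == m2.group() == backslash)   # True
```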

Search example
    import re
    programming = ["Python", "Perl", "PHP", "C++"]
    # Match names that start with B or P, or end with i or H (case-insensitive)
    pat = "^B|^P|i$|H$"
    for lang in programming:
        if re.search(pat, lang, re.IGNORECASE):
            print(lang, "FOUND")
        else:
            print(lang, "NOT FOUND")
The output of the above script will be:
    Python FOUND
    Perl FOUND
    PHP FOUND
    C++ NOT FOUND

Q: What's a match object?
• An instance of the match class with the details of the match result
    >>> pat = "a*b"
    >>> r1 = re.search(pat, "fooaaabcde")
    >>> r1.group()   # group returns string matched
    'aaab'
    >>> r1.start()   # index of the match start
    3
    >>> r1.end()     # index of the match end
    7
    >>> r1.span()    # tuple of (start, end)
    (3, 7)
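
One practical use of start(), end(), and span() is slicing the original string around the match. A minimal sketch, assuming Python 3 and reusing the slide's pattern and input:

```python
import re

text = "fooaaabcde"
m = re.search("a*b", text)

start, end = m.span()

# The slice between start and end is exactly what group() returns.
same_text = (text[start:end] == m.group())
print(same_text)   # True

# Everything before and after the match is also easy to recover.
before, after = text[:start], text[end:]
print(before, after)   # foo cde
```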

What got matched?
• Here's a pattern to match simple addresses
    >>> pat1 =
    >>> r1 =
    >>> r1.group()
• We might want to extract the pattern parts, like the name and host

What got matched?
• We can put parentheses around groups we want to be able to reference
    >>> pat2 =
    >>> r2 =
    >>> r2.groups()
    ('finin', 'cs.umbc.edu', 'umbc.', 'edu')
    >>> r2.group(1)
    'finin'
    >>> r2.group(2)
    'cs.umbc.edu'
• Note that the groups are numbered in a preorder traversal of the forest, that is, in the order in which their opening parentheses appear
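
The pattern and input string did not survive in the transcript. The sketch below is a hypothetical reconstruction, assuming the input was an email-style address and choosing a pattern that reproduces the group values shown on the slide; neither is confirmed by the source.

```python
import re

# ASSUMPTION: both the address and the pattern are inferred from the
# outputs shown on the slide, not copied from the original deck.
address = "finin@cs.umbc.edu"
pat2 = r"(\w+)@((\w+\.)+(com|org|net|edu))"

r2 = re.match(pat2, address)

# Groups are numbered by the position of their opening parenthesis:
# 1 = name, 2 = whole host, 3 = last repeated "label." piece, 4 = suffix.
print(r2.groups())   # ('finin', 'cs.umbc.edu', 'umbc.', 'edu')
print(r2.group(1))   # finin
print(r2.group(2))   # cs.umbc.edu
```

Group 3 holds only 'umbc.' because a repeated group keeps the text from its last repetition.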

What got matched?
• We can 'label' the groups as well…
    >>> pat3 ="(?P (\w+\.)+(com|org|net|edu))"
    >>> r3 =
    >>> r3.group('name')
    'finin'
    >>> r3.group('host')
    'cs.umbc.edu'
• And reference the matching parts by the labels
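
The pattern on this slide lost part of its text in the transcript. As a hedged sketch, a pattern using Python's (?P<label>...) syntax that would yield the 'name' and 'host' values shown might look like the following; the exact pattern and the input address are assumptions.

```python
import re

# ASSUMPTION: reconstructed so that group('name') and group('host')
# give the values shown on the slide; the original text is garbled.
pat3 = r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))"
r3 = re.match(pat3, "finin@cs.umbc.edu")

print(r3.group("name"))   # finin
print(r3.group("host"))   # cs.umbc.edu

# groupdict() collects every labelled group into a dictionary.
labelled = r3.groupdict()
print(labelled)           # {'name': 'finin', 'host': 'cs.umbc.edu'}
```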

Pattern object methods
• There are methods defined for a pattern object that parallel the regular expression functions, e.g.,
  - match
  - search
  - split
  - findall
  - sub
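
Each method listed above works like the module-level function of the same name, minus the pattern argument. A brief sketch, assuming Python 3, with made-up sample text and variable names:

```python
import re

words = re.compile(r"\w+")
separators = re.compile(r"[,;]\s*")
text = "red, green; blue"

first_match = words.match(text).group()     # match at the start
first_search = words.search(text).group()   # first match anywhere
all_words = words.findall(text)             # every match
parts = separators.split(text)              # split on the separators
masked = words.sub("X", text)               # replace every match

print(first_match, first_search)   # red red
print(all_words)                   # ['red', 'green', 'blue']
print(parts)                       # ['red', 'green', 'blue']
print(masked)                      # X, X; X

# The module-level function gives the same answer.
same_result = (re.findall(r"\w+", text) == all_words)
print(same_result)                 # True
```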

More re functions
• re.split() is like split but can use patterns
    >>> re.split(r"\W+", "This... is a test, short and sweet, of split().")
    ['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
    >>> re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)
    ['0', '3', '9']
• re.sub substitutes one string for a pattern
    >>> re.sub('(blue|white|red)', 'black', 'blue socks and red shoes')
    'black socks and black shoes'
• re.findall() finds all matches
    >>> re.findall(r"\d+", "12 dogs,11 cats, 1 egg")
    ['12', '11', '1']
• findall with files
    f = open('test.txt', 'r')
    # it returns a list of all the found strings
    strings = re.findall('some pattern', f.read())
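
The file example above leaves the file open. A variant using a with block closes it automatically. This is a minimal sketch, assuming Python 3; it writes its own temporary sample file so that it can run anywhere, and the file contents are invented.

```python
import os
import re
import tempfile

# Create a small sample file to search (illustrative contents only).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("12 dogs\n11 cats\n1 egg\n")
    path = tmp.name

try:
    # The with block closes the file as soon as the search is done.
    with open(path, "r") as f:
        numbers = re.findall(r"\d+", f.read())
finally:
    os.remove(path)   # clean up the sample file

print(numbers)   # ['12', '11', '1']
```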

Compiling regular expressions (re.compile)
• If you plan to use a re pattern more than once, compile it to a re object
• Python produces a special data structure that speeds up matching
    >>> cpat3 = re.compile(pat3)
    >>> cpat3
    >>> r3 =
    >>> r3
    >>> r3.group()
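
A typical reason to compile is reusing one pattern across many inputs. The sketch below assumes Python 3 and uses an illustrative pattern and data rather than the slide's pat3.

```python
import re

# Compile once, outside the loop.
number = re.compile(r"\d+")

lines = ["12 dogs", "no pets", "11 cats, 1 egg"]

# Reuse the compiled pattern for every line.
counts = [number.findall(line) for line in lines]
print(counts)           # [['12'], [], ['11', '1']]

# The original pattern text stays available on the compiled object.
print(number.pattern)   # \d+
```

The re module also caches recently used patterns internally, so the speed gain from compiling by hand is often modest; the clearer benefit is giving the pattern a name.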

Example: pig latin
• Rules
  - If word starts with consonant(s): move them to the end, append "ay"
  - Else word starts with vowel(s): keep as is, but add "zay"
• How might we do this?

The pattern
    ([bcdfghjklmnpqrstvwxyz]+)(\w+)

piglatin.py
    import re
    # Group 1: leading consonants; group 2: the rest of the word
    pat = r'([bcdfghjklmnpqrstvwxyz]+)(\w+)'
    cpat = re.compile(pat)

    def piglatin(string):
        # Translate each whitespace-separated word and rejoin with spaces
        return " ".join([piglatin1(w) for w in string.split()])

piglatin.py
    def piglatin1(word):
        match = cpat.match(word)
        if match:
            # Word starts with consonants: move them to the end, add "ay"
            consonants = match.group(1)
            rest = match.group(2)
            return rest + consonants + "ay"
        else:
            # Word starts with a vowel: keep it and add "zay"
            return word + "zay"
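
Putting the two piglatin.py slides together gives a short runnable module. This sketch assumes Python 3 and keeps the slide's function names; the sample sentence is made up.

```python
import re

# Group 1: leading lowercase consonants; group 2: the rest of the word.
cpat = re.compile(r"([bcdfghjklmnpqrstvwxyz]+)(\w+)")

def piglatin1(word):
    """Translate a single word."""
    match = cpat.match(word)
    if match:
        return match.group(2) + match.group(1) + "ay"
    return word + "zay"

def piglatin(string):
    """Translate every whitespace-separated word in a string."""
    return " ".join(piglatin1(w) for w in string.split())

result = piglatin("pig latin is easy")
print(result)   # igpay atinlay iszay easyzay
```

Two properties of this pattern are worth knowing: it lists lowercase letters only, so a capitalized word such as "Hello" falls through to the "zay" branch, and it treats "y" as a consonant.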

Exercises
• Write a Python program using a regexp to validate an IP address: (eg )
• Write a regexp to validate your USN.
• Find Domain in Address E.g 'My name is Ram, and is my .' and program should
• Write a program to validate a name and phone number using re. It should keep asking until correct data is entered. (e.g. Phone number: (800) #1234). Use re.compile.
• Define a simple "spelling correction" function correct() that takes a string and sees to it that 1) two or more occurrences of the space character are compressed into one, and 2) an extra space is inserted after a period if the period is directly followed by a letter. (Use regular expressions.)
  - E.g. correct("This is very funny and cool.Indeed!") should return "This is very funny and cool. Indeed!"
• Find all five-character-long words in a sentence