More Patterns Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See

Slides:



Advertisements
Similar presentations
Input and Output Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
Advertisements

CSCI 6962: Server-side Design and Programming Input Validation and Error Handling.
Slicing Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
Tuples Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
Regular Expressions in Perl By Josue Vazquez. What are Regular Expressions? A template that either matches or doesn’t match a given string. Often called.
Control Flow Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
Python: Regular Expressions
1 Regular Expressions & Automata Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.
Regular Expressions Lecture 3. Regular Expressions Motivation: To search for strings using partially specified patterns. Examples: To validate data fields.
 2002 Prentice Hall. All rights reserved. 1 Chapter 2 – Introduction to Python Programming Outline 2.1 Introduction 2.2 First Program in Python: Printing.
Regular Expressions & Automata Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.
1 Overview Regular expressions Notation Patterns Java support.
Applications of Regular Expressions BY— NIKHIL KUMAR KATTE 1.
Mechanics Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
Language Recognizer Connecting Type 3 languages and Finite State Automata Copyright © – Curt Hill.
Methods in Computational Linguistics II with reference to Matt Huenerfauth’s Language Technology material Lecture 4: Matching Things. Regular Expressions.
1 Flex. 2 Flex A Lexical Analyzer Generator  generates a scanner procedure directly, with regular expressions and user-written procedures Steps to using.
Computer Programming for Biologists Class 5 Nov 20 st, 2014 Karsten Hokamp
Introduction to Computing Using Python Regular expressions Suppose we need to find all addresses in a web page How do we recognize addresses?
Fixtures Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
PHP Using Strings 1. Replacing substrings (replace certain parts of a document template; ex with client’s name etc) mixed str_replace (mixed $needle,
COMP Parsing 2 of 4 Lecture 22. How do we write programs to do this? The process of getting from the input string to the parse tree consists of.
Regular Expressions Regular expressions are a language for string patterns. RegEx is integral to many programming languages:  Perl  Python  Javascript.
Regular Expression (continue) and Cookies. Quick Review What letter values would be included for the following variable, which will be used for validation.
Python Regular Expressions Easy text processing. Regular Expression  A way of identifying certain String patterns  Formally, a RE is:  a letter or.
Browsing Directories Copyright © Software Carpentry and The University of Edinburgh This work is licensed under the Creative Commons Attribution.
Regular Expressions CSC207 – Software Design. Motivation Handling white space –A program ought to be able to treat any number of white space characters.
Pipes and Filters Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
REGEX. Problems Have big text file, want to extract data – Phone numbers (503)
Overview A regular expression defines a search pattern for strings. Regular expressions can be used to search, edit and manipulate text. The pattern defined.
Introduction Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
Prof. Alfred J Bird, Ph.D., NBCT Door Code for IT441 Students.
When you read a sentence, your mind breaks it into tokens—individual words and punctuation marks that convey meaning. Compilers also perform tokenization.
Regular Expressions in Perl CS/BIO 271 – Introduction to Bioinformatics.
Regular Expressions What is this line all about? while (!($search =~ /^\s*$/)) { It’s a string search just like before, but with a huge twist – regular.
©Brooks/Cole, 2001 Chapter 9 Regular Expressions ( 정규수식 )
12. Regular Expressions. 2 Motto: I don't play accurately-any one can play accurately- but I play with wonderful expression. As far as the piano is concerned,
More Tools Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
GREP. Whats Grep? Grep is a popular unix program that supports a special programming language for doing regular expressions The grammar in use for software.
Introduction to Database using Microsoft Access 2013 Part 6.1 November 18, 2014.
First-Class Functions Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
Copyright © Curt Hill Regular Expressions Providing a Search Pattern.
Text Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
Validation using Regular Expressions. Regular Expression Instead of asking if user input has some particular value, sometimes you want to know if it follows.
Unit 11 –Reglar Expressions Instructor: Brent Presley.
Standard Types and Regular Expressions CS 480/680 – Comparative Languages.
Operators Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
Operating System Discussion Section. The Basics of C Reference: Lecture note 2 and 3 notes.html.
Exceptions Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
Introduction to Programming the WWW I CMSC Winter 2004 Lecture 13.
Basics Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See
An Introduction to Regular Expressions Specifying a Pattern that a String must meet.
-Joseph Beberman *Some slides are inspired by a PowerPoint presentation used by professor Seikyung Jung, which was derived from Charlie Wiseman.
Regular Expressions. What is it 4? Text searching & replacing Sequence searching (input, DNA) Sequence Tracking Machine Operation logic machines that.
Awk 2 – more awk. AWK INVOCATION AND OPERATION the "-F" option allows changing Awk's "field separator" character. Awk regards each line of input data.
Prof. Alfred J Bird, Ph.D., NBCT Office – McCormick 3rd floor 607.
Python Aliasing Copyright © Software Carpentry 2010
Lists 1 Day /17/14 LING 3820 & 6820 Natural Language Processing
Program Design Invasion Percolation: Aliasing
Regular Expressions in Perl
Sets and Dictionaries Examples Copyright © Software Carpentry 2010
Introduction to Scripting
The backslash is used to escape characters that are used in Python
CS 1111 Introduction to Programming Fall 2018
Program Design Invasion Percolation: The Grid
Python Lessons 7 & 8 Mr. Kalmes.
Python Lessons 7 & 8 Mr. Husch.
Python 8 Mr. Husch.
REGEX.
LING 388: Computers and Language
Presentation transcript:

More Patterns Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See for more information. Regular Expressions

More Patterns Notebook #1 Site Date Evil (millivaders) Baker Baker Baker Baker Baker Baker ⋮ ⋮ ⋮

Regular ExpressionsMore Patterns Notebook #2 Site/Date/Evil Davison/May 22, 2010/ Davison/May 23, 2010/ Pertwee/May 24, 2010/ Davison/June 19, 2010/ Davison/July 6, 2010/ Pertwee/Aug 4, 2010/ Pertwee/Sept 3, 2010/ ⋮ ⋮ ⋮

Regular ExpressionsMore Patterns # Get date from record. def get_date(record): '''Return (Y, M, D) as strings, or None.''' # m = re.search('([0-9]{4})-([0-9]{2})-([0-9]{2})', record) if m: return m.group(1), m.group(2), m.group(3) # Jan 1, 2010 (comma optional, day may be 1 or 2 digits) m = re.search('/([A-Z][a-z]+) ([0-9]{1,2}),? ([0-9]{4})/', record) if m: return m.group(3), m.group(1), m.group(2) return None

Regular ExpressionsMore Patterns # Get all fields from record. def get_fields(record): '''Return (Y, M, D, site, reading) or None.''' patterns = ( ('(.+)\t([0-9]{4})-([0-9]{2})-([0-9]{2})\t(.+)', 2, 3, 4, 1, 5), ('(.+)/([A-Z][a-z]+) ([0-9]{1,2}),? ([0-9]{4})/(.+)', 4, 2, 3, 1, 5) ) for p, y, m, d, s, r in patterns: m = re.search(p, record) if m: return m.group(y), m.group(m), m.group(d), m.group(s), m.group(r) return None

Regular ExpressionsMore Patterns # Get all fields from record. def get_fields(record): '''Return (Y, M, D, site, reading) or None.''' patterns = ( ('(.+)\t([0-9]{4})-([0-9]{2})-([0-9]{2})\t(.+)', 2, 3, 4, 1, 5), ('(.+)/([A-Z][a-z]+) ([0-9]{1,2}),? ([0-9]{4})/(.+)', 4, 2, 3, 1, 5) ) for p, y, m, d, s, r in patterns: m = re.search(p, record) if m: return m.group(y), m.group(m), m.group(d), m.group(s), m.group(r) return None

Regular ExpressionsMore Patterns # Get all fields from record. def get_fields(record): '''Return (Y, M, D, site, reading) or None.''' patterns = ( ('(.+)\t([0-9]{4})-([0-9]{2})-([0-9]{2})\t(.+)', 2, 3, 4, 1, 5), ('(.+)/([A-Z][a-z]+) ([0-9]{1,2}),? ([0-9]{4})/(.+)', 4, 2, 3, 1, 5) ) for p, y, m, d, s, r in patterns: m = re.search(p, record) if m: return m.group(y), m.group(m), m.group(d), m.group(s), m.group(r) return None

Regular ExpressionsMore Patterns # Get all fields from record. def get_fields(record): '''Return (Y, M, D, site, reading) or None.''' patterns = ( ('(.+)\t([0-9]{4})-([0-9]{2})-([0-9]{2})\t(.+)', 2, 3, 4, 1, 5), ('(.+)/([A-Z][a-z]+) ([0-9]{1,2}),? ([0-9]{4})/(.+)', 4, 2, 3, 1, 5) ) for p, y, m, d, s, r in patterns: m = re.search(p, record) if m: return m.group(y), m.group(m), m.group(d), m.group(s), m.group(r) return None

Regular ExpressionsMore Patterns # Get all fields from record. def get_fields(record): '''Return (Y, M, D, site, reading) or None.''' patterns = ( ('(.+)\t([0-9]{4})-([0-9]{2})-([0-9]{2})\t(.+)', 2, 3, 4, 1, 5), ('(.+)/([A-Z][a-z]+) ([0-9]{1,2}),? ([0-9]{4})/(.+)', 4, 2, 3, 1, 5) ) for p, y, m, d, s, r in patterns: m = re.search(p, record) if m: return m.group(y), m.group(m), m.group(d), m.group(s), m.group(r) return None

Regular ExpressionsMore Patterns # Get all fields from record. def get_fields(record): '''Return (Y, M, D, site, reading) or None.''' patterns = ( ('(.+)\t([0-9]{4})-([0-9]{2})-([0-9]{2})\t(.+)', 2, 3, 4, 1, 5), ('(.+)/([A-Z][a-z]+) ([0-9]{1,2}),? ([0-9]{4})/(.+)', 4, 2, 3, 1, 5) ) for p, y, m, d, s, r in patterns: m = re.search(p, record) if m: return m.group(y), m.group(m), m.group(d), m.group(s), m.group(r) return None

Regular ExpressionsMore Patterns # Get all fields from record. def get_fields(record): '''Return (Y, M, D, site, reading) or None.''' patterns = ( ('(.+)\t([0-9]{4})-([0-9]{2})-([0-9]{2})\t(.+)', 2, 3, 4, 1, 5), ('(.+)/([A-Z][a-z]+) ([0-9]{1,2}),? ([0-9]{4})/(.+)', 4, 2, 3, 1, 5) ) for p, y, m, d, s, r in patterns: m = re.search(p, record) if m: return m.group(y), m.group(m), m.group(d), m.group(s), m.group(r) return None

Regular ExpressionsMore Patterns Notebook #3 Date Site Evil(mvad) May (Hartnell) May (Hartnell) June (Hartnell) May (Troughton) May (Troughton) June (Troughton) ⋮ ⋮ ⋮

Regular ExpressionsMore Patterns Notebook #3 spaces Date Site Evil(mvad) May (Hartnell) May (Hartnell) June (Hartnell) May (Troughton) May (Troughton) June (Troughton) ⋮ ⋮ ⋮

Regular ExpressionsMore Patterns Notebook #3 But how to match parentheses? Date Site Evil(mvad) May (Hartnell) May (Hartnell) June (Hartnell) May (Troughton) May (Troughton) June (Troughton) ⋮ ⋮ ⋮

Regular ExpressionsMore Patterns Notebook #3 But how to match parentheses? The '()' in '(.+)' don't actually match characters Date Site Evil(mvad) May (Hartnell) May (Hartnell) June (Hartnell) May (Troughton) May (Troughton) June (Troughton) ⋮ ⋮ ⋮

Regular ExpressionsMore Patterns Put '\(' and '\)' in regular expression to match parenthesis characters '(' and ')'

Regular ExpressionsMore Patterns Put '\(' and '\)' in regular expression to match parenthesis characters '(' and ')' Another escape sequence, like '\t' in a string for tab

Regular ExpressionsMore Patterns Put '\(' and '\)' in regular expression to match parenthesis characters '(' and ')' Another escape sequence, like '\t' in a string for tab In order to get the '\' in the string, must write '\\'

Regular ExpressionsMore Patterns Put '\(' and '\)' in regular expression to match parenthesis characters '(' and ')' Another escape sequence, like '\t' in a string for tab In order to get the '\' in the string, must write '\\' So the strings representing the regular expressions that match parentheses are '\\(' and '\\)'

Regular ExpressionsMore Patterns # find '()' in text m = re.search('\\(\\)', text) ⋮ ⋮ program text

Regular ExpressionsMore Patterns # find '()' in text m = re.search('\\(\\)', text) ⋮ ⋮ \(\) program text Python string

Regular ExpressionsMore Patterns # find '()' in text m = re.search('\\(\\)', text) ⋮ ⋮ \(\) () program text Python string finite state machine

Regular ExpressionsMore Patterns Notebook #3 Date Site Evil(mvad) May (Hartnell) May (Hartnell) June (Hartnell) May (Troughton) May (Troughton) June (Troughton) ⋮ ⋮ ⋮

Regular ExpressionsMore Patterns Notebook #3 '([A-Z][a-z]+) ([0-9]{1,2}) ([0-9]{4}) \\((.+)\\) (.+)' Date Site Evil(mvad) May (Hartnell) May (Hartnell) June (Hartnell) May (Troughton) May (Troughton) June (Troughton) ⋮ ⋮ ⋮

Regular ExpressionsMore Patterns Notebook #3 '([A-Z][a-z]+ [0-9]{1,2} [0-9]{4} \\((.+)\\) (.+)' match actual '(' and ')' Date Site Evil(mvad) May (Hartnell) May (Hartnell) June (Hartnell) May (Troughton) May (Troughton) June (Troughton) ⋮ ⋮ ⋮

Regular ExpressionsMore Patterns Notebook #3 '([A-Z][a-z]+ [0-9]{1,2} [0-9]{4} \\((.+)\\) (.+)' match actual '(' and ')' create a group Date Site Evil(mvad) May (Hartnell) May (Hartnell) June (Hartnell) May (Troughton) May (Troughton) June (Troughton) ⋮ ⋮ ⋮

Regular ExpressionsMore Patterns Notebook #3 '([A-Z][a-z]+ [0-9]{1,2} [0-9]{4} \\((.+)\\) (.+)' match actual '(' and ')' create a group save the match Date Site Evil(mvad) May (Hartnell) May (Hartnell) June (Hartnell) May (Troughton) May (Troughton) June (Troughton) ⋮ ⋮ ⋮

Regular ExpressionsMore Patterns Some character sets come up frequently enough to be given abbreviations

Regular ExpressionsMore Patterns Some character sets come up frequently enough to be given abbreviations \ddigits'0', '4', '9'

Regular ExpressionsMore Patterns Some character sets come up frequently enough to be given abbreviations \ddigits'0', '4', '9' \swhite space' ', '\t', '\r', '\n'

Regular ExpressionsMore Patterns Some character sets come up frequently enough to be given abbreviations \ddigits'0', '4', '9' \swhite space' ', '\t', '\r', '\n' \wword characters'[A-Za-z0-9_]'

Regular ExpressionsMore Patterns Some character sets come up frequently enough to be given abbreviations \ddigits'0', '4', '9' \swhite space' ', '\t', '\r', '\n' \wword characters'[A-Za-z0-9_]' actually the set of characters that can appear in a variable name in C (or Python)

Regular ExpressionsMore Patterns Some character sets come up frequently enough to be given abbreviations \ddigits'0', '4', '9' \swhite space' ', '\t', '\r', '\n' \wword characters'[A-Za-z0-9_]' actually the set of characters that can appear in a variable name in C (or Python) Need to double up the '\' when writing as string

Regular ExpressionsMore Patterns And now for an example of really bad design

Regular ExpressionsMore Patterns And now for an example of really bad design \Snon-space characters

Regular ExpressionsMore Patterns And now for an example of really bad design \Snon-space characters that's an upper-case 'S'

Regular ExpressionsMore Patterns And now for an example of really bad design \Snon-space characters that's an upper-case 'S' \Wnon-word characters

Regular ExpressionsMore Patterns And now for an example of really bad design \Snon-space characters that's an upper-case 'S' \Wnon-word characters that's an upper-case 'W'

Regular ExpressionsMore Patterns And now for an example of really bad design \Snon-space characters that's an upper-case 'S' \Wnon-word characters that's an upper-case 'W' Very easy to mis-type

Regular ExpressionsMore Patterns And now for an example of really bad design \Snon-space characters that's an upper-case 'S' \Wnon-word characters that's an upper-case 'W' Very easy to mis-type Even easier to mis-read

Regular ExpressionsMore Patterns Can also match things that aren't actual characters

Regular ExpressionsMore Patterns Can also match things that aren't actual characters ^At the start of a pattern, matches the beginning of the input text

Regular ExpressionsMore Patterns Can also match things that aren't actual characters ^At the start of a pattern, matches the beginning of the input text re.search('^mask', 'mask size') => match

Regular ExpressionsMore Patterns Can also match things that aren't actual characters ^At the start of a pattern, matches the beginning of the input text re.search('^mask', 'mask size') => match re.search('^mask', 'unmask') => None

Regular ExpressionsMore Patterns Can also match things that aren't actual characters ^At the start of a pattern, matches the beginning of the input text re.search('^mask', 'mask size') => match re.search('^mask', 'unmask') => None $At the end of a pattern, matches the end of the input text

Regular ExpressionsMore Patterns Can also match things that aren't actual characters ^At the start of a pattern, matches the beginning of the input text re.search('^mask', 'mask size') => match re.search('^mask', 'unmask') => None $At the end of a pattern, matches the end of the input text re.search('temp$', 'high-temp') => match

Regular ExpressionsMore Patterns Can also match things that aren't actual characters ^At the start of a pattern, matches the beginning of the input text re.search('^mask', 'mask size') => match re.search('^mask', 'unmask') => None $At the end of a pattern, matches the end of the input text re.search('temp$', 'high-temp') => match re.search('temp$', 'temperature') => None

Regular ExpressionsMore Patterns \bBoundary between word and non-word characters

Regular ExpressionsMore Patterns \bBoundary between word and non-word characters re.search('\\bage\\b', 'the age of') => match

Regular ExpressionsMore Patterns \bBoundary between word and non-word characters re.search('\\bage\\b', 'the age of') => match re.search('\\bage\\b', 'phage') => None

June 2010 created by Greg Wilson Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See for more information.