Pattern matching with regular expressions


A common file processing requirement is to match strings within the file to a standard form, e.g. an email address.
Regular expressions, or 'regexes', give us the power to do this kind of matching.
At its simplest, any word is a regex:
–regex: ' '
–test: ' '
The regex occurs in the test string, so it matches!
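A minimal Python sketch of the literal-word case. The pattern and test string below are illustrative placeholders chosen for this example, not the values from the original slide.

```python
import re

# Hypothetical values, chosen only for illustration.
pattern = 'expression'
text = 'a regular expression matches text'

# re.search() scans the whole string for the first place the pattern occurs
# and returns None when it does not occur at all.
found = re.search(pattern, text) is not None
missing = re.search(pattern, 'no such word here') is not None

print(found, missing)
```

Because the pattern contains no special characters, this behaves like a plain substring test.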

Regular Expressions
In reality, regexes are used to search for a string that "has the form" of the regular expression.
We need to define some syntax that lets us specify things such as:
–'a number is in a range';
–'a letter is one of a set';
–'a certain number of characters', etc.
This requires special characters.

Regular Expressions
Some special characters: *, [], {}
For a complete reference see expressions.info/reference.html
An asterisk * specifies that the character preceding it can appear zero or more times, e.g.
–regex: 'a*b'
–test: 'b' # Matches, as zero occurrences of 'a' are allowed
–test: 'ab' # Matches
–test: 'aaab' # Matches
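The asterisk cases above can be checked with a short Python sketch. It uses re.fullmatch(), which requires the entire test string to fit the pattern; the extra test string 'aaa' is an addition for contrast and is not from the slide.

```python
import re

star = re.compile('a*b')

# fullmatch() succeeds only if the whole string fits the pattern:
# zero or more 'a' characters followed by exactly one 'b'.
results = {s: star.fullmatch(s) is not None for s in ['b', 'ab', 'aaab', 'aaa']}

for s, ok in results.items():
    print(repr(s), 'matches' if ok else 'does not match')
```

'aaa' is rejected because the pattern still requires a final 'b'.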

Regular Expressions
A range of characters, or a "character class", is defined using square brackets [], e.g.
–regex: '[a-z]'
–test: 'm' # Matches as it is a lower case letter
–test: 'M' # Fails as it is an upper case letter
Multiple ranges: write them one after another with no separator (a comma inside the brackets would itself be matched as a literal character)
–regex: '[a-zA-Z0-9]'
–test: 'M' # Matches
–test: '9' # Matches
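A small Python check of character classes. It also compares a class written with commas against one written without, to show that a comma inside the brackets is treated as a literal character to match; the variable names are illustrative.

```python
import re

lower = re.compile('[a-z]')
alnum = re.compile('[a-zA-Z0-9]')           # ranges simply follow one another
with_commas = re.compile('[a-z,A-Z,0-9]')   # the commas become part of the set

lower_m = lower.fullmatch('m') is not None
lower_M = lower.fullmatch('M') is not None
alnum_M = alnum.fullmatch('M') is not None
alnum_9 = alnum.fullmatch('9') is not None

# The only difference between the two classes is whether ',' is accepted.
comma_plain = alnum.fullmatch(',') is not None
comma_listed = with_commas.fullmatch(',') is not None

print(lower_m, lower_M, alnum_M, alnum_9, comma_plain, comma_listed)
```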

Regular Expressions
To specify an exact number of characters use braces {}, e.g.
–regex: 'a{2}'
–test: 'abab' # Fails as there are not two consecutive a's in the string
–test: 'aaaab' # Matches
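A sketch of the brace example in Python, assuming the pattern is searched for anywhere in the test string (which is how 'aaaab' can match even though it contains more than two a's).

```python
import re

double_a = re.compile('a{2}')

# search() looks for two consecutive 'a' characters anywhere in the string.
in_abab = double_a.search('abab') is not None
hit = double_a.search('aaaab')

print(in_abab)                  # no 'aa' anywhere in 'abab'
print(hit.group(), hit.span())  # the first 'aa' found and its position
```

To demand exactly two a's and nothing else, the whole string would have to be tested, for example with fullmatch().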

Regular Expressions in Python
Python contains a regular expression module, called 're', that allows strings to be tested against regular expressions:
    import re
    test = 'm'  # example string to check
    checker = re.compile('[a-z]')
    # match() only tests the start of the string; use search() to look anywhere in it
    if checker.match(test) is not None:
        print('String matches!')
    else:
        print('String does not match at its start')
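The 're' module offers several ways to apply a compiled pattern, and they differ in where the match is allowed to occur. A minimal sketch, using a hypothetical test string:

```python
import re

checker = re.compile('[a-z]')
test = '9m'  # hypothetical input: a digit followed by a lower case letter

at_start = checker.match(test) is not None      # pattern must fit at position 0
anywhere = checker.search(test) is not None     # pattern may fit at any position
whole = checker.fullmatch(test) is not None     # pattern must fit the entire string

print(at_start, anywhere, whole)
```

Here only search() succeeds, because the string starts with a digit and is longer than one character.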

Practical example
    import re
    filetestsRun = 'testResults.log'
    f = open(filetestsRun, 'r')
    reTestCount = re.compile(r"Running\s*(\d+)\s*test", re.IGNORECASE)
    reCrashCount = re.compile(r"OK!")
    reFailCount = re.compile(r"Failed\s*(\d+)\s*of\s*(\d+)\s*tests", re.IGNORECASE)
The code above opens the log file and compiles the patterns used to search it for lines such as
–Running 13 tests OK!
Used on Mantid to keep track of build server test passes/failures
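The slide stops after compiling the patterns. One way they might then be applied is sketched below. The log lines are invented for illustration (the real testResults.log format is not shown in the source), an in-memory list stands in for the open file, and the counting logic is an assumption rather than the Mantid build script itself.

```python
import re

reTestCount = re.compile(r"Running\s*(\d+)\s*test", re.IGNORECASE)
reCrashCount = re.compile(r"OK!")
reFailCount = re.compile(r"Failed\s*(\d+)\s*of\s*(\d+)\s*tests", re.IGNORECASE)

# Hypothetical log content standing in for the lines of testResults.log.
log_lines = [
    "Running 13 tests",
    "OK!",
    "Running 5 tests",
    "Failed 2 of 5 tests",
]

tests_run = 0
tests_failed = 0
ok_lines = 0
for line in log_lines:
    m = reTestCount.search(line)
    if m:
        tests_run += int(m.group(1))      # captured number of tests run
    m = reFailCount.search(line)
    if m:
        tests_failed += int(m.group(1))   # captured number of failed tests
    if reCrashCount.search(line):
        ok_lines += 1                     # lines containing "OK!"

print(tests_run, tests_failed, ok_lines)
```

The parentheses in each pattern form capture groups, which is what lets group(1) return just the digits from a matching line.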