Nate Brunelle Today: Regular Expressions

Presentation transcript:

CS1110 Nate Brunelle Today: Regular Expressions

Questions?

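String.find()
Takes a string as an argument and, if exactly that string appears, gives the index where it first starts. If it does not appear, gives -1.
mystring.find("Purple Elephant")
"purple elephant".find("Purple Elephant") gives -1 (the capitalization differs)
"the elephant was purple".find("Purple Elephant") gives -1 (both words are there, but not as that exact string)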
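A small sketch of the exact-match behavior described on this slide. The variable names are only for illustration.

```python
# str.find returns the index of the first exact (case-sensitive) occurrence,
# or -1 when the substring does not appear at all.
text = "the elephant was purple"

exact_hit = "Purple Elephant".find("Purple Elephant")   # same string, found at the start
case_miss = "purple elephant".find("Purple Elephant")   # capitalization differs
order_miss = text.find("Purple Elephant")               # both words present, wrong order and case
word_hit = text.find("elephant")                        # index where "elephant" begins

print(exact_hit, case_miss, order_miss, word_hit)
```

This is the limitation that motivates regular expressions: find only answers "does exactly this string appear?".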

Wildcards
[Rr]ugs?[^a-zA-Z]
Match on/find:
Will not match on/find: Rugged, rugged
We might want:
(1) A way of saying r or R
(2) Maybe there's an s
(3) Something that's not a letter
(1)ug(2)(3)
[Rr]ugs?[^a-zA-Z]
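One way to try the pattern from this slide in Python. The sample strings other than Rugged and rugged are made up for illustration.

```python
import re

# [Rr] = r or R, s? = maybe an s, [^a-zA-Z] = one character that is not a letter.
rug = re.compile(r"[Rr]ugs?[^a-zA-Z]")

samples = ["Rug.", "rugs!", "a rug ", "Rugged", "rugged", "rug"]
results = {s: rug.search(s) is not None for s in samples}

for sample, found in results.items():
    print(repr(sample), found)
```

Note that a bare "rug" with nothing after it is not found: [^a-zA-Z] has to match an actual character, so the pattern needs something non-alphabetic after the word.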

Alternation (or)
Goal: match "he", "she", "it", or "they" in "... went to the store"
First try: s?h?e?i?t?
Problem: every letter is optional, so it also matches "sit"
Alternation (or): |
s?he|it|they
(she|he|it|they) went to the store
she went to the store
he went to the store
it went to the store
they went to the store
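A sketch comparing the all-optional attempt with alternation. The variable names are illustrative, and the no_parens pattern is an extra case added here to show why the parentheses matter.

```python
import re

loose = re.compile(r"s?h?e?i?t?")                              # every letter optional
grouped = re.compile(r"(she|he|it|they) went to the store")    # alternation inside a group
no_parens = re.compile(r"she|he|it|they went to the store")    # | splits the whole pattern

sentences = [p + " went to the store" for p in ["she", "he", "it", "they"]]

loose_accepts_sit = loose.fullmatch("sit") is not None
loose_accepts_they = loose.fullmatch("they") is not None
grouped_all = all(grouped.fullmatch(s) for s in sentences)
no_parens_hits = [s for s in sentences if no_parens.fullmatch(s)]

print(loose_accepts_sit, loose_accepts_they, grouped_all, no_parens_hits)
```

Without the parentheses, the alternatives are "she", "he", "it", and "they went to the store", so only the last sentence matches in full.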

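Star vs plus vs ?
Spo?ky (0 or 1 o): Spky, Spoky
Spo*ky (0 or more o's): Spky, Spoky, Spooky, Spoooky, Spooooky, Spoooooooooooooky, ...
Spo+ky (1 or more o's): Spoky, Spooky, Spoooky, Spooooky, Spoooooooooooooky, ...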
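The three quantifiers can be compared side by side. This is a minimal sketch using a few of the words from the slide.

```python
import re

words = ["Spky", "Spoky", "Spooky", "Spoooooky"]
patterns = {"?": r"Spo?ky", "*": r"Spo*ky", "+": r"Spo+ky"}

# For each quantifier, keep the words that the pattern matches in full.
matches = {
    quantifier: [w for w in words if re.fullmatch(pattern, w)]
    for quantifier, pattern in patterns.items()
}

for quantifier, matched in matches.items():
    print(quantifier, matched)
```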

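R string
"\"" -> one character: "
r"\"" -> two characters: \" (in a raw string the backslash still keeps the quote from ending the string, but the backslash stays)
r"\"this" -> \"this
r"\" -> error (a raw string cannot end in a single backslash)
r"\n" -> \n (two characters: a backslash and an n, not a newline)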
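A quick check of how Python reads these literals, plus the reason raw strings matter for regexes. The sample text "this is text" is made up for illustration.

```python
import re

quote = "\""          # escape sequence: one character, a double quote
raw_quote = r"\""     # raw: two characters, a backslash and a double quote
newline = "\n"        # escape sequence: one newline character
raw_newline = r"\n"   # raw: two characters, a backslash and an n

# r"\" cannot be written directly, so compile its source text to see the error.
try:
    compile('r"\\"', "<demo>", "eval")
    trailing_backslash_ok = True
except SyntaxError:
    trailing_backslash_ok = False

# In a normal string "\b" is a backspace character; in a raw string the regex
# engine receives the two characters \b and treats them as a word boundary.
plain = re.search("\bis\b", "this is text")
raw = re.search(r"\bis\b", "this is text")

print(len(quote), len(raw_quote), len(newline), len(raw_newline))
print(trailing_backslash_ok, plain, raw.start())
```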

Regex Pieces

Operation                  Example        Meaning
Character class            [Rr] or [rR]   R or r
                           [abcd]         Exactly one of a, b, c, or d
                           [\^A]          Just caret (^) or A
Character range            [a-z]          Exactly one character "between" a and z
                           [a-zA-Z]       "between" a and z or "between" A and Z
                           [0-9]          Any one digit
Negative character class   [^a]           Any one character that's not an a
                           [^a-zA-Z]      Any one character that's not a letter
                           [^\^]          Any one character that's not a caret
Optional quantifier        s?             Maybe there's an s (0 or 1 s)
                           [Rr]?          Either have one of R or r, or neither
OR, alternation            wx|xyz         One of the strings wx or xyz
                           s?he|it        Matches one of the two regexes
Star                       [abc]*         Any number of a's, b's, and c's at all (0 or more copies of...)
Plus                       [abc]+         At least one of a's, b's, and c's (1 or more copies of...)
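A few of the pieces above, tried on short made-up strings. This is only a sketch; the sample inputs are not from the slides.

```python
import re

# Character class with an escaped caret: a caret or an A.
caret_or_a = re.findall(r"[\^A]", "A^B^")

# Negative character classes: anything that is not a letter / not a caret.
non_letters = re.findall(r"[^a-zA-Z]", "ab1 c!")
non_carets = re.findall(r"[^\^]", "a^b")

# Star allows zero copies, plus requires at least one.
star_accepts_empty = re.fullmatch(r"[abc]*", "") is not None
plus_accepts_empty = re.fullmatch(r"[abc]+", "") is not None
plus_accepts_mix = re.fullmatch(r"[abc]+", "abcabca") is not None

print(caret_or_a, non_letters, non_carets)
print(star_accepts_empty, plus_accepts_empty, plus_accepts_mix)
```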

Regex Pieces, Cont.

Operation           Example       Meaning
Count range         {3,5}         Between 3 and 5 (inclusive) copies of...
                    [ab]{2,3}     aa, ab, ba, bb, aaa, aab, abb, baa, ...
                    [abc]{5}      Exactly 5 copies of...
End of text         $             This is some text#
Beginning of text   ^             #This is some text
Word boundary       \b            #This# #is# #some# #text#
Anything            .             Any one character
                    .*            Any number of characters

(# marks the positions where the piece matches.)

All UVA computing IDs
2-3 letters, number, 1-3 letters
[a-z]{2,3}[2-9][a-z]{1,3}
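A sketch of the counting, anchoring, and wildcard pieces, using the slide's sample sentence. The short candidate strings are made up for illustration.

```python
import re

text = "This is some text"

# {m,n}: between m and n copies (inclusive) of the piece before it.
two_or_three = [s for s in ["a", "ab", "bab", "abab"] if re.fullmatch(r"[ab]{2,3}", s)]

# ^ and $ tie a match to the beginning and the end of the text.
starts_with_this = re.search(r"^This", text) is not None
starts_with_is = re.search(r"^is", text) is not None
ends_with_text = re.search(r"text$", text) is not None

# \b is a word boundary: this finds the word "is" but not the "is" inside "This".
whole_word_is = [m.start() for m in re.finditer(r"\bis\b", text)]

# . is any one character (other than a newline, by default); .* is any number of them.
between = re.search(r"T.*s", text).group()

print(two_or_three, starts_with_this, starts_with_is, ends_with_text)
print(whole_word_is, repr(between))
```

Because .* is greedy, the last search runs from the T all the way to the final s in the sentence.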

Give an expression to match all UVA computing IDs
2-3 letters, number, 1-3 letters
[a-z][a-z][a-z]?[2-9][a-z][a-z]?[a-z]?
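The spelled-out expression should accept the same strings as the count-range form [a-z]{2,3}[2-9][a-z]{1,3}. A quick sketch that checks they agree on a handful of made-up candidate strings:

```python
import re

spelled_out = re.compile(r"[a-z][a-z][a-z]?[2-9][a-z][a-z]?[a-z]?")
counted = re.compile(r"[a-z]{2,3}[2-9][a-z]{1,3}")

candidates = ["ab2c", "abc9xyz", "a2b", "abcd3e", "ab1c", "ab3", "AB3c"]

# Record, for each candidate, whether each form matches the whole string.
verdicts = {
    c: (spelled_out.fullmatch(c) is not None, counted.fullmatch(c) is not None)
    for c in candidates
}
forms_agree = all(a == b for a, b in verdicts.values())

print(verdicts)
print(forms_agree)
```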

What does a for loop look like?
for [variable] in [collection]:
Variable: [a-zA-Z]+
Collection (for example): [0, 1, 5, 9]
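A rough sketch of turning that shape into a regex. The slide only gives [a-zA-Z]+ for the variable; using .+ for the collection is an assumption made here for illustration, and real Python variable names allow more than letters.

```python
import re

# for <variable> in <collection>:
# Group 1 is the variable, group 2 is whatever sits between "in " and the colon.
for_header = re.compile(r"for ([a-zA-Z]+) in (.+):")

m = for_header.fullmatch("for x in [0, 1, 5, 9]:")
variable, collection = m.group(1), m.group(2)

bad_variable = for_header.fullmatch("for 7 in things:")

print(variable, collection, bad_variable)
```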

import re
finder = re.compile(pattern)
Use the finder:
search: similar to string.find(), gives just the first matching instance (as a match object)
finditer: gives a collection of match objects
findall: a list containing, for each match:
    0 parentheses: m.group()
    1 paren: m.group(1)
    2+ parens: m.groups()
Match object:
    group: the text we matched on
    start
    end
    groups
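A sketch of the calls listed on this slide, using the computing ID pattern with parentheses added around its three parts. The sample sentence is made up for illustration.

```python
import re

text = "email mst3k or njb2b today"

# Three pairs of parentheses, so each match has three groups.
finder = re.compile(r"([a-z]{2,3})([2-9])([a-z]{1,3})")

# search: only the first match, returned as a match object (or None).
first = finder.search(text)
first_text = first.group()                 # the text we matched on
first_span = (first.start(), first.end())  # where it starts and ends
first_parts = first.groups()               # one entry per pair of parentheses

# finditer: one match object per match.
all_texts = [m.group() for m in finder.finditer(text)]

# findall: what each list item looks like depends on the number of groups.
no_groups = re.findall(r"[a-z]{2,3}[2-9][a-z]{1,3}", text)    # like m.group()
one_group = re.findall(r"[a-z]{2,3}([2-9])[a-z]{1,3}", text)  # like m.group(1)
many_groups = finder.findall(text)                            # like m.groups()

print(first_text, first_span, first_parts)
print(all_texts, no_groups, one_group, many_groups)
```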

Writing a regex
Write down some examples of strings you want to match, and some examples of similar strings that you don't want to match.
Want to match: njb2b, mst3k, aaa8bbb, aa4aa
Don't want to match: a2b, njb2, 7bb
Going left-to-right through your examples, try to come up with the rules that will match/not match on the correct strings.
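The example lists can double as a test for each attempt. In this sketch the helper check and the first_try pattern are hypothetical, added only to show the process.

```python
import re

want = ["njb2b", "mst3k", "aaa8bbb", "aa4aa"]
dont_want = ["a2b", "njb2", "7bb"]


def check(pattern):
    """Return (wanted strings the pattern misses, unwanted strings it accepts)."""
    regex = re.compile(pattern)
    missed = [s for s in want if not regex.fullmatch(s)]
    wrongly_matched = [s for s in dont_want if regex.fullmatch(s)]
    return missed, wrongly_matched


# A first attempt: some letters, a digit, some letters.
first_try = check(r"[a-z]+[0-9][a-z]+")

# Refined by working left to right: 2-3 letters, a digit 2-9, 1-3 letters.
refined = check(r"[a-z]{2,3}[2-9][a-z]{1,3}")

print(first_try)
print(refined)
```

The first attempt wrongly accepts a2b, which points at the missing rule: at least two letters before the digit.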

Regex for phone numbers
Shape (3n = three digits, 4n = four digits): ((3n) ?|3n-)? 3n- 4n
Area = (\([0-9]{3}\) ?|[0-9]{3}-)?
Office = [0-9]{3}-
Rest = [0-9]{4}
Want to match: 555-1234, 434-555-1234, (434) 555-1234
Don't want to match: 555-123, 5551234, 5555-1234, 111-1234, 123-234-5678
Area and office codes cannot start with 0 or 1, so use [2-9] for their first digit. Also handle parentheses:
[2-9][0-9]{2}\-([2-9][0-9]{2}\-)?[0-9]{4}|\([2-9][0-9]{2}\) ?[2-9][0-9]{2}\-[0-9]{4}
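A sketch that runs both versions against the slide's example lists. The draft pattern is the Area, Office, and Rest pieces joined together; the final pattern is written with the closing parenthesis escaped as \) and plain - in place of \- (the two are equivalent here).

```python
import re

want = ["555-1234", "434-555-1234", "(434) 555-1234"]
dont_want = ["555-123", "5551234", "5555-1234", "111-1234", "123-234-5678"]

# Draft: optional area code, then three digits, a dash, and four digits.
draft = re.compile(r"(\([0-9]{3}\) ?|[0-9]{3}-)?[0-9]{3}-[0-9]{4}")

# Final: area and office codes must start with 2-9.
final = re.compile(
    r"[2-9][0-9]{2}-([2-9][0-9]{2}-)?[0-9]{4}"
    r"|\([2-9][0-9]{2}\) ?[2-9][0-9]{2}-[0-9]{4}"
)

draft_false_hits = [s for s in dont_want if draft.fullmatch(s)]
final_false_hits = [s for s in dont_want if final.fullmatch(s)]
final_misses = [s for s in want if not final.fullmatch(s)]

print(draft_false_hits)
print(final_false_hits, final_misses)
```

The draft accepts 111-1234 and 123-234-5678, which is what the [2-9] refinement fixes.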