Regular expressions Day 2


Regular expressions Day 2 LING 681.02 Computational Linguistics Harry Howard Tulane University

LING 681.02, Prof. Howard, Tulane University, 24-Aug-2009: Course organization

Regular expressions SLP 2.1

Questions
What is a string?
    A sequence of symbols. In text, a sequence of alphanumeric characters.
What is a regular expression (RE or regex)?
    An expression in a language for specifying text search strings. A search requires a pattern to search for and a corpus to search through.
What is an algebra?
    A set of elements and a group of operations defined for them, e.g. the set of real numbers and the operations +, –, *, and /.
What is a false positive?
    A string that is incorrectly matched. False positives decrease accuracy.
What is a false negative?
    A string that is incorrectly excluded. False negatives decrease coverage.
What is precedence?
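To make the false positive / false negative distinction concrete, here is a minimal sketch in Python. The word list and the two candidate patterns are assumptions chosen for illustration; they are not taken from the slides.

```python
import re

# Hypothetical mini-corpus (illustrative only).
words = ["the", "The", "other", "there", "then", "cat"]
wanted = {"the", "The"}  # the strings we actually want to find


def hits(pattern):
    """Return the set of words in which the pattern is found."""
    return {w for w in words if re.search(pattern, w)}


naive = hits(r"the")           # too loose, and case-sensitive
better = hits(r"\b[tT]he\b")   # whole word, either case

false_positives = naive - wanted   # matched, but should not have been
false_negatives = wanted - naive   # should have matched, but did not

print(sorted(false_positives))  # words that hurt accuracy
print(sorted(false_negatives))  # words that hurt coverage
```

The naive pattern errs in both directions at once, which is why tightening a pattern (fewer false positives) and loosening it (fewer false negatives) usually have to be balanced against each other.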

Notation in Perl
*      0 or more occurrences of the previous character or RE
+      1 or more occurrences of the previous character or RE
-      the two ends of a range (inside brackets, e.g. [a-z])
^      not (negation, inside brackets) or beginning of line; "caret"
?      the previous character or RE is optional
.      any character (except a newline)
|      either … or; "pipe"
()     grouping, or put in a register
{n}    n occurrences of the previous character or RE
\b     word boundary
\s     white space (\w, by contrast, matches a word character: a letter, digit, or underscore)
$      end of line
\1     matches again the string that the RE in register 1 matched
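The operators can be tried out one at a time. The sketch below uses Python's re module, which follows Perl's notation for these operators; the sample strings are assumptions made up for illustration.

```python
import re

# Each entry: (pattern, sample text, expected result of re.findall).
examples = [
    (r"ab*", "a ab abbb", ["a", "ab", "abbb"]),         # *   zero or more b
    (r"ab+", "a ab abbb", ["ab", "abbb"]),              # +   one or more b
    (r"[a-c]", "abcd", ["a", "b", "c"]),                # -   range inside brackets
    (r"[^a-c]", "abcd", ["d"]),                         # ^   negation inside brackets
    (r"^\d+", "42 is 42", ["42"]),                      # ^   beginning of line
    (r"colou?r", "color colour", ["color", "colour"]),  # ?   optional u
    (r"b.g", "bag big b g", ["bag", "big", "b g"]),     # .   any character
    (r"cat|dog", "cat dog cow", ["cat", "dog"]),        # |   either ... or
    (r"a{3}", "aa aaa", ["aaa"]),                       # {n} exactly n occurrences
    (r"\bthe\b", "the other the", ["the", "the"]),      # \b  word boundary
    (r"\s", "a b\tc", [" ", "\t"]),                     # \s  white space
    (r"\w+$", "last word", ["word"]),                   # \w  word character, $ end of line
    (r"(\w+) \1", "the the bug", ["the"]),              # \1  repeat of group 1
]

for pattern, text, expected in examples:
    result = re.findall(pattern, text)
    print(f"{pattern!r:12} on {text!r:16} -> {result}")
```

Note that when a pattern contains a group, as in the last entry, re.findall returns what the group captured rather than the whole match.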

Exercise 2.1: REs
The set of all alphabetic strings.
    [a-zA-Z][a-zA-Z]*
    or, equivalently, [a-zA-Z]+
The set of all lower case alphabetic strings ending in a b.
    [a-z]*b
The set of all strings with two consecutive repeated words (e.g., “Humbert Humbert” and “the the” but not “the bug” or “the big bug”).
    ([a-zA-Z]+)\s+\1
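These answers can be checked in Python. In the sketch below, the first two patterns are tested with fullmatch because they describe whole strings, and the repeated-word pattern is tested with search. The variant named repeated_strict is an assumption of mine, not part of the slide: it adds \b anchors so that a word followed by a longer word with the same prefix is not counted as a repetition.

```python
import re

alphabetic = re.compile(r"[a-zA-Z]+")
ends_in_b = re.compile(r"[a-z]*b")
repeated = re.compile(r"([a-zA-Z]+)\s+\1")

# Assumption (not on the slide): word boundaries around the repeated word.
repeated_strict = re.compile(r"\b([a-zA-Z]+)\s+\1\b")

print(bool(alphabetic.fullmatch("Humbert")))       # letters only
print(bool(alphabetic.fullmatch("abc1")))          # contains a digit
print(bool(ends_in_b.fullmatch("crab")))           # ends in b
print(bool(ends_in_b.fullmatch("crabs")))          # does not end in b

for phrase in ["Humbert Humbert", "the the", "the bug", "the big bug", "the theory"]:
    loose = bool(repeated.search(phrase))
    strict = bool(repeated_strict.search(phrase))
    print(f"{phrase!r:20} loose={loose} strict={strict}")
```

The last phrase is the interesting one: without boundaries, the second "the" is found inside "theory".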

Exercise 2.1: REs, cont.
The set of all strings from the alphabet a, b such that each a is immediately preceded by and immediately followed by a b.
    (b+(ab+)*)?
All strings that start at the beginning of the line with an integer and that end at the end of the line with a word.
    ^\d+\b.*\b[a-zA-Z]+$
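For the a/b exercise, two candidate patterns differ only in whether the inner group is repeated with + or with *. A brute-force comparison against the definition shows where each one stands. This is a sketch: the helper names each_a_flanked and disagreements are hypothetical, and the check only covers strings up to a small length. The same block also tries the integer-then-word pattern on a few made-up lines.

```python
import re
from itertools import product

with_plus = re.compile(r"(b+(ab+)+)?")   # inner group must occur at least once
with_star = re.compile(r"(b+(ab+)*)?")   # inner group may be absent


def each_a_flanked(s):
    """Apply the definition directly: every a has a b on both sides."""
    return all(
        0 < i < len(s) - 1 and s[i - 1] == "b" and s[i + 1] == "b"
        for i, ch in enumerate(s)
        if ch == "a"
    )


def disagreements(regex, max_len=6):
    """Strings over {a, b} where full-string matching and the definition differ."""
    found = []
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            s = "".join(letters)
            if (regex.fullmatch(s) is not None) != each_a_flanked(s):
                found.append(s)
    return found


print(disagreements(with_plus))  # strings the + version gets wrong
print(disagreements(with_star))  # strings the * version gets wrong

integer_then_word = re.compile(r"^\d+\b.*\b[a-zA-Z]+$")
for line in ["42 is the answer", "42", "42 answers.", "the 42 answer"]:
    print(f"{line!r:20} -> {bool(integer_then_word.search(line))}")
```

The + version disagrees with the definition exactly on the strings made only of b's, which contain no a and so satisfy the condition vacuously.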

Exercise 2.1: REs, cont.
All strings that have both the word grotto and the word raven in them (but not, e.g., words like grottos that merely contain the word grotto).
    \bgrotto\b.*\braven\b|\braven\b.*\bgrotto\b
Write a pattern that places the first word of an English sentence in a register. Deal with punctuation.
    ^[^a-zA-Z]*([a-zA-Z]+)
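Both patterns can be exercised in Python as follows. The test sentences are assumptions invented for illustration; group(1) is Python's way of reading what the slide calls register 1.

```python
import re

both_words = re.compile(r"\bgrotto\b.*\braven\b|\braven\b.*\bgrotto\b")
first_word = re.compile(r"^[^a-zA-Z]*([a-zA-Z]+)")

sentences = [
    "the raven flew into the grotto",   # both words, raven first
    "the grotto hid a raven",           # both words, grotto first
    "the grottos hid a raven",          # grottos is not the word grotto
    "a raven",                          # only one of the two words
]
for s in sentences:
    print(f"{s!r:35} -> {bool(both_words.search(s))}")

for s in ['"Hello," she said.', "... and then", "42!"]:
    m = first_word.match(s)
    # group(1) holds the first word; m is None if the line has no letters.
    print(f"{s!r:22} -> {m.group(1) if m else None}")
```

The alternation is needed because .* only looks to the right, so each order of the two words gets its own branch.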

Exercise 2.2: patterns

    import re
    import string

    # (pattern, replacement) pairs for an ELIZA-style exchange
    patterns = [
        (r"\b(i'm|i am)\b", "YOU ARE"),
        (r"\b(i|me)\b", "YOU"),
        (r"\b(my)\b", "YOUR"),
        (r"\b(well,?) ", ""),
        (r".* YOU ARE (depressed|sad) .*", r"I AM SORRY TO HEAR YOU ARE \1"),
        (r".* YOU ARE (depressed|sad) .*", r"WHY DO YOU THINK YOU ARE \1"),
        (r".* all .*", "IN WHAT WAY"),
        (r".* always .*", "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
        (r"[%s]" % re.escape(string.punctuation), ""),  # strip punctuation
    ]
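One way such a pattern list might be applied is sketched below. The respond function is hypothetical and rests on two assumptions that the slide does not state: the input is lower-cased first (the patterns are lower-case while the replacements are upper-case), and it is padded with spaces so that patterns written as " word " can also match at the edges of the string. Each substitution is then applied in list order with re.sub.

```python
import re
import string

patterns = [
    (r"\b(i'm|i am)\b", "YOU ARE"),
    (r"\b(i|me)\b", "YOU"),
    (r"\b(my)\b", "YOUR"),
    (r"\b(well,?) ", ""),
    (r".* YOU ARE (depressed|sad) .*", r"I AM SORRY TO HEAR YOU ARE \1"),
    (r".* YOU ARE (depressed|sad) .*", r"WHY DO YOU THINK YOU ARE \1"),
    (r".* all .*", "IN WHAT WAY"),
    (r".* always .*", "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
    (r"[%s]" % re.escape(string.punctuation), ""),
]


def respond(utterance):
    """Apply every (pattern, replacement) pair in order (hypothetical driver)."""
    # Assumption: lower-case the input and pad it with spaces.
    text = " " + utterance.lower() + " "
    for pattern, replacement in patterns:
        text = re.sub(pattern, replacement, text)
    return text.strip()


for line in ["I am sad today", "They always argue with me", "My dog barks!"]:
    print(f"{line!r:28} -> {respond(line)!r}")
```

Under this driver the second depressed|sad entry never fires, because the first one has already rewritten the string; choosing between alternative responses would need extra logic that is outside this sketch.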

NLPP

REs in Python
The re module provides Perl-type regular expression patterns; see http://www.amk.ca/python/howto/regex/
NLPP goes into REs in §3.4, p. 97ff.
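As a quick orientation, the sketch below shows four commonly used functions of the re module on a made-up sample string (the string itself is an assumption for illustration).

```python
import re

text = "LING 681.02 meets on 24-Aug-2009"

# re.search finds the first match anywhere in the string.
m = re.search(r"\d+-[A-Za-z]+-\d+", text)
date = m.group(0) if m else None

# re.findall returns every non-overlapping match as a list of strings.
numbers = re.findall(r"\d+", text)

# re.sub replaces every match with the given replacement.
masked = re.sub(r"\d", "#", text)

# re.match only succeeds if the pattern matches at the start of the string.
starts_with_ling = re.match(r"LING", text) is not None

print(date, numbers, masked, starts_with_ling, sep="\n")
```

Writing patterns as raw strings (the r prefix) keeps backslashes such as \d and \b from being interpreted by Python before the re module sees them.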

Next time
SLP Automata: §2.2-end & Ex. 2.3-end
NLPP: finish §1, do as many of the exercises as you can