CSE467/567 Computational Linguistics Carl Alphonce Computer Science & Engineering University at Buffalo.

Presentation transcript:

CSE467/567 Computational Linguistics Carl Alphonce Computer Science & Engineering University at Buffalo

Fall 2006 CSE 467/567 2 Levels of processing: phonetics/phonology – sounds; morphology – word structure; syntax – sentence structure; semantics – meaning; pragmatics – goals of language use; discourse – utterances in context

Fall 2006 CSE 467/567 3 Words: the building blocks of sentences

Fall 2006 CSE 467/567 4 Words have internal structure: readable = read + able; readability = read + able + ity. The structure of words can be described using a regular grammar.

Fall 2006 CSE 467/567 5 Chomsky hierarchy: regular languages, context-free languages, context-sensitive languages, unrestricted languages

Fall 2006 CSE 467/567 6 Problem: I often need to find an email, but I have thousands of emails in my various folders. Suppose I want to find an email about geese. The email may mention “geese” or “goose”; also, if the word appears at the start of a sentence, its initial letter will be capitalized. We need to match “goose”, “geese”, “Goose” or “Geese”.

Fall 2006 CSE 467/567 7 Regular expressions (in Perl): “a regular expression is an algebraic notation for characterizing a set of strings” [p. 22]. Regular expressions are commonly used to specify search strings. For example, the UNIX utility program grep lets the user specify a pattern to search for in files.

Fall 2006 CSE 467/567 8 Sequences of characters: a sequence of characters is matched by writing it between slashes, /…/. Examples: /a/ matches the character ‘a’; /fred/ matches the string ‘fred’. Note: /fred/ does not match the string ‘Fred’! In other words, patterns are case-sensitive.
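The case-sensitivity point can be checked directly. The slides use Perl; the sketch below assumes Python's standard `re` module as a stand-in, since it accepts the same syntax for this feature. The sample strings are made up for illustration.

```python
import re

# re.search scans the string for the pattern, roughly like Perl's =~ /fred/.
pattern = re.compile(r"fred")

lower_hit = pattern.search("fred is home") is not None
upper_hit = pattern.search("Fred is home") is not None

print(lower_hit, upper_hit)
```

Only the lowercase string should match, because the pattern contains a lowercase ‘f’.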

Fall 2006 CSE 467/567 9 Character disjunction (character classes): Square brackets are used to indicate disjunction of characters. Examples: /[Ff]/ matches either ‘f’ or ‘F’; /[Ff]red/ matches either ‘fred’ or ‘Fred’. This form of disjunction applies only at the character level. A set of characters in square brackets is sometimes referred to as a character class.
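A small check of the character class, again assuming Python's `re` module in place of Perl (the word list is hypothetical):

```python
import re

fred = re.compile(r"[Ff]red")

# Keep only the candidates in which the pattern is found.
candidates = ["fred", "Fred", "FRED"]
matches = [s for s in candidates if fred.search(s)]

print(matches)
```

‘FRED’ is rejected: the class covers only the first letter, and the remaining ‘red’ is still case-sensitive.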

Fall 2006 CSE 467/567 Ranges: Sometimes it is useful to specify “any digit” or “any letter”. “Any digit” can be written as /[0123456789]/, since any of the ten digits satisfies the pattern. An alternative is to use a special range notation: /[0-9]/. Any letter can be specified as /[A-Za-z]/. Range notation does not extend the power of regular expressions, but gives us a convenient way to express them.
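To illustrate that range notation is only a shorthand, the sketch below compares the spelled-out digit class with the range form on a few made-up strings. Python's `re` module is assumed here as a stand-in for Perl.

```python
import re

explicit = re.compile(r"[0123456789]")  # every digit listed by hand
ranged = re.compile(r"[0-9]")           # the same set, written as a range

samples = ["room 42", "no digits", "7"]

# Both patterns should agree on every sample.
same = all(bool(explicit.search(s)) == bool(ranged.search(s)) for s in samples)

# findall returns each matching character in order.
letters = re.findall(r"[A-Za-z]", "a1B2")

print(same, letters)
```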

Fall 2006 CSE 467/567 Complementing character classes: To search for a character that is not in a character class, place the caret (^) as the first character inside the square brackets. Examples: /[^a]/ matches any single character except ‘a’; /[^0-9]/ matches any single character except a digit.
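A quick sketch of complemented classes, assuming Python's `re` module (same syntax as Perl for this feature); the input strings are invented for the example.

```python
import re

# Collect every character that is NOT a digit.
non_digits = re.findall(r"[^0-9]", "a1b22c")

# A string made only of 'a' contains no character outside the class [a].
not_a = re.search(r"[^a]", "aaa")

print(non_digits, not_a)
```

Note that a complemented class still has to consume one character, so it finds nothing in ‘aaa’.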

Fall 2006 CSE 467/567 Matching 0 or 1 occurrence: The ‘?’ matches zero or one occurrence of the preceding expression. Examples: /a?/ matches ‘a’ or ‘’ (nothing); /cats?/ matches ‘cat’ or ‘cats’. Note that the “preceding expression”, in these examples, is a single letter. We’ll see how to form longer expressions later.
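The optional ‘s’ can be tried out as follows. This is a sketch using Python's `re` module rather than Perl, with a made-up sentence.

```python
import re

cats = re.compile(r"cats?")

# findall lists every non-overlapping match, left to right.
found = cats.findall("one cat, two cats")

# fullmatch requires the pattern to cover the entire string;
# /a?/ accepts the empty string because zero occurrences are allowed.
empty_ok = re.fullmatch(r"a?", "") is not None

print(found, empty_ok)
```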

Fall 2006 CSE 467/567 The Kleene star and plus: The Kleene star (*) matches zero or more occurrences of the preceding expression. Examples: /a*/ matches ‘’, ‘a’, ‘aa’, ‘aaa’, etc.; /[ab]*/ matches ‘’, ‘a’, ‘b’, ‘aa’, ‘ab’, ‘ba’, ‘bb’, etc. The plus (+) matches one or more occurrences. Strictly speaking, + is not necessary: /[ab]+/ is equivalent to /[ab][ab]*/.
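The equivalence between + and the star form can be checked on a handful of strings. The sketch assumes Python's `re` module as a substitute for Perl and uses `fullmatch` so that each pattern must account for the whole string.

```python
import re

star = re.compile(r"[ab]*")
plus = re.compile(r"[ab]+")
expanded = re.compile(r"[ab][ab]*")  # + rewritten using only *

samples = ["", "a", "ab", "bba", "abc"]

star_ok = [s for s in samples if star.fullmatch(s)]
plus_ok = [s for s in samples if plus.fullmatch(s)]
expanded_ok = [s for s in samples if expanded.fullmatch(s)]

print(star_ok, plus_ok, expanded_ok)
```

The star accepts the empty string while the plus does not, and the plus and its expansion accept the same strings.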

Fall 2006 CSE 467/567 Wildcard: The period (.) matches any single character except the newline (\n).
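A short sketch of the wildcard, assuming Python's `re` module, whose default behaviour for ‘.’ matches the description above. The test string is invented and deliberately contains a newline between ‘b’ and ‘g’.

```python
import re

# The third candidate, "b\ng", has a newline where the wildcard would go.
text = "bag big b\ng bug"

hits = re.findall(r"b.g", text)

print(hits)
```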

Fall 2006 CSE 467/567 Anchors: Anchors are used to restrict a match to a particular position within a string. ^ anchors to the start of a string; $ anchors to the end of a string. /[Ff]red/ matches both ‘Fred’ and ‘Fred is home’; /^[Ff]red$/ matches ‘Fred’ but not ‘Fred is home’. \b anchors to a word boundary; \B anchors to a non-boundary.
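The effect of each anchor can be seen side by side. This is a sketch in Python's `re` module (assumed here in place of Perl); the `\bred\b` and `\Bred` patterns are extra illustrations not taken from the slide.

```python
import re

loose = re.compile(r"[Ff]red")       # may match anywhere in the string
strict = re.compile(r"^[Ff]red$")    # must span the whole string
whole_word = re.compile(r"\bred\b")  # 'red' standing as its own word

names = ["Fred", "Fred is home"]
loose_hits = [s for s in names if loose.search(s)]
strict_hits = [s for s in names if strict.search(s)]

phrases = ["a red hat", "Fred"]
word_hits = [s for s in phrases if whole_word.search(s)]
# \B requires that 'red' does NOT start at a word boundary.
inside_hits = [s for s in phrases if re.search(r"\Bred", s)]

print(loose_hits, strict_hits, word_hits, inside_hits)
```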

Fall 2006 CSE 467/567 Conjunction (concatenation): Two regular expressions are conjoined by juxtaposition (placing the expressions side by side). Examples: /a/ matches ‘a’; /m/ matches ‘m’; /am/ matches ‘am’ but not ‘a’ or ‘m’ alone.

Fall 2006 CSE 467/567 Disjunction: We have already seen disjunction of characters using the square bracket notation. General disjunction is expressed using the vertical bar (|), also called the pipe symbol. This form of disjunction allows us to match any one of the alternative patterns, not just single characters as with the [ ] form.
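As an illustration of general disjunction, the sketch below lists whole alternative words separated by the pipe. It assumes Python's `re` module in place of Perl, and the sentence is made up.

```python
import re

# Each side of | is a complete alternative pattern.
birds = re.compile(r"goose|geese")

found = birds.findall("a goose, two geese, a moose")

print(found)
```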

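Fall 2006 CSE 467/567 Grouping: Parentheses, ‘(’ and ‘)’, are used to group subpatterns of a larger pattern. Ex: /[Gg](ee|oo)se/ matches ‘goose’, ‘geese’, ‘Goose’ and ‘Geese’. The alternation must sit inside a single group: | has lower precedence than juxtaposition, so /[Gg](ee)|(oo)se/ would instead match either ‘Gee’/‘gee’ or ‘oose’.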
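Because | binds more loosely than juxtaposition, where the parentheses sit changes what is matched. The sketch below contrasts two placements on the goose/geese problem, assuming Python's `re` module as a stand-in for Perl; the anchors and the word list are additions for the test.

```python
import re

# Alternation confined to the vowels: [Gg], then 'ee' or 'oo', then 'se'.
grouped = re.compile(r"^[Gg](ee|oo)se$")

# Alternation at top level: either '[Gg]ee' OR 'oose'.
top_level = re.compile(r"[Gg](ee)|(oo)se")

words = ["goose", "geese", "Goose", "Geese", "gse", "moose"]
accepted = [w for w in words if grouped.search(w)]

# The top-level form finds 'oose' inside 'moose', which is not intended.
moose_hit = top_level.search("moose")
moose_text = moose_hit.group(0) if moose_hit else None

print(accepted, moose_text)
```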

Fall 2006 CSE 467/567 Replacement: In addition to matching, we can do replacements when a match is found. Example: to replace the British spelling ‘colour’ with the American spelling ‘color’, we can write: s/colour/color/
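A rough Python counterpart of the substitution, using `re.sub` from the standard library (Python is an assumption here; the slide shows Perl). One difference worth knowing: Perl's s/colour/color/ without the g modifier replaces only the first match, while `re.sub` replaces every match unless `count` is given. The sample text is invented.

```python
import re

text = "The colour of the flag; colours fade."

# Replace every occurrence (like Perl's s/colour/color/g).
american = re.sub(r"colour", "color", text)

# Replace only the first occurrence (like Perl's s/colour/color/).
first_only = re.sub(r"colour", "color", text, count=1)

print(american)
print(first_only)
```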

Fall 2006 CSE 467/567 Registers – saving matches
To save the match from part of a pattern for reuse later on, Perl provides registers. Registers are named \#, where # is the number of the register.
Ex.
DE DO DO DO DE DA DA DA
IS ALL I WANT TO SAY TO YOU
/(D[AEO].)*/ will match the first line.
/(D[AEO])(.D[AEO])\2\2\s\1(.D[AEO])\3\3/ matches it more specifically. This pattern also matches strings like DA DE DE DE DA DO DO DO.
\s matches a whitespace character.
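The register pattern can be exercised as below. This sketch assumes Python's `re` module, where \1, \2 and \3 inside a pattern behave like the Perl registers described above. The pattern is written without spaces between its pieces, since a literal space in the pattern would itself have to match a space; the third test string is made up to show a rejection.

```python
import re

# \1 = first syllable, \2 = separator plus second syllable,
# \3 = separator plus third syllable.
song = re.compile(r"(D[AEO])(.D[AEO])\2\2\s\1(.D[AEO])\3\3")

original = song.fullmatch("DE DO DO DO DE DA DA DA")
variant = song.fullmatch("DA DE DE DE DA DO DO DO")
broken = song.fullmatch("DE DO DA DO DE DA DA DA")  # repeats do not agree

# groups() shows what each register captured.
captured = original.groups() if original else None

print(captured, variant is not None, broken is None)
```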

Fall 2006 CSE 467/567 For more information: PERL Regular Expression TUTorial – PERL Regular Expression reference page –