CS 4705 Lecture 2 Regular Expressions and Automata.

Slides:



Advertisements
Similar presentations
4b Lexical analysis Finite Automata
Advertisements

CS Morphological Parsing CS Parsing Taking a surface input and analyzing its components and underlying structure Morphological parsing:
1 Regular Expressions and Automata September Lecture #2.
Theory Of Automata By Dr. MM Alam
8/25/2009 Sofya Raskhodnikova Intro to Theory of Computation L ECTURE 1 Theory of Computation Course information Overview of the area Finite Automata Sofya.
1 1 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 3 School of Innovation, Design and Engineering Mälardalen University 2012.
CS 4705 Regular Expressions and Automata in Natural Language Analysis CS 4705 Julia Hirschberg.
Theory Of Automata By Dr. MM Alam
Finite-state automata 2 Day 13 LING Computational Linguistics Harry Howard Tulane University.
January 5, 2015CS21 Lecture 11 CS21 Decidability and Tractability Lecture 1 January 5, 2015.
Regular Expressions (RE) Used for specifying text search strings. Standarized and used widely (UNIX: vi, perl, grep. Microsoft Word and other text editors…)
LING 438/538 Computational Linguistics Sandiway Fong Lecture 8: 9/29.
1 Regular Expressions and Automata September Lecture #2-2.
Finite-State Automata Shallow Processing Techniques for NLP Ling570 October 5, 2011.
CS5371 Theory of Computation
CS 4705 Some slides adapted from Hirschberg, Dorr/Monz, Jurafsky.
Finite Automata Finite-state machine with no output. FA consists of States, Transitions between states FA is a 5-tuple Example! A string x is recognized.
Lecture 3 Goals: Formal definition of NFA, acceptance of a string by an NFA, computation tree associated with a string. Algorithm to convert an NFA to.
CS 4705 Lecture 2 Regular Expressions and Automata in Language Analysis.
Computational language: week 9 Finish finite state machines FSA’s for modelling word structure Declarative language models knowledge representation and.
Computational Language Finite State Machines and Regular Expressions.
January 14, 2015CS21 Lecture 51 CS21 Decidability and Tractability Lecture 5 January 14, 2015.
CS 4705 Regular Expressions and Automata in Natural Language Analysis CS 4705.
CS 4705 Regular Expressions and Automata in Natural Language Analysis CS 4705 Julia Hirschberg.
CS 4705 Some slides adapted from Hirschberg, Dorr/Monz, Jurafsky.
Topics Automata Theory Grammars and Languages Complexities
Regular Expressions and Automata Chapter 2. Regular Expressions Standard notation for characterizing text sequences Used in all kinds of text processing.
Grammars, Languages and Finite-state automata Languages are described by grammars We need an algorithm that takes as input grammar sentence And gives a.
CMSC 723: Intro to Computational Linguistics Lecture 2: February 4, 2004 Regular Expressions and Finite State Automata Professor Bonnie J. Dorr Dr. Nizar.
1 Non-Deterministic Finite Automata. 2 Alphabet = Nondeterministic Finite Automaton (NFA)
Morphological Recognition We take each sub-lexicon of each stem class and we expand each arc (e.g. the reg-noun arc) with all the morphemes that make up.
Chapter 2. Regular Expressions and Automata From: Chapter 2 of An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition,
REGULAR LANGUAGES.
1 Unit 1: Automata Theory and Formal Languages Readings 1, 2.2, 2.3.
March 1, 2009 Dr. Muhammed Al-mulhem 1 ICS 482 Natural Language Processing Regular Expression and Finite Automata Muhammed Al-Mulhem March 1, 2009.
October 2004CSA3050 NL Algorithms1 CSA3050: Natural Language Algorithms Words, Strings and Regular Expressions Finite State Automota.
Natural Language Processing Lecture 6 : Revision.
LING/C SC/PSYC 438/538 Lecture 7 9/15 Sandiway Fong.
1 Computability Five lectures. Slides available from my web page There is some formality, but it is gentle,
4b 4b Lexical analysis Finite Automata. Finite Automata (FA) FA also called Finite State Machine (FSM) –Abstract model of a computing entity. –Decides.
Natural Language Processing Lecture 2—1/15/2015 Susan W. Brown.
1 Regular Expressions and Automata CPE 641 Natural Language Processing from Kathy McCoy’s slides, CISC 882 Introduction to NLP
Lexical Analysis: Finite Automata CS 471 September 5, 2007.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2010.
2. Regular Expressions and Automata 2007 년 3 월 31 일 인공지능 연구실 이경택 Text: Speech and Language Processing Page.33 ~ 56.
LING 388: Language and Computers Sandiway Fong 9/27 Lecture 10.
CSA3050: Natural Language Algorithms Finite State Devices.
Recognising Languages We will tackle the problem of defining languages by considering how we could recognise them. Problem: Is there a method of recognising.
1 Regular Expressions and Automata August Lecture #2.
CS 4705 Some slides adapted from Hirschberg, Dorr/Monz, Jurafsky.
Natural Language Processing Lecture 4 : Regular Expressions and Automata.
Donghyun (David) Kim Department of Mathematics and Physics North Carolina Central University 1 Chapter 1 Regular Languages Some slides are in courtesy.
UNIT - I Formal Language and Regular Expressions: Languages Definition regular expressions Regular sets identity rules. Finite Automata: DFA NFA NFA with.
using Deterministic Finite Automata & Nondeterministic Finite Automata
BİL711 Natural Language Processing1 Regular Expressions & FSAs Any regular expression can be realized as a finite state automaton (FSA) There are two kinds.
Conversions Regular Expression to FA FA to Regular Expression.
Deterministic Finite Automata Nondeterministic Finite Automata.
Theory of Computation Automata Theory Dr. Ayman Srour.
WELCOME TO A JOURNEY TO CS419 Dr. Hussien Sharaf Dr. Mohammad Nassef Department of Computer Science, Faculty of Computers and Information, Cairo University.
Nondeterminism The Chinese University of Hong Kong Fall 2011
Lexical analysis Finite Automata
Compilers Welcome to a journey to CS419 Lecture5: Lexical Analysis:
Chapter 2 FINITE AUTOMATA.
CSC NLP - Regex, Finite State Automata
Finite Automata.
4b Lexical analysis Finite Automata
Regular Expressions and Automata in Language Analysis
4b Lexical analysis Finite Automata
CPSC 503 Computational Linguistics
Statistical NLP Winter 2009
Presentation transcript:

CS 4705 Lecture 2 Regular Expressions and Automata

Representations and Algorithms for NLP Representations: formal models used to capture linguistic knowledge Algorithms manipulate representations to analyze or generate linguistic phenomena Simplest often produce best performance but….the 80/20 Rule and “low-hanging fruit”

NLP Representations State Machines –FSAs, FSTs, HMMs, ATNs, RTNs Rule Systems –CFGs, Unification Grammars, Probabilistic CFGs Logic-based Formalisms –1 st Order Predicate Calculus, Temporal and other Higher Order Logics Models of Uncertainty –Bayesian Probability Theory

NLP Algorithms Most are parsers or transducers: accept or reject input, and construct new structure from input –State space search Pair a partial structure with a part of the inputpartial structure Spaces too big and ‘best’ is hard to define –Dynamic programming Avoid recomputing structures that are common to multiple solutions

DetNom NP thecat The cat is on the mat

Today Review some of the simple representations and ask ourselves how we might use them to do interesting and useful things –Regular Expressions –Finite State Automata

Uses of Regular Expressions in NLP As grep, perl: Simple but powerful tools for large corpus analysis and ‘shallow’ processing –What word is most likely to begin a sentence? –What word is most likely to begin a question? –In your own , are you more or less polite than the people you correspond with? With other unix tools, allow us to –Obtain word frequency and co-occurrence statistics –Build simple interactive applications (e.g. Eliza) Regular expressions define regular languages or sets

Some Examples

REDescriptionUses? /a*/Zero or more a’sOptional doubled modifiers (words) /a+/One or more a’sNon-optional... /a?/Zero or one a’sOptional... /cat|dog/‘cat’ or ‘dog’Words modifying pets /^cat$/A line containing only ‘cat’ ?? /\bun\B/Beginnings of longer strings Words prefixed by ‘un’

happier and happier, fuzzier and fuzzier / (.+)ier and \1ier / Morphological variants of ‘puppy’/pupp(y|ies)/ E.G.RE

Substitutions (Transductions) Sed or ‘s’ operator in Perl –s/regexp1/pattern/ –s/I am feeling (.++)/You are feeling \1?/ –s/I gave (.+) to (.+)/Why would you give \2 \1?/

Examples Predictions from a news corpus: –Which candidate for Governor is mentioned most often in the news? Is going to win? –What stock should you buy? –Which White House advisers have the most power? Language use: –Which form of comparative is more frequent: ‘oftener’ or ‘more often’? –Which pronouns are conjoined most often? –How often do sentences end with infinitival ‘to’? –What words most often begin and end sentences? –What’s the most common word in your ? Is it different from your neighbor?

Personality profiling: –Are you more or less polite than the people you correspond with? –With labeled data, which words signal friendly msgs vs. unfriendly ones?

Finite State Automata FSAs recognize the regular languages represented by regular expressions –SheepTalk: /baa+!/ q0 q4 q1q2q3 ba a a! Directed graph with labeled nodes and arc transitions Five states: q0 the start state, q4 the final state, 5 transitions

Formally FSA is a 5-tuple consisting of –Q: set of states {q0,q1,q2,q3,q4} –  : an alphabet of symbols {a,b,!} –q0: a start state –F: a set of final states in Q {q4} –  (q,i): a transition function mapping Q x  to Q q0 q4 q1q2q3 ba a a!

FSA recognizes (accepts) strings of a regular language –baa! –baaa! –… Tape metaphor: a rejected input aba!b

State Transition Table for SheepTalkSheepTalk State Input ba!

Non-Deterministic FSAs for SheepTalk q0 q4 q1q2q3 ba a a! q0 q4 q1q2q3 baa! 

FSAs as Grammars for Natural Language q2q4q5q0q3q1q6 therev mr dr hon patl.robinson ms mrs  Can you use a regexpr to capture this too?

Problems of Non-Determinism ‘Natural’….but at any choice point, we may follow the wrong arc Potential solutions: –Save backup states at each choice point –Look-ahead in the input before making choice –Pursue alternatives in parallel –Determinize our NFSAs (and then minimize) FSAs can be useful tools for recognizing – and generating – subsets of natural language –But they cannot represent all NL phenomena (The mouse the cat chased died.)

Summing Up Regular expressions and FSAs can represent subsets of natural language as well as regular languages –Both representations may be impossible for humans to understand for any real subset of a language –But they are very easy to use for smaller subsets Next time: Read Ch 3 (1-2,5) For fun: –Think of ways you might characterize your using only regular expressions –Check over Homework 1Homework 1