Statistical NLP Winter 2009

Presentation transcript:

Statistical NLP Winter 2009
Lecture 7: Grammar formalisms: the tools of mathematical linguistics
(Weighted) finite-state automata and (weighted) context-free grammars
Roger Levy

Language structure so far
- So far in class, we haven’t dealt much with structured representations of language
  - A document consists of a sequence of sentences
  - A sentence consists of a sequence of words
- We haven’t looked at anything in between, or farther down
- But there’s lots more structure in language!
  - Words are composed of morphemes
  - Words are grouped into syntactic categories (parts of speech)
  - Words combine into phrases
- Today we’ll talk about formal means for describing and computing these structures

Regular expressions
- You’ve almost certainly worked with grep before
- grep takes a regular expression
- Regular expressions can be quite rich

Finite state automata (FSAs)
- A Finite State Automaton (FSA) is defined as:
  - A finite set Q of states q0…qN, with q0 the start state
  - A finite input alphabet Σ of symbols
  - A set of final states F ⊆ Q
  - A transition function δ(q,i) mapping from Q×Σ to Q
- An FSA accepts a string s if applying δ recursively, one symbol at a time starting from q0, leads to a final state
- Most accessibly represented in a graphical format
- Example: Q={q0,q1}, Σ={a,b}, F={q1}, δ(q0,a)=q1, δ(q1,b)=q1 (accepts a followed by any number of b’s)
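The acceptance rule can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the states, alphabet, final states and transitions are the slide's example, while the dictionary encoding and the name accepts are assumptions. Pairs missing from the transition table are treated as rejection.

```python
# States, alphabet, final states and transitions copied from the slide's example.
STATES = {"q0", "q1"}
ALPHABET = {"a", "b"}
START = "q0"
FINAL = {"q1"}
# The table is partial: any (state, symbol) pair not listed leads to rejection.
DELTA = {("q0", "a"): "q1", ("q1", "b"): "q1"}


def accepts(string):
    """Return True if following DELTA from START ends in a final state."""
    state = START
    for symbol in string:
        if (state, symbol) not in DELTA:
            return False  # no transition defined, so the string is rejected
        state = DELTA[(state, symbol)]
    return state in FINAL
```

With this table, accepts("abb") holds, while accepts("") and accepts("ba") do not.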

Regular expressions and FSAs
- For every regular expression R, there is an FSA that accepts exactly the strings matched by R, and vice versa
- Example: ([sp]en(d|ds|ding|t))|(ship(s|ped|ping)?)
- However, in general there are many FSAs and regexes accepting the same set of strings

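Intersection
- FSAs are closed under intersection: the intersection of two FSAs is itself an FSA
[Figure: two FSAs combined (+) into their intersection (=)]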
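One standard way to build the intersection is the product construction, which runs both automata in lockstep and accepts only when both accept. The sketch below assumes deterministic FSAs stored as dictionaries. The first machine is the a b* example used earlier in the lecture, and the second (strings of even length) is a hypothetical machine added for illustration.

```python
def intersect(fsa1, fsa2):
    """Product construction: states are pairs, and both machines move together."""
    start = (fsa1["start"], fsa2["start"])
    delta, frontier, seen = {}, [start], {start}
    while frontier:
        p, q = frontier.pop()
        for (state, symbol), target1 in fsa1["delta"].items():
            if state != p or (q, symbol) not in fsa2["delta"]:
                continue  # both machines must have a move on this symbol
            target = (target1, fsa2["delta"][(q, symbol)])
            delta[((p, q), symbol)] = target
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    final = {(p, q) for (p, q) in seen
             if p in fsa1["final"] and q in fsa2["final"]}
    return {"start": start, "final": final, "delta": delta}


def accepts(fsa, string):
    """Follow the transitions; reject on a missing transition."""
    state = fsa["start"]
    for symbol in string:
        if (state, symbol) not in fsa["delta"]:
            return False
        state = fsa["delta"][(state, symbol)]
    return state in fsa["final"]


# a followed by any number of b's.
A_THEN_BS = {"start": "q0", "final": {"q1"},
             "delta": {("q0", "a"): "q1", ("q1", "b"): "q1"}}
# Hypothetical second machine: strings of even length over {a, b}.
EVEN_LENGTH = {"start": "even", "final": {"even"},
               "delta": {("even", "a"): "odd", ("even", "b"): "odd",
                         ("odd", "a"): "even", ("odd", "b"): "even"}}
BOTH = intersect(A_THEN_BS, EVEN_LENGTH)
```

The result accepts "ab" and "abbb" but not "a" or "abb", which are accepted by the first machine only.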

The Chomsky Hierarchy
- Finite languages are uninteresting
- Regular languages: FSAs
- There are richer classes!
- The hierarchy, from most to least restrictive:
  - Finite languages
  - Regular languages
  - Context-free languages
  - Context-sensitive languages
  - Type 0 languages
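A textbook illustration of a richer class (not taken from the slides) is the language a^n b^n, which is context-free but not regular: recognising it requires counting without bound, which a finite set of states cannot do. The hypothetical recogniser below uses a single counter in place of a stack.

```python
def is_anbn(string):
    """Recognise { a^n b^n : n >= 0 } using one unbounded counter."""
    depth = 0        # number of a's not yet matched by a b
    seen_b = False   # once a b has been read, no further a is allowed
    for symbol in string:
        if symbol == "a":
            if seen_b:
                return False
            depth += 1
        elif symbol == "b":
            seen_b = True
            depth -= 1
            if depth < 0:
                return False  # more b's than a's so far
        else:
            return False      # symbol outside the alphabet {a, b}
    return depth == 0
```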

Adding weights to FSAs
- FSAs can also have weights associated with their transition function
- A Weighted Finite State Automaton (WFSA) is defined as:
  - A finite set Q of states q0…qN, with q0 the start state
  - A finite input alphabet Σ of symbols
  - A set of final states F ⊆ Q
  - A semiring R
  - A transition function δ(q,i) mapping from Q×Σ to Q×R
- These weights can have many interpretations
- A common one is “cost” (log-probability)
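As a minimal sketch of the weighted definition, the transition table below returns a (next state, weight) pair, and the weight of a string is obtained by adding the weights along its path. Adding costs along a path is the choice made here for illustration; the numeric costs are made up, and the state names reuse the unweighted a b* example.

```python
import math

START = "q0"
FINAL = {"q1"}
# (state, symbol) -> (next state, cost); the cost values are hypothetical.
DELTA = {("q0", "a"): ("q1", 0.5), ("q1", "b"): ("q1", 1.0)}


def path_cost(string):
    """Total cost of reading the string, or infinity if it is not accepted."""
    state, total = START, 0.0
    for symbol in string:
        if (state, symbol) not in DELTA:
            return math.inf  # no transition: treat as infinitely costly
        state, cost = DELTA[(state, symbol)]
        total += cost        # costs accumulate along the path
    return total if state in FINAL else math.inf
```

Here path_cost("abb") adds 0.5 + 1.0 + 1.0, and any string the unweighted automaton rejects gets infinite cost.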

Probabilistic Linguistic Knowledge
- A generative probabilistic grammar determines beliefs about which strings are likely to be seen
  - Probabilistic Context-Free Grammars (PCFGs; Booth, 1969)
  - Probabilistic Minimalist Grammars (Hale, 2006)
  - Probabilistic Finite-State Grammars (Mohri, 1997; Crocker & Brants, 2000)
- Example: in position 1, {a,b,c,d} are equally likely; but in position 2:
  - {a,b} are usually followed by e, occasionally by f
  - {c,d} are usually followed by f, occasionally by e
[Figure labels: Cost (log-probability); Input symbol]
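The two-position example can be written down as a small probability table. In the sketch below the uniform first position comes from the slide, but the values 0.9 and 0.1 are assumed stand-ins for "usually" and "occasionally", and treating cost as the negative base-2 log-probability is an assumed sign convention.

```python
import math

# Position 1: the four symbols are equally likely (from the slide).
FIRST = {symbol: 0.25 for symbol in "abcd"}
# Position 2: 0.9 and 0.1 are assumed values for "usually" and "occasionally".
USUAL, RARE = 0.9, 0.1
SECOND = {
    "a": {"e": USUAL, "f": RARE},
    "b": {"e": USUAL, "f": RARE},
    "c": {"f": USUAL, "e": RARE},
    "d": {"f": USUAL, "e": RARE},
}


def string_prob(string):
    """Probability of a two-symbol string under the toy grammar."""
    if len(string) != 2:
        return 0.0
    first, second = string
    return FIRST.get(first, 0.0) * SECOND.get(first, {}).get(second, 0.0)


def cost(string):
    """Assumed convention: cost is the negative log2 probability."""
    p = string_prob(string)
    return -math.log2(p) if p > 0 else math.inf
```

Under these assumed numbers, "ae" is cheaper than "af", and the eight possible strings have probabilities that sum to one.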

Probabilistic intersection
- Bayes’s rule says that posterior ∝ evidence × prior
- In log space, × becomes +: log posterior = log evidence + log prior + constant
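The log-space version of the rule can be sketched as follows: add the log prior and log evidence for each hypothesis, then subtract a normalising constant so that the posterior sums to one. The function name and the example numbers are hypothetical.

```python
import math


def log_posterior(log_prior, log_evidence):
    """Add log scores per hypothesis, then normalise with log-sum-exp."""
    unnormalised = {h: log_prior[h] + log_evidence[h] for h in log_prior}
    peak = max(unnormalised.values())
    # log of the sum of exponentials, shifted by the peak for numerical stability
    log_z = peak + math.log(
        sum(math.exp(v - peak) for v in unnormalised.values())
    )
    return {h: v - log_z for h, v in unnormalised.items()}


# Hypothetical numbers: the prior favours "e", the evidence favours "f".
PRIOR = {"e": math.log(0.9), "f": math.log(0.1)}
EVIDENCE = {"e": math.log(0.2), "f": math.log(0.8)}
POSTERIOR = log_posterior(PRIOR, EVIDENCE)
```

With these numbers the unnormalised products are 0.18 for "e" and 0.08 for "f", so the posterior still favours "e" but less strongly than the prior did.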

Intersecting weighted FSAs
- Bayes’s rule says that the evidence and the prior should be combined (multiplied)
- For probabilistic grammars, this combination is the formal operation of intersection (see also Hale, 2006)
- grammar + input = BELIEF
- The grammar affects beliefs about the future
[Figure: the input (input1_1) combined with the grammar]
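A weighted intersection can be sketched as the product construction in which matching arcs add their costs, mirroring the multiplication of probabilities. Everything concrete below is an assumption made for illustration: the dictionary encoding, the state names, the cost values, and the uncertain input that allows b or c in the first position and e or f in the second.

```python
import math


def intersect_weighted(grammar, evidence):
    """Product construction in which matching arcs add their costs."""
    start = (grammar["start"], evidence["start"])
    arcs, frontier, seen = {}, [start], {start}
    while frontier:
        p, q = frontier.pop()
        for (state, symbol), (p_next, cost_g) in grammar["arcs"].items():
            if state != p or (q, symbol) not in evidence["arcs"]:
                continue  # both machines must allow this symbol
            q_next, cost_x = evidence["arcs"][(q, symbol)]
            target = (p_next, q_next)
            arcs[((p, q), symbol)] = (target, cost_g + cost_x)
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    final = {s for s in seen
             if s[0] in grammar["final"] and s[1] in evidence["final"]}
    return {"start": start, "final": final, "arcs": arcs}


def path_cost(wfsa, string):
    """Total cost of the string, or infinity if it is not accepted."""
    state, total = wfsa["start"], 0.0
    for symbol in string:
        if (state, symbol) not in wfsa["arcs"]:
            return math.inf
        state, cost = wfsa["arcs"][(state, symbol)]
        total += cost
    return total if state in wfsa["final"] else math.inf


# Hypothetical grammar (prior): the usual continuation is cheap, the rare one costly.
GRAMMAR = {
    "start": "g0", "final": {"g2"},
    "arcs": {
        ("g0", "a"): ("ab", 2.0), ("g0", "b"): ("ab", 2.0),
        ("g0", "c"): ("cd", 2.0), ("g0", "d"): ("cd", 2.0),
        ("ab", "e"): ("g2", 0.25), ("ab", "f"): ("g2", 3.5),
        ("cd", "f"): ("g2", 0.25), ("cd", "e"): ("g2", 3.5),
    },
}
# Hypothetical uncertain input (evidence): b or c, then e or f, all equally costly.
EVIDENCE = {
    "start": 0, "final": {2},
    "arcs": {(0, "b"): (1, 1.0), (0, "c"): (1, 1.0),
             (1, "e"): (2, 1.0), (1, "f"): (2, 1.0)},
}
BELIEF = intersect_weighted(GRAMMAR, EVIDENCE)
```

In this toy setup the evidence alone cannot tell "be" from "bf", but after intersection "be" is cheaper, which is the sense in which the grammar shapes the combined belief.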

[Figure: symbol sets {b,c} {f,e} {b,c} {?}]