Finite-State Methods 600.465 - Intro to NLP - J. Eisner.

Presentation transcript:

Finite-State Methods 600.465 - Intro to NLP - J. Eisner

Finite state acceptors (FSAs) Things you may know about FSAs: Equivalence to regexps Union, Kleene *, concat, intersect, complement, reversal Determinization, minimization Pumping, Myhill-Nerode [Solicit interesting properties of FSAs] (Figure: an example FSA with arcs a, c, e.) Defines the language a? c* = {a, ac, acc, accc, …, ε, c, cc, ccc, …} 600.465 - Intro to NLP - J. Eisner
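Below is a minimal sketch (not from the original slides) of that example regexp a? c* as a deterministic acceptor written as a transition table; the state names and code are illustrative only.

```python
TRANS = {                    # (state, symbol) -> next state
    (0, "a"): 1,             # optional leading a ...
    (0, "c"): 1,             # ... or go straight to the c loop
    (1, "c"): 1,             # Kleene star over c
}
FINAL = {0, 1}               # the empty string is accepted too

def accepts(s, state=0):
    for ch in s:
        if (state, ch) not in TRANS:
            return False     # no arc: reject
        state = TRANS[(state, ch)]
    return state in FINAL

print([w for w in ["", "a", "ac", "acc", "c", "cc", "ca", "aa"] if accepts(w)])
# ['', 'a', 'ac', 'acc', 'c', 'cc']
```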

n-gram models not good enough Want to model grammaticality A “training” sentence known to be grammatical: BOS mouse traps catch mouse traps EOS Resulting trigram model has to overgeneralize: allows sentences with 0 verbs BOS mouse traps EOS allows sentences with 2 or more verbs BOS mouse traps catch mouse traps catch mouse traps catch mouse traps EOS Can’t remember whether it’s in subject or object (i.e., whether it’s gotten to the verb yet) trigram model must allow these trigrams 600.465 - Intro to NLP - J. Eisner

Finite-state models can “get it” Want to model grammaticality BOS mouse traps catch mouse traps EOS Finite-state can capture the generalization here: Noun+ Verb Noun+ Allows arbitrarily long NPs (just keep looping around for another Noun modifier). Still, never forgets whether it’s preverbal or postverbal! (Unlike 50-gram model) Noun Verb preverbal states (still need a verb to reach final state) postverbal states (verbs no longer allowed) 600.465 - Intro to NLP - J. Eisner
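As a quick illustration (my own sketch, not part of the slides), the Noun+ Verb Noun+ generalization can be checked over a tag sequence with an ordinary regular expression; it accepts the training sentence's tags and rejects the 0-verb and 2-verb overgeneralizations of the trigram model. The tag names are illustrative.

```python
import re

# Noun+ Verb Noun+ over a space-separated tag sequence
GRAMMAR = re.compile(r"^(Noun )+Verb( Noun)+$")

def ok(tags):
    return bool(GRAMMAR.match(" ".join(tags)))

print(ok(["Noun", "Noun", "Verb", "Noun", "Noun"]))                  # True
print(ok(["Noun", "Noun"]))                                          # False (no verb)
print(ok(["Noun", "Noun", "Verb", "Noun", "Noun", "Verb", "Noun"]))  # False (two verbs)
```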

How powerful are regexps / FSAs? More powerful than n-gram models The hidden state may “remember” arbitrary past context With k states, can remember which of k “types” of context it’s in Equivalent to HMMs In both cases, you observe a sequence and it is “explained” by a hidden path of states. The FSA states are like HMM tags. Appropriate for phonology and morphology Word = Syllable+ = (Onset Nucleus Coda?)+ = (C+ V+ C*)+ = ( (b|d|f|…)+ (a|e|i|o|u)+ (b|d|f|…)* )+ 600.465 - Intro to NLP - J. Eisner
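The syllable template at the end of this slide is itself just a regexp; here is a minimal sketch using a crude consonant class (anything that is not a, e, i, o, u), so it is only an approximation of real phonology.

```python
import re

V = "[aeiou]"
C = "[^aeiou]"                       # crude consonant class: anything else
WORD = re.compile(f"^({C}+{V}+{C}*)+$")   # Word = (C+ V+ C*)+

for w in ["banana", "strength", "dog", "aorta"]:
    print(w, bool(WORD.match(w)))
# banana True, strength True, dog True, aorta False (no onset consonant)
```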

How powerful are regexps / FSAs? But less powerful than CFGs / pushdown automata Can’t do recursive center-embedding Hmm, humans have trouble processing those constructions too … This is the rat that ate the malt. This is the malt that the rat ate. This is the cat that bit the rat that ate the malt. This is the malt that the rat that the cat bit ate. This is the dog that chased the cat that bit the rat that ate the malt. This is the malt that [the rat that [the cat that [the dog chased] bit] ate]. finite-state can handle this pattern (can you write the regexp?) but not this pattern, which requires a CFG 600.465 - Intro to NLP - J. Eisner

How powerful are regexps / FSAs? But less powerful than CFGs / pushdown automata More important: Less explanatory than CFGs A CFG without recursive center-embedding can be converted into an equivalent FSA – but the FSA will usually be far larger Because FSAs can’t reuse the same phrase type in different places Writing S = NP Verb NP with NP = Noun+ is more elegant – using nonterminals like this is equivalent to a CFG Converting to an FSA copies the NP twice (S = Noun+ Verb Noun+), i.e. duplicated structure 600.465 - Intro to NLP - J. Eisner

We’ve already used FSAs this way … CFG with regular expression on the right-hand side: X → (A | B) G H (P | Q) NP → (Det | ε) Adj* N So each nonterminal has a finite-state automaton, giving a “recursive transition network (RTN)” (Figure: the automata for X and NP, with arcs A, B, G, H, P, Q and Det, Adj, N.) Automaton state replaces dotted rule (X → A G . H P) 600.465 - Intro to NLP - J. Eisner

We’ve already used FSAs once … NP rules from the WSJ grammar become a single DFA NP → ADJP ADJP JJ JJ NN NNS | ADJP DT NN | ADJP JJ NN | ADJP JJ NN NNS | ADJP JJ NNS | ADJP NN | ADJP NN NN | ADJP NN NNS | ADJP NNS | ADJP NPR | ADJP NPRS | DT | DT ADJP | DT ADJP , JJ NN | DT ADJP ADJP NN | DT ADJP JJ JJ NN | DT ADJP JJ NN | DT ADJP JJ NN NN etc. (Figure: the NP regular expression compiled into a DFA, with arcs labeled ADJP, DT, JJ, …) 600.465 - Intro to NLP - J. Eisner

But where can we put our weights? CFG / RTN (e.g. NP → Det Adj N) bigram model of words or tags (first-order Markov Model, with states like Start, Det, Adj, Noun, Verb, Prep, Stop) Hidden Markov Model of words and tags together?? 600.465 - Intro to NLP - J. Eisner

Another useful FSA … slide courtesy of L. Karttunen (modified) A Wordlist (clear, clever, ear, ever, fat, father, …) compiles into a Network (Figure: the letter FSA over c, l, e, a, r, v, t, h, f). Compiling /usr/dict/words (25K words, 206K chars) takes 0.6 sec and gives an FSM with 17728 states and 37100 arcs. [Speaker note: To get runtime on emu in 2004, I did “time xfst < ~/tmp/j >& /dev/null”, where ~/tmp/j contained 20 copies of the command “read text /usr/dict/words”. Total time was 12.89u 0.15s 0:13.07 99.7%.] 600.465 - Intro to NLP - J. Eisner

Weights are useful here too! slide courtesy of L. Karttunen (modified) Wordlist with numbers: clear 0, clever 1, ear 2, ever 3, fat 4, father 5. Compile into a weighted Network whose arcs carry weights (in the figure: c/0, l/0, e/0, a/0, r/0, v/1, e/2, f/4, t/0, h/1). Computes a perfect hash! Sum the weights along a word’s accepting path. 600.465 - Intro to NLP - J. Eisner

Example: Weighted acceptor slide courtesy of L. Karttunen (modified) Same wordlist and network: clear 0, clever 1, ear 2, ever 3, fat 4, father 5. Compute the number of paths from each state (Q: how? A: recursively, like DFS; the figure annotates states with path counts such as 6, 2, 2, 1). Successor states partition the path set; use offsets of successor states as arc weights. Q: Would this work for an arbitrary numbering of the words? 600.465 - Intro to NLP - J. Eisner
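A minimal sketch of that construction, assuming the words are stored in a trie built from the sorted wordlist (one way to realize the idea; the slide's network is a minimized FSA, so details differ). Each state counts its accepting paths, each arc is weighted by the offset contributed by the sibling arcs that precede it, and summing weights along a word's path yields its index; on this wordlist it happens to reproduce arc weights like e/2, v/1, f/4, h/1 from the figure.

```python
class Node:
    def __init__(self):
        self.children = {}           # letter -> Node
        self.final = False

def build_trie(words):
    root = Node()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, Node())
        node.final = True
    return root

def count_paths(node):
    """Number of accepting paths (i.e. words) reachable from this node."""
    node.npaths = int(node.final) + sum(count_paths(c) for c in node.children.values())
    return node.npaths

def assign_weights(node):
    """Weight each arc by how many words are reached by 'earlier' choices here."""
    offset = int(node.final)         # a word ending right here comes first
    node.weights = {}
    for ch in sorted(node.children):
        node.weights[ch] = offset
        offset += node.children[ch].npaths
        assign_weights(node.children[ch])

def perfect_hash(root, word):
    """Sum the weights along the word's accepting path."""
    total, node = 0, root
    for ch in word:
        total += node.weights[ch]
        node = node.children[ch]
    assert node.final
    return total

words = ["clear", "clever", "ear", "ever", "fat", "father"]
root = build_trie(words)
count_paths(root)
assign_weights(root)
print([perfect_hash(root, w) for w in words])   # [0, 1, 2, 3, 4, 5]
```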

Example: Unweighted transducer (Figure: a syntax-tree fragment, VP[head=vouloir,...] dominating V[head=vouloir, tense=Present, num=SG, person=P3], whose leaf is the surface word veut.) This is the problem of morphology (“word shape”), an area of linguistics. 600.465 - Intro to NLP - J. Eisner

Example: Unweighted transducer slide courtesy of L. Karttunen (modified) (Figure: as before, VP[head=vouloir,...] over V[head=vouloir, tense=Present, num=SG, person=P3] over veut.) A Finite-state transducer relates the inflected form veut to the canonical form plus inflection codes vouloir +Pres +Sing +P3; the relevant path pairs the letters of vouloir and the codes +Pres +Sing +P3 with the letters of veut. 600.465 - Intro to NLP - J. Eisner

Example: Unweighted transducer slide courtesy of L. Karttunen Bidirectional: generation or analysis Compact and fast Xerox sells transducers for about 20 languages including English, German, Dutch, French, Italian, Spanish, Portuguese, Finnish, Russian, Turkish, Japanese, ... Research systems for many other languages, including Arabic, Malay (Figure: the same veut ↔ vouloir +Pres +Sing +P3 transducer and path.) 600.465 - Intro to NLP - J. Eisner
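To make "bidirectional" concrete, here is a minimal sketch with a hypothetical three-entry toy lexicon (not Xerox's data and not its tag set): the same set of string pairs answers both analysis and generation queries, just as a single FST does.

```python
PAIRS = {
    ("vouloir+Pres+Sing+P3", "veut"),
    ("vouloir+Pres+Plur+P3", "veulent"),
    ("être+Pres+Sing+P1",    "suis"),
}

def analyze(surface):
    """Surface form -> set of analyses (lemma + inflection codes)."""
    return {upper for upper, lower in PAIRS if lower == surface}

def generate(analysis):
    """Analysis -> set of surface forms."""
    return {lower for upper, lower in PAIRS if upper == analysis}

print(analyze("veut"))                    # {'vouloir+Pres+Sing+P3'}
print(generate("vouloir+Pres+Sing+P3"))   # {'veut'}
```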

What is a function? Formally, a set of <input, output> pairs where each input ∈ domain, output ∈ co-domain. Moreover, ∀x ∈ domain, ∃ exactly one y such that <x,y> ∈ the function. square: int → int = { <0,0>, <1,1>, <-1,1>, <2,4>, <-2,4>, <3,9>, <-3,9>, ... } domain(square) = {0, 1, -1, 2, -2, 3, -3, …} range(square) = {0, 1, 4, 9, ...} 600.465 - Intro to NLP - J. Eisner

What is a relation? square: int → int = { <0,0>, <1,1>, <-1,1>, <2,4>, <-2,4>, <3,9>, <-3,9>, ... } inverse(square): int → int = { <0,0>, <1,1>, <1,-1>, <4,2>, <4,-2>, <9,3>, <9,-3>, ... } Is inverse(square) a function? 0 ↦ {0} 9 ↦ {3,-3} 7 ↦ {} -1 ↦ {} No - we need a more general notion! A relation is any set of <input, output> pairs 600.465 - Intro to NLP - J. Eisner
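A small illustrative check (not from the slides) of why inverse(square) fails to be a function: some inputs get two outputs and some get none.

```python
square = {(0, 0), (1, 1), (-1, 1), (2, 4), (-2, 4), (3, 9), (-3, 9)}
inverse = {(y, x) for (x, y) in square}

def outputs(rel, x):
    """All y such that <x, y> is in the relation."""
    return {y for (a, y) in rel if a == x}

print(outputs(square, 3))    # {9}      -- exactly one output: a function
print(outputs(inverse, 9))   # {3, -3}  -- two outputs: not a function
print(outputs(inverse, 7))   # set()    -- no output at all
```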

Regular Relation (of strings) Relation: like a function, but multiple outputs ok Regular: finite-state Transducer: automaton w/ outputs (Figure: a transducer with arcs such as a:ε, b:b, a:a, a:c, ?:c, ?:b, ?:a, b:ε.) b → {b} a → {} (it might look like a → {b}, but that path dead-ends) aaaaa → {ac, aca, acab, acabc} (notice that if we’re not careful we get stuck – one aaaaa path is no good) By way of review – or maybe first time for some of you! Computes functions on strings – which is essentially what phonology (e.g.) is all about (non-string representations in phonology can still be encoded as strings) These relations are closed under composition & invertible, very nice Invertible? Closed under composition? 600.465 - Intro to NLP - J. Eisner

Regular Relation (of strings) Can weight the arcs (e.g. to penalize arcs that output c): b → {b} a → {} aaaaa → {ac, aca, acab, acabc} How to find best outputs? For aaaaa? For all inputs at once? If we weight, then we try to avoid output c’s in this case. We can do that for input b easily. For input aaaaa we can’t, since any way of getting to the final state has at least one c. But we prefer ac, aca, acab over acabc, which has two c’s. For a given input string, we can find the set of lowest-cost paths, here ac, aca, acab. How? Answer: Dijkstra. Do you think we can strip the entire machine down to just its best paths? 600.465 - Intro to NLP - J. Eisner
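A minimal sketch of the Dijkstra idea on a made-up toy weighted transducer (not the automaton drawn on the slide): search over (state, input position) pairs, adding arc weights, and return the cheapest output that consumes the whole input and ends in a final state.

```python
import heapq

# state -> list of (input symbol, output symbol, weight, next state); "" = epsilon
ARCS = {
    0: [("a", "", 0, 0), ("a", "c", 1, 1), ("b", "b", 0, 2)],
    1: [("a", "a", 0, 1), ("a", "b", 0, 2)],
    2: [],
}
FINAL = {2}

def best_output(inp, start=0):
    """Dijkstra over (state, position-in-input); returns (cost, output) or None."""
    heap = [(0, start, 0, "")]
    seen = set()
    while heap:
        cost, state, pos, out = heapq.heappop(heap)
        if (state, pos) in seen:
            continue
        seen.add((state, pos))
        if pos == len(inp) and state in FINAL:
            return cost, out          # first final pop = cheapest, by Dijkstra
        for a_in, a_out, w, nxt in ARCS[state]:
            if a_in == "" or (pos < len(inp) and inp[pos] == a_in):
                heapq.heappush(heap, (cost + w, nxt, pos + (a_in != ""), out + a_out))
    return None

print(best_output("b"))        # (0, 'b')
print(best_output("aaaaa"))    # (1, 'cb')  -- at least one costly c is unavoidable
```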

Function from strings to ... Acceptors (FSAs) vs. Transducers (FSTs): an unweighted acceptor (arcs a, c, e) maps strings to {false, true}; an unweighted transducer (arcs a:x, c:z, e:y) maps strings to strings. A weighted acceptor (arcs a/.5, c/.7, e/.5, final weight .3) maps strings to numbers; a weighted transducer (arcs a:x/.5, c:z/.7, e:y/.5, final weight .3) maps strings to (string, num) pairs. 600.465 - Intro to NLP - J. Eisner

Sample functions Acceptors (FSAs) vs. Transducers (FSTs): Unweighted acceptor (→ {false, true}): Grammatical? Unweighted transducer (→ strings): Markup, Correction, Translation. Weighted acceptor (→ numbers): How grammatical? Better, how probable? Weighted transducer (→ (string, num) pairs): Good markups, Good corrections, Good translations. 600.465 - Intro to NLP - J. Eisner

Terminology (acceptors) A Regexp compiles into an FSA; the regexp defines and the FSA recognizes (implements) a Regular language. A String matches the regexp, and the FSA accepts or matches (or generates) the String. 600.465 - Intro to NLP - J. Eisner

Terminology (transducers) A Regexp compiles into an FST; the regexp defines and the FST recognizes (implements) a Regular relation. A String pair matches (?) the regexp, and the FST accepts or matches (or generates) the pair – or transduces one string of the pair into the other. 600.465 - Intro to NLP - J. Eisner

Perspectives on a Transducer Remember these CFG perspectives – 3 views of a context-free rule: generation (production): S → NP VP (randsent); parsing (comprehension): S → NP VP (parse); verification (checking): S = NP VP. Similarly, 3 views of a transducer: Given 0 strings, generate a new string pair (by picking a path) Given one string (upper or lower), transduce it to the other kind Given two strings (upper & lower), decide whether to accept the pair The FST just defines the regular relation (mathematical object: set of pairs). What’s “input” and “output” depends on what one asks about the relation. The 0, 1, or 2 given string(s) constrain which paths you can use. (Figure: the vouloir/veut path again.) 600.465 - Intro to NLP - J. Eisner

Functions (Figure: a function f maps ab?d to abcd, and a function g applies to abcd in turn.) 600.465 - Intro to NLP - J. Eisner

Functions Function composition: f ∘ g [first f, then g – intuitive notation, but opposite of the traditional math notation] Like the Unix pipe: cat x | f | g > y Example: Pass the input through a sequence of ciphers 600.465 - Intro to NLP - J. Eisner
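A minimal sketch (with two hypothetical toy ciphers, not anything from the lecture) of this left-to-right composition: compose(f, g) applies f first and then g, like the pipe.

```python
def compose(f, g):
    """Return the function 'first f, then g'."""
    return lambda x: g(f(x))

def rot13(s):                 # toy cipher 1
    return s.translate(str.maketrans(
        "abcdefghijklmnopqrstuvwxyz", "nopqrstuvwxyzabcdefghijklm"))

def reverse(s):               # toy cipher 2
    return s[::-1]

pipeline = compose(rot13, reverse)
print(pipeline("attack"))     # reverse(rot13("attack")) == 'xpnggn'
```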

From Functions to Relations (Figure: f and g become relations; an input such as ab?d now has several candidate outputs – abcd, abed, abjd, abd, … – with weights such as 3, 2, 6, 4, 2, 8 on the arrows.) 600.465 - Intro to NLP - J. Eisner

From Functions to Relations Relation composition: f ∘ g (Figure: the weighted arrows of f and g chained together, taking ab?d through intermediate strings such as abcd and abed to outputs such as abd.) 600.465 - Intro to NLP - J. Eisner

From Functions to Relations Relation composition: f ∘ g (Figure: along each composed path the weights add, e.g. 3+4, 2+2, 2+8, for ab?d going to abcd, abed, abd.) 600.465 - Intro to NLP - J. Eisner

From Functions to Relations Pick the min-cost or max-prob output (Figure: for ab?d the 2+2 path to abed wins.) Often in NLP, all of the functions or relations involved can be described as finite-state machines, and manipulated using standard algorithms. 600.465 - Intro to NLP - J. Eisner
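A minimal sketch of weighted relation composition, with made-up relations and weights loosely in the spirit of the figures (the exact strings and numbers are illustrative): costs add along each composed path, we keep the cheapest route to every output, then pick the min-cost one.

```python
# relation represented as: input -> {output: cost}
f = {"ab?d": {"abcd": 3, "abed": 2, "abjd": 6}}
g = {"abcd": {"abcd": 4}, "abed": {"abed": 2}, "abjd": {"abd": 8}}

def compose(r1, r2):
    """Compose two weighted relations; costs add, keep the cheapest per output."""
    out = {}
    for x, mids in r1.items():
        for mid, c1 in mids.items():
            for y, c2 in r2.get(mid, {}).items():
                best = out.setdefault(x, {})
                best[y] = min(best.get(y, float("inf")), c1 + c2)
    return out

fg = compose(f, g)
print(fg["ab?d"])                                      # {'abcd': 7, 'abed': 4, 'abd': 14}
print(min(fg["ab?d"].items(), key=lambda kv: kv[1]))   # ('abed', 4), i.e. 2+2 wins
```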

Inverting Relations (Figure: the weighted relations f and g from before, mapping ab?d through abcd, abed, abjd to outputs such as abd, with weights 3, 2, 6, 4, 2, 8.) 600.465 - Intro to NLP - J. Eisner

Inverting Relations (Figure: the inverses g-1 and f-1 run the same arrows backwards, from strings such as abcd, abed, abd, abjd back to ab?d, with the same weights.) 600.465 - Intro to NLP - J. Eisner

Inverting Relations (f ∘ g)-1 = g-1 ∘ f-1 (Figure: the composed relation from ab?d to abcd, abed, abd, with weights 3+4, 2+2, 2+8, inverted.) 600.465 - Intro to NLP - J. Eisner
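A small self-contained check (toy unweighted relations, not the slide's exact example) of that identity: inverting the composition equals composing the inverses in reverse order.

```python
f = {("ab?d", "abcd"), ("ab?d", "abed")}
g = {("abcd", "abcd"), ("abed", "abd")}

def compose(r1, r2):
    """Relation composition: pairs <x, z> with <x, y> in r1 and <y, z> in r2."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

def invert(r):
    return {(y, x) for (x, y) in r}

assert invert(compose(f, g)) == compose(invert(g), invert(f))
print(invert(compose(f, g)))   # {('abcd', 'ab?d'), ('abd', 'ab?d')} (set order may vary)
```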

Building a lexical transducer slide courtesy of L. Karttunen (modified) A Regular Expression Lexicon (big | clear | clever | ear | fat | ...) is compiled into a Lexicon FSA (the letter network from before). Regular Expressions for Rules are compiled into Rule FSTs, and composition of the Lexicon FSA with the Composed Rule FSTs gives the Lexical Transducer (a single FST). (Figure: one path, pairing big +Adj +Comp with its surface form.) 600.465 - Intro to NLP - J. Eisner

Building a lexical transducer slide courtesy of L. Karttunen (modified) (Same lexicon and network as before: big | clear | clever | ear | fat | ...) Actually, the lexicon must contain elements like big +Adj +Comp. So write it as a more complicated expression: (big | clear | clever | fat | ...) +Adj (ε | +Comp | +Sup) [adjectives] | (ear | father | ...) +Noun (+Sing | +Pl) [nouns] | ... [...] Q: Why do we need a lexicon at all? 600.465 - Intro to NLP - J. Eisner
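A minimal sketch (toy word classes, nothing like the full lexicon) that just enumerates the upper-language strings this expression describes, to make the structure of the expression concrete.

```python
from itertools import product

ADJ  = ["big", "clear", "clever", "fat"]
NOUN = ["ear", "father"]

# (ADJ) +Adj (ε | +Comp | +Sup)  union  (NOUN) +Noun (+Sing | +Pl)
lexicon = (
    {a + "+Adj" + deg for a, deg in product(ADJ, ["", "+Comp", "+Sup"])} |
    {n + "+Noun" + num for n, num in product(NOUN, ["+Sing", "+Pl"])}
)

print(sorted(lexicon)[:4])   # ['big+Adj', 'big+Adj+Comp', 'big+Adj+Sup', 'clear+Adj']
print(len(lexicon))          # 4*3 + 2*2 = 16
```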

Weighted version of transducer: Assigns a weight to each string pair slide courtesy of L. Karttunen (modified) (Figure: a Weighted French Transducer. “Upper language”: analyses such as suivre+Imp+SG+P2, suivre+IndP+SG+P2, suivre+IndP+SG+P1, être+IndP+SG+P1, payer+IndP+SG+P1. “Lower language”: surface forms suis, paie, paye. Arc weights such as 3, 4, 12, 19, 20, 50 score the pairings.) 600.465 - Intro to NLP - J. Eisner