1
Finite-State Methods Intro to NLP - J. Eisner
2
Finite state acceptors (FSAs)
Things you may know about FSAs: Equivalence to regexps Union, Kleene *, concat, intersect, complement, reversal Determinization, minimization Pumping, Myhill-Nerode (Speaker note: solicit interesting properties of FSAs.) [Diagram: a small acceptor with arcs a, c (and ε).] Defines the language a? c* = {a, ac, acc, accc, …, ε, c, cc, ccc, …} Intro to NLP - J. Eisner
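A quick way to check the language claim above (not from the original slides, just a sketch): Python's re module can stand in for the regexp-to-acceptor compiler for a? c*.

```python
import re

# Sketch: the regexp a? c* compiled into an acceptor;
# fullmatch() asks "is this string in the language?"
acceptor = re.compile(r"a?c*")

for s in ["", "a", "ac", "accc", "c", "cc", "ca", "aac"]:
    print(s or "ε", "->", bool(acceptor.fullmatch(s)))
# ε, a, ac, accc, c, cc are accepted; ca and aac are rejected.
```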
3
n-gram models not good enough
Want to model grammaticality A “training” sentence known to be grammatical: BOS mouse traps catch mouse traps EOS Resulting trigram model has to overgeneralize: allows sentences with 0 verbs BOS mouse traps EOS allows sentences with 2 or more verbs BOS mouse traps catch mouse traps catch mouse traps catch mouse traps EOS Can’t remember whether it’s in subject or object (i.e., whether it’s gotten to the verb yet) trigram model must allow these trigrams Intro to NLP - J. Eisner
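To see the overgeneralization concretely, here is a minimal sketch (not part of the slides) of a trigram "model" that simply accepts any sentence whose trigrams all appeared in the one training sentence; probabilities are omitted since only the support matters for the argument.

```python
def trigrams(tokens):
    return {tuple(tokens[i:i+3]) for i in range(len(tokens) - 2)}

train = "BOS mouse traps catch mouse traps EOS".split()
seen = trigrams(train)

def allowed(sentence):
    # accept iff every trigram of the sentence was seen in training
    return trigrams(sentence.split()) <= seen

print(allowed("BOS mouse traps EOS"))                     # True: 0 verbs
print(allowed("BOS mouse traps catch mouse traps "
              "catch mouse traps EOS"))                   # True: 2 verbs
print(allowed("BOS traps mouse catch EOS"))               # False
```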
4
Finite-state models can “get it”
Want to model grammaticality BOS mouse traps catch mouse traps EOS Finite-state can capture the generalization here: Noun+ Verb Noun+ Allows arbitrarily long NPs (just keep looping around for another Noun modifier). Still, never forgets whether it’s preverbal or postverbal! (Unlike 50-gram model) Noun Verb preverbal states (still need a verb to reach final state) postverbal states (verbs no longer allowed) Intro to NLP - J. Eisner
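A minimal sketch of the two-phase acceptor the slide describes, assuming the input is already a sequence of Noun/Verb tags; the state names are invented for illustration.

```python
# Sketch: the Noun+ Verb Noun+ acceptor, with preverbal states
# (still need a verb) and postverbal states (verbs no longer allowed).
def grammatical(tags):
    state = "pre_start"            # no Noun seen yet
    for t in tags:
        if state in ("pre_start", "pre") and t == "Noun":
            state = "pre"          # loop: arbitrarily long preverbal NP
        elif state == "pre" and t == "Verb":
            state = "post_start"
        elif state in ("post_start", "post") and t == "Noun":
            state = "post"         # loop: arbitrarily long postverbal NP
        else:
            return False
    return state == "post"         # final state reached only after Noun+ Verb Noun+

print(grammatical("Noun Noun Verb Noun Noun".split()))    # True
print(grammatical("Noun Noun".split()))                   # False: no verb
print(grammatical("Noun Verb Noun Verb Noun".split()))    # False: two verbs
```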
5
How powerful are regexps / FSAs?
More powerful than n-gram models The hidden state may “remember” arbitrary past context With k states, can remember which of k “types” of context it’s in Equivalent to HMMs In both cases, you observe a sequence and it is “explained” by a hidden path of states. The FSA states are like HMM tags. Appropriate for phonology and morphology Word = Syllable+ = (Onset Nucleus Coda?)+ = (C+ V+ C*)+ = ( (b|d|f|…)+ (a|e|i|o|u)+ (b|d|f|…)* )+ Intro to NLP - J. Eisner
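A sketch of the syllable template as a plain regular expression over letters, using Python's re; the consonant and vowel classes are a simplification for illustration, not a claim about any particular language.

```python
import re

# Sketch of the slide's template Word = (C+ V+ C*)+ .
C = "[bcdfghjklmnpqrstvwxz]"
V = "[aeiou]"
word = re.compile(f"({C}+{V}+{C}*)+")

for w in ["banana", "strict", "aorta", "bcd"]:
    print(w, "->", bool(word.fullmatch(w)))
# banana and strict fit the template; aorta (no onset) and bcd (no vowel) do not.
```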
6
How powerful are regexps / FSAs?
But less powerful than CFGs / pushdown automata Can’t do recursive center-embedding Hmm, humans have trouble processing those constructions too … This is the rat that ate the malt. This is the malt that the rat ate. This is the cat that bit the rat that ate the malt. This is the malt that the rat that the cat bit ate. This is the dog that chased the cat that bit the rat that ate the malt. This is the malt that [the rat that [the cat that [the dog chased] bit] ate]. Finite-state methods can handle the first, right-branching pattern (can you write the regexp?) but not the second, center-embedded pattern, which requires a CFG. Intro to NLP - J. Eisner
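One possible answer to the slide's regexp challenge, as a sketch with a toy vocabulary (an assumption, not from the slide): the right-branching pattern is plain iteration, so a regular expression covers it; the center-embedded pattern is not attempted here, since it needs nested matching.

```python
import re

# Toy vocabulary, invented for illustration.
N = "(?:dog|cat|rat|malt)"
V = "(?:chased|bit|ate)"

# "This is the N (that V the N)* ." -- right-branching is plain iteration.
right_branching = re.compile(rf"This is the {N}(?: that {V} the {N})*\.")

s = ("This is the dog that chased the cat "
     "that bit the rat that ate the malt.")
print(bool(right_branching.fullmatch(s)))   # True
```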
7
How powerful are regexps / FSAs?
But less powerful than CFGs / pushdown automata More important: Less explanatory than CFGs A CFG without recursive center-embedding can be converted into an equivalent FSA – but the FSA will usually be far larger Because FSAs can’t reuse the same phrase type in different places [Diagram: NP = Noun+ and S = NP Verb NP – more elegant; using nonterminals like this is equivalent to a CFG. Converting to an FSA copies the NP machine twice: duplicated structure.] Intro to NLP - J. Eisner
8
We’ve already used FSAs this way …
CFG with regular expression on the right-hand side: X → (A | B) G H (P | Q) NP → (Det | ε) Adj* N So each nonterminal has a finite-state automaton, giving a “recursive transition network (RTN)” [Diagram: an automaton for X over A|B, G, H, P|Q, and one for NP over Det, Adj, N.] Automaton state replaces dotted rule (X → A G . H P) Intro to NLP - J. Eisner
9
We’ve already used FSAs once …
NP rules from the WSJ grammar become a single DFA: NP → ADJP ADJP JJ JJ NN NNS | ADJP DT NN | ADJP JJ NN | ADJP JJ NN NNS | ADJP JJ NNS | ADJP NN | ADJP NN NN | ADJP NN NNS | ADJP NNS | ADJP NPR | ADJP NPRS | DT | DT ADJP | DT ADJP , JJ NN | DT ADJP ADJP NN | DT ADJP JJ JJ NN | DT ADJP JJ NN | DT ADJP JJ NN NN | etc. [Diagram: the right-hand sides compiled into a single DFA, i.e. one regular expression over tags such as DT, ADJP, NN, …] Intro to NLP - J. Eisner
10
But where can we put our weights?
[Diagrams: the Det Adj* N automaton for NP (CFG / RTN); a bigram model of words or tags (first-order Markov Model) with states Start, Det, Adj, Noun, Verb, Prep, Stop; a Hidden Markov Model of words and tags together??] Intro to NLP - J. Eisner
11
Another useful FSA …
slide courtesy of L. Karttunen (modified) Another useful FSA … Wordlist: clear, clever, ear, ever, fat, father – compiled into a Network (an FSA whose arcs are the letters c, l, e, a, r, v, t, h, f). Stats: FSM 17728 states, … arcs; /usr/dict/words: 0.6 sec, 25K words, 206K chars. (Speaker note: To get runtime on emu in 2004, I did “time xfst < ~/tmp/j >& /dev/null”, where ~/tmp/j contained 20 copies of the command “read text /usr/dict/words”. Total time was 12.89u 0.15s.) Intro to NLP - J. Eisner
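A minimal sketch of the "compile" step (unrelated to xfst): build a trie-shaped acceptor that shares prefixes of the wordlist. A real compiler would also determinize and minimize, sharing suffixes such as the two "-ever" endings.

```python
# Sketch: compile a wordlist into a trie-shaped FSA (states = dicts of arcs).
def compile_wordlist(words):
    start = {}
    for w in words:
        state = start
        for ch in w:
            state = state.setdefault(ch, {})   # follow or create an arc
        state["<final>"] = True                # mark an accepting state
    return start

def accepts(fsa, w):
    state = fsa
    for ch in w:
        if ch not in state:
            return False
        state = state[ch]
    return state.get("<final>", False)

fsa = compile_wordlist(["clear", "clever", "ear", "ever", "fat", "father"])
print(accepts(fsa, "clever"), accepts(fsa, "cleat"))   # True False
```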
12
Weights are useful here too!
slide courtesy of L. Karttunen (modified) Weights are useful here too! Wordlist with indices: clear 0, clever 1, ear 2, ever 3, fat 4, father 5 – compiled into a weighted Network with arcs such as c/0 l/0 e/0 a/0 r/0, e/2 v/1 e/0, f/4 a/0 t/0 h/1. Computes a perfect hash! Sum the weights along a word’s accepting path. Intro to NLP - J. Eisner
13
Example: Weighted acceptor
slide courtesy of L. Karttunen (modified) Example: Weighted acceptor Wordlist with indices: clear 0, clever 1, ear 2, ever 3, fat 4, father 5 – compiled into the weighted network (arcs such as r/0, l/0, c/0, a/0, e/0, v/1, e/2, t/0, h/1, f/4). [Diagram: the unweighted trie with each state annotated by the number of accepting paths leaving it: 6, 2, 2, 2, 1, 1, ….] Compute the number of paths from each state (Q: how? A: recursively, like DFS). Successor states partition the path set; use offsets of successor states as arc weights. Q: Would this work for an arbitrary numbering of the words? Intro to NLP - J. Eisner
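A sketch of this construction on a small trie, with invented helper names: count the accepting paths below each state recursively (the DFS the slide mentions), give each arc a weight equal to the number of words reachable via earlier arcs at that state, and recover a word's index by summing the weights along its path.

```python
# Sketch: a trie whose arc weights are "offsets", so that summing the weights
# along a word's accepting path yields its (alphabetical) index.
def make_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["<final>"] = True
    return root

def num_paths(node):
    # number of accepting paths leaving this state (recursively, like DFS)
    n = 1 if node.get("<final>") else 0
    return n + sum(num_paths(child) for ch, child in node.items()
                   if ch != "<final>")

def weight_arcs(node, weights):
    offset = 1 if node.get("<final>") else 0    # stopping here is path 0
    for ch in sorted(k for k in node if k != "<final>"):
        weights[id(node), ch] = offset          # words via earlier arcs
        offset += num_paths(node[ch])
        weight_arcs(node[ch], weights)

def word_index(trie, weights, w):
    node, total = trie, 0
    for ch in w:
        total += weights[id(node), ch]
        node = node[ch]
    return total

words = ["clear", "clever", "ear", "ever", "fat", "father"]
trie, weights = make_trie(words), {}
weight_arcs(trie, weights)
print([word_index(trie, weights, w) for w in words])   # [0, 1, 2, 3, 4, 5]
```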
14
Example: Unweighted transducer
[Tree fragment: VP [head=vouloir,...] dominating V [head=vouloir, tense=Present, num=SG, person=P3] ..., which dominates the surface form veut.] The problem of morphology (“word shape”) - an area of linguistics. Intro to NLP - J. Eisner
15
Example: Unweighted transducer
slide courtesy of L. Karttunen (modified) Example: Unweighted transducer [Same tree fragment: VP [head=vouloir,...] over V [head=vouloir, tense=Present, num=SG, person=P3] over veut.] A finite-state transducer relates the inflected form veut to the canonical form plus inflection codes vouloir +Pres +Sing +P3. [Diagram: the relevant path through the transducer pairs the letters of vouloir and the tags +Pres +Sing +P3 with the letters of veut.] Intro to NLP - J. Eisner
16
Example: Unweighted transducer
slide courtesy of L. Karttunen Example: Unweighted transducer Bidirectional: generation or analysis Compact and fast Xerox sells them for about 20 languages including English, German, Dutch, French, Italian, Spanish, Portuguese, Finnish, Russian, Turkish, Japanese, ... Research systems for many other languages, including Arabic, Malay [Same diagram as before: the vouloir +Pres +Sing +P3 / veut path through the finite-state transducer – inflected form, canonical form, inflection codes.] Intro to NLP - J. Eisner
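A toy sketch of the "bidirectional" point: the same relation, here a tiny made-up table of string pairs rather than a real FST, answers both generation and analysis queries; the second entry is hypothetical.

```python
# Sketch: a tiny, made-up slice of a morphological relation as string pairs.
pairs = {("vouloir+Pres+Sing+P3", "veut"),
         ("vouloir+Pres+Sing+P1", "veux")}   # hypothetical extra entry

def generate(analysis):          # upper -> lower
    return {low for up, low in pairs if up == analysis}

def analyze(form):               # lower -> upper
    return {up for up, low in pairs if low == form}

print(generate("vouloir+Pres+Sing+P3"))   # {'veut'}
print(analyze("veut"))                    # {'vouloir+Pres+Sing+P3'}
```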
17
What is a function? Formally, a set of <input, output> pairs where each input ∈ domain, output ∈ co-domain. Moreover, ∀x ∈ domain, ∃ exactly one y such that <x,y> ∈ the function. square: int → int = { <0,0>, <1,1>, <-1,1>, <2,4>, <-2,4>, <3,9>, <-3,9>, ... } domain(square) = {0, 1, -1, 2, -2, 3, -3, …} range(square) = {0, 1, 4, 9, ...} [Diagram: domain and co-domain.] Intro to NLP - J. Eisner
18
What is a relation? Is inverse(square) a function?
square: int → int = { <0,0>, <1,1>, <-1,1>, <2,4>, <-2,4>, <3,9>, <-3,9>, ... } inverse(square): int → int = { <0,0>, <1,1>, <1,-1>, <4,2>, <4,-2>, <9,3>, <9,-3>, ... } Is inverse(square) a function? E.g., 0 ↦ {0}, 9 ↦ {3,-3}, 7 ↦ {}. No - we need a more general notion! A relation is any set of <input, output> pairs Intro to NLP - J. Eisner
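The same point as a sketch in code: represent finite slices of square and inverse(square) as sets of pairs and test the one-output-per-input property.

```python
# Sketch: relations as sets of <input, output> pairs (finite slices).
square = {(x, x * x) for x in range(-3, 4)}
inverse_square = {(y, x) for (x, y) in square}

def is_function(rel):
    # a relation is a function iff no input has two distinct outputs
    outputs = {}
    for x, y in rel:
        outputs.setdefault(x, set()).add(y)
    return all(len(ys) == 1 for ys in outputs.values())

print(is_function(square))          # True
print(is_function(inverse_square))  # False: e.g. 9 -> {3, -3}
```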
19
Regular Relation (of strings)
Relation: like a function, but multiple outputs ok Regular: finite-state Transducer: automaton w/ outputs [Diagram: a transducer with arcs such as a:ε, b:b, a:a, a:c, ?:c, ?:b, ?:a, b:ε.] Examples: b ↦ {b}; a ↦ {} (it might look like a could yield b, but that path dead-ends); aaaaa ↦ {ac, aca, acab, acabc}. By way of review – or maybe first time for some of you! Computes functions on strings – which is essentially what phonology (e.g.) is all about (non-string representations in phonology can still be encoded as strings). Notice that if we’re not careful we get stuck – one aaaaa path is no good. Invertible? Closed under composition? These relations are closed under composition & invertible – very nice. Intro to NLP - J. Eisner
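The exact arcs of the slide's machine are hard to read from this transcript, so the sketch below uses a small made-up transducer in the same spirit: a nondeterministic arc list, plus a routine that collects every output produced along an accepting path for a given input (paths that dead-end contribute nothing, as in the a example above).

```python
# Sketch with a made-up machine (the slide's arcs aren't fully recoverable).
# Arcs: (state, input_symbol, output_string, next_state); "" = epsilon output.
ARCS = [
    (0, "a", "",  1),
    (0, "b", "b", 2),
    (1, "a", "a", 1),
    (1, "a", "c", 2),
    (2, "a", "b", 1),
]
FINAL = {2}

def outputs(input_string, start=0):
    """All output strings along accepting paths for the given input."""
    results = set()
    def walk(state, i, out):
        if i == len(input_string):
            if state in FINAL:
                results.add(out)
            return
        for (q, a, w, r) in ARCS:
            if q == state and a == input_string[i]:
                walk(r, i + 1, out + w)
        # dead ends simply contribute nothing
    walk(start, 0, "")
    return results

print(outputs("b"))      # {'b'}
print(outputs("a"))      # set(): reaches state 1, which is not final
print(outputs("aaa"))    # {'ac'} for this made-up machine
```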
20
Regular Relation (of strings)
Can weight the arcs: b ↦ {b}, a ↦ {}, aaaaa ↦ {ac, aca, acab, acabc}. How to find best outputs? For aaaaa? For all inputs at once? [Same machine as before, arcs a:ε, b:b, a:a, a:c, ?:c, ?:b, ?:a, b:ε, now weighted.] If we weight, then we try to avoid output c’s in this case. We can do that for input b easily. For input aaaaa we can’t, since any way of getting to the final state has at least one c. But we prefer ac, aca, acab over acabc, which has two c’s. For a given input string, we can find the set of lowest-cost paths, here ac, aca, acab. How? Answer: Dijkstra. Do you think we can strip the entire machine down to just its best paths? Intro to NLP - J. Eisner
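A sketch of "how to find the best output for a given input", using the same made-up arc-list style as above but with weights: a Viterbi-style dynamic program keeps the cheapest (cost, output) per reachable state at each position; Dijkstra would serve the same purpose.

```python
# Sketch: weighted arcs (state, in, out, weight, next); find the min-cost
# output for a given input with a Viterbi-style dynamic program.
ARCS = [
    (0, "a", "",  0.0, 1),
    (0, "b", "b", 0.0, 2),
    (1, "a", "a", 0.0, 1),
    (1, "a", "c", 1.0, 2),   # output c is penalized
    (2, "a", "b", 0.0, 1),
]
FINAL = {2}

def best_output(input_string, start=0):
    # best[state] = (cost, output) of the cheapest way to reach that state
    best = {start: (0.0, "")}
    for ch in input_string:
        new = {}
        for (q, a, w_out, w, r) in ARCS:
            if a == ch and q in best:
                cost, out = best[q]
                cand = (cost + w, out + w_out)
                if r not in new or cand < new[r]:
                    new[r] = cand
        best = new
    finals = [best[q] for q in best if q in FINAL]
    return min(finals) if finals else None

print(best_output("b"))       # (0.0, 'b')
print(best_output("aaaaa"))   # (1.0, 'aaac'): at least one c is unavoidable
```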
21
Function from strings to ...
Unweighted acceptor (FSA; arcs a, c, e): strings → {false, true}. Unweighted transducer (FST; arcs a:x, c:z, e:y): strings → strings. Weighted acceptor (arcs a/.5, c/.7, e/.5; final weight .3): strings → numbers. Weighted transducer (arcs a:x/.5, c:z/.7, e:y/.5; final weight .3): strings → (string, num) pairs. Intro to NLP - J. Eisner
22
Sample functions
Unweighted acceptors (FSAs), strings → {false, true}: Grammatical? Unweighted transducers (FSTs), strings → strings: Markup, Correction, Translation. Weighted acceptors, strings → numbers: How grammatical? Better, how probable? Weighted transducers, strings → (string, num) pairs: Good markups, Good corrections, Good translations. Intro to NLP - J. Eisner
23
Terminology (acceptors)
[Diagram: a Regexp compiles into an FSA, and the FSA implements the Regexp; the Regexp defines and the FSA recognizes a Regular language; a String matches the Regexp, and the FSA accepts (or generates) it; the String matches the Regular language.] Intro to NLP - J. Eisner
24
Terminology (transducers)
[Diagram: a Regexp compiles into an FST, and the FST implements the Regexp; the Regexp defines and the FST recognizes a Regular relation; a String pair matches the Regexp, and the FST accepts (or generates) it – or, transduces one string of the pair into the other; whether the pair “matches” the Regular relation is marked with a ? on the slide.] Intro to NLP - J. Eisner
25
Perspectives on a Transducer
Remember these CFG perspectives – 3 views of a context-free rule: generation (production): S → NP VP (randsent); parsing (comprehension): S ← NP VP (parse); verification (checking): S = NP VP. Similarly, 3 views of a transducer: Given 0 strings, generate a new string pair (by picking a path). Given one string (upper or lower), transduce it to the other kind. Given two strings (upper & lower), decide whether to accept the pair. The FST just defines the regular relation (mathematical object: set of pairs). What’s “input” and “output” depends on what one asks about the relation. The 0, 1, or 2 given string(s) constrain which paths you can use. [Diagram: the vouloir +Pres +Sing +P3 / veut transducer path again.] Intro to NLP - J. Eisner
26
Functions [Diagram: the string ab?d is fed to f, whose output abcd is fed on to g.] Intro to NLP - J. Eisner
27
Function composition: f ∘ g
Function composition: f ∘ g [first f, then g – intuitive notation, but opposite of the traditional math notation] [Diagram: ab?d → f → abcd → g → ….] Like the Unix pipe: cat x | f | g > y Example: Pass the input through a sequence of ciphers Intro to NLP - J. Eisner
28
From Functions to Relations
[Diagram: f is now a relation, mapping ab?d to multiple outputs (abcd, abed, abjd, …), and g maps those on to abcd, abed, abd, …; the arcs carry weights 3, 2, 6 and 4, 2, 8.] Intro to NLP - J. Eisner
29
From Functions to Relations
Relation composition: f ∘ g [Diagram: the two relations chained through the intermediate strings toward abcd, abed, abd, …; each composed path pairs an f-weight with a g-weight (3 with 4, 2 with 2, 2 with 8).] Intro to NLP - J. Eisner
30
From Functions to Relations
Relation composition: f ∘ g [Diagram: the composed relation maps ab?d to abcd, abed, abd, …, with summed path weights 3+4, 2+2, 2+8.] Intro to NLP - J. Eisner
31
From Functions to Relations
Pick the min-cost or max-prob output: here ab?d ↦ abed, at cost 2+2. Often in NLP, all of the functions or relations involved can be described as finite-state machines, and manipulated using standard algorithms. Intro to NLP - J. Eisner
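A sketch of weighted composition on finite relations written out as dictionaries of pairs; the particular string pairings below are illustrative reconstructions, not read off the slide, but the arithmetic (weights add along a composed path, then take the min) is the point.

```python
# Sketch: weighted relations as dicts {(input, output): weight}.
# Contents are illustrative, loosely modeled on the slide's weights.
f = {("ab?d", "abcd"): 3, ("ab?d", "abed"): 2, ("ab?d", "abjd"): 6}
g = {("abcd", "abcd"): 4, ("abed", "abed"): 2, ("abed", "abd"): 8}

def compose(f, g):
    # (x, z) is in f∘g if some y has (x, y) in f and (y, z) in g; weights add.
    h = {}
    for (x, y), w1 in f.items():
        for (y2, z), w2 in g.items():
            if y == y2:
                w = w1 + w2
                if (x, z) not in h or w < h[(x, z)]:
                    h[(x, z)] = w
    return h

h = compose(f, g)
print(h)    # {('ab?d','abcd'): 7, ('ab?d','abed'): 4, ('ab?d','abd'): 10}
print(min(h.items(), key=lambda kv: kv[1]))   # min-cost output: abed at 4
```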
32
Inverting Relations
[Diagram: the relations f and g from before – f relating ab?d to abcd, abed, abjd and g relating those onward to abcd, abed, abd, …, with weights 3, 2, 6 and 4, 2, 8.] Intro to NLP - J. Eisner
33
Inverting Relations: g⁻¹, f⁻¹
[Diagram: the same relations with all arrows reversed – g⁻¹ then f⁻¹ map the outputs (abcd, abed, abd, abjd, …) back toward ab?d, carrying the same weights (3, 4, 2, 2, 8, 6).] Intro to NLP - J. Eisner
34
Inverting Relations: (f ∘ g)⁻¹ = g⁻¹ ∘ f⁻¹
[Diagram: the composed relation run in reverse – the same pairs between ab?d and abcd, abed, abd, …, with the same summed weights 3+4, 2+2, 2+8.] Intro to NLP - J. Eisner
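A self-contained sketch of the identity on tiny unweighted relations (the example strings are placeholders): inversion swaps each pair, and inverting a composition reverses the order of composition.

```python
# Sketch: relations as sets of pairs; check (f ∘ g)⁻¹ == g⁻¹ ∘ f⁻¹.
def invert(rel):
    return {(y, x) for (x, y) in rel}

def compose(f, g):
    return {(x, z) for (x, y) in f for (y2, z) in g if y == y2}

f = {("ab?d", "abcd"), ("ab?d", "abed")}
g = {("abcd", "abcd"), ("abed", "abd")}

print(invert(compose(f, g)) == compose(invert(g), invert(f)))   # True
```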
35
Building a lexical transducer
slide courtesy of L. Karttunen (modified) Building a lexical transducer [Diagram: a Regular Expression Lexicon (big | clear | clever | ear | fat | ...) goes through the Compiler to give a Lexicon FSA (a trie over letters such as c, l, e, a, r, v, t, h, f); Regular Expressions for Rules are compiled into Composed Rule FSTs; composing the Lexicon FSA with the rule FSTs yields the Lexical Transducer (a single FST). One path is shown, pairing b i g +Adj +Comp with its inflected surface form.] Intro to NLP - J. Eisner
36
Building a lexical transducer
slide courtesy of L. Karttunen (modified) [Same lexicon as before: big | clear | clever | ear | fat | ... compiled into a Lexicon FSA.] Actually, the lexicon must contain elements like big +Adj +Comp So write it as a more complicated expression: (big | clear | clever | fat | ...) +Adj (ε | +Comp | +Sup) [adjectives] | (ear | father | ...) +Noun (+Sing | +Pl) [nouns] | ... Q: Why do we need a lexicon at all? Intro to NLP - J. Eisner
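A sketch of what the adjective part of that expression denotes, enumerated with itertools; the stem list is truncated as in the slide.

```python
# Sketch: enumerate the "upper language" strings of the adjective part of the
# lexicon expression (the real lexicon also unions in nouns, verbs, etc.).
from itertools import product

adjectives = ["big", "clear", "clever", "fat"]
degree = ["", "+Comp", "+Sup"]          # ε | +Comp | +Sup

lexicon = [stem + "+Adj" + d for stem, d in product(adjectives, degree)]
print(lexicon[:6])
# ['big+Adj', 'big+Adj+Comp', 'big+Adj+Sup', 'clear+Adj', 'clear+Adj+Comp', 'clear+Adj+Sup']
```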
37
Weighted version of transducer: Assigns a weight to each string pair
slide courtesy of L. Karttunen (modified) Weighted version of transducer: Assigns a weight to each string pair [Diagram: a Weighted French Transducer relating “upper language” strings suivre+Imp+SG+P2, suivre+IndP+SG+P2, suivre+IndP+SG+P1, être+IndP+SG+P1, payer+IndP+SG+P1 to “lower language” strings suis, paie, paye, with weights such as 4, 19, 20, 12, 50, 3 on the pairs.] Intro to NLP - J. Eisner