600.465 - Intro to NLP - J. Eisner: Finite-State Methods

Presentation transcript:

Slide 1: Finite-State Methods

Slide 2: Finite-state acceptors (FSAs)
Things you may know about FSAs:
- Equivalence to regexps
- Union, Kleene *, concatenation, intersection, complement, reversal
- Determinization, minimization
- Pumping, Myhill-Nerode
[Figure: a small acceptor with an optional a arc and a c loop.] It defines the language a? c* = {a, ac, acc, accc, ..., ε, c, cc, ccc, ...}.
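For concreteness, here is one possible encoding of that a? c* acceptor as a transition table. This is a sketch of mine, not from the slides; the dictionary representation and function names are assumptions.

# Minimal DFA sketch for the language a? c*
TRANSITIONS = {
    (0, "a"): 1,   # optional leading a
    (0, "c"): 1,   # or go straight to the c-loop
    (1, "c"): 1,   # loop on c
}
FINAL = {0, 1}     # both states accept (so the empty string is in the language)

def accepts(s, start=0):
    """Run the DFA on s; accept iff we end in a final state."""
    state = start
    for ch in s:
        if (state, ch) not in TRANSITIONS:
            return False
        state = TRANSITIONS[(state, ch)]
    return state in FINAL

assert accepts("") and accepts("a") and accepts("cc") and accepts("accc")
assert not accepts("ca") and not accepts("aa")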

Slide 3: n-gram models are not good enough
- We want to model grammaticality.
- A "training" sentence known to be grammatical: BOS mouse traps catch mouse traps EOS
- The resulting trigram model has to overgeneralize, because it must allow all the trigrams of the training sentence:
  - It allows sentences with 0 verbs: BOS mouse traps EOS
  - It allows sentences with 2 or more verbs: BOS mouse traps catch mouse traps catch mouse traps catch mouse traps EOS
- It can't remember whether it is in the subject or the object (i.e., whether it has gotten to the verb yet).
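A quick check of that claim, as a sketch of my own (the trigrams helper and the BOS/EOS padding convention are assumptions, not from the slides): every trigram of the bad sentences already occurs in the training sentence, so a trigram model has no way to rule them out.

def trigrams(sentence):
    """Word trigrams of a sentence, padded with BOS/EOS as on the slide."""
    toks = ["BOS"] + sentence.split() + ["EOS"]
    return {tuple(toks[i:i + 3]) for i in range(len(toks) - 2)}

train     = trigrams("mouse traps catch mouse traps")
no_verb   = trigrams("mouse traps")                                      # 0 verbs
two_verbs = trigrams("mouse traps catch mouse traps catch mouse traps")  # 2 verbs

print(no_verb <= train, two_verbs <= train)   # True True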

Slide 4: Finite-state models can "get it"
- We want to model grammaticality: BOS mouse traps catch mouse traps EOS
- A finite-state model can capture the generalization here: Noun+ Verb Noun+
- [Figure: an acceptor whose preverbal states still need a Verb to reach the final state, and whose postverbal states allow no further verbs.]
- It allows arbitrarily long NPs (just keep looping around for another Noun modifier), yet it never forgets whether it is preverbal or postverbal (unlike a 50-gram model).

Slide 5: How powerful are regexps / FSAs?
- More powerful than n-gram models: the hidden state may "remember" arbitrary past context; with k states, it can remember which of k "types" of context it is in.
- Equivalent to HMMs: in both cases, you observe a sequence and it is "explained" by a hidden path of states. The FSA states are like HMM tags.
- Appropriate for phonology and morphology:
  Word = Syllable+ = (Onset Nucleus Coda?)+ = (C+ V+ C*)+ = ( (b|d|f|...)+ (a|e|i|o|u)+ (b|d|f|...)* )+
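The syllable pattern translates directly into an ordinary regular expression. The following Python sketch is mine, with a simplified consonant/vowel inventory assumed for illustration:

import re

C, V = "[bcdfghjklmnpqrstvwxyz]", "[aeiou]"
word = re.compile(f"(?:{C}+{V}+{C}*)+")     # (C+ V+ C*)+

print(bool(word.fullmatch("banana")))   # True: ba.na.na
print(bool(word.fullmatch("blimp")))    # True: one syllable, C+ V+ C*
print(bool(word.fullmatch("aid")))      # False: no onset consonant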

Slide 6: How powerful are regexps / FSAs?
- But less powerful than CFGs / pushdown automata: they can't do recursive center-embedding.
- Hmm, humans have trouble processing those constructions too ...
  - This is the rat that ate the malt.
  - This is the malt that the rat ate.
  - This is the cat that bit the rat that ate the malt.
  - This is the malt that the rat that the cat bit ate.
  - This is the dog that chased the cat that bit the rat that ate the malt.
  - This is the malt that [the rat that [the cat that [the dog chased] bit] ate].
- Finite-state machinery can handle the right-branching pattern (can you write the regexp?), but not the center-embedded pattern, which requires a CFG.
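One possible answer to the slide's question, as a hedged sketch of mine (the exact regexp and the use of \w+ for word classes are my own simplification): the right-branching relatives are just a loop, so a regular expression suffices, whereas center-embedding nests dependencies to arbitrary depth and no regular expression captures that.

import re

right_branching = re.compile(r"this is the \w+( that \w+ the \w+)*")

print(bool(right_branching.fullmatch(
    "this is the dog that chased the cat that bit the rat that ate the malt")))  # True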

Slide 7: How powerful are regexps / FSAs?
- But less powerful than CFGs / pushdown automata.
- More important: less explanatory than CFGs. A CFG without recursive center-embedding can be converted into an equivalent FSA, but the FSA will usually be far larger.
- That is because FSAs can't reuse the same phrase type in different places.
- [Figure: S = Noun+ Verb Noun+ has duplicated structure; NP = Noun+ with S = NP Verb NP is more elegant. Using nonterminals like this is equivalent to a CFG; converting to an FSA copies the NP twice.]

Slide 8: We've already used FSAs this way ...
- A CFG with a regular expression on the right-hand side:
  X -> (A | B) G H (P | Q)
  NP -> (Det | ε) Adj* N
- So each nonterminal has a finite-state automaton, giving a "recursive transition network" (RTN).
- [Figure: small automata for X and NP.] An automaton state replaces a dotted rule (X -> A G . H P).

Slide 9: We've already used FSAs once ...
- The NP -> rules from the WSJ grammar become a single DFA.
- NP -> ADJP ADJP JJ JJ NN NNS | ADJP DT NN | ADJP JJ NN | ADJP JJ NN NNS | ADJP JJ NNS | ADJP NN | ADJP NN NN | ADJP NN NNS | ADJP NNS | ADJP NPR | ADJP NPRS | DT | DT ADJP | DT ADJP , JJ NN | DT ADJP ADJP NN | DT ADJP JJ JJ NN | DT ADJP JJ NN | DT ADJP JJ NN NN | etc.
- [Figure: this regular expression compiled into a DFA with arcs labeled ADJP, DT, JJ, NN, and so on.]

Slide 10: But where can we put our weights?
- CFG / RTN?
- Bigram model of words or tags (first-order Markov model)?
- Hidden Markov Model of words and tags together??
- [Figure: the NP automaton (Det, Adj, N arcs) alongside a bigram tag model over Start, Det, Adj, Noun, Verb, Prep, Stop.]

Slide 11: Another useful FSA ... (slide courtesy of L. Karttunen, modified)
- Compiling the wordlist /usr/dict/words (25K words, 206K chars) into an FSM network takes about 0.6 sec.
- [Figure: a small example wordlist (clear, clever, ear, ever, fat, father) compiled into a network that shares arcs among the words.]

Slide 12: Weights are useful here too! (slide courtesy of L. Karttunen, modified)
- Wordlist with indices: clear 0, clever 1, ear 2, ever 3, fat 4, father 5.
- [Figure: the compiled network with weighted arcs such as c/0, l/0, a/0, e/0, v/1, e/2, t/0, h/1, f/4.]
- The weighted network computes a perfect hash!

Slide 13: Example: weighted acceptor (slide courtesy of L. Karttunen, modified)
- Compute the number of paths from each state (Q: how? A: recursively, like DFS).
- Successor states partition the path set.
- Use offsets of successor states as arc weights.
- Q: Would this work for an arbitrary numbering of the words?
- [Figure: the same wordlist network annotated with path counts per state and the resulting arc weights.]
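The following sketch (mine, not from the slides) illustrates the perfect-hash idea on the slide's six-word list: the index of a word is the sum of arc weights along its path, where an arc's weight counts the words on earlier branches. It uses a plain trie rather than the slide's suffix-sharing network, and the helper names are assumptions.

words = ["clear", "clever", "ear", "ever", "fat", "father"]   # sorted, as on the slide

def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["#"] = {}                      # end-of-word marker
    return root

def count_paths(node):
    # Number of accepting paths (words) below this state,
    # computed recursively, as the slide suggests (like DFS).
    return sum(1 if ch == "#" else count_paths(child) for ch, child in node.items())

def perfect_hash(word, root):
    index, node = 0, root
    for ch in word + "#":
        # Arc weight = number of words that leave this state on earlier branches.
        for other in sorted(node):
            if other == ch:
                break
            index += 1 if other == "#" else count_paths(node[other])
        node = node[ch]
    return index

root = build_trie(words)
print([perfect_hash(w, root) for w in words])   # [0, 1, 2, 3, 4, 5]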

Slide 14: Example: unweighted transducer
- [Figure: a parse fragment VP [head=vouloir, ...] over V [head=vouloir, tense=Present, num=SG, person=P3] over the word "veut".]
- This is the problem of morphology ("word shape"), an area of linguistics.

Slide 15: Example: unweighted transducer (slide courtesy of L. Karttunen, modified)
- A finite-state transducer relates the canonical form plus inflection codes, vouloir +Pres +Sing +P3, to the inflected form veut.
- [Figure: the parse fragment from the previous slide, plus the transducer network with the relevant path for vouloir+Pres+Sing+P3 : veut highlighted.]

Slide 16: Example: unweighted transducer (slide courtesy of L. Karttunen)
- The transducer relates vouloir+Pres+Sing+P3 (canonical form + inflection codes) to veut (inflected form); the figure highlights the relevant path.
- Bidirectional: generation or analysis.
- Compact and fast.
- Xerox sells such transducers for about 20 languages, including English, German, Dutch, French, Italian, Spanish, Portuguese, Finnish, Russian, Turkish, Japanese, ...
- Research systems exist for many other languages, including Arabic and Malay.
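To make the bidirectionality concrete, a toy sketch of mine (not Xerox's machinery): the same set of (upper, lower) pairs answers both generation and analysis queries. A real FST stores this relation compactly as a network rather than as a list of pairs.

pairs = [("vouloir+Pres+Sing+P3", "veut")]   # one (upper, lower) pair from the slide

def generate(analysis):           # upper -> lower
    return [low for up, low in pairs if up == analysis]

def analyze(surface):             # lower -> upper
    return [up for up, low in pairs if low == surface]

print(generate("vouloir+Pres+Sing+P3"))   # ['veut']
print(analyze("veut"))                    # ['vouloir+Pres+Sing+P3']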

Slide 17: Example: weighted transducer
- Edit distance: what is the cost of the best path relating two strings?
- [Figure: an edit-distance lattice whose states are (position in upper string, position in lower string) pairs; arcs are labeled with copies and substitutions such as c:c and r:c, insertions such as ε:c, and deletions such as l:ε, relating the upper string c l a r a to the lower string c a c a.]
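The best-path cost in that lattice is ordinary edit distance. Here is a standard dynamic-programming sketch of mine, with unit costs assumed:

def edit_distance(upper, lower, sub=1, ins=1, dele=1):
    m, n = len(upper), len(lower)
    # d[i][j] = cost of the best path to lattice state (i, j)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * dele
    for j in range(1, n + 1):
        d[0][j] = j * ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = 0 if upper[i - 1] == lower[j - 1] else sub
            d[i][j] = min(d[i - 1][j - 1] + match,   # copy or substitute
                          d[i - 1][j] + dele,        # delete from upper
                          d[i][j - 1] + ins)         # insert into lower
    return d[m][n]

print(edit_distance("clara", "caca"))   # 2 with unit costs (delete l, substitute r -> c)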

Slide 18: Regular relation (of strings)
- Relation: like a function, but multiple outputs are OK.
- Regular: finite-state.
- Transducer: automaton with outputs.
- From the figure's transducer: b -> ?  a -> ?  aaaaa -> ?  Answers: b -> {b}, a -> {}, aaaaa -> {ac, aca, acab, acabc}.
- Invertible?
- Closed under composition?
- [Figure: a transducer with arcs such as b:b, a:a, a:ε, a:c, b:ε, ?:c, ?:a, ?:b.]

Slide 19: Regular relation (of strings)
- We can weight the arcs, e.g., to prefer some outputs over others: b -> {b}, a -> {}, aaaaa -> {ac, aca, acab, acabc}.
- How to find the best outputs? For aaaaa? For all inputs at once?
- [Figure: the same transducer, now with weighted arcs.]

Slide 20: Function from strings to ...

                 Acceptors (FSAs)     Transducers (FSTs)
  Unweighted     {false, true}        strings
  Weighted       numbers              (string, num) pairs

[Figure: one small example machine per cell, e.g., arcs a, c, ε for the unweighted acceptor; a:x, c:z, ε:y for the unweighted transducer; a/.5, c/.7, ε/.5 and a:x/.5, c:z/.7, ε:y/.5 for the weighted versions.]

Slide 21: Sample functions

                 Acceptors (FSAs)                         Transducers (FSTs)
  Unweighted     Grammatical?                             Markup, correction, translation
  Weighted       How grammatical? Better, how likely?     Good markups, good corrections, good translations

Slide 22: Terminology (acceptors)
- A regexp compiles into an FSA; the FSA implements the regexp.
- The regexp defines a regular language; the FSA recognizes (or generates) it.
- The regexp matches a string; the FSA accepts it.

Slide 23: Terminology (transducers)
- A regexp compiles into an FST; the FST implements the regexp.
- The regexp defines a regular relation; the FST recognizes it (or transduces one string of the pair into the other).
- The regexp matches a string pair; the FST accepts (or generates) it ... ?

Slide 24: Perspectives on a transducer
- Remember these three views of a context-free rule S -> NP VP:
  - generation (production): expand S into NP VP (randsent)
  - parsing (comprehension): reduce NP VP back to S (parse)
  - verification (checking): S = NP VP
- Similarly, three views of a transducer (e.g., the path vouloir+Pres+Sing+P3 : veut):
  - Given 0 strings, generate a new string pair (by picking a path).
  - Given one string (upper or lower), transduce it to the other kind.
  - Given two strings (upper and lower), decide whether to accept the pair.
- The FST just defines the regular relation (a mathematical object: a set of pairs). What counts as "input" and "output" depends on what one asks about the relation. The 0, 1, or 2 given string(s) constrain which paths you can use.

Slide 25: Functions
- [Figure: a function f maps ab?d to abcd, and a second function g applies to the result: f ∘ g.]

Slide 26: Functions
- Function composition: f ∘ g [first f, then g: an intuitive notation, but the opposite of the traditional math notation].
- [Figure: ab?d passed through f, then g.]

Slide 27: From functions to relations
- [Figure: the relation f maps ab?d to several outputs (abcd, abed, abjd, ...), each of which g then maps onward: f ∘ g.]

Slide 28: From functions to relations
- Relation composition: f ∘ g.
- [Figure: the same picture, with paths fanning out from ab?d through both relations.]

Slide 29: From functions to relations
- Relation composition: f ∘ g.
- [Figure: the composed relation drawn directly, relating ab?d to the final outputs.]

Slide 30: From functions to relations
- Often in NLP, all of the functions or relations involved can be described as finite-state machines and manipulated using standard algorithms.
- Pick the min-cost or max-probability output.
- [Figure: the composed relation from the previous slide, with the best output highlighted.]
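On small finite relations, composition is easy to spell out. This sketch is mine, with made-up example pairs loosely echoing the ab?d figure; it shows the operation that FST toolkits perform on the machines themselves rather than on enumerated pairs.

def compose(f, g):
    """Compose two relations given as sets of (input, output) pairs:
    first apply f, then g (the slides' left-to-right notation)."""
    return {(x, z) for (x, y) in f for (y2, z) in g if y == y2}

f = {("ab?d", "abcd"), ("ab?d", "abed"), ("ab?d", "abjd")}   # hypothetical pairs
g = {("abcd", "ABCD"), ("abed", "ABED")}                      # hypothetical pairs

print(compose(f, g))   # {('ab?d', 'ABCD'), ('ab?d', 'ABED')} (set order may vary)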

Slide 31: Inverting relations
- [Figure: the relation f maps ab?d to abcd, abed, abjd, ..., and g applies to the results: f ∘ g.]

Slide 32: Inverting relations
- [Figure: the same picture with each relation inverted, f⁻¹ and g⁻¹, so the arrows run backwards.]

Slide 33: Inverting relations
- (f ∘ g)⁻¹ = g⁻¹ ∘ f⁻¹
- [Figure: the composed relation run backwards, from the outputs to ab?d.]
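A quick sanity check of that identity on tiny finite relations (my own sketch, reusing the one-line compose from the earlier sketch; the example pairs are made up):

def compose(f, g):
    return {(x, z) for (x, y) in f for (y2, z) in g if y == y2}

def invert(r):
    """Swap the two sides of every pair in the relation."""
    return {(y, x) for (x, y) in r}

f = {("a", "1"), ("b", "2")}
g = {("1", "x"), ("2", "y"), ("2", "z")}

# (f o g) inverted equals (g inverted) composed with (f inverted).
assert invert(compose(f, g)) == compose(invert(g), invert(f))
print("identity holds on this example")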

Slide 34: Building a lexical transducer (slide courtesy of L. Karttunen, modified)
- Pipeline: a regular expression for the lexicon compiles into a lexicon FSA; regular expressions for rules compile into rule FSTs; composing the lexicon FSA with the composed rule FSTs yields the lexical transducer (a single FST).
- [Figure: the lexicon big | clear | clever | ear | fat | ... compiled into a network, and one path through the lexical transducer relating big +Adj +Comp to its surface form.]

Slide 35: Building a lexical transducer (slide courtesy of L. Karttunen, modified)
- Actually, the lexicon must contain elements like big +Adj +Comp.
- So write it as a more complicated expression:
  (big | clear | clever | fat | ...) +Adj (ε | +Comp | +Sup)   [adjectives]
  | (ear | father | ...) +Noun (+Sing | +Pl)                   [nouns]
  | ...
- Q: Why do we need a lexicon at all?
- [Figure: the lexicon FSA from the previous slide.]
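As a rough illustration of mine (not the Xerox notation), the lexicon expression can be approximated by an ordinary regular expression over the upper-side strings; the word lists here are truncated to the slide's examples and the tag inventory is assumed.

import re

adjectives = r"(?:big|clear|clever|fat)\+Adj(?:\+Comp|\+Sup)?"
nouns      = r"(?:ear|father)\+Noun(?:\+Sing|\+Pl)"
lexicon    = re.compile(f"(?:{adjectives})|(?:{nouns})")

print(bool(lexicon.fullmatch("big+Adj+Comp")))   # True
print(bool(lexicon.fullmatch("ear+Noun+Pl")))    # True
print(bool(lexicon.fullmatch("big+Noun+Pl")))    # False: not licensed by the lexicon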

Slide 36: Weighted French transducer (slide courtesy of L. Karttunen, modified)
- The weighted version of the transducer assigns a weight to each string pair.
- [Figure: the "upper language" contains analyses such as payer+IndP+SG+P1 (related to the "lower language" forms paie and paye) and suivre+Imp+SG+P2, suivre+IndP+SG+P2, suivre+IndP+SG+P1, être+IndP+SG+P1 (all related to the lower form suis).]