600.465 - Intro to NLP - J. Eisner: Finite-State Methods

Presentation transcript:

Slide 1: Finite-State Methods

Slide 2: Finite state acceptors (FSAs)
- Regexps
- Union, Kleene *, concat, intersect, complement, reversal
- Determinization, minimization
- Pumping, Myhill-Nerode
[Figure: a small acceptor whose arcs are labeled a, c, and ε.]
A minimal sketch of such an acceptor follows.
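
To make the acceptor idea concrete, here is a minimal Python sketch of a deterministic FSA stored as a transition table; the machine, its alphabet, and its language (a*c) are invented for illustration and are not the acceptor drawn on the slide.

# The machine and its language (a*c: zero or more a's followed by one c)
# are invented for illustration; this is not the acceptor in the figure.

DFA = {
    "start": 0,
    "final": {1},
    "delta": {(0, "a"): 0, (0, "c"): 1},   # (state, symbol) -> next state
}

def accepts(dfa, s):
    """Run the DFA over s; accept iff it ends in a final state."""
    state = dfa["start"]
    for sym in s:
        if (state, sym) not in dfa["delta"]:
            return False                    # no matching arc: reject
        state = dfa["delta"][(state, sym)]
    return state in dfa["final"]

assert accepts(DFA, "aac")
assert not accepts(DFA, "ca")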

Slide 3: Example: Unweighted acceptor
Compile a wordlist into a network. [Figure: /usr/dict/words (25K words, 206K chars) compiles in 0.6 sec into an FSM (state and arc counts given on the slide); a small wordlist - clear, clever, ear, ever, fat, father - is compiled into a network whose arcs carry the letters c, l, e, a, r, v, t, h, f.] (Slide courtesy of L. Karttunen, modified.)

Slide 4: Example: Weighted acceptor
The same wordlist with indices - clear 0, clever 1, ear 2, ever 3, fat 4, father 5 - compiles into a weighted network. [Figure: the network from the previous slide with weighted arcs such as c/0, l/0, e/0, a/0, r/0, v/1, e/2, t/0, h/1, f/4; the weights along each word's path sum to that word's index.] Computes a perfect hash! (Slide courtesy of L. Karttunen, modified.)

Slide 5: Example: Weighted acceptor (continued)
- Compute the number of paths from each state (Q: how?)
- Successor states partition the path set
- Use offsets of successor states as arc weights
- Q: Would this work for an arbitrary numbering of the words?
[Figure: the same wordlist and network, annotated with path counts.] (Slide courtesy of L. Karttunen, modified.)
A sketch of this construction follows.
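
One way to realize the perfect hash described above: build a trie acceptor over the sorted wordlist, count the complete words at or below each state, and let a word's hash value be the number of words that end or branch off earlier along its path. The Python below is my own sketch of that idea (a plain trie rather than Karttunen's minimized network), using the six-word list from the slide.

# My own sketch of the perfect-hash idea: count the complete words at or
# below each trie node; a word's hash value is the number of words that
# end or branch off "earlier" along its path, which is exactly its index
# in the sorted wordlist.  On the slide, those per-arc contributions are
# what gets stored as the arc weights.

WORDS = ["clear", "clever", "ear", "ever", "fat", "father"]  # sorted, as on the slide

def build_trie(words):
    root = {"children": {}, "is_word": False}
    for w in words:
        node = root
        for ch in w:
            node = node["children"].setdefault(ch, {"children": {}, "is_word": False})
        node["is_word"] = True
    return root

def count_words(node):
    """Store at each node the number of complete words at or below it."""
    node["count"] = int(node["is_word"]) + sum(count_words(c) for c in node["children"].values())
    return node["count"]

def perfect_hash(root, w):
    """Index of w in the sorted wordlist."""
    index, node = 0, root
    for ch in w:
        index += int(node["is_word"])                 # a word ending here sorts before w
        index += sum(child["count"]                   # words branching off on a smaller letter
                     for sym, child in node["children"].items() if sym < ch)
        node = node["children"][ch]
    return index

trie = build_trie(WORDS)
count_words(trie)
assert [perfect_hash(trie, w) for w in WORDS] == [0, 1, 2, 3, 4, 5]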

Slide 6: Example: Unweighted transducer
The problem of morphology ("word shape") - an area of linguistics. [Figure: a syntax-tree fragment, VP [head=vouloir, ...] dominating V [head=vouloir, tense=Present, num=SG, person=P3] ..., above the surface word veut.]

Slide 7: Example: Unweighted transducer
A finite-state transducer relates the canonical form plus inflection codes, vouloir +Pres +Sing +P3, to the inflected form veut (the relevant path through the machine). [Same tree fragment as the previous slide.] (Slide courtesy of L. Karttunen, modified.)

Slide 8: Example: Unweighted transducer (continued)
- Bidirectional: generation or analysis
- Compact and fast
- Xerox sells transducers for about 20 languages, including English, German, Dutch, French, Italian, Spanish, Portuguese, Finnish, Russian, Turkish, Japanese, ...
- Research systems for many other languages, including Arabic, Malay
[Figure: the transducer again, relating vouloir+Pres+Sing+P3 to veut along the relevant path.] (Slide courtesy of L. Karttunen.)
A toy illustration of the bidirectional use follows.
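
A real morphological transducer encodes the relation compactly as an automaton; the sketch below just stores a few (lexical, surface) pairs taken from these slides in a table, so that generation and analysis become a lookup in the appropriate direction.

# A toy illustration of "bidirectional: generation or analysis".
# The (lexical, surface) pairs are taken from slides 7 and 23; in a real
# system the relation is an automaton, not an explicit table.

from collections import defaultdict

PAIRS = [
    ("vouloir+Pres+Sing+P3", "veut"),
    ("payer+IndP+SG+P1", "paie"),
    ("payer+IndP+SG+P1", "paye"),
    ("suivre+IndP+SG+P1", "suis"),
    ("être+IndP+SG+P1", "suis"),
]

generate = defaultdict(set)   # lexical form -> surface forms
analyze = defaultdict(set)    # surface form -> lexical analyses
for lex, surf in PAIRS:
    generate[lex].add(surf)
    analyze[surf].add(lex)

print(generate["payer+IndP+SG+P1"])   # generation: {'paie', 'paye'}
print(analyze["suis"])                # analysis is ambiguous: suivre or être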

Slide 9: Regular Relation (of strings)
- Relation: like a function, but multiple outputs ok
- Regular: finite-state
- Transducer: automaton w/ outputs
- b → ?   a → ?   aaaaa → ?   (answers: b → {b}, a → {}, aaaaa → {ac, aca, acab, acabc})
- Invertible?
- Closed under composition?
[Figure: a small transducer with arcs such as b:b, a:a, a:ε, a:c, ?:c, ?:a, ?:b.]

Slide 10: Regular Relation (of strings), weighted
- Can weight the arcs
- b → {b}   a → {}   aaaaa → {ac, aca, acab, acabc}
- How to find best outputs? For aaaaa? For all inputs at once?
[Figure: the same transducer with weights on its arcs.]
A sketch of finding the best output for one input follows.
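
One way to find the best output for a given input is a shortest-path search over (position in input, state, output-so-far) configurations. The toy weighted transducer below (delete an a for free, or rewrite it as x at cost 1; copy b for free) is invented for illustration and is not the machine in the slide's figure.

# The machine is a made-up toy, not the transducer in the figure: each 'a'
# is either rewritten as 'x' at cost 1 or deleted at cost 0, and 'b' is
# copied for free.  best_output runs a best-first (Dijkstra-style) search.

import heapq

START, FINAL = 0, {0}
ARCS = {  # state -> list of (input symbol, output symbol, weight, next state)
    0: [("a", "x", 1.0, 0), ("a", "", 0.0, 0), ("b", "b", 0.0, 0)],
}

def best_output(word):
    """Return (cheapest output string, its weight), or None if rejected."""
    heap = [(0.0, 0, START, "")]            # (cost so far, position, state, output so far)
    seen = {}
    while heap:
        cost, pos, state, out = heapq.heappop(heap)
        if pos == len(word) and state in FINAL:
            return out, cost                # popped in cost order, so this is optimal
        if seen.get((pos, state, out), float("inf")) < cost:
            continue                        # stale queue entry
        if pos < len(word):
            for insym, outsym, w, nxt in ARCS.get(state, []):
                if insym == word[pos]:
                    new = (cost + w, pos + 1, nxt, out + outsym)
                    if new[0] < seen.get(new[1:], float("inf")):
                        seen[new[1:]] = new[0]
                        heapq.heappush(heap, new)
    return None

print(best_output("aba"))   # ('b', 0.0): the cheapest path deletes both a's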

Slide 11: Function from strings to ...

              Acceptors (FSAs)     Transducers (FSTs)
  Unweighted  {false, true}        strings
  Weighted    numbers              (string, num) pairs

[Figure: example machines - an acceptor with arcs a, c, ε; a transducer with arcs a:x, c:z, ε:y; and weighted versions with labels a/.5, c/.7, ε/.5, .3 and a:x/.5, c:z/.7, ε:y/.5, .3.]

Slide 12: Sample functions

              Acceptors (FSAs)                          Transducers (FSTs)
  Unweighted  Grammatical?                              Markup, Correction, Translation
  Weighted    How grammatical? Better, how likely?      Good markups, Good corrections, Good translations

Slide 13: Functions
[Figure: the input ab?d passed through f, then g (written f ∘ g), yielding abcd.]

Slide 14: Functions
Function composition: f ∘ g [first f, then g - an intuitive notation, but the opposite of the traditional math notation]. [Figure: ab?d fed through the composed function.]
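
The convention can be shown in a few lines of Python; the two toy functions are invented purely to make the order of application visible, reusing the slides' ab?d example.

# The slide's f ∘ g means "apply f first, then g", i.e. x -> g(f(x)),
# which is the reverse of the usual mathematical reading of ∘.

def compose(f, g):
    """The slide's f ∘ g: apply f first, then g."""
    return lambda x: g(f(x))

def f(s):
    return s.replace("?", "c")   # toy: fill in the unknown character

def g(s):
    return s.upper()             # toy: upper-case the result

print(compose(f, g)("ab?d"))     # 'ABCD': f fills in 'c', then g upper-cases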

Slide 15: From Functions to Relations
[Figure: ab?d now maps under f ∘ g to a set of outputs: abcd, abed, abjd, ...]

Slide 16: From Functions to Relations
Relation composition: f ∘ g. [Figure: the outputs for ab?d now carry weights such as 4, 2, ..., 8.]
A sketch of relation composition on finite sets of pairs follows.
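
On finite sets of pairs, relation composition is a one-line set comprehension: (x, z) is in f ∘ g iff some intermediate y has (x, y) in f and (y, z) in g (again "f first, then g"). The pairs below are invented; ab?d and its outputs echo the slides' example.

# Relation composition on explicit finite relations (sets of pairs).

def compose_rel(f, g):
    """(x, z) is in f ∘ g iff some y has (x, y) in f and (y, z) in g."""
    return {(x, z) for (x, y1) in f for (y2, z) in g if y1 == y2}

f = {("ab?d", "abcd"), ("ab?d", "abed")}   # one input, several outputs
g = {("abcd", "ABCD"), ("abed", "ABED")}

print(compose_rel(f, g))   # {('ab?d', 'ABCD'), ('ab?d', 'ABED')}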

Slide 17: From Functions to Relations
Often in NLP, all of the functions or relations involved can be described as finite-state machines, and manipulated using standard algorithms. [Figure: the composed relation applied to ab?d; pick the best output.]

Slide 18: Inverting Relations
[Figure: f ∘ g mapping ab?d to abcd, abed, abjd, ..., as before.]

Slide 19: Inverting Relations
[Figure: the same diagram run in reverse, with f and g replaced by f⁻¹ and g⁻¹.]

Slide 20: Inverting Relations
(f ∘ g)⁻¹ = g⁻¹ ∘ f⁻¹
[Figure: the weighted outputs 4, 2, ..., 8 for ab?d, now read in the inverse direction.]
A quick finite check of this identity follows.
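
The identity is easy to sanity-check on small finite relations (the pairs are arbitrary), reusing the same "f first, then g" composition convention.

# Check (f ∘ g)⁻¹ = g⁻¹ ∘ f⁻¹ on small explicit relations.

def compose_rel(f, g):
    return {(x, z) for (x, y1) in f for (y2, z) in g if y1 == y2}

def invert(r):
    return {(y, x) for (x, y) in r}

f = {("a", "1"), ("a", "2"), ("b", "2")}   # arbitrary small relations
g = {("1", "X"), ("2", "Y")}

assert invert(compose_rel(f, g)) == compose_rel(invert(g), invert(f))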

Slide 21: Building a lexical transducer
The regular-expression lexicon (big | clear | clever | ear | fat | ...) is compiled into a lexicon FSA; regular expressions for rules are compiled into rule FSTs; composing the lexicon FSA with the composed rule FSTs yields the lexical transducer (a single FST). [Figure: the compilation pipeline, the lexicon network from slide 3, and one path through the result relating big +Adj +Comp to the surface form bigger.] (Slide courtesy of L. Karttunen, modified.)

Slide 22: Building a lexical transducer (continued)
- Actually, the lexicon must contain elements like big +Adj +Comp
- So write it as a more complicated expression:
      (big | clear | clever | fat | ...) +Adj (ε | +Comp | +Sup)   [adjectives]
    | (ear | father | ...) +Noun (+Sing | +Pl)                     [nouns]
    | ...
- Q: Why do we need a lexicon at all?
[Figure: the lexicon FSA from slide 3 for big | clear | clever | ear | fat | ...] (Slide courtesy of L. Karttunen, modified.)
A rough regular-expression rendering of this lexicon follows.
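
For the lexical (upper) side only, the slide's expression can be approximated with an ordinary Python regular expression. The word and tag inventory is truncated exactly as on the slide, tags are written without spaces for convenience, and of course this captures only the lexicon language, not the transducer that maps it to surface forms.

# The lexical-side language of slide 22, rendered as a Python regex.

import re

LEXICON = re.compile(
    r"(?:big|clear|clever|fat)\+Adj(?:\+Comp|\+Sup)?"   # adjectives, optionally +Comp or +Sup
    r"|(?:ear|father)\+Noun(?:\+Sing|\+Pl)"             # nouns, +Sing or +Pl
)

print(bool(LEXICON.fullmatch("big+Adj+Comp")))   # True
print(bool(LEXICON.fullmatch("ear+Noun+Pl")))    # True
print(bool(LEXICON.fullmatch("big+Noun+Pl")))    # False: not described by the lexicon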

Slide 23: Weighted French Transducer
Weighted version of the transducer: assigns a weight to each string pair.
Upper language (lexical side): payer+IndP+SG+P1; suivre+Imp+SG+P2, suivre+IndP+SG+P2, suivre+IndP+SG+P1, être+IndP+SG+P1.
Lower language (surface side): paie, paye; suis.
[Figure: weighted arcs relate payer+IndP+SG+P1 to both paie and paye, and relate all four analyses above to suis.] (Slide courtesy of L. Karttunen, modified.)