Finite State Transducers for Morphological Parsing


Finite State Transducers for Morphological Parsing
CSA3050: NLP Algorithms, 5 November 2002

Resumé
- FSAs are equivalent to regular languages.
- FSTs are equivalent to regular relations (over pairs of regular languages).
- FSTs are like FSAs but with complex labels.
- We can use FSTs to transduce between the surface and lexical levels.

Dotted Pair Notation
1) FSA recogniser for "fox": f o x
2) FST transducers for fox/fox and goose/geese: f:f o:o x:x and g:g o:e o:e s:s e:e
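
As a minimal illustration (a Python sketch with assumed names, not the slides' own formalism), a transducer path can be represented as a list of lexical:surface pairs, and either tape read off it:

    # Each arc label x:y is represented as a (lexical, surface) pair.
    fox   = [("f", "f"), ("o", "o"), ("x", "x")]
    goose = [("g", "g"), ("o", "e"), ("o", "e"), ("s", "s"), ("e", "e")]

    def lexical_tape(path):
        return "".join(lex for lex, _ in path)

    def surface_tape(path):
        return "".join(sur for _, sur in path)

    print(lexical_tape(goose), "->", surface_tape(goose))  # goose -> geese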

Dotted Pair Notation (2)
- By convention, x:y pairs lexical symbol x with surface symbol y.
- Within the context of FSTs we often encounter "default pairs" of the form x:x; these are usually written simply as "x".
- With this shorthand, the goose/geese transducer is written: g o:e o:e s e

FSA for Number Inflection
How can we augment this FSA to produce an analysis?

3 Steps
1. Create a transducer Tnum for noun number inflection. This will add number and category information given word classes as input.
2. Create a transducer Tstems mapping words to word classes.
3. Hook the two together.

Tnum Example
Tnum maps the lexical tape (reg-noun-stem +N +PL) to the intermediate tape (reg-noun-stem ^ s #), with +N realised as ε.

1. Tnum: Noun Number Inflection
Multi-character symbols: the morpheme boundary ^ and the word boundary #.
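
A minimal sketch of what Tnum computes for regular nouns (assumed function name and whole-string interface; the real machine works symbol by symbol):

    def tnum(stem_class, features):
        # Map lexical features to the intermediate tape, using the
        # multi-character symbols ^ (morpheme boundary) and # (word boundary).
        if features == "+N +SG":
            return stem_class + "#"      # singular: no affix
        if features == "+N +PL":
            return stem_class + "^s#"    # plural: morpheme boundary + s
        raise ValueError("unhandled features: " + features)

    print(tnum("reg-noun-stem", "+N +PL"))  # reg-noun-stem^s#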

Tstems Example
Tstems maps the intermediate tape (reg-noun-stem ... #) to the surface tape, expanding the stem class into individual stems: d:d o:o g:g (dog) or f:f o:o x:x (fox).

Tstems Example (2)
For the irregular class, Tstems maps the intermediate tape (irreg-pl-noun-form ... #) to the surface tape: m o:i u:ε s e (mouse/mice) or s h e e p (sheep).

2. Tstems Lexicon
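
A toy version of the lexicon behind Tstems (illustrative entries only; class names other than reg-noun-stem and irreg-pl-noun-form are assumed): each stem class on the intermediate tape expands to a set of surface stems.

    stem_classes = {
        "reg-noun-stem":      ["dog", "fox", "cat"],
        "irreg-sg-noun-form": ["mouse", "goose", "sheep"],
        "irreg-pl-noun-form": ["mice", "geese", "sheep"],
    }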

Hooking Together
There are two ways to hook the two transducers together:
- Cascading: feeding the output of one transducer into the input of the other and running them in series.
- Composition: composing the two transducers together mathematically to create a third, equivalent transducer.

Hooking Together: Cascading
The lexical tape (reg-noun-stem +N +PL) passes through Tnum to give the intermediate tape (reg-noun-stem ^ s #), which passes through Tstems to give the surface tape (dog or fox, followed by s #).
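
A minimal sketch of cascading (assumed names, operating on whole strings rather than symbol by symbol): the output tape of Tnum is fed to Tstems.

    def tnum(lexical):
        # e.g. "reg-noun-stem +N +PL" -> "reg-noun-stem^s#"
        return lexical.replace(" +N +PL", "^s#").replace(" +N +SG", "#")

    def tstems(intermediate, stem="fox"):
        # e.g. "reg-noun-stem^s#" -> "fox^s#" (boundary symbols retained)
        return intermediate.replace("reg-noun-stem", stem)

    print(tstems(tnum("reg-noun-stem +N +PL")))  # fox^s#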

Composition of Relations
Let R and S be binary relations. The composition of R and S, written R ∘ S, is defined by:
(a,c) ∈ R ∘ S if and only if there is some b such that (a,b) ∈ R and (b,c) ∈ S.
Transducers can also be composed.
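
The definition carries over directly to finite relations represented as sets of pairs; a minimal sketch (the example pairs are illustrative):

    def compose(R, S):
        # (a, c) is in R o S iff some b links (a, b) in R to (b, c) in S.
        return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

    R = {("reg-noun-stem +N +PL", "reg-noun-stem^s#")}                    # Tnum, as pairs
    S = {("reg-noun-stem^s#", "dog^s#"), ("reg-noun-stem^s#", "fox^s#")}  # Tstems, as pairs
    print(compose(R, S))  # pairs ("reg-noun-stem +N +PL", "dog^s#") and (..., "fox^s#")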

Tnum ∘ Tstems

English Spelling Rules
- consonant doubling: beg / begging
- y replacement: try / tries
- k insertion: panic / panicked
- e deletion: make / making
- e insertion: watch / watches
Each rule can be stated in more detail ...

e Insertion Rule
Insert an e on the surface tape just when the lexical tape has a morpheme ending in x, s, z, or ch and the next (and final) morpheme is -s.
Stated formally:  ε → e / [x|s|z|ch] ^ __ s #
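
A minimal regex sketch of the rule's effect on the intermediate tape (the rule itself is an FST; this only mimics its input/output behaviour for this one context):

    import re

    def e_insertion(intermediate):
        # Insert e after x, s, z or ch when the morpheme boundary ^ is
        # followed by s and the word boundary #.
        return re.sub(r"(ch|[xsz])\^(?=s#)", r"\1^e", intermediate)

    print(e_insertion("fox^s#"))    # fox^es#
    print(e_insertion("watch^s#"))  # watch^es#
    print(e_insertion("dog^s#"))    # dog^s#  (unchanged)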

e Insertion over 3 Levels
The rule corresponds to the mapping between the intermediate and surface levels.

e Insertion as an FST

Incorporating Spelling Rules
- Spelling rules, each corresponding to an FST, can be run in parallel provided that they are "aligned".
- The set of spelling rules is positioned between the intermediate level and the surface level.
- Parallel execution of FSTs can be carried out:
  - by simulation: in this case the FSTs must first be aligned;
  - by first constructing a single FST corresponding to their intersection.
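
A minimal sketch of the parallel reading (assumed names, and much simplified: real rule transducers operate on aligned symbol pairs, not whole strings): each rule is modelled as a predicate over (intermediate, surface) pairs, and a pair is licensed only if every rule accepts it.

    import re

    def e_insertion_ok(intermediate, surface):
        # Simplified stand-in for the e-insertion rule transducer: accept the
        # pair iff the surface form is what the rule produces.
        return surface == re.sub(r"(ch|[xsz])\^(?=s#)", r"\1^e", intermediate)

    def licensed(intermediate, surface, rules):
        # "Parallel" execution: every rule transducer must accept the pair.
        return all(rule(intermediate, surface) for rule in rules)

    print(licensed("fox^s#", "fox^es#", [e_insertion_ok]))  # True
    print(licensed("fox^s#", "fox^s#",  [e_insertion_ok]))  # False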

Putting It All Together
Execution of the spelling-rule transducers FSTi takes place in parallel.

Kaplan and Kay vs. the Xerox View
- Kaplan and Kay: the FSTi are aligned but kept separate.
- Xerox: the FSTi are intersected together.

Operations over FSTs
We can perform operations over FSTs which yield other FSTs:
- Inversion
- Union
- Composition
The inversion of T, written T⁻¹, simply computes the inverse mapping to T.

Inversion
T maps the lexical tape (c a t ^ PL) to the surface tape; T⁻¹ maps the surface tape back to the lexical tape (c a t ^ PL).

Inversion (2)
To invert a transducer, either:
- we switch the order of the complex symbols, i.e. every i:o becomes o:i, or
- we leave the transducer alone and slightly change the parsing algorithm.
Practical consequences:
- The transducer is reversible.
- We can use exactly the same transducer to perform either analysis or generation.
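
A minimal sketch of inversion on the pair-list representation from the earlier sketch: every i:o pair becomes o:i.

    def invert(path):
        return [(sur, lex) for (lex, sur) in path]

    goose = [("g", "g"), ("o", "e"), ("o", "e"), ("s", "s"), ("e", "e")]
    print(invert(goose))  # now maps surface "geese" back to lexical "goose"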

Closure Properties of FSTs
Relations computed by FSTs are closed under:
- inversion
- union
- composition
They are not closed (in general) under:
- intersection (although intersection is possible provided we restrict the class of transducers, e.g. to equal-length, ε-free transducers)
- complementation
- subtraction