Published by Meryl Goodman. Modified over 9 years ago.
1
Finite State Transducers for Morphological Parsing
CSA3050: NLP Algorithms
2
Résumé
FSAs recognise exactly the regular languages.
FSTs compute exactly the regular relations (relations over pairs of regular languages).
FSTs are like FSAs, but each arc carries a complex label (a pair of symbols).
We can use FSTs to transduce between the surface and lexical levels.
3
Dotted Pair Notation
1) FSA recogniser for "fox": f o x
2) FST transducers for fox/fox and goose/geese: f:f o:o x:x and g:g o:e o:e s:s e:e
4
Dotted Pair Notation (2)
By convention, x:y pairs lexical symbol x with surface symbol y.
Within FSTs we often encounter "default pairs" of the form x:x. These are often written simply as "x".
Example (goose/geese): g o:e o:e s e
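The dotted pair notation can be made concrete with a small sketch. It treats a transducer as a single linear path of lexical:surface pairs, which is a simplification of a general FST. The helper names `parse_pairs` and `transduce` are hypothetical, and the goose/geese path is written with one o:e pair for each o, which is an assumption about the intended path.

```python
def parse_pairs(spec):
    """Turn a label string such as 'g o:e o:e s e' into (lexical, surface) pairs.

    A bare symbol x is read as the default pair x:x.
    """
    pairs = []
    for label in spec.split():
        lexical, sep, surface = label.partition(":")
        pairs.append((lexical, surface if sep else lexical))
    return pairs


def transduce(pairs, lexical):
    """Return the surface string for `lexical`, or None if the path rejects it."""
    if len(lexical) != len(pairs):
        return None
    out = []
    for symbol, (lex, surf) in zip(lexical, pairs):
        if symbol != lex:
            return None  # no arc for this lexical symbol
        out.append(surf)
    return "".join(out)


fox = parse_pairs("f o x")            # all default pairs
goose = parse_pairs("g o:e o:e s e")  # two non-default pairs

print(transduce(fox, "fox"))      # fox
print(transduce(goose, "goose"))  # geese
```

With only default pairs the transducer behaves like the FSA recogniser: it accepts "fox" and returns it unchanged.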
5
FSA for Number Inflection
How can we augment this to produce an analysis?
6
3 Steps
1. Create a transducer Tnum for noun number inflection. Given word classes as input, it adds number and category information.
2. Create a transducer Tstems mapping words to word classes.
3. Hook the two together.
7
Tnum example
lexical: reg-noun-stem +N +PL
intermediate: reg-noun-stem ^ s #
8
1. Tnum: Noun Number Inflection
Notation used in the Tnum diagram:
multi-character symbols (e.g. +N, +PL)
morpheme boundary: ^
word boundary: #
9
Tstems example
Tstems maps between the "intermediate" level (reg-noun-stem, followed by #) and the "surface" level.
Paths for reg-noun-stem: d:d o:o g:g (dog) and f:f o:o x:x (fox)
10
Tstems example
Tstems maps between the "intermediate" level (irreg-pl-noun-form, followed by #) and the "surface" level.
Paths for irreg-pl-noun-form: m o:i u:ε s e and s h e e p
11
2. Tstems Lexicon
12
Hooking Together
There are two ways to hook the two transducers together:
Cascading: feeding the output of one transducer to the input of the other and running them in series.
Composition: composing the two transducers mathematically to create a third, equivalent transducer.
13
Hooking Together: cascading
lexical: reg-noun-stem +N +PL
(Tnum)
intermediate: reg-noun-stem ^ s #
(Tstems)
surface: dog / fox s #
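Cascading can be sketched by modelling each transducer as an ordinary function over token lists and feeding the output of the first into the second. This is an illustration, not an FST implementation. The names `t_num`, `t_stems`, `cascade` and `STEMS` are hypothetical, only the plural case from the slide is covered, and passing the boundary symbols through Tstems unchanged is an assumption.

```python
# Hypothetical lexicon: word class -> stems (dog and fox are the slide's examples).
STEMS = {"reg-noun-stem": ["dog", "fox"]}


def t_num(lexical):
    """Lexical level -> intermediate level (plural nouns only in this sketch)."""
    stem_class, *features = lexical
    if features != ["+N", "+PL"]:
        raise ValueError(f"unsupported features: {features}")
    return [stem_class, "^", "s", "#"]


def t_stems(intermediate):
    """Replace the word class by each stem it covers; one result per stem."""
    stem_class, *rest = intermediate
    return [[stem, *rest] for stem in STEMS[stem_class]]


def cascade(lexical):
    """Run Tnum and then Tstems in series."""
    return t_stems(t_num(lexical))


for tokens in cascade(["reg-noun-stem", "+N", "+PL"]):
    print(" ".join(tokens))
# dog ^ s #
# fox ^ s #
```

The intermediate token list is built and then consumed at run time. Composition would instead precompute a single mapping with the same overall behaviour.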
14
Composition of Relations
Let R and S be binary relations. The composition of R and S, written $R \circ S$, is defined by: $(a, c) \in R \circ S$ if and only if there exists some $b$ such that $(a, b) \in R$ and $(b, c) \in S$.
Transducers can also be composed.
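For finite relations, composition can be computed directly from the definition. The sketch below stores a relation as a Python set of pairs. The function name `compose` and the example strings are illustrative assumptions, and real transducer composition works on the automata rather than on enumerated pairs.

```python
def compose(r, s):
    """Return R o S: (a, c) is included iff some b has (a, b) in r and (b, c) in s."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


# Illustrative relations: lexical -> intermediate, and intermediate -> surface.
R = {("fox +N +PL", "fox^s#"), ("dog +N +PL", "dog^s#")}
S = {("fox^s#", "foxes"), ("dog^s#", "dogs")}

print(sorted(compose(R, S)))
# [('dog +N +PL', 'dogs'), ('fox +N +PL', 'foxes')]
```

The intermediate strings act as the shared middle element b and do not appear in the composed relation.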
15
Tnum ∘ Tstems
16
English Spelling Rules
consonant doubling: beg/begging
y replacement: try/tries
k insertion: panic/panicked
e deletion: make/making
e insertion: watch/watches
Each rule can be stated in more detail ...
17
e Insertion Rule
Insert an e on the surface tape just when the lexical tape has a morpheme ending in x, s, z, or ch and the next and final morpheme is -s.
Stated formally: ε → e / [x|s|z|ch] ^ __ s #
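The effect of the rule on intermediate strings can be approximated with a regular expression. This is a stand-in for the rule's FST, not the FST itself. The names `E_INSERTION`, `apply_e_insertion` and `to_surface` are hypothetical, and deleting the boundary symbols ^ and # to reach the surface form is an assumption made for the example.

```python
import re

# Left context: x, s, z or ch followed by the morpheme boundary ^.
# Right context (lookahead): the final morpheme s followed by the word boundary #.
E_INSERTION = re.compile(r"(x|s|z|ch)\^(?=s#)")


def apply_e_insertion(intermediate):
    """Insert e after the morpheme boundary wherever the rule's context matches."""
    return E_INSERTION.sub(r"\1^e", intermediate)


def to_surface(intermediate):
    """Apply the rule, then drop the boundary symbols."""
    return apply_e_insertion(intermediate).replace("^", "").replace("#", "")


print(apply_e_insertion("fox^s#"))  # fox^es#
print(to_surface("fox^s#"))         # foxes
print(to_surface("watch^s#"))       # watches
print(to_surface("dog^s#"))         # dogs
```

The context of the rule is not met in "dog^s#", so that string passes through unchanged apart from the boundary symbols.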
18
e insertion over 3 levels
The rule corresponds to the mapping between the surface and intermediate levels.
19
e insertion as an FST
20
Incorporating Spelling Rules
Spelling rules, each corresponding to an FST, can be run in parallel provided that they are "aligned".
The set of spelling rules is positioned between the surface level and the intermediate level.
Parallel execution of FSTs can be carried out:
by simulation: in this case the FSTs must first be aligned;
by first constructing a single FST corresponding to their intersection.
21
Putting it all together
Execution of the FSTi takes place in parallel.
22
Kaplan and Kay: The Xerox View
Two configurations are shown: the FSTi aligned but separate, and the FSTi intersected together.
23
Operations over FSTs
We can perform operations over FSTs which yield other FSTs:
Inversion
Union
Composition
The inversion of T, written T⁻¹, simply computes the inverse mapping to T.
24
Inversion
[Figure: T and T⁻¹, each drawn between a lexical tape and a surface tape, with the example c a t ^ PL.]
25
Inversion
To invert a transducer:
we switch the order of the complex symbols, i.e. every i:o becomes o:i,
or we leave the transducer alone and slightly change the parsing algorithm.
Practical consequences:
The transducer is reversible.
We can use exactly the same transducer to perform either analysis or generation.
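The first option, swapping every i:o to o:i, can be sketched on a single linear path of pairs. The helpers `run_path` and `invert` are hypothetical names, the goose/geese pairs are an illustrative choice, and a full transducer would apply the same swap to every arc of its graph.

```python
def run_path(pairs, text):
    """Map `text` through a linear path of (input, output) pairs, or return None."""
    if len(text) != len(pairs):
        return None
    out = []
    for symbol, (inp, outp) in zip(text, pairs):
        if symbol != inp:
            return None  # the path has no arc for this symbol
        out.append(outp)
    return "".join(out)


def invert(pairs):
    """Swap the two sides of every pair: each i:o becomes o:i."""
    return [(o, i) for (i, o) in pairs]


# Lexical:surface pairs for goose/geese.
GOOSE = [("g", "g"), ("o", "e"), ("o", "e"), ("s", "s"), ("e", "e")]

print(run_path(GOOSE, "goose"))          # geese (generation)
print(run_path(invert(GOOSE), "geese"))  # goose (analysis)
```

Inverting twice gives back the original path, which is why one transducer is enough for both directions.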
26
Closure Properties of FSTs
Relations computed by FSTs are closed under:
inversion
union
composition
They are not closed (in general) under:
intersection (however, intersection is possible provided that we restrict the class of transducers)
complementation
subtraction
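A standard textbook counterexample, not taken from the slides, shows why intersection fails in general. Both relations below can be computed by an FST: for R_1, a loop on a:b followed by a loop on ε:c; for R_2, a loop on ε:b followed by a loop on a:c.

```latex
R_1 = \{\, (a^n,\; b^n c^m) : n, m \ge 0 \,\}, \qquad
R_2 = \{\, (a^n,\; b^m c^n) : n, m \ge 0 \,\}

R_1 \cap R_2 = \{\, (a^n,\; b^n c^n) : n \ge 0 \,\}
```

The output side of the intersection is the language of strings b^n c^n, which is not regular. The output side of any relation computed by an FST is regular, so no FST computes this intersection. The counterexample relies on ε pairs, which suggests why restricting the class of transducers can restore closure.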