Published by Dwain Matthews. Modified over 8 years ago.
1
Finite State Morphology
Alexander Fraser & Luisa Berlanda
fraser@cis.uni-muenchen.de
CIS, Ludwig-Maximilians-Universität München
Computational Morphology and Electronic Dictionaries
SoSe 2016
2016-05-25
2
SFST
- programming language for developing finite-state transducers
- compiler which translates programs to transducers
- tools for
  - applying transducers
  - printing transducers
  - comparing transducers
3
SFST Example Session
    > echo "Hello\ World\!" > test.fst      (storing a small test program)
    > fst-compiler test.fst test.a          (calling the compiler)
    test.fst: 2
    > fst-mor test.a                        (interactive transducer usage)
    reading transducer...                   (transducer is loaded)
    finished.
    analyze> Hello World!                   (input)
    Hello World!                            (recognised)
    analyze> Hello World                    (another input)
    no result for Hello World               (not recognised)
    analyze> q                              (terminate program)
4
Transducer Variables
    $Vroot$ = walk | talk | bark          % list of verbs with regular inflection
    $Vinfl$ = :<> (\                      % regular verbal inflection
        [ ]:<> |\
        { }:{s} |\
        { }:{ing} |\
        { }:{ed})
    $Nroot$ = hat | head | trick          % list of nouns with regular inflection
    $Ninfl$ = :<> (\                      % regular nominal inflection
        { }:{} |\
        { }:{s})
    $Vroot$ $Vinfl$ | $Nroot$ $Ninfl$     % combine stems and inflectional endings
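The effect of concatenating stems with inflectional endings can be mimicked in plain Python (this is not SFST code). The analysis tags are not visible in the slide text, so the tag names below (<V>, <inf>, <3s> and so on) are hypothetical placeholders, as are the names build_lexicon and analyze. A minimal sketch under those assumptions:

```python
# Hypothetical tags: the slide's own tag names are not visible in the text.
VROOT = ["walk", "talk", "bark"]
NROOT = ["hat", "head", "trick"]

# (analysis tags, surface ending) pairs, one per inflectional alternative
VINFL = [("<V><inf>", ""), ("<V><3s>", "s"), ("<V><gerund>", "ing"), ("<V><past>", "ed")]
NINFL = [("<N><sg>", ""), ("<N><pl>", "s")]


def build_lexicon():
    """Enumerate every surface form together with the analyses that produce it."""
    table = {}
    for roots, endings in ((VROOT, VINFL), (NROOT, NINFL)):
        for root in roots:
            for tags, suffix in endings:
                table.setdefault(root + suffix, []).append(root + tags)
    return table


LEXICON = build_lexicon()


def analyze(word):
    """Return all analyses of a surface form (empty list if it is not recognised)."""
    return LEXICON.get(word, [])
```

Under these assumptions, analyze("walked") yields one verbal analysis and analyze("hats") one nominal analysis, while an unknown form such as "walken" yields an empty list, which corresponds to the "no result" message of fst-mor.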
5
Homework
Write a pipeline that maps all letters to lowercase and orders them backwards.
Family Huber has three children. Their first child is called Mia, the next one Toni and the last one Pia. Family Band has three children as well: Michael, Paul and Pia. Write a program that can tell us the following details about the children: which family he/she belongs to, whether it is a son or a daughter, and whether he/she was the first, second or third child.
Output format:
6
Solution
Write a pipeline that maps all letters to lowercase and orders them backwards
    [a-z]:[A-Z]* || [ZYXWVUTSRQPONMLKJIHGFEDCBA]:[A-Z]*
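The two composed steps can be traced in plain Python (not SFST). The sketch reads "orders them backwards" as replacing each letter with its counterpart from the reversed alphabet (A with Z, B with Y, and so on), which is what the character ranges in the solution suggest. It also assumes analysis mode, where the uppercase right-hand side is the input. The names MIRROR and analyze are illustrative only.

```python
import string

UPPER = string.ascii_uppercase

# Second transducer, read in analysis mode: each uppercase letter is paired
# with its mirror image in the alphabet (A<->Z, B<->Y, ...).
MIRROR = str.maketrans(UPPER, UPPER[::-1])


def analyze(surface):
    """Mirror the letters, then lowercase them (the first transducer).

    Returns None if the input contains anything but uppercase A-Z,
    because neither transducer has a symbol pair for it.
    """
    if any(ch not in UPPER for ch in surface):
        return None
    return surface.translate(MIRROR).lower()


print(analyze("HELLO"))  # svool
```

Mixed-case input such as "Hello" is rejected in this sketch, since the surface side of the pipeline only contains uppercase letters.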
7
Solution
    { }:{} ({ }:{mia} |{ }:{toni} |{ }:{pia}) |\
    { }:{} ({ }:{michael} |{ }:{paul} |{ }:{pia})
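The intended behaviour of this solution can be illustrated with a lookup table in plain Python (not SFST). The tag names are hypothetical, because the tags are not visible in the slide text. The genders are guessed from the first names, since the homework text does not state them; Toni in particular is an assumption.

```python
# Hypothetical tags; genders are assumptions inferred from the names.
FAMILIES = {
    "Huber": [("mia", "daughter"), ("toni", "son"), ("pia", "daughter")],
    "Band": [("michael", "son"), ("paul", "son"), ("pia", "daughter")],
}
ORDINALS = ["first", "second", "third"]


def build_table():
    """Map each name to every analysis (family, gender, birth order) it can have."""
    table = {}
    for family, children in FAMILIES.items():
        for position, (name, gender) in enumerate(children):
            tags = f"<{family}><{gender}><{ORDINALS[position]}>"
            table.setdefault(name, []).append(tags)
    return table


TABLE = build_table()


def analyze(name):
    """Return all analyses of a name (empty list if the name is unknown)."""
    return TABLE.get(name, [])
```

Because both families have a child called Pia, analyze("pia") returns two analyses. This is the same kind of ambiguity a transducer reports when several paths match one input.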
8
Lexicon Files
    $Vroot$ = "verb.lex"
    $Nroot$ = "noun.lex"
The command "filename" reads the respective file line by line and forms the disjunction of all lines. Only the symbols : \ % are treated as operators.
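Reading a lexicon file as a disjunction of its lines can be pictured in plain Python (not SFST) as building a set of strings. This simplified sketch ignores the operator symbols mentioned above and treats every line literally. The function name read_lexicon and the file contents are made up for illustration.

```python
import tempfile
from pathlib import Path


def read_lexicon(path):
    """Return the set of non-empty lines of a lexicon file.

    The set stands for the disjunction line1 | line2 | ... of all entries.
    """
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


# Demonstration with a temporary file standing in for verb.lex.
with tempfile.TemporaryDirectory() as tmp:
    lex_path = Path(tmp) / "verb.lex"
    lex_path.write_text("walk\ntalk\nbark\n", encoding="utf-8")
    VROOT = read_lexicon(lex_path)

print(sorted(VROOT))  # ['bark', 'talk', 'walk']
```

Keeping the word list in a separate file means new stems can be added without touching the grammar itself.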
9
Lexicon Files
    $Vroot$ = "verb.lex"
    $Vinfl$ = :<> (\
        [ ]:<> |\
        { }:{s} |\
        { }:{ing} |\
        { }:{ed})
    $Nroot$ = "noun.lex"
    $Ninfl$ = :<> (\
        { }:{} |\
        { }:{s})
    $Vroot$ $Vinfl$ | $Nroot$ $Ninfl$
10
Symbol Set Variables
    #cons# = bcdfghjklmnpqrstvwxzß
    #CONS# = BCDFGHJKLMNPQRSTVWXZß
    #Cons# = #cons# #CONS#
    #vowel# = aeiouäöü
    #VOWEL# = AEIOUÄÖÜ
    #Vowel# = #vowel# #VOWEL#
    #letter# = #vowel# #cons#
    #LETTER# = #VOWEL# #CONS#
    [#LETTER# #letter#]:[#letter# #LETTER#]*
What would you get for Hallo and Ruß?
11
Solution
What would you get for Hallo and Ruß?
    Hallo → hALLO
    Ruß → rUß
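The result can be checked with a plain Python sketch (not SFST) that pairs the two symbol lists position by position, as the character-class mapping does. The variable names mirror the symbol set variables; SWAP and swap_case are illustrative names.

```python
VOWEL_LOWER = "aeiouäöü"
VOWEL_UPPER = "AEIOUÄÖÜ"
CONS_LOWER = "bcdfghjklmnpqrstvwxzß"
CONS_UPPER = "BCDFGHJKLMNPQRSTVWXZß"  # ß stands for itself on both sides

LETTER_LOWER = VOWEL_LOWER + CONS_LOWER
LETTER_UPPER = VOWEL_UPPER + CONS_UPPER

# [#LETTER# #letter#]:[#letter# #LETTER#] pairs the i-th symbol on the left
# with the i-th symbol on the right.
SWAP = dict(zip(LETTER_UPPER + LETTER_LOWER, LETTER_LOWER + LETTER_UPPER))


def swap_case(word):
    """Swap the case of every letter; None if a symbol is not in the sets."""
    if any(ch not in SWAP for ch in word):
        return None
    return "".join(SWAP[ch] for ch in word)


print(swap_case("Hallo"))  # hALLO
print(swap_case("Ruß"))    # rUß
```

The letter y does not occur in the symbol sets as given, so a word containing y is rejected in this sketch.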
12
Alphabet
The alphabet defines the set of available symbol pairs. It is relevant for the wildcard symbol ".", the negation operator "!" and the replacement operators (introduced later).
    ALPHABET = [A-Z] [A-Z]:[a-z]
The expression on the right-hand side is compiled into a transducer, and the set of character pairs is extracted from its transitions.
    A:.       is here identical to A:[Aa]
    .         is identical to .:.
    [^A-Z]    all characters appearing in the alphabet which are not uppercase letters, i.e. the set of lowercase letters
    .*        maps mixed letter sequences to all uppercase letter sequences (analysis)
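How the wildcard depends on the alphabet can be illustrated in plain Python (not SFST) by storing the alphabet as a set of (analysis, surface) pairs. The function names expand and analyze are illustrative, and None stands in for the wildcard.

```python
import string

# ALPHABET = [A-Z] [A-Z]:[a-z] as a set of (analysis, surface) symbol pairs
ALPHABET = {(c, c) for c in string.ascii_uppercase} | {
    (c, c.lower()) for c in string.ascii_uppercase
}


def expand(analysis=None, surface=None):
    """Return the alphabet pairs matching a pattern; None plays the role of '.'."""
    return {
        (a, s)
        for a, s in ALPHABET
        if analysis in (None, a) and surface in (None, s)
    }


def analyze(surface_string):
    """Apply .* in analysis mode: map each surface symbol to its analysis symbol."""
    result = []
    for ch in surface_string:
        candidates = {a for a, s in ALPHABET if s == ch}
        if not candidates:
            return None  # symbol does not occur on the surface side
        result.append(candidates.pop())  # exactly one candidate in this alphabet
    return "".join(result)


print(sorted(expand(analysis="A")))  # [('A', 'A'), ('A', 'a')]
print(analyze("HeLLo"))              # HELLO
```

Here expand(analysis="A") corresponds to A:. and expand() to the bare wildcard, which covers all 52 pairs of this alphabet.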
13
Alphabet
fullform.lex:
    house
    house<>:s
    walk
    walk<>:i<>:n<>:g
emorph.fst:
    ALPHABET = [a-zA-Z] [ ]:<>
    "fullform.lex" || .*
reads the lexicon and deletes the grammatical markers on the surface side.
14
Orthographic Rules
Replace operator: t ^-> (l __ r) applies the mapping implemented in the transducer t in the left context l and the right context r. l and r are automata (i.e. transducers mapping strings to themselves).
e-elision: bake ing → baking
    $Morph$ = bake { }:{ing}
    ALPHABET = [A-Za-z]
    $e-elision$ = e:<> ^-> (__ [ei])     % delete e before e or i
    $Morph$ = $Morph$ || $e-elision$     % apply the rule
    ALPHABET = [A-Za-z] :<>              % delete the marker
    $Morph$ || .*
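A context-dependent replacement can be approximated in plain Python (not SFST) with regular-expression lookarounds. The slide's marker symbol is not visible in the text, so the boundary marker "+" below is a hypothetical stand-in. The helper names replace_in_context and surface are also illustrative. The sketch only maps the underlying form to the surface form.

```python
import re

BOUNDARY = "+"  # hypothetical morpheme-boundary marker


def replace_in_context(form, target, replacement, left="", right=""):
    """Rewrite target as replacement only between left and right.

    left and right are regex fragments; left must have a fixed width
    because it is used inside a lookbehind.
    """
    pattern = target
    if left:
        pattern = f"(?<={left})" + pattern
    if right:
        pattern = pattern + f"(?={right})"
    return re.sub(pattern, replacement, form)


def surface(form):
    """Apply e-elision, then delete the boundary marker."""
    # delete e when the boundary and then e or i follow
    form = replace_in_context(form, "e", "", right=r"\+[ei]")
    return form.replace(BOUNDARY, "")


print(surface("bake+ing"))  # baking
```

The contexts are only tested, not consumed, so the marker and the following vowel remain available for later rules.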
15
Orthographic Rules
What does this program do?
    $Morph$ = bake { }:{ing} | crash { }:s | happy { }:{er} | fly { }:s
    ALPHABET = [A-Za-z]
    $e-elision$ = e:<> ^-> (__ [ei])
    $e-epenthesis$ = (<>:e) ^-> ([sh] __ s)
    $y2i$ = y:i ^-> ([^ae] __ [ ] e)
    $y2ie$ = y:{ie} ^-> ([^ae] __ [ ] s)
    $Morph$ = $Morph$ || $e-elision$ || $e-epenthesis$ || $y2i$ || $y2ie$
    ALPHABET = [A-Za-z] [ ]:<>
    $Morph$ || .*
16
Solution
    e-epenthesis: crash s → crashes
    y to i: happy er → happier
    fly s → flies
only in the analyze mode!
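The cascade of rules can be traced in plain Python (not SFST) by applying one regular-expression substitution per rule, in the order in which the rules are composed. The boundary marker "+" is a hypothetical stand-in for the marker symbols that are not visible in the slide text. The sketch only models the mapping from the underlying form to the surface form, not the direction in which the transducer is run.

```python
import re

# (rule name, pattern, replacement); "+" is a hypothetical boundary marker
RULES = [
    ("e-elision", r"e(?=\+[ei])", ""),            # bake+ing  -> bak+ing
    ("e-epenthesis", r"(?<=[sh])\+(?=s)", "+e"),  # crash+s   -> crash+es
    ("y2i", r"(?<=[^ae])y(?=\+e)", "i"),          # happy+er  -> happi+er
    ("y2ie", r"(?<=[^ae])y(?=\+s)", "ie"),        # fly+s     -> flie+s
]


def generate(form):
    """Apply all rules in sequence, then delete the boundary marker."""
    for _name, pattern, replacement in RULES:
        form = re.sub(pattern, replacement, form)
    return form.replace("+", "")


for underlying in ("bake+ing", "crash+s", "happy+er", "fly+s"):
    print(underlying, "->", generate(underlying))
```

Each rule sees the output of the previous one, which is what composition with || expresses. A form such as play+s is left unchanged by the y rules because the y follows an a.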
17
Agreement Variables
    $Morph$ = big { }:{er} | fat { }:{er}
    $cons$ = [bcdfghjklmnpqrstvwxz]
    $vowel$ = [aeiouy]
    #=g# = bdglmnpt
    $g$ = [#=g#] <>:[#=g#]
    ALPHABET = [A-Za-z]
    $gemination$ = $g$ ^-> ($cons$ $vowel$ __ e)
    $Morph$ = $Morph$ || $gemination$
    ALPHABET = [A-Za-z] [ ]:<>
    $Morph$ || .*
18
Solution
    analyze> bigger
    big
    analyze> fater
    no result for fater
    analyze> fatter
    fat
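The role of the agreement variable can be illustrated in plain Python (not SFST) with a regular-expression backreference, which forces the inserted consonant to be the same as the one just matched. The boundary marker "+" and the names generate and analyze are hypothetical. The analysis step here is a simple generate-and-compare over the two stems, not a real transducer lookup.

```python
import re

CONS = "bcdfghjklmnpqrstvwxz"
VOWEL = "aeiouy"
GEMINATE = "bdglmnpt"  # the symbol set #=g#

# The backreference \1 plays the role of the agreement variable:
# the inserted consonant must equal the consonant that was matched.
GEMINATION = re.compile(rf"(?<=[{CONS}][{VOWEL}])([{GEMINATE}])(?=\+e)")

STEMS = ["big", "fat"]


def generate(stem, suffix="er"):
    """Double the final consonant after consonant + vowel, before the suffix."""
    form = GEMINATION.sub(r"\1\1", f"{stem}+{suffix}")
    return form.replace("+", "")


def analyze(word):
    """Return the stems whose generated comparative equals the input."""
    return [stem for stem in STEMS if generate(stem) == word]


print(analyze("bigger"))  # ['big']
print(analyze("fater"))   # []
print(analyze("fatter"))  # ['fat']
```

Without the agreement constraint, the rule could insert any consonant from the set, so forms like "bigter" would also be accepted.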
19
Thank you for your attention