

Finite State Morphology Alexander Fraser & Luisa Berlanda CIS, Ludwig-Maximilians-Universität München Computational Morphology.


1 Finite State Morphology Alexander Fraser & Luisa Berlanda fraser@cis.uni-muenchen.de CIS, Ludwig-Maximilians-Universität München Computational Morphology and Electronic Dictionaries SoSe 2016, 2016-05-25

2 SFST
– programming language for developing finite-state transducers
– compiler which translates programs to transducers
– tools for
  – applying transducers
  – printing transducers
  – comparing transducers

3 SFST Example Session
    > echo "Hello\ World\!" > test.fst      storing a small test program
    > fst-compiler test.fst test.a          calling the compiler
    test.fst: 2
    > fst-mor test.a                        interactive transducer usage
    reading transducer...                   transducer is loaded
    finished.
    analyze> Hello World!                   input
    Hello World!                            recognised
    analyze> Hello World                    another input
    no result for Hello World               not recognised
    analyze> q                              terminate program

4 Transducer Variables
    $Vroot$ = walk | talk | bark      % list of verbs with regular inflection
    $Vinfl$ = :<> (\                  % regular verbal inflection
        [ ]:<> |\
        { }:{s} |\
        { }:{ing} |\
        { }:{ed})
    $Nroot$ = hat | head | trick      % list of nouns with regular inflection
    $Ninfl$ = :<> (\                  % regular nominal inflection
        { }:{} |\
        { }:{s})
    $Vroot$ $Vinfl$ | $Nroot$ $Ninfl$ % combine stems and inflectional endings
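As a rough illustration of what the last line of this slide computes, the following Python sketch enumerates the surface strings obtained by concatenating each stem with each ending and then taking the union of the verb and noun sets. It is not SFST code and it ignores the analysis side of the transducer entirely; the names `VROOT`, `VINFL`, `NROOT`, `NINFL` and `surface_forms` are chosen here for illustration only.

```python
from itertools import product

# Stems and surface endings as listed on the slide (analysis-side symbols omitted).
VROOT = ["walk", "talk", "bark"]
VINFL = ["", "s", "ing", "ed"]
NROOT = ["hat", "head", "trick"]
NINFL = ["", "s"]


def surface_forms():
    """Concatenate stems with endings, then form the union (the `|` operator)."""
    verbs = {stem + ending for stem, ending in product(VROOT, VINFL)}
    nouns = {stem + ending for stem, ending in product(NROOT, NINFL)}
    return verbs | nouns


FORMS = surface_forms()
print(sorted(FORMS))
```

A noun stem never combines with a verbal ending, so a string such as "hated" is not in the resulting set.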

5 Homework
Write a pipeline that maps all letters to lowercase and orders them backwards.
The Huber family has three children. Their first child is called Mia, the next one Toni and the last one Pia. The Band family has three children as well: Michael, Paul and Pia. Write a program that can tell us the following details about the children: which family he/she belongs to, whether it is a son or a daughter, and whether he/she was the first, second or third child.
Output format:

6 Solution
Write a pipeline that maps all letters to lowercase and orders them backwards.
    [a-z]:[A-Z]* || [ZYXWVUTSRQPONMLKJIHGFEDCBA]:[A-Z]*

7 Solution
    { }:{} ({ }:{mia} | { }:{toni} | { }:{pia}) |\
    { }:{} ({ }:{michael} | { }:{paul} | { }:{pia})

8 Lexicon Files
    $Vroot$ = "verb.lex"
    $Nroot$ = "noun.lex"
The command "filename" reads the respective file line by line and forms the disjunction of all lines. Only the symbols : \ % are treated as operators.

9 Lexicon Files
    $Vroot$ = "verb.lex"
    $Vinfl$ = :<> (\
        [ ]:<> |\
        { }:{s} |\
        { }:{ing} |\
        { }:{ed})
    $Nroot$ = "noun.lex"
    $Ninfl$ = :<> (\
        { }:{} |\
        { }:{s})
    $Vroot$ $Vinfl$ | $Nroot$ $Ninfl$
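The idea of reading a lexicon file line by line and forming the disjunction of all lines can be sketched in Python as follows. This is only an approximation: it treats every line as a literal string and does not interpret the operator symbols the way SFST does. The function name `read_lexicon` is hypothetical, and the file contents are an assumption (the verb list from slide 4, held in memory instead of in a real verb.lex).

```python
import io


def read_lexicon(lines):
    """Return one entry per non-empty line; the list stands for a disjunction."""
    return [line.rstrip("\r\n") for line in lines if line.strip()]


# Assumed contents of verb.lex, provided in memory so the sketch is self-contained.
verb_lex = io.StringIO("walk\ntalk\nbark\n")
VROOT = read_lexicon(verb_lex)
print(" | ".join(VROOT))
```

With a real file, the same function could be called on the object returned by `open("verb.lex", encoding="utf-8")`.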

10 Symbol Set Variables
    #cons# = bcdfghjklmnpqrstvwxzß
    #CONS# = BCDFGHJKLMNPQRSTVWXZß
    #Cons# = #cons# #CONS#
    #vowel# = aeiouäöü
    #VOWEL# = AEIOUÄÖÜ
    #Vowel# = #vowel# #VOWEL#
    #letter# = #vowel# #cons#
    #LETTER# = #VOWEL# #CONS#
    [#LETTER# #letter#]:[#letter# #LETTER#]*
What would you get for Hallo and Ruß?

11 Solution
What would you get for Hallo and Ruß?
    Hallo → hALLO
    Ruß → rUß
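The result above follows from pairing the two symbol lists position by position. A minimal Python sketch of that pairing is given below; it is an emulation, not SFST, and the names `PAIRS` and `swap_case` are made up for this example. Python's built-in `str.swapcase` is deliberately avoided because it would turn ß into SS, whereas the symbol sets on the slide pair ß with itself.

```python
# Symbol sets copied from the slide.
VOWEL_LC = "aeiouäöü"
VOWEL_UC = "AEIOUÄÖÜ"
CONS_LC = "bcdfghjklmnpqrstvwxzß"
CONS_UC = "BCDFGHJKLMNPQRSTVWXZß"

LETTER_LC = VOWEL_LC + CONS_LC  # #letter# = #vowel# #cons#
LETTER_UC = VOWEL_UC + CONS_UC  # #LETTER# = #VOWEL# #CONS#

# [#LETTER# #letter#]:[#letter# #LETTER#] pairs the symbols by position.
PAIRS = dict(zip(LETTER_UC + LETTER_LC, LETTER_LC + LETTER_UC))


def swap_case(word):
    """Map each symbol through PAIRS; None if a symbol is outside the sets."""
    if any(ch not in PAIRS for ch in word):
        return None
    return "".join(PAIRS[ch] for ch in word)


print(swap_case("Hallo"), swap_case("Ruß"))
```

Because the sets contain no y, a word such as Bayern yields no result in this sketch.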

12 Alphabet
The alphabet defines the set of available symbol pairs, which is relevant for the wildcard symbol ".", the negation operator "!" and the replacement operators (introduced later).
    ALPHABET = [A-Z] [A-Z]:[a-z]
The expression on the right-hand side is compiled into a transducer and the set of character pairs is extracted from its transitions.
    A:.      is here identical to A:[Aa]
    .        is identical to .:.
    [^A-Z]   all characters appearing in the alphabet which are not uppercase letters, i.e. the set of lowercase letters
    .*       maps mixed letter sequences to all uppercase letter sequences (analysis)
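To make the role of the alphabet more concrete, the sketch below models it in Python as a set of (analysis, surface) symbol pairs and shows how the wildcard can be read off that set. It is an emulation under the assumption that the alphabet is exactly the one defined above; `expand_wildcard` and `analyze` are hypothetical helper names, not SFST functions.

```python
import string

# ALPHABET = [A-Z] [A-Z]:[a-z]  gives the pairs A:A ... Z:Z and A:a ... Z:z.
ALPHABET = {(u, u) for u in string.ascii_uppercase} | {
    (u, u.lower()) for u in string.ascii_uppercase
}


def expand_wildcard(analysis_symbol):
    """Surface symbols that `X:.` stands for, given the alphabet."""
    return sorted(s for a, s in ALPHABET if a == analysis_symbol)


def analyze(surface):
    """Emulate `.*` in analysis mode: map each surface symbol to its analysis symbol."""
    result = []
    for ch in surface:
        candidates = [a for a, s in ALPHABET if s == ch]
        if len(candidates) != 1:  # symbol not covered by the alphabet
            return None
        result.append(candidates[0])
    return "".join(result)


print(expand_wildcard("A"), analyze("HeLLo"))
```

In this particular alphabet every surface symbol belongs to exactly one analysis symbol, which is why the sketch can return a single string; a general transducer may produce several analyses.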

13 Alphabet
fullform.lex:
    house
    house<>:s
    walk
    walk<>:i<>:n<>:g
emorph.fst:
    ALPHABET = [a-zA-Z] [ ]:<>
    "fullform.lex" || .*
reads the lexicon and deletes the grammatical markers on the surface side.

14 Orthographic Rules
Replace operator: t ^-> (l __ r) applies the mapping implemented in the transducer t in the left context l and the right context r. l and r are automata (i.e. transducers mapping strings to themselves).
e-elision: bake ing → baking
    $Morph$ = bake { }:{ing}
    ALPHABET = [A-Za-z]
    $e-elision$ = e:<> ^-> (__ [ei])     % delete e before e or i
    $Morph$ = $Morph$ || $e-elision$      % apply the rule
    ALPHABET = [A-Za-z] :<>               % delete the marker
    $Morph$ || .*

15 Orthographic Rules
What does this program do?
    $Morph$ = bake { }:{ing} | crash { }:s | happy { }:{er} | fly { }:s
    ALPHABET = [A-Za-z]
    $e-elision$ = e:<> ^-> (__ [ei])
    $e-epenthesis$ = (<>:e) ^-> ([sh] __ s)
    $y2i$ = y:i ^-> ([^ae] __ [ ] e)
    $y2ie$ = y:{ie} ^-> ([^ae] __ [ ] s)
    $Morph$ = $Morph$ || $e-elision$ || $e-epenthesis$ || $y2i$ || $y2ie$
    ALPHABET = [A-Za-z] [ ]:<>
    $Morph$ || .*

16 Solution
    e-epenthesis: crash s → crashes
    y to i: happy er → happier
            fly s → flies
only in the analyze mode!
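The rule cascade can be approximated in Python with regular-expression substitutions applied one after another, which mirrors the composition of the four rules followed by the deletion of the marker. This is a sketch in the generation direction only (from stem plus ending to surface form), not a transducer. The character "+" is a hypothetical stand-in for the morpheme boundary marker, and `RULES` and `generate` are names invented for this example.

```python
import re

# Each rule: (pattern, replacement), applied in the order of the composition.
RULES = [
    (re.compile(r"e(?=\+[ei])"), ""),            # e-elision: delete e before +e / +i
    (re.compile(r"(?<=[sh])(?=\+s)"), "e"),      # e-epenthesis: insert e after s/h before +s
    (re.compile(r"(?<=[^ae])y(?=\+e)"), "i"),    # y2i: y -> i before +e
    (re.compile(r"(?<=[^ae])y(?=\+s)"), "ie"),   # y2ie: y -> ie before +s
]


def generate(morph):
    """Apply the rules in sequence, then delete the (hypothetical) boundary marker."""
    for pattern, replacement in RULES:
        morph = pattern.sub(replacement, morph)
    return morph.replace("+", "")


for entry in ["bake+ing", "crash+s", "happy+er", "fly+s"]:
    print(entry, "->", generate(entry))
```

Lookahead and lookbehind assertions play the role of the right and left contexts: they restrict where a replacement applies without consuming the context itself.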

17 Agreement Variables
    $Morph$ = big { }:{er} | fat { }:{er}
    $cons$ = [bcdfghjklmnpqrstvwxz]
    $vowel$ = [aeiouy]
    #=g# = bdglmnpt
    $g$ = [#=g#] <>:[#=g#]
    ALPHABET = [A-Za-z]
    $gemination$ = $g$ ^-> ($cons$ $vowel$ __ e)
    $Morph$ = $Morph$ || $gemination$
    ALPHABET = [A-Za-z] [ ]:<>
    $Morph$ || .*

18 Solution
    analyze> bigger
    → big
    analyze> fater
    → no result for fater
    analyze> fatter
    → fat
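The effect of the agreement variable, namely that the inserted consonant must be identical to the one before it, can be imitated in Python with a back-reference. The sketch below is an emulation under simplifying assumptions: "+" is a hypothetical stand-in for the boundary marker, the lexicon is reduced to the two entries from the slide, and analysis is done naively by generating every entry and comparing it with the input. `LEXICON`, `generate` and `analyze` are illustrative names.

```python
import re

CONS = "bcdfghjklmnpqrstvwxz"
VOWEL = "aeiouy"
GEMINATE = "bdglmnpt"  # the symbol set #=g#

# A consonant from GEMINATE is doubled after consonant + vowel and before +e.
GEMINATION = re.compile(rf"(?<=[{CONS}][{VOWEL}])([{GEMINATE}])(?=\+e)")

LEXICON = [("big", "er"), ("fat", "er")]


def generate(stem, suffix):
    """Apply gemination to stem+suffix, then delete the boundary marker."""
    marked = GEMINATION.sub(r"\1\1", f"{stem}+{suffix}")
    return marked.replace("+", "")


def analyze(surface):
    """Return the stems whose generated surface form equals the input."""
    return [stem for stem, suffix in LEXICON if generate(stem, suffix) == surface]


for word in ["bigger", "fater", "fatter"]:
    print(word, "->", analyze(word) or "no result")
```

Since the doubling applies whenever its context matches, the ungeminated form fater is never generated and therefore gets no analysis.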

19 Thank you for your attention

