1
CMSC 723 / LING 645: Intro to Computational Linguistics
September 15, 2004: Dorr
More about FSAs, Finite State Morphology (J&M 3)
Prof. Bonnie J. Dorr
Dr. Christof Monz
TA: Adam Lee
2
More about FSAs
Transducers
Equivalence of DFSAs and NFSAs
Recognition as search: depth-first, breadth-first search
3
Recognition using NFSAs
4
NFSA Recognition of “baaa!”
5
Breadth-first Recognition of “baaa!” (figure correction: should be q2)
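The figures for these recognition slides did not survive, so the following is only a minimal sketch of breadth-first NFSA recognition. It assumes the sheep-language machine from J&M (states 0 to 4, with a nondeterministic choice on "a" in state 2); the names DELTA, FINAL and nd_recognize are illustrative, not from the slides.

```python
from collections import deque

# Assumed transition table: (state, symbol) -> set of possible next states.
DELTA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},   # nondeterministic: stay in state 2 or move on to state 3
    (3, "!"): {4},
}
FINAL = {4}

def nd_recognize(tape, delta=DELTA, start=0, final=FINAL):
    """Breadth-first search over (state, tape position) search states."""
    agenda = deque([(start, 0)])
    seen = {(start, 0)}
    while agenda:
        state, pos = agenda.popleft()   # FIFO order makes the search breadth-first
        if pos == len(tape):
            if state in final:
                return True
            continue
        for nxt in delta.get((state, tape[pos]), ()):
            if (nxt, pos + 1) not in seen:
                seen.add((nxt, pos + 1))
                agenda.append((nxt, pos + 1))
    return False

print(nd_recognize("baaa!"))
```

Replacing agenda.popleft() with agenda.pop() turns the same loop into a depth-first search, which is the other strategy named on the outline slide.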
6
Regular languages
Regular languages are characterized by FSAs.
For every NFSA, there is an equivalent DFSA.
Regular languages are closed under concatenation, Kleene closure, and union.
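The NFSA/DFSA equivalence can be illustrated with the subset construction. The sketch below is an assumption-laden example, not the slides' own code: it uses a sheep-language NFSA like the one in J&M, has no epsilon transitions, and the function names determinize and d_recognize are hypothetical.

```python
# Assumed NFSA: (state, symbol) -> set of next states, no epsilon arcs.
NFSA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},
    (3, "!"): {4},
}

def determinize(delta, start, final):
    """Subset construction: each DFSA state is a frozenset of NFSA states."""
    start_set = frozenset([start])
    dfsa = {}
    todo = [start_set]
    seen = {start_set}
    while todo:
        current = todo.pop()
        symbols = {sym for (q, sym) in delta if q in current}
        for sym in symbols:
            target = frozenset(t for q in current for t in delta.get((q, sym), ()))
            dfsa[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # A DFSA state is final if it contains at least one final NFSA state.
    dfsa_final = {s for s in seen if s & set(final)}
    return dfsa, start_set, dfsa_final

def d_recognize(tape, dfsa, start, final):
    """Deterministic recognition: exactly one path, no search needed."""
    state = start
    for sym in tape:
        if (state, sym) not in dfsa:
            return False
        state = dfsa[(state, sym)]
    return state in final

DFSA, START, FINALS = determinize(NFSA, 0, {4})
print(d_recognize("baaa!", DFSA, START, FINALS))
```

For this machine the construction yields five DFSA states; in the worst case the number of subsets grows exponentially with the number of NFSA states.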
7
Concatenation
8
Kleene Closure
9
Union
10
Morphology
Definitions and Problems
  – What is Morphology?
  – Topology of Morphologies
Approaches to Computational Morphology
  – Lexicons and Rules
  – Computational Morphology Approaches
11
Morphology
The study of the way words are built up from smaller meaning units called morphemes

Syntax      Lexeme/Inflected Lexeme   Grammars       sentences
Morphology  Morpheme/Allomorph        Morphotactics  words
Phonology   Phoneme/Allophone         Phonotactics   letters

Abstract versus Realized: HOP +PAST, hop +ed, hopped, /hapt/
12
Phonology and Morphology
Phonology vs. Orthography
Historical spelling
  – night, nite
  – attention, mission, fish
Script Limitations
  – Spoken English has 14 vowels: heed hid hayed head had hoed hood who’d hide how’d taught Tut toy enough
  – English Alphabet has 5
  – Use vowel combinations: far fair fare
  – Consonantal doubling (hopping vs. hoping)
13
Syntax and Morphology
Phrase-level agreement
  – Subject-Verb: John studies hard (STUDY+3SG)
  – Noun-Adjective: Las vacas hermosas
Sub-word phrasal structures
  – שבספרינו
  – ש+ב+ספר+ים+נו
  – That+in+book+PL+Poss:1PL
  – Which are in our books
Labels: conj, prep, noun, poss, plural, article
14
Topology of Morphologies
Concatenative vs. Templatic
Derivational vs. Inflectional
Regular vs. Irregular
15
Concatenative Morphology
Morpheme+Morpheme+Morpheme+…
Stems: also called lemma, base form, root, lexeme
  – hope+ing → hoping, hop → hopping
Affixes
  – Prefixes: Antidisestablishmentarianism
  – Suffixes: Antidisestablishmentarianism
  – Infixes: hingi (borrow) – humingi (borrower) in Tagalog
  – Circumfixes: sagen (say) – gesagt (said) in German
Agglutinative Languages
  – uygarlaştıramadıklarımızdanmışsınızcasına
  – uygar+laş+tır+ama+dık+lar+ımız+dan+mış+sınız+casına
  – Behaving as if you are among those whom we could not cause to become civilized
16
Templatic Morphology
Roots and Patterns
Root: K T B
Arabic: مكتوب (maktuub, "written"), pattern ma??uu?
Hebrew: כתוב (ktuuv, "written"), pattern ??uu?
(? marks a slot filled by a root consonant)
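A rough sketch of how a root and a pattern combine: the consonants of the root are dropped into numbered slots of the pattern. The digit-slot notation ("ma12uu3") and the function name interdigitate are assumptions made for this example, and the transliterations follow the slide.

```python
def interdigitate(root, pattern):
    """Fill the digit slots 1, 2, 3 of a pattern with the root consonants."""
    return "".join(
        root[int(ch) - 1] if ch.isdigit() else ch
        for ch in pattern
    )

# Arabic: root k-t-b in the pattern ma12uu3
print(interdigitate("ktb", "ma12uu3"))
# Hebrew: the slide's transliteration ends in v, so the root is passed as k-t-v here
print(interdigitate("ktv", "12uu3"))
```

The sketch does not model sound changes such as the Hebrew b/v alternation; the caller supplies the consonants as they appear on the surface.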
17
Templatic Morphology: Root Meaning
KTB: writing “stuff”
Hebrew: כתב, מכתב, כתב, כתיב (spelling), כתובת (address)
Arabic: كتب, كاتب, مكتوب, كتاب (book), مكتبة (library), مكتب (office)
Glosses for the unlabeled forms: write, writer, letter
18
Derivational vs. Inflectional
Word Classes
  – Parts of speech: noun, verb, adjectives, etc.
  – Word class dictates how a word combines with morphemes to form new words
19
Derivational morphology
Nominalization: computerization, appointee, killer, fuzziness
Formation of adjectives: computational, clueless, embraceable
CatVar: Categorial Variation Database http://clipdemos.umiacs.umd.edu/catvar/
20
Inflectional morphology
Adds: tense, number, person, mood, aspect
Word class doesn’t change
Word serves new grammatical role
Five verb forms in English
Other languages have lots more
21
Nouns and Verbs (in English)
Nouns have simple inflectional morphology
  – cat
  – cat+s, cat+’s
Verbs have more complex morphology
22
Regulars and Irregulars
Nouns
  – Cat/Cats
  – Mouse/Mice, Ox/Oxen, Goose/Geese
Verbs
  – Walk/Walked
  – Go/Went, Fly/Flew
23
Regular (English) Verbs
Morphological Form Classes     Regularly Inflected Verbs
Stem                           walk     merge    try     map
-s form                        walks    merges   tries   maps
-ing form                      walking  merging  trying  mapping
Past form or -ed participle    walked   merged   tried   mapped
24
Irregular (English) Verbs
Morphological Form Classes     Irregularly Inflected Verbs
Stem                           eat      catch     cut
-s form                        eats     catches   cuts
-ing form                      eating   catching  cutting
Past form                      ate      caught    cut
-ed participle                 eaten    caught    cut
25
“To love” in Spanish
26
Computational Morphology
Finite State Morphology
  – Finite State Transducers (FST)
Input/Output
Analysis/Generation
27
Computational Morphology
WORD      STEM (+FEATURES)*
cats      cat +N +PL
cat       cat +N +SG
cities    city +N +PL
geese     goose +N +PL
ducks     (duck +N +PL) or (duck +V +3SG)
merging   merge +V +PRES-PART
caught    (catch +V +PAST-PART) or (catch +V +PAST)
28
Building a Morphological Parser
The Rules and the Lexicon
  – General versus Specific
  – Regular versus Irregular
  – Accuracy, speed, space
  – The Morphology of a language
Approaches
  – Lexicon only
  – Lexicon and Rules
      Finite-state Automata
      Finite-state Transducers
  – Rules only
29
Lexicon-only Morphology
acclaim        acclaim $N$
acclaim        acclaim $V+0$
acclaimed      acclaim $V+ed$
acclaimed      acclaim $V+en$
acclaiming     acclaim $V+ing$
acclaims       acclaim $N+s$
acclaims       acclaim $V+s$
acclamation    acclamation $N$
acclamations   acclamation $N+s$
acclimate      acclimate $V+0$
acclimated     acclimate $V+ed$
acclimated     acclimate $V+en$
acclimates     acclimate $V+s$
acclimating    acclimate $V+ing$

The lexicon lists all surface level and lexical level pairs
No rules …?
Analysis/Generation is easy
Very large for English
What about Arabic or Turkish? Chinese?
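To make "analysis/generation is easy" concrete, here is a minimal sketch of a lexicon-only analyzer built from a few of the pairs listed on this slide. The two dictionaries and the function names analyze and generate are illustrative choices, not part of the original material.

```python
from collections import defaultdict

# (surface form, lexical form) pairs copied from the slide.
PAIRS = [
    ("acclaim", "acclaim $N$"),
    ("acclaim", "acclaim $V+0$"),
    ("acclaimed", "acclaim $V+ed$"),
    ("acclaimed", "acclaim $V+en$"),
    ("acclaiming", "acclaim $V+ing$"),
    ("acclaims", "acclaim $N+s$"),
    ("acclaims", "acclaim $V+s$"),
]

analyses = defaultdict(list)   # surface -> lexical forms
surfaces = defaultdict(list)   # lexical -> surface forms
for surface, lexical in PAIRS:
    analyses[surface].append(lexical)
    surfaces[lexical].append(surface)

def analyze(word):
    """Analysis is a table lookup from surface form to lexical forms."""
    return analyses.get(word, [])

def generate(lexical):
    """Generation is the same lookup in the other direction."""
    return surfaces.get(lexical, [])

print(analyze("acclaimed"))
print(generate("acclaim $V+s$"))
```

Both directions are constant-time lookups, but every inflected form must be listed explicitly, which is the size problem the slide raises for richer morphologies.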
31
Lexicon and Rules: FSA Inflectional Noun Morphology
English Noun Lexicon
reg-noun   irreg-pl-noun   irreg-sg-noun   plural
fox        geese           goose           -s
cat        sheep           sheep
dog        mice            mouse
English Noun Rule
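The noun rule itself was a figure that is not preserved here. The sketch below assumes the usual J&M-style automaton (a regular noun may be followed by the plural -s; irregular singular and plural nouns go straight to a final state) and uses the lexicon from this slide. ARCS, FINAL and accepts are hypothetical names.

```python
LEXICON = {
    "reg-noun": {"fox", "cat", "dog"},
    "irreg-pl-noun": {"geese", "sheep", "mice"},
    "irreg-sg-noun": {"goose", "sheep", "mouse"},
    "plural": {"-s"},
}

# Assumed FSA over morpheme classes: (state, class) -> next state.
ARCS = {
    (0, "reg-noun"): 1,
    (0, "irreg-sg-noun"): 2,
    (0, "irreg-pl-noun"): 2,
    (1, "plural"): 2,
}
FINAL = {1, 2}

def accepts(morphemes):
    """Recognize a sequence of morphemes, e.g. ["cat", "-s"]."""
    states = {0}
    for m in morphemes:
        classes = [c for c, words in LEXICON.items() if m in words]
        states = {ARCS[(q, c)] for q in states for c in classes if (q, c) in ARCS}
        if not states:
            return False
    return bool(states & FINAL)

print(accepts(["cat", "-s"]))
print(accepts(["geese", "-s"]))
```

This works at the morpheme level only: it accepts fox + -s but knows nothing about the spelling "foxes", which is what the spelling rules later in the deck handle.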
32
Lexicon and Rules: FSA English Verb Inflectional Morphology
reg-verb-stem:    walk, fry, talk, impeach
irreg-verb-stem:  cut, speak, sing
irreg-past-verb:  spoken, sang, caught, ate, eaten
past:             -ed
past-part:        -ed
pres-part:        -ing
3sg:              -s
33
FSA for Derivational Morphology: Adjectival Formation
34
More Complex Derivational Morphology
35
Using FSAs for Recognition: English Nouns and their Inflection
36
Morphological Parsing
Finite-state automata (FSA)
  – Recognizer
  – One-level morphology
Finite-state transducers (FST)
  – Two-level morphology
      PC-Kimmo (Koskenniemi 83)
  – input-output pair
37
Terminology for PC-Kimmo
Upper = lexical tape
Lower = surface tape
Characters correspond to pairs, written a:b
If “a:a”, write “a” for shorthand
Two-level lexical entries
# = word boundary
^ = morpheme boundary
Other = “any feasible pair that is not in this transducer”
Final states indicated with “:” and non-final states indicated with “.”
38
Four-Fold View of FSTs
As a recognizer
As a generator
As a translator
As a set relater
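As a small illustration of the translator and recognizer views, the sketch below runs a toy transducer in either direction. The transducer (arcs for cat/cats only), the EPS marker, and the function transduce are hypothetical and exist only for this example; because the toy machine has no cycles, the epsilon moves always terminate.

```python
EPS = ""  # empty symbol on one side of an arc

# Arcs: (state, upper symbol, lower symbol, next state).
ARCS = [
    (0, "c", "c", 1),
    (1, "a", "a", 2),
    (2, "t", "t", 3),
    (3, "+N", EPS, 4),
    (4, "+SG", EPS, 5),
    (4, "+PL", "s", 5),
]
FINAL = {5}

def transduce(symbols, from_upper=True):
    """Return every output sequence the transducer pairs with the input."""
    results = []
    agenda = [(0, 0, [])]               # (state, input position, output so far)
    while agenda:
        state, pos, out = agenda.pop()
        if pos == len(symbols) and state in FINAL:
            results.append([s for s in out if s != EPS])
        for q, up, lo, nxt in ARCS:
            if q != state:
                continue
            src, dst = (up, lo) if from_upper else (lo, up)
            if src == EPS:              # arc consumes no input on this side
                agenda.append((nxt, pos, out + [dst]))
            elif pos < len(symbols) and symbols[pos] == src:
                agenda.append((nxt, pos + 1, out + [dst]))
    return results

# Translator, lexical to surface (generation):
print(transduce(["c", "a", "t", "+N", "+PL"]))
# Translator, surface to lexical (analysis):
print(transduce(list("cats"), from_upper=False))
# Recognizer: is the pair (cat +N +SG, cat) in the relation?
print(list("cat") in transduce(["c", "a", "t", "+N", "+SG"]))
```

The same arc table serves all of these uses; only the side treated as input changes.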
39
Nominal Inflection FST
40
Lexical and Intermediate Tapes
41
Spelling Rules
Name                 Rule Description                                Example
Consonant Doubling   1-letter consonant doubled before -ing/-ed      beg/begging
E-deletion           Silent e dropped before -ing and -ed            make/making
E-insertion          e added after s, z, x, ch, sh before s          watch/watches
Y-replacement        -y changes to -ie before -s, -i before -ed      try/tries
K-insertion          verbs ending with vowel + -c add -k             panic/panicked
42
Chomsky and Halle Notation
ε → e / { x, s, z } ^ __ s #
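The E-insertion rule can be tried out on intermediate-tape strings with a regular-expression substitution. This is only a stand-in for a real transducer, and it covers just the x, s, z contexts named in the rule on this slide; the function name e_insertion is an assumption.

```python
import re

def e_insertion(intermediate):
    """Insert e after x, s or z at a morpheme boundary before a word-final s,
    then erase the boundary symbols ^ and #."""
    with_e = re.sub(r"([xsz])\^(?=s#)", r"\1^e", intermediate)
    return with_e.replace("^", "").replace("#", "")

print(e_insertion("fox^s#"))
print(e_insertion("cat^s#"))
```

The lookahead (?=s#) checks the right-hand context without consuming it, which mirrors the "__ s #" part of the rule.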
43
Intermediate-to-Surface Transducer
44
State Transition Table
45
Two-Level Morphology
46
Sample Run KIMMO DEMO
47
FSTs and ambiguity
Parse Example 1: unionizable
  – union +ize +able
  – un+ ion +ize +able
Parse Example 2: assess
  – assess(V)
  – ass(N) + ess(N)
Parse Example 3: tender
  – tender(AJ)
  – ten(Num) + d(AJ) + er(CMP)
48
What to do about Global Ambiguity?
Accept first successful structure
Run parser through all possible paths
Bias the search in some manner
50
Computational Morphology
The Rules and the Lexicon
  – General versus Specific
  – Regular versus Irregular
  – Accuracy, speed, space
  – The Morphology of a language
Approaches
  – Lexicon only
  – Lexicon and Rules
      Finite-state Automata
      Finite-state Transducers
  – Rules only (next time!!)
51
Readings for next time: J&M Chapter 6