CMSC 723 / LING 645: Intro to Computational Linguistics September 15, 2004: Dorr More about FSA’s, Finite State Morphology (J&M 3) Prof. Bonnie J. Dorr.


1 CMSC 723 / LING 645: Intro to Computational Linguistics
September 15, 2004 (Dorr): More about FSAs, Finite State Morphology (J&M 3)
Prof. Bonnie J. Dorr, Dr. Christof Monz
TA: Adam Lee

2 More about FSAs
• Transducers
• Equivalence of DFSAs and NFSAs
• Recognition as search: depth-first and breadth-first search

3 Recognition using NFSAs

4 NFSA Recognition of “baaa!”

5 Breadth-first Recognition of “baaa!” (slide annotation on the figure: “should be q2”)
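Recognition as search can be sketched with an explicit agenda of search states (machine state, tape position). This is a minimal sketch, assuming a hypothetical encoding of the J&M “sheeptalk” NFSA for baa+! (the state names q0 to q4 are illustrative), where q2 has two arcs on “a”. Popping the agenda from the back gives depth-first search; popping from the front gives breadth-first search.

```python
from collections import deque

# Hypothetical sheeptalk NFSA for baa+!: q2 is nondeterministic on "a".
TRANSITIONS = {
    ("q0", "b"): ["q1"],
    ("q1", "a"): ["q2"],
    ("q2", "a"): ["q2", "q3"],
    ("q3", "!"): ["q4"],
}
START, FINALS = "q0", {"q4"}


def nd_recognize(tape, breadth_first=False):
    """True if some path consumes the whole tape and ends in a final state."""
    agenda = deque([(START, 0)])          # search states: (machine state, tape index)
    while agenda:
        # Back of the agenda = depth-first (stack); front = breadth-first (queue).
        state, index = agenda.popleft() if breadth_first else agenda.pop()
        if index == len(tape):
            if state in FINALS:
                return True
            continue                      # tape used up but not final: dead end
        for nxt in TRANSITIONS.get((state, tape[index]), []):
            agenda.append((nxt, index + 1))
    return False
```

Both strategies accept exactly the same strings; they differ only in the order in which alternative paths are explored and in how large the agenda can grow.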

6 Regular languages
• Regular languages are exactly the languages recognized by FSAs.
• For every NFSA, there is an equivalent DFSA.
• Regular languages are closed under concatenation, Kleene closure, and union.
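The NFSA/DFSA equivalence is usually shown with the subset construction: each DFSA state stands for the set of NFSA states the machine could be in after reading the same input. A minimal Python sketch, assuming the same hypothetical sheeptalk NFSA (no ε-arcs) and leaving out the empty “dead” state:

```python
from collections import deque

# Hypothetical sheeptalk NFSA for baa+! (state names assumed); q2 is nondeterministic on "a".
NFSA = {("q0", "b"): {"q1"}, ("q1", "a"): {"q2"}, ("q2", "a"): {"q2", "q3"}, ("q3", "!"): {"q4"}}
START, FINALS, ALPHABET = "q0", {"q4"}, ["b", "a", "!"]


def subset_construction(delta, start, finals, alphabet):
    """Return (start, transitions, finals) of a DFSA whose states are frozensets of NFSA states."""
    start_set = frozenset([start])
    dfa, seen, queue = {}, {start_set}, deque([start_set])
    while queue:
        current = queue.popleft()
        for sym in alphabet:
            target = frozenset(q for s in current for q in delta.get((s, sym), ()))
            if not target:
                continue  # leave out the empty "dead" state
            dfa[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A DFSA state is final if it contains at least one final NFSA state.
    return start_set, dfa, {S for S in seen if S & finals}


def d_recognize(word, start, dfa, finals):
    """Deterministic recognition: one table lookup per input symbol, no search."""
    state = start
    for ch in word:
        state = dfa.get((state, ch))
        if state is None:
            return False
    return state in finals


start, dfa, dfa_finals = subset_construction(NFSA, START, FINALS, ALPHABET)
```

Here the nondeterministic choice at q2 becomes the single DFSA state {q2, q3}, which loops on “a” and moves to {q4} on “!”.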

7 Concatenation

8 Kleene Closure

9 Union
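The closure properties are constructive: each operation glues existing automata together with ε-arcs. A minimal Python sketch of the standard constructions; the helper names literal, concat, union, and star are illustrative, not from the slides. The combinators share state numbers with their arguments, so each sub-automaton should be used only once in an expression.

```python
from itertools import count

_ids = count()  # global supply of fresh state numbers


class NFA:
    """An ε-NFA: trans maps (state, symbol) to a set of states; symbol None is ε."""

    def __init__(self, start, finals, trans):
        self.start, self.finals, self.trans = start, set(finals), trans


def _merge(*tables):
    """Union several transition tables without mutating them."""
    out = {}
    for table in tables:
        for key, targets in table.items():
            out.setdefault(key, set()).update(targets)
    return out


def literal(word):
    """NFA accepting exactly `word`."""
    states = [next(_ids) for _ in range(len(word) + 1)]
    trans = {(states[i], ch): {states[i + 1]} for i, ch in enumerate(word)}
    return NFA(states[0], {states[-1]}, trans)


def concat(a, b):
    """Concatenation: ε-arcs from every final state of a to the start of b."""
    eps = {(f, None): {b.start} for f in a.finals}
    return NFA(a.start, b.finals, _merge(a.trans, b.trans, eps))


def union(a, b):
    """Union: a fresh start state with ε-arcs to both old start states."""
    s = next(_ids)
    return NFA(s, a.finals | b.finals, _merge(a.trans, b.trans, {(s, None): {a.start, b.start}}))


def star(a):
    """Kleene closure: a fresh start state (also final) plus ε-arcs from a's finals back to a's start."""
    s = next(_ids)
    loops = {(f, None): {a.start} for f in a.finals}
    return NFA(s, a.finals | {s}, _merge(a.trans, {(s, None): {a.start}}, loops))


def _closure(nfa, states):
    """All states reachable from `states` using only ε-arcs."""
    stack, seen = list(states), set(states)
    while stack:
        for nxt in nfa.trans.get((stack.pop(), None), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def accepts(nfa, word):
    current = _closure(nfa, {nfa.start})
    for ch in word:
        moved = set()
        for q in current:
            moved |= nfa.trans.get((q, ch), set())
        current = _closure(nfa, moved)
    return bool(current & nfa.finals)


# baa+! is "baa" followed by a* followed by "!"; union it with "moo!".
talk = union(concat(literal("baa"), concat(star(literal("a")), literal("!"))), literal("moo!"))
```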

10 Morphology
• Definitions and Problems
  – What is Morphology?
  – Topology of Morphologies
• Approaches to Computational Morphology
  – Lexicons and Rules
  – Computational Morphology Approaches

11 Morphology
• The study of the way words are built up from smaller meaning units called morphemes
  – Syntax: Lexeme / Inflected Lexeme; Grammars; sentences
  – Morphology: Morpheme / Allomorph; Morphotactics; words
  – Phonology: Phoneme / Allophone; Phonotactics; letters
• Abstract versus realized: HOP +PAST → hop +ed → hopped → /hapt/

12 Phonology and Morphology
• Phonology vs. orthography
• Historical spelling
  – night, nite
  – attention, mission, fish (the same “sh” sound spelled three ways)
• Script limitations
  – Spoken English has 14 vowels: heed, hid, hayed, head, had, hoed, hood, who’d, hide, how’d, taught, Tut, toy, enough
  – The English alphabet has only 5 vowel letters, so spelling uses vowel combinations (far, fair, fare) and consonant doubling (hopping vs. hoping)

13 Syntax and Morphology
• Phrase-level agreement
  – Subject-verb: John studies hard (STUDY+3SG)
  – Noun-adjective: Las vacas hermosas (“the beautiful cows”; article, noun, and adjective are all feminine plural)
• Sub-word phrasal structures
  – שבספרינו
  – ש+ב+ספר+ים+נו
  – That+in+book+PL+Poss:1PL
  – “which are in our books” (parts labeled on the slide: conj, prep, noun, poss, plural, article)

14 Topology of Morphologies
• Concatenative vs. templatic
• Derivational vs. inflectional
• Regular vs. irregular

15 Concatenative Morphology
• Morpheme+Morpheme+Morpheme+…
• Stems: also called lemma, base form, root, lexeme
  – hope+ing → hoping; hop+ing → hopping
• Affixes
  – Prefixes: Antidisestablishmentarianism (anti-, dis-)
  – Suffixes: Antidisestablishmentarianism (-ment, -arian, -ism)
  – Infixes: hingi (borrow) → humingi (borrower) in Tagalog (infix -um-)
  – Circumfixes: sagen (say) → gesagt (said) in German (ge-…-t)
• Agglutinative languages
  – uygarlaştıramadıklarımızdanmışsınızcasına
  – uygar+laş+tır+ama+dık+lar+ımız+dan+mış+sınız+casına
  – “Behaving as if you are among those whom we could not cause to become civilized”

16 Templatic Morphology
• Roots and patterns
  – Arabic: root K-T-B + pattern maCCuuC → مكتوب maktuub “written”
  – Hebrew: root K-T-B + pattern CCuuC → כתוב ktuuv “written”

17 Templatic Morphology: Root Meaning
• KTB: writing “stuff”
  – Hebrew: כתב write, מכתב letter, כתב writer, כתיב spelling, כתובת address
  – Arabic: كتب write, كاتب writer, مكتوب letter, كتاب book, مكتبة library, مكتب office
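Templatic morphology can be modeled as interdigitation: the consonants of the root fill the C slots of a vowel pattern, left to right. A minimal Python sketch over romanized forms, writing long vowels as doubled letters as the slides do (maktuub). The pattern strings are standard textbook transliterations supplied for illustration, and the function name is hypothetical; phonological alternations such as Hebrew b → v in ktuuv are not modeled.

```python
def interdigitate(root, pattern):
    """Fill the 'C' slots of a CV pattern with the root consonants, left to right."""
    if pattern.count("C") != len(root):
        raise ValueError("pattern needs exactly one C slot per root consonant")
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in pattern)


KTB = ("k", "t", "b")

# Illustrative Arabic patterns for the root K-T-B and their glosses.
PATTERNS = {
    "CaCaCa": "he wrote",
    "CaaCiC": "writer",
    "maCCuuC": "written; letter",
    "CiCaaC": "book",
    "maCCaCa": "library",
    "maCCaC": "office",
}

for pattern, gloss in PATTERNS.items():
    print(f"{pattern:8} -> {interdigitate(KTB, pattern):8} {gloss}")
```

The root carries the shared meaning (writing), while the pattern contributes the word class and the more specific sense.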

18 Derivational vs. Inflectional
• Word classes
  – Parts of speech: noun, verb, adjective, etc.
  – A word’s class dictates how it combines with morphemes to form new words

19 Derivational morphology
• Nominalization: computerization, appointee, killer, fuzziness
• Formation of adjectives: computational, clueless, embraceable
• CatVar (Categorial Variation Database): http://clipdemos.umiacs.umd.edu/catvar/

20 Inflectional morphology
• Adds tense, number, person, mood, aspect
• Word class doesn’t change
• The word serves a new grammatical role
• English verbs have five forms (stem, -s, -ing, past, -ed participle)
• Other languages have many more

21 Nouns and Verbs (in English)
• Nouns have simple inflectional morphology
  – cat
  – cat+s, cat+’s
• Verbs have more complex morphology

22 Regulars and Irregulars
• Nouns
  – cat/cats
  – mouse/mice, ox/oxen, goose/geese
• Verbs
  – walk/walked
  – go/went, fly/flew

23 Regular (English) Verbs
Morphological form classes of regularly inflected verbs:
  Stem                          walk     merge    try     map
  -s form                       walks    merges   tries   maps
  -ing form                     walking  merging  trying  mapping
  Past form or -ed participle   walked   merged   tried   mapped

24 Irregular (English) Verbs
Morphological form classes of irregularly inflected verbs:
  Stem             eat      catch     cut
  -s form          eats     catches   cuts
  -ing form        eating   catching  cutting
  Past form        ate      caught    cut
  -ed participle   eaten    caught    cut
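The two tables suggest a lexicon-plus-rules generator: regular forms follow from a few spelling rules, while irregular verbs only need to list the forms that differ. A minimal Python sketch; the function names are hypothetical, and the consonant-doubling test is a rough heuristic (a one-vowel stem ending consonant-vowel-consonant) that covers these examples but not stress-conditioned cases such as prefer → preferring.

```python
VOWELS = set("aeiou")

# Irregular past and -ed participle forms, taken from the irregular-verb table.
IRREGULAR = {"eat": ("ate", "eaten"), "catch": ("caught", "caught"), "cut": ("cut", "cut")}


def ends_consonant_y(stem):
    return len(stem) > 1 and stem.endswith("y") and stem[-2] not in VOWELS


def doubles_final_consonant(stem):
    """Rough heuristic: one-vowel stem ending consonant-vowel-consonant (not w, x, y)."""
    if len(stem) < 3:
        return False
    c1, v, c2 = stem[-3:]
    return (c1 not in VOWELS and v in VOWELS and c2 not in VOWELS | set("wxy")
            and sum(ch in VOWELS for ch in stem) == 1)


def add_suffix(stem, suffix):
    """Attach -s, -ing or -ed, applying the spelling rules these verbs need."""
    if suffix == "s":
        if ends_consonant_y(stem):
            return stem[:-1] + "ies"                 # Y-replacement: try -> tries
        if stem.endswith(("s", "z", "x", "ch", "sh")):
            return stem + "es"                       # E-insertion: catch -> catches
        return stem + "s"
    if suffix == "ed" and ends_consonant_y(stem):
        return stem[:-1] + "ied"                     # Y-replacement: try -> tried
    if stem.endswith("e") and not stem.endswith("ee"):
        return stem[:-1] + suffix                    # E-deletion: merge -> merging, merged
    if doubles_final_consonant(stem):
        return stem + stem[-1] + suffix              # Consonant doubling: map -> mapping
    return stem + suffix


def paradigm(stem):
    """The five English verb forms for a stem; irregular verbs override past forms."""
    regular_past = add_suffix(stem, "ed")
    past, past_part = IRREGULAR.get(stem, (regular_past, regular_past))
    return {"stem": stem, "-s": add_suffix(stem, "s"), "-ing": add_suffix(stem, "ing"),
            "past": past, "-ed participle": past_part}
```

Note that the irregular verbs still take regular -s and -ing forms (catches, cutting), so only two cells per irregular verb need to be listed.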

25 “To love” in Spanish

26 Computational Morphology
• Finite State Morphology
  – Finite State Transducers (FST)
    – Input/Output
    – Analysis/Generation

27 Computational Morphology
• WORD → STEM (+FEATURES)*
  – cats → cat +N +PL
  – cat → cat +N +SG
  – cities → city +N +PL
  – geese → goose +N +PL
  – ducks → (duck +N +PL) or (duck +V +3SG)
  – merging → merge +V +PRES-PART
  – caught → (catch +V +PAST-PART) or (catch +V +PAST)

28 Building a Morphological Parser
• The Rules and the Lexicon
  – General versus specific
  – Regular versus irregular
  – Accuracy, speed, space
  – The morphology of a language
• Approaches
  – Lexicon only
  – Lexicon and rules
    – Finite-state automata
    – Finite-state transducers
  – Rules only

29 Lexicon-only Morphology
  Surface        Stem          Features
  acclaim        acclaim       $N$
  acclaim        acclaim       $V+0$
  acclaimed      acclaim       $V+ed$
  acclaimed      acclaim       $V+en$
  acclaiming     acclaim       $V+ing$
  acclaims       acclaim       $N+s$
  acclaims       acclaim       $V+s$
  acclamation    acclamation   $N$
  acclamations   acclamation   $N+s$
  acclimate      acclimate     $V+0$
  acclimated     acclimate     $V+ed$
  acclimated     acclimate     $V+en$
  acclimates     acclimate     $V+s$
  acclimating    acclimate     $V+ing$
• The lexicon lists all surface-level and lexical-level pairs
• No rules …?
• Analysis/generation is easy
• Very large for English
• What about Arabic or Turkish? Chinese?
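Because the lexicon-only approach is one large table of surface/lexical pairs, analysis and generation are both plain lookups, one in each direction. A minimal Python sketch using a few entries from the listing above; the dictionary layout and the function names analyze and generate are illustrative.

```python
from collections import defaultdict

# A few surface/lexical pairs from the lexicon-only listing: (surface, stem, features).
PAIRS = [
    ("acclaim", "acclaim", "N"), ("acclaim", "acclaim", "V+0"),
    ("acclaimed", "acclaim", "V+ed"), ("acclaimed", "acclaim", "V+en"),
    ("acclaiming", "acclaim", "V+ing"), ("acclaims", "acclaim", "N+s"),
    ("acclaims", "acclaim", "V+s"), ("acclamation", "acclamation", "N"),
]

ANALYSES = defaultdict(list)   # surface form -> [(stem, features), ...]
GENERATE = {}                  # (stem, features) -> surface form
for surface, stem, feats in PAIRS:
    ANALYSES[surface].append((stem, feats))
    GENERATE[(stem, feats)] = surface


def analyze(word):
    """All lexical analyses of a surface word (ambiguity = more than one entry)."""
    return ANALYSES.get(word, [])


def generate(stem, feats):
    """The surface form for a stem plus features, or None if it is not listed."""
    return GENERATE.get((stem, feats))
```

The price of this simplicity is size: every inflected form must be listed, which is why the slide asks about Arabic and Turkish.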

30 Building a Morphological Parser (same outline as slide 28)

31 Lexicon and Rules: FSA Inflectional Noun Morphology
English Noun Lexicon:
  reg-noun   irreg-pl-noun   irreg-sg-noun   plural
  fox        geese           goose           -s
  cat        sheep           sheep
  dog        mice            mouse
English Noun Rule: [FSA figure not preserved]
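The noun rule can be sketched as an FSA whose arcs are labeled with morpheme classes from the lexicon. This is a minimal Python sketch, assuming J&M’s arc structure (a regular noun may be followed by -s; an irregular singular or plural noun stands alone) with illustrative state names. Because spelling is handled by separate rules, it accepts foxs and rejects foxes.

```python
# Noun lexicon from the slide, grouped by morpheme class.
LEXICON = {
    "reg-noun": {"fox", "cat", "dog"},
    "irreg-pl-noun": {"geese", "sheep", "mice"},
    "irreg-sg-noun": {"goose", "sheep", "mouse"},
    "plural": {"s"},
}

# Arcs labeled by morpheme class: (from_state, class) -> to_state (state names assumed).
ARCS = {
    ("q0", "reg-noun"): "q1",
    ("q0", "irreg-pl-noun"): "q2",
    ("q0", "irreg-sg-noun"): "q2",
    ("q1", "plural"): "q2",
}
FINALS = {"q1", "q2"}


def accepts(word, state="q0"):
    """True if some path spells out `word` exactly and ends in a final state."""
    if not word:
        return state in FINALS
    for (src, cls), dst in ARCS.items():
        if src == state:
            for morph in LEXICON[cls]:
                if word.startswith(morph) and accepts(word[len(morph):], dst):
                    return True
    return False
```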

32 Lexicon and Rules: FSA English Verb Inflectional Morphology
  reg-verb-stem: walk, fry, talk, impeach
  irreg-verb-stem: cut, speak, sing
  irreg-past-verb: spoken, sang, caught, ate, eaten
  past / past-part: -ed; pres-part: -ing; 3sg: -s

33 FSA for Derivational Morphology: Adjectival Formation
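The slide’s figure is not preserved. J&M describe the adjectival-formation FSA as an optional un- prefix, an adjective root, and an optional -er, -est or -ly suffix; since that is a regular language, a Python regular expression can stand in for the FSA. A sketch under that assumption, with a hypothetical root list. Like the textbook FSA, it overgenerates (it accepts unbig) and ignores spelling changes, so happier is rejected.

```python
import re

# Hypothetical adjective roots; pattern: optional un- + root + optional -er/-est/-ly.
ADJ_ROOTS = ["big", "clear", "cool", "happy", "real"]
ADJ_FSA = re.compile("(?:un)?(?:" + "|".join(ADJ_ROOTS) + ")(?:er|est|ly)?")


def is_adjective(word):
    """Accept words spelled out by the adjectival-formation pattern (no spelling rules)."""
    return ADJ_FSA.fullmatch(word) is not None
```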

34 More Complex Derivational Morphology

35 Using FSAs for Recognition: English Nouns and their Inflection

36 Morphological Parsing
• Finite-state automata (FSA)
  – Recognizer
  – One-level morphology
• Finite-state transducers (FST)
  – Two-level morphology: PC-Kimmo (Koskenniemi 83)
  – Input-output pairs

37 Terminology for PC-Kimmo
• Upper = lexical tape
• Lower = surface tape
• Characters correspond to pairs, written a:b
• If “a:a”, write “a” for shorthand
• Two-level lexical entries
• # = word boundary
• ^ = morpheme boundary
• Other = “any feasible pair that is not in this transducer”
• Final states are indicated with “:” and non-final states with “.”
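The pair notation can be illustrated by aligning a lexical tape with a surface tape symbol by symbol. A minimal Python sketch, assuming 0 as the null symbol used to pad the tapes (the usual PC-Kimmo convention; the slide itself does not define it). The function name is hypothetical.

```python
def two_level_pairs(lexical, surface):
    """Render aligned lexical/surface tapes as a:b pairs, writing just 'a' for a:a."""
    if len(lexical) != len(surface):
        raise ValueError("tapes must be aligned symbol by symbol")
    return " ".join(a if a == b else f"{a}:{b}" for a, b in zip(lexical, surface))


print(two_level_pairs("fox^0s", "fox0es"))   # f o x ^:0 0:e s
print(two_level_pairs("cat^s", "cat0s"))     # c a t ^:0 s
```

Only the pairs where the two tapes differ (the deleted boundary ^:0 and the inserted 0:e) need rules; everything else uses the a:a shorthand.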

38 Four-Fold View of FSTs
• As a recognizer
• As a generator
• As a translator
• As a set relater

39 Nominal Inflection FST
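The four uses of an FST differ only in which tape is read. A minimal Python sketch of a nominal inflection FST, assuming a toy lexicon (cat, fox, goose, mouse) and the lexical-to-intermediate mapping described in J&M: +N maps to ε, +SG to #, and regular +PL to ^s#, while irregular plurals are spelled out letter by letter (o:e, u:ε). The class and function names are hypothetical.

```python
from collections import defaultdict

EPS = ""  # epsilon: the empty string on either tape


class FST:
    """Nondeterministic FST; every arc carries an (upper, lower) symbol pair."""

    def __init__(self):
        self.arcs = defaultdict(list)  # state -> [(upper, lower, next_state)]
        self.finals = set()
        self.n_states = 1              # state 0 is the start state

    def new_state(self):
        self.n_states += 1
        return self.n_states - 1

    def add_path(self, src, pairs, dst):
        """Chain (upper, lower) pairs from src to dst through fresh states."""
        state = src
        for i, (upper, lower) in enumerate(pairs):
            nxt = dst if i == len(pairs) - 1 else self.new_state()
            self.arcs[state].append((upper, lower, nxt))
            state = nxt

    def transduce(self, symbols, invert=False):
        """Read symbols on the upper tape (lower if invert); return all outputs."""
        results = set()

        def walk(state, i, out):
            if i == len(symbols) and state in self.finals:
                results.add("".join(out))
            for upper, lower, nxt in self.arcs[state]:
                read, write = (lower, upper) if invert else (upper, lower)
                if read == EPS:
                    walk(nxt, i, out + [write])
                elif i < len(symbols) and symbols[i] == read:
                    walk(nxt, i + 1, out + [write])

        walk(0, 0, [])
        return results


def letters(word):
    return [(ch, ch) for ch in word]


def build_nominal_fst():
    fst = FST()
    reg, irr_sg, irr_pl, n_reg, n_sg, n_pl, end = (fst.new_state() for _ in range(7))
    fst.finals.add(end)
    for stem in ("cat", "fox"):                       # regular nouns: letters copy through
        fst.add_path(0, letters(stem), reg)
    for stem in ("goose", "mouse"):                   # irregular singulars
        fst.add_path(0, letters(stem), irr_sg)
    fst.add_path(0, [("g", "g"), ("o", "e"), ("o", "e"), ("s", "s"), ("e", "e")], irr_pl)
    fst.add_path(0, [("m", "m"), ("o", "i"), ("u", EPS), ("s", "c"), ("e", "e")], irr_pl)
    for src, dst in ((reg, n_reg), (irr_sg, n_sg), (irr_pl, n_pl)):
        fst.add_path(src, [("+N", EPS)], dst)          # +N : ε
    fst.add_path(n_reg, [("+SG", "#")], end)           # +SG : #
    fst.add_path(n_reg, [("+PL", "^"), (EPS, "s"), (EPS, "#")], end)  # +PL : ^s#
    fst.add_path(n_sg, [("+SG", "#")], end)
    fst.add_path(n_pl, [("+PL", "#")], end)            # irregular plural already spelled out
    return fst


fst = build_nominal_fst()
print(fst.transduce(["c", "a", "t", "+N", "+PL"]))    # {'cat^s#'}
print(fst.transduce(list("mice#"), invert=True))      # {'mouse+N+PL'}
```

Reading the upper tape makes the machine a generator; reading the lower tape with invert=True makes it an analyzer (a translator in either direction). A non-empty result set means the input is recognized, and the set of all upper/lower path pairs is the relation the FST defines.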

40 Lexical and Intermediate Tapes

41 Spelling Rules
  Name                 Rule description                                 Example
  Consonant doubling   1-letter consonant doubled before -ing/-ed       beg/begging
  E-deletion           Silent e dropped before -ing and -ed             make/making
  E-insertion          e added after s, z, x, ch, sh before s           watch/watches
  Y-replacement        -y changes to -ie before -s, -i before -ed       try/tries
  K-insertion          Verbs ending with vowel + -c add -k              panic/panicked

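42 Chomsky and Halle Notation
  ε → e / {x, s, z} ^ __ s #
  (insert e after a morpheme boundary ^ that follows x, s, or z, when the next symbols are s and the word boundary #)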
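The E-insertion rule can be tried out on intermediate-tape strings such as fox^s#. A minimal Python sketch that applies the rule as a regular-expression rewrite and then erases the boundary symbols; extending the left context with ch and sh (from the spelling-rule table) is an addition to the formal rule above, and the function names are hypothetical.

```python
import re


def e_insertion(tape):
    """ε → e / {x, s, z, ch, sh} ^ __ s #  (ch and sh added from the spelling-rule table)."""
    return re.sub(r"(x|s|z|ch|sh)\^(?=s#)", r"\1^e", tape)


def to_surface(tape):
    """Apply E-insertion, then erase the ^ and # boundary symbols."""
    return e_insertion(tape).replace("^", "").replace("#", "")
```

A real system would compile this rule into a transducer and run it after the lexical-to-intermediate FST, rather than using string substitution.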

43 Intermediate-to-Surface Transducer

44 State Transition Table

45 Two-Level Morphology

46 Sample Run KIMMO DEMO

47 FSTs and ambiguity
• Parse example 1: unionizable
  – union +ize +able
  – un +ion +ize +able
• Parse example 2: assess
  – assess (V)
  – ass (N) + ess (N)
• Parse example 3: tender
  – tender (ADJ)
  – ten (NUM) + d (ADJ) + er (CMP)

48 What to do about Global Ambiguity?
• Accept the first successful structure
• Run the parser through all possible paths
• Bias the search in some manner
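Global ambiguity shows up as several complete segmentations of the same word. A minimal Python sketch of the three strategies over a hypothetical surface-morph lexicon (iz stands for -ize before -able, so no E-deletion rule is needed). The “fewest morphemes” preference is one illustrative bias, not one proposed on the slide.

```python
# Hypothetical surface-morph lexicon: surface string -> morpheme label.
MORPHS = {
    "un": "un", "union": "union", "ion": "ion", "iz": "ize", "able": "able",
    "assess": "assess", "ass": "ass", "ess": "ess",
    "tender": "tender", "ten": "ten", "d": "d", "er": "er",
}


def parses(word):
    """Yield every segmentation of `word` into lexicon morphs, depth-first."""
    if not word:
        yield []
        return
    for end in range(1, len(word) + 1):
        piece = word[:end]
        if piece in MORPHS:
            for rest in parses(word[end:]):
                yield [MORPHS[piece]] + rest


def first_parse(word):
    """Strategy 1: accept the first successful structure and stop searching."""
    return next(parses(word), None)


def all_parses(word):
    """Strategy 2: run through all possible paths."""
    return list(parses(word))


def biased_parse(word):
    """Strategy 3 (hypothetical bias): prefer the analysis with the fewest morphemes."""
    return min(parses(word), key=len, default=None)
```

Because the search tries shorter prefixes first, first_parse returns un+ion+ize+able for unionizable, while biased_parse prefers union+ize+able.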

49 Computational Morphology (same outline as slide 28)

50 Computational Morphology (same outline as slide 28; the rules-only approach is covered next time)

51 Readings for next time
• J&M Chapter 6

