Published by Curtis Parker. Modified over 9 years ago.
1
Christel Kemke 1 Morphology
COMP 4060 Natural Language Processing
Morphology, Word Classes, POS Tagging
2
Overview
- Morphology
- Stemming
- Word Classes
- POS Tagging
(Jurafsky, 2nd edition, Ch. 2, 3, 5; Allen, Ch. 2, 3)
3
Morphology
4
Morphemes and Words
Morpheme = "minimal meaning-bearing unit in a language"
Morphemes combine to create words:
- Inflection: combination of a word stem with a grammatical morpheme; yields the same word class, e.g. clean (verb), clean-ing (verb)
- Derivation: combination of a word stem with a grammatical morpheme; yields a different word class, e.g. clean (verb), clean-ing (noun)
- Compounding: combination of multiple word stems
- Cliticization: combination of a word stem with a clitic; the parts are different words from different syntactic categories, e.g. I’ve = I + have
5
Inflectional Morphology
word stem + grammatical morpheme, e.g. cat + s
only for nouns, verbs, and some adjectives
Nouns
- plural: regular: +s, +es; irregular: mouse - mice, ox - oxen; rules for exceptions, e.g. -y → -ies, as in butterfly - butterflies
- possessive: +'s, +'
Verbs
- main verbs (sleep, eat, walk)
- modal verbs (can, will, should)
- primary verbs (be, have, do)
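The noun rules listed above can be sketched in a few lines of Python. This is a minimal illustration, not a complete pluralizer: it assumes a tiny hand-written exception table (`IRREGULAR_PLURALS`, hypothetical) and applies only the regular +s / +es rule, the -y → -ies spelling rule, and the two possessive endings.

```python
# Hypothetical mini-lexicon of irregular plurals; a real system would
# store these exceptions in its lexicon.
IRREGULAR_PLURALS = {"mouse": "mice", "ox": "oxen"}
VOWELS = set("aeiou")


def pluralize(noun: str) -> str:
    """Return the plural of an English noun using a few regular rules."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    # consonant + y -> -ies (butterfly -> butterflies); vowel + y keeps +s (day -> days)
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    # sibilant endings take +es (fox -> foxes, church -> churches)
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    return noun + "s"


def possessive(noun: str, plural: bool = False) -> str:
    """Add 's, or only an apostrophe for a plural that already ends in s."""
    if plural and noun.endswith("s"):
        return noun + "'"
    return noun + "'s"


for n in ["cat", "fox", "butterfly", "day", "mouse", "ox"]:
    print(n, "->", pluralize(n))
```

Checking the exception table before the rules mirrors the slide's split between irregular forms and rule-governed ones.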
6
Inflectional Morphology (verbs)
Verb inflections only for: main verbs (sleep, eat, walk); primary verbs (be, have, do)

Morphological form      Regularly inflected form
stem                    walk      merge     try       map
-s form                 walks     merges    tries     maps
-ing participle         walking   merging   trying    mapping
past; -ed participle    walked    merged    tried     mapped

Morphological form      Irregularly inflected form
stem                    eat       catch     cut
-s form                 eats      catches   cuts
-ing participle         eating    catching  cutting
-ed past                ate       caught    cut
-ed participle          eaten     caught    cut
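The forms in the two tables can be generated from the stem plus a few spelling rules. The sketch below assumes a hypothetical irregular-verb table (`IRREGULAR_VERBS`) and a simple consonant-doubling heuristic that is only reliable for one-syllable stems (it would wrongly produce "visitting", for example).

```python
# Hypothetical table of irregular verbs: stem -> (simple past, -ed participle)
IRREGULAR_VERBS = {
    "eat": ("ate", "eaten"),
    "catch": ("caught", "caught"),
    "cut": ("cut", "cut"),
}
VOWELS = set("aeiou")


def doubles_final_consonant(stem: str) -> bool:
    """Heuristic: double the final consonant after a consonant-vowel-consonant
    ending (map -> mapping). Only reliable for one-syllable stems."""
    return (
        len(stem) >= 3
        and stem[-1] not in VOWELS | set("wxy")
        and stem[-2] in VOWELS
        and stem[-3] not in VOWELS
    )


def consonant_y(stem: str) -> bool:
    """True for stems like 'try' where y follows a consonant (y -> ie / i)."""
    return len(stem) >= 2 and stem[-1] == "y" and stem[-2] not in VOWELS


def s_form(stem: str) -> str:
    if stem.endswith(("s", "x", "z", "ch", "sh")):
        return stem + "es"              # catch -> catches
    if consonant_y(stem):
        return stem[:-1] + "ies"        # try -> tries
    return stem + "s"                   # walk -> walks


def ing_form(stem: str) -> str:
    if stem.endswith("e") and not stem.endswith("ee"):
        return stem[:-1] + "ing"        # merge -> merging (e-deletion)
    if doubles_final_consonant(stem):
        return stem + stem[-1] + "ing"  # map -> mapping, cut -> cutting
    return stem + "ing"                 # walk -> walking, try -> trying


def regular_ed_form(stem: str) -> str:
    if stem.endswith("e"):
        return stem + "d"               # merge -> merged
    if consonant_y(stem):
        return stem[:-1] + "ied"        # try -> tried
    if doubles_final_consonant(stem):
        return stem + stem[-1] + "ed"   # map -> mapped
    return stem + "ed"                  # walk -> walked


def inflect(stem: str) -> tuple:
    """Return (-s form, -ing participle, past, -ed participle)."""
    if stem in IRREGULAR_VERBS:
        past, participle = IRREGULAR_VERBS[stem]
    else:
        past = participle = regular_ed_form(stem)
    return s_form(stem), ing_form(stem), past, participle


for verb in ["walk", "merge", "try", "map", "eat", "catch", "cut"]:
    print(verb, inflect(verb))
```

Note that the irregular verbs still use the regular rules for their -s and -ing forms (eats, catching, cutting), which matches the table: only the past and participle columns are irregular.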
7
Inflectional and Derivational Morphology (adjectives)
Adjective inflections and derivations:
prefix   un-     unhappy     adjective, negation
suffix   -ly     happily     adverb, manner
suffix   -er     happier     adjective, comparative
suffix   -est    happiest    adjective, superlative
suffix   -ness   happiness   noun
plus combinations, like unhappiest, unhappiness.
Distinguish different adjective classes, which can or cannot take certain inflectional or derivational forms, e.g. no negation for big (*unbig).
8
Inflectional Morphology
9
Noun Inflections
10
Verb Inflections
11
Derivational Morphology
12
Noun Derivation
13
Adjective Derivation
14
Clitics
15
Verb Clitics
16
Methods, Algorithms
17
Stemming
- Stemming algorithms strip off word affixes.
- They yield the stem only, with no additional information (such as plural, 3rd person, etc.).
- Used, e.g., in web search engines.
- A famous stemming algorithm: the Porter stemmer.
18
Stemming Methods
Rule-based stemming. Example rules:
- ATIONAL → ATE, e.g., relational → relate
- ING → ε (delete the suffix) if the stem contains a vowel, e.g., motoring → motor
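The two example rules can be sketched as suffix rewrites guarded by a condition on the remaining stem. This is a toy, not the Porter stemmer: the real algorithm applies many rules in ordered steps and uses a "measure" condition (m > 0) for ATIONAL, which is simplified here to "stem is non-empty"; the vowel test also ignores Porter's special treatment of y.

```python
def contains_vowel(stem: str) -> bool:
    """Simplified version of Porter's *v* condition (ignores the y case)."""
    return any(ch in "aeiou" for ch in stem)


# (suffix, replacement, condition on the remaining stem)
RULES = [
    ("ational", "ate", lambda stem: len(stem) > 0),  # relational -> relate
    ("ing", "", contains_vowel),                       # motoring -> motor
]


def apply_rules(word: str) -> str:
    """Apply the first rule whose suffix matches and whose condition holds."""
    for suffix, replacement, condition in RULES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if condition(stem):
                return stem + replacement
    return word


for w in ["relational", "motoring", "sing"]:
    print(w, "->", apply_rules(w))
```

The vowel condition is what keeps "sing" intact: stripping -ing would leave "s", which has no vowel. For real use, NLTK ships a full implementation as `nltk.stem.PorterStemmer`.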
19
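Stemming Problems
Errors of commission: unrelated words are reduced to the same stem. Errors of omission: related words are not reduced to the same stem.
Errors of Commission          Errors of Omission
organization / organ          European / Europe
doing / doe                   analysis / analyzes
generalization / generic      matrices / matrix
numerical / numerous          noise / noisy
policy / police               sparse / sparsity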
20
Tokenization, Word Segmentation
Tokenization or word segmentation:
- separate out “words” (lexical entries) from running text
- expand abbreviated terms, e.g. I’m into I am, it’s into it is
- collect tokens forming a single lexical entry, e.g. New York marked as one single entry
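The three tokenization steps can be sketched with a regular expression plus two lookup tables. Both tables (`CLITICS`, `MULTIWORD`) are hypothetical and tiny; the sketch only handles ASCII apostrophes, and it expands "it's" to "it is" even where "it has" is meant, which a real tokenizer would have to disambiguate.

```python
import re

# Hypothetical lookup tables; a real tokenizer would use a much larger lexicon.
CLITICS = {"i'm": ["I", "am"], "it's": ["it", "is"], "i've": ["I", "have"]}
MULTIWORD = {("new", "york")}

# words (optionally with one apostrophe part), numbers, or single punctuation marks
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|[0-9]+|[^\sA-Za-z0-9]")


def tokenize(text: str) -> list:
    # 1. separate out raw tokens from running text
    raw = TOKEN_RE.findall(text)
    # 2. expand abbreviated forms (clitics)
    expanded = []
    for tok in raw:
        expanded.extend(CLITICS.get(tok.lower(), [tok]))
    # 3. merge adjacent tokens that form one lexical entry
    tokens, i = [], 0
    while i < len(expanded):
        pair = tuple(t.lower() for t in expanded[i:i + 2])
        if pair in MULTIWORD:
            tokens.append(" ".join(expanded[i:i + 2]))
            i += 2
        else:
            tokens.append(expanded[i])
            i += 1
    return tokens


print(tokenize("I'm in New York, it's late."))
```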
21
Tokenization, Word Segmentation
Finite state transducer (FST):
- modifies the input string (rules)
- recognizes (stored) abbreviations and composite words
See Fig. 3.22 in Jurafsky, Ch. 3.
Word segmentation is more of an issue in languages written without spaces between words, like Chinese.
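For text without spaces, a common baseline is greedy maximum matching (MaxMatch), a different technique from the FST on this slide: at each position, take the longest dictionary word that matches and fall back to a single character. The sketch below uses English with the spaces removed and a made-up dictionary, which also shows how the greedy choice can go wrong.

```python
# Hypothetical dictionary for the illustration.
DICTIONARY = {"the", "theta", "table", "tab", "bled", "led", "own", "down", "there"}


def max_match(text: str, dictionary: set) -> list:
    """Greedy longest-match segmentation; unknown characters become 1-char words."""
    words = []
    max_len = max(map(len, dictionary)) if dictionary else 1
    i = 0
    while i < len(text):
        # try the longest candidate first, shrinking down to one character
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words


print(max_match("thetabledownthere", DICTIONARY))
```

The intended reading is "the table down there", but the greedy choice of "theta" at the start derails everything after it, which is why better segmenters score alternative segmentations instead of committing greedily.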
22
Lemmatization
Lemmatization maps words with the same root but different surface forms onto the same lexeme, e.g. buys, bought, buying → buy
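Unlike stemming, lemmatization checks its output against a lexicon and handles irregular forms. A minimal sketch, assuming a hypothetical exception table (`IRREGULAR`), a hypothetical lexicon (`LEXICON`), and a short list of suffix rules:

```python
# Hypothetical resources for the illustration.
IRREGULAR = {"bought": "buy", "ate": "eat", "eaten": "eat", "mice": "mouse"}
LEXICON = {"buy", "eat", "walk", "mouse", "try"}
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]


def lemmatize(word: str) -> str:
    w = word.lower()
    if w in IRREGULAR:              # irregular surface forms: table lookup
        return IRREGULAR[w]
    if w in LEXICON:                # already a lemma
        return w
    for suffix, replacement in SUFFIX_RULES:
        if w.endswith(suffix):
            candidate = w[: -len(suffix)] + replacement
            if candidate in LEXICON:  # accept only real lexicon entries
                return candidate
    return w                        # unknown word: leave unchanged


for w in ["buys", "bought", "buying"]:
    print(w, "->", lemmatize(w))
```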
23
Morphological Processing
24
Word Recognition
Spelling errors:
- mark non-words based on a dictionary/lexicon
- use "minimum edit distance": dynamic programming, table-based; transform operations: deletion, substitution, insertion; calculate the minimum-cost path
Morphological parser = FST
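The table-based dynamic program for minimum edit distance can be sketched as follows. The operation costs are parameters: with all costs 1 this is Levenshtein distance, and some textbook variants charge 2 for a substitution. The `suggest` helper and its lexicon are hypothetical, showing how the distance could rank corrections for a non-word.

```python
def min_edit_distance(source: str, target: str,
                      ins_cost: int = 1, del_cost: int = 1, sub_cost: int = 1) -> int:
    n, m = len(source), len(target)
    # d[i][j] = minimum cost of turning source[:i] into target[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + del_cost      # delete all of source[:i]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost      # insert all of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            d[i][j] = min(
                d[i - 1][j] + del_cost,                        # deletion
                d[i][j - 1] + ins_cost,                        # insertion
                d[i - 1][j - 1] + (0 if same else sub_cost),   # substitution / match
            )
    return d[n][m]


def suggest(word: str, lexicon: list) -> str:
    """Return the lexicon entry closest to a non-word (ties: first in list)."""
    if word in lexicon:
        return word
    return min(lexicon, key=lambda entry: min_edit_distance(word, entry))


print(min_edit_distance("intention", "execution"))
print(suggest("graffe", ["giraffe", "graft", "grail"]))
```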
25
Morphological Processing
Knowledge:
- lexical entries: stems plus possible prefixes and suffixes, plus word classes, e.g. endings for verb forms (see tables above)
- rules (morphotactics): how to combine stems and affixes, e.g. add s to form the plural of a noun, as in dogs
- orthographic rules: spelling, e.g. the doubled consonant in mapping
Processing: finite state transducers take the information above and analyze a word token / generate a word form
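The three knowledge sources can be combined to analyze a word in the parsing direction. The sketch below is a brute-force stand-in for an FST, not an FST itself: it strips candidate endings, undoes the spelling rules, and keeps only stems the lexicon licenses for the right word class. The lexicon, the irregular table, and the feature names (`+PL`, `+3SG`, `+PAST`, `+PRES-PART`) are hypothetical.

```python
# Hypothetical lexicon (stem -> word class) and irregular forms.
LEXICON = {"dog": "N", "fox": "N", "butterfly": "N",
           "walk": "V", "try": "V", "map": "V"}
IRREGULAR = {"mice": ("mouse", "N", "+PL"), "ate": ("eat", "V", "+PAST")}

# Morphotactics plus reversed spelling rules:
# (surface ending, letters to restore on the stem, word class, features)
SUFFIXES = [
    ("ies", "y", "N", "+PL"),        # butterflies -> butterfly (y -> ie)
    ("es", "", "N", "+PL"),          # foxes -> fox (e-insertion)
    ("s", "", "N", "+PL"),           # dogs -> dog
    ("ies", "y", "V", "+3SG"),       # tries -> try
    ("s", "", "V", "+3SG"),          # walks -> walk
    ("ied", "y", "V", "+PAST"),      # tried -> try
    ("ed", "", "V", "+PAST"),        # walked -> walk
    ("ing", "", "V", "+PRES-PART"),  # walking -> walk
]


def analyze(word: str) -> list:
    """Return every 'stem +Class +Features' parse the lexicon licenses."""
    parses = []
    if word in IRREGULAR:
        stem, cls, feats = IRREGULAR[word]
        parses.append(f"{stem} +{cls} {feats}")
    if word in LEXICON:
        parses.append(f"{word} +{LEXICON[word]}")
    for ending, restore, cls, feats in SUFFIXES:
        if not word.endswith(ending):
            continue
        stem = word[: -len(ending)] + restore
        candidates = [stem]
        # undo consonant doubling: mapping -> mapp -> map
        if len(stem) > 2 and stem[-1] == stem[-2]:
            candidates.append(stem[:-1])
        for candidate in candidates:
            if LEXICON.get(candidate) == cls:
                parses.append(f"{candidate} +{cls} {feats}")
    return parses


for w in ["foxes", "tries", "mapping", "mice"]:
    print(w, analyze(w))
```

Returning a list rather than a single parse leaves room for ambiguous words, which an FST analyzer would likewise map to several outputs.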
26
Fig. 3.3 FSA for verb inflection.
27
Fig. 3.4 Simple FSA for adjective inflection.
Fig. 3.5 More detailed FSA for adjective inflection.
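An adjective FSA in the spirit of these figures can be written as a transition table over morpheme symbols. This is a sketch under stated assumptions: the two root classes and their members are hypothetical (class 1 takes un- and -ly, class 2 takes neither), the input is a sequence of morphemes rather than letters, so spelling changes such as happy → happi- are out of scope, and the states are not copied from the figures.

```python
# Hypothetical root classes.
ADJ_ROOT1 = {"happy", "clear"}   # allow un- and -ly: unhappily, unclear
ADJ_ROOT2 = {"big", "old"}       # no negation, no -ly: *unbig, *bigly


def symbol(morpheme: str) -> str:
    """Map a morpheme to the FSA's input symbol; affixes are their own symbols."""
    if morpheme in ADJ_ROOT1:
        return "adj-root1"
    if morpheme in ADJ_ROOT2:
        return "adj-root2"
    return morpheme  # "un-", "-er", "-est", "-ly"


TRANSITIONS = {
    ("q0", "un-"): "q1",
    ("q0", "adj-root1"): "q2",
    ("q1", "adj-root1"): "q2",
    ("q0", "adj-root2"): "q3",
    ("q2", "-er"): "q4", ("q2", "-est"): "q4", ("q2", "-ly"): "q4",
    ("q3", "-er"): "q4", ("q3", "-est"): "q4",
}
ACCEPTING = {"q2", "q3", "q4"}


def accepts(morphemes: list) -> bool:
    state = "q0"
    for m in morphemes:
        state = TRANSITIONS.get((state, symbol(m)))
        if state is None:        # no transition defined: reject
            return False
    return state in ACCEPTING


print(accepts(["un-", "happy", "-est"]), accepts(["un-", "big"]))
```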
28
Fig. 3.7 Compiled FSA for noun inflection.
29
Fig. 3.12 Lexical and intermediate tape of an FS transducer.
Fig. 3.13 Lexical, intermediate, and surface tape after spelling transformation.
30
Word Classes and POS Tagging
31
Word Classes
Sort words into categories according to:
- morphological properties: Which types of morphological forms do they take? e.g. plural: noun + s; 3rd person: verb + s
- distributional properties: What other words or phrases can occur nearby? e.g. a possessive pronoun before a noun
- semantic coherence: classify according to similar semantic type, e.g. nouns refer to object-like entities
32
Open vs. Closed Word Classes
Open class types: the set of words in these classes can change over time with the development of the language, e.g. spaghetti and download.
Open class types: nouns, verbs, adjectives, adverbs
33
Open vs. Closed Word Classes
Closed class types: the set of words in these classes is largely fixed and rarely changes for a given language.
Closed class types: prepositions, determiners, pronouns, conjunctions, auxiliary verbs, particles, numerals
34
Open Class Words: Nouns
Nouns denote objects, concepts, entities, events.
- Proper nouns: names for specific individual objects, entities, e.g. the Eiffel Tower, Dr. Kemke
- Common nouns: names for categories, classes, abstracts, events, e.g. fruit, banana, table, freedom, sleep, race, ...
  - Count nouns: enumerable entities, e.g. two bananas
  - Mass nouns: not countable items, e.g. water, salt, freedom
35
Open Class Words: Verbs
Verbs denote actions, processes, and states, e.g. smoke, dream, rest, run.
Several morphological forms, e.g.:
- non-3rd person: eat, sleep
- 3rd person: eats, sleeps
- progressive / present participle / gerundive: eating, sleeping
- past participle: eaten, slept
- simple past: ate, slept
36
Open Class Words: Verbs (2)
Form               Word     Examples
non-3rd person     eat      I eat. We eat. They eat.
3rd person         eats     He eats. She eats. It eats.
progressive        eating   He is eating. He will be eating. He has been eating.
  present participle: He is eating.
  gerundive: Eating scorpions [NP] is common in China.
  use as adjective: Eating children [NP] are common at McDonalds.
past participle    eaten    He has eaten the scorpion. The scorpion was eaten.
simple past        ate      He ate the scorpion.
37
Verb Forms 1 - The five verb forms
Fig. 2.6. The five verb forms. (Allen, 1995, p. 28)
38
Verb Forms 2 - The basic tenses
Fig. 2.7. The basic tenses. (Allen, 1995, p. 29)
39
Verb Forms 3 - The progressive tenses
Fig. 2.8. The progressive tenses. (Allen, 1995, p. 29)
40
Verb Tense Chart
Simple
- Past (cooked): an action that ended at a point in the past, e.g. He cooked yesterday.
- Present (cook / cooks): an action that exists, is usual, or is repeated, e.g. He cooks dinner every Friday.
- Future (will cook): a plan for future action, e.g. He will cook tomorrow.
Progressive: be + main verb + -ing
- Past (was / were cooking): an action was happening (past progressive) when another action happened (simple past), e.g. He was cooking when the phone rang.
- Present (am / is / are cooking): an action that is happening now, e.g. He is cooking now.
- Future (will be cooking): an action that will be happening over time, in the future, when something else happens, e.g. He will be cooking when you come.
Perfect: have + past participle of the main verb
- Past (had cooked): an action that ended before another action or time in the past, e.g. He had cooked the dinner when the phone rang.
- Present (has / have cooked): an action that happened at an unspecified time in the past, e.g. He has cooked many meals.
- Future (will have cooked): an action that will end before another action or time in the future, e.g. He will have cooked dinner by the time you come.
Perfect Progressive: have + been + main verb + -ing
- Past (had been cooking): an action that happened over time, in the past, before another time or action in the past, e.g. He had been cooking for a long time before he took lessons.
- Present (has / have been cooking): an action occurring over time that started in the past and continues into the present, e.g. He has been cooking for over an hour.
- Future (will have been cooking): an action occurring over time, in the future, before another action or time in the future, e.g. He will have been cooking all day by the time she gets home.
Source: http://www.athabascau.ca/courses/engl/155/support/verb_tenses.htm
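The chart follows a compositional pattern: each auxiliary fixes the form of the next verb (will selects the base form, have the past participle, be the -ing form). A minimal sketch, assuming a hand-written form table for a 3rd-person singular subject and the single verb cook (hypothetical data, not a general conjugator):

```python
# Hand-written forms for a 3rd-person singular subject ("He").
COOK = {"base": "cook", "3sg": "cooks", "past": "cooked", "en": "cooked", "ing": "cooking"}
BE = {"base": "be", "3sg": "is", "past": "was", "en": "been", "ing": "being"}
HAVE = {"base": "have", "3sg": "has", "past": "had", "en": "had", "ing": "having"}


def verb_group(tense: str, perfect: bool = False, progressive: bool = False,
               verb: dict = COOK) -> str:
    """Build the verb group for tense in {"past", "present", "future"}."""
    chain = ([HAVE] if perfect else []) + ([BE] if progressive else []) + [verb]
    words = []
    if tense == "future":
        words.append("will")     # will selects the base form
        form = "base"
    else:
        form = {"past": "past", "present": "3sg"}[tense]
    for lexeme in chain:
        words.append(lexeme[form])
        if lexeme is HAVE:
            form = "en"          # have + past participle
        elif lexeme is BE:
            form = "ing"         # be + -ing form
    return " ".join(words)


for aspect, perf, prog in [("Simple", False, False), ("Progressive", False, True),
                           ("Perfect", True, False), ("Perfect Progressive", True, True)]:
    print(aspect, [verb_group(t, perf, prog) for t in ("past", "present", "future")])
```

The order of the chain (have before be before the main verb) is what makes "had been cooking" come out right rather than "was having cooked".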
41
Open Class Words: Adjectives
Adjectives denote qualities or properties of objects, e.g. heavy, blue, content.
Most languages have concepts for:
- colour: white, green, ...
- age: young, old, ...
- value: good, bad, ...
Not all languages have adjectives as a separate class.
42
Open Class Words: Adverbs 1
Adverbs denote modifications of actions (verbs) or qualities (adjectives), e.g. walk slowly or heavily drunk.
- Directional or locational adverbs specify direction or location, e.g. go home, stay here.
43
Open Class Words: Adverbs 2
- Degree adverbs specify the extent of a process, action, or property, e.g. extremely slow, very modest.
- Manner adverbs specify the manner of an action or process, e.g. walk slowly, run fast.
- Temporal adverbs specify the time of an event or action, e.g. yesterday, Monday.
44
Closed Word Classes
Closed class types:
- Prepositions: on, under, over, at, from, to, with, ...
- Determiners: a, an, the, ...
- Pronouns: he, she, it, his, her, who, I, ...
- Conjunctions: and, or, as, if, when, ...
- Auxiliary verbs: can, may, should, are, ...
- Particles: up, down, on, off, in, out, ...
- Numerals: one, two, three, ..., first, second, ...
45
Closed Word Class: Prepositions
Prepositions occur before noun phrases and describe relations, often spatial or temporal ones, e.g. on the table (spatial), in two hours (temporal).
46
Closed Word Class: Pronouns
Pronouns refer to entities, events, relations, etc.
- Personal pronouns refer to persons or entities, e.g. you, he, it, ...
- Possessive pronouns express possession or a relation between a person and an object, e.g. his, her, my, its, ...
- Wh-pronouns are used in questions or for back reference, e.g. Who did this...; Frieda, who is 80 years old...
47
Closed Word Class: Conjunctions
Conjunctions join phrases or sentences; their semantics is varied and complex.
- Coordinating conjunctions join two phrases or sentences on the same level, through conjunctions like and, or, but, ..., e.g. He takes a cat and a dog. He takes a dog and she takes a cat.
- Subordinating conjunctions connect embedded clauses, e.g. through that: He thinks that the cat is nicer than the dog.
48
Closed Word Class: Auxiliary Verbs
Auxiliary verbs mark semantic features of the main verb, often tense and modality aspects. Their semantics is difficult.
- Tense: expresses present, past, or future, e.g. He will take the cat home.
- Aspect: expresses whether an action is complete, e.g. He is taking the cat home. (incomplete)
- Mood: expresses necessity or possibility of an action, e.g. He can take the cat home. (possible)
49
Closed Word Class: Copula, Modal Verbs
The copula (be), the verbs do and have, and modal verbs (can, should, ...) are subclasses of auxiliary verbs. They describe the state, process, or tense / modality of an action. Semantics: difficult (e.g. modal logic).
- State / process: be and do, e.g. He is at home. He does nothing.
- Tense: have, e.g. He has taken the cat home.
- Modality: can, ought to, should, must, e.g. He can take the cat home. (possibility)
50
Tagsets and POS Tagging
51
POS Tagging - Tagsets
Tagsets for English:
- Penn Treebank, 45 tags
- Brown corpus, 87 tags
- C5 tagset, 61 tags
- C7 tagset, 146 tags
For references see Jurafsky, p. 296. The C5 and C7 tagsets are listed in Appendix C.
52
Fig. 8.6 Penn Treebank, 45 tags
53
Ambiguity in POS Tagging
Fig. 8.7 Ambiguity in tagging (based on the Brown Corpus). The left column classifies words by the number of tags that can be used for them; the right column shows how many words fall into each class. E.g. there are 264 words which can be tagged with 3 different POS tags, and 1 word (“still”) which has 7 possible tags.
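A table like Fig. 8.7 is computed by collecting the set of tags each word type occurs with and counting how many types have k distinct tags. The sketch below runs on a made-up toy corpus, so its counts have nothing to do with the Brown Corpus figures; on a real corpus the result would also depend on the tagset and on case normalization.

```python
from collections import Counter, defaultdict


def ambiguity_classes(tagged_tokens):
    """Map k -> number of word types that occur with exactly k distinct tags."""
    tags_per_word = defaultdict(set)
    for word, tag in tagged_tokens:
        tags_per_word[word.lower()].add(tag)
    return Counter(len(tags) for tags in tags_per_word.values())


# Made-up toy corpus with Penn-style tags.
TOY_CORPUS = [
    ("I", "PRP"), ("can", "MD"), ("book", "VB"), ("a", "DT"), ("book", "NN"),
    ("the", "DT"), ("can", "NN"), ("still", "RB"), ("still", "JJ"), ("still", "VB"),
]

print(sorted(ambiguity_classes(TOY_CORPUS).items()))
```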
54
POS Tagging - Taggers
Methods for POS tagging:
- Rule-based tagging: use a dictionary to assign possible POS tags; then use rules to disambiguate between different POS/word classes (e.g. book as verb or noun)
- Stochastic tagging: determines tags based on the probability of the occurrence of the tag, given the observed word, in the context of the preceding tags; similar to Hidden Markov Models (probabilistic finite state machines)
- Learning tagging rules
Problems in POS tagging:
- ambiguity
- which tagset to use?
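Stochastic tagging with a bigram HMM is usually decoded with the Viterbi algorithm, which finds the most probable tag sequence given tag-transition probabilities P(t_i | t_{i-1}) and word-emission probabilities P(w_i | t_i). In the sketch below all probabilities are invented for illustration; a real tagger estimates them from a tagged corpus and must smooth unseen words.

```python
import math

# Toy bigram HMM; all numbers are made up.
TAGS = ["DT", "NN", "VB"]
START = {"DT": 0.5, "NN": 0.3, "VB": 0.2}            # P(t_1)
TRANS = {                                            # P(t_i | t_{i-1})
    "DT": {"DT": 0.01, "NN": 0.9, "VB": 0.09},
    "NN": {"DT": 0.3, "NN": 0.3, "VB": 0.4},
    "VB": {"DT": 0.6, "NN": 0.3, "VB": 0.1},
}
EMIT = {                                             # P(w_i | t_i)
    "DT": {"the": 0.7},
    "NN": {"book": 0.1, "flight": 0.2},
    "VB": {"book": 0.3},
}


def log(p: float) -> float:
    """Log probability, with log(0) = -inf for impossible events."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words: list) -> list:
    # best[t] = (log prob of the best path ending in tag t, that path)
    best = {t: (log(START[t]) + log(EMIT[t].get(words[0], 0.0)), [t]) for t in TAGS}
    for w in words[1:]:
        new_best = {}
        for t in TAGS:
            # choose the predecessor tag that maximizes path score * transition
            prev = max(TAGS, key=lambda p: best[p][0] + log(TRANS[p][t]))
            score = best[prev][0] + log(TRANS[prev][t]) + log(EMIT[t].get(w, 0.0))
            new_best[t] = (score, best[prev][1] + [t])
        best = new_best
    return max(best.values(), key=lambda entry: entry[0])[1]


print(viterbi(["book", "the", "flight"]))
print(viterbi(["the", "book"]))
```

With these numbers, "book" is tagged VB at the start of "book the flight" but NN after "the", illustrating how the preceding tag resolves the verb/noun ambiguity mentioned above. Working in log space avoids numerical underflow on long sentences.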