 Christel Kemke 1 Morphology COMP 4060 Natural Language Processing Morphology, Word Classes, POS Tagging.


1  Christel Kemke 1 Morphology COMP 4060 Natural Language Processing Morphology, Word Classes, POS Tagging

2 Overview
  - Morphology
  - Stemming
  - Word Classes
  - POS Tagging
(Jurafsky, 2nd edition, Ch. 2, 3, 5; Allen, Ch. 2, 3)

3 Morphology

4 Morphemes and Words
Morpheme = "minimal meaning-bearing unit in a language"
Combine morphemes to create words:
  - Inflection: combination of a word stem with a grammatical morpheme; yields the same word class, e.g. clean (verb), clean-ing (verb)
  - Derivation: combination of a word stem with a grammatical morpheme; yields a different word class, e.g. clean (verb), clean-ing (noun)
  - Compounding: combination of multiple word stems
  - Cliticization: combination of a word stem with a clitic; joins different words from different syntactic categories, e.g. I’ve = I + have

5 Inflectional Morphology
word stem + grammatical morpheme, e.g. cat + s
only for nouns, verbs, and some adjectives
Nouns
  - plural:
      regular: +s, +es
      irregular: mouse - mice; ox - oxen
      rules for exceptions: e.g. -y -> -ies, as in butterfly - butterflies
  - possessive: +'s, +'
Verbs
  - main verbs (sleep, eat, walk)
  - modal verbs (can, will, should)
  - primary verbs (be, have, do)
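The plural rules on this slide can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `pluralize`, the irregular table, and the list of endings that take +es are chosen here for the example and are not part of the slides.

```python
# Irregular plurals taken from the slide; a real lexicon would hold many more.
IRREGULAR_PLURALS = {"mouse": "mice", "ox": "oxen"}
VOWELS = "aeiou"

def pluralize(noun: str) -> str:
    """Return the plural of an English noun using a few simplified rules."""
    # 1. Irregular forms are looked up, not computed.
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    # 2. Exception rule: consonant + y -> ies (butterfly -> butterflies).
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    # 3. Regular +es after sibilant-like endings (assumed list of endings).
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    # 4. Regular +s everywhere else.
    return noun + "s"

print(pluralize("cat"), pluralize("butterfly"), pluralize("mouse"))
```

The irregular lookup comes first on purpose: ox ends in x and would otherwise be turned into "oxes" by the +es rule.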

6 Inflectional Morphology (verbs)
Verb inflections only for: main verbs (sleep, eat, walk); primary verbs (be, have, do)

Morphological form      Regularly inflected form
stem                    walk      merge     try      map
-s form                 walks     merges    tries    maps
-ing participle         walking   merging   trying   mapping
past; -ed participle    walked    merged    tried    mapped

Morphological form      Irregularly inflected form
stem                    eat       catch     cut
-s form                 eats      catches   cuts
-ing participle         eating    catching  cutting
-ed past                ate       caught    cut
-ed participle          eaten     caught    cut
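As a rough illustration, the regular rows of the table can be generated from the stem with a handful of spelling rules, while the irregular past forms have to be stored. The function names and the consonant-doubling test below are assumptions made for this sketch; the doubling rule in particular is only reliable for short, one-syllable stems such as map.

```python
VOWELS = "aeiou"
# Irregular (past, -ed participle) pairs copied from the slide's second table.
IRREGULAR_PAST = {"eat": ("ate", "eaten"), "catch": ("caught", "caught"), "cut": ("cut", "cut")}

def doubles_final_consonant(stem: str) -> bool:
    # Simplified test: stem ends in consonant-vowel-consonant (map, cut),
    # and the final consonant is not w, x, or y.
    return (len(stem) >= 3 and stem[-1] not in VOWELS + "wxy"
            and stem[-2] in VOWELS and stem[-3] not in VOWELS)

def s_form(stem: str) -> str:
    if stem.endswith("y") and stem[-2] not in VOWELS:
        return stem[:-1] + "ies"          # try -> tries
    if stem.endswith(("s", "x", "z", "ch", "sh")):
        return stem + "es"                # catch -> catches
    return stem + "s"

def ing_form(stem: str) -> str:
    if stem.endswith("e") and not stem.endswith("ee"):
        return stem[:-1] + "ing"          # merge -> merging
    if doubles_final_consonant(stem):
        return stem + stem[-1] + "ing"    # map -> mapping
    return stem + "ing"

def past_forms(stem: str) -> tuple:
    """Return (simple past, -ed participle)."""
    if stem in IRREGULAR_PAST:
        return IRREGULAR_PAST[stem]
    if stem.endswith("e"):
        form = stem + "d"                 # merge -> merged
    elif stem.endswith("y") and stem[-2] not in VOWELS:
        form = stem[:-1] + "ied"          # try -> tried
    elif doubles_final_consonant(stem):
        form = stem + stem[-1] + "ed"     # map -> mapped
    else:
        form = stem + "ed"                # walk -> walked
    return form, form

for verb in ["walk", "merge", "try", "map", "eat", "catch", "cut"]:
    print(verb, s_form(verb), ing_form(verb), *past_forms(verb))
```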

7 Inflectional and Derivational Morphology (adjectives)
Adjective inflections and derivations:

Affix           Example      Result
prefix un-      unhappy      adjective, negation
suffix -ly      happily      adverb, mode
       -er      happier      adjective, comparative 1
       -est     happiest     adjective, comparative 2
suffix -ness    happiness    noun

plus combinations, like unhappiest, unhappiness.
Distinguish different adjective classes, which can or cannot take certain inflectional or derivational forms, e.g. no negation for big.

8 Inflectional Morphology

9 Noun Inflections

10 Verb Inflections

11 Derivational Morphology

12 Noun Derivation

13 Adjective Derivation

14 Clitics

15 Verb Clitics

16 Methods, Algorithms

17 Stemming
Stemming algorithms
  - strip off word affixes
  - yield the stem only, with no additional information (like plural, 3rd person, etc.)
  - are used, e.g., in web search engines
  - famous stemming algorithm: the Porter stemmer

18 Stemming Methods
Rule-based stemming
Example rules:
  ATIONAL → ATE, e.g. relational → relate
  ING → ε (empty string) if the stem contains a vowel, e.g. motoring → motor
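The two example rules can be applied directly as suffix rewrites. The sketch below implements only these two rules; it is not the full Porter stemmer, which has many more rules and additional conditions on the stem. The function name `stem_word` is chosen for this example.

```python
import re

def stem_word(word: str) -> str:
    """Apply the two example rules from the slide, in order."""
    w = word.lower()
    # Rule 1: ATIONAL -> ATE (relational -> relate)
    if w.endswith("ational"):
        return w[: -len("ational")] + "ate"
    # Rule 2: ING -> empty string, but only if the remaining stem contains a vowel
    if w.endswith("ing"):
        candidate = w[: -len("ing")]
        if re.search(r"[aeiou]", candidate):
            return candidate
    # No rule applies: return the word unchanged.
    return w

print(stem_word("relational"), stem_word("motoring"), stem_word("sing"))
```

The vowel condition is what keeps short words such as sing intact: stripping -ing would leave "s", which contains no vowel, so the rule does not fire.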

19 Stemming Problems

Errors of Commission            Errors of Omission
organization    organ           European    Europe
doing           doe             analysis    analyzes
generalization  generic         matrices    matrix
numerical       numerous        noise       noisy
policy          police          sparse      sparsity

20 Tokenization, Word Segmentation
Tokenization or word segmentation
  - separate out “words” (lexical entries) from running text
  - expand abbreviated terms, e.g. I’m into I am, it’s into it is
  - collect tokens forming a single lexical entry, e.g. New York marked as one single entry
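The three steps on this slide can be illustrated with a small tokenizer. Everything named here (the `CONTRACTIONS` and `MULTIWORD` tables, the regular expression, the function `tokenize`) is a hypothetical stand-in for a real lexicon and is kept deliberately tiny.

```python
import re

# Hypothetical lookup tables; a real system would read these from a lexicon.
CONTRACTIONS = {"i'm": ["I", "am"], "it's": ["it", "is"]}  # note: it's can also mean "it has"
MULTIWORD = {("New", "York")}

def tokenize(text: str) -> list:
    text = text.replace("’", "'")  # normalize curly apostrophes
    # Step 1: separate words (optionally with a clitic) and punctuation marks.
    raw = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|[^\sA-Za-z]", text)
    # Step 2: expand abbreviated terms.
    expanded = []
    for tok in raw:
        expanded.extend(CONTRACTIONS.get(tok.lower(), [tok]))
    # Step 3: collect adjacent tokens that form a single lexical entry.
    tokens, i = [], 0
    while i < len(expanded):
        pair = tuple(expanded[i:i + 2])
        if pair in MULTIWORD:
            tokens.append(" ".join(pair))
            i += 2
        else:
            tokens.append(expanded[i])
            i += 1
    return tokens

print(tokenize("I’m in New York."))
```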

21 Tokenization, Word Segmentation
Finite state transducer (FST)
  - modifies the input string (rules)
  - recognizes (stored) abbreviations and composite words
  - see Fig. 3.22 in Jurafsky, Ch. 3
Word segmentation is more of an issue in languages like Chinese, where words are not separated by spaces.

22 Lemmatization
Lemmatization maps words with the same root but different surface appearances onto the same lexeme,
e.g. buys, bought, buying -> buy
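One simple way to realize this mapping is a table of irregular forms combined with suffix stripping that is checked against a lexicon. The sketch below assumes a tiny hypothetical lexicon and ignores spelling changes; it only illustrates the idea and does not reproduce any particular lemmatizer.

```python
# Hypothetical data for the slide's example.
LEXICON = {"buy"}
IRREGULAR_FORMS = {"bought": "buy"}

def lemmatize(word: str) -> str:
    w = word.lower()
    # Irregular forms cannot be derived by rule and are looked up.
    if w in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[w]
    if w in LEXICON:
        return w
    # Strip a known inflectional suffix if the remainder is a known lexeme.
    for suffix in ("ing", "ed", "s"):
        if w.endswith(suffix) and w[: -len(suffix)] in LEXICON:
            return w[: -len(suffix)]
    return w  # unknown word: leave unchanged

print([lemmatize(w) for w in ["buys", "bought", "buying"]])
```

Unlike a stemmer, the result is always an entry of the lexicon (or the unchanged input), which is why bought can be mapped to buy.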

23 Morphological Processing

24 Word Recognition
Spelling errors
  - mark non-words based on a dictionary/lexicon
  - use “minimum edit distance”
      dynamic programming, table-based
      transform operations: deletion, substitution, insertion
      calculate the minimum path
Morphological parser = FST
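The table-based computation can be sketched as follows. The cost settings are an assumption: insertion and deletion cost 1, and the substitution cost is a parameter (1 by default; some textbook presentations use 2). The helper `suggest` and its word list are hypothetical.

```python
def min_edit_distance(source: str, target: str, sub_cost: int = 1) -> int:
    n, m = len(source), len(target)
    # d[i][j] = minimum cost of transforming source[:i] into target[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i                      # delete all i characters
    for j in range(1, m + 1):
        d[0][j] = j                      # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            d[i][j] = min(
                d[i - 1][j] + 1,                             # deletion
                d[i][j - 1] + 1,                             # insertion
                d[i - 1][j - 1] + (0 if same else sub_cost)  # substitution or match
            )
    return d[n][m]

def suggest(word: str, lexicon: list) -> str:
    """Return the lexicon entry closest to a (possibly misspelled) word."""
    return min(lexicon, key=lambda entry: min_edit_distance(word, entry))

print(min_edit_distance("intention", "execution"))          # unit costs
print(suggest("mappng", ["mapping", "merging", "walking"]))
```

Each cell depends only on its left, upper, and upper-left neighbours, so the table is filled row by row and the bottom-right cell holds the cost of the minimum path.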

25 Morphological Processing
Knowledge
  - lexical entry: stem plus possible prefixes and suffixes, plus word classes, e.g. endings for verb forms (see tables above)
  - rules: how to combine stem and affixes, e.g. add s to form the plural of a noun, as in dogs
  - orthographic rules: spelling, e.g. double consonant as in mapping
Processing: Finite State Transducers
  - take the information above and analyze a word token / generate a word form
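To make the three knowledge sources concrete, here is a small analyzer that combines a lexicon, affix rules, and two orthographic rules. It is plain Python rather than a real finite state transducer, and the lexicon, feature labels (+PL, +3SG, ...), and function names are hypothetical choices for this sketch.

```python
# Lexicon: stem -> word class (hypothetical entries).
LEXICON = {"dog": "N", "cat": "N", "map": "V", "walk": "V", "merge": "V"}

# Rules: which suffixes combine with which word class, and what they mean.
SUFFIXES = {
    "N": [("s", "+PL")],
    "V": [("s", "+3SG"), ("ing", "+PresPart"), ("ed", "+Past")],
}

def candidate_stems(base: str):
    """Orthographic rules, applied in reverse to recover the stem."""
    yield base                            # no spelling change: walk + ing
    if len(base) >= 2 and base[-1] == base[-2]:
        yield base[:-1]                   # undo consonant doubling: mapp -> map
    yield base + "e"                      # undo e-deletion: merg -> merge

def analyze(word: str) -> list:
    analyses = []
    if word in LEXICON:
        analyses.append(f"{word} +{LEXICON[word]}")
    for pos, suffixes in SUFFIXES.items():
        for suffix, feature in suffixes:
            if not word.endswith(suffix):
                continue
            base = word[: -len(suffix)]
            for stem in candidate_stems(base):
                if LEXICON.get(stem) == pos:
                    analyses.append(f"{stem} +{pos} {feature}")
                    break
    return analyses

for w in ["dogs", "mapping", "merged", "walk"]:
    print(w, analyze(w))
```

This sketch overgenerates (it would also accept a misspelling such as "maped"); an FST with proper spelling rules constrains exactly where each orthographic change may apply and can run in both directions, for analysis and for generation.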

26 Fig. 3.3 FSA for verb inflection.

27 Fig. 3.4 Simple FSA for adjective inflection.
   Fig. 3.5 More detailed FSA for adjective inflection.

28 Fig. 3.7 Compiled FSA for noun inflection.

29 Fig. 3.12 Lexical and intermediate tape of an FS transducer.
   Fig. 3.13 Lexical, intermediate, and surface tape after spelling transformation.

30 Word Classes and POS Tagging

31 Word Classes
Sort words into categories according to:
  - morphological properties: Which types of morphological forms do they take? e.g. form plural: noun+s; 3rd person: verb+s
  - distributional properties: What other words or phrases can occur nearby? e.g. possessive pronoun before noun
  - semantic coherence: Classify according to similar semantic type, e.g. nouns refer to object-like entities

32 Open vs. Closed Word Classes
Open Class Types
  The set of words in these classes can change over time, with the development of the language, e.g. spaghetti and download.
  Open class types: nouns, verbs, adjectives, adverbs

33 Open vs. Closed Word Classes
Closed Class Types
  The set of words in these classes is largely fixed and hardly ever changes for a given language.
  Closed class types: prepositions, determiners, pronouns, conjunctions, auxiliary verbs, particles, numerals

34 Open Class Words: Nouns
Nouns denote objects, concepts, entities, events
  - Proper Nouns: names for specific individual objects, entities, e.g. the Eiffel Tower, Dr. Kemke
  - Common Nouns: names for categories, classes, abstracts, events, e.g. fruit, banana, table, freedom, sleep, race, ...
      Count Nouns: enumerable entities, e.g. two bananas
      Mass Nouns: not countable items, e.g. water, salt, freedom

35 Open Class Words: Verbs
Verbs denote actions, processes, and states, e.g. smoke, dream, rest, run
several morphological forms, e.g.

non-3rd person                                  eat, sleep
3rd person                                      eats, sleeps
progressive / present participle / gerundive    eating, sleeping
past participle                                 eaten, slept
simple past                                     ate, slept

36 Open Class Words: Verbs (2)

non-3rd person        eat       I eat. We eat. They eat.
3rd person            eats      He eats. She eats. It eats.
progressive           eating    He is eating. He will be eating. He has been eating.
  e.g. present participle       He is eating.
       gerundive                Eating scorpions [NP] is common in China.
       use as adjective         Eating children [NP] are common at McDonalds.
past participle       eaten     He has eaten the scorpion. The scorpion was eaten.
simple past           ate       He ate the scorpion.

37 Verb Forms 1 - The five verb forms
Fig. 2.6. The five verb forms. (Allen, 1995, p. 28)

38 Verb Forms 2 - The basic tenses
Fig. 2.7. The basic tenses. (Allen, 1995, p. 29)

39 Verb Forms 3 - The progressive tenses
Fig. 2.8. The progressive tenses. (Allen, 1995, p. 29)

40 Verb Tense Chart
Each entry gives the verb form, its meaning, and an example sentence containing a time clue.

Simple
  Past (cooked): An action that ended at a point in the past. e.g. He cooked yesterday.
  Present (cook / cooks): An action that exists, is usual, or is repeated. e.g. He cooks dinner every Friday.
  Future (will cook): A plan for future action. e.g. He will cook tomorrow.

Progressive (be + main verb + ing)
  Past (was / were cooking): An action was happening (past progressive) when another action happened (simple past). e.g. He was cooking when the phone rang.
  Present (am / is / are cooking): An action that is happening now. e.g. He is cooking now.
  Future (will be cooking): An action that will be happening over time, in the future, when something else happens. e.g. He will be cooking when you come.

Perfect (have + main verb)
  Past (had cooked): An action that ended before another action or time in the past. e.g. He had cooked the dinner when the phone rang.
  Present (has / have cooked): An action that happened at an unspecified time in the past. e.g. He has cooked many meals.
  Future (will have cooked): An action that will end before another action or time in the future. e.g. He will have cooked dinner by the time you come.

Perfect Progressive (have + be + main verb + ing)
  Past (had been cooking): An action that happened over time, in the past, before another time or action in the past. e.g. He had been cooking for a long time before he took lessons.
  Present (has / have been cooking): An action occurring over time that started in the past and continues into the present. e.g. He has been cooking for over an hour.
  Future (will have been cooking): An action occurring over time, in the future, before another action or time in the future. e.g. He will have been cooking all day by the time she gets home.

From: http://www.athabascau.ca/courses/engl/155/support/verb_tenses.htm

41 Open Class Words: Adjectives
Adjectives denote qualities or properties of objects, e.g. heavy, blue, content
most languages have concepts for
  - colour: white, green, ...
  - age: young, old, ...
  - value: good, bad, ...
not all languages have adjectives as a separate class

42 Open Class Words: Adverbs 1
Adverbs denote modifications of actions (verbs) or qualities (adjectives), e.g. walk slowly or heavily drunk
  - Directional or Locational Adverbs specify direction or location, e.g. go home, stay here

43 Open Class Words: Adverbs 2
  - Degree Adverbs specify extent of process, action, property, e.g. extremely slow, very modest
  - Manner Adverbs specify manner of action or process, e.g. walk slowly, run fast
  - Temporal Adverbs specify time of event or action, e.g. yesterday, Monday

44 Closed Word Classes
Closed class types:
  - Prepositions: on, under, over, at, from, to, with, ...
  - Determiners: a, an, the, ...
  - Pronouns: he, she, it, his, her, who, I, ...
  - Conjunctions: and, or, as, if, when, ...
  - Auxiliary verbs: can, may, should, are, …
  - Particles: up, down, on, off, in, out, …
  - Numerals: one, two, three, ..., first, second, ...

45 Closed Word Class: Prepositions
Prepositions occur before noun phrases; describe relations, often spatial or temporal relations
  e.g. on the table (spatial)
       in two hours (temporal)

46 Closed Word Class: Pronouns
Pronouns refer to entities, events, relations, etc.
  - Personal Pronouns refer to persons or entities, e.g. you, he, it, ...
  - Possessive Pronouns express possession or a relation between person and object, e.g. his, her, my, its, ...
  - Wh-Pronouns refer in a question or make a back reference, e.g. Who did this..., Frieda, who is 80 years old...

47 Closed Word Class: Conjunctions
Conjunctions join phrases or sentences; semantics is varied and complex
  - Coordinating Conjunction: joins two phrases or sentences on the same level through conjunctions like and, or, but, ...
      e.g. He takes a cat and a dog. He takes a dog and she takes a cat.
  - Subordinating Conjunction: connects embedded phrases through e.g. that
      e.g. He thinks that the cat is nicer than the dog.

48 Closed Word Class: Auxiliary Verbs
Auxiliary verbs mark semantic features of the main verb. Often describe tense and modality aspects. Semantics is difficult.
  - Tense: addition expressing present, past, or future, ...
      e.g. He will take the cat home.
  - Aspect: addition expressing completion of action
      e.g. He is taking the cat home. (incomplete)
  - Mood: addition expressing necessity of action
      e.g. He can take the cat home. (possible)

49 Closed Word Class: Copula, Modal Verbs
Copula (be, do, have) and Modal Verbs (can, should, ...) are subclasses of Auxiliary Verbs.
Describe state, process, or tense / modality of action. Semantics: difficult (e.g. modal logic)
  - State / Process: be and do
      e.g. He is at home. He does nothing.
  - Tense: have
      e.g. He has taken the cat home.
  - Modality: can, ought to, should, must
      e.g. He can take the cat home. (possibility)

50 Tagsets and POS Tagging

51 POS Tagging - Tagsets
Tagsets for English
  - Penn Treebank, 45 tags
  - Brown corpus, 87 tags
  - C5 tagset, 61 tags
  - C7 tagset, 146 tags
For references see Jurafsky, p. 296
C5 and C7 tagsets are listed in Appendix C

52 Fig. 8.6 Penn Treebank, 45 tags

53 Ambiguity in POS Tagging
Fig. 8.7 Ambiguity in tagging. The left column classifies words according to the number of tags that can be used for them. The right column shows how many words fall into each class. E.g. there are 264 words which can be tagged with 3 different POS tags, and 1 word (“still”) which has 7 possible tags. (Based on the Brown Corpus.)

54 POS Tagging - Taggers
Methods for POS tagging:
  - Rule-Based Tagging: use a dictionary to assign POS; then use rules to disambiguate different POS/word classes (e.g. book as verb or noun)
  - Stochastic Tagging: determines tags based on the probability of the occurrence of the tag, given the observed word, in the context of the preceding tags. Similar to Hidden Markov Models (probabilistic finite state machines).
  - Learn tagging rules
Problem in POS tagging: ambiguity
Problem in POS tagging: which tag set to use?
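The stochastic approach can be illustrated with a minimal bigram HMM tagger decoded with the Viterbi algorithm. This is a sketch under stated assumptions: the three training sentences are invented toy data, not taken from any corpus, the tags follow Penn Treebank naming, and add-one smoothing is just one simple way to handle unseen events.

```python
import math
from collections import Counter, defaultdict

# Invented toy training data: sentences of (word, tag) pairs.
TRAIN = [
    [("I", "PRP"), ("book", "VB"), ("a", "DT"), ("flight", "NN")],
    [("the", "DT"), ("book", "NN"), ("is", "VBZ"), ("here", "RB")],
    [("I", "PRP"), ("read", "VB"), ("the", "DT"), ("book", "NN")],
]

def train(sentences):
    trans = defaultdict(Counter)   # trans[previous_tag][tag] = count
    emit = defaultdict(Counter)    # emit[tag][word] = count
    for sentence in sentences:
        prev = "<s>"               # sentence-start pseudo tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            prev = tag
    return trans, emit

def log_prob(counter, key, n_outcomes):
    # Add-one smoothing keeps unseen events at a small non-zero probability.
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, trans, emit):
    tags = list(emit)
    n_tags = len(tags)
    n_words = len({w for c in emit.values() for w in c}) + 1  # +1 for unknown words
    # best[i][t] = (log score of the best tag path ending in t at position i, backpointer)
    best = [{t: (log_prob(trans["<s>"], t, n_tags)
                 + log_prob(emit[t], words[0].lower(), n_words), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i].lower(), n_words)
            column[t] = max((best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                            for p in tags)
        best.append(column)
    # Follow the backpointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]

trans, emit = train(TRAIN)
print(viterbi("I book the flight".split(), trans, emit))
print(viterbi("the book is here".split(), trans, emit))
```

With this toy data, book is tagged as a verb after a pronoun and as a noun after a determiner, which is the kind of ambiguity the slide names as the central problem.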

