 Christel Kemke 1 Morphology COMP 4060 Natural Language Processing Morphology, Word Classes, POS Tagging.

Presentation transcript:

 Christel Kemke 1 Morphology COMP 4060 Natural Language Processing Morphology, Word Classes, POS Tagging

Christel Kemke 2 Morphology Overview
- Morphology
- Stemming
- Word Classes
- POS Tagging
(Jurafsky, 2nd edition, Ch. 2, 3, 5; Allen, Ch. 2, 3)

 Christel Kemke 3 Morphology

Christel Kemke 4 Morphology Morphemes and Words
Morpheme = "minimal meaning-bearing unit in a language"
Combine morphemes to create words:
- Inflection: combination of a word stem with a grammatical morpheme; yields the same word class, e.g. clean (verb), clean-ing (verb)
- Derivation: combination of a word stem with a grammatical morpheme; yields a different word class, e.g. clean (verb), clean-ing (noun)
- Compounding: combination of multiple word stems
- Cliticization: combination of a word stem with a clitic; combines words from different syntactic categories, e.g. I’ve = I + have

Christel Kemke 5 Morphology Inflectional Morphology
word stem + grammatical morpheme, e.g. cat + s
only for nouns, verbs, and some adjectives
Nouns
- plural:
  - regular: +s, +es
  - irregular: mouse - mice; ox - oxen
  - rules for exceptions: e.g. -y -> -ies, like: butterfly - butterflies
- possessive: +'s, +'
Verbs
- main verbs (sleep, eat, walk)
- modal verbs (can, will, should)
- primary verbs (be, have, do)

Christel Kemke 6 Morphology Inflectional Morphology (verbs)
Verb Inflections only for: main verbs (sleep, eat, walk); primary verbs (be, have, do)

Morphological Form     Regularly Inflected Form
stem                   walk      merge     try      map
-s form                walks     merges    tries    maps
-ing participle        walking   merging   trying   mapping
past; -ed participle   walked    merged    tried    mapped

Morphological Form     Irregularly Inflected Form
stem                   eat       catch     cut
-s form                eats      catches   cuts
-ing participle        eating    catching  cutting
-ed past               ate       caught    cut
-ed participle         eaten     caught    cut

Christel Kemke 7 Morphology Inflectional and Derivational Morphology (adjectives)
Adjective Inflections and Derivations:

prefix   un-     unhappy     adjective, negation
suffix   -ly     happily     adverb, mode
         -er     happier     adjective, comparative 1
         -est    happiest    adjective, comparative 2 (superlative)
suffix   -ness   happiness   noun

plus combinations, like unhappiest, unhappiness.
Distinguish different adjective classes, which can or cannot take certain inflectional or derivational forms, e.g. no negation for big.

 Christel Kemke 8 Morphology Inflectional Morphology

 Christel Kemke 9 Morphology Noun Inflections

 Christel Kemke 10 Morphology Verb Inflections

 Christel Kemke 11 Morphology Derivational Morphology

 Christel Kemke 12 Morphology Noun Derivation

 Christel Kemke 13 Morphology Adjective Derivation

 Christel Kemke 14 Morphology Clitics

 Christel Kemke 15 Morphology Verb Clitics

 Christel Kemke 16 Morphology Methods, Algorithms

Christel Kemke 17 Morphology Stemming
Stemming algorithms strip off word affixes
- yield the stem only, no additional information (like plural, 3rd person etc.)
- used, e.g., in web search engines
- famous stemming algorithm: the Porter stemmer

Christel Kemke 18 Morphology Stemming Methods
Rule-based stemming
Example rules:
- ATIONAL → ATE, e.g., relational → relate
- ING → ε if stem contains vowel, e.g., motoring → motor
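The two example rules can be sketched in a few lines of Python. This is only an illustration of rule-based suffix stripping, not the Porter stemmer itself: the function names `contains_vowel` and `stem` are made up for this sketch, and the real algorithm has many more rules and conditions.

```python
VOWELS = "aeiou"


def contains_vowel(text: str) -> bool:
    """Return True if the string has at least one vowel letter."""
    return any(ch in VOWELS for ch in text)


def stem(word: str) -> str:
    """Apply the two example rules from the slide, in order (hypothetical helper)."""
    w = word.lower()
    # Rule 1: ATIONAL -> ATE (relational -> relate)
    if w.endswith("ational"):
        return w[: -len("ational")] + "ate"
    # Rule 2: ING -> empty string, but only if the remaining stem contains a vowel
    if w.endswith("ing"):
        candidate = w[: -len("ing")]
        if contains_vowel(candidate):
            return candidate
    # No rule applies: return the word unchanged
    return w


print(stem("relational"))
print(stem("motoring"))
print(stem("sing"))
```

The vowel condition is what keeps a word like sing from being reduced to s.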

Christel Kemke 19 Morphology Stemming Problems

Errors of Commission           Errors of Omission
organization    organ          European    Europe
doing           doe            analysis    analyzes
generalization  generic        matrices    matrix
numerical       numerous       noise       noisy
policy          police         sparse      sparsity

Christel Kemke 20 Morphology Tokenization, Word Segmentation
Tokenization or word segmentation
- separate out “words” (lexical entries) from running text
- expand abbreviated terms, e.g. I’m into I am, it’s into it is
- collect tokens forming a single lexical entry, e.g. New York marked as one single entry
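The three tokenization steps might look roughly like this in Python. It is a minimal sketch under simplifying assumptions: the tables `CLITICS` and `MULTIWORD` and the function `tokenize` are invented for illustration, the regular expression only handles plain Latin letters, digits, and punctuation, and it's is always expanded to it is, as on the slide.

```python
import re

# Toy lookup tables (assumptions for this sketch, not a real lexicon)
CLITICS = {"i'm": ["I", "am"], "it's": ["it", "is"]}
MULTIWORD = {("New", "York"): "New York"}


def tokenize(text: str) -> list[str]:
    # Normalize the typographic apostrophe so the lookup table matches
    text = text.replace("’", "'")
    # Step 1: separate words (optionally with a clitic), numbers, punctuation
    raw = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+|[^\w\s]", text)
    # Step 2: expand abbreviated forms such as I'm -> I am
    expanded = []
    for tok in raw:
        expanded.extend(CLITICS.get(tok.lower(), [tok]))
    # Step 3: merge adjacent tokens that form one lexical entry
    tokens = []
    i = 0
    while i < len(expanded):
        pair = tuple(expanded[i:i + 2])
        if pair in MULTIWORD:
            tokens.append(MULTIWORD[pair])
            i += 2
        else:
            tokens.append(expanded[i])
            i += 1
    return tokens


print(tokenize("I'm in New York, it's cold."))
```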

Christel Kemke 21 Morphology Tokenization, Word Segmentation
Finite state transducer (FST)
- modifies input string (rules)
- recognizes (stored) abbreviations and composite words
- see Fig. 3.22 in Jurafsky, Ch. 3
More of an issue in languages like Chinese

 Christel Kemke 22 Morphology Lemmatization Lemmatization maps words with same root but different surface appearances onto the same lexeme e.g. buys, bought, buying -> buy
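A lexicon-driven sketch of this mapping in Python is shown here. The names `LEXICON`, `IRREGULAR`, `SUFFIXES`, and `lemmatize` are hypothetical, and the tiny tables only cover the slide's example; a real lemmatizer would use a full lexicon and spelling rules.

```python
from typing import Optional

# Toy data for illustration only (assumed, not from a real lexicon)
LEXICON = {"buy", "walk"}
IRREGULAR = {"bought": "buy"}
SUFFIXES = ("ing", "ed", "s")


def lemmatize(word: str) -> Optional[str]:
    """Map a surface form to its lexeme, or None if it is not recognized."""
    w = word.lower()
    if w in LEXICON:
        return w
    # Irregular forms have to be listed explicitly
    if w in IRREGULAR:
        return IRREGULAR[w]
    # Regular forms: strip a suffix and check that the rest is a known lexeme
    for suffix in SUFFIXES:
        if w.endswith(suffix) and w[: -len(suffix)] in LEXICON:
            return w[: -len(suffix)]
    return None


print([lemmatize(w) for w in ["buys", "bought", "buying"]])
```

Unlike a stemmer, this version only returns forms that are actual entries in the lexicon.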

 Christel Kemke 23 Morphology Morphological Processing

Christel Kemke 24 Morphology Word Recognition
Spelling Errors
- mark non-words based on dictionary/lexicon
- use “minimum edit distance”
  - dynamic programming, table-based
  - transform operations: deletion, substitution, insertion
  - calculate minimum path
Morphological Parser = FST
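The table-based dynamic programming computation can be sketched as follows. The function names `min_edit_distance` and `suggest`, the `sub_cost` parameter, and the toy lexicon in the usage line are assumptions made for this illustration.

```python
def min_edit_distance(source: str, target: str, sub_cost: int = 1) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    n, m = len(source), len(target)
    # dist[i][j] = cost of turning source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i characters
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if source[i - 1] == target[j - 1] else sub_cost
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[n][m]


def suggest(word: str, lexicon: list[str]) -> str:
    """Return the lexicon entry closest to a non-word (hypothetical helper)."""
    return min(lexicon, key=lambda entry: min_edit_distance(word, entry))


print(min_edit_distance("intention", "execution"))
print(suggest("maping", ["mapping", "map", "maple"]))
```

Setting `sub_cost=2` treats a substitution as a deletion plus an insertion, which is a common variant of the distance.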

Christel Kemke 25 Morphology Morphological Processing
Knowledge
- lexical entry: stem plus possible prefixes, suffixes plus word classes, e.g. endings for verb forms (see tables above)
- rules: how to combine stem and affixes, e.g. add s to form plural of noun as in dogs
- orthographic rules: spelling, e.g. double consonant as in mapping
Processing: Finite State Transducers
- take information above and analyze word token / generate word form
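The way the three kinds of knowledge work together in generation can be sketched with plain string rewriting instead of a real finite state transducer. Everything named here is an assumption for illustration: the toy lexicon `NOUN_STEMS`, the feature strings `+N+SG` and `+N+PL`, the boundary symbols `^` (morpheme boundary) and `#` (word end), and the two spelling rules chosen for noun plurals.

```python
import re

NOUN_STEMS = {"dog", "cat", "fox", "butterfly"}  # toy lexicon (assumed)


def to_intermediate(stem: str, features: str) -> str:
    """Morphotactic rule: a noun plural is stem + s. ^ marks the morpheme boundary."""
    if stem not in NOUN_STEMS:
        raise ValueError(f"unknown noun stem: {stem}")
    if features == "+N+PL":
        return stem + "^s#"
    if features == "+N+SG":
        return stem + "#"
    raise ValueError(f"unsupported features: {features}")


def to_surface(intermediate: str) -> str:
    """Orthographic rules, then removal of the boundary symbols."""
    s = intermediate
    # e-insertion: fox^s -> foxes (after s, x, z, ch, sh)
    s = re.sub(r"(s|x|z|ch|sh)\^s#", r"\1es#", s)
    # y -> ie after a consonant: butterfly^s -> butterflies
    s = re.sub(r"([^aeiou])y\^s#", r"\1ies#", s)
    return s.replace("^", "").replace("#", "")


def generate(stem: str, features: str) -> str:
    return to_surface(to_intermediate(stem, features))


for noun in ["dog", "fox", "butterfly"]:
    print(noun, "->", generate(noun, "+N+PL"))
```

An actual transducer runs the same mapping in both directions, so the analysis of foxes would yield fox plus the plural feature; this sketch covers generation only.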

Christel Kemke 26 Morphology Fig. 3.3 FSA for verb inflection.

Christel Kemke 27 Morphology
Fig. 3.4 Simple FSA for adjective inflection.
Fig. 3.5 More detailed FSA for adjective inflection.

 Christel Kemke 28 Morphology Fig. 3.7 Compiled FSA for noun inflection.

Christel Kemke 29 Morphology
Fig Lexical and intermediate tape of a FS Transducer
Fig Lexical, intermediate, and surface tape after spelling transformation.

 Christel Kemke 30 Morphology Word Classes and POS Tagging

Christel Kemke 31 Morphology Word Classes
Sort words into categories according to:
- morphological properties: Which types of morphological forms do they take? e.g. form plural: noun+s; 3rd person: verb+s
- distributional properties: What other words or phrases can occur nearby? e.g. possessive pronoun before noun
- semantic coherence: Classify according to similar semantic type. e.g. nouns refer to object-like entities

Christel Kemke 32 Morphology Open vs. Closed Word Classes
Open Class Types
The set of words in these classes can change over time, with the development of the language, e.g. spaghetti and download
Open Class Types: nouns, verbs, adjectives, adverbs

Christel Kemke 33 Morphology Open vs. Closed Word Classes
Closed Class Types
The set of words in these classes is largely fixed and hardly ever changes within one language.
Closed Class Types: prepositions, determiners, pronouns, conjunctions, auxiliary verbs, particles, numerals

Christel Kemke 34 Morphology Open Class Words: Nouns
Nouns denote objects, concepts, entities, events
- Proper Nouns: names for specific individual objects, entities, e.g. the Eiffel Tower, Dr. Kemke
- Common Nouns: names for categories, classes, abstracts, events, e.g. fruit, banana, table, freedom, sleep, race,...
- Count Nouns: enumerable entities, e.g. two bananas
- Mass Nouns: not countable items, e.g. water, salt, freedom

Christel Kemke 35 Morphology Open Class Words: Verbs
Verbs denote actions, processes, and states, e.g. smoke, dream, rest, run
several morphological forms, e.g.

non-3rd person                                  eat, sleep
3rd person                                      eats, sleeps
progressive / present participle / gerundive    eating, sleeping
past participle                                 eaten, slept
simple past                                     ate, slept

Christel Kemke 36 Morphology Open Class Words: Verbs (2)

non-3rd person               eat       I eat. We eat. They eat.
3rd person                   eats      He eats. She eats. It eats.
progressive                  eating    He is eating. He will be eating. He has been eating.
  e.g. present participle              He is eating.
       gerundive                       Eating scorpions [NP] is common in China.
       use as adjective                Eating children [NP] are common at McDonalds.
past participle              eaten     He has eaten the scorpion. The scorpion was eaten.
simple past                  ate       He ate the scorpion.

 Christel Kemke 37 Morphology Verb Forms 1 - The five verb forms Fig.2.6. The five verb forms. (Allen, 1995, p.28)

 Christel Kemke 38 Morphology Verb Forms 2 - The basic tenses Fig.2.7. The basic tenses. (Allen, 1995, p.29)

 Christel Kemke 39 Morphology Verb Forms 3 - The progressive tenses Fig.2.8. The progressive tenses. (Allen, 1995, p.29)

40 Verb Tense Chart (columns: Past, Present, Future)

Simple
- Past: An action that ended at a point in the past. Form: cooked. Example (time clue)*: He cooked yesterday.
- Present: An action that exists, is usual, or is repeated. Form: cook / cooks. Example (time clue)*: He cooks dinner every Friday.
- Future: A plan for future action. Form: will cook. Example (time clue)*: He will cook tomorrow.

Progressive (be + main verb + ing)
- Past: An action was happening (past progressive) when another action happened (simple past). Form: was / were cooking. Example (time clue)*: He was cooking when the phone rang.
- Present: An action that is happening now. Form: am / is / are cooking. Example (time clue)*: He is cooking now.
- Future: An action that will be happening over time, in the future, when something else happens. Form: will be cooking. Example (time clue)*: He will be cooking when you come.

Perfect (have + main verb)
- Past: An action that ended before another action or time in the past. Form: had cooked. Example (time clue)*: He had cooked the dinner when the phone rang.
- Present: An action that happened at an unspecified time in the past. Form: has / have cooked. Example (time clue)*: He has cooked many meals.
- Future: An action that will end before another action or time in the future. Form: will have cooked. Example (time clue)*: He will have cooked dinner by the time you come.

Perfect Progressive (have + be + main verb + ing)
- Past: An action that happened over time, in the past, before another time or action in the past. Form: had been cooking. Example (time clue)*: He had been cooking for a long time before he took lessons.
- Present: An action occurring over time that started in the past and continues into the present. Form: has / have been cooking. Example (time clue)*: He has been cooking for over an hour.
- Future: An action occurring over time, in the future, before another action or time in the future. Form: will have been cooking. Example (time clue)*: He will have been cooking all day by the time she gets home.

Verb Tense Chart. From:

Christel Kemke 41 Morphology Open Class Words: Adjectives
Adjectives denote qualities or properties of objects, e.g. heavy, blue, content
most languages have concepts for
- colour: white, green,...
- age: young, old,...
- value: good, bad,...
not all languages have adjectives as separate class

Christel Kemke 42 Morphology Open Class Words: Adverbs 1
Adverbs denote modifications of actions (verbs) or qualities (adjectives), e.g. walk slowly or heavily drunk
Directional or Locational adverbs specify direction or location, e.g. go home, stay here

Christel Kemke 43 Morphology Open Class Words: Adverbs 2
- Degree Adverbs specify extent of process, action, property, e.g. extremely slow, very modest
- Manner Adverbs specify manner of action or process, e.g. walk slowly, run fast
- Temporal Adverbs specify time of event or action, e.g. yesterday, Monday

Christel Kemke 44 Morphology Closed Word Classes
Closed Class Types:
- Prepositions: on, under, over, at, from, to, with,...
- Determiners: a, an, the,...
- Pronouns: he, she, it, his, her, who, I,...
- Conjunctions: and, or, as, if, when,...
- Auxiliary verbs: can, may, should, are, …
- Particles: up, down, on, off, in, out, …
- Numerals: one, two, three,..., first, second,...

Christel Kemke 45 Morphology Closed Word Class: Prepositions
Prepositions occur before noun phrases; describe relations; often spatial or temporal relations, e.g.
on the table    spatial
in two hours    temporal

Christel Kemke 46 Morphology Closed Word Class: Pronouns
Pronouns refer to entities, events, relations etc.
- Personal Pronouns refer to persons or entities, e.g. you, he, it,...
- Possessive Pronouns express possession or a relation between person and object, e.g. his, her, my, its,...
- Wh-Pronouns express reference in a question or back reference, e.g. Who did this..., Frieda, who is 80 years old...

Christel Kemke 47 Morphology Closed Word Class: Conjunctions
Conjunctions join phrases or sentences; semantics is varied and complex
- Coordinating Conjunction: join two phrases or sentences on the same level through conjunctions like and, or, but,...
  e.g. He takes a cat and a dog. He takes a dog and she takes a cat.
- Subordinating Conjunction: connect embedded phrases through e.g. that
  e.g. He thinks that the cat is nicer than the dog.

Christel Kemke 48 Morphology Closed Word Class: Auxiliary Verbs
Auxiliary Verbs mark semantic features of main verb. Often describe tense and modality aspects. Semantics is difficult.
- Tense: addition expressing present, past or future,... e.g. He will take the cat home.
- Aspect: addition expressing completion of action, e.g. He is taking the cat home. (incomplete)
- Mood: addition expressing necessity of action, e.g. He can take the cat home. (possible)

Christel Kemke 49 Morphology Closed Word Class: Copula, Modal Verbs
Copula (be, do, have) and Modal Verbs (can, should,...) are subclasses of Auxiliary Verbs.
Describe state, process, or tense / modality of action.
Semantics: difficult (e.g. modal logic)
- State / Process: be and do, e.g. He is at home. He does nothing.
- Tense: have, e.g. He has taken the cat home.
- Modality: can, ought to, should, must, e.g. He can take the cat home. (possibility)

 Christel Kemke 50 Morphology Tagsets and POS Tagging

Christel Kemke 51 Morphology POS Tagging - Tagsets
Tagsets for English
- Penn Treebank, 45 tags
- Brown corpus, 87 tags
- C5 tagset, 61 tags
- C7 tagset, 146 tags
For references see Jurafsky, p.296
C5 and C7 tagsets are listed in Appendix C

52 Fig. 8.6 Penn Treebank, 45 tags

Christel Kemke 53 Morphology Ambiguity in POS Tagging
Fig. 8.7 Ambiguity in tagging. The left column classifies words according to the number of tags that can be used for them. The right column shows how many words fall into each class. E.g. there are 264 words which can be tagged with 3 different POS tags, and 1 word (“still”) which has 7 possible tags. (based on the Brown Corpus)

Christel Kemke 54 Morphology POS Tagging - Taggers
Methods for POS Tagging:
- Rule-Based Tagging: use dictionary to assign POS; then use rules to disambiguate different POS/word classes (e.g. book as verb or noun)
- Stochastic Tagging: determines tags based on the probability of the occurrence of the tag, given the observed word, in the context of the preceding tags. Similar to Hidden Markov Models (probabilistic finite state machines).
- Learn tagging rules
Problem in POS Tagging: Ambiguity
Problem in POS Tagging: Which tag set to use?
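One way to make the stochastic approach concrete is a bigram Hidden Markov Model decoded with the Viterbi algorithm, sketched here. This is an assumed formulation, not necessarily the tagger the slides have in mind: the three-tag set and all probabilities in `TRANS` and `EMIT` are invented for illustration and not estimated from any corpus, and the function expects a non-empty word list.

```python
TAGS = ["DET", "N", "V"]

# P(tag | previous tag); "<s>" is the sentence start. Made-up numbers.
TRANS = {
    "<s>": {"DET": 0.5, "N": 0.2, "V": 0.3},
    "DET": {"DET": 0.01, "N": 0.9, "V": 0.09},
    "N": {"DET": 0.1, "N": 0.3, "V": 0.6},
    "V": {"DET": 0.6, "N": 0.3, "V": 0.1},
}

# P(word | tag). Made-up numbers; unseen words get probability 0.
EMIT = {
    "DET": {"the": 0.7, "a": 0.3},
    "N": {"book": 0.3, "flight": 0.4, "dog": 0.3},
    "V": {"book": 0.5, "take": 0.5},
}


def viterbi(words, tags, trans, emit):
    """Return the most probable tag sequence for a non-empty list of words."""
    # best[i][t]: probability of the best tag path for words[:i+1] ending in t
    best = [{t: trans["<s>"][t] * emit[t].get(words[0], 0.0) for t in tags}]
    back = [{}]
    for word in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            # Choose the previous tag that makes the path into t most probable
            prev = max(tags, key=lambda p: best[-1][p] * trans[p][t])
            scores[t] = best[-1][prev] * trans[prev][t] * emit[t].get(word, 0.0)
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back pointers from the best final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    return list(reversed(path))


print(viterbi(["book", "the", "flight"], TAGS, TRANS, EMIT))
print(viterbi(["the", "book"], TAGS, TRANS, EMIT))
```

With these toy numbers, book comes out as a verb at the start of a sentence and as a noun after the, which is the kind of ambiguity a rule-based tagger would resolve with hand-written context rules.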