Natural Language Processing, Chapter 3: Morphological Analysis
Definition
Morphology is the study of word formation: how words are built up from smaller pieces. Morphological analysis therefore asks questions such as: What pieces does this word have? What does each of them mean? How are they combined?
Goal: Given a word that is not in the dictionary, can we derive a root form that is in the dictionary?
Morphological analysis is the process of recognizing the root form and the type of a morphological variant (prefix, suffix).
4/25/2017 NLP
Algorithm
Given a word W:
1- If W is in the dictionary, then return its definition.
2- Else apply morphology rules to identify possible root forms of W.
- Each morphology rule strips a prefix or suffix from W, and sometimes adds back replacement characters, to produce a possible root form. If the root form is in the dictionary, success!
- When a morphology rule succeeds, the root word definition is returned along with properties of the morphological variant.
Rules must be applied recursively! Multiple derivations are common!
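The algorithm above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy `DICTIONARY`, the `RULES` list, and the helper names `strip_affix` and `analyze` are all hypothetical and are not the rules from the course slides. Each rule strips an affix, optionally adds replacement characters, and the search recurses so that a word with several affixes can still reach a dictionary root.

```python
from typing import List, NamedTuple, Optional, Tuple

# Toy dictionary (assumption): root word -> part of speech.
DICTIONARY = {"walk": "verb", "hope": "verb", "try": "verb", "happy": "adjective"}


class Rule(NamedTuple):
    kind: str         # "prefix" or "suffix"
    affix: str        # characters stripped from the word
    replacement: str  # characters added back after stripping
    label: str        # property of the morphological variant


# Hypothetical sample rules, invented for this sketch.
RULES = [
    Rule("suffix", "ed", "", "past"),
    Rule("suffix", "ed", "e", "past"),        # hoped -> hope
    Rule("suffix", "ies", "y", "plural/3sg"),  # tries -> try
    Rule("suffix", "ly", "", "adverb"),
    Rule("suffix", "ily", "y", "adverb"),     # happily -> happy
    Rule("prefix", "un", "", "negation"),
]


def strip_affix(word: str, rule: Rule) -> Optional[str]:
    """Apply one rule; return the candidate root form, or None if it does not match."""
    if len(word) <= len(rule.affix):
        return None
    if rule.kind == "suffix" and word.endswith(rule.affix):
        return word[: -len(rule.affix)] + rule.replacement
    if rule.kind == "prefix" and word.startswith(rule.affix):
        return rule.replacement + word[len(rule.affix):]
    return None


def analyze(word: str, depth: int = 3) -> List[Tuple[str, List[str]]]:
    """Return every derivation as (root, labels of the rules that were applied)."""
    if word in DICTIONARY:           # step 1: the word itself is in the dictionary
        return [(word, [])]
    if depth == 0:                   # bound the recursion
        return []
    derivations = []
    for rule in RULES:               # step 2: try each morphology rule
        candidate = strip_affix(word, rule)
        if candidate is None:
            continue
        for root, labels in analyze(candidate, depth - 1):  # recursive application
            derivations.append((root, labels + [rule.label]))
    return derivations
```

With these toy rules, `analyze("unhappily")` finds two derivations of the root "happy", one that strips the suffix first and one that strips the prefix first, which illustrates why multiple derivations are common.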
Sample Morphology Rules
Example of Morphological Analysis
Basic Parts of Speech
Parts of speech: adjective, adverb, article, conjunction, noun, verb, preposition, pronoun, ...
A closed class contains a relatively fixed set of words; new words are rarely introduced into the language. Ex: articles, conjunctions, pronouns, prepositions, ...
An open class contains a constantly changing set of words and readily accepts new members; new words are often introduced into the language. Ex: adjectives, adverbs, nouns, verbs
Examples of Closed Classes
Articles: a, an, the
Conjunctions: and, but, or, ...
Demonstratives: this, that, these, ...
Basic Parts of Speech
Prepositions: to, for, with, between, at, of, ...
Pronouns: I, you, me, we, he, she, him, her, ...
Quantifiers (words of indefinite quantity): some, every, most, any, both, ...
Articles
Articles are especially problematic for natural language generation. Many noun phrases begin with an article. Ex: a newspaper, an apple, the movie
But there are many exceptions: some nouns appear without an article where similar nouns sound odd without one. Compare:
The bowl was full of rice. / The bowl was full of apple.
I go to college. / I go to university.
She went on vacation. / She went on trip.
Basic Parts of Speech
Nouns
Nouns: words that represent objects, places, concepts, events. Ex: dog, city, idea, marathon
Proper nouns: names of persons and cities
Count nouns: describe specific objects or sets of objects. Ex: dogs, cities, ideas, marathons
Mass nouns: describe composites or substances. Ex: dirt, water, garbage, deer
Modifiers
Adjectives: words that attribute qualities to objects. Ex: wet, loud, happy, funny
Noun modifiers: nouns that modify other nouns. Ex: dog food, aluminum can, song book
Basic Parts of Speech
Prepositions and Particles
Prepositions represent relationships, such as time, location, modification, and complements. For example:
He put the book on the table.
Sam gave the book to Mary.
Jane walked up the stairs.
Particles follow verbs and create a new meaning. For example:
Greg passed out.
Charlie threw up his lunch.
Sometimes there is preposition/particle ambiguity:
Sarah looked over the paper. (particle reading: she reviewed the paper; preposition reading: she looked across the top of it)
Basic Parts of Speech
Verbs
Verbs: represent actions, commands, or assertions.
Main verbs: walk, eat, believe, claim, ask, ...
Auxiliary verbs: be, do, have
Modals: would, should, could, can, will, may, ...
Transitive verbs: take a direct object complement. Ex: eat an apple, read a book, sing a song
Intransitive verbs: do not take a direct object. Ex: she laughed, he lied, I slept.
Bitransitive (ditransitive) verbs: take both a direct object and an indirect object. Ex: I gave Mary a gift. She sang the baby a lullaby.
Part of Speech Tagging
Tagging: the process of assigning a part-of-speech or other lexical class marker to each word in a corpus.
Example:
WORDS    TAGS
the      DET
girl     N
kissed   V
baby     N
on       P
cheek    N
Part of Speech Tagging
WORD     LEMMA    TAG
the      the      +DET
girl     girl     +NOUN
kissed   kiss     +VPAST
baby     baby     +NOUN
on       on       +PREP
cheek    cheek    +NOUN
Rule-Based Tagging
Basic Idea:
- Assign all possible tags to words
- Remove tags according to a set of rules of the type:
  if word+1 is an adj, adv, or quantifier
  and the following is a sentence boundary
  and word-1 is not a verb like “consider”
  then eliminate non-adv tags
  else eliminate the adv tag
Typically more than 1000 hand-written rules, but may be machine-learned.
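The two steps of rule-based tagging can be sketched as follows. The mini-lexicon, the tag names, and the function names are hypothetical, and the sketch assumes that the end of the token list is the sentence boundary and that the rule fires on any word whose candidate tags include ADV alongside other tags (here, the word "that"). It is meant as an illustration of tag elimination, not as the rule set of any particular tagger.

```python
from typing import List, Set

# Hypothetical mini-lexicon: every possible tag for each word.
LEXICON = {
    "it": {"PRON"},
    "is": {"V"},
    "not": {"ADV"},
    "i": {"PRON"},
    "consider": {"V"},
    "that": {"ADV", "DET", "PRON", "COMP"},
    "odd": {"ADJ"},
}
CONSIDER_LIKE_VERBS = {"consider"}      # assumption: verbs that block the adverb reading
MODIFIER_TAGS = {"ADJ", "ADV", "QUANT"}


def assign_all_tags(tokens: List[str]) -> List[Set[str]]:
    """Step 1: give every word all of its possible tags."""
    return [set(LEXICON[token]) for token in tokens]


def apply_adverb_rule(tokens: List[str], tags: List[Set[str]]) -> List[Set[str]]:
    """Step 2: eliminate tags from words that are ambiguous between ADV and other tags."""
    result = [set(t) for t in tags]
    for i, _ in enumerate(tokens):
        if "ADV" not in result[i] or len(result[i]) == 1:
            continue  # nothing to disambiguate
        next_is_modifier = i + 1 < len(tokens) and bool(result[i + 1] & MODIFIER_TAGS)
        boundary_follows = i + 2 == len(tokens)  # word+2 is the sentence boundary
        prev_allows = i == 0 or tokens[i - 1] not in CONSIDER_LIKE_VERBS
        if next_is_modifier and boundary_follows and prev_allows:
            result[i] = {"ADV"}                  # eliminate non-adv tags
        else:
            result[i] = result[i] - {"ADV"}      # eliminate the adv tag
    return result


def tag(tokens: List[str]) -> List[Set[str]]:
    return apply_adverb_rule(tokens, assign_all_tags(tokens))
```

For the tokens `["it", "is", "not", "that", "odd"]` the rule keeps only ADV for "that", while for `["i", "consider", "that", "odd"]` it removes ADV and leaves the remaining candidates for other rules to resolve.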
Stochastic Tagging
- Based on the probability of a certain tag occurring, given the various possibilities
- Requires a training corpus
- No probabilities for words not in the corpus
- The training corpus may be different from the test corpus
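A minimal sketch of the idea, assuming the simplest possible model: the probability of a tag given a word is estimated from relative frequencies in a tiny, invented training corpus. Practical stochastic taggers also condition on neighboring tags; that is left out here. The corpus and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

TaggedSentence = List[Tuple[str, str]]

# Invented training corpus (assumption), already tagged by hand.
TRAINING_CORPUS: List[TaggedSentence] = [
    [("the", "DET"), ("girl", "N"), ("kissed", "V"), ("the", "DET"), ("baby", "N")],
    [("the", "DET"), ("kiss", "N"), ("surprised", "V"), ("the", "DET"), ("girl", "N")],
    [("girls", "N"), ("kiss", "V"), ("babies", "N")],
    [("they", "PRON"), ("kiss", "V"), ("the", "DET"), ("baby", "N")],
]


def train(corpus: List[TaggedSentence]) -> Dict[str, Counter]:
    """Count how often each word occurs with each tag."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word][tag] += 1
    return counts


def tag_probability(counts: Dict[str, Counter], word: str, tag: str) -> Optional[float]:
    """Relative-frequency estimate of P(tag | word); None for unseen words."""
    if word not in counts:
        return None  # no probabilities for words not in the corpus
    return counts[word][tag] / sum(counts[word].values())


def most_probable_tag(counts: Dict[str, Counter], word: str) -> Optional[str]:
    """Pick the tag seen most often with this word in training."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]


model = train(TRAINING_CORPUS)
```

In this corpus "kiss" is tagged V twice and N once, so the model prefers V for it, while a word such as "cheek" that never appears in training gets no estimate at all. This is the unknown-word problem noted above.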
Transformation-Based Tagging (Brill Tagging)
Combination of rule-based and stochastic tagging methodologies:
- Like rule-based tagging, because rules are used to specify tags in a certain environment
- Like the stochastic approach, because machine learning is used, with a tagged corpus as input
Input:
- tagged corpus
- dictionary (with most frequent tags), usually constructed from the tagged corpus
Basic Idea:
- Set the most probable tag for each word as a start value
- Change tags according to rules of the type “if word-1 is a determiner and word is a verb then change the tag to noun”, applied in a specific order
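The basic idea can be illustrated with a short sketch. The dictionary of most frequent tags and the rule encoding `(from_tag, to_tag, previous_tag)` are assumptions made for this example; the single rule is the one quoted above.

```python
from typing import List, Tuple

# Hypothetical dictionary of most frequent tags (assumption).
MOST_FREQUENT_TAG = {"the": "DET", "girl": "N", "kiss": "V", "surprised": "V"}

# A rule is (from_tag, to_tag, previous_tag):
# "if word-1 has previous_tag and word has from_tag, change the tag to to_tag".
TransformationRule = Tuple[str, str, str]
ORDERED_RULES: List[TransformationRule] = [("V", "N", "DET")]


def initial_tags(tokens: List[str]) -> List[str]:
    """Start value: the most probable tag for each word."""
    return [MOST_FREQUENT_TAG[token] for token in tokens]


def apply_rules(tags: List[str], rules: List[TransformationRule]) -> List[str]:
    """Apply each rule in order, scanning the sentence left to right."""
    tags = list(tags)
    for from_tag, to_tag, previous_tag in rules:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == previous_tag:
                tags[i] = to_tag
    return tags


tokens = ["the", "kiss", "surprised", "the", "girl"]
start = initial_tags(tokens)
final = apply_rules(start, ORDERED_RULES)
```

Here "kiss" starts out with its most frequent tag V, and the rule corrects it to N because the preceding word is a determiner.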
Transformation-Based Tagging (Brill Tagging)
Training is done on a tagged corpus:
1- Write a set of rule templates
2- Among the set of rules, find the one with the highest score
3- Continue from 2 until the lowest score threshold is passed
4- Keep the ordered set of rules
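The training loop can be sketched as a greedy search. Several things here are assumptions rather than details given above: there is a single rule template ("change tag A to B when the previous tag is C"), the score of a rule is the net number of tagging errors it removes, and the corpus is treated as one flat tag sequence. The function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

TransformationRule = Tuple[str, str, str]  # (from_tag, to_tag, previous_tag)


def apply_rule(tags: Sequence[str], rule: TransformationRule) -> List[str]:
    """Apply one rule left to right over a tag sequence."""
    from_tag, to_tag, previous_tag = rule
    out = list(tags)
    for i in range(1, len(out)):
        if out[i] == from_tag and out[i - 1] == previous_tag:
            out[i] = to_tag
    return out


def count_errors(tags: Sequence[str], gold: Sequence[str]) -> int:
    return sum(t != g for t, g in zip(tags, gold))


def learn_rules(start: Sequence[str], gold: Sequence[str],
                tagset: Sequence[str], min_score: int = 1) -> List[TransformationRule]:
    """Greedily pick the best-scoring rule until no rule reaches min_score."""
    current = list(start)
    learned: List[TransformationRule] = []
    while True:
        best_rule, best_score = None, 0
        # Step 1: instantiate the template with every tag combination.
        for rule in product(tagset, repeat=3):
            if rule[0] == rule[1]:
                continue  # a rule that changes nothing
            # Step 2: score = errors before minus errors after applying the rule.
            score = count_errors(current, gold) - count_errors(apply_rule(current, rule), gold)
            if score > best_score:
                best_rule, best_score = rule, score
        # Step 3: stop once the best score falls below the threshold.
        if best_rule is None or best_score < min_score:
            break
        learned.append(best_rule)            # Step 4: keep rules in the order learned
        current = apply_rule(current, best_rule)
    return learned


start_tags = ["DET", "V", "V", "DET", "N"]   # most-frequent-tag baseline
gold_tags = ["DET", "N", "V", "DET", "N"]    # hand-corrected tags
rules = learn_rules(start_tags, gold_tags, ("DET", "N", "V"))
```

On this tiny example the only rule that reduces the error count is "change V to N after DET", so it is the single rule learned before the loop stops.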