1 CSC 594 Topics in AI – Applied Natural Language Processing Fall 2009/2010 2. Some Basic Concepts: Morphology.

Presentation transcript:

1 CSC 594 Topics in AI – Applied Natural Language Processing Fall 2009/2010 2. Some Basic Concepts: Morphology

2 Levels of Language Analysis
1. Phonology: study of the sound systems of languages
2. Morphology: study of the structure of words in a language, including patterns of inflections and derivations
3. Syntax: study of the organization of words in sentences, i.e. the ordering of and relationship between the words in phrases and sentences
4. Semantics: study of meaning in language and of how that meaning is created
5. Pragmatics: study of language in use, i.e. the branch of linguistics that studies language use rather than language structure
6. Discourse: study of language, especially the type of language used in a particular context or subject
7. World Knowledge

3 Some English phonemes Source: IPA chart for English

4 2. Morphology
The study of how words are composed of morphemes (the smallest meaning-bearing units of a language)
Two broad classes of morphemes:
– Stems: the “main” morpheme of the word, supplying meaning
– Affixes: bits and pieces that combine with stems to modify their meanings and grammatical functions (prefixes, suffixes, circumfixes, infixes)
Examples: un-like, try-ing
Multiple affixes: un-read-able
Source: Joyce Choi, CSE 842, Michigan State University

5 Ways to Form Words
Inflection: new forms of the same word (usually in the same class)
– Tense, number, mood, voice marking in verbs
– Number, gender marking in nominals
– Comparison of adjectives
Derivation: yields different words, usually in a different class
– Deverbal nominals
– Denominal adjectives and verbs
Compounding: new words out of two or more other words
– Noun-noun compounding (e.g., doghouse)
Cliticization: combines a word with a clitic (which acts syntactically like a word but appears in a reduced form, e.g., I’ve)
Source: Joyce Choi, CSE 842, Michigan State University

6 English Inflectional Morphology
Word stem combines with grammatical morpheme
– Usually produces word of same class
– Usually serves a grammatical role that the stem could not (e.g. agreement)
like -> likes or liked
bird -> birds
Nouns have a simple inflectional morphology: markers for plural and markers for possessives
Verbs are slightly more complex:
Source: Joyce Choi, CSE 842, Michigan State University

7 Nominal Inflection
Nominal morphology
– Plural forms: -s or -es
    Irregular forms, e.g., goose/geese, mouse/mice
– Possessives: children’s
Source: Joyce Choi, CSE 842, Michigan State University

8 Verbal Inflection
Main verbs (walk, like) are relatively regular
– -s, -ing, -ed
– And productive with -ed: instant-messaged, faxed
– But eat/ate/eaten, catch/caught/caught
Primary (be, have, do) and modal verbs (can, will, must) are often irregular and not productive
– Be: am/is/are/were/was/been/being
Irregular verbs are few (~250) but frequently occurring
English verbal inflection is much simpler than, e.g., Latin
Source: Joyce Choi, CSE 842, Michigan State University

9

10 English Derivational Morphology
Word stem combines with grammatical morpheme
– Usually produces word of different class
– More complicated than inflectional
Example: nominalization
– -ize verbs -> -ation nouns
– generalize, realize -> generalization, realization
Example: verbs, nouns -> adjectives
– embrace, pity -> embraceable, pitiable
– care, wit -> careless, witless
Source: Joyce Choi, CSE 842, Michigan State University

11 Example: adjective -> adverb
– happy -> happily
More complicated to model than inflection
– Less productive: *science-less, *concern-less, *go-able, *sleep-able
– Meanings of derived terms harder to predict by rule
Source: Joyce Choi, CSE 842, Michigan State University
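
The derivational examples on slides 10 and 11 can be sketched as naive string-rewrite rules. This is only an illustration: the function names are hypothetical, the rules cover just the slide's examples, and nothing here checks whether the output is a real English word, which is exactly why derivation is harder to model than inflection.

```python
from typing import Optional


def nominalize(verb: str) -> Optional[str]:
    """-ize verb -> -ization noun (generalize -> generalization)."""
    if verb.endswith("ize"):
        return verb[:-1] + "ation"  # drop the final 'e', then add 'ation'
    return None  # the rule does not apply to this word


def adverbialize(adjective: str) -> str:
    """adjective -> adverb, with the y -> i spelling change (happy -> happily)."""
    if adjective.endswith("y"):
        return adjective[:-1] + "ily"
    return adjective + "ly"


def add_less(noun: str) -> str:
    """noun -> adjective in -less (care -> careless); overgenerates freely."""
    return noun + "less"


print(nominalize("generalize"))  # generalization
print(adverbialize("happy"))     # happily
print(add_less("care"))          # careless
print(add_less("science"))       # scienceless, one of the starred forms on slide 11
```

The last call shows the problem: a purely mechanical rule produces *scienceless just as readily as careless, so a realistic system would need a lexicon or other constraints on top of rules like these.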

12 Morphological Parsing
Parsing: taking a surface input and identifying its components and underlying structure
Morphological parsing: parsing a word into stem (actually the base/infinitive form) and affixes, and identifying the parts and their relationships
– Base form and features:
goose -> goose +N +SG or goose +V
geese -> goose +N +PL
gooses -> goose +V +3SG
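
As a minimal sketch of what a morphological parser returns, the goose examples can be written as a lookup table from surface form to analyses. The table and the `parse` function are hypothetical. A real parser would compute these analyses from a lexicon and rules rather than list them by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical toy table: surface form -> list of (base form, feature tags).
ANALYSES: Dict[str, List[Tuple[str, List[str]]]] = {
    "goose": [("goose", ["+N", "+SG"]), ("goose", ["+V"])],
    "geese": [("goose", ["+N", "+PL"])],
    "gooses": [("goose", ["+V", "+3SG"])],
}


def parse(surface: str) -> List[str]:
    """Return every analysis of a surface form as 'base +TAG +TAG' strings."""
    entries = ANALYSES.get(surface.lower(), [])
    return [" ".join([base] + tags) for base, tags in entries]


print(parse("goose"))   # ['goose +N +SG', 'goose +V']
print(parse("geese"))   # ['goose +N +PL']
print(parse("gooses"))  # ['goose +V +3SG']
```

Note that `parse("goose")` returns two analyses: the parser's job is to list the possibilities, and choosing between noun and verb is left to later processing.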

13 Inflectional Morphology
Adds:
– tense, number, person, mood, aspect
Word class doesn’t change
Word serves new grammatical role
Examples
– come is inflected for person and number: The pizza guy comes at noon.
– las and rojas are inflected for agreement with manzanas in grammatical gender by -a and in number by -s: las manzanas rojas (‘the red apples’)
Source: Marti Hearst, i256, at UC Berkeley

14 Morphological Analysis Tools
Porter stemmer
– A simple approach: just hack off the end of the word! Does NOT convert a word to its base form!!!
– Frequently used in Information Retrieval, but results are pretty ugly!
Original *****************************
Rudolph Agnew, 55 years old and former chairman of Consolidated Gold Fields PLC, was named a nonexecutive director of this British industrial conglomerate. A form of asbestos once used to make Kent cigarette filters has caused a high percentage of cancer deaths among a group of workers exposed to it more than 30 years ago,
Results *******************************
Rudolph Agnew, 55 year old and former chairman of Consolid Gold Field PLC, wa name a nonexecut director of thi British industri conglomer. A form of asbesto onc use to make Kent cigarett filter ha caus a high percentag of cancer death among a group of worker expos to it more than 30 year ago,
Source: Marti Hearst, i256, at UC Berkeley
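
To see where outputs such as "wa" and "thi" come from, here is a sketch of just the first rule group of the Porter algorithm (step 1a, which rewrites plural-like endings). It is not the full stemmer: the later steps, which produce forms like "Consolid" and "nonexecut", are omitted. Skipping words of one or two letters follows common implementations and is an assumption here.

```python
def porter_step_1a(word: str) -> str:
    """Apply only step 1a of the Porter stemmer: SSES->SS, IES->I, SS->SS, S->''."""
    if len(word) <= 2:            # assumption: very short words are left alone
        return word
    if word.endswith("sses"):
        return word[:-2]          # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]          # ponies -> poni
    if word.endswith("ss"):
        return word               # caress -> caress
    if word.endswith("s"):
        return word[:-1]          # cats -> cat, but also was -> wa
    return word


words = ["years", "Fields", "was", "this", "asbestos", "has"]
print([porter_step_1a(w) for w in words])
# ['year', 'Field', 'wa', 'thi', 'asbesto', 'ha']
```

The rule strips a final "s" with no check that the result is a word, which is why function words and singular nouns ending in "s" are mangled along with genuine plurals.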

15 Morphological Analysis Tools
WordNet’s morphy()
– A slightly more sophisticated approach
– Uses an understanding of inflectional morphology
    Uses a set of Rules of Detachment
    Uses an Exception List for irregulars
    Handles collocations in a special way
– Do the transformation, compare the result to the WordNet dictionary
– If the transformation produces a real word, then keep it, else use the original word.
– For more details, see
Source: Marti Hearst, i256, at UC Berkeley
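
The procedure described on this slide can be sketched in a few lines. This is a toy reimplementation under stated assumptions, not WordNet's code: `toy_morphy`, the tiny `LEXICON` and the `EXCEPTIONS` entries are hypothetical stand-ins for the WordNet dictionary and exception files, collocation handling is left out, and the suffix pairs are written from memory of WordNet's documented detachment rules for nouns and verbs.

```python
# Detachment rules as (suffix, replacement) pairs, tried in order.
RULES = {
    "noun": [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
             ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")],
    "verb": [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
             ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
}

# Hypothetical toy data standing in for the exception list and the dictionary.
EXCEPTIONS = {
    "noun": {"corpora": "corpus", "geese": "goose"},
    "verb": {"running": "run", "ate": "eat"},
}
LEXICON = {
    "noun": {"dog", "corpus", "goose", "box", "pony"},
    "verb": {"run", "eat", "walk", "like", "goose"},
}


def toy_morphy(word: str, pos: str = "noun") -> str:
    """Return a base form: exception list first, then rules checked against the lexicon."""
    if word in EXCEPTIONS[pos]:
        return EXCEPTIONS[pos][word]
    if word in LEXICON[pos]:
        return word
    for suffix, replacement in RULES[pos]:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in LEXICON[pos]:   # keep only real words
                return candidate
    return word                             # otherwise fall back to the original


print(toy_morphy("dogs"))                 # dog
print(toy_morphy("boxes"))                # box
print(toy_morphy("corpora"))              # corpus
print(toy_morphy("running", pos="verb"))  # run
print(toy_morphy("blorfs"))               # blorfs
```

The dictionary check is what separates this from plain suffix stripping: "boxes" is not reduced to "boxe", because that candidate is rejected and a later rule yields "box".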

16 Some morphy() output
>>> wntools.morphy('dogs')
'dog'
>>> wntools.morphy('running', pos='verb')
'run'
>>> wntools.morphy('corpora')
'corpus'
>>>
Source: Marti Hearst, i256, at UC Berkeley

17 Morphological Analysis Tools
Very sophisticated programs have been developed
Use a technique called Two-Level Phonology
– Has been applied to numerous languages
Best known: PCKimmo
– After Kimmo Koskenniemi, based in part on work by Lauri Karttunen in 1983
– Uses:
    A rules file, which specifies the alphabet and the phonological (or spelling) rules
    A lexicon file, which lists lexical items and encodes morphotactic constraints
– Commercial versions are available
– inXight’s LinguistX version based on technology developed by Kaplan and others from Xerox PARC (or at least used to be)
Source: Marti Hearst, i256, at UC Berkeley
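
The division of labour between a rules file and a lexicon file can be illustrated with a toy analyzer. This is not PCKimmo's file format or a real two-level rule compiler: the stems, the suffix table, the `^` boundary symbol and the single e-insertion spelling rule are all assumptions chosen for the sketch.

```python
import re
from typing import List

# "Lexicon": stems with a word class (hypothetical entries).
STEMS = {"fox": "N", "cat": "N", "church": "N", "walk": "V", "kiss": "V"}

# Morphotactic constraints: which suffix may attach to which class, and its tags.
SUFFIXES = {
    "s": {"N": "+N +PL", "V": "+V +3SG"},
    "ed": {"V": "+V +PAST"},
}


def to_surface(stem: str, suffix: str) -> str:
    """'Rules': apply e-insertion (s, x, z, ch, sh + ^s -> ...es), then drop the boundary."""
    lexical = stem + "^" + suffix
    lexical = re.sub(r"(s|x|z|ch|sh)\^s$", r"\1es", lexical)
    return lexical.replace("^", "")


def analyze(surface: str) -> List[str]:
    """Recognize a surface form by generating every licensed stem+suffix combination."""
    results = []
    for stem, word_class in STEMS.items():
        for suffix, tags_by_class in SUFFIXES.items():
            if word_class in tags_by_class and to_surface(stem, suffix) == surface:
                results.append(f"{stem} {tags_by_class[word_class]}")
    return results


print(to_surface("fox", "s"))  # foxes
print(analyze("foxes"))        # ['fox +N +PL']
print(analyze("kisses"))       # ['kiss +V +3SG']
print(analyze("foxs"))         # []
```

Recognition by exhaustive generation only works for a lexicon this small. Systems of the kind named on the slide compile rules and lexicon into finite-state machinery so that analysis does not require enumerating every form.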