Morphological Analysis Chapter 3

Morphology
Morpheme = "minimal meaning-bearing unit in a language"
Morphology handles the formation of words by using morphemes:
– base form (stem, lemma), e.g., believe
– affixes (suffixes, prefixes, infixes), e.g., un-, -able, -ly
Morphological parsing = the task of recognizing the morphemes inside a word
– e.g., hands, foxes, children
Important for many tasks
– machine translation, information retrieval, etc.
– parsing, text simplification, etc.
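To make the parser's target output concrete, here is a minimal, purely illustrative sketch (Python; the lookup table is hand-written for the slide's three example words, using the +N/+PL tag notation that appears later in the deck — a real analyzer derives such analyses from a lexicon and rules rather than listing them):

# Hand-written analyses for the slide's example words, purely illustrative.
ANALYSES = {
    "hands":    "hand +N +PL",   # regular plural: stem + -s
    "foxes":    "fox +N +PL",    # regular plural with e-insertion
    "children": "child +N +PL",  # irregular plural
}

def parse(word):
    """Return the morphological analysis of a surface word (toy lookup)."""
    return ANALYSES.get(word, "unknown")

for w in ("hands", "foxes", "children"):
    print(w, "->", parse(w))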

Morphemes and Words
Combine morphemes to create words:
– Inflection: combination of a word stem with a grammatical morpheme; same word class, e.g. clean (verb), clean-ing (verb)
– Derivation: combination of a word stem with a grammatical morpheme; yields a different word class, e.g. delight (verb), delight-ful (adjective)
– Compounding: combination of multiple word stems, e.g. doghouse
– Cliticization: combination of a word stem with a clitic; different words from different syntactic categories, e.g. I've = I + have

Inflectional Morphology
word stem + grammatical morpheme, e.g. cat + s
Only for nouns, verbs, and some adjectives.
Nouns
– plural:
  regular: +s, +es
  irregular: mouse - mice; ox - oxen
  many spelling rules, e.g. -y -> -ies, like butterfly - butterflies
– possessive: +'s, +'
Verbs
– main verbs (sleep, eat, walk)
– modal verbs (can, will, should)
– primary verbs (be, have, do)
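The regular patterns and spelling rules above are simple enough to sketch directly; the following toy pluralizer (Python, not from the slides, with a deliberately tiny irregular list) illustrates them:

# Toy English noun pluralizer: irregular lookup first, then the
# -y -> -ies rule, then +es after sibilants, otherwise +s.
IRREGULAR_PLURALS = {"mouse": "mice", "ox": "oxen", "child": "children"}

def pluralize(noun):
    if noun in IRREGULAR_PLURALS:                        # irregulars listed explicitly
        return IRREGULAR_PLURALS[noun]
    if noun.endswith("y") and noun[-2:-1] not in "aeiou":
        return noun[:-1] + "ies"                         # butterfly -> butterflies
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"                               # fox -> foxes
    return noun + "s"                                    # cat -> cats

print([pluralize(n) for n in ("cat", "butterfly", "fox", "mouse", "ox")])
# ['cats', 'butterflies', 'foxes', 'mice', 'oxen']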

Inflectional Morphology (verbs)
Verb inflections for main verbs (sleep, eat, walk) and primary verbs (be, have, do):

Regularly inflected forms
  stem:               walk      merge     try       map
  -s form:            walks     merges    tries     maps
  -ing participle:    walking   merging   trying    mapping
  past / -ed part.:   walked    merged    tried     mapped

Irregularly inflected forms
  stem:               eat       catch     cut
  -s form:            eats      catches   cuts
  -ing participle:    eating    catching  cutting
  -ed past:           ate       caught    cut
  -ed participle:     eaten     caught    cut

Inflectional Morphology (nouns)
Noun inflections for regular nouns (cat, hand) and irregular nouns (child, ox):

Regularly inflected forms
  stem:          cat     hand
  plural form:   cats    hands

Irregularly inflected forms
  stem:          child      ox
  plural form:   children   oxen

Inflectional and Derivational Morphology (adjectives)
Adjective inflections and derivations:
  prefix   un-            unhappy              adjective, negation
  suffix   -ly            happily              adverb, manner
  suffix   -ier, -iest    happier, happiest    comparative, superlative
  suffix   -ness          happiness            noun
plus combinations, like unhappiest, unhappiness.
Distinguish different adjective classes, which can or cannot take certain inflectional or derivational forms, e.g. no negation for big.

Derivational Morphology (nouns)

Derivational Morphology (adjectives)

Verb Clitics

Morphology and FSAs
We'd like to use the machinery provided by FSAs to capture these facts about morphology.
Recognition:
– accept strings that are in the language
– reject strings that are not
– in a way that doesn't require us to, in effect, list all the words in the language

Computational Lexicons
Depending on the purpose, computational lexicons have various types of information:
– between FrameNet and WordNet, we saw POS, word sense, subcategorization, semantic roles, and lexical semantic relations
– for our purposes now, we care about stems, irregular forms, and information about affixes

Starting Simply
Let's start simply:
– regular singular nouns are listed explicitly in the lexicon
– regular plural nouns have an -s on the end
– irregulars are listed explicitly too

Simple Rules

Now Plug in the Words
Recognition of valid words.
But "foxs" isn't right; we'll see how to fix that.
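A minimal sketch of this naive set-up (hypothetical Python, not from the slides): regular stems may take an -s, and irregular forms are listed as-is. Exactly as the slide warns, it accepts the misspelled "foxs" (and rejects "foxes"); the spelling rules handled later fix this.

# Naive noun recognizer: regular stems optionally take a plural -s,
# irregular singulars and plurals are listed explicitly.
REG_NOUN = {"cat", "hand", "fox"}
IRREG_SG = {"goose", "mouse", "child"}
IRREG_PL = {"geese", "mice", "children"}

def recognize(word):
    if word in REG_NOUN or word in IRREG_SG or word in IRREG_PL:
        return True
    if word.endswith("s") and word[:-1] in REG_NOUN:     # reg-noun + plural -s
        return True
    return False

for w in ("cat", "cats", "geese", "foxs", "foxes"):
    print(w, recognize(w))
# cat True, cats True, geese True, foxs True (wrong!), foxes False (also wrong)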

Parsing/Generation vs. Recognition
We can now run strings through these machines to recognize strings in the language.
But recognition is usually not quite what we need:
– often, if we find some string in the language, we might like to assign a structure to it (parsing)
– or we might have some structure and want to produce a surface form for it (production/generation)
Example: from "cats" to "cat +N +PL"

Finite State Transducers
– Add another tape
– Add extra symbols to the transitions
On one tape we read "cats"; on the other we write "cat +N +PL".

FSTs

Applications
The kind of parsing we're talking about is normally called morphological analysis. It can either be:
– an important stand-alone component of many applications (spelling correction, information retrieval), or
– simply a link in a chain of further linguistic analysis.

Transitions
– c:c means read a c on one tape and write a c on the other
– +N:ε means read a +N symbol on one tape and write nothing on the other
– +PL:s means read +PL and write an s
Arc labels for "cat +N +PL" / "cats":  c:c  a:a  t:t  +N:ε  +PL:s
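A sketch of that path as data (hypothetical Python; the pairs are exactly the arc labels above, with "" standing in for ε, rather than an explicit state machine):

# Each arc label is (lexical symbol, surface symbol); "" plays the role of ε.
CAT_ARCS = [("c", "c"), ("a", "a"), ("t", "t"), ("+N", ""), ("+PL", "s")]

def lexical(arcs):
    """Concatenate the lexical side, space-separating the +N/+PL tags."""
    return "".join(l if not l.startswith("+") else " " + l for l, _ in arcs)

def surface(arcs):
    """Concatenate the surface side of the path."""
    return "".join(s for _, s in arcs)

print(lexical(CAT_ARCS))   # cat +N +PL
print(surface(CAT_ARCS))   # cats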

Ambiguity
Recall that in non-deterministic recognition, multiple paths through a machine may lead to an accept state; it didn't matter which path was actually traversed.
In FSTs the path to an accept state does matter, since different paths represent different parses, and different outputs will result.

Ambiguity
What's the right parse (segmentation) for "unionizable"?
– union-ize-able
– un-ion-ize-able
Each represents a valid path through the morphology machine.

Ambiguity
There are a number of ways to deal with this problem:
– simply take the first output found
– find all the possible outputs (all paths) and return them all (without choosing)
– bias the search so that only one or a few likely paths are explored
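A sketch of the "return all paths" option (hypothetical Python): exhaustively segment a word into morphemes from a tiny hand-picked lexicon. To keep the sketch to plain string concatenation it segments the pre-spelling-rule form "unionizeable" (the e-deletion in ize + able -> izable is exactly the kind of spelling change handled later); both analyses from the previous slide come out.

# Enumerate every way to split a word into morphemes from a toy lexicon,
# i.e. all accepting paths, returned without choosing among them.
MORPHEMES = {"un", "ion", "union", "ize", "able"}

def segmentations(word):
    if word == "":
        return [[]]
    results = []
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        if prefix in MORPHEMES:
            for rest in segmentations(word[i:]):
                results.append([prefix] + rest)
    return results

print(segmentations("unionizeable"))
# [['un', 'ion', 'ize', 'able'], ['union', 'ize', 'able']]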

The Gory Details
Of course, it's not as easy as mapping "cat +N +PL" to "cats".
As we saw earlier, there are geese, mice and oxen.
But there is also a whole host of spelling/pronunciation changes that go along with inflectional changes: fox and foxes vs. cat and cats.

Multi-Tape Machines
To deal with these complications, we will add more tapes and use the output of one tape machine as the input to the next.
So to handle irregular spelling changes we'll add intermediate tapes with intermediate symbols.

Multi-Level Tape Machines
We use one machine to transduce between the lexical and the intermediate level, and another to handle the spelling changes down to the surface tape.

Intermediate to Surface
The "add an e" rule, as in fox^s# --> foxes#.
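A sketch of this intermediate-to-surface step (hypothetical Python, using a regular expression in place of a hand-built transducer): insert an e between a sibilant-final stem and the plural s at the morpheme boundary ^, then erase the boundary markers.

import re

def intermediate_to_surface(form):
    """Apply the e-insertion spelling rule, then remove the ^ and # markers.

    E.g. 'fox^s#' -> 'foxes', 'cat^s#' -> 'cats'.
    """
    # e-insertion: stem ends in a sibilant (s, x, z, ch, sh) and is
    # followed by the plural s, e.g. fox^s# -> fox^es#
    form = re.sub(r"(s|x|z|ch|sh)\^s#", r"\1^es#", form)
    # erase the morpheme boundary ^ and the word boundary #
    return form.replace("^", "").replace("#", "")

print(intermediate_to_surface("fox^s#"))   # foxes
print(intermediate_to_surface("cat^s#"))   # cats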

Lexical to Intermediate Level

Foxes

Notes
– The transducers may be run in the other direction too (examples in lecture).
– The transducers are cascaded: the output of one layer serves as the input to the next.
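A sketch of the cascade in the generation direction (hypothetical Python): a lexical-to-intermediate step expands the +N +PL tags into stem ^ s #, and the intermediate_to_surface step from the earlier sketch then applies the spelling rule. In a real FST cascade the same machines are invertible, which is what running them "in the other direction" (analysis) refers to.

def lexical_to_intermediate(lexical_form):
    """'fox +N +PL' -> 'fox^s#' (toy version: singular/plural nouns only)."""
    stem, *tags = lexical_form.split()
    if tags == ["+N", "+PL"]:
        return stem + "^s#"
    if tags == ["+N", "+SG"]:
        return stem + "#"
    raise ValueError("unsupported tag sequence: %r" % (tags,))

def generate(lexical_form):
    """Cascade the two steps: lexical -> intermediate -> surface."""
    intermediate = lexical_to_intermediate(lexical_form)
    return intermediate_to_surface(intermediate)   # from the earlier sketch

print(generate("fox +N +PL"))   # foxes
print(generate("cat +N +PL"))   # cats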

Overall Scheme
We aren't covering the overall scheme in any more detail than this.