Computational Morphology

Morphology (S. Ananiadou)

Outline
What is morphology?
– Word structure
– Types of morphological operation
– Levels of affixation
Computational approaches to morphology
– Finite-state transducers
– Two-level morphology
– Koskenniemi’s rule formalism

References
L. Bauer (1988) Introducing Linguistic Morphology, EUP.
A. Spencer (1991) Morphological Theory, Blackwell.
D. Jurafsky & J. Martin (2000) Speech and Language Processing, Chapter 3.
K. Koskenniemi & K. Church (1988) “Complexity, two-level morphology, and Finnish”, in COLING-88, Budapest.
S. Ananiadou & J. McNaught (1986) A Review of Two-level Morphology. Eurotra Research Paper, September 1986.

What is morphology?
Morphology is the study of the way words are built up from smaller meaning-bearing units, morphemes.
– ‘antiintellectualism’: anti-, intellect, -al, -ism
Free and bound morphemes
– intellect (free)
– anti-, -ism, -al (bound)
Stems and affixes
– Complex words contain a central morpheme (the stem), which contributes the basic meaning, and a collection of other morphemes (affixes) that modify this meaning in different ways.

‘disagreements’
– agree (stem)
– dis-, -ment, -s (affixes): dis- is a prefix; -ment and -s are suffixes
English rarely stacks more than 4–5 affixes; Turkish, an agglutinative language, can stack as many as 10.
Two broad classes of ways to form words from morphemes: inflection and derivation.
– Inflection: the combination of a word stem with a grammatical morpheme, resulting in a word of the same class, e.g. cat + -s → cats, play + -ed → played
– Derivation: the combination of a word stem with a grammatical morpheme, usually resulting in a word of a different class, e.g. agree + -ment → agreement

English Inflectional Morphology
English nouns have two kinds of inflection: plural and possessive
– plural: cat/cats, ibis/ibises, finch/finches, box/boxes
– possessive: llama’s, children’s, llamas’
English verbal inflection is more complicated
– main verbs (eat, sleep)
– modal verbs (can, will, should)
– primary verbs (be, have, do) (see Quirk et al., Grammar of English Language)
– regular verbs (walk, walks, walking, walked)
– irregular verbs (eat, eats, eating, ate, eaten)

Derivational Morphology
Derivation typically changes syntactic category, e.g. nominalization: computerize → computerization

Suffix   Base (noun/verb/adjective)   Derived word (noun or adjective)
-ation   computerize                  computerization
-ee      appoint                      appointee
-er      kill                         killer
-ness    fuzzy                        fuzziness
-al      computation                  computational
-able    like                         likeable
-less    clue                         clueless

Derivation is less productive than inflection
Affixes attach to stems and to each other according to certain constraints.
Level ordering in derivation
In English we distinguish two types of affixation:
– class I affixation (+)
– class II affixation (#)
– class I affixation occurs before class II affixation
Class I: -ion, -ity, -ate, -ive, -ic, ...
Class II: -y, -ly, -like, -ful, -ness, -less, -hood, ...
Examples (the number in parentheses gives the affix class; * marks an ill-formed word):
danger-ous(1)-ness(2)
*fear-less(2)-ity(1)
*tender-ness(2)-ous(1)
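The level-ordering constraint can be checked mechanically. The following is a minimal sketch, assuming words arrive already segmented with hyphens and that the first piece is the stem; the function name `respects_level_ordering` is hypothetical, and "ous" is added to class I only because the slide's own example marks it that way. It checks affix order only, not whether the resulting word exists.

```python
# Affix classes as listed on the slide. "ous" is added because the example
# danger-ous(1)-ness(2) marks it as a class I affix.
CLASS_I = {"ion", "ity", "ate", "ive", "ic", "ous"}
CLASS_II = {"y", "ly", "like", "ful", "ness", "less", "hood"}


def respects_level_ordering(segmented: str) -> bool:
    """Return True if no class I suffix follows a class II suffix.

    `segmented` is a hyphen-segmented word whose first piece is the stem,
    e.g. "danger-ous-ness".
    """
    seen_class_ii = False
    for suffix in segmented.split("-")[1:]:
        if suffix in CLASS_II:
            seen_class_ii = True
        elif suffix in CLASS_I:
            if seen_class_ii:
                return False  # class I attached outside class II
        else:
            raise ValueError(f"unknown suffix: {suffix}")
    return True


print(respects_level_ordering("danger-ous-ness"))  # True
print(respects_level_ordering("fear-less-ity"))    # False
```

Suffixes of the same class are accepted in any order, so "fear-less-ness" passes the check.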

Members of the same class may appear in any order with respect to each other:
fear-less-ness
tender-ness-less
Ordering hypothesis about the order in which morphological processes occur (morphotactics):
– Class I affixation
– Class II affixation
– Inflection
– Compounding

Finite-State Morphological Parsing
Take an input like ‘cats’ and produce an output like ‘cat +N +PL’ (stem plus morphological features).
To build a morphological parser we need:
– a lexicon
– morphotactics
– orthographic rules (spelling rules), which model the changes that occur when two morphemes combine, e.g. city → cities
Topics:
– how to use FSAs to model morphotactic information
– FSTs as a way of modeling morphological features in the lexicon
– how to use FSTs to model orthographic rules

Lexicon and Morphotactics
A lexicon is a repository of words. Since we cannot list every word in the language, computational lexicons are structured as a list of stems and affixes together with a representation of the morphotactics.
One way to model morphotactics is the finite-state automaton:
q0 --reg-noun--> q1 --plural--> q2
q0 --irreg-pl-noun--> q2
q0 --irreg-sg-noun--> q2

Sub-lexicons for nouns

reg-noun    irreg-pl-noun   irreg-sg-noun   plural
fox         geese           goose           -s
cat         sheep           sheep
dog         mice            mouse
aardvark

Sub-lexicons for verbs

reg-verb-stem   irreg-verb-stem   irreg-past-verb   past   past-part   pres-part   3sg
walk            cut               caught            -ed    -ed         -ing        -s
fry             speak             ate
talk            sing              eaten
impeach         sang              spoken

English derivational morphology is more complex than inflectional morphology, so the automata that model it are correspondingly complex.

Morphotactics for English adjectives
– big, bigger, biggest
– cool, cooler, coolest, coolly
– red, redder, reddest
– clear, clearer, clearest, clearly, unclear, unclearly
– happy, happier, happiest, happily, unhappy, unhappier, unhappiest, unhappily
– real, unreal, really
We need to set up classes of roots and specify which can occur with which affixes:
– adj-root1 includes adjectives that can occur with un- and -ly (clear, happy, real)
– adj-root2 includes adjectives that cannot (big, cool, red)

An FSA for a fragment of English adjective morphology
[Figure: FSA with states q0 to q5 and arcs labelled un-, ε, adj-root1 (twice), adj-root2, “-er, -est”, and “-er, -ly, -est”]
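A fragment like this can be written as a transition table over morphemes. The sketch below is an illustration under stated assumptions, not the automaton from the figure: the state names ("start", "prefixed", "root1", "root2", "end") and the exact arc layout are hypothetical, chosen so that adj-root1 may take un- and one of -er, -ly, -est, while adj-root2 takes only -er or -est. Spelling changes (big + -er → bigger) are ignored; the input is a list of morphemes.

```python
# Root classes from the slides.
ADJ_ROOT1 = {"clear", "happy", "real"}
ADJ_ROOT2 = {"big", "cool", "red"}

# Hypothetical arc layout; the figure's own arcs may differ.
TRANSITIONS = {
    ("start", "un-"): "prefixed",
    ("start", "adj-root1"): "root1",
    ("prefixed", "adj-root1"): "root1",
    ("start", "adj-root2"): "root2",
}
for suffix in ("-er", "-ly", "-est"):
    TRANSITIONS[("root1", suffix)] = "end"
for suffix in ("-er", "-est"):
    TRANSITIONS[("root2", suffix)] = "end"

FINAL = {"root1", "root2", "end"}


def arc_label(morpheme: str) -> str:
    """Map a root to its class label; affixes label their own arcs."""
    if morpheme in ADJ_ROOT1:
        return "adj-root1"
    if morpheme in ADJ_ROOT2:
        return "adj-root2"
    return morpheme


def accepts(morphemes: list) -> bool:
    """Run the FSA over a morpheme sequence."""
    state = "start"
    for morpheme in morphemes:
        state = TRANSITIONS.get((state, arc_label(morpheme)))
        if state is None:
            return False  # no arc: the automaton blocks
    return state in FINAL


print(accepts(["un-", "clear", "-ly"]))  # True
print(accepts(["un-", "big"]))           # False
```

Under this arc layout `accepts(["cool", "-ly"])` is False even though "coolly" appears in the word list, which shows how coarse the two root classes are.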

We can use FSAs to solve the problem of morphological recognition: determining whether an input string of letters makes up a legitimate English word or not.
– We do this by taking the morphotactic FSAs and plugging each sub-lexicon into the FSA.
– That is, we expand each arc (e.g. the reg-noun-stem arc) with all the morphemes that make up the set of reg-noun-stem.
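Expanding the arcs amounts to building a letter-level automaton from the sub-lexicons. The sketch below is one simple way to do this, assuming the noun sub-lexicons from the slides; it uses word prefixes as state names, which gives a trie-shaped deterministic automaton. The names `build_fsa` and `recognize` are hypothetical.

```python
REG_NOUN = {"fox", "cat", "dog", "aardvark"}
IRREG_PL_NOUN = {"geese", "sheep", "mice"}
IRREG_SG_NOUN = {"goose", "sheep", "mouse"}


def build_fsa():
    """Expand each class arc into letter arcs.

    States are named by the prefix read so far; "" is the start state.
    Returns (transitions, final_states).
    """
    transitions, finals = {}, set()

    def add_path(word):
        for i, letter in enumerate(word):
            transitions[(word[:i], letter)] = word[: i + 1]

    for stem in REG_NOUN:
        add_path(stem)
        finals.add(stem)          # bare regular stem (singular)
        add_path(stem + "s")      # the plural arc, expanded to the letter s
        finals.add(stem + "s")
    for form in IRREG_PL_NOUN | IRREG_SG_NOUN:
        add_path(form)            # irregular forms take no plural arc
        finals.add(form)
    return transitions, finals


def recognize(word, fsa):
    transitions, finals = fsa
    state = ""
    for letter in word:
        state = transitions.get((state, letter))
        if state is None:
            return False
    return state in finals


fsa = build_fsa()
print(recognize("cats", fsa))    # True
print(recognize("gooses", fsa))  # False
```

Because no orthographic rules are applied yet, this recognizer accepts "foxs" and rejects "foxes".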

Morphological parsing with FSTs
Given the input cats, we want the output cat +N +PL, telling us that cats is the plural of the noun cat.
We do this via two-level morphology (TLM):
– TLM represents a word as a correspondence between a lexical level, which represents a simple concatenation of the morphemes making up a word, and the surface level, which represents the actual spelling of the final word.
– Morphological parsing is implemented by building mapping rules that map letter sequences like cats on the surface level into morpheme and feature sequences like cat +N +PL on the lexical level.
– The automaton used for this mapping is the finite-state transducer (FST).

FST
An FST maps between two sets of symbols via a finite automaton.
We visualize an FST as a two-tape automaton that recognizes or generates pairs of strings.
An FST defines a relation between sets of strings; equivalently, it is a machine that reads one string and generates another.

lexical:  c a t +N +PL
surface:  c a t s

An FST accepts a language over pairs of symbols, as in:
Σ = { a:a, b:b, !:!, a:!, a:ε, ε:! }
For TLM we view an FST as having two tapes: the upper or lexical tape is composed of characters from the left side of the a:b pairs; the lower or surface tape is composed of characters from the right side of the a:b pairs.
Dictionary and text each consist of a sequence of items:
– items of the dictionary are expressed in an alphabet consisting of {a…z}, 0 (the empty character), + (the morpheme boundary character), and a set of archiphonemes, e.g. S for {s, z}
– items of text are expressed in a subset of this alphabet, {a…z, 0}
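The two-tape view can be made concrete by treating a path through the FST as a sequence of symbol pairs and reading off each side. This is a small sketch, assuming ε is represented by the empty string; the names `SIGMA`, `tapes` and `over_alphabet` are hypothetical.

```python
EPSILON = ""  # stands for the empty symbol ε

# The pair alphabet from the slide, as (upper, lower) tuples.
SIGMA = {
    ("a", "a"), ("b", "b"), ("!", "!"),
    ("a", "!"), ("a", EPSILON), (EPSILON, "!"),
}


def over_alphabet(pairs, sigma=SIGMA):
    """True if every pair in the sequence belongs to the pair alphabet."""
    return all(pair in sigma for pair in pairs)


def tapes(pairs):
    """Read off the upper (lexical) and lower (surface) tapes of a pair sequence."""
    upper = "".join(u for u, _ in pairs)
    lower = "".join(l for _, l in pairs)
    return upper, lower


path = [("b", "b"), ("a", "a"), ("a", "!"), (EPSILON, "!")]
print(over_alphabet(path))  # True
print(tapes(path))          # ('baa', 'ba!!')

# The cats example, with multi-character feature symbols on the upper tape.
print(tapes([("c", "c"), ("a", "a"), ("t", "t"), ("+N", EPSILON), ("+PL", "s")]))
# ('cat+N+PL', 'cats')
```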

We can build an FST morphological parser out of a morphotactic FSA and lexica by adding an extra lexical tape and the appropriate morphological features:
q0 --reg-noun-stem--> q1 --+N:ε--> q4 --+PL:^s#--> q7
q0 --irreg-sg-noun-f--> q2 --+N:ε--> q5 --+SG:#--> q7
q0 --irreg-pl-noun-f--> q3 --+N:ε--> q6 --+PL:#--> q7
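The effect of such a transducer can be imitated by enumerating its paths and pairing each lexical string with its intermediate string. The sketch below is a brute-force illustration rather than an incremental transducer. It rests on several assumptions: the pairing of irregular plurals with their stems is read off the rows of the noun table, a +SG:# arc for regular nouns is added so that bare stems can be parsed, and the boundary symbols ^ and # are simply deleted because no spelling rules are applied. The names `paths` and `parse` are hypothetical.

```python
REG_NOUN_STEMS = ["fox", "cat", "dog", "aardvark"]
IRREG_SG_NOUNS = ["goose", "sheep", "mouse"]
# surface plural -> stem (assumed pairing, read off the noun table rows)
IRREG_PL_NOUNS = {"geese": "goose", "sheep": "sheep", "mice": "mouse"}


def paths():
    """Yield (lexical, intermediate) string pairs, one per path."""
    for stem in REG_NOUN_STEMS:
        yield f"{stem} +N +SG", f"{stem}#"    # assumed regular singular arc
        yield f"{stem} +N +PL", f"{stem}^s#"
    for form in IRREG_SG_NOUNS:
        yield f"{form} +N +SG", f"{form}#"
    for plural, stem in IRREG_PL_NOUNS.items():
        yield f"{stem} +N +PL", f"{plural}#"


def parse(surface):
    """Return all lexical analyses whose intermediate form spells `surface`."""
    return sorted(
        lexical
        for lexical, intermediate in paths()
        if intermediate.replace("^", "").replace("#", "") == surface
    )


print(parse("cats"))   # ['cat +N +PL']
print(parse("geese"))  # ['goose +N +PL']
print(parse("sheep"))  # ['sheep +N +PL', 'sheep +N +SG']
```

With no orthographic rules in place, `parse("foxes")` returns an empty list.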

Koskenniemi’s work
In this model, all the FSTs treating individual phenomena operate in parallel:
– rule ordering and interactions between rules are therefore not a necessary part of the morphological description
– all FSTs share the same two heads but otherwise operate completely independently
– the heads move at the same time
– for an overall correspondence between the lexical and surface strings, the two heads must have reached the end of the two strings and all the FSTs must be in a final state
– when all FSTs agree, a correspondence is reached
– if even one FST blocks while scanning the two strings, the proposed correspondence is rejected

surface:  … t r i e s …
          FST1  FST2  FST3  …  FSTn   (in parallel)
lexical:  … t r Y + s …

text (surface tape):         d o g 0 s
                             FST
dictionary (lexical tape):   d o g + S

Sequence of mappings: <d,d> <o,o> <g,g> <0,+> <s,S>
The morpheme boundary + corresponds to nothing on the surface; the archiphoneme (grapheme) S corresponds to surface s.
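The parallel arrangement can be sketched as several small machines stepping in lockstep over the same sequence of pairs. In the sketch below the three machines are toy rules invented for illustration, not rules from Koskenniemi's description; pairs are written (lexical, surface), the reverse of the <surface,lexical> order in the diagram, and the two strings are assumed to be aligned one symbol to one symbol with 0 as the empty character.

```python
def identity_fst(state, pair):
    """Toy rule: ordinary lower-case letters must surface unchanged."""
    lexical, surface = pair
    if lexical.isalpha() and lexical.islower() and lexical != surface:
        return None  # None means this FST blocks
    return state


def boundary_fst(state, pair):
    """Toy rule: '+' must surface as '0', and a word may not end right after it."""
    lexical, surface = pair
    if lexical == "+":
        return "after+" if surface == "0" else None
    return "idle"


def archiphoneme_fst(state, pair):
    """Toy rule: the archiphoneme S must surface as s or z."""
    lexical, surface = pair
    if lexical == "S" and surface not in {"s", "z"}:
        return None
    return state


# Each entry: (transition function, start state, final states).
FSTS = [
    (identity_fst, "only", {"only"}),
    (boundary_fst, "idle", {"idle"}),
    (archiphoneme_fst, "only", {"only"}),
]


def corresponds(lexical, surface, fsts=FSTS):
    """Accept iff no FST blocks and all end in a final state."""
    if len(lexical) != len(surface):
        return False
    states = [start for _, start, _ in fsts]
    for pair in zip(lexical, surface):
        for i, (step, _, _) in enumerate(fsts):
            states[i] = step(states[i], pair)
            if states[i] is None:
                return False  # one blocking FST rejects the correspondence
    return all(s in finals for s, (_, _, finals) in zip(states, fsts))


print(corresponds("dog+S", "dog0s"))  # True
print(corresponds("dog+S", "dog0x"))  # False
```

`corresponds("dog+", "dog0")` is also rejected: no machine blocks, but the boundary machine does not end in a final state.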

Koskenniemi’s rule formalism
The general form of a rule is: CP op LC --- RC
– CP = correspondence part: a concrete or abstract character pair whose occurrence is restricted by the rule
– op = an operator, one of four types, giving four types of rules
– LC, RC = left context, right context
The rules
– Exclusion rule: a:b /<= LC --- RC
  a may not be realised as b in the context LC --- RC; a:b is not allowed in the given context

– Context restriction rule: a:b => LC --- RC
  a may be realised as b only in the given context, and nowhere else; a:b is allowed in the given context
– Surface coercion rule: a:b <= LC --- RC
  a must be realised as b in the given context; a:b is required in the given context
– Composite rule: a:b <=> LC --- RC
  this rule is a combination of context restriction and surface coercion; a lexical a must correspond to surface b in the given context, and this correspondence is licit only in that context; a:b is required in the given context and allowed nowhere else
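The meaning of the four operators can be stated as a check over an aligned sequence of (lexical, surface) pairs. The sketch below makes simplifying assumptions: the operators are spelled `=>`, `<=`, `<=>` and `/<=`, the context is an arbitrary Python predicate over the pair sequence and a position rather than a pair of regular expressions, and the example context (lexical x to the left, lexical s to the right) is a toy. The names `satisfies` and `between_x_and_s` are hypothetical.

```python
def satisfies(pairs, a, b, op, in_context):
    """Check one two-level rule  a:b op LC --- RC  against aligned pairs.

    `in_context(pairs, i)` says whether position i lies in the LC --- RC context.
    """
    if op not in ("=>", "<=", "<=>", "/<="):
        raise ValueError(f"unknown operator: {op}")
    for i, (lexical, surface) in enumerate(pairs):
        ctx = in_context(pairs, i)
        is_ab = (lexical, surface) == (a, b)
        # Context restriction: a:b may occur only in the context.
        if op in ("=>", "<=>") and is_ab and not ctx:
            return False
        # Surface coercion: in the context, lexical a must surface as b.
        if op in ("<=", "<=>") and lexical == a and ctx and surface != b:
            return False
        # Exclusion: a:b may not occur in the context.
        if op == "/<=" and is_ab and ctx:
            return False
    return True


def between_x_and_s(pairs, i):
    """Toy context: lexical x on the left and lexical s on the right."""
    return 0 < i < len(pairs) - 1 and pairs[i - 1][0] == "x" and pairs[i + 1][0] == "s"


foxes = list(zip("fox+s", "foxes"))
fox0s = list(zip("fox+s", "fox0s"))
doges = list(zip("dog+s", "doges"))

print(satisfies(foxes, "+", "e", "<=>", between_x_and_s))  # True
print(satisfies(fox0s, "+", "e", "<=", between_x_and_s))   # False
print(satisfies(doges, "+", "e", "=>", between_x_and_s))   # False
```

The second call fails because + is in the context but surfaces as 0; the third fails because +:e occurs outside the context.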

Example of Koskenniemi’s rule formalism
Treats epenthesis in English
– Epenthesis: a morpheme boundary + is realised as an ‘e’ on the surface when it follows ‘ch’, ‘sh’, ‘s’, ‘x’, ‘z’ or ‘y/i’ and occurs before an ‘s’. Otherwise the lexical character + corresponds to 0 on the surface (the empty string).
– foxes, churches, spies (+:e)
– +:e <=> { { c | s (h) } | S | y/i } --- s
  CP = +:e, op = <=>, LC = { { c | s (h) } | S | y/i }, RC = s
– CP, LC and RC consist of sequences of pairs, the first member of each pair drawn from the lexical alphabet, the second from the surface alphabet
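The surface effect of the epenthesis rule can be approximated with regular-expression substitutions. This is a generation-only sketch, not a two-level rule compiler: it assumes lexical forms are written with + as the morpheme boundary, it treats the y/i pair as "lexical y surfaces as i immediately before +s", and the function name `realise` is hypothetical.

```python
import re


def realise(lexical: str) -> str:
    """Map a lexical string such as 'fox+s' to its surface spelling."""
    # y/i pair followed by epenthesis: y+s -> ies (assumed reading of the rule).
    surface = re.sub(r"y\+(?=s)", "ie", lexical)
    # Epenthesis: '+' surfaces as 'e' after ch, sh, s, x or z and before s.
    surface = re.sub(r"(?:(?<=[cs]h)|(?<=[sxz]))\+(?=s)", "e", surface)
    # Otherwise '+' corresponds to 0, i.e. nothing on the surface.
    return surface.replace("+", "")


for word in ("fox+s", "church+s", "spy+s", "dog+s"):
    print(word, "->", realise(word))
# fox+s -> foxes
# church+s -> churches
# spy+s -> spies
# dog+s -> dogs
```

As stated, the rule places no condition on what precedes the y, so this sketch also turns "boy+s" into "boies"; a fuller rule set would restrict the y/i correspondence.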