Morphology 1 Introduction Morphology Morphological Analysis (MA)

Slides:



Advertisements
Similar presentations
Jing-Shin Chang1 Morphology & Finite-State Transducers Morphology: the study of constituents of words Word = {a set of morphemes, combined in language-dependent.
Advertisements

CS Morphological Parsing CS Parsing Taking a surface input and analyzing its components and underlying structure Morphological parsing:
Computational Morphology. Morphology S.Ananiadou2 Outline What is morphology? –Word structure –Types of morphological operation – Levels of affixation.
Finite-State Transducers: Applications in Natural Language Processing Heli Uibo Institute of Computer Science University of Tartu
Morphological Analysis Chapter 3. Morphology Morpheme = "minimal meaning-bearing unit in a language" Morphology handles the formation of words by using.
Finite-State Transducers Shallow Processing Techniques for NLP Ling570 October 10, 2011.
Morphology, Phonology & FSTs Shallow Processing Techniques for NLP Ling570 October 12, 2011.
5/16/ ICS 482 Natural Language Processing Words & Transducers-Morphology - 1 Muhammed Al-Mulhem March 1, 2009.
Writing Lexical Transducers Using xfst
Brief introduction to morphology
BİL711 Natural Language Processing1 Morphology Morphology is the study of the way words are built from smaller meaningful units called morphemes. We can.
6/2/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 2 Giuseppe Carenini.
6/10/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 3 Giuseppe Carenini.
Morphology I. Basic concepts and terms Derivational processes
Computational language: week 9 Finish finite state machines FSA’s for modelling word structure Declarative language models knowledge representation and.
1 Morphological analysis LING 570 Fei Xia Week 4: 10/15/07 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A.
Morphology See Harald Trost “Morphology”. Chapter 2 of R Mitkov (ed.) The Oxford Handbook of Computational Linguistics, Oxford (2004): OUP D Jurafsky &
Morphological analysis
Linguisitics Levels of description. Speech and language Language as communication Speech vs. text –Speech primary –Text is derived –Text is not “written.
Towards unsupervised induction of morphophonological rules Erwin Chan University of Pennsylvania Morphochallenge workshop 19 Sept 2007.
CS 4705 Lecture 3 Morphology: Parsing Words. What is morphology? The study of how words are composed from smaller, meaning-bearing units (morphemes) –Stems:
Finite State Transducers The machine model we will study for morphological parsing is called the finite state transducer (FST) An FST has two tapes –input.
CS 4705 Morphology: Words and their Parts CS 4705 Julia Hirschberg.
Introduction to English Morphology Finite State Transducers
Chapter 3. Morphology and Finite-State Transducers From: Chapter 3 of An Introduction to Natural Language Processing, Computational Linguistics, and Speech.
Morphology and Finite-State Transducers. Why this chapter? Hunting for singular or plural of the word ‘woodchunks’ was easy, isn’t it? Lets consider words.
Morphology (CS ) By Mugdha Bapat Under the guidance of Prof. Pushpak Bhattacharyya.
Lemmatization Tagging LELA /20 Lemmatization Basic form of annotation involving identification of underlying lemmas (lexemes) of the words in.
Lecture 3, 7/27/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 4 28 July 2005.
October 2006Advanced Topics in NLP1 CSA3050: NLP Algorithms Finite State Transducers for Morphological Parsing.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2007 Lecture4 1 August 2007.
Morphological Recognition We take each sub-lexicon of each stem class and we expand each arc (e.g. the reg-noun arc) with all the morphemes that make up.
Introduction Morphology is the study of the way words are built from smaller units: morphemes un-believe-able-ly Two broad classes of morphemes: stems.
LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10.
Ch4 – Features Consider the following data from Mokilese
10/8/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 2 Giuseppe Carenini.
Lecture 3, 7/27/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 3 27 July 2005.
Reasons to Study Lexicography  You love words  It can help you evaluate dictionaries  It might make you more sensitive to what dictionaries have in.
Finite State Transducers
Chapter 3: Morphology and Finite State Transducer Heshaam Faili University of Tehran.
Finite State Transducers for Morphological Parsing
Words: Surface Variation and Automata CMSC Natural Language Processing April 3, 2003.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2007 Lecture 3 27 July 2007.
Morphological Analysis Chapter 3. Morphology Morpheme = "minimal meaning-bearing unit in a language" Morphology handles the formation of words by using.
Chapter III morphology by WJQ. Morphology Morphology refers to the study of the internal structure of words, and the rules by which words are formed.
CS 4705 Lecture 3 Morphology. What is morphology? The study of how words are composed of morphemes (the smallest meaning-bearing units of a language)
An Ambiguity-Controlled Morphological Analyzer for Modern Standard Arabic By: Mohammed A. Attia Abbas Al-Julaih Natural Language Processing ICS.
CSA3050: Natural Language Algorithms Finite State Devices.
Natural Language Processing Chapter 2 : Morphology.
1/11/2016CPSC503 Winter CPSC 503 Computational Linguistics Lecture 2 Giuseppe Carenini.
III. MORPHOLOGY. III. Morphology 1. Morphology The study of the internal structure of words and the rules by which words are formed. 1.1 Open classes.
CSA4050: Advanced Topics in NLP Computational Morphology II Introduction 2 Level Morphology.
October 2004CSA3050 NLP Algorithms1 CSA3050: Natural Language Algorithms Morphological Parsing.
November 2003Computational Morphology VI1 CSA4050 Advanced Topics in NLP Non-Concatenative Morphology – Reduplication – Interdigitation.
A knowledge rich morph analyzer for Marathi derived forms Ashwini Vaidya IIIT Hyderabad.
Chapter 3 Word Formation I This chapter aims to analyze the morphological structures of words and gain a working knowledge of the different word forming.
Tasneem Ghnaimat. Language Model An abstract representation of a (natural) language. An approximation to real language Assume we have a set of sentences,
Learning to Generate Complex Morphology for Machine Translation Einat Minkov †, Kristina Toutanova* and Hisami Suzuki* *Microsoft Research † Carnegie Mellon.
Two Level Morphology Alexander Fraser & Liane Guillou CIS, Ludwig-Maximilians-Universität München Computational Morphology.
NLP Midterm Solution #1 bilingual corpora –parallel corpus (document-aligned, sentence-aligned, word-aligned) (4) –comparable corpus (4) Source.
Descriptive Grammar – 2S, 2016 Mrs. Belén Berríos
CIS, Ludwig-Maximilians-Universität München Computational Morphology
Lecture 7 Summary Survey of English morphology
Morphology Morphology Morphology Dr. Amal AlSaikhan Morphology.
Composition is Our Friend
Morphology: Parsing Words
CSCI 5832 Natural Language Processing
By Mugdha Bapat Under the guidance of Prof. Pushpak Bhattacharyya
Writing Lexical Transducers Using xfst
Morphological Parsing
Presentation transcript:

Morphology 1 Introduction Morphology Morphological Analysis (MA) Using FS techniques in MA Automatic learning of the morphology of a language Introduction: marc general i la motivació d’aquest treball

Morphology 2 Morphology Result of morphologic analysis Structure of a word as a composition of morphemes Related to word formation rules Functions Flexion Derivation Composition Result of morphologic analysis Morphosyntactic categorization (POS) e.g. Parole tagset (VMIP1S0), more than 150 categories for Spanish e.g. Penn Treebank tagset (VBD), about 30 categories for English Morphological features Number, case, gender, lexical functions

Morphology 3 Morphologic analysis Problems Decompose a word into a concatenation of morphemes Usually some of the morphemes contain the meaning One (root or stem) in flexion and derivation More than one in composition The other (affixes) provide morphological features Problems Phonological alterations in morpheme concatenation Morphotactics Which morphemes can be concatenated with which others

Morphology 4 Problems Affixes flexive Affixes  derivative Affixes Suffixes, prefixes, infixes, interfixes flexive Affixes  derivative Affixes Derivation implies sometimes a semantic change not always predictible Meaning extensions Lexical rules A derivative suffix can be followed by a flexive suffix love => lover => lovers Flexion does not change POS, sometimes derivation does Flexion affects other words in the sentence agreement

Morphology 5 Morphotactics Phonological alterations (Morphophonology) Word formation rules Valid combinations between morphemes Simple concatenation Complex models root/pattern Regularity language dependent Phonological alterations (Morphophonology) Changes when concatenating morphemes Source: Phonology, morphology, orthography variable in number and complexity e.g. vocalic harmony

Morphology 6 1 morpheme: 2 morphemes: 3 morphemes: 4 morphemes: evitar evitable = evitar + able 3 morphemes: inevitable = in + evitar + able 4 morphemes: inevitabilidad = in + evitar + able + idad

Morphology 7 number verbal form gender house houses cheval chevaux Flexive Morphology number house houses cheval chevaux casa casas verbal form walk walkes walked walking amo amas aman ... gender niño niña

Morphology 8 Form Source Derivative Morphology Without change barcelonés Prefix inevitable Suffix importantísimo Source verb => adjective tardar => tardío verb => noun sufrir => sufrimiento noun => noun actor => actorazo noun => adjective atleta => atlético adjective => adjective rojo => rojizo adjective => adverb alegre => alegremente

Morphological Analysis 1 Types of morphological analyzers Formaries Dictionaries of word forms efficiency Languages with few variants (e.g. English) extensibility Possibility of building and maintenance from a morphological generator Languages with high flexive variation derivation, composition FS techniques FSA 1 level analyzers FST > 1 level analyzers

Morphological Analysis 2 2 levels morphological analyzers General model for languages with morpheme concatenation Independence between lingware and analyzer Valid for analysis and generation Distinction between lexical and superficial levels Parallel rules for morphophonology Simple implementation

Morphological Analysis 3 Morphological rules Define the relations betweens characters (surface) and morphemes and map strings of characters and the morphemic structure of the word. Spelling rules Perform at the level of the letters forming the word. Can be used to define the valid phomological alterations. Ritchie, Pulman, Black, Russell, 1987

Morphological Analysis 4 input: form output lemma + morphological features Input Output cat cat + N + sg cats cat + N + pl cities city + N + pl merging merge + V + pres_part caught (catch + V + past) or (catch + V + past_part)

Morphological Analysis 5 reg_noun irreg_pl_noun irreg_sg_noun plural fox sheep sheep -s cat mice mouse dog 1 2 reg_noun plural (-s) irreg_pl_noun irreg_sg_noun Morphotactics

Morphological Analysis 6 f o x s  c a t d g n e y m u i fog cat dog donkey mouse mice Letter Transducers

Morphological Analysis 7 upper level lexic cat + N cat + N + pl lower level surface cat cats c:c a:a t:t +N: +pl:s

Morphological Analysis 8 Using FST As a recognizer From a pair of input strings (one lexical and the other superficial) and answers if one is transduction of the other. As a generator generated pairs of strings As a translator From a superficial string generates its lexical transduction

Morphological Analysis 9 reg_noun irreg_pl_noun irreg_sg_noun plural fox sheep sheep s cat m o:i u: ce mouse dog g o:e o:e se goose 1 2 reg_noun +pl:s irreg_pl_noun irreg_sg_noun 3 4 5 6 +N: +sg: +pl:

Morphological Analysis 10 lexical level f o x +N +pl intermediate level f o x ^ s superficial level f o x e s morphotactics spelling rules

Morphological Analysis 11 f o x c a t d g n e y m u s o:i +N: +pl:^s +sg: +u: +pl: fog cat dog donkey mouse mice

Morphological Analysis 12 Spelling rules name description example consonant doubling single letter consonant beg/begging doubled before -ing/-ed e deletion silent e dropped before -ing/-ed make/making e insertion e added after -s,-z,-x,-ch,-sh before -s watch/watches y replacement -y changes to -ie before -s, to i before -ed try/tries k insertion verbs ending with voyel +c add -k panic/panicked

Morphological Analysis 13 Spelling rules: e-insertion :e  [xsz]^: ___ s#  decomposition / :e [xsz]^: ___ s# : / [xsz]^: ___ s#

Morphological Analysis 14 epenthesis + : e <=> {< {s:s c:c} h:h> s:s x:x z:z} --- s:s context <=> => context restriction <= surface coercion C: {...} V: {a,e,i,o,u,y} C2: {...} =: whatever example: box + s box e s

Morphological Analysis 15 e-deletion e : 0 <=> = :C2 --- <+:0 V:= > or <C:C V:V> --- < +:0 e:e > or <c:c g:g> --- < +:0 {e:e i:i} > or l:0 --- +:0 or c:c --- < +:0 a:0 t:t b:b> mov e + ed mov ed agre e + ed agre ed

Morphological Analysis 16 a-deletion a : 0 <=> <c:c e:0 +:0> --- t:t redu c e + a t ion redu c t ion ... contexto izdo foco contexto ... dcho

Morphological Analysis 17 lexical level f o x +N +pl Lexicon-FST intermediate level f o x ^ s spelling rules FST1 FST2 ... FSTn superficial level f o x e s

Morphological Analysis 18 Lexicon-FST Lexicon-FST Lexicon-FST • FSTA FST1 ... FSTn FSTA= FST1  ...  FSTn intersection composition

Automatic morphology learning 1 Problem Paradigm stem + affixea Obtaining the stems Classification of stems into models Learning part of the morphology (e.g. derivational) Two approaches No previous morphologic knowledge is available Goldsmith, 2001 Brent, 1999 Snover, Brent, 2001, 2002 Morphologic knowledge can be used Oliver at al, 2002

Automatic morphology learning 2 Automatic morphological analysis Identification of borders betwen morphemes Zellig Harris {prefix, suffix} conditional entropy bigrams and trigrams with high probability of forming a morpheme Learning of patterns or rules of mapping between pairs of words Global approach (top-down) Golsdmith, Brent, de Marcken

Automatic morphology learning 3 Goldsmith’s system based on MDL (Minimum Description Length) Initial Partition: word -> stem + suffix split-all-words A good candidate to {stem, suffix} splitting in a word has to be a good candidate in many other words MI (mutual information) strategy Faster convergence Learning Signatures {signatures, stem, suffixes} MDL

Automatic morphology learning 4 Semi-automatic morphological analysis Oliver, 2004 Starts with a set of manually written morphological rules TL:TF:Desc lemma ending form ending POS Lists of non flexive classes , closed classes and irregular words Corpora Serbo-Croatian 9 Mw Russian 16 Mw