CSA3050: NLP Algorithms. Finite State Transducers for Morphological Parsing. Advanced Topics in NLP, October 2006.


Acknowledgement
This lecture is largely based on material from Jurafsky & Martin, chapter 3.

Resumé
– FSAs are equivalent to regular languages.
– FSTs are equivalent to regular relations (over pairs of regular languages).
– FSTs are like FSAs, but each arc label pairs an input symbol with an output symbol.
– We can use FSTs to transduce between the surface and lexical levels.

Morphological Parsing
Given the input cats, we'd like to output cat +N +Pl, telling us that cats is the plural form of the noun cat.
Given the Spanish input bebo, we'd like to output beber +V +PInd +1P +Sg, telling us that bebo is the present indicative first person singular form of the Spanish verb beber, 'to drink'.

Two-Level Paradigm [figure from Jurafsky & Martin]

English Plural

surface    lexical
cat        cat+N+Sg
cats       cat+N+Pl
foxes      fox+N+Pl
mice       mouse+N+Pl
sheep      sheep+N+Pl, sheep+N+Sg
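The table can be read as a relation between surface and lexical strings that works in both directions. A minimal sketch, assuming a plain dictionary in place of a real transducer; the names ANALYSES, analyse and generate are illustrative only and do not come from the lecture:

```python
# Toy surface <-> lexical relation holding exactly the rows of the table.
# A real analyser would be an FST; a dictionary is used here only to show
# that the mapping is two-way and can be one-to-many.
ANALYSES = {
    "cat": ["cat+N+Sg"],
    "cats": ["cat+N+Pl"],
    "foxes": ["fox+N+Pl"],
    "mice": ["mouse+N+Pl"],
    "sheep": ["sheep+N+Pl", "sheep+N+Sg"],
}


def analyse(surface):
    """Surface form -> list of lexical analyses (empty if unknown)."""
    return ANALYSES.get(surface, [])


def generate(lexical):
    """Lexical form -> list of surface forms that carry that analysis."""
    return [s for s, analyses in ANALYSES.items() if lexical in analyses]
```

Calling analyse("sheep") returns two analyses, which is the ambiguity shown in the last row of the table, while generate("mouse+N+Pl") runs the same relation in the other direction.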

Morphological Analyser
To build a morphological analyser we need:
– lexicon: the list of stems and affixes, together with basic information about them
– morphotactics: the model of morpheme ordering (e.g. the English plural morpheme follows a noun rather than a verb)
– orthographic rules: spelling rules that model the changes that occur in a word, usually when two morphemes combine (e.g. fly+s = flies)

Lexicon & Morphotactics
Typically, the list of word parts (lexicon) and the model of ordering can be combined into an FSA which will recognise all the valid word forms. For this to be possible, the word parts must first be classified into sublexicons. The FSA defines the morphotactics (ordering constraints).

Sublexicons to classify the list of word parts

reg-noun    irreg-pl-noun    irreg-sg-noun    plural
cat         mice             mouse            -s
fox         sheep
            geese            goose

FSA Expresses Morphotactics (ordering model) [figure]
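The morphotactic FSA can be sketched as a small recogniser whose arcs are labelled with sublexicon names. This is an illustration under stated assumptions rather than the lecture's lex1.lexc: the state names q0, q1, q2 are invented, and sheep is listed in both irregular sublexicons (following Jurafsky & Martin) so that it is accepted as singular and as plural.

```python
# Sublexicons: each arc label expands to the morphemes listed here.
SUBLEXICONS = {
    "reg-noun": {"cat", "fox"},
    "irreg-pl-noun": {"mice", "sheep", "geese"},
    "irreg-sg-noun": {"mouse", "sheep", "goose"},  # sheep here is an assumption
    "plural": {"s"},
}

# Morphotactics: q0 is the start state, q1 follows a regular noun stem,
# q2 follows a complete word. State names are hypothetical.
ARCS = {
    "q0": [("reg-noun", "q1"), ("irreg-pl-noun", "q2"), ("irreg-sg-noun", "q2")],
    "q1": [("plural", "q2")],
    "q2": [],
}
FINAL_STATES = {"q1", "q2"}


def accepts(word, state="q0"):
    """Return True if the FSA can consume the whole word and end in a final state."""
    if not word:
        return state in FINAL_STATES
    for label, target in ARCS[state]:
        for morpheme in SUBLEXICONS[label]:
            if word.startswith(morpheme) and accepts(word[len(morpheme):], target):
                return True
    return False
```

With only morphotactics in place, accepts("foxs") is True and accepts("foxes") is False, which is why the spelling rules are needed as a further component.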

Towards the Analyser
We can use lexc or xfst to build such an FSA (see lex1.lexc). To augment this to produce an analysis, we must create a transducer T_num which maps between the lexical level and an "intermediate" level that is needed to handle the spelling rules of English.

Three Levels of Analysis [figure]

1. T_num: Noun Number Inflection [figure]
– multi-character symbols
– morpheme boundary: ^
– word boundary: #

Towards the Analyser
We do this by first allowing the lexicon itself to have two levels. Since surface geese maps to lexical goose, the new lexical entry will be "g:g o:e o:e s:s e:e", where each pair is written lexical:surface (see lex2.lexc). We must also add the appropriate morphological labels (see lex3.lexc).
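The two-level entry can be unpacked mechanically into its two sides. A minimal sketch, assuming each pair is written lexical:surface as in the goose/geese entry; the helper names parse_entry, lexical_side and surface_side are hypothetical and not part of lexc:

```python
def parse_entry(entry):
    """Split an entry such as 'g:g o:e' into (lexical, surface) symbol pairs."""
    return [tuple(pair.split(":")) for pair in entry.split()]


def lexical_side(pairs):
    """Concatenate the left-hand (lexical) symbols of the pairs."""
    return "".join(lexical for lexical, _ in pairs)


def surface_side(pairs):
    """Concatenate the right-hand (surface) symbols of the pairs."""
    return "".join(surface for _, surface in pairs)


GOOSE_PLURAL = parse_entry("g:g o:e o:e s:s e:e")
```

Here lexical_side(GOOSE_PLURAL) gives goose and surface_side(GOOSE_PLURAL) gives geese, so a single path through the transducer encodes both strings at once.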

Intermediate Form to Surface
The reason we need an intermediate form is that funny things happen at morpheme boundaries, e.g.
– cat^s → cats
– fox^s → foxes
– fly^s → flies
The rules which describe these changes are called orthographic rules or "spelling rules".

More English Spelling Rules
– consonant doubling: beg / begging
– y replacement: try / tries
– k insertion: panic / panicked
– e deletion: make / making
– e insertion: watch / watches
Each rule can be stated in more detail...

Spelling Rules
Chomsky & Halle (1968) introduced a rewrite-rule notation, originally for phonological rules, that is also used to state spelling rules. A very similar notation is embodied in the "conditional replacement" rules of xfst:
E -> F || L _ R
which means: replace E with F when it appears between left context L and right context R.

A Particular Spelling Rule
This rule does e-insertion:
^ -> e || x _ s#
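The effect of this rule on intermediate strings can be imitated with a regular-expression substitution. This is a sketch only: it uses Python's re module rather than xfst, and the final step that erases leftover ^ and # symbols is an assumption added so that the output looks like a surface form.

```python
import re

# ^ -> e || x _ s#
# Replace the morpheme boundary ^ with e when x is to its left and s# to its right.
E_INSERTION = re.compile(r"(?<=x)\^(?=s#)")


def e_insertion(intermediate):
    """Apply the e-insertion rule to an intermediate-level string."""
    return E_INSERTION.sub("e", intermediate)


def to_surface(intermediate):
    """Apply e-insertion, then erase remaining boundary symbols (assumed step)."""
    return e_insertion(intermediate).replace("^", "").replace("#", "")
```

For example, to_surface("fox^s#") yields foxes, while to_surface("cat^s#") yields cats because the left context x is absent and the rule does not fire.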

e insertion over 3 levels [figure]
The rule corresponds to the mapping between the surface and intermediate levels.

e insertion as an FST [figure]

Incorporating Spelling Rules
Spelling rules, each corresponding to an FST, can be run in parallel provided that they are "aligned". The set of spelling rules is positioned between the surface level and the intermediate level. Parallel execution of FSTs can be carried out:
– by simulation: in this case the FSTs must first be aligned.
– by first constructing a single FST corresponding to their intersection.

Putting it all together [figure]
Execution of the FST_i takes place in parallel.
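The overall flow from lexical level through intermediate level to surface level can be sketched end to end in the generation direction. Several things here are assumptions made for illustration: the t_num function is a hand-written stand-in for the T_num transducer, the stem lists are taken from the sublexicon slide with sheep treated as both singular and plural, and the spelling rules are applied one after another rather than in parallel as the lecture describes (with a single e-insertion rule the result is the same).

```python
import re

REG_NOUNS = {"cat", "fox"}
# Irregular singular stem -> irregular plural form (sheep entry is an assumption).
IRREG_NOUNS = {"goose": "geese", "mouse": "mice", "sheep": "sheep"}

# Spelling rules as (pattern, replacement) pairs, applied in sequence here.
SPELLING_RULES = [
    (re.compile(r"(?<=x)\^(?=s#)"), "e"),  # e insertion: ^ -> e || x _ s#
    (re.compile(r"[\^#]"), ""),            # erase leftover boundary symbols
]


def t_num(lexical):
    """Stand-in for T_num: lexical form (e.g. fox+N+Pl) -> intermediate form."""
    stem, _, tags = lexical.partition("+")
    known = stem in REG_NOUNS or stem in IRREG_NOUNS
    if tags == "N+Sg" and known:
        return stem + "#"
    if tags == "N+Pl" and stem in REG_NOUNS:
        return stem + "^s#"
    if tags == "N+Pl" and stem in IRREG_NOUNS:
        return IRREG_NOUNS[stem] + "#"
    raise ValueError("no analysis for " + lexical)


def generate(lexical):
    """Lexical -> intermediate -> surface."""
    form = t_num(lexical)
    for pattern, replacement in SPELLING_RULES:
        form = pattern.sub(replacement, form)
    return form
```

For instance, generate("fox+N+Pl") passes through the intermediate form fox^s# before surfacing as foxes, whereas generate("goose+N+Pl") takes the irregular path and never contains a morpheme boundary.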

Kaplan and Kay: The Xerox View
– FST_i are aligned but separate
– FST_i intersected together