CSA4050: Advanced Topics in NLP. Computational Morphology II. Introduction: 2 Level Morphology.



The Problem

So far we have assumed that words are formed by strict concatenation of component morphemes, as in the following example:

en + large + ment + s

This assumption is convenient because it imposes a 1:1 correspondence between the segmentation of the string and the lookup of lexical items (which may be of different types, e.g. roots, affixes, particles). The problem is that this assumption is unrealistic.
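The 1:1 correspondence between segmentation and lexicon lookup can be sketched in a few lines of Python. This is only a minimal illustration: the lexicon entries, the category labels and the function name `segment` are assumptions made for the example, not part of the lecture.

```python
# Toy lexicon (illustrative entries only): morpheme -> category.
LEXICON = {
    "en": "prefix",
    "large": "root",
    "rake": "root",
    "ment": "suffix",
    "ing": "suffix",
    "s": "suffix",
}


def segment(word, lexicon=LEXICON):
    """Return every way of splitting `word` into a sequence of lexicon entries."""
    if not word:
        return [[]]  # one way to segment the empty string: no morphemes
    results = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in lexicon:
            # Each piece of the surface string is looked up directly.
            for rest in segment(word[i:], lexicon):
                results.append([head] + rest)
    return results


print(segment("enlargements"))  # [['en', 'large', 'ment', 's']]
print(segment("raking"))        # [] : "rak" is not in the lexicon
```

Strict concatenation works for "enlargements" but fails for "raking", even though both "rake" and "ing" are in the toy lexicon, because the surface string no longer contains the lexical form of the root.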

English Spelling Rules

Final consonant doubling: begin + ing = beginning
s to es: church + s = churches
y to i: carry + ed = carried
Final e deletion: rake + ing = raking
n to m: in + practical = impractical
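As a rough illustration, four of these spelling rules can be written as regular-expression rewrites over a string in which "+" marks a morpheme boundary. The patterns below are simplified assumptions that cover only the examples on the slide; final consonant doubling is left out because it depends on stress, which the spelling alone does not show.

```python
import re

# (pattern, replacement) pairs; "+" marks a morpheme boundary.
# Simplified, illustrative rules; they are not a full account of English spelling.
SPELLING_RULES = [
    (r"(ch|sh|s|x|z)\+s$", r"\1es"),   # s to es:   church+s     -> churches
    (r"([^aeiou])y\+ed$", r"\1ied"),   # y to i:    carry+ed     -> carried
    (r"e\+ing$", "ing"),               # e deletion: rake+ing    -> raking
    (r"^in\+(?=[pbm])", "im"),         # n to m:    in+practical -> impractical
]


def to_surface(lexical):
    """Apply each rule in turn, then delete any remaining boundary markers."""
    for pattern, replacement in SPELLING_RULES:
        lexical = re.sub(pattern, replacement, lexical)
    return lexical.replace("+", "")


for form in ["church+s", "carry+ed", "rake+ing", "in+practical", "walk+ed"]:
    print(form, "->", to_surface(form))
```

With these rules "begin+ing" would come out as "begining", which shows why a realistic rule set needs more information than this sketch uses.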

Semitic Languages

dħalt, daħal, daħlet, dħalna, dħaltu, daħlu

– Deletion of vowel
– Changes or insertion of vowel
– Non-concatenative morphology

Handling Spelling Rules

Such phenomena usually occur at morpheme boundaries, and prevent direct lookup of the surface string in the lexicon. The solution is to suppose that two strings are involved:

– The surface string: that which appears on the page
– The lexical string: that which is used to index items in the lexicon

What kind of mapping exists between the two strings?

Lexical Transformations

[Figure: mapping between the SURFACE STRING and the LEXICAL STRING]

Phonological Rules

Morphological rules are a reflection of phonological changes. Assumption: the lexical/surface transformation is rule governed. Phonological rule systems had been extensively studied from the point of view of generative linguistics under Chomsky during the 1970s.

Typical Phonological Rule

A typical rule has the following shape:

Phon1 -> Phon2 // Lcontext __ Rcontext

Meaning: phoneme Phon1 is transformed to phoneme Phon2 if it occurs between the left context Lcontext and the right context Rcontext.

Example: [B] -> [P] // __ #

B is pronounced like P if it is word final (cf. kelb).
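A context-sensitive rule of this shape can be approximated with a regular-expression substitution. The sketch below is an assumption-laden illustration: the function name `apply_rule` and its parameters are invented for the example, letters stand in for phonemes, contexts are given as regular expressions, and the word boundary # is written as `$`.

```python
import re


def apply_rule(s, phon1, phon2, left="", right=""):
    """Rewrite phon1 as phon2 wherever it stands between `left` and `right`.

    `left` and `right` are regular expressions. The left context is captured
    and copied back; the right context is only checked with a lookahead.
    """
    left_part = "(" + left + ")"
    right_part = "(?=" + right + ")" if right else ""
    pattern = left_part + re.escape(phon1) + right_part
    return re.sub(pattern, lambda m: m.group(1) + phon2, s)


# [B] -> [P] // __ #   (word-final b is realised as p)
print(apply_rule("kelb", "b", "p", right="$"))   # kelp
print(apply_rule("kelba", "b", "p", right="$"))  # kelba (b is not word final)
```

Because the left context is consumed by the match, adjacent rule sites could interact in ways a real rule compiler would need to handle; the sketch ignores that.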

Properties of Phonological Rules within the Generative Tradition

– Rules are rewrite rules
– Rules apply sequentially
– Rules are ordered
– Rules may act upon their own output (cyclic rules)
– Effects of rules are not always reversible
– Collections of rules have Turing power
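Two of these properties, ordering and irreversibility, can be seen in a very small cascade. The two rules below are hypothetical and chosen only to make the point: word-final devoicing of b, and deletion of a word-final a. The input "kelba" is an invented form.

```python
import re


def devoice(s):
    """Hypothetical rule: word-final b becomes p."""
    return re.sub(r"b$", "p", s)


def drop_final_a(s):
    """Hypothetical rule: a word-final a is deleted."""
    return re.sub(r"a$", "", s)


def cascade(s, rules):
    """Apply rewrite rules sequentially, each to the output of the previous one."""
    for rule in rules:
        s = rule(s)
    return s


# Ordering matters: the same two rules give different results.
print(cascade("kelba", [drop_final_a, devoice]))  # kelp
print(cascade("kelba", [devoice, drop_final_a]))  # kelb

# Irreversibility: two different inputs collapse onto one output.
print(devoice("kelb"), devoice("kelp"))           # kelp kelp
```

Given only the output "kelp", there is no way to tell which input produced it, which is one reason analysis (surface to lexical) is harder than generation with sequential rewrite rules.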

C. Douglas Johnson (1972)

A theory of phonology with the right properties could be implemented using only finite state machinery. Each rule is associated with a finite state transducer (FST). All rules operate simultaneously, thus eliminating the delicate problems of ordering associated with sequential cascades of rules. The collection of FS rules operating in parallel is mathematically equivalent to a single FST representing the intersection of the component FSTs.

Johnson's work was mainly theoretical. He was not involved with computational issues, in particular the issue of computing the intersection of multiple FSTs.

Finite State Machinery

FS Automaton
– For recognition and generation of regular languages.
– All operations over regular languages have corresponding operations over the corresponding FSAs.

FS Transducer
– Like FSAs, but with output as well as input.
– For recognition and generation of regular relations.
– Some operations over regular languages do not have corresponding operations over the corresponding FSTs.
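The difference between the two devices can be sketched with a small deterministic transducer class. Everything here is an illustrative assumption (the class name `FST`, the arc table layout, the one-state example machine); it is not the Xerox implementation. The recogniser is obtained by ignoring the output side.

```python
import string


class FST:
    """A deterministic finite state transducer.

    arcs maps (state, input_symbol) to (output_symbol, next_state).
    """

    def __init__(self, start, finals, arcs):
        self.start = start
        self.finals = finals
        self.arcs = arcs

    def transduce(self, s):
        """Return the output string for `s`, or None if `s` is not accepted."""
        state, output = self.start, []
        for symbol in s:
            if (state, symbol) not in self.arcs:
                return None
            out, state = self.arcs[(state, symbol)]
            output.append(out)
        return "".join(output) if state in self.finals else None

    def accepts(self, s):
        """Behave as a plain FSA over the input side of the relation."""
        return self.transduce(s) is not None


# One-state machine: copy lowercase letters, map the boundary "+" to nothing.
arcs = {(0, ch): (ch, 0) for ch in string.ascii_lowercase}
arcs[(0, "+")] = ("", 0)
boundary_deleter = FST(start=0, finals={0}, arcs=arcs)

print(boundary_deleter.transduce("walk+ed"))  # walked
print(boundary_deleter.accepts("walk ed"))    # False: space is not in the alphabet
```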

Kimmo Koskenniemi (1983)

Worked on the morphology of Finnish and devised a system of finite state transducers, together with a computational framework for executing collections of finite state transducers in parallel.

Koskenniemi's Model

[Figure: the SURFACE STRING and the LEXICAL STRING connected by the parallel transducers FST1, FST2, FST3, …, FSTn]

The interpreter executes round-robin, keeping the FSTs in lock-step before moving the head.
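The lock-step idea can be sketched as follows. This is a simplified illustration and not Koskenniemi's system: the lexical and surface strings are assumed to be pre-aligned and of equal length, "0" is assumed to stand for the null symbol, and the two rule automata are invented for the example. Each rule reads the same lexical:surface pair before the head moves on, and a pairing is accepted only if every rule accepts it.

```python
from collections import namedtuple

# A rule automaton: start state, set of final states, and a step function
# (state, (lexical_symbol, surface_symbol)) -> next state, or None to reject.
Rule = namedtuple("Rule", "start finals step")


def e_deletion_step(state, pair):
    """Constraint: the pair e:0 may only occur directly before +:0."""
    if state == 1:                       # the previous pair was e:0
        return 0 if pair == ("+", "0") else None
    return 1 if pair == ("e", "0") else 0


def default_step(state, pair):
    """Constraint: + is realised as 0; other symbols are copied, except e:0."""
    lexical, surface = pair
    if lexical == "+":
        return 0 if surface == "0" else None
    return 0 if lexical == surface or pair == ("e", "0") else None


RULES = [Rule(0, {0}, e_deletion_step), Rule(0, {0}, default_step)]


def accepts(lexical, surface, rules=RULES):
    """Run all rule automata in lock-step over the aligned symbol pairs."""
    if len(lexical) != len(surface):
        return False
    states = [rule.start for rule in rules]
    for pair in zip(lexical, surface):
        for i, rule in enumerate(rules):   # round-robin over the rules
            states[i] = rule.step(states[i], pair)
            if states[i] is None:
                return False
    return all(state in rule.finals for state, rule in zip(states, rules))


print(accepts("rake+ing", "rak00ing"))  # True
print(accepts("rake", "rak0"))          # False: e:0 without a following boundary
```

The e-deletion rule here only restricts where e:0 may appear; it does not force deletion, so a fuller rule set would need further constraints.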

Martin Kay and Ron Kaplan (1981)

Kay and Kaplan (both at Xerox PARC) were very interested in the computational issues underlying morphological processing. In particular, they studied the problems of
– How to combine FSTs in parallel (computing the intersection of regular relations)
– How to combine FSTs in series (computing the composition of FSTs).

Restrictions on rules have pleasant consequences.
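The two ways of combining rules can be illustrated on relations represented directly as finite sets of (input, output) pairs. This is a deliberate simplification: real systems combine the transducers themselves, not enumerated pairs, and the example relations below are invented.

```python
def compose(r1, r2):
    """Series combination: feed every output of r1 into r2."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}


def intersect(r1, r2):
    """Parallel combination: keep only the pairs that both relations allow."""
    return r1 & r2


# Series: e-deletion followed by boundary deletion (toy relations).
e_deletion = {("rake+ing", "rak+ing"), ("walk+ing", "walk+ing")}
boundary_deletion = {("rak+ing", "raking"), ("walk+ing", "walking")}
print(sorted(compose(e_deletion, boundary_deletion)))
# [('rake+ing', 'raking'), ('walk+ing', 'walking')]

# Parallel: two constraints over the same lexical/surface pairs.
constraint_a = {("rake+ing", "raking"), ("rake+ing", "rakeing")}
constraint_b = {("rake+ing", "raking"), ("rake+ing", "rakking")}
print(intersect(constraint_a, constraint_b))
# {('rake+ing', 'raking')}
```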

Restrictions on Rules

With the restriction that a rule shall not apply to its own output, Kaplan and Kay showed that the result of combining the corresponding relations under the operations of intersection, composition and union remains within a closed subclass of those computable by FSTs. They then spent many years designing and implementing a calculus for describing and combining FSTs based upon regular expressions.

Summary

– Chomsky: Generative Phonology; the generative tradition; multilevel cascades of rules
– Johnson: parallel rules
– Kaplan/Kay: calculus
– Koskenniemi: parallel rules; KIMMO; PC-Kimmo
– Xerox Tools: xfst/twolc/lexc