CSA4050 Advanced Topics in NLP
Computational Morphology VI, November 2003
Non-Concatenative Morphology
– Reduplication
– Interdigitation


Slide 2: Reference
Ken Beesley and Lauri Karttunen, "Finite-State Non-Concatenative Morphotactics", Proceedings of SIGPHON-2000.

Slide 3: Koskenniemi 1983
"Only restricted infixation and reduplication can be handled adequately with the present system. Some extensions or revisions will be necessary for an adequate description of languages possessing extensive infixation or reduplication."

Slide 4: Non-Concatenative Languages
Most languages build words by stringing together morphemes like beads on a string. The word-building processes of prefixation and suffixation can be modeled straightforwardly in finite-state terms by concatenation. But some languages also exhibit non-concatenative morphotactics.

Slide 5: Non-Concatenative Phenomena
1. Reduplication
In Malay: bagi (bag), bagi-bagi (bags).
Although this may appear concatenative, it does not involve concatenating a predictable morpheme such as "s". Instead, the entire stem is copied, whatever its length.
In general, the language {ww | w ∈ L} is context-sensitive, but if L is finite, we can construct a finite-state network that encodes it.

Slide 6: General Solution for Reduplication
Therefore, assuming the number of words subject to reduplication is finite, it is possible to construct a lexical transducer for languages like Malay.
To handle reduplication, a new operator ^n is introduced: A^n denotes n concatenations of A.
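As a rough illustration of the ^n operator on a finite lexicon, the Python sketch below models a language as a set of strings. The helper names `power` and `reduplicate` and the extra stem "buku" are assumptions made for this example; they do not come from the slides or from the Xerox tools. The sketch also shows why ^2 has to be applied to each stem separately: applied to the whole lexicon it would also pair different stems.

```python
from itertools import product


def power(language, n):
    """A^n: every concatenation of n strings drawn from the finite set `language`."""
    return {"".join(parts) for parts in product(sorted(language), repeat=n)}


def reduplicate(lexicon):
    """{ww | w in lexicon}: apply ^2 to each stem on its own."""
    result = set()
    for stem in lexicon:
        result |= power({stem}, 2)
    return result


# "bagi" is the stem from the slides; "buku" is an extra stem added for illustration.
stems = {"bagi", "buku"}

print(sorted(reduplicate(stems)))  # only true copies of a single stem
print(sorted(power(stems, 2)))     # also contains mixed pairs such as "bagibuku"
```

Because the lexicon is finite, both results are finite sets, which is the property that makes a finite-state encoding possible.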

Slide 7: Remarks from Beesley on Context Sensitivity
– finite-state grammars: cannot handle unlimited nesting or non-nested terminal dependencies
– context-free grammars: can handle unlimited nesting, such as matched parentheses in arithmetic expressions, but cannot handle non-nested dependencies between terminals
– context-sensitive grammars: can also handle non-nested dependencies between terminals, as in dogdog, where terminal elements 1 and 4 have to be the same, 2 and 5 have to be the same, and 3 and 6 have to be the same. These dependencies cross, so they are not nested.

Slide 8: Non-Concatenation
2. Interdigitation
In Arabic and Maltese, prefixes and suffixes attach to stems in the usual concatenative way, but the stems themselves are formed by a process known as interdigitation. An example occurs with the Arabic stem "katab" (wrote). This stem is composed of three elements:
1. the all-consonant root ktb
2. an abstract consonant-vowel template CVCVC
3. a vocalisation aa (in this case signifying perfect tense and active voice)

Slide 9: Interdigitation
The same root ktb can combine with the same template CVCVC and a different vocalism ui (signifying perfect aspect and passive voice) to produce "kutib" (was written).
The same root ktb can combine with a different template CVVCVC and the vocalism ui to produce "kuutib", another form of the verb.
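The three examples above (katab, kutib, kuutib) can be reproduced with a small Python sketch. The function name `interdigitate` is hypothetical, and the rule that the first vowel of the vocalism spreads over any surplus V slots is an assumption chosen so that ui fills CVVCVC as u u i. It is not claimed to be the general rule for Arabic or Maltese.

```python
def interdigitate(root, template, vocalism):
    """Fill the C slots of `template` from `root` and the V slots from `vocalism`."""
    if template.count("C") != len(root):
        raise ValueError("root must supply exactly one consonant per C slot")
    vowel_slots = template.count("V")
    if not vocalism or len(vocalism) > vowel_slots:
        raise ValueError("vocalism must be non-empty and fit the V slots")

    # Assumption: spread the first vowel over the surplus V slots (ui -> u u i).
    vowels = [vocalism[0]] * (vowel_slots - len(vocalism)) + list(vocalism)

    consonant_iter = iter(root)
    vowel_iter = iter(vowels)
    stem = []
    for slot in template:
        if slot == "C":
            stem.append(next(consonant_iter))
        elif slot == "V":
            stem.append(next(vowel_iter))
        else:
            raise ValueError(f"unexpected template symbol: {slot!r}")
    return "".join(stem)


print(interdigitate("ktb", "CVCVC", "aa"))   # katab
print(interdigitate("ktb", "CVCVC", "ui"))   # kutib
print(interdigitate("ktb", "CVVCVC", "ui"))  # kuutib
```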

Slide 10: Intermediate Result: Template + Root
[Figure: network path d V V r V s]

Slide 11: Final Result: Intermediate Result + Vocalism
[Figure: network path d u u r i s]

Slide 12: Merge
In this case the filler language contains an infinite set of strings (i, ui, uui, …), but only one path can be constructed: every filler string ends in i, so the last vowel slot must be "i" and the earlier vowels must be "u". This need not always be the case (e.g. if the filler language were u* i*).

Slide 13: Merge Operators
To introduce the merge operation into the Xerox calculus, two new operators, .<m. and .m>., have been introduced. They differ only in the order of their arguments: [T .<m. F] and [F .m>. T] represent the same merge operation, with F and T as filler and template respectively.

Slide 14: The Composite Transducer
With these operators, the network above can be compiled using the following expression:
[d r s] .m>. [C V V C V C] .<m. [u* i]
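The effect of this expression can be imitated by brute force in Python. This is a sketch only: the real merge is an operation on networks, whereas the code below enumerates candidate fillers over a toy alphabet and uses Python's `re` syntax as a stand-in for the Xerox regular-expression notation. The names `CLASSES`, `merge` and `merge_each`, and the membership of the two letter classes, are assumptions made for the example.

```python
import re
from itertools import product

# Assumed toy alphabet: which letters count as consonants (C) and vowels (V).
CLASSES = {"C": "bdkrst", "V": "aiu"}


def merge(template, filler_regex, class_symbol):
    """Fill every slot of `class_symbol` in `template` with a string of the filler language."""
    slots = [i for i, sym in enumerate(template) if sym == class_symbol]
    results = set()
    for candidate in product(CLASSES[class_symbol], repeat=len(slots)):
        # Keep only candidates that belong to the filler language.
        if re.fullmatch(filler_regex, "".join(candidate)):
            symbols = list(template)
            for position, letter in zip(slots, candidate):
                symbols[position] = letter
            results.add("".join(symbols))
    return results


def merge_each(templates, filler_regex, class_symbol):
    """Apply `merge` to every template in a set and collect the results."""
    merged = set()
    for template in templates:
        merged |= merge(template, filler_regex, class_symbol)
    return merged


with_root = merge("CVVCVC", "drs", "C")              # root merged into the template
stems = merge_each(with_root, "u*i", "V")            # filler u* i: a single path
ambiguous = merge_each(with_root, "u*i*", "V")       # filler u* i*: several paths

print(sorted(with_root))
print(sorted(stems))
print(sorted(ambiguous))
```

With the filler u* i only one stem survives, while u* i* admits several fillings of the same three vowel slots, which matches the remark that a unique path is not guaranteed.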

Slide 15: Merge
[Figure: the template C V V C V C merged with the root d r s and the vocalism u* i]

Slide 16: Compile-Replace
Regular expressions are compiled into networks as usual, but in addition, the compiler is then applied to its own output.
Central idea:
– Transduce to a language that has the format of regular expressions.
– The compile-replace algorithm then replaces the regular expression with the result of its own compilation.

Slide 17: Compile-Replace: Simple Example
0:^[ a * 0:^]
This network maps the string a* to ^[ a* ^] (i.e. the same regular expression, but with special delimiters).
Applying compile-replace to the lower side of the network eliminates the delimiters, compiles the regular expression a*, and maps the upper side to the language resulting from the compilation.

Slide 18: The Result of Compiling ^[ a* ^]
[Figure: network with arcs a, *:a, a:0, *:0, 0:a]
To answer the question "what does this network do?", work out what it does in the upward and downward directions.

Slide 19: The Result of Compiling ^[ a* ^]
[Figure: network with arcs a, *:a, a:0, *:0, 0:a]
When applied in the upward direction, this transducer maps any string of the infinite a* language to the regular expression from which it was compiled.
When applied in the downward direction, it maps from a* to all the strings in the language a*: {0, a, aa, ...}
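The two directions can be mimicked in Python without building a network. In this sketch Python's `re` module stands in for the regular-expression compiler, and the infinite lower language is enumerated only up to a length bound. The function names `apply_up` and `apply_down`, the bound, and the two-letter alphabet are assumptions for illustration and are not the xfst commands.

```python
import re
from itertools import product


def apply_up(regex_source, surface):
    """Upward: a string of the compiled language maps back to the regular expression."""
    return [regex_source] if re.fullmatch(regex_source, surface) else []


def apply_down(regex_source, alphabet, max_len):
    """Downward: the regular expression maps to the strings of its language (up to max_len)."""
    strings = []
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            candidate = "".join(letters)
            if re.fullmatch(regex_source, candidate):
                strings.append(candidate)
    return strings


print(apply_up("a*", "aaa"))       # ['a*']
print(apply_up("a*", "ab"))        # []
print(apply_down("a*", "ab", 3))   # ['', 'a', 'aa', 'aaa']
```

The empty string in the last result corresponds to the 0 (epsilon) member of the set {0, a, aa, ...}.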

Slide 20: Compile-Replace: 1
Copy the input path to the output path until ^[ is encountered on the indicated (in our case lower) side of the network.
Extract the path up to the closing delimiter ^].
[Figure: path 0:^[ a * 0:^], with arcs a:a and *:*]

Slide 21: Compile-Replace: 2
Symbols along the indicated side are concatenated into a string and eliminated from the path, leaving just the symbols on the opposite side.
The extracted string is compiled into a second network using the standard network compiler.
[Figure: the remaining network (a *) and the second network compiled from the extracted string (a)]

Slide 22: Compile-Replace: 3
The two networks are combined using the cross-product operator.
The result is spliced between the origin and destination states of the regular-expression path.
[Figure: the networks a * and a, and their cross product with arcs a, *:a, a:0, *:0, 0:a]

Slide 23: Reduplication Revisited
Applying compile-replace to this transducer
Lexical: b a g i +Noun +Plural
Surface: ^[ [b a g i] ^2 ^]
yields this one
Lexical: b a g i +Noun +Plural
Surface: b a g i b a g i
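A toy version of compile-replace for this reduplication case can be written in a few lines of Python. It follows the three steps described earlier on a single path: copy symbols until ^[, extract and concatenate the symbols up to ^], and replace them with the result of compiling the extracted string. The "compiler" here, `compile_toy`, is a hypothetical stand-in that understands only expressions of the form [stem]^n. The real algorithm invokes the full regular-expression compiler and splices a network, not a string.

```python
import re


def compile_toy(expression):
    """Stand-in compiler: handles only '[stem]^n' and returns the single string it denotes."""
    match = re.fullmatch(r"\[(\w+)\]\^(\d+)", expression)
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    stem, count = match.group(1), int(match.group(2))
    return stem * count


def compile_replace(symbols):
    """Replace each ^[ ... ^] region of a symbol path by its compiled result."""
    output = []
    i = 0
    while i < len(symbols):
        if symbols[i] == "^[":
            end = symbols.index("^]", i)                # raises ValueError if unclosed
            expression = "".join(symbols[i + 1:end])    # concatenate the extracted symbols
            output.append(compile_toy(expression))
            i = end + 1
        else:
            output.append(symbols[i])                   # copy symbols outside the delimiters
            i += 1
    return "".join(output)


surface = ["^[", "[", "b", "a", "g", "i", "]", "^", "2", "^]"]
print(compile_replace(surface))  # bagibagi
```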

Slide 24: Interdigitation Revisited
Applying compile-replace to this transducer
Up: k i t e b +Verb +Past +3Sg
Down: [k t b] .m>. [C V C V C] .<m. [i e]
yields this one
Up: k i t e b +Verb +Past +3Sg
Down: k i t e b

Slide 25: Remember: Two Central Problems
– Morphotactics: constraints on combinations of morphemes governing the formation of valid words.
  unbelievable vs. believeunable
– Phonological/Orthographical Alternation (spelling rules): how morphemes are realised in particular environments.
  fly + s = flies
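The two problems can be kept apart in code as well. The Python sketch below is an illustration under simplifying assumptions: the tiny morpheme inventories, the (prefix) stem (suffix)* pattern, and the single y-to-ie rule are invented for the example and are far cruder than a real lexicon or rule set.

```python
import re

# Assumed toy morpheme inventories.
PREFIXES = {"un"}
STEMS = {"believe", "fly"}
SUFFIXES = {"able", "s"}


def well_formed(morphemes):
    """Morphotactics: accept (prefix) stem (suffix)* and reject any other ordering."""
    i = 0
    if i < len(morphemes) and morphemes[i] in PREFIXES:
        i += 1
    if i >= len(morphemes) or morphemes[i] not in STEMS:
        return False
    return all(m in SUFFIXES for m in morphemes[i + 1:])


def realise(lexical):
    """Alternation: consonant + y before '+s' becomes 'ies'; then drop morpheme boundaries."""
    spelled = re.sub(r"([^aeiou])y\+s$", r"\1ies", lexical)
    return spelled.replace("+", "")


print(well_formed(["un", "believe", "able"]))   # True
print(well_formed(["believe", "un", "able"]))   # False
print(realise("fly+s"))                         # flies
```

Note that `well_formed` works on morpheme sequences only. Spelling out un + believe + able as "unbelievable" would need a further alternation rule (dropping the e), which the sketch does not include.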

Slide 26: Xerox Perspective
– Morphotactics: handle with lexc
– Phonological/Orthographical Alternation (spelling rules): handle with xfst
[Figure: lexc compiles the morphotactics into a lexicon FST; xfst compiles the alternations into a rules FST; the two are composed (.o.) into the lexical transducer]