CSA4050: Advanced Topics in NLP Computational Morphology II Introduction 2 Level Morphology
CSA405 Lecture 2lev2
The Problem
So far we have assumed that words are formed by strict concatenation of their component morphemes, as in the following example: en + large + ment + s
This assumption is convenient because it imposes a 1:1 correspondence between the segmentation of the string and the lookup of lexical items (which may be of different types, e.g. roots, affixes, particles).
The problem is that this assumption is unrealistic.
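The strict-concatenation assumption can be sketched as a lookup-driven segmenter. This is only an illustration: the toy lexicon, its type labels and the function name `segmentations` are assumptions made for the example, not part of the course material.

```python
def segmentations(word, lexicon):
    """Return every way to split `word` into a sequence of lexicon entries."""
    if not word:
        return [[]]  # the empty string has exactly one (empty) segmentation
    results = []
    for end in range(1, len(word) + 1):
        prefix = word[:end]
        if prefix in lexicon:
            # Each lexical item found corresponds 1:1 to a slice of the string.
            for rest in segmentations(word[end:], lexicon):
                results.append([prefix] + rest)
    return results


# Hypothetical toy lexicon: morpheme -> type label.
LEXICON = {
    "en": "prefix",
    "large": "root",
    "ment": "suffix",
    "s": "suffix",
    "church": "root",
}

print(segmentations("enlargements", LEXICON))  # [['en', 'large', 'ment', 's']]
print(segmentations("churches", LEXICON))      # [] : no strict concatenation exists
```

The second call fails because the surface form contains an e that belongs to neither church nor s, which is exactly where the assumption stops being realistic.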
English Spelling Rules
Final consonant doubling: begin + ing = beginning
s to es: church + s = churches
y to i: carry + ed = carried
Final e deletion: rake + ing = raking
n to m: in + practical = impractical
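As a rough sketch, the five rules can be written as regular-expression rewrites over a string in which "+" marks a morpheme boundary. The patterns are deliberately crude approximations chosen for this example (the doubling pattern ignores stress, so it would over-apply to forms such as open + ing), and the names `SPELLING_RULES` and `to_surface` are assumptions.

```python
import re

# (pattern, replacement) pairs, applied in order. "+" is the morpheme boundary.
SPELLING_RULES = [
    # Final consonant doubling: consonant-vowel-consonant before a vowel-initial suffix.
    (r"([^aeiou][aeiou])([bdgmnprt])\+(?=[aeiou])", r"\1\2\2"),
    # s to es: insert e after a sibilant before the suffix s.
    (r"(ch|sh|s|x|z)\+s$", r"\1es"),
    # y to i: consonant + y before the suffix ed.
    (r"([^aeiou])y\+(?=ed)", r"\1i"),
    # Final e deletion before a vowel-initial suffix.
    (r"([^aeiou])e\+(?=[aeiou])", r"\1"),
    # n to m: the prefix in before a labial consonant.
    (r"^in\+(?=[pbm])", "im"),
]


def to_surface(form):
    """Map a boundary-marked string to its spelled form."""
    for pattern, replacement in SPELLING_RULES:
        form = re.sub(pattern, replacement, form)
    return form.replace("+", "")  # drop any boundaries no rule consumed


for form in ["begin+ing", "church+s", "carry+ed", "rake+ing", "in+practical"]:
    print(form, "=", to_surface(form))
```

Forms that need no adjustment, such as en+large+ment+s, pass through with only the boundaries removed.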
Semitic Languages
dħalt, daħal, daħlet, dħalna, dħaltu, daħlu
Deletion of vowel
Changes or insertion of vowel
Non-concatenative morphology
Handling Spelling Rules
Such phenomena usually occur at morpheme boundaries and prevent direct lookup of the surface string in the lexicon.
The solution is to suppose that two strings are involved:
The surface string: that which appears on the page.
The lexical string: that which is used to index items in the lexicon.
What kind of mapping exists between the two strings?
Lexical Transformations
[Figure: mapping between the SURFACE STRING and the LEXICAL STRING]
Phonological Rules
Morphological rules are a reflection of phonological changes.
Assumption: the lexical/surface transformation is rule governed.
Phonological rule systems had been extensively studied from the point of view of generative linguistics under Chomsky during the 1970s.
Typical Phonological Rule
A typical rule has the following shape:
Phon1 -> Phon2 // Lcontext __ Rcontext
Meaning: phoneme Phon1 is transformed to phoneme Phon2 if it occurs between left context Lcontext and right context Rcontext.
Example: [B] -> [P] // __ #
B is pronounced like P if it is word final (cf. kelb).
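A minimal sketch of how such a rule could be turned into a function, assuming that contexts are literal strings and that the word boundary is written explicitly as "#" at the end of the form. The helper name `make_rule` and the test string "abba#" are made up for this example.

```python
import re


def make_rule(phon1, phon2, left="", right=""):
    """Build a function applying  phon1 -> phon2 // left __ right.

    Contexts are literal strings; they are checked but not consumed,
    so neighbouring matches do not interfere with each other.
    """
    before = f"(?<={re.escape(left)})" if left else ""
    after = f"(?={re.escape(right)})" if right else ""
    pattern = re.compile(before + re.escape(phon1) + after)

    def apply(form):
        return pattern.sub(lambda match: phon2, form)

    return apply


# [B] -> [P] // __ #
devoice = make_rule("b", "p", right="#")

print(devoice("kelb#"))  # kelp#
print(devoice("abba#"))  # abba# (no b is word final, so nothing changes)
```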
Properties of Phonological Rules within the Generative Tradition
Rules are rewrite rules
Rules apply sequentially
Rules are ordered
Rules may act upon their own output (cyclic rules)
Effects of rules are not always reversible
Collections of rules have Turing power
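Two of these properties, ordering and irreversibility, can be seen in a small sketch. The first rule is the word-final [B] -> [P] rule; the second rule, [P] -> [F] word finally, is invented purely for this illustration and is not claimed to hold in any language.

```python
import re


def devoice(form):
    # [B] -> [P] // __ #
    return re.sub(r"b(?=#)", "p", form)


def hypothetical_rule(form):
    # Invented for this sketch only: [P] -> [F] // __ #
    return re.sub(r"p(?=#)", "f", form)


def cascade(form, rules):
    """Apply the rules sequentially, each one to the output of the previous one."""
    for rule in rules:
        form = rule(form)
    return form


# Ordering matters: the first order lets devoicing feed the second rule.
print(cascade("kelb#", [devoice, hypothetical_rule]))  # kelf#
print(cascade("kelb#", [hypothetical_rule, devoice]))  # kelp#

# Not reversible: two different inputs collapse onto the same output,
# so the input cannot be recovered uniquely from "kelp#".
print(devoice("kelb#"), devoice("kelp#"))  # kelp# kelp#
```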
C. Douglas Johnson (1972)
A theory of phonology with the right properties could be implemented using only finite state machinery.
Each rule is associated with a finite state transducer (FST).
All rules operate simultaneously, thus eliminating the delicate problems of ordering associated with sequential cascades of rules.
The collection of FS rules operating in parallel is mathematically equivalent to a single FST representing the intersection of the component FSTs.
Johnson’s work was mainly theoretical. He was not involved with computational issues, in particular the issue of computing the intersection of multiple FSTs.
Finite State Machinery
FS Automaton
For recognition and generation of regular languages.
All operations over regular languages have corresponding operations over the corresponding FSAs.
FS Transducer
Like FSAs, but with output as well as input.
For recognition and generation of regular relations.
Some operations over regular languages do not have corresponding operations over the corresponding FSTs.
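To make "output as well as input" concrete, here is a small non-deterministic transducer for the n to m spelling rule (n becomes m exactly when p follows). The state numbering, the `step` and `transduce` names, and the choice to return all outputs as a sorted list are assumptions of this sketch, not a description of any particular toolkit.

```python
def step(state, symbol):
    """Return the (output, next_state) moves available from `state` on `symbol`."""
    if state == 1:
        # An n was just rewritten as m, so the next input symbol must be p.
        return [("p", 0)] if symbol == "p" else []
    if state == 2 and symbol == "p":
        # An n was just copied unchanged, so p may not follow.
        return []
    if symbol == "n":
        # Non-deterministic choice: rewrite (then require p) or copy (then forbid p).
        return [("m", 1), ("n", 2)]
    return [(symbol, 0)]  # every other symbol is copied to the output


FINAL_STATES = {0, 2}  # state 1 is not final: a rewritten n still awaits its p


def transduce(text):
    """Return every output string the transducer relates to `text`."""
    configs = [(0, "")]  # (current state, output produced so far)
    for symbol in text:
        configs = [
            (nxt, out + emitted)
            for state, out in configs
            for emitted, nxt in step(state, symbol)
        ]
    return sorted({out for state, out in configs if state in FINAL_STATES})


print(transduce("inpractical"))  # ['impractical']
print(transduce("intact"))       # ['intact']
```

The wrong guess at each n simply dies out one symbol later, which is how a finite state machine handles a right-hand context without looking ahead.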
Kimmo Koskenniemi (1983)
Worked on the morphology of Finnish and devised a system of finite state transducers.
Devised a computational framework for executing collections of finite state transducers in parallel.
Koskenniemi’s Model
[Figure: FST1, FST2, FST3, … FSTn running in parallel between the SURFACE STRING and the LEXICAL STRING]
The interpreter executes round-robin, keeping the FSTs in lock-step before moving the head.
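The lock-step idea can be sketched by treating each rule as a small automaton over lexical:surface symbol pairs and letting every automaton read the current pair before the head advances. The sketch only checks a given alignment of two equal-length strings; a real interpreter also has to search for the matching string. The rule format, the names `make_rule`, `FEASIBLE_PAIRS` and `accepts`, and the mixing of the n to m rule with the word-final [B] -> [P] rule in one rule set are all assumptions made for illustration.

```python
def make_rule(lex, surf, right):
    """Constraint: the pair lex:surf occurs if and only if the next pair
    has `right` on the lexical side. Returns a step function that maps
    (state, pair) to the next state, or None if the rule is violated."""

    def classify(pair):
        if pair == (lex, surf):
            return 1  # the context is now required
        if pair == (lex, lex):
            return 2  # the context is now forbidden
        return 0

    def step(state, pair):
        if state == 1 and pair[0] != right:
            return None
        if state == 2 and pair[0] == right:
            return None
        return classify(pair)

    return step


FINAL_STATES = {0, 2}
FEASIBLE_PAIRS = {("b", "p"), ("n", "m")}  # non-identity pairs the rules license
RULES = [make_rule("b", "p", "#"), make_rule("n", "m", "p")]


def accepts(lexical, surface):
    """Check one lexical/surface alignment against all rules in parallel."""
    if len(lexical) != len(surface):
        return False
    states = [0] * len(RULES)
    for pair in zip(lexical, surface):
        if pair[0] != pair[1] and pair not in FEASIBLE_PAIRS:
            return False
        # Round-robin: every rule reads the same pair before the head moves on.
        for i, step in enumerate(RULES):
            states[i] = step(states[i], pair)
            if states[i] is None:
                return False
    return all(state in FINAL_STATES for state in states)


print(accepts("kelb#", "kelp#"))                # True
print(accepts("kelb#", "kelb#"))                # False
print(accepts("inpractical#", "impractical#"))  # True
```

Because no rule ever sees the output of another, there is no ordering to get wrong; a pair survives only if every rule accepts it.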
Martin Kay and Ron Kaplan (1981)
Kay and Kaplan (both at Xerox PARC) were very interested in the computational issues underlying morphological processing. In particular, they studied the problems of
– How to combine FSTs in parallel (computing the intersection of regular relations)
– How to combine FSTs in series (computing the composition of FSTs).
Restrictions on rules have pleasant consequences.
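The difference between the two ways of combining can be shown on finite relations, written here as Python sets of (input, output) pairs. This is only an analogy: real FSTs encode relations that are usually infinite, and combining them means constructing a new machine rather than enumerating pairs. The example relations, including the invented p to f mapping, are assumptions.

```python
def compose(first, second):
    """Series: the output side of `first` is fed to the input side of `second`."""
    return {(a, c) for a, b in first for b2, c in second if b == b2}


def intersect(first, second):
    """Parallel: keep only the pairs that both relations allow."""
    return first & second


# Series combination of two rewriting steps (the second step is invented).
devoicing = {("kelb#", "kelp#"), ("kelp#", "kelp#")}
invented_step = {("kelp#", "kelf#")}
print(sorted(compose(devoicing, invented_step)))
# [('kelb#', 'kelf#'), ('kelp#', 'kelf#')]

# Parallel combination of two constraints over the same candidate pairs.
allowed_by_devoicing = {
    ("kelb#", "kelp#"),
    ("inpractical#", "impractical#"),
    ("inpractical#", "inpractical#"),
}
allowed_by_n_to_m = {
    ("kelb#", "kelp#"),
    ("kelb#", "kelb#"),
    ("inpractical#", "impractical#"),
}
print(sorted(intersect(allowed_by_devoicing, allowed_by_n_to_m)))
# [('inpractical#', 'impractical#'), ('kelb#', 'kelp#')]
```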
Restrictions on Rules
With the restriction that a rule shall not apply to its own output, Kaplan and Kay showed that the result of combining the corresponding relations under the operations of intersection, composition and union remains within a closed subclass of those computable by FSTs.
They then spent many years designing and implementing a calculus for describing and combining FSTs based upon regular expressions.
Summary
Generative Phonology
Chomsky: generative tradition, multilevel cascades of rules
Johnson: parallel rules
Kaplan/Kay: calculus; Xerox tools (xfst/twolc/lexc)
Koskenniemi: parallel rules; KIMMO, PC-Kimmo