Published by Austyn Haxby. Modified over 9 years ago.
1
Computational Morphology
2
Morphology (S. Ananiadou)
Outline
What is morphology?
– Word structure
– Types of morphological operation
– Levels of affixation
Computational approaches to morphology
– Finite-state transducers
– Two-level morphology
– Koskenniemi’s rule formalism
3
References
L. Bauer (1988) Introducing Linguistic Morphology. EUP.
A. Spencer (1991) Morphological Theory. Blackwell.
D. Jurafsky & J. Martin (2000) Speech and Language Processing, Chapter 3.
K. Koskenniemi & K. Church (1988) “Complexity, two-level morphology, and Finnish”, in COLING-88, Budapest, pp. 335-339.
S. Ananiadou & J. McNaught (1986) A Review of Two-level Morphology. Eurotra Research Paper, September 1986.
4
What is morphology?
Morphology is the study of the way words are built up from smaller meaning-bearing units, called morphemes.
– ‘antiintellectualism’: anti- + intellect + -al + -ism
Free and bound morphemes
– intellect (free)
– anti-, -al, -ism (bound)
Stems and affixes
– Complex words contain a central morpheme (the stem), which contributes the basic meaning, and a collection of other morphemes (the affixes), which modify this meaning in different ways.
5
‘disagreements’
– agree (stem)
– dis- (prefix), -ment (suffix), -s (suffix)
English rarely stacks more than 4-5 affixes; Turkish, an agglutinative language, can stack 10.
There are two broad classes of ways to form words from morphemes: inflection and derivation.
– Inflection: the combination of a word stem with a grammatical morpheme, resulting in a word of the same class
   cat + -s -> cats; play + -ed -> played
– Derivation: the combination of a word stem with a grammatical morpheme, resulting in a word of a different class
   agree + -ment -> agreement
6
English Inflectional Morphology
English nouns have two kinds of inflection: plural and possessive
– plural: cat -> cats, ibis -> ibises, finch -> finches, box -> boxes
– possessive: llama’s, children’s, llamas’
English verbal inflection is more complicated
– main verbs (eat, sleep)
– modal verbs (can, will, should)
– primary verbs (be, have, do) (see Quirk et al., Grammar of English Language)
– regular verbs (walk, walks, walking, walked)
– irregular verbs (eat, eats, eating, ate, eaten)
7
Derivational Morphology
Derivation typically changes the syntactic category, e.g. nominalization: computerize -> computerization

Suffix   Base (noun/verb/adjective)   Derived word
-ation   computerize (V)              computerization (N)
-ee      appoint (V)                  appointee (N)
-er      kill (V)                     killer (N)
-ness    fuzzy (A)                    fuzziness (N)
-al      computation (N)              computational (A)
-able    like (V)                     likeable (A)
-less    clue (N)                     clueless (A)
8
Derivation is less productive than inflection.
Affixes attach to stems and to each other according to certain constraints.
Level Ordering in Derivation
In English we distinguish two types of affixation:
– class I affixation (+)
– class II affixation (#)
– class I affixation occurs before class II
I -> ion, ity, ate, ive, ic ...
II -> y, ly, like, ful, ness, less, hood ...
Examples (the digit after each suffix marks its class; * marks an ill-formed word):
danger-ous1-ness2
*fear-less2-ity1
*tender-ness2-ous1
9
Members of the same class may appear in any order with respect to each other:
fear-less-ness
tender-ness-less
Ordering hypothesis: morphological processes (the morphotactics) apply in the following order:
1. Class I affixation
2. Class II affixation
3. Inflection
4. Compounding
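The level-ordering constraint can be sketched as a simple check over a sequence of suffixes. This is only an illustration: the suffix sets come from the lists and examples on the slides (with -ous treated as class I, as in danger-ous1-ness2), and the function name and data layout are assumptions made for this sketch.

```python
# Suffix classes as listed on the slides; -ous is taken from the example
# danger-ous1-ness2. The names below are illustrative, not from the source.
CLASS_I = {"ion", "ity", "ate", "ive", "ic", "ous"}
CLASS_II = {"y", "ly", "like", "ful", "ness", "less", "hood"}


def respects_level_ordering(suffixes):
    """Return True if no class I suffix is attached after a class II suffix."""
    seen_class_ii = False
    for suffix in suffixes:
        if suffix in CLASS_II:
            seen_class_ii = True
        elif suffix in CLASS_I and seen_class_ii:
            return False
    return True


print(respects_level_ordering(["ous", "ness"]))   # danger-ous-ness
print(respects_level_ordering(["less", "ity"]))   # *fear-less-ity
print(respects_level_ordering(["less", "ness"]))  # fear-less-ness
```

The check only looks at class membership, so two suffixes of the same class are accepted in either order, matching fear-less-ness and tender-ness-less.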
10
Finite State Morphological Parsing
Take an input like ‘cats’ and produce an output like ‘cat +N +PL’ (the stem plus its morphological features).
To build a morphological parser we need:
– a lexicon
– morphotactics
– orthographic rules (spelling rules), which model the changes that occur when two morphemes combine, e.g. city + -s -> cities
Topics:
– how to use FSAs to model morphotactic information
– FSTs as a way of modelling morphological features in the lexicon
– how to use FSTs to model orthographic rules
11
Lexicon and Morphotactics
A lexicon is a repository of words.
Since we cannot list every word in the language, computational lexicons are structured as a list of stems and affixes together with a representation of the morphotactics.
One way to model morphotactics is the finite-state automaton.
[Figure: FSA with states q0, q1, q2 and arcs labelled reg-noun, plural, irreg-pl-noun, irreg-sg-noun]
12
Noun sub-lexicons:
reg-noun    irreg-pl-noun   irreg-sg-noun   plural
fox         geese           goose           -s
cat         sheep           sheep
dog         mice            mouse
aardvark

Verb sub-lexicons:
reg-verb-stem   irreg-verb-stem   irreg-past-verb   past   past-part   pres-part   3sg
walk            cut               caught            -ed    -ed         -ing        -s
fry             speak             ate
talk            sing              eaten
impeach         sang              spoken

English derivational morphology is more complex than inflectional morphology, so the automata that model it are correspondingly more complex.
13
Morphotactics for English adjectives
– big, bigger, biggest
– cool, cooler, coolest, coolly
– red, redder, reddest
– clear, clearer, clearest, clearly, unclear, unclearly
– happy, happier, happiest, happily
– unhappy, unhappier, unhappiest, unhappily
– real, unreal, really
We need to set up classes of roots and specify which can occur with which affixes:
– Adj-root1 includes adjectives that can occur with un- and -ly (clear, happy, real)
– Adj-root2 includes adjectives that cannot (big, cool, red)
14
An FSA for a fragment of English adjective morphology
[Figure: FSA with states q0 to q5 and arcs labelled un-, adj-root1 (twice), adj-root2, “-er, -est” and “-er, -ly, -est”]
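A minimal sketch of such an automaton, working on sequences of morphemes rather than letters. The exact arcs of the slide's figure are not recoverable from the text, so the transition table below is an assumption that merely respects the two root classes: Adj-root1 may take un- and -er, -ly, -est, while Adj-root2 takes only -er and -est. State names and function names are illustrative.

```python
ADJ_ROOT1 = {"clear", "happy", "real"}
ADJ_ROOT2 = {"big", "cool", "red"}


def arc_label(morpheme):
    """Map a concrete root to the class label used on the FSA arcs."""
    if morpheme in ADJ_ROOT1:
        return "adj-root1"
    if morpheme in ADJ_ROOT2:
        return "adj-root2"
    return morpheme  # affixes such as un-, -er, -ly label arcs directly


# Assumed topology (not copied from the figure).
TRANSITIONS = {
    ("q0", "un-"): "q1",
    ("q0", "adj-root1"): "q2",
    ("q1", "adj-root1"): "q2",
    ("q0", "adj-root2"): "q4",
    ("q2", "-er"): "q5",
    ("q2", "-est"): "q5",
    ("q2", "-ly"): "q5",
    ("q4", "-er"): "q5",
    ("q4", "-est"): "q5",
}
FINAL_STATES = {"q2", "q4", "q5"}


def accepts(morphemes):
    """Return True if the morpheme sequence is licensed by the FSA."""
    state = "q0"
    for morpheme in morphemes:
        state = TRANSITIONS.get((state, arc_label(morpheme)))
        if state is None:
            return False
    return state in FINAL_STATES


print(accepts(["un-", "happy", "-ly"]))  # unhappily
print(accepts(["un-", "big"]))           # *unbig
print(accepts(["cool", "-ly"]))          # rejected: cool is in Adj-root2
```

Note that the last call rejects coolly even though it appears in the word list, because cool is assigned to Adj-root2; the two-class split is only an approximation of the data.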
15
We can use FSAs to solve the problem of morphological recognition: determining whether an input string of letters makes up a legitimate English word.
– We do this by taking the morphotactic FSAs and plugging each sub-lexicon into them.
– That is, we expand each arc (e.g. the reg-noun-stem arc) with all the morphemes that make up the corresponding class (the set of reg-noun-stems).
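The expansion step can be sketched as follows: each class arc is replaced by letter-by-letter paths for every word in its sub-lexicon, giving a letter-level automaton that can be run directly on an input string. The word lists are the noun sub-lexicons from the slides; the construction itself (a trie-shaped automaton, with integer state numbers) is an assumption made for illustration.

```python
REG_NOUN = ["fox", "cat", "dog", "aardvark"]
IRREG_SG_NOUN = ["goose", "sheep", "mouse"]
IRREG_PL_NOUN = ["geese", "sheep", "mice"]


def build_fsa():
    """Expand the class arcs into letter arcs; state 0 is the start state."""
    transitions = {}  # (state, letter) -> state
    final = set()

    def add_word(word):
        state = 0
        for letter in word:
            key = (state, letter)
            if key not in transitions:
                # Each new arc leads to a fresh state number.
                transitions[key] = len(transitions) + 1
            state = transitions[key]
        final.add(state)

    for word in IRREG_SG_NOUN + IRREG_PL_NOUN:
        add_word(word)
    for stem in REG_NOUN:
        add_word(stem)        # reg-noun arc on its own (singular)
        add_word(stem + "s")  # reg-noun arc followed by the plural arc
    return transitions, final


TRANSITIONS, FINAL = build_fsa()


def recognize(word):
    """Return True if the letter string is accepted by the expanded FSA."""
    state = 0
    for letter in word:
        state = TRANSITIONS.get((state, letter))
        if state is None:
            return False
    return state in FINAL


for candidate in ["cats", "geese", "gooses", "foxs", "foxes"]:
    print(candidate, recognize(candidate))
```

Because no orthographic rules are modelled here, the recognizer accepts foxs and rejects foxes; handling that is the job of the spelling rules.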
16
Morphological parsing with FSTs
Given the input cats, we want the output cat +N +PL, telling us that cats is the plural of the noun cat.
We do this via two-level morphology (TLM):
– TLM represents a word as a correspondence between a lexical level, which is a simple concatenation of the morphemes making up the word, and a surface level, which is the actual spelling of the final word.
– Morphological parsing is implemented by building mapping rules that map letter sequences like cats on the surface level into sequences of morphemes and features like cat +N +PL on the lexical level.
– The automaton used for this mapping is the finite-state transducer (FST).
17
FST
An FST maps between two sets of symbols via a finite automaton.
We visualize an FST as a two-tape automaton that recognizes or generates pairs of strings.
An FST defines a relation between sets of strings; it can be seen as a machine that reads one string and generates another.
lexical tape: c a t +N +PL
surface tape: c a t s
18
An FST accepts a language over pairs of symbols, as in:
Σ = { a:a, b:b, !:!, a:!, a:ε, ε:! }
For TLM we view an FST as having two tapes: the upper or lexical tape is composed of characters from the left side of the a:b pairs; the lower or surface tape is composed of characters from the right side of the a:b pairs.
Dictionary and text each consist of a sequence of items:
– items of the dictionary are expressed in an alphabet consisting of {a…z}, 0 (the empty character), + (the morpheme boundary character), and a set of archiphonemes, e.g. S for {s, z}
– items of text are expressed in a subset of this alphabet, {a…z, 0}
19
We can build an FST morphological parser out of a morphotactic FSA and lexica by adding an extra lexical tape and the appropriate morphological features.
[Figure: FST with states q0 to q7. Arcs as far as they can be read from the labels:
q0 --Reg-noun-stem--> q1 --(+N:ε)--> q4 --(+PL:^s#)--> q7
q0 --Irreg-sg-noun-f--> q2 --(+N:ε)--> q5 --(+SG:#)--> q7
q0 --Irreg-pl-noun-f--> q3 --(+N:ε)--> q6 --(+PL:#)--> q7]
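A small sketch of such a transducer, run in the parsing direction (surface to lexical). Every arc carries a lexical:surface pair, with the empty string standing for ε. This is a simplification under stated assumptions: it maps +PL directly to surface s instead of going through the intermediate ^s# form, it gives every word its own path instead of sharing states, and the irregular pairs for geese (o:e) are written by hand. All names are illustrative.

```python
import itertools
from collections import defaultdict

ARCS = defaultdict(list)  # state -> [(lexical symbol, surface string, next state)]
START, FINAL = "start", "end"
_fresh_ids = itertools.count()


def add_path(pairs):
    """Add one path of lexical:surface pairs from START to FINAL."""
    state = START
    for index, (lexical, surface) in enumerate(pairs):
        is_last = index == len(pairs) - 1
        next_state = FINAL if is_last else next(_fresh_ids)
        ARCS[state].append((lexical, surface, next_state))
        state = next_state


def identity(word):
    """Letter pairs in which lexical and surface symbols are the same."""
    return [(letter, letter) for letter in word]


for stem in ["fox", "cat", "dog", "aardvark"]:
    add_path(identity(stem) + [("+N", ""), ("+SG", "")])
    add_path(identity(stem) + [("+N", ""), ("+PL", "s")])
add_path(identity("goose") + [("+N", ""), ("+SG", "")])
add_path([("g", "g"), ("o", "e"), ("o", "e"), ("s", "s"), ("e", "e"),
          ("+N", ""), ("+PL", "")])


def parse(surface):
    """Return every lexical string the transducer pairs with `surface`."""
    analyses = []
    stack = [(START, 0, [])]
    while stack:
        state, position, output = stack.pop()
        if state == FINAL:
            if position == len(surface):
                # Letters are joined directly; features get a leading space.
                analyses.append("".join(
                    sym if len(sym) == 1 else " " + sym for sym in output))
            continue
        for lexical, surf, next_state in ARCS[state]:
            if surface.startswith(surf, position):
                stack.append((next_state, position + len(surf), output + [lexical]))
    return analyses


print(parse("cats"))
print(parse("geese"))
print(parse("gooses"))
```

Running the same arcs with the roles of the two tapes swapped would give generation (lexical to surface) instead of parsing.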
20
Koskenniemi’s work
In this model, all the FSTs treating individual phenomena operate in parallel:
– so rule ordering and interactions between rules are not a necessary part of the morphological description
– all FSTs share the same two heads but otherwise operate completely independently
– the heads move at the same time
– for an overall correspondence between the lexical and surface strings, the two heads must have reached the ends of the two strings, and all the FSTs must be in a final state
– when all FSTs agree, a correspondence is reached
– if even one FST blocks while scanning the two strings, the proposed correspondence is rejected
21
[Figure: FST1, FST2, FST3, ..., FSTn operating in parallel between the two tapes]
surface tape (text):        ... t r i e s ...
lexical tape (dictionary):  ... t r Y + s ...

Example:
surface tape (text):        d o g 0 s
lexical tape (dictionary):  d o g + S
sequence of mappings (surface,lexical): d,d  o,o  g,g  0,+  s,S
The morpheme boundary + corresponds to nothing on the surface; the archiphoneme/grapheme S corresponds to surface s.
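The parallel arrangement can be sketched as a loop that feeds each lexical:surface pair to every rule automaton at once and rejects the correspondence as soon as one of them blocks. The three rule automata below are hypothetical toy rules invented for this sketch, not Koskenniemi's actual rules, and pairs are written (lexical, surface), the reverse of the order used in the mapping list above.

```python
def boundary_rule(state, pair):
    """Toy rule: + must surface as 0 and may not be the last symbol."""
    lexical, surface = pair
    if lexical == "+":
        return "after_boundary" if surface == "0" else None
    return "plain"


def archiphoneme_rule(state, pair):
    """Toy rule: the archiphoneme S must surface as s."""
    lexical, surface = pair
    if lexical == "S" and surface != "s":
        return None
    return "plain"


def letter_rule(state, pair):
    """Toy rule: ordinary lowercase letters correspond to themselves."""
    lexical, surface = pair
    if lexical.islower() and lexical != surface:
        return None
    return "plain"


# Each automaton: (start state, step function, set of final states).
AUTOMATA = [
    ("plain", boundary_rule, {"plain"}),
    ("plain", archiphoneme_rule, {"plain"}),
    ("plain", letter_rule, {"plain"}),
]


def accept_correspondence(pairs, automata=AUTOMATA):
    """Run all automata in lockstep over the pairs; all must end in a final state."""
    states = [start for start, _, _ in automata]
    for pair in pairs:
        for index, (_, step, _) in enumerate(automata):
            next_state = step(states[index], pair)
            if next_state is None:  # one FST blocks: reject
                return False
            states[index] = next_state
    return all(state in final for state, (_, _, final) in zip(states, automata))


dogs = [("d", "d"), ("o", "o"), ("g", "g"), ("+", "0"), ("S", "s")]
print(accept_correspondence(dogs))
print(accept_correspondence(dogs[:3] + [("+", "e"), ("S", "s")]))
```

No rule needs to know about the others: adding a new phenomenon means appending one more automaton to the list, which is the point of the parallel design.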
22
Koskenniemi’s rule formalism
The general form of a rule is: CP op LC --- RC
– CP = correspondence part: a concrete or abstract character pair whose occurrence is restricted by the rule
– op = an operator; there are four operators, giving four types of rule
– LC, RC = left context, right context
The Rules
Exclusion rule: a:b /<= LC --- RC
a may not be realised as b in the context LC --- RC; a:b is not allowed in the given context
23
Context restriction rule: a:b => LC --- RC
a may be realised as b only in the given context, and nowhere else; a:b is allowed only in the given context
Surface coercion rule: a:b <= LC --- RC
a must be realised as b in the given context; a:b is required in the given context
Composite rule: a:b <=> LC --- RC
this rule is a combination of context restriction and surface coercion; a lexical a must correspond to surface b in the given context, and this correspondence is licit only in that context; a:b is required in the given context and allowed nowhere else
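The four rule types can be compared with a small checker that tests a sequence of (lexical, surface) pairs against a single rule. The operator spellings =>, <=, <=> and /<= follow the usual two-level notation. The context is passed in as a plain Python predicate rather than as sequences of pairs, and the example context (surface x on the left, surface s on the right) is a hypothetical simplification chosen for this sketch.

```python
def satisfies(pairs, a, b, op, in_context):
    """Check the pairs against the rule  a:b op LC --- RC.

    `in_context(pairs, i)` says whether position i sits in the context LC --- RC.
    """
    for i, (lexical, surface) in enumerate(pairs):
        context = in_context(pairs, i)
        is_ab = (lexical, surface) == (a, b)
        # Context restriction: a:b may occur only in the context.
        if op in ("=>", "<=>") and is_ab and not context:
            return False
        # Surface coercion: in the context, lexical a must surface as b.
        if op in ("<=", "<=>") and lexical == a and context and surface != b:
            return False
        # Exclusion: a:b may not occur in the context.
        if op == "/<=" and is_ab and context:
            return False
    return True


def after_x_before_s(pairs, i):
    """Hypothetical context: surface x to the left, surface s to the right."""
    left_ok = i > 0 and pairs[i - 1][1] == "x"
    right_ok = i + 1 < len(pairs) and pairs[i + 1][1] == "s"
    return left_ok and right_ok


foxes = [("f", "f"), ("o", "o"), ("x", "x"), ("+", "e"), ("s", "s")]
foxs = [("f", "f"), ("o", "o"), ("x", "x"), ("+", "0"), ("s", "s")]
cates = [("c", "c"), ("a", "a"), ("t", "t"), ("+", "e"), ("s", "s")]

for name, pairs in [("foxes", foxes), ("foxs", foxs), ("cates", cates)]:
    results = {op: satisfies(pairs, "+", "e", op, after_x_before_s)
               for op in ("=>", "<=", "<=>", "/<=")}
    print(name, results)
```

The three inputs separate the operators: foxs violates only surface coercion, cates violates only context restriction, and foxes violates only the exclusion reading of the same rule.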
24
Example of Koskenniemi’s rule formalism
The rule treats epenthesis in English.
– Epenthesis: a morpheme boundary + is realised as an ‘e’ on the surface when it follows ‘ch’, ‘sh’, ‘s’, ‘x’, ‘z’ or ‘y/i’ and occurs before an ‘s’. Otherwise the lexical character + corresponds to 0 on the surface (the empty string).
– Examples: foxes, churches, spies (+:e)
– Rule: +/e <=> { { c | s (h) } | S | y/i } --- s
  where CP = +/e, op = <=>, LC = { { c | s (h) } | S | y/i }, RC = s
– CP, LC and RC consist of sequences of pairs, the first member of each pair drawn from the lexical alphabet and the second from the surface alphabet.
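The effect of the epenthesis rule in the generation direction can be approximated with two regular-expression substitutions. This is a rough sketch and not a two-level implementation: it rewrites the lexical string in sequence instead of checking pairs in parallel, and the y-to-i step (applied only after a consonant) is an assumption added so that spy+s comes out as spies; the slide only lists y/i as a left context.

```python
import re


def to_surface(lexical):
    """Turn a lexical string such as 'fox+s' into its surface spelling."""
    # Assumed companion rule: y is realised as i before +s, after a consonant.
    surface = re.sub(r"(?<=[^aeiou])y(?=\+s)", "i", lexical)
    # Epenthesis: + is realised as e after ch, sh, s, x, z or i, before s.
    surface = re.sub(r"(ch|sh|s|x|z|i)\+(?=s)", r"\1e", surface)
    # Otherwise + corresponds to 0, i.e. nothing on the surface.
    return surface.replace("+", "")


for word in ["fox+s", "church+s", "spy+s", "cat+s", "boy+s"]:
    print(word, "->", to_surface(word))
```

With these substitutions the three examples on the slide come out as foxes, churches and spies, while cat+s and boy+s simply lose the boundary.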