
CSA3050: NLP Algorithms
Finite State Transducers for Morphological Parsing
Advanced Topics in NLP, October 2006



Acknowledgement
This lecture is largely based on material from Jurafsky & Martin, chapter 3.

Résumé
FSAs are equivalent to regular languages. FSTs are equivalent to regular relations (relations between pairs of strings drawn from regular languages). FSTs are like FSAs, but each arc carries a complex label: a pair of symbols, one read on the input side and one written on the output side. We can use FSTs to transduce between the surface and lexical levels.
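
To make the "complex labels" concrete, here is a minimal Python sketch of an FST as an FSA whose arcs carry input:output pairs. The state names (q0 to q5), the toy relation between cat/cats and cat+N+Sg/cat+N+Pl, and the helpers transduce and invert are illustrative assumptions, not part of the lecture; feature symbols are written without spaces for simplicity.

```python
from typing import Dict, List, Set, Tuple

# arcs[state] = list of (input, output, next_state); "" is the empty string.
Arcs = Dict[str, List[Tuple[str, str, str]]]

# Hypothetical toy FST relating surface cat/cats to lexical cat+N+Sg/cat+N+Pl.
ARCS: Arcs = {
    "q0": [("c", "c", "q1")],
    "q1": [("a", "a", "q2")],
    "q2": [("t", "t", "q3")],
    "q3": [("", "+N", "q4")],                      # insert +N, read nothing
    "q4": [("", "+Sg", "q5"), ("s", "+Pl", "q5")],  # no suffix = Sg, s = Pl
}
FINALS = {"q5"}


def transduce(arcs: Arcs, start: str, finals: Set[str], word: str) -> Set[str]:
    """Return every output string the FST pairs with `word` (a relation, so a set)."""
    results: Set[str] = set()

    def walk(state: str, pos: int, out: str) -> None:
        if pos == len(word) and state in finals:
            results.add(out)
        for inp, outp, nxt in arcs.get(state, []):
            # An empty input label always matches, so it consumes nothing.
            if word.startswith(inp, pos):
                walk(nxt, pos + len(inp), out + outp)

    walk(start, 0, "")
    return results


def invert(arcs: Arcs) -> Arcs:
    """Swap input and output on every arc to run the FST in the other direction."""
    return {s: [(o, i, n) for i, o, n in moves] for s, moves in arcs.items()}


print(transduce(ARCS, "q0", FINALS, "cats"))              # {'cat+N+Pl'}
print(transduce(invert(ARCS), "q0", FINALS, "cat+N+Pl"))  # {'cats'}
```

The same arcs serve both directions: invert simply swaps each pair. The sketch assumes there are no cycles of empty-input arcs, which would make the search loop forever.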

Morphological Parsing
Given the input cats, we’d like to output cat +N +Pl, telling us that cat is a plural noun.
Given the Spanish input bebo, we’d like to output beber +V +PInd +1P +Sg, telling us that bebo is the present indicative first person singular form of the Spanish verb beber, ‘to drink’.

Two-Level Paradigm (from Jurafsky & Martin)

English Plural
surface   lexical
cat       cat +N +Sg
cats      cat +N +Pl
foxes     fox +N +Pl
mice      mouse +N +Pl
sheep     sheep +N +Pl
sheep     sheep +N +Sg

Morphological Analyser
To build a morphological analyser we need:
– lexicon: the list of stems and affixes, together with basic information about them;
– morphotactics: the model of morpheme ordering (e.g. the English plural morpheme follows the noun rather than preceding it);
– orthographic rules: spelling rules that model the changes that occur in a word, usually when two morphemes combine (e.g. fly+s = flies).

Lexicon & Morphotactics
Typically, the list of word parts (the lexicon) and the model of their ordering can be combined into a single FSA that recognises all the valid word forms. For this to be possible, the word parts must first be classified into sublexicons. The FSA then defines the morphotactics (ordering constraints).

Sublexicons to Classify the List of Word Parts
reg-noun: cat, fox
irreg-pl-noun: mice, sheep, geese
irreg-sg-noun: mouse, sheep, goose
plural: -s

An FSA Expresses the Morphotactics (Ordering Model)
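
As a rough sketch of such an FSA, in the style of the Jurafsky & Martin figure, the arcs below are labelled with whole sublexicons rather than single letters. The state names and the accepts helper are assumptions made for illustration.

```python
# Sublexicons from the previous slide (plural -s written as "s").
REG_NOUN = {"cat", "fox"}
IRREG_PL_NOUN = {"mice", "sheep", "geese"}
IRREG_SG_NOUN = {"mouse", "sheep", "goose"}
PLURAL = {"s"}

# Hypothetical states: q0 start, q1 after a regular stem, q2 complete word.
ARCS = {
    "q0": [(REG_NOUN, "q1"), (IRREG_PL_NOUN, "q2"), (IRREG_SG_NOUN, "q2")],
    "q1": [(PLURAL, "q2")],
}
FINALS = {"q1", "q2"}  # a bare regular stem (cat) is also a valid word


def accepts(word: str, state: str = "q0") -> bool:
    """True if `word` can be split into morphemes along a path to a final state."""
    if word == "" and state in FINALS:
        return True
    for sublexicon, nxt in ARCS.get(state, []):
        for morpheme in sublexicon:
            if word.startswith(morpheme) and accepts(word[len(morpheme):], nxt):
                return True
    return False
```

Note that the automaton accepts foxs but rejects foxes: the morphotactics alone knows nothing about spelling changes, which is exactly why orthographic rules are needed.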

Towards the Analyser
We can use lexc or xfst to build such an FSA (see lex1.lexc). To augment it so that it produces an analysis, we must create a transducer T_num that maps between the lexical level and an "intermediate" level, which is needed to handle the spelling rules of English.

Three Levels of Analysis

1. T_num: Noun Number Inflection
Notation: multi-character symbols (such as +N, +Sg and +Pl) stand for morphological features; ^ marks a morpheme boundary; # marks a word boundary.
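
A minimal sketch of what T_num does, assuming the mapping used in the Jurafsky & Martin transducer as we understand it: +N maps to nothing, +Sg to the word boundary #, and +Pl to ^s# for regular nouns but only # for irregular ones. The function t_num and its regular flag are hypothetical; the stem is passed through unchanged, since irregular stem changes are left to the lexicon.

```python
from typing import List

# Assumed output for each feature symbol, per noun class.
REGULAR = {"+N": "", "+Sg": "#", "+Pl": "^s#"}
IRREGULAR = {"+N": "", "+Sg": "#", "+Pl": "#"}


def t_num(stem: str, features: List[str], regular: bool) -> str:
    """Map a lexical form such as cat +N +Pl to an intermediate form such as cat^s#."""
    table = REGULAR if regular else IRREGULAR
    return stem + "".join(table[f] for f in features)


print(t_num("cat", ["+N", "+Pl"], regular=True))  # cat^s#
```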

Towards the Analyser
We do this by first allowing the lexicon itself to have two levels. Since surface geese maps to lexical goose, the new lexical entry will be “g:g o:e o:e s:s e:e”, where each pair is lexical:surface (see lex2.lexc). We must also add the appropriate morphological labels (see lex3.lexc).
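
The pair notation can be read off mechanically: the left-hand symbols spell the lexical form and the right-hand symbols spell the surface form. This Python sketch assumes two conventions that are not on the slide: a bare symbol such as m abbreviates m:m, and 0 stands for the empty string, which lets an entry like mouse/mice be written with pairs too. The helper names parse_entry and sides are hypothetical.

```python
from typing import List, Tuple

EPSILON = "0"  # assumption: 0 marks the empty string on either side


def parse_entry(entry: str) -> List[Tuple[str, str]]:
    """Turn 'g:g o:e o:e s:s e:e' into [('g', 'g'), ('o', 'e'), ...]."""
    pairs = []
    for token in entry.split():
        if ":" in token:
            lexical, surface = token.split(":")
        else:
            lexical = surface = token  # assumed shorthand: x means x:x
        pairs.append((lexical, surface))
    return pairs


def sides(pairs: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Return the (lexical, surface) strings spelled by a pair sequence."""
    lexical = "".join(a for a, _ in pairs if a != EPSILON)
    surface = "".join(b for _, b in pairs if b != EPSILON)
    return lexical, surface


print(sides(parse_entry("g:g o:e o:e s:s e:e")))  # ('goose', 'geese')
```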

Intermediate Form to Surface
The reason we need an intermediate form is that funny things happen at morpheme boundaries, e.g.
cat^s → cats
fox^s → foxes
fly^s → flies
The rules that describe these changes are called orthographic rules or "spelling rules".

More English Spelling Rules
– consonant doubling: beg/begging
– y replacement: try/tries
– k insertion: panic/panicked
– e deletion: make/making
– e insertion: watch/watches
Each rule can be stated in more detail...
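
Three of these rules can be roughly approximated as regular-expression rewrites on intermediate forms, applied in sequence. This is a simplified, hypothetical sketch: the contexts (e.g. "consonant + y", "vowel + c") are our own approximations rather than the full rules, and consonant doubling is left out because it depends on stress (compare referring with offering), which the spelling does not record.

```python
import re

# Hypothetical, simplified rules over intermediate forms
# (^ = morpheme boundary, # = word boundary), applied in this order.
RULES = [
    # y replacement: consonant + y^ before s#  ->  ie   (try^s# -> tries#)
    (r"(?<=[^aeiou])y\^(?=s#)", "ie"),
    # e deletion: consonant + e^ before a vowel  ->  nothing   (make^ing# -> making#)
    (r"(?<=[^aeiou])e\^(?=[aeiou])", ""),
    # k insertion: vowel + c^ before e or i  ->  ck   (panic^ed# -> panicked#)
    (r"(?<=[aeiou]c)\^(?=[ei])", "k"),
    # clean-up: delete any boundary symbols that remain
    (r"[\^#]", ""),
]


def spell(intermediate: str) -> str:
    """Apply each rewrite rule in turn to an intermediate form."""
    for pattern, replacement in RULES:
        intermediate = re.sub(pattern, replacement, intermediate)
    return intermediate


print(spell("try^s#"), spell("make^ing#"), spell("panic^ed#"))  # tries making panicked
```

Applying the rules one after another like this is a cascade, a simplification of the parallel arrangement described later in the lecture.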

Spelling Rules
Chomsky & Halle (1968) introduced a special notation for phonological rewrite rules, which can also be used for spelling rules. A very similar notation is embodied in the "conditional replacement" rules of xfst:
E -> F || L _ R
which means: replace E with F when it appears between left context L and right context R.
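
The meaning of E -> F || L _ R can be approximated with regular-expression lookbehind and lookahead, which check the contexts without consuming them. In this sketch, conditional_replace is a hypothetical helper, E, F, L and R are treated as literal strings rather than xfst regular expressions, and matches are rewritten left to right by re.sub; real xfst replace rules have richer semantics than this.

```python
import re


def conditional_replace(text: str, e: str, f: str, left: str = "", right: str = "") -> str:
    """Replace literal `e` by `f` wherever it sits between `left` and `right`."""
    pattern = ""
    if left:
        pattern += "(?<=" + re.escape(left) + ")"  # left context L, not consumed
    pattern += re.escape(e)
    if right:
        pattern += "(?=" + re.escape(right) + ")"  # right context R, not consumed
    # A function replacement stops re.sub from interpreting backslashes in f.
    return re.sub(pattern, lambda _match: f, text)


print(conditional_replace("fox^s#", "^", "e", left="x", right="s#"))  # foxes#
```

Because the contexts are literal strings, the lookbehind has a fixed width, which Python's re module requires.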

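A Particular Spelling Rule
This rule does e-insertion:
^ -> e || x _ s#
That is, the morpheme boundary ^ is replaced by e when it is preceded by x and followed by s and the word boundary # (so fox^s# becomes foxes#).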
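
Applying the rule, and then deleting whatever boundary symbols remain, turns the intermediate forms from the "Intermediate Form to Surface" slide into surface forms. The names e_insertion and to_surface, and the final boundary-deletion step, are assumptions made for this Python sketch.

```python
import re


def e_insertion(intermediate: str) -> str:
    """^ -> e || x _ s#  (the rule on the slide)."""
    return re.sub(r"(?<=x)\^(?=s#)", "e", intermediate)


def to_surface(intermediate: str) -> str:
    """Apply e-insertion, then (assumption) delete any remaining ^ and #."""
    return re.sub(r"[\^#]", "", e_insertion(intermediate))


print(to_surface("fox^s#"), to_surface("cat^s#"))  # foxes cats
```

With only x in the left context, kiss^s# would surface as kisss; Jurafsky & Martin's version of the rule also allows s and z there, which in this sketch would mean widening the lookbehind to (?<=[xsz]).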

e Insertion over Three Levels
The rule corresponds to the mapping between the surface and intermediate levels.

e Insertion as an FST
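
The following Python sketch shows one way to build an e-insertion FST from intermediate to surface symbols, for the slide's rule with x as the only left context. The state names and transitions are our own and need not match the lecture's figure. The machine is nondeterministic: on reading x^ it guesses whether to write e, and a guess survives only if the rest of the input confirms it (s# must follow an inserted e, and must not follow a deleted ^). All other ^ and # symbols are mapped to the empty string.

```python
from typing import List, Set, Tuple

FINALS = {"q0", "q1", "q3", "q5"}  # q2 and q4 still owe "s#"


def moves(state: str, sym: str) -> List[Tuple[str, str]]:
    """(output, next_state) choices for reading intermediate symbol `sym`."""
    if state == "q2":                 # read x^ and wrote e: s must follow
        return [("s", "q4")] if sym == "s" else []
    if state == "q4":                 # read x^s after writing e: # must follow
        return [("", "q0")] if sym == "#" else []
    if state == "q5" and sym == "#":  # x^s without e may not end the word
        return []
    if state == "q3" and sym == "s":  # x^ without e, now s: watch for #
        return [("s", "q5")]
    if state == "q1" and sym == "^":  # x^ : guess whether e is inserted
        return [("e", "q2"), ("", "q3")]
    # Default behaviour shared by q0, q1, q3 and q5:
    if sym == "x":
        return [("x", "q1")]
    if sym in ("^", "#"):
        return [("", "q0")]           # other boundaries surface as nothing
    return [(sym, "q0")]              # ordinary letters are copied


def transduce(intermediate: str) -> Set[str]:
    """All surface strings the FST pairs with an intermediate string."""
    outputs: Set[str] = set()

    def walk(state: str, pos: int, out: str) -> None:
        if pos == len(intermediate):
            if state in FINALS:
                outputs.add(out)
            return
        for written, nxt in moves(state, intermediate[pos]):
            walk(nxt, pos + 1, out + written)

    walk("q0", 0, "")
    return outputs


print(transduce("fox^s#"))  # {'foxes'}
```

The search explores both guesses at every x^; the wrong one dies at a missing transition, so for these inputs exactly one surface form comes out.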

Incorporating Spelling Rules
Spelling rules, each corresponding to an FST, can be run in parallel provided that they are "aligned". The set of spelling rules is positioned between the surface level and the intermediate level. Parallel execution of FSTs can be carried out:
– by simulation: in this case the FSTs must first be aligned;
– by first constructing a single FST corresponding to their intersection.
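
The two ways of running rules in parallel can be sketched on aligned pair strings, where each position holds one intermediate:surface pair and 0 marks the empty string. The two rules below are hypothetical: together they say that ^:e may occur only after x:x and only before s:s. run_parallel simulates the rule automata in lockstep, while intersect builds the single product automaton; all names here are illustrative.

```python
from typing import Callable, NamedTuple, Optional, Set


class Rule(NamedTuple):
    start: str
    finals: Set[str]
    step: Callable[[str, str], Optional[str]]  # None means "pair not allowed here"


def left_step(state: str, pair: str) -> Optional[str]:
    # Hypothetical rule A: "^:e" may only follow "x:x".
    if pair == "^:e":
        return "a0" if state == "a1" else None
    return "a1" if pair == "x:x" else "a0"


def right_step(state: str, pair: str) -> Optional[str]:
    # Hypothetical rule B: "^:e" must be followed by "s:s".
    if state == "b1":
        return "b0" if pair == "s:s" else None
    return "b1" if pair == "^:e" else "b0"


RULES = [Rule("a0", {"a0", "a1"}, left_step), Rule("b0", {"b0"}, right_step)]


def run_parallel(rules, pairs) -> bool:
    """Simulation: step every rule automaton in lockstep over the same pairs."""
    states = [r.start for r in rules]
    for pair in pairs:
        states = [r.step(s, pair) for r, s in zip(rules, states)]
        if None in states:
            return False
    return all(s in r.finals for r, s in zip(rules, states))


def intersect(rules, alphabet):
    """Intersection: build one product automaton over tuples of rule states."""
    start = tuple(r.start for r in rules)
    table, seen, todo = {}, {start}, [start]
    while todo:
        state = todo.pop()
        for pair in alphabet:
            nxt = tuple(r.step(s, pair) for r, s in zip(rules, state))
            if None in nxt:
                continue
            table[(state, pair)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    finals = {q for q in seen if all(s in r.finals for r, s in zip(rules, q))}
    return start, table, finals


def run_single(fsa, pairs) -> bool:
    start, table, finals = fsa
    state = start
    for pair in pairs:
        state = table.get((state, pair))
        if state is None:
            return False
    return state in finals


FOXES = ["f:f", "o:o", "x:x", "^:e", "s:s", "#:0"]
CATS = ["c:c", "a:a", "t:t", "^:0", "s:s", "#:0"]
BAD_CATES = ["c:c", "a:a", "t:t", "^:e", "s:s", "#:0"]
BAD_FOXE = ["f:f", "o:o", "x:x", "^:e", "#:0"]
EXAMPLES = [FOXES, CATS, BAD_CATES, BAD_FOXE]
ALPHABET = sorted({p for ex in EXAMPLES for p in ex})
INTERSECTION = intersect(RULES, ALPHABET)
```

Both methods accept FOXES and CATS and reject the other two alignments. These rules only restrict where ^:e may occur; they do not force it, so an alignment of fox^s# with ^:0 would also pass. Two-level formalisms add a second kind of rule for that obligatory direction.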

Putting It All Together
Execution of the FST_i takes place in parallel.
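
To tie the levels together, this Python sketch generates surface forms from lexical ones (lexicon, then T_num, then the e-insertion rule) and parses by generate-and-test over a toy lexicon. Two simplifications are assumptions, not the lecture's method: the stages are chained as ordinary functions (a cascade) rather than run as FSTs in parallel, and analysis enumerates the lexicon instead of running a transducer in reverse. The lexicon contents and function names are hypothetical.

```python
import re

# Hypothetical toy lexicon: stem -> (noun class, plural stem for irregulars)
LEXICON = {
    "cat": ("reg", None),
    "fox": ("reg", None),
    "goose": ("irreg", "geese"),
    "mouse": ("irreg", "mice"),
    "sheep": ("irreg", "sheep"),
}


def lexical_to_intermediate(stem: str, number: str) -> str:
    """Lexicon plus T_num: e.g. fox +N +Pl -> fox^s#, goose +N +Pl -> geese#."""
    noun_class, irregular_plural = LEXICON[stem]
    if noun_class == "reg":
        return stem + ("^s#" if number == "+Pl" else "#")
    return (irregular_plural if number == "+Pl" else stem) + "#"


def intermediate_to_surface(form: str) -> str:
    """The slide's e-insertion rule, then deletion of leftover boundaries."""
    form = re.sub(r"(?<=x)\^(?=s#)", "e", form)
    return re.sub(r"[\^#]", "", form)


def generate(stem: str, number: str) -> str:
    return intermediate_to_surface(lexical_to_intermediate(stem, number))


def analyse(surface: str) -> set:
    """Every lexical form in the toy lexicon that generates `surface`."""
    return {
        f"{stem} +N {number}"
        for stem in LEXICON
        for number in ("+Sg", "+Pl")
        if generate(stem, number) == surface
    }


print(analyse("foxes"))  # {'fox +N +Pl'}
```

The sheep example returns two analyses, matching the ambiguity in the English plural table.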

Kaplan and Kay: The Xerox View
The FST_i are aligned but separate.
The FST_i are intersected together.

