1
CPSC 503 Computational Linguistics
(Finish FSA) Morphology and Finite State Transducers Lecture 3 Giuseppe Carenini 5/10/2019 CPSC503 Spring 2004
2
Today 1/21
Finish: FSA (D and ND recognition)
English Morphology
FSA and Morphology
Start: Finite State Transducers (FST) and Morphological Parsing
An FSA modelling English morphology can tell you whether a string is or is not an English word.
Morphological parsing goes from the spelling of a word to a morpheme and feature sequence.
3
Assumed knowledge
RegExp definition, basic ops, usage
FSA and table representation
Differences between D and ND FSA
Definitions of Formal Language and Generative Formalism
4
Recognition Example
5
FSA Recognition Algorithm (slide)
Assume the input is on a tape.
Start in the start state, pointing at the beginning of the tape.
Repeat until you run out of tape:
  Examine the current input symbol.
  Consult the table.
  If a transition is allowed, go to the new state and advance the tape pointer; else fail.
When the tape is exhausted: if you are in an accept state, accept the string; otherwise fail.
The state of the algorithm is a machine state plus a pointer into the input.
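The steps above can be sketched in Python as follows. This is only an illustrative sketch: the transition table is an assumed deterministic machine for the sheep language baa+! (chosen because later slides use the input baaa!), and the names TRANSITIONS, START_STATE, ACCEPT_STATES and d_recognize are hypothetical, not taken from the slides.

```python
# Assumed transition table for a deterministic FSA accepting baa+!
# A missing (state, symbol) entry means "no transition allowed".
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}
START_STATE = 0
ACCEPT_STATES = {4}


def d_recognize(tape, transitions=TRANSITIONS,
                start=START_STATE, accept=ACCEPT_STATES):
    """Return True if the deterministic FSA accepts the whole tape."""
    state = start   # current machine state
    index = 0       # tape pointer
    while index < len(tape):            # repeat until we run out of tape
        key = (state, tape[index])      # examine symbol, consult the table
        if key not in transitions:
            return False                # no transition allowed: fail
        state = transitions[key]        # go to the new state
        index += 1                      # advance the tape pointer
    return state in accept              # accept only in an accept state


if __name__ == "__main__":
    for word in ["baaa!", "baa!", "ba!", "baaa"]:
        print(word, d_recognize(word))
```

The pair (state, index) inside the loop is exactly the recognition state: a machine state plus a position on the tape.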
6
Do not confuse
Machine state: qi
Algorithm/recognition state: (qi, n), i.e. (machine state, position on the input tape)
What is the meaning of a recognition state? It is the state in which the machine is after seeing n symbols.
7
ND Recognition (cannot use the same!)
Assume the input is on a tape.
Start in the start state, pointing at the beginning of the tape.
Repeat until you run out of tape:
  Examine the current input symbol.
  Consult the table.
  If a transition is allowed, go to the new state and advance the tape pointer; else fail.
When the tape is exhausted: if you are in an accept state, accept the string; otherwise fail.
The state of the algorithm is a machine state plus a pointer into the input.
Problem: in an ND FSA the table may allow several transitions for the same state and symbol, so there is no single new state to go to.
8
Non-Deterministic Recognition
Key ideas
A transition can lead to multiple states.
The algorithm may need to explore all of them.
Save alternative states in an agenda.
Explore alternatives one at a time.
For non-deterministic machines: accept if there is a path through the machine that leads to a final state.
9
Recognition Example NDFSA
A transition can lead to multiple states.
Explore one alternative at a time.
Input tape: b a a a !
10
Recognition as Search
Search space for input baaa!
You can think of the process described so far as a search in the space of reachable recognition states (state-space search).
Do not confuse these with machine states: a recognition state comprises a machine state and a pointer to the input tape.
11
How is the agenda managed?
Stack -> LIFO -> depth-first
Queue -> FIFO -> breadth-first

Agenda (LIFO)   Agenda (FIFO)
a               a
bcde            bcde
fghcde          cdefgh
nghcde          defgh
ghcde           efghi
hcde            fghilm
ocde            ghilmn
accept          ….

Computer-science graduates
Depth-first: infinite loop
Breadth-first: never terminate, large amount of memory
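A minimal sketch of agenda-based ND recognition, assuming a non-deterministic machine for baa+! in which state 2 has two outgoing arcs on a, and assuming no epsilon transitions. The table ND_TRANSITIONS, the function nd_recognize and the strategy parameter are hypothetical names used for illustration only. The agenda holds recognition states (machine state, tape position); popping from the right gives LIFO (depth-first), popping from the left gives FIFO (breadth-first).

```python
from collections import deque

# Assumed ND transition table: (state, symbol) -> list of possible next states.
ND_TRANSITIONS = {
    (0, "b"): [1],
    (1, "a"): [2],
    (2, "a"): [2, 3],   # the non-deterministic choice
    (3, "!"): [4],
}


def nd_recognize(tape, strategy="depth", transitions=ND_TRANSITIONS,
                 start=0, accept=frozenset({4})):
    """Search the space of recognition states (machine state, tape index)."""
    agenda = deque([(start, 0)])
    while agenda:
        if strategy == "depth":
            state, index = agenda.pop()        # stack: LIFO
        else:
            state, index = agenda.popleft()    # queue: FIFO
        if index == len(tape):
            if state in accept:
                return True                    # one successful path is enough
            continue                           # dead end: try another alternative
        # Save every alternative successor on the agenda.
        for next_state in transitions.get((state, tape[index]), []):
            agenda.append((next_state, index + 1))
    return False                               # agenda exhausted: no path accepts


if __name__ == "__main__":
    for strategy in ("depth", "breadth"):
        print(strategy, nd_recognize("baaa!", strategy))
```

Because this assumed machine has no epsilon transitions, every step consumes a symbol and both strategies terminate; the looping and memory concerns on the slide arise for larger search spaces.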
12
End of Chp. 2
TO DO: read 2.3 Regular Languages
How to build the FSA for a RegExp
13
Summary
State Machines (no prob.)
  Finite State Automata (and Regular Expressions)
  Finite State Transducers
(English) Morphology
Logical formalisms (First-Order Logics)
Rule systems (and prob. version), e.g. (Prob.) Context-Free Grammars
Syntax
Pragmatics
Discourse and Dialogue
Semantics
AI planners
Finite-state methods are particularly useful in dealing with the lexicon (words). So we’ll switch to talking about some facts about words and then come back to computational methods.
14
English Morphology
Def.: the study of how words are formed from minimal meaning-bearing units (morphemes).
We can usefully divide morphemes into two classes:
  Stems: the core meaning-bearing units
  Affixes: bits and pieces that adhere to stems to change their meanings and grammatical functions
Examples: un-happy, unbelievably, unhappily
Exercise: each one come up with a word and say what the stem is.
15
Word Classes
For now, by word classes we have in mind familiar notions like nouns, verbs, adjectives and adverbs. We’ll go into the gory details in Ch 8.
Right now we’re concerned with word classes because the way that stems and affixes combine is based to a large degree on the word class of the stem.
16
English Morphology
We can also divide morphology up into two broad classes:
  Inflectional
  Derivational
17
Inflectional Morphology
Inflectional morphology concerns the combination of stems and affixes where the resulting word:
  Has the same word class as the original
  Serves a grammatical/semantic purpose different from the original
18
Nouns, Verbs and Adjectives (English)
Nouns are simple (not really)
  Markers for plural and possessive
Verbs are only slightly more complex
  Markers appropriate to the tense of the verb
Adjectives
  Markers for comparative and superlative
19
Regulars and Irregulars
It gets a little complicated by the fact that some words misbehave (refuse to follow the rules):
  Mouse/mice, goose/geese, ox/oxen
  Go/went, fly/flew
The terms regular and irregular will be used to refer to words that follow the rules and those that don’t.
20
Regular and Irregular Verbs
Regulars
  Walk, walks, walking, walked, walked
Irregulars
  Eat, eats, eating, ate, eaten
  Catch, catches, catching, caught, caught
  Cut, cuts, cutting, cut, cut
21
Derivational Morphology
Derivational morphology is the messy stuff that no one ever taught you.
Changes of word class
More complex than inflection for a number of reasons:
  Less productive: you cannot add “-able” to just any verb ( -ant V -> N only with V of Latin origin!)
  Many more exceptions than in inflectional morphology
Examples: worker, writer
22
Derivational Examples
Verb/Adj to Noun
-ation   computerize   computerization
-ee      appoint       appointee
-er      kill          killer
-ness    fuzzy         fuzziness
23
Derivational Examples
Noun/Verb to Adj
-al      computation   computational
-able    embrace       embraceable
-less    clue          clueless
-al can also be V -> N
24
Compute
Many paths are possible… Start with compute:
Computer -> computerize -> computerization
Computation -> computational
Computer -> computerize -> computerizable
Compute -> computee
26
FSAs and Morphology
GOAL1: recognize whether a string is an English word
We could list all the words and just perform a linear search. What is the problem with this idea?
PLAN:
  First we’ll capture the morphotactics (the rules governing the ordering of affixes in a language)
  Then we’ll add in the actual stems
27
FSA for Portion of N Inflectional Morphology
Regular nouns ending in -s, -z, -sh, -ch, -x take -es (kiss, waltz, bush, rich, box).
Regular nouns ending in -y preceded by a consonant change the -y to -i.
28
Adding the Stems
We expand each arc with the morphemes that make up the set…
The resulting FSA is defined at the level of words.
This incorrectly accepts foxs; we will see later on how to deal with this.
But it does not express that:
  Regular nouns ending in -s, -z, -sh, -ch, -x take -es (kiss, waltz, bush, rich, box)
  Regular nouns ending in -y preceded by a consonant change the -y to -i
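To make the "expand each arc" step concrete, here is a rough sketch that turns three morpheme classes into letter-level transitions and then runs plain deterministic recognition. Everything specific is an assumption for illustration: the tiny lexicon, the class names (regular nouns, irregular singular nouns, irregular plural nouns), and the helper names build_noun_fsa and recognize. It deliberately has no spelling rules, so it reproduces the foxs problem noted above.

```python
from itertools import count


def build_noun_fsa(reg_nouns, irreg_sg, irreg_pl):
    """Expand morpheme-class arcs into letter arcs.

    Returns (transitions, accept_states); state 0 is the start state.
    """
    transitions = {}      # (state, letter) -> state
    fresh = count(1)      # generator of new state numbers
    accept = set()

    def add_path(start, letters):
        # Walk or create one state per letter, sharing common prefixes.
        state = start
        for letter in letters:
            key = (state, letter)
            if key not in transitions:
                transitions[key] = next(fresh)
            state = transitions[key]
        return state

    for stem in reg_nouns:
        end = add_path(0, stem)
        accept.add(end)                  # bare regular noun, e.g. cat
        accept.add(add_path(end, "s"))   # plural arc, e.g. cats
    for word in list(irreg_sg) + list(irreg_pl):
        accept.add(add_path(0, word))    # irregular forms are listed whole
    return transitions, accept


def recognize(word, transitions, accept):
    """Deterministic recognition over the letter-level table."""
    state = 0
    for letter in word:
        state = transitions.get((state, letter))
        if state is None:
            return False
    return state in accept


# Hypothetical mini-lexicon.
NOUN_TRANSITIONS, NOUN_ACCEPT = build_noun_fsa(
    reg_nouns=["cat", "dog", "fox"],
    irreg_sg=["goose", "mouse"],
    irreg_pl=["geese", "mice"],
)

if __name__ == "__main__":
    for word in ["cats", "geese", "gooses", "foxs", "foxes"]:
        print(word, recognize(word, NOUN_TRANSITIONS, NOUN_ACCEPT))
```

With this lexicon the machine accepts foxs and rejects foxes, which is exactly the gap that the spelling rules listed on the slide are meant to close.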
29
Small Fragment of V and N Derivational Morphology
[noun_i] e.g. hospital
[adj_al] e.g. formal
[adj_ous] e.g. arduous
[verb_j] e.g. speculate
[verb_k] e.g. conserve
An FSA for a small fragment of English derivational morphology.
It gives an idea of its extreme complexity! (There are exceptions to many of these constraints!)
Some models of English derivation are based on the more complex context-free grammar.
30
GOAL2: Morphological Parsing/Generation (vs. Recognition)
Recognition is usually not quite what we need.
Usually, given a word, we need to find the stem and its class and properties (parsing).
Or we have a stem and its class and properties and we want to produce the word (production/generation).
Examples (parsing):
  From “cats” to “cat +N +PL”
  From “lies” to “lie +N +PL” or “lie +V +3SG”
31
Finite State Transducers
FSA cannot help….
The simple story:
  Add another tape
  Add extra symbols to the transitions
  On one tape we read “cats”, on the other we write “cat +N +PL”
32
FSTs: parsing and generation
33
Simple Example
c:c a:a t:t +N:ε +PL:s
Transitions:
  c:c means read a c on one tape and write a c on the other
  +N:ε means read a +N symbol on one tape and write nothing on the other
  +PL:s means read +PL and write an s
Or the opposite!
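The "or the opposite" point can be illustrated with a very small sketch of this single-path transducer. The representation (a list of (lexical, surface) arc pairs, with the empty string standing for ε) and the names ARCS, EPSILON and transduce are assumptions made for illustration; a real FST would have branching states rather than one fixed path.

```python
EPSILON = ""   # assumed encoding of the empty symbol

# Arcs along the single path of the example: (lexical symbol, surface symbol).
ARCS = [("c", "c"), ("a", "a"), ("t", "t"), ("+N", EPSILON), ("+PL", "s")]


def transduce(symbols, arcs=ARCS, direction="generate"):
    """Follow the path, reading one side of each arc and writing the other.

    direction="generate": read lexical symbols, write surface symbols.
    direction="parse":    read surface symbols, write lexical symbols.
    Returns the list of written symbols, or None if the input does not match.
    """
    output = []
    position = 0                      # pointer on the tape being read
    for lexical, surface in arcs:
        if direction == "generate":
            read, write = lexical, surface
        else:
            read, write = surface, lexical
        if read != EPSILON:           # an epsilon arc consumes no input
            if position >= len(symbols) or symbols[position] != read:
                return None
            position += 1
        if write != EPSILON:          # an epsilon arc writes nothing
            output.append(write)
    if position != len(symbols):      # leftover input: fail
        return None
    return output


if __name__ == "__main__":
    print(transduce(["c", "a", "t", "+N", "+PL"]))       # generation
    print(transduce(list("cats"), direction="parse"))    # parsing
```

The same arc list serves both directions; only which side is read and which is written changes.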
34
Next Time
Finish FST and morphological analysis
Porter Stemmer
Review of probability and information theory
Read 2.3 and Chp. 3!