Ling 570 Day #3: Stemming, Probabilistic Automata, Markov Chains/Models
MORPHOLOGY AND FSTS
FST as Translator (last class)
FR: ce bill met de le baume sur une blessure
EN: this bill puts balm on a sore wound
FST Application Examples
- Case folding: He said → he said
- Tokenization: "He ran." → " He ran . "
- POS tagging: They can fish → PRO VERB NOUN
FST Application Examples
- Pronunciation: B AH T EH R → B AH DX EH R
- Morphological generation: Fox + s → Foxes
- Morphological analysis: cats → cat + s
Roadmap
- Motivation: representing words
- A little (mostly English) morphology
- Stemming
The Lexicon
Goal: represent all the words in a language.
Approach? Enumerate all words?
- Doable for English; typical for ASR (Automatic Speech Recognition). English is morphologically relatively impoverished.
- Other languages? Wildly impractical.
  Turkish: 40,000 forms/verb, e.g. uygarlaştıramadıklarımızdanmışsınızcasına, "(behaving) as if you are among those whom we could not civilize"
Morphological Parsing
Goal: take a surface word form and generate a linguistic structure of component morphemes.
A morpheme is the minimal meaning-bearing unit in a language.
- Stem: the morpheme that forms the central meaning unit in a word
- Affix: prefix, suffix, infix, circumfix
  - Prefix: e.g., possible → impossible
  - Suffix: e.g., walk → walking
  - Infix: e.g., hingi → humingi (Tagalog)
  - Circumfix: e.g., sagen → gesagt (German)
Surface Variation & Morphology
Searching (à la Bing) for documents about: televised sports
Many possible surface forms:
- televised, television, televise, ...
- sports, sport, sporting, ...
How can we match? Convert surface forms to a common base form, via stemming or morphological analysis.
Two Perspectives
Stemming:
- writing → write (or writ)
- Beijing → Beije
Morphological analysis:
- writing → write+V+prog
- cats → cat+N+pl
- writes → write+V+3rdpers+sg
Stemming
- A simple type of morphological analysis; supports matching using the base form, e.g. television, televised, televising → televise.
- Most popular: the Porter stemmer.
- Task: given a surface form, produce the base form. Typically removes suffixes.
- Model: a rule cascade; no lexicon!
Stemming
Used in many NLP/IR applications for building equivalence classes; words in the same class differ only by (irrelevant) suffixes:
- Connect, Connected, Connecting, Connection, Connections → same class
The Porter stemmer is simple and efficient.
Website: http://www.tartarus.org/~martin/PorterStemmer
On patas: ~/dropbox/12-13/570/porter
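A quick demonstration of these equivalence classes (my own illustration, not from the slides), using the Porter stemmer implementation that ships with NLTK:

```python
# Demo: the five "Connect" variants above collapse to a single stem.
# Requires NLTK (pip install nltk); PorterStemmer needs no extra data files.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["Connect", "Connected", "Connecting", "Connection", "Connections"]
for w in words:
    print(w, "->", stemmer.stem(w))
# All five print the same stem, 'connect', so they match as one class.
```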
Porter Stemmer
Rule cascade:
- Rule form: (condition) PATT1 → PATT2
  e.g. (stem contains vowel) ING → ε
  ATIONAL → ATE
- Rule partial order:
  - Step 1a: -s
  - Step 1b: -ed, -ing
  - Steps 2-4: derivational suffixes
  - Step 5: cleanup
Pros: simple, fast, buildable for a variety of languages.
Cons: both overaggressive and underaggressive. (A toy cascade is sketched below.)
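To make the cascade idea concrete, here is a toy sketch of Porter-style rules (my own simplification; the real algorithm has many more rules and conditions than these):

```python
import re

# Toy Porter-style rule cascade: rules of the form
# (condition on the stem) PATT1 -> PATT2, tried in order.
# Illustrative only -- not the real five-step Porter algorithm.

def has_vowel(stem: str) -> bool:
    """The '(stem contains vowel)' condition from the slide."""
    return bool(re.search(r"[aeiouy]", stem))

RULES = [
    (None,      "sses",    "ss"),   # Step 1a: caresses -> caress
    (None,      "s",       ""),     # Step 1a: cats -> cat
    (has_vowel, "ing",     ""),     # Step 1b: (vowel) ING -> epsilon
    (has_vowel, "ed",      ""),     # Step 1b: plastered -> plaster
    (None,      "ational", "ate"),  # Step 2:  relational -> relate
]

def stem(word: str) -> str:
    word = word.lower()
    for cond, suffix, repl in RULES:
        if word.endswith(suffix):
            base = word[: -len(suffix)]
            if cond is None or cond(base):
                return base + repl
    return word

for w in ["cats", "walking", "sing", "relational"]:
    print(w, "->", stem(w))
# cats -> cat, walking -> walk, sing -> sing (no vowel before -ing),
# relational -> relate
```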
STEMMING & EVAL
Evaluating Performance
Measures of stemming performance rely on metrics used in IR:
- Precision: the proportion of selected items the system got right
  precision = tp / (tp + fp)   (# of correct answers / # of answers given)
- Recall: the proportion of the target items the system selected
  recall = tp / (tp + fn)   (# of correct answers / # of possible correct answers)
- Rule of thumb: as precision increases, recall drops, and vice versa.
These metrics are widely adopted in statistical NLP.
Precision and Recall
Take a given stemming task:
- Suppose there are 100 words that could be stemmed.
- A stemmer gets 52 of these right (tp).
- But it inadvertently stems 10 others (fp).
Precision = 52 / (52 + 10) = 0.84
Recall = 52 / (52 + 48) = 0.52
Note: it is easy to get a precision of 1.0. Why? (See the worked computation below.)
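The slide's arithmetic, spelled out as a small sketch (the counts tp=52, fp=10, fn=48 come from the example above):

```python
# Worked version of the slide's precision/recall example.
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)      # correct answers / answers given

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)      # correct answers / possible correct answers

tp, fp, fn = 52, 10, 48
print(f"precision = {precision(tp, fp):.2f}")  # 0.84
print(f"recall    = {recall(tp, fn):.2f}")     # 0.52
# Why is precision 1.0 easy? Only stem when absolutely certain:
# fp goes to 0, so tp / (tp + fp) goes to 1 -- at the cost of recall.
```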
45
45
WEIGHTED AUTOMATA & MARKOV CHAINS
PFA Definition
A Probabilistic Finite-State Automaton is a 6-tuple (Q, Σ, δ, I, P, F):
- A set of states Q
- An alphabet Σ
- A set of transitions δ ⊆ Q × Σ × Q
- Initial state probabilities I: Q → R+
- Transition probabilities P: δ → R+
- Final state probabilities F: Q → R+
PFA Recap
Subject to constraints: the initial probabilities form a distribution,
  Σ_{q ∈ Q} I(q) = 1,
and at every state the outgoing probability mass plus the final probability sums to one,
  F(q) + Σ_{a ∈ Σ, q' ∈ Q} P(q, a, q') = 1 for all q ∈ Q.
Computing sequence probabilities: for each accepting path, multiply the initial probability, the transition probabilities along the path, and the final probability; then sum over all such paths.
PFA Example
- I(q0) = 1, I(q1) = 0
- F(q0) = 0, F(q1) = 0.2
- P(q0, a, q1) = 1; P(q1, b, q1) = 0.8
- P(ab^n) = I(q0) · P(q0, a, q1) · P(q1, b, q1)^n · F(q1) = 0.8^n · 0.2
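A small sketch (mine, not from the slides) encoding this two-state PFA and checking the closed form 0.8^n · 0.2:

```python
# The two-state PFA from the example; I, P, F follow the slide's notation.
I = {"q0": 1.0, "q1": 0.0}                            # initial probabilities
F = {"q0": 0.0, "q1": 0.2}                            # final probabilities
P = {("q0", "a", "q1"): 1.0, ("q1", "b", "q1"): 0.8}  # transition probabilities

def string_prob(s: str) -> float:
    """Forward computation: sums over all accepting paths for s
    (this PFA is deterministic, so there is at most one path)."""
    prob = dict(I)                       # prob[q] after reading the prefix so far
    for sym in s:
        nxt = {}
        for (q, a, q2), p in P.items():
            if a == sym and prob.get(q, 0.0) > 0.0:
                nxt[q2] = nxt.get(q2, 0.0) + prob[q] * p
        prob = nxt
    return sum(prob.get(q, 0.0) * F[q] for q in F)

for n in range(4):
    s = "a" + "b" * n
    print(s, string_prob(s), 0.8 ** n * 0.2)  # the two columns agree
```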
Markov Chain
A Markov chain is a special case of a PFA in which the input sequence uniquely determines which states the automaton will go through.
Markov chains cannot represent inherently ambiguous problems, but they can assign probabilities to unambiguous sequences.
Markov Chain for Words [figure: state diagram]
Markov Chain for Pronunciation [figure: state diagram]
Observations: 0/1
Markov Chain for Walking through Groningen [figure: state diagram]
Markov Chain: "First-order observable Markov Model"
- A set of states Q = q1, q2, …, qN; the state at time t is qt.
- Transition probabilities: a set of probabilities A = a01, a02, …, an1, …, ann. Each aij represents the probability of transitioning from state i to state j. The set of these is the transition probability matrix A.
- Distinguished start and final states q0, qF.
- The current state depends only on the previous state. (See the formula below.)
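Written out as a formula (my rendering; the slide states it only in words), the first-order assumption and the resulting sequence probability are:

```latex
% First-order (Markov) assumption:
P(q_t \mid q_1 q_2 \ldots q_{t-1}) = P(q_t \mid q_{t-1})
% so, with start state q_0, a state sequence factorizes as
P(q_1 q_2 \ldots q_T) = a_{0 q_1} \prod_{t=2}^{T} a_{q_{t-1} q_t}
```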
Markov Models
The parameters of a MM can be arranged in matrices. The A-matrix holds the set of transition probabilities:

        [ p11  p12  …  p1j ]
    A = [ p21  p22  …  p2j ]
        [  …            …  ]

What's missing? Starting probabilities.
Markov Models Exercise
- Build the transition probability matrix over this set of data (one possible solution is sketched below):
  The duck died. The car killed the duck. The duck died under her car. We duck under the car. We retrieve the poor duck.
- Build the starting probability matrix.
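One reasonable reading of the exercise, as a sketch: estimate the starting and transition probabilities by maximum likelihood over the five sentences (lowercasing the words and keeping the sentence-final period as a token are my own modeling choices):

```python
from collections import Counter, defaultdict

corpus = [
    "The duck died .",
    "The car killed the duck .",
    "The duck died under her car .",
    "We duck under the car .",
    "We retrieve the poor duck .",
]

starts = Counter()                  # counts of sentence-initial words
transitions = defaultdict(Counter)  # transitions[prev][cur] = count
for sent in corpus:
    words = sent.lower().split()
    starts[words[0]] += 1
    for prev, cur in zip(words, words[1:]):
        transitions[prev][cur] += 1

def start_prob(w: str) -> float:
    return starts[w] / sum(starts.values())

def trans_prob(prev: str, cur: str) -> float:
    total = sum(transitions[prev].values())
    return transitions[prev][cur] / total if total else 0.0

print(start_prob("the"))          # 3 of 5 sentences begin with 'The': 0.6
print(trans_prob("the", "duck"))  # 'the' is followed by 'duck' 3 of 6 times: 0.5
```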
Markov Models Exercise
- Given your model, what's the probability of each of the following sentences? (A scoring sketch follows below.)
  The duck died under her car. We duck under the car. The duck under the car. We retrieve killed the duck. We the poor duck died. We retrieve the poor duck under the car.
- For a given start state (The, We), what's the most likely string (of the above)?
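Continuing the sketch above, a sentence's probability under the chain is its start probability times the product of its transition probabilities:

```python
# Continues the previous sketch (start_prob and trans_prob defined there).
def sentence_prob(sentence: str) -> float:
    words = sentence.lower().split()
    p = start_prob(words[0])
    for prev, cur in zip(words, words[1:]):
        p *= trans_prob(prev, cur)
    return p

for s in ["The duck died under her car .",
          "The duck under the car .",
          "We retrieve killed the duck ."]:
    print(f"{s!r}: {sentence_prob(s)}")
# Any unseen transition (e.g. 'retrieve' -> 'killed') contributes
# probability 0, so the last sentence scores 0 under this MLE model.
```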
HMMs: next class.