
1 Ling 570 Day #3 Stemming, Probabilistic Automata, Markov Chains/Model

2 MORPHOLOGY AND FSTS

3 FST as Translator (Last Class) FR: ce bill met de le baume sur une blessure EN: this bill puts balm on a sore wound

4 FST Application Examples Case folding: –He said → he said Tokenization: –“He ran.” → “ He ran. “ POS tagging: –They can fish → PRO VERB NOUN

5 FST Application Examples Pronunciation: –B AH T EH R → B AH DX EH R Morphological generation: –Fox s → Foxes Morphological analysis: –cats → cat s

6 Roadmap Motivation: –Representing words A little (mostly English) Morphology Stemming

7 The Lexicon Goal: Represent all the words in a language Approach?

8 The Lexicon Goal: Represent all the words in a language Approach? –Enumerate all words?

9 The Lexicon Goal: Represent all the words in a language Approach? –Enumerate all words? Doable for English –Typical for ASR (Automatic Speech Recognition) –English is morphologically relatively impoverished

10 The Lexicon Goal: Represent all the words in a language Approach? –Enumerate all words? Doable for English –Typical for ASR (Automatic Speech Recognition) –English is morphologically relatively impoverished Other languages?

11 The Lexicon Goal: Represent all the words in a language Approach? –Enumerate all words? Doable for English –Typical for ASR (Automatic Speech Recognition) –English is morphologically relatively impoverished Other languages? –Wildly impractical »Turkish: 40,000 forms/verb; uygarlaştıramadıklarımızdanmışsınızcasına "(behaving) as if you are among those whom we could not civilize"

12 Morphological Parsing Goal: Take a surface word form and generate a linguistic structure of component morphemes

13 Morphological Parsing Goal: Take a surface word form and generate a linguistic structure of component morphemes A morpheme is the minimal meaning-bearing unit in a language.

14 Morphological Parsing Goal: Take a surface word form and generate a linguistic structure of component morphemes A morpheme is the minimal meaning-bearing unit in a language. –Stem: the morpheme that forms the central meaning unit in a word –Affix: prefix, suffix, infix, circumfix

15 Morphological Parsing Goal: Take a surface word form and generate a linguistic structure of component morphemes A morpheme is the minimal meaning-bearing unit in a language. –Stem: the morpheme that forms the central meaning unit in a word –Affix: prefix, suffix, infix, circumfix Prefix: e.g., possible → impossible

16 Morphological Parsing Goal: Take a surface word form and generate a linguistic structure of component morphemes A morpheme is the minimal meaning-bearing unit in a language. –Stem: the morpheme that forms the central meaning unit in a word –Affix: prefix, suffix, infix, circumfix Prefix: e.g., possible → impossible Suffix: e.g., walk → walking

17 Morphological Parsing Goal: Take a surface word form and generate a linguistic structure of component morphemes A morpheme is the minimal meaning-bearing unit in a language. –Stem: the morpheme that forms the central meaning unit in a word –Affix: prefix, suffix, infix, circumfix Prefix: e.g., possible → impossible Suffix: e.g., walk → walking Infix: e.g., hingi → humingi (Tagalog)

18 Morphological Parsing Goal: Take a surface word form and generate a linguistic structure of component morphemes A morpheme is the minimal meaning-bearing unit in a language. –Stem: the morpheme that forms the central meaning unit in a word –Affix: prefix, suffix, infix, circumfix Prefix: e.g., possible → impossible Suffix: e.g., walk → walking Infix: e.g., hingi → humingi (Tagalog) Circumfix: e.g., sagen → gesagt (German)

19 Surface Variation & Morphology Searching (a la Bing) for documents about: –Televised sports

20 Surface Variation & Morphology Searching (a la Bing) for documents about: –Televised sports Many possible surface forms: –Televised, television, televise, … –Sports, sport, sporting, …

21 Surface Variation & Morphology Searching (a la Bing) for documents about: –Televised sports Many possible surface forms: –Televised, television, televise, … –Sports, sport, sporting, … How can we match?

22 Surface Variation & Morphology Searching (a la Bing) for documents about: –Televised sports Many possible surface forms: –Televised, television, televise, … –Sports, sport, sporting, … How can we match? –Convert surface forms to common base form Stemming or morphological analysis

23 Two Perspectives Stemming: –writing →

24 Two Perspectives Stemming: –writing → write (or writ) –Beijing

25 Two Perspectives Stemming: –writing → write (or writ) –Beijing → Beije Morphological Analysis:

26 Two Perspectives Stemming: –writing → write (or writ) –Beijing → Beije Morphological Analysis: –writing → write+V+prog

27 Two Perspectives Stemming: –writing → write (or writ) –Beijing → Beije Morphological Analysis: –writing → write+V+prog –cats → cat + N + pl –writes → write+V+3rdpers+Sg

28 Stemming Simple type of morphological analysis Supports matching using base form e.g. Television, televised, televising → televise

29 Stemming Simple type of morphological analysis Supports matching using base form e.g. Television, televised, televising → televise Most popular: Porter stemmer

30 Stemming Simple type of morphological analysis Supports matching using base form e.g. Television, televised, televising → televise Most popular: Porter stemmer Task: Given surface form, produce base form –Typically, removes suffixes

31 Stemming Simple type of morphological analysis Supports matching using base form e.g. Television, televised, televising → televise Most popular: Porter stemmer Task: Given surface form, produce base form –Typically, removes suffixes Model: –Rule cascade –No lexicon!

32 Stemming Used in many NLP/IR applications For building equivalence classes: Connect, Connected, Connecting, Connection, Connections → same class; suffixes irrelevant Porter Stemmer: simple and efficient Website: http://www.tartarus.org/~martin/PorterStemmer On patas: ~/dropbox/12-13/570/porter
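
A quick illustration of the equivalence-class idea, using NLTK's off-the-shelf Porter stemmer rather than the course copy on patas (a minimal sketch, assuming NLTK is installed):

```python
# Collapse surface variants to a common stem with NLTK's Porter stemmer.
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()
words = ["connect", "connected", "connecting", "connection", "connections"]
print({w: stemmer.stem(w) for w in words})
# All five forms should map to the same stem ("connect"), so they fall
# into a single equivalence class for matching and retrieval.
```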

33 Porter Stemmer Rule cascade: –Rule form: (condition) PATT1 → PATT2

34 Porter Stemmer Rule cascade: –Rule form: (condition) PATT1 → PATT2 E.g. stem contains vowel, ING → ε

35 Porter Stemmer Rule cascade: –Rule form: (condition) PATT1 → PATT2 E.g. stem contains vowel, ING → ε ATIONAL → ATE

36 Porter Stemmer Rule cascade: –Rule form: (condition) PATT1 → PATT2 E.g. stem contains vowel, ING → ε ATIONAL → ATE –Rule partial order: Step1a: -s Step1b: -ed, -ing

37 Porter Stemmer Rule cascade: –Rule form: (condition) PATT1 → PATT2 E.g. stem contains vowel, ING → ε ATIONAL → ATE –Rule partial order: Step1a: -s Step1b: -ed, -ing Step 2-4: derivational suffixes

38 Porter Stemmer Rule cascade: –Rule form: (condition) PATT1 → PATT2 E.g. stem contains vowel, ING → ε ATIONAL → ATE –Rule partial order: Step1a: -s Step1b: -ed, -ing Step 2-4: derivational suffixes Step 5: cleanup Pros:

39 Porter Stemmer Rule cascade: –Rule form: (condition) PATT1 → PATT2 E.g. stem contains vowel, ING → ε ATIONAL → ATE –Rule partial order: Step1a: -s Step1b: -ed, -ing Step 2-4: derivational suffixes Step 5: cleanup Pros: Simple, fast, buildable for a variety of languages Cons:

40 Porter Stemmer Rule cascade: –Rule form: (condition) PATT1 → PATT2 E.g. stem contains vowel, ING → ε ATIONAL → ATE –Rule partial order: Step1a: -s Step1b: -ed, -ing Step 2-4: derivational suffixes Step 5: cleanup Pros: Simple, fast, buildable for a variety of languages Cons: Overaggressive and underaggressive
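
To make the rule-cascade idea concrete, here is a toy sketch (not the real Porter algorithm: it uses a crude "stem contains a vowel" condition, only a handful of illustrative rules, and stops after the first rule that applies):

```python
import re

def has_vowel(stem):
    """Crude stand-in for Porter's conditions: the remaining stem has a vowel."""
    return re.search(r"[aeiou]", stem) is not None

# Each rule: (condition on the stem, suffix pattern PATT1, replacement PATT2)
RULES = [
    (lambda stem: True, r"sses$", "ss"),
    (lambda stem: True, r"ies$", "i"),
    (lambda stem: True, r"s$", ""),
    (has_vowel, r"ing$", ""),
    (has_vowel, r"ed$", ""),
    (lambda stem: True, r"ational$", "ate"),
]

def toy_stem(word):
    for condition, patt, repl in RULES:
        m = re.search(patt, word)
        if m and condition(word[:m.start()]):
            return word[:m.start()] + repl  # first applicable rule wins
    return word

print([toy_stem(w) for w in ["cats", "walking", "relational", "sing"]])
# ['cat', 'walk', 'relate', 'sing'] -- "sing" is left alone because the
# remaining stem "s" contains no vowel.
```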

41 STEMMING & EVAL

42 Evaluating Performance Measures of stemming performance rely on the metrics used in IR: –Precision: the proportion of selected items the system got right; precision = tp / (tp + fp) = # of correct answers / # of answers given –Recall: the proportion of target items the system selected; recall = tp / (tp + fn) = # of correct answers / # of possible correct answers –Rule of thumb: as precision increases, recall drops, and vice versa These metrics are widely adopted in statistical NLP

43 Precision and Recall Take a given stemming task –Suppose there are 100 words that could be stemmed –A stemmer gets 52 of these right (tp) –But it inadvertently stems 10 others (fp) Precision = 52 / (52 + 10) = 0.84 Recall = 52 / (52 + 48) = 0.52

44 Precision and Recall Take a given stemming task –Suppose there are 100 words that could be stemmed –A stemmer gets 52 of these right (tp) –But it inadvertently stems 10 others (fp) Precision = 52 / (52 + 10) = 0.84 Recall = 52 / (52 + 48) = 0.52 Note: easy to get precision of 1.0. Why?
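
The slide's numbers, worked through in code (tp = 52, fp = 10, fn = 48, taken directly from the example above):

```python
def precision(tp, fp):
    # proportion of the system's answers that were correct
    return tp / (tp + fp)

def recall(tp, fn):
    # proportion of the possible correct answers the system found
    return tp / (tp + fn)

tp, fp, fn = 52, 10, 48
print(f"precision = {precision(tp, fp):.2f}")  # 52/62  = 0.84
print(f"recall    = {recall(tp, fn):.2f}")     # 52/100 = 0.52
```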


46 WEIGHTED AUTOMATA & MARKOV CHAINS

47 PFA Definition A Probabilistic Finite-State Automaton is a 6-tuple: –A set of states Q –An alphabet Σ –A set of transitions: δ ⊆ Q × Σ × Q –Initial state probabilities: I: Q → R+ –Transition probabilities: P: δ → R+ –Final state probabilities: F: Q → R+

48 PFA Recap Subject to constraints: the initial probabilities sum to 1, and for each state the final probability plus the probabilities of all outgoing transitions sums to 1 Computing sequence probabilities: multiply the initial probability of the start state, the transition probabilities along the path, and the final probability of the ending state

49 PFA Example –I(q0)=1, I(q1)=0 –F(q0)=0, F(q1)=0.2 –P(q0,a,q1)=1; P(q1,b,q1)=0.8 –P(ab^n) = I(q0) * P(q0,a,q1) * P(q1,b,q1)^n * F(q1) = 0.8^n * 0.2
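
The same example transcribed into a small Python sketch; the dictionaries simply mirror the I, F, and P values above:

```python
I = {"q0": 1.0, "q1": 0.0}    # initial state probabilities
F = {"q0": 0.0, "q1": 0.2}    # final state probabilities
P = {("q0", "a", "q1"): 1.0,  # transition probabilities
     ("q1", "b", "q1"): 0.8}

def prob_ab_n(n):
    """P(a b^n) = I(q0) * P(q0,a,q1) * P(q1,b,q1)^n * F(q1)."""
    return I["q0"] * P[("q0", "a", "q1")] * P[("q1", "b", "q1")] ** n * F["q1"]

for n in range(4):
    print(n, prob_ab_n(n))  # 0.2, 0.16, 0.128, 0.1024
```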

50 Markov Chain A Markov Chain is a special case of a PFA in which the sequence uniquely determines which states the automaton will go through. Markov Chains cannot represent inherently ambiguous problems –Can assign probability to unambiguous sequences

51 Markov Chain for Words

52 Markov Chain for Pronunciation Observations: 0/1

53 Markov Chain for Walking through Groningen

54 Markov Chain: “First-order observable Markov Model” A set of states –Q = q1, q2 … qN; the state at time t is qt

55 Markov Chain: “First-order observable Markov Model” A set of states –Q = q1, q2 … qN; the state at time t is qt Transition probabilities: –a set of probabilities A = a01, a02, …, an1, …, ann –Each aij represents the probability of transitioning from state i to state j –The set of these is the transition probability matrix A

56 Markov Chain: “First-order observable Markov Model” A set of states –Q = q1, q2 … qN; the state at time t is qt Transition probabilities: –a set of probabilities A = a01, a02, …, an1, …, ann –Each aij represents the probability of transitioning from state i to state j –The set of these is the transition probability matrix A Distinguished start and final states –q0, qF

57 Markov Chain: “First-order observable Markov Model” A set of states –Q = q1, q2 … qN; the state at time t is qt Transition probabilities: –a set of probabilities A = a01, a02, …, an1, …, ann –Each aij represents the probability of transitioning from state i to state j –The set of these is the transition probability matrix A Distinguished start and final states –q0, qF Current state only depends on previous state

58 Markov Models The parameters of a MM can be arranged in matrices The A-matrix for the set of transition probabilities:
A = [ p11 p12 … p1j
      p21 p22 … p2j
      …              ]

59 Markov Models The parameters of a MM can be arranged in matrices The A-matrix for the set of transition probabilities:
A = [ p11 p12 … p1j
      p21 p22 … p2j
      …              ]
What’s missing?

60 Markov Models The parameters of a MM can be arranged in matrices The A-matrix for the set of transition probabilities:
A = [ p11 p12 … p1j
      p21 p22 … p2j
      …              ]
What’s missing? Starting probabilities.

61 Markov Models Exercise –Build the transition probability matrix over this set of data The duck died. The car killed the duck. The duck died under her car. We duck under the car. We retrieve the poor duck. –Build the starting probability matrix

62 Markov Models Exercise –Given your model, what’s the probability for each of the following sentences? The duck died under her car. We duck under the car. The duck under the car. We retrieve killed the duck. We the poor duck died. We retrieve the poor duck under the car. –For a given start state (The, We), what’s the most likely string (of the above)?
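
One possible sketch of both exercises: estimate starting and transition probabilities by relative-frequency counting over the five training sentences, then score a sentence as the product of its start probability and its transition probabilities. Whitespace tokenization, lowercasing, no smoothing, and no end-of-sentence probability are simplifying assumptions; any sentence containing an unseen transition gets probability 0:

```python
from collections import Counter, defaultdict

corpus = [
    "the duck died",
    "the car killed the duck",
    "the duck died under her car",
    "we duck under the car",
    "we retrieve the poor duck",
]

start_counts = Counter()
trans_counts = defaultdict(Counter)
for sent in corpus:
    words = sent.split()
    start_counts[words[0]] += 1                # starting probabilities
    for prev, curr in zip(words, words[1:]):
        trans_counts[prev][curr] += 1          # transition probabilities

def start_prob(w):
    return start_counts[w] / sum(start_counts.values())

def trans_prob(prev, curr):
    total = sum(trans_counts[prev].values())
    return trans_counts[prev][curr] / total if total else 0.0

def sentence_prob(sentence):
    words = sentence.lower().rstrip(".").split()
    p = start_prob(words[0])
    for prev, curr in zip(words, words[1:]):
        p *= trans_prob(prev, curr)
    return p

for s in ["The duck died under her car.", "We duck under the car.",
          "The duck under the car.", "We retrieve killed the duck."]:
    print(f"{s:35s} {sentence_prob(s):.4f}")
```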


64 HMMs Next class

