
Slide 1: CPSC 503 Computational Linguistics, Lecture 4. Giuseppe Carenini.

Slide 2: Knowledge-Formalisms Map (including probabilistic formalisms)
- State machines (and probabilistic versions): finite state automata, finite state transducers, Markov models
- Rule systems (and probabilistic versions): e.g., (probabilistic) context-free grammars
- Logical formalisms (first-order logics)
- AI planners
Linguistic levels covered by the map: morphology, syntax, semantics, pragmatics, discourse and dialogue.

Slide 3: Today (Sep 18)
- Dealing with spelling errors: the noisy channel model; Bayes rule applied to the noisy channel model (single and multiple spelling errors)
- Minimum edit distance
- Start n-gram models: language models

Slide 4: Background knowledge: morphological analysis; P(x) (probability distribution); joint P(x,y); conditional P(x|y); Bayes rule; chain rule.

Slide 5: Spelling: the problem(s)
- Non-word, isolated: detection, then correction: find the most likely correct word (funn -> funny, fun, ...)
- Non-word, context: find the most likely correct word in this context (trust funn / a lot of funn)
- Real-word, isolated: ?! (hard to even detect without context)
- Real-word, context: is it an impossible (or very unlikely) word in this context (e.g., "a wild dig")? Find the most likely substitution word in this context.

Slide 6: Spelling: data
- Misspelling rates range from roughly 0.05%-3% in typed text up to 38% in some applications.
- 80% of misspelled words contain a single error: insertion (toy -> tony), deletion (tuna -> tua), substitution (tone -> tony), transposition (length -> legnth).
- Types of errors: typographic (more common; the user knows the correct spelling, e.g., the -> rhe) and cognitive (the user doesn't know it, e.g., piece -> peace).

Slide 7: Noisy channel. An influential metaphor in language processing: an intended signal passes through a noisy channel and is observed as a noisy signal; recovering the original signal is a special case of Bayesian classification.

Slide 8: Bayes and the noisy channel: spelling (non-word, isolated). Goal: find the most likely word given some observed (misspelled) word.

Slide 9: Problem: P(w|O) is hard/impossible to estimate directly (why? we would need counts of every specific misspelling paired with every intended word), e.g., P(wine|winw) = ?

Slide 10: Solution: (1) apply Bayes rule; (2) simplify by dropping the denominator, which is constant across candidate words, leaving prior times likelihood.
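
Reconstructed from the slide's own labels (prior, likelihood), the standard noisy channel objective reads:
\[
\hat{w} \;=\; \arg\max_{w} P(w \mid O)
\;=\; \arg\max_{w} \frac{P(O \mid w)\,P(w)}{P(O)}
\;=\; \arg\max_{w} \underbrace{P(O \mid w)}_{\text{likelihood}}\;\underbrace{P(w)}_{\text{prior}}
\]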

Slide 11: Estimate of the prior P(w) (easy): relative frequency in a large corpus, with smoothing. Always verify...
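
A sketch of the smoothed estimate the slide alludes to; add-one (Laplace) smoothing is shown here as one common choice, not necessarily the one used in the lecture:
\[
P(w) \approx \frac{C(w) + 1}{N + V}
\]
where C(w) is the corpus count of w, N the total number of tokens, and V the vocabulary size.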

Slide 12: Estimating P(O|w) is feasible (Kernighan et al. '90). For a one-error misspelling: estimate the probability of each possible error type (e.g., insert a after c, substitute f with h). Then P(O|w) equals the probability of the error that generated O from w, e.g., P(cbat|cat) = P(insert b after c).

Slide 13: Estimate P(error type) (e.g., substitution: sub[x,y]) from a large corpus: compute confusion matrices and character counts. The cell sub[a,b] holds the number of times b was incorrectly used for a (the slide's matrix fragment shows values such as 5, 8, 8, 15), and count(a) is the number of occurrences of a in the corpus.
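
The per-error-type estimates in the style of Kernighan et al. (1990); the exact normalizers used on the slide are an assumption here:
\[
P(O \mid w) \approx
\begin{cases}
\mathrm{del}[x,y] \,/\, \mathrm{count}(xy) & \text{deletion of } y \text{ after } x\\
\mathrm{ins}[x,y] \,/\, \mathrm{count}(x) & \text{insertion of } y \text{ after } x\\
\mathrm{sub}[x,y] \,/\, \mathrm{count}(y) & y \text{ mistyped as } x\\
\mathrm{trans}[x,y] \,/\, \mathrm{count}(xy) & \text{transposition of } xy
\end{cases}
\]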

Slide 14: Corpus example: "... On 16 January, he sais [sub[i,y] 3] that because of astronaut safety tha [del[a,t] 4] would be no more space shuttle missions to miantain [tran[a,i] 2] and upgrade the orbiting telescope ..." Each bracketed annotation records the error type and matrix cell; the trailing numbers appear to be the running counts for those cells.

Slide 15: Final method, single error.
(1) Given O, collect all the w_i that could have generated O by one error. E.g., O = acress => w_1 = actress (t deletion), w_2 = across (substitute o with e), ...
(2) For each w_i compute: word prior times the probability of the error generating O from w_i.
(3) Sort and display the top-n to the user.
How to do (1): generate all the strings that could have generated O by one error, then keep only those that are words (see the sketch below).
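
A minimal Python sketch of step (1), in the spirit of the Norvig implementation cited on slide 23; the function names and the lexicon argument are illustrative assumptions, not the lecture's code:

def edits1(word):
    """All strings reachable from `word` by one insertion,
    deletion, substitution, or transposition."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    substitutes = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + substitutes + inserts)

def candidates(observed, lexicon):
    """Keep only the one-edit strings that are real words."""
    return edits1(observed) & lexicon

For example, candidates("acress", {"actress", "across", "acres", "caress", "access", "cress"}) returns all six words, since each is exactly one edit away from the observation.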

Slide 16: Example: O = acress. Context: "...stellar and versatile acress whose...". [Table of candidate corrections with their priors and error probabilities.] Corpus for the priors: 1988 AP newswire, 44 million words.

Slide 17: Evaluation. [Table comparing the system's suggestions against the "correct" answers, with columns for ranks 0, 1, 2, and other.]

Slide 18: Corpora: issues to remember
- Zero counts in the corpus: just because an event didn't happen in the corpus doesn't mean it won't happen, e.g., cress does not really have zero probability.
- Getting a corpus that matches the actual use: e.g., kids don't misspell the same way that adults do.

Slide 19: Multiple spelling errors. (Before) Given O, collect all the w_i that could have generated O by one error. (Now) Given O, collect all the w_i that could have generated O by 1..k errors. How, for two errors: collect all the strings that could have generated O by one error, then collect all the w_i that could have generated one of those strings by one error, and so on (see the sketch below).
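
Following the slide's recipe for two errors, a short sketch that reuses the edits1 function from the earlier sketch (still an illustrative assumption):

def edits2(word):
    """All strings reachable from `word` by two edit operations:
    apply edits1 to every string that is one edit away."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}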

Slide 20: Final method, multiple errors.
(1) Given O, for each w_i that can be generated from O by a sequence of edit operations EdOp_i, save EdOp_i.
(2) For each w_i compute: word prior times the probability of the errors generating O from w_i.
(3) Sort and display the top-n to the user.

Slide 21: Spelling: the problem(s) (revisited). Non-word, isolated/context: find the most likely correct word (funn -> funny, funnel, ...; trust funn / a lot of funn). Real-word, context: is it an impossible (or very unlikely) word in this context (e.g., "a wild dig")? Find the most likely substitution word in this context.

Slide 22: Real-word spelling errors. Collect a set of common confusion sets C = {C_1 .. C_n}, e.g., {(their/they're/there), (to/too/two), (weather/whether), (lave/have), ...}. Whenever some c' in C_i is encountered: compute the probability of the sentence in which it appears; substitute each c in C_i (c != c') and compute the probability of the resulting sentence; choose the highest-probability variant (see the sketch below).
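
A sketch of the confusion-set procedure; sentence_prob stands for any sentence-scoring language model (such as the bigram model introduced later) and, like the set contents, is an illustrative assumption:

CONFUSION_SETS = [
    {"their", "they're", "there"},
    {"to", "too", "two"},
    {"weather", "whether"},
]

def correct_real_word(tokens, i, sentence_prob):
    """If tokens[i] belongs to a confusion set, replace it with
    whichever member yields the most probable sentence."""
    word = tokens[i]
    for conf_set in CONFUSION_SETS:
        if word in conf_set:
            best = max(conf_set,
                       key=lambda c: sentence_prob(tokens[:i] + [c] + tokens[i+1:]))
            return tokens[:i] + [best] + tokens[i+1:]
    return tokens  # word is in no confusion set: leave it alone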

Slide 23: Want to play with spelling correction? A minimal noisy channel model implementation (Python): http://www.norvig.com/spell-correct.html. By the way, Peter Norvig is Director of Research at Google Inc.

Slide 24: Today (Sep 18), agenda revisited: dealing with spelling errors (noisy channel model; Bayes rule applied to it, single and multiple errors); minimum edit distance; start n-gram models: language models.

Slide 25: Minimum edit distance. Definition: the minimum number of edit operations (insertion, deletion, and substitution) needed to transform one string into another. Example: gumbo -> gumb (delete o) -> gum (delete b) -> gam (substitute u by a).

Slide 26: Minimum edit distance algorithm: dynamic programming (a very common technique in NLP). High-level description:
- Fill in a matrix of partial comparisons.
- The value of each cell is computed as a "simple" function of the surrounding cells.
- Output: not only the number of edit operations but also the sequence of operations.

Slide 27: Minimum edit distance algorithm, details. ed[i,j] = minimum distance between the first i characters of the source and the first j characters of the target. Costs: del-cost = 1, ins-cost = 1, sub-cost = 2 (0 if the characters are equal). Update rule:
ed[i,j] = MIN( ed[i-1,j] + 1 (deletion), ed[i,j-1] + 1 (insertion), ed[i-1,j-1] + (2 or 0) (substitution or match) )
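
A runnable sketch of the recurrence with the slide's costs (del = 1, ins = 1, sub = 2):

def min_edit_distance(source, target):
    """Dynamic-programming edit distance with del=1, ins=1, sub=2."""
    n, m = len(source), len(target)
    ed = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        ed[i][0] = i                      # delete all of source[:i]
    for j in range(1, m + 1):
        ed[0][j] = j                      # insert all of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i-1] == target[j-1] else 2
            ed[i][j] = min(ed[i-1][j] + 1,       # deletion
                           ed[i][j-1] + 1,       # insertion
                           ed[i-1][j-1] + sub)   # substitution or match
    return ed[n][m]

# e.g., min_edit_distance("gumbo", "gam") == 4
# (delete o, delete b, substitute u by a: 1 + 1 + 2)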

Slide 29: Min edit distance and alignment. See demo.

Slide 30: Today (Sep 18), agenda revisited: dealing with spelling errors; minimum edit distance; start n-gram models: language models.

Slide 31: Key transition. Up to this point we've mostly been discussing words in isolation. Now we're switching to sequences of words, and we're going to worry about assigning probabilities to sequences of words.

Slide 32: Knowledge-Formalisms Map (including probabilistic formalisms); the same map as Slide 2, shown again to situate language models among the state-machine formalisms (Markov models).

Slide 33: Only spelling? Two broader uses of word-sequence probabilities:
A. Assign a probability to a sentence: part-of-speech tagging, word-sense disambiguation, probabilistic parsing.
B. Predict the next word: speech recognition, hand-writing recognition, augmentative communication for the disabled.
In both cases, the full joint probability of the sequence is impossible to estimate directly.

Slide 34: Decompose: apply the chain rule. The chain rule, applied to a word sequence from position 1 to n, rewrites the joint probability as a product of conditional probabilities (see the formula below).
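
The chain rule formula the slide refers to:
\[
P(w_1^n) \;=\; P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1^2)\cdots P(w_n \mid w_1^{n-1})
\;=\; \prod_{k=1}^{n} P(w_k \mid w_1^{k-1})
\]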

Slide 35: Example sequence: "The big red dog barks". P(The big red dog barks) = P(The) * P(big|The) * P(red|The big) * P(dog|The big red) * P(barks|The big red dog). Note: P(The) is better expressed as P(The|<s>), conditioning on the sentence-start marker <s>.

Slide 36: Not a satisfying solution. Even for small n (e.g., 6) we would need far too large a corpus to estimate the full conditional probabilities. Markov assumption: the entire prefix history isn't necessary; condition on none (unigram), one (bigram), or two (trigram) of the preceding words.

Slide 37: Probability of a sentence with n-grams: unigram, bigram, trigram (see the formulas below).
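
The corresponding sentence probabilities under each Markov assumption:
\[
\text{unigram: } P(w_1^n) \approx \prod_{k=1}^{n} P(w_k)
\qquad
\text{bigram: } P(w_1^n) \approx \prod_{k=1}^{n} P(w_k \mid w_{k-1})
\qquad
\text{trigram: } P(w_1^n) \approx \prod_{k=1}^{n} P(w_k \mid w_{k-2}\,w_{k-1})
\]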

Slide 38: Bigram example: "The big red dog barks". P(The big red dog barks) = P(The|<s>) * P(big|The) * P(red|big) * P(dog|red) * P(barks|dog). Trigram? (See below.)
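
Answering the slide's prompt (one common convention pads the start of the sentence with two <s> markers), the trigram version would be:
P(The big red dog barks) = P(The|<s> <s>) * P(big|<s> The) * P(red|The big) * P(dog|big red) * P(barks|red dog)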

Slide 39: Estimates for n-grams: relative frequencies from a corpus. Bigram: P(w_k|w_{k-1}) = C(w_{k-1} w_k) / C(w_{k-1}). In general: P(w_k | w_{k-N+1}^{k-1}) = C(w_{k-N+1}^{k-1} w_k) / C(w_{k-N+1}^{k-1}).
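
A minimal counting sketch of the bigram estimate; the tokenization and example corpus are illustrative assumptions:

from collections import Counter

def bigram_mle(tokens):
    """P(w_k | w_{k-1}) = C(w_{k-1} w_k) / C(w_{k-1})."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {(prev, w): c / unigrams[prev] for (prev, w), c in bigrams.items()}

# e.g., bigram_mle("<s> the big red dog barks".split())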

Slide 40: Next time: finish n-grams (Chp. 4); model evaluation (sec. 4.4); no smoothing (sec. 4.5-4.7 skipped); start hidden Markov models.

