Slide 1
CPSC 503 Computational Linguistics, Lecture 4. Giuseppe Carenini.
Slide 2
Knowledge-Formalisms Map (including probabilistic formalisms). [Diagram: formalisms paired with levels of linguistic knowledge. State machines and probabilistic versions (finite state automata, finite state transducers, Markov models); rule systems and probabilistic versions (e.g., (probabilistic) context-free grammars); logical formalisms (first-order logics); AI planners. Levels: morphology, syntax, semantics, pragmatics, discourse and dialogue.]
Slide 3
Today (Sep 21)
Dealing with spelling errors
– Noisy channel model
– Bayes rule applied to the noisy channel model (single and multiple spelling errors)
Min edit distance (?)
Start n-gram models: language models
Slide 4
Background knowledge: morphological analysis; probability distribution P(x); joint P(x,y); conditional P(x|y); Bayes rule; chain rule.
Slide 5
Spelling: the problem(s)
Detection and correction, across four settings:
– Non-word, isolated: find the most likely correct word (funn -> funny, fun, ...)
– Non-word, in context: ... in this context (trust funn; a lot of funn)
– Real-word, isolated: ?!
– Real-word, in context: is it an impossible (or very unlikely) word in this context? (... a wild dig.) Find the most likely substitution word in this context.
Slide 6
Spelling: data
– Reported misspelling rates: 0.05% - 3% - 38%
– 80% of misspelled words contain a single error: insertion (toy -> tony), deletion (tuna -> tua), substitution (tone -> tony), transposition (length -> legnth)
– Types of errors: typographic (more common; the user knows the correct spelling, e.g., the -> rhe) and cognitive (the user doesn't know it, e.g., piece -> peace)
Slide 7
Noisy Channel. An influential metaphor in language processing is the noisy channel model, a special case of Bayesian classification. [Diagram: signal -> noisy channel -> noisy signal]
Slide 8
Bayes and the Noisy Channel: Spelling (non-word, isolated)
Goal: find the most likely word given some observed (misspelled) word O, i.e., the w maximizing P(w|O).
Slide 9
Problem: P(w|O) is hard/impossible to estimate directly (why? a corpus would have to tell us, for each observed misspelling, how often each word was intended). E.g., P(wine|winw) = ?
Slide 10
Solution:
1. Apply Bayes rule: P(w|O) = P(O|w) P(w) / P(O)
2. Simplify: since P(O) is the same for every candidate w, choose the w maximizing P(O|w) P(w), i.e., likelihood times prior.
Slide 11
Estimate of the prior P(w) (easy): relative frequency in a large corpus, with smoothing so that unseen words don't get probability zero. Always verify…
Slide 12
Estimate of P(O|w) is feasible (Kernighan et al. '90). For a one-error misspelling: estimate the probability of each possible error type (e.g., insert a after c; substitute f with h). P(O|w) then equals the probability of the single error that generated O from w, e.g., P(cbat|cat) = P(insert b after c).
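The slides don't reproduce the estimation formulas, but a standard reconstruction of the Kernighan et al. '90 error model (O is the observed typo, w the intended word, p the position of the error) looks roughly like this:

```latex
P(O \mid w) \approx
\begin{cases}
\frac{\mathrm{del}[w_{p-1}, w_p]}{\mathrm{count}[w_{p-1} w_p]} & \text{deletion} \\
\frac{\mathrm{ins}[w_{p-1}, O_p]}{\mathrm{count}[w_{p-1}]} & \text{insertion} \\
\frac{\mathrm{sub}[O_p, w_p]}{\mathrm{count}[w_p]} & \text{substitution} \\
\frac{\mathrm{trans}[w_p, w_{p+1}]}{\mathrm{count}[w_p w_{p+1}]} & \text{transposition}
\end{cases}
```

Each numerator is a confusion-matrix count (next slide) and each denominator is how often the relevant character or character pair occurs in the corpus.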
Slide 13
Estimate P(error type), e.g., substitution sub[x,y]: from a large corpus, compute confusion matrices and character counts. sub[x,y] = # times x was incorrectly used for y (e.g., the (b,a) cell: # times b was incorrectly used for a); Count(a) = # of a in the corpus. [Table residue: fragment of the substitution confusion matrix with rows/columns a, b, c, d, ... and sample counts 5, 8, 8, 15.]
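As an illustrative sketch of how a substitution matrix could be collected, assuming a hypothetical list of aligned (typo, correction) pairs that differ in exactly one position (the helper and data names are made up, not from the slides):

```python
from collections import Counter

def build_sub_matrix(error_pairs, corpus_text):
    """Collect sub[(x, y)] = # times x was typed instead of y, from
    same-length (typo, correction) pairs differing in one position,
    plus the character counts used as denominators."""
    sub = Counter()
    for typo, correct in error_pairs:
        for x, y in zip(typo, correct):
            if x != y:
                sub[(x, y)] += 1
    char_count = Counter(corpus_text)
    return sub, char_count

# hypothetical data for illustration
pairs = [("tone", "tony"), ("sais", "says")]
sub, counts = build_sub_matrix(pairs, "a large corpus would go here")
# P(x substituted for y) is then roughly sub[(x, y)] / counts[y]
```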
Slide 14
Corpus example (incrementing the counts as errors are found):
"… On 16 January, he sais [sub[i,y] 3] that because of astronaut safety tha [del[a,t] 4] would be no more space shuttle missions to miantain [tran[a,i] 2] and upgrade the orbiting telescope …"
Each bracketed annotation names the confusion-matrix cell for the error and its running count.
Slide 15
Final method, single error:
(1) Given O, collect all the w_i that could have generated O by one error. E.g., O = acress => w_1 = actress (t deletion), w_2 = across (sub o with e), …
(2) For each w_i compute P(O|w_i) P(w_i): the probability of the error generating O from w_i, times the word prior.
(3) Sort and display the top n to the user.
How to do (1): generate all the strings that could have generated O by one error, then keep only those that are words (see the sketch below).
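One way to implement step (1) is the candidate generator from Norvig's spell-corrector linked later in the deck; a minimal sketch, with a stand-in lexicon (a real system would use a full dictionary):

```python
import string

LEXICON = {"actress", "across", "acres", "access", "caress", "cress"}  # stand-in

def edits1(word):
    """All strings one insertion, deletion, substitution, or
    transposition away from `word`."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def candidates(word):
    """Step (1): keep only the generated strings that are real words."""
    return edits1(word) & LEXICON

print(candidates("acress"))
# {'actress', 'across', 'acres', 'access', 'caress', 'cress'}
```

Generating all edits and filtering against the lexicon is cheap: a word of length n has only about 54n + 25 single-edit neighbors.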
Slide 16
Example: collect all the w_i that could have generated "acress" by one error. [Table residue: for each split point of a-c-r-e-s-s, the number of candidate words reachable by a deletion, transposition, alteration, or insertion at that position.]
Slide 17
Example: O = acress, observed in context "…stellar and versatile acress whose…". [Table residue: the candidate corrections with their priors and error likelihoods, estimated from the 1988 AP newswire corpus, 44 million words.]
Slide 18
Evaluation. [Table residue: the system's suggestions compared against the "correct" word, broken down into categories 0, 1, 2, other.]
Slide 19
Corpora: issues to remember
– Zero counts in the corpus: just because an event didn't happen in the corpus doesn't mean it won't happen; e.g., cress does not really have zero probability.
– Getting a corpus that matches the actual use: e.g., kids don't misspell the same way that adults do.
Slide 20
Multiple spelling errors
(BEFORE) Given O, collect all the w_i that could have generated O by one error.
(NOW) Given O, collect all the w_i that could have generated O by 1..k errors.
How (for two errors): collect all the strings that could have generated O by one error, then collect all the w_i that could have generated one of those strings by one error; and so on for larger k (see the sketch below).
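A sketch for k = 2, reusing the hypothetical edits1 helper and LEXICON from the earlier sketch:

```python
def edits2(word):
    """All strings reachable from `word` by two single edits:
    apply edits1 to every string in edits1(word)."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}

def candidates_up_to_2(word):
    # keep only real words, preferring the closer candidates
    return (edits1(word) & LEXICON) or (edits2(word) & LEXICON)
```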
Slide 21
Final method, multiple errors:
(1) Given O, for each w_i that can be generated from O by a sequence of edit operations EdOp_i, save EdOp_i.
(2) For each w_i compute P(O|w_i) P(w_i): the probability of the errors generating O from w_i, times the word prior.
(3) Sort and display the top n to the user.
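The slides omit the formula for the likelihood term here, but if the individual edit operations are assumed independent it factors as roughly:

```latex
P(O \mid w_i) \approx \prod_{j} P(\mathrm{EdOp}_{i,j})
```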
Slide 22
Spelling: the problem(s) (recap)
Detection and correction, across four settings:
– Non-word, isolated: find the most likely correct word (funn -> funny, funnel, ...)
– Non-word, in context: ... in this context (trust funn; a lot of funn)
– Real-word, isolated: ?!
– Real-word, in context: is it an impossible (or very unlikely) word in this context? (... a wild dig.) Find the most likely substitution word in this context.
Slide 23
Real-word spelling errors
Collect a set of common confusion sets C = {C_1 .. C_n}, e.g., {(their/they're/there), (to/too/two), (weather/whether), (lave/have), …}.
Whenever some c' ∈ C_i is encountered:
– Compute the probability of the sentence in which it appears.
– Substitute each c ∈ C_i (c ≠ c') and compute the probability of the resulting sentence.
– Choose the highest-probability version (see the sketch below).
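A sketch of this procedure; `sentence_prob` stands in for any sentence-probability scorer (e.g., the n-gram models introduced below) and is not defined on the slides:

```python
CONFUSION_SETS = [{"their", "they're", "there"},
                  {"to", "too", "two"},
                  {"weather", "whether"}]

def correct_real_words(words, sentence_prob):
    """For each (lowercased) word belonging to a confusion set, try
    every alternative from that set and keep the sentence the scorer
    likes best."""
    best = [w.lower() for w in words]
    for i, w in enumerate(best):
        for cset in CONFUSION_SETS:
            if w in cset:
                for alt in cset:
                    candidate = best[:i] + [alt] + best[i + 1:]
                    if sentence_prob(candidate) > sentence_prob(best):
                        best = candidate
    return best
```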
Slide 24
Want to play with spelling correction? A minimal noisy channel model implementation (Python): http://www.norvig.com/spell-correct.html
By the way, Peter Norvig is Director of Research at Google Inc. (He will be visiting our dept. on Thurs!)
Slide 25
Today (Sep 21)
Dealing with spelling errors
– Noisy channel model
– Bayes rule applied to the noisy channel model (single and multiple spelling errors)
Min edit distance (?)
Start n-gram models: language models
Slide 26
Minimum edit distance
Def.: the minimum number of edit operations (insertion, deletion, and substitution) needed to transform one string into another.
Example: gumbo -> gumb (delete o) -> gum (delete b) -> gam (substitute u by a).
Slide 27
Minimum edit distance algorithm
Dynamic programming (a very common technique in NLP). High-level description:
– Fills in a matrix of partial comparisons.
– The value of a cell is computed as a "simple" function of the surrounding cells.
– Output: not only the number of edit operations but also the sequence of operations.
Slide 28
Minimum edit distance algorithm, details
ed[i,j] = minimum distance between the first i characters of the source and the first j characters of the target. Costs: del-cost = 1, ins-cost = 1, sub-cost = 2 (0 if the two characters are equal). Update rule, from the three neighboring cells:
ed[i,j] = MIN( ed[i-1,j] + 1 (deletion), ed[i,j-1] + 1 (insertion), ed[i-1,j-1] + 2 or + 0 (substitution or equal) )
A Python rendering follows.
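A direct Python rendering of the recurrence above, with the slide's costs (delete = 1, insert = 1, substitute = 2, match = 0):

```python
def min_edit_distance(source, target):
    """Dynamic-programming edit distance with del=1, ins=1, sub=2."""
    n, m = len(source), len(target)
    ed = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):   # distance from a source prefix to ""
        ed[i][0] = i
    for j in range(1, m + 1):   # distance from "" to a target prefix
        ed[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 2
            ed[i][j] = min(ed[i - 1][j] + 1,        # deletion
                           ed[i][j - 1] + 1,        # insertion
                           ed[i - 1][j - 1] + sub)  # substitution or equal
    return ed[n][m]

print(min_edit_distance("gumbo", "gam"))
# 4: delete o, delete b, substitute u by a
```

Keeping back-pointers alongside each cell would also recover the sequence of operations (the alignment) mentioned on the previous slide.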
Slide 30
Minimum edit distance and alignment: see demo.
Slide 31
Today (Sep 21)
Dealing with spelling errors
– Noisy channel model
– Bayes rule applied to the noisy channel model (single and multiple spelling errors)
Min edit distance (?)
Start n-gram models: language models
Slide 32
Key transition: up to this point we've mostly been discussing words in isolation. Now we're switching to sequences of words, and we're going to worry about assigning probabilities to sequences of words.
Slide 33
Knowledge-Formalisms Map (including probabilistic formalisms). [Diagram repeated from Slide 2: state machines and probabilistic versions (finite state automata, finite state transducers, Markov models); rule systems and probabilistic versions (e.g., (probabilistic) context-free grammars); logical formalisms (first-order logics); AI planners; against morphology, syntax, semantics, pragmatics, discourse and dialogue.]
Slide 34
Only spelling?
A. Assign a probability to a sentence: part-of-speech tagging, word-sense disambiguation, probabilistic parsing.
B. Predict the next word: speech recognition, handwriting recognition, augmentative communication for the disabled.
In both cases the required probabilities are impossible to estimate directly.
Slide 35
Decompose: apply the chain rule
Chain rule applied to a word sequence from position 1 to n:
P(w_1^n) = P(w_1) P(w_2|w_1) P(w_3|w_1^2) … P(w_n|w_1^{n-1})
Slide 36
Example sequence: "The big red dog barks"
P(The big red dog barks) = P(The) * P(big|The) * P(red|The big) * P(dog|The big red) * P(barks|The big red dog)
Note: P(The) is better expressed as P(The|<s>), where <s> marks the start of the sentence.
Slide 37
Not a satisfying solution: even for small n (e.g., 6) we would need a far too large corpus to estimate P(w_n|w_1^{n-1}).
Markov assumption: the entire prefix history isn't necessary.
– unigram: P(w_n|w_1^{n-1}) ≈ P(w_n)
– bigram: P(w_n|w_1^{n-1}) ≈ P(w_n|w_{n-1})
– trigram: P(w_n|w_1^{n-1}) ≈ P(w_n|w_{n-2} w_{n-1})
Slide 38
Probability of a sentence: n-grams
– unigram: P(w_1^n) ≈ ∏_{k=1..n} P(w_k)
– bigram: P(w_1^n) ≈ ∏_{k=1..n} P(w_k|w_{k-1})
– trigram: P(w_1^n) ≈ ∏_{k=1..n} P(w_k|w_{k-2} w_{k-1})
Slide 39
Bigram: "The big red dog barks"
P(The big red dog barks) = P(The|<s>) * P(big|The) * P(red|big) * P(dog|red) * P(barks|dog)
Trigram?
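A sketch of scoring this sentence under a bigram model; the probability table here is invented purely for illustration:

```python
# hypothetical bigram probabilities P(w_n | w_{n-1})
BIGRAM_P = {("<s>", "the"): 0.10, ("the", "big"): 0.02,
            ("big", "red"): 0.05, ("red", "dog"): 0.03,
            ("dog", "barks"): 0.04, ("barks", "</s>"): 0.30}

def bigram_sentence_prob(words):
    """Multiply P(w_n | w_{n-1}) over the sentence, padded with the
    sentence-boundary markers <s> and </s>."""
    padded = ["<s>"] + [w.lower() for w in words] + ["</s>"]
    prob = 1.0
    for prev, cur in zip(padded, padded[1:]):
        prob *= BIGRAM_P.get((prev, cur), 0.0)  # unseen bigram -> 0
    return prob

print(bigram_sentence_prob("The big red dog barks".split()))
```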
Slide 40
Estimates for n-grams (relative frequencies from a corpus)
– bigram: P(w_n|w_{n-1}) = C(w_{n-1} w_n) / C(w_{n-1})
– in general: P(w_n|w_{n-N+1}^{n-1}) = C(w_{n-N+1}^{n-1} w_n) / C(w_{n-N+1}^{n-1})
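A minimal sketch of these relative-frequency estimates on a toy corpus:

```python
from collections import Counter

def train_bigrams(sentences):
    """Estimate P(w_n | w_{n-1}) = C(w_{n-1} w_n) / C(w_{n-1})."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        words = ["<s>"] + sent.lower().split() + ["</s>"]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return {(u, v): c / unigrams[u] for (u, v), c in bigrams.items()}

probs = train_bigrams(["the big red dog barks", "the dog barks"])
print(probs[("the", "big")])  # C(the big) / C(the) = 1/2 = 0.5
```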
Slide 41
Next time: n-grams (Chp. 4); model evaluation (Sec. 4.4); no smoothing (Secs. 4.5-4.7).