
Automatic Spelling Correction: Probability Models and Algorithms


1 Automatic Spelling Correction: Probability Models and Algorithms
– Motivation and Formulation
– Demonstration of a Prototype Program
– The Underlying Probability Models
– Algorithms for Automatic Correction
– Conclusion

2 Motivation and Formulation
A set of words: the vocabulary.
Single-word correction: Given any character string S that may or may not belong to the vocabulary, match S with the most likely word W in the vocabulary.
Example: vocabulary = {is, are, am}
– iis → is
– ae → are
– an → am

3 Motivation and Formulation
Multiple-word correction: Given a series of character strings S_1 S_2 … S_m, each of which may or may not belong to the vocabulary, match them with the most likely word series W_1 W_2 … W_m formed by words from the vocabulary.
Example: vocabulary = {I, is, are, am}
– ii bn → I am

4 Motivation and Formulation
Given a word w, what do we mean by the most likely word for w in the vocabulary?
→ Needs some probability models.
How do we find the most likely word for w?
→ Needs the development of algorithms.

5 Probability Models: Typical Typos
Errors in the transition of mental states:
– Repeating characters: iis → is
– Skipping characters: ae → are
Mentally right, but the finger wrongly lands on a nearby key:
– an → am

6 Probability Models
The Word Model: for each word w, how do we probabilistically transition from one mental state of trying to type some character in the word to another?
e.g. Ideally: a → r → e
but things like a → a → r → e or a → e could happen.
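A minimal sketch of how such a word model could be written down, assuming one state per character plus a start and an end state, and assuming only three kinds of move (repeat, next, skip one). The function name `word_model` and the probabilities 0.1 / 0.8 / 0.1 are illustrative assumptions, not values from the presentation.

```python
def word_model(word, p_repeat=0.1, p_next=0.8, p_skip=0.1):
    """Return trans[i][j] over states 0 (start), 1..w (characters), w+1 (end).

    The three move types and their probabilities are assumptions made
    for illustration only.
    """
    w = len(word)
    end = w + 1
    trans = {}
    for i in range(end):                 # the end state has no outgoing moves
        moves = {}
        if i >= 1:
            moves[i] = p_repeat          # type the same character again
        moves[i + 1] = p_next            # go on to the next character (or finish)
        if i + 2 <= end:
            moves[i + 2] = p_skip        # skip one character
        total = sum(moves.values())      # renormalise where a move is unavailable
        trans[i] = {j: p / total for j, p in moves.items()}
    return trans


trans = word_model("are")
for state, row in trans.items():
    print(state, row)
```

For the word "are", state 1 is "a", so `trans[1]` covers a → a (repeat), a → r (ideal) and a → e (skip), matching the cases listed on the slide.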

7 Probability Models
The Keyboard Model (i.e. the counterpart of the acoustic model in speech recognition): for a mental state of trying to type a character c in a word, what is the probability distribution over the keys actually touched?
e.g. Ideally: you want to type a → you touch a
but you might touch b, q, z, s, w, x, …
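One possible shape for a keyboard model, as a sketch: most of the probability goes to the intended key, some to physically adjacent keys, and a small remainder to every other letter. The QWERTY grid approximation (rows treated as aligned columns), the names `neighbors` and `keyboard_model`, and the numbers 0.9 / 0.09 are assumptions for illustration.

```python
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]   # letters only, rows treated as a grid
LETTERS = "".join(ROWS)


def neighbors(c):
    """Keys within one row and one column of c (a rough adjacency assumption)."""
    near = set()
    for r, row in enumerate(ROWS):
        if c in row:
            col = row.index(c)
            for rr in (r - 1, r, r + 1):
                if 0 <= rr < len(ROWS):
                    for cc in (col - 1, col, col + 1):
                        if 0 <= cc < len(ROWS[rr]):
                            near.add(ROWS[rr][cc])
    near.discard(c)
    return near


def keyboard_model(c, p_hit=0.9, p_near=0.09):
    """Distribution over touched keys when the typist intends the letter c."""
    near = neighbors(c)
    far = [k for k in LETTERS if k != c and k not in near]
    p_far = 1.0 - p_hit - p_near         # leftover mass for all distant keys
    dist = {c: p_hit}
    for k in near:
        dist[k] = p_near / len(near)
    for k in far:
        dist[k] = p_far / len(far)
    return dist


print(sorted(neighbors("a")))
print(keyboard_model("m")["n"], keyboard_model("m")["b"])
```

Under this approximation the neighbours of "a" are q, w, s, z and x, and intending "m" makes "n" far more likely than a distant key such as "b".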

8 Probability Models
The Language Model (i.e. the sentence model): how do we put words together to form sentences?
The language model is not absolutely necessary for single-word correction, but by considering the context it can further improve the accuracy of both single-word and multiple-word correction.

9 Probability Models
The Language Model (i.e. the sentence model): for example, a bigram language model shows how likely each individual word is to appear in a sentence and how likely one word is to follow another.
Such knowledge can help, e.g. you see two words: I an
"I an" is much more likely to have been generated from "I am" than from "I a".
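A small sketch of how bigram probabilities could be estimated by counting word pairs. The four-sentence corpus is made up for illustration, `train_bigram` is a hypothetical helper, and no smoothing is applied, so unseen pairs get probability zero.

```python
from collections import Counter


def train_bigram(sentences):
    """Estimate P(cur | prev) from raw counts; "<s>" marks the sentence start."""
    prev_count = Counter()
    pair_count = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        for prev, cur in zip(words, words[1:]):
            prev_count[prev] += 1
            pair_count[(prev, cur)] += 1

    def prob(prev, cur):
        if prev_count[prev] == 0:
            return 0.0                   # prev never seen: no estimate available
        return pair_count[(prev, cur)] / prev_count[prev]

    return prob


# Toy corpus, invented purely for this example.
corpus = ["i am here", "i am late", "you are late", "it is late"]
prob = train_bigram(corpus)
print(prob("<s>", "i"), prob("i", "am"), prob("i", "a"))
```

In this toy corpus "am" always follows "i" and "a" never does, which is the kind of evidence that favours "I am" over "I a".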

10 Algorithms
Calculate the probability of generating a character string S of s characters when trying to type a word W of w characters.
– O(sw^2) using dynamic programming
– O(w^s) using a naïve approach
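A sketch of the dynamic-programming idea, written as a forward pass over character states. It assumes a toy word model with only repeat / next / skip moves and a toy keyboard model with uniform error probabilities; `generation_prob`, `simple_emit` and all numbers are illustrative assumptions. Because this sketch allows only three moves per state it does O(sw) work; the O(sw^2) bound on the slide corresponds to allowing a transition between any pair of the w states.

```python
def simple_emit(intended, touched, p_hit=0.9):
    """Toy keyboard model: the right key with p_hit, else uniform over 25 others."""
    return p_hit if touched == intended else (1.0 - p_hit) / 25


def generation_prob(s, word, emit, p_repeat=0.1, p_next=0.8, p_skip=0.1):
    """P(string s is typed | the typist intends word), summed over all state paths."""
    w = len(word)
    end = w + 1                          # states: 0 start, 1..w characters, w+1 end

    def moves(i):
        m = {}
        if i >= 1:
            m[i] = p_repeat              # repeat the current character
        m[i + 1] = p_next                # next character, or finish
        if i + 2 <= end:
            m[i + 2] = p_skip            # skip one character
        total = sum(m.values())
        return {j: p / total for j, p in m.items()}

    # alpha[i]: probability of having typed the keys so far and being in state i
    alpha = {0: 1.0}
    for key in s:
        nxt = {}
        for i, a in alpha.items():
            for j, p in moves(i).items():
                if 1 <= j <= w:          # only character states produce a key
                    nxt[j] = nxt.get(j, 0.0) + a * p * emit(word[j - 1], key)
        alpha = nxt
    # After the last key the typist must move into the end state.
    return sum(a * moves(i).get(end, 0.0) for i, a in alpha.items())


for typed in ("are", "ae", "aare"):
    print(typed, generation_prob(typed, "are", simple_emit))
```

Each key press updates one table of at most w entries, instead of enumerating every possible sequence of states as the naïve approach would.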

11 Algorithms
Single-word correction: Determine the most likely word from a vocabulary of v words (with at most w characters per word) for a string S of s characters.
– O(vsw^2) using dynamic programming
For each word W in the vocabulary, calculate the probability of generating S from W, weight it by the frequency of W, and pick the most likely one.
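The selection step can be sketched as an argmax over the vocabulary of (word frequency) × (probability of generating S from the word). The generation probability below is a compact forward pass under an assumed repeat / next / skip word model and a uniform-error keyboard model; the word frequencies, the function names and all numeric parameters are hypothetical.

```python
def emit(intended, touched, p_hit=0.9):
    """Toy keyboard model (assumption): uniform error over the 25 other letters."""
    return p_hit if touched == intended else (1.0 - p_hit) / 25


def generation_prob(s, word, p_repeat=0.1, p_next=0.8, p_skip=0.1):
    """Forward pass: P(s | word) under a toy repeat/next/skip word model."""
    w, end = len(word), len(word) + 1

    def moves(i):
        m = {i + 1: p_next}
        if i >= 1:
            m[i] = p_repeat
        if i + 2 <= end:
            m[i + 2] = p_skip
        total = sum(m.values())
        return {j: p / total for j, p in m.items()}

    alpha = {0: 1.0}
    for key in s:
        nxt = {}
        for i, a in alpha.items():
            for j, p in moves(i).items():
                if 1 <= j <= w:
                    nxt[j] = nxt.get(j, 0.0) + a * p * emit(word[j - 1], key)
        alpha = nxt
    return sum(a * moves(i).get(end, 0.0) for i, a in alpha.items())


def correct(s, word_freq):
    """Return the vocabulary word maximising frequency * P(s | word)."""
    return max(word_freq, key=lambda word: word_freq[word] * generation_prob(s, word))


# Hypothetical relative word frequencies for the three-word example vocabulary.
word_freq = {"is": 0.4, "are": 0.3, "am": 0.3}
for typed in ("iis", "ae", "an"):
    print(typed, "->", correct(typed, word_freq))
```

With these assumed parameters the three strings from the earlier example resolve to "is", "are" and "am" respectively.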

12 Algorithms
Multiple-word correction: Determine the most likely word series W_1 W_2 … W_m of m words from a vocabulary of v words (with at most w characters in each word) for m strings S_1 S_2 … S_m (with at most s characters in each string).
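The presentation does not spell out the algorithm for this step. One common way to search for the best word series is a Viterbi-style dynamic program that combines a per-string channel probability with a bigram language model; the sketch below takes that approach as an assumption. `correct_sequence` is a hypothetical name, and the channel and bigram tables hold placeholder values standing in for the word, keyboard and language models.

```python
import math


def correct_sequence(strings, vocab, channel, bigram):
    """Viterbi search for the word series maximising channel * bigram probabilities.

    Assumes `strings` is non-empty. channel(s, w) stands for P(s typed | w
    intended) and bigram(prev, w) for P(w | prev), with "<s>" as sentence start.
    """
    def lg(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[w] = (log-probability of the best series ending in w, that series)
    best = {w: (lg(bigram("<s>", w)) + lg(channel(strings[0], w)), [w]) for w in vocab}
    for s in strings[1:]:
        nxt = {}
        for w in vocab:
            score, path = max(
                ((best[p][0] + lg(bigram(p, w)), best[p][1]) for p in vocab),
                key=lambda t: t[0],
            )
            nxt[w] = (score + lg(channel(s, w)), path + [w])
        best = nxt
    return max(best.values(), key=lambda t: t[0])[1]


vocab = ["i", "is", "are", "am"]

# Placeholder probabilities, invented for illustration only.
CHANNEL = {("ii", "i"): 0.07, ("ii", "is"): 0.01, ("bn", "am"): 1e-5, ("bn", "is"): 1e-5}
BIGRAM = {("<s>", "i"): 0.7, ("i", "am"): 0.9, ("i", "is"): 0.02}


def channel(s, w):
    return CHANNEL.get((s, w), 1e-7)


def bigram(prev, w):
    return BIGRAM.get((prev, w), 0.01)


print(correct_sequence(["ii", "bn"], vocab, channel, bigram))  # ['i', 'am']
```

In this toy setup "bn" is equally likely to come from "am" or "is" under the channel table, so the bigram model is what tips the result towards "i am".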

13 Conclusion
– Similar modeling and analysis are applicable to speech recognition.
– Mathematical structures provide powerful tools for modeling and analysis.
– Design and analysis of algorithms are important to real-world problem solving.
– Mathematical structures and algorithms: two key components of modern AI research.

