
A BAYESIAN APPROACH TO SPELLING CORRECTION

'Noisy channels'
In a number of tasks involving natural language, the problem can be viewed as recovering an 'original signal' distorted by a 'noisy channel':
– Speech recognition
– Spelling correction
– OCR / handwriting recognition
– (less felicitously, perhaps) pronunciation variation
This metaphor has provided the justification for the Bayesian approach to statistical NLP, which has also found application outside these areas.

Spelling Errors
Examples (note that every misspelling below is itself a real word):
– They are leaving in about fifteen minuets to go to her house.
– The study was conducted mainly be John Black.
– The design an construction of the system will take more than one year.
– Hopefully, all with continue smoothly in my absence.
– Can they lave him my messages?
– I need to notified the bank of this problem.
– He is trying to fine out.

Handwriting recognition
From Woody Allen's Take the Money and Run (1969): Allen (a bank robber) walks up to the teller and hands over a note that reads "I have a gun. Give me all your cash." The teller, however, is puzzled, because he reads "I have a gub." "No, it's gun," Allen says. "Looks like 'gub' to me," the teller says, then asks another teller to help him read the note, then another, and finally everyone is arguing over what the note means.

Spelling errors
How common are spelling errors?
– 0.05% in carefully edited newswire
– 1-3% in 'normal' human-written text
– 20% of web queries are misspelled (Google includes spelling-correction algorithms)
– 38% in applications like directory lookup
– Handwriting recognition errors: Apple Newton: 2-3%

Types of spelling errors
Damerau (1964): 80% of all misspelled words (non-word errors) are caused by SINGLE-ERROR MISSPELLINGS:
– INSERTION: the → ther
– DELETION: the → th
– SUBSTITUTION: the → thw
– TRANSPOSITION: the → hte

Dealing with spelling errors
Kukich (1992) distinguishes three increasingly broad problems:
– NON-WORD ERROR DETECTION: detecting that 'graffe' (for 'giraffe') is not a word
– ISOLATED WORD-ERROR CORRECTION: replacing 'graffe' with 'giraffe' without looking at the context
– CONTEXT-DEPENDENT ERROR DETECTION / CORRECTION: also detecting spelling errors that result in a real word

Detecting non-word errors: dictionaries
Peterson (1986): large dictionaries may do more harm than good, because rare words can mask misspellings of common ones:
– wont (masking a typo for won't or want)
– veery (masking a typo for very)
Damerau and Mays (1989): found no evidence that this was the case.

The Noisy Channel model
(Diagram: a source emits the intended word; the channel introduces errors; the decoder searches for the source word most likely to have produced the observed output.)

Bayesian inference
'Bayesian inference' is the name given to techniques, typically used in diagnostics, for identifying the CAUSE of certain OBSERVATIONS. The name 'Bayesian' comes from the fact that Bayes' rule is used to 'turn the problem around': from computing the posterior probability of the CAUSE given the OBSERVATIONS to computing the likelihood of the OBSERVATIONS given the CAUSE (together with the prior probability of the CAUSE).

Bayesian inference: the equations
(These are equations that we will encounter again and again for different tasks.) The statistical formulation of the problem of finding the most likely explanation c for an observation O:

$$\hat{c} = \arg\max_{c} P(c \mid O)$$

Using Bayes' Rule, this probability can be 'turned around':

$$P(c \mid O) = \frac{P(O \mid c)\,P(c)}{P(O)}$$

Bayesian equations, 2
Some of these quantities are easy to compute, but others much less so, especially P(O). Fortunately, we do not really need to compute this term: it is the same for ALL explanations, so it drops out of the maximization:

$$\hat{c} = \arg\max_{c} P(O \mid c)\,P(c)$$

This equation is a pattern that we will encounter again and again.
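To make the pattern concrete, here is a minimal Python sketch of the decision rule. Everything in it is illustrative: the function names and the toy probability values are hypothetical stand-ins for the models estimated in the following slides, not the paper's implementation.

```python
# Minimal sketch of the noisy-channel decision rule: choose the
# candidate c maximizing P(O|c) * P(c). P(O) is dropped because it
# is the same for every candidate. All numbers below are toy values.

def best_correction(observed, candidates, likelihood, prior):
    return max(candidates, key=lambda c: likelihood(observed, c) * prior(c))

# Hypothetical usage with made-up probabilities:
priors = {"actress": 3.15e-5, "across": 1.9e-4, "acres": 6.5e-5}
likelihoods = {"actress": 1.17e-4, "across": 9.3e-6, "acres": 3.3e-5}

print(best_correction("acress", list(priors),
                      lambda obs, c: likelihoods[c],
                      lambda c: priors[c]))  # actress
```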

Applying the Bayesian method to spelling: Kernighan et al., 1990
correct takes words rejected by the Unix spell program and generates a list of potential correct words. Two steps:
1. Proposing candidate corrections
2. Scoring the candidates
This is an example of isolated word-error correction.

Proposing candidate corrections
The noisy channel assumption: the misspelled word is the result of a 'noisy channel', i.e. of the typist performing a single MISTYPING OPERATION. Four possible operations:
– INSERTION: x → xy
– DELETION: xy → x
– SUBSTITUTION: y → x
– REVERSAL: xy → yx
At most one operation is involved (cf. Damerau, 1964).
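As an illustration of the candidate-generation step (a sketch, not Kernighan et al.'s actual code), the following Python function enumerates every string one operation away from the typo and keeps those found in a lexicon; the six-word lexicon is simply the candidate set from the next slide.

```python
# Sketch of candidate generation under the single-edit assumption:
# applying the inverse of each operation to the typo recovers every
# word at edit distance one, then a lexicon filters out non-words.

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def single_edits(word):
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    inserts = [L + ch + R for L, R in splits for ch in ALPHABET]
    substitutes = [L + ch + R[1:] for L, R in splits if R for ch in ALPHABET]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    return set(deletes + inserts + substitutes + transposes)

# Hypothetical mini-lexicon; a real system would use a full dictionary.
lexicon = {"actress", "cress", "caress", "access", "across", "acres"}
print(sorted(single_edits("acress") & lexicon))
# ['access', 'acres', 'across', 'actress', 'caress', 'cress']
```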

Example: acress

Error    Correction   Correct letter   Error letter   Position   Type
acress   actress      t                -              2          del
acress   cress        -                a              0          ins
acress   caress       ca               ac             0          transp
acress   access       c                r              2          sub
acress   across       o                e              3          sub
acress   acres        -                s              5          ins
acress   acres        -                s              4          ins

Scoring the candidates
Choose the correction with the highest probability. P(c) is estimated by MLE from frequencies in a 44M-word corpus, with Good-Turing smoothing to handle zero counts:

c         freq(c)   p(c)
actress   1343      .0000315
cress     0         .000000014
caress    4         .0000001
access    2280      .000058
across    8436      .00019
acres     2879      .000065
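As a hedged stand-in for the paper's Good-Turing estimates, the sketch below uses simple add-0.5 smoothing; the point it illustrates is the same, namely that a zero-frequency word such as cress must still receive non-zero prior probability.

```python
# Sketch of a smoothed unigram prior P(c). Add-0.5 smoothing is a
# simplification: Kernighan et al. themselves used Good-Turing.

from collections import Counter

def make_prior(corpus_tokens, vocabulary_size):
    counts = Counter(corpus_tokens)
    total = len(corpus_tokens)
    def prior(word):
        return (counts[word] + 0.5) / (total + 0.5 * vocabulary_size)
    return prior

# Toy usage: 'cress' never occurs, yet p(cress) > 0.
prior = make_prior(["across"] * 8 + ["acres"] * 3 + ["actress"],
                   vocabulary_size=6)
print(prior("cress"), prior("across"))  # 0.0333... 0.5666...
```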

A simplification
Ideally, the likelihoods would be estimated directly from a training corpus of typos aligned with the words the typist intended. For example, if actress was intended three times and once came out as acress, and acres was intended four times and once came out as acress, we would want:
P(acress|actress) = 1/3
P(acress|acres) = 1/4
Whole-word counts like these are far too sparse to estimate, so they are APPROXIMATED with counts over single-character edits, e.g.:
P(acress|actress) ≈ del[c,t] / count[ct]
(the number of times t was deleted after c, over the number of occurrences of the sequence ct).

Confusion matrices
The likelihood P(t|c) is difficult to compute directly, but can be estimated by looking at LOCAL FACTORS only: entry [m,n] in a CONFUSION MATRIX for SUBSTITUTION tells us how often n is typed instead of m. Kernighan et al. used four confusion matrices:
– del[x,y]: number of times x is typed instead of correct xy
– ins[x,y]: number of times xy is typed instead of correct x
– sub[x,y]: number of times y is typed instead of correct x
– trans[x,y]: number of times yx is typed instead of correct xy

Estimating the likelihood of a typo
With p the position of the error, c the candidate correction, and t the typo, the likelihood is read off the appropriate confusion matrix:

$$P(t \mid c) = \begin{cases} \dfrac{del[c_{p-1}, c_p]}{count[c_{p-1} c_p]} & \text{if deletion} \\ \dfrac{ins[c_{p-1}, t_p]}{count[c_{p-1}]} & \text{if insertion} \\ \dfrac{sub[c_p, t_p]}{count[c_p]} & \text{if substitution} \\ \dfrac{trans[c_p, c_{p+1}]}{count[c_p c_{p+1}]} & \text{if transposition} \end{cases}$$
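In code, the lookup might be sketched as below, assuming the four matrices are dictionaries keyed by character pairs and count maps character sequences to their corpus frequencies; the matrix entries shown are hypothetical, not the paper's trained counts.

```python
# Sketch of the channel model P(t|c) following the piecewise rule
# above. Matrices and counts here are hypothetical toy inputs.

def channel_probability(kind, x, y, del_m, ins_m, sub_m, trans_m, count):
    if kind == "del":      # correct had xy, typo has only x
        return del_m[x, y] / count[x + y]
    if kind == "ins":      # correct had x, typo has xy
        return ins_m[x, y] / count[x]
    if kind == "sub":      # correct letter x typed as y
        return sub_m[x, y] / count[x]
    if kind == "trans":    # correct xy typed as yx
        return trans_m[x, y] / count[x + y]
    raise ValueError(f"unknown edit type: {kind}")

# Toy usage: P(acress|actress) is a deletion of t after c.
p = channel_probability("del", "c", "t",
                        {("c", "t"): 12}, {}, {}, {}, {"ct": 10000})
print(p)  # 0.0012
```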

Resulting likelihoods

c         freq(c)   p(c)         p(t|c)       p(t|c)p(c)   %
actress   1343      .0000315     .000117      3.69×10⁻⁹    37%
cress     0         .000000014   .00000144    2.02×10⁻¹⁴   0%
caress    4         .0000001     .00000164    1.64×10⁻¹³   0%
access    2280      .000058      .000000209   1.21×10⁻¹¹   0%
across    8436      .00019       .0000093     1.77×10⁻⁹    18%
acres     2879      .000065      .0000321     2.09×10⁻⁹    21%
acres     2879      .000065      .0000342     2.22×10⁻⁹    23%

(acres appears twice because the extra s can have been inserted at either of two positions.)

Evaluation (3 judges, 329 triples)

Method       Results   Percent correct
correct      286/329   87 ± 1.9%
no prior     263/329   80 ± 2.2%
no channel   247/329   75 ± 2.4%
neither      172/329   52 ± 2.8%
Judge 1      271/273   99 ± 0.5%
Judge 2      271/275   99 ± 0.7%
Judge 3      271/281   96 ± 1.1%

More sophisticated methods
– MINIMUM EDIT DISTANCE: allow for the possibility of more than one error per word (see the sketch below)
– N-GRAM MODELS: use context, e.g. to detect real-word errors
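For the first of these, here is a sketch of the Damerau-Levenshtein distance (edit distance extended with adjacent transpositions), computed by standard dynamic programming; it illustrates the general technique rather than any particular paper's implementation.

```python
# Sketch of Damerau-Levenshtein (optimal string alignment) distance:
# minimum number of insertions, deletions, substitutions, and adjacent
# transpositions needed to turn s into t.

def damerau_levenshtein(s, t):
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                       # delete all of s[:i]
    for j in range(n + 1):
        d[0][j] = j                       # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]

print(damerau_levenshtein("acress", "actress"))  # 1 (one insertion)
print(damerau_levenshtein("acress", "caress"))   # 1 (one transposition)
```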

References
– Jurafsky, D. and Martin, J. H. Speech and Language Processing, Chapter 5.
– Kernighan, M. D., Church, K. W., and Gale, W. A. (1990). A spelling correction program based on a noisy channel model. In Proceedings of COLING-90.
– Kukich, K. (1992). Techniques for automatically correcting words in text. ACM Computing Surveys, 24(4).
More recent work:
– Brill, E. and Moore, R. (2000). An improved error model for noisy channel spelling correction. In Proceedings of ACL 2000.