Chapter 5. Probabilistic Models of Pronunciation and Spelling. May 4, 2007. Minho Kim, Artificial Intelligence Lab, Pusan National University. Text: Speech and Language Processing, pp. 141-189.


Outline
- Introduction
- 5.1 Dealing with Spelling Errors
- 5.2 Spelling Error Patterns
- 5.3 Detecting Non-Word Errors
- 5.4 Probabilistic Models
- 5.5 Applying the Bayesian Method to Spelling
- 5.6 Minimum Edit Distance
- 5.7 English Pronunciation Variation
- 5.8 The Bayesian Method for Pronunciation
- 5.9 Weighted Automata
- 5.10 Pronunciation in Humans

Introduction
- Introduce the problems of detecting and correcting spelling errors
- Summarize typical human spelling error patterns
- The essential probabilistic architecture:
  - Bayes' rule
  - Noisy channel model
- The essential algorithms:
  - Dynamic programming
  - Viterbi algorithm
  - Minimum edit distance algorithm
  - Forward algorithm
  - Weighted automaton

5.1 Dealing with Spelling Errors (1/2)
- The detection and correction of spelling errors is an integral part of modern word processors
- There are also applications in which even the individual letters aren't guaranteed to be accurately identified:
  - Optical character recognition (OCR)
  - On-line handwriting recognition
- Here: detection and correction of spelling errors, mainly in typed text
- OCR systems often misread "D" as "O" or "ri" as "n", producing "misspelled" words like dension for derision

5.1 Dealing with Spelling Errors (2/2)
- Kukich (1992) breaks the field down into three increasingly broader problems:
  - Non-word error detection (graffe for giraffe)
  - Isolated-word error correction (correcting graffe to giraffe)
  - Context-dependent error detection and correction (there for three, dessert for desert, piece for peace)

5.2 Spelling Error Patterns (1/2)
- Single-error misspellings (Damerau, 1964):
  - Insertion: mistyping the as ther
  - Deletion: mistyping the as th
  - Substitution: mistyping the as thw
  - Transposition: mistyping the as hte
- Kukich (1992) breaks down human typing errors into:
  - Typographic errors (spell as speel)
  - Cognitive errors (separate as seperate)
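As an illustration of Damerau's four single-error classes, here is a minimal Python sketch (not from the textbook; the function name and the lower-case alphabet are assumptions) that enumerates every string one insertion, deletion, substitution, or transposition away from a typed word. Intersecting this set with a dictionary is one simple way to propose correction candidates for a non-word like graffe:

def single_edit_candidates(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All strings reachable from `word` by one Damerau edit:
    insertion, deletion, substitution, or transposition."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletions      = {left + right[1:] for left, right in splits if right}
    transpositions = {left + right[1] + right[0] + right[2:]
                      for left, right in splits if len(right) > 1}
    substitutions  = {left + ch + right[1:]
                      for left, right in splits if right for ch in alphabet}
    insertions     = {left + ch + right
                      for left, right in splits for ch in alphabet}
    return deletions | transpositions | substitutions | insertions

print("giraffe" in single_edit_candidates("graffe"))   # True: one insertion away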

5.2 Spelling Error Patterns (2/2)
- OCR errors are usually grouped into five classes:
  - Substitutions (e → c)
  - Multi-substitutions (m → rn, he → b)
  - Space deletions or insertions
  - Failures (u → ~)
  - Framing errors

5.3 Detecting Non-word Errors
- Detecting non-word errors in text is usually done by dictionary lookup
- It has been argued that such dictionaries would need to be kept small, since large dictionaries contain very rare words that resemble misspellings of other words
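A minimal sketch of dictionary-based non-word detection, assuming a newline-delimited word list; the file name words.txt and the tokenizer regex are illustrative, not part of the chapter:

import re

def load_lexicon(path="words.txt"):
    """Read a newline-delimited word list into a set for fast lookup."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def non_word_errors(text, lexicon):
    """Return the tokens that do not appear in the lexicon (possible non-word errors)."""
    tokens = re.findall(r"[a-zA-Z]+", text)
    return [t for t in tokens if t.lower() not in lexicon]

# Usage, assuming words.txt exists:
# lexicon = load_lexicon("words.txt")
# non_word_errors("I saw a graffe at the zoo", lexicon)   # ['graffe'] if not listed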

5.4 Probabilistic Models (1/3)
- The intuition of the noisy channel model: treat the surface form as an instance of the lexical ("true") form that has been distorted by passing through a noisy channel
- Build a model of the channel so that we can figure out how it modified the "true" word and recover it
- Sources of noise: variation in pronunciation, variation in the realization of phones, acoustic variation due to the channel

5.4 Probabilistic Models (2/3)
- Given a string of phones (say [ni]), which word corresponds to this string of phones?
- Consider all possible words and pick the one for which P(word | observation) is highest:

    ŵ = argmax_{w ∈ V} P(w | O)    (5.1)

- ŵ : our estimate of the correct word w
- O : the observation sequence (the phone string [ni])
- argmax_x f(x) : the x such that f(x) is maximized

5.4 Probabilistic Models (3/3)
- By Bayes' rule:

    P(w | O) = P(O | w) P(w) / P(O)    (5.2)

- Substituting (5.2) into (5.1) gives:

    ŵ = argmax_{w ∈ V} P(O | w) P(w) / P(O)    (5.3)

- We can ignore P(O). Why? Because P(O) is the same for every candidate word w, so it does not change which w maximizes the expression:

    ŵ = argmax_{w ∈ V} P(O | w) P(w)    (5.4)

- P(w) is called the prior probability
- P(O | w) is called the likelihood
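A small sketch of equation (5.4) in code: rank candidate words by likelihood times prior. The candidate set echoes the [ni] example, but the numbers are invented placeholders rather than the corpus estimates used in the chapter:

# Illustrative noisy-channel ranking (equation 5.4): score(w) = P(O|w) * P(w).
# The priors and likelihoods below are made-up placeholders, not corpus estimates.
prior = {          # P(w): prior probability of each candidate word
    "the": 0.016, "new": 0.001, "need": 0.0005, "knee": 0.00002,
}
likelihood = {     # P(O|w): probability that w is realized as the observed [ni]
    "the": 0.0, "new": 0.36, "need": 0.11, "knee": 0.88,
}

def best_word(candidates):
    """Pick argmax_w P(O|w) * P(w) over the candidate words."""
    return max(candidates, key=lambda w: likelihood[w] * prior[w])

print(best_word(["the", "new", "need", "knee"]))   # 'new' with these toy numbers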

5.5 Applying the Bayesian Method to Spelling (1/5)

5.5 Applying the Bayesian Method to Spelling (2/5)

5.5 Applying the Bayesian Method to Spelling (3/5)
- P(acress | across) → estimated from the number of times that e was substituted for o in some large corpus of errors
- Confusion matrix:
  - A square 26 x 26 table
  - Each entry gives the number of times one letter was incorrectly used instead of another
  - [o, e] in a substitution confusion matrix: the count of times that e was substituted for o

5.5 Applying the Bayesian Method to Spelling (4/5)
- del[x, y]: the number of times in the training set that the characters xy in the correct word were typed as x
- ins[x, y]: the number of times in the training set that the character x in the correct word was typed as xy
- sub[x, y]: the number of times that x was typed as y
- trans[x, y]: the number of times that xy was typed as yx
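A hedged sketch of how these four matrices give the channel likelihood P(typo | word) for a single-error misspelling, in the spirit of the confusion-matrix model above. The dictionary data structures, the word-boundary symbol '#', and the indexing convention are my own choices, not the book's notation:

# Illustrative channel likelihood P(typo | word) for a single error at position p
# of the correct word. The four confusion matrices are assumed to be dicts keyed
# by character pairs, and the count tables dicts keyed by character strings.

def channel_probability(error_type, p, typo, word,
                        del_m, ins_m, sub_m, trans_m,
                        unigram_count, bigram_count):
    """P(typo|word) when a single error of kind `error_type` occurs at position p
    (0-based) of the correct word; '#' stands for the word boundary."""
    prev = word[p - 1] if p > 0 else "#"
    if error_type == "deletion":        # correct ...xy... typed as ...x...
        return del_m[prev, word[p]] / bigram_count[prev + word[p]]
    if error_type == "insertion":       # correct ...x... typed as ...xy...
        return ins_m[prev, typo[p]] / unigram_count[prev]
    if error_type == "substitution":    # correct x typed as y
        return sub_m[typo[p], word[p]] / unigram_count[word[p]]
    if error_type == "transposition":   # correct xy typed as yx
        return trans_m[word[p], word[p + 1]] / bigram_count[word[p] + word[p + 1]]
    raise ValueError(f"unknown error type: {error_type}")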

5.5 Applying the Bayesian Method to Spelling (5/5)

5.6 Minimum Edit Distance (1/6)
- String distance: some metric of how alike two strings are to each other
- Minimum edit distance: the minimum number of editing operations needed to transform one string into another
- Operations: insertion, deletion, substitution
- For example, the gap between intention and execution is five operations
- Ways to represent the edits: trace, alignment, operation list (Figure 5.4)

5.6 Minimum Edit Distance (2/6)

5.6 Minimum Edit Distance (3/6)
- Levenshtein distance: assign a particular cost or weight to each of the operations
- Simplest weighting: each of the three operations has a cost of 1
  - Levenshtein distance between intention and execution is 5
- Alternate version: substitutions have a cost of 2 (a substitution can be viewed as one deletion plus one insertion)
- The minimum edit distance is computed by dynamic programming

5.6 Minimum Edit Distance (4/6)
- Dynamic programming: a large problem can be solved by properly combining the solutions to various subproblems
- Examples of dynamic programming algorithms:
  - Minimum edit distance for spelling error correction
  - Viterbi and the forward algorithm for speech recognition
  - CYK and Earley for parsing
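A sketch of the dynamic-programming computation of minimum edit distance (the function name and cost parameters are mine; with unit costs it reproduces the distance of 5 between intention and execution, and sub_cost=2 gives the alternate Levenshtein weighting):

def min_edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Fill the standard (len(source)+1) x (len(target)+1) distance table."""
    n, m = len(source), len(target)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):               # deleting all of source[:i]
        dist[i][0] = dist[i - 1][0] + del_cost
    for j in range(1, m + 1):               # inserting all of target[:j]
        dist[0][j] = dist[0][j - 1] + ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                       # deletion
                dist[i][j - 1] + ins_cost,                       # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # substitution / copy
            )
    return dist[n][m]

print(min_edit_distance("intention", "execution"))              # 5 with unit costs
print(min_edit_distance("intention", "execution", sub_cost=2))  # 8 with substitutions costing 2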

5.6 Minimum Edit Distance (5/6)

5.6 Minimum Edit Distance (6/6)

5.8 The Bayesian Method for Pronunciation (1/6)
- The Bayesian algorithm can be used to solve what is often called the pronunciation subproblem in speech recognition
- Example: [ni] occurring after the word I at the beginning of a sentence
  - Investigation of the Switchboard corpus produces a total of 7 candidate words: the, neat, need, new, knee, to, you (see Chapter 4)
- Two components:
  - Candidate generation
  - Candidate scoring

5.8 The Bayesian Method for Pronunciation (2/6)
- Speech recognizers often use an alternative architecture that spends storage to save computation:
  - Each pronunciation is expanded in advance with all possible variants, which are then pre-stored with their scores
  - Thus there is no need for on-line candidate generation
  - The surface string [ni] is simply stored with the list of words that can generate it

5.8 The Bayesian Method for Pronunciation (3/6)
- Scoring uses the same noisy-channel equation as for spelling:

    ŵ = argmax_w P(y | w) P(w)

  - y represents the observed sequence of phones
  - w represents the candidate word
- It turns out that confusion matrices don't do as well for pronunciation:
  - The changes in pronunciation between a lexical and a surface form are much greater
  - Probabilistic models of pronunciation variation include many more factors than a simple confusion matrix can capture
- One simple way to generate pronunciation likelihoods is via probabilistic rules

5.8 The Bayesian Method for Pronunciation (4/6)
- Example rule: a word-initial [ð] becomes [n] if the preceding word ended in [n], or sometimes [m]
- The probability of the rule applying is estimated from a corpus:

    p = ncount / envcount

  - ncount: the number of times lexical [ð] is realized word-initially as surface [n] when the previous word ends in a nasal
  - envcount: the total number of times lexical [ð] occurs word-initially when the previous word ends in a nasal
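A tiny sketch of estimating such a rule probability from counts. The observations of word-initial lexical [ð] (written 'dh' below) after nasal-final words are invented; only the ncount / envcount ratio comes from the slide:

# Each observation: (lexical phone, surface phone, previous word ends in a nasal?)
observations = [
    ("dh", "n", True), ("dh", "dh", True), ("dh", "n", True),
    ("dh", "dh", False), ("dh", "dh", True), ("dh", "n", True),
]

ncount = sum(1 for lex, surf, nasal in observations
             if lex == "dh" and surf == "n" and nasal)    # rule applied
envcount = sum(1 for lex, surf, nasal in observations
               if lex == "dh" and nasal)                  # rule could have applied
rule_prob = ncount / envcount
print(f"P([dh] -> [n] | previous word ends in a nasal) = {rule_prob:.2f}")   # 0.60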

5.8 The Bayesian Method for Pronunciation (5/6)

5.8 The Bayesian Method for Pronunciation (6/6)
- Decision tree models of pronunciation variation

5.9 Weighted Automata (1/12)
- A weighted automaton is a simple augmentation of the finite automaton:
  - Each arc is associated with a probability
  - The probabilities on all the arcs leaving a node must sum to 1
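A minimal sketch of a weighted automaton as a data structure, using an invented pronunciation network for tomato with a branch modeling the [ey] / [aa] variation; the states, phone labels, and probabilities are illustrative, not taken from the chapter's figures:

# transitions[state][phone] = (next_state, arc_probability);
# the probabilities on the arcs leaving each state sum to 1.
transitions = {
    0: {"t": (1, 1.0)},
    1: {"ax": (2, 1.0)},
    2: {"m": (3, 1.0)},
    3: {"ey": (4, 0.5), "aa": (4, 0.5)},   # the tomayto / tomahto branch
    4: {"t": (5, 1.0)},
    5: {"ow": (6, 1.0)},
}
FINAL_STATE = 6

def path_probability(phones, start=0):
    """Probability the automaton assigns to a phone string: the product of arc
    probabilities along the matching path, or 0.0 if there is no such path."""
    state, prob = start, 1.0
    for phone in phones:
        if phone not in transitions.get(state, {}):
            return 0.0
        state, arc_prob = transitions[state][phone]
        prob *= arc_prob
    return prob if state == FINAL_STATE else 0.0

print(path_probability(["t", "ax", "m", "ey", "t", "ow"]))   # 0.5
print(path_probability(["t", "ax", "m", "aa", "t", "ow"]))   # 0.5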

5.9 Weighted Automata (2/12) through (12/12): figure-only slides (content not transcribed)