Chapter 5. Probabilistic Models of Pronunciation and Spelling
May 4, 2007. Artificial Intelligence Laboratory, Pusan National University. Minho Kim.
Text: Speech and Language Processing, pp. 141-189
Outline
Introduction
5.1 Dealing with Spelling Errors
5.2 Spelling Error Patterns
5.3 Detecting Non-Word Errors
5.4 Probabilistic Models
5.5 Applying the Bayesian Method to Spelling
5.6 Minimum Edit Distance
5.7 English Pronunciation Variation
5.8 The Bayesian Method for Pronunciation
5.9 Weighted Automata
5.10 Pronunciation in Humans
Introduction
Introduce the problems of detecting and correcting spelling errors
Summarize typical human spelling error patterns
The essential probabilistic architecture: Bayes' rule, the noisy channel model
The essential algorithms: dynamic programming, the Viterbi algorithm, the minimum edit distance algorithm, the forward algorithm, weighted automata
5.1 Dealing with Spelling Errors (1/2)
The detection and correction of spelling errors is an integral part of modern word processors
It also matters in applications where even the individual letters aren't guaranteed to be accurately identified:
Optical character recognition (OCR)
On-line handwriting recognition
Here the focus is on detection and correction of spelling errors, mainly in typed text
OCR systems often misread "D" as "O" or "ri" as "n", producing misspelled words like dension for derision
5.1 Dealing with Spelling Errors (2/2)
Kukich (1992) breaks the field down into three increasingly broader problems:
non-word error detection (graffe for giraffe)
isolated-word error correction (correcting graffe to giraffe)
context-dependent error detection and correction (there for three, dessert for desert, piece for peace)
5.2 Spelling Error Patterns (1/2)
Single-error misspellings (Damerau, 1964):
insertion: mistyping the as ther
deletion: mistyping the as th
substitution: mistyping the as thw
transposition: mistyping the as hte
Kukich (1992) breaks down human typing errors into:
typographic errors (spell as speel)
cognitive errors (separate as seperate)
5.2 Spelling Error Patterns (2/2)
OCR errors are usually grouped into five classes:
substitutions (e → c)
multi-substitutions (m → rn, he → b)
space deletions or insertions
failures (u → ~)
framing errors
5.3 Detecting Non-word Errors
Detecting non-word errors in text is typically done by dictionary lookup
One concern is that such dictionaries need to be kept small: large dictionaries contain very rare words that resemble misspellings of other words
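A minimal sketch of dictionary-based non-word error detection; the tiny word list and the whitespace tokenization are hypothetical placeholders, not part of the text:

```python
# Minimal sketch of non-word error detection by dictionary lookup.
# The word list is a toy placeholder standing in for a real lexicon.
DICTIONARY = {"the", "three", "giraffe", "across", "actress", "acres"}

def non_word_errors(text):
    """Return tokens that do not appear in the dictionary."""
    tokens = text.lower().split()              # naive whitespace tokenization
    return [t for t in tokens if t not in DICTIONARY]

print(non_word_errors("the graffe ran across the road"))
# -> ['graffe', 'ran', 'road']  ('ran' and 'road' only flagged because the toy dictionary is tiny)
```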
5.4 Probabilistic Models (1/3)
The intuition of the noisy channel model: treat the surface form as an instance of the lexical form that has been passed through a noisy channel
Build a model of the channel so that we can figure out how it modified the "true" word, and recover it
Sources of noise: variation in pronunciation, variation in the realization of phones, acoustic variation due to the channel
5.4 Probabilistic Models (2/3)
Given a string of phones (say [ni]), which word corresponds to this string of phones?
Consider all possible words and choose the one for which P(word | observation) is highest:
(5.1)  ŵ = argmax_{w ∈ V} P(w | O)
ŵ : our estimate of the correct word
O : the observation sequence [ni]
argmax_x f(x) : the x such that f(x) is maximized
5.4 Probabilistic Models (3/3)
(5.2)  Bayes' rule: P(w | O) = P(O | w) P(w) / P(O)
(5.3)  Substituting (5.2) into (5.1): ŵ = argmax_w P(O | w) P(w) / P(O)
We can ignore P(O). Why? It is the same for every candidate word w, so it does not affect the argmax.
(5.4)  ŵ = argmax_w P(O | w) P(w)
P(w) is called the prior probability
P(O | w) is called the likelihood
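The decision rule (5.4) can be sketched directly in code. The candidate words and the prior/likelihood values below are invented placeholders for illustration, not figures from the chapter:

```python
# Sketch of the noisy channel decision rule (5.4): choose the candidate w
# that maximizes P(O|w) * P(w).  Candidates and probabilities are illustrative.
candidates = {
    # word: (prior P(w), likelihood P(O|w) of observing [ni] given w)
    "knee": (1e-5, 0.40),
    "the":  (5e-2, 1e-4),
    "new":  (1e-3, 0.15),
}

def decode(candidates):
    """Return the word with the highest P(O|w) * P(w)."""
    return max(candidates, key=lambda w: candidates[w][0] * candidates[w][1])

print(decode(candidates))   # the candidate with the largest prior * likelihood
```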
5.5 Applying the Bayesian Method to Spelling (1/5) - (2/5)
5.5 Applying the Bayesian Method to Spelling (3/5)
P(acress | across) is estimated from the number of times that e was substituted for o in some large corpus of errors
Confusion matrix: a square 26 × 26 table giving the number of times one letter was incorrectly used instead of another
Entry [o, e] in the substitution confusion matrix: the count of times e was substituted for o
5.5 Applying the Bayesian Method to Spelling (4/5)
del[x, y]: the number of times in the training set that the characters xy in the correct word were typed as x
ins[x, y]: the number of times in the training set that the character x in the correct word was typed as xy
sub[x, y]: the number of times that x was typed as y
trans[x, y]: the number of times that xy was typed as yx
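A sketch of how a substitution entry from these matrices could be turned into a channel likelihood such as P(acress | across); the counts and the corpus character totals are hypothetical, and deletions, insertions, and transpositions would be handled analogously with del[], ins[], and trans[]:

```python
# Sketch of a single-edit channel likelihood built from a substitution
# confusion matrix.  All counts below are hypothetical placeholders.
sub   = {("o", "e"): 93}        # sub[x, y]: correct x typed as y
chars = {"o": 120_000}          # corpus count of the character 'o' (hypothetical)

def p_substitution(correct_char, typed_char):
    """Estimate P(typed_char | correct_char) for a single substitution error."""
    return sub.get((correct_char, typed_char), 0) / chars[correct_char]

# P(acress | across): the only edit is typing 'e' where the correct letter was 'o'.
print(p_substitution("o", "e"))
```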
5.5 Applying the Bayesian Method to Spelling (5/5)
5.6 Minimum Edit Distance (1/6)
String distance: some metric of how alike two strings are to each other
Minimum edit distance: the minimum number of editing operations needed to transform one string into another
Operations: insertion, deletion, substitution
For example, the gap between intention and execution is five operations
The transformation can be visualized as a trace, an alignment, or an operation list (Figure 5.4)
5.6 Minimum Edit Distance (2/6)
5.6 Minimum Edit Distance (3/6)
Levenshtein distance: assign a particular cost or weight to each of the operations
Simplest weighting: each of the three operations has a cost of 1
The Levenshtein distance between intention and execution is then 5
Alternate version: substitutions have a cost of 2 (a substitution can be viewed as one deletion plus one insertion)
The minimum edit distance is computed by dynamic programming
5.6 Minimum Edit Distance (4/6)
Dynamic programming: a large problem can be solved by properly combining the solutions to various subproblems
Minimum edit distance for spelling error correction
Viterbi and the forward algorithm for speech recognition
CYK and Earley for parsing
5.6 Minimum Edit Distance (5/6) - (6/6)
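A sketch of the dynamic-programming computation of minimum edit distance described above, with a configurable substitution cost so that both the unit-cost and the cost-2 variants can be tried:

```python
def min_edit_distance(source, target, sub_cost=1):
    """Minimum edit distance via dynamic programming (insertion/deletion cost 1)."""
    n, m = len(source), len(target)
    # dist[i][j] = distance between source[:i] and target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i                                   # i deletions
    for j in range(1, m + 1):
        dist[0][j] = j                                   # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if source[i - 1] == target[j - 1] else sub_cost
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    return dist[n][m]

print(min_edit_distance("intention", "execution"))              # 5 (unit costs)
print(min_edit_distance("intention", "execution", sub_cost=2))  # 8 (substitution cost 2)
```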
5.8 The Bayesian Method for Pronunciation (1/6)
The Bayesian algorithm can be used to solve what is often called the pronunciation subproblem in speech recognition
When [ni] occurs after the word I at the beginning of a sentence, investigation of the Switchboard corpus produces a total of 7 candidate words: the, neat, need, new, knee, to, you (see Chapter 4)
Two components: candidate generation, candidate scoring
5.8 The Bayesian Method for Pronunciation (2/6)
Speech recognizers often use an alternative architecture that trades storage for speed
Each pronunciation is expanded in advance with all possible variants, which are then pre-stored with their scores
Thus there is no need for candidate generation: the pronunciation [ni] is simply stored with the list of words that can generate it
5.8 The Bayesian Method for Pronunciation (3/6)
y represents the sequence of phones, w represents the candidate word
It turns out that confusion matrices don't do as well for pronunciation:
the changes in pronunciation between a lexical and a surface form are much greater
probabilistic models of pronunciation variation must include many more factors than a simple confusion matrix can capture
One simple way to generate pronunciation likelihoods is via probabilistic rules
5.8 The Bayesian Method for Pronunciation (4/6)
Example rule: a word-initial [ð] becomes [n] if the preceding word ended in [n] (or sometimes [m])
ncount: number of times lexical [ð] is realized word-initially by surface [n] when the previous word ends in a nasal
envcount: total number of times lexical word-initial [ð] occurs when the previous word ends in a nasal
The rule probability is estimated as ncount / envcount
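A small sketch of estimating this rule probability as a relative frequency; the counts are hypothetical placeholders, not values from the Switchboard corpus:

```python
# Probability of the rule "word-initial [dh] -> [n] after a nasal",
# estimated as a relative frequency.  Counts are hypothetical placeholders.
ncount = 88      # lexical word-initial [dh] realized as surface [n] after a nasal
envcount = 470   # all lexical word-initial [dh] tokens occurring after a nasal

p_rule = ncount / envcount
print(f"P([dh] -> [n] | previous word ends in a nasal) = {p_rule:.2f}")
```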
5.8 The Bayesian Method for Pronunciation (5/6)
5.8 The Bayesian Method for Pronunciation (6/6)
Decision Tree Models of Pronunciation Variation
5.9 Weighted Automata (1/12)
A weighted automaton is a simple augmentation of the finite automaton:
each arc is associated with a probability
the probabilities on all the arcs leaving a node must sum to 1
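A sketch of a weighted automaton represented as a table of arcs with probabilities, and of scoring one path through it; the states, phone symbols, and probabilities are invented for illustration (note that the arc probabilities leaving each state sum to 1):

```python
# Sketch of a weighted automaton: each arc carries a probability, and the
# probabilities of all arcs leaving a state sum to 1.  This toy pronunciation
# network (for a word like "the") is invented for illustration.
arcs = {
    # state: [(symbol, next_state, probability), ...]
    "start": [("dh", "s1", 0.80), ("n", "s1", 0.20)],
    "s1":    [("iy", "end", 0.88), ("ax", "end", 0.12)],
}

def path_probability(path, start="start"):
    """Probability of one path (a sequence of symbols) through the automaton."""
    state, prob = start, 1.0
    for symbol in path:
        for sym, nxt, p in arcs[state]:
            if sym == symbol:
                state, prob = nxt, prob * p
                break
        else:
            return 0.0            # no matching arc: path not accepted
    return prob

print(path_probability(["n", "iy"]))     # 0.2 * 0.88
```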
5.9 Weighted Automata (2/12) - (12/12)