600.465 - Intro to NLP - J. Eisner: Finite-State and the Noisy Channel

Presentation transcript:

Slide 1: Finite-State and the Noisy Channel

Slide 2: Noisy Channel Model. A noisy channel maps X → Y: real language X passes through the channel and comes out as yucky language Y; we want to recover X from Y.

Slide 3: Noisy Channel Model (same diagram). Instance: X = correct spelling; the channel introduces typos; Y = misspelling.

Slide 4: Noisy Channel Model. Instance: X = (lexicon space)*; the channel deletes spaces; Y = text w/o spaces.

Slide 5: Noisy Channel Model. Instance: X = (lexicon space)*, scored by the language model; the channel is pronunciation, scored by the acoustic model; Y = speech.

Slide 6: Noisy Channel Model. Instance: X = tree, generated by a probabilistic CFG; the channel deletes everything but terminals; Y = text.

Slide 7: Noisy Channel Model. p(X) * p(Y | X) = p(X,Y). We want to recover x ∈ X from y ∈ Y: choose the x that maximizes p(x | y), or equivalently p(x,y).
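In symbols, the decision rule stated on this slide is the usual Bayes-rule step (p(y) does not depend on x):

$$\hat{x} \;=\; \arg\max_{x} p(x \mid y) \;=\; \arg\max_{x} \frac{p(x)\,p(y \mid x)}{p(y)} \;=\; \arg\max_{x} p(x)\,p(y \mid x) \;=\; \arg\max_{x} p(x, y)$$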

Slide 8: Speech Recognition by FST Composition (Pereira & Riley 1996). Compose p(word seq), a trigram language model, .o. p(phone seq | word seq), a pronunciation model with arcs like CAT: k æ t, .o. p(acoustics | phone seq), an acoustic model of each phone (e.g., ə) in its phone context.
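This cascade corresponds to the standard noisy-channel factorization (writing w for the word sequence, φ for the phone sequence, and a for the acoustics):

$$p(w, \varphi, a) \;=\; p(w)\; p(\varphi \mid w)\; p(a \mid \varphi)$$

Decoding composes these machines with the observed acoustics and takes the best path, i.e. roughly $\hat{w} = \arg\max_{w} \max_{\varphi} p(w, \varphi, a)$, where the max over φ is the usual Viterbi stand-in for the sum.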

Slide 9: Noisy Channel Model. p(X) * p(Y | X) = p(X,Y). Here p(X) has arcs a:a/0.7 and b:b/0.3, and p(Y | X) has arcs a:D/0.9, a:C/0.1, b:C/0.8, b:D/0.2; composing them (.o.) gives p(X,Y) with arcs a:D/0.63, a:C/0.07, b:C/0.24, b:D/0.06. Note p(x,y) sums to 1. Suppose y = “C”; what is the best “x”?

Slide 10: Noisy Channel Model (same composition as slide 9, giving a:D/0.63, a:C/0.07, b:C/0.24, b:D/0.06). Suppose y = “C”; what is the best “x”?

Slide 11: Noisy Channel Model. p(X) * p(Y | X) * (Y = y)? = p(X, y). Compose p(X) (a:a/0.7, b:b/0.3) .o. p(Y | X) (a:D/0.9, a:C/0.1, b:C/0.8, b:D/0.2) .o. the acceptor C:C/1, which restricts to just the paths compatible with output “C”, leaving a:C/0.07 and b:C/0.24; then take the best path.
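A minimal sketch of this computation in plain Python. The two transducers are reduced to dictionaries of single-arc paths, an assumption made purely for illustration:

```python
# Single-state weighted transducers from the slide, written as dicts:
# p(X) maps each source symbol to its probability,
# p(Y|X) maps (source, output) pairs to channel probabilities.
p_x = {"a": 0.7, "b": 0.3}
p_y_given_x = {("a", "D"): 0.9, ("a", "C"): 0.1,
               ("b", "C"): 0.8, ("b", "D"): 0.2}

# Composition: multiply weights along matching paths to get p(x, y).
p_xy = {(x, y): p_x[x] * p for (x, y), p in p_y_given_x.items()}
assert abs(sum(p_xy.values()) - 1.0) < 1e-9   # "Note p(x,y) sums to 1."

# Restrict to paths compatible with the observed output y = "C",
# then take the best path (the argmax over x).
y_obs = "C"
candidates = {x: p for (x, y), p in p_xy.items() if y == y_obs}
best_x = max(candidates, key=candidates.get)
print(candidates)   # {'a': 0.07, 'b': 0.24}
print(best_x)       # 'b'
```

So for y = “C” the best x is “b”, even though the channel prefers to map “a” to something else with higher probability overall.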

Slide 12: Edit Distance Transducer. Arcs a:ε, ε:a, b:ε, ε:b, a:b, b:a, a:a, b:b. For an alphabet of size k there are O(k) deletion arcs, O(k) insertion arcs, O(k²) substitution arcs, and O(k) no-change arcs.

Slide 13: Edit Distance Transducer, stochastic version. Same arcs (a:ε, ε:a, b:ε, ε:b, a:b, b:a, a:a, b:b), again O(k) deletion arcs, O(k) insertion arcs, O(k²) substitution arcs, O(k) identity arcs, but now the arcs are weighted: likely edits = high-probability arcs.
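A small sketch in plain Python (the arc representation is hypothetical) that enumerates the arcs of this one-state transducer for an alphabet of size k, making the O(k), O(k), O(k²) counts concrete; a stochastic version would simply attach a probability to each arc:

```python
EPS = ""  # epsilon, the empty string

def edit_arcs(alphabet):
    """Arcs of the one-state edit-distance transducer over `alphabet`."""
    deletions     = [(a, EPS) for a in alphabet]                   # a:ε      -- k arcs
    insertions    = [(EPS, b) for b in alphabet]                   # ε:b      -- k arcs
    substitutions = [(a, b) for a in alphabet for b in alphabet    # a:b, a != b
                     if a != b]                                    #          -- k(k-1) arcs
    identities    = [(a, a) for a in alphabet]                     # a:a      -- k arcs
    return deletions, insertions, substitutions, identities

d, i, s, n = edit_arcs("ab")
print(len(d), len(i), len(s), len(n))   # 2 2 2 2  (k = 2)
```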

Slide 14: Edit distance by composition. Compose the string “clara” .o. the stochastic edit distance transducer .o. the string “caca”. [Figure: the composed lattice, whose arcs include deletions such as c:ε, l:ε, a:ε, r:ε, insertions such as ε:c, ε:a, and substitution/identity arcs such as c:c, l:c, a:c, r:a, a:a.] Best path (by Dijkstra’s algorithm).
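The best path through that lattice is exactly what the classic dynamic program for Levenshtein distance computes. A self-contained sketch, assuming unit costs rather than the stochastic arc weights:

```python
def edit_distance(x, y):
    """Levenshtein distance between strings x and y (unit costs)."""
    # dp[i][j] = cheapest way to rewrite x[:i] as y[:j]
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(len(x) + 1):
        dp[i][0] = i                      # delete all of x[:i]
    for j in range(len(y) + 1):
        dp[0][j] = j                      # insert all of y[:j]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            sub = 0 if x[i - 1] == y[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion arc      x_i : ε
                           dp[i][j - 1] + 1,        # insertion arc     ε : y_j
                           dp[i - 1][j - 1] + sub)  # substitution / identity arc
    return dp[len(x)][len(y)]

print(edit_distance("clara", "caca"))   # 2  (delete the l, substitute r -> c)
```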

Slide 15: Speech Recognition by FST Composition (Pereira & Riley 1996). Compose p(word seq) (the trigram language model) .o. p(phone seq | word seq) (the pronunciation model) .o. p(acoustics | phone seq) (the acoustic model) .o. the observed acoustics.

Slide 16: Word Segmentation. Input: theprophetsaidtothecity
- What does this say? And what other words are substrings?
- Could segment with parsing (how?), but slow.
- Given L = a “lexicon” FSA that matches all English words: how can it be applied to this problem?
- What if the lexicon is weighted? From unigrams to bigrams? (See the sketch below.)
- Smooth L to include unseen words?
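A minimal sketch of segmentation with a weighted (unigram) lexicon, in plain Python; the tiny lexicon and its probabilities are made up purely for illustration:

```python
import math

# Toy weighted lexicon: word -> unigram probability (illustrative numbers only).
LEXICON = {"the": 0.30, "prophet": 0.05, "said": 0.10, "to": 0.20,
           "city": 0.05, "he": 0.04, "prop": 0.01, "tot": 0.01}

def segment(text):
    """Best segmentation of `text` into lexicon words (Viterbi over cut points)."""
    n = len(text)
    best = [0.0] + [-math.inf] * n   # best[i] = log-prob of best segmentation of text[:i]
    back = [0] * (n + 1)             # back[i] = start of the last word in that segmentation
    for i in range(1, n + 1):
        for j in range(max(0, i - 12), i):     # words up to 12 chars, an arbitrary cap
            word = text[j:i]
            if word in LEXICON and best[j] + math.log(LEXICON[word]) > best[i]:
                best[i] = best[j] + math.log(LEXICON[word])
                back[i] = j
    if best[n] == -math.inf:
        return None                  # no segmentation found (lexicon not smoothed)
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return list(reversed(words))

print(segment("theprophetsaidtothecity"))
# ['the', 'prophet', 'said', 'to', 'the', 'city']
```

A bigram version would condition each word's score on the previous word, and smoothing the lexicon would let the segmenter cope with unseen words instead of returning None.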

Slide 17: Spelling Correction.
- Spelling correction also needs a lexicon L, but there is distortion…
- Let T be a transducer that models common typos and other spelling errors:
  - ance → ence (deliverance, ...)
  - e → ε (deliverance, ...)
  - ε → e // Cons _ Cons (athlete, ...)
  - rr → r (embarrass, occurrence, …)
  - ge → dge (privilege, …)
  - etc.
- Now what can you do with L .o. T? (See the sketch below.)
- Should T and L have probabilities?
- Want T to include “all possible” errors…
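A rough sketch of what L .o. T buys you, in plain Python rather than with real transducers: apply the error rules in reverse to a misspelled word to propose candidates, then keep only candidates that the lexicon accepts. The rule list and lexicon are tiny illustrative stand-ins, and only a single rule application is tried:

```python
import re

# Common-error rules from the slide, written in the reverse direction
# (error pattern -> correct form), as used when proposing corrections.
REVERSE_RULES = [
    (r"ence", "ance"),    # deliverence -> deliverance
    (r"r",    "rr"),      # embarass    -> embarrass  (restore a doubled r)
    (r"dge",  "ge"),      # priviledge  -> privilege
]

LEXICON = {"deliverance", "embarrass", "privilege", "athlete", "occurrence"}

def candidates(misspelling):
    """Corrections reachable by undoing one rule application, filtered by the lexicon."""
    found = set()
    for pattern, repl in REVERSE_RULES:
        for m in re.finditer(pattern, misspelling):
            cand = misspelling[:m.start()] + repl + misspelling[m.end():]
            if cand in LEXICON:
                found.add(cand)
    return found

print(candidates("deliverence"))   # {'deliverance'}
print(candidates("priviledge"))    # {'privilege'}
print(candidates("embarass"))      # {'embarrass'}
```

With probabilities on both T (how likely each error is) and L (how likely each word is), the composed machine ranks candidates instead of merely listing them.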

Slide 18: Morpheme Segmentation.
- Let L be a machine that matches all Turkish words.
- Same problem as word segmentation, just at a lower level: morpheme segmentation.
- Turkish word: uygarlaştıramadıklarımızdanmışsınızcasına = uygar+laş+tır+ma+dık+ları+mız+dan+mış+sınız+ca+sı+na, “(behaving) as if you are among those whom we could not cause to become civilized”.
- Some constraints on morpheme sequence: bigram probs.
- Generative model: concatenate, then fix up the joints (stop + -ing = stopping, fly + -s = flies, vowel harmony); see the sketch below.
- Use a cascade of transducers to handle all the fixups.
- But this is just morphology! Can use probabilities here too (but people often don’t).
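A toy sketch of the “concatenate, then fix up joints” idea as an ordered cascade of rewrite rules, with plain regexes standing in for transducers; the two English rules mirror the slide's examples and are not a full spelling model:

```python
import re

# Each rule rewrites the string produced by the previous one, mimicking
# a cascade of transducers applied to the morpheme concatenation.
FIXUPS = [
    (r"y\+s$", "ies"),                               # fly + -s    -> flies
    (r"([aeiou])([bdgmnprt])\+ing$", r"\1\2\2ing"),  # stop + -ing -> stopping
    (r"\+", ""),                                     # finally, erase remaining joints
]

def realize(morphemes):
    """Concatenate morphemes with '+' joints, then apply the fixup cascade."""
    surface = "+".join(morphemes)
    for pattern, repl in FIXUPS:
        surface = re.sub(pattern, repl, surface)
    return surface

print(realize(["stop", "ing"]))   # stopping
print(realize(["fly", "s"]))      # flies
print(realize(["walk", "ing"]))   # walking
```

A real Turkish analyzer would add rules for vowel harmony and run the cascade in reverse (surface form to morphemes), which is exactly what transducer inversion gives for free.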

Slide 19: More Engineering Applications.
- Markup: dates, names, places, noun phrases; spelling/grammar errors?
- Hyphenation
- Informative templates for information extraction (FASTUS)
- Word segmentation (use probabilities!)
- Part-of-speech tagging (use probabilities – maybe!)
- Translation
- Spelling correction / edit distance
- Phonology, morphology: series of little fixups? constraints?
- Speech
- Transliteration / back-transliteration
- Machine translation?
- Learning …