600.465 - Intro to NLP - J. Eisner: Bayes' Theorem

Slide 1: Bayes' Theorem

Slide 2: Remember Language ID?
(Let's revisit this.)
- Let p(X) = probability of text X in English
- Let q(X) = probability of text X in Polish
- Which probability is higher?
  - (we'd also like bias toward English since it's more likely a priori - ignore that for now)
- "Horses and Lukasiewicz are on the curriculum."
  p(x1 = h, x2 = o, x3 = r, x4 = s, x5 = e, x6 = s, ...)
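Since the slide asks which probability is higher, here is a minimal sketch of that comparison under assumed toy independent-letter models standing in for p and q (the numbers are placeholders, and the lecture's actual models are trigram-based):

import math

# Toy letter-probability tables (made-up numbers, not real estimates).
english = {"h": 0.05, "o": 0.07, "r": 0.06, "s": 0.065, "e": 0.12}
polish  = {"h": 0.01, "o": 0.08, "r": 0.045, "s": 0.05, "e": 0.09}

def log_prob(text, model, unseen=1e-4):
    """log p(x1 = text[0], x2 = text[1], ...) under an independent-letter model."""
    return sum(math.log(model.get(ch, unseen)) for ch in text)

text = "horses"
print("log p(X) under the English model:", log_prob(text, english))
print("log q(X) under the Polish model: ", log_prob(text, polish))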

Slide 3: Bayes' Theorem
- p(A | B) = p(B | A) * p(A) / p(B)
- Easy to check by removing the syntactic sugar: p(B | A) * p(A) = p(A, B), so both sides equal p(A, B) / p(B)
- Use 1: converts p(B | A) to p(A | B)
- Use 2: updates p(A) to p(A | B)
- Stare at it so you'll recognize it later
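A tiny numeric sanity check of the theorem; the joint distribution below is made up for illustration and is not from the lecture:

# Toy joint distribution p(A, B) over two binary variables (arbitrary numbers).
joint = {
    ("a", "b"): 0.12, ("a", "not_b"): 0.28,
    ("not_a", "b"): 0.18, ("not_a", "not_b"): 0.42,
}
p_a = sum(p for (a, _), p in joint.items() if a == "a")   # p(A=a)
p_b = sum(p for (_, b), p in joint.items() if b == "b")   # p(B=b)

p_a_given_b = joint[("a", "b")] / p_b                     # definition of conditional probability
p_b_given_a = joint[("a", "b")] / p_a
# Bayes' theorem: p(A | B) = p(B | A) * p(A) / p(B)
assert abs(p_a_given_b - p_b_given_a * p_a / p_b) < 1e-12
print(p_a_given_b)                                        # ≈ 0.4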

Slide 4: Language ID
- Given a sentence x, I suggested comparing its prob in different languages:
  - p(SENT=x | LANG=english)   (i.e., p_english(SENT=x))
  - p(SENT=x | LANG=polish)    (i.e., p_polish(SENT=x))
  - p(SENT=x | LANG=xhosa)     (i.e., p_xhosa(SENT=x))
- But surely for language ID we should compare
  - p(LANG=english | SENT=x)
  - p(LANG=polish | SENT=x)
  - p(LANG=xhosa | SENT=x)

Slide 5: Language ID
- For language ID we should compare the posterior (a posteriori) probabilities:
  - p(LANG=english | SENT=x)
  - p(LANG=polish | SENT=x)
  - p(LANG=xhosa | SENT=x)
- For ease, multiply by p(SENT=x) and compare the joint probabilities:
  - p(LANG=english, SENT=x)
  - p(LANG=polish, SENT=x)
  - p(LANG=xhosa, SENT=x)
  (the sum of these is a way to find p(SENT=x); we can divide back by that to get the posterior probs)
- Must know the prior (a priori) probabilities; then rewrite each joint as prior times likelihood (what we had before):
  - p(LANG=english) * p(SENT=x | LANG=english)
  - p(LANG=polish) * p(SENT=x | LANG=polish)
  - p(LANG=xhosa) * p(SENT=x | LANG=xhosa)
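A minimal sketch of this prior-times-likelihood comparison; the priors and likelihoods below are assumed placeholders (chosen so the joints come out in the 7 : 8 : 5 ratio used on the next slides), not values from the lecture:

priors = {"english": 0.7, "polish": 0.2, "xhosa": 0.1}          # p(LANG), assumed
likelihoods = {"english": 1e-5, "polish": 4e-5, "xhosa": 5e-5}  # p(SENT=x | LANG), assumed

# Joint p(LANG, SENT=x) = prior * likelihood; the best language maximizes it.
joint = {lang: priors[lang] * likelihoods[lang] for lang in priors}
best = max(joint, key=joint.get)
print(best, joint[best])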

Slide 6: Let's try it!
"First we pick a random LANG, then we roll a random SENT with the LANG dice."

  prior prob          likelihood                     joint probability
  p(LANG=english)  *  p(SENT=x | LANG=english)   =   p(LANG=english, SENT=x)
  p(LANG=polish)   *  p(SENT=x | LANG=polish)    =   p(LANG=polish, SENT=x)
  p(LANG=xhosa)    *  p(SENT=x | LANG=xhosa)     =   p(LANG=xhosa, SENT=x)

- The prior prob comes from a very simple model: a single die whose sides are the languages of the world.
- The likelihood comes from a set of trigram dice (actually 3 sets, one per language); the slide marks the language with the "best" likelihood.
- The joint probability column is the "best compromise" between prior and likelihood.
- The probability of evidence, p(SENT=x), is the total over all ways of getting SENT=x.
  [numeric values from the slide's table are not preserved in this transcript]
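To make the quoted generative story concrete, here is a sketch that samples (LANG, SENT) pairs; the language die and character dice below are assumed toy distributions, with a unigram character model standing in for the trigram dice:

import random

lang_die = {"english": 0.7, "polish": 0.2, "xhosa": 0.1}   # p(LANG): a single die over languages
char_dice = {                                              # p(char | LANG): toy stand-in for the trigram dice
    "english": {"t": 0.5, "h": 0.3, "e": 0.2},
    "polish":  {"s": 0.4, "z": 0.4, "y": 0.2},
    "xhosa":   {"u": 0.5, "k": 0.3, "a": 0.2},
}

def sample_sentence(length=5):
    # First we pick a random LANG, then we roll a random SENT with the LANG dice.
    lang = random.choices(list(lang_die), weights=list(lang_die.values()))[0]
    dice = char_dice[lang]
    sent = "".join(random.choices(list(dice), weights=list(dice.values()), k=length))
    return lang, sent

print(sample_sentence())   # e.g. ('english', 'thtee')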

Slide 7: Let's try it!
"First we pick a random LANG, then we roll a random SENT with the LANG dice."

  joint probability            posterior probability
  p(LANG=english, SENT=x)      p(LANG=english | SENT=x) = 7/20
  p(LANG=polish, SENT=x)       p(LANG=polish | SENT=x)  = 8/20   (best)
  p(LANG=xhosa, SENT=x)        p(LANG=xhosa | SENT=x)   = 5/20

- Add up the joint column to get the probability of evidence, p(SENT=x): the total probability of getting SENT=x one way or another; the winning joint entry is the "best compromise".
- Normalize (divide by that constant so they'll sum to 1) to get the posteriors: given the evidence SENT=x, the possible languages sum to 1.
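A minimal sketch of the add-up-and-normalize step, assuming joint probabilities in the 7 : 8 : 5 ratio from the slide (the absolute scale is a placeholder):

joint = {"english": 7e-6, "polish": 8e-6, "xhosa": 5e-6}   # p(LANG, SENT=x), assumed scale

p_evidence = sum(joint.values())                           # p(SENT=x): total over all languages
posterior = {lang: p / p_evidence for lang, p in joint.items()}

print(posterior)                 # ≈ {'english': 0.35, 'polish': 0.4, 'xhosa': 0.25}, i.e. 7/20, 8/20, 5/20
print(sum(posterior.values()))   # 1.0 (up to float rounding)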

Slide 8: Let's try it!

  joint probability
  p(LANG=english, SENT=x)
  p(LANG=polish, SENT=x)
  p(LANG=xhosa, SENT=x)

- The winning joint entry is the "best compromise" between prior and likelihood.
- Summing this column gives the probability of evidence, p(SENT=x): the total over all ways of getting x.

Slide 9: General Case ("noisy channel")
- The "noisy channel" messes up a into b, governed by p(A=a) and p(B=b | A=a).
- The "decoder" recovers the most likely reconstruction of a from b.
- Examples: language → text, text → speech, spelled → misspelled, English → French
- Maximize p(A=a | B=b) = p(A=a) p(B=b | A=a) / p(B=b)
                        = p(A=a) p(B=b | A=a) / Σ_a' p(A=a') p(B=b | A=a')
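A minimal sketch of such a decoder, under assumed toy tables in the spelled → misspelled flavour (the words and probabilities are placeholders): it simply takes the argmax of prior times channel probability over a candidate set.

def decode(b, candidates, prior, channel):
    """Return argmax_a p(A=a) * p(B=b | A=a): the most likely source a for observation b."""
    return max(candidates, key=lambda a: prior[a] * channel.get((a, b), 0.0))

prior = {"the": 0.6, "thee": 0.1, "then": 0.3}              # p(A=a), source model (assumed)
channel = {("the", "teh"): 0.2, ("thee", "teh"): 0.05,      # p(B=b | A=a), channel model (assumed)
           ("then", "teh"): 0.01}

print(decode("teh", prior.keys(), prior, channel))          # 'the'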

Slide 10: Language ID
- For language ID we should compare the posterior (a posteriori) probabilities:
  - p(LANG=english | SENT=x)
  - p(LANG=polish | SENT=x)
  - p(LANG=xhosa | SENT=x)
- For ease, multiply by p(SENT=x) and compare the joint probabilities:
  - p(LANG=english, SENT=x)
  - p(LANG=polish, SENT=x)
  - p(LANG=xhosa, SENT=x)
- which we find as prior (a priori) times likelihood (we need the prior probs!):
  - p(LANG=english) * p(SENT=x | LANG=english)
  - p(LANG=polish) * p(SENT=x | LANG=polish)
  - p(LANG=xhosa) * p(SENT=x | LANG=xhosa)

Slide 11: General Case ("noisy channel")
- We want the most likely a to have generated the evidence b, so compare the posterior (a posteriori) probabilities:
  - p(A = a1 | B = b)
  - p(A = a2 | B = b)
  - p(A = a3 | B = b)
- For ease, multiply by p(B=b) and compare the joint probabilities:
  - p(A = a1, B = b)
  - p(A = a2, B = b)
  - p(A = a3, B = b)
- which we find as prior (a priori) times likelihood (we need the prior probs!):
  - p(A = a1) * p(B = b | A = a1)
  - p(A = a2) * p(B = b | A = a2)
  - p(A = a3) * p(B = b | A = a3)

Slide 12: Speech Recognition
- For baby speech recognition we should compare the posterior (a posteriori) probabilities:
  - p(MEANING=gimme | SOUND=uhh)
  - p(MEANING=changeme | SOUND=uhh)
  - p(MEANING=loveme | SOUND=uhh)
- For ease, multiply by p(SOUND=uhh) and compare the joint probabilities:
  - p(MEANING=gimme, SOUND=uhh)
  - p(MEANING=changeme, SOUND=uhh)
  - p(MEANING=loveme, SOUND=uhh)
- which we find as prior (a priori) times likelihood (we need the prior probs!):
  - p(MEAN=gimme) * p(SOUND=uhh | MEAN=gimme)
  - p(MEAN=changeme) * p(SOUND=uhh | MEAN=changeme)
  - p(MEAN=loveme) * p(SOUND=uhh | MEAN=loveme)

Slide 13: Life or Death!
Does Epitaph have hoof-and-mouth disease? He tested positive - oh no! (The false positive rate is only 5%.)
- p(hoof) = 0.001, so p(¬hoof) = 0.999
- p(positive test | ¬hoof) = 0.05   ("false pos")
- p(negative test | hoof) = x ≈ 0   ("false neg"), so p(positive test | hoof) = 1-x ≈ 1
- What is p(hoof | positive test)?
  p(hoof | positive test) = p(hoof) (1-x) / [p(hoof) (1-x) + p(¬hoof) * 0.05] ≈ 0.001 / (0.001 + 0.05) = 1/51
- Don't panic - still very small! < 1/51 for any x.
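As a quick check, assuming the prior p(hoof) = 0.001 reconstructed above, the Bayes computation can be run for several false-negative rates x; the posterior stays around 1/51 or smaller every time.

def p_hoof_given_positive(x, p_hoof=0.001, false_pos=0.05):
    """Bayes' theorem: p(hoof | pos) = p(hoof) * p(pos | hoof) / p(pos)."""
    p_pos = p_hoof * (1 - x) + (1 - p_hoof) * false_pos    # total probability of a positive test
    return p_hoof * (1 - x) / p_pos

for x in (0.0, 0.1, 0.5, 0.9):
    print(x, p_hoof_given_positive(x))   # all around 0.02 (≈ 1/51) or below - don't panic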