Intro to Bayesian Learning: Exercise Solutions. Ata Kaban, The University of Birmingham, 2005

1) In a casino, two differently loaded but identical-looking dice are thrown in repeated runs. The frequencies of the numbers observed in 40 rounds of play are as follows:
Die 1, [Nr, Frequency]: [1,5], [2,3], [3,10], [4,1], [5,10], [6,11]
Die 2, [Nr, Frequency]: [1,10], [2,11], [3,4], [4,10], [5,3], [6,2]
(i) Characterize the two dice by the corresponding random sequence model they generated. That is, estimate the parameters of the random sequence model for both dice.
ANSWER
The maximum likelihood estimate of each face probability is the observed frequency of that face divided by the total number of throws (40):
Die 1, [Nr, P_1(Nr)]: [1,0.125], [2,0.075], [3,0.250], [4,0.025], [5,0.250], [6,0.275]
Die 2, [Nr, P_2(Nr)]: [1,0.250], [2,0.275], [3,0.100], [4,0.250], [5,0.075], [6,0.050]
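A minimal sketch of this estimation in Python (NumPy and the variable names are my additions, not part of the original slides); the ML estimate of a multinomial is simply the normalised counts:

```python
import numpy as np

# Observed face counts from 40 throws of each die (faces 1..6)
counts_die1 = np.array([5, 3, 10, 1, 10, 11])
counts_die2 = np.array([10, 11, 4, 10, 3, 2])

# Maximum likelihood estimate: normalise the counts
p1 = counts_die1 / counts_die1.sum()
p2 = counts_die2 / counts_die2.sum()

print("P_1:", p1)   # [0.125 0.075 0.25  0.025 0.25  0.275]
print("P_2:", p2)   # [0.25  0.275 0.1   0.25  0.075 0.05 ]
```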

(ii) Some time later, one of the dice has disappeared. You (as the casino owner) need to find out which one. The remaining one is now thrown 40 times and here are the observed counts: [1,8], [2,12], [3,6], [4,9], [5,4], [6,1]. Use Bayes' rule to decide the identity of the remaining die.
ANSWER
Since we have a random sequence model (i.i.d. data) D, the probability of D under the two models is
P_k(D) = P_k(1)^8 * P_k(2)^12 * P_k(3)^6 * P_k(4)^9 * P_k(5)^4 * P_k(6)^1, for k = 1, 2.
Since there is no prior knowledge about either die, we use a flat prior, i.e. the same 0.5 for both hypotheses, so the posterior of each hypothesis is proportional to its likelihood. Because P_1(D) < P_2(D), and the prior is the same for both hypotheses, we conclude that the die in question is die no. 2.
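A short sketch of the comparison (natural logarithms, and the ML estimates from part (i); the code and names are mine):

```python
import numpy as np

p1 = np.array([0.125, 0.075, 0.250, 0.025, 0.250, 0.275])
p2 = np.array([0.250, 0.275, 0.100, 0.250, 0.075, 0.050])
counts = np.array([8, 12, 6, 9, 4, 1])   # observed counts for the remaining die

# Log-likelihood of the i.i.d. data under each die model
loglik1 = np.sum(counts * np.log(p1))
loglik2 = np.sum(counts * np.log(p2))
print(loglik1, loglik2)                  # loglik2 is larger, so die 2 is preferred

# Posterior probability of die 2 under the flat 0.5/0.5 prior
post2 = np.exp(loglik2) / (np.exp(loglik1) + np.exp(loglik2))
print(post2)                             # essentially 1
```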

2) A simple model for a data sequence is the random sequence model, i.e. each symbol is generated independently from some distribution. A more complex model is a Markov model, i.e. the probability of a symbol at time t depends on the symbol observed at time t-1. Consider the following two sequences: (s1): A B B A B A A A B A A B B B (s2): B B B B B A A A A A B B B B Further, consider the following two models: (M1): a random sequence model with parameters P(A)=0.4, P(B)=0.6 (M2): a first order Markov model with initial probabilities 0.5 for both symbols and the following transition matrix: P(A|A)=0.6, P(B|A)=0.4, P(A|B)=0.1, P(B|B)=0.9. Which of s1 and s2 is more likely to have been generated from which of the models M1 and M2? Justify your answer both using intuitive arguments and also by using Bayes' rule. (As there is no prior knowledge given here, consider equal prior probabilities.)

ANSWER: Intuitively, s2 contains more state repetitions, which is evidence that the Markov structure of M2 is more likely than the random structure of M1. The sequence s1 in turn looks more random, so it is more likely to have been generated from M1. The log-probability of s1 under the two models (natural logarithms) is:
log P(s1|M1) = 7*log(0.4) + 7*log(0.6) ≈ -9.99
log P(s1|M2) = log(0.5) + 3*log(0.6) + 4*log(0.4) + 3*log(0.1) + 3*log(0.9) ≈ -13.11
So s1 is more likely to have been generated from M1. Similarly, for s2 we get:
log P(s2|M1) = 5*log(0.4) + 9*log(0.6) ≈ -9.18
log P(s2|M2) = log(0.5) + 4*log(0.6) + log(0.4) + log(0.1) + 7*log(0.9) ≈ -6.69
So s2 is more likely to have been generated from M2.
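A sketch of the same calculation in Python (the function and variable names are my additions; trans[(prev, cur)] stores P(cur|prev)):

```python
import numpy as np

s1 = "ABBABAAABAABBB"
s2 = "BBBBBAAAAABBBB"

# M1: random sequence model
pM1 = {"A": 0.4, "B": 0.6}

# M2: first-order Markov model
init = {"A": 0.5, "B": 0.5}
trans = {("A", "A"): 0.6, ("A", "B"): 0.4,
         ("B", "A"): 0.1, ("B", "B"): 0.9}

def loglik_M1(s):
    # Sum of per-symbol log-probabilities (independence)
    return sum(np.log(pM1[c]) for c in s)

def loglik_M2(s):
    # Initial symbol, then one transition per adjacent pair
    ll = np.log(init[s[0]])
    for prev, cur in zip(s, s[1:]):
        ll += np.log(trans[(prev, cur)])
    return ll

for name, s in [("s1", s1), ("s2", s2)]:
    print(name, loglik_M1(s), loglik_M2(s))
# s1: M1 wins (about -9.99 vs -13.11); s2: M2 wins (about -9.18 vs -6.69)
```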

3) You are to be tested for a disease that has a prevalence in the population of 1 in 1000. The lab test used is not always perfect: it has a false-positive rate of 1%. [A false positive is when the test is positive although the disease is not present.] The false-negative rate of the test is zero. [A false negative is when the test result is negative while in fact the disease is present.] a) If you are tested and you get a positive result, what is the probability that you actually have the disease? b) Under the conditions in the previous question, is it more probable that you have the disease or that you don't? c) Would the answers to a) and / or b) differ if you used a maximum likelihood versus a maximum a posteriori hypothesis estimation method? Comment on your answer.

ANSWER a) We have two binary variables, A and B. A is the outcome of the test, B is the presence/absence of the disease. We need to compute P(B=1|A=1). We use Bayes' theorem:
P(B=1|A=1) = P(A=1|B=1) P(B=1) / [ P(A=1|B=1) P(B=1) + P(A=1|B=0) P(B=0) ]
The required quantities are known from the problem. These are the following:
P(A=1|B=1) = 1, i.e. true positive rate
P(B=1) = 1/1000, i.e. prevalence
P(A=1|B=0) = 0.01, i.e. false positive rate
P(B=0) = 1 - 1/1000 = 0.999
Replacing, we have:
P(B=1|A=1) = (1 * 0.001) / (1 * 0.001 + 0.01 * 0.999) = 0.001 / 0.01099 ≈ 0.091
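A minimal sketch of the arithmetic above in Python (the variable names are mine):

```python
# Bayes' rule for the disease test
p_disease = 1 / 1000          # prior: prevalence P(B=1)
p_pos_given_disease = 1.0     # P(A=1|B=1), no false negatives
p_pos_given_healthy = 0.01    # P(A=1|B=0), false positive rate

# Evidence P(A=1) by the law of total probability
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(p_disease_given_pos)    # about 0.091
```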

b) Under the conditions in the previous question, is it more probable that you have the disease or that you don't? ANSWER: P(B=0|A=1) = 1 - P(B=1|A=1) ≈ 1 - 0.091 = 0.909. So clearly it is more probable that the disease is not present.

c) Would the answers to a) and / or b) differ if you used a maximum likelihood versus a maximum a posteriori hypothesis estimation method? Comment on your answer. ANSWER:
- ML maximises P(D|h) w.r.t. h, whereas MAP maximises P(h|D). So MAP includes prior knowledge about the hypothesis, as P(h|D) is proportional to P(D|h)*P(h). This is a good example where the importance and influence of prior knowledge is evident.
- The answer at b) is based on the maximum a posteriori estimate, as we have included prior knowledge in the form of the prevalence of the disease. If that had not been taken into account, i.e. if both P(B=1)=0.5 and P(B=0)=0.5 were assumed, then the hypothesis estimate would be the maximum likelihood one. In that case the presence of the disease would come out to be more probable than its absence, since P(A=1|B=1)=1 > P(A=1|B=0)=0.01. This is an example of how prior knowledge can influence Bayesian decisions.
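A short sketch contrasting the two decision rules (the flat prior stands in for "no prior knowledge", as in the discussion above; names are mine):

```python
# Compare the MAP decision (prevalence prior) with the ML decision (flat prior)
likelihood_pos = {"disease": 1.0, "healthy": 0.01}   # P(A=1 | hypothesis)

def decide(prior):
    # Pick the hypothesis with the largest (unnormalised) posterior score
    scores = {h: likelihood_pos[h] * prior[h] for h in prior}
    return max(scores, key=scores.get)

print(decide({"disease": 0.001, "healthy": 0.999}))  # MAP: 'healthy'
print(decide({"disease": 0.5, "healthy": 0.5}))      # ML (flat prior): 'disease'
```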