
Group Presentation
Top Changwatchai
18 October 2000 (revised 23 October 2000)

The main point
Last week I got several good questions. I plan to address three issues:
1. Explain my definition of the random variable
2. Explain why we want the expectation, not the maximum-likelihood value
3. Justify why it has a beta distribution under certain assumptions

Assumptions
There are k different coins (1, 2, …, k).
pi = prior probability of picking coin i
wi = weight of coin i = probability of getting heads on any given toss of coin i (independent of all other tosses)
Our algorithm knows this, and knows the values of the pi's and wi's.

Random experiment 1
Experiment:
1. Pick one of the k coins according to the pi's
2. Toss this coin one time
Goal: Perform this experiment one time. Without knowing anything else about the results of the experiment (except for our assumed knowledge), we want to predict whether we got heads or tails.
Algorithm A:
1. Calculate the probability of getting heads: pheads = p1w1 + p2w2 + … + pkwk
2. If pheads < 0.5, predict tails. Otherwise, predict heads.
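As a minimal sketch, algorithm A can be written out directly. The three-coin parameters below are the ones used in the example later in this deck:

```python
# Algorithm A, sketched with the deck's three-coin example.
p = [0.4, 0.3, 0.3]   # pi: prior probability of picking coin i
w = [0.2, 0.8, 0.9]   # wi: probability of heads for coin i

# Step 1: probability of heads = sum over coins of pi * wi
p_heads = sum(pi * wi for pi, wi in zip(p, w))

# Step 2: threshold at 0.5
prediction = "tails" if p_heads < 0.5 else "heads"
```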

Confidence
We want confidence to reflect how "good" our prediction is:
confideal ≡ P(make the same prediction | more knowledge)
Lots of different things can constitute extra knowledge. We focus on one type of knowledge in particular:
confexp1 ≡ P(make the same prediction | we know which coin was picked)
Note: we don't actually know which coin was picked. We want the probability that we would make the same prediction in the hypothetical case that we are told which coin was picked. (See the next slide for an alternative explanation.)
Our new prediction uses the same rule as in algorithm A: say we are told that coin i was picked. Then if wi < 0.5, we predict tails; otherwise, we predict heads. In other words, if we predicted heads with algorithm A, confexp1 is the sum of pi over the coins with wi ≥ 0.5 (and if we predicted tails, the sum over the coins with wi < 0.5).
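A sketch of confexp1 under the same illustrative three-coin parameters: being told the coin, we would predict heads exactly when its weight is at least 0.5, so the confidence is the prior mass on the coins that agree with algorithm A's prediction.

```python
p = [0.4, 0.3, 0.3]   # pi: prior probability of picking coin i
w = [0.2, 0.8, 0.9]   # wi: probability of heads for coin i

# Algorithm A's prediction
predicted_heads = sum(pi * wi for pi, wi in zip(p, w)) >= 0.5

# conf_exp1: probability that, told which coin was picked, we would
# make the same prediction. Told coin i, we predict heads iff wi >= 0.5,
# so sum pi over the coins that agree with the prediction.
conf_exp1 = sum(pi for pi, wi in zip(p, w) if (wi >= 0.5) == predicted_heads)
```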

Confidence (alternative explanation)

Random variable for experiment 1
The sample space of random experiment 1 is { (coin i, heads or tails) }.
We define a discrete random variable X for this experiment:
X((coin i, heads or tails)) = wi
Note that we ignore the outcome of the flip, since that is what we are predicting.
The support of X is { w1, w2, …, wk }. The pmf of X is defined as follows:
f(w) = pi if w = wi, 0 otherwise
The expectation of X is E(X) = p1w1 + p2w2 + … + pkwk. Note this is the same as pheads in algorithm A, so we define:
Algorithm B:
1. Calculate E(X)
2. If E(X) < 0.5, predict tails. Otherwise, predict heads.
This is why we use the expectation of X, not the maximum-likelihood value. We also use X to compute confidence: for example, if we predict heads, confexp1 = P(X ≥ 0.5).

Example
Let k = 3, with p1 = 0.4, w1 = 0.2; p2 = 0.3, w2 = 0.8; p3 = 0.3, w3 = 0.9.
The maximum-likelihood coin (highest prior probability) is coin 1, and w1 = 0.2, so we would predict tails (not what we want).
Instead, we use the expectation:
E(X) = 0.2·0.4 + 0.8·0.3 + 0.9·0.3 = 0.59, so predict heads
confexp1 = 0.3 + 0.3 = 0.6
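The example's contrast between the two approaches can be checked directly: committing to the single most probable coin predicts tails, while averaging the weights by the prior (algorithm B) predicts heads.

```python
p = [0.4, 0.3, 0.3]   # pi: prior probability of picking coin i
w = [0.2, 0.8, 0.9]   # wi: probability of heads for coin i

# Maximum-likelihood approach: commit to the single most probable coin.
ml_coin = max(range(len(p)), key=lambda i: p[i])          # coin 1 (index 0)
ml_prediction = "tails" if w[ml_coin] < 0.5 else "heads"

# Expectation approach (algorithm B): average the weights by the prior.
e_x = sum(pi * wi for pi, wi in zip(p, w))
b_prediction = "tails" if e_x < 0.5 else "heads"
```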

Random experiment 2
Same situation as above. Let N be a finite but very large number.
Experiment:
1. Pick one of the k coins according to the pi's
2. Toss this coin N times
3. Toss the same coin one more time
Goal: Perform this experiment one time. Let H be the number of heads observed in the first N tosses. Knowing H and N but nothing else about the results of the experiment (except for our assumed knowledge), we want to predict whether we got heads or tails on the last toss.
Note that for N = 0, we have random experiment 1.

Algorithm C
1. Calculate the probability of getting heads on the last toss:
pheads = Σi P(coin i | H, N) · wi
where, by Bayes' rule, P(coin i | H, N) ∝ pi · wi^H (1−wi)^(N−H)
2. If pheads < 0.5, predict tails. Otherwise, predict heads.
Confidence: if we predict heads, confexp1 is the sum of P(coin i | H, N) over the coins with wi ≥ 0.5.
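A sketch of algorithm C, again with the deck's three-coin parameters; the observed counts N = 10, H = 7 are an illustrative assumption:

```python
from math import comb

p = [0.4, 0.3, 0.3]   # pi: prior probability of picking coin i
w = [0.2, 0.8, 0.9]   # wi: probability of heads for coin i
N, H = 10, 7          # observed 7 heads in 10 tosses (illustrative)

# Likelihood of the observed data under each coin: binomial(N, wi) at H
lik = [comb(N, H) * wi**H * (1 - wi)**(N - H) for wi in w]

# Posterior over coins by Bayes' rule
z = sum(pi * li for pi, li in zip(p, lik))
post = [pi * li / z for pi, li in zip(p, lik)]

# Step 1: probability of heads on the last toss
p_heads = sum(qi * wi for qi, wi in zip(post, w))

# Step 2: prediction; confidence = posterior mass on agreeing coins
prediction = "tails" if p_heads < 0.5 else "heads"
conf = sum(qi for qi, wi in zip(post, w)
           if (wi >= 0.5) == (prediction == "heads"))
```

With a majority of heads observed, the posterior shifts almost entirely onto the two heads-biased coins, so both the prediction and the confidence move well above the prior-only values.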

Random variable for experiment 2
The sample space of random experiment 2 is { (coin i, data from N tosses, heads or tails on last toss) }.
We define a discrete random variable X for this experiment:
X((coin i, data from N tosses, heads or tails on last toss)) = wi
Note again that we ignore everything except the coin index.
The pmf of X is defined as follows:
f(w) = P(coin i | H, N) if w = wi, 0 otherwise
The expectation of X is E(X) = Σi P(coin i | H, N) · wi. Note this is the same as pheads in algorithm C, so we define:
Algorithm D:
1. Calculate E(X)
2. If E(X) < 0.5, predict tails. Otherwise, predict heads.
Confidence: if we predict heads, confexp1 = P(X ≥ 0.5).

Continuous case
Random experiment 3 (continuous version of experiment 2):
1. Assume we have a random variable W with pdf g(w). Pick a value w under this distribution.
2. Toss a coin with this weight N times
3. Toss the same coin one more time
We can use algorithm C as well, using the following calculations (we abuse notation slightly; we will correct this on the next slide):
pheads = ∫ P(w | H, N) · w dw
where, by Bayes' rule, P(w | H, N) ∝ g(w) · w^H (1−w)^(N−H).
Assuming we predicted heads: confexp1 = ∫ from 0.5 to 1 of P(w | H, N) dw.
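The continuous case can be sketched numerically on a grid. The uniform prior g(w) = 1 and the counts N = 10, H = 7 are assumptions for illustration; with this prior the grid answer should approach the closed-form beta result of the final slide.

```python
# Grid approximation of the continuous case (illustrative prior and data).
M = 10001
ws = [i / (M - 1) for i in range(M)]
N, H = 10, 7

def g(x):
    return 1.0  # U(0,1) prior density, an assumption for this sketch

# Posterior density up to normalization: g(w) * w^H * (1-w)^(N-H)
unnorm = [g(x) * x**H * (1 - x)**(N - H) for x in ws]
z = sum(unnorm)
f = [u / z for u in unnorm]   # discretized posterior over the grid

# p_heads = E(X) under the posterior
p_heads = sum(x * fx for x, fx in zip(ws, f))

# conf_exp1 if we predict heads: posterior mass on w >= 0.5
conf = sum(fx for x, fx in zip(ws, f) if x >= 0.5)
```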

Continuous case (con't)
We can translate all the probabilities as follows: the posterior P(w | H, N) is itself a pdf over w. So if we define a random variable X with the pdf
f(w) ∝ g(w) · w^H (1−w)^(N−H)
then the equations on the previous slide become E(X) = ∫ w f(w) dw and confexp1 = P(X ≥ 0.5), which of course fit into algorithms B and D.

Beta distribution
Let's say we don't know g(w). If we assume W ~ beta(αw, βw), then:
f(w) = C · w^(H+αw−1) (1−w)^(N−H+βw−1)
where C is the appropriately defined constant. Clearly f(w) is also a beta density, with parameters α = H+αw and β = N−H+βw; that is:
X ~ beta(H+αw, N−H+βw), with mean E(X) = (H+αw) / (N+αw+βw)
For example, if W ~ beta(1, 1) = U(0, 1), the uniform distribution, then X ~ beta(H+1, N−H+1) and:
E(X) = (H+1) / (N+2)
Note that E(X) = H/N exactly only if H/N = ½.
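The closed form above is a one-liner (the function name is ours, not from the slides):

```python
def posterior_mean(H, N, a=1.0, b=1.0):
    """Posterior mean of the coin weight under a beta(a, b) prior:
    X ~ beta(H + a, N - H + b), so E(X) = (H + a) / (N + a + b)."""
    return (H + a) / (N + a + b)

# beta(1, 1) = U(0, 1) gives E(X) = (H + 1) / (N + 2).
# E(X) equals the raw frequency H/N exactly only when H/N = 1/2:
assert posterior_mean(5, 10) == 5 / 10   # H/N = 1/2: they agree
assert posterior_mean(7, 10) != 7 / 10   # otherwise they differ
```

Shrinking the raw frequency toward 1/2 this way is what keeps the estimate sensible when N is small, which is the point of using the expectation rather than the maximum-likelihood value.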