Group Presentation
Top Changwatchai, 18 October 2000 (revised 23 Oct 2000)
The main point
Last week I got several good questions. I plan to address three issues:
1. Explain my definition of the random variable
2. Explain why we want the expectation, not the maximum-likelihood value
3. Justify why it has a beta distribution under certain assumptions
Assumptions
- There are k different coins (1, 2, ..., k)
- p_i = prior probability of picking coin i
- w_i = weight of coin i = probability of getting heads on any given toss of coin i (independent of all other tosses)
- Our algorithm knows all of this, including the values of the p_i's and w_i's
Random experiment 1
Experiment:
1. Pick one of the k coins according to the p_i's
2. Toss this coin one time
Goal: Perform this experiment one time. Without knowing anything else about the results of the experiment (except for our assumed knowledge), we want to predict whether we got heads or tails.
Algorithm A:
1. Calculate the probability of getting heads: p_heads = sum_i p_i * w_i
2. If p_heads < 0.5, predict tails; otherwise, predict heads.
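Algorithm A can be sketched in a few lines of Python (the coin values in the usage example below are hypothetical, not from the slides):

```python
def algorithm_a(p, w):
    """Algorithm A: predict one pick-and-toss experiment.

    p[i] = prior probability of picking coin i
    w[i] = weight (heads probability) of coin i
    Returns (prediction, p_heads) where p_heads = sum_i p_i * w_i.
    """
    # Marginal probability of heads over the random choice of coin.
    p_heads = sum(pi * wi for pi, wi in zip(p, w))
    prediction = "tails" if p_heads < 0.5 else "heads"
    return prediction, p_heads
```

For instance, with two hypothetical coins picked with probabilities (0.3, 0.7) and weights (0.2, 0.8), `algorithm_a` gives p_heads = 0.62 and predicts heads.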
Confidence
We want confidence to reflect how "good" our prediction is:
conf_ideal = P(make the same prediction | more knowledge)
Lots of different things can constitute extra knowledge. We focus on one type of knowledge in particular:
conf_exp1 = P(make the same prediction | we know which coin was picked)
Note: we don't actually know which coin was picked. We want the probability that we would make the same prediction in the hypothetical case that we are told which coin was picked. (See next slide for an alternative explanation.)
Our new prediction uses the same rule as in algorithm A: say we are told that coin i was picked; then if w_i < 0.5 we predict tails, and otherwise we predict heads.
So, if we predicted heads with algorithm A:
conf_exp1 = sum of p_i over all coins i with w_i >= 0.5
(and symmetrically, if we predicted tails, conf_exp1 = sum of p_i over all coins i with w_i < 0.5)
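This rule can be sketched directly (a minimal sketch; the numbers in the usage note are from the worked example later in the deck):

```python
def conf_exp1(p, w, prediction):
    """Probability that being told which coin was picked yields the
    same prediction we already made.

    If we predicted heads, sum p_i over coins with w_i >= 0.5;
    if we predicted tails, sum p_i over coins with w_i < 0.5.
    """
    if prediction == "heads":
        return sum(pi for pi, wi in zip(p, w) if wi >= 0.5)
    return sum(pi for pi, wi in zip(p, w) if wi < 0.5)
```

With priors (0.4, 0.3, 0.3) and weights (0.2, 0.8, 0.9), predicting heads gives confidence 0.3 + 0.3 = 0.6.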
Confidence (alternative explanation)
Random variable for experiment 1
The space of random experiment 1 is { (coin i, heads or tails) }.
We define a discrete random variable X on this space:
X((coin i, heads or tails)) = w_i
Note that we ignore the outcome of the flip, since that is what we are predicting.
The support of X is { w_1, w_2, ..., w_k }.
The pmf of X is: f(w) = p_i if w = w_i, and 0 otherwise.
The expectation of X: E(X) = sum_i p_i * w_i
Note this is the same as p_heads in algorithm A, so we define:
Algorithm B:
1. Calculate E(X)
2. If E(X) < 0.5, predict tails; otherwise, predict heads.
This is why we use the expectation of X, not the maximum-likelihood value.
We also use X to compute confidence. For example, if we predict heads:
conf_exp1 = P(X >= 0.5)
Example
Let k = 3, with priors (p_1, p_2, p_3) = (0.4, 0.3, 0.3) and weights (w_1, w_2, w_3) = (0.2, 0.8, 0.9).
The maximum-likelihood coin (the one with the highest prior probability) is coin 1, and w_1 = 0.2, so we would predict tails (not what we want).
Instead, we use the expectation:
E(X) = 0.2*0.4 + 0.8*0.3 + 0.9*0.3 = 0.59, so predict heads
conf_exp1 = 0.3 + 0.3 = 0.6
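The arithmetic in this example can be checked with a few lines of Python:

```python
# Priors and weights from the example on this slide.
p = [0.4, 0.3, 0.3]
w = [0.2, 0.8, 0.9]

# E(X) = sum_i p_i * w_i = 0.59, so algorithm B predicts heads.
e_x = sum(pi * wi for pi, wi in zip(p, w))

# conf_exp1: mass of coins that would also predict heads (w_i >= 0.5).
conf = sum(pi for pi, wi in zip(p, w) if wi >= 0.5)
```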
Random experiment 2
Same situation as above. Let N be a finite but very large number.
Experiment:
1. Pick one of the k coins according to the p_i's
2. Toss this coin N times
3. Toss the same coin one more time
Goal: Perform this experiment one time. Let H be the number of heads observed in the first N tosses. Knowing H and N, but nothing else about the results of the experiment (except for our assumed knowledge), we want to predict whether we got heads or tails on the last toss.
Note that for N = 0, we have random experiment 1.
Algorithm C
1. Calculate the probability of getting heads on the last toss:
p_heads = sum_i P(coin i | H, N) * w_i
where, by Bayes' rule,
P(coin i | H, N) = p_i * w_i^H * (1 - w_i)^(N-H) / sum_j p_j * w_j^H * (1 - w_j)^(N-H)
2. If p_heads < 0.5, predict tails; otherwise, predict heads.
Confidence: if we predict heads:
conf_exp1 = sum of P(coin i | H, N) over all coins i with w_i >= 0.5
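A sketch of algorithm C in Python; note that the binomial coefficient C(N, H) appears in both the numerator and denominator of Bayes' rule, so it cancels and can be omitted (the H, N values in the usage note are hypothetical):

```python
def algorithm_c(p, w, H, N):
    """Predict the last toss after observing H heads in N tosses.

    Posterior over coins: P(coin i | H, N) is proportional to
    p_i * w_i**H * (1 - w_i)**(N - H); the C(N, H) factor cancels.
    Returns (prediction, p_heads, conf_exp1).
    """
    unnorm = [pi * wi**H * (1 - wi)**(N - H) for pi, wi in zip(p, w)]
    total = sum(unnorm)
    posterior = [u / total for u in unnorm]
    p_heads = sum(q * wi for q, wi in zip(posterior, w))
    prediction = "tails" if p_heads < 0.5 else "heads"
    if prediction == "heads":
        conf = sum(q for q, wi in zip(posterior, w) if wi >= 0.5)
    else:
        conf = sum(q for q, wi in zip(posterior, w) if wi < 0.5)
    return prediction, p_heads, conf
```

With H = 0 and N = 0 (no observations), the posterior equals the prior and this reduces to algorithms A/B, matching the note on the previous slide.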
Random variable for experiment 2
The space of random experiment 2 is { (coin i, data from N tosses, heads or tails on last toss) }.
We define a discrete random variable X on this space:
X((coin i, data from N tosses, heads or tails on last toss)) = w_i
Note again that we ignore everything except the coin index.
The pmf of X is: f(w) = P(coin i | H, N) if w = w_i, and 0 otherwise.
The expectation of X: E(X) = sum_i P(coin i | H, N) * w_i
Note this is the same as p_heads in algorithm C, so we define:
Algorithm D:
1. Calculate E(X)
2. If E(X) < 0.5, predict tails; otherwise, predict heads.
Confidence: if we predict heads:
conf_exp1 = P(X >= 0.5)
Continuous case
Random experiment 3 (continuous version of experiment 2):
1. Assume we have a random variable W with pdf g(w); pick a value w under this distribution
2. Toss a coin with this weight N times
3. Toss the same coin one more time
We can use algorithm C as well, with the following calculations (we abuse notation slightly; we will correct this on the next slide):
p_heads = integral from 0 to 1 of P(w | H, N) * w dw
Since, by Bayes' rule:
P(w | H, N) = g(w) * w^H * (1-w)^(N-H) / integral from 0 to 1 of g(v) * v^H * (1-v)^(N-H) dv
Assuming we predicted heads:
conf_exp1 = integral from 0.5 to 1 of P(w | H, N) dw
Continuous case (con't)
We can translate all the probabilities as follows: P(w | H, N) is really a probability density in w, so we can write it as f(w).
Clearly, if we define the random variable X with the pdf
f(w) = g(w) * w^H * (1-w)^(N-H) / integral from 0 to 1 of g(v) * v^H * (1-v)^(N-H) dv
then the equations on the previous slide become
E(X) = integral from 0 to 1 of w * f(w) dw
conf_exp1 = integral from 0.5 to 1 of f(w) dw
which of course fit into algorithms B and D.
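These integrals can be approximated numerically; a midpoint-rule sketch (the prior g and the H, N values in the usage note are assumptions for illustration):

```python
def continuous_algorithm(g, H, N, steps=100_000):
    """Approximate E(X) and conf_exp1 for the continuous case.

    f(w) is proportional to g(w) * w**H * (1 - w)**(N - H).
    E(X) = integral of w * f(w) over [0, 1];
    conf_exp1 = integral of f(w) over [0.5, 1] (if we predicted heads).
    Uses a simple midpoint rule with `steps` subintervals.
    """
    dw = 1.0 / steps
    ws = [(i + 0.5) * dw for i in range(steps)]
    unnorm = [g(x) * x**H * (1 - x)**(N - H) for x in ws]
    z = sum(unnorm) * dw  # normalizing constant
    e_x = sum(x * u for x, u in zip(ws, unnorm)) * dw / z
    conf = sum(u for x, u in zip(ws, unnorm) if x >= 0.5) * dw / z
    return e_x, conf
```

With a uniform prior g(w) = 1 and, say, H = 6 heads in N = 10 tosses, E(X) comes out close to 7/12 ≈ 0.583, matching the beta-distribution result on the next slide.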
Beta distribution
Let's say we don't know g(w). If we assume W ~ beta(alpha_W, beta_W), then:
f(w) = C * w^(H + alpha_W - 1) * (1-w)^(N - H + beta_W - 1)
where C is the appropriately defined normalizing constant. Clearly f(w) is also a beta density, with parameters alpha = H + alpha_W and beta = N - H + beta_W; that is:
X ~ beta(H + alpha_W, N - H + beta_W), with mean:
E(X) = (H + alpha_W) / (N + alpha_W + beta_W)
For example, if W ~ beta(1, 1) = U(0, 1), the uniform distribution, then X ~ beta(H+1, N-H+1) and:
E(X) = (H + 1) / (N + 2)
Note that E(X) = H/N exactly only if H/N = 1/2.
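The uniform-prior special case, including the final observation, can be checked numerically:

```python
def posterior_mean_uniform(H, N):
    """Mean of X ~ beta(H + 1, N - H + 1): the posterior mean of the
    coin weight under a uniform prior W ~ U(0, 1)."""
    return (H + 1) / (N + 2)

# The posterior mean is pulled from the raw frequency H/N toward 1/2,
# and matches H/N exactly only when H/N = 1/2.
```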