Bayesian Reasoning
A/Prof Geraint Lewis and A/Prof Peter Tuthill
[Portraits: Thomas Bayes (1702-1761) and Pierre-Simon Laplace (1749-1827)]
"Probability theory is nothing but common sense, reduced to calculation." (Laplace)
Are you a Bayesian or a Frequentist?
"There are three kinds of lies: lies, damned lies, and statistics" (Benjamin Disraeli) ...and Bayesian statistics.
[Fig 1: A Frequentist Statistician. Fig 2: Bayesian Statistics Conference.]
What is Inference?

Deductive Inference (Logic), Aristotle, 4th century B.C.
Major premise: if A is true, then B is true (A = AB in Boolean notation).

STRONG SYLLOGISMS:
  A is true (minor premise), therefore B is true (conclusion).    [T → T]
  B is false (minor premise), therefore A is false (conclusion).  [F ← F]

Inductive Inference (Plausible Reasoning)
It is useful to have a concrete example in your head: "When the bough breaks, the cradle will fall."

WEAK SYLLOGISMS:
  B is true (minor premise), therefore A is more plausible.   [t ← T]
  A is false (minor premise), therefore B is less plausible.  [F → f]
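The weak syllogisms can be made quantitative with the probability rules introduced later in these slides. Below is a minimal sketch in Python for the cradle example; the numbers (a prior of 0.3 that the bough breaks, and a 0.5 chance the cradle falls for some other reason) are illustrative assumptions, not values from the slides.

```python
# Weak syllogism, quantified: if A implies B, then learning that B is true
# raises the plausibility of A. A = "the bough breaks", B = "the cradle falls".
p_A = 0.3               # prior plausibility of A (assumed for illustration)
p_B_given_A = 1.0       # A implies B: if the bough breaks, the cradle falls
p_B_given_notA = 0.5    # the cradle can also fall for other reasons (assumed)

# Total probability of B, marginalizing over A
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Plausibility of A after observing that B is true
p_A_given_B = p_B_given_A * p_A / p_B
print(f"P(A) = {p_A:.3f}  ->  P(A|B) = {p_A_given_B:.3f}")   # 0.300 -> 0.462
```

Observing B (the cradle fell) raises the plausibility of A (the bough broke) from 0.30 to about 0.46, exactly the behaviour the weak syllogism describes.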
What is Inference?
Deductive logic: cause → effects or outcomes.
Inductive logic: effects or observations → possible causes.
What is a Probability?

Frequentists: P(A) = the long-run relative frequency of A occurring in identical repeats of an observation. "A" is restricted to propositions about random variables.

Bayesians: P(A|B) = a real-number measure of the plausibility of proposition A, given (conditional upon) the truth of proposition B. "A" can be any logical proposition.

All probabilities are conditional; we must be explicit about what our assumptions B are (there is no such thing as an absolute probability!).
Probability depends on our state of Knowledge
The Monty Hall problem. [Figure: three closed doors, A, B, and C. Which do you pick?]
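A minimal Monte Carlo sketch of the Monty Hall game (standard rules assumed: the host always opens a goat door that is not your pick). It shows why the odds depend on what you know after the host acts.

```python
import random

def monty_hall(switch: bool, trials: int = 100_000) -> float:
    """Estimate the win rate when staying with, or switching, the first pick."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)             # door hiding the car
        pick = random.randrange(3)            # contestant's initial pick
        # Host opens a door that hides a goat and is not the contestant's pick
        # (when two goat doors qualify, which one he opens does not change the odds)
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(f"stay:   {monty_hall(switch=False):.3f}")   # ~0.333
print(f"switch: {monty_hall(switch=True):.3f}")    # ~0.667
```

The host's action carries information, so the contestant's state of knowledge, and hence the probabilities, change even though the car never moves.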
Probability depends on our state of Knowledge
[Figure: an urn with 7 red and 5 blue balls. If the outcome of the 1st draw is unknown, the probability that the 2nd draw is blue is 5/12 and red is 7/12, the same as for the 1st draw.]
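A short sketch in exact arithmetic makes the point quantitative: with the first draw unseen, the second draw is blue with probability 5/12, but learning the first outcome changes it.

```python
from fractions import Fraction

red, blue = 7, 5
total = red + blue

# Second draw blue, first draw UNKNOWN: marginalize over the first draw
p_first_blue = Fraction(blue, total)
p_second_blue = (p_first_blue * Fraction(blue - 1, total - 1)
                 + (1 - p_first_blue) * Fraction(blue, total - 1))
print(p_second_blue)                  # 5/12, the same as for the first draw

# Second draw blue, GIVEN the first draw was seen to be blue
print(Fraction(blue - 1, total - 1))  # 4/11: new knowledge, new probability
```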
The Desiderata of Bayesian Probability Theory
1. Degrees of plausibility are represented by real numbers (a higher degree of belief corresponds to a larger number).
2. As extra evidence supporting a proposition arrives, its plausibility increases monotonically, up to a limit (certainty).
3. Consistency: multiple ways of arriving at a conclusion must all produce the same answer.
(See the book for additional details.)
Logic and Probability
In the certainty limit, where probabilities go to zero (falsehood) or one (truth), the sum and product rules reduce to formal Boolean deductive logic (the strong syllogisms). Bayesian probability is therefore an extension of formal logic into intermediate states of knowledge. Bayesian inference gives a measure of our state of knowledge about nature, not a measure of nature itself.
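A quick numerical check of this claim, reusing the illustrative numbers from the earlier weak-syllogism sketch: when A implies B exactly, the product rule forces P(A | B false) to zero, recovering the strong syllogism (modus tollens).

```python
# Certainty limit: A implies B, i.e. P(B|A) = 1 exactly (other numbers assumed).
p_A, p_B_given_A, p_B_given_notA = 0.3, 1.0, 0.5

# P(not B) by the sum rule, marginalizing over A
p_notB = (1 - p_B_given_A) * p_A + (1 - p_B_given_notA) * (1 - p_A)

# Product rule rearranged: P(A | not B) = P(not B | A) P(A) / P(not B)
p_A_given_notB = (1 - p_B_given_A) * p_A / p_notB
print(p_A_given_notB)   # 0.0 exactly: "B is false, therefore A is false"
```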
The two rules underlying probability theory

SUM RULE:     P(A|B) + P(Ā|B) = 1

PRODUCT RULE: P(A,B|C) = P(A|C) P(B|A,C) = P(B|C) P(A|B,C)

[Figure: Venn diagram over all kangaroos, partitioned by eye colour (blue/brown) and handedness (left/right), with the overlap region "blue-eyed and left-handed".]
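Both rules can be checked mechanically on a joint probability table. The sketch below uses the kangaroo picture with made-up numbers; the joint probabilities are assumptions for illustration only.

```python
import numpy as np

# Illustrative joint probabilities for the kangaroo population (assumed):
# rows = eye colour (blue, brown); columns = handedness (left, right).
joint = np.array([[0.10, 0.20],    # blue-eyed:  left, right
                  [0.25, 0.45]])   # brown-eyed: left, right

p_blue = joint[0].sum()            # marginal over handedness
p_left = joint[:, 0].sum()         # marginal over eye colour

# SUM RULE: P(blue) + P(not blue) = 1
assert np.isclose(p_blue + joint[1].sum(), 1.0)

# PRODUCT RULE, both factorizations of P(blue, left):
p_left_given_blue = joint[0, 0] / p_blue
p_blue_given_left = joint[0, 0] / p_left
assert np.isclose(p_blue * p_left_given_blue, p_left * p_blue_given_left)

print(f"P(blue,left) = {joint[0,0]:.2f} = {p_blue:.2f} * {p_left_given_blue:.3f}")
```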
Bayes' Theorem

P(Hi|D,I) = P(Hi|I) P(D|Hi,I) / P(D|I)

Hi = proposition asserting the truth of a hypothesis of interest
I  = proposition representing our prior information
D  = proposition representing the data

P(D|Hi,I) = Likelihood: the probability of obtaining the data given that the hypothesis is true
P(Hi|I)   = Prior: the probability of the hypothesis before the new data
P(Hi|D,I) = Posterior: the probability of the hypothesis after the new data
P(D|I)    = Normalization factor (ensures the probabilities of all hypotheses Hi sum to 1)
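As a concrete sketch, Bayes' theorem for a set of discrete hypotheses is one line of array arithmetic. The priors and likelihoods below are made-up numbers for illustration.

```python
import numpy as np

# Made-up priors P(Hi|I) for three competing hypotheses, and the
# likelihood P(D|Hi,I) of the observed data under each of them.
prior = np.array([0.5, 0.3, 0.2])
likelihood = np.array([0.10, 0.40, 0.70])

evidence = np.sum(prior * likelihood)        # P(D|I), the normalization factor
posterior = prior * likelihood / evidence    # P(Hi|D,I)

print(posterior)          # approx [0.161 0.387 0.452]
print(posterior.sum())    # 1.0: the posteriors are properly normalized
```

Note how the data can overturn the prior ranking: the hypothesis that started least probable ends up most probable because it fits the data best.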
Example: The Gambler's Coin Problem

P(H|D,I) = P(H|I) P(D|H,I) / P(D|I)

Normalization factor: ignore this for now, since we only need the relative merit of different values of H.

Prior: what do we know about the coin? Let H be the probability of heads, and take its prior pdf to be uniform on [0, 1].

Likelihood: if the data D give R heads in N tosses, then P(D|H,I) ∝ H^R (1-H)^(N-R). The full distribution, assuming independence of the throws, is the binomial distribution; we omit the terms not containing H and use a proportionality.
Example: A fair coin?
Data: H H T T
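Putting the last two slides together, here is a minimal sketch that evaluates the posterior for H on a grid, using the uniform prior and the H H T T data (R = 2 heads in N = 4 tosses):

```python
import numpy as np

H = np.linspace(0, 1, 501)        # grid over the head probability H
R, N = 2, 4                       # the data above: 2 heads in 4 tosses

prior = np.ones_like(H)                        # uniform prior on [0, 1]
likelihood = H**R * (1 - H)**(N - R)           # binomial terms containing H
posterior = prior * likelihood
posterior /= posterior.sum() * (H[1] - H[0])   # normalize on the grid

print(f"posterior peaks at H = {H[np.argmax(posterior)]:.2f}")   # 0.50
```

With a flat prior the posterior is proportional to the likelihood, so it peaks at R/N = 0.5; with only four tosses it remains very broad, so "fair" is far from established.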
Example: A fair coin?
[Figure: the resulting posterior pdf for H given the data.]
The effects of the Prior
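To illustrate the slide's point, here is a sketch comparing two priors against the same likelihood. The data (3 heads in 4 tosses) and the peaked prior's width are assumptions chosen for illustration.

```python
import numpy as np

H = np.linspace(0, 1, 501)
R, N = 3, 4                                # assumed data: 3 heads in 4 tosses
likelihood = H**R * (1 - H)**(N - R)

# Two illustrative priors: flat, and sharply peaked at a fair coin
priors = {
    "uniform     ": np.ones_like(H),
    "peaked @ 0.5": np.exp(-0.5 * ((H - 0.5) / 0.05) ** 2),
}

for name, prior in priors.items():
    post = prior * likelihood
    post /= post.sum() * (H[1] - H[0])     # normalize on the grid
    print(f"{name}: posterior peak at H = {H[np.argmax(post)]:.2f}")
```

With so few tosses, the sharply peaked prior dominates and holds the posterior near H = 0.5, while the flat prior lets it peak at R/N = 0.75. As the number of tosses grows, the likelihood sharpens and overwhelms any reasonable prior.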