MSc Methods part II: Bayesian analysis
Dr. Mathias (Mat) Disney
UCL Geography
Office: 113, Pearson Building
Tel:
Lecture outline
Intro to Bayes’ Theorem
–Science and scientific thinking
–Probability & Bayes’ Theorem – why is it important?
–Frequentist v Bayesian
–Background, rationale
–Methods: MCMC ……
–Advantages / disadvantages
Applications:
–parameter estimation, uncertainty
–Practical – basic Bayesian estimation
Reading and browsing
Gauch, H. (2002) Scientific Method in Practice, CUP.
Sivia, D. S. with Skilling, J. (2008) Data Analysis, 2nd ed., OUP, Oxford.
Monteith and Unsworth, Computational Numerical Methods in C (XXXX)
Flake, W. G. (2000) Computational Beauty of Nature, MIT Press.
Gershenfeld, N. (2002) The Nature of Mathematical Modelling, CUP.
Mathematical texts
–Blah
Kalman filters
–Welch and Bishop
–Maybeck
Papers
So how do we do science?
Carry out experiments? Collect observations? Test hypotheses (models)?
Generate “understanding”? Objective knowledge??
Induction? Deduction?
Induction and deduction
Deduction
–Inference, by reasoning, from general to particular
–E.g. Premises: i) every mammal has a heart; ii) every horse is a mammal.
–Conclusion: every horse has a heart.
–Valid if the truth of the premises guarantees the truth of the conclusion; invalid otherwise.
–Conclusion is either true or false
Induction and deduction
Induction
–Process of inferring general principles from observation of particular cases
–E.g. Premise: every horse that has ever been observed has a heart
–Conclusion: every horse has a heart.
–Conclusion goes beyond the information present, even implicitly, in the premises
–Conclusions have a degree of strength (weak -> near certain).
Aside: sound argument v fallacy
If plants lack nitrogen, they become yellowish (p → q)
–The plants lack N, so they become yellowish: affirming the antecedent (p → q; p, ∴ q) ✓
–The plants are not yellowish, so they do not lack N: denying the consequent (p → q; ~q, ∴ ~p) ✓
–The plants are yellowish, therefore they lack N: affirming the consequent (p → q; q, ∴ p) X
–The plants do not lack N, so they do not become yellowish: denying the antecedent (p → q; ~p, ∴ ~q) X
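Validity here is purely a matter of form, so it can be checked mechanically. Below is a minimal Python sketch (not from the lecture; all names are illustrative) that tests each of the four forms by brute-force truth table: a form is valid iff the conclusion is true in every case where all premises are true.

from itertools import product

def implies(a, b):
    # Material implication: p -> q is false only when p is true and q is false
    return (not a) or b

def valid(premises, conclusion):
    # True iff the conclusion holds in every truth assignment
    # of (p, q) that makes all the premises true
    return all(conclusion(p, q)
               for p, q in product([True, False], repeat=2)
               if all(prem(p, q) for prem in premises))

forms = {
    "affirming the antecedent": ([lambda p, q: implies(p, q), lambda p, q: p],
                                 lambda p, q: q),
    "denying the consequent":   ([lambda p, q: implies(p, q), lambda p, q: not q],
                                 lambda p, q: not p),
    "affirming the consequent": ([lambda p, q: implies(p, q), lambda p, q: q],
                                 lambda p, q: p),
    "denying the antecedent":   ([lambda p, q: implies(p, q), lambda p, q: not p],
                                 lambda p, q: not q),
}

for name, (premises, conclusion) in forms.items():
    print(f"{name}: {'valid' if valid(premises, conclusion) else 'fallacy'}")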
Aside: sound argument v fallacy
Fallacies can be hard to spot in longer, more detailed arguments:
–Fallacies of composition; ambiguity; false dilemmas; circular reasoning; genetic fallacies (ad hominem)
Gauch (2003) notes:
–For an argument to be accepted by any audience as proof, the audience MUST accept the premises and the validity
–That is: part of the responsibility for rational dialogue falls to the audience
–If audience data are lacking and/or logic is weak, then a valid argument may be incorrectly rejected (or vice versa)
Gauch (2006): “Seven pillars of Science”
1. Realism: physical world is real;
2. Presuppositions: world is orderly and comprehensible;
3. Evidence: science demands evidence;
4. Logic: science uses standard, settled logic to connect evidence and assumptions with conclusions;
5. Limits: many matters cannot usefully be examined by science;
6. Universality: science is public and inclusive;
7. Worldview: science must contribute to a meaningful worldview.
What’s this got to do with methods?
Fundamental laws of probability can be derived from statements of logic
BUT there are different ways to apply them
Two key ways
–Frequentist
–Bayesian – after Rev. Thomas Bayes (1702–1761)
Bayes: see Gauch (2003) ch 5
Informally, the Bayesian Q is:
–“What is the probability (P) that a hypothesis (H) is true, given the data and any prior knowledge?”
–Weighs hypotheses (different models) in the light of data
The frequentist Q is:
–“How reliable is an inference procedure, by virtue of not rejecting a true hypothesis or accepting a false hypothesis?”
–Weighs procedures (different sets of data) in the light of a hypothesis
Bayes: see Gauch (2003) ch 5
Prior knowledge?
–What is known beyond the particular experiment at hand, which may be substantial or negligible
We all have priors: assumptions, experience, other pieces of evidence
The Bayes approach explicitly requires you to assign a probability to your prior (somehow)
Bayesian view – probability as degree of belief rather than a frequency of occurrence (in the long run…)
Bayes’ Theorem
The “chief rule involved in the process of learning from experience” (Jefferys, 1983)
Formally: P(H|D) = P(D|H) × P(H) / P(D)
–P(H|D) = Posterior, i.e. probability of hypothesis (model) H being true, given data D
–P(D|H) = Likelihood, i.e. probability of data D being observed if H is true
–P(H) = Prior, i.e. probability of hypothesis being true before measurement of D
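As a minimal numerical sketch (the numbers are invented for illustration, not from the lecture), Bayes’ Theorem is one line of arithmetic in Python:

def posterior(prior_h, likelihood_d_given_h, evidence_d):
    # P(H|D) = P(D|H) * P(H) / P(D)
    return likelihood_d_given_h * prior_h / evidence_d

# e.g. P(H) = 0.5, P(D|H) = 0.8, P(D) = 0.6  ->  P(H|D) ~ 0.67
print(posterior(0.5, 0.8, 0.6))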
Bayes’ Theorem
Importance? P(H|D) appears on the left of BT
It solves the inverse (inductive) problem – the probability of a hypothesis given some data
This is how we do science in practice!
We don’t have access to infinite repetitions of expts (the ‘long run frequency’ view)
Bayes’ Theorem
I is ‘background information’, as there is ‘no such thing as absolute probability’ (see S & S p 5): P(H|D,I) ∝ P(D|H,I) × P(H|I)
P(rain today) will depend on clouds this morning, whether we saw the forecast, etc. – I is usually left out but ….
Power of Bayes’ Theorem
–Relates the quantity of interest, i.e. P of H being true given D, to that which we might estimate in practice, i.e. P of observing D, given H is correct
Bayes’ Theorem
To go from P(H|D,I) ∝ P(D|H,I) × P(H|I) to P(H|D,I) = P(D|H,I) × P(H|I) / P(D|I) we need to divide by P(D|I)
P(D|I) is known as the Evidence
–A normalisation constant, which can be left out for parameter estimation as it is independent of H
–But it is required in model selection, for example, where the amount of data may be critical
Bayes’ Theorem & marginalisation
The Evidence P(D|I) is obtained by marginalising over the full set of mutually exclusive hypotheses:
P(D|I) = Σi P(D|Hi,I) × P(Hi|I)
This guarantees that the posterior probabilities over all Hi sum to 1
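To make the marginalisation concrete, here is a short Python sketch (the priors and likelihoods are invented for illustration): the Evidence is the sum over hypotheses, and dividing by it leaves posteriors that are properly normalised.

priors      = [0.7, 0.2, 0.1]   # P(Hi|I), must sum to 1
likelihoods = [0.1, 0.5, 0.9]   # P(D|Hi,I)

# Evidence P(D|I): marginalise the joint P(D,Hi|I) over the hypotheses
evidence = sum(l * p for l, p in zip(likelihoods, priors))

# Posteriors P(Hi|D,I), normalised by construction
posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]

print(evidence)         # 0.26
print(posteriors)
print(sum(posteriors))  # 1.0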
Bayes’ Theorem
For two mutually exclusive hypotheses H1, H2, i.e. P(H2|D) = 1 – P(H1|D), we can express BT in ratio or ‘odds’ form:
P(H1|D)/P(H2|D) = [P(D|H1)/P(D|H2)] × [P(H1)/P(H2)]
Posterior odds = likelihood odds × prior odds
E.g. if prior odds P(H1)/P(H2) = 3:1 and new data give likelihood odds P(D|H1)/P(D|H2) = 1:27, then posterior odds = 1:9, i.e. H2 is now favoured over H1
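A one-line Python check of the slide’s numbers (note the 1:27 likelihood ratio is inferred here from the stated prior and posterior odds, as the value is truncated on the slide):

prior_odds = 3 / 1          # P(H1) / P(H2)
likelihood_ratio = 1 / 27   # P(D|H1) / P(D|H2), inferred value

posterior_odds = likelihood_ratio * prior_odds
print(posterior_odds)       # 1/9, i.e. H2 now favoured 9:1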
Bayes: examples, implications
Ignored priors & rare diseases
A disease affects 1 in 100,000 people at random
If you have it, the test will correctly say so with P = 0.95
The test gives an incorrect positive diagnosis (false positive) with P =
If the test is positive, what is the P that the diagnosis is correct?
Bayes: examples, implications
Use Bayes’ Theorem, two-hypothesis case:
P(H1|D) = P(D|H1) × P(H1) / [P(D|H1) × P(H1) + P(D|H2) × P(H2)]
where H1 = ‘have the disease’, H2 = ‘do not have the disease’, and D = ‘positive test’
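Putting the disease example through the two-hypothesis formula in Python: prevalence and sensitivity are from the slide, but the false-positive rate is truncated on the slide, so the 0.005 used below is an assumed value for illustration only.

p_disease = 1 / 100_000       # prior P(H1), from the slide
p_pos_given_disease = 0.95    # sensitivity P(D|H1), from the slide
p_pos_given_healthy = 0.005   # ASSUMED false-positive rate (slide value missing)

# Evidence: total probability of a positive test, marginalised over H1, H2
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: P(disease | positive test)
print(p_pos_given_disease * p_disease / p_pos)   # ~0.0019, i.e. ~0.2%

With these numbers the diagnosis is correct only about 0.2% of the time: the tiny prior (1 in 100,000) overwhelms the seemingly reliable test, which is exactly the point of the example – ignore priors at your peril.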