Bayesian statistics: Probabilities for everything

Different views on probability

Frequentist: Probabilities are there to tell us about long-term frequencies. They are objective, solely properties of nature (aleatoric).

Bayesian: Probabilities are there so that we can sum up our knowledge about things we are uncertain about. They are therefore found in the interplay between the subject of our study and ourselves. In Bayesian statistics, probabilities are subjective, but they can obtain an air of weak objectivity (intersubjectivity) if most people agree that these probabilities sum up their collective knowledge (example: dice).

Bayes' formula

Both latent variables and parameters are treated using probability theory; we treat everything with the same tool, conditional probabilities. In a sense, we only have observations and latent variables. Knowledge is updated using Bayes' theorem:

f(θ | D) = f(D | θ) f(θ) / f(D)

The probability density f(θ) is called the prior and is meant to contain whatever information we have about θ before seeing the data, in the form of a probability density. Restrictions on the possible values the parameters can take are placed here. More on this later. For discrete variables, replace the probability density f with a probability and the integrals with sums.
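As a minimal sketch of the discrete case (the coin hypotheses and numbers here are made up for illustration, not from the slides), the update is just "multiply the prior by the likelihood and renormalize":

```python
# Discrete Bayes update: two hypotheses about a coin, fair (P(heads) = 0.5)
# or biased (P(heads) = 0.8), observing a single "heads".
prior = {"fair": 0.5, "biased": 0.5}             # f(theta)
likelihood_heads = {"fair": 0.5, "biased": 0.8}  # f(D | theta) for D = heads

unnorm = {h: likelihood_heads[h] * prior[h] for h in prior}  # f(D | theta) f(theta)
f_D = sum(unnorm.values())                                   # f(D), normalizing constant
posterior = {h: unnorm[h] / f_D for h in unnorm}             # f(theta | D)

print(posterior)  # {'fair': 0.3846..., 'biased': 0.6153...}
```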

Bayes' formula

Bayes' theorem: f(θ | D) = f(D | θ) f(θ) / f(D)

The probability density f(θ | D) is called the posterior distribution. It sums up everything we know about the parameters, θ, after dealing with the data, D. Estimates, parameter uncertainty, derived quantities, decision making and model testing all follow from it. An estimate can be formed using the expectation, median or mode of the posterior distribution. Parameter uncertainty can be described using credibility intervals: a 95% credibility interval (a, b) has the property Pr(a < θ < b | D) = 95%, i.e. after seeing the data, you have a 95% probability of the parameter having a value inside this interval. The normalizing constant f(D) will turn out to be a (the?) problem.
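A hedged numerical sketch of how such a posterior could be computed in practice: a grid approximation for a single rate parameter θ with a uniform prior and binomial data (the counts, grid size and variable names below are assumptions for illustration, not from the slides):

```python
import numpy as np

# Grid approximation of f(theta | D) for a rate theta with k successes in n trials,
# uniform prior f(theta) = 1 on (0, 1).
k, n = 7, 20                                     # hypothetical data
theta = np.linspace(0.001, 0.999, 999)
prior = np.ones_like(theta)                      # f(theta)
likelihood = theta**k * (1.0 - theta)**(n - k)   # f(D | theta), binomial kernel
unnorm = likelihood * prior
posterior = unnorm / np.trapz(unnorm, theta)     # divide by f(D) (numerical integral)

post_mean = np.trapz(theta * posterior, theta)   # posterior expectation as an estimate

# 95% credibility interval from the posterior quantiles
cdf = np.cumsum(posterior) * (theta[1] - theta[0])
lower, upper = theta[np.searchsorted(cdf, 0.025)], theta[np.searchsorted(cdf, 0.975)]
print(post_mean, (lower, upper))                 # roughly 0.36 and (0.18, 0.57)
```

On this grid, f(D) is just the numerical integral of likelihood times prior, which is why the normalizing constant becomes the hard part once θ has many dimensions.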

Bayesian statistics – Pros / Cons

Pros:
- Restrictions and insights from the biology, coded into the prior, can help the inference.
- Since you need to give a prior, you are actually forced to think about the meaning of your model.
- For some, Bayesian probabilities make more sense than frequentist ones.
- You don't have to take a stance on whether an unknown quantity is fundamentally stochastic or not.
- You get the parameter uncertainty "for free".
- It can give answers where the classical approach has none (such as the occupancy probability conditioned only on the data).
- You are actually allowed to talk about the probability that a parameter is found in a given interval and the probability of a given null hypothesis. (This is often how confidence intervals and p-values are incorrectly interpreted.)
- Understanding the output of a Bayesian analysis is often easier than for frequentist outputs.

Cons:
- You *have* to supply a prior, and that prior can be criticized. Making a prior that is hard to criticize is hard.
- Thinking about the meaning of a model parameter is extra work.
- For some, frequentist probabilities make more sense than Bayesian ones.
- Some think distinguishing between parameters and latent variables is important.
- Sometimes you're only interested in estimates.
- Bayesian statistics is subjective (though it can be made inter-subjective with hard work).

Bayesian statistics vs frequentist statistics – the practical issue

When the model or analysis complexity is below a certain threshold, frequentist methods will be easier; above that threshold, Bayesian analysis is easier. (Figure: work/effort as a function of complexity for the Bayesian and the frequentist approach.)

Graphical modelling - occupancy

Latent variables: the occupancy status of each area, ψ_1, ψ_2, ψ_3, ..., ψ_A.
Data: the detections per visit in each area, x_{1,1}, x_{1,2}, x_{1,3}, ..., x_{1,n_1}, and so on for the other areas.
Parameters (θ): the occupancy rate ψ and the detection probability p.

Pr(ψ_i = 1 | ψ) = ψ
Pr(x_{i,j} = 1 | ψ_i = 1, p) = p,   Pr(x_{i,j} = 0 | ψ_i = 1, p) = 1 - p
Pr(x_{i,j} = 1 | ψ_i = 0, p) = 0,   Pr(x_{i,j} = 0 | ψ_i = 0, p) = 1

The area occupancies are independent given the occupancy rate. All unknown quantities are now on equal footing. All dependencies are described by conditional probabilities, with marginal probabilities (priors) at the top nodes. Prior: f(ψ) = 1, f(p) = 1 (ψ, p ~ U(0,1), i.e. uniform between 0 and 1).
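As a sketch of what the inference could look like in practice (the detection counts, grid resolution and code structure below are illustrative assumptions, not the course's actual analysis), the latent ψ_i can be summed out analytically, giving P(data_i | ψ, p) = ψ p^{k_i} (1 - p)^{n_i - k_i} + (1 - ψ)·[k_i = 0] for an area with k_i detections in n_i visits, and the posterior of (ψ, p) can then be evaluated on a grid:

```python
import numpy as np

# Posterior of (psi, p) in the occupancy model on a grid, with the latent
# occupancy statuses summed out. The data below are made up for illustration.
k = np.array([3, 0, 2, 0, 5])   # detections in each of A = 5 areas
n = np.array([6, 6, 6, 6, 6])   # visits per area

psi_grid = np.linspace(0.005, 0.995, 199)    # occupancy rate psi
p_grid = np.linspace(0.005, 0.995, 199)      # detection probability p
PSI, P = np.meshgrid(psi_grid, p_grid, indexing="ij")

log_post = np.zeros_like(PSI)   # uniform priors f(psi) = f(p) = 1 contribute nothing
for ki, ni in zip(k, n):
    # P(data_i | psi, p) = psi * p^ki * (1-p)^(ni-ki) + (1-psi) * [ki == 0]
    occupied = PSI * P**ki * (1.0 - P)**(ni - ki)
    never_there = (1.0 - PSI) * (1.0 if ki == 0 else 0.0)
    log_post += np.log(occupied + never_there)

post = np.exp(log_post - log_post.max())
post /= post.sum()              # normalize over the grid (stands in for dividing by f(D))

psi_marginal = post.sum(axis=1)               # marginal posterior of psi
print(psi_grid[np.argmax(psi_marginal)])      # approximate posterior mode of psi
```

Summing over the other axis gives the marginal posterior of the detection probability p in the same way.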

Hyper-parameters

(The graphical model is the same as before: latent variables ψ_1, ..., ψ_A, data x_{i,j} and parameters ψ and p, but now with the hyper-parameters a_ψ, b_ψ and a_p, b_p at the top nodes.)

If your prior has a parametric form, the values you plug into that form are the hyper-parameters. For instance, the uniform distribution from zero to one is a special case of a uniform prior from a to b. Prior: ψ ~ U(a_ψ, b_ψ) and p ~ U(a_p, b_p). Since p and ψ are rates, we have set a_ψ = a_p = 0 and b_ψ = b_p = 1. The hyper-parameters are fixed; they are there to sum up our prior knowledge. If you start doing inference on them, they are parameters, not hyper-parameters.
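A minimal sketch of the distinction in code, assuming the uniform prior above (the tighter interval in the second call is a hypothetical example of stronger prior knowledge, not something from the slides):

```python
import numpy as np

def uniform_prior(theta, a, b):
    """Prior density f(theta) for theta ~ U(a, b); a and b are fixed hyper-parameters."""
    return np.where((theta >= a) & (theta <= b), 1.0 / (b - a), 0.0)

theta = np.linspace(0.0, 1.0, 101)
prior_flat = uniform_prior(theta, 0.0, 1.0)      # the U(0, 1) prior used on the slide
prior_informed = uniform_prior(theta, 0.2, 0.6)  # hypothetical tighter prior knowledge
# The hyper-parameters (a, b) are chosen and held fixed; only theta is inferred.
```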