Tutorial on Bayesian Techniques for Inference A. Asensio Ramos Instituto de Astrofísica de Canarias

Outline General introduction The Bayesian approach to inference Examples Conclusions

The Big Picture [Diagram: a testable hypothesis (theory) leads, via deductive inference, to predictions; observations provide data; statistical inference connects the data back to the theory through hypothesis testing and parameter estimation.]

The Big Picture Available information is always incomplete, so our knowledge of nature is necessarily probabilistic. Cox and Jaynes demonstrated that a probability calculus obeying the usual rules can be used to carry out statistical inference.

Probabilistic inference H_1, H_2, H_3, ..., H_n are hypotheses that we want to test. The Bayesian way is to estimate p(H_i | ...) for each hypothesis and to select among them by comparing these probabilities. But... what are the p(H_i | ...)?

What is probability? (Frequentist) In the frequentist approach, probability describes "randomness": if we carry out the experiment many times, what is the distribution of outcomes? p(x) is the limiting histogram of the random variable x.

What is probability? (Bayesian) In the Bayesian approach, probability describes "uncertainty": p(x) describes how probability is distributed among the possible values of x, one of which is the value we actually observe. Everything can be a random variable, as we will see later.

Bayes theorem It is derived directly from the product rule: p(H_i | D, I) = p(H_i | I) p(D | H_i, I) / p(D | I), where H_i is a proposition asserting the truth of a hypothesis, I is a proposition representing the prior information and D is a proposition representing the data.

Bayes theorem - Example Model M_1 predicts a star at d = 100 ly, while model M_2 predicts it at d = 200 ly. The measurement uncertainty is Gaussian with sigma = 40 ly and the measured distance is d = 120 ly. The likelihood of each model is the Gaussian density evaluated at the measurement, and the posteriors follow from Bayes' theorem.
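
A minimal numerical sketch of this example, assuming equal prior probabilities for the two models (the Gaussian likelihood and the numbers are the ones quoted in the slide):

import numpy as np

def gaussian(x, mu, sigma):
    # Gaussian probability density N(x; mu, sigma)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

d_obs, sigma = 120.0, 40.0                 # measured distance and its uncertainty (ly)
like_M1 = gaussian(d_obs, 100.0, sigma)    # likelihood under model M1 (d = 100 ly)
like_M2 = gaussian(d_obs, 200.0, sigma)    # likelihood under model M2 (d = 200 ly)

# Posterior probabilities with equal priors p(M1) = p(M2) = 0.5
post_M1 = like_M1 / (like_M1 + like_M2)
post_M2 = like_M2 / (like_M1 + like_M2)
print(post_M1, post_M2)                    # M1 is favoured by the measurement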

Bayes theorem – Another example A test has a 1.4% false-negative rate (98.6% reliability) and a 2.3% false-positive rate.

Bayes theorem – Another example H: you have the disease. not-H: you don't have the disease. D_1: your test is positive. You take the test and it comes back positive. What is the probability that you have the disease if the incidence of the disease is 1 in 10000?

Bayes theorem – Another example Applying Bayes' theorem, p(H | D_1, I) = p(D_1 | H, I) p(H | I) / [p(D_1 | H, I) p(H | I) + p(D_1 | not-H, I) p(not-H | I)].
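
A short sketch of this computation with the numbers quoted above (the 1:10000 incidence is the prior and the two test rates are the conditional probabilities):

# Prior and test characteristics taken from the slides above
p_disease = 1.0e-4            # incidence 1:10000
p_pos_given_disease = 0.986   # 1.4% false negatives
p_pos_given_healthy = 0.023   # 2.3% false positives

# Bayes' theorem: p(H | D1)
evidence = (p_pos_given_disease * p_disease
            + p_pos_given_healthy * (1.0 - p_disease))
p_disease_given_pos = p_pos_given_disease * p_disease / evidence
print(p_disease_given_pos)    # about 0.004, i.e. roughly 0.4%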

What is usually known as inversion One proposes a model to explain the observations. All inversion methods work by adjusting the parameters of the model so as to minimize a merit function that compares the observations with the synthesis from the model. The least-squares (maximum-likelihood) solution is taken as the solution to the inversion problem.

Defects of standard inversion codes The solution is given as a single set of model parameters (the maximum likelihood), which is not necessarily the optimal solution and is sensitive to noise. Error bars or confidence regions are scarce, assume Gaussian errors and are not easy to propagate. Ambiguities, degeneracies and correlations are not detected. Assumptions are not explicit, and models cannot be compared.

Inversion as a probabilistic inference problem [Diagram: the observations are generated by a model with several parameters plus noise; Bayes' theorem combines the likelihood, the prior and the evidence into the posterior.] Use Bayes' theorem to propagate information from the data to our final state of knowledge.

Priors Priors contain the information about the model parameters that we have before seeing the data. Typical priors are the top-hat function (a flat prior between theta_min and theta_max) and the Gaussian prior (when we know that some values are more probable than others). Assuming statistical independence of all parameters, the total prior is the product of the individual priors.
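
A small sketch of these two prior families and of the independence assumption (the parameter names and ranges below are purely illustrative, not taken from the talk):

import numpy as np

def flat_prior(theta, theta_min, theta_max):
    # Top-hat prior: constant inside [theta_min, theta_max], zero outside
    inside = (theta >= theta_min) & (theta <= theta_max)
    return np.where(inside, 1.0 / (theta_max - theta_min), 0.0)

def gaussian_prior(theta, mu, sigma):
    # Gaussian prior centred on the a priori most probable value
    return np.exp(-0.5 * ((theta - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Independence: the joint prior is the product of the individual priors
def total_prior(B, inclination):
    return flat_prior(B, 0.0, 3000.0) * gaussian_prior(inclination, 90.0, 30.0)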

Likelihood Assuming normal (Gaussian) noise, the likelihood can be written as p(D | theta) proportional to exp(-chi^2 / 2), where the chi^2 function is defined as usual, chi^2 = sum_i [d_i - m_i(theta)]^2 / sigma_i^2. In this case, the chi^2 function is written specifically for the case of Stokes profiles.
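
A hedged sketch of a Gaussian log-likelihood of this form (synth_model is a placeholder for whatever forward model synthesizes the Stokes profiles):

import numpy as np

def log_likelihood(theta, data, sigma, synth_model):
    # synth_model(theta) must return the synthetic profile on the same grid as data
    model = synth_model(theta)
    chi2 = np.sum(((data - model) / sigma) ** 2)
    # Gaussian noise: ln L = -chi^2 / 2 + constant (constant omitted here)
    return -0.5 * chi2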

Visual example of Bayesian inference

Advantages of the Bayesian approach "Best-fit" values of the parameters are, e.g., the mode or median of the posterior. Uncertainties are credible regions of the posterior. Correlations between the variables of the model are captured. Error propagation is generalized (not only Gaussian, and including correlations). Nuisance parameters are integrated out (marginalization).
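
A brief sketch of how these summaries are read off a set of posterior samples (the samples array is a stand-in for the output of whatever sampler is used):

import numpy as np

# samples: array of shape (n_samples, n_parameters) drawn from the posterior
samples = np.random.default_rng(0).normal(size=(10000, 3))   # placeholder samples

median = np.median(samples, axis=0)                 # "best-fit" values
lo, hi = np.percentile(samples, [16, 84], axis=0)   # 68% credible intervals
corr = np.corrcoef(samples, rowvar=False)           # parameter correlations

# Marginalization over a nuisance parameter is just ignoring its column
marginal_of_first = samples[:, 0]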

Bayesian inference – an example Hinode

Beautiful posterior distributions [Panels: field strength, field inclination, field azimuth, filling factor]

Not so beautiful posterior distributions - degeneracies [Panel: field inclination]

Inversion with local stray-light – be careful In the chi^2 expression, sigma_i^2 is the variance of the numerator. But... what happens if we propose a model, like Orozco Suárez et al. (2007), with a stray-light contamination obtained from a local average of the surrounding pixels, i.e. built from the observations themselves?

Variance becomes dependent on the stray-light contamination It is usual to carry out inversions with a stray-light contamination obtained from a local average of the surrounding pixels.

Spatial correlations: use global stray-light It is usual to carry out inversions with a stray-light contamination obtained from a local average of the M surrounding pixels. If M tends to infinity, the induced correlations tend to zero.
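
A small Monte Carlo illustration of this point, purely schematic (unit-variance noise pixels, an assumed stray-light fraction, and two pixels that share the same averaging neighbourhood; none of the numbers come from the talk):

import numpy as np

rng = np.random.default_rng(1)
alpha = 0.5            # assumed stray-light fraction (illustrative)
n_trials = 200000

for M in (4, 16, 64):
    # The local average of M unit-variance noise pixels has standard deviation 1/sqrt(M).
    # Two target pixels sharing the same averaging neighbourhood both get it subtracted.
    local_mean = rng.normal(scale=1.0 / np.sqrt(M), size=n_trials)
    pix_a = rng.normal(size=n_trials) - alpha * local_mean
    pix_b = rng.normal(size=n_trials) - alpha * local_mean
    corr = np.corrcoef(pix_a, pix_b)[0, 1]
    print(M, corr)     # the induced correlation shrinks as M grows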

Spatial correlations

Lesson: use global stray-light contamination

But… the most general inversion method is… [Diagram: the same observations confronted with several competing models, Model 1 through Model 5]

Model comparison Choose among the proposed models the one that is preferred by the data. The posterior for model M_i is p(M_i | D, I) proportional to p(M_i | I) p(D | M_i, I), and the model likelihood p(D | M_i, I) is just the evidence of that model.

Model comparison (compare evidences)
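
A schematic sketch of comparing evidences for two toy models of a one-dimensional dataset (the data, models and priors below are invented for illustration; only the recipe, evidence = integral of likelihood times prior over the parameters, comes from the slides):

import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-5.0, 5.0, 100)
sigma = 0.2
data = 1.0 * np.exp(-0.5 * x**2) + rng.normal(scale=sigma, size=x.size)   # toy dataset

def log_like(model):
    # Gaussian likelihood up to a constant
    return -0.5 * np.sum(((data - model) / sigma) ** 2)

# Model 0: pure noise, no free parameters, so its evidence is just its likelihood
log_ev0 = log_like(np.zeros_like(x))

# Model 1: a Gaussian bump of unknown amplitude A, flat prior A in [0, 2]
A_grid = np.linspace(0.0, 2.0, 400)
prior_A = 1.0 / 2.0                                        # flat prior density
log_L = np.array([log_like(A * np.exp(-0.5 * x**2)) for A in A_grid])
# Evidence: integral over A of likelihood * prior (factor out the maximum for stability)
log_ev1 = np.log(np.trapz(np.exp(log_L - log_L.max()) * prior_A, A_grid)) + log_L.max()

print(log_ev1 - log_ev0)   # log Bayes factor, strongly positive, favouring the signal model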

Model comparison – a worked example H_0: a single Gaussian. H_1: two Gaussians of equal width but unknown amplitude ratio.

Model comparison – a worked example H_0: a single Gaussian. H_1: two Gaussians of equal width but unknown amplitude ratio.

Model H_1 is 9.2 times more probable.

Model comparison – an example Model 1: 1 magnetic component. Model 2: 1 magnetic + 1 non-magnetic component. Model 3: 2 magnetic components. Model 4: 2 magnetic components with v_2 = 0 and a_2 = 0.

Model comparison – an example Model 1: 1 magnetic component (9 free parameters). Model 2: 1 magnetic + 1 non-magnetic component (17 free parameters). Model 3: 2 magnetic components (20 free parameters). Model 4: 2 magnetic components with v_2 = 0 and a_2 = 0 (18 free parameters). Model 2 is preferred by the data: the best fit with the smallest number of parameters.

Model averaging. One step further Models {M_i, i = 1..N} share a common subset of parameters theta of interest, but each model depends on a different set of additional parameters or has different priors over them. What all models together have to say about theta is the model-averaged posterior, p(theta | D, I) = sum_i p(M_i | D, I) p(theta | D, M_i, I): every model casts a "weighted vote".
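
A minimal sketch of this weighted vote on a parameter grid (the per-model posteriors, evidences and model priors below are placeholders; only the averaging formula comes from the slide):

import numpy as np

theta = np.linspace(0.0, 2000.0, 500)          # common parameter of interest (e.g. field strength)

# Placeholder per-model ingredients: normalized posteriors p(theta | D, M_i) and relative evidences
post_per_model = [np.exp(-0.5 * ((theta - mu) / 150.0) ** 2) for mu in (600.0, 900.0)]
post_per_model = [p / np.trapz(p, theta) for p in post_per_model]
evidences = np.array([1.0, 3.0])               # relative evidences p(D | M_i) (illustrative)
prior_models = np.array([0.5, 0.5])            # equal prior model probabilities

# Model posterior probabilities p(M_i | D) and the averaged posterior for theta
weights = evidences * prior_models
weights /= weights.sum()
post_averaged = sum(w * p for w, p in zip(weights, post_per_model))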

Model averaging – an example

Hierarchical models In the Bayesian approach, everything can be considered a random variable. [Diagram: data and model enter the likelihood; nuisance parameters are marginalized; priors with their prior parameters complete the inference.]

Hierarchical models In the Bayesian approach, everything can be considered a random variable. [Same diagram, now with an additional prior placed on the prior parameters.]

Bayesian Weak-field Bayes' theorem applied under the weak-field approximation. Advantage: everything is close to analytic.
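
A hedged sketch of why the weak-field case is nearly analytic: if Stokes V is proportional to the longitudinal field times the derivative of Stokes I (V = k B_par dI/dlambda, with k a known constant), then with Gaussian noise and a flat prior the posterior for B_par is itself Gaussian. The constant and the input profiles below are placeholders, not the talk's actual expressions:

import numpy as np

def weakfield_posterior(V, dI_dlambda, k, sigma_noise):
    # Weak-field model: V = k * B_par * dI/dlambda, Gaussian noise, flat prior on B_par.
    # The posterior for B_par is then Gaussian with the mean and width of ordinary least squares.
    design = k * dI_dlambda
    mean_B = np.sum(design * V) / np.sum(design**2)
    var_B = sigma_noise**2 / np.sum(design**2)
    return mean_B, np.sqrt(var_B)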

Bayesian Weak-field – Hierarchical priors Priors depend on some hyperparameters over which we can again set priors and marginalize them

Bayesian Weak-field - Data IMaX data

Bayesian Weak-field - Posteriors Joint posteriors

Bayesian Weak-field - Posteriors Marginal posteriors

Hierarchical priors - Distribution of longitudinal B

Hierarchical priors – Distribution of longitudinal B We want to infer the distribution of the longitudinal field B from many observed pixels, taking the uncertainties into account. The distribution is parameterized in terms of a vector of hyperparameters: mean and variance if it is Gaussian, or the heights of the bins for a general distribution.

Hierarchical priors – Distribution of longitudinal B

We generate N synthetic profiles with noise, with the longitudinal field sampled from a Gaussian distribution with standard deviation 25 Mx cm^-2.
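
A hedged sketch of the hierarchical idea for this Gaussian test case: if each pixel yields a noisy estimate of B_par with Gaussian errors, the per-pixel fields can be marginalized analytically and the hyperparameters (mean and width of the underlying distribution) inferred directly. The data generation below mimics the synthetic experiment; the noise level and grids are assumptions for illustration:

import numpy as np

rng = np.random.default_rng(3)
sigma_B_true, sigma_noise, n_pix = 25.0, 10.0, 5000       # Mx cm^-2; noise level assumed
B_true = rng.normal(0.0, sigma_B_true, n_pix)              # fields drawn from the true distribution
B_obs = B_true + rng.normal(0.0, sigma_noise, n_pix)       # noisy per-pixel estimates

# Marginalizing each pixel's field analytically, the hyperparameter likelihood is
# prod_i N(B_obs_i; mu, sqrt(sigma_B^2 + sigma_noise^2)). Evaluate it on a grid:
mu_grid = np.linspace(-5.0, 5.0, 101)
sB_grid = np.linspace(5.0, 50.0, 181)
MU, SB = np.meshgrid(mu_grid, sB_grid, indexing="ij")
var_tot = SB**2 + sigma_noise**2
S1, S2 = B_obs.sum(), (B_obs**2).sum()
sum_sq = S2 - 2.0 * MU * S1 + n_pix * MU**2                # sum_i (B_obs_i - mu)^2
log_post = -0.5 * sum_sq / var_tot - 0.5 * n_pix * np.log(var_tot)   # flat hyperpriors

i, j = np.unravel_index(np.argmax(log_post), log_post.shape)
print(mu_grid[i], sB_grid[j])                              # should recover roughly 0 and 25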

Hierarchical priors – Distribution of any quantity

Bayesian image deconvolution

PSF blurring using a linear expansion The image is written as a linear expansion in a basis in which it is sparse. The maximum-likelihood solution corresponds to the usual approaches (phase diversity, MOMFBD, …).
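
For reference, a standard maximum-likelihood deconvolution iteration (Richardson-Lucy, appropriate for Poisson noise), shown only to illustrate what an ML image-deconvolution step looks like; it is not claimed to be the method used in the talk:

import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(blurred, psf, n_iter=30):
    # Iterative maximum-likelihood deconvolution of a blurred image given the PSF
    psf = psf / psf.sum()
    psf_flipped = psf[::-1, ::-1]
    estimate = np.full_like(blurred, blurred.mean())
    for _ in range(n_iter):
        conv = fftconvolve(estimate, psf, mode="same")
        ratio = blurred / np.maximum(conv, 1e-12)
        estimate *= fftconvolve(ratio, psf_flipped, mode="same")
    return estimate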

Inference in a Bayesian framework The solution is given as a probability distribution over the model parameters. Error bars or confidence regions, including correlations, degeneracies, etc., can be easily obtained. Assumptions are made explicit in the prior distributions. Model comparison and model averaging are easily accomplished. Hierarchical models are powerful for extracting information from data.

Hinode data [Maps: continuum intensity and total polarization] Asensio Ramos (2009), observations of Lites et al. (2008)

How much information? – Kullback-Leibler divergence The Kullback-Leibler divergence measures the "distance" between the posterior and prior distributions. Field strength: 37% larger than 1. Field inclination: 34% larger than 1.
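
A brief sketch of how such a divergence can be computed for a discretized posterior and prior (the grid and distributions below are illustrative):

import numpy as np

def kl_divergence(posterior, prior, grid):
    # D_KL(posterior || prior) = integral of posterior * ln(posterior / prior)
    posterior = posterior / np.trapz(posterior, grid)
    prior = prior / np.trapz(prior, grid)
    integrand = np.zeros_like(posterior)
    mask = posterior > 0
    integrand[mask] = posterior[mask] * np.log(posterior[mask] / prior[mask])
    return np.trapz(integrand, grid)

# Illustrative numbers: a narrow Gaussian posterior against a flat prior on [0, 3000] G
B = np.linspace(0.0, 3000.0, 3001)
posterior = np.exp(-0.5 * ((B - 800.0) / 50.0) ** 2)
print(kl_divergence(posterior, np.ones_like(B), B))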

Posteriors [Panels: field strength, field inclination, field azimuth, stray-light]

Field inclination – Obvious conclusion Linear polarization is fundamental to obtaining reliable inclinations.

Field inclination – Quasi-isotropic [Curves: isotropic field distribution vs. our prior]

Field inclination – Quasi-isotropic

Representation Show the marginal distribution of each parameter, or sample N values from the posterior: all the sampled values are compatible with the observations.
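
A small sketch of this way of presenting results (the per-pixel posterior samples are mocked up here; in practice they come from the sampler):

import numpy as np

rng = np.random.default_rng(4)
ny, nx, n_samples = 64, 64, 500
# Mock per-pixel posterior samples of one parameter, shape (ny, nx, n_samples);
# in a real application they come from one sampling run per pixel.
posterior_samples = rng.normal(800.0, 50.0, size=(ny, nx, n_samples))

# Marginal summary per pixel
median_map = np.median(posterior_samples, axis=-1)

# Draw a few complete maps from the posterior; each one is compatible with the observations
n_maps = 4
maps = np.empty((n_maps, ny, nx))
for k in range(n_maps):
    idx = rng.integers(0, n_samples, size=(ny, nx))
    maps[k] = np.take_along_axis(posterior_samples, idx[..., None], axis=-1)[..., 0]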

Field strength – Representation All maps compatible with observations!!!

Field inclination All maps compatible with observations!!!

In a galaxy far far away… (the future) [Diagram: raw data from instruments with systematics, together with the model and the priors, lead through the posterior and marginalization over non-important parameters directly to the inference.]

Conclusions Inversion is not an easy task and has to be treated as a probabilistic inference problem. Bayesian theory gives us the tools for that inference. We should expand our view of inversion to a model comparison/averaging problem (no model is the absolute truth!).

Thank you and be Bayesian, my friend!