#21 Marginalize vs. Condition Uninteresting Fitted Parameters

Presentation transcript:

#21 Marginalize vs. Condition Uninteresting Fitted Parameters
Opinionated Lessons in Statistics by Bill Press
Professor William H. Press, Department of Computer Science, The University of Texas at Austin

We can marginalize or condition uninteresting parameters. (Different things!)

Write the fitted parameters as $(\mathbf{b}_1, \mathbf{b}_2)$, with $\mathbf{b}_1$ the interesting ones and $\mathbf{b}_2$ the uninteresting ones, and partition their covariance matrix accordingly:

$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$

Marginalize (this is the usual case): ignore, i.e. integrate over, the uninteresting parameters. In $\Sigma$, the submatrix of interesting rows and columns is the new $\Sigma$:

$$\Sigma_{\text{marg}} = \Sigma_{11}$$

Special case of one variable at a time: just take the diagonal components of $\Sigma$. Covariances are pairwise expectations and don't depend on whether other parameters are "interesting" or not.

Condition (this is rare!): fix the uninteresting parameters at specified values. In $\Sigma^{-1}$, the submatrix of interesting rows and columns is the new $\Sigma^{-1}$; take the matrix inverse if you want their covariance:

$$\Sigma_{\text{cond}} = \bigl[(\Sigma^{-1})_{11}\bigr]^{-1} = \Sigma_{11} - \Sigma_{12}\,\Sigma_{22}^{-1}\,\Sigma_{21}$$

(If you fix the uninteresting parameters at any value other than their best-fit values $\mathbf{b}_0$, the mean also shifts; an exercise for the reader to calculate, or see Wikipedia, "Multivariate normal distribution".)
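A minimal numerical sketch of the two operations (Python/NumPy, not from the slides; the 5x5 covariance matrix and the choice of "interesting" indices are made up for illustration):

import numpy as np

# Made-up symmetric positive-definite 5x5 covariance matrix of fitted parameters b1..b5
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
sigma = A @ A.T + 5.0 * np.eye(5)

interesting = [2, 4]        # keep b3 and b5 (0-based indices)
other = [0, 1, 3]           # the uninteresting parameters

# Marginalize: take the submatrix of interesting rows and columns of sigma.
sig_marg = sigma[np.ix_(interesting, interesting)]

# Condition: take that same submatrix of the INVERSE of sigma, then invert it back.
sig_cond = np.linalg.inv(np.linalg.inv(sigma)[np.ix_(interesting, interesting)])

# Equivalent closed form: Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21
s12 = sigma[np.ix_(interesting, other)]
s22 = sigma[np.ix_(other, other)]
sig_cond_alt = sig_marg - s12 @ np.linalg.inv(s22) @ s12.T

print(np.allclose(sig_cond, sig_cond_alt))      # True
print(np.diag(sig_cond) <= np.diag(sig_marg))   # conditioned variances are never larger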

Example of 2 dimensions marginalizing or conditioning to 1 dimension: (the slide's figure, a 2-dimensional error ellipse reduced to 1 dimension by marginalizing versus by conditioning, is not reproduced in this transcript).

By the way, don't confuse the "covariance matrix of the fitted parameters" with the "covariance matrix of the data". For example, the data covariance is often diagonal (uncorrelated $\sigma_i$'s), while the parameters' covariance is essentially never diagonal!

If the data have correlated errors, then the starting point for $\chi^2(\mathbf{b})$ is (recall)

$$\chi^2(\mathbf{b}) = \bigl[\mathbf{y} - \mathbf{y}_{\text{model}}(\mathbf{b})\bigr]^{\mathsf T}\,\Sigma_{\text{data}}^{-1}\,\bigl[\mathbf{y} - \mathbf{y}_{\text{model}}(\mathbf{b})\bigr]$$

instead of

$$\chi^2(\mathbf{b}) = \sum_i \frac{\bigl[y_i - y(x_i \mid \mathbf{b})\bigr]^2}{\sigma_i^2}$$
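A short sketch of the two chi-square computations (Python/NumPy, not from the slides; the data values and model values below are made up, and the correlated case solves against a Cholesky factor rather than forming the inverse explicitly):

import numpy as np

def chi2_uncorrelated(y, ymodel, sig):
    # chi^2 = sum_i (y_i - ymodel_i)^2 / sigma_i^2
    return np.sum(((y - ymodel) / sig) ** 2)

def chi2_correlated(y, ymodel, sigma_data):
    # chi^2 = r^T Sigma_data^{-1} r, computed via a Cholesky solve for stability
    r = y - ymodel
    L = np.linalg.cholesky(sigma_data)
    z = np.linalg.solve(L, r)
    return z @ z

# If the data covariance happens to be diagonal, the two agree:
sig = np.array([0.5, 0.7, 1.0])
y = np.array([1.0, 2.1, 2.9])
ymodel = np.array([1.1, 2.0, 3.0])
print(chi2_uncorrelated(y, ymodel, sig),
      chi2_correlated(y, ymodel, np.diag(sig ** 2)))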

For our example, we are conditioning or marginalizing from 5 to 2 dimensions. The model (recall) is

$$y(x \mid \mathbf{b}) = b_1\, e^{-b_2 x} + b_3 \exp\!\left[-\frac{1}{2}\left(\frac{x - b_4}{b_5}\right)^{2}\right]$$

and the uncertainties on $b_3$ and $b_5$ jointly (as error ellipses) are

sigcond =
    0.0044   -0.0076
   -0.0076    0.0357

sigmarg =
    0.0049   -0.0094
   -0.0094    0.0948

Conditioned errors are always smaller, but they are useful only if you can find other ways to measure (accurately) the parameters that you want to condition on.
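A sketch of how those two error ellipses could be drawn from the 2x2 covariance matrices (Python/Matplotlib, not from the slides; the numbers are the slide's, the plotting choices and the Δχ² = 1 contour are mine):

import numpy as np
import matplotlib.pyplot as plt

sigcond = np.array([[0.0044, -0.0076], [-0.0076, 0.0357]])
sigmarg = np.array([[0.0049, -0.0094], [-0.0094, 0.0948]])

def ellipse(sigma, n=200, delta_chi2=1.0):
    # Points on the contour x^T Sigma^{-1} x = delta_chi2, mapped from a circle
    # of radius sqrt(delta_chi2) by the Cholesky factor of Sigma.
    t = np.linspace(0.0, 2.0 * np.pi, n)
    circle = np.sqrt(delta_chi2) * np.vstack([np.cos(t), np.sin(t)])
    return np.linalg.cholesky(sigma) @ circle

for sig, label in [(sigcond, "conditioned"), (sigmarg, "marginalized")]:
    xy = ellipse(sig)
    plt.plot(xy[0], xy[1], label=label)

plt.xlabel("b3 offset")
plt.ylabel("b5 offset")
plt.legend()
plt.axis("equal")
plt.show()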

Frequentists love MLE estimates (and not just in the case of a Normal error model) because they have provably nice properties asymptotically, as the size of the data set becomes large:

Consistency: converges to the true value of the parameters.
Equivariance: the estimate of a function of a parameter equals the function of the estimate of the parameter.
Asymptotically Normal.
Asymptotically efficient (optimal): among estimators with the above properties, it has the smallest variance.

The "Fisher Information Matrix" is another name for the Hessian of the negative log likelihood,

$$I_{ij}(\mathbf{b}) = -\,E\!\left[\frac{\partial^{2} \ln \mathcal{L}(\mathbf{b})}{\partial b_i\,\partial b_j}\right],$$

except that, strictly speaking, it is an expectation over the population.

Bayesians tolerate MLE estimates because they are almost Bayesian (even better if you put the prior back into the minimization). But Bayesians know that we live in a non-asymptotic world: none of the above properties is exactly true for finite data sets!
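As a concrete illustration (my own toy example, not from the lecture): for n i.i.d. samples from Normal(mu, sigma^2) with known sigma, the Fisher information for mu is n/sigma^2, and a numerical Hessian of the negative log likelihood at the MLE reproduces it:

import numpy as np

rng = np.random.default_rng(1)
sigma_true = 2.0
x = rng.normal(3.0, sigma_true, size=1000)

def neg_log_like(mu):
    # -ln L(mu) for Normal(mu, sigma_true^2), dropping mu-independent constants
    return np.sum((x - mu) ** 2) / (2.0 * sigma_true ** 2)

mu_hat = x.mean()          # the MLE of mu

# Second derivative of -ln L at the MLE, by central differences
h = 1e-4
hessian = (neg_log_like(mu_hat + h) - 2.0 * neg_log_like(mu_hat)
           + neg_log_like(mu_hat - h)) / h ** 2

print(hessian, len(x) / sigma_true ** 2)   # both ~ n / sigma^2
print(1.0 / np.sqrt(hessian))              # ~ standard error of mu_hat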

Small digression: you can give confidence intervals or regions instead of (co-)variances.

The variances of one parameter at a time imply confidence intervals as for an ordinary 1-dimensional normal distribution: for example, $b_{0,i} \pm 1.96\,\sigma_i$ contains about 95% of the probability. (Remember to take the square root of the variances to get the standard deviations!)

If you want to give confidence regions for more than one parameter at a time, you have to decide on a shape, since any shape containing 95% (or whatever) of the probability is a 95% confidence region! It is conventional to use contours of probability density as the shapes (= contours of $\Delta\chi^2$), since these are maximally compact.

But which $\Delta\chi^2$ contour contains 95% of the probability?
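A minimal sketch of turning a single fitted variance into one-parameter confidence intervals (Python/SciPy, not from the slides; the best-fit value and variance below are made-up numbers):

import numpy as np
from scipy.stats import norm

b0_i = 1.234            # best-fit value of one parameter (made-up number)
var_i = 0.0049          # its variance from the fitted covariance matrix
sd_i = np.sqrt(var_i)   # remember: square root of the variance!

for conf in (0.683, 0.90, 0.95):
    z = norm.ppf(0.5 + conf / 2.0)   # two-sided Normal quantile, e.g. ~1.96 for 95%
    print(f"{conf:.1%}: {b0_i:.3f} +/- {z * sd_i:.3f}")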

What $\Delta\chi^2$ contour in $n$ dimensions contains some percentile of the probability?

Rotate and scale the covariance to make it spherical; the contours still contain the same probability. (In equations, this would be another "Cholesky thing".) Now each dimension is an independent Normal, and the contours are labeled by the radius squared, i.e. the sum of $n$ individual $t^2$ values, so

$$\Delta\chi^2 \sim \text{Chisquare}(n)$$

You sometimes learn "facts" like: "a delta chi-square of 1 is the 68% confidence level". We now see that this is true only for one parameter at a time.
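The $\Delta\chi^2$ levels themselves come straight from the chi-square quantile function; a quick table in Python/SciPy (my own illustration, not from the slides):

from scipy.stats import chi2

# Delta chi^2 that encloses probability p when n parameters are varied jointly
for p in (0.683, 0.90, 0.95, 0.99):
    levels = [chi2.ppf(p, df=n) for n in (1, 2, 3)]
    print(p, ["%.2f" % lev for lev in levels])

# For p = 0.683 and n = 1 the level is ~1.00, so "delta chi-square of 1 is 68%"
# really is a one-parameter-at-a-time statement (for n = 2 the level is ~2.30).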