PSY 626: Bayesian Statistics for Psychological Science


Shrinkage
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science, Fall 2018, Purdue University
11/27/2018

Estimation
Suppose you want to estimate a population mean from a normal distribution. The standard approach is to use the sample mean. Indeed, the sample mean is an unbiased estimate of the population mean, it is the value that minimizes squared error relative to the sample, and it is also the maximum likelihood estimate of the population mean. It is hard to do better than this! [Slide figure: sample observations X1, X2, X3, X4.]

Estimation
Suppose you want to estimate the population mean of a normal distribution with mean μ and standard deviation σ. You want to minimize squared error, and you cannot do better (on average) than the sample mean: for sample size n, its expected squared error is σ²/n.
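For reference, the two facts the slide points to can be written out (assuming n independent draws X1, ..., Xn, each with variance σ²):

\[
\bar{X} \;=\; \arg\min_{c} \sum_{i=1}^{n} (X_i - c)^2,
\qquad
E\bigl[(\bar{X} - \mu)^2\bigr] \;=\; \operatorname{Var}(\bar{X}) \;=\; \frac{\sigma^2}{n}.
\]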

Estimation
Now suppose you want to estimate p = 5 population means, judging quality by the squared estimation error summed across the five means (the MSE reported below). You might try using the sample means. If the populations all have the same standard deviation σ but different means μi, the sample means give an expected MSE of pσ²/n.

Sample means
Let's try some simulations: n = 20 for each sample, σ = 1 (assumed to be known to us), and each μi selected from a normal distribution N(0, 1). Across simulations, the average MSE of the sample means is 0.2456. You might think this is as good as you can do, but you can do rather better.
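A minimal sketch of this kind of simulation in Python/NumPy (my own code, not the course's; the exact averages vary with the random seed):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, sigma = 5, 20, 1.0          # settings from the slide
n_sims = 10_000

mse_sample_means = np.empty(n_sims)
for s in range(n_sims):
    mu = rng.normal(0.0, 1.0, size=p)               # population means drawn from N(0, 1)
    data = rng.normal(mu, sigma, size=(n, p))       # n observations from each population
    xbar = data.mean(axis=0)                        # the p sample means
    mse_sample_means[s] = np.sum((xbar - mu) ** 2)  # squared error summed over the p means

print(np.mean(mse_sample_means))  # expected value is p * sigma**2 / n = 0.25
```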

Shrinkage
Consider a different (James-Stein) estimate of the population means. For each population mean, this estimator produces an estimate that is closer to 0 than the corresponding sample mean (sometimes just a bit closer).
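The formula itself did not survive the transcript; the standard James-Stein form it refers to, shrinking the sample means toward 0 and taking the variance of each sample mean to be σ²/n, is

\[
\hat{\mu}_i^{\mathrm{JS}}
  \;=\; \left(1 - \frac{(p-2)\,\sigma^2/n}{\sum_{j=1}^{p} \bar{X}_j^{\,2}}\right)\bar{X}_i .
\]

The factor in parentheses is (when positive) less than 1, which is what pulls each estimate toward 0.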

Shrinkage
Shrinking each sample mean toward 0 decreases (on average) the MSE: average MSE(sample means) = 0.2456, average MSE(James-Stein) = 0.2389. In fact, the James-Stein estimator has a smaller MSE than the sample means in 62% of the simulated cases. This is true despite the fact that the sample mean is a better estimate of each individual mean considered on its own!

Shrinkage
You might think the James-Stein estimator works only because every estimate is being shrunk toward the mean of the population means [the μi were selected from a normal distribution N(0, 1), centered at 0]. There is an extra advantage when you have such information, but shrinkage helps (on average) when you shrink toward any value v (see the sketch below). When v is far from the sample means, the denominator of the shrinkage term tends to be large, so the multiplying factor is close to 1 and the estimates barely move.
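A sketch of such an estimator with an arbitrary shrinkage target v, reusing the simulation settings above (the function name, and keeping (p − 2) with a fixed target, are my assumptions about the version used in the slides):

```python
import numpy as np

def james_stein(xbar, v, sigma2_over_n):
    """Shrink the vector of sample means xbar toward the fixed target v."""
    p = len(xbar)
    shrink = 1.0 - (p - 2) * sigma2_over_n / np.sum((xbar - v) ** 2)
    return v + shrink * (xbar - v)

rng = np.random.default_rng(0)
p, n, sigma = 5, 20, 1.0
mse_sm, mse_js, wins = [], [], 0
for _ in range(10_000):
    mu = rng.normal(0.0, 1.0, size=p)
    xbar = rng.normal(mu, sigma, size=(n, p)).mean(axis=0)
    est = james_stein(xbar, v=0.0, sigma2_over_n=sigma**2 / n)  # try v = 3 or v = 30 as well
    mse_sm.append(np.sum((xbar - mu) ** 2))
    mse_js.append(np.sum((est - mu) ** 2))
    wins += mse_js[-1] < mse_sm[-1]

print(np.mean(mse_sm), np.mean(mse_js), wins / 10_000)
```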

Shrinkage
Set v = 3. Not much change, but still a small decrease in MSE: the average MSE for v = 3 is 0.2451 (compared to 0.2456 for the sample means), and these estimates beat the sample means in 54% of cases.

Shrinkage
Set v = 30. Now there is barely any change: the average MSE for v = 30 is 0.2456, matching the sample means to four decimal places, and these estimates beat the sample means in only 50.2% of cases.

Shrinkage
There is a "best" value of v that will minimize MSE, but you do not know it. Even a bad choice tends to help (on average), albeit not as much as is possible, and it does not hurt (on average). There is a trade-off: you get a smaller MSE on average, but you increase MSE for some cases and decrease it for others, and you never know which case you are in. These are biased estimates of the population means.

Shrinkage
These effects have odd philosophical implications. If you want to estimate a single mean (e.g., the IQ of psychology majors, mathematics majors, English majors, physics majors, or economics majors), you are best off using the sample mean. If you want to jointly estimate the mean IQs of all of these majors, you are best off using the James-Stein estimator. Of course, you get different answers for the different questions! Which one is right? There is no definitive answer. What is truth?

Shrinkage
It is actually even weirder! If you want to jointly estimate family income, the mean number of windows in apartments in West Lafayette, the mean height of chairs in a classroom, the mean number of words in economics textbook sentences, and the speed of light, you are better off using a James-Stein estimator than the sample means, even though the measures are unrelated to each other! (You may not get much benefit, because the scales are so very different.) Note also that you get different estimates if you measure in different units.

Using wisely
You get more benefit the more means are being estimated together: the more complicated your measurement situation, the better you do! Compare this to hypothesis testing, where controlling the Type I error rate becomes harder as the number of comparisons grows.

Data driven
There is a "best" value of v that will minimize MSE, but you do not know it. Oftentimes a good guess is the average of the sample means; shrinking toward that average is called the Morris-Efron estimator (a form is sketched below). This version sometimes uses (p − 3) in place of (p − 2), reflecting that the shrinkage target is itself estimated from the data.
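The slide's formula is not in the transcript; a standard form of this estimator, shrinking toward the grand mean of the sample means, is

\[
\hat{\mu}_i^{\mathrm{ME}}
  \;=\; \bar{\bar{X}} \;+\; \left(1 - \frac{(p-3)\,\sigma^2/n}{\sum_{j=1}^{p}\bigl(\bar{X}_j - \bar{\bar{X}}\bigr)^2}\right)\bigl(\bar{X}_i - \bar{\bar{X}}\bigr),
  \qquad
  \bar{\bar{X}} \;=\; \frac{1}{p}\sum_{j=1}^{p}\bar{X}_j .
\]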

Morris-Efron estimators
Not quite as good as James-Stein when the population means come from N(0, 1); with n = 20 and p = 5: MSE(sample means) = 0.2456; MSE(James-Stein) = 0.2389, beating the sample means in 62% of cases; MSE(Morris-Efron) = 0.2407, beating the sample means in 59% of cases. The comparison shifts when the population means come from N(7, 1), where shrinking toward 0 is of little help: there, MSE(James-Stein) = 0.2454, beating the sample means in only 51% of cases.

Why does shrinkage work?
MSE can be split into bias and variance. Consider a single mean μi and its estimator; add and subtract the expected value of the estimator inside the squared error, then expand the square.

Bias and variance
Apply the expectation to each term. The last term is unaffected by the expectation (it is a constant). In the cross term, the second difference is a constant and can be pulled out of the expectation, as can the factor of 2.

Bias and variance
Applying the expectation to the cross term, the double expectation is just the expectation, so the cross term equals 0. The term on the left is the variance of the estimator; the term on the right is the square of the bias. (The full decomposition is written out below.)
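The slide's equations are not in the transcript; the decomposition described above, for an estimator μ̂i of μi, is

\[
\begin{aligned}
E\bigl[(\hat{\mu}_i - \mu_i)^2\bigr]
  &= E\Bigl[\bigl(\hat{\mu}_i - E[\hat{\mu}_i]\bigr)^2\Bigr]
   + 2\,\bigl(E[\hat{\mu}_i] - \mu_i\bigr)\,E\bigl[\hat{\mu}_i - E[\hat{\mu}_i]\bigr]
   + \bigl(E[\hat{\mu}_i] - \mu_i\bigr)^2 \\
  &= \operatorname{Var}(\hat{\mu}_i) + \bigl(E[\hat{\mu}_i] - \mu_i\bigr)^2,
\end{aligned}
\]

since the cross term vanishes: E[μ̂i − E[μ̂i]] = 0.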

MSE and shrinkage
Sample means are unbiased estimators of the population means, so the bias term equals 0: their MSE is all variance. The James-Stein and Morris-Efron estimators are biased, but they have a smaller variance than the sample means (because they are shrunk toward a fixed value). This trade-off works in their favor only for p > 2.

Prediction
Shrinkage works for prediction too! Suppose you have 5 population means drawn from N(0, 1) and you take samples of n = 20 from each population. You want to predict the sample means, Y, of replication samples, and your criterion for the quality of prediction is again MSE (see the sketch below).
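A sketch of that replication-prediction simulation (again my own code, reusing the james_stein() helper defined in the earlier sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, sigma = 5, 20, 1.0
mse_sm, mse_js = [], []
for _ in range(1_000):
    mu = rng.normal(0.0, 1.0, size=p)
    xbar = rng.normal(mu, sigma, size=(n, p)).mean(axis=0)   # original sample means
    ybar = rng.normal(mu, sigma, size=(n, p)).mean(axis=0)   # replication sample means
    js = james_stein(xbar, v=0.0, sigma2_over_n=sigma**2 / n)
    mse_sm.append(np.sum((xbar - ybar) ** 2))                # predict Y with the sample means
    mse_js.append(np.sum((js - ybar) ** 2))                  # predict Y with shrunken estimates

print(np.mean(mse_sm), np.mean(mse_js))
```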

Prediction
Averaged across 1000 simulations with n = 20 and p = 5: MSErep(sample means) = 0.5028; MSErep(James-Stein) = 0.4968, beating the sample means in 57% of cases; MSErep(Morris-Efron) = 0.4988, beating the sample means in 54% of cases. The MSE values are larger than in the previous simulations because there is variability in both the initial sample and the replication sample.

Conclusions
Shrinkage demonstrates that there is no universally appropriate estimator or predictor: how to estimate or predict depends on what you want to estimate or predict. Shrinkage can be used in frequentist approaches to statistics (the lasso, regularization, ridge regression, hierarchical models, and random-effects meta-analysis), and it falls out naturally from Bayesian approaches to statistics: hierarchical models produce shrinkage, and the choice of priors does not matter very much!