1 When are BLUPs Bad Ed Stanek UMass- Amherst Julio Singer USP- Brazil George ReedUMass- Worc. Wenjun LiUMass- Worc.

Slides:



Advertisements
Similar presentations
Copula Regression By Rahul A. Parsa Drake University &
Advertisements

Bayesian dynamic modeling of latent trait distributions Duke University Machine Learning Group Presented by Kai Ni Jan. 25, 2007 Paper by David B. Dunson,
Chapter 6 Sampling and Sampling Distributions
CHAPTER 21 Inferential Statistical Analysis. Understanding probability The idea of probability is central to inferential statistics. It means the chance.
CmpE 104 SOFTWARE STATISTICAL TOOLS & METHODS MEASURING & ESTIMATING SOFTWARE SIZE AND RESOURCE & SCHEDULE ESTIMATING.
Estimation  Samples are collected to estimate characteristics of the population of particular interest. Parameter – numerical characteristic of the population.
Nonparametric, Model-Assisted Estimation for a Two-Stage Sampling Design Mark Delorey, F. Jay Breidt, Colorado State University Abstract In aquatic resources,
1 Bernoulli and Binomial Distributions. 2 Bernoulli Random Variables Setting: –finite population –each subject has a categorical response with one of.
Sampling Distributions. Review Random phenomenon Individual outcomes unpredictable Sample space all possible outcomes Probability of an outcome long-run.
Resampling techniques Why resampling? Jacknife Cross-validation Bootstrap Examples of application of bootstrap.
Chapter 10 Simple Regression.
CHAPTER 3 ECONOMETRICS x x x x x Chapter 2: Estimating the parameters of a linear regression model. Y i = b 1 + b 2 X i + e i Using OLS Chapter 3: Testing.
1 What are Mixed Models? Why are they Used? An Introduction Ed Stanek.
Chapter 7 Sampling and Sampling Distributions
SPH&HS, UMASS Amherst 1 Sampling, WLS, and Mixed Models Festschrift to Honor Professor Gary Koch Ed Stanek and Julio Singer U of Mass, Amherst, and U of.
Fall 2006 – Fundamentals of Business Statistics 1 Chapter 6 Introduction to Sampling Distributions.
1 Finite Population Inference for the Mean from a Bayesian Perspective Edward J. Stanek III Department of Public Health University of Massachusetts Amherst,
Estimation of parameters. Maximum likelihood What has happened was most likely.
Descriptive statistics Experiment  Data  Sample Statistics Experiment  Data  Sample Statistics Sample mean Sample mean Sample variance Sample variance.
Why sample? Diversity in populations Practicality and cost.
Statistical Inference and Sampling Introduction to Business Statistics, 5e Kvanli/Guynes/Pavur (c)2000 South-Western College Publishing.
1 Finite Population Inference for Latent Values Measured with Error that Partially Account for Identifable Subjects from a Bayesian Perspective Edward.
Today’s Agenda Review Homework #1 [not posted]
1 Finite Population Inference for Latent Values Measured with Error from a Bayesian Perspective Edward J. Stanek III Department of Public Health University.
Mixed models Various types of models and their relation
1 Sampling Models for the Population Mean Ed Stanek UMASS Amherst.
Nonparametric, Model-Assisted Estimation for a Two-Stage Sampling Design Mark Delorey Joint work with F. Jay Breidt and Jean Opsomer September 8, 2005.
1 Introduction to Biostatistics (PUBHLTH 540) Estimating Parameters Which estimator is best? Study possible samples, determine Expected values, bias, variance,
Before doing comparative research with SEM … Prof. Jarosław Górniak Institute of Sociology Jagiellonian University Krakow.
Part III: Inference Topic 6 Sampling and Sampling Distributions
Statistical Methods Descriptive Statistics Inferential Statistics Collecting and describing data. Making decisions based on sample data.
Violations of Assumptions In Least Squares Regression.
Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson6-1 Lesson 6: Sampling Methods and the Central Limit Theorem.
Formalizing the Concepts: Simple Random Sampling.
How confident are we that our sample means make sense? Confidence intervals.
CHAPTER 7, the logic of sampling
Chapter 6 Sampling and Sampling Distributions
Sampling error Error that occurs in data due to the errors inherent in sampling from a population –Population: the group of interest (e.g., all students.
Sampling January 9, Cardinal Rule of Sampling Never sample on the dependent variable! –Example: if you are interested in studying factors that lead.
Sampling. Concerns 1)Representativeness of the Sample: Does the sample accurately portray the population from which it is drawn 2)Time and Change: Was.
CPE 619 Simple Linear Regression Models Aleksandar Milenković The LaCASA Laboratory Electrical and Computer Engineering Department The University of Alabama.
The paired sample experiment The paired t test. Frequently one is interested in comparing the effects of two treatments (drugs, etc…) on a response variable.
McGraw-Hill/IrwinCopyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved. Chapter 7 Sampling Distributions.
McGraw-Hill/Irwin Copyright © 2007 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 6 Sampling Distributions.
Random Sampling, Point Estimation and Maximum Likelihood.
Continuous Probability Distributions Continuous random variable –Values from interval of numbers –Absence of gaps Continuous probability distribution –Distribution.
1 Statistical Distribution Fitting Dr. Jason Merrick.
Chapter 11 – 1 Chapter 7: Sampling and Sampling Distributions Aims of Sampling Basic Principles of Probability Types of Random Samples Sampling Distributions.
Ka-fu Wong © 2003 Chap 8- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.
Copyright © 2010 by The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill/Irwin Chapter 7 Sampling Distributions.
Stats Probability Theory Summary. The sample Space, S The sample space, S, for a random phenomena is the set of all possible outcomes.
Ka-fu Wong © 2003 Chap 6- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data.
Simulation Study for Longitudinal Data with Nonignorable Missing Data Rong Liu, PhD Candidate Dr. Ramakrishnan, Advisor Department of Biostatistics Virginia.
CLASSICAL NORMAL LINEAR REGRESSION MODEL (CNLRM )
Introduction to Multilevel Analysis Presented by Vijay Pillai.
Topics Semester I Descriptive statistics Time series Semester II Sampling Statistical Inference: Estimation, Hypothesis testing Relationships, casual models.
The inference and accuracy We learned how to estimate the probability that the percentage of some subjects in the sample would be in a given interval by.
Chapter 6 Sampling and Sampling Distributions
Modeling and Simulation CS 313
Sampling Why use sampling? Terms and definitions
Modeling and Simulation CS 313
Goals of Statistics.
Chapter Six Normal Curves and Sampling Probability Distributions
Chapter 7: Sampling Distributions
Statistics and Art: Sampling, Response Error, Mixed Models, Missing Data, and Inference Ed Stanek And others: Recai Yucel, Julio Singer, and others on.
Ed Stanek and Julio Singer
Predictive distributions
OVERVIEW OF LINEAR MODELS
Types of Control I. Measurement Control II. Statistical Control
OVERVIEW OF LINEAR MODELS
Presentation transcript:

1 When are BLUPs Bad Ed Stanek UMass- Amherst Julio Singer USP- Brazil George ReedUMass- Worc. Wenjun LiUMass- Worc.

2 Outline Description of the Problem –Examples Basic Models & Ideas Approaches Results Conclusions/Further Research

3 Problem: Estimate/Predict Cluster Latent Value Example: 11 Middle Schools 8-16 Classrooms/School Question: How much bullying occurs at school? Example: 166 School Districts 6-18 Schools/District Question: Does the District have a sun-protection policy for students? Example: 25,000 Members in an HMO 30 Days/Member before Cholesterol measure Question:What is a Patient’s Ave Sat fat intake (g)?

4 Description of the Problem Clustered Finite Population (cluster=group of subjects) Schools, Clinics, Hospitals, Cities, Neighborhoods, Physician Practices, Families, Litters, States, Studies… Data is available only on some clusters Observational studies, multi-stage samples, group- randomized trials For selected clusters, response is observed on a subset of subjects (possibly with response error). How to Best Estimate/Predict the Cluster’s Latent Value?

5 Basic Models & Ideas Simple Response Error Model for a Subject: School: Student: where

6 Combining terms, the Response Error Model: becomes where ith school effect jth student effect kth response error

7 Model Properties Over Schools Over Students The Latent Value of the School in the ith position:

8 Approaches to Estimate/Predict Latent Value 1. Fixed Models 2. Mixed Models 3. Super-population Models 4. Random Permutation Models

9 1. Fixed Effect Models: (Random Sample of Students) Latent Cluster Mean: (i.e. School “s”) OLS Estimate:

10 2. Mixed Models To predict Solution: BLUP= where

11 3. Super-population Models Super-populationFinite Population (Realization)Latent Value

12 Breaking Up the Latent Value Parameter where

13 Superpopulation Model Prediction Approach (Scott and Smith, 1969) Sample: Remainder: BLUP:

14 4. Random Permutation Model Predictors Finite Population: 2-Stage Permutation: To Predict : BLUP where

15 Approaches 1. Fixed: 2. Mixed: 3. Superpop. 4. RP Model

16 If variance components are known, then the RP Model Predictor is “best”- (i.e. has small expected MSE) How much better depends on: Sampling fraction: Cluster-intra-class correlation: Unit- Reliability:

17 Figure 1a. Percent Increase in Expected MSE for Mixed Model (──) and Scott and Smith Model (- - - ) Predictors Relative to the Random Permutation Model Predictors of the Latent Value of a Realized Sample PSU

18 Figure 1b. Percent Increase in Expected MSE for Mixed Model (──) and Scott and Smith Model (- - - ) Predictors Relative to the Random Permutation Model Predictors of the Latent Value of a Realized Sample PSU

19 Figure 1c. Percent Increase in Expected MSE for Mixed Model (──) and Scott and Smith Model (- - - ) Predictors Relative to the Random Permutation Model Predictors of the Latent Value of a Realized Sample PSU

20 What happens when variances must be estimated? Are BLUPs still best? Not always.

21 Simulation Study: Two-stage finite population sampling Settings: N, M, n, m Variance Components: Distribution of Clusters: normal, uniform, beta, gamma Unit Variance Prop Cluster Mean: yes/no

22 Simulation Results (No response Error) N=3 N=5 N=10 N=25 N=50 N=100 N=500 N=1000

23 Simulation Results (with Response Error) N=100, n=10, M=25, m=10 ClustersUnits Normal Gamma(2)Normal

24 Summary and Conclusions eBLUPs can have higher MSE than the Cluster Ave Small #s of sample clusters (n<5) RP predictors (while theoretically best), may be worse than MM empirical predictors, or Super- population predictors. Properties don’t seem to depend on distributions. Simulations are limited- fall short of providing guidelines

25 Thanks