Statistics and Art: Sampling, Response Error, Mixed Models, Missing Data, and Inference. Ed Stanek, and others: Recai Yucel, Julio Singer, and others on the Cluster Team.

Presentation transcript:

Statistics and Art: Sampling, Response Error, Mixed Models, Missing Data, and Inference
Ed Stanek
And others: Recai Yucel, Julio Singer, and others on the Cluster Team
11/11/2018

Anne Stanek, Viviana Lencina, Alice Singer, Silvia San Martino, Wenjun Li, Luz Mery Gonzalas, Julio Singer, Ed Stanek, Maria Lucia Singer

Outline
1. Example: Dose-response Models in Toxicology - Threshold vs Hormetic Models
2. What is truth? Predict what?
3. Subsets - sampling
4. Prediction
5. Results on Predictor of Realized Subject True Value
6. Illustration and Dilemma
7. Extension to two-stage problems
8. Missing data framework
9. Conclusions
And others: Recai Yucel, Bo Xu, Ruitao Zhang, and others on the Cluster Team

1. Example: Dose-response Models - Threshold vs Hormetic Models
Yeast data: 2189 chemicals, 13 yeast strains, 5 doses x 2 replications. The focus is on doses below the BMD.
These plots are of hypothetical 'true' responses. Response is represented as percent of control; 100% is the response when the dose = 0.
Question: Is there evidence of hormesis?
The point where the true response drops below 100% is the zero effect point. In practice, a 'benchmark dose' is estimated as a dose where the observed response drops below 95%.
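The benchmark-dose rule described above can be sketched in Python. This is only an illustration: the function names, the dose grid, and the response values are hypothetical and are not taken from the yeast data.

```python
def benchmark_dose(doses, responses, threshold=95.0):
    """Return the smallest dose whose observed response is below `threshold`.

    `doses` must be sorted in increasing order; `responses` are percent of
    control (100 = response at dose 0). Returns None if no dose qualifies.
    """
    for dose, resp in zip(doses, responses):
        if resp < threshold:
            return dose
    return None


def doses_below_bmd(doses, responses, threshold=95.0):
    """Return the doses strictly below the benchmark dose (all doses if none)."""
    bmd = benchmark_dose(doses, responses, threshold)
    return [d for d in doses if bmd is None or d < bmd]


# Hypothetical chemical: 5 doses, each response the mean of 2 replications.
doses = [1, 2, 4, 8, 16]
responses = [103.0, 108.0, 101.0, 92.0, 60.0]

bmd = benchmark_dose(doses, responses)
low_doses = doses_below_bmd(doses, responses)
```

With these made-up values, three doses fall below the benchmark dose, which is the situation the later plots focus on.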

A mixed model is fit to response for doses in the hormetic range (i = chemical, j = dose, k = replication).
Only 5 doses are available.
Identify BMD(5), the 5% benchmark dose: the dose above which response is less than (100-5)% = 95%. Also identify the doses below the BMD.
When 3 doses are below the BMD, predict the average response for the below-BMD range.
Results: order the predicted responses for realized chemicals from low to high, under equal response error and under unequal response error.

Plot of predicted response for the strain 'wild type yeast' for 253 chemicals with 3 doses below the 95% benchmark dose, using pooled (equal) response error based on a mixed model. The black line is the expected distribution of mean response if the threshold model held.

Similar plot with unequal response error. This was constructed by fitting mixed models to each chemical and estimating the response variance.
Which results should be used? Does it depend on whether the model has heterogeneous response error?
No: theoretically, a derivation with heterogeneous response error pools the response error variances. However, in a simple example, we can show that better results occur if response error is separated.
The theory doesn't match: we don't understand the theory for the 'better results'.
Next steps: review what we do understand. Keep the context simple.

2. What is truth? Predict what?
Population, subjects, true response.
Need to define parameters to represent the problem: subject labels; true response; population parameters (mean, variance); subject deviation.
Here, subjects == chemicals, and true response == average response in the hormetic range.
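The formulas for these parameters are not preserved in the transcript. A conventional finite-population formulation is sketched below as an assumption, not as a reconstruction of the slide. The symbols y_s, mu, sigma^2 and beta_s, and the N-1 divisor in the variance, are illustrative choices.

```latex
% Subject labels and true responses (assumed notation)
s = 1, \dots, N, \qquad y_s = \text{true response of subject } s

% Population parameters
\mu = \frac{1}{N} \sum_{s=1}^{N} y_s, \qquad
\sigma^2 = \frac{1}{N-1} \sum_{s=1}^{N} (y_s - \mu)^2

% Subject deviation, so that the deviations sum to zero over the population
\beta_s = y_s - \mu, \qquad \sum_{s=1}^{N} \beta_s = 0
```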

Response Error Model
For each subject, define the non-stochastic model, the index for response, and the response error.
Response process: in the hormetic range, pick a dose at random and measure the response.
Assumptions: response error is unbiased and heteroskedastic.
The response error model is a stochastic model; response error is a random effect.
The sum of subject effects is zero (over the population).
Information: (subject label, response).
Subsequently, take r=1 (one measure per subject).
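The response error model can be illustrated with a small simulation: a measured response is the subject's true value plus a zero-mean error whose variance differs by subject. The true values, the error standard deviations, and the normal error distribution are all assumptions for illustration; the slide only requires unbiased, heteroskedastic error.

```python
import random
import statistics


def measure(y_true, sigma, rng):
    """One measured response: true value plus zero-mean response error."""
    return y_true + rng.gauss(0.0, sigma)


rng = random.Random(0)

# Hypothetical true responses and subject-specific error SDs (heteroskedastic).
y = {1: 98.0, 2: 104.0, 3: 110.0}
sigma = {1: 1.0, 2: 3.0, 3: 6.0}

# Averaging many repeated measures on a subject approaches its true value,
# which is what "unbiased response error" means here.
means = {
    s: statistics.fmean(measure(y[s], sigma[s], rng) for _ in range(20000))
    for s in y
}
```

In the setting of the slides only one measure per subject (r=1) is taken, so this averaging is not available in practice; the simulation just makes the assumption visible.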

3. Subsets, Sampling
Select n of N subjects (a subset, or "sample"). Let all subsets be equally likely.
The sample is a set (unordered) of different subjects.
Sample mean: defined over the subjects in the subset. Note the difference from the usual (common) representation.
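A minimal Python sketch of sampling as equally likely subsets, using a hypothetical population of N=3 true values and n=2. Exact fractions are used so the expected sample mean can be compared with the population mean without rounding.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical true responses for subjects s = 1, 2, 3.
y = {1: Fraction(98), 2: Fraction(104), 3: Fraction(110)}
N, n = len(y), 2
mu = sum(y.values()) / N

# All subsets of n subjects; each is assumed equally likely.
samples = list(combinations(sorted(y), n))

# Sample mean for each subset, then its expectation over subsets.
sample_means = [sum(y[s] for s in subset) / n for subset in samples]
expected_mean = sum(sample_means) / len(samples)
```

Here the subsets are unordered, so (1, 2) appears once and (2, 1) does not appear at all.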

Sample as a Sequence (part of a Permutation)
Represent positions in a permutation, and assume all permutations are equally likely.
Define: sample = the first n positions.
Sample mean: the average of the responses in those positions.
The random variable Y(ik) is not clearly defined.
The sample is now a sequence (order matters)!
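The permutation view can be sketched the same way. The population values are hypothetical; the point is that the value in each position is a random variable whose expectation over equally likely permutations is the population mean, and that the sample is now an ordered sequence.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical true responses for subjects s = 1, 2, 3.
y = {1: Fraction(98), 2: Fraction(104), 3: Fraction(110)}
N, n = len(y), 2
mu = sum(y.values()) / N

# All N! permutations of subject labels, assumed equally likely.
perms = list(permutations(sorted(y)))

# Expected value of the response in each position i = 1..N.
expected_at_position = [
    sum(y[p[i]] for p in perms) / len(perms) for i in range(N)
]

# The sample is the first n positions, so order matters.
ordered_samples = sorted({p[:n] for p in perms})
```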

[Figure: Population of N=3 subjects, labeled s=1 (Julio), s=2 (Ed), s=3 (Wenjun). The sample is the first two subjects on the left.]

[Figure: Population of N=3 subjects, showing subject labels (s) and positions in the permutation (i=1, 2, 3). Note labels and positions.]

[Figure: Different permutation: Ed, Julio, and Wenjun.]

[Figure: Different permutation: Ed, Wenjun, Julio.]

[Figure: Different permutation: Julio, Wenjun, Ed.]

[Figure: Different permutation with sample and remainder: Wenjun, Julio, and Ed.]

[Figure: Permutation with sample and remainder: Wenjun, Ed, and Julio.]

Population size (N) is most likely > 3. We only see "n" subjects in the sample. For example, suppose n=3 and N=7. We may see …

[Figure: Positions i=1, 2, 3 form the sample; positions i=4, … form the remainder. Luzmery, Wenjun, and Viviana are in the sample (labels s=3, s=4, s=5).]

[Figure: Viviana, Ed, and Silvina are in the sample (labels s=2, s=4, s=7).]

Traditional Sampling Approach
Subjects are labeled 1, 2, …, N.
Horvitz-Thompson estimator: uses first-order inclusion probabilities, where the inclusion probability = Prob(subject is included in a sample).
Bold y is a vector of population values. Values for subjects not in the sample are missing data.
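As an illustration of the traditional approach, the sketch below computes the Horvitz-Thompson estimator of the population total under simple random sampling without replacement, where every subject has inclusion probability n/N. The population values and the function name are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def horvitz_thompson_total(sample, y, pi):
    """Estimate the population total: sum of y_s / pi_s over sampled subjects."""
    return sum(y[s] / pi[s] for s in sample)


# Hypothetical true responses for subjects s = 1, 2, 3.
y = {1: Fraction(98), 2: Fraction(104), 3: Fraction(110)}
N, n = len(y), 2

# First-order inclusion probabilities under simple random sampling.
pi = {s: Fraction(n, N) for s in y}

# Enumerate all equally likely samples and average the estimates.
samples = list(combinations(sorted(y), n))
estimates = [horvitz_thompson_total(subset, y, pi) for subset in samples]
expected_total = sum(estimates) / len(samples)
```

Averaging the estimate over all samples recovers the population total exactly, which is the design-unbiasedness property of the estimator.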

With Response Error Model
Sample mean: the sample can be represented as a set or as a sequence.
To represent positions: U(is) is an indicator variable that has a value of 1 if subject s is in position i, and 0 otherwise.

[Figure: Positions i=1, 2, 3 in a permutation, with the sample containing subjects s=1, s=2, s=3.]

First Position in Permutation
Suppose s=1,…,3=N. Then: formal expression of the response for position i=1 in a permutation.
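The expression itself is not preserved in the transcript. One plausible reading, sketched here as an assumption, is that the response in position i is the sum over subjects of the indicator U(is) times the subject's response. The population values and the chosen permutation are hypothetical.

```python
def indicator_matrix(perm, N):
    """U[i][s] = 1 if subject s+1 is in position i+1 of `perm`, else 0."""
    return [[1 if perm[i] == s + 1 else 0 for s in range(N)] for i in range(N)]


# Hypothetical responses y_1, y_2, y_3 and one realized permutation.
y = [98.0, 104.0, 110.0]
perm = (2, 3, 1)  # subject 2 in position 1, subject 3 in position 2, subject 1 last

U = indicator_matrix(perm, 3)

# Assumed form: Y_i = sum over s of U(is) * y_s, so Y[0] is the first position.
Y = [sum(U[i][s] * y[s] for s in range(3)) for i in range(3)]
```

Each row and each column of U contains exactly one 1, so every position holds one subject and every subject occupies one position.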

Positions in Sample Sequences
Sample and remainder representation.

Basic Random Variables: sample, remainder, population.

Finite Population Mixed Model
Combine the response error model with the permutation to get the finite population mixed model.

Mixed Model
Alpha = fixed effects; B = random effects; W* = response error.
Note that the subscript is POSITION, not SUBJECT.
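The model equations are not preserved in the transcript. A sketch of how such a model can arise is given below, under assumed notation: U_{is} is the position indicator, y_s = mu + beta_s is the subject's true response, and W_s is the subject's response error with one measure per subject. In this simple case the fixed effect alpha would reduce to the population mean mu.

```latex
% Response error model for subject s (assumed notation, one measure per subject)
Y_s = \mu + \beta_s + W_s, \qquad E(W_s) = 0

% Response in position i of a random permutation
Y_i^{*} = \sum_{s=1}^{N} U_{is} Y_s
        = \mu \sum_{s=1}^{N} U_{is} + \sum_{s=1}^{N} U_{is} \beta_s + \sum_{s=1}^{N} U_{is} W_s

% Since exactly one subject occupies each position, \sum_s U_{is} = 1, giving
Y_i^{*} = \mu + B_i + W_i^{*}, \qquad
B_i = \sum_{s=1}^{N} U_{is} \beta_s, \qquad
W_i^{*} = \sum_{s=1}^{N} U_{is} W_s
```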

Properties of Basic Random Variables (N=3)
[Table: basic random variables with their sums, averages, and expected values.]

Sample Random Variables (n=2)
[Table: sample random variables with their sums and expected values.]
Sum over rows: get the usual random variable, with expected value mu.
Sum over columns: get a random variable with different expected values.
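The row and column sums can be checked by enumeration. The table layout is not preserved, so the sketch assumes rows are sample positions i=1..n, columns are subjects s=1..N, and each cell holds U(is) times the subject's value. The population values are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical true responses for subjects s = 1, 2, 3.
y = {1: Fraction(98), 2: Fraction(104), 3: Fraction(110)}
N, n = len(y), 2
subjects = sorted(y)
perms = list(permutations(subjects))


def table(perm):
    """Rows: sample positions 1..n. Columns: subjects 1..N. Cell: U(is) * y_s."""
    return [[y[s] if perm[i] == s else 0 for s in subjects] for i in range(n)]


# Expected row sums: the response in each sample position.
row_expect = [
    sum(sum(table(p)[i]) for p in perms) / len(perms) for i in range(n)
]

# Expected column sums: a subject's contribution when it lands in the sample.
col_expect = [
    sum(sum(row[c] for row in table(p)) for p in perms) / len(perms)
    for c in range(N)
]
```

Under these assumptions every row sum has expected value mu, while the column sum for subject s has expected value (n/N) times that subject's value, so the column sums have different expectations.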

Prediction of Mean in a Simple Case: No Response Error (N=3, n=2)
The population is split into sample and remainder. We need to predict a function of the remainder.
Criteria: the predictor is a linear function of the sample, is unbiased, and has the smallest mean squared error.
This is called the Best Linear Unbiased Predictor (note that we use the term "Predictor" here for a parameter, not a random variable).

Prediction of Mean: No Response Error (N=3, n=2)
Target, sample, and realized data.
We predict the unobserved values in the population. This gives the Best Linear Unbiased Predictor.
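The predictor formula is not preserved in the transcript. Under simple random sampling with no response error, the best linear unbiased predictor of the population mean is known to reduce to the sample mean; the sketch below assumes that form rather than deriving it, and checks unbiasedness and mean squared error by enumerating permutations of a hypothetical population.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical true responses for subjects s = 1, 2, 3.
y = {1: Fraction(98), 2: Fraction(104), 3: Fraction(110)}
N, n = len(y), 2
mu = sum(y.values()) / N
f = Fraction(n, N)  # sampling fraction


def predict_population_mean(sample_values):
    """Observed part plus predicted remainder part of the population mean."""
    sample_mean = sum(sample_values) / len(sample_values)
    predicted_remainder_mean = sample_mean  # assumed predictor of unobserved values
    return f * sample_mean + (1 - f) * predicted_remainder_mean


perms = list(permutations(sorted(y)))
preds = [predict_population_mean([y[s] for s in p[:n]]) for p in perms]

bias = sum(preds) / len(preds) - mu
mse = sum((p - mu) ** 2 for p in preds) / len(preds)
```

The split into an observed part weighted by n/N and a predicted part weighted by 1 - n/N mirrors the sample and remainder representation used in the slides.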

Prediction of a Subject's Mean in Position i with No Response Error (N=3, n=2)
Target, sample, and realized data.
We predict the unobserved values in the population. This gives the Best Linear Unbiased Predictor.

Prediction of a Subject's Mean in Position i with Response Error
Target, sample, and realized data.
We predict the unobserved values in the population. This gives the Best Linear Unbiased Predictor.

Prediction of Realized Random Effect - Other Examples
SRS + subject response error.
SRS + position response error.
Cluster sampling, balanced.
Cluster sampling, unbalanced: similar form, more complicated.
Return to the basic question: which predictor should be used?
Common response error: optimal via the theory.
Allowing K to depend on the realized subject: had smaller MSE.
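The two choices of K can be contrasted with a small sketch. The exact predictor from the slides is not preserved, so the code assumes a common shrinkage form: the sample mean plus K times the deviation of the observed response from the sample mean, with K = between-subject variance / (between-subject variance + response error variance). All numeric values and names are hypothetical.

```python
def shrinkage_predictor(y_i, y_bar, var_between, var_error):
    """Shrink an observed response toward the sample mean by the factor K."""
    k = var_between / (var_between + var_error)
    return y_bar + k * (y_i - y_bar)


# Hypothetical variance components.
var_between = 36.0
error_vars = {1: 1.0, 2: 9.0, 3: 36.0}  # subject-specific response error variances
pooled_error = sum(error_vars.values()) / len(error_vars)

# Hypothetical sample mean and observed responses for the realized subjects.
y_bar = 104.0
observed = {1: 97.0, 3: 116.0}

# K based on the pooled response error variance (the theoretical result).
pooled_pred = {
    s: shrinkage_predictor(v, y_bar, var_between, pooled_error)
    for s, v in observed.items()
}

# K allowed to depend on the realized subject's own error variance.
subject_pred = {
    s: shrinkage_predictor(v, y_bar, var_between, error_vars[s])
    for s, v in observed.items()
}
```

The sketch only shows how the two predictors differ for a given subject; it does not establish which has the smaller MSE.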

Plot of predicted response for the strain 'wild type yeast' for 253 chemicals with 3 doses below the 95% benchmark dose, using pooled (equal) response error based on a mixed model.

Plot with unequal response error.
Which results should be used? Does it depend on whether the model has heterogeneous response error?
No: theoretically, a derivation with heterogeneous response error pools the response error variances. However, in a simple example, we can show that better results occur if response error is separated.
The theory doesn't match: we don't understand the theory for the 'better results'.
Review what we do understand. Keep the context simple.

Dilemma
The pooled response error variance should be used for K (using theoretical results).
An empirical example illustrates smaller MSE with K depending on the realized subject, but there is no theory!
What should we do? Is there a 'gap' in the framework?

Basic Sample Random Variables
Usual modelling approach (work with the right column): the properties of these random variables (they are exchangeable) are a natural lead-in to Bayesian inference.
Traditional sampling (and missing data) approach (work with the bottom row): don't use explicit notation for the sample; use inclusion probabilities. Some values are missing.
Super-population models: use the bottom row, but re-arrange elements so that those in the sample are first. Assume the random variables are exchangeable (as for the right column). This really doesn't make sense.

Basic Random Variables: Sample and Remainder
What is potentially observable? What is observed?

Thanks. More work is needed!
Anne Stanek, Viviana Lencina, Alice Singer, Silvia San Martino, Wenjun Li, Luz Mery Gonzalas, Julio Singer, Ed Stanek, Maria Lucia Singer