Statistics and Art: Sampling, Response Error, Mixed Models, Missing Data, and Inference
Ed Stanek
And others: Recai Yucel, Julio Singer, and others on the Cluster Team
11/11/2018

Anne Stanek, Viviana Lencina, Alice Singer, Silvia San Martino, Wenjun Li, Luz Mery Gonzalas, Julio Singer, Ed Stanek, Maria Lucia Singer

Outline
1. Example: Dose-response Models in Toxicology: Threshold vs Hormetic Models
2. What is truth? Predict what?
3. Subsets: sampling
4. Prediction
5. Results on Predictor of Realized Subject True Value
6. Illustration and Dilemma
7. Extension to two-stage problems
8. Missing data framework
9. Conclusions
And others: Recai Yucel, Bo Xu, Ruitao Zhang, and others on the Cluster Team

1. Example: Dose-response Models - Threshold vs Hormetic Models
Yeast data: chemicals, 13 yeast strains, 5 doses x 2 replications; focus on doses below the BMD. These plots are of hypothetical 'true' responses. The response is expressed as percent of control: 100% is the response when the dose = 0. Question: is there evidence of hormesis? The point where the true response drops below 100% is the zero effect point. In practice, a 'benchmark dose' (BMD) is estimated as the dose where the observed response drops below 95%.
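The zero effect point and BMD(5) are both doses where a curve crosses a response level, so they can be located numerically. Below is a minimal Python sketch built on two hypothetical 'true' curves: a flat-then-linear threshold curve and a quadratic hormetic curve. Neither curve is fitted to the yeast data, and bisection is just one convenient root-finder.

```python
# Hypothetical 'true' dose-response curves in percent of control (100% = dose 0).
# Both curves are invented for illustration; they are not fitted to the yeast data.


def threshold_response(dose, tau=1.0, slope=20.0):
    """Threshold model: flat at 100% up to dose tau, then declining linearly."""
    return 100.0 if dose <= tau else 100.0 - slope * (dose - tau)


def hormetic_response(dose):
    """Hormetic model: rises above 100% at low doses, then falls (a quadratic)."""
    return 100.0 + 20.0 * dose - 10.0 * dose ** 2


def dose_where_response_drops_below(curve, level, lo, hi, tol=1e-10):
    """Bisection for the dose in (lo, hi] where `curve` falls below `level`.

    Assumes curve(lo) >= level > curve(hi) with a single crossing in between.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if curve(mid) >= level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Zero effect point: the hormetic response falls back to 100% (analytically, dose 2).
zep = dose_where_response_drops_below(hormetic_response, 100.0, 0.0, 3.0)
# BMD(5): the response drops below 95% (analytically, dose 1 + sqrt(1.5) for the hormetic curve).
bmd_hormetic = dose_where_response_drops_below(hormetic_response, 95.0, 0.0, 3.0)
bmd_threshold = dose_where_response_drops_below(threshold_response, 95.0, 0.0, 3.0)
print(round(zep, 4), round(bmd_hormetic, 4), round(bmd_threshold, 4))  # 2.0 2.2247 1.25
```

For the threshold curve the zero effect point is simply tau, so the two curves differ in whether the response rises above 100% before it falls.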

Indices: i = chemical, j = dose, k = replication.

A mixed model is fit to the responses at doses in the hormetic range. With only 5 doses, we identify BMD(5) (the 5% benchmark dose: the dose above which the response is less than (100-5)% = 95%) and the doses below the BMD. For chemicals with 3 doses below the BMD, we predict the average response over the below-BMD range. Results: the predicted responses for the realized chemicals are ordered from low to high, once assuming equal response error and once assuming unequal response error.
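The data flow for one strain has four steps: select the doses below each chemical's BMD, average the replicates there, keep chemicals with 3 such doses, and order the results. It can be sketched as follows. The dose grid, BMDs, and responses are invented. A raw average stands in for the mixed-model prediction, which is not shown.

```python
from statistics import mean

DOSES = [0.5, 1.0, 2.0, 4.0, 8.0]  # hypothetical 5-dose design


def below_bmd_average(responses, bmd):
    """Average the replicate responses at all design doses strictly below `bmd`.

    `responses` maps dose -> list of replicate responses (percent of control).
    Returns (number of doses used, average response over those doses).
    """
    used = [d for d in DOSES if d < bmd]
    return len(used), mean(y for d in used for y in responses[d])


# name: (estimated BMD(5), {dose: [rep 1, rep 2]}); invented numbers
chemicals = {
    "A": (3.0, {0.5: [102, 104], 1.0: [101, 99], 2.0: [98, 96], 4.0: [90, 88], 8.0: [60, 62]}),
    "B": (3.5, {0.5: [108, 110], 1.0: [106, 104], 2.0: [103, 101], 4.0: [92, 90], 8.0: [70, 72]}),
    "C": (1.5, {0.5: [99, 97], 1.0: [96, 95], 2.0: [90, 88], 4.0: [80, 78], 8.0: [50, 52]}),
}

# Keep chemicals with exactly 3 doses below the BMD, then order from low to high.
averages = {}
for name, (bmd, responses) in chemicals.items():
    n_doses, avg = below_bmd_average(responses, bmd)
    if n_doses == 3:
        averages[name] = avg
ordered = sorted(averages, key=averages.get)
print(ordered)  # ['A', 'B']  (C has only 2 doses below its BMD)
```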

Plot of the predicted response for the strain 'wild type yeast' for 253 chemicals with 3 doses below the 95% benchmark dose, using pooled (equal) response error from a mixed model. The black line is the expected distribution of the mean response if the threshold model held.

Similar plot with unequal response error.
This was constructed by fitting mixed models to each chemical and estimating each chemical's response error variance. Which results should be used? Does it depend on whether the model has heterogeneous response error? No: theoretically, a derivation with heterogeneous response error pools the response error variances. However, in a simple example we can show that better results (smaller MSE) occur if the response error is separated by chemical. The theory doesn't match: we don't understand the theory for the 'better results'. Next steps: review what we do understand, and keep the context simple.
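The two choices differ only in which response error variance enters the predictor. The sketch below rests on several assumptions. Each chemical's response error variance comes from a simple moment estimate based on its replicates, standing in for the mixed-model estimates. SIGMA2_BETWEEN is a hypothetical between-chemical variance. The shrinkage factor takes the familiar form k = σ²/(σ² + σ_e²/m). The helper names are illustrative only.

```python
from statistics import mean, variance


def response_error_variance(responses):
    """Within-dose (replicate) variance for one chemical, averaged over doses.

    `responses` maps dose -> replicate responses. Averaging the per-dose sample
    variances with equal weights assumes every dose has the same number of reps.
    """
    return mean(variance(reps) for reps in responses.values())


def shrinkage_factor(sigma2_between, sigma2_error, m):
    """k = sigma^2 / (sigma^2 + sigma_e^2 / m) for the mean of m responses."""
    return sigma2_between / (sigma2_between + sigma2_error / m)


# Invented replicate responses at the 3 doses below the BMD (2 replicates each).
chem_a = {0.5: [102, 104], 1.0: [101, 99], 2.0: [98, 96]}
chem_b = {0.5: [108, 112], 1.0: [100, 106], 2.0: [97, 103]}
sigma2_a = response_error_variance(chem_a)       # chemical-specific
sigma2_b = response_error_variance(chem_b)       # chemical-specific
sigma2_pooled = (sigma2_a + sigma2_b) / 2        # pooled (equal weights)

SIGMA2_BETWEEN = 25.0  # hypothetical between-chemical variance
M = 6                  # 3 doses x 2 replicates averaged per chemical
k_a = shrinkage_factor(SIGMA2_BETWEEN, sigma2_a, M)
k_b = shrinkage_factor(SIGMA2_BETWEEN, sigma2_b, M)
k_pooled = shrinkage_factor(SIGMA2_BETWEEN, sigma2_pooled, M)
print(sigma2_a, sigma2_b, k_a, k_b, k_pooled)
```

When each chemical uses its own variance, the one with noisier replicates gets a smaller k, so its raw average is shrunk more. The pooled version applies the same k to both chemicals.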

2. What is truth? Predict what?
Population, subjects, true response
Quantities to define: subject labels; true response; population parameters (mean, variance, subject deviation).
Here subjects = chemicals, and true response = average response in the hormetic range. We need to define parameters to represent the problem.
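One common way to write these parameters is μ = (1/N) Σ_s μ_s, σ² = (1/N) Σ_s (μ_s - μ)², and β_s = μ_s - μ. The divisor N (rather than N - 1) is an assumption of this sketch, as are the three invented true values:

```python
from fractions import Fraction


def population_parameters(true_values):
    """Population mean, variance, and subject deviations.

    `true_values` maps subject label s -> true response mu_s. The variance uses
    divisor N, which is one common convention (an assumption of this sketch).
    """
    N = len(true_values)
    mu = Fraction(sum(true_values.values()), N)
    beta = {s: v - mu for s, v in true_values.items()}   # subject deviations
    sigma2 = sum(b * b for b in beta.values()) / N
    return mu, sigma2, beta


# Hypothetical population of N = 3 chemicals (true average response in the
# hormetic range, percent of control).
mu, sigma2, beta = population_parameters({1: 96, 2: 100, 3: 107})
print(mu, sigma2)  # 101 62/3
```

The deviations β_s always sum to zero over the population, which is the "sum of subject effects is zero" property used in the response error model.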

Non-stochastic model: index for response, response error.
Response process, for each subject: in the hormetic range, pick a dose at random and measure the response.
Response error model assumptions: the response error is unbiased and heteroskedastic (its variance may differ by subject). The response error model is a stochastic model, and the response error is a random effect. The sum of the subject effects over the population is zero.
Information: (subject label, response). Subsequently, take r = 1 (one measure per subject).
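The response process (pick a dose in the hormetic range at random, then measure) already produces an unbiased, heteroskedastic response error. The minimal sketch below assumes that each subject's true value is the average of its responses at 3 hormetic-range doses. It ignores any measurement error beyond dose selection, and all numbers are invented:

```python
from fractions import Fraction

# Hypothetical responses (percent of control) at the 3 doses of the hormetic
# range for each subject (chemical). Invented numbers, for illustration only.
dose_responses = {1: [95, 96, 97], 2: [98, 100, 102], 3: [101, 107, 113]}


def response_error_model(responses):
    """Return (mu_s, errors, sigma2_s) when one dose is picked at random.

    Each dose is equally likely, so the response error W = response - mu_s
    averages to zero (unbiased), and sigma2_s is its average square.
    """
    n_doses = len(responses)
    mu_s = Fraction(sum(responses), n_doses)
    errors = [r - mu_s for r in responses]
    sigma2_s = sum(e * e for e in errors) / n_doses
    return mu_s, errors, sigma2_s


models = {s: response_error_model(r) for s, r in dose_responses.items()}
for s, (mu_s, errors, sigma2_s) in models.items():
    print(s, mu_s, sum(errors), sigma2_s)   # variance differs by subject
```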

3. Subsets, Sampling
Select n of N subjects (a subset, the "sample"). The sample is an unordered set of different subjects. Let all subsets be equally likely; the sample mean is then the average response of the subjects in the selected set. Note the difference with the usual representation.
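With all C(N, n) subsets equally likely, the sample mean is unbiased for the population mean. Exact enumeration on a hypothetical population of N = 3 shows this:

```python
from fractions import Fraction
from itertools import combinations
from math import comb

values = {1: 96, 2: 100, 3: 107}   # hypothetical true responses, N = 3
n = 2

# Every subset of n distinct labels is one equally likely (unordered) sample.
sample_means = {
    subset: Fraction(sum(values[s] for s in subset), n)
    for subset in combinations(sorted(values), n)
}
# Averaging the sample mean over all C(N, n) subsets recovers the population mean.
expected_sample_mean = sum(sample_means.values()) / len(sample_means)
print(len(sample_means), comb(len(values), n), expected_sample_mean)  # 3 3 101
```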

Sample as a Sequence (part of Permutation)
Represent the positions in a permutation, and assume all permutations are equally likely. Define the sample as a set of positions (the first n); the sample mean is the average over those positions. The random variable Y(ik) is not clearly defined. The sample is now a sequence (order matters)!
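Under equally likely permutations, every subject is equally likely to occupy any position, so each position's expected value is μ. Here is a small enumeration with invented values and positions numbered from 1:

```python
from fractions import Fraction
from itertools import permutations

values = {1: 96, 2: 100, 3: 107}             # hypothetical true responses by label s
perms = list(permutations(sorted(values)))   # all N! orderings, equally likely
n = 2                                        # sample = first n positions


def prob_at_position(s, i):
    """P(subject s occupies position i), positions numbered 1..N."""
    return Fraction(sum(1 for p in perms if p[i - 1] == s), len(perms))


def expected_at_position(i):
    """E(Y_i) = sum over s of P(s in position i) * y_s."""
    return sum(prob_at_position(s, i) * y for s, y in values.items())


print([expected_at_position(i) for i in range(1, len(values) + 1)])
# Distinct ordered samples (sequences of n subjects) that the permutations produce.
samples = {p[:n] for p in perms}
print(len(samples))
```

There are 6 ordered samples of size 2 here versus 3 unordered subsets, which is the sense in which order now matters.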

Figure: population of N = 3 subjects (Ed, Wenjun, Julio), each with a label s = 1, 2, 3. The sample is the first two subjects on the left.

Figure: population of N = 3 subjects, with labels (s) and positions in the permutation (i = 1, 2, 3). Note the labels and positions.

Figure: a different permutation: Ed, Julio, and Wenjun.

Figure: a different permutation: Ed, Wenjun, and Julio.

Figure: a different permutation: Julio, Wenjun, and Ed.

Figure: a different permutation, split into sample and remainder: Wenjun, Julio, and Ed.

Figure: Wenjun, Ed, and Julio, split into sample and remainder.

Population size (N) is most likely > 3
We only see n subjects in the sample. For example, suppose n = 3 and N = 7. We may see …

Figure: sample (positions i = 1, 2, 3) and remainder (i = 4, …), with Luzmery, Wenjun, and Viviana in the sample.

Figure: Viviana, Ed, and Silvina in a sample.
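For N = 7 and n = 3, the counts of ordered samples, unordered samples, and permutations fit together as follows. This quick check uses math.perm and math.comb, which require Python 3.8+:

```python
from math import comb, factorial, perm

N, n = 7, 3
ordered_samples = perm(N, n)     # sequences of n distinct subjects
unordered_samples = comb(N, n)   # sets of n subjects
# A given set of n subjects fills the sample positions in n! orders and the
# remainder positions in (N - n)! orders, so the counts reconcile with N!.
perms_per_set = factorial(n) * factorial(N - n)
print(ordered_samples, unordered_samples, unordered_samples * perms_per_set, factorial(N))
# 210 35 5040 5040
```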

Traditional Sampling Approach
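Subjects are labelled 1, 2, …, N. The Horvitz-Thompson estimator of the population total, $\hat{T}_{HT} = \sum_{s \in S} y_s/\pi_s$ (S = the sample), uses the first-order inclusion probabilities π_s = Prob(subject s is included in a sample). Bold y is a vector of population values; the values for subjects outside the sample are missing data.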
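Below is a minimal sketch of the Horvitz-Thompson estimator of a population total under simple random sampling without replacement, where every first-order inclusion probability is n/N. The population values are hypothetical. Averaging over all equally likely samples checks unbiasedness exactly:

```python
from fractions import Fraction
from itertools import combinations


def horvitz_thompson_total(sample, y, pi):
    """HT estimate of the population total: sum over sampled s of y_s / pi_s."""
    return sum(Fraction(y[s]) / pi[s] for s in sample)


y = {1: 96, 2: 100, 3: 107, 4: 110}          # hypothetical population values
N, n = len(y), 2
pi = {s: Fraction(n, N) for s in y}          # SRS: each inclusion probability is n/N
samples = list(combinations(sorted(y), n))   # all equally likely samples
estimates = [horvitz_thompson_total(smp, y, pi) for smp in samples]

# Unsampled values play the role of missing data; averaged over all samples,
# the estimator equals the true total.
print(sum(estimates) / len(estimates), sum(y.values()))  # 413 413
```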

With Response Error Model
Sample mean when the sample is a set, and when the sample is a sequence. To represent positions: U(is) is an indicator variable that has the value 1 if subject s is in position i, and 0 otherwise.
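The indicators U(is) form an N x N permutation matrix, and the value in position i is Σ_s U(is) y_s. Here is a sketch with one hypothetical ordering:

```python
def indicator_matrix(ordering, labels):
    """U[i][j] = 1 if subject labels[j] is in position i of `ordering`, else 0."""
    return [[1 if ordering[i] == s else 0 for s in labels] for i in range(len(ordering))]


def position_values(U, y, labels):
    """Y_i = sum over s of U_is * y_s: the value found in position i."""
    return [sum(u * y[s] for u, s in zip(row, labels)) for row in U]


labels = [1, 2, 3]
y = {1: 96, 2: 100, 3: 107}      # hypothetical values by subject label
ordering = (2, 3, 1)             # one permutation: subject 2 first, then 3, then 1
U = indicator_matrix(ordering, labels)
Y = position_values(U, y, labels)
print(U)   # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(Y)   # [100, 107, 96]
```

Every row and every column of U sums to 1: each position holds one subject, and each subject occupies one position.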

Figure: positions in the permutation (i = 1, 2, 3) for the sample subjects s = 1, 2, 3.

First Position in Permutation:
Suppose s = 1, …, N with N = 3. Then the response for the first position (i = 1) in a permutation has a formal expression in terms of the indicators U(1s).
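Write y_s for subject s's value and ignore response error for simplicity. This notation is assumed here rather than taken from the slide. The expression for position 1 and its expectation under equally likely permutations are then:

```latex
Y_1 = \sum_{s=1}^{3} U_{1s}\, y_s = U_{11} y_1 + U_{12} y_2 + U_{13} y_3,
\qquad \sum_{s=1}^{3} U_{1s} = 1,
\qquad
E(Y_1) = \sum_{s=1}^{3} P(U_{1s} = 1)\, y_s = \tfrac{1}{3}\,(y_1 + y_2 + y_3) = \mu .
```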

Positions in Sample Sequences
Sample and remainder representation.

Basic Random Variables
Figure: basic random variables for the sample, the remainder, and the population.

Finite Population Mixed Model
Combining the response error model with the permutation gives a finite population mixed model.

Mixed model
Alpha = fixed effects, B = random effects, W* = response error. Note that the subscript indexes the POSITION, not the SUBJECT.
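Here is one way to write the combination. It assumes the response error model Y_sk = μ + β_s + W_sk with β_s = μ_s - μ, and uses Σ_s U_is = 1 so that μ passes through unchanged. The subscript s indexes a subject and i a position. Taking the fixed effect (alpha) to be the single parameter μ is a simplifying assumption:

```latex
Y_{ik} = \sum_{s=1}^{N} U_{is}\, Y_{sk}
       = \sum_{s=1}^{N} U_{is}\,(\mu + \beta_s + W_{sk})
       = \mu + \underbrace{\sum_{s=1}^{N} U_{is}\,\beta_s}_{B_i}
             + \underbrace{\sum_{s=1}^{N} U_{is}\,W_{sk}}_{W^{*}_{ik}}
       = \mu + B_i + W^{*}_{ik}.
```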

Properties of Basic Random Variables (N=3)
Table: sums and averages of the basic random variables, with their expected values.
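For N = 3, the moments of the position random variables can be checked by enumeration. Define σ² with divisor N (an assumed convention). Then each position has variance σ², two positions have covariance -σ²/(N - 1), and the sum over all positions is the same for every permutation:

```python
from fractions import Fraction
from itertools import permutations

y = {1: 96, 2: 100, 3: 107}                 # hypothetical values, N = 3
N = len(y)
mu = Fraction(sum(y.values()), N)
sigma2 = sum((v - mu) ** 2 for v in y.values()) / N   # divisor N (assumed convention)
perms = list(permutations(y))               # equally likely orderings of the labels


def expect(f):
    """Expectation of f(permutation) over all equally likely permutations."""
    return sum(f(p) for p in perms) / len(perms)


var_y1 = expect(lambda p: (y[p[0]] - mu) ** 2)
cov_y1_y2 = expect(lambda p: (y[p[0]] - mu) * (y[p[1]] - mu))
position_sums = {sum(y[s] for s in p) for p in perms}   # same for every permutation
print(var_y1, cov_y1_y2, position_sums)  # 62/3 -31/3 {303}
```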

Sample Random Variables (n=2)
Summing over the rows gives the usual random variable, with expected value μ. Summing over the columns gives random variables with different expected values.
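The row and column sums of the n x N array with entries U(is) y_s can be compared by enumeration. The values below are hypothetical:

```python
from fractions import Fraction
from itertools import permutations

y = {1: 96, 2: 100, 3: 107}     # hypothetical values, N = 3
N, n = len(y), 2                # sample = first n positions
perms = list(permutations(y))   # equally likely orderings


def expect(f):
    """Expectation of f(permutation) over all equally likely permutations."""
    return sum(Fraction(f(p)) for p in perms) / len(perms)


# Row i (a sample position): sum over s of U_is * y_s is the value in position i.
row_expectations = [expect(lambda p, i=i: y[p[i]]) for i in range(n)]
# Column s (a subject): sum over sample positions of U_is * y_s is y_s if s is
# sampled and 0 otherwise, so its expectation is (n / N) * y_s.
col_expectations = {s: expect(lambda p, s=s: y[s] if s in p[:n] else 0) for s in y}
print(row_expectations, col_expectations)
```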

Prediction of Mean in a Simple Case: No Response Error (N=3, n=2)
The population is split into the sample and the remainder, so predicting the mean requires predicting a function of the remainder. Criteria: the predictor is a linear function of the sample, unbiased, and has the smallest mean squared error. Such a predictor is called the Best Linear Unbiased Predictor (BLUP). (Note that we use the term "predictor" here for a parameter, not a random variable.)

Prediction of Mean, No Response Error (N=3, n=2)
Target, sample, data, and realized values. We predict the unobserved values in the population with the best linear unbiased predictor.
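With no response error, the standard result is that the remainder mean is predicted by the sample mean. The predictor f·(sample mean) + (1 - f)·(predicted remainder mean), with f = n/N, therefore reduces to the sample mean. The small exact check below uses a hypothetical population. It compares the enumerated MSE with the textbook simple-random-sampling variance (1 - f)S²/n, where S² uses divisor N - 1:

```python
from fractions import Fraction
from itertools import permutations

y = {1: 96, 2: 100, 3: 107}      # hypothetical values, N = 3
N, n = len(y), 2
f = Fraction(n, N)               # sampling fraction
mu = Fraction(sum(y.values()), N)
S2 = sum((v - mu) ** 2 for v in y.values()) / (N - 1)


def predict_mean(sample):
    """f * (sample mean) + (1 - f) * (predicted remainder mean).

    With no response error, the remainder mean is predicted by the sample mean,
    so the predictor of the population mean reduces to the sample mean.
    """
    ybar = Fraction(sum(y[s] for s in sample), len(sample))
    return f * ybar + (1 - f) * ybar


errors = [predict_mean(p[:n]) - mu for p in permutations(y)]
bias = sum(errors) / len(errors)
mse = sum(e * e for e in errors) / len(errors)
print(bias, mse, (1 - f) * S2 / n)   # 0 31/6 31/6
```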

Prediction of a Subject's Mean in Position i with No Response Error (N=3, n=2)
Target, sample, data, and realized values. We predict the unobserved values in the population with the best linear unbiased predictor.

Prediction of a Subject’s Mean in Position i with Response Error
Target, sample, data, and realized values. We predict the unobserved values in the population with the best linear unbiased predictor.
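With response error, predictors of a realized subject's mean typically shrink the observed response toward the sample mean. The sketch below uses a simplified form, P_i = ȳ + k(Y_i - ȳ) with k = σ²/(σ² + σ_e²). It omits the finite population correction terms of the exact predictor and is not the slide's formula. The variance values are hypothetical:

```python
from statistics import mean


def shrinkage_predictions(sample_responses, sigma2, sigma2_e):
    """Predict each sampled position's latent value by shrinking toward the sample mean.

    Simplified form assumed for this sketch: P_i = ybar + k * (Y_i - ybar) with
    one common k = sigma2 / (sigma2 + sigma2_e). Finite population correction
    terms that appear in exact finite population predictors are omitted.
    """
    ybar = mean(sample_responses)
    k = sigma2 / (sigma2 + sigma2_e)
    return [ybar + k * (y_i - ybar) for y_i in sample_responses]


# Hypothetical: n = 2 sampled positions, one response each (r = 1).
preds = shrinkage_predictions([94.0, 104.0], sigma2=20.0, sigma2_e=5.0)
print(preds)
```

As σ_e² goes to 0, k goes to 1 and the prediction is the observed value, matching the no-response-error case. As the response error grows, the predictions move toward ȳ.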

Prediction of Realized Random Effect – Other Examples
SRS + subject response error
SRS + position response error
Cluster sampling: balanced
Cluster sampling: unbalanced (similar form, more complicated)
Return to the basic question: which predictor should be used?
Common response error: optimal via the theory.
Allowing K to depend on the realized subject: had smaller MSE.

Plot of the predicted response for the strain 'wild type yeast' for 253 chemicals with 3 doses below the 95% benchmark dose, using pooled (equal) response error from a mixed model.

Plot with unequal response error.
Which results should be used? Does it depend on whether the model has heterogeneous response error? No: theoretically, a derivation with heterogeneous response error pools the response error variances. However, in a simple example we can show that better results (smaller MSE) occur if the response error is separated. The theory doesn't match: we don't understand the theory for the 'better results'. Review what we do understand, and keep the context simple.

Dilemma
Using the theoretical results, the pooled response error variance should be used for K. An empirical example illustrates smaller MSE when K depends on the realized subject, but there is no theory for this. What should we do? Is there a 'gap' in the framework?
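The dilemma can be reproduced in miniature. Assume the population mean μ is known, one response per subject (r = 1), and predictors of the form μ + k(Y - μ). Then the MSE for subject s is (1 - k)² β_s² + k² σ_s². This is averaged over subjects because each subject is equally likely to fill a position. The population below is invented:

```python
from fractions import Fraction

# Invented finite population: true values mu_s and response error variances sigma2_s.
mu_s = {1: Fraction(96), 2: Fraction(100), 3: Fraction(107)}
sigma2_s = {1: Fraction(2, 3), 2: Fraction(8, 3), 3: Fraction(24)}
N = len(mu_s)
mu = sum(mu_s.values()) / N                      # assumed known in this sketch
beta = {s: mu_s[s] - mu for s in mu_s}           # subject deviations
sigma2 = sum(b * b for b in beta.values()) / N   # between-subject variance (divisor N)
sigma2_e_bar = sum(sigma2_s.values()) / N        # pooled response error variance


def mse_for_subject(s, k):
    """MSE of mu + k * (Y - mu) as a predictor of mu_s, with Y = mu_s + W."""
    return (1 - k) ** 2 * beta[s] ** 2 + k ** 2 * sigma2_s[s]


def average_mse(k_of):
    """Average over subjects; k_of(s) gives the shrinkage factor used for subject s."""
    return sum(mse_for_subject(s, k_of(s)) for s in mu_s) / N


k_pooled = sigma2 / (sigma2 + sigma2_e_bar)
mse_pooled = average_mse(lambda s: k_pooled)
mse_subject = average_mse(lambda s: sigma2 / (sigma2 + sigma2_s[s]))
print(float(mse_pooled), float(mse_subject))
```

In this invented population, the subject-specific k gives the smaller average MSE, while the pooled k is, by construction, the best single k. This illustrates the tension described above but proves nothing in general.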

Basic Sample Random Variables
Usual modelling approach (work with the right column): these random variables are exchangeable, a natural lead-in to Bayesian inference.
Traditional sampling (and missing data) approach (work with the bottom row): don't use explicit notation for the sample; use inclusion probabilities; some values are missing.
Super-population models: use the bottom row, but re-arrange the elements so that those in the sample come first, and assume the random variables are exchangeable (as for the right column). This really doesn't make sense.

Basic Random Variables
Sample and remainder: what is potentially observable? What is observed?

Thanks. More work is needed!
