BIO656: Multilevel Models, Term 4, 2006

Slide 1: Projects are due
By midnight, Friday, May 19th.
Electronic submission only to [address missing in source].
Please name the file: [myname]-project.[filetype] or [name1_name2]-project.[filetype]

Slide 2: Efficiency-Robustness Trade-offs
First, we consider alternatives to the Gaussian distribution for random effects.
Then, we move to issues of weighting, starting with some formalism.
Then, we move to an example of informative sample size.
Finally, we give a basic example, with broad implications, of choosing among weighting schemes.

Slide 3: Alternatives to the Gaussian Distribution for Random Effects

Slide 4: The t-distribution
Broader tails than the Gaussian.
So, it shrinks less for deviant Y-values.
The t-prior allows “outlying” parameters, so a deviant Y is not so indicative of a large level-1 residual.
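The claim that a t-prior shrinks less for deviant Y-values can be checked numerically. The sketch below is an illustration only. It assumes a single observation Y with Gaussian sampling variance 1, a prior centered at 0 with scale 1, and 3 degrees of freedom for the t-prior; none of these values come from the slides. It computes the posterior mean on a grid and reports the fraction of Y that is retained (posterior mean divided by Y).

```python
import math

def posterior_mean(y, prior_logpdf, sigma=1.0, lo=-40.0, hi=40.0, step=0.01):
    """Posterior mean of theta given Y = y, by a Riemann sum over a grid.

    Sampling model (assumed): Y | theta ~ N(theta, sigma^2).
    prior_logpdf returns the log prior density up to an additive constant.
    """
    num = 0.0
    den = 0.0
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        theta = lo + i * step
        weight = math.exp(prior_logpdf(theta) - 0.5 * ((y - theta) / sigma) ** 2)
        num += theta * weight
        den += weight
    return num / den

def gaussian_logprior(theta, tau=1.0):
    # N(0, tau^2) prior; tau = 1 is an illustrative choice.
    return -0.5 * (theta / tau) ** 2

def t_logprior(theta, df=3.0, scale=1.0):
    # t prior centered at 0; df = 3 and scale = 1 are illustrative choices.
    return -0.5 * (df + 1.0) * math.log(1.0 + (theta / scale) ** 2 / df)

def retained_fraction(y, prior_logpdf):
    # Fraction of the observed deviation from the prior center (0) that is kept.
    return posterior_mean(y, prior_logpdf) / y

for y in (1.0, 3.0, 6.0):
    g = retained_fraction(y, gaussian_logprior)
    t = retained_fraction(y, t_logprior)
    print(f"Y = {y:3.1f}  Gaussian prior keeps {g:.3f}  t prior keeps {t:.3f}")
```

Under the Gaussian prior the retained fraction is the same for every Y, while under the t-prior it grows as Y moves away from the center, which is the "shrinks less for deviant Y-values" behavior.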

Slide 5: Creating a t-distribution
Assume a Gaussian sampling distribution.
Standardizing with the sample standard deviation, rather than the true one, produces the t-distribution.
Z is t with a large df.
$t_3$ is the most different from Z among t-distributions with integer df and a finite variance.
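A small simulation can illustrate how the t-distribution arises. The sketch below assumes samples of size 4 (so 3 degrees of freedom), a true mean of 0, and a true standard deviation of 1; these values, the cutoff of 3, and the function name are illustrative choices, not taken from the slides. It standardizes each sample mean twice, once with the true standard deviation (giving Z) and once with the sample standard deviation (giving t), and compares how often each exceeds the cutoff.

```python
import math
import random

def tail_frequencies(n=4, reps=20000, cutoff=3.0, mu=0.0, sigma=1.0, seed=0):
    """Return (t_tail, z_tail): the share of |t| and |Z| values above cutoff."""
    rng = random.Random(seed)
    t_hits = 0
    z_hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        # Sample standard deviation with the n - 1 denominator.
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        z = (mean - mu) / (sigma / math.sqrt(n))  # uses the true SD
        t = (mean - mu) / (s / math.sqrt(n))      # uses the estimated SD
        z_hits += abs(z) > cutoff
        t_hits += abs(t) > cutoff
    return t_hits / reps, z_hits / reps

t_tail, z_tail = tail_frequencies()
print(f"share of |t_3| > 3: {t_tail:.4f}")
print(f"share of |Z|   > 3: {z_tail:.4f}")
```

The t statistic lands beyond the cutoff far more often than Z, which is the "broader tails" property in practice.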

Slide 7: With a t-prior, B is B(Y), increasing with $|Y - \mu|$

Slide 8: [Figure] Z is the distance from the center; (1 - B) = 1/2 = 0.50

Slide 9: [Figure] Z is the distance from the center; (1 - B) = 2/3 ≈ 0.667

Slide 10: Estimated Gaussian & Fully Non-parametric priors for the USRDS data

Slide 11: USRDS estimated priors

Slide 17: Informative Sample Size (similar to informative censoring)
See Louis et al., SMMR 2006.

Slide 25: Choosing among weighting schemes
“Optimality” versus goal achievement

Slide 26: Inferential Context
Question: What is the average length of in-hospital stay?
A more specific question: What is the average length of stay for:
- Several hospitals of interest?
- Maryland hospitals?
- All hospitals?

Slide 27: “Data” Collection & Goal
Data are gathered from 5 hospitals.
Hospitals are selected by some method.
$n_{hosp}$ patient records are sampled at random.
Length of stay (LOS) is recorded.
Goal: estimate the “population” mean.

Slide 28: Procedure
Compute hospital-specific means.
“Average” them.
- For simplicity, assume that the population variance is known and the same for all hospitals.
How should we compute the average?
We need a goal and then a good/best way to combine information.

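Slide 29: “DATA”
Table columns: Hospital | # sampled, $n_{hosp}$ | Hospital size | % of total size, $100\pi_{hosp}$ | Mean LOS | Within-hospital variance, $\sigma^2/n_{hosp}$
The numeric entries did not survive extraction. What remains is the form of the variance column ($\sigma^2/n_{hosp}$, ending with $\sigma^2/15$ for the fifth hospital) and a “Total” row with an entry marked “??”.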

Slide 30: Weighted averages & Variances (variances are based on fixed effects (FE), not random effects (RE))
Table columns: Weighting approach | Weights × 100 | Mean | Variance ratio, 100 × (Var/min)
Table rows: Equal | Proportional to reciprocal variance | Population, $\pi_{hosp}$
(The numeric entries did not survive extraction.)
Each weighted average is: mean = $\sum_k w_k \times$ (mean LOS for hospital $k$).
Reciprocal variance weights minimize variance.
Is that our goal?
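The comparison of weighting approaches can be sketched in a few lines. The table values are not available in this transcript, so the hospital sample sizes, population shares, and mean LOS values below are hypothetical, chosen only so that the fifth hospital has variance $\sigma^2/15$ as on the data slide. The function name and the choice $\sigma^2 = 1$ are also assumptions. Variances use the fixed-effects formula $\sum_k w_k^2 \sigma^2 / n_k$.

```python
def weighted_summary(weights, means, n, sigma2=1.0):
    """Weighted mean of hospital means and its fixed-effects variance."""
    total = sum(weights)
    w = [x / total for x in weights]  # normalize so the weights sum to 1
    mean = sum(wk * mk for wk, mk in zip(w, means))
    variance = sum(wk ** 2 * sigma2 / nk for wk, nk in zip(w, n))
    return mean, variance

# HYPOTHETICAL inputs for five hospitals (not recovered from the slides).
n = [30, 60, 15, 30, 15]               # records sampled per hospital
pi = [0.10, 0.15, 0.20, 0.25, 0.30]    # population shares
means = [25.0, 35.0, 15.0, 40.0, 10.0] # hospital-specific mean LOS

schemes = {
    "equal": [1.0] * 5,
    # 1 / (sigma2 / n_k) is proportional to n_k
    "reciprocal variance": [float(nk) for nk in n],
    "population": pi,
}

results = {name: weighted_summary(w, means, n) for name, w in schemes.items()}
min_var = min(var for _, var in results.values())

for name, (mean, var) in results.items():
    print(f"{name:20s} mean = {mean:6.2f}  100*(Var/min) = {100 * var / min_var:6.1f}")
```

With these made-up inputs the three schemes give three different means, and the reciprocal-variance weights have the smallest variance, which is what prompts the question "Is that our goal?"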

Slide 31: There are many weighting choices and weighting goals
Minimize variance by using reciprocal variance weights.
Minimize bias for the population mean by using population weights (“survey weights”).
Use policy weights (e.g., equal weighting).
Use “my weights,” ...

Slide 32: General Setting
When the model is correct:
- All weighting schemes estimate the same quantities (e.g., the same value for slopes in a multiple regression).
- So, it is clearly best to minimize variance by using reciprocal variance weights.
When the model is incorrect:
- We must consider analysis goals and use appropriate weights.
Of course, it is generally true that our model is not correct!

Slide 33: Weights and their properties
If $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu$, then all weighted averages estimate the population mean: $\mu = \sum_k \pi_k \mu_k$.
- So, it’s best to minimize the variance.
But if the hospital-specific $\mu_k$ are not all equal, then:
- Each set of weights estimates a different target.
- Minimizing variance might not be “best.”
- For an unbiased estimate of the population mean, set $w_k = \pi_k$.
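The reasoning behind these properties can be written out in a short derivation. The notation $\bar{Y}_k$ for the sample mean LOS in hospital $k$ is introduced here for illustration. The derivation assumes each $\bar{Y}_k$ is unbiased for $\mu_k$ with variance $\sigma^2/n_k$, that hospitals are sampled independently, and that the weights sum to 1.

```latex
\begin{align*}
E\Big[\sum_k w_k \bar{Y}_k\Big] &= \sum_k w_k \mu_k
  && \text{(the target depends on the weights)} \\
\text{Bias} &= \sum_k w_k \mu_k - \sum_k \pi_k \mu_k
  = \sum_k (w_k - \pi_k)\,\mu_k \\
\text{Var}\Big[\sum_k w_k \bar{Y}_k\Big] &= \sum_k w_k^2\, \frac{\sigma^2}{n_k}
\end{align*}
```

If all $\mu_k$ equal $\mu$, the bias is $\mu \sum_k (w_k - \pi_k) = 0$ for any weights that sum to 1, so only the variance matters. If the $\mu_k$ differ, the bias is zero in general only when $w_k = \pi_k$.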

Slide 34: The variance-bias tradeoff
General idea: trade off variance and bias to produce a low Mean Squared Error (MSE).
$\mathrm{MSE} = E[(\text{Estimate} - \text{True})^2] = \text{Variance} + (\text{Bias})^2$
Bias is unknown unless we know the $\mu_k$ (the true hospital-specific mean LOS).
But we can study $\mathrm{MSE}(\mu, w, \pi)$.
In practice, make some “guesses” and do sensitivity analyses.
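A sensitivity analysis of this kind can be sketched as a function of a guessed vector of hospital means, the weights, and the population shares. Everything numeric below is hypothetical: the sample sizes, the shares, the two "guesses" for the $\mu_k$, and the within-hospital variance of 400 are illustrative assumptions, and the function name is made up for this example. The target is taken to be the population mean $\sum_k \pi_k \mu_k$.

```python
def mse_components(mu, w, pi, n, sigma2):
    """Return (variance, bias, mse) of the weighted average sum_k w_k * Ybar_k.

    mu: guessed true hospital means; w: weights summing to 1;
    pi: population shares; n: sample sizes; sigma2: within-hospital variance.
    """
    target = sum(p * m for p, m in zip(pi, mu))      # population mean
    expected = sum(wk * m for wk, m in zip(w, mu))   # what the weights estimate
    bias = expected - target
    variance = sum(wk ** 2 * sigma2 / nk for wk, nk in zip(w, n))
    return variance, bias, variance + bias ** 2

# HYPOTHETICAL inputs (not recovered from the slides).
n = [30, 60, 15, 30, 15]
pi = [0.10, 0.15, 0.20, 0.25, 0.30]
inv_var_w = [nk / sum(n) for nk in n]   # reciprocal-variance weights
sigma2 = 400.0

guesses = {
    "all means equal": [20.0] * 5,
    "means differ": [25.0, 35.0, 15.0, 40.0, 10.0],
}

for label, mu in guesses.items():
    for name, w in (("reciprocal variance", inv_var_w), ("population", pi)):
        var, bias, mse = mse_components(mu, w, pi, n, sigma2)
        print(f"{label:16s} {name:20s} var={var:6.2f} bias={bias:6.2f} mse={mse:6.2f}")
```

Under the "all means equal" guess the reciprocal-variance weights have the lower MSE. Under the "means differ" guess their bias can outweigh their variance advantage.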

Slide 35: Variance, Bias and MSE as a function of (the $\mu$s, $w$, $\pi$)
Consider a true value for the variation of the between-hospital means ($\mu^*$ is the “overall mean”): $T = \sum_k (\mu_k - \mu^*)^2$.
Study bias, variance, and MSE for weights that optimize MSE for an assumed value ($A$) of the between-hospital variance.
So, when $A = T$, MSE is minimized by this optimizer.
In the following plot, $A$ is converted to a fraction of the total variance, $A/(A + \text{within-hospital})$:
- Fraction = 0 ⇒ minimize variance
- Fraction = 1 ⇒ minimize bias
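The slides do not give the form of the optimizer, so the following is only one plausible formalization, stated as an assumption. If the deviations $\mu_k - \mu^*$ are treated as independent with mean 0 and variance $A$, the expected squared bias of a weighted average is $A \sum_k (w_k - \pi_k)^2$. Minimizing $\sum_k w_k^2 \sigma^2/n_k + A \sum_k (w_k - \pi_k)^2$ subject to $\sum_k w_k = 1$ gives $w_k = (A\pi_k + \lambda)/(\sigma^2/n_k + A)$, with $\lambda$ set so the weights sum to 1. The function name and the inputs below are hypothetical.

```python
def compromise_weights(pi, n, A, sigma2=1.0):
    """Weights minimizing variance + A * sum_k (w_k - pi_k)^2, summing to 1.

    This objective is an ASSUMED formalization, not taken from the slides.
    """
    v = [sigma2 / nk for nk in n]            # within-hospital variances of the means
    inv = [1.0 / (vk + A) for vk in v]
    # Lagrange multiplier that enforces sum(w) == 1.
    lam = (1.0 - sum(A * p * i for p, i in zip(pi, inv))) / sum(inv)
    return [(A * p + lam) * i for p, i in zip(pi, inv)]

# HYPOTHETICAL inputs (not recovered from the slides).
n = [30, 60, 15, 30, 15]
pi = [0.10, 0.15, 0.20, 0.25, 0.30]

for A in (0.0, 0.01, 0.1, 10.0):
    w = compromise_weights(pi, n, A)
    print(f"A = {A:5.2f}  weights x100 = {[round(100 * wk, 1) for wk in w]}")
```

At $A = 0$ this reduces to reciprocal-variance weights (minimize variance), and as $A$ grows the weights approach the population weights $\pi_k$ (minimize bias), matching the two ends of the fraction scale described above.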

Slide 36: The bias-variance trade-off
[Figure] X-axis: the assumed variance fraction. Y-axis: performance computed under the true fraction.

Slide 37: Summary
Much of statistics depends on weighted averages.
Weights should depend on assumptions and goals.
If you trust your (regression) model:
- Then minimize the variance, using “optimal” weights.
- This generalizes the equal-$\mu$ case.
If you worry about model validity (bias for the population mean):
- You can buy full insurance by using population weights.
- But you pay in variance (efficiency).
- So, consider purchasing only the insurance you need by using compromise weights.