2005 Hopkins Epi-Biostat Summer Institute1 Module I: Statistical Background on Multi-level Models Francesca Dominici Michael Griswold The Johns Hopkins.

Slides:



Advertisements
Similar presentations
Hierarchical Linear Modeling: An Introduction & Applications in Organizational Research Michael C. Rodriguez.
Advertisements

Lecture 11 (Chapter 9).
Logistic Regression Psy 524 Ainsworth.
Irwin/McGraw-Hill © Andrew F. Siegel, 1997 and l Chapter 12 l Multiple Regression: Predicting One Factor from Several Others.
Comments on Hierarchical models, and the need for Bayes Peter Green, University of Bristol, UK IWSM, Chania, July 2002.
CHAPTER 21 Inferential Statistical Analysis. Understanding probability The idea of probability is central to inferential statistics. It means the chance.
2005 Hopkins Epi-Biostat Summer Institute1 Module 2: Bayesian Hierarchical Models Francesca Dominici Michael Griswold The Johns Hopkins University Bloomberg.
April 25 Exam April 27 (bring calculator with exp) Cox-Regression
Multiple Linear Regression Model
Chapter 10 Simple Regression.
Regulatory Network (Part II) 11/05/07. Methods Linear –PCA (Raychaudhuri et al. 2000) –NIR (Gardner et al. 2003) Nonlinear –Bayesian network (Friedman.
4. Multiple Regression Analysis: Estimation -Most econometric regressions are motivated by a question -ie: Do Canadian Heritage commercials have a positive.
EPI 809/Spring Multiple Logistic Regression.
Mixed models Various types of models and their relation
Modeling Gene Interactions in Disease CS 686 Bioinformatics.
Today Concepts underlying inferential statistics
Review for Exam 2 Some important themes from Chapters 6-9 Chap. 6. Significance Tests Chap. 7: Comparing Two Groups Chap. 8: Contingency Tables (Categorical.
Logistic regression for binary response variables.
Analysis of Clustered and Longitudinal Data
Review of Lecture Two Linear Regression Normal Equation
Review of normal distribution. Exercise Solution.
Lecture 2 Basic Bayes and two stage normal normal model…
Simple Linear Regression
Biostatistics Case Studies 2005 Peter D. Christenson Biostatistician Session 4: Taking Risks and Playing the Odds: OR vs.
2006 Hopkins Epi-Biostat Summer Institute1 Module 2: Bayesian Hierarchical Models Instructor: Elizabeth Johnson Course Developed: Francesca Dominici and.
1 Lecture 1 Introduction to Multi-level Models Course Website: All lecture materials extracted and.
Lecture 8: Generalized Linear Models for Longitudinal Data.
1 Module I: Statistical Background on Multi-level Models Francesca Dominici Scott L. Zeger Michael Griswold The Johns Hopkins University Bloomberg School.
Case Study - Relative Risk and Odds Ratio John Snow’s Cholera Investigations.
Estimating parameters in a statistical model Likelihood and Maximum likelihood estimation Bayesian point estimates Maximum a posteriori point.
Excepted from HSRP 734: Advanced Statistical Methods June 5, 2008.
Association between 2 variables
Statistical Sampling & Analysis of Sample Data
April 6 Logistic Regression –Estimating probability based on logistic model –Testing differences among multiple groups –Assumptions for model.
April 4 Logistic Regression –Lee Chapter 9 –Cody and Smith 9:F.
Chapter 4 Linear Regression 1. Introduction Managerial decisions are often based on the relationship between two or more variables. For example, after.
Contingency tables Brian Healy, PhD. Types of analysis-independent samples OutcomeExplanatoryAnalysis ContinuousDichotomous t-test, Wilcoxon test ContinuousCategorical.
Average Arithmetic and Average Quadratic Deviation.
Term 4, 2006BIO656--Multilevel Models 1 PART 4 Non-linear models Logistic regression Other non-linear models Generalized Estimating Equations (GEE) Examples.
The Choice Between Fixed and Random Effects Models: Some Considerations For Educational Research Clarke, Crawford, Steele and Vignoles and funding from.
Term 4, 2006BIO656--Multilevel Models 1 Part 2 Schematic of the alcohol model Marginal and conditional models Variance components Random Effects and Bayes.
Term 4, 2006BIO656--Multilevel Models Multi-Level Statistical Models If you did not receive the welcome from me, me at:
© Department of Statistics 2012 STATS 330 Lecture 20: Slide 1 Stats 330: Lecture 20.
1 Risk Assessment Tests Marina Kondratovich, Ph.D. OIVD/CDRH/FDA March 9, 2011 Molecular and Clinical Genetics Panel for Direct-to-Consumer (DTC) Genetic.
Introduction to logistic regression and Generalized Linear Models July 14, 2011 Introduction to Statistical Measurement and Modeling Karen Bandeen-Roche,
2005 Hopkins Epi-Biostat Summer Institute1 Module 3: An Example of a Two-stage Model; NMMAPS Study Francesca Dominici Michael Griswold The Johns Hopkins.
Latent Class Regression Model Graphical Diagnostics Using an MCMC Estimation Procedure Elizabeth S. Garrett Scott L. Zeger Johns Hopkins University
Data Analysis in Practice- Based Research Stephen Zyzanski, PhD Department of Family Medicine Case Western Reserve University School of Medicine October.
Confidence Interval & Unbiased Estimator Review and Foreword.
Multi-level Models Summer Institute 2005 Francesca Dominici Michael Griswold The Johns Hopkins University Bloomberg School of Public Health.
Organization of statistical research. The role of Biostatisticians Biostatisticians play essential roles in designing studies, analyzing data and.
1 Estimation of Gene-Specific Variance 2/17/2011 Copyright © 2011 Dan Nettleton.
Statistical inference Statistical inference Its application for health science research Bandit Thinkhamrop, Ph.D.(Statistics) Department of Biostatistics.
BIOSTATISTICS Lecture 2. The role of Biostatisticians Biostatisticians play essential roles in designing studies, analyzing data and creating methods.
Logistic regression (when you have a binary response variable)
POPLHLTH 304 Regression (modelling) in Epidemiology Simon Thornley (Slides adapted from Assoc. Prof. Roger Marshall)
Lecture 1: Basic Statistical Tools. A random variable (RV) = outcome (realization) not a set value, but rather drawn from some probability distribution.
Chapter 5 Sampling Distributions. Introduction Distribution of a Sample Statistic: The probability distribution of a sample statistic obtained from a.
Week 7: General linear models Overview Questions from last week What are general linear models? Discussion of the 3 articles.
Direct method of standardization of indices. Average Values n Mean:  the average of the data  sensitive to outlying data n Median:  the middle of the.
Instructor: R. Makoto 1richard makoto UZ Econ313 Lecture notes.
Class Six Turn In: Chapter 15: 30, 32, 38, 44, 48, 50 Chapter 17: 28, 38, 44 For Class Seven: Chapter 18: 32, 34, 36 Chapter 19: 26, 34, 44 Quiz 3 Read.
Methods of Presenting and Interpreting Information Class 9.
Module 2: Bayesian Hierarchical Models
CHAPTER 7 Linear Correlation & Regression Methods
Linear Mixed Models in JMP Pro
Introduction to logistic regression a.k.a. Varbrul
Why use marginal model when I can use a multi-level model?
Module 3: An Example of a Two-stage Model; NMMAPS Study
From GLM to HLM Working with Continuous Outcomes
Presentation transcript:

2005 Hopkins Epi-Biostat Summer Institute1 Module I: Statistical Background on Multi-level Models Francesca Dominici Michael Griswold The Johns Hopkins University Bloomberg School of Public Health

2005 Hopkins Epi-Biostat Summer Institute2 Statistical Background on MLMs Module 1:  Main Ideas on Multilevel Models  Review of GLMs (Generalized Linear Models)  Accounting for Correlated Data  Bayes Theorem  Bayesian Inference and Computation

2005 Hopkins Epi-Biostat Summer Institute3 The Main Idea…

2005 Hopkins Epi-Biostat Summer Institute4 Biological, psychological and social processes that influence health occur at many levels:  Cell  Organ  Person  Family  Neighborhood  City  Society An analysis of risk factors should consider:  Each of these levels  Their interactions Multi-level Models – Main Idea Health Outcome

2005 Hopkins Epi-Biostat Summer Institute5 Example: Alcohol Abuse 1. Cell: Neurochemistry 2. Organ: Ability to metabolize ethanol 3. Person: Genetic susceptibility to addiction 4. Family: Alcohol abuse in the home 5. Neighborhood: Availability of bars 6. Society: Regulations; organizations; social norms Level:

2005 Hopkins Epi-Biostat Summer Institute6 Example: Alcohol Abuse; Interactions between Levels 5Availability of bars and 6State laws about drunk driving 4Alcohol abuse in the family and 2Person’s ability to metabolize ethanol 3Genetic predisposition to addiction and 4Household environment 6State regulations about intoxication and 3Job requirements Level:

2005 Hopkins Epi-Biostat Summer Institute7 Notation: Population Neighborhood: i=1,…,I s State: s=1,…,S Family: j=1,…,J si Person: k=1,…,K sij Outcome: Y sijk Predictors: X sijk Person: sijk ( y 1223, x 1223 )

2005 Hopkins Epi-Biostat Summer Institute8 Notation (cont.)

2005 Hopkins Epi-Biostat Summer Institute9 Multi-level Models: Idea Predictor Variables Alcohol Abuse Response Person’s Income Family Income Percent poverty in neighborhood State support of the poor s sij si sijk Level: X.n X.s X.f X.p Y sijk

2005 Hopkins Epi-Biostat Summer Institute10 A Rose is a Rose is a… Multi-level model Random effects model Mixed model Random coefficient model Hierarchical model Many names for similar models, analyses, and goals.

2005 Hopkins Epi-Biostat Summer Institute11 Generalized Linear Models (Review)

2005 Hopkins Epi-Biostat Summer Institute12 Digression on Statistical Models A statistical model is an approximation to reality There is not a “correct” model;  ( forget the holy grail ) A model is a tool for asking a scientific question;  ( screw-driver vs. sludge-hammer ) A useful model combines the data with prior information to address the question of interest. Many models are better than one.

2005 Hopkins Epi-Biostat Summer Institute13 Generalized Linear Models (GLMs) g(  ) =  0 +  1 *X 1 + … +  p *X p ModelResponse g(  )g(  ) DistributionCoef Interp Linear Continuous (ounces)  Gaussian Change in avg(Y) per unit change in X Logistic Binary (disease) logBinomialLog Odds Ratio Log- linear Count/Times to events log(  ) PoissonLog Relative Risk (  = E(Y|X) = mean )  (1-  )

2005 Hopkins Epi-Biostat Summer Institute14 Since: E(y|Age+1,Gender) =  0 +  1 (Age+1) +  2 Gender And: E(y|Age,Gender) =  0 +  1 Age +  2 Gender  E(y) =  1 Generalized Linear Models (GLMs) g(  ) =  0 +  1 *X 1 + … +  p *X p Gaussian – Linear :E(y) =  0 +  1 Age +  2 Gender Example: Age & Gender  1 = Change in Average Response per 1 unit increase in Age, Comparing people of the SAME GENDER. WHY?

2005 Hopkins Epi-Biostat Summer Institute15 Generalized Linear Models (GLMs) g(  ) =  0 +  1 *X 1 + … +  p *X p Binary – Logistic : log{odds(Y)} =  0 +  1 Age +  2 Gender Example: Age & Gender  1 = log-OR of “+ Response” for a 1 unit increase in Age, Comparing people of the SAME GENDER. WHY? Since: log{odds(y|Age+1,Gender)} =  0 +  1 (Age+1) +  2 Gender And: log{odds(y|Age,Gender)} =  0 +  1 Age +  2 Gender  log-Odds =  1 log-OR =  1

2005 Hopkins Epi-Biostat Summer Institute16 Generalized Linear Models (GLMs) g(  ) =  0 +  1 *X 1 + … +  p *X p Counts – Log-linear : log{E(Y)} =  0 +  1 Age +  2 Gender Example: Age & Gender  1 = log-RR for a 1 unit increase in Age, Comparing people of the SAME GENDER. WHY? Self-Check: Verify Tonight

2005 Hopkins Epi-Biostat Summer Institute17 Correlated Data…

2005 Hopkins Epi-Biostat Summer Institute18 D. Responses are independent B. All the key covariates are included in the model “Quiz”: Most Important Assumptions of Regression Analysis? A. Data follow normal distribution B. All the key covariates are included in the model C. Xs are fixed and known D. Responses are independent

2005 Hopkins Epi-Biostat Summer Institute19 Non-independent responses (Within-Cluster Correlation) Fact: two responses from the same family tend to be more like one another than two observations from different families Fact: two observations from the same neighborhood tend to be more like one another than two observations from different neighborhoods Why?

2005 Hopkins Epi-Biostat Summer Institute20 Why? (Family Wealth Example) GOD Grandparents Parents You Great-Grandparents You Parents Grandparents

2005 Hopkins Epi-Biostat Summer Institute21 Multi-level Models: Idea Predictor Variables Alcohol Abuse Response Person’s Income Family Income Percent poverty in neighborhood State support of the poor s sij si sijk Level: X.n X.s X.f X.p Y sijk Bars a.n si Drunk Driving Laws a.s s Unobserved random intercepts Genes a.f sij

2005 Hopkins Epi-Biostat Summer Institute22 Key Components of Multi-level Models Specification of predictor variables from multiple levels (Fixed Effects)  Variables to include  Key interactions Specification of correlation among responses from same clusters (Random Effects) Choices must be driven by scientific understanding, the research question and empirical evidence.

2005 Hopkins Epi-Biostat Summer Institute23 Multi-level Shmulti-level Multi-level analyses of social/behavioral phenomena: an important idea Multi-level models involve predictors from multi- levels and their interactions They must account for associations among observations within clusters (levels) to make efficient and valid inferences.

2005 Hopkins Epi-Biostat Summer Institute24 Regression with Correlated Data Must take account of correlation to: Obtain valid inferences  standard errors  confidence intervals  posteriors Make efficient inferences

2005 Hopkins Epi-Biostat Summer Institute25 Logistic Regression Example: Cross-over trial Response: 1-normal; 0- alcohol dependence Predictors: period (x 1 ); treatment group (x 2 ) Two observations per person (cluster) Parameter of interest: log odds ratio of dependence: treatment vs placebo Mean Model: log{odds(AD)} =  0 +  1 Period +  2 Trt

2005 Hopkins Epi-Biostat Summer Institute26 Results: estimate, (standard error) Model Variable Ordinary Logistic Regression Account for correlation Intercept 0.66 (0.32) 0.67 (0.29) Period (0.38) (0.23) Treatment 0.56 (0.38) 0.57 (0.23) Similar Estimates, WRONG Standard Errors (& Inferences) for OLR (  0 ) (  2 ) (  1 )

2005 Hopkins Epi-Biostat Summer Institute27 Simulated Data: Non-Clustered Neighborhood Alcohol Consumption (ml/day)

2005 Hopkins Epi-Biostat Summer Institute28 Simulated Data: Clustered Neighborhood Alcohol Consumption (ml/day)

2005 Hopkins Epi-Biostat Summer Institute29 Within-Cluster Correlation Correlation of two observations from same cluster = Non-Clustered = ( ) / 9.8 = 0 Clustered = ( ) / 9.8 = 0.67 Total Var – Within Var Total Var

2005 Hopkins Epi-Biostat Summer Institute30 Models for Clustered Data Models are tools for inference Choice of model determined by scientific question Scientific Target for inference?  Marginal mean: Average response across the population  Conditional mean: Given other responses in the cluster(s) Given unobserved random effects We will deal mainly with conditional models  Operating under a Bayesian paradigm

2005 Hopkins Epi-Biostat Summer Institute31 Basic Bayes…

2005 Hopkins Epi-Biostat Summer Institute32 Diagnostic Testing Ask Marilyn ® BY MARILYN VOS SAVANT A particularly interesting and important question today is that of testing for drugs. Suppose it is assumed that about 5% of the general population uses drugs. You employ a test that is 95% accurate, which we’ll say means that if the individual is a user, the test will be positive 95% of the time, and if the individual is a nonuser, the test will be negative 95% of the time. A person is selected at random and is given the test. It’s positive. What does such a result suggest? Would you conclude that the individual is a drug user? What is the probability that the person is a drug user?

2005 Hopkins Epi-Biostat Summer Institute33 a d True positives True negatives Disease Status Test Outcome Diagnostic Testing False positives False negatives c b

2005 Hopkins Epi-Biostat Summer Institute34 Diagnostic Testing “The workhorse of Epi”: The 2  2 table Disease +Disease -Total Test +aba + b Test -cdc + d Totala + cb + da + b + c + d

2005 Hopkins Epi-Biostat Summer Institute35 Diagnostic Testing “The workhorse of Epi”: The 2  2 table Disease +Disease -Total Test +aba + b Test -cdc + d Totala + cb + da + b + c + d

2005 Hopkins Epi-Biostat Summer Institute36 Diagnostic Testing “The workhorse of Epi”: The 2  2 table Disease +Disease -Total Test +aba + b Test -cdc + d Totala + cb + da + b + c + d

2005 Hopkins Epi-Biostat Summer Institute37 Diagnostic Testing Marilyn’s Example Disease +Disease -Total Test Test Total PPV = 51% NPV = 99% Sens = 0.95 Spec = 0.95 P(D) = 0.05

2005 Hopkins Epi-Biostat Summer Institute38 Diagnostic Testing Marilyn’s Example Disease +Disease -Total Test Test Total PPV = 83% NPV = 99% Sens = 0.95 Spec = 0.95 P(D) = 0.20 Point: PPV depends on prior probability of disease in the population

2005 Hopkins Epi-Biostat Summer Institute39 Diagnostic Testing & Bayes Theorem Bayesian Formulation:  Parameter of interest: “D”: D=0 if disease free, D=1 if diseased  Prior distribution of the parameter D: Pr(D=1) (Prevalence of disease in general pop’n)  Data: to provide evidence about the parameter Y=0 if test negative, Y=1 if test positive  Likelihood: Pr(data | specific parameter value) Pr(Y|D) = sens, spec; i.e. Pr(Y=1|D=1), Pr(Y=0|D=0)  Posterior distribution of the parameter D: Pr(D|Y) = Pr(diseased | test outcome)  Pr(Y|D)P(D) = Likelihood * Prior

2005 Hopkins Epi-Biostat Summer Institute40 Diagnostic Testing & Bayes Theorem Marilyn’s Example:  Parameter of interest: “D”: D=0 if disease free, D=1 if diseased  Prior distribution of the parameter D: Pr(D=1) = 0.05  Data: to provide evidence about the parameter Suppose positive test observed: Y=1  Likelihood: Pr(data | specific parameter value) Pr(Y=1|D=1) = 0.95, (sens); Pr(Y=1|D=0) = 1-Pr(Y=0|D=0) = = 0.05  Posterior distribution of the parameter D: Pr(D=1|Y=1) =

2005 Hopkins Epi-Biostat Summer Institute41 Diagnostic Testing & Bayes Theorem Bayes Theorem lets us combine our Prior beliefs with the Likelihood of having observed our Data to obtain Posterior inferences about the parameters we’re interested in, given the data we saw.

2005 Hopkins Epi-Biostat Summer Institute42 Key Points “Multi-level” Models:  Have covariates from many levels and their interactions  Acknowledge correlation among observations from within a level (cluster) Bayesian Inference: Assumptions about the latent variables determine the nature of the within cluster correlations Information can be borrowed across clusters (levels) to improve individual estimates