
1 Special Topic: Logistic Regression for Binary Outcomes
The dependent variable is often binary, such as whether a person litters or not, used a condom or not, is dead or alive, is diseased or not, had intercourse or not, or is divorced or not. In this case, logistic or probit regression is the method of choice, because the assumptions of ordinary least squares regression are violated when the outcome is binary. Estimates of the mediated effect from logistic and probit regression can be distorted when conventional procedures are used. Here we examine binary or continuous X, continuous M, and binary Y. See MacKinnon et al. (under review, Clinical Trials) and MacKinnon et al. (under review, Psychological Methods).

2 Logistic Regression Model for Equations 1 and 2
Standard logistic regression model, where Y depends on X, β1 is the intercept, and τ codes the relation between X and Y:
logit Pr{Y=1|X} = β1 + τX (1)
Standard logistic regression model, where Y depends on X and M, β2 is the intercept, τ′ codes the relation between X and Y adjusted for M, and β codes the relation between M and Y adjusted for X:
logit Pr{Y=1|X,M} = β2 + τ′X + βM (2)

3 Logistic Regression Model for latent variable Y*
Y* = β1 + τX + ε1 (1)
Y* = β2 + τ′X + βM + ε2 (2)
The unobserved latent variable Y* is linearly related to X in Equation 1 and to both X and M in Equation 2. ε1 and ε2 represent residual variability and have a standard logistic distribution. The dichotomous Y is derived from Y* through the relation Y = 1 if and only if Y* > 0. The same model applies for the probit, with the errors having a standard normal distribution rather than a standard logistic distribution.
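The threshold relation between Y* and Y can be checked numerically. The following is a minimal sketch, assuming illustrative values for β1, τ, and X that are not taken from the slides: it draws standard logistic residuals, forms Y*, dichotomizes at zero, and compares the simulated proportion of Y = 1 with the probability implied by Equation 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) values for Equation 1; not taken from the slides.
beta1, tau = -0.2, 0.8
x = 0.5          # a single fixed value of X at which to check Pr{Y=1|X}
n = 200_000      # number of simulated cases

# Latent variable with standard logistic residuals: Y* = beta1 + tau*X + e1
eps1 = rng.logistic(loc=0.0, scale=1.0, size=n)
y_star = beta1 + tau * x + eps1

# Observed binary outcome: Y = 1 if and only if Y* > 0
y = (y_star > 0).astype(int)

# Probability implied by the logistic model: inverse logit of beta1 + tau*X
p_model = 1.0 / (1.0 + np.exp(-(beta1 + tau * x)))
p_simulated = y.mean()

print(f"model: {p_model:.4f}  simulated: {p_simulated:.4f}")
```

Swapping `rng.logistic` for `rng.normal` and the inverse logit for the standard normal CDF would give the corresponding probit check.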

4 Equation 3
M = β3 + αX + ε3 (3)
M is a continuous variable, so ordinary least squares regression is used to estimate this model, where β3 is the intercept, α represents the relation between X and M, and ε3 is residual variability.

5 Logistic Regression Model for latent variable Y*
τ - τ′: Difference in coefficients. The coefficients are from separate logistic regression equations.
αβ: Product of coefficients. The β coefficient is from a logistic regression model and α is from an ordinary least squares regression model.
As will be shown, the difference in coefficients method can give distorted values for the mediated effect because of differences in the scale of the separate logistic regression models. For both Equations 1 and 2, residual variability is fixed at π²/3 for logistic regression and at 1 for probit regression.
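The two estimators can be compared on simulated data. This is a rough sketch rather than the authors' procedure: the true values of α, β, and τ′, the sample size, and the helper `fit_logit` (a plain Newton-Raphson fit written here for illustration) are all assumptions. It fits Equations 1 and 2 by logistic regression and Equation 3 by ordinary least squares, then forms αβ and τ - τ′.

```python
import numpy as np


def fit_logit(predictors, y, n_iter=25):
    """Hypothetical helper: logistic regression by Newton-Raphson.

    Returns the coefficient vector [intercept, slope(s)...].
    """
    X = np.column_stack([np.ones(len(y)), predictors])
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ coef))
        gradient = X.T @ (y - p)
        hessian = (X * (p * (1.0 - p))[:, None]).T @ X  # X' W X
        coef = coef + np.linalg.solve(hessian, gradient)
    return coef


rng = np.random.default_rng(1)
n = 20_000
alpha, beta, tau_prime = 0.6, 1.5, 0.3  # assumed true values, for illustration

x = rng.normal(size=n)
m = alpha * x + rng.normal(size=n)                         # Equation 3
y_star = tau_prime * x + beta * m + rng.logistic(size=n)   # Equation 2 (latent)
y = (y_star > 0).astype(float)

tau_hat = fit_logit(x, y)[1]                                        # Equation 1
_, tau_prime_hat, beta_hat = fit_logit(np.column_stack([x, m]), y)  # Equation 2
alpha_hat = np.polyfit(x, m, 1)[0]                                  # Equation 3, OLS slope

product = alpha_hat * beta_hat
difference = tau_hat - tau_prime_hat
print(f"alpha*beta = {product:.3f}, tau - tau' = {difference:.3f}")
```

With these assumed values the difference in coefficients comes out noticeably smaller than the product, because τ is estimated on a different scale than τ′ and β.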

6 What is in the next plot?
• Expected logistic regression coefficients based on Haggstrom (1983) are used to compute τ - τ′ and αβ.
• All possible combinations of α, β, and τ′ values for small (2% variance explained), medium (13%), large (26%), and very large (40%) effects (4 × 4 × 4 = 64).
• The Y-axis is the expected value of τ - τ′ and αβ.
• The X-axis is the true value of the β coefficient in the continuous variable mediation model, denoted β_C.

7 Plot of true values of αβ and τ - τ′ as a function of the true mediated effect and the true value of β_C.

8 Plot of true proportion mediated as a function of the true value of β_C.

9 αβ and τ - τ′ are not equal in Logistic and Probit Regression
The two estimators, αβ and τ - τ′, are not identical in logistic or probit regression because, unlike ordinary least squares regression, where the residual variance varies across equations, in logistic regression the residual variance is fixed at π²/3 (MacKinnon & Dwyer, 1993). The logistic regression coefficients are therefore a function of both the relations among variables and the fixed value of the residual variance. There are solutions.
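One way to see where the scale difference comes from is to substitute Equation 3 into the latent form of Equation 2. The derivation below is a sketch under two added assumptions: ε2 and ε3 are independent, and the composite residual is treated as approximately logistic (the result is exact for probit with normal ε3, with π²/3 replaced by 1). The symbol σ²_ε3 for the residual variance of Equation 3 is notation introduced here.

```latex
\begin{align*}
Y^* &= \beta_2 + \tau' X + \beta M + \varepsilon_2,
  \qquad M = \beta_3 + \alpha X + \varepsilon_3 \\
Y^* &= (\beta_2 + \beta\beta_3) + (\tau' + \alpha\beta)\,X
  + (\beta\varepsilon_3 + \varepsilon_2) \\
\operatorname{Var}(\beta\varepsilon_3 + \varepsilon_2)
  &= \beta^2 \sigma^2_{\varepsilon_3} + \frac{\pi^2}{3}
  \;>\; \frac{\pi^2}{3} \\
\tau &\approx \frac{\tau' + \alpha\beta}
  {\sqrt{1 + \beta^2 \sigma^2_{\varepsilon_3} \big/ (\pi^2/3)}}
\end{align*}
```

Because Equation 1 forces its residual variance back to π²/3, τ is shrunk relative to τ′ + αβ, so τ - τ′ falls short of αβ whenever β is nonzero.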

10 Solutions to mediation estimation in Logistic and Probit Regression
• Standardize the values of the coefficients. One standardization method computes the variance of Y in both equations and uses that to standardize values (MacKinnon & Dwyer, 1993; Winship & Mare, 1983). Another standardization method rescales the coefficients in Equation 2 to be in the same metric as Equation 1. To the best of our knowledge, this is a new method; it is described below.
• Use a computer program such as Mplus that appropriately handles categorical variables in covariance structure models. I believe that this approach is similar to the first approach to standardization, i.e., the scale of the latent Y* is the same for all equations in a model.

11 Standardizing across logistic regression equations
Standardize the values of the coefficients in Equations 1 and 2 (see MacKinnon & Dwyer, 1993, and Winship & Mare, 1983).
For Equation 1, compute s²_Y* = τ² s²_X + π²/3 and divide the τ coefficient and its standard error by s_Y* from this equation.
For Equation 2, compute s²_Y* = τ′² s²_X + β² s²_M + 2τ′β s_XM + π²/3 and divide the τ′ and β coefficients and their standard errors by s_Y* from this equation.
Here s²_X is the variance of the X variable, s²_M is the variance of the M variable, and s_XM is the covariance of the X and M variables. The α parameter does not require rescaling if M is continuous. Note that if probit regression is used, the last term of the equations for s²_Y* should be 1 rather than π²/3.
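The rescaling on this slide amounts to two small variance calculations. The sketch below assumes hypothetical function names and made-up input values (roughly what a model with α = 0.6, β = 1.5, τ′ = 0.3, unit-variance X, and unit residual variance for M would imply); none of the numbers come from the slides.

```python
import math

LOGISTIC_RESIDUAL_VAR = math.pi ** 2 / 3  # use 1.0 instead for probit


def sd_ystar_eq1(tau, var_x, resid_var=LOGISTIC_RESIDUAL_VAR):
    """Standard deviation of Y* implied by Equation 1."""
    return math.sqrt(tau ** 2 * var_x + resid_var)


def sd_ystar_eq2(tau_prime, beta, var_x, var_m, cov_xm,
                 resid_var=LOGISTIC_RESIDUAL_VAR):
    """Standard deviation of Y* implied by Equation 2."""
    explained = (tau_prime ** 2 * var_x + beta ** 2 * var_m
                 + 2 * tau_prime * beta * cov_xm)
    return math.sqrt(explained + resid_var)


# Hypothetical unstandardized estimates and sample moments (illustration only).
alpha, tau, tau_prime, beta = 0.6, 0.925, 0.3, 1.5
var_x, var_m, cov_xm = 1.0, 1.36, 0.6

s1 = sd_ystar_eq1(tau, var_x)
s2 = sd_ystar_eq2(tau_prime, beta, var_x, var_m, cov_xm)

tau_std = tau / s1
tau_prime_std = tau_prime / s2
beta_std = beta / s2

raw_gap = alpha * beta - (tau - tau_prime)
std_gap = alpha * beta_std - (tau_std - tau_prime_std)
print(f"gap before: {raw_gap:.3f}, gap after: {std_gap:.4f}")
```

For these inputs the two estimators disagree clearly before rescaling and nearly coincide afterward. Standard errors would be divided by the same `s1` and `s2`.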

12 Standardizing Equation 2 to the metric of Equation 1
The coefficients from Equation 2 are divided by the following quantity:
[Equation not reproduced in the transcript.]
where σ 2 33·X is the residual variance in the regression model for M predicted by X, i.e., Equation 3. The first term is replaced with 1 for probit regression.

13 Plot of true values of αβ and τ - τ′ as a function of β_C, after standardization.

14 Plot of true values of proportion mediated as a function of β_C, after standardization.

15 Simulation Design
• All possible combinations of α, β, and τ′ effect sizes for small (2% variance explained), medium (13%), large (26%), and very large (40%) effects.
• 6 sample sizes: N = 50, 100, 200, 500, 1000, 5000.
• 1000 replications of each of the 4 × 4 × 4 × 6 = 384 conditions, or 384,000 data sets.
• Probit and logistic regression on the same data.
• Standardized and unstandardized coefficients.
• Data were generated using a standard normal deviate for the error term in Equation 2, which is the probit model.
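The size of the design can be reproduced by enumerating the factorial grid. This is only a bookkeeping sketch: the effect sizes are kept as labels because the slide does not list the coefficient values that correspond to them, and the variable names are made up.

```python
from itertools import product

EFFECT_SIZES = ["small", "medium", "large", "very large"]  # for alpha, beta, tau'
SAMPLE_SIZES = [50, 100, 200, 500, 1000, 5000]
REPLICATIONS = 1000

# One condition per (alpha size, beta size, tau' size, N) combination.
conditions = list(product(EFFECT_SIZES, EFFECT_SIZES, EFFECT_SIZES, SAMPLE_SIZES))

n_conditions = len(conditions)
n_datasets = n_conditions * REPLICATIONS
print(n_conditions, n_datasets)
```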

16 Simulation Outcomes
• Estimates of αβ and τ - τ′ before and after standardization for both probit and logistic regression.
• Estimates of the proportion mediated, αβ/(αβ + τ′), 1 - (τ′/τ), and αβ/τ, before and after standardization for both probit and logistic regression.
• Measures of mean and average relative bias.
• Tables and plots.
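The three proportion mediated measures listed above are simple ratios. The sketch below uses hypothetical function names and assumes relative bias is defined as (estimate - true value) / true value, which the slide does not spell out. The example values are chosen so that τ = τ′ + αβ, the case in which all three measures agree.

```python
def proportion_mediated(alpha, beta, tau, tau_prime):
    """Return the three proportion mediated measures as a dict."""
    ab = alpha * beta
    return {
        "ab / (ab + tau')": ab / (ab + tau_prime),
        "1 - tau'/tau": 1.0 - tau_prime / tau,
        "ab / tau": ab / tau,
    }


def relative_bias(estimate, true_value):
    """Assumed definition: (estimate - true value) / true value."""
    return (estimate - true_value) / true_value


# Illustrative values with tau = tau' + alpha*beta, so the measures coincide.
measures = proportion_mediated(alpha=0.5, beta=0.4, tau=0.5, tau_prime=0.3)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

When τ is on a different scale than τ′ and β, as in unstandardized logistic regression, the three measures no longer agree.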

17 The estimated mediated effect, τ - τ′, as a function of β_C for α = .14.

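18 Power: Logistic Regression
[Table: power by sample size for a small effect size, with columns for the Delta Method, Joint Significance, and Asymmetric tests; values not reproduced in the transcript.]
*From MacKinnon, Yoon, & Lockwood (2003, SPR).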
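Two of the tests named in the table can be sketched briefly. This assumes that "Delta Method" refers to the first-order (Sobel) standard error of αβ and that "Joint Significance" means requiring both α and β to be individually significant. The function names, the 1.96 critical value, and the example estimates are illustrative. The asymmetric test, which relies on the distribution of the product, is not sketched here.

```python
import math

Z_CRITICAL = 1.96  # two-sided test at the .05 level (assumed)


def delta_method_z(alpha, se_alpha, beta, se_beta):
    """z statistic for alpha*beta using the first-order delta method SE."""
    se_ab = math.sqrt(alpha ** 2 * se_beta ** 2 + beta ** 2 * se_alpha ** 2)
    return (alpha * beta) / se_ab


def joint_significance(alpha, se_alpha, beta, se_beta, z_crit=Z_CRITICAL):
    """True when both alpha and beta are individually significant."""
    return abs(alpha / se_alpha) > z_crit and abs(beta / se_beta) > z_crit


# Hypothetical estimates and standard errors, for illustration only.
z = delta_method_z(alpha=0.4, se_alpha=0.1, beta=0.5, se_beta=0.1)
both = joint_significance(alpha=0.4, se_alpha=0.1, beta=0.5, se_beta=0.1)
print(f"delta method z = {z:.3f}, joint significance = {both}")
```

Power in a simulation would then be the proportion of replications in which each test rejects.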

19 Summary and Future Directions
1. Unlike the linear OLS model case, the difference in coefficients and product of coefficients estimators of the mediated effect are not equal. The difference in coefficients estimator is distorted, as shown with expected values and in the simulation study. The same problem occurs for the proportion mediated measures.
2. Standardization of coefficients across equations solves the problem and removes distortion. Two approaches to standardization were mentioned, but the results for rescaling coefficients in Equation 2 to be in the same metric as those in Equation 1 were described. The other standardization method works in a similar manner.
3. The simplest approach is the product of coefficients estimator of the mediated effect, which does not require standardization. Researchers who prefer the logic of the difference in coefficients method should standardize coefficients prior to computing the mediated effect.
4. The standardization approaches should apply to other examples of the Generalized Linear Model, such as the Poisson and survival analysis models.

20 Surrogate Endpoint Research I
The length of time for a disease to occur and the low incidence of the disease require very large sample sizes and long-duration studies. An alternative is to find an outcome that can serve as a surrogate for the ultimate outcome. Here the mediator is called a surrogate or intermediate outcome. Surrogate endpoints are more frequent or more proximate to the prevention strategy.

21 Examples of Surrogate endpoints
• Precancerous cells for colon cancer
• Cholesterol level for coronary heart disease
• Bone density for osteoporosis
• Lymphocyte levels for HIV/AIDS
• Partial loss of vision for blindness
• Tumor size for breast cancer

22 Surrogate endpoints Research II “Above all else, we believe that the issue of when and how to use surrogate endpoints is probably the pre-eminent contemporary problem in clinical trials methodology, so it merits much extensive scrutiny” (Begg & Leung, 2000, p. 27). A surrogate endpoint is a “response variable for which a test of the null hypothesis of no relationship to the treatment groups under comparison is also a valid test of the corresponding null hypothesis based on the true endpoint” (Prentice, 1989, p. 432)

23 Micromediational chain
It is often not possible to study all steps in a mediation chain. In a prevention program, for example, it may not be possible to study each of six constructs in a theoretical chain: exposure to a component, comprehension, retention of the component’s message, short-term attitude change, long-term attitude change, and long-term refusal to use drugs. Cook and Campbell (1979) make a distinction between molar mediation, where some steps are studied, and micromediation, where each link is measured. Kenny et al. (1998) make the distinction between proximal and distal mediators. Any mediation model is part of a longer mediation chain. The researcher decides what part of the micromediational chain to examine. Similar decisions must be made about outcomes.

24 Possible Surrogate endpoints in Prevention
• Aggression at age 12 for incarceration at 24.
• Early onset gateway drug use for adult addiction and driving under the influence.
• Harming animals as a child for later assault.
• Social withdrawal at age 8 for adult depression.
• School dropout for adult unemployment.

25 Prevention Mediators versus Surrogate Endpoints
There are many similarities between surrogates and mediators in prevention science, but…
• The theoretical causal connection between surrogate and outcome is often clearer than in prevention.
• In prevention, the relation between mediator (surrogate) and outcome is weaker than in most areas of surrogate endpoint research.
• Surrogates are more likely to completely mediate effects of X on the outcome.