
Logistic Regression APKC – STATS AFAC (2016)

What is Logistic Regression? In statistics, logistic regression (also called logit regression or the logit model) is a regression model in which the dependent variable (DV) is categorical. It is a form of regression that allows a discrete outcome to be predicted from a mix of continuous and discrete predictors. It makes no distributional assumptions about the predictors: they do not have to be normally distributed, linearly related to one another, or have equal variance within each group.
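As a minimal sketch of what this looks like in practice (toy numbers, not from any dataset in these slides), a logistic regression can be fit in Python with statsmodels:

```python
import numpy as np
import statsmodels.api as sm

# Toy data: one continuous predictor and a binary outcome
x = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

X = sm.add_constant(x)      # adds the intercept column
res = sm.Logit(y, X).fit()  # maximum-likelihood fit
print(res.params)           # [intercept, slope] on the log-odds scale
```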

What is Logistic Regression? Logistic regression is often used because the relationship between the DV (a discrete variable) and a predictor is non-linear. Example: the occurrence of a thunderstorm vs. any specific atmospheric parameter.

Questions Are there interactions among predictors? Does adding interactions among predictors (continuous or categorical) improve the model? Continuous predictors should be centered before interaction terms are created, in order to avoid multicollinearity (a sketch follows below). Can the parameters be estimated accurately? How good is the model at classifying cases for which the outcome is known?
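Here is a sketch of centering two hypothetical continuous predictors (the names x1 and x2 are illustrative, not from the slides) before forming their interaction:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(10, 2, 100),
                   "x2": rng.normal(5, 1, 100)})

# Center each predictor at its mean, then build the product term;
# centering lowers the correlation between main effects and the interaction
df["x1_c"] = df["x1"] - df["x1"].mean()
df["x2_c"] = df["x2"] - df["x2"].mean()
df["x1_x2"] = df["x1_c"] * df["x2_c"]
print(df[["x1_c", "x2_c", "x1_x2"]].head())
```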

Assumptions The only “real” limitation on logistic regression is that the outcome must be discrete.

Assumptions Linearity in the logit – the regression equation should have a linear relationship with the logit form of the DV. There is no assumption about the predictors being linearly related to each other.

What is a log and an exponent? A log (base 10) is the power to which 10 must be raised to produce a given number: the log of 1000 is 3, since 10^3 = 1000, and the log of an odds ratio of 1.0 is 0, since 10^0 = 1. Logistic regression works with natural logs: the exponent e (≈ 2.718) raised to a given power is the antilog of that number, so exp(β) is the antilog of β, and the antilog of a log odds of 0 is e^0 = 1. Exponential increases are curvilinear.

Background Logits are continuous, like z scores: p = 0.50 gives logit = 0; p = 0.70 gives logit ≈ 0.85; p = 0.30 gives logit ≈ −0.85.
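These figures are easy to verify numerically; note that the logit is the natural log of the odds:

```python
import numpy as np

print(np.log10(1000))   # 3.0 -> base-10 log of 1000
print(np.exp(0))        # 1.0 -> antilog of a log odds of 0

p = np.array([0.50, 0.70, 0.30])
logits = np.log(p / (1 - p))   # logit = ln(p / (1 - p))
print(logits.round(2))         # [ 0.    0.85 -0.85]
```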

Plain old regression Y = a binary response (DV): 1 = positive response (success), which occurs with probability P; 0 = negative response (failure), which occurs with probability Q = 1 − P. Mean(Y) = P, the observed proportion of successes. Var(Y) = PQ, maximized when P = .50; the variance depends on the mean (P). X_j = any type of predictor: continuous, dichotomous, or polytomous.

Plain old regression Y = A + B1X1 + … + BkXk + e, and it is assumed that the errors are normally distributed, with mean = 0 and constant variance (i.e., homogeneity of variance).

Plain old regression An expected value is a mean, so E(Y|X) = P: the predicted value equals the proportion of observations for which Y = 1 at that value of X. P is the probability of Y = 1 (a success) given X, and Q = 1 − P is the probability of a failure given X.

Plain old regression – you can’t use ordinary regression when you have a discrete outcome, because the homoskedasticity assumption is not met (see the demonstration below).
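A short demonstration of why: the variance of a binary outcome is a function of its mean, so constant variance is impossible whenever predicted probabilities differ:

```python
import numpy as np

p = np.array([0.1, 0.3, 0.5, 0.7, 0.9])  # possible values of P = E(Y)
var = p * (1 - p)                        # Var(Y) = PQ for a binary Y
print(var)  # [0.09 0.21 0.25 0.21 0.09] -> peaks at P = .50 and moves with the mean
```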

The logistic function Ŷ = e^u / (1 + e^u), where Ŷ is the estimated probability that the ith case is in a category and u is the regular linear regression equation: u = A + B1X1 + B2X2 + … + BkXk
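A direct implementation of this function; note that e^u / (1 + e^u) is algebraically the same as 1 / (1 + e^(−u)):

```python
import numpy as np

def logistic(u):
    """Map the linear predictor u = A + B1*X1 + ... + Bk*Xk to a probability."""
    return 1.0 / (1.0 + np.exp(-u))

print(logistic(0.0))                    # 0.5
print(logistic(np.array([-2.0, 2.0])))  # [0.1192 0.8808] (rounded)
```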

The logistic function The change in probability is not constant (linear) for constant changes in X. This means that the probability of a success (Y = 1) given the predictor variable (X) is a non-linear function of X, specifically a logistic function.

Logistic regression It can be binomial, ordinal or multinomial. Binomial (binary) logistic regression deals with situations in which the observed outcome for a dependent variable can have only two possible types (for example, “occurrence” vs. “non-occurrence”, or “win” vs. “loss”). Multinomial logistic regression deals with situations where the outcome can have three or more possible types that are not ordered.
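As a sketch of the multinomial case (entirely simulated data; the three-category outcome is hypothetical), statsmodels provides MNLogit:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)
# A hypothetical unordered 3-category outcome (0, 1, 2) loosely related to x
y = np.digitize(x + rng.normal(scale=2.0, size=200), bins=[-1.0, 1.0])

res = sm.MNLogit(y, sm.add_constant(x)).fit()
print(res.summary())  # one coefficient set per non-reference category
```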

An Example Predictors of a treatment intervention. Participants: 113 adults with a medical problem. Outcome: cured (1) or not cured (0). Predictors: Intervention (intervention or no treatment) and Duration (the number of days the patient had the problem before treatment).
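A minimal sketch of this analysis in Python with statsmodels; the data below are simulated stand-ins (the coefficients used to generate them are invented), since the real 113 cases are not reproduced in the slides:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
df = pd.DataFrame({"intervention": rng.integers(0, 2, 113),  # 1 = treated, 0 = not
                   "duration": rng.integers(1, 10, 113)})    # days before treatment
# Simulate a cure probability that depends on the intervention (invented effect)
df["cured"] = rng.binomial(1, 1 / (1 + np.exp(-(-0.3 + 1.2 * df["intervention"]))))

res = smf.logit("cured ~ intervention + duration", data=df).fit()
print(res.summary())   # coefficients, SEs, Wald tests, CIs
print(-2 * res.llf)    # the -2 log-likelihood (-2LL) discussed on later slides
```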

Click Categorical to identify any categorical covariates (predictors); then click First and Change to set the reference category. With a categorical predictor that has more than two categories, use the highest number to code your control category, then select Last for your indicator contrast. In this data set, 1 = cured and 0 = not cured (a Python equivalent is sketched below).
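In Python, the same choice of reference category can be expressed with patsy's Treatment contrast inside the model formula (hypothetical data; the group labels are illustrative):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"cured": rng.integers(0, 2, 120),
                   "group": rng.choice(["t1", "t2", "control"], 120)})

# Treatment(reference="control") plays the role of SPSS's indicator contrast
# with the control category as the baseline
res = smf.logit('cured ~ C(group, Treatment(reference="control"))', data=df).fit()
print(res.summary())
```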

Enter Interaction Term(s) You can specify main effects and interactions: highlight both predictors, then click >a*b>. If you don’t have previous literature to guide variable entry, choose Stepwise Forward LR, where LR stands for Likelihood Ratio.

Save Settings for Logistic Regression

Option Settings for Logistic Regression The Hosmer–Lemeshow test assesses how well the model fits the data. Look for outliers beyond ±2 SD. Request the 95% CI for the odds ratio (the odds of Y occurring).

Output for Step 0, Constant Only Initially the model always predicts the outcome category with the highest frequency; in this case it predicts the more frequent category, cured. Large values of −2 log-likelihood (−2LL) indicate a poorly fitting model; −2LL gets smaller as the fit improves.

Example of How to Write the Logistic Regression Equation from Coefficients Using the constant only, the model above predicts a 57% probability of Y occurring.
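That 57% can be reproduced by hand. The arithmetic below assumes a 65/48 split of cured to not cured among the 113 patients; the split is inferred from the reported probability, not stated on the slide:

```python
import numpy as np

cured, not_cured = 65, 48          # assumed split: 65 / 113 ~ 0.575
b0 = np.log(cured / not_cured)     # the constant = log odds of being cured
p = 1 / (1 + np.exp(-b0))          # back-transform to a probability
print(round(b0, 3), round(p, 3))   # 0.303 0.575 -> the 57% on the slide
```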

Output: Step 1

Equation for Step 1 See p. 288 for an example of using the equation to compute an odds ratio. We can say that the odds of a treated patient being cured are 3.41 times higher than the odds for an untreated patient, with a 95% CI of 1.561 to 7.480. The important thing about this confidence interval is that it doesn’t cross 1 (both values are greater than 1): values greater than 1 mean that as the predictor increases, so do the odds of (in this case) being cured, while values less than 1 mean the opposite, that as the predictor increases, the odds of being cured decrease.
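The reported OR and CI are consistent with a coefficient of roughly b = 1.229 and SE = 0.40; both values are back-calculated from the interval above rather than taken from the slide, so treat this as a sketch of the arithmetic:

```python
import numpy as np

b, se = 1.229, 0.40                          # assumed coefficient and SE
odds_ratio = np.exp(b)                       # Exp(B)
ci = np.exp([b - 1.96 * se, b + 1.96 * se])  # 95% CI on the OR scale
print(round(odds_ratio, 2), ci.round(3))     # 3.42 [1.561 7.486] ~ slide values
```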

Output: Step 1 Removing Intervention from the model would have a significant effect on its predictive ability; in other words, it would be very bad to remove it.

Classification Plot Further away from .5 is better: the .5 line represents a coin toss, a 50/50 chance. If the model fits the data, the histogram should show all of the cases for which the event has occurred on the right-hand side (C) and all of the cases for which it hasn’t occurred on the left-hand side (N). This model is better at predicting cured cases than non-cured cases, as the non-cured cases lie closer to the .5 line.
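A comparable plot can be sketched in Python, using the same simulated stand-in data as the example sketch earlier (everything here is hypothetical):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
df = pd.DataFrame({"intervention": rng.integers(0, 2, 113),
                   "duration": rng.integers(1, 10, 113)})
df["cured"] = rng.binomial(1, 1 / (1 + np.exp(-(-0.3 + 1.2 * df["intervention"]))))
res = smf.logit("cured ~ intervention + duration", data=df).fit()

prob = res.predict(df)
plt.hist([prob[df["cured"] == 0], prob[df["cured"] == 1]],
         bins=10, stacked=True, label=["N (not cured)", "C (cured)"])
plt.axvline(0.5, linestyle="--")  # the coin-toss line
plt.xlabel("Predicted probability of being cured")
plt.ylabel("Frequency")
plt.legend()
plt.show()
```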

Choose Analyze – Reports – Case Summaries Use the Case Summaries function to create a table of the first 15 cases showing the values of Cured, Intervention, Duration, the predicted probability (PRE_1), and the predicted group membership (PGR_1).
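The same table can be built in pandas; the setup repeats the earlier simulated example so the block runs standalone, and PRE_1 and PGR_1 mirror SPSS's saved-variable names:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
df = pd.DataFrame({"intervention": rng.integers(0, 2, 113),
                   "duration": rng.integers(1, 10, 113)})
df["cured"] = rng.binomial(1, 1 / (1 + np.exp(-(-0.3 + 1.2 * df["intervention"]))))
res = smf.logit("cured ~ intervention + duration", data=df).fit()

prob = res.predict(df)
cases = df.assign(PRE_1=prob.round(5),              # predicted probability
                  PGR_1=(prob >= 0.5).astype(int))  # predicted group membership
print(cases.head(15))
```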

Case Summaries

Summary The overall fit of the final model is shown by the −2 log-likelihood statistic. If the significance of the chi-square statistic is less than .05, then the model is a significant fit to the data. Check the table labelled Variables in the Equation to see which variables significantly predict the outcome. Use the odds ratio, Exp(B), for interpretation: if OR > 1, then as the predictor increases, the odds of the outcome occurring increase; if OR < 1, then as the predictor increases, the odds of the outcome occurring decrease. The confidence interval of the OR should not cross 1. Check the table labelled Variables not in the Equation to see which variables did not significantly predict the outcome.