Logistic Regression Part I - Introduction

Logistic Regression

Regression where the response variable is dichotomous (not continuous). Examples:
– effect of concentration of a drug on whether symptoms go away
– effect of age on whether or not a patient survived treatment
– effect of negative cognitions about SELF, WORLD, or Self-BLAME on whether a participant has PTSD

Simple Linear Regression

Relationship between a continuous response variable and a continuous explanatory variable. Examples:
– effect of concentration of a drug on reaction time
– effect of age of patient on number of years of post-operation survival

Simple Linear Regression

RT (ms) = β₀ + β₁ × concentration (mg)

β₀ is the value of RT when concentration is 0.
β₁ is the change in RT caused by a change in concentration of 1 mg, e.g. a fitted equation of the form RT = b₀ + b₁ × concentration.
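
As a concrete illustration, here is a minimal sketch of fitting this kind of model in Python with statsmodels; the data and coefficients are synthetic inventions for illustration, not values from the slides.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
concentration = rng.uniform(0, 20, size=100)            # mg (made-up data)
rt = 400 + 5 * concentration + rng.normal(0, 10, 100)   # ms (assumed relationship)

X = sm.add_constant(concentration)   # adds the intercept column (beta_0)
model = sm.OLS(rt, X).fit()
print(model.params)                  # [b0, b1]: intercept and change in RT per 1 mg
```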

Logistic Regression

What do we do when we have a response variable which is not continuous, but dichotomous?

[Figure: probability of disease, odds of disease, and log(odds) of disease, each plotted against concentration]

Odds

Odds are simply the ratio of the proportions for the two possible outcomes: if p is the proportion for one outcome, then 1 − p is the proportion for the second outcome, and the odds are p / (1 − p).

Odds (Example)

At concentration level 16 we observe 75 participants out of 100 showing no disease (healthy).
If p is the probability of being healthy, then p = 0.75.
Then 1 − p is the probability of not being healthy, and equals 0.25.
Odds of being healthy over not being healthy at concentration level 16: p / (1 − p) = 0.75/0.25 = 3.
This means a person is 3 times more likely to be healthy than not healthy at concentration level 16.

Logarithms

Logarithms are a way of expressing numbers as powers of a base. Example:
– 10² = 100
– 10 is called the "base"
– the power, 2 in this case, is called the "exponent"
Therefore 10² = 100 means that log₁₀(100) = 2.

Log Odds

The odds of being healthy after 16 mg of drug are 3.
The log odds are log(3) = 1.1.
Let's say that the odds of being healthy after 2 mg of drug are 0.25.
This means it is four times more likely to not be healthy after 2 mg of drug.
The log odds are log(0.25) = −1.39.
(Note these are natural logarithms, base e, which is what logistic regression uses.)
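
The odds and log-odds arithmetic from the last two slides can be checked directly; note that Python's math.log is the natural log (base e), which matches the values above.

```python
import math

healthy, total = 75, 100      # observed at concentration level 16
p = healthy / total           # 0.75
odds = p / (1 - p)            # 3.0: healthy is 3 times as likely as not healthy
print(math.log(odds))         # 1.0986... ~ 1.1
print(math.log(0.25))         # -1.3862... ~ -1.39 (odds of 0.25 at 2 mg)
```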

Logistic Regression

With log-odds we can now model a linear relationship between a dichotomous response and a continuous explanatory variable:

log(p / (1 − p)) = β₀ + β₁X

where, for example, p is the probability of being healthy at different levels of drug concentration, X.
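
A small sketch of the logit transform and its inverse (the logistic function); the coefficient values b0 and b1 below are hypothetical placeholders, not estimates from the course data.

```python
import math

def logit(p):
    """Log-odds of probability p."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Probability implied by log-odds x (the logistic function)."""
    return 1 / (1 + math.exp(-x))

b0, b1 = -1.0, 0.104                       # hypothetical intercept and slope
concentration = 16
print(inv_logit(b0 + b1 * concentration))  # predicted probability of being healthy
```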

Example: Simple Logistic Regression

Look at the effect of drug concentration on the probability of NOT having the disease (i.e. being healthy).
Use SPSS to do the regression (we'll all do this soon) and get the parameter estimates.
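
The slides use SPSS; as a rough equivalent, here is a hedged sketch in Python with statsmodels on simulated data. The "true" coefficients are assumptions chosen to roughly match the odds ratio of 1.11 reported on the next slides.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
conc = rng.uniform(0, 30, 200)                       # drug concentration (mg)
p_true = 1 / (1 + np.exp(-(-1.0 + 0.104 * conc)))    # assumed true model
healthy = rng.binomial(1, p_true)                    # 1 = healthy, 0 = disease

X = sm.add_constant(conc)                            # intercept + concentration
fit = sm.Logit(healthy, X).fit(disp=False)
print(fit.params)                                    # [b0, b1] on the log-odds scale
print(np.exp(fit.params[1]))                         # odds ratio per 1 mg (~1.11 here)
```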

Looks Like

[Figure: the fitted logistic curve — predicted probability of being healthy plotted against drug concentration]

Interpreting parameters (b₀ and b₁) in logistic regression is a little tricky.
An increase of 1 mg of concentration increases the log(odds) of being healthy by b₁ (here log(1.11) ≈ 0.10).
Equivalently, increasing concentration by 1 mg increases the odds of being healthy by a factor of e^b₁ = 1.11.

Slope Parameter

Parameter β₁ in general:
– if positive, then increasing X increases the odds
– if negative, then increasing X decreases the odds
– the larger β₁ is in magnitude, the larger the effect of X on p
As in simple linear regression, we can test whether or not β₁ is significantly different from 0.

Let's break to do simple Logistic Regression

Open XYZ.sav in SPSS. Fit a logistic regression with:
– PTSD (Y/N) as the response variable
– Self-BLAME as the explanatory variable
Is the effect of Self-BLAME significant? Get the parameter estimates and write the equation of the model. What are the odds of having PTSD given a Self-BLAME score of 3? Use the interpretation of the regression coefficient to work out the odds given Self-BLAME of 4 (a worked sketch follows below).
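
For the last two questions, the coefficient interpretation can be applied directly; the estimates b0 and b1 below are hypothetical placeholders — substitute the values SPSS reports for your fit.

```python
import math

b0, b1 = -2.0, 0.8                      # hypothetical estimates from the fit
odds_at_3 = math.exp(b0 + b1 * 3)       # odds of PTSD at Self-BLAME = 3
odds_at_4 = odds_at_3 * math.exp(b1)    # a one-unit increase multiplies odds by e^b1
print(odds_at_3, odds_at_4)
```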

Logistic Regression Part II – Multiple Logistic Regression

Multiple Linear Regression

Simple linear regression extended to more than one explanatory variable. Examples:
– effect of both concentration and age on reaction time
– effect of age, number of previous operations, time in anaesthesia, cholesterol level, etc. on number of years of post-operation survival

Multiple Linear Regression

RT (ms) = β₀ + β₁ × concentration (mg) + β₂ × age + β₃ × gender (0 = male, 1 = female)

β₀ is the value of RT when all predictors are 0.
β₁ is the change in RT caused by a change in concentration of 1 mg.
β₂ is the change in RT caused by a change in age of 1 year.
β₃ is the change in RT caused by going from male to female.

Multiple Logistic Regression

Look at the effect of drug concentration, age and gender on the probability of NOT having the disease:

log(p / (1 − p)) = β₀ + β₁X₁ + β₂X₂ + β₃X₃

where p is the probability of not having the disease, X₁ is the concentration of drug (mg), X₂ is age (years), and X₃ is gender (0 for males, 1 for females).
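
A sketch of the three-predictor model using statsmodels' formula interface; the file name and column names are assumptions for illustration, not the course dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical file with columns: healthy (0/1), concentration, age, gender (0/1)
df = pd.read_csv("drug_study.csv")
fit = smf.logit("healthy ~ concentration + age + gender", data=df).fit()
print(fit.params)           # beta_0 ... beta_3 on the log-odds scale
print(np.exp(fit.params))   # odds ratio for each predictor
```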

Again, use SPSS to fit the logistic model.
Increasing concentration increases the odds of not having the disease (again, being healthy).
Increasing age decreases the odds of being healthy.
"Increasing" gender (going from male to female) increases the odds of being healthy.
In particular, each extra year of age decreases the odds of being healthy by a factor of 0.95, and going from male to female increases the odds by a factor of 1.001.

Was it worth adding the factors?

When we add parameters we make our model more complicated, so we really want the addition to be "worth it". In other words, adding age and gender should improve our explanation of disease. But what constitutes an improvement?

Was it worth adding the factors?

The quality (badness) of model fit is given by −2logL. If we want to see whether it was worth adding parameters, we can compare the quality of fit of the simple and the more complex model. The model fit statistic follows a chi-square (χ²) distribution with degrees of freedom (df) equal to the number of parameters in the model. The difference in fit between two nested models also follows a χ² distribution, with df equal to the difference in the number of parameters between the two models.

Was it worth adding these factors?

The simple logistic regression model has an overall χ² of 45.7.
The multiple logistic regression model, with 2 extra parameters, has a χ² of 51.38.
Test whether the difference, χ² = 51.38 − 45.7 = 5.68, is a significant improvement.
The critical χ² for 2 df is 5.99.
Our χ² is smaller, so NO, it was not worth it.
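
The same comparison can be reproduced numerically; this snippet just plugs the slide's chi-square values into scipy.

```python
from scipy.stats import chi2

chisq_simple, chisq_multiple = 45.7, 51.38   # overall model chi-squares from above
diff = chisq_multiple - chisq_simple         # 5.68
df = 2                                       # two extra parameters (age, gender)
print(chi2.ppf(0.95, df))    # critical value for 2 df: ~5.99
print(chi2.sf(diff, df))     # p ~ .058: not a significant improvement at alpha = .05
```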

BUT…

It doesn't look like gender is having much of an effect. Check the SPSS output and see that the Wald χ² for gender is 0.527, which has p = .47. Perhaps it wasn't worth adding both parameters, but it may be worth adding just age: age has Wald χ² = 4.33, p = .03. When we add only age, the change in χ² is 5.5, which we test against a χ² with 1 df, giving p = .02.

Logistic Regression Model Building

What if we have a whole host of possible explanatory variables? We want to build a model which predicts whether a person will have the disease given a set of explanatory variables. The strategies are the SAME as in multiple linear regression (a forward-selection sketch follows below):
– Forward selection
– Backward elimination
– Stepwise
– All subsets
– Hierarchical
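
As one illustration of these strategies, here is a bare-bones forward-selection loop for logistic regression; the likelihood-ratio entry criterion and overall structure are generic assumptions for the sketch, not the specific procedure SPSS uses.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def forward_select(y, X, alpha=0.05):
    """Greedily add the predictor giving the biggest significant
    improvement in fit, judged by a 1-df likelihood-ratio test.
    y: 0/1 outcome array; X: pandas DataFrame of candidate predictors."""
    chosen, remaining = [], list(X.columns)
    current_ll = sm.Logit(y, np.ones(len(y))).fit(disp=False).llf  # intercept-only
    while remaining:
        best = None
        for col in remaining:
            Xc = sm.add_constant(X[chosen + [col]])
            ll = sm.Logit(y, Xc).fit(disp=False).llf
            lr = 2 * (ll - current_ll)          # change in -2logL
            if chi2.sf(lr, df=1) < alpha and (best is None or ll > best[1]):
                best = (col, ll)
        if best is None:                        # no significant addition left
            break
        chosen.append(best[0])
        current_ll = best[1]
        remaining.remove(best[0])
    return chosen
```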

How to know if a model is good

It is all about having a model which does a good job of correctly classifying participants as having the disease or not. In particular, the model predicts which people have the disease and which people don't. The model can be:
– correct in two ways:
  – correctly categorise a person who has the disease as having the disease
  – correctly say no disease when there is no disease
– incorrect in two ways:
  – incorrectly categorise a person who has the disease as not having the disease
  – incorrectly say disease when there is no disease

Accuracy of model

Proportion of correct classifications:
– number of correctly classified disease participants plus number of correctly classified no-disease participants, divided by the total number of participants

Sensitivity of model

Proportion of 'successes' correctly identified (here the modelled outcome, no disease, is the 'success'):
– number of correctly classified no-disease participants divided by the total number of no-disease participants

Specificity of model

Proportion of 'failures' correctly identified:
– number of correctly classified disease participants divided by the total number of disease participants
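
These three quantities all come from the 2×2 classification table; here is a small helper, assuming 1 codes the modelled 'success' (no disease) and 0 codes disease.

```python
import numpy as np

def classification_summary(actual, predicted):
    """actual, predicted: arrays of 0/1, where 1 = no disease ('success')."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    tp = np.sum((actual == 1) & (predicted == 1))  # correct 'no disease' calls
    tn = np.sum((actual == 0) & (predicted == 0))  # correct 'disease' calls
    fp = np.sum((actual == 0) & (predicted == 1))  # said no disease, had disease
    fn = np.sum((actual == 1) & (predicted == 0))  # said disease, was healthy
    return {
        "accuracy": (tp + tn) / len(actual),
        "sensitivity": tp / (tp + fn),   # 'successes' correctly identified
        "specificity": tn / (tn + fp),   # 'failures' correctly identified
    }
```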

Now… a real example

Startup, Makgekgenene and Webster (2007) looked at whether or not the subscales of the Posttraumatic Cognitions Inventory (PTCI) are good predictors of Posttraumatic Stress Disorder (PTSD). The subscales are:
– Negative Cognitions About SELF
– Negative Cognitions About the WORLD
– Self-BLAME

Descriptive Results

PTSD participants showed higher scores than non-PTSD participants on all three subscale variables.

Multiple Logistic Regression

Response variable:
– whether or not the participant has PTSD
Explanatory variables:
– Negative Cognitions About SELF
– Negative Cognitions About the WORLD
– Self-BLAME

Let's do the Logistic Regression

Open XYZ.sav in SPSS and run the appropriate regression.
– What are the parameter estimates for our three explanatory variables?
– Which of these are significant (at α = .05)?
– What are the odds ratios for those that are significant?
– Anything unusual?

Self-BLAME

Self-BLAME has a negative coefficient (an odds ratio below 1). This means that increasing self-blame decreases the chance of having PTSD. This is surprising, especially since participants with PTSD showed higher Self-BLAME scores. What's going on?

Self-BLAME and SELF scales

Startup et al. (2007) explain this by noting that Self-BLAME is made up of both behavioural and characterological questions, while SELF may also tap into the characterological aspects of self-blame. Behavioural self-blame can be considered adaptive: it may help avoid PTSD. Characterological self-blame, however, may be detrimental and lead to PTSD.

Suppressor Effect

The relationship between SELF and PTSD is strong, and it accounts for the surprising negative coefficient: SELF absorbs the effect of characterological self-blame. The variation in PTSD that is left for Self-BLAME to account for is the adaptive (behavioural) aspect of the relationship between Self-BLAME scores and PTSD. The detrimental aspect of Self-BLAME has been suppressed (already accounted for by SELF), so its adaptive aspect can now come out.

Homework (haha)

Evaluate the model by looking at:
– accuracy of the model's predictions
– sensitivity of the model's predictions
– specificity of the model's predictions