Logistic Regression Part I - Introduction
Logistic Regression
Regression where the response variable is dichotomous (not continuous). Examples:
– effect of concentration of a drug on whether symptoms go away
– effect of age on whether or not a patient survived treatment
– effect of negative cognitions about SELF, WORLD, or Self-BLAME on whether a participant has PTSD
Simple Linear Regression
Relationship between a continuous response variable and a continuous explanatory variable. Examples:
– effect of concentration of a drug on reaction time
– effect of age of patient on number of years of post-operation survival
Simple Linear Regression
RT (ms) = β0 + β1 × concentration (mg)
β0 is the value of RT when concentration is 0.
β1 is the change in RT caused by a change in concentration of 1 mg.
E.g. RT = 400 + 50 × concentration
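As a quick check of the arithmetic, a minimal sketch of what the example line predicts (plain Python, using the example coefficients above):

    # Example line from the slide: RT = 400 + 50 * concentration
    def predict_rt(concentration_mg):
        return 400 + 50 * concentration_mg

    print(predict_rt(0))   # 400 ms: the intercept, beta_0
    print(predict_rt(1))   # 450 ms: one extra mg adds beta_1 = 50 ms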
Logistic Regression
What do we do when we have a response variable which is not continuous but dichotomous?
[Figure: probability of disease, odds of disease, and log(odds) of disease plotted against concentration]
Odds
Odds are simply the ratio of the proportions for the two possible outcomes. If p is the proportion for one outcome, then 1 − p is the proportion for the second outcome, and the odds are p / (1 − p).
Odds (Example)
At concentration level 16 we observe 75 participants out of 100 showing no disease (healthy).
If p is the probability of being healthy, then p = 0.75, and 1 − p is the probability of not being healthy, which equals 0.25.
Odds of being healthy over not being healthy at concentration level 16: p / (1 − p) = 0.75 / 0.25 = 3.
This means that a person is 3 times more likely to be healthy than not healthy at concentration level 16.
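The same calculation as a quick check in Python (the counts are the ones on the slide):

    p = 75 / 100          # proportion healthy at concentration 16
    odds = p / (1 - p)    # 0.75 / 0.25
    print(odds)           # 3.0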
Logarithms
Logarithms are a way of expressing numbers as powers of a base. Example:
– 10^2 = 100
– 10 is called the “base”
– the power, 2 in this case, is called the “exponent”
Therefore 10^2 = 100 means that log10(100) = 2.
Log Odds
The odds of being healthy after 16 mg of drug are 3, so the log odds are log(3) = 1.10 (natural log).
Let's say the odds of being healthy after 2 mg of drug are 0.25. This means it is four times more likely to not be healthy after 2 mg of drug. The log odds are log(0.25) = −1.39.
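A small check of these two values using Python's natural logarithm (the slides' log-odds values match the natural log):

    import math

    print(math.log(3))       # ~1.10: log odds of being healthy at 16 mg
    print(math.log(0.25))    # ~-1.39: log odds of being healthy at 2 mg
    print(math.exp(-1.39))   # ~0.25: exponentiating recovers the odds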
Logistic Regression
With log-odds we can now look at a linear relationship between a dichotomous response and a continuous explanatory variable:
log(p / (1 − p)) = β0 + β1X
where, for example, p is the probability of being healthy at drug concentration X.
Example: Simple Logistic Regression
Look at the effect of drug concentration on the probability of NOT having the disease (i.e. being healthy).
Use SPSS to do the regression (we'll all do this soon) and obtain the fitted regression equation.
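The slides do this fit in SPSS. As an illustration only, here is a minimal sketch of the same kind of fit in Python with statsmodels; the data below are made up and the estimates will not match the slides' value of 0.106:

    import numpy as np
    import statsmodels.api as sm

    # Made-up data: drug concentration (mg) and outcome (1 = healthy, 0 = diseased)
    concentration = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18, 20], dtype=float)
    healthy = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1])

    X = sm.add_constant(concentration)   # adds the intercept column for b0
    model = sm.Logit(healthy, X).fit()   # fits log(p / (1 - p)) = b0 + b1 * concentration

    print(model.params)                  # b0 and b1 on the log-odds scale
    print(np.exp(model.params))          # exponentiated: odds ratios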
Looks Like
[Slide figure not included in this transcript]
Interpreting the parameters (b0 and b1) in logistic regression is a little tricky.
An increase of 1 mg in concentration increases the log(odds) of being healthy by 0.106.
Equivalently, an increase of 1 mg in concentration multiplies the odds of being healthy by exp(0.106) ≈ 1.11, i.e. increasing concentration by 1 mg increases the odds of being healthy by a factor of 1.11.
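The conversion from the log-odds coefficient to an odds ratio is just exponentiation:

    import math

    b1 = 0.106            # slide's coefficient on the log-odds scale
    print(math.exp(b1))   # ~1.11: odds ratio per 1 mg increase in concentration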
Slope Parameter
The parameter β1 in general:
– if positive, then increasing X increases the odds (and hence p)
– if negative, then increasing X decreases the odds (and hence p)
– the larger it is in magnitude, the larger the effect of X on p
As in simple linear regression, we can test whether or not β1 is significantly different from 0.
Let's break to do simple Logistic Regression
Open XYZ.sav in SPSS and fit a logistic regression with:
– PTSD (Y/N) as the response variable
– Self-BLAME as the explanatory variable
Is the effect of Self-BLAME significant? Get the parameter estimates and write out the equation of the model.
What are the odds of having PTSD given a Self-BLAME score of 3? Use the interpretation of the regression coefficient to work out the odds given a Self-BLAME score of 4 (a small sketch of this calculation follows).
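A minimal sketch of those last two steps, assuming hypothetical parameter estimates b0 and b1 (substitute the values SPSS actually reports):

    import math

    b0, b1 = -2.0, 0.8                 # hypothetical estimates, not from the real data

    log_odds_at_3 = b0 + b1 * 3        # log odds of PTSD at Self-BLAME = 3
    odds_at_3 = math.exp(log_odds_at_3)

    # Raising Self-BLAME by 1 multiplies the odds by exp(b1), the odds ratio
    odds_at_4 = odds_at_3 * math.exp(b1)

    print(odds_at_3, odds_at_4)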
Logistic Regression Part II – Multiple Logistic Regression
Multiple Linear Regression
Simple linear regression extended to more than one explanatory variable. Examples:
– effect of both concentration and age on reaction time
– effect of age, number of previous operations, time in anaesthesia, cholesterol level, etc. on number of years of post-operation survival
Multiple Linear Regression
RT (ms) = β0 + β1 × concentration (mg) + β2 × age + β3 × gender (0 = male, 1 = female)
β0 is the value of RT when all explanatory variables are 0.
β1 is the change in RT caused by a change in concentration of 1 mg, holding the other variables constant.
β2 is the change in RT caused by a change in age of 1 year.
β3 is the change in RT caused by going from male to female.
Multiple Logistic Regression
Look at the effect of drug concentration, age and gender on the probability of NOT having the disease:
log(p / (1 − p)) = β0 + β1X1 + β2X2 + β3X3
where p is the probability of not having the disease, X1 is the concentration of drug (mg), X2 is age (years), and X3 is gender (0 for males, 1 for females).
Again, use SPSS to fit the logistic model.
Increasing concentration increases the odds of not having the disease (again, of being healthy).
Increasing age decreases the odds of being healthy.
“Increasing” gender (going from male to female) increases the odds of being healthy.
In particular, each extra year of age multiplies the odds of being healthy by a factor of 0.95, and going from male to female multiplies them by a factor of 1.001.
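The slides fit this model in SPSS. Purely as an illustration, a minimal sketch of the same kind of model in Python with statsmodels, on simulated data (every variable name and value below is made up, not the slides' dataset):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 200

    # Simulated predictors (purely illustrative)
    concentration = rng.uniform(0, 20, n)     # drug concentration in mg
    age = rng.uniform(20, 80, n)              # age in years
    gender = rng.integers(0, 2, n)            # 0 = male, 1 = female

    # Simulate the outcome from an assumed logistic model
    log_odds = 1.5 + 0.10 * concentration - 0.05 * age + 0.2 * gender
    p = 1 / (1 + np.exp(-log_odds))
    healthy = rng.binomial(1, p)

    X = sm.add_constant(np.column_stack([concentration, age, gender]))
    fit = sm.Logit(healthy, X).fit()

    # Exponentiated coefficients are odds ratios: a value near 0.95 for age
    # would mean each extra year multiplies the odds of being healthy by ~0.95
    print(np.exp(fit.params))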
Was it worth adding the factors?
When we add parameters we make our model more complicated, so we really want the addition to be “worth it”. In other words, adding age and gender should improve our explanation of disease. But what constitutes an improvement?
Was it worth adding the factors?
The quality (badness) of model fit is given by −2logL.
If we want to see whether it was worth adding parameters, we can compare the quality of fit of the simple model with that of the more complex model.
The quality of model fit follows a chi-square (χ2) distribution with degrees of freedom (df) equal to the number of parameters in the model.
The difference between the two qualities of fit also follows a χ2 distribution, with df equal to the difference in the number of parameters between the two models.
Was it worth adding these factors?
The simple logistic regression model has an overall χ2 of 45.7.
The multiple logistic regression model, with 2 extra parameters, has a χ2 of 40.02.
Test whether the change, χ2 = 45.7 − 40.02 = 5.68, is a significant improvement.
The critical χ2 for 2 df is 5.99. Our χ2 is smaller, so NO, it was not worth it.
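A quick check of those numbers (a sketch using scipy, not part of the original slides; it also reproduces the Age-only test from the next slide):

    from scipy.stats import chi2

    change = 45.7 - 40.02            # improvement from adding age and gender (2 parameters)
    print(chi2.ppf(0.95, df=2))      # critical value for 2 df, ~5.99
    print(chi2.sf(change, df=2))     # p-value for 5.68 on 2 df, ~0.058 (not significant)

    # Adding Age alone: change of 5.5 on 1 df
    print(chi2.sf(5.5, df=1))        # ~0.019, significant at alpha = .05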
BUT…
It doesn't look like gender is having much of an effect. Checking the SPSS output, the Wald χ2 for gender is 0.527, which has p = .47.
Perhaps it wasn't worth adding both parameters, but it is worth adding just Age: Age has a Wald χ2 = 4.33, p = .03.
When we add only Age, the change in χ2 = 5.5, which we test against a χ2 distribution with 1 df, giving p = .02.
Logistic Regression Model Building
What if we have a whole host of possible explanatory variables? We want to build a model which predicts whether a person will have a disease given a set of explanatory variables.
The strategies are the SAME as for multiple linear regression:
– forward selection
– backward elimination
– stepwise
– all subsets
– hierarchical
How to know if a model is good
It is all about having a model which does a good job of appropriately classifying participants as having the disease or not. In particular, the model predicts how many people have the disease and how many do not.
The model can be correct in two ways:
– correctly categorise a person who has the disease as having the disease
– correctly say no disease when there is no disease
and incorrect in two ways:
– incorrectly categorise a person who has the disease as not having the disease
– incorrectly say disease when there is no disease
Accuracy of model
Proportion of correct classifications: the number of correctly classified disease participants plus the number of correctly classified no-disease participants, divided by the total number of participants.
Sensitivity of model
Proportion of ‘successes’ correctly identified: the number of correctly classified no-disease participants divided by the total number of no-disease participants.
Specificity of model
Proportion of ‘failures’ correctly identified: the number of correctly classified disease participants divided by the total number of disease participants.
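A small sketch of the three measures computed from a 2 x 2 classification table, following the slides' convention that ‘no disease’ is the success category (the counts are hypothetical):

    # Hypothetical 2x2 classification table
    #                       predicted no disease   predicted disease
    # actually no disease            40                    10
    # actually disease               15                    35

    correct_no_disease = 40      # no-disease cases classified as no disease
    missed_no_disease = 10       # no-disease cases classified as disease
    missed_disease = 15          # disease cases classified as no disease
    correct_disease = 35         # disease cases classified as disease

    total = correct_no_disease + missed_no_disease + missed_disease + correct_disease

    accuracy = (correct_no_disease + correct_disease) / total
    sensitivity = correct_no_disease / (correct_no_disease + missed_no_disease)
    specificity = correct_disease / (correct_disease + missed_disease)

    print(accuracy, sensitivity, specificity)   # 0.75, 0.8, 0.7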
Now… a real example
Startup, Makgekgenene and Webster (2007) looked at whether or not the subscales of the Posttraumatic Cognitions Inventory (PTCI) are good predictors of Posttraumatic Stress Disorder (PTSD). The subscales are:
– Negative Cognitions About SELF
– Negative Cognitions About the WORLD
– Self-BLAME
Descriptive Results
PTSD participants showed higher scores than non-PTSD participants on all three subscales.
Multiple Logistic Regression
Response variable:
– whether or not the participant has PTSD
Explanatory variables:
– Negative Cognitions About SELF
– Negative Cognitions About the WORLD
– Self-BLAME
Let's do the Logistic Regression
Open XYZ.sav in SPSS and run the appropriate regression.
What are the parameter estimates for our three explanatory variables? Which of these are significant (at α = .05)? What are the odds ratios for those that are significant? Anything unusual?
Self-BLAME
Self-BLAME has a negative coefficient (an odds ratio below 1). This means that increasing Self-BLAME decreases the odds of having PTSD.
This is surprising, especially since participants with PTSD showed higher Self-BLAME scores. What's going on?
Self-BLAME and SELF scales
Startup et al. (2007) explain this by noting that the Self-BLAME subscale is made up of both behavioural and characterological questions, while SELF may also tap into the characterological aspects of self-blame.
Behavioural self-blame can be considered adaptive: it may help avoid PTSD. Characterological self-blame, however, may be detrimental and lead to PTSD.
Suppressor Effect
The relationship between SELF and PTSD is strong, and it already accounts for the effect of characterological self-blame. The variation in PTSD that is left for Self-BLAME to account for is therefore the positive (adaptive, behavioural) aspect of the relationship between Self-BLAME scores and PTSD. The negative aspect of Self-BLAME has been suppressed (it is already accounted for by SELF), so the positive aspect of Self-BLAME can now come out.
Homework (haha)
Evaluate the model by looking at:
– the accuracy of the model's predictions
– the sensitivity of the model's predictions
– the specificity of the model's predictions