What you've always wanted to know about logistic regression analysis, but were afraid to ask... February 1, 2010. Gerrit Rooks, Sociology of Innovation.


1 What you've always wanted to know about logistic regression analysis, but were afraid to ask... February. Gerrit Rooks, Sociology of Innovation, Innovation Sciences & Industrial Engineering

2 This lecture –Why logistic regression analysis? –The logistic regression model –Estimation –Goodness of fit –An example

3 What's the difference between 'normal' regression and logistic regression? Regression analysis: –Relate one or more independent (predictor) variables to a dependent (outcome) variable

4 What's the difference between 'normal' regression and logistic regression? Often you will be confronted with outcome variables that are dichotomous: –success vs failure –employed vs unemployed –promoted or not –sick or healthy –pass or fail an exam

5 Example: relationship between hours studied for an exam and success (table columns: Hours; # failed exam; # passed exam; Total # students; Prob. pass exam)

6 Linear regression analysis Why is this wrong?
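One concrete reason a straight regression line is wrong for a 0/1 outcome: the line is unbounded, so its "probabilities" can fall below 0 or rise above 1. A minimal sketch, assuming hypothetical hours/outcome data (not the slide's table) and a hand-written least-squares helper `ols_fit`:

```python
# Fit an ordinary least-squares line to a 0/1 outcome and evaluate it.
# The hours/outcome data are hypothetical, chosen only for illustration.

def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

hours = [5, 10, 15, 20, 25, 30, 35, 40]
passed = [0, 0, 0, 1, 0, 1, 1, 1]  # 1 = passed the exam

a, b = ols_fit(hours, passed)
for h in (0, 20, 50):
    # Values outside [0, 1] cannot be probabilities.
    print(h, round(a + b * h, 3))
```

With these made-up data the line gives -0.25 at 0 hours and about 1.42 at 50 hours, neither of which is a valid probability.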

7 Logistic Regression The better alternative


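9 The logistic regression equation predicts probabilities: $P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n)}}$, where $P(Y)$ is the predicted probability (always between 0 and 1) and the exponent $b_0 + b_1 X_1 + \dots + b_n X_n$ is similar to the equation in regression analysis.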
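The equation can be evaluated directly. A small sketch with a single predictor; `predicted_probability` is a hypothetical helper name, and the coefficients are the pair b0 = -6.44, b1 = 0.39 used in the study-hours example later in the lecture:

```python
import math

def predicted_probability(b0, b1, x):
    """P(Y) = 1 / (1 + e^-(b0 + b1*x)) for a model with one predictor x."""
    z = b0 + b1 * x  # the linear part, as in ordinary regression
    return 1.0 / (1.0 + math.exp(-z))

# Whatever the value of z, the result stays strictly between 0 and 1.
for hours in (0, 10, 20, 30):
    print(hours, round(predicted_probability(-6.44, 0.39, hours), 3))
```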

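10 The logistic function. Sometimes authors rearrange the model as $\frac{P(Y)}{1 - P(Y)} = e^{b_0 + b_1 X_1 + \dots + b_n X_n}$ or also as $\ln\left(\frac{P(Y)}{1 - P(Y)}\right) = b_0 + b_1 X_1 + \dots + b_n X_n$.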
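A quick numerical check that the rearranged forms agree: the odds $P(Y)/(1 - P(Y))$ equal $e$ raised to the linear predictor, and the log-odds (logit) recover the linear predictor itself. The helper names and the values of b0, b1 and x are illustrative assumptions:

```python
import math

def logistic(z):
    """Logistic function: maps the linear predictor z to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds P / (1 - P)."""
    return p / (1.0 - p)

def logit(p):
    """Log-odds ln(P / (1 - P)); the inverse of the logistic function."""
    return math.log(odds(p))

b0, b1, x = -6.44, 0.39, 20.0
z = b0 + b1 * x
p = logistic(z)
# Both rearranged forms give back the linear predictor b0 + b1*x.
assert math.isclose(odds(p), math.exp(z))
assert math.isclose(logit(p), z)
```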

11 How do we estimate coefficients? Maximum-likelihood estimation. Parameters are estimated by 'fitting' models, based on the available predictors, to the observed data. The chosen model is the one that fits the data best, i.e. is closest to the data. Fit is determined by the so-called log-likelihood statistic.

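12 Maximum likelihood estimation. The log-likelihood statistic is $LL = \sum_{i=1}^{N} \left[ Y_i \ln P(Y_i) + (1 - Y_i) \ln\left(1 - P(Y_i)\right) \right]$. LL is never positive, and large negative values of LL (far from 0) indicate poor fit of the model. HOWEVER, THIS STATISTIC CANNOT BE USED TO EVALUATE THE FIT OF A SINGLE MODEL.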

13 An example to illustrate maximum likelihood and the log-likelihood statistic. Suppose we know the hours spent studying and the outcome of an exam (table columns: Quantity of study hours; Outcome).

14 In ML, different values for the parameters are 'tried'. Let's look at two possibilities: 1) b0 = 0 and b1 = 0.05; 2) b0 = -6.44 and b1 = 0.39 (table columns: Quantity of study hours; Outcome; Predicted probability (b0 = 0; b1 = 0.05); Predicted probability (b0 = -6.44; b1 = 0.39)).

15 We are now able to calculate the log-likelihood statistic (table columns: Quantity of study hours; Outcome; Predicted probability (b0 = 0; b1 = 0.05); LL (b0 = 0; b1 = 0.05)).
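The log-likelihood calculation for the two candidate parameter sets can be sketched as follows. The hours/outcome values are hypothetical stand-ins, since the slide's table values are not available, and `log_likelihood` is an illustrative helper, not an SPSS routine:

```python
import math

def log_likelihood(b0, b1, hours, outcomes):
    """LL = sum over cases of Y*ln(P) + (1 - Y)*ln(1 - P)."""
    ll = 0.0
    for x, y in zip(hours, outcomes):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # predicted probability
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

# Hypothetical study hours and exam outcomes (1 = pass).
hours = [4, 8, 12, 14, 16, 18, 20, 24, 28, 32]
outcomes = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

ll_a = log_likelihood(0.0, 0.05, hours, outcomes)
ll_b = log_likelihood(-6.44, 0.39, hours, outcomes)
print(round(ll_a, 2), round(ll_b, 2))
```

With these made-up data the second parameter set (b0 = -6.44, b1 = 0.39) gives the LL closer to 0, so it is the better-fitting of the two.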

16 Two models and their log-likelihood statistic (table columns: Outcome; Pr (b0 = 0; b1 = 0.05); LL (b0 = 0; b1 = 0.05); Pr (b0 = -6.44; b1 = 0.39); LL (b0 = -6.44; b1 = 0.39)). Based on a clever algorithm, the model with the best fit (LL closest to 0) is chosen.
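The slide does not name the algorithm. One common choice for logistic regression is Newton-Raphson iteration, which repeatedly solves a local quadratic approximation of the log-likelihood until the estimates stop changing. A minimal sketch for one predictor, assuming hypothetical hours/outcome data; it is not necessarily what SPSS does internally:

```python
import math

def fit_logistic(xs, ys, max_iter=100, tol=1e-10):
    """ML estimates (b0, b1) for P(Y) = 1 / (1 + e^-(b0 + b1*x)) via Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Score (gradient of LL) and the negated Hessian, accumulated over cases.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: solve [[h00, h01], [h01, h11]] @ (d0, d1) = (g0, g1).
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1

# Hypothetical study hours and exam outcomes (1 = pass); the groups overlap.
hours = [4, 8, 12, 14, 16, 18, 20, 24, 28, 32]
outcomes = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
b0_hat, b1_hat = fit_logistic(hours, outcomes)
print(round(b0_hat, 2), round(b1_hat, 2))
```

If the outcomes were perfectly separated by hours (all failures below some value, all passes above it), the maximum-likelihood estimates would not exist and this loop would not converge; the example data deliberately overlap.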

17 After estimation: how do I determine significance? Obviously SPSS does all the work for you, so the task is to interpret the SPSS output. Two major issues: 1. Overall model fit –between-model comparisons –pseudo R-square –predictive accuracy / classification table 2. Coefficients –Wald test –likelihood ratio test –odds ratios

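18 Model fit: between-model comparison. The log-likelihood ratio test statistic can be used to test the fit of a model: $\chi^2 = -2LL_{\text{reduced}} - (-2LL_{\text{full}}) = 2\left(LL_{\text{full}} - LL_{\text{reduced}}\right)$, where $LL_{\text{reduced}}$ is the model fit of the reduced model and $LL_{\text{full}}$ the model fit of the full model. The test statistic has a chi-square distribution.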


20 Between-model comparison: estimate a null model (the baseline model); estimate an improved model, which contains more variables; assess the difference in -2LL between the models, $\chi^2 = -2LL_{\text{reduced}} - (-2LL_{\text{full}})$. This difference follows a chi-square distribution with degrees of freedom = # estimated parameters in the proposed model - # estimated parameters in the null model.
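A sketch of the between-model comparison, assuming you already have -2LL for both models (the values below are hypothetical). The chi-square tail probability uses the closed-form series for integer degrees of freedom, so only the standard library is needed; `chi2_sf` and `likelihood_ratio_test` are illustrative names:

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite series e^(-x/2) * sum_{i < df/2} (x/2)^i / i!
        return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: erfc term plus a finite series in half-integer powers of x/2.
    tail = math.erfc(math.sqrt(half))
    tail += math.exp(-half) * sum(
        half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1)
    )
    return tail

def likelihood_ratio_test(minus2ll_reduced, minus2ll_full, k_reduced, k_full):
    """Chi-square = -2LL(reduced) - (-2LL(full)); df = difference in # parameters."""
    chi_square = minus2ll_reduced - minus2ll_full
    df = k_full - k_reduced
    return chi_square, df, chi2_sf(chi_square, df)

# Hypothetical: null model (constant only) vs. a model adding two predictors.
chi_square, df, p_value = likelihood_ratio_test(100.0, 90.0, k_reduced=1, k_full=3)
print(chi_square, df, round(p_value, 4))
```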

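21 Overall model fit: R and R². In multiple regression, R² is a measure of the variance explained by the model: $R^2 = \frac{SS_{\text{regression}}}{SS_{\text{total}}}$, the sum of squares due to regression divided by the total sum of squares.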

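22 Overall model fit: pseudo R². Just like R² in multiple regression, the logit pseudo R² lies between 0.0 and 1.0 –Cox and Snell: $R^2_{CS} = 1 - e^{-\frac{2}{n}\left(LL_{\text{new}} - LL_{\text{baseline}}\right)}$, which cannot theoretically reach 1 –Nagelkerke: $R^2_N = \frac{R^2_{CS}}{1 - e^{\frac{2}{n} LL_{\text{baseline}}}}$, adjusted so that it can reach 1. Here $LL_{\text{baseline}}$ is the log-likelihood of the model before any predictors were entered, $LL_{\text{new}}$ is the log-likelihood of the model that you want to test, and $n$ is the number of cases. NOTE: R² in logistic regression tends to be (even) smaller than in multiple regression.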
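Both pseudo R² measures can be computed from the log-likelihoods of the baseline and tested models. A minimal sketch with hypothetical values (10 cases, 5 of them scored, so the baseline LL is 10 ln 0.5); `pseudo_r2` is an illustrative name:

```python
import math

def pseudo_r2(ll_baseline, ll_new, n):
    """Cox & Snell and Nagelkerke R-square from log-likelihoods (LL, not -2LL)."""
    r2_cs = 1.0 - math.exp(-(2.0 / n) * (ll_new - ll_baseline))
    # Largest value Cox & Snell can reach for this baseline; Nagelkerke rescales by it.
    r2_cs_max = 1.0 - math.exp((2.0 / n) * ll_baseline)
    return r2_cs, r2_cs / r2_cs_max

# Hypothetical: 10 cases, 5 scored, so LL_baseline = 10 * ln(0.5); LL_new = -3.0.
r2_cs, r2_n = pseudo_r2(ll_baseline=10 * math.log(0.5), ll_new=-3.0, n=10)
print(round(r2_cs, 3), round(r2_n, 3))
```

Here Cox & Snell's value is about 0.54 and Nagelkerke's about 0.73; because Cox & Snell's maximum is below 1, Nagelkerke's rescaled value is always the larger of the two.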

23 What is a small or large R and R²? Strength of correlation (R): small, 0.10 to 0.29; medium, 0.30 to 0.49; large, 0.50 to 1.00.

24 Overall model fit: classification table. How well does the model predict outcomes? We assume that if our model predicts that a player will score with a probability of .51 (above .5), the prediction is a score (a probability lower than .50 is predicted as a miss). (SPSS output)
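The classification rule can be sketched in a few lines. The predicted probabilities and outcomes below are hypothetical, `classification_table` is an illustrative name, and treating a probability of exactly .50 as a miss is an assumption (the slide only covers values above and below .5):

```python
def classification_table(probabilities, observed, cut=0.5):
    """Cross-tabulate observed vs predicted outcome; return (table, % correct)."""
    table = {(obs, pred): 0 for obs in (0, 1) for pred in (0, 1)}
    for p, y in zip(probabilities, observed):
        # Assumption: only probabilities strictly above the cut count as "scored".
        predicted = 1 if p > cut else 0
        table[(y, predicted)] += 1
    correct = table[(0, 0)] + table[(1, 1)]
    return table, 100.0 * correct / len(observed)

# Hypothetical predicted probabilities and observed outcomes (1 = scored).
probs = [0.90, 0.80, 0.51, 0.30, 0.20, 0.60, 0.45, 0.70]
scored = [1, 1, 0, 0, 0, 1, 1, 1]
table, pct = classification_table(probs, scored)
for (obs, pred), count in sorted(table.items()):
    print(f"observed={obs} predicted={pred}: {count}")
print(f"overall percentage correct: {pct:.1f}%")
```

With these values 6 of 8 cases are classified correctly (75%); one of the two errors is the .51 case that was actually a miss.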

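25 Testing significance of coefficients. The Wald statistic: not really good. In linear regression analysis the t-statistic, the estimate divided by the standard error of the estimate ($t = b / SE_b$, compared with a t-distribution), is used to test significance. In logistic regression something similar exists, the Wald statistic $W = b / SE_b$ (SPSS reports its square, $(b / SE_b)^2$, which follows a chi-square distribution with 1 degree of freedom). However, when b is large, the standard error tends to become inflated, so the Wald statistic is underestimated and Type II errors are more likely.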
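A sketch of the Wald test for one coefficient, assuming a hypothetical estimate and standard error (`wald_test` is an illustrative name). It squares b/SE and uses the chi-square(1) upper tail, which equals the two-sided normal p-value for b/SE:

```python
import math

def wald_test(b, se):
    """Return (b / SE, Wald = (b / SE)**2, p-value from chi-square with 1 df)."""
    z = b / se
    wald = z ** 2
    # Upper tail of chi-square(1) at W equals erfc(sqrt(W / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return z, wald, p_value

z, wald, p = wald_test(b=1.0, se=0.5)  # hypothetical estimate and standard error
print(z, wald, round(p, 4))
```

Doubling the standard error (e.g. `wald_test(1.0, 1.0)`) cuts the Wald statistic to a quarter, which is why an inflated standard error makes a real effect easier to miss.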

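26 Likelihood ratio test: an alternative way to test the significance of a coefficient. To avoid Type II errors for some variables, you had best use the likelihood ratio test, which compares the model without the variable with the model with the variable: $\chi^2 = -2LL_{\text{without}} - (-2LL_{\text{with}})$, with 1 degree of freedom for a single coefficient.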
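A self-contained sketch of the likelihood ratio test for a single coefficient. It fits the model with the variable by Newton-Raphson (an assumed choice of optimizer), uses the closed-form constant-only model as the model without the variable, and compares the two -2LL values against a chi-square distribution with 1 degree of freedom. The data and helper names are hypothetical:

```python
import math

def log_likelihood(b0, b1, xs, ys):
    """LL = sum of Y*ln(P) + (1 - Y)*ln(1 - P) for a one-predictor logistic model."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

def fit_with_variable(xs, ys, iterations=50):
    """Newton-Raphson estimates of (b0, b1) for the model that includes x."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def lr_test_for_x(xs, ys):
    """Likelihood ratio test of b1 = 0: model without x versus model with x."""
    n, k = len(ys), sum(ys)
    p_bar = k / n
    # Model without x (constant only): its ML estimate of P(Y) is the observed proportion.
    ll_without = k * math.log(p_bar) + (n - k) * math.log(1 - p_bar)
    b0_hat, b1_hat = fit_with_variable(xs, ys)
    ll_with = log_likelihood(b0_hat, b1_hat, xs, ys)
    # Guard against a tiny negative value caused by rounding.
    chi_square = max((-2 * ll_without) - (-2 * ll_with), 0.0)
    p_value = math.erfc(math.sqrt(chi_square / 2.0))  # chi-square upper tail, 1 df
    return chi_square, p_value

# Hypothetical study hours and exam outcomes (1 = pass).
hours = [4, 8, 12, 14, 16, 18, 20, 24, 28, 32]
outcomes = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
chi_square, p_value = lr_test_for_x(hours, outcomes)
print(round(chi_square, 2), round(p_value, 4))
```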

27 Before we go to the example, a recap: Logistic regression –dichotomous outcome –logistic function –log-likelihood / maximum likelihood. Model fit –likelihood ratio test (compare LL of models) –pseudo R-square –classification table –Wald test

28 Illustration with SPSS. Penalty kicks data, variables: –Scored: outcome variable, 0 = penalty missed, and 1 = penalty scored –Pswq: degree to which a player worries –Previous: percentage of penalties scored by a particular player in their career

29 SPSS output: Logistic Regression. This table tells you something about the number of observations and missing values.

30 Block 0: Beginning Block. This table is based on the empty model, i.e. only the constant is in the model. These variables will be entered in the model later on.

31 Block 1: Method = Enter. This block is useful to check the significance of individual coefficients (see Field). New model: this is the test statistic after dividing by -2. Note: Nagelkerke is larger than Cox & Snell.

32 Block 1: Method = Enter (continued). Predictive accuracy has improved (it was 53%). Annotations: estimates; standard error of the estimates; significance based on the Wald statistic; change in odds (Exp(B)).
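The "change in odds" column (Exp(B) in SPSS) is $e$ raised to the coefficient. A small check, with hypothetical estimates rather than the penalty-kick output, that raising the predictor by one unit multiplies the model's odds by exactly that factor, whatever the starting value:

```python
import math

def odds(b0, b1, x):
    """Odds P(Y) / (1 - P(Y)) implied by a one-predictor logistic model at x."""
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    return p / (1.0 - p)

b0, b1 = -1.2, 0.23  # hypothetical estimates
exp_b = math.exp(b1)  # the "change in odds" for a one-unit increase in x
ratio = odds(b0, b1, 4.0) / odds(b0, b1, 3.0)
print(round(exp_b, 4), round(ratio, 4))
```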

33 How is the classification table constructed? Oops, a wrong prediction.

34 How is the classification table constructed? (Table columns: pswq; previous; scored; predicted probability.)

35 How is the classification table constructed? (Table columns: pswq; previous; scored; predicted probability; predicted outcome.)