Advanced Methods and Models in Behavioral Research – 2011/2012 Advanced Models and Methods in Behavioral Research Chris Snijders


Advanced Models and Methods in Behavioral Research. Chris Snijders. 3 ects (= study guide). Literature: Field book + separate course material. Laptop exam (+ assignments). To do: Studyweb! Enroll in 0a611.

The methods package
- MMBR (6 ects)
  – Blumberg: general: research question, reliability, validity, etc.
  – Field: SPSS: factor analysis, multiple regression, ANcOVA, sample size, etc.
- AMMBR (3 ects)
  – Field (in part): logistic regression
  – Literature via website: conjoint analysis, multi-level regression

Models and methods: topics
- t-test, Cronbach's alpha, etc.
- multiple regression, analysis of (co)variance and factor analysis
- logistic regression
- conjoint analysis / repeated measures
  – Stata next to SPSS
  – “Finding new questions”
  – Practice data collection (a bit)
In the background: “now you should be able to do it on your own”

Methods in brief (1). Logistic regression: target Y, predictors X_i. Y is a binary variable (0/1).
- Why not just multiple regression?
- Interpretation is more difficult
- Goodness of fit is non-standard
- ...

Methods in brief (2). Conjoint analysis. Underlying assumption: for each user, the "utility" of a product can be written as $U(x_1, x_2, \ldots, x_n) = c_0 + c_1 x_1 + \ldots + c_n x_n$. Example offer:
- 10 Euro p/m
- 2 years fixed
- free phone
- ...
How attractive is this offer to you?
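
As a rough sketch of the additive utility assumption above, the snippet below scores one hypothetical phone offer. The attribute coding and the coefficient values are invented purely for illustration; in an actual conjoint analysis the c's would be estimated from respondents' ratings.

```python
def utility(intercept, coefs, attributes):
    """Additive utility: U(x_1..x_n) = c_0 + c_1*x_1 + ... + c_n*x_n."""
    if len(coefs) != len(attributes):
        raise ValueError("need exactly one coefficient per attribute")
    return intercept + sum(c * x for c, x in zip(coefs, attributes))


# Hypothetical coding of the offer: price per month, contract years, free phone (1 = yes).
offer = [10, 2, 1]
# Hypothetical coefficients (assumed values, not estimates from real data).
c0, c = 5.0, [-0.2, -0.5, 1.5]

score = utility(c0, c, offer)
print(score)
```

Comparing the scores of several offers for the same respondent is what makes the method useful: the coefficients say how much each attribute contributes to attractiveness.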

Conjoint analysis as an “in between method”. Between “Which phone do you like and why?” / “What would your favorite phone be?” and “Let’s keep track of what people buy.”

Coming up with new ideas (3). “More research is necessary.” But on what? YOU: come up with sensible new ideas, given previous research.

Stata next to SPSS
- It’s just better (faster, better written, more possibilities, better programmable …)
- Multi-level regression is much easier than in SPSS
- It’s good to be exposed to more than just a single statistics package (your knowledge should not be based on “where to click” arguments)
- More stable (I think)
- Supports OSX as well … (anybody?)

But …
- Output less “polished”
- It takes some extra work to get you started
- The Logistic Regression chapter in the Field book uses SPSS (but still readable for the larger part)
- Installation … (and it’s not campus software, but subfaculty software)

Make sure to enroll in studyweb (0a611). Read the Field chapter on logistic regression.

Logistic Regression Analysis. That is: your Y variable is 0/1. Now what?

The main points
1. Why do we have to know and sometimes use logistic regression?
2. What is the underlying model? What is maximum likelihood estimation?
3. Logistics of logistic regression analysis
   1. Estimate coefficients
   2. Assess model fit
   3. Interpret coefficients
   4. Check residuals
4. An SPSS example

Suppose we have 100 observations with information about an individual's age and whether or not this individual had some kind of heart disease (CHD). Columns: ID, age, CHD, …

A graphic representation of the data [scatter plot: CHD (vertical axis) against Age (horizontal axis)]

Let’s just try regression analysis pr(CHD|age) = *Age

... linear regression is not a suitable model for probabilities pr(CHD|age) = *Age

In this graph for 8 age groups, I plotted the probability of having a heart disease (proportion)

A nonlinear model is probably better here

Something like this

This is the logistic regression model
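
A minimal sketch of the S-shaped curve, assuming the standard logistic form pr(Y=1|x) = 1 / (1 + exp(-(b0 + b1*x))). The slope .11 is the value quoted later in the slides; the intercept is an assumed value chosen only for illustration.

```python
import math


def logistic(b0, b1, x):
    """Predicted probability under the logistic model with one predictor."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


# Assumed coefficients: slope from the slides, intercept picked for illustration.
b0, b1 = -5.31, 0.11

ages = list(range(0, 121, 10))
probs = [logistic(b0, b1, age) for age in ages]

for age, p in zip(ages, probs):
    print(f"age {age:3d}: predicted probability {p:.3f}")
```

However extreme the age, the predicted value stays strictly between 0 and 1, which is exactly what the straight line could not guarantee.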

Predicted probabilities are always between 0 and 1. The linear part of the model (b0 + b1*X) is similar to classic regression analysis.

Side note: this is similar to MMBR … Suppose Y is a percentage (so between 0 and 1). Then consider … which will ensure that the estimated Y will vary between 0 and 1, and after some rearranging this is the same as …

… (continued) And one “solution” might be:
- Change all Y values that are 0 to …
- Change all Y values that are 1 to …
- Now run regression on log(Y/(1-Y)) …
… but that doesn’t work so well …
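
A sketch of the rearrangement referred to in this side note, assuming the standard logistic form with a single predictor X (the symbols b_0 and b_1 follow the later slides):

```latex
Y = \frac{e^{b_0 + b_1 X}}{1 + e^{b_0 + b_1 X}}
\quad\Longleftrightarrow\quad
\frac{Y}{1 - Y} = e^{b_0 + b_1 X}
\quad\Longleftrightarrow\quad
\ln\!\left(\frac{Y}{1 - Y}\right) = b_0 + b_1 X
```

The left-hand side of the last expression is undefined for Y = 0 and Y = 1, which is why those values would have to be nudged away from the boundary before an ordinary regression could be run on it.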

Logistics of logistic regression
1. How do we estimate the coefficients?
2. How do we assess model fit?
3. How do we interpret coefficients?
4. How do we check regression assumptions?

Kinds of estimation in regression
- Ordinary Least Squares (we fit a line through a cloud of dots)
- Maximum likelihood (we find the parameters that are the most likely, given our data)
We never bothered to consider maximum likelihood in standard multiple regression, because you can show that, with normally distributed errors, the two methods lead to exactly the same estimator. OLS does not work well in logistic regression, but maximum likelihood estimation does …

Maximum likelihood estimation. The method of maximum likelihood yields values for the unknown parameters (here b0 and b1) that maximize the probability of obtaining the observed set of data.

Maximum likelihood estimation. First we have to construct the likelihood function (the probability of obtaining the observed set of data). Assuming that observations are independent: $\text{Likelihood} = \Pr(\text{obs}_1) \cdot \Pr(\text{obs}_2) \cdot \Pr(\text{obs}_3) \cdots \Pr(\text{obs}_n)$

Log-likelihood. For technical reasons the likelihood is transformed into the log-likelihood (then you just maximize the sum of the logged probabilities): $LL = \ln[\Pr(\text{obs}_1)] + \ln[\Pr(\text{obs}_2)] + \ln[\Pr(\text{obs}_3)] + \ldots + \ln[\Pr(\text{obs}_n)]$
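
The sum of logged probabilities can be sketched in a few lines. The data below are made up for illustration (they are not the CHD data from the slides), and the candidate coefficient values are arbitrary.

```python
import math


def log_likelihood(b0, b1, xs, ys):
    """LL = sum over observations of ln pr(obs_i) for a one-predictor logistic model."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # pr(Y = 1 | x)
        # pr(obs) is p when Y = 1 was observed and 1 - p when Y = 0 was observed.
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll


# Made-up ages and 0/1 outcomes, for illustration only.
ages = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
chd = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

ll_a = log_likelihood(-5.0, 0.10, ages, chd)
ll_b = log_likelihood(-5.0, 0.40, ages, chd)
print(ll_a, ll_b)
```

Each logged probability is negative, so the log-likelihood is negative as well; the set of coefficients with the value closest to zero fits these data best.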

Note: optimizing log-likelihoods is difficult. It’s iterative (“searching the landscape”)
→ it might not converge
→ it might converge to the wrong answer
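
To make the “searching the landscape” idea concrete, here is a minimal Newton-Raphson sketch for a logistic model with one predictor. It is an illustration under assumptions (made-up data, a start at b0 = b1 = 0, and an arbitrary iteration limit), not a description of what SPSS or Stata do internally.

```python
import math


def fit_logistic(xs, ys, max_iter=25, tol=1e-8):
    """Newton-Raphson search for the b0, b1 that maximize the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for iteration in range(1, max_iter + 1):
        # g: gradient of the log-likelihood; h: entries of the information matrix.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            raise RuntimeError("information matrix is singular; search cannot continue")
        # Solve the 2x2 system (information matrix) * step = gradient.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            return b0, b1, iteration
    raise RuntimeError("no convergence within max_iter iterations")


# Made-up ages and 0/1 outcomes, for illustration only.
ages = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
chd = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0_hat, b1_hat, n_iter = fit_logistic(ages, chd)
print(b0_hat, b1_hat, n_iter)
```

The two `RuntimeError` branches correspond to the warnings on the slide: the search can fail outright, and a procedure that stops too early may report values that are not the maximum.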

Estimation of coefficients: SPSS Results

This function fits very well; other values of b0 and b1 give worse results.

Illustration 1: suppose we chose .05X instead of .11X

Illustration 2: suppose we chose .40X instead of .11X

Logistics of logistic regression
- Estimate the coefficients
- Assess model fit
  – Between-model comparisons
  – Pseudo R² (similar to multiple regression)
  – Predictive accuracy
- Interpret coefficients
- Check regression assumptions

Model fit: comparisons between models. The log-likelihood ratio test statistic can be used to test the fit of a model. It compares the log-likelihood of a reduced model with that of the full model, and the test statistic has a chi-square distribution.

Between-model comparisons: likelihood ratio test (reduced model versus full model). The model including only an intercept is often called the empty model. SPSS uses this model as the default reduced model.
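
The usual form of the likelihood ratio test statistic, written out as a sketch (LL denotes the log-likelihood and k the number of estimated parameters; both symbols are introduced here for illustration):

```latex
\chi^2 = -2\left[\,LL(\text{reduced}) - LL(\text{full})\,\right]
       = 2\left[\,LL(\text{full}) - LL(\text{reduced})\,\right],
\qquad
df = k_{\text{full}} - k_{\text{reduced}}
```

With the empty model as the reduced model, df equals the number of predictors in the full model.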

Between-model comparison: SPSS output. This is the test statistic and its associated significance.

Overall model fit: pseudo R². Just like in multiple regression, pseudo R² ranges from 0.0 to 1.0
– Cox and Snell cannot theoretically reach 1
– Nagelkerke is adjusted so that it can reach 1
Both are computed from two quantities: the log-likelihood of the model before any predictors were entered, and the log-likelihood of the model that you want to test. NOTE: R² in logistic regression tends to be (even) smaller than in multiple regression.
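
A sketch of the two pseudo R² measures, using their commonly cited definitions in terms of the two log-likelihoods. The log-likelihood values and sample size in the example are made up.

```python
import math


def cox_snell_r2(ll_null, ll_model, n):
    """Cox & Snell: 1 - exp(2 * (LL_null - LL_model) / n)."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke: Cox & Snell divided by its maximum attainable value."""
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_cox_snell


# Made-up values: log-likelihood of the empty model, of the tested model, and n.
ll_null, ll_model, n = -68.0, -54.0, 100

r2_cs = cox_snell_r2(ll_null, ll_model, n)
r2_nk = nagelkerke_r2(ll_null, ll_model, n)
print(round(r2_cs, 3), round(r2_nk, 3))
```

Because the divisor is smaller than 1, Nagelkerke's value is always at least as large as Cox and Snell's for the same model.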

Overall model fit: classification table. We correctly predict 74% of our observations.

Overall model fit: classification table. 14 cases had a CHD while according to our model this shouldn't have happened.

Overall model fit: classification table. 12 cases didn’t have a CHD while according to our model this should have happened.

Logistics of logistic regression
- Estimate the coefficients
- Assess model fit
- Interpret coefficients
  – Direction
  – Significance
  – Magnitude
- Check regression assumptions

Interpreting coefficients: direction. We can rewrite our model as follows: …

Interpreting coefficients: direction
- The original b reflects changes in the logit: b > 0 implies a positive relationship
- The exponentiated b reflects changes in the odds: exp(b) > 1 implies a positive relationship
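
One standard way of writing the model so that both readings are visible, given here as an assumed form with p = pr(Y = 1 | X) and a single predictor:

```latex
\underbrace{\ln\!\left(\frac{p}{1-p}\right)}_{\text{logit}} = b_0 + b_1 X
\qquad\Longleftrightarrow\qquad
\underbrace{\frac{p}{1-p}}_{\text{odds}} = e^{b_0}\,\bigl(e^{b_1}\bigr)^{X}
```

In the first form a one-unit increase in X adds b_1 to the logit; in the second it multiplies the odds by e^{b_1}, which is larger than 1 exactly when b_1 > 0.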

3. Interpreting coefficients: magnitude. The slope coefficient (b) is interpreted as the rate of change in the "log odds" as X changes … not very useful. exp(b) is the factor by which the odds change when the independent variable increases by one unit, which is more useful for calculating the size of an effect.

Magnitude of association: percentage change in odds
Probability | Odds
25% | 0.33
50% | 1
75% | 3

Magnitude of association. For the age variable:
– Percentage change in odds = (exponentiated coefficient – 1) * 100 = 12%, or “the odds times 1.117”
– A one-unit increase in age will result in a 12% increase in the odds that the person will have a CHD
– So if a soccer player is one year older, the odds that (s)he will have CHD are 12% higher
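
The arithmetic behind the 12% can be sketched as follows. The slope value 0.111 is an assumption, chosen so that exp(b) comes out close to the 1.117 quoted on the slide.

```python
import math


def pct_change_in_odds(b, units=1.0):
    """Percentage change in the odds when X increases by `units`."""
    return (math.exp(b * units) - 1.0) * 100.0


b_age = 0.111  # assumed slope for age

odds_factor = math.exp(b_age)                     # multiplier per year of age
pct_one_year = pct_change_in_odds(b_age)          # percentage change per year
odds_factor_ten_years = math.exp(10 * b_age)      # multiplier per ten years

print(round(odds_factor, 3), round(pct_one_year, 1), round(odds_factor_ten_years, 2))
```

Note that the effect compounds: ten extra years multiply the odds by exp(10b), which is considerably more than ten times 12%.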

Another way to get an idea of the size of effects: calculating predicted probabilities. For somebody who is 20 years old, the predicted probability is .04. For somebody who is 70 years old, the predicted probability is .91.

But this gets more complicated when you have more than a single X-variable (see blackboard). Conclusion: if you consider the effect of a variable on the predicted probability, the size of the effect of X1 depends on the value of X2!
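
A small numerical sketch of this conclusion, with hypothetical coefficients for a model with two predictors. The same one-unit increase in X1 is evaluated at two different values of X2.

```python
import math


def predicted_prob(b0, b1, b2, x1, x2):
    """Predicted probability for a logistic model with two predictors."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))


# Hypothetical coefficients, for illustration only.
b0, b1, b2 = -2.0, 1.0, 1.5

# Change in predicted probability when X1 goes from 0 to 1 ...
effect_at_x2_0 = predicted_prob(b0, b1, b2, 1, 0) - predicted_prob(b0, b1, b2, 0, 0)
# ... and the same change in X1 when X2 equals 3 instead of 0.
effect_at_x2_3 = predicted_prob(b0, b1, b2, 1, 3) - predicted_prob(b0, b1, b2, 0, 3)

print(round(effect_at_x2_0, 3), round(effect_at_x2_3, 3))
```

On the odds scale the effect of X1 is the same in both cases (a factor exp(b1)); only on the probability scale does it depend on X2.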

Testing significance of coefficients. In linear regression analysis, the statistic t = estimate / (standard error of estimate), which follows a t-distribution, is used to test significance. In logistic regression something similar exists; however, when b is large, the standard error tends to become inflated, so the statistic is underestimated (Type II errors are more likely). Note: this is not the Wald statistic SPSS presents!

Interpreting coefficients: significance. SPSS presents …, while Andy Field thinks SPSS presents this (at least in the 2nd version of the book): …
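
The contrast is presumably between the ratio of estimate to standard error and its square. As an assumption, since the formulas themselves are not part of the text: the Wald value in SPSS's logistic regression output is generally documented as the squared ratio, evaluated against a chi-square distribution with 1 degree of freedom.

```latex
z = \frac{b}{SE_b}
\qquad\text{versus}\qquad
\text{Wald} = \left(\frac{b}{SE_b}\right)^{2} \;\sim\; \chi^2_{1}
```

Both versions lead to the same p-value for a two-sided test, so the difference matters mainly when comparing reported numbers with hand calculations.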

Logistic regression
- Y = 0/1
- Multiple regression (or ANcOVA) is not right
- You consider either the odds or the log(odds)
- It is estimated through “maximum likelihood”
- Interpretation is a bit more complicated than normal

Logistics of logistic regression
- Estimate the coefficients
- Assess model fit
- Interpret coefficients
- Check regression assumptions

Checking assumptions
- Influential data points & residuals
  – Follow Samantha's tips
- Hosmer & Lemeshow
  – Divides the sample into subgroups
  – Checks whether there are differences between observed and predicted values in these subgroups
  – The test should not be significant; if it is, this indicates lack of fit

Hosmer & Lemeshow. The test divides the sample into subgroups and checks whether the difference between observed and predicted is about equal in these groups. The test should not be significant (indicating no difference).
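
A rough sketch of the idea behind the test, assuming groups of (nearly) equal size formed by sorting on predicted probability. Statistical packages may form the groups differently, so this illustrates the logic rather than reproducing any particular implementation; the data are made up.

```python
def hosmer_lemeshow_stat(probs, ys, groups=10):
    """Sum over groups of (observed - expected)^2 / expected, for events and non-events."""
    pairs = sorted(zip(probs, ys))  # order cases by predicted probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        stat += (observed - expected) ** 2 / expected
        stat += ((size - observed) - (size - expected)) ** 2 / (size - expected)
    return stat


# Made-up predicted probabilities and two made-up sets of outcomes.
probs = [0.1] * 10 + [0.9] * 10
ys_good = [1] + [0] * 9 + [1] * 9 + [0]   # observed counts match the predictions
ys_poor = [0] * 10 + [1] * 10             # observed counts deviate in both groups

stat_good = hosmer_lemeshow_stat(probs, ys_good, groups=2)
stat_poor = hosmer_lemeshow_stat(probs, ys_poor, groups=2)
print(stat_good, stat_poor)
```

The statistic is usually compared with a chi-square distribution with (number of groups - 2) degrees of freedom; a large value, and hence a significant test, points to lack of fit.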

Examining residuals in LR
1. Isolate points for which the model fits poorly
2. Isolate influential data points

Residual statistics

Cook's distance. Terms in the formula: mean square error; number of parameters; prediction for j from all observations; prediction for j from the observations excluding observation i.
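
The terms listed above match the textbook definition of Cook's distance, sketched here with symbols chosen for illustration (p for the number of parameters, MSE for the mean square error):

```latex
D_i = \frac{\sum_{j=1}^{n} \left(\hat{y}_j - \hat{y}_{j(i)}\right)^{2}}{p \cdot MSE}
```

Here \hat{y}_j is the prediction for case j from all observations and \hat{y}_{j(i)} the prediction for case j when observation i is left out, so a large D_i flags an observation whose removal shifts the predictions noticeably.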

Illustration with SPSS. Penalty kicks data, variables:
– Scored: outcome variable, 0 = penalty missed, and 1 = penalty scored
– Pswq: degree to which a player worries
– Previous: percentage of penalties scored by a particular player in their career

SPSS OUTPUT: Logistic Regression. Tells you something about the number of observations and missings.

Block 0: Beginning Block. This table is based on the empty model, i.e. only the constant in the model. These variables will be entered in the model later on.

Block 1: Method = Enter. Block is useful to check significance of individual coefficients (see Field). New model. This is the test statistic after dividing by -2. Note: Nagelkerke is larger than Cox & Snell.

Block 1: Method = Enter (continued). Predictive accuracy has improved (was 53%). Column annotations: estimates; standard error of estimates; significance based on Wald statistic; change in odds.

How is the classification table constructed? # cases not predicted correctly

How is the classification table constructed? Columns: pswq, previous, scored, predicted prob.

How is the classification table constructed? Columns: pswq, previous, scored, predicted prob., predicted
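
The construction can be sketched as follows: each case gets a predicted outcome by comparing its predicted probability with a cutoff, and observed versus predicted outcomes are then cross-tabulated. The cutoff of 0.5 and the six cases below are assumptions for illustration, not the penalty kicks data.

```python
def classification_table(probs, ys, cutoff=0.5):
    """Count cases by (observed outcome, predicted outcome)."""
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for p, y in zip(probs, ys):
        predicted = 1 if p >= cutoff else 0
        table[(y, predicted)] += 1
    return table


def accuracy(table):
    """Share of cases on the diagonal, i.e. predicted correctly."""
    total = sum(table.values())
    return (table[(0, 0)] + table[(1, 1)]) / total


# Made-up predicted probabilities and observed outcomes.
pred_prob = [0.10, 0.35, 0.62, 0.80, 0.45, 0.91]
scored = [0, 0, 0, 1, 1, 1]

table = classification_table(pred_prob, scored)
print(table, accuracy(table))
```

The two off-diagonal cells, (0, 1) and (1, 0), hold the cases that were not predicted correctly.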