Advanced Methods and Models in Behavioral Research – Advanced Models and Methods in Behavioral Research Chris Snijders 3 ects.


1 Advanced Models and Methods in Behavioral Research. Chris Snijders, c.c.p.snijders@gmail.com. 3 ects. Study guide: http://www.chrissnijders.com/ammbr. Literature: Field book + separate course material. Laptop exam (+ assignments). To do (if not done yet): enroll in 0a611.

2 The methods package. MMBR (6 ects): Blumberg (questions, reliability, validity, research design); Field (SPSS: factor analysis, multiple regression, ANCOVA, sample size, etc.). AMMBR (3 ects): Field (1 chapter): logistic regression; literature through website: conjoint analysis, multi-level regression.

3 Models and methods: topics. t-test, Cronbach's alpha, etc.; multiple regression, analysis of (co)variance and factor analysis; logistic regression; conjoint analysis / repeated measures. Also: Stata next to SPSS; “Finding new questions”; some data collection. In the background: “now you should be able to deal with data on your own”.

4 Methods in brief (1). Logistic regression: target Y, predictors $X_i$. Y is a binary variable (0/1). Why not just multiple regression? Interpretation is more difficult; goodness of fit is non-standard; ... (and it is a chapter in Field).

5 Methods in brief (2). Conjoint analysis. Underlying assumption: for each user, the "utility" of an offer can be written as $U(x_1, x_2, \ldots, x_n) = c_0 + c_1 x_1 + \ldots + c_n x_n$. Example offer: 10 Euro p/m; 2 years fixed; free phone; ... How attractive is this offer to you?
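
A minimal sketch of this additive utility model, in Python rather than the course's Stata. The attribute coding and the weights $c_0, \ldots, c_n$ below are invented for illustration; in a real conjoint analysis they would be estimated from respondents' ratings of many offers.

```python
def utility(attributes, weights, intercept=0.0):
    """Additive utility U(x_1..x_n) = c_0 + c_1*x_1 + ... + c_n*x_n."""
    if len(attributes) != len(weights):
        raise ValueError("need exactly one weight per attribute")
    return intercept + sum(c * x for c, x in zip(weights, attributes))


# Hypothetical offer: price in euros per month, contract length in years,
# free phone coded as 1 (yes) or 0 (no).
offer = [10, 2, 1]
# Hypothetical weights c_1..c_n (assumed values, not course estimates).
weights = [-0.2, -0.5, 1.5]

print(utility(offer, weights, intercept=5.0))
```

With weights like these, a higher price and a longer fixed contract lower the utility, while the free phone raises it.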

6 Conjoint analysis as an “in between method”. Between: “Which phone do you like and why? What would your favorite phone be?” And: “Let’s keep track of what people buy.” We have: conjoint analysis.

7 Local Master Thesis example: Fiber to the home (Roel Schuring). Speed: really fast. Price: sort of high. Installation: free! Your neighbors: are in! How attractive is this to you?

8 Coming up with new ideas (3). “More research is necessary.” But on what? YOU: come up with sensible new ideas, given previous research.

9 Stata next to SPSS. It’s just better (faster, better written, more possibilities, better programmable, …). Multi-level regression is much easier than in SPSS. It’s good to be exposed to more than just a single statistics package (your knowledge should not be based on “where to click” arguments). More stable. BTW: supports OS X as well… (anybody?)

10 Every advantage has a disadvantage. Output less “polished”. It takes some extra work to get you started. The Logistic Regression chapter in the Field book uses SPSS (but still readable for the larger part). It’s not campus software, but subfaculty software. Installation …

11 If on Windows, try downloading www.chrissnijders.com/ammbr/TUeStata12-zip.exe

12 Logistic Regression Analysis. That is: your Y variable is 0/1. Now what?

13 The main points. 1. Why do we have to know and sometimes use logistic regression? 2. What is the underlying model? What is maximum likelihood estimation? 3. Logistics of logistic regression analysis: (a) estimate coefficients, (b) assess model fit, (c) interpret coefficients, (d) check residuals. 4. An example (with some output).

15 Suppose we have 100 observations with information about an individual's age and whether or not this individual had some kind of heart disease (CHD).

16 A graphic representation of the data. [Figure: scatter plot of CHD against Age]

17 Let’s just try regression analysis: pr(CHD|age) = -.54 + .022*Age

18 ... linear regression is not a suitable model for probabilities: pr(CHD|age) = -.54 + .0218107*Age
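
To see why the fitted line is unsuitable, one can plug a few ages into it. The sketch below is in Python rather than the course's Stata and uses the rounded coefficients from the slide; the ages 20, 50 and 80 are arbitrary illustration values.

```python
def linear_prediction(age, intercept=-0.54, slope=0.022):
    """Predicted 'probability' of CHD from the straight-line model."""
    return intercept + slope * age


for age in (20, 50, 80):  # arbitrary example ages
    p = linear_prediction(age)
    flag = "" if 0.0 <= p <= 1.0 else "  <- not a valid probability"
    print(f"age {age}: predicted pr(CHD) = {p:.2f}{flag}")
```

For young people the line predicts a negative probability, and for old people a probability above 1.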

19 In this graph, I plotted for each of 8 age groups the probability of having a heart disease (the proportion within the group).

20 A nonlinear model is probably better here

21 Something like this

22 This is the logistic regression model: $\Pr(\text{CHD} \mid \text{Age}) = \frac{1}{1 + e^{-(b_0 + b_1 \cdot \text{Age})}}$

23 Predicted probabilities are always between 0 and 1. The linear part of the model is similar to classic regression analysis.
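
A small sketch (Python, not the course's Stata) of the logistic curve $p = 1 / (1 + e^{-z})$, where $z$ stands for the linear part $b_0 + b_1 X$. The two-branch form is only an implementation choice to avoid numerical overflow for very negative $z$; the $z$ values printed are arbitrary.

```python
import math


def logistic(z):
    """Map any real number z to a probability strictly between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equivalent form e^z / (1 + e^z)
    # so that math.exp never receives a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)


for z in (-10, -2, 0, 2, 10):  # arbitrary values of the linear part
    print(f"z = {z:>3}: p = {logistic(z):.4f}")
```

However extreme the linear part becomes, the output stays inside the (0, 1) interval, which is exactly what the straight line could not guarantee.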

24 Side note: this is similar to MMBR … Suppose Y is a percentage (so between 0 and 1). Then consider $Y = \frac{1}{1 + e^{-(b_0 + b_1 X)}}$, which will ensure that the estimated Y will vary between 0 and 1, and after some rearranging this is the same as $\ln\left(\frac{Y}{1 - Y}\right) = b_0 + b_1 X$.

25 … (continued) And one “solution” might be: change all Y values that are 0 to 0.001; change all Y values that are 1 to 0.999. Now run regression on log(Y/(1-Y)) … but that really is sort of higgledy-piggledy …
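
The workaround described here can be sketched as follows (Python rather than Stata; the function name is made up for this illustration). It shows the mechanics only. The slide's point stands: the choice of 0.001 and 0.999 is arbitrary.

```python
import math


def clipped_logit(y, eps=0.001):
    """Replace 0 by eps and 1 by 1 - eps, then return log(y / (1 - y))."""
    y = min(max(y, eps), 1.0 - eps)
    return math.log(y / (1.0 - y))


for y in (0.0, 0.25, 0.5, 0.75, 1.0):  # example outcome values
    print(f"Y = {y:.2f}: log(Y/(1-Y)) = {clipped_logit(y):.3f}")
```

Note how much the transformed values for Y = 0 and Y = 1 depend on the chosen eps: a different cut-off would give very different values to regress on.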

26 Logistics of logistic regression. 1. How do we estimate the coefficients? 2. How do we assess model fit? 3. How do we interpret coefficients? 4. How do we check regression assumptions?

27 Kinds of estimation in regression. Ordinary Least Squares (we fit a line through a cloud of dots). Maximum likelihood (we find the parameter values under which our observed data are most likely). We never bothered to consider maximum likelihood in standard multiple regression, because you can show that both lead to exactly the same estimator (in MR with normally distributed errors, that is; normally they differ). Actually, maximum likelihood has superior statistical properties (efficiency, consistency, invariance, …).

28 Maximum likelihood estimation. The method of maximum likelihood yields values for the unknown parameters (the coefficients $b_0$ and $b_1$) that maximize the probability of obtaining the observed set of data.

29 Maximum likelihood estimation. First we have to construct the “likelihood function” (the probability of obtaining the observed set of data). Assuming that observations are independent: $\text{Likelihood} = \text{pr}(\text{obs}_1) \cdot \text{pr}(\text{obs}_2) \cdot \text{pr}(\text{obs}_3) \cdots \text{pr}(\text{obs}_n)$

30 Log-likelihood. For technical reasons the likelihood is transformed into the log-likelihood (then you just maximize the sum of the logged probabilities): $LL = \ln[\text{pr}(\text{obs}_1)] + \ln[\text{pr}(\text{obs}_2)] + \ln[\text{pr}(\text{obs}_3)] + \ldots + \ln[\text{pr}(\text{obs}_n)]$
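
The sum of logged probabilities can be sketched like this (Python rather than Stata). For a 0/1 outcome, pr(obs) is the predicted probability p when Y = 1 and 1 - p when Y = 0. The tiny data set and the two coefficient pairs are invented for illustration.

```python
import math


def log_likelihood(b0, b1, xs, ys):
    """Sum of ln[pr(obs_i)] for a logistic model with one predictor."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # predicted pr(Y = 1)
        total += math.log(p if y == 1 else 1.0 - p)
    return total


# Hypothetical toy data: a predictor and a 0/1 outcome.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]

print(log_likelihood(0.0, 0.0, xs, ys))   # coefficients that ignore x
print(log_likelihood(-2.8, 0.8, xs, ys))  # coefficients that use x
```

The pair with the higher (less negative) log-likelihood fits these data better; maximum likelihood estimation searches for the pair with the highest value.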

31 Some subtleties. In OLS, we did not need stochastic assumptions to be able to calculate a best-fitting line (we need those only for the estimates of the confidence intervals). With maximum likelihood estimation we need them from the start (and let us not be bothered at this point by how the confidence intervals are calculated in maximum likelihood).

32 And this is what it looks like …

33 Note: optimizing log-likelihoods is difficult. It’s iterative (“searching the landscape”): it might not converge, and it might converge to the wrong answer.
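
To make “iterative” concrete, here is a sketch of one common search scheme, Newton-Raphson, for a logistic model with a single predictor. It is written in Python and is not claimed to be what Stata or SPSS do internally. The toy data are invented, and the fixed number of iterations is an assumption that happens to be enough here.

```python
import math


def fit_logistic(xs, ys, iterations=25):
    """Newton-Raphson search for the b0, b1 that maximize the log-likelihood."""
    b0, b1 = 0.0, 0.0  # starting point of the search
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # slope of LL in the b0 direction
            g1 += (y - p) * x    # slope of LL in the b1 direction
            h00 += w             # curvature terms (information matrix)
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = slope.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical toy data with overlap between the 0s and the 1s.
toy_x = [1, 2, 3, 4, 5, 6]
toy_y = [0, 0, 1, 0, 1, 1]
print(fit_logistic(toy_x, toy_y))
```

If the 0s and 1s in the data could be separated perfectly by X, the coefficients would keep growing at every step and the search would not settle. That is one way in which “it might not converge”.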

34 Nasty implication: extreme cases should be left out (some handwaving here)

35 Example (with some SPSS output)

36 Estimation of coefficients: SPSS Results

38 This function fits best: other values of b0 and b1 give worse results (that is, other values have a smaller likelihood value)

39 Illustration 1: suppose we chose .05X instead of .11X

40 Illustration 2: suppose we chose .40X instead of .11X

41 Logistics of logistic regression. Estimate the coefficients (and their conf. int.). Assess model fit: between-model comparisons; pseudo $R^2$ (similar to multiple regression); predictive accuracy. Interpret coefficients. Check regression assumptions.

42 Model fit: comparisons between models. The log-likelihood ratio test statistic can be used to test the fit of a model: $\chi^2 = 2\,[LL(\text{full model}) - LL(\text{reduced model})]$. The test statistic has a chi-square distribution, with degrees of freedom equal to the number of extra parameters in the full model. NOTE: This is sort of similar to the variance decomposition tables you see in MR!

44 Between model comparisons: the likelihood ratio test (reduced model versus full model). The model including only an intercept is often called the empty model. SPSS uses this model as a default.
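
A sketch of the likelihood ratio test for the simplest case: the full model has exactly one parameter more than the reduced model (for instance, the empty model versus a model with one predictor). It is written in Python rather than Stata. The log-likelihood values are invented. The p-value formula holds only for 1 degree of freedom, where the chi-square upper tail can be written with the complementary error function.

```python
import math


def likelihood_ratio_test_1df(ll_reduced, ll_full):
    """Return (chi-square statistic, p-value) for a 1-degree-of-freedom LR test."""
    statistic = 2.0 * (ll_full - ll_reduced)
    if statistic < 0:
        raise ValueError("the full model cannot fit worse than the reduced model")
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods of an empty model and a one-predictor model.
statistic, p_value = likelihood_ratio_test_1df(ll_reduced=-60.0, ll_full=-50.0)
print(f"chi-square = {statistic:.2f}, p = {p_value:.6f}")
```

A small p-value means that adding the predictor improves the fit by more than chance alone would suggest.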

45 Between model comparison: SPSS output. This is the test statistic, and its associated significance.

46 Overall model fit: pseudo $R^2$. Just like in multiple regression, pseudo $R^2$ ranges from 0.0 to 1.0. Cox and Snell: cannot theoretically reach 1. Nagelkerke: adjusted so that it can reach 1. Both compare the log-likelihood of the model before any predictors were entered with the log-likelihood of the model that you want to test. NOTE: $R^2$ in logistic regression tends to be (even) smaller than in multiple regression.
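
A sketch of the two measures in their commonly cited textbook form, written in Python rather than Stata. The formulas used are $R^2_{CS} = 1 - \exp\left(\frac{2}{n}[LL(\text{baseline}) - LL(\text{new})]\right)$ and $R^2_N = R^2_{CS} / \left(1 - \exp\left(\frac{2}{n} LL(\text{baseline})\right)\right)$. The log-likelihoods and sample size below are invented.

```python
import math


def cox_snell_r2(ll_baseline, ll_new, n):
    """Cox and Snell pseudo R-squared from two log-likelihoods and sample size n."""
    return 1.0 - math.exp((2.0 / n) * (ll_baseline - ll_new))


def nagelkerke_r2(ll_baseline, ll_new, n):
    """Cox and Snell rescaled by its maximum attainable value."""
    maximum = 1.0 - math.exp((2.0 / n) * ll_baseline)
    return cox_snell_r2(ll_baseline, ll_new, n) / maximum


# Hypothetical values: empty model LL = -60, tested model LL = -50, n = 100.
print(f"Cox and Snell: {cox_snell_r2(-60.0, -50.0, 100):.3f}")
print(f"Nagelkerke:    {nagelkerke_r2(-60.0, -50.0, 100):.3f}")
```

Because the rescaling divides by a number below 1, Nagelkerke's value is always at least as large as Cox and Snell's.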

47 Overall model fit: Classification table. We predict 74% correctly.

48 Overall model fit: Classification table. 14 cases had a CHD while according to our model this shouldn't have happened.

49 Overall model fit: Classification table. 12 cases didn’t have a CHD while according to our model this should have happened.
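
With the 100 observations from the example, 14 + 12 = 26 cases are misclassified, which leaves the 74% predicted correctly. How such a table comes about can be sketched as follows, in Python rather than SPSS or Stata. The predicted probabilities and outcomes are invented, and the cut-off of 0.5 is an assumed default.

```python
def classification_table(probabilities, outcomes, cutoff=0.5):
    """Cross-tabulate predicted (p >= cutoff) against observed 0/1 outcomes."""
    table = {"true_pos": 0, "false_pos": 0, "true_neg": 0, "false_neg": 0}
    for p, y in zip(probabilities, outcomes):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            table["true_pos"] += 1
        elif predicted == 1 and y == 0:
            table["false_pos"] += 1   # model says CHD, person has none
        elif predicted == 0 and y == 0:
            table["true_neg"] += 1
        else:
            table["false_neg"] += 1   # model says no CHD, person has it
    return table


def percent_correct(table):
    correct = table["true_pos"] + table["true_neg"]
    return 100.0 * correct / sum(table.values())


# Hypothetical predicted probabilities and observed outcomes.
probs = [0.1, 0.4, 0.6, 0.8, 0.3, 0.9]
observed = [0, 1, 0, 1, 0, 1]
table = classification_table(probs, observed)
print(table, f"{percent_correct(table):.1f}% correct")
```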

50 Logistics of logistic regression. Estimate the coefficients. Assess model fit. Interpret coefficients: direction; significance; magnitude. Check regression assumptions.

51 The Odds Ratio. We had: $p = \frac{1}{1 + e^{-(b_0 + b_1 X)}}$. And after some rearranging we can get the odds: $\frac{p}{1 - p} = e^{b_0 + b_1 X}$

52 Magnitude of association: Percentage change in odds
Probability   Odds
25%           0.33
50%           1
75%           3
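
The conversion behind this table is odds = p / (1 - p). A short sketch in Python (rather than Stata), reproducing the three rows:

```python
def odds(probability):
    """Convert a probability (strictly below 1) into odds."""
    return probability / (1.0 - probability)


def probability_from_odds(odds_value):
    """Convert odds back into a probability."""
    return odds_value / (1.0 + odds_value)


for p in (0.25, 0.50, 0.75):
    print(f"probability {p:.0%} -> odds {odds(p):.2f}")
```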

53 Interpreting coefficients: direction. The original b reflects changes in the logit: b > 0 implies a positive relationship. The exponentiated b reflects the “changes in odds”: exp(b) > 1 implies a positive relationship.

54 3. Interpreting coefficients: magnitude. The slope coefficient (b) is interpreted as the rate of change in the "log odds" as X changes … not very useful. exp(b) is the factor by which the odds are multiplied when the independent variable increases by one unit, which is more useful for calculating the size of an effect.

55 Magnitude of association. For the age variable: percentage change in odds = (exponentiated coefficient – 1) * 100 = 12%, or “the odds times 1.12”. A one-unit increase in age will result in a 12% increase in the odds that the person will have a CHD. So if a soccer player is one year older, the odds that (s)he will have CHD are 12% higher.
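
The arithmetic can be checked with the coefficient .11 for age used in the earlier illustrations. The sketch is in Python rather than Stata, and the function name is made up for this example.

```python
import math


def percentage_change_in_odds(b):
    """(exp(b) - 1) * 100: percent change in the odds per one-unit increase in X."""
    return (math.exp(b) - 1.0) * 100.0


b_age = 0.11  # coefficient for age, as in the illustrations with .11X
print(f"odds ratio exp(b) = {math.exp(b_age):.2f}")
print(f"change in odds    = {percentage_change_in_odds(b_age):.0f}%")
```

Note that the 12% applies to the odds, not to the probability itself.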

56 Another way to get an idea of the size of effects: calculating predicted probabilities. For somebody of 20 years old, the predicted probability is .04. For somebody of 70 years old, the predicted probability is .91.

57 But this gets more complicated when you have more than a single X-variable (see blackboard). Conclusion: if you consider the effect of a variable on the predicted probability, the size of the effect of X1 depends on the value of X2! (yuck!)
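
This conclusion can be made tangible with a small sketch (Python rather than Stata) of a model with two predictors. The coefficients are invented purely for illustration. The same one-unit increase in X1 changes the predicted probability by very different amounts depending on X2, while the odds ratio stays the same.

```python
import math


def predicted_probability(x1, x2, b0=-2.0, b1=1.0, b2=1.5):
    """Logistic model with two predictors; all coefficients are hypothetical."""
    z = b0 + b1 * x1 + b2 * x2
    return 1.0 / (1.0 + math.exp(-z))


def effect_of_x1(x2):
    """Change in predicted probability when X1 goes from 0 to 1, at a given X2."""
    return predicted_probability(1, x2) - predicted_probability(0, x2)


def odds_ratio_of_x1(x2):
    """Ratio of the odds at X1 = 1 to the odds at X1 = 0, at a given X2."""
    p1, p0 = predicted_probability(1, x2), predicted_probability(0, x2)
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


for x2 in (0, 4):  # two arbitrary values of the second predictor
    print(f"X2 = {x2}: change in probability = {effect_of_x1(x2):.3f}, "
          f"odds ratio = {odds_ratio_of_x1(x2):.3f}")
```

This is one reason why effects in logistic regression are usually reported as odds ratios rather than as changes in probability.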

58 Testing significance of coefficients. In linear regression analysis this statistic is used to test significance: $t = \frac{b}{SE_b}$ (the estimate divided by the standard error of the estimate; it follows a t-distribution). In logistic regression something similar exists; however, when b is large, the standard error tends to become inflated, hence the statistic is underestimated (Type II errors are more likely). Note: This is not the Wald statistic SPSS presents!!!

59 Interpreting coefficients: significance. SPSS presents $\text{Wald} = \left(\frac{b}{SE_b}\right)^2$, while Andy Field thinks SPSS presents this (at least in the 2nd edition of the book): $\text{Wald} = \frac{b}{SE_b}$
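
The two versions of the statistic differ only by squaring and lead to the same p-value, as this sketch shows (Python rather than SPSS or Stata). The coefficient and standard error are invented. The p-value uses the normal approximation: two-sided for z, which equals the chi-square upper tail with 1 degree of freedom for z squared.

```python
import math


def wald_statistics(b, se):
    """Return (z, z squared, p-value) for a coefficient and its standard error."""
    z = b / se
    wald = z * z
    # Two-sided normal p-value for z; identical to the upper tail of a
    # chi-square distribution with 1 degree of freedom evaluated at z squared.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, wald, p_value


# Hypothetical coefficient and standard error.
z, wald, p_value = wald_statistics(b=0.11, se=0.025)
print(f"z = {z:.2f}, Wald (squared) = {wald:.2f}, p = {p_value:.6f}")
```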

61 Logistics of logistic regression. Estimate the coefficients. Assess model fit. Interpret coefficients. Check regression assumptions.

62 Checking assumptions. 0. Independent data points (no tests for that, just think about your data). Problem: otherwise the likelihood function is wrong and the confidence intervals are too small. 1. Influential data points & residuals: follow Samantha's tips in Field; we will get back to this later. 2. No multi-collinearity (Stata: “collin”). 3. All relevant variables included (Stata: “linktest”; NB: in multiple regression: “ovtest”). 4. Hosmer & Lemeshow (Stata: “estat gof”): divides the sample into subgroups; checks whether there are differences between observed and predicted between subgroups; the test should not be significant; if it is, that is an indication of lack of fit.

63 1. Residual statistics: Field’s rules of thumb

64 1. Examining residuals in logistic regression. Isolate points for which the model fits poorly. Isolate influential data points.

65 2. No multi-collinearity. Problem = same as in regression: the net effect of two (or more) collinear variables will be zero (see MMBR).
In regression, the Stata command is “vif”:
reg y x      // Stata’s regression command
vif          // the variance-inflation-factors
In logistic regression, the Stata command is “collin”:
logit y x    // Stata’s logit regression command
collin       // the variance-inflation-factors

66 NOTE: “collin” is not standard Stata. help ... (if you know and have the command); net search … (otherwise); findit … (otherwise).

67 3. All relevant variables included: model specification. Note that this refers to the inclusion of given variables (not the inclusion of totally other variables) (compare Stata’s “ovtest” in multiple regression). In Stata: linktest. Many specification tests consider whether including y-hat and (y-hat)^2 would improve your model. If yes: keep adding transformations of your variables.

68 4. Hosmer & Lemeshow. The test divides the sample into subgroups and checks whether the difference between observed and predicted is about equal in these groups. The test should not be significant (indicating no difference).

70 Logistic regression. Y = 0/1. Multiple regression (or ANCOVA) is not right. You consider either the odds or the log(odds). It is estimated through “maximum likelihood”. Interpretation is a bit more complicated than normal. Assumption testing is a bit more concrete than in multiple regression (also because now we can do this with Stata).

71 8 groups – run a logistic regression in Stata. Create groups, choose a data set. Create a do-file that reads in the data and runs a logistic regression (along the lines of the commands in the example file, BUT WITH MORE COMMENTS ABOUT WHAT YOU FIND). Start now, deliver by this Saturday. Participation mandatory.

