1
‘Interpreting results from statistical modelling – a seminar for social scientists’
Dr Vernon Gayle and Dr Paul Lambert (Stirling University)
Tuesday 29th April 2008
2
‘Interpreting results from statistical modelling – a seminar for social scientists’
Our experience has shown that the results of statistical models can easily be misrepresented
In this seminar we demonstrate that the correct interpretation of results from statistical models often requires more detailed knowledge than is commonly appreciated
We illustrate some approaches to best practice in this area
This seminar is primarily aimed at quantitative social researchers working with micro-social survey data
3
Principles of model construction and interpretation
y_i = β0 + β1X1 + …. + βkXk + u_i
Today we are interested in –
“What does β tell us?”
“Where’s the action?”
Going beyond “significance and sign”
4
Statistical Models
The idea of generalized linear models (glm) brings together a wealth of disparate topics – thinking of these models under a general umbrella term aids interpretation
Now I would say that generalized linear and mixed models (glmm) are the natural extension
5
Statistical Modelling Process
Model formulation [make assumptions]
Model fitting [quantify systematic relationships & random variation]
(Model criticism) [review assumptions]
Model interpretation [assess results]
Davies and Dale, 1994, p.5
6
Building Models
REMEMBER – real data are much messier and more badly behaved than the data used in books and at workshops (in real life people do odd stuff), and the resulting models are harder to interpret
7
Building Models
Many of you are experienced data analysts (otherwise see our handout)
Always be guided by substantive theory (the economists are good at this – but a bit rigid)
Consider the “functional form” of the variables (especially the outcome)
Start with “main effects” – more complicated models later
8
How Long are Three Pieces of String?
9
Some Common Models
Continuous Y – Linear regression
Binary Y – Logit; Probit
Categorical Y – Multinomial logit
Ordered categorical Y – Continuation ratio; Cumulative logit
Count Y – Poisson
Repeated binary Y – Panel probit (logit)
10
I must not use Stepwise Regression
11
A very very simple example
A fictitious data set based on a short steep Scottish hill race (record time 31 minutes; 5 miles and 1,200 feet of ascent)
A group of 73 male runners
Times 32 – 60 minutes; mean 42.7; s.d. 8.32
Heights 60 – 70 inches (5 ft to 6 ft)
Weights 140 – 161 lbs (10 st to 11 st 7 lb)
Everyone finishes (i.e. no censored cases)
12
A (vanilla) Regression
Simple Stata output…

regress time height weight

------------------------------------------------------------------------------
        time |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      height |   1.010251   .0813485    12.42   0.000     .8480067    1.172495
      weight |   .7369447   .0370876    19.87   0.000     .6629759    .8109135
       _cons |  -131.5619   6.834839   -19.25   0.000    -145.1936   -117.9303
------------------------------------------------------------------------------
13
A (vanilla) Regression (output as above)
On average (ceteris paribus) a one unit change in weight (lbs) leads to an increase of .74 minutes in the runner’s time
14
A (vanilla) Regression (output as above)
On average (ceteris paribus) a one unit change in height (inches) leads to an increase of 1 minute in the runner’s time (remember this is a fell race – being too tall does not necessarily help you)
15
A (vanilla) Regression (output as above)
This is the intercept β0 – in this model it is the time (on average) that a person who was 0 inches tall and 0 pounds would take?
16
A (vanilla) Regression
A better parameterized model – height centred at 60 inches; weight centred at 140 lb

------------------------------------------------------------------------------
        time |      Coef.   Std. Err.      t    P>|t|
-------------+----------------------------------------
     height0 |   1.010251   .0813485    12.42   0.000
     weight0 |   .7369447   .0370876    19.87   0.000
       _cons |   32.22542   .5126303    62.86   0.000
------------------------------------------------------------------------------

This is the intercept β0 – in this model it is the time (on average) that a runner who is 60 inches tall and 140 pounds would take
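The centred variables can be created before refitting – a minimal sketch in Stata, using the variable names from the output above:

  * centre height at 60 inches and weight at 140 lb
  generate height0 = height - 60
  generate weight0 = weight - 140
  regress time height0 weight0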
17
A (vanilla) Regression

regress time height0 weight0, beta

------------------------------------------------------------------------------
        time |      Coef.   Std. Err.      t    P>|t|       Beta
-------------+----------------------------------------------------
     height0 |   1.010251   .0813485    12.42   0.000   .4659946
     weight0 |   .7369447   .0370876    19.87   0.000   .7456028
       _cons |   32.22542   .5126303    62.86   0.000
------------------------------------------------------------------------------

Standardized beta coefficients are reported instead of confidence intervals
The beta coefficients are the regression coefficients obtained by first standardizing all variables to have a mean of 0 and a standard deviation of 1
Beta coefficients can be useful when comparing the effects of variables measured on different scales (i.e. in different units such as inches and pounds)
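The Beta column can be reproduced by standardizing all of the variables first – a sketch using egen’s std() function (the z_ names are just illustrative):

  * standardize the outcome and predictors to mean 0, s.d. 1
  egen z_time = std(time)
  egen z_height = std(height0)
  egen z_weight = std(weight0)
  regress z_time z_height z_weight
  * the slope estimates now match the Beta column above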
18
X Variable Measurement – e.g. Age
Linear units (e.g. months)
Resolution of measurement (are years better?)
Is a squared term appropriate? (e.g. age may not be linear in employment models)
A banded variable (age bands allow the direction of the effect to change; e.g. women’s employment behaviour at 20-29 might be different to 30-39)
19
Binary Outcomes
The logit model is popular in sociology, social geography, social policy, education etc.
The probit model is more widely used in economics
20
Example Drew, D., Gray, J. and Sime, N. (1992) Against the odds: The Education and Labour Market Experiences of Black Young People
21
The deviance is sometimes called G²
It is −2 × log likelihood
It has a chi-squared distribution with associated degrees of freedom
22
The degrees of freedom for the explanatory variable
23
The estimate. Also known as the ‘coefficient’, ‘log odds’ or ‘parameter estimate’ – beta (β)
Measured on the log scale
24
This is the standard error of the estimate
Measured on the log scale
25
This is the odds ratio. It is the exponential (i.e. the anti-log) of the estimate.
26
Comparison of Odds
Greater than 1 – “higher odds”
Less than 1 – “lower odds”
27
Naïve Odds
In this model (after controlling for other factors):
White pupils have an odds of 1.0
Afro-Caribbean pupils have an odds of 3.2
Reporting this in isolation is a naïve presentation of the effect because it ignores other factors in the model
28
A Comparison
First pupil: 4+ Higher passes; White; professional parents; male; graduate parents; two-parent family
Second pupil: 0 Higher passes; Afro-Caribbean; manual parents; male; non-graduate parents; one-parent family
30
Odds are multiplicative

                        First pupil   Second pupil
  4+ Higher grades          1.0           1.0
  Ethnic origin             1.0           3.2
  Social class              1.0           0.5
  Gender                    1.0           1.0
  Parental education        1.0           0.6
  No. of parents            1.0           0.9
  Odds                      1.0           0.86
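The final row is simply the product of the rows above it; for the second pupil, for example:

  display 1.0 * 3.2 * 0.5 * 1.0 * 0.6 * 0.9    // = .864, i.e. odds of roughly 0.86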
31
Naïve Odds
Drew, D., Gray, J. and Sime, N. (1992) warn of this danger….
…Naïvely presenting isolated odds ratios is still widespread (e.g. Connolly 2006, British Educational Research Journal 32(1), pp. 3-21)
We should avoid reporting isolated odds ratios where possible!
32
Logit scale
Generally, people find it hard to directly interpret results on the logit scale – i.e. as log odds, log[p/(1−p)]
33
Log Odds, Odds, Probability
Log odds converted to odds: odds = exp(log odds)
Probability = odds / (1 + odds)
Odds = probability / (1 − probability)
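As a quick worked check (using the log odds of 1.39 that appears in the table on the next slide):

  display exp(1.39)                      // odds = 4.01
  display exp(1.39) / (1 + exp(1.39))    // probability = .80
  display .8 / (1 - .8)                  // back from probability to odds = 4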
34
Log Odds, Odds, Probability

   Odds   ln odds      p
  99.00      4.60   0.99
  19.00      2.94   0.95
   9.00      2.20   0.90
   4.00      1.39   0.80
   2.33      0.85   0.70
   1.50      0.41   0.60
   1.00      0.00   0.50
   0.67     -0.41   0.40
   0.43     -0.85   0.30
   0.25     -1.39   0.20
   0.11     -2.20   0.10
   0.05     -2.94   0.05
   0.01     -4.60   0.01

Odds are asymmetric – beware!
35
A Simple Stata Example
Youth Cohort Study (1990); n = c.14,000 16-17 year olds
y = 1: pupil has 5+ GCSE passes (grade A*-C)
X vars: gender; parents in service class (NS-SEC)
36
Stata output – logit

Logistic regression                               Number of obs   =      14022
                                                  LR chi2(2)      =     807.67
                                                  Prob > chi2     =     0.0000
Log likelihood = -9260.22                         Pseudo R2       =     0.0418

-------------------------------------------------------------------------------
     t0fiveac |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         boys |  -.1495507   .0349946    -4.27   0.000     -.218139   -.0809625
service class |   1.398813   .0526951    26.55   0.000     1.295532    1.502093
        _cons |   -.309116   .0247608   -12.48   0.000    -.3576462   -.2605857
-------------------------------------------------------------------------------
37
Stata output – logit (output as above)
Estimates are log odds – sign = direction; size = strength
38
Stata output – logit (output as above)
Standard errors are also measured on the logit scale
Small standard errors indicate better precision of the coefficient (estimate; beta)
39
Stata output – logit (output as above)
β / s.e.(β) gives the z statistic; the Wald χ² = (β / s.e.(β))² at 1 d.f.
A very crude test of significance is whether β is at least twice its standard error
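For example, for the boys coefficient in the output above:

  display -.1495507 / .0349946         // z = -4.27
  display (-.1495507 / .0349946)^2     // Wald chi-squared at 1 d.f. = 18.3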
40
Stata output – logit (output as above)
Formal significance test (p values)
41
Stata output – logit (output as above)
Confidence interval of β (on the logit scale): β ± (1.96 × standard error), e.g. -.15 ± (1.96 × .035)
Remember: if the confidence interval does not include zero, β is significant
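The reported interval for boys can be reproduced (to rounding) as:

  display -.1495507 - 1.96 * .0349946    // lower limit = -.218
  display -.1495507 + 1.96 * .0349946    // upper limit = -.081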
42
A Thought on Goodness of Fit
Standard linear models: R² is an easy, consistent measure of goodness of fit
Nested models: the change in deviance (G²) follows a chi-square distribution (with associated d.f.)
Non-nested non-linear models: changes in deviance cannot be compared AND there is no direct equivalent of R² (e.g. logit models from two different surveys)
Various ‘pseudo’ R² measures – none take on the full 0–1 range and technically they should not be used to compare non-nested models (but they may be adequate in many practical situations)
43
A Thought on Goodness of Fit
The handy spost9_ado add-on for Stata produces a number of ‘pseudo’ R² measures
Discussion at http://www.ats.ucla.edu/stat/mult_pkg/faq/general/Psuedo_RSquareds.htm
Some analysts use Bayesian Information Criterion (BIC) type measures – these evaluate (and favour) parsimony – possibly a good idea for comparing across models
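These can be obtained after fitting, for example, the YCS logit above – a minimal sketch, assuming spost9_ado is installed and that the service class dummy is named servclass (a hypothetical name):

  logit t0fiveac boys servclass
  fitstat      // pseudo-R2 measures from the spost9_ado add-on
  estat ic     // AIC and BIC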
44
Probit / Logit
Converting between probit and logit: probit × 1.6, or logit / 1.6 (Amemiya 1981)
Logit or probit? Some say logit for a purely discrete Y (e.g. pregnancy); probit appeals to an underlying continuous distribution
Some people make silly claims (e.g. the case of unemployment in Germany)
45
Logit / Probit

                    Logit    s.e.       z     Probit    s.e.       z    Conversion
  boys              -0.15    0.03   -4.27      -0.09    0.02   -4.27        -0.15
  service class      1.40    0.05   26.55       0.86    0.03   27.42         1.38
  _cons             -0.31    0.02  -12.48      -0.19    0.02  -12.58        -0.31

Generally, substantive inference is the same and the models will have similar log likelihoods, pseudo R² etc.
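The conversion column (probit estimate × 1.6) can be checked directly:

  display -0.09 * 1.6    // -.14, close to the logit estimate of -.15
  display  0.86 * 1.6    //  1.38, close to the logit estimate of 1.40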
46
Probit β is expressed on the standard cumulative normal scale Φ( )
Unlike logit, a calculator might not have the appropriate function
Use software or Excel [=NORMSDIST()]
47
Probit
Probability of 5+ GCSE (A*-C) passes:
girl, non-service class family: Φ(-.19) = .42
boy, non-service class family: Φ(-.19 - .09) = .39
Gender effect = .03
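In Stata the same cumulative normal conversion can be done with the normal() function:

  display normal(-0.19)            // girl, non-service class: .42
  display normal(-0.19 - 0.09)     // boy, non-service class: .39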
48
Probit
Stata has dprobit
Here the coefficient is dF/dx, i.e. the effect of a discrete change of a dummy variable from 0 to 1
Continuous X vars are interpreted at their mean
Analysts often demonstrate specific values / combinations
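A minimal sketch for the YCS example, assuming the variables are named t0fiveac, boys and servclass (the last name is hypothetical):

  probit t0fiveac boys servclass
  dprobit t0fiveac boys servclass   // reports dF/dx rather than probit coefficients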
49
Categorical Data (Multinomial Logit)
Categorical Y – example: YCS 1990, what the pupil was doing in October after Year 11
0 Education
1 Unemployment
2 Training
3 Employment
50
Multinomial Logit
Multinomial logit model = pairs of logits
1 Education / 0 Unemployment
1 Education / 0 Training
1 Education / 0 Employment
The base category of y is y=1 (education) for these pairs of models
Betas are readily interpreted as in logit
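The output on the next slide can be produced with a command along these lines (variable names as they appear in that output):

  mlogit t1dooct4 girls, baseoutcome(0)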
51
Multinomial Logit

Multinomial logistic regression                   Number of obs   =      13925
                                                  LR chi2(3)      =      80.25
                                                  Prob > chi2     =     0.0000
Log likelihood = -12653.444                       Pseudo R2       =     0.0032

------------------------------------------------------------------------------
    t1dooct4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1. unemplo~t |
       girls |  -.0840972   .0977346    -0.86   0.390    -.2756536    .1074591
       _cons |  -3.041328   .0708045   -42.95   0.000    -3.180102   -2.902553
-------------+----------------------------------------------------------------
2. training  |
       girls |   -.245671   .0526523    -4.67   0.000    -.3488675   -.1424744
       _cons |  -1.604877   .0369626   -43.42   0.000    -1.677322   -1.532431
-------------+----------------------------------------------------------------
3. employm~t |
       girls |  -.3961514   .0477778    -8.29   0.000    -.4897941   -.3025087
       _cons |  -1.291088   .0325547   -39.66   0.000    -1.354894   -1.227282
------------------------------------------------------------------------------
(t1dooct4==0, education, is the base outcome)
52
The unemployment vs. education panel of the multinomial logit (as above):

       girls |  -.0840972   .0977346    -0.86   0.390    -.2756536    .1074591
       _cons |  -3.041328   .0708045   -42.95   0.000    -3.180102   -2.902553
(t1dooct4==0, education, is the base outcome)

A separate logit of Unemployment / Education gives essentially the same estimates:

Logistic regression                               Number of obs   =      10051
                                                  LR chi2(1)      =       0.74
                                                  Prob > chi2     =     0.3899
Log likelihood = -1803.3779                       Pseudo R2       =     0.0002

------------------------------------------------------------------------------
    t1dooct4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       girls |  -.0840972   .0977345    -0.86   0.390    -.2756533    .1074589
       _cons |  -3.041328   .0708044   -42.95   0.000    -3.180102   -2.902554
53
Multinomial Logit
The multinomial logit model is NOT an ordinal model
1 Education / 0 Unemployment
1 Education / 0 Training
1 Education / 0 Employment
It says nothing about Unemployment / Training, Unemployment / Employment or Training / Employment
54
Data with Ordinal Outcomes
A large amount of the data analysed within sociological studies consists of categorical outcome variables that can plausibly be considered as having a substantively interesting order (for example levels of attainment of educational qualifications)
Standard log-linear models do not take ordinality into account
55
Data with Ordinal Outcomes
Two different models:
Continuation ratio model
Proportional odds model
Both models have ‘logit’ style interpretations
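Stata’s ologit command fits the proportional odds model – a minimal sketch, assuming an ordered attainment outcome named attain (a hypothetical name) and the YCS gender dummy:

  ologit attain girls     // proportional odds (ordered logit) model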
56
Reversing Category Codes – Proportional Odds Model

  Categories      Cut pt A   0 1 2 3
                  Cut pt B   0 1 2 3
                  Cut pt C   0 1 2 3

  Reversed        Cut pt A   3 2 1 0
                  Cut pt B   3 2 1 0
                  Cut pt C   3 2 1 0

Results reversed (signs); substantive meaning not changed – this can work well with attitude scales!
57
Reversing Category Codes – Continuation Ratio Model

  Categories      Cut pt A   0 1 2 3
                  Cut pt B     1 2 3
                  Cut pt C       2 3

  Reversed        Cut pt A   3 2 1 0
                  Cut pt B     2 1 0
                  Cut pt C       1 0

Results and substantive meaning are changed – not palindromically invariant
58
The βs that refer to the cut points (or partitions) in these two ordinal models have slightly different interpretations
59
Some thoughts on these ordinal models
Proportional odds model
– Palindromic invariance (e.g. attitudinal scores)
– Motivated by an appeal to the existence of an underlying continuous, and perhaps unobservable, random variable
Continuation ratio model
– Natural base line (hierarchy in data)
– Single direction of movement
– Categories of Y really are discrete
– Y categories denote a shift or change from one state to another, not a coarse grouping of some finer scale
60
Poisson Regression
Poisson regression is used to fit models to the number of occurrences (counts) of an event
– Especially relevant if the outcome has few values, or is a rate
– Although, in some circumstances counts can reasonably be modelled as continuous outcomes – e.g. a wide range of different counts, and a lack of clustering around 0
Examples of the Poisson distribution:
Soldiers kicked to death by horses (Bortkewitsch 1898)
Patterns of buzz bomb launches against London in WWII (Clarke 1946)
Telephone wrong numbers (Thorndike 1926)
61
Poisson Regression Example
Coronary heart disease among male British doctors (Doll & Hill 1966)
y (count): deaths / person-years
X variables: age bands; smokers
β has a logit-style interpretation – when exp(β) is reported it is often termed an incidence rate ratio

poisson deaths smokes agecat2-agecat5, exposure(pyears) irr
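Fitting without the irr option and exponentiating the coefficient gives the same incidence rate ratio – e.g. for smokers:

  poisson deaths smokes agecat2-agecat5, exposure(pyears)
  display exp(_b[smokes])   // incidence rate ratio for smokers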
62
Some More Complex Models
63
Panel Analysis
Fixed or random effects estimators
Fierce debate:
– F.E. β will be consistent
– R.E. standard errors will be efficient but β may not be consistent
– F.E. models can’t estimate time-constant X vars
– R.E. assumes no correlation between observed X variables and unobserved characteristics
64
Panel Analysis
Fixed or random effects
– Economists tend towards F.E. (attractive property of consistent β)
– With continuous Y there is little problem: fit both F.E. and R.E. models and then Hausman test f.e. / r.e. (don’t be surprised if it points towards the F.E. model)
– Some estimators (xtprobit) don’t have F.E. equivalents (xtlogit F.E. is not equivalent to R.E.)
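A minimal sketch of that comparison for a continuous outcome, using generic names (id, wave, y, x1) for the panel identifiers and variables:

  xtset id wave
  xtreg y x1, fe
  estimates store fe
  xtreg y x1, re
  estimates store re
  hausman fe re      // compares the F.E. and R.E. estimates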
65
An example
Married women’s employment (SCELI data)
y: is the woman working? yes=1; no=0
x: woman has a child aged under 1 year
I have contrived this illustration….
66
                      Probit              Pooled probit
                        β      s.e.         β      s.e.
  Child under 1       -1.95    0.56       -1.95    0.40
  Constant             0.67    0.14        0.67    0.10
  Log likelihood      -54.70             -109.39
  n                     101                 202
  Pseudo R²             0.13
  Clusters               --                  --

β consistent, smaller standard errors (double the sample size) – but Stata thinks that there are 202 individuals and not 101 people surveyed in two waves!
67
                      Probit          Pooled probit    Robust (clustered)
                        β     s.e.      β     s.e.       β     s.e.
  Child under 1       -1.95   0.56    -1.95   0.40     -1.95   0.56
  Constant             0.67   0.14     0.67   0.10      0.67   0.14
  Log likelihood      -54.70         -109.39
  n                     101            202
  Pseudo R²             0.13
  Clusters               --             --               101

β consistent – standard errors are now corrected: Stata knows that there are 101 individuals (i.e. repeated measures)
68
                      Probit          Pooled probit    Robust (clustered)   R.E. probit
                        β     s.e.      β     s.e.       β     s.e.           β     s.e.
  Child under 1       -1.95   0.56    -1.95   0.40     -1.95   0.56        -19.41   1.22
  Constant             0.67   0.14     0.67   0.10      0.67   0.14          6.39   0.28
  Log likelihood      -54.70         -109.39                               -49.57
  n                     101            202
  Pseudo R²             0.13
  Clusters               --             --               101

Beware – β and the standard errors are no longer measured on the same scale
Stata knows that there are 101 individuals (i.e. repeated measures)
69
β in Binary Panel Models
The β in a probit random effects model is scaled differently – Mark Stewart suggests comparing β(r.e.) × √(1 − rho) with the pooled probit β
rho is analogous to an ICC – the proportion of the total variance contributed by the person-level variance
Panel logit models also have this issue!
70
β in Binary Panel Models
Conceptually there are two types of β in a binary random effects model
X is time-changing – β is the ‘effect’ for a woman of changing her value of X
X is fixed in time – β is analogous to the effect for two women (e.g. Chinese / Indian) with the same value of the random effect (e.g. u_i = 0)
For fixed-in-time X, Fiona Steele suggests simulating to get a more appropriate value of β
71
Population Average Models / Marginal Models
Time-constant X variables are usually analytically important
Is a model that accounts for clustering between individuals adequate?
Simple pop. average model: logit y x1, cluster(id)
Population average models are becoming more popular (Pickles – preference in the USA in public health)
Marginal modelling / GEE approaches are developing rapidly (e.g. estimating a policy or ‘social group’ difference)
When do we need ‘subject-specific’ random/fixed effects? When ‘frailty’ or unobserved heterogeneity are important
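A GEE (population average) version of the same logit could be sketched as follows, again assuming the data are declared as a panel on the person identifier id:

  xtset id
  xtgee y x1, family(binomial) link(logit) corr(exchangeable)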
72
Conclusions
The results of statistical models can easily be misrepresented
The correct interpretation of results from statistical models often requires more detailed knowledge than is commonly appreciated
Social science analysts should pay more attention to developing the appropriate model context
– Knowing about a wider range of glms / glmms is important
– Thinking about the exact interpretation of β will help