Tue 8-10, Period III, Jan-Feb 2018

Applied quantitative analysis: a practical introduction
SOSM-405 (5 cr), Session 6
Tue 8-10, Period III, Jan-Feb 2018
Faculty of Social Sciences / University of Helsinki
Teemu Kemppainen

Contents
- Statistical inference, causality, regression: a clarifying example (hopefully)
- Regression model evaluation & diagnostics → some basic notions. Don't worry!
- Factor analysis, manual sum variables and regression: some important observations
- Dummy coding

Statistical inference - take 2
- Population: we first assume H0, e.g. the difference between two groups = 0
- Random sample → sampling error, sampling distribution
- Non-response
- Data
- Different sampling techniques
- Statistical inference, e.g. p < 0.05 → a result at least this extreme would be quite rare (it would occur in fewer than 5% of random samples) if H0 were true
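The logic above (assume H0, then ask how rare the observed result would be) can be made concrete with a permutation test. This is a swapped-in technique, not the one on the slide: it reshuffles the group labels instead of drawing new random samples, but the resulting p-value is read the same way. A minimal sketch using only the Python standard library; the function name and the group scores are hypothetical.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_perm=10_000, seed=1):
    """Two-sided permutation test for a difference in group means.

    Under H0 (no difference between the groups) the group labels are
    interchangeable, so we shuffle them many times and count how often a
    shuffled difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # +1 in numerator and denominator counts the observed split itself,
    # so the p-value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical scores for two groups (made-up numbers, not survey data)
group_a = [5.1, 6.3, 5.8, 6.9, 6.1, 5.5, 6.4, 7.0]
group_b = [4.9, 5.2, 5.6, 4.7, 5.9, 5.0, 5.3, 4.8]
print(f"p = {permutation_p_value(group_a, group_b):.4f}")
```

A small p means that a difference this large seldom appears when the labels carry no information, which is what "quite rare when H0 is true" refers to.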

Causality and regression revisited
- X: IV, predictor
- Y: DV, outcome
- Confounders (control variables, adjustments)
- Mediators (intermediate variables, mechanism)
- Modifiers (interaction)

Example: a typical cross-sectional survey study

RQs
1. How is the tenure structure of the estate related to perceived social disorder?
2. Does local social disadvantage mediate the association?
3. To what extent do social interaction and normative regulation of the estate explain why more disadvantaged estates expose their residents to social disorder?
→ Two levels, residents nested in neighbourhoods → hierarchical design → multilevel regression

Diagram - logic of the study

Demonstration & statistical inference 1
Univariate descriptives

Demonstration & statistical inference 2
Bivariate associations: descriptives

Adjusted associations: regression
Ex. cont'd (RQ 1) – regression model elaboration: I (bivariate/crude model) → II (confounders) → III (mediators)
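Step I → II of the elaboration can be sketched with simulated data in which a confounder z drives both x and y. This is a minimal standard-library sketch, not the study's actual models: the data are made up, the true effect of x is set to 0.5, and the adjusted coefficient is obtained through the Frisch-Waugh-Lovell theorem (regress the z-adjusted y on the z-adjusted x), which gives the same coefficient as a regression of y on both x and z. All names are hypothetical.

```python
import random


def ols_slope(x, y):
    """OLS slope from a simple regression of y on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(v, z):
    """Residuals of v after a simple regression of v on z."""
    b = ols_slope(z, v)
    a = sum(v) / len(v) - b * sum(z) / len(z)
    return [vi - (a + b * zi) for vi, zi in zip(v, z)]


def adjusted_slope(x, y, z):
    """Coefficient of x in y ~ x + z (Frisch-Waugh-Lovell):
    regress the z-adjusted y on the z-adjusted x."""
    return ols_slope(residualize(x, z), residualize(y, z))


# Hypothetical simulated data: z confounds x and y; true effect of x is 0.5
rng = random.Random(42)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + rng.gauss(0, 1) for zi in z]
y = [0.5 * xi + 1.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

crude = ols_slope(x, y)             # Model I: bivariate / crude
adjusted = adjusted_slope(x, y, z)  # Model II: confounder added
print(f"Model I (crude):       b = {crude:.2f}")
print(f"Model II (confounder): b = {adjusted:.2f}")
```

The crude slope comes out near 1, roughly double the true effect, because part of the x-y association runs through z. Adding a mediator in Model III would also shrink the coefficient, but there the drop is read as the share of the effect carried by the mechanism, not as bias.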

Regression models
- Cf. description …
- "All models are wrong but some are useful" (George Box)
- How to evaluate the model? Or, when is our model good / useful?
- Plain reasoning: we obtain a clear(er) idea of what is going on → e.g. confounders are taken into account, a possible mechanism is elucidated
- Technical aspect: regression diagnostics

Regression diagnostics 1
- Is the model correctly specified? It should include all relevant variables and nothing else → What is relevant? Confounders vs. mediators (see the causality diagram and the model elaboration above). A theoretical, not a technical, matter!
- Quality of measurement is sufficient for all variables → reliability and validity (session 4)
- The DV-IV relationship is linear (a straight line) → the model may incorporate curvilinearities, e.g. a square term
- Multicollinearity → loss of power, larger standard errors
- Outliers, weird cases? → quite rare in typical surveys
- Important: check the residuals (next slides)
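The "larger standard errors" point can be quantified with the variance inflation factor (VIF), a standard diagnostic that the slide does not name. A minimal sketch for the special case of exactly two IVs, where VIF = 1 / (1 - r^2); the data and function names are hypothetical, and in practice most regression software can report VIFs for you.

```python
import math
import random


def correlation(x, y):
    """Pearson correlation coefficient."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """Variance inflation factor for a model with exactly two IVs.

    In general VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    IV j on the other IVs. With only two IVs, R_j^2 is simply r^2, so
    both IVs get the same VIF.
    """
    r = correlation(x1, x2)
    return 1 / (1 - r ** 2)


# Hypothetical IVs: x2 is almost a copy of x1
rng = random.Random(7)
x1 = [rng.gauss(0, 1) for _ in range(1000)]
x2 = [a + rng.gauss(0, 0.3) for a in x1]
vif = vif_two_predictors(x1, x2)
# The sampling variance of each coefficient is multiplied by VIF,
# so its standard error is multiplied by sqrt(VIF).
print(f"VIF = {vif:.1f}; SEs about {math.sqrt(vif):.1f} times those with uncorrelated IVs")
```

With x2 almost a copy of x1 the VIF lands well above 5, so each coefficient's standard error is more than twice what it would be with uncorrelated IVs.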

Regression diag. 2: residuals
Haslwanter 2013, Residuals for Linear Regression Fit. Wikimedia Commons.

Regression diag. 3: residuals
- Residuals are distributed with a mean of 0, following a normal distribution
- Residuals should be independent of each other → important! E.g. ESS (European Social Survey): if all Finns have a negative residual → problem!
- Residuals should be random → no information, no structure, just random noise
- Constant variance (homoskedasticity)
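The independence point (the ESS example) can be checked by looking at residual means within groups. A minimal sketch with hypothetical simulated data: the outcome is lower among "FI" respondents, but the model leaves country out, so the FI residuals are systematically negative. Names and numbers are illustrative only.

```python
import random
import statistics
from collections import defaultdict


def fit_simple_ols(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b * mx, b


def residual_means_by_group(x, y, groups):
    """Overall residual mean and the residual mean within each group."""
    a, b = fit_simple_ols(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    by_group = defaultdict(list)
    for g, r in zip(groups, resid):
        by_group[g].append(r)
    return statistics.fmean(resid), {g: statistics.fmean(rs) for g, rs in by_group.items()}


# Hypothetical data: the outcome is lower in "FI", but the model ignores country
rng = random.Random(3)
groups = ["FI" if i % 2 == 0 else "other" for i in range(2000)]
x = [rng.gauss(0, 1) for _ in groups]
y = [1 + 0.5 * xi + (-1.0 if g == "FI" else 0.0) + rng.gauss(0, 1)
     for xi, g in zip(x, groups)]

overall, per_group = residual_means_by_group(x, y, groups)
print(f"overall mean residual: {overall:.3f}")
for g, m in per_group.items():
    print(f"  {g}: {m:+.2f}")
```

The overall residual mean is zero by construction whenever the model has an intercept, so on its own it says little; the group means reveal the structure. Adding country as a dummy (or using a multilevel model, as in the estate example) would absorb it.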

Check the UCLA site for more information (if/when you need it)
- SPSS and regression:
- SPSS and regression diagnostics:
- Stata and regression diagnostics:

EFA, sum variables and regression 1
[Diagram: survey items V1-V10 are used to form an indicator for value conservatism and an indicator for the political left-right scale]
Why? Content validity & reliability → better measurement, cf. regression evaluation
Measurement error: see e.g. Vehkalahti, slides 4.
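A manual sum variable is often just each respondent's mean (or sum) over the items. A minimal sketch, assuming the items are already coded in the same direction and there are no missing answers; reliability is checked here with Cronbach's alpha, one common coefficient (the slide does not specify which one). The function names and answers are hypothetical.

```python
import statistics


def sum_variable(rows):
    """Manual sum variable: each respondent's mean over the items.

    Assumes all items are coded in the same direction (reverse-coded
    items recoded beforehand) and that there are no missing values.
    """
    return [statistics.fmean(r) for r in rows]


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items table (list of rows)."""
    k = len(rows[0])                      # number of items
    columns = list(zip(*rows))            # one tuple per item
    item_variances = [statistics.variance(col) for col in columns]
    total_variance = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five respondents to three items (1-5 scale)
answers = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
]
print(sum_variable(answers))
print(f"alpha = {cronbach_alpha(answers):.2f}")
```

Unlike Varimax factor scores, a sum variable built this way keeps its natural correlations with other variables, which is one reason it is often a convenient alternative.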

EFA, sum variables and regression 2
- One or many dimensions in EFA?
- Factor score variables are often "extracted" with Varimax rotation → the resulting indicators do not correlate with each other
- If the DV and IVs are in the same EFA… problems! The rotation forces their factor scores to be (nearly) uncorrelated, so the association you want to study is suppressed by construction
- Only IVs in the EFA → might be handy!
- There are "oblique" rotations… interpretation?
- Manual sum variables are often a good alternative
- Or take the DV and IVs into separate EFAs
- Cf. the example above

Dummy coding
Enables full control over categorical IVs
- Different programs/procedures treat categorical IVs differently → a real nuisance!
- E.g. education, values 1/2/3/4
- Make three dummy variables (0/1): e.g. edu2, edu3, edu4
- The omitted category (edu1) is the "reference" category
- Interpret the results separately for each dummy: difference to the reference (p-value)
- Which differences are important in your analysis? → choose the reference category accordingly!
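The dummy recipe can be sketched in a few lines of Python. A minimal sketch with hypothetical data and function names: `dummy_code` builds edu2, edu3 and edu4 with edu1 as the reference, and `differences_to_reference` shows what each dummy coefficient means when the dummies are the only IVs.

```python
import statistics


def dummy_code(values, categories, reference, prefix):
    """One 0/1 dummy per category, except the omitted reference category."""
    dummies = {}
    for cat in categories:
        if cat == reference:
            continue  # the omitted category becomes the reference
        dummies[f"{prefix}{cat}"] = [1 if v == cat else 0 for v in values]
    return dummies


def differences_to_reference(y, values, reference):
    """Group mean of y minus the reference group's mean, per category.

    With only the dummies as IVs, these equal the OLS dummy coefficients.
    """
    groups = {}
    for v, yi in zip(values, y):
        groups.setdefault(v, []).append(yi)
    ref_mean = statistics.fmean(groups[reference])
    return {cat: statistics.fmean(ys) - ref_mean
            for cat, ys in sorted(groups.items()) if cat != reference}


education = [1, 2, 4, 3, 1, 2]       # hypothetical education values 1/2/3/4
outcome = [10, 12, 20, 15, 10, 14]   # hypothetical outcome
print(dummy_code(education, [1, 2, 3, 4], reference=1, prefix="edu"))
# {'edu2': [0, 1, 0, 0, 0, 1], 'edu3': [0, 0, 0, 1, 0, 0], 'edu4': [0, 0, 1, 0, 0, 0]}
print(differences_to_reference(outcome, education, reference=1))
# {2: 3.0, 3: 5.0, 4: 10.0}
```

Changing `reference=1` to `reference=4` gives edu1, edu2 and edu3 instead, and every coefficient then becomes a difference to the highest education group.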

Next lecture: logistic regression
- How to analyse binary/dichotomous/dummy outcomes
- E.g. death, medications, crimes (registers)
- Good self-rated health, moving intentions, insecurity in the neighbourhood (surveys, often used as dummies)
- OLS / LPM (linear probability model) / logistic or logit / probit …
- Conventions vary by discipline, research area etc.
- Sociology and logistic regression?

Questions? Let’s practise!
