1
Applied quantitative analysis: a practical introduction, SOSM-405 (5 cr), Session 6
Tue 8-10, Period III, Jan-Feb 2018
Faculty of Social Sciences / University of Helsinki
Teemu Kemppainen
2
Contents
Statistical inference, causality, regression: a clarifying example (hopefully)
Regression model evaluation & diagnostics: some basic notions. Don’t worry!
Factor analysis, manual sum variables and regression: some important observations
Dummy coding
3
Statistical inference - take 2
Population: we first assume H0, e.g. that the difference between two groups = 0
Random sample (different sampling techniques) -> sampling error, sampling distribution
Non-response
Data
Statistical inference: e.g. p < 0.05 means that a result like the one in the data would be quite rare if H0 were true
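The logic of this slide can be tried out with a small simulation. The sketch below is only an illustration under stated assumptions: both groups are drawn from the same normal population (so H0 is true by construction), and the group size, number of samples and the two "observed" differences are made-up values, not course data.

```python
import random
import statistics


def simulate_null_differences(n_per_group, n_samples, seed=0):
    """Draw repeated random samples of two groups from ONE population
    (H0 true) and return the difference in group means for each sample."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_samples):
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        diffs.append(statistics.mean(group_a) - statistics.mean(group_b))
    return diffs


def share_at_least(diffs, observed):
    """Share of simulated differences at least as large (in absolute
    value) as the observed one: a simulated two-sided p-value."""
    return sum(abs(d) >= abs(observed) for d in diffs) / len(diffs)


# Hypothetical settings: 50 respondents per group, 2000 repeated samples.
diffs = simulate_null_differences(n_per_group=50, n_samples=2000)
p_small = share_at_least(diffs, observed=0.1)  # a small observed difference
p_large = share_at_least(diffs, observed=0.5)  # a large observed difference
print(f"p for difference 0.1: {p_small:.3f}")
print(f"p for difference 0.5: {p_large:.3f}")
```

The list `diffs` is the simulated sampling distribution: the differences vary around 0 purely because of sampling error. A small observed difference is common under H0, whereas a large one is rare, which is what a small p-value expresses.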
4
Causality and regression revisited
X: IV, predictor
Y: DV, outcome
Mediators (intermediate vars, mechanism)
Modifiers (interaction)
Confounders (control vars, adjustments)
5
Example: a typical cross-sectional survey study
6
RQ’s
1. How is the tenure structure of the estate related to perceived social disorder?
2. Does local social disadvantage mediate the association?
3. To what extent do social interaction and normative regulation of the estate explain why more disadvantaged estates expose their residents to social disorder?
Two levels, residents in neighbourhoods -> hierarchical design -> multi-level regression
7
Diagram - logic of the study
8
Demonstration & statistical inference 1
Univariate descriptives
9
Demonstration & statistical inference 2
Bivariate associations: descriptives
10
Adjusted associations: regression
Ex. cont’d (RQ 1): regression model elaboration
Model I: bivariate/crude model
Model II: + confounders
Model III: + mediators
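The step from a crude model to an adjusted one can be illustrated with simulated data. This is a minimal sketch, not the analysis of the example study: it uses ordinary single-level OLS with one confounder (the study itself uses multi-level regression), and the variables `x`, `z`, `y` and all coefficients are invented for the illustration.

```python
import random


def cov(u, v):
    """Sample covariance of two equally long lists."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (len(u) - 1)


def crude_slope(x, y):
    """Model I: slope of y on x alone."""
    return cov(x, y) / cov(x, x)


def adjusted_slope(x, z, y):
    """Model II: coefficient of x in an OLS regression of y on x and z
    (closed-form solution for two predictors)."""
    denominator = cov(x, x) * cov(z, z) - cov(x, z) ** 2
    return (cov(x, y) * cov(z, z) - cov(z, y) * cov(x, z)) / denominator


# Assumed data-generating process: z confounds the x-y association.
rng = random.Random(42)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]                # confounder
x = [zi + rng.gauss(0, 1) for zi in z]                 # z affects x
y = [0.5 * xi + 1.0 * zi + rng.gauss(0, 1)             # true effect of x: 0.5
     for xi, zi in zip(x, z)]

crude = crude_slope(x, y)
adjusted = adjusted_slope(x, z, y)
print(f"Model I  (crude):    {crude:.2f}")
print(f"Model II (adjusted): {adjusted:.2f}")
```

Under these assumed coefficients the crude slope should land near 1.0 and the adjusted slope near the true 0.5. Adding a mediator in Model III uses the same arithmetic; whether a variable counts as a confounder or a mediator is decided by theory, not by the computation.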
11
Regression models
Cf. description …
”All models are wrong but some are useful” (George Box)
How to evaluate the model? Or, when is our model good / useful?
Plain reason: we obtain a clear(er) idea of what is going on, e.g. confounders taken into account, possible mechanism elucidated
Technical aspect: regression diagnostics
12
Regression diagnostics 1
Is the model correctly specified?
  It should include all relevant variables and nothing else.
  What is relevant? Confounders vs. mediators; see slide 10 above. A theoretical, not a technical matter!
Quality of measurement is sufficient for all variables: reliability and validity (session 4)
DV-IV relationship: linear (straight line). The model may incorporate curvilinearities, e.g. a square term.
Multicollinearity: loss of power, larger standard errors
Outliers, weird cases? Quite rare in typical surveys
Important: check residuals… (next slides)
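One common way to quantify multicollinearity is the variance inflation factor (VIF), which is not named on the slide and is used here only as an illustration. With two predictors it reduces to 1 / (1 - r²), where r is their correlation. The data below are simulated and the variable names are hypothetical.

```python
import math
import random


def pearson_r(u, v):
    """Pearson correlation of two equally long lists."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2).
    The standard error of each coefficient grows by sqrt(VIF)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


rng = random.Random(3)
n = 2000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2_independent = [rng.gauss(0, 1) for _ in range(n)]    # unrelated to x1
x2_collinear = [a + 0.3 * rng.gauss(0, 1) for a in x1]  # nearly a copy of x1

vif_independent = vif_two_predictors(x1, x2_independent)
vif_collinear = vif_two_predictors(x1, x2_collinear)
print(f"VIF, unrelated predictors:        {vif_independent:.2f}")
print(f"VIF, strongly related predictors: {vif_collinear:.2f}")
```

A VIF close to 1 means the predictors barely overlap. A large VIF means the model struggles to separate their contributions, which shows up as the larger standard errors and loss of power mentioned above.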
13
Regression diag. 2: residuals
Haslwanter 2013, Residuals for Linear Regression Fit. Wikimedia Commons.
14
Regression diag. 3: residuals
Residuals should be normally distributed with a mean of 0
Residuals should be independent of each other (important). E.g. in the ESS: if all Finns have a negative residual, that is a problem!
Residuals should be random: no information, no structure, just random noise
Constant variance (homoskedasticity)
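Two of these checks can be sketched with simulated data: the mean of the residuals and the constancy of their variance. The comparison of residual spread between the lower and upper half of x is a deliberately crude check chosen for this illustration, not a formal test, and both data sets are invented.

```python
import random
import statistics


def fit_line(x, y):
    """Simple OLS regression of y on x; returns (intercept, slope)."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """Observed y minus the value predicted by the fitted line."""
    intercept, slope = fit_line(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def spread_ratio(x, res):
    """SD of residuals in the upper half of x divided by the SD in the
    lower half. Values near 1 are consistent with constant variance."""
    pairs = sorted(zip(x, res))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return statistics.stdev(high) / statistics.stdev(low)


rng = random.Random(7)
x = [rng.uniform(0, 10) for _ in range(2000)]
# Assumed case 1: noise with the same spread everywhere (homoskedastic).
y_const = [1 + 2 * a + rng.gauss(0, 1) for a in x]
# Assumed case 2: noise that grows with x (heteroskedastic, "fan" shape).
y_fan = [1 + 2 * a + rng.gauss(0, 0.2 + 0.3 * a) for a in x]

res_const = residuals(x, y_const)
res_fan = residuals(x, y_fan)
ratio_const = spread_ratio(x, res_const)
ratio_fan = spread_ratio(x, res_fan)
print(f"mean residual:           {statistics.mean(res_const):.6f}")
print(f"spread ratio, constant:  {ratio_const:.2f}")
print(f"spread ratio, fan shape: {ratio_fan:.2f}")
```

In a model with an intercept the OLS residuals always average to 0, so that check passes by construction. The spread ratio is the more informative number here: it stays close to 1 in the first case and rises clearly above 1 in the fan-shaped case. In practice a plot of residuals against fitted values shows the same thing at a glance.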
15
Check the UCLA site for more information (if/when you need it)
SPSS and regression: SPSS and reg. diagnostics: Stata and reg. diag:
16
EFA, sum variables and regression 1
Survey items: V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
Value conservatism: indicator for value conservatism
Political left-right scale: indicator for political left-right scale
Why? Content validity & reliability -> better measurement, cf. regression evaluation.
Measurement error
See e.g. Vehkalahti, slides 4.
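A manual sum variable and its reliability can be sketched as follows. Cronbach's alpha is used here as one common reliability coefficient; the slide does not name a specific one, so that choice is an assumption. The five items and the respondents are simulated, not taken from any survey.

```python
import random
import statistics


def sum_variable(items):
    """Mean of the item scores for each respondent (a manual sum variable).
    `items` is a list of respondents, each a list of item scores."""
    return [statistics.mean(row) for row in items]


def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance
    of the total score)."""
    k = len(items[0])
    item_vars = [statistics.variance([row[j] for row in items]) for j in range(k)]
    total_var = statistics.variance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


rng = random.Random(11)
n_respondents, n_items = 1000, 5

# Assumed case 1: all items reflect one latent trait plus item-level noise.
coherent = []
for _ in range(n_respondents):
    trait = rng.gauss(0, 1)
    coherent.append([trait + rng.gauss(0, 1) for _ in range(n_items)])

# Assumed case 2: the items are pure noise and share nothing.
unrelated = [[rng.gauss(0, 1) for _ in range(n_items)]
             for _ in range(n_respondents)]

alpha_coherent = cronbach_alpha(coherent)
alpha_unrelated = cronbach_alpha(unrelated)
scores = sum_variable(coherent)
print(f"alpha, items sharing a trait: {alpha_coherent:.2f}")
print(f"alpha, unrelated items:       {alpha_unrelated:.2f}")
```

Items that measure the same thing give a high alpha, and averaging them cancels part of the item-level measurement error. Unrelated items give an alpha near 0, a sign that they should not be combined into one indicator.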
17
EFA, sum variables and regression 2
One or many dimensions in EFA?
Factor score variables are often ”extracted” with Varimax rotation: the resulting indicators do not correlate with each other
If DV and IV’s are in the same EFA… problems!
Only IV’s in EFA might be handy!
There are ”oblique” rotations… interpretation?
Manual sum variables are often a good alternative
Or taking DV and IV’s in separate EFA’s
Cf. the example above
18
Dummy coding: enables full control over categorical IV’s
Different programs/procedures treat categorical IV’s differently: a real nuisance!
E.g. education, values 1/2/3/4
Make three dummy vars (0/1): e.g. edu2, edu3, edu4
The omitted category (edu1) is the ”reference” category
Interpret results separately for each dummy: difference to the reference (p-value)
What differences are important in your analysis? Choose the reference category accordingly!
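The education example can be written out by hand. This is a small sketch with made-up data: the function names, the eight respondents and the outcome values are hypothetical. It also shows what the dummy coefficients mean in a model that contains only these dummies: each one equals the group mean minus the mean of the reference group.

```python
import statistics


def dummy_code(values, reference, prefix="edu"):
    """Return one 0/1 variable per category, except the reference."""
    categories = sorted(set(values))
    return {
        f"{prefix}{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference
    }


def differences_to_reference(values, outcome, reference):
    """Group mean of the outcome minus the reference group's mean.
    In a regression with only these dummies as predictors, these
    differences are the dummy coefficients."""
    groups = {}
    for v, y in zip(values, outcome):
        groups.setdefault(v, []).append(y)
    ref_mean = statistics.mean(groups[reference])
    return {
        cat: statistics.mean(ys) - ref_mean
        for cat, ys in groups.items()
        if cat != reference
    }


# Hypothetical data: education (1-4) and some outcome for 8 respondents.
education = [1, 2, 3, 4, 2, 1, 4, 3]
outcome = [10, 12, 15, 20, 14, 12, 22, 17]

dummies = dummy_code(education, reference=1)
diffs = differences_to_reference(education, outcome, reference=1)
for name, column in dummies.items():
    print(name, column)
print(diffs)
```

Changing `reference` changes which comparisons the coefficients report, which is why the reference category should be the one you most want to compare the others against.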
19
Next lecture: logistic regression
How to analyse binary/dichotomous/dummy outcomes
E.g. death, medications, crimes (registers)
Good self-rated health, moving intentions, insecurity in the neighbourhood (surveys, often used as dummies)
OLS / LPM / logistic or logit / probit …
Discipline, research area etc. Sociology and logistic?
20
Questions? Let’s practise!