Using questionnaires in research: The statistical aspects Martin Kidd
Outline Statistical concepts Planning the project/questionnaire design Data entry Statistical analysis
Statistical concepts Types of data Likert scale Variables & cases Latent variables & items Latent variable structure Correlation Reliability (Cronbach Alpha) Confirmatory Factor Analysis (CFA) Exploratory Factor Analysis (EFA)
Types of data: Categorical data: fixed number of outcomes (e.g. Faculty: Sciences, Agriculture, Arts); no order in the outcomes; used to divide the sample into groups. Continuous data: measurements (age, height, weight, etc.).
Types of data: Ordinal data: order is important; outcomes take on discrete values (e.g. Income: 0-1000, 1000-10000, 10000+; Likert scales: 0 = disagree to 5 = agree). Mostly treated as continuous, sometimes as categorical.
Likert scale: A way of quantifying people's opinions, feelings, experiences, etc. Most common: a 1-5 or 1-7 scale. Words are assigned to each scale point: 1 Completely disagree, 2 Disagree, 3 Neutral, 4 Agree, 5 Fully agree.
Variables & cases: Each item on a questionnaire is a variable. Other variables can be derived from items, e.g. BMI = weight/height². Each respondent included in the survey is a case.
Latent variables Variable that cannot be directly measured: Service quality Economic confidence index Family resilience Emotional intelligence It is measured indirectly through measurable variables like items on a questionnaire
Latent variables: Other terms used: constructs; scales & sub-scales; dimensions & sub-dimensions. Score for latent variable: sum or average of the measured items.
Latent variable structure More than one latent variable evaluated Structure: Indicates which items measure which variable How are the latent variables interrelated
Latent variable structure (diagram): latent variables LV1 and LV2, measured by Item1 to Item9.
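The scoring rule and the item-to-variable structure described above can be sketched in a few lines. This is only an illustration: the item names, the responses and the assignment of items to LV1 and LV2 are made up, not taken from the slides.

```python
from statistics import mean

# Hypothetical responses of one respondent on a 1-5 Likert scale.
responses = {"item1": 4, "item2": 5, "item3": 3, "item4": 2, "item5": 1, "item6": 2}

# Hypothetical structure: which items measure which latent variable.
structure = {"LV1": ["item1", "item2", "item3"], "LV2": ["item4", "item5", "item6"]}

def scale_scores(responses, structure, use_mean=True):
    """Return one score per latent variable: the mean (or sum) of its items."""
    combine = mean if use_mean else sum
    return {lv: combine([responses[item] for item in items])
            for lv, items in structure.items()}

mean_scores = scale_scores(responses, structure)                 # average of items
sum_scores = scale_scores(responses, structure, use_mean=False)  # sum of items
```

Averaging keeps the score on the original 1-5 scale, which makes scales with different numbers of items easier to compare than sums.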
Correlation: Relationship between 2 continuous/ordinal variables. Takes on values between -1 and +1.
Correlation: r close to +1: positive relationship
Correlation: r close to -1: negative relationship
Correlation: r close to 0: no relationship
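The three cases above can be reproduced with a small hand-written Pearson correlation. The data are invented purely to show the two extremes and the zero case.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # y rises with x: r = +1
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # y falls as x rises: r = -1
r_zero = pearson_r(x, [2, 1, 3, 1, 2])   # no linear trend: r = 0
```

Real questionnaire data will fall somewhere between these extremes.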
Reliability – Cronbach alpha How reliable is a set of items in measuring a latent variable If a person is supposed to score high on the latent variable, then the scores for the items should also be high and vice versa. If not, then the items are probably measuring something else
Reliability – Cronbach alpha Thus there should be a high degree of correlation between the items Cronbach alpha: Summary measure of the correlation between the items Upper bound of 1 (perfect correlation) 0.7 a guideline for good reliability
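As a rough sketch, Cronbach alpha can be computed from the item variances and the variance of the sum score, using the usual formula alpha = k/(k-1) * (1 - sum of item variances / variance of total). The responses below are made up for illustration.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])                                   # number of items
    item_vars = [variance(col) for col in zip(*rows)]  # variance of each item
    total_var = variance([sum(row) for row in rows])   # variance of the sum score
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Made-up responses: 5 respondents (rows) x 3 items (columns), 1-5 Likert scale.
data = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 3, 3],
    [4, 5, 4],
    [5, 4, 5],
]
alpha = cronbach_alpha(data)  # about 0.95: the three items move together
```

When the items are perfectly correlated with equal variances the function returns exactly 1, the upper bound mentioned above.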
Reliability – Cronbach alpha: Issues to keep in mind: the more items, the higher the reliability; alpha will not indicate bi-modality (diagram: one latent variable LV behind item1 to item6, versus two latent variables LV1 and LV2 behind the same six items).
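The effect of the number of items can be illustrated with the standardized form of alpha, which depends only on the number of items k and the average inter-item correlation. The correlation of 0.3 used here is an arbitrary assumption.

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach alpha for k items with average inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# Same assumed average inter-item correlation (0.3), increasing number of items.
alphas = {k: round(standardized_alpha(k, 0.3), 2) for k in (3, 5, 10, 20)}
```

With these assumed values alpha rises from roughly 0.56 with 3 items to roughly 0.90 with 20 items, crossing the 0.7 guideline although the items are no more strongly related to each other.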
Confirmatory factor analysis (CFA) Determines whether a set of data supports a pre-specified latent variable structure Emphasis here is on pre-specified From the underlying theory, a latent model is drawn up Use data to determine whether this theoretical model holds in practice
Confirmatory factor analysis (path diagram): latent variables LV1 and LV2 linked to Item1 to Item9, each item carrying an indexed loading and an indexed error term (1 to 9).
Confirmatory factor analysis: The analysis is based on the covariance matrix of the measured items. Goodness of fit: how well can the observed covariance matrix be reproduced by the CFA model? If the fit is not good, either assumptions are violated or the model is not correctly specified.
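In standard textbook notation (the symbols are not from the slides and are introduced here only for illustration), the covariance matrix implied by a CFA model is usually written as:

```latex
\Sigma(\theta) = \Lambda \, \Phi \, \Lambda^{\top} + \Theta
```

Here $\Lambda$ holds the loadings of the items on the latent variables, $\Phi$ is the covariance matrix of the latent variables, and $\Theta$ is the (usually diagonal) covariance matrix of the item error terms. Goodness-of-fit indices summarise how far the observed covariance matrix $S$ is from $\Sigma(\hat{\theta})$.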
Confirmatory factor analysis Goodness of fit ok: Investigate individual parameter estimates Usually many parameters to be estimated You need lots of data!
Exploratory factor analysis (EFA): The latent structure is derived from the data. No prior structure is needed; only the number of factors (latent variables) needs to be specified. There are ways of determining the optimal number of factors as guided by the data.
Exploratory factor analysis: Explained variance: how much of the variance in the original data was captured by the factors. Factor loadings: which items define (or load on) which factors. Factors should be interpretable, otherwise the solution is meaningless.
Exploratory factor analysis: Example (figure not reproduced).
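To make the two quantities concrete, the sketch below reads them off a loading matrix. The loadings are invented, and the calculation assumes standardized items and uncorrelated factors; it does not perform the factor extraction itself.

```python
# Hypothetical loading matrix: rows = items, columns = factors.
loadings = [
    [0.8, 0.1],
    [0.7, 0.2],
    [0.6, 0.0],
    [0.1, 0.9],
    [0.2, 0.7],
]

p = len(loadings)              # number of items (each standardized item has variance 1)
n_factors = len(loadings[0])

# Proportion of total variance explained by each factor:
# sum of squared loadings in that column divided by the number of items.
explained = [sum(row[j] ** 2 for row in loadings) / p for j in range(n_factors)]

# Each item is assigned to the factor on which it has the largest absolute loading.
assignment = [max(range(n_factors), key=lambda j: abs(row[j])) for row in loadings]
```

In this made-up example the first three items define the first factor and the last two define the second; whether such a grouping is interpretable is a judgement the numbers cannot make.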
Planning the project Aim of project: Drives all other activities What statistical analyses are necessary? Content of the questionnaire dependent on the aim
Questionnaire design Is the purpose to design a measuring instrument for future use? Must make provision for repeated surveys to test & re-test questionnaire changes Is it going to be used only once? Use existing instruments! Be aware of possible danger of weak reliability
Questionnaire design Think about questionnaire validity: Content validity Reliability Discriminant validity Etc. Too many people ask these questions after the survey
Questionnaire design Types of responses for questions: List of choices (categorical, ordinal) Only one option can be selected Be as complete as possible Include “other” category only when necessary Number of options – Too many options might fragment the data Example: Gender: male/female Divide responses into 2 groups
Questionnaire design Types of responses Multiple selections More than one option can be selected In the analysis, each option becomes a variable Which of the following treatments have you had: Speech therapy Physiotherapy Psychotherapy
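A minimal sketch of how a multiple-selection question turns into one variable per option, using the three treatments from the example. The respondents' answers are made up.

```python
# Options from the question; each becomes its own 0/1 variable.
OPTIONS = ["Speech therapy", "Physiotherapy", "Psychotherapy"]

def to_indicators(selected):
    """Turn a respondent's list of selected options into one 0/1 variable per option."""
    return {option: int(option in selected) for option in OPTIONS}

# Made-up answers from three respondents (the third selected nothing).
answers = [["Physiotherapy"], ["Speech therapy", "Psychotherapy"], []]
rows = [to_indicators(a) for a in answers]
```

Each dictionary in `rows` corresponds to one row of the data sheet, with one column per treatment.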
Questionnaire design: Types of responses: Continuous data: if an accurate number can be filled in, don't use categories (e.g. age: fill in exact age, not age categories). Open ended: if a list of choices is possible, try not to leave the question open ended; open-ended answers are time consuming to analyse.
Questionnaire design: Types of responses: Likert scales: How many options (4-point, 5-point, 7-point scale)? Be careful with the wording and make sure the scale is ordinal! Not ordinal: 1 Completely disagree, 2 Disagree, 3 No opinion, 4 Agree, 5 Completely agree. Ordinal: 1 Completely disagree, 2 Disagree, 3 Neutral, 4 Agree, 5 Completely agree.
Questionnaire design: Types of responses: Likert scales: Be careful with the coding: 1 Completely disagree, 2 Disagree, 3 Neutral, 4 Agree, 5 Completely agree, 6 Not applicable (usually treated as missing).
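The coding issue can be sketched as follows: if the code 6 ("Not applicable") is left in the data, it is averaged as if it meant "more than completely agree". The raw values are invented.

```python
NOT_APPLICABLE = 6  # code used for "Not applicable" in the example scale

def recode(value):
    """Return the 1-5 Likert score, or None (missing) for 'Not applicable' or blanks."""
    if value is None or value == NOT_APPLICABLE:
        return None
    return value

raw = [1, 5, 6, 3, None, 4]           # made-up responses to one item
clean = [recode(v) for v in raw]

# Average over valid responses only.
valid = [v for v in clean if v is not None]
item_mean = sum(valid) / len(valid)
```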
Data entry Mostly done in Excel A column for each item/variable A row for each respondent First row (only first row) reserved for variable names Data for 1st respondent in row 2 Short concise names for variables
Data entry First column usually a respondent number For possible future back reference Leave missing values blank Categorizing open ended questions: Be consistent with spelling Agric & Agriscience will be 2 different faculties Be careful with the spacebar: Leading and trailing spaces can cause problems Use Excel autofilter to clean up data Use Excel freeze panes
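The spelling and spacebar problems can also be cleaned up in code once the sheet has been exported. The sketch below is one possible approach; the mapping of variants to a canonical faculty name is hypothetical and would have to be drawn up by the researcher.

```python
# Hypothetical mapping from spelling variants to one canonical faculty name.
CANONICAL = {"agric": "Agriscience", "agriscience": "Agriscience", "arts": "Arts"}

def clean_faculty(entry):
    """Strip leading/trailing spaces, ignore case, and map variants to one spelling."""
    stripped = entry.strip()
    return CANONICAL.get(stripped.lower(), stripped)

raw = ["Agric", " Agriscience", "agriscience ", "Arts", "Sciences"]
cleaned = [clean_faculty(e) for e in raw]

distinct_before = len(set(raw))      # every variant counts as its own faculty
distinct_after = len(set(cleaned))   # variants collapsed to one name each
```

Before cleaning, the five entries look like five different faculties; afterwards there are three.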
Statistical analysis Statistical analysis can be divided into two phases: Preparatory (measurement) phase Reliability analysis Factor analysis Confirmatory factor analysis Main analysis phase Relationships between latent variables & other measured variables
Main analysis phase Correlation analysis Regression analysis Cross tabulation ANOVA Etc. Quality dependent on outcomes of preparatory phase
Data Analysis (measurement phase): Cronbach alpha When do we do reliability analysis (calculate Cronbach alpha): When there are constructs underlying the items and you know what they are When you are not worried about bi-modality When you don’t have enough responses for CFA
Data Analysis (measurement phase): Cronbach alpha: What to watch out for: Many items inflate the Cronbach alpha. It does not indicate bi-modality. Negatively phrased questions need to be reverse scored. When using an existing measurement instrument, be sure to have the correct scoring instructions. Alpha is too low: what now?
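Reverse scoring a negatively phrased item is a one-line transformation. In this sketch the item names are hypothetical and `item3` is assumed to be the negatively phrased one.

```python
def reverse_score(value, low=1, high=5):
    """Reverse a Likert response: on a 1-5 scale, 1 <-> 5, 2 <-> 4, 3 stays 3."""
    return low + high - value

# Hypothetical responses; item3 is assumed to be negatively phrased.
responses = {"item1": 5, "item2": 4, "item3": 1}
NEGATIVE_ITEMS = {"item3"}

scored = {name: reverse_score(v) if name in NEGATIVE_ITEMS else v
          for name, v in responses.items()}
```

The reversal must be done before Cronbach alpha or any scale score is calculated, otherwise the negatively phrased items pull the correlations down.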
Data Analysis (measurement phase) : Cronbach alpha Alpha too low: When can we use “alpha if deleted” column?
Data Analysis (measurement phase): Cronbach alpha: When can we use the "alpha if deleted" column? Strictly speaking, only when you are designing a measurement instrument, so that the adapted questionnaire can be re-tested on an independent set of data. Otherwise you are being led by the current data without any means of verification.
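For readers unfamiliar with the "alpha if deleted" column, the sketch below shows what it contains: alpha recomputed with each item left out in turn. The data are made up so that the fourth item does not move with the other three.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_vars = [variance(col) for col in zip(*rows)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def alpha_if_deleted(rows):
    """Cronbach alpha recomputed with each item left out in turn."""
    k = len(rows[0])
    return [cronbach_alpha([[v for j, v in enumerate(row) if j != drop] for row in rows])
            for drop in range(k)]

# Made-up data: items 1-3 move together, item 4 does not.
data = [
    [1, 2, 1, 3],
    [2, 2, 3, 5],
    [3, 3, 3, 1],
    [4, 5, 4, 4],
    [5, 4, 5, 2],
]
alpha_all = cronbach_alpha(data)        # alpha with all four items
alpha_without = alpha_if_deleted(data)  # one value per deleted item
```

Dropping the fourth item raises alpha here, but the caution above still applies: the improvement is found on the same data that suggested the deletion, so it needs to be checked on independent data.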
Data Analysis (measurement phase) : Cronbach alpha Question remains: What do we do when alpha is low? Results from the main analysis could be degraded Eg correlations will be less pronounced You might not find significant differences between groups where there should be
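The weakening of correlations can be made concrete with the classical attenuation formula from test theory. The notation and the numbers are illustrative assumptions, not taken from the slides: $\rho_{XX}$ and $\rho_{YY}$ are the reliabilities of the two scales.

```latex
r_{\text{observed}} = r_{\text{true}} \sqrt{\rho_{XX} \, \rho_{YY}}
\qquad \text{e.g.} \qquad
0.5 \times \sqrt{0.6 \times 0.6} = 0.30
```

So a true correlation of 0.5 between two latent variables would be expected to show up as about 0.3 when both scales have a reliability of only 0.6.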
Data Analysis (measurement phase) : CFA When do we do CFA: When a theoretical latent structure has been postulated When you have enough data When you are worried about bi-modality When you are in a position to adapt and re-test on independent data
Data Analysis (measurement phase) : CFA Things to watch out for: Similar issues to Cronbach alpha What do we do when the fit indices are not good? There are limited guidelines on how the CFA results can be used to update the latent model Fact remains, you must be in a position to verify on independent data
Data Analysis (measurement phase) : EFA When do we do EFA: When there is no latent structure When the items are generated independently of the researcher Eg focus group discussions Does it make sense to do EFA on a questionnaire designed from the basis of a theoretical structure? I don’t think so.
Data Analysis (measurement phase): EFA: However, it appears to be common practice. How does the new latent structure (derived from the data) tie in with the original aim of the project? The new latent structure has to be verified on independent data or justified from a theoretical point of view.
Data Analysis (measurement phase) : EFA Can one calculate Cronbach alphas or do CFA on EFA results? Not using the same set of data: Results will be too optimistic
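Data Analysis (measurement phase): EFA: Why are the results optimistic? Let's say we have items $X_1, \dots, X_p$. Let $S_1$ to $S_k$ represent all possible latent structures. Let's say we have 2 independent data sets, with EFA done on the 1st. Let $\alpha_{1i}$ be the Cronbach alpha of $S_i$ for the 1st data set and $\alpha_{2i}$ for the 2nd data set.
Data Analysis (measurement phase): EFA
  Data set 1 (EFA)                              Data set 2
  $S_1$: $\alpha_{11}$                          $S_1$: $\alpha_{21}$
  $S_2$: $\alpha_{12}$                          $S_2$: $\alpha_{22}$
  $S_i$: $\alpha_{1i}$ (EFA result, max alpha)  $S_i$: $\alpha_{2i}$ ($\alpha_{2i} < \alpha_{1i}$)
  $S_j$: $\alpha_{1j}$                          $S_j$: $\alpha_{2j}$ (max alpha)
  $S_k$: $\alpha_{1k}$                          $S_k$: $\alpha_{2k}$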
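One way to obtain the two independent data sets from a single survey is a random split of the respondents: EFA is run on one part, and Cronbach alpha or CFA on the other. A minimal sketch, in which the function name, the 50/50 split and the seed are arbitrary choices:

```python
import random

def split_respondents(rows, explore_fraction=0.5, seed=0):
    """Randomly split respondents into an exploration set and a hold-out verification set."""
    shuffled = rows[:]                      # copy so the original order is untouched
    random.Random(seed).shuffle(shuffled)   # seeded so the split can be reproduced
    cut = int(len(shuffled) * explore_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical respondent numbers standing in for full rows of questionnaire data.
respondents = list(range(1, 101))
explore, verify = split_respondents(respondents)
```

The split only helps if each part is still large enough for the analysis planned on it, which is another reason to think about sample size before the survey.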
Summary Careful planning and thought Try and sort out analysis issues before designing the questionnaire and doing the survey I have hammered on independent data: If possible, plan for more than one survey Divide data into train/test data Thank you for your attention