Published by Amos Calvin Sutton. Modified over 9 years ago.
1
Questionnaire Design and Evaluation
Mark Shevlin
2
Type of Psychological Tests
• Psychological tests can be used to measure
  – General ability (IQ)
  – Specific abilities
  – Attitudes
  – Interests
  – Clinical pathology
  – Personality traits
3
Type of Psychological Tests
• The guidelines in this lecture relate to
  – Attitudes
  – Interests
  – Personality traits
• Always check first whether a published scale is already available.
4
Guidelines in Scale Construction
• What do you want to measure
• Generate an item pool
• Decide on appropriate response format
• Initial item review and development sample
• Evaluate items
• Optimise scale content
5
What do you want to measure
• You will be attempting to measure a variable, a dimension along which people differ.
• The variable will be a latent, unobservable variable.
• Developing a scale requires a clear and concise understanding of what you are trying to measure.
6
Level of generality
• Variables can be measured at different levels of specificity.
• Specificity refers to the breadth of the construct under consideration.
• Some measures tap a very specific, small group of behaviours (e.g. punctuality).
• Some measures tap a very broad and general group of behaviours (e.g. introversion).
7
Level of generality
• The level of generality has an influence on the ‘bandwidth fidelity trade-off’.
• A measure with narrow bandwidth (specific) should be good at predicting a small number of behaviours, but poor at predicting a range of behaviours.
• A measure with broad bandwidth (general) should be reasonable at predicting a large number of behaviours, but poor at predicting specific behaviours.
8
Level of generality: Narrow
• A punctuality measure would be good at predicting time of arrival at classes, how often a person was late for work, etc.
• A punctuality measure would be poor at predicting social or interpersonal behaviour.
9
Level of generality: Broad
• A sociability measure would be poor at predicting time of arrival at classes, how often a person was late for work, etc.
• A sociability measure would be good at predicting many social or interpersonal behaviours.
10
Example
Extraversion (general variable)
  Narrower components: Sociability, Activity, Excitability
Example items:
  – Do you enjoy meeting new people?
  – Do you like plenty of bustle and excitement around you?
  – Do you like mixing with people?
11
Exercise
• Name three general variables that may interest psychologists. What type of behaviours would they predict?
• Name three specific, or narrow, variables that may interest psychologists. What type of behaviours would they predict?
12
Item Pool
• An item pool is a large number of initial questions that may be included in the final questionnaire.
• Item pools can be generated simply by thinking of items that reflect the variable of interest.
• Preferably you should use a blueprint.
13
Item Pool
• A blueprint, or test specification, is a framework for developing the questionnaire.
• It requires you to specify content areas. The content areas should cover everything that is relevant to the purpose of the questionnaire.
• Manifestations refer to the way that the content areas may manifest themselves.
14
Item Pool
• More specifically, different types of manifestations should be identified
  – Behavioural: instances of behaviour related to a content area
  – Cognitive: the way of thinking related to a content area
  – Affective: the way a person feels related to a content area
15
Item Pool
• The content areas and manifestations should form the axes of a grid.
(Grid: content areas along one axis, manifestations along the other.)
16
Item Pool
• You should use between 4 and 7 categories for each axis.
• The example here is a blueprint for measuring social anxiety (defined as an anxiety response to social interaction).
• Each cell should be completed showing how each content area may become manifest (but not now).
17
Content areas
  A. Anxiety at meeting new people
  B. Anxiety at speaking publicly
  C. Anxiety at being in a public place
Manifestations
  A. Avoidance
  B. Tension
  C. Feelings of worry
  D. Thinking people do not like me

                  Content areas
Manifestations    A       B       C
A
B
C
D
18
Exercise
• Construct a test specification (5 x 5) for one of the following variables.
  – Fear of technology
  – Trust
  – Loneliness
  – Happiness
19
Weighting content areas and manifestations
• You may decide that not all content areas and manifestations are equally important in representing the variable of interest.
• You may want to weight some areas and manifestations more heavily depending on their importance.
• First, determine the number of items.
20
Weighting content areas and manifestations
• Determining the number of items.
  – At least 20.
  – Smaller numbers if the sample is elderly or very young.
  – Remember that 50% of the items may be removed.
  – A rough guide is between 40 and 100.
• In this example 100 items will be initially developed.
21
Weighting content areas and manifestations
• In this example 100 items will be initially developed.
• It is believed that anxiety at meeting new people is a very important content area, and that all the manifestations are equally important.
• The blueprint could be specified as follows.
22
Content areas
  A. Anxiety at meeting new people
  B. Anxiety at speaking publicly
  C. Anxiety at being in a public place
Manifestations
  A. Avoidance
  B. Tension
  C. Feelings of worry
  D. Thinking people do not like me

                  Content areas
Manifestations    A (60%)   B (20%)   C (20%)
A (25%)
B (25%)
C (25%)
D (25%)
23
Weighting content areas and manifestations
• If 100 items are to be developed, the number to be written for each cell can be calculated.
24
                  Content areas
Manifestations    A (60%)   B (20%)   C (20%)   Total
A (25%)           15        5         5         25
B (25%)           15        5         5         25
C (25%)           15        5         5         25
D (25%)           15        5         5         25
Total             60        20        20        100
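The cell counts in this grid follow from multiplying the total number of items by the content-area weight and the manifestation weight. A minimal sketch of that arithmetic, assuming the weights are 60%, 20% and 20% for the content areas (the second 20% is inferred so that the weights sum to 100%) and 25% for each manifestation; the function name `allocate_items` is hypothetical:

```python
def allocate_items(total_items, content_weights, manifestation_weights):
    """Return {(manifestation, content_area): item_count} for a blueprint grid.

    Each cell gets total_items * content_weight * manifestation_weight items,
    rounded to the nearest whole number. With awkward weights the rounded
    cells may not add up exactly to total_items and need adjusting by hand.
    """
    cells = {}
    for m_name, m_weight in manifestation_weights.items():
        for c_name, c_weight in content_weights.items():
            cells[(m_name, c_name)] = round(total_items * c_weight * m_weight)
    return cells


# Assumed weights for the social anxiety example (proportions, not percentages).
content = {"A": 0.60, "B": 0.20, "C": 0.20}
manifestations = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}

grid = allocate_items(100, content, manifestations)
for m_name in manifestations:
    row = [grid[(m_name, c_name)] for c_name in content]
    print("Manifestation", m_name, row, "row total:", sum(row))
```

Each row gives the number of items to write for one manifestation across content areas A, B and C.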
25
Writing Items
• Writing items involves constructing questions or statements relating to each cell in the test specification.
• The nature of the statements will depend on the response format used.
• There are some guidelines for writing good items.
26
Writing Items
• Items should be concise, clear and unambiguous.
• You should avoid long, wordy items.
• Construct your items to be compatible with the target sample in terms of reading difficulty (e.g. children or elderly).
27
Writing Items
• Avoid double negatives
  – ‘I am not in favour of the government not making drugs legal’
• Avoid double-barrelled items that include two or more issues
  – ‘I agree that crime should always be punished and hanging should return’
28
Writing Items
• Try to avoid floor effects (all respondents scoring low or negatively), which arise when items are made too extreme.
  – ‘I try to kill myself regularly’
  – ‘I hear voices telling me what to do’
  – ‘I am too nervous to speak to anyone’
  – ‘I drink more than 300 units of alcohol each week’
29
Writing Items
• Try to avoid ceiling effects (all respondents scoring high or positively), which arise when almost everyone endorses or passes an item.
  – ‘I have some positive attributes’
  – ‘What is 1+1?’
  – ‘I am too nervous to speak to anyone’
30
Writing Items
• Include some negatively worded items to reduce response set, or acquiescence (agreeing with all the items). Remember to reverse code these items.
  – I feel I have a number of good qualities
  – On the whole, I am satisfied with myself
  – I feel useless at times
  – I feel I do not have a lot to be proud of
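Reverse coding flips the score of a negatively worded item so that a high score always means a high level of the variable. A minimal sketch, assuming a 1 to 5 agreement scale; the item labels and the responses are hypothetical:

```python
def reverse_code(score, scale_min=1, scale_max=5):
    """Flip a score on a scale_min..scale_max scale (1 <-> 5, 2 <-> 4, 3 stays 3)."""
    return scale_min + scale_max - score


# Hypothetical responses from one person to the four self-esteem items above.
responses = {"good_qualities": 4, "satisfied": 5, "useless": 2, "not_proud": 1}
negatively_worded = {"useless", "not_proud"}

# Reverse only the negatively worded items, then sum to get the scale score.
scored = {
    item: reverse_code(value) if item in negatively_worded else value
    for item, value in responses.items()
}
total = sum(scored.values())
print(scored, "total:", total)
```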
31
Response Format
• Types of scaling
  – Likert
  – Semantic differential
  – Visual analog
  – Forced choice binary
32
Likert
• The item is presented as a declarative statement and the response options reflect varying degrees of agreement or disagreement.
• Between 5 and 7 options is usual.
• The respondent is asked to circle the appropriate category.
33
Likert
• The categories should be labelled so as to represent equal intervals.
• An optional midpoint can be used, but
  – how is it scored?
  – what does it mean?
• Scale the items so that a high level of the variable you are measuring is reflected in a high category value.
34
Likert
35
Likert: Assessing frequency
36
Semantic differential
• Typically used in attitudinal research (Osgood & Tannenbaum, 1955).
• Is generally used in reference to one or more stimuli, such as a particular person, political party, or racial/religious group.
• The target stimulus is followed by a list of adjective pairs representing opposite ends of a continuum.
37
Semantic differential
• The adjective pairs can be unipolar
  – Unfriendly / Friendly
• Or bipolar
  – Hostile / Friendly
• The respondent is required to place a mark between the adjectives to indicate the appropriate level of their response.
38
Semantic differential
Students
Happy          __ __ __ __ __ __ __   Sad
Hard Working   __ __ __ __ __ __ __   Lazy
Stressed       __ __ __ __ __ __ __   Relaxed
39
Visual Analog
• The visual analog scale is similar to the semantic differential in that the respondent is required to mark their response between a pair of descriptors.
• The difference is that the visual analog uses a continuum.
40
Visual Analog
At the dentist I feel
Relaxed       ______________________   Frightened
Comfortable   ______________________   Uncomfortable
No pain       ______________________   A lot of pain
41
Visual Analog
• The visual analog scale is very sensitive and can detect smaller changes than the Likert or semantic differential scales.
• It is therefore useful if an intervention is being assessed, or if the variable is transient (e.g. mood).
• Memory effects are minimal with the visual analog scale.
42
Forced Choice
• Forced choice usually involves a binary choice such as ‘yes/no’ or ‘agree/disagree’.
• Generally considered inappropriate for clinical symptoms, mood or aptitude measures.
• Can be effective at discriminating between different ‘types’.
43
Forced Choice
• Some forced choice scales may include a ‘don’t know’ or ‘?’ option. A decision has to be made on how to score this response.
• Found by many respondents to be too restrictive.
• Many items are needed to generate variability.
44
Forced Choice
45
All Questionnaires
• All questionnaires should include
  – Background information, with space for demographic details
  – Instructions: clear and concise, with an example if thought necessary
  – A clear layout
46
All Questionnaires
• Do not mix types of response format.
• Do not mix labels on a Likert scale in the same scale.
• Different scales can be included in a questionnaire, but make sure that there is information and instructions for each section.
47
Initial item review
• The initial pool of items should be reviewed by experts in the content area on the basis of
  – relevance
  – clarity and conciseness
  – content area omissions
  – alternative manifestations
48
Initial scale administration
• The new scale needs to be administered to a large sample. Nunnally (1978) recommends no fewer than 300.
• If the scale is measuring a single construct, with few items, a smaller sample size may be used.
• Ensure that the sample is as representative of your target population as possible.
49
Exercise
• Using the test specification from the first exercise
  – decide on a weighting scheme
  – write three items for each cell
  – decide on a response format: explain why
  – what sample would the scale be administered to?
• 5-minute presentation of work.
50
Evaluate items
• Items must be evaluated in terms of reliability and validity.
• A necessary prerequisite is determining how many variables, or factors, are being measured. This is done by using factor analysis.
• Each subscale is then analysed separately.
51
Reliability: Item to total
• All the items should be highly correlated.
• Each item can be correlated with the total of the scale items (including or excluding the item itself).
• Items with low item to scale correlations will have low reliability.
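A minimal sketch of the "excluding itself" version, often called the corrected item-total correlation: each item is correlated with the sum of the remaining items. The data are hypothetical (6 respondents, 4 items on a 1 to 5 scale), and the function names are illustrative rather than taken from any package:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def corrected_item_total(data):
    """For each item, correlate it with the sum of all the other items.

    data is a list of respondents, each a list of item scores.
    """
    correlations = []
    for i in range(len(data[0])):
        item = [row[i] for row in data]
        rest = [sum(row) - row[i] for row in data]  # total excluding item i
        correlations.append(pearson(item, rest))
    return correlations


# Hypothetical responses: rows are respondents, columns are items 1 to 4.
data = [
    [1, 2, 1, 4],
    [2, 2, 3, 1],
    [3, 3, 3, 5],
    [4, 4, 5, 2],
    [5, 4, 4, 3],
    [5, 5, 5, 3],
]

item_total = corrected_item_total(data)
for number, r in enumerate(item_total, start=1):
    print(f"Item {number}: r = {r:.2f}")
```

In this made-up data set the fourth item is unrelated to the others, so its correlation with the rest of the scale is low and it would be a candidate for removal.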
52
Reliability: Coefficient alpha
• This gives an estimate of the scale's reliability.
• Scaled between 0.0 and 1.0, with higher values indicating higher reliability.
• There is a positive relationship between the number of items in a scale and estimates of alpha.
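A minimal sketch of coefficient alpha using the standard formula alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The data are hypothetical and the function name `cronbach_alpha` is illustrative:

```python
from statistics import variance


def cronbach_alpha(data):
    """Coefficient alpha for a list of respondents, each a list of item scores."""
    k = len(data[0])
    item_variances = [variance([row[i] for row in data]) for i in range(k)]
    total_variance = variance([sum(row) for row in data])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 6 respondents, 4 items on a 1 to 5 scale.
data = [
    [1, 2, 1, 4],
    [2, 2, 3, 1],
    [3, 3, 3, 5],
    [4, 4, 5, 2],
    [5, 4, 4, 3],
    [5, 5, 5, 3],
]

alpha_all = cronbach_alpha(data)
# Recompute alpha after dropping the fourth item, which does not hang
# together with the other three in this made-up data set.
alpha_without_item4 = cronbach_alpha([row[:3] for row in data])

print(f"alpha, all four items:  {alpha_all:.2f}")
print(f"alpha, item 4 removed:  {alpha_without_item4:.2f}")
```

Comparing alpha with and without an item is one way to judge whether removing it would improve the scale.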
53
Item analysis: Item variances
• The variance (σ²) of an item indicates its variability.
• If an item has a relatively low variance, this indicates that it is not differentiating individuals.
54
Item analysis: Item means
• Extremely low or high means for individual items suggest that the wording of the item is too extreme and floor or ceiling effects are occurring.
• Such items will have little power to discriminate and therefore should be discarded.
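Item means and variances can be screened together. A minimal sketch, assuming a 1 to 5 response scale; the cut-offs (`margin` and `min_variance`) are arbitrary illustrative values, not recommendations from the lecture, and the data are hypothetical:

```python
from statistics import mean, variance


def screen_items(data, scale_min=1, scale_max=5, margin=0.5, min_variance=0.5):
    """Flag items with extreme means or low variance.

    Returns {item_index: [reasons]} for flagged items only. The thresholds
    are illustrative assumptions and should be chosen for the scale at hand.
    """
    flags = {}
    for i in range(len(data[0])):
        scores = [row[i] for row in data]
        reasons = []
        if mean(scores) <= scale_min + margin:
            reasons.append("possible floor effect")
        if mean(scores) >= scale_max - margin:
            reasons.append("possible ceiling effect")
        if variance(scores) < min_variance:
            reasons.append("low variance")
        if reasons:
            flags[i] = reasons
    return flags


# Hypothetical responses: 5 respondents, 3 items.
data = [
    [1, 5, 1],
    [3, 5, 1],
    [5, 5, 2],
    [2, 4, 1],
    [4, 5, 1],
]

flags = screen_items(data)
for index, reasons in flags.items():
    print(f"Item {index + 1}: {', '.join(reasons)}")
```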
55
Criterion referenced items
• Items can be selected on their ability to predict some external criterion.
• For a conservatism scale, items should be retained that can predict political preferences.
• For an IQ test, items should be retained that can predict school/university performance.