Questionnaire Design and Evaluation Mark Shevlin.

Types of Psychological Tests
- Psychological tests can be used to measure
  – General ability (IQ)
  – Specific abilities
  – Attitudes
  – Interests
  – Clinical pathology
  – Personality traits

Types of Psychological Tests
- The guidelines in this lecture relate to
  – Attitudes
  – Interests
  – Personality traits
- Always check first that a suitable published scale is not already available.

Guidelines in Scale Construction
- What do you want to measure?
- Generate an item pool
- Decide on appropriate response format
- Initial item review and development sample
- Evaluate items
- Optimise scale content

What do you want to measure?
- You will be attempting to measure a variable, a dimension along which people differ.
- The variable will be latent, that is, not directly observable.
- Developing a scale requires a clear and concise understanding of what you are trying to measure.

Level of generality
- Variables can be measured at different levels of specificity.
- Specificity refers to the breadth of the construct under consideration.
- Some measures tap a very specific, small group of behaviours (e.g. punctuality).
- Some measures tap a very broad and general group of behaviours (e.g. introversion).

Level of generality
- The level of generality influences the ‘bandwidth-fidelity trade-off’.
- A measure with narrow bandwidth (specific) should be good at predicting a small number of behaviours, but poor at predicting a range of behaviours.
- A measure with broad bandwidth (general) should be reasonable at predicting a large number of behaviours, but poor at predicting specific behaviours.

Level of generality: Narrow
- A punctuality measure would be good at predicting time of arrival at classes, how often a person was late for work, etc.
- A punctuality measure would be poor at predicting social or interpersonal behaviour.

Level of generality: Broad
- A sociability measure would be poor at predicting time of arrival at classes, how often a person was late for work, etc.
- A sociability measure would be good at predicting many social or interpersonal behaviours.

Example
Extraversion
  Sociability | Activity | Excitability
Example items:
  – Do you enjoy meeting new people?
  – Do you like plenty of bustle and excitement around you?
  – Do you like mixing with people?

Exercise
- Name three general variables that may interest psychologists. What types of behaviour would they predict?
- Name three specific, or narrow, variables that may interest psychologists. What types of behaviour would they predict?

Item Pool
- An item pool is a large number of initial questions that may be included in the final questionnaire.
- Item pools can be generated simply by thinking of items that reflect the variable of interest.
- Preferably you should use a blueprint.

Item Pool
- A blueprint, or test specification, is a framework for developing the questionnaire.
- It requires you to specify content areas. The content areas should cover everything that is relevant to the purpose of the questionnaire.
- Manifestations refer to the ways in which the content areas may show themselves.

Item Pool
- More specifically, different types of manifestations should be identified
  – Behavioural: instances of behaviour related to a content area
  – Cognitive: the way of thinking related to a content area
  – Affective: the way a person feels in relation to a content area

Item Pool
- The content areas and manifestations should form the axes of a grid.
[Grid: content areas on one axis, manifestations on the other]

Item Pool
- You should use between 4 and 7 categories for each axis.
- The example that follows is a blueprint for measuring social anxiety (defined as an anxiety response to social interaction).
- Each cell should be completed to show how each content area may become manifest (but not now).

Content areas (columns)
  A. Anxiety at meeting new people
  B. Anxiety at speaking publicly
  C. Anxiety at being in a public place
Manifestations (rows)
  A. Avoidance
  B. Tension
  C. Feelings of worry
  D. Thinking people do not like me
[Grid: manifestations A to D as rows, content areas A to C as columns, cells empty]

Exercise
- Construct a test specification (5 x 5) for one of the following variables.
  – Fear of technology
  – Trust
  – Loneliness
  – Happiness

Weighting content areas and manifestations
- You may decide that not all content areas and manifestations are equally important in representing the variable of interest.
- You may want to weight some areas and manifestations more heavily depending on their importance.
- First, determine the number of items.

Weighting content areas and manifestations
- Determining the number of items.
  – At least 20.
  – Fewer if the sample is elderly or very young.
  – Remember that 50% of the items may be removed.
  – A rough guide is between 40 and 100.
- In this example 100 items will be initially developed.

Weighting content areas and manifestations
- In this example 100 items will be initially developed.
- It is believed that anxiety at meeting new people is a very important content area, and that all the manifestations are equally important.
- The blueprint could be specified as follows.

Content areas (columns)
  A. Anxiety at meeting new people (60%)
  B. Anxiety at speaking publicly (20%)
  C. Anxiety at being in a public place (20%)
Manifestations (rows, 25% each)
  A. Avoidance
  B. Tension
  C. Feelings of worry
  D. Thinking people do not like me

Weighting content areas and manifestations
- If 100 items are to be developed, the number to be written for each cell can be calculated.

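[Blueprint grid: manifestations A to D as rows (25% each), content areas A to C as columns (60%, 20%, 20%); each cell gives the number of items to write]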
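
The cell calculation can be sketched in a few lines of Python. This is only an illustration: the 60/20/20 split across content areas and the 25% per manifestation are assumptions read off the social anxiety example, and `items_per_cell` is a hypothetical helper name. Each cell is the total number of items multiplied by its row weight and its column weight.

```python
# Assumed weights for the social anxiety example (not a fixed rule).
content_weights = {
    "Meeting new people": 0.60,
    "Speaking publicly": 0.20,
    "Being in a public place": 0.20,
}
manifestation_weights = {
    "Avoidance": 0.25,
    "Tension": 0.25,
    "Feelings of worry": 0.25,
    "Thinking people do not like me": 0.25,
}


def items_per_cell(total_items, row_weights, col_weights):
    """Return {(row, column): item count}, rounding each cell to a whole item."""
    return {
        (row, col): round(total_items * row_w * col_w)
        for row, row_w in row_weights.items()
        for col, col_w in col_weights.items()
    }


cells = items_per_cell(100, manifestation_weights, content_weights)
for (manifestation, area), count in cells.items():
    print(f"{manifestation:32s} x {area:24s}: {count} items")
```

With other weights the rounded cells may not add up exactly to the intended total, so the counts should be checked and adjusted by hand.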

Writing Items
- Writing items involves constructing questions or statements relating to each cell in the test specification.
- The nature of the statements will depend on the response format used.
- There are some guidelines for writing good items.

Writing Items
- Items should be concise, clear and unambiguous.
- You should avoid long, wordy items.
- Construct your items to be compatible with the target sample in terms of reading difficulty (e.g. children or elderly).

Writing Items
- Avoid double negatives
  – ‘I am not in favour of the government not making drugs legal’
- Avoid double-barrelled items that include two or more issues
  – ‘I agree that crime should always be punished and hanging should return’

Writing Items
- Try to avoid floor effects (all respondents scoring low or negatively), which arise when items are too extreme.
  – ‘I try to kill myself regularly’
  – ‘I hear voices telling me what to do’
  – ‘I am too nervous to speak to anyone’
  – ‘I drink more than 300 units of alcohol each week’

Writing Items
- Try to avoid ceiling effects (all respondents scoring high or positively), which also arise when item wording is too extreme.
  – ‘I have some positive attributes’
  – ‘What is 1+1?’
  – ‘I am too nervous to speak to anyone’

Writing Items
- Include some negatively worded items to reduce response set, or acquiescence (agreeing with all the items). Remember to reverse code these items.
  – I feel I have a number of good qualities
  – On the whole, I am satisfied with myself
  – I feel useless at times
  – I feel I do not have a lot to be proud of
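
Reverse coding can be illustrated with a short Python sketch. It assumes a 1 to 5 agreement scale; the item names and the responses are made up for the example. A reversed score is the scale minimum plus the scale maximum, minus the raw score.

```python
def reverse_code(score, low=1, high=5):
    """Mirror a score on the scale: 1 <-> 5, 2 <-> 4, 3 stays 3."""
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside the {low}-{high} scale")
    return low + high - score


# Hypothetical responses from one person (1 = strongly disagree, 5 = strongly agree).
responses = {
    "good_qualities": 4,
    "satisfied_with_self": 5,
    "useless_at_times": 2,
    "not_much_to_be_proud_of": 1,
}
negatively_worded = {"useless_at_times", "not_much_to_be_proud_of"}

# Reverse only the negatively worded items, then sum to a scale score.
scored = {
    item: reverse_code(value) if item in negatively_worded else value
    for item, value in responses.items()
}
total = sum(scored.values())
print(scored, total)
```

After reverse coding, a high total consistently indicates a high level of the variable, whichever way the item was worded.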

Response Format
- Types of scaling
  – Likert
  – Semantic differential
  – Visual analog
  – Forced choice binary

Likert
- The item is presented as a declarative statement and the response options reflect varying degrees of agreement or disagreement.
- Between 5 and 7 options is usual.
- The respondent is asked to circle the appropriate category.

Likert
- The categories should be labelled so as to represent equal intervals.
- An optional midpoint can be used, but
  – how is it scored?
  – what does it mean?
- Score the items so that a high level of the variable you are measuring corresponds to a high category value.

Likert

Likert: Assessing frequency

Semantic differential
- Typically used in attitudinal research (Osgood & Tannenbaum, 1955).
- Is generally used in reference to one or more stimuli, such as a particular person, political party, or racial/religious group.
- The target stimulus is followed by a list of adjective pairs representing opposite ends of a continuum.

Semantic differential
- The adjective pairs can be unipolar
  – Unfriendly / Friendly
- Or bipolar
  – Hostile / Friendly
- The respondent is required to place a mark between the adjectives to indicate the appropriate level of their response.

Semantic differential
Students
  Happy         __ __ __ __ __ __ __  Sad
  Hard Working  __ __ __ __ __ __ __  Lazy
  Stressed      __ __ __ __ __ __ __  Relaxed

Visual Analog
- The visual analog scale is similar to the semantic differential in that the respondent is required to mark their response between a pair of descriptors.
- The difference is that the visual analog uses a continuum.

Visual Analog
At the dentist I feel
  Relaxed      ______________________  Frightened
  Comfortable  ______________________  Uncomfortable
  No pain      ______________________  A lot of pain

Visual Analog
- The visual analog scale is very sensitive and can detect smaller changes than the Likert or semantic differential scales.
- It is therefore useful if an intervention is being assessed, or if the variable is transient (e.g. mood).
- Memory effects are minimal with the visual analog scale.

Forced Choice
- Forced choice usually involves a binary choice such as ‘yes/no’ or ‘agree/disagree’.
- Generally considered inappropriate for clinical symptoms, mood or aptitude measures.
- Can be effective at discriminating between different ‘types’.

Forced Choice
- Some forced choice formats include a ‘don’t know’ or ‘?’ option. A decision has to be made on how to score this response.
- Many respondents find the format too restrictive.
- Many items are needed to generate variability.

Forced Choice

All Questionnaires
- All questionnaires should include
  – Background information, with space for demographic details
  – Instructions: clear and concise, with an example if thought necessary
  – A clear layout

All Questionnaires
- Do not mix types of response format.
- Do not mix labels on a Likert scale in the same scale.
- Different scales can be included in a questionnaire, but make sure that there is information and instructions for each section.

Initial item review
- The initial pool of items should be reviewed by experts in the content area on the basis of
  – relevance
  – clarity and conciseness
  – content area omissions
  – alternative manifestations

Initial scale administration
- The new scale needs to be administered to a large sample. Nunnally (1978) recommends no fewer than 300.
- If the scale is measuring a single construct, with few items, a smaller sample size may be used.
- Ensure that the sample is as representative of your target population as possible.

Exercise
- Using the test specification from the first exercise
  – decide on a weighting scheme
  – write three items for each cell
  – decide on a response format: explain why
  – what sample would the scale be administered to?
- 5 minute presentation of work.

Evaluate items
- Items must be evaluated in terms of reliability and validity.
- A necessary prerequisite is determining how many variables, or factors, are being measured. This is done by using factor analysis.
- Each subscale is then analysed separately.

Reliability: Item to total
- All the items should be highly correlated.
- Each item can be correlated with the total scale score, computed either including or excluding the item itself.
- Items with low item to scale correlations will have low reliability.
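
A minimal sketch of the item to total correlation in Python, using the version that excludes the item from the total (often called the corrected item-total correlation). The data are invented: six respondents, four items, with the fourth item deliberately unrelated to the others. The function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (both must vary)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(data):
    """Correlate each item with the sum of the remaining items."""
    n_items = len(data[0])
    correlations = []
    for j in range(n_items):
        item = [row[j] for row in data]
        rest = [sum(row) - row[j] for row in data]  # total excluding item j
        correlations.append(pearson(item, rest))
    return correlations


# Hypothetical responses: rows are respondents, columns are items (1-5 scale).
data = [
    [5, 4, 5, 2],
    [4, 4, 4, 5],
    [2, 1, 2, 3],
    [1, 2, 1, 4],
    [3, 3, 4, 1],
    [4, 5, 4, 3],
]
item_total = corrected_item_total(data)
for j, r in enumerate(item_total, start=1):
    print(f"item {j}: r = {r:.2f}")
```

In this made-up data the fourth item correlates negatively with the rest of the scale, which would mark it as a candidate for removal.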

Reliability: Coefficient alpha
- This gives an estimate of the scale's reliability.
- Values normally fall between 0.0 and 1.0, with higher values indicating higher reliability.
- There is a positive relationship between the number of items in a scale and estimates of alpha.
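
Coefficient alpha can be computed from the item variances and the variance of the total score: alpha = k / (k - 1) x (1 - sum of item variances / variance of total), where k is the number of items. The sketch below uses the standard library only; the responses are invented and `cronbach_alpha` is a hypothetical function name.

```python
from statistics import variance


def cronbach_alpha(data):
    """Coefficient alpha for rows of respondents and columns of items."""
    k = len(data[0])
    item_variances = [variance([row[j] for row in data]) for j in range(k)]
    total_variance = variance([sum(row) for row in data])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: six respondents, three items on a 1-5 scale.
data = [
    [5, 4, 5],
    [4, 4, 4],
    [2, 1, 2],
    [1, 2, 1],
    [3, 3, 4],
    [4, 5, 4],
]
alpha = cronbach_alpha(data)
print(f"alpha = {alpha:.2f}")
```

Alpha can come out negative when items covary negatively, for example when negatively worded items have not been reverse coded, so a negative value is worth treating as a prompt to check the scoring.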

Item analysis: Item variances
- The variance (σ²) of an item indicates its variability.
- If an item has a relatively low variance, this indicates that it is not differentiating individuals.

Item analysis: Item means
- Extremely low or high means for individual items suggest that the wording of the item is too extreme and that floor or ceiling effects are occurring.
- Such items will have little power to discriminate and therefore should be discarded.
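
Item means and variances can be screened together, as in the Python sketch below. The cut-offs (`min_variance` and `margin`) are arbitrary values chosen for illustration, not recommended thresholds, and the data are invented.

```python
from statistics import mean, variance


def screen_items(data, low=1, high=5, min_variance=0.5, margin=0.5):
    """Flag items with low variance or with means near the scale ends."""
    report = []
    for j in range(len(data[0])):
        scores = [row[j] for row in data]
        item_mean, item_var = mean(scores), variance(scores)
        flags = []
        if item_var < min_variance:
            flags.append("low variance")
        if item_mean <= low + margin:
            flags.append("possible floor effect")
        elif item_mean >= high - margin:
            flags.append("possible ceiling effect")
        report.append(
            {"item": j + 1, "mean": item_mean, "variance": item_var, "flags": flags}
        )
    return report


# Hypothetical responses: five respondents, three items on a 1-5 scale.
data = [
    [1, 3, 5],
    [1, 2, 5],
    [2, 4, 5],
    [1, 5, 4],
    [1, 1, 5],
]
report = screen_items(data)
for row in report:
    print(row["item"], f"{row['mean']:.2f}", f"{row['variance']:.2f}", row["flags"])
```

Flagged items are candidates for rewording or removal; the final decision should also take the content coverage of the blueprint into account.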

Criterion-referenced items
- Items can be selected on their ability to predict some external criterion.
- For a conservatism scale, items should be retained that can predict political preferences.
- For an IQ test scale, items should be retained that can predict school/university performance.