Reliability, Validity, & Scaling

Reliability Repeatedly measure unchanged things: do you get the same measurements? In Charles Spearman's classical measurement theory, a perfectly reliable instrument would yield a correlation of +1 between true scores and measurements. In practice r < 1 because of random error, which is assumed to be symmetrically distributed about 0.

True Scores and Measurements Reliability is the squared correlation between true scores and measurement scores. Equivalently, it is the proportion of the variance in the measurement scores that is due to differences in the true scores rather than to random error.

Reliability cannot be known, only estimated. Systematic error, unlike random error, comes from measuring something else in addition to the construct of interest.

Test-Retest Reliability Measure subjects at two points in time and correlate ( r ) the two sets of measurements. A value of .7 is OK for research instruments; it needs to be higher for practical applications and important decisions. The mean and SD should not vary much from Time 1 to Time 2, usually.

Alternate/Parallel Forms Estimate reliability with the r between forms. The mean and SD should be the same for both forms, and the pattern of correlations with other variables should be the same for both forms.

Split-Half Reliability Divide the items into two random halves and score each half. Correlate the half scores to get the half-test reliability coefficient, rhh. Correct it to full-test length with the Spearman-Brown formula.

Cronbach’s Coefficient Alpha The obtained value of rsb depends on how you split the items into halves. Conceptually, alpha is the mean of the rsb values for all possible split halves, although you don’t really compute it that way. Alpha is a lower bound for the true reliability; that is, it underestimates the true reliability.
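In practice alpha is computed from item and total variances: alpha = k/(k-1) * (1 - (sum of item variances)/(variance of total scores)). A sketch with a tiny made-up data set (Python used only to illustrate the arithmetic):

```python
import statistics

# Made-up scores for four respondents on three items (illustration only).
items = [
    [1, 2, 3, 4],   # item 1
    [2, 2, 3, 4],   # item 2
    [1, 3, 3, 4],   # item 3
]
k = len(items)
totals = [sum(scores) for scores in zip(*items)]

# alpha = k/(k-1) * (1 - (sum of item variances) / (variance of totals))
item_var_sum = sum(statistics.pvariance(item) for item in items)
alpha = (k / (k - 1)) * (1 - item_var_sum / statistics.pvariance(totals))
print(round(alpha, 3))
```

The more the items covary (the larger the total-score variance relative to the summed item variances), the closer alpha gets to 1.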

Maximized Lambda4 This is the best estimator of reliability: compute rsb for all possible pairs of split halves, and the largest rsb is the estimated reliability. With more than a few items this is unreasonably tedious, but there are ways to estimate it.

Construct Validity To what extent are we really measuring or manipulating the construct of interest? Face Validity: do others agree that the measure sounds valid?

Content Validity Detail the population of things (behaviors, attitudes, etc.) that are of interest. Consider your operationalization of the construct (the details of how you propose to measure it) as a sample of that population. Is your sample representative of the population? Ask experts.

Criterion-Related Validity Established by demonstrating that your operationalization has the expected pattern of correlations with other variables. Concurrent Validity – demonstrate the expected correlation with other variables measured at the same time. Predictive Validity – demonstrate the expected correlation with other variables measured later in time.

Convergent Validity – demonstrate the expected correlations with measures of related constructs. Discriminant Validity – demonstrate the expected lack of correlation with measures of unrelated constructs.

Scaling Scaling is the construction of instruments for measuring abstract constructs. I shall discuss the creation of a Likert scale, my favorite type of scale.

Likert Scales Define the concept, then generate a pool of potential items (about 100 statements). On some, agreement indicates being high on the measured attribute; on others, agreement indicates being low on it.

Likert Response Scale Use a multi-point response scale like this: 1. People should make certain that their actions never intentionally harm others even to a small degree. Strongly Disagree / Disagree / Neutral / Agree / Strongly Agree

Evaluate the Potential Items Get judges to evaluate each item on a 5-point scale: 1 = agreement indicates very low on the attribute; 2 = low; 3 = agreement tells you nothing; 4 = high; 5 = very high. Select items with very high or very low means and little variability among the judges.
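This selection rule is easy to automate. A sketch with hypothetical judge ratings (item names, ratings, and cutoffs are all made up for illustration):

```python
import statistics

# Hypothetical judge ratings (1-5) for four candidate items. Keep items
# with an extreme mean (clearly high or low on the attribute) and little
# variability among the judges.
ratings = {
    "item1": [5, 5, 4, 5, 5],   # agreement = very high on attribute
    "item2": [3, 3, 2, 4, 3],   # near neutral: uninformative
    "item3": [1, 1, 2, 1, 1],   # agreement = very low on attribute
    "item4": [1, 5, 3, 5, 1],   # judges disagree: drop
}
keep = [
    name for name, r in ratings.items()
    if (statistics.mean(r) >= 4 or statistics.mean(r) <= 2)
    and statistics.stdev(r) < 1
]
print(keep)   # only items with extreme means and low judge variability survive
```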

Alternate Method of Item Evaluation Ask some judges to respond to the items the way they think someone high in the attribute would respond, and ask other judges to respond as would someone low in the attribute. Prefer items that best discriminate between these two groups. Also ask judges to identify items that are unclear or confusing.

Pilot Test the Items Administer the items to a sample of persons from the population of interest and conduct an item analysis (more on this later). Prefer items that have high item-total correlations. Consider conducting a factor analysis (more on this later).

Administer the Final Scale On each item, the response indicating the least amount of the attribute is scored 1, the next least amount 2, and so on. A respondent’s total score is the sum (or the mean) of the item scores. You must also decide how to deal with nonresponses on some items and how to reflect (reverse-score) items on which agreement indicates being low on the attribute.
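A minimal scoring sketch in Python (hypothetical 1-5 responses; using the mean of answered items is one simple way to handle nonresponse, not necessarily the slides' method):

```python
import statistics

# Hypothetical 1-5 responses. Reverse-scored items are reflected with
# (LOW + HIGH) - response; the respondent's score is the mean of the
# items actually answered.
LOW, HIGH = 1, 5
reverse_items = {"q2"}          # items on which agreement = low on attribute
responses = {"q1": 4, "q2": 2, "q3": None, "q4": 5}   # None = no response

scored = [
    (LOW + HIGH) - v if item in reverse_items else v
    for item, v in responses.items()
    if v is not None
]
score = statistics.mean(scored)
print(scored, score)
```

Using the mean rather than the sum keeps scores comparable across respondents who skipped different numbers of items.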

Item Analysis You believe the scale is unidimensional: each item measures the same thing, so the item scores should be well correlated. Evaluate this belief with an item analysis. Is the scale internally consistent? If so, it is also reliable. Are there items that do not correlate well with the others?

Item Analysis of Idealism Scale Bring KJ-Idealism.sav into PASW. Available at http://core.ecu.edu/psyc/wuenschk/SPSS/SPSS-Data.htm

Click Analyze, Scale, Reliability Analysis.

Select all ten items and scoot them to the Items box on the right. Click the Statistics box.

Check “Scale if item deleted” and then click Continue.

Back on the initial window, click OK. Look at the output. The Cronbach alpha is .744, which is acceptable.

Item-Total Statistics

Troublesome Items Items 7 and 10 are troublesome: deleting them would increase alpha, but not by much, so I retained them. The statistics for item 7 are especially distressing: “Deciding whether or not to perform an act by balancing the positive consequences of the act against the negative consequences of the act is immoral.”

What Next? I should attempt to rewrite item 7 to make it clearer that it applies to ethical decisions, not to other cost-benefit analyses. But this is not my scale, and who has the time?

Scale Might Not Be Unidimensional If the items are measuring two or more different things, alpha may well be low. You need to split the scale into two or more subscales. Factor analysis can be helpful here (but no promises).