Reliability, Validity, & Scaling


1 Reliability, Validity, & Scaling

2 Reliability Repeatedly measure unchanged things.
Do you get the same measurements? Charles Spearman, Classical Measurement Theory. If the measure were perfectly reliable, the correlation between true scores and measurements would be +1. In practice r < 1 because of random error, which is assumed to be symmetrically distributed about 0.

3 True Scores and Measurements
Reliability is the squared correlation between true scores and measurement scores. Reliability is the proportion of the variance in the measurement scores that is due to differences in the true scores rather than due to random error.
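To make the variance decomposition concrete, here is a minimal Python sketch (my own illustration, not part of the original slides; the variances and sample size are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Classical measurement theory: observed score = true score + random error
true_scores = rng.normal(loc=50, scale=10, size=n)  # var(T) = 100
error = rng.normal(loc=0, scale=5, size=n)          # var(E) = 25, mean 0
observed = true_scores + error

# Reliability two ways: squared correlation, and proportion of variance
r = np.corrcoef(true_scores, observed)[0, 1]
print(f"squared correlation r^2 = {r ** 2:.3f}")
print(f"var(T) / var(X)         = {true_scores.var() / observed.var():.3f}")
# Both approximate 100 / (100 + 25) = .80
```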

4 Reliability cannot be known, only estimated.
Systematic error is not random: it means you are measuring something else, in addition to the construct of interest. Reliability cannot be known; it can only be estimated.

5 Test-Retest Reliability
Measure subjects at two points in time. Correlate (r) the two sets of measurements. A value of .7 is OK for research instruments; you need it higher for practical applications and important decisions. M and SD usually should not vary much from Time 1 to Time 2.
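A minimal sketch of the computation in Python (the scores are invented for illustration):

```python
import numpy as np

# Hypothetical scores for the same ten subjects at two points in time
time1 = np.array([12, 15, 9, 20, 17, 11, 14, 18, 10, 16])
time2 = np.array([13, 14, 10, 19, 18, 12, 13, 17, 11, 15])

# Test-retest reliability is the correlation between the two occasions
r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r:.2f}")

# M and SD should not vary much between occasions
print(f"Time 1: M = {time1.mean():.2f}, SD = {time1.std(ddof=1):.2f}")
print(f"Time 2: M = {time2.mean():.2f}, SD = {time2.std(ddof=1):.2f}")
```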

6 Alternate/Parallel Forms
Estimate reliability with the r between forms. M and SD should be the same for both forms. The pattern of correlations with other variables should be the same for both forms.

7 Split-Half Reliability
Divide items into two random halves. Score each half. Correlate the half scores to get the half-test reliability coefficient, rhh. Correct it to full-test length with the Spearman-Brown formula: rsb = 2rhh / (1 + rhh).
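A sketch of the procedure in Python, simulating item scores that share a single common trait (the simulation parameters are my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate 200 respondents x 10 items sharing a common trait
trait = rng.normal(size=(200, 1))
items = trait + rng.normal(scale=1.0, size=(200, 10))

# Divide the items into two random halves and score each half
order = rng.permutation(10)
half1 = items[:, order[:5]].sum(axis=1)
half2 = items[:, order[5:]].sum(axis=1)

# Half-test reliability, then the Spearman-Brown correction
r_hh = np.corrcoef(half1, half2)[0, 1]
r_sb = 2 * r_hh / (1 + r_hh)
print(f"r_hh = {r_hh:.3f}, Spearman-Brown r_sb = {r_sb:.3f}")
```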

8 Cronbach’s Coefficient Alpha
The obtained value of rsb depends on how you split the items into halves. Conceptually: find rsb for all possible pairs of split halves and compute the mean of these (but you don't really compute it this way). Alpha is a lower bound for the true reliability; that is, it underestimates true reliability.
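In practice alpha is computed directly from the item and total-score variances: alpha = k/(k-1) x (1 - sum of item variances / variance of total scores). A minimal sketch (my own illustration):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: a respondents x items matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Example with simulated data sharing a common trait
rng = np.random.default_rng(2)
trait = rng.normal(size=(200, 1))
items = trait + rng.normal(size=(200, 10))
print(f"alpha = {cronbach_alpha(items):.3f}")
```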

9 Maximized Lambda4 This is the best estimator of reliability.
Compute rsb for all possible pairs of split halves. The largest rsb is the estimated reliability. With more than a few items this is unreasonably tedious, but there are ways to estimate it (a brute-force sketch for a small item set follows below).
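For a handful of items the exhaustive search is feasible. A brute-force sketch in Python (my own illustration; assumes an even number of items and simulated data):

```python
import numpy as np
from itertools import combinations

def max_lambda4(items: np.ndarray) -> float:
    """Largest Spearman-Brown-corrected split-half coefficient over all
    even splits; feasible only for a small number of items."""
    k = items.shape[1]
    best = -np.inf
    # Count each unordered split once by fixing item 0 in the first half
    for rest in combinations(range(1, k), k // 2 - 1):
        h1 = [0, *rest]
        h2 = [c for c in range(k) if c not in h1]
        s1 = items[:, h1].sum(axis=1)
        s2 = items[:, h2].sum(axis=1)
        r_hh = np.corrcoef(s1, s2)[0, 1]
        best = max(best, 2 * r_hh / (1 + r_hh))
    return best

rng = np.random.default_rng(3)
trait = rng.normal(size=(200, 1))
items = trait + rng.normal(size=(200, 6))  # 6 items -> only 10 splits
print(f"maximized lambda4 = {max_lambda4(items):.3f}")
```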

10 Construct Validity To what extent are we really measuring/manipulating the construct of interest? Face Validity – do others agree that it sounds valid?

11 Content Validity Detail the population of things (behaviors, attitudes, etc.) that are of interest. Consider your operationalization of the construct (the details of how you propose to measure it) as a sample of that population. Is your sample representative of the population? Ask experts.

12 Criterion-Related Validity
Established by demonstrating that your operationalization has the expected pattern of correlations with other variables. Concurrent Validity – demonstrate the expected correlation with other variables measured at the same time. Predictive Validity – demonstrate the expected correlation with other variables measured later in time.

13 Convergent Validity – demonstrate the expected correlation with measures of related constructs.
Discriminant Validity – demonstrate the expected lack of correlation with measures of unrelated constructs.

14 Scaling Scaling = the construction of instruments for measuring abstract constructs. I shall discuss the creation of a Likert scale, my favorite type of scale.

15 Likert Scales Define the Concept. Generate Potential Items –
about 100 statements. On some, agreement indicates being high on the measured attribute; on others, agreement indicates being low on the measured attribute.

16 Likert Response Scale Use a multi-point response scale like this:
1. People should make certain that their actions never intentionally harm others even to a small degree.
Strongly Disagree | Disagree | Neutral | Agree | Strongly Agree

17 Evaluate the Potential Items
Get judges to evaluate each item on a 5-point scale:
1 – Agreement = very low on the attribute
2 – Agreement = low on the attribute
3 – Agreement tells you nothing
4 – Agreement = high on the attribute
5 – Agreement = very high on the attribute
Select items with very high or very low means and little variability among the judges (a selection sketch follows this list).
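A small sketch of this selection rule in Python (the ratings and the cutoffs are invented for illustration):

```python
import numpy as np

# Hypothetical judges' ratings: rows = judges, columns = candidate items
ratings = np.array([
    [5, 3, 1, 4, 2],
    [5, 2, 1, 4, 3],
    [4, 3, 2, 5, 2],
    [5, 3, 1, 4, 3],
])

means = ratings.mean(axis=0)
sds = ratings.std(axis=0, ddof=1)

# Keep items whose mean is far from the uninformative midpoint (3)
# and on which the judges agree (small SD); the cutoffs are arbitrary
keep = (np.abs(means - 3) >= 1.5) & (sds <= 0.6)
print("retain items:", np.where(keep)[0] + 1)   # items 1 and 3
```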

18 Alternate Method of Item Evaluation
Ask some judges to respond to the items in the way they think someone high in the attribute would respond. Ask other judges to respond as would someone low in the attribute. Prefer items that best discriminate between these two groups. Also ask judges to identify items that are unclear or confusing.

19 Pilot Test the Items Administer the items to a sample of persons from the population of interest. Conduct an item analysis (more on this later). Prefer items which have high item-total correlations. Consider conducting a factor analysis (more on this later).

20 Administer the Final Scale
On each item, the response which indicates the least amount of the attribute is scored as 1, the next least amount is scored as 2, and so on.
The respondent's total score = the sum of the item scores or the mean of the item scores.
You must also deal with nonresponses on some items and with reflecting items (reverse scoring) – see the sketch below.
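A minimal scoring sketch in Python (the responses, the reverse-keyed items, and the 1-5 scale are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical responses: four respondents x five items, NaN = nonresponse
resp = pd.DataFrame({
    "i1": [5, 4, 2, 5],
    "i2": [1, 2, 4, 1],          # reverse-keyed item
    "i3": [4, 5, 2, np.nan],
    "i4": [2, 1, 5, 2],          # reverse-keyed item
    "i5": [5, 4, 1, 4],
})

# Reflect (reverse-score) items: on a 1-5 scale, new = 6 - old
resp[["i2", "i4"]] = 6 - resp[["i2", "i4"]]

# The mean of the item scores handles nonresponses; a raw sum would not
resp["score"] = resp.mean(axis=1, skipna=True)
print(resp)
```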

21 Item Analysis You believe the scale is unidimensional.
Each item measures the same thing, so item scores should be well correlated. Evaluate this belief with an item analysis: Is the scale internally consistent? If so, it is also reliable. Are there items that do not correlate well with the others?
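A sketch of the item-total part of the analysis in Python (simulated data; the "corrected" correlation removes each item from its own total):

```python
import numpy as np

def corrected_item_total(items: np.ndarray) -> np.ndarray:
    """Correlation of each item with the sum of the remaining items."""
    total = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, j], total - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])

rng = np.random.default_rng(4)
trait = rng.normal(size=(200, 1))
items = trait + rng.normal(size=(200, 10))
print(corrected_item_total(items).round(2))  # low values flag weak items
```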

22 Item Analysis of Idealism Scale
Bring KJ-Idealism.sav into PASW. Available at

23 Click Analyze, Scale, Reliability Analysis.

24 Select all ten items and scoot them to the Items box on the right.
Click the Statistics box.

25 Check “Scale if item deleted” and then click Continue.

26 Back on the initial window, click OK.
Look at the output. The Cronbach alpha is .744, which is acceptable.

27 Item-Total Statistics
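The item-total statistics table in the original slides is an image; here is a sketch of how its "Cronbach's Alpha if Item Deleted" column can be computed (my own illustration, not the SPSS internals; the alpha function repeats the earlier sketch so this block is self-contained):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    k = items.shape[1]
    return (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum()
                            / items.sum(axis=1).var(ddof=1))

def alpha_if_deleted(items: np.ndarray) -> np.ndarray:
    """Alpha recomputed with each item removed in turn."""
    return np.array([
        cronbach_alpha(np.delete(items, j, axis=1))
        for j in range(items.shape[1])
    ])
```

Items whose deletion would raise alpha above the full-scale value (.744 here) are the ones the next slide calls troublesome.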

28 Troublesome Items Items 7 and 10 are troublesome.
Deleting them would increase alpha, but not by much, so I retained them. The statistics for item 7 are especially distressing: “Deciding whether or not to perform an act by balancing the positive consequences of the act against the negative consequences of the act is immoral.”

29 What Next? I should attempt to rewrite item 7 to make it clearer that it applies to ethical decisions, not other cost-benefit analyses. But this is not my scale, and who has the time?

30 Scale Might Not Be Unidimensional
If the items are measuring two or more different things, alpha may well be low. You need to split the scale into two or more subscales. Factor analysis can be helpful here (but no promises).

