Presentation on theme: "REVIEW I Reliability scraps Index of Reliability"— Presentation transcript:

1 REVIEW I: Reliability Scraps
Index of Reliability: the theoretical correlation between observed and true scores
Standard Error of Measurement: a reliability measure; the degree to which an observed score fluctuates due to measurement error
Factors affecting reliability
A test must be RELIABLE to be VALID
Chapter 3
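As a quick check on the definition above, the Standard Error of Measurement can be computed from a test's standard deviation and its reliability coefficient. This is a minimal Python sketch; the function name and the sample values are illustrative, not taken from the text.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r_xx): the expected fluctuation of an observed
    score around the true score due to measurement error."""
    return sd * math.sqrt(1.0 - reliability)

# Illustrative values: a test with SD = 5 points and reliability r_xx = .90
print(standard_error_of_measurement(5.0, 0.90))  # about 1.58 points
```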

2 REVIEW II: Types of Validity
Content-related (face): represents the knowledge domain; use "experts" to establish it
Criterion-related: evidence of a statistical relationship with the trait being measured; alternative measures must be validated with a criterion measure
Construct-related: validates unobservable theoretical measures
Chapter 3

3 REVIEW III: Standard Error of Estimate
A validity measure: the degree of error when estimating a criterion score from the test score
Methods of obtaining a criterion measure: actual participation, expert judges, performing the criterion, a known valid test
Interpreting "r"
Chapter 3
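The Standard Error of Estimate can be sketched the same way; it shrinks as the validity coefficient r increases. Again, the values below are only illustrative.

```python
import math

def standard_error_of_estimate(sd_criterion, r):
    """SEE = SD_y * sqrt(1 - r^2): the typical error made when estimating
    a criterion score from the test score with validity coefficient r."""
    return sd_criterion * math.sqrt(1.0 - r ** 2)

# Illustrative values: criterion SD = 4.0, validity coefficient r = .80
print(standard_error_of_estimate(4.0, 0.80))  # 2.4 criterion units
```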

4 Criterion-Referenced Measurement
Tell the students that you have spoken about norm-referenced reliability and validity, where you are concerned about differences among students. We will now turn our focus to criterion-referenced reliability and validity. The interest here is still in consistency and truthfulness of measurement, but you are no longer interested in differences among or between performers; you are now interested in whether the performer meets the criterion.
[Slide graphic: performance continuum from Poor to Sufficient to Better]
Chapter 7

5 Guidelines for Writing Behavioral Objectives (Mager, 1962)
Identify the desired behavior/action by name
Define the desired condition
Specify the criteria of acceptable performance
When setting standards, you often encounter "behavioral objectives." This slide presents the steps in developing behavioral objectives, which help you set standards for mastery learning and its evaluation.
Chapter 7

6 Criterion-Referenced Testing
Mastery Learning
Standard development:
Judgmental: use experts
Normative: theoretically accepted criteria
Empirical: cutoff based on available data
Combination: expert judgment and norms, typically combined
CRT has been described as mastery learning. Standard setting is quite difficult and always somewhat subjective in nature. The four methods of setting standards are presented (see page 107 in the text).
Chapter 7
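As a hedged illustration of the empirical method only (the other three methods rest on expert judgment or accepted criteria), a cutoff can be taken directly from available score data, for example at a chosen percentile. The pass proportion and scores below are invented for demonstration.

```python
def empirical_cutoff(scores, proportion_to_pass=0.70):
    """Empirical standard-setting sketch: place the cutoff so that a chosen
    proportion of the available scores meets the standard (higher = better)."""
    ordered = sorted(scores, reverse=True)              # best scores first
    index = int(len(ordered) * proportion_to_pass) - 1
    return ordered[max(index, 0)]

# Invented scores; with proportion_to_pass=0.70 the cutoff lands at 40
sample_scores = [42, 38, 51, 47, 33, 45, 40, 36, 49, 44]
print(empirical_cutoff(sample_scores))
```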

7 Advantages of Criterion-Referenced Measurement
Represent specific, desired performance levels linked to a criterion
Independent of the % of the population that meets the standard
If the standard is not met, specific diagnostic evaluations can be made
Degree of performance is not important; reaching the standard is
Performance is linked to specific outcomes
Individuals know exactly what is expected of them
These are the advantages of using criterion-referenced standards (CRS).
Chapter 7

8 Limitations of Criterion-Referenced Measurement
Cutoff scores always involve subjective judgment
Misclassifications can be severe
Student motivation can be affected; students may become frustrated or bored
However, there are limitations to CRT.
Chapter 7

9 Setting a Cholesterol "Cut-Off"
[Figure: distribution of number of deaths (y-axis) by cholesterol level in mg/dl (x-axis)]
Setting a cut-off standard or a health standard is difficult. There are typically two levels of cholesterol that are used as standards. With CRT the standard must be related to a criterion; here the criterion is health (i.e., number of deaths). Can the students identify the "break-points" in this distribution? It is not as "clear-cut" as some would have you believe.
Chapter 7

10 Setting a Cholesterol "Cut-Off"
[Figure: same distribution of number of deaths by cholesterol level (mg/dl)]
The two most often utilized cut-points (CRS) for cholesterol are 200 and 240 mg/dl. Notice there is not a great increase in risk at 200 mg/dl but more so at 240 mg/dl.
Chapter 7

11 Considerations with CRT
The same as for norm-referenced testing:
Reliability: consistency of measurement
Validity: truthfulness of measurement
The psychometric issues in CRT are the same as with NRT (i.e., reliability and validity). The procedures used to determine reliability and validity are somewhat different, but their general nature is much like that used with NRT.
Chapter 7

12 Statistical Analysis of CRTs
Nominal data
Contingency table development (2x2, analyzed with chi-square)
Phi coefficient (PPM correlation for dichotomous variables)
Chi-square analysis
The scores are categorical (i.e., nominal) in nature, so the analyses must match the nature of the variables being measured. A contingency table is a 2x2 table; you will see several examples of these. The phi coefficient is the PPM correlation between two dichotomous variables (coded 0 and 1). The chi-square analysis (presented in Chapter 5) is used with categorical variables.
Chapter 7
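Because CRT scores are dichotomous (pass/fail), all of these statistics can be computed from the four cells of a 2x2 table. The sketch below uses the standard formulas for proportion of agreement, phi, and chi-square (for a 2x2 table, chi-square = N x phi^2); it does not rely on any statistics library, and the cell labels a-d and the example counts are my own.

```python
import math

def crt_statistics(a, b, c, d):
    """Statistics for a 2x2 contingency table laid out as:
                       column 1   column 2
            row 1          a          b
            row 2          c          d
    Returns proportion of agreement, phi, and chi-square (df = 1)."""
    n = a + b + c + d
    p_agreement = (a + d) / n                       # agreement cells on the diagonal
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    chi_square = n * phi ** 2                       # for a 2x2 table, chi2 = N * phi^2
    return p_agreement, phi, chi_square

# Hypothetical cell counts, purely for illustration
print(crt_statistics(30, 10, 5, 55))  # P = .85, phi ~ .68, chi-square ~ 46.9
```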

13 CRT Reliability: Test/Retest of a Single Measure
[2x2 table: Pass/Fail on Day 1 crossed with Pass/Fail on Day 2]
Note this is a reliability example because the people are tested on two occasions (Day 1 and Day 2) with the SAME measure, and interest focuses on whether or not the person met the criterion on each day.
Chapter 7

14 CRT Validity: Use of a Field Test and Criterion Measure
[2x2 table: Pass/Fail on the field test crossed with Pass/Fail on the criterion]
Note this is a validity example because the people are tested on a field test and on a criterion; the use of a criterion measure generally makes this a validity question. Interest focuses on whether or not the field test provides a good estimate of how well the person would perform on the criterion measure.
Chapter 7

15 Figure 7.1 (a) FITNESSGRAM Standards (1987)
                                      Below the criterion VO2max   Above the criterion VO2max
Did not achieve the run/walk standard          24 (4%)                       21
Did achieve the run/walk standard              64 (11%)                     472 (81%)
Notice the number of correct classifications here (24 + 472 = 496 of 581), or 85%. P = 85% is the proportion of agreement. The Kappa coefficient and chi-square test of association can also be calculated for these data. Compare these results with the next slide, which uses the standards for the Physical Best Test.
Chapter 7

16 Figure 7.1 (b) AAHPERD Standards (1988)
                                      Below the criterion VO2max   Above the criterion VO2max
Did not achieve the run/walk standard         130 (22%)                     23 (4%)
Did achieve the run/walk standard             201 (35%)                    227 (39%)
Notice the number of correct classifications here (130 + 227 = 357 of 581), or 61%. P = 61% is the proportion of agreement. Compare these standards with those from Figure 7.1(a): the FITNESSGRAM standards correctly classify more individuals. The Kappa coefficient and chi-square test of association can also be calculated for these data.
Chapter 7
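Both proportions of agreement can be verified directly from the reported cell counts (N = 581 in each figure). A short sketch:

```python
def proportion_of_agreement(correct_cells, n_total):
    """P = (number correctly classified) / N."""
    return sum(correct_cells) / n_total

# FITNESSGRAM standards, Figure 7.1(a): 24 + 472 correct classifications of 581
print(proportion_of_agreement([24, 472], 581))   # about .85

# AAHPERD standards, Figure 7.1(b): 130 + 227 correct classifications of 581
print(proportion_of_agreement([130, 227], 581))  # about .61
```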

17 Meeting Criterion-Referenced Standards: Possible Decisions
                           Truly Below Criterion   Truly Above Criterion
Did not achieve standard      Correct Decision        False Positive
Did achieve standard          False Negative          Correct Decision
Note the correct decisions are made when people who actually meet the criterion pass the field test and when those who actually fail the criterion do NOT achieve the field-test standard. Note the incorrect decisions: a false negative occurs when the person achieves the standard on the field test but does not meet the standard on the criterion, and a false positive occurs when a person fails to achieve the field-test standard but truly achieves the standard on the criterion. Students sometimes get confused about false positives and false negatives. Use the example of going to the MD for a test of whether they have a disease; they want the results to come back "negative." A "false negative" means they actually have the disease but the test told them that they did not. A "false positive" from the physician means the lab results say they HAVE the disease but they truly do not.
Chapter 7
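The decision table can be expressed as a small function that follows the slide's definitions exactly: a false negative achieves the field-test standard but is truly below the criterion, and a false positive fails the field-test standard but is truly above the criterion. The function name and boolean arguments are mine.

```python
def classify_decision(achieved_field_standard, truly_above_criterion):
    """Classify one person according to the decision table above."""
    if achieved_field_standard == truly_above_criterion:
        return "correct decision"
    if achieved_field_standard:           # passed the field test, truly below criterion
        return "false negative"
    return "false positive"               # failed the field test, truly above criterion

print(classify_decision(True, False))    # false negative
print(classify_decision(False, True))    # false positive
```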

18 Table 7.1 Test-Retest Reliability Example
                                   Day 2
Day 1                Did not achieve   Did achieve   Total
Did not achieve            80               20         100
Did achieve                50              250         300
Total                     130              270         400
P = (80 + 250)/400 = .825
Pc (agreement expected by chance) = (130 x 100 + 270 x 300)/400^2 = .081 + .506 = .587
K = (P - Pc)/(1 - Pc) = (.825 - .587)/(1 - .587) = .238/.413 = .576
P = .825, K = .576, Phi = .586, Chi-square = 137.1, df = 1, p < .001
Chapter 7
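The statistics above can be reproduced with a few lines of arithmetic from the four cell counts (80, 20, 50, 250). This sketch simply restates the formulas from the table note; small differences from the slide's rounded figures come from carrying full precision.

```python
import math

a, b, c, d = 80, 20, 50, 250      # cell counts from Table 7.1 (Day 1 rows x Day 2 columns)
n = a + b + c + d                 # 400

p = (a + d) / n                                                  # .825
pc = ((a + c) * (a + b) + (b + d) * (c + d)) / n ** 2            # chance agreement, .5875
kappa = (p - pc) / (1 - pc)                                      # .576
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))  # .586
chi_square = n * phi ** 2                                        # about 137.1, df = 1

print(p, pc, kappa, phi, chi_square)
```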

19 Table 7.2 Criterion-Referenced Equivalence Reliability Between the 1-Mile Run/Walk and PACER
             Total sample   Boys   Girls
Trial 1  P       .76         .83    .66
         K       .51         .65    .33
Trial 2  P       .71                .43
         K       .52                .30
Use the text to describe the equivalence nature of these statistics. Both the 1-Mile Run/Walk and the PACER were administered to all subjects. Scores were converted to meeting or not meeting the criterion before conducting this analysis.
Chapter 7

20 Figure 7.3 A theoretical example of the divergent group method
Point out that a possible cut-point for this distribution of physical activity might be where the two curves cross. There will obviously be some misclassifications but only a few. This points out that no test will be totally valid. Chapter 7
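One hedged way to operationalize the divergent-group idea: if each group's scores are treated as roughly normal, a candidate cut-point is the score where the two density curves cross. The group means and standard deviations below are made up purely for illustration; Figure 7.3 itself is theoretical.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def crossing_point(mean_low, sd_low, mean_high, sd_high, step=0.01):
    """Scan between the two group means for the score where the density
    curves are closest, i.e., where they cross."""
    x, best_x, best_gap = mean_low, mean_low, float("inf")
    while x <= mean_high:
        gap = abs(normal_pdf(x, mean_low, sd_low) - normal_pdf(x, mean_high, sd_high))
        if gap < best_gap:
            best_x, best_gap = x, gap
        x += step
    return best_x

# Invented physical-activity scores: inactive group vs. active group
print(crossing_point(40, 8, 60, 8))  # about 50, a candidate cut-point
```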

21 Examples of Criterion-Referenced Standards
Cholesterol < 240 mg/dl
Systolic blood pressure < 140 mmHg
Diastolic blood pressure < 90 mmHg
FITNESSGRAM 1-mile run time for a boy age 10 < 11:30
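These example standards can be written as simple pass/fail checks. The field names and the example measurement values below are mine; the thresholds are the ones listed on the slide (the run time is compared in seconds).

```python
def meets_standards(cholesterol_mg_dl, systolic_mmhg, diastolic_mmhg, mile_run_seconds):
    """Pass/fail checks against the example criterion-referenced standards."""
    return {
        "cholesterol < 240 mg/dl": cholesterol_mg_dl < 240,
        "systolic BP < 140 mmHg": systolic_mmhg < 140,
        "diastolic BP < 90 mmHg": diastolic_mmhg < 90,
        "1-mile run < 11:30 (boy, age 10)": mile_run_seconds < 11 * 60 + 30,
    }

# Illustrative measurement values only
print(meets_standards(198, 124, 82, 10 * 60 + 45))
```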

22 Criterion-referenced Measurement
Find a friend: Explain one thing that you learned today and share WHY IT MATTERS to you as a future professional Chapter 3

