REVIEW I Reliability Index of Reliability

Presentation transcript:

REVIEW I Reliability. Index of Reliability: the theoretical correlation between observed & true scores. Standard Error of Measurement: a reliability measure; the degree to which an observed score fluctuates due to measurement errors. Factors affecting reliability. A test must be RELIABLE to be VALID. Chapter 3
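For reference, the usual formula behind this definition (not written out on the slide) is SEM = s_x * sqrt(1 - r_xx), where s_x is the standard deviation of the observed scores and r_xx is the reliability coefficient: the higher the reliability, the less an observed score is expected to fluctuate around the true score.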

REVIEW II Types of validity. Content-related (face): represents knowledge; use “experts” to establish. Criterion-related: evidence of a statistical relationship with the trait being measured; alternative measures must be validated with a criterion measure. Construct-related: validates unobservable theoretical measures. Chapter 3

REVIEW III Standard Error of Estimate: a validity measure; the degree of error in estimating a score based on the criterion. Methods of obtaining a criterion measure: actual participation, experts, performing the criterion, a known valid test. Interpreting “r”. Chapter 3
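For reference, the usual formula (not written out on the slide) is SEE = s_y * sqrt(1 - r_xy^2), where s_y is the standard deviation of the criterion scores and r_xy is the validity coefficient between the test and the criterion measure.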

Criterion-Referenced Measurement. Tell the students that you have spoken about norm-referenced reliability and validity, where you are concerned about differences among students. We will now turn our focus to criterion-referenced reliability and validity. The interest here is still in consistency and truthfulness of measurement, but you are no longer interested in differences among or between performers; you are now interested in whether the performer meets the criterion. (Slide graphic: a scale from Poor to Sufficient to Better.) Chapter 7

Guidelines for Writing Behavioral Objectives (Mager, 1962): identify the desired behavior/action by name; define the desired condition; specify the criteria of acceptable performance. When setting standards, you often encounter “behavioral objectives.” This slide presents the steps in developing behavioral objectives, which help you set standards for mastery learning and the evaluation thereof. Chapter 7

Criterion-Referenced Testing: Mastery Learning Standard Development. Judgmental: use experts. Normative: theoretically accepted criteria. Empirical: cutoff based on available data. Combination: expert & norms typically combined. CRT has been described as Mastery Learning. Standard setting is quite difficult and always somewhat subjective in nature. The four methods of setting standards are presented (see page 107 in text). Chapter 7

Advantages of Criterion-Referenced Measurement. Represent specific, desired performance levels linked to a criterion. Independent of the % of the population that meets the standard. If the standard is not met, specific diagnostic evaluations can be made. Degree of performance is not important . . . reaching the standard is. Performance is linked to specific outcomes. Individuals know exactly what is expected of them. These are advantages of using CRS. Chapter 7

Limitations of Criterion-Referenced Measurement. Cutoff scores always involve subjective judgment. Misclassifications can be severe. Student motivation can be affected (frustration or boredom). However, there are limitations to CRT. Chapter 7

Setting a Cholesterol “Cut-Off” (slide figure: number of deaths plotted against cholesterol in mg/dl). Setting a cut-off standard or a health standard is difficult. There are typically two levels of cholesterol that are used as standards. With CRT the standard must be related to a criterion; here the criterion is health (i.e., number of deaths). Can the students identify the “break-points” in this distribution? It is not as “clear-cut” as some would have you believe. Chapter 7

Setting a Cholesterol “Cut-Off” (slide figure: number of deaths plotted against cholesterol in mg/dl). The two most often utilized cut-points (CRS) for cholesterol are 200 and 240 mg/dl. Notice there is not a great increase in risk at 200 mg/dl but more so at 240 mg/dl. Chapter 7

Considerations with CRT: the same as norm-referenced testing. Reliability: consistency of measurement. Validity: truthfulness of measurement. The psychometric issues in CRT are the same as with NRT (i.e., reliability and validity); the procedures used to determine reliability and validity are somewhat different, but their general nature is much like that with NRT. Chapter 7

Statistical Analysis of CRTs: Nominal Data. Contingency table development (2x2 chi-square). Phi coefficient (PPM for dichotomous variables). Chi-square analysis. The scores are categorical (i.e., nominal) in nature, so the analyses must match the nature of the variables being measured. A contingency table is a 2x2 table that you will see several examples of. The phi coefficient is the PPM correlation between two dichotomous variables (coded 0 and 1). The chi-square analysis (presented in Chapter 5) is used with categorical variables. Chapter 7
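As a quick illustration of the slide's point that phi is simply the Pearson product-moment correlation computed on two 0/1-coded variables, here is a minimal Python sketch; the cell counts are invented for illustration and are not from the text:

    # Minimal sketch: phi for a 2x2 contingency table, computed two ways.
    # The cell counts below are hypothetical.
    import numpy as np

    a, b = 30, 10   # row 1: (fail, fail) and (fail, pass) counts
    c, d = 15, 45   # row 2: (pass, fail) and (pass, pass) counts

    # Phi directly from the 2x2 cell counts.
    phi_cells = (a * d - b * c) / np.sqrt((a + b) * (c + d) * (a + c) * (b + d))

    # The same value as the Pearson r between the two 0/1-coded variables.
    test1 = np.array([0] * (a + b) + [1] * (c + d))            # 0 = fail, 1 = pass
    test2 = np.array([0] * a + [1] * b + [0] * c + [1] * d)    # 0 = fail, 1 = pass
    phi_pearson = np.corrcoef(test1, test2)[0, 1]

    # For a 2x2 table, the (uncorrected) chi-square statistic is N * phi^2, with df = 1.
    n = a + b + c + d
    chi_square = n * phi_cells ** 2

    print(round(phi_cells, 3), round(phi_pearson, 3), round(chi_square, 2))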

CRT Reliability: test/retest of a single measure (pass/fail on Day 1 versus pass/fail on Day 2). Note this is a reliability example because the people are tested on two occasions (Day 1 and Day 2) with the SAME measure, and interest focuses on whether or not the person met the criterion on each day. Chapter 7
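To make the table layout concrete, here is a small hypothetical Python sketch (the six pass/fail records are invented) of how such a test-retest contingency table is tallied from raw data:

    # Hypothetical sketch: tally a 2x2 test-retest table from raw pass/fail records.
    day1 = ["pass", "pass", "fail", "pass", "fail", "fail"]   # invented records
    day2 = ["pass", "fail", "fail", "pass", "fail", "pass"]

    counts = {("fail", "fail"): 0, ("fail", "pass"): 0,
              ("pass", "fail"): 0, ("pass", "pass"): 0}
    for d1, d2 in zip(day1, day2):
        counts[(d1, d2)] += 1

    # The (fail, fail) and (pass, pass) cells are the consistent decisions across days.
    agreement = (counts[("fail", "fail")] + counts[("pass", "pass")]) / len(day1)
    print(counts, round(agreement, 2))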

CRT Validity: use of a field test and a criterion measure (pass/fail on the field test versus pass/fail on the criterion). Note this is a validity example because the people are tested on a field test and on a criterion; the use of a criterion measure generally makes this a validity example. Interest focuses on whether or not the field test provides a good estimate of how well the person would perform on the criterion measure. Chapter 7

Figure 7.1 (a) FITNESSGRAM Standards (1987)
                                                   Below criterion VO2max   Above criterion VO2max
    Did not achieve the run/walk standard               24 (4%)                   21
    Did achieve the run/walk standard                   64 (11%)                 472 (81%)
Notice the number of correct classifications here (472 + 24), or 85%. P = 85% is the proportion of agreement. The Kappa coefficient and chi-square test of association can also be calculated for these data. Compare these results with the next picture using the standards for the Physical Best Test. Chapter 7

Figure 7.1 (b) AAHPERD Standards (1988)
                                                   Below criterion VO2max   Above criterion VO2max
    Did not achieve the run/walk standard              130 (22%)                 23 (4%)
    Did achieve the run/walk standard                  201 (35%)                227 (39%)
Notice the number of correct classifications here (227 + 130), or 61%. P = 61% is the proportion of agreement. Compare these standards with those from Figure 7.1(a) and see that the standards for the FITNESSGRAM correctly classify more individuals. The Kappa coefficient and chi-square test of association can also be calculated for these data. Chapter 7
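The agreement percentages quoted above can be checked by totaling the correctly classified (diagonal) cells; a minimal Python sketch using the counts shown in Figures 7.1(a) and 7.1(b):

    # Proportion of agreement (P) for the two classification tables above.
    # Rows: did not achieve / did achieve the run/walk standard.
    # Columns: below / above the criterion VO2max.
    fitnessgram_1987 = [[24, 21], [64, 472]]    # Figure 7.1(a)
    aahperd_1988     = [[130, 23], [201, 227]]  # Figure 7.1(b)

    def proportion_of_agreement(table):
        correct = table[0][0] + table[1][1]                   # correctly classified cases
        total = sum(cell for row in table for cell in row)    # all cases in the table
        return correct / total

    print(round(proportion_of_agreement(fitnessgram_1987), 2))  # about 0.85
    print(round(proportion_of_agreement(aahperd_1988), 2))      # about 0.61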

Meeting Criterion-Referenced Standards: Possible Decisions
                                  Truly Below Criterion   Truly Above Criterion
    Did not achieve standard        Correct Decision        False Positive
    Did achieve standard            False Negative          Correct Decision
Note the correct decisions are made when people who actually meet the criterion pass the field test and those who actually fail the criterion do NOT achieve the field test standard. Note the incorrect decisions that occur: a false negative occurs when the person achieves the standard on the field test but does not meet the standard on the criterion; a false positive occurs when a person fails to achieve the field test standard but truly achieves the standard on the criterion. Students sometimes get confused with false positive and false negative. Use the example of them going to the MD for a test of whether they have a disease. They want the results to come back “negative.” If it is a “false negative,” that means they actually have the disease but the test told them that they did not. A “false positive” from your physician says that the lab results tell you that you HAVE the disease but you truly do not. Chapter 7

Table 7.1 Test-Retest Reliability Example
                                        Day 2: Did not achieve   Day 2: Did achieve   Total
    Day 1: Did not achieve the standard           80                     20            100
    Day 1: Did achieve the standard               50                    250            300
    Total                                        130                    270            400
P = (80 + 250)/400 = .825
Pc = (130 × 100)/400^2 + (270 × 300)/400^2 = .081 + .506 = .587 (chance agreement from the marginals)
K = (P - Pc)/(1 - Pc) = (.825 - .587)/(1 - .587) = .238/.413 = .576
P = .825, K = .576, Phi = .586, chi-square = 137.13, df = 1, p < .001. Chapter 7
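The slide's arithmetic can be reproduced in a few lines of Python; this is only a check of the numbers above, using the same 2x2 counts:

    # Reproduce the Table 7.1 statistics from the test-retest counts above.
    import math

    table = [[80, 20],    # Day 1 did not achieve: (Day 2 did not achieve, Day 2 did achieve)
             [50, 250]]   # Day 1 did achieve:     (Day 2 did not achieve, Day 2 did achieve)
    n = sum(sum(row) for row in table)                              # 400

    row_totals = [sum(row) for row in table]                        # [100, 300]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]      # [130, 270]

    p = (table[0][0] + table[1][1]) / n                             # proportion of agreement, .825
    p_chance = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2   # about .587
    kappa = (p - p_chance) / (1 - p_chance)                         # about .576

    a, b, c, d = table[0][0], table[0][1], table[1][0], table[1][1]
    phi = (a * d - b * c) / math.sqrt(row_totals[0] * row_totals[1]
                                      * col_totals[0] * col_totals[1])       # about .586
    chi_square = n * phi ** 2                                       # about 137.13, df = 1

    print(round(p, 3), round(kappa, 3), round(phi, 3), round(chi_square, 2))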

Table 7-2 Criterion-Referenced Equivalence Reliability Between the 1-Mile Run/Walk and PACER Tests. Trial 1: P = .76 (total sample), .83 (boys), .66 (girls); K = .51 (total sample), .65 (boys), .33 (girls). Trial 2: .71, .43, .52, .30. Use the text to describe the equivalence nature of these statistics. Both the 1-Mile Run/Walk and the PACER were administered to all subjects; scores were converted to meeting or not meeting the criterion before conducting this analysis. Chapter 7

Figure 7.3 A theoretical example of the divergent group method. Point out that a possible cut-point for this distribution of physical activity might be where the two curves cross. There will obviously be some misclassifications, but only a few. This illustrates that no test will be totally valid. Chapter 7

Examples of Criterion-Referenced Standards: Cholesterol < 240 mg/dl; Systolic Blood Pressure < 140 mmHg; Diastolic Blood Pressure < 90 mmHg; FITNESSGRAM 1-mile run time for a boy age 10 < 11:30
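Purely as an illustration of how such fixed cut-offs are applied in practice, here is a small Python sketch using the numeric standards listed above; the function name and the example values passed to it are invented:

    # Illustrative sketch: apply the criterion-referenced health standards listed above.
    def meets_health_standards(cholesterol_mg_dl, systolic_mmhg, diastolic_mmhg):
        """Return pass/fail decisions against the cut-offs on the slide."""
        return {
            "cholesterol": cholesterol_mg_dl < 240,   # < 240 mg/dl
            "systolic_bp": systolic_mmhg < 140,       # < 140 mmHg
            "diastolic_bp": diastolic_mmhg < 90,      # < 90 mmHg
        }

    print(meets_health_standards(210, 145, 85))
    # {'cholesterol': True, 'systolic_bp': False, 'diastolic_bp': True}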

Criterion-Referenced Measurement. Find a friend: explain one thing that you learned today and share WHY IT MATTERS to you as a future professional. Chapter 3