-Biomedical Statistics Final Report- Reliability    Student: 劉佩昀    Student ID: 101521090    Instructor: 蔡章仁

Reliability   From the perspective of classical test theory, an examinee's obtained test score (X) is composed of two components, a true score component (T) and an error component (E): X=T+E

Reliability The true score component reflects the examinee's status with regard to the attribute that is measured by the test, while the error component represents measurement error. Measurement error is random error. It is due to factors that are irrelevant to what is being measured by the test and that have an unpredictable (unsystematic) effect on an examinee's test score.

Reliability The score you obtain on a test is likely to be due both to the knowledge you have about the topics addressed by the exam items (T) and to the effects of random factors (E), such as the way test items are written, fluctuations in anxiety, attention, or motivation while taking the test, and the accuracy of your "educated guesses."

Reliability Whenever we administer a test to examinees, we would like to know how much of their scores reflects "truth" and how much reflects error. It is a measure of reliability that provides us with an estimate of the proportion of variability in examinees' obtained scores that is due to true differences among examinees on the attribute(s) measured by the test.

Consistency = Reliability When a test is reliable, it provides dependable, consistent results and, for this reason, the term consistency is often given as a synonym for reliability (e.g., Anastasi, 1988). Consistency = Reliability

The Reliability Coefficient  Ideally, a test's reliability would be calculated by dividing true score variance by the obtained (total) variance to derive a reliability index. This index would indicate the proportion of observed variability in test scores that reflects true score variability. True Score Variance/Total Variance = Reliability Index
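As a rough illustration of this ratio, here is a minimal simulation sketch (hypothetical variances, numpy assumed): true scores and random errors are generated directly, so the proportion of true-score variance is known and can be compared with the correlation between two parallel measurements of the same examinees.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000                               # hypothetical number of examinees

true = rng.normal(50, 10, n)             # true scores T (variance 100)
x1 = true + rng.normal(0, 5, n)          # observed score X = T + E (error variance 25)
x2 = true + rng.normal(0, 5, n)          # a second, parallel measurement of the same examinees

# Reliability index: proportion of observed variance that is true-score variance
print(true.var() / x1.var())             # close to 100 / (100 + 25) = 0.80

# The correlation between two parallel measurements estimates the same quantity
print(np.corrcoef(x1, x2)[0, 1])         # also close to 0.80
```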

The Reliability Coefficient A test's true score variance is not known, however, and reliability must be estimated rather than calculated directly. There are several ways to estimate a test's reliability. Each involves assessing the consistency of an examinee's scores over time, across different content samples, or across different scorers. The common assumption underlying each of these techniques is that consistent variability is true score variability, while inconsistent variability reflects random error.

The Reliability Coefficient Most methods for estimating reliability produce a reliability coefficient, which is a correlation coefficient that ranges in value from 0.0 to + 1.0. When a test's reliability coefficient is 0.0, this means that all variability in obtained test scores is due to measurement error. Conversely, when a test's reliability coefficient is + 1.0, this indicates that all variability in scores reflects true score variability.

The Reliability Coefficient The reliability coefficient is symbolized with the letter "r" and a subscript that contains two of the same letters or numbers (e.g., r_xx). The subscript indicates that the correlation coefficient was calculated by correlating a test with itself rather than with some other measure.

The Reliability Coefficient Regardless of the method used to calculate a reliability coefficient, the coefficient is interpreted directly as the proportion of variability in obtained test scores that reflects true score variability. For example, as depicted in Figure 1, a reliability coefficient of .84 indicates that 84% of variability in scores is due to true score differences among examinees, while the remaining 16% (1.00 - .84) is due to measurement error.
Figure 1. Proportion of variability in test scores: true score variability (84%) vs. measurement error (16%).

The Reliability Coefficient Note that a reliability coefficient does not provide any information about what is actually being measured by a test! A reliability coefficient only indicates whether the attribute measured by the test— whatever it is—is being assessed in a consistent, precise way. Whether the test is actually assessing what it was designed to measure is addressed by an analysis of the test's validity.

METHODS FOR ESTIMATING RELIABILITY The selection of a method for estimating reliability depends on the nature of the test. Each method not only entails different procedures but is also affected by different sources of error. For many tests, more than one method should be used.

TYPES OF RELIABILITY
Test-retest reliability: a measure of reliability obtained by administering the same test twice, over a period of time, to a group of individuals.
Parallel forms reliability: a measure of reliability obtained by administering different versions of an assessment tool to the same group of individuals.
Inter-rater reliability: a measure of reliability used to assess the degree to which different judges or raters agree in their assessment decisions.
Internal consistency reliability: a measure of reliability based on how consistently examinees perform across the items of a single administration, assessed via the average inter-item correlation or via split-half reliability (correlating two subsets of the items).
Source: http://www.uni.edu/chfasoa/reliabilityandvalidity.htm

INTERNAL CONSISTENCY RELIABILITY Split-half reliability and coefficient alpha are two methods for evaluating internal consistency. Both involve administering the test once to a single group of examinees, and both yield a reliability coefficient that is also known as the coefficient of internal consistency.  

Internal Consistency Reliability To determine a test's split-half reliability, the test is split into equal halves so that each examinee has two scores (one for each half of the test). Scores on the two halves are then correlated. Tests can be split in several ways, but probably the most common way is to divide the test on the basis of odd- versus even-numbered items.

SPLIT-HALF METHODOLOGY EXAMPLE Example 1: 12 students take a test with 50 questions. For each student, the total score is recorded along with the sum of the scores for the even questions and the sum of the scores for the odd questions, as shown in Figure 1. Determine whether the test is reliable using the split-half methodology. The test consists of looking at the correlation coefficient between the odd and even half-scores (cell G3 of Figure 1); if it is high, the questionnaire is considered to be reliable. r = CORREL(C4:C15,D4:D15) = 0.667277
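The same computation can be sketched outside a spreadsheet. The block below is a hedged illustration with simulated right/wrong responses (not the actual data in Figure 1; numpy assumed): it sums the odd- and even-numbered items for each examinee and correlates the two half-test scores, just as CORREL does for columns C and D.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 12 students x 50 right/wrong items, students differing in ability
ability = rng.normal(0, 1, 12)
difficulty = rng.normal(0, 1, 50)
p_correct = 1 / (1 + np.exp(-(ability[:, None] - difficulty[None, :])))
items = (rng.random((12, 50)) < p_correct).astype(int)

odd_half = items[:, 0::2].sum(axis=1)    # items 1, 3, 5, ... (odd-numbered)
even_half = items[:, 1::2].sum(axis=1)   # items 2, 4, 6, ... (even-numbered)

# Split-half reliability: correlation between the two half-test scores
r_half = np.corrcoef(odd_half, even_half)[0, 1]
print(r_half)
```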

SPLIT-HALF METHODOLOGY EXAMPLE One problem with the split-half reliability coefficient is that, since only half the number of items is used, the reliability coefficient is reduced. To get a better estimate of the reliability of the full test, we apply the Spearman-Brown correction, namely
r_SB = 2r / (1 + r) = 2(0.667277) / (1 + 0.667277) = 0.800439

Internal Consistency Reliability A problem with the split-half method is that it produces a reliability coefficient that is based on test scores that were derived from one-half of the entire length of the test. If a test contains 30 items, each score is based on 15 items. Because reliability tends to decrease as the length of a test decreases, the split-half reliability coefficient usually underestimates a test's true reliability. For this reason, the split-half reliability coefficient is ordinarily corrected using the Spearman-Brown prophecy formula, which provides an estimate of what the reliability coefficient would have been had it been based on the full length of the test.  
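A small helper makes the correction explicit. This is a sketch of the general Spearman-Brown prophecy formula (the length factor is 2 in the split-half case); it reproduces the 0.800439 value of the worked example from the 0.667277 half-test correlation.

```python
def spearman_brown(r: float, length_factor: float = 2.0) -> float:
    """Predicted reliability of a test length_factor times as long as the test
    that produced the reliability coefficient r."""
    return length_factor * r / (1 + (length_factor - 1) * r)

print(spearman_brown(0.667277))      # ~0.800439: split-half coefficient corrected to full length
print(spearman_brown(0.80, 2))       # e.g., reliability expected if a test with r = .80 were doubled
```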

Internal Consistency Reliability Cronbach's coefficient alpha also involves administering the test once to a single group of examinees. However, rather than splitting the test in half, a special formula is used to determine the average degree of inter-item consistency. One way to interpret coefficient alpha is as the average reliability that would be obtained from all possible splits of the test. Coefficient alpha tends to be conservative and can be considered the lower boundary of a test's reliability (Novick and Lewis, 1967). When test items are scored dichotomously (right or wrong), a variation of coefficient alpha known as the Kuder-Richardson Formula 20 (KR-20) can be used.

KUDER-RICHARDSON FORMULA 20 The Kuder and Richardson Formula 20 test checks the internal consistency of measurements with dichotomous choices. It is equivalent to performing the split-half methodology on all combinations of questions and is applicable when each question is either right or wrong. A correct question scores 1 and an incorrect question scores 0. The test statistic is
ρ_KR20 = [k / (k - 1)] × [1 - Σ p_j q_j / σ²]
where
k = the number of questions
p_j = the proportion of people in the sample who answered question j correctly
q_j = 1 - p_j = the proportion of people in the sample who answered question j incorrectly
σ² = the variance of the total scores of all the people taking the test = VARP(R1), where R1 = the array containing the total scores of all the people taking the test.
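A direct translation of the formula, as a sketch assuming a numpy 0/1 item matrix with one row per examinee (mirroring the worksheet layout):

```python
import numpy as np

def kr20(items: np.ndarray) -> float:
    """Kuder-Richardson Formula 20 for a matrix of 0/1 scores
    (rows = examinees, columns = questions)."""
    k = items.shape[1]                    # number of questions
    p = items.mean(axis=0)                # proportion answering each question correctly
    q = 1 - p                             # proportion answering each question incorrectly
    total_var = items.sum(axis=1).var()   # population variance of total scores (like VARP)
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)
```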

KUDER-RICHARDSON FORMULA 20 EXAMPLE(1/2) Example 1: A questionnaire with 11 questions is administered to 12 students. The results are listed in the upper part of Figure 1. Determine the reliability of the questionnaire using Kuder and Richardson Formula 20. Figure 1 – Kuder and Richardson Formula 20 for Example 1

KUDER-RICHARDSON FORMULA 20 EXAMPLE(2/2) The values of p in row 18 are the percentage of students who answered that question correctly. We can calculate ρKR20 as described in Figure 2. The value ρKR20 = 0.738 shows that the test has high reliability. Figure 2 – Key formulas for worksheet in Figure 1

CRONBACH'S COEFFICIENT ALPHA One problem with the split-half method is that the reliability estimate obtained using any random split of the items is likely to differ from that obtained using another. One solution to this problem is to compute the Spearman-Brown corrected split-half reliability coefficient for every one of the possible split-halves and then find the mean of those coefficients. This mean is known as Cronbach’s alpha. Cronbach’s alpha is superior to Kuder and Richardson Formula 20 since it can be used with continuous and non-dichotomous data. In particular, it can be used for testing with partial credit and for questionnaires using a Likert scale.

CRONBACH'S COEFFICIENT ALPHA Definition 1: Given variables x_1, …, x_k, let x_0 = x_1 + x_2 + … + x_k be the total score. Cronbach's alpha is defined to be
α = [k / (k - 1)] × [1 - Σ var(x_j) / var(x_0)]
where the sum runs over j = 1, …, k.
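The definition translates directly into code. Below is a minimal sketch (numpy assumed; item scores need not be dichotomous) that computes the per-item variances and the variance of the total score x_0:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a matrix of item scores (rows = examinees, columns = items)."""
    k = items.shape[1]
    item_vars = items.var(axis=0)         # population variance of each item x_j (VARP per column)
    total_var = items.sum(axis=1).var()   # population variance of the total score x_0
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```

For 0/1 items, the population variance of each item equals p_j·q_j, so this returns the same value as the KR-20 sketch above, in line with the example that follows.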

CRONBACH'S COEFFICIENT ALPHA EXAMPLE(1/2) Example 1: Calculate Cronbach’s alpha for the data in Example 1 of Kuder and Richardson Formula 20 (repeated in Figure 1 below).

CRONBACH'S COEFFICIENT ALPHA EXAMPLE(2/2) Row 17 contains the variance for each of the questions. E.g., the variance for question 1 (cell B17) is calculated by the formula =VARP(B4:B15). Other key formulas used to calculate Cronbach's alpha in Figure 1 are described in Figure 2. Since the questions have only two answers, Cronbach's alpha (.73082) is the same as the KR-20 reliability calculated in Example 1. Figure 2 – Key formulas for worksheet in Figure 1

Internal Consistency Reliability Content sampling is a source of error for both split-half reliability and coefficient alpha. For split-half reliability, content sampling refers to the error resulting from differences between the content of the two halves of the test (i.e., the items included in one half may better fit the knowledge of some examinees than items in the other half); for coefficient alpha, content (item) sampling refers to differences between individual test items rather than between test halves.

Internal Consistency Reliability Coefficient alpha also has the heterogeneity of the content domain as a source of error. A test is heterogeneous with regard to content domain when its items measure several different domains of knowledge or behavior.

Internal Consistency Reliability The greater the heterogeneity of the content domain, the lower the inter-item correlations and the lower the magnitude of coefficient alpha. Coefficient alpha could be expected to be smaller for a 200-item test that contains items assessing knowledge of test construction, statistics, ethics, epidemiology, environmental health, social and behavioral sciences, rehabilitation counseling, etc. than for a 200-item test that contains questions on test construction only.
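A quick simulation sketch (hypothetical continuous item scores, numpy assumed) illustrates the point: two tests of the same length are generated, one whose items all load on a single latent domain and one whose items split across two unrelated domains. The heterogeneous test shows lower inter-item correlations and therefore a lower alpha.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000                                            # hypothetical number of examinees

def alpha(m: np.ndarray) -> float:
    # Cronbach's alpha, as in the earlier sketch
    k = m.shape[1]
    return (k / (k - 1)) * (1 - m.var(axis=0).sum() / m.sum(axis=1).var())

def make_test(factors, items_per_factor):
    """Each item = one latent factor + independent noise; the factors are unrelated."""
    cols = [f + rng.normal(0, 1, n) for f in factors for _ in range(items_per_factor)]
    return np.column_stack(cols)

homogeneous = make_test([rng.normal(0, 1, n)], 20)                         # 20 items, one domain
heterogeneous = make_test([rng.normal(0, 1, n), rng.normal(0, 1, n)], 10)  # 20 items, two domains

print(alpha(homogeneous))     # higher: all items inter-correlate
print(alpha(heterogeneous))   # lower: cross-domain items are uncorrelated
```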

Internal Consistency Reliability The methods for assessing internal consistency reliability are useful when a test is designed to measure a single characteristic, when the characteristic measured by the test fluctuates over time, or when scores are likely to be affected by repeated exposure to the test. They are not appropriate for assessing the reliability of speed tests because, for these tests, they tend to produce spuriously high coefficients. (For speed tests, alternate forms reliability is usually the best choice.)

Thanks for your attention!!