

Reliability

Basics of test score theory Each person has a true score that would be obtained if there were no errors in measurement. However, because measuring instruments are imperfect, the score observed for each person almost always differs from the person’s true ability or characteristic.

We never know the true score (T); we can only estimate it from the observed score (X), which is the true score plus error (E). If E is very small or zero, we can say that the measurement is very reliable. If E is very large, we cannot trust that the observed score is close to the true score, so the measurement is not reliable.

What is reliability? The extent to which a score or measure is free of measurement error. Theoretically, reliability is the ratio of true score variance to observed score variance. The ratio can be estimated using a variety of correlation methods, including coefficient alpha, split-half, test-retest and parallel forms.
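The ratio definition can be written out as a short sketch. The variance symbols and the notation r_XX' for reliability are assumptions introduced here, and the decomposition of observed variance relies on the standard classical test theory assumption, not stated on the slide, that true scores and errors are uncorrelated.

$$
X = T + E, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad r_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
$$

Read this way, reliability is 1 when there is no error variance and falls toward 0 as error variance takes up more of the observed variance.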

Observed scores and reliabilities. Which one is the most reliable? [Figure: three panels, A, B and C, each showing T (true score), E (error) and X (observed score).]

If you test the same person over and over, the average of the observed scores may get close to the true score.

Which one uses the most reliable measurement?

Can you estimate their true scores? Child studies aptitude test (repeated 12 times)
송이: 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 7, 9
민준: 1, 4, 5, 5, 6, 6, 7, 7, 7, 7, 8, 9
휘경: 2, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 9
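One way to approach the question is to average each person's repeated scores. The sketch below does that in Python; pairing the three names with the three score rows in the order they appear is an assumption, and `estimate_true_score` is a hypothetical helper name, not something from the slides.

```python
from statistics import mean, stdev

# Scores from the slide: 12 repeated administrations per person.
# Assumption: names and score rows are paired in the order listed.
repeated_scores = {
    "송이": [3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 7, 9],
    "민준": [1, 4, 5, 5, 6, 6, 7, 7, 7, 7, 8, 9],
    "휘경": [2, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 9],
}


def estimate_true_score(scores):
    """Return (mean, sample standard deviation) of repeated observed scores."""
    return mean(scores), stdev(scores)


estimates = {name: estimate_true_score(s) for name, s in repeated_scores.items()}
for name, (m, sd) in estimates.items():
    print(f"{name}: estimated true score = {m:.2f}, spread = {sd:.2f}")
```

The mean serves as the estimate of T, and the standard deviation gives a rough sense of how large E tends to be for that person.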

Sources of Error and Reliability There are many sources of error that make observed scores deviate from true scores. You may do poorly on an exam because of loud noise or a hot room. You may do poorly because the test items are not representative of what you have studied. We can reduce these errors, but the measurement will still not be perfect.

Test-retest method Source of error: time sampling. The same test is given at two points in time. Reliability is the correlation between the scores obtained on the two occasions.

Which test is more reliable?

Name    Test A, Test 1    Test A, Test 2    Test B, Test 1    Test B, Test 2
민준    7                 7                 8                 8
휘경    7                 6                 7                 8
송이    4                 7                 5                 6
수정    5                 4                 7                 9
인욱    4                 3                 2                 3
철수    7                 7                 5                 6
영수    5                 9                 5                 7
지영    6                 8                 5                 6
옥경    8                 7                 5                 4
효진    9                 7                 8                 9

Test A: r = .33; Test B: r = .90
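The two correlations can be checked from the scores listed above. This is a minimal sketch using a hand-written Pearson correlation; `pearson_r` and the list names are hypothetical, and the score lists assume each person's four digits are read as Test A (first, second) followed by Test B (first, second).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Scores for the ten people, in the order given on the slide.
test_a_first = [7, 7, 4, 5, 4, 7, 5, 6, 8, 9]
test_a_second = [7, 6, 7, 4, 3, 7, 9, 8, 7, 7]
test_b_first = [8, 7, 5, 7, 2, 5, 5, 5, 5, 8]
test_b_second = [8, 8, 6, 9, 3, 6, 7, 6, 4, 9]

r_a = pearson_r(test_a_first, test_a_second)
r_b = pearson_r(test_b_first, test_b_second)
print(f"Test A: r = {r_a:.2f}")  # prints 0.33
print(f"Test B: r = {r_b:.2f}")  # prints 0.90
```

Test B ranks people almost the same way on both occasions, which is why its test-retest coefficient is much higher.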

Alternate forms or parallel forms Source of error: item sampling. The selected items may not be representative of what is to be measured. Two parallel forms of the test, drawn from a large pool of items, are administered to the same group of people on the same day. Reliability is the correlation between the equivalent forms, which have different items.

Results of the two parallel types of tests

Name    Type A    Type B
민준    7         6
휘경    6         6
송이    7         6
수정    4         4
인욱    3         4
철수    7         5
영수    9         7
지영    8         7
옥경    7         7

[Figure: a pool of items divided into Type A and Type B.]

r=.879

Split-half method Source of error: internal consistency. Some items in the test may be measuring something else. The method checks the consistency of items within the same test. Reliability is the corrected correlation between the two halves of the test.

Split-half method A 20-item test

Name    Half 1    Half 2
민준    7         6
휘경    6         5
송이    7         5
수정    4         3
인욱    3         4
철수    7         5
영수    9         8
지영    8         7
옥경    7         7

r=.869
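The split-half calculation can be sketched as follows. The slides say the correlation is "corrected" but do not name the correction; the Spearman-Brown formula used here is the usual choice and is an assumption, as are the helper names `pearson_r` and `spearman_brown`. The slide also does not say whether its reported r is the value before or after correction.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)


# Half-test scores for the nine people, in the order given on the slide.
half_1 = [7, 6, 7, 4, 3, 7, 9, 8, 7]
half_2 = [6, 5, 5, 3, 4, 5, 8, 7, 7]

r_half = pearson_r(half_1, half_2)
r_full = spearman_brown(r_half)
print(f"half-to-half r = {r_half:.3f}, Spearman-Brown corrected = {r_full:.3f}")
```

The correction is needed because each half has only 10 of the 20 items, and shorter tests tend to be less reliable than the full test they come from.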

Coefficient alpha Source of error: internal consistency. Some items in the test may be measuring something else. Reliability is based on the corrected correlation between items and total scores. This is the most popular method.

Which item is measuring something else? Four people took a nine-item test.
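A minimal sketch of coefficient alpha, using the usual variance-based formula (the formula itself is not given in the slides): alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The function name `cronbach_alpha` and both data sets are hypothetical and made up purely for illustration.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a list of rows (one row of item scores per person)."""
    k = len(scores[0])  # number of items
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data, NOT from the slides: 4 people, 3 items.
# Every item ranks the people the same way.
consistent_items = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
# Same people, but the third item runs against the other two.
one_odd_item = [[1, 1, 4], [2, 2, 1], [3, 3, 3], [4, 4, 2]]

alpha_consistent = cronbach_alpha(consistent_items)
alpha_odd = cronbach_alpha(one_odd_item)
print(f"consistent items: alpha = {alpha_consistent:.2f}")
print(f"one odd item:     alpha = {alpha_odd:.2f}")
```

An item that measures something else adds variance of its own without adding to the shared variance in the totals, so alpha drops.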

SPSS Coefficient Alpha results

How reliable is reliable? In basic psychological research it has been suggested that reliability estimates in the range of .70 to .80 are good enough for most purposes. Reliabilities greater than .95 are not very useful, because they suggest that all of the items are testing essentially the same thing and that the measure could easily be shortened. In clinical settings, high reliability is extremely important. When tests are used to make important decisions about someone’s future, evaluators must be certain to minimize any error in classification. Thus, a test with a reliability of .90 might not be good enough.

Questions Do we ever know the true score in psychological testing? Explain reliability using the terms true score, observed score, and error. Can you explain the sources of error that make test scores unreliable and the possible ways to fix them? You have a serious medical problem and you have to make a decision based on a test result. How reliable do you want the test to be?