RELIABILITY: consistency or reproducibility of a test score (or measurement)

Common approaches to estimating reliability
- Classical True Score Theory
  - test-retest, alternate forms, internal consistency: useful for estimating relative decisions
  - intraclass correlation: useful for estimating absolute decisions
- Generalizability Theory
  - can estimate both relative & absolute decisions

Reliability is a concept central to all behavioral sciences. To some extent, all measures are unreliable; this is especially true of psychological measures and measurements based on human observation.

Sources of Error
- Random: fluctuations in the measurement based purely on chance.
- Systematic: measurement error that affects a score because of some particular characteristic of the person or the test that has nothing to do with the construct being measured.
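A minimal numpy sketch (all numbers hypothetical) of the distinction: random error changes on every measurement and averages toward zero, whereas systematic error shifts every score by the same amount and does not wash out with repeated measurement.

```python
import numpy as np

rng = np.random.default_rng(0)
true_scores = rng.normal(50, 10, size=1000)      # hypothetical true scores

# Random error: zero-mean noise that differs on every measurement occasion
random_error = rng.normal(0, 5, size=true_scores.shape)
# Systematic error: a constant bias, e.g. from a miscalibrated instrument
systematic_bias = 3.0

observed = true_scores + random_error + systematic_bias

print(np.mean(observed - true_scores))   # ~3.0: the bias does not average away
print(np.std(observed - true_scores))    # ~5.0: spread comes from the random error
```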

CTST (Classical True Score Theory)
- X = T + E
  - Recognizes only two sources of variance (true score and error):
    - test-retest (stability)
    - alternate forms (equivalence in item sampling)
    - test-retest with alternate forms (stability & equivalence, but these are confounded)
  - Cannot adequately estimate individual sources of error influencing a measurement
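As a rough illustration (assumed variances, not real data), the simulation below generates observed scores as X = T + E and shows that the correlation between two parallel administrations approximates the classical reliability ratio var(T) / var(X).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
var_t, var_e = 100.0, 25.0                 # assumed true-score and error variances

T = rng.normal(0, np.sqrt(var_t), n)       # true scores
X1 = T + rng.normal(0, np.sqrt(var_e), n)  # administration 1
X2 = T + rng.normal(0, np.sqrt(var_e), n)  # administration 2 (parallel form)

print(var_t / (var_t + var_e))             # theoretical reliability: 0.80
print(np.corrcoef(X1, X2)[0, 1])           # test-retest estimate, close to 0.80
```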

ICC (Intraclass Correlation)
- Uses ANOVA to partition variance into between-subjects and within-subjects components
  - Has some ability to accommodate multiple sources of variance
  - Does not provide an integrated approach to estimating reliability under multiple conditions
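As a sketch of the ANOVA partition behind the intraclass correlation, the function below computes a one-way random-effects ICC (ICC(1,1) in Shrout-Fleiss notation) from a small made-up subjects x raters matrix; it is meant only to show the between/within split, not to stand in for a full ICC implementation.

```python
import numpy as np

def icc_oneway(scores):
    """scores: 2-D array, rows = subjects, columns = repeated ratings of each subject."""
    n, k = scores.shape
    grand = scores.mean()
    subj_means = scores.mean(axis=1)

    ss_between = k * np.sum((subj_means - grand) ** 2)        # between-subjects SS
    ss_within = np.sum((scores - subj_means[:, None]) ** 2)   # within-subjects SS

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    # ICC(1,1): proportion of total variance attributable to subjects
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# hypothetical data: 5 subjects each scored by 3 raters
scores = np.array([[9., 8., 9.],
                   [6., 5., 7.],
                   [8., 8., 8.],
                   [4., 5., 4.],
                   [7., 6., 7.]])
print(icc_oneway(scores))
```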

Generalizability Theory
The Dependability of Behavioral Measurements (1972), Cronbach, Gleser, Nanda, & Rajaratnam

Dependability
The accuracy of generalizing from a person's observed score on a measure to the average score that person would have received under all possible testing conditions the tester would be willing to accept.

The Decision Maker
- The score on which the decision is to be based is only one of many scores that might serve the same purpose. The decision maker is almost never interested in the response given at the particular moment of testing.
- Ideally, the decision should be based on that person's mean score over all possible measurement occasions.

Universe of Generalization
- Definition & establishment of the universe of admissible observations:
  - observations that the decision maker is willing to treat as interchangeable
  - all sources of influence acting on the measurement of the trait under study
- What are the sources of ERROR influencing your measurement?

Generalizability Issues
- Facet of Generalization: raters, trials, days, clinics, therapists
- Facet of Determination: usually people, but can vary (e.g., raters)

Types of Studies
- Generalizability Study (G-Study)
- Decision Study (D-Study)

G-Study
- Purpose: to anticipate the multiple uses of a measurement and to provide as much information as possible about the sources of variation in the measurement.
- The G-Study should attempt to identify and incorporate into its design as many potential sources of variation as possible.

D-Study
- Makes use of the information provided by the G-Study to design the best possible application of the measurement for a particular purpose.
- Planning a D-Study:
  - defines the Universe of Generalization
  - specifies the proposed interpretation of the measurement
  - uses G-Study information to evaluate the effectiveness of alternative designs for minimizing error and maximizing reliability

Design Considerations
- Fixed Facets
- Random Facets

Fixed Facet
- When the levels of the facet exhaust all possible conditions in the universe to which the investigator wants to generalize.
- When the levels of the facet represent a convenient sub-sample of all possible conditions in the universe.

Random Facets
- When it is assumed that the levels of the facet represent a random sample of all possible levels described by the facet.
- If you are willing to EXCHANGE the conditions (levels) under study for any other set of conditions of the same size from the universe.

Types of Decisions
- Relative
  - establish a rank order of individuals (or groups)
  - the comparison of a subject's performance against others in the group
- Absolute
  - index an individual's (or group's) absolute level of measurement
  - measurement results are to be made independent of the performance of others in the group

Statistical Modeling
- ANOVA: just as ANOVA partitions a dependent variable into effects for the independent variables (main effects & interactions), G-theory uses ANOVA to partition an individual's measurement score into an effect for the universe score and an effect for each source of error and their interactions in the design.

Statistical Modeling (continued)
- In ANOVA we are driven to test specific hypotheses about our independent variables, and so we seek out the F statistic and p-value.
- In G-theory we use ANOVA to partition the different sources of variance and then to estimate their amount (the variance components).

One-Facet Design
- 4 sources of variability:
  - systematic differences among subjects (the object of measurement)
  - systematic differences among raters (occasions, items)
  - subjects x raters interaction
  - random error, confounded with the interaction
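A sketch of how these four sources translate into variance-component estimates: for a one-facet (subjects x raters) design with one score per cell, the components are solved from the two-way ANOVA mean squares. The data are invented, and negative estimates would normally be set to zero.

```python
import numpy as np

def g_study_one_facet(X):
    """X: subjects x raters score matrix, one observation per cell."""
    n_s, n_r = X.shape
    grand = X.mean()
    m_s = X.mean(axis=1)                                   # subject means
    m_r = X.mean(axis=0)                                   # rater means

    ms_s = n_r * np.sum((m_s - grand) ** 2) / (n_s - 1)
    ms_r = n_s * np.sum((m_r - grand) ** 2) / (n_r - 1)
    resid = X - m_s[:, None] - m_r[None, :] + grand
    ms_sr_e = np.sum(resid ** 2) / ((n_s - 1) * (n_r - 1))

    return {
        "subjects": (ms_s - ms_sr_e) / n_r,    # object of measurement
        "raters": (ms_r - ms_sr_e) / n_s,      # facet of generalization
        "s x r, e": ms_sr_e,                   # interaction confounded with error
    }

X = np.array([[9., 8., 9.],
              [6., 5., 7.],
              [8., 8., 8.],
              [4., 5., 4.]])
print(g_study_one_facet(X))
```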

Two-Facet Design: Components of Variance
- Example of a fully crossed two-facet design (Kroll et al.)
- Seven sources of variance are estimated:
  - subjects (s)
  - raters (r)
  - observations (o)
  - s x r
  - s x o
  - r x o
  - s x r x o, confounded with random error (s x r x o,e)
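The sketch below assumes a fully crossed s x r x o design with one score per cell and estimates the seven components from the three-way ANOVA mean squares (the expected-mean-squares method). The data are simulated placeholders, not the Kroll et al. values, and negative estimates would normally be truncated to zero.

```python
import numpy as np

def g_study_two_facet(X):
    """X: array of shape (subjects, raters, observations), one score per cell."""
    n_s, n_r, n_o = X.shape
    m = X.mean()
    m_s = X.mean(axis=(1, 2)); m_r = X.mean(axis=(0, 2)); m_o = X.mean(axis=(0, 1))
    m_sr = X.mean(axis=2); m_so = X.mean(axis=1); m_ro = X.mean(axis=0)

    # mean squares for the three main effects
    ms_s = n_r * n_o * np.sum((m_s - m) ** 2) / (n_s - 1)
    ms_r = n_s * n_o * np.sum((m_r - m) ** 2) / (n_r - 1)
    ms_o = n_s * n_r * np.sum((m_o - m) ** 2) / (n_o - 1)
    # mean squares for the two-way interactions
    ms_sr = n_o * np.sum((m_sr - m_s[:, None] - m_r[None, :] + m) ** 2) / ((n_s - 1) * (n_r - 1))
    ms_so = n_r * np.sum((m_so - m_s[:, None] - m_o[None, :] + m) ** 2) / ((n_s - 1) * (n_o - 1))
    ms_ro = n_s * np.sum((m_ro - m_r[:, None] - m_o[None, :] + m) ** 2) / ((n_r - 1) * (n_o - 1))
    # three-way interaction confounded with random error
    resid = (X - m_sr[:, :, None] - m_so[:, None, :] - m_ro[None, :, :]
             + m_s[:, None, None] + m_r[None, :, None] + m_o[None, None, :] - m)
    ms_sro_e = np.sum(resid ** 2) / ((n_s - 1) * (n_r - 1) * (n_o - 1))

    # solve the expected-mean-squares equations for the seven variance components
    return {
        "sro,e": ms_sro_e,
        "sr": (ms_sr - ms_sro_e) / n_o,
        "so": (ms_so - ms_sro_e) / n_r,
        "ro": (ms_ro - ms_sro_e) / n_s,
        "s": (ms_s - ms_sr - ms_so + ms_sro_e) / (n_r * n_o),
        "r": (ms_r - ms_sr - ms_ro + ms_sro_e) / (n_s * n_o),
        "o": (ms_o - ms_so - ms_ro + ms_sro_e) / (n_s * n_r),
    }

# simulated placeholder data: 30 subjects, 3 raters, 2 observations per rater
rng = np.random.default_rng(2)
X = (rng.normal(0, 3, (30, 1, 1))      # subject effects
     + rng.normal(0, 1, (1, 3, 1))     # rater effects
     + rng.normal(0, 1, (30, 3, 2)))   # everything else, lumped as noise
print(g_study_two_facet(X))
```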

Variance Components
[Venn diagram: Subjects (s), Raters (r), and Observations (o), with their interactions s x r, s x o, r x o, and s x r x o + error]

Relative Error (Facet of Determination: Subjects)

σ²_rel = σ²_sr / n_r + σ²_so / n_o + σ²_sro,e / (n_r · n_o)

Absolute Error (Facet of Determination: Subjects)

σ²_abs = σ²_r / n_r + σ²_o / n_o + σ²_sr / n_r + σ²_so / n_o + σ²_ro / (n_r · n_o) + σ²_sro,e / (n_r · n_o)
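Plugging hypothetical variance components into these two formulas (all numbers are placeholders), the error variances are simple weighted sums; σ²_rel uses only the components that interact with subjects, while σ²_abs also adds the main effects of the facets.

```python
# placeholder G-study variance components
var = {"s": 4.0, "r": 0.5, "o": 0.3,
       "sr": 0.6, "so": 0.4, "ro": 0.2, "sro,e": 1.0}
n_r, n_o = 2, 3     # D-study sample sizes for raters and observations

var_rel = var["sr"] / n_r + var["so"] / n_o + var["sro,e"] / (n_r * n_o)

var_abs = (var["r"] / n_r + var["o"] / n_o
           + var["sr"] / n_r + var["so"] / n_o
           + var["ro"] / (n_r * n_o) + var["sro,e"] / (n_r * n_o))

print(var_rel, var_abs)
```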

Generalizability Coefficients (a.k.a. Reliability Coefficients)

Absolute generalizability (dependability) coefficient for subjects:
Φ = σ²_s / (σ²_s + σ²_abs)

Relative generalizability coefficient for subjects:
Eρ² = σ²_s / (σ²_s + σ²_rel)
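A small D-study sketch with the same placeholder components: for each candidate number of raters and observations it computes the relative coefficient Eρ² and the absolute coefficient Φ, which is how G-study output is typically used to compare alternative measurement designs. Values are illustrative only.

```python
# placeholder variance components from a hypothetical G-study
var = {"s": 4.0, "r": 0.5, "o": 0.3,
       "sr": 0.6, "so": 0.4, "ro": 0.2, "sro,e": 1.0}

def coefficients(n_r, n_o):
    rel = var["sr"] / n_r + var["so"] / n_o + var["sro,e"] / (n_r * n_o)
    abs_err = rel + var["r"] / n_r + var["o"] / n_o + var["ro"] / (n_r * n_o)
    e_rho2 = var["s"] / (var["s"] + rel)      # relative (generalizability) coefficient
    phi = var["s"] / (var["s"] + abs_err)     # absolute (dependability) coefficient
    return e_rho2, phi

for n_r in (1, 2, 3):
    for n_o in (1, 2, 3):
        e_rho2, phi = coefficients(n_r, n_o)
        print(f"n_r={n_r}  n_o={n_o}  Erho2={e_rho2:.3f}  Phi={phi:.3f}")
```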

D-Study: consider raters as fixed; no generalization is made to other raters.
- Variation in s is no longer affected by raters. Two options:
  a) average over the levels of the fixed effect
  b) analyze the fixed-effect levels separately
- Considering option (a), averaging over raters yields an s x o design with adjusted components:

σ²_s* = σ²_s + σ²_sr / n_r
σ²_o* = σ²_o + σ²_ro / n_r
σ²_so,e* = σ²_so + σ²_sro,e / n_r

Reduced s x o design (raters fixed), using the adjusted components:

σ²_rel = σ²_so,e* / n_o
σ²_abs = σ²_o* / n_o + σ²_so,e* / n_o
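A sketch of option (a) with the same placeholder components: the fixed rater facet is averaged over, its interactions fold into the remaining components, and the error variances and coefficients are then those of the reduced s x o design.

```python
# placeholder G-study components; raters treated as fixed with n_r levels
var = {"s": 4.0, "r": 0.5, "o": 0.3,
       "sr": 0.6, "so": 0.4, "ro": 0.2, "sro,e": 1.0}
n_r, n_o = 2, 3

# averaging over the fixed rater facet folds its interactions into the other components
var_s_star = var["s"] + var["sr"] / n_r
var_o_star = var["o"] + var["ro"] / n_r
var_so_e_star = var["so"] + var["sro,e"] / n_r

# error variances of the reduced s x o design
var_rel = var_so_e_star / n_o
var_abs = var_o_star / n_o + var_so_e_star / n_o

print(var_s_star / (var_s_star + var_rel))   # relative coefficient with raters fixed
print(var_s_star / (var_s_star + var_abs))   # absolute coefficient with raters fixed
```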