Can you do it again? Reliability and Other Desired Characteristics (Linn and Gronlund, Chap. 5)

Let’s take a quiz…

In a nutshell… Reliability is the consistency of a measurement. Today and next class we will look at:
- 6 ways to estimate reliability
- The standard error of measurement
- What influences reliability
- Cut scores
- One slide on usability and practical considerations

Reliability via a simple mathematical model: X = T + E
- X = Obtained or observed score (fallible)
- T = True score (reflects stable characteristics)
- E = Measurement error (reflects random error)
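The X = T + E model can be made concrete with a small simulation (a sketch, not from the text; the true-score and error standard deviations are invented). The reliability coefficient is the share of observed-score variance that comes from true scores.

```python
import random

random.seed(0)

# Classical test theory: each observed score X is a stable true score T
# plus random error E.  Reliability is the proportion of observed-score
# variance attributable to true scores: var(T) / var(X).
n = 10_000
T = [random.gauss(100, 15) for _ in range(n)]  # true scores (SD = 15)
E = [random.gauss(0, 5) for _ in range(n)]     # random error (SD = 5)
X = [t + e for t, e in zip(T, E)]              # observed scores

def variance(scores):
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)

reliability = variance(T) / variance(X)
print(round(reliability, 2))  # close to 15**2 / (15**2 + 5**2) = 0.90
```

Shrinking the error SD toward zero drives the ratio toward 1.0, which is exactly the "less random error, more confidence in scores" point below.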

Some initial thoughts… All measurement is susceptible to error. Any factor that introduces error into the measurement process affects reliability. To increase our confidence in scores, we try to detect, understand, and minimize random measurement error.

More initial thoughts… Just like validity, when talking about reliability one talks about the scores, not the test or assessment. Reliability is necessary for validity, but not sufficient (let me explain…). One must have reliability to have validity. (Again, validity requires consistency.)

Error from Content Sampling… Many theorists believe the major source of random measurement error is “Content Sampling.” Since tests represent performance on only a sample of behavior, the adequacy of that sample is very important.

And from temporal instability… Random measurement error can also be the result of "temporal instability": random and transient influences such as:
- Situation-centered influences (e.g., lighting and noise)
- Person-centered influences (e.g., fatigue, illness)

Or error comes from… Administration errors (e.g., incorrect instructions, inaccurate timing). Scoring errors (e.g., subjective scoring, clerical errors).

Back to correlations… Correlation coefficients range from –1.00 to +1.00. Let's look on p.106.
- Correlation coefficients relate 2 scores from the same group. Ex: number of kids and level of stress.
- Validity coefficients are based on some other criterion; they predict or estimate. Ex: SAT scores correlated with GPA.
- Reliability coefficients are based on results within the same procedure; both scores measure the same construct.
Let's look at 6 options…
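Every reliability coefficient below is just a correlation between two sets of scores from the same group. A minimal Pearson correlation, sketched in Python (the quiz scores are made up):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two score lists for the same group."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# The same five students take the same quiz twice (test-retest):
first  = [8, 6, 9, 5, 7]
second = [7, 6, 9, 4, 8]
print(round(pearson_r(first, second), 2))  # 0.9
```

The six methods differ only in *which* two score sets are correlated, not in the arithmetic.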

Method #1 – Test-Retest
What if I gave you the exact same quiz right now? Again? How many of you asked classes ahead of you in the day what was on a quiz?
This measures the "stability" of the test. The longer one waits before the retest, the more external factors influence the reliability.
Two examples: GRE scores after 5 years; IEP re-evaluations every 3 years.

Method #2 – Equivalent Forms
Just make sure the forms really are equivalent. All standardized tests use this. This measures the "equivalence" of the test. Note that your book uses the phrase "in close succession," meaning the two forms are given at nearly the same time.

Method #3 – Test-Retest with Equivalent Forms
One gets both stability and equivalence. How many of you took the SAT or ACT more than once? Did you improve? The best method for determining reliability is test-retest with equivalent forms.

Method #4 – Split-Halves
This you can calculate with only 1 administration of the test. Let's do it to your quiz. Use the Spearman-Brown formula: reliability = 2(r of the halves) / (1 + r). Do an example… This reliability coefficient estimates the full-test reliability from a half-test correlation. It is used to measure internal consistency.
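The stepped-up split-half estimate is a one-liner (a sketch; the 0.60 is an invented half-test correlation):

```python
def spearman_brown(r_halves):
    """Step a half-test correlation up to an estimate of full-test reliability."""
    return 2 * r_halves / (1 + r_halves)

# If the two halves of a quiz correlate at 0.60, the estimated
# reliability of the full-length quiz is:
print(round(spearman_brown(0.60), 2))  # 0.75
```

The step-up is needed because each half is only half as long as the real test, and shorter tests are less reliable (see Factor #1 below).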

Method #5 – Coefficient Alpha
Also known as Cronbach's alpha; with dichotomous (right/wrong) items it reduces to the Kuder-Richardson formula, KR-20. It, too, is calculated with only one administration of the test and measures internal consistency. Technically, it is the average of all possible split-half coefficients. It should be used when the test is specific to a single topic.
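Coefficient alpha can likewise be computed from one administration. A sketch (the item scores are invented; with 0/1 items like these, this is exactly KR-20):

```python
def cronbach_alpha(items):
    """Coefficient alpha.  `items` is a list of item-score lists,
    one inner list per item, all scored for the same examinees."""
    k = len(items)
    n = len(items[0])

    def variance(scores):
        mean = sum(scores) / n
        return sum((s - mean) ** 2 for s in scores) / (n - 1)

    totals = [sum(item[i] for item in items) for i in range(n)]
    item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Three dichotomous items answered by six examinees:
items = [
    [1, 1, 0, 1, 0, 1],
    [1, 0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0, 0],
]
print(round(cronbach_alpha(items), 2))
```

When all items rise and fall together (a homogeneous, single-topic test), total-score variance dwarfs the summed item variances and alpha approaches 1.0.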

Method #6 – Interrater Consistency
Think of the Russian judge in Olympic figure skating. How do they handle this now? See the table on p.113; the table on p.114 is more likely. Raters are trained using rubrics, etc… Back to figure skating…
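One common index of interrater consistency (not named on the slide) is Cohen's kappa, which corrects raw agreement for the agreement two raters would reach by chance. A sketch with invented ratings:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater1)
    # Observed proportion of agreement:
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance, from each rater's category frequencies:
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[cat] * c2[cat] for cat in c1) / n ** 2
    return (observed - expected) / (1 - expected)

# Two judges score four performances:
judge1 = ["pass", "pass", "fail", "fail"]
judge2 = ["pass", "pass", "fail", "pass"]
print(cohens_kappa(judge1, judge2))  # 0.5
```

A biased Russian judge shows up here as a low kappa against the other judges; rubric training is how you push kappa up.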

In summing up these methods… Test-retest with equivalent forms is the best for determining reliability (see p.115). Each method can be used for specific types of assessments; it is the assessment that determines which type of reliability you will report.

Standard Error of Measurement
You scored x and you scored y; so what did you really score? Scores vary; the standard error of measurement (SEM) estimates how much. Computers make this somewhat doable. Let me show the bell curve and explain the standard error of measurement (it's kind of like a standard deviation…). Use IQ for an example…

More on SEM
A high reliability coefficient gives a small SEM, and vice versa. Statistics can calculate a table… p.122, but really see p.117. If SEM = 9.3, John scores a 722 and Mary a 732; are Mary and John really different? Finally, two great things:
- The SEM is expressed in the same units as the measurement itself.
- The SEM is normally consistent from group to group on the same assessment.
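The standard formula (not spelled out on the slide) is SEM = SD × √(1 − reliability). A sketch using the IQ example; the reliability of 0.90 is an assumed value:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# IQ example: SD = 15; a reliability of 0.90 is assumed here.
print(round(sem(15, 0.90), 2))  # 4.74

# Roughly 68% of the time the true score lies within 1 SEM of the
# obtained score.  With SEM = 9.3, John's 722 and Mary's 732 have
# overlapping bands, so they may not really differ.
```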

Factors that influence reliability
Have you ever taken a test in a hot room? Hungry? Tired? Here are three factors the book deals with.

Factor #1: Number of Items or Assessment Tasks
Generally, the more items, the higher the reliability will be. But adding 10 very hard or very easy questions would not increase reliability at all, because everyone answers them the same way and they add nothing to the spread of scores…
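The general Spearman-Brown "prophecy" formula predicts the effect of lengthening a test, assuming the added items behave like the existing ones (a sketch; the 0.70 starting reliability is invented):

```python
def prophecy(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor,
    assuming the new items are comparable to the old ones."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Doubling a test whose reliability is 0.70:
print(round(prophecy(0.70, 2), 3))  # 0.824
```

This is why the "more items" rule has a caveat: the formula assumes comparable items, and all-hard or all-easy filler items violate that assumption.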

Factor #2: Spread of Scores
The larger the spread, the higher the reliability.
…or 0–200? My physics story.

Factor #3: Objectivity
Back to MC tests. Back to the Russian judges.

Fixed Standards (Cut Scores)
Why my cousin-in-law says we test… Do they meet a standard? (GWB.) Who cares about the score; can they do it? Back to a rater table… sort of… p.126. Cut scores are dangerous: where do you put the line? The SEM?

Now for the practical…
- Are the tests easy to administer?
- Do you have time constraints?
- Can you easily score these?
- Do these scores apply to what you are after?
- Can you get an equivalent measure?
- And it's all about money.