LECTURE 06B BEGINS HERE. THIS IS WHERE MATERIAL FOR EXAM 3 BEGINS.

RIGOR OF ASSESSMENT IN NORM-REFERENCED TESTING (HUTCHINSON, 1996)

RIGOR OF ASSESSMENT (PART OF ASSESSING PSYCHOMETRIC ADEQUACY)
• Validity: Extent to which a procedure actually measures what it claims to measure
• Reliability: Consistency of response/performance elicitation
Remember: Both can be applied to norm-referenced and criterion-referenced testing

RIGOR OF ASSESSMENT IN NORM-REFERENCED TESTING: SUBTOPIC = VALIDITY

ASSESSING VALIDITY IN NORM-REFERENCED TESTING
Definition of and evidence for validity
• Extent to which a procedure actually measures what it is supposed to measure
• Defined relative to a specific purpose
  – E.g. valid for screening, but not valid for Tx planning
• Issue of the quality and extent of available evidence
  – Logical analysis
  – Empirical data

TYPES OF VALIDITY (H&P, 2012)
Construct validity: “Degree to which a test measures the theoretical construct it is intended to measure”

TYPES OF VALIDITY (H&P, 2012)
Content validity: Degree to which the content of a test is consistent with the purpose of the test
• appropriateness of items
• completeness of the item sample
• the way in which the items assess the content
Cf. face validity, which has only the surface appearance of content validity

TYPES OF VALIDITY (H&P, 2012)
Criterion-related validity: Degree to which test performance predicts performance on other (external) criteria
• subtype = predictive: ability to predict the score on a future test in a related area
• subtype = concurrent: compared to present performance on other tests in a related area

SOURCES OF EVIDENCE OF VALIDITY (HUTCHINSON, 1996)
Evidence used to support the argument that a test is valid for its stated purpose
First source category = Logical evidence
• Test’s purpose well stated
• Construct (theory/framework) well defined
• Good rationale for the content of the test, including documentation that both easy and hard test items have been included so that the test can discriminate disorder
Key concept: Are the test authors’ logically-based arguments convincing?

SOURCES OF EVIDENCE OF VALIDITY (HUTCHINSON, 1996)
Evidence used to support the argument that a test is valid for its stated purpose
Second source category = Empirical evidence
• Correlation (r), a measure of relationship between ____________________ and _____________________
• Good prediction of group membership with measures of __________________ and _____________________
• Pattern of relationships among sub-test results should match the pattern predicted by the construct
  – Via correlation
  – Via factor analysis
Key concept: Are the test authors’ empirically-based arguments convincing?

Empirical evidence for validity, using correlation…
Measure of relationship between _____________ and ____________
What are the labels on the axes when one uses correlation as evidence for validity?

Empirical evidence for validity, using correlation…
Measure of relationship between _____________ and ____________
Is the test authors’ empirical argument convincing? What evidence is given to describe the relationship between the test of interest and others considered to be similar?
Note that valid tests should also have low correlations with tests measuring different parameters.
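As a minimal sketch of this kind of correlational evidence, the Python example below uses invented scores (every test name and value is hypothetical, not from any actual validation study): the new test should correlate highly with an established test of the same construct and weakly with a test of a different construct.

```python
import numpy as np

# Hypothetical scores for 10 examinees on three tests. All values are
# invented for illustration.
new_test     = np.array([12, 15,  9, 20, 18, 11, 16, 14, 19, 10])
established  = np.array([14, 16, 10, 21, 17, 12, 18, 13, 20, 11])  # same construct
motor_skills = np.array([30, 35, 28, 33, 29, 36, 27, 31, 34, 32])  # different construct

# Convergent evidence: high r with a test of the SAME construct.
r_convergent = np.corrcoef(new_test, established)[0, 1]

# Discriminant evidence: low r with a test of a DIFFERENT construct.
r_discriminant = np.corrcoef(new_test, motor_skills)[0, 1]

print(f"r with similar test:   {r_convergent:.2f}")    # expect high (~.96 here)
print(f"r with different test: {r_discriminant:.2f}")  # expect low  (~.10 here)
```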

Empirical evidence for validity, using measures of sensitivity and specificity…
Sensitivity: the test’s accuracy in correctly identifying the clients WITH the disorder
Specificity: the test’s accuracy in correctly identifying the clients WITHOUT the disorder
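To make the arithmetic concrete, here is a minimal Python sketch of how sensitivity and specificity are computed from a 2x2 classification table; all counts are invented for illustration.

```python
def sensitivity_specificity(true_positive, false_negative,
                            true_negative, false_positive):
    """Accuracy of a test at classifying clients, from a 2x2 table."""
    # Sensitivity: of the clients WITH the disorder, the proportion the test flags.
    sensitivity = true_positive / (true_positive + false_negative)
    # Specificity: of the clients WITHOUT the disorder, the proportion the test clears.
    specificity = true_negative / (true_negative + false_positive)
    return sensitivity, specificity

# Hypothetical validation sample: 50 disordered and 50 typical clients.
sens, spec = sensitivity_specificity(true_positive=45, false_negative=5,
                                     true_negative=47, false_positive=3)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")  # 0.90, 0.94
```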

Empirical evidence for validity, using measures of sensitivity and specificity… Let’s “visualize” these concepts

Empirical evidence for validity, using measures of sensitivity and specificity…
In the test manual, we’re looking for reports of high specificity and high sensitivity.
Is the test authors’ empirical argument convincing? What evidence is given to support the accuracy of this test in classifying subjects into already-established performance categories?
Do you see how this type of evidence for validity is directly related to the purpose of norm-referenced tests?

Empirical evidence for validity, using patterns of correlations among subtests, to see if the patterns fit what the construct would predict (construct in this example = what makes up writing ability?)
Is the test authors’ empirical argument convincing? What statistical data support the relationship among separate components of the test or their relationship with the overall construct?

Empirical evidence for validity, using factor analysis of sub-test scores, e.g. to see if patterns of factor loadings follow what the construct of writing ability would predict:
I: “Writer’s development of the work”
II: “Writer’s fluency with mechanics”
III: “Sentence structure”
IV: “Writer’s orientation to the reader”
Is the test authors’ empirical argument convincing?
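For illustration only, here is a hedged Python sketch of this kind of analysis using scikit-learn’s FactorAnalysis on invented subtest data; a real test manual would report loadings estimated from the normative sample, often with a different factor-analytic method.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Invented data: 200 examinees x 8 writing subtests, generated so that
# two latent abilities drive the scores (so two factors should emerge).
ability = rng.normal(size=(200, 2))
true_loadings = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
                          [0.0, 1.0], [0.2, 0.8], [0.5, 0.5], [0.6, 0.4]])
scores = ability @ true_loadings.T + rng.normal(scale=0.3, size=(200, 8))

fa = FactorAnalysis(n_components=2).fit(scores)
# Each row is one factor; subtests with high loadings on the same factor
# cluster together. (Recovered loadings are identified only up to
# rotation and sign, so they need not exactly match true_loadings.)
print(np.round(fa.components_, 2))
```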

RIGOR OF ASSESSMENT IN NORM-REFERENCED TESTING: SUBTOPIC = RELIABILITY

Remember….
• Reliability: Consistency of response/performance elicitation (includes consistency of scoring and measurement)

TYPES OF RELIABILITY, AND EVIDENCE FOR THEM
Agreement OR inter-rater reliability
• Correlation of scores of two raters (good = )*
• Item by item or total score
Stability OR test-retest reliability
• Correlation of scores from two separate administrations of the test to the same person, across testees (good = )*
(continued….)
Can you see why the authors should optimally provide reliability scores for: 1) each age group separately? 2) both normal and disordered groups?

TYPES OF RELIABILITY, AND EVIDENCE FOR THEM (CONT.)
Internal consistency OR split-half reliability
• Split the test into two halves and obtain the correlation between the two sets of scores: measured as r
  – E.g. split top from bottom
  – E.g. split even items from odd items
• Or: assign test items to the two halves at random and obtain r; then do this again, and again, and again…
• “Average” all the r’s = Cronbach’s coefficient alpha
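The slide describes these procedures conceptually; below is a minimal Python sketch of the two standard computational shortcuts. Two assumptions to flag: the split-half r is “stepped up” with the Spearman-Brown formula (standard practice for correcting the halved test length, though not mentioned on the slide), and Cronbach’s alpha is computed from its closed-form item-variance formula rather than by literally averaging many random splits. All data are invented.

```python
import numpy as np

def split_half_reliability(item_scores):
    """Odd-even split-half r, stepped up with the Spearman-Brown formula."""
    odd  = item_scores[:, 0::2].sum(axis=1)  # total score on odd-numbered items
    even = item_scores[:, 1::2].sum(axis=1)  # total score on even-numbered items
    r_half = np.corrcoef(odd, even)[0, 1]
    return 2 * r_half / (1 + r_half)         # correct for the halved test length

def cronbach_alpha(item_scores):
    """Closed-form alpha: (k/(k-1)) * (1 - sum of item variances / total variance)."""
    k = item_scores.shape[1]
    item_vars = item_scores.var(axis=0, ddof=1)
    total_var = item_scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Invented data: 30 examinees x 20 items scored 0/1, driven by one ability.
rng = np.random.default_rng(1)
ability = rng.normal(size=(30, 1))
items = (ability + rng.normal(size=(30, 20)) > 0).astype(float)

print(f"split-half r = {split_half_reliability(items):.2f}")
print(f"alpha        = {cronbach_alpha(items):.2f}")
```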

Empirical evidence for reliability, using patterns of correlations…
What are the labels on the axes when one uses correlation as evidence for
• inter-rater reliability?
• test/retest reliability?
• split-half reliability?

Transition slide from the topic of reliability to the topic of Standard Error of Measurement (SEM)
Think: Even when a test is very carefully designed and reliable (consistent) in its ability to measure a construct (e.g. narrative comprehension), a client’s responses to test items may not always reflect a true picture of his underlying ability (e.g. his true ability to understand narrative passages). Error in measurement cannot be avoided, especially when measuring human performance.
Even with the most reliable test, what are some of the other factors that affect a client’s performance on a test, on a given day?
Observed score = the actual raw score that a test-taker earns
True score = the hypothetical “ideal” score that the person would have earned if there were no error in measurement

STANDARD ERROR OF MEASUREMENT (SEM)
If a person took a test 100 times, their scores:
1) would tend to fall near some central score (represented by a measure of central tendency, such as the average), e.g. 42
2) would deviate from the central score (due to error of measurement) in a predictable way, with most of them not too far from the center
• The “average deviation” (or “average distance”) from the central score is known as the standard deviation, e.g. 2
• This standard deviation (“average deviation”) due to error of measurement is called the standard error of measurement (SEM), e.g. 2 away from 42 (either above or below)
[Figure: histogram of the 100 scores; y-axis = number of times the person earned each score (few to many), x-axis = score]
Can you fill in the values that would be two SEM away from the average?

STANDARD ERROR OF MEASUREMENT (SEM)
Now, test-makers don’t really calculate SEM by giving people a test 100 times! They calculate SEM using:
1) estimates of the test’s reliability (at least one of the three types)
2) the distribution of scores earned by the normative sample
3) the way in which reliability varies at different score levels
SO, clinicians don’t calculate SEM. SEM is provided in the test manual to help guide us in our interpretation of a client’s score.
Can you fill in the values that would be two SEM away from the average?
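The slide doesn’t give the formula, but the standard psychometric relationship connecting a reliability estimate and the normative score distribution to SEM is SEM = SD × √(1 − reliability). A minimal sketch with hypothetical numbers:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Standard formula: SEM = SD * sqrt(1 - reliability), where SD is the
    normative sample's standard deviation and reliability is one of the
    test's reliability coefficients (e.g. test-retest r)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical test: normative SD = 4.5, test-retest reliability = .80.
sem = standard_error_of_measurement(sd=4.5, reliability=0.80)
print(f"SEM = {sem:.2f}")  # ~2.01, close to the SEM of 2 in the running example
```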

STANDARD ERROR OF MEASUREMENT (SEM)
68% of the scores would be predicted to fall within one SEM of the average; e.g. we could predict that 68/100 would fall between 40 and 44
95% of the scores would be predicted to fall within two SEMs of the average; e.g. we could predict that 95/100 would fall between ____ and ____

SEM AND ITS RELATIONSHIP TO CONFIDENCE INTERVALS (See Hutchinson and H&P readings)
Observed score: the actual raw score that the test-taker earns
True score: the score that the person would have earned if there were no measurement error

SEM AND ITS RELATIONSHIP TO CONFIDENCE INTERVALS (See Hutchinson and H&P readings)
+1 SEM to −1 SEM = 68% confidence interval. We can have 68% confidence that the client’s true score falls somewhere in this range.
+2 SEM to −2 SEM = 95% confidence interval. We can have 95% confidence that the client’s true score falls somewhere in this range.

INTERPRETATION OF CONFIDENCE INTERVAL RELATIVE TO CUT-OFF SCORE
How do we interpret performance when the confidence interval:
a) is completely above the cut-off score?
b) is completely below the cut-off score?
c) straddles the cut-off score?
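As a sketch of the decision logic (the interpretive wording below is illustrative, not taken from the readings), the three cases can be expressed as a small Python function that builds the confidence interval and compares it to the cut-off:

```python
def interpret_against_cutoff(observed, sem, cutoff, n_sem=2):
    """Build the confidence interval (observed +/- n_sem * SEM) and report
    where it falls relative to the cut-off score. n_sem=2 gives a ~95% CI."""
    low, high = observed - n_sem * sem, observed + n_sem * sem
    if low > cutoff:
        verdict = "entire interval above cut-off"
    elif high < cutoff:
        verdict = "entire interval below cut-off"
    else:
        verdict = "interval straddles cut-off: classification is uncertain"
    return (low, high), verdict

# Hypothetical client: observed score 42, SEM 2, cut-off 39.
print(interpret_against_cutoff(observed=42, sem=2, cutoff=39))
# ((38, 46), 'interval straddles cut-off: classification is uncertain')
```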

LECTURE 06B ENDS HERE