Chapter 4. Validity: Does the test cover what we are told (or believe) it covers?


Chapter 4. Validity: Does the test cover what we are told (or believe) it covers? To what extent? Is the assessment being used for an appropriate purpose?

Validity Topics:
Definition (usual and refined)
Categories of validity evidence:
A. Face validity
B. Content validity: table of specifications, alignment analysis, opportunity to learn
C. Criterion-related validity
D. Construct validity
E. Consequential validity
Test fairness

Introduction
Without good validity, all else is lost. Validity is the most important characteristic of a test or assessment technique.
Usual definition: the test measures what it purports to measure.
Refined definition: validity concerns the interpretation of a score for a particular purpose or use, because a score may be valid for one use but not for another.
Validity is a matter of degree, not all-or-none. As a practical matter, our concern is to determine the extent; in non-mathematical terms we might say slight, moderate, or considerable.

Some Helpful Terms
Construct: the trait or characteristic that interests us. We might call it a "target," or "what we want to get at." We create a test to "cover" this attribute. Validity addresses how well an assessment technique provides useful information about the construct (the target).
Construct underrepresentation: the test we made does not assess all of the construct; our test misses things we should be assessing.
Construct-irrelevant variance: the test we made assesses things that are not really part of our construct; we are measuring irrelevant material that we don't want. [See the next two slides for illustrations.]

The Construct and Valid Measurement

Varying Degrees of Construct Underrepresentation and Construct Irrelevant Variance

A. Face Validity
Think of the idiom "on the face of it . . ."
A test is said to have face validity if it "looks like" it is going to measure what it is supposed to measure.
Face validity is not empirical; one is saying that the test "appears as if it will work," as opposed to "it has been shown to work."
Face validity is often "created" to influence the opinions of participants who are not experts in testing methodology, e.g., test takers, parents, and politicians.

B. Content Validity
Most used in achievement tests and employment exams.
Meaning: there is a good match between the content of the test and some well-defined domain of knowledge or behavior. Reference to content defines the orientation of the test.
For teachers, this is considered the most important type of validity for your own classroom achievement tests.
Where do we find the "well-defined domain"?
Examination of textbooks in the field, with special attention to the learning objectives at the beginning of each chapter and the terms at the end
Curriculum guides of school districts
Ohio's Academic Content Standards
So, now we have the content topics identified, but what should we actually expect "students to know and be able to do" in relation to these topics? This question deals with "process" or "depth" indicators. How should we make sure we include both the content and the depth expected in our tests?

The Table of Specifications: Building content validity into my own classroom tests
A table of specifications connects the content identified earlier to the mental processes students are expected to employ with that content.
It is a two-way table: one dimension is content; the other is Bloom's taxonomy (from the simplest mental operation to the most complex).
Each test item I create then falls into one cell. By building the table, I can see the relative weight assigned to each cell. Is this what I want? (A minimal sketch appears below.)
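A minimal sketch of how such a two-way tally might be kept; the content topics, Bloom levels, and item counts here are hypothetical, invented only to illustrate the weighting check:

```python
# Hypothetical table of specifications: content topics x Bloom's taxonomy levels.
# Cell values are the number of test items planned for that combination.
bloom_levels = ["Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create"]
table_of_specs = {
    "Photosynthesis": {"Remember": 3, "Understand": 4, "Apply": 2},
    "Cell division":  {"Remember": 2, "Understand": 3, "Analyze": 1},
    "Genetics":       {"Understand": 2, "Apply": 3, "Create": 1},
}

total_items = sum(n for row in table_of_specs.values() for n in row.values())

# Relative weight of each content topic (row).
for topic, cells in table_of_specs.items():
    topic_items = sum(cells.values())
    print(f"{topic:15s} {topic_items:2d} items ({topic_items / total_items:.0%} of the test)")

# Relative weight of each Bloom level (column).
for level in bloom_levels:
    count = sum(cells.get(level, 0) for cells in table_of_specs.values())
    if count:
        print(f"{level:12s} {count:2d} items ({count / total_items:.0%} of the test)")
```

If the resulting weights do not match the emphasis given to each topic and each level of thinking in class, the item plan can be rebalanced before the test is written.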

Alignment Analysis: Checking content validity in existing tests
These steps parallel building your own good test and constructing a table of specifications. There are some things to watch for and consider as you do this:
Be wary of relying on the summary outline provided by the test maker; examine the actual test items.
Match the items on the test with the content you are teaching, and watch for mismatches in both directions: items on the test you are not teaching, and content you are teaching that is not tested.
This matching requires considerable judgment. The test does not have to cover every detail; it can be a representative sample.
If the stakes are high, use a panel of individuals rather than a single judge. (A small sketch of the two-way mismatch check follows.)
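A small sketch of the two-way mismatch check; the topic labels are hypothetical and stand in for the judgment-based matching of items to instruction described above:

```python
# Hypothetical topic labels assigned to the test's items and to the taught units.
tested_topics = {"fractions", "decimals", "percents", "ratios"}
taught_topics = {"fractions", "decimals", "geometry", "percents"}

# Items on the test covering content that was not taught
# (also a fairness concern; see Opportunity to Learn).
not_taught_but_tested = tested_topics - taught_topics

# Content that was taught but never tested
# (the test underrepresents what was actually delivered in class).
taught_but_not_tested = taught_topics - tested_topics

print("Tested but not taught:", sorted(not_taught_but_tested))
print("Taught but not tested:", sorted(taught_but_not_tested))
```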

Opportunity to Learn: But was it taught . . .
An emerging idea related to content validity is a concern called instructional validity. It relates to your behavior as a teacher. The content may be in the book; the content may be in the state standards . . . BUT . . . did you actually teach it? Some teachers skip items of instruction they don't like, don't understand, or don't have time for. If related items appear on a test, this reduces the validity of the test, since the students had no opportunity to learn the knowledge or skill being assessed.

C. Criterion-Related Validity
While the term "test" is used here, also think "measure" or "procedure."
The basic idea: demonstrate the accuracy of a test by comparing it with another test, measure, or procedure that has already been shown to be valid (i.e., a valued criterion).
Two general contexts:
Predictive validity: one measure is taken now, the other later, and the later measure is known to be valid. This approach lets me show my current test is valid by comparing it to a future valid measure. For example, a behind-the-wheel driving test has been shown to be an accurate test of driving skill. By comparing scores on a written rules-of-the-road test with scores from the driving test, the written test can be validated using a criterion-related strategy.
Concurrent validity: both measures are taken at about the same time. This approach lets me show my test is valid by comparing it with an already valid test. I can do this if I can show my test varies directly with a measure of the same construct, or inversely with a measure of an opposite construct.
The computed statistic in both cases is r (which we now call a validity coefficient), and it has all the characteristics we have already discussed for correlation coefficients in general. (A brief computational sketch follows.)
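A brief sketch of computing a validity coefficient; the scores below are hypothetical, invented only to show that the coefficient is an ordinary Pearson r between the new test and the criterion:

```python
import numpy as np

# Hypothetical scores on the new (to-be-validated) test and on the criterion
# measure for the same ten examinees.
new_test  = np.array([12, 15,  9, 18, 14, 11, 16, 13, 10, 17])
criterion = np.array([55, 62, 40, 75, 60, 48, 70, 58, 45, 72])

# The validity coefficient is simply the Pearson correlation between the two.
validity_coefficient = np.corrcoef(new_test, criterion)[0, 1]
print(f"validity coefficient r = {validity_coefficient:.2f}")
```

If the criterion scores were collected at about the same time as the new test, this r is read as concurrent validity evidence; if the criterion is measured later, it is read as predictive validity evidence.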

Special Considerations for Interpreting Criterion-Related Validity
Group variability: the greater the variability in the group, the greater the r; a restricted range depresses the validity coefficient.
Reliability-validity relationship: reliability limits validity (made precise below); reliability is a prerequisite for validity.
Validity of the criterion: how good is the criterion? Do you agree with the operational definition of the criterion?
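One standard way to make "reliability limits validity" precise, drawn from classical test theory rather than from this chapter, is the attenuation bound: the observed validity coefficient cannot exceed the square root of the product of the two measures' reliabilities,

$$ r_{xy} \le \sqrt{r_{xx}\, r_{yy}} $$

so, for example, a test with reliability .64 correlated with a perfectly reliable criterion could show a validity coefficient of at most $\sqrt{.64 \times 1.00} = .80$.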

D. Construct Validity
When we ask about a test's construct validity, we are taking a broad view of the test. Does the test adequately measure the underlying, unobserved construct? The question is asked in terms of both convergent validity (are the test's scores related to the behaviors and tests they should be related to?) and divergent validity (are the test's scores unrelated to the behaviors and tests they should be unrelated to?). There is no single measure of construct validity. Construct validity is based on the accumulation of knowledge about the test and its relationship to other tests and behaviors. To establish construct validity, we demonstrate that the measure changes in a logical way when other conditions change. (A small correlational sketch follows.)
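A small sketch of the convergent/divergent logic; the scale names and scores are hypothetical, invented only to show the pattern of correlations one would look for:

```python
import numpy as np

# Hypothetical scores for the same eight examinees.
new_anxiety_scale   = np.array([21, 35, 14, 40, 28, 19, 33, 25])
established_anxiety = np.array([24, 38, 12, 44, 30, 17, 36, 27])  # same construct
vocabulary_test     = np.array([50, 52, 47, 51, 58, 49, 46, 55])  # different construct

# Convergent evidence: expect a substantial correlation with the same-construct measure.
r_convergent = np.corrcoef(new_anxiety_scale, established_anxiety)[0, 1]

# Divergent evidence: expect a much lower correlation with the unrelated measure.
r_divergent = np.corrcoef(new_anxiety_scale, vocabulary_test)[0, 1]

print(f"convergent r = {r_convergent:.2f}, divergent r = {r_divergent:.2f}")
```

A high convergent r together with a low divergent r supports, but by itself does not establish, construct validity; the evidence accumulates across many such comparisons.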

E. Consequential Validity
A recent and controversial entry into the assessment lexicon . . .
Some professionals feel that, in the real world, the consequences that follow from the use of assessments are important indications of validity. Others feel that these consequences are matters of politics and policymaking; important considerations, yes, but not matters of validity.
On which side are we? As educators, we sometimes see the consequences as more important than the technical validity of the test. Judgments based on the assessments we give and use have value implications and social consequences. What is the intended use of these test scores? How are the scores really being used? Does this testing lead to educational benefits? Are there negative spin-offs?

Test Fairness, Test Bias
Test fairness and test bias refer to the same issue with opposite connotations.
Fairness: an assessment or test measures a trait, construct, or target with equal validity for different groups.
Bias: the groups do not differ in their real status on the trait, construct, or target being assessed, yet the test suggests they do.

Methods of Reviewing Fairness
Test companies (look in the test manual to see what a particular company did about test fairness issues on this test):
Panel review - the most "popular" method, but is this just face validity?
Differential item functioning (DIF) - examines subsets of items (see the sketch below)
Criterion-related validity - examines the whole test
Teacher-created assessments (teachers need to be knowledgeable about, and sensitive to, issues of test fairness):
Is there anything about my test that will unfairly advantage or disadvantage a student or group of students?
Is there anything about the mechanics of the test that calls for skills other than those I intend to measure?
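A rough sketch of the DIF idea using the Mantel-Haenszel approach, in which groups are first matched on total score and then compared item by item; the function name and inputs are illustrative, not from the chapter:

```python
import numpy as np

def mantel_haenszel_dif(item_correct, total_score, group):
    """Rough Mantel-Haenszel common odds ratio for one studied item.

    item_correct : 0/1 array, whether each examinee answered the item correctly
    total_score  : matching variable (e.g., total test score) used to form strata
    group        : 0 = reference group, 1 = focal group
    A value near 1.0 suggests little DIF; values far from 1.0 suggest the item
    behaves differently for equally able examinees in the two groups.
    """
    num, den = 0.0, 0.0
    for s in np.unique(total_score):
        in_stratum = total_score == s
        ref = in_stratum & (group == 0)
        foc = in_stratum & (group == 1)
        A = np.sum(item_correct[ref] == 1)   # reference group, correct
        B = np.sum(item_correct[ref] == 0)   # reference group, incorrect
        C = np.sum(item_correct[foc] == 1)   # focal group, correct
        D = np.sum(item_correct[foc] == 0)   # focal group, incorrect
        N = A + B + C + D
        if N > 0:
            num += A * D / N
            den += B * C / N
    return num / den if den > 0 else float("nan")
```

In practice one would group total scores into broader strata (e.g., quintiles) and attach a significance test, but the sketch shows the core logic: compare performance on the item only among examinees of comparable overall ability.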

Practical Advice
For building your own tests, think content validity.
For judging externally prepared achievement tests, start with a clear definition of what is to be covered.
For criterion-related validity, take group variability into account, and think about the validity of the criterion.
For test fairness (bias), distinguish between differences in groups' average scores and differences in the groups' real status on the trait.
For your own assessments, try to eliminate the influence of any factors not related to what you want to measure.

Terms / Concepts to Review and Study on Your Own (1)
alignment analysis
Bloom's taxonomy
concurrent validity
consequential validity
construct
construct-irrelevant variance
construct underrepresentation
construct validity
content validity
criterion-related validity

Terms / Concepts to Review and Study on Your Own (2)
differential item functioning (DIF)
external criterion
face validity
fairness (or its opposite, bias)
instructional validity
opportunity to learn
predictive validity
table of specifications (two-way table)
validity
validity coefficient