Technical Issues: Validity and Reliability

Technical Issues: Two Concerns – Validity and Reliability

Let's turn our attention to the technical issues related to measurement. Two very important concerns are the validity and reliability of the instruments being used.

Data Collection – Quiz 1
Answer the five questions on Quiz 1.

Before we begin our discussion, I'd like to take a few moments to work on an exercise. Use the link on this slide to access Quiz 1. Take this quiz assuming I'm going to use your score as a grade in the class. You can pause the slide show if needed. PAUSE

Data Collection – Quiz 1 Answers
Score your paper using the following key:
1. A
2. B
3. A
4. B
5. B

The answers to Questions 1-5 are A, B, A, B, and B. Score your paper and remember the number of items answered correctly.

Data Collection – Quiz 1
How well did you do?
Should I use this score as a part of your grade?
Does this score indicate your level as a graduate student?
"What we have here is a serious lack of communication!"
Most students object strongly to using this score as part of their grade because it isn't fair.
Most students object strongly to being labeled "bright" or "challenged" on the basis of their grade.
Their reasoning is that the test isn't fair – it doesn't cover material relevant to this course.
Welcome to the technical world of instrumentation.

How well did you do? If you did well, would you mind if I used your score as a part of your grade for EDF 800? If you didn't do well, and most students do not, would you mind if I used your score as a part of your grade for EDF 800? If you did well, can I conclude you are exceptionally bright? If you didn't do well, can I conclude you are quite challenged academically? Most everyone objects to using their score from this quiz – good or bad – for any purpose because it isn't fair. The test simply doesn't cover material appropriate to an introductory educational research course. We've studied absolutely none of the content on this quiz; expecting you to know it just isn't right. Welcome to the technical world of instrumentation.

Technical Issues
Validity – the extent to which interpretations made from a test score are appropriate
Characteristics:
- The most important technical characteristic
- Situation specific
- Does not refer to the instrument but to the interpretations of scores on the instrument
- Best thought of in terms of degree

The formal definition of validity is written on this slide. If you think for a moment, the definition makes a lot of sense. When you give a test to the students in your class, you use the scores to make some decisions about each student. If one student had a very high score, you usually "infer" this is a "good" student. If another student had a very low score, you could "infer" this student was having serious difficulties mastering the material. The question ultimately comes down to whether or not such inferences or decisions are appropriate, meaningful, or useful. The answer depends on two characteristics of the test.

Technical Issues
Validity (continued) – four types
1. Content – to what extent does the test measure what it is supposed to measure
- Item validity
- Sampling validity
- Determined by expert judgment

If your test covered appropriate content for the instruction provided to students, then the extent to which your inferences are appropriate, meaningful, or useful is high. If, like the quiz I gave you, the content is not relevant to the instruction, your inferences are not appropriate, meaningful, or useful to anyone. This is known as content validity and is a fundamental characteristic of any test. Please note that whether a test has evidence of content validity or not, nothing stops someone from using the scores to make decisions. Has anyone ever taken an exam where the professor wrote items that had nothing to do with what was being taught? Did he or she still use your scores in your grades? Was that fair? Appropriate? Meaningful? Useful? I need to caution you about the situation-specific nature of validity evidence. The quiz you took earlier was not content valid for this course, but it was taken from a History of Education exam where every question was appropriate to the instruction. In our case the test was not content valid; in the case of that other course it is 100% content valid.

Technical Issues
Validity (continued)
2. Construct – the extent to which a test measures the construct it represents
- Underlying difficulty defining constructs
- Estimated in many ways
3. Criterion-related
- Predictive – to what extent does the test predict a future performance
- Concurrent – to what extent does the test predict a performance measured at the same time
- Estimated by correlations between two tests

Sometimes the purpose of a test is not to measure specific, concrete content like that we are studying. Often what is being measured is very nebulous or abstract in nature. How would you measure my intelligence? Probably with an intelligence test, but would the test be developed around Binet's conception of intelligence as verbal and mathematical reasoning or Gardner's 8 or 9 – I forget the number – multiple intelligences? Obviously the "tests" would look very different depending on how the researcher interprets the "construct" of intelligence. While closely related to content validity in that we worry about whether the test "measures what it is supposed to measure," construct validity is difficult to estimate. If a test has sufficient evidence to suggest it measures intelligence, my score on that test and your use of it are reasonable. If not, any decision you make on the basis of that score is not appropriate, meaningful, or useful. Many times we find ourselves using test scores to predict a student's performance on some later task. The ACT, SAT, GRE, and MCAT are good examples of such tests. Scores on the ACT or SAT are supposed to predict a student's performance in the freshman year of college. Do they do so well? If so, we can make some decisions about whether or not to admit a student to a university; if not, such decisions are not appropriate, meaningful, or useful.
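
To make the criterion-related idea concrete, here is a minimal Python sketch of a predictive validity coefficient: an ordinary correlation between a predictor test and a later criterion. All scores below are invented for illustration, and statistics.correlation requires Python 3.10 or later.

# Minimal sketch: a predictive validity coefficient is the correlation
# between a test score and a later criterion performance.
# All data below are hypothetical, for illustration only.
from statistics import correlation  # available in Python 3.10+

admission_scores = [1050, 1180, 990, 1320, 1210, 1100, 1400, 950]  # predictor (e.g., SAT)
freshman_gpa     = [2.8, 3.1, 2.5, 3.6, 3.2, 2.9, 3.8, 2.4]        # criterion, a year later

r = correlation(admission_scores, freshman_gpa)
print(f"Predictive validity coefficient: r = {r:.2f}")
# A coefficient near 1 supports using the test to make admission
# decisions; one near 0 means such decisions are not appropriate,
# meaningful, or useful. A concurrent validity estimate is computed the
# same way, with the criterion measured at the same time as the test.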

Technical Issues
Validity (continued)
4. Consequential – to what extent are the consequences that occur from the test harmful
- Estimated by empirical and expert judgment

Factors affecting validity:
- Unclear test directions
- Confusing and ambiguous test items
- Vocabulary that is too difficult for test takers

Consequential validity is a relatively new way to think about validity evidence. As the definition implies, we are interested in consequences of testing that might prove particularly disconcerting for some students. For example, the Louisiana Department of Education mandates that all special-needs students take the LEAP test that corresponds to the grade in which they are enrolled. Often this means a student is taking an exam that is well beyond their ability to read, much less understand. Is this fair to that student? What about non-English-speaking students? Is it fair to give them grade-level tests that are completely dependent on the ability to read English? Welcome to the concerns related to consequential validity. There are many factors that can affect the validity of a test. Can you see how each of the three factors on this slide would have a negative effect on validity?

Technical Issues
Factors affecting validity (continued):
- Overly difficult and complex sentence structure
- Inconsistent and subjective scoring
- Untaught items
- Failure to follow standardized administration procedures
- Cheating by the participants or someone teaching to the test items

How about these factors?

Technical Issues
Reliability – the degree to which a test consistently measures whatever it is measuring
Characteristics:
- Expressed as a coefficient ranging from 0 to 1
- A necessary but not sufficient characteristic of a test

Reliability is the second technical characteristic important to measurement. Reliability is basically the consistency with which we measure. If you took Exam 1 a first time and made a 40, a second time and made a 45, and a third time and made a 43, which score should I use to provide a reliable estimate of your knowledge of the material? There are three perspectives from which reliability is viewed: test reliability, score reliability, and agreement.

Technical Issues
Test reliability
- Stability – consistency over time with the same instrument
  - Test-retest
  - Estimated by a correlation between the two administrations of the same test
- Equivalence – consistency between two parallel tests administered at the same time
  - Parallel forms
  - Estimated by a correlation between the parallel tests

When speaking of test reliability, we estimate the extent to which the results of a test are likely to be the same. An estimate can be calculated using two administrations of the same test. This is known as stability or test-retest reliability. Coefficients close to 1 suggest a test that produces very consistent scores; those close to 0 suggest a lack of consistency. Sometimes we don't want to give one test twice – what a pain for the students! Besides, there is often a high chance that examinees will remember or correct something from the first to the second administration. When we develop two tests that examine the same material with different items, we create an opportunity to estimate reliability through equivalence, or parallel forms. Comparing the scores from Form 1 of a test to those of Form 2 results in a coefficient that ranges from 0 to 1. Again, the closer to 1, the more consistent the test.
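
As a concrete illustration, here is a small Python sketch of a stability estimate: correlate each student's score from the first administration with the score from the second. The scores below are invented, and statistics.correlation requires Python 3.10 or later.

# Minimal sketch: test-retest (stability) reliability as the correlation
# between two administrations of the same test. Scores are hypothetical.
from statistics import correlation  # Python 3.10+

first_administration  = [40, 72, 55, 88, 63, 77, 49, 91]
second_administration = [45, 70, 58, 85, 60, 80, 52, 89]

r_stability = correlation(first_administration, second_administration)
print(f"Test-retest reliability: r = {r_stability:.2f}")
# Correlating Form 1 scores with Form 2 scores instead would give the
# equivalence (parallel-forms) estimate.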

Technical Issues
Test reliability (continued)
- Internal consistency – artificially splitting the test into halves
  - Several coefficients – split halves, KR-20, KR-21, Cronbach's alpha
  - All coefficients provide estimates ranging from 0 to 1

If one test is hard to develop, think about two! Think also about giving a second form of the test to your students – I'm sure they'd be delighted to help you out! Because of this limitation, researchers have developed estimates of test reliability called internal consistency. In essence, we think of one test of, say, 100 items as two tests of 50 items each; we "split" the test into halves. The two most common estimates of internal consistency are the KR-20 and Cronbach's alpha. The former is used when the items on a test are scored as right or wrong; the latter when the answers can fall on a continuous scale. An example of the latter is a Likert scale, where a student responds on a five-point scale ranging from strongly disagree to strongly agree. Regardless of which estimate is used, the coefficients always range from 0 to 1, with 1 representing greater reliability.
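
For reference, Cronbach's alpha is alpha = (k / (k - 1)) * (1 - (sum of the item variances) / (variance of the total scores)), where k is the number of items; with items scored 0/1 the same computation reduces to KR-20. Here is a minimal Python sketch using invented Likert-scale responses.

# Minimal sketch: Cronbach's alpha from an item-response matrix
# (rows = students, columns = items). With 0/1 scored items this same
# computation yields KR-20. The responses below are invented.
from statistics import pvariance

responses = [          # five students x four Likert items (1-5 scale)
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
]

k = len(responses[0])  # number of items
item_variances = [pvariance([row[i] for row in responses]) for i in range(k)]
total_variance = pvariance([sum(row) for row in responses])

alpha = (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")  # closer to 1 = more internally consistent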