Validity and Reliability


VALIDITY
In scientific research, validity refers to whether a study is able to answer the questions it is intended to answer. Instrument selection is important for validity because instruments are used to collect data, and those data are used to make inferences related to the research questions. Thus, the inferences drawn from the specific uses of an instrument should themselves be validated.

What do we mean by the validity of inferences?
Our inferences should be relevant to the purpose of the study (appropriate). If we want to know our students' attitudes towards learning English, there is no use in making inferences from their scores on English tests.
Our inferences should be meaningful and correct. We should be able to say something about the meaning of the information we collect, e.g., what does a high score on a particular test mean?

Our inferences should be useful. They should help researchers make decisions related to what they are trying to find out. E.g., if you want to see whether formative assessment has positive effects on student achievement, you need information that will help you infer whether your students' achievement is affected by formative assessment or not. Thus, validity depends on the amount and type of the evidence you have!

Kinds of evidence of validity
Content-related evidence of validity concerns the content and format of the instrument: the degree to which an instrument logically appears to measure the intended variable. How appropriate is the content? Is the format appropriate? Does it logically get at the intended variable? How adequately does the sample of items or questions represent the content to be assessed? Etc.

Two points to consider in content-related evidence:
i) Adequacy of sampling: whether the content of the instrument is an adequate sample of the domain of content it is supposed to represent. E.g., if you want to assess your students' achievement at the macro level, you need a sufficient number of items that tap this skill.

ii) Format of the instrument: clarity of printing, size of type, adequacy of work space, appropriateness of language, clarity of directions, etc. E.g., if you want to measure students' attitudes towards English, the questionnaire should be in their native language if their target-language proficiency is not high enough.

How do we obtain content-related evidence of validity? Write out the definition of what you want to measure and give this definition (together with the instrument and a description of the intended sample) to a number of judges. The judges look at the definition and place a checkmark in front of each item in the instrument that they feel does not measure the objectives. They also place a checkmark in front of each aspect of the definition that is not assessed by the instrument, and they evaluate the appropriateness of the format. The researcher then rewrites the flagged items and resubmits them to the judges. This continues until all judges approve of all items.

Example
Judge No: ___________
Match to Portfolio Assessment Objectives (each item rated from 1 = No Match to 5 = Perfect Match)

A. RANGE
1. Ability to link ideas in a variety of ways
2. Ability to use a wide range of genres (stories, reports, articles, etc.)
3. Evidence of various topics

B. FLEXIBILITY
4. Evidence of variation in the style, vocabulary, tone, language, voice, and ideas
5. Evidence for the appropriateness of style, vocabulary, tone, language, and voice

C. CONNECTIONS
6. Evidence of application of already-known concepts to newly-learned ones
7. Evidence of new concepts and/or metaphors

General aims of the Portfolio Assessment System
1. Improving students' writing abilities
2. Improving students' metacognitive skills
3. Leading students to become autonomous language learners

Specific objectives of the Portfolio Assessment System
I. Helping students improve their linguistic skills in writing, in terms of:
A) Grammar, punctuation, and spelling
B) Vocabulary
C) Coherence and cohesion
II. Helping students improve their metacognitive skills, in terms of:
A) Applying and/or creating new concepts or ideas
B) Using variety in writing appropriately
C) Analysing and synthesising what they have learned/read
D) Using other sources
III. Helping students become autonomous language learners, in terms of:
A) Applying their own views
B) Connecting other sources with what they know

Criterion-Related Evidence
Criterion-related evidence compares performance on one instrument with performance on some other measure. Two forms are available:
a) Predictive validity: compares scores on the original test with scores on one or more criterion measures obtained in a later follow-up testing.
b) Concurrent validity: compares the test results with results obtained at about the same time through a parallel, substitute measure.

In both forms, a correlation coefficient is used. The correlation coefficient (r) shows the degree of relationship between the scores individuals obtain on two instruments.
A positive relationship: a high (low) score on one instrument is accompanied by a high (low) score on the other.
A negative relationship: a high (low) score on one instrument is accompanied by a low (high) score on the other.
Correlation coefficients fall between +1.00 and -1.00; an r of .00 indicates that no relationship exists.
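To make the idea concrete, here is a minimal sketch of how r is computed: the covariance of the two score sets divided by the product of their standard deviations. The student scores are invented for illustration.

```python
import numpy as np

def pearson_r(x, y):
    # Pearson correlation: covariance divided by the product
    # of the two standard deviations.
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.mean((x - x.mean()) * (y - y.mean())) / (x.std() * y.std())

# Hypothetical scores of 8 students on a test and on a criterion measure.
test_scores = [52, 67, 73, 48, 88, 61, 75, 80]
criterion_scores = [55, 60, 78, 50, 90, 58, 70, 85]

print(round(pearson_r(test_scores, criterion_scores), 2))  # close to +1.00
```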

Construct-Related Evidence
Construct-related evidence establishes a link between the underlying theoretical construct we wish to measure and the visible performance we choose to observe. Construct validation consists of building a strong logical case, based on circumstantial evidence, that a test measures the construct it is intended to measure.

Generally there are three steps:
i) The variable being measured is clearly defined.
ii) Hypotheses, based on the theory underlying the variable, are formed about how people who possess a lot versus a little of the variable will behave in a particular situation.
iii) The hypotheses are tested both logically and empirically.

RELIABILITY
Reliability is the consistency of the scores obtained. It is possible to have quite reliable but invalid scores (whereas unreliable scores can never be valid!). What is desirable is both high reliability and high validity.

Errors of Measurement
When someone takes the same test twice, they rarely perform exactly the same, due to many factors. Such factors result in errors of measurement. Because of these errors, researchers expect some variation in scores; reliability estimates give them an idea of how much variation to expect.

This estimate is another application of the correlation coefficient, known here as a reliability coefficient. A reliability coefficient again expresses a relationship, but between the scores of the same individuals on the same instrument at two different times, or between two parts of the same instrument. There are three main ways to obtain a reliability coefficient.

1. Test-Retest Method
The same test is administered twice to the same group after a certain time interval. A reliability coefficient indicates the relationship between the two sets of scores obtained. The coefficient is affected by the length of the interval: the longer the time, the lower the reliability coefficient. The researcher should choose an interval over which the individuals can be expected to retain their relative positions; most of the time an interval of one to three months is sufficient.
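Computationally, the test-retest coefficient is simply the correlation between the two administrations. A small sketch with invented scores:

```python
import numpy as np

# Hypothetical scores of the same 6 students, two months apart.
first_administration = [70, 82, 55, 91, 63, 77]
second_administration = [68, 85, 58, 88, 60, 79]

r = np.corrcoef(first_administration, second_administration)[0, 1]
print(f"test-retest reliability: {r:.2f}")  # near 1.00: stable relative positions
```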

2. Equivalent-Forms Method
Two different but equivalent (parallel) forms of an instrument are administered to the same group of individuals during the same period of time. The questions (items) are different, but they sample the same content. A high reliability coefficient provides strong evidence that the two forms are measuring the same thing.

3. Internal-Consistency Methods
There are several internal-consistency methods, and they all require only a single administration of an instrument.

Split-half procedure
The two halves of a test (e.g., odd items vs. even items) are scored separately, and a correlation coefficient is calculated between the two sets of scores. Because this coefficient describes a test only half as long as the original, the Spearman-Brown prophecy formula is used to estimate the full-test reliability: reliability of total test = (2 × reliability for half test) / (1 + reliability for half test). The same logic explains why the reliability of a test (instrument) can be increased by adding more items.
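A minimal sketch of the split-half procedure with the Spearman-Brown correction; the 0/1 item matrix is invented for illustration.

```python
import numpy as np

def split_half_reliability(item_scores):
    # Rows = examinees, columns = items. Split into odd/even item halves,
    # correlate the half totals, then apply Spearman-Brown.
    items = np.asarray(item_scores, float)
    odd_half = items[:, 0::2].sum(axis=1)
    even_half = items[:, 1::2].sum(axis=1)
    r_half = np.corrcoef(odd_half, even_half)[0, 1]
    return (2 * r_half) / (1 + r_half)  # Spearman-Brown prophecy formula

# Hypothetical right/wrong (1/0) scores of 5 examinees on a 6-item test.
scores = [[1, 1, 0, 1, 0, 1],
          [1, 1, 1, 1, 1, 1],
          [0, 0, 1, 0, 0, 0],
          [1, 0, 1, 1, 0, 1],
          [0, 1, 0, 0, 1, 0]]
print(round(split_half_reliability(scores), 2))
```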

Kuder-Richardson Approaches
Two formulas: KR-20 and KR-21.
KR-21 can be used when all items are assumed to be of equal difficulty; it requires only the number of items on the test, the mean, and the standard deviation.
KR-20 is more laborious but must be used when you cannot assume that all items are of equal difficulty.
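The two formulas, written out as code. KR-20 needs the full matrix of right/wrong item scores, while KR-21 needs only summary statistics; all data and values here are invented.

```python
import numpy as np

def kr20(item_scores):
    # KR-20 for 0/1 items; does not assume equal item difficulty.
    items = np.asarray(item_scores, float)
    k = items.shape[1]                   # number of items
    p = items.mean(axis=0)               # proportion passing each item
    total_var = items.sum(axis=1).var()  # variance of total scores
    return (k / (k - 1)) * (1 - np.sum(p * (1 - p)) / total_var)

def kr21(k, mean, sd):
    # KR-21 from summary statistics; assumes equal item difficulty.
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * sd ** 2))

scores = [[1, 1, 0, 1, 0, 1],
          [1, 1, 1, 1, 1, 1],
          [0, 0, 1, 0, 0, 0],
          [1, 0, 1, 1, 0, 1],
          [0, 1, 0, 0, 1, 0]]
print(round(kr20(scores), 2))
print(round(kr21(k=50, mean=35.0, sd=6.0), 2))  # e.g., a 50-item test
```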

Alpha Coefficient (Cronbach's alpha, α)
A general form of the KR-20 formula. It is used to calculate the reliability of items that are not scored simply right versus wrong, e.g., essay items where more than one answer is possible.
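Cronbach's alpha compares the sum of the individual item variances with the variance of the total scores. A sketch with invented essay ratings:

```python
import numpy as np

def cronbach_alpha(item_scores):
    # Generalises KR-20 to items scored on any scale (not just 0/1).
    items = np.asarray(item_scores, float)
    k = items.shape[1]
    item_variances = items.var(axis=0)        # variance of each item
    total_variance = items.sum(axis=1).var()  # variance of total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical scores of 5 students on 3 essay items, each scored 0-5.
essay_scores = [[4, 3, 5],
                [2, 2, 3],
                [5, 4, 4],
                [1, 2, 2],
                [3, 3, 4]]
print(round(cronbach_alpha(essay_scores), 2))
```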

Scoring Agreement
When evaluation is subjective (as in essay scoring), there is the possibility of observer differences, so scoring agreement should be reported. Such cases require training of the raters to obtain reliability as high as possible. The expected standard is a correlation of at least .90, or at least 80% agreement.

In the case of subjective rating, we can talk about two kinds of reliability:
Intra-rater reliability: similar to the test-retest strategy. The same raters score the papers of the same group of students on two separate occasions (e.g., two weeks apart). Intra-rater reliability is thus an estimate of the consistency of judgments over time.

Inter-rater reliability: similar to the equivalent-forms strategy, since the scores are obtained from two different raters. It estimates the extent to which two or more raters agree on the score that should be assigned to a written sample. A correlation coefficient is calculated between the raters' scores, and the obtained coefficient is then adjusted using the Spearman-Brown prophecy formula.
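A sketch of both agreement statistics mentioned above, percent agreement and the Spearman-Brown-adjusted correlation, using invented ratings from two hypothetical raters:

```python
import numpy as np

def adjusted_inter_rater_reliability(rater_a, rater_b):
    # Correlate the two raters, then step up with Spearman-Brown to
    # estimate the reliability of the combined two-rater score.
    r = np.corrcoef(rater_a, rater_b)[0, 1]
    return (2 * r) / (1 + r)

def percent_agreement(rater_a, rater_b):
    # Proportion of samples given exactly the same score by both raters.
    return np.mean(np.asarray(rater_a) == np.asarray(rater_b))

# Hypothetical essay scores (1-6 scale) from two raters on 8 papers.
rater_a = [4, 5, 3, 6, 2, 4, 5, 3]
rater_b = [4, 5, 3, 5, 2, 4, 4, 3]

print(f"adjusted inter-rater reliability: {adjusted_inter_rater_reliability(rater_a, rater_b):.2f}")
print(f"percent agreement: {percent_agreement(rater_a, rater_b):.0%}")
```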