Language Assessment Lecture 7 Validity & Reliability Instructor: Dr. Tung-hsien He

Validity
Definition: the degree to which a test accurately measures what it is intended to measure (i.e., the appropriateness of a given test, or any of its component parts, for what it purports to measure). Relation Between Validity & Reliability: a. Reliability is a necessary but not sufficient condition for validity (i.e., other factors also contribute to the degree of validity). b. A valid measurement must be a reliable one (validity entails reliability).

The degree of validity is expressed as a validity coefficient (though not all types of validity are reported as coefficients). Threats to Validity: a. Invalid Application of Tests b. Inappropriate Selection of Content (Content/Face Validity) c. Imperfect Cooperation of the Examinee (Response Validity) d. Inappropriate Referent or Norming Population

Types of Validity: a. Content & Face Validity: 1. Providing arguments to justify that the contents of the test are sufficiently representative and comprehensive to measure fully what it is intended to measure. 2. No validity coefficients are offered. 3. How to obtain:

For Content Validity: (1) Seeking content experts' endorsements from a panel of experts. (2) Designing a large number of items across a variety of representative domains, according to elaborate test specifications.

For Face Validity: Seeking respondents' feelings and feedback about whether the test looks representative or comprehensive. b. Response Validity: the extent to which respondents respond in the manner expected by the test developers (threatened by halo effects). How to obtain: ways to reduce halo effects include filler items and careful arrangement of similar (or identical) items.

c. Concurrent Validity: How to obtain: Compute correlation coefficients between the test and another established measure administered at about the same time. d. Construct Validity: How to obtain: 1. Providing theoretical constructs and defining them in terms of more specific operational/theoretical definitions of these constructs. (In this case, no validity coefficient can be reported.)

2. Point-biserial Correlation (Internal Construct Validation): correlation coefficients between scores on one section and scores on the entire test (e.g., r between Section 1 and total scores on the TOEFL). Note: For tests whose items take only two values (yes or no; 1 or 0), factor analysis is not appropriate. Factor analysis (Exploratory/Confirmatory Factor Analysis) is suitable only for Likert-scale data.
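The correlation-based checks above (concurrent validity, and the point-biserial section-total correlation) both reduce to a Pearson correlation. A minimal pure-Python sketch; the score lists below are invented for illustration, not data from the lecture:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Concurrent validity: correlate the new test with an established criterion.
new_test = [52, 61, 70, 78, 85]
criterion = [50, 63, 68, 80, 88]
print(round(pearson_r(new_test, criterion), 3))

# Point-biserial: correlate a binary item (0/1) with total test scores.
item = [1, 1, 0, 0]
totals = [9, 7, 4, 2]
print(round(pearson_r(item, totals), 3))
```

When one variable is dichotomous (as with a single yes/no item), the Pearson formula computed this way is exactly the point-biserial coefficient, so one function covers both uses.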

Reliability
Definition: the degree to which a test yields consistent, similar scores for the same respondents at different points in time and/or on different occasions. (An ideal situation: the consistency of scores on a given test, or any of its component parts, when the test is repeatedly administered to the same testees.) Meaning: a reliable test means that a group of respondents' scores on a repeated (or similar but not identical) measurement are consistent.

Interpretation: A reliable test indicates that a respondent will obtain very similar scores on the test, no matter how many times, or at what time, the test is repeated. It also indicates that a respondent's score on a criterion measurement will be similar to his/her score on the test (that is, if he/she scores high on the criterion measurement, then he/she will score high on the test, and vice versa). Index of Reliability: reliability coefficients (a correlation coefficient), ranging from +1 to –1.

Cut-off Reliability Coefficients (Lado, 1961): a. Reading/Vocabulary/Structure: b. Listening Comprehension: c. Speaking Proficiency:

Types of Reliability: a. Test-Retest Reliability Coefficient: (think about practice effects and the interval of time between administrations) b. Parallel Form Methods: (two tests are administered to the same group at the same time) 1. Restrictive Equivalent Form: all items used in the two tests are different but equivalent in covariance (the most restrictive method of constructing items for both tests)

(Think about whether this is possible in practice.) 2. Less Restrictive Equivalent Form: all items used in the two tests are roughly equivalent, but differences in covariance may be present 3. Random Parallel Form: all items are randomly selected from item banks → Internal Consistency Coefficient (Cronbach's α) c. Interrater Reliability: reliability of two (or more) raters' scores on a test

Two raters → Spearman or Pearson r; more than two raters → Cronbach's α. d. Split-Half Reliability: divide the test into two halves (e.g., odd vs. even items) and compute the correlation between the two halves → internal consistency reliability estimates e. Kuder-Richardson Formula 20: suitable for binary items (with yes/no values); obtainable in SPSS via Cronbach's α
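The internal-consistency indices above (Cronbach's α and KR-20) can be computed directly from an item-response matrix. A minimal pure-Python sketch using population variances; the 0/1 response matrix below is invented for illustration. In this formulation, KR-20 and α coincide exactly for binary items, which is why SPSS's Cronbach's α output serves for KR-20:

```python
def pvar(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(responses):
    """responses: one list of item scores per person (persons x items)."""
    items = list(zip(*responses))      # transpose to per-item columns
    k = len(items)
    totals = [sum(person) for person in responses]
    return k / (k - 1) * (1 - sum(pvar(it) for it in items) / pvar(totals))

def kr20(responses):
    """KR-20 for binary (0/1) items: k/(k-1) * (1 - sum(p*q) / var_total)."""
    items = list(zip(*responses))
    k = len(items)
    pq = sum((p := sum(it) / len(it)) * (1 - p) for it in items)
    totals = [sum(person) for person in responses]
    return k / (k - 1) * (1 - pq / pvar(totals))

# Four testees answering three yes/no items (1 = correct).
data = [[1, 1, 1],
        [1, 0, 1],
        [0, 1, 0],
        [0, 0, 0]]
print(round(cronbach_alpha(data), 3), round(kr20(data), 3))  # both ≈ 0.6
```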

f. Kuder-Richardson Formula 21: not available in SPSS. Relation between Reliability & Test Length: other things being equal, adding comparable items increases reliability. (Think about the reliability of a 5-item test versus a 100-item test.)
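The test-length effect just described is usually quantified with the Spearman-Brown prophecy formula; the slide does not name it, but it is the standard tool for this comparison, so a short sketch may help:

```python
def spearman_brown(r, n):
    """Predicted reliability when a test is lengthened n-fold
    with comparable items; r is the current reliability."""
    return n * r / (1 + (n - 1) * r)

# A 5-item test with reliability .50, lengthened to 100 items (n = 20):
print(round(spearman_brown(0.50, 20), 3))   # ≈ 0.952
```

The same formula with n = 2 is the correction applied to a split-half correlation to estimate full-test reliability.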

True Score: estimated from the observed score plus or minus the standard error of measurement (think about your weight on a bathroom scale). a. Formula: SE = SD × √(1 − r), where r = the reliability coefficient and SD = the standard deviation of the test scores b. Meaning: true scores fall within the observed score plus and minus 2 SE (roughly a 95% band) Manners to Improve Reliability: a. Enough unambiguous items b. Scoring keys (scoring rubrics for essay/short-answer questions)
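The standard-error-of-measurement formula above translates directly into code. A minimal sketch; the SD, reliability, and observed score are invented numbers for illustration:

```python
from math import sqrt

def sem(sd, r):
    """Standard error of measurement: SE = SD * sqrt(1 - r)."""
    return sd * sqrt(1 - r)

def true_score_band(observed, sd, r, width=2):
    """Interval observed ± width*SE expected to contain the true score."""
    se = sem(sd, r)
    return observed - width * se, observed + width * se

# SD = 10, reliability = .91  ->  SE = 10 * sqrt(.09) = 3.0
print(round(sem(10, 0.91), 4))              # 3.0
lo, hi = true_score_band(70, 10, 0.91)
print(round(lo, 4), round(hi, 4))           # 64.0 76.0
```

Note how the band shrinks as reliability rises: with r = 1 the SE is zero and the observed score equals the true score.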