1 Language Assessment Lecture 7 Validity & Reliability Instructor: Dr. Tung-hsien He the@tea.ntptc.edu.tw

2 Validity. Definition: the degree to which a test measures accurately what it is intended to measure (the appropriateness of a given test, or any of its component parts, for what it is purported to measure). Relation Between Validity & Reliability: a. Reliability is a necessary but not sufficient condition for validity (i.e., other factors also affect the degree of validity). b. A valid measurement must be a reliable one (validity entails reliability).

3 The degree of validity is expressed by a validity coefficient (not all types of validity are reported as coefficients). Threats to Validity: a. Invalid Application of Tests b. Inappropriate Selection of Content (Content/Face Validity) c. Imperfect Cooperation of Examinees (Response Validity) d. Inappropriate Referent or Norming Population

4 Types of Validity: a. Content & Face Validity: 1. Providing arguments to justify that the contents of the test are sufficiently representative and comprehensive to fully measure what the test is intended to measure. 2. No validity coefficients are reported. 3. How to obtain:

5 For Content Validity: (1) Seeking content experts’ endorsements from a panel of experts. (2) Designing a large number of items across a variety of domains so that the items are representative according to elaborate test specifications.

6 For Face Validity: Seeking respondents’ feelings and feedback about whether the test looks representative or comprehensive. b. Response Validity: the extent to which respondents respond in the manner expected by the test developers (cf. halo effects). How to obtain: ways to reduce halo effects include filler items and careful arrangement of similar (identical) items.

7 c. Concurrent Validity: How to obtain: compute correlation coefficients between two measures (e.g., a new test and an established criterion measure taken at about the same time). d. Construct Validity: How to obtain: 1. Providing theoretical constructs and defining them in terms of more specific operational/theoretical definitions of these constructs (in this case, no validity coefficient can be reported).
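As a rough illustration of the “compute correlation coefficients” step, here is a minimal Python sketch of a concurrent-validity check; the scores and variable names are hypothetical, not from the lecture:

```python
# Concurrent validity: correlate scores on a new test with scores on an
# established criterion measure taken by the same examinees.
# All data below are hypothetical.
from statistics import mean, stdev

new_test  = [72, 85, 60, 90, 78, 66, 88, 74]   # scores on the new test
criterion = [70, 88, 58, 92, 75, 70, 85, 72]   # scores on the criterion measure

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

print(f"concurrent validity coefficient: {pearson_r(new_test, criterion):.3f}")
```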

8 2. Point-Biserial Correlation (Internal Construct Validation): correlation coefficients between scores on one section and scores on the entire test (e.g., r between Section 1 and total scores on the TOEFL). Note: for tests whose items take only two values (yes/no; 1 or 0), factor analysis is not appropriate; factor analysis (exploratory/confirmatory) is suitable only for Likert-scale data.
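The slide frames this as a section-total correlation (an ordinary Pearson r). For a single binary item against total scores, the same idea takes the classic point-biserial form; a minimal Python sketch with hypothetical data:

```python
# Point-biserial correlation between one binary item (1 = correct,
# 0 = incorrect) and total test scores. Algebraically this equals the
# Pearson r between the item column and the totals. Hypothetical data.
from statistics import mean, pstdev

item   = [1, 1, 0, 1, 0, 0, 1, 1]          # responses to one item
totals = [55, 80, 40, 85, 42, 50, 83, 58]  # total test scores

def point_biserial(item, totals):
    ones  = [t for i, t in zip(item, totals) if i == 1]  # totals of those who got it right
    zeros = [t for i, t in zip(item, totals) if i == 0]  # totals of those who got it wrong
    p = len(ones) / len(item)                            # proportion answering correctly
    q = 1 - p
    return (mean(ones) - mean(zeros)) / pstdev(totals) * (p * q) ** 0.5

print(f"point-biserial r: {point_biserial(item, totals):.3f}")
```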

9 Reliability. Definition: the degree to which a test yields consistent, similar scores from the same respondents at different points in time and/or on different occasions. (An ideal situation: the consistency of scores on a given test, or any of its component parts, when the test is repeated with the same testees.) Meaning: a reliable test means that a group of respondents’ scores on a repeated (or similar but not identical) measurement are consistent.

10 Interpretation: a reliable test indicates that a respondent will obtain very similar scores on the test no matter how many times, or when, the test is repeated. It also indicates that a respondent’s score on a criterion measurement will be similar to his/her score on the test (that is, someone who scores high on the criterion measurement will also score high on the test, and vice versa). Index of Reliability: the reliability coefficient (a correlation coefficient), ranging from +1 to –1.

11 Cut-Off Reliability Coefficients (Lado, 1961): a. Reading/Vocabulary/Structure: .90-.99 b. Listening Comprehension: .80-.89 c. Speaking Proficiency: .70-.79

12 Types of Reliability: a. Test-Retest Reliability Coefficient (think about practice effects and the interval of time between administrations) b. Parallel-Form Methods (two tests are administered to the same group at the same time): 1. Restrictive Equivalent Form: all items used in the two tests are different but equivalent in covariance (the most restrictive method of constructing items for the two tests)

13 (Think about whether this is feasible.) 2. Less Restrictive Equivalent Form: all items used in the two tests are quite equivalent, but equal covariance is not required. 3. Random Parallel Form: all items are randomly selected from item banks. → Internal Consistency Coefficient (Cronbach’s α) c. Interrater Reliability: reliability of two (or more) raters’ scores on a test
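Since the slide points to Cronbach’s α as the internal-consistency coefficient, here is a minimal sketch of the α formula, α = k/(k−1) × (1 − Σ item variances / variance of total scores), on a hypothetical examinee-by-item matrix:

```python
# Cronbach's alpha from an examinee-by-item score matrix.
# Hypothetical 5 examinees x 4 Likert-type items.
from statistics import pvariance

scores = [
    [4, 3, 5, 4],
    [2, 2, 3, 3],
    [5, 4, 5, 5],
    [3, 3, 4, 2],
    [4, 4, 4, 4],
]

def cronbach_alpha(matrix):
    k = len(matrix[0])                                   # number of items
    item_vars = [pvariance(col) for col in zip(*matrix)] # variance of each item column
    total_var = pvariance([sum(row) for row in matrix])  # variance of total scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

print(f"Cronbach's alpha: {cronbach_alpha(scores):.3f}")
```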

14 Two raters → Spearman or Pearson r; more than two raters → Cronbach’s α. d. Split-Half Reliability: divide the test into two halves (by odd and even items) and compute the correlation between the two halves → internal-consistency reliability estimate. e. Kuder-Richardson Formula 20: suitable for binary items (yes/no values); for such items it equals Cronbach’s α, so it can be obtained in SPSS via the Cronbach’s α procedure.
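A short sketch of both estimates on hypothetical binary data: split-half reliability (odd vs. even items, stepped up with the Spearman-Brown correction, since each half is only half as long as the full test) and KR-20:

```python
# Split-half reliability and KR-20 on a hypothetical matrix of binary
# item scores (rows = examinees, columns = items, 1 = correct).
from statistics import mean, stdev, pvariance

data = [
    [1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1, 1],
]

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Split-half: correlate odd-item and even-item half scores, then apply
# the Spearman-Brown step-up to estimate full-length reliability.
odd  = [sum(row[0::2]) for row in data]
even = [sum(row[1::2]) for row in data]
r_half = pearson_r(odd, even)
print(f"split-half reliability: {2 * r_half / (1 + r_half):.3f}")

# KR-20: k/(k-1) * (1 - sum(p*q) / variance of totals), binary items only.
k = len(data[0])
p = [mean(col) for col in zip(*data)]              # proportion correct per item
pq = sum(pi * (1 - pi) for pi in p)
var_total = pvariance([sum(row) for row in data])
print(f"KR-20: {k / (k - 1) * (1 - pq / var_total):.3f}")
```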

15 f. Kuder-Richardson Formula 21: not available in SPSS. Relation Between Reliability & Test Length: adding items increases reliability (think about the reliability of a 5-item test versus a 100-item test).
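For concreteness, a hedged sketch of KR-21 (which needs only the number of items, the mean, and the variance of total scores, and assumes items of equal difficulty) together with the Spearman-Brown prophecy formula that underlies the test-length point; all numbers are hypothetical:

```python
# KR-21 from summary statistics of a hypothetical 20-item test.
k, M, var = 20, 14.0, 9.0   # number of items, mean and variance of total scores
kr21 = (k / (k - 1)) * (1 - M * (k - M) / (k * var))
print(f"KR-21: {kr21:.3f}")

# Spearman-Brown prophecy: predicted reliability when a test with
# reliability r is lengthened n-fold with comparable items.
def spearman_brown(r, n):
    return n * r / (1 + (n - 1) * r)

r = 0.60                    # hypothetical reliability of a short test
for n in (1, 2, 4, 20):     # e.g., 5 items -> 100 items is n = 20
    print(f"{n}x length -> predicted reliability {spearman_brown(r, n):.3f}")
```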

16 True Score: estimated from the observed score plus or minus the standard error of measurement (think about your weight on a scale). a. Formula: SE = SD × √(1 − r), where r is the reliability coefficient. b. Meaning: the true score falls within the observed score plus and minus 2 SE. Ways to Improve Reliability: a. Enough unambiguous items b. Scoring keys (scoring rubrics for essay/short-answer questions)
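A minimal sketch of the SE formula and the ±2 SE band around an observed score, with hypothetical numbers:

```python
# Standard error of measurement and the band in which the true score
# is likely to fall. SD, reliability, and the observed score are hypothetical.
sd, r = 10.0, 0.91        # test standard deviation and reliability coefficient
observed = 75

se = sd * (1 - r) ** 0.5  # SE = SD * sqrt(1 - r)
low, high = observed - 2 * se, observed + 2 * se
print(f"SEM = {se:.2f}; true score likely within [{low:.1f}, {high:.1f}]")
```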

