LANGUAGE TEST RELIABILITY

Presentation transcript:

1 LANGUAGE TEST RELIABILITY

2 What Is Reliability? Reliability refers to a quality of test scores: the consistency of measures across different times, test forms, raters, and other characteristics of the measurement context.

3 Cont. Test reliability is related to high variance in the true-score distribution (person separability). Reliability is a measure of the accuracy, consistency, dependability, or fairness of the scores resulting from the administration of a particular examination.

4 The Measurement Model Observed score = True score + Error score: $X = T + E$. Observed score: the score that a test taker actually received on a test (the raw or obtained score). True score: because there is always some error in any measurement, an individual's true score on a test is the observed score minus the error: $T = X - E$.

5 Standard Error of Measurement The standard error of measurement (SEM) is an estimate of error to use in interpreting an individual's test score. $SEM = s\sqrt{1 - r}$, where $s$ = the standard deviation for the test and $r$ = the reliability coefficient for the test.

6 Standard Error of Measurement For example: a test has a split-half reliability coefficient of .96 and a standard deviation of 15. Calculate the SEM for this test.

7 Standard Error of Measurement $SEM = s\sqrt{1 - r} = 15\sqrt{1 - .96} = 15\sqrt{.04} = 15 \times .2 = 3$
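The worked SEM example above can be checked with a few lines of Python. This is only a minimal sketch; the function name `standard_error_of_measurement` is made up for illustration and does not come from the slides.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Return SEM = s * sqrt(1 - r) for a test with SD `sd` and reliability `reliability`."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Values from the slide: split-half reliability .96, standard deviation 15.
sem = standard_error_of_measurement(sd=15, reliability=0.96)
print(round(sem, 2))  # 3.0
```

With the same standard deviation, a lower reliability gives a larger SEM, so individual scores should be interpreted with a wider margin of error.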

8 Threats to Test Reliability Sources of Error What are some of the factors that introduce error into measurement? 1) Student factors 2) Construction of the items 3) Test administration 4) Scoring 5) Length, difficulty, and boundary effect of the test 6) Regulatory fluctuation 7) Discriminability, speediness, and homogeneity 8) Fluctuation in response

9 Sources of Error 1) Student factors: Student fatigue, illness, or anxiety can induce error and lower reliability because they affect performance and keep a test from measuring students' true ability or achievement. 2) Construction of the items: A major threat to reliable measurement is poorly worded, ambiguous, or tricky questions.

10 Sources of Error 3) Test administration: Environmental factors such as heat, light, and noise, as well as confusing directions and different amounts of testing time allowed to different students, can affect students' scores.

11 Sources of Error 4) Scoring: An objective test is more reliable because its scores reflect true differences in achievement among students rather than the judgment and opinions of the scorer. Subjectivity in scoring or mechanical errors in the scoring process may introduce inconsistency into the scores and produce unreliable measurement. Such inconsistency usually occurs within a single rater or between raters.

12 Scoring A. Intra-rater reliability (mark/re-mark reliability) (Bachman, 1990): When an individual subjectively judges or rates the adequacy of a given sample of language performance at least twice and gives consistent results, we say that the rating has intra-rater reliability. B. Inter-rater reliability: the consistency of the ratings given by different raters to the same sample of language performance.

13 Sources of Error 5) Length, difficulty, and boundary effect of the test A- Reliability is affected by the number of items in the test. More items produce a greater range of scores and greater reliability. B- A test that is either too easy or too difficult for the class taking it will typically have low reliability. This occurs because the scores will be clustered together at either the high end or the low end of the scale, with small differences among students (boundary effect).

14 Sources of Error 6) Regulatory fluctuation: Differences in the clarity of instructions, the time of test administration, test administrator interaction with examinees, prevention of cheating behavior, and reporting of time remaining are all potential sources of measurement error.

15 Sources of Error 7) Discriminability, speediness, and homogeneity A- Discriminability: the degree to which a test, or an item of the test, distinguishes between stronger and weaker test takers. Greater discriminability = greater reliability.

16 Sources of Error B- Speediness: Speed test: a test in which the items are easy but the time limits are so short that few or none of the test takers can complete all the items. Such a test aims to determine how quickly the test takers can do a certain task. Power test: a test in which item difficulty generally increases gradually but ample time is given to all candidates. The aim is to determine how much an individual is able to do, not how rapidly.

17 Sources of Error In a power test, failure to allow examinees a reasonable amount of time to complete the test will reduce reliability. If the test becomes more difficult as a result of the element of speededness, reliability will diminish. C- Homogeneity: We can increase reliability and reduce error by including items of similar format and content (e.g., split-half method).

18 Sources of Error 8) Fluctuation in response A- Response arbitrariness B- Test wiseness and familiarity

19 Methods of Reliability Computation The choice of method for computing reliability will depend on such factors as: the nature of the threats to reliability present; ease of computation; the nature of the test; the testing situation.

20 Methods of Reliability Computation 1- Test-retest method 2- Parallel form method 3- Inter-rater reliability 4- Split-half reliability 5- KR-20 6- KR-21

21 Methods of Reliability Computation 1- Test-retest method: the correlation of two sets of scores for the same persons. An approach to estimating reliability in which we administer the test twice to the same group of individuals and then compute the correlation between the two sets of scores: $R = r_{1,2}$

22 Methods of Reliability Computation Test-retest method, disadvantages: 1- Time consuming: it is difficult to arrange two testing sessions and to prepare similar conditions for the same group of examinees. 2- Test effect: students may learn or memorize some questions.
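The test-retest estimate described above is simply the Pearson correlation between the two administrations. The sketch below assumes small, made-up score lists (`first_sitting`, `second_sitting`) for five test takers; the helper name `pearson_r` is also just illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2 scores)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary in both administrations")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical scores for the same five people on two administrations.
first_sitting = [12, 15, 9, 18, 14]
second_sitting = [13, 14, 10, 17, 15]

test_retest_reliability = pearson_r(first_sitting, second_sitting)
print(round(test_retest_reliability, 2))
```

The same function can serve for the parallel-form method by passing the scores from form A and form B instead of two sittings of one form.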

23 Methods of Reliability Computation 2- Parallel form method: two tests of the same ability, with equal length and difficulty, that are administered to the same sample of persons. Disadvantage: constructing two parallel forms of a test is not an easy task.

24 Methods of Reliability Parallel form method Equated tests: any two sets of scores from different tests (assuming that the same trait is being tested) that have been reduced to a common scale to facilitate comparison.

25 Methods of Reliability Parallel form method Random parallel tests: a term used to describe tests composed of items drawn randomly from the same population of items. $r_u = r_{A,B}$, where $r_u$ = the reliability coefficient and $r_{A,B}$ = the correlation of form A with form B of the test.

26 Methods of Reliability Computation 3- Inter-rater reliability: an estimate based on the correlation of scores between or among two or more raters who rate the same item, scale, or instrument. The actual level of reliability will depend on the number of raters or judges: the more raters involved in determining the mark, the more reliable the mark will be.

27 Intra-rater reliability (mark/re-mark reliability) (Bachman, 1990): When an individual subjectively judges or rates the adequacy of a given sample of language performance at least twice and gives consistent results, we say that the rating has intra-rater reliability.

28 Methods of Reliability Inter-rater reliability: There are two steps in the estimation of inter-rater reliability: 1- compute the average of all the correlation coefficients; 2- apply the Spearman-Brown prophecy formula.
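The two steps above can be sketched as follows. This assumes the general form of the Spearman-Brown prophecy formula, $r_n = \frac{n\,r}{1 + (n - 1)\,r}$, with $n$ taken as the number of raters; the pairwise correlations in the example are invented for illustration, and the function names are hypothetical.

```python
def spearman_brown(r: float, n: float) -> float:
    """Spearman-Brown prophecy: reliability of n pooled measures, given single-measure r."""
    return n * r / (1 + (n - 1) * r)


def inter_rater_reliability(pairwise_correlations, n_raters: int) -> float:
    """Step 1: average the rater-to-rater correlations. Step 2: adjust for n raters."""
    if not pairwise_correlations:
        raise ValueError("at least one pairwise correlation is required")
    mean_r = sum(pairwise_correlations) / len(pairwise_correlations)
    return spearman_brown(mean_r, n_raters)


# Hypothetical correlations among three raters: A-B, A-C, B-C.
estimate = inter_rater_reliability([0.60, 0.70, 0.80], n_raters=3)
print(round(estimate, 3))  # 0.875
```

The adjusted value (0.875) is higher than the average pairwise correlation (0.70), which matches the point that adding raters makes the pooled mark more reliable.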

29 Methods of Reliability Computation 4- Split-half reliability: obtained from a single administration by dividing the test into two comparable halves (e.g., odd- and even-numbered items) and correlating the resulting scores for each individual. It is an approach to estimating the internal consistency of a test.

30 Methods of Reliability Split-half reliability, disadvantages: a- Reliability can change according to the manner in which the test is divided (e.g., split into odds and evens). b- Homogeneous items: assuming equality between the two halves is not always safe (different subsections in a test, e.g. grammar, vocabulary, reading, will change test homogeneity and thus reduce test score reliability).

31 Methods of Reliability Split-half reliability, advantage: it is more practical than the other methods because: 1- there is no need to administer the same test twice; 2- it is not necessary to develop two parallel forms of the same test; 3- a single administration is enough.

32 Methods of Reliability Split-half reliability Spearman-Brown prophecy formula. E.g., if the reliability coefficient of half of the test is computed to be 0.80, what would be the reliability of the total test?

33 Methods of Reliability Split-half reliability It should be logically clear that, for a half-test reliability between 0 and 1, the estimated reliability of the total test will always be higher than the reliability of half of the test.
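The slide's formula is not preserved in this transcript, so the sketch below assumes the standard two-half form of the Spearman-Brown prophecy formula, $r_{total} = \frac{2\,r_{half}}{1 + r_{half}}$, applied to an odd/even split. The response matrix and all function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def step_up(r_half: float) -> float:
    """Spearman-Brown correction from half-test to full-test reliability."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_matrix) -> float:
    """Correlate odd-item and even-item half scores, then step up to full length."""
    odd_half = [sum(row[0::2]) for row in item_matrix]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_matrix]  # items 2, 4, 6, ...
    return step_up(pearson_r(odd_half, even_half))


# The slide's example: half-test reliability of 0.80.
print(round(step_up(0.80), 2))  # 0.89

# Hypothetical right/wrong (1/0) responses: 5 test takers x 6 items.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(round(split_half_reliability(responses), 2))
```

Splitting the same matrix differently (for example, first half versus second half) would generally give a different coefficient, which is the first disadvantage noted for this method.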

34 Methods of Reliability Computation 5- KR-20 (Kuder-Richardson Formula 20): permits us to arrive at the same final estimate of reliability without having to compute reliability estimates for every possible split-half combination.

35 Kuder-Richardson Formula 20 It is based on: the number of items on the test ($n$ or $K$); the difficulty of the individual items; the variance of the total test scores ($V$).
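The three ingredients listed above combine in the standard KR-20 expression, $r = \frac{K}{K - 1}\left(1 - \frac{\sum p_i q_i}{V}\right)$, where $p_i$ is the proportion answering item $i$ correctly and $q_i = 1 - p_i$. The sketch below is one possible implementation under two assumptions not stated on the slide: items are scored 0/1, and $V$ is the population variance (dividing by the number of test takers). The data are invented.

```python
def kr20(item_matrix) -> float:
    """KR-20 for a matrix of 0/1 item scores (rows = test takers, columns = items)."""
    n_people = len(item_matrix)
    k = len(item_matrix[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")

    # Variance of total scores (population form: divide by the number of people).
    totals = [sum(row) for row in item_matrix]
    mean_total = sum(totals) / n_people
    variance = sum((t - mean_total) ** 2 for t in totals) / n_people
    if variance == 0:
        raise ValueError("total scores must vary")

    # Sum of p*q over items, where p is the item difficulty (proportion correct).
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n_people
        sum_pq += p * (1 - p)

    return (k / (k - 1)) * (1 - sum_pq / variance)


# Hypothetical responses: 5 test takers x 6 items.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(round(kr20(responses), 3))  # 0.792
```

Using the sample variance (dividing by one less than the number of test takers) instead would give a slightly different value, so the convention should be stated whenever a coefficient is reported.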

36 Methods of Reliability Computation 6- KR-21 (Kuder-Richardson Formula 21) is a formula that is easier to use but less accurate than KR-20. This formula is based on the assumption that all items in the test are designed to measure a single trait and are of equal difficulty.

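37 Kuder-Richardson Formula 21 This formula, known as KR-21, is $r = \frac{K}{K - 1}\left(1 - \frac{\bar{X}(K - \bar{X})}{K V}\right)$, where $K$ = the number of items, $\bar{X}$ = the mean score, and $V$ = the variance of the total test scores.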

38 Kuder-Richardson Formula 21 E.g., suppose we gave a 50-item test, the mean score was 43, and the variance was 25. Putting these values into KR-21: $K = 50$, $\bar{X} = 43$, $V = 25$.

39 Kuder-Richardson Formula 21 Solving for $r$ gives: $r = (1.02)(0.76) \approx 0.78$. The reliability coefficient is greater than 0.70, so we can use this test with some degree of confidence.
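The KR-21 example can be reproduced with a short function. This sketch assumes the standard KR-21 form, $\frac{K}{K - 1}\left(1 - \frac{\bar{X}(K - \bar{X})}{K V}\right)$; the function name and parameter names are illustrative only.

```python
def kr21(n_items: int, mean_score: float, variance: float) -> float:
    """KR-21 reliability estimate from the item count, mean, and variance of total scores."""
    if n_items < 2 or variance <= 0:
        raise ValueError("need at least two items and a positive variance")
    correction = n_items / (n_items - 1)
    return correction * (1 - mean_score * (n_items - mean_score) / (n_items * variance))


# Values from the slide: 50 items, mean 43, variance 25.
r = kr21(n_items=50, mean_score=43, variance=25)
print(round(r, 3))  # 0.775
```

Carried through without rounding the intermediate factors, the estimate is about 0.775; the slide's 0.78 comes from multiplying the rounded factors 1.02 and 0.76. Either way the coefficient is above 0.70.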

40 Correction for Attenuation (Henning, 1987) A way of holding reliability constant when making comparisons among correlation coefficients. It is made by dividing the correlation coefficient by the square root of the product of the two tests' reliabilities.

41 Correction for Attenuation E.g., if a test of composition writing correlated 0.55 with a test of grammar usage, disattenuate this correlation, assuming that the KR-20 reliabilities of the tests were 0.70 for composition writing and 0.80 for grammar usage.

42 Correction for Attenuation
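The worked solution for this slide is not preserved in the transcript. A minimal sketch of the calculation, following the verbal definition given earlier (the observed correlation divided by the square root of the product of the two reliabilities), might look like this; the function and parameter names are hypothetical.

```python
import math


def correct_for_attenuation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Disattenuated correlation: r_xy / sqrt(rel_x * rel_y)."""
    if rel_x <= 0 or rel_y <= 0:
        raise ValueError("reliabilities must be positive")
    return r_xy / math.sqrt(rel_x * rel_y)


# Values from the example: observed r = 0.55, KR-20 reliabilities 0.70 and 0.80.
corrected = correct_for_attenuation(r_xy=0.55, rel_x=0.70, rel_y=0.80)
print(round(corrected, 2))  # 0.73
```

Under these assumptions the disattenuated correlation is about 0.73, higher than the observed 0.55 because the unreliability of both tests had been lowering the observed value.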