LG675 Session 5: Reliability II Sophia Skoufaki 15/2/2012.

Overview
- What is item analysis?
- How can we conduct item analysis for (a) norm-referenced data-collection instruments (using only statistical analyses available through SPSS) and (b) criterion-referenced data-collection measures?
- How can we examine the reliability of criterion-referenced data-collection instruments? We will work through some typical scenarios.

Item analysis: definition
Item analysis is the kind of reliability analysis used to identify items in a data-collection instrument (e.g., questions in a questionnaire, tasks/questions in a language test) which do not measure the same thing as the other items.
It is conducted on data from the pilot study. The aim is to improve the data-collection instrument by removing any irrelevant items before the main study.

NB: This kind of item analysis is different from the item analysis (also called 'analysis by items') which forms part of data analysis in experiments. That analysis is done to ensure that the findings of an experiment generalise not only to people with characteristics similar to those of the participants but also to items similar to those used in the experiment (Clark 1973). If you plan to conduct an experiment, see Phil's discussion of this term and his SPSS how-to: s.htm#item

Reminder: classification of data-collection instruments according to the basis of grading
Data-collection instruments fall into two types: norm-referenced and criterion-referenced.

Item analysis for norm-referenced measures
According to the traditional approach to item analysis, items are examined in terms of:
1. Item facility: a measure of how easy an item is; high facility means an easy item. An easy way to assess it is to look at the percentage of people who answer each item correctly. The data-collection instrument as a whole should have a facility of about 0.5, and most items should have a facility around that level (see the sketch below).
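
A minimal sketch of the percentage-correct calculation in Python (the data and column names are hypothetical; with dichotomous 1/0 scoring, the mean of each column is the item facility):

```python
import pandas as pd

# Hypothetical dichotomous scores: rows = test-takers, columns = items
scores = pd.DataFrame({
    "item1": [1, 1, 0, 1, 1, 0],
    "item2": [1, 0, 0, 1, 1, 0],
    "item3": [0, 0, 0, 1, 0, 0],
})

# Item facility = proportion of test-takers who answered the item correctly
item_facility = scores.mean()
print(item_facility)          # near 1 = very easy item, near 0 = very hard item
print(item_facility.mean())   # overall facility; ideally around 0.5 for an NRT
```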

Understanding item facility
- Input the file 'three_tests_IF.sav' into SPSS. This file shows the item facility for each question in three tests.
- Examine the item facilities in each test and try to spot problematic ones.
- Which test seems to be the best, in that it contains items able to distinguish among students of various proficiency levels?

Item analysis for norm-referenced measures (cont.)
2. Item discrimination: a measure of how different performance on an item is from performance on the other items. It can be assessed via a correlation between the item's score and the score on the whole measure, or via Cronbach's alpha if item deleted (see the sketch below).
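
A minimal sketch of both checks, reusing the kind of score table shown above (all data hypothetical; the 'corrected' item-total correlation excludes the item from the total so it is not correlated with itself):

```python
import pandas as pd

# Hypothetical dichotomous scores: rows = test-takers, columns = items
scores = pd.DataFrame({
    "item1": [1, 1, 0, 1, 1, 0],
    "item2": [1, 0, 0, 1, 1, 0],
    "item3": [0, 1, 0, 1, 1, 0],
    "item4": [1, 1, 0, 0, 1, 0],
})

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a people-by-items score table."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))

for item in scores.columns:
    rest = scores.drop(columns=item)
    # Corrected item-total correlation: the item vs. the total of the other items
    r = scores[item].corr(rest.sum(axis=1))
    print(f"{item}: item-total r = {r:.2f}, alpha if deleted = {cronbach_alpha(rest):.2f}")
```

A low (or negative) item-total correlation, or an 'alpha if deleted' higher than the alpha of the full instrument, flags an item as a candidate for removal.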

SPSS: item analysis for norm-referenced measures
Do the activity described in the box in Phil's 'Simple statistical approaches to reliability and item analysis' handout; then do the activity described in the following box. Also calculate item facility as a percentage of correct answers.

Item analysis for criterion-referenced measures (Brown 2003)
- Difference Index (DI): item facility in the post-test minus item facility in the pre-test.
- B-index: item facility for students who passed the test minus item facility for those who failed it.
A sketch of both indices follows.
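
A minimal sketch in Python (the scores and the pass mark are invented; Brown 2003 works through real data sets in Excel):

```python
import pandas as pd

# Hypothetical dichotomous scores for the same students before and after instruction
pre  = pd.DataFrame({"item1": [0, 0, 1, 0], "item2": [0, 1, 0, 0]})
post = pd.DataFrame({"item1": [1, 1, 1, 0], "item2": [1, 1, 0, 0]})

# Difference Index: post-test item facility minus pre-test item facility
# (a high DI means the item is sensitive to the instruction the test targets)
difference_index = post.mean() - pre.mean()

# B-index: item facility among students who passed the post-test minus item
# facility among those who failed it (hypothetical pass mark: 50% of the items)
passed = post.sum(axis=1) >= 0.5 * post.shape[1]
b_index = post[passed].mean() - post[~passed].mean()

print(difference_index)
print(b_index)
```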

SPSS: item analysis for criterion-referenced measures
This is an activity from Brown (2003), who used Excel to calculate the DI and B-index on two data sets. Download the article as a PDF file, input the data from page 20 into SPSS, and calculate the DI via Transform > Compute.

Reliability of criterion-referenced measures
There are two basic approaches:
1. Threshold loss agreement: examines the proportion of people who consistently did better than the cut-off point ('masters') and the proportion of those who consistently did worse ('non-masters'). It uses a test-retest method. Example statistic: Cohen's kappa (also known as the 'kappa coefficient').

[Figure: the structure of Cohen's kappa table in this scenario, i.e. a two-by-two table cross-classifying masters and non-masters across the two test administrations (from Brown and Hudson 2002: 171).]
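
As a hedged illustration of what the statistic does with such a table (the counts below are invented), kappa corrects the observed proportion of consistent master/non-master classifications for the agreement expected by chance:

```python
# Hypothetical counts from two administrations of the same criterion-referenced test
# (rows = administration 1, columns = administration 2)
a, b = 40, 5    # a: master both times;       b: master, then non-master
c, d = 7, 48    # c: non-master, then master; d: non-master both times
n = a + b + c + d

p_observed = (a + d) / n  # proportion of consistent classifications
# Agreement expected by chance, from the marginal proportions of 'master'
p1, p2 = (a + b) / n, (a + c) / n
p_chance = p1 * p2 + (1 - p1) * (1 - p2)

kappa = (p_observed - p_chance) / (1 - p_chance)
print(round(kappa, 3))  # 1 = perfect consistency; 0 = chance-level agreement
```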

Reliability of criterion-referenced measures (cont.)
2. Squared error loss agreement: these statistics are like the previous ones, but they also assess how consistent the degree of mastery/non-mastery is. Example: the phi(lambda) dependability index (not available in SPSS; see Brown 2005, and the sketch below).
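
Since SPSS does not provide phi(lambda), here is a sketch of the short-cut estimate usually attributed to Brennan, computed from each examinee's proportion score. All numbers are invented, and Brown 2005 should be checked for the exact convention (e.g., which variance formula to use):

```python
# Hypothetical proportion scores (total score / number of items) for 8 examinees
prop_scores = [0.9, 0.8, 0.85, 0.6, 0.55, 0.95, 0.7, 0.4]
n_items = 20   # hypothetical test length
cut = 0.6      # hypothetical cut-point (lambda), expressed as a proportion

m = sum(prop_scores) / len(prop_scores)
var = sum((x - m) ** 2 for x in prop_scores) / len(prop_scores)  # population variance

# Short-cut estimate: 1 - (1/(n-1)) * (m(1-m) - s^2) / ((m - lambda)^2 + s^2)
phi_lambda = 1 - (1 / (n_items - 1)) * (m * (1 - m) - var) / ((m - cut) ** 2 + var)
print(round(phi_lambda, 3))
```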

SPSS: assessing the reliability of a criterion-referenced measure through Cohen's kappa
Go to page 172 at C&pg=PA169&source=gbs_toc_r&cad=3#v=onepage&q&f=false. Input the data into SPSS and conduct the kappa test.

Next week
- Statistics for validity assessment
- ANOVA with one independent variable

References
Brown, J.D. 2003. Criterion-referenced item analysis (the Difference Index and B-index). Shiken: JALT Testing & Evaluation SIG Newsletter 7 (3).
Brown, J.D. 2005. Testing in language programs: a comprehensive guide to English language assessment. New York: McGraw Hill.
Clark, H.H. 1973. The language-as-fixed-effect fallacy: a critique of language statistics in psychological research. Journal of Verbal Learning and Verbal Behavior 12, 335-359.
Scholfield, P. Simple statistical approaches to reliability and item analysis. LG675 handout. University of Essex.

Suggested readings
On the statistics used for item analysis:
- Brown, J.D. 2003. Criterion-referenced item analysis (the Difference Index and B-index). Shiken: JALT Testing & Evaluation SIG Newsletter 7 (3).
- Scholfield, P. Simple statistical approaches to reliability and item analysis. LG675 handout. University of Essex.
On the statistics used to assess the reliability of criterion-referenced measures:
- Brown, J.D. 2005. Testing in language programs: a comprehensive guide to English language assessment. New York: McGraw Hill. (chapter 9)
- Brown, J.D. and Hudson, T. 2002. Criterion-referenced language testing. Cambridge: Cambridge University Press. (chapter 5)