Lies, Damned Lies & Statistical Analysis for Language Testing

Presentation transcript:

Lies, Damned Lies & Statistical Analysis for Language Testing Stephen Walker UECA Assessment Symposium, Saturday, 14 July 2018

Hands up if you know what these mean?
- Dichotomous vs polytomous items
- P values
- Point-biserial correlations
- CTT vs IRT

Presentation Aims
1. Why do you need to do statistical analysis?
2. How do you actually do it?
3. What information do you get?
4. How do you use the results?

Why do we need to do statistical analysis?

An Art and a Science “…good test developers and creative item writers are probably born rather than trained.” Charles Alderson

Statistical Analysis is… an absolutely essential, but often the most misunderstood step in developing a defensible test…

Numbers…
- reveal how well items & tests work, or don’t work, and lead to an understanding of why
- provide feedback to test designers & item writers; as teachers we know the value of feedback to learning
- are to applied statistics what language is to applied linguistics
- help to make the results of tests meaningful and useful to test users

How do you actually do it?

Prepare the Data

Get yourself a Matrix. Not that kind of Matrix.

This kind of Matrix!
[Table: one row per test taker, identified by Student ID (12345678, 98754321, 11111111, ...), and one column per item (Item 1 to Item 5). Each cell holds the option letter that test taker chose.]

A Control File
- Contains the answers
- Tells the software what to do
- Looks something like this
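The matrix and the answer key can be combined into a scored matrix before any statistics are run. The sketch below is not the format of any particular analysis package: the key, the responses and the name `score_responses` are all made up for illustration.

```python
# Hypothetical answer key and responses; the option letters are illustrative only.
ANSWER_KEY = {"Item 1": "C", "Item 2": "D", "Item 3": "A"}

responses = {
    "12345678": {"Item 1": "C", "Item 2": "D", "Item 3": "B"},
    "98754321": {"Item 1": "C", "Item 2": "A", "Item 3": "A"},
}


def score_responses(responses, key):
    """Convert chosen options into 1 (correct) or 0 (incorrect) per student and item.

    A missing response is scored 0.
    """
    return {
        student: {item: int(chosen.get(item) == correct) for item, correct in key.items()}
        for student, chosen in responses.items()
    }


scored = score_responses(responses, ANSWER_KEY)
for student, marks in scored.items():
    print(student, marks, "total:", sum(marks.values()))
```

Scoring to 0/1 like this suits dichotomous items; polytomous items would need a different scoring rule.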

Get some software

What information can we get from different analyses?

P value
P value = item difficulty = item facility = item easiness
- the probability that examinees will get an item correct
- to calculate the P value, count the number of test takers who got the item right and divide by the total number of test takers
- the result is a proportion, like a percentage but on a 0-1 scale rather than 0-100

P value
Test takers: Ann, Tony, Jim, Ruth, Hong

          Item 1  Item 2  Item 3  Item 4  Item 5  Item 6
P Value   0.0     0.2     0.4     0.6     0.8     1.0

- Everyone got Item 6 right. It’s very easy for these test takers. Its P value is 1.0 (5 ÷ 5 = 1).
- Only 1 person got Item 2 right. It’s difficult for these test takers. Its P value is 0.2 (1 ÷ 5 = 0.2).
- This approach to calculating difficulty is sample-dependent. If we had a different sample of people, the statistics could be quite different.
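The count-and-divide calculation can be sketched in a few lines. Only the resulting P values (0.0 to 1.0) come from the slide; the pattern of right and wrong answers for each person is an assumption chosen to reproduce them, and `p_values` is an illustrative name.

```python
# 1 = correct, 0 = incorrect, for Items 1-6.
# The individual marks are assumed; only the P values match the slide.
scores = {
    "Ann":  [0, 1, 1, 1, 1, 1],
    "Tony": [0, 0, 1, 1, 1, 1],
    "Jim":  [0, 0, 0, 1, 1, 1],
    "Ruth": [0, 0, 0, 0, 1, 1],
    "Hong": [0, 0, 0, 0, 0, 1],
}


def p_values(scores):
    """Return the proportion of test takers who answered each item correctly."""
    rows = list(scores.values())
    n_test_takers = len(rows)
    # zip(*rows) regroups the marks by item instead of by person.
    return [sum(item_marks) / n_test_takers for item_marks in zip(*rows)]


for number, p in enumerate(p_values(scores), start=1):
    print(f"Item {number}: P = {p:.1f}")
```

Running the same function on a different group of test takers would generally give different values, which is the sample dependence noted on the slide.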

P value interpretation
Range: possible interpretation, with notes
- 0.0-0.3: Too difficult. The item might be mis-keyed or have other issues, so it needs to be checked.
- 0.3-0.7: Difficult to moderately difficult. Test takers are finding items in this range challenging.
- 0.7-0.9: Moderately easy. Most test takers are getting these items correct.
- 0.9-1.0: Too easy. These items are too easy to provide much information on examinees, and can be detrimental to reliability.
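These bands can be turned into a simple lookup for flagging items. The ranges on the slide share their end points (0.3, 0.7, 0.9), so this sketch assumes each band includes its lower bound and excludes its upper bound. That convention and the function name `interpret_p_value` are assumptions, not part of the presentation.

```python
def interpret_p_value(p):
    """Map a P value (0-1) to the interpretation bands from the slide.

    Assumption: a boundary value belongs to the higher band (0.3 counts as 0.3-0.7).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("A P value must lie between 0 and 1")
    if p < 0.3:
        return "Too difficult"
    if p < 0.7:
        return "Difficult to moderately difficult"
    if p < 0.9:
        return "Moderately easy"
    return "Too easy"


for p in (0.2, 0.5, 0.8, 1.0):
    print(p, "->", interpret_p_value(p))
```

The labels are prompts for a review discussion, not automatic verdicts: an item in the lowest band may simply be mis-keyed.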

Rpbis - point-biserial correlation
- Measures how well items differentiate between high- and low-ability test takers
- Ranges from -1.0 to 1.0
- Items which discriminate well have higher Rpbis values, but rarely above 0.5
- A negative Rpbis means high-ability test takers answer incorrectly while those of low ability answer correctly. This usually indicates that the specified answer is actually wrong!
- 0.0-0.1: little to no discrimination (noise)
- Rpbis and P value are considered together
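The point-biserial correlation is the ordinary Pearson correlation between a 0/1 item score and the total test score, so it can be sketched without special software. The data below are invented, and the total scores here include the item itself. Many packages also report a corrected version that leaves the item out of the total.

```python
from math import sqrt


def point_biserial(item_scores, total_scores):
    """Pearson correlation between 0/1 item scores and total test scores.

    Returns None when the statistic is undefined, for example when every
    test taker got the item right.
    """
    n = len(item_scores)
    mean_item = sum(item_scores) / n
    mean_total = sum(total_scores) / n
    covariance = sum(
        (x - mean_item) * (y - mean_total) for x, y in zip(item_scores, total_scores)
    )
    var_item = sum((x - mean_item) ** 2 for x in item_scores)
    var_total = sum((y - mean_total) ** 2 for y in total_scores)
    if var_item == 0 or var_total == 0:
        return None
    return covariance / sqrt(var_item * var_total)


# Invented data: five test takers, ordered from highest to lowest total score.
totals = [5, 4, 3, 2, 1]
good_item = [1, 1, 1, 0, 0]      # answered correctly by the stronger test takers
suspect_item = [0, 0, 1, 1, 1]   # answered correctly by the weaker test takers

r_good = point_biserial(good_item, totals)
r_suspect = point_biserial(suspect_item, totals)
print(round(r_good, 3), round(r_suspect, 3))
```

Values this far from zero come from the tiny, perfectly ordered sample; real items rarely exceed 0.5.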

Rpbis value interpretation
- 0.20+ = Good items: higher-ability test takers tend to get these items correct
- 0.10-0.20 = Maybe OK item: review it
- 0.0-0.10 = Problems suggested: revise or replace
- <0.0 = Problematic items: replace
NB: if the correct answer has a negative Rpbis and a distractor has a positive Rpbis, the distractor is probably correct.
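The thresholds and the NB rule above can be sketched as a small check that computes an Rpbis for every answer option, treating "chose this option" as the 0/1 variable. The function names, the choice of which band a boundary value falls into, and the data are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation, or None if either variable has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def interpret_rpbis(r):
    """Map an Rpbis value to the bands on the slide (boundaries go to the higher band)."""
    if r < 0.0:
        return "Problematic item: replace"
    if r < 0.10:
        return "Problems suggested: revise or replace"
    if r < 0.20:
        return "Maybe OK: review"
    return "Good item"


def option_rpbis(choices, total_scores):
    """Rpbis for each option, scoring 1 if the test taker chose it and 0 otherwise."""
    return {
        option: pearson([int(c == option) for c in choices], total_scores)
        for option in sorted(set(choices))
    }


def possible_miskey(choices, total_scores, key):
    """List distractors with a positive Rpbis when the keyed answer's Rpbis is negative."""
    stats = option_rpbis(choices, total_scores)
    key_r = stats.get(key)
    if key_r is None or key_r >= 0:
        return []
    return [o for o, r in stats.items() if o != key and r is not None and r > 0]


# Invented data: the stronger test takers chose B, but the key says A.
choices = ["B", "B", "B", "A", "A"]
totals = [5, 4, 3, 2, 1]
flagged = possible_miskey(choices, totals, key="A")
print(flagged)
```

A flagged distractor is a reason to reread the item and the key in the review meeting, not proof that the key is wrong.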

Using the results within the test development cycle?

UQ-ICTE Reading & Listening Test Development Cycle

Pre-test Review Meeting
- Item writer team should be involved
- Bring the common wrong answers, item analysis results, the pilot test, scripts for listening tests, and the answer key, and meet somewhere to discuss

Don’t forget to show your examples here Stephen!

Decisions made in Pre-test review
- Which items should be cut because they are too easy or too hard for these learners?
- Which items should be re-written?
- Which distractors are not tempting, or too tempting because they are actually correct (double keys)?
- Are test takers lost?

I hope this presentation encourages you to:
- use statistics as a tool to help you understand your own tests
- produce better tests with evidence to support any claims made
- explain to others why piloting & statistical analysis are an essential part of reliable test development
- do the analysis yourself along with those involved in the test development cycle

Thank you
Stephen Walker, Academic Manager: Assessment
E: s.walker@icte.uq.edu.au
T: (07) 3346 6770