Ydy itemau wedi eu cyfieithu yn perfformio yn debyg?

Slides:



Advertisements
Similar presentations
Assumptions underlying regression analysis
Advertisements

Mark D. Reckase Michigan State University The Evaluation of Teachers and Schools Using the Educator Response Function (ERF)
DIF Analysis Galina Larina of March, 2012 University of Ostrava.
Brief introduction on Logistic Regression
Item Response Theory in Health Measurement
HONG KONG EXAMINATIONS AND ASSESSMENT AUTHORITY PROPOSED HKDSE ENGLISH LANGUAGE ASSESSMENT FRAMEWORK.
Overview of field trial analysis procedures National Research Coordinators Meeting Windsor, June 2008.
BA 555 Practical Business Analysis
The reform of A level qualifications in the sciences Dennis Opposs SCORE seminar on grading of practical work in A level sciences, 17 October 2014, London.
1 BA 555 Practical Business Analysis Review of Statistics Confidence Interval Estimation Hypothesis Testing Linear Regression Analysis Introduction Case.
BINARY CHOICE MODELS: LOGIT ANALYSIS
Copyright © 2008 by Pearson Education, Inc. Upper Saddle River, New Jersey All rights reserved. John W. Creswell Educational Research: Planning,
Maths Study Centre CB Open 11am – 5pm Semester Weekdays
You got WHAT on that test? Using SAS PROC LOGISTIC and ODS to identify ethnic group Differential Item Functioning (DIF) in professional certification exam.
Collecting Quantitative Data
1 G Lect 10a G Lecture 10a Revisited Example: Okazaki’s inferences from a survey Inferences on correlation Correlation: Power and effect.
POTH 612A Quantitative Analysis Dr. Nancy Mayo. © Nancy E. Mayo A Framework for Asking Questions Population Exposure (Level 1) Comparison Level 2 OutcomeTimePECOT.
Rasch trees: A new method for detecting differential item functioning in the Rasch model Carolin Strobl Julia Kopf Achim Zeileis.
Living arrangements, health and well-being: A European Perspective UPTAP Meeting 21 st March 2007 Harriet Young and Emily Grundy London School of Hygiene.
WJEC GCSE Geography A Autumn CPD Today’s agenda 9:00 – 9:30 Arrival and coffee 9:30 – 10:00 Welcome by WJEC officer 10:00 – 12:15 (includes coffee.
Agresti/Franklin Statistics, 1 of 88 Chapter 11 Analyzing Association Between Quantitative Variables: Regression Analysis Learn…. To use regression analysis.
Sicrhau Dyfodol Dwyieithog i Bawb Securing a Bilingual Future for All.
How to Conduct a Meta-Analysis Arindam Basu MD MPH About the Author Required Browsing.
State Board of Education Achievement and Graduation Requirements Committee January 11, 2016.
Using statistics in the analysis of quantitative data A good way to use this material for detailed study is to print the whole file then to run the slide.
Reformed English and Maths GCSEs AOSEC 18 November 2015 Phil Beach CBE Director Strategic Relationships.
Copyright © 2009 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 47 Critiquing Assessments.
Stats Methods at IC Lecture 3: Regression.
Chapter 13 Simple Linear Regression
GS/PPAL Section N Research Methods and Information Systems
Regression Must have Interval or Ratio scaled measures
Evaluation Requirements for MSP and Characteristics of Designs to Estimate Impacts with Confidence Ellen Bobronnikov March 23, 2011.
Correlation analysis is undertaken to define the strength an direction of a linear relationship between two variables Two measurements are use to assess.
Chapter 14 Inference on the Least-Squares Regression Model and Multiple Regression.
CCEA Support Events GCSE Science – Double Award Revised Specification
Multiple Regression Prof. Andy Field.
English Hub School networks GCSE English Language
CHAPTER 7 Linear Correlation & Regression Methods
Lecture Slides Elementary Statistics Twelfth Edition
Integrating administrative data – the 2021 Census and beyond
Inference for Regression
Chapter 11: Simple Linear Regression
How are question papers created and assessed?
Chapter 6: Autoregressive Integrated Moving Average (ARIMA) Models
Introduction to the Validation Phase
Understanding Results
AN INTRODUCTION TO EDUCATIONAL RESEARCH.
Next-Generation MCAS: Update and review of standard setting
Reliability & Validity
Introduction to Assessment and Monitoring
Understanding Standards Event Higher Statistics Award
Chapter 14: Correlation and Regression
Elementary Statistics
III Choosing the Right Method Chapter 10 Assessing Via Tests
Edexcel: Large Data Set Activities
Lecture Slides Elementary Statistics Thirteenth Edition
Statistical Methods For Engineers
His Name Shall Be Revered …
Second Language Vocabulary Acquisition
Defence Requirements Authority for Culture and Language (DRACL)
DIF detection using OLR
Inferential Statistics
15.1 The Role of Statistics in the Research Process
AICT5 – eProject Project Planning for ICT
Evaluating Multi-item Scales
Quantitative design: Ungraded review questions
Item Response Theory Applications in Health Ron D. Hays, Discussant
Collaboration In Research
STEPS Site Report.
  Using the RUMM2030 outputs as feedback on learner performance in Communication in English for Adult learners Nthabeleng Lepota 13th SAAEA Conference.
Presentation transcript:

Ydy itemau wedi eu cyfieithu yn perfformio yn debyg? Do translated items perform the same way? Profiad asesu mewn gwlad ddwyieithog The experience of assessment in a bilingual country. Mark Hogan, Statistician November 2017

Version Control Version Author Date Notes 1 Mark Hogan 01 Sep 17 First Draft. 2 06 Sep 17 Included Su17 WJEC GCE AS Economics example. 3 13 Sep 17 Amended following Translation feedback. 4 04 Oct 17 Included Evans relationship FF plot and MD results for Su17 GCE AS Economics example. 5 12 Oct 17 Corporate formatting. 6 16 Oct 17 Updated following comments from Translation. 7 18 Oct 17 Updated following comments from RH, GP and EC. 8 19 Oct 17 Updated following AE comments. 9 20 Oct 17 Minor improvements. 10 27 Oct 17 Minor edits. 11 30 Oct 17 Edits following GP feedback.

Cefndir :: Background (1) Poblogaeth 3.1 miliwn (2) 19% yn siarad Cymraeg (2) ymgeiswyr yn cael dewis sefyll trwy gyfrwng y Gymraeg neu’r Saesneg Mae tua 14% o ymgeiswyr sefyll trwy cyfrwng Cymraeg. CBAC yw prif fwrdd arholi Cymru tua 30,000 ymgewiswyr TGAU tegwch – dim DIF rhwng cyfrwng iaith 3.1 million population (2). 19 % can speak Welsh (2). Candidates may chose to sit exam in medium of either English or Welsh. Approximately 14 % of candidates sit Welsh medium paper. WJEC is Wales’ main Examination Board. Circa 30,000 GCSE candidates. Fairness – no DIF between language medium. (1)

Exam Paper Translation - Current Welsh Speaking Subject Expert Review Question Paper Evaluation Committee Translation Editors Principal Examiner Reports (including comments on Welsh paper) Awarding Exam

Exam Paper Translation - Proposed Welsh Speaking Subject Expert Review Question Paper Evaluation Committee Translation Editors + Tier linking + Gender + Age groups +…. DIF Analysis Principal Examiner Reports (including comments on Welsh paper) Awarding Exam

Research Method Long list of methods Short list of methods Simulation study Recommendation for Quantitative DIF Analysis Choose DIF statistic Adjust DIF statistic and charts Summer 2016 Pilot Summer 2017 Pilot

DIF Method Short List (building on Wiberg, 2007 (3)) Method Polytomous Items Measure DIF Test DIF Uniform MD  SMD Mantel-Haenszel Logistic Regression IRT – Exact Unsigned Area Simple Unsigned Area X Simple Signed Area IRT – Likelihood Ratio Log Linear Modelling Graphical Item Response Function Chi-square Methods IRT – Lord’s Chi-Square

DIF Method MD Lowest false positive rate Simulation Study Method Results MD Lowest false positive rate Good detection above DIF of 0.5 marks SMD High false positive rate Mantel-Haenszel Logistic Regression IRT – Exact Unsigned Area Not tested due to requirement of large sample sizes and software limitations.

Pilot – Summer 2016 GCSE - Outputs Penfield and Camili strategy (4) Establish stratifying variable Determine reliability of the stratifying variable Measure between-group differences in target trait distribution Compute DIF statistics Conduct content analysis of items flagged as having DIF Remove large DIF items and recompute DIF statistics Boxplot and entry sizes Item v Rest Score Mean interitem correlation, item rest correlation, item test correlation, reliability Ability distributions Facility Factor scatterplot (similar to Angoff’s delta plot (5)) Item rest mean with CI Standardized Mean Difference with p-value

Pilot – Summer 2016 GCSE - Feedback Subject Officers Welsh paper, bilingual or English responses. Flexible English paper provision. Target age group, language maturation (RS birth control, euthanasia) Translators Welsh speaking subject expert checks for majority Difficult Welsh word -> Include English in brackets Include number of centres and centre details Extensive work over the last 10 years National effort to standardise terminology (Y Termiadur Addysg) Improved Welsh medium Teacher training FF plot most useful High false positive rate Fine recording of item and sub-item marks

Pilot – Summer 2017 GCSE Switched to Mean Difference (MD) Used facility factor plot with confidence intervals (derived from log-log relationship, allows for non-uniform DIF)

Example Question Response Option 1 Response Option 2 Response Option 3 Cwestiwn Opsiwn Ymateb 1 Opsiwn Ymateb 2 Opsiwn Ymateb 3 Opsiwn Ymateb 4 Opsiwn Ymateb 5

Example Welsh Medium Entries School English Welsh A 6 B 9 C 20 D 22

Example Item Reliability Statistics Q1 Q2 Q3 Q4 Q5 Mean Inter-Item Correlation 0.36 0.41 0.35 0.32 Item Rest Correlation 0.50 0.38 0.52 0.61 0.51 Item Test Correlation 0.70 0.71 0.77 Alpha 0.69 0.74 0.65

Example

Example

An aside – FF relationship FF plots often assume linear relationship. Analysis suggested non-linear. Mathematical derivation of expected relationship assuming: Dichotomous items; One-parameter Rasch model.

An aside – FF relationship FF plots often assume linear relationship. Analysis suggested non-linear. Mathematical derivation of expected relationship assuming: Dichotomous items; One-parameter Rasch model. Result Reduces false positives at extremes and middle. Second order approximation not successful. Log-log regression worked well. Use orthogonal regression, residuals and CIs. Allows for non-uniform DIF.

Example

Example DIF Analysis Summary Analysis Results Boxplot and Entries Welsh candidates score slightly lower. N. B. Small Welsh entry. Reliability Suspect DIF item has similar correlations. Scale is reliable. DIF analysis can be performed. Ability Distribution Similar ability distribution for candidates. DIF analysis can be performed. Facility Factor Plot No cause for concern. Suspect DIF item is within 95 % CI and very close to line of best fit. Item-Rest Plot Welsh candidates seem to achieve fewer marks (uniform DIF), however small Welsh entry provides poor statistical power. MD Welsh candidates achieve 1.4 fewer marks, strongly statistically significant (p-value < 0.01). Statistical power is 0.83. Quantitative Analysis Result Small Welsh entry weakens statistical evidence strength. Item does not have unusual DIF compared to other items in the paper and there is inconsistent statistical evidence (FF plot insignificant, MD significant). Therefore move to qualitative review but note weak and inconsistent statistical evidence. Qualitative Review Outcome Reviewed by Awarding Committee and Subject Officer and conclusion was minimal or no effect and so no adjustment was justified.

DIF Analysis – Next Step Options Qualitative review by Translators, Editors and Welsh Speaking Subject Experts. Options following Qualitative Review Option Notes Do nothing Qualitative review finds no issue with suspected DIF item and concludes the DIF analysis result was a false positive. Adjust marks of affected candidates Qualitative review finds an issue with the item. Requires careful consideration of choosing uniform or non-uniform DIF analysis. Drop item for awarding purposes. Extreme last resort. How does this affect the underlying construct, assessment objectives and candidate outcomes?

Crynodeb :: Summary Argymhelliad. Gwahaniaeth Cymedrig gydag allbynnau graffigol. Plot FF yn defnyddio perthynas bras log-log ac atchweliad orthogonal. Cofnodi diwedd marciau eitem ac is-eitem. Dadansoddiad DIF yn gam cyntaf, adolygiad ansoddol yn gam nesaf. Recommendation Mean Difference with graphical outputs. FF plot uses log-log relationship approximation and orthogonal regression. Finer recording of item and sub-item marks. DIF analysis only first step, next step is qualitative review.

Gwaith i’r dyfodol :: Future Work Adborth QPEC a Chyfieithu (positifau ffug) ymholiadau (negatifau ffug) Meini prawf arwyddocaol / adolygiad ansoddol Ystadegyn DIF Anffurfiol? Sut i ddelio ag eitemau dewisol? Goblygiadau ar gyfer DIF ar gyfer rhyw, oedran, ac ati. Diddordeb rheoleiddiol mewn DIF yn ôl gallu. Prawf Gwahaniaethol / Swyddogaeth Cymhwyster. Feedback QPEC and Translation (false positives) enquires (false negatives) Significance/Qualitative review criteria Non-Uniform DIF statistic? How to deal with optional items? Implications for DIF for gender, age, etc. Regulatory interest in DIF by ability. Differential Test/Qualification Functioning.

References Wikipedia,2017. Wales in the UK and Europe. [online] Available at: https://commons.wikimedia.org/w/index.php?curid=18497747 [Accessed 20 October 2017]. Office for National Statistics, 2012. 2011 Census: Key Statistics for Wales, March 2011. [online] Available at <https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/bulletins/2011censuskeystatisticsforwales/2012-12-11#proficiency-in-welsh> [Accessed 20 October 2017]. Wiberg, M., 2007. Measuring and Detecting Differential Item Functioning in Criterion-Referenced Licensing Test, A Theoretic Comparison of Methods. [pdf] Umea University. Available at: <http://www.edusci.umu.se/digitalAssets/59/59534_em-no-60.pdf> [Accessed 12 October 2017]. Penfield, R. D., Camilli, G., 2007, Differential Item Functioning and Item Bias. In: Rao et al, 2007, Handbook of Statistics, vol 20,pp. 125-167. Magis, D., Facon, B., 2012, Angoff’s delta method revisited: Improving DIF detection under small samples, British Journal of Mathematical and Statistical Psychology, vol 65, issue 2,