Using IRT and Many-Facet Rasch Analysis for Test Improvement
"ALIGNING TRAINING AND TESTING IN SUPPORT OF INTEROPERABILITY"
Desislava Dimitrova, Dimitar Atanasov, New Bulgarian University
BILC Seminar, 10-15 October 2010, Varna
Outline
- Examination procedure
- Main concepts and observations
- The socio-cognitive test validation framework (Cyril Weir, 2005) and its criteria
- Scoring validity for the listening and reading parts of the test
- Scoring validity for the essay
Test structure
1. Listening paper, two tasks:
- 15 MCQ
2. Reading paper, five tasks:
- 6 items, matching response format
- 10 items, banked-cloze response format
- 10 items, open-cloze response format
- 16 items, short-answer response format
- 2 open-ended questions
- 5 MCQ
3. Essay: 180-220 words
Too much?
- The concept of communicative language ability (CEFR)
- The concept of test usefulness (Bachman)
- The concept of justifying the use of language assessments in the real world (Bachman)
- The concept of validity
- The Code of Practice (ALTE*, for example)

* Association of Language Testers in Europe
Statements
- The NBU exam is high-stakes.
- The NBU exam is criterion-oriented.
- The NBU exam is 'independent'.
- Evidence for test validation had not been established, BUT there was a routine practice for test development and test administration.
The Socio-cognitive Framework for Test Validation (Cyril Weir, 2005)
Test-taker characteristics and:
- Context validity
- Theory-based validity
- Scoring validity
- Consequential validity
- Criterion-related validity
"Before-the-test event":
- Context validity
- Theory-based validity

"After-the-test event":
- Scoring validity
- Consequential validity
- Criterion-related validity
Scoring validity for the listening and reading parts of the test is established by:
- Item analysis
- Internal consistency
- Error of measurement
- Marker reliability

Not just looking at them: investigate, discuss, learn, and take decisions! (A sketch of two of these statistics follows below.)
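As an illustration only, here is a minimal Python sketch of two of the statistics named above: Cronbach's alpha for internal consistency, and the standard error of measurement derived from it. The 0/1 score matrix is invented for the example and is not NBU exam data.

```python
import numpy as np

# Hypothetical 0/1 item scores: rows = test takers, columns = items.
scores = np.array([
    [1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1],
])

k = scores.shape[1]                     # number of items
item_vars = scores.var(axis=0, ddof=1)  # variance of each item
totals = scores.sum(axis=1)             # total score per test taker

# Cronbach's alpha: internal consistency of the item set.
alpha = (k / (k - 1)) * (1 - item_vars.sum() / totals.var(ddof=1))

# Standard error of measurement: SD of totals scaled by sqrt(1 - reliability).
sem = totals.std(ddof=1) * np.sqrt(1 - alpha)

print(f"Cronbach's alpha: {alpha:.2f}")
print(f"Standard error of measurement: {sem:.2f} score points")
```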
Analysis: 3-parameter IRT model

Advantages:
- Item parameter estimates are independent of the group of examinees used
- Test-taker ability estimates are independent of the particular set of items used
- A degree of difficulty
- A parameter to specify the discrimination
- A way to specify the content
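In the 3PL model the probability of a correct response is P(theta) = c + (1 - c) / (1 + e^(-a(theta - b))), with discrimination a, difficulty b, and pseudo-guessing c. A minimal Python sketch; the item parameters below are invented for illustration:

```python
import numpy as np

def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: test-taker ability
    a: item discrimination
    b: item difficulty
    c: pseudo-guessing (lower asymptote)
    """
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

# Hypothetical easy item (b = -1.2) with some guessing (c = 0.2):
for theta in (-2.0, 0.0, 2.0):
    p = p_correct_3pl(theta, a=1.0, b=-1.2, c=0.2)
    print(f"theta = {theta:+.1f}: P = {p:.3f}")
```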
Summer session, 2010
Values of difficulty by item and test version:

Item   Version 1   Version 2   Version 3   Version 4
  1      -1.7        -1.2         1.6        -0.7
  2      -1.5        -1.2         1.9        -2.2
  3      -1.7        -2.9         2.6        -0.4
  4      -0.5        -2.4        -0.9        -0.2
  5      -3.0        -0.1         2.6        -1.4
  6      -0.7        -0.1        -0.3        -0.2
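A quick check of whether the four versions are comparable in difficulty is to average the estimates per form. A minimal Python sketch over the table above:

```python
import numpy as np

# Difficulty estimates from the table (rows = items 1-6, columns = versions 1-4).
difficulty = np.array([
    [-1.7, -1.2,  1.6, -0.7],
    [-1.5, -1.2,  1.9, -2.2],
    [-1.7, -2.9,  2.6, -0.4],
    [-0.5, -2.4, -0.9, -0.2],
    [-3.0, -0.1,  2.6, -1.4],
    [-0.7, -0.1, -0.3, -0.2],
])

# Mean difficulty per version: a rough indicator of how parallel the forms are.
for v, mean_b in enumerate(difficulty.mean(axis=0), start=1):
    print(f"Version {v}: mean difficulty {mean_b:+.2f}")
```

On these estimates, Version 3 is markedly harder than the other three forms, which is exactly the kind of observation that should feed into the decisions on the next slide.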
Possible decisions
- Remedial procedures
- Classroom assessment
- A certification decision only
Scoring validity for writing is established by:
- Criteria / rating scale
- Rating procedures:
  - Rater training
  - Standardization
  - Rating conditions
  - Rating
  - Moderation
- Statistical analysis of raters and grading (a sketch of a simple rater-agreement check follows below)
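One basic piece of that statistical analysis is checking how far two raters agree. A minimal Python sketch; the essay scores below are invented for the example:

```python
import numpy as np

# Hypothetical scores from two raters for the same eight essays.
rater1 = np.array([14, 11, 17,  9, 15, 12, 18, 10])
rater2 = np.array([13, 12, 16, 11, 15, 10, 17, 12])

# Pearson correlation: do the raters rank the essays consistently?
r = np.corrcoef(rater1, rater2)[0, 1]

# Exact agreement and agreement within one score point.
diff = np.abs(rater1 - rater2)
exact = (diff == 0).mean()
adjacent = (diff <= 1).mean()

print(f"Inter-rater correlation: {r:.2f}")
print(f"Exact agreement: {exact:.0%}, within one point: {adjacent:.0%}")
```

A full Many-Facet Rasch analysis would additionally model rater severity, task difficulty, and test-taker ability jointly; the sketch above is only a first-pass check.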
Conclusions for the essay:

Positive:
- Two raters
- Analytic writing scale
- Rubrics and input

Negative:
- The score depends on the raters
- No task-specific scale
- No standardization
It is now a fact that we will continue our work on:
- item writers' training
- content and statistical specification of the items
- test review and test revision
Sharing:
- Investigation (small steps towards "strong" validity)
- Comparison (language ability of the same population at the same level)
- Cooperation (in research projects)
Thank you
New Bulgarian University
www.nbu.bg