STANAG 6001 Testing Workshop

Presentation transcript:

STANAG 6001 Testing Workshop
Challenges of selecting the pre-test population and analyzing results for bi-level tests
Major Drazen Malesevic, BiH STANAG 6001 Testing Team
Kranjska Gora, 04-06 Sep 2018

BACKGROUND
- Ideal situation for test administration (single-level test for appropriate test takers)
- Practicality (bi-level or multi-level tests)
- Pre-test population should be similar to test population
- Challenges in selecting population
- Pre-test results interpretation

Aim of the Presentation
- Bring the issue to the forum
- Offer our practice
- Open the floor for discussion
- Come up with some valuable conclusions for the community

Outline
- Something about the BiH STANAG Test in General
- Countries for Pre-testing the BiH STANAG 6001 Test and Types of Cooperation
- Pre-test Format
- Pre-test Population
- Interpretation of Pre-test Results
- Test Assembly and New Test Versions
- Item Banking

BiH STANAG Test in General
- Levels tested (Levels 2 and 3 for receptive skills; Levels 1, 2 and 3 for productive skills)
- Skills pre-tested (receptive skills on a regular basis; productive skills only rarely)

Countries and Types of Cooperation
- Pre-testing abroad
- Countries (Croatia, Serbia, Macedonia, Bulgaria, Italy)
- Types of cooperation (bilateral cooperation, self-financing, financial support by NATO bodies, sending items via postal service, sending items via email)

Countries and Types of Cooperation
Advantages of visits over sending items:
- Better instructions for candidates
- Direct contact with candidates (population level estimate, feedback from candidates)
- Qualitative analysis (with host nation colleagues)
- Productive skills standardization (with host nation colleagues)

Pre-test Format
- Same format as on the real test
- Number of items
- Anchor items
- Instructions for candidates
- Candidates' motivation
- Feedback for candidates

Number of items: Pre-testing conditions should resemble real test conditions, so the pre-test format should be the same as the real test format. In our case, however, we noticed that some reading items at the end of the test remained unanswered. We concluded that the last few items were wasted, because the pre-test population is not as motivated as the real test population. We therefore cut the number of Level 3 items from 20 to 15, so the pre-test has 35 items in total instead of the 40 on the real test. Everything else in the pre-test format is the same as on the real test.

Candidates' motivation: We usually invite potential candidates for our own real test. For them, pre-testing other tests serves as preparation. For us, it is a chance to compare their results on the pre-test and on the real test.

Feedback for candidates: It is fair to let candidates know their pre-test results, but we should be careful about how we report them. Results should be reported after statistical analysis, as brief information on performance in percentages, not as STANAG results. The feedback also states how many items were accepted, rejected, or need improvement.

Pre-test Population
The most important moment in the pre-testing effort

USING LEVELS IN TEACHING and TESTING
Text, Task, and Student can be at different levels:

Text Level    Task Level    Student Level
Higher        Higher        Higher
Same          Same          Same
Lower         Lower         Lower

Any combination of the above factors is possible.

In Testing, Text and Task must be at the same level:

Text Level    Task Level    Student Level
Same          Same          Same

Slide taken from LTS presentation: Item Development Process

Pre-test Population
Number of candidates
How to compensate for a small number of candidates?
- Quantitative analysis: tentative
- Qualitative analysis with colleagues from the host nation
- Further attention to the first appearance of pre-tested items on the real test
- A large enough sample at the real test session

Pre-test Population
How to find appropriate candidates?
- Course population
- Volunteers (available personnel, teachers, testers...)
- Future candidates on our real test

Future candidates on our real test: We try to include as many as possible. It is in their interest as well as ours, and it benefits the people who pre-test in our country. Candidates practice for their future STANAG exam, we can use their results for comparisons, and the guest team gets better results and a better sample.

Pre-test Population
How to determine level of candidates?
Course population
- External measure: STANAG, if applicable
- ALCPT: 30-45 pre-intermediate; 45-60 intermediate; 60-70 upper-intermediate; 70+ advanced
- Teachers' feedback is important
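The ALCPT bands on this slide can be read as a simple lookup. The following is a minimal sketch, not the team's procedure: the function name `alcpt_band` is hypothetical, and because the slide's ranges share their boundary scores (45, 60, 70), placing a boundary score in the higher band is an assumption. The slide gives no label for scores below 30, so the sketch returns `None` for them.

```python
from typing import Optional


def alcpt_band(score: int) -> Optional[str]:
    """Map an ALCPT score to the rough band listed on the slide.

    Assumption: a score on a shared boundary (45, 60, 70) goes to the
    higher band. The slide does not say which band such scores belong to.
    """
    if score >= 70:
        return "advanced"
    if score >= 60:
        return "upper-intermediate"
    if score >= 45:
        return "intermediate"
    if score >= 30:
        return "pre-intermediate"
    return None  # below 30: no band is given on the slide


# Illustrative scores only, not real candidate data.
for s in (28, 38, 45, 66, 82):
    print(s, alcpt_band(s))
```

A band obtained this way is only a first estimate; as the slide notes, teachers' feedback should be weighed alongside it.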

Pre-test Population
How to determine level of candidates (cont.)?
- Volunteers (usually have a STANAG result; well acquainted with the procedure, but test-wise)
- Future candidates on our real test (very motivated; interested in the procedure and results; the pre-test serves as preparation for them)

Pre-test Population
What should the population look like?
- Balance in levels (50% Level 2, 50% Level 3)
- What to do with Level 2+
- What to do with Level 1+
- Pre-test population should resemble test population

IF STANAG 6001 LEVELS WERE BUCKETS
[Figure: buckets for Levels 3, 2 and 1. The blue arrows indicate the water (ability) observed at each level.]
Notes: The buckets may begin filling at the same time. Some Level 2 ability will develop before Level 1 is mastered. That is OK, because the buckets will still reach their full (mastery) state sequentially.
Slide taken from the presentation: Scoring Multi-level Tests, by Dr. Ray Clifford, Vilnius, September 2015 (BILC/LTS)

Interpretation of Pre-test Results
- Descriptive stats (Mean, Mode, Median, Range, Standard Deviation)
- Anchor Items Performance (Determine the strength of population)
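As an illustration of the descriptive statistics named on this slide, here is a minimal Python sketch using only the standard library. The scores and facility values are made up, and the helper names `describe` and `anchor_shift` are hypothetical. Comparing anchor-item facility values with values from earlier live sessions is one possible way to gauge population strength; it is assumed here and not stated in the presentation.

```python
import statistics


def describe(scores):
    """Return the descriptive statistics named on the slide for raw scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.multimode(scores),  # a list: scores may be multimodal
        "range": max(scores) - min(scores),
        "sd": statistics.stdev(scores),  # sample standard deviation
    }


def anchor_shift(pretest_fv, reference_fv):
    """Mean facility-value difference on anchor items (pre-test minus reference).

    Assumption: a negative value suggests the pre-test group is weaker than
    the reference population, a positive value that it is stronger.
    """
    diffs = [p - r for p, r in zip(pretest_fv, reference_fv)]
    return statistics.mean(diffs)


# Illustrative data only.
scores = [12, 15, 15, 18, 20, 22, 25, 27, 30, 31]
print(describe(scores))
print(round(anchor_shift([0.55, 0.60, 0.70], [0.60, 0.70, 0.80]), 3))
```

With a small pre-test sample these figures are tentative and are best read together with qualitative feedback.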

Interpretation of Pre-test Results
- Classical Item Analysis (Facility Value, Discrimination Index, Distractor Analysis)
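The three measures of classical item analysis can be sketched for a single multiple-choice item as follows. This is an illustration under stated assumptions, not the team's tool: the function name `item_analysis` is hypothetical, the data are made up, and the discrimination index is computed with the upper-group/lower-group method using the top and bottom thirds of candidates by total score. The slide does not say which method or group size the team uses.

```python
from collections import Counter


def item_analysis(responses, key, totals, group_fraction=1 / 3):
    """Facility value, discrimination index and option shares for one item.

    responses: option chosen by each candidate (e.g. "A".."D")
    key:       the correct option
    totals:    each candidate's total test score, in the same order
    """
    n = len(responses)
    correct = [r == key for r in responses]

    # Facility value: proportion of candidates answering correctly.
    fv = sum(correct) / n

    # Discrimination index: proportion correct in the upper group minus
    # proportion correct in the lower group (groups ranked by total score).
    order = sorted(range(n), key=lambda i: totals[i], reverse=True)
    k = max(1, round(n * group_fraction))
    upper, lower = order[:k], order[-k:]
    di = (sum(correct[i] for i in upper) - sum(correct[i] for i in lower)) / k

    # Distractor analysis: share of candidates choosing each option.
    counts = Counter(responses)
    shares = {opt: counts[opt] / n for opt in sorted(counts)}
    return fv, di, shares


# Illustrative data only: nine candidates, correct answer "D".
responses = ["D", "D", "D", "A", "D", "C", "D", "B", "C"]
totals = [34, 31, 30, 28, 25, 22, 20, 15, 12]
fv, di, shares = item_analysis(responses, "D", totals)
print(round(fv, 2), round(di, 2), shares)
```

A distractor that attracts almost nobody, or that attracts mainly high scorers, is a candidate for revision during qualitative review.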

Guidelines
- Facility values between .30 and .70 are generally considered acceptable
- Discrimination:
  .40 and up: very good items
  .30 - .39: reasonably good items, subject to improvement
  .20 - .29: marginal items, usually need improvement
  below .19: poor items, to be rejected or improved by substantial revision

We expose our Level 2 items to Level 3 candidates and vice versa, so some facility values above .70 must be accepted for Level 2 items. At the same time, a low facility value for a Level 3 item is no guarantee that the item is ideal, since it was also exposed to Level 2 candidates. Some candidates in the pre-test population are not even at Level 2. Pre-test results should therefore be interpreted carefully, in combination with qualitative analysis.

Slide taken from LTS presentation: An Introduction to Statistical Analysis (BILC/LTS)
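The guideline bands can be written as a small classifier. This is a hedged sketch: the function names `rate_discrimination` and `facility_in_range` are hypothetical, and since the slide lists ".20 - .29" and then "below .19", treating every value under .20 as "poor" is an assumption made to close that gap.

```python
def rate_discrimination(di: float) -> str:
    """Label a discrimination index using the bands on the slide."""
    if di >= 0.40:
        return "very good"
    if di >= 0.30:
        return "reasonably good"
    if di >= 0.20:
        return "marginal"
    return "poor"  # assumption: anything under .20 counts as poor


def facility_in_range(fv: float, low: float = 0.30, high: float = 0.70) -> bool:
    """True if the facility value falls inside the generally accepted range."""
    return low <= fv <= high


# Illustrative values only.
for fv, di in [(0.62, 0.42), (0.85, 0.32), (0.45, 0.25), (0.20, 0.10)]:
    print(fv, di, facility_in_range(fv), rate_discrimination(di))
```

In a bi-level pre-test, an out-of-range flag is a prompt for qualitative review, not an automatic rejection: a Level 2 item answered by Level 3 candidates can legitimately show a facility value above .70.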

Performance of an item on different populations: Listening item, Level 2
Item name: L2 0186
Correct answer: D
Populations: Pre-testing, BiH sessions, MNE sessions
Booklet number: 20, 18, 5, 11, 3, 38, 25, 31
A: 6%, 3%, 1%, 0%, 19%, 9%, 10%
B: 8%, 7%, 21%, 12%
C: 18%, 13%, 11%, 4%
D (correct answer): 87%, 73%, 85%, 76%, 88%, 89%, 53%, 62%, 67%
?: 2%
WRONG: 27%, 15%, 24%, 47%, 38%, 33%
FV / DI: 0,32 0,33 0,28 0,42 0,25 0,23 0,47 0,50 0,39

Performance of an item on different populations: Reading item, Level 2
Item name: R2 0128
Correct answer: A
Populations: Pre-testing, BiH sessions, MNE sessions
Booklet number: 18, 15, 10, 11, 8, 14, 34, 28, 31
A (correct answer): 75%, 76%, 82%, 72%, 78%, 86%, 59%, 57%, 50%
B: 8%, 10%, 4%, 14%, 7%, 16%, 18%
C: 3%, 11%
D: 13%, 5%
?: 0%
WRONG: 25%, 24%, 28%, 22%, 41%, 43%
FV / DI: 0,62 0,40 0,43 0,47 0,22 0,25 0,80 0,85 0,53

Test Assembly and New Test Versions
- A new test version for each session
- Parallel forms
- Number of new/old items
- Filter for test population

Item Banking
- Tracking the performance of an item across different sessions (different test versions)
- Item banking
- Updating the item bank
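One way to picture an item bank entry that keeps a history of performance is a small record per item. The sketch below is purely illustrative: the class `ItemRecord`, its fields, the item name "R2 0000" and all numbers are hypothetical and do not describe the team's actual item bank.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ItemRecord:
    """Hypothetical item-bank entry with a per-session performance history."""
    name: str
    key: str
    history: list = field(default_factory=list)  # (population, booklet, fv, di)

    def add_result(self, population: str, booklet: int, fv: float, di: float) -> None:
        """Append the statistics from one administration of the item."""
        self.history.append((population, booklet, fv, di))

    def mean_fv(self) -> float:
        """Average facility value over all recorded administrations."""
        return mean(fv for _, _, fv, _ in self.history)

    def fv_spread(self) -> float:
        """Gap between the highest and lowest recorded facility value."""
        values = [fv for _, _, fv, _ in self.history]
        return max(values) - min(values)


# Illustrative values only.
item = ItemRecord(name="R2 0000", key="A")
item.add_result("pre-test", 1, 0.75, 0.60)
item.add_result("live session", 2, 0.80, 0.45)
item.add_result("live session", 3, 0.55, 0.80)
print(round(item.mean_fv(), 2), round(item.fv_spread(), 2))
```

A large spread across populations is a signal to look at who took each version before deciding whether the item itself needs revision.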

Conclusions
- Pre-testing in bi-level form is not an ideal situation
- Practicality
- Selection of an adequate population is crucial
- Interpret results with caution
- Further attention on the real test
- Item bank with history of performance
- Combination of old and new items in creating new test versions

Thank you for your attention!
Questions/Discussion