Challenges of Piloting Test Items


Challenges of Piloting Test Items
Branka Petek, School of Foreign Languages, Slovenia

Content
- Challenges Slovenia had to face when piloting test items
- What we learned from experience

Why pilot test items?
- To get a clear picture of candidates' language skills, we need good test items.
- It is impossible to have good test items without pre-testing.

Challenges SFL had to face
- Appropriate population for piloting
- Administration of the items
- Test format
- Statistical analyses

Population for piloting
- Size
- Similarity to the Slovenian testing population
- Level of proficiency
- Test fatigue

Lessons learned
- SIZE: the population should be as big as possible, but anything is better than nothing.
- SIMILARITY: the population should be similar to the testing population.
- LEVEL OF PROFICIENCY: aim for a normal (or near-normal) distribution; otherwise the results will be unreliable.
- TEST FATIGUE: have the candidates piloted before? Are they tired of taking tests and piloting?
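A quick way to check the "near-normal distribution" point above is to look at the skewness of the pilot group's scores: a value far from zero flags a lopsided group. This is a minimal sketch using only the standard library; the score data and the informal "far from zero" threshold are illustrative assumptions, not part of the original presentation.

```python
# Rough check of whether pilot scores are roughly symmetric
# (a necessary, not sufficient, condition for near-normality).
import statistics

def sample_skewness(scores):
    """Fisher-Pearson skewness: near 0 suggests a symmetric distribution;
    strongly positive/negative values suggest a skewed pilot group."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return sum(((x - mean) / sd) ** 3 for x in scores) / n

# Hypothetical pilot scores for 10 candidates.
scores = [42, 55, 61, 63, 65, 67, 70, 72, 78, 88]
print(round(sample_skewness(scores), 2))
```

With a small pilot this is only a sanity check; it will not replace a proper look at the score histogram.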

Administration
- Administrators
- Time
- Courses
- Collecting data on test takers

Lessons learned
- ADMINISTRATORS: the most reliable results come when we administer the tests ourselves.
- TIME: depends on the course cycle.
- COURSES: courses designed to prepare students for STANAG tests normally give the most reliable results.
- QUESTIONNAIRES: if well designed, they help investigate the face validity of tests, the time allocated, the clarity of rubrics, the appropriacy of test methods, and the text topics.

Test format
- Length
- Number of items
- Task types
- Topics (cultural background, influence of the course)

Lessons learned
- LENGTH: similar to the live test version.
- NUMBER OF ITEMS: approximately the same number of items as the live test.
- TASK TYPES: different countries use different methods; candidates might not be familiar with the task types we use.
- FAMILIARITY WITH THE TOPICS: e.g. military topics (cultural background).

Statistical analyses
- CTT (classical test theory)
- IRT (item response theory)
- 'Manual check'
- The influence of a particular population

Lessons learned
- With a small population, CTT is the only option.
- With fewer than 30 candidates, manually checking for odd answers and strange response behaviour can help eliminate some problems and improve the items.
- With a small population the data is less reliable; there is always an element of risk.
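The CTT statistics mentioned above are simple enough to compute by hand for a small pilot. The sketch below shows the two most common ones, item facility (proportion correct) and an upper-lower discrimination index; the response matrix and function names are illustrative, not from the presentation.

```python
# Minimal CTT item analysis for a small pilot.
# responses: rows = candidates, columns = items; 1 = correct, 0 = incorrect.

def item_facility(responses, item):
    """Proportion of candidates answering the item correctly (p-value)."""
    return sum(row[item] for row in responses) / len(responses)

def item_discrimination(responses, item):
    """Upper-lower index: item p-value in the top third of candidates
    (by total score) minus its p-value in the bottom third."""
    ranked = sorted(responses, key=sum, reverse=True)
    third = max(1, len(ranked) // 3)
    upper = sum(row[item] for row in ranked[:third]) / third
    lower = sum(row[item] for row in ranked[-third:]) / third
    return upper - lower

# Hypothetical pilot data: 6 candidates, 3 items.
pilot = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

for i in range(3):
    print(i, round(item_facility(pilot, i), 2),
          round(item_discrimination(pilot, i), 2))
```

With very small groups these indices are unstable, which is exactly why the manual check of odd answers remains necessary alongside the numbers.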

Perfect vs. real-world piloting
In a perfect world, a piloting session would mean at least 300 test takers, IRT analysis, revision of the test items, re-piloting, IRT again, a final version of the test, and experts to determine the cut-off scores. In the real world, piloting is difficult to plan and carry out, yet it is an absolutely essential part of the testing cycle. Piloting internationally can produce more reliable results, but it also presents many pitfalls we have to be aware of; being aware of possible problems helps us plan. The more we invest (in time, effort, and money), the more we get.

Thank you. Questions or suggestions?