BiH Test Piloting Mary Jo DI BIASE.

In the beginning…

Workshops on: item development; validation; practice in oral interviews; rating

Bosnia’s items ready; Italy’s guinea pigs ready

Summary: Test population background; Trialling phases; Statistical procedures; Results; … and more

Test population background: Staff officer’s career; Course specifications; JFLT correlation (concurrent validity)

Staff officer’s career: ISSMI (selected Army officers); one-year duration, including a language course with final SLP 3333

Course specifications: three-month course; 34 hours a week; additional activities; periodic diagnostic tests

JFLT correlation: piloting held two weeks before the JFLT; both are proficiency tests; both are based on STANAG 6001 ed. 2; both have similar test types

Listening & Reading: items in booklet form (60 level-two and level-three items in total); instructions and proctoring; questionnaires

Speaking: 15-minute interview between interviewer and candidate (observer in background with rating scale)

Writing: writing scripts for inter-rater reliability; trialling prompts to check for level appropriateness, etc.

Score Distribution. Cluster: mean, mode, median. Dispersion: standard deviation, range.
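
The cluster and dispersion measures listed on this slide can be sketched in a few lines of Python. This is only an illustration using the standard library: the function name `describe_scores` and the example scores are hypothetical and are not data from the pilot, and the sample (n - 1) standard deviation is an assumption, since the slides do not say which formula was used.

```python
import statistics

def describe_scores(scores):
    """Return central-tendency and dispersion measures for a list of scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        # multimode returns a list, because a score set can have several modes
        "mode": statistics.multimode(scores),
        # sample standard deviation (n - 1 denominator); an assumption here
        "std_dev": statistics.stdev(scores),
        "range": max(scores) - min(scores),
    }

# Hypothetical percentage scores, not data from the pilot.
example_scores = [45, 55, 60, 60, 65, 70, 80]
summary = describe_scores(example_scores)
print(summary)
```

Reporting the mode as a list avoids silently picking one value when two scores are equally frequent.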

Item Behavior: classical item analysis (Facility Value & Discrimination Index); distracter analysis
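
As a rough sketch of the two classical indices named here: Facility Value is the proportion of candidates answering an item correctly, and one common Discrimination Index compares the top and bottom scoring groups. The slides do not say which DI formula was used, so the upper-lower method with thirds below is an assumption, and all names and data are hypothetical.

```python
def facility_value(item_scores):
    """Proportion of candidates who got the item right (1 = correct, 0 = wrong)."""
    return sum(item_scores) / len(item_scores)

def discrimination_index(item_scores, total_scores, fraction=1 / 3):
    """Upper-lower DI: FV in the top-scoring group minus FV in the bottom group.

    The group size (a third of the candidates) is an assumption for illustration.
    """
    n = max(1, round(len(total_scores) * fraction))
    # Candidate indices ordered from lowest to highest total test score.
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    low = [item_scores[i] for i in order[:n]]
    high = [item_scores[i] for i in order[-n:]]
    return sum(high) / n - sum(low) / n

# Hypothetical data: one item's scores and the same six candidates' test totals.
item = [1, 1, 1, 0, 1, 0]
totals = [50, 48, 40, 30, 25, 10]
print(facility_value(item), discrimination_index(item, totals))
```

A DI near zero means strong and weak candidates answer the item about equally well, so the item tells little about proficiency.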

Reliability: scale reliability if item is deleted; inter-rater reliability coefficient (Speaking and Writing)
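
The "if item is deleted" check can be illustrated as follows. The slides do not name the reliability coefficient, so Cronbach's alpha is assumed here (for right/wrong items it coincides with KR-20); the function names and the response matrix are hypothetical.

```python
import statistics

def cronbach_alpha(matrix):
    """Cronbach's alpha; rows are candidates, columns are item scores."""
    k = len(matrix[0])
    item_vars = [statistics.pvariance(col) for col in zip(*matrix)]
    total_var = statistics.pvariance([sum(row) for row in matrix])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def alpha_if_item_deleted(matrix):
    """Alpha recomputed with each item left out in turn."""
    k = len(matrix[0])
    return [
        cronbach_alpha([row[:j] + row[j + 1:] for row in matrix])
        for j in range(k)
    ]

# Hypothetical right/wrong responses: 5 candidates x 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
overall = cronbach_alpha(responses)
deleted = alpha_if_item_deleted(responses)
print(overall, deleted)
```

An item whose removal raises alpha above the overall value (the last item in this made-up matrix) is a candidate for revision or discarding.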

Interpretation of Results. Items in relation to: Facility Value and Discrimination Index; distracter behavior (ambiguity, implausible options, etc.); overall mean of test; reliability of test if item is deleted

Example: item from an economic report (discarded: answerable from common knowledge)
According to the report…
approvals of bank loans have gone up recently.
the UK economy is severely going downhill.
recession started earlier than it was thought.
positive figures are predicted for the next year.

Classical Item Analysis: Key B 94%; A 0%; C 5%; D 1%; FV 94%; DI 0,05
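
A distracter analysis of this kind amounts to tallying how often each option was chosen. The sketch below reproduces the percentages on the slide under the assumption of exactly 100 candidates, which is hypothetical (the slides do not give the number of test takers); `option_percentages` is an illustrative name.

```python
from collections import Counter

def option_percentages(responses, options="ABCD"):
    """Percentage of candidates choosing each option of a multiple-choice item."""
    counts = Counter(responses)
    total = len(responses)
    return {opt: 100 * counts[opt] / total for opt in options}

# Hypothetical response list built to match the slide's percentages (key = B).
responses = ["B"] * 94 + ["C"] * 5 + ["D"] * 1
percentages = option_percentages(responses)

# Distracters that nobody chose are not doing any work in the item.
unused = [opt for opt, pct in percentages.items() if pct == 0]
print(percentages, unused)
```

With 94% choosing the key and one distracter never chosen, the item is too easy and barely discriminates, which fits the decision to discard it.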

Statistic            Listening   Reading
Mean                 60,87%      77,05%
Median               60,40%      80,00%
Mode                 67,79%      75,00%
Range                62          60
Standard Deviation   13,35%      13,33%
Skewness             -0,486      -1,150
Kurtosis             0,043       1,477
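
Skewness and kurtosis describe the shape of the score distribution. A negative skewness, as reported for both skills, indicates a longer tail toward low scores. The sketch below uses simple moment-based formulas and reports excess kurtosis (0 for a normal distribution); statistical packages often apply small-sample corrections, so this is an assumption and would not necessarily reproduce the figures on the slide. The data are hypothetical.

```python
def skewness_and_kurtosis(scores):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n  # second central moment
    m3 = sum((s - mean) ** 3 for s in scores) / n  # third central moment
    m4 = sum((s - mean) ** 4 for s in scores) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness, excess_kurtosis

# Hypothetical scores: a symmetric set and a set with a tail toward low scores.
symmetric = [1, 2, 3, 4, 5]
low_tail = [1, 4, 5, 5, 5]
print(skewness_and_kurtosis(symmetric))
print(skewness_and_kurtosis(low_tail))
```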

[Figure panels: Listening, Reading]

Reliability

Speaking: inter-rater reliability between interviewer (holistic rating), observer (analytic rating), trainer in background; language generated by prompts (wording, level alignment); tester conduct and elicitation techniques during interview

Writing: Inter-rater Reliability. Correlation Coefficient: 0,617; 0,517; 0,458; 0,250
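
An inter-rater correlation of this kind can be computed from two raters' scores for the same scripts. The slides do not say which coefficient was used, so Pearson's r is an assumption here, and the ratings are made-up numbers for illustration only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical level ratings given by two raters to the same five scripts.
rater_1 = [2, 3, 3, 2, 1]
rater_2 = [2, 3, 2, 2, 1]
print(round(pearson_r(rater_1, rater_2), 3))
```

Values closer to 1 mean the raters rank the scripts more consistently. Coefficients as low as those on the slide suggest the raters would benefit from further norming.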

WRAPPING UP. Overall beneficial: positive statistical results (30% of items discarded); additional ‘live’ training; more piloting needed

First Official Administration of BiH test:

Photographs: Say…

The BiH Testing Team in Rome

Thank you!!! maryjo.dibiase@unipg.it maryjo.dibiase@gmail.com