Validating analytic rating scales for speaking at tertiary level Armin Berger IATEFL TEASIG 2011.



Overview
  Background
  Rating scale development
  The study
    – Research questions
    – Method
    – Analysis
  Expected results
  Conclusion

Testing speaking: some challenges
  Definition of the construct
    – What is speaking?
  Construct-irrelevant variance
    – What influences performance?
  Reliability
    – What do raters do?

ELTT scale: presentation
  Lexico-grammatical resources and fluency
  Pronunciation and vocal impact
  Structure and content
  Genre-specific presentation skills: formal presentation
  1 Descriptor

ELTT scale: presentation
  Lexico-grammatical resources and fluency
  Pronunciation and vocal impact
  Structure and content
  Genre-specific presentation skills: formal presentation
  C2 Descriptor
  C1
  below C1

ELTT scale: presentation
  Lexico-grammatical resources and fluency
  Pronunciation and vocal impact
  Structure and content
  Genre-specific presentation skills: formal presentation
  flexibility range control fluency segmentals suprasegmentals prosodic features overall structure coherence cohesion relevance visuals time-keeping take-home message rhetorical features audience rapport paralinguistic features

ELTT scale: interaction
  Lexico-grammatical resources and fluency
  Pronunciation and vocal impact
  Content and relevance
  Interaction
  flexibility range control fluency segmentals suprasegmentals prosodic features task awareness relevance contribution to discussion flexibility collaboration strategies

ELTT descriptor units
  Lexico-grammatical resources and fluency
  Pronunciation and vocal impact
  Structure and content
  Genre-specific presentation skills: formal presentation
  Lexico-grammatical resources and fluency
  Pronunciation and vocal impact
  Content and relevance
  Interaction
  ELTT, CEFR, adapted

Scale development
  Intuitive methods
    – Expert judgement
    – Committee
    – Experiential
  Empirical methods
    – Data-based
    – Empirically derived, binary-choice, boundary definition
    – Scaling descriptors
  (Fulcher 2003)

Scale development: ELTT

Scale validation
  Threats to validity
    – “... descriptions of expected outcomes, or impressionistic etchings of what proficiency might look like as one moves through hypothetical points or levels on a developmental continuum” [own emphasis] (Clark 1985)

Scale validation (McNamara 2008)

Scale validation
  Threats to validity
    – “... descriptions of expected outcomes, or impressionistic etchings of what proficiency might look like as one moves through hypothetical points or levels on a developmental continuum” [own emphasis] (Clark 1985)
    – scale use
  Validation prior to use
    – Milanovic et al. 1996; Taylor 2000

Research questions
  1. Do the descriptors of the ELTT speaking scales form implicational scales of language development?
    a. To what extent are raters consistent in sequencing the ELTT rating scale descriptors?
    b. Do the ELTT scale descriptors represent the stages of developing speaking proficiency in a consecutive order?
  2. Are users of the scales consistent in their scale interpretations?
  3. Can users of the scales clearly distinguish between the successive scale levels?

Research design
  Subjects: students of English; 15 language teachers at Austrian English departments
  Phase 1
    Instruments: task prompts, video performances, sorting task, rating sheet, rater questionnaire
    Procedures: sorting task, descriptor scaling, rater feedback
    Analyses: correlations, multifaceted Rasch, questionnaire analysis
  Phase 2
    Instruments: rating scale, rating sheet, rater manual, rater questionnaire
    Procedures: rating trial, verbal protocol, rater feedback
    Analyses: multifaceted Rasch, verbal protocol analysis, questionnaire analysis
  Triangulation
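The "correlations" listed for the sorting task are not specified further on the slide. One common way to quantify how consistently a rater sequences descriptors is a Spearman rank correlation between the rater's sorted levels and the intended levels. The sketch below assumes that reading; the level coding (1 = below C1, 2 = C1, 3 = C2) and the two raters' data are hypothetical, not taken from the study.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: intended level of six descriptors and two raters' sortings.
intended = [1, 1, 2, 2, 3, 3]
rater_a = [1, 1, 2, 2, 3, 3]   # reproduces the intended sequence
rater_b = [1, 2, 1, 2, 3, 3]   # swaps two descriptors across levels

print(spearman(intended, rater_a))
print(spearman(intended, rater_b))
```

A coefficient near 1 would suggest that a rater's sequencing agrees with the intended order; lower values flag raters or descriptors worth a closer look.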

Stages
  Stage 1: Development and piloting of instruments
  Stage 2: Mock exams

Data collection I

Stages
  Stage 1: Development and piloting of instruments
  Stage 2: Mock exams
  Stage 3: Raters’ data
  Stage 4: Data analysis

Analysis
  Rasch analysis
    – is grounded in probability theory
    – allows the calibration of items and persons on a linear scale
    – is used to determine the difficulty of individual test items
    – is based on a simple assumption

Analysis (McNamara 2008)

Analysis
  Multifaceted Rasch analysis
    – is grounded in probability theory
    – allows the calibration of items and persons on a linear scale
    – is used to determine the difficulty of individual test items
    – is based on a simple assumption
    – takes additional variables into account
    – is adapted for descriptor scaling to indicate the relative difficulty of descriptors
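To make the idea of a linear (logit) scale with additional facets concrete, here is a minimal sketch of a dichotomous many-facet Rasch formulation, in which the log-odds of success equal ability minus difficulty minus rater severity. This is a simplified illustration: the function name and all parameter values are hypothetical, and the study's actual FACETS analysis would also involve rating-scale thresholds that are omitted here.

```python
import math


def mfrm_probability(ability, difficulty, severity=0.0):
    """Probability of success under a simplified dichotomous many-facet Rasch model.

    All arguments are in logits. With severity = 0 this reduces to the
    basic Rasch model, where only ability and difficulty matter.
    """
    logit = ability - difficulty - severity
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical values: one candidate, one descriptor, two raters.
candidate_ability = 1.0       # logits
descriptor_difficulty = 0.5   # logits
lenient_rater = -0.5          # negative severity = more lenient
harsh_rater = 0.5             # positive severity = harsher

p_lenient = mfrm_probability(candidate_ability, descriptor_difficulty, lenient_rater)
p_harsh = mfrm_probability(candidate_ability, descriptor_difficulty, harsh_rater)
print(round(p_lenient, 3), round(p_harsh, 3))
```

When ability equals the combined difficulty and severity, the probability is exactly 0.5; this is the sense in which persons, items and raters are placed on one common scale.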

Illustrative output
  [Figure: relative difficulty of descriptors on the logit scale]

Expected results
  RQ1:
    – If raters are able to sequence the descriptor units consistently, this can be interpreted as validity evidence.
    – If multifaceted Rasch analysis generates a scale that reflects the intended order, this can be interpreted as validity evidence.
    – Since the ELTT rating scales have largely been modelled on the CEFR, it is expected that most ELTT descriptors will form a unidimensional scale of increasing speaking ability. However, it will be interesting to see how those descriptors unique to the ELTT scales perform psychometrically.
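One simple way to check whether an estimated scale "reflects the intended order" is to compare the mean logit difficulty of the descriptors at each intended level and confirm that the means rise from level to level. The sketch below illustrates that check only; the helper names, the level coding (1 = below C1, 2 = C1, 3 = C2) and the logit values are hypothetical and are not results from the study.

```python
from statistics import mean


def level_means(descriptors):
    """Mean estimated difficulty per intended level.

    descriptors: list of (intended_level, estimated_logit) pairs.
    Returns a dict ordered by ascending level.
    """
    by_level = {}
    for level, logit in descriptors:
        by_level.setdefault(level, []).append(logit)
    return {level: mean(values) for level, values in sorted(by_level.items())}


def order_is_preserved(descriptors):
    """True if mean difficulty strictly increases with intended level."""
    means = list(level_means(descriptors).values())
    return all(lower < higher for lower, higher in zip(means, means[1:]))


# Hypothetical descriptor calibrations: (intended level, estimated logit).
as_intended = [(1, -1.4), (1, -0.9), (2, 0.1), (2, 0.4), (3, 1.2), (3, 1.8)]
disordered = [(1, -1.4), (1, -0.9), (2, 1.5), (2, 1.9), (3, 1.2), (3, 1.0)]

print(order_is_preserved(as_intended))
print(order_is_preserved(disordered))
```

A check on level means is deliberately coarse: individual descriptors can still overlap across adjacent levels, so a per-descriptor inspection would be needed before revising the scale.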

Implications
  – The results will shed light on the developmental continuum of speaking ability underlying the ELTT scales.
  – The study will tease out the implications of the results for scale revision and rater training.
  – The results will allow conclusions about the specific methodology employed in the construction of the ELTT rating scales.
  – The results will indicate how readily the upper levels of the CEFR, C1 and C2, can be further divided into more subtle yet distinguishable levels.
  – Generally speaking, it is hoped that the study can make a contribution to a better understanding of the assessment of advanced second language speaking.

References
  Brindley, Geoff. "Describing language development? Rating scales and SLA." In:
  Clark, John. "Curriculum renewal in second language learning: An overview." Canadian modern language review 42,
  Fulcher, Glenn. Testing second language speaking. London: Pearson Longman.
  Kaftandjieva, Felianka and Sauli Takala. "Council of Europe scales of language proficiency: A validation study." In: Council of Europe. Common European framework of reference for languages: Learning, teaching, assessment: Case studies,
  Linacre, Mike. 2010a. FACETS: Rasch measurement computer program. Chicago: MESA Press.
  McNamara, Tim. Measuring second language performance. London: Longman.
  Milanovic, Michael et al. "Developing rating scales for CASE: Theoretical concerns and analyses." In: Cumming, Alister and Richard Berwick (eds.). Validation in language testing. Clevedon: Multilingual Matters,
  North, Brian. The development of a common framework scale of language proficiency. New York: Peter Lang.
  Tyndall, Belle and Dorry Kenyon. "Validation of a new holistic rating scale using Rasch multi-faceted analysis." In: Cumming, Alister and Richard Berwick (eds.). Validation in language testing. Clevedon: Multilingual Matters,

Thank you!
Armin Berger