Published by Kate Berley. Modified over 10 years ago.
1 The Swiss ‘IEF’ Project - Assessment Instruments Supporting the ELP
by Peter Lenz, University of Fribourg/CH
Voss/N, 3/06/05
2 2001 - EYL: Launch of ELP 15+ in Switzerland
In 2001 the Swiss Conference of Cantonal Ministers of Education recommended that the cantons:
- consider the CEFR
  - in curricula (objectives and levels)
  - in the recognition of diplomas
- facilitate wide use of the ELP 15+
  - make the ELP accessible to learners
  - help teachers to integrate the ELP in their teaching
- develop ELPs for younger learners
3 Integrating the ELP – School Teachers’ Wishlist
- More descriptors tailored to young learners’ needs
- Less abstract formulations
- Self-assessment grid and checklists with finer levels
- Tools facilitating “hard” assessment:
  - Test tasks relating to descriptors
  - Marked and assessed learner texts
  - Assessed spoken learner performances on video
  - Assessment criteria relating to finer levels for Speaking and Writing
4 Meeting the Needs for ELP 11-15
IEF Project (2002-2005)
“Instruments for the Assessment of Foreign-Language Competences - English, French”
- German-speaking cantons of Switzerland
- Principality of Liechtenstein (FL)
- Bildungsplanung Zentralschweiz
- Peter Lenz & Thomas Studer (UniFR)
5 IEF: Overview of Expected Products
- Bank of target-group-specific descriptors (levels A1.1-B2.1)
- (Self-)assessment checklists
- ELP
- Bank of validated test tasks (5 “skills”; C-tests)
- Tests available from publisher
- Tests for evaluations
- Assessment criteria (Speaking, Writing)
- Benchmark performances (Speaking, Writing)
- Training packages for teacher training
6 Phase I A New Bank of Can-do Statements
How did the new descriptors take shape?
1) Collecting from written sources (ELPs, textbooks, other sources)
2) Validating, amending the collection in workshops
   - Teachers decide on relevance for target learners and on suitability for assessment
   - Teachers complement the collection
3) Fine-tuning and selecting descriptors
   - Make formulations unambiguous and accessible; add examples
   - Select descriptors to cover the whole range of levels A1.1 - B2.1
   - Represent a wide range of skills and tasks
~330 descriptors for the empirical phase
7 Phase I Calibrating Additional Descriptors
Assessment questionnaires – Teachers assess their pupils
Following Schneider & North’s methodology for the CEFR
8 Linked and anchored assessment questionnaires of 50 descriptors each for different levels
- 2 parallel sets of descriptors of similar difficulty per assumed level
- Identical descriptors as links (& sometimes CEFR anchors)
- Too few learners at B2
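The linking idea on this slide can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the descriptors are already sorted by assumed difficulty and that adjacent questionnaires share a fixed number of identical descriptors. The function name, the overlap of 10 and the descriptor IDs are illustrative assumptions, not details of the IEF project, and the two parallel sets per level are not modelled.

```python
def build_linked_questionnaires(descriptors, size=50, overlap=10):
    """Split an ordered descriptor list into overlapping questionnaires.

    `size` follows the slide (50 descriptors each); `overlap` is a
    hypothetical number of link descriptors shared by neighbours.
    """
    if not 0 < overlap < size:
        raise ValueError("overlap must be between 1 and size - 1")
    step = size - overlap
    forms = []
    start = 0
    while True:
        forms.append(descriptors[start:start + size])
        if start + size >= len(descriptors):
            break  # the last questionnaire reaches the end of the list
        start += step
    return forms


# Hypothetical descriptor IDs, ordered from easiest to hardest.
bank = [f"D{i:03d}" for i in range(130)]
forms = build_linked_questionnaires(bank)
links = set(forms[0]) & set(forms[1])  # descriptors common to forms 1 and 2
```

Because neighbouring questionnaires share descriptors, ratings collected with different questionnaires can later be placed on one common scale.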
9 Phase I Calibrating Additional Descriptors
Statistical analysis and scale-building (A1.1 - B1.2)
10 Phase II Adapting Descriptors for Self-assessment
- Bank of target-group-specific descriptors (levels A1.1-B2.1)
- (Self-)assessment checklists
- ELP
11 Phase II Reformulations – Can … I can...
1. Some Can-dos are transformed into I-cans
2. Learners are asked for feedback: learners assess themselves and give feedback on that
3. Whole bank of Can-dos is transformed into I-can statements
12 Phase II Checklists for the New Swiss ELP
Drawing on 3 sources
- Bank of target-group-specific descriptors (levels A1.1-B2.1)
- (Self-)assessment checklists
13 ELP II: Self-assessment in Relation to Finer Levels
14 Phase III Developing Test Tasks and Instruments
- Bank of target-group-specific descriptors (levels A1.1-B2.1)
- (Self-)assessment checklists
- ELP
- Bank of validated test tasks
15 Phase III Test Tasks and Instruments
1) Test tasks relating to communicative language ability
- Speaking tasks (production and interaction)
- Writing tasks
- Listening tasks
- Reading tasks
Most test tasks are related to one descriptor, sometimes two – but how does descriptor difficulty relate to task difficulty?
2) C-Tests (integrative tests)
C-Tests (a type of cloze test) are said to provide reliable information on a learner’s linguistic resources, esp. for (written) Production.
The test tasks are field-tested and attributed to a level, at least tentatively.
Validation: tests + teacher questionnaires
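For readers unfamiliar with C-Tests, the following Python sketch shows the classical “rule of two”: the first sentence is left intact, and from the second sentence on, the second half of every second word is blanked. This is only an illustration of the general test format; the exact construction rules used in the IEF project are not stated on the slide, and the handling of odd-length and one-letter words here is one common convention, taken as an assumption.

```python
import re


def make_c_test(text: str) -> str:
    """Blank the second half of every second word after the first sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    if len(sentences) < 2:
        return text.strip()  # nothing to mutilate without a lead-in sentence
    out = [sentences[0]]  # the first sentence stays intact as context
    count = 0
    for sentence in sentences[1:]:
        tokens = []
        for tok in sentence.split(" "):
            m = re.fullmatch(r"([A-Za-z]+)(\W*)", tok)
            # Only count plain alphabetic words longer than one letter.
            if m and len(m.group(1)) > 1:
                count += 1
                if count % 2 == 0:
                    word, tail = m.groups()
                    keep = len(word) // 2  # odd lengths: larger half is blanked
                    tok = word[:keep] + "_" * (len(word) - keep) + tail
            tokens.append(tok)
        out.append(" ".join(tokens))
    return " ".join(out)


print(make_c_test("The cat sat. The dog barked loudly at night."))
# The cat sat. The d__ barked lou___ at ni___.
```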
16 Phase IV Assessment Criteria for Performances
- Bank of target-group-specific descriptors (levels A1.1-B2.1)
- (Self-)assessment checklists
- ELP
- Bank of validated test tasks (mainly performance-oriented)
- Assessment criteria for Speaking and Writing
17 Phase IV Developing Criteria for Speaking
How did the criteria take shape? – Steps taken:
1) Collecting criteria
   - Collect criteria from various sources: CEFR, examination schemes...
2) Assessing spoken performances in workshops
   - Teachers bring video recordings
   - Teachers describe differences between learner performances they can watch on video (yielding more criteria)
   - Teachers adopt and apply descriptors from the existing collection
   - Teachers agree on essential categories (e.g. Vocab range, Pronunciation/Int.) and build a scale for each analytical category
3) Preparing empirical validation
   - Decide on categories to be retained
   - Revise and complete proposed scales of analytical criteria
18 Phase IV Producing Video Tapes With Spoken Performances
One learner - different tasks in various settings
19 Phase IV Empirical Validation of Speaking Criteria
Methodology
A total of 35 teachers (14 Fr, 21 En) apply
- 58 analytical criteria (some from the CEFR) belonging to 5 categories
- 28 task-based descriptors (matching performed tasks)
to 10 or 11 video-taped learners per language, each performing 3-4 spoken tasks
Criteria categories
- Interaction
- Vocabulary range
- Grammar
- Fluency
- Pronunciation & Intonation
20 Phase IV Calibrating Criteria for Speaking
Criteria and questionnaires - a linked and anchored design
- 3 assessment questionnaires for three different learner levels
- CEFR anchors
- Links between questionnaires
For reasons of practicality: only a 3-step rating scale for Can-do descriptors/criteria!
- “Statement applies to this pupil but s/he can do clearly better”
- “Statement generally applies to this pupil”
- “Statement doesn’t apply to this pupil”
21 Phase IV Criteria for Speaking – Analysis (1)
The 5 analytical categories retained – Correlations and Fit
Correlation matrix (lower triangle; values per row as they appear on the slide):
Interaction: 1.00
Vocab: 0.99 1.00
Grammar: 1.00 0.99 1.00
Fluency: 0.98 0.99 1.00
Pronunciation/Intonation: 0.92 0.93 1.00
Overall: 1.00 0.93 1.00
Disattenuated correlations between pupil measures suggest proximity of categories/competences – except Pronunciation/Intonation
FACETS indicates slight misfit for Fluency; overfit for Interaction (.88)
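As a reminder of what “disattenuated” means, the sketch below applies the standard correction for attenuation: an observed correlation is divided by the square root of the product of the two reliabilities. The numbers in the example are hypothetical; the slide reports neither the observed correlations nor the reliabilities, and the way the values were actually obtained is not shown.

```python
import math


def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Correct an observed correlation for measurement error.

    r_xy  : observed correlation between two sets of measures
    rel_x : reliability of the first set (0 < rel_x <= 1)
    rel_y : reliability of the second set (0 < rel_y <= 1)
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical values, not taken from the IEF data.
corrected = disattenuate(0.72, 0.80, 0.90)
```

Values close to 1.00 after this correction indicate that two categories rank the pupils almost identically once measurement error is accounted for.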
22 Phase IV Criteria for Speaking – Analysis (2)
Criteria applied to French and English – Diagnosing DIF
23 Phase IV Criteria for Speaking – Analysis (3)
Teacher severity and consistency
- Consistency: 5 out of 35 raters were removed from the analysis due to misfit (infit mean square of up to 2.39)
- Severity: Some extreme raters (severe or lenient) show a strong need for rater training, although every criterion makes a meaningful (though somewhat abstract) statement on mostly observable aspects of competence.
Map for English
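The infit mean square used to judge rater consistency is an information-weighted fit statistic: the sum of squared residuals divided by the sum of the model variances, with an expected value of 1. The following Python sketch shows only this definition. It assumes that the model-expected ratings and their variances are already available (in practice they come from the Rasch software), and the example numbers are hypothetical.

```python
def infit_mean_square(observed, expected, variance):
    """Information-weighted mean square: sum of squared residuals / sum of variances."""
    if not (len(observed) == len(expected) == len(variance)):
        raise ValueError("all inputs must have the same length")
    total_variance = sum(variance)
    if total_variance <= 0:
        raise ValueError("total model variance must be positive")
    squared_residuals = sum((x - e) ** 2 for x, e in zip(observed, expected))
    return squared_residuals / total_variance


# Hypothetical ratings of one rater: observed scores, model expectations
# and model variances for four judgments.
erratic_rater = infit_mean_square([2, 0, 2, 0], [1, 1, 1, 1], [0.5, 0.5, 0.5, 0.5])
```

Values well above 1 signal ratings that are less predictable than the model expects; values well below 1 signal ratings that are more predictable than expected (overfit).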
24 Phase IV Criteria for Speaking – Anchoring (1)
Potential anchors towards the CEFR in the data:
- 11 analytical criteria from the CEFR linked design (mostly Interaction and Vocabulary) (used here)
- A total of 28 task-oriented descriptors from the IEF bank in a linked design
- Known scores of 3 learners of English rated in a teacher workshop on a CEFR basis
25 Phase IV Criteria for Speaking – Anchoring (2)
CEFR difficulties (x-axis) and IEF difficulties (y-axis) of criteria are plotted (blue diamonds) using a scaling factor for equating the separate calibrations.
Lines visualize the usual 95% confidence interval that helps detect items that are not suitable as anchors (outliers).
Outlier – perceived as more difficult in IEF (by over 3 logits): “Can link groups of words with simple connectors like ‘and’, ‘but’ and ‘because’.”
Outlier removed
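The anchor check described on this slide can be sketched in Python as follows. The slide does not name the equating method, so this sketch assumes a simple mean-sigma linking (scale and shift chosen so that the anchor items have the same mean and standard deviation on both scales) and flags items whose difference exceeds 1.96 joint standard errors. The function name, the data and the standard errors are hypothetical.

```python
import math
from statistics import mean, pstdev


def flag_anchor_outliers(cefr, ief, se_cefr, se_ief, z=1.96):
    """Return one boolean per anchor item: True if it lies outside the 95% band."""
    sd_ief = pstdev(ief)
    if sd_ief == 0:
        raise ValueError("IEF difficulties must not all be identical")
    scale = pstdev(cefr) / sd_ief            # assumed mean-sigma scaling factor
    shift = mean(cefr) - scale * mean(ief)
    flags = []
    for x, y, sx, sy in zip(cefr, ief, se_cefr, se_ief):
        y_on_cefr = scale * y + shift         # IEF difficulty on the CEFR scale
        joint_se = math.sqrt(sx ** 2 + (scale * sy) ** 2)
        flags.append(abs(y_on_cefr - x) > z * joint_se)
    return flags


# Hypothetical difficulties in logits; the fourth item is much harder in IEF.
cefr = [-3, -2, -1, 0, 1, 2, 3]
ief = [-3, -2, -1, 3.5, 1, 2, 3]
flags = flag_anchor_outliers(cefr, ief, [0.4] * 7, [0.4] * 7)
```

In practice a flagged item would be dropped and the linking recomputed on the remaining anchors, since the outlier itself distorts the scale and shift.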
26 Phase IV Criteria for Speaking – Taking Stock
- The calibrations of the video-taped learners are very plausible. IEF now has video-taped examples of learners from below A1 (French only) to a very high B2.
- The additional, newly-developed criteria are well spread across the targeted level range A1-B2 (A1.1?).
But: What will the assessment instruments to be used in schools look like?
27 Phase IV Assessment instruments for Speaking
Problem: middle category “Statement generally applies to this pupil” – desirable (because of its meaningfulness) but possibly too broad
Range here: -2.57 to +2.57 logits
Solutions?
- Other formulations for narrower categories?
- Use e.g. a B1.2 descriptor to establish A2.1 of a learner???
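To make the breadth of the middle category concrete, the sketch below computes category probabilities under a Rasch rating scale model with three categories. It assumes, and this is an interpretation rather than something stated on the slide, that the quoted range of -2.57 to +2.57 logits corresponds to the two category thresholds relative to a criterion's difficulty. The function and parameter names are illustrative.

```python
import math


def category_probabilities(theta, delta, thresholds):
    """Rating scale model: probability of each category 0..m.

    theta      : pupil ability in logits
    delta      : criterion difficulty in logits
    thresholds : m threshold values (logits) relative to delta
    """
    cumulative = 0.0
    logits = [0.0]  # category 0 is the reference category
    for tau in thresholds:
        cumulative += theta - delta - tau
        logits.append(cumulative)
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Assumed thresholds taken from the range quoted on the slide.
probs = category_probabilities(theta=0.0, delta=0.0, thresholds=[-2.57, 2.57])
```

Under these assumptions the middle category remains the most probable rating for any pupil whose ability lies within about 2.57 logits of the criterion's difficulty, which is why a single criterion discriminates only coarsely between neighbouring levels.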
28 Phase IV Assessment instruments for Speaking
Narrower categories - Can the middle category be divided up into three?
0 …
1 Pupil has this ability only partially.
2 Pupil generally has this ability.
3 Pupil fully has this ability.
4 …
Range of middle category: -1.2 to +0.8 logits
Main problem: Raters have the impression that they are applying modifiers upon modifiers, new restrictions upon restrictions already present in the criteria.