1 The BEST Plus: The New Adaptive Version of the Basic English Skills Test Oral Interview
Dorry M. Kenyon
Funded by OVAE Contract: ED-00-CO-0130
2 Overview
1. Why the BEST Plus?
2. What does the BEST Plus look like?
3. What is its research base?
4. How can the BEST Plus be used?
4 The original BEST Oral Interview
– Developed in the early 1980s
– Assessed basic functional oral English language skills for adult immigrants and refugees
– Designed for program use
– Began to be widely used for accountability purposes
1. Where is he?
2. In ___, where did you buy your food?
3. Is shopping in ___ and ___ the same? How is it different/the same?
6 The BEST Plus
– A performance-based assessment (individually administered face-to-face oral interview)
– Assesses functional oral language skills (interpersonal communication) of adult ESL learners using everyday language
– Designed with current assessment needs in mind
7 Goals in developing the BEST Plus
– Respond to adult ESL program needs for assessment and accountability
  – Produce a test that is short and practical
  – Assess learner language for a variety of purposes and stakeholders
  – Increase accuracy in measuring oral proficiency
  – Provide “multiple forms” for pre- and post-testing
9 BEST Plus components (computer-based version)
– Test items appear on the computer screen (instead of in a test booklet)
– If an item requires a visual, examinees view the visual on the computer screen (instead of in a picture cue booklet)
– Test administrators enter scores directly into the computer (instead of on a score sheet)
11 3. What does the computer-assisted BEST Plus look like?
13 Sample computer screen
14 BEST Plus components (print-based version)
– Three forms
– Within each form, locator test + three level tests
  – SPL 1-4
  – SPL 4-6
  – SPL 6-10
– Materials
  – Picture booklet
  – Test booklet (scripts and score sheet combined)
16 Scoring on 3 components of proficiency
– Listening Comprehension = How well did the examinee understand the setup and question?
– Language Complexity = How did the examinee organize and elaborate the response?
– Communication = How clearly did the examinee communicate meaning?
17 Ability estimation
– After each question, the program estimates the examinee’s ability based on the scores awarded on the current and all previous questions.
– With each new estimate, the accuracy of the measurement increases.
– Goal: To ‘level off’ in estimation with an acceptable level of accuracy.
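The estimation idea on this slide can be illustrated with a minimal sketch. It assumes a dichotomous Rasch (one-parameter logistic) model with a normal prior, which is a stand-in: the slides do not say which measurement model the BEST Plus software uses. The item difficulties, scores, and function names below are hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a successful response under a Rasch model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def estimate_ability(responses, prior_mean=0.0, prior_sd=2.0, iterations=25):
    """Return (ability estimate, standard error) from all responses so far.

    responses: list of (item_difficulty, score) pairs with score 0 or 1.
    A normal prior keeps the estimate finite when every score is identical.
    """
    theta = prior_mean
    for _ in range(iterations):
        # Newton-Raphson step on the log-posterior of the ability.
        gradient = -(theta - prior_mean) / prior_sd ** 2
        information = 1.0 / prior_sd ** 2
        for difficulty, score in responses:
            p = rasch_probability(theta, difficulty)
            gradient += score - p
            information += p * (1.0 - p)
        theta += gradient / information
    # Recompute the information at the final estimate for the standard error.
    information = 1.0 / prior_sd ** 2 + sum(
        rasch_probability(theta, d) * (1.0 - rasch_probability(theta, d))
        for d, _ in responses
    )
    return theta, 1.0 / math.sqrt(information)


# Hypothetical (difficulty, score) pairs, in the order the questions were asked.
responses = [(-1.0, 1), (0.0, 1), (1.0, 0), (0.5, 1), (0.8, 0), (0.6, 1)]

# Re-estimate after every question, using the current and all previous scores.
history = [estimate_ability(responses[:n]) for n in range(1, len(responses) + 1)]
for n, (theta, se) in enumerate(history, start=1):
    print(f"after question {n}: ability={theta:+.2f}, standard error={se:.2f}")
```

In this sketch the standard error is smaller after the last question than after the first, which is the sense in which the estimate "levels off" as more scored responses accumulate.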
18 Path through the computer-adaptive BEST Plus
– Following a fixed “warm-up,” examinees are asked questions drawn from several thematically based “folders.”
– After hearing each response, the test administrator enters a score for each component.
– After each set of scores is entered, the computer updates its estimate of the examinee’s ability and chooses folders and questions as appropriate.
– The test ends when one of three conditions is met.
– Users can instantly receive a full score report.
19 Path through the print-based BEST Plus
– Administer and score Locator questions (the fixed “warm-up” items + 2 high-end discriminators)
– Total score on Locator and choose level test based on chart
– Administer level test
– Total raw score and find approximate SPL range
– Enter raw scores into the computer-based BEST Plus Score Management software to obtain a full score report
21 Rigorous development procedures
– Feasibility study ( )
– Initial development ( )
– Pilot, small scale field test, initial reliability study (2001)
– Revisions ( )
– Pilot, full scale field test, reliability study, standard setting study (2002)
– Finalization of training materials, ancillary materials, further refinements (2003)
22 Full involvement of stakeholders
– OVAE oversight
– Technical Working Group (TWG), composed of researchers, state directors, and local program directors and practitioners
– Item writers, all experienced adult ESL teaching professionals
– Instructors and students in the field
23 Example: Full scale field test participants
– 9 states (DC, DE, FL, IL, MA, MD, OR, PA, VA)
– 23 programs
– 41 administrators
– 2420 examinees
24 Example: Reliability study
– adult ESL students
– Two testing rooms (A, B)
– Administrator (project staff)
– Observer/Co-Scorer (project staff)
– Observer/Co-Scorer (novice scorer)
– Each student was tested, then immediately retested in the second room
25 Average interrater agreement
Within administration (same room)
Total Score: Room A (3 raters), Room B (3 raters)
26 Test/re-test reliability Between Rooms Final Ability Estimate
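A test/re-test comparison of this kind is commonly summarized with a correlation between the two sets of ability estimates. The sketch below assumes a Pearson correlation, which the slide does not state, and the Room A and Room B values are made-up illustrations, not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(variance_x * variance_y)


# Hypothetical final ability estimates for the same six students,
# first tested in Room A and then immediately retested in Room B.
room_a = [412, 455, 501, 538, 476, 590]
room_b = [420, 449, 510, 531, 470, 602]

r = pearson_r(room_a, room_b)
print(f"test/re-test correlation: {r:.3f}")
```

A value near 1 would indicate that students are ranked almost identically by the two administrations.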
27 Example: Some initial validity evidence
– Analyses of ancillary data collected from program records during the field test, including test scores less than six months old
– Standard setting study
28 Correlations with program placement
Range of Correlation    Number of Programs    Percentage
.80 or above            7                     30.4%
.70 to                                        %
.60 to                                        %
.50 to                                        %
Below                                         %
TOTALS                  23                    100%
29 Summary: Program placement correlations
– 69.5% of the correlations were .70 or higher
30 Example: Standard setting study
– 11 judges
– 30 student performances
– Performances (about 6 min each) arranged from lowest to highest
– Judgment made: “Which SPL is best characterized by this performance?”
– Judges were able to complete this task, relating the SPL descriptors to the observed performances
32 The BEST Plus Score Report
Information includes:
– BEST Plus Scale Score
– SPL level
– NRS level
– Diagnostic information
33 Uses of the BEST Plus
– Accountability
  – National Reporting System (NRS), as scores on the BEST Plus relate to the 6 NRS levels for Speaking and Listening
  – Program Evaluation
34 Standard setting outcome (SPLs)
SPL    Scale Score Range
0      Below
       Above 795
35 Standard setting outcome (NRS)
NRS Level                 Related SPL    BEST Plus Scale Scores
Beginning ESL Literacy    0-1            Below 401
Beginning ESL
Low Intermediate ESL
High Intermediate ESL
Low Advanced ESL
High Advanced ESL         7 or more      Above 540
36 Uses of the BEST Plus
– Within Programs
  – Placement
  – Progress
  – Diagnosis
  – Screening
37 Diagnostic score report information
38 Example (diagnostic information)
Relative to other SPL 5s, the current examinee is:
– Low in listening
– High in complexity
– Average in communication
