
1/27 CRESST/UCLA Research Findings on the Impact of Language Factors on the Assessment and Instruction of English Language Learners. Jamal Abedi, University of California, Davis. National Center for Research on Evaluation, Standards, and Student Testing, UCLA Graduate School of Education. January 23, 2007.

2/27 CRESST/UCLA Why was assessment mentioned first in this title? For ELL students, assessment starts before instruction. Assessment results affect ELL students in the following areas: classification, instruction, accountability (the NCLB issues), promotion, and graduation. Thus, assessment of ELL students is very high stakes.

3/27 CRESST/UCLA How do ELL students perform on assessments in comparison with non-ELL students? ELL students generally perform lower than non-ELL students. The performance gap between ELL and non-ELL students increases as the language demand of test items increases, and it approaches zero in content areas with a minimal level of linguistic complexity (e.g., math computation).

4/27 CRESST/UCLA Site 2 Grade 7 SAT 9 Subsection Scores. [Table: means, standard deviations, and Ns for the Reading, Math, Language, and Spelling subsections, broken down by LEP status (LEP vs. non-LEP) and SES (low vs. high); the means and SDs were not recovered in this transcript. Ns: LEP 62,273-64,359; non-LEP 243,199-246,818; low SES 92,221-94,505; high SES 306,176-312,321.]

5/27 CRESST/UCLA Normal Curve Equivalent Means and Standard Deviations for Students in Grades 10 and 11, Site 3 School District. [Table: NCE means and SDs in Reading, Science, and Math for five groups (SWD only, LEP only, LEP & SWD, non-LEP/SWD, and all students) in grades 10 and 11; the numeric values were not recovered in this transcript.]

6/27 CRESST/UCLA Are the Standardized Achievement Tests Appropriate for ELLs? The Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999) elaborated on this issue: For all test takers, any test that employs language is, in part, a measure of their language skills. This is of particular concern for test takers whose first language is not the language of the test. Test use with individuals who have not sufficiently acquired the language of the test may introduce construct-irrelevant components to the testing process. (p. 91)

7/27 CRESST/UCLA Are the Standardized Achievement Tests Reliable and Valid for these Students? The reliability coefficients of the test scores for ELL students are substantially lower than those for non-ELL students ELL students’ test outcomes show lower criterion-related validity Structural relationships between test components and across measurement domains are lower for ELL students

8/27 CRESST/UCLA Site 2 Stanford 9 Sub-scale Reliabilities (Alpha), Grade 9. [Table: alpha reliabilities for Reading (Vocabulary, 30 items; Reading Comprehension, 54 items), Math (48 items), Language (Mechanics, 24 items; Expression, 24 items), Science (40 items), and Social Science (40 items), reported separately for high-SES, low-SES, English-only, FEP, RFEP, and LEP students; the alpha coefficients were not recovered in this transcript. Group Ns range from about 17,600 (LEP, Science) to about 207,000 (Math, all non-LEP).]

9/27 CRESST/UCLA Why are these tests less reliable for ELL students? There must be additional sources of measurement error affecting the assessment outcomes for these students. These sources include: the linguistic complexity of test items; cultural factors; and the interaction of linguistic and cultural factors with other student background variables.

10/27 CRESST/UCLA Assumptions of Classical True-Score Test Theory
1. X = T + E (the observed score is the sum of the true score and the error score)
2. E(X) = T (the expected value of the observed score is the true score)
3. ρ(E, T) = 0 (error and true scores are uncorrelated)
4. ρ(E1, E2) = 0 (error scores on two tests are uncorrelated)
5. ρ(E1, T2) = 0 (the error score on one test is uncorrelated with the true score on another test)

11/27 CRESST/UCLA Classical Test Theory: Reliability
σ²_X = σ²_T + σ²_E, where X is the observed score, T the true score, and E the error score.
ρ_XX' = σ²_T / σ²_X
ρ_XX' = 1 − σ²_E / σ²_X
Textbook examples of sources of measurement error: rater, occasion, item, test form.
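As a concrete illustration of these identities, here is a minimal Python sketch with hypothetical variance components (the values are invented for illustration, not taken from the studies in this presentation):

```python
# Classical test theory: observed-score variance is the sum of
# true-score variance and error variance (hypothetical values).
var_true = 80.0    # sigma^2_T
var_error = 20.0   # sigma^2_E
var_observed = var_true + var_error  # sigma^2_X

# Reliability is the share of observed variance that is true variance;
# the two formulas on the slide are algebraically equivalent.
rel_from_true = var_true / var_observed
rel_from_error = 1 - var_error / var_observed
print(rel_from_true, rel_from_error)  # 0.8 0.8
```

Any extra error source (such as linguistic complexity) inflates σ²_E, which under these formulas directly lowers ρ_XX'.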

12/27 CRESST/UCLA Generalizability Theory: Language as an Additional Source of Measurement Error
σ²(X_prl) = σ²_p + σ²_r + σ²_l + σ²_pr + σ²_pl + σ²_rl + σ²_prl,e
where p = person, r = rater, l = language.
Are there any sources of measurement error that may specifically influence ELL performance?
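Under this decomposition one can ask how much of the score variance involves the language facet. A small Python sketch with hypothetical variance components (illustrative values, not estimates from real data):

```python
# Variance components of the person x rater x language design
# (hypothetical values for illustration).
components = {
    "p": 50.0,      # person: the variance we actually want to measure
    "r": 5.0,       # rater
    "l": 12.0,      # language
    "pr": 3.0,      # person x rater interaction
    "pl": 8.0,      # person x language interaction
    "rl": 2.0,      # rater x language interaction
    "prl,e": 10.0,  # three-way interaction confounded with residual error
}
total = sum(components.values())  # sigma^2(X_prl)

# Share of score variance involving the language facet -- an error
# source that would specifically affect ELL students' scores.
language_share = (components["l"] + components["pl"] + components["rl"]) / total
print(f"{language_share:.1%}")
```

With these made-up components, roughly a quarter of the score variance involves language, none of which reflects the construct being measured.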

13/27 CRESST/UCLA How can we improve reliability in the assessment for ELL students? Add more test items Control for the random sources of measurement error Control for systematic sources of measurement error

14/27 CRESST/UCLA Add More Test Items
"As a general rule, the more items in a test, the more reliable the test" (Salvia & Ysseldyke, 1998, p. 149).
"The fact that a longer assessment tends to provide more reliable results was implied earlier..." (Linn, 1995, p. 100).
"However, longer tests are generally more reliable, because, under the assumptions of classical true-score theory, as N increases, true-score variance increases faster than error variance" (Allen & Yen, 1979, p. 87).

15/27 CRESST/UCLA Add More Test Items
Spearman-Brown formula: n = ρ_XX'(1 − ρ_YY') / [ρ_YY'(1 − ρ_XX')], where ρ_YY' is the current reliability and ρ_XX' the target reliability.
For example, to raise the reliability of a 25-item test from .6 (ρ_YY') to .8 (ρ_XX'), the test must be lengthened by a factor of about 2.67, i.e., about 42 items must be added (67 items in total).
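The computation above can be sketched in a few lines of Python (the function name is mine; the formula is the Spearman-Brown relation stated on the slide):

```python
import math

def lengthening_factor(current_rel, target_rel):
    """Spearman-Brown: factor by which a test must be lengthened
    to raise its reliability from current_rel to target_rel."""
    return (target_rel * (1 - current_rel)) / (current_rel * (1 - target_rel))

factor = lengthening_factor(0.6, 0.8)  # about 2.67
total_items = math.ceil(25 * factor)   # 67 items in all
items_to_add = total_items - 25        # 42 additional items
print(factor, total_items, items_to_add)
```

Note that the factor grows quickly near the top of the scale: pushing the same test from .8 to .9 would require more than doubling it again.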

16/27 CRESST/UCLA A research example showing the effects of increasing the number of test items. Source: O'Neil, H. F., & Abedi, J. (1996). Reliability and validity of a state metacognitive inventory: Potential for alternative assessment. Journal of Educational Research, 89(4). [Table: alpha reliabilities for the Effort (31, 17, and 7 items), Worry (14 and 11 items), and Cognitive Strategy (14 and fewer items) subscales; the alpha values were not recovered in this transcript.]
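The subscale reliabilities in this table are Cronbach's alpha coefficients. A minimal pure-Python sketch of how alpha is computed, using invented demo scores rather than the inventory's data:

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for `scores`: a list of per-person lists of
    item scores, one inner list per examinee."""
    k = len(scores[0])  # number of items

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Sum of the item variances versus variance of the total scores.
    item_vars = [variance([person[i] for person in scores]) for i in range(k)]
    total_var = variance([sum(person) for person in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Perfectly consistent responses across items give alpha close to 1.0.
demo = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
print(cronbach_alpha(demo))
```

Dropping items shrinks the ratio of total-score variance to summed item variances, which is why the shorter subscale versions in the table tend to be less reliable.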

17/27 CRESST/UCLA Increasing the number of test items for ELL students may cause further complexities: If the new items are more linguistically complex, they may add construct-irrelevant variance/measurement error and reduce validity and reliability even further. The cognitive load of the added items may be greater than that of the original items. Providing extra time for the new items, on top of the already extended time, may cause more logistical problems.

18/27 CRESST/UCLA Does reliability of a test affect test validity? Reliability sets the upper limit of a test's validity, so reliability is a necessary but not a sufficient condition for valid measurement (Salvia & Ysseldyke, 1998, p. 177). Reliability is a necessary but not sufficient condition for validity (Linn, 1995, p. 82). Reliability limits validity, because ρ_XY ≤ √ρ_XX' (Allen & Yen, 1979, p. 113). For example, the upper limit of the validity coefficient for a test with a reliability of .53 is √.53 ≈ .73.
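The upper-bound relation is easy to check numerically; a short Python sketch (the .53 reliability value is just an example):

```python
import math

# Criterion-related validity is capped by the square root of
# reliability: r_xy <= sqrt(r_xx').
reliability = 0.53
max_validity = math.sqrt(reliability)
print(round(max_validity, 2))  # 0.73
```

This is why the depressed reliabilities observed for ELL students matter beyond score precision: they cap how valid any use of those scores can be.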

19/27 CRESST/UCLA Grade 11 Stanford 9 Reading and Science Structural Modeling Results (DF=24), Site 3. [Table: goodness-of-fit statistics (chi square, NFI, NNFI, CFI), factor loadings for the reading and math composite variables, and the reading-math factor correlation, estimated for all cases (N=7,176), even cases (N=3,588), odd cases (N=3,588), non-LEP students (N=6,932), and LEP students (N=244); the numeric values were not recovered in this transcript. Note: NFI = Normed Fit Index; NNFI = Non-Normed Fit Index; CFI = Comparative Fit Index.]

20/27 CRESST/UCLA Language of Assessment Clear and concise language is a requirement for reliable and valid assessments for ELL students. It may also be an important consideration for students with disabilities, since a large majority of students with disabilities are in the learning disability category, and students in that category may have difficulty processing complex language in assessment. Simplifying the language of test items will therefore also help students with disabilities, particularly those with learning disabilities.

21/27 CRESST/UCLA Original: A certain reference file contains approximately six billion facts. About how many millions is that? A. 6,000,000 B. 600,000 C. 60,000 D. 6,000 E. 600 Modified: Mack’s company sold six billion pencils. About how many millions is that? A. 6,000,000 B. 600,000 C. 60,000 D. 6,000 E. 600 Example

22/27 CRESST/UCLA Original: The census showed that three hundred fifty-six thousand, ninety-seven people lived in Middletown. Written as a number, that is: A. 350,697 B. 356,097 C. 356,907 D. 356,970 Modified: Janet played a video game. Her score was three hundred fifty-six thousand, ninety-seven. Written as a number, that is: A. 350,697 B. 356,097 C. 356,907 D. 356,970 Example

23/27 CRESST/UCLA CRESST Studies on the Assessment and Accommodation of ELL Students: Impact of Language Factors on the Assessment of ELLs, a Chain of Events. Fourteen studies on the assessment and three on the instruction (OTL) of ELL students.

24/27 CRESST/UCLA Study #1 Analyses of extant data (Abedi, Lord, & Plummer, 1995). Used existing data from the 1992 NAEP assessments in math and science. SAMPLE: ELL and non-ELL students in the grade 4, 8, and 12 main assessment. NAEP test items were grouped into long vs. short and linguistically more vs. less complex items. Findings: ELL students performed significantly lower on the longer test items; had higher proportions of omitted and/or not-reached items; and had higher scores on the linguistically less complex items.

25/27 CRESST/UCLA Study #2 Interview study (Abedi, Lord, & Plummer, 1997). Thirty-seven students were asked to express their preference between the original NAEP items and a linguistically modified version of the same items. Math test items were modified to reduce the level of linguistic complexity. Findings: Over 80% of those interviewed preferred the linguistically modified items over the original version.

26/27 CRESST/UCLA Many students indicated that the language in the revised item was easier: “Well, it makes more sense.” “It explains better.” “Because that one’s more confusing.” “It seems simpler. You get a clear idea of what they want you to do.”

27/27 CRESST/UCLA Study #3 Impact of linguistic factors on students’ performance (Abedi, Lord, & Plummer, 1997). Two studies: testing performance and speed. SAMPLE: 1,031 grade 8 ELL and non-ELL students. 41 classes from 21 southern California schools. Findings ELL students who received a linguistically modified version of the math test items performed significantly better than those receiving the original test items.

28/27 CRESST/UCLA Study #4 The impact of different types of accommodations on students with limited English proficiency (Abedi, Lord, & Hofstetter, 1997). SAMPLE: 1,394 grade 8 students; 56 classes from 27 California schools. Findings: Spanish translation of the NAEP math test: Spanish speakers taking the Spanish translation performed significantly lower than Spanish speakers taking the English version; we believe this reflects the impact of the language of instruction on assessment. Linguistic modification: contributed to improved performance on 49% of the items. Extra time: helped grade 8 ELL students on NAEP math tests, but also aided non-ELL students, so it has limited potential as an ELL-specific accommodation.

29/27 CRESST/UCLA Study #5 Impact of selected background variables on students' NAEP math performance (Abedi, Hofstetter, & Lord, 1998). SAMPLE: 946 grade 8 ELL and non-ELL students; 38 classes from 19 southern California schools. Findings: Four different accommodations were used (linguistically modified items, a glossary only, extra time only, and a glossary plus extra time). Linguistic modification of test items was the only accommodation that reduced the performance gap between ELL and non-ELL students.

30/27 CRESST/UCLA Study #6 The effects of accommodations on the assessment of LEP students in NAEP (Abedi, Lord, Kim, & Miyoshi, 2000). SAMPLE: 422 grade 8 ELL and non-ELL students; 17 science classes from 9 southern California schools. A customized dictionary was used. Findings: The customized dictionary included only non-content words from the test and was easier to use than a published dictionary. ELL students showed significant improvement in performance; there was no impact on non-ELL performance.

31/27 CRESST/UCLA Study #7 Language accommodation for large-scale assessment in science (Abedi, Courtney, Leon, Mirocha, & Goldberg, 2001). SAMPLE: 612 students in grades 4 and 8; 25 classes from 14 southern California schools. Findings: A published dictionary was both ineffective and administratively difficult as an accommodation. Different bilingual dictionaries had different entries, different content, and different formats.

32/27 CRESST/UCLA Study #8 Language accommodation for large-scale assessment in science (Abedi, Courtney, & Leon, 2001). SAMPLE: 1,856 grade 4 and 1,512 grade 8 ELL and non-ELL students; 132 classes from 40 school sites in four cities across three states. Findings: Linguistic modification of test items improved the performance of ELLs in grade 8; there was no change in the performance of non-ELLs with the modified test. The validity of the assessment was not compromised by the provision of this accommodation.

33/27 CRESST/UCLA Study #9 Impact of students' language background on content-based performance: analyses of extant data (Abedi & Leon, 1999). Analyses were performed on extant data, such as Stanford 9 and ITBS. SAMPLE: over 900,000 students from four different sites nationwide. Study #10 Examining ELL and non-ELL student performance differences and their relationship to background factors (Abedi, Leon, & Mirocha, 2001). Data were analyzed for the language impact on assessment and accommodations of ELL students. SAMPLE: over 700,000 students from four different sites nationwide. Findings: The higher the language demand of the test items, the larger the performance gap between ELL and non-ELL students. There was a large performance gap between ELL and non-ELL students in reading, science, and math problem solving (about 15 NCE score points); this gap was zero in math computation.

34/27 CRESST/UCLA Recent publications summarizing findings of our research on the assessment of ELLs:
Abedi, J., & Gandara, P. (2007). Performance of English language learners as a subgroup in large-scale assessment: Interaction of research and policy. Educational Measurement: Issues and Practice, 26(5).
Abedi, J. (in press). Utilizing accommodations in the assessment of English language learners. In Encyclopedia of Language and Education. Heidelberg, Germany: Springer Science+Business Media.
Abedi, J. (2006). Psychometric issues in the ELL assessment and special education eligibility. Teachers College Record, 108(11).
Abedi, J. (2006). Language issues in item development. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of Test Development. New Jersey: Lawrence Erlbaum Associates.
Abedi, J. (in press). English language learners with disabilities. In C. Cahalan & L. Cook (Eds.), Accommodating students with disabilities on state assessments: What works? New Jersey: Educational Testing Service.
Abedi, J. (2005). Assessment: Issues and consequences for English language learners. In J. L. Herman & E. H. Haertel (Eds.), Uses and Misuses of Data in Accountability Testing. Malden, Massachusetts: Blackwell Publishing.

35/27 CRESST/UCLA Conclusions and Recommendations Assessment for ELL students: Must be based on sound psychometric principles. Must control for all sources of nuisance or confounding variables. Must be free of unnecessary linguistic complexity. Must include a sufficient number of ELLs in the development process (field testing, standard setting, etc.). Must be free of biases, such as cultural biases. Must be sensitive to students' linguistic and cultural needs.