
CRESST / UCLA

Psychometric Issues in the Assessment of English Language Learners

Presented at: CRESST 2002 Annual Conference, "Research Goes to School: Assessment, Accountability, and Improvement"

Jamal Abedi
UCLA Graduate School of Education
National Center for Research on Evaluation, Standards, and Student Testing (CRESST)

September 10-11, 2002

Measurement / Psychometric Theory

- Do the same underlying measurement theories used for mainstream assessment apply equally to English language learners? Yes / No
- Do psychometric textbooks have enough coverage of issues concerning the measurement of ELLs? Yes / No
- Are there specific measurement issues that are unique to the assessment of ELLs? Yes / No
- Can the low performance of ELLs in content-based areas be explained mainly by their lack of content knowledge? Yes / No
- Are there any extraneous variables that could specifically impact the performance of ELLs? Yes / No

Psychometric Methods: Development and Application of Modern Mental Measures, by Steven J. Osterlind, University of Missouri. Chapter 3: Classical Measurement Theory.

Suppose a sample of examinees comprises individuals from two different cultures: in one culture, dogs are considered close family members; in the other, dogs are considered non-family animals meant for work. Now suppose that some of the reading test questions incidentally describe the treatment of dogs. Remember, this is a test of one's reading ability, not a test about dogs. Cultural differences in how examinees react to such items can contribute to score differences that have nothing to do with reading ability.

Abedi, J., & Lord, C. (2001). The language factor in mathematics tests. Applied Measurement in Education, 14(3).

Classical Test Theory: Reliability

σ²_X = σ²_T + σ²_E

X: observed score; T: true score; E: error score

ρ_XX′ = σ²_T / σ²_X = 1 − σ²_E / σ²_X

Textbook examples of possible sources that contribute to measurement error:
- Rater
- Occasion
- Item
- Test form
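To make these quantities concrete, here is a minimal simulation sketch in Python (not part of the original presentation; all variance values are illustrative) showing that the two expressions for reliability agree:

```python
import numpy as np

rng = np.random.default_rng(0)
n_examinees = 100_000

# True scores T and error scores E are generated independently,
# consistent with the classical assumption that errors are
# uncorrelated with true scores.
true_var, error_var = 80.0, 20.0   # illustrative variance components
T = rng.normal(50.0, np.sqrt(true_var), n_examinees)
E = rng.normal(0.0, np.sqrt(error_var), n_examinees)   # errors have mean zero

X = T + E   # observed score = true score + error score

# Reliability computed both ways from the simulated data.
rel_true = T.var() / X.var()          # sigma^2_T / sigma^2_X
rel_error = 1.0 - E.var() / X.var()   # 1 - sigma^2_E / sigma^2_X
print(f"sigma^2_T / sigma^2_X     = {rel_true:.3f}")   # ~ 80/100 = 0.80
print(f"1 - sigma^2_E / sigma^2_X = {rel_error:.3f}")
```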

Assumptions of Classical True-Score Theory

1. X = T + E
2. 𝔼(X) = T (the expected observed score over repeated testings is the true score)
3. ρ_ET = 0 (errors are uncorrelated with true scores)
4. ρ_E1E2 = 0 (errors on two parallel tests are uncorrelated)
5. ρ_E1T2 = 0 (errors on one test are uncorrelated with true scores on another)

Generalizability Theory: Partitioning Error Variance into Its Components

σ²(X_pro) = σ²_p + σ²_r + σ²_o + σ²_pr + σ²_po + σ²_ro + σ²_pro,e

p: person; r: rater; o: occasion

Are there any sources of measurement error that may specifically influence ELL performance? There may be other sources, such as test forms, test instructions, item difficulty, and test-taking skills.
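As a sketch of how such variance components are estimated, the following Python example uses a simpler single-facet person × rater design (rather than the full person × rater × occasion design above) and solves the expected-mean-square equations of the crossed ANOVA. It is illustrative only; all values are invented:

```python
import numpy as np

rng = np.random.default_rng(1)
n_p, n_r = 200, 5   # persons x raters, fully crossed, one score per cell

# Simulate score = grand mean + person effect + rater effect + residual.
sigma2_p, sigma2_r, sigma2_pr = 25.0, 4.0, 9.0   # illustrative components
scores = (50.0
          + rng.normal(0, np.sqrt(sigma2_p), (n_p, 1))      # person
          + rng.normal(0, np.sqrt(sigma2_r), (1, n_r))      # rater
          + rng.normal(0, np.sqrt(sigma2_pr), (n_p, n_r)))  # pr,e residual

# ANOVA mean squares for the crossed p x r design.
grand = scores.mean()
ms_p = n_r * ((scores.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
ms_r = n_p * ((scores.mean(axis=0) - grand) ** 2).sum() / (n_r - 1)
ss_res = ((scores
           - scores.mean(axis=1, keepdims=True)
           - scores.mean(axis=0, keepdims=True) + grand) ** 2).sum()
ms_res = ss_res / ((n_p - 1) * (n_r - 1))

# Solve the expected-mean-square equations for the variance components.
var_pr_e = ms_res                 # sigma^2_pr,e
var_p = (ms_p - ms_res) / n_r     # sigma^2_p
var_r = (ms_r - ms_res) / n_p     # sigma^2_r
print(f"sigma^2_p ~ {var_p:.1f}, sigma^2_r ~ {var_r:.1f}, "
      f"sigma^2_pr,e ~ {var_pr_e:.1f}")   # ~ 25, 4, 9
```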

Validity of Academic Achievement Measures

We will focus on the construct and content validity approaches:

- "A test's content validity involves the careful definition of the domain of behaviors to be measured by a test and the logical design of items to cover all the important areas of this domain" (Allen & Yen, 1979, p. 96).
- "A test's construct validity is the degree to which it measures the theoretical construct or trait that it was designed to measure" (Allen & Yen, 1979, p. 108).

Examples:
- A content-based achievement test has construct validity if it measures the content knowledge that it is supposed to measure.
- A content-based achievement test has content validity if the test content is representative of the content domain being measured.

Study #2: Interview study (Abedi, Lord, & Plummer, 1997)

Math test items were modified to reduce their level of linguistic complexity. Thirty-seven students were asked to express their preference between the original NAEP items and the linguistically modified versions of the same items.

Finding
- Over 80% of the students interviewed preferred the linguistically modified items over the original versions.

Study #3: Impact of linguistic factors on students' performance (Abedi, Lord, & Plummer, 1997)

Two studies: test performance and speed.

Sample: 1,031 grade 8 ELL and non-ELL students; 41 classes from 21 southern California schools.

Finding
- ELL students who received a linguistically modified version of the math test items performed significantly better than those receiving the original test items.

Study #4: The impact of different types of accommodations on students with limited English proficiency (Abedi, Lord, & Hofstetter, 1997)

Sample: 1,394 grade 8 students; 56 classes from 27 southern California schools.

Findings
- Spanish translation of the NAEP math test: Spanish speakers taking the Spanish translation performed significantly lower than Spanish speakers taking the English version. We believe this reflects the impact of the language of instruction on assessment.
- Linguistic modification: contributed to improved performance on 49% of the items.
- Extra time: helped grade 8 ELL students on NAEP math tests, but it also aided non-ELL students, giving it limited potential as an assessment accommodation.

Study #5: Impact of selected background variables on students' NAEP math performance (Abedi, Hofstetter, & Lord, 1998)

Sample: 946 grade 8 ELL and non-ELL students; 38 classes from 19 southern California schools.

Findings
- Four different accommodations were used: linguistically modified items, a glossary only, extra time only, and a glossary plus extra time.
- The glossary plus extra time was the most effective accommodation.
- However, non-ELLs showed a greater improvement under glossary plus extra time (16%) than ELLs (13%). This is the opposite of what is expected and casts doubt on the validity of this accommodation.

Study #8: Language accommodation for large-scale assessment in science (Abedi, Courtney, & Leon, 2001)

Sample: 1,856 grade 4 and 1,512 grade 8 ELL and non-ELL students; 132 classes from 40 school sites in four cities and three states.

Findings
- Linguistic modification of test items improved the performance of ELLs in grade 8.
- There was no change in the performance of non-ELLs with the modified test.
- The validity of the assessment was therefore not compromised by the provision of the accommodation.

Study #9: Impact of students' language background on content-based performance: analyses of extant data (Abedi & Leon, 1999)

Analyses were performed on extant data, such as Stanford 9 and ITBS scores. Sample: over 900,000 students from four different sites nationwide.

Study #10: Examining ELL and non-ELL student performance differences and their relationship to background factors (Abedi, Leon, & Mirocha, 2001)

Data were analyzed for the impact of language on the assessment and accommodation of ELL students. Sample: over 700,000 students from four different sites nationwide.

Findings
- The higher the language demand of the test items, the larger the performance gap between ELL and non-ELL students.
- There was a large performance gap between ELL and non-ELL students in reading, science, and math problem solving (about 15 NCE score points).
- This performance gap was reduced to zero in math computation.

Normal Curve Equivalent Means and Standard Deviations for Students in Grades 10 and 11, Site 3 School District

[Table of NCE means (M) and standard deviations (SD) in Reading, Science, and Math for grade 10 and grade 11 students, reported separately for five groups: SD only, LEP only, LEP & SD, non-LEP & SD, and all students.]

Note. LEP = limited English proficient. SD = students with disabilities.
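For readers unfamiliar with the metric: NCE (normal curve equivalent) scores are a normalized transformation of percentile ranks with a mean of 50 and a standard deviation of 21.06, which makes score differences comparable across the scale. A minimal conversion sketch in Python (not part of the original presentation):

```python
from statistics import NormalDist

def percentile_to_nce(percentile: float) -> float:
    """Convert a percentile rank (strictly between 0 and 100) to a
    Normal Curve Equivalent: NCE = 50 + 21.06 * z, where z is the
    standard-normal quantile of the percentile. NCE and percentile
    ranks coincide at 1, 50, and 99."""
    z = NormalDist().inv_cdf(percentile / 100.0)
    return 50.0 + 21.06 * z

# A 15-point NCE gap near the middle of the distribution corresponds
# roughly to the difference between the 50th and 24th percentiles.
print(round(percentile_to_nce(50.0)))   # 50
print(round(percentile_to_nce(24.0)))   # ~35
```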

The Disparity Index (DI) is an index of performance differences between LEP and non-LEP students.

Site 3 Disparity Index (DI): Non-LEP/Non-SD Students Compared to LEP-Only Students

[Table of DI values by grade for Reading, Math Total, Math Calculation, and Math Analytical.]
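The slide does not give the DI formula. A plausible form, assuming the DI is the difference between the two group means expressed as a percentage of the LEP mean (this definition, the function name, and the values below are assumptions for illustration):

```python
def disparity_index(mean_non_lep: float, mean_lep: float) -> float:
    """Assumed form of the Disparity Index: the non-LEP minus LEP mean
    difference as a percentage of the LEP mean. The exact formula is
    not given on the slide; treat this as illustrative only."""
    return 100.0 * (mean_non_lep - mean_lep) / mean_lep

# Illustrative NCE means (not from the study): a 15-point gap over a
# LEP mean of 35 yields a DI of about 43.
print(round(disparity_index(50.0, 35.0), 1))   # 42.9
```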


Generalizability Theory: Language as an Additional Source of Measurement Error

σ²(X_prl) = σ²_p + σ²_r + σ²_l + σ²_pr + σ²_pl + σ²_rl + σ²_prl,e

p: person; r: rater; l: language

Are there any sources of measurement error that may specifically influence ELL performance?

Main effect of the language factor:
- σ²_l: different levels of English/native-language proficiency

Interactions of the language factor with other factors:
- σ²_pl: person × language interaction (different levels of English/native-language proficiency across persons)
- σ²_rl: rater × language interaction (differential treatment of ELL students by raters with different backgrounds)
- σ²_prl,e: a combination of different levels of language proficiency, the interaction of rater with language and persons, and unspecified sources of measurement error
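A minimal simulation sketch in Python of the idea behind this decomposition (all values invented): adding a language-related error component to ELL examinees' scores lowers the reliability of their scores relative to non-ELLs, even when true-score and generic error variances are identical:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

def simulated_reliability(language_error_var: float) -> float:
    """Reliability of observed scores when a language-related error
    component (variance sigma^2_l, illustrative) is added on top of
    the generic measurement error."""
    T = rng.normal(50.0, np.sqrt(80.0), n)               # true scores
    E = rng.normal(0.0, np.sqrt(20.0), n)                # generic error
    L = rng.normal(0.0, np.sqrt(language_error_var), n)  # language error
    X = T + E + L
    return T.var() / X.var()

print(f"non-ELL (sigma^2_l = 0):  {simulated_reliability(0.0):.2f}")   # ~0.80
print(f"ELL (sigma^2_l = 30):     {simulated_reliability(30.0):.2f}")  # ~0.62
```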

Issues and Problems in the Classification of Students With Limited English Proficiency

Correlation between LAS rating and LEP classification for Site 4

[Table of Pearson r, two-tailed significance, and N for grades 2 through 12.]

Findings
The relationship between language proficiency test scores and LEP classification: since LEP classification is based on students' level of language proficiency, and because the LAS is a measure of language proficiency, one would expect to find a near-perfect correlation between LAS scores and LEP status (LEP versus non-LEP). The results of the analyses instead indicated a weak relationship between language proficiency test scores and language classification codes (LEP categories).
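The check described here amounts to a point-biserial correlation: a Pearson r between a continuous proficiency score and a binary LEP/non-LEP code. A sketch with invented data (none of these numbers come from the study):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000

# Illustrative data: continuous LAS-like proficiency scores and a binary
# LEP code (1 = LEP, 0 = non-LEP) that only loosely tracks proficiency,
# as would happen if classification criteria varied across schools.
proficiency = rng.normal(60.0, 15.0, n)
noise = rng.normal(0.0, 25.0, n)   # non-proficiency influences on the code
lep_code = ((proficiency + noise) < 55.0).astype(float)

# Pearson r between a continuous and a binary variable is the
# point-biserial correlation.
r = np.corrcoef(proficiency, lep_code)[0, 1]
print(f"r = {r:.2f}")   # negative and far from -1: a weak relationship
```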

Correlation coefficients between LEP classification code and ITBS subscales for Site 1

[Table of Pearson r and two-tailed significance by grade for four ITBS subscales: Reading, Math Concepts & Estimation, Math Problem Solving, and Math Computation. Sample sizes:]

- Grade 3: Reading 36,006; Math Concepts & Estimation 35,981; Math Problem Solving 35,948; Math Computation 36,000
- Grade 6: Reading 28,272; Math Concepts & Estimation 28,273; Math Problem Solving 28,250; Math Computation 28,261
- Grade 8: Reading 25,362; Math Concepts & Estimation 25,336; Math Problem Solving 25,333; Math Computation 25,342