A Balancing Act: Common Items Nonequivalent Groups (CING) Equating Item Selection
Tia Sukin, Jennifer Dunn, Wonsuk Kim, Robert Keller
July 24, 2009


Background

Equating under a CING design requires the creation of an anchor set. Angoff (1968) developed guidelines for constructing one:
- Length: 20% of the operational test (OT), or 20 items
- Content: proportionate to the OT by strand
- Statistical properties: same mean / S.D.
- Contextual effects: same locations, formats, key, etc.
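These guidelines translate naturally into automated screening checks. Below is a minimal sketch, not from the presentation, of how a candidate anchor set might be tested against Angoff's length, content, and statistical-property criteria; the item representation, the tolerances, and the reading of the 20%/20-item rule are all illustrative assumptions.

```python
from statistics import mean, stdev
from collections import Counter

def check_anchor_set(anchor, operational, tol_mean=0.05, tol_sd=0.05):
    """Screen a candidate anchor set against Angoff-style guidelines.

    Each item is a dict with keys 'strand' and 'p' (classical item
    difficulty). Tolerances are illustrative, not values from the talk.
    """
    report = {}

    # Length: 20% of the operational test, capped at 20 items
    # (one common reading of the 20%/20-item rule of thumb).
    report["length_ok"] = len(anchor) >= min(0.20 * len(operational), 20)

    # Content: strand proportions should mirror the operational test.
    op_n, an_n = len(operational), len(anchor)
    op_strand = Counter(item["strand"] for item in operational)
    an_strand = Counter(item["strand"] for item in anchor)
    report["max_content_gap"] = max(
        abs(an_strand[s] / an_n - op_strand[s] / op_n) for s in op_strand
    )

    # Statistical properties: similar difficulty mean and S.D.
    report["mean_gap"] = abs(mean(i["p"] for i in anchor) -
                             mean(i["p"] for i in operational))
    report["sd_gap"] = abs(stdev(i["p"] for i in anchor) -
                           stdev(i["p"] for i in operational))
    report["stats_ok"] = (report["mean_gap"] <= tol_mean and
                          report["sd_gap"] <= tol_sd)
    return report
```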

Background

The majority of the research supports these guidelines (e.g., Vale et al., 1981; Klein & Jarjoura, 1985; Kingston & Dorans, 1984). This research has included robustness studies (e.g., Wingersky & Lord, 1984; Beguin, 2002; Sinharay & Holland, 2007).

Background

Most research has used placement (e.g., AP), admissions (e.g., SAT), and military (e.g., ASVAB) exams for empirical and informed simulation studies. Research using statewide accountability exams is limited (e.g., Haertel, 2004; Michaelides & Haertel, 2004).

Background

General Science tests are administered in all states and at all grade levels, with the exception that:
- 19 states offer end-of-course (EOC) Science exams in high school
- 10 offer more than one EOC Science exam
- 5 offer more than two

Research Questions

1. Do the long-established guidelines for maintaining content representation (i.e., proportion by number) hold when creating an anchor set across all major subject areas (i.e., Mathematics, Reading, Science)?
2. Are there significant differences in expected raw scores and proficiency classifications when different methods for maintaining content representation are used?

Design

3 subjects (2 states, 3 grades):
- Math
- Reading
- Science

5 methods of anchor set construction:
- Operational
- Proportion by number of items per strand
- G theory
- ICCs
- Construct underrepresentation
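The "proportion by number of items per strand" method amounts to allocating anchor-set slots across strands in proportion to each strand's share of the operational test. A minimal sketch of one way to do this, using largest-remainder rounding (the function and the example strand counts are illustrative, not taken from the study):

```python
import math

def allocate_anchor_slots(strand_counts, anchor_len):
    """Split `anchor_len` anchor slots across strands in proportion to
    their operational-test counts, using largest-remainder rounding."""
    total = sum(strand_counts.values())
    quotas = {s: anchor_len * n / total for s, n in strand_counts.items()}
    alloc = {s: math.floor(q) for s, q in quotas.items()}
    # Hand any leftover slots to the strands with the largest remainders.
    leftover = anchor_len - sum(alloc.values())
    for s in sorted(quotas, key=lambda s: quotas[s] - alloc[s],
                    reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc

# Hypothetical 50-item test with four strands and a 10-item anchor:
print(allocate_anchor_slots({"A": 18, "B": 14, "C": 10, "D": 8}, 10))
# -> {'A': 4, 'B': 3, 'C': 2, 'D': 1} (ties broken by dict order)
```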

Variance Calculation – G Theory

Multivariate design:
- p x i, with content strand as a fixed facet

Multivariate benefit:
- Covariance components are calculated for every pair of strands

Item variance component: [equation on the original slide; not transcribed]
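The item variance component equation cannot be recovered from the transcript, but for a univariate p x i random-effects design the standard ANOVA-based estimators are well known; a NumPy sketch follows. The multivariate design used in the study would additionally yield covariance components for each pair of strands, which this sketch does not attempt.

```python
import numpy as np

def p_by_i_variance_components(X):
    """Estimate G-theory variance components for a persons x items
    (p x i) random-effects design from a score matrix X
    (rows = persons, columns = items)."""
    n_p, n_i = X.shape
    grand = X.mean()
    person_means = X.mean(axis=1)
    item_means = X.mean(axis=0)

    # Mean squares from the two-way ANOVA decomposition.
    ms_p = n_i * np.sum((person_means - grand) ** 2) / (n_p - 1)
    ms_i = n_p * np.sum((item_means - grand) ** 2) / (n_i - 1)
    resid = X - person_means[:, None] - item_means[None, :] + grand
    ms_pi = np.sum(resid ** 2) / ((n_p - 1) * (n_i - 1))

    return {
        "person": (ms_p - ms_pi) / n_i,   # sigma^2(p)
        "item": (ms_i - ms_pi) / n_p,     # sigma^2(i): the item component
        "residual": ms_pi,                # sigma^2(pi, e)
    }
```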

Variance Calculation – ICC

Use the median P(θ) as the average when calculating within-strand variability.

[Figure: item characteristic curves, plotting P(θ) against θ]
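One plausible reading of that computation, sketched below under an assumed 3PL model (the function names, the 1.7 scaling constant, and the theta grid are illustrative choices, not details from the talk): within each strand, take the median of the item characteristic curves at each theta as the center, then measure how far the individual ICCs deviate from it.

```python
import numpy as np

def icc_3pl(theta, a, b, c):
    """3PL item characteristic curve P(theta)."""
    return c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b)))

def within_strand_variability(items, theta=np.linspace(-4, 4, 81)):
    """Average squared deviation of each item's ICC from the strand's
    median ICC, evaluated over a theta grid.

    `items` is a list of (a, b, c) parameter tuples for one strand.
    """
    curves = np.array([icc_3pl(theta, a, b, c) for a, b, c in items])
    median_curve = np.median(curves, axis=0)  # median P(theta) per theta
    return float(np.mean((curves - median_curve) ** 2))
```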

Equating Item Selection

Example: [table shown on the original slide; not transcribed]

Equating Item Selection

Percentage of strands that differ by more than one item between selection methods (excluding the construct underrepresentation method):
- Math: 13%
- Reading: 52%
- Science: 20%

Example Results – Scoring Category Distributions

[Results shown on the original slide; not transcribed]

Discussion

Equating is highly robust to the selection process used for creating anchor sets, EXCEPT:
- Choosing equating items from only 1-2 strands is discouraged
- More caution may be needed with Science

Item selection mattered for 22% of the conditions (12 of 54):
- 2/18 for Math: both were the underrepresentation condition
- 3/18 for Reading: all were the underrepresentation condition
- 7/18 for Science: 2 underrepresentation; 5 ICC and G theory

Content balance is important, and it can be conceptualized in different ways without impacting the equating.

Future Study

- A simulation study is needed so that raw scores and proficiency categorizations obtained with the different item selection methods can be compared to truth.
- A meta-analysis detailing published and unpublished studies that provide evidence for or against the robustness of CING equating designs.

Thank you