Aligned to Common Core State Standards

Similar presentations
Implications and Extensions of Rasch Measurement.
1 K-2 Smarter Balanced Assessment Update English Language Arts February 2012.
What is a CAT?. Introduction COMPUTER ADAPTIVE TEST + performance task.
Item Response Theory in a Multi-level Framework Saralyn Miller Meg Oliphint EDU 7309.
VALIDITY AND RELIABILITY
SBAC Alignment Study Los Angeles, March 2014 Paula Torres Leanne Leonard, Ed.D.
+ A New Stopping Rule for Computerized Adaptive Testing.
© UCLES 2013 Assessing the Fit of IRT Models in Language Testing Muhammad Naveed Khalid Ardeshir Geranpayeh.
 Here’s What... › The State Board of Education has adopted the Common Core State Standards (July 2010)  So what... › Implications and Impact in NH ›
Robert delMas (Univ. of Minnesota, USA) Ann Ooms (Kingston College, UK) Joan Garfield (Univ. of Minnesota, USA) Beth Chance (Cal Poly State Univ., USA)
A comparison of exposure control procedures in CATs using the 3PL model.
Presented by Julie Joseph Charlene Stringham Diana Ruiz February 17, 2011.
Technical Considerations in Alignment for Computerized Adaptive Testing Liru Zhang, Delaware DOE Shudong Wang, NWEA 2014 CCSSO NCSA New Orleans, LA June.
Math Learning Progression
NEXT GENERATION BALANCED ASSESSMENT SYSTEMS ALIGNED TO THE CCSS Stanley Rabinowitz, Ph.D. WestEd CORE Summer Design Institute June 19,
Smarter Balanced Assessment Consortium A Peek at the Assessment System 1 Rachel Eifler January 30, 2014.
Modern Test Theory Item Response Theory (IRT). Limitations of classical test theory An examinee’s ability is defined in terms of a particular test The.
1 6-8 Smarter Balanced Assessment Update English Language Arts February 2012.
Smarter Balanced Assessments Parent Information. Practice Problems GO!
Introduction ELA Selected Themes Assessment targets Progress Map ENGLISH LANGUAGE ARTS LEARNING PROGRESSION.
Liru Zhang, Delaware DOE Shudong Wang, NWEA Presented at the 2015 NCSA Annual Conference, San Diego, CA 1.
Eileen Boyce Toni Tessier Waterford Public Schools Literacy Specialists.
An Introduction to Measurement and Evaluation Emily H. Wughalter, Ed.D. Summer 2010 Department of Kinesiology.
The present publication was developed under grant X from the U.S. Department of Education, Office of Special Education Programs. The views.
Smarter Balanced Assessment Update English Language Arts February 2012.
The use of asynchronously scored items in adaptive test sessions. Marty McCall Smarter Balanced Assessment Consortium CCSSO NCSA San Diego CA.
Destination--- Common Core Staff Meeting/SSC February 2013.
Measuring Human Intelligence with Artificial Intelligence Adaptive Item Generation Sangyoon Yi Susan E. Embretson.
1 An Investigation of The Response Time for Maths Items in A Computer Adaptive Test C. Wheadon & Q. He, CEM CENTRE, DURHAM UNIVERSITY, UK Chris Wheadon.
Oregon Department of Education Office of Assessment and Accountability Jim Leigh and Rachel Aazzerah Mathematics and Science Assessment Specialists Office.
Based on Common Core.  South Carolina will discontinue PASS testing in the year This will be a bridge year for common core and state standards.
Multiple Perspectives on CAT for K-12 Assessments: Possibilities and Realities Alan Nicewander Pacific Metrics 1.
Oxford Preparatory Academy Scholar Academy Parent Social Topic: Changes in State Testing May 4, 5, and 6, 2015.
ASSOCIATION OF WASHINGTON MIDDLE LEVEL PRINCIPALS WINTER MEETING -- JANUARY 24, 2015 Leveraging the SBAC System to Support Effective Assessment Practices.
Building the NCSC Summative Assessment: Towards a Stage- Adaptive Design Sarah Hagge, Ph.D., and Anne Davidson, Ed.D. McGraw-Hill Education CTB CCSSO New.
NATIONAL CONFERENCE ON STUDENT ASSESSMENT JUNE 22, 2011 ORLANDO, FL.
Nurhayati, M.Pd Indraprasta University Jakarta.  Validity : Does it measure what it is supposed to measure?  Reliability: How the representative is.
Practical Issues in Computerized Testing: A State Perspective Patricia Reiss, Ph.D Hawaii Department of Education.
Understanding the 2015 Smarter Balanced Assessment Results Assessment Services.
The Design of Statistical Specifications for a Test Mark D. Reckase Michigan State University.
Smarter Balanced Assessments Parent Information. Smarter Balanced Assessment 0 Aligned with Nevada Academic Content Standards, which were developed from.
Assessment in Common Core. Essential Questions How is CAASPP different than STAR? How is SBAC different than CST? What do students have to know and be.
Smarter Balanced Scores & Reports. The new assessment, Smarter Balanced, replaces our previous statewide assessment, the New England Common Assessment.
Update on State Assessment and CCSS Presentation to the West Hartford Parent Teacher Council.
Daniel Muijs Saad Chahine
Evaluation Requirements for MSP and Characteristics of Designs to Estimate Impacts with Confidence Ellen Bobronnikov March 23, 2011.
Statistical Estimation
Principles of Language Assessment
Smarter Balanced Assessment Results
Test Standardization: From Design to Concurrent Validation
Language Arts Assessment Update
Shudong Wang NWEA Liru Zhang Delaware Department of Education
Item pool optimization for adaptive testing
Auditing & Investigations I
Common Core Update May 15, 2013.
Considerations of Content Alignment in CAT
9-12 Smarter Balanced Assessment Update
Mohamed Dirir, Norma Sinclair, and Erin Strauts
Shudong Wang, NWEA Liru Zhang, Delaware DOE G. Gage Kingsbury, NWEA
CAASPP Results 2015 to 2016 Santa Clara Assessment and Accountability Network May 26, 2017 Eric E, Zilbert Administrator, Psychometrics, Evaluation.
Innovative Approaches for Examining Alignment
K-2 Smarter Balanced Assessment Update
Perspectives on Equating: Considerations for Alternate Assessments
Presentation transcript:

Effects of Item Bank Design and Item Selection Methods on Content Balance and Efficiency in Computerized Adaptive Reading Tests with Mix of Short and Long Passages Aligned to Common Core State Standards

Shudong Wang, Northwest Evaluation Association
Liru Zhang, Delaware Department of Education

Paper presented at the NCSA National Conference on Student Assessment, June 22-25, 2015, San Diego, California.

I. Introduction
- The Common Core State Standards (CCSS) define a set of consistent learning goals.
- Any assessment that claims to measure the CCSS must provide evidence that it actually does so.
- The Smarter Balanced Assessment Consortium (SBAC) program has adopted a Cognitive Rigor Matrix (CRM).
- The CRM combines content standards (sub-contents) with cognitive complexity (Depth-of-Knowledge levels, DOK).
- Assessments aligned to the CCSS (such as SBAC) can capitalize on the efficiency of computerized adaptive testing (CAT).
- Test efficiency is defined as the mean number of items required to reach a certain reliability level.
- Reading Comprehension (RC), as part of English Language Arts and Literacy specified by the CCSS, is one of the mandatory accountability measures.
- A short passage is usually associated with one item (a discrete item); a long passage is usually associated with a group of items (a testlet).
- The constraints used to select RC items in CAT operate at two levels (illustrated in the sketch after this list):
  - Passage (testlet) level: topic, number of words, complexity measured by Lexile, etc.
  - Individual item level: item type, content standards (sub-content), cognitive complexity (DOK), etc.
- A major issue in using CAT to measure RC is the feasibility of constraints at both levels and their impact on the test.
- The danger of exhausting the item bank before certain test attributes can be satisfied always exists and could lead to less than optimal measurement precision.
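The two constraint levels above can be pictured with a small Python sketch; the class layout and attribute names (word_count, lexile, dok, and so on) are illustrative choices, not structures taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Item:
    item_id: str
    difficulty: float      # Rasch b parameter
    item_type: str         # e.g., selected response, constructed response
    subcontent: str        # CCSS content standard (sub-content)
    dok: int               # Depth-of-Knowledge level (1-4)

@dataclass
class Passage:
    passage_id: str
    topic: str
    word_count: int
    lexile: int            # text complexity measure
    items: List[Item] = field(default_factory=list)   # the testlet

def median_difficulty(passage: Passage) -> float:
    """Passage-level statistic used when screening testlets against the
    provisional ability estimate (see the Method section)."""
    bs = sorted(it.difficulty for it in passage.items)
    mid = len(bs) // 2
    return bs[mid] if len(bs) % 2 else 0.5 * (bs[mid - 1] + bs[mid])
```

A discrete (short-passage) item would simply stand alone, while a long passage carries its testlet in `items`; the median difficulty defined here is the passage-level statistic used by the testlet selection rule described later.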

The CAT design can vary from adapting at the individual item level, as in classical CAT, to adapting at the level of groups (testlets) of items, as in computer-adaptive multi-stage testing (ca-MST; Luecht & Nungester, 1998; Luecht & Sireci, 2011).

The major purpose of this study is to investigate the effects of item selection methods and different pool distributions on the content validity and efficiency of a mixed CAT and ca-MST (i-CAT) design.

II. Method
Design of Study: one balanced factorial experimental design with 100 replications.

Table 1. Plan of Design

Independent Variables:
- Distribution of Item Bank (B): 1, 2, 3, 4
- Testlet Selection Method (T): 1 (Yes), 2 (No)
- Individual Item Selection Method (I): 1 (IM), 2 (ICM), 3 (ICDM)

Dependent Variables (conditional and overall indices):
- Bias
- SE
- RMSE

The indices are defined in terms of the true theta (T), the number of examinees (N), and the number of replications (R); standard forms are given below.
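The slide legend only lists the symbols (T = true theta, N = number of examinees, R = number of replications). As a sketch, and assuming the standard simulation-study definitions rather than the exact expressions on the slide, the overall indices can be written as:

```latex
% Assumed standard definitions; \hat{\theta}_{ir} is the ability estimate of
% examinee i in replication r and T_i is that examinee's true theta.
\mathrm{Bias} = \frac{1}{NR}\sum_{r=1}^{R}\sum_{i=1}^{N}\left(\hat{\theta}_{ir} - T_i\right)

\mathrm{SE} = \sqrt{\frac{1}{NR}\sum_{r=1}^{R}\sum_{i=1}^{N}
              \left(\hat{\theta}_{ir} - \bar{\hat{\theta}}_{i}\right)^{2}},
\qquad
\bar{\hat{\theta}}_{i} = \frac{1}{R}\sum_{r=1}^{R}\hat{\theta}_{ir}

\mathrm{RMSE} = \sqrt{\frac{1}{NR}\sum_{r=1}^{R}\sum_{i=1}^{N}
                \left(\hat{\theta}_{ir} - T_i\right)^{2}}
```

The conditional indices are the same quantities computed separately at fixed points on the true theta scale.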

Item Banks and Test Property
- IRT model: Rasch (the testlet effect was ignored).
- There are 2,380 items in total in each of the 4 banks: 900 discrete items plus 200 passages (testlets), with the number of items within a testlet ranging from 5 to 10.
- [Table 2. Characteristics of Item Bank and Targeted Test Property: for each bank, the percentage of items at each DOK level (1-4), the percentage of items by content area, and the correlation between item difficulty and DOK level (rb-DOK).]

Ability Estimation Method (sketched in code below)
- Owen Bayesian estimation (OBE; Divgi, 1986) provides the provisional ability estimates used to select items.
- Maximum likelihood estimation (MLE) provides the final scores.

Method for Selecting the First Item
- The first item's difficulty value is 0.5 logit lower than the examinee's true ability, which is drawn from N(0, 1).
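A minimal Python sketch of the scoring pieces just described, under the Rasch model with the testlet effect ignored; the Owen Bayesian provisional estimator is not reproduced here (a simple Newton-Raphson MLE stands in for the final scoring step), and all function names are illustrative.

```python
import numpy as np

def rasch_prob(theta, b):
    """P(correct) under the Rasch model for ability theta and difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)

def first_item_difficulty(true_theta):
    """Slide's rule: first item difficulty is 0.5 logit below the examinee's
    true ability (true abilities drawn from N(0, 1))."""
    return true_theta - 0.5

def mle_theta(responses, difficulties, start=0.0, n_iter=20):
    """Newton-Raphson MLE of ability under the Rasch model (final scoring).
    Note: the MLE is undefined for all-correct or all-wrong response strings;
    a real scorer would handle that case separately."""
    theta = start
    for _ in range(n_iter):
        p = rasch_prob(theta, difficulties)
        grad = np.sum(responses - p)     # first derivative of the log-likelihood
        info = np.sum(p * (1.0 - p))     # negative second derivative
        if info < 1e-8:
            break
        theta += grad / info
    return theta

# Example: one simulated examinee answering five items
rng = np.random.default_rng(0)
true_theta = rng.normal(0.0, 1.0)
bs = np.array([first_item_difficulty(true_theta), -0.2, 0.1, 0.4, 0.8])
xs = (rng.random(5) < rasch_prob(true_theta, bs)).astype(int)
print(mle_theta(xs, bs))
```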

Item Selection Methods

Testlet Level (Testlet Selection Method):
1. |theta estimate - median of testlet item difficulties| < 0.3 or 0.5 logit (Yes)
2. |theta estimate - median of testlet item difficulties| < 5.0 logits (No)

Item Level (Individual Item Selection Method):
- Item information (IM)
- IM + sub-content balance (ICM)
- ICM + DOK balance (ICDM)

Each method (a combination of testlet and individual item selection) uses the same three-tier procedure (see the sketch after this list):
- Tier 1: Select a group of items (such as 5 to 15 items) from the item bank that meet criterion level one (information >= 0.235, plus the sub-content/DOK constraints); if no item can be found, go to Tier 2.
- Tier 2: Select a group of items from the item bank that meet criterion level two (information >= 0.216, plus the sub-content/DOK constraints); if no item can be found, go to Tier 3.
- Tier 3: Select the group of items from the item bank with the best values (by sorting) of the Tier 2 criteria.
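A Python sketch of the three-tier procedure, with the information thresholds taken from the slide; the item representation and helper names are assumptions for illustration.

```python
import numpy as np

# Thresholds from the slide; item attributes and helper names are illustrative.
TIER_INFO_THRESHOLDS = (0.235, 0.216)

def rasch_info(theta, b):
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return p * (1.0 - p)

def select_candidates(theta, items, needed_subcontent=None, needed_dok=None,
                      group_size=10):
    """Three-tier candidate selection.

    `items` is a list of dicts with keys 'b', 'subcontent', 'dok', 'used'.
    Content and DOK filters are applied only when a target is supplied.
    """
    available = [it for it in items if not it['used']]
    if needed_subcontent is not None:
        available = [it for it in available if it['subcontent'] == needed_subcontent]
    if needed_dok is not None:
        available = [it for it in available if it['dok'] == needed_dok]

    # Tiers 1 and 2: keep items whose information clears the threshold.
    for threshold in TIER_INFO_THRESHOLDS:
        tier = [it for it in available if rasch_info(theta, it['b']) >= threshold]
        if tier:
            return tier[:group_size]

    # Tier 3: no item clears a threshold, so rank by information and take the best.
    ranked = sorted(available, key=lambda it: rasch_info(theta, it['b']), reverse=True)
    return ranked[:group_size]
```

Under this sketch, IM calls the function with no content targets, ICM supplies the required sub-content, and ICDM supplies both the sub-content and the DOK level.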

III. Results and Discussion

1. Overall Accuracy
Among the three independent variables (item bank, testlet selection, and item selection), only the testlet selection and item selection methods have statistically significant impacts on bias, SE, and RMSE.
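The slide reports which factors reach significance but not the analysis itself; one plausible way to test the three factors on replication-level results is a factorial ANOVA, sketched here with statsmodels (the input file and column names are assumed).

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Assumed layout: one row per replication x condition, with the factor codes
# from Table 1 (B, T, I) and the overall accuracy indices.
results = pd.read_csv("simulation_results.csv")   # columns: B, T, I, bias, se, rmse

for outcome in ("bias", "se", "rmse"):
    model = ols(f"{outcome} ~ C(B) + C(T) + C(I)", data=results).fit()
    print(outcome)
    print(anova_lm(model, typ=2))   # Type II ANOVA table for the main effects
```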

2. Conditional Accuracy (Item Bank 1 only)

3. Sub-Content (Item Bank 2 only)

4. DOK (Item Banks 1 and 3 only)

5. Item Exposure Rate at Eight True Theta Points (figure panels: B=1, M=1, T=1 and B=1, M=1, T=2)

IV. Conclusions

In general, regardless of the distribution of the item bank:
- Individual item selection has a statistically significant impact on test accuracy; the IM method has the highest accuracy compared with ICM and ICDM.
- Testlet selection has a statistically significant impact on test accuracy; not using the testlet constraint decreases the accuracy of the test.

In general, both the distribution of the item bank and the individual item selection method have a direct impact on test content balance:
- Across different item banks, a test whose attributes are controlled by the item selection method has better content quality: ICM improves sub-content quality and ICDM improves DOK quality in the test.
- For an item bank of limited size, the number of item selection criteria should be kept to a minimum.
- The item exposure rate for testlet items is more difficult to control than that for discrete items; mixing discrete and testlet items produces reasonable results.

Thank you! For any questions: Shudong.wang@NWEA.org