Using the Many-Faceted Rasch Model to Evaluate Standard Setting Judgments: An Illustration With the Advanced Placement Environmental Science Exam. Pamela K. Kaliski et al.

Similar presentations
Project VIABLE: Behavioral Specificity and Wording Impact on DBR Accuracy Teresa J. LeBel 1, Amy M. Briesch 1, Stephen P. Kilgus 1, T. Chris Riley-Tillman.

Knowledge Dietary Managers Association 1 PART II - DMA Certification Exam Blueprint and Exam Development-
Unraveling the Mysteries of Setting Standards and Scaled Scores Julie Miles PhD,
Copyright © 2012 Pearson Education, Inc. or its affiliate(s). All rights reserved
Spiros Papageorgiou University of Michigan
Making Appropriate Pass-Fail Decisions. Dwight Harley, Ph.D. Division of Studies in Medical Education, University of Alberta.
VALIDITY AND RELIABILITY
Advanced Topics in Standard Setting. Methodology Implementation Validity of standard setting.
Experimental Research Designs
1 New England Common Assessment Program (NECAP) Setting Performance Standards.
Setting Performance Standards Grades 5-7 NJ ASK NJDOE Riverside Publishing May 17, 2006.
Jeff Beard Lisa Helma David Parrish Start Presentation.
Overview of field trial analysis procedures National Research Coordinators Meeting Windsor, June 2008.
Chapter 4 Validity.
Setting Alternate Achievement Standards Prepared by Sue Rigney U.S. Department of Education NCEO Teleconference March 21, 2005.
Examining Rounding Rules in Angoff-Type Standard Setting Methods Adam E. Wyse Mark D. Reckase.
Correlational Designs
Wastewater Treatment Plant Operator Exam Setting Performance Standards With The Modified Angoff Procedure.
Measurement and Data Quality
Scales and Indices While trying to capture the complexity of a phenomenon We try to seek multiple indicators, regardless of the methodology we use: Qualitative.
Standard Setting Methods with High Stakes Assessments Barbara S. Plake Buros Center for Testing University of Nebraska.
Establishing MME and MEAP Cut Scores Consistent with College and Career Readiness A study conducted by the Michigan Department of Education (MDE) and ACT,
Overview of Standard Setting Leslie Wilson Assistant State Superintendent Accountability and Assessment August 26, 2008.
Setting Performance Standards for the Hawaii State Alternate Assessments: Reading, Mathematics, and Science Presentation for the Hawaii State Board of.
The Genetics Concept Assessment: a new concept inventory for genetics Michelle K. Smith, William B. Wood, and Jennifer K. Knight Science Education Initiative.
Measuring Mathematical Knowledge for Teaching: Measurement and Modeling Issues in Constructing and Using Teacher Assessments DeAnn Huinker, Daniel A. Sass,
Assessment tool OSCE AH Mehrparvar,MD Occupational Medicine department Yazd University of Medical Sciences.
An Introduction to Measurement and Evaluation Emily H. Wughalter, Ed.D. Summer 2008 Department of Kinesiology.
Chapter 7 Item Analysis In constructing a new test (or shortening or lengthening an existing one), the final set of items is usually identified through.
Review and Validation of ISAT Performance Levels for 2006 and Beyond MetriTech, Inc. Champaign, IL MetriTech, Inc. Champaign, IL.
 Closing the loop: Providing test developers with performance level descriptors so standard setters can do their job Amanda A. Wolkowitz Alpine Testing.
Chapter 12 Correlational Designs EDUC 640 Dr. William M. Bauer.
An Introduction to Measurement and Evaluation Emily H. Wughalter, Ed.D. Summer 2010 Department of Kinesiology.
Employing Empirical Data in Judgmental Processes Wayne J. Camara National Conference on Student Assessment, San Diego, CA June 23, 2015.
Examination of Public Perceptions of Four Types of Child Sexual Abuse Prevention Programs Brandon Kopp Raymond Miltenberger.
SGP Logic Model Provides a method that enables SGP data to contribute to performance evaluation when data are missing. The distribution of SGP data combined.
Measurement Models: Exploratory and Confirmatory Factor Analysis James G. Anderson, Ph.D. Purdue University.
Standard Setting Results for the Oklahoma Alternate Assessment Program Dr. Michael Clark Research Scientist Psychometric & Research Services Pearson State.
Aron, Aron, & Coups, Statistics for the Behavioral and Social Sciences: A Brief Course (3e), © 2005 Prentice Hall Chapter 12 Making Sense of Advanced Statistical.
Assessment Developing an Assessment. Assessment Planning Process Analyze the environment Agency, clients, TR program, staff & resources Define parameters.
Validity and Item Analysis Chapter 4. Validity Concerns what the instrument measures and how well it does that task Not something an instrument has or.
 Research Design Part 2 Variability, Validity, Reliability.
Lecture №4 METHODS OF RESEARCH. Method (Greek. methodos) - way of knowledge, the study of natural phenomena and social life. It is also a set of methods.
1 BUILDING QUALITY LEARNING USING PERIODIC ASSESSMENTS Session Outcomes: Use diagnostic Periodic Assessments as instructional tools for quality enhancement.
NAEP Achievement Levels Michael Ward, Chair of COSDAM Susan Loomis, Assistant Director NAGB Christina Peterson, Project Director ACT.
Dan Thompson Oklahoma State University Center for Health Science Evaluating Assessments: Utilizing ExamSoft’s item-analysis to better understand student.
A Visual How-To Guide (Best viewed in Slide Show Mode)
Assistant Instructor Nian K. Ghafoor Feb Definition of Proposal Proposal is a plan for master’s thesis or doctoral dissertation which provides the.
INTRODUCTION TO ASSESSMENT METHODS USED IN MEDICAL EDUCATION AND THEIR RATIONALE.
Presentation to the Nevada Council to Establish Academic Standards Proposed Math I and Math II End of Course Cut Scores December 22, 2015 Carson City,
Copyright © 2014 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 11 Measurement and Data Quality.
S519: Evaluation of Information Systems Social Statistics Inferential Statistics Chapter 16: reliability and validity.
How the CAP Science and Social Studies Tests Measure Student Growth.
An Introduction to Measurement and Evaluation
It Begins With How The CAP Tests Were Designed
CLEAR 2011 Annual Educational Conference
Assessments for Monitoring and Improving the Quality of Education
ARDHIAN SUSENO CHOIRUL RISA PRADANA P.
The All-important Placement Cut Scores
UMDNJ-New Jersey Medical School
RELATING NATIONAL EXTERNAL EXAMINATIONS IN SLOVENIA TO THE CEFR LEVELS
Making Sense of Advanced Statistical Procedures in Research Articles
Matt Drown The Effects of Immediate Forewarning of Test Difficulty on Test Performance Charles J. Weber Eastern Illinois University George Y. Bizer Union.
Standard Setting for NGSS
Standard Setting Zagreb, July 2009.
Deanna L. Morgan The College Board
Presentation transcript:

Using the Many-Faceted Rasch Model to Evaluate Standard Setting Judgments: An Illustration With the Advanced Placement Environmental Science Exam. Pamela K. Kaliski, Stefanie A. Wind, George Engelhard Jr., Deanna L. Morgan, Barbara S. Plake, and Rosemary Reshetar.

Contents
1. Many-Faceted Rasch Model
2. Multiple Yes-No (MYN) method
3. Instrument
4. Results and Conclusion

Introduction
 Standard setting: "... standard setting refers to the process of establishing one or more cut scores on examinations. The cut scores divide the distribution of examinees' test performances into two or more categories" (Cizek & Bunch, 2007).
 Criteria for evaluating panelist judgments:
 Procedural validity: implementation issues and documentation
 Internal validity: interpanelist and intrapanelist consistency
 External validity: comparisons with other methods
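As a minimal sketch of what "cut scores divide the distribution" means in practice, the snippet below maps raw scores to AP-style categories. The cut-score values and the score scale are invented for illustration; they are not the operational APES cut scores.

```python
import numpy as np

# Hypothetical cut scores on an invented composite scale;
# these are NOT the operational APES cut scores.
cuts = [45, 70, 95, 115]                     # 1/2, 2/3, 3/4, 4/5 boundaries

scores = np.array([38, 52, 71, 99, 120, 144])
ap_category = np.digitize(scores, cuts) + 1  # categories 1..5
print(ap_category)                           # -> [1 2 3 4 5 5]
```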

Many-Faceted Rasch Model

$$\ln\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \lambda_j - \tau_{jk} \qquad \text{(Equation 1)}$$

where n indexes panelists, i indexes items, j indexes rounds, and k indexes the standard setting (modified Angoff) rating categories:
 θ_n is the judged severity for panelist n
 δ_i is the average judged item difficulty for item i
 λ_j is the judged average performance level for round j
 τ_jk is the cut score, or threshold coefficient, from round j for standard setting ratings of k (rating k relative to k − 1)
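As a minimal sketch (not the authors' code), the adjacent-categories form of Equation 1 implies category probabilities that can be computed by cumulating the step terms. The facet estimates below are invented for illustration.

```python
import numpy as np

def mfr_category_probs(theta, delta, lam, tau):
    """Category probabilities implied by Equation 1 for one panelist
    (theta), item (delta), and round (lam), given that round's
    threshold coefficients tau (K-1 thresholds for K categories)."""
    # Adjacent-category log-odds: ln(P_k / P_{k-1}) = theta - delta - lam - tau_k.
    # Cumulative sums give unnormalized log-probabilities; the lowest
    # category serves as the base with log-probability 0.
    steps = theta - delta - lam - np.asarray(tau, dtype=float)
    log_num = np.concatenate(([0.0], np.cumsum(steps)))
    log_num -= log_num.max()          # guard against overflow
    num = np.exp(log_num)
    return num / num.sum()

# Invented facet estimates (in logits), NOT values from the APES study:
probs = mfr_category_probs(theta=0.40, delta=-0.20, lam=0.10,
                           tau=[-1.2, -0.4, 0.4, 1.2])
print(probs.round(3))   # probabilities for the five MYN rating categories
```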

Rating quality indices
 (a) Panelist severity/leniency measures: separation statistics and a chi-square statistic
 (b) Model-data fit: Outfit mean square error (MSE) statistics
 (c) A visual display for comparing panelist judgments on the latent variable
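For reference, Outfit MSE is the unweighted mean of the squared standardized residuals across a panelist's ratings. A minimal sketch under that standard definition (function name and inputs are illustrative):

```python
import numpy as np

def outfit_mse(observed, expected, variance):
    """Unweighted mean-square (Outfit) fit statistic for one panelist.

    observed: the panelist's ratings
    expected: model-expected ratings, E = sum_k k * P_k
    variance: model variances,       W = sum_k (k - E)^2 * P_k
    Values near 1.0 indicate good model-data fit; ratings far from
    expectation inflate the statistic.
    """
    z2 = (np.asarray(observed) - np.asarray(expected)) ** 2 / np.asarray(variance)
    return z2.mean()
```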

Multiple Yes-No (MYN) method
 The MYN method requires panelists to consider the borderline examinee at each cut score and to identify the level at which the borderline examinee would be able to answer each item correctly.
 Panelists considered each item and decided whether or not the borderline examinee in each category would be able to identify the correct answer.

PLDs [slide image: the performance level descriptors used in the standard setting]

 Would a borderline-1/2 student be able to answer this item correctly? If yes, the panelist would circle the 1/2 cut score on the rating form and move on to the next item.
 If no, the panelist would consider the next question about the same item: Would a borderline-2/3 student be able to answer this item correctly? If yes, the 2/3 cut score would be circled for that item and the panelist would move on to the next item.
 If no, the panelist would consider the next question: Would a borderline-3/4 student be able to answer this item correctly? If yes, the 3/4 cut score would be circled and the panelist would move on.
 If no, the panelist would consider the next question: Would a borderline-4/5 student be able to answer this item correctly? If yes, the 4/5 cut score would be circled and the panelist would move on.
 If no, the panelist would consider the final question about the same item: Would a student above the 4/5 borderline be able to answer this item correctly? If yes (which is likely, given that all other possible borderline students have been considered), the Above 5 rating would be circled for that item.
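The questioning sequence above is a first-yes decision chain; a small sketch of that logic (names are illustrative, not taken from the actual rating form):

```python
CUT_SCORES = ["1/2", "2/3", "3/4", "4/5"]

def myn_rating(can_answer):
    """Apply the MYN questioning sequence to one item.

    can_answer(cut): the panelist's yes/no judgment of whether a
    borderline student at that cut score could answer the item.
    Returns the rating circled on the form.
    """
    for cut in CUT_SCORES:
        if can_answer(cut):      # first borderline student who succeeds
            return cut
    return "Above 5"             # no borderline student could answer

# Example: an item only a borderline-3/4 (or stronger) student can answer
print(myn_rating(lambda cut: cut in ("3/4", "4/5")))  # -> 3/4
```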


Instruments
 The Advanced Placement (AP) program is composed of 34 courses and corresponding examinations in 22 subject areas; this study uses the AP Environmental Science (APES) examination.
 Data come from the 2011 administration of the APES exam and the standard setting for that examination.
 The exam contains 100 multiple-choice (MC) items and four constructed-response (CR) items.
 The data analyzed are the ratings from two rounds of item-level judgments provided by the 15 APES panelists.

Research Purpose
 The MFR model is used to evaluate the quality of judgments on MC items provided by panelists who participated in a modified Angoff standard setting for the 2011 APES exam, in which the MYN method was used for MC items.
 Panelist characteristics (gender and level of teaching) are incorporated into the MFR model to determine whether these are explanatory variables that account for differences in panelist ratings.

Results and conclusion
[Slides 12-20 reproduced the results tables and figures from pages 397-405 of the source article, including the Outfit MSE statistics; the images themselves are not preserved in this transcript.]

Future study
 Additional explanatory variables
 Additional statistical models
 Combining the MFR model with other modified Angoff procedures or with Bookmark procedures
 The overall contribution of each facet
 CR questions

 Thus the interaction between θ and ω should be considered in Equation 1.
 A different rating scale structure and a random-effects approach could also be explored.
 With computer-based standard setting, panelists could enter their ratings on a computer, so the time spent on each judgment can be recorded.
 The cut score for each category could be transformed to expected scores.