A District-initiated Appraisal of a State Assessment's Instructional Sensitivity: HOLDING ACCOUNTABILITY TESTS ACCOUNTABLE. Stephen C. Court.


A District-initiated Appraisal of a State Assessment's Instructional Sensitivity
HOLDING ACCOUNTABILITY TESTS ACCOUNTABLE
Stephen C. Court
Presented in Symposium, American Educational Research Association (AERA) Annual Meeting, May 2, 2010, Denver, Colorado

Accountability
Basic premise: Teaching → Learning → Proficiency
High proficiency rates = Good schools
Low proficiency rates = Bad schools

Accountability
Basic assumption: State assessments distinguish well-taught students from not-so-well-taught students with enough accuracy to support accountability decisions.

Accountability
Q: Is the assumption warranted?
A: Only if the tests are instructionally sensitive. When tests are insensitive, accountability decisions are based on the wrong things, such as socioeconomic status (SES).

Kansas: SES

Kansas: Test Scores

Kansas: Exemplary by SES

The Situation in Kansas
Basic question: Can the instruction in low-poverty districts truly be that much better than the instruction in high-poverty districts? Or do instructionally irrelevant factors (such as SES) distort or mask the effects of instruction?

Multi-district Study
Purpose
– To compare instructional sensitivity appraisal models and methods
– To appraise the instructional sensitivity of the Kansas state assessments
District-initiated because no state-level study had been initiated
– Indicator-level analysis
– Loss/gain because no indicator-level cut scores
Based initially on the empirical approach recommended by Popham (2008)

Tactical Variations
A variety of practical constraints and preliminary findings raised several conceptual and methodological issues. The original design underwent several revisions, which produced several tactical variations involving
– data collection
– data array, analysis, and interpretation

Tactical Variations
See the paper for details. It
– discusses the issues and design revisions
– provides an exegesis of the item-selection criteria and test-construction practices that yield instructional insensitivity
– describes, demonstrates, and compares the tactical variations employed in the collection, array, and analysis of the data, as well as in the interpretation of the results
Due to time constraints, let's focus on just the juiciest jewels…

Study Participants
575 teachers responded
– 320 teachers (grades 3-5 reading and math)
– 129 reading teachers (grades 6-8)
– 126 math teachers (grades 6-8)
14,000 students
Only Grade 5 reading is included in this study. To be reported in June at CCSSO in Detroit:
– other reading results (grades 3-8)
– all math results (grades 3-8)

A Gold Standard
By recommending that teachers be asked to identify their best-taught indicators, Popham (2008) transformed the instructional sensitivity (IS) issue in a fundamental way, both conceptually and operationally: for the first time since IS inquiries began about 40 years ago, there could now be a gold standard independent of the test itself. A huge breakthrough!

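Old and New Model
Old model: A = Non-Learning; B = Learning; C = Slip; D = Maintain
New model: A = True Fail; B = False Pass (II-E, instructionally irrelevant easiness); C = False Fail (II-D, instructionally irrelevant difficulty); D = True Pass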
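The new model's four cells can be sketched as a small classification routine. This is a minimal illustration, not the study's code: the function name `classify`, the boolean inputs, and the sample records are all hypothetical, and the cell assignments follow the slide's labels (not well taught and failing = A, not well taught but passing = B, well taught but failing = C, well taught and passing = D).

```python
from collections import Counter

def classify(well_taught: bool, passed: bool) -> str:
    """Return the new-model cell (A-D) for one student on one indicator."""
    if not well_taught:
        # A = true fail, B = false pass (instructionally irrelevant easiness)
        return "B" if passed else "A"
    # C = false fail (instructionally irrelevant difficulty), D = true pass
    return "D" if passed else "C"

# Hypothetical records: (well_taught, passed) for five students.
students = [(False, False), (False, True), (True, False), (True, True), (True, True)]
cells = Counter(classify(w, p) for w, p in students)
print(dict(cells))
```

Here `well_taught` stands in for the teacher-identified gold standard and `passed` for the student's result on the indicator.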

Initial Analysis Scheme
Initial logic:
If best-taught students outperform other students, the indicator is sensitive to instruction.
If mean differences are small or in the wrong direction, the indicator is insensitive to instruction.
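The initial logic amounts to a comparison of group means. The sketch below illustrates only that first-pass scheme; the function names and the `min_gap` threshold are hypothetical, since the slide does not say how small a difference counts as "small".

```python
from statistics import mean

def mean_difference(best_taught_scores, other_scores):
    """Mean score of best-taught students minus mean score of other students."""
    return mean(best_taught_scores) - mean(other_scores)

def initial_verdict(best_taught_scores, other_scores, min_gap):
    """Apply the initial logic; min_gap is an assumed cutoff for a 'small' gap."""
    gap = mean_difference(best_taught_scores, other_scores)
    # A gap that is small or negative (wrong direction) counts as insensitive.
    return "sensitive" if gap >= min_gap else "insensitive"

# Hypothetical scores on one indicator.
print(initial_verdict([80, 90], [60, 70], min_gap=5))
print(initial_verdict([60, 70], [80, 90], min_gap=5))
```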

Problem
But significant performance differences between best-taught and other students do not necessarily represent instructional sensitivity.
– Affluent students provided ineffective instruction typically end up in Cell B.
– Challenged students provided effective instruction typically end up in Cell C.

Problem
Thus: Means-based and DIF-driven approaches that evaluate between-group differences are not appropriate for appraising instructional sensitivity.
Instead: Focus on the degree to which indicators accurately distinguish effective from ineffective instruction, without confounding from instructionally irrelevant easiness or difficulty.

Conceptually Correct
Rather than comparing group differences in terms of means, let's look instead at the combined proportions of true fail and true pass. That is,
(A + D) / (A + B + C + D)
which can be shortened to
(A + D) / N = Malta Index
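As a minimal sketch, the index can be computed directly from the four cell counts. The function name `malta_index` and the example counts are illustrative assumptions, not values from the study.

```python
def malta_index(a: int, b: int, c: int, d: int) -> float:
    """Proportion of students in the true-fail (A) and true-pass (D) cells."""
    n = a + b + c + d
    if n == 0:
        raise ValueError("at least one student is required")
    return (a + d) / n

# Hypothetical counts: 40 true fail, 10 false pass, 20 false fail, 30 true pass.
print(malta_index(40, 10, 20, 30))
```

With these counts, 70 of the 100 students fall in Cell A or Cell D.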

Malta Index
(A + D) / N
Ranges from 0 to 1 (Completely Insensitive to Totally Sensitive)
In practice: A value of .50 = chance, equivalent to random guessing
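One way to see why .50 corresponds to chance, under the added assumption (not stated on the slide) that half of the students are in the well-taught group: let p be the proportion of well-taught students and q the pass rate, and suppose passing is independent of instruction. The expected proportions in Cell A and Cell D then give:

```latex
\begin{align*}
E[\text{MI}] &= \underbrace{(1-p)(1-q)}_{\text{Cell A}} + \underbrace{p\,q}_{\text{Cell D}} \\
             &= \tfrac{1}{2}(1-q) + \tfrac{1}{2}\,q = \tfrac{1}{2}
             \qquad \text{when } p = \tfrac{1}{2}.
\end{align*}
```

When the two groups are not the same size, the chance level (1 - p)(1 - q) + pq can differ from .50, so the .50 benchmark is best read with balanced groups in mind.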

Totally Sensitive
(A + D) / N = 100 / 100 = 1.0
A perfectly sensitive item or indicator would cluster all students into Cell A or Cell D.

Totally Insensitive
(A + D) / N = (0 + 0) / 100 = 0.0
A perfectly insensitive test clusters all students into Cell B or Cell C.

Useless
(A + D) / N = (25 + 25) / 100 = 0.50 = mere chance
An indicator that cannot distinguish true fail or pass from false fail or pass is totally useless, no better than random guessing.

Malta Index Parallels
The Malta Index is similar conceptually to:
– Mann-Whitney U
– Wilcoxon rank-sum statistic
– Area Under the Curve (AUC) in Receiver Operating Characteristic (ROC) curve analysis
But its interpretation is embedded in the context of instructional sensitivity appraisal.
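The parallel with AUC can be made concrete for the simplest case, where the only "score" is pass or fail. For a binary score, the AUC (equivalently, the normalized Mann-Whitney U) reduces to the average of the true-pass rate among well-taught students and the true-fail rate among the others. The sketch below uses that identity; the function names and counts are hypothetical, and the study's own AUC computation may have differed.

```python
def malta_index(a, b, c, d):
    """(A + D) / N: share of students classified as true fail or true pass."""
    return (a + d) / (a + b + c + d)

def auc_binary(a, b, c, d):
    """AUC when the score is pass (1) or fail (0) and the gold standard is
    well taught (cells C, D) versus not well taught (cells A, B)."""
    true_fail_rate = a / (a + b)   # not-well-taught students who fail
    true_pass_rate = d / (c + d)   # well-taught students who pass
    return (true_pass_rate + true_fail_rate) / 2

# Equal group sizes (50 / 50): the two measures coincide.
print(malta_index(40, 10, 20, 30), auc_binary(40, 10, 20, 30))
# Unequal group sizes (40 / 60): similar but not identical.
print(malta_index(30, 10, 20, 40), auc_binary(30, 10, 20, 40))
```

The Malta Index weights every student equally, while this AUC weights the two instructional groups equally, so the values agree only when the groups are the same size.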

Malta Index
Compared to these other approaches, the Malta Index is easier to
– compute
– understand
– interpret
Thus, it is more accessible conceptually to measurement novices, such as
– teachers
– reporters
– policy-makers

ROC Analysis Malta Index values can be depicted graphically as ROC curves.

Informal Evaluation
Malta Index values can be evaluated informally via acceptability criteria (Hosmer & Lemeshow, 2000), with each value band assigned a letter grade: excellent (A), good (B), acceptable (C), poor (D), fail (F).

Results table: for each indicator and for the Average row, the Malta Index (MI) and AUC under three comparisons: Teacher Ratings (Most vs. Less), Prior Data (Best vs. Not Best), and Prior Data (Best vs. Worst).

Summary and Interpretations
AUC and the Malta Index yield very similar but not identical results.
Identical conclusions overall: Grade 5 reading indicators lack instructional sensitivity
– No indicator was graded better than a C
– Most were in the Poor to Useless range
– Averages ranged from Poor to Useless

Summary and Interpretations
Low instructional sensitivity values for grade 5 reading were disappointing, especially given:
– Local contractor (CETE)
– Guidance from TAC (including Popham and Pellegrino)
– Concerns from the KAAC (including Court)
If Kansas assessments lack instructional sensitivity, what about other states' assessments?

Conclusion
Dear U.S. Department of Education:
Please make instructional sensitivity…
– An essential component in reviews of RTTT funding applications
– A critical element in the approval process of state and consortia accountability plans
When the Department revised its Peer Review Guidance (2007) to include alignment as a critical element of technical quality, states were compelled to conduct alignment studies that they otherwise would not have conducted. Instructional sensitivity deserves similar Federal endorsement.

Presenters Questions, comments, or suggestions are welcome