The Relationship Between State High School Exit Exams and Mathematical Proficiency: Analyses of the Complexity, Content, and Format of Items and Assessment Protocols By Blake Regan
Overview
- Problem statement and motivation for study
- Conceptual model
- Research questions
- Analyses
- Participating states and results
- Discussion
- Limitations
- Recommendations
- Implications
- Conclusion
Problem Statement
- Pressures of the No Child Left Behind Act of 2001
  - Mathematics and reading assessments yearly from Grades 3 through 8 and once from Grades 10 through 12
  - Schools are financially rewarded or penalized based on test scores and results
- Teaching to the test
  - Teachers are pressured to push content-focused learning aside to prepare students for tests
- Students who have been classified as proficient by high school exit exams are failing to be college- and career-ready
(Boyd, 2008; Bunch, 2004; Kupermintz, 2001; Lutzer, Rodi, Kirkman, & Maxwell, 2007; NCLB, 2002)
Motivation for Study
To determine whether the high-stakes assessments used across the nation to certify high school students as proficient in mathematics, and that serve as a gateway to graduation, encourage teachers to teach in a manner likely to result in genuine development of proficiency across the full range of desirable mathematical behaviors.
Traditional Model
Conceptual Model (p. 47)
Research Questions
Analysis A (student scores)
- Which complexity level of items best predicts student success on high school exit exams?
- Which content strand addressed by the items best predicts student success on high school exit exams?
- Which item format best predicts student success on high school exit exams?
Analysis B (assessment protocols)
- Are states' high school exit exams and cut scores aligned with their respective definitions of mathematical proficiency?
- To what extent, if at all, do state high school exit exams encourage mathematical proficiency as defined by the NRC (2001)?
Analysis A
- Binary logistic regression: "used to analyze data in studies where the outcome variable is binary or dichotomous" (Warner, 2008, p. 931)
- Students' proficiency classification was the outcome variable
  - Proficient or better was scored 1
  - Below proficient was scored 0
- Student sub-scores on different types of items were the predictor variables
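As a concrete illustration, a minimal sketch of this kind of binary logistic regression — the sub-score structure, coefficients, and sample size below are invented for illustration and are not data from the study:

```python
# Sketch of Analysis A's binary logistic regression on simulated data.
# Predictors: hypothetical sub-scores by item complexity; outcome: 1 = proficient.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
# Hypothetical sub-scores (0-10 points) on low-, moderate-, and high-complexity items
X = rng.integers(0, 11, size=(n, 3)).astype(float)
# Simulated proficiency outcome; the weights here are assumptions for the demo
logits = -6.0 + 0.2 * X[:, 0] + 0.4 * X[:, 1] + 0.6 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

model = LogisticRegression().fit(X, y)
# Odds ratios indicate how strongly each item type predicts proficiency
odds_ratios = np.exp(model.coef_[0])
print(dict(zip(["low", "moderate", "high"], odds_ratios.round(2))))
```

Each odds ratio gives the multiplicative change in the odds of a proficient classification per additional point on that item type, which is how predictive power is compared across item types.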
Analysis B: Acceptability Criteria
Categorical concurrence
- Yes: 6 or more items per standard and complexity level
- No: fewer than 6 items per standard and complexity level
Depth-of-knowledge consistency
- Yes: 50% or more of the points from items at or above the moderate complexity level
- Weak: 40% to less than 50% of the points from items at or above the moderate complexity level
- No: less than 40% of the points from items at or above the moderate complexity level
Range-of-knowledge correspondence
- Yes: 50% or more of benchmarks are addressed by at least one item
- Weak: 40% to less than 50% of benchmarks are addressed by at least one item
- No: less than 40% of benchmarks are addressed by at least one item
Balance of representation
- Yes: BORI value of .7 or greater
- Weak: BORI value greater than .6 and less than .7
- No: BORI value less than .6
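The acceptability bands above can be written directly as small decision functions. This is a sketch of the cutoffs as stated on the slide, not code from the study (note the slide assigns a BORI of exactly .6 to neither Weak nor No; the sketch places it in No):

```python
# Acceptability ratings for the Analysis B alignment criteria.

def categorical_concurrence(items_per_category: int) -> str:
    # Yes: 6 or more items per standard and complexity level
    return "Yes" if items_per_category >= 6 else "No"

def three_band_rating(pct: float) -> str:
    # Shared bands for depth-of-knowledge consistency (percent of points
    # at or above moderate complexity) and range-of-knowledge correspondence
    # (percent of benchmarks addressed by at least one item)
    if pct >= 50:
        return "Yes"
    if pct >= 40:
        return "Weak"
    return "No"

def balance_of_representation(bori: float) -> str:
    # Yes: BORI >= .7; Weak: .6 < BORI < .7; No: otherwise
    if bori >= 0.7:
        return "Yes"
    if bori > 0.6:
        return "Weak"
    return "No"

print(three_band_rating(45), balance_of_representation(0.65))
```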
Exam 1
- 68,784 student samples
- Cut score was 30 of 60 possible points
- Complexity level of items was not provided
  - Researcher compiled a group of 15 volunteers: mathematicians, a scientist, businessmen, parents, teachers, a principal, and a superintendent
  - E-mailed volunteers the definitions of each complexity level and an electronic version of the exam
  - Held a phone conference to classify each item; a unanimous decision was required before an item was classified
Exam 2a
- One of two assessments used by the state to assess students' achievement of mathematical proficiency
- Administered in the eleventh grade
- 62,043 student samples
- Cut score was 40 of 65 possible points
- Complexity level of each item was provided
Exam 2b
- Second of two assessments used by the state to assess students' achievement of mathematical proficiency
- Students who fail to be classified as proficient by Exam 2a must be classified as proficient by Exam 2b to qualify for graduation
- Administered in the eleventh grade
- 62,043 student samples
- Cut score was 28 of 40 possible points
Exam 3
- Administered in the tenth grade
- 65,535 student samples
- Cut score was 17 of 46 possible points
- Complexity level of items was provided
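The four cut scores above translate into quite different percentage thresholds, which is worth seeing side by side before the discussion of each state; a quick calculation using the figures from these slides:

```python
# Cut scores from the exam slides, expressed as a fraction of possible points,
# to make the four exams comparable at a glance.
cut_scores = {
    "Exam 1": (30, 60),
    "Exam 2a": (40, 65),
    "Exam 2b": (28, 40),
    "Exam 3": (17, 46),
}
pct = {exam: cut / total for exam, (cut, total) in cut_scores.items()}
for exam, p in pct.items():
    print(f"{exam}: {p:.0%}")
```

Exam 3's threshold (roughly 37% of possible points) is the lowest of the four, while Exam 2b's (70%) is the highest.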
Summary of Results for Exams 1, 2a, 2b, and 3 (p. 136)
Best predictive power
- Complexity level: High; Moderate; Low
- Content strand: Patterns, relations, and algebra; Patterns, functions, and algebra; Data analysis and probability
- Format: Open-response; Multiple-choice
Alignment
- Categorical concurrence (complexity-by-content strand): YES for 2 of 15 categories; YES for 4 of 9 categories; YES for 2 of 12 categories; NO; YES
- Depth-of-knowledge consistency: WEAK
- Range-of-knowledge correspondence and balance of representation: YES for 3 and WEAK for 2 content strands
Discussion (State 1)
Exam 1
- Requires students to earn at least one point from every content strand and a minimum of four points from moderate- and high-complexity items
- Recommend an increase in the total number of items so the categorical concurrence requirement can be met
- Appropriately assesses students' achievement of mathematical proficiency
Discussion (State 2)
Exam 2a
- Requires students to earn at least one point from every content strand and a minimum of 17 points from moderate- and high-complexity items
- Appropriately assesses students' achievement of mathematical proficiency
Exam 2b
- Fails to require students to correctly answer an item from each content strand
- Inappropriately assesses students' achievement of mathematical proficiency
Discussion (State 3)
Exam 3
- Fails to require students to receive one point from each content strand, as well as a point from either moderate- or high-complexity items
- Inappropriately assesses students' achievement of mathematical proficiency
- Setting the cut score at 23, half of the total possible points
- Moderate-complexity, high-complexity, and extended-response items have more predictive power than the null model
- Depth-of-knowledge consistency
Discussion (Overall)
- No one item type was the best predictor of student classification across all the exams analyzed
  - Greatest amount of variation between complexity levels
  - Least amount of variation between content strands addressed
- The exams balanced the weight and power of the content strands; the same vigilance is needed for complexity level
- Range-of-knowledge correspondence
- Categorical concurrence of each content strand separately, at low and moderate complexity
- Not one exam met the categorical concurrence requirement for every complexity-by-content-strand category
Recommendations
- Other assessments of mathematical proficiency
- Correlation between these findings and teachers' feelings toward these assessments
- Correlation between these findings and teachers' techniques and practices for preparing students for the exams
- Assessments prior to NCLB
- Assessments in other content areas
Implications
- Influence on teachers
- Cut scores
- Assessments for the Common Core State Standards
- Meeting the NCLB deadline of 2014
- Relationship between Exam 1 and State 1's rank according to NAEP 2009
Conclusion
To meet the challenges set forth by NCLB and the ever-expanding technological world, and most importantly for the success of U.S. students, it is imperative that exams that propose to assess student achievement be designed appropriately and critically.
References
Boyd, B. T. (2008). Effects of state tests on classroom test items in mathematics. School Science and Mathematics, 108(6), 251–262.
Bunch, M. B. (2004). Ohio Graduation Tests standard setting report: Reading and mathematics. (T. Moore, Ed.) Columbus, OH: Ohio Department of Education.
Kopko, E. (2009). State SAT scores 2009. Retrieved from Best and Worst States: http://blog.bestandworststates.com/2009/08/25/state-sat-scores-2009.aspx
Kupermintz, H. S. (2001). Teacher effects as a measure of teacher effectiveness: Construct validity considerations in TVAAS (Tennessee Value Added Assessment System). Paper presented at the annual meeting of the National Council on Measurement in Education, Seattle, WA.
Lutzer, D. J., Rodi, S. B., Kirkman, E. E., & Maxwell, J. W. (2007). Statistical abstract of undergraduate programs in the mathematical sciences in the United States: Fall 2005 CBMS survey. Providence, RI: American Mathematical Society.
National Center for Education Statistics. (2010). The nation's report card: Grade 12 reading and mathematics 2009 national and pilot state results. Washington, DC: U.S. Department of Education. Retrieved from http://nces.ed.gov/nationsreportcard/pdf/main2009/2011455.pdf
No Child Left Behind (NCLB) Act of 2001, Pub. L. No. 107-110, § 115 Stat. 1425 (2002).
Warner, R. M. (2008). Applied statistics: From bivariate through multivariate techniques. Thousand Oaks, CA: Sage.