1 New England Common Assessment Program Item Review Committee Meeting March 30, 2005 Portsmouth, NH
2 Welcome and Introductions
Tim Kurtz – Director of Assessment, New Hampshire Department of Education
Michael Hock – Director of Educational Assessment, Vermont Department of Education
Mary Ann Snider – Director of Assessment & Accountability, Rhode Island Department of Education
Tim Crockett – Assistant Vice President, Measured Progress
3 Logistics
Meeting Agenda
Committee Member Expense Reimbursement Form
Substitute Reimbursement Form
NECAP Nondisclosure Form
Handouts of presentations
4 Morning Agenda
Test Development: Past, Present & Future
How we got here – Tim Kurtz, NH DoE
Statistical Analyses – Tim Crockett, MP
Bias/Sensitivity – Michael Hock, VT DoE
Depth of Knowledge – Ellen Hedlund, RI DoE & Betsy Hyman, RI DoE
Schedule – Tim Kurtz, NH DoE
So, what am I doing here?
5 Item Review Committee
How did we get to where we are today?
Tim Kurtz, Director of Assessment, New Hampshire Department of Education
6 NECAP Pilot Review
1st Bias Committee meeting – March
1st Item Review Committee meeting – April
2nd Item Review Committee meeting – July
2nd Bias Committee meeting – July
Face-to-Face meetings – August
Test Form Production and DOE Reviews – August
7 NECAP Pilot Review: Reading and Mathematics
Printing and Distribution – September
Test Administration Workshops – October
Test Administration – October 25-29
Scoring – December
Data Analysis & Item Statistics – January
Teacher Feedback Review – February
Feedback has affected item review, accommodations, the style guide, and administration policies
Item Selection meetings – February & March
8 NECAP Pilot Review: Writing
Printing and Distribution – December & January
Test Administration – January
Scoring – March
Data Analysis & Item Statistics – April
Item Selection meetings – April & May
9 NECAP Pilot Review
What data were generated from the pilot, and what do we do with them?
Tim Crockett, Assistant Vice President, Measured Progress
10 Item Statistics
● The review of data and items is a judgmental process
● Data provide clues about the item:
● Difficulty
● Discrimination
● Differential Item Functioning
11 At the top of each page...
12 The Item and any Stimulus Material
13 Item Statistics Information
14 Item Difficulty (multiple-choice items)
● Percent of students with a correct response. Range is from .00 (difficult) to 1.00 (easy)
● NECAP needs a range of difficulty, but below .30 may be too difficult and above .80 may be too easy
15 Item Difficulty (constructed-response items)
Average score on the item. Range is from 0.00 to 2.00 or from 0.00 to 4.00
On 2-point items: below 0.4 may be too difficult; above 1.6 may be too easy
On 4-point items: below 0.8 may be too difficult; above 3.0 may be too easy
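To make the two difficulty statistics concrete, here is a minimal sketch in Python (not NECAP's actual analysis code; the function names are illustrative, and the flag thresholds are the ones quoted on the two slides above):

```python
# Minimal sketch of the difficulty statistics described above.
# `mc_responses` holds 0/1 scores for a multiple-choice item;
# `cr_scores` holds 0..max_points scores for a constructed-response item.

def mc_difficulty(mc_responses):
    """p-value: proportion of students answering the MC item correctly (.00 to 1.00)."""
    return sum(mc_responses) / len(mc_responses)

def cr_difficulty(cr_scores):
    """Average score on a constructed-response item."""
    return sum(cr_scores) / len(cr_scores)

def flag_difficulty(value, low, high):
    """Flag an item whose difficulty falls outside the preferred band,
    e.g. (.30, .80) for MC items or (0.8, 3.0) for 4-point CR items."""
    if value < low:
        return "may be too difficult"
    if value > high:
        return "may be too easy"
    return "within range"

# Example: 7 of 10 students correct -> p = .70, within the .30-.80 band
p = mc_difficulty([1, 1, 0, 1, 1, 0, 1, 1, 0, 1])
print(p, flag_difficulty(p, 0.30, 0.80))
```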
16 Item Discrimination
● How well an item separates higher-performing students from lower-performing students
● Range is from -1.00 to 1.00
● The higher the discrimination, the better
● Items with discriminations below .20 may not be effective and should be reviewed
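The slides do not name the exact discrimination statistic. A common choice consistent with the -1.00 to 1.00 range is the item-total (point-biserial) correlation, sketched here under that assumption:

```python
import statistics

def item_discrimination(item_scores, total_scores):
    """Pearson correlation between item score and total test score
    (a point-biserial when the item is scored 0/1). Range: -1.00 to 1.00.
    Higher values mean the item better separates high and low performers."""
    mean_i = statistics.fmean(item_scores)
    mean_t = statistics.fmean(total_scores)
    cov = sum((i - mean_i) * (t - mean_t)
              for i, t in zip(item_scores, total_scores)) / len(item_scores)
    return cov / (statistics.pstdev(item_scores) * statistics.pstdev(total_scores))

# Per the slide above, items with discrimination below .20 would be flagged for review.
```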
17 Other Discrimination Information (multiple-choice items)
18 Differential Item Functioning
● DIF (F-M) – females are compared to males who performed the same on the test, looking at their performance on the item
● A positive number reflects females scoring higher
● A negative number reflects males scoring higher
● NS means no significant difference
19 Item Statistics Information
20 Differential Item Functioning
Multiple-choice items (Dorans and Holland): DIF is classified along a scale from favoring females to favoring males:
High (C) | Low (B) | Negligible (A) | Low (B) | High (C)
For CR items:
–.20 to +.20 represents negligible DIF
beyond –.30 or +.30 represents low DIF
beyond –.40 or +.40 represents high DIF
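For multiple-choice items, the Dorans and Holland reference points to the Mantel-Haenszel procedure, which compares the matched groups stratum by stratum. The sketch below assumes that method and uses the widely cited ETS cutoffs for the A/B/C categories, since the slide's own MC cutoffs did not survive extraction:

```python
from collections import defaultdict
from math import log

def mh_delta(records):
    """Mantel-Haenszel DIF on an MC item. `records` is a list of
    (total_score, group, correct) tuples with group in {"F", "M"} and
    correct in {0, 1}. Students are matched (stratified) on total score.
    Positive delta indicates the item favors females, matching the
    sign convention on slide 18."""
    # Per-stratum counts: [F right, F wrong, M right, M wrong]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for total, group, correct in records:
        offset = 0 if group == "F" else 2
        strata[total][offset + (0 if correct else 1)] += 1

    num = den = 0.0
    for f_right, f_wrong, m_right, m_wrong in strata.values():
        n = f_right + f_wrong + m_right + m_wrong
        num += m_right * f_wrong / n   # males (reference) right, females wrong
        den += m_wrong * f_right / n   # males wrong, females (focal) right
    alpha = num / den                  # MH common odds ratio
    return -2.35 * log(alpha)          # ETS delta scale

def ets_category(delta):
    """Simplified ETS A/B/C classification (an assumption here; the
    full ETS rule also requires statistical significance tests)."""
    size = abs(delta)
    if size < 1.0:
        return "A (negligible)"
    if size < 1.5:
        return "B (low)"
    return "C (high)"
```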
21 Bias/Sensitivity Review
How do we ensure that this test works well for students from diverse backgrounds?
Michael Hock, Director of Educational Assessment, Vermont Department of Education
22 What Is Item Bias?
Bias is the presence of some characteristic of an assessment item that results in the differential performance of two individuals of the same ability but from different student subgroups.
Bias is not the same thing as stereotyping, although we don't want either in NECAP.
We need to ensure that ALL students have an equal opportunity to demonstrate their knowledge and skills.
23 How Do We Prevent Item Bias?
Item Development
Bias-Sensitivity Review
Item Review
Field-Testing Feedback
Pilot-Testing Data Analysis (DIF)
24 Role of the Bias-Sensitivity Review Committee
The Bias-Sensitivity Review Committee DOES need to make recommendations concerning…
Sensitivity to different cultures, religions, ethnic and socio-economic groups, and disabilities
Balance of gender roles
Use of positive language, situations, and images
In general, items and text that may elicit strong emotions in specific groups of students and, as a result, may prevent those groups from accurately demonstrating their skills and knowledge
25 Role of the Bias-Sensitivity Review Committee
The Bias-Sensitivity Review Committee DOES NOT need to make recommendations concerning…
Reading Level
Grade-Level Appropriateness
GLE Alignment
Instructional Relevance
Language Structure and Complexity
Accessibility
Overall Item Design
26 Passage Review Rating Form
“This passage does not raise bias and/or sensitivity concerns that would interfere with the performance of a group of students”
27 Universal Design: Improved Accessibility through Universal Design
28 Universal Design: Improved Accessibility through Universal Design
Inclusive assessment population
Precisely defined constructs
Accessible, non-biased items
Amenable to accommodations
Simple, clear, and intuitive instructions and procedures
Maximum readability and comprehensibility
Maximum legibility
29 Item Complexity
How do we control item complexity?
Ellen Hedlund and Betsy Hyman, Office of Assessment and Accountability, Rhode Island Department of Elementary and Secondary Education
30 Depth of Knowledge
A presentation adapted from Norman Webb for the NECAP Item Review Committee, March 30, 2005
31 Bloom's Taxonomy
Knowledge – Recall of specifics and generalizations; of methods and processes; and of pattern, structure, or setting.
Comprehension – Knows what is being communicated and can use the material or idea without necessarily relating it.
Application – Use of abstractions in particular and concrete situations.
Analysis – Make clear the relative hierarchy of ideas in a body of material, or make explicit the relations among the ideas, or both.
Synthesis – Assemble parts into a whole.
Evaluation – Judgments about the value of material and methods used for particular purposes.
32 U.S. Department of Education Guidelines
Dimensions important for judging the alignment between standards and assessments:
Comprehensiveness: Does the assessment reflect the full range of standards?
Content and Performance Match: Does the assessment measure what the standards state students should both know and be able to do?
Emphasis: Does the assessment reflect the same degree of emphasis on the different content standards as is reflected in the standards?
Depth: Does the assessment reflect the cognitive demand and depth of the standards? Is the assessment as cognitively demanding as the standards?
Consistency with achievement standards: Does the assessment provide results that reflect the meaning of the different levels of achievement standards?
Clarity for users: Is the alignment between the standards and assessments clear to all members of the school community?
33 Mathematical Complexity of Items (NAEP 2005 Framework)
The demand on thinking that the item requires:
Low Complexity – Relies heavily on the recall and recognition of previously learned concepts and principles.
Moderate Complexity – Involves more flexibility of thinking and choice among alternatives than do items in the low-complexity category.
High Complexity – Places heavy demands on students, who must engage in more abstract reasoning, planning, analysis, judgment, and creative thought.
34 Depth of Knowledge (1997)
Level 1 – Recall: Recall of a fact, information, or procedure.
Level 2 – Skill/Concept: Use of information or conceptual knowledge; two or more steps, etc.
Level 3 – Strategic Thinking: Requires reasoning and developing a plan or a sequence of steps; some complexity; more than one possible answer.
Level 4 – Extended Thinking: Requires an investigation and time to think and process multiple conditions of the problem.
40 Practice Exercise
Read the passage “The End of the Storm”
Read and assign a DOK level to each of the 5 test questions
Form groups of 4-5 to discuss your work and reach consensus on a DOK level for each test question
41 Issues in Assigning Depth-of-Knowledge Levels
Variation by grade level
Complexity vs. difficulty
Item type (MC, CR, ER)
Central performance in objective
Consensus process in training
Aggregation of DOK coding
Reliabilities
42 Web Sites
Alignment Tool
Survey of the Enacted Curriculum
43 NECAP Operational Test
What is the development cycle for this year? What is your role in all this?
Tim Kurtz, Director of Assessment, New Hampshire Department of Education
44 NECAP Operational Test
1st Bias Committee meeting – March (6 teachers from each state)
1st Item Review Committee meeting – March (12 teachers from each state in each content area)
2nd Item Review Committee meeting – April
Practice Test on DoE website – early May
2nd Bias Committee meeting – May 3-4
Face-to-Face meetings – May & June 1-3
Test Form Production and DOE Reviews – July
45 NECAP Operational Test
Printing – August
Test Administration Workshops – Aug & Sept
Shipments to schools – September
Test Administration Window – October
,000 students and 25,000 teachers from the 3 states
Scoring – November
Standard Setting – December
Teachers and educators from the three states
Reports shipped to schools – Late January
46 TIRC – So, why are we here?
This assessment has been designed to support a quality program in mathematics and English language arts. It has been grounded in the input of hundreds of NH, RI, and VT educators. Because we intend to release assessment items each year, the development process continues to depend on the experience, professional judgment, and wisdom of classroom teachers from our three states.
47 TIRC – Our Role
We have worked hard to get to this point. Today, you will be looking at passages in reading and some items in mathematics. The role of Measured Progress staff is to keep the work moving along productively. The role of DoE content specialists is to listen and ask clarifying questions as necessary.
48 TIRC – Your Role
You are here today to represent your diverse contexts. We hope that you…
share your thoughts vigorously, and listen just as intensely – we have different expertise and we can learn from each other,
use the pronouns “we” and “us” rather than “they” and “them” – we are all working together to make this the best assessment possible, and
grow from this experience – I know we will.
And we hope that today will be the beginning of some new interstate friendships.
49 Information, Questions and Comments
Tim Kurtz – Director of Assessment, NH Department of Education, (603)
Mary Ann Snider – Director of Assessment and Accountability, Rhode Island Department of Elementary and Secondary Education, (401) ext.
Michael Hock – Director of Educational Assessment, Vermont Department of Education, (802)