Scaling Up District-Determined Measures


1 Scaling Up District-Determined Measures

2 Personal Introduction
- Teacher for 10 years (5th grade, Gifted Math)
- Finishing PhD at University of Connecticut in Measurement, Evaluation, and Assessment
  - Goal Setting in Teacher Evaluation
  - Combining Multiple Measures in Teacher Evaluation
- Assessment Coordinator for Department of Education: District-Determined Measures

3 Resources
- Technical Guide B
- Webinar Series
- Commissioner's Memoranda
- Technical Advisory Sessions
- Using Current Assessments in DDMs
- Example Assessments
- Other ESE documents (Technical Guide A, Part VII, Regulations)

4 District-Determined Measures
Summative Performance Rating evidence:
- Products of practice (e.g., observations)
- Other evidence relevant to one or more of the four Standards of practice (e.g., student surveys)
- Multiple measures of student learning, growth and achievement, including:
  - Measures of student progress on classroom assessments
  - Measures of student progress on learning goals set between the educator and evaluator

Student Impact Rating evidence:
- Trends and patterns in student learning, growth and achievement
- At least two years of data
- At least two measures
- Statewide growth measures, where available (including MCAS SGP)
- Additional DDMs comparable across schools, grades, and subject matter district-wide

Let's look at precisely how and where DDMs will inform an educator's evaluation. Here we have the required sources of evidence for both evaluation ratings. With respect to the Summative Performance Rating, multiple measures of student learning, growth and achievement comprise one of the three categories of evidence that must be taken into account, and must include measures of student progress on classroom assessments, as well as measures of student progress related to an individual educator's learning goals. [CLICK] District-determined measures can serve this role for the Summative Performance Rating, depending on the nature of the educator's goals or educator plan. With respect to the Impact Rating, data from at least two state or district-wide measures of student learning gains over at least two years is used to determine each educator's impact on student growth. MCAS SGP must be one of these measures for educators with students taking the MCAS, and can be both where available. Bottom line: the Impact Rating is always based on a trend over time of at least two years, and it should reflect a pattern in the results on at least two different assessments, which is where district-determined measures come into play.

5 The Educator Evaluation Framework
[Matrix: the Summative Performance Rating (vertical axis: Exemplary, Proficient, Needs Improvement, Unsatisfactory) crossed with the Rating of Impact on Student Learning (horizontal axis: Low, Moderate, High) determines the Educator Plan: a 1-year or 2-year Self-Directed Growth Plan for Exemplary and Proficient educators, a Directed Growth Plan for Needs Improvement, and an Improvement Plan for Unsatisfactory.]

This graphic is probably familiar to many of you. It shows how the two independent ratings are used to determine the type and length of the Educator Plan an educator will be assigned to. The Summative Performance Rating, reflected here on the vertical axis (the orange bar), determines the type of Educator Plan. The Student Impact Rating, here along the horizontal axis (the purple-ish bar), determines the length of the Educator Plan. STRESS THAT THE TWO RATINGS ARE INDEPENDENT. Highlight where discrepancies are and what they might mean.

6 Guiding Principles

7 DDM Key Questions
- Is the measure aligned to content? Does it assess what the educators intend to teach and what's most important for students to learn?
- Is the measure informative? Do the results tell educators whether students are making the desired progress, falling short, or excelling? Do the results provide valuable information to schools and districts about their educators?

8 Five Considerations
- Measure growth (Technical Guide B)
- Common administration procedure
- Common scoring process
- Translate to an Impact Rating
- Comparability (Webinar 6)

Aligned to content and informative are the two guiding principles districts need to follow to establish a firm foundation for DDMs. "How effectively does the assessment measure growth? Is there a common administration protocol? Is there a common scoring process? How will results on the measure translate into a rating of low, moderate, or high impact? Is the measure comparable within and across grades or subjects district-wide?" After this foundation, districts need to think about how they will start building systems to continuously improve these measures. Improving how growth is measured, coordinating administration, creating effective scoring and data collection, generating impact ratings, and exploring comparability are all important next steps that districts should start thinking seriously about.

9 Using Current Assessments in District Determined Measures

10 Considerations in DDMs
Performance vs. Growth
- Most CEPAs are measures of performance, not growth
- Growth takes into account students' different starting levels of achievement
- Measuring growth gives all students an equal opportunity to demonstrate progress

Unit vs. Year
- Most CEPAs are assessments given after a short unit (about 10 lessons)
- A measure that spans the year provides a more accurate representation of student growth and of an educator's impact

[2:01] These are challenges to using CEPAs as a measure of a TEACHER'S IMPACT. The purpose of the three steps is to use CEPAs in a way that addresses these two challenges. Key Message: These are often the same advantages and disadvantages as those of other current assessments that you are already using.

11 Potential Solutions
- Bringing together multiple assessments (e.g., CEPAs) that measure performance at different points in time
- Multiple administrations of one assessment (or similar versions) during the year
- Using multiple measures of growth to make a single DDM

[2:03] Key Message: In this session, we are using CEPAs in a DDM. This could be either multiple CEPAs, or a single CEPA administered at multiple times during the year. Key Message: This is not the only way to select a DDM; for example, a district may have a significant performance assessment (such as completing a multi-stage research project or producing a concert) that represents a significant sample of the year's learning on its own. However, the issues and challenges in those situations are different.

12 The Larger DDM Context

Preparing (step and key questions):
1. Create a Team: Who will be on the team? What are the key responsibilities?
2. Determine Content to Be Measured: What standards and objectives are the most important to measure?
3. Identify Measure: What measures are already in use in the district? Do components need to be developed or modified before piloting?
4. Prepare to Implement Measure: What is the timeline? How will the measure be scored? How will the data be collected and stored?

Implementing (step and key actions):
5. Test: Administer the measure; score the results; collect the data.
6. Analyze: Analyze student results; analyze the administration and scoring processes.
7. Adjust: Modify components as needed.
8. Repeat: Test, analyze, and adjust.

[2:05] Key Message: The three steps covered in this presentation are just part of a larger process. Remember: the goal is to get started. This is about building capacity. It is also important to remember that these steps are part of a larger process that includes creating a team, administration, scoring, and managing data. These are important aspects of implementing DDMs that will be the focus of future guidance. The focus of this session is just on the identification of a measure.

13 Using CEPAs: Three Steps
Step 1. Identifying key content: What content best represents a meaningful sample of the learning a student should complete during the year?
Step 2. Ensuring that change in performance represents growth
Step 3. Selecting an approach for measuring growth

[2:06] Key Message: This is one of the key questions from Technical Guide B. Identifying key content helps ensure that the DDM is aligned to content. The first step in assessing is always determining what you want students to learn, just as we would do with any measure. What is unique in this context is that instead of determining this independently, districts look across current assessments and select important content that is already assessed. It is important that districts do not let assessment drive instruction.

14 Step 1: Identifying Key Content
Key content may include:
- What content best represents a meaningful sample of the learning a student should complete during the year?
- What content is particularly challenging to teach and for students to learn?
- What content is not currently assessed?
- What content reflects a district priority?
Example: look across the year using a curriculum map to determine what constitutes a meaningful sample.

[2:08] Key Message: There are several ways to arrive at key content. Key Message: We are using a DDM as a measure of impact and student learning, so it should be a meaningful sample. Key Message: A meaningful sample is determined by educators' expertise. In the next two examples, we will focus on comparing standards. Not every unit plan may have clearly articulated standards. It is important to remember that the represented content does not need to be a comprehensive selection of all important content, but rather a meaningful sample. To approach this question, we can look across a curriculum map; this is not the only way to get at this question.

15 Writing Standards Covered in Each MCU
Example: 3rd Grade ELA Model Curriculum Units

[Table: the writing standards (1-10) covered in each of nine Model Curriculum Units: Independent Readers, Author Study, Biography, Extreme Weather, Newspaper, Poetry, Reading Inquiry, Stories Matter, and Whose Story, with an X marking each standard a unit covers.]

[2:11] Key Message: This is an example from a third grade curriculum based on nine of the MCU units. We will assume that these units represent a curriculum map. This example is described in detail in the document called Using Current Assessments in DDMs. To answer the question, we built this table of the writing standards covered in each unit and see that standard 3 is covered multiple times across the year. Key Message: But you have to go back to the original question. Does this content represent a meaningful sample of the learning a student should complete during the year? Standard 3 (W.3.3) reads: Write narratives to develop real or imagined experiences or events using effective technique, descriptive details, and clear event sequences.

16 Types of Key Content
- Key content may be taught repeatedly across the year
- Key content may be taught once during the year

[2:13] This example is of content covered repeatedly across the year. Key Message: It is easier to measure growth in skills that are taught repeatedly across the year. Key Message: You shouldn't assess a skill just because it is easier to measure growth! Key Message: A meaningful sample can be content taught in depth once during the year; however, it still needs to be a meaningful sample of the material covered during the year. Another approach is to bring content together into a theme. For example, in history, students are able to describe how conflict leads to change by providing an increasing number of facts from different time periods over the year.

17 Using CEPAs: Three Steps
Step 1. Identifying key content
Step 2. Ensuring that change in performance represents growth: Are the assessments similar enough to support meaningful inferences about student growth during the year?
Step 3. Selecting an approach for measuring growth

[2:37] So, now that we have discussed key content, the next step is to look at multiple assessments and ask whether they are similar enough that, taken together, they capture growth. Key Message: We want to assess growth on key content because it is fairer to students and teachers. Key Message: From the assessments you want to be able to make a claim about growth with confidence. There is an important difference between confidence and certainty; certainty is not required. Key Message: This is not a yes-or-no question, but always: how can I increase my confidence in making a claim of growth?

18 What is Growth?

[2:42] Ask the group: what is growth? (5 minutes) Make a distinction between change and growth. Key Message: There is a distinction between growth (what actually happens) and ways to measure it (SGPs, the four approaches, etc.). Key Message: There are simple ways (e.g., gain scores on a test) and sophisticated ways (e.g., SGPs) to estimate growth, but they are all efforts to get at a concept you already understand. The goal of this slide is to reiterate that teachers, in their gut, understand what growth is. Key Message: Growth is a concept you understand. Key Message: You need to know where someone began and where they ended to assess growth.

19 Step 2: Does change represent growth?
- Are the assessments similar enough to support meaningful inferences about student growth during the year?
- Do early assessments provide meaningful information about what students do not yet understand?
- Do later assessments provide meaningful information about what students have learned?
- Do students have the opportunity to demonstrate different levels of growth?

[2:44] The goal in this next step is to look carefully at how you assessed where students began and where they ended. Are these assessments similar enough to support a claim of growth? Key Message: Teachers have the necessary skills to lead this process. These are the types of questions you would ask yourself when completing Step 2.

20 Example: 3rd Grade ELA

In the third grade example:
- Each unit has a CEPA covering the same content
- The rubrics are different, making them hard to compare

Solution:
- Identify similar evidence
- Modify the rubrics by adding consistently worded rubric items that assess the same content

[2:45] Key Message: In this example, the team has identified key content that is taught at multiple times during the year. However, they found that the assessments did not support a claim of growth. To solve this problem, they modified the rubrics. We are going to show how they did that.

21 Looking Across Assessments
Rubric items in each unit's CEPA:
- Author Study: Topic Development and Clarity; Evidence and Content Accuracy; Organization; Standard English Conventions; Variety of Sentences; Precise Use of Vocabulary
- Biography: Organization (relevant, well organized, and detailed); Opinion (includes persuasive details that support the writer's opinion); Illustrations (includes engaging illustrations on each page); Text Features (includes text features); Voice (turns the events into a story)
- Newspaper: The Writing Process; Use of Technology; Language and Conventions; Interviewing and Quotes; Descriptive Details
- Poetry: Includes description of the topic of the poem; Includes senses or emotions evoked by the poem; Specific words or phrases to support the response; Personal reflection
- Whose Story: Topic Development; Illustrations; Standard English Conventions

[2:50] Key Orientation: Down the side we have the five units that were identified as including the same content; across we have the rubric items. The list above shows the rubric items for each of the five units that address standard W.3.3. You will notice that each rubric is different (the CEPAs were not designed to be identical). Looking across the units, we see consistent themes. Some are obvious from the titles; for example, Organization appears as a title across several rubrics. Even when rubric items do not have the same title, when we look deeper into the descriptions of each item we notice consistent patterns. For example, in the unit Whose Story, the rubric item Topic Development includes evidence that the piece is organized and clear. To highlight another example, Topic Development and Clarity, which assesses how well a student ties a whole piece together into a coherent narrative, is assessed under a couple of different titles. You may notice that Topic Development includes elements of both of these categories. As a result, the team working with these units decided to write consistent rubric items for these four themes; the consistent items would be added to, or would replace, existing items. Key Message: We are looking for ways to increase our confidence in a claim of growth.

22 3rd Grade ELA Example: Advantages
- Do not need to change what students do
- Rubrics are on a consistent scale
- Each rubric doesn't need to be identical
- Still include important rubric items that are present in only one CEPA (for example, Use of Technology in the Newspaper unit)

[2:52] Wherever possible, reducing change allows us to keep things that worked in the past. What are the advantages of just changing the rubric? We can still capture important differences between the assessments.

23 Using CEPAs: Three Steps
Step 1. Identifying key content
Step 2. Ensuring that change in performance represents growth
Step 3. Selecting an approach for measuring growth: What scoring approach best captures student learning?

[3:06] Finally, once we are confident that we have measures of students' learning, we need to determine the approach for measuring that growth. Key Message: Measuring is the systematic process of assigning a number.

24 Measuring Growth

25 Approaches to Measuring Student Growth
- Pre-Test/Post-Test
- Repeated Measures
- Holistic Evaluation
- Post-Test Only

Not intended to be comprehensive. The focus is on approaches that are practical to implement at the local level AND practical for locally developed measures (as opposed to commercial ones). This is NOT about classifying; it is about considering the range.

26 Pre/Post Test

Description: The same or similar assessments administered at the beginning and at the end of the course or year. Example: a Grade 10 ELA writing assessment aligned to College and Career Readiness Standards given at the beginning and end of the year.
Measuring Growth: The difference between pre-test and post-test scores.
Considerations: Do all students have an equal chance of demonstrating growth? Is one point of growth the same across the entire scale? Are there ceiling or floor effects that hide growth? Is the pre-test meaningful?

Description: There is a lot of familiarity with this method. Growth: Taking the difference is the most common way, but there are other options. Considerations: How can these issues be addressed? If one point of growth is not comparable across the scale, another method may make it more comparable. Address ceiling effects by adding more hard questions. If a pre-test would not be meaningful, perhaps pre/post is not the appropriate approach.
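As a rough, hypothetical sketch of the gain-score arithmetic and the ceiling-effect consideration above (the maximum score and all score pairs are made up for illustration):

```python
# Minimal sketch (hypothetical scores): gain scores from a pre/post-test DDM,
# plus a quick check for students who start near the top of the scale.

MAX_SCORE = 20  # assumed maximum possible score on this assessment

# (pre, post) score pairs for one class
scores = [(5, 12), (11, 17), (18, 20), (20, 20), (8, 15)]

gains = [post - pre for pre, post in scores]
print("Gain scores:", gains)
print("Average gain:", sum(gains) / len(gains))

# Students who begin at or near the maximum cannot show much growth,
# which can hide their learning (a ceiling effect).
near_ceiling = [pre for pre, _ in scores if pre >= MAX_SCORE - 2]
print(f"{len(near_ceiling)} of {len(scores)} students started within "
      f"2 points of the maximum score")
```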

27 Repeated Measures

Description: Multiple assessments given throughout the year. Examples: running records, attendance, the mile run.
Measuring Growth: Graphically; approaches range from the sophisticated to the simple.
Considerations: Less pressure on each administration; authentic tasks.

Authentic tasks take class time, so they should be worthwhile. You also don't want students to get "used to" the assessment and have growth be conflated with experience with the assessment.

28 Repeated Measures Example: Running Records

[Graph: number of errors on each running record, plotted by date of administration, for several students.]

Running records are short checks on the accuracy of a student's reading, used in early literacy classrooms. From this graph we can see that: students have different levels of growth; the January 24th sample may have been hard; and there is more improvement from September to November (is this an artifact or real?). Computing growth: you could fit regression lines, or take the difference between the average of the first three and the last three points.
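As a sketch of the two scoring ideas mentioned in the notes (a fitted trend line versus comparing the first and last few administrations), using made-up running-record data:

```python
# Minimal sketch (hypothetical data): two simple ways to summarize growth
# from repeated measures such as running records.
import numpy as np

administrations = np.arange(8)                   # administration index: 0..7
errors = np.array([14, 13, 11, 12, 9, 8, 8, 6])  # errors per running record (lower is better)

# Option 1: fit a trend line; the slope is the average change per administration.
slope, intercept = np.polyfit(administrations, errors, 1)
print(f"Trend: {slope:.2f} errors per administration")

# Option 2: compare the average of the first three and last three administrations.
first_avg = errors[:3].mean()
last_avg = errors[-3:].mean()
print(f"First three average: {first_avg:.1f}, last three average: {last_avg:.1f}, "
      f"change: {last_avg - first_avg:.1f}")
```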

29 Holistic

Description: Assess growth across student work collected throughout the year. Example: Tennessee Arts Growth Measure System.
Measuring Growth: A growth rubric (see example).
Considerations: Rating can be challenging and time consuming; it is an option for multifaceted performance assessments.

Measuring Growth: You can also have the pieces rated individually, but the growth measure itself looks across the original pieces of work.

30 Holistic Example: Growth Rubric for Details

1. No improvement in the level of detail (one is true): no new details across versions; new details are added but not included in future versions; a few new details are added that are not relevant, accurate, or meaningful.
2. Modest improvement in the level of detail: a few details are included across all versions; many details are added but not included consistently, or none are improved or elaborated upon; many details are added, but several are not relevant, accurate, or meaningful.
3. Considerable improvement in the level of detail (all are true): there are many examples of added details across all versions; at least one detail is improved or elaborated in future versions; details are consistently included in future versions; the added details reflect relevant and meaningful additions.
4. Outstanding improvement in the level of detail: on average, multiple details are added across every version; there are multiple examples of details that build and elaborate on previous versions; the added details reflect the most relevant and meaningful additions.

The example comes from Austin, a first grader at Anser Charter School in Boise, Idaho, who completed these scientific drawings throughout the year. Notice that the rubric is not based on the quality of any individual drawing, but looks for growth across the examples. Used with permission from Expeditionary Learning. Learn more about this and other examples at

31 Post-Test Only

Description: A single assessment, or data that is paired with other information. Example: an AP exam.
Measuring Growth (where possible): Use a baseline, or assume an equal starting point.
Considerations: May be the only option for some indirect measures. What is the quality of the baseline information?

This approach is unlikely to be used for the 5 pilot areas. Example: looking at graduation rate for a principal, where a realistic pre-test does not exist. Remember, the first goals are that the measure be aligned and informative. Graduation rate may be a very well aligned, meaningful, and informative piece of information that nonetheless provides weaker evidence about growth. As a result, future improvement should focus on how growth is measured.

32 Examples

- Portfolios: measuring achievement vs. growth
- Unit Assessments: looking at growth across a series
- Capstone Projects: may be a very strong measure of achievement

Not all assessments fall under the earlier labels so easily. Here are three that are trickier and are called out in the regulations.

33 Comparability

34 What is comparability?

- Comparable within a grade, subject, or course across schools within a district: identical measures are recommended.
- Comparable across grades and subjects district-wide: Impact Ratings should have a consistent meaning across educators; therefore, DDMs should not have significantly different levels of rigor.

Comparable within a grade, subject, or course: ESE recommends that the DDMs used in all schools within a district for a particular grade/subject or course be identical. For example, all chemistry teachers in a district should typically administer the same DDMs. This allows for educator collaboration in discussing results and adjusting instruction, and for better calibration in scoring across educators. However, there may be good reasons for a district to choose measures that are comparable but not identical. For example, elementary schools within a large district may have significantly different school-level goals due to differences in student populations; in that case, the district may elect to assign DDMs to educators that align with their schools' goals. For example, 5th grade teachers in one school may be assigned an ELA and a math DDM, while 5th grade educators in another are assigned a math and a science DDM because their school has a stronger STEM focus.

Comparable across grades and subjects district-wide: District-determined measures provide the opportunity to compare student outcomes across grades, subjects, and educator roles. Districts need to be cautious when making these comparisons across different assessments and situations. A critical consideration is whether a year's worth of growth on a Grade 1 DDM is of comparable difficulty to a year's worth of growth on a Chemistry DDM. DDMs should not have significantly different levels of rigor: it should not be more challenging for chemistry students to demonstrate a year's worth of growth than it is for 1st grade students.

35 Comparability (Type 1): Comparable Across Schools

- Example: teachers with the same job (e.g., all 5th grade teachers)
- Where possible, measures are identical; identical measures are easier to compare
- Do identical measures provide meaningful information about all students?
- When might they not be identical?
  - Different content (different sections of Algebra I)
  - Differences in untested skills (reading and writing on a math test for ELL students)
  - Other accommodations (fewer questions for students who need more time)

36 Error and Bias

Error is the difference between a student's true ability and the student's score.
- Random error: e.g., a student sleeps poorly, a lucky guess, etc.
- Systematic error (bias): error that occurs for one type or group of students, e.g., ELL students misread a set of questions.

Why this matters:
- Random error (OK) decreases with longer or additional measures.
- Bias (BAD) does not decrease with longer or additional measures.
- Even with identical DDMs, bias threatens comparability.
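A tiny simulation, with made-up numbers, can make this contrast concrete:

```python
# Minimal simulation (hypothetical numbers): averaging over more items or
# administrations shrinks random error, but systematic error (bias) remains.
import random

random.seed(0)
TRUE_SCORE = 50.0
BIAS = -3.0  # e.g., a group of students systematically misreads some items


def average_observed(n_measurements, noise_sd=5.0, bias=BIAS):
    """Average of n noisy measurements of the same true score."""
    observations = [TRUE_SCORE + bias + random.gauss(0, noise_sd)
                    for _ in range(n_measurements)]
    return sum(observations) / len(observations)


for n in (1, 10, 100, 1000):
    print(f"n={n:4d}  average observed score: {average_observed(n):.2f}")

# The averages settle near 47 (true score plus bias), not 50: more measurement
# reduces random error but leaves the bias intact.
```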

37 When does bias occur?

- Situation: Students who score zero on the pre-test have less of an opportunity to demonstrate growth because there isn't an accurate baseline measure (floor effect).
- Situation: Special education students gain fewer points from pre-test to post-test, and as a result are less likely to be labeled as having high growth.

38 Checking for Bias

- Do all students have an equal chance to grow?
- Is there a relationship between the pre-test score and the gain score? We want to see little to no relationship between pre-test scores and gain scores.
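One minimal way to run this check, assuming matched pre- and post-test scores for a class (all numbers made up):

```python
# Minimal sketch (hypothetical scores): is there a relationship between
# pre-test scores and gain scores?
import numpy as np

pre = np.array([4, 6, 9, 12, 15, 17, 19, 20])
post = np.array([10, 13, 15, 17, 19, 19, 20, 20])
gain = post - pre

r = np.corrcoef(pre, gain)[0, 1]
print(f"Correlation between pre-test score and gain score: {r:.2f}")

# A value near 0 is what we want. A strong negative value suggests high-scoring
# students had less room to grow (ceiling effect); a strong positive value
# suggests low-scoring students systematically grew less.
```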

39 Correlation Example: Pre-test Compared to Gain Score

- Very low correlation: students of all ability levels were equally likely to demonstrate growth.
- Negative correlation: students of high ability systematically demonstrated less growth (due to a ceiling effect).
- Positive correlation: students with lower ability systematically demonstrated less growth (bias).

40 Interpreting Correlation

- A strong correlation is an indication of a problem.
- A low correlation is not a guarantee of no bias: there may be a strong effect in a small sub-population, or counteracting effects at both the low and high ends.
- Use common sense.
- Always look at a graph: create a scatter plot and look for patterns.
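A sketch of the scatter plot itself, assuming matplotlib is available and reusing the same made-up scores:

```python
# Minimal sketch (hypothetical scores): the scatter plot behind the correlation
# check. Requires matplotlib.
import matplotlib.pyplot as plt
import numpy as np

pre = np.array([4, 6, 9, 12, 15, 17, 19, 20])
gain = np.array([6, 7, 6, 5, 4, 2, 1, 0])

plt.scatter(pre, gain)
plt.xlabel("Pre-test score")
plt.ylabel("Gain score (post minus pre)")
plt.title("Do all students have an equal chance to grow?")
plt.show()

# Look for patterns a single correlation can miss: a drop-off only at the top
# of the scale, or a small subgroup clustered well below everyone else.
```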

41 Example of Bias at the Teacher Level

Teacher A and Teacher B both have students of two kinds:
- Lower-scoring students: pre-test 3, post-test 4, gain 1
- Higher-scoring students: pre-test 8, post-test 14, gain 6

Even though similar students gained the same amount, Teacher A's average gain is 2 while Teacher B's average gain is 5, because the two teachers have different numbers of each kind of student.

42 Solution: Grouping

Grouping allows teachers to be compared based on similar students, even when the number of those students is different.

Average growth by group:
- Low students: Teacher A = 1, Teacher B = 1
- High students: Teacher A = 6, Teacher B = 6

Wherever possible you should fix bias in the assessment itself, for example by writing more high-level items.

43 Addressing Bias: Grouping
How many groups?
- What bias are you addressing?
- Are there enough students in each group?

Using groups:
- Weighted average
- Rule based (e.g., all groups must be above a cut-off)
- Professional judgment
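One possible sketch of the weighted-average idea, with hypothetical student counts chosen so the raw averages match the Teacher A and Teacher B example above:

```python
# Minimal sketch (hypothetical gain scores): comparing teachers within groups
# of similar students, then combining groups with a weighted average.

# gains[teacher][group] = gain scores for that teacher's students
gains = {
    "A": {"low": [1, 1, 1, 1], "high": [6]},
    "B": {"low": [1],          "high": [6, 6, 6, 6]},
}

# Weights for combining the group averages (here, equal weight per group),
# instead of letting class composition drive the comparison.
group_weights = {"low": 0.5, "high": 0.5}

for teacher, groups in gains.items():
    raw_avg = (sum(sum(g) for g in groups.values())
               / sum(len(g) for g in groups.values()))
    weighted_avg = sum(group_weights[name] * (sum(g) / len(g))
                       for name, g in groups.items())
    print(f"Teacher {teacher}: raw average gain {raw_avg:.1f}, "
          f"group-weighted average gain {weighted_avg:.1f}")

# Raw averages differ (2.0 vs 5.0) even though similar students gained the
# same amount; the group-weighted averages are identical (3.5 vs 3.5).
```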

44 Comparability (Type 2): Comparability Across Different DDMs

- Across different grades and subject matter: are different DDMs held to the same standard of rigor?
- Does not require an identical number of students in each of the three groups of low, moderate, and high
- A common-sense judgment of fairness

45 Scoring & Impact Rating

46 Standard Setting

- Growth doesn't need to be an increase in scores
- Designing three rubrics across the year where a 3 represents being on track
- Setting standards qualitatively
- Check for variability

47 Look for Variability

We want to see variability in the student growth results for each DDM, and subgroup variability is important. Variability suggests that all students are able to demonstrate their knowledge and skills regardless of ability level, with no floor or ceiling effects.

48 Looking for Variability
[Two graphs of student growth results.]

The second graph is problematic: because so many students fall into the "high" growth category, it gives us little information about the difference between average and high growth.
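A quick, hypothetical sketch of this variability check, echoing the pattern in the problematic second graph:

```python
# Minimal sketch (hypothetical counts): checking how student growth results
# spread across the low / moderate / high categories for one DDM.
from collections import Counter

growth_ratings = ["low"] * 4 + ["moderate"] * 10 + ["high"] * 36

counts = Counter(growth_ratings)
total = len(growth_ratings)
for category in ("low", "moderate", "high"):
    share = counts[category] / total
    print(f"{category:>8}: {counts[category]:3d} students ({share:.0%})")

# If nearly everyone lands in one category (here, "high"), the measure is not
# distinguishing moderate from high growth, as in the problematic second graph.
```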

49 Other Resources

50 Student Impact Rating Rollout:
School Year 2013-14:
- September 2013: Decide which DDMs to pilot and submit the list to ESE.
- October 2013 to May 2014: Conduct pilots and research potential DDMs in other disciplines.
- June 2014: Determine the final plan for DDM implementation beginning in the next school year (request extensions for particular grades/subjects or courses).

Following school year: Implement DDMs and collect Year 1 Student Impact Rating data for all educators (with the exception of educators who teach the particular grades/subjects or courses for which an extension has been granted).

Year after that: Implement DDMs, collect Year 2 Student Impact Rating data, and determine and report Student Impact Ratings for all educators (with the exception of educators who teach the particular grades/subjects or courses for which the district has received an extension).

A complete measure includes:
- Directions for administering the measure, e.g., how long students have to complete the work, required materials, etc.
- Student directions, e.g., what they can and cannot say, do, or use, how much time they have, etc.
- Instrument, e.g., a set of test questions, a writing prompt, etc.
- Scoring method, e.g., an answer key, a rubric, etc.
- Scoring directions, e.g., single or multiple raters, a process for resolving differences in raters' scores for a student, etc.

51 Example Assessments Now Available
WestEd identified over 150 assessments that provide over 800 options for different grades/subjects and courses. Options include:
- Traditional and non-traditional assessments
- Commercially available and locally developed options (including submissions from MA districts; thank you!)

Multiple views of the data are available:
- By pilot area
- By assessment coverage area
- By approach (build, borrow, buy)
- A full sortable HTML table

[1:50] To support districts, ESE contracted with WestEd to identify available assessments that would support districts in selecting DDMs. The collection includes over 150 assessments. Key Message: Using one of the example assessments is one approach to choosing a DDM. Access Examples Here:

52 Example Assessments Now Available
Key features of each option are described in a one-page summary. Each one-pager includes a link for additional information.

[1:52] Describe what is on each page: the name of the assessment and a link to either the vendor's website or the assessment itself; a description of the assessment; and a set of check boxes that give an at-a-glance view of the type of assessment.

53 Ongoing Work with WestEd
Goal: collect more complete and open-source assessments. We need your help! Please submit:
- Assessment directions, materials, items, and prompts
- Scoring resources (rubrics, answer keys)
- Other information (steps for administration, lesson plans, examples of how results are used)

To submit your examples, complete the survey and upload your materials at

[1:54] Key Message: The stronger the quality of the example library, the less work there is for districts. Key Message: Maybe today you will make something you want to share.

54 Hosting & Stories

55 Questions and Follow Up
Questions about the process of developing and improving DDMs can be directed to Craig Waterman at
Policy questions about DDMs and the teacher evaluation framework can be directed to Ron Noble at

[3:30]

