Published by Robert Freeman. Modified over 9 years ago.
1
Using Classroom Artifacts to Measure Instructional Practice in Middle School Mathematics: A Two-State Field Test
Hilda Borko, Suzanne Arnold, Beth Dorman, Karin Kuffner (CU-Boulder)
Brian Stecher, Mary Lou Gilbert, Alice Wood (RAND Corporation)
CRESST Conference 2004
September 10, 2004
2
Artifact Packages for Characterizing Instructional Practice: A Validation Study
Goal: an instrument to capture instructional practice reliably and efficiently
Rationale for artifact packages
  Richer descriptions than surveys
  Fewer resource demands than case studies
Validation study to investigate reliability and validity
3
The Scoop Metaphor
“What is it like to learn mathematics in your classroom?”
A Scoop of Classroom Material
One way that scientists study unfamiliar territory (e.g., freshwater wetlands, Earth’s crust) is to scoop up all the material they find in one place and take it to the laboratory for careful examination. Analysis of a typical Scoop of material can tell a great deal about the area from which it was taken.
We would like to do something similar in classrooms, i.e., scoop up a typical week’s worth of material and use it to learn about the class from which it was taken. The artifacts would include assignments, homework, tests, projects, problem solving activities, and anything else that is part of instruction during the week.
4
The Scoop Notebook
“Scoop” a typical week’s worth of instructional materials
Variety of methods for capturing instructional practice
  Daily calendar
  Instructional materials
  Samples of student work
  Photographs
  Teacher reflections
5
Methods
Participants
  36 middle school mathematics teachers
  Teachers from Colorado (23) and California (13)
  Variety of curricula, traditional to reform
  Data from 30 teachers used in reliability and validity analyses
Data collection
  Scoop Notebook completed by teacher (5 days of instruction)
  Researcher observation and ratings (2-3 days)
  Audiotape of instruction (8 teachers, 2-3 days)
6
Scoring Guide
11 Dimensions of Classroom Practice
  Collaborative Grouping
  Structure of Lessons
  Multiple Representations
  Use of Mathematical Tools
  Cognitive Depth
  Discourse Community
  Explanation & Justification
  Problem Solving
  Assessment
  Connections & Applications
  Overall
Additional ratings: (Notebook Completeness), (Confidence)
7
Rating Observations and Notebooks
Five-point rating scale
Scoring Guide with descriptions and examples for each dimension:
  high (5)
  medium (3)
  low (1)
8
Scoring Guide Example: Problem Solving
Overall Description: Extent to which instructional activities enable students to identify, apply and adapt a variety of strategies to solve problems. Extent to which problems that students solve are complex and allow for multiple solutions.
[NOTE: this dimension focuses more on the nature of the activity/task than the enactment. To receive a high rating, problems should not be routine or algorithmic; they should consistently require novel, challenging, and/or creative thinking.]
High: Students work on problems that are complex, integrate a variety of mathematical topics, and draw upon previously learned skills. Problems lend themselves to multiple solution strategies and have multiple possible solutions. Problem solving is an integral part of the class’ mathematical activity, and students are regularly asked to formulate problems as well as solve them.
Example: During a unit on measurement, students regularly solve problems such as: “Estimate the length of your family’s car. If you lined this car up bumper to bumper with other cars of the same size, about how many car lengths would equal the length of a blue whale?” After solving the problem on their own, students compare their solutions and discuss their solution strategies. The teacher reinforces the idea that there are many different strategies for solving the problem and a variety of answers because the students used different estimates of car length to solve the problem.
9
Ratings of Instructional Practice
Notebook Only: contents of Scoop Notebook
Gold Standard: observations and contents of Scoop Notebook
Notebook + Discourse: transcripts of audio-taped classroom lessons and contents of Scoop Notebook
10
Reliability Research Questions
Do raters agree on the scores they assign to the dimensions of classroom practice, based on the Scoop Notebook?
Is agreement among raters higher for some dimensions than others?
Is agreement among raters higher for some teachers than others?
11
Agreement Among Raters: Calculation Procedures
Three raters per notebook; pairs of ratings compared
Ratings 1-2-3: three pairs (1,2), (1,3), and (2,3)
  Exact agreement = 0%
  Within 1 rating point = 67%
Ratings 4-4-1: three pairs (4,4), (4,1), (4,1)
  Exact agreement = 33%
  Within 1 rating point = 33%
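The pairwise calculation on this slide can be sketched in a few lines of Python. This is only an illustration of the arithmetic for a single notebook and dimension; the function name `pairwise_agreement` and its `tolerance` parameter are hypothetical and do not come from the study's own analysis code.

```python
from itertools import combinations

def pairwise_agreement(ratings, tolerance=0):
    """Share of rater pairs whose ratings differ by at most `tolerance`.

    `ratings` holds one rating per rater for a single notebook/dimension.
    tolerance=0 gives exact agreement; tolerance=1 gives agreement
    within 1 rating point.
    """
    pairs = list(combinations(ratings, 2))  # three raters -> three pairs
    agreeing = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
    return agreeing / len(pairs)

# The two rating triples used as examples on the slide.
for ratings in [(1, 2, 3), (4, 4, 1)]:
    exact = pairwise_agreement(ratings, tolerance=0)
    within_one = pairwise_agreement(ratings, tolerance=1)
    print(ratings, f"exact={exact:.0%}", f"within 1={within_one:.0%}")
# (1, 2, 3) exact=0% within 1=67%
# (4, 4, 1) exact=33% within 1=33%
```

Averaging these per-notebook values across teachers or across dimensions would presumably give the summary percentages reported on the following slides, although the slides do not spell out the aggregation step.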
12
Agreement by Dimension
Average ratings across teachers close to 3.0 for all dimensions
Relatively high levels of agreement for all dimensions
  Exact agreement ranged from 21.1% to 44.3%
  Agreement within 1 point ranged from 70.1% to 82.3%
Agreement fairly consistent across dimensions
13
Agreement by Teacher
Wide range of values
  Average notebook ratings: 1.55 to 4.21
  Exact agreement: 12.0% to 60.5%
  Agreement within 1: 57.5% to 97.0%
No apparent relationship to:
  Average notebook rating (traditional versus reform practices)
  Notebook completeness
  Rater confidence
14
Validity Research Questions
1. Do ratings based only on the Scoop Notebook agree with ratings based on the Scoop Notebook and classroom observations (“Gold Standard” ratings)?
  Is agreement higher for some dimensions than others?
  Is agreement higher for some teachers than others?
2. Are there differences in the ratings of Colorado teachers and California teachers?
3. Do ratings based on the Scoop Notebook and transcripts of classroom lessons agree with Gold Standard ratings?
15
Methods Similar to the Reliability Analysis
Comparisons between average Notebook Only rating (averaged across 3 raters) and Gold Standard rating
Two levels of agreement (on 5-point scale)
  Within 0.33
  Within 0.67
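A minimal sketch of this comparison follows. It rests on two assumptions that the slide does not state: that each Gold Standard rating is a single whole-number score, and that the 0.33 and 0.67 cut-offs are rounded forms of 1/3 and 2/3 (the only gaps possible between a three-rater average and a whole number). The function names and the sample ratings are invented for illustration and are not study data.

```python
from fractions import Fraction

def notebook_gold_gap(notebook_ratings, gold_rating):
    """Absolute gap between the mean Notebook Only rating and the Gold Standard."""
    mean = Fraction(sum(notebook_ratings), len(notebook_ratings))
    return abs(mean - gold_rating)

def agreement_rate(cases, threshold):
    """Share of cases whose notebook/gold gap is at most `threshold`."""
    hits = sum(1 for nb, gold in cases if notebook_gold_gap(nb, gold) <= threshold)
    return hits / len(cases)

# Hypothetical (three Notebook Only ratings, Gold Standard rating) pairs.
cases = [
    ((3, 3, 4), 3),  # gap 1/3
    ((2, 3, 4), 4),  # gap 1
    ((5, 4, 4), 5),  # gap 2/3
    ((1, 2, 2), 3),  # gap 4/3
]

# Exact fractions avoid the rounding trap where 1/3 > 0.33 as a float.
print(agreement_rate(cases, Fraction(1, 3)))  # 0.25
print(agreement_rate(cases, Fraction(2, 3)))  # 0.5
```

Using `Fraction` rather than floats is a design choice for this sketch: a literal threshold of 0.33 would exclude a gap of exactly one third, which is probably not what the slide intends.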
16
Agreement by Dimension
Moderately high levels of agreement for all dimensions
  Agreement within 0.33 ranged from 30.0% to 53.3% across the 11 dimensions
  Agreement within 0.67 ranged from 43.3% to 66.7%
Differences in agreement among dimensions make sense
  Structure of Lessons “easy” to rate
  Mathematical Discourse and Assessment more “difficult” to rate
17
Agreement by Teacher
Pattern similar to reliability data
Large differences among teachers in levels of agreement
  Agreement within 0.33 ranged from 9.09% to 81.8%
  Agreement within 0.67 ranged from 9.09% to 90.0%
Level of agreement is not related to:
  Average notebook rating
  Notebook completeness
  Rater confidence
18
Notebooks Detect Known Differences in Curriculum
Average ratings differed for teachers using traditional vs. reform-based curricula
  Notebook ratings: 3.42 vs. 2.59
  Gold standard ratings: 3.47 vs. 2.30
Differences between ratings varied by dimension and match known differences in the curricula
  Ratings most alike on Structure of Lessons and Assessments
  Ratings most different on Cognitive Depth, Discourse Community, etc.
19
Validity Analyses with Classroom Transcripts
How do the ratings based on the Scoop Notebook and transcripts of classroom lessons compare to Gold Standard ratings?
To what extent does analysis of classroom discourse provide additional insights about instructional practices?
20
Discourse Plus Scoop Notebook vs. Gold Standard
Exact agreement occurred in 45.4% of cases
  Range across dimensions:
    Grouping: 14.3%
    Structure of Lessons: 71.4%
Agreement within 1.0 point occurred in 92.2% of cases
  Agreement within 1 was 100% for 7 of 11 dimensions
In general, relatively high levels of agreement
21
Qualitative Analysis
On which dimensions does discourse provide more information and insights than the Scoop Notebook alone?
  Mathematical Discourse Community
  Explanation/Justification
  Cognitive Depth
  Connections/Applications
  Assessment
22
Additional Insights: Mathematical Discourse Community
How teacher solicits, explores, & attends to student thinking
How teacher models & emphasizes use of mathematical language
Student-to-student communication
Common classroom discourse patterns (e.g., IRE; more open ended)
23
Conclusions: Feasibility of the Approach
Teachers were interested, supportive, and cooperative
Teachers were able to follow artifact collection instructions well
  Notebooks returned in timely manner
  Student work represented a broad range of curriculum and instructional activities
  Photographs and reflections were descriptive
24
Conclusions: Reliability and Validity
Agreement among raters is reasonably high for all dimensions and very high for some
Agreement between Notebook Only ratings and Gold Standard ratings is moderately high for all dimensions
Some dimensions and teaching practices present greater challenges than others for artifact-based tools such as the Scoop Notebook
Raters reported struggling with some dimensions (e.g., Mathematical Discourse Community) more than others
Information about classroom discourse provides additional insights about some dimensions
Disagreements among raters may be greater when there are inconsistencies in the data
25
Implications and Future Directions
Scoop Notebook is useful for describing instructional practice in broad terms
Results do not support use of the Scoop Notebook to make judgments about individual teachers
Additional research needed to answer questions such as:
  Why are some classrooms and teachers more difficult to rate than others?
  Are there systematic differences among individual raters?
Possible future uses of the Scoop Notebook
  Tool for professional development
  Trace changes in teachers over time or across different instructional units