Teacher Effectiveness and Their Carbon TIME Practices and Knowledge


Qinyun Lin (1), JinHo Kim (2), Ellen Holste (1), Shruti Bathia (2), Karen Draney (2) & Kenneth A. Frank (1)
1: Michigan State University; 2: University of California, Berkeley

Abstract

With pretest and posttest scores estimated from item response theory (IRT), we applied hierarchical linear models to investigate several factors affecting students' success: students' grade levels (middle or high school), the affluence of their schools (measured by percent free and reduced lunch), their prior knowledge (measured by pretest scores), and the identities of their teachers. We also compared different approaches to evaluating teachers' effectiveness in terms of their students' 3D learning of the Carbon TIME curriculum. Finally, we introduced several teacher-level variables measured by surveys and discussed how these variables correlate with their students' learning gains.

1. Teachers DO Make a Difference Among Other Factors

Data: We used the estimated latent proficiency scores from our IRT analysis as students' test scores. These scores measure students' proficiency on the Carbon dimension of the Full Tests. For each student we have a score from both the beginning (pretest) and the end (posttest) of a school year; for 2016-17 the sample comprises 67 teachers and 3,642 students. We also collected teacher-level variables: school affluence measured by percent of free and reduced lunch (FRL), grade level measured as high school or middle school (Gradeband), percent of minority students (Minority), and school average standardized test scores (School scores).

Participating in Carbon TIME: Figure 1 shows the distribution of all students' gain scores; most students improved from pretest to posttest (gain scores greater than 0). A paired t-test shows that the increase from pretest (mean = -0.654) to posttest (mean = 0.704) is statistically significant (p < 0.001), with an effect size of about 0.7. However, Figure 1 also shows that students differ greatly in how much learning growth they gained from participating in Carbon TIME, and different teachers, among many other factors, could produce this variation. We therefore applied multilevel models to study which factors are most important, and how important they are, for the learning students gain from participating in Carbon TIME.

Teachers DO make a difference in helping students learn from Carbon TIME. We ran the following two models:

Model A (unconditional)
Level 1: $Gainscore_{ij} = \beta_{0j} + r_{ij}$
Level 2: $\beta_{0j} = \gamma_{00} + u_{0j}$

Model B (conditional on FRL)
Level 1: $Gainscore_{ij} = \beta_{0j} + r_{ij}$
Level 2: $\beta_{0j} = \gamma_{00} + \gamma_{01} FRL_j + u_{0j}$

where $Gainscore_{ij}$ is the gain score for student $i$ taught by teacher $j$, $FRL_j$ is the percent of free and reduced lunch in teacher $j$'s school, and $u_{0j}$ is the random teacher effect for teacher $j$.

The only difference between the two models is $FRL_j$, which we use to control for classroom characteristics. (In fact, once we control for this variable, the other variables, such as grade level, percent minority, and school test scores, do not make significant contributions to explaining the variance in gain scores.) Figure 2 shows large variance in the observed classroom average gain scores; this variance decreases only a little once we account for the classroom characteristic measured by FRL. In other words, after removing the factor of FRL there are still sizable differences in classroom average learning gains, which indicates the key role of teachers in helping students learn from Carbon TIME. The intra-class correlation in Model A is 24%, indicating a considerable amount of between-teacher variation, and the conditional ICC in Model B is still as high as 22.4%, representing between-teacher variation that remains even after removing the effect of classroom characteristics. This is a high ICC relative to other research (Frank, 1998). Therefore, teachers do make a difference in students' learning in Carbon TIME.

Figure 2. Average gain score for each teacher: observed averages alongside Model A and Model B predicted averages; the grey area shows a 68% confidence interval for the average gain score.
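To make the variance decomposition concrete, below is a minimal sketch of Models A and B as random-intercept models in Python with statsmodels, computing the (conditional) ICC from the estimated variance components. This is an illustration rather than our analysis code; the file name and the columns gain, frl, and teacher are hypothetical stand-ins for our data.

```python
# Sketch of Models A and B as random-intercept models (statsmodels).
# Hypothetical data layout: one row per student, with
#   gain    - posttest minus pretest on the IRT proficiency scale
#   frl     - percent free/reduced lunch in the teacher's school
#   teacher - teacher identifier (the level-2 grouping variable)
import pandas as pd
import statsmodels.formula.api as smf

def icc(fit):
    """Between-teacher share of total variance: tau^2 / (tau^2 + sigma^2)."""
    tau2 = fit.cov_re.iloc[0, 0]   # random-intercept (between-teacher) variance
    sigma2 = fit.scale             # residual (within-teacher) variance
    return tau2 / (tau2 + sigma2)

df = pd.read_csv("carbon_time_2016_17.csv")  # hypothetical file name

# Model A: Gainscore_ij = gamma_00 + u_0j + r_ij (no covariates)
model_a = smf.mixedlm("gain ~ 1", df, groups=df["teacher"]).fit()

# Model B: adds the level-2 covariate FRL_j to the intercept equation
model_b = smf.mixedlm("gain ~ frl", df, groups=df["teacher"]).fit()

print(f"Model A ICC: {icc(model_a):.3f}")              # poster reports ~0.24
print(f"Model B conditional ICC: {icc(model_b):.3f}")  # poster reports ~0.224
```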
2. Who Is the Best Teacher? Different Models to Estimate Teachers' Effectiveness

Value-added models have gained much popularity recently. We fit three models (M1, M2, and M3) to measure teachers' effectiveness in students' learning:

M1 (pretest as a covariate, fixed teacher effects): $Posttest_i = \gamma Pretest_i + Teacher_i \beta + u_i$

M2 (gain score as the outcome, fixed teacher effects): $Posttest_i - Pretest_i = Gainscore_i = Teacher_i \beta + u_i$

M3 (two-level model, random teacher effects):
Level 1: $Posttest_{ij} = \beta_{0j} + \beta_{1j}(Pretest_{ij} - \overline{Pretest}_j) + r_{ij}$
Level 2: $\beta_{0j} = \gamma_{00} + \gamma_{01} FRL_j + \gamma_{02} \overline{Pretest}_j + u_{0j}$; $\beta_{1j} = \gamma_{10}$

Here $Pretest_i$ and $Posttest_i$ are the full pretest and posttest scores for student $i$; $\overline{Pretest}_j$ is the average pretest of all students taught by teacher $j$; and $FRL_j$ is the percentage of free and reduced lunch in teacher $j$'s school.

The correlations among the effectiveness estimates generated by the different models are quite high (all above 0.86). However, there can be sizable differences in rankings across models for specific teachers, a result generally consistent with previous literature (Goldhaber et al., 2014). Figure 3 shows our case study teachers' rankings among the 67 teachers under each of the three models.

Discussion: The change in rankings from M1 and M2 to M3 is as expected. Teachers who were assigned stronger students rank lower once we add control variables for students' backgrounds; for example, Mr. Solomon, whose students were accelerated students, ranks much lower in M3. Conversely, teachers who taught special education students (such as Ms. Estabrook) rank higher once FRL is added to the model to take students' backgrounds into account. These changes across models indicate the sensitivity of teacher effectiveness estimates to model choice: the shifts from M1 and M2 to M3 represent exactly the bias we may incur by omitting crucial factors (here, students' backgrounds measured as percent free and reduced lunch). In this context it is clear that M3 is better than M1 and M2, but how can we know that M3 is good enough and that no other important factor has been omitted? In general, it is problematic to claim cause-effect relationships for learning in individual teachers' classrooms based only on pre-post test scores. Indeed, when we compare the rankings here to our classroom observation data, we find some inconsistencies. For example, Ms. Nolan has been recognized as the very top case study teacher in terms of classroom discourse, yet her ranking based on pre-post scores is only in the middle of the case study teachers, probably because she is less focused than other teachers on test preparation. It might therefore be unfair to evaluate teachers by pre-post test scores alone; there could be many factors that are not observed in the test scores but should be appreciated.
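The contrast between M1 and M2 can be made concrete with a short sketch that fits both fixed-effect specifications with teacher dummies and compares the implied teacher rankings. The rank-correlation step is our own illustration of the comparison, and the file and column names (pre, post, teacher) are hypothetical.

```python
# Sketch of the fixed-effect specifications M1 and M2, plus a rank comparison.
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import spearmanr

df = pd.read_csv("carbon_time_2016_17.csv")  # hypothetical file name
df["gain"] = df["post"] - df["pre"]

# M1: posttest on pretest plus a full set of teacher dummies (no intercept)
m1 = smf.ols("post ~ pre + C(teacher) - 1", data=df).fit()
# M2: gain score on teacher dummies only
m2 = smf.ols("gain ~ C(teacher) - 1", data=df).fit()

# Pull out the teacher coefficients and rank teachers under each model
rank_m1 = m1.params.filter(like="C(teacher)").rank(ascending=False)
rank_m2 = m2.params.filter(like="C(teacher)").rank(ascending=False)

# Agreement between the two rankings (the poster reports correlations
# above 0.86 among effectiveness estimates across models)
rho, _ = spearmanr(rank_m1, rank_m2.loc[rank_m1.index])
print(f"Spearman rank correlation, M1 vs. M2: {rho:.2f}")
```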
3. Science-Teaching Practices, Knowledge, and Grading: How Does Teachers' Teaching Affect Their Students' 3D Learning?

We conducted teacher surveys every year to measure teachers' science knowledge, science-teaching knowledge, grading of student responses, and science-teaching practices (see sample items on the fliers).

Teachers' science knowledge: teachers answer five knowledge questions that are also included in the student tests; we have computer models for four of the questions (alpha = 0.51 for 2016-17).

Teachers' science-teaching knowledge: teachers are asked how they would deal with particular student misunderstandings in specific contexts (correlation = 0.27 between the two questions for 2016-17).

Teachers' grading of student answers: we provided three student responses for each of four items, and for each item we evaluated how far the teacher's ranking of the responses deviates from the correct ranking. Grading of the FATLOSS and BIOMASS items has a correlation of 0.29 for 2016-17.

Teachers' science-teaching practices: teachers report how frequently they use each practice listed in the survey. We identify three groups of practices: 13 Carbon TIME featured practices that are highly emphasized in our project (alpha = 0.87 for 2016-17); 12 practices that are generally encouraged but not specifically highlighted in Carbon TIME (alpha = 0.84 for 2016-17); and 11 practices that Carbon TIME encourages teachers NOT to use in their teaching (alpha = 0.62 for 2016-17).

We tried several models to study how these teacher-level variables help explain the variation in students' learning gains, but there are some inconsistencies between the 2015-16 and 2016-17 results. For example, the 2015-16 data showed a positive effect of Carbon TIME featured practices on students' learning (p = 0.057), yet this effect is no longer statistically significant in the 2016-17 data. Instead, the 2016-17 data show a significant interaction between teachers' Carbon TIME featured practices and students' prior knowledge (measured by students' pretests): the positive effect of featured practices is significantly greater for low-scoring students. There are also findings for which we are still working on possible explanations. For both 2015-16 and 2016-17, we find a statistically significant interaction between teachers' grading of responses and students' prior knowledge: teachers who graded very precisely tended to produce greater learning growth for students with higher pretests. We have a potential explanation for this but need to validate that reasoning further.
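As a pointer for readers who want to reproduce the reliability figures above, the sketch below computes Cronbach's alpha for a block of survey items. The item column names are hypothetical placeholders for the 13 featured-practice items.

```python
# Cronbach's alpha for a composite of survey items:
# alpha = k/(k-1) * (1 - sum of item variances / variance of the item total)
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

survey = pd.read_csv("teacher_survey_2016_17.csv")     # hypothetical file
featured = [f"ct_practice_{i}" for i in range(1, 14)]  # 13 featured items
print(f"alpha, featured practices: {cronbach_alpha(survey[featured]):.2f}")
# poster reports alpha = 0.87 for the featured practices in 2016-17
```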
4. Future Research

Though we have a rough idea of teachers' classroom discourse from our classroom observation data, we are still working out reliable ways to code key variables and make full use of our case study data. Doing so would help us separate robust cause-effect relationships from statistical noise in measuring teachers' effectiveness. As discussed in Part 3, the relationship between teachers' teaching (practices, knowledge, and grading) and students' learning growth remains ambiguous, with some inconsistencies between the 2015-16 and 2016-17 findings. Currently we cluster the survey items into composite scores based on our theories; we may also explore how individual survey items correlate with students' learning growth. We will have more student test scores and teacher survey data in the next two years, which will allow us to test our current findings. Finally, the data presented here come from the Carbon dimension of our assessments only; the other two dimensions would also be important to study.

Acknowledgements

Thank you to the entire Carbon TIME team. This research is supported by grants from the National Science Foundation: A Learning Progression-based System for Promoting Understanding of Carbon-transforming Processes (DRL 1020187) and Sustaining Responsive and Rigorous Teaching Based on Carbon TIME (NSF 1440988). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

References

Frank, K. A. (1998). Chapter 5: Quantitative methods for studying social context in multilevels and through interpersonal relations. Review of Research in Education, 23(1), 171-216. https://doi.org/10.3102/0091732X023001171
Goldhaber, D., Walch, J., & Gabele, B. (2014). Does the model matter? Exploring the relationship between different student achievement-based teacher assessments. Statistics and Public Policy, 1(1), 28-39.