An Exploratory Analysis of Teaching Evaluation Data
Michael D. Martinez
Department of Political Science
College of Liberal Arts and Sciences
University of Florida
August 17, 2011
Questions
- What does our teaching evaluation instrument actually tell us about our teaching?
- Are the items that students use to evaluate us actually measuring different things?
- Do the items in the teaching evaluation instrument actually produce a reliable scale?
- How much, on average, are teaching evaluations affected by level of instruction and size of class?
Teaching Evaluation Form from a Social Science Perspective
Closed- and open-ended questions:

Questions         Format        Asked      Publicly Visible
Part I, Q 1-7     Closed-ended  UF-wide    Yes
Part I, Q 8-9     Closed-ended  CLAS only  No
Part I, Q 10      Closed-ended  UF-wide    Yes
Part I, Q 11-15   Closed-ended  UF-wide    No
Part II, Q 1-5    Open-ended    UF-wide    No
Closed-Ended Questions
Open-Ended Questions
Data
- All CLAS sections, Fall 1995 through Spring 2010
- Only includes publicly visible data:
  - Excludes CLAS-only items (Q8-9)
  - Excludes "control" variables (Q11-15)
  - Excludes open-ended questions
What does our teaching evaluation instrument actually tell us about our teaching?
- Are the items that students use to evaluate us actually measuring different things? Probably not: students act as though they develop an overall attitude about the class and rate it on almost all items based on that attitude.
- Do the items in the teaching evaluation instrument actually produce a reliable scale? Yes.
Inter-Item Correlations (CLAS Data, Fall 1995 through Spring 2010)
[Pairwise correlation matrix among Q1-Q7 and Q10; the individual correlation values were lost in conversion.]
Cronbach's alpha = 0.978
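The deck reports these scale statistics without showing the computation. Below is a minimal sketch of how the inter-item correlations and Cronbach's alpha could be computed in Python, assuming the per-section item means sit in a pandas DataFrame with one column per item; the file name and column names are hypothetical, not from the original.

```python
# A sketch, not the author's actual code: inter-item correlation matrix and
# Cronbach's alpha for the eight publicly visible items.
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the summed scale)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item across sections
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical input: one row per section, one column per item mean.
# evals = pd.read_csv("clas_evals.csv")
# items = evals[["Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7", "Q10"]]
# print(items.corr().round(3))      # inter-item correlations
# print(cronbach_alpha(items))      # the slide reports 0.978
```

An alpha near 1 means the items move together so tightly that they behave as one scale, which is the slide's point: the items are probably not measuring different things.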
How much, on average, are teaching evaluations affected by level of instruction and size of class?
SOME, but less than might be expected.

Q10 = a + b1·Lower + b2·Grad + b3·log(Enrollment) + e

Dummy coding (upper-division courses are the reference category):

              Lower   Grad
Lower level     1       0
Upper level     0       0
Grad level      0       1
How much, on average, are teaching evaluations affected by level of instruction and size of class?
SOME, but less than might be expected.

Q10 = a + b1·Lower + b2·Grad + b3·log(Enrollment) + e

- b1 will be the average effect of teaching a lower-division course, relative to an upper-division course, controlling for the size of the class.
- b2 will be the average effect of teaching a graduate course, relative to an upper-division course, controlling for the size of the class.
- b3 will be the average effect of the log of class size, controlling for the level of the class.
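The slides do not show the estimation itself. Here is a minimal sketch of this specification in Python with statsmodels, assuming a hypothetical section-level table with columns Q10, level, and enrollment; all of those names are assumptions, not from the original.

```python
# A sketch of the specification above, with hypothetical column names;
# upper-division sections serve as the omitted reference category.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def fit_level_size_model(sections: pd.DataFrame):
    df = sections.copy()
    df["lower"] = (df["level"] == "lower").astype(int)  # 1 for lower-division sections
    df["grad"] = (df["level"] == "grad").astype(int)    # 1 for graduate sections
    df["log_enroll"] = np.log(df["enrollment"])
    # Q10 = a + b1*lower + b2*grad + b3*log_enroll + e
    return smf.ols("Q10 ~ lower + grad + log_enroll", data=df).fit()

# Hypothetical usage:
# sections = pd.read_csv("clas_sections.csv")
# print(fit_level_size_model(sections).summary())  # b1, b2, b3 with standard errors
```

Because upper division is the omitted category, the intercept a is the expected Q10 for an upper-division section, and b1 and b2 are shifts relative to that baseline.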
Regression of Instructor Evaluation (Q10) on Level of Course and Class Size (log)

             CLAS
Lower        (.004)
Graduate     .121 (.007)
Lg enroll    (.003)
Constant     (.008)
R2           .052
N of cases

Entries are unstandardized regression coefficients, with standard errors in parentheses. [The point estimates for Lower, Lg enroll, and Constant, and the N of cases, were lost in conversion.]
Regression of Instructor Evaluation (Q10) on Level of Course and Class Size (log)

             CLAS          Humanities    Soc and Beh Sci   Phys and Math Sci
Lower        (.004)        (.006)        (.010)            (.007)
Graduate     .121 (.007)   .095 (.014)   .074 (.013)       .273 (.014)
Lg enroll    (.003)        (.005)        (.005)            (.004)
Constant     (.008)        (.016)        (.017)            (.013)
R2           .052
N of cases

Entries are unstandardized regression coefficients, with standard errors in parentheses. [Point estimates other than the Graduate row, the non-CLAS R2 values, and the Ns were lost in conversion.]
Expected Values: Humanities
[Predicted Q10 by class size (Size) for lower, upper, and graduate courses; the table values were lost in conversion.]

Expected Values: Phys and Math Sci
[Predicted Q10 by class size (Size) for lower, upper, and graduate courses; the table values were lost in conversion.]
Expected Values: Soc and Beh Sci
[Predicted Q10 by class size (Size) for lower, upper, and graduate courses; the table values were lost in conversion.]

Expected Values: Political Sci
[Predicted Q10 by class size (Size) for lower, upper, and graduate courses; the table values were lost in conversion.]
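Tables like the ones above can be generated by predicting Q10 from the fitted model over a grid of class sizes and levels. A sketch building on the hypothetical fit_level_size_model helper from the earlier slide; the size grid here is illustrative, not the one the slides used.

```python
# A sketch of generating expected values from the fitted model above;
# the class-size grid and column names are illustrative assumptions.
import numpy as np
import pandas as pd

def expected_values(model, sizes=(10, 30, 100, 300)):
    grid = pd.DataFrame(
        [(s, lvl) for s in sizes for lvl in ("lower", "upper", "grad")],
        columns=["enrollment", "level"],
    )
    grid["lower"] = (grid["level"] == "lower").astype(int)
    grid["grad"] = (grid["level"] == "grad").astype(int)
    grid["log_enroll"] = np.log(grid["enrollment"])
    grid["expected_Q10"] = model.predict(grid)
    return grid.pivot(index="enrollment", columns="level", values="expected_Q10")

# Hypothetical usage, per college or for a single department:
# humanities = sections[sections["college"] == "Humanities"]
# print(expected_values(fit_level_size_model(humanities)))
```

Pivoting puts class sizes on the rows and course levels on the columns, matching the layout of the expected-values tables above.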
Morals of the Story
- We have a reliable teaching evaluation instrument which is definitely measuring something. Sections that are evaluated positively on one item tend to be evaluated positively on other items.
- Reliability isn't validity. Response set could be a problem, but the cost of fixing it would be a disruption in the continuity of the data that we have.
- Like GRE scores, these scores should be regarded as a good measure, but not the only measure.
Morals of the Story
- Most variation in course evaluations is NOT accounted for by level of instruction or class size.
- Both class size and level of instruction matter, but should not be regarded as excuses for really low evaluations.
Darts and Laurels
- Laurel: Brian Amos, my graduate research assistant, for help with analyzing these data.
- Laurel: Dave Richardson and the CLAS office, for making these data available to me.
- Dart: Academic Affairs, for not making these data publicly available in a usable form to everyone.
- Laurel: Academic Affairs, for (finally) creating a link that allows promotion candidates and award nominees to generate teaching evaluation reports in Word automatically with just a few clicks.
Darts and Laurels
- Dart: Academic Affairs, for not posting the CLAS-only items, and for not posting the teaching evaluations of graduate students who taught their own courses.
- Laurel: Academic Affairs, for an otherwise much-improved website showing evaluations.
- Laurel: You, for patiently listening. Thanks!