Student evaluations of teaching: Are they reliable? Are they valid?
1920s: first use. 1960s: widespread use. Typically administered in the last week of classes, before the exam.
High degree of reliability?
- Design: compared one instructor giving several courses vs. one course taught by different instructors.
- Variance explained by ‘course’ (6%) < variance explained by ‘instructor’ (40%).
- e.g. Gillmore et al. 1978, Marsh 1982, Marsh and Roche 1997, Rindermann and Schofield 2001.
- But this design doesn’t account for all ‘levels’; that is, where are the students?
High degree of reliability?
- Cross-classified multilevel analysis.
- Variance explained by ‘student’ (25%) = variance explained by ‘instructor’ (25%).
- 50% of variance unexplained.
- e.g. Spooren 2010, Rantanen 2013, Staufenbiel et al. 2016.
- But these studies focused on only one dependent variable.
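For readers unfamiliar with the method, a cross-classified model for a single rating can be written roughly as below. This is a generic textbook form, not the specification used in the cited studies; the symbols (student effect u_j, instructor effect v_k, residual e) are notation assumed here for illustration.

```latex
% Rating i, given by student j to instructor k (students and instructors are crossed, not nested)
y_{i(jk)} = \mu + u_j + v_k + e_{i(jk)}, \qquad
u_j \sim N(0, \sigma_u^2),\quad v_k \sim N(0, \sigma_v^2),\quad e_{i(jk)} \sim N(0, \sigma_e^2)

% Share of total variance attributed to each source
\text{share}_{\text{student}} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_v^2 + \sigma_e^2}, \qquad
\text{share}_{\text{instructor}} = \frac{\sigma_v^2}{\sigma_u^2 + \sigma_v^2 + \sigma_e^2}
```

Read this way, the figures on this slide correspond to student and instructor shares of 0.25 each, with the remaining 0.50 left in the residual term.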
High degree of reliability?
Feistauer and Richter 2016: cross-classified multilevel analysis.
- Dependent variables: planning and presentation; interaction with students; interestingness and relevance; difficulty and complexity; overall rating of course.
- Number of evaluations = 4224, by 480 students.
- Variance explained by ‘instructor and course’ = 16-35%. This includes variables like room size, class size, room location, and course topic.
- Higher reliability in teacher-centered and lecture-based courses.
- Variance explained by ‘students’ = 11-21%.
- But is this high enough to base administrative and instructional decisions on? What about validity?
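The idea of splitting rating variance into student, instructor, and residual shares can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the method of the cited study: it assumes a fully crossed, balanced table (every student rates every instructor exactly once) and uses two-way ANOVA mean squares, whereas real evaluation data are unbalanced and are fitted with mixed models. The function name `variance_shares` and the toy ratings are made up for illustration.

```python
from statistics import mean


def variance_shares(ratings):
    """Estimate variance shares from a balanced student x instructor table.

    ratings[s][t] is the single rating student s gave instructor t.
    Assumption: one rating per cell, no missing cells (method-of-moments
    estimates from two-way ANOVA mean squares).
    """
    n_s, n_t = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]  # one mean per student
    col_means = [mean(ratings[s][t] for s in range(n_s)) for t in range(n_t)]

    # Mean squares for students, instructors, and the residual.
    ms_student = n_t * sum((m - grand) ** 2 for m in row_means) / (n_s - 1)
    ms_instr = n_s * sum((m - grand) ** 2 for m in col_means) / (n_t - 1)
    ss_resid = sum(
        (ratings[s][t] - row_means[s] - col_means[t] + grand) ** 2
        for s in range(n_s)
        for t in range(n_t)
    )
    ms_resid = ss_resid / ((n_s - 1) * (n_t - 1))

    # Variance components; negative estimates are truncated at zero.
    var_student = max((ms_student - ms_resid) / n_t, 0.0)
    var_instr = max((ms_instr - ms_resid) / n_s, 0.0)
    total = var_student + var_instr + ms_resid
    if total == 0:
        raise ValueError("all ratings are identical; shares are undefined")
    return {
        "student": var_student / total,
        "instructor": var_instr / total,
        "residual": ms_resid / total,
    }


# Toy data (hypothetical): 2 students x 3 instructors, purely additive effects,
# so the residual share is zero by construction.
toy = [[a + b for b in (0, 4, 8)] for a in (0, 2)]
shares = variance_shares(toy)
print(shares)
```

In the toy table the instructor effects vary more than the student effects, so most of the variance lands in the instructor share. With real ratings a large residual share would play the role of the unexplained variance reported in the studies above.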
Validity?
Do student evaluations of teaching evaluate teaching effectiveness?
Validity?
Do student evaluations of teaching evaluate teaching effectiveness? ???
Some things to consider:
- We have some idea of what effective teachers DO while teaching.
- IF you’ve figured out how to assess teaching effectiveness, then WHEN should you do it?
Validity?
Do student evaluations of teaching evaluate teaching effectiveness?
Are students able to assess teaching effectiveness (or even just ‘those things that we do that are good for teaching’)?
Reliability studies:
- Variance explained by ‘students’ = 11-21%.
- Some variance is explained by ‘course’ and not ‘instructor’.
Validity?
Do student evaluations of teaching evaluate teaching effectiveness?
What else can the results of student evaluations predict?
Instructor: age; reputation; gender; sexual orientation; presence during evaluation.
Student: expected final grade; interest in topic; GPA; level of anonymity; gender.
Course: electivity; time of day; topic; year level; mode (online vs in class); class size.
Validity?
Do student evaluations of teaching evaluate teaching effectiveness?
What else can the results of student evaluations predict?
Other considerations:
- Student evaluation results do not differ between those done at the beginning of the semester and those done at the end.
- Even for standardised metrics, like ‘time to grade papers’, studies show that the gender of the instructor makes a difference (e.g. female instructors are reported to take longer, even if they don’t).
Validity?
Do student evaluations of teaching evaluate teaching effectiveness?
What else can the results of student evaluations predict?
What CAN student evaluations tell us?
- Student enjoyment
- Student interest
What CAN’T student evaluations tell us?
- Amount and accuracy of course content
- Instructor’s knowledge
- Instructor’s competency
- Instructor grading practices
- Instructor methods of delivery
(Reviewed by HEQCO 2008)