1
Understanding Your Course Ratings… What to Note and What to Ignore
Mequon Faculty Institute, May 2017. Introduce Paul – course rating king. We’ve put together some tips…a scientific approach, but there is also some artistry involved. Interpretation is both a science and an art.
2
“In psychology we measure men by their shadows.” – L. L. Thurstone
This quote provides the overall context and lens; you’ll see this theme come out again and again. Our course evaluations give effective feedback when the data are viewed in the proper way. The purpose of this session is to give you the right context for looking at them, with proper caution.
3
Five tips for understanding your evaluations
Measurement Error OBSERVED SCORE = TRUE SCORE + ERROR The first key to interpreting a student-faculty evaluation is to understand that there is error inherent in any measuring process – especially in the soft sciences. We try through various techniques to minimize the amount of error – including the way we structure our measurements – but we never fully succeed. The college classroom at the end of a semester is very different from a controlled laboratory environment with random selection of subjects; we do have some metaphorically “dirty test tubes.” The feedback you receive on your evaluation reports is simply an estimate of teaching effectiveness. These estimates have some degree of accuracy and some level of error, which may vary between instructors and courses. Classical True Score Theory posits the following model: OBSERVED SCORE = TRUE SCORE + ERROR
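To make the true score model concrete, here is a minimal simulation sketch in Python. The true score, error spread, and class sizes are made-up numbers (not Concordia data); the point is only that observed ratings scatter around a fixed true score, and that small classes produce much less stable averages.

```python
import random

random.seed(42)

TRUE_SCORE = 4.2   # hypothetical "true" teaching-effectiveness rating on a 1-5 scale
ERROR_SD = 0.8     # hypothetical spread of student-to-student error

def observed_rating(true_score=TRUE_SCORE, error_sd=ERROR_SD):
    """Observed score = true score + error, clipped to the 1-5 response scale."""
    raw = random.gauss(true_score, error_sd)
    return min(5.0, max(1.0, raw))

def class_average(n_students):
    """Average of n simulated student ratings for the same instructor."""
    ratings = [observed_rating() for _ in range(n_students)]
    return sum(ratings) / len(ratings)

# Small classes bounce around the true score far more than large ones.
for n in (5, 15, 60):
    averages = [class_average(n) for _ in range(1000)]
    spread = max(averages) - min(averages)
    print(f"n={n:2d}: mean of class averages = {sum(averages)/len(averages):.2f}, "
          f"range across 1000 simulated classes = {spread:.2f}")
```

Running this shows the same instructor receiving noticeably different averages from one small class to the next, purely because of measurement error.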
4
Potential sources of error
Time of day
Subject matter
Instructor gender and age
Level of student
Size of class
Required or elective course
Number of students responding
Type of students responding
All of these are potential confounds.
5
Five tips for understanding your evaluations
Measurement Error Reliability and Validity Reliability: the technical term for taking your temperature and then re-taking it to see if you get the same result. In educational and psychological measurement we attempt to build reliable and valid measuring tools. Reliability, which is a necessary but not sufficient precondition for validity, refers to the consistency of the measurement. As an example, think of taking your temperature and then re-taking it to see if you get the same result – this is checking the reliability of the thermometer. Concordia’s course evaluations do show relatively consistent results, both internally (the items correlate with each other) and across time. Validity, on the other hand, refers to whether we are measuring what we intend to measure. The simplest type of validity is face validity – i.e., do the questions, on the surface, look like they address what we are trying to measure? Establishing the full validity of an instrument is a very extensive process.
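For the internal-consistency side of reliability, the sketch below computes Cronbach’s alpha on a small set of hypothetical item responses. The data and the choice of alpha as the statistic are illustrative assumptions, not a description of Concordia’s actual analysis.

```python
def cronbach_alpha(item_scores):
    """item_scores: one list per item, each containing one rating per student."""
    k = len(item_scores)           # number of items
    n = len(item_scores[0])        # number of students

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    sum_item_variances = sum(variance(item) for item in item_scores)
    totals = [sum(item_scores[i][s] for i in range(k)) for s in range(n)]
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))

# Eight students rating three related items on a 1-5 scale (made-up numbers).
items = [
    [5, 4, 4, 5, 3, 4, 5, 4],   # "well organized"
    [5, 4, 3, 5, 3, 4, 4, 4],   # "answered questions clearly"
    [4, 4, 4, 5, 2, 4, 5, 4],   # "effective learning strategies"
]
print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")  # values near 1 indicate consistency
```

The thermometer analogy maps onto the other half of reliability (consistency across time), which would be checked by correlating ratings from repeated administrations rather than across items.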
6
Five tips for understanding your evaluations
Measurement Error Reliability and Validity Conceptual Model Cluster: a number of things of the same kind, growing or held together; a bunch: a cluster of grapes. Through statistical analysis we are able to put together items that cluster together (relate highly to each other) and see whether we can identify meaningful concepts from these groupings. This is one way of establishing the validity of the instrument. We can also look at which items are the best predictors of an overall highly rated course and from this infer which items are most important to our students. The following model has been built (shown below), with its structure fully determined through statistical techniques. The top items in predicting overall success have been highlighted. A simple illustrative sketch of this kind of item clustering follows the four cluster slides below.
7
Structure
1. The course was well organized
2. The instructor answered questions clearly and completely
3. The instructor demonstrated the relevance of the course material
4. The instructor used effective learning strategies
8
Responsiveness
1. The instructor answered questions clearly and completely
2. The instructor provided timely feedback
3. The instructor was available and approachable
4. The instructor provided useful feedback
5. The instructor showed personal concern and interest in student learning
9
Rigor
1. The instructor challenged students to do their best work
2. The instructor set high expectations for success
3. The instructor demonstrated the relevance of the course material
10
Faith and Learning Given the subject matter, the instructor effectively related Christian faith to learning.
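Here is the simple clustering sketch referred to earlier. It uses hypothetical ratings and plain item-to-item correlations; the real model was built with fuller statistical techniques, so treat this only as an illustration of what “items that relate highly to each other” means.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Made-up ratings from ten students on four items (1-5 scale).
ratings = {
    "well organized":         [5, 4, 5, 4, 3, 5, 4, 5, 4, 4],
    "answered clearly":       [5, 4, 5, 4, 3, 5, 4, 5, 5, 4],
    "timely feedback":        [4, 5, 2, 3, 5, 2, 4, 3, 2, 5],
    "available/approachable": [4, 5, 3, 3, 5, 2, 4, 3, 2, 4],
}

THRESHOLD = 0.7   # arbitrary cutoff for "relate highly to each other"
names = list(ratings)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = pearson(ratings[a], ratings[b])
        tag = "cluster together" if r >= THRESHOLD else "separate"
        print(f"{a!r} vs {b!r}: r = {r:.2f} ({tag})")
```

Items whose ratings rise and fall together across students end up in the same grouping; naming the grouping (Structure, Responsiveness, Rigor, Faith and Learning) is the interpretive step.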
11
Five tips for understanding your evaluations
Measurement Error Reliability and Validity Conceptual Model Evaluating Results Response Rates Distribution of Responses Things to consider when interpreting your results: One of the biggest concerns of faculty is response rates on course evaluations, and consequently whether the data are meaningful and useful. To address this, keep in mind that the number of responses is more meaningful than the response rate itself. As a rule of thumb, at least 5 responses in a class are needed to have any level of confidence in the results. Also look at the distribution of responses and their variability: the more agreement there is between students, the more reliable the rating. Especially look for outliers – for example, one student responding very differently from the rest – which can greatly affect the average on an item. It is as important, if not more important, to look at the distribution of responses as it is to look at the average alone. The following example illustrates this point:
12
Example Distribution of responses on an item
Strongly Agree: 3 students (37.5%)
Agree: 4 students (50.0%)
Somewhat Agree: 0 students
Somewhat Disagree: 0 students
Disagree: 1 student (12.5%)
Individual Average = 4.0 (4.43 without the outlier)
Department Average = 4.7
CUW Average = 4.6
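For anyone who wants to verify the numbers, the short sketch below redoes the slide’s arithmetic: the same eight responses averaged with and without the single “Disagree” outlier.

```python
# Response counts taken from the slide: 3 x Strongly Agree (5), 4 x Agree (4), 1 x Disagree (1).
responses = [5, 5, 5, 4, 4, 4, 4, 1]

mean_all = sum(responses) / len(responses)
without_outlier = [r for r in responses if r != 1]
mean_trimmed = sum(without_outlier) / len(without_outlier)

print(f"Average with all 8 responses:    {mean_all:.2f}")    # 4.00
print(f"Average without the one outlier: {mean_trimmed:.2f}")  # 4.43
```

One dissenting student out of eight pulls the item average down by almost half a point, which is why the distribution matters as much as the mean.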
13
Five tips for understanding your evaluations
Measurement Error Reliability and Validity Conceptual Model Evaluating Results Response Rates Distribution of Responses Meaningful Differences Criterion vs Norm Meaningful Differences: When assigning meaning to an item’s results, there are two methods to use: criterion-referenced and norm-referenced comparisons. A criterion-referenced comparison is absolute rather than relative: your performance is compared to a fixed scale value – in this case 1 through 5, from Disagree to Strongly Agree. You are comparing yourself to a standard; for example, if the top of our scale is 5, you might say you are a 4.5 on a scale of 5. In a norm-referenced comparison, item performance is compared to peer averages (averages within the department and averages of CUW as a whole).
14
Example distribution of responses on an item
5 = Strongly Agree: 3 students (37.5%)
4 = Agree: 4 students (50.0%)
3 = Somewhat Agree: 0 students
2 = Somewhat Disagree: 0 students
1 = Disagree: 1 student (12.5%)
Individual Average = 4.0
Department Average = 4.7
CUW Average = 4.6
With the scale now added, the criterion-referenced interpretation is that my average on this item is a 4 on a 5-point scale. The norm-referenced interpretation is that my score of 4 is lower than the department average by seven tenths of a point and lower than the CUW average by six tenths of a point. A rule of thumb here – based on statistical theory – is that a difference of at least one half of a point is meaningful or substantive; it is a difference of the magnitude a casual observer would notice when observing a course (Cohen reference). To establish a high level of confidence in a difference (statistical significance), 15 or more responses are needed, again with a difference of at least one half of a point from the average. Courses that differ from the average by .3 or .4 points would be considered to show some level of difference from the comparison group, albeit a relatively small one – a difference visible only to a trained observer. The majority of differences we see on course evaluations fall in this subjectively “small” range of .4 or less. Larger differences are found most often in courses with small sample sizes, which reflects the instability of the measure as much as actual differences. One very helpful check is available if you have multiple sections of a course: it lets you see whether what is observed in one section is replicated in another, a nice way of checking the reliability of your own data. There will be some differences between sections, demonstrating that there is error in the evaluation process, but a meaningful difference on a particular item across multiple sections carries added weight.
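The sketch below illustrates, with hypothetical numbers, why more responses give more confidence in a difference: the standard error of a class average shrinks as the number of responses grows. The half-point rule of thumb itself comes from the presentation, not from this calculation.

```python
import math

def standard_error(ratings):
    """Standard error of the class average: how much the average would be expected
    to bounce around if the same kind of class rated the course again."""
    n = len(ratings)
    mean = sum(ratings) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in ratings) / (n - 1))
    return sd / math.sqrt(n)

# Hypothetical ratings: the same spread of opinions in a small and a larger class.
small_class = [5, 5, 5, 4, 4, 4, 4, 1]
large_class = small_class * 3           # 24 responses with the same distribution

for label, ratings in (("n = 8 ", small_class), ("n = 24", large_class)):
    mean = sum(ratings) / len(ratings)
    se = standard_error(ratings)
    # A rough 95% interval for the class average (mean +/- 2 standard errors).
    print(f"{label}: average = {mean:.2f}, roughly between "
          f"{mean - 2 * se:.2f} and {mean + 2 * se:.2f}")
```

With only eight responses the plausible range for the “true” average is close to a full point wide, which is why small apparent differences in small classes should not be over-interpreted.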
15
Five tips for understanding your evaluations
Measurement Error Reliability and Validity Conceptual Model Evaluating Results Student Comments How do they help me better understand the quantitative ratings? Student comments can be very rich, but always remember that any single comment represents a sample size of one. Look for themes or categories that students mention consistently; as with the objective responses, consistent, frequent responses carry more value. Ask yourself: is this giving me the flavor of what I’ve learned from my quantitative results? Think of it a little like a meal: the data are the meat and potatoes, the narrative is the spice.
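As a toy illustration of looking for recurring themes rather than reacting to a single comment, the sketch below counts how often hand-picked theme keywords appear across a few invented comments. Both the comments and the keyword lists are hypothetical.

```python
from collections import Counter

# Invented comments and hand-picked theme keywords, for illustration only.
comments = [
    "Great class, but feedback on papers came back very late.",
    "I loved the discussions; the instructor was approachable.",
    "Feedback was slow and sometimes unclear.",
    "Very organized course, clear expectations.",
]

themes = {
    "feedback timeliness": ("feedback", "late", "slow"),
    "approachability": ("approachable", "available"),
    "organization": ("organized", "organization", "expectations"),
}

counts = Counter()
for comment in comments:
    lowered = comment.lower()
    for theme, keywords in themes.items():
        if any(word in lowered for word in keywords):
            counts[theme] += 1

for theme, n in counts.most_common():
    print(f"{theme}: mentioned in {n} of {len(comments)} comments")
```

A theme that shows up in several comments, and that lines up with a low-scoring item, deserves attention; a one-off remark usually does not.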
16
Sample #1 Paul will explain and describe the columns.
Where does this instructor seem to struggle? Where does this instructor seem to excel? How reliable is the data? What is the sample size? How large is the standard deviation? In which of our conceptual clusters might there be a problem, according to this example? In which of our conceptual clusters does this instructor excel? Use both norm-referenced and criterion-referenced interpretations.
17
Sample #2 Where does this instructor excel? How reliable is the data? Use both norm-referenced and criterion-referenced interpretations. Comparison of the two samples: Which of the two samples shows more consistency? Which of the two samples represents better overall classroom performance? One of these is a science course and one is a social science course – which do you think is which?