Reliability and Validity checks S-005
Checking on reliability of the data we collect
- Compare over time (test-retest)
- Item analysis
- Internal consistency
- Inter-rater agreement
Compare over time: test-retest reliability
- One sample measured at two (or more) times
- Very convincing in theory, but often hard to do in practice
  - What time interval? Memory effects? A special sample?
- Reliability is the correlation of time-1 answers with time-2 answers (see the sketch after this slide)
- Other approaches are often approximations of this idea
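A minimal sketch of the test-retest correlation in Python. The score arrays and the use of numpy are assumptions for illustration only; they are not part of the original slides.

```python
# Test-retest reliability sketch: correlate time-1 scores with time-2 scores.
# The data below are made up for illustration.
import numpy as np

time1 = np.array([12, 15, 9, 20, 18, 14, 11, 16])   # total scores at time 1
time2 = np.array([13, 14, 10, 19, 17, 15, 10, 17])  # same people at time 2

r = np.corrcoef(time1, time2)[0, 1]  # Pearson correlation = test-retest reliability
print(f"test-retest r = {r:.2f}")
```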
Split-half reliability
- Easier than test-retest checks: requires only one time point
- Works when there is a scale or set of questions on a single topic
- Divide the items into two sets (two halves)
- Correlate the scores on the two halves
- Often adjusted by the Spearman-Brown correction (see the sketch after this slide)
- Gives us an estimate of the test-retest reliability
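A minimal sketch of a split-half check with the Spearman-Brown correction (corrected r = 2r / (1 + r)), assuming a made-up respondents-by-items matrix; the odd/even split used here is just one common way to form the two halves.

```python
# Split-half reliability sketch: split items into two halves, correlate the
# half scores, then apply the Spearman-Brown correction. Made-up data.
import numpy as np

items = np.array([
    [3, 4, 2, 5, 4, 3],
    [2, 2, 1, 3, 2, 2],
    [5, 4, 5, 4, 5, 5],
    [4, 3, 4, 4, 3, 4],
    [1, 2, 2, 1, 2, 1],
])  # rows = respondents, columns = items

half_a = items[:, 0::2].sum(axis=1)  # odd-numbered items
half_b = items[:, 1::2].sum(axis=1)  # even-numbered items

r_half = np.corrcoef(half_a, half_b)[0, 1]
r_corrected = 2 * r_half / (1 + r_half)  # Spearman-Brown step-up to full length
print(f"half-test r = {r_half:.2f}, corrected = {r_corrected:.2f}")
```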
Item-analysis approaches
When there is a set of questions about a single topic:
- Examine the answers to each item
  - How many answered correctly? Percent correct? This is the item difficulty.
  - Or, if there is no “correct” answer, look at how the answers were distributed (agree / neutral / disagree)
  - Examine which “wrong” answers are chosen
- Find items that are too hard or too easy, or those that have little variability (too boring? too trivial?)
  - Do you really need these? Sometimes they are very important; test publishers tend to delete the “easy” and “hard” items
- Correlate the item responses with the total responses (see the sketch after this slide)
  - High correlations indicate consistency
  - Low correlations indicate “different” or “weak” items
  - Negative correlations indicate something interesting: confusing wording? an item that doesn’t belong?
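A minimal sketch of these item-analysis checks, assuming a hypothetical 0/1 matrix of right/wrong answers: item difficulty is the percent correct, and each item-total correlation is computed against the total of the remaining items.

```python
# Item-analysis sketch: item difficulty (percent correct) and item-total
# correlations for a made-up 0/1 answer matrix (rows = examinees).
import numpy as np

answers = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 1],
])

difficulty = answers.mean(axis=0)  # percent correct for each item
total = answers.sum(axis=1)        # each examinee's total score

for j in range(answers.shape[1]):
    rest = total - answers[:, j]                   # total excluding item j
    r_it = np.corrcoef(answers[:, j], rest)[0, 1]  # corrected item-total correlation
    print(f"item {j + 1}: difficulty = {difficulty[j]:.2f}, item-total r = {r_it:.2f}")
```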
Internal consistency reliability
When there is a “scale” or set of questions on a single topic:
- Cronbach’s coefficient alpha is a measure of “internal consistency”
  - Look at all of the items, check the “average correlation”, then adjust for the number of items (see the sketch after this slide)
- Find items that do not correlate with the others
  - Check the item-total correlations; if low, delete these items or move them elsewhere
- Assess the overall internal consistency
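A minimal sketch of Cronbach’s alpha using the standard formula alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score); the response matrix is made up for illustration.

```python
# Cronbach's alpha sketch for a made-up respondents-by-items matrix.
import numpy as np

items = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [3, 4, 3, 3],
    [1, 2, 2, 1],
    [4, 4, 5, 4],
])

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)       # variance of each item
total_var = items.sum(axis=1).var(ddof=1)   # variance of the total score
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```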
Internal consistency reliability
Comparing answers from different sources:
- Compare similar questions that appear in different parts of the questionnaire
- Compare answers from different places during an interview
- Compare interview responses with questionnaire responses
- Compare questionnaires with actual observations
Inter-rater agreement
- Useful in checking the coding of open-ended answers, observations, etc.
- Try this on a sample or pilot study
- Check the overall percent agreement; sometimes we adjust for “chance agreement” with Cohen’s Kappa (see the sketch after this slide)
- A very important step in lots of studies
  - If agreement is high, it is okay to rely on one primary coder or rater
  - If not high, perhaps we need more than one rater, or we need to revise or clarify the coding rules
- Then check on things again; there are often several iterations here
- Keep going until the agreement is acceptable
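A minimal sketch of percent agreement and Cohen’s kappa for two raters, using made-up category codes; kappa discounts the chance agreement implied by each rater’s marginal category proportions.

```python
# Inter-rater agreement sketch: percent agreement and Cohen's kappa for two
# raters who coded the same ten responses. Made-up codes.
from collections import Counter

rater1 = ["pos", "neg", "neu", "pos", "pos", "neg", "neu", "pos", "neg", "pos"]
rater2 = ["pos", "neg", "pos", "pos", "neu", "neg", "neu", "pos", "neg", "pos"]
n = len(rater1)

observed = sum(a == b for a, b in zip(rater1, rater2)) / n  # percent agreement

# Chance agreement from each rater's marginal category proportions.
p1, p2 = Counter(rater1), Counter(rater2)
expected = sum((p1[c] / n) * (p2[c] / n) for c in set(rater1) | set(rater2))

kappa = (observed - expected) / (1 - expected)  # Cohen's kappa
print(f"percent agreement = {observed:.2f}, kappa = {kappa:.2f}")
```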
Check out some examples
- Bayley Scales of Infant Development
- Inter-rater agreement example
- Internal consistency example
- Then try some clicker questions!
Observing students and teachers in classrooms: what type of reliability check is most important?
1. Inter-observer agreement (have more than one observer)
2. Time 1 - Time 2 (observe at two or more times)
3. Consistency within the classroom sessions
4. Other
Coding transcripts from individual interviews: what type of reliability check is most helpful?
1. Have multiple transcribers
2. Inter-rater agreement
3. Internal consistency checks
4. Other
Using answers from questionnaires: what type of reliability check is most important?
1. Inter-rater agreement
2. Internal consistency checks
3. Item-analysis checks
4. Other
Using a mix of open-ended and closed-ended questions on a questionnaire: why is this a good idea?
1. Internal consistency checks
2. Makes replying less boring
3. Terry has said this about 50 times, so it must be a good idea
4. Other
5. All of the above