Published by Phebe Hensley. Modified over 9 years ago.
1
Validity/Reliability Matters Really? Beverly Mitchell, Kennesaw State University
2
Can a test be valid and not be reliable?
3
Can a test be reliable and not be valid?
4
Validity: justifiable; relevant; true to its purpose (consistently)
5
Validity: Design Issues; Application Issues
6
Validity: Design Issues; Application Issues
7
Design: Creating the Instrument. 1. Inference; 2. Complexity
8
Inference: Low to High
9
High Inference: to draw a conclusion; to guess, surmise; to suggest, hint
10
Low Inference: straightforward; language is precise and targeted; clear, with no competing interpretations of words; no doubt as to what point is being made
11
Inference: Low to High
12
Complexity: Low to High
13
High Complexity: complicated; composed of interrelated parts or sections; developed with great care or with much detail
14
Low Complexity: simplistic; plain; unsophisticated
15
Complexity: Low to High
16
How They Are Related [Figure: Inference and Complexity axes, each from Low to High]
17
Designing the Instrument [Figure: Inference and Complexity axes, each from Low to High]
18
Due “Yesterday”! [Figure: Inference and Complexity axes, each from Low to High]
19
“Overachieving” [Figure: Inference and Complexity axes, each from Low to High]
20
How Much Error Are You Willing to Risk? [Figure: Inference and Complexity axes, each from Low to High, with an "Error" label]
21
Compromise [Figure: Inference and Complexity axes, each from Low to High]
22
Does the OBSERVED Behavior = True Behavior? Observed SCORE ≠ TRUE SCORE: ERROR
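One common way to write this idea down, assuming the classical test theory model (the slide does not name a model, so the symbols below are introduced only for illustration), treats the observed score X as a true score T plus an error term E. If T and E are assumed to be uncorrelated, the share of observed-score variance that comes from true scores can serve as a reliability coefficient:

```latex
X = T + E, \qquad
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```

Under these assumptions, a larger error variance pushes the coefficient toward 0, and an error variance of 0 gives a coefficient of 1, where observed score and true score coincide.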
23
Design: Creating the Instrument
1. Inference
- General rubric: high
- Qualitative analytic rubric: low
2. Complexity
- Easy to develop (question worthiness, guidance, single interpretation): low
- Time to develop (labor intensive, onerous, long): high
24
Validity: Design Issues; Application Issues
25
Application Issues: Designated Use; Limitations/Conditions
26
Application Issues: Designated Use. Don’t borrow from your neighbor!
27
Application Issues: Limitations/Conditions. One size does not fit all or apply to all circumstances.
28
Ways to Increase Probability for Accuracy
- Compare language: standards & concepts
- The concepts/expectations in the standards are apparent in the assessments, with the same depth and breadth
- Good example of Content Validity
- Behavior (performance) expected in the standard matches the performance expected in the assessment, i.e., knowledge of…demonstrating skill…
- Identify Key/Critical items/concepts to evaluate
- Give it away for analysis (many eyes)
- Invite external “expert” review
- Be receptive to feedback
- Surveys from P-12 partners, candidates
- Regular evaluation and analysis: revise, revise, revise
- Awareness of design and application issues
29
Ways to Increase Reliability
- Begin with a valid instrument
- Two reliability issues:
  - Reliability of the instrument: repeated use of the instrument by the same evaluators. If problematic: revise, re-think, abandon.
  - Reliability of the scoring: performance rated the same by different evaluators, i.e., objectivity. If problematic: ensure qualifications of evaluators, check rubric, check language, minimize generalized concepts applied to all subject areas.
- Train evaluators frequently
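Reliability of the scoring, where different evaluators rate the same performance, is often summarized with an agreement statistic. The sketch below is one possible illustration and not the method used in this presentation: it computes simple percent agreement and Cohen's kappa for two raters. The function names and the ratings are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of performances that both raters scored identically."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must score the same non-empty set of performances.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if each rated independently at their own category rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical category throughout; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings (1 = low, 3 = high) from two evaluators of eight performances.
rater_a = [3, 2, 3, 1, 2, 3, 1, 2]
rater_b = [3, 2, 2, 1, 2, 3, 1, 3]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(f"Percent agreement: {agreement:.2f}")
print(f"Cohen's kappa: {kappa:.2f}")
```

With these made-up ratings the evaluators agree on 6 of 8 performances (0.75), and kappa comes out near 0.62 once chance agreement is removed. A low value would point back to the remedies on the slide: check the rubric and its language, and train evaluators.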
30
AN APPLICATION: A KSU Workshop (Handouts Available). Thirty experienced teachers participated in a daylong workshop to help us evaluate three student teaching observation rating forms.
31
Three Instruments
- Traditional Candidate Performance Instrument (CPI) Observation of Student Teaching: Observer is asked to indicate strengths, weaknesses, and areas for improvement in three broad outcomes (Subject Matter, Facilitation of Learning, and Collaborative Professional).
- Modified CPI Observation of Student Teaching: Observer is asked to explicitly rate each proficiency within each outcome and then provide narrative indicating any strengths, weaknesses, and suggestions for improvement.
- Formative Analysis Class Keys: Observer is asked to rate 26 elements from the Georgia Department of Education’s Class Keys. No required narrative.
32
Generally, we were interested in two areas:
- Validity/Accuracy: Which instrument provides us the best inference about the presence of positive behaviors (proficiencies) we deem important?
- Reliability/Consistency: Which instrument demonstrates the best inter-rater reliability?
33
Study Design
Instrument                                      Group 1   Group 2   Group 3
Period 1: Traditional CPI (Narrative)           Video A   Video B   Video C
Period 2: Modified CPI (Rating and Narrative)   Video B   Video C   Video A
Period 3: Class Keys Formative Analysis         Video C   Video A   Video B
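The study design reads as a rotation: in each period every group scores a different video, and across the three periods each group sees each video exactly once (often called a Latin square layout, a name the slides do not use). A minimal sketch that reproduces the assignment, using a hypothetical helper `rotation_schedule`:

```python
def rotation_schedule(instruments, groups, videos):
    """Assign videos to groups so the assignment shifts by one each period."""
    n = len(videos)
    if len(instruments) != n or len(groups) != n:
        raise ValueError("Need equal numbers of instruments, groups, and videos.")
    schedule = {}
    for period, instrument in enumerate(instruments):
        # Group at position `offset` gets the video shifted by the period index.
        schedule[instrument] = {
            group: videos[(period + offset) % n]
            for offset, group in enumerate(groups)
        }
    return schedule


# Labels taken from the study design; one instrument is used per period.
instruments = ["Traditional CPI", "Modified CPI", "Class Keys Formative Analysis"]
groups = ["Group 1", "Group 2", "Group 3"]
videos = ["Video A", "Video B", "Video C"]

schedule = rotation_schedule(instruments, groups, videos)
for instrument, assignment in schedule.items():
    print(instrument, assignment)
```

Rotating the videos this way means no group rates the same video twice, so differences between instruments are less likely to come from one group or one video alone.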
34
Reliability: The strongest inter-rater agreement was found for the Modified CPI with performance level rating, followed by the Class Keys Formative Assessment Instrument with a performance level rating. There was very little agreement between behaviors noted in Traditional CPI narratives, and no performance level ratings were available. The Traditional CPI is probably not a reliable instrument for rating student teaching behaviors.
35
Validity: Both the Traditional CPI and the Modified CPI are explicitly aligned with institutional (and other) standards, but the Traditional CPI is a global assessment, whereas the Modified CPI requires a rating and narrative for each proficiency. However, the Traditional CPI has not demonstrated reliability, so… Participants were also asked to provide information about the language, clarity, and ease of use of all instruments.