1
Measurement Validity & Reliability
2
Measurement – Validity & Reliability l The Idea of Construct Validity l Measurement Validity l Construct Validity Tools – Conceptual –The Nomological Network –The Multitrait-Multimethod Matrix l Threats to Measurement Construct Validity
3
Measurement – Validity & Reliability: Reliability l Measurement Error l Levels of Measurement (data type) l Relationship – Validity & Reliability l Theory of Reliability l Types of Reliability l Reliability Summary
5
Construct Validity l Trochim & Donnelly define it as: –The degree to which inferences can legitimately be made from the operationalizations in a study to the theoretical constructs on which those operationalizations are based. l Others have defined it more narrowly, such as: –The nature of the psychological construct or characteristic being measured. l Two broad perspectives relate to construct validity – Definitionalist & Relationalist
6
Construct Validity [Diagram: at the theory level, a cause construct is linked to an effect construct (what you think); at the observation level, a program is linked to observations (what you do and what you see). The program-outcome relationship operationalizes the cause-effect construct.] Can we generalize to the constructs?
7
The Analogy to Law Definitionalist Perspective (as opposed to the Relationalist Perspective)
8
The Analogy to Law Definitionalist Perspective The truth…
9
The Analogy to Law Definitionalist Perspective The truth… the whole truth,
10
The Analogy to Law Definitionalist Perspective The truth … the whole truth, and nothing but the truth.
11
In Terms of Construct Validity – Rating Sheet (each item rated 1–5): 1. Manage time effectively. 2. Manage resources effectively. 3. Scan a multitude of information and decide what is important. 4. Decide how to manage multiple tasks. 5. Organize the work when directions are not specific.
12
In Terms of Construct Validity [Rating Sheet] The construct…
13
In Terms of Construct Validity [Rating Sheet] The construct… the whole construct,
14
In Terms of Construct Validity [Rating Sheet] The construct… the whole construct, and nothing but the construct.
15
What Is the Goal? The construct Other construct: A Other construct: C Other construct: B Other construct: D
16
What Is the Goal? Other construct: A Other construct: C Other construct: B Other construct: D Didn’t measure any of the construct The construct
17
What Is the Goal? Other construct: A Other construct: C Other construct: B Measure part of the construct and part of construct B The construct Other construct: D
18
What Is the Goal? The construct Other construct: A Other construct: C Other construct: B Other construct: D Measure part of the construct and nothing else
19
What Is the Goal? The construct Other construct: A Other construct: C Other construct: B Other construct: D Measure all of the construct and nothing else.
20
The Problem l The definitionalist perspective often does not work in social science ("all" & "nothing but"). l Concepts are not mutually exclusive. –They exist in a web of overlapping meaning. l No single piece of evidence satisfies construct-related validity; rather –A variety of different types of evidence allows warranted inferences. l To enhance construct validity, show where the construct sits in its broader network of meaning. –The next slides demonstrate this concept.
21
Could Show That... The construct Other construct: A Other construct: C Other construct: B Other construct: D The construct is slightly related to the other four.
22
Could Show That... The construct Other construct: A Other construct: C Other construct: B Other construct: D Constructs A and C and constructs B and D are related to each other.
23
Could Show That... The construct Other construct: A Other construct: C Other construct: B Other construct: D Constructs A and C are not related to constructs B and D.
24
Example: You Want to Measure Self-esteem
25
Example: Distinguish Self-esteem From... Self-worth, Confidence, Self-disclosure, Openness
26
To Establish Construct Validity l Set the construct within a semantic net (net of meaning). l Provide evidence that you control the operationalization of the construct. –That your theory has some correspondence with reality – explain the operationalization. l Provide evidence that your data support the theoretical structure. –Constructs that should be more related, are; constructs that should be less related, are.
28
Measurement Validity Types
29
Measurement Validity l Validity is the most important idea to consider when preparing/selecting a measurement instrument. –The quality of the instrument is critical to the conclusions researchers draw. l Researchers use a number of procedures to ensure that the inferences they draw, based on the data collected, are valid and reliable. –Validity refers to the appropriateness, meaningfulness, correctness, and usefulness of the inferences a researcher makes. –Reliability refers to the consistency of scores or answers from one administration of an instrument to another.
30
Measurement Validity Types l Translational validity –Face validity –Content validity l Criterion-related validity –Predictive validity –Concurrent validity –Convergent validity –Discriminant validity
31
Face Validity l The measure seems valid "on its face". l A judgment call. l Probably the weakest form of measurement validity. [Rating Sheet]
32
Content Validity l Refers to the content of the instrument. –How appropriate is the content? –How comprehensive? –Does it get at the intended variable? –How adequately does the sample of items/questions represent the content? l Check the measure against the relevant content domain. –Standards may represent the content domain. –The content domain is not always clear (e.g., self-esteem). l Involves some judgment. –Can be made relatively rigorous with a systematic check. [Rating Sheet mapped against the content domain]
33
Criterion-Related Validity There are Multiple Forms l Refers to the relationship between scores obtained using the instrument and scores obtained using one or more other instruments or measures (the criterion). l Empirically based. l Questions to ask: –How strong is this relationship? –How well do scores estimate present or predict future performance? l Specific types of criterion-related validity are explored on the next slides.
34
Predictive Validity l A type of criterion-related validity. l Looks at a measure's ability to predict something it should be able to predict. l Examples: –A driver's written test should predict the hands-on driving test. –A math ability test may predict how well a person does in an engineering profession. –GRE scores predict how well a person does in graduate school. [Diagram: Test → Criterion]
35
Concurrent Validity l A type of criterion-related validity. l Does the measure distinguish between groups that it should distinguish between? l Measurements are taken at the same, or nearly the same, time. l Examples: –A measure of empowerment should show higher scores for managers and lower scores for their workers. –An attitude measurement should match current behavior and distinguish good attitudes from bad.
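The logic of a concurrent-validity check can be sketched as a simple group comparison. Everything below is simulated, illustrative data; the group means, spreads, and sample sizes are our assumptions, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: the instrument should score managers higher than workers.
managers = rng.normal(loc=75, scale=8, size=60)  # simulated empowerment scores
workers = rng.normal(loc=60, scale=8, size=60)

diff = managers.mean() - workers.mean()

# A simple standardized effect size (Cohen's d with pooled SD).
pooled_sd = np.sqrt((managers.var(ddof=1) + workers.var(ddof=1)) / 2)
d = diff / pooled_sd
print(f"mean difference = {diff:.1f}, Cohen's d = {d:.2f}")
```

If the instrument could not separate the two groups it is supposed to separate, the mean difference would hover near zero, which is the concurrent-validity failure the slide describes.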
36
Convergent and Discriminant Validity (to determine measurement validity)
37
The Convergent Principle Measures of constructs that are related to each other should be strongly correlated.
38
How It Works Theory: Self-esteem construct → Item 1, Item 2, Item 3, Item 4
39
How It Works Theory: Self-esteem construct → Item 1, Item 2, Item 3, Item 4. You theorize that the items all reflect self-esteem.
40
How It Works Theory → Observation. Self-esteem construct → Item 1, Item 2, Item 3, Item 4. Inter-item correlations:
        Item 1  Item 2  Item 3  Item 4
Item 1   1.00    .83     .89     .91
Item 2    .83   1.00     .85     .90
Item 3    .89    .85    1.00     .86
Item 4    .91    .90     .86    1.00
41
How It Works [Inter-item correlation matrix as above] The correlations provide evidence that the items all converge on the same construct.
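The convergence pattern can be illustrated with simulated data: if four items are all driven by one latent construct, their inter-item correlations should be uniformly strong. The latent-score model and the noise level below are illustrative assumptions, not the slide's actual data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# One latent self-esteem score per respondent drives all four items.
latent = rng.normal(size=n)
items = latent[:, None] + rng.normal(scale=0.5, size=(n, 4))  # item = latent + noise

corr = np.corrcoef(items, rowvar=False)  # 4x4 inter-item correlation matrix
off_diag = corr[~np.eye(4, dtype=bool)]
print(f"inter-item correlations range {off_diag.min():.2f} to {off_diag.max():.2f}")
```

With this signal-to-noise ratio the theoretical inter-item correlation is about .8, which mirrors the pattern of strong correlations shown on the slide.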
42
The Discriminant Principle Measures of different constructs should not correlate highly with each other.
43
How It Works Theory: Self-esteem construct → SE 1, SE 2; Locus-of-control construct → LOC 1, LOC 2
44
How It Works Theory: Self-esteem construct → SE 1, SE 2; Locus-of-control construct → LOC 1, LOC 2. You theorize that you have two distinguishable constructs.
45
How It Works Theory → Observation: r(SE1, LOC1) = .12, r(SE1, LOC2) = .09, r(SE2, LOC1) = .04, r(SE2, LOC2) = .11
46
How It Works Theory → Observation: r(SE1, LOC1) = .12, r(SE1, LOC2) = .09, r(SE2, LOC1) = .04, r(SE2, LOC2) = .11. The correlations provide evidence that the items on the two tests discriminate.
47
Putting It All Together Convergent and Discriminant Validity
48
Theory: We have two constructs – self-esteem and locus of control – that we want to measure.
49
Theory: Self-esteem construct → SE 1, SE 2, SE 3; Locus-of-control construct → LOC 1, LOC 2, LOC 3. We have two constructs we want to measure: self-esteem and locus of control. For each construct, we develop three scale items; our theory is that items within a construct will converge and items across constructs will discriminate.
50
Theory → Observation. Self-esteem items SE 1–SE 3; locus-of-control items LOC 1–LOC 3. Correlation matrix:
       SE1   SE2   SE3   LOC1  LOC2  LOC3
SE1   1.00   .83   .89   .02   .12   .09
SE2    .83  1.00   .85   .05   .11   .03
SE3    .89   .85  1.00   .04   .00   .06
LOC1   .02   .05   .04  1.00   .84   .93
LOC2   .12   .11   .00   .84  1.00   .91
LOC3   .09   .03   .06   .93   .91  1.00
51
Theory → Observation [Correlation matrix as above] The within-construct (SE–SE and LOC–LOC) correlations are convergent; the cross-construct correlations are discriminant.
52
Theory → Observation [Correlation matrix as above] The correlations support both convergence and discrimination, and therefore construct validity.
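The convergent/discriminant pattern the slide describes can be checked mechanically. Using the 6×6 correlation matrix shown on these slides, every within-construct (convergent) correlation should exceed every cross-construct (discriminant) one:

```python
import numpy as np

# The 6x6 correlation matrix from the slide (SE1-SE3, LOC1-LOC3).
R = np.array([
    [1.00, .83, .89, .02, .12, .09],
    [ .83, 1.00, .85, .05, .11, .03],
    [ .89, .85, 1.00, .04, .00, .06],
    [ .02, .05, .04, 1.00, .84, .93],
    [ .12, .11, .00, .84, 1.00, .91],
    [ .09, .03, .06, .93, .91, 1.00],
])

within = np.concatenate([R[:3, :3][np.triu_indices(3, 1)],   # SE convergent block
                         R[3:, 3:][np.triu_indices(3, 1)]])  # LOC convergent block
between = R[:3, 3:].ravel()                                  # discriminant block

# Construct validity pattern: every convergent r exceeds every discriminant r.
print(within.min() > between.max())
```

For this matrix the smallest convergent correlation (.83) comfortably exceeds the largest discriminant one (.12), which is exactly the evidence pattern the slide calls construct validity.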
53
Relationships Among Validity Types l Almost all validity types can be considered sub-cases of construct validity. l Construct validity is the most encompassing and comprehensive standard. l There is no single piece of evidence that satisfies construct-related validity; rather –A variety of different types of evidence allows warranted inferences. l Three specific steps for establishing measurement construct validity – see next slide.
54
To Establish Construct Validity (for measurements) l Generally, three specific steps in obtaining evidence for measurement construct validity: –The variable being measured is clearly defined. –Hypotheses are formed, based on the theory underlying the variable, about how those who possess a lot of the variable differ from those who possess a little. –The hypotheses are tested both logically and empirically. l Example: next slide.
55
To Establish Construct Validity – Example (for a measurement instrument) l A researcher is interested in developing an instrument to measure honesty. –First: Define honesty. –Second: Theorize about how honest people behave versus dishonest people. –Third: Test both logically and empirically. »Develop and administer the measurement instrument – collect data. »Give all participants an opportunity to be honest or dishonest (e.g., leave money out). l If the instrument has construct validity, we should see a relationship between scores and behavior.
57
The Nomological Network
58
What Is the Nomological Net? l An idea developed by Cronbach, L. and Meehl, P. (1955). –Construct Validity in Psychological Tests. Psychological Bulletin, 52(4), 281–302. l Nomological is derived from Greek and means "lawful". l Links interrelated theoretical ideas with empirical evidence. l A view of construct validity. –Cronbach & Meehl argued that a nomological network had to be developed to show construct validity.
59
What Is the Nomological Net? A representation of the concepts (constructs) of interest in a study… [Diagram: five interrelated constructs]
60
What Is the Nomological Net? A representation of the concepts (constructs) of interest in a study… their observable manifestations… [Diagram: constructs, each linked to its observations (Obs)]
61
What Is the Nomological Net? A representation of the concepts (constructs) of interest in a study, their observable manifestations, and the interrelationships among and between these. [Diagram: constructs linked to observations]
62
What Is the Nomological Net? [Diagram] Theoretical level: concepts, ideas.
63
What Is the Nomological Net? [Diagram] Theoretical level: concepts, ideas. Observed level: measures, programs.
64
Principles [Diagram] Scientifically, to make clear what something is means to set forth the laws in which it occurs. This interlocking system of laws is the nomological network.
65
Principles [Diagram] The laws in a nomological network may relate... observable properties or quantities to each other.
66
Principles [Diagram] The laws in a nomological network may relate... different theoretical constructs to each other.
67
Principles [Diagram] The laws in a nomological network may relate... theoretical constructs to observables.
68
Principles [Diagram] At least some of the laws in the network must involve observables.
69
Principles [Diagram] "Learning more about" a theoretical construct is a matter of elaborating the nomological network in which it occurs.
70
Principles [Diagram] "Learning more about" a theoretical construct is a matter of elaborating the nomological network in which it occurs... or of increasing the definiteness of its components.
71
Principles [Diagram] The basic rule for adding a new construct or relation to a theory is that it must generate laws (nomologicals) confirmed by observation...
72
Principles [Diagram] ...or reduce the number of nomologicals required to predict some observables.
73
Principles [Diagram] Operations that are qualitatively different "overlap" or "measure the same thing"...
74
Principles [Diagram] Operations that are qualitatively different "overlap" or "measure the same thing"... if their positions in the nomological net tie them to the same construct variable.
75
The Main Problem with the Nomological Net It doesn't tell us how to assess construct validity in a study.
77
The Multitrait-Multimethod Matrix
78
What Is the MTMM Matrix? An approach developed by Campbell, D. and Fiske, D. (1959). Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix. Psychological Bulletin, 56(2), 81–105. A matrix (table) of correlations arranged to facilitate the assessment of construct validity. An integration of both convergent and discriminant validity.
79
What Is the MTMM Matrix? Assumes that you measure each of several concepts (traits) by more than one method. Very restrictive – ideally you should measure each concept by each method. Arranges the correlation matrix by concepts within methods.
80
Principles Convergence: Things that should be related are. Divergence/Discrimination: Things that shouldn't be related aren't.
81
A Hypothetical MTMM Matrix (traits A, B, C; methods 1, 2, 3; reliabilities in parentheses on the diagonal):
        A1     B1     C1     A2     B2     C2     A3     B3     C3
A1    (.89)
B1     .51   (.89)
C1     .38    .37   (.76)
A2     .57    .22    .09   (.93)
B2     .22    .57    .10    .68   (.94)
C2     .11    .11    .46    .59    .58   (.84)
A3     .56    .22    .11    .67    .42    .33   (.94)
B3     .23    .58    .12    .43    .66    .34    .67   (.92)
C3     .11    .11    .45    .34    .32    .58    .58    .60   (.85)
82
Parts of the Matrix [MTMM matrix] – The reliability diagonal: the parenthesized values.
83
Parts of the Matrix [MTMM matrix] – Validity diagonals: correlations between the same trait measured by different methods (e.g., r(A1, A2) = .57).
84
Parts of the Matrix [MTMM matrix] – Monomethod heterotrait triangles: different traits measured by the same method.
85
Parts of the Matrix [MTMM matrix] – Heteromethod heterotrait triangles: different traits measured by different methods.
86
Parts of the Matrix [MTMM matrix] – Monomethod blocks: all correlations within a single method.
87
Parts of the Matrix [MTMM matrix] – Heteromethod blocks: correlations between two different methods.
88
Interpreting the MTMM Matrix [MTMM matrix] – The reliabilities should be the highest coefficients in the matrix.
89
Interpreting the MTMM Matrix [MTMM matrix] – The convergent validity diagonals should have strong r's.
90
Interpreting the MTMM Matrix [MTMM matrix] – Convergent: The same pattern of trait interrelationships should occur in all triangles (monomethod and heteromethod blocks).
91
Interpreting the MTMM Matrix [MTMM matrix] – Discriminant: A validity-diagonal value should be higher than the other values in its row and column within its own heteromethod block.
92
Interpreting the MTMM Matrix [MTMM matrix] – Discriminant: A variable should correlate more highly with a different-method measure of the same trait than with different traits measured by the same method.
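The discriminant rule just stated can be applied programmatically to the slide's hypothetical MTMM matrix: within each heteromethod block, the validity-diagonal entry should beat every other value in its row and column. A sketch (the matrix is hard-coded from the slide; the loop structure is ours):

```python
import numpy as np

# The hypothetical MTMM matrix from the slide: 3 traits (A, B, C)
# measured by 3 methods; reliabilities on the diagonal.
R = np.array([
    [.89, .51, .38, .57, .22, .11, .56, .23, .11],
    [.51, .89, .37, .22, .57, .11, .22, .58, .11],
    [.38, .37, .76, .09, .10, .46, .11, .12, .45],
    [.57, .22, .09, .93, .68, .59, .67, .43, .34],
    [.22, .57, .10, .68, .94, .58, .42, .66, .32],
    [.11, .11, .46, .59, .58, .84, .33, .34, .58],
    [.56, .22, .11, .67, .42, .33, .94, .67, .58],
    [.23, .58, .12, .43, .66, .34, .67, .92, .60],
    [.11, .11, .45, .34, .32, .58, .58, .60, .85],
])

ok = True
for m1 in range(3):                # method blocks of size 3 (traits A, B, C)
    for m2 in range(m1 + 1, 3):
        block = R[3*m2:3*m2+3, 3*m1:3*m1+3]  # heteromethod block
        for t in range(3):         # validity-diagonal entry for trait t
            v = block[t, t]
            others = np.concatenate([np.delete(block[t, :], t),
                                     np.delete(block[:, t], t)])
            ok &= bool(v > others.max())
print("discriminant rule holds:", ok)
```

For this hypothetical matrix every validity-diagonal value dominates its row and column within its heteromethod block, so the matrix passes the discriminant check.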
93
Advantages l Addresses convergent and discriminant validity simultaneously l Addresses the importance of the method of measurement l Provides a rigorous standard for construct validity
94
Disadvantages l Hard to implement l No known overall statistical test for validity –In some cases Structural Equation Modeling may provide an overall test l Requires a judgment call on interpretation
96
Threats to Construct Validity (Design Threats)
97
Inadequate Pre-Operational Explication of Constructs l Preoperational = before translating constructs into measures or treatments. l In other words, you didn't do a good enough job of defining (operationally) what you mean by the construct. l Solutions: –More thinking. –Use methods such as concept mapping. –Use expert opinions to better define the construct.
98
Mono-Operation Bias l Pertains to the treatment or program (independent variable). l Only one version of the treatment or program was used. –This typically results in an underrepresentation of the construct and lowers construct validity. l Challenge: It is not always possible to have alternative versions; try the treatment at different times and places.
99
Mono-Method Bias l Pertains to the measures or outcomes (dependent variable). l The measures were operationalized in only one way. –The method used may influence results. l Solution: Implement multiple measures of key constructs, and demonstrate the measures behave as theorized. –A pilot study allows issues to be found before implementing the full study. l Feasibility/practicality of using multiple methods can be an issue.
100
Confounding Constructs & Levels of Constructs (threat to construct validity) l A threat tied to the operationalization of the treatment construct: wrong conclusions relate to the level of the treatment, not the treatment itself. l Really a dosage issue – related to mono-operation bias because you only looked at one or two levels. l Examples: –An educational program implemented for 1 hour per day: you conclude no impact, but 2 hours may have worked. –Drug dosage: the dosage level may determine whether the drug works or not.
101
Interaction of Different Treatments People get more than one treatment. This happens all the time in social ameliorative studies. Again, the construct validity issue is largely a labeling issue.
102
Interaction of Testing and Treatment Does the testing itself make the groups more sensitive or receptive to the treatment? This is a labeling issue. It differs from the testing threat to internal validity: here, the testing interacts with the treatment to make it more effective; there, it is not a treatment effect at all (but rather an alternative cause).
103
Restricted Generalizability Across Constructs You didn't measure your outcomes completely. You didn't measure some key affected constructs at all (for example, unintended effects).
104
Threats to Construct Validity (Social Threats)
105
Hypothesis Guessing (threat to construct validity) People guess the hypothesis and respond to it rather than respond "naturally". People want to look good or look smart. This is a construct validity issue because the "cause" will be mislabeled. You'll attribute the effect to the treatment rather than to good guessing.
106
Evaluation Apprehension (threat to construct validity) l Many people are anxious about being evaluated. l Perhaps their apprehension makes them consistently respond poorly – you mislabel this as a negative treatment effect.
107
Experimenter Expectancies (threat to construct validity) The experimenter can bias results consciously or unconsciously. Bias becomes confounded (mixed up) with the treatment; you mislabel the results as a treatment effect.
108
Threats to Construct Validity (Conclusion) l Design Threats l Social Threats
110
Measurement Error
111
True Score Theory [Rating Sheet] Observed score = True ability + Random error, i.e. X = T + e
112
The Error Component X = T + e Two components:
113
The Error Component X = T + e Two components: e_r
114
The Error Component X = T + e Two components: e_r – random error
115
The Error Component X = T + e Two components: e_r – random error; e_s
116
The Error Component X = T + e Two components: e_r – random error; e_s – systematic error
117
The Revised True Score Model X = T + e_r + e_s
118
What Is Random Error? l Any factors that randomly affect measurement of the variable across the sample. l For instance, each person’s mood can inflate or deflate performance on any occasion. l Random error adds variability to the data but does not affect average performance for the group.
119
Random Error [Plot: frequency distribution of X with no random error]
120
Random Error [Plot: frequency distributions of X with and without random error]
121
Random Error [Plot: frequency distributions of X with and without random error] Notice that random error doesn't affect the average, only the variability around the average.
122
What Is Systematic Error?
123
Systematic Error: Any factors that systematically affect measurement of the variable across the sample. l Systematic error = bias. l For instance, asking questions that start "do you agree with right-wing fascists that..." will tend to yield a systematically lower agreement rate. l Systematic error does affect average performance for the group.
124
Systematic Error [Plot: frequency distribution of X with no systematic error]
125
Systematic Error [Plot: frequency distributions of X with and without systematic error]
126
Systematic Error [Plot: frequency distributions of X with and without systematic error] Notice that systematic error does affect the average; we call this a bias.
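The contrast between the two error types is easy to demonstrate by simulation. The means, spreads, and the 4-point bias below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

true_scores = rng.normal(loc=50, scale=10, size=n)

# Random error: zero-mean noise. Adds spread but leaves the average alone.
random_err = true_scores + rng.normal(loc=0, scale=5, size=n)

# Systematic error: a constant bias. Shifts the whole distribution.
biased = true_scores - 4  # e.g., every respondent under-reports by 4 points

print(f"true scores:     mean {true_scores.mean():.1f}, sd {true_scores.std():.1f}")
print(f"random error:    mean {random_err.mean():.1f}, sd {random_err.std():.1f}")
print(f"systematic bias: mean {biased.mean():.1f}, sd {biased.std():.1f}")
```

Random error inflates the standard deviation while the mean stays put; the constant bias moves the mean while the spread is unchanged, matching the two plots described above.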
127
Reducing Measurement Error l Pilot test your instruments – get feedback from respondents. l Train your interviewers or observers. l Make observation/measurement as unobtrusive as possible. l Double-check your data. l Triangulate across several measures that might have different biases.
129
Levels of Measurement
130
The Levels of Measurement l Nominal l Ordinal l Interval l Ratio
131
Some Definitions: Variable
132
Variable → Attribute, Attribute
133
Variable: Gender → Attribute, Attribute
134
Variable: Gender → Attributes: Female, Male
135
Qualities of Variables l Exhaustive -- Should include all possible answerable responses. l Mutually exclusive -- No respondent should be able to have two attributes simultaneously.
136
What Is Level of Measurement? The relationship of the values that are assigned to the attributes for a variable
137
What Is Level of Measurement? The relationship of the values that are assigned to the attributes for a variable.
138
What Is Level of Measurement? The relationship of the values that are assigned to the attributes for a variable. Values: 1, 2, 3
139
What Is Level of Measurement? The relationship of the values that are assigned to the attributes for a variable. Values: 1, 2, 3 → Attributes: Republican, Independent, Democrat
140
What Is Level of Measurement? The relationship of the values that are assigned to the attributes for a variable. Variable: Party Affiliation → Attributes: Republican, Independent, Democrat → Values: 1, 2, 3
141
Why Is Level of Measurement Important? l Helps you decide what statistical analysis is appropriate on the values that were assigned l Helps you decide how to interpret the data from that variable
142
Nominal Measurement l The values "name" the attribute uniquely. l The value does not imply any ordering of the cases, for example, jersey numbers in football. l Even though player 32 has a higher number than player 19, you can't say from the data that he's greater than or more than the other.
143
Ordinal Measurement When attributes can be rank-ordered… l However, distances between attributes do not have any meaning; for example, code educational attainment as: –0 = less than H.S.; 1 = some H.S.; 2 = H.S. degree; 3 = some college; 4 = college degree; 5 = post college l Is the distance from 0 to 1 the same as from 3 to 4? –We can't say; there's an order, but no real meaning to the distance between values.
144
Interval Measurement When the distance between attributes has meaning; for example, temperature (in Fahrenheit): the distance from 30 to 40 is the same as the distance from 70 to 80. l Note that ratios don't make any sense: 80 degrees is not twice as hot as 40 degrees (although the attribute values are twice as large).
145
Ratio Measurement
- Has a meaningful absolute zero.
- Can construct a meaningful ratio (fraction); for example, number of clients in the past six months.
- It is meaningful to say that "...we had twice as many clients in this period as we did in the previous six months."
The Hierarchy of Levels
- Nominal: attributes are only named; weakest level
- Ordinal: attributes can be ordered
- Interval: distance between attributes is meaningful
- Ratio: has an absolute zero
Relationship of Reliability and Validity
- Validity requires reliability.
- Reliability does not necessarily imply validity.
Reliability and Validity
- Reliable but not valid: measures consistently, but not what it is intended to measure.
- Valid: measures what it is intended to measure, consistently.
The Theory of Reliability
What Is Reliability?
- The "repeatability" of a measure
- The "consistency" of a measure
- The "dependability" of a measure
If a Measure Is Reliable...
We should see that a person's score on the same test given twice (X1 and X2) is similar, assuming the trait being measured isn't changing. But if the scores are similar, why are they similar? Recall from true score theory that X1 = T + e1 and X2 = T + e2. The only thing common to the two measures is the true score, T. Therefore, the true score must determine the reliability.
Reliability Is... a Ratio
reliability = variance of the true scores / variance of the measure = var(T) / var(X)
Theoretically, a perfectly reliable measure would have a value of 1, because the top and bottom would be equal. However, remember there's error in real measurements! We can measure the variance of the observed scores, var(X). But how do we measure the variance of the true scores? We can't!
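The ratio var(T) / var(X) can be illustrated with a simulation. This is a minimal sketch with hypothetical numbers (true-score SD of 10, error SD of 5), not part of the original slides: because in a simulation we do know the true scores, we can compute the ratio directly and see that error pushes it below 1.

```python
# Sketch of the reliability ratio var(T)/var(X) on simulated (hypothetical) data.
# True scores T are fixed; each observed score X = T + e adds random error,
# so var(X) = var(T) + var(e) and reliability = var(T)/var(X) < 1.
import random
from statistics import pvariance

random.seed(42)
T = [random.gauss(50, 10) for _ in range(10000)]   # true scores (unobservable in practice)
X = [t + random.gauss(0, 5) for t in T]            # observed = true score + error

reliability = pvariance(T) / pvariance(X)
# Theoretical value here is 10**2 / (10**2 + 5**2) = 0.8; the estimate lands close to it.
```

In real measurement the true scores are unknown, which is exactly why reliability can only be estimated, as the next slides explain.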
This Leads Us To...
- We cannot calculate reliability exactly; we can only estimate it.
- Each estimate attempts to capture the influence of the true score in a different way.
Types of Reliability
Reliability: Consistency of What?
- Inter-rater: observers or raters
- Test-retest: tests over time
- Alternate forms: different versions of the same test
- Split-halves: an estimate of alternate forms; internal consistency
- KR-20 & coefficient alpha: internal consistency
Inter-Rater or Inter-Observer Reliability
Do Observer 1 and Observer 2 rate the same object or phenomenon the same way?
Inter-Rater or Inter-Observer Reliability
- Are different observers consistent?
- Can establish this outside of your study, in a pilot study.
- Can look at percent agreement (especially with category ratings).
- Can use correlation (with continuous ratings).
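Both approaches named above can be sketched in a few lines. The ratings below are hypothetical; the point is only the two formulas: percent agreement for category ratings, Pearson correlation for continuous ratings.

```python
# Inter-rater reliability on hypothetical ratings from two observers.
def pearson(x, y):
    # Pearson correlation coefficient between two equal-length lists.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Category ratings: percent agreement (share of cases where raters match).
cats_1 = ["pass", "fail", "pass", "pass", "fail"]
cats_2 = ["pass", "fail", "fail", "pass", "fail"]
agreement = sum(a == b for a, b in zip(cats_1, cats_2)) / len(cats_1)  # 4 of 5 = 0.8

# Continuous ratings: correlation between the two observers' scores.
scores_1 = [4.0, 2.5, 3.5, 5.0, 1.0]
scores_2 = [3.8, 2.0, 3.9, 4.8, 1.2]
r = pearson(scores_1, scores_2)
```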
Test-Retest Reliability
The same test given at Time 1 and Time 2: stability over time.
Test-Retest Reliability
- Administer the instrument at two times to multiple persons.
- Compute the correlation between the two measures.
- Assumes there is no change in the underlying trait between time 1 and time 2.
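As a sketch with hypothetical scores for five people, the test-retest estimate is simply the correlation between the Time 1 and Time 2 scores:

```python
# Test-retest reliability: correlate scores from two administrations (hypothetical data).
def pearson(x, y):
    # Pearson correlation coefficient between two equal-length lists.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

time1 = [85, 70, 92, 60, 78]   # scores at Time 1
time2 = [83, 74, 90, 62, 75]   # same people at Time 2; trait assumed unchanged
test_retest_r = pearson(time1, time2)
```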
Parallel-Forms Reliability
Form A at Time 1, Form B at Time 2: stability across forms.
Parallel-Forms Reliability
- Administer both forms to the same people.
- Compute the correlation between the two forms.
- Usually done in educational contexts, where you need alternative forms because of the frequency of retesting and where you can sample from lots of equivalent questions.
Internal Consistency Reliability: Average Inter-Item Correlation
Correlate every item on the test with every other item. For a six-item test:

      I1    I2    I3    I4    I5    I6
I1   1.00
I2    .89  1.00
I3    .91   .92  1.00
I4    .88   .93   .95  1.00
I5    .84   .86   .92   .85  1.00
I6    .88   .91   .95   .87   .85  1.00

Average inter-item correlation = .89. (Note: this excludes the same-item correlations, which all equal 1.)
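The averaging step can be reproduced directly from the slide's lower-triangle matrix: take the 15 off-diagonal entries and average them.

```python
# Average inter-item correlation from the slide's six-item correlation matrix.
lower = [  # lower triangle, row i = correlations of item i+1 with items 1..i+1
    [1.00],
    [.89, 1.00],
    [.91, .92, 1.00],
    [.88, .93, .95, 1.00],
    [.84, .86, .92, .85, 1.00],
    [.88, .91, .95, .87, .85, 1.00],
]
# Keep only entries below the diagonal (j < i), skipping the 1.00 self-correlations.
off_diag = [r for i, row in enumerate(lower) for j, r in enumerate(row) if j < i]
avg_inter_item = sum(off_diag) / len(off_diag)
print(round(avg_inter_item, 2))  # 0.89
```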
Internal Consistency Reliability: Average Item-Total Correlation
Correlate each item with the total score:

        I1    I2    I3    I4    I5    I6
Total   .84   .88   .86   .87   .83   .82

Average item-total correlation = .85.
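Again the computation is just a mean, here over the slide's six item-total correlations:

```python
# Average item-total correlation from the slide's values.
item_total = [.84, .88, .86, .87, .83, .82]  # correlation of each item with the total score
avg_item_total = sum(item_total) / len(item_total)
print(round(avg_item_total, 2))  # 0.85
```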
Internal Consistency Reliability: Split-Half Correlation
Split the test into two halves (e.g., items 1, 3, 4 and items 2, 5, 6), score each half, and correlate the two half-scores: here, r = .87.
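A split-half correlation is the reliability of a half-length test, so it is conventionally stepped up with the Spearman-Brown formula (mentioned in the summary slide below). Using the slide's r = .87 between the two halves:

```python
# Spearman-Brown step-up: estimate full-test reliability from a half-test correlation.
r_half = 0.87                               # correlation between the two half-scores (from the slide)
full_test = (2 * r_half) / (1 + r_half)     # Spearman-Brown prophecy formula for doubled length
print(round(full_test, 2))  # 0.93
```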
Internal Consistency Reliability: Cronbach's Alpha (α)
Conceptually, compute the correlation for every possible split-half of the test:
SH1 = .87, SH2 = .85, SH3 = .91, SH4 = .83, SH5 = .86, ..., SHn = .85
α = .85, which is like the average of all possible split-half correlations.
Internal Consistency Reliability - Summary
- Average inter-item correlation
- Average item-total correlation
- Split-half reliability (Spearman-Brown formula)
- KR-20 - Kuder-Richardson (1937): for dichotomously scored items
- Cronbach's alpha (α) (1951): a more general form of KR-20; can be used for dichotomous or more continuous scales
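In practice alpha is computed from item variances rather than by enumerating split-halves: α = k/(k-1) × (1 − Σ var(itemᵢ) / var(total)). A minimal sketch on hypothetical item scores (five respondents, three items; the data are made up for illustration):

```python
# Cronbach's alpha from the variance formula, on hypothetical item scores.
from statistics import pvariance

items = [          # rows = respondents, columns = k = 3 items
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 1],
]
k = len(items[0])
cols = list(zip(*items))                 # per-item score columns
totals = [sum(row) for row in items]     # total score per respondent
# alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
alpha = (k / (k - 1)) * (1 - sum(pvariance(c) for c in cols) / pvariance(totals))
```

Items that vary together inflate var(total) relative to the summed item variances, which is what pushes alpha toward 1 for internally consistent scales.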
General Rules of Thumb for Cronbach's Alpha
In general:
- .90 - high reliability
- .80 - moderate to high
- .70 - low to moderate
- .60 - unacceptable
Reliability Summary
- Different types of instruments have different levels of reliability.
- Standardized multiple-choice: typically .85-.95
- Open-ended questions: typically .65-.80
- Portfolio scoring: .40-.60
- "The more important and the less reversible is the decision about an individual based on the instrument, the higher the reliability should be" (Nitko, 2004). Particularly relevant in the case of student assessments.