
1 Measurement Validity & Reliability

2 Measurement – Validity & Reliability l The Idea of Construct Validity l Measurement Validity l Construct Validity Tools – Conceptual –The Nomological Network –The Multitrait-Multimethod Matrix l Threats to Measurement Construct Validity

3 Measurement – Validity & Reliability: Reliability l Measurement Error l Levels of Measurement (data type) l Relationship – Validity & Reliability l Theory of Reliability l Types of Reliability l Reliability Summary

4

5 Construct Validity l Trochim & Donnelly define it as: –The degree to which inferences can legitimately be made from the operationalizations in a study to the theoretical constructs on which those operationalizations are based. l Others have defined it more narrowly, such as: –The nature of the psychological construct or characteristic being measured. l Two broad perspectives relate to construct validity – Definitionalist & Relationalist.

6 Construct Validity [Diagram: at the theory level, a cause construct is linked to an effect construct (what you think); at the observation level, a program is linked to observations (what you do, what you see). The program-outcome relationship maps onto the cause-effect construct.] Can we generalize to the constructs?

7 The Analogy to Law Definitionalist Perspective (as opposed to the Relationalist Perspective)

8 The Analogy to Law Definitionalist Perspective The truth…

9 The Analogy to Law Definitionalist Perspective The truth… the whole truth,

10 The Analogy to Law Definitionalist Perspective The truth … the whole truth, and nothing but the truth.

11 In Terms of Construct Validity [Rating Sheet – each item rated 1 to 5: 1 Manage time effectively. 2 Manage resources effectively. 3 Scan a multitude of information and decide what is important. 4 Decide how to manage multiple tasks. 5 Organize the work when directions are not specific.]

12 In Terms of Construct Validity [Rating Sheet repeated] The construct…

13 In Terms of Construct Validity [Rating Sheet repeated] The construct… the whole construct,

14 In Terms of Construct Validity [Rating Sheet repeated] The construct… the whole construct, and nothing but the construct.

15 What Is the Goal? The construct Other construct: A Other construct: C Other construct: B Other construct: D

16 What Is the Goal? Other construct: A Other construct: C Other construct: B Other construct: D Didn’t measure any of the construct The construct

17 What Is the Goal? Other construct: A Other construct: C Other construct: B Measure part of the construct and part of construct B The construct Other construct: D

18 What Is the Goal? The construct Other construct: A Other construct: C Other construct: B Other construct: D Measure part of the construct and nothing else

19 What Is the Goal? The construct Other construct: A Other construct: C Other construct: B Other construct: D Measure all of the construct and nothing else.

20 The Problem l The definitionalist perspective (“all” & “nothing but”) often does not work in social science. l Concepts are not mutually exclusive. –They exist in a web of overlapping meaning. l There is no single piece of evidence that satisfies construct-related validity; rather, –A variety of different types of evidence allows warranted inferences. l To enhance construct validity, show where the construct sits in its broader network of meaning. –The next slides demonstrate this concept.

21 Could Show That... The construct Other construct: A Other construct: C Other construct: B Other construct: D The construct is slightly related to the other four.

22 Could Show That... The construct Other construct: A Other construct: C Other construct: B Other construct: D Constructs A and C and constructs B and D are related to each other.

23 Could Show That... The construct Other construct: A Other construct: C Other construct: B Other construct: D Constructs A and C are not related to constructs B and D.

24 Example: You Want to Measure Self-esteem

25 Example: Distinguish From... Self-esteem Self-worth Confidence Self-disclosure Openness

26 To Establish Construct Validity l Set the construct within a semantic net (net of meaning). l Provide evidence that you control the operationalization of the construct. –That your theory has some correspondence with reality – explain the operationalization. l Provide evidence that your data support the theoretical structure. –Constructs that should be more related, are; constructs that should be less related, are.

27

28 Measurement Validity Types

29 Measurement Validity l Validity is the most important idea to consider when preparing or selecting a measurement instrument. –The quality of the instrument is critical for the conclusions researchers draw. l Researchers use a number of procedures to ensure that the inferences they draw, based on the data collected, are valid and reliable. –Validity refers to the appropriateness, meaningfulness, correctness, and usefulness of the inferences a researcher makes. –Reliability refers to the consistency of scores or answers from one administration of an instrument to another.

30 Measurement Validity Types l Translational validity –Face validity –Content validity l Criterion-related validity –Predictive validity –Concurrent validity –Convergent validity –Discriminant validity

31 Face Validity l The measure seems valid “on its face”. l A judgment call. l Probably the weakest form of measurement validity. [Rating Sheet repeated]

32 Content Validity l Refers to the content of the instrument. –How appropriate is the content? –How comprehensive? –Does it get at the intended variable? –How adequately does the sample of items/questions represent the content? l Check the measure against the relevant content domain. –Standards may represent the content domain. –The content domain is not always clear (e.g., self-esteem). l Involves some judgment. –Can be made relatively rigorous with a systematic check. [Rating Sheet repeated, checked against the content domain]

33 Criterion-Related Validity – There Are Multiple Forms l Refers to the relationship between scores obtained using the instrument and scores obtained using one or more other instruments or measures (the criterion). l Empirically based. l Questions to ask: –How strong is this relationship? –How well do scores estimate present performance or predict future performance? l Specific types of criterion-related validity are explored on the next slides.

34 Predictive Validity l A type of criterion-related validity. l Looks at a measure’s ability to predict something it should be able to predict. l Examples: –A written driver test should predict the hands-on driving test. –A math ability test may predict how well a person does in an engineering profession. –GRE scores predict how well a person does in graduate school. [Diagram: Test → Criterion]

35 Concurrent Validity l A type of criterion-related validity. l Does the measure distinguish between groups that it should distinguish between? l Measurements are taken at the same, or nearly the same, time. l Examples: –A measure of empowerment should show higher scores for managers and lower scores for their workers. –An attitude measurement should match current behavior and distinguish good attitudes from bad.

36 Convergent and Discriminant Validity (to determine measurement validity)

37 The Convergent Principle Measures of constructs that are related to each other should be strongly correlated.

38 How It Works Theory Self-esteem construct Item 1 Item 2 Item 3 Item 4

39 How It Works Theory Self-esteem construct Item 1 Item 2 Item 3 Item 4 You theorize that the items all reflect self-esteem.

40 How It Works Theory Observation Self-esteem construct Item 1 Item 2 Item 3 Item 4 [Inter-item correlations: Item 1–Item 2 = .83, Item 1–Item 3 = .89, Item 1–Item 4 = .91, Item 2–Item 3 = .85, Item 2–Item 4 = .90, Item 3–Item 4 = .86]

41 How It Works Theory Observation Self-esteem construct Item 1 Item 2 Item 3 Item 4 [Inter-item correlations repeated] The correlations provide evidence that the items all converge on the same construct.

42 The Discriminant Principle Measures of different constructs should not correlate highly with each other.

43 How It Works Theory Self-esteem construct SE 1 SE 2 Locus-of-control construct LOC 1 LOC 2

44 How It Works Theory Self-esteem construct SE 1 SE 2 Locus-of-control construct LOC 1 LOC 2 You theorize that you have two distinguishable constructs.

45 How It Works Theory Self-esteem construct SE 1 SE 2 Locus-of-control construct LOC 1 LOC 2 Observation r(SE 1, LOC 1) = .12 r(SE 1, LOC 2) = .09 r(SE 2, LOC 1) = .04 r(SE 2, LOC 2) = .11

46 How It Works Theory Self-esteem construct SE 1 SE 2 Locus-of-control construct LOC 1 LOC 2 Observation r(SE 1, LOC 1) = .12 r(SE 1, LOC 2) = .09 r(SE 2, LOC 1) = .04 r(SE 2, LOC 2) = .11 The correlations provide evidence that the items on the two tests discriminate.

47 Putting It All Together Convergent and Discriminant Validity

48 Theory Self-esteem construct Locus-of-control construct We have two constructs. We want to measure self-esteem and locus of control.

49 Theory Self-esteem construct SE 1 SE 2 SE 3 Locus-of-control construct LOC 1 LOC 2 LOC 3 We have two constructs. We want to measure self-esteem and locus of control. For each construct, we develop three scale items; our theory is that items within a construct will converge and items across constructs will discriminate.

50 Theory Observation Self-esteem construct SE 1 SE 2 SE 3 Locus-of-control construct LOC 1 LOC 2 LOC 3 Correlations:

        SE1   SE2   SE3   LOC1  LOC2  LOC3
SE1    1.00   .83   .89   .02   .12   .09
SE2     .83  1.00   .85   .05   .11   .03
SE3     .89   .85  1.00   .04   .00   .06
LOC1    .02   .05   .04  1.00   .84   .93
LOC2    .12   .11   .00   .84  1.00   .91
LOC3    .09   .03   .06   .93   .91  1.00

51 Theory Observation [Correlation matrix repeated] The within-construct correlations (SE items with SE items, LOC items with LOC items) are convergent; the cross-construct correlations are discriminant.

52 Theory Observation [Correlation matrix repeated] The correlations support both convergence and discrimination, and therefore construct validity.

53 Relationships Among Validity Types l Almost all can be considered a sub-case of construct validity. l Construct validity is the most encompassing and comprehensive standard. l There is no single piece of evidence that satisfies construct-related validity, rather –A variety of different types of evidence allows warranted inferences. l Three specific steps for establishing measurement construct validity – see next slide.

54 To Establish Construct Validity (for measurements) l Generally, three specific steps provide evidence for measurement construct validity: –The variable being measured is clearly defined. –Hypotheses are formed, »based on the theory underlying the variable, about how those who possess a lot of the variable differ from those who possess a little of it. –The hypotheses are tested both logically and empirically. l Example: next slide.

55 To Establish Construct Validity – Example (for a measurement instrument) l A researcher is interested in developing an instrument to measure honesty. –First – Define honesty. –Second – Theorize about how honest people behave versus dishonest people. –Third – Test both logically and empirically: »Develop and administer the measurement instrument – collect data. »Give all participants an opportunity to be honest or dishonest (e.g., leave money out). l If the instrument has construct validity, we should see a relationship between scores and behavior.

56

57 The Nomological Network

58 What Is the Nomological Net? l An idea developed by Cronbach, L. and Meehl, P. (1955). –Construct Validity in Psychological Tests. Psychological Bulletin, 52(4), 281-302. l Nomological is derived from Greek and means “lawful”. l Links interrelated theoretical ideas with empirical evidence. l A view of construct validity. –Cronbach & Meehl argued that a nomological network had to be developed to show construct validity.

59 What Is the Nomological Net? A representation of the concepts (constructs) of interest in a study, [Diagram: five interrelated constructs]

60 What Is the Nomological Net? A representation of the concepts (constructs) of interest in a study,...their observable manifestations, [Diagram: each construct linked to observations]

61 What Is the Nomological Net? A representation of the concepts (constructs) of interest in a study,...their observable manifestations, and the interrelationships among these. [Diagram repeated]

62 What Is the Nomological Net? [Diagram repeated] Theoretical Level: Concepts, Ideas

63 What Is the Nomological Net? [Diagram repeated] Theoretical Level: Concepts, Ideas Observed Level: Measures, Programs

64 Principles [Diagram: constructs and observables] Scientifically, to make clear what something is means to set forth the laws in which it occurs. This interlocking system of laws is the nomological network.

65 Principles [Diagram] The laws in a nomological network may relate... observable properties or quantities to each other.

66 Principles [Diagram] The laws in a nomological network may relate... different theoretical constructs to each other.

67 Principles [Diagram] The laws in a nomological network may relate... theoretical constructs to observables.

68 Principles [Diagram] At least some of the laws in the network must involve observables.

69 Principles [Diagram] "Learning more about" a theoretical construct is a matter of elaborating the nomological network in which it occurs...

70 Principles [Diagram] "Learning more about" a theoretical construct is a matter of elaborating the nomological network in which it occurs... or of increasing the definiteness of its components.

71 Principles [Diagram] The basic rule for adding a new construct or relation to a theory is that it must generate laws (nomologicals) confirmed by observation...

72 Principles [Diagram] ...or reduce the number of nomologicals required to predict some observables.

73 Principles [Diagram] Operations that are qualitatively different "overlap" or "measure the same thing"...

74 Principles [Diagram] Operations that are qualitatively different "overlap" or "measure the same thing"... if their positions in the nomological net tie them to the same construct variable.

75 The Main Problem with the Nomological Net It doesn't tell us how we can assess the construct validity in a study.

76

77 The Multitrait-Multimethod Matrix

78 What Is the MTMM Matrix? An approach developed by Campbell, D. and Fiske, D. (1959). Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix. Psychological Bulletin, 56(2), 81-105. A matrix (table) of correlations arranged to facilitate the assessment of construct validity. An integration of both convergent and discriminant validity.

79 What Is the MTMM Matrix? Assumes that you measure each of several concepts (traits) by more than one method. Very restrictive -- ideally you should measure each concept by each method. Arranges the correlation matrix by concepts within methods.

80 Principles Convergence: Things that should be related are. Divergence/Discrimination: Things that shouldn't be related aren't.

81 A Hypothetical MTMM Matrix Three traits (A, B, C), each measured by three methods (1, 2, 3); reliabilities are on the diagonal, in parentheses:

        A1     B1     C1     A2     B2     C2     A3     B3     C3
A1    (.89)
B1     .51   (.89)
C1     .38    .37   (.76)
A2     .57    .22    .09   (.93)
B2     .22    .57    .10    .68   (.94)
C2     .11    .11    .46    .59    .58   (.84)
A3     .56    .22    .11    .67    .42    .33   (.94)
B3     .23    .58    .12    .43    .66    .34    .67   (.92)
C3     .11    .11    .45    .34    .32    .58    .58    .60   (.85)

82 Parts of the Matrix [MTMM matrix repeated] The reliability diagonal.

83 Parts of the Matrix [MTMM matrix repeated] The validity diagonals.

84 Parts of the Matrix [MTMM matrix repeated] The monomethod heterotrait triangles.

85 Parts of the Matrix [MTMM matrix repeated] The heteromethod heterotrait triangles.

86 Parts of the Matrix [MTMM matrix repeated] The monomethod blocks.

87 Parts of the Matrix [MTMM matrix repeated] The heteromethod blocks.

88 Interpreting the MTMM Matrix [MTMM matrix repeated] Reliabilities should be the highest coefficients.

89 Interpreting the MTMM Matrix [MTMM matrix repeated] Convergent validity diagonals should have strong r's.

90 Interpreting the MTMM Matrix [MTMM matrix repeated] Convergent: The same pattern of trait interrelationships should occur in all triangles (monomethod and heteromethod blocks).

91 Interpreting the MTMM Matrix [MTMM matrix repeated] Discriminant: A validity diagonal should be higher than the other values in its row and column within its own (heteromethod) block.

92 Interpreting the MTMM Matrix [MTMM matrix repeated] Discriminant: A variable should correlate higher with another measure of the same trait than with different traits measured by the same method.
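Campbell & Fiske's discriminant rule from slide 91 can be checked mechanically. The sketch below encodes the hypothetical matrix from slide 81 and tests that each convergent-validity value (same trait, two different methods) exceeds the heterotrait values in its row and column of the heteromethod block; the helper name is ours, not from the source:

```python
# The hypothetical MTMM matrix (lower triangle; reliabilities on the diagonal).
labels = ["A1", "B1", "C1", "A2", "B2", "C2", "A3", "B3", "C3"]
lower = [
    [.89],
    [.51, .89],
    [.38, .37, .76],
    [.57, .22, .09, .93],
    [.22, .57, .10, .68, .94],
    [.11, .11, .46, .59, .58, .84],
    [.56, .22, .11, .67, .42, .33, .94],
    [.23, .58, .12, .43, .66, .34, .67, .92],
    [.11, .11, .45, .34, .32, .58, .58, .60, .85],
]
# Expand to a full symmetric matrix.
r = [[lower[max(i, j)][min(i, j)] for j in range(9)] for i in range(9)]

def passes_discriminant_check(trait, m1, m2):
    """Validity value r(trait_m1, trait_m2) must exceed every heterotrait
    value in its row and column of the same heteromethod block."""
    i = labels.index(f"{trait}{m1}")
    j = labels.index(f"{trait}{m2}")
    validity = r[i][j]
    others  = [r[i][k] for k in range(9) if labels[k][1] == str(m2) and k != j]
    others += [r[k][j] for k in range(9) if labels[k][1] == str(m1) and k != i]
    return all(validity > v for v in others)

# One check per trait per method pair: 3 traits x 3 method pairs.
checks = [passes_discriminant_check(t, m1, m2)
          for t in "ABC" for (m1, m2) in ((1, 2), (1, 3), (2, 3))]
```

For this hypothetical matrix every check passes, which is what the slides' "supports construct validity" conclusion amounts to.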

93 Advantages l Addresses convergent and discriminant validity simultaneously l Addresses the importance of the method of measurement l Provides a rigorous standard for construct validity

94 Disadvantages l Hard to implement l No known overall statistical test for validity –In some cases Structural Equation Modeling may provide an overall test l Requires a judgment call on interpretation

95

96 Threats to Construct Validity (Design Threats)

97 Inadequate Preoperational Explication of Constructs Preoperational = before translating constructs into measures or treatments. In other words, you didn't do a good enough job of defining (operationally) what you mean by the construct. Solutions: think the construct through more carefully; use methods such as concept mapping; use expert opinions to better define the construct.

98 Mono-Operation Bias Pertains to the treatment or program (independent variable). You used only one version of the treatment or program. This typically results in an underrepresentation of the construct and lowers construct validity. Challenge: it is not always possible to have alternative versions -- try the program at different times and places.

99 Mono-Method Bias Pertains to the measures or outcomes (dependent variable). You operationalized the measures in only one way, and the method used may influence results. l Solution: Implement multiple measures of key constructs, and demonstrate that the measures behave as theorized. –A pilot study allows issues to be identified before implementing the full study. l The feasibility/practicality of using multiple methods can be an issue.

100 Confounding Constructs & Levels of Constructs (threat to construct validity) Concerns the operationalization of the treatment construct: you draw the wrong conclusions because of the level of treatment used, not the treatment itself. Really a dosage issue -- related to mono-operation bias, because you only looked at one or two levels. Examples: an educational program implemented for 1 hour a day -- you conclude it has no impact, but 2 hours may have worked; drug dosage -- the dosage level, not the drug, may determine whether it works or not.

101 Interaction of Different Treatments People get more than one treatment. This happens all the time in social ameliorative studies. Again, the construct validity issue is largely a labeling issue.

102 Interaction of Testing and Treatment Does the testing itself make the groups more sensitive or receptive to the treatment? This is a labeling issue. It differs from testing threat to internal validity; here, the testing interacts with the treatment to make it more effective; there, it is not a treatment effect at all (but rather an alternative cause).

103 Restricted Generalizability Across Constructs You didn't measure your outcomes completely. You didn't measure some key affected constructs at all (for example, unintended effects).

104 Threats to Construct Validity (Social Threats)

105 Hypothesis Guessing (threat to construct validity) People guess the hypothesis and respond to it rather than responding "naturally". People want to look good or look smart. This is a construct validity issue because the "cause" will be mislabeled: you'll attribute the effect to the treatment rather than to good guessing.

106 Evaluation Apprehension (threat to construct validity) l People may be anxious about being evaluated. l Perhaps their apprehension makes them consistently respond poorly -- you mislabel this as a negative treatment effect.

107 Experimenter Expectancies (threat to construct validity) The experimenter can bias results consciously or unconsciously. The bias becomes confounded (mixed up) with the treatment; you mislabel the results as a treatment effect.

108 Threats to Construct Validity (Conclusion) l Design Threats l Social Threats

109

110 Measurement Error

111 True Score Theory [Rating Sheet repeated] Observed score = True ability + Random error, i.e., X = T + e

112 The Error Component X = T + e. Two components:

113 The Error Component X = T + e. Two components: e_r

114 The Error Component X = T + e. Two components: e_r = Random error

115 The Error Component X = T + e. Two components: e_r and e_s

116 The Error Component X = T + e. Two components: e_r = Random error, e_s = Systematic error

117 The Revised True Score Model X = T + e_r + e_s

118 What Is Random Error? l Any factors that randomly affect measurement of the variable across the sample. l For instance, each person’s mood can inflate or deflate performance on any occasion. l Random error adds variability to the data but does not affect average performance for the group.

119 Random Error [Plot: frequency distribution of X] The distribution of X with no random error.

120 Random Error [Plot: frequency distribution of X] The distribution of X with no random error; the distribution of X with random error.

121 Random Error [Plot: frequency distribution of X] The distribution of X with no random error; the distribution of X with random error. Notice that random error doesn't affect the average, only the variability around the average.
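The point on slide 121 can be demonstrated with a tiny simulation: adding zero-mean random error leaves the group average essentially unchanged but inflates the variance. A sketch (the score scale and error size below are arbitrary assumptions):

```python
# Random error: X = T + e_r, with e_r drawn from a zero-mean distribution.
import random
from statistics import mean, variance

random.seed(42)
true_scores = [random.gauss(50, 5) for _ in range(2000)]     # T
observed    = [t + random.gauss(0, 3) for t in true_scores]  # X = T + e_r

mean_shift   = abs(mean(observed) - mean(true_scores))       # ~0: average unaffected
var_increase = variance(observed) - variance(true_scores)    # ~ var(e_r) = 9
```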

122 What Is Systematic Error?

123 Systematic Error: Any factors that systematically affect measurement of the variable across the sample. l Systematic error = bias. l For instance, asking questions that start "do you agree with right-wing fascists that..." will tend to yield a systematically lower agreement rate. l Systematic error does affect average performance for the group.

124 Systematic Error [Plot: frequency distribution of X] The distribution of X with no systematic error.

125 Systematic Error [Plot: frequency distribution of X] The distribution of X with no systematic error; the distribution of X with systematic error.

126 Systematic Error [Plot: frequency distribution of X] The distribution of X with no systematic error; the distribution of X with systematic error. Notice that systematic error does affect the average; we call this a bias.
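Slide 126's contrast can also be simulated: a constant systematic error shifts the whole distribution, and therefore the group average, by the size of the bias. A sketch (the bias of -4 points is an arbitrary assumption):

```python
# Systematic error: X = T + e_s + e_r, where e_s is a constant bias.
import random
from statistics import mean

random.seed(7)
true_scores = [random.gauss(50, 5) for _ in range(2000)]  # T
bias = -4                                 # e_s: e.g. the effect of a leading question
observed = [t + bias + random.gauss(0, 3) for t in true_scores]

estimated_bias = mean(observed) - mean(true_scores)       # close to -4
```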

127 Reducing Measurement Error l Pilot test your instruments -- get feedback from respondents. l Train your interviewers or observers l Make observation/measurement as unobtrusive as possible. l Double-check your data. l Triangulate across several measures that might have different biases.

128

129 Levels of Measurement

130 The Levels of Measurement l Nominal l Ordinal l Interval l Ratio

131 Some Definitions Variable

132 Variable → Attribute, Attribute

133 Gender (variable) → Attribute, Attribute

134 Gender (variable) → Female, Male (attributes)

135 Qualities of Variables l Exhaustive -- Should include all possible answerable responses. l Mutually exclusive -- No respondent should be able to have two attributes simultaneously.

136 What Is Level of Measurement? The relationship of the values that are assigned to the attributes for a variable

137 What Is Level of Measurement? The relationship of the values that are assigned to the attributes for a variable (Relationship)

138 What Is Level of Measurement? The relationship of the values that are assigned to the attributes for a variable Values: 1, 2, 3

139 What Is Level of Measurement? The relationship of the values that are assigned to the attributes for a variable Values: 1, 2, 3 Attributes: Republican, Independent, Democrat

140 What Is Level of Measurement? The relationship of the values that are assigned to the attributes for a variable Variable: Party Affiliation Attributes: Republican, Independent, Democrat Values: 1, 2, 3

141 Why Is Level of Measurement Important? l Helps you decide what statistical analysis is appropriate on the values that were assigned l Helps you decide how to interpret the data from that variable

142 Nominal Measurement l The values "name" the attribute uniquely. l The value does not imply any ordering of the cases; for example, jersey numbers in football. l Even though player 32 has a higher number than player 19, you can't say from the data that he is "greater than" or "more than" the other.

143 Ordinal Measurement When attributes can be rank-ordered… l However, distances between attributes do not have any meaning; for example, code Educational Attainment as: –0 = less than H.S.; 1 = some H.S.; 2 = H.S. degree; 3 = some college; 4 = college degree; 5 = post college l Is the distance from 0 to 1 the same as from 3 to 4? –We can't say; there is an order, but no real meaning to the distance between values.

144 Interval Measurement When distance between attributes has meaning, for example, temperature (in Fahrenheit) -- the distance from 30-40 is the same as the distance from 70-80 l Note that ratios don’t make any sense -- 80 degrees is not twice as hot as 40 degrees (although the attribute values are).

145 Ratio Measurement l Has an absolute zero that is meaningful l Can construct a meaningful ratio (fraction), for example, number of clients in past six months l It is meaningful to say that “...we had twice as many clients in this period as we did in the previous six months.”
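The four levels can be summarized by which operations are meaningful at each one. A minimal sketch in Python (the variables and data below are hypothetical examples echoing the slides): mode for nominal, median for ordinal, mean for interval, and true ratios only at the ratio level.

```python
# Sketch: one summary statistic that is meaningful at each level of measurement.
# All variables and data here are hypothetical examples.
from statistics import mode, median, mean

party = ["Rep", "Dem", "Ind", "Dem", "Dem"]   # nominal: only the mode makes sense
education = [0, 1, 2, 2, 4, 5]                # ordinal: median uses order, not distance
temp_f = [30, 40, 70, 80]                     # interval: mean is fine, ratios are NOT
clients = [40, 80]                            # ratio: true zero, so ratios work

print(mode(party))              # most common category
print(median(education))        # middle rank
print(mean(temp_f))             # average temperature
print(clients[1] / clients[0])  # "twice as many clients" is a meaningful claim
```

Note that nothing stops Python from dividing two Fahrenheit temperatures; the level of measurement is a property of the variable's meaning, not of the arithmetic.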

146 The Hierarchy of Levels Nominal

147 Nominal Attributes are only named; weakest

148 The Hierarchy of Levels Nominal Attributes are only named; weakest Ordinal

149 The Hierarchy of Levels Nominal Attributes are only named; weakest Attributes can be ordered Ordinal

150 The Hierarchy of Levels Nominal Interval Attributes are only named; weakest Attributes can be ordered Ordinal

151 The Hierarchy of Levels Nominal Interval Attributes are only named; weakest Attributes can be ordered Distance is meaningful Ordinal

152 The Hierarchy of Levels Nominal Interval Ratio Attributes are only named; weakest Attributes can be ordered Distance is meaningful Ordinal

153 The Hierarchy of Levels Nominal Interval Ratio Attributes are only named; weakest Attributes can be ordered Distance is meaningful Absolute zero Ordinal

154

155 Relationship of Reliability and Validity l Validity requires Reliability. l Reliability does not necessarily imply Validity.

156 Reliability and Validity Reliable but not valid

157 Reliability and Validity Valid: Measures what it is intended to measure consistently

158

159 The Theory of Reliability

160 What Is Reliability? l The “repeatability” of a measure l The “consistency” of a measure l The “dependability” of a measure

161 If a Measure Is Reliable... We should see that a person’s score on the same test given twice is similar (assuming the trait being measured isn’t changing).

162 If a Measure Is Reliable... X1 X2 We should see that a person’s score on the same test given twice is similar (assuming the trait being measured isn’t changing).

163 If a Measure Is Reliable... X1 X2 But, if the scores are similar, why are they similar?

164 If a Measure Is Reliable... X1 X2 T + e 1 T + e 2 Recall from true score theory that... But, if the scores are similar, why are they similar?

165 If a Measure Is Reliable... X1 X2 T + e 1 T + e 2 The only thing common to the two measures is the true score, T. Therefore, the true score must determine the reliability.

166 Reliability Is... a ratio: the true level on the measure / the entire measure

167 Reliability Is... a ratio: variance of the true scores / variance of the measure = var(T) / var(X). So, theoretically a measure that is perfectly reliable would have a value of “1” because the top and bottom values would be equal. However, remember there’s error in real measurements!
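The var(T)/var(X) ratio can be illustrated with a small simulation (a hypothetical sketch, not from the slides): generate true scores, add independent error to get observed scores, and form the ratio. With true-score variance 100 and error variance 25, the theoretical reliability is 100 / (100 + 25) = .8.

```python
# Sketch: reliability as var(T) / var(X) under true score theory.
# All numbers are illustrative; the seed makes the run repeatable.
import random
from statistics import pvariance

random.seed(42)
n = 10_000
T = [random.gauss(50, 10) for _ in range(n)]   # true scores, var ~ 100
X = [t + random.gauss(0, 5) for t in T]        # observed = true + error, var ~ 125

reliability = pvariance(T) / pvariance(X)
print(round(reliability, 2))  # close to the theoretical 0.8
```

In the simulation we know T, so the ratio is computable; in real measurement we never observe T, which is exactly why reliability must be estimated, as the next slides explain.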

168 Reliability Is... a ratio: variance of the true scores / variance of the measure. We can measure the variance of the observed score, X.

169 Reliability Is... a ratio: variance of the true scores / variance of the measure. We can measure the variance of the observed score, X. But, how do we measure the true scores?

170 Reliability Is... a ratio: variance of the true scores / variance of the measure. But, how do we measure the true scores? We can’t!

171 This Leads Us to... l We cannot calculate reliability exactly; we can only estimate it. l Each estimate attempts to capture the consequences of the true score in different ways.

172

173 Types of Reliability

174 Reliability: Consistency of What? l Inter-Rater: Observers or raters l Test-retest: Tests over time l Alternate Forms –Different versions of the same test l Split-halves –Estimate of alternate forms –Internal Consistency l KR-20 & Coefficient Alpha –Internal Consistency

175 Inter-Rater or Inter-Observer Reliability Object or phenomenon

176 Inter-Rater or Inter-Observer Reliability Observer 1 Object or phenomenon

177 Inter-Rater or Inter-Observer Reliability Observer 1 Observer 2 Object or phenomenon

178 Inter-Rater or Inter-Observer Reliability Observer 1 Observer 2 Object or phenomenon = ?

179 Inter-Rater or Inter-Observer Reliability l Are different observers consistent? l Can establish this outside of your study in a pilot study. l Can look at percent of agreement (especially with category ratings). l Can use correlation (with continuous ratings).
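As a sketch of those two options (all ratings below are hypothetical): percent agreement for category ratings, and a Pearson correlation, written out from its definition so the example is self-contained, for continuous ratings.

```python
# Sketch: two common inter-rater reliability estimates (hypothetical ratings).

# Categorical ratings: percent of agreement between the two observers.
rater1 = ["A", "B", "A", "C", "B", "A"]
rater2 = ["A", "B", "B", "C", "B", "A"]
agreement = sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)
print(agreement)  # 5 of 6 ratings match

# Continuous ratings: Pearson correlation, computed from its definition.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

scores1 = [4.0, 3.5, 5.0, 2.0, 4.5]
scores2 = [4.5, 3.0, 5.0, 2.5, 4.0]
print(round(pearson(scores1, scores2), 2))  # high value = raters are consistent
```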

180 Test-Retest Reliability Time 1 Time 2

181 Test-Retest Reliability Test (Time 1) = Test (Time 2)

182 Test-Retest Reliability Test (Time 1) = Test (Time 2) Stability over time

183 Test-Retest Reliability l Measure instrument at two times for multiple persons. l Compute correlation between the two measures. l Assumes there is no change in the underlying trait between time 1 and time 2.
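A sketch of that procedure with hypothetical scores for five people; the correlation is computed from its definition so the example is self-contained.

```python
# Sketch: test-retest reliability = correlation between two administrations
# of the same instrument (scores below are hypothetical).
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [12, 18, 25, 31, 22]
time2 = [14, 17, 27, 30, 21]   # same trait, measured again later
print(round(pearson(time1, time2), 2))  # high value = stable over time
```

If the underlying trait did change between the two administrations, a low correlation here would reflect real change, not unreliability, which is why the no-change assumption on the slide matters.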

184 Parallel-Forms Reliability Time 1 Time 2

185 Parallel-Forms Reliability Form A (Time 1) = Form B (Time 2)

186 Parallel-Forms Reliability Form A (Time 1) = Form B (Time 2) Stability across forms

187 Parallel-Forms Reliability l Administer both forms to the same people. l Get correlation between the two forms. l Usually done in educational contexts where you need alternative forms because of the frequency of retesting and where you can sample from lots of equivalent questions.

188 Internal Consistency Reliability A few different ways to calculate Average inter-item correlation

189 Internal Consistency Reliability Test Average Inter-Item Correlation

190 Internal Consistency Reliability Test Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 Average Inter-Item correlation

191 Internal Consistency Reliability Average inter-item correlation Test: Item 1 - Item 6 Inter-item correlation matrix (lower triangle):
      I1    I2    I3    I4    I5    I6
I1  1.00
I2   .89  1.00
I3   .91   .92  1.00
I4   .88   .93   .95  1.00
I5   .84   .86   .92   .85  1.00
I6   .88   .91   .95   .87   .85  1.00

192 Internal Consistency Reliability Average inter-item correlation Test: Item 1 - Item 6 Inter-item correlation matrix (lower triangle):
      I1    I2    I3    I4    I5    I6
I1  1.00
I2   .89  1.00
I3   .91   .92  1.00
I4   .88   .93   .95  1.00
I5   .84   .86   .92   .85  1.00
I6   .88   .91   .95   .87   .85  1.00
Average inter-item correlation = .89 -Note. Does not include same-item correlations (all = 1.00).
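The .89 on the slide can be checked directly from the fifteen off-diagonal values (a small Python sketch):

```python
# Average inter-item correlation from the slide's lower-triangle values,
# excluding the 1.00 diagonal (same-item correlations).
lower_triangle = [
    .89,
    .91, .92,
    .88, .93, .95,
    .84, .86, .92, .85,
    .88, .91, .95, .87, .85,
]
avg = sum(lower_triangle) / len(lower_triangle)
print(round(avg, 2))  # .89, matching the slide
```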

193 Average item-total correlation Internal Consistency Reliability A few different ways to calculate

194 Test Average item-total correlation Internal Consistency Reliability

195 Average item-total correlation Internal Consistency Reliability Test Item 1 Item 2 Item 3 Item 4 Item 5 Item 6

196 Internal Consistency Reliability Average item-total correlation Test: Item 1 - Item 6 Correlation matrix with item-total row (lower triangle):
         I1    I2    I3    I4    I5    I6  Total
I1     1.00
I2      .89  1.00
I3      .91   .92  1.00
I4      .88   .93   .95  1.00
I5      .84   .86   .92   .85  1.00
I6      .88   .91   .95   .87   .85  1.00
Total   .84   .88   .86   .87   .83   .82  1.00

197 Internal Consistency Reliability Average item-total correlation Test: Item 1 - Item 6 Correlation matrix with item-total row (lower triangle):
         I1    I2    I3    I4    I5    I6  Total
I1     1.00
I2      .89  1.00
I3      .91   .92  1.00
I4      .88   .93   .95  1.00
I5      .84   .86   .92   .85  1.00
I6      .88   .91   .95   .87   .85  1.00
Total   .84   .88   .86   .87   .83   .82  1.00
Average item-total correlation = .85
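Likewise, the .85 is just the mean of the six item-total correlations in the Total row (sketch):

```python
# Average item-total correlation from the slide's Total row
# (each value is an item's correlation with the total test score).
item_total = [.84, .88, .86, .87, .83, .82]
avg = sum(item_total) / len(item_total)
print(round(avg, 2))  # .85, matching the slide
```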

198 Split-half correlations Internal Consistency Reliability A few different ways to calculate

199 Test Split-half correlations Internal Consistency Reliability

200 Split-half correlations Internal Consistency Reliability Test Item 1 Item 2 Item 3 Item 4 Item 5 Item 6

201 Split-half correlations Item 1 Item 3 Item 4 Internal Consistency Reliability Test Item 1 Item 2 Item 3 Item 4 Item 5 Item 6

202 Split-half correlations Item 2 Item 5 Item 6 Internal Consistency Reliability Item 1 Item 3 Item 4 Test Item 1 Item 2 Item 3 Item 4 Item 5 Item 6

203 Split-half correlation = .87 (Item 1, Item 3, Item 4 vs. Item 2, Item 5, Item 6) Internal Consistency Reliability Test Item 1 Item 2 Item 3 Item 4 Item 5 Item 6
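A sketch of the full split-half procedure with hypothetical item scores (the item split mirrors the slides). Because each half is only half as long as the full test, the half-test correlation is usually stepped up with the Spearman-Brown formula, which the later summary slide mentions.

```python
# Sketch: split-half reliability with Spearman-Brown correction.
# Item scores below are hypothetical; split is items 1,3,4 vs. items 2,5,6.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# rows = persons, columns = items 1..6
scores = [
    [4, 5, 4, 3, 4, 5],
    [2, 1, 2, 2, 1, 2],
    [5, 4, 5, 5, 4, 4],
    [3, 3, 2, 3, 3, 3],
    [1, 2, 1, 1, 2, 1],
]
half_a = [row[0] + row[2] + row[3] for row in scores]  # items 1, 3, 4
half_b = [row[1] + row[4] + row[5] for row in scores]  # items 2, 5, 6

r_half = pearson(half_a, half_b)
full = 2 * r_half / (1 + r_half)   # Spearman-Brown: reliability at double length
print(round(r_half, 2), round(full, 2))
```

The corrected value is always at least as large as the half-test correlation, reflecting that longer tests are more reliable.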

204 Internal Consistency Reliability A few different ways to calculate Cronbach’s alpha (α)

205 Internal Consistency Reliability Test Cronbach’s alpha (α)

206 Internal Consistency Reliability Cronbach’s alpha (α) Test Item 1 Item 2 Item 3 Item 4 Item 5 Item 6

207 Internal Consistency Reliability Cronbach’s alpha (α) Test Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 Split half (item 1, item 3, item 4 vs. item 2, item 5, item 6) = .87

208 Internal Consistency Reliability Cronbach’s alpha (α) Test Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 SH 1 = .87 SH 2 = .85 SH 3 = .91 SH 4 = .83 SH 5 = .86... SH n = .85

209 Internal Consistency Reliability Cronbach’s alpha (α) Test Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 SH 1 = .87 SH 2 = .85 SH 3 = .91 SH 4 = .83 SH 5 = .86... SH n = .85 α = .85

210 Internal Consistency Reliability Cronbach’s alpha (α) Test Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 SH 1 = .87 SH 2 = .85 SH 3 = .91 SH 4 = .83 SH 5 = .86... SH n = .85 α = .85 Like the average of all possible split-half correlations

211 Internal Consistency Reliability - Summary l Average inter-item correlation l Average item-total correlation l Split-half reliability –Spearman-Brown formula l KR-20 – Kuder-Richardson (1937) –Dichotomously scored items l Cronbach’s alpha (α) (1951) –More general form of KR-20 –Can be used with dichotomous or continuous items
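The alpha formula itself is compact enough to sketch: α = k/(k−1) · (1 − Σ var(itemᵢ) / var(total)), where k is the number of items. The scores below are hypothetical.

```python
# Sketch: Cronbach's alpha = k/(k-1) * (1 - sum of item variances / total variance).
# Scores are hypothetical; rows = persons, columns = items.
from statistics import pvariance

scores = [
    [4, 5, 4, 3],
    [2, 1, 2, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 1],
]
k = len(scores[0])
items = list(zip(*scores))             # one tuple of scores per item
totals = [sum(row) for row in scores]  # total test score per person

alpha = (k / (k - 1)) * (1 - sum(pvariance(i) for i in items) / pvariance(totals))
print(round(alpha, 2))  # high value = items hang together
```

For dichotomous (0/1) items this same formula reduces to KR-20, which is why the slide calls alpha the more general form.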

212 General Rules of Thumb for Cronbach’s alpha (α) l In general: .90 - high reliability; .80 - moderate to high; .70 - low to moderate; .60 - unacceptable

213 Reliability Summary l Different types of instruments have different levels of reliability. l Standardized multiple-choice –Typically .85-.95 l Open-ended questions –Typically .65-.80 l Portfolio scoring –.40-.60 l “The more important and the less reversible is the decision about an individual based on the instrument, the higher the reliability should be” (Nitko, 2004). –Particularly relevant in the case of student assessments.

