Measurement Validity & Reliability
Measurement – Validity & Reliability
• The Idea of Construct Validity
• Measurement Validity
• Construct Validity Tools – Conceptual
  – The Nomological Network
  – The Multitrait-Multimethod Matrix
• Threats to Measurement Construct Validity
Measurement – Validity & Reliability (continued)
• Measurement Error
• Levels of Measurement (data type)
• Relationship – Validity & Reliability
• Theory of Reliability
• Types of Reliability
• Reliability Summary
Construct Validity
• Trochim & Donnelly define it as:
  – The degree to which inferences can legitimately be made from the operationalizations in a study to the theoretical constructs on which those operationalizations are based.
• Others define it more narrowly, such as:
  – The nature of the psychological construct or characteristic being measured.
• Two broad perspectives relate to construct validity: the definitionalist and the relationalist.
Construct Validity
[Diagram] At the theory level, a cause construct is linked to an effect construct (what you think – the cause-effect construct). At the observation level, a program is linked to observations (what you do and what you see – the program-outcome relationship). The question: can we generalize from what we observe to the constructs?
The Analogy to Law – Definitionalist Perspective (as opposed to the Relationalist Perspective)
The truth… the whole truth… and nothing but the truth.
In Terms of Construct Validity
Rating Sheet:
1. Manage time effectively.
2. Manage resources effectively.
3. Scan a multitude of information and decide what is important.
4. Decide how to manage multiple tasks.
5. Organize the work when directions are not specific.
The construct… the whole construct… and nothing but the construct.
What Is the Goal?
[Diagram] The construct sits among other constructs A, B, C, and D. A measure might:
– measure none of the construct;
– measure part of the construct and part of construct B;
– measure part of the construct and nothing else;
– measure all of the construct and nothing else (the goal).
The Problem
• The definitionalist's perspective often does not work in social science ("all" and "nothing but").
• Concepts are not mutually exclusive.
  – They exist in a web of overlapping meaning.
• No single piece of evidence satisfies construct-related validity; rather,
  – a variety of different types of evidence allows warranted inferences.
• To enhance construct validity, show where the construct sits in its broader network of meaning.
  – The next slides demonstrate this concept.
Could Show That...
[Diagram: the construct among other constructs A–D]
– The construct is slightly related to the other four.
– Constructs A and C, and constructs B and D, are related to each other.
– Constructs A and C are not related to constructs B and D.
Example: You Want to Measure Self-esteem
Example: Distinguish Self-esteem From... Self-worth, Confidence, Self-disclosure, and Openness.
To Establish Construct Validity
• Set the construct within a semantic net (net of meaning).
• Provide evidence that you control the operationalization of the construct.
  – That your theory has some correspondence with reality; explain the operationalization.
• Provide evidence that your data support the theoretical structure.
  – Constructs that should be more related, are; constructs that should be less related, are.
Measurement Validity Types
Measurement Validity
• Validity is the most important idea to consider when preparing or selecting a measurement instrument.
  – The quality of the instrument is critical for the conclusions researchers draw.
• Researchers use a number of procedures to ensure that the inferences they draw, based on the data collected, are valid and reliable.
  – Validity refers to the appropriateness, meaningfulness, correctness, and usefulness of the inferences a researcher makes.
  – Reliability refers to the consistency of scores or answers from one administration of an instrument to another.
Measurement Validity Types
• Translational validity
  – Face validity
  – Content validity
• Criterion-related validity
  – Predictive validity
  – Concurrent validity
  – Convergent validity
  – Discriminant validity
Face Validity
• The measure seems valid "on its face."
• A judgment call.
• Probably the weakest form of measurement validity.
[Rating sheet figure]
Content Validity
• Refers to the content of the instrument.
  – How appropriate is the content?
  – How comprehensive?
  – Does it get at the intended variable?
  – How adequately does the sample of items/questions represent the content?
• Check the measure against the relevant content domain.
  – Standards may represent the content domain.
  – The content domain is not always clear (e.g., self-esteem).
• Involves some judgment.
  – Can be made relatively rigorous with a systematic check.
[Figure: rating sheet checked against the content domain]
Criterion-Related Validity (Multiple Forms)
• Refers to the relationship between scores obtained using the instrument and scores obtained using one or more other instruments or measures (the criterion).
• Empirically based.
• Questions to ask:
  – How strong is the relationship?
  – How well do scores estimate present performance or predict future performance?
• Specific types of criterion-related validity are explored on the next slides.
Predictive Validity
• A type of criterion-related validity.
• Looks at a measure's ability to predict something it should be able to predict.
• Examples:
  – A written driver's test should predict performance on a hands-on driving test.
  – A math ability test may predict how well a person does in an engineering profession.
  – GRE scores should predict how well a person does in graduate school.
[Figure: Test → Criterion]
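Predictive validity is usually quantified as the correlation between the test and its criterion. A minimal sketch, using invented scores (not real data) for the driver-test example:

```python
import numpy as np

# Hypothetical data: written driver-test scores and later road-test scores
# for the same 8 people (illustrative values only).
written = np.array([55, 62, 70, 74, 80, 85, 90, 95], dtype=float)
road    = np.array([50, 60, 65, 78, 75, 88, 86, 97], dtype=float)

# The validity coefficient is the Pearson correlation between
# the test and the criterion it should predict.
r = np.corrcoef(written, road)[0, 1]
print(f"validity coefficient r = {r:.2f}")
```

A coefficient near 1 would support predictive validity; a value near 0 would argue against it.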
Concurrent Validity
• A type of criterion-related validity.
• Does the measure distinguish between groups that it should distinguish between?
• Measurements are taken at the same, or nearly the same, time.
• Examples:
  – A measure of empowerment should show higher scores for managers and lower scores for their workers.
  – An attitude measurement should match current behavior and distinguish good attitudes from bad.
Convergent and Discriminant Validity (to determine measurement validity)
The Convergent Principle Measures of constructs that are related to each other should be strongly correlated.
How It Works
Theory: a self-esteem construct with four items (Item 1 through Item 4). You theorize that the items all reflect self-esteem.
Observation: the inter-item correlations provide evidence that the items all converge on the same construct.
The Discriminant Principle Measures of different constructs should not correlate highly with each other.
How It Works
Theory: a self-esteem construct with items SE 1 and SE 2, and a locus-of-control construct with items LOC 1 and LOC 2. You theorize that you have two distinguishable constructs.
Observation:
r(SE 1, LOC 1) = .12
r(SE 1, LOC 2) = .09
r(SE 2, LOC 1) = .04
r(SE 2, LOC 2) = .11
The low cross-construct correlations provide evidence that the items on the two tests discriminate.
Putting It All Together Convergent and Discriminant Validity
Theory: We have two constructs. We want to measure self-esteem and locus of control. For each construct, we develop three scale items; our theory is that items within a construct will converge and items across constructs will discriminate.
Observation: a 6 × 6 correlation matrix of SE 1–SE 3 and LOC 1–LOC 3. The within-construct correlations are the convergent evidence; the cross-construct correlations are the discriminant evidence. When the within-construct correlations are high and the cross-construct correlations are low, the data support both convergence and discrimination, and therefore construct validity.
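A minimal simulation of this logic, assuming (for illustration) two independent latent traits and normally distributed item noise:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Two independent latent constructs (hypothetical simulation, not real data).
self_esteem = rng.normal(size=n)
locus       = rng.normal(size=n)

# Three noisy items per construct: each item = latent score + random error.
items = np.column_stack(
    [self_esteem + rng.normal(scale=0.6, size=n) for _ in range(3)] +
    [locus       + rng.normal(scale=0.6, size=n) for _ in range(3)]
)  # columns: SE1, SE2, SE3, LOC1, LOC2, LOC3

R = np.corrcoef(items, rowvar=False)
print(np.round(R, 2))
# Within-construct (convergent) correlations come out high;
# cross-construct (discriminant) correlations come out near zero.
```

The same correlation-matrix layout is what the MTMM matrix, described later, organizes systematically.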
Relationships Among Validity Types
• Almost all can be considered a sub-case of construct validity.
• Construct validity is the most encompassing and comprehensive standard.
• No single piece of evidence satisfies construct-related validity; rather,
  – a variety of different types of evidence allows warranted inferences.
• Three specific steps for establishing measurement construct validity follow on the next slide.
To Establish Construct Validity (for measurements)
Three specific steps in obtaining evidence for measurement construct validity:
1. Clearly define the variable being measured.
2. Form hypotheses, based on a theory underlying the variable, about those who possess a lot versus a little of the variable being measured.
3. Test the hypotheses both logically and empirically.
Example: next slide.
To Establish Construct Validity – Example (for a measurement instrument)
A researcher is interested in developing an instrument to measure honesty.
– First: define honesty.
– Second: theorize about how honest people behave versus dishonest people.
– Third: test both logically and empirically.
  » Develop and administer the measurement instrument; collect data.
  » Give all participants an opportunity to be honest or dishonest (e.g., leave money out).
If the instrument has construct validity, we should see a relationship between scores and behavior.
The Nomological Network
What Is the Nomological Net?
• An idea developed by Cronbach, L. and Meehl, P. (1955). Construct Validity in Psychological Tests. Psychological Bulletin, 52(4), 281–302.
• "Nomological" is derived from Greek and means "lawful."
• Links interrelated theoretical ideas with empirical evidence.
• A view of construct validity.
  – Cronbach & Meehl argued that a nomological network had to be developed to show construct validity.
What Is the Nomological Net?
A representation of the concepts (constructs) of interest in a study, their observable manifestations, and the interrelationships among these.
[Diagram] Two levels:
– Theoretical level: concepts, ideas (the constructs).
– Observed level: measures, programs (the observations).
Principles
• Scientifically, to make clear what something is means to set forth the laws in which it occurs. This interlocking system of laws is the nomological network.
• The laws in a nomological network may relate:
  – observable properties or quantities to each other;
  – different theoretical constructs to each other;
  – theoretical constructs to observables.
• At least some of the laws in the network must involve observables.
• "Learning more about" a theoretical construct is a matter of elaborating the nomological network in which it occurs, or of increasing the definiteness of its components.
• The basic rule for adding a new construct or relation to a theory is that it must generate laws (nomologicals) confirmed by observation, or reduce the number of nomologicals required to predict some observables.
• Operations that are qualitatively different "overlap" or "measure the same thing" if their positions in the nomological net tie them to the same construct variable.
The Main Problem with the Nomological Net
It doesn't tell us how to assess construct validity in a study.
The Multitrait-Multimethod Matrix
What Is the MTMM Matrix?
• An approach developed by Campbell, D. and Fiske, D. (1959). Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix. Psychological Bulletin, 56(2), 81–105.
• A matrix (table) of correlations arranged to facilitate the assessment of construct validity.
• An integration of both convergent and discriminant validity.
What Is the MTMM Matrix?
• Assumes that you measure each of several concepts (traits) by more than one method.
• Very restrictive: ideally, you should measure each concept by each method.
• Arranges the correlation matrix by concepts within methods.
Principles
• Convergence: things that should be related are.
• Divergence/discrimination: things that shouldn't be related aren't.
A Hypothetical MTMM Matrix
Three traits (A, B, C), each measured by three methods (1, 2, 3). Reliabilities appear on the diagonal, in parentheses.

            Method 1           Method 2           Method 3
Traits    A1   B1   C1      A2   B2   C2      A3   B3   C3
A1      (.89)
B1       .51 (.89)
C1       .38  .37 (.76)
A2       .57  .22  .09    (.93)
B2       .22  .57  .10     .68 (.94)
C2       .11  .11  .46     .59  .58 (.84)
A3       .56  .22  .11     .67  .42  .33    (.94)
B3       .23  .58  .12     .43  .66  .34     .67 (.92)
C3       .11  .11  .45     .34  .32  .58     .58  .60 (.85)
Parts of the Matrix
– The reliability diagonal: the parenthesized values, e.g. (.89) — each measure correlated with itself.
– The validity diagonals: the same trait measured by two different methods.
– Monomethod heterotrait triangles: different traits measured by the same method.
– Heteromethod heterotrait triangles: different traits measured by different methods.
– Monomethod blocks: all correlations that share a single method.
– Heteromethod blocks: correlations between two different methods.
Interpreting the MTMM Matrix
– The reliability coefficients should be the highest values in the matrix.
– Convergent: the validity diagonals should show strong correlations.
– Convergent: the same pattern of trait interrelationships should occur in all triangles (monomethod and heteromethod blocks).
– Discriminant: a validity diagonal value should be higher than the other values in its row and column within its heteromethod block.
– Discriminant: a variable should correlate more highly with another measure of the same trait than with measures of different traits that merely share its method.
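The discriminant rule can be checked mechanically. A sketch applying it to one heteromethod block with illustrative values (rows A1, B1, C1; columns A2, B2, C2; the numbers are hypothetical, not from any real study):

```python
import numpy as np

# A Method 1 x Method 2 heteromethod block from a hypothetical MTMM matrix.
# Rows: A1, B1, C1; columns: A2, B2, C2; diagonal = validity coefficients.
block = np.array([
    [0.57, 0.22, 0.09],
    [0.22, 0.57, 0.10],
    [0.11, 0.11, 0.46],
])

# Discriminant criterion: each validity coefficient (the diagonal) should
# exceed every other value in its own row and column of the block.
for i in range(3):
    v = block[i, i]
    others = np.concatenate([np.delete(block[i, :], i), np.delete(block[:, i], i)])
    print(f"trait {i}: validity {v:.2f} > max other {others.max():.2f} -> {v > others.max()}")
```

Each trait passing this check is one piece of discriminant evidence; all of the heteromethod blocks would be checked the same way.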
Advantages
• Addresses convergent and discriminant validity simultaneously.
• Addresses the importance of the method of measurement.
• Provides a rigorous standard for construct validity.
Disadvantages
• Hard to implement.
• No known overall statistical test for validity.
  – In some cases, structural equation modeling may provide an overall test.
• Requires a judgment call in interpretation.
Threats to Construct Validity (Design Threats)
Inadequate Preoperational Explication of Constructs
• Preoperational = before translating constructs into measures or treatments.
• In other words, you didn't do a good enough job of defining (operationally) what you mean by the construct.
• Solutions:
  – More thinking.
  – Use methods such as concept mapping.
  – Use expert opinions to better define the construct.
Mono-Operation Bias
• Pertains to the treatment or program (independent variable).
• Only one version of the treatment or program was used.
• This typically results in an underrepresentation of the construct and lowers construct validity.
• Challenge: it is not always possible to have alternative versions. Try the treatment at different times and places.
Mono-Method Bias
• Pertains to the measures or outcomes (dependent variable).
• Measures were operationalized in only one way.
  – The method used may influence the results.
• Solution: implement multiple measures of key constructs, and demonstrate that the measures behave as theorized.
  – A pilot study allows issues to be identified before implementing the full study.
• The feasibility/practicality of using multiple methods can be an issue.
Confounding Constructs and Levels of Constructs (threat to construct validity)
• Concerns the operationalization of the treatment construct: wrong conclusions are drawn about the level of the treatment, not the treatment itself.
• Really a dosage issue; related to mono-operation bias because you only looked at one or two levels.
• Examples:
  – An educational program implemented for 1 hour per day: you conclude no impact, but 2 hours may have worked.
  – Drug dosage: the dosage level may determine whether the drug works or not.
Interaction of Different Treatments People get more than one treatment. This happens all the time in social ameliorative studies. Again, the construct validity issue is largely a labeling issue.
Interaction of Testing and Treatment
• Does the testing itself make the groups more sensitive or receptive to the treatment?
• This is a labeling issue. It differs from the testing threat to internal validity: here, the testing interacts with the treatment to make it more effective; there, the testing is not a treatment effect at all (but rather an alternative cause).
Restricted Generalizability Across Constructs You didn't measure your outcomes completely. You didn't measure some key affected constructs at all (for example, unintended effects).
Threats to Construct Validity (Social Threats)
Hypothesis Guessing (threat to construct validity)
• People guess the hypothesis and respond to it rather than responding "naturally."
• People want to look good or look smart.
• This is a construct validity issue because the "cause" will be mislabeled: you'll attribute the effect to the treatment rather than to good guessing.
Evaluation Apprehension (threat to construct validity)
• Participants may be apprehensive about being evaluated.
• Perhaps their apprehension makes them consistently respond poorly; you mislabel this as a negative treatment effect.
Experimenter Expectancies (threat to construct validity)
• The experimenter can bias results consciously or unconsciously.
• The bias becomes confounded (mixed up) with the treatment; you mislabel the results as a treatment effect.
Threats to Construct Validity (Conclusion) l Design Threats l Social Threats
Measurement Error
True Score Theory
[Rating sheet figure]
Observed score = True ability + Random error
X = T + e
The Error Component
The error term e has two components: random error (e_r) and systematic error (e_s).
The Revised True Score Model:
X = T + e_r + e_s
What Is Random Error?
• Any factors that randomly affect measurement of the variable across the sample.
• For instance, each person's mood can inflate or deflate performance on any occasion.
• Random error adds variability to the data but does not affect average performance for the group.
Random Error
[Figure: two frequency distributions of X] The distribution of X with random error is wider than the distribution with no random error, but both are centered at the same place. Random error doesn't affect the average, only the variability around the average.
What Is Systematic Error?
Systematic Error
• Any factors that systematically affect measurement of the variable across the sample.
• Systematic error = bias.
• For instance, questions that start "do you agree with right-wing fascists that..." will tend to yield a systematically lower agreement rate.
• Systematic error does affect average performance for the group.
Systematic Error
[Figure: two frequency distributions of X] The distribution of X with systematic error is shifted relative to the distribution with no systematic error. Systematic error does affect the average; we call this a bias.
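A quick simulation contrasting the two error types. All values are hypothetical: everyone's true score is fixed at 50, random error is drawn with a standard deviation of 5, and the systematic error is an assumed constant bias of +8:

```python
import numpy as np

rng = np.random.default_rng(1)
true_scores = np.full(10_000, 50.0)  # everyone's true score is 50 (hypothetical)

# Random error: zero-mean noise added to each observation.
random_err = true_scores + rng.normal(0, 5, size=true_scores.size)

# Systematic error: the same noise plus a constant bias of +8.
systematic_err = true_scores + rng.normal(0, 5, size=true_scores.size) + 8

print(f"random error:     mean={random_err.mean():.1f}, sd={random_err.std():.1f}")
print(f"systematic error: mean={systematic_err.mean():.1f}, sd={systematic_err.std():.1f}")
# Random error leaves the mean near 50 but adds spread;
# systematic error shifts the mean (here, to about 58).
```

This is the figure above in numerical form: both distributions have the same spread, but only the biased one is displaced.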
Reducing Measurement Error
• Pilot test your instruments; get feedback from respondents.
• Train your interviewers or observers.
• Make observation/measurement as unobtrusive as possible.
• Double-check your data.
• Triangulate across several measures that might have different biases.
Levels of Measurement
The Levels of Measurement
• Nominal
• Ordinal
• Interval
• Ratio
Some Definitions
A variable (e.g., Gender) takes on attributes (e.g., Male, Female).
Qualities of Variables
• Exhaustive: should include all possible answerable responses.
• Mutually exclusive: no respondent should be able to have two attributes simultaneously.
What Is Level of Measurement?
The relationship of the values that are assigned to the attributes of a variable. For example, for the variable Party Affiliation, the attributes Republican, Independent, and Democrat might be assigned the values 1, 2, and 3.
Why Is Level of Measurement Important?
• Helps you decide what statistical analysis is appropriate on the values that were assigned.
• Helps you decide how to interpret the data from that variable.
Nominal Measurement
• The values "name" the attribute uniquely.
• The values do not imply any ordering of the cases; for example, jersey numbers in football.
• Even though player 32 has a higher number than player 19, you can't say from the data that he's greater than or more than the other.
Ordinal Measurement
When attributes can be rank-ordered...
• However, the distances between attributes do not have any meaning. For example, code Educational Attainment as: 0 = less than H.S.; 1 = some H.S.; 2 = H.S. degree; 3 = some college; 4 = college degree; 5 = post-college.
• Is the distance from 0 to 1 the same as from 3 to 4?
  – We can't say. There's an order, but no real meaning to the distance between values.
Interval Measurement
When the distance between attributes has meaning; for example, temperature in Fahrenheit: the distance from 30 to 40 degrees is the same as the distance from 70 to 80.
• Note that ratios don't make any sense: 80 degrees is not twice as hot as 40 degrees (although the attribute values are twice as large).
Ratio Measurement
• Has an absolute zero that is meaningful.
• Can construct a meaningful ratio (fraction); for example, the number of clients in the past six months.
• It is meaningful to say that "...we had twice as many clients in this period as we did in the previous six months."
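One way to see why the level matters for analysis is that each level licenses a different summary statistic. A small sketch with invented data for each level:

```python
import statistics

# Hypothetical responses at each level of measurement.
party   = ["Rep", "Dem", "Dem", "Ind", "Dem"]  # nominal: only the mode makes sense
educ    = [0, 2, 2, 3, 5]                      # ordinal: median is meaningful
temps_f = [40.0, 50.0, 80.0]                   # interval: differences are meaningful
clients = [120, 240]                           # ratio: ratios are meaningful

print(statistics.mode(party))    # "Dem" -- the most frequent category
print(statistics.median(educ))   # 2 -- order matters, distances do not
print(temps_f[2] - temps_f[0])   # 40.0 -- a meaningful interval difference
print(clients[1] / clients[0])   # 2.0 -- "twice as many" is meaningful only for ratio data
```

Computing a mean of party-affiliation codes, or a ratio of Fahrenheit temperatures, would run without error but would be statistically meaningless; the level of measurement, not the software, determines what is interpretable.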
The Hierarchy of Levels
– Nominal: attributes are only named; weakest.
– Ordinal: attributes can be ordered.
– Interval: distance is meaningful.
– Ratio: has an absolute zero.
Relationship of Reliability and Validity
• Validity requires reliability.
• Reliability does not necessarily imply validity.
Reliability and Validity Reliable but not valid
Reliability and Validity Valid: Measures what it is intended to measure consistently
The Theory of Reliability
What Is Reliability?
• The "repeatability" of a measure.
• The "consistency" of a measure.
• The "dependability" of a measure.
If a Measure Is Reliable...
We should see that a person's score on the same test given twice (X1 and X2) is similar, assuming the trait being measured isn't changing. But if the scores are similar, why are they similar? Recall from true score theory that X1 = T + e1 and X2 = T + e2. The only thing common to the two measures is the true score, T. Therefore, the true score must determine the reliability.
Reliability Is... a Ratio
Reliability is the ratio of the variance of the true scores to the variance of the measure:
reliability = var(T) / var(X)
Theoretically, a perfectly reliable measure would have a reliability of 1, because the top and bottom values would be equal. However, there is always error in real measurements. We can measure the variance of the observed score, X. But how do we measure the variance of the true scores? We can't!
This Leads Us to... l We cannot calculate reliability exactly; we can only estimate it. l Each estimate attempts to capture the ratio of true-score variance to observed-score variance in a different way.
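Because T is unobservable in real data, the ratio can only be computed in a simulation where we generate the true scores ourselves; the distributions below are made up for illustration:

```python
import random

random.seed(1)

# Hypothetical simulation: we know var(T) only because we generated T.
n = 1000
T = [random.gauss(0, 2) for _ in range(n)]   # true scores, variance = 4
X = [t + random.gauss(0, 1) for t in T]      # observed X = T + e, error variance = 1

def var(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)

# reliability = var(T) / var(X); theoretical value: 4 / (4 + 1) = 0.8
reliability = var(T) / var(X)
print(round(reliability, 2))
```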
Types of Reliability
Reliability – Consistency of What? l Inter-Rater: observers or raters l Test-Retest: tests over time l Alternate Forms – different versions of the same test l Split-Halves – estimate of alternate forms; internal consistency l KR-20 & Coefficient Alpha – internal consistency
Inter-Rater or Inter-Observer Reliability
l Do Observer 1 and Observer 2 give the same rating when observing the same object or phenomenon?
Inter-Rater or Inter-Observer Reliability l Are different observers consistent? l Can establish this outside of your study in a pilot study. l Can look at percent of agreement (especially with category ratings). l Can use correlation (with continuous ratings).
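Both estimates in the bullets above can be computed directly; the ratings below are hypothetical:

```python
# Hypothetical category ratings: two observers classify the same 10 objects.
rater1 = ["A", "B", "A", "C", "B", "A", "A", "C", "B", "A"]
rater2 = ["A", "B", "A", "B", "B", "A", "C", "C", "B", "A"]

# Percent agreement for category ratings.
agreement = sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)
print(agreement)   # 0.8 -- the raters agree on 8 of the 10 objects

# Pearson correlation for continuous ratings (hypothetical 1-10 scores).
x = [7, 5, 8, 3, 6, 9, 4, 7, 5, 8]
y = [6, 5, 9, 2, 6, 8, 5, 7, 4, 9]

def pearson(a, b):
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5

print(round(pearson(x, y), 2))
```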
Test-Retest Reliability
l The same test is given at Time 1 and Time 2 – stability over time.
Test-Retest Reliability l Administer the instrument at two times to the same group of persons. l Compute the correlation between the two sets of scores. l Assumes there is no change in the underlying trait between time 1 and time 2.
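The test-retest procedure reduces to a single correlation; the scores below are made up for the sketch:

```python
# Hypothetical scores: the same 8 people take the same test twice.
time1 = [12, 15, 11, 18, 14, 16, 13, 17]
time2 = [13, 14, 12, 17, 15, 16, 12, 18]

def pearson(a, b):
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# The test-retest reliability estimate is simply this correlation.
r_tt = pearson(time1, time2)
print(round(r_tt, 2))
```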
Parallel-Forms Reliability
l Form A is given at Time 1 and Form B at Time 2 – stability across forms.
Parallel-Forms Reliability l Administer both forms to the same people. l Compute the correlation between the two forms. l Usually done in educational contexts, where alternative forms are needed because of frequent retesting and where you can sample from a large pool of equivalent questions.
Internal Consistency Reliability – Average Inter-Item Correlation
l Compute the correlation between every pair of items on the test (e.g., Items 1–6 give a 6 × 6 correlation matrix).
l Average the off-diagonal correlations; the diagonal (each item with itself) is always 1 and is excluded.
l Example: average inter-item correlation = .89.
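The averaging step can be sketched with a made-up matrix of item scores (rows are people, columns are items):

```python
# Hypothetical item scores: 6 people on Items 1-6.
scores = [
    [4, 5, 4, 4, 5, 4],
    [2, 2, 3, 2, 2, 3],
    [5, 5, 5, 4, 5, 5],
    [3, 3, 2, 3, 3, 2],
    [4, 4, 5, 5, 4, 4],
    [1, 2, 1, 2, 1, 2],
]

def pearson(a, b):
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

items = list(zip(*scores))      # one tuple of scores per item
k = len(items)

# Correlate every distinct pair of items and average; the diagonal
# (each item with itself, always 1) is excluded.
pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
avg_r = sum(pearson(items[i], items[j]) for i, j in pairs) / len(pairs)
print(round(avg_r, 2))
```

With 6 items there are 15 distinct pairs, matching the off-diagonal half of the 6 × 6 matrix.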
Internal Consistency Reliability – Average Item-Total Correlation
l Correlate each item (Items 1–6) with the total test score, then average these correlations.
l Example: average item-total correlation = .85.
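The item-total variant uses the same kind of data; the scores below are made up for the sketch:

```python
# Hypothetical item scores: 6 people on Items 1-6.
scores = [
    [4, 5, 4, 4, 5, 4],
    [2, 2, 3, 2, 2, 3],
    [5, 5, 5, 4, 5, 5],
    [3, 3, 2, 3, 3, 2],
    [4, 4, 5, 5, 4, 4],
    [1, 2, 1, 2, 1, 2],
]

def pearson(a, b):
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

totals = [sum(row) for row in scores]     # total score per person
items = list(zip(*scores))                # one tuple of scores per item

# Correlate each item with the total score, then average.
avg_item_total = sum(pearson(item, totals) for item in items) / len(items)
print(round(avg_item_total, 2))
```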
Internal Consistency Reliability – Split-Half Correlation
l Divide the test into two halves (e.g., Items 1, 3, 4 and Items 2, 5, 6).
l Compute a score for each half and correlate the two halves.
l Example: split-half correlation = .87.
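A split-half estimate, together with the Spearman-Brown step-up that the summary slide mentions, can be sketched with made-up item scores:

```python
# Hypothetical item scores: 6 people on Items 1-6.
scores = [
    [4, 5, 4, 4, 5, 4],
    [2, 2, 3, 2, 2, 3],
    [5, 5, 5, 4, 5, 5],
    [3, 3, 2, 3, 3, 2],
    [4, 4, 5, 5, 4, 4],
    [1, 2, 1, 2, 1, 2],
]

def pearson(a, b):
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Split into the halves used on the slide: Items 1, 3, 4 vs. Items 2, 5, 6.
half_a = [row[0] + row[2] + row[3] for row in scores]
half_b = [row[1] + row[4] + row[5] for row in scores]
r_half = pearson(half_a, half_b)

# Spearman-Brown correction: estimates the reliability of the
# full-length test from the correlation between the two half-tests.
full_length = 2 * r_half / (1 + r_half)
print(round(r_half, 2), round(full_length, 2))
```

The correction is needed because each half is only half as long as the real test, and shorter tests are less reliable.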
Internal Consistency Reliability – Cronbach’s Alpha (α)
l Conceptually, α is like the average of all possible split-half correlations.
l Example: split-half estimates SH1 = .87, SH2 = .85, SH3 = .91, SH4 = .83, ..., SHn = .85; α = .85.
Internal Consistency Reliability - Summary l Average inter-item correlation l Average item-total correlation l Split-half reliability –Spearman-Brown formula l KR-20 – Kuder-Richardson (1937) –For dichotomously scored items l Cronbach’s alpha (α) (1951) –More general form of KR-20 –Can be used with dichotomous or more continuous scales
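Cronbach’s α has a direct computational form, α = k/(k−1) × (1 − Σ item variances / variance of total score); here is a minimal sketch using made-up item scores:

```python
# Hypothetical item scores: 6 people on Items 1-6.
scores = [
    [4, 5, 4, 4, 5, 4],
    [2, 2, 3, 2, 2, 3],
    [5, 5, 5, 4, 5, 5],
    [3, 3, 2, 3, 3, 2],
    [4, 4, 5, 5, 4, 4],
    [1, 2, 1, 2, 1, 2],
]

def var(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)

def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    items = list(zip(*scores))            # one tuple of scores per item
    k = len(items)
    totals = [sum(row) for row in scores]
    return k / (k - 1) * (1 - sum(var(i) for i in items) / var(totals))

print(round(cronbach_alpha(scores), 2))
```

When items are scored 0/1, the same formula reduces to KR-20, which is why α is described as its more general form.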
General Rules of Thumb for “r” as a Cronbach’s Alpha
l In general:
–.90 – high reliability
–.80 – moderate to high
–.70 – low to moderate
–.60 – unacceptable
Reliability Summary
l Different types of instruments have different levels of reliability.
–Standardized multiple-choice tests typically have the highest reliability.
–Open-ended questions typically have lower reliability.
–Portfolio scoring typically has the lowest reliability.
l “The more important and the less reversible the decision about an individual based on the instrument, the higher the reliability should be” (Nitko, 2004).
–Particularly relevant in the case of student assessments.