SOCW 671: #5 Measurement Levels, Reliability, Validity, & Classical Measurement Theory
Want to measure variables. A variable is a characteristic of persons, places, or things; it is a conceptual entity, any construct or characteristic to which different numerical values can be assigned for purposes of analysis or comparison.
Variables: Common roles are independent, dependent, and control variables. Measurement is the process of assigning numbers (or symbols that stand in for numbers) to variables according to a set of rules.
Measurement: Measurement is a process, not an event, because it deals with variables, and variables change; measuring is how we capture that change.
Measurement Scales: A set of rules proposed by S. S. Stevens in 1946 in the journal Science. He proposed a four-tiered hierarchy of scales, from simplest to most complex: nominal, ordinal, interval, and ratio.
Nominal (or Categorical): The process of grouping individual observations into qualitative categories or classes. Does not involve magnitude. Examples: gender, religion, and ethnicity.
Ordinal: A measuring procedure that assigns one object a greater, equal, or smaller number than a second object only if the first possesses, respectively, more of, the same amount of, or less of the characteristic being measured. Example: Likert scales, which rate items from strongly disagree to strongly agree.
Interval: A special kind of ordinal scaling in which the measurement assigned to an object is linearly related to its true magnitude. Has an arbitrary origin (zero point) and a fixed, though arbitrary, unit of measure; the intervals between units are equal (e.g., calendar time).
Ratio: A special kind of interval scaling in which the measurement assigned to an object is proportional to its true magnitude. Has an absolute zero (e.g., weight).
To measure variables: First you need to figure out how you will measure them. Just because a variable takes numeric values does not necessarily make it interval or ratio (e.g., Likert scales are ordinal despite their numeric labels).
Reliability & Validity: Both rest on classical measurement theory: O = T + E (observed score = true score + error). The benefit of classical measurement theory is that it lets you estimate E, the error component.
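To make the O = T + E decomposition concrete, here is a minimal simulation sketch in Python (the numbers and variable names are illustrative, not from the lecture). Because the error term is random with mean zero and independent of the true scores, the variance of the observed scores is approximately the sum of the true-score and error variances.

```python
import numpy as np

rng = np.random.default_rng(42)

n_people = 1000
true_scores = rng.normal(loc=75, scale=10, size=n_people)  # T: the stable trait
error = rng.normal(loc=0, scale=4, size=n_people)          # E: random error, mean 0
observed = true_scores + error                             # O = T + E

# Observed variance decomposes into true-score variance plus error variance,
# which is the heart of the classical model (equality is approximate in a sample).
print(observed.var(), true_scores.var() + error.var())
```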
Reliability: Instrument reliability is the consistency with which you measure whatever you intend to measure, i.e., the consistency of scores. For example, if you use a scale to weigh yourself several times and obtain similar weights, the scale is reliable. Three paradigms: internal consistency, test/retest, and alternate/parallel forms.
Measures of Internal Consistency (Reliability): Split halves: split the test in half and correlate the two halves. Odd/even: a split-halves method (odd-numbered vs. even-numbered items) that addresses the problem of how to choose the halves. Kuder-Richardson 20 (KR-20): estimates the average correlation across all split-half permutations. KR-21: a simplified KR-20. Cronbach's alpha: can be used with the widest variety of data collection procedures.
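Cronbach's alpha, mentioned above, has a standard formula: alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). A short illustrative Python sketch, with made-up Likert data:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy data: 5 respondents x 4 Likert items (illustrative only)
scores = np.array([[4, 5, 4, 5],
                   [2, 3, 2, 2],
                   [3, 3, 4, 3],
                   [5, 4, 5, 5],
                   [1, 2, 1, 2]])
print(round(cronbach_alpha(scores), 3))
```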
Test/Retest: No intervention; administer one test, then the same test again later (the purpose is to test the instrument, not achievement). Problems include memory and practice effects. A delay of 1 to 3 weeks between administrations is best because there is no fatigue and memory and practice effects are low.
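Test/retest reliability is conventionally estimated as the correlation between the two administrations. A brief Python sketch with invented scores:

```python
import numpy as np

# Scores from the same six students on two administrations of the same test
# (values are illustrative only).
time1 = np.array([85, 72, 90, 65, 78, 88])
time2 = np.array([83, 75, 91, 63, 80, 86])

# Pearson correlation between administrations serves as the reliability estimate.
r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(round(r_test_retest, 3))
```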
Alternate/Parallel Forms: Alternate: the same test items, but in a different sequence. Parallel: write two items from the blueprint and use one item on each form (e.g., "Columbus in 1492 discovered ___" vs. "America in 1492 was discovered by ___"). Parallel forms reduce memory effects; alternate forms reduce practice effects.
Standard Error of Measurement (the standard deviation of the error): The SEM indicates the range within which an individual's "true" score is likely to fall, taking the unreliability of the test into consideration. E.g., if a student received an observed score of 85 on a test and the SEM is 4.0, then the true score would probably fall somewhere between 81 and 89.
SEM: the standard deviation multiplied by the square root of one minus the reliability coefficient, i.e., SEM = SD × √(1 − r). As the confidence range widens, interpretability decreases: the more variability, the less useful the score.
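A small Python sketch of the formula and of the slide's 85 ± 4.0 example (the SD and reliability values below are invented so that the arithmetic comes out to 4.0):

```python
import numpy as np

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * np.sqrt(1 - reliability)

# Reproducing the slide's example: observed score 85 with SEM = 4.0
observed = 85
standard_error = 4.0
low, high = observed - standard_error, observed + standard_error
print(low, high)      # 81.0 89.0 -> the "true" score likely falls in this band

# Computing SEM itself, e.g. for a test with SD = 8 and reliability = .75
print(sem(8, 0.75))   # 4.0
```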
z & T-Scores: z = (raw score − mean) / standard deviation. T = 10z + 50, which rescales scores to a mean of 50 and an SD of 10. Both are used to compare an individual's score to the population that took the test.
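A short Python sketch of both transformations, assuming the conventional T-score scaling (mean 50, SD 10) and invented raw scores:

```python
import numpy as np

scores = np.array([70, 75, 80, 85, 90])   # illustrative raw test scores
mean, sd = scores.mean(), scores.std(ddof=1)

z = (scores - mean) / sd   # z-score: distance from the mean in SD units
t = 10 * z + 50            # T-score: rescaled so mean = 50, SD = 10

print(np.round(z, 2))
print(np.round(t, 1))
```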
Instrument Validity: The degree to which a test measures what it purports to measure. Reliability is a prerequisite to validity: to be valid, a test must first be reliable. Older texts covered validity before reliability because it came first historically, but reliability is primary to validity. Tests themselves are not valid; it is their application that is or is not. Four types of validity: content, concurrent, predictive, and construct.
Content Validity: The degree to which the content on a test matches the content in the blueprint (or course). To judge it, you can use curriculum guides, other teachers, blueprints, the principal, and professional standards. Deals with the question of whether a given data collection technique adequately measures the whole range of topics it is supposed to measure.
Concurrent Validity: A type of measurement validity that deals with the question of whether a given data collection technique correlates highly with another data collection technique that is supposed to measure the same thing. The degree to which scores on a test are related to scores on another, already established test administered at the same time, or to some other valid criterion available at the same time.
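Since concurrent validity is assessed by correlating the new measure with an established one given at the same time, a minimal Python sketch (invented scores) looks much like the test/retest computation above:

```python
import numpy as np

# Scores for the same six people on a new instrument and on an
# already-established instrument administered at the same time
# (data are illustrative only).
new_test = np.array([12, 18, 25, 9, 21, 15])
established = np.array([30, 42, 55, 25, 48, 38])

# The correlation serves as the concurrent validity coefficient.
validity_coefficient = np.corrcoef(new_test, established)[0, 1]
print(round(validity_coefficient, 3))
```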
Predictive Validity (a form of criterion-related validity): The degree to which a test is able to predict how well an individual will do in a future situation. A type of measurement validity that deals with the question of whether a measurement process forecasts a person's performance on a future task.
Construct Validity: A construct is a fiction or invention used to explain reality (e.g., math anxiety). Construct validity is the type of measurement validity that deals with the question of whether a given data collection technique is actually providing an assessment of an abstract, theoretical psychological characteristic.
Construct Validity (continued): The degree to which a test measures an intended hypothetical construct, or non-observable trait, that explains behavior. Factor analysis is a statistical technique commonly used to assess construct validity.
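As a hedged illustration of how factor analysis is applied here, the sketch below simulates test items driven by a single latent construct and recovers their loadings. It assumes scikit-learn is available, and the data are entirely synthetic:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis  # assumes scikit-learn is installed

rng = np.random.default_rng(0)

# Simulate 200 respondents whose answers to four items are driven by one
# latent construct (e.g., "math anxiety") plus noise -- illustrative data only.
latent = rng.normal(size=(200, 1))
loadings = np.array([[0.9, 0.8, 0.7, 0.6]])
items = latent @ loadings + rng.normal(scale=0.5, size=(200, 4))

# If the items all load strongly on one common factor, that is evidence
# they tap the same underlying construct.
fa = FactorAnalysis(n_components=1)
fa.fit(items)
print(np.round(fa.components_, 2))
```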