Examining Data
Constructing a variable
1. Assemble a set of items that might work together to define a construct/variable.
2. Hypothesize the hierarchy of these items along that construct.
3. Choose a response format.
4. Investigate how well the hierarchy holds for members of your response frame.
5. Ensure that your scale is unidimensional.
Unidimensionality
Always remember: unidimensionality is never perfect. It is always approximate. We need to ask: "Is the dimensionality in the data big enough to merit dividing the items into separate tests, or constructing new tests, one for each dimension?" It may be that two or three off-dimension items have been included in your instrument and should be dropped. The question then becomes: "Is the lack of unidimensionality in my data sufficiently large to threaten the validity of my results?"
Do my items fall along a unidimensional scale?
We can investigate through:
1. Person and item fit statistics
2. The principal components analysis of residuals (PCAR)
A Rasch Assumption
The Rasch model is based on the specification of "local independence". This means that after the contribution of the measures to the data has been removed, all that is left is random, normally distributed noise. When a residual is divided by its model standard deviation, it has the characteristics of being sampled from a unit normal distribution.
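The standardized residual described above can be sketched for the simplest case. This is a minimal illustration that assumes dichotomous (0/1) items and already-known person and item measures; the function names are hypothetical, and rating scale data would need the polytomous expected score and variance instead.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a score of 1 under the dichotomous Rasch model (logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def standardized_residual(observed, ability, difficulty):
    """(observed - expected) divided by the model standard deviation."""
    expected = rasch_probability(ability, difficulty)
    model_variance = expected * (1.0 - expected)  # binomial variance of a 0/1 score
    return (observed - expected) / math.sqrt(model_variance)

# A person at 0 logits meets an item at 1 logit (assumed values for illustration).
z_success = standardized_residual(1, ability=0.0, difficulty=1.0)
z_failure = standardized_residual(0, ability=0.0, difficulty=1.0)
print(f"unexpected success: {z_success:+.2f}, expected failure: {z_failure:+.2f}")
```

An unexpected success on a hard item yields a large positive residual, while the expected failure yields a small negative one. If the data fit the model, these residuals should average about 0 with a variance of about 1 across the data set.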
Residual-based Principal Components Analysis
This is not a typical factor analysis. The intention of PCAR is to explain variance. Specifically, it looks for the factor in the residuals that explains the most variance. If that factor is at the "noise" level, then there is no shared second dimension. If the factor is above the "noise" level, then it is the "second" dimension in the data. A third dimension is investigated in the same way, and so on.
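The idea can be sketched with simulated data. The example below is an illustration under stated assumptions, not the procedure of any particular Rasch program: it assumes dichotomous items, uses the generating abilities and difficulties in place of estimated measures, and takes the eigenvalues of the item-by-item correlation matrix of standardized residuals. The function name `residual_eigenvalues` is hypothetical.

```python
import numpy as np

def residual_eigenvalues(responses, abilities, difficulties):
    """Eigenvalues (largest first) of the item correlation matrix of standardized residuals."""
    # Expected score for every person-item pair under the dichotomous Rasch model.
    expected = 1.0 / (1.0 + np.exp(-(abilities[:, None] - difficulties[None, :])))
    z = (responses - expected) / np.sqrt(expected * (1.0 - expected))
    corr = np.corrcoef(z, rowvar=False)   # items are columns
    return np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order

# Simulate data that fit the Rasch model: one dimension only.
rng = np.random.default_rng(0)
n_persons, n_items = 1000, 20
abilities = rng.normal(0.0, 1.0, n_persons)
difficulties = np.linspace(-1.5, 1.5, n_items)
prob = 1.0 / (1.0 + np.exp(-(abilities[:, None] - difficulties[None, :])))
responses = (rng.random((n_persons, n_items)) < prob).astype(float)

eigenvalues = residual_eigenvalues(responses, abilities, difficulties)
print("largest residual eigenvalues:", np.round(eigenvalues[:3], 2))
```

Because the matrix is a correlation matrix, the eigenvalues sum to the number of items, which is why an eigenvalue can be read as "the strength of so many items". For data simulated to fit the model, the largest eigenvalue should stay close to the noise level.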
Example: Table 23
Table of STANDARDIZED RESIDUAL variance (in Eigenvalue units)
Empirical
Total variance in observations = %
Variance explained by measures = %
Unexplained variance (total) = % (100%)
Unexpl var explained by 1st factor = % (18.5)
The Rasch dimension explains 80.5% of the variance in the data. Is this good? The largest secondary dimension, "the first factor in the residuals", explains 3.6% of the variance. What do you think?
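The two percentages quoted above are tied together by simple arithmetic. The sketch below assumes that the "(18.5)" figure in the table is the first factor's share of the unexplained variance; the variable names are illustrative only.

```python
explained_by_measures = 80.5  # percent of total variance, quoted on the slide
first_factor_of_total = 3.6   # percent of total variance, quoted on the slide

# Whatever the measures do not explain is unexplained variance.
unexplained_total = 100.0 - explained_by_measures

# The first factor as a share of the unexplained variance only.
first_factor_of_unexplained = 100.0 * first_factor_of_total / unexplained_total

print(f"unexplained variance: {unexplained_total:.1f}% of total")
print(f"first factor: {first_factor_of_unexplained:.1f}% of unexplained")
```

Under that assumption, 3.6% of the total variance corresponds to 18.5% of the 19.5% that the Rasch measures leave unexplained.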
Table of STANDARDIZED RESIDUAL variance
Empirical: variance components for the observed data.
Model: variance components expected for the data if they exactly fit the Rasch model.
Total variance in observations: total variance in the observations around their Rasch expected values, in standardized residual units.
Variance explained by measures: variance explained by the item difficulties, person abilities and rating scale structures.
Unexplained variance (total): variance not explained by the Rasch measures.
Unexplained variance (explained by 1st, 2nd, ... factor): size of the first, second, ... component in the principal component decomposition of the residuals.
Unexplained variance explained by 1st factor
The eigenvalue of the biggest residual dimension is 4.6, indicating that it has the strength of almost 5 items. In other words, the contrast between the strongly positively loading items and the strongly negatively loading items on the first factor in the residuals has the strength of about 5 items. Since the sign of a loading is arbitrary, it is necessary to look at the items at the top and the bottom of the factor plot. Are those items substantively different, to the point that they merit the construction of two separate tests?
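Listing the items from the top to the bottom of the first residual factor can be sketched as follows. The residuals here are simulated directly rather than taken from a Rasch analysis, the item labels are hypothetical, and the loadings are computed as eigenvector times the square root of the eigenvalue, which is one common convention and not necessarily what a given program reports.

```python
import numpy as np

rng = np.random.default_rng(1)
n_persons = 1000
item_names = [f"item{i + 1}" for i in range(8)]  # hypothetical labels

# Simulated standardized residuals: items 1-4 and items 5-8 share a hidden
# factor with opposite signs, on top of independent noise.
hidden = rng.normal(size=(n_persons, 1))
signs = np.array([1, 1, 1, 1, -1, -1, -1, -1])
z = 0.6 * hidden * signs + 0.8 * rng.normal(size=(n_persons, 8))

corr = np.corrcoef(z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(corr)  # ascending order
first_eigenvalue = eigenvalues[-1]
loadings = eigenvectors[:, -1] * np.sqrt(first_eigenvalue)

# Print items from the top of the factor to the bottom.
print(f"first residual eigenvalue: {first_eigenvalue:.2f}")
for idx in np.argsort(loadings)[::-1]:
    print(f"{item_names[idx]:>6}  loading {loadings[idx]:+.2f}")
```

Which cluster comes out positive depends on the arbitrary sign of the eigenvector, so only the contrast between the two ends is meaningful. The substantive question is whether the items at the two ends differ in content.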
How Big is Big? Rules of Thumb
A "secondary dimension" must have the strength of at least 3 items. If the first factor has an eigenvalue less than 3, then the test is probably unidimensional, although individual items may still misfit. Simulation studies indicate that an eigenvalue less than 1.4 is at the random level; larger values indicate that there is some structure present (R. Smith). There are no established criteria for when a deviation becomes a dimension. PCA is indicative, not definitive.
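These rules of thumb can be written down as a small helper. This is only a sketch of the thresholds quoted above (1.4 and 3), with a hypothetical function name and wording; since no criteria are established, its output is a prompt for inspecting the items, not a verdict.

```python
def interpret_first_contrast(eigenvalue):
    """Apply the slide's rules of thumb to the first residual eigenvalue."""
    if eigenvalue < 1.4:
        # Simulation studies place values below 1.4 at the random level.
        return "random level"
    if eigenvalue < 3.0:
        # Some structure, but weaker than 3 items.
        return "some structure, probably unidimensional"
    # Strength of at least 3 items: look at the item content.
    return "possible secondary dimension, inspect the items"

for value in (1.2, 2.1, 4.6):
    print(value, "->", interpret_first_contrast(value))
```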
Consider Liking for Science Output… Do the items at the top differ substantively from those at the bottom?
If still in doubt… Split your items into two subtests, based on positive and negative loadings on the first residual factor. Measure everyone on the two subtests and cross-plot the measures. What is their correlation? Do you see two versions of the same story about the persons? If only a few people are noticeably off-diagonal, then you have a substantively unidimensional test.
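The cross-check above can be sketched with simulated data. Several assumptions are made for illustration: the data are dichotomous and unidimensional, the split into subtests alternates items instead of using real loadings, and each person's measure is approximated by the log-odds of the raw score (with a 0.5 correction) in place of a proper Rasch estimate. All names are hypothetical.

```python
import numpy as np

def rough_measure(raw_scores, n_items):
    """Log-odds of the raw score, with a 0.5 correction so extreme scores stay finite.

    A rough stand-in for a Rasch person measure, used here for illustration only.
    """
    return np.log((raw_scores + 0.5) / (n_items - raw_scores + 0.5))

# Simulate unidimensional Rasch data.
rng = np.random.default_rng(2)
n_persons, n_items = 1000, 24
abilities = rng.normal(0.0, 1.5, n_persons)
difficulties = np.linspace(-1.5, 1.5, n_items)
prob = 1.0 / (1.0 + np.exp(-(abilities[:, None] - difficulties[None, :])))
responses = (rng.random((n_persons, n_items)) < prob).astype(int)

# Hypothetical split: alternate items stand in for the positive-loading
# and negative-loading items on the first residual factor.
in_subtest_a = np.arange(n_items) % 2 == 0
measures_a = rough_measure(responses[:, in_subtest_a].sum(axis=1), in_subtest_a.sum())
measures_b = rough_measure(responses[:, ~in_subtest_a].sum(axis=1), (~in_subtest_a).sum())

correlation = np.corrcoef(measures_a, measures_b)[0, 1]
print(f"correlation between subtest measures: {correlation:.2f}")
```

A scatter plot of `measures_a` against `measures_b` is the cross-plot. The correlation will fall short of 1 even for unidimensional data because each short subtest measures with error, so the pattern of off-diagonal persons matters more than the coefficient alone.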