Chapter 4: Re-Expressing Variables
Statistics Using SPSS: An Integrative Approach, Second Edition
Sharon Lawner Weinberg and Sarah Knapp Abramowitz
Linear and Nonlinear Transformations: An Overview
The effect on the shape of a distribution
The effect on the summary statistics of a distribution
Common linear transformations: standard scores and z-scores
Nonlinear transformations: square roots, logarithms, and rankings
Other transformations: recoding and combining two or more variables
Linear Transformations: Examples
New Test Score = Old Test Score + 10
HEIGHTCM = 2.54*HEIGHTIN
Centigrade = (5/9)*(Fahrenheit - 32)
Linear Transformations: Definition
The general form of a linear transformation is $X_{\text{new}} = K X_{\text{old}} + C$, where $K$ and $C$ are constants and $K \neq 0$. Note that multiplication includes division (multiplying by a fraction) and addition includes subtraction (adding a negative number).
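Each example above is an instance of $X_{\text{new}} = K X_{\text{old}} + C$. The following is a minimal Python sketch that applies this one formula with different constants; the function name `linear_transform` and the sample scores, heights, and temperatures are hypothetical. The Centigrade constants come from expanding (5/9)(F - 32) into (5/9)F - 160/9.

```python
def linear_transform(values, k, c=0.0):
    """Apply X_new = k * X_old + c to every value; k must be nonzero."""
    if k == 0:
        raise ValueError("K must be nonzero for a linear transformation")
    return [k * x + c for x in values]


# New Test Score = Old Test Score + 10  (K = 1, C = 10)
new_scores = linear_transform([55, 70, 82], k=1, c=10)

# HEIGHTCM = 2.54 * HEIGHTIN  (K = 2.54, C = 0)
heights_cm = linear_transform([60, 66, 72], k=2.54)

# Centigrade = (5/9) * (Fahrenheit - 32) = (5/9) * Fahrenheit - 160/9
celsius = linear_transform([32, 98.6, 212], k=5 / 9, c=-160 / 9)

print(new_scores)  # [65, 80, 92]
print([round(h, 2) for h in heights_cm])
print([round(t, 1) for t in celsius])
```

Writing the temperature conversion in the expanded form makes the constants K = 5/9 and C = -160/9 explicit.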
Linear Transformations: Effect on the Shape of a Distribution by Multiplying by a Non-Zero Constant
Linear Transformations: The Effect on Summary Statistics by Multiplying by a Non-Zero Constant
The transformation is $X_{\text{new}} = K X_{\text{old}}$, with $K \neq 0$.
The effect on measures of central tendency:
$\text{Median}_{\text{new}} = K \cdot \text{Median}_{\text{old}}$, $\text{Mean}_{\text{new}} = K \cdot \text{Mean}_{\text{old}}$, $\text{Mode}_{\text{new}} = K \cdot \text{Mode}_{\text{old}}$
The effect on measures of spread:
$\text{IQR}_{\text{new}} = |K| \cdot \text{IQR}_{\text{old}}$, $\text{SD}_{\text{new}} = |K| \cdot \text{SD}_{\text{old}}$, $\text{Range}_{\text{new}} = |K| \cdot \text{Range}_{\text{old}}$, $\text{Variance}_{\text{new}} = K^2 \cdot \text{Variance}_{\text{old}}$
The effect on measures of shape:
If $K$ is positive: $\text{Skewness}_{\text{new}} = \text{Skewness}_{\text{old}}$ and $\text{Skewness ratio}_{\text{new}} = \text{Skewness ratio}_{\text{old}}$
If $K$ is negative: $\text{Skewness}_{\text{new}} = (-1) \cdot \text{Skewness}_{\text{old}}$ and $\text{Skewness ratio}_{\text{new}} = (-1) \cdot \text{Skewness ratio}_{\text{old}}$
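The rules for multiplying by K can be checked numerically. This is a small standard-library Python sketch on hypothetical, positively skewed data, using K = 3 and K = -3. Assumptions: the helper names are hypothetical; skewness is computed as the moment coefficient g1, whereas SPSS reports a small-sample adjusted version that is a positive multiple of g1, so the sign rule is the same. The skewness ratio divides skewness by a standard error that depends only on N, so it follows the same rule and is not computed here.

```python
import statistics


def skewness(xs):
    """Moment-based skewness g1 = m3 / m2**1.5.

    SPSS reports sqrt(n(n-1))/(n-2) * g1, a positive multiple of g1,
    so both change in the same way under X_new = K * X_old.
    """
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def summarize(xs):
    """Central tendency, spread, and shape statistics for a list of numbers."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "mode": statistics.mode(xs),
        "sd": statistics.stdev(xs),
        "variance": statistics.variance(xs),
        "range": max(xs) - min(xs),
        "iqr": q3 - q1,
        "skewness": skewness(xs),
    }


old = [2, 3, 3, 4, 5, 7, 11]  # hypothetical, positively skewed scores
before = summarize(old)
after = {k: summarize([k * x for x in old]) for k in (3, -3)}

for k, stats in after.items():
    for name in ("mean", "sd", "variance", "skewness"):
        print(f"K={k:+d} {name}: old={before[name]:.3f} new={stats[name]:.3f}")
```

With K = -3 the mean, median, and mode change sign along with the scale, the SD, range, and IQR grow by |K| = 3, the variance grows by 9, and the skewness flips sign.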
Linear Transformations: Effect on the Shape of a Distribution by Adding a Constant
Linear Transformations: The Effect on Summary Statistics by Adding a Constant
The transformation is $X_{\text{new}} = X_{\text{old}} + C$.
The effect on measures of central tendency:
$\text{Median}_{\text{new}} = \text{Median}_{\text{old}} + C$, $\text{Mean}_{\text{new}} = \text{Mean}_{\text{old}} + C$, $\text{Mode}_{\text{new}} = \text{Mode}_{\text{old}} + C$
The effect on measures of spread:
$\text{IQR}_{\text{new}} = \text{IQR}_{\text{old}}$, $\text{SD}_{\text{new}} = \text{SD}_{\text{old}}$, $\text{Range}_{\text{new}} = \text{Range}_{\text{old}}$, $\text{Variance}_{\text{new}} = \text{Variance}_{\text{old}}$
The effect on measures of skew:
$\text{Skewness}_{\text{new}} = \text{Skewness}_{\text{old}}$ and $\text{Skewness ratio}_{\text{new}} = \text{Skewness ratio}_{\text{old}}$
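The same kind of check works for adding a constant. This Python sketch uses hypothetical scores and C = 10; skewness is again the moment coefficient g1 (an assumption, since SPSS reports an adjusted version), which is enough to show that skew does not change.

```python
import statistics


def skewness(xs):
    """Moment-based skewness g1 = m3 / m2**1.5, built from deviations about the mean."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def iqr(xs):
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return q3 - q1


old = [2, 3, 3, 4, 5, 7, 11]  # hypothetical scores
C = 10
new = [x + C for x in old]

# Measures of central tendency shift by exactly C.
shift = {
    "mean": statistics.mean(new) - statistics.mean(old),
    "median": statistics.median(new) - statistics.median(old),
    "mode": statistics.mode(new) - statistics.mode(old),
}

# Spread and skew depend only on distances between scores (or from the mean),
# which adding C leaves unchanged.
unchanged = {
    "sd": (statistics.stdev(old), statistics.stdev(new)),
    "variance": (statistics.variance(old), statistics.variance(new)),
    "range": (max(old) - min(old), max(new) - min(new)),
    "iqr": (iqr(old), iqr(new)),
    "skewness": (skewness(old), skewness(new)),
}

print(shift)
for name, (before, after) in unchanged.items():
    print(f"{name}: old={before:.3f} new={after:.3f}")
```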
Linear Transformations: Examples
Translation, reflection, and standard scores
Common Linear Transformations: Translation
Example: Adding a constant number of points to each score in a distribution
Common Linear Transformations: Reflection
Example: Consider the following item from the Survey of Attitudes Toward Statistics.
Common Linear Transformations: Reflection
The item is measured on a seven-point Likert scale with 1 = Strongly Disagree and 7 = Strongly Agree. In this case, a high score represents a negative attitude toward statistics. A reflection is required to create a new variable in which a high score represents a positive attitude toward statistics.
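Common Linear Transformations: Reflection
$\text{Score}_{\text{reflected}} = (-1) \cdot \text{Score}_{\text{original}}$
This turns the scale values 1 through 7 into -1 through -7, so that the order of the scores is reversed.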
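Common Linear Transformations: Combining Reflection with Addition
After reflection, a translation may be used to bring the variable back to a 1-to-7-point Likert scale. The combined transformation is
$\text{Score}_{\text{reflected}} = (-1) \cdot \text{Score}_{\text{original}} + 8$,
so that an original score of 1 becomes 7, an original score of 7 becomes 1, and the midpoint 4 stays at 4.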
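A minimal Python sketch of reflection combined with translation. The function name `reflect_likert` is hypothetical, and the generalization to any low-to-high scale (adding low + high, which equals 8 for a 1-to-7 scale) is our extension of the slide's formula.

```python
def reflect_likert(score, low=1, high=7):
    """Reflect a score on a low..high Likert scale: (-1) * score + (low + high).

    For the 1-7 scale, low + high = 8, matching ScoreReflected = (-1)*ScoreOriginal + 8.
    """
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside the {low}-{high} scale")
    return -score + (low + high)


original = [1, 2, 3, 4, 5, 6, 7]
print([reflect_likert(s) for s in original])  # [7, 6, 5, 4, 3, 2, 1]
```

Applying the transformation twice returns the original score, which is a quick sanity check that the reflected variable carries the same information.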
Common Linear Transformations: Standard Scores
Means and standard deviations of commonly used standard score systems
Common Linear Transformations: z-scores
The z-score measures how many standard deviations a score lies above (when positive) or below (when negative) the mean. Given a raw score, $X$, the formula to find the associated z-score is:
$z = \frac{X - \bar{X}}{S}$
where $\bar{X}$ is the mean and $S$ is the standard deviation of the distribution.
Common Linear Transformations: z-scores
Example: Convert an IQ score of 90 to a z-score.
Solution: From the table of standard scores, we know that the mean and standard deviation of the IQ scale are 100 and 15, respectively. Therefore, $z = \frac{90 - 100}{15} = -0.67$. In other words, an IQ score of 90 is .67 standard deviations below the IQ mean.
Common Linear Transformations: z-scores
Example: Find the z-score associated with a score of 30 on the socioeconomic status (SES) scale from the NELS data set.
Solution: From Descriptives in SPSS, we see that the mean and standard deviation of SES are 18.43 and 6.92, respectively. Therefore, $z = \frac{30 - 18.43}{6.92} \approx 1.67$; an SES score of 30 is about 1.67 standard deviations above the SES mean.
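The two examples above apply the same formula. A minimal Python sketch (the function name `z_score` is hypothetical; the means and standard deviations are the ones given in the examples):

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# IQ scale: mean 100, SD 15
print(round(z_score(90, 100, 15), 2))      # -0.67
# NELS SES: mean 18.43, SD 6.92
print(round(z_score(30, 18.43, 6.92), 2))  # 1.67
```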
Common Linear Transformations: z-scores
Example: Convert all SES scores in the NELS data set to z-scores.
Solution: The easiest way to create standardized variables is to go to Analyze on the main menu bar, then Descriptive Statistics, then Descriptives. Move SES into the Variable(s) box. Click the box next to Save Standardized Values as Variables. Click OK. The new variable, zses, appears as the last column in the NELS data set.
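Outside SPSS, the same standardization of a whole variable can be sketched in a few lines of Python. Assumptions: the function name `standardize` and the SES values are hypothetical, missing values are represented as `None`, and the sample standard deviation (divisor N - 1) is used, which to our understanding is the standard deviation SPSS Descriptives reports.

```python
import statistics


def standardize(values):
    """Return z-scores for a list that may contain None (missing) entries.

    Uses the sample standard deviation (divisor N - 1); missing values stay missing.
    """
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    sd = statistics.stdev(present)
    return [None if v is None else (v - mean) / sd for v in values]


ses = [12, 18, None, 25, 30, 9, 20]  # hypothetical SES values; None = missing
zses = standardize(ses)
print([None if z is None else round(z, 2) for z in zses])
```

Whatever the original scale, the resulting z-scores have mean 0 and standard deviation 1.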
Common Linear Transformations: z-scores
Given a z-score, the formula to find the associated raw score, $X$, is:
$X = \bar{X} + zS$
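Solving the z-score formula for $X$ gives the inverse transformation. A minimal Python sketch on the IQ scale (mean 100, SD 15); the function name `raw_score` is hypothetical.

```python
def raw_score(z, mean, sd):
    """Invert the z-score formula: X = mean + z * sd."""
    return mean + z * sd


print(raw_score(2, 100, 15))   # 130  (an IQ two SDs above the mean)
print(raw_score(-1, 100, 15))  # 85
```

Converting a raw score to z and back recovers the raw score, up to floating-point rounding.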
Common Linear Transformations: Using z-scores to Detect Outliers
A score may be considered an outlier if it falls more than two standard deviations away from its distribution's mean; that is, if its z-score is greater than 2 in magnitude.
Example: Are there outliers in the SES distribution?
Solution in two steps: (1) Create the z-score variable for SES with Descriptives. (2) Create a frequency distribution table for zses and count the cases with z-scores above 2 or below -2.
Common Linear Transformations: Using z-scores to Detect Outliers
According to the z-score criterion, there are 11 outliers in SES.
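The two-step outlier check can be sketched in Python by computing each score's z-score and flagging those with magnitude above 2. The function name `z_outliers` and the data are hypothetical; the sample standard deviation is used, as in the earlier standardization step.

```python
import statistics


def z_outliers(values, cutoff=2.0):
    """Return (value, z) pairs whose z-score exceeds the cutoff in magnitude."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divisor N - 1)
    flagged = []
    for v in values:
        z = (v - mean) / sd
        if abs(z) > cutoff:
            flagged.append((v, round(z, 2)))
    return flagged


scores = [10, 12, 11, 13, 12, 11, 10, 12, 40]  # hypothetical data
print(z_outliers(scores))  # [(40, 2.65)]
```

The extreme score also inflates the mean and standard deviation it is judged against, so in small samples this criterion tends to be conservative.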
Common Linear Transformations: Using z-scores to Compare Scores in Different Distributions
This comparison is appropriate when the distributions are similarly shaped; if they are not, use percentiles instead.
Example: Is a twelfth-grade reading achievement score of 65 higher relative to males or to females?
Solution: The score of 65 is 1.22 standard deviations above the mean for females but only 1.13 standard deviations above the mean for males. Therefore, a female with a score of 65 did better relative to her cohort of females than a male with the same score did relative to his cohort of males.
Common Nonlinear Transformations: Square Roots and Logarithms
These nonlinear transformations are:
Monotonic, because they retain the order of values in a distribution.
Nonlinear, because they change the relative distances between the values in the distribution.
Often used to reduce the severity of a distribution's skew.
Not necessarily defined for all values taken on by a variable: square roots are defined only for values greater than or equal to zero, and logarithms only for values greater than zero.
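Each of these properties can be seen on a handful of hypothetical values with Python's standard `math` module: order is preserved, the gaps between values shrink unevenly (the long upper tail is pulled in most), and values outside the domain raise an error.

```python
import math

values = [1, 4, 9, 100]  # hypothetical, positively skewed
roots = [math.sqrt(v) for v in values]
logs = [math.log10(v) for v in values]


def gaps(xs):
    """Distances between successive values."""
    return [b - a for a, b in zip(xs, xs[1:])]


# Monotonic: the order of the values is preserved.
print(roots == sorted(roots), logs == sorted(logs))  # True True

# Nonlinear: relative distances change, and the large gap at the top shrinks most.
print(gaps(values))  # [3, 5, 91]
print(gaps(roots))   # [1.0, 1.0, 7.0]

# Domain restrictions: sqrt needs x >= 0, log10 needs x > 0.
for func, bad in ((math.sqrt, -1), (math.log10, 0)):
    try:
        func(bad)
    except ValueError as err:
        print(f"{func.__name__}({bad}) is undefined: {err}")
```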
Common Nonlinear Transformations: Square Roots and Logarithms
Example: The distribution of expected income at age 30 (EXPINC30) is positively skewed, and the distribution of self-concept in grade 12 (SLFCNC12) is negatively skewed. Use appropriate transformations to reduce the magnitude of the skew in both distributions, if possible.
Common Nonlinear Transformations: Square Roots and Logarithms
To create the transformed versions of EXPINC30, we use the Compute function under the Transform option. For the square root transformation, the Numeric Expression is SQRT(EXPINC30 + 1), and we may name the Target Variable EXPINCSQ (to stand for the square root transformation of expected income). Click OK. Repeat to create the log transformation with the Numeric Expression LG10(EXPINC30 + 1) and the Target Variable EXPINCLG. Adding 1 before taking the logarithm keeps its argument positive even if EXPINC30 equals 0 for some cases.
Common Nonlinear Transformations: Square Roots and Logarithms
In this case, the log transformation overcompensates for the positive skew and results in a highly negatively skewed distribution. Accordingly, we choose the square root transformation as the better of the two.
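The comparison above amounts to computing the skewness of the original variable and of each transformed version, then keeping whichever is closest to 0. A Python sketch of that decision rule on hypothetical income values. Assumptions: the helper names are hypothetical, skewness is the moment coefficient g1 rather than SPSS's adjusted statistic, and the +1 offset mirrors the SPSS expressions above.

```python
import math


def skewness(xs):
    """Moment-based skewness g1 = m3 / m2**1.5."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def compare_transformations(xs):
    """Skewness of xs, sqrt(xs + 1), and log10(xs + 1); pick the smallest |skew|."""
    candidates = {
        "original": list(xs),
        "sqrt": [math.sqrt(x + 1) for x in xs],    # like SQRT(EXPINC30 + 1)
        "log10": [math.log10(x + 1) for x in xs],  # like LG10(EXPINC30 + 1)
    }
    skews = {name: skewness(vals) for name, vals in candidates.items()}
    best = min(skews, key=lambda name: abs(skews[name]))
    return best, skews


expinc = [0, 20, 25, 30, 30, 35, 40, 45, 50, 60, 80, 150, 400]  # hypothetical
best, skews = compare_transformations(expinc)
for name, value in skews.items():
    print(f"{name:>8}: skewness = {value:.3f}")
print("least skewed:", best)
```

Judging by the magnitude of skew, not its sign, is what exposes overcompensation: a transformation that turns strong positive skew into strong negative skew is no improvement.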
Common Nonlinear Transformations: Negatively Skewed Distributions: Need to Reflect the Variable Before Applying the Transformation
To reflect self-concept, we simply multiply each self-concept value by -1. To do so in SPSS, we use the Compute function under the Transform option. In this case, the Numeric Expression is defined as -1*SLFCNC12. We may name the Target Variable SLFCNCF1 (to stand for self-concept, reflected). Because the reflected values are no longer positive, and logarithms are defined only for positive values (square roots only for nonnegative ones), we next translate the reflected variable by adding a constant large enough to make every value positive. Using SPSS, we can implement this translation with the Compute function under the Transform option, creating a new variable, SLFCNCF2, with the Numeric Expression 44 + SLFCNCF1.
Common Nonlinear Transformations: Reflecting a negatively skewed distribution prior to a square root and/or logarithmic transformation
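The reflect-then-translate step can be sketched in Python as follows. The constant 44 is the one used in the SPSS step above; the function name `reflect_and_translate` and the self-concept scores are hypothetical. Any constant larger than the maximum score makes every translated value positive.

```python
import math


def reflect_and_translate(values, constant=44):
    """Compute constant + (-1) * x, i.e. SLFCNCF2 = 44 + SLFCNCF1 with SLFCNCF1 = -1*SLFCNC12.

    Raises ValueError if any result is not positive, since log10 would then be undefined.
    """
    shifted = [constant + (-1) * v for v in values]
    if min(shifted) <= 0:
        raise ValueError("constant is too small: some reflected values are not positive")
    return shifted


slfcnc12 = [10, 25, 30, 32, 35, 38, 40, 43]  # hypothetical self-concept scores
slfcncf2 = reflect_and_translate(slfcnc12)
sqrt_version = [math.sqrt(v) for v in slfcncf2]
log_version = [math.log10(v) for v in slfcncf2]
print(slfcncf2)  # [34, 19, 14, 12, 9, 6, 4, 1]
```

Because of the reflection, a high self-concept now corresponds to a low transformed value, so results on the transformed scale must be interpreted in reverse.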
Common Nonlinear Transformations: Square Roots and Logarithms
In this case, both transformations, applied after reflection, make the twelfth-grade self-concept distribution more negatively skewed than it was originally (skew = -.384), so neither improves symmetry. Accordingly, the best course of action here is to leave SLFCNC12 in its original form.
Common Nonlinear Transformations: Ranking Variables
Example: Rank the cases in the Hamburg data set on the CALORIES and FAT variables. To rank the cases, click Transform on the main menu bar and then Rank Cases. Move FAT and CALORIES into the Variables box. Under the heading Assign Rank 1 to, click the circle next to Largest Value. Click OK.
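A Python sketch of ranking with rank 1 assigned to the largest value. Tied values receive the mean of the ranks they span, which to our understanding is SPSS's default tie handling in Rank Cases; this mimics the idea and is not SPSS's code. The function name and the FAT values are hypothetical.

```python
def rank_largest_first(values):
    """Rank values so the largest gets rank 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value (in descending order) is tied.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + 1 + end + 1) / 2  # average of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


fat = [20, 35, 35, 10]  # hypothetical FAT values
print(rank_largest_first(fat))  # [3.0, 1.5, 1.5, 4.0]
```

Ranking is monotonic like the square root and logarithm, but it discards the distances between values entirely: 35 and 20 are one rank apart no matter how far apart they are.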
Common Nonlinear Transformations: Recoding Variables
Example: In the NELS data set, the variable SCHTYP8 is coded with 1 = Public, 2 = Private religious, and 3 = Private non-religious. Create a new variable, SCHTYP8DI, coded with 1 = Public and 2 = Private.
Common Nonlinear Transformations: Recoding Variables
To use SPSS to collapse the three categories of SCHTYP8 into two (public and private), click Transform on the main menu bar, then Recode, then Into Different Variables. Click SCHTYP8 and move it into the Input Variable box. Under Output Variable Name, type SCHTYP8DI and click Change. Click Old and New Values. Next to Old Value type 1, and next to New Value type 1; click Add. Next to Old Value type 2, and next to New Value type 2; click Add. Finally, next to Old Value type 3, and next to New Value type 2; click Add. Click Continue and then OK.
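The Old and New Values table is simply a lookup from old codes to new codes. A Python sketch with hypothetical data and names; unlisted values become `None` (missing), which to our understanding matches Recode into Different Variables when no "All other values" rule is given.

```python
SCHTYP8_TO_DICHOTOMY = {1: 1, 2: 2, 3: 2}  # 1 = Public; 2 and 3 (private) -> 2 = Private


def recode(values, mapping):
    """Map each value through `mapping`; values not listed become None (missing)."""
    return [mapping.get(v) for v in values]


schtyp8 = [1, 3, 2, 1, None, 3]  # hypothetical codes; None = missing
print(recode(schtyp8, SCHTYP8_TO_DICHOTOMY))  # [1, 2, 2, 1, None, 2]
```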
Common Nonlinear Transformations: Combining Two or More Variables
Example: Create the variable SUMUNITS by adding UNITMATH and UNITENGL.
Solution: Click Transform on the main menu bar, then Compute. Type SUMUNITS in the Target Variable box and move the variable UNITMATH into the Numeric Expression box, type or click the + sign, and then move the second variable, UNITENGL, into the Numeric Expression box. Click OK. The new variable SUMUNITS will appear as the last variable in the data set.
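A Python sketch of the case-by-case sum, using hypothetical unit counts and a hypothetical function name. The result is missing whenever either value is missing, which to our understanding is how the + operator behaves in SPSS Compute (the SUM() function, by contrast, adds whatever valid values are present).

```python
def add_variables(first, second):
    """Case-by-case sum; the result is missing (None) if either value is missing."""
    if len(first) != len(second):
        raise ValueError("variables must have the same number of cases")
    return [None if a is None or b is None else a + b for a, b in zip(first, second)]


unitmath = [3, 4, 2, None]  # hypothetical
unitengl = [4, 4, 3, 4]     # hypothetical
sumunits = add_variables(unitmath, unitengl)
print(sumunits)  # [7, 8, 5, None]
```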