Chapter 10: Re-expressing Data It’s easier than you think!
Goals of Re-expression Goal 1: Make the distribution of a variable more symmetric. A symmetric distribution is easier to summarize with the mean and standard deviation. If the distribution is also unimodal, we can use the 68-95-99.7 Rule.
Goals of Re-expression Goal 2: Make the spread of several groups more alike, even if their centers differ. Goal 3: Make the form of the scatterplot more nearly linear. Goal 4: Make the scatter in the scatterplot spread out evenly rather than following a fan shape.
The Ladder of Powers

Power   Name        Comment
2       y^2         Unimodal, skewed left
1       Raw data    Data that take on both positive and negative values
1/2     √y          Counted data
0       logarithm   Measurements that cannot be negative
-1/2    -1/√y       The negative sign preserves the direction of the relationship
-1      -1/y        Ratios of two quantities
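To see the ladder in action, here is a minimal Python sketch. The function names `reexpress` and `skewness` and the sample values are illustrative assumptions, not part of the chapter. It applies each rung to a small right-skewed sample and prints a simple moment-based skewness, so you can watch the skew shrink as you move down the ladder.

```python
import math

def reexpress(values, power):
    """Re-express positive values by one rung of the Ladder of Powers.

    Power 0 stands for the logarithm; negative powers are negated so that
    larger raw values still map to larger re-expressed values.
    """
    if power == 0:
        return [math.log10(v) for v in values]
    if power < 0:
        return [-(v ** power) for v in values]
    return [v ** power for v in values]

def skewness(values):
    """Moment-based skewness: 0 for symmetric data, positive for a right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed sample (made up for illustration).
sample = [1, 2, 2, 3, 3, 4, 5, 8, 13, 40]

for power in (2, 1, 0.5, 0, -0.5, -1):
    print(power, round(skewness(reexpress(sample, power)), 2))
```

A skewness near zero suggests a roughly symmetric distribution, but a histogram of the re-expressed values is still the better check.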
Attack of the Logarithms

Model Name    x-axis   y-axis   Comment
Exponential   x        log(y)   Useful with values that grow by a percentage increase.
Logarithmic   log(x)   y        Useful with a wide range of x values or a scatterplot that descends rapidly and then levels off.
Power         log(x)   log(y)   Useful when one of the ladder’s powers is too big and the next is too small.
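The three log models can be compared side by side. The sketch below is an assumption-laden illustration: the data are generated from a hypothetical 8% growth rate, and the correlation r stands in as a rough substitute for looking at each scatterplot. It checks which choice of axes straightens the relationship.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a quantity that grows by 8% per step.
x = list(range(1, 21))
y = [50 * 1.08 ** t for t in x]

log_x = [math.log10(v) for v in x]
log_y = [math.log10(v) for v in y]

candidates = {
    "raw (x, y)": correlation(x, y),
    "exponential (x, log y)": correlation(x, log_y),
    "logarithmic (log x, y)": correlation(log_x, y),
    "power (log x, log y)": correlation(log_x, log_y),
}

for name, r in candidates.items():
    print(f"{name}: r = {r:.4f}")

best = max(candidates, key=lambda k: abs(candidates[k]))
print("straightest:", best)
```

Because the data grow by a fixed percentage, the exponential model (x against log y) should come out straightest. A high r can still hide a bend, so a residual plot remains the real test.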
Let’s Try It! (Pg 192) Shutter speed and f/stop of the lens L1: shutter speed L2: f/stop Curved stat plot Try logarithms Take log of L1→L3 Take log of L2→L4
Let’s Try It! Scatterplot #1: Xlist→L3, Ylist→L2 Scatterplot #2: Xlist→L1, Ylist→L4
Let’s Try It! Use Scatterplot #3: LinReg L3, L4
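The calculator steps can be mirrored in Python. This is only a sketch. The values in `L1` and `L2` are illustrative entries from the standard shutter-speed and f/stop series, assumed here because the page 192 data are not reproduced in these slides, and `lin_reg` is a hypothetical helper standing in for the calculator's LinReg command.

```python
import math

# Illustrative values from the standard shutter-speed and f/stop series.
# These are assumptions, not the data printed on page 192.
L1 = [1/1000, 1/500, 1/250, 1/125, 1/60, 1/30, 1/15, 1/8]  # shutter speed (s)
L2 = [2.8, 4, 5.6, 8, 11, 16, 22, 32]                      # f/stop

# Take log of L1 -> L3 and log of L2 -> L4.
L3 = [math.log10(v) for v in L1]
L4 = [math.log10(v) for v in L2]

def lin_reg(xs, ys):
    """Least-squares line ys = intercept + slope * xs, plus correlation r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope, sxy / math.sqrt(sxx * syy)

# LinReg L3, L4: Xlist is L3, Ylist is L4.
intercept, slope, r = lin_reg(L3, L4)
print(f"log(f/stop) = {intercept:.3f} + {slope:.3f} log(shutter speed), r = {r:.4f}")
```

With both variables logged, this is the power model: the slope of the fitted line is the exponent in f/stop = constant × (shutter speed)^slope.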
Multiple Benefits A single re-expression may advance several of our goals at the same time. Re-expression often simplifies efforts to analyze and understand relationships. Simpler explanations and simpler models tend to give a truer picture of the relationship. (Occam’s Razor)
TI Tips Regressions that automatically and appropriately re-express the data: LnReg, ExpReg, and PwrReg.
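Equivalent Models

Type of Model   Re-expression Equation    Calculator’s Command   Curve Equation
Logarithmic     ŷ = a + b log(x)          LnReg                  ŷ = a + b ln(x)
Exponential     log(ŷ) = a + bx           ExpReg                 ŷ = a · b^x
Power           log(ŷ) = a + b log(x)     PwrReg                 ŷ = a · x^b

The constants a and b in a curve equation are generally not the same numbers as in the matching re-expression equation.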
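The two forms of each model describe the same fit, so one set of coefficients can be converted into the other. The sketch below assumes the re-expressed fits use base-10 logs and that the calculator reports curves in the forms y = a + b ln(x), y = a · b^x, and y = a · x^b. The function names and the sample coefficients are hypothetical.

```python
import math

def exponential_curve(a, b):
    """log10(y) = a + b*x  becomes  y = A * B**x."""
    return 10 ** a, 10 ** b

def power_curve(a, b):
    """log10(y) = a + b*log10(x)  becomes  y = A * x**b."""
    return 10 ** a, b

def logarithmic_curve(a, b):
    """y = a + b*log10(x)  becomes  y = a + B*ln(x)."""
    return a, b / math.log(10)

# Hypothetical coefficients from a re-expressed linear fit.
a, b, x = 1.2, 0.03, 7.0

A, B = exponential_curve(a, b)
print("exponential:", 10 ** (a + b * x), "vs", A * B ** x)

A, B = power_curve(a, b)
print("power:      ", 10 ** (a + b * math.log10(x)), "vs", A * x ** B)

A, B = logarithmic_curve(a, b)
print("logarithmic:", a + b * math.log10(x), "vs", A + B * math.log(x))
```

Each printed pair should agree up to rounding, which is one way to check that a hand re-expression and the calculator's curve command describe the same model.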
What Can Go Wrong?!? Beware of multiple modes. Re-expression cannot pull separate modes together. Watch out for scatterplots that turn around. Watch out for negative data values. Negative values cannot be re-expressed by the square root, the logarithm, or any other power that is not a whole number, and zeros cannot be re-expressed by the logarithm or by negative powers.
What Can Go Wrong?!? Watch for data far from one. Re-expressing data with a range from 1 to 1000 is far more effective than re-expressing data with a range of 100,000 to 100,100. Don’t stray too far from the ladder. Stick to powers between -2 and 2. Stick to the simpler powers contained in the “ladder.”
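A quick numerical check shows why data far from one barely respond to re-expression. This is a sketch under stated assumptions: evenly spaced values over each range, with the correlation between the raw values and their logs used as a rough measure of how little the log changes the shape. The names `correlation`, `near_one`, and `far_from_one` are illustrative.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

near_one = list(range(1, 1001))             # values from 1 to 1000
far_from_one = list(range(100000, 100101))  # values from 100,000 to 100,100

# r close to 1 means log(y) is almost a straight-line function of y,
# so the re-expression leaves the shape of the data nearly unchanged.
r_near = correlation(near_one, [math.log10(v) for v in near_one])
r_far = correlation(far_from_one, [math.log10(v) for v in far_from_one])

print(f"1 to 1000:          r(y, log y) = {r_near:.6f}")
print(f"100,000 to 100,100: r(y, log y) = {r_far:.10f}")
```

For the second range the logs are almost a straight-line function of the raw values, so taking logs only relabels the axis and does not reshape the data.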