More about Relationships Between Two Variables Chapter 4 More about Relationships Between Two Variables
4.1 Objectives Explain what is meant by transforming (re-expressing) data. Discuss the advantages of transforming nonlinear data. Tell where y = log(x) fits into the hierarchy of power transformations. Explain the ladder of power transformations. Explain how linear growth differs from exponential growth. Identify real-life situations in which transformation can be used to linearize data from an exponential model. Use a logarithmic transformation to linearize a data set that can be modeled from an exponential model. Identify situations in which a transformation is required to linearize a power model. Use a transformation to linearize a data set that can be modeled by a power model.
Transforming Data If data does not appear to be linear, we look for a transformation that will linearize it. In Statistics data is most often transformed using linear transformations, with negative and positive powers, or with logarithms. In this course we will concentrate on power and logarithmic transformations.
Exponential Regression Logarithmic Regression Power Regression Linear Regression y-hat = a + bx Exponential Regression y-hat = abx Logarithmic Regression y-hat = a + b(lnx) Power Regression y-hat = axb
Transforming to Achieve Linearity The Sound of Music The following data set represents the frequencies of the notes on a musical scale over an octave starting with A = 440 hertz (Hz). Note Position Frequency (Hz) A 1 440 E 8 659.26 B-flat 2 466.16 F 9 698.46 B 3 493.88 F-sharp 10 739.99 C 4 523.25 G 11 783.99 C-sharp 5 554.37 A-flat 12 830.61 D 6 587.33 13 880.00 D-sharp 7 622.25 PRB Chapter 4 extra example
Sketch the scatterplot and residual plot. Is a linear model appropriate? Take the ln of the frequencies and plot them against note position. Use the new model to predict the position for the frequency of the C note at position 16. F = e^(6.03 + 0.0578P)
Example 4.1, Page 265
Transforming with Powers Page 268
Parent Graphs- Know Your Functions!
Linear growth Exponential growth Increases by a fixed amount in each equal time period. Adding/subtracting the same amount to get the next term. Exponential growth Increases by a fixed percent of the previous total in each equal time period. Multiplying/dividing the same amount to get the next term.
The Logarithm Transformation Algebraic Properties of Logarithms logbx = y if and only if by = x The rules for logarithms are Product Law logb(MN) = logbM + logbN Quotient law logb (M/N) = logbM – logbN Power rule logbXp = plogbX =
Example Page 277 #4.7
Page 476 #4.5
Power Law Models 1. The power model is y = axb 2. Take the logarithm of both sides of this equations. You see that log y = log a + plog x This is a linear relationship between log x and log y. 3. The power p becomes the slope of the straight line that links log y to log x.
Example Page 282 Example 4.10 What happens when Pluto is removed?
Which Model Should You Choose? 1. Plot the data and look for patterns, and if there is a pattern look for deviations from that pattern. Investigate the deviations and if there are no known extraordinary reasons for these deviations, consider removing these points. 2. Is there a linear pattern? If the data describe a linear trend, then use the methods of Chapter 3 to perform least-squares regression and construct a residual plot to assess the quality of the linear model. Note the r- and r2- values, and use the line to make predictions for the appropriate values of the explanatory variable. Remember the danger of extrapolation!
3. Are the plots increasing, decreasing exponentially over time 3. Are the plots increasing, decreasing exponentially over time? Many things in nature grow exponentially over time: The size of a population of animals grows exponentially in the absence of predators and limited resources. Money invested at a constant rate of return. Average Major League Baseball salaries The cost of prescription drugs Epidemics. If the situation you are pondering fits this category, then consider transforming that data into (x, log y) or (x, ln y) and see if that tends to straighten the data.
4. Is there a dimensional relationship 4. Is there a dimensional relationship? The weight of a fish, for a given species, is related to its length. Length is a one-dimensional quantity, and weight is a three-dimensional quantity. Intuition suggests that weight should be proportional to the cube of the length. So a model of the form y = axb seems reasonable (and we expect the power, b, to be approximately 3). If this power function is a viable model, then plotting the transformed data in the form (log x, log y) should straighten the data. If the two variables you are considering have different dimensions, and the dimensions are different, then consider a power function model of this form.
The most used transformations Linear Positive and Negative Powers 5. The hierarchy of powers. What if all of the above efforts fail to produce a satisfactory model? The “ladder of powers” transformations may be fruitful. The most used transformations Linear Positive and Negative Powers Logarithms Page 268
Relationships between Categorical Variables 4.2 Relationships between Categorical Variables
4.2 Objectives Explain what is meant by a two-way table. Explain what is meant by marginal distributions in a two-way table. Describe how changing counts to percents is helpful in describing relationships between categorical data. Explain what is meant by a conditional distribution. Define Simpson’s paradox, and give an example of it.
The big ideas Categorical variables place individuals into categories. To present the relationship between two categorical variables measured on the same individuals, use a two-way table of counts of individuals falling into all combinations of categories. We may be interested in the marginal distribution of one variable, or the conditional distribution of one variable for a fixed value of the other variable. Both marginal and conditional distributions, usually expressed as percents, are found from the two-way table. If there is an explanatory-response relationship, describe the relationship comparing the conditional distributions of the response for the different values of the explanatory variable.
Example: Where do college-age young adults live? A large sample survey interviewed a random sample of young adults in 2000 and 2001. One question asked was, “Where do you live now? That is, where do you stay most often?”
Young Adults by Age and Living Arrangement 19 20 21 22 Total Parents’ Home 324 378 337 318 1357 Another person’s home 37 47 40 38 162 Your own place 116 279 372 487 1254 Group quarters 58 60 49 25 192 Other 5 2 3 9 540 766 801 877 2984
Marginal Distributions Each marginal distribution from a two-way table is a distribution for a single categorical variable. These have to do with what goes on in the margins (the counts for each row and column)
Conditional Distributions These have to do with calculating the distribution of percents for one variable across some condition on the other variable.