More on Two-Variable Data. Chapter Objectives Identify settings in which a transformation might be necessary in order to achieve linearity. Use transformations.

Presentation transcript:

More on Two-Variable Data

Chapter Objectives Identify settings in which a transformation might be necessary in order to achieve linearity. Use transformations involving powers and logarithms to linearize curved relationships. Explain what is meant by a two-way table, and describe its parts. Give an example of Simpson’s Paradox. Explain what gives the best evidence for causation. Explain the criteria for establishing causation when experimentation is not feasible.

The Goal Our goal is to fit a model to curved data so that we can make predictions as we did in chapter 3. HOWEVER, the only statistical tool we have to fit a model is the least-squares regression model. THEREFORE, in order to find a model for curved data, we must first “straighten it out”….

Transforming Relationships Data that display a curved pattern can be modeled by a number of different functions. The two most common are: –Exponential ($y = AB^x$) –Power ($y = Ax^B$) Chapter 4 focuses on these two models.

pp. 195 – 6 Example 4.1 Brain weight v. body weight Note about variables: –Sometimes we wish to transform x, or y, or both x and y. –Therefore we refer to variables generically as t.

Why Linear transformations cannot straighten a curved relationship between two variables. Because of this, we must resort to functions that are not linear.

A Note about Monotonic Functions

4.1 A. $y = 2.54x$: monotonic increasing B. $y = 60/x$: monotonic decreasing C. circumference = π(diameter): monotonic increasing D. SquaredError = $(\text{time} - 5)^2$: not monotonic

Figure 4.5 What can we learn? –The graph of a linear function (power p = 1) is a straight line. –Powers greater than 1 (like p = 2 and p = 4) give graphs that bend upward. The sharpness of the bend increases as p increases. –Powers less than 1 but greater than 0 (like p = 0.5) give graphs that bend downward. –Powers less than 0 (like p = -0.5 and p = -1) give graphs that decrease as x increases. Greater negative values of p result in graphs that decrease more quickly. –Look at the p = 0 graph. You may be surprised that this is not the graph of $y = x^0$. Why not? The 0th power $x^0$ is just the constant 1, which is not very useful. The p = 0 entry in the figure is not constant; it is the logarithm, log x. That is, the logarithm fits into the hierarchy of power transformations at p = 0.

pp Example 4.2 runs through several steps from the ladder of power transformations. This emphasizes that the process can be one of –(a) making a good guess, based on observations of a graph of the data, about the type of transformation needed and –(b) trying several types of the transformation chosen. This can get tedious, so the next section introduces a more analytic approach. The first approach is to look for an exponential growth pattern, which has the advantage that it can be linearized by taking logarithms (of the response variable) to transform the data.

4.3 Weight = $c_1(\text{height})^3$ and strength = $c_2(\text{height})^2$; therefore, strength = $c\,(\text{weight})^{2/3}$, where c is a constant.

4.4 A graph of the power law $y = x^{2/3}$ shows that strength does not increase linearly with body weight, as would be the case if a person 1 million times as heavy as an ant could lift 1 million times more than the ant. Rather, strength increases more slowly. For example, if weight is multiplied by 1000, strength will increase by a factor of $(1000)^{2/3} = 100$.
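The scaling argument in 4.3 and 4.4 can be checked numerically. This is a minimal sketch: the function name `strength_factor` is made up for illustration, and the only assumption is the 2/3 power law derived above.

```python
def strength_factor(weight_factor, exponent=2 / 3):
    """Factor by which strength grows when weight is multiplied by weight_factor,
    assuming strength = c * weight**(2/3)."""
    return weight_factor ** exponent


# Weight multiplied by 1000: strength grows by about 100, not 1000.
factor_thousand = strength_factor(1_000)

# A person 1 million times as heavy as an ant: about 10,000 times as strong.
factor_million = strength_factor(1_000_000)

print(round(factor_thousand, 6), round(factor_million, 6))
```

The results are rounded before printing because floating-point powers are only approximately exact.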

4.5 Let y = average heart rate and x = body weight. Kleiber's law says that total energy consumed is proportional to the three-fourths power of body weight, that is, Energy = $c_1 x^{3/4}$. But total energy consumed is also proportional to the product of the volume of blood pumped by the heart and the heart rate, that is, Energy = $c_2(\text{volume})\,y$. The volume of blood pumped by the heart is proportional to body weight, that is, Volume = $c_3 x$. Putting these three equations together yields $c_1 x^{3/4} = c_2(\text{volume})\,y = c_2(c_3 x)\,y$. Solving for y, we obtain $y = \frac{c_1}{c_2 c_3}\,x^{-1/4}$, so average heart rate is proportional to the $-1/4$ power of body weight.

Exponential Growth Linear growth: adding a fixed increment in each equal time period. Exponential growth: multiplying by a fixed number in each equal time period. –Can also be looked at as growing by a fixed percentage.
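The difference between the two kinds of growth shows up in a short sketch. The starting value of 100, the increment of 10, and the factor of 1.10 are made-up illustration values, not data from the text.

```python
def linear_growth(start, increment, periods):
    """Add a fixed increment in each equal time period."""
    return [start + increment * t for t in range(periods + 1)]


def exponential_growth(start, factor, periods):
    """Multiply by a fixed number in each equal time period."""
    return [start * factor ** t for t in range(periods + 1)]


linear = linear_growth(100, 10, 5)
exponential = exponential_growth(100, 1.10, 5)

# Linear growth: successive differences are constant.
differences = [b - a for a, b in zip(linear, linear[1:])]

# Exponential growth: successive ratios are constant (here, 10% growth per period).
ratios = [b / a for a, b in zip(exponential, exponential[1:])]

print(differences)
print([round(r, 4) for r in ratios])
```

Checking successive ratios this way is also a quick test of whether a data set looks exponential before any transformation is tried.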

p. 205 Example 4.4 Is this exponential growth? What is the projected amount for 2005? Actual was 203,000,000 (2005) Other interesting statistics: –2,000,000,000 cell phones world wide 4.5% world without –Average American spends 13 talking hours per month –Average American in 18 – 24 age group spends 22 talking hours per month

Texting in the United States

Logarithm $\log_b x = y$ if and only if $b^y = x$. The rules for logarithms are: –$\log(AB) = \log A + \log B$ –$\log(A/B) = \log A - \log B$ –$\log X^p = p \log X$

p. 209 Example 4.6

4.6 A.

4.6 B /63024 = / = / = 3.12 C. log y yields , , ,

4.6 C.

4.6 D. use calculator to confirm E. The residual plot of the transformed data shows no clear pattern, so the line is a reasonable model for these points.

4.6 F.

4.6 G. The predicted number of acres defoliated in 1982 is the exponential function evaluated at 1982, which gives 10,719, acres.

4.9

4.10 A. Year# children killed

4.10 B.

4.10 C. If x = number of years after 1950, then y = the number of children killed x years after 1950 = $2^x$. At x = 45, $y = 2^{45} \approx 3.52 \times 10^{13}$, or about 35,200,000,000,000.
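The arithmetic in part C can be reproduced directly. A small sketch, assuming the doubling model y = 2^x stated above; the function name `children_killed` is hypothetical.

```python
def children_killed(years_after_1950):
    """Doubling model from part C: y = 2**x, where x is years after 1950."""
    return 2 ** years_after_1950


y_1995 = children_killed(45)

print(f"{y_1995:,}")    # 35,184,372,088,832
print(f"{y_1995:.2e}")  # 3.52e+13
```

The absurd size of the result is the point of the exercise: a quantity that doubles every year cannot do so for 45 years.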

4.10 D.

4.10 E. b = a =

p. 215 Exponential growth models become linear when we apply the logarithm transformation to the response variable y. Power law models become linear when we apply the logarithm transformation to both variables.
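Both statements can be illustrated with a hand-rolled least-squares fit. This is a sketch under stated assumptions, not the textbook's procedure: the helper names (`least_squares`, `fit_exponential`, `fit_power`) and the noise-free data are invented for illustration, and base-10 logarithms are an arbitrary choice.

```python
import math


def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_exponential(xs, ys):
    """Fit y = A * B**x by regressing log10(y) on x."""
    slope, intercept = least_squares(xs, [math.log10(y) for y in ys])
    return 10 ** intercept, 10 ** slope  # (A, B)


def fit_power(xs, ys):
    """Fit y = A * x**B by regressing log10(y) on log10(x)."""
    slope, intercept = least_squares(
        [math.log10(x) for x in xs], [math.log10(y) for y in ys]
    )
    return 10 ** intercept, slope  # (A, B)


# Made-up, noise-free data generated from known models.
xs = [1, 2, 3, 4, 5]
exp_ys = [3 * 2 ** x for x in xs]    # y = 3 * 2^x
pow_ys = [5 * x ** 1.5 for x in xs]  # y = 5 * x^1.5

A_exp, B_exp = fit_exponential(xs, exp_ys)
A_pow, B_pow = fit_power(xs, pow_ys)

print(round(A_exp, 6), round(B_exp, 6))
print(round(A_pow, 6), round(B_pow, 6))
```

For the exponential model the slope of the transformed line is log B and the intercept is log A; for the power model the slope is the power B itself. With real, noisy data the recovered values would only approximate the true ones, and a residual plot of the transformed data should still be checked.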

4.17 A. YearValue

4.17 B.

4.17 C. 2.73, 2.76, 2.79, 2.82, 2.86, 2.89, 2.92, 2.95, 2.98, 3.01

4.18 Alice has Fred has

Cautions About Correlation and Regression

Our Tools for Describing Data Sets Correlation –r: Strength, form, direction Regression –Generalized pattern –Useful for predictions Limitations of our tools –Correlation and regression describe only linear relationships –The correlation “r” and the “LSRL” are NOT RESISTANT

Other Cautions Extrapolation –The use of a regression line for prediction far outside the domain used. –Examples: Age v. Height Time v. Death Rate ( Swine Flu) Time v. Water Level of a Lake Time v. Children gunned down

Other Cautions Lurking Variables –A variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among these variables. –Can falsely suggest relationship between x and y –Can hide actual relationship between x and y

Other Cautions Lurking Variables –An example….

There's this guy who's going to clean the windows of a mental asylum. A patient follows him and shouts, "I gotta secret, I gotta secret...", but he ignores the patient. Again the patient follows him, but he ignores his cries. By the time he's nearly finished the building, he's really curious about what the patient's secret is, so he decides to ask. The patient pulls a matchbox out of his pocket, opens it, and puts it on a table. Out crawls a little spider. The patient says "spider go left", and the spider walks to its left a bit. Then he says "spider go right", and the spider walks to its right a little bit. He says "spider turn around, walk forward, then go right", and sure enough the spider turns around, walks forward, and then goes right a bit. The window cleaner is amazed. "Wow!" he says, "that's amazing!" "No, that's not my secret," says the patient, "watch." He picks up the spider in his hand, pulls all its legs off, then puts it back on the table. "Spider go right": the spider doesn't move. "Spider go left": the spider doesn't move. "Spider turn around": again the spider doesn't move. "There!" he says, "that's my secret: if you pull all a spider's legs off, they go deaf."

The answer is not available in the original data, but was discovered through some additional research on the Buick Estate Wagon. These data were collected by Consumer's Union on a test track (rather than using the EPA test values for fuel efficiency) following the manufacturer's recommendations for each car's maintenance. Additional research revealed that starting with this model year, Buick recommended a higher tire inflation pressure for the Buick Estate Wagon. The recommended inflation pressure level was higher than the level for other cars in the survey. Harder tires present less rolling resistance and improve gas mileage; therefore, the Buick Estate Wagon outperformed our expectations based on our regression model, which did not account for tire inflation pressure. In our model, tire pressure is a lurking variable: a variable that seems to help in predicting gas mileage but is not included in the model.

Other Cautions Using averaged data –Pay particular attention to data that has been averaged –The correlation and LSRL of these data sets should not be applied to the individuals that the averages came from Example –Examining monthly data and attempting to apply it to a day of that month.

Beware the post-hoc fallacy “Post hoc, ergo propter hoc.” To avoid falling for the post-hoc fallacy, that is, assuming that an observed correlation is due to causation, you must put any statement of relationship through sharp inspection. Causation cannot be established “after the fact.” It can only be established through well-designed experiments. {see Ch 5}

Explaining Association Strong associations can generally be explained by one of three relationships. –Confounding: x may cause y, but y may instead be caused by a confounding variable z –Common Response: x and y are reacting to a lurking variable z –Causation: x causes y

Causation Causation is not easily established. The best evidence for causation comes from experiments that change x while holding all other factors fixed. Even when direct causation is present, it is rarely a complete explanation of an association between two variables. Even well-established causal relations may not generalize to other settings.

Common Response “Beware the Lurking Variable” The observed association between two variables may be due to a third variable. Both x and y may be changing in response to changes in z.

Confounding Two variables are confounded when their effects on a response variable cannot be distinguished from each other. Confounding prevents us from drawing conclusions about causation. We can help reduce the chances of confounding by designing a well-controlled experiment.

Example People with two cars tend to live longer than people who own only one car. Owning three cars is even better, and so on. What might explain the association?

p : People who use artificial sweeteners in place of sugar tend to be heavier than people who use sugar. Does artificial sweetener use cause weight gain? –There may be a causative effect, but in the direction opposite to the one suggested: People who are overweight are more likely to be on diets, and so choose artificial sweeteners over sugar. Also, heavier people are at a higher risk to develop diabetes; if they do, they are likely to switch to artificial sweeteners.

p : Women who work in the production of computer chips have abnormally high numbers of miscarriages. The union claimed chemicals cause the miscarriages. Another explanation may be the fact these workers spend a lot of time on their feet. –Time standing up is a confounding variable in this case.

p : Children who watch many hours of TV get lower grades on average than those who watch less TV. Why does this fact not show that watching TV causes low grades?

p : High school students who take the SAT, enroll in an SAT coaching course, and take the SAT again raise their mathematics score from an average of 521 to 561. Can this increase be attributed entirely to taking the course? No: the effect of coaching is confounded with that of experience. A student who has taken the SAT once may improve his or her score on the second attempt because of increased familiarity with the test.