Chapter 4: More on Two-Variable Data. 4.1 Transforming Relationships, 4.2 Cautions, 4.3 Relations in Categorical Data.

Presentation transcript:

1 Chapter 4: More on Two-Variable Data 4.1 Transforming Relationships 4.2 Cautions 4.3 Relations in Categorical Data

2 Example: Cell Phone Users (thousands), by Year: 5,283; 16,009; 24,134; 33,786; 44,043; 55,312; 69,209; 86,047

3 Scatterplot for Cell Phone Example

4 Residuals Plot

5 What’s going on here? Do the data (y) increase by a constant amount each year? –This would suggest a linear model. Or, do the data increase by a fixed percentage each year? That is, can you multiply the y-value by a fixed number to get the next year’s number, and then multiply that number by the fixed number to get the following year’s number? –This would suggest an exponential model.
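
One way to check the two questions above is to compute the year-over-year differences and ratios for the cell phone counts from the example slide. This is a minimal sketch; the variable names are illustrative and not from the slides.

```python
# Cell phone users (thousands) from the example slide, in year order.
users = [5283, 16009, 24134, 33786, 44043, 55312, 69209, 86047]

# Year-over-year differences: roughly constant values would suggest a linear model.
differences = [b - a for a, b in zip(users, users[1:])]

# Year-over-year ratios: roughly constant values would suggest an exponential model.
ratios = [b / a for a, b in zip(users, users[1:])]

for d, r in zip(differences, ratios):
    print(f"difference = {d:6d}   ratio = {r:.3f}")
```

If neither column looks constant, a residual plot for each candidate model is the next thing to check.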

6 Use an Exponential (Non-Linear) Model

7 Plotting our original data vs. our exponential model …

8 Prediction Using the Model Model: Now use the new model to predict cell phone subscribers for 2000.
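
A common way to build an exponential model is to fit a least-squares line to log(y) and then undo the logarithm. The sketch below does this for the cell phone counts. The years are coded 0, 1, ..., 7 here, which is an assumption; the index that corresponds to the year 2000 depends on the actual starting year. The helper names are illustrative only.

```python
import math

users = [5283, 16009, 24134, 33786, 44043, 55312, 69209, 86047]
t = list(range(len(users)))  # ASSUMPTION: consecutive years coded 0..7


def least_squares(x, y):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Fit a line to log10(y) versus the year index.
log_users = [math.log10(u) for u in users]
slope, intercept = least_squares(t, log_users)


def predict(year_index):
    """Back-transform the fitted line: y-hat = 10 ** (intercept + slope * t)."""
    return 10 ** (intercept + slope * year_index)


growth_factor = 10 ** slope  # fitted multiplier from one year to the next
print(f"growth factor per year: {growth_factor:.3f}")
print(f"prediction one year past the data: {predict(8):,.0f} thousand")
```

Predicting beyond the last observed year is extrapolation, so the result should be read with caution.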

9 Practice Problem 4.6, p. 212 –Parts a, b, g Problem 4.11, p. 213 –Create a model, and then let’s see how well we can predict population.

10 Power Law Models General form of a power law model: Biologists have found that many characteristics of living things are described quite closely by power laws. –For example, the rate at which animals use energy goes up as the ¾ power of their body weight (Kleiber’s Law).
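
A power law is commonly written y = a * x^p. Taking logs of both sides gives log y = log a + p * log x, so a power law shows up as a straight line on a log-log plot, and the slope of that line estimates the power p. The sketch below illustrates this with made-up data generated exactly from y = 70 * x^0.75; the constant 70 and the body weights are hypothetical values chosen for illustration, not data from the text.

```python
import math

# HYPOTHETICAL body weights and energy rates generated from y = 70 * x ** 0.75.
weights = [0.02, 0.3, 4.0, 70.0, 600.0, 4000.0]
rates = [70 * w ** 0.75 for w in weights]


def least_squares(x, y):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Fit a line to (log x, log y); the slope is the power, the intercept is log a.
log_x = [math.log10(w) for w in weights]
log_y = [math.log10(r) for r in rates]
power, log_a = least_squares(log_x, log_y)
a = 10 ** log_a

print(f"estimated power p = {power:.3f}, estimated a = {a:.1f}")
```

With real measurements the points will scatter around the line, so the fitted power will only be close to 3/4.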

11 Problem 4.13, p. 219

12 Problem 4.13, p. 219

13 Residuals Analysis (excluding Hippo and Elephant)

14 What is the standardized residual for the hippo?

15 Predicting Lifespan for Humans

16 Another Practice Problem: 4.25, pp –Create an appropriate model. –Predict seed count for a tree with a seed weight of 1,000 mg.

17 HW Problem: –4.14, p. 220

18 Cautions about Correlation and Regression The correlation (r) and the LSR line are not resistant. Extrapolation (predicting outside the range of x-values for which the model was developed) is often misleading.

19 The French Paradox The paradox refers to the fact that the French have long had low rates of heart disease (Japan is the only developed country with a lower rate), despite a diet relatively rich in saturated animal fats. The French propensity to drink wine the way some Americans guzzle soft drinks has been cited as a likely explanation of the paradox, since numerous studies have indicated that alcohol consumed in moderation helps to prevent atherosclerosis, or accumulation of fatty deposits in arteries, which is the underlying cause of most heart attacks. (from a NY Times article)

20 Lurking Variables There can be other variables not measured in a correlation study that may influence the interpretation of relationships among those variables. –Lurking Variables It is possible to show, for example, that there is a high correlation between shoe size and intelligence for a group of children varying in age from, say, 4 to 15. –What is the lurking variable? To control for age, we can calculate the correlation between shoe size and IQ for each of the different ages. –Age 4, 5, 6, …
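
The shoe size example can be illustrated with a small simulation. In the sketch below, both shoe size and a raw test score are generated so that they depend only on age; all numbers are invented for illustration and are not data from the text. The overall correlation comes out high, while the correlation within each single age is close to zero.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL children aged 4 to 15; shoe size and score each depend on age only.
children = []
for age in range(4, 16):
    for _ in range(200):
        shoe = 0.5 * age + random.gauss(0, 1)
        score = 5 * age + random.gauss(0, 5)
        children.append((age, shoe, score))

# Correlation ignoring age (age is the lurking variable).
overall_r = pearson_r([c[1] for c in children], [c[2] for c in children])

# Correlation computed separately for each age, which controls for age.
within_age_r = {}
for age in range(4, 16):
    group = [c for c in children if c[0] == age]
    within_age_r[age] = pearson_r([c[1] for c in group], [c[2] for c in group])

print(f"overall r = {overall_r:.2f}")
for age, r in within_age_r.items():
    print(f"age {age:2d}: r = {r:+.2f}")
```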

21 Correlation Between Shoe Size and IQ? (Table columns: Age, Shoe Size, IQ)

22 Using Averaged Data Be careful when applying the results of a study that uses averages to individuals. Correlations based on averages are usually too high when applied to individuals. Problem 4.31, p. 231
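
A small simulation can show why correlations based on averages overstate the relationship for individuals. The sketch below uses invented data: 100 individuals at each of 10 x-values, with a lot of individual scatter in y. Averaging y within each x-value removes most of that scatter, so the correlation of the averages is much higher.

```python
import math
import random

random.seed(2)  # fixed seed so the run is repeatable


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


individual_x, individual_y = [], []
group_x, group_mean_y = [], []

# HYPOTHETICAL data: y = 2x plus large individual-level noise.
for x in range(1, 11):
    ys = [2 * x + random.gauss(0, 10) for _ in range(100)]
    individual_x.extend([x] * 100)
    individual_y.extend(ys)
    group_x.append(x)
    group_mean_y.append(sum(ys) / len(ys))

r_individuals = pearson_r(individual_x, individual_y)
r_averages = pearson_r(group_x, group_mean_y)

print(f"r for individuals: {r_individuals:.2f}")
print(f"r for group averages: {r_averages:.2f}")
```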

23 Causation Simply put, a strong correlation between two variables says nothing about one variable causing the other. One variable may in fact cause the other to change, but a correlation or LSR line cannot tell us that. –More investigation is needed! A designed study with proper experimental controls should be used.

24 Figure 4.22, p. 232: Causation, Common Response, Confounding

25 Confounding The effects of two variables on a response variable are said to be confounded when they cannot be distinguished from one another. –Definition: Two or more variables that might have caused an effect were simultaneously present, so that we do not know to which to attribute the effect.

26 Confounding Example (adapted from your text, pp ) A study of Mexican American girls recorded Body Mass Index (BMI) for girls and their mothers. The study also measured hours of TV, minutes of physical activity, and diet. The strongest correlation (r = 0.506) was between BMI of daughters and BMI of mothers. –BMI is to some extent hereditary, so it is easy to see a causal effect here.

27 Confounding Example, cont. However, there are other possible explanations: –Mothers who are overweight also set an example of little exercise, poor eating habits, and lots of TV. Their daughters pick up these habits to some extent, so the influence of heredity is mixed up with influences from the girls’ environment. This is confounding.

28 Causation Smoking causes lung cancer. Study hours before exam vs. exam score

29 Common Response Example 4.15, p. 233 –Item 3

30 Homework Reading through pp Problems on p. 237: –4.33, 4.34, 4.35

31 Problem 4.73, p. 257

32 Problem 4.73, p. 257 Power law model might best fit, so take log of L1 and L2. Plot below of L3 and L4.

33 Problem 4.73, cont. The pendulum period is proportional to the square root of its length.
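
The square-root relationship is a power law with power 1/2, which is why the log-log plot is expected to look linear. A short derivation, writing T for the period, L for the length, and k for an unspecified constant of proportionality (these symbols are assumptions, not notation from the slides):

```latex
T = k\sqrt{L} = k\,L^{1/2}
\quad\Longrightarrow\quad
\log T = \log k + \tfrac{1}{2}\,\log L
```

So a least-squares line fitted to log(period) versus log(length) should have a slope near 0.5.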

34 Relations in Categorical Variables There are many relationships of interest to us that cannot be described by using correlation and LSR techniques. –Recall that correlation and LSR require both variables to be quantitative. Often, we want to study the relationship between two variables that are inherently categorical.

35 Two-Way Table (Ex. 4.19, p. 241)
                                  Age Group
Education              25 to 34   35 to 54       55+      Total
Did not complete HS       4,474      9,155    14,224     27,853
Completed HS             11,546     26,481    20,060     58,087
1-3 yrs college          10,700     22,618    11,127     44,445
4+ yrs college           11,066     23,183    10,596     44,845
Total                    37,786     81,435    56,008    175,230

36 Two-Way Table The row variable is level of education. –In this study, is level of education the explanatory or response variable? The column variable is age. –Explanatory or response? Marginal distributions: –The distributions of education alone and age alone are called marginal distributions because their totals are in the margins: Education at the right, and age at the bottom.

37 Marginal Distributions It is often advantageous to display the marginal distribution in percents instead of raw numbers.
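
As a sketch of how the percents can be computed, the code below builds the marginal distribution of education from the cell counts in Example 4.19. Row totals are computed from the cells rather than typed in. The row label "1-3 yrs college" is an assumption about the third education level; the other names are taken from the table.

```python
# Cell counts from the two-way table in Example 4.19.
# Columns are the age groups 25 to 34, 35 to 54, and 55+.
table = {
    "Did not complete HS": [4474, 9155, 14224],
    "Completed HS": [11546, 26481, 20060],
    "1-3 yrs college": [10700, 22618, 11127],  # label assumed
    "4+ yrs college": [11066, 23183, 10596],
}

# Row totals give the marginal distribution of education (in counts).
row_totals = {level: sum(counts) for level, counts in table.items()}
grand_total = sum(row_totals.values())

# Convert counts to percents of the grand total.
education_percent = {
    level: 100 * total / grand_total for level, total in row_totals.items()
}

for level, pct in education_percent.items():
    print(f"{level:22s} {pct:5.1f}%")
```

The same steps applied to column sums give the marginal distribution of age.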

38 Conditional Distributions The previous graph looked at the breakdown of education levels for the entire population. Many times, however, we are looking for breakdowns (i.e., distributions) for a certain group within the population. –For example, of those people with 4+ years of college, look at the distribution across age groups. –Let’s complete a bar graph for this comparison. –This is a conditional distribution.

39 One Conditional Distribution for Example 4.19

40 Different Question What proportion of each age group received 4+ years of college education?
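
The two questions condition on different variables, so they use different denominators. The sketch below computes both from the cell counts in Example 4.19: first the age distribution among people with 4+ years of college (divide by the row total), then the proportion of each age group with 4+ years of college (divide by each column total). The row label "1-3 yrs college" is an assumption about the third education level. Column totals are computed from the cells and may differ by a unit or two from the printed totals.

```python
age_groups = ["25 to 34", "35 to 54", "55+"]

# Cell counts from the two-way table in Example 4.19.
table = {
    "Did not complete HS": [4474, 9155, 14224],
    "Completed HS": [11546, 26481, 20060],
    "1-3 yrs college": [10700, 22618, 11127],  # label assumed
    "4+ yrs college": [11066, 23183, 10596],
}

# Question 1: of those with 4+ years of college, how are they spread across ages?
college_row = table["4+ yrs college"]
college_total = sum(college_row)
age_given_college = {
    age: count / college_total for age, count in zip(age_groups, college_row)
}

# Question 2: within each age group, what proportion has 4+ years of college?
column_totals = [sum(row[i] for row in table.values()) for i in range(len(age_groups))]
college_share_by_age = {
    age: count / total
    for age, count, total in zip(age_groups, college_row, column_totals)
}

for age in age_groups:
    print(
        f"{age:9s} share of the 4+ college group: {age_given_college[age]:.3f}   "
        f"share of this age group with 4+ college: {college_share_by_age[age]:.3f}"
    )
```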

41 Read paragraph at the bottom of page 248.

42 One set of conditional distributions: Figure 4.27, p. 248

43 Problems 4.53, p , p. 251

44 Graph for Problem 4.59

45 Homework Read through the end of the chapter. Be sure you understand “Simpson’s Paradox.” Problem: –4.62, p. 253

46 Simpson’s Paradox Problem 4.60, p. 251 Statement of the Paradox: –Simpson’s paradox refers to the reversal of the direction of a comparison or association when data from several groups are combined to form a single group.
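
The reversal can be seen with a small made-up example. In the sketch below, treatment A has the higher success rate in each of two groups, yet treatment B has the higher rate once the groups are combined, because A was used mostly on the hard cases. All counts are hypothetical and are not from Problem 4.60.

```python
# HYPOTHETICAL counts: (successes, total) for each treatment within each group.
counts = {
    "A": {"easy": (90, 100), "hard": (300, 1000)},
    "B": {"easy": (800, 1000), "hard": (20, 100)},
}

# Success rate within each group, by treatment.
within = {
    treatment: {group: s / n for group, (s, n) in groups.items()}
    for treatment, groups in counts.items()
}

# Success rate after combining the two groups into one.
combined = {}
for treatment, groups in counts.items():
    successes = sum(s for s, _ in groups.values())
    total = sum(n for _, n in groups.values())
    combined[treatment] = successes / total

for treatment in counts:
    print(
        f"treatment {treatment}: easy = {within[treatment]['easy']:.2f}, "
        f"hard = {within[treatment]['hard']:.2f}, "
        f"combined = {combined[treatment]:.3f}"
    )
```

Here the group (easy or hard) acts as a lurking variable: it is related both to which treatment was used and to the outcome.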

47 Practice/Review Problems Problem: –4.68, p. 254 –4.72 (parts a-c), p. 257
