Chapter 4 More on Two-Variable Data


1 Chapter 4 More on Two-Variable Data “Each of us is a statistical impossibility around which hover a million other lives that were never destined to be born” Loren Eiseley

2 4.1 Some models for scatterplots with non-linear data (pp. 176-197) Exponential function: a growth or decay function. Form: y = a·b^x. Power function. Form: y = a·x^b.
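One common way to fit these two models is to take logarithms so that the scatterplot becomes linear, then fit an ordinary least-squares line. The sketch below assumes the standard forms y = a·b^x (exponential) and y = a·x^b (power); the function names and the data are made up for illustration and do not come from the slides.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_exponential(xs, ys):
    """Fit y = a * b**x by regressing log(y) on x."""
    intercept, slope = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(intercept), math.exp(slope)


def fit_power(xs, ys):
    """Fit y = a * x**b by regressing log(y) on log(x)."""
    log_xs = [math.log(x) for x in xs]
    log_ys = [math.log(y) for y in ys]
    intercept, slope = fit_line(log_xs, log_ys)
    return math.exp(intercept), slope


# Hypothetical data generated exactly from each model.
xs = [1, 2, 3, 4, 5]
exp_ys = [3 * 2 ** x for x in xs]   # y = 3 * 2^x
pow_ys = [3 * x ** 2 for x in xs]   # y = 3 * x^2

a_exp, b_exp = fit_exponential(xs, exp_ys)
a_pow, b_pow = fit_power(xs, pow_ys)
print(f"exponential fit: a = {a_exp:.3f}, b = {b_exp:.3f}")
print(f"power fit:       a = {a_pow:.3f}, b = {b_pow:.3f}")
```

Both transformations require positive y values, and the power model also requires positive x values, since the logarithm is undefined otherwise.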

3 Logarithms Rules for logarithms

4 In other words… The log of a product is the sum of the logs. The log of a quotient is the difference of the logs. The log of a power is the power times the log.
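Written symbolically, the three rules read as follows. The symbols are assumptions chosen for illustration: b is any valid base (b > 0, b ≠ 1), A and B are positive numbers, and p is any real number.

```latex
\begin{align*}
\log_b(AB) &= \log_b A + \log_b B \\
\log_b\!\left(\frac{A}{B}\right) &= \log_b A - \log_b B \\
\log_b\!\left(A^{p}\right) &= p \,\log_b A
\end{align*}
```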

5 4.2 Interpreting Correlation and Regression (pp. 206-214) Overview: Correlation and regression need to be interpreted with CAUTION. Two variables may be strongly associated, but this DOES NOT MEAN that one causes the other. High Correlation does not imply causation! We need to consider lurking variables and common response.

6 Extrapolation The use of a regression line or curve to make a prediction outside of the domain of the values of your explanatory variable x that you used to obtain your line or curve. These predictions cannot be trusted.
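A small sketch of why extrapolated predictions cannot be trusted: a line is fitted to data that actually follow a curve, then used both inside and far outside the observed range of x. The data (y = x² for x from 0 to 5) and the function names are hypothetical, chosen only to make the effect visible.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: the true relationship is y = x**2, observed only for x in 0..5.
xs = [0, 1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)


def predict(x):
    """Prediction from the fitted regression line."""
    return intercept + slope * x


# Error at a point inside the observed range versus one far outside it.
inside_error = abs(predict(3) - 3 ** 2)
outside_error = abs(predict(10) - 10 ** 2)
print(f"error at x = 3 (inside the data):   {inside_error:.2f}")
print(f"error at x = 10 (extrapolation):    {outside_error:.2f}")
```

The line describes the data tolerably within 0 to 5, but nothing in the data says the pattern continues beyond that range, and here it does not.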

7 Lurking Variable A variable that affects the relationship of the variables in the study but is NOT INCLUDED among the variables studied. Example: a strong positive association might exist between shirt size and intelligence for teenage boys. A lurking variable is AGE. Shirt size and intelligence among teenage boys generally increase with age.

8 If there is a strong association between two variables x and y, any one of the following statements could be true: x causes y: Association DOES NOT imply causation, but causation could exist. Both x and y are responding to changes in some unobserved variable or variables. This is called common response. The effect of x on y is hopelessly mixed up with the effects of other variables on y. This is called confounding. Always a potential problem in observational studies. Can be somewhat controlled in experiments with a control group and a treatment group.

9 4.3 Relations in Categorical Data (pp. 215-226) Overview: We can see relations between two or more categorical variables by setting up tables. So far, we have studied relationships with a quantitative response variable.

10 Notation Prob(X) is the probability that X is true. Prob(X/Y) is the probability that X is true, given that Y is true.

11 Two-way Table Describes the relationship between two categorical variables: a row variable and a column variable. Row totals and column totals give the MARGINAL DISTRIBUTIONS of the two variables separately. They DO NOT give any information about the relationship between the variables. The table can be used in the calculation of probabilities.

12 Example: 200 employees of a company are classified according to the table below, where A, B, and C are mutually exclusive.

         Have A   Have B   Have C   Totals
Female       20       40       60      120
Male         30       10       40       80
Totals       50       50      100      200

13 Example: (cont’d) What is the probability that a randomly chosen person is female? Prob(F) = 120/200 = 60%. What is the probability that a randomly chosen person has property A? Prob(A) = 50/200 = 25%. If a randomly chosen person is female, what is the probability that she has property B? Prob(B/F) = 40/120 = 33.3%. Note: this equals Prob(B and F)/Prob(F).

14 Example: (cont’d) If a randomly chosen person has property C, what is the probability that the individual is male? Prob(M/C) = 40/100 = 40%. Note: this equals Prob(C and M)/Prob(C). If a randomly chosen person has B or C, what is the probability that the person is male? Prob(M/B or C) = 50/150 = 33.3%.
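The calculations in this example can be checked with a short script. This is a minimal sketch: the dictionary layout and the helper names are assumptions made for illustration, while the counts are those of the 200-employee table. A conditional probability divides by the total of the group being conditioned on, so Prob(B/F) divides by the 120 females and Prob(M/C) divides by the 100 people with property C.

```python
# Counts from the 200-employee example: rows are sex, columns are properties.
table = {
    "Female": {"A": 20, "B": 40, "C": 60},
    "Male": {"A": 30, "B": 10, "C": 40},
}

# Marginal totals for rows and columns, plus the grand total.
row_totals = {sex: sum(counts.values()) for sex, counts in table.items()}
col_totals = {prop: sum(table[sex][prop] for sex in table) for prop in ("A", "B", "C")}
grand_total = sum(row_totals.values())


def prob_row(sex):
    """Marginal probability of a row category, e.g. Prob(F)."""
    return row_totals[sex] / grand_total


def prob_col(prop):
    """Marginal probability of a column category, e.g. Prob(A)."""
    return col_totals[prop] / grand_total


def prob_col_given_row(prop, sex):
    """Conditional probability of a property given sex, e.g. Prob(B/F)."""
    return table[sex][prop] / row_totals[sex]


def prob_row_given_cols(sex, props):
    """Conditional probability of sex given any of several properties."""
    matching = sum(table[sex][prop] for prop in props)
    conditioned_on = sum(col_totals[prop] for prop in props)
    return matching / conditioned_on


print(f"Prob(F)        = {prob_row('Female'):.1%}")                        # 60.0%
print(f"Prob(A)        = {prob_col('A'):.1%}")                             # 25.0%
print(f"Prob(B/F)      = {prob_col_given_row('B', 'Female'):.1%}")         # 33.3%
print(f"Prob(M/C)      = {prob_row_given_cols('Male', ['C']):.1%}")        # 40.0%
print(f"Prob(M/B or C) = {prob_row_given_cols('Male', ['B', 'C']):.1%}")   # 33.3%
```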

15 Simpson’s Paradox The reversal of the direction of a comparison or an association when data from several groups are combined to form a single group. Lurking variables are categorical. An extreme form of the fact that observed associations can be misleading when there are lurking variables.

16 Example of Simpson’s Paradox

First half of baseball season
            Hits   Times at bat   Batting avg.
Caldwell      60            200           .300
Wilson        29            100           .290

Second half of baseball season
            Hits   Times at bat   Batting avg.
Caldwell      50            200           .250
Wilson         1              5           .200

Batting averages for the entire season: Caldwell: 110/400 = .275; Wilson: 30/105 = .286. Caldwell had a better average than Wilson in each half; however, Caldwell ends up with a LOWER OVERALL average than Wilson.
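The reversal can be verified directly from the hit and at-bat counts. The sketch below assumes the unlabeled second-half figures belong to Caldwell (50 hits in 200 at bats) and Wilson (1 hit in 5 at bats), which is consistent with the season totals of 110/400 and 30/105; the variable and function names are hypothetical.

```python
# (hits, times at bat) for each player in each half of the season.
halves = {
    "first": {"Caldwell": (60, 200), "Wilson": (29, 100)},
    "second": {"Caldwell": (50, 200), "Wilson": (1, 5)},
}


def average(hits, at_bats):
    """Batting average: hits divided by times at bat."""
    return hits / at_bats


# Combine the two halves by summing hits and at bats per player.
season = {}
for players in halves.values():
    for name, (hits, at_bats) in players.items():
        total_hits, total_at_bats = season.get(name, (0, 0))
        season[name] = (total_hits + hits, total_at_bats + at_bats)

# Caldwell leads within each half...
caldwell_wins_each_half = all(
    average(*players["Caldwell"]) > average(*players["Wilson"])
    for players in halves.values()
)
# ...yet Wilson leads once the halves are combined.
wilson_wins_season = average(*season["Wilson"]) > average(*season["Caldwell"])

for name, (hits, at_bats) in season.items():
    print(f"{name}: {hits}/{at_bats} = {average(hits, at_bats):.3f}")
print("Caldwell better in each half:", caldwell_wins_each_half)
print("Wilson better over the season:", wilson_wins_season)
```

The reversal happens because the halves are weighted very differently: almost all of Wilson's at bats fall in the first half, when both players hit better, while Caldwell's are split evenly.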

