September 15. In Chapter 14: 14.1 Data, 14.2 Scatterplots, 14.3 Correlation, 14.4 Regression.



Data
Quantitative response variable Y (“dependent variable”)
Quantitative explanatory variable X (“independent variable”)
Historically important public health data set used to illustrate techniques (Doll, 1955):
– n = 11 countries
– Explanatory variable = per capita cigarette consumption in 1930 (CIG1930)
– Response variable = lung cancer mortality per 100,000 (LUNGCA)

4 Data, cont.

5 §14.2 Scatterplot
Bivariate points (xᵢ, yᵢ) are plotted as a scatterplot.

6 Inspect the scatterplot for:
Form: Can the relation be described with a straight line or some other type of line?
Direction: Do points trend upward or downward?
Strength of association: Do points adhere closely to an imaginary trend line?
Outliers (if any): Are there any striking deviations from the overall pattern?

7 Judging Correlational Strength
Correlational strength refers to the degree to which points adhere to a trend line. The eye is not a good judge of strength: the top plot appears to show a weaker correlation than the bottom plot, yet both plot the same data. (The perceived difference is an artifact of axis scaling.)

8 §14.3 Correlation
The correlation coefficient r quantifies a linear relationship with a number between −1 and 1.
When all points fall on a line with an upward slope, r = 1; when all points fall on a line with a downward slope, r = −1.
When data points trend upward, r is positive; when data points trend downward, r is negative.
The closer r is to 1 or −1, the stronger the correlation.

9 Examples of correlations

10 Calculating r
Formula: r = Σ(zx · zy) / (n − 1)
The correlation coefficient tracks the degree to which X and Y “go together.” Recall that z scores quantify the amount a value lies above or below its mean, in standard deviation units. When the z scores for X and Y track in the same direction, their products are positive and r is positive (and vice versa).
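A minimal sketch of this z-score formula in code (the small dataset below is made up for illustration; the Doll data are not reproduced here):

```python
# Pearson r from z-score products: r = sum(z_x * z_y) / (n - 1)
from statistics import mean, stdev

def pearson_r(x, y):
    n = len(x)
    mx, my = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)          # sample SDs (denominator n - 1)
    zx = [(xi - mx) / sx for xi in x]
    zy = [(yi - my) / sy for yi in y]
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))
```

Note that `stdev` computes the sample standard deviation with an n − 1 denominator, matching the n − 1 in the formula.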

11 Calculating r, Example

12 Calculating r In practice, we rely on computers and calculators to calculate r. I encourage my students to use these tools whenever possible.

13 Calculating r SPSS output for Analyze > Correlate > Bivariate using the illustrative data:

14 Interpretation of r
1. Direction. The sign of r indicates the direction of the association: positive (r > 0), negative (r < 0), or no association (r ≈ 0).
2. Strength. The closer r is to 1 or −1, the stronger the association.
3. Coefficient of determination. The square of the correlation coefficient (r²) is called the coefficient of determination. This statistic quantifies the proportion of the variance in Y [mathematically] “explained” by X. For the illustrative data, r² = 0.54; therefore, 54% of the variance in Y is explained by X.

15 Notes, cont. 4. Reversible relationship. With correlation, it does not matter whether variable X or Y is specified as the explanatory variable; calculations come out the same either way. [This will not be true for regression.] 5. Outliers. Outliers can have a profound effect on r. This figure has an r of 0.82 that is fully accounted for by the single outlier.

16 Notes, cont.
6. Linear relations only. Correlation applies only to linear relationships. This figure shows a strong non-linear relationship, yet r ≈ 0.
7. Correlation does not necessarily mean causation. Beware lurking variables (next slide).
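Both cautions (the outlier note above and the linearity note here) are easy to demonstrate numerically; a sketch with made-up data:

```python
# Hypothetical data illustrating notes 5 and 6.
from statistics import mean

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Note 5: a single outlier can manufacture a strong correlation.
x = [1, 2, 3, 4, 5, 20]
y = [3, 1, 2, 3, 1, 15]
print(round(pearson_r(x, y), 2))            # strong positive r, driven by (20, 15)
print(round(pearson_r(x[:-1], y[:-1]), 2))  # outlier removed: the correlation collapses

# Note 6: a perfect but non-linear (quadratic) relation gives r = 0.
xq = [-3, -2, -1, 0, 1, 2, 3]
yq = [v ** 2 for v in xq]                   # y is completely determined by x
print(pearson_r(xq, yq))
```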

17 Confounded Correlation
A near-perfect negative correlation (r = −.987) was seen between cholera mortality and elevation above sea level during a 19th-century epidemic. We now know that cholera is transmitted by water. The observed relationship between cholera and elevation was confounded by the lurking variable “proximity to polluted water.”

18 Hypothesis Test
We conduct a hypothesis test to guard against identifying random correlations as real: random selection from a random scatter can result in an apparent correlation.

19 Hypothesis Test
A. Hypotheses. Let ρ represent the population correlation coefficient.
H0: ρ = 0 vs. Ha: ρ ≠ 0 (two-sided)
[or Ha: ρ > 0 (right-sided) or Ha: ρ < 0 (left-sided)]
B. Test statistic: t = r·√(n − 2) / √(1 − r²), with df = n − 2
C. P-value. Convert the t statistic to a P-value with software or Table C.
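The test statistic is easy to compute once r is known; a sketch (r = 0.737 below is an assumed value for illustration, while n = 11 matches the illustrative dataset):

```python
# t statistic for H0: rho = 0:
#   t = r * sqrt(n - 2) / sqrt(1 - r^2),  df = n - 2
# r = 0.737 is assumed for illustration; n = 11 as in the Doll data.
import math

def t_stat(r, n):
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(round(t_stat(0.737, 11), 2), "with df =", 11 - 2)
```

For these inputs t comes out to about 3.27 with df = 9, a magnitude consistent with the P = .0097 reported for the illustrative example.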

20 Hypothesis Test – Illustrative Example
A. H0: ρ = 0 vs. Ha: ρ ≠ 0 (two-sided)
B. Test statistic (computed from r, with df = n − 2 = 9)
C. .005 < P < .01 by Table C; P = .0097 by computer. The evidence against H0 is highly significant.

21 Confidence Interval for ρ

22 Confidence Interval for ρ

23 Conditions for Inference Independent observations Bivariate Normality (r can still be used descriptively when data are not bivariate Normal)

24 §14.4 Regression
Regression describes the relationship in the data with a line that predicts the average change in Y per unit change in X. The best-fitting line is found by minimizing the sum of squared residuals, as shown in this figure.

25 Regression Line, cont.
The regression line equation is ŷ = a + bx, where ŷ ≡ the predicted value of Y, a ≡ the intercept of the line, and b ≡ the slope of the line.
Equations to calculate a and b:
SLOPE: b = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
INTERCEPT: a = ȳ − b·x̄
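These formulas translate directly into code; a small sketch using made-up points that lie exactly on y = 2x + 1, so the fit recovers a = 1 and b = 2:

```python
# Least-squares estimates:
#   b = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2),  a = ybar - b * xbar
from statistics import mean

def least_squares(x, y):
    xbar, ybar = mean(x), mean(y)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
        / sum((xi - xbar) ** 2 for xi in x)
    a = ybar - b * xbar
    return a, b

x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]   # exactly y = 2x + 1
a, b = least_squares(x, y)
print(a, b)  # 1.0 2.0
```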

26 Regression Line, cont. Slope b is the key statistic produced by the regression

27 Regression Line, illustrative example Here’s the output from SPSS:

28 Inference
Let α represent the population intercept, β the population slope, and εᵢ the residual “error” for point i. The population regression model is
yᵢ = α + β·xᵢ + εᵢ
The estimated standard error of the regression is
sY|x = √( Σ(residualᵢ²) / (n − 2) )
A (1 − α)·100% CI for the population slope β is
b ± t(n−2, 1−α/2) · SEb, where SEb = sY|x / √( Σ(xᵢ − x̄)² )
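A sketch combining these formulas (toy data; the critical value t* is supplied by hand from Table C, e.g. 3.182 for df = 3 at 95% confidence):

```python
# CI for the slope: b ± t* · SE_b, where
#   SE_b = s_{Y|x} / sqrt(sum((x - xbar)^2))
#   s_{Y|x} = sqrt(sum(residual^2) / (n - 2))
from statistics import mean
import math

def slope_ci(x, y, t_crit):
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))   # standard error of the regression
    se_b = s_yx / math.sqrt(sxx)      # standard error of the slope
    return b - t_crit * se_b, b + t_crit * se_b

lo, hi = slope_ci([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 3.182)
print(round(lo, 3), round(hi, 3))
```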

29 Confidence Interval for β–Example

30 t Test of Slope Coefficient
A. Hypotheses. H0: β = 0 against Ha: β ≠ 0
B. Test statistic: t = b / SEb, with df = n − 2
C. P-value. Convert the t statistic to a P-value.
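The slope test itself is a single division; in the sketch below, b = 0.6 and SEb = 0.2828 are assumed toy values standing in for numbers read off regression output:

```python
# t test of the slope: t = b / SE_b, df = n - 2.
# b and se_b below are assumed toy values, not from the slides.
def slope_t(b, se_b):
    return b / se_b

print(round(slope_t(0.6, 0.2828), 2))
```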

31 t Test: Illustrative Example

32 Analysis of Variance of the Regression Model
An ANOVA technique equivalent to the t test can also be used to test H0: β = 0. This technique is covered on pp. 321–324 in the text but is not included in this presentation.

33 Conditions for Inference Inference about the regression line requires these conditions Linearity Independent observations Normality at each level of X Equal variance at each level of X

34 Conditions for Inference This figure illustrates Normal and equal variation around the regression line at all levels of X

35 Assessing Conditions
The scatterplot should be visually inspected for linearity, Normality, and equal variance. Plotting the residuals from the model can be helpful in this regard. The table lists residuals for the illustrative data.

36 Assessing Conditions, cont.
A stemplot of the residuals shows no major departures from Normality. The residual plot shows more variability at higher X values (but the data are very sparse).
Stemplot of residuals (×10):
|-1|6
|-0|2336
| 0|01366
| 1|4
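The residuals themselves are just observed minus fitted values; a sketch with toy data (not the illustrative dataset):

```python
# Residuals from the least-squares line, for a stemplot or residual plot.
from statistics import mean

def residuals(x, y):
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

res = residuals([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print([round(r, 2) for r in res])
print(round(sum(res), 10))  # least-squares residuals always sum to ~0
```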

37 Residual Plots With a little experience, you can get good at reading residual plots. Here’s an example of linearity with equal variance.

38 Residual Plots Example of linearity with unequal variance

39 Example of Residual Plots Example of non-linearity with equal variance