Chapter 14: Correlation and Regression

Presentation transcript:

Chapter 14: Correlation and Regression. Basic Biostatistics, 9/18/2018.

In Chapter 14: 14.1 Data; 14.2 Scatterplots; 14.3 Correlation; 14.4 Regression.

Data. Quantitative explanatory variable X; quantitative response variable Y. Objective: to quantify the linear relationship between X and Y.

Illustrative Data (Doll, 1955). Y = lung cancer mortality per 100,000 in 1950; X = per capita cigarette consumption; n = 11.

Scatterplot. Assess: form, direction of association, outliers, and strength of relation.

Doll, 1955. Form: linear. Direction: positive association. Outliers: no clear outliers. Strength: difficult to determine by eye.

Correlation Coefficient r. r ≡ Pearson’s product-moment correlation coefficient (Karl Pearson, 1857-1936). It measures the degree to which X and Y “go together” and is always between −1 and 1. r ≈ 0 → no correlation; r > 0 → positive correlation; r < 0 → negative correlation. The closer r is to 1 or −1, the stronger the correlation.

Correlational Direction and Strength

Interpretation of r. Direction of association: positive, negative, or ~0. Strength of association: close to 1 or −1 → “strong”; close to 0 → “weak”. Guidelines: if |r| ≥ 0.7, say “strong”; if |r| ≤ 0.3, say “weak”.

Calculating r. By hand, calculator, or computer program. We opt for the latter.
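For readers without SPSS, the calculation of r can be sketched in a few lines of Python. This is only an illustration of the product-moment definition; the function name `pearson_r` and the five data points are hypothetical and are not the Doll (1955) values.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for illustration only (not the Doll, 1955 values)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

By the guidelines above, a value of this size (|r| ≥ 0.7) would be described as a strong, positive association.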

SPSS > Analyze > Correlate > Bivariate. SPSS output: r = 0.74 indicates a strong, positive association.

Coefficient of determination (r²). Square the correlation coefficient → r² = proportion of variance in Y mathematically explained by X. Illustrative data: r² = 0.737² = 0.54 → 54% of the variance in lung cancer mortality is mathematically explained by per capita smoking rates.
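The arithmetic for the illustrative data can be checked directly. The value r = 0.737 comes from the slide; the variable names are only for illustration.

```python
r = 0.737            # correlation reported for the illustrative data
r_squared = r ** 2   # coefficient of determination

print(f"r^2 = {r_squared:.2f}")                 # prints "r^2 = 0.54"
print(f"{r_squared:.0%} of variance explained")  # prints "54% of variance explained"
```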

Cautions: outliers; non-linear relations; confounding (correlation is NOT causation); randomness.

Outliers can have a profound influence on r. These data have r = 0.82, all because of a single outlying point.
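A small sketch of the same effect, using made-up numbers rather than the data plotted on the slide: five points with no linear association at all, and then the same points plus one extreme observation.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical cloud of points with no linear association
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 1, 3]
r_without = pearson_r(x, y)

# Same points plus one hypothetical outlier at (15, 15)
r_with = pearson_r(x + [15], y + [15])

print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```

The single added point moves r from zero to a value that would be read as a strong positive correlation, which is why the scatterplot should be inspected before r is interpreted.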

Linear Relations Only. r = 0.00. This strong relationship is missed by r because it is not linear.
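One way to see this is with a perfectly deterministic but curved relationship. The sketch below assumes Y = X² over values of X that are symmetric about zero; these numbers are hypothetical and are not the data plotted on the slide.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Y is completely determined by X, but the relationship is a parabola
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
print(f"r = {r:.2f}")
```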

Confounding: Correlation ≠ Causation. William Farr showed a strong negative correlation between cholera mortality and elevation above sea level in defense of miasma theory. However, he failed to account for the fact that people who lived at low elevations were more likely to drink from contaminated water sources (confounding).

Don’t be fooled by randomness. Selecting specific data points can produce a false correlation.

Hypothesis Test. Test the claim H0: ρ = 0, where ρ ≡ correlation coefficient parameter. SPSS > Analyze > Correlate > Bivariate output: P = .010 (two-sided) → reliable evidence against H0 → the correlation is statistically significant.
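The slide reports only the SPSS P-value. As a rough cross-check, the usual test statistic for H0: ρ = 0 is t = r·√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. That formula is standard but does not appear on the slide, and the helper functions below are hypothetical names. The sketch uses only the Python standard library and integrates the t density numerically with Simpson's rule.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """Two-sided P-value, integrating the density on [0, |t_stat|] by Simpson's rule."""
    upper = abs(t_stat)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                   # P(0 < T < |t_stat|)
    return 1 - 2 * area

r, n = 0.737, 11                           # values from the illustrative data
df = n - 2
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
p_value = two_sided_p(t_stat, df)

print(f"t = {t_stat:.2f}, df = {df}, two-sided P = {p_value:.3f}")
```

With r = 0.737 and n = 11 this gives a P-value that rounds to .010, in line with the SPSS output.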

Bivariate Normality. Strictly speaking, the P-value requires Normality of the joint distribution of X and Y (“bivariate Normality”).

Regression model (equation for line): ŷi = a + b∙xi, where ŷi ≡ predicted value of Y at xi, a ≡ intercept coefficient, and b ≡ slope coefficient.

Least Squares Line. Residual ≡ vertical distance of a data point from the regression line (dotted). The best-fitting line minimizes the sum of the squared residuals. Determine a and b of the best-fitting line via formula, calculator, or computer.
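The "via formula" route can be sketched as follows. The closed-form least squares estimates are b = Sxy / Sxx and a = ȳ − b·x̄. These are standard results rather than formulas shown on the slide, and the function name and data points are hypothetical.

```python
def least_squares(x, y):
    """Return (a, b): intercept and slope of the least squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx               # slope
    a = mean_y - b * mean_x     # intercept: the line passes through (mean_x, mean_y)
    return a, b

def sum_sq_residuals(x, y, a, b):
    """Sum of squared vertical distances from the line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares(x, y)
sse = sum_sq_residuals(x, y, a, b)
print(f"y-hat = {a:.2f} + {b:.2f} * X, sum of squared residuals = {sse:.2f}")

# Any other line has a larger sum of squared residuals
sse_other = sum_sq_residuals(x, y, a + 0.1, b - 0.05)
print(f"perturbed line: {sse_other:.2f}")
```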

Coefficients by SPSS: Analyze > Regression > Linear. The output gives the intercept estimate (a) and the slope estimate (b). Regression line: ŷ = 6.756 + 0.02284 ∙ X.

ŷ = 6.756 + 0.0228 ∙ X. Slope = “rise over run”: a 0.0228 increase in ŷ per unit X. “Rise” over 200 units = 200 ∙ 0.0228 = 4.56. The line crosses the Y axis at 6.756 (intercept).
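Using the coefficients reported in the SPSS output (intercept 6.756, slope 0.02284), the "rise over run" reading of the slope can be verified numerically. The function name `predict` is only illustrative.

```python
INTERCEPT = 6.756    # a, from the SPSS output
SLOPE = 0.02284      # b, from the SPSS output

def predict(x):
    """Predicted lung cancer mortality per 100,000 at cigarette consumption x."""
    return INTERCEPT + SLOPE * x

# The rise over a run of 200 units of X equals 200 * slope
rise = predict(200) - predict(0)
print(f"predicted Y at X = 0:   {predict(0):.3f}")
print(f"predicted Y at X = 200: {predict(200):.3f}")
print(f"rise over 200 units:    {rise:.2f}")
```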

Population Regression Model: yi = α + β∙xi + εi, where α ≡ intercept parameter, β ≡ slope parameter, and εi ≡ residual error for observation i. Objective: to estimate β with (1 − α)100% confidence (in this expression α denotes the significance level, not the intercept).

CI for β. Analyze > Regression > Linear > Statistics (SPSS statistics options dialogue box). 95% CI for β: (0.007 to 0.039).
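The slide shows only the SPSS result. As a consistency check, the interval can be approximately reproduced from numbers given elsewhere in the chapter, under these assumptions: the interval has the usual form b ± t*·SE(b); in simple linear regression the t statistic for the slope equals the t statistic for the correlation; and the critical value t* = 2.262 is the tabled 97.5th percentile of t with 9 df. None of these steps appears on the slide, so treat the following as a sketch.

```python
import math

b = 0.02284          # slope estimate from the SPSS output
r, n = 0.737, 11     # correlation and sample size for the illustrative data
df = n - 2

# t statistic for the slope (same as for the correlation in simple regression)
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# Back out the standard error of the slope from b and its t statistic
se_b = b / t_stat

T_CRIT = 2.262       # assumed: tabled t critical value, df = 9, 95% confidence
lower = b - T_CRIT * se_b
upper = b + T_CRIT * se_b
print(f"SE(b) = {se_b:.5f}")
print(f"95% CI for beta: ({lower:.3f} to {upper:.3f})")
```

The result agrees with the SPSS interval to three decimal places.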

Testing H0: β = 0. The SPSS output gives the t statistic and the P-value, with df = n − 2 = 11 − 2 = 9. P = .010 → evidence against H0 is good → the slope is statistically significant.
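A sketch of how the t statistic for the slope is built from raw data: t = b / SE(b), where SE(b) = s / √Sxx and s² is the sum of squared residuals divided by n − 2. These are the standard formulas, not ones printed on the slide, and the data are hypothetical.

```python
import math

# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = sxy / sxx                       # slope estimate
a = mean_y - b * mean_x             # intercept estimate

df = n - 2                          # degrees of freedom for the slope test
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / df)             # standard deviation of the residuals
se_b = s / math.sqrt(sxx)           # standard error of the slope
t_stat = b / se_b

print(f"b = {b:.2f}, SE(b) = {se_b:.4f}, t = {t_stat:.2f}, df = {df}")
```

The resulting t statistic is then referred to a t distribution with n − 2 degrees of freedom to obtain the P-value.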

Conditions for Regression Inference: Linearity; Independent observations; Normality; Equal variance (homoscedasticity).

Assessing L.I.N.E. Inspect the scatterplot for linearity. Inspect the residuals for linearity, Normality, and equal variance.

Assessing Conditions. Stemplot of residuals (×10): −1|6, −0|2336, 0|01366, 1|4. No major departures from Normality.

Residuals plotted against X values: data too sparse to assess.

Residual Plot: example of linearity with equal variance.

Residual Plot: example of linearity with unequal variance.

Residual Plot: example of non-linearity with equal variance.
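What a non-linear pattern looks like in the residuals can be sketched numerically. The example below fits a least squares line to hypothetical data that follow a parabola (Y = X²) and lists the residuals in order of X. These are not the data behind the plots above.

```python
# Hypothetical curved data: Y = X^2
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = sxy / sxx                # least squares slope
a = mean_y - b * mean_x      # least squares intercept

# Residual = observed Y minus predicted Y
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

for xi, e in zip(x, residuals):
    print(f"x = {xi:2d}  residual = {e:5.1f}")
```

The residuals are positive at both ends and negative in the middle. A systematic curve like this, rather than a random scatter around zero, is the signature of non-linearity in a residual plot.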