Linear Regression/Correlation


Linear Regression/Correlation

Quantitative explanatory and response variables. Goals:
- Test whether the level of the response variable is associated with (depends on) the level of the explanatory variable
- Measure the strength of the association between the two variables
- Use the level of the explanatory variable to predict the level of the response variable

Linear Relationships

Notation:
- Y: Response (dependent, outcome) variable
- X: Explanatory (independent, predictor) variable

Linear function (straight-line relation): Y = a + bX (plot Y on the vertical axis, X on the horizontal)
Slope (b): the amount Y changes when X increases by 1
- b > 0 → line slopes upward (positive relation)
- b = 0 → line is flat (no linear relation)
- b < 0 → line slopes downward (negative relation)
Y-intercept (a): the level of Y when X = 0

Example: Service Pricing

Internet History Resources (New South Wales Family History Document Service):
- Membership fee: $20A
- 20¢ ($0.20A) per image viewed

Y = total cost of service, X = number of images viewed
a = cost when no images are viewed, b = incremental cost per image viewed
Y = a + bX = 20 + 0.20X
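
The pricing example maps directly to a one-line function. A minimal sketch (the function name and the sample image counts below are ours, not from the slides):

```python
def total_cost(images_viewed, membership_fee=20.0, cost_per_image=0.20):
    """Linear cost function Y = a + b*X for the document service."""
    return membership_fee + cost_per_image * images_viewed

print(total_cost(100))  # $20 fee + 100 * $0.20 = $40 total
print(total_cost(0))    # the Y-intercept: cost when no images are viewed
```

The intercept is what you pay for zero images; the slope is the marginal cost of one more image.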

Example: Service Pricing

Probabilistic Models

In practice, the relationship between Y and X is not "perfect"; other sources of variation exist. We decompose Y into two components:
- Systematic relationship with X: a + bX
- Random error: e

A random response can be written as the sum of the systematic component (also thought of as the mean) and the random component: Y = a + bX + e
The (conditional on X) mean response is: E(Y) = a + bX
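
The decomposition into a systematic part and a random error can be made concrete with a small simulation (a sketch; the parameter values, error standard deviation, and seed are arbitrary illustrative choices):

```python
import random

random.seed(1)
a, b = 20.0, 0.20                  # systematic component: E(Y) = a + b*X
x_values = range(0, 101, 10)
# Each observed Y is the mean response a + b*x plus a random error e
data = [(x, a + b * x + random.gauss(0, 2.0)) for x in x_values]
for x, y in data[:3]:
    print(x, round(y, 2))          # observed Y scatters around the line
```

The observed points scatter around the line E(Y) = 20 + 0.20X rather than falling exactly on it.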

Least Squares Estimation

Problem: a and b are unknown parameters and must be estimated and tested based on sample data.
Procedure:
- Sample n individuals, observing X and Y on each one
- Plot the pairs: Y (vertical axis) versus X (horizontal)
- Choose the line that "best fits" the data

Criterion: choose the line that minimizes the sum of squared vertical distances from the observed data points to the line.
Least squares prediction equation: Ŷ = a + bX, where
b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)²  and  a = Ȳ − bX̄
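
The closed-form least squares solution can be coded directly (a plain-Python sketch; function and variable names are ours, and the sample data are invented):

```python
def least_squares(xs, ys):
    """Return (a, b) minimizing the sum of squared vertical distances."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
        sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar          # the fitted line passes through (x_bar, y_bar)
    return a, b

# Perfectly linear data recover the true intercept and slope
a, b = least_squares([0, 1, 2, 3], [20.0, 20.2, 20.4, 20.6])
print(round(a, 3), round(b, 3))
```

On noisy data the same formulas give the best-fitting (least squares) line rather than an exact fit.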

Example - Pharmacodynamics of LSD

Response (Y): math score (mean among 5 volunteers)
Predictor (X): LSD tissue concentration (mean of the 5 volunteers)
Raw data and scatterplot of score vs. LSD concentration (not reproduced in this transcript)
Source: Wagner, et al. (1968)

Example - Pharmacodynamics of LSD (computation table; column totals given in the bottom row)

SPSS Output and Plot of Equation

Example - Retail Sales

U.S. SMSAs
Y = per capita retail sales
X = females per 100 males

Residuals

Residuals (aka errors): the differences between observed and predicted values: e = Y − Ŷ
Error sum of squares: SSE = Σ(Y − Ŷ)²
Estimate of the (conditional) standard deviation of Y: s = √(SSE / (n − 2))
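
These quantities can be computed from any fitted line (a sketch; the data and fitted coefficients below are invented for illustration):

```python
def residual_summary(xs, ys, a, b):
    """Return (SSE, s) for the fitted line y_hat = a + b*x."""
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]   # e = Y - Y_hat
    sse = sum(e ** 2 for e in resid)                    # error sum of squares
    n = len(xs)
    s = (sse / (n - 2)) ** 0.5     # estimated conditional std. dev. of Y
    return sse, s

# Fitted line y_hat = 0 + 2x against four observed points
sse, s = residual_summary([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8], 0.0, 2.0)
print(round(sse, 3), round(s, 3))
```

Note the divisor n − 2: two degrees of freedom are used up estimating a and b.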

Linear Regression Model

Data: Y = a + bX + e
Mean: E(Y) = a + bX
Conditional standard deviation: s
Error terms (e) are assumed to be independent and normally distributed

Example - Pharmacodynamics of LSD

Correlation Coefficient

The slope of the regression describes the direction of the association (if any) between the explanatory (X) and response (Y) variables. Problems:
- The magnitude of the slope depends on the units of the variables
- The slope is unbounded and does not measure the strength of the association
- Some situations arise where interest is in the association between variables, but there is no clear definition of X and Y

Population correlation coefficient: ρ
Sample correlation coefficient: r

Correlation Coefficient

Pearson correlation, a measure of the strength of linear association:
r = Σ(X − X̄)(Y − Ȳ) / √(Σ(X − X̄)² · Σ(Y − Ȳ)²) = b(sX / sY)
- Does not delineate between explanatory and response variables
- Is invariant to linear transformations of Y and X
- Is bounded between −1 and 1 (higher absolute values imply a stronger relation)
- Has the same sign (positive/negative) as the slope
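
The definition and its properties are easy to verify numerically (a minimal sketch; the data values are invented):

```python
def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

print(pearson_r([1, 2, 3], [2, 4, 6]))   # exact positive linear relation: 1.0
print(pearson_r([1, 2, 3], [6, 4, 2]))   # exact negative linear relation: -1.0
```

Rescaling either variable (e.g. changing units) leaves r unchanged, unlike the slope b.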

Example - Pharmacodynamics of LSD

Using the formulas for standard deviation from the beginning of the course: sX = 1.935 and sY = 18.611
From previous calculations: b = −9.01, so r = b(sX / sY) = −9.01(1.935 / 18.611) = −0.94
This represents a strong negative association between math scores and LSD tissue concentration

Coefficient of Determination

A measure of the variation in Y that is "explained" by X:
- Step 1: Ignoring X, measure the total variation in Y around its mean: TSS = Σ(Y − Ȳ)²
- Step 2: Fit the regression relating Y to X, and measure the unexplained variation in Y around its predicted values: SSE = Σ(Y − Ŷ)²
- Step 3: Take the difference (the variation in Y "explained" by X) and divide by the total: r² = (TSS − SSE) / TSS
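
The three steps translate directly to code (a sketch; the observed and fitted values below are invented for illustration):

```python
def r_squared(ys, y_hats):
    """Coefficient of determination: (TSS - SSE) / TSS."""
    y_bar = sum(ys) / len(ys)
    tss = sum((y - y_bar) ** 2 for y in ys)                 # total variation
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))   # unexplained variation
    return (tss - sse) / tss

# Fitted values that track Y closely give r^2 near 1
print(r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.1, 3.9]))   # close to 0.992
```

An r² of 0 means the regression explains none of the variation in Y; an r² of 1 means the points fall exactly on the fitted line.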

Example - Pharmacodynamics of LSD (plot illustrating TSS and SSE)

Inference Concerning the Slope (b)

Parameter: the slope in the population model (b)
Estimator: the least squares estimate b
Estimated standard error: SE(b) = s / √Σ(X − X̄)²
Methods of making inference regarding the population:
- Hypothesis tests (2-sided or 1-sided)
- Confidence intervals

Significance Test for b

Test statistic: t = b / SE(b), with df = n − 2
1-sided test: H0: b = 0 vs. HA+: b > 0 (or HA−: b < 0)
2-sided test: H0: b = 0 vs. HA: b ≠ 0

(1 − α)100% Confidence Interval for b

b ± t(α/2, n−2) · SE(b)
- Conclude a positive association if the entire interval lies above 0
- Conclude a negative association if the entire interval lies below 0
- Cannot conclude an association if the interval contains 0
- The conclusion based on the interval is the same as that of the 2-sided hypothesis test
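
The test statistic and the interval go together (a sketch; we hard-code the critical value t.025,5 = 2.571, and the standard error below is an invented illustrative number, not one computed from the slides' data):

```python
def slope_inference(b, se_b, t_crit):
    """t statistic for H0: b = 0 and a (1-alpha)100% CI for the slope."""
    t_stat = b / se_b                          # compare to t with n-2 df
    ci = (b - t_crit * se_b, b + t_crit * se_b)
    return t_stat, ci

# Illustrative inputs: slope b = -9.01, SE(b) = 1.503 (invented), t.025,5 = 2.571
t_stat, (lo, hi) = slope_inference(-9.01, 1.503, 2.571)
print(round(t_stat, 2), round(lo, 2), round(hi, 2))
```

Here the whole interval lies below 0, so we would conclude a negative association, matching the 2-sided test's rejection of H0.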

Example - Pharmacodynamics of LSD

Testing H0: b = 0 vs. HA: b ≠ 0
95% confidence interval for b, using t.025,5 = 2.571 (df = n − 2 = 5)

Analysis of Variance in Regression

Goal: partition the total variation in Y into variation "explained" by X and random variation: TSS = SSR + SSE
The three sums of squares and their degrees of freedom:
- Total (TSS): dfTotal = n − 1
- Error (SSE): dfError = n − 2
- Model (SSR): dfModel = 1

Analysis of Variance in Regression

Analysis of variance F-test: H0: b = 0 vs. HA: b ≠ 0
Test statistic: F = MSR / MSE = (SSR / 1) / (SSE / (n − 2))
Under H0, F follows the F-distribution with 1 numerator and n − 2 denominator degrees of freedom
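
The F statistic assembles directly from the sums of squares (a sketch with invented illustrative numbers):

```python
def regression_f(tss, sse, n):
    """F statistic: MSR / MSE with 1 and n-2 degrees of freedom."""
    ssr = tss - sse          # model sum of squares ("explained" variation)
    msr = ssr / 1            # mean square for the model (1 df)
    mse = sse / (n - 2)      # mean square error (n-2 df)
    return msr / mse

# Illustrative: TSS = 100, SSE = 20, n = 12 observations
print(regression_f(100.0, 20.0, 12))   # 40.0
```

With a single predictor, this F statistic equals the square of the t statistic for the slope, so the two tests agree.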

Example - Pharmacodynamics of LSD

Total sum of squares: TSS = Σ(Y − Ȳ)²
Error sum of squares: SSE = Σ(Y − Ŷ)²
Model sum of squares: SSR = TSS − SSE

Example - Pharmacodynamics of LSD

Analysis of variance F-test: H0: b = 0 vs. HA: b ≠ 0

Example - SPSS Output

Significance Test for Pearson Correlation

Mathematically identical to the t-test for b, but more appropriate when there is no clear explanatory/response distinction.
H0: ρ = 0 vs. HA: ρ ≠ 0 (a 1-sided test can also be done)
Test statistic: t = r√(n − 2) / √(1 − r²), with df = n − 2
P-value: 2P(t ≥ |tobs|)
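
The test statistic is a one-liner (a sketch; r = −0.94 and n = 7 are used only as illustrative inputs):

```python
def corr_t_stat(r, n):
    """t statistic for H0: rho = 0; compare to the t distribution with n-2 df."""
    return r * ((n - 2) ** 0.5) / ((1 - r ** 2) ** 0.5)

# Illustrative: a strong negative sample correlation with a small sample
print(round(corr_t_stat(-0.94, 7), 2))
```

Even with only 7 observations, a correlation this strong yields a t statistic far beyond typical critical values, so H0: ρ = 0 would be rejected.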

Model Assumptions & Problems

- Linearity: many relations are not perfectly linear, but they can be well approximated by a straight line over a range of X values
- Extrapolation: while we can check the validity of the straight-line relation within the observed X levels, we cannot assume the relationship continues outside this range
- Influential observations: some data points (particularly ones with extreme X levels) can exert a large influence on the estimated equation