Linear Regression
Exploring relationships between two metric variables

Correlation
The correlation coefficient measures the strength of the relationship between two variables. That relationship reflects our ability to estimate or predict one variable from knowledge of the other.
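
A minimal sketch in R (the vectors x and y here are made-up data):
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)
cor(x, y)       # Pearson's correlation coefficient r
cor.test(x, y)  # tests H0: rho = 0 and reports a confidence interval for r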

Linear Regression
The process of fitting a straight line to a pair of variables. The equation has the form y = a + bx, where x is the independent (explanatory) variable and y is the dependent (response) variable.

Linear Coefficients
Given x and y, linear regression estimates values for a and b. The coefficient a, the intercept, gives the value of y when x = 0. The coefficient b, the slope, gives the amount that y increases (or decreases) for each one-unit increase in x.

x <- 1:5
y <- 2.5*x + 1   # a line with intercept 1 and slope 2.5
plot(y~x, xlim=c(0, 5), ylim=c(0, 14), yaxp=c(0, 14, 14), las=1, pch=16)
abline(lm(y~x))                   # the fitted line
points(0, 1, pch=8)               # mark the y-intercept
points(mean(x), mean(y), cex=3)   # the line passes through the means
segments(c(1, 2), c(3.5, 3.5), c(2, 2), c(3.5, 6))   # rise/run triangle
text(c(1.5, 2.25), c(3, 4.75), c("1", "2.5"))
text(mean(x), mean(y), "x = mean(x), y = mean(y)", pos=4, offset=1)
text(0, 1, "y-intercept = 1", pos=4)
text(1.5, 5, "slope = 2.5/1 = 2.5", pos=2)
text(2, 12, "y = 1 + 2.5x", cex=1.5)

Least Squares
Many different lines could fit the data, depending on how we define the "best" fit. Least squares regression minimizes the sum of the squared vertical deviations between the y-values and the line.
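
The least squares coefficients have a simple closed form; a sketch with made-up data, checked against lm():
x <- c(1, 2, 3, 4, 5)
y <- c(2.0, 4.1, 5.9, 8.2, 9.8)
b <- cov(x, y) / var(x)      # slope: Sxy / Sxx
a <- mean(y) - b * mean(x)   # intercept: the line passes through (mean(x), mean(y))
c(a, b)
coef(lm(y ~ x))              # lm() returns the same values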

lm() Function
lm() performs least squares linear regression in R. A model formula specifies the variables, with a tilde (~) separating the dependent/response variable from the independent/explanatory variable: dependent ~ independent (response ~ explanatory). In Rcmdr: Statistics | Fit model | Linear regression.

> RegModel.1 <- lm(LMS~People, data=Kalahari)
> summary(RegModel.1)

Call:
lm(formula = LMS ~ People, data = Kalahari)

Residuals:
    Min      1Q  Median      3Q     Max
    ...     ...     ...     ...     ...

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...        ...     ...      ...
People           ...        ...     ...      ...  ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: ... on 13 degrees of freedom
Multiple R-squared: ...,  Adjusted R-squared: ...
F-statistic: ... on 1 and 13 DF,  p-value: ...

(The numeric values marked ... were lost from the original transcript.)

plot(LMS~People, data=Kalahari, pch=16, las=1)
RegModel.1 <- lm(LMS~People, data=Kalahari)
abline(RegModel.1)   # the fitted regression line
segments(Kalahari$People, Kalahari$LMS,
         Kalahari$People, RegModel.1$fitted,
         lty=2)      # dashed segments showing each residual
text(12, 250,
     paste("y = ", round(RegModel.1$coefficients[[1]], 2),
           " + ", round(RegModel.1$coefficients[[2]], 2), "x", sep=""),
     cex=1.25, pos=4)   # print the fitted equation on the plot

Errors
Linear regression assumes all errors are in the measurement of y. There are also errors in the estimation of a (intercept) and b (slope). Significance tests for a and b are based on the t distribution.
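
A sketch of where those t values come from (assuming the RegModel.1 fit from the Kalahari slides, which has 13 residual degrees of freedom):
cf <- summary(RegModel.1)$coefficients   # Estimate, Std. Error, t value, Pr(>|t|)
cf[, "Estimate"] / cf[, "Std. Error"]    # reproduces the t value column
2 * pt(-abs(cf[, "t value"]), df=13)     # two-sided p-values from the t distribution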

Errors 2
The errors in the intercept and slope can be combined to construct a confidence interval for the regression line. We can also compute a prediction interval, which expresses our confidence in a single predicted observation; it is wider than the confidence interval because it also includes the scatter of individual observations around the line.
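
Confidence intervals for the coefficients themselves come from confint(); a one-line sketch, assuming RegModel.1 from above:
confint(RegModel.1, level=0.95)   # 95% confidence intervals for a and b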

predict()
predict() uses the results of a linear regression to predict values of the dependent/response variable. It can also produce confidence and prediction intervals:
predict(RegModel.1, data.frame(People = c(10, 20, 30)), interval="prediction")

RegModel.1 <- lm(LMS~People, data=Kalahari)
plot(LMS~People, data=Kalahari, pch=16, las=1)
xp <- seq(10, 25, 0.1)
yp <- predict(RegModel.1, data.frame(People=xp), interval="confidence")
matlines(xp, yp, lty=c(1, 2, 2), col="black")   # fit plus 95% confidence band
yp <- predict(RegModel.1, data.frame(People=xp), interval="prediction")
matlines(xp, yp, lty=c(1, 3, 3), col="black")   # fit plus 95% prediction band
legend("topleft", c("Confidence interval (95%)", "Prediction interval (95%)"),
       lty=c(2, 3))

Diagnostics
Models | Graphs | Basic diagnostic plots
– Look for a trend in the residuals
– Look for a change in residual variance
– Look for deviation from normally distributed residuals
– Look for influential data points
The same four plots can be produced from the command line, as sketched below.
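
A minimal command-line equivalent of Rcmdr's basic diagnostic plots (assuming the RegModel.1 fit from the earlier slides):
par(mfrow=c(2, 2))   # 2 x 2 grid to hold the four plots
plot(RegModel.1)     # residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
par(mfrow=c(1, 1))   # restore the default layout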

Diagnostics 2
influence(RegModel.1) returns
– hat: the hat (leverage) values
– coefficients: the change in each coefficient when that case is left out
– sigma: the change in the residual standard deviation when that case is left out
– wt.res: the weighted residuals
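
A sketch of extracting these components (assuming RegModel.1 as before; cooks.distance() is a related summary not listed on the slide):
infl <- influence(RegModel.1)
infl$hat                     # leverage of each observation
infl$coefficients            # change in intercept and slope, leaving each case out
infl$sigma                   # residual standard deviation, leaving each case out
cooks.distance(RegModel.1)   # combines leverage and residual size into one measure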

Other Approaches
rlm() in package MASS fits a robust line that is less influenced by outliers. sma() in package smatr fits standardized major axis (aka reduced major axis) regression and major axis regression, which are used in allometry.
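
A minimal sketch of the robust alternative (rlm() is in package MASS; the model mirrors the Kalahari example, and Robust.1 is just our name for the fit):
library(MASS)
Robust.1 <- rlm(LMS~People, data=Kalahari)
summary(Robust.1)         # robust coefficient estimates and standard errors
abline(Robust.1, lty=2)   # add the robust line to an existing scatterplot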