Regression
Harry R. Erwin, PhD
School of Computing and Technology
University of Sunderland

Resources
Crawley, M. J. (2005) Statistics: An Introduction Using R. Wiley.
Gonick, L., and W. Smith (1993) The Cartoon Guide to Statistics. HarperResource (for fun).

Regression
Used when both the response and the explanatory variable are continuous. Apply when a scatter plot is the appropriate graphic. Four main types:
–Linear regression (straight line)
–Polynomial regression (non-linear)
–Non-linear regression (in general)
–Non-parametric regression (no obvious functional form)

Linear Regression
Worked example from the book (128ff):
reg.data<-read.table("tannin.txt",header=T)
attach(reg.data)
names(reg.data)
plot(tannin,growth,pch=16)
Uses the lm() function and a simple model formula, growth~tannin:
abline(lm(growth~tannin))
fitted<-predict(lm(growth~tannin))
model… (141ff)

Tannin Data Set
reg.data<-read.table("tannin.txt",header=T)
attach(reg.data)
names(reg.data)
[1] "growth" "tannin"
plot(tannin,growth,pch=16)
(pch=16 plots solid dots)

Tannin Plot

Linear Regression
model<-lm(growth~tannin)
model
Call:
lm(formula = growth ~ tannin)
Coefficients:
(Intercept)      tannin
     …              …
abline(model)

Abline

Fitting
fitted<-predict(model)
fitted
for(i in 1:9) lines(c(tannin[i],tannin[i]), c(growth[i],fitted[i]))

Fitted

Summary
summary(model)
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)     …          …          …      …e-06 ***
tannin          …          …          …      …     ***
---
Residual standard error: … on 7 degrees of freedom
Multiple R-squared: …, Adjusted R-squared: …
F-statistic: … on 1 and 7 DF, p-value: …

Summary.aov
summary.aov(model)
            Df  Sum Sq  Mean Sq  F value  Pr(>F)
tannin       1    …        …        …       …   ***
Residuals    7    …        …   <- the error variance
Report summary(model) and resist the temptation to include summary.aov(model) as well. Include the p-value (from the last slide) and the error variance (from here) in a figure caption. Finally, plot(model).

First Plot (don’t want structure here)

Second Plot (qqnorm)

Third Plot (also don’t want structure here)

Fourth Plot (influence)

Key Definitions
SSE: the error sum of squares, the sum of the squared residuals; this is minimised for the best fit.
SSX = ∑x² − (∑x)²/n, the corrected sum of squares of x.
SSY = ∑y² − (∑y)²/n, the corrected sum of squares of y.
SSXY = ∑xy − (∑x)(∑y)/n, the corrected sum of products.
b = SSXY/SSX, the maximum likelihood estimate of the slope of the linear regression.
SSR = SSXY²/SSX, the explained variation or regression sum of squares. Note SSY = SSR + SSE.
r = SSXY/√(SSX·SSY), the correlation coefficient.
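A minimal sketch of these definitions in R, assuming the tannin and growth vectors from the worked example are still attached; the hand-computed coefficients can be checked against lm():
n <- length(growth)
SSX <- sum(tannin^2) - sum(tannin)^2/n                   # corrected sum of squares of x
SSY <- sum(growth^2) - sum(growth)^2/n                   # corrected sum of squares of y
SSXY <- sum(tannin*growth) - sum(tannin)*sum(growth)/n   # corrected sum of products
b <- SSXY/SSX                                            # slope estimate
a <- mean(growth) - b*mean(tannin)                       # intercept estimate
SSR <- SSXY^2/SSX                                        # regression sum of squares
SSE <- SSY - SSR                                         # error sum of squares (SSY = SSR + SSE)
r <- SSXY/sqrt(SSX*SSY)                                  # correlation coefficient
c(intercept = a, slope = b)
coef(lm(growth ~ tannin))                                # should agree with the line above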

Analysis of Variance
Start with SSR, SSE, and SSY. SSY has df = n−1. SSE uses two estimated parameters (the slope and the intercept), so its df = n−2. SSR has a single degree of freedom, since fitting the regression model to this simple data set estimates only one parameter beyond the mean value of y: the slope, b. Remember SSY = SSR + SSE.

Continuing
Regression variance = SSR/1.
Error variance: s² = SSE/(n−2).
F = regression variance/s².
The null hypothesis is that the slope b is zero, so that there is no dependence of the response on the explanatory variable.
s² then allows us to work out the standard errors of the slope and intercept:
s.e.(b) = √(s²/SSX)
s.e.(a) = √(s²·∑x²/(n·SSX))
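Continuing the hand calculation, a sketch that reuses n, SSX, SSR, and SSE from the sketch above and applies exactly these formulas:
s2 <- SSE/(n - 2)                        # error variance
Fval <- (SSR/1)/s2                       # F statistic on 1 and n-2 df
pf(Fval, 1, n - 2, lower.tail = FALSE)   # p-value for H0: slope = 0
se.b <- sqrt(s2/SSX)                     # standard error of the slope
se.a <- sqrt(s2*sum(tannin^2)/(n*SSX))   # standard error of the intercept
c(F = Fval, se.slope = se.b, se.intercept = se.a)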

Doing it in R
model<-lm(growth~tannin)
summary(model)
–This produces all of the parameters and their standard errors.
If you want to see the analysis of variance, use summary.aov(model).
Report summary(model) and resist the temptation to include summary.aov(model) as well; include the p-value and error variance in a figure caption.
The degree of fit or coefficient of determination, r², is SSR/SSY. r (or ρ) is the correlation coefficient.
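The same quantities can also be pulled straight out of the fitted model object; a small sketch:
coef(summary(model))        # estimates, standard errors, t values, p values
summary(model)$r.squared    # coefficient of determination, SSR/SSY
cor(tannin, growth)         # the correlation coefficient r, with its sign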

Critical Appraisal
Check constancy of variance and normality of errors with plot(model); a sketch for viewing all four plots at once follows below.
–Plot 1 should show no pattern.
–Plot 2 should show a straight line.
–Plot 3 repeats Plot 1 on a different scale; you don't want to see a triangular shape.
–Plot 4 shows Cook's distance, identifying the points with the most influence. You may want to investigate those points for errors or systematic effects; refit without them and assess whether they unduly dominate your results.
mcheck(model) is a model-checking helper defined in the book, not part of core R.
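A small sketch for putting all four diagnostic plots on one page:
par(mfrow = c(2, 2))   # 2 x 2 grid of plots
plot(model)            # the four diagnostic plots described above
par(mfrow = c(1, 1))   # restore the default layout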

Be Aware!
interv<-1:100/100
theta<-2*pi*interv
x<-cos(theta)
y<-sin(theta)
plot(y,x)
What's the correct functional form? The points lie on a circle, so y is not a function of x at all, and a linear regression is meaningless:
regress<-lm(y~x)
plot(regress)

Polynomial Regression
A simple way to investigate non-linearity: add powers of the explanatory variable to the linear model, as sketched below. Worked example (146ff).
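As a sketch of the idea (the book's own worked example is on 146ff, and applying it to the tannin data is an illustration here): add a quadratic term with I(), then use anova() to test whether the extra term improves the fit.
model2 <- lm(growth ~ tannin + I(tannin^2))   # quadratic polynomial in tannin
summary(model2)
anova(model, model2)                          # F test of the quadratic term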

Non-Linear Regression
Perhaps the science constrains the functional form of the relationship between the response variable and the explanatory variable, but the relationship cannot be linearised by transformation. What to do? Use nls() instead of lm(), specify the form of the model precisely, and supply initial guesses for the parameters. summary(model) still reports the statistics, anova(model1, model2) compares models, and summary.aov(model) reports the analysis of variance.
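A hedged nls() sketch: the exponential decay model y = a·exp(−b·x), the simulated decay data frame, and the starting values are all illustrative assumptions, not taken from the slides.
decay <- data.frame(x = 1:30,
                    y = 60*exp(-0.1*(1:30)) + rnorm(30, sd = 2))   # simulated data for illustration
model.nls <- nls(y ~ a*exp(-b*x), data = decay,
                 start = list(a = 50, b = 0.05))   # exact model form plus initial guesses
summary(model.nls)                                 # parameter estimates and standard errors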

Generalised Additive Models
If you see that the relationship is non-linear but you don't have a theory, use a generalised additive model (GAM).
library(mgcv)
–by the way, this is mgcv's gam(), not the version in the separate gam package.
model<-gam(y~s(x))
–s(x) is the default smoother, a thin plate regression spline basis.
Worked example.
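A minimal gam() sketch, reusing the simulated decay data from the nls example above rather than the book's worked example:
library(mgcv)                              # provides this version of gam()
model.gam <- gam(y ~ s(x), data = decay)   # s(x): thin plate regression spline smoother
summary(model.gam)
plot(model.gam)                            # the estimated smooth with confidence bands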