
1 Statistics & R, TiP, 2011/12
Linear Models & Smooth Regression
- Linear models
- Diagnostics
- Robust regression
- Bootstrapping linear models
- Scatterplot smoothing
- Spline regression
- Non-linear regression

2 Statistics & R, TiP, 2011/12
Linear Models
- Linear in the parameters, not necessarily fitting a straight line
- The general R statement is lm(formula); lm stands for linear model
- formula: dependent variable ~ linear function of independent variables
- Here ~ means "is related to"

3 Statistics & R, TiP, 2011/12
- e.g. lm(time~dist) fits the model time_i = α + β dist_i + error_i and produces estimates of α and β
- e.g. lm(time~dist + climb) fits the model time_i = α + β1 dist_i + β2 climb_i + error_i
- lm() produces an object which can be examined in the usual way with summary() and other special commands

4 Statistics & R, TiP, 2011/12
- e.g. hillslm <- lm(time~dist) produces the object hillslm
- hillslm$coefficients gives the vector of coefficients
- hillslm$fitted gives the fitted values
- hillslm$residuals gives the vector of residuals, i.e. the estimates of the errors in the model time_i = α + β dist_i + error_i; residual_i = time_i − fitted_i
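Put together, fitting and extracting these components looks like the following sketch (using the hills data introduced on a later slide):

```r
library(MASS)                       # provides the hills data
data(hills)

# fit the model time_i = alpha + beta * dist_i + error_i
hillslm <- lm(time ~ dist, data = hills)

hillslm$coefficients                # named vector: (Intercept) and dist
head(hillslm$fitted.values)        # fitted values
head(hillslm$residuals)            # estimated errors

# check: residual = observed - fitted
all.equal(unname(hillslm$residuals),
          hills$time - unname(hillslm$fitted.values))
```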

5 Statistics & R, TiP, 2011/12
Regression Diagnostics
- Analysis of the residuals can indicate whether the model is satisfactory
- If the model is appropriate for the data, then:
  - residuals should look like a random sample from a Normal distribution
  - residuals should be independent of the fitted values
- These are easy to check graphically: plot.lm(hillslm) will give the basic checks

6 Statistics & R, TiP, 2011/12
Example: Scottish Hill Races
- In the data set hills in the MASS library
> library(MASS)
> data(hills)
> attach(hills)
> plot(dist,time)
> identify(dist,time,row.names(hills))
- Note the use of identify(), which allows interactive identification of points; good for identifying obvious outliers

7 Statistics & R, TiP, 2011/12
[Figure: scatterplot of time against dist for the hills data]

8 Statistics & R, TiP, 2011/12
- Now fit the model and look at the results
- Note the five summary statistics of the residuals, and the estimates of the coefficients with their standard errors; there is also a lot more detail, more than is usually needed on any single occasion
> hillslm <- lm(time~dist)
> summary(hillslm)
Call: lm(formula = time ~ dist)
Residuals: Min 1Q Median 3Q Max
Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept)
dist e-15 ***

9 Statistics & R, TiP, 2011/12  Look at basic diagnostics with > plot.lm(hillslm)

10 Statistics & R, TiP, 2011/12
[Diagnostic plots from plot.lm(hillslm)]
- Outliers, so the residuals-vs-fitted plot does not look random
- Points not on a straight line, so there is deviation from Normality
- (The other two plots are of less interest here)

11 Statistics & R, TiP, 2011/12
Next Steps:
- Remove outliers from the data
> lm(time~dist, data=hills[-7,])         removes the 7th observation
> lm(time~dist, data=hills[-c(7,18),])   removes both the 7th and 18th
- Use a robust method for estimating the linear relationship which is not so greatly affected by outliers

12 Statistics & R, TiP, 2011/12
Robust Regression
- Available routines: rlm() in the MASS library and lqs() in the lqs library
- Example on the hills data:
hillslm1 <- lm(time~dist, data=hills[-c(7,18),])
gives intercept and slope as −5.81 and 7.91
hillsrlm <- rlm(time~dist)
gives intercept and slope as −6.36 and … (and with smaller standard errors of the estimates)
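A minimal sketch comparing the two fits on the full hills data (keeping all observations, so the outliers are present for rlm() to downweight):

```r
library(MASS)                            # rlm() and the hills data
data(hills)

ols <- lm(time ~ dist, data = hills)     # least squares: pulled by outliers
rob <- rlm(time ~ dist, data = hills)    # M-estimation: downweights outliers

coef(ols)
coef(rob)
summary(rob)$coefficients                # includes the standard errors
```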

13 Statistics & R, TiP, 2011/12

14 Statistics & R, TiP, 2011/12
[Figure with the outliers highlighted]

15 Statistics & R, TiP, 2011/12
Data: phone calls in Belgium
> plot(year,calls)
> abline(lm(calls~year))     ordinary regression line
> abline(rlm(calls~year))    robust regression line
[Figure: scatterplot with the ordinary and robust regression lines; outliers marked]

16 Statistics & R, TiP, 2011/12
Bootstrapping linear models
- We may not want to assume a Normal error structure, and if so we may want some bootstrap estimate of (say) a confidence interval for the slope
- It is not appropriate to resample the points {(x_i, y_i); i = 1,…,n}, since usually the x-values are fixed
- Instead we have to bootstrap the residuals

17 Statistics & R, TiP, 2011/12
Steps:
- Fit the regression (with lm() or rlm() or …)
- Extract the residuals with obj$residuals
- Extract the coefficients with obj$coefficients
- Take bootstrap samples of the residuals and construct new bootstrap data sets: actual x-values + estimated coefficients + bootstrapped residuals
- Calculate the quantity of interest
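The steps above can be sketched as follows; a minimal resampling loop for the slope, where the choice of B = 1000 and the use of the hills data are illustrative:

```r
library(MASS)
data(hills)

fit <- lm(time ~ dist, data = hills)
res <- fit$residuals                 # step 2: extract residuals
fv  <- fit$fitted.values             # fitted values at the fixed x-values

B <- 1000
slopes <- numeric(B)
set.seed(1)
for (b in 1:B) {
  # resample residuals with replacement; rebuild y at the fixed x-values
  ystar <- fv + sample(res, replace = TRUE)
  slopes[b] <- coef(lm(ystar ~ hills$dist))[2]
}

# percentile bootstrap confidence interval for the slope
quantile(slopes, c(0.025, 0.975))
```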

18 Statistics & R, TiP, 2011/12
Scatterplot smoothing
- A tool for informal investigation of structure in a scatterplot
- We can see that distance increases with speed, but is this increase constant or does it tail upwards? Use lowess
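The speed/distance example suggests the built-in cars data (stopping distance against speed); a minimal sketch, assuming that data set:

```r
data(cars)
plot(cars$speed, cars$dist, xlab = "speed", ylab = "distance")

# add a lowess smooth; f controls the span of the local fits (default 2/3)
lines(lowess(cars$speed, cars$dist), col = "red")
```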

19 Statistics & R, TiP, 2011/12 Suggests some curvature upwards

20 Statistics & R, TiP, 2011/12
Spline regression
- Polynomial regression is often not satisfactory, since the behaviour of a polynomial depends upon its values over the entire range
- Instead, spline regression uses 'local piecewise polynomials' which adapt to local values better and do not influence distant ones
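A sketch of spline fits with the smoothing parameters tried on the next slide (using the cars data as an illustrative example):

```r
data(cars)
plot(cars$speed, cars$dist, xlab = "speed", ylab = "distance")

# smoothing splines with different effective degrees of freedom;
# larger df gives a wigglier, less smooth curve
for (d in c(5, 10, 15, 20)) {
  lines(smooth.spline(cars$speed, cars$dist, df = d), lty = d / 5)
}
```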

21 Statistics & R, TiP, 2011/12
[Figure: data with default lowess, cubic regression, and 8th-degree polynomial regression; the polynomial wiggles]

22 Statistics & R, TiP, 2011/12
[Figure: spline regression with different smoothing parameters, df = 5, 10, 15, 20; df = 10 looks the best choice]

23 Statistics & R, TiP, 2011/12
Non-linear regression
- There may be external reasons for specifying a particular non-linear model, e.g. one of a given parametric form
- It can be estimated using the routine nls() in the library nls (non-linear least squares)
- Starting values need to be specified
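A minimal sketch of nls() on simulated data; the exponential-decay model here is illustrative, since the slides do not record the example actually used, and in current R nls() lives in the base stats package, so no library() call is needed:

```r
# hypothetical data following y = a * exp(-b * x) + noise
set.seed(1)
x <- 1:20
y <- 5 * exp(-0.3 * x) + rnorm(20, sd = 0.1)

# nls() requires starting values for the parameters
fit <- nls(y ~ a * exp(-b * x), start = list(a = 4, b = 0.2))
coef(fit)
```

Poor starting values can prevent convergence, which is why nls() insists on them being supplied.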

24 Statistics & R, TiP, 2011/12
Summary
- Seen how regression diagnostics lead to dropping outliers or the use of robust regression methods
- Ideas of bootstrapping linear models
- Scatterplot smoothing is a useful informal tool: use lowess()
- Smooth regression with splines
- Availability of non-linear regression