Linear Regression Models


Linear Regression Models Andy Wang CIS 5930-03 Computer Systems Performance Analysis

Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions Verifying assumptions visually

What Is a (Good) Model? For correlated data, a model predicts the response given an input. The model should be an equation that fits the data. The standard definition of "fits" is least-squares: minimize the squared error while keeping the mean error zero. This also minimizes the variance of the errors.

Least-Squared Error If ŷ = b0 + b1x is the predicted response, then the error in the estimate for xi is ei = yi − ŷi. Minimize the Sum of Squared Errors, SSE = Σ ei², subject to the constraint that the mean error is zero: Σ ei = 0.

Estimating Model Parameters Best regression parameters are b1 = (Σxy − n·x̄·ȳ) / (Σx² − n·x̄²) and b0 = ȳ − b1·x̄, where x̄ = (1/n)Σx and ȳ = (1/n)Σy. Note error in book!

Parameter Estimation Example Execution time of a script for various loop counts (n = 5): x̄ = 6.8, ȳ = 2.32, Σxy = 88.7, Σx² = 264. b1 = (88.7 − 5·6.8·2.32) / (264 − 5·6.8²) = 9.82 / 32.8 = 0.30. b0 = 2.32 − (0.30)(6.8) = 0.28.
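The arithmetic above is easy to check in a few lines of Python. This sketch is not part of the original slides; it uses only the summary statistics given, with n = 5 taken from the later confidence-interval slide.

```python
# Least-squares parameter estimates from the slide's summary statistics.
n = 5            # number of (loop count, time) observations
x_bar = 6.8      # mean of x
y_bar = 2.32     # mean of y
sum_xy = 88.7    # sum of x_i * y_i
sum_x2 = 264.0   # sum of x_i squared

b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)  # slope
b0 = y_bar - b1 * x_bar                                        # intercept
print(round(b1, 2), round(b0, 2))  # 0.3 0.28
```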

Graph of Parameter Estimation Example

Allocating Variation If there were no regression, the best guess of y would be ȳ. Observed values of y differ from ȳ, giving rise to errors (variance). Regression gives a better guess, but there are still errors. We can evaluate the quality of the regression by allocating the sources of error.

The Total Sum of Squares Without regression, the squared error is SST = Σ(yi − ȳ)² = Σyi² − n·ȳ² = SSY − SS0, where SSY = Σyi² and SS0 = n·ȳ².

The Sum of Squares from Regression Recall that the regression error is SSE = Σ(yi − ŷi)². Error without regression is SST. So regression explains SSR = SST − SSE. Regression quality is measured by the coefficient of determination, R² = SSR/SST.

Evaluating Coefficient of Determination Compute SSE = Σy² − b0·Σy − b1·Σxy and SST = Σy² − n·ȳ², then R² = (SST − SSE)/SST, where R = R(x,y) = correlation(x,y).

Example of Coefficient of Determination For the previous regression example: Σy = 11.60, Σy² = 29.88, Σxy = 88.7. SSE = 29.88 − b0(11.60) − b1(88.7) = 0.028 (using unrounded b0 and b1). SST = 29.88 − 26.91 = 2.97. SSR = 2.97 − 0.03 = 2.94. R² = (2.97 − 0.03)/2.97 = 0.99.
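The same allocation can be reproduced in Python (an illustration, not from the slides); b0 and b1 are the unrounded estimates from the parameter-estimation slide, carried to four digits.

```python
# Allocating variation for the example, from the slide's sums.
n = 5
y_bar = 2.32
sum_y = 11.60     # sum of y_i
sum_y2 = 29.88    # sum of y_i squared
sum_xy = 88.7
b0, b1 = 0.2841, 0.2994   # unrounded estimates from the earlier slide

sse = sum_y2 - b0 * sum_y - b1 * sum_xy   # error left after regression
sst = sum_y2 - n * y_bar ** 2             # total variation around the mean
ssr = sst - sse                           # variation the regression explains
r2 = ssr / sst                            # coefficient of determination
print(round(sst, 2), round(r2, 2))  # 2.97 0.99
```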

Standard Deviation of Errors Variance of errors is SSE divided by degrees of freedom. DOF is n − 2 because we've calculated 2 regression parameters from the data. So variance (mean squared error, MSE) is SSE/(n − 2). Standard deviation of errors is the square root: se = √(SSE/(n − 2)) (minor error in book).

Checking Degrees of Freedom Degrees of freedom always equate: SS0 has 1 (computed from ȳ); SST has n − 1 (computed from the data and ȳ, which uses up 1); SSE has n − 2 (needs 2 regression parameters). So SST = SSR + SSE balances in degrees of freedom too: n − 1 = 1 + (n − 2). We've talked about DOF for all except SSR; this equation defines the DOF for SSR (indirectly) as 1.

Example of Standard Deviation of Errors For the regression example, SSE was 0.03, so MSE is 0.03/3 = 0.01 and se = 0.10. Note the high quality of our regression: R² = 0.99 and se = 0.10. (Our samples were generated by timing a shell script that looped for different x values.)
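As a quick check (illustrative, not from the slides), using the more precise SSE = 0.028 from the coefficient-of-determination example:

```python
import math

# Standard deviation of errors: se = sqrt(SSE / (n - 2)).
n = 5
sse = 0.028              # SSE from the coefficient-of-determination example
mse = sse / (n - 2)      # mean squared error
se = math.sqrt(mse)
print(round(se, 2))  # 0.1
```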

Confidence Intervals for Regressions Regression is done from a single population sample (size n). A different sample might give different results. The true model is y = β0 + β1x + e. Parameters b0 and b1 are really estimates of β0 and β1, computed from a population sample.

Calculating Intervals for Regression Parameters Standard deviations of the parameters: s_b0 = se·√(1/n + x̄²/(Σx² − n·x̄²)) and s_b1 = se/√(Σx² − n·x̄²). Confidence intervals are bi ± t·s_bi, where t = t[1−α/2; n−2] has n − 2 degrees of freedom. Note: not divided by √n.

Example of Regression Confidence Intervals Recall se = 0.10, n = 5, Σx² = 264, x̄ = 6.8. So s_b0 = 0.10·√(1/5 + 6.8²/32.8) = 0.13 and s_b1 = 0.10/√32.8 = 0.017. Using a 90% confidence level, t[0.95; 3] = 2.353.

Regression Confidence Example, cont'd Thus, the b0 interval is 0.28 ± (2.353)(0.13) = (−0.03, 0.59). It includes zero, so b0 is not significant at 90%. And b1 is 0.30 ± (2.353)(0.017) = (0.26, 0.34). Significant at 90% (and would survive even a 99.9% test).
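A Python version of this interval arithmetic (a sketch, not from the slides; the t quantile is hard-coded from the slide rather than computed, to stay within the standard library):

```python
import math

# 90% confidence intervals for the regression parameters.
n = 5
x_bar = 6.8
sum_x2 = 264.0
se = 0.10                  # standard deviation of errors (earlier slide)
b0, b1 = 0.2841, 0.2994    # unrounded parameter estimates
t = 2.353                  # t quantile for 90% confidence, n - 2 = 3 DOF

sxx = sum_x2 - n * x_bar ** 2
s_b0 = se * math.sqrt(1 / n + x_bar ** 2 / sxx)
s_b1 = se / math.sqrt(sxx)

ci_b0 = (b0 - t * s_b0, b0 + t * s_b0)   # spans zero: not significant
ci_b1 = (b1 - t * s_b1, b1 + t * s_b1)   # excludes zero: significant
```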

Confidence Intervals for Predictions Previous confidence intervals are for parameters How certain can we be that the parameters are correct? Purpose of regression is prediction How accurate are the predictions? Regression gives mean of predicted response, based on sample we took

Predicting m Samples Standard deviation for the mean of a future sample of m observations at xp is s_ŷ = se·√(1/m + 1/n + (xp − x̄)²/(Σx² − n·x̄²)). Note the deviation drops as m → ∞. Variance is minimal at xp = x̄. Use t-quantiles with n − 2 DOF for the interval.

Example of Confidence of Predictions Using the previous equation, what is the predicted time for a single run (m = 1) of 8 loops? Time = 0.28 + 0.30(8) = 2.68. Standard deviation of errors se = 0.10. The 90% interval is then 2.68 ± (2.353)(0.10)·√(1 + 1/5 + (8 − 6.8)²/32.8) = 2.68 ± 0.26 = (2.42, 2.94).
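The prediction interval can be sketched the same way (illustrative code, not from the slides; unrounded b0 and b1 are assumed):

```python
import math

# 90% prediction interval for a single future run (m = 1) at x_p = 8.
n = 5
x_bar = 6.8
sxx = 264.0 - n * x_bar ** 2   # sum of x^2 minus n * x_bar^2
se = 0.10
b0, b1 = 0.2841, 0.2994        # unrounded parameter estimates
t = 2.353                      # t quantile, n - 2 = 3 DOF
m, x_p = 1, 8

y_hat = b0 + b1 * x_p
s_pred = se * math.sqrt(1 / m + 1 / n + (x_p - x_bar) ** 2 / sxx)
interval = (y_hat - t * s_pred, y_hat + t * s_pred)
print(round(y_hat, 2), tuple(round(v, 2) for v in interval))  # 2.68 (2.42, 2.94)
```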

Prediction Confidence Red dot is mean; blue lines indicate confidence intervals for predictions. Prediction is most accurate at mean. As you get farther from the mean, confidence drops. If you try to predict very far outside the sampled data, confidence is very poor.

Verifying Assumptions Visually Regressions are based on assumptions: a linear relationship between response y and predictor x (or a nonlinear relationship used in fitting); predictor x nonstochastic and error-free; model errors statistically independent, with distribution N(0, c) for constant c. If the assumptions are violated, the model is misleading or invalid.

Testing Linearity Scatter-plot x vs. y to see the basic curve type. (The slide's figure shows four examples: linear, piecewise linear, outlier, and nonlinear (power); figure not reproduced.)

Testing Independence of Errors Scatter-plot the errors εi versus the predicted responses ŷi. There should be no visible trend. Example from our curve fit: (figure not reproduced).

More on Testing Independence It may be useful to plot error residuals versus experiment number. In the previous example, this gives the same plot except for x scaling. There are no foolproof tests: an "independence" test really disproves one particular dependence; maybe the next test will show a different dependence.

Testing for Normal Errors Prepare a quantile-quantile plot of the errors against the normal distribution. Example for our regression: (figure not reproduced).
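In Python's standard library, statistics.NormalDist can supply the theoretical quantiles for such a plot. The residual values below are illustrative placeholders, since the slide's actual residuals are not reproduced in the transcript:

```python
import statistics

# Sketch of a quantile-quantile check: pair sorted residuals with the
# quantiles of a fitted normal distribution.  The residual values here
# are placeholders, not the slide's actual data.
residuals = [-0.12, 0.05, -0.03, 0.08, 0.02]

dist = statistics.NormalDist(0, statistics.stdev(residuals))
n = len(residuals)
pairs = [(dist.inv_cdf((i - 0.5) / n), e)       # (theoretical, observed)
         for i, e in enumerate(sorted(residuals), start=1)]
# If the errors are normal, these pairs fall near a straight line.
```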

Testing for Constant Standard Deviation The tongue-twister name is homoscedasticity. Return to the independence plot and look for a trend in the spread. Example: (figure not reproduced).

Linear Regression Can Be Misleading Regression throws away some information about the data to allow more compact summarization. Sometimes vital characteristics are thrown away. Often, looking at data plots can tell you whether you will have a problem.

Example of Misleading Regression The four data sets below are Anscombe's quartet:

        I             II            III            IV
    x     y       x     y       x      y      x      y
   10   8.04     10   9.14     10    7.46     8    6.58
    8   6.95      8   8.14      8    6.77     8    5.76
   13   7.58     13   8.74     13   12.74     8    7.71
    9   8.81      9   8.77      9    7.11     8    8.84
   11   8.33     11   9.26     11    7.81     8    8.47
   14   9.96     14   8.10     14    8.84     8    7.04
    6   7.24      6   6.13      6    6.08     8    5.25
    4   4.26      4   3.10      4    5.39    19   12.50
   12  10.84     12   9.13     12    8.15     8    5.56
    7   4.82      7   7.26      7    6.42     8    7.91
    5   5.68      5   4.74      5    5.73     8    6.89

What Does Regression Tell Us About These Data Sets? Exactly the same thing for each! n = 11; mean of y = 7.5; ŷ = 3 + 0.5x; standard error of the slope estimate is 0.118; all the sums of squares are the same; correlation coefficient = 0.82; R² = 0.67.
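These claims are easy to verify. The sketch below (not part of the slides) refits each data set with the least-squares formulas from the earlier slides:

```python
import math

# Refit each of the four data sets to verify they give identical summaries.
def fit(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum(a * a for a in x) - n * x_bar ** 2
    syy = sum(b * b for b in y) - n * y_bar ** 2
    sxy = sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar
    b1 = sxy / sxx                    # slope
    b0 = y_bar - b1 * x_bar           # intercept
    r = sxy / math.sqrt(sxx * syy)    # correlation coefficient
    return y_bar, b0, b1, r

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # x for data sets I-III
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

for x, y in [(x123, y1), (x123, y2), (x123, y3), (x4, y4)]:
    y_bar, b0, b1, r = fit(x, y)
    print(round(y_bar, 2), round(b0, 2), round(b1, 2), round(r, 2))
# Every data set prints: 7.5 3.0 0.5 0.82
```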

Now Look at the Data Plots (Four scatter plots, one per data set I-IV; figure not reproduced.)
