Lecture 4 Page 1 CS 239, Spring 2007
Models and Linear Regression
CS 239 Experimental Methodologies for System Software
Peter Reiher
April 12, 2007

Introduction
Models

Modeling Data
It is often desirable to predict how a system would behave
– For situations you didn't test
One approach is to build a model of the behavior
– Based on situations you did test
How does one build a proper model?

Linear Models
A simple type of model
Based on the assumption that the phenomenon has linear behavior
Of the form $\hat{y} = b_0 + b_1 x$
– $x$ is the stimulus
– $\hat{y}$ is the response
– $b_0$ and $b_1$ are the modeling parameters

Building a Linear Model
Gather data for some range of $x$ values
Use mathematical methods to estimate $b_0$ and $b_1$
– Based on the data gathered
Analyze the resulting model to determine its accuracy

Building a Good Linear Model
For correlated data, the model predicts the response given an input
The model should be an equation that fits the data
The standard definition of "fits" is least squares
– Minimize squared error
– While keeping mean error zero
– Minimizes variance of errors

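Least Squared Error
If $\hat{y}_i = b_0 + b_1 x_i$, then the error in the estimate for $x_i$ is $e_i = y_i - \hat{y}_i$
Minimize the Sum of Squared Errors (SSE): $\mathrm{SSE} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$
Subject to the constraint $\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0$
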
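Estimating Model Parameters
The best regression parameters are
$b_1 = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$
where $\bar{x} = \frac{1}{n}\sum x_i$ and $\bar{y} = \frac{1}{n}\sum y_i$ are the means of the $n$ observations
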
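As a minimal sketch (Python chosen for illustration; the name `fit_line` is hypothetical), these two formulas translate directly into code that takes raw $(x, y)$ pairs. The second check confirms the least-squares side condition that the residuals sum to zero.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = sum_x2 - n * x_bar ** 2          # sum of squared deviations of x
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b1 = (sum_xy - n * x_bar * y_bar) / denom
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Points lying exactly on y = 1 + 2x give back those coefficients.
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))   # (1.0, 2.0)

# For scattered points, the residuals e_i = y_i - (b0 + b1*x_i) sum to zero.
xs, ys = [1, 2, 3], [1, 3, 2]
b0, b1 = fit_line(xs, ys)
print(sum(y - (b0 + b1 * x) for x, y in zip(xs, ys)))  # 0.0
```

The form $\sum x^2 - n\bar{x}^2$ mirrors the slide; for large data sets, summing $(x_i - \bar{x})(y_i - \bar{y})$ and $(x_i - \bar{x})^2$ directly is numerically safer.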
Parameter Estimation Example
Execution time of a script for various loop counts:
– $n = 5$ runs, $\bar{x} = 6.8$, $\bar{y} = 2.32$, $\sum xy = 88.54$, $\sum x^2 = 264$
$b_0 = 2.32 - (0.29)(6.8) = 0.35$

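Finding $b_0$ and $b_1$
$b_1 = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} = \frac{88.54 - 5(6.8)(2.32)}{264 - 5(6.8)^2} = \frac{9.66}{32.8} = 0.29$
$b_0 = \bar{y} - b_1\bar{x} = 2.32 - (0.29)(6.8) = 0.348$
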
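The same arithmetic can be checked in a few lines of Python, starting from the summary statistics quoted for the loop-count example (the raw measurements themselves are not reproduced here). Carrying $b_1$ at full precision gives $b_0 \approx 0.32$; the value of about 0.35 on the slides comes from rounding $b_1$ to 0.29 before computing $b_0$.

```python
# Summary statistics for the loop-count example (n = 5 runs).
n = 5
x_bar, y_bar = 6.8, 2.32
sum_xy, sum_x2 = 88.54, 264.0

# Least-squares slope from the summary sums.
b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)

# Intercept with b1 at full precision, and with b1 first rounded to 0.29.
b0_full = y_bar - b1 * x_bar
b0_rounded = y_bar - round(b1, 2) * x_bar

print(f"b1 = {b1:.4f}")                      # b1 = 0.2945
print(f"b0 = {b0_full:.3f} (full b1)")       # b0 = 0.317 (full b1)
print(f"b0 = {b0_rounded:.3f} (b1 = 0.29)")  # b0 = 0.348 (b1 = 0.29)
```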
Graph of Parameter Estimation Example
[Figure: measured execution time vs. loop count, with the fitted line $\hat{y} = 0.35 + 0.29x$]

Allocating Variation in Regression
With no regression, the best guess of $y$ is $\bar{y}$
Observed values of $y$ differ from $\bar{y}$, giving rise to errors (variance)
Regression gives a better guess, but there are still errors
We can evaluate the quality of the regression by allocating the sources of errors

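The Total Sum of Squares (SST)
Without regression, the squared error is
$\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum y^2 - n\bar{y}^2 = \mathrm{SSY} - \mathrm{SS0}$
where $\mathrm{SSY} = \sum y^2$ and $\mathrm{SS0} = n\bar{y}^2$
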
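The Sum of Squares from Regression
Recall that the regression error is $\mathrm{SSE} = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2$
The error without regression is SST
So the regression explains $\mathrm{SSR} = \mathrm{SST} - \mathrm{SSE}$
Regression quality is measured by the coefficient of determination
$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = \frac{\mathrm{SST} - \mathrm{SSE}}{\mathrm{SST}}$
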
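Evaluating Coefficient of Determination ($R^2$)
Compute
– $\mathrm{SSY} = \sum y^2$
– $\mathrm{SS0} = n\bar{y}^2$
– $\mathrm{SST} = \mathrm{SSY} - \mathrm{SS0}$
– $\mathrm{SSE} = \sum y^2 - b_0 \sum y - b_1 \sum xy$
– $\mathrm{SSR} = \mathrm{SST} - \mathrm{SSE}$
– $R^2 = \mathrm{SSR} / \mathrm{SST}$
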
Example of Coefficient of Determination
For the previous regression example:
– $\sum y = 11.60$, $\sum y^2 = 29.79$, $\sum xy = 88.54$
– $b_0 = 0.35$, $b_1 = 0.29$

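Continuing the Example
$\mathrm{SSE} = \sum y^2 - b_0 \sum y - b_1 \sum xy = 29.79 - (0.35)(11.60) - (0.29)(88.54) = 0.05$
$\mathrm{SST} = \sum y^2 - n\bar{y}^2 = 29.79 - 5(2.32)^2 = 2.88$
$\mathrm{SSR} = \mathrm{SST} - \mathrm{SSE} = 2.88 - 0.05 = 2.83$
$R^2 = \mathrm{SSR} / \mathrm{SST} = 2.83 / 2.88 = 0.98$
So the regression explains most of the variation
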
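A short Python sketch of these computational formulas, using the example's sums and the rounded parameters $b_0 = 0.35$, $b_1 = 0.29$. It prints three decimals; the slide rounds each quantity to two.

```python
n = 5
sum_y, sum_y2, sum_xy = 11.60, 29.79, 88.54
b0, b1 = 0.35, 0.29            # rounded parameters from the example

y_bar = sum_y / n
ssy = sum_y2                   # SSY: sum of y^2
ss0 = n * y_bar ** 2           # SS0: n * ybar^2
sst = ssy - ss0                # total variation about the mean
sse = sum_y2 - b0 * sum_y - b1 * sum_xy   # variation left unexplained
ssr = sst - sse                # variation explained by the regression
r2 = ssr / sst                 # coefficient of determination

print(f"SSE={sse:.3f} SST={sst:.3f} SSR={ssr:.3f} R^2={r2:.3f}")
# SSE=0.053 SST=2.878 SSR=2.825 R^2=0.981
```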
Standard Deviation of Errors
The variance of errors is SSE divided by its degrees of freedom
– DOF is $n - 2$ because we've calculated 2 regression parameters from the data
– So the variance (mean squared error, MSE) is $\mathrm{MSE} = \mathrm{SSE}/(n - 2)$

Stdev of Errors, Cont'd
The standard deviation of errors is the square root of the mean squared error:
$s_e = \sqrt{\mathrm{MSE}} = \sqrt{\frac{\mathrm{SSE}}{n - 2}}$

Checking Degrees of Freedom
Degrees of freedom always add up:
– SS0 has 1 (computed from $\bar{y}$)
– SST has $n - 1$ (computed from the data and $\bar{y}$, which uses up 1)
– SSE has $n - 2$ (needs 2 regression parameters)
– So, since $\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}$, SSR has $(n - 1) - (n - 2) = 1$

Example of Standard Deviation of Errors
For our regression example, SSE was 0.05
– MSE is $0.05/3 = 0.017$ and $s_e = \sqrt{0.017} = 0.13$
Note the high quality of our regression:
– $R^2 = 0.98$
– $s_e = 0.13$
– Why such a nice straight-line fit?

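The same two steps in Python; `residual_std_error` is a hypothetical helper name, and `n_params` is the number of fitted parameters (2 for a straight line).

```python
import math


def residual_std_error(sse, n, n_params=2):
    """Standard deviation of errors: sqrt(SSE / degrees of freedom)."""
    dof = n - n_params            # n - 2 for simple linear regression
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    mse = sse / dof               # mean squared error
    return math.sqrt(mse)


s_e = residual_std_error(sse=0.05, n=5)
print(f"s_e = {s_e:.2f}")   # s_e = 0.13
```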
How Sure Are We of Parameters?
Regression is done from a single population sample (size $n$)
– A different sample might give different results
– The true model is $y = \beta_0 + \beta_1 x$
– Parameters $b_0$ and $b_1$ are really estimates of $\beta_0$ and $\beta_1$, computed from one population sample

Confidence Intervals of Regression Parameters
Since $b_0$ and $b_1$ are only sample estimates, how confident are we that they are correct?
We express this with confidence intervals for the regression parameters
– Statistical expressions of likely bounds for the true parameters $\beta_0$ and $\beta_1$

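Calculating Intervals for Regression Parameters
Standard deviations of the parameters:
$s_{b_0} = s_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum x^2 - n\bar{x}^2}}$
$s_{b_1} = \frac{s_e}{\sqrt{\sum x^2 - n\bar{x}^2}}$
$100(1 - \alpha)\%$ confidence intervals are $b_0 \mp t_{[1-\alpha/2;\,n-2]}\, s_{b_0}$ and $b_1 \mp t_{[1-\alpha/2;\,n-2]}\, s_{b_1}$
where $t$ has $n - 2$ degrees of freedom
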
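Example of Regression Confidence Intervals
Recall $s_e = 0.13$, $n = 5$, $\sum x^2 = 264$, $\bar{x} = 6.8$
So
$s_{b_0} = 0.13\sqrt{\frac{1}{5} + \frac{(6.8)^2}{264 - 5(6.8)^2}} = 0.16$
$s_{b_1} = \frac{0.13}{\sqrt{264 - 5(6.8)^2}} = 0.023$
Using a 90% confidence level, $t_{[0.95;\,3]} = 2.353$
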
Regression Confidence Example, Cont'd
Thus, the $b_0$ interval is
$0.35 \mp (2.353)(0.16) = (-0.03,\ 0.73)$
And the $b_1$ interval is
$0.29 \mp (2.353)(0.023) = (0.24,\ 0.34)$

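A Python sketch of these intervals (the helper name is hypothetical). The $t$ quantile is passed in, using the table value 2.353, rather than computed, to keep the example free of dependencies. At full precision $s_{b_0} \approx 0.165$, so the $b_0$ interval comes out near (−0.04, 0.74) rather than (−0.03, 0.73); either way it contains zero.

```python
import math


def parameter_intervals(n, x_bar, sum_x2, b0, b1, s_e, t):
    """Confidence intervals for b0 and b1, given the quantile t_[1-alpha/2; n-2]."""
    sxx = sum_x2 - n * x_bar ** 2                      # sum of (x - xbar)^2
    s_b0 = s_e * math.sqrt(1 / n + x_bar ** 2 / sxx)   # std. dev. of b0
    s_b1 = s_e / math.sqrt(sxx)                        # std. dev. of b1
    return (b0 - t * s_b0, b0 + t * s_b0), (b1 - t * s_b1, b1 + t * s_b1)


ci_b0, ci_b1 = parameter_intervals(n=5, x_bar=6.8, sum_x2=264.0,
                                   b0=0.35, b1=0.29, s_e=0.13, t=2.353)
print("b0:", tuple(round(v, 2) for v in ci_b0))   # b0: (-0.04, 0.74)
print("b1:", tuple(round(v, 2) for v in ci_b1))   # b1: (0.24, 0.34)
```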
Are Regression Parameters Significant?
Usually the question is "are they significantly different from zero?"
If not, we get a simpler model by dropping that term
Answered in the usual way:
– Does their confidence interval include zero?
– If so, not significantly different from zero, at that level of confidence

Are Example Parameters Significant?
$b_0$ interval is (−0.03, 0.73)
– Not significantly different from zero at 90% confidence
$b_1$ interval is (0.24, 0.34)
– Significantly different from zero at 90% confidence
– Even significantly different at 99% confidence
Maybe OK not to include the $b_0$ term in the model

Confidence Intervals for Predictions
The previous confidence intervals are for parameters
– How certain can we be that the parameters are correct?
– They say the parameters are likely to be within a certain range

But What About Predictions?
The purpose of regression is prediction
– To predict system behavior for values we didn't test
– How accurate are such predictions?
– Regression gives the mean of the predicted response, based on the sample we took
How likely is the true mean of the predicted response to be near that value?

An Example
How long will eight loop iterations take?
$\hat{y} = 0.35 + 0.29(8) = 2.67$
What is the 90% confidence interval for that prediction?

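Predicting $m$ Samples
The standard deviation for the mean of a future sample of $m$ observations at $x_p$ is
$s_{\hat{y}_{mp}} = s_e \sqrt{\frac{1}{m} + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum x^2 - n\bar{x}^2}}$
Note that the deviation drops as $m$ grows
Variance is minimal at $x_p = \bar{x}$
Use $t$-quantiles with $n - 2$ DOF for the interval
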
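Example of Confidence of Predictions
Predicted time for a single run of 8 loops?
Time $= 0.35 + 0.29(8) = 2.67$
Standard deviation of errors $s_e = 0.13$
$s_{\hat{y}_{1p}} = 0.13\sqrt{1 + \frac{1}{5} + \frac{(8 - 6.8)^2}{264 - 5(6.8)^2}} = 0.145$
The 90% interval is then
$2.67 \mp (2.353)(0.145) = (2.33,\ 3.01)$
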
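A minimal Python sketch of this interval; the helper name and its argument list are illustrative. Passing a larger `m` shows the narrowing that comes from the $1/m$ term.

```python
import math


def prediction_interval(x_p, m, n, x_bar, sum_x2, b0, b1, s_e, t):
    """Interval for the mean of m future observations at x_p."""
    y_hat = b0 + b1 * x_p
    sxx = sum_x2 - n * x_bar ** 2
    s_pred = s_e * math.sqrt(1 / m + 1 / n + (x_p - x_bar) ** 2 / sxx)
    return y_hat, s_pred, (y_hat - t * s_pred, y_hat + t * s_pred)


example = dict(n=5, x_bar=6.8, sum_x2=264.0, b0=0.35, b1=0.29, s_e=0.13, t=2.353)

y_hat, s_pred, (lo, hi) = prediction_interval(x_p=8, m=1, **example)
print(f"{y_hat:.2f} {s_pred:.3f} ({lo:.2f}, {hi:.2f})")  # 2.67 0.145 (2.33, 3.01)

# Predicting the mean of 10 future runs gives a narrower interval.
_, s_pred10, _ = prediction_interval(x_p=8, m=10, **example)
print(s_pred10 < s_pred)   # True
```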
A Few Observations
If you predicted the mean of more future runs (larger $m$), you'd get a narrower confidence interval
– Due to the $1/m$ term
The narrowest confidence intervals are closest to the center of the measured range
– They widen as you get further out
– Particularly beyond the range of what was actually measured

Verifying Assumptions Visually
Regressions are based on assumptions:
– Linear relationship between response $y$ and predictor $x$
– Or the nonlinear relationship used to fit
– Predictor $x$ nonstochastic and error-free
– Model errors statistically independent, with distribution $N(0, c)$ for constant $c$
If these assumptions are violated, the model is misleading or invalid

How To Test For Validity?
Statistical tests are possible
But visual tests are often helpful
– And usually easier
Basically, plot the data and look for obviously bogus assumptions

Testing Linearity
Scatter plot $x$ vs. $y$ to see the basic curve type
[Figure panels: Linear; Piecewise Linear; Outlier; Nonlinear (Power)]

Testing Independence of Errors
Scatter-plot $\varepsilon_i$ (errors) versus $\hat{y}_i$ (predicted response)
There should be no visible trend
Example from our curve fit:
[Figure: residuals vs. predicted response for the loop-count regression]

More Examples
[Figure, two panels: "No obvious trend in errors" and "Errors appear to increase linearly with x value"]
The second pattern suggests errors are not independent for that data set

More on Testing Independence
It may be useful to plot error residuals versus experiment number
– In the previous example, this gives the same plot except for $x$ scaling
No foolproof tests
And not all assumptions are easily testable

Testing for Normal Errors
Prepare a quantile-quantile plot
Example for our regression:
[Figure: normal quantile-quantile plot of the residuals]
Since the plot is approximately linear, the normality assumption looks OK

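A minimal Python sketch of how the points of a normal quantile-quantile plot can be computed with the standard library's `statistics.NormalDist`. The plotting positions $(i - 0.5)/n$ are one common convention and an assumption here, and the residuals in the usage lines are made-up illustrative values. Plotting the pairs with any plotting tool and checking for a roughly straight line is the visual test this slide describes.

```python
from statistics import NormalDist


def normal_qq_points(residuals):
    """(theoretical normal quantile, observed residual) pairs for a Q-Q plot."""
    n = len(residuals)
    observed = sorted(residuals)
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # Plotting position (i - 0.5)/n for the i-th smallest residual.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


# Hypothetical residuals, for illustration only.
for q, r in normal_qq_points([0.10, -0.20, 0.05, -0.05, 0.10]):
    print(f"{q:+.2f}  {r:+.2f}")
```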
Testing for Constant Standard Deviation of Errors
The property of constant standard deviation of errors is called homoscedasticity
– Try saying that three times fast
Look at the previous error independence plot
Look for a trend in the spread

Testing in Our Example
No obvious trend in spread
But we don't have many points

Another Example
Clear inverse trend of error magnitude vs. response
Doesn't display a constant standard deviation of errors
– In left part, stdev ~ 77
– In right part, stdev ~ 33
No homoscedasticity, so linear regression is not valid

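A rough numeric companion to this visual check, sketched in Python: order the residuals by predicted response, split them in half, and compare the two standard deviations, much as the slide compares the left and right parts of the plot. The function name and the residual values are hypothetical, and this is not a formal statistical test.

```python
from statistics import stdev


def spread_by_half(predicted, residuals):
    """Std. dev. of residuals for the lower and upper halves of predicted response."""
    if len(predicted) != len(residuals) or len(predicted) < 4:
        raise ValueError("need at least four (predicted, residual) pairs")
    pairs = sorted(zip(predicted, residuals))   # order by predicted response
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return stdev(low), stdev(high)


# Hypothetical residuals whose spread shrinks as the predicted response grows.
predicted = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [5.0, -5.0, 4.0, -4.0, 1.0, -1.0, 0.5, -0.5]
low_sd, high_sd = spread_by_half(predicted, residuals)
print(f"{low_sd:.2f} vs {high_sd:.2f}")   # 5.23 vs 0.91
```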
So What Do You Do With Non-Homoscedastic Data?
If the spread of the scatter plot of residuals vs. predicted response is not homogeneous,
– Then the residuals are still functions of the predictor variables
A transformation of the response may solve the problem
Transformations are discussed in detail in the book

Is Linear Regression Right For Your Data?
Only if the general trend of the data is linear
What if it isn't?
Can try fitting other types of curves instead
Or can do a transformation to make it closer to linear

Transformations and Linear Regression
For $y = ae^{bx}$, take the logarithm of $y$, do regression on $\ln y = b_0 + b_1 x$, let $b = b_1$, $a = e^{b_0}$
For $y = a + b\ln x$, take the log of $x$ before fitting parameters, let $b = b_1$, $a = b_0$
For $y = ax^b$, take the log of both $x$ and $y$, let $b = b_1$, $a = e^{b_0}$

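A Python sketch of the first and third transformations, using natural logarithms (an assumption; any base works if $a$ is back-transformed with the same base). The helper names are hypothetical, and the test data are generated from known curves so the recovered parameters can be checked.

```python
import math


def fit_line(xs, ys):
    """Least-squares (b0, b1) for y = b0 + b1*x, using centered sums."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def fit_exponential(xs, ys):
    """Fit y = a * e^(b*x): regress ln(y) on x, then a = e^b0, b = b1 (needs y > 0)."""
    b0, b1 = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(b0), b1


def fit_power(xs, ys):
    """Fit y = a * x^b: regress ln(y) on ln(x), then a = e^b0, b = b1 (needs x, y > 0)."""
    b0, b1 = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(b0), b1


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
a, b = fit_exponential(xs, [2.0 * math.exp(0.5 * x) for x in xs])
print(round(a, 6), round(b, 6))   # 2.0 0.5
a, b = fit_power(xs, [3.0 * x ** 2 for x in xs])
print(round(a, 6), round(b, 6))   # 3.0 2.0
```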
Confidence Intervals for Nonlinear Regressions
For nonlinear fits using exponential transformations:
– Confidence intervals apply to the transformed parameters
– It is not valid to perform the inverse transformation on the intervals

Linear Regression Can Be Misleading
Regression throws away some information about the data
– To allow more compact summarization
Sometimes vital characteristics are thrown away
– Often, looking at data plots can tell you whether you will have a problem

Example of Misleading Regression
[Table: four data sets, I, II, III and IV, each listing x and y values]

What Does Regression Tell Us About These Data Sets?
Exactly the same thing for each!
– $N = 11$
– Mean of $y$ = 7.5
– $\hat{y} = 3 + 0.5x$
– The standard error of the regression is the same
– All the sums of squares are the same
– Correlation coefficient = 0.82
– $R^2 = 0.67$

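These four data sets appear to be Anscombe's quartet (Anscombe, 1973). Assuming so, the Python sketch below uses that published data to reproduce the identical fits listed on this slide; only the regression summary is computed, since the plots are what tell the sets apart.

```python
ANSCOMBE_X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # shared by sets I-III
DATA = {
    "I":   (ANSCOMBE_X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (ANSCOMBE_X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (ANSCOMBE_X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return (b0, b1, R^2) for a least-squares line through (xs, ys)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sst = sum((y - y_bar) ** 2 for y in ys)    # total sum of squares
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    r2 = b1 * sxy / sst                        # SSR / SST
    return b0, b1, r2


results = {name: summarize(xs, ys) for name, (xs, ys) in DATA.items()}
for name, (b0, b1, r2) in results.items():
    # Each line prints y = 3.00 + 0.50x, R^2 = 0.67
    print(f"{name:>3}: y = {b0:.2f} + {b1:.2f}x, R^2 = {r2:.2f}")
```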
Now Look at the Data Plots
[Figure: scatter plots of data sets I, II, III and IV]

Other Regression Issues
Multiple linear regression
Categorical predictors
Transformations
Handling outliers
Common mistakes in regression analysis

Multiple Linear Regression
Models with more than one predictor variable
But each predictor variable has a linear relationship to the response variable
Conceptually, plotting a regression line in n-dimensional space, instead of 2-dimensional

Regression With Categorical Predictors
The regression methods discussed so far assume numerical variables
What if some of your variables are categorical in nature?
Use techniques discussed later in the class if all predictors are categorical
Levels: the number of values a category can take

Handling Categorical Predictors
If there are only two levels, define the predictor $x_i$ as follows:
– $x_i = 0$ for the first value
– $x_i = 1$ for the second value
Can use +1 and −1 as values instead
Need $k - 1$ predictor variables for $k$ levels
– To avoid implying an order among the categories

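A minimal Python sketch of the $k - 1$ indicator (0/1) coding, taking the first listed level as the baseline (all indicators zero). The function name and the level labels are hypothetical.

```python
def indicator_columns(values, levels):
    """Encode a categorical predictor with k levels as k-1 columns of 0/1 indicators.

    The first entry of `levels` is the baseline and maps to all zeros.
    """
    if len(set(levels)) != len(levels):
        raise ValueError("levels must be distinct")
    rows = []
    for v in values:
        if v not in levels:
            raise ValueError(f"unknown level: {v!r}")
        # One column per non-baseline level; 1 marks the matching level.
        rows.append([1 if v == level else 0 for level in levels[1:]])
    return rows


print(indicator_columns(["A", "B", "C", "A"], levels=["A", "B", "C"]))
# [[0, 0], [1, 0], [0, 1], [0, 0]]
```

Each non-baseline category gets its own coefficient, so no ordering among A, B and C is implied, as it would be if they were coded 0, 1, 2 in a single predictor.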
Outliers
Atypical observations might be outliers
– Measurements that are not truly characteristic
– By chance, several standard deviations out
– Or mistakes might have been made in measurement
Which leads to a problem:
Do you include outliers in the analysis or not?

Handling Outliers
1. Find them (by looking at the scatter plot)
2. Check carefully for experimental error
3. Repeat experiments at the predictor values for the outlier
4. Decide whether or not to include the outliers
– Or do the analysis both ways

Common Mistakes in Regression
Generally based on taking shortcuts
Or not being careful
Or not understanding some fundamental principles of statistics

Not Verifying Linearity
Draw the scatter plot
If it isn't linear, check for curvilinear possibilities
Using linear regression when the relationship isn't linear is misleading

Relying on Results Without Visual Verification
Always check the scatter plot as part of regression
– Examining the line the regression predicts vs. the actual points
Particularly important if regression is done automatically

Attaching Importance To Values of Parameters
The numerical values of regression parameters depend on the scale of the predictor variables
So a parameter whose value seems "small" or "large" is not necessarily unimportant or important
E.g., converting seconds to microseconds doesn't change anything fundamental
– But the magnitude of the associated parameter changes

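A small Python illustration with hypothetical numbers: refitting the same data after expressing the predictor in microseconds instead of seconds divides the slope by $10^6$, even though the fit itself is unchanged.

```python
def slope(xs, ys):
    """Least-squares slope b1 of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


seconds = [1.0, 2.0, 3.0, 4.0]          # hypothetical predictor values
response = [3.1, 4.9, 7.2, 8.8]         # hypothetical responses
microseconds = [s * 1e6 for s in seconds]

b1_s = slope(seconds, response)
b1_us = slope(microseconds, response)
# Same data, same fit; only the parameter's magnitude changes.
print(f"{b1_s:.2f} per second, {b1_us:.2e} per microsecond")
# 1.94 per second, 1.94e-06 per microsecond
```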
Not Specifying Confidence Intervals
Samples of observations are random
Thus, regression performed on them yields parameters with random properties
Without a confidence interval, it's impossible to understand what a parameter really means

Not Calculating Coefficient of Determination
Without $R^2$, it is difficult to determine how much of the variance is explained by the regression
Even if $R^2$ looks good, it is safest to also perform an F-test
The extra amount of effort isn't that large, anyway

Using Coefficient of Correlation Improperly
The coefficient of determination is $R^2$
The coefficient of correlation is $R$
$R^2$, not $R$, gives the fraction of variance explained by the regression
E.g., if $R$ is 0.5, $R^2$ is 0.25
– And the regression explains 25% of the variance
– Not 50%

Using Highly Correlated Predictor Variables
If two predictor variables are highly correlated, using both degrades the regression
E.g., there is likely to be a correlation between an executable's on-disk size and in-core size
– So don't use both as predictors of run time
Which means you need to understand your predictor variables as well as possible

Using Regression Beyond Range of Observations
Regression is based on observed behavior in a particular sample
It is most likely to predict accurately within the range of that sample
– Far outside the range, who knows?
E.g., a run-time regression on executables that are smaller than main memory may not predict the performance of executables that require much VM activity

Using Too Many Predictor Variables
Adding more predictors does not necessarily improve the model
More likely to run into multicollinearity problems
– Discussed in the book
– Interrelationship degrades the quality of the regression
– Since one assumption is predictor independence
So which variables should you choose?
– Subject of much of this course

Measuring Too Little of the Range
Regression only predicts well near the range of observations
If you don't measure the commonly used range, the regression won't predict much
E.g., if many programs are bigger than main memory, only measuring those that are smaller is a mistake

Assuming Good Predictor Is a Good Controller
Correlation isn't necessarily control
Just because variable A is related to variable B, you may not be able to control the values of B by varying A
E.g., if the number of hits on a Web page and server bandwidth are correlated, you might not increase hits by increasing bandwidth
Often, a goal of regression is finding control variables