Other Regression Models Andy Wang CIS 5930-03 Computer Systems Performance Analysis.

2 Regression With Categorical Predictors
Regression methods discussed so far assume numerical variables. What if some of your variables are categorical in nature? If all are categorical, use techniques discussed later in the course. Levels: the number of values a category can take.

3 Handling Categorical Predictors
If only two levels, define bi as follows:
–bi = 0 for first value
–bi = 1 for second value
(This definition is missing from the book in section 15.2.)
Can use +1 and -1 as values instead.
Need k-1 predictor variables for k levels
–To avoid implying an order among the categories
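A minimal sketch of the k-1 indicator encoding described above (the level names and observations here are made up for illustration):

```python
import numpy as np

# Hypothetical 3-level categorical predictor: k = 3 levels need
# k-1 = 2 binary columns. The first level is the baseline and is
# represented by an all-zero row.
levels = ["ssd", "hdd", "tape"]
observations = ["hdd", "ssd", "tape", "ssd"]

def dummy_code(obs, levels):
    cols = levels[1:]                      # k-1 indicator columns
    X = np.zeros((len(obs), len(cols)))
    for i, value in enumerate(obs):
        for j, level in enumerate(cols):
            if value == level:
                X[i, j] = 1.0
    return X

X = dummy_code(observations, levels)
# "hdd" -> [1, 0], "ssd" (baseline) -> [0, 0], "tape" -> [0, 1]
```

Using k-1 columns rather than one 0/1/2 column avoids implying that "tape" is somehow twice "hdd".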

4 Categorical Variables Example Which is a better predictor of a high rating in the movie database: winning an Oscar, winning the Golden Palm at Cannes, or winning the New York Critics Circle?

5 Choosing Variables
Categories are not mutually exclusive.
x1 = 1 if won Oscar, 0 otherwise
x2 = 1 if won Golden Palm, 0 otherwise
x3 = 1 if won Critics Circle Award, 0 otherwise
y = b0 + b1x1 + b2x2 + b3x3

6 A Few Data Points

Title                   Rating  Oscar  Palm  NYC
Gentleman's Agreement   7.5     X             X
Mutiny on the Bounty    7.6     X
Marty                   7.4     X      X      X
If                      7.8            X
La Dolce Vita           8.1            X
Kagemusha               8.2            X
The Defiant Ones        7.5                   X
Reds                    6.6                   X
High Noon               8.1                   X

7 And Regression Says... How good is that? R² is 34% of the variation –Better than age and length –But still no great shakes Are the regression parameters significant at the 90% level?
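A sketch of this fit with ordinary least squares, using only the nine sample rows shown earlier (the award-column coding is inferred from that sample and is illustrative; the 34% R² quoted on the slide is for the lecture's full movie dataset, so the value computed here will differ):

```python
import numpy as np

# y = b0 + b1*x1 + b2*x2 + b3*x3, per slide 5's coding
ratings = np.array([7.5, 7.6, 7.4, 7.8, 8.1, 8.2, 7.5, 6.6, 8.1])
oscar   = np.array([1.0, 1, 1, 0, 0, 0, 0, 0, 0])
palm    = np.array([0.0, 0, 1, 1, 1, 1, 0, 0, 0])
nyc     = np.array([1.0, 0, 1, 0, 0, 0, 1, 1, 1])

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(ratings), oscar, palm, nyc])
b, *_ = np.linalg.lstsq(X, ratings, rcond=None)

# Coefficient of determination: fraction of variation explained
residuals = ratings - X @ b
r2 = 1 - np.sum(residuals**2) / np.sum((ratings - ratings.mean())**2)
```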

8 Curvilinear Regression Linear regression assumes a linear relationship between predictor and response. What if it isn't linear? You need to fit some other type of function to the relationship.

9 When To Use Curvilinear Regression Easiest to tell by sight: make a scatter plot –If the plot looks non-linear, try curvilinear regression –Or if a non-linear relationship is suspected for other reasons The relationship should be convertible to a linear form.

10 Types of Curvilinear Regression
Many possible types, based on a variety of relationships, for example:
–Exponential: y = ae^(bx)
–Logarithmic: y = a + b ln(x)
–Power: y = ax^b
–Many others

11 Transform Them to Linear Forms Apply logarithms, multiplication, division, whatever to produce something in linear form, i.e., y = a + b*something, or a similar form. If a predictor appears in more than one transformed predictor variable, correlation is likely!

12 Sample Transformations
For y = ae^(bx), take the logarithm of y:
–ln(y) = ln(a) + bx
–y' = ln(y), b0 = ln(a), b1 = b
–Do regression on y' = b0 + b1x
For y = a + b ln(x), take the logarithm of x:
–x' = ln(x)
–Do regression on y = b0 + b1x', with b0 = a, b1 = b

13 Sample Transformations
For y = ax^b, take the log of both x and y:
–ln(y) = ln(a) + b ln(x)
–y' = ln(y), x' = ln(x), b0 = ln(a), b1 = b
–Do regression on y' = b0 + b1x'
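The exponential transformation above can be sketched numerically; the data here are synthetic with known parameters a = 2, b = 0.5, so the fit should recover them:

```python
import numpy as np

# Noiseless synthetic data from y = a * e^(b*x)
a_true, b_true = 2.0, 0.5
x = np.linspace(0.1, 5.0, 20)
y = a_true * np.exp(b_true * x)

# Regression on y' = b0 + b1*x, where y' = ln(y), b0 = ln(a), b1 = b
b1, b0 = np.polyfit(x, np.log(y), 1)
a_hat, b_hat = np.exp(b0), b1
```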

14 Corrections to Jain p. 257

15 General Transformations Use some function of the response variable y in place of y itself. Curvilinear regression is one example, but the techniques are more generally applicable.

16 When To Transform? If known properties of the measured system suggest it. If the data's range covers several orders of magnitude. If the homogeneous-variance assumption on residuals (homoscedasticity) is violated.

17 Transforming Due To Heteroscedasticity If the spread of the scatter plot of residuals vs. predicted response isn't homogeneous, then the residuals are still functions of the predictor variables. A transformation of the response may solve the problem.

18 What Transformation To Use? Compute the standard deviation of the residuals at each ŷ –Assumes multiple residuals at each predicted value Plot it as a function of the mean of the observations –Assumes multiple experiments for a single set of predictor values Check for linearity: if linear, use a log transform.

19 Other Tests for Transformations If the variance against the mean of observations is linear, use a square-root transform. If the standard deviation against the mean squared is linear, use an inverse (1/y) transform. If the standard deviation against the mean to a power is linear, use a power transform. More are covered in the book.
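A sketch of this diagnostic on synthetic data (all numbers made up): the noise is multiplicative, so the per-group standard deviation grows roughly linearly with the per-group mean, pointing at a log transform per the rule of thumb above.

```python
import numpy as np

# Several replicates at each predictor setting, with multiplicative
# noise: sd is roughly 0.1 times the mean in each group.
rng = np.random.default_rng(0)
group_means = np.array([10.0, 20.0, 40.0, 80.0, 160.0])
replicates = 200

means, sds = [], []
for m in group_means:
    sample = m * (1 + 0.1 * rng.standard_normal(replicates))
    means.append(sample.mean())
    sds.append(sample.std(ddof=1))

# Regress sd on mean; a clearly positive, near-linear trend
# suggests the log transform
slope, intercept = np.polyfit(means, sds, 1)
```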

20 General Transformation Principle
For some observed relation between the standard deviation and the mean, s = g(ȳ), transform the response to
w = h(y), where h(y) = ∫ dy / g(y)
and regress on w.

21 Example: Log Transformation
If the standard deviation against the mean is linear, s = aȳ, then
h(y) = ∫ dy / (ay) = (1/a) ln(y)
So regress on w = ln(y) (the constant factor 1/a doesn't matter).

22 Confidence Intervals for Nonlinear Regressions For nonlinear fits using general (e.g., exponential) transformations: –Confidence intervals apply to transformed parameters –Not valid to perform inverse transformation on intervals –Must express confidence intervals in transformed domain

23 Outliers Atypical observations might be outliers –Measurements that are not truly characteristic –By chance, several standard deviations out –Or mistakes might have been made in measurement Which leads to a problem: do you include outliers in the analysis or not?

24 Deciding How To Handle Outliers
1. Find them (by looking at the scatter plot)
2. Check carefully for experimental error
3. Repeat experiments at the predictor values for each outlier
4. Decide whether to include or omit outliers
–Or do the analysis both ways
Question: is the first point in the last lecture's example an outlier on the rating vs. age plot?
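Step 1 can be sketched mechanically: fit a least-squares line and flag points whose residuals exceed, say, two standard deviations (the data and the 2-sd threshold here are made up for illustration; visual inspection is still the slide's recommended first step).

```python
import numpy as np

# Made-up data roughly on y = 2x + 1, with the point at index 3
# deliberately far off the line
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([3.1, 5.0, 6.9, 20.0, 11.1, 13.0, 14.9, 17.0])

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# Flag residuals larger than 2 standard deviations
threshold = 2 * residuals.std(ddof=1)
outliers = np.where(np.abs(residuals) > threshold)[0]
```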

25 Rating vs. Age

26 Common Mistakes in Regression Generally based on taking shortcuts, not being careful, or not understanding some fundamental principle of statistics.

27 Not Verifying Linearity Draw the scatter plot. If it's not linear, check for curvilinear possibilities. It is misleading to use linear regression when the relationship isn't linear.

28 Relying on Results Without Visual Verification Always check scatter plot as part of regression –Examine predicted line vs. actual points Particularly important if regression is done automatically

29 Some Nonlinear Examples

30 Attaching Importance To Values of Parameters Numerical values of regression parameters depend on the scale of the predictor variables. So the fact that a particular parameter's value seems "small" or "large" is not necessarily an indication of importance. E.g., converting seconds to microseconds doesn't change anything fundamental –But the magnitude of the associated parameter changes

31 Not Specifying Confidence Intervals Samples of observations are random Thus, regression yields parameters with random properties Without confidence interval, impossible to understand what a parameter really means

32 Not Calculating Coefficient of Determination Without R², it is difficult to determine how much of the variance is explained by the regression. Even if R² looks good, it is safest to also perform an F-test –Not that much extra effort

33 Using Coefficient of Correlation Improperly The coefficient of determination is R²; the coefficient of correlation is R. R² gives the percentage of variance explained by the regression, not R. E.g., if R is 0.5, R² is 0.25 –And the regression explains 25% of the variance –Not 50%!
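The R² = r² relationship for simple linear regression can be checked directly (the data here are made up):

```python
import numpy as np

# Correlation coefficient r between x and y
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 4.0])
r = np.corrcoef(x, y)[0, 1]

# R^2 computed from the regression itself: 1 - SSE/SST
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept
r2 = 1 - np.sum((y - y_hat)**2) / np.sum((y - y.mean())**2)

# For simple linear regression, r**2 and r2 agree: the variance
# explained is the *square* of the correlation, so r = 0.5 means
# only 25% explained
```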

34 Using Highly Correlated Predictor Variables If two predictor variables are highly correlated, using both degrades the regression. E.g., there is likely to be a correlation between an executable's on-disk and in-core sizes –So don't use both as predictors of run time This means you need to understand your predictor variables as well as possible.
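One way to see the damage is numerically: with two nearly identical predictors, the Gram matrix XᵀX of the design becomes nearly singular, so individual coefficient estimates are unstable even when predictions look fine. A toy sketch (data made up, echoing the on-disk vs. in-core size example):

```python
import numpy as np

# x2 is x1 plus tiny noise -- an extreme stand-in for two highly
# correlated measurements of the same underlying quantity
rng = np.random.default_rng(1)
x1 = np.linspace(1.0, 10.0, 30)
x2 = x1 + 1e-6 * rng.standard_normal(30)
X = np.column_stack([np.ones(30), x1, x2])

# A huge condition number of X^T X signals multicollinearity
condition_number = np.linalg.cond(X.T @ X)
```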

35 Using Regression Beyond Range of Observations Regression is based on observed behavior in a particular sample. It is most likely to predict accurately within the range of that sample –Far outside the range, who knows? E.g., regression on the run time of executables smaller than main memory may not predict performance of executables that need VM activity.

36 Using Too Many Predictor Variables Adding more predictors does not necessarily improve the model! You are more likely to run into multicollinearity problems. So which variables should you choose? –Subject of much of this course

37 Measuring Too Little of the Range Regression only predicts well near the range of observations. If you don't measure the commonly used range, the regression won't predict much. E.g., if many programs are bigger than main memory, only measuring those that are smaller is a mistake.

38 Assuming Good Predictor Is a Good Controller Correlation isn't necessarily control. Just because variable A is related to variable B, you may not be able to control the values of B by varying A. E.g., the number of hits on a Web page may be correlated with server bandwidth, but you might not boost hits by increasing bandwidth. Often, a goal of regression is finding control variables.
