Linear Regression Hypothesis testing and Estimation.


Assume that we have collected data on two variables X and Y. Let $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$ denote the pairs of measurements on the two variables X and Y for n cases in a sample (or population).

The Statistical Model

Each $y_i$ is assumed to be randomly generated from a normal distribution with mean $\mu_i = \alpha + \beta x_i$ and standard deviation $\sigma$. ($\alpha$, $\beta$ and $\sigma$ are unknown.)
[Figure: the line $Y = \alpha + \beta X$ with slope $\beta$, showing an observation $y_i$ scattered about its mean $\alpha + \beta x_i$ at $X = x_i$.]

The Data and the Linear Regression Model: the data fall roughly about a straight line, $Y = \alpha + \beta X$, which is unseen.

The Least Squares Line: fitting the best straight line to “linear” data

Let $Y = a + bX$ denote an arbitrary equation of a straight line, where a and b are known values. This equation can be used to predict, for each value of X, the value of Y. For example, if $X = x_i$ (as for the i-th case) then the predicted value of Y is: $\hat{y}_i = a + b x_i$

The residual $r_i = y_i - \hat{y}_i = y_i - (a + b x_i)$ can be computed for each case in the sample. The residual sum of squares, $RSS = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$, is a measure of the “goodness of fit” of the line $Y = a + bX$ to the data.

The optimal choice of a and b will result in the residual sum of squares attaining a minimum. If this is the case then the line $Y = a + bX$ is called the Least Squares Line.

The equation for the least squares line. Let $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$, $S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$ and $S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$


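Computing Formulae: $S_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$, $S_{yy} = \sum y_i^2 - \frac{(\sum y_i)^2}{n}$, $S_{xy} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}$

Then the slope of the least squares line can be shown to be: $b = \frac{S_{xy}}{S_{xx}}$

and the intercept of the least squares line can be shown to be: $a = \bar{y} - b\bar{x}$

The residual sum of squares: $RSS = \sum_{i=1}^{n} (y_i - a - b x_i)^2$. Computing formula: $RSS = S_{yy} - \frac{S_{xy}^2}{S_{xx}}$

Estimating $\sigma$, the standard deviation in the regression model: $s = \sqrt{\frac{RSS}{n-2}}$. This estimate of $\sigma$ is said to be based on n - 2 degrees of freedom. Computing formula: $s = \sqrt{\frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right)}$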
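The computing formulae above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `least_squares_fit` and the toy data are assumptions made for the example, not part of the original slides.

```python
import math

def least_squares_fit(x, y):
    """Return (a, b, s) for the least squares line y = a + b*x.

    a, b are the intercept and slope; s estimates sigma on n - 2 d.f.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    # Computing formulae for S_xx, S_yy, S_xy
    s_xx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
    s_yy = sum(yi * yi for yi in y) - sum(y) ** 2 / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    b = s_xy / s_xx                      # slope
    a = sum(y) / n - b * sum(x) / n      # intercept: y_bar - b * x_bar
    # RSS computed directly from the residuals (always non-negative)
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))
    return a, b, s

# Hypothetical toy data, roughly on the line y = 2x
x_toy = [1.0, 2.0, 3.0, 4.0, 5.0]
y_toy = [2.1, 3.9, 6.2, 7.8, 10.1]
a_toy, b_toy, s_toy = least_squares_fit(x_toy, y_toy)
print(f"a = {a_toy:.3f}, b = {b_toy:.3f}, s = {s_toy:.3f}")
```

The residual sum of squares is computed from the residuals rather than from $S_{yy} - S_{xy}^2/S_{xx}$ because the latter can come out slightly negative in floating point when the fit is nearly perfect.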

Sampling distributions of the estimators

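The sampling distribution of the slope of the least squares line: It can be shown that b has a normal distribution with mean $\beta$ and standard deviation $\frac{\sigma}{\sqrt{S_{xx}}}$

Thus $z = \frac{b - \beta}{\sigma/\sqrt{S_{xx}}}$ has a standard normal distribution, and $t = \frac{b - \beta}{s/\sqrt{S_{xx}}}$ has a t distribution with df = n - 2

$(1 - \alpha)100\%$ Confidence Limits for slope $\beta$: $b \pm t_{\alpha/2}\frac{s}{\sqrt{S_{xx}}}$, where $t_{\alpha/2}$ is the critical value for the t-distribution with n - 2 degrees of freedom. (In $1 - \alpha$ and $t_{\alpha/2}$ the symbol $\alpha$ is the significance level, not the intercept.)

Testing the slope: $H_0: \beta = \beta_0$ versus $H_A: \beta \ne \beta_0$. The test statistic is: $t = \frac{b - \beta_0}{s/\sqrt{S_{xx}}}$, which has a t distribution with df = n - 2 if $H_0$ is true.

The Critical Region: Reject $H_0$ if $|t| > t_{\alpha/2}$, df = n - 2. This is a two-tailed test. One-tailed tests are also possible.

The sampling distribution of the intercept of the least squares line: It can be shown that a has a normal distribution with mean $\alpha$ and standard deviation $\sigma\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}$

Thus $z = \frac{a - \alpha}{\sigma\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}}$ has a standard normal distribution and $t = \frac{a - \alpha}{s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}}$ has a t distribution with df = n - 2

$(1 - \alpha)100\%$ Confidence Limits for intercept $\alpha$: $a \pm t_{\alpha/2}\, s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}$, where $t_{\alpha/2}$ is the critical value for the t-distribution with n - 2 degrees of freedom

Testing the intercept: $H_0: \alpha = \alpha_0$ versus $H_A: \alpha \ne \alpha_0$. The test statistic is: $t = \frac{a - \alpha_0}{s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}}$, which has a t distribution with df = n - 2 if $H_0$ is true.

The Critical Region: Reject $H_0$ if $|t| > t_{\alpha/2}$, df = n - 2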

Example

The following data show the per capita consumption of cigarettes per month (X) in various countries in 1930, and the death rates from lung cancer for men.

TABLE: Per capita consumption of cigarettes per month ($X_i$) in n = 11 countries in 1930, and the death rates, $Y_i$ (per 100,000), from lung cancer for men

Country (i)      X_i    Y_i
Australia         48     18
Canada            50     15
Denmark           38     17
Finland          110     35
Great Britain    110     46
Holland           49     24
Iceland           23      6
Norway            25      9
Sweden            30     11
Switzerland       51     25
USA              130     20

Fitting the Least Squares Line

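First compute the following three quantities (from the table, $\bar{x} = 60.36$ and $\bar{y} = 20.55$): $S_{xx} = 14322.55$, $S_{yy} = 1374.73$, $S_{xy} = 3271.82$

Computing estimates of the slope ($\beta$), intercept ($\alpha$) and standard deviation ($\sigma$): $b = \frac{S_{xy}}{S_{xx}} = 0.228$, $a = \bar{y} - b\bar{x} = 6.756$, $s = \sqrt{\frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right)} = 8.35$

95% Confidence Limits for slope $\beta$: $b \pm t_{.025}\frac{s}{\sqrt{S_{xx}}}$, i.e. 0.0706 to 0.3862, where $t_{.025} = 2.262$ is the critical value for the t-distribution with 9 degrees of freedom

95% Confidence Limits for intercept $\alpha$: $a \pm t_{.025}\, s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}$, i.e. -4.34 to 17.85, where $t_{.025} = 2.262$ is the critical value for the t-distribution with 9 degrees of freedom

Fitted line: Y = 6.756 + (0.228)X. 95% confidence limits for slope: 0.0706 to 0.3862. 95% confidence limits for intercept: -4.34 to 17.85

Testing for a positive slope: $H_0: \beta = 0$ versus $H_A: \beta > 0$. The test statistic is: $t = \frac{b}{s/\sqrt{S_{xx}}} = 3.27$

The Critical Region: Reject $H_0$ if $t > t_{\alpha}$, df = 11 - 2 = 9. This is a one-tailed test; at the 5% level $t_{.05} = 1.833$.

Since t = 3.27 exceeds the critical value, we reject $H_0$ and conclude that the slope is positive.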
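The worked example can be checked numerically. The sketch below recomputes the estimates, confidence limits and slope test from the table of 11 countries; the variable names are illustrative, and the critical value 2.262 is taken from a t table (9 degrees of freedom) rather than computed.

```python
import math

# Data from the table: cigarettes per capita per month (x), death rate (y)
x = [48, 50, 38, 110, 110, 49, 23, 25, 30, 51, 130]
y = [18, 15, 17, 35, 46, 24, 6, 9, 11, 25, 20]
n = len(x)
T_025_DF9 = 2.262  # t table value, upper 2.5% point, 9 d.f.

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
s_yy = sum(yi * yi for yi in y) - sum(y) ** 2 / n
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

b = s_xy / s_xx                      # slope estimate
a = y_bar - b * x_bar                # intercept estimate
s = math.sqrt((s_yy - s_xy ** 2 / s_xx) / (n - 2))

se_b = s / math.sqrt(s_xx)                       # standard error of b
se_a = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)  # standard error of a

slope_limits = (b - T_025_DF9 * se_b, b + T_025_DF9 * se_b)
intercept_limits = (a - T_025_DF9 * se_a, a + T_025_DF9 * se_a)
t_slope = b / se_b                   # test statistic for H0: beta = 0

print(f"b = {b:.3f}, a = {a:.3f}, s = {s:.2f}")
print(f"slope limits: {slope_limits[0]:.4f} to {slope_limits[1]:.4f}")
print(f"intercept limits: {intercept_limits[0]:.2f} to {intercept_limits[1]:.2f}")
print(f"t for slope = {t_slope:.2f}")
```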

Confidence Limits for Points on the Regression Line. The intercept $\alpha$ is a specific point on the regression line. It is the y-coordinate of the point on the regression line when x = 0, i.e. the predicted value of y when x = 0. We may also be interested in other points on the regression line, e.g. when $x = x_0$. In this case the y-coordinate of the point on the regression line when $x = x_0$ is $\alpha + \beta x_0$

[Figure: the line $y = \alpha + \beta x$, with the point $\alpha + \beta x_0$ marked above $x_0$.]

$(1 - \alpha)100\%$ Confidence Limits for $\alpha + \beta x_0$: $a + b x_0 \pm t_{\alpha/2}\, s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$, where $t_{\alpha/2}$ is the $\alpha/2$ critical value for the t-distribution with n - 2 degrees of freedom

Prediction Limits for new values of the dependent variable y. An important application of the regression line is prediction: knowing the value of x ($x_0$), what is the value of y? The predicted value of y when $x = x_0$ is: $\alpha + \beta x_0$. This in turn can be estimated by: $\hat{y} = a + b x_0$

The predictor $\hat{y} = a + b x_0$ gives only a single value for y. A more appropriate piece of information would be a range of values that has a fixed probability of capturing the value for y: a $(1 - \alpha)100\%$ prediction interval for y.

$(1 - \alpha)100\%$ Prediction Limits for y when $x = x_0$: $a + b x_0 \pm t_{\alpha/2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$, where $t_{\alpha/2}$ is the $\alpha/2$ critical value for the t-distribution with n - 2 degrees of freedom
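To make the difference between the two kinds of limits concrete, here is a small sketch that computes both half-widths at a chosen $x_0$. The helper name `limits_at` and the choice $x_0 = 60$ are assumptions for illustration; the data are the cigarette figures from the earlier example, and 2.262 is the tabulated $t_{.025}$ for 9 degrees of freedom.

```python
import math

def limits_at(x, y, x0, t_crit):
    """Return (y_hat, conf_half_width, pred_half_width) at x = x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))
    # Term shared by both formulas: 1/n + (x0 - x_bar)^2 / S_xx
    shared = 1 / n + (x0 - x_bar) ** 2 / s_xx
    y_hat = a + b * x0
    conf_half = t_crit * s * math.sqrt(shared)        # for alpha + beta*x0
    pred_half = t_crit * s * math.sqrt(1 + shared)    # for a new y at x0
    return y_hat, conf_half, pred_half

cigarettes = [48, 50, 38, 110, 110, 49, 23, 25, 30, 51, 130]
deaths = [18, 15, 17, 35, 46, 24, 6, 9, 11, 25, 20]
T_025_DF9 = 2.262   # t table value, 9 d.f.
x0 = 60             # hypothetical value of x chosen for illustration

y_hat, conf_half, pred_half = limits_at(cigarettes, deaths, x0, T_025_DF9)
print(f"estimate at x0 = {x0}: {y_hat:.2f}")
print(f"95% confidence limits: {y_hat - conf_half:.2f} to {y_hat + conf_half:.2f}")
print(f"95% prediction limits: {y_hat - pred_half:.2f} to {y_hat + pred_half:.2f}")
```

The prediction limits are always wider than the confidence limits, because they account for the scatter of a new observation about the line in addition to the uncertainty in the line itself.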

Example: In this example we are studying building fires in a city and are interested in the relationship between:
1. X = the distance between the closest fire hall and the building that puts out the alarm, and
2. Y = the cost of the damage (in thousands of dollars).
The data were collected on n = 15 fires.

[Slides not reproduced: The Data (table), Scatter Plot, Computations.]

95% Confidence Limits for slope $\beta$: 4.07 to 5.77, where $t_{.025} = 2.160$ is the critical value for the t-distribution with 13 degrees of freedom

95% Confidence Limits for intercept $\alpha$: 7.21 to 13.35, where $t_{.025} = 2.160$ is the critical value for the t-distribution with 13 degrees of freedom

Least Squares Line y=4.92x+10.28

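$(1 - \alpha)100\%$ Confidence Limits for $\alpha + \beta x_0$: $a + b x_0 \pm t_{\alpha/2}\, s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$, where $t_{\alpha/2}$ is the $\alpha/2$ critical value for the t-distribution with n - 2 degrees of freedom

[Figure: 95% confidence limits for $\alpha + \beta x_0$, drawn as bands about the least squares line.]

$(1 - \alpha)100\%$ Prediction Limits for y when $x = x_0$: $a + b x_0 \pm t_{\alpha/2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$, where $t_{\alpha/2}$ is the $\alpha/2$ critical value for the t-distribution with n - 2 degrees of freedom

[Figure: 95% prediction limits for y when $x = x_0$, drawn as bands about the least squares line.]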

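Linear Regression Summary: Hypothesis testing and Estimation

$(1 - \alpha)100\%$ Confidence Limits for slope $\beta$: $b \pm t_{\alpha/2}\frac{s}{\sqrt{S_{xx}}}$, where $t_{\alpha/2}$ is the critical value for the t-distribution with n - 2 degrees of freedom

Testing the slope ($H_0: \beta = \beta_0$): the test statistic is $t = \frac{b - \beta_0}{s/\sqrt{S_{xx}}}$, which has a t distribution with df = n - 2 if $H_0$ is true.

$(1 - \alpha)100\%$ Confidence Limits for intercept $\alpha$: $a \pm t_{\alpha/2}\, s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}$, where $t_{\alpha/2}$ is the critical value for the t-distribution with n - 2 degrees of freedom

Testing the intercept ($H_0: \alpha = \alpha_0$): the test statistic is $t = \frac{a - \alpha_0}{s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}}$, which has a t distribution with df = n - 2 if $H_0$ is true.

$(1 - \alpha)100\%$ Confidence Limits for $\alpha + \beta x_0$: $a + b x_0 \pm t_{\alpha/2}\, s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$, where $t_{\alpha/2}$ is the $\alpha/2$ critical value for the t-distribution with n - 2 degrees of freedom

$(1 - \alpha)100\%$ Prediction Limits for y when $x = x_0$: $a + b x_0 \pm t_{\alpha/2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$, where $t_{\alpha/2}$ is the $\alpha/2$ critical value for the t-distribution with n - 2 degrees of freedom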

Correlation

Definition: The statistic $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$ is called Pearson's correlation coefficient

Properties:
1. $-1 \le r \le 1$, i.e. $|r| \le 1$ and $r^2 \le 1$.
2. $|r| = 1$ (r = +1 or -1) if the points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ lie along a straight line (positive slope for +1, negative slope for -1).

The test for independence (zero correlation): $H_0$: X and Y are independent; $H_A$: X and Y are correlated. The test statistic: $t = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}}$. The Critical Region: Reject $H_0$ if $|t| > t_{\alpha/2}$ (df = n - 2). This is a two-tailed critical region; the critical region could also be one-tailed.

Example: the building fires data again, with X = the distance between the closest fire hall and the building that puts out the alarm, Y = the cost of the damage (in thousands of dollars), and n = 15 fires.

The correlation coefficient and the test for independence (zero correlation): the test statistic is $t = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}}$. We reject $H_0$: independence, if $|t| > t_{.025} = 2.160$ (df = 13). For these data $H_0$: independence, is rejected.

Relationship between Regression and Correlation

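Recall $b = \frac{S_{xy}}{S_{xx}}$ and $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$. Also, since $s_x = \sqrt{\frac{S_{xx}}{n-1}}$ and $s_y = \sqrt{\frac{S_{yy}}{n-1}}$, we have $b = r\sqrt{\frac{S_{yy}}{S_{xx}}} = r\,\frac{s_y}{s_x}$. Thus the slope of the least squares line is simply the ratio of the standard deviations × the correlation coefficient

The test for independence (zero correlation), $H_0$: X and Y are independent versus $H_A$: X and Y are correlated, uses the test statistic: $t = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}}$. Note: $r = b\sqrt{\frac{S_{xx}}{S_{yy}}}$ and $1 - r^2 = \frac{RSS}{S_{yy}} = \frac{(n-2)s^2}{S_{yy}}$

The two tests
1. The test for independence (zero correlation): $H_0$: X and Y are independent versus $H_A$: X and Y are correlated
2. The test for zero slope: $H_0: \beta = 0$ versus $H_A: \beta \ne 0$
are equivalent.

The test statistic for independence equals the test statistic for zero slope: $t = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}} = \frac{b}{s/\sqrt{S_{xx}}}$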
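The equivalence of the two tests can be checked numerically. The following sketch computes Pearson's r and both t statistics for the cigarette data from the earlier example; the variable names are chosen for the illustration only.

```python
import math

x = [48, 50, 38, 110, 110, 49, 23, 25, 30, 51, 130]
y = [18, 15, 17, 35, 46, 24, 6, 9, 11, 25, 20]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Pearson's correlation coefficient and its t statistic
r = s_xy / math.sqrt(s_xx * s_yy)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Slope, residual standard deviation and the t statistic for H0: beta = 0
b = s_xy / s_xx
s = math.sqrt((s_yy - s_xy ** 2 / s_xx) / (n - 2))
t_from_b = b / (s / math.sqrt(s_xx))

# Slope as correlation times the ratio of standard deviations
s_x = math.sqrt(s_xx / (n - 1))
s_y = math.sqrt(s_yy / (n - 1))

print(f"r = {r:.3f}")
print(f"t from r = {t_from_r:.4f}, t from slope = {t_from_b:.4f}")
print(f"b = {b:.4f}, r * s_y / s_x = {r * s_y / s_x:.4f}")
```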

Regression (in general)

In many experiments we would have collected data on a single variable Y (the dependent variable) and on p (say) other variables $X_1, X_2, X_3, \ldots, X_p$ (the independent variables). One is interested in determining a model that describes the relationship between Y (the response (dependent) variable) and $X_1, X_2, \ldots, X_p$ (the predictor (independent) variables). This model can be used for
- Prediction
- Controlling Y by manipulating $X_1, X_2, \ldots, X_p$

The Model: is an equation of the form $Y = f(X_1, X_2, \ldots, X_p \mid \theta_1, \theta_2, \ldots, \theta_q) + \varepsilon$ where $\theta_1, \theta_2, \ldots, \theta_q$ are unknown parameters of the function f and $\varepsilon$ is a random disturbance (usually assumed to have a normal distribution with mean 0 and standard deviation $\sigma$).

Examples: 1. Y = Blood Pressure, X = age. The model is $Y = \alpha + \beta X + \varepsilon$, thus $\theta_1 = \alpha$ and $\theta_2 = \beta$. This model is called the simple Linear Regression Model, $Y = \alpha + \beta X$

2. Y = average of five best times for running the 100m, X = the year. The model is $Y = \alpha e^{-\beta X} + \gamma + \varepsilon$, thus $\theta_1 = \alpha$, $\theta_2 = \beta$ and $\theta_3 = \gamma$. This model is called the exponential Regression Model, $Y = \alpha e^{-\beta X} + \gamma$

3. Y = gas mileage (mpg) of a car brand, $X_1$ = engine size, $X_2$ = horsepower, $X_3$ = weight. The model is $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$. This model is called the Multiple Linear Regression Model

The Multiple Linear Regression Model

In Multiple Linear Regression we assume the following model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$. This model is called the Multiple Linear Regression Model. Again $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$ are unknown parameters of the model and $\varepsilon$ is a random disturbance assumed to have a normal distribution with mean 0 and standard deviation $\sigma$.

The importance of the Linear model
1. It is the simplest form of a model in which each independent variable has some effect on the dependent variable Y.
- When fitting models to data one tries to find the simplest form of a model that still adequately describes the relationship between the dependent variable and the independent variables.
- The linear model is sometimes the first model to be fitted and only abandoned if it turns out to be inadequate.

2. In many instances a linear model is the most appropriate model to describe the dependence relationship between the dependent variable and the independent variables.
- This will be true if the dependent variable increases at a constant rate as any of the independent variables is increased while holding the other independent variables constant.

3. Many non-linear models can be linearized (put into the form of a linear model by appropriately transforming the dependent variable and/or any or all of the independent variables).
- This important fact (that many non-linear models are linearizable) ensures the wide utility of the linear model.
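As a small illustration of linearization, consider the multiplicative exponential model $Y = \alpha e^{\beta X}$ (an assumed example, chosen because it linearizes cleanly; it is not one of the models on the slides). Taking logarithms gives $\ln Y = \ln\alpha + \beta X$, which is a straight line in X, so ordinary least squares on $(x, \ln y)$ recovers both parameters. The data below are synthetic and noise-free.

```python
import math

def fit_line(x, y):
    """Least squares intercept and slope for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

# Synthetic data generated from Y = 3 * exp(0.5 * X) (hypothetical values)
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [3.0 * math.exp(0.5 * xi) for xi in x]

# Transform the dependent variable, then fit a straight line
log_y = [math.log(yi) for yi in y]
intercept, slope = fit_line(x, log_y)

alpha_hat = math.exp(intercept)  # back-transform the intercept
beta_hat = slope

print(f"alpha = {alpha_hat:.4f}, beta = {beta_hat:.4f}")
```

With real data the disturbance would need to be multiplicative for the transformed model to satisfy the usual assumptions; that caveat is outside what the slides state.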

An Example: The following data come from an experiment investigating the source from which corn plants in various soils obtain their phosphorus.
- The concentration of inorganic phosphorus ($X_1$) and the concentration of organic phosphorus ($X_2$) were measured in the soil of n = 18 test plots.
- In addition the phosphorus content (Y) of corn grown in the soil was also measured.

[Table not reproduced: Inorganic Phosphorus $X_1$, Organic Phosphorus $X_2$ and Plant Available Phosphorus Y for the 18 plots.]

[Table not reproduced: estimated coefficients for the intercept ($\beta_0$), $X_1$ ($\beta_1$) and $X_2$ ($\beta_2$), and the fitted equation for Y in terms of $X_1$ and $X_2$.]


Summary of the Statistics used in Multiple Regression

The Least Squares Estimates $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p$: the values that minimize $RSS = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip})^2$

The Analysis of Variance Table Entries
a) Adjusted Total Sum of Squares: $SS_{Total} = \sum_{i=1}^{n} (y_i - \bar{y})^2$
b) Residual Sum of Squares: $SS_{Error} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
c) Regression Sum of Squares: $SS_{Reg} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
Note: $\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$, i.e. $SS_{Total} = SS_{Reg} + SS_{Error}$

The Analysis of Variance Table

Source       Sum of Squares   d.f.      Mean Square                          F
Regression   SS_Reg           p         MS_Reg = SS_Reg/p                    MS_Reg/s^2
Error        SS_Error         n-p-1     MS_Error = SS_Error/(n-p-1) = s^2
Total        SS_Total         n-1

Uses:
1. To estimate $\sigma^2$ (the error variance). Use $s^2 = MS_{Error}$ to estimate $\sigma^2$.
2. To test the Hypothesis $H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$. Use the test statistic $F = \frac{MS_{Reg}}{MS_{Error}} = \frac{MS_{Reg}}{s^2}$. Reject $H_0$ if $F > F_{\alpha}(p, n-p-1)$.

3. To compute other statistics that are useful in describing the relationship between Y (the dependent variable) and $X_1, X_2, \ldots, X_p$ (the independent variables).
a) $R^2$ = the coefficient of determination = $SS_{Reg}/SS_{Total}$ = the proportion of variance in Y explained by $X_1, X_2, \ldots, X_p$. $1 - R^2$ = the proportion of variance in Y that is left unexplained by $X_1, X_2, \ldots, X_p$ = $SS_{Error}/SS_{Total}$.

b) $R_a^2$ = "$R^2$ adjusted" for degrees of freedom = 1 - [the proportion of variance in Y that is left unexplained by $X_1, X_2, \ldots, X_p$, adjusted for d.f.] = $1 - \frac{SS_{Error}/(n-p-1)}{SS_{Total}/(n-1)}$

c) $R = \sqrt{R^2}$ = the Multiple correlation coefficient of Y with $X_1, X_2, \ldots, X_p$ = the maximum correlation between Y and a linear combination of $X_1, X_2, \ldots, X_p$. Comment: The statistics F, $R^2$, $R_a^2$ and R are equivalent statistics.
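The entries of the Analysis of Variance table and the derived statistics follow mechanically from two sums of squares. The sketch below shows the arithmetic; the function name `anova_summary` and the input numbers are hypothetical, chosen only so the results are easy to check by hand.

```python
import math

def anova_summary(ss_total, ss_error, n, p):
    """Compute ANOVA-table statistics for a regression with p predictors."""
    ss_reg = ss_total - ss_error            # SS_Total = SS_Reg + SS_Error
    ms_reg = ss_reg / p
    ms_error = ss_error / (n - p - 1)       # s^2, the estimate of sigma^2
    r2 = ss_reg / ss_total
    return {
        "ss_reg": ss_reg,
        "ms_reg": ms_reg,
        "ms_error": ms_error,
        "F": ms_reg / ms_error,
        "R2": r2,
        "R2_adj": 1 - ms_error / (ss_total / (n - 1)),
        "R": math.sqrt(r2),
    }

# Hypothetical sums of squares for n = 18 cases and p = 2 predictors
stats = anova_summary(ss_total=100.0, ss_error=25.0, n=18, p=2)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")

# F can be rewritten in terms of R^2, which is the sense in which
# the statistics are equivalent for fixed n and p
n, p = 18, 2
f_from_r2 = (stats["R2"] / p) / ((1 - stats["R2"]) / (n - p - 1))
print(f"F from R2: {f_from_r2:.4f}")
```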

Using Statistical Packages To perform Multiple Regression

Using SPSS Note: The use of another statistical package such as Minitab is similar to using SPSS

After starting the SPSS program the following dialogue box appears:

If you select Opening an existing file and press OK, the following dialogue box appears:

The following dialogue box appears:

If the variable names are in the file, ask it to read the names. If you do not specify the Range, the program will identify the Range. Once you click OK, two windows will appear:

One that will contain the output:

The other containing the data:

To perform any statistical Analysis select the Analyze menu:

Then select Regression and Linear.

The following Regression dialogue box appears

Select the Dependent variable Y.

Select the Independent variables X 1, X 2, etc.

If you select the Method "Enter", all variables will be put into the equation. There are also several other methods that can be used:
1. Forward selection
2. Backward Elimination
3. Stepwise Regression

Forward selection
1. This method starts with no variables in the equation.
2. It carries out statistical tests on variables not in the equation to see which have a significant effect on the dependent variable.
3. It adds the most significant.
4. It continues until all variables not in the equation have no significant effect on the dependent variable.

Backward Elimination
1. This method starts with all variables in the equation.
2. It carries out statistical tests on variables in the equation to see which have no significant effect on the dependent variable.
3. It deletes the least significant.
4. It continues until all variables in the equation have a significant effect on the dependent variable.

Stepwise Regression (uses both forward and backward techniques)
1. This method starts with no variables in the equation.
2. It carries out statistical tests on variables not in the equation to see which have a significant effect on the dependent variable.
3. It then adds the most significant.
4. After a variable is added it checks to see if any variables added earlier can now be deleted.
5. It continues until all variables not in the equation have no significant effect on the dependent variable.

All of these methods are procedures for attempting to find the best equation. The best equation is the equation that is the simplest (not containing variables that are not important) yet adequate (containing variables that are important).

Once the dependent variable, the independent variables and the Method have been selected if you press OK, the Analysis will be performed.

The output will contain the following table. $R^2$ and $R^2$ adjusted measure the proportion of variance in Y that is explained by $X_1, X_2, X_3$, etc. (67.6% and 67.3%). R is the Multiple correlation coefficient (the maximum correlation between Y and a linear combination of $X_1, X_2, X_3$, etc.).

The next table is the Analysis of Variance Table. The F test is testing if the regression coefficients of the predictor variables are all zero, namely that none of the independent variables $X_1, X_2, X_3$, etc. have any effect on Y.

The final table in the output gives the estimates of the regression coefficients, their standard errors and the t tests for testing if they are zero. Note: Engine size has no significant effect on Mileage.

The estimated equation is read from the coefficient estimates in this table.

Note that in the estimated equation Mileage decreases with:
1. increases in Engine Size (not significant, p = 0.432)
2. increases in Horsepower (significant, p = 0.000)
3. increases in Weight (significant, p = 0.000)

Logistic regression

Recall the simple linear regression model: $y = \beta_0 + \beta_1 x + \varepsilon$, where we are trying to predict a continuous dependent variable y from a continuous independent variable x. This model can be extended to the Multiple linear regression model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$. Here we are trying to predict a continuous dependent variable y from several continuous independent variables $x_1, x_2, \ldots, x_p$.

Now suppose the dependent variable y is binary. It takes on two values, “Success” (1) or “Failure” (0). This is the situation in which Logistic Regression is used. We are interested in predicting y from a continuous independent variable x.

Example: We are interested in how the success (y) of a new antibiotic cream in curing “acne problems” depends on the amount (x) that is applied daily. The values of y are 1 (Success) or 0 (Failure). The values of x range over a continuum.

The Logistic Regression Model: Let p denote P[y = 1] = P[Success]. This quantity will increase with the value of x. The ratio $\frac{p}{1-p}$ is called the odds ratio (more commonly called simply the odds). This quantity will also increase with the value of x, ranging from zero to infinity. The quantity $\ln\left(\frac{p}{1-p}\right)$ is called the log odds ratio.

Example: odds ratio, log odds ratio. Suppose a die is rolled: Success = “roll a six”, p = 1/6. The odds ratio is $\frac{p}{1-p} = \frac{1/6}{5/6} = \frac{1}{5} = 0.2$. The log odds ratio is $\ln(0.2) = -1.609$
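The die example can be reproduced directly. This is a minimal sketch; the helper names `odds` and `log_odds` are illustrative.

```python
import math

def odds(p):
    """Odds p / (1 - p) for a probability 0 <= p < 1."""
    return p / (1 - p)

def log_odds(p):
    """Natural log of the odds (the logit of p), for 0 < p < 1."""
    return math.log(odds(p))

p_six = 1 / 6  # probability of rolling a six
print(f"odds = {odds(p_six):.3f}")
print(f"log odds = {log_odds(p_six):.3f}")

# The odds range from 0 to infinity as p goes from 0 to 1;
# the log odds are 0 exactly when p = 0.5
for p in (0.1, 0.5, 0.9):
    print(f"p = {p}: odds = {odds(p):.3f}, log odds = {log_odds(p):.3f}")
```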

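The Logistic Regression Model assumes the log odds ratio is linearly related to x, i.e.: $\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$. In terms of the odds ratio: $\frac{p}{1-p} = e^{\beta_0 + \beta_1 x}$

The Logistic Regression Model, solving for p in terms of x: $p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$

Interpretation of the parameter $\beta_0$ (determines the intercept): when x = 0, $p = \frac{e^{\beta_0}}{1 + e^{\beta_0}}$. [Figure: p versus x]

Interpretation of the parameter $\beta_1$ (determines when p is 0.50, along with $\beta_0$): p = 0.50 when $\beta_0 + \beta_1 x = 0$, i.e. when $x = -\frac{\beta_0}{\beta_1}$. [Figure: p versus x]

Also, $\frac{dp}{dx} = \beta_1\, p(1-p)$, so when $x = -\frac{\beta_0}{\beta_1}$, $\frac{dp}{dx} = \frac{\beta_1}{4}$ is the rate of increase in p with respect to x when p = 0.50

Interpretation of the parameter $\beta_1$ (determines the slope when p is 0.50). [Figure: p versus x, with slope $\beta_1/4$ at p = 0.50]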
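The two interpretations of the parameters can be verified numerically. In the sketch below the parameter values $\beta_0 = -4$ and $\beta_1 = 0.8$ are hypothetical, picked only for illustration; the slope at p = 0.50 is approximated by a central difference and compared with $\beta_1/4$.

```python
import math

def logistic_p(x, beta0, beta1):
    """P[y = 1] under the logistic regression model."""
    z = beta0 + beta1 * x
    return 1.0 / (1.0 + math.exp(-z))   # same as e^z / (1 + e^z)

beta0, beta1 = -4.0, 0.8   # hypothetical parameter values

# p at x = 0 depends on beta0 alone
p_at_zero = logistic_p(0.0, beta0, beta1)

# p is 0.50 where beta0 + beta1 * x = 0
x_half = -beta0 / beta1
p_half = logistic_p(x_half, beta0, beta1)

# Numerical slope of p at x_half (central difference)
h = 1e-5
slope_at_half = (logistic_p(x_half + h, beta0, beta1)
                 - logistic_p(x_half - h, beta0, beta1)) / (2 * h)

print(f"p at x = 0: {p_at_zero:.4f}")
print(f"x where p = 0.50: {x_half:.2f}, p there = {p_half:.2f}")
print(f"slope at p = 0.50: {slope_at_half:.4f}, beta1 / 4 = {beta1 / 4:.4f}")
```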

The data: for each case the data will consist of
1. a value for x, the continuous independent variable
2. a value for y (1 or 0) (Success or Failure)
Total of n = 250 cases

Estimation of the parameters The parameters are estimated by Maximum Likelihood estimation and require a statistical package such as SPSS
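To give a sense of what a statistical package does internally, here is a minimal sketch of maximum likelihood estimation for the one-predictor logistic model using Newton-Raphson iteration. This is an illustrative assumption about the numerical method, not a description of SPSS's actual algorithm, and the eight data points are made up for the example.

```python
import math

def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Maximum likelihood estimates (b0, b1) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood (the score)
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix entries, with weights p(1 - p)
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        # Solve the 2x2 system H * delta = g by Cramer's rule
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data: successes become more common as x increases,
# but the two groups overlap (so finite estimates exist)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 1, 0, 1]

b0_hat, b1_hat = fit_logistic(x, y)
print(f"beta0 = {b0_hat:.4f}, beta1 = {b1_hat:.4f}")
print(f"x where fitted p = 0.50: {-b0_hat / b1_hat:.2f}")
```

If the successes and failures were perfectly separated by some value of x, the likelihood would have no finite maximum and this iteration would not converge; real packages detect and report that situation.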

Using SPSS to perform Logistic regression. Open the data file:

Choose from the menu: Analyze -> Regression -> Binary Logistic

The following dialogue box appears. Select the dependent variable (y) and the independent variable (x) (covariate). Press OK.

Here is the output: the Estimates and their S.E.

The parameter Estimates

Interpretation of the parameter $\beta_0$ (determines the intercept): when x = 0, $p = \frac{e^{\beta_0}}{1 + e^{\beta_0}}$. Interpretation of the parameter $\beta_1$ (determines when p is 0.50, along with $\beta_0$): p = 0.50 when $x = -\frac{\beta_0}{\beta_1}$

Another interpretation of the parameter $\beta_1$: $\frac{\beta_1}{4}$ is the rate of increase in p with respect to x when p = 0.50


[Figure: the graph of p versus x]

The Multiple Logistic Regression model

Here we attempt to predict the outcome of a binary response variable Y from several independent variables X 1, X 2, … etc

Multiple Logistic Regression, an example: In this example we are interested in determining the risk of infants (who were born prematurely) developing BPD (bronchopulmonary dysplasia). More specifically, we are interested in developing a predictive model which will determine the probability of developing BPD from $X_1$ = gestational age and $X_2$ = birth weight.

For n = 223 infants in a prenatal ward the following measurements were determined:
1. $X_1$ = gestational age (weeks),
2. $X_2$ = birth weight (grams), and
3. Y = presence of BPD

The data

The results

Graph: Showing Risk of BPD vs GA and BrthWt

Non-Parametric Statistics