Hypothesis testing and Estimation


Linear Regression: Hypothesis testing and Estimation

The equation for the least squares line is $\hat{y} = \hat{\alpha} + \hat{\beta}x$. Let
$$S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad S_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}), \qquad S_{yy} = \sum_{i=1}^{n}(y_i - \bar{y})^2.$$

Computing formulae:
$$S_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}, \qquad S_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}, \qquad S_{yy} = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}.$$

Then the slope of the least squares line can be shown to be
$$\hat{\beta} = \frac{S_{xy}}{S_{xx}}.$$
This is an estimator of the slope, $\beta$, in the regression model $y_i = \alpha + \beta x_i + \varepsilon_i$.

The intercept of the least squares line can be shown to be
$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}.$$
This is an estimator of the intercept, $\alpha$, in the regression model.

The residual sum of squares:
$$\text{RSS} = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^{n}\left(y_i - \hat{\alpha} - \hat{\beta}x_i\right)^2.$$
Computing formula:
$$\text{RSS} = S_{yy} - \frac{S_{xy}^2}{S_{xx}}.$$

Estimating $\sigma$, the standard deviation in the regression model:
$$s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}.$$
Computing formula:
$$s = \sqrt{\frac{S_{yy} - S_{xy}^2 / S_{xx}}{n - 2}}.$$
This estimate of $\sigma$ is said to be based on $n - 2$ degrees of freedom.
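The following is a minimal sketch in Python (NumPy) of these computing formulae; the function name least_squares_fit and its interface are ours, not from the slides.

```python
import numpy as np

def least_squares_fit(x, y):
    """Fit y = alpha + beta * x by least squares; return (alpha_hat, beta_hat, s)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    Sxx = np.sum((x - x.mean()) ** 2)
    Sxy = np.sum((x - x.mean()) * (y - y.mean()))
    Syy = np.sum((y - y.mean()) ** 2)
    beta_hat = Sxy / Sxx                          # estimator of the slope beta
    alpha_hat = y.mean() - beta_hat * x.mean()    # estimator of the intercept alpha
    rss = Syy - Sxy ** 2 / Sxx                    # residual sum of squares
    s = np.sqrt(rss / (n - 2))                    # estimate of sigma on n - 2 df
    return alpha_hat, beta_hat, s
```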

Confidence limits for the slope

$(1 - \alpha)100\%$ confidence limits for the slope $\beta$:
$$\hat{\beta} \pm t_{\alpha/2}\,\frac{s}{\sqrt{S_{xx}}}$$
where $t_{\alpha/2}$ is the critical value for the $t$-distribution with $n - 2$ degrees of freedom.

Testing for the slope

Testing the slope. To test $H_0\!:\ \beta = 0$ against $H_A\!:\ \beta \neq 0$, the test statistic is
$$t = \frac{\hat{\beta}}{s/\sqrt{S_{xx}}},$$
which has a $t$ distribution with $df = n - 2$ if $H_0$ is true.

The critical region: reject $H_0$ if $|t| > t_{\alpha/2}$, the critical value of the $t$-distribution with $df = n - 2$. This is a two-tailed test; one-tailed tests are also possible.
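As an illustration, here is a sketch of the confidence limits and the two-tailed slope test using scipy.stats, reusing least_squares_fit from the sketch above (the name slope_inference is our own):

```python
import numpy as np
from scipy import stats

def slope_inference(x, y, conf=0.95):
    """Confidence interval and two-tailed t-test for the slope beta."""
    alpha_hat, beta_hat, s = least_squares_fit(x, y)
    x = np.asarray(x, dtype=float)
    n = len(x)
    Sxx = np.sum((x - x.mean()) ** 2)
    se_beta = s / np.sqrt(Sxx)                          # standard error of beta_hat
    tcrit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 2)   # t_{alpha/2} on n - 2 df
    ci = (beta_hat - tcrit * se_beta, beta_hat + tcrit * se_beta)
    t_stat = beta_hat / se_beta                         # tests H0: beta = 0
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)     # two-tailed p-value
    return ci, t_stat, p_value
```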

Testing for correlation

Recall: let $(x_1,y_1), (x_2,y_2), (x_3,y_3), \ldots, (x_n,y_n)$ denote $n$ observations on the variables $X$ and $Y$. Then
$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$$
is Pearson's correlation coefficient.

The test for zero correlation. To test $H_0\!:\ \rho = 0$, the test statistic is
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}},$$
which has a $t$ distribution with $df = n - 2$ if $H_0$ is true.

The critical region: reject $H_0$ if $|t| > t_{\alpha/2}$, the critical value of the $t$-distribution with $df = n - 2$. This is a two-tailed test; one-tailed tests are also possible.
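For reference, scipy.stats.pearsonr implements exactly this t-test for zero correlation; the data below are made up purely for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.3])

r, p = stats.pearsonr(x, y)                         # Pearson r and two-tailed p-value
n = len(x)
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)   # the t statistic above
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)    # agrees with p from pearsonr
```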

Comment: the test for independence (zero correlation) is equivalent to the test for zero slope.

The test for zero slope uses
$$t = \frac{\hat{\beta}}{s/\sqrt{S_{xx}}} = \frac{S_{xy}/S_{xx}}{s/\sqrt{S_{xx}}} = \frac{S_{xy}}{s\sqrt{S_{xx}}},$$
and substituting $s^2 = \left(S_{yy} - S_{xy}^2/S_{xx}\right)/(n-2)$ and $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$ gives
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \text{the test statistic for independence}.$$

Confidence limits for the intercept

$(1 - \alpha)100\%$ confidence limits for the intercept $\alpha$:
$$\hat{\alpha} \pm t_{\alpha/2}\, s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}$$
where $t_{\alpha/2}$ is the critical value for the $t$-distribution with $n - 2$ degrees of freedom.

Testing for the intercept

Testing the intercept. To test $H_0\!:\ \alpha = 0$, the test statistic is
$$t = \frac{\hat{\alpha}}{s\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}}},$$
which has a $t$ distribution with $df = n - 2$ if $H_0$ is true.

The critical region: reject $H_0$ if $|t| > t_{\alpha/2}$, the critical value of the $t$-distribution with $df = n - 2$.

$(1-\alpha)100\%$ confidence limits for $\alpha + \beta x_0$ (the mean of $y$ when $x = x_0$):
$$\hat{\alpha} + \hat{\beta}x_0 \pm t_{\alpha/2}\, s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$$
where $t_{\alpha/2}$ is the $\alpha/2$ critical value for the $t$-distribution with $n - 2$ degrees of freedom.

$(1-\alpha)100\%$ prediction limits for $y$ when $x = x_0$:
$$\hat{\alpha} + \hat{\beta}x_0 \pm t_{\alpha/2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$$
where $t_{\alpha/2}$ is the $\alpha/2$ critical value for the $t$-distribution with $n - 2$ degrees of freedom.
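A sketch of both interval formulas in Python, again reusing least_squares_fit from the sketch above (mean_and_prediction_limits is our own naming):

```python
import numpy as np
from scipy import stats

def mean_and_prediction_limits(x, y, x0, conf=0.95):
    """CI for the mean alpha + beta * x0, and prediction limits for a new y at x0."""
    alpha_hat, beta_hat, s = least_squares_fit(x, y)
    x = np.asarray(x, dtype=float)
    n = len(x)
    Sxx = np.sum((x - x.mean()) ** 2)
    tcrit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 2)
    y0 = alpha_hat + beta_hat * x0                                         # point estimate
    half_ci = tcrit * s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / Sxx)      # mean response
    half_pi = tcrit * s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / Sxx)  # new observation
    return (y0 - half_ci, y0 + half_ci), (y0 - half_pi, y0 + half_pi)
```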

The Multiple Linear Regression Model

Again we assume that we have a single dependent variable $Y$ and $p$ (say) independent variables $X_1, X_2, X_3, \ldots, X_p$. The equation (model) that generally describes the relationship between $Y$ and the independent variables is of the form
$$Y = f(X_1, X_2, \ldots, X_p \mid \theta_1, \theta_2, \ldots, \theta_q) + \varepsilon$$
where $\theta_1, \theta_2, \ldots, \theta_q$ are unknown parameters of the function $f$ and $\varepsilon$ is a random disturbance (usually assumed to have a normal distribution with mean 0 and standard deviation $\sigma$).

In multiple linear regression we assume the following model:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon.$$
This is called the multiple linear regression model. Here $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$ are unknown parameters and $\varepsilon$ is a random disturbance assumed to have a normal distribution with mean 0 and standard deviation $\sigma$.

The importance of the linear model:
1. It is the simplest form of model in which each independent variable has some effect on the dependent variable $Y$. When fitting models to data one tries to find the simplest form of a model that still adequately describes the relationship between the dependent variable and the independent variables. The linear model is often the first model to be fitted, and it is abandoned only if it turns out to be inadequate.

2. In many instances a linear model is the most appropriate model to describe the dependence relationship between the dependent variable and the independent variables. This will be true if the dependent variable increases at a constant rate as any one of the independent variables is increased while the other independent variables are held constant.

3. Many non-linear models can be put into the form of a linear model by appropriately transforming the dependent variable and/or any or all of the independent variables. This fact, that many non-linear models are linearizable, ensures the wide utility of the linear model.
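For example, the exponential model $Y = \alpha e^{\beta X}\varepsilon$ becomes linear after taking logarithms:
$$\ln Y = \ln\alpha + \beta X + \ln\varepsilon,$$
a linear model with intercept $\ln\alpha$, slope $\beta$, and disturbance $\ln\varepsilon$.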

An example: the following data come from an experiment investigating the source from which corn plants in various soils obtain their phosphorus. The concentration of inorganic phosphorus ($X_1$) and the concentration of organic phosphorus ($X_2$) were measured in the soil of $n = 18$ test plots. In addition, the phosphorus content ($Y$) of corn grown in the soil was measured. The data are displayed below:

Inorganic Phosphorus X1   Organic Phosphorus X2   Plant-Available Phosphorus Y
 0.4                       53                       64
 0.4                       23                       60
 3.1                       19                       71
 0.6                       34                       61
 4.7                       24                       54
 1.7                       65                       77
 9.4                       44                       81
10.1                       31                       93
11.6                       29                       93
12.6                       58                       51
10.9                       37                       76
23.1                       46                       96
23.1                       50                       77
21.6                       44                       93
23.1                       56                       95
 1.9                       36                       54
26.8                       58                      168
29.9                       51                       99

Coefficients:
Intercept   56.2510241   ($\hat{\beta}_0$)
X1           1.78977412  ($\hat{\beta}_1$)
X2           0.08664925  ($\hat{\beta}_2$)
Fitted equation: $\hat{Y} = 56.2510241 + 1.78977412\,X_1 + 0.08664925\,X_2$
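These coefficients can be reproduced with NumPy's least-squares solver. The arrays below are transcribed from the table above (including the cells reconstructed there), so treat this as a sketch rather than authoritative data:

```python
import numpy as np

# Corn phosphorus data, as transcribed in the table above
x1 = np.array([0.4, 0.4, 3.1, 0.6, 4.7, 1.7, 9.4, 10.1, 11.6,
               12.6, 10.9, 23.1, 23.1, 21.6, 23.1, 1.9, 26.8, 29.9])
x2 = np.array([53, 23, 19, 34, 24, 65, 44, 31, 29,
               58, 37, 46, 50, 44, 56, 36, 58, 51], dtype=float)
y = np.array([64, 60, 71, 61, 54, 77, 81, 93, 93,
              51, 76, 96, 77, 93, 95, 54, 168, 99], dtype=float)

X = np.column_stack([np.ones_like(x1), x1, x2])   # design matrix with intercept column
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares estimates
print(beta_hat)   # should be approximately [56.25, 1.79, 0.087], as on the slide
```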

Summary of the Statistics used in Multiple Regression

The least squares estimates: $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p$ are the values that minimize
$$\sum_{i=1}^{n}\left(y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \cdots - \beta_p x_{ip}\right)^2.$$
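Equivalently, in matrix notation (a standard result, not written out on the slide), with $X$ the $n \times (p+1)$ design matrix whose first column is all ones,
$$\hat{\boldsymbol{\beta}} = (X^{\mathsf{T}}X)^{-1}X^{\mathsf{T}}\mathbf{y}.$$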

The analysis of variance table entries:
a) Adjusted total sum of squares: $SS_{\text{Total}} = \sum (y_i - \bar{y})^2$
b) Residual sum of squares: $SS_{\text{Error}} = \sum (y_i - \hat{y}_i)^2$
c) Regression sum of squares: $SS_{\text{Reg}} = \sum (\hat{y}_i - \bar{y})^2$
Note: $SS_{\text{Total}} = SS_{\text{Reg}} + SS_{\text{Error}}$.

The analysis of variance table:

Source       Sum of Squares   d.f.        Mean Square                             F
Regression   SS_Reg           p           MS_Reg = SS_Reg / p                     MS_Reg / s^2
Error        SS_Error         n - p - 1   MS_Error = SS_Error / (n - p - 1) = s^2
Total        SS_Total         n - 1

Uses of the ANOVA table:
1. To estimate $\sigma^2$ (the error variance): use $s^2 = MS_{\text{Error}}$.
2. To test the hypothesis $H_0\!:\ \beta_1 = \beta_2 = \cdots = \beta_p = 0$: use the test statistic
$$F = \frac{MS_{\text{Reg}}}{MS_{\text{Error}}} = \frac{SS_{\text{Reg}}/p}{SS_{\text{Error}}/(n-p-1)}$$
and reject $H_0$ if $F > F_{\alpha}(p, n-p-1)$.

3. To compute other statistics that are useful in describing the relationship between $Y$ (the dependent variable) and $X_1, X_2, \ldots, X_p$ (the independent variables):
a) $R^2$ = the coefficient of determination = $SS_{\text{Reg}}/SS_{\text{Total}}$ = the proportion of variance in $Y$ explained by $X_1, X_2, \ldots, X_p$. Correspondingly, $1 - R^2 = SS_{\text{Error}}/SS_{\text{Total}}$ = the proportion of variance in $Y$ left unexplained by $X_1, X_2, \ldots, X_p$.

b) $R_a^2$ = $R^2$ adjusted for degrees of freedom:
$$R_a^2 = 1 - \frac{SS_{\text{Error}}/(n-p-1)}{SS_{\text{Total}}/(n-1)},$$
i.e., 1 minus the proportion of variance in $Y$ left unexplained by $X_1, X_2, \ldots, X_p$, adjusted for degrees of freedom.

c) $R = \sqrt{R^2}$ = the multiple correlation coefficient of $Y$ with $X_1, X_2, \ldots, X_p$ = the maximum correlation between $Y$ and a linear combination of $X_1, X_2, \ldots, X_p$.
Comment: the statistics $F$, $R^2$, $R_a^2$ and $R$ are equivalent statistics.
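Continuing the corn-data sketch above (it assumes X, y and beta_hat from that block), the ANOVA quantities and the statistics $F$, $R^2$, $R_a^2$ and $R$ can be computed directly:

```python
import numpy as np
from scipy import stats

n, k = X.shape                            # k = p + 1 parameters, including intercept
p = k - 1
y_hat = X @ beta_hat
ss_total = np.sum((y - y.mean()) ** 2)    # SS_Total
ss_error = np.sum((y - y_hat) ** 2)       # SS_Error
ss_reg = ss_total - ss_error              # SS_Reg

ms_reg = ss_reg / p
ms_error = ss_error / (n - p - 1)         # s^2, the estimate of sigma^2
F = ms_reg / ms_error
p_value = stats.f.sf(F, p, n - p - 1)     # reject H0 if p_value < alpha

r2 = ss_reg / ss_total                            # coefficient of determination
r2_adj = 1 - (ss_error / (n - p - 1)) / (ss_total / (n - 1))
R = np.sqrt(r2)                                   # multiple correlation coefficient
```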

Using Statistical Packages To perform Multiple Regression

Using SPSS. Note: the use of another statistical package, such as Minitab, is similar to using SPSS.

Example: to illustrate multiple regression we use data on n = 392 different automobiles. The variables measured are:
- mpg: mileage (the dependent variable Y)
- engine: engine size (independent variable X1)
- horse: horsepower (independent variable X2)
- weight: vehicle weight (independent variable X3)
The objective is to determine how the dependent variable Y (mileage) depends on the independent variables: engine size, horsepower and weight.

After starting the SPSS program, the following dialogue box appears:

If you select Opening an existing file and press OK, the following dialogue box appears:

The following dialogue box appears:

If the variable names are in the file, ask it to read the names. If you do not specify the Range, the program will identify the Range. Once you click OK, two windows will appear.

One that will contain the output:

The other containing the data:

To perform any statistical analysis, select the Analyze menu:

Then select Regression and Linear.

The following Regression dialogue box appears:

Select the Dependent variable Y.

Select the Independent variables X1, X2, etc.

If you select the Method "Enter", all variables will be put into the equation. There are also several other methods that can be used:
- Forward selection
- Backward elimination
- Stepwise regression

Once the dependent variable, the independent variables and the Method have been selected, press OK and the analysis will be performed.

The output will contain the following table. $R^2$ and adjusted $R^2$ measure the proportion of variance in $Y$ that is explained by $X_1, X_2, X_3$, etc. (67.6% and 67.3% here). $R$ is the multiple correlation coefficient (the maximum correlation between $Y$ and a linear combination of $X_1, X_2, X_3$, etc.).

The next table is the analysis of variance table. The $F$ test tests whether the regression coefficients of the predictor variables are all zero, i.e., whether none of the independent variables $X_1, X_2, X_3$, etc. has any effect on $Y$.

The final table in the output gives the estimates of the regression coefficients, their standard errors, and the $t$ tests for testing whether they are zero. Note: engine size has no significant effect on mileage.

The estimated equation, read from the table below, is:

Note the equation shows that mileage decreases with:
- increases in engine size (not significant, p = 0.432)
- increases in horsepower (significant, p = 0.000)
- increases in weight (significant, p = 0.000)
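For readers without SPSS, here is a sketch of the same analysis in Python with statsmodels; the file name auto.csv and the column names are hypothetical placeholders for however the automobile data are stored:

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical file and column names; adjust to wherever the data live.
df = pd.read_csv("auto.csv")                  # columns: mpg, engine, horse, weight
X = sm.add_constant(df[["engine", "horse", "weight"]])
model = sm.OLS(df["mpg"], X).fit()            # Method "Enter": all variables entered
print(model.summary())                        # R^2, ANOVA F test, coefficient t tests
```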