
Fitting Equations to Data

A common situation: Suppose that we have a single dependent variable Y (continuous numerical) and one or several independent variables, X1, X2, X3, ... (also continuous numerical, although there are techniques that allow you to handle categorical independent variables). The objective will be to "fit" an equation to the data collected on these measurements that explains the dependence of Y on X1, X2, X3, ...

What is the value of these equations?

Equations give very precise and concise descriptions (models) of data explaining how dependent variables are related to independent variables.

Examples
1. Linear models. Y = blood pressure, X = age: Y = β0 + β1X + ε
2. Exponential growth or decay models. Y = average of the 5 best times for the 100 m during an Olympic year, X = the Olympic year: Y = αe^(βX) + ε
3. Another growth model (the Gompertz model). Y = size of a cancerous tumor, X = time after implantation: Y = α exp(−βe^(−γX)) + ε

Note the presence of the random error term, ε (random noise). This is an important term in any statistical model. Without this term the model is deterministic and doesn't require statistical analysis.

What is the value of these equations?
1. Equations give very precise and concise descriptions (models) of data and how dependent variables are related to independent variables.
2. The parameters of the equations usually have very useful interpretations relative to the phenomenon being studied.
3. The equations can be used to calculate and estimate very useful quantities related to the phenomenon: relative extrema, future or out-of-range values.
4. Equations can provide the framework for comparison.

The Multiple Linear Regression Model An important statistical model

Again we assume that we have a single dependent variable Y and p (say) independent variables X1, X2, X3, ..., Xp. The equation (model) that generally describes the relationship between Y and the independent variables is of the form:
Y = f(X1, X2, ..., Xp | θ1, θ2, ..., θq) + ε
where θ1, θ2, ..., θq are unknown parameters of the function f and ε is a random disturbance (usually assumed to have a normal distribution with mean 0 and standard deviation σ).

In Multiple Linear Regression we assume the following model:
Y = β0 + β1X1 + β2X2 + ... + βpXp + ε
This model is called the Multiple Linear Regression Model, where β0, β1, β2, ..., βp are unknown parameters and ε is a random disturbance assumed to have a normal distribution with mean 0 and standard deviation σ.
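The following is a minimal sketch, not part of the original slides, of fitting this model by least squares in Python; the synthetic data and the coefficient values are assumptions made purely for illustration.

```python
# A rough sketch (not from the slides) of fitting the multiple linear
# regression model Y = b0 + b1*X1 + ... + bp*Xp + e by least squares.
# The data and coefficient values are made up for the illustration.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))                   # independent variables X1..Xp
beta_true = np.array([2.0, 0.5, -1.0, 3.0])   # assumed [b0, b1, b2, b3]
y = beta_true[0] + X @ beta_true[1:] + rng.normal(size=n)   # add random error

# Design matrix: a leading column of 1s carries the intercept b0
X_design = np.column_stack([np.ones(n), X])

# The least squares estimates minimize the sum of squared residuals
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print("estimated coefficients:", beta_hat)
```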

The importance of the Linear model
1. It is the simplest form of a model in which each independent variable has some effect on the dependent variable Y. When fitting models to data one tries to find the simplest form of a model that still adequately describes the relationship between the dependent variable and the independent variables. The linear model is sometimes the first model to be fitted and only abandoned if it turns out to be inadequate.

2. In many instances a linear model is the most appropriate model to describe the dependence relationship between the dependent variable and the independent variables. This will be true if the dependent variable increases at a constant rate as any of the independent variables is increased while holding the other independent variables constant.

3. Many non-linear models can be put into the form of a linear model by appropriately transforming the dependent variable and/or any or all of the independent variables. This important fact (i.e. that many non-linear models are linearizable) ensures the wide utility of the linear model; a brief illustration follows.
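As a small sketch, not from the original slides, of how a non-linear model can be linearized: if the model is exponential with multiplicative error, taking logarithms turns it into a linear model in X. The parameter values below are assumptions for the demonstration.

```python
# A rough sketch (not from the slides) of linearizing a non-linear model.
# If Y = a * exp(b*X) * e with multiplicative error e, then
# ln(Y) = ln(a) + b*X + ln(e), which is a linear model in X.
import numpy as np

rng = np.random.default_rng(1)
X = np.linspace(0.0, 5.0, 40)
a_true, b_true = 3.0, -0.4          # assumed values for the demo
Y = a_true * np.exp(b_true * X) * np.exp(rng.normal(scale=0.1, size=X.size))

# Fit the transformed (linear) model ln(Y) = ln(a) + b*X by least squares
design = np.column_stack([np.ones_like(X), X])
coef, *_ = np.linalg.lstsq(design, np.log(Y), rcond=None)
a_hat, b_hat = np.exp(coef[0]), coef[1]
print(f"a is approximately {a_hat:.2f}, b is approximately {b_hat:.2f}")
```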

An Example. The following data come from an experiment investigating the source from which corn plants in various soils obtain their phosphorus. The concentration of inorganic phosphorus (X1) and the concentration of organic phosphorus (X2) were measured in the soil of n = 18 test plots. In addition, the phosphorus content (Y) of corn grown in the soil was also measured. The data are displayed below:

[Data table: Inorganic Phosphorus X1, Organic Phosphorus X2 and Plant Available Phosphorus Y for the 18 test plots; the numeric values are shown on the slide.]

Coefficients: Intercept (β0), X1 (β1), X2 (β2). The fitted equation has the form Y = b0 + b1X1 + b2X2; the numeric estimates are shown on the slide.

Summary of the Statistics used in Multiple Regression

The Least Squares Estimates: the values b0, b1, ..., bp that minimize the residual sum of squares
Σ (yi − ŷi)²
Note: ŷi = b0 + b1xi1 + ... + bpxip = the predicted value of yi.

The Analysis of Variance Table Entries
a) Adjusted Total Sum of Squares: SS_Total = Σ (yi − ȳ)²
b) Residual Sum of Squares: SS_Error = Σ (yi − ŷi)²
c) Regression Sum of Squares: SS_Reg = Σ (ŷi − ȳ)²
Note: SS_Total = SS_Reg + SS_Error

The Analysis of Variance Table
Source       Sum of Squares   d.f.     Mean Square                        F
Regression   SS_Reg           p        SS_Reg/p = MS_Reg                  MS_Reg/s²
Error        SS_Error         n-p-1    SS_Error/(n-p-1) = MS_Error = s²
Total        SS_Total         n-1

Uses:
1. To estimate σ² (the error variance): use s² = MS_Error to estimate σ².
2. To test the hypothesis H0: β1 = β2 = ... = βp = 0. Use the test statistic F = MS_Reg/s² = [(1/p)SS_Reg]/[(1/(n-p-1))SS_Error]. Reject H0 if F > Fα(p, n-p-1).

3. To compute other statistics that are useful in describing the relationship between Y (the dependent variable) and X1, X2, ..., Xp (the independent variables).
a) R² = the coefficient of determination = SS_Reg/SS_Total = the proportion of variance in Y explained by X1, X2, ..., Xp.
1 − R² = the proportion of variance in Y that is left unexplained by X1, X2, ..., Xp = SS_Error/SS_Total.

b) Ra² = "R² adjusted" for degrees of freedom
= 1 − [the proportion of variance in Y that is left unexplained by X1, X2, ..., Xp, adjusted for d.f.]
= 1 − [(1/(n-p-1))SS_Error]/[(1/(n-1))SS_Total]
= 1 − [(n-1)SS_Error]/[(n-p-1)SS_Total]
= 1 − [(n-1)/(n-p-1)][1 − R²]

c) R = √R² = the multiple correlation coefficient of Y with X1, X2, ..., Xp = the maximum correlation between Y and a linear combination of X1, X2, ..., Xp.
Comment: The statistics F, R², Ra² and R are equivalent statistics.
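Below is a minimal sketch, not from the slides, showing how the ANOVA entries, the F test and the summary statistics above can be computed; it continues the synthetic example used earlier, so all data and values are assumptions.

```python
# A rough sketch (not from the slides) computing the ANOVA entries and the
# summary statistics for a fitted multiple regression on synthetic data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = 2.0 + X @ np.array([0.5, -1.0, 3.0]) + rng.normal(size=n)

X_design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
y_hat = X_design @ beta_hat

ss_total = np.sum((y - y.mean()) ** 2)     # adjusted total sum of squares
ss_error = np.sum((y - y_hat) ** 2)        # residual sum of squares
ss_reg = ss_total - ss_error               # regression sum of squares

ms_reg = ss_reg / p
s2 = ss_error / (n - p - 1)                # estimate of sigma^2 (MS_Error)
F = ms_reg / s2                            # tests H0: b1 = b2 = ... = bp = 0
p_value = stats.f.sf(F, p, n - p - 1)      # upper-tail F probability

r2 = ss_reg / ss_total                     # coefficient of determination
r2_adj = 1 - (n - 1) / (n - p - 1) * (1 - r2)
R = np.sqrt(r2)                            # multiple correlation coefficient
print(F, p_value, r2, r2_adj, R)
```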

Using SPSS Note: The use of another statistical package such as Minitab is similar to using SPSS

After starting the SPSS program the following dialogue box appears:

If you select Opening an existing file and press OK the following dialogue box appears

The following dialogue box appears:

If the variable names are in the file, ask it to read the names. If you do not specify the Range, the program will identify the Range. Once you click OK, two windows will appear:

One that will contain the output:

The other containing the data:

To perform any statistical Analysis select the Analyze menu:

Then select Regression and Linear.

The following Regression dialogue box appears

Select the Dependent variable Y.

Select the Independent variables X1, X2, etc.

If you select the Method - Enter.

All variables will be put into the equation. There are also several other methods that can be used:
1. Forward selection
2. Backward elimination
3. Stepwise regression

Forward selection
1. This method starts with no variables in the equation.
2. It carries out statistical tests on variables not in the equation to see which have a significant effect on the dependent variable.
3. It adds the most significant.
4. It continues until all variables not in the equation have no significant effect on the dependent variable.

Backward elimination
1. This method starts with all variables in the equation.
2. It carries out statistical tests on variables in the equation to see which have no significant effect on the dependent variable.
3. It deletes the least significant.
4. It continues until all variables in the equation have a significant effect on the dependent variable.

Stepwise regression (uses both forward and backward techniques)
1. This method starts with no variables in the equation.
2. It carries out statistical tests on variables not in the equation to see which have a significant effect on the dependent variable.
3. It then adds the most significant.
4. After a variable is added it checks to see if any variables added earlier can now be deleted.
5. It continues until all variables not in the equation have no significant effect on the dependent variable.
A sketch of the forward-selection step appears below.
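The following is a rough sketch, not from the slides, of the forward-selection idea using a partial F test for each candidate variable; the cut-off alpha = 0.05 and the helper names are assumptions for the illustration.

```python
# A rough sketch (not from the slides) of forward selection: at each step the
# candidate variable with the most significant partial F test (extra sum of
# squares) is added, stopping when no candidate reaches the chosen alpha.
import numpy as np
from scipy import stats

def rss(columns, y):
    """Residual sum of squares for an intercept plus the given columns."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + columns)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid), design.shape[1]

def forward_select(X, y, alpha=0.05):
    n, p = X.shape
    selected, remaining = [], list(range(p))
    while remaining:
        rss_cur, _ = rss([X[:, j] for j in selected], y)
        best = None
        for j in remaining:
            rss_new, k_new = rss([X[:, i] for i in selected + [j]], y)
            df_error = n - k_new
            F = (rss_cur - rss_new) / (rss_new / df_error)   # partial F, 1 d.f.
            pval = stats.f.sf(F, 1, df_error)
            if best is None or pval < best[1]:
                best = (j, pval)
        if best[1] >= alpha:
            break                       # no remaining variable is significant
        selected.append(best[0])
        remaining.remove(best[0])
    return selected                     # indices of the chosen variables
```

Backward elimination and stepwise regression can be sketched the same way, by deleting the least significant variable in the model or by re-checking earlier additions after each step.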

All of these methods are procedures for attempting to find the best equation. The best equation is the equation that is the simplest (not containing variables that are not important) yet adequate (containing variables that are important).

Once the dependent variable, the independent variables and the Method have been selected, pressing OK will perform the analysis.

The output will contain the following table. R² and R² adjusted measure the proportion of variance in Y that is explained by X1, X2, X3, etc. (67.6% and 67.3%). R is the multiple correlation coefficient (the maximum correlation between Y and a linear combination of X1, X2, X3, etc.).

The next table is the Analysis of Variance Table. The F test is testing whether the regression coefficients of the predictor variables are all zero, i.e. that none of the independent variables X1, X2, X3, etc. have any effect on Y.

The final table in the output gives the estimates of the regression coefficients, their standard errors and the t test for testing if they are zero. Note: Engine size has no significant effect on Mileage.

The estimated equation, read from the table below, is:

Note the equation shows that Mileage decreases:
1. with increases in Engine Size (not significant, p = 0.432)
2. with increases in Horsepower (significant, p = 0.000)
3. with increases in Weight (significant, p = 0.000)

Properties of the Least Squares Estimators:
1. Normally distributed (if the error terms are normally distributed).
2. Unbiased estimators of the linear parameters β0, β1, β2, ..., βp.
3. Minimum variance (minimum standard error) of all unbiased estimators of the linear parameters β0, β1, β2, ..., βp.

Comments: The standard error of the estimate β̂i, S.E.(β̂i) = s_β̂i, depends on:
1. The error variance s² (and s).
2. s_Xi, the standard deviation of Xi (the i-th independent variable).
3. The sample size n.
4. The correlations between all pairs of variables.

S.E.(β̂i)
- decreases as s decreases,
- decreases as s_Xi increases,
- decreases as n increases,
- increases as the correlation between pairs of independent variables increases.
In fact the standard error of the least squares estimates can be extremely high if there is a high correlation between one of the independent variables and a linear combination of the remaining independent variables (the problem of Multicollinearity).

The Covariance Matrix, the Correlation Matrix and the XᵀX inverse matrix. The Covariance Matrix contains the covariances Cov(β̂i, β̂j) of the estimates, where the diagonal entries are the variances s²_β̂i and the off-diagonal entries are the covariances between pairs of estimates.

The Correlation Matrix

The XᵀX inverse matrix

If we multiply each entry in the XᵀX inverse matrix by s² = MS_Error, this matrix turns into the covariance matrix for the estimates: Cov(β̂) = s²(XᵀX)⁻¹.

These matrices can be used to compute standard errors for linear combinations of the regression coefficients. Namely, for L = a0β̂0 + a1β̂1 + ... + apβ̂p,
S.E.(L) = sqrt( Σi Σj ai aj Cov(β̂i, β̂j) ).
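Here is a minimal sketch, not from the slides, of computing the estimated covariance matrix s²(XᵀX)⁻¹ and the standard error of a linear combination of the coefficients; the synthetic data and the contrast vector are assumptions for the illustration.

```python
# A rough sketch (not from the slides): the estimated covariance matrix of the
# least squares estimates, s^2 (X'X)^-1, and the standard error of a linear
# combination of the coefficients, on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = 2.0 + X @ np.array([0.5, -1.0, 3.0]) + rng.normal(size=n)

X_design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
s2 = np.sum((y - X_design @ beta_hat) ** 2) / (n - p - 1)   # MS_Error

xtx_inv = np.linalg.inv(X_design.T @ X_design)   # the X'X inverse matrix
cov_beta = s2 * xtx_inv                          # covariance matrix of the estimates
se_beta = np.sqrt(np.diag(cov_beta))             # standard error of each estimate

a = np.array([0.0, 1.0, -1.0, 0.0])              # e.g. the contrast b1 - b2 (assumed)
se_contrast = np.sqrt(a @ cov_beta @ a)          # S.E. of a'beta_hat
print(se_beta, se_contrast)
```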

An Example. Suppose one is interested in how the cost per month (Y) of heating a plant is determined by the average atmospheric temperature in the month (X1) and the number of operating days in the month (X2). The data on these variables were collected for n = 25 months selected at random and are given on the following page.
Y = cost per month of heating the plant
X1 = average atmospheric temperature in the month
X2 = the number of operating days for the plant in the month

[Tables on the slide: the Least Squares Estimates (Constant, X1, X2) with their standard errors, the Covariance Matrix, the Correlation Matrix and the XᵀX inverse matrix; the numeric values are not reproduced here.]

The Analysis of Variance Table
Source       df    SS    MS    F
Regression    2
Error        22
Total        24
(The numeric SS, MS and F values are shown on the slide.)

Summary Statistics (R², R² adjusted = Ra², and R)
R² = SS_Reg/SS_Total = 0.8491 (84.91% of the variance in Y explained)
Ra² = 1 − [1 − R²][(n−1)/(n−p−1)] = 1 − [0.1509][24/22] = 0.8354 (83.54%)
R = √0.8491 = 0.9215 = multiple correlation coefficient

Three-dimensional Scatter-plot of Cost, Temp and Days.

Example: Motor Vehicle example. Variables:
1. (Y) mpg – Mileage
2. (X1) engine – Engine size
3. (X2) horse – Horsepower
4. (X3) weight – Weight

Select Analyze->Regression->Linear

To print the correlation matrix or the covariance matrix of the estimates select Statistics

Check the box for the covariance matrix of the estimates.

Here is the table giving the estimates and their standard errors.

Here is the table giving the correlation matrix and covariance matrix of the regression estimates. What is missing in the SPSS output are the covariances and correlations with the intercept estimate (constant).

This can be found by using the following trick:
1. Introduce a new variable (called constnt).
2. The new "variable" takes on the value 1 for all cases.

Select Transform->Compute

The following dialogue box appears. Type in the name of the target variable (constnt) and type '1' for the Numeric Expression.

This variable is now added to the data file

Add this new variable (constnt) to the list of independent variables

Under Options make sure the box – Include constant in equation – is unchecked. The coefficient of the new variable will be the constant.

Here are the estimates of the parameters with their standard errors. Note the agreement with the parameter estimates and their standard errors as previously calculated.

Here is the correlation matrix and the covariance matrix of the estimates.

Testing for Hypotheses related to Multiple Regression.

Testing for Hypotheses related to Multiple Regression. The General Linear Hypothesis:
H0: h11β1 + h12β2 + h13β3 + ... + h1pβp = h1
    h21β1 + h22β2 + h23β3 + ... + h2pβp = h2
    ...
    hq1β1 + hq2β2 + hq3β3 + ... + hqpβp = hq
where h11, h12, h13, ..., hqp and h1, h2, h3, ..., hq are known coefficients.

Examples
1. H0: β1 = 0
2. H0: β1 = 0, β2 = 0, β3 = 0
3. H0: β1 = β2
4. H0: β1 = β2, β3 = β4
5. H0: β1 = (1/2)(β2 + β3)
6. H0: β1 = (1/2)(β2 + β3), β3 = (1/3)(β4 + β5 + β6)

When testing hypotheses there are two models of interest.
1. The Complete Model: Y = β0 + β1X1 + β2X2 + β3X3 + ... + βpXp + ε
2. The Reduced Model: the model implied by H0.
You are interested in knowing whether the complete model can be simplified to the reduced model.

Some Comments
1. The complete model contains more parameters and will always provide a better fit to the data than the reduced model.
2. The Residual Sum of Squares for the complete model will always be smaller than the R.S.S. for the reduced model.
3. If the reduction in the R.S.S. is small as we change from the reduced model to the complete model, the reduced model should be accepted as providing an adequate fit.
4. If the reduction in the R.S.S. is large as we change from the reduced model to the complete model, the reduced model should be rejected as providing an adequate fit and the complete model should be kept.
These principles form the basis for the following test.

Testing the General Linear Hypothesis. The F-test for H0 is performed by carrying out two runs of a multiple regression package.

Run 1: Fit the complete model, resulting in the following ANOVA table:
Source             df       Sum of Squares
Regression         p        SS_Reg
Residual (Error)   n-p-1    SS_Error
Total              n-1      SS_Total

Run 2: Fit the reduced model (q parameters eliminated), resulting in the following ANOVA table:
Source             df         Sum of Squares
Regression         p-q        SS1_Reg
Residual (Error)   n-p+q-1    SS1_Error
Total              n-1        SS_Total

The Test: The test is carried out using the test statistic
F = [SS_H0 / q] / s²
where SS_H0 = SS1_Error − SS_Error = SS_Reg − SS1_Reg and s² = SS_Error/(n-p-1).
The test statistic, F, has an F-distribution with ν1 = q d.f. in the numerator and ν2 = n − p − 1 d.f. in the denominator if H0 is true.
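Here is a minimal sketch, not from the slides, of carrying out the two runs and forming the F statistic; the particular hypothesis (H0: β1 = 0, β2 = 0, so q = 2) and the synthetic data are assumptions for the demonstration.

```python
# A rough sketch (not from the slides) of the two-run F test: fit the complete
# and reduced models and compare residual sums of squares.
import numpy as np
from scipy import stats

def fit_rss(design, y):
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.sum((y - design @ beta) ** 2))

rng = np.random.default_rng(2)
n, p, q = 60, 3, 2
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 2] + rng.normal(size=n)     # X1 and X2 truly have no effect

complete = np.column_stack([np.ones(n), X])          # b0 + b1*X1 + b2*X2 + b3*X3
reduced = np.column_stack([np.ones(n), X[:, 2]])     # model implied by H0

ss_error = fit_rss(complete, y)                  # residual SS, complete model
ss_h0 = fit_rss(reduced, y) - ss_error           # SS_H0 = SS1_Error - SS_Error
s2 = ss_error / (n - p - 1)
F = (ss_h0 / q) / s2
p_value = stats.f.sf(F, q, n - p - 1)            # reject H0 if p_value < alpha
print(F, p_value)
```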

Distribution when H0 is true

The Critical Region: Reject H0 if F > Fα(q, n − p − 1).

The ANOVA Table for the Test:
Source                               df       Sum of Squares   Mean Square                  F
Regression (for the reduced model)   p-q      SS1_Reg          [1/(p-q)]SS1_Reg = MS1_Reg   MS1_Reg/s²
Departure from H0                    q        SS_H0            (1/q)SS_H0 = MS_H0           MS_H0/s²
Residual (Error)                     n-p-1    SS_Error         s²
Total                                n-1      SS_Total

Some Examples: Four independent variables X1, X2, X3, X4.
The Complete Model: Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε

1) a) H0: β3 = 0, β4 = 0 (q = 2)
   b) The Reduced Model: Y = β0 + β1X1 + β2X2 + ε
      Dependent Variable: Y; Independent Variables: X1, X2

2) a) H0: β3 = 4.5, β4 = 8.0 (q = 2)
   b) The Reduced Model: Y − 4.5X3 − 8.0X4 = β0 + β1X1 + β2X2 + ε
      Dependent Variable: Y − 4.5X3 − 8.0X4; Independent Variables: X1, X2

Example: Motor Vehicle example. Variables:
1. (Y) mpg – Mileage
2. (X1) engine – Engine size
3. (X2) horse – Horsepower
4. (X3) weight – Weight

Suppose we want to test H0: β1 = 0 against HA: β1 ≠ 0, i.e. engine size (engine) has no effect on mileage (mpg).
The Full model: Y = β0 + β1X1 + β2X2 + β3X3 + ε
               (mpg)    (engine)  (horse)  (weight)
The reduced model: Y = β0 + β2X2 + β3X3 + ε

The ANOVA Table for the Full model:

The reduction in the residual sum of squares = SS1_Error − SS_Error (the numeric values are shown on the slide). The ANOVA Table for the Reduced model:

The ANOVA Table for testing H0: β1 = 0 against HA: β1 ≠ 0

Now suppose we want to test H0: β1 = 0, β2 = 0 against HA: β1 ≠ 0 or β2 ≠ 0, i.e. engine size (engine) and horsepower (horse) have no effect on mileage (mpg).
The Full model: Y = β0 + β1X1 + β2X2 + β3X3 + ε
               (mpg)    (engine)  (horse)  (weight)
The reduced model: Y = β0 + β3X3 + ε

The ANOVA Table for the Full model

The reduction in the residual sum of squares = SS1_Error − SS_Error (the numeric values are shown on the slide). The ANOVA Table for the Reduced model:

The ANOVA Table for testing H0: β1 = 0, β2 = 0 against HA: β1 ≠ 0 or β2 ≠ 0

Testing the General Linear Hypothesis Another Example

In the following example, weight gain was measured along with the amount of protein in the diet due to the following sources: Beef, Pork, and two types of cereal.

Dependent Variable: Y = Weight Gain
Independent Variables:
X1 = the amount of protein in the diet due to the Beef source
X2 = the amount of protein in the diet due to the Pork source
X3 = the amount of protein in the diet due to the Cereal 1 source
X4 = the amount of protein in the diet due to the Cereal 2 source

The Multiple Linear model:
Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε
or
Weight Gain = β0 + β1(Beef) + β2(Pork) + β3(Cereal 1) + β4(Cereal 2) + ε

The weight gains are given in the table below (columns: case, Beef, Pork, Cereal 1, Cereal 2, Weight Gain; the numeric values are shown on the slide).

The summary statistics of the regression computation are given below:
Regression Statistics: Multiple R, R Square, Adjusted R Square, Standard Error (values shown on the slide); Observations = 14.

The estimates of the regression coefficients and their standard errors are given below (columns: Coefficients, Standard Error, t Stat, P-value, Lower 95%, Upper 95%; rows: Intercept, X1, X2, X3, X4; the numeric values are shown on the slide).

ANOVA table (df, SS, MS, F, Significance F) with rows for Regression (df = 4), Residual (df = 9) and Total (df = 13); the numeric values are shown on the slide.

Note that βi is the rate of increase in weight gain due to an increase in protein from the given source. One would of course be interested in whether weight gain increased with protein for any of the sources. That is, testing the null hypothesis H0: β1 = 0, β2 = 0, β3 = 0 and β4 = 0 against the alternative hypothesis HA: at least one βi ≠ 0.

This can be achieved by using the ANOVA table below (df, SS, MS, F, Significance F; rows: Regression, Residual, Total; values shown on the slide).

[Figure: the F distribution, with the test statistic (F ratio) marked and the significance (p-value) shown as the upper-tail area.]

The F distribution describes the behaviour of the F statistic when H0 is true. If the associated p-value is small, H0 should be rejected in favour of HA. The usual cut-off values are α = .05 or α = .01.

However, one would also be interested in making more specific comparisons, namely comparing the effect on weight gain of the two meat sources and of the two cereal sources.

In this case we would be interested in testing the null hypothesis H0: β1 = β2, β3 = β4 against the alternative hypothesis HA: β1 ≠ β2 or β3 ≠ β4.

Then, assuming H0: β1 = β2, β3 = β4, the reduced model becomes Y = β0 + β1(X1 + X2) + β3(X3 + X4) + ε.
Dependent Variable: Y; Independent Variables: (X1 + X2) and (X3 + X4)
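The following is a minimal sketch, not from the slides, of fitting this reduced model by combining columns and comparing it to the complete model with the general-linear-hypothesis F test; the protein data are made up for the illustration (only the mechanics match the example).

```python
# A rough sketch (not from the slides): under H0: b1 = b2, b3 = b4 the reduced
# model uses the combined predictors (X1 + X2) and (X3 + X4).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 14
beef, pork, cer1, cer2 = rng.uniform(0, 20, size=(4, n))   # made-up protein amounts
gain = 30 + 0.8 * beef + 0.8 * pork + 0.3 * cer1 + 0.3 * cer2 + rng.normal(size=n)

def fit_rss(design, y):
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.sum((y - design @ beta) ** 2))

complete = np.column_stack([np.ones(n), beef, pork, cer1, cer2])
reduced = np.column_stack([np.ones(n), beef + pork, cer1 + cer2])   # combined columns

p, q = 4, 2
ss_error = fit_rss(complete, gain)
ss_h0 = fit_rss(reduced, gain) - ss_error
F = (ss_h0 / q) / (ss_error / (n - p - 1))
p_value = stats.f.sf(F, q, n - p - 1)
print(F, p_value)
```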

The ANOVA table for the reduced model (df, SS, MS, F, Significance F; rows: Regression, Residual, Total; values shown on the slide).

The ANOVA table for the complete model (df, SS, MS, F, Significance F; rows: Regression, Residual, Total; values shown on the slide).

The ANOVA table for carrying out the test (df, SS, MS, F, Significance F), with rows for the reduced-model regression, the departure from H0 (β1 = β2, β3 = β4), the Residual and the Total; the numeric values are shown on the slide.

DUMMY VARIABLES