STAT E-150 Statistical Methods


STAT E-150 Statistical Methods Multiple Regression

Three percent of a man's body is essential fat, which is necessary for a healthy body. However, too much body fat can be dangerous. For men between the ages of 18 and 39, a healthy body fat percent is 8% to 19%. (For women it is 21% to 32%.) It is not easy to measure body fat percent, but we can find a model for the relationship between body fat percent and waist size and use it to find the body fat percent associated with a given waist size.

The scatterplot indicates a positive linear relationship between waist size and body fat percent:

The SPSS output shows a significant linear relationship between the two variables. R2 = .678, so we know that almost 68% of the variability in the body fat percentage is accounted for by the waist size.

Coefficients (a)

Model        B         Std. Error   Beta    t         Sig.
(Constant)   -42.734   2.717                -15.731   .000
Waist        1.700     .074         .824    22.875

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.
a. Dependent Variable: Pct BF

Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .824   .678       .677                4.7126

What other variables might be used to predict body fat percentage? Can we improve the prediction by including additional variables?

The Multiple Linear Regression Model

We have n observations on k explanatory variables X1, X2, X3, …, Xk and a response variable, Y. The multiple regression model is:

Y = β0 + β1x1 + β2x2 + … + βkxk + ε

where ε ~ N(0, σε) and the errors are independent of one another.

The predictor variables may be higher powers or other functions of quantitative variables, coded categorical variables, or interaction terms. The main restriction is that the model is linear; that is, each term is a constant multiple of a predictor.

Fitting a Multiple Linear Regression Model

As we did in Simple Linear Regression, we will choose a possible set of predictors, estimate the coefficients based on sample data, and assess the fit. We will again use the sum of squared residuals, where the residuals are the differences between the actual Y values and the Y values predicted by the prediction equation. We will use SPSS to determine the estimates of the coefficients βi that minimize the sum of the squared residuals.
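To make the least squares idea concrete, here is a minimal sketch in plain Python of what the software does when it estimates the coefficients. It is not the SPSS procedure itself: the function names are hypothetical, the data are made up (they are not the body fat data), and the estimates are found by solving the normal equations, which is one standard way to minimize the sum of squared residuals.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copied so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def residuals(coefs, rows, y):
    """Actual y minus the y predicted by the fitted equation."""
    return [yi - (coefs[0] + sum(b * x for b, x in zip(coefs[1:], r)))
            for r, yi in zip(rows, y)]


def sum_squared_residuals(coefs, rows, y):
    return sum(e * e for e in residuals(coefs, rows, y))


# Made-up data: five observations on two predictors (x1, x2).
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y = [3.1, 6.8, 6.2, 11.1, 8.9]

b = fit_least_squares(rows, y)
print("estimates:", [round(v, 3) for v in b])
print("sum of squared residuals:", round(sum_squared_residuals(b, rows, y), 4))
```

Changing any one of the estimates, even slightly, gives a larger sum of squared residuals, which is what "least squares" means.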

We will test the hypotheses

H0: β1 = β2 = β3 = … = βk = 0
Ha: The slopes are not all zero.

Our assumptions are:
- The y-values are independent of each other
- Y has a constant variance for any combination of predictors
- The values of y are normally distributed for any fixed set of values for the explanatory variables

That is, the errors are independent values from a N(0, σε) distribution.

If the null hypothesis is rejected, we then test a null hypothesis for each of the coefficients:

H0: βj = 0
Ha: βj ≠ 0

Note: If the null hypothesis for a coefficient is not rejected, it does not mean that the corresponding predictor variable has no relationship to y; it means that the predictor variable contributes nothing to modeling y after allowing for all the other predictors.

The hypotheses for fitting a multiple linear regression model to predict body fat percentage based on waist size and height are

H0: βheight = βwaist = 0
Ha: The slopes are not both zero.

Here are the scatterplots using the individual predictors. Although they suggest a linear relationship between waist size and body fat percentage, there doesn't appear to be a linear relationship between height and body fat percentage.

Here are some of the results for a multiple regression analysis with both height and waist as predictors:

Coefficients (a)

Model        B        Std. Error   Beta    t        Sig.
(Constant)   -3.110   7.687                -.405    .686
Waist        1.773    .072         .859    24.768   .000
Height       -.601    .110         -.190   -5.470   .000

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.
a. Dependent Variable: Pct BF

The p-value for height is close to 0, so we know that height does contribute to the multiple regression model.

The graph shown below is called a scatterplot matrix. It shows the scatterplots for all pairs of the variables we are using.

Which pair of variables shows a strong linear relationship? Pct BF and Waist
Which pair of variables shows a weak linear relationship? Height and Waist
Which pair of variables shows no linear relationship? Pct BF and Height

Residual Analysis

These plots tell us that there is no particular pattern in the scatter of the residuals, and that the distribution of the residuals is close to normal.

Use the SPSS output provided to answer the questions below:

Coefficients (a)

Model        B        Std. Error   Beta    t        Sig.
(Constant)   -3.110   7.687                -.405    .686
Waist        1.773    .072         .859    24.768   .000
Height       -.601    .110         -.190   -5.470   .000

a. Dependent Variable: Pct BF

Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .845 (a)   .713       .711                4.4598

a. Predictors: (Constant), Height, Waist
b. Dependent Variable: Pct BF

What is the fitted regression equation?

%BodyFat = 1.773 waist - .601 height - 3.110

%BodyFat = 1.773 waist - .601 height - 3.110

What does the value 1.773 tell you? An increase of one inch in the waist measurement is associated with an increase of 1.773 in body fat percentage for men of a particular height.

What change in Body Fat Percentage is associated with each additional inch of height? An increase of one inch of height is associated with a decrease of .601 in body fat percentage for men of a particular waist size.
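The two interpretations can be checked directly from the fitted equation. The sketch below uses the coefficients from the SPSS Coefficients table; the function name and the example man (38-inch waist, 70 inches tall) are hypothetical and chosen only for illustration.

```python
# Coefficients taken from the SPSS Coefficients table (dependent variable: Pct BF).
INTERCEPT = -3.110
B_WAIST = 1.773
B_HEIGHT = -0.601


def predict_pct_body_fat(waist_in, height_in):
    """Predicted body fat percentage for a waist and height given in inches."""
    return INTERCEPT + B_WAIST * waist_in + B_HEIGHT * height_in


# Hypothetical man: 38-inch waist, 70 inches tall.
base = predict_pct_body_fat(38, 70)
print("predicted % body fat:", round(base, 3))

# One more inch of waist at the same height changes the prediction by the waist slope.
print("effect of +1 inch waist:", round(predict_pct_body_fat(39, 70) - base, 3))

# One more inch of height at the same waist size changes it by the height slope.
print("effect of +1 inch height:", round(predict_pct_body_fat(38, 71) - base, 3))
```

Each slope describes the change in the prediction when that one predictor changes and the other predictor is held fixed.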

What is the value of R2? What does it tell you?

R2 = .713, which tells us that height and waist size together account for about 71.3% of the variation in the body fat percentage for men.
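R2 can also be recovered from the sums of squares that SPSS reports in the ANOVA table for this model (Regression 12216.077, Residual 4912.743, Total 17128.820). A short check, with variable names that are my own:

```python
# Sums of squares from the SPSS ANOVA table for the model with Waist and Height.
ss_regression = 12216.077
ss_residual = 4912.743
ss_total = 17128.820

# The regression and residual sums of squares add up to the total.
print(round(ss_regression + ss_residual, 3))

# R squared is the share of the total variation accounted for by the model.
r_squared = ss_regression / ss_total
print(round(r_squared, 3))
```

The second value agrees with the R Square entry in the Model Summary table.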

Use the SPSS results to complete the hypothesis test:

The value of the test statistic is: F = 307.096
p = 0+

What can you conclude? Since p is close to zero, the null hypothesis is rejected. These data indicate that there is a linear relationship between body fat percentage and the predictor variables Waist and Height.
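The test statistic is the ratio of the two mean squares in the SPSS ANOVA table. The sketch below recomputes it from the reported sums of squares; n = 250 is an inference from the total degrees of freedom (249), not a value stated on the slides. The p-value itself needs an F-distribution routine, which is left out here.

```python
# Values from the SPSS ANOVA table; n is inferred from df(Total) = n - 1 = 249.
n = 250
k = 2  # predictors: Waist and Height
ss_regression = 12216.077
ss_residual = 4912.743

ms_regression = ss_regression / k           # df = k
ms_residual = ss_residual / (n - k - 1)     # df = n - k - 1
f_statistic = ms_regression / ms_residual

print("MS regression:", round(ms_regression, 3))
print("MS residual:", round(ms_residual, 3))
print("F:", round(f_statistic, 3))
```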

We also want to estimate the standard deviation of the error term, σε. As we add a new predictor to the model, we have a new coefficient to estimate, and so we lose one more degree of freedom. The estimate for the standard error of the multiple regression model with k predictors is

σ̂ε = √( SSE / (n − k − 1) )

where SSE is the sum of the squared residuals.

Use the SPSS output to find the standard error of this regression model: in the Model Summary table, the Std. Error of the Estimate is 4.4598.
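The standard error of the regression is the square root of SSE / (n − k − 1), so it can be checked against the Residual row of the SPSS ANOVA table. In this sketch n = 250 is again inferred from the total degrees of freedom, and the variable names are my own.

```python
import math

# Residual sum of squares from the SPSS ANOVA table.
sse = 4912.743
n = 250  # inferred from df(Total) = 249
k = 2    # predictors: Waist and Height

standard_error = math.sqrt(sse / (n - k - 1))
print(round(standard_error, 4))
```

The result agrees with the Std. Error of the Estimate in the Model Summary table.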

Assessing a Multiple Regression Model

Individual t-Tests for Coefficients in Multiple Regression

In order to determine whether any one of the predictor variables is helpful to include in the model, we test the coefficient for that predictor:

H0: βi = 0
Ha: βi ≠ 0

The test statistic is

t = β̂i / SE(β̂i)

with n - k - 1 degrees of freedom, where β̂i is the estimated coefficient and SE(β̂i) is its standard error.

It is important to remember that the meaning of each coefficient depends on all of the predictors in the regression model. If we fail to reject the null hypothesis, it means that the corresponding predictor variable contributes nothing to the multiple regression model after allowing for all other predictors. 
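Each t statistic is the estimated coefficient divided by its standard error. The sketch below applies this to the values printed in the SPSS Coefficients table; because those printed values are rounded, the ratios come out close to, but not exactly equal to, the t values SPSS reports (24.768 and -5.470). The function name is hypothetical, and n = 250 is inferred from the total degrees of freedom.

```python
def t_statistic(estimate, std_error):
    """t = estimated coefficient / its standard error."""
    return estimate / std_error


# B and Std. Error as printed (rounded) in the Coefficients table.
t_waist = t_statistic(1.773, 0.072)
t_height = t_statistic(-0.601, 0.110)

# Degrees of freedom: n - k - 1, with n = 250 inferred and k = 2 predictors.
degrees_of_freedom = 250 - 2 - 1

print("t for Waist:", round(t_waist, 3))
print("t for Height:", round(t_height, 3))
print("df:", degrees_of_freedom)
```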

Use the SPSS output to test the coefficients in our model:

H0: βheight = 0
Ha: βheight ≠ 0

t = -5.47
p = 0+

What is your conclusion? Since p is close to 0, we will reject the null hypothesis. There is evidence that the percent of body fat is related to the height. We can conclude that the body fat percentage changes as the height changes, for men with the same waist size.

H0: βwaist = 0
Ha: βwaist ≠ 0

t = 24.768
p = 0+

What is your conclusion? Since p is close to 0, we will reject the null hypothesis. There is evidence that the percent of body fat is related to the waist size. We can conclude that the body fat percentage changes as the waist size changes, for men of the same height.

Can we do a one-tailed test? Yes. Because the estimated slope is positive, in the direction of Ha, the one-tailed p-value is half of the two-tailed p-value that SPSS reports.

H0: βwaist = 0
Ha: βwaist > 0

t = 24.768
p = .000/2 = 0+

What is your conclusion? Since p is close to 0, we will reject the null hypothesis. There is evidence that the percent of body fat is positively related to the waist size. We can conclude that the body fat percentage increases as the waist size increases, for men of the same height.
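The halving rule only applies when the t statistic falls on the side named in Ha. A small sketch of that logic follows; the function name is hypothetical, and the numbers in the usage lines are invented for illustration, since SPSS prints the p-value here only as .000.

```python
def one_tailed_p(t_stat, two_tailed_p, alternative):
    """Convert a two-tailed p-value for a slope to a one-tailed p-value.

    alternative is "greater" for Ha: beta > 0 or "less" for Ha: beta < 0.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    in_direction = t_stat > 0 if alternative == "greater" else t_stat < 0
    half = two_tailed_p / 2
    # If t points the wrong way, the one-tailed p-value is large, not small.
    return half if in_direction else 1 - half


# Invented example values, not taken from the SPSS output.
print(one_tailed_p(2.1, 0.04, "greater"))
print(one_tailed_p(2.1, 0.04, "less"))
```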

Adjusted R2

The adjusted R2 is an adjustment to R2 that takes the sample size and the number of parameters (βj) into consideration. R2 never decreases as more predictors are added to the model, but the adjusted R2 increases only when a new predictor improves the fit by enough to offset the lost degree of freedom, and so it can be useful in comparing regression models with different numbers of predictor variables.
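One common form of the adjustment is adjusted R2 = 1 − (1 − R2)(n − 1)/(n − k − 1). The sketch below applies it to the two models in this example, using the R Square values from the SPSS Model Summary tables; n = 250 is inferred from the total degrees of freedom in the ANOVA table, and the function name is my own.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R squared for a model with k predictors fit to n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


n = 250  # inferred from df(Total) = 249

# Waist only (k = 1) versus Waist and Height (k = 2).
adj_waist_only = adjusted_r_squared(0.678, n, 1)
adj_waist_height = adjusted_r_squared(0.713, n, 2)

print(round(adj_waist_only, 3))
print(round(adj_waist_height, 3))
```

Both values match the Adjusted R Square entries SPSS reports, and the larger value for the second model suggests that adding height is worth the extra degree of freedom.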

Creating a Scatterplot Matrix

Click on Graphs > Chart Builder.
Select Scatter/Dot from the list of charts.
Drag the Scatterplot Matrix to the window.
Drag the matrix variables to the horizontal axis.
Click on OK. The scatterplot matrix will appear in the Output Viewer.

Estimating the Model

Click on Analyze > Regression > Linear.
Drag the dependent variable and all independent variables to the appropriate locations.
Click on OK.

This will produce several tables:

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .845 (a)   .713       .711                4.4598

a. Predictors: (Constant), Waist, Height

Coefficients (a)

Model        B        Std. Error   Beta    t        Sig.
(Constant)   -3.110   7.687                -.405    .686
Height       -.601    .110         -.190   -5.470   .000
Waist        1.773    .072         .859    24.768   .000

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.
a. Dependent Variable: Pct BF

ANOVA (b)

             Sum of Squares   df    Mean Square   F         Sig.
Regression   12216.077        2     6108.038      307.096   .000 (a)
Residual     4912.743         247   19.890
Total        17128.820        249

a. Predictors: (Constant), Waist, Height
b. Dependent Variable: Pct BF

If you click on Plots in the Linear Regression dialog box, you will get this dialog box:

Plot the *ZRESID values on the Y axis against the *ZPRED values on the X axis. You may also choose to create a Normal Probability Plot and/or histogram of the residuals.

Click on Continue and then OK. Here are the results: