Analysis of Variance: Some Review and Some New Ideas

Remember the concepts of variance and the standard deviation. The standard deviation (s) is the square root of the sum of the squared deviations from the mean divided by the number of cases, and the variance is the square of the standard deviation. See p. 47 in the text. We now want to use these concepts in regression analysis. We will be learning a new statistical test, the F test, which we will use to assess the statistical significance of a regression equation as a whole (not just its individual coefficients).

We will also use Analysis of Variance (ANOVA) to compare differences among more than two means, a comparison we have so far made (between two means) with a t test.

Equations
Mean: x̄ = Σx / n
Variance: s² = Σ(x − x̄)² / n
Standard deviation: s = √s²
Coefficient of variation: CV = s / x̄
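
The four equations on this slide can be sketched directly in code. This is a minimal illustration, not part of the lecture; the `scores` data are hypothetical, and the variance divides by n (the number of cases), matching the worked example later in the lecture.

```python
import math

def mean(xs):
    """Arithmetic mean: sum of the values divided by the number of cases."""
    return sum(xs) / len(xs)

def variance(xs):
    """Mean of the squared deviations from the mean (divides by n, as in the lecture)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def std_dev(xs):
    """Square root of the variance."""
    return math.sqrt(variance(xs))

def coeff_of_variation(xs):
    """Standard deviation expressed relative to the mean."""
    return std_dev(xs) / mean(xs)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data for illustration
print(mean(scores), variance(scores), std_dev(scores), coeff_of_variation(scores))
# 5.0 4.0 2.0 0.4
```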

Steps for calculating variance 1. Calculate the mean of a variable 2. Find the deviations from the mean: subtract the variable mean from each case 3. Square each of the deviations from the mean 4. The variance is the mean of the squared deviations from the mean, so sum the squared deviations from step 3 and divide by the number of cases. (When we did these steps before, we were interested in going on to calculate a standard deviation and coefficient of variation. Now we'll just stick with variance.)

Calculating Variance 1. Calculate the mean of a variable 2. Find the deviations from the mean: subtract the variable mean from each case

Calculating Variance, cont. 3. Square each of the deviations from the mean 4. The variance is the mean of the squared deviations from the mean, so sum the squared deviations from step 3 and divide by the number of cases. The sum of the squared deviations = 198.950; variance = 198.950/20 = 9.948
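
As a quick check of the slide's arithmetic (the 20 individual cases are not reproduced here, only the sum of squared deviations given on the slide):

```python
# The slide's worked example: 20 cases whose squared deviations from the
# mean sum to 198.950.
sum_sq_dev = 198.950
n = 20
variance = sum_sq_dev / n   # 9.9475, which the slide rounds to 9.948
```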

A New Concept: Sum of Squares The sum of the squared deviations from the mean is called the Sum of Squares. Remember that when we know nothing else about an interval variable, the best estimate of any case's value is the variable's mean. By extension, the sum of squares is the smallest sum of squared deviations we can achieve if we know nothing else about the variable. But when we have more information, for example in a statistically significant bivariate regression model, we can improve on the mean as an estimate of the dependent variable by using the information from the independent variable.

The regression equation is a better estimator of food costs than the mean of food costs.

Calculating Total Sum of Squares Multiply the variance by N − 1, so Total Sum of Squares = 8127.019 × (638 − 1). (The variance reported in the statistics table is the sample variance, which divides by N − 1, so multiplying by N − 1 recovers the sum of squared deviations.)

Statistics: TOTAL FOOD COSTS
N (Valid): 638
N (Missing): 0
Mean: 270.2310
Variance: 8127.019
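
The multiplication on this slide can be checked in one line; the figures are taken from the statistics table above.

```python
n = 638
sample_variance = 8127.019        # variance from the statistics table (divides by N - 1)
total_ss = sample_variance * (n - 1)
print(total_ss)                   # about 5,176,911 -- the TSS used later in the lecture
```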

Calculations for the Regression Sum of Squares The regression sum of squares equals the sum of the squared deviations between ŷ (predicted y) and ȳ (the mean of y): Regression SS = Σ(ŷ − ȳ)². Residual Sum of Squares = TSS − Regression SS.

Now we want to estimate how much better. To do that, we use the sum of squares calculations. We partition the total sum of squares (TSS), i.e., the sum of squared deviations from the mean, into two parts. The first part is the sum of squared deviations accounted for by the regression equation (the Regression Sum of Squares). The second part is the sum of squared deviations left over, i.e., not accounted for by the regression equation, or more formally, TSS − Regression Sum of Squares = the Residual Sum of Squares.
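
The partition described above can be sketched for a simple bivariate regression. The food-cost data are not available here, so the example uses small hypothetical data; the point is that the three sums of squares always satisfy TSS = Regression SS + Residual SS.

```python
def regression_partition(xs, ys):
    """Fit y = a + b*x by least squares and partition the total sum of squares."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    a = ybar - b * xbar
    yhat = [a + b * x for x in xs]
    tss = sum((y - ybar) ** 2 for y in ys)                     # total sum of squares
    reg_ss = sum((yh - ybar) ** 2 for yh in yhat)              # regression sum of squares
    resid_ss = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))   # residual sum of squares
    return tss, reg_ss, resid_ss

# hypothetical data for illustration
tss, reg_ss, resid_ss = regression_partition([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(tss, reg_ss, resid_ss)   # TSS equals Regression SS + Residual SS
```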

Now let’s look at what we’ve accomplished. To do that, we’ll calculate an F test. We need to add information about degrees of freedom. Remember the concept: how many values can one change and still calculate the statistic? If we want to know the mean, and we know the values, we can calculate the mean. If we know the mean, and we know all the values but one, we can calculate that last value. So there is 1 degree of freedom. For the F test, we need the degrees of freedom in the regression model. The formula is k − 1, where k is the number of parameters to be estimated. For the bivariate model, the parameters are a and b, so 2 − 1 = 1.

Degrees of freedom continued… For the Residual Sum of Squares, the degrees of freedom is N − k, so for this model, 638 − 2 = 636. We then calculate a mean square by dividing each sum of squares by its degrees of freedom. The F statistic is the regression mean square divided by the residual mean square. The probability of the F statistic is found in the F distribution table.
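
Putting the pieces together for the lecture's model (the regression and total sums of squares are the figures quoted on the R-square slide):

```python
reg_ss = 2070301.432            # regression sum of squares for the food-cost model
tss = 5176911.308               # total sum of squares for the food-cost model
resid_ss = tss - reg_ss         # residual sum of squares

df_reg = 2 - 1                  # k - 1, for the two parameters a and b
df_resid = 638 - 2              # N - k

ms_reg = reg_ss / df_reg        # regression mean square
ms_resid = resid_ss / df_resid  # residual mean square
f_stat = ms_reg / ms_resid      # roughly 424; compare to the F table with (1, 636) df
```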

Another Way to Think about R Square The Regression Sum of Squares divided by the Total Sum of Squares is the proportion of variance explained by the model. So 2070301.432/5176911.308 = .3999, or ~40%.
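
The division on this slide is a one-liner, using the sums of squares quoted above:

```python
r_squared = 2070301.432 / 5176911.308   # Regression SS / Total SS
print(round(r_squared, 4))              # 0.3999 -- about 40% of the variance explained
```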