26134 Business Statistics Week 5 Tutorial

Mahrita.Harahap@uts.edu.au
Multiple Linear Regression
Key concepts in this tutorial:
1. Multiple regression.
2. Interpreting parameter estimates (coefficients).
3. Hypothesis testing of the model.
4. Interpreting significance (t-statistics and F-statistics).
5. Interpreting R² and adjusted R².
6. Using multiple linear regression for prediction.

In statistics we usually want to analyse a population, but collecting data on the whole population is often impractical, too expensive or impossible. We therefore collect a sample from the population (sampling) and use statistics calculated from that sample to draw conclusions about the population parameters (inference), with a stated level of confidence (confidence level). A population is the collection of all possible individuals, objects or measurements of interest. A sample is a subset of the population of interest.

Multiple Linear Regression
A single metric (numerical) dependent variable modelled with two or more independent variables.
Regression Equation: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$, estimated from the sample as $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$.
Interpretation of coefficients: For a one-unit increase in Xi, Y increases (or decreases) by bi units on average, holding the other independent variables constant.
NOTE: The intercept b0 is the average value of Y when every independent variable equals zero, so its interpretation may be nonsensical when it is not reasonable for all the explanatory variables to be zero. Template: "When the x variables are zero, the response variable is ….." If zero is not within the sample range of the x variables, the intercept cannot be interpreted. Likewise, avoid using the regression equation to predict Y for x values far from those used to fit it (extrapolation).
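The sketch below applies a fitted equation $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$ to new observations. The coefficients (`B0`, `B`) and the sample ranges (`SAMPLE_RANGES`) are hypothetical numbers chosen only for illustration, not estimates from any real data set. The code shows the "holding the other variables constant" reading of a slope, and it refuses to predict outside the sample range so that it does not extrapolate.

```python
# Hypothetical fitted model: y-hat = b0 + b1*x1 + b2*x2
# (illustrative numbers only, not estimated from real data)
B0 = 5.0
B = [2.0, -0.5]  # b1, b2

# Hypothetical (min, max) of each x in the sample used to fit the model
SAMPLE_RANGES = [(10.0, 50.0), (0.0, 20.0)]


def predict(x, coefs=B, intercept=B0, ranges=SAMPLE_RANGES):
    """Return y-hat for one observation x = [x1, x2, ...].

    Raises ValueError if any x_i lies outside its sample range,
    because a prediction there would be an extrapolation.
    """
    if len(x) != len(coefs):
        raise ValueError("need one value per independent variable")
    for i, (xi, (lo, hi)) in enumerate(zip(x, ranges), start=1):
        if not lo <= xi <= hi:
            raise ValueError(f"x{i}={xi} is outside the sample range [{lo}, {hi}]")
    return intercept + sum(b * xi for b, xi in zip(coefs, x))


base = predict([20.0, 4.0])    # 5 + 2*20 - 0.5*4
bumped = predict([21.0, 4.0])  # x1 up by one unit, x2 held constant
print(base, bumped, bumped - base)  # 43.0 45.0 2.0 (the change equals b1)
```

In this hypothetical sample x1 never falls below 10, so zero lies outside its range. The intercept b0 = 5.0 would therefore not be interpreted. It still enters every prediction.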

Hypothesis Testing
We use hypothesis testing to draw conclusions about population parameters by analysing sample statistics. In statistics, a hypothesis is a statement about a population parameter.
Null hypothesis, H0: a claim about a population parameter that is initially assumed to be true. It always contains an equality (e.g. H0: β1 = 0).
Alternative hypothesis, H1: the competing claim, i.e. what we are trying to find evidence for (e.g. H1: β1 ≠ 0).
Test Statistic: a measure of how compatible the data are with the claim in the null hypothesis.
Decision Criteria: The p-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the observed sample value, assuming H0 is true. Each test statistic has a corresponding p-value. At the 0.05 significance level: if p-value ≤ 0.05, reject H0; if p-value > 0.05, do not reject H0.
Conclusion: State the conclusion in the context of the problem.
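The decision rule can be written as a small function. This is a minimal sketch that assumes the 0.05 significance level used throughout this tutorial. The p-values in the examples are hypothetical.

```python
ALPHA = 0.05  # significance level used in this tutorial


def decide(p_value, alpha=ALPHA):
    """Apply the p-value decision rule: reject H0 when p-value <= alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    return "reject H0" if p_value <= alpha else "do not reject H0"


print(decide(0.003))  # reject H0
print(decide(0.05))   # reject H0 (boundary case: p-value equal to alpha)
print(decide(0.27))   # do not reject H0
```

The function only produces the statistical decision. The conclusion still has to be written in the context of the problem.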

Assessing significance of individual independent variables
H0: βi = 0 (no linear relationship between xi and y)
H1: βi ≠ 0 (a linear relationship exists between xi and y)
Test Statistic: $t = \frac{b_i - 0}{s_{b_i}}$, where $s_{b_i}$ is the standard error of $b_i$, with $n - k - 1$ degrees of freedom (n = sample size, k = number of independent variables). The t-test indicates the significance of an individual independent variable, given the other variables in the model.
Decision Criteria: If p-value ≤ 0.05, reject H0. If p-value > 0.05, do not reject H0.
Conclusion: If p-value ≤ 0.05, reject H0: there is significant evidence that βi is not equal to zero. Thus xi is linearly related to the dependent variable, holding the other independent variables constant. If p-value > 0.05, do not reject H0: there is insufficient evidence that βi is not equal to zero. Thus we cannot conclude that xi is linearly related to the dependent variable.
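The t-statistic for one coefficient is computed as follows. This is a sketch. The function name `t_test_coefficient` and the example estimates and standard errors are hypothetical. The standard library has no t distribution, so the p-value here is a large-sample normal approximation (`statistics.NormalDist`). Statistical software would use the exact t distribution with n − k − 1 degrees of freedom, so treat the p-value as approximate when the degrees of freedom are small.

```python
from statistics import NormalDist


def t_test_coefficient(b_i, se_b_i, n, k):
    """t-test of H0: beta_i = 0 against H1: beta_i != 0.

    b_i    : estimated coefficient
    se_b_i : its standard error
    n, k   : sample size and number of independent variables
    Returns (t_stat, df, approx_p_value); the p-value uses the standard
    normal in place of the t distribution (large-sample approximation).
    """
    t_stat = (b_i - 0) / se_b_i
    df = n - k - 1
    approx_p = 2 * (1 - NormalDist().cdf(abs(t_stat)))  # two-sided
    return t_stat, df, approx_p


# Hypothetical output for one slope: b = 2.0 with standard error 0.5
t, df, p = t_test_coefficient(2.0, 0.5, n=100, k=2)
print(t, df, "significant" if p <= 0.05 else "not significant")

# Hypothetical second slope: b = 0.3 with standard error 0.25
_, _, p2 = t_test_coefficient(0.3, 0.25, n=100, k=2)
print("significant" if p2 <= 0.05 else "not significant")
```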

Assessing significance of model
H0: β1 = β2 = … = βk = 0 (no linear relationship)
H1: at least one βi ≠ 0 (at least one independent variable is linearly related to y)
Test Statistic: $F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n - k - 1)}$, with k and n − k − 1 degrees of freedom. When reporting the F-test, also mention the adjusted R².
Decision Criteria: If p-value ≤ 0.05, reject H0. If p-value > 0.05, do not reject H0.
Conclusion: If p-value ≤ 0.05, reject H0: there is significant evidence that at least one βi is not equal to zero. Thus at least one independent variable is linearly related to y, so the model has some validity and is useful. If p-value > 0.05, do not reject H0: there is insufficient evidence that any βi is not equal to zero, so the model has not been shown to be valid or useful.

R²            F        Assessment of model
1             ∞        Perfect
Close to 1    Large    Good
Close to 0    Small    Poor
0             0        Useless
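The table above follows from a rearrangement of the F-statistic in terms of R². For a least-squares model with an intercept, $F = \frac{R^2/k}{(1 - R^2)/(n - k - 1)}$. The sketch below uses that identity. The function name and the example values of R², n and k are hypothetical.

```python
def f_statistic(r2, n, k):
    """Overall F-statistic for H0: beta_1 = ... = beta_k = 0.

    Uses F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), which equals
    MSR / MSE for a least-squares model with an intercept.
    """
    if k < 1:
        raise ValueError("need at least one independent variable")
    if n - k - 1 <= 0:
        raise ValueError("need more observations than parameters (n > k + 1)")
    if not 0.0 <= r2 < 1.0:
        raise ValueError("need 0 <= R^2 < 1 (R^2 = 1 corresponds to F = infinity)")
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Hypothetical models, each with n = 33 observations and k = 2 predictors
print(round(f_statistic(0.60, n=33, k=2), 2))  # 22.5: large F, R^2 well above 0
print(round(f_statistic(0.02, n=33, k=2), 2))  # 0.31: small F, R^2 close to 0
print(f_statistic(0.0, n=33, k=2))             # 0.0: the "Useless" row
```

F rises with R² for fixed n and k. This is why the two columns of the table move together.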

R² and Adjusted R²
R² is a number between 0 and 1 that measures the proportion of the variation in the dependent variable explained by all the independent variables together. R² never decreases when an independent variable is added to the model, whether or not that variable contributes to the overall fit, so it can give a misleading assessment of the model. The adjusted R² corrects for this by taking into account the number of independent variables in the model and the sample size. In plain terms, it tells us how useful the model is once its size is accounted for.
Interpretation: …% of the variation in the dependent variable is explained by variation in the independent variables, taking into account the sample size and the number of independent variables.
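The sketch below computes both measures from the usual formulas, $R^2 = 1 - SSE/SST$ and $\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - k - 1}$. The sums of squares, sample size and numbers of predictors are hypothetical. The second model adds eight extra predictors that raise R² only slightly, and its adjusted R² falls.

```python
def r_squared(sse, sst):
    """R^2 = 1 - SSE/SST: share of the variation in y explained by the model."""
    return 1 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1); penalises extra predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical model: SSE = 40, SST = 100, n = 33 observations, k = 2 predictors
r2 = r_squared(sse=40.0, sst=100.0)
adj = adjusted_r_squared(r2, n=33, k=2)
print(f"{adj:.1%} of the variation in y is explained by the x variables (adjusted)")

# Hypothetical larger model: R^2 creeps up to 0.62 with k = 10 predictors
adj_big = adjusted_r_squared(0.62, n=33, k=10)
print(adj_big < adj)  # True: higher R^2, lower adjusted R^2
```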