SADC Course in Statistics Multiple Linear Regression: Further issues and anova results (Session 07)



Slide 2: Learning Objectives

At the end of this session, you will be able to:
- appreciate the requirements and limitations of variables used in a multiple regression
- recognise the dependence of anova results on the order of fitting variables
- interpret anova results when terms are fitted sequentially
- understand the difference in interpretation between t-probabilities and anova F-probabilities when there are 2 or more xs

Slide 3: The crimes example again!

Recall that in the example relating the number of acts regarded as crimes to age, college years and parents' income, the college variable was non-significant.

Although it is a quantitative variable, college had only 3 possible values! This is NOT a problem, since college is an x-variable and there were many observations at each of these values.

It would be a problem if the y-variable had only a few distinct values: the normality assumption would then be violated.

Slide 4: Points to note about the variables

In the regression analyses considered so far:
1. The y-variable is a quantitative measurement, assumed to have an approximately normal distribution.
2. The x-variables are quantitative variates, each contributing 1 d.f. to the model.

However, some xs could be categorical factors, each contributing d.f. = (number of levels - 1) to the model. The latter case will be discussed later!
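As a small preview of the d.f. rule for categorical factors, here is a minimal sketch in Python (not the course's statistical package). The factor `region`, its levels and the helper `indicator_columns` are all hypothetical, invented for illustration. A factor is represented by one 0/1 indicator column per level, with one baseline level left out, so a factor with 3 levels adds 2 columns (2 d.f.) to the model.

```python
def indicator_columns(values):
    """Return one 0/1 indicator column per non-baseline level of a factor.

    The first level in sorted order is treated as the baseline and gets
    no column of its own (an arbitrary choice made for this sketch).
    """
    levels = sorted(set(values))
    others = levels[1:]  # every level except the baseline
    return {level: [1 if v == level else 0 for v in values] for level in others}


# Hypothetical factor with 3 levels: east, north, south.
region = ["north", "south", "east", "south", "north", "east"]

dummies = indicator_columns(region)
factor_df = len(dummies)  # number of levels - 1

print("indicator columns:", sorted(dummies))
print("d.f. contributed by the factor:", factor_df)
```

A quantitative x, by contrast, enters as a single column and so contributes 1 d.f.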

Slide 5: But care is sometimes needed…

If an x-variable has only a few values, pay attention to the number of observations at each value.

In practical 6, the variable empl was highly significant (p=0.006). The residual plot looked OK, apart from one outlier (where just 1 household (HH) had 3 employed members).

But… will empl remain significant if the outlier is removed?

Slide 6: Results after deleting outlier

lnexpdf |  Coef.  Std. Err.  t  P>|t|
--------+----------------------------
hhsize  |
empl    |
const.  |

Note that empl is now non-significant! It is dangerous to use a model whose conclusions depend on just 1 observation!
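The effect of a single observation can be illustrated with a minimal Python sketch (not the course's statistical package). The data are made up and are not the practical 6 data: x takes only the values 0 to 3, and just one observation has x = 3. For brevity the sketch fits a simple regression with one x, whereas the model on the slide has two; the helper `slope_and_t` is hypothetical.

```python
def slope_and_t(x, y):
    """Least-squares slope of y on x and its t-statistic (simple regression)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    resid_ms = (syy - slope * sxy) / (n - 2)  # residual mean square
    return slope, slope / (resid_ms / sxx) ** 0.5


# Hypothetical data: the last point is the only one with x = 3.
x = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]
y = [5, 6, 4, 5, 6, 5, 6, 4, 5, 12]

slope_all, t_all = slope_and_t(x, y)
slope_drop, t_drop = slope_and_t(x[:-1], y[:-1])  # refit without that point

print("all observations:    slope =", round(slope_all, 3), " t =", round(t_all, 3))
print("without x = 3 point: slope =", round(slope_drop, 3), " t =", round(t_drop, 3))
```

In this constructed example the whole apparent relationship comes from the single x = 3 point, which is the situation the slide warns about.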

Slide 7: ANOVA for 2-variables (sequential)

We return again to the crimes example to show the effect of the order of fitting terms.

Source   |  df  Seq.SS  MS  F  Prob>F
---------+---------------------------
age      |
college  |
Residual |
Total    |

Here, age is fitted first, then college, hence the F-probs need to be interpreted accordingly.

Slide 8: ANOVA for 2-variables (sequential)

Consider now the anova with the order of fitting terms changed…

Source   |  df  Seq.SS  MS  F  Prob>F
---------+---------------------------
college  |
age      |
Residual |
Total    |

Here, college is fitted first, then age. Note the change in F-probs from the previous slide. Why is this?
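One way to see why the F-probs change is to compute sequential sums of squares by hand. The sketch below is in Python rather than the course's statistical package, and uses small made-up data: `x1`, `x2` and `y` are hypothetical, not the crimes data, and the helpers `rss` and `sequential_ss` are invented for illustration. Each sequential SS is the drop in residual SS when a term is added to those already fitted, so when the xs are correlated it depends on what was fitted before.

```python
def rss(y, columns):
    """Residual SS from least-squares regression of y on an intercept
    plus the given predictor columns (normal equations, Gauss-Jordan)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    return sum((y[i] - sum(b * v for b, v in zip(beta, X[i]))) ** 2
               for i in range(n))


def sequential_ss(y, columns):
    """SS explained by each term when terms are added in the given order."""
    ss, fitted, previous = [], [], rss(y, [])
    for col in columns:
        fitted.append(col)
        current = rss(y, fitted)
        ss.append(previous - current)
        previous = current
    return ss


# Hypothetical data: x1 and x2 are strongly correlated with each other.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3, 4, 8, 7, 12, 11, 15, 17]

seq_x1_first = sequential_ss(y, [x1, x2])  # SS(x1), then SS(x2 | x1)
seq_x2_first = sequential_ss(y, [x2, x1])  # SS(x2), then SS(x1 | x2)

print("x1 then x2:", [round(s, 3) for s in seq_x1_first])
print("x2 then x1:", [round(s, 3) for s in seq_x2_first])
```

Both orders give the same total model SS, but they split it differently between the two terms: whichever variable is fitted first is credited with the variation the two xs share.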

Slide 9: Discussion…

- What is the same and what is different across slides 7 and 8 above?
- Order of fitting seems to matter! What do the results mean?
- How do the F-probs from above and the t-probs below for model estimates compare?

crimes  |  Coef.  P>|t|
--------+--------------
age     |
college |
const.  |

Slide 10: Exercise: 2nd example: Q2, Pract. 6

Open penrain.dta from Q2 of the previous practical. Note down the anova results below from a regression of rain on elevation, then altitude.

Source     d.f.  S.S.  M.S.  F  Prob.
Elevation    1
Altitude     1
Residual    13
Total       15

Interpretation of F-probs:

Slide 11: Changing order of fitting

Now fit altitude, then elevation. Note down the results below.

Source     d.f.  S.S.  M.S.  F  Prob.
Altitude     1
Elevation    1
Residual    13
Total       15

Interpretation of F-probs:

Slide 12: Model parameter estimates

Finally, note down the parameter estimates and the corresponding t-probabilities:

Parameter   Estimate of model parameter   t-Prob.
Altitude
Elevation
Constant

Overall conclusions:

Slide 13: Adjusted sums of squares

Some software packages present adjusted sums of squares, combining results from the anova tables in slides 10 and 11 into a single anova:

Source     df  Adj. SS  Adj. MS  F  Prob.
Altitude
Elevation
Residual
Total

Note that the sums of squares now do not add to the total S.S. What do the F-probabilities now represent?
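A minimal Python sketch (not the course's statistical package) may help with the question above. It assumes that "adjusted" means the SS for a term when it is fitted last, after all other terms, which is how the slide describes combining the two sequential tables. The data `x1`, `x2` and `y` are made up, not the penrain data, and the closed-form expressions apply only to a model with exactly two xs.

```python
def sp(u, v):
    """Corrected sum of products of u and v (sum of squares if u is v)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


# Hypothetical data: two correlated xs and a response.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3, 4, 8, 7, 12, 11, 15, 17]
n = len(y)

s11, s22, s12 = sp(x1, x1), sp(x2, x2), sp(x1, x2)
s1y, s2y, syy = sp(x1, y), sp(x2, y), sp(y, y)
det = s11 * s22 - s12 ** 2

# Least-squares slopes and SS for the model containing both xs.
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
model_ss = b1 * s1y + b2 * s2y
resid_ms = (syy - model_ss) / (n - 3)  # residual d.f. = n - 3

# Adjusted SS: what each term adds when it is fitted after the other.
adj_ss_x1 = model_ss - s2y ** 2 / s22
adj_ss_x2 = model_ss - s1y ** 2 / s11

# F from the adjusted SS (1 d.f.) versus the t-statistic for the slope of x1.
f_x1 = adj_ss_x1 / resid_ms
t_x1 = b1 / (resid_ms * s22 / det) ** 0.5

print("model SS:", round(model_ss, 3))
print("adjusted SS (x1, x2):", round(adj_ss_x1, 3), round(adj_ss_x2, 3))
print("F for x1:", round(f_x1, 3), " t squared for x1:", round(t_x1 ** 2, 3))
```

With correlated xs the two adjusted SS add up to less than the model SS, because the shared variation is credited to neither term. For a 1 d.f. term the adjusted F equals the square of that term's t-statistic, so the adjusted F-probability and the t-probability test the same thing: whether the term is needed given all the others.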

Slide 14: Key Points

- Recognise the type of variable (y) being modelled. The methods discussed apply when y is quantitative.
- The explanatory variables (the xs) can be variables of any type, but so far we have only considered quantitative xs.
- Take care when interpreting anova F-probs: check whether the sums of squares are sequential or adjusted.
- Note that all t-probabilities (associated with the parameter estimates) are adjusted for all other terms in the model.

Slide 15: Practical work follows to ensure learning objectives are achieved…