Applied Quantitative Analysis and Practices LECTURE#30 By Dr. Osman Sadiq Paracha.

Previous Lecture Summary
- SPSS application for different methods of multiple regression
- Formulating the multiple regression equation
- Interpreting beta values
- Reporting the model
- Assumptions underlying multiple regression

How to Interpret Beta Values
Beta values: the change in the outcome associated with a unit change in the predictor.
Standardised beta values: the same, but expressed in standard deviation units.

Beta Values
b1 is the coefficient for advertising: as advertising increases by £1, album sales increase by b1 units.
b2 is the coefficient for airplay: each additional play of a song on the radio per week increases its sales by b2 units.
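The regression behind these interpretations can be sketched with ordinary least squares on simulated stand-in data (the lecture's actual album-sales dataset is not reproduced here, so the variable names, data-generating values, and fitted coefficients below are illustrative, not the slide's figures):

```python
import numpy as np

# Hypothetical data standing in for the album-sales example: advertising
# budget and weekly radio plays predicting album sales.
rng = np.random.default_rng(42)
n = 200
adverts = rng.uniform(0, 2000, n)      # advertising budget (assumed £)
airplay = rng.integers(0, 60, n)       # plays per week (assumed range)
sales = 40 + 0.09 * adverts + 3.5 * airplay + rng.normal(0, 45, n)

# Ordinary least squares: b = (X'X)^{-1} X'y via lstsq
X = np.column_stack([np.ones(n), adverts, airplay])
b, *_ = np.linalg.lstsq(X, sales, rcond=None)
print(b)  # b[1]: sales change per £1 of advertising; b[2]: per extra play
```

SPSS reports these same unstandardized b values in the Coefficients table.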

Constructing a Model

Standardised Beta Values
β1 = 0.523: as advertising increases by 1 standard deviation, album sales increase by 0.523 of a standard deviation.
β2 = 0.546: when the number of plays on the radio per week increases by 1 SD, sales increase by 0.546 standard deviations.

Interpreting Standardised Betas
As advertising increases by one SD (£485,655), album sales increase by 0.523 × 80,699 ≈ 42,206.
If the number of plays on the radio per week increases by one SD (12 plays), album sales increase by 0.546 × 80,699 ≈ 44,062.
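The slide's own arithmetic pins down the standardized betas (42,206 / 80,699 and 44,062 / 80,699, where 80,699 is the SD of album sales), which a quick check confirms:

```python
# Recover the standardized betas from the sales increases the slide reports.
sd_sales = 80_699           # one standard deviation of album sales
beta1 = 42_206 / sd_sales   # advertising: sales gain per 1 SD of advertising
beta2 = 44_062 / sd_sales   # airplay: sales gain per 1 SD of weekly plays
print(round(beta1, 3), round(beta2, 3))  # → 0.523 0.546
```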

Reporting the Model

How well does the Model fit the data? There are two ways to assess the accuracy of the model in the sample:
- Residual statistics: standardized residuals
- Influential cases: Cook’s distance

Standardized Residuals
In an average sample, 95% of standardized residuals should lie between ±2, and 99% should lie between ±2.5.
Outliers: any case whose standardized residual has an absolute value of 3 or more is likely to be an outlier.
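The outlier rule above can be applied directly to a vector of residuals. A minimal sketch with illustrative values (one planted outlier; not data from the lecture):

```python
import numpy as np

# 200 well-behaved residuals plus one planted extreme case at index 200.
rng = np.random.default_rng(2)
residuals = np.concatenate([rng.normal(0, 10, 200), [60.0]])

# Standardize by the estimated residual SD, then flag |z| >= 3.
z = residuals / residuals.std(ddof=1)
outliers = np.flatnonzero(np.abs(z) >= 3)
print(outliers)  # the planted case at index 200 is flagged
```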

Cook’s Distance Measures the influence of a single case on the model as a whole. Absolute values greater than 1 may be cause for concern.
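Cook's distance can be computed from first principles, which makes the ">1 is concerning" rule concrete. A sketch on simulated data (not the lecture's SPSS output), with one case given both high leverage and a large residual:

```python
import numpy as np

# Cook's distance: D_i = (r_i^2 / p) * h_ii / (1 - h_ii), combining the
# studentized residual r_i with the leverage h_ii of case i.
rng = np.random.default_rng(0)
n, p = 50, 2                      # p = number of estimated parameters
x = rng.normal(0, 1, n)
x[0] = 4.0                        # give case 0 high leverage...
y = 2 + 3 * x + rng.normal(0, 1, n)
y[0] += 15                        # ...and a large residual

X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix
h = np.diag(H)                                 # leverages
resid = y - H @ y
s2 = resid @ resid / (n - p)                   # residual variance estimate
r = resid / np.sqrt(s2 * (1 - h))              # studentized residuals
cooks = (r ** 2 / p) * (h / (1 - h))
print(cooks.argmax(), cooks.max())             # case 0, well above 1
```

Every other case should have a Cook's distance far below 1; only the planted influential case crosses the threshold.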

Generalization When we run regression, we hope to generalize the sample model to the entire population. To do this, several assumptions must be met; violating them prevents us from generalizing our conclusions to the target population.

Multicollinearity Multicollinearity exists if predictors are highly correlated. This assumption can be checked with collinearity diagnostics.

Rules of thumb: tolerance should be greater than 0.2, and VIF should be less than 10.
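These diagnostics can be computed directly: regress each predictor on the others, then tolerance = 1 − R² and VIF = 1 / tolerance. A sketch with simulated predictors, two of them deliberately near-collinear (illustrative data, not the lecture's):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(0, 1, n)
x2 = 0.9 * x1 + 0.1 * rng.normal(0, 1, n)   # nearly collinear with x1
x3 = rng.normal(0, 1, n)                     # independent predictor
P = np.column_stack([x1, x2, x3])

vifs = []
for j in range(P.shape[1]):
    # Regress predictor j on the remaining predictors (plus intercept).
    others = np.column_stack([np.ones(n), np.delete(P, j, axis=1)])
    fitted = others @ np.linalg.lstsq(others, P[:, j], rcond=None)[0]
    ss_res = ((P[:, j] - fitted) ** 2).sum()
    ss_tot = ((P[:, j] - P[:, j].mean()) ** 2).sum()
    tol = 1 - (1 - ss_res / ss_tot)          # tolerance = 1 - R^2
    vifs.append(1 / tol)
    print(f"x{j + 1}: tolerance={tol:.3f}  VIF={1 / tol:.1f}")
```

Here x1 and x2 fail both rules of thumb (tolerance well under 0.2, VIF well over 10), while x3 is unproblematic.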

Checking Assumptions about Errors
Homoscedasticity/independence of errors: plot ZRESID against ZPRED.
Normality of errors: normal probability plot.

Homoscedasticity: ZRESID vs. ZPRED
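The ZPRED and ZRESID values that SPSS plots can be constructed by hand. A sketch with simulated heteroscedastic data (error spread grows with the predictor), where the funnel pattern shows up as |ZRESID| rising with ZPRED:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400
x = rng.uniform(1, 10, n)
y = 5 + 2 * x + rng.normal(0, x, n)   # error SD grows with x: heteroscedastic

# Fit by OLS, then standardize the predicted values and the residuals.
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
pred = X @ b
resid = y - pred
zpred = (pred - pred.mean()) / pred.std(ddof=1)    # ZPRED
zresid = resid / resid.std(ddof=1)                 # ZRESID

# Crude funnel check: under heteroscedasticity, |ZRESID| grows with ZPRED.
funnel = np.corrcoef(zpred, np.abs(zresid))[0, 1]
print(f"corr(ZPRED, |ZRESID|) = {funnel:.2f}")     # clearly positive here
```

Scatter-plotting `zresid` against `zpred` (with any plotting tool) would show the characteristic funnel; for a homoscedastic model the correlation above would be near zero and the plot a shapeless cloud.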

Normality of Errors: Histograms and P-P plots

Outliers and Residuals
The normal (unstandardized) residuals are measured in the same units as the outcome variable, so they are difficult to compare across different models: we cannot define a universal cut-off point for what constitutes a large residual. Instead we use standardized residuals, which are the residuals divided by an estimate of their standard deviation.

Outliers and Residuals
Some general rules for standardized residuals follow from these facts: (1) standardized residuals with an absolute value greater than 3.29 (3 is a common approximation) are cause for concern, because in an average sample a value this high is unlikely to happen by chance; (2) if more than 1% of sample cases have standardized residuals with an absolute value greater than 2.58 (often rounded to 2.5), there is evidence that the level of error within the model is unacceptable (the model is a fairly poor fit to the sample data);

Outliers and Residuals
(3) if more than 5% of cases have standardized residuals with an absolute value greater than 1.96 (2 for convenience), there is also evidence that the model is a poor representation of the actual data.
The studentized residual is the unstandardized residual divided by an estimate of its standard deviation that varies from point to point. These residuals have the same properties as standardized residuals but usually give a more precise estimate of the error variance for a specific case.
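The three rules above make a quick screening function. A sketch applied to illustrative, well-behaved residuals (for a real model you would pass in the standardized residuals SPSS saves):

```python
import numpy as np

def residual_checks(zresid):
    """Apply the three rule-of-thumb checks to standardized residuals."""
    az = np.abs(np.asarray(zresid))
    return {
        "any |z| > 3.29": bool((az > 3.29).any()),   # rule (1)
        "% |z| > 2.58": 100 * (az > 2.58).mean(),    # rule (2): want <= 1%
        "% |z| > 1.96": 100 * (az > 1.96).mean(),    # rule (3): want <= 5%
    }

rng = np.random.default_rng(3)
zresid = rng.normal(0, 1, 1000)   # residuals from a well-fitting model
print(residual_checks(zresid))
```

For these well-behaved residuals the percentages come out near the nominal 1% and 5%, so none of the rules fire alarmingly.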

Influential Cases
Several residual statistics can be used to assess the influence of a particular case. One is the adjusted predicted value: the predicted value for a case when that case is excluded from the analysis. The software fits a new model without the case and uses that model to predict the outcome for the excluded case. If the case does not exert a large influence over the model, the adjusted predicted value should be very similar to the predicted value when the case is included.

Influential Cases
The difference between the adjusted predicted value and the original predicted value is known as DFFit. We can also look at the residual based on the adjusted predicted value: the difference between the adjusted predicted value and the original observed value. This is the deleted residual. Dividing the deleted residual by its standard deviation gives a standardized value known as the studentized deleted residual. Deleted residuals are very useful for assessing the influence of a case on the model's ability to predict that case.
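Both quantities have closed forms that avoid literally refitting the model once per case: the deleted residual is e_i / (1 − h_ii) and DFFit is h_ii · e_i / (1 − h_ii), where h_ii is the leverage. A sketch on simulated data, verified against an actual leave-one-out refit for one case:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 40
x = rng.normal(0, 1, n)
y = 1 + 2 * x + rng.normal(0, 1, n)
X = np.column_stack([np.ones(n), x])

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
h = np.diag(H)                         # leverages
e = y - H @ y                          # ordinary residuals
deleted_resid = e / (1 - h)            # closed-form deleted residuals
dffit = h * e / (1 - h)                # closed-form DFFit per case

# Literal check for case 0: refit without it, predict it, compare.
bm = np.linalg.lstsq(X[1:], y[1:], rcond=None)[0]
adj_pred = X[0] @ bm                   # adjusted predicted value for case 0
print(np.isclose(y[0] - adj_pred, deleted_resid[0]))   # True
print(np.isclose((H @ y)[0] - adj_pred, dffit[0]))     # True
```

The same shortcut is what statistical packages use internally, which is why requesting deleted residuals in SPSS is cheap even for large samples.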

Influential Cases One statistic that does consider the effect of a single case on the model as a whole is Cook’s distance. Cook’s distance is a measure of the overall influence of a case on the model and Cook and Weisberg (1982) have suggested that values greater than 1 may be cause for concern.

Lecture Summary Outliers and Residuals Example of Model analysis for multiple regression