HMI 7530 – Programming in R STATISTICS MODULE: Multiple Regression

Jennifer Lewis Priestley, Ph.D.
Kennesaw State University

STATISTICS MODULE
Basic Descriptive Statistics and Confidence Intervals
Basic Visualizations: Histograms, Pie Charts, Bar Charts, Scatterplots
T-tests: One Sample, Paired, Independent Two Sample
Proportion Testing
ANOVA
Chi Square and Odds
Regression Basics

STATISTICS MODULE: Multiple Regression
Previously, we learned that a simple linear equation of a line takes the general form of y = mx + b, where:
Y is the dependent variable
m is the slope of the line
X is the independent variable or predictor
b is the Y-intercept.
When we discuss regression models, we transform this equation to:
Y = b0 + b1x1
where b0 is the y-intercept and b1 is the slope of the line. The "slope" is also the effect of a one-unit change of x on y.
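To make this concrete in R, here is a minimal sketch using the built-in mtcars data (chosen purely for illustration), regressing fuel economy on vehicle weight:

slr <- lm(mpg ~ wt, data = mtcars)
coef(slr)     # b0 (intercept) and b1 (slope: change in mpg per one-unit change in wt)
summary(slr)  # estimates, t-tests, and R2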

STATISTICS MODULE: Multiple Regression
This was fine…but typically we don't have just one predictor; we have many. When we discuss multiple regression models, the general form of the equation is:
Y = b0 + b1x1 + b2x2 + b3x3 + … + bnxn
where b0 is still the y-intercept and bi is the effect of a one-unit change in each individual predictor on the y (dependent) variable. Let's discuss the general form of different hypothetical multiple regression models…
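A minimal sketch of the multiple-predictor form, again using the built-in mtcars data for illustration:

mr <- lm(mpg ~ wt + hp + qsec, data = mtcars)
coef(mr)  # b0, then one bi per predictor: the effect of a one-unit change in that predictor on mpg, holding the others constant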

STATISTICS MODULE: Multiple Regression
The requirements for Multiple Regression are generally the same as they were for Linear Regression:
The relationship between the dependent and the independent variable(s) is assumed to be linear.
The dependent and the independent variable(s) should have some (hopefully) significant correlation.
There should be no extreme values that (usually negatively) influence the results.
Results are homoscedastic.
All observations are independent.
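R's built-in diagnostic plots offer a quick visual check on several of these assumptions; a minimal sketch, reusing the hypothetical mr model from the previous sketch:

plot(mr, which = 1)  # residuals vs. fitted values: a flat, patternless band suggests linearity and homoscedasticity
plot(mr, which = 2)  # normal Q-Q plot of the residuals
plot(mr, which = 5)  # residuals vs. leverage: flags extreme values with undue influence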

STATISTICS MODULE: Multiple Regression
But…there are some issues in Multiple Regression which are not present in Linear Regression:
Multicollinearity amongst predictors
"Ingredient" variables
Selection Methods/Model Parsimony
Let's explore each of these in turn…

STATISTICS MODULE: Multiple Regression
Consider the VIF (Variance Inflation Factor):
VIF = 1/(1 - R2)
where the R2 value here is the R2 obtained when the predictor in question is regressed on the other predictors (i.e., set as the dependent variable). For example, if the VIF = 10, then the respective R2 would be 90%. This would mean that 90% of the variance in the predictor in question can be explained by the other independent variables. Because so much of its variance is captured elsewhere, removing the predictor in question should not cause a substantive decrease in overall R2. The rule of thumb is to remove variables with VIF scores greater than 10.
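A minimal sketch of the calculation, both by hand and with vif() from the car package (assumed to be installed), using the illustrative mtcars model from earlier:

library(car)

# By hand: regress one predictor (wt) on the other predictors, then apply VIF = 1/(1 - R2)
r2_wt <- summary(lm(wt ~ hp + qsec, data = mtcars))$r.squared
1 / (1 - r2_wt)

# vif() computes the same quantity for every predictor at once
vif(lm(mpg ~ wt + hp + qsec, data = mtcars))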

STATISTICS MODULE: Multiple Regression
What is an "ingredient" variable? If the dependent variable is built from one of the predictor variables (or vice versa), the results are not reliable. One or both of the following will happen:
You will generate an incredibly high R2 value
The predictor in question will have a DOMINATING t-statistic
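A small simulated sketch of the problem (all variable names here are hypothetical); total is literally built from part1, so part1 is an ingredient variable:

set.seed(42)
part1 <- rnorm(100, mean = 50, sd = 10)   # a component measure
part2 <- rnorm(100, mean = 30, sd = 5)    # an unrelated measure
total <- part1 + rnorm(100, sd = 1)       # the dependent variable is part1 plus a little noise

summary(lm(total ~ part1 + part2))  # expect R2 near 1 and a dominating t-statistic on part1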

STATISTICS MODULE: Multiple Regression
What are the different selection methods and what are the differences?
"All In"
"Forward"
"Backward"
"Stepwise"
Model Parsimony = less is more. You are better off with an R2 of .75 and 3 predictors than with an R2 of .80 and 10 predictors.
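A minimal sketch of forward and stepwise selection using base R's step(), which compares candidate models by AIC; the variable names reuse the illustrative mtcars example from earlier:

null_mod <- lm(mpg ~ 1, data = mtcars)   # intercept-only starting point
full_scope <- ~ wt + hp + qsec           # the pool of candidate predictors

step(null_mod, scope = full_scope, direction = "forward")  # add one predictor at a time
step(null_mod, scope = full_scope, direction = "both")     # stepwise: may add or drop at each step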

STATISTICS MODULE: Multiple Regression
library(car)  #needed for vif() and ncvTest()

mod1<-lm(y~x1+x2+x3+x4+x5+x6, data=data)
summary(mod1)
confint(mod1, level=0.99) #this will generate the confidence intervals around the beta coefficients
vif(mod1) #this will generate the variance inflation factor values
ncvTest(mod1) #this tests the null hypothesis of constant error variance; we want a high p-value, i.e., no evidence of heteroscedasticity
test<-step(mod1, direction="backward", trace=TRUE) #this will run backward selection, dropping predictors that do not improve the model's AIC