Simple and Multiple Regression


2.1 Simple Linear Regression
Let's examine whether the size of a school is related to its academic performance. For this example, api00 is the dependent variable and enroll is the predictor.

Dependent variable: api00 (academic performance of the school)
Independent variable: enroll (number of students)

F-test: the F statistic is 44.83, and the model is statistically significant.
R-squared: approximately 10% of the variance of api00 is accounted for by the model, in this case by enroll.
T-test: the t value for enroll is -6.70, which is statistically significant, meaning that the regression coefficient for enroll is significantly different from zero.
Coefficient: the coefficient for enroll is -.1998674, or approximately -.2, meaning that for a one-unit increase in enroll, we would expect a .2-unit decrease in api00.
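As a minimal sketch, the simple regression interpreted above could be run as follows. It assumes the dataset containing api00 and enroll is already loaded in Stata.

```stata
* Regress academic performance (api00) on school size (enroll).
* The output header reports the F statistic and R-squared;
* the coefficient table reports the coefficient and t value for enroll.
regress api00 enroll
```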

Predicted Value
After you run a regression, you can create a variable that contains the predicted values using the predict command. For this example, our new variable name will be fv.
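A short sketch of this step, assuming the same dataset is in memory. The choice to list the first ten observations is only for illustration.

```stata
* Fit the model, then store the fitted (predicted) values in fv.
regress api00 enroll
predict fv

* Inspect observed and fitted values for the first ten schools.
list api00 fv in 1/10
```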

We can show a scatterplot of the outcome variable, api00, and the predictor, enroll.

We can combine scatter with lfit to show a scatterplot with fitted values.
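The two plots could be produced along these lines, again assuming the dataset with api00 and enroll is loaded.

```stata
* Plain scatterplot of the outcome against the predictor.
scatter api00 enroll

* Overlay the fitted regression line on the scatterplot.
twoway (scatter api00 enroll) (lfit api00 enroll)
```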

If you use the mlabel(snum) option on the scatter command, you can see the school number for each point. This allows us to see, for example, that one of the outliers is school 2910.
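A sketch of the labeled plot, assuming snum is the school-number variable in the loaded dataset, as the text indicates.

```stata
* Label each point with its school number to identify outliers.
twoway (scatter api00 enroll, mlabel(snum)) (lfit api00 enroll)
```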

2.2 Multiple Regression
Dependent variable: api00 (academic performance of the school)
Independent variables:
ell (English language learners)
meals (pct free meals)
yr_rnd (year-round school)
mobility (pct 1st year in school)
acs_k3 (avg class size k-3)
acs_46 (avg class size 4-6)
full (pct full credential)
emer (pct emer credential)
enroll (number of students)

The regression output reports the F statistic, R-squared, adjusted R-squared, t values, and coefficients.
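A minimal sketch of the multiple regression, assuming all of the variables named in this section are present in the loaded dataset.

```stata
* Regress api00 on the full set of predictors for this section.
regress api00 ell meals yr_rnd mobility acs_k3 acs_46 full emer enroll
```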

But how can we compare the relative importance of the coefficients? Use regress with the beta option, which reports standardized coefficients.
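As a sketch, the standardized coefficients could be requested like this, assuming the same predictors and dataset as the multiple regression in this section.

```stata
* The beta option replaces the confidence intervals in the output
* with standardized (beta) coefficients.
regress api00 ell meals yr_rnd mobility acs_k3 acs_46 full emer enroll, beta
```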

Let us compare the regress output with the listcoef output. You will notice that the values listed in the Coef., t, and P>|t| columns are the same in the two outputs. The bStdX column gives the unit change in Y expected with a one standard deviation change in X. The bStdY column gives the standard deviation change in Y expected with a one unit change in X. The SDofX column gives the standard deviation of each predictor variable in the model.
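A sketch of how the comparison might be run. listcoef is a user-written command, not part of official Stata, so this assumes it has already been installed (findit listcoef can help locate it). The exact column layout may vary by version.

```stata
* Fit the model first; listcoef reports on the most recent estimates.
regress api00 ell meals yr_rnd mobility acs_k3 acs_46 full emer enroll

* User-written command (assumed installed): lists raw and
* standardized coefficients along with the SD of each predictor.
listcoef
```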

2.3 Hypothesis Testing
Single coefficient
Multiple coefficients
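A hedged sketch of both kinds of test using Stata's test command after regress. The particular variables tested here (ell alone, then acs_k3 and acs_46 jointly) are chosen for illustration and are not specified by the slide.

```stata
regress api00 ell meals yr_rnd mobility acs_k3 acs_46 full emer enroll

* Single coefficient: test whether the coefficient for ell is zero.
test ell == 0

* Multiple coefficients: joint test that both class-size
* coefficients are zero.
test acs_k3 acs_46
```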

Correlation
As part of doing a multiple regression analysis, you might be interested in seeing the correlations among the variables in the regression model. You can use the correlate command for this. You can also use pwcorr, which handles missing values pairwise; its sig option reports the significance level of each correlation.
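The two commands could be used roughly as follows, assuming the regression variables from this section are in memory. The obs option is added here for illustration, since the number of observations can differ across pairs under pairwise deletion.

```stata
* correlate drops any observation with a missing value on
* any listed variable (listwise deletion).
correlate api00 ell meals yr_rnd mobility acs_k3 acs_46 full emer enroll

* pwcorr uses all available observations for each pair;
* obs shows the pairwise sample size, sig the significance level.
pwcorr api00 ell meals yr_rnd mobility acs_k3 acs_46 full emer enroll, obs sig
```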

2.4 Examine Distribution Assumption
Classical regression inference assumes that the errors, and hence the outcome (dependent) variable given the predictors, are normally distributed. In large samples, this assumption is less important because of the Central Limit Theorem. In small samples, however, the distributional assumption can be relevant. We will investigate issues concerning normality.

Here we check the normality of enroll. We start by making some graphs: a histogram and a kernel density plot (kdensity).

We can use the normal option to superimpose a normal curve on this graph and the bin(20) option to use 20 bins.  The distribution looks skewed to the right.
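A minimal sketch of the histogram described here, assuming enroll is in the loaded dataset.

```stata
* Histogram of enroll with 20 bins and a normal curve overlaid.
histogram enroll, normal bin(20)
```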

An alternative to histograms is the kernel density plot, which approximates the probability density of the variable. Kernel density plots have the advantage of being smooth and of being independent of the choice of origin, unlike histograms. Stata implements kernel density plots with the kdensity command.
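As a sketch, the kernel density plot could be drawn as below. The normal option, which overlays a normal density for comparison, is an addition for illustration rather than something the slide specifies.

```stata
* Kernel density estimate of enroll, with a normal density overlaid.
kdensity enroll, normal
```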

Having concluded that enroll is not normally distributed, how should we address this problem? We may try to transform enroll to make it more normally distributed. Potential transformations include taking the log, taking the square root, or raising the variable to a power. Stata includes the ladder and gladder commands to help select the right transformation: ladder reports numeric results and gladder produces a graphic display.
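Both commands take the variable name directly. A short sketch, assuming enroll is in memory:

```stata
* Numeric summary: a normality test for each candidate transformation.
ladder enroll

* Graphic summary: a histogram of each candidate transformation.
gladder enroll
```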

The results indicate that the log transformation would help to make enroll more normally distributed. Let's use the generate command with the log function to create the variable lenroll, which will be the log of enroll. Note that log in Stata gives the natural log, not log base 10. To get log base 10, type log10(var).
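A sketch of the transformation step. The follow-up histogram is an illustrative check that the slide does not explicitly call for.

```stata
* Create the natural log of enroll.
generate lenroll = log(enroll)

* Re-check the shape of the transformed variable.
histogram lenroll, normal bin(20)
```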

2.5 Summary
Simple Regression
Multiple Regression
Hypothesis Testing
Examine the normality assumption

Quiz I
Make graphs of api99: a histogram and a kdensity plot.
What is the correlation between api99 and meals?
Regress api99 on meals.
Create and list the fitted (predicted) values.
Graph meals and api99 with and without the regression line.

Quiz II
Look at the correlations among the variables api99, meals, ell, and avg_ed using the corr and pwcorr commands.
Perform a regression predicting api99 from meals and ell.
Interpret the output.