Simple and Multiple Regression

2.1 Simple Linear Regression
Let's examine whether the size of a school is related to its academic performance. For this example, api00 is the dependent variable and enroll is the predictor.

Dependent variable: api00 (academic performance of the school)
Independent variable: enroll (number of students)

F-test: the F statistic is 44.83, and the model is statistically significant.
R-squared: approximately 10% of the variance of api00 is accounted for by the model, in this case by enroll.
T-test: the t value for enroll is -6.70 and is statistically significant, meaning that the regression coefficient for enroll is significantly different from zero.
Coefficient: the coefficient for enroll is approximately -.2, meaning that for a one-unit increase in enroll, we would expect a .2-unit decrease in api00.

Predicted Value
After you run a regression, you can create a variable that contains the predicted values using the predict command. For this example, our new variable name will be fv.
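As a sketch of the steps described so far, the regression and the fitted values could be obtained as follows. This assumes the school data containing api00 and enroll are already loaded in Stata's memory; the slides do not show how the data were loaded, and the final list command is only an illustrative way to inspect the result.

```stata
* Assumes the dataset containing api00 and enroll is already in memory
regress api00 enroll

* Store the fitted (predicted) values in a new variable named fv
predict fv

* Illustrative check: show observed and fitted values for the first 10 schools
list api00 enroll fv in 1/10
```

With no options, predict returns the linear prediction, which is what fv is meant to hold here.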

We can show a scatterplot of the outcome variable, api00, against the predictor, enroll.

We can combine scatter with lfit to show a scatterplot with fitted values.
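The two graphs just described might be produced as sketched here, again assuming the data are already in memory. The twoway overlay syntax is one common way to combine scatter and lfit; the slides do not show the exact command used.

```stata
* Plain scatterplot of the outcome (y axis) against the predictor (x axis)
scatter api00 enroll

* Same scatterplot with the fitted regression line overlaid
twoway (scatter api00 enroll) (lfit api00 enroll)
```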

If you use the mlabel(snum) option on the scatter command, you can see the school number for each point. This allows us to see, for example, that one of the outliers is school 2910.
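A minimal sketch of the labeled plot, assuming snum is the school-number variable in the loaded data (as the text implies):

```stata
* Label each point with its school number; keep the fitted line for reference
twoway (scatter api00 enroll, mlabel(snum)) (lfit api00 enroll)
```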

2.2 Multiple Regression
Dependent variable: api00 (academic performance of the school)
Independent variables:
ell (English language learners)
meals (pct free meals)
yr_rnd (year-round school)
mobility (pct 1st year in school)
acs_k3 (avg class size k-3)
acs_46 (avg class size 4-6)
full (pct full credential)
emer (pct emer credential)
enroll (number of students)

Output to examine: the F statistic, R-squared and adjusted R-squared, the t values, and the coefficients.
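The multiple regression with the variables listed above could be run as sketched here, assuming the data are in memory. The display lines are an addition for illustration: they read the stored results e(F), e(r2), and e(r2_a) that regress leaves behind, so the overall fit statistics can be shown separately from the coefficient table.

```stata
* Regress api00 on the nine predictors named in the slides
regress api00 ell meals yr_rnd mobility acs_k3 acs_46 full emer enroll

* Overall fit statistics saved by regress
display "F = " e(F)
display "R-squared = " e(r2)
display "Adjusted R-squared = " e(r2_a)
```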

But how can we compare the relative importance of the coefficients?
Use regress with the beta option, which reports standardized coefficients.
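A sketch of the standardized-coefficient run, assuming the same model and data as above:

```stata
* The beta option replaces the confidence intervals in the output
* with standardized (beta) coefficients
regress api00 ell meals yr_rnd mobility acs_k3 acs_46 full emer enroll, beta
```

Because each beta coefficient is expressed in standard deviation units, the predictors can be compared on a common scale.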

Let us compare the regress output with the listcoef output. You will notice that the values listed in the Coef., t, and P>|t| columns are the same in the two outputs. The bStdX column gives the unit change in Y expected with a one standard deviation change in X. The bStdY column gives the standard deviation change in Y expected with a one unit change in X. The SDofX column gives the standard deviation of each predictor variable in the model.
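The comparison could be reproduced roughly as follows. Note that listcoef is a user-written command rather than part of official Stata, so this sketch assumes it has already been installed (for example, after locating it with search listcoef).

```stata
* listcoef is user-written; it must be installed before this will run
regress api00 ell meals yr_rnd mobility acs_k3 acs_46 full emer enroll

* Redisplay the coefficients with the bStdX, bStdY, and SDofX columns
listcoef
```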

2.3 Hypothesis Testing
Single coefficient
Multiple coefficients
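The slides do not name the command used for these tests. One standard way to do both in Stata is the test command after regress, sketched here. The choice of ell for the single-coefficient test and of acs_k3 and acs_46 for the joint test is an assumption made for illustration.

```stata
regress api00 ell meals yr_rnd mobility acs_k3 acs_46 full emer enroll

* Single coefficient: is the coefficient on ell equal to zero?
test ell

* Multiple coefficients: are the two class-size coefficients jointly zero?
test acs_k3 acs_46
```

The single-coefficient test reports an F statistic that is the square of the t value shown for ell in the regression table.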

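Correlation
As part of doing a multiple regression analysis, you might be interested in seeing the correlations among the variables in the regression model. You can use the correlate command, which drops any observation with a missing value on any of the listed variables. You can also use pwcorr, which handles missing values pairwise; its sig option displays the significance level of each correlation.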
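Both commands might be used as sketched here, assuming the data are in memory. The obs option on pwcorr is an addition not mentioned in the slides; it prints the number of observations behind each correlation, which makes the pairwise handling of missing values visible.

```stata
* Listwise deletion: only schools with complete data on all variables are used
correlate api00 ell meals yr_rnd mobility acs_k3 acs_46 full emer enroll

* Pairwise deletion, with significance levels (sig) and per-pair counts (obs)
pwcorr api00 ell meals yr_rnd mobility acs_k3 acs_46 full emer enroll, obs sig
```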

2.4 Examine Distribution Assumption
Classical regression inference assumes that the errors, and hence the outcome (dependent variable) conditional on the predictors, are normally distributed. In large samples, this assumption is less important because of the Central Limit Theorem. In small samples, however, the distributional assumption can matter. We will investigate issues concerning normality.

Here we check the normality of enroll. We start by making some graphs: a histogram and a kernel density plot (the histogram and kdensity commands).

We can use the normal option to superimpose a normal curve on this graph and the bin(20) option to use 20 bins. The distribution looks skewed to the right.
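A sketch of the histogram described above, assuming the data are in memory:

```stata
* Histogram of enroll with 20 bins and a normal curve superimposed
histogram enroll, normal bin(20)
```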

An alternative to histograms is the kernel density plot, which approximates the probability density of the variable. Kernel density plots have the advantage of being smooth and of being independent of the choice of origin, unlike histograms. Stata implements kernel density plots with the kdensity command.
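The kernel density plot could be drawn as sketched here. The normal option is an addition for comparison with the histogram; it overlays a normal density on the estimate.

```stata
* Kernel density estimate of enroll, with a normal density overlaid
kdensity enroll, normal
```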

Having concluded that enroll is not normally distributed, how should we address this problem?
We may try to transform enroll to make it more normally distributed. Potential transformations include taking the log, taking the square root, or raising the variable to a power. Stata includes the ladder and gladder commands to help select the right transformation: ladder reports numeric results and gladder produces a graphic display.
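Both commands take the variable name directly, as in this sketch (data assumed to be in memory):

```stata
* Numeric normality tests for each candidate transformation of enroll
ladder enroll

* Histograms of each candidate transformation, each with a normal overlay
gladder enroll
```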

The ladder and gladder results indicate that the log transformation would help to make enroll more normally distributed.
Let's use the generate command with the log function to create the variable lenroll, which will be the log of enroll. Note that log in Stata gives you the natural log, not log base 10. To get log base 10, type log10(var).
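The transformation step might look like this sketch. The follow-up histogram is an addition for illustration, reusing the earlier options to check whether lenroll looks closer to normal than enroll did.

```stata
* Natural log of enroll; log() and ln() are synonyms in Stata
generate lenroll = log(enroll)

* Illustrative check: re-examine the distribution after transforming
histogram lenroll, normal bin(20)
```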

2.5 Summary
Simple Regression
Multiple Regression
Hypothesis Testing
Examine the normality assumption

Quiz I
Make graphs of api99: histogram, kdensity plot.
What is the correlation between api99 and meals?
Regress api99 on meals.
Create and list the fitted (predicted) values.
Graph meals and api99 with and without the regression line.

Quiz II
Look at the correlations among the variables api99, meals, ell, and avg_ed using the corr and pwcorr commands.
Perform a regression predicting api99 from meals and ell.
Interpret the output.
