The Basics of Regression continued

Overview The text uses one example to guide you, but I will use a different one. Remember, we use statistics to learn more about a variable in the world. Let's focus on income as the variable of interest. Obviously, not everyone has the same income, and we might want to understand why. Another thing we might do is try to predict someone's income. The simplest way to predict a person's income is to just take the average income of the group. Regression techniques are an attempt to improve on that: it is thought that by including another variable in the study, we can predict better than the average alone.

graph [scatterplot: income on the y axis, years of schooling on the x axis] In a graph we put the variable of interest, income, on the y axis. Since it is thought that knowing a person's years of schooling will help us better understand income, schooling is put on the x axis. In other words, each value of income is 'matched' with a schooling amount. Note in the scatterplot that the higher the schooling, the higher the income. Thus, knowing schooling will permit better prediction of income.

graph [scatterplot with fitted regression line: income on the y axis, years of schooling on the x axis] A main point of interest in regression is to come up with the mathematical formula for the line that best describes the data points. That line is then used to make predictions about y values given x values.

Math form It is thought that in the population the variables x and y are related in the following general form: y = B0 + B1 x + e, where e is an error term that captures all those influences on y not picked up by x, B0 is the y intercept of the line, and B1 is the slope of the line. When we have a sample of data from the population, we say the regression line is estimated to be ŷ = b0 + b1 x, where the 'hat' refers to an estimated value.
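
The population model above can be sketched in Python (the lecture does not supply numbers, so the parameter values and error spread here are invented purely for illustration):

```python
import random

# A sketch of the population model y = B0 + B1*x + e.
# B0, B1, and the error spread are made-up values for illustration.
B0 = 2.0    # y intercept: income (in $1000s) at zero years of schooling
B1 = 1.5    # slope: extra income per additional year of schooling

random.seed(1)

def simulate_income(years_of_schooling):
    """One draw of y = B0 + B1*x + e, where e ~ Normal(0, 2)."""
    e = random.gauss(0, 2)   # error term: influences on y not picked up by x
    return B0 + B1 * years_of_schooling + e

# The line alone gives 2.0 + 1.5*12 = 20.0 for 12 years of schooling;
# simulated individuals scatter around that value because of e.
exact = B0 + B1 * 12
sample = [simulate_income(12) for _ in range(5)]
print(exact)    # 20.0
print(sample)   # five values scattered around 20.0
```

The error term is why two people with identical schooling do not have identical incomes.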

ordinary least squares The typical method used to pick the line through the data is called the ordinary least squares (OLS) line. This method is the one that minimizes the sum of squared deviations of the data points from the line. The line has desirable properties (not proven here): 1) It is unbiased: if many samples were taken, the average of the intercepts and slopes from the samples would equal the population intercept and slope. 2) It is consistent: as the sample gets 'large', the estimates converge to the population intercept and slope.
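
A minimal OLS fit can be computed by hand with the textbook formulas b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1 x̄. The schooling/income numbers below are made up for illustration, not taken from the lecture:

```python
# OLS slope and intercept "by hand" on made-up (schooling, income) data.
xs = [8, 10, 12, 14, 16]   # hypothetical years of schooling
ys = [12, 17, 20, 24, 26]  # hypothetical income in $1000s

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# Numerator and denominator of the slope formula
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)

b1 = sxy / sxx             # estimated slope
b0 = ybar - b1 * xbar      # estimated intercept
print(b1, b0)              # 1.75 and -1.2 for this made-up data
```

Any statistics package (including Excel's regression tool mentioned later) produces these same two numbers for a given data set.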

confidence interval When we have a sample from a population and we use OLS to get the slope, we know that value is dependent on the sample. If we had a different sample, the slope estimate would be different. So instead of using a point estimate of the population slope we often use a confidence interval. A 95% confidence interval means we can be 95% confident the true, unknown slope is in the interval. We form the interval as b1 - (1.96 times the standard error of b1) to b1 + (1.96 times the standard error of b1).

microsoft excel Many computer programs will give the standard error of the slope estimate and/or the confidence interval itself. Note that if the confidence interval includes the value 0, this is a sign the x variable really does not help us understand the y variable.

hypothesis test about the slope In regression analysis, if the slope is zero we know the x variable doesn't help us understand the y variable. So in our hypothesis test we take a slope of zero as the null hypothesis. If we reject this null hypothesis, then we can conclude that the x variable does help us understand the y variable. The slope estimate does not have a normal distribution but a t distribution, which is close to the normal. We use the t distribution to test the null hypothesis.

p value Microsoft Excel gives a p-value for the slope estimate. This value is the probability of getting this slope estimate, or a more extreme one, under the assumption that the true population slope is zero. The logic here is: 1) estimates far from 0 are unlikely to occur under the null and so have low p-values; 2) we conventionally choose .05 as the cut-off value. This means that if the p-value for the slope estimate is less than .05, we reject the null of no influence, because it seems unreasonable that our one sample would have produced such an improbable value if the null were true.
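
The test can be sketched end to end on the same kind of made-up data. Since the slide notes the t distribution is close to the normal, this standard-library-only example approximates the two-sided p-value with the normal CDF (via math.erf); real software such as Excel uses the exact t distribution, so its p-value would differ slightly:

```python
import math

# Hypothesis test of H0: slope = 0, using t = b1 / SE(b1).
xs = [8, 10, 12, 14, 16]   # hypothetical years of schooling
ys = [12, 17, 20, 24, 26]  # hypothetical income in $1000s

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
b0 = ybar - b1 * xbar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)

t_stat = b1 / se_b1   # test statistic for H0: slope = 0

# Two-sided p-value, normal approximation to the t distribution:
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(t_stat) / math.sqrt(2))))

print(round(t_stat, 2))    # far from 0 for this data
print(p_value < 0.05)      # True -> reject the null of no influence
```

A large t statistic and a p-value below .05 lead to the same conclusion the slide describes: x appears to help explain y.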

Goodness of fit R2 When we look at the data points and their relationship to the line, we talk about how good the fit of the line is. R2 is a numerical summary of the goodness of fit. Its value ranges from 0 to 1, with values closer to 1 indicating a better fit. In fact, R2 has the interpretation of indicating the % of variation in the y variable that is explained by the x variable.
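
One common way to compute R2 is as 1 − SSE/SST, the fraction of the total variation in y not left unexplained by the line. Sketched on made-up data:

```python
# R^2 = 1 - SSE/SST on made-up (schooling, income) data.
xs = [8, 10, 12, 14, 16]   # hypothetical years of schooling
ys = [12, 17, 20, 24, 26]  # hypothetical income in $1000s

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
b0 = ybar - b1 * xbar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
sst = sum((y - ybar) ** 2 for y in ys)                       # total variation
r2 = 1 - sse / sst
print(round(r2, 3))   # close to 1: a good fit for this made-up data
```

An R2 near 1 says the points lie close to the line; an R2 near 0 says the line explains little of the variation in y.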

Forecasting Once we have the regression estimate and a hypothesis test has made us confident that x does help in explaining y, we may want to forecast values of y given some x values. Say we have income = -1.75 + 1.75 times years of schooling. Then if years of schooling is 16, income is predicted to be -1.75 + 1.75(16) = 26.25. Note that the intercept of the line is minus 1.75 and the slope is 1.75; that they share the same magnitude is only a coincidence.
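
The forecast step is just plugging an x value into the estimated line, here the slide's example line income = -1.75 + 1.75 × years of schooling:

```python
# Forecasting with the slide's estimated line.
def predict_income(years_of_schooling):
    """Predicted income from the estimated line y-hat = -1.75 + 1.75*x."""
    return -1.75 + 1.75 * years_of_schooling

print(predict_income(16))   # -1.75 + 1.75*16 = 26.25, as on the slide
```

The prediction is the point on the line at x = 16; actual individuals with 16 years of schooling will scatter around it because of the error term.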