EDUC 200C Section 3, October 12, 2012

Goals
- Review correlation prediction formula
  - Calculate $z_{y'} = r_{xy} z_x$ for a new data set
  - Use formula to predict a student’s score
- Talk more about regression
  - Import data set into Stata
  - Use Stata to come up with regression formula and use that to predict a student’s score
  - Scatterplot in Stata
- Introduce concept of Standard Error

Correlation prediction formula
$z_{y'} = r_{xy} z_x$
- Simple formula
- Easy to use
- But must use z-scores

High School and Beyond data
(Data found in Coursework)
- Open data, look at it, get a sense of it.
- Choose two variables (let’s do RDG and MATH)
- Scatterplot
- Calculate z-scores
- Calculate r
- Write prediction formula
- Use the formula to predict one z-score, given another.
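The z-score steps above can be sketched in a few lines of Python (used here instead of Stata so it runs without the course data). The five reading and math scores are made up for illustration and are not the High School and Beyond data; the names `z_scores` and `predict_z_y` are likewise hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical reading and math scores (NOT the High School and Beyond data).
rdg = [40.0, 45.0, 50.0, 55.0, 60.0]
math_scores = [42.0, 44.0, 53.0, 51.0, 60.0]


def z_scores(values):
    """Convert raw scores to z-scores using the population SD."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


z_x = z_scores(rdg)
z_y = z_scores(math_scores)

# Pearson r is the mean of the products of paired z-scores.
r_xy = mean([zx * zy for zx, zy in zip(z_x, z_y)])


def predict_z_y(z_x_value):
    """Correlation prediction formula: z_y' = r_xy * z_x."""
    return r_xy * z_x_value


print("r =", round(r_xy, 3))
print("predicted z for math when reading z = 1:", round(predict_z_y(1.0), 3))
```

Because r is between -1 and 1, the predicted z-score is never further from the mean than the z-score it is predicted from.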

Regression
- In regression, you don’t need z-scores. You can stay in the original units of your data.
- Use it to predict how one variable changes in response to another variable.
- In the High School and Beyond data, we examine what we might expect a student’s math score to be, given that we know the student’s reading score.
- Computers and mathematical formulas help us calculate a regression formula.

We calculate regression lines by minimizing the total squared difference between the line representing our prediction and the actual data.
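As a rough illustration of "minimizing the total squared difference", the Python sketch below computes the sum of squared errors (SSE) for the least-squares line and for a few nearby lines. The data are made up, and the function name `sse` is an assumption for this example only.

```python
# Hypothetical reading (x) and math (y) scores, NOT the course data set.
x = [40.0, 45.0, 50.0, 55.0, 60.0]
y = [42.0, 44.0, 53.0, 51.0, 60.0]


def sse(slope, intercept):
    """Total squared vertical distance between the data and a candidate line."""
    return sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))


# Closed-form least-squares slope and intercept.
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = s_xy / s_xx
a = y_bar - b * x_bar

best = sse(b, a)
print("least-squares line SSE:", round(best, 2))

# Nudging the slope or the intercept in either direction makes the SSE larger.
for d_slope, d_intercept in [(0.05, 0.0), (-0.05, 0.0), (0.0, 1.0), (0.0, -1.0)]:
    print(d_slope, d_intercept, round(sse(b + d_slope, a + d_intercept), 2))
```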

Do you remember y = mx + b?
- This is the slope-intercept equation for a line.
- m is the slope
- b is the y-intercept
- The regression line is given in this format. We’re given the slope of the line and the y-intercept.
$Y' = b_{YX} X + a_{YX}$
What does each part mean?

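A little Stata…
. reg Y X

      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F(  1,    48) =
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           X |              <- Regression slope
       _cons |              <- Regression intercept
------------------------------------------------------------------------------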

Regression line notation and formulas
$Y' = b_{YX} X + a_{YX}$
- Regression line slope: $b_{YX} = r_{YX} (\sigma_Y / \sigma_X)$
- Regression line intercept: $a_{YX} = \bar{Y} - b_{YX} \bar{X}$
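A minimal Python sketch of these two formulas, using made-up scores rather than the course data. The variable names (`b_yx`, `a_yx`, `predict`) are chosen for this example. Population standard deviations are used, but the ratio of the two standard deviations is the same with sample standard deviations as long as both use the same kind.

```python
from statistics import mean, pstdev

# Hypothetical reading (X) and math (Y) scores, NOT the course data set.
x = [40.0, 45.0, 50.0, 55.0, 60.0]
y = [42.0, 44.0, 53.0, 51.0, 60.0]

x_bar, y_bar = mean(x), mean(y)
sd_x, sd_y = pstdev(x), pstdev(y)

# Pearson correlation from the covariance and the two standard deviations.
cov_xy = mean([(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)])
r_yx = cov_xy / (sd_x * sd_y)

# Slope and intercept exactly as written on the slide.
b_yx = r_yx * (sd_y / sd_x)
a_yx = y_bar - b_yx * x_bar


def predict(x_value):
    """Regression prediction in original units: Y' = b_YX * X + a_YX."""
    return b_yx * x_value + a_yx


print(f"Y' = {b_yx:.2f} X + {a_yx:.2f}")
print(f"Predicted Y at X = 55: {predict(55.0):.1f}")
```

Plugging in the mean of X returns the mean of Y, which is a quick check that the intercept was computed correctly.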

Stata activity
- Import High School and Beyond data into Stata
- For fun, run correlation on Reading and Math: corr rdg math (Isn’t it so much easier in Stata!?!)
- Run regression: regress math rdg
- Write out regression line and interpret what it means.
- Create scatterplot: graph twoway (scatter math rdg) (lfit math rdg)

How do we know if we’ve explained the data well?
- We want, for example, average SAT score to tell us a lot about a school’s graduation rate. How do we know if it does?
- We look at the standard error.
- Standard error is the same as standard deviation, except that we look at deviation from the prediction rather than deviation from the mean.
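To make the comparison with standard deviation concrete, here is a small Python sketch on made-up scores (not the course data). It uses the descriptive, divide-by-n form of both quantities, which is an assumption of this example. Stata's Root MSE divides by n - 2 instead, so it comes out slightly larger.

```python
from math import sqrt
from statistics import mean

# Hypothetical reading (x) and math (y) scores, NOT the course data set.
x = [40.0, 45.0, 50.0, 55.0, 60.0]
y = [42.0, 44.0, 53.0, 51.0, 60.0]

x_bar, y_bar = mean(x), mean(y)

# Least-squares slope and intercept.
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar

# Residuals: deviation of each actual score from its prediction.
residuals = [yi - (b * xi + a) for xi, yi in zip(x, y)]

# Standard deviation: typical deviation from the mean of y.
sd_y = sqrt(mean([(yi - y_bar) ** 2 for yi in y]))

# Standard error: typical deviation from the prediction (same units as y).
se_est = sqrt(mean([e ** 2 for e in residuals]))

print("SD of y:", round(sd_y, 2))
print("standard error of estimate:", round(se_est, 2))
```

The standard error is in the same units as Y (here, math score points), and it is smaller than the standard deviation of Y whenever X helps predict Y.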

How do we interpret standard error?
- In all cases, the closer the standard error is to zero, the better our predictions are.
(What is this again? And why do we want it to be small? What units is it in?)

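More Stata…
. reg Y X

      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F(  1,    48) =
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =    <- Standard error, $s_{Y'}$

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           X |
       _cons |
------------------------------------------------------------------------------

Note that these values have bias corrections that make them more like s than σ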
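One way to read the bias-correction note is as a change of denominator. The block below is a sketch in the slides' notation, with $\sigma_{Y'}$ introduced here as an assumed label for the uncorrected, divide-by-n version. With one predictor, Stata's Root MSE divides the residual sum of squares by the residual degrees of freedom, n - 2, which is the 48 shown in the F(1, 48) header.

```latex
\sigma_{Y'} = \sqrt{\frac{\sum (Y - Y')^2}{n}}
\qquad
s_{Y'} = \sqrt{\frac{\sum (Y - Y')^2}{n - 2}}
```

The two versions differ only by the factor $\sqrt{n / (n - 2)}$, so the gap shrinks as the sample grows.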

Questions?