1
Regression III
2
The regression model has both constants (a, b) and variables (X, Y)
The “fit” of the regression equation to the data is numerically expressed by the r² statistic. The b for each independent variable can be tested for statistical significance using a t test. The overall model is tested for statistical significance using the F ratio.
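As a rough illustration of the three statistics named above, the sketch below fits Y = a + bX by ordinary least squares on made-up data and computes r², the t ratio for b, and the F ratio. The function name `simple_ols` and the data are hypothetical, and the case shown has only one independent variable.

```python
import math

def simple_ols(x, y):
    """Fit Y = a + b*X by least squares and return basic fit statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # intercept
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    mse = ss_resid / (n - 2)           # residual variance, n - 2 degrees of freedom
    r2 = 1 - ss_resid / ss_total       # share of variance explained
    t_b = b / math.sqrt(mse / sxx)     # t ratio: slope divided by its standard error
    f_ratio = (ss_total - ss_resid) / mse  # explained mean square / residual mean square
    return {"a": a, "b": b, "r2": r2, "t_b": t_b, "F": f_ratio}

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
fit = simple_ols(x, y)
print(fit)
```

With a single independent variable, the F ratio equals the square of the t ratio for b, so the two tests agree. With several independent variables, each b gets its own t test and F tests the model as a whole.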
3
Assumptions of the regression model
Like all statistics, the regression model has a number of underlying assumptions. For example, the t test assumes a t distribution, and z scores assume the data are normally distributed. We will discuss some of the more common assumptions.
4
Multicollinearity
When two or more independent variables in the model are highly correlated with one another.
Result: unreliable partial regression coefficients (unstable estimates with inflated standard errors)
Test: correlate each independent variable with the others
Fix: drop all but one of the highly correlated variables, or combine them into a single variable
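The correlation test described above can be sketched as follows. The variable names, the data, and the 0.8 cutoff are all assumptions made for illustration; the slides do not give a threshold.

```python
import math
from itertools import combinations

def pearson_r(u, v):
    """Pearson correlation between two equal-length lists."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def high_correlations(variables, threshold=0.8):
    """Return pairs of independent variables whose |r| meets the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(variables.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Hypothetical independent variables; age and experience move together
predictors = {
    "education": [16, 12, 18, 14, 12, 16, 14, 18],
    "experience": [2, 5, 8, 11, 14, 17, 20, 23],
    "age": [25, 27, 31, 33, 37, 39, 43, 45],
}
flagged = high_correlations(predictors)
print(flagged)
```

Here only the experience/age pair is flagged, so one of the two would be dropped or the pair combined before fitting the model.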
5
“Dummy” variables
Regression analysis assumes the use of continuous, interval-level data
Two types:
dichotomous variables (two possible states)
polychotomous variables
May be nominal or ordinal
6
Dummy variables
Dichotomous variable
male/female; Republican/Democrat; yes/no
Essentially, a case has or does not have a particular characteristic
Example: last week’s regression model predicting entry GS grade
field of education
veterans’ preference
minority
female
7
Polychotomous variables: a number of possible states
often, sometimes, rarely, never
region of country (South, Midwest, East, West)
When using them, exclude one of the categories: for a four-category variable, include three 0/1 variables
The excluded category becomes the reference category
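A minimal sketch of this coding scheme, using the region example: each case gets one 0/1 variable per category except the reference category. The helper name `dummy_code` and the choice of West as the reference are assumptions for illustration.

```python
def dummy_code(values, reference):
    """Turn a categorical variable into 0/1 dummies, omitting the reference category."""
    # Keep categories in first-seen order, leaving out the reference category
    categories = [c for c in dict.fromkeys(values) if c != reference]
    return [{c: int(v == c) for c in categories} for v in values]

# Hypothetical cases; West is chosen as the reference category
regions = ["South", "Midwest", "East", "West", "South"]
rows = dummy_code(regions, reference="West")
for region, row in zip(regions, rows):
    print(region, row)
```

A case in the reference category has 0 on all three dummies, so each dummy's b is read as the difference from the reference category.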
8
Autocorrelation
A nonrandom relationship among a variable’s values at different time periods
Consistent patterns, such as seasonal data
Often found in time series data
9
Autocorrelation
Result: biased t-ratios, confidence limits, and hypothesis tests
Test: plot the residuals and look for distinctive patterns
Fix: introduce another independent variable that explains some of the unexplained variance; more commonly, use a statistical model other than OLS
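The slides test for autocorrelation by plotting the residuals. As a numeric complement, the sketch below computes the Durbin-Watson statistic, a technique not mentioned in the slides and brought in here only for illustration. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The residuals are made up.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    diffs = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    total = sum(e ** 2 for e in residuals)
    return diffs / total

# Hypothetical residuals that drift smoothly, like a seasonal pattern
smooth = [1.0, 0.8, 0.5, 0.1, -0.4, -0.8, -1.0, -0.7, -0.2, 0.3, 0.7, 1.0]
# Hypothetical residuals that flip sign every period
alternating = [1, -1, 1, -1, 1, -1]

dw_smooth = durbin_watson(smooth)
dw_alternating = durbin_watson(alternating)
print(round(dw_smooth, 2), round(dw_alternating, 2))
```

The smooth series gives a value well below 2 and the alternating series a value well above 2, matching the patterns a residual plot would show.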
10
Nonlinear relationships
OLS assumes a linear relationship (remember the straight line we drew based on the regression equation?)
Some of our data do not show a linear relationship:
economic data
population data
data with a built-in growth factor
11
Nonlinear relationships
We test for this using a scatterplot. Does the relationship appear linear?
Fix: transform one of the variables
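One common transformation for data with a built-in growth factor is the logarithm. The sketch below uses a made-up series growing by roughly 50% per period and compares the linear fit before and after taking the log of Y. The log transform is one possible choice, not necessarily the one used in the course.

```python
import math

def r_squared(x, y):
    """r² of a straight-line fit, computed as the squared Pearson correlation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical series with a built-in growth factor of about 1.5 per period
period = list(range(10))
population = [100, 152, 223, 340, 505, 761, 1138, 1710, 2561, 3845]

r2_raw = r_squared(period, population)
r2_log = r_squared(period, [math.log(p) for p in population])
print(round(r2_raw, 3), round(r2_log, 4))
```

The fit on the logged values is much closer to a straight line, which is the pattern a scatterplot of the transformed variable would show.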
12
Nonlinear relationship
13
Nonlinear relationship
14
Heteroskedasticity
When the spread of Y around the regression line (the variance of the residuals) is not equal across all ranges of X
Result: affects the size of the standard error, thus biasing hypothesis test results.
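An informal way to look for heteroskedasticity is to fit the line and compare the size of the residuals in the lower and upper halves of X. The sketch below does this on made-up data whose scatter widens as X grows. It is a crude check under those assumptions, not a formal test, and the helper name `residual_spread_ratio` is hypothetical.

```python
def ols_residuals(x, y):
    """Residuals from a least-squares fit of Y = a + b*X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def residual_spread_ratio(x, y):
    """Mean squared residual in the upper half of X divided by the lower half."""
    pairs = sorted(zip(x, ols_residuals(x, y)))  # order residuals by X
    half = len(pairs) // 2
    lower = [e ** 2 for _, e in pairs[:half]]
    upper = [e ** 2 for _, e in pairs[-half:]]
    return (sum(upper) / len(upper)) / (sum(lower) / len(lower))

# Hypothetical data: Y = 2X plus errors that grow with X
x = list(range(1, 11))
errors = [0.1, -0.1, 0.2, -0.2, 0.3, -1.0, 1.5, -2.0, 2.5, -3.0]
y = [2 * xi + e for xi, e in zip(x, errors)]

ratio = residual_spread_ratio(x, y)
print(round(ratio, 1))
```

A ratio far from 1 suggests the residual variance is not constant. A plot of residuals against X would show the same thing as a fan shape.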
15
Outliers
Extreme values: when a particular case (or a number of them) doesn’t seem to fit in with the other data.
Problem: can bias the regression parameters
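To see how a single extreme case can pull the regression parameters, the sketch below fits the slope with and without one outlying point. The data are invented for illustration and are not the figures from the slides.

```python
def ols_slope(x, y):
    """Least-squares slope b for Y = a + b*X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical cases that follow Y ≈ X closely
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 1.9, 3.1, 4.0, 5.2, 5.9, 7.1, 8.0]

slope_without = ols_slope(x, y)
# Add one extreme case that does not fit with the rest
slope_with = ols_slope(x + [9], y + [30.0])
print(round(slope_without, 2), round(slope_with, 2))
```

One added case more than doubles the estimated slope here, which is why outliers are worth identifying before interpreting b.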
16
Outliers (Hong Kong and Singapore)