
1 Shonda Kuiper Grinnell College

2 Statistical techniques taught in introductory statistics courses typically have one response variable and one explanatory variable. The response variable measures the outcome of a study. The explanatory variable explains changes in the response variable.

3 Each variable can be classified as either categorical or quantitative. Categorical data place individuals into one of several groups (such as red/blue/white, male/female, or yes/no). Quantitative data consist of numerical values for which most arithmetic operations make sense. The appropriate technique depends on the types of the explanatory and response variables:
Categorical explanatory, categorical response: Chi-Square test, two-proportion test
Categorical explanatory, quantitative response: two-sample t-test, ANOVA
Quantitative explanatory, categorical response: logistic regression
Quantitative explanatory, quantitative response: regression

4 Each observed response can be written as its group mean plus a random error (observed = mean + error):
Group 1: 70 = 80 + (-10), 82 = 80 + 2, 90 = 80 + 10, 78 = 80 + (-2)
Group 2: 75 = 85 + (-10), 85 = 85 + 0, 95 = 85 + 10, 85 = 85 + 0
where i = 1, 2 and j = 1, 2, 3, 4
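As a quick check of this decomposition, here is a minimal Python sketch (numpy only) that recovers the group means and error terms for the eight observations above.

```python
import numpy as np

# The eight observations from the slide, split by group
group1 = np.array([70, 82, 90, 78])
group2 = np.array([75, 85, 95, 85])

for label, y in [("group 1", group1), ("group 2", group2)]:
    mean = y.mean()        # estimated group mean
    errors = y - mean      # random error terms
    print(label, "mean:", mean, "errors:", errors)

# group 1: mean 80.0, errors [-10, 2, 10, -2]
# group 2: mean 85.0, errors [-10, 0, 10, 0]
```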

5

6

7 The theoretical model used in the two-sample t-test is designed to account for the two group means (µ1 and µ2) and random error:
y_ij = µ_i + ε_ij   (observed response = mean value + random error),
where i = 1, 2 and j = 1, 2, 3, 4.

8 Equivalently, each response can be written as the grand mean (82.5) plus a group effect (-2.5 or +2.5) plus a random error:
Group 1: 70 = 82.5 + (-2.5) + (-10), 82 = 82.5 + (-2.5) + 2, 90 = 82.5 + (-2.5) + 10, 78 = 82.5 + (-2.5) + (-2)
Group 2: 75 = 82.5 + 2.5 + (-10), 85 = 82.5 + 2.5 + 0, 95 = 82.5 + 2.5 + 10, 85 = 82.5 + 2.5 + 0
where i = 1, 2 and j = 1, 2, 3, 4
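A minimal sketch of the same decomposition in terms of a grand mean and group effects (again numpy only; the numbers match the table above):

```python
import numpy as np

y = np.array([[70, 82, 90, 78],    # group 1
              [75, 85, 95, 85]])   # group 2

grand_mean = y.mean()                       # 82.5
effects = y.mean(axis=1) - grand_mean       # [-2.5, +2.5]
errors = y - y.mean(axis=1, keepdims=True)  # same error terms as before

# Check: every observation equals grand mean + group effect + error
assert np.allclose(y, grand_mean + effects[:, None] + errors)
```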

9

10 The same model can be written in terms of a grand mean and group effects:
y_ij = µ + α_i + ε_ij   (observed response = grand mean + group effect + random error),
where i = 1, 2 and j = 1, 2, 3, 4.

11 Writing group membership as an indicator variable x, the same model can also be written as a regression:
y_i = β0 + β1 x_i + ε_i   (observed response = mean value + random error),
where i = 1, 2, …, 8.

12

13 Using an indicator variable x (x = 0 for group 1, x = 1 for group 2), each response is the intercept (80) plus the group term (0 or 5) plus a random error:
Group 1 (x = 0): 70 = 80 + 0 + (-10), 82 = 80 + 0 + 2, 90 = 80 + 0 + 10, 78 = 80 + 0 + (-2)
Group 2 (x = 1): 75 = 80 + 5 + (-10), 85 = 80 + 5 + 0, 95 = 80 + 5 + 10, 85 = 80 + 5 + 0
where i = 1, 2, …, 8

14 Removing the random error terms leaves the fitted group means:
Group 1: 80 = 80 + 0 (for each of the four observations)
Group 2: 85 = 80 + 5 (for each of the four observations)
where i = 1, 2, …, 8

15 When there are only two groups (and we make the same assumptions), all three models are algebraically equivalent:
y_ij = µ_i + ε_ij, where i = 1, 2 and j = 1, 2, 3, 4 (means model)
y_ij = µ + α_i + ε_ij, where i = 1, 2 and j = 1, 2, 3, 4 (effects model)
y_i = β0 + β1 x_i + ε_i, where i = 1, 2, …, 8 (regression model)
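A small sketch illustrating this equivalence on the slide's data: the pooled two-sample t-test, one-way ANOVA, and the regression on a 0/1 group indicator agree (scipy and statsmodels assumed available; the t statistics match up to sign and F = t²).

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

group1 = np.array([70, 82, 90, 78])
group2 = np.array([75, 85, 95, 85])

# Pooled (equal-variance) two-sample t-test
t, p = stats.ttest_ind(group1, group2, equal_var=True)

# One-way ANOVA: F equals t squared
F, p_anova = stats.f_oneway(group1, group2)

# Regression of the responses on a 0/1 group indicator
y = np.concatenate([group1, group2])
x = sm.add_constant(np.repeat([0, 1], 4))
fit = sm.OLS(y, x).fit()

print(t, p)                              # t statistic and p-value from the t-test
print(F, p_anova)                        # F statistic and the same p-value
print(fit.tvalues[1], fit.pvalues[1])    # same t (up to sign) and p-value from the slope
```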

16 Shonda Kuiper Grinnell College

17 Multiple regression analysis can serve different goals, and the goal influences the type of analysis that is conducted. The most common goals of multiple regression are to:
Describe: develop a model that describes the relationship between multiple explanatory variables and the response variable.
Predict: use a regression model to generalize to observations outside the sample.
Confirm: evaluate theories about which variables, or combinations of variables, should be included in the model; hypothesis tests can be used to assess the relationship between the explanatory variables and the response.

18-22 Build a multiple regression model to predict the retail price of cars.
Price = 35738 - 0.22 Mileage, R-Sq = 4.1%, slope coefficient (b1): t = -2.95 (p-value = 0.004)
Questions:
- What happens to Price as Mileage increases?
- Since b1 = -0.22 is small, can we conclude it is unimportant?
- Does Mileage help you predict Price? What does the p-value tell you?
- Does Mileage help you predict Price? What does the R-Sq value tell you?
- Are there outliers or influential observations?
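A sketch of how this simple regression could be fit in Python with statsmodels, assuming the car data are in a CSV file (the file name cars.csv and the column names Price and Mileage are illustrative, not taken from the slides):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file and column names; substitute the actual car-price data set
cars = pd.read_csv("cars.csv")

fit = smf.ols("Price ~ Mileage", data=cars).fit()
print(fit.params)      # intercept and slope (roughly 35738 and -0.22 on the slide's data)
print(fit.rsquared)    # R-squared (about 4.1% on the slide's data)
print(fit.summary())   # full table with t statistics and p-values
```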

23

24

25 What happens when all the points fall on the regression line? (The residual sum of squares is 0, so R² = 1.)

26 What happens when the regression line does not help us estimate Y? (The model explains none of the variation in Y, so R² = 0.)
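A minimal sketch of the R² computation behind these two questions, R² = 1 - SSE/SSTO (the data here are made up purely for illustration):

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SSTO: fraction of the variation in y explained by the fit."""
    sse = np.sum((y - y_hat) ** 2)       # residual (error) sum of squares
    ssto = np.sum((y - y.mean()) ** 2)   # total sum of squares
    return 1 - sse / ssto

y = np.array([2.0, 4.0, 6.0, 8.0])

print(r_squared(y, y))                     # perfect fit: SSE = 0, so R^2 = 1
print(r_squared(y, np.full(4, y.mean())))  # fit no better than the mean: R^2 = 0
```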

27

28

29

30

31

32

33 R²adj includes a penalty when more terms are included in the model:
R²adj = 1 - (1 - R²)(n - 1)/(n - p),
where n is the sample size and p is the number of coefficients (including the constant term: β0, β1, β2, …, βp-1). When many terms are in the model, p is larger, so (n - 1)/(n - p) is larger and R²adj is smaller.
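A small sketch of this penalty in plain Python (the sample size and the comparison below are made up to show the effect of adding terms):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 with n observations and p coefficients (including the constant)."""
    return 1 - (1 - r2) * (n - 1) / (n - p)

n = 100                              # hypothetical sample size
print(adjusted_r2(0.446, n, 8))      # 7 predictors + constant -> about 0.404
print(adjusted_r2(0.446, n, 2))      # 1 predictor + constant  -> about 0.440
# The model with more coefficients is penalized and has the smaller adjusted R^2.
```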

34 Price = 35738 - 0.22 Mileage, R-Sq = 4.1%, slope coefficient (b1): t = -2.95 (p-value = 0.004)

35 Shonda Kuiper Grinnell College

36-38 Build a multiple regression model to predict the retail price of cars. A model using Mileage alone has R² = 2%. The available explanatory variables are Mileage, Cylinder, Liter, Leather, Cruise, Doors, and Sound. Using all seven:
Price = 6759 + 6289 Cruise + 3792 Cyl - 1543 Doors + 3349 Leather - 787 Liter - 0.17 Mileage - 1994 Sound, R² = 44.6%
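A sketch of fitting this full model with statsmodels' formula interface (again assuming a hypothetical cars.csv with these column names):

```python
import pandas as pd
import statsmodels.formula.api as smf

cars = pd.read_csv("cars.csv")   # hypothetical car-price data

full = smf.ols(
    "Price ~ Cruise + Cyl + Doors + Leather + Liter + Mileage + Sound",
    data=cars,
).fit()

print(full.params)        # the seven slope estimates and the intercept
print(full.rsquared)      # about 44.6% on the slide's data
print(full.rsquared_adj)  # adjusted R-squared, with the penalty for extra terms
```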

39-43 Step Forward Regression (Forward Selection): Which single explanatory variable best predicts Price?
Price = 13921.9 + 9862.3 Cruise, R² = 18.56%
Price = -17.06 + 4054.2 Cyl, R² = 32.39%
Price = 24764.6 - 0.17 Mileage, R² = 2.04%
Price = 6185.8 + 4990.4 Liter, R² = 31.15%
Price = 23130.1 - 2631.4 Sound, R² = 1.55%
Price = 18828.8 + 3473.46 Leather, R² = 2.47%
Price = 27033.6 - 1613.2 Doors, R² = 1.93%
Cyl gives the largest R², so it is the first variable to enter the model.
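A sketch of this forward-selection idea in Python, choosing at each stage the variable that most increases R² (same hypothetical cars data as above; a bare-bones illustration, not a production stepwise routine):

```python
import pandas as pd
import statsmodels.formula.api as smf

cars = pd.read_csv("cars.csv")   # hypothetical car-price data
candidates = ["Cruise", "Cyl", "Mileage", "Liter", "Sound", "Leather", "Doors"]
selected = []

def r2_of(terms):
    """R^2 of the model Price ~ terms."""
    return smf.ols("Price ~ " + " + ".join(terms), data=cars).fit().rsquared

# Greedy forward selection: at each step add the term that gives the largest R^2
while candidates:
    best = max(candidates, key=lambda v: r2_of(selected + [v]))
    selected.append(best)
    candidates.remove(best)
    print(selected, round(r2_of(selected), 4))
```

A real procedure would also include a stopping rule, for example stopping when no remaining term produces a significant improvement.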

44-46 Step Forward Regression: Which combination of two terms best predicts Price? Keeping Cyl (the first term selected) and trying each remaining variable:
Price = -17.06 + 4054.2 Cyl, R² = 32.39% (one-term model, for reference)
Price = -1046.4 + 3392.6 Cyl + 6000.4 Cruise, R² = 38.4% (R²adj = 38.2%)
Price = 3145.8 + 4027.6 Cyl - 0.152 Mileage, R² = 34% (R²adj = 33.8%)
Price = 1372.4 + 2976.4 Cyl + 1412.2 Liter, R² = 32.6% (R²adj = 32.4%)

47 Step Forward Regression: Which combination of terms best predicts Price?
Price = -17.06 + 4054.2 Cyl, R² = 32.39%
Price = -1046.4 + 3393 Cyl + 6000.4 Cruise, R² = 38.4% (R²adj = 38.2%)
Price = -2978.4 + 3276 Cyl + 6362 Cruise + 3139 Leather, R² = 40.4% (R²adj = 40.2%)
Price = 412.6 + 3233 Cyl + 6492 Cruise + 3162 Leather - 0.17 Mileage, R² = 42.3% (R²adj = 42%)
Price = 5530.3 + 3258 Cyl + 6320 Cruise + 2979 Leather - 0.17 Mileage - 1402 Doors, R² = 43.7% (R²adj = 43.3%)
Price = 7323.2 + 3200 Cyl + 6206 Cruise + 3327 Leather - 0.17 Mileage - 1463 Doors - 2024 Sound, R² = 44.6% (R²adj = 44.15%)
Price = 6759 + 3792 Cyl + 6289 Cruise + 3349 Leather - 787 Liter - 0.17 Mileage - 1543 Doors - 1994 Sound, R² = 44.6% (R²adj = 44.14%)


49 Step Backward Regression (Backward Elimination): start with all terms and remove the least useful term at each step.
Price = 6759 + 3792 Cyl + 6289 Cruise + 3349 Leather - 787 Liter - 0.17 Mileage - 1543 Doors - 1994 Sound, R² = 44.6% (R²adj = 44.14%)
Price = 7323.2 + 3200 Cyl + 6206 Cruise + 3327 Leather - 0.17 Mileage - 1463 Doors - 2024 Sound, R² = 44.6% (R²adj = 44.15%)
Dropping Liter leaves R² essentially unchanged and raises the adjusted R² slightly. Bidirectional stepwise procedures combine forward selection and backward elimination. Other criteria, such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and Mallows' Cp, are often used to find the best model.
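A sketch comparing the two models above by AIC and BIC with statsmodels (same hypothetical cars data; lower values indicate a better trade-off between fit and complexity):

```python
import pandas as pd
import statsmodels.formula.api as smf

cars = pd.read_csv("cars.csv")   # hypothetical car-price data

full = smf.ols("Price ~ Cyl + Cruise + Leather + Liter + Mileage + Doors + Sound",
               data=cars).fit()
reduced = smf.ols("Price ~ Cyl + Cruise + Leather + Mileage + Doors + Sound",
                  data=cars).fit()

# Backward elimination drops Liter; information criteria weigh fit against complexity
for name, fit in [("full", full), ("without Liter", reduced)]:
    print(name, "AIC:", round(fit.aic, 1), "BIC:", round(fit.bic, 1),
          "adj R^2:", round(fit.rsquared_adj, 4))
```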

50 Best Subsets Regression evaluates models built from every possible combination of the explanatory variables. In the best subsets output, Liter is the second best single predictor of price.
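A brute-force sketch of best subsets regression using itertools, ranking every subset by adjusted R² (feasible here because seven candidate variables give only 2^7 - 1 = 127 subsets; same hypothetical data as above):

```python
from itertools import combinations

import pandas as pd
import statsmodels.formula.api as smf

cars = pd.read_csv("cars.csv")   # hypothetical car-price data
candidates = ["Cruise", "Cyl", "Mileage", "Liter", "Sound", "Leather", "Doors"]

results = []
for k in range(1, len(candidates) + 1):
    for subset in combinations(candidates, k):
        fit = smf.ols("Price ~ " + " + ".join(subset), data=cars).fit()
        results.append((fit.rsquared_adj, subset))

# Print the five best subsets by adjusted R^2
results.sort(reverse=True)
for adj_r2, subset in results[:5]:
    print(round(adj_r2, 4), subset)
```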

51 Important Cautions:
- Stepwise regression techniques can ignore very important explanatory variables; best subsets regression is often preferable.
- Both best subsets and stepwise regression methods only consider linear relationships between the response and the explanatory variables. Residual graphs are still essential for validating whether the model is appropriate. Transformations, interactions, and quadratic terms can often improve the model.
- Whenever these iterative variable selection techniques are used, the p-values corresponding to the significance of each individual coefficient are not reliable.
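To illustrate the last caution, here is a small simulation sketch (made-up data, unrelated to the car data): even when the response is pure noise, the predictor chosen by a selection procedure often looks "significant" if its p-value is read at face value.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, k, reps = 50, 10, 500
false_positives = 0

# y is pure noise, unrelated to any of the k candidate predictors.
# Selecting the best-fitting predictor and then treating its p-value
# as if the model had been pre-specified is overly optimistic.
for _ in range(reps):
    X = rng.normal(size=(n, k))
    y = rng.normal(size=n)
    pvals = [sm.OLS(y, sm.add_constant(X[:, j])).fit().pvalues[1] for j in range(k)]
    if min(pvals) < 0.05:        # p-value of the "best" selected predictor
        false_positives += 1

print(false_positives / reps)    # far above the nominal 5% error rate
```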

