Summer Course: Data Mining
Regression Analysis
Presenter: Georgi Nalbantov
August 2009
Structure
- Regression analysis: definition and examples
- Classical Linear Regression
- LASSO and Ridge Regression (linear and nonlinear)
- Nonparametric (local) regression estimation: kNN for regression, decision trees, smoothers
- Support Vector Regression (linear and nonlinear)
- Variable/feature selection (AIC, BIC, R^2-adjusted)
Feature Selection, Dimensionality Reduction, and Clustering in the KDD Process
U. M. Fayyad, G. Piatetsky-Shapiro and P. Smyth (1995)
Common Data Mining tasks
[Figure: three scatterplot panels illustrating Clustering, Classification and Regression, with example methods listed under each: k-th Nearest Neighbour, Parzen Window, Unfolding, Conjoint Analysis, Cat-PCA, Linear Discriminant Analysis, QDA, Logistic Regression (Logit), Decision Trees, LSSVM, NN, VS, Classical Linear Regression, Ridge Regression, CART]
Linear regression analysis: examples
[Figures: example applications of linear regression]
The Regression task
Given: (x1, y1), …, (xm, ym) ∈ R^n × R
Find: f: R^n → R
Given m observations on n explanatory variables and 1 explained variable, where the explained variable can take real values in R, find a function that gives the "best" fit. The "best" function is the one for which the expected error on unseen data (xm+1, ym+1), …, (xm+k, ym+k) is minimal.
Classical Linear Regression (OLS)
- Explanatory and response variables are numeric.
- The relationship between the mean of the response variable and the level of the explanatory variable is assumed to be approximately linear (a straight line).
- Model: y = b0 + b1 x + e
- b1 > 0: positive association; b1 < 0: negative association; b1 = 0: no association.
Classical Linear Regression (OLS)
- b0: mean response when x = 0 (y-intercept)
- b1: change in mean response when x increases by 1 unit (slope)
- b0, b1 are unknown population parameters (like the population mean mu)
- b0 + b1 x: mean response when the explanatory variable takes on the value x
- Task: minimize the sum of squared errors, SSE = sum_i (yi - b0 - b1 xi)^2
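A minimal sketch (Python, made-up data) of the closed-form least squares estimates that minimize this SSE:

```python
import numpy as np

# Toy data (hypothetical): x = age, y = expenditures
x = np.array([23.0, 31.0, 40.0, 47.0, 55.0, 62.0])
y = np.array([12.0, 19.0, 24.0, 31.0, 33.0, 41.0])

# Closed-form least squares estimates for y = b0 + b1*x + e
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

sse = np.sum((y - (b0 + b1 * x)) ** 2)   # the sum of squared errors being minimized
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {sse:.3f}")
```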
Classical Linear Regression (OLS)
- Parameter: slope in the population model (b1)
- Estimator: least squares estimate b1_hat = sum_i (xi - x_bar)(yi - y_bar) / sum_i (xi - x_bar)^2
- Estimated standard error: SE(b1_hat) = s / sqrt( sum_i (xi - x_bar)^2 ), with s^2 = SSE / (m - 2)
- Methods of making inference regarding the population: hypothesis tests (2-sided or 1-sided) and confidence intervals
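Continuing the toy example, a sketch of the standard OLS inference for the slope (t-statistic and confidence interval); scipy is assumed to be available:

```python
import numpy as np
from scipy import stats

def slope_inference(x, y, conf=0.95):
    """Least squares slope, its standard error, t-statistic, and confidence interval."""
    m = len(x)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)
    s2 = np.sum(resid ** 2) / (m - 2)                 # residual variance estimate
    se_b1 = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))
    t_stat = b1 / se_b1                               # test of H0: b1 = 0
    t_crit = stats.t.ppf(0.5 + conf / 2, df=m - 2)
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return b1, se_b1, t_stat, ci
```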
Classical Linear Regression (OLS)
Coefficient of determination (r^2): the proportion of variation in y "explained" by the regression on x,
r^2 = 1 - SSE / SST, where SST = sum_i (yi - y_bar)^2 and SSE = sum_i (yi - yi_hat)^2.
Classical Linear Regression (OLS): Multiple regression
- Numeric response variable (y)
- p numeric predictor variables
- Model: y = b0 + b1 x1 + … + bp xp + e
- Partial regression coefficients: bi is the effect (on the mean response) of increasing the i-th predictor variable by 1 unit, holding all other predictors constant.
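A sketch of the corresponding fit with numpy (hypothetical data; the intercept is handled by prepending a column of ones):

```python
import numpy as np

def fit_multiple_ols(X, y):
    """OLS for y = b0 + b1*x1 + ... + bp*xp + e; X has one column per predictor."""
    X_design = np.column_stack([np.ones(len(y)), X])    # prepend intercept column
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return beta                                          # [b0, b1, ..., bp]

# Hypothetical example with p = 2 predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=50)
print(fit_multiple_ols(X, y))   # roughly [1.0, 2.0, -0.5]
```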
Classical Linear Regression (OLS): Ordinary Least Squares estimation
Population model for the mean response: E[y | x1, …, xp] = b0 + b1 x1 + … + bp xp
Least squares fitted (predicted) equation, minimizing SSE: y_hat = b0_hat + b1_hat x1 + … + bp_hat xp, with SSE = sum_i (yi - yi_hat)^2.
Classical Linear Regression (OLS): Ordinary Least Squares estimation
Model: y = b0 + b1 x1 + … + bp xp + e
- OLS estimation: minimize sum_i (yi - yi_hat)^2
- LASSO estimation: minimize sum_i (yi - yi_hat)^2 subject to sum_j |bj| <= t (an L1 penalty on the coefficients)
- Ridge regression estimation: minimize sum_i (yi - yi_hat)^2 subject to sum_j bj^2 <= t (an L2 penalty on the coefficients)
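A sketch of OLS and ridge in closed form with numpy; the penalty multiplier lam is the Lagrangian counterpart of the constraint above, and for simplicity the intercept is treated like any other coefficient:

```python
import numpy as np

def ols_fit(X, y):
    """OLS: minimize ||y - X b||^2 (X already contains an intercept column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ridge_fit(X, y, lam):
    """Ridge: minimize ||y - X b||^2 + lam * ||b||^2 (closed-form solution)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# LASSO (L1 penalty) has no closed form; it is usually solved by coordinate
# descent, e.g. sklearn.linear_model.Lasso(alpha=lam).fit(X, y).
```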
LASSO and Ridge estimation of model coefficients
[Figures: coefficient paths for LASSO and Ridge, plotted against sum(|beta|)]
Nonparametric (local) regression estimation: k-NN, Decision trees, smoothers
[Figures: illustrations of k-NN regression, regression trees, and smoothers]
How to choose k or h?
- When k or h is small, single instances matter; bias is small, variance is large (undersmoothing): high complexity.
- As k or h increases, we average over more instances and variance decreases, but bias increases (oversmoothing): low complexity.
- Cross-validation is used to fine-tune k or h, as sketched below.
Linear Support Vector Regression
[Figures: Expenditures vs. Age with epsilon-tubes of small, middle-sized and biggest area, support vectors marked: a "suspiciously smart case" (overfitting), a "compromise case", SVR (good generalisation), and a "lazy case" (underfitting)]
The thinner the "tube", the more complex the model.
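A small sketch (assuming scikit-learn; made-up data) of how the tube width epsilon controls model complexity and the number of support vectors in linear SVR:

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical data: expenditures vs. age
rng = np.random.default_rng(2)
age = rng.uniform(20, 70, size=(100, 1))
expenditures = 0.5 * age[:, 0] + rng.normal(scale=3.0, size=100)

# A wider tube (larger epsilon) leaves more points inside the tube and gives a
# simpler model with fewer support vectors; a thinner tube fits more tightly.
for eps in (0.5, 2.0, 8.0):
    model = SVR(kernel="linear", C=1.0, epsilon=eps).fit(age, expenditures)
    print(f"epsilon={eps}: {len(model.support_)} support vectors")
```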
Nonlinear Support Vector Regression
Map the data into a higher-dimensional space.
[Figures: Expenditures vs. Age]
Nonlinear Support Vector Regression: Technicalities
The SVR function: f(x) = sum_i (alpha_i - alpha_i*) k(xi, x) + b
To find the unknown parameters of the SVR function, solve:
minimize (1/2) ||w||^2 + C sum_i (s_i + s_i*)
subject to: yi - f(xi) <= epsilon + s_i, f(xi) - yi <= epsilon + s_i*, s_i >= 0, s_i* >= 0
How to choose C, epsilon and the kernel? RBF kernel: k(x, x') = exp(-gamma ||x - x'||^2). Find C, epsilon and gamma from a cross-validation procedure.
SVR Technicalities: Model Selection
Do 5-fold cross-validation to find C and gamma for several fixed values of epsilon, as sketched below.
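A hedged sketch of this procedure with scikit-learn's SVR and GridSearchCV; the data and parameter grids are illustrative, not the ones used in the study:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

# Synthetic regression data
rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(150, 2))
y = np.sin(X[:, 0]) * np.cos(X[:, 1]) + rng.normal(scale=0.1, size=150)

# For each fixed epsilon, grid-search C and the RBF kernel width gamma by 5-fold CV.
for eps in (0.01, 0.1, 0.5):
    search = GridSearchCV(SVR(kernel="rbf", epsilon=eps),
                          param_grid={"C": [0.1, 1, 10, 100],
                                      "gamma": [0.01, 0.1, 1, 10]},
                          scoring="neg_mean_squared_error", cv=5)
    search.fit(X, y)
    print(f"epsilon={eps}: best {search.best_params_}, CV MSE={-search.best_score_:.4f}")
```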
SVR Study: Model Training, Selection and Prediction
[Figures: CVMSE at (IR*, HR*, CR*); true returns (red) and raw predictions (blue)]
SVR: Individual Effects
SVR Technicalities: SVR vs. OLS
Performance on the test set: SVR MSE = 0.04, OLS MSE = 0.23.
Technical Note: Number of Training Errors vs. Model Complexity
[Figure: training and test errors as a function of model complexity, with functions ordered in increasing complexity and the best trade-off marked at the minimum of the test error]
MATLAB video here…
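A small numpy sketch of the same trade-off, using polynomial degree as the complexity axis (synthetic data): training error keeps falling with complexity, while test error falls and then rises past the best trade-off.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(-1, 1, 40))
y = np.sin(3 * x) + rng.normal(scale=0.2, size=40)
x_tr, y_tr, x_te, y_te = x[::2], y[::2], x[1::2], y[1::2]   # simple train/test split

# Increasing polynomial degree = increasing model complexity.
for degree in range(1, 10):
    coefs = np.polyfit(x_tr, y_tr, degree)
    mse = lambda xs, ys: np.mean((np.polyval(coefs, xs) - ys) ** 2)
    print(f"degree {degree}: train MSE={mse(x_tr, y_tr):.3f}, test MSE={mse(x_te, y_te):.3f}")
```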
Variable selection for regression
Akaike Information Criterion (AIC). Final prediction error:
AIC = -2 log(likelihood) + 2 d,
where d is the number of estimated parameters; for a Gaussian linear model this equals, up to an additive constant, m log(SSE/m) + 2 d. Models with smaller AIC are preferred.
Variable selection for regression
Bayesian Information Criterion (BIC), also known as the Schwarz criterion. Final prediction error:
BIC = -2 log(likelihood) + d log(m),
where d is the number of estimated parameters and m is the sample size. BIC tends to choose simpler models than AIC.
Variable selection for regression
R^2-adjusted:
R^2_adj = 1 - (1 - R^2)(m - 1) / (m - p - 1),
where p is the number of predictors; unlike R^2, it penalises adding predictors that do not improve the fit.
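A sketch (numpy) computing these criteria for a fitted linear model; the Gaussian forms of AIC and BIC are used, consistent with the formulas above:

```python
import numpy as np

def selection_criteria(y, y_hat, p):
    """AIC, BIC and adjusted R^2 for a Gaussian linear model with p predictors
    (d = p + 1 estimated coefficients, excluding the noise variance)."""
    m = len(y)
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    d = p + 1
    aic = m * np.log(sse / m) + 2 * d             # Gaussian AIC, up to a constant
    bic = m * np.log(sse / m) + d * np.log(m)     # heavier penalty -> simpler models
    r2 = 1 - sse / sst
    r2_adj = 1 - (1 - r2) * (m - 1) / (m - p - 1)
    return aic, bic, r2_adj
```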
Conclusion / Summary / References
- Classical Linear Regression: any introductory statistical/econometric book
- LASSO and Ridge Regression (linear and nonlinear): Bishop, 2006
- Nonparametric (local) regression estimation: kNN for regression, Decision trees, Smoothers: Alpaydin, 2004; Hastie et al., 2001
- Support Vector Regression (linear and nonlinear): Smola and Schoelkopf, 2003
- Variable/feature selection (AIC, BIC, R^2-adjusted): Hastie et al., 2001; any statistical/econometric book