Summer Course: Data Mining
Regression Analysis
Presenter: Georgi Nalbantov
August 2009
Structure
- Regression analysis: definition and examples
- Classical Linear Regression
- LASSO and Ridge Regression (linear and nonlinear)
- Nonparametric (local) regression estimation: kNN for regression, decision trees, smoothers
- Support Vector Regression (linear and nonlinear)
- Variable/feature selection (AIC, BIC, R^2-adjusted)
Feature Selection, Dimensionality Reduction, and Clustering in the KDD Process
U. M. Fayyad, G. Piatetsky-Shapiro and P. Smyth (1995)
Common Data Mining tasks
[Figure: three scatterplot panels illustrating Clustering, Classification and Regression, with example methods listed under each: k-th Nearest Neighbour, Parzen Window, Unfolding, Conjoint Analysis, Cat-PCA, Linear Discriminant Analysis, QDA, Logistic Regression (Logit), Decision Trees, LSSVM, NN, VS, Classical Linear Regression, Ridge Regression, CART]
Linear regression analysis: examples
[Figures: example applications of linear regression]
The Regression task
Given: (x1, y1), …, (xm, ym) ∈ R^n × R
Find: f: R^n → R
Given m observations on n explanatory variables and 1 explained variable, where the explained variable can take real values in R, find a function that gives the "best" fit. The "best" function is the one for which the expected error on unseen data (xm+1, ym+1), …, (xm+k, ym+k) is minimal.
Classical Linear Regression (OLS)
- Explanatory and response variables are numeric.
- The relationship between the mean of the response variable and the level of the explanatory variable is assumed to be approximately linear (a straight line).
- Model: y = b0 + b1 x + e
- b1 > 0: positive association; b1 < 0: negative association; b1 = 0: no association.
Classical Linear Regression (OLS)
- b0: mean response when x = 0 (y-intercept)
- b1: change in mean response when x increases by 1 unit (slope)
- b0, b1 are unknown population parameters (like the population mean mu)
- b0 + b1 x: mean response when the explanatory variable takes on the value x
- Task: minimize the sum of squared errors, SSE = sum_i (yi - b0 - b1 xi)^2
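A minimal sketch (Python, made-up data) of the closed-form least squares estimates that minimize this SSE:

```python
import numpy as np

# Toy data (hypothetical): x = age, y = expenditures
x = np.array([23.0, 31.0, 40.0, 47.0, 55.0, 62.0])
y = np.array([12.0, 19.0, 24.0, 31.0, 33.0, 41.0])

# Closed-form least squares estimates for y = b0 + b1*x + e
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

sse = np.sum((y - (b0 + b1 * x)) ** 2)   # the sum of squared errors being minimized
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {sse:.3f}")
```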
Classical Linear Regression (OLS)
- Parameter: slope in the population model (b1)
- Estimator: least squares estimate b1_hat = sum_i (xi - x_bar)(yi - y_bar) / sum_i (xi - x_bar)^2
- Estimated standard error: SE(b1_hat) = s / sqrt( sum_i (xi - x_bar)^2 ), with s^2 = SSE / (m - 2)
- Methods of making inference regarding the population: hypothesis tests (2-sided or 1-sided) and confidence intervals
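Continuing the toy example, a sketch of the standard OLS inference for the slope (t-statistic and confidence interval); scipy is assumed to be available:

```python
import numpy as np
from scipy import stats

def slope_inference(x, y, conf=0.95):
    """Least squares slope, its standard error, t-statistic, and confidence interval."""
    m = len(x)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)
    s2 = np.sum(resid ** 2) / (m - 2)                 # residual variance estimate
    se_b1 = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))
    t_stat = b1 / se_b1                               # test of H0: b1 = 0
    t_crit = stats.t.ppf(0.5 + conf / 2, df=m - 2)
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return b1, se_b1, t_stat, ci
```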
Classical Linear Regression (OLS)
Coefficient of determination (r^2): the proportion of variation in y "explained" by the regression on x,
r^2 = 1 - SSE / SST, where SST = sum_i (yi - y_bar)^2 and SSE = sum_i (yi - yi_hat)^2.
Classical Linear Regression (OLS): Multiple regression
- Numeric response variable (y)
- p numeric predictor variables
- Model: y = b0 + b1 x1 + … + bp xp + e
- Partial regression coefficients: bi is the effect (on the mean response) of increasing the i-th predictor variable by 1 unit, holding all other predictors constant.
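A sketch of the corresponding fit with numpy (hypothetical data; the intercept is handled by prepending a column of ones):

```python
import numpy as np

def fit_multiple_ols(X, y):
    """OLS for y = b0 + b1*x1 + ... + bp*xp + e; X has one column per predictor."""
    X_design = np.column_stack([np.ones(len(y)), X])    # prepend intercept column
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return beta                                          # [b0, b1, ..., bp]

# Hypothetical example with p = 2 predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=50)
print(fit_multiple_ols(X, y))   # roughly [1.0, 2.0, -0.5]
```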
Classical Linear Regression (OLS): Ordinary Least Squares estimation
Population model for the mean response: E[y | x1, …, xp] = b0 + b1 x1 + … + bp xp
Least squares fitted (predicted) equation, minimizing SSE: y_hat = b0_hat + b1_hat x1 + … + bp_hat xp, with SSE = sum_i (yi - yi_hat)^2.
Classical Linear Regression (OLS): Ordinary Least Squares estimation
Model: y = b0 + b1 x1 + … + bp xp + e
- OLS estimation: minimize sum_i (yi - yi_hat)^2
- LASSO estimation: minimize sum_i (yi - yi_hat)^2 subject to sum_j |bj| <= t (an L1 penalty on the coefficients)
- Ridge regression estimation: minimize sum_i (yi - yi_hat)^2 subject to sum_j bj^2 <= t (an L2 penalty on the coefficients)
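A sketch of OLS and ridge in closed form with numpy; the penalty multiplier lam is the Lagrangian counterpart of the constraint above, and for simplicity the intercept is treated like any other coefficient:

```python
import numpy as np

def ols_fit(X, y):
    """OLS: minimize ||y - X b||^2 (X already contains an intercept column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ridge_fit(X, y, lam):
    """Ridge: minimize ||y - X b||^2 + lam * ||b||^2 (closed-form solution)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# LASSO (L1 penalty) has no closed form; it is usually solved by coordinate
# descent, e.g. sklearn.linear_model.Lasso(alpha=lam).fit(X, y).
```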
LASSO and Ridge estimation of model coefficients
[Figures: coefficient paths for LASSO and Ridge, plotted against sum(|beta|)]
Nonparametric (local) regression estimation: k-NN, Decision trees, smoothers
[Figures: illustrations of k-NN regression, regression trees, and smoothers]
How to choose k or h?
- When k or h is small, single instances matter; bias is small, variance is large (undersmoothing): high complexity.
- As k or h increases, we average over more instances and variance decreases, but bias increases (oversmoothing): low complexity.
- Cross-validation is used to fine-tune k or h, as sketched below.
Linear Support Vector Regression
[Figures: Expenditures vs. Age with epsilon-tubes of small, middle-sized and biggest area, support vectors marked: a "suspiciously smart case" (overfitting), a "compromise case", SVR (good generalisation), and a "lazy case" (underfitting)]
The thinner the "tube", the more complex the model.
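A small sketch (assuming scikit-learn; made-up data) of how the tube width epsilon controls model complexity and the number of support vectors in linear SVR:

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical data: expenditures vs. age
rng = np.random.default_rng(2)
age = rng.uniform(20, 70, size=(100, 1))
expenditures = 0.5 * age[:, 0] + rng.normal(scale=3.0, size=100)

# A wider tube (larger epsilon) leaves more points inside the tube and gives a
# simpler model with fewer support vectors; a thinner tube fits more tightly.
for eps in (0.5, 2.0, 8.0):
    model = SVR(kernel="linear", C=1.0, epsilon=eps).fit(age, expenditures)
    print(f"epsilon={eps}: {len(model.support_)} support vectors")
```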
Nonlinear Support Vector Regression
Map the data into a higher-dimensional space.
[Figures: Expenditures vs. Age]
Nonlinear Support Vector Regression: Technicalities
The SVR function: f(x) = sum_i (alpha_i - alpha_i*) k(xi, x) + b
To find the unknown parameters of the SVR function, solve:
minimize (1/2) ||w||^2 + C sum_i (s_i + s_i*)
subject to: yi - f(xi) <= epsilon + s_i, f(xi) - yi <= epsilon + s_i*, s_i >= 0, s_i* >= 0
How to choose C, epsilon and the kernel? RBF kernel: k(x, x') = exp(-gamma ||x - x'||^2). Find C, epsilon and gamma from a cross-validation procedure.
SVR Technicalities: Model Selection
Do 5-fold cross-validation to find C and gamma for several fixed values of epsilon, as sketched below.
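A hedged sketch of this procedure with scikit-learn's SVR and GridSearchCV; the data and parameter grids are illustrative, not the ones used in the study:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

# Synthetic regression data
rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(150, 2))
y = np.sin(X[:, 0]) * np.cos(X[:, 1]) + rng.normal(scale=0.1, size=150)

# For each fixed epsilon, grid-search C and the RBF kernel width gamma by 5-fold CV.
for eps in (0.01, 0.1, 0.5):
    search = GridSearchCV(SVR(kernel="rbf", epsilon=eps),
                          param_grid={"C": [0.1, 1, 10, 100],
                                      "gamma": [0.01, 0.1, 1, 10]},
                          scoring="neg_mean_squared_error", cv=5)
    search.fit(X, y)
    print(f"epsilon={eps}: best {search.best_params_}, CV MSE={-search.best_score_:.4f}")
```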
SVR Study: Model Training, Selection and Prediction
[Figures: CVMSE at (IR*, HR*, CR*); true returns (red) and raw predictions (blue)]
SVR: Individual Effects
SVR Technicalities: SVR vs. OLS
Performance on the test set: SVR MSE = 0.04, OLS MSE = 0.23.
Technical Note: Number of Training Errors vs. Model Complexity
[Figure: training and test errors as a function of model complexity, with functions ordered in increasing complexity and the best trade-off marked at the minimum of the test error]
MATLAB video here…
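A small numpy sketch of the same trade-off, using polynomial degree as the complexity axis (synthetic data): training error keeps falling with complexity, while test error falls and then rises past the best trade-off.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(-1, 1, 40))
y = np.sin(3 * x) + rng.normal(scale=0.2, size=40)
x_tr, y_tr, x_te, y_te = x[::2], y[::2], x[1::2], y[1::2]   # simple train/test split

# Increasing polynomial degree = increasing model complexity.
for degree in range(1, 10):
    coefs = np.polyfit(x_tr, y_tr, degree)
    mse = lambda xs, ys: np.mean((np.polyval(coefs, xs) - ys) ** 2)
    print(f"degree {degree}: train MSE={mse(x_tr, y_tr):.3f}, test MSE={mse(x_te, y_te):.3f}")
```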
Variable selection for regression
Akaike Information Criterion (AIC). Final prediction error:
AIC = -2 log(likelihood) + 2 d,
where d is the number of estimated parameters; for a Gaussian linear model this equals, up to an additive constant, m log(SSE/m) + 2 d. Models with smaller AIC are preferred.
Variable selection for regression
Bayesian Information Criterion (BIC), also known as the Schwarz criterion. Final prediction error:
BIC = -2 log(likelihood) + d log(m),
where d is the number of estimated parameters and m is the sample size. BIC tends to choose simpler models than AIC.
Variable selection for regression
R^2-adjusted:
R^2_adj = 1 - (1 - R^2)(m - 1) / (m - p - 1),
where p is the number of predictors; unlike R^2, it penalises adding predictors that do not improve the fit.
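A sketch (numpy) computing these criteria for a fitted linear model; the Gaussian forms of AIC and BIC are used, consistent with the formulas above:

```python
import numpy as np

def selection_criteria(y, y_hat, p):
    """AIC, BIC and adjusted R^2 for a Gaussian linear model with p predictors
    (d = p + 1 estimated coefficients, excluding the noise variance)."""
    m = len(y)
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    d = p + 1
    aic = m * np.log(sse / m) + 2 * d             # Gaussian AIC, up to a constant
    bic = m * np.log(sse / m) + d * np.log(m)     # heavier penalty -> simpler models
    r2 = 1 - sse / sst
    r2_adj = 1 - (1 - r2) * (m - 1) / (m - p - 1)
    return aic, bic, r2_adj
```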
Conclusion / Summary / References
- Classical Linear Regression: any introductory statistical/econometric book
- LASSO and Ridge Regression (linear and nonlinear): Bishop, 2006
- Nonparametric (local) regression estimation: kNN for regression, Decision trees, Smoothers: Alpaydin, 2004; Hastie et al., 2001
- Support Vector Regression (linear and nonlinear): Smola and Schoelkopf, 2003
- Variable/feature selection (AIC, BIC, R^2-adjusted): Hastie et al., 2001; any statistical/econometric book