Regression Methods.

Presentation on theme: "Regression Methods."— Presentation transcript:

1 Regression Methods

2 Linear Regression
Simple linear regression (one predictor)
Multiple linear regression (multiple predictors)
Ordinary Least Squares estimation: computed directly from the data
Lasso regression selects features by setting parameters to 0
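The OLS estimate can indeed be computed directly from the data. A minimal NumPy sketch (the toy data is illustrative, not from the slides):

```python
import numpy as np

# Toy data: y depends on x1 only; x2 is an irrelevant predictor
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 2 * X[:, 0] + 1

# Ordinary Least Squares, computed directly from the data:
# least-squares solution of [1 | X] beta = y (intercept column added)
Xb = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print(beta.round(2))  # intercept ~1, coefficients ~[2, 0]
```

Lasso adds an L1 penalty to this least-squares objective, which drives the coefficients of irrelevant predictors (like x2 here) exactly to 0 and thereby selects features.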

3 Coefficient of Determination
Indicates how well a model fits the data: R² (R squared)
R² = 1 − SSres / SStot
SSres = Σ(yᵢ − fᵢ)²: squared differences between actual and predicted values
SStot = Σ(yᵢ − ȳ)²: squared differences between actual values and the horizontal line at the mean
Between 0 and 1 for a least squares model; a bigger range is possible if other models are used
Explained variance: what percentage of the variance is explained by the model
For linear least squares regression: R² = r²
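A short NumPy sketch of these formulas, with made-up numbers:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0])   # actual values y_i
f = np.array([1.1, 1.9, 3.2, 3.8])   # model predictions f_i

ss_res = np.sum((y - f) ** 2)         # sum of (y_i - f_i)^2: actual vs. predicted
ss_tot = np.sum((y - y.mean()) ** 2)  # sum of (y_i - ybar)^2: actual vs. horizontal line
r2 = 1 - ss_res / ss_tot
print(round(r2, 4))  # 0.98: the model explains 98% of the variance
```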

4 R Squared
[Figure: visual interpretation of R², showing SStot and SSres. Source: Wikipedia, CC BY-SA 3.0]

5 Regression Trees
Regression variant of the decision tree
Top-down induction
Two options:
constant value in each leaf (piecewise constant): regression trees
local linear model in each leaf (piecewise linear): model trees
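The difference between the two leaf types can be illustrated on the examples that fall into a single leaf (toy numbers, not from the slides):

```python
import numpy as np

# Examples that end up in one leaf; the target is roughly linear in x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.1, 7.9])

# Regression tree leaf: one constant value, the mean of the targets
const_pred = y.mean()

# Model tree leaf: a local linear model fitted on the leaf's examples
slope, intercept = np.polyfit(x, y, 1)

print(const_pred)                            # 5.0
print(round(slope, 2), round(intercept, 2))  # 1.96 0.1
```

The linear leaf tracks the local trend, so a model tree needs fewer leaves to reach the same accuracy.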

6 M5 algorithm (Quinlan, Wang)
M5’, M5P in Weka (classifiers > trees > M5P)
Offers both regression trees and model trees; model trees are the default
-R option (buildRegressionTree) for piecewise constant regression trees

7 M5 algorithm (Quinlan, Wang)
Splitting criterion: Standard Deviation Reduction
SDR = sd(T) − Σ sd(Tᵢ)·|Tᵢ|/|T|
Stopping criteria:
standard deviation below some threshold (0.05·sd(D))
too few examples in the node (e.g. ≤ 4)
Pruning (bottom-up):
estimated error = (n+v)/(n−v) × absolute error in the node
where n is the number of examples in the node and v the number of parameters in the model
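The SDR formula translates directly into code; a NumPy sketch on made-up targets:

```python
import numpy as np

def sdr(y, mask):
    """Standard Deviation Reduction of a binary split:
    sd(T) minus the size-weighted standard deviations of the two parts."""
    parts = [y[mask], y[~mask]]
    return np.std(y) - sum(len(p) / len(y) * np.std(p) for p in parts)

y = np.array([1.0, 1.2, 0.9, 5.0, 5.3, 4.8])
good = np.array([True, True, True, False, False, False])  # separates low from high
bad = np.array([True, False, True, False, True, False])   # mixes low and high

print(round(sdr(y, good), 3))  # large reduction
print(round(sdr(y, bad), 3))   # much smaller reduction
```

The split with the largest SDR is chosen, so the good split above would be preferred.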

8 Binary Splits
All splits are binary
Numeric attributes: as normal (as in C4.5)
Nominal attributes:
order all values by their average target value (prior to induction)
introduce k−1 binary indicator variables in this order
Example: database of skiing slopes
avg(color = green) = 2.5%
avg(color = blue) = 3.2%
avg(color = red) = 7.7%
avg(color = black) = 13.5%
binary features: Green, GreenBlue, GreenBlueRed
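A plain-Python sketch of this ordering, using the averages from the skiing-slope example:

```python
# Skiing-slope example: nominal attribute 'color', average target per value
avg = {"green": 2.5, "blue": 3.2, "red": 7.7, "black": 13.5}

# Order all values by their average target value (prior to induction)
ordered = sorted(avg, key=avg.get)

# Introduce k-1 binary indicator variables following this order
indicators = ["".join(v.capitalize() for v in ordered[:i])
              for i in range(1, len(ordered))]
print(indicators)  # ['Green', 'GreenBlue', 'GreenBlueRed']
```

Each indicator is true when the color is in the corresponding prefix of the ordering, so every threshold on the ordered values is available as one binary split.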

9 Regression tree on Servo dataset (UCI)

10 Model tree on Servo dataset (UCI)
[Figure: model tree splitting on motor, screw, pgain and vgain, with linear models (LM1, …) in the leaves]
A smaller tree, because of the greater expressivity in the leaves

11 Regression in Cortana
Regression is a natural setting in Subgroup Discovery
Local models, no prediction model
Subgroups are piecewise constant subsets
[Figure: two subgroups with constant values h = 3100 and h = 2200]

12 Subgroup Discovery: regression
A subgroup is a step function (inside subgroup vs. outside)
R² of the step function is an interesting quality measure (next to the z-score)
Available in Cortana as Explained Variance
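Explained Variance can be sketched as the R² of the subgroup's step function; a NumPy example with toy targets loosely based on the h = 3100 / h = 2200 picture:

```python
import numpy as np

# Targets and a subgroup membership mask (inside vs. outside)
y = np.array([3100.0, 3000.0, 3200.0, 2200.0, 2300.0, 2100.0])
inside = np.array([True, True, True, False, False, False])

# The subgroup induces a step function: one constant inside, one outside
step = np.where(inside, y[inside].mean(), y[~inside].mean())

r2 = 1 - np.sum((y - step) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(r2, 3))  # 0.968: the subgroup explains most of the variance
```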

13 Other regression models
Functions: LinearRegression, MultiLayerPerceptron (artificial neural network), SMOreg (Support Vector Machine)
Lazy: IBk (k-Nearest Neighbors)
Rules: M5Rules (decision list)

14 Approximating a smooth function
Experiment:
take a mathematical function f (with infinite precision)
generate a dataset by sampling x and y, and computing z = f(x, y)
learn f with M5’ (regression tree)
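The experiment can be imitated without Weka: sample a smooth f and approximate it piecewise constant on ever finer partitions, which is essentially what a growing regression tree does. A NumPy sketch with an assumed f(x, y) = sin(x) + cos(y):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, np.pi, 2000)
y = rng.uniform(0, np.pi, 2000)
z = np.sin(x) + np.cos(y)  # sample the "infinite precision" function

def rmse_grid(k):
    """RMSE when each cell of a k-by-k grid predicts the mean z of its cell
    (a stand-in for the piecewise constant leaves of a regression tree)."""
    xi = np.minimum((x / np.pi * k).astype(int), k - 1)
    yi = np.minimum((y / np.pi * k).astype(int), k - 1)
    pred = np.empty_like(z)
    for i in range(k):
        for j in range(k):
            cell = (xi == i) & (yi == j)
            pred[cell] = z[cell].mean()
    return np.sqrt(np.mean((z - pred) ** 2))

for k in (1, 2, 4, 8):
    print(k, round(rmse_grid(k), 3))  # the error shrinks as the partition refines
```

The staircase never becomes exactly smooth, which is why a regression tree needs many leaves to approximate a smooth function well.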

15 k-Nearest Neighbor
k-Nearest Neighbor can also be used for regression, with all its advantages and disadvantages
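A minimal NumPy sketch of k-NN regression on toy data (this is what Weka's IBk does when the target is numeric):

```python
import numpy as np

def knn_regress(X_train, y_train, query, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    dist = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(dist)[:k]
    return y_train[nearest].mean()

X = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
y = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
print(knn_regress(X, y, np.array([1.1])))  # 1.0: mean of targets at x = 0, 1, 2
```

No model is built up front (hence "lazy"): all work happens at prediction time, which makes training free but prediction expensive on large datasets — the usual trade-offs of nearest-neighbor methods.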

