1
Regression Methods
2
Linear Regression
- Simple linear regression (one predictor)
- Multiple linear regression (multiple predictors)
- Ordinary Least Squares estimation: computed directly from the data
- Lasso regression selects features by setting parameters to 0 (see the sketch below)
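A minimal sketch of the OLS-vs-Lasso contrast, using scikit-learn on synthetic data (the data and alpha value are illustrative, not from the slides): Lasso's L1 penalty drives the coefficients of irrelevant predictors exactly to 0.

```python
# Sketch (assumed setup): OLS vs. Lasso on synthetic data with 5 predictors,
# of which only the first 2 actually influence the target.
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)   # OLS: computed directly from the data
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty shrinks coefficients

print("OLS coefficients:  ", ols.coef_)
print("Lasso coefficients:", lasso.coef_)  # irrelevant predictors become 0.0
```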
3
Coefficient of Determination
- indicates how well a model fits the data: R² (R squared)
- R² = 1 − SS_res/SS_tot
  - SS_res = Σᵢ (yᵢ − fᵢ)²: squared differences between actual and predicted values
  - SS_tot = Σᵢ (yᵢ − ȳ)²: squared differences between actual values and the horizontal line (the mean)
- between 0 and 1 for a least squares model; a wider range is possible if other models are used
- explained variance: what percentage of the variance is explained by the model
- for linear least squares regression: R² = r² (see the worked computation below)
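A worked computation of R² straight from the definitions above, checked against scikit-learn's r2_score (the y and f values are made up for illustration):

```python
# Sketch: R^2 = 1 - SS_res/SS_tot, computed by hand and verified.
import numpy as np
from sklearn.metrics import r2_score

y = np.array([3.0, 5.0, 7.0, 9.0])    # actual values y_i
f = np.array([2.8, 5.3, 6.9, 9.2])    # model predictions f_i

ss_res = np.sum((y - f) ** 2)          # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)   # deviations from the horizontal line (mean)
r2 = 1 - ss_res / ss_tot

assert np.isclose(r2, r2_score(y, f))
print(f"R^2 = {r2:.4f}")
```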
4
R Squared
[Figure: visual interpretation of R², comparing SS_tot (deviations from the mean) with SS_res (residuals). Source: Wikipedia, CC BY-SA 3.0]
5
Regression Trees
- regression variant of the decision tree
- top-down induction
- two options for the leaves:
  - constant value in each leaf (piecewise constant): regression trees (see the sketch below)
  - local linear model in each leaf (piecewise linear): model trees
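A minimal sketch of the piecewise-constant case, using scikit-learn's DecisionTreeRegressor as a stand-in (scikit-learn has no model-tree variant, so the piecewise-linear option is not shown):

```python
# Sketch: a regression tree predicts a constant per leaf, namely the mean
# target value of the training examples that fall into that leaf.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 10, size=(200, 1)), axis=0)
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(tree.predict([[2.0], [5.0], [8.0]]))  # three piecewise-constant outputs
```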
6
M5 algorithm (Quinlan, Wang)
- M5', M5P in Weka (classifiers > trees > M5P)
- offers both regression trees and model trees
- model trees are the default; use the -R option (buildRegressionTree) for piecewise constant regression trees
7
M5 algorithm (Quinlan, Wang)
- splitting criterion: Standard Deviation Reduction (computed in the sketch below)
  SDR = sd(T) − Σᵢ (|Tᵢ|/|T|) · sd(Tᵢ)
- stopping criteria:
  - standard deviation in the node below some threshold (e.g. 0.05 · sd(D))
  - too few examples in the node (e.g. ≤ 4)
- pruning (bottom-up), using the estimated error:
  (n + v)/(n − v) × absolute error in the node,
  where n is the number of examples in the node and v the number of parameters in the model
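A small sketch of the SDR formula above, with a toy split (the function name and data are illustrative; M5 itself evaluates many candidate splits and keeps the one with the highest SDR):

```python
# Sketch: Standard Deviation Reduction for a binary split,
# SDR = sd(T) - sum_i sd(T_i) * |T_i| / |T|.
import numpy as np

def sdr(parent, children):
    n = len(parent)
    weighted = sum(np.std(c) * len(c) / n for c in children)
    return np.std(parent) - weighted

T = np.array([1.0, 1.2, 0.9, 5.0, 5.2, 4.8])
left, right = T[:3], T[3:]          # a candidate binary split
print(sdr(T, [left, right]))        # large reduction: a good split
```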
8
Binary Splits
- all splits are binary
- numeric attributes: handled as usual (as in C4.5)
- nominal attributes: order all values by their average target value (prior to induction), then introduce k − 1 indicator variables in this order
- example: a database of skiing slopes (see the sketch below)
  avg(color = green) = 2.5%
  avg(color = blue) = 3.2%
  avg(color = red) = 7.7%
  avg(color = black) = 13.5%
  binary features: Green, GreenBlue, GreenBlueRed
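A sketch of that preprocessing step in plain Python, reusing the skiing-slopes numbers from the slide (the helper function is illustrative): values are ordered by average target, and each of the k − 1 indicators tests membership in a growing prefix of that order.

```python
# Sketch: M5's nominal-to-binary encoding with k-1 cumulative indicators.
colors = ["green", "blue", "red", "black"]  # ordered by average target:
# avg: green 2.5%, blue 3.2%, red 7.7%, black 13.5%

def indicators(value):
    """Map a color to its k-1 binary features: Green, GreenBlue, GreenBlueRed."""
    idx = colors.index(value)
    return {
        "Green": idx <= 0,
        "GreenBlue": idx <= 1,
        "GreenBlueRed": idx <= 2,
    }

print(indicators("red"))
# {'Green': False, 'GreenBlue': False, 'GreenBlueRed': True}
```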
9
Regression tree on Servo dataset (UCI)
10
Model tree on Servo dataset (UCI)
[Tree diagram: splits on motor, screw, pgain and vgain, with linear models (LM1, ...) in the leaves]
The tree is smaller because of the greater expressivity in the leaves.
11
Regression in Cortana
- regression is a natural setting in Subgroup Discovery
- local models, no prediction model
- subgroups are piecewise constant subsets (e.g. h = 3100 inside the subgroup vs. h = 2200 outside)
12
Subgroup Discovery: regression
- a subgroup defines a step function (one constant inside the subgroup, another outside)
- the R² of this step function is an interesting quality measure (next to the z-score)
- available in Cortana as Explained Variance (see the sketch below)
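A minimal sketch of that quality measure: build the step function from the subgroup's inside/outside means and compute its R² (the height values echo the previous slide; variable names are illustrative, and this is not Cortana's actual code):

```python
# Sketch: Explained Variance of a subgroup as the R^2 of its step function.
import numpy as np

y = np.array([3100., 3000., 3200., 2200., 2100., 2300.])    # target, e.g. h
member = np.array([True, True, True, False, False, False])  # subgroup membership

pred = np.where(member, y[member].mean(), y[~member].mean())  # step function
ss_res = np.sum((y - pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
print("Explained Variance (R^2):", 1 - ss_res / ss_tot)
```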
13
Other regression models
- Functions: LinearRegression, MultiLayerPerceptron (artificial neural network), SMOreg (support vector machine)
- Lazy: IBk (k-nearest neighbors)
- Rules: M5Rules (decision list)
14
Approximating a smooth function
Experiment (see the sketch below):
- take a mathematical function f (with infinite precision)
- generate a dataset by sampling x and y, and computing z = f(x, y)
- learn f with M5' (regression tree)
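A sketch of this experiment, with scikit-learn's DecisionTreeRegressor standing in for M5' and an arbitrary smooth f (the function, sample size, and depth are assumptions, not from the slides):

```python
# Sketch: sample (x, y), compute z = f(x, y), approximate f with a tree.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

def f(x, y):                     # the "infinite precision" target function
    return np.sin(x) * np.cos(y)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1000, 2))
z = f(X[:, 0], X[:, 1])

tree = DecisionTreeRegressor(max_depth=6).fit(X, z)
print("R^2 on the training sample:", r2_score(z, tree.predict(X)))
```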
15
k-Nearest Neighbor
- k-nearest neighbor can also be used for regression, with all its usual advantages and disadvantages (see the sketch below)
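A minimal sketch of k-NN regression, using scikit-learn's KNeighborsRegressor on synthetic data (data and k are illustrative): the prediction is simply the mean target of the k nearest training examples.

```python
# Sketch: k-NN for regression averages the targets of the k nearest neighbors.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=100)

knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)
print(knn.predict([[2.0], [7.5]]))  # mean y of the 5 nearest neighbors each
```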