Presenter: Georgi Nalbantov

Presentation transcript:

Summer Course: Data Mining. Regression Analysis. Presenter: Georgi Nalbantov, August 2009

Structure
- Regression analysis: definition and examples
- Classical Linear Regression
- LASSO and Ridge Regression (linear and nonlinear)
- Nonparametric (local) regression estimation: kNN for regression, Decision trees, Smoothers
- Support Vector Regression (linear and nonlinear)
- Variable/feature selection (AIC, BIC, R^2-adjusted)

Feature Selection, Dimensionality Reduction, and Clustering in the KDD Process. U. M. Fayyad, G. Piatetsky-Shapiro and P. Smyth (1995)

Common Data Mining tasks: Clustering, Classification, Regression. (Figure: three example scatter plots over inputs X1 and X2, one per task.) Methods mentioned: k-th Nearest Neighbour, Parzen Window, Unfolding, Conjoint Analysis, Cat-PCA, Linear Discriminant Analysis, QDA, Logistic Regression (Logit), Decision Trees, LSSVM, NN, VS, Classical Linear Regression, Ridge Regression, NN, CART.

Linear regression analysis: examples


The Regression task. Given data on n explanatory variables and one real-valued explained variable for m observations, find a function that gives the "best" fit. Given: (x_1, y_1), …, (x_m, y_m) ∈ ℝ^n × ℝ. Find: f : ℝ^n → ℝ. "Best function" = the expected error on unseen data (x_{m+1}, y_{m+1}), …, (x_{m+k}, y_{m+k}) is minimal.
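
A minimal sketch of this task, assuming Python with scikit-learn and a synthetic dataset (all names are illustrative): the expected error on unseen data is approximated by the error on a held-out test set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic data: m = 200 observations, n = 3 explanatory variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 + X @ np.array([1.5, -0.5, 0.0]) + rng.normal(scale=0.3, size=200)

# Hold out part of the data to stand in for "unseen" observations.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

f = LinearRegression().fit(X_train, y_train)            # candidate function f: R^n -> R
test_mse = mean_squared_error(y_test, f.predict(X_test))
print(f"approximate expected error on unseen data (MSE): {test_mse:.4f}")
```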

Classical Linear Regression (OLS). Explanatory and response variables are numeric. The relationship between the mean of the response variable and the level of the explanatory variable is assumed to be approximately linear (a straight line). Model: y = b0 + b1x + e. b1 > 0: positive association; b1 < 0: negative association; b1 = 0: no association.

Classical Linear Regression (OLS). b0 = mean response when x = 0 (y-intercept). b1 = change in mean response when x increases by 1 unit (slope). b0 and b1 are unknown population parameters (like μ). b0 + b1x = mean response when the explanatory variable takes on the value x. Task: minimize the sum of squared errors: SSE = Σ_i (y_i − (b0 + b1 x_i))².

Classical Linear Regression (OLS). Parameter: slope in the population model (b1). Estimator: least squares estimate b̂1 = Σ_i (x_i − x̄)(y_i − ȳ) / Σ_i (x_i − x̄)². Estimated standard error: SE(b̂1) = s / √(Σ_i (x_i − x̄)²), where s is the residual standard deviation. Methods of making inference regarding the population: hypothesis tests (2-sided or 1-sided) and confidence intervals.

Classical Linear Regression (OLS)


Classical Linear Regression (OLS). Coefficient of determination (r²): proportion of variation in y "explained" by the regression on x: r² = 1 − SSE/SST, where SST = Σ_i (y_i − ȳ)² is the total sum of squares and SSE = Σ_i (y_i − ŷ_i)² is the sum of squared errors.
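
The OLS quantities from the preceding slides (slope, intercept, standard error and r²) follow directly from these formulas; a small sketch with numpy, using synthetic data and illustrative names:

```python
import numpy as np

# Synthetic paired data (x_i, y_i), i = 1..m.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 3.0 + 0.8 * x + rng.normal(scale=1.0, size=50)

x_bar, y_bar = x.mean(), y.mean()

# Least squares estimates of slope and intercept.
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

# Residuals, SSE, and the estimated standard error of b1.
y_hat = b0 + b1 * x
sse = np.sum((y - y_hat) ** 2)
s = np.sqrt(sse / (len(x) - 2))                   # residual standard deviation
se_b1 = s / np.sqrt(np.sum((x - x_bar) ** 2))

# Coefficient of determination.
sst = np.sum((y - y_bar) ** 2)
r2 = 1 - sse / sst

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SE(b1) = {se_b1:.3f}, r^2 = {r2:.3f}")
```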

Classical Linear Regression (OLS): Multiple regression. Numeric response variable (y), p numeric predictor variables. Model: Y = b0 + b1x1 + ⋯ + bpxp + e. Partial regression coefficients: bi = effect (on the mean response) of increasing the i-th predictor variable by 1 unit, holding all other predictors constant.

Classical Linear Regression (OLS): Ordinary Least Squares estimation. Population model for the mean response: E(Y) = b0 + b1x1 + ⋯ + bpxp. Least squares fitted (predicted) equation, minimizing SSE = Σ_i (y_i − ŷ_i)²: ŷ = b̂0 + b̂1x1 + ⋯ + b̂pxp.
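
A sketch of the least squares fit obtained from the normal equations (X'X)b = X'y, assuming Python with numpy and synthetic data (all names are illustrative):

```python
import numpy as np

# Synthetic data: 100 observations, p = 2 predictors.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=100)

# Design matrix with a leading column of ones for the intercept b0.
X_design = np.column_stack([np.ones(len(y)), X])

# Solve the normal equations (X'X) b = X'y for the OLS coefficients.
b = np.linalg.solve(X_design.T @ X_design, X_design.T @ y)
y_hat = X_design @ b
sse = np.sum((y - y_hat) ** 2)

print("b0, b1, b2 =", np.round(b, 3), " SSE =", round(sse, 3))
```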

Classical Linear Regression (OLS): Ordinary Least Squares estimation. Model: y = b0 + b1x1 + ⋯ + bpxp + e. OLS estimation: minimize Σ_i (y_i − ŷ_i)². LASSO estimation: minimize Σ_i (y_i − ŷ_i)² + λ Σ_j |bj|. Ridge regression estimation: minimize Σ_i (y_i − ŷ_i)² + λ Σ_j bj².
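
A sketch of the three estimators side by side, assuming Python with scikit-learn (where the penalty weight λ is called alpha); the data and the alpha values are illustrative, not taken from the slides:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge

# Synthetic data with 5 predictors, only the first two of which matter.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

models = {
    "OLS":   LinearRegression(),
    "LASSO": Lasso(alpha=0.1),   # |b_j| penalty shrinks some coefficients to exactly 0
    "Ridge": Ridge(alpha=1.0),   # b_j^2 penalty shrinks coefficients towards 0, rarely to exactly 0
}

for name, model in models.items():
    model.fit(X, y)
    print(f"{name:5s} intercept = {model.intercept_:.2f}  coefficients = {np.round(model.coef_, 2)}")
```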

LASSO and Ridge estimation of model coefficients. (Figure: coefficient paths plotted against sum(|beta|).)

Nonparametric (local) regression estimation: k-NN, Decision trees, smoothers


Nonparametric (local) regression estimation: k-NN, Decision trees, smoothers. How to choose k or h? When k or h is small, single instances matter: bias is small, variance is large (undersmoothing), i.e. high complexity. As k or h increases, we average over more instances and variance decreases, but bias increases (oversmoothing), i.e. low complexity. Cross-validation is used to fine-tune k or h.
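
A small sketch of choosing k by cross-validation for k-NN regression, assuming Python with scikit-learn and synthetic data (the candidate values of k are illustrative):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score

# Synthetic nonlinear data: y is a noisy sine of a single input.
rng = np.random.default_rng(4)
X = np.sort(rng.uniform(0, 10, size=200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)

# Small k: low bias, high variance (undersmoothing).
# Large k: high bias, low variance (oversmoothing).
for k in [1, 5, 15, 50]:
    knn = KNeighborsRegressor(n_neighbors=k)
    cv_mse = -cross_val_score(knn, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    print(f"k = {k:2d}  5-fold CV MSE = {cv_mse:.4f}")
```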

Linear Support Vector Regression. (Figures: Expenditures plotted against Age, with fits surrounded by tubes of small, middle-sized and biggest area; points on the tube boundaries are the "support vectors".) The "suspiciously smart case" overfits, the "lazy case" underfits, and the "compromise case" chosen by SVR generalises well. The thinner the "tube", the more complex the model.

Nonlinear Support Vector Regression. Map the data into a higher-dimensional space. (Figures: Expenditures against Age before and after the mapping.)

Nonlinear Support Vector Regression: Technicalities. The SVR function: f(x) = w·φ(x) + b. To find the unknown parameters of the SVR function, solve: minimize ½||w||² + C Σ_i (ξ_i + ξ_i*), subject to: y_i − w·φ(x_i) − b ≤ ε + ξ_i, w·φ(x_i) + b − y_i ≤ ε + ξ_i*, ξ_i, ξ_i* ≥ 0. How to choose C, ε and the kernel K(x_i, x_j)? RBF kernel: K(x_i, x_j) = exp(−γ ||x_i − x_j||²). Find C, ε and γ from a cross-validation procedure.

SVR Technicalities: Model Selection. Do 5-fold cross-validation to find C and γ for several fixed values of ε.
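
A sketch of this selection procedure, assuming Python with scikit-learn: an RBF-kernel SVR with ε held fixed while C and γ are chosen by 5-fold cross-validation. The data and the grid values are illustrative, not those of the study.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# Synthetic nonlinear data.
rng = np.random.default_rng(5)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sinc(X).ravel() + rng.normal(scale=0.1, size=300)

# For a fixed epsilon (width of the insensitive tube), search over C and gamma
# by 5-fold cross-validation.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("svr", SVR(kernel="rbf", epsilon=0.1)),
])
param_grid = {
    "svr__C": [0.1, 1, 10, 100],
    "svr__gamma": [0.01, 0.1, 1, 10],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)

print("best C and gamma:", search.best_params_)
print("best 5-fold CV MSE:", -search.best_score_)
```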

SVR Study: Model Training, Selection and Prediction. (Figures: CVMSE at (IR*, HR*, CR*); true returns (red) and raw predictions (blue).)

SVR: Individual Effects

SVR Technicalities: SVR vs. OLS. Performance on the test set: SVR MSE = 0.04, OLS MSE = 0.23.

Technical Note: Number of Training Errors vs. Model Complexity. (Figure: with candidate functions ordered in increasing complexity, the minimum number of training errors keeps decreasing while the number of test errors first decreases and then increases; the best trade-off lies between the two extremes.) MATLAB video here…
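
A small illustration of this trade-off. The original slide refers to a MATLAB demo; this is a substitute sketch in Python with scikit-learn that uses polynomial degree as the complexity measure, on synthetic data with illustrative names.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data from a smooth curve plus noise.
rng = np.random.default_rng(6)
X = rng.uniform(-3, 3, size=(120, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=120)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Polynomial degree stands in for "model complexity": training error keeps
# falling as the degree grows, while test error eventually rises again.
for degree in [1, 3, 5, 9, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```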

Variable selection for regression. Akaike Information Criterion (AIC). Final prediction error: AIC = −2 log L + 2d, where L is the maximized likelihood and d the number of estimated parameters; models with smaller AIC are preferred.

Variable selection for regression. Bayesian Information Criterion (BIC), also known as the Schwarz criterion. Final prediction error: BIC = −2 log L + d log N, where N is the number of observations. Since log N > 2 once N > 7, the penalty on extra parameters is heavier, so BIC tends to choose simpler models than AIC.

Variable selection for regression. R²-adjusted: R²_adj = 1 − (1 − R²)(N − 1)/(N − p − 1), where p is the number of predictors; unlike R², it does not automatically increase when variables are added.
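
A sketch that computes the three criteria for a family of nested models, assuming Python with numpy and scikit-learn. The Gaussian least-squares forms of AIC and BIC and the parameter count d = p + 2 are assumptions for illustration, as are the synthetic data and names.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: 3 candidate predictors, only the first of which is truly relevant.
rng = np.random.default_rng(7)
X = rng.normal(size=(100, 3))
y = 2.0 + 1.5 * X[:, 0] + rng.normal(scale=1.0, size=100)

def selection_criteria(X_subset, y):
    """AIC, BIC (Gaussian least-squares form) and adjusted R^2 for one candidate model."""
    n, p = X_subset.shape                      # observations, predictors
    d = p + 2                                  # fitted parameters: intercept, p slopes, error variance
    model = LinearRegression().fit(X_subset, y)
    sse = np.sum((y - model.predict(X_subset)) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    aic = n * np.log(sse / n) + 2 * d
    bic = n * np.log(sse / n) + d * np.log(n)
    r2_adj = 1 - (sse / sst) * (n - 1) / (n - p - 1)
    return aic, bic, r2_adj

# Compare nested models that use the first k predictors.
for k in [1, 2, 3]:
    aic, bic, r2_adj = selection_criteria(X[:, :k], y)
    print(f"predictors 1..{k}: AIC = {aic:.1f}, BIC = {bic:.1f}, adjusted R^2 = {r2_adj:.3f}")
```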

Conclusion / Summary / References
- Classical Linear Regression: any introductory statistical/econometric book
- LASSO and Ridge Regression (linear and nonlinear): http://www-stat.stanford.edu/~tibs/lasso.html ; Bishop, 2006
- Nonparametric (local) regression estimation (kNN for regression, Decision trees, Smoothers): Alpaydin, 2004; Hastie et al., 2001
- Support Vector Regression (linear and nonlinear): Smola and Schoelkopf, 2003
- Variable/feature selection (AIC, BIC, R^2-adjusted): Hastie et al., 2001; any statistical/econometric book