Support Vector Regression in Marketing
Georgi Nalbantov


2/20 Contents
- Purpose
- Linear Support Vector Regression
- Nonlinear Support Vector Regression
- Marketing Examples
- Conclusion and Q & A (some extensions)

3/20 The Regression Task
Given training data: $\{(x_1, y_1), \ldots, (x_n, y_n)\} \subset \mathbb{R}^d \times \mathbb{R}$
Find a function $f: \mathbb{R}^d \to \mathbb{R}$
"Best function" = one whose expected error on unseen data $(x_{n+1}, y_{n+1}), \ldots, (x_{n+k}, y_{n+k})$ is minimal
Existing techniques to solve the regression task:
- Classical (Linear) Regression
- Ridge Regression
- Neural Networks (NN)
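The "best function" criterion can be made precise as minimizing the expected error (risk) on unseen data. In standard notation (the loss $L$ and the unknown data distribution $P$ are textbook symbols, not recovered from the slide):
\[
  R[f] = \int L\bigl(y, f(x)\bigr)\, dP(x, y)
  \qquad \text{vs.} \qquad
  R_{\mathrm{emp}}[f] = \frac{1}{n} \sum_{i=1}^{n} L\bigl(y_i, f(x_i)\bigr)
\]
Since $P$ is unknown, practical methods minimize the empirical risk $R_{\mathrm{emp}}$ together with a complexity control; SVR's epsilon-tube and parameter $C$, introduced below, play exactly this role.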

4/20 Linear Support Vector Regression: Marketing Problem
Given variables: person's age, income group, season, holiday duration, location, number of children, etc. (12 variables)
Predict: the level of holiday expenditures
Data collected by Erasmus University Rotterdam in 2003
[Figure: scatter plot of Expenditures vs. Age]

5/20 Linear Support Vector Regression
Three candidate fits to the same data:
- "Suspiciously smart case" (overfitting)
- "Lazy case" (underfitting)
- "Compromise case", SVR (good generalizability)
[Figure: three Expenditures vs. Age plots, one per case]

6/20 Linear Support Vector Regression: the epsilon-insensitive loss function
[Figure: left, data points around the fitted line with an epsilon-tube; errors are the deviations outside the tube. Right, the penalty as a function of the error: zero inside the tube, then growing linearly at 45°.]
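The loss function sketched on this slide can be stated explicitly. This is the standard epsilon-insensitive loss (textbook form; the slide's own formula is not legible in the transcript):
\[
  L_{\varepsilon}\bigl(y, f(x)\bigr) = \max\bigl(0, \; \lvert y - f(x) \rvert - \varepsilon \bigr)
\]
Deviations of up to $\varepsilon$ cost nothing; beyond that, the penalty grows linearly with the deviation, which is the 45° slope in the figure.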

7/20 Linear Support Vector Regression
The thinner the "tube", the more complex the model:
- "Lazy case" (underfitting): biggest tube area
- "Compromise case", SVR (good generalizability): middle-sized tube area
- "Suspiciously smart case" (overfitting): small tube area
[Figure: the three Expenditures vs. Age fits with their tubes; the points on the tube boundaries are the "support vectors"]

8/20 Non-linear Support Vector Regression
Map the data into a higher-dimensional space:
[Figure: scatter plot of Expenditures vs. Age]

9/20 Non-linear Support Vector Regression
Map the data into a higher-dimensional space (continued):
[Figure: the same data shown after the mapping]
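This is the usual kernel trick: the mapping $\phi$ never has to be computed explicitly, because the derivation that follows only needs inner products, which a kernel function evaluates directly in the original space. In standard notation (not recovered from the slides):
\[
  \phi : \mathbb{R}^d \to \mathcal{H}, \qquad
  k(x, x') = \langle \phi(x), \phi(x') \rangle
\]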

10/20 Non-linear Support Vector Regression
Finding the value of a new point:
[Figure: the fitted non-linear curve evaluated at a new Age value]
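The standard SVR prediction function, which depends on the data only through kernel values, is (textbook form; the slide's exact formula is not shown in the transcript):
\[
  f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*})\, k(x_i, x) + b
\]
Only the support vectors, the points on or outside the tube, have $\alpha_i - \alpha_i^{*} \neq 0$, so the sum is typically sparse.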

11/20 Linear SVR: derivation
Given training data $\{(x_i, y_i)\}_{i=1}^{n}$, find $f(x) = \langle w, x \rangle + b$ such that $f$ optimally describes the data:
[Figure: linear fit with its tube on the Expenditures vs. Age data]

12/20 Linear SVR: derivation
Two quantities compete: the complexity of the function vs. the sum of the errors.
Case I: a wide "tube" (a flat function) has low complexity but may leave larger errors.
Case II: a thin "tube" (a steeper, more flexible function) has smaller errors but higher complexity.
[Figure: the trade-off illustrated with a wide and a thin tube on the same data]
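These two competing terms combine into the single objective that SVR minimizes. This is the standard formulation, with slack variables $\xi_i, \xi_i^{*}$ measuring the errors above and below the tube:
\[
  \min_{w, b, \xi, \xi^{*}} \;\; \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{n} (\xi_i + \xi_i^{*})
\]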

13/20 Linear SVR: derivation, the role of C
C governs the trade-off between the "tube" (complexity) term and the error term:
- C is small: the complexity term dominates, giving a flatter, simpler fit that tolerates errors.
- C is big: the error term dominates, giving a more complex fit that follows the data closely.
[Figure: two fits of the same data, one with small C and one with big C]

14/20 Linear SVR: derivation
Minimize
\[
  \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{n} (\xi_i + \xi_i^{*})
\]
Subject to:
\[
  y_i - \langle w, x_i \rangle - b \;\le\; \varepsilon + \xi_i, \qquad
  \langle w, x_i \rangle + b - y_i \;\le\; \varepsilon + \xi_i^{*}, \qquad
  \xi_i, \xi_i^{*} \;\ge\; 0
\]
[Figure: the tube, with slack variables marked on the points outside it]

15/20 Non-linear SVR: derivation
Minimize (with the data mapped through $\phi$)
\[
  \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{n} (\xi_i + \xi_i^{*})
\]
Subject to:
\[
  y_i - \langle w, \phi(x_i) \rangle - b \;\le\; \varepsilon + \xi_i, \qquad
  \langle w, \phi(x_i) \rangle + b - y_i \;\le\; \varepsilon + \xi_i^{*}, \qquad
  \xi_i, \xi_i^{*} \;\ge\; 0
\]

16/20 Non-linear SVR: derivation
Form the Lagrangian $L$ of the constrained problem above. A saddle point of $L$ has to be found:
- min with respect to the primal variables $w, b, \xi, \xi^{*}$
- max with respect to the dual variables (the Lagrange multipliers $\alpha, \alpha^{*} \ge 0$)
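Written out in full, the Lagrangian referred to here is the standard one, with extra multipliers $\eta_i, \eta_i^{*} \ge 0$ for the constraints $\xi_i, \xi_i^{*} \ge 0$ (the slide's own notation is not recoverable from the transcript):
\[
\begin{aligned}
  L \;=\;& \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{n} (\xi_i + \xi_i^{*})
          - \sum_{i=1}^{n} (\eta_i \xi_i + \eta_i^{*} \xi_i^{*}) \\
        &- \sum_{i=1}^{n} \alpha_i \bigl(\varepsilon + \xi_i - y_i
          + \langle w, \phi(x_i)\rangle + b\bigr) \\
        &- \sum_{i=1}^{n} \alpha_i^{*} \bigl(\varepsilon + \xi_i^{*} + y_i
          - \langle w, \phi(x_i)\rangle - b\bigr)
\end{aligned}
\]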

17/20 Non-linear SVR: derivation...
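The rest of the derivation is elided here. For completeness, setting the derivatives of $L$ with respect to $w, b, \xi, \xi^{*}$ to zero and substituting back yields the standard dual problem (textbook form, not necessarily the slide's exact layout):
\[
\begin{aligned}
  \max_{\alpha, \alpha^{*}} \;\;
  & -\frac{1}{2} \sum_{i,j=1}^{n} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})\, k(x_i, x_j)
    - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^{*})
    + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^{*}) \\
  \text{subject to} \;\;
  & \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*}) = 0, \qquad
    0 \le \alpha_i, \alpha_i^{*} \le C
\end{aligned}
\]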

18/20 Strengths and Weaknesses of SVR
Strengths of SVR:
- No local minima
- It scales relatively well to high-dimensional data
- The trade-off between model complexity and error can be controlled explicitly via C and epsilon
- Overfitting is avoided (for any fixed C and epsilon)
- Robustness of the results
- The "curse of dimensionality" is avoided
- "Huber (1964) demonstrated that the best cost function over the worst model over any pdf of y given x is the linear cost function. Therefore, if the pdf p(y|x) is unknown, the best cost function is the linear penalization over the errors" (Perez-Cruz et al., 2003)
Weaknesses of SVR:
- What are the best trade-off parameter C and the best epsilon?
- What is a good transformation of the original space?

19/20 Experiments and Results
The vacation problem (again): given training data of input-output pairs $(x_i, y_i)$, where the output is "Expenditures" and the inputs are "Age", "Duration" of holiday, "Income group", "Number of children", etc.
Predict the expenditures $y_{n+1}$ of a new observation on the basis of its inputs $x_{n+1}$.
The training set consists of 600 observations, and the test set of 108 observations.

20/20 Experiments and Results
The SVR function:
\[
  f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*})\, k(x_i, x) + b
\]
To find the unknown parameters $\alpha_i, \alpha_i^{*}, b$ of the SVR function, solve the dual problem above, subject to its constraints.
How to choose $C$, $\varepsilon$, and the kernel $k$? Here $k$ = RBF kernel:
\[
  k(x_i, x_j) = \exp\bigl(-\gamma \|x_i - x_j\|^{2}\bigr)
\]
Find $C$, $\varepsilon$, and $\gamma$ from a cross-validation procedure.

21/20 Experiments and Results
Do 5-fold cross-validation to find $C$ and $\gamma$ for several fixed values of $\varepsilon$.
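As an illustration only (the deck's experiments were not run this way), the same procedure can be sketched with scikit-learn's SVR: for each fixed epsilon, grid-search C and gamma by 5-fold cross-validation. The data arrays and the parameter grid below are hypothetical stand-ins, not values from the study:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

# Hypothetical stand-in for the vacation data: 600 training observations
# with 12 input variables (Age, Duration, Income group, ...).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(600, 12))
y_train = rng.normal(size=600)

# For several fixed values of epsilon, use 5-fold cross-validation to
# pick C and gamma (the RBF kernel parameter).
for eps in (0.15, 0.45):
    grid = GridSearchCV(
        estimator=SVR(kernel="rbf", epsilon=eps),
        param_grid={"C": [0.1, 1.0, 10.0, 100.0],
                    "gamma": [0.001, 0.01, 0.1, 1.0]},
        cv=5,
        scoring="neg_mean_squared_error",
    )
    grid.fit(X_train, y_train)
    print(f"epsilon={eps}: best {grid.best_params_}, "
          f"CV MSE={-grid.best_score_:.3f}")
```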

22/20 Experiments and Results
The effect of changes in $\varepsilon$: as it increases, the functional relationship gets flatter in the higher-dimensional space, but also in the original space.
[Figure: two fitted curves, for $\varepsilon = 0.45$ and $\varepsilon = 0.15$]

23/20 Experiments and Results
Performance on the test set:
[Figure: test-set results for the two fits, with MSE = 0.04 and MSE = 0.23]
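Computing such a test-set MSE with the sketch above would look like this (again illustrative; `rng` and `grid` come from the previous snippet, and the test arrays are hypothetical):

```python
from sklearn.metrics import mean_squared_error

# Hypothetical held-out test set of 108 observations, matching the
# deck's 600/108 split.
X_test = rng.normal(size=(108, 12))
y_test = rng.normal(size=108)

y_pred = grid.predict(X_test)  # best CV model, refit on all training data
print("test MSE:", mean_squared_error(y_test, y_pred))
```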

24/20 Conclusion
- Support Vector Regression (SVR) can be applied to marketing problems
- SVR behaves robustly in multivariate problems
- Further research in various marketing areas is needed to justify or refute the applicability of SVR
- Interpretability of the results can be facilitated by reporting confidence intervals for the estimates