Support Vector Regression in Marketing
Georgi Nalbantov
Contents
- Purpose
- Linear Support Vector Regression
- Nonlinear Support Vector Regression
- Marketing Examples
- Conclusion and Q & A (some extensions)
The Regression Task
Given training data $\{(x_1, y_1), \dots, (x_n, y_n)\} \subset \mathbb{R}^d \times \mathbb{R}$, find a function $f: \mathbb{R}^d \to \mathbb{R}$, where the "best function" is the one whose expected error on unseen data $(x_{n+1}, y_{n+1}), \dots, (x_{n+k}, y_{n+k})$ is minimal.
Existing techniques to solve the regression task:
- Classical (Linear) Regression
- Ridge Regression
- Neural Networks
Linear Support Vector Regression
Marketing problem. Given variables (12 in total):
- person's age
- income group
- season
- holiday duration
- location
- number of children
- etc.
Predict: the level of holiday Expenditures.
Data collected by Erasmus University Rotterdam in 2003.
[Scatter plot: Expenditures vs. Age]
Linear Support Vector Regression
[Three scatter plots of Expenditures vs. Age: the "suspiciously smart case" (overfitting), the "lazy case" (underfitting), and the "compromise case", SVR (good generalizability)]
Linear Support Vector Regression
The epsilon-insensitive loss function: residuals inside the $\varepsilon$-tube incur no penalty; outside the tube the penalty grows linearly with the error (slopes of 45°):
$L_\varepsilon(y, f(x)) = \max(0, \, |y - f(x)| - \varepsilon)$
[Plot: penalty vs. error for the epsilon-insensitive loss]
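A minimal sketch of this loss in NumPy (the function name and the default eps value are illustrative, not from the slides):

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Epsilon-insensitive loss: residuals inside the eps-tube cost
    nothing; outside the tube the penalty grows linearly (45-degree slopes)."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - eps)

# Example: a residual of 0.3 with eps = 0.1 is penalized by 0.2,
# while a residual of 0.05 falls inside the tube and costs 0.
```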
Linear Support Vector Regression
The thinner the "tube", the more complex the model.
[Three plots of Expenditures vs. Age: the overfitting case (small tube area), the SVR "compromise case" (middle-sized area, good generalizability), and the underfitting case (biggest area); the points lying on the tube boundary are the "support vectors"]
Non-linear Support Vector Regression
[Scatter plot: Expenditures vs. Age]
Map the data into a higher-dimensional space, $x \mapsto \Phi(x)$, and perform linear SVR there.
Non-linear Support Vector Regression
Finding the value of a new point $x$: evaluate the fitted function via the kernel expansion
$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) \, k(x_i, x) + b$
[Plot: Expenditures vs. Age with the fitted non-linear SVR curve]
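A small sketch of this prediction rule (the function names are my own, and the RBF form with width sigma is an assumption matching the kernel used later in the talk):

```python
import numpy as np

def rbf_kernel(x1, x2, sigma=1.0):
    # k(x, x') = exp(-||x - x'||^2 / sigma), the RBF kernel used later
    return np.exp(-np.sum((np.asarray(x1) - np.asarray(x2)) ** 2) / sigma)

def svr_predict(x_new, support_vectors, dual_coefs, b, sigma=1.0):
    """f(x) = sum_i (alpha_i - alpha_i^*) k(x_i, x) + b; only the support
    vectors (points with nonzero dual coefficient) contribute to the sum."""
    return sum(c * rbf_kernel(sv, x_new, sigma)
               for c, sv in zip(dual_coefs, support_vectors)) + b
```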
Linear SVR: derivation
Given training data $\{(x_1, y_1), \dots, (x_n, y_n)\}$, find $w$ and $b$ such that $f(x) = w^\top x + b$ optimally describes the data.
[Scatter plot: Expenditures vs. Age with a linear fit and its $\varepsilon$-tube]
Linear SVR: derivation
Two objectives to balance: complexity of the function vs. the sum of errors.
[Two plots: Case I, a wide "tube" with low complexity but larger tolerated errors; Case II, a narrow "tube" with high complexity but smaller errors]
Linear SVR: derivation
The role of C: C balances the "tube" complexity against the sum of errors.
[Two plots: when C is small, the fit stays flat and tolerates errors; when C is big, errors are penalized heavily and the fit follows the data more closely]
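A hedged illustration of the role of C using scikit-learn's SVR (the synthetic data and parameter values are made up for demonstration; scikit-learn is not part of the original slides):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
age = rng.uniform(20, 70, size=(50, 1))
spend = 0.05 * age.ravel() + rng.normal(0.0, 0.5, size=50)

# Small C: the complexity term dominates, errors are tolerated -> flat fit
svr_small_c = SVR(kernel="rbf", C=0.01, epsilon=0.1).fit(age, spend)
# Big C: errors are penalized heavily -> the fit chases the data points
svr_big_c = SVR(kernel="rbf", C=100.0, epsilon=0.1).fit(age, spend)
```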
Linear SVR: derivation
Minimize $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)$
Subject to:
$y_i - w^\top x_i - b \le \varepsilon + \xi_i$
$w^\top x_i + b - y_i \le \varepsilon + \xi_i^*$
$\xi_i, \xi_i^* \ge 0, \quad i = 1, \dots, n$
[Plot: the $\varepsilon$-tube with slack variables $\xi_i$, $\xi_i^*$ for points outside it]
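For the linear case, this primal problem is what scikit-learn's LinearSVR solves; a minimal usage sketch (parameter values are illustrative, and X_train/y_train are assumed to exist):

```python
from sklearn.svm import LinearSVR

# epsilon sets the tube width, C the complexity/error trade-off:
# exactly the two knobs in the primal problem above.
linear_svr = LinearSVR(epsilon=0.1, C=1.0, loss="epsilon_insensitive")
# linear_svr.fit(X_train, y_train)
# linear_svr.coef_, linear_svr.intercept_   # the fitted w and b
```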
Non-linear SVR: derivation
Minimize $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)$
Subject to:
$y_i - w^\top \Phi(x_i) - b \le \varepsilon + \xi_i$
$w^\top \Phi(x_i) + b - y_i \le \varepsilon + \xi_i^*$
$\xi_i, \xi_i^* \ge 0, \quad i = 1, \dots, n$
Non-linear SVR: derivation
Form the Lagrangian $L$ of the constrained problem above. A saddle point of $L$ has to be found:
- min with respect to the primal variables $w$, $b$, $\xi_i$, $\xi_i^*$
- max with respect to the Lagrange multipliers $\alpha_i, \alpha_i^* \ge 0$
Non-linear SVR: derivation
Eliminating the primal variables yields the dual problem:
Maximize $-\frac{1}{2}\sum_{i,j}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, k(x_i, x_j) - \varepsilon \sum_i (\alpha_i + \alpha_i^*) + \sum_i y_i (\alpha_i - \alpha_i^*)$
Subject to: $\sum_i (\alpha_i - \alpha_i^*) = 0$ and $0 \le \alpha_i, \alpha_i^* \le C$,
where $k(x_i, x_j) = \Phi(x_i)^\top \Phi(x_j)$ is the kernel.
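The dual is a quadratic program, so it can be solved with an off-the-shelf QP solver. A sketch using cvxopt, assuming the standard formulation above is the one the slides derived (cvxopt and all names here are my choices, not the talk's):

```python
import numpy as np
from cvxopt import matrix, solvers

def svr_dual_fit(X, y, C=1.0, eps=0.1, sigma=1.0):
    """Solve the SVR dual QP in z = [alpha; alpha*] and return the
    coefficients (alpha_i - alpha_i^*) of the kernel expansion."""
    n = len(y)
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / sigma)
    P = np.block([[K, -K], [-K, K]])           # quadratic term
    P += 1e-8 * np.eye(2 * n)                  # small ridge for stability
    q = np.concatenate([eps - y, eps + y])     # linear term
    G = np.vstack([-np.eye(2 * n), np.eye(2 * n)])
    h = np.concatenate([np.zeros(2 * n), C * np.ones(2 * n)])  # 0 <= z <= C
    A = np.hstack([np.ones(n), -np.ones(n)]).reshape(1, -1)    # sum constraint
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h),
                     matrix(A), matrix(0.0))
    z = np.asarray(sol["x"]).ravel()
    return z[:n] - z[n:]
```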
Strengths and Weaknesses of SVR

Strengths of SVR:
- No local minima
- It scales relatively well to high-dimensional data
- The trade-off between model complexity and error can be controlled explicitly via C and epsilon
- Overfitting is avoided (for any fixed C and epsilon)
- Robustness of the results
- The "curse of dimensionality" is avoided
- "Huber (1964) demonstrated that the best cost function over the worst model over any pdf of y given x is the linear cost function. Therefore, if the pdf p(y|x) is unknown, the best cost function is the linear penalization over the errors" (Perez-Cruz et al., 2003)

Weaknesses of SVR:
- What are the best trade-off parameter C and the best epsilon?
- What is a good transformation of the original space?
Experiments and Results
The vacation problem (again). Given training data of input-output pairs $(x_i, y_i)$, where the output $y$ is "Expenditures" and the inputs $x$ are "Age", "Duration" of holiday, "Income group", "Number of children", etc. Predict $y$ on the basis of $x$.
The training set consists of 600 observations and the test set of 108 observations.
Experiments and Results
The SVR function: $f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) \, k(x_i, x) + b$
To find the unknown parameters of the SVR function, solve the dual problem above, subject to its constraints.
RBF kernel: $k(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / \sigma)$
How to choose $C$, $\varepsilon$, $\sigma$? Find $C$, $\varepsilon$, and $\sigma$ from a cross-validation procedure.
Experiments and Results
Do 5-fold cross-validation to find $C$ and $\varepsilon$ for several fixed values of $\sigma$.
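One way to run this search with scikit-learn (my sketch, not the talk's setup; note that scikit-learn parameterizes the RBF kernel as exp(-gamma * ||x - x'||^2), so gamma corresponds to 1/sigma, and all grid values are illustrative):

```python
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

param_grid = {
    "C": [0.1, 1, 10, 100],
    "epsilon": [0.01, 0.1, 0.5],
    "gamma": [1 / 0.15, 1 / 0.45],  # a few fixed kernel widths (sigma)
}
search = GridSearchCV(SVR(kernel="rbf"), param_grid,
                      scoring="neg_mean_squared_error", cv=5)
# search.fit(X_train, y_train); search.best_params_
```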
Experiments and Results
The effect of changes in $\sigma$: as it increases, the functional relationship gets flatter in the higher-dimensional space, but also in the original space.
[Two fitted curves: $\sigma = 0.45$ and $\sigma = 0.15$]
Experiments and Results
Performance on the test set:
[Two plots, with MSE = 0.04 and MSE = 0.23]
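For reference, the test-set MSE could be computed like this (a sketch; X_test, y_test, and the fitted search object from the snippet above are assumed):

```python
from sklearn.metrics import mean_squared_error

# y_pred = search.best_estimator_.predict(X_test)
# mse = mean_squared_error(y_test, y_pred)
```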
Conclusion
- Support Vector Regression (SVR) can be applied to marketing problems
- SVR behaves robustly in multivariate problems
- Further research in various Marketing areas is needed to justify or refute the applicability of SVR
- Interpretability of the results can be facilitated by reporting confidence intervals for the estimates