
1 Mining customer ratings for product recommendation using the support vector machine and the latent class model. William K. Cheung, James T. Kwok, Martin H. Law, Kwok-Ching Tsui. Intelligent Systems Research Group, BT Laboratories; Hong Kong Baptist University.

2 What is a Recommender System? [Diagram: a recommender system that draws on records of other customers (possibly with ratings).]

3 Product Recommendation in E-commerce [Screenshot: product recommendations on www.amazon.com]

4 Product Recommendation in E-commerce [Screenshot: product recommendations on www.cdnow.com]

5 Overview [Diagram] Content-based recommender system: uses a personal profile; addressed here with the Support Vector Machine (SVM). Collaborative recommender system: uses records of other customers (possibly with ratings); addressed here with the Extended Latent Class Model (ELCM).

6 Presentation Outline Content-based Recommendation –Existing Solutions and Their Limitations –Our Proposed Solution: the SVM Collaborative Recommendation –Existing Solutions and Their Limitations –Our Proposed Solution: the Extended LCM Experimental Evaluation Conclusion and Future Work

7 Content-based Recommendation Matching between the personal profile and the features extracted from product descriptions. Assumptions: –Customer personal profiles are available. –Detailed product descriptions are available so that a set of representative features can be extracted. –Both the profiles and the product descriptions share the same representation. [Diagram: content-based recommender system driven by a personal profile.]

8 Some Existing Solutions Keyword Matching –problems of synonymy and polysemy. Pattern Classification Approaches –f(y) = {f_1(y), f_2(y), ..., f_m(y)}: the set of features for product y –a_x(f(y)): the classifier output predicting customer x's interest in product y, obtained via training –Examples of classifiers: Naïve Bayes, k-NN, C4.5 (decision tree)
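
To make the pattern-classification view concrete, here is a minimal Python sketch of a per-customer classifier a_x(f(y)) using one of the classifiers named on the slide (Naïve Bayes). The toy data, feature encoding and variable names are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of the per-customer classifier a_x(f(y)).
# Assumes products are already encoded as feature vectors f(y) and that we
# hold binary interest labels for the products customer x has rated.
# Names (product_features, interest_labels) are illustrative, not from the paper.
import numpy as np
from sklearn.naive_bayes import BernoulliNB  # one of the classifiers named on the slide

# Toy data: 6 products, 5 binary features each (e.g. presence of a keyword/genre).
product_features = np.array([
    [1, 0, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 1, 1],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
])
interest_labels = np.array([1, 1, 0, 0, 1, 0])  # customer x's past ratings, binarised

# Train a_x: a classifier of customer x's interest given product features.
a_x = BernoulliNB().fit(product_features, interest_labels)

# Score an unseen product y by its feature vector f(y).
f_y = np.array([[1, 1, 0, 0, 0]])
print(a_x.predict_proba(f_y)[:, 1])  # estimated probability that x is interested in y
```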

9 Feature Selection Problem The performance of content-based recommendation depends heavily on the discriminative power of the features selected to be extracted. –Too few features => hard to learn useful profiles (shallow analysis) –Too many features => hard to estimate the classifier's parameters with good generalisation performance.

10 Our Proposed Solution - the use of SVM The Support Vector Machine has been shown to achieve good generalisation performance when classifying high-dimensional data sets, and its training can be framed as solving a quadratic programming problem. => one can simply use all extracted features as the input, with no need for feature selection at all.
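
A minimal sketch of this idea, assuming binary like/dislike labels and a synthetic sparse feature matrix with the same dimensionality mentioned in the experiments (6620 features). The data and the scikit-learn LinearSVC choice are illustrative, not the authors' exact setup.

```python
# Sketch: a linear SVM profile trained on ALL extracted product features
# (no feature selection). The feature matrix and labels are synthetic stand-ins.
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_products, n_features = 1200, 6620          # dimensions in the spirit of the experiment
X = sparse_random(n_products, n_features, density=0.01, random_state=0, format="csr")
y = rng.integers(0, 2, size=n_products)      # customer x's like/dislike labels (synthetic)

# Training reduces to a convex quadratic-programming-style problem, so the
# high dimensionality is handled without pruning features first.
svm_profile = LinearSVC(C=1.0).fit(X, y)
print(svm_profile.decision_function(X[:3]))  # signed distances to the separating hyperplane
```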

11 Pattern Classification...

12 Which line is the best? (Training and Generalization)

13 Support Vector Machine (SVM) Intuitively, maximize the margin between classes. Theoretically sound –related to minimizing the VC-dimension under the theory of structural risk minimization. [Figure: the separating line and the margin between the two classes.]

14 Solving for the line Computationally, this leads to a quadratic programming problem –maximize a quadratic objective function subject to some linear constraints –no local maximum (cf neural networks)
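
For reference, the standard hard-margin dual takes the following form (a textbook formulation, not copied from the slides), maximizing over the Lagrange multipliers α_i with labels y_i ∈ {−1, +1}:

```latex
\max_{\boldsymbol{\alpha}} \;\; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
      \alpha_i \alpha_j \, y_i y_j \, \mathbf{x}_i^{T} \mathbf{x}_j
\qquad \text{subject to} \qquad
\alpha_i \ge 0, \quad \sum_{i=1}^{n} \alpha_i y_i = 0
```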

15 Support Vectors The line depends only on a small number of training examples.

16 Nonlinear Cases Use another coordinate system so that the curve becomes a line.

17 Kernels Only inner products, φ(x)^T φ(y), are involved in the calculation. Under certain conditions, there exists a kernel K such that K(x, y) = φ(x)^T φ(y) –e.g. polynomial of degree d: K(x, y) = (x^T y + 1)^d. Replace x^T y in the linear formulation by φ(x)^T φ(y), i.e. by K(x, y).
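
A small numerical check of this kernel trick for the degree-2 polynomial kernel; the explicit feature map phi below is only valid for 2-D inputs and is included purely as an illustration.

```python
# Sketch: the kernel trick for the degree-2 polynomial kernel K(x, y) = (x^T y + 1)^2.
# For 2-D inputs the implicit feature map phi can be written out explicitly,
# so we can check numerically that K(x, y) equals phi(x)^T phi(y).
import numpy as np

def poly_kernel(x, y, d=2):
    """Polynomial kernel of degree d: K(x, y) = (x^T y + 1)^d."""
    return (np.dot(x, y) + 1.0) ** d

def phi(x):
    """Explicit feature map for the degree-2 kernel on 2-D inputs."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

x = np.array([0.5, -1.0])
y = np.array([2.0, 0.3])
print(poly_kernel(x, y))       # kernel evaluated directly on the inputs
print(np.dot(phi(x), phi(y)))  # same value via the explicit feature map
```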

18 Overlapping Cases Impossible to perfectly separate the two classes –Include an error term. Instead of maximizing the margin, minimize error + 1/margin. Again, this involves only quadratic programming.
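
In the standard soft-margin formulation (again a textbook form rather than a quote from the slides), the slack variables ξ_i are the error term and ½||w||² plays the role of 1/margin:

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\;
  \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\qquad \text{subject to} \qquad
y_i\,(\mathbf{w}^{T}\mathbf{x}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0
```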

19 Collaborative Recommendation Matching the customer's ratings against the ratings of others (the word-of-mouth approach). Assumptions: –Ratings from a reasonably large group of customers are available. –Each product has been rated by some of the customers. –The product ratings overlap to a certain degree. [Diagram: collaborative recommender system drawing on product ratings and records of other customers (possibly with ratings).]

20 Some Existing Solutions Memory-based Approach –Pearson Correlation Coefficient and its variants –suffer from the sparsity and the first-rater problems. Model-based Approach –solves the sparsity problem by incorporating a priori models –e.g., Naïve Bayes Classifier, Bayesian Network, Latent Class Model
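
The memory-based similarity referred to above is the standard Pearson correlation between two customers a and u over the products they have both rated (standard form; the symbols are not taken from the paper):

```latex
w(a, u) =
  \frac{\sum_{i \in I_a \cap I_u} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}
       {\sqrt{\sum_{i \in I_a \cap I_u} (r_{a,i} - \bar{r}_a)^{2}}\;
        \sqrt{\sum_{i \in I_a \cap I_u} (r_{u,i} - \bar{r}_u)^{2}}}
```

where I_a ∩ I_u is the set of products rated by both customers and r̄ denotes a customer's mean rating.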

21 Limitations The sparsity problem (lacking sufficient ratings). The first-rater problem (encountering new products). [Figure: a sparse ratings matrix for customers x_1, x_2, x_3 and a new customer x_n, with most entries missing.]

22 Grouping Preference Ratings - to solve the sparsity problem. [Figure: the sparse ratings are clustered into Preference Pattern #1 and Preference Pattern #2; products liked within the new customer x_n's matching pattern are recommended.]

23 Integrating Product Contents - to solve the first-rater problem. [Figure: product contents are integrated with the preference patterns so that a product with no ratings can still be recommended.]

24 Our Proposed Solution - the use of LCM The latent class model was proposed by Thomas Hofmann et al. (IJCAI'99) for clustering preference ratings, with promising results. Limitation: only capable of recommending products to customers in the training set. We extend their model so that –a) existing products can be recommended to customers not in the training set –b) new products can be recommended to existing customers (not described in the paper).

25 Latent Class Model [Diagram: customer X and product Y are observed; preference pattern Z is hidden.] Model training: learn P(z), P(x|z) and P(y|z) using the EM algorithm. The model is initialised by K-means clustering.
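
The model factorises the joint probability of a (customer, product) pair through the hidden pattern z, and the E-step of EM computes the posterior over z (standard aspect-model equations, stated here for reference):

```latex
P(x, y) = \sum_{z} P(z)\, P(x \mid z)\, P(y \mid z),
\qquad
P(z \mid x, y) = \frac{P(z)\, P(x \mid z)\, P(y \mid z)}
                      {\sum_{z'} P(z')\, P(x \mid z')\, P(y \mid z')}
```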

26 Existing Products to Existing Customers Compute the probability that x is interested in y. Products can then be sorted according to the values of P(y|x) for recommendation.
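
Under the model's conditional-independence assumption this probability follows directly from the trained quantities:

```latex
P(y \mid x) = \sum_{z} P(y \mid z)\, P(z \mid x),
\qquad
P(z \mid x) = \frac{P(z)\, P(x \mid z)}{\sum_{z'} P(z')\, P(x \mid z')}
```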

27 Extension 1: Existing Products to New Customers x_n is not in the training set, so we don't have P(z|x_n). We estimate it from the inner product of the pdf of pattern z and the ratings of x_n.
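
One way to read the "inner product" description, with r_{x_n}(y) denoting the rating x_n gave product y (this rendering is an interpretation, not an equation quoted from the paper):

```latex
P(z \mid x_n) \;\propto\; \sum_{y} P(y \mid z)\, r_{x_n}(y)
```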

28 Extension 2: New Products to Existing Customers y_n is not in the training set, so we don't have P(y_n|z). We estimate it from the distance between y_n and z in the feature space.

29 Performance Measures accuracy: the percentage of correct recommendations recall: the percentage of interesting products that can be located in the output list precision: the percentage of products in the output list which are really interesting to the customer. break-even point: The point where recall = precision expected utility: –its value is high if the products rated high appear early in the output list.
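
A small Python sketch of precision and recall for a ranked recommendation list cut off at the top n; the item names are made up, and the n at which the two values meet approximates the break-even point.

```python
# Hedged sketch of two of the measures above for a ranked recommendation list.
# 'relevant' is the set of products the customer actually finds interesting;
# 'ranked_list' is the recommender's output, best first. Names are illustrative.

def precision_recall_at_n(ranked_list, relevant, n):
    """Precision and recall when only the top-n recommendations are shown."""
    top_n = ranked_list[:n]
    hits = sum(1 for item in top_n if item in relevant)
    precision = hits / n
    recall = hits / len(relevant)
    return precision, recall

ranked_list = ["m3", "m7", "m1", "m9", "m4", "m2"]
relevant = {"m1", "m4", "m8"}
for n in (2, 4, 6):
    p, r = precision_recall_at_n(ranked_list, relevant, n)
    print(n, round(p, 2), round(r, 2))  # where p == r approximates the break-even point
```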

30 Experiment One: Setup (content-based by SVM) Product ratings data set –EachMovie (from DEC) Product description data set –Internet Movie Database (http://www.imdb.com) –Size of feature set = 6620, including Release date, Runtime, Language, Director, Producer, Original music, Writing credit,... No. of products = 1628 –5-fold cross-validation –~1200 for training and the rest for testing No. of customers = 100

31 Experiment One: Results (content-based by SVM)

32 Experiment Two: Setup (collaborative by ELCM) Ratings data set –EachMovie (from DEC) Training –No. of products = 500 –No. of customers = 90 Testing –No. of customers = 10 –No. of products = 250 –Size of the product set where ratings are considered for matching, L = {10, 63, 83, 125, 250}

33 Experiment Two: Results (collaborative by ELCM)

34 Conclusion and Future Work SVM and ELCM are empirically shown to be promising for content-based recommendation and collaborative recommendation, respectively. Future work –ELCM model enhancement: BiELCM, hierarchical,...; scalability of the EM algorithm for ELCM; modelling dynamic preference patterns; applications to cross-selling? –Integration of SVM and ELCM for further improvement

