Learning user preferences for 2CP-regression for a recommender system. Alan Eckhardt, Peter Vojtáš, Department of Software Engineering, Charles University.

Presentation transcript:

Learning user preferences for 2CP-regression for a recommender system. Alan Eckhardt, Peter Vojtáš, Department of Software Engineering, Charles University in Prague, Czech Republic.

SOFSEM 2010, Špindlerův Mlýn, Czech Republic. Outline: Motivation; User model; Peak and 2CP; Experiments; Conclusion and future work.

User preference learning. The goal is to help the user find what she is looking for, e.g. notebooks, while requiring only a small amount of information from her (such as ratings of a few notebooks). From these ratings a general user preference model is constructed; each user has his or her own preference model. The system then recommends the top-k notebooks that the preference model has chosen as the most preferred for that user.

User preference learning: the recommendation process. The user first rates an initial set of objects (the centers of clusters of objects), a user model is constructed from these ratings, and a recommendation is produced. More iterations are possible; in each iteration the user model is refined.

Two-step user model. User model learning is divided into two steps: (1) local preferences - normalization of the attribute values of notebooks to their preference degrees, which transforms the attribute space into [0,1]^N; (2) global preferences - aggregation of the preference degrees of the attribute values into the predicted rating. A sketch of this two-step pipeline is given below.
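A minimal sketch of the two-step pipeline, assuming hypothetical attributes (price, RAM), normalization bounds, and aggregation weights; it is an illustration of the structure, not the authors' exact model:

```python
# Step 1 (local preferences): map each attribute value to a preference degree in [0, 1].
# Step 2 (global preferences): aggregate the degrees into one predicted rating.

WEIGHTS = {"price": 0.6, "ram": 0.4}  # hypothetical learned global-preference weights

def normalize_price(price, worst=3000.0, best=500.0):
    """Linear local preference: cheaper notebooks are preferred (illustrative only)."""
    degree = (worst - price) / (worst - best)
    return min(1.0, max(0.0, degree))

def normalize_ram(ram_gb, worst=1.0, best=8.0):
    """Linear local preference: more RAM is preferred (illustrative only)."""
    degree = (ram_gb - worst) / (best - worst)
    return min(1.0, max(0.0, degree))

def predict_rating(notebook):
    """Weighted-average aggregation of the local preference degrees."""
    degrees = {
        "price": normalize_price(notebook["price"]),
        "ram": normalize_ram(notebook["ram"]),
    }
    return sum(WEIGHTS[a] * degrees[a] for a in degrees)

print(predict_rating({"price": 1200.0, "ram": 4.0}))
```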

User model: fuzzy sets. The local preferences are fuzzy sets that normalize the attribute space into the monotone space [0,1]^N, in which (1,…,1) is the best possible object. In this space the Pareto front can be defined: the set of mutually incomparable objects, which are the candidates for the best object.
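A short sketch of computing the Pareto front for objects already normalized into [0,1]^N (hypothetical data):

```python
# An object is on the Pareto front if no other object dominates it, i.e. is >= in
# every preference degree and strictly > in at least one.

def dominates(a, b):
    """True if object a dominates object b in the normalized preference space."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(objects):
    """Return the mutually incomparable objects (candidates for the best object)."""
    return [o for o in objects
            if not any(dominates(other, o) for other in objects if other is not o)]

normalized = [(0.9, 0.2), (0.6, 0.7), (0.5, 0.5), (0.2, 0.9)]
print(pareto_front(normalized))  # (0.5, 0.5) is dominated by (0.6, 0.7) and is excluded
```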

User model: aggregation. The aggregation function resolves which object on the Pareto front is the best; note that the second best object may not lie on the Pareto front at all. Two aggregation methods are used - Statistical and Instances.
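A minimal sketch, with hypothetical objects and weights, showing that the winner of an aggregation lies on the Pareto front while the runner-up need not:

```python
# A simple weighted-average aggregation ranks normalized objects; the best object is
# on the Pareto front, but the second best here is dominated by the winner.

def aggregate(degrees, weights=(0.8, 0.2)):
    """Weighted average of local preference degrees."""
    return sum(w * d for w, d in zip(weights, degrees))

objects = [(0.9, 0.10), (0.85, 0.60), (0.82, 0.55), (0.10, 0.95)]
ranked = sorted(objects, key=aggregate, reverse=True)
print("best:", ranked[0])      # (0.85, 0.60) - on the Pareto front
print("2nd best:", ranked[1])  # (0.82, 0.55) - dominated by (0.85, 0.60), not on the front
```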

Normalization of numerical attributes. Linear regression captures preference for the smallest or the largest value of an attribute. Quadratic regression can detect an ideal value inside the attribute's range, but it often fails in experiments.
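A sketch of both normalizations on hypothetical training data (price vs. rating), using ordinary least-squares polynomial fits:

```python
# Linear fit: monotone preference (the smallest or the largest value is best).
# Quadratic fit: can place an ideal value somewhere inside the range.
import numpy as np

prices = np.array([600.0, 900.0, 1200.0, 1800.0, 2500.0])
ratings = np.array([0.95, 0.80, 0.60, 0.35, 0.10])

linear = np.poly1d(np.polyfit(prices, ratings, 1))
quadratic = np.poly1d(np.polyfit(prices, ratings, 2))

print(linear(1000.0), quadratic(1000.0))  # predicted preference degrees for a 1000$ notebook
```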

2CP regression. 2CP regression captures a preference dependence between attributes. This is not a dependence in the dataset itself (such as the display resolution influencing the price); it is the influence of the value of attribute A1 on the preference over attribute A2. For example, the producer of a notebook (IBM) influences the preference over the price of the notebook (for IBM, the ideal price is 2200$).
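A minimal sketch of the idea behind 2CP-regression: the training set is split by the value of the conditioning attribute (producer) and a separate preference regression over the dependent attribute (price) is fitted for each value. The data and the choice of a quadratic fit are assumptions for illustration, not the authors' exact procedure:

```python
import numpy as np
from collections import defaultdict

# Hypothetical training triples: (producer, price, rating).
train = [("IBM", 1800.0, 0.6), ("IBM", 2200.0, 0.9), ("IBM", 2600.0, 0.7),
         ("ACER", 600.0, 0.9), ("ACER", 1200.0, 0.6), ("ACER", 1800.0, 0.2)]

by_producer = defaultdict(list)
for producer, price, rating in train:
    by_producer[producer].append((price, rating))

models = {}
for producer, pairs in by_producer.items():
    prices, ratings = zip(*pairs)
    # Quadratic fit so that an "ideal price" inside the range can be detected per producer.
    models[producer] = np.poly1d(np.polyfit(prices, ratings, 2))

print(models["IBM"](2200.0), models["ACER"](2200.0))  # same price, different preference
```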

Peak. Motivation: a user often prefers one particular value of an attribute. The peak value is found by traversing the training set (which is small) and testing the error of linear regressions fitted on both sides of each candidate peak. With Peak we know exactly which value is the most preferred, which is also useful for visual representation.
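A sketch of the Peak search under the assumption that the side error is the sum of squared residuals of the two linear fits; the exact error measure used by the authors is not specified here:

```python
# Try every training value as the candidate peak, fit a linear regression on each side,
# and keep the candidate with the smallest combined squared error.
import numpy as np

def side_error(xs, ys):
    """Sum of squared residuals of a linear fit; zero if the side has fewer than 2 points."""
    if len(xs) < 2:
        return 0.0
    coeffs = np.polyfit(xs, ys, 1)
    return float(np.sum((np.polyval(coeffs, xs) - ys) ** 2))

def find_peak(values, ratings):
    """Return the attribute value that best splits the data into two linear pieces."""
    best_value, best_error = None, float("inf")
    for peak in values:
        left = [(x, y) for x, y in zip(values, ratings) if x <= peak]
        right = [(x, y) for x, y in zip(values, ratings) if x >= peak]
        err = side_error(*zip(*left)) + side_error(*zip(*right))
        if err < best_error:
            best_value, best_error = peak, err
    return best_value

prices = [600.0, 1200.0, 1800.0, 2200.0, 2600.0, 3000.0]
ratings = [0.2, 0.5, 0.8, 0.95, 0.7, 0.4]
print(find_peak(prices, ratings))  # 2200.0 - the most preferred price
```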

2CP regression + Peak. Example: the dependence of the price preference on the value of the manufacturer - ACER => high price, ASUS => lower price.

Experiment settings. Dataset of 200 notebooks with artificial user preferences; the preference of price was dependent on the value of the producer. Training sets of sizes 2-60 were used, with the rest of the dataset serving as the testing set. Error measures: RMSE and the Kendall τ rank correlation coefficient.
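Both error measures can be sketched in a few lines (the Kendall τ here is the simple tau-a variant, ignoring ties):

```python
# RMSE between predicted and true ratings, and Kendall tau rank correlation (tau-a).
import math
from itertools import combinations

def rmse(predicted, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def kendall_tau(predicted, actual):
    concordant = discordant = 0
    for (p1, a1), (p2, a2) in combinations(zip(predicted, actual), 2):
        s = (p1 - p2) * (a1 - a2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(actual)
    return (concordant - discordant) / (n * (n - 1) / 2)

predicted = [0.9, 0.4, 0.7, 0.2]
actual = [0.8, 0.5, 0.6, 0.1]
print(rmse(predicted, actual), kendall_tau(predicted, actual))
```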

Experiment settings: tested methods. Support Vector Machines from Weka; Mean - returns the average rating from the training set; Instances - classification, uses objects from the training set as boundaries on the rating; Statistical - weighted average with learned weights; 2CP. Both Instances and Statistical can use a local preference normalization - Linear, Quadratic, or Peak. 2CP serves to find the relation between the preference of one attribute's values and the value of another attribute.

Experiment results (three slides of result figures).

Conclusion. Proposal of the Peak method and its combination with 2CP; experimental evaluation with very good results when using the rank correlation measure.

Future work. nCP-regression; clustering of similar values for better robustness; determining the degree of relation between two attributes.