G54DMT – Data Mining Techniques and Applications Dr. Jaume Bacardit


G54DMT – Data Mining Techniques and Applications Dr. Jaume Bacardit Topic 4: Applications Lecture 1: The Netflix Challenge Some material taken from external sources

Outline The challenge and its assessment Timeline of progress Recommendation methods Matrix Factorisation techniques Ensemble methods Lessons learnt Resources

The Netflix Challenge Netflix is an online video rental company One of its most relevant components is its movie recommendation system – Suggest movies to users based on their past ratings In 2006 Netflix made its recommendation database public Challenged the community to produce a new recommender that was 10% better than their own method Winner would get $1M

Training data Movie ratings collected from 1998 to 2005: 100,480,507 ratings that 480,189 users gave to 17,770 movies. Training data divided in – Training set (99,072,112 ratings) – Probe set (1,408,395 ratings) Each rating was a quadruplet <user, movie, date of grade, grade> Very sparse data: the number of ratings is a very small fraction of users x movies

Test data Qualifying data were triplets <user, movie, date of grade> Qualifying set (2,817,131 ratings) consisting of: – Test set (1,408,789 ratings), used to determine winners – Quiz set (1,408,342 ratings), used to calculate leaderboard scores Participants did not know which instances were part of the test set and which were part of the quiz set Test, quiz and probe sets were created to have similar statistical properties

Assessment Error on the quiz and test set was computed as Root Mean Squared Error (RMSE), rounded to 4 digits RMSE of the Cinematch system (Netflix's own predictor) = 0.9525 – Target RMSE (10% improvement) = 0.8572 Once a participant improves the target RMSE, a “last call” period of 30 days starts At the end of the 30 days, the participant with the lowest test RMSE is declared the winner In case of ties, the prize goes to the earliest entry
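The challenge metric can be sketched in a few lines; the toy ratings below are invented for illustration.

```python
import math

def rmse(predicted, actual):
    """Root Mean Squared Error, rounded to 4 digits as in the challenge."""
    se = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return round(math.sqrt(se / len(actual)), 4)

# Five toy predictions against their true 1-5 star ratings
print(rmse([3.5, 4.0, 2.1, 5.0, 3.0], [4, 4, 2, 5, 3]))  # → 0.228
```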

Progress in the challenge Data released on October 2nd, 2006 By October 8th a participant already had a better RMSE than Cinematch The 2007 progress prize was awarded to BellKor with an improvement of 8.43% The 2008 progress prize was awarded to “BellKor in BigChaos” with an improvement of 9.44% On June 26th, 2009, the team "BellKor's Pragmatic Chaos" achieved an improvement of 10.05% and the “last call” period started

Progress: Last call period On July 25th, 2009 the team "The Ensemble", a merger of the teams "Grand Prize Team" and "Opera Solutions and Vandelay United", achieved a 10.09% improvement After the last call period ended, two teams led the quiz leaderboard: – "The Ensemble" with a 10.10% improvement – "BellKor's Pragmatic Chaos" with a 10.09% improvement On the test set both teams were tied with an improvement of 10.06% BellKor's Pragmatic Chaos was declared the winner because they had submitted their entry 20 minutes before The Ensemble

Recommender systems: Content Filtering Collect background information about users and movies to generate a profile of each of them – Users: demographic information – Movies: genre, actors, box office results Produce recommendations by matching the profiles of users and movies Costly, as this information is often difficult to collect or simply not available

Recommender systems: collaborative filtering Generate predictions of ratings based only on the past behaviour of the users No background domain knowledge required Easier to generate the models Faces a cold-start problem: difficult when not enough ratings are available

Collaborative filtering: neighbourhood methods Compute relationships between items or users Identify which movies are similar to each other, based on receiving similar ratings from the same users [Figure: hierarchical clustering showing the similarities of 5000 movies]
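A minimal item-item similarity sketch of this idea; the movie names and ratings below are invented for illustration.

```python
import math

# Toy ratings: user -> {movie: stars}. All names and values are made up.
ratings = {
    "alice": {"Heat": 5, "Se7en": 4, "Up": 1},
    "bob":   {"Heat": 4, "Se7en": 5, "Up": 2},
    "carol": {"Heat": 1, "Se7en": 2, "Up": 5},
}

def item_similarity(m1, m2):
    """Cosine similarity between two movies over users who rated both."""
    common = [u for u in ratings if m1 in ratings[u] and m2 in ratings[u]]
    if not common:
        return 0.0
    v1 = [ratings[u][m1] for u in common]
    v2 = [ratings[u][m2] for u in common]
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    return dot / (n1 * n2)

# "Heat" and "Se7en" receive similar ratings from the same users,
# so their similarity comes out higher than "Heat" vs "Up"
print(item_similarity("Heat", "Se7en") > item_similarity("Heat", "Up"))
```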

Collaborative filtering: latent factor models Automatically map users and movies into a new space of factors (same for both of them)

Matrix Factorisation methods Most successful of the latent factor methods These methods generate a vector q_i ∈ ℝ^f for each item and a vector p_u ∈ ℝ^f for each user A prediction is the inner product of both vectors: r̂_ui = q_iᵀ p_u The problem of finding the vectors q and p for each movie and user is defined as the following optimisation problem: min_{q*,p*} Σ_{(u,i)∈K} (r_ui − q_iᵀ p_u)² + λ(‖q_i‖² + ‖p_u‖²) where K is the training set of known ratings, r_ui is the actual rating, q_iᵀ p_u is the predicted rating, and the λ term is a regularisation term (to avoid overfitting)

Optimisation methods Stochastic gradient descent – Iteratively samples training examples, computes the prediction errors and adjusts the vectors of the involved user and item accordingly Alternating least squares – The original definition of the optimisation problem is not convex, and hence cannot be solved to optimality – If either p or q is fixed, the problem is convex and can be solved using least squares methods – This method alternates between fixing p and fixing q
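A minimal stochastic gradient descent sketch for the regularised objective above; the toy ratings, learning rate and λ are all invented for illustration.

```python
import random

random.seed(0)

# Toy data: (user, item, rating) triples with illustrative indices
train = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 4), (2, 2, 2)]
n_users, n_items, f = 3, 3, 2
lr, lam = 0.05, 0.02  # learning rate and regularisation, chosen arbitrarily

# Small random initialisation of the user (p) and item (q) factor vectors
p = [[random.uniform(-0.1, 0.1) for _ in range(f)] for _ in range(n_users)]
q = [[random.uniform(-0.1, 0.1) for _ in range(f)] for _ in range(n_items)]

def predict(u, i):
    return sum(p[u][k] * q[i][k] for k in range(f))

for epoch in range(200):
    random.shuffle(train)
    for u, i, r in train:
        e = r - predict(u, i)      # prediction error on this sample
        for k in range(f):         # gradient step on both involved vectors
            pu, qi = p[u][k], q[i][k]
            p[u][k] += lr * (e * qi - lam * pu)
            q[i][k] += lr * (e * pu - lam * qi)
```

After training, `predict(u, i)` reproduces the observed ratings closely; with bias terms and more factors this is essentially the model the slides describe.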

Bias in the models Not all movies receive the same distribution of ratings – Some are more popular Not all users give the same distribution of ratings – Some users are stricter than others Refinement of the model introduces bias terms: b_ui = μ + b_i + b_u, where μ is the average overall rating, b_i is the bias of item i and b_u is the bias of user u The prediction becomes r̂_ui = μ + b_i + b_u + q_iᵀ p_u
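The biased prediction rule can be sketched directly; every number below is invented for illustration.

```python
def predict_with_bias(mu, b_item, b_user, q_i, p_u):
    """Biased prediction: global mean + item bias + user bias + interaction."""
    return mu + b_item + b_user + sum(qk * pk for qk, pk in zip(q_i, p_u))

# Illustrative values: global mean 3.6, a well-liked movie (+0.5),
# a strict user (-0.3), plus a small learned factor interaction
print(predict_with_bias(3.6, 0.5, -0.3, [0.2, -0.1], [1.0, 0.5]))
```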

Additional input sources Implicit feedback – users show preference for certain movies (e.g. by choosing to rent or rate them at all), even when they do not produce explicit ratings for everything Demographic information – if available

Temporal dynamics Ratings change through time Users – May change tastes – May produce more/less strict ratings in different periods of time Movies – Blockbusters may fade in popularity – Cult movies may become more popular

Impact of all components of the model (BellKor)

Ensemble methods All top participants' methods combined (blended) the predictions of hundreds of models of many types – Matrix Factorisation – Neighbourhood methods – Restricted Boltzmann Machines Many ways of combining the models – Linear combinations – Neural networks – Regression trees

Basic linear regression method Need to optimise the vector of weights associated with each method Can use e.g. the least squares method for this, optimising over the probe set How to choose the models to include in the ensemble? – Forward method: start with one, keep adding until the probe set error degrades – Backward method: start with all, keep removing while the probe set error improves
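A two-model blend fitted by least squares over a probe set can be sketched as follows; all predictions and ratings are invented, and a real blend would use many models and a linear-algebra solver rather than hand-solved 2x2 normal equations.

```python
# Probe-set ratings and two models' predictions (all numbers made up)
probe_actual = [4, 3, 5, 2, 4]
model_a      = [3.8, 3.2, 4.6, 2.5, 3.9]
model_b      = [4.4, 2.7, 5.1, 1.8, 4.2]

# Normal equations for weights (wa, wb) minimising ||wa*A + wb*B - y||^2
aa = sum(a * a for a in model_a)
bb = sum(b * b for b in model_b)
ab = sum(a * b for a, b in zip(model_a, model_b))
ay = sum(a * y for a, y in zip(model_a, probe_actual))
by = sum(b * y for b, y in zip(model_b, probe_actual))

det = aa * bb - ab * ab
wa = (ay * bb - by * ab) / det
wb = (aa * by - ab * ay) / det

# Blended predictions: weighted combination of the two models
blend = [wa * a + wb * b for a, b in zip(model_a, model_b)]
```

By construction the blend's squared error on the probe set is no worse than either model used alone, since each single model is a special case of the weighted combination.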

Feature-Weighted Linear Stacking Method from “The Ensemble” Not all models are suitable for all kinds of movies/users Generate a set of “meta-features” for each instance that are used to calibrate the linear combination of weights specifically for each case: b(x) = Σ_i Σ_j v_ij f_j(x) g_i(x) where v_ij = weight associated with feature j for model i, f_j(x) = value of feature j for instance x, g_i(x) = prediction of model i for instance x
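The formula above can be sketched directly: each model's blend weight becomes a per-instance function of the meta-features. All weights, meta-features and predictions below are invented for illustration.

```python
def fwls_predict(v, meta_features, model_preds):
    """Feature-Weighted Linear Stacking prediction.
    v[i][j]: weight of meta-feature j for model i (learned offline);
    meta_features: f_j(x); model_preds: g_i(x)."""
    total = 0.0
    for i, g in enumerate(model_preds):
        # Per-instance weight for model i: sum_j v[i][j] * f_j(x)
        w_i = sum(v[i][j] * f for j, f in enumerate(meta_features))
        total += w_i * g
    return total

# Two models, two meta-features (f_0 = constant 1, f_1 = e.g. log of the
# user's rating count); values are arbitrary
v = [[0.6, 0.05], [0.4, -0.05]]
meta = [1.0, 3.0]
preds = [4.2, 3.8]
print(fwls_predict(v, meta, preds))  # effective weights here are 0.75 and 0.25
```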

Top 10 features (out of 25)

Lessons learnt from the challenge Well defined competition (clear rules, instant feedback on progress, forums to discuss) Great collaboration between participants, sharing ideas and combining efforts Widened the awareness of statistics and machine learning in mainstream society It provided a big challenge to the ML community, and hence new science was done

Resources Challenge web page Very nice article about Matrix Factorisation Article on Feature-Weighted Linear Stacking Progress reports of BellKor, BigChaos and PragmaticTheory Web page of “The Ensemble”