Filtering and Recommender Systems Content-based and Collaborative Some of the slides based On Mooney’s Slides.

Slides:



Advertisements
Similar presentations
Recommender Systems & Collaborative Filtering
Advertisements

Mining customer ratings for product recommendation using the support vector machine and the latent class model William K. Cheung, James T. Kwok, Martin.
Prediction Modeling for Personalization & Recommender Systems Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
The best indicator that a passenger will show up to board the flight is that she called in for a special meal Filtering and Recommender Systems Content-based.
A Graph-based Recommender System Zan Huang, Wingyan Chung, Thian-Huat Ong, Hsinchun Chen Artificial Intelligence Lab The University of Arizona 07/15/2002.
Jeff Howbert Introduction to Machine Learning Winter Collaborative Filtering Nearest Neighbor Approach.
COMP423 Intelligent Agents. Recommender systems Two approaches – Collaborative Filtering Based on feedback from other users who have rated a similar set.
Oct 14, 2014 Lirong Xia Recommender systems acknowledgment: Li Zhang, UCSC.
COLLABORATIVE FILTERING Mustafa Cavdar Neslihan Bulut.
Active Learning and Collaborative Filtering
Rubi’s Motivation for CF  Find a PhD problem  Find “real life” PhD problem  Find an interesting PhD problem  Make Money!
CS345 Data Mining Recommendation Systems Netflix Challenge Anand Rajaraman, Jeffrey D. Ullman.
Filtering and Recommender Systems Content-based and Collaborative Some of the slides based On Mooney’s Slides.
1 Collaborative Filtering and Pagerank in a Network Qiang Yang HKUST Thanks: Sonny Chee.
RECOMMENDATION SYSTEMS
Recommendations via Collaborative Filtering. Recommendations Relevant for movies, restaurants, hotels…. Recommendation Systems is a very hot topic in.
Chapter 8 Collaborative Filtering Stand
Agent Technology for e-Commerce
Customizable Bayesian Collaborative Filtering Denver Dash Big Data Reading Group 11/19/2007.
If you want to change the world, change the metaphor
Filtering and Recommender Systems Content-based and Collaborative
Recommender systems Ram Akella February 23, 2011 Lecture 6b, i290 & 280I University of California at Berkeley Silicon Valley Center/SC.
1 Introduction to Recommendation System Presented by HongBo Deng Nov 14, 2006 Refer to the PPT from Stanford: Anand Rajaraman, Jeffrey D. Ullman.
Collaborative Filtering CMSC498K Survey Paper Presented by Hyoungtae Cho.
Recommender systems Ram Akella November 26 th 2008.
Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Note to other teachers and users of these.
CONTENT-BASED BOOK RECOMMENDING USING LEARNING FOR TEXT CATEGORIZATION TRIVIKRAM BHAT UNIVERSITY OF TEXAS AT ARLINGTON DATA MINING CSE6362 BASED ON PAPER.
Collaborative Filtering & Content-Based Recommending
“ The Initiative's focus is to dramatically advance the means to collect,store,and organize information in digital forms,and make it available for searching,retrieval,and.
Combining Content-based and Collaborative Filtering Department of Computer Science and Engineering, Slovak University of Technology
Mark Levene, An Introduction to Search Engines and Web Navigation © Pearson Education Limited 2005 Slide 9.1 Chapter 9 : Social Networks What is a social.
Distributed Networks & Systems Lab. Introduction Collaborative filtering Characteristics and challenges Memory-based CF Model-based CF Hybrid CF Recent.
1 Information Filtering & Recommender Systems (Lecture for CS410 Text Info Systems) ChengXiang Zhai Department of Computer Science University of Illinois,
1 Applying Collaborative Filtering Techniques to Movie Search for Better Ranking and Browsing Seung-Taek Park and David M. Pennock (ACM SIGKDD 2007)
Improving Web Spam Classification using Rank-time Features September 25, 2008 TaeSeob,Yun KAIST DATABASE & MULTIMEDIA LAB.
EMIS 8381 – Spring Netflix and Your Next Movie Night Nonlinear Programming Ron Andrews EMIS 8381.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
Chengjie Sun,Lei Lin, Yuan Chen, Bingquan Liu Harbin Institute of Technology School of Computer Science and Technology 1 19/11/ :09 PM.
Toward the Next generation of Recommender systems
1 Recommender Systems Collaborative Filtering & Content-Based Recommending.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Extracting meaningful labels for WEBSOM text archives Advisor.
EigenRank: A Ranking-Oriented Approach to Collaborative Filtering IDS Lab. Seminar Spring 2009 강 민 석강 민 석 May 21 st, 2009 Nathan.
Collaborative Filtering  Introduction  Search or Content based Method  User-Based Collaborative Filtering  Item-to-Item Collaborative Filtering  Using.
Graph-based Text Classification: Learn from Your Neighbors Ralitsa Angelova , Gerhard Weikum : Max Planck Institute for Informatics Stuhlsatzenhausweg.
Objectives Objectives Recommendz: A Multi-feature Recommendation System Matthew Garden, Gregory Dudek, Center for Intelligent Machines, McGill University.
A Content-Based Approach to Collaborative Filtering Brandon Douthit-Wood CS 470 – Final Presentation.
1 Collaborative Filtering & Content-Based Recommending CS 290N. T. Yang Slides based on R. Mooney at UT Austin.
Data Mining: Knowledge Discovery in Databases Peter van der Putten ALP Group, LIACS Pre-University College LAPP-Top Computer Science February 2005.
Recommender Systems. Recommender Systems (RSs) n RSs are software tools providing suggestions for items to be of use to users, such as what items to buy,
CS425: Algorithms for Web Scale Data Most of the slides are from the Mining of Massive Datasets book. These slides have been modified for CS425. The original.
Pairwise Preference Regression for Cold-start Recommendation Speaker: Yuanshuai Sun
Iterative similarity based adaptation technique for Cross Domain text classification Under: Prof. Amitabha Mukherjee By: Narendra Roy Roll no: Group:
Recommender Systems with Social Regularization Hao Ma, Dengyong Zhou, Chao Liu Microsoft Research Michael R. Lyu The Chinese University of Hong Kong Irwin.
Personalization Services in CADAL Zhang yin Zhuang Yuting Wu Jiangqin College of Computer Science, Zhejiang University November 19,2006.
User Modeling and Recommender Systems: recommendation algorithms
Item-Based Collaborative Filtering Recommendation Algorithms Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl GroupLens Research Group/ Army.
Presented By: Madiha Saleem Sunniya Rizvi.  Collaborative filtering is a technique used by recommender systems to combine different users' opinions and.
Collaborative Filtering - Pooja Hegde. The Problem : OVERLOAD Too much stuff!!!! Too many books! Too many journals! Too many movies! Too much content!
Analysis of massive data sets Prof. dr. sc. Siniša Srbljić Doc. dr. sc. Dejan Škvorc Doc. dr. sc. Ante Đerek Faculty of Electrical Engineering and Computing.
Chapter 14 – Association Rules and Collaborative Filtering © Galit Shmueli and Peter Bruce 2016 Data Mining for Business Analytics (3rd ed.) Shmueli, Bruce.
Hao Ma, Dengyong Zhou, Chao Liu Microsoft Research Michael R. Lyu
COMP423 Intelligent Agents. Recommender systems Two approaches – Collaborative Filtering Based on feedback from other users who have rated a similar set.
Announcements Paper presentation Project meet with me ASAP
Data Mining: Concepts and Techniques
Recommender Systems & Collaborative Filtering
CS728 The Collaboration Graph
Collaborative Filtering Nearest Neighbor Approach
Advanced Artificial Intelligence
Feature Selection Methods
Presentation transcript:

Filtering and Recommender Systems Content-based and Collaborative Some of the slides based On Mooney’s Slides

Feature selection & LSI Both MI and LSI are dimensionality reduction techniques MI is looking to reduce dimensions by looking at a subset of the original dimensions –LSI looks instead at a linear combination of the subset of the original dimensions (Good: Can automatically capture sets of dimensions that are more predictive. Bad: the new features may not have any significance to the user) MI does feature selection w.r.t. a classification task (MI is being computed between a feature and a class) –LSI does dimensionality reduction independent of the classes (just looks at data variance) –..where as MI needs to increase variance across classes and reduce variance within class Doing this is called LDA (linear discriminant analysis) LSI is a special case of LDA where each point defines its own class Digression

Personalization Recommenders are instances of personalization software. Personalization concerns adapting to the individual needs, interests, and preferences of each user. Includes: –Recommending –Filtering –Predicting (e.g. form or calendar appt. completion) From a business perspective, it is viewed as part of Customer Relationship Management (CRM).

Feedback & Prediction/Recommendation Traditional IR has a single user—probably working in single-shot modes –Relevance feedback… WEB search engines have: –Working continually User profiling –Profile is a “model” of the user (and also Relevance feedback) –Many users Collaborative filtering –Propagate user preferences to other users… You know this one

Recommender Systems in Use Systems for recommending items (e.g. books, movies, CD’s, web pages, newsgroup messages) to users based on examples of their preferences. Many on-line stores provide recommendations (e.g. Amazon, CDNow). Recommenders have been shown to substantially increase sales at on-line stores.

Feedback Detection –Click certain pages in certain order while ignore most pages. –Read some clicked pages longer than some other clicked pages. –Save/print certain clicked pages. –Follow some links in clicked pages to reach more pages. –Buy items/Put them in wish-lists/Shopping Carts –Explicitly ask users to rate items/pages Non-Intrusive Intrusive

Content-based vs. Collaborative Recommendation

Collaborative Filtering A 9 B 3 C : Z 5 A B C 9 : Z 10 A 5 B 3 C : Z 7 A B C 8 : Z A 6 B 4 C : Z A 10 B 4 C 8. Z 1 User Database Active User Correlation Match A 9 B 3 C. Z 5 A 9 B 3 C : Z 5 A 10 B 4 C 8. Z 1 Extract Recommendations C Correlation analysis Here is similar to the Association clusters Analysis!

Item-User Matrix The input to the collaborative filtering algorithm is an mxn matrix where rows are items and columns are users –Sort of like term-document matrix (items are terms and documents are users) Can think of items as vectors in the space of users (or users as vectors in the space of items) –Can do scalar clusters etc..

Collaborative Filtering Method Weight all users with respect to similarity with the active user. Select a subset of the users (neighbors) to use as predictors. Normalize ratings and compute a prediction from a weighted combination of the selected neighbors’ ratings. Present items with highest predicted ratings as recommendations.

Similarity Weighting Typically use Pearson correlation coefficient between ratings for active user, a, and another user, u. r a and r u are the ratings vectors for the m items rated by both a and u r i,j is user i’s rating for item j

Neighbor Selection For a given active user, a, select correlated users to serve as source of predictions. Standard approach is to use the most similar n users, u, based on similarity weights, w a,u Alternate approach is to include all users whose similarity weight is above a given threshold.

Rating Prediction Predict a rating, p a,i, for each item i, for active user, a, by using the n selected neighbor users, u  {1,2,…n}. To account for users different ratings levels, base predictions on differences from a user’s average rating. Weight users’ ratings contribution by their similarity to the active user. ri,j is user i’s rating for item j

Covariance and Standard Deviation Covariance: Standard Deviation:

Significance Weighting Important not to trust correlations based on very few co-rated items. Include significance weights, s a,u, based on number of co-rated items, m.

Problems with Collaborative Filtering Cold Start: There needs to be enough other users already in the system to find a match. Sparsity: If there are many items to be recommended, even if there are many users, the user/ratings matrix is sparse, and it is hard to find users that have rated the same items. First Rater: Cannot recommend an item that has not been previously rated. –New items –Esoteric items Popularity Bias: Cannot recommend items to someone with unique tastes. – Tends to recommend popular items. WHAT DO YOU MEAN YOU DON’T CARE FOR BRITNEY SPEARS YOU DUNDERHEAD? #$%$%$&^

Content-Based Recommending Recommendations are based on information on the content of items rather than on other users’ opinions. Uses machine learning algorithms to induce a profile of the users preferences from examples based on a featural description of content. Lots of systems

Advantages of Content-Based Approach No need for data on other users. –No cold-start or sparsity problems. Able to recommend to users with unique tastes. Able to recommend new and unpopular items – No first-rater problem. Can provide explanations of recommended items by listing content-features that caused an item to be recommended. Well-known technology The entire field of Classification Learning is at (y)our disposal!

Disadvantages of Content-Based Method Requires content that can be encoded as meaningful features. Users’ tastes must be represented as a learnable function of these content features. Unable to exploit quality judgments of other users. –Unless these are somehow included in the content features.

Movie Domain EachMovie Dataset [Compaq Research Labs] –Contains user ratings for movies on a 0–5 scale. –72,916 users (avg. 39 ratings each). –1,628 movies. –Sparse user-ratings matrix – (2.6% full). Crawled Internet Movie Database (IMDb) –Extracted content for titles in EachMovie. Basic movie information: –Title, Director, Cast, Genre, etc. Popular opinions: –User comments, Newspaper and Newsgroup reviews, etc.

Content-Boosted Collaborative Filtering IMDb EachMovie Web Crawler Movie Content Database Full User Ratings Matrix Collaborative Filtering Active User Ratings Matrix (Sparse) Content-based Predictor Recommendations

Content-Boosted CF - I Content-Based Predictor Training Examples Pseudo User-ratings Vector Items with Predicted Ratings User-ratings Vector User-rated Items Unrated Items

Content-Boosted CF - II Compute pseudo user ratings matrix –Full matrix – approximates actual full user ratings matrix Perform CF –Using Pearson corr. between pseudo user-rating vectors User Ratings Matrix Pseudo User Ratings Matrix Content-Based Predictor

Why can’t the pseudo ratings be used to help content-based filtering? How about using the pseudo ratings to improve a content-based filter itself? –Learn a NBC classifier C 0 using the few items for which we have user ratings –Use C 0 to predict the ratings for the rest of the items –Loop Learn a new classifier C 1 using all the ratings (real and predicted) Use C 1 to (re)-predict the ratings for all the unknown items –Until no change in ratings With a small change, this actually works in finding a better classifier! –Change: Keep the class posterior prediction (rather than just the max class) This is called expectation maximization –Very useful on web where you have tons of data, but very little of it is labelled –Reminds you of K-means, doesn’t it?

(boosted) content filtering

Co-Training Motivation Learning methods need labeled data –Lots of pairs –Hard to get… (who wants to label data?) But unlabeled data is usually plentiful… –Could we use this instead??????

Co-training Suppose each instance has two parts: x = [x1, x2] x1, x2 conditionally independent given f(x) Suppose each half can be used to classify instance  f1, f2 such that f1(x1) = f2(x2) = f(x) Suppose f1, f2 are learnable f1  H1, f2  H2,  learning algorithms A1, A2 Unlabeled Instances [x1, x2] Labeled Instances A1 f2 Hypothesis ~ A2 Small labeled data needed You train me—I train you…

Observations Can apply A1 to generate as much training data as one wants –If x1 is conditionally independent of x2 / f(x), –then the error in the labels produced by A1 – will look like random noise to A2 !!! Thus no limit to quality of the hypothesis A2 can make

It really works! Learning to classify web pages as course pages –x1 = bag of words on a page –x2 = bag of words from all anchors pointing to a page Naïve Bayes classifiers –12 labeled pages –1039 unlabeled

Focussed Crawling Cho paper –Looks at heuristics for managing URL queue –Aim1: completeness –Aim2: just topic pages Prioritize if word in anchor / URL Heuristics: –Pagerank –#backlinks

Modified Algorithm Page is hot if: –Contains keyword in title, or –Contains 10 instances of keyword in body, or –Distance(page, hot-page) < 3

Results

More Results

Conclusions Recommending and personalization are important approaches to combating information over-load. Machine Learning is an important part of systems for these tasks. Collaborative filtering has problems. Content-based methods address these problems (but have problems of their own). Integrating both is best.