1
Recommender Systems Robin Burke DePaul University Chicago, IL
2
About myself
PhD 1993 Northwestern University
–Intelligent Multimedia Retrieval
1993-1998
–Post-doc at University of Chicago (Kristian Hammond)
–Helped found Recommender, Inc., which became Verb, Inc.
1998-2000
–Dir. of Software Development
–Adjunct at University of California, Irvine
2000-2002
–California State University, Fullerton
2002-present
–DePaul University
3
My Interests
Memory
–How do we remember the right thing at the right time?
–Why is it that computers are so bad at this?
–How does knowledge of different types shape the activity of memory?
4
Organization
3 days, 21 hours
Not me talking all the time!
Partners
–For in-class activities
–For coding labs
For labs
–Must be one laptop per pair
–Using Eclipse / Java
5
Activity 1
With your partner
One person should recommend a movie or DVD to the other
–asking questions as necessary
–in the end, you should be confident that the recommendation is right
No right or wrong way to do this!
Take note of
–the questions you ask
–the reasons for the recommendation
6
Discussion
Recommender
–What did you have to ask?
–How did you use this information?
Recommendee
–What made you sure the recommendation was good?
7
Example: Amazon.com
8
Product similarity
10
Market-basket analysis
11
Profitability analysis
12
Sequential pattern mining
13
Application: Recommender.com
14
Similar movies
15
Applying a critique
16
New results
17
Knowledge employed
Similarity metric
–what makes something "alike"?
–# of features in common is not sufficient
Movies
–genres of movies
–types of actors
–directorial styles
–meaning of ratings
NR could mean adult, but it could just be a foreign movie
18
This class
Tuesday: A. 8:00 – 10:30, B. 10:45 – 13:00, C. 15:00 – 18:00
Wednesday: D. 8:00 – 10:00, E. 10:15 – 13:00, F. 17:00 – 19:00
Thursday: G. 8:00 – 11:00, H. 14:30 – 16:00, I. 18:00 – 20:00
19
Roadmap
Session A: Basic Techniques I
–Introduction
–Knowledge Sources
–Recommendation Types
–Collaborative Recommendation
Session B: Basic Techniques II
–Content-based Recommendation
–Knowledge-based Recommendation
Session C: Domains and Implementation I
–Recommendation domains
–Example Implementation
–Lab I
Session D: Evaluation I
–Evaluation
Session E: Applications
–User Interaction
–Web Personalization
Session F: Implementation II
–Lab II
Session G: Hybrid Recommendation
Session H: Robustness
Session I: Advanced Topics
–Dynamics
–Beyond accuracy
20
Recommender Systems
Wikipedia:
–Recommendation systems are programs which attempt to predict items (movies, music, books, news, web pages) that a user may be interested in, given some information about the user's profile.
My definition:
–Any system that guides the user in a personalized way to interesting or useful objects in a large space of possible options, or that produces such objects as output.
21
Historical note
Used to be a more restrictive definition
–“people provide recommendations as inputs, which the system then aggregates and directs to appropriate recipients” (Resnick & Varian 1997)
22
Aspects of the definition
Basis for recommendation
–personalization
Process of recommendation
–interactivity
Results of recommendation
–interesting / useful objects
23
Personalization
–Any system that guides the user in a personalized way to interesting or useful objects in a large space of possible options or that produces such objects as output.
Definitions agree that recommendations are personalized
–Some might say that suggesting a best-seller to everyone is a form of recommendation
Meaning
–the process is guided by some user-specific information
could be a long-term model
could be a query
24
Interactivity
–Any system that guides the user in a personalized way to interesting or useful objects in a large space of possible options or that produces such objects as output.
Many possible interaction styles
–query / retrieve
–recommendation list
–predicted rating
–dialog
25
Results
–Any system that guides the user in a personalized way to interesting or useful objects in a large space of possible options or that produces such objects as output.
Recommendation = Search?
Search
–a query matching process
–given a query, return all items that match it
Recommendation
–a need satisfaction process
–given a need, return items that are likely to satisfy it
26
Some definitions
Recommendation
Items
Domain
Users
Ratings
Profile
27
Recommendation
A prediction of a given user's likely preference regarding an item
Issues
–Negative prediction
–Presentation / Interface
Notation
–Pred(u,i)
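As a minimal sketch in the Java used for the labs (the interface and all names here are hypothetical, not from the lab code), Pred(u,i) is simply a function from a user and an item to a score:

```java
// Hypothetical interface mirroring the Pred(u,i) notation.
// Names are illustrative only, not from the course's lab code.
public interface Recommender {
    /** Predicted preference of user u for item i, on the rating scale. */
    double pred(int userId, int itemId);
}
```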
28
Items
The things being recommended
–can be products
–can be documents
Assumption
–Discrete items are being recommended
–Not, for example, contract terms
Issues
–Cost
–Frequency of purchase
–Customizability
–Configurations
Notation
–I = set of all items
–i = an individual item
29
Recommendation Domain
What is being recommended?
–a $0.99 music track?
–a $1.9 M luxury condo?
Much depends on the characteristics of the domain
–cost
how costly is a false positive?
how costly is a false negative?
–portfolio
OK to recommend something that the user has already seen?
compatibility with owned items?
–individual vs group
are we recommending something for individual or group consumption?
–single item vs configuration
are we recommending a single item or a configuration of items?
what are the constraints that tie configurations together?
–constraints
what types of constraints are users likely to impose (hard vs soft)?
30
Example 1
Music track (à la iTunes)
–low cost
–individual
–configuration
fit into existing playlist?
–portfolio
should not be already owned
–constraints
likely to be soft
31
Example 2
Course advising
–high cost
–individual
–configuration
must fit with other courses
prerequisites
–portfolio
should not have already been taken
–constraints
may be hard
graduation requirements
time and day
32
Example 3
DVD rental
–low cost
–group consumption
–no configuration issues
–portfolio
possible to recommend a favorite title again
–Christmas movies
–constraints
likely to be soft
some could be hard, like maximum allowed rating
33
Users
People who need / want items
Assumption
–(Usually) repeat users
Issues
–Portfolio effects
Notation
–U = set of all users
–u = a particular user
34
Ratings
A (numeric) score given by a user to a particular item, representing the user's preference for that item
Assumption
–Preferences are static (or at least of long duration)
Issues
–Multi-dimensional ratings
–Context-dependencies
Notation
–r_u,i = the rating of item i by user u
–R_U,i = R_i = the ratings of item i by all users
35
Explicit vs Implicit Ratings
An explicit rating is one that has been provided by a user
–via a user interface
An implicit rating is inferred from user behavior
–for example, as recorded in web log data
Issues
–effort threshold
–noise
36
Collecting Explicit Ratings
37
Profile
A user profile is everything that the system knows about a particular user
Issues
–profile dimensionality
Notation
–P = all profiles
–P_u = the profile of user u
38
Knowledge Sources
An AI system requires knowledge
Takes various forms
–raw data
–algorithm
–heuristics
–ontology
–rule base
39
In Recommendation
Social knowledge
User knowledge
Content knowledge
40
Knowledge source: Collaborative
A collaborative knowledge source is one that holds information about peer users in a system
Examples
–ratings of items
–age, sex, income of other users
41
Knowledge source: User
A user knowledge source is one that holds information about the current user
–the one who needs a recommendation
Examples
–a query the user has entered
–a model of the user's preferences
42
Knowledge source: Content
A content knowledge source holds information about the items being recommended
Examples
–knowledge about how items satisfy user needs
–knowledge about the attributes of items
43
Recommendation Knowledge Sources Taxonomy
Recommendation Knowledge
–Collaborative
Opinion Profiles
Demographic Profiles
–User
Opinions
Demographics
Requirements (Query, Constraints, Preferences, Context)
–Content
Item Features
Domain Knowledge (Means-ends, Domain Constraints, Contextual Knowledge, Feature Ontology)
44
Break
45
Roadmap
Session A: Basic Techniques I
–Introduction
–Knowledge Sources
–Recommendation Types
–Collaborative Recommendation
Session B: Basic Techniques II
–Content-based Recommendation
–Knowledge-based Recommendation
Session C: Domains and Implementation I
–Recommendation domains
–Example Implementation
–Lab I
Session D: Evaluation I
–Evaluation
Session E: Applications
–User Interaction
–Web Personalization
Session F: Implementation II
–Lab II
Session G: Hybrid Recommendation
Session H: Robustness
Session I: Advanced Topics
–Dynamics
–Beyond accuracy
46
Recommendation Types
Default (non-personalized)
–“Would you like fries with that?”
Collaborative
–“Most people who bought hamburgers also bought fries.”
Demographic
–“Most 45-year-old computer scientists buy fries.”
Content-based
–“You usually buy fries with your burgers.”
Knowledge-based
–“A large order of curly fries would really complement the flavor of a Western Bacon Cheeseburger.”
47
Collaborative
Key knowledge source
–opinion database
Process
–given a target user, find similar peer users
–extrapolate from peer user ratings to the target user
48
Demographic
Key knowledge sources
–Demographic profiles
–Opinion profiles
Process
–for the target user, find users of a similar demographic
–extrapolate from similar users to the target user
49
Content-based
Key knowledge sources
–User’s opinion
–Item features
Process
–learn a function that maps from item features to user’s opinion
–apply this function to new items
50
Knowledge-based
Key knowledge source
–Domain knowledge
Process
–determine user’s requirements
–apply domain knowledge to determine best item
51
Collaborative Recommendation
Identify peers
Generate recommendation
52
Recommendation Knowledge Sources Taxonomy
Recommendation Knowledge
–Collaborative
Opinion Profiles
Demographic Profiles
–User
Opinions
Demographics
Requirements (Query, Constraints, Preferences, Context)
–Content
Item Features
Domain Knowledge (Means-ends, Domain Constraints, Contextual Knowledge, Feature Ontology)
53
Two Problems
Generate neighborhood
–Peers should be users with similar needs / tastes
–How to identify peer users?
Generate predictions
–Basic assumption = consistency in preference
–Prefer those items generally liked by peers
54
Opinion Profile
Consists of ratings of items
–P_u = {r_u,i | i ∈ I}
–usually discrete numerical values
We can think of such a profile as a vector
–some (most) ratings will be missing
–the vector is sparse
The collection of all ratings for all users
–the rating matrix
–usually very sparse
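A minimal Java sketch of this sparse representation (class and method names are assumptions, not from the lab code): each profile P_u stores only the ratings that exist, and the rating matrix is a map of such profiles.

```java
import java.util.HashMap;
import java.util.Map;

// Sparse rating matrix: only the observed ratings r_u,i are stored,
// so missing ratings never occupy space. Names are illustrative only.
public class RatingMatrix {
    // userId -> (itemId -> rating); an absent item key means "unrated"
    private final Map<Integer, Map<Integer, Double>> profiles = new HashMap<>();

    public void addRating(int userId, int itemId, double rating) {
        profiles.computeIfAbsent(userId, u -> new HashMap<>()).put(itemId, rating);
    }

    /** P_u: the sparse profile of user u (empty if the user is unknown). */
    public Map<Integer, Double> profile(int userId) {
        return profiles.getOrDefault(userId, new HashMap<>());
    }
}
```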
55
Cosine
The angle θ between two profile vectors is given by
$$\cos\theta = \frac{P_u \cdot P_v}{\|P_u\|\,\|P_v\|}$$
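A hedged Java sketch of this computation over the sparse profiles above (class and method names are assumptions, not the course's lab code). Because missing ratings behave as zeros, only co-rated items contribute to the dot product, while the norms use all of each user's ratings:

```java
import java.util.Map;

public final class CosineSimilarity {
    /** cos(theta) between two sparse profiles; missing ratings act as 0. */
    public static double cosine(Map<Integer, Double> pu, Map<Integer, Double> pv) {
        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : pu.entrySet()) {
            Double rv = pv.get(e.getKey());
            if (rv != null) dot += e.getValue() * rv; // zero terms drop out of the sum
        }
        double nu = norm(pu), nv = norm(pv);
        return (nu == 0.0 || nv == 0.0) ? 0.0 : dot / (nu * nv);
    }

    private static double norm(Map<Integer, Double> p) {
        double sumSq = 0.0;
        for (double r : p.values()) sumSq += r * r;
        return Math.sqrt(sumSq);
    }
}
```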
56
Example
Cosine similarity with Alice
57
Cosine, cont'd
Useful as a metric
–varies between -1 and 1
approaches 1 if angle is small
approaches -1 if angle is near 180°
Common in information retrieval
58
Mean Adjustment
Cosine is sensitive to the actual values in the vector
–but users often have different "baseline" preferences
–one might never rate an item below 3 / 5
–another might only rarely give a 5 / 5
These differences in scale
–can mask real similarities between preferences
Missing entries
–are effectively zero (very negative rating)
Solution
–mean-adjustment
–subtract the user's mean from each rating
an item that gets an average score becomes a 0
below average becomes negative
59
Mean Adjusted Cosine
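The formula on this slide is the standard mean-adjusted cosine, reconstructed here in the notation of the earlier slides; it subtracts each user's mean rating before taking the cosine:

$$\mathrm{sim}(u,v) = \frac{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I_u} (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in I_v} (r_{v,i} - \bar{r}_v)^2}}$$

Here $\bar{r}_u$ is user u's mean rating, $I_{uv}$ is the set of items rated by both users, and the norms run over each user's own rated items ($I_u$, $I_v$): after adjustment a missing rating contributes 0, so it is no longer a penalty.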
60
Example
User6 now most similar
–because missing items aren't a penalty
61
Problem
How to handle missing ratings?
–sparsity
Cosine
–assumes a value for missing ratings
–regular cosine
assumes zero (not a valid rating)
–adjusted cosine
assumes the user's mean
Neither is really satisfactory
62
Correlation
Don't think of ratings as dimensions
Think of them as samples of a random variable
–user opinion
–taken at different points
Try to estimate whether two users' opinions move in the same way
–i.e., whether they are correlated
63
Correlation
64
Pearson's r
Measurement of the correlation tendency of paired measurements
–covariance / product of std. dev.
Items not co-rated are not considered
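In symbols (reconstructed here; the slide describes it verbally), Pearson's r is the covariance divided by the product of the standard deviations, with every sum restricted to the co-rated items $I_{uv}$:

$$r_{uv} = \frac{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in I_{uv}} (r_{v,i} - \bar{r}_v)^2}}$$

A hedged Java sketch (all names are assumptions; following common collaborative-filtering practice, the user means are passed in, typically each user's overall mean rating):

```java
import java.util.Map;

public final class PearsonCorrelation {
    /** Pearson's r over co-rated items only; returns 0 with fewer than 2 in common. */
    public static double pearson(Map<Integer, Double> pu, Map<Integer, Double> pv,
                                 double meanU, double meanV) {
        double num = 0.0, denU = 0.0, denV = 0.0;
        int common = 0;
        for (Map.Entry<Integer, Double> e : pu.entrySet()) {
            Double rv = pv.get(e.getKey());
            if (rv == null) continue;   // items not co-rated are not considered
            common++;
            double du = e.getValue() - meanU;
            double dv = rv - meanV;
            num  += du * dv;            // covariance term
            denU += du * du;            // variance terms
            denV += dv * dv;
        }
        if (common < 2 || denU == 0.0 || denV == 0.0) return 0.0;
        return num / (Math.sqrt(denU) * Math.sqrt(denV));
    }
}
```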
65
Cosine vs Correlation
66
Example
67
Neighborhood Size
Too few
–prediction based on only a few neighbors
Too many
–distant neighbors included
–niche not specifically identified
–taken to the extreme, this reduces to the overall average
68
Sparsity
What if a neighbor has only a few ratings in common with the target?
It is possible to compute a correlation with just two ratings in common
69
Example
70
Considerations in Prediction
Proximity
–should nearer neighbors get more say?
Sparsity
–should neighbors with less overlap get less (or no) say?
Baseline
–different users have different average ratings
All of these factors can be included in making predictions
71
Typical prediction formula
Take the user's average
–add a weighted average of the neighbors' ratings
–weight using the similarity scores
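Reconstructed in the notation of the earlier slides, the standard weighted-average prediction the slide describes (the formulation of Resnick et al.) combines the considerations from the previous slide: the target user's baseline plus similarity-weighted, mean-adjusted neighbor ratings over the neighborhood N(u):

$$\mathrm{Pred}(u,i) = \bar{r}_u + \frac{\sum_{v \in N(u)} \mathrm{sim}(u,v)\,(r_{v,i} - \bar{r}_v)}{\sum_{v \in N(u)} |\mathrm{sim}(u,v)|}$$

A hedged Java sketch of this formula (names are assumptions; neighborhood selection is elided):

```java
import java.util.Map;

public final class Predictor {
    /** Pred(u,i) = meanU + sum(sim * (r_v,i - meanV)) / sum(|sim|). */
    public static double pred(double meanU,
                              Map<Integer, Double> neighborSims,    // v -> sim(u,v)
                              Map<Integer, Double> neighborRatings, // v -> r_v,i
                              Map<Integer, Double> neighborMeans) { // v -> mean rating of v
        double num = 0.0, den = 0.0;
        for (Map.Entry<Integer, Double> e : neighborSims.entrySet()) {
            Double rvi = neighborRatings.get(e.getKey());
            if (rvi == null) continue;  // neighbor has not rated item i
            num += e.getValue() * (rvi - neighborMeans.get(e.getKey()));
            den += Math.abs(e.getValue());
        }
        return den == 0.0 ? meanU : meanU + num / den; // fall back to the user's mean
    }
}
```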
72
Collaborative Recommendation
Advantages
–possible to make recommendations knowing nothing about the items
–extends common social practice, exchange of opinions
–possible to find niches of users with obscure combinations of interests
–possible to make disparate connections (serendipity)
Disadvantages
–vulnerability to manipulation (more later)
–source of ratings needed
explicit ratings preferred
–cold start problems (next slide)
73
Cold Start Problem
New item
–how can a new item be recommended?
no users have rated it
–must wait for the first person to rate it
–possible solution: genre bot
New user
–how can a new user get a recommendation?
needs a profile that can be compared with others
–possible solutions
wait for the user to rate items
require users to rate items
give some default recommendations while waiting for data
74
Roadmap
Session A: Basic Techniques I
–Introduction
–Knowledge Sources
–Recommendation Types
–Collaborative Recommendation
Session B: Basic Techniques II
–Content-based Recommendation
–Knowledge-based Recommendation
Session C: Domains and Implementation I
–Recommendation domains
–Example Implementation
–Lab I
Session D: Evaluation I
–Evaluation
Session E: Applications
–User Interaction
–Web Personalization
Session F: Implementation II
–Lab II
Session G: Hybrid Recommendation
Session H: Robustness
Session I: Advanced Topics
–Dynamics
–Beyond accuracy