What do you recommend?
[Title slide with example items: THOSE … THIS … THAT]

I know all about you …
– what you’ll read next summer (Amazon, Barnes & Noble)
– what movies you should watch (Reel, RatingZone, Amazon)
– what websites you should visit (Alexa)
– what music you should listen to (CDNow, Mubu, Gigabeat)
– what jokes you will like (Jester)
– and who you should date (Yenta)

Recommendations
[Diagram: items (products, web sites, blogs, news items, …) are surfaced to users via search results and recommendations.]

From scarcity to abundance
Shelf space is a scarce commodity for traditional retailers
– Also: TV networks, movie theaters, book shelves, …
The web enables near-zero-cost dissemination of (information about) products
– From scarcity to abundance
– The long tail phenomenon (see the next slide)
More choice necessitates better filters
– Recommendation engines
– How Into Thin Air made Touching the Void a bestseller (read the article on The Long Tail)

The Long Tail
[Figure: the long-tail distribution of product popularity. Source: Chris Anderson (2004)]

Recommendation Types
Editorial and hand-curated
– Lists of favorites
– Lists of “essential” items
Simple aggregates
– Top 10, Most Popular, Recent Uploads
Personalized
– Tailored to individual users
– Amazon, Netflix, …

Formal Model
C = set of Customers
S = set of Items
Utility function U: C × S → R
– R = set of ratings
– R is a totally ordered set
– e.g., 0–5 stars, or a real number in [0,1]
Utility Matrix:

      | King Kong | LOTR | Matrix | Nacho Libre
Alice |     1     |      |  0.2   |
Bob   |    0.5    |      |        |     0.3
Carol |           | 0.2  |   1    |
David |           |      |        |     0.4
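
To make the formal model concrete, here is a minimal Python sketch (not from the original slides) that stores the sparse utility matrix as nested dicts, with unknown entries simply absent; the placement of the ratings follows the reconstructed table above and is an assumption.

```python
# Sparse utility matrix U: C x S -> R; missing entries are unrated.
utility = {
    "Alice": {"King Kong": 1.0, "Matrix": 0.2},
    "Bob":   {"King Kong": 0.5, "Nacho Libre": 0.3},
    "Carol": {"LOTR": 0.2, "Matrix": 1.0},
    "David": {"Nacho Libre": 0.4},
}

def rating(user, item):
    """Return the known rating, or None if the entry is unknown."""
    return utility.get(user, {}).get(item)

print(rating("Alice", "Matrix"))       # 0.2
print(rating("Alice", "Nacho Libre"))  # None -> to be extrapolated
```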

Key Problems
Gathering “known” ratings for the matrix
– How to collect the data in the utility matrix?
Extrapolating unknown ratings from known ratings
– Mainly interested in high unknown ratings: we want to know what you like, not what you don’t like
Evaluating extrapolation methods
– How to measure the success/performance of recommendation methods?

Gathering Ratings
Explicit
– Ask people to rate items
– Doesn’t work well in practice – people can’t be bothered
Implicit
– Learn ratings from user actions, e.g., a purchase implies a high rating; clicks, time spent on a page, demo downloads, …
– One cannot be sure that user behavior is correctly interpreted: a user may not like all the books he or she has bought; a book may be a gift, …
– What about low ratings?

Extrapolating Utilities
Key problem: the matrix U is sparse
– Most people have not rated most items
– Cold start: new items have no ratings; new users have no history
A recommendation system (RS) can be seen as a function
– Given: a user model (e.g., ratings, preferences, demographics, situational context) and items (with or without a description of item characteristics)
– Find: a relevance score, used for ranking

Paradigms of recommender systems
Recommender systems reduce information overload by estimating relevance.

Paradigms of recommender systems
Personalized recommendations

Paradigms of recommender systems
Collaborative: "Tell me what's popular among my peers"

Paradigms of recommender systems
Content-based: "Show me more of what I've liked"

Paradigms of recommender systems
Knowledge-based: "Tell me what fits based on my needs"

Paradigms of recommender systems
Hybrid: combinations of various inputs and/or composition of different mechanisms

Content-based recommendations
Main idea: recommend to customer C items that are similar to items C rated highly in the past
Examples
– Movie recommendations: recommend movies with the same actor(s), director, genre, …
– Websites, blogs, news: recommend other sites with “similar” content
– People: recommend people with many common friends

Content-based recommendation
What we need:
– Information about the available items, such as the genre (the “content”)
– A user profile describing what the user likes (the preferences)
The task:
– Learn the user preferences
– Locate/recommend items that are “similar” to the user preferences
“Show me more of what I’ve liked”

Plan of action
[Diagram: from the items the user likes, build item profiles; aggregate them into a user profile; match the profile against new items; recommend the best matches.]

Item Profiles
For each item, create an item profile
A profile is a set (vector) of features
– Movies: author, title, actor, director, …
– Text: set of “important” words in the document
– People: set of friends
View the item profile as a vector
– One entry per feature (e.g., each actor, director, …)
– The vector may be boolean or real-valued

User profiles
A vector describing the user’s preferences
– An aggregation of the profiles of the items the user likes
Possibilities:
– Weighted average of the rated item profiles
  Example: items are movies, features are actors; an item profile is a boolean vector with a “1” in the component of each actor appearing in the movie. For a particular actor, the entry in the user profile is the average over all the movies the user rated, e.g., if the user rated 5 movies and Julia Roberts appears in 3 of them, the user-profile entry for Julia Roberts is 3/5 = 0.6
– Variation: weight by the difference from the item’s average rating
  This suits non-boolean ratings, e.g., scores of 1–5. Suppose a user’s average rating is 3, and the user rates an actor’s movies 3, 4 and 5; the profile value for that actor is then [(3-3)+(4-3)+(5-3)]/3 = 1
– …
[Table: boolean item profiles i1–i3 over features a1, a2, a3, …, ak, and the resulting averaged user profile U with entries such as 0.3 and 0.6.]
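
As a small illustration of the weighted-average construction, the sketch below averages boolean item profiles; the feature names and the rated items are hypothetical, chosen so that the Julia Roberts entry comes out to 3/5 = 0.6 as in the slide’s example.

```python
def user_profile(item_profiles):
    """Average the boolean item profiles of the items the user rated."""
    n = len(item_profiles)
    return [sum(col) / n for col in zip(*item_profiles)]

# Hypothetical actor features: [Julia Roberts, Tom Hanks, Ewan McGregor].
# Julia Roberts (first column) appears in 3 of the 5 rated movies.
rated = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
print(user_profile(rated))   # [0.6, 0.4, 0.6]
```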

Prediction
Prediction heuristic
– Given a user profile c and an item profile s, estimate utility(c, s) with a similarity function between c and s, e.g., cos(c, s) = c·s / (|c| |s|)
– Need an efficient method to find items with high utility

Let’s take a closer look at text data …
Most content-based recommendation techniques have been applied to recommending text documents
– Like web pages or newsgroup messages, for example
The content of items can also be represented as text documents
– With textual descriptions of their basic characteristics
– Structured: each item is described by the same set of attributes
– Unstructured: free-text description

Title                | Genre             | Author            | Type      | Price | Keywords
The Night of the Gun | Memoir            | David Carr        | Paperback | 29.90 | press and journalism, drug addiction, personal memoirs, New York
The Lace Reader      | Fiction, Mystery  | Brunonia Barry    | Hardcover | 49.90 | American contemporary fiction, detective, historical
Into the Fire        | Romance, Suspense | Suzanne Brockmann | Hardcover | 45.90 | American fiction, murder, neo-Nazism

Item representation
Content representation and item similarities
Simple approach
– Compute the similarity of an unseen item with the user profile based on keyword overlap (e.g., using the Dice coefficient; a sketch follows below)
– Or use and combine multiple metrics
The items are the books from the previous slide; the user profile has the same attributes:

Title | Genre   | Author                       | Type      | Price | Keywords
…     | Fiction | Brunonia, Barry, Ken Follett | Paperback | 25.65 | detective, murder, New York
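
A minimal sketch of the keyword-overlap idea using the Dice coefficient, 2·|A ∩ B| / (|A| + |B|); the keyword sets are shortened versions of the table entries.

```python
def dice(a: set, b: set) -> float:
    """Dice coefficient: 2 * |A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

book_keywords = {"american", "contemporary", "fiction", "detective", "historical"}
profile_keywords = {"detective", "murder", "new", "york"}
print(dice(book_keywords, profile_keywords))   # 2*1 / (5+4) ≈ 0.22
```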

What is the (item) profile of a document?
Keeping all words is too costly and unnecessary
A simple keyword representation has its problems
– Not every word is equally important
– Longer documents have a higher chance of overlapping with the user profile
How to pick important words (for text)?
– The usual heuristic is TF.IDF (Term Frequency times Inverse Document Frequency)
– Item = document, feature = term

TF.IDF
TF: measures how often a term appears (its density in a document)
– Assumes that important terms appear more often
– Normalization is needed to take document length into account, e.g., TF_ij = f_ij / max_k f_kj, where f_ij = frequency of term t_i in document d_j
IDF: aims to reduce the weight of terms that appear in (nearly) all documents; rare words are given more weight
– IDF_i = log(N / n_i), where n_i = number of docs that mention term i and N = total number of docs
TF.IDF score: w_ij = TF_ij × IDF_i
Doc profile = the set of words with the highest TF.IDF scores, together with their scores
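
A small, self-contained sketch of TF.IDF as defined above; the max-frequency normalization is one common choice (an assumption, since the slide only says normalization is needed), and the toy corpus is made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """w_ij = TF_ij * IDF_i with TF_ij = f_ij / max_k f_kj
    and IDF_i = log(N / n_i)."""
    N = len(docs)
    counts = [Counter(doc.lower().split()) for doc in docs]
    term_docs = Counter()                 # n_i: number of docs containing term i
    for c in counts:
        term_docs.update(c.keys())
    weights = []
    for c in counts:
        max_f = max(c.values())           # normalize by the most frequent term
        weights.append({t: (f / max_f) * math.log(N / term_docs[t])
                        for t, f in c.items()})
    return weights

docs = ["caesar brutus caesar", "brutus mercy", "mercy mercy worser"]
for w in tf_idf(docs):
    print({t: round(v, 2) for t, v in w.items()})
```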

Example TF-IDF representation
Example taken from http://informationretrieval.org

          | Antony and Cleopatra | Julius Caesar | The Tempest | Hamlet | Othello | Macbeth
Antony    |         157          |      73       |      0      |   0    |    0    |    0
Brutus    |          4           |     157       |      0      |   1    |    0    |    0
Caesar    |         232          |     227       |      0      |   2    |    1    |    1
Calpurnia |          0           |      10       |      0      |   0    |    0    |    0
Cleopatra |          57          |       0       |      0      |   0    |    0    |    0
mercy     |         1.51         |       0       |      3      |   5    |    5    |    1
worser    |         1.37         |       0       |      1      |   1    |    1    |    0

Improving the vector space model
Vectors are usually long and sparse
Remove stop words
– They appear in nearly all documents
– e.g., "a", "the", "on", …
Use stemming
– Aims to replace variants of words by their common stem
– e.g., "went" → "go", "stemming" → "stem", …
Size cut-offs
– Only use the top n most representative words to remove "noise" from the data
– e.g., use the top 100 words
A sketch of these steps follows below.
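
A toy sketch of the three steps; the stop-word list and the suffix-stripping “stemmer” are illustrative stand-ins for real components (e.g., a Porter stemmer), not actual implementations.

```python
STOP_WORDS = {"a", "the", "on", "of", "and", "in"}   # tiny illustrative list

def clean_terms(tokens, weights, top_n=100):
    """Remove stop words, crudely 'stem', keep the top_n terms by weight."""
    kept = [t for t in tokens if t not in STOP_WORDS]
    stems = [t[:-3] if t.endswith("ing") else t for t in kept]  # toy stemmer
    return sorted(set(stems), key=lambda t: -weights.get(t, 0.0))[:top_n]

tokens = ["the", "stemming", "of", "words", "on", "a", "page"]
print(clean_terms(tokens, weights={"stemm": 0.9, "words": 0.5, "page": 0.1}))
# ['stemm', 'words', 'page']
```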

Cosine similarity
Given two vectors x and y, think of points as vectors from the origin [0, 0, …, 0] and measure the distance between them by the angle they form:
cos(x, y) = x·y / (|x| |y|)
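
A direct transcription of the formula as Python (the example vectors are arbitrary):

```python
import math

def cosine(x, y):
    """cos(x, y) = x·y / (|x| |y|) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

print(round(cosine([4, 0, 5, 1], [5, 5, 4, 0]), 2))   # 0.76
```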

Recommending documents
Simple method: nearest neighbors
– Given a set D of documents already rated by the user (like/dislike), either explicitly via the user interface or implicitly by monitoring the user’s behavior
– Find the k nearest neighbors in D of a not-yet-seen document i
– Use these neighbors to predict a rating for i, e.g., with k = 5: if 4 of the 5 documents most similar to i are liked by the current user, predict that i will also be liked
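
A hedged sketch of this nearest-neighbor vote; the document vectors and like/dislike labels are made up, and cosine() is the same function as in the previous sketch.

```python
import math

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def knn_like(new_doc, rated_docs, k=5):
    """Majority vote among the k rated documents most similar to new_doc.
    rated_docs: list of (term_vector, liked) pairs."""
    nn = sorted(rated_docs, key=lambda dv: -cosine(new_doc, dv[0]))[:k]
    return sum(liked for _, liked in nn) > k / 2

rated = [([1, 0, 1], True), ([1, 1, 0], True), ([0, 1, 1], True),
         ([1, 1, 1], True), ([0, 0, 1], False), ([1, 0, 0], False)]
print(knn_like([1, 0, 1], rated))   # True: most of the 5 neighbors are liked
```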

Pros: Content-based Approach
No need for data on other users
Able to recommend to users with unique tastes
Able to recommend new & unpopular items
– No first-rater problem
Able to provide explanations
– Can explain recommendations by listing the content features that caused an item to be recommended

Cons: Content-based Approach
Requires content that can be encoded as meaningful features
– Finding appropriate features can be hard, e.g., for images, movies, music
– Users’ tastes must be representable as a learnable function of these content features
Overspecialization
– Never recommends items outside the user’s content profile
– People might have multiple interests
Unable to exploit the quality judgements of other users
Cold-start problem for new users
– How to build a user profile?

So far …
RS have important applications, many of which have been successful in practice
Item profiles, user profiles, utility matrix
Content-based recommendation
– Focus on recommending documents/textual content

Collaborative Filtering (CF)
The most prominent approach to generating recommendations
– Used by large, commercial e-commerce sites
– Well understood; various algorithms and variations exist
– Applicable in many domains (books, movies, DVDs, …)
Approach
– Use the "wisdom of the crowd" to recommend items
Basic assumption and idea
– Users give ratings to catalog items (implicitly or explicitly)
– Customers who had similar tastes in the past will have similar tastes in the future

CF Approaches
Input
– Only a matrix of given user–item ratings
Approach
– Consider user c
– Find a set D of other users whose ratings are “similar” to c’s ratings
– Estimate c’s ratings based on the ratings of the users in D
Output types
– A (numerical) prediction indicating to what degree the current user will like or dislike a certain item
– A top-k list of recommended items

Similarity Metric
A user–item rating matrix (rows = users, columns = items):

   | HP1 | HP2 | HP3 | TW | SW1 | SW2 | SW3
A  |  4  |     |     |  5 |  1  |     |
B  |  5  |  5  |  4  |    |     |     |
C  |     |     |     |  2 |  4  |  5  |
D  |     |  3  |     |    |     |     |  3

Jaccard similarity: sim(A,B) = 1/5 < sim(A,C) = 2/4
– Ignores the values of the ratings
Cosine similarity: sim(A,B) = 0.386 > sim(A,C) = 0.322
– Treats missing ratings as 0 – implications? A missing rating is read as “negative”/dislike
Which is correct? Intuitively, we want sim(A,B) > sim(A,C): A and B both rated HP1 highly, while A rated SW1 low and C rated it high

Similarity Metric: centered cosine (Pearson correlation)
Normalize ratings by subtracting each row’s (user’s) mean, e.g., C’s rating of TW becomes 2 - (2+4+5)/3 = -5/3:

   |  HP1 | HP2 |  HP3 |  TW  | SW1  | SW2 | SW3
A  |  2/3 |     |      |  5/3 | -7/3 |     |
B  |  1/3 | 1/3 | -2/3 |      |      |     |
C  |      |     |      | -5/3 |  1/3 | 4/3 |
D  |      |  0  |      |      |      |     |  0

– D’s ratings have effectively disappeared (since 0 is the same as blank!). Not unreasonable: D rated everything the same, so D’s opinion cannot be taken seriously
– sim(A,B) = 0.092 > sim(A,C) = -0.559, which captures the intuition better
– Treats missing ratings as “average” (neutral)
– Handles “tough raters” and “easy raters”
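
The sketch below reproduces the slide’s numbers for the table above: each user’s mean is taken over their own ratings, and missing entries contribute 0 to the dot product.

```python
import math

def centered_cosine(ra, rb):
    """ra, rb: dicts item -> rating. Subtract each user's mean, treat
    missing entries as 0 (so the dot product runs over common items)."""
    ma = sum(ra.values()) / len(ra)
    mb = sum(rb.values()) / len(rb)
    dot = sum((ra[i] - ma) * (rb[i] - mb) for i in set(ra) & set(rb))
    na = math.sqrt(sum((r - ma) ** 2 for r in ra.values()))
    nb = math.sqrt(sum((r - mb) ** 2 for r in rb.values()))
    return dot / (na * nb) if na and nb else 0.0

A = {"HP1": 4, "TW": 5, "SW1": 1}
B = {"HP1": 5, "HP2": 5, "HP3": 4}
C = {"TW": 2, "SW1": 4, "SW2": 5}
print(round(centered_cosine(A, B), 3))   # 0.092
print(round(centered_cosine(A, C), 3))   # -0.559
```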

User-based nearest-neighbor CF
Basic technique
– Given an “active user” (Alice) and an item i not yet seen by Alice:
– Find a set of users (nearest neighbors) who liked the same items as Alice in the past AND who have rated item i
– Use, e.g., the average of their ratings, or a combination weighted by a similarity measure such as cosine similarity or Pearson correlation, to predict whether Alice will like item i
– Do this for all items Alice has not seen and recommend the (top-k) best-rated ones
Basic assumption and idea
– If users had similar tastes in the past, they will have similar tastes in the future
– User preferences remain stable and consistent over time

User-based nearest-neighbor CF
Example
– A database of ratings by the current user, Alice, and some other users is given:

      | Item1 | Item2 | Item3 | Item4 | Item5
Alice |   5   |   3   |   4   |   4   |   ?
User1 |   3   |   1   |   2   |   3   |   3
User2 |   4   |   3   |   4   |   3   |   5
User3 |   3   |   3   |   1   |   5   |   4
User4 |   1   |   5   |   5   |   2   |   1

– Determine whether Alice will like or dislike Item5, which Alice has not yet rated or seen

User-based nearest-neighbor CF
Some first questions (for the rating table above)
– How do we measure similarity?
– How many neighbors should we consider?
– How do we generate a prediction from the neighbors’ ratings?

Measuring user similarity: Pearson correlation
A popular similarity measure in user-based CF:

sim(a,b) = Σ_{p∈P} (r_{a,p} - r̄_a)(r_{b,p} - r̄_b) / ( √(Σ_{p∈P} (r_{a,p} - r̄_a)²) · √(Σ_{p∈P} (r_{b,p} - r̄_b)²) )

– a, b: users
– r_{a,p}: rating of user a for item p
– r̄_a: average rating of user a
– P: set of items rated both by a and b
– Possible similarity values are between -1 and 1
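
A sketch of this formula with the means taken over the co-rated items P; it reproduces the sim(Alice, User1) = 0.85 value used on the next slide.

```python
import math

def pearson(ra, rb):
    """Pearson correlation over the items P rated by both users."""
    P = set(ra) & set(rb)
    if not P:
        return 0.0
    ma = sum(ra[p] for p in P) / len(P)
    mb = sum(rb[p] for p in P) / len(P)
    num = sum((ra[p] - ma) * (rb[p] - mb) for p in P)
    den = (math.sqrt(sum((ra[p] - ma) ** 2 for p in P))
           * math.sqrt(sum((rb[p] - mb) ** 2 for p in P)))
    return num / den if den else 0.0

alice = {1: 5, 2: 3, 3: 4, 4: 4}
user1 = {1: 3, 2: 1, 3: 2, 4: 3, 5: 3}
print(round(pearson(alice, user1), 2))   # 0.85
```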

Pearson correlation: example
For the rating database above, with averages taken over the co-rated items:
– sim(Alice, User1) = 0.85
– sim(Alice, User2) = 0.70
– sim(Alice, User3) = 0.00
– sim(Alice, User4) = -0.79

Pearson correlation
Takes differences in rating behavior into account
Works well in the usual domains, compared with alternative measures such as cosine similarity

Making predictions
A common prediction function:

pred(a,p) = r̄_a + Σ_{b∈N} sim(a,b) · (r_{b,p} - r̄_b) / Σ_{b∈N} sim(a,b)

– N is the set of neighbors
– Calculate whether the neighbors’ ratings for the unseen item p are higher or lower than their average
– Combine the rating differences, using the similarity to user a as a weight
– Add/subtract the neighbors’ weighted bias to/from the active user’s average and use this as the prediction
In our example, with N = 2 (User1 and User2, the two most similar users) and each neighbor’s average taken over all of that user’s ratings:
– pred(Alice, Item5) = 4 + (0.85·(3 - 2.4) + 0.70·(5 - 3.8)) / (0.85 + 0.70) ≈ 4.87
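
A sketch of the prediction function that reproduces the 4.87 value above; the ratings and means are taken from the running example.

```python
def predict(a, neighbors, ratings, means, item):
    """pred(a,p) = mean_a + sum_b sim(a,b)*(r_bp - mean_b) / sum_b sim(a,b).
    neighbors: list of (user, similarity) pairs."""
    num = sum(sim * (ratings[b][item] - means[b]) for b, sim in neighbors)
    den = sum(sim for _, sim in neighbors)
    return means[a] + num / den

ratings = {"User1": {"Item5": 3}, "User2": {"Item5": 5}}
means = {"Alice": 4.0, "User1": 2.4, "User2": 3.8}
nbrs = [("User1", 0.85), ("User2", 0.70)]
print(round(predict("Alice", nbrs, ratings, means, "Item5"), 2))   # 4.87
```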

Item-Item Collaborative Filtering
So far: user–user collaborative filtering
Another view: item–item
– For item s, find other similar items (that have been rated by the user)
– Estimate the rating for item s based on the user’s ratings for these similar items
– The same similarity metrics and prediction functions as in the user–user model can be used
In practice, item–item has been observed to often work better than user–user. Why?
– Items are simpler (an item belongs to a small set of “genres”), while users have multiple tastes

Item-based CF
Example (using the same rating table as before)
– Look for items that are similar to Item5
– Take Alice’s ratings for these items to predict her rating for Item5

Similarity metrics and making predictions
Similarity can be measured by cosine similarity
A common prediction function:

pred(u,p) = Σ_{i∈ratedItems(u)} sim(i,p) · r_{u,i} / Σ_{i∈ratedItems(u)} sim(i,p)

The neighborhood is typically also limited to a specific size
– Not all neighbors are taken into account for the prediction

Item-Item CF (|D| = 2)
[Figure: a 6-movie × 12-user rating matrix; the task is to predict user 5’s rating for movie 1.]
Neighbor selection: identify movies similar to movie 1 that have been rated by user 5
Computing item similarity:
1) Subtract the mean rating m_i from each movie (row) i
– m_1 = (1+3+5+5+4)/5 = 3.6
– Row 1 becomes: (-2.6, 0, -0.6, 0, 0, 1.4, 0, 0, 1.4, 0, 0.4, 0)
2) Compute the cosine similarities between the centered rows

Item-Item CF (|D| = 2)
Compute the similarity weights of the two movies most similar to movie 1: s(1,3) = 0.41, s(1,6) = 0.59

Item-Item CF (|D| = 2)
Predict by taking the weighted average of user 5’s ratings for movies 3 and 6:
r_{1,5} = (0.41·2 + 0.59·3) / (0.41 + 0.59) = 2.6
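
A minimal sketch reproducing this weighted average; the item indices, ratings, and similarity weights are read off the example above.

```python
def predict_item_item(user_ratings, sims):
    """r_{x,i} = sum_j s_ij * r_{x,j} / sum_j s_ij over the |D| most
    similar items j that the user has rated (here |D| = 2)."""
    num = sum(s * user_ratings[j] for j, s in sims.items())
    den = sum(sims.values())
    return num / den

user5 = {3: 2, 6: 3}         # user 5's ratings: movie 3 -> 2, movie 6 -> 3
sims = {3: 0.41, 6: 0.59}    # s(1,3) = 0.41, s(1,6) = 0.59
print(round(predict_item_item(user5, sims), 1))   # 2.6
```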

Pros and cons of collaborative filtering
Pros
– Works for any kind of item: no feature selection needed
Cons
– Cold start: needs enough users in the system to find a match (new-user problem)
– First rater: cannot recommend an item that has not been previously rated, e.g., new items, esoteric items (new-item problem)
– Sparsity of the rating matrix: it is hard to find users that have rated the same items (cluster-based smoothing?)
– Cannot recommend items to someone with unique taste
– Popularity bias: tends to recommend popular items

So far, …
Collaborative filtering
– User-to-user: focus on people
– Item-to-item: focus on items
– The same mechanism can be used for both
– Item-to-item is generally superior

Hybrid recommender systems
All base techniques are naturally employed by a good sales assistant (at different stages of the sales act), but each has its shortcomings
– For instance, cold-start problems
Idea: cross two (or more) implementations
– Avoid some of the shortcomings
– Reach desirable properties not (or only inconsistently) present in the parent approaches
Different hybridization designs
– Parallel use of several systems
– Monolithic, exploiting different features
– Pipelined invocation of different systems

Monolithic hybridization design
Only a single recommendation component
Hybridization is "virtual" in the sense that features/knowledge sources of different paradigms are combined

Monolithic hybridization designs: feature augmentation
Content-boosted collaborative filtering
– Additional ratings are created from content features, e.g., Alice likes Items 1 and 3 (unary ratings) and Item7 is similar to Items 1 and 3 by a degree of 0.75, so Alice is assumed to like Item7 by 0.75
– Item matrices become less sparse
– Significance weighting and adjustment factors: peers with more co-rated items are more important; higher confidence in the content-based prediction if the user has a higher number of own ratings

Parallelized hybridization design
The outputs of several existing implementations are combined
The least invasive design
Some weighting or voting scheme
– Weights can be learned dynamically
– The extreme case of dynamic weighting is switching

Parallelized hybridization design: weighted
Compute the weighted sum of the recommenders’ scores (each cell shows score | rank):

      | Recommender 1 | Recommender 2 | Weighted (0.5 : 0.5)
Item1 |   0.5  | 1    |   0.8  | 2    |   0.65 | 1
Item2 |   0.0  | –    |   0.9  | 1    |   0.45 | 2
Item3 |   0.3  | 2    |   0.4  | 3    |   0.35 | 3
Item4 |   0.1  | 3    |   0.0  | –    |   0.05 | 4
Item5 |   0.0  | –    |   0.0  | –    |   0.00 | –
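
A sketch of the 0.5 : 0.5 weighted combination that reproduces the table above; the per-item scores are copied from the two recommender columns.

```python
def weighted_hybrid(scores_list, weights):
    """Combine several recommenders' scores item-by-item with fixed weights."""
    items = set().union(*scores_list)
    return {i: sum(w * s.get(i, 0.0) for s, w in zip(scores_list, weights))
            for i in items}

rec1 = {"Item1": 0.5, "Item2": 0.0, "Item3": 0.3, "Item4": 0.1, "Item5": 0.0}
rec2 = {"Item1": 0.8, "Item2": 0.9, "Item3": 0.4, "Item4": 0.0, "Item5": 0.0}
combined = weighted_hybrid([rec1, rec2], [0.5, 0.5])
for item, score in sorted(combined.items(), key=lambda kv: -kv[1]):
    print(item, round(score, 2))   # Item1 0.65, Item2 0.45, Item3 0.35, ...
```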

Pipelined hybridization designs
One recommender system pre-processes some input for the subsequent one, e.g., refinement of recommendation lists (cascade)

Pipelined hybridization designs: cascade
The recommendation list is continually reduced
The first recommender excludes items
– Removes absolute no-go items (e.g., knowledge-based)
The second recommender assigns scores
– Ordering and refinement (e.g., collaborative)

Evaluating Predictions
Compare predictions with known ratings
– Root-mean-square error (RMSE); a small sketch follows below
Other metrics
– Coverage: the number of items/users for which the system can make predictions
– Precision: the accuracy of the predictions
– Receiver operating characteristic (ROC): the trade-off curve between false positives and false negatives
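
A minimal RMSE sketch; the "actual" ratings in the example call are made up for illustration.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error over (prediction, known rating) pairs."""
    se = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(se / len(actual))

print(round(rmse([4.87, 2.6], [5.0, 3.0]), 3))   # ≈ 0.297
```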

Problems with Measures
A narrow focus on accuracy sometimes misses the point
– Prediction diversity
– Prediction context
– Order of predictions
In practice, we only care about predicting high ratings
– RMSE might penalize a method that does well for high ratings and badly for the others

Conclusion
RS have important applications, many of which have been successful in practice
Content-based RS
– Focused on items
Collaborative filtering
– Focused on people
– User-based and item-based variants
Hybrid methods combine both CB and CF

Acknowledgements
Slides adapted from numerous sources:
– Jure Leskovec and Jeff Ullman (Stanford)
– Max Welling (UCI)