1
David Stern, Thore Graepel, Ralf Herbrich
Online Services and Advertising Group, MSR Cambridge
2
Overview: Motivation. Matchbox model. Model training. Accuracy. Generating fast recommendations. Compositionality. Applications.
3
Large scale personal recommendations [diagram: User ↔ Item]
4
Collaborative Filtering [diagram: ratings matrix with users A–D and items 1–6; unknown entries marked '?'] Metadata?
5
Goals
Large scale personal recommendations: products, services, people.
Leverage user and item metadata.
Flexible feedback: ratings, clicks.
Incremental training.
7
Map Sparse Features To 'Trait' Space [diagram: sparse user features (User ID, Gender: male/female, Country: UK/USA, Height: e.g. 1.2 m) and sparse item features (Item ID, Movie Genre: horror/drama/documentary/comedy) are each mapped into a low-dimensional trait space]
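As a hedged illustration of this mapping (not code from the talk; the feature names and weights below are made up), each active sparse feature can contribute a learned K-dimensional weight vector, and the user or item trait vector is the sum of those contributions:

```python
import numpy as np

K = 2  # number of latent traits (the slides use 2 for visualisation)

# Hypothetical feature vocabulary: one weight vector per feature value.
rng = np.random.default_rng(0)
user_weights = {f: rng.normal(size=K) for f in
                ["id:234566", "gender:male", "country:uk"]}
item_weights = {f: rng.normal(size=K) for f in
                ["id:34", "genre:horror"]}

def traits(active_features, weights):
    """Map a sparse set of active features to a dense K-dimensional trait vector."""
    return sum((weights[f] for f in active_features), np.zeros(K))

s = traits(["id:234566", "gender:male", "country:uk"], user_weights)  # user traits
t = traits(["id:34", "genre:horror"], item_weights)                   # item traits
print(s, t)
```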
8
Matchbox With Metadata [factor graph: user metadata features (ID=234, Male, British) are weighted by u and summed into user traits s1, s2, with user bias weights u_01, u_02; item metadata features (Camera, SLR) are weighted by v and summed into item traits t1, t2; user and item traits combine multiplicatively into the rating potential r]
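The diagram's rating potential can then be sketched as the inner product of the two trait vectors plus a user bias. This is a minimal illustration assuming a scalar bias, not the exact bias structure on the slide:

```python
import numpy as np

s = np.array([0.8, -0.3])   # user trait vector (built from user metadata)
t = np.array([0.5,  0.7])   # item trait vector (built from item metadata)
user_bias = 0.1             # assumed scalar per-user bias

# Rating potential: bilinear interaction of user and item traits plus bias.
r = float(s @ t) + user_bias
print("rating potential ~", r)
```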
10
Factor Graphs / Trees
Definition: graphical representation of the product structure of a function (Wiberg, 1996).
Nodes: factors and variables.
Edges: dependencies of factors on variables.
Question: what are the marginals of the function (all but one variable summed out)?
11
Factor Graphs and Bayesian Inference [diagram: Bayes' law with a factorising prior over s1, s2, t1, t2 and a factorising likelihood; the latent variables are summed out to obtain the posterior]
12
Factor Trees: Separation [diagram: tree of factors f1(v,w), f2(w,x), f3(x,y), f4(x,z)]
Observation: the sum of products becomes the product of sums of all messages from neighbouring factors to a variable.
13
Messages: From Factors To Variables [diagram: factors f2(w,x), f3(x,y), f4(x,z) sending messages to their variables]
Observation: factors only need to sum out all their local variables.
14
Messages: From Variables To Factors [diagram: variable x sending messages to factors f2(w,x), f3(x,y), f4(x,z)]
Observation: variables pass on the product of all incoming messages.
15
The Sum-Product Algorithm
Three update equations (Aji & McEliece, 1997).
The update equations can be derived directly from the distributive law.
Efficient for messages in the exponential family.
Calculates all marginals at the same time.
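A small self-contained sketch of the sum-product idea (my own example, not from the slides): on a chain of discrete factors f1(v,w)·f2(w,x), each factor sums out its local variables, the shared variable multiplies the incoming messages, and the result agrees with the brute-force marginal:

```python
import numpy as np

# Chain factor graph: v -- f1 -- w -- f2 -- x, all variables binary.
f1 = np.array([[1.0, 2.0],
               [3.0, 4.0]])        # f1[v, w]
f2 = np.array([[0.5, 1.5],
               [2.5, 0.5]])        # f2[w, x]

# Messages from factors to variables: each factor sums out its other variables.
msg_f1_to_w = f1.sum(axis=0)       # sum over v -> message to w
msg_f2_to_w = f2.sum(axis=1)       # sum over x -> message to w

# Marginal of w: product of all incoming factor-to-variable messages.
marg_w = msg_f1_to_w * msg_f2_to_w

# Brute-force check: sum the full product over v and x.
full = np.einsum('vw,wx->w', f1, f2)
assert np.allclose(marg_w, full)
print(marg_w / marg_w.sum())
```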
16
Approximate Message Passing
Problem: the exact messages from factors to variables may not be closed under products.
Solution: approximate the marginal as well as possible in the sense of minimal KL divergence.
Expectation Propagation (Minka, 2001): approximate the marginal by moment matching.
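For intuition, here is the standard moment-matching computation for one common non-Gaussian case: a Gaussian truncated to the positive half-line, the kind of marginal produced by a threshold factor such as r > 0 later in the talk. This is an illustrative sketch, not the exact Matchbox factors:

```python
import math

def phi(x):   # standard normal pdf
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):   # standard normal cdf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def moment_match_positive(mu, sigma):
    """Gaussian (mu, sigma) truncated to x > 0, approximated by moment matching.

    Returns the mean and standard deviation of the best Gaussian fit
    (minimal KL divergence within the Gaussian family)."""
    a = mu / sigma
    lam = phi(a) / Phi(a)                    # correction from the truncation
    new_mu = mu + sigma * lam
    new_var = sigma**2 * (1.0 - lam * (lam + a))
    return new_mu, math.sqrt(new_var)

print(moment_match_positive(0.0, 1.0))       # ~ (0.798, 0.603)
```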
17
Gaussian Message Passing [diagram: the Gaussian message updates for the product, equality, and approximation factors]
18
Distributed Message Passing [diagram comparing non-distributed and distributed message passing]
19
Message Passing For Matchbox [factor graph: messages flow between the metadata weights u and v, the user traits s1, s2, the item traits t1, t2, and the rating potential r]
20
Message Passing For Matchbox [factor graph as above, with the product and sum factor updates highlighted]
Message update functions powered by Infer.NET.
21
User/Item Trait Space [figure: the 'preference cone' for user 145035 in trait space]
22
Incremental Training with ADF [diagram: ratings matrix with users A–D and items 1–6, processed incrementally]
23
ADF: Message Passing Iteration 1
24
Message Passing Iteration 2
25
Message Passing Iteration 3
26
Message Passing Iteration 4
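A toy sketch of the ADF idea (my own illustration, not the slide's code): observations arrive as a stream, and after each one the posterior is collapsed back to a Gaussian, so training is incremental. Here the likelihood is Gaussian and the collapse is exact; with threshold feedback factors the same step would use the moment matching shown earlier:

```python
# Toy ADF: infer a single scalar "trait" w from noisy ratings r = w + noise.
prior_mean, prior_var = 0.0, 1.0
noise_var = 0.5

ratings = [1.2, 0.8, 1.5, 0.9]          # arrives as a stream
mean, var = prior_mean, prior_var
for r in ratings:
    # Gaussian posterior update for one observation, then move on.
    precision = 1.0 / var + 1.0 / noise_var
    mean = (mean / var + r / noise_var) / precision
    var = 1.0 / precision
    print(f"after r={r}: mean={mean:.3f}, var={var:.3f}")
```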
28
Feedback Models [factor graph: the Matchbox model with user traits s1, s2, item traits t1, t2 and weights u, v producing the rating potential r, which feeds a feedback model]
29
[the same factor graph, with the rating potential r connected to the feedback model]
30
Feedback Models [diagram: an observed rating r = 3 linked to the latent rating potential q]
31
[diagram: ordinal feedback model with thresholds t0, t1, t2, t3; the rating r is determined by which thresholds the noisy potential q exceeds]
32
[diagram: binary (click) feedback model; a positive observation corresponds to the potential q being greater than 0]
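A hedged sketch of the two feedback models just shown: an ordinal model in which rating r = k means the noisy potential q fell between two thresholds, and a binary click model in which a positive observation means q > 0. The threshold values below are illustrative, not taken from the slides:

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Illustrative cut points in the spirit of the slide's t0..t3 (values made up),
# padded with +/- infinity so every rating level gets an interval.
cuts = [-math.inf, -1.5, -0.5, 0.5, 1.5, math.inf]

def rating_probability(q_mean, q_std, k):
    """P(rating == k): the noisy potential q falls between cuts[k-1] and cuts[k]."""
    lo, hi = cuts[k - 1], cuts[k]
    return Phi((hi - q_mean) / q_std) - Phi((lo - q_mean) / q_std)

def click_probability(q_mean, q_std):
    """P(click): the noisy potential q is greater than zero."""
    return 1.0 - Phi(-q_mean / q_std)

print([round(rating_probability(0.4, 1.0, k), 3) for k in range(1, 6)])
print(round(click_probability(0.4, 1.0), 3))
```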
34
Performance and Accuracy
MovieLens data: 1 million ratings, 3,900 movies / 6,040 users, user and movie metadata.
35
MovieLens – 1,000,000 ratings (6,040 users, 3,900 movies)
User ID and Movie ID.
User job: other, lawyer, academic, programmer, artist, retired, admin, sales, student, scientist, customer service, self-employed, health care, technician, managerial, craftsman, farmer, unemployed, homemaker, writer.
User age: <18, 18-25, 25-34, 35-44, 45-49, 50-55, >55.
User gender: male, female.
Movie genre: action, horror, adventure, musical, animation, mystery, children's, romance, comedy, thriller, crime, sci-fi, documentary, war, drama, western, fantasy, film noir.
36
[figure: MovieLens results for the thresholds model (ADF), training time = 1 minute; y-axis: mean absolute error]
37
[figure: MovieLens error with thresholds; y-axis: mean absolute error]
39
Recommendation Speed
Goal: find the N items with the highest predicted rating.
Challenge: potentially have to consider all items.
Two approaches to make this faster: locality sensitive hashing, and KD-trees.
40
Random Projection Hashing
Random projections: generate m random hyperplanes (m random vectors a_i). This gives an m-bit hash, one bit per hyperplane (the sign of the projection). The probability that all bits match is related to the cosine similarity of the two vectors.
Store items in buckets indexed by their keys. Given a user trait vector: 1. Generate its key q. 2. Search buckets in order of Hamming distance from q until N items are found.
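A compact sketch of this scheme (the helper names are mine, and the search over nearby buckets by Hamming distance is simplified here to an exact-key lookup):

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(1)
K, m = 4, 8                       # trait dimension, number of hash bits
planes = rng.normal(size=(m, K))  # m random hyperplanes (the vectors a_i)

def lsh_key(vector):
    """m-bit key: the sign of the vector's projection onto each hyperplane."""
    return tuple((planes @ vector > 0).astype(int))

# Index item trait vectors into buckets keyed by their hash.
items = {i: rng.normal(size=K) for i in range(1000)}
buckets = defaultdict(list)
for item_id, t in items.items():
    buckets[lsh_key(t)].append(item_id)

# Recommend for a user: look up the bucket with the same key.
# (A fuller version would also search nearby buckets by Hamming distance
# until N candidates are found, as described above.)
user_traits = rng.normal(size=K)
candidates = buckets[lsh_key(user_traits)]
print(len(candidates), "candidate items share the user's hash key")
```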
41
Accuracy and Speedup
43
Message Passing: Compositionality [factor graph: separate user, item, context, and feedback model components composed into one graph; context features x1–x4 join the user and item traits, and the potential q > 0 drives the feedback model]
45
Applications
Ranking of content on web portals.
Online advertising (display and paid search).
Personalised web search.
Algorithm portfolio management.
Tweet/news recommendation.
Friends recommendation on social platforms.
47
Conclusions
Collaborative filtering with content information.
Users and items compared in the same 'trait space'.
Fast training by message passing.
Fast recommendations by random projections.
Flexible feedback model.
Many valuable application scenarios.