Kuifei Yu, Baoxian Zhang, Hengshu Zhu, Huanhuan Cao, and Jilei Tian


1 Towards Personalized Context-Aware Recommendation by Mining Context Logs through Topic Models
Kuifei Yu, Baoxian Zhang, Hengshu Zhu, Huanhuan Cao, and Jilei Tian. Springer-Verlag Berlin Heidelberg 2012

2 Goal
Mine common context-aware preferences (CCPs) from many users' context logs through topic models, and represent each user's personal context-aware preferences as a distribution over the mined CCPs.
First, extract a bag of Atomic Context-Aware Preference (ACP) features for each user from that user's historical context logs.
Then, mine CCPs from the users' ACP-feature bags through topic models.
Finally, make recommendations according to the given context and each user's CCP distribution.

3 User Context Logs
A real-world data set of context logs collected from 443 mobile phone users over several months. It contains more than 8.8 million context records and 665 different interactions (activities) in 12 content categories.

4 Related Work
Some approaches leverage only an individual user's historical context data to model preferences.
They do not take into account the problem of insufficient personal training data.
A personalized recommender system that recommends travel-related information
A location-based personalized recommender system built on Bayesian networks
Recommendation of points of interest (POI) for users in an automotive scenario by leveraging Multi-Criteria Decision Making (MCDM)
Other approaches are based on the rating logs of mobile users, and their objective is to predict accurate ratings for unobserved items under different contexts.
Collaborative Filtering (CF) based approaches
Leveraging the classification rules of a decision tree to understand users' personal preferences
Modeling user, location, and activity as a 3-dimensional matrix, namely a tensor
Modeling the rich contextual information together with items as an N-dimensional tensor, with a novel algorithm for tensor factorization
On users' mobile devices, context logs (which contain historical context data and activity records) are easier to collect than rating data.

5 Preliminary
Capture the historical context data and the corresponding activity records as contextual feature-value pairs: contextual features (e.g., Day name, Time range, and Location) and their corresponding values (e.g., Saturday, AM8:00-9:00, and Home).
Transform raw location-based context data, such as GPS coordinates or cell IDs, into social locations such as "Home" and "Work Place" by using existing location mining approaches.
Transform raw activity records by mapping the use of a particular application to a more general activity: for example, the two raw activity records "Play Angry Birds" and "Play Fruit Ninja" both become the activity record "Play action games".
Notation: user u with context C prefers activity a.
a = activity
C = {p} = a context, given as a set of atomic contexts
p = an atomic context (atomic contexts are treated as independent)
z = a variable denoting a CCP
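The preprocessing described on this slide can be sketched in a few lines of Python. This is only an illustration: the mapping table `ACTIVITY_MAP`, the function name `acp_features`, and the "feature=value" string encoding of an atomic context are assumptions made for the example, not details given in the slides.

```python
from typing import Dict, List, Tuple

# Hypothetical generalisation table: raw application use -> general activity.
ACTIVITY_MAP: Dict[str, str] = {
    "Play Angry Birds": "Play action games",
    "Play Fruit Ninja": "Play action games",
}


def acp_features(context: Dict[str, str], raw_activity: str) -> List[Tuple[str, str]]:
    """Pair the generalised activity a with every atomic context p in C."""
    activity = ACTIVITY_MAP.get(raw_activity, raw_activity)
    # Each contextual feature-value pair is one atomic context p.
    return [(activity, f"{feature}={value}") for feature, value in context.items()]


# One context record, using the feature names and values from the slide.
record_context = {"Day name": "Saturday", "Time range": "AM8:00-9:00", "Location": "Home"}
features = acp_features(record_context, "Play Angry Birds")
for feature in features:
    print(feature)
```

Each printed pair is one ACP-feature (a, p); a record with three atomic contexts yields three ACP-features.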

6 Example of activity records "Play action games"
ACP-feature bags (columns: ACP-Feature1, ACP-Feature2, ACP-Feature3, …, n)
User1: Play action game; Saturday, evening, Home; Play video; Use facebook; Monday, morning, WorkPlace
User2: Play action game; Sunday, evening, On the way; Plan timetable; Play song list; Monday, afternoon, WorkPlace

P(z|u) | Games Topic (z1) | Business Topic (z2) | Music Topic (z3) | Social Topic (z4) | Others…
User1 | 38/56 | 3/56 | 5/56 | 8/56 |
User2 | 36/125 | 24/125 | 12/125 | 15/125 |

P(a,p|z) | z1 | z2 | z3 | z4 | Others…
Games - Saturday, evening, Home | 0.78 | 0.12 | 0.22 | 0.09 |
Games - Sunday, evening, On the way | 0.77 | 0.11 | 0.17 | 0.03 |
Social network service - Monday, morning, WorkPlace | 0.25 | 0.23 | 0.21 | 0.53 |

7 LDA (Latent Dirichlet Allocation)
Suppose you have the following set of sentences:
Sentence 1: I like to eat broccoli and bananas.
Sentence 2: I ate a banana and spinach smoothie for breakfast.
Sentence 3: Chinchillas and kittens are cute.
Sentence 4: My sister adopted a kitten yesterday.
Sentence 5: Look at this cute hamster munching on a piece of broccoli.
LDA is a way of automatically discovering the topics that these sentences contain. For example, given these sentences and asked for 2 topics, LDA might produce something like:
Sentences 1 and 2: 100% Topic A
Sentences 3 and 4: 100% Topic B
Sentence 5: 60% Topic A, 40% Topic B
Topic A: 30% broccoli, 15% bananas, 10% breakfast, 10% munching, … (at which point, you could interpret topic A to be about food)
Topic B: 20% chinchillas, 20% kittens, 20% cute, 15% hamster, … (at which point, you could interpret topic B to be about cute animals)
The distribution of words in each topic is β.
Each document can contain multiple topics.

8 Explanation of LDA
Corpus D: each document di = {w1, w2, …, wn} is called a bag of words. The corpus is the input of LDA.
VOC = {w1, w2, …, wm} is the vocabulary: the set of distinct words that occur in D.
The output vectors of LDA:
θd = <pt1, pt2, …, ptz>: document di's probability distribution over the z topics (α is the Dirichlet prior on this distribution)
φz = <pw1, pw2, …, pwm>: each topic's probability distribution over the m words (β is the Dirichlet prior on this distribution)
Core idea of LDA: p(w|d) = Σt p(w|t) * p(t|d), summing over the topics t.
First, initialize θd and φz.
The probability of a word belonging to topic z in di is pwm * ptz.
According to this probability, update the topic assignment of the word.
Repeating this finally gives a convergent result.
pti = nti/n, where n is the number of words in di and nti is the number of words in di assigned to topic ti.
pwi = Nwi/N, where N is the total number of words assigned to this topic and Nwi is the number of times the word wi in VOC is assigned to this topic.
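The initialize-then-update loop can be made concrete with a small collapsed Gibbs sampler, one common way to fit LDA. The slide does not say which inference procedure the authors used, so treat this as an illustrative sketch: the function name `lda_gibbs`, the default values of `alpha` and `beta`, and the toy documents (loosely based on the example sentences) are all assumptions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns (theta, phi, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    num_words = len(vocab)

    n_dt = [[0] * num_topics for _ in docs]              # words of doc d assigned to topic t
    n_tw = [[0] * num_words for _ in range(num_topics)]  # times word w is assigned to topic t
    n_t = [0] * num_topics                               # all words assigned to topic t
    assignments = []

    # Random initialisation of every word's topic.
    for d, doc in enumerate(docs):
        doc_topics = []
        for w in doc:
            t = rng.randrange(num_topics)
            doc_topics.append(t)
            n_dt[d][t] += 1
            n_tw[t][word_id[w]] += 1
            n_t[t] += 1
        assignments.append(doc_topics)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                t = assignments[d][i]
                # Remove the current assignment before resampling.
                n_dt[d][t] -= 1
                n_tw[t][wid] -= 1
                n_t[t] -= 1
                # Weight of topic k is proportional to p(t|d) * p(w|t).
                weights = [
                    (n_dt[d][k] + alpha) * (n_tw[k][wid] + beta) / (n_t[k] + num_words * beta)
                    for k in range(num_topics)
                ]
                t = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = t
                n_dt[d][t] += 1
                n_tw[t][wid] += 1
                n_t[t] += 1

    # Smoothed count ratios, analogous to pti = nti/n and pwi = Nwi/N.
    theta = [
        [(n_dt[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_tw[k][w] + beta) / (n_t[k] + num_words * beta) for w in range(num_words)]
        for k in range(num_topics)
    ]
    return theta, phi, vocab


toy_docs = [
    ["broccoli", "bananas"],
    ["banana", "spinach", "smoothie", "breakfast"],
    ["chinchillas", "kittens", "cute"],
    ["kitten"],
    ["cute", "hamster", "broccoli"],
]
theta, phi, vocab = lda_gibbs(toy_docs, num_topics=2)
for d, row in enumerate(theta):
    print(d, [round(p, 2) for p in row])
```

With so little data the discovered topics are not stable across seeds; the point is only to show how the count tables drive both the sampling step and the final estimates.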

9 Mining Common Context-Aware Preferences through Topic Models
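Each context record has the form <Ti, C={p1, p2, p3, …, pl}, a>, for example:
<T1, C={Sunday, evening, Home …}, play action game>
<T2, C={Monday, morning, Work place …}, play facebook>
<T3, C={Saturday, morning, Home …}, browse web news>
Each record is split into ACP-features (a, pl), one per atomic context: (Play action game, Sunday), (Play action game, evening), (Play action game, Home).
Correspondence with LDA: an ACP-feature (a, pl) plays the role of a word; each user's ACP-feature bag is a bag of words di (User1, User2, User3, …); all users' bags form the corpus D; each topic z is a CCP (CCP1, CCP2, CCP3, …) over the ACP-features (a, p), (a, p), …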
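The mapping from context records to a topic-model corpus can be sketched as follows. The records are made up for illustration (they mirror the slide's <T, C, a> examples), and the names `records` and `build_corpus` are hypothetical.

```python
from collections import Counter, defaultdict

# Made-up context records in the slide's <T, C, a> form:
# (user, timestamp, atomic contexts, activity).
records = [
    ("User1", "T1", ["Sunday", "evening", "Home"], "Play action game"),
    ("User1", "T2", ["Monday", "morning", "Work place"], "Use facebook"),
    ("User2", "T3", ["Saturday", "morning", "Home"], "Browse web news"),
]


def build_corpus(context_records):
    """Group ACP-features (a, p) into one bag (document) per user."""
    corpus = defaultdict(Counter)
    for user, _timestamp, contexts, activity in context_records:
        for p in contexts:
            corpus[user][(activity, p)] += 1
    return dict(corpus)


corpus = build_corpus(records)
# The vocabulary is the set of distinct ACP-features across all users.
vocabulary = sorted({feature for bag in corpus.values() for feature in bag})

print(corpus["User1"])
print(len(vocabulary), "distinct ACP-features")
```

Keeping each bag as a `Counter` preserves how often a feature occurs in a user's log, which is the information a topic model consumes.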

10 PCR-LDA
The estimated values for the two distributions {p(a, p|z)} and {p(z|u)}:
p(a, p|z) = (number of times ACP-feature (a, p) has been assigned to CCP z + β) / (number of all ACP-features (a, p) assigned to CCP z + number of ACP-features from u's context log * β)
p(z|u) = (number of times an ACP-feature from user u's context log has been assigned to CCP z + α) / (number of all ACP-features (a, p) in user u's context log + number of CCPs * α)
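The two smoothed estimates are simple count ratios, as the sketch below shows. The function names are hypothetical and the counts are made up. One assumption to note: `num_features` is taken to be the number of distinct ACP-features, which is the usual choice in smoothed LDA estimates, whereas the slide words this term as "the number of ACP-features from u's context log".

```python
def estimate_p_feature_given_ccp(n_feature_in_ccp, n_all_in_ccp, num_features, beta):
    """p(a, p|z): smoothed share of CCP z's assignments that are this ACP-feature."""
    return (n_feature_in_ccp + beta) / (n_all_in_ccp + num_features * beta)


def estimate_p_ccp_given_user(n_user_in_ccp, n_user_total, num_ccps, alpha):
    """p(z|u): smoothed share of user u's ACP-features assigned to CCP z."""
    return (n_user_in_ccp + alpha) / (n_user_total + num_ccps * alpha)


# Made-up counts: a user with 56 ACP-features spread over 4 CCPs.
user_counts = [38, 3, 5, 10]
alpha = 0.5
p_z_given_u = [
    estimate_p_ccp_given_user(count, sum(user_counts), len(user_counts), alpha)
    for count in user_counts
]
print([round(p, 3) for p in p_z_given_u])
```

The additive terms α and β keep every probability strictly positive, so a CCP or ACP-feature that was never observed for a user still receives a small non-zero weight.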

11 Experiments Static Data Set
Utilizes 10 activity categories: Web, Multimedia, Management, Games, System, Navigation, Business, Reference, Social Network Service (SNS), and Utility.
Contains 618 activities that appear in a total of 408,299 activity-context records.

12 Experiments benchmark methods
Benchmark methods:
CPR (Context-aware Popularity based Recommendation): predicts a user's preferred activities as the activities that appear most frequently under context C in all users' historical context logs.
PCR-i (Personalized Context-aware Recommendation by only leveraging Individual user's context logs): ranks each activity a by probability = (number of occurrences of ACP-feature (a, p)) / (number of all ACP-features in the context log of u).
Evaluation metrics:
Mean Average Precision at top K recommendations
Mean Average Recall at top K recommendations
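The slide names the two metrics without defining them. The sketch below shows one plain reading, precision and recall over the top K recommendations averaged across test cases; the paper's exact "mean average" definitions may differ. The function names and test cases are made up for illustration.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall of the top-k recommended activities."""
    top_k = ranked[:k]
    hits = sum(1 for activity in top_k if activity in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_precision_recall_at_k(cases, k):
    """Average precision@k and recall@k over (ranked list, relevant set) cases."""
    results = [precision_recall_at_k(ranked, relevant, k) for ranked, relevant in cases]
    n = len(results)
    return sum(p for p, _ in results) / n, sum(r for _, r in results) / n


# Made-up test cases: a recommendation list and the activities actually used.
cases = [
    (["Play action games", "Watch movie", "Browse website"], {"Play action games"}),
    (["Browse website", "Use facebook", "Watch movie"], {"Watch movie"}),
]
mean_p, mean_r = mean_precision_recall_at_k(cases, k=2)
print(mean_p, mean_r)
```

Both values lie between 0 and 1, and a method that places the used activity higher in the list scores better at small K.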

13 Case Study

14 Training
Users: User1, User2, …, User151, User152, …, User446
ACP-features of User151:
(Watch Movie, Monday) (Watch Movie, PM18:00-19:00) (Watch Movie, Home)
(Play game, Monday) (Play game, PM20:00-21:00) (Play game, Home)
(Browse website, Monday) (Browse website, PM21:00-22:00) (Browse website, Home)
(Set timetable, Tuesday) (Set timetable, AM6:00-7:00) (Set timetable, Home)
(Use google map, Tuesday) (Use google map, AM7:00-8:00) (Use google map, On the way)
TopicZ1 (media):
(Watch Movie, Monday) %
(Watch Movie, Saturday) %
(Watch Movie, Sunday) %
(Take picture, Monday) %
(Watch Movie, AM0:00-1:00) %
(Watch Movie, PM18:00-19:00) %
(Watch Movie, PM22:00-23:00) %
(Take picture, PM16:00-17:00) %
(Take picture, PM17:00-18:00) %
(Watch Movie, Home) %
(Watch Movie, On the way) %
(Watch Movie, Work place) %
(Take picture, On the way) %
TopicZ2 (internet):
(Watch Movie, Monday) %
(Watch Movie, Saturday) - 1.24%
(Watch Movie, Sunday) - 2.11%
(Browse website, Monday) - 6.12%
(Watch Movie, AM0:00-1:00) - 0.03%
(Watch Movie, PM18:00-19:00) - 0.14%
(Watch Movie, PM22:00-23:00) %
(Browse website, PM16:00-17:00) - 4.14%
(Browse website, PM21:00-22:00) - 5.56%
(Watch Movie, Home) %
(Watch Movie, On the way) - 0.01%
(Watch Movie, Work place) - 0.11%
(Browse website, Home) - 3.19%

15 Testing
ACP-features of User152:
(Watch Movie, Monday) (Watch Movie, PM21:00-22:00) (Watch Movie, Home)
(Listen music, Monday) (Listen music, PM22:00-23:00) (Listen music, Home)
(Browse website, Monday) (Browse website, PM22:00-23:00) (Browse website, Home)
(Play online game, Tuesday) (Play online game, AM9:00-10:00) (Play online game, Home)
(Play action game, Tuesday) (Play action game, AM10:00-11:00) (Play action game, On the way)
p(a,p|u) = Σz p(a,p|z) * p(z|u), summing over the topics (CCPs) z.
How strongly User152 prefers a multimedia activity is therefore determined, for topic z1, by:
The probability that (a, p) appears in topic z1
The weight of topic z1 in User152's topic distribution
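The scoring rule on this slide is a weighted sum over topics, which the sketch below evaluates using the illustrative numbers from the P(z|u) and P(a,p|z) tables on slide 6. Only the four named topics are included because the "Others" column is not given, so the result is a partial sum; the function name `feature_probability` is hypothetical.

```python
def feature_probability(p_feature_given_z, p_z_given_user):
    """p(a,p|u): sum over topics z of p(a,p|z) * p(z|u)."""
    return sum(pf * pz for pf, pz in zip(p_feature_given_z, p_z_given_user))


# User1's topic weights for z1..z4, taken from slide 6.
p_z_user1 = [38 / 56, 3 / 56, 5 / 56, 8 / 56]

# p(a,p|z) for z1..z4, taken from slide 6.
p_feature_z = {
    "Games - Saturday, evening, Home": [0.78, 0.12, 0.22, 0.09],
    "Social network service - Monday, morning, WorkPlace": [0.25, 0.23, 0.21, 0.53],
}

scores = {
    feature: feature_probability(probs, p_z_user1)
    for feature, probs in p_feature_z.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
for feature in ranking:
    print(feature, round(scores[feature], 4))
```

Because User1's weight is concentrated on the games topic z1, the games feature ranks above the social network feature even though the latter has the highest single-topic probability (0.53 under z4).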

