Download presentation
Published byKaren Price Modified over 9 years ago
1
LCARS: A Location-Content-Aware Recommender System
Hongzhi Yin† , Yizhou Sun‡, Bin Cui† Zhiting Hu†, Ling Chen †Peking University ‡Northeastern University University of Technology, Sydney
2
Outline Introduction Our Solution – LCARS Experiments Conclusions
Background Challenges Our Solution – LCARS Offline Modeling - LCA-LDA Online Recommendation – TA algorithm Experiments Experimental Setup Experimental Results Conclusions
3
Outline Introduction Our Solution – LCARS Experiments Conclusions
Background Challenges Our Solution – LCARS Offline Modeling - LCA-LDA Online Recommendation – TA algorithm Experiments Experimental Setup Experimental Results Conclusions
4
Background Location-based Social Networks Foursquare Facebook Places
Loopt Users share photos, comments or check-ins at a location Expanded rapidly, e.g., Foursquare gets over 3 million check-ins every day
5
Background
6
Background Event-based Social Networks (E.g. Meetup.com)
7
Outline Introduction Our Solution – LCARS Experiments Conclusions
Background Challenges Our Solution – LCARS Offline Modeling - LCA-LDA Online Recommendation – TA algorithm Experiments Experimental Setup Experimental Results Conclusions
8
Problem Definition We aim to mine useful knowledge from the user activity history data in LBSNs and EBSNs to answer two typical questions in our daily life If we want to visit venues in a city such as Beijing, where should we go? If we want to attend local events such as dramas and exhibitions in a city, which events should we attend? For simplicity, we propose the notion of spatial items to denote both venues and events in a unified way, so that we can define our problem as follows: given a querying user 𝑢 with a querying city 𝑙 𝑢 , find k interesting spatial items within 𝑙 𝑢 , that match the preference of 𝑢.
9
Visit some spatial items So, what is the PROBLEM here?
Challenge(1/5) Spatial Item Recommendations in LBSN and EBSN Existing Solutions Based on item/user collaborative filtering Similar users gives the similar ratings to similar items Latent Factor models Similar Users User activity histories Visit some spatial items Similar Items Recommendation user + querying city Build recommendation models users So, what is the PROBLEM here? based on the model of co-rating and co-visit Why? Mao Ye, Peifeng Yin, Wang-Chien Lee: “Location recommendation for location-based social networks.” GIS2010 Justin J. Levandoski, Mohamed Sarwat, Ahmed Eldawy, and Mohamed F. Mokbel: “LARS: A Location-Aware Recommender System.” ICDE2012
10
Challenge(2/5) User-item rating/visiting matrix
Millions of locations around the world Los Angeles New York City V1 V2 V3 … Vm-2 Vm-1 Vm User U0 Ui Uj Un A user visits ~100 spatial items User activity histories are locally clustered Recommendation queries target an area (very specific subset) Noulas, S. Scellato, C Mascolo and M Pontil “An Empirical Study of Geographic User Activity Patterns in Foursquare ” (ICWSM 2011) .
11
(Where you need recommendations the most)
Challenge(3/5) User’s activities are very limited in distant locations May NOT get any recommendations in some areas Things can get worse in NEW Areas (small cities and abroad) (Where you need recommendations the most)
12
Challenge(4/5) Gap V2 V4 V5 V1 V3 V6
User activity histories are locally clustered Los Angeles New York City Gap V2 V4 V5 V1 V3 V6 U1 U2 U3 U4 U5 U6 U7 U8 New City Problem: When U3 travels to New York City that is new to him since he has no activity history there, how can we recommend spatial items to her? In other words, how to link the users in one side to the items in the other side? Both User-based and Item-based CF methods would fail in this scenario.
13
Table 1: Topics discovered by LDA in an event-based social network
Challenge(5/5) Existing Latent factor models also fail to alleviate the new city problem. When we use these existing topic models to analyze user activity history data, spatial items in the discovered topics are clustered by their locations so, the topics describe the user’s spatial area of activity rather than users' interest related features (e.g, categories and genres of spatial items ) such as concert, film and exhibition. Table 1: Topics discovered by LDA in an event-based social network
14
Outline Introduction Our Solution – LCARS Experiments Conclusions
Background Challenges Our Solution – LCARS Offline Modeling - LCA-LDA Online Recommendation – TA algorithm Experiments Experimental Setup Experimental Results Conclusions
15
Framework of LCARS
16
1. User Personal Interests/Preferences
Our Main Ideas (1/3) For spatial item recommendation, we need to consider (1) the querying user’s interest; (2) the local preference of the querying city, i.e., the local word-of-mouth opinion for a spatial item in the querying city. 1. User Personal Interests/Preferences Movie Food Shopping Recommender System 2. Local Preference
17
User Personal Interests/Preferences
Our Main Ideas (2/3) User Personal Interests/Preferences Movie Food Shopping Local Preference in a querying city Main idea #3: Combine user interest & local preference for recommendation in a unified way Main idea #1: Identify user interest using semantic information from the user activity history Main idea #2: Discover local preference in a specific querying city
18
Such as tags and category (e.g., movie, shopping, nigh life)
Our Main Ideas (3/3) Content Words of Items Such as tags and category (e.g., movie, shopping, nigh life) V1 V2 V3 V4 V5 V6 U1 U2 U3 U4 U5 U6 U7 U8 Los Angeles New York City The users in one side and the items in the other side can be linked together by the item contents.
19
Offline Modeling LCA-LDA Model
Some basic definitions User Profile: For each user 𝑢 in the dataset, we create a user profile 𝐷 𝑢 , which is a set of triples <v, 𝑙 𝑣 , 𝑐 𝑣 >. 𝑙 𝑣 denotes the location of item v in a region-level (e.g., city). 𝑐 𝑣 is a content word, such as a tag or a category word, associated with v . Topic: Each topic z in our work has two topic models 𝜙 𝑧 and 𝜙′ 𝑧. The former is a probability distribution over items (item ID) and the latter is a probability distribution over content words. User Interest: The intrinsic interest of user 𝑢 is represented by 𝜃 𝑢 , a probablity distribution over topics. Local Preference: The local preference in region 𝑙 is represent by 𝜃 𝑙 , a probability distribution over topics.
20
The Generative Process of LCA-LDA
We use LCA-LDA model to simulate the process of user decision-making for visiting behaviors.
21
Outline Introduction Our Solution – LCARS Experiments Conclusions
Background Challenges Our Solution – LCARS Offline Modeling - LCA-LDA Online Recommendation – TA algorithm Experiments Experimental Setup Experimental Results Conclusions
22
Online Recommendation
Once we have inferred model parameters in LCA-LDA model, such as user interest 𝜃 𝑢 , the local preference 𝜃 𝑙 , topics 𝜙 𝑧 and ϕ′ z , and mixing weights 𝜆 𝑢 , in the offline modeling phase, the online recommendation part computes a ranking score for each spatial item v within querying region 𝑙 𝑢 , and then returns top-k ranked spatial items as the recommendations. The ranking score of v w.r.t query (u, 𝑙 𝑢 ) The preference weigh of query (u, 𝑙 𝑢 ) on topic z The score of item v on topic z
23
Naïve online algorithm
Given a query (u, 𝑙 𝑢 ) Compute the ranking scores for all items within the querying region 𝑙 𝑢 Find the best one, then the second best one, …, the k-th best one Good for small-scale problem Still not feasible for large-scale, e.g., there are millions of items in the dataset For the online recommendation phase, we have to compute the preference scores of a querying user to all spatial items within the querying city and subsequently select the best k among them to recommend to the user. When the number of spatial items becomes larger (e.g., millions), computing the top-k spatial items for each query requires millions of vector operations. To speed up the online process of producing recommendations, we extend the threshold-based algorithm that is capable of correctly finding top-k results by examining the minimum number of spatial items.
24
Threshold-based Algorithm
For each region 𝑙 , we pre-compute K sorted lists of spatial items. In each list 𝐿 𝑍 , the items are sorted by their score on topic z, i.e., F(v, 𝑙, z). Given a query (u, 𝑙 𝑢 ), we sequentially access the items and compute their ranking scores in each sorted list. For each list 𝐿 𝑍 , let 𝑣 𝑧 be the last item examined under sorted access. Define the threshold value 𝑇 𝑎 as follows: 𝑇 𝑎 = 𝑧 𝑊 𝑢, 𝑙 𝑢 ,𝑧 𝐹( 𝑙 𝑢 , 𝑣 𝑧 ,𝑧) As soon as at least k items have been examined whose ranking score is equal or large than the threshold value, then halt. Let 𝐿 be a list containing k items that have been examined with the highest ranking scores. Return 𝐿 to the querying users.
25
Nice Properties of TA The TA algorithm is able to correctly find the top-k items by examining the minimum number of items, since our defined the ranking function is strictly monotone. The threshold value 𝑇 𝑎 is obtained by aggregating the maximum 𝐹 𝑙 𝑢 ,𝑣,𝑧 represented by the last seen item in each list 𝐿 𝑍 . Consequently, it is the maximum possible ranking score that can be achieved by remaining unexamined items. Hence, if the smallest ranking score of the k examined items is no less than the threshold score, the algorithm can terminate immediately because no remaining item will have a higher ranking score than the found k items.
26
Outline Introduction Our Solution – LCARS Experiments Conclusions
Background Challenges Our Solution – LCARS Offline Modeling - LCA-LDA Online Recommendation – TA algorithm Experiments Experimental Setup Experimental Results Conclusions
27
Experimental Data Sets
DoubanEvent. DoubanEvent is China’s largest event-based social networking site where users can publish and participate in social events. This data set consists of 100,000 users, 300,000 events and 3,500,000 check-ins. Foursquare: This dataset contains 11, 326 users, 182, 968 venues and 1, 385, 223 check-ins. User and Event Distributions over Cities in DoubanEvent
28
Evaluation Method (1/2) We design two real settings to evaluate the recommendation effectiveness of our LCA-LDA model: Querying cities are new cities to querying users; Querying cities are home cities to querying users; We then divide a user’s activity history into a test set and a training set. We adopt two different dividing strategies with respect to the two settings. For the first setting, we randomly select a visited non-home city as the new city, mark off all spatial items visited by the user in the city as the test set and use the rest of the user's activity history in other cities as the training set. For the second setting, we randomly select 20% of spatial items visited by the user in personal home city as the test set, and use the rest of personal activity history as the training set.
29
Evaluation Method (2/2) For each test case (𝑢, 𝑣, 𝑙 𝑣 ) in the test set Randomly select 1000 additional items located at lv and unrated by user 𝑢. Compute the ranking score for the test item 𝑣 as well as the additional 1000 spatial items. Form a ranked list by ordering 1001 items according to the ranking scores. Let p denote the rank of the item 𝑣 within this list. (The best result: p=0). Form a top-k recommendation list by picking the k top ranked items from the list. If p<k we have a hit. Otherwise we have a miss. For any single test case recall for a single test can assume either 0 (miss) or 1(hit) The overall recall is defined by averaging over all test cases
30
Baseline Methods USG: A unified location recommendation framework which linearly fuses User interest along with Social influence and Geographical influence. User-based CF methods CKNN: A Category-based k-Nearest Neighbors algorithm. CKNN projects a user's activity history into the category space and models user preference using a weighted category hierarchy. The similarity between two users in CKNN is computed according to their weights in the category hierarchy IKNN: A Item-based k-Nearest Neighbors algorithm. The similarity between two users is computed by the Cosine similarity between two users' item vectors. LDA: A user is viewed as a document, and the items visited by her is viewed as words in the document. Location-Aware LDA (LA-LDA):One component of LCA-LDA Content-Aware LDA(CA-LDA):Another component of LCA-LDA
31
Outline Introduction Our Solution – LCARS Experiments Conclusions
Background Challenges Our Solution – LCARS Offline Modeling - LCA-LDA Online Recommendation – TA algorithm Experiments Experimental Setup Experimental Results Conclusions
32
Experimental Results Recommendation Effectiveness
33
Experimental Results Recommendation Effectiveness
34
Experimental Results Efficiency of online recommendation, querying cities are Beijing and Shanghai For mobile recommendation, the efficiency is very important, since users require the query response in real time
35
Experimental Results In order to clearly see the performance of LCARS, we zoom the results as follows.
36
Latent Information Analysis
Table 2 and Table 3 respectively show three latent topics (e.g., T8, T22, and T9) learned by LCA-LDA and LDA on the DoubanEvent data set.For each topic, we present the top eight events with the highest probabilities, including their IDs, categories and locations. By comparing the topics in Tables 2and 3, we observe that the events in each latent topic learned by LCA-LDA not only share the same category, but are also located in different cities. In contrast, the topics learnt by LDA are not category-consistent. For example, concerts, culture salons, films and exhibitions are mixed up in the topics T9, T8 and T16 learnt by LDA. Besides, the events in each topic learnt by LDA take place in the same city. For example, the events in topic T9 are located in Shanghai and the events in topic T8 are held in Beijing. The comparative results reveal that when we use existing topic model LDA to analyze the user activity history, we are unable to discover the users' interests in the features (latent topics) of spatial items such as ``exhibition'' and ``concert'', and most of the estimated topics describe the user's spatial area of activity instead of user interests.
37
Outline Introduction Our Solution – LCARS Experiments Conclusions
Background Challenges Our Solution – LCARS Offline Modeling - LCA-LDA Online Recommendation – TA algorithm Experiments Experimental Setup Experimental Results Conclusions
38
Conclusion Spatial item Recommendations Our Solution - LCARS Result
Data sparsity is a big challenge in recommendation systems New city problem amplify the data sparsity challenge Mobile scenario requires the recommender system to generate real-time response to the user query. Our Solution - LCARS Exploit the Local Preference of the querying city to alleviate the data sparsity. Local word-of-mouth is a valuable resource for making a recommendation. Take advantage of Content Information of items to overcome the sparsity. The contents build a bridge between users and items from disjoint regions. Extend the Threshold-based algorithm (TA) to produce fast online recommendations Result LCARS can produce more effective and more efficient
39
Q&A Thanks
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.