Content Management & Hashtag Recommendation in P2P OSN
By Keerthi Nelaturu
Challenges with Current Social Networks
- Personal data stays with the service provider even when the social graph is removed
- Control of user-generated content rests with the service provider
- “Targeted advertising”
- Data is sold to third-party companies without the user’s approval (e.g., Facebook Beacon)
- Solution: a peer-to-peer (P2P) architecture instead of a centralized architecture
Peer-to-Peer Social Network
- Uses the advantages of P2P computing to avoid the issues with current OSNs
- Clients need to install a Java desktop application
- BATON (Balanced Tree Overlay Network, a balanced binary tree) overlay for the P2P architecture
- Relational database: PostgreSQL (open source)
- Amazon EC2 cloud platform for servers
- Peer types: Bootstrap Peer, Server Peer, Client Peer
Content Organization in Our P2P Social Network
Bootstrap Peer functionalities
- New user joining / leaving
- Auto fail-over
- Auto scaling
Server Peer functionalities
- Schema mapping
- Data loader
- Data indexer
- Access control
Content Organization in Our P2P Social Network
Categories of content
- Posts
- Comments
- Blogs
- Articles
- Group discussions
- Attachments
Privacy control options
- Public
- Self
- Friends Only
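The three privacy options could be enforced with a check along these lines. This is a minimal sketch only: the `Item` record, the `Privacy` enum, and the `can_view` helper are hypothetical names, and the slides do not describe how the Server Peer's access control is actually implemented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import AbstractSet


class Privacy(Enum):
    PUBLIC = "public"
    SELF = "self"
    FRIENDS_ONLY = "friends_only"


@dataclass(frozen=True)
class Item:
    """A piece of content (post, comment, blog, ...). Hypothetical structure."""
    owner: str
    category: str
    privacy: Privacy


def can_view(viewer: str, item: Item, friends_of_owner: AbstractSet[str]) -> bool:
    """Return True if `viewer` may see `item` under its privacy option."""
    if viewer == item.owner:
        return True  # owners always see their own content
    if item.privacy is Privacy.PUBLIC:
        return True
    if item.privacy is Privacy.FRIENDS_ONLY:
        return viewer in friends_of_owner
    return False  # Privacy.SELF: visible to the owner only


post = Item(owner="alice", category="post", privacy=Privacy.FRIENDS_ONLY)
print(can_view("bob", post, friends_of_owner={"bob"}))
print(can_view("carol", post, friends_of_owner={"bob"}))
```

The same check can be reused when collecting data for recommendations, where only content that passes as Public would be read from other users.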
About Hashtags
- Introduced on Twitter in August 2007
- A string of words or characters preceded by the “#” symbol
- Used for:
  1. Forming groups
  2. Highlighting events or disasters, and gathering any information posted by users under one context
  3. Organizing personal content
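Given this definition, hashtags can be pulled out of a message with a regular expression. The sketch below is one possible reading of "characters preceded by #"; the helper name `extract_hashtags` and the choice to lowercase tags are assumptions, not part of the original system.

```python
import re
from typing import List

# A '#' that does not follow a word character, then one or more word
# characters. \w is Unicode-aware in Python 3, so multilingual tags match.
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")


def extract_hashtags(text: str) -> List[str]:
    """Return hashtags in order of appearance, lowercased, without the '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


print(extract_hashtags("Flood warning in #Ottawa, stay safe #flood2013"))
```

The lookbehind keeps strings such as `issue#42` from being read as hashtags.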
Common Issues with Hashtags
- Ambiguous tags. Ex: “apple” may refer to the fruit or to a Mac product
- Heterogeneous tags. Ex: multilingual tags, acronyms, synonyms, plurals
- Expertise of the user adding the hashtag
- Solution: provide organization and relationships between hashtags
Reference: Martin Hepp, “HyperTwitter: Collaborative Knowledge Engineering via Twitter Messages”
Hashtag Syntax
- For getting additional information from the user
- Consolidating multiple synonymous hashtags
- Letting others know about relations between tags
Syntax:
Hashtag Recommendation System
Recommendation Systems
- Content-based: based on items already used by a user
- Collaborative filtering: based on the interests of similar users
- Hybrid
Related Work
Authors: Su Mon Kywe, Tuan-Anh Hoang, et al.
- Personalized hashtag recommendation
- Selects hashtags from similar users and similar tweets and ranks them
- TF-IDF scheme used for weighting content, cosine similarity for comparison
Related Work
Authors: Eva Zangerle, Wolfgang Gassler & Günther Specht
- Find the messages in a crawled data set that are most similar to the tweet entered by the user
- Retrieve the set of hashtags used within these most similar messages
- Rank the computed set of hashtag recommendation candidates
- TF-IDF scheme used for measuring similarity between tweets
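The similar-message approach can be sketched in a few lines. This is an illustration under stated assumptions, not the cited authors' code: the smoothed IDF formula, the whitespace tokenizer, and the ranking rule (sum of similarity scores over the `k` nearest messages) are choices made here for the example, and `recommend` is a hypothetical helper name.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """docs: list of token lists. Returns (sparse vectors, idf table)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(query, corpus, k=2, top_n=3):
    """corpus: list of (message text, hashtags). Rank hashtags of similar messages."""
    docs = [text.lower().split() for text, _ in corpus]
    vecs, idf = tfidf_vectors(docs)
    # Terms unseen in the corpus get weight 0: they cannot match any message.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query.lower().split()).items()}
    nearest = sorted(((cosine(q, v), i) for i, v in enumerate(vecs)), reverse=True)[:k]
    scores = defaultdict(float)
    for sim, i in nearest:
        for tag in corpus[i][1]:
            scores[tag] += sim  # assumed ranking rule: summed similarity
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, score in ranked if score > 0][:top_n]


corpus = [
    ("heavy rain and flood in the city", ["flood", "weather"]),
    ("new phone launch event today", ["tech"]),
    ("flood warning issued for the river", ["flood", "alert"]),
]
print(recommend("flood near the river", corpus))
```

Hashtags that appear in several similar messages accumulate score, so `flood` ranks ahead of tags seen in only one of the nearest messages.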
Related Work
Authors: Fréderic Godin, Viktor Slavkovikj, Wesley De Neve
- Topic model for recommendation: LDA (Latent Dirichlet Allocation)
- Recommends general hashtag topics by identifying keywords
- Does not consider user preferences
Drawbacks of Previous Models
- TF-IDF works well for short messages
- Suggested tags are sparse: 77% of hashtags are used only once, and 94% are used no more than five times
- Mostly used when recommending a maximum of five hashtags
- With the topic model, keywords for hashtags are recommended
Our Recommendation System
- Based on a probabilistic topic model: LDA
- Uses both content-based and collaborative filtering methods
- Recommends hashtags and also content
About LDA (Latent Dirichlet Allocation)
- Hidden topic model
- Discovers topics from a large set of documents
- Gives the topic-word distribution
- Gibbs sampling algorithm
[Figure: topic-word matrix with topics t1 ... tn, words w1 ... wm, and a weight in each cell]
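To make the Gibbs sampling step concrete, here is a compact collapsed Gibbs sampler for LDA in plain Python. It is a teaching sketch, not JGibbLDA: the function name `lda_gibbs`, the hyperparameter defaults, and the toy documents are all assumptions chosen for illustration.

```python
import random


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents.

    Returns (vocab, phi, theta): phi[k][v] is the topic-word distribution,
    theta[d][k] is the doc-topic distribution.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), n_topics

    n_dk = [[0] * K for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total tokens per topic
    z = []                              # topic assignment of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other tokens).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
           for k in range(K)]
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta


docs = [
    "flood rain river flood".split(),
    "phone app launch phone".split(),
    "rain river storm".split(),
    "app phone update".split(),
]
vocab, phi, theta = lda_gibbs(docs, n_topics=2, n_iters=50)
```

Each row of `phi` corresponds to one row of the topic-word matrix described on the slide, and each row of `theta` gives the topic weights of one document.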
About LDA (Latent Dirichlet Allocation)
- Training phase
- Inference phase
[Figure: doc-topic matrix with documents doc1 ... docn, topics t1 ... tm, and a weight in each cell]
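The inference phase takes a trained model and estimates the topic weights of a new document. The sketch below simplifies this by holding the trained topic-word distribution `phi` fixed while sampling; that simplification, the helper name `infer_doc_topics`, and the hand-written `phi` values are assumptions for illustration and may differ from what JGibbLDA does internally.

```python
import random


def infer_doc_topics(doc, phi, word_id, alpha=0.1, n_iters=100, seed=0):
    """Estimate the doc-topic distribution of one new document.

    phi[k][v] must be strictly positive (as produced by a smoothed model).
    Words missing from the training vocabulary are skipped.
    """
    rng = random.Random(seed)
    n_topics = len(phi)
    tokens = [word_id[w] for w in doc if w in word_id]
    z = [rng.randrange(n_topics) for _ in tokens]
    n_k = [0] * n_topics
    for k in z:
        n_k[k] += 1

    for _ in range(n_iters):
        for i, v in enumerate(tokens):
            n_k[z[i]] -= 1  # take the token out of its current topic
            weights = [(n_k[t] + alpha) * phi[t][v] for t in range(n_topics)]
            z[i] = rng.choices(range(n_topics), weights=weights)[0]
            n_k[z[i]] += 1

    total = len(tokens) + n_topics * alpha
    return [(n_k[k] + alpha) / total for k in range(n_topics)]


# Hypothetical trained model with two topics over a four-word vocabulary.
word_id = {"flood": 0, "rain": 1, "phone": 2, "app": 3}
phi = [
    [0.49, 0.49, 0.01, 0.01],  # topic 0: weather words
    [0.01, 0.01, 0.49, 0.49],  # topic 1: technology words
]
theta_new = infer_doc_topics(["flood", "rain", "flood"], phi, word_id)
print(theta_new)
```

A document made up only of unseen words falls back to a uniform topic distribution, since only the `alpha` prior contributes.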
Offline Sampling
[Figure: collection of training dataset -> JGibbLDA (training); content from network -> JGibbLDA (inference) -> final doc-topic distribution]
Content Types
- User-generated content
- Content from friends
- Content from similar users
- Location based
- Overall popularity
[Figure: diagram relating these content types to LDA and keywords]
Content Types
User-generated content
- Pass all of the user’s data to LDA
- Identify Topics of Interest (TOI)
- Recommend hashtags from docs with TOI greater than a threshold
Content from friends
- Pass publicly available data from friends to LDA
- Identify documents with similar TOI using cosine similarity
- Recommend hashtags from docs with similar TOI
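Both content types can be illustrated on top of doc-topic distributions. The sketch below rests on several assumptions that the slides leave open: the user's profile is taken as the average of their doc-topic rows, the TOI is the dominant topic of that profile, and the threshold values and helper names (`own_recommendations`, `friend_recommendations`) are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two dense topic distributions."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def user_profile(docs):
    """Average the doc-topic rows of docs given as (topic_dist, hashtags)."""
    n_topics = len(docs[0][0])
    return [sum(theta[k] for theta, _ in docs) / len(docs) for k in range(n_topics)]


def own_recommendations(docs, toi_threshold=0.5):
    """Hashtags from the user's docs whose weight on the TOI exceeds the threshold."""
    profile = user_profile(docs)
    toi = max(range(len(profile)), key=lambda k: profile[k])  # assumed: dominant topic
    counts = Counter(tag for theta, tags in docs
                     if theta[toi] > toi_threshold for tag in tags)
    return [tag for tag, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def friend_recommendations(profile, friend_docs, min_sim=0.9):
    """Hashtags from friends' public docs whose topics resemble the user's profile."""
    scores = Counter()
    for theta, tags in friend_docs:
        sim = cosine(profile, theta)
        if sim >= min_sim:
            for tag in tags:
                scores[tag] += sim
    return [tag for tag, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


my_docs = [
    ([0.9, 0.1], ["flood", "weather"]),
    ([0.8, 0.2], ["flood"]),
    ([0.4, 0.6], ["tech"]),
]
friend_docs = [
    ([0.85, 0.15], ["rain"]),
    ([0.10, 0.90], ["gadgets"]),
    ([0.60, 0.40], ["storm"]),
]
print(own_recommendations(my_docs))
print(friend_recommendations(user_profile(my_docs), friend_docs))
```

The same `friend_recommendations` routine could serve the Similar Users option by passing public docs from similar users instead of friends.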
Data Retrieval Considerations in P2P OSN
- Overall popular hashtags are stored in the bootstrap peer
- LDA models sampled for global data are also kept in the bootstrap peer
- Only “public” content is collected for the Friends and Similar Users options
- A new user receives recommendations based on the Overall Popularity and Location-based options
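The new-user case can be sketched as a cold-start fallback that reads only public content. The post structure, the `"public"` privacy label, and the rule of listing location-based hashtags before overall popular ones are assumptions made for this example; the slides do not specify how the two sources are combined.

```python
from collections import Counter


def popular_hashtags(posts, location=None):
    """Rank hashtags of public posts by frequency, optionally for one location."""
    counts = Counter()
    for post in posts:
        if post["privacy"] != "public":
            continue  # only public content is collected
        if location is not None and post["location"] != location:
            continue
        counts.update(post["hashtags"])
    return [tag for tag, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def recommend_for_new_user(posts, user_location, top_n=3):
    """Location-based hashtags first, then overall popular ones (assumed order)."""
    ranked = popular_hashtags(posts, location=user_location)
    for tag in popular_hashtags(posts):
        if tag not in ranked:
            ranked.append(tag)
    return ranked[:top_n]


posts = [
    {"privacy": "public", "location": "Ottawa", "hashtags": ["winterlude", "ottawa"]},
    {"privacy": "public", "location": "Ottawa", "hashtags": ["winterlude"]},
    {"privacy": "public", "location": "Toronto", "hashtags": ["tiff", "news"]},
    {"privacy": "public", "location": "Toronto", "hashtags": ["news"]},
    {"privacy": "public", "location": "Montreal", "hashtags": ["news"]},
    {"privacy": "friends_only", "location": "Ottawa", "hashtags": ["private"]},
]
print(recommend_for_new_user(posts, "Ottawa"))
```

Once the user has content of their own or has added friends, the LDA-based options become available and this fallback is no longer the only source.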
Conclusion & Future Work
- Simulation model for evaluation: yet to be done
- Let users choose their own hashtag recommendation approach
- Include a hashtag-topic distribution along with the topic-word distribution
- Embed e-commerce capabilities into the application
Thank you!