PREDICTION ON TWEET FROM DYNAMIC INTERACTION
Group 19: Chan Pui Yee, Wong Tsz Wing, Yeung Chun Kit

Part 1: Introduction
Part 2: Learning to Rank Friends
Part 3: Evaluation
Part 4: Recommendation

INTRODUCTION
Two papers are covered in the presentation:
1. Who Will Reply to / Retweet this Tweet?: The Dynamics of Intimacy from Online
2. Modeling and Predicting Retweeting Dynamics on Microblogging Platforms

COMMON OBJECTIVE
- Predict the popularity of a message posted in an online social network
- The second paper:
  1. Examines popularity through who will be the replier/retweeter
  2. Investigates the dynamics of dyadic friend relationships

BACKGROUND
- Social tie
  - Social interaction in both online and offline social networks
  - Proximity and homophily play key roles in social ties: people befriend those who are close by and have similar features
- Tie strength
  - Frequency of interaction
- Trend
  - People interact at lower communication cost and through a greater variety of modes
  - Increasing attention on social ties and tie strength

MOTIVATIONS
- Existing studies:
  - Tie strength is considered as static
  - Unrealistic!!
- The dynamics of tie strength
  1. Reciprocity: strength may not be reciprocal
  2. Temporality: strength can evolve over time
  3. Contextuality: perceive different feelings of closeness given different contexts
- Boost the effect of online advertisement, e.g. product recommendation

DYNAMICS OF TIE STRENGTH: RECIPROCITY
- Quantization: percentile rank of v in terms of outgoing interactions from u to v
- Reciprocal interaction rank (RIR)
- For a large portion of social ties, the reciprocal interaction rank deviates from zero
- Inequality of tie strength
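The slide's RIR formula is not in the transcript. The sketch below is one plausible reading, not the paper's confirmed definition: it assumes RIR(u, v) is the percentile rank of v among u's outgoing interactions minus the percentile rank of u among v's outgoing interactions, so a perfectly reciprocal tie scores zero. The `interactions` structure and both function names are hypothetical.

```python
from typing import Dict

# Hypothetical input: interactions[u][v] = number of replies/retweets sent from u to v.
Interactions = Dict[str, Dict[str, int]]


def percentile_rank(interactions: Interactions, u: str, v: str) -> float:
    """Fraction of u's friends that receive no more interactions from u than v does."""
    counts = interactions.get(u, {})
    if v not in counts:
        return 0.0
    target = counts[v]
    at_or_below = sum(1 for c in counts.values() if c <= target)
    return at_or_below / len(counts)


def reciprocal_interaction_rank(interactions: Interactions, u: str, v: str) -> float:
    """Assumed definition: difference of the two directed percentile ranks."""
    return percentile_rank(interactions, u, v) - percentile_rank(interactions, v, u)


interactions = {
    "u": {"v": 10, "w": 1},
    "v": {"u": 1, "x": 5, "y": 7, "z": 9},
}
# v is u's top contact, but u is v's least-contacted friend, so the tie is unequal.
print(reciprocal_interaction_rank(interactions, "u", "v"))
```

Under this assumption, a value far from zero in either direction signals that the two users rank each other very differently.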

TEMPORALITY
- Tie strength within a dyadic relation changes over time
- Investigation on "Best Friend"
  - Sample 1 million users and their best friends for each month
  - Best friend changed over time, 3 distinct best friends

CONTEXTUALITY
- Different degrees of intimacy given different contexts
- Emotion analysis
  - H: ratio of replies containing happy emotions sent from u to each of u's friends
  - PRxD: percentile rank for the user
  - People are more likely to enclose an emotion if the friend is ranked higher

Part 2: Learning to Rank Friends

OVERVIEW
- Tweet + User (U) + Time (T)
- Aim: rank U's friends according to their intention to reply to or retweet the tweet

HOW?
- Crawling + Training
- Predicting
- Feature Extraction
  - Temporality
  - Contextuality
  - Reciprocity

PREPROCESS – AFFINITY CALCULATION
- Profile Affinity
  - Binary: gender, city
  - Real-valued: age, education, common friends
- Formulas (shown on the slide as images): Affinity(Age), Affinity(Education), Affinity(Common Friends)
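The affinity formulas themselves did not survive in the transcript, so the following is only an illustrative sketch of how binary and real-valued profile affinities could be computed. The functional forms (exact match, inverse gap, Jaccard overlap) and all function names are assumptions chosen for the example, not the paper's definitions.

```python
from typing import Hashable, Set


def binary_affinity(a: Hashable, b: Hashable) -> float:
    """Binary attributes such as gender or city: 1 if equal, else 0."""
    return 1.0 if a == b else 0.0


def gap_affinity(a: float, b: float) -> float:
    """Assumed form for numeric attributes such as age: shrinks as the gap grows."""
    return 1.0 / (1.0 + abs(a - b))


def common_friend_affinity(friends_u: Set[str], friends_v: Set[str]) -> float:
    """Assumed form for common friends: Jaccard overlap of the two friend sets."""
    union = friends_u | friends_v
    if not union:
        return 0.0
    return len(friends_u & friends_v) / len(union)


print(binary_affinity("Hong Kong", "Hong Kong"))
print(gap_affinity(20, 23))
print(common_friend_affinity({"a", "b", "c"}, {"b", "c", "d"}))
```

Each function returns a value in [0, 1], which keeps the profile features on a comparable scale before they are fed to a ranking model.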

PREPROCESS – AFFINITY CALCULATION
- Topic Affinity
  - Topical keyphrase extraction approach
    i. Twitter-LDA model to discover topics
    ii. PageRank-based method to identify important words

PREPROCESS - CONTEXT ANALYSIS
- Sentiment analysis using SVM
  - Unigram and bigram
  - Contextual features such as hashtag
  - Using emoticons as noisy labels
- Self-Disclosure
  - Occurrence of the first person words
- Geo-Tweet
- Client Types (Mobile/Desktop)
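Two of the context features above are simple enough to sketch directly: using emoticons as noisy sentiment labels, and counting first-person words as a self-disclosure signal. The emoticon sets, the word list, and the function names are assumptions made for illustration; the paper's actual lexicons are not given in the slides.

```python
import re
from typing import Optional

# Assumed, deliberately small lexicons for the example.
HAPPY = {":)", ":-)", ":D", "=)"}
SAD = {":(", ":-(", ":'("}
FIRST_PERSON = {"i", "me", "my", "mine", "myself", "we", "us", "our", "ours"}


def noisy_sentiment_label(text: str) -> Optional[int]:
    """Return 1 (positive), -1 (negative), or None when emoticons are absent or mixed."""
    tokens = text.split()
    has_happy = any(t in HAPPY for t in tokens)
    has_sad = any(t in SAD for t in tokens)
    if has_happy and not has_sad:
        return 1
    if has_sad and not has_happy:
        return -1
    return None


def self_disclosure_count(text: str) -> int:
    """Count occurrences of first-person words in the tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(1 for w in words if w in FIRST_PERSON)


print(noisy_sentiment_label("great game today :)"))
print(self_disclosure_count("I lost my keys, my bad"))
```

Tweets labelled this way could serve as training data for the SVM sentiment classifier, with the emoticons themselves removed from the features so the classifier does not simply memorise them.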

PREPROCESS - RESPONSIVENESS ANALYSIS
- Responsiveness: measured from historical interaction data
  - Availability of a user to reply to/retweet a tweet
  - Capacity of a user to read tweets and interact with others
  - Tendency of a user to interact with a particular friend
- Cost, i.e. availability, capacity, and tendency

TRAINING - LEARNING THE RANKING MODEL
Restate the problem formally:
- Feature: Given a tweet q posted by user u as a query, and u's friends E°u as candidate documents, a feature vector x(q, v) is generated for each query-document pair (q, v), where v ∈ E°u.
- Label: y = (1, 2, ..., m) is the label set for a query-document pair, where the label indicates the relevance, i.e. the intention to reply to/retweet a tweet.
- Aim: Train a ranking model F(X, Y) = f(x), which assigns a score to a feature vector x and can rank the documents for a given query.
Learning to Rank framework:
- Binary relevance judgement (|Y| = 2)
- Ordinal relevance scale (|Y| > 2)
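To make the query-document framing concrete, here is a minimal sketch in which a tweet is the query, each friend is a candidate document, and a scoring function f(x) orders the friends. It assumes a plain linear scorer with hand-picked weights, which stands in for whatever model is actually trained; the feature values and names are hypothetical. It also shows how binary labels can be turned into preference pairs, the training signal used by pairwise methods.

```python
from typing import Dict, List, Tuple


def score(weights: List[float], x: List[float]) -> float:
    """Linear scoring function f(x) = w . x (an assumed stand-in for the trained model)."""
    return sum(w * xi for w, xi in zip(weights, x))


def rank_friends(weights: List[float], candidates: Dict[str, List[float]]) -> List[str]:
    """Sort the candidate friends of one tweet by descending score."""
    return sorted(candidates, key=lambda v: score(weights, candidates[v]), reverse=True)


def pairwise_preferences(labels: Dict[str, int]) -> List[Tuple[str, str]]:
    """All (preferred, other) pairs where the first friend has the higher relevance label."""
    return [(a, b) for a in labels for b in labels if labels[a] > labels[b]]


# Hypothetical feature vectors x(q, v) for one tweet q: [profile affinity, responsiveness].
candidates = {"alice": [1.0, 0.1], "bob": [0.2, 0.9], "carol": [0.0, 0.0]}
weights = [0.5, 2.0]
print(rank_friends(weights, candidates))

# Binary relevance (|Y| = 2): bob replied, the others did not.
print(pairwise_preferences({"alice": 0, "bob": 1, "carol": 0}))
```

With an ordinal label scale the same `pairwise_preferences` function applies unchanged, since it only compares labels.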

PREDICTION
Given a tweet:
1. Perform query understanding (same approach as in the preprocessing phase)
   - Extract topical keywords
   - Segmentation and NE recognition
   - Sentiment analysis
   - Self-disclosure analysis
2. Feed the extracted features to the ranking model
3. Retrieve a sorted ranking list

Part 3: Evaluation

DATASET
- [...] billion tweets posted by 1.1 million users
- Sep [...] to Dec [...]
- [...] million dyadic friendship relations

DATASET
- Profile Affinity Index:
- Topic Affinity Calculation:
  - Number of topics: 50
  - Number of keywords: top 20
  - Coherent and meaningful

PRELIMINARY SETTINGS
- Remove the tweets that had no replies/retweets
  - Tweet-reply dataset (215M tweets and 447M replies)
  - Tweet-retweet dataset (29M tweets and 38M retweets)
- Conduct a 10-fold cross validation
- Use a batch training mode in the experiments
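A 10-fold cross validation can be sketched as follows. The slides do not say how the folds were formed; this example assumes the split is done at the tweet level, so that all candidate friends of one tweet stay in the same fold. The function name and the round-robin assignment are illustrative choices.

```python
from typing import Iterator, List, Tuple


def k_fold_splits(n_tweets: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) over tweet indices, one pair per fold."""
    # Round-robin assignment: tweet i goes to fold i % k.
    folds = [list(range(i, n_tweets, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_splits(25))
for train, test in splits[:2]:
    print(len(train), test)
```

Every tweet appears in exactly one test fold, and the reported metric would be the average over the ten folds.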

BASELINE SETTINGS
- Social, Topical, and Activity features based model (STA)
  - A logistic regression model is trained
- Homophily-based Graphical model (HG)
  - Latent variable: tie strength between dyadic users

CRITERIA
- Topmost Accuracy
  - Ratio of successfully predicted repliers/retweeters to the number of tweets
  - m: number of tweets
  - δ{·}: indicator function
  - l: label of user, 1 (replier/retweeter) or 0 (non-replier/non-retweeter)
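The formula image is missing from the transcript, but the symbol definitions suggest the metric is the fraction of the m tweets whose top-ranked friend has label 1. The sketch below implements that reading; the input layout (one ranked list of labels per tweet) and the function name are assumptions.

```python
from typing import List


def topmost_accuracy(ranked_labels: List[List[int]]) -> float:
    """Fraction of tweets whose top-ranked friend is an actual replier/retweeter.

    ranked_labels[i] holds the labels (1 or 0) of tweet i's friends,
    ordered by the model's predicted ranking.
    """
    m = len(ranked_labels)
    if m == 0:
        return 0.0
    hits = sum(1 for labels in ranked_labels if labels and labels[0] == 1)
    return hits / m


# Four tweets; the top prediction is correct for the first and third.
print(topmost_accuracy([[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]))
```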

CRITERIA
- Mean Average Precision
  - Mean of the average precision over all m tweets
  - R_i: number of actual repliers/retweeters of the tweet
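As a sketch of this metric under the usual definition: the average precision of one tweet averages the precision at each rank where an actual replier/retweeter appears, and MAP averages that over all tweets. The example assumes every actual replier appears somewhere in the ranked list, so the number of hits equals R_i; the input layout and function names are hypothetical.

```python
from typing import List


def average_precision(labels: List[int]) -> float:
    """Average precision for one tweet, given labels in predicted rank order."""
    hits = 0
    total = 0.0
    for k, label in enumerate(labels, start=1):
        if label == 1:
            hits += 1
            total += hits / k  # precision at rank k
    return total / hits if hits else 0.0


def mean_average_precision(ranked_labels: List[List[int]]) -> float:
    """Mean of the per-tweet average precision."""
    if not ranked_labels:
        return 0.0
    return sum(average_precision(labels) for labels in ranked_labels) / len(ranked_labels)


print(average_precision([1, 0, 1]))
print(mean_average_precision([[1, 0, 1], [0, 1]]))
```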

CRITERIA
- Normalized Discounted Cumulative Gain
  - Evaluates ranking quality
  - G(·): gain function (G(x) = 2^x − 1)
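A minimal NDCG sketch using the gain function from the slide, G(x) = 2^x − 1. The discount is not shown in the transcript; the code assumes the common logarithmic discount 1 / log2(k + 1) at rank k, and the function names are illustrative.

```python
import math
from typing import List


def dcg(labels: List[int], k: int) -> float:
    """Discounted cumulative gain over the top k positions."""
    return sum(
        (2 ** label - 1) / math.log2(rank + 1)  # gain G(x) = 2^x - 1, assumed log discount
        for rank, label in enumerate(labels[:k], start=1)
    )


def ndcg(labels: List[int], k: int) -> float:
    """DCG normalised by the DCG of the ideal (label-sorted) ranking."""
    ideal = dcg(sorted(labels, reverse=True), k)
    return dcg(labels, k) / ideal if ideal > 0 else 0.0


print(ndcg([1, 0, 0], 3))
print(ndcg([0, 1], 2))
```

A ranking that already places the highest labels first scores 1.0; pushing a relevant friend down the list lowers the score.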

RANKING FUNCTIONS
- Logistic Regression: pointwise approach
- RankSVM: pairwise approach
- LambdaMART: listwise approach

RESULTS
- Topmost Accuracy
- Mean Average Precision
- Normalized Discounted Cumulative Gain

Part 4: Recommendation

RECOMMENDATION
- Partitioning a day into hourly time bins:
  - Increases the user activity variation
  - Worsens the prediction performance
- To eliminate the variation: Time Mapping from the second paper

TIME MAPPING
- Weibo time: measure time not by wall time (seconds) but by the number of messages that users post on Weibo
- Right graph: greatly fluctuated -> relatively flat
- Mitigates the impact of a message's post time on its early-stage popularity
- Eliminates the effect of user activity variation
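The idea of Weibo time can be sketched as follows: instead of measuring elapsed seconds, count how many messages were posted on the platform in the interval. The code assumes a sorted list of platform-wide post timestamps is available; that input and the function names are hypothetical, and the paper's exact unit of Weibo time may differ.

```python
from bisect import bisect_right
from typing import List


def weibo_time(sorted_post_times: List[float], t: float) -> int:
    """Number of messages posted on the platform up to and including wall time t."""
    return bisect_right(sorted_post_times, t)


def elapsed_weibo_time(sorted_post_times: List[float], start: float, end: float) -> int:
    """Messages posted in the interval (start, end], i.e. elapsed time in message units."""
    return weibo_time(sorted_post_times, end) - weibo_time(sorted_post_times, start)


# Wall-clock timestamps in seconds: a quiet first hour, then a busy second hour.
posts = [10, 20, 30, 3600, 3601, 3602, 3603, 3604, 3605]
print(elapsed_weibo_time(posts, 0, 3599))     # quiet hour
print(elapsed_weibo_time(posts, 3599, 7198))  # busy hour
```

Two intervals of equal wall-clock length map to different amounts of Weibo time, which is how the mapping absorbs the daily swing in user activity.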