1
PREDICTION ON TWEET FROM DYNAMIC INTERACTION Group 19 Chan Pui Yee Wong Tsz Wing Yeung Chun Kit
2
Part 1: Introduction | Part 2: Learning to Rank Friends | Part 3: Evaluation | Part 4: Recommendation
3
Introduction
Two papers are covered in this presentation:
1. Who Will Reply to/Retweet This Tweet? The Dynamics of Intimacy from Online Social Interactions
2. Modeling and Predicting Retweeting Dynamics on Microblogging Platforms
4
COMMON OBJECTIVE
Predicting the popularity of a message posted in an online social network.
The first paper:
1. Examines popularity in terms of who will be the replier/retweeter
2. Investigates the dynamics of dyadic friend relationships
5
BACKGROUND
- Social tie: social interaction in both online and offline social networks. Proximity and homophily play key roles: we befriend people who are close by and who have similar features.
- Tie strength: frequency of interaction.
- Trend: people interact with lower communication cost and a greater variety of modes, hence the increasing attention on social ties and tie strength.
6
MOTIVATIONS
- Existing studies consider tie strength as static, which is unrealistic.
- The dynamics of tie strength:
  1. Reciprocity: strength may not be reciprocal
  2. Temporality: strength can evolve over time
  3. Contextuality: people perceive different feelings of closeness in different contexts
- Modeling these dynamics can boost the effect of online advertisement, e.g. product recommendation.
7
DYNAMICS OF TIE STRENGTH: RECIPROCITY
- Quantization: the percentile rank of v in terms of outgoing interactions from u to v.
- Reciprocal interaction rank (RIR).
- For a large portion of social ties the reciprocal interaction rank deviates from zero: tie strength is unequal between the two sides.
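A minimal sketch of how these reciprocity features could be computed from directed interaction counts. The exact RIR definition is not given in this transcript, so taking it as the difference of the two directed percentile ranks is an assumption, and all names below are hypothetical.

```python
# Hypothetical sketch: percentile-rank quantization of directed interaction
# counts and a reciprocal interaction rank (RIR). Assumption: RIR is the
# difference of the two directed percentile ranks; the paper may define it
# differently.
def percentile_ranks(out_counts):
    """out_counts: {friend: number of interactions u -> friend}.
    Returns {friend: percentile rank in (0, 1]}, higher = more interactions."""
    ranked = sorted(out_counts, key=out_counts.get)
    n = len(ranked)
    return {v: (i + 1) / n for i, v in enumerate(ranked)}

def reciprocal_interaction_rank(u, v, interactions):
    """interactions: {(src, dst): count}. Assumed RIR = PR_u(v) - PR_v(u)."""
    out_u = {dst: c for (src, dst), c in interactions.items() if src == u}
    out_v = {dst: c for (src, dst), c in interactions.items() if src == v}
    pr_u = percentile_ranks(out_u).get(v, 0.0)
    pr_v = percentile_ranks(out_v).get(u, 0.0)
    return pr_u - pr_v

interactions = {("u", "v"): 12, ("u", "w"): 3, ("v", "u"): 1, ("v", "x"): 9}
print(reciprocal_interaction_rank("u", "v", interactions))  # nonzero => unequal tie
```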
8
TEMPORALITY
- Tie strength within a dyadic relation changes over time.
- Investigation of the "best friend": sample 1 million users and their best friend for each month.
- Best friends changed over time, with around 3 distinct best friends per user.
9
CONTEXTUALITY
- Different degrees of intimacy in different contexts.
- Emotion analysis:
  - H: the ratio of replies containing happy emotions sent from u to each of u's friends
  - PRxD: the percentile rank for the user
- People are more likely to include an emotion if the friend is ranked higher.
10
Part 1: Introduction | Part 2: Learning to Rank Friends | Part 3: Evaluation | Part 4: Recommendation
11
OVERVIEW
Input: a user (U), a time (T), and a tweet.
Aim: rank U's friends according to their intention to reply to or retweet the tweet.
12
HOW?
Pipeline: crawling, feature extraction (temporality, contextuality, reciprocity), training, prediction.
13
PREPROCESS – AFFINITY CALCULATION
Profile affinity:
- Binary attributes: gender, city
- Real-valued attributes: age, education, common friends
[The slide shows formulas for Affinity(Age), Affinity(Education), and Affinity(Common Friends).]
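A small sketch of profile-affinity features under assumed definitions (exact match for binary attributes, a normalized gap for age, Jaccard overlap for common friends). The formulas on the original slide are not in this transcript, so these are stand-ins.

```python
# Hedged sketch of profile-affinity features; all definitions below are
# assumptions standing in for the slide's formulas.
def binary_affinity(a, b):
    """Gender / city: 1 if identical, else 0."""
    return 1.0 if a == b else 0.0

def age_affinity(age_u, age_v, max_gap=50):
    """Assumed: affinity decays linearly with the age gap."""
    return max(0.0, 1.0 - abs(age_u - age_v) / max_gap)

def common_friend_affinity(friends_u, friends_v):
    """Assumed: Jaccard similarity of the two friend sets."""
    friends_u, friends_v = set(friends_u), set(friends_v)
    union = friends_u | friends_v
    return len(friends_u & friends_v) / len(union) if union else 0.0

features = [
    binary_affinity("F", "F"),                  # gender
    binary_affinity("Hong Kong", "Hong Kong"),  # city
    age_affinity(21, 24),
    common_friend_affinity({"a", "b", "c"}, {"b", "c", "d"}),
]
print(features)
```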
14
PREPROCESS – AFFINITY CALCULATION
Topic affinity, via a topical keyphrase extraction approach:
i. Twitter-LDA model to discover topics
ii. PageRank-based method to identify important words
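The topic-discovery step could be approximated as below. The paper uses Twitter-LDA, which assigns a single topic per tweet; this sketch falls back to standard LDA via gensim, and the toy tweets and topic count are placeholders (the dataset slide later uses 50 topics and the top 20 keywords). The PageRank-based keyphrase step is not shown.

```python
# Hedged sketch: approximate the Twitter-LDA topic-discovery step with
# standard LDA from gensim on toy, pre-tokenized tweets.
from gensim import corpora, models

tweets = [
    ["machine", "learning", "model", "training"],
    ["football", "match", "goal", "team"],
    ["deep", "learning", "neural", "network"],
    ["team", "wins", "league", "football"],
]
dictionary = corpora.Dictionary(tweets)
corpus = [dictionary.doc2bow(t) for t in tweets]

lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, passes=10)
for topic_id in range(2):
    # Top keywords per topic would feed the downstream keyphrase extraction.
    print(lda.show_topic(topic_id, topn=5))
```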
15
PREPROCESS - CONTEXT ANALYSIS
- Sentiment analysis using an SVM: unigram and bigram features, contextual features such as hashtags, emoticons used as noisy labels
- Self-disclosure: occurrence of first-person words
- Geo-tweet
- Client type (mobile/desktop)
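A hedged sketch of the sentiment classifier: an SVM over unigram and bigram counts trained on emoticon-derived noisy labels. The tweets, emoticon list, and labeling rule are illustrative only, and the paper's feature set (e.g. hashtag features) is richer.

```python
# Hedged sketch: linear SVM with unigram/bigram features and emoticons as
# noisy labels; data and labeling rule are toy examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

def noisy_label(tweet):
    """Emoticon-based noisy labeling: 1 = positive, 0 = negative, None = unlabeled."""
    if ":)" in tweet or ":D" in tweet:
        return 1
    if ":(" in tweet:
        return 0
    return None

tweets = ["great day :)", "so tired :(", "love this :D", "worst ever :("]
labeled = [(t.replace(":)", "").replace(":(", "").replace(":D", ""), noisy_label(t))
           for t in tweets if noisy_label(t) is not None]
texts, labels = zip(*labeled)

clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["having a great time"]))
```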
16
PREPROCESS - RESPONSIVENESS ANALYSIS
Responsiveness is measured from historical interaction data, combining:
- Availability of a user to reply to/retweet a tweet
- Capacity of a user to read tweets and interact with others
- Tendency of a user to interact with a particular friend
[The slide shows the cost formula combining availability, capacity, and tendency.]
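A rough sketch of how availability, capacity, and tendency might be computed from an interaction log. The paper's exact formulas (and the cost combination shown on the slide) are not in this transcript, so all three definitions below are assumptions.

```python
# Hedged sketch of responsiveness features from a historical interaction log;
# availability, capacity, and tendency are assumed to be simple empirical rates.
from collections import Counter

def responsiveness_features(user, friend, tweet_hour, interactions):
    """interactions: list of (actor, target, hour) reply/retweet events."""
    by_user = [(t, h) for a, t, h in interactions if a == user]
    # Availability: how active the user historically is in this hour of day.
    hour_counts = Counter(h for _, h in by_user)
    availability = hour_counts[tweet_hour] / max(len(by_user), 1)
    # Capacity: overall interaction volume of the user (capped and normalized).
    capacity = min(len(by_user) / 100.0, 1.0)
    # Tendency: share of the user's interactions directed at this friend.
    tendency = sum(1 for t, _ in by_user if t == friend) / max(len(by_user), 1)
    return availability, capacity, tendency

log = [("u", "v", 9), ("u", "v", 21), ("u", "w", 21), ("u", "v", 22)]
print(responsiveness_features("u", "v", 21, log))
```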
17
TRAINING - LEARNING THE RANKING MODEL
Restating the problem formally:
- Feature: given a tweet q posted by user u as a query, and u's friends E_u as candidate documents, a feature vector x(q, v) is generated for each query-document pair (q, v), where v ∈ E_u.
- Label: Y = {1, 2, ..., m} is the label set for a query-document pair; the label indicates relevance, i.e. the intention to reply to/retweet a tweet.
- Aim: train a ranking model f(x) that assigns a score to a feature vector x and can rank the documents for a given query.
Learning-to-rank framework: binary relevance judgement (|Y| = 2) or ordinal relevance scale (|Y| > 2).
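As a concrete illustration of the pointwise instantiation of this setup (logistic regression, one of the ranking functions listed later), the sketch below scores query-document feature vectors x(q, v) with binary labels. The feature values are toy numbers standing in for the affinity, context, and responsiveness features.

```python
# Hedged sketch of a pointwise ranker: each (tweet q, friend v) pair is a
# feature vector with a binary label, and f(x) ranks candidates by score.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([
    [0.9, 0.8, 0.7],   # (q1, friend a): toy affinity/context/responsiveness values
    [0.2, 0.1, 0.3],   # (q1, friend b)
    [0.5, 0.6, 0.4],   # (q2, friend a)
    [0.1, 0.2, 0.9],   # (q2, friend c)
])
y = np.array([1, 0, 0, 1])   # 1 = v replied to / retweeted q

f = LogisticRegression().fit(X, y)

# Rank the candidate friends of a new query by predicted probability.
candidates = {"a": [0.7, 0.5, 0.6], "b": [0.2, 0.3, 0.1]}
scores = {v: f.predict_proba([x])[0, 1] for v, x in candidates.items()}
print(sorted(scores, key=scores.get, reverse=True))
```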
18
PREDICTION
Given a tweet:
- Perform query understanding (same approach as in the preprocessing phase): extract topical keywords, segmentation and named-entity recognition, sentiment analysis, self-disclosure analysis.
- Feed the extracted features to the ranking model.
- Retrieve a sorted ranking list.
19
Part 1: Introduction | Part 2: Learning to Rank Friends | Part 3: Evaluation | Part 4: Recommendation
20
DATASET
- 1.116 billion tweets posted by 1.1 million users, Sep. 2009 to Dec. 2013
- 73.262 million dyadic friendship relations
21
DATASET
- Profile affinity index
- Topic affinity calculation: number of topics = 50; top 20 keywords per topic; the extracted topical keyphrases are coherent and meaningful.
22
PRELIMINARY SETTINGS
- Remove tweets that had no replies/retweets:
  - tweet-reply dataset (215M tweets and 447M replies)
  - tweet-retweet dataset (29M tweets and 38M retweets)
- Conduct 10-fold cross-validation.
- Use a batch training mode in the experiments.
23
BASELINE SETTINGS
- Social, Topical, and Activity features based model (STA): a logistic regression model is trained.
- Homophily-based Graphical model (HG): the latent variable is the tie strength between dyadic users.
24
CRITERIA
Topmost accuracy: the ratio of successfully predicted repliers/retweeters to the number of tweets.
Notation: m is the number of tweets; δ{·} is the indicator function; l is the label of a user, 1 (replier/retweeter) or 0 (non-replier/non-retweeter).
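A direct translation of topmost accuracy as stated above: the share of tweets whose top-ranked friend is an actual replier/retweeter. The data structures are hypothetical.

```python
# Hedged sketch of topmost accuracy over m tweets.
def topmost_accuracy(rankings, labels):
    """rankings: list of ranked friend lists, one per tweet (best first).
    labels: list of dicts {friend: 1 if replier/retweeter else 0}."""
    m = len(rankings)
    hits = sum(labels[i].get(rankings[i][0], 0) for i in range(m))
    return hits / m

rankings = [["a", "b", "c"], ["x", "y"]]
labels = [{"a": 1, "b": 0, "c": 0}, {"x": 0, "y": 1}]
print(topmost_accuracy(rankings, labels))  # 0.5
```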
25
CRITERIA
Mean average precision (MAP): the mean of the average precision over all m tweets, where R_i is the number of actual repliers/retweeters of tweet i.
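A short sketch of MAP with R_i as the per-tweet normalizer, matching the description above; the input structures are illustrative.

```python
# Hedged sketch of mean average precision over m tweets.
def average_precision(ranking, relevant):
    hits, score = 0, 0.0
    for k, friend in enumerate(ranking, start=1):
        if friend in relevant:
            hits += 1
            score += hits / k      # precision at the position of each hit
    return score / len(relevant) if relevant else 0.0   # divide by R_i

def mean_average_precision(rankings, relevant_sets):
    return sum(average_precision(r, s)
               for r, s in zip(rankings, relevant_sets)) / len(rankings)

print(mean_average_precision([["a", "b", "c"], ["x", "y"]],
                             [{"a", "c"}, {"y"}]))
```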
26
CRITERIA
Normalized discounted cumulative gain (NDCG): evaluates ranking quality, with gain function G(x) = 2^x − 1.
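A sketch of NDCG using the gain G(x) = 2^x − 1 from the slide; the logarithmic position discount 1/log2(i + 1) is the usual choice and is assumed here.

```python
# Hedged sketch of NDCG with G(x) = 2^x - 1 and an assumed log2 discount.
import math

def dcg(labels):
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(labels))

def ndcg(ranked_labels, k=None):
    """ranked_labels: relevance labels in the predicted ranking order."""
    ranked = ranked_labels[:k] if k else ranked_labels
    ideal = sorted(ranked_labels, reverse=True)
    ideal = ideal[:k] if k else ideal
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0

print(ndcg([0, 1, 0, 1], k=3))
```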
27
RANKING FUNCTIONS
- Logistic regression: pointwise approach
- RankSVM: pairwise approach
- LambdaMART: listwise approach
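To make the pointwise/pairwise/listwise distinction concrete, here is a hedged sketch of the pairwise idea behind RankSVM: documents of the same query with different labels are turned into difference vectors and a linear classifier is trained on them. Real RankSVM implementations differ in optimization details, and the data here is toy.

```python
# Hedged sketch of the pairwise transformation behind RankSVM.
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_transform(X, y, qid):
    """Build x_i - x_j samples for documents of the same query with y_i > y_j."""
    X, y, qid = np.asarray(X, float), np.asarray(y), np.asarray(qid)
    diffs, signs = [], []
    for q in np.unique(qid):
        idx = np.where(qid == q)[0]
        for i in idx:
            for j in idx:
                if y[i] > y[j]:
                    diffs.append(X[i] - X[j]); signs.append(1)
                    diffs.append(X[j] - X[i]); signs.append(-1)
    return np.array(diffs), np.array(signs)

X = [[0.9, 0.2], [0.1, 0.7], [0.4, 0.4], [0.8, 0.1]]
y = [1, 0, 0, 1]          # 1 = replied/retweeted
qid = [0, 0, 1, 1]        # two toy queries (tweets)
Xp, yp = pairwise_transform(X, y, qid)
ranker = LinearSVC().fit(Xp, yp)
scores = np.asarray(X) @ ranker.coef_.ravel()   # higher score = ranked higher
print(scores)
```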
28
RESULTS
Results are reported in terms of topmost accuracy, mean average precision, and normalized discounted cumulative gain (charts shown on the slides).
29
Part 1: Introduction | Part 2: Learning to Rank Friends | Part 3: Evaluation | Part 4: Recommendation
30
RECOMMENDATION
- Partitioning a day into hour-based time bins increases the user-activity variation and worsens prediction performance.
- To eliminate the variation: apply the time mapping from the second paper.
31
TIME MAPPING
- Weibo time: measure time not by wall-clock time (seconds) but by the number of messages that users post on Weibo.
- Right graph: the activity curve changes from greatly fluctuating to relatively flat.
- This mitigates the impact of a message's posting time on its early-stage popularity and eliminates the effect of user-activity variation.
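A minimal sketch of the time-mapping idea: a wall-clock timestamp is replaced by the cumulative number of messages posted on the platform up to that instant, so elapsed "time" reflects posting activity rather than seconds. The data and function names are illustrative.

```python
# Hedged sketch: map wall-clock time to "Weibo time" (cumulative post count).
import bisect

def to_weibo_time(timestamp, all_post_times):
    """all_post_times: sorted wall-clock times of every observed post."""
    return bisect.bisect_right(all_post_times, timestamp)

# Toy platform activity: dense posting around one period, sparse elsewhere.
posts = sorted([100, 101, 102, 103, 500, 900, 901, 902, 903, 904])
# The same wall-clock interval maps to many Weibo-time units at a busy time
# and to few at a quiet time, flattening user-activity variation.
print(to_weibo_time(104, posts) - to_weibo_time(99, posts))   # 4 posts elapsed
print(to_weibo_time(600, posts) - to_weibo_time(501, posts))  # 0 posts elapsed
```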