1
PREDICTION ON TWEET FROM DYNAMIC INTERACTION Group 19 Chan Pui Yee Wong Tsz Wing Yeung Chun Kit
2
Part 1: Introduction | Part 2: Learning to Rank Friends | Part 3: Evaluation | Part 4: Recommendation
3
Introduction
Two papers are covered in this presentation:
1. Who Will Reply to/Retweet This Tweet? The Dynamics of Intimacy from Online Social Interactions
2. Modeling and Predicting Retweeting Dynamics on Microblogging Platforms
4
COMMON OBJECTIVE
Predicting the popularity of a message posted in an online social network.
The first paper:
1. Examines popularity in terms of who will be the replier/retweeter
2. Investigates the dynamics of dyadic friend relationships
5
BACKGROUND
- Social tie: social interaction in both online and offline social networks. Proximity and homophily play key roles: we befriend people who are close by and who have similar features.
- Tie strength: frequency of interaction.
- Trend: people interact with lower communication cost and a greater variety of modes, hence the increasing attention on social ties and tie strength.
6
MOTIVATIONS
- Existing studies consider tie strength as static, which is unrealistic.
- The dynamics of tie strength:
  1. Reciprocity: strength may not be reciprocal
  2. Temporality: strength can evolve over time
  3. Contextuality: people perceive different feelings of closeness in different contexts
- Modeling these dynamics can boost the effect of online advertisement, e.g. product recommendation.
7
DYNAMICS OF TIE STRENGTH: RECIPROCITY
- Quantization: the percentile rank of v in terms of outgoing interactions from u to v.
- Reciprocal interaction rank (RIR).
- For a large portion of social ties the reciprocal interaction rank deviates from zero: tie strength is unequal between the two sides.
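A minimal sketch of how these reciprocity features could be computed from directed interaction counts. The exact RIR definition is not given in this transcript, so taking it as the difference of the two directed percentile ranks is an assumption, and all names below are hypothetical.

```python
# Hypothetical sketch: percentile-rank quantization of directed interaction
# counts and a reciprocal interaction rank (RIR). Assumption: RIR is the
# difference of the two directed percentile ranks; the paper may define it
# differently.
def percentile_ranks(out_counts):
    """out_counts: {friend: number of interactions u -> friend}.
    Returns {friend: percentile rank in (0, 1]}, higher = more interactions."""
    ranked = sorted(out_counts, key=out_counts.get)
    n = len(ranked)
    return {v: (i + 1) / n for i, v in enumerate(ranked)}

def reciprocal_interaction_rank(u, v, interactions):
    """interactions: {(src, dst): count}. Assumed RIR = PR_u(v) - PR_v(u)."""
    out_u = {dst: c for (src, dst), c in interactions.items() if src == u}
    out_v = {dst: c for (src, dst), c in interactions.items() if src == v}
    pr_u = percentile_ranks(out_u).get(v, 0.0)
    pr_v = percentile_ranks(out_v).get(u, 0.0)
    return pr_u - pr_v

interactions = {("u", "v"): 12, ("u", "w"): 3, ("v", "u"): 1, ("v", "x"): 9}
print(reciprocal_interaction_rank("u", "v", interactions))  # nonzero => unequal tie
```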
8
TEMPORALITY
- Tie strength within a dyadic relation changes over time.
- Investigation of the "best friend": sample 1 million users and their best friend for each month.
- Best friends changed over time, with around 3 distinct best friends per user.
9
CONTEXTUALITY
- Different degrees of intimacy in different contexts.
- Emotion analysis:
  - H: the ratio of replies containing happy emotions sent from u to each of u's friends
  - PRxD: the percentile rank for the user
- People are more likely to include an emotion if the friend is ranked higher.
10
Part 1: Introduction | Part 2: Learning to Rank Friends | Part 3: Evaluation | Part 4: Recommendation
11
OVERVIEW
Input: a user (U), a time (T), and a tweet.
Aim: rank U's friends according to their intention to reply to or retweet the tweet.
12
HOW?
Pipeline: crawling, feature extraction (temporality, contextuality, reciprocity), training, prediction.
13
PREPROCESS – AFFINITY CALCULATION
Profile affinity:
- Binary attributes: gender, city
- Real-valued attributes: age, education, common friends
[The slide shows formulas for Affinity(Age), Affinity(Education), and Affinity(Common Friends).]
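A small sketch of profile-affinity features under assumed definitions (exact match for binary attributes, a normalized gap for age, Jaccard overlap for common friends). The formulas on the original slide are not in this transcript, so these are stand-ins.

```python
# Hedged sketch of profile-affinity features; all definitions below are
# assumptions standing in for the slide's formulas.
def binary_affinity(a, b):
    """Gender / city: 1 if identical, else 0."""
    return 1.0 if a == b else 0.0

def age_affinity(age_u, age_v, max_gap=50):
    """Assumed: affinity decays linearly with the age gap."""
    return max(0.0, 1.0 - abs(age_u - age_v) / max_gap)

def common_friend_affinity(friends_u, friends_v):
    """Assumed: Jaccard similarity of the two friend sets."""
    friends_u, friends_v = set(friends_u), set(friends_v)
    union = friends_u | friends_v
    return len(friends_u & friends_v) / len(union) if union else 0.0

features = [
    binary_affinity("F", "F"),                  # gender
    binary_affinity("Hong Kong", "Hong Kong"),  # city
    age_affinity(21, 24),
    common_friend_affinity({"a", "b", "c"}, {"b", "c", "d"}),
]
print(features)
```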
14
PREPROCESS – AFFINITY CALCULATION
Topic affinity, via a topical keyphrase extraction approach:
i. Twitter-LDA model to discover topics
ii. PageRank-based method to identify important words
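The topic-discovery step could be approximated as below. The paper uses Twitter-LDA, which assigns a single topic per tweet; this sketch falls back to standard LDA via gensim, and the toy tweets and topic count are placeholders (the dataset slide later uses 50 topics and the top 20 keywords). The PageRank-based keyphrase step is not shown.

```python
# Hedged sketch: approximate the Twitter-LDA topic-discovery step with
# standard LDA from gensim on toy, pre-tokenized tweets.
from gensim import corpora, models

tweets = [
    ["machine", "learning", "model", "training"],
    ["football", "match", "goal", "team"],
    ["deep", "learning", "neural", "network"],
    ["team", "wins", "league", "football"],
]
dictionary = corpora.Dictionary(tweets)
corpus = [dictionary.doc2bow(t) for t in tweets]

lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, passes=10)
for topic_id in range(2):
    # Top keywords per topic would feed the downstream keyphrase extraction.
    print(lda.show_topic(topic_id, topn=5))
```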
15
PREPROCESS - CONTEXT ANALYSIS
- Sentiment analysis using an SVM: unigram and bigram features, contextual features such as hashtags, emoticons used as noisy labels
- Self-disclosure: occurrence of first-person words
- Geo-tweet
- Client type (mobile/desktop)
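A hedged sketch of the sentiment classifier: an SVM over unigram and bigram counts trained on emoticon-derived noisy labels. The tweets, emoticon list, and labeling rule are illustrative only, and the paper's feature set (e.g. hashtag features) is richer.

```python
# Hedged sketch: linear SVM with unigram/bigram features and emoticons as
# noisy labels; data and labeling rule are toy examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

def noisy_label(tweet):
    """Emoticon-based noisy labeling: 1 = positive, 0 = negative, None = unlabeled."""
    if ":)" in tweet or ":D" in tweet:
        return 1
    if ":(" in tweet:
        return 0
    return None

tweets = ["great day :)", "so tired :(", "love this :D", "worst ever :("]
labeled = [(t.replace(":)", "").replace(":(", "").replace(":D", ""), noisy_label(t))
           for t in tweets if noisy_label(t) is not None]
texts, labels = zip(*labeled)

clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["having a great time"]))
```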
16
PREPROCESS - RESPONSIVENESS ANALYSIS
Responsiveness is measured from historical interaction data, combining:
- Availability of a user to reply to/retweet a tweet
- Capacity of a user to read tweets and interact with others
- Tendency of a user to interact with a particular friend
[The slide shows the cost formula combining availability, capacity, and tendency.]
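A rough sketch of how availability, capacity, and tendency might be computed from an interaction log. The paper's exact formulas (and the cost combination shown on the slide) are not in this transcript, so all three definitions below are assumptions.

```python
# Hedged sketch of responsiveness features from a historical interaction log;
# availability, capacity, and tendency are assumed to be simple empirical rates.
from collections import Counter

def responsiveness_features(user, friend, tweet_hour, interactions):
    """interactions: list of (actor, target, hour) reply/retweet events."""
    by_user = [(t, h) for a, t, h in interactions if a == user]
    # Availability: how active the user historically is in this hour of day.
    hour_counts = Counter(h for _, h in by_user)
    availability = hour_counts[tweet_hour] / max(len(by_user), 1)
    # Capacity: overall interaction volume of the user (capped and normalized).
    capacity = min(len(by_user) / 100.0, 1.0)
    # Tendency: share of the user's interactions directed at this friend.
    tendency = sum(1 for t, _ in by_user if t == friend) / max(len(by_user), 1)
    return availability, capacity, tendency

log = [("u", "v", 9), ("u", "v", 21), ("u", "w", 21), ("u", "v", 22)]
print(responsiveness_features("u", "v", 21, log))
```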
17
TRAINING - LEARNING THE RANKING MODEL
Restating the problem formally:
- Feature: given a tweet q posted by user u as a query, and u's friends E_u as candidate documents, a feature vector x(q, v) is generated for each query-document pair (q, v), where v ∈ E_u.
- Label: Y = {1, 2, ..., m} is the label set for a query-document pair; the label indicates relevance, i.e. the intention to reply to/retweet a tweet.
- Aim: train a ranking model f(x) that assigns a score to a feature vector x and can rank the documents for a given query.
Learning-to-rank framework: binary relevance judgement (|Y| = 2) or ordinal relevance scale (|Y| > 2).
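As a concrete illustration of the pointwise instantiation of this setup (logistic regression, one of the ranking functions listed later), the sketch below scores query-document feature vectors x(q, v) with binary labels. The feature values are toy numbers standing in for the affinity, context, and responsiveness features.

```python
# Hedged sketch of a pointwise ranker: each (tweet q, friend v) pair is a
# feature vector with a binary label, and f(x) ranks candidates by score.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([
    [0.9, 0.8, 0.7],   # (q1, friend a): toy affinity/context/responsiveness values
    [0.2, 0.1, 0.3],   # (q1, friend b)
    [0.5, 0.6, 0.4],   # (q2, friend a)
    [0.1, 0.2, 0.9],   # (q2, friend c)
])
y = np.array([1, 0, 0, 1])   # 1 = v replied to / retweeted q

f = LogisticRegression().fit(X, y)

# Rank the candidate friends of a new query by predicted probability.
candidates = {"a": [0.7, 0.5, 0.6], "b": [0.2, 0.3, 0.1]}
scores = {v: f.predict_proba([x])[0, 1] for v, x in candidates.items()}
print(sorted(scores, key=scores.get, reverse=True))
```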
18
PREDICTION
Given a tweet:
- Perform query understanding (same approach as in the preprocessing phase): extract topical keywords, segmentation and named-entity recognition, sentiment analysis, self-disclosure analysis.
- Feed the extracted features to the ranking model.
- Retrieve a sorted ranking list.
19
Part 1: Introduction | Part 2: Learning to Rank Friends | Part 3: Evaluation | Part 4: Recommendation
20
DATASET
- 1.116 billion tweets posted by 1.1 million users, Sep. 2009 to Dec. 2013
- 73.262 million dyadic friendship relations
21
DATASET
- Profile affinity index
- Topic affinity calculation: number of topics = 50; top 20 keywords per topic; the extracted topical keyphrases are coherent and meaningful.
22
PRELIMINARY SETTINGS
- Remove tweets that had no replies/retweets:
  - tweet-reply dataset (215M tweets and 447M replies)
  - tweet-retweet dataset (29M tweets and 38M retweets)
- Conduct 10-fold cross-validation.
- Use a batch training mode in the experiments.
23
BASELINE SETTINGS
- Social, Topical, and Activity features based model (STA): a logistic regression model is trained.
- Homophily-based Graphical model (HG): the latent variable is the tie strength between dyadic users.
24
CRITERIA
Topmost accuracy: the ratio of successfully predicted repliers/retweeters to the number of tweets.
Notation: m is the number of tweets; δ{·} is the indicator function; l is the label of a user, 1 (replier/retweeter) or 0 (non-replier/non-retweeter).
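A direct translation of topmost accuracy as stated above: the share of tweets whose top-ranked friend is an actual replier/retweeter. The data structures are hypothetical.

```python
# Hedged sketch of topmost accuracy over m tweets.
def topmost_accuracy(rankings, labels):
    """rankings: list of ranked friend lists, one per tweet (best first).
    labels: list of dicts {friend: 1 if replier/retweeter else 0}."""
    m = len(rankings)
    hits = sum(labels[i].get(rankings[i][0], 0) for i in range(m))
    return hits / m

rankings = [["a", "b", "c"], ["x", "y"]]
labels = [{"a": 1, "b": 0, "c": 0}, {"x": 0, "y": 1}]
print(topmost_accuracy(rankings, labels))  # 0.5
```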
25
CRITERIA
Mean average precision (MAP): the mean of the average precision over all m tweets, where R_i is the number of actual repliers/retweeters of tweet i.
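A short sketch of MAP with R_i as the per-tweet normalizer, matching the description above; the input structures are illustrative.

```python
# Hedged sketch of mean average precision over m tweets.
def average_precision(ranking, relevant):
    hits, score = 0, 0.0
    for k, friend in enumerate(ranking, start=1):
        if friend in relevant:
            hits += 1
            score += hits / k      # precision at the position of each hit
    return score / len(relevant) if relevant else 0.0   # divide by R_i

def mean_average_precision(rankings, relevant_sets):
    return sum(average_precision(r, s)
               for r, s in zip(rankings, relevant_sets)) / len(rankings)

print(mean_average_precision([["a", "b", "c"], ["x", "y"]],
                             [{"a", "c"}, {"y"}]))
```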
26
CRITERIA
Normalized discounted cumulative gain (NDCG): evaluates ranking quality, with gain function G(x) = 2^x − 1.
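A sketch of NDCG using the gain G(x) = 2^x − 1 from the slide; the logarithmic position discount 1/log2(i + 1) is the usual choice and is assumed here.

```python
# Hedged sketch of NDCG with G(x) = 2^x - 1 and an assumed log2 discount.
import math

def dcg(labels):
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(labels))

def ndcg(ranked_labels, k=None):
    """ranked_labels: relevance labels in the predicted ranking order."""
    ranked = ranked_labels[:k] if k else ranked_labels
    ideal = sorted(ranked_labels, reverse=True)
    ideal = ideal[:k] if k else ideal
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0

print(ndcg([0, 1, 0, 1], k=3))
```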
27
RANKING FUNCTIONS
- Logistic regression: pointwise approach
- RankSVM: pairwise approach
- LambdaMART: listwise approach
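To make the pointwise/pairwise/listwise distinction concrete, here is a hedged sketch of the pairwise idea behind RankSVM: documents of the same query with different labels are turned into difference vectors and a linear classifier is trained on them. Real RankSVM implementations differ in optimization details, and the data here is toy.

```python
# Hedged sketch of the pairwise transformation behind RankSVM.
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_transform(X, y, qid):
    """Build x_i - x_j samples for documents of the same query with y_i > y_j."""
    X, y, qid = np.asarray(X, float), np.asarray(y), np.asarray(qid)
    diffs, signs = [], []
    for q in np.unique(qid):
        idx = np.where(qid == q)[0]
        for i in idx:
            for j in idx:
                if y[i] > y[j]:
                    diffs.append(X[i] - X[j]); signs.append(1)
                    diffs.append(X[j] - X[i]); signs.append(-1)
    return np.array(diffs), np.array(signs)

X = [[0.9, 0.2], [0.1, 0.7], [0.4, 0.4], [0.8, 0.1]]
y = [1, 0, 0, 1]          # 1 = replied/retweeted
qid = [0, 0, 1, 1]        # two toy queries (tweets)
Xp, yp = pairwise_transform(X, y, qid)
ranker = LinearSVC().fit(Xp, yp)
scores = np.asarray(X) @ ranker.coef_.ravel()   # higher score = ranked higher
print(scores)
```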
28
RESULTS
Results are reported in terms of topmost accuracy, mean average precision, and normalized discounted cumulative gain (charts shown on the slides).
29
Part 1: Introduction | Part 2: Learning to Rank Friends | Part 3: Evaluation | Part 4: Recommendation
30
RECOMMENDATION
- Partitioning a day into hour-based time bins increases the user-activity variation and worsens prediction performance.
- To eliminate the variation: apply the time mapping from the second paper.
31
TIME MAPPING
- Weibo time: measure time not by wall-clock time (seconds) but by the number of messages that users post on Weibo.
- Right graph: the activity curve changes from greatly fluctuating to relatively flat.
- This mitigates the impact of a message's posting time on its early-stage popularity and eliminates the effect of user-activity variation.
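A minimal sketch of the time-mapping idea: a wall-clock timestamp is replaced by the cumulative number of messages posted on the platform up to that instant, so elapsed "time" reflects posting activity rather than seconds. The data and function names are illustrative.

```python
# Hedged sketch: map wall-clock time to "Weibo time" (cumulative post count).
import bisect

def to_weibo_time(timestamp, all_post_times):
    """all_post_times: sorted wall-clock times of every observed post."""
    return bisect.bisect_right(all_post_times, timestamp)

# Toy platform activity: dense posting around one period, sparse elsewhere.
posts = sorted([100, 101, 102, 103, 500, 900, 901, 902, 903, 904])
# The same wall-clock interval maps to many Weibo-time units at a busy time
# and to few at a quiet time, flattening user-activity variation.
print(to_weibo_time(104, posts) - to_weibo_time(99, posts))   # 4 posts elapsed
print(to_weibo_time(600, posts) - to_weibo_time(501, posts))  # 0 posts elapsed
```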