1
Identifying Effective Signals to Predict Deleted and Suspended Accounts on Twitter across Languages
D. Arendt (presenter), S. Volkova, E. Bell
Data Sciences and Analytics Group, National Security Directorate, Pacific Northwest National Laboratory
International AAAI Conference on Web and Social Media 2017, Montreal, CA
November 10, 2018
2
Large Scale Social Media Analytics
Applications: public opinions, emotions, attitudes; situational awareness; psychodemographics; elections; stock market; healthcare analytics; protests, civil unrest; natural disasters
Example events: Paris attacks (Nov), Nice terror attack (Jul), Brussels bombing (Mar), RU-UA conflict (2014 – 2015)
Cross-lingual analysis => language-agnostic models
3
Deleted and Suspended Accounts
1 in 10 accounts on Twitter is fake*
83M (8.7%) of Facebook users may be fake**
4
Suspended Accounts
A. Deleted or de-activated by users
B. Suspended by Twitter
  social bots, spam or fraudulent accounts
  spread misinformation, deceptive content, propaganda
  account security at risk: compromised or hacked
  abusive tweets or behavior: threats, impersonation
Twitter spam policy:
5
Task Definition
Representativeness, Bias, Validity, Reproducibility
Task: develop models to predict which Twitter accounts will be deleted or suspended
Contributions:
  Go beyond non-linguistic clues: profile, network, and tweeting behavior
  Contrast the effectiveness of underexplored linguistic cues: topics, embeddings, writing style + images and affects
  Study suspicious behavior across languages: Russian, Spanish, English
Impact:
  Improve reproducibility and reduce bias
  Reduce misinformation and social bots
6
Background
Related areas: Influence Bot Detection, Suspended Account Analysis, Spam Account Detection, Post Deletion Prediction
  Subrahmanian et al: DARPA’s Twitter bot challenge
  Lawrence, 2015: bot detection on RuNet
  Thomas et al: M suspended accounts to characterize spammers’ behavior
  Bamman et al., 2012: Weibo deleted messages
  Martinez-Romo and Araujo 2013: LMs for deleted tweets
  Potash et al., 2016: classify deleted tweets using topics, embeddings
  Guo and Chen, 2014: style signals
  Lin and Huang, 2013; Lee, Eoff, Caverlee, 2011: behavior signals
We do:
  Deeper linguistic signals
  Images and affects
  Neural network models
  Across languages
7
Twitter Data
Languages: Russian, English, Spanish
English and Spanish: Sept – Dec, 2015 (deleted/suspended), 2016 (active)
  “.. best herbs for weight loss begin with green tea...”
Russian: Mar – Jun, 2015 (deleted/suspended), Dec 2015 (active)
  “…cyborgs hung the Ukrainian flag in Donetsk Airport…”
Russian dataset papers:
  Account Deletion Prediction on RuNet: A Case Study of Suspicious Twitter Accounts Active During the Russian-Ukrainian Crisis. NAACL Workshop on Computational Approaches to Deception Detection.
  Contrasting Public Opinion Dynamics and Emotional Response during Crisis. SocInfo 2016.
8
Predictive Tasks and Models
Three prediction tasks:
  3-class: Deleted vs. Suspended vs. Non-Deleted
  2-class: Deleted + Suspended vs. Non-Deleted
  2-class: Deleted vs. Suspended
Classification models (10-fold c.v.):
  State-of-the-art models: Log Linear, SVM and Random Forest
  Neural network models: Long Short-Term Memory (LSTM)
    Embedding → recurrent (LSTM) → output layers
    Sigmoid activation and RMSprop optimizer
Text pre-processing: lemmatizer (Russian) and stemmer (Spanish)
Weighted F1: we calculate F1 for each label and average the per-label scores, weighted by support (the number of true instances for each label). This alters macro F1 to account for label imbalance.
Tools: Keras; scikit-learn; Keras optimizers; Keras activations; lemmatizer; stemmer
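The weighted F1 definition above can be illustrated with a short, dependency-free sketch. The function name `weighted_f1` and the toy labels are illustrative assumptions, not taken from the slides; the authors presumably used a library implementation such as scikit-learn's, which this sketch only mirrors in spirit.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average per-label F1, weighting each label by its support.

    Hypothetical helper for illustration; support is the number of
    true instances of a label, as described on the slide.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one instance")

    support = Counter(y_true)          # true instances per label
    predicted = Counter(y_pred)        # predicted instances per label
    total = len(y_true)

    score = 0.0
    for label, n_true in support.items():
        # True positives: instances where both truth and prediction are `label`.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = predicted[label]
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight this label's F1 by its share of the true instances.
        score += f1 * n_true / total
    return score


# Toy 3-class example: D = deleted, S = suspended, ND = non-deleted.
y_true = ["D", "D", "S", "ND", "ND", "ND"]
y_pred = ["D", "S", "S", "ND", "ND", "D"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because each label's F1 is scaled by its support, a frequent class such as ND contributes more to the final score than a rare one, which is the adjustment for label imbalance the slide refers to.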
9
Predictive Signals
Profile: followers, tweets, friend2follower, name chars/words, bio length, tweet rate
Language (What do they talk about?): tweet ngrams (+LSA) binary/count-based, LDA topic clusters, embedding clusters
Syntax & Style (How do they communicate?): tweet length words/chars, RT, URL, mention, hashtag ratios, uppercase, elong
Network (Who do they interact with?): mentions (+LSA), hashtags (+LSA) binary and normalized count-based unigrams
Affect (What affects do they express?): proportion of tweets with six Ekman’s emotions, and three sentiments
Images: 2048-dim vector representations from fully connected layer in CNN
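As a rough illustration of the profile and style signals listed above, the sketch below computes a few of them from raw tweet strings. Everything here is an assumption made for illustration: the function names, the whitespace tokenization, the "three or more repeated letters" rule for elongation, and the handling of zero followers are not specified on the slides.

```python
import re

# Assumed elongation rule: the same letter repeated three or more times ("sooooo").
ELONGATED = re.compile(r"([^\W\d_])\1{2,}")


def is_url(word):
    return word.startswith(("http://", "https://"))


def friend_to_follower(friends, followers):
    """Friends divided by followers; zero followers is treated as one (assumption)."""
    return friends / max(followers, 1)


def style_features(tweets):
    """Per-account style ratios from a list of tweet strings (illustrative only)."""
    if not tweets:
        raise ValueError("need at least one tweet")
    n_tweets = len(tweets)
    words = [w for t in tweets for w in t.split()]
    n_words = max(len(words), 1)
    return {
        "avg_len_chars": sum(len(t) for t in tweets) / n_tweets,
        "avg_len_words": len(words) / n_tweets,
        # Share of tweets that are retweets (assumed to start with "RT ").
        "rt_ratio": sum(t.startswith("RT ") for t in tweets) / n_tweets,
        "url_per_word": sum(is_url(w) for w in words) / n_words,
        "mention_per_word": sum(w.startswith("@") for w in words) / n_words,
        "hashtag_per_word": sum(w.startswith("#") for w in words) / n_words,
        # Fully uppercase tokens of length > 1 (this also counts "RT").
        "uppercase_ratio": sum(w.isupper() and len(w) > 1 for w in words) / n_words,
        # URLs are skipped so that "www" is not counted as an elongation.
        "elongated_ratio": sum(
            bool(ELONGATED.search(w)) for w in words if not is_url(w)
        ) / n_words,
    }


feats = style_features(["RT @bob sooooo GOOD #win", "check http://t.co/x now"])
print(feats["rt_ratio"], feats["elongated_ratio"])
```

Ratios like these are language-independent by construction, which fits the slides' emphasis on signals that transfer across Russian, Spanish, and English.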
10
Underexplored Representations
Embeddings: 2K-dim vectors
  GloVe (Pennington et al., 2014)
  NPMI (Lampos et al. 2014)
  Word2Vec (Mikolov et al. 2013)
Topics: K-dim vectors
  LDA (Blei, Ng, and Jordan 2003), α = 0.1, β = 0.005, t = 1000 topics
Affects: dim vectors
  RU sentiment (Chetviorkin and Loukachevitch, 2014)
  RU emotions (Volkova et al, 2016)
  EN and ES affects (Volkova and Bachrach 2015)
Images: dim vectors
  Inception v3 model on ImageNet (Russakovsky et al. 2015)
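For readers unfamiliar with NPMI (normalized pointwise mutual information), the sketch below shows the standard score for a pair of words given co-occurrence counts. The function name and the count-based interface are assumptions; the slides do not say how the NPMI scores are turned into embedding vectors, so this covers only the pairwise measure itself.

```python
import math


def npmi(count_xy, count_x, count_y, total):
    """Normalized PMI in [-1, 1] for two words, from raw counts.

    count_xy: contexts (e.g. tweets) containing both words
    count_x, count_y: contexts containing each word
    total: total number of contexts
    """
    if total <= 0 or count_x <= 0 or count_y <= 0:
        raise ValueError("counts must be positive")
    if count_xy == 0:
        return -1.0  # the words never co-occur: limiting value of NPMI
    p_xy = count_xy / total
    if p_xy == 1.0:
        return 1.0   # the words co-occur in every context; avoids dividing by log(1) = 0
    p_x = count_x / total
    p_y = count_y / total
    pmi = math.log(p_xy / (p_x * p_y))
    # Dividing by -log p(x, y) rescales PMI to the range [-1, 1].
    return pmi / -math.log(p_xy)


print(npmi(10, 10, 10, 100))   # words that always appear together
print(npmi(25, 50, 50, 100))   # statistically independent words
```

A score near 1 means the two words almost always appear together, a score near 0 means they co-occur about as often as chance predicts, and -1 means they never co-occur.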
11
Results: Deleted vs. Susp. vs. Non-Deleted
Figure panels: Network, Content
Content > Network > Affects > Profile > Images
EN > RU > ES (except profile for RU, F1=78)
Style for EN (F1=87), tweet ngrams for RU (F1=82), ES (F1=74)
12
Results: Deleted + Susp. vs. Non-Deleted
Figure panels: Network, Content
Easier task than 3-way with affect, profile and style for RU > EN, ES
Tweets, Embeddings > Topics > Style > Network > Affect > Images
Tweets for EN (F1=88) and RU (F1=87), embeddings for ES (F1=82)
13
Results: LSTM vs. Log Linear
Figure panels: 2-class D vs. S; 2-class DS vs. ND
LSTM > LL or comparable performance (DS vs. ND for EN and ES)
Comparison with prior work (Approach: Predictive Signals; Model; F1), covering the Tweet Deletion and Spam Account tasks:
  Benevenuto et al., 2010: Tweet, Behavior; SVM; 0.87
  Martinez-Romo et al., 2013: Language Model; 0.88
  Lee et al., 2010: Syntax, Behavior; RF; 0.89
  Yang et al., 2011: Syntax, Tweet
  This work, 2017: Tweet, Behavior, Embeddings; LSTM; 0.92 – 0.98
14
Contrastive Profile + Style Analysis
Style differences, D + S accounts:
  Shorter tweets
  Fewer elongated words, capitalized words, and repeated punctuation
  Lower hashtag, mention and URL per-word rates
  Fewer RTs and fewer tweets with hashtags, URLs and mentions
  Fewer tweets with punctuation and emoticons
Profile differences, D + S accounts:
  Fewer followers, friends, tweets (except friends for RU)
  Lower friend-to-follower ratio (Lee, Eoff, and Caverlee 2011)
  Shorter bios (except ES)
  Shorter user names (Ferrara et al. 2016)
  Do not last long (Lee, Eoff, and Caverlee 2011; Thomas et al. 2011)
ND accounts are more stylistically expressive than DS
  ND ↑ >> DS and RU >> ES > EN
  Except DS ↑ elongations (EN), emoticons (EN, ES)
look like real users (avatars)
similar followers, friends, tweeting behavior
change this behavior over time
15
Contrastive Affect Analysis
Mean differences in affect features between DS and ND accounts.
Psycholing support
DS express opinionated content ↑ (pos & neg, except EN)
Anger ↓, fear ↓ and disgust ↑ across languages
Sadness ↑, surprise ↑, joy ↑ except ES
16
Key Findings
Large-scale study on account deletion and suspension prediction
  Linguistic signals: ngrams, style, embeddings + image and affect signals
  Predictive performance varies across languages: EN, ES and RU
  Scalable and extendable approach
Analyzed deleted, suspended vs. non-deleted user behavior on Twitter
  Non-linguistic behavior:
    Have significantly different profiles
    Network = Who they interact with
  Linguistic behavior:
    Style = How they communicate (EN)
    Content = What they tweet: text, images (ES, RU)
    Affects = What affects they express (RU)
Future work: network (graph embeddings), image representations, combine signals effectively (ensemble models), combine text + image (LSTM + CNN)
Presence of certain topics, style, and ngrams in a user's tweets leads to a higher likelihood of that user's account being deleted or suspended
17
Why use our models
Safer and healthier environment in social media (SM): propaganda, misinformation, manipulative campaigns
SM data reproducibility and reliability: large-scale computational social science studies
18
Svitlana Volkova, PhD
Senior Research Scientist
Data Sciences and Analytics Group, Computational Analytics Division, National Security Directorate