Authors: S. Volkova, J. Jang; Presenter: Maria Glenski

Misleading or Falsification? Inferring Deceptive Strategies and Types in Online News and Social Media
Authors: S. Volkova, J. Jang
Presenter: Maria Glenski
Data Sciences and Analytics, National Security Directorate, Pacific Northwest National Laboratory
WWW Track on Journalism, Misinformation and Fact-Checking, April 25th, Lyon, France
WWW Track on Journalism, Misinformation and Fact-Checking November 19, 2018

Deceptive News Shared Online

Contributions
Recent work:
  Psycholinguistic analysis across deception types in the news pages (Rashkin et al., 2017)
  Predicting credibility of PolitiFact statements (Rashkin et al., 2017; Wang et al., 2017) and analyzing credibility of tweets (Mitra et al., 2017)
  Models to classify deceptive news types on Twitter (Volkova et al., 2017)
Our approach:
  Focusing on deception types and deception strategies: misleading and falsification
  Verifying model generalizability across domains: news pages, tweets and summary statements
  Qualitatively analyzing writers’ intent behind misinformation: psycholinguistic signals, moral foundations, connotations

Deception Types and Strategies
Disinformation: false facts conveyed to deliberately deceive the audience
  vs. Misinformation: incorrect facts conveyed in the honest but mistaken belief that they are true
Propaganda: a form of persuasion that influences audiences through the controlled transmission of messages that are deceptive, selectively omit information, and are one-sided
Hoax: a type of misinformation that aims to deliberately deceive the reader
Misleading: topic changes, irrelevant information, and equivocations
Falsification: contradictions or distortions

Task Definition
Build generalizable predictive models to differentiate between deception types and strategies in news across domains
  Deception strategies: misleading vs. falsification
  Deception types: propaganda vs. hoax vs. disinformation
[Figure: deception strategies on a scale from "More Intent to Deceive" to "Less Intent to Deceive": Falsification, Misleading]
[Figure: deception types on a scale from "More Intent to Deceive" to "Less Intent to Deceive": Disinformation, Propaganda, Hoax]

Datasets: Deception Strategies
Domains       Misleading   Falsification
Summaries            616           1,376
News Pages            81              85
Tweets                96             109
Confirmed cases of disinformation summaries from the European Union’s East Strategic Communications Task Force:
  Falsification: unprovable, no evidence, no proof, no supporting evidence
  Crowdsourcing: pairwise inter-annotator agreement kappa is 0.64 (5 annotators)
Followed URLs in disinformation summaries to collect the original news pages
Queried the public Twitter API using subject-verb-object (SVO) tuples and timestamps to extract unique disinformation tweets
Parsed summaries, news pages and tweets using SyntaxNet to extract SVO tuples:
  Understand agents and themes of deception
  Contrast connotations/perspectives across deception types
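
The pairwise inter-annotator agreement reported above can be illustrated with a minimal sketch. It assumes the statistic is Cohen's kappa averaged over all annotator pairs; the slide does not say which kappa variant was used. The function names and the toy labels are hypothetical and are not the study's data.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items both annotators labeled identically.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's own label distribution.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[lab] * count_b[lab] for lab in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(annotations):
    """Average Cohen's kappa over every pair of annotators."""
    pairs = list(combinations(annotations, 2))
    if not pairs:
        raise ValueError("need at least two annotators")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy labels (M = misleading, F = falsification) from 5 annotators.
toy_annotations = [
    list("MFFMFFMF"),
    list("MFFMFFFF"),
    list("MFMMFFMF"),
    list("MFFMFMMF"),
    list("FFFMFFMF"),
]
print(round(mean_pairwise_kappa(toy_annotations), 2))
```

With 5 annotators the average runs over 10 pairs, so one strongly disagreeing annotator lowers four of the ten pairwise scores.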

Datasets: Deception Types
Domains       Propaganda   Hoaxes   Disinformation
News Pages        17,872    5,297              166
Tweets             3,834      453              205
Collecting propaganda and hoax news pages and tweets:
  Downloaded 17,872 propaganda (ActivistPost) and 5,297 hoax (DCGazette) news pages
  Collected the corresponding propaganda and hoax tweets using the public Twitter API
Collecting disinformation news pages and tweets:
  Followed URLs in disinformation summaries to collect the original news pages
  Queried the public Twitter API using SVO and timestamps to extract disinformation tweets
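
The SVO (subject-verb-object) tuples used in the collection steps above can be sketched as a pass over a dependency parse. This is a simplified illustration, not the authors' pipeline: the `Token` structure is a hypothetical stand-in for parser output (the slides name SyntaxNet as the parser), and only direct `nsubj` and `dobj`/`obj` dependents of the same head are paired.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    idx: int     # 1-based position in the sentence
    word: str
    head: int    # idx of the syntactic head, 0 for the root
    deprel: str  # dependency label, e.g. "nsubj", "dobj"


def extract_svo(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Pair each subject with each direct object that shares the same head."""
    by_idx = {t.idx: t for t in tokens}
    subjects, objects = {}, {}
    for t in tokens:
        if t.deprel == "nsubj":
            subjects.setdefault(t.head, []).append(t.word)
        elif t.deprel in ("dobj", "obj"):
            objects.setdefault(t.head, []).append(t.word)
    triples = []
    for head, subs in subjects.items():
        for s in subs:
            for o in objects.get(head, []):
                triples.append((s, by_idx[head].word, o))
    return triples


# Hypothetical pre-parsed sentence: "Journalists verified the claim"
sentence = [
    Token(1, "Journalists", 2, "nsubj"),
    Token(2, "verified", 0, "root"),
    Token(3, "the", 4, "det"),
    Token(4, "claim", 2, "dobj"),
]
print(extract_svo(sentence))
```

In such a tuple the subject corresponds to the agent and the object to the theme, which is the pairing the later connotation analysis (writer → agent, writer → theme) relies on.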

Predictive Models and Signals
Machine learning models: MaxEntropy and RandomForest [1]
Neural network models: Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN) [2]
Predictive signals:
  Content (what is being discussed): TF-IDF, dimensionality reduction, GloVe embeddings [3]
  Style, syntax, complexity and readability (how the content is being discussed): Automated Readability Index (ARI), Flesch-Kincaid readability tests, Coleman-Liau index [4]
  Lexicons (how emotional and subjective the discussion is):
    Biased language: intensifiers, dramatic adverbs, assertive, imperative, report verbs
    Moral foundations: care and harm, fairness and cheating, loyalty and betrayal, authority and subversion, purity and degradation
    Psycholinguistic signals: imperative commands, personal pronouns, emotional language, quotations, and inclusions
[1] http://scikit-learn.org/stable/
[2] https://keras.io/
[3] https://nlp.stanford.edu/projects/glove/
[4] https://github.com/nltk/nltk_contrib/tree/master/nltk_contrib/readability
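
Two of the readability signals listed above, ARI and the Coleman-Liau index, are closed-form formulas over character, word and sentence counts. The sketch below uses the standard published coefficients, but the regex-based sentence and word splitting is a simplifying assumption and is not the `nltk_contrib` implementation the slide cites.

```python
import re


def _counts(text):
    """Return (sentences, words, letters, alphanumeric chars) using naive splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    letters = sum(ch.isalpha() for w in words for ch in w)
    chars = sum(ch.isalnum() for w in words for ch in w)
    return len(sentences), len(words), letters, chars


def automated_readability_index(text):
    """ARI = 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43"""
    n_sent, n_words, _, n_chars = _counts(text)
    if n_sent == 0 or n_words == 0:
        raise ValueError("text must contain at least one word and one sentence")
    return 4.71 * n_chars / n_words + 0.5 * n_words / n_sent - 21.43


def coleman_liau_index(text):
    """CLI = 0.0588 * L - 0.296 * S - 15.8, with L and S per 100 words."""
    n_sent, n_words, n_letters, _ = _counts(text)
    if n_sent == 0 or n_words == 0:
        raise ValueError("text must contain at least one word and one sentence")
    letters_per_100 = 100.0 * n_letters / n_words
    sentences_per_100 = 100.0 * n_sent / n_words
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


# Hypothetical example texts, not drawn from the datasets.
simple = "The cat sat. The dog ran."
dense = "Unsubstantiated allegations circulated internationally without verification."
print(automated_readability_index(simple), automated_readability_index(dense))
```

Both indices rise with longer words and longer sentences, so they capture how the content is written rather than what it says. They would sit alongside the content features as separate numeric inputs to a classifier.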

Deception Strategy Classification Results: Misleading vs. Falsification
The best models are LSTM and MaxEntropy
Falsification strategy is easier to identify than misleading strategy
Deceptive strategies are easier to predict in tweets than in summaries and news
Predictive signals: connotations (summaries); moral foundations, biased language and psycholinguistic cues (news pages); syntax and connotations (tweets)

Deception Type Classification Results: Propaganda vs. Disinformation vs. Hoax
The best performing model is LSTM
Disinformation is easier to predict than propaganda or hoaxes
Deceptive news types (disinformation, propaganda, and hoaxes), unlike deceptive strategies, are more salient and easier to identify in tweets than in news pages
Predictive signals: content (summaries and tweets)

Connotation Analysis: Background
Identify writers’ intent behind digital misinformation by analyzing psycholinguistic signals – moral foundations and connotations extracted from different types of deceptive news

Connotation Analysis Results: Disinformation
[Figure panels: Writer → agent; Writer → theme]
Implications:
  Quantitatively demonstrate how agents and themes of strategic deception vary across deception types
  Qualitatively identify the hidden agenda of content

Linguistic Realizations of Deception
Misleading vs. Falsification
Significant differences in subjective language and moral foundations
Misleading statements are more subjective than falsified statements in summaries and news pages but not tweets
Compared to misleading statements, falsified statements include more:
  Harm+ and Ingroup+ signals in tweets
  Affect terms in tweets
[Figure panel: Tweets]
Implications:
  Build models for factuality assessment without external knowledge
  Improve fact-checking systems by going beyond fake news classification

Summary and Future Work
Predictive signals:
  Content + moral foundations and connotations are more predictive of deception strategies than style and syntax
  Content is the most predictive of deception types
Predictive models: LSTMs achieve higher performance compared to ML models
Deception types: Disinformation is less difficult to predict compared to hoaxes and propaganda
Deception strategies:
  Falsification strategy is easier to infer than misleading strategy
  Content is the most predictive of misleading strategy across all domains
Domains: Deception types, unlike deception strategies, are easier to identify in tweets than in news pages
Slide callouts: "How emotional and subjective the discussion is"; "What is being discussed"

Future Work
  Multilingual, multimodal (text and images) deception classification
  Misinformation propagation and influence (deception types, languages)
  Reactions to deceptive news across platforms: Reddit and Twitter
References:
  Separating Facts from Fiction: Linguistic Models to Classify Suspicious and Trusted News Posts on Twitter. S. Volkova, K. Shaffer, J. Jang and N. Hodas. ACL 2017.
  Truth of Varying Shades: On Political Fact-Checking and Fake News. H. Rashkin, E. Choi, J. Jang, Y. Choi, and S. Volkova. EMNLP 2017.
  Fishing for Clickbaits in Social Images and Texts with Linguistically-Infused Neural Network Models. M. Glenski, E. Ayton, D. Arendt and S. Volkova. Proceedings of Google Clickbait Workshop.

Svitlana Volkova, Senior Research Scientist
Data Sciences and Analytics Group, Computational and Statistical Analytics Division
