Quantifying Deception Propagation on Social Networks

Maria Glenski † and Svitlana Volkova
Data Sciences and Analytics Group, National Security Directorate
† Computer Science and Engineering, University of Notre Dame

Data Collection

282 Twitter accounts previously identified [1] as spreading verified (V) or deceptive news. Deceptive news sources are categorized as clickbait (CB), conspiracy (CS), propaganda (P), or disinformation (D).
Twitter data: we collected all tweets posted in 2016 that directly retweet or mention verified or deceptive sources.
145,688 users with ≥ 5 English-language interactions with deceptive sources.
Each user was classified as a person or an organization using the Humanizr classifier [2].
66,171 person-users with ≥ 200 public tweets available.

Inferring User Attributes

We learned attribute-specific CNN models, initialized with 200-dimensional GloVe embeddings, using annotated data [3], and inferred the attributes of each user from their 200 most recent tweets.

Attribute    AUC ROC
Gender       0.89
Age          0.72
Income       0.72
Education    0.76
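The AUC ROC values reported for the attribute models measure ranking quality: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that definition, not the authors' evaluation code. The scores and labels are made up for illustration, and `auc_roc` is a hypothetical helper name.

```python
def auc_roc(scores, labels):
    """AUC ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Labels are assumed to be 1 (positive class) or 0 (negative class).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for six users (illustrative only).
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.1]
labels = [1, 1, 1, 0, 0, 0]
example_auc = auc_roc(scores, labels)  # 7 of the 9 pairs are ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives a scale for reading the per-attribute values in the table.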
Predicted User Demographics

Summary of attributes and the number of users classified as each demographic by our models:

Attribute    Class                   N
Gender       M: Male                 63,259
             F: Female                2,912
Age          Y: 24 and younger        3,336
             O: 25 and older         62,835
Income       B: Below $35k           12,885
             A: At least $35k        53,286
Education    H: High School          11,807
             D: Bachelor's Degree    54,364

             V         CB        CS        P         D
% Person     91.47     92.33     91.50     93.03     89.76
# Users      83,785    8,367     16,520    57,314    30,456

[Figure axis: Intent to Deceive, across V, CB, CS, P, D]

CNN architecture: Word Sequence Input -> Embedding Layer (200 units) -> Convolutional Layer (100 units) -> Dense Layer (100 units) -> Activation Layer (softmax) -> Attribute Class

[Figure: Distributions of each attribute within each population and all users overall; axes: Population, % Population]

Data collected per source type:

Type     # sources    # tweets      # users posting    # users mentioned
V        182          6,567,002     1,423,227          390,164
CB       11           40,347        19,361             6,002
CS       13           126,246       35,799             9,171
P        26           609,251       233,799            34,532
D        50           3,487,732     292,437            82,638
Total    282          10,819,357    1,784,655          471,967

Who is more likely to propagate?

Mann-Whitney U tests comparing binary features of whether users shared from sources of each type at least once find who is more likely (p < 0.01) to propagate from each type.

Is there some group of users responsible for most of the propagation?

We look at the Palma Ratios for the distributions of retweet frequencies of user-source pairs within each population, where

\( \text{Palma Ratio} = \frac{\text{Top 10\%}}{\text{Bottom 40\%}} \)

If all users share an equal number of tweets, \( \text{Palma Ratio} = \frac{1}{4} \).

[Figure: panels for Gender, Age, Income, and Education across source types V, CB, CS, P, D]

The most active 10% of younger users retweeting disinformation sources share 41.34 times what the least active 40% do.
Disinformation sources are more often propagated by the same top 10% of users among those with higher incomes than among those with lower incomes.
Women retweet conspiracy theory more evenly than men.
Overall, disinformation is propagated most unevenly, followed by conspiracy theory. The biggest differences in behavior also occur with disinformation and conspiracy theory sources.
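The Palma Ratio defined above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: `palma_ratio` is a hypothetical helper, the group sizes are taken by integer division of the number of entries (the poster does not say how fractional group sizes were handled), and the retweet counts are made up.

```python
def palma_ratio(counts):
    """Sum of the top 10% of entries divided by the sum of the bottom 40%.

    `counts` holds one retweet frequency per user-source pair.
    Group sizes use integer division; this rounding choice is an assumption.
    """
    n = len(counts)
    n_top = n // 10            # size of the top 10% group
    n_bottom = (4 * n) // 10   # size of the bottom 40% group
    if n_top == 0 or n_bottom == 0:
        raise ValueError("need at least 10 entries to form both groups")

    ordered = sorted(counts)
    bottom_total = sum(ordered[:n_bottom])
    top_total = sum(ordered[n - n_top:])
    if bottom_total == 0:
        raise ZeroDivisionError("bottom 40% has no retweets")
    return top_total / bottom_total


# Everyone shares equally: the ratio is 1/4, the floor noted on the poster.
equal_sharing = palma_ratio([3] * 10)

# Hypothetical skewed population: one pair accounts for most retweets.
skewed_sharing = palma_ratio([1, 1, 1, 1, 2, 2, 3, 4, 5, 40])
```

In the skewed example the single most active pair contributes 40 retweets against 4 from the bottom four pairs, so the ratio is 10. Values far above 0.25 therefore indicate that a small group of users drives most of the propagation.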
There is always a group of users sharing more than others: all Palma Ratios > 0.25.
Users in different age brackets have the greatest difference in Palma Ratios for verified sources.

Key
M: Male, F: Female
Y: 24 and younger, O: 25 and older
B: Below $35k, A: At least $35k
H: High school, D: Bachelor's Degree

Comparing populations using deltas (first table) and ratios (second table) of the Palma Ratios of each, by attribute.

Deltas:

        V        CB       CS       P        D
M-F     1.39     0.32     5.14     0.54     7.03
Y-O     -2.30    -0.35    -4.30    -0.67    23.19
B-A     0.72, 2.75, 1.02, 26.50 (only four values survive in the transcript; column assignment unclear)
H-D     -0.40, 0.51, 2.86, 24.86 (only four values survive in the transcript; column assignment unclear)

Ratios:

        V       CB      CS      P       D
M/F     1.45    1.23    1.84    1.18    1.39
Y/O     0.49    0.80    0.35    0.81    2.28
B/A     1.07    1.43    1.46    1.3     3.95
H/D     0.91    1.30    1.48    1.21    3.33

References

1. Volkova, S., et al. "Separating Facts from Fiction: Linguistic Models to Classify Suspicious and Trusted News Posts on Twitter." ACL 2017.
2. McCorriston, J., et al. "Organizations Are Users Too: Characterizing and Detecting the Presence of Organizations on Twitter." ICWSM 2015.
3. Volkova, S. and Bachrach, Y. "Inferring Perceived Demographics from User Emotional Tone and User-Environment Emotional Contrast." ACL 2016.

This research was conducted under the Laboratory Directed Research and Development Program at Pacific Northwest National Laboratory, a multi-program national laboratory operated by Battelle for the U.S. Department of Energy.