A Network Science Approach to Fake News Detection on Social Media

Presentation transcript:

A Network Science Approach to Fake News Detection on Social Media Research Methodology Isayas Adhanom

Outline Overview Data Collection Data Analysis Conclusion Discussion

Overview

Overview Why fake news detection? There are 2.46 billion social media users. More than 67% of Americans get news from social media. Many people believe fake news stories. Fake news has a severe impact on consumers and social order.

Overview Contd... Why did we choose Facebook? It has around 2 billion monthly active users. It does not get as much attention as Twitter in fake news detection research. Fake news stories are more prevalent on Facebook than on other social media platforms.

Overview Contd... Fake news stories are getting more engagement than real news stories on Facebook.

Overview Contd... Why use complex networks? The nature of fake news posts makes them hard to spot using AI, NLP or other content-based methods alone. We can construct social network graphs, analyze the structure and dynamics of the communities, and draw inferences from them.

Overview Contd... What do we want to accomplish? We want to model the behaviour of Facebook users around fake news posts. We want to compare this with earlier models of Facebook users' behaviour, and with new models we create from our non-fake news posts. We want to see if the users form echo chambers, as proposed in earlier studies. We also want to see what kind of communities the users form.

Data Collection

Data Collection Obtaining fake news posts is challenging for two reasons: Categorizing news posts as fake news is highly subjective and ambiguous. Fake news stories are routinely taken down by social media accounts.

Data Collection Contd... Obtaining fake news posts from Facebook is also harder than from Twitter, for two reasons: Facebook has strict data privacy rules. Facebook gives limited access to data; for example, it does not allow you to see friendships.

Data Collection Contd... We have used a data set compiled by BuzzFeed, in which 2,282 Facebook posts were analyzed and labeled based on their truthfulness. To prepare this dataset, BuzzFeed News selected three large hyperpartisan Facebook pages each from the right and from the left, as well as three large mainstream political news pages. All pages are Facebook-verified pages.

Data Collection Contd... The data was collected over the course of seven weekdays (Sept. 19 to 23 and Sept. 26 and 27). The posts were rated as “mostly true,” “mixture of true and false,” or “mostly false.” Posts that were satirical or opinion-driven, or that otherwise lacked a factual claim, were rated as “no factual content.” The posts were also classified as a link, photo, video, or text.

Data Collection Contd... Raters were asked to provide notes and sources to explain their rulings of “mixture of true and false” or “mostly false.” If two reviews of the same post did not match, the post was reviewed by a third person. All the posts that were rated as false were also reviewed by that same third person.

Data Collection Contd... Out of the 2,282 Facebook posts: 1,145 posts were from mainstream pages, 666 were from hyperpartisan right-wing pages, and 471 were from hyperpartisan left-wing pages. Finally, the data was collected into a CSV file with links to the original Facebook posts.

Data Collection Contd...

Data Collection Contd... To extend this dataset and to collect data about the actual users who interacted with these news posts, we need to access historical data on Facebook. The best way to access, collect and store data from social media platforms is generally through Application Programming Interfaces (APIs). An API is a set of clearly defined methods of communication between various software components.

Data Collection Contd... Facebook has strict data privacy policies that keep changing, and that allow you to mine only public data from pages or from users who have authenticated your app. All our news posts were posted on public pages, so we can get data from them. The data we can get is still limited.

Data Collection Contd... The data we can get includes: The IDs of the users who commented on the posts, and the time of the comments. The IDs, names and reaction types of the users who reacted to the posts. The number of shares. However, we cannot get the friendship status of the users.

Data Collection Contd... We will use Facebook Graph API version 2.10 (the new API version 2.11 does not allow you to get the IDs or names of the users who engaged with the posts). We will implement our programs using the Facebook Graph API in Python 2.7. We will get JSON data from Facebook and clean it using our own procedures implemented in Python 2.7.
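The collection step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' program: it is written for Python 3 rather than Python 2.7, the helper names `reactions_url` and `parse_reactions` are hypothetical, and the sample response is made up to resemble a page of reactions (a `data` list of `id`, `name`, `type` entries plus a `paging` object). No network request is made here.

```python
import json
from urllib.parse import urlencode

# Base URL for the API version named in the slides.
GRAPH_URL = "https://graph.facebook.com/v2.10"


def reactions_url(post_id, access_token, limit=100):
    """Build the URL for the reactions edge of a public post (hypothetical helper)."""
    query = urlencode({"access_token": access_token, "limit": limit})
    return "{}/{}/reactions?{}".format(GRAPH_URL, post_id, query)


def parse_reactions(payload):
    """Return (user_id, name, reaction_type) tuples and the next-page URL, if any."""
    rows = [(r["id"], r.get("name"), r.get("type")) for r in payload.get("data", [])]
    next_url = payload.get("paging", {}).get("next")
    return rows, next_url


# Made-up response body, shaped like one page of reactions (an assumption).
sample = json.loads("""
{
  "data": [
    {"id": "1001", "name": "User A", "type": "LIKE"},
    {"id": "1002", "name": "User B", "type": "ANGRY"}
  ],
  "paging": {"cursors": {"before": "abc", "after": "def"}}
}
""")

rows, next_url = parse_reactions(sample)
```

In a real run, the caller would fetch each URL, parse the body, and keep following `next_url` until it is missing, which signals the last page.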

Example of our JSON data (reactions).

Example of our JSON data (comments).

Data Collection Contd... All the data we collect and use will be publicly available data. We will use only public data on Facebook that can be accessed through the Facebook Graph API. We will redact our data (especially the usernames and user IDs) to hide any private data that might be used to identify individual users. We will abide by all terms, conditions and privacy policies of Facebook and BuzzFeed News.
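One possible way to carry out the redaction is sketched below. The slides do not say how the redaction is done, so this is an assumption: names are dropped and user IDs are replaced by a keyed hash (HMAC-SHA256), so the same user still maps to the same node in the graph. The function names and the key are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(user_id, secret_key):
    """Map a user ID to a stable pseudonym using keyed hashing (HMAC-SHA256)."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def redact_record(record, secret_key):
    """Return a copy of a reaction record with the name dropped and the ID replaced."""
    return {
        "user": pseudonymize(record["id"], secret_key),
        "type": record.get("type"),
    }


# Hypothetical key; it must be stored separately from the published dataset.
key = b"replace-with-a-private-random-key"
record = {"id": "1001", "name": "User A", "type": "LIKE"}
redacted = redact_record(record, key)
```

A keyed hash is used instead of a plain hash because numeric user IDs are easy to enumerate, so an unkeyed digest could be reversed by brute force.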

Data Analysis

Data Analysis We will clean our data and denote the pages and the user engagements as nodes. The nodes can be categorized into two disjoint sets (pages and user engagements), so this dataset can be represented as a bipartite graph. Thus, we will represent our data as a bipartite graph of Facebook pages and the users who engaged with them.

Bipartite Graph (from the book Network Science by A.-L. Barabási)
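A bipartite graph of this kind can be held in two adjacency maps, one per node set. The sketch below is only an illustration with made-up records; the function name `build_bipartite` and the `(post_id, user_id)` record layout are assumptions, and the same structure works if the first element is a page ID instead.

```python
from collections import defaultdict


def build_bipartite(engagements):
    """Build adjacency sets for both node sets from (post_id, user_id) pairs."""
    users_of_post = defaultdict(set)
    posts_of_user = defaultdict(set)
    for post_id, user_id in engagements:
        # Every edge joins one node from each set; there are no post-post or user-user edges.
        users_of_post[post_id].add(user_id)
        posts_of_user[user_id].add(post_id)
    return dict(users_of_post), dict(posts_of_user)


# Made-up engagement records for illustration.
engagements = [
    ("post_1", "u1"), ("post_1", "u2"),
    ("post_2", "u2"), ("post_2", "u3"),
    ("post_3", "u4"),
]
users_of_post, posts_of_user = build_bipartite(engagements)
```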

Data Analysis Contd... We will create a bipartite projection of the posts-users graph. In the projection, the nodes are the posts, and two posts are connected if at least one user engaged with both of them. The weight of a link is the number of common users who engaged with the two posts.
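The projection rule can be sketched as follows: for every user, each pair of posts that user engaged with gains one unit of weight, so the final weight of a link equals the number of users the two posts share. The function name and the sample records are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def project_onto_posts(engagements):
    """Project (post_id, user_id) pairs onto posts, weighting links by shared users."""
    posts_of_user = defaultdict(set)
    for post_id, user_id in engagements:
        posts_of_user[user_id].add(post_id)

    weights = Counter()
    for posts in posts_of_user.values():
        # Each user contributes 1 to every pair of posts they engaged with.
        for a, b in combinations(sorted(posts), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Made-up engagement records for illustration.
engagements = [
    ("p1", "u1"), ("p1", "u2"), ("p1", "u3"),
    ("p2", "u2"), ("p2", "u3"),
    ("p3", "u3"), ("p3", "u4"),
]
weights = project_onto_posts(engagements)
```

Here `p1` and `p2` share users `u2` and `u3`, so their link has weight 2, while `p3` is tied to each of the others only through `u3`.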

Data Analysis Contd... We will use the Fast Greedy (FG) algorithm to detect communities (clusters) in our graph. We will compare the communities formed with the partisan data from our original dataset. We will try to see if the communities are formed along partisan lines, or if they form other patterns. We will use R and Pajek for analysing and modelling our data.
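The idea behind greedy modularity clustering can be illustrated with a small, unoptimized sketch. It is not the R or Pajek implementation the slides refer to: it starts with every node in its own community and repeatedly merges the pair of connected communities with the largest modularity gain, ΔQ = w_ij/m − d_i·d_j/(2m²), stopping when no merge gives a positive gain. Here m is the total edge weight, w_ij the weight between communities i and j, and d_i the summed weighted degree of community i. The function name and the toy graph are hypothetical, and the input is assumed to have no self-loops.

```python
from collections import defaultdict


def greedy_modularity(edges):
    """Greedily merge communities while a merge increases modularity.

    edges maps an undirected node pair (a, b), with a != b, to a positive weight.
    """
    total = float(sum(edges.values()))  # m: total edge weight
    degree = defaultdict(float)         # summed weighted degree per community
    between = defaultdict(float)        # edge weight between pairs of communities
    members = {}                        # community label -> set of nodes
    for (a, b), w in edges.items():
        for node in (a, b):
            members.setdefault(node, {node})
            degree[node] += w
        between[frozenset((a, b))] += w

    while True:
        best_gain, best_pair = 0.0, None
        for pair, w in between.items():
            i, j = tuple(pair)
            gain = w / total - degree[i] * degree[j] / (2 * total ** 2)
            if gain > best_gain + 1e-12:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None:
            break  # no merge improves modularity any further
        i, j = best_pair
        # Merge community j into community i and relabel its links.
        members[i] |= members.pop(j)
        degree[i] += degree.pop(j)
        merged = defaultdict(float)
        for pair, w in between.items():
            relabeled = frozenset(i if c == j else c for c in pair)
            if len(relabeled) == 2:  # drop links that became internal
                merged[relabeled] += w
        between = merged
    return sorted(sorted(c) for c in members.values())


# Toy projected graph: two triangles of posts joined by a single link.
edges = {
    ("p1", "p2"): 1, ("p2", "p3"): 1, ("p1", "p3"): 1,
    ("p3", "p4"): 1,
    ("p4", "p5"): 1, ("p5", "p6"): 1, ("p4", "p6"): 1,
}
communities = greedy_modularity(edges)
```

On this toy graph the two triangles come out as separate communities. For real data, an optimized library implementation would be preferable, since this version rescans every community pair at each step.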

Data Analysis Contd... We will implement these methods by dividing our data into fake news posts and real news posts. We will use this as a control measure in our experiment. We will then see if there are differences in user behaviour around fake news posts and real news posts.

Conclusion

Conclusion Get data from BuzzFeed News. Extend the data set using data from Facebook. Build bipartite graph projections from our dataset. Detect communities around fake news posts and real news posts. Analyze our results to detect unusual user behavior.

Any questions?