A Network Science Approach to Fake News Detection on Social Media

Research Methodology Isayas Adhanom

Outline: Overview, Data Collection, Data Analysis, Conclusion, Discussion

Overview

Overview Why fake news detection? There are 2.46 billion social media users.
More than 67% of Americans get news from social media. Many people believe fake news stories. Fake news has a severe impact on consumers and social order.

Overview Contd... Why did we choose Facebook?
Around 2 billion monthly active users. Facebook does not get as much attention as Twitter in fake news detection research. Fake news stories are more prevalent on Facebook than on other social media platforms.

Overview Contd... Fake news stories are getting more engagement than real news stories on Facebook.

Overview Contd... Why use complex networks?
The nature of fake news posts makes them hard to spot using AI, NLP, or other content-based methods alone. We can construct social network graphs, analyze the structure and dynamics of the communities, and draw inferences from them.

Overview Contd... What do we want to accomplish?
We want to model the behaviour of Facebook users around fake news posts. We want to compare this with earlier models of Facebook users' behaviour, and with new models we create from our non-fake news posts. We want to see if the users form echo chambers as proposed in earlier studies. We also want to see what kind of communities the users form.

Data Collection

Data Collection Obtaining fake news posts is challenging for two reasons: categorizing news posts as fake news is highly subjective and ambiguous, and fake news stories are routinely taken down by social media accounts.

Data Collection Contd... Obtaining fake news posts from Facebook is also harder than from Twitter, for two reasons: Facebook has strict data privacy rules. Facebook gives limited access to data; for example, it does not allow you to see friendships.

Data Collection Contd... We have used a data set compiled by BuzzFeed News, in which 2,282 Facebook posts were analyzed and labeled based on their truthfulness. To prepare this dataset, BuzzFeed News selected three large hyperpartisan Facebook pages each from the right and from the left, as well as three large mainstream political news pages. All pages are Facebook-verified pages.

Data Collection Contd... The data was collected over the course of seven weekdays (Sept. 19 to 23 and Sept. 26 and 27). The posts were rated as "mostly true," "mixture of true and false," or "mostly false." Posts that were satirical or opinion-driven, or that otherwise lacked a factual claim, were rated as "no factual content." The posts were also classified as a link, photo, video, or text.

Data Collection Contd... Raters were asked to provide notes and sources to explain their rulings of "mixture of true and false" or "mostly false." If two reviews of the same post did not match, the post was reviewed by a third person. All posts that were rated as false were also reviewed by that same third person.

Data Collection Contd... Out of the 2,282 Facebook posts:
1,145 posts were from mainstream pages, 666 were from hyperpartisan right-wing pages, and 471 were from hyperpartisan left-wing pages. Finally, the data was collected into a CSV file with the link to the original Facebook posts.
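
As a minimal sketch of how the per-category counts above could be checked once the CSV is loaded, the snippet below tallies posts by page category. The column names (`Category`, `Rating`, `Post URL`) and the sample rows are assumptions made up for illustration; the header of the real file may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the BuzzFeed CSV. The column names
# and the rows are invented for illustration, not taken from the dataset.
SAMPLE_CSV = """Category,Rating,Post URL
mainstream,mostly true,https://example.com/post/1
right,mostly false,https://example.com/post/2
left,mixture of true and false,https://example.com/post/3
mainstream,no factual content,https://example.com/post/4
"""


def count_by_category(csv_file):
    """Count posts per page category from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row["Category"] for row in reader)


counts = count_by_category(io.StringIO(SAMPLE_CSV))
print(dict(counts))

# The three reported category counts add up to the reported total.
assert 1145 + 666 + 471 == 2282
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by the opened CSV file.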

Data Collection Contd... In order to extend this dataset and to collect data about the actual users who interacted with these news posts, we need to access historical data on Facebook. The best way to access, collect and store data from social media platforms is generally through Application Programming Interfaces (APIs). An API is a set of clearly defined methods of communication between various software components.

Data Collection Contd... Facebook has strict data privacy policies that keep changing, and they allow you to mine only public data from pages or from users who have authenticated your app. All our news posts were posted on public pages, so we can get data from them. The data we can get is still limited.

Data Collection Contd... The data we can get includes:
The IDs of the users who commented on the posts, and the time of the comments. The IDs, names and reaction types of the users who reacted to the posts. The number of shares. However, we cannot get the friendship status of the users.

Data Collection Contd... We will use Facebook Graph API version 2.10 (the newer API version 2.11 does not allow you to get the IDs or names of the users who engaged with the posts). We will implement our programs against the Facebook Graph API in Python 2.7. We will get JSON data from Facebook and clean it using our own procedures, also implemented in Python 2.7.
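
The collection step described above might look roughly like the sketch below. It is written for Python 3 and the standard library rather than the Python 2.7 setup named on the slide, and it assumes the usual Graph API response shape: a `data` list plus an optional `paging.next` URL. The helper names (`edge_url`, `fetch_json`, `collect_edge`) are hypothetical, and an access token with the right permissions is assumed to be available.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GRAPH_ROOT = "https://graph.facebook.com/v2.10"


def edge_url(post_id, edge, access_token, limit=100):
    """Build the URL for one edge ("reactions" or "comments") of a post."""
    query = urlencode({"access_token": access_token, "limit": limit})
    return "{}/{}/{}?{}".format(GRAPH_ROOT, post_id, edge, query)


def fetch_json(url):
    """Download one page of results and decode it (needs network access)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def collect_edge(first_url, fetch=fetch_json):
    """Follow paging.next links and gather every record found under "data"."""
    records, url = [], first_url
    while url:
        page = fetch(url)
        records.extend(page.get("data", []))
        url = page.get("paging", {}).get("next")
    return records
```

A call such as `collect_edge(edge_url(post_id, "reactions", token))` would then return the raw reaction records for one post. The `fetch` parameter exists so the paging logic can be exercised offline with canned pages.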

Example of our JSON data (reactions).

Example of our JSON data (comments).

Data Collection Contd... All the data we collect and use will be publicly available data. We will use only public data on Facebook that can be accessed through the Facebook Graph API. We will redact our data (especially the usernames and user IDs) to hide any private data that might be used to identify individual users. We will abide by all terms, conditions and privacy policies of Facebook and BuzzFeed News.
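
The slides do not say how the redaction will be done. One possible approach, shown here only as an assumption, is to drop names and replace each user ID with a keyed hash, so the same user still maps to the same node in the graph without exposing the original ID. The function names and the record layout (`id`, `name`, `type`) are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(user_id, secret_key):
    """Map a user ID to a stable token that does not reveal the original."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def redact_record(record, secret_key):
    """Return a copy with the name dropped and the ID replaced by a token."""
    redacted = {k: v for k, v in record.items() if k not in ("id", "name")}
    redacted["user"] = pseudonymize(record["id"], secret_key)
    return redacted


# Made-up record for illustration only.
example = redact_record(
    {"id": "1000001", "name": "Jane Doe", "type": "LIKE"},
    secret_key=b"keep-this-key-private",
)
print(example)
```

The secret key would have to be stored separately from the published data, since anyone holding it could recompute the tokens from known IDs.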

Data Analysis

Data Analysis We will clean our data and denote the pages and the user engagements as nodes. The nodes can be categorized into two disjoint sets (pages and user engagements). This dataset can be represented as a bipartite graph. Thus, we will represent our data as a bipartite graph of Facebook pages and the users who engaged with them.
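
A minimal sketch of this representation, assuming the cleaned data has been reduced to (item, user) engagement pairs, where an item is a page or a post. The function name and the sample pairs are made up for illustration.

```python
from collections import defaultdict


def build_bipartite(engagements):
    """Build both sides of a bipartite graph from (item_id, user_id) pairs."""
    item_to_users = defaultdict(set)
    user_to_items = defaultdict(set)
    for item_id, user_id in engagements:
        item_to_users[item_id].add(user_id)
        user_to_items[user_id].add(item_id)
    return dict(item_to_users), dict(user_to_items)


# Hypothetical engagement pairs; links only ever join an item to a user.
pairs = [("p1", "u1"), ("p1", "u2"), ("p2", "u2"), ("p2", "u3")]
item_to_users, user_to_items = build_bipartite(pairs)
print(item_to_users)
print(user_to_items)
```

Keeping both adjacency maps makes it cheap to look up either who engaged with an item or which items a user engaged with.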

Bipartite Graph (from the book Network Science by A.-L. Barabási)

Data Analysis Contd... We will create a bipartite projection of the posts-users graph. In the projection, the nodes are the posts, and two posts are connected if at least one user engaged with both of them. The weight of a link is determined by the number of common users who engaged with the two posts.
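
The projection rule can be sketched as follows, assuming the engagement data is available as a mapping from each post to the set of users who engaged with it. The function name and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def project_onto_posts(post_to_users):
    """Return {(post_a, post_b): number of users who engaged with both}."""
    # Invert the mapping so each user lists the posts they engaged with.
    user_to_posts = {}
    for post, users in post_to_users.items():
        for user in users:
            user_to_posts.setdefault(user, set()).add(post)
    # Every pair of posts sharing a user gains one unit of weight.
    weights = Counter()
    for posts in user_to_posts.values():
        for a, b in combinations(sorted(posts), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Made-up example: p1 and p2 share two users, p2 and p3 share one.
sample = {"p1": {"u1", "u2"}, "p2": {"u1", "u2", "u3"}, "p3": {"u3"}}
projection = project_onto_posts(sample)
print(projection)
```

Posts with no common user get no link at all, which matches the "at least one user" condition.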

Data Analysis Contd... We will use the Fast Greedy (FG) algorithm to detect communities (clusters) in our graph. We will compare the communities formed with the partisan data from our original dataset. We will try to see if the communities are formed along partisan lines, or if they form other patterns. We will use R and Pajek for analysing and modelling our data.
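
The slides plan to run this step in R and Pajek. Purely as an illustration, and assuming "Fast Greedy" refers to greedy agglomerative modularity optimization (the Clauset-Newman-Moore approach), the idea can be sketched in plain Python. This naive version omits the efficient data structures that make the real algorithm fast, and it stops as soon as no merge increases modularity. The function name and the example graph are hypothetical.

```python
def greedy_modularity_communities(edges):
    """Merge communities greedily while a merge still raises modularity.

    edges: dict mapping (u, v) pairs to positive weights (undirected,
    no self-loops). Returns a list of sets of nodes.
    """
    two_m = 2.0 * sum(edges.values())
    members = {}  # community label -> set of nodes
    a = {}        # community label -> share of total edge-end weight
    e = {}        # frozenset({c, d}) -> weight between c and d, over 2m
    for (u, v), w in edges.items():
        for node in (u, v):
            members.setdefault(node, {node})
            a[node] = a.get(node, 0.0) + w / two_m
        key = frozenset((u, v))
        e[key] = e.get(key, 0.0) + w / two_m

    while True:
        best_gain, best_pair = 0.0, None
        for key, e_ij in e.items():
            i, j = tuple(key)
            gain = 2.0 * (e_ij - a[i] * a[j])  # modularity change if merged
            if gain > best_gain:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None:
            break
        i, j = best_pair
        members[i] |= members.pop(j)
        a[i] += a.pop(j)
        merged = {}
        for key, value in e.items():
            if key == frozenset((i, j)):
                continue  # this weight is now internal to community i
            relabeled = frozenset(i if c == j else c for c in key)
            merged[relabeled] = merged.get(relabeled, 0.0) + value
        e = merged
    return list(members.values())


# Made-up projection: two tightly linked triangles joined by a weak link.
example_edges = {
    ("a", "b"): 3, ("b", "c"): 3, ("a", "c"): 3,
    ("d", "e"): 3, ("e", "f"): 3, ("d", "f"): 3,
    ("c", "d"): 1,
}
communities = greedy_modularity_communities(example_edges)
print(sorted(sorted(c) for c in communities))
```

In the setting of the slides, the nodes would be posts and the weights would be the numbers of common users from the projection; the resulting groups could then be compared with each post's partisan label.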

Data Analysis Contd... We will implement these methods by dividing our data into fake news posts and real news posts. We will use this as a control measure in our experiment. We will then see if there are differences in user behaviour around fake news posts and real news posts.
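
A rough sketch of the split, under a loudly stated assumption: the slides do not say which ratings count as fake, so here "mostly false" and "mixture of true and false" are treated as fake, "mostly true" as real, and "no factual content" is left out. The record layout, function names and sample posts are hypothetical.

```python
def split_by_rating(posts,
                    fake_ratings=("mostly false", "mixture of true and false"),
                    real_ratings=("mostly true",)):
    """Split posts into (fake, real); other ratings are left out.

    Which ratings count as fake is an assumption, not stated in the slides.
    """
    fake = [p for p in posts if p["rating"] in fake_ratings]
    real = [p for p in posts if p["rating"] in real_ratings]
    return fake, real


def mean_users_per_post(posts):
    """Average number of distinct engaging users per post."""
    if not posts:
        return 0.0
    return sum(len(p["users"]) for p in posts) / len(posts)


# Made-up posts for illustration only.
sample_posts = [
    {"id": "p1", "rating": "mostly true", "users": {"u1", "u2"}},
    {"id": "p2", "rating": "mostly false", "users": {"u1", "u2", "u3", "u4"}},
    {"id": "p3", "rating": "mixture of true and false", "users": {"u5", "u6"}},
    {"id": "p4", "rating": "no factual content", "users": {"u7"}},
]
fake_posts, real_posts = split_by_rating(sample_posts)
print(mean_users_per_post(fake_posts), mean_users_per_post(real_posts))
```

The same graph construction and community detection would then be run separately on each group, with the real news group serving as the control.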

Conclusion

Conclusion Get data from BuzzFeed News.
Extend the data set using data from Facebook. Build bipartite graph projections from our dataset. Detect communities around fake news posts and real news posts. Analyze our results to detect unusual user behavior.

Any questions?
