Published by Damon Joshua Marshall. Modified over 9 years ago.
1
Enquiring Minds: Early Detection of Rumors in Social Media from Enquiry Posts
Zhe Zhao, Paul Resnick, Qiaozhu Mei
Presentation Group 2
2
Outline
Introduction
Background Study
Approach for Detection
Experimental Setup
Evaluation
Conclusion
3
WHAT IS A RUMOR?
4
A rumor is a controversial, fact-checkable statement
5
A rumor is a controversial, fact-checkable statement
Malaysia Airlines MH370 is missing
Malaysia Airlines MH370 crashed
6
A rumor is a controversial, fact-checkable statement
Recreational marijuana should be made legal
Recreational marijuana becomes legal in Michigan
Malaysia Airlines MH370 is missing
Malaysia Airlines MH370 crashed
7
Introduction
Not every post on social media can be taken as a factual claim.
The broad success of online social media has created fertile soil for the emergence and fast spread of rumors.
This paper proposes an automated tool to identify potential rumors.
8
Spread of Rumor
Oh my god is this real? Breaking: Two Explosions in the White House and Barack Obama is injured
Is this true? Or hacked account? Breaking: Two Explosions in the White House and Barack Obama is injured
Is this real or hacked? Breaking: Two Explosions in the White House and Barack Obama is injured
Is this legit? Breaking: Two Explosions in the White House and Barack Obama is injured
9
Detecting Rumor
Rumors are detected through the key phrases people use when reacting to them: “Is this true?” “Really?” “What?”
The key insight is that some people who are exposed to a rumor, before deciding whether to believe it or not, will take a step of information enquiry to seek more information or to express skepticism without asserting specifically that it is false.
The paper identifies a set of regular expressions that define the set of such tweets, called signal tweets.
Building on the signal tweets, it proposes an algorithm for identifying newly emerging, controversial topics that is scalable to massive streams of tweets.
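A regular-expression filter for signal tweets could be sketched roughly as follows. The patterns are illustrative guesses built from the phrases quoted on the slides ("Is this true?", "Really?", "What?"); they are not the paper's actual pattern list, and the name `is_signal_tweet` is hypothetical.

```python
import re

# Illustrative enquiry patterns (assumption: modeled on the phrases quoted
# on the slides, not taken from the paper's pattern table).
SIGNAL_PATTERNS = [
    re.compile(r"\bis (this|that|it) (true|real|legit)\b", re.IGNORECASE),
    re.compile(r"\breally\?", re.IGNORECASE),
    re.compile(r"\bwhat\?", re.IGNORECASE),
]


def is_signal_tweet(text):
    """Return True if the tweet matches at least one enquiry pattern."""
    return any(p.search(text) for p in SIGNAL_PATTERNS)


tweets = [
    "Is this true? Or hacked account? Breaking: Two Explosions in the White House",
    "Breaking: Two Explosions in the White House and Barack Obama is injured",
]
signal_tweets = [t for t in tweets if is_signal_tweet(t)]
```

Because each tweet is checked independently against a handful of patterns, a filter of this kind can run over a stream without keeping state.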
10
Related Work
Detection problems in social media
Work on detecting rumors has started only in recent years. Existing approaches use sharing, retweeting, and trending behavior to decide whether a topic is a rumor.
Question asking in social media
Another detection feature used in related work is question asking. Mendoza et al. found on a small set of cases that false tweets were questioned much more than confirmed truths.
Detection using question marks
Previous work has shown that only one third of tweets with question marks are real questions, and not all questions are related to rumors.
11
Problem Statement
Rumor cluster: we define a rumor cluster R as a group of social media posts that are either declaring, questioning, or denying the same fact claim, s, which may be true or false. Let S be the set of posts declaring s, E the set of posts questioning s, and C the set of posts denying s; then R = S ∪ E ∪ C. We say s is a candidate rumor if S ≠ ∅ and E ∪ C ≠ ∅.
The paper’s objective is to minimize the delay from the time when the first tweet about the rumor is posted to the detection time.
A rumor is both fact-checkable and controversial.
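The set definitions above translate almost directly into code. This is a minimal sketch in which posts are represented by made-up integer IDs and the function names are hypothetical.

```python
def rumor_cluster(S, E, C):
    """R = S ∪ E ∪ C: posts declaring, questioning, or denying claim s."""
    return S | E | C


def is_candidate_rumor(S, E, C):
    """s is a candidate rumor if S is non-empty and E ∪ C is non-empty."""
    return len(S) > 0 and len(E | C) > 0


# Post IDs below are invented for illustration.
declaring = {101, 102, 103}
questioning = {104}
denying = set()

R = rumor_cluster(declaring, questioning, denying)
candidate = is_candidate_rumor(declaring, questioning, denying)
```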
12
Approach for Detection
13
Detection of rumors
1. Identify Signal Tweets
2. Identify Signal Clusters
3. Detect Statements
4. Capture Non-signal Tweets
5. Rank Candidate Rumor Clusters
14
Identify Signal Tweets
If we want to detect rumors, the first thing we should know is what reactions to rumors look like. The authors characterize signal tweets either as verifications of a piece of factual knowledge, e.g. “According to the Mayan Calendar, does the world end on Dec 16th, 2013?”, or as corrections (debunks), e.g. “This news is true!”
15
What we need is more than theory
Applying the Porter stemmer and the Chi-squared test to tweets, including 3,423 tweets labeled as verification or correction, we extract the patterns that make good signals.
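As a rough illustration of how a Chi-squared score singles out terms that separate signal tweets from the rest, the sketch below scores each word from a 2x2 contingency table. The toy tweets and labels are invented, Porter stemming is left out to keep the example dependency-free, and the function name is hypothetical.

```python
import re


def chi2_scores(texts, labels):
    """Chi-squared score of each word against the signal/non-signal label.

    labels[i] is True when texts[i] is a verification or correction tweet.
    """
    n = len(texts)
    docs = [set(re.findall(r"\w+", t.lower())) for t in texts]
    n_pos = sum(1 for y in labels if y)
    scores = {}
    for term in set().union(*docs):
        n11 = sum(1 for d, y in zip(docs, labels) if term in d and y)
        n10 = sum(1 for d, y in zip(docs, labels) if term in d and not y)
        n01 = n_pos - n11              # signal tweets without the term
        n00 = (n - n_pos) - n10        # non-signal tweets without the term
        denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
        scores[term] = n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0
    return scores


texts = ["is this true bomb", "is this true fire", "bomb at library", "fire at school"]
labels = [True, True, False, False]
scores = chi2_scores(texts, labels)
```

The score is symmetric, so words that occur mostly in non-signal tweets also score high; a real selection step would keep only terms over-represented in the signal class.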
16
Identify Signal Clusters
What is a Signal Cluster?
After a rumor tweet emerges, people may follow up by retweeting it or by posting a new tweet containing similar information, thus forming a group, or cluster.
Is it true? Two explosions in the White House and Barack Obama is injured!
What? An eight-year-old girl died at the Boston Marathon explosion.
The shocking news is tested to be wrong!
17
How do we do it?
Tweets are grouped with a connected-component clustering algorithm, using Jaccard similarity between tweets and MinHash to estimate that similarity efficiently.
What??!! Two Explosions in the White House and Barack Obama is Injured in head.
Is it true?? Two Explosions in the White House and Barack Obama is Injured on arm.
Two Explosions in the White House and Barack Obama is not Injured.
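The clustering step can be sketched as follows: link any two tweets whose Jaccard similarity exceeds a threshold, then take connected components with a union-find structure. This is a simplified stand-in: it compares all pairs exactly instead of using MinHash, works on lower-cased word sets, and the 0.6 threshold for this step is an assumption.

```python
import re
from itertools import combinations


def word_set(text):
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_tweets(tweets, threshold=0.6):
    """Connected components of the 'similarity > threshold' graph."""
    sets = [word_set(t) for t in tweets]
    parent = list(range(len(tweets)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Exact all-pairs comparison; MinHash would replace this at scale.
    for i, j in combinations(range(len(tweets)), 2):
        if jaccard(sets[i], sets[j]) > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(tweets)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


tweets = [
    "What??!! Two Explosions in the White House and Barack Obama is Injured in head.",
    "Is it true?? Two Explosions in the White House and Barack Obama is Injured on arm.",
    "Two Explosions in the White House and Barack Obama is not Injured.",
    "Funny cat video of the day",
]
clusters = cluster_tweets(tweets)  # lists of tweet indices
```

Connected components let two tweets share a cluster even if they are only linked through a third tweet, which suits rumors that mutate as they spread.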
18
Detect Statement
At this point we have a few clusters of potential rumors, but we do not yet know what each one claims. Our goal is the rumor content, not the enquiry pattern. Which statement should we extract from each cluster?
19
A way out
Just pick out the statement that appears in more than 80% of the tweets in the cluster.
Why 80%? Such a statement has a higher probability of being the rumor!
What??!! Two Explosions in the White House and Barack Obama is Injured
Is it true?? Two Explosions in the White House and Barack Obama is Injured
Two Explosions in the White House and Barack Obama is Injured
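One way to make the 80% rule concrete is to keep the word sequences shared by more than 80% of the tweets in a cluster and stitch them together in order. The sketch below assumes word 3-grams as the unit and picks one reference tweet to preserve word order; both choices, and the function names, are assumptions made for illustration.

```python
import re
from collections import Counter


def ngrams(text, n=3):
    words = re.findall(r"\w+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def extract_statement(cluster, n=3, min_share=0.8):
    """Keep the words covered by n-grams found in > min_share of the tweets."""
    if not cluster:
        return ""
    doc_freq = Counter()
    for tweet in cluster:
        doc_freq.update(set(ngrams(tweet, n)))
    frequent = {g for g, c in doc_freq.items() if c / len(cluster) > min_share}

    # Reference tweet: the one containing the most frequent n-grams.
    ref = max(cluster, key=lambda t: sum(g in frequent for g in ngrams(t, n)))
    words = re.findall(r"\w+", ref.lower())
    keep = [False] * len(words)
    for i in range(len(words) - n + 1):
        if tuple(words[i:i + n]) in frequent:
            for k in range(i, i + n):
                keep[k] = True
    return " ".join(w for w, k in zip(words, keep) if k)


cluster = [
    "What??!! Two Explosions in the White House and Barack Obama is Injured",
    "Is it true?? Two Explosions in the White House and Barack Obama is Injured",
    "Two Explosions in the White House and Barack Obama is Injured",
]
statement = extract_statement(cluster)
```

The enquiry prefixes ("What??!!", "Is it true??") drop out because each appears in only one tweet, leaving the shared claim.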
20
Compare Non-Signal Tweets
Recall that rumor clusters were detected using signal tweets only. Tweets that are neither verifications nor corrections can still carry the rumor. We therefore match the extracted statements against the non-signal tweets, again using Jaccard similarity: if the score is greater than 0.6, the tweet is considered a match.
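The matching step might look like the sketch below, which pulls non-signal tweets into a cluster when their Jaccard similarity to the extracted statement exceeds 0.6. Comparing lower-cased word sets is an assumption about the text representation, and `match_non_signal` is a hypothetical name.

```python
import re


def word_set(text):
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_non_signal(statement, tweets, threshold=0.6):
    """Return the tweets whose similarity to the statement exceeds threshold."""
    target = word_set(statement)
    return [t for t in tweets if jaccard(target, word_set(t)) > threshold]


statement = "two explosions in the white house and barack obama is injured"
non_signal = [
    "Breaking: Two Explosions in the White House and Barack Obama is injured",
    "Obama visits Boston",
]
matched = match_non_signal(statement, non_signal)
```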
21
Rank candidate rumor clusters
So far we have obtained several rumor clusters, and each cluster stands for one rumor statement. The output, however, should surface the most likely rumor.
Rank by popularity? No! A popular cluster may just be, e.g., a funny picture or a touching video.
Features for ranking rumor clusters:
Percentage of signal tweets
Entropy ratio
Tweet lengths
Retweets
URLs
Hashtags
@ Mentions
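A few of the listed features can be computed per cluster as in the sketch below. The exact feature definitions are assumptions: the entropy ratio is taken here as the word entropy of the signal tweets divided by that of all tweets in the cluster, and retweets, URLs, hashtags, and mentions are measured as the share of tweets containing them.

```python
import math
import re
from collections import Counter


def word_entropy(texts):
    counts = Counter(w for t in texts for w in re.findall(r"\w+", t.lower()))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def cluster_features(tweets, is_signal):
    """Feature dictionary for one non-empty cluster."""
    n = len(tweets)
    if n == 0:
        raise ValueError("cluster must contain at least one tweet")
    signal = [t for t, s in zip(tweets, is_signal) if s]
    all_entropy = word_entropy(tweets)

    def share(pattern):
        return sum(1 for t in tweets if re.search(pattern, t)) / n

    return {
        "signal_share": len(signal) / n,
        "entropy_ratio": word_entropy(signal) / all_entropy if all_entropy else 0.0,
        "avg_length": sum(len(t) for t in tweets) / n,
        "retweet_share": share(r"^RT\b"),
        "url_share": share(r"https?://"),
        "hashtag_share": share(r"#\w+"),
        "mention_share": share(r"@\w+"),
    }


tweets = [
    "Is this true? #boston",
    "RT @user: bomb at library http://t.co/x",
    "bomb at library",
    "bomb at library",
]
features = cluster_features(tweets, [True, False, False, False])
```

A feature dictionary like this would then be fed to a learned ranker rather than sorted by cluster size alone.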
22
Experimental Setup
23
Data Sets
BOSTON MARATHON BOMBING (high-profile event): two bombs exploded at the finish line of the annual Boston Marathon on April 15th, 2013. The data set contains 30,340,218 unique tweets.
GARDENHOSE (random sample): a tweet stream collected over a random month of the year (November 1 to November 30, 2013). The data set contains 1,242,186,946 tweets.
24
Baselines and Variants of Methods
Variants for identifying signal tweets (candidate rumors are ranked purely by popularity, i.e. the number of tweets in the cluster):
1. Trending Topics
2. Hashtag Tracking
3. Corrections Only
4. Enquiries and Corrections
Variants for ranking the candidate rumor clusters (both enquiry and correction tweets are used as signals):
5. SVM ranking
6. Decision tree ranking
25
Effectiveness of Enquiry Signals
Precision of Candidate Rumor Clusters
Figure: precision of rumor detection using different signals. Candidate rumors are ranked by popularity only. Maximum number of output rumor clusters: 10 per hour for BOSTON and 50 per day for GARDENHOSE.
26
Effectiveness of Enquiry Signals
Earliness of Detection
Figure: earliness of detection compared to Enquiries + Corrections. Enquiry signals help to detect rumors hours earlier.
27
Ranking Candidate Rumor Clusters
Precision@N of different ranking methods.
Precision@N is the percentage of real rumors among the top N candidate rumor clusters output by a method.
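The metric defined above is straightforward to compute once each output cluster has been labeled as a real rumor or not. In this minimal sketch the labels are invented, and dividing by the number of clusters actually returned when fewer than N exist is an assumption.

```python
def precision_at_n(ranked_is_rumor, n):
    """Share of real rumors among the top-n ranked candidate clusters."""
    top = ranked_is_rumor[:n]
    if not top:
        return 0.0
    return sum(1 for r in top if r) / len(top)


# Labels in ranked order, best-ranked cluster first (made-up example).
ranked = [True, False, True, True, False]
p_at_3 = precision_at_n(ranked, 3)
p_at_5 = precision_at_n(ranked, 5)
```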
28
Effectiveness of Enquiry Signals
To verify that the ranking algorithm is not overfitting to a single data set, we also applied the decision tree trained on 7 days of labeled results from the GARDENHOSE data set to rank rumor clusters detected hourly in the BOSTON data set. When rumor clusters are ranked by the decision tree, one third of the top 50 clusters are real rumors.
29
Efficiency of Framework
Pipeline: filtering of tweets, then clustering, then potential rumor statements.
The cost is significantly reduced compared to an approach that first generates trending topics and then identifies rumors.
30
Time Comparison Trending Topics: Clustering Hashtag Tracking:
Filtering & Clustering This Method: Filtering, Clustering then retrieving back Same clustering and ranking implementation was used except filtering tweets with enquiry and tweets were not retrieved back after clustering.
31
Tracking Rumor Using Enquiry Method
Tracking detected rumors about Boston Marathon Bombing
32
Conclusion
The method capitalizes on verification questions, which appear sooner and so facilitate early detection.
It clusters only those tweets that contain enquiry patterns, extracts the statements, and uses them to pull back in the rest of the non-signal tweets.
It is robust even with more than 100 million tweets.
Future work:
Have humans label signals to enable iterative improvement.
Improve the filtering of enquiry and correction signals by training a classifier.
33
Questions ?