Download presentation
Presentation is loading. Please wait.
1
Disinformation on the Web:
Impact, Characteristics and Detection of Wikipedia Hoaxes Srijan Kumar Univ. of Maryland Robert West Stanford Univ. Jure Leskovec Stanford Univ.
2
Web: Source of information
3
Web: Source of false information
4
Types of false information
Misinformation honest mistake Disinformation deliberate lie to mislead Hoax “deliberately fabricated falsehood made to masquerade as truth” Wikipedia
5
The free encyclopedia that anyone can edit
Why Wikipedia? Freely accessible Large reach Major source of information for many Easy to add (false) information The free encyclopedia that anyone can edit
6
Hoaxes on Wikipedia
7
What are the hoaxes like?
Today’s talk Impact of hoaxes Characteristics of hoaxes Detection of hoaxes Quantify their impact? What are the hoaxes like? Can we find them?
8
Data: Wikipedia Hoaxes
Hoax article vs hoax facts
9
Data: Wikipedia Hoaxes
Hoax article vs hoax facts 21,218 hoax articles Hoax lifecycle:
10
What are the hoaxes like?
Today’s talk Impact of hoaxes Characteristics of hoaxes Detection of hoaxes Quantify their impact? What are the hoaxes like? Can we find them?
11
Impact of hoaxes “The worst hoaxes are those which
(a) last for a long time, (b) receive significant traffic, (c) are relied upon by credible news media.” Jimmy Wales on Quora
12
“The worst hoaxes are those which (a) last for a long time”
Impact of hoaxes “The worst hoaxes are those which (a) last for a long time” 0.99
13
“The worst hoaxes are those which (b) receive significant traffic”
Impact of hoaxes “The worst hoaxes are those which (b) receive significant traffic” 10 100 500
14
Impact of hoaxes 1.08 active inlinks “The worst hoaxes are those which
(c) are relied upon by credible news media” 1.08 active inlinks
15
Today’s talk Impact of hoaxes Characteristics of hoaxes Detection
Most hoaxes are caught soon, but some hoaxes are impactful! What are the hoaxes like? Can we find them?
16
Wrongly flagged temporarily flagged
Successful hoax pass patrol survive for a month viewed frequently Failed hoax flagged and deleted during patrol Hoax Legitimate articles never flagged Wrongly flagged temporarily flagged Non-hoax
17
Characteristics of hoaxes
Appearance: Link-network: Support: Editor: how the article looks how the article connects how other articles refer to it how the article creator looks
18
Characteristics of hoaxes
Appearance: Link-network: Support: Editor: how the article looks how the article connects how other articles refer to it how the article creator looks Features: Plain-text length Plain-text-to-markup ratio Wiki-link density Web-link density Hoax articles are longer, but they mostly have plain text and have lesser web and wiki links.
19
Characteristics of hoaxes
Appearance: Link-network: Support: Editor: hoaxes mostly have text and how the article connects how other articles refer to it how the article creator looks few references. CC = 0 incoherent article CC > 0 coherent article Legitimate articles are more coherent than successful hoaxes
20
Characteristics of hoaxes
Appearance: Link-network: Support: Editor: hoaxes mostly have text and hoaxes have incoherent wikilinks. how other articles refer to it how the article creator looks few references. Features: Number of prior mentions Time since first mention Creator of first mention Hoax mentions are less in number, more recently created, and mostly created by IP addresses or article creator
21
Characteristics of hoaxes
Appearance: Link-network: Support: Editor: hoaxes mostly have text and hoaxes have incoherent wikilinks. hoaxes have few, recent, suspicious mentions. how the article creator looks few references. Features: Creator’s age Creator’s experience Hoax creators are more recently registered, and have lesser editing experience.
22
Today’s talk Impact of hoaxes Characteristics of hoaxes Detection
Most hoaxes are caught soon, but some hoaxes are impactful! Hoaxes are different from non-hoaxes in many respects! Can we find them?
23
Will a hoax get past patrol?
Detection of hoaxes AUC = 98% Editor and Network features Is an article a hoax? AUC = 86% Editor and support features AUC = 71% Appearance features Will a hoax get past patrol? Is an article flagged as hoax really one?
24
We discovered new hoaxes!
Flagged by us, deleted by Wikipedia administrators Steve Moertel Popcorn entrepreneur 6 years 11 months! Maurice Foxell Children’s book author and Knight Commander of Royal Victorian Order 1 year 7 months! More at cs.umd.edu/~srijan/hoax
25
Human Guessing Experiment
What makes a hoax credible? Can readers identify hoaxes? 64 public successful hoaxes Pair with similar legitimate articles 320 random hoax/non-hoax pairs x 10 raters on Mech Turk Results 50% 66% 86% Random Human Classifier If a hoax “looks” like a genuine Wikipedia article, it is assumed to be credible. Accurate detection needs non-appearance features.
26
Conclusion Impact of hoaxes Characteristics of hoaxes Detection
Most hoaxes are caught soon, but some hoaxes are impactful! Hoaxes are different from non-hoaxes in many respects! Non-appearance features are important to detect hoaxes!
27
Thanks! srijan@cs.umd.edu cs.umd.edu/~srijan/hoax
hoax articles vs hoax facts
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.