Presentation is loading. Please wait.

Presentation is loading. Please wait.

Disinformation on the Web:

Similar presentations


Presentation on theme: "Disinformation on the Web:"— Presentation transcript:

1 Disinformation on the Web:
Impact, Characteristics and Detection of Wikipedia Hoaxes Srijan Kumar Univ. of Maryland Robert West Stanford Univ. Jure Leskovec Stanford Univ.

2 Web: Source of information

3 Web: Source of false information

4 Types of false information
Misinformation honest mistake Disinformation deliberate lie to mislead Hoax “deliberately fabricated falsehood made to masquerade as truth” Wikipedia

5 The free encyclopedia that anyone can edit
Why Wikipedia? Freely accessible Large reach Major source of information for many Easy to add (false) information The free encyclopedia that anyone can edit

6 Hoaxes on Wikipedia

7 What are the hoaxes like?
Today’s talk Impact of hoaxes Characteristics of hoaxes Detection of hoaxes Quantify their impact? What are the hoaxes like? Can we find them?

8 Data: Wikipedia Hoaxes
Hoax article vs hoax facts

9 Data: Wikipedia Hoaxes
Hoax article vs hoax facts 21,218 hoax articles Hoax lifecycle:

10 What are the hoaxes like?
Today’s talk Impact of hoaxes Characteristics of hoaxes Detection of hoaxes Quantify their impact? What are the hoaxes like? Can we find them?

11 Impact of hoaxes “The worst hoaxes are those which
(a) last for a long time, (b) receive significant traffic, (c) are relied upon by credible news media.” Jimmy Wales on Quora

12 “The worst hoaxes are those which (a) last for a long time”
Impact of hoaxes “The worst hoaxes are those which (a) last for a long time” 0.99

13 “The worst hoaxes are those which (b) receive significant traffic”
Impact of hoaxes “The worst hoaxes are those which (b) receive significant traffic” 10 100 500

14 Impact of hoaxes 1.08 active inlinks “The worst hoaxes are those which
(c) are relied upon by credible news media” 1.08 active inlinks

15 Today’s talk Impact of hoaxes Characteristics of hoaxes Detection
Most hoaxes are caught soon, but some hoaxes are impactful! What are the hoaxes like? Can we find them?

16 Wrongly flagged temporarily flagged
Successful hoax pass patrol survive for a month viewed frequently Failed hoax flagged and deleted during patrol Hoax Legitimate articles never flagged Wrongly flagged temporarily flagged Non-hoax

17 Characteristics of hoaxes
Appearance: Link-network: Support: Editor: how the article looks how the article connects how other articles refer to it how the article creator looks

18 Characteristics of hoaxes
Appearance: Link-network: Support: Editor: how the article looks how the article connects how other articles refer to it how the article creator looks Features: Plain-text length Plain-text-to-markup ratio Wiki-link density Web-link density Hoax articles are longer, but they mostly have plain text and have lesser web and wiki links.

19 Characteristics of hoaxes
Appearance: Link-network: Support: Editor: hoaxes mostly have text and how the article connects how other articles refer to it how the article creator looks few references. CC = 0 incoherent article CC > 0 coherent article Legitimate articles are more coherent than successful hoaxes

20 Characteristics of hoaxes
Appearance: Link-network: Support: Editor: hoaxes mostly have text and hoaxes have incoherent wikilinks. how other articles refer to it how the article creator looks few references. Features: Number of prior mentions Time since first mention Creator of first mention Hoax mentions are less in number, more recently created, and mostly created by IP addresses or article creator

21 Characteristics of hoaxes
Appearance: Link-network: Support: Editor: hoaxes mostly have text and hoaxes have incoherent wikilinks. hoaxes have few, recent, suspicious mentions. how the article creator looks few references. Features: Creator’s age Creator’s experience Hoax creators are more recently registered, and have lesser editing experience.

22 Today’s talk Impact of hoaxes Characteristics of hoaxes Detection
Most hoaxes are caught soon, but some hoaxes are impactful! Hoaxes are different from non-hoaxes in many respects! Can we find them?

23 Will a hoax get past patrol?
Detection of hoaxes AUC = 98% Editor and Network features Is an article a hoax? AUC = 86% Editor and support features AUC = 71% Appearance features Will a hoax get past patrol? Is an article flagged as hoax really one?

24 We discovered new hoaxes!
Flagged by us, deleted by Wikipedia administrators Steve Moertel Popcorn entrepreneur 6 years 11 months! Maurice Foxell Children’s book author and Knight Commander of Royal Victorian Order 1 year 7 months! More at cs.umd.edu/~srijan/hoax

25 Human Guessing Experiment
What makes a hoax credible? Can readers identify hoaxes? 64 public successful hoaxes Pair with similar legitimate articles 320 random hoax/non-hoax pairs x 10 raters on Mech Turk Results 50% 66% 86% Random Human Classifier If a hoax “looks” like a genuine Wikipedia article, it is assumed to be credible. Accurate detection needs non-appearance features.

26 Conclusion Impact of hoaxes Characteristics of hoaxes Detection
Most hoaxes are caught soon, but some hoaxes are impactful! Hoaxes are different from non-hoaxes in many respects! Non-appearance features are important to detect hoaxes!

27 Thanks! srijan@cs.umd.edu cs.umd.edu/~srijan/hoax
hoax articles vs hoax facts


Download ppt "Disinformation on the Web:"

Similar presentations


Ads by Google