How Crowdsourcable is Your Task?
Carsten Eickhoff, Arjen P. de Vries
WSDM 2011 Workshop on Crowdsourcing for Search and Data Mining (CSDM 2011), Hong Kong, China, February 9–12, 2011
Outline
The Crowdsourcing Boom
Crowdsourcing, a Tale of Great Romance
A Journey to the Dark Side of Crowdsourcing
Is all Lost?
Conclusions
The Crowdsourcing Boom
Billions of judgements are crowdsourced each year
CrowdFlower: judgement volume doubled (2009–2010)
A significant number of research publications rely on crowdsourcing to create scientific resources... but is it actually reliable?
Crowdsourcing – A Tale of Great Romance
Summer 2008
How do I quickly get a large number of judgements?
Task: message grouping for discourse understanding
Crowdsourcing produced very reliable results
Crowdsourcing – A Tale of Great Romance
Fall 2008
Crowdsourcing has become a standard data source
The excitement wears off
Crowdsourcing – A Tale of Great Romance
A dark and cold day in late autumn 2009
You need judgements for yet another experiment
You get cheated! Again and again...
A Journey to the Dark Side
Task-based overview
What is it that malicious workers do?
Do we have remedies?
A Journey to the Dark Side
Task: closed-class questions
Possible cheat: uniform answering (all yes/no)
Possible cheat: arbitrary answers
Remedy: good gold-standard data helps (see the sketch below)
Pitfall: cheaters who think about the task at hand can cause a lot of trouble (e.g. relevance judgements)
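A minimal sketch (not from the talk) of how such a gold-standard check might work in Python: embed questions with known answers in the HIT and flag workers whose accuracy on them falls below a threshold. The gold answers, field names, and threshold are illustrative assumptions.

    from collections import defaultdict

    GOLD = {"q17": "yes", "q42": "no", "q58": "yes"}  # hypothetical gold answers
    MIN_GOLD_ACCURACY = 0.7

    def flag_cheaters(judgements):
        """judgements: iterable of (worker_id, question_id, answer) tuples."""
        hits = defaultdict(int)   # correct gold answers per worker
        seen = defaultdict(int)   # gold questions answered per worker
        for worker, question, answer in judgements:
            if question in GOLD:
                seen[worker] += 1
                hits[worker] += int(answer == GOLD[question])
        return {w for w in seen if hits[w] / seen[w] < MIN_GOLD_ACCURACY}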
A Journey to the Dark Side
Task: open-class questions
Possible cheat (1): copy and paste standard text
Possible cheat (2): copy and paste domain-specific text
Remedy: (1) is easy to detect, (2) is problematic (see the sketch below)
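A minimal sketch, assuming free-text answers are collected per worker, of how cheat (1) might be caught: flag answers that are near-duplicates of the HIT text or of other workers' answers. It does not help against cheat (2), the problematic case noted above; the threshold and names are illustrative.

    from difflib import SequenceMatcher

    def similarity(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def flag_copy_paste(answers, hit_text, threshold=0.8):
        """answers: list of (worker_id, text); returns suspicious worker_ids."""
        suspicious = set()
        for i, (worker, text) in enumerate(answers):
            if similarity(text, hit_text) > threshold:
                suspicious.add(worker)  # pasted the task text back as the answer
            for other_worker, other_text in answers[i + 1:]:
                if similarity(text, other_text) > threshold:
                    suspicious.update({worker, other_worker})  # near-identical answers
        return suspicious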
A Journey to the Dark Side
Task: internal quality control
Possible cheat: artificially boost your own confidence
Possible cheat: even worse, do so in a network
Remedy: we need a better confidence measure than the prior acceptance rate (see the sketch below)
Pitfall: due to the large scale of HITs it is hard to find a reliable confidence measure
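One possible alternative confidence measure (an assumption, not the authors' proposal): score each worker by how often their labels agree with the majority vote on redundantly judged items, rather than by prior acceptance rate. A minimal Python sketch:

    from collections import Counter, defaultdict

    def agreement_confidence(judgements):
        """judgements: iterable of (worker_id, item_id, label) tuples."""
        by_item = defaultdict(list)
        for worker, item, label in judgements:
            by_item[item].append((worker, label))

        agree = defaultdict(int)
        total = defaultdict(int)
        for votes in by_item.values():
            if len(votes) < 2:
                continue  # need redundant judgements to measure agreement
            majority, _ = Counter(label for _, label in votes).most_common(1)[0]
            for worker, label in votes:
                total[worker] += 1
                agree[worker] += int(label == majority)
        return {w: agree[w] / total[w] for w in total}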
A Journey to the Dark Side
Task: external quality control
Setup: redirect workers to your own site and let them complete the HIT there
Possible cheat: make up a confirmation token
Possible cheat: re-use a genuine token
Possible cheat: claim that you did not receive a token
Remedy: all of the above are easy to detect (see the sketch below)
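A minimal sketch of one way such confirmation tokens could be issued and checked (an assumption; the talk does not prescribe an implementation): sign each worker/HIT pair with a secret key, so made-up tokens fail verification and genuine tokens cannot be redeemed twice.

    import hmac, hashlib

    SECRET_KEY = b"replace-with-a-real-secret"   # hypothetical key
    redeemed = set()                              # tokens already turned in

    def issue_token(worker_id, hit_id):
        msg = f"{worker_id}:{hit_id}".encode()
        sig = hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()[:16]
        return f"{worker_id}:{hit_id}:{sig}"

    def verify_token(token):
        try:
            worker_id, hit_id, sig = token.split(":")
        except ValueError:
            return False                          # malformed / made-up token
        expected = issue_token(worker_id, hit_id).rsplit(":", 1)[1]
        if not hmac.compare_digest(sig, expected):
            return False                          # forged token
        if token in redeemed:
            return False                          # re-used token
        redeemed.add(token)
        return True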
Is all Lost?
Posterior detection and filtering of cheaters works reliably
But we waste resources (money, time, nerves...)
Can we discourage cheaters from doing our HIT in the first place?
Is all Lost?
Which HIT types do cheaters like?
The Summer 2008 HIT hardly attracted any cheaters
The Autumn 2009 one was swamped by them
The Summer task required a lot of creativity, whereas the Autumn one was a straightforward relevance judgement
Is all Lost?
Hypothesis: “If the HIT conveys the impression of requiring creativity, cheaters are less likely to take it.”
2 HIT types:
– Suitability for children
– Standard relevance judgements
Task/Interface Design
Crowd Filtering
Conclusion
The share of malicious workers can be significantly reduced by making your task:
Innovative
Creative
Non-repetitive
Crowd Filtering can help to reduce the share of malicious workers, at the cost of higher completion time
Previous acceptance rate is not a robust predictor of worker reliability
Thank You!
Questions, Remarks, Concerns?
c.eickhoff@tudelft.nl