Crowdsourcing research data UMBC ebiquity, 2010-03-01.

Presentation transcript:

Crowdsourcing research data, UMBC ebiquity, 2010-03-01

Overview
– How we got into this
– Crowdsourcing defined
– Amazon Mechanical Turk
– CrowdFlower
– Two examples: annotating tweets for named entities, evaluating word clouds
– Conclusions

Motivation
– Needed to train a named entity recognizer for Twitter statuses
– Need human judgments on thousands of tweets to identify named entities of type PER, ORG, or LOC
– Example: "anand drove to boston to see the red sox play" → anand [PER], boston [LOC], red sox [ORG] (a sketch of the annotation format follows)
– NAACL 2010 Workshop: Creating Speech and Language Data With Amazon's Mechanical Turk; shared task papers: what can you do with $100?
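A minimal sketch of how one annotated tweet might be represented when collecting these judgments; the record layout and the "O" tag for non-entity tokens are assumptions for illustration, not the format used in the talk.

```python
# Hypothetical record for one annotated tweet (illustrative only).
# Each token gets one of PER, ORG, LOC, or O (no entity).
tweet = "anand drove to boston to see the red sox play"

annotation = {
    "tokens": tweet.split(),
    "labels": ["PER", "O", "O", "LOC", "O", "O", "O", "ORG", "ORG", "O"],
}

# Sanity check: exactly one label per token.
assert len(annotation["tokens"]) == len(annotation["labels"])
```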

Crowdsourcing
– Crowdsourcing = crowd + outsourcing
– Tasks normally performed by employees are outsourced via an open call to a large community
– Some examples: the Netflix Prize; InnoCentive (solve R&D challenges); the DARPA Network Challenge

Web Crowdsourcing
– An ideal fit for the Web
– Lots of custom examples: the ESP Game (now Google Image Labeler), reCAPTCHA, Galaxy Zoo (amateur astronomers classify galaxy images)
– General crowdsourcing services: Amazon Mechanical Turk, CrowdFlower

Amazon Mechanical Turk
– An Amazon service since 2005
– Some tasks can't be done well by computers, and some require human judgments
– Amazon's name for a unit of work: Human Intelligence Task (HIT)
– Requesters define tasks and upload data; workers (aka Turkers) do the tasks and get paid
– HITs are generally low value, e.g., $0.02 each or $4-$5/hour; Amazon takes a 10% fee
– Example HITs: adding keywords to images, cropping images, spam identification (e.g., generating a test set to train a neural network), subtitling and speech-to-text, adult content analysis, facial recognition, proofreading, OCR correction/verification, annotating text
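A minimal sketch of the requester workflow using the AWS SDK for Python (boto3), which postdates this talk; the title, reward, annotation URL, and timing values are placeholder assumptions, and the sandbox endpoint is used so nothing is actually paid.

```python
import boto3

# Sandbox endpoint so no real money moves while testing (assumed setup).
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# ExternalQuestion pointing at a hypothetical annotation form we would host.
question_xml = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.org/annotate?tweet_id=42</ExternalURL>
  <FrameHeight>450</FrameHeight>
</ExternalQuestion>"""

hit = mturk.create_hit(
    Title="Label person, organization, and location names in a tweet",
    Description="Mark each word in a short Twitter message as PER, ORG, LOC, or none.",
    Keywords="annotation, named entities, twitter",
    Reward="0.02",                    # dollars per assignment, as on the slide
    MaxAssignments=3,                 # redundant judgments for quality control
    LifetimeInSeconds=3 * 24 * 3600,  # how long the HIT stays visible to workers
    AssignmentDurationInSeconds=600,  # time a worker has once the HIT is accepted
    Question=question_xml,
)
print("HIT id:", hit["HIT"]["HITId"])
```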

The Original Mechanical Turk
– The Turk: the first chess-playing "automaton" hoax
– Constructed in 1770; toured the US and Europe for over 80 years
– Played Napoleon Bonaparte and Benjamin Franklin

MTurk quality control
– How do you ensure the work delivered by Turkers is of good quality?
– Define qualifications, give a pre-test, mix in tasks with known answers
– Requesters can reject answers manually or automatically; e.g., with multiple assignments a worker isn't paid unless two other people give the same result (see the sketch below)
– Turkers have no recourse beyond rating requesters
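A minimal sketch of the automatic check described above: with redundant assignments, approve a worker's answer only when at least two other workers gave the same result. The data layout and field names are assumptions for illustration.

```python
from collections import Counter

# answers[hit_id] -> list of (worker_id, answer) pairs; illustrative data only.
answers = {
    "hit-001": [("w1", "ORG"), ("w2", "ORG"), ("w3", "ORG"), ("w4", "LOC")],
    "hit-002": [("w1", "PER"), ("w5", "LOC"), ("w6", "ORG")],
}

def review(assignments, agreement=2):
    """Approve an answer if at least `agreement` other workers gave the same one."""
    counts = Counter(answer for _, answer in assignments)
    decisions = {}
    for worker, answer in assignments:
        others_agreeing = counts[answer] - 1   # exclude the worker's own vote
        decisions[worker] = "approve" if others_agreeing >= agreement else "reject"
    return decisions

for hit_id, assignments in answers.items():
    print(hit_id, review(assignments))
# hit-001: w1, w2, w3 approved (three agree), w4 rejected; hit-002: all rejected (no agreement)
```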

AMT demo: annotating named entities

CrowdFlower
– A commercial effort by Dolores Labs
– Sits on top of AMT
– Real-time results
– Choose from multiple worker channels, such as AMT and Samasource
– Quality control measures

What motivates Mechanical Turkers? Adapted from the Dolores Labs blog.

CrowdFlower Markup Language (CML)
– Interactive form builder
– CML tags for radio buttons, checkboxes, multi-line text, etc.

Analytics

Per worker stats

Gold Standards
– Ensure quality and prevent scammers from submitting bad results
– Interface to monitor gold stats
– If a worker makes a mistake on a known result, they are notified and shown the mistake
– Error rates without a gold standard are more than twice as high as with one
– Helps in two ways: improves worker accuracy, and lets CrowdFlower determine who is giving accurate answers (a per-worker accuracy sketch follows)
– Adapted from
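A minimal sketch of the gold-standard idea (not CrowdFlower's actual implementation): score each worker on questions with known answers and keep only those above a threshold. The data layout and the 0.7 cutoff are assumptions.

```python
# gold[question_id] -> known correct answer; judgments are (worker, question, answer).
gold = {"q1": "ORG", "q2": "LOC", "q3": "PER"}
judgments = [
    ("w1", "q1", "ORG"), ("w1", "q2", "LOC"), ("w1", "q3", "PER"),
    ("w2", "q1", "ORG"), ("w2", "q2", "ORG"), ("w2", "q3", "ORG"),
]

def worker_accuracy(judgments, gold):
    """Fraction of gold questions each worker answered correctly."""
    correct, seen = {}, {}
    for worker, question, answer in judgments:
        if question in gold:
            seen[worker] = seen.get(worker, 0) + 1
            if answer == gold[question]:
                correct[worker] = correct.get(worker, 0) + 1
    return {w: correct.get(w, 0) / n for w, n in seen.items()}

accuracy = worker_accuracy(judgments, gold)
trusted = {w for w, acc in accuracy.items() if acc >= 0.7}  # threshold is an assumption
print(accuracy)   # {'w1': 1.0, 'w2': 0.333...}
print(trusted)    # {'w1'}
```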

Conclusion
– Ask us after spring break how it went
– You might find AMT useful for collecting annotations or judgments for your research
– $25-$50 can go a long way

AMT demo: which is the better word cloud?