Spam? No, thanks!
Panos Ipeirotis – New York University
ProPublica, April 1st, 2010
(Disclaimer: No jokes included)

“A Computer Scientist in a Business School”
Panos Ipeirotis, Introduction
New York University, Stern School of Business

Example: Build an Adult Web Site Classifier
• Need a large number of hand-labeled sites
• Get people to look at sites and classify them as: G (general), PG (parental guidance), R (restricted), X (porn)
Cost/speed statistics:
• Undergrad intern: 200 websites/hr at $15/hr (7.5¢ per site)
• MTurk: 2500 websites/hr at $12/hr (roughly 0.5¢ per site)

Bad news: Spammers!
Worker ATAMRO447HWJQ labeled X (porn) sites as G (general audience).

Improve Data Quality through Repeated Labeling
• Get multiple, redundant labels for each site, using multiple workers
• Pick the label that wins the majority vote
• The probability of being correct increases with the number of workers
• The probability of being correct increases with the quality of the workers
1 worker: 70% correct. 11 workers: 93% correct. (A quick sanity check of these numbers appears in the sketch below.)
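As a sanity check on these numbers, here is a minimal sketch (mine, not the talk's; `majority_vote_accuracy` is a hypothetical helper name) that computes the probability that a majority of n independent workers, each correct with probability p, produces the right label on a binary task:

```python
from math import comb

def majority_vote_accuracy(p: float, n: int) -> float:
    """P(a strict majority of n independent workers is correct),
    binary task, each worker correct with probability p;
    odd n avoids ties."""
    needed = n // 2 + 1  # votes needed for a strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(needed, n + 1))

print(majority_vote_accuracy(0.70, 1))   # 0.70  -> "1 worker, 70% correct"
print(majority_vote_accuracy(0.70, 11))  # ~0.92 -> close to the slide's 93%
```

The binomial model above gives about 92% for 11 workers at 70% accuracy; the slide's 93% presumably reflects slightly different assumptions (e.g., multi-class labels or estimated worker quality).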

But Majority Voting Is Expensive
Single-vote statistics:
• MTurk: 2500 websites/hr, cost $12/hr
• Undergrad: 200 websites/hr, cost $15/hr
11-vote statistics:
• MTurk: 227 websites/hr (2500 ÷ 11), cost $12/hr
• Undergrad: 200 websites/hr, cost $15/hr

Using Redundant Votes, We Can Infer Worker Quality
• Look at our spammer friend ATAMRO447HWJQ together with the other 9 workers on the same sites
• Our “friend” ATAMRO447HWJQ mainly marked sites as G. Obviously a spammer…
• From this comparison we can compute error rates for each worker
Error rates for ATAMRO447HWJQ:
• P[X → X] = 9.847%    P[X → G] = 90.153%
• P[G → X] = 0.053%    P[G → G] = 99.947%
(A sketch of how such error rates can be estimated follows below.)
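The error rates above can be estimated by comparing each worker's answers to a consensus built from the other workers. Below is a single-pass simplification of that idea (my own sketch, not the talk's code; the full approach iterates between re-estimating labels and error rates, in the spirit of the Dawid & Skene EM algorithm this line of work builds on):

```python
from collections import Counter, defaultdict

def estimate_error_rates(worker_labels, consensus, classes):
    """One worker's confusion matrix P[true -> assigned], estimated by
    treating consensus labels (e.g., the majority vote of the other
    workers) as a stand-in for the truth."""
    counts = defaultdict(Counter)  # counts[true_label][assigned_label]
    for item, assigned in worker_labels.items():
        if item in consensus:
            counts[consensus[item]][assigned] += 1
    rates = {}
    for true in classes:
        total = sum(counts[true].values())
        rates[true] = {c: (counts[true][c] / total if total else 0.0)
                       for c in classes}
    return rates

# Toy run: a worker who answers G no matter what the consensus says.
labels = {"site1": "G", "site2": "G", "site3": "G", "site4": "G"}
consensus = {"site1": "X", "site2": "X", "site3": "G", "site4": "G"}
print(estimate_error_rates(labels, consensus, ["G", "X"]))
# {'G': {'G': 1.0, 'X': 0.0}, 'X': {'G': 1.0, 'X': 0.0}}
```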

Rejecting Spammers, and the Benefits
• Random answers have an error rate of 50%
• Average error rate for ATAMRO447HWJQ: 45.2%, barely better than random
• P[X → X] = 9.847%    P[X → G] = 90.153%
• P[G → X] = 0.053%    P[G → G] = 99.947%
Action: REJECT and BLOCK
Results:
• Over time you block all spammers
• Spammers learn to avoid your HITs
• You can decrease redundancy, as the quality of the remaining workers is higher
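Given a worker's estimated error rates, the "average error rate" on the slide is the prior-weighted probability of assigning a wrong label. A sketch assuming equal class priors (the helper name is mine, not the talk's):

```python
def average_error_rate(rates, priors=None):
    """Prior-weighted probability that the worker's label differs from
    the true class; ~0.5 on a binary task means the worker is
    indistinguishable from random guessing."""
    classes = list(rates)
    priors = priors or {c: 1.0 / len(classes) for c in classes}
    return sum(priors[t] * p
               for t in classes
               for label, p in rates[t].items() if label != t)

atamro = {"X": {"X": 0.09847, "G": 0.90153},
          "G": {"X": 0.00053, "G": 0.99947}}
print(average_error_rate(atamro))  # ~0.451, in line with the slide's 45.2%
```

(The small gap to 45.2% presumably comes from the talk using estimated, rather than equal, class priors.)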

After Rejecting Spammers, Quality Goes Up
• Spam keeps quality down; without spam, workers are of higher quality
• Less redundancy is needed for the same quality
• Same quality of results for lower cost
With spam: 1 worker 70% correct, 11 workers 93% correct.
Without spam: 1 worker 80% correct, 5 workers 94% correct. (Checked below with the earlier sketch.)
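Plugging the no-spam numbers into the earlier `majority_vote_accuracy` sketch reproduces the slide's figures:

```python
print(majority_vote_accuracy(0.80, 1))  # 0.80  -> 1 worker, 80% correct
print(majority_vote_accuracy(0.80, 5))  # ~0.942 -> the slide's 94%
```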

Correcting Biases
• Classifying sites as G, PG, R, X (PG abbreviated as P below)
• Sometimes workers are careful but biased
• This worker systematically classifies G → P and P → R
• Average error rate for ATLJIK76YH1TF: 45.0%. Is ATLJIK76YH1TF a spammer?
Error rates for worker ATLJIK76YH1TF:
• P[G → G] = 20.0%    P[G → P] = 80.0%    P[G → R] = 0.0%      P[G → X] = 0.0%
• P[P → G] = 0.0%     P[P → P] = 0.0%     P[P → R] = 100.0%    P[P → X] = 0.0%
• P[R → G] = 0.0%     P[R → P] = 0.0%     P[R → R] = 100.0%    P[R → X] = 0.0%
• P[X → G] = 0.0%     P[X → P] = 0.0%     P[X → R] = 0.0%      P[X → X] = 100.0%

Correcting Biases (continued)
• A biased but consistent worker still conveys information; we can decode their labels back toward the true class
• For ATLJIK76YH1TF, we simply need to compute the “non-recoverable” error rate: the part of the error that no such decoding can undo (technical details omitted)
• Non-recoverable error rate for ATLJIK76YH1TF: 9%, far better than the 45.0% average error rate suggests
(Error rates as on the previous slide.)
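One way to see why a biased-but-consistent worker is valuable is to invert their confusion matrix with Bayes' rule. The sketch below illustrates the principle (it is my illustration, not the talk's exact "non-recoverable error rate" computation, whose details the slide omits): under the estimated matrix and uniform priors, a P answer from this worker pins the true class to G exactly, while an R answer leaves a 50/50 split between P and R; that residual ambiguity is the part that cannot be recovered.

```python
def posterior(rates, priors, assigned):
    """P[true class | the worker assigned this label], via Bayes' rule."""
    joint = {t: priors[t] * rates[t].get(assigned, 0.0) for t in rates}
    z = sum(joint.values())
    return {t: (v / z if z else priors[t]) for t, v in joint.items()}

rates = {  # estimated error rates for ATLJIK76YH1TF, from the slide
    "G": {"G": 0.2, "P": 0.8, "R": 0.0, "X": 0.0},
    "P": {"G": 0.0, "P": 0.0, "R": 1.0, "X": 0.0},
    "R": {"G": 0.0, "P": 0.0, "R": 1.0, "X": 0.0},
    "X": {"G": 0.0, "P": 0.0, "R": 0.0, "X": 1.0},
}
uniform = {c: 0.25 for c in rates}
print(posterior(rates, uniform, "P"))  # true class is G with certainty
print(posterior(rates, uniform, "R"))  # P and R split 50/50: unrecoverable
```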

Too Much Theory? An open-source implementation is available at:
Input:
 – Labels from Mechanical Turk
 – Costs of incorrect labelings (e.g., X → G is costlier than G → X)
Output:
 – Corrected labels
 – Worker error rates
 – Ranking of workers according to their quality
• Alpha version, more improvements to come!
• Suggestions and collaborations welcome!

Thank you! Questions?
“A Computer Scientist in a Business School”