Introduction to Labor Marketplaces: Taskcn Uichin Lee KAIST KSE KSE801: Human Computation and Crowdsourcing.


Category Rewards: >= 1000 CNY Rewards: CNY

Task Categories
Design
– Logo / VI design
– Graphic design
– Software interface design
– Design of buildings
– Brochure
– Three-dimensional modeling
– Product identification / packaging
Web Site
– Web design / production
– Site planning
– Web application development
– Flash animation
– Construction site as a whole
– Search engine optimization
Writing
– Named / slogan
– Technical / application writing
– Event planning
– Business plans / tenders
– Literature writing / creative
– Translation and other writing
Programming
– Applications
– Scripts / tools
– Database development
– Mobile / embedded development
– System management
Multimedia
– PPT presentation / courseware
– Video capture / editing
– Photography / photo post
– Audio / audio processing
– Multimedia data collection

[Screenshot labels: Signed up / Submitted; Remaining Time; Reward; Task Classification]

20% commission (regular user) vs. 18% commission (gold user)
Winner takes all (single bid)
1001 viewed; 38 signed up; 27 submitted

Below: viewing submissions (all 27 submissions)
Only “gold users” can hide their submissions

Taskcn Papers
– Crowdsourcing and Knowledge Sharing: Strategic User Behavior on Taskcn, Jiang Yang, Lada A. Adamic, Mark S. Ackerman, EC 2008
– Competing to Share Expertise: the Taskcn Knowledge Sharing Community, Jiang Yang, Lada A. Adamic, Mark S. Ackerman, ICWSM 2008
– The Networks of Markets: Online Services for Workers and Job Providers (Taskcn), MSR-TR 2010

Skill and Workload vs. Reward
Human-coded variables: skill and workload
– Skill: minimum skill required to complete a task
– Workload: average time to finish a task
– An ordinal scale is used for rating
Two raters evaluated 157 randomly selected tasks in the design category
– Raters do not know the reward of a task
[Table: Spearman’s rank correlation coefficient among reward, minimum skill, and workload]
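Spearman's rank correlation, used above to relate reward to the rated skill and workload, is the Pearson correlation computed on ranks. The following is a minimal sketch under that definition, with tied values given the average of their ranks; the function names and the sample ratings are hypothetical and are not taken from the Taskcn data.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example: rewards (CNY) and ordinal skill ratings for five tasks.
rewards = [100, 200, 300, 500, 1000]
skill = [1, 2, 2, 3, 4]
rho = spearman(rewards, skill)
```

Because only ranks enter the computation, the measure suits ordinal ratings such as the skill and workload scales described on this slide.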

Taskcn: Stats
Data set: 4 years of data: 17,000 tasks + 1.7M submissions
RMB/CNY (Chinese Yuan): 5 CNY ~ $1

Open Tasks over Time (Weekly Bins)

Offered Reward per Task (dev: programming) (CNY)

Worker Characterization
3 groups (based on how many submissions were made)
– At least 10 attempts (submissions), at least 50 attempts, and all workers
[Figure: average revenue per submission (CNY), with 100 CNY marked]

Worker Characterization
The number of submissions per task follows a power-law distribution
[Figure: number of submissions per task (CCDF)]
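A CCDF plot like the one on this slide shows, for each value x, the fraction of tasks with at least x submissions. The sketch below computes that empirical CCDF; the function name and the sample counts are hypothetical. Plotted on log-log axes, a power-law distribution appears as a roughly straight line.

```python
from collections import Counter


def empirical_ccdf(samples):
    """Return sorted (x, P(X >= x)) pairs for the distinct values in samples."""
    n = len(samples)
    counts = Counter(samples)
    ccdf = []
    remaining = n  # number of samples with value >= current x
    for x in sorted(counts):
        ccdf.append((x, remaining / n))
        remaining -= counts[x]
    return ccdf


# Hypothetical submission counts for eight tasks.
submissions_per_task = [1, 1, 1, 2, 2, 5, 12, 40]
points = empirical_ccdf(submissions_per_task)
```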

Worker Characterization Joint distribution of workers’ submissions across different categories

Market Segmentation
Individual worker behavior:
– A typical worker tends to focus submissions on a specific range of rewards.
– Specifically, a typical worker submits most often to tasks within a narrow range of rewards and attempts higher-reward tasks less and less frequently as the reward grows.
Collective worker behavior:
– When workers are viewed as a community, however, higher rewards tend to attract a larger number of submissions.

Individual Behavior
Histogram of submissions by the top 10 workers (left fig.)
Experienced workers have a narrow reward range
– Each worker has a unique mode (= the reward that occurs most frequently): 1000, 500, 300, 200, ...
[Figure axes: reward (CNY) vs. fraction of submissions]
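The per-worker histogram described above can be sketched as follows: count one worker's submissions by task reward, normalize the counts to fractions, and read off the mode. The function name and the sample rewards are hypothetical.

```python
from collections import Counter


def reward_profile(rewards):
    """Return (mode_reward, {reward: fraction of submissions}) for one worker."""
    counts = Counter(rewards)
    total = len(rewards)
    fractions = {reward: count / total for reward, count in counts.items()}
    mode_reward, _ = counts.most_common(1)[0]  # most frequently chosen reward
    return mode_reward, fractions


# Hypothetical rewards (CNY) of the tasks one worker submitted to.
worker_rewards = [500, 500, 500, 300, 1000]
mode, fractions = reward_profile(worker_rewards)
```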

Collective Behavior
The number of submissions per task (across all workers) increases as the associated reward increases
– Due to the large number of workers who made only a few attempts and never came back to Taskcn (heavy tail: # submissions per worker)

Winning as Incentive to Continue
A high share of registered users never attempted any task (89%)
– June 2006 – May 2007 (EC 2008)
– 66,182 registered users
It appears that people want to avoid the futility of their efforts (like lotteries?)
Winning experience is an important incentive:
– First attempt: 2307 won vs. 169,456 others failed
– The winner group has attempted more trials than the loser group
– Cox proportional hazard analysis: 19% lower probability of stopping after each subsequent attempt

User’s Prestige Network Community expertise network (CEN): people’s expertise can be measured by structural prestige

Centrality Metrics
Calculate “centrality” metrics of a worker
– Degree centrality: sum of the weights of the out-edges of worker u
– Eigenvector centrality: steady-state visit probability of worker u when a random walker traverses the normalized graph
– Closeness centrality: the inverse of the average length of a shortest path that originates from worker u and terminates at worker v (for every worker v in the graph)
– Betweenness centrality: the sum of the fraction of shortest paths between every pair of workers that pass through worker u
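The first two metrics above can be sketched in a few lines. This is an illustrative sketch, not the method used in the Taskcn papers: the graph is a hypothetical dict of weighted out-edges, the edge direction (loser to winner) is an assumption, and the random walk uses a PageRank-style damping factor (also an assumption) so that the steady state exists even when some workers have no out-edges.

```python
def out_degree_centrality(graph, u):
    """Sum of the weights of the out-edges of worker u."""
    return sum(graph.get(u, {}).values())


def pagerank(graph, damping=0.85, iterations=100):
    """Approximate steady-state visit probabilities by power iteration.

    graph maps each worker to {neighbor: edge weight}. Out-edge weights are
    normalized per worker; workers without out-edges spread their probability
    evenly over all workers (an assumption of this sketch).
    """
    nodes = set(graph)
    for neighbors in graph.values():
        nodes.update(neighbors)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            out_edges = graph.get(u, {})
            total = sum(out_edges.values())
            if total == 0:
                for v in nodes:
                    new_rank[v] += damping * rank[u] / n
            else:
                for v, weight in out_edges.items():
                    new_rank[v] += damping * rank[u] * weight / total
        rank = new_rank
    return rank


# Hypothetical network: workers "a" and "b" both lost to worker "c".
graph = {"a": {"c": 1.0}, "b": {"c": 2.0}}
ranks = pagerank(graph)
```

In this toy network "c" receives the highest score, which matches the intuition that a worker who beats others gains prestige.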

User’s Prestige Network
When the same two users compete twice:
– same winner 77% of the time (compared to 1/2 by chance)
When the same two users compete 3 times:
– same winner all 3 times in 56% of the cases (compared to 1/4 by chance)
Node size: proportional to eigenvector centrality (PageRank)
Blue: a user who has won at least once
28 winners out of 800 users in total
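The chance baselines quoted above (1/2 and 1/4) follow if each encounter between the two users is treated as an independent fair coin flip: either user can be the repeated winner, so the probability that one user wins all n encounters is 2 * (1/2)^n. A small sketch of that arithmetic, with a hypothetical function name:

```python
from fractions import Fraction


def same_winner_chance(n_encounters):
    """Probability that one of two equally matched users wins every encounter."""
    return 2 * Fraction(1, 2) ** n_encounters


chance_two = same_winner_chance(2)    # baseline for two encounters
chance_three = same_winner_chance(3)  # baseline for three encounters
```

Against these baselines, the observed 77% and 56% are roughly 1.5 and 2.2 times what equal skill would predict.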

User’s Prestige Network
Indegree/outdegree distribution (design)
[Figure legend: win, lose]

Task’s Prestige Network
If winners of other tasks lose in this task, this task is more prestigious...
Example: user A won task X but lost task Y, so task Y is more prestigious than task X (directed edge from X to Y)
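The edge rule above can be sketched as a pass over submission records: for every user, each task they won points to each task they lost. The record layout (user, task, won) and the choice to weight an edge by the number of users who induce it are assumptions of this sketch.

```python
from collections import defaultdict


def task_prestige_edges(records):
    """Build weighted directed edges X -> Y between tasks.

    records: iterable of (user, task, won) tuples, where won is True if the
    user's submission won the task. An edge X -> Y is added for every user
    who won X but lost Y; the weight counts such users.
    """
    won = defaultdict(set)
    lost = defaultdict(set)
    for user, task, is_winner in records:
        (won if is_winner else lost)[user].add(task)
    edges = defaultdict(int)
    for user, won_tasks in won.items():
        for x in won_tasks:
            for y in lost.get(user, ()):
                edges[(x, y)] += 1
    return dict(edges)


# Hypothetical records matching the slide's example: A won X but lost Y.
records = [("A", "X", True), ("A", "Y", False), ("B", "Y", True)]
edges = task_prestige_edges(records)
```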

Motif Profiles
Motif analysis provides a finer-grained, local view into the networks of users and tasks
The table below shows the frequencies of dyadic and triadic motifs

Average Expertise of All Users of a Task
For a given task, test the association between the centrality of the task (Task’s Prestige Net) and the avg. indegree and PageRank of the users who submitted to the task (User’s Prestige Net)
– Task outdegree (lost) vs. PageRank (low)

Importance of Experience
The reward of the tasks selected by typical workers increases with the number of submissions, but at a diminishing rate (of course)
The expected revenue per submission of typical workers tends to increase with the number of submissions until it settles around a constant

Importance of Experience
Users learn to choose less competitive tasks
Skilled users survive and continue to participate in the work (?)

Importance of Experience
Winning probability increases as the number of submissions increases (avg. reward also increases, yet it tapers off); thus avg. revenue also increases and then tapers off
[Figure panels: estimated probability of winning; average reward (CNY); average revenue (CNY)]
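The link between winning probability, reward, and revenue can be made explicit. Under winner-takes-all, a submission earns the reward only if it wins, so the expected revenue per submission is the winning probability times the reward. The sketch below assumes this simple model; the function name, the optional commission parameter, and the sample numbers are hypothetical, and whether the site's commission is deducted from the winner's payout is an assumption.

```python
def expected_revenue(win_probability, reward, commission=0.0):
    """Expected revenue per submission under winner-takes-all.

    win_probability: estimated chance that the submission wins (0..1).
    reward: task reward in CNY.
    commission: fraction of the reward kept by the site (assumed to be
    deducted from the winner's payout).
    """
    if not 0.0 <= win_probability <= 1.0:
        raise ValueError("win_probability must be between 0 and 1")
    if not 0.0 <= commission <= 1.0:
        raise ValueError("commission must be between 0 and 1")
    return win_probability * reward * (1.0 - commission)


# Hypothetical worker: 10% chance of winning a 500 CNY task.
gross = expected_revenue(0.10, 500)
net = expected_revenue(0.10, 500, commission=0.20)
```

This is why revenue tracks the other two curves: if both the winning probability and the average reward level off, their product levels off as well.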

Summary
The amount of reward does not correlate with
– # submissions
– Expertise level
Yet, it does correlate with the number of views
Expertise can be inferred from expertise networks
Successful users
– Choose less popular tasks
– Focus on a specific reward range (best suited for one’s expertise?)
– Increase revenue with # of attempts (but it tapers off)