1
Task and Workflow Design I KSE 801 Uichin Lee
2
TurKit: Human Computation Algorithms on Mechanical Turk Greg Little, Lydia B. Chilton, Rob Miller, and Max Goldman (MIT CSAIL) UIST 2010
3
Workflow in M-Turk Requester posts HIT Groups to Mechanical Turk → HIT data collected in a CSV file → data exported for use
4
Workflow: Pros & Cons Easy to run simple, parallelized tasks. Not so easy to run tasks in which turkers improve on or validate each other's work. TurKit to the rescue!
5
The TurKit Toolkit Arrows indicate the flow of information. The programmer writes two sets of source code: – HTML files served by a web server – JavaScript executed by TurKit Output is retrieved via a JavaScript database. (Diagram: Programmer → *.html / *.js → TurKit → Mechanical Turk → Turkers; results stored in the JavaScript database.)
6
Crash-and-rerun programming model Observation: local computation is cheap, but external calls (HITs) cost money Managing state over a long-running program is challenging – Examples: What if the computer restarts? What if an error occurs? Solution: store state in a database as the script runs If an error happens, just crash the program and re-run it, replaying the history recorded in the DB – Throw a "crash" exception; the script is automatically re-run. New keyword "once": – Removes non-determinism – Avoids re-executing an expensive operation on re-run But why should we re-run?
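The crash-and-rerun idea can be sketched in a few lines. This is an illustrative mock, not the real TurKit API: `trace` stands in for TurKit's JavaScript database, and `crashAndRerun`, `once`, and `flakyExternalCall` are names invented here for the sketch.

```javascript
// Minimal sketch of crash-and-rerun (illustrative, not the real TurKit API).
// State is a persistent trace; `once` replays recorded results instead of
// re-executing a costly or non-deterministic step on later runs.

const trace = [];        // stands in for the on-disk JavaScript database
let cursor = 0;          // position in the trace for the current (re)run

function once(fn) {
  if (cursor < trace.length) {
    return trace[cursor++];      // replay: result already recorded
  }
  const result = fn();           // first run: do the expensive work
  trace.push(result);
  cursor++;
  return result;
}

function crashAndRerun(script) {
  while (true) {
    cursor = 0;                  // replay the recorded history from the top
    try {
      return script();
    } catch (e) {
      if (e !== "crash") throw e; // only the "crash" signal triggers a re-run
      // real TurKit would sleep, then re-run against MTurk
    }
  }
}

// Example: an external call that fails until its third attempt,
// e.g. a HIT whose result is not ready yet.
let attempts = 0;
function flakyExternalCall() {
  attempts++;
  if (attempts < 3) throw "crash";
  return "done";
}

const result = crashAndRerun(() => {
  const a = once(() => 41);           // recorded once, replayed on re-runs
  const b = once(flakyExternalCall);  // crashes twice, then succeeds
  return a + " " + b;
});
```

Note that the `once(() => 41)` step runs only on the first pass; every re-run replays its recorded value for free, which is exactly why crashing and re-running is cheap.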
7
Example: quicksort
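The slide's quicksort example can be sketched as follows. Here `humanCompare` is a stub invented for the sketch; in TurKit each comparison would be a vote HIT, and wrapping it in a memoizing layer (as `once` does) means re-runs never pay twice for a comparison already answered.

```javascript
// Sketch of quicksort with "human" comparisons (stubbed, not real TurKit).
// Answered comparisons are cached, mirroring how crash-and-rerun replays
// completed HITs instead of re-posting them.

const answered = new Map();  // stands in for the TurKit database

function humanCompare(a, b) {            // stub for a real vote HIT
  const key = a + "|" + b;
  if (!answered.has(key)) {
    // stub "human" judgment: shorter first, ties broken alphabetically
    answered.set(key, a.length - b.length || a.localeCompare(b));
  }
  return answered.get(key);              // replayed for free on re-runs
}

function quicksort(items) {
  if (items.length <= 1) return items;
  const [pivot, ...rest] = items;
  const left = rest.filter(x => humanCompare(x, pivot) < 0);
  const right = rest.filter(x => humanCompare(x, pivot) >= 0);
  return [...quicksort(left), pivot, ...quicksort(right)];
}
```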
8
Parallelism The first time the script runs, HITs A and C are created If a task in a forked branch fails (e.g., HIT A), TurKit crashes only that branch and re-runs it Branches are synchronized with join()
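The fork/join behavior above can be sketched like this. `runForks` is a name invented for the sketch, not TurKit's API; the point it illustrates is that a "crash" in one branch retries only that branch, and join succeeds once every branch has finished.

```javascript
// Illustrative fork/join over crash-and-rerun (not the real TurKit API):
// a "crash" in one branch does not abort the others, and the loop acts as
// join(), returning only when every branch has completed.

function runForks(branches) {
  const results = new Array(branches.length);
  const done = new Array(branches.length).fill(false);
  let pending = true;
  while (pending) {                  // re-run loop standing in for join()
    pending = false;
    branches.forEach((branch, i) => {
      if (done[i]) return;           // this branch already finished
      try {
        results[i] = branch();
        done[i] = true;
      } catch (e) {
        if (e !== "crash") throw e;
        pending = true;              // crash only this branch; retry later
      }
    });
  }
  return results;                    // all branches joined
}

// HIT A fails on its first attempt; HIT C succeeds immediately.
let aTries = 0;
const results = runForks([
  () => { if (++aTries < 2) throw "crash"; return "A done"; },
  () => "C done",
]);
```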
9
MTurk Functions prompt(message, # of people) – mturk.prompt("What is your favorite color?", 100) vote(message, options) sort(message, items)
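The call shapes of these helpers can be seen end to end with a mocked `mturk` object. These stubs return canned answers in place of real HITs; the behaviors (and the stub answers) are illustrative only.

```javascript
// Mock of the MTurk helpers named on the slide (stubs, not real TurKit):
// each method would post one or more HITs; here they return canned answers
// so the prompt / vote / sort call shapes are visible.

const mturk = {
  // prompt(message, n): collect n free-text answers
  prompt(message, n) {
    return Array.from({ length: n }, (_, i) => `answer ${i + 1}`);
  },
  // vote(message, options): majority choice among options (stub: first)
  vote(message, options) {
    return options[0];
  },
  // sort(message, items): human-comparison sort (stub: alphabetical)
  sort(message, items) {
    return [...items].sort();
  },
};

const colors = mturk.prompt("What is your favorite color?", 3);
const winner = mturk.vote("Best color?", ["red", "blue"]);
const ranked = mturk.sort("Cutest first:", ["cat", "axolotl", "bee"]);
```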
10
TurKit: Implementation TurKit is written in Java, using Rhino to interpret JavaScript code and E4X to handle XML results from MTurk Online IDE hosted on Google App Engine (GAE)
11
Exploring Iterative and Parallel Human Computation Processes Greg Little, Lydia B. Chilton Max Goldman, Robert C. Miller HCOMP 2010
12
HC Task Model Dimensions: – Dependent (iterative) vs. independent (parallel) tasks – Creation vs. decision tasks Task model examples Creation tasks (creating new content): e.g., writing ideas, imagery, solutions, etc. Decision tasks (voting/rating): e.g., rating the quality of a description of an image
13
HC Task Model Combining tasks: iterative and parallel tasks Iterative pattern: a sequence of creation tasks where the result of each task feeds into the next one, followed by a comparison task Parallel pattern: a set of creation tasks executed in parallel, followed by a task of choosing the best
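The two patterns can be sketched with stub tasks. `improve`, `create`, and `pickBest` are names invented here, standing in for real creation and decision HITs.

```javascript
// Sketch of the iterative and parallel patterns (stub tasks, not real HITs).

function create() { return "seed"; }            // stub creation task
function improve(text) { return text + "+"; }   // stub creation task
function pickBest(candidates) {                 // stub decision/vote task
  return candidates.reduce((a, b) => (b.length > a.length ? b : a));
}

// Iterative pattern: each creation task feeds the next; a comparison
// (vote) after each step keeps the better of old vs. new.
function iterative(n) {
  let best = create();
  for (let i = 0; i < n; i++) {
    const candidate = improve(best);
    best = pickBest([best, candidate]);
  }
  return best;
}

// Parallel pattern: n independent creation tasks, then one choose-the-best.
function parallel(n) {
  const candidates = Array.from({ length: n }, () => create());
  return pickBest(candidates);
}
```

In the iterative sketch each step builds on the previous result, while the parallel sketch's candidates are independent, which is the structural difference the experiments below compare.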
14
Experiment: Writing Image Description Iterative vs. parallel conditions; each uses 6 creation tasks ($0.02 each), followed by rating tasks (1–10 scale, $0.01 each)
15
Experiment: Writing Image Description Turkers in the iterative condition are shown the description so far to improve, while the parallel condition always shows an empty text area.
16
Experiment: Writing Image Description Average rating after n iterations – After six iterations: 7.9 vs. 7.4 (t-test, t(29) = 2.1, p = 0.04) (Plot: average rating vs. iteration, iterative and parallel conditions.)
17
Experiment: Writing Image Description Length vs. rating: positive correlation The two outliers (circled in the plot) are instances of text copied from the Internet (with superficial description) (Plot: length in characters vs. rating.)
18
Experiment: Writing Image Description Work quality: – 31% mainly append content at the end, making only minor modifications (if any) to existing content – 27% modify/expand existing content, but evidently use the provided description as a basis – 17% seem to ignore the provided description entirely and start over – 13% mostly trim or remove content – 11% make very small changes (adding a word, fixing a misspelling, etc.) – 1% copy-paste superficially related content found on the Internet Creating vs. improving takes about the same time (avg. 211 seconds)
19
Experiment: Brainstorming
20
Iterative work: higher average rating – But biased thinking: each response anchors on the previous one (e.g., tech → xxtech → yytech) Parallel work: more diversity, higher rating deviation – So iteration does not help for brainstorming (Plot: average rating vs. iteration, iterative and parallel conditions.)
21
Example: Blurry Text Recognition
22
Iterative performs better than parallel (Plot: accuracy vs. iteration.)
23
Summary TurKit: a flexible programming tool for MTurk Various workflows can be designed, e.g., iterative, parallel, and hybrid Iterative performs better than parallel in several cases (e.g., image description, brainstorming, text recognition)