1
Crowd Algorithms
Hector Garcia-Molina, Stephen Guo, Aditya Parameswaran, Hyunjung Park, Alkis Polyzotis, Petros Venetis, Jennifer Widom
Stanford and UC Santa Cruz
Scoop: The Stanford – Santa Cruz Project for Cooperative Computing with Algorithms, Data, and People
2
The Goal
Design Fundamental Algorithms for Human Computation
Latency, Cost, Uncertainty
- Which questions do I ask?
- When do I ask the questions?
- When do I stop?
- How do I combine the answers?
3
The Problems
Crowd- Sort / Max, GraphSearch, Categorize, Filter
Latency, Cost, Uncertainty: Difficult!
Progress! [VLDB 2011]
The focus of this talk. Summaries of the rest.
4
Filters
Dataset of Items -> Predicate 1 -> Predicate 2 -> … -> Predicate k -> Filtered Dataset
Example predicates:
- Is this image that of Bytes Café?
- Is the image blurry?
- Does it show people’s faces?
Given:
- Error Probability (FP/FN) & Selectivity for each predicate
- Desired Overall Error Probability
To: Compose a filtering strategy
- Minimize Overall Cost (# of questions)
Questions to answer:
- Which questions do I ask?
- When do I ask the questions?
- When do I stop?
- How do I combine the answers?
5
Single Filter
Dataset of Items -> Predicate 1 -> Filtered Dataset
Surprisingly difficult!
Need to meet an overall error threshold
- Say, up to 10% of my images may be wrongly filtered
Minimize overall expected number of questions
Boils down to the following:
- Take one item
- Ask some questions
  Results in a certain number of (Y, N) for a given item
- Do I stop (if so, what do I return), or do I continue asking?
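One way to make the "do I stop?" decision concrete is to track how likely the item is to truly satisfy the predicate after seeing some (Y, N) answers. The sketch below applies Bayes' rule under assumptions that are not stated on the slide: workers answer independently, and every worker shares the same false-positive rate `fp` and false-negative rate `fn`. The function and parameter names are hypothetical.

```python
def posterior_pass(yes, no, selectivity, fp, fn):
    """P(item truly satisfies the predicate | `yes` YES answers, `no` NO answers).

    Assumes independent workers with identical error rates (an illustration,
    not necessarily the model used in the talk).
    """
    # If the item truly passes, a YES is correct (prob 1 - fn)
    # and a NO is a false negative (prob fn).
    like_pass = (1 - fn) ** yes * fn ** no
    # If the item truly fails, a YES is a false positive (prob fp)
    # and a NO is correct (prob 1 - fp).
    like_fail = fp ** yes * (1 - fp) ** no
    weighted_pass = selectivity * like_pass
    weighted_fail = (1 - selectivity) * like_fail
    return weighted_pass / (weighted_pass + weighted_fail)
```

With no answers the posterior equals the selectivity; each additional YES pushes it up and each NO pushes it down, which is why only the counts (Y, N) matter under these assumptions.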
6
Hasn’t this been done before?
Solutions from statistics guarantee the same error per item
- Important in contexts like:
  Automobile testing
  Diagnosis
We’re worried about aggregate error over all items: a uniquely data-oriented problem
- I don’t care if every image is perfect as long as the overall error is met.
- As we will see, results in $$$ savings
7
Strategies
[Figure: grid of points with axes "YES Answers" and "NO Answers"; one point is labeled "Start here, with no questions"]
Example points:
- YES = 5, NO = 6: Return “Passed”
- YES = 3, NO = 7: Return “Failed”
- YES = 3, NO = 5: Continue
Reformulated Task: For each point in grid: Return Pass/Fail/Cont.
Equivalently, find the best shape and color it!
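A grid strategy can be scored by pushing probability mass through the grid one question at a time. The sketch below is one possible way to do that, assuming independent workers with shared error rates `fp` and `fn`, and a strategy given as a function from a point (yes, no) to "pass", "fail", or "continue". All names are hypothetical, and this scores a given deterministic strategy rather than finding the best one.

```python
def evaluate(strategy, selectivity, fp, fn, max_questions):
    """Return (expected number of questions, error probability) for a grid strategy."""
    # reach[(y, n)] = (P(reach point and item truly passes),
    #                  P(reach point and item truly fails))
    reach = {(0, 0): (selectivity, 1.0 - selectivity)}
    cost = 0.0
    error = 0.0
    for total in range(max_questions + 1):
        for y in range(total + 1):
            n = total - y
            if (y, n) not in reach:
                continue
            p_true, p_false = reach[(y, n)]
            decision = strategy(y, n)
            if decision == "continue":
                if total == max_questions:
                    raise ValueError("strategy must stop within max_questions")
                # One more question: YES moves to (y + 1, n), NO to (y, n + 1).
                for point, a_true, a_false in (
                    ((y + 1, n), 1 - fn, fp),
                    ((y, n + 1), fn, 1 - fp),
                ):
                    q_true, q_false = reach.get(point, (0.0, 0.0))
                    reach[point] = (q_true + p_true * a_true,
                                    q_false + p_false * a_false)
            else:
                # Stopping here costs `total` questions for every item that arrives.
                cost += total * (p_true + p_false)
                # "pass" is wrong for truly failing items, "fail" for truly passing ones.
                error += p_false if decision == "pass" else p_true
    return cost, error
```

Comparing two shapes then amounts to calling `evaluate` on each with the same parameters and checking which one meets the error threshold at lower expected cost.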
8
Common Strategies
Always ask X questions, return most likely answer
- The triangle shape
If you get X YES, return “Pass” or Y NO, return “Fail”, else keep asking.
- Rectangular shape
Ask until |#YES - #NO| > X, or at most Y questions
- Chopped off rectangle
- Anhai’s work on MOBS
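The three shapes above can be written as small decision functions over a grid point (yes, no). This is a sketch under stated assumptions: "most likely answer" is simplified to a majority vote, ties go to "fail", and the function names are hypothetical.

```python
def fixed_questions(x):
    """Triangle: always ask x questions, then return the majority answer."""
    def decide(yes, no):
        if yes + no < x:
            return "continue"
        # Assumption: majority vote stands in for "most likely answer".
        return "pass" if yes > no else "fail"
    return decide


def rectangular(x, y):
    """Rectangle: stop at x YES answers ("pass") or y NO answers ("fail")."""
    def decide(yes, no):
        if yes >= x:
            return "pass"
        if no >= y:
            return "fail"
        return "continue"
    return decide


def difference_rule(x, y):
    """Chopped-off rectangle: ask until |#YES - #NO| > x, or at most y questions."""
    def decide(yes, no):
        if abs(yes - no) > x or yes + no >= y:
            # Assumption: ties at the question cap are resolved as "fail".
            return "pass" if yes > no else "fail"
        return "continue"
    return decide
```

For example, `rectangular(2, 3)` passes an item as soon as it has two YES answers and fails it as soon as it has three NO answers.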
9
Summary of Results
A characterization of which “shapes” are optimal
An optimal PTIME “probabilistic” approach
- LP leveraging the inherent DP structure
- Optimal: Strategy with minimum overall cost for given parameters and requirements
- Probabilistic: Probability of “Pass”, “Fail”, “Continue”
10
Empirical Results
Evaluation on 10000 synthetic scenarios
Tested:
- Optimal, Brute Force, Statistical, 5 Heuristic Algorithms
Optimal Probabilistic issues fewer questions overall
- 15% savings on average compared to brute force
  32% savings when optimal wins
- 22% savings on average compared to the statistics approach
  49% savings when optimal wins
Translates to $$$ for many items!!
[Figure: pipeline from "Generate Parameters" to "Other Algorithms", "Brute Force Deterministic", and "Optimal Probabilistic", with cost labels COST1, COST2, COST3 joined by ">>"]
11
Crowd-Max/Sort
The problem(s):
- Find the strategy of sorting n items
  Given: Probability of error for a comparison
  Given: Desired threshold on error, #questions, #rounds
Sorting automatically given evidence
- NP-Hard even for a simple probability of error model
- Related work in the area of voting theory, economics
Which r questions do we ask next?
Options, in order of decreasing parallelism and more accuracy:
- Ask all pairs a total of 2k/n times
- Tournament, with k repetitions at each level
- One question in each round
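The tournament option can be sketched as a single-elimination bracket in which each match is asked k times and decided by majority vote. This is an illustration only: `ask(a, b)` is a hypothetical callable standing in for one crowd answer ("is a greater than b?"), unpaired items get a bye, and ties (possible when k is even) go to the second item.

```python
def tournament_max(items, ask, k):
    """Return (winner, number of questions asked) for a k-repetition tournament.

    Assumes `items` is non-empty and `ask(a, b)` returns True when a worker
    says a beats b.
    """
    questions = 0
    current = list(items)
    while len(current) > 1:
        next_round = []
        # Pair up neighbours; all matches in a round could be posted in parallel.
        for i in range(0, len(current) - 1, 2):
            a, b = current[i], current[i + 1]
            votes_a = sum(1 for _ in range(k) if ask(a, b))
            questions += k
            next_round.append(a if 2 * votes_a > k else b)
        if len(current) % 2 == 1:
            # An unpaired item advances without a match.
            next_round.append(current[-1])
        current = next_round
    return current[0], questions
```

Every match eliminates exactly one item, so n items always cost (n - 1) * k questions; the number of rounds grows with the logarithm of n, which is where the parallelism comes from.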
12
Crowd-GraphSearch
Image Categorization Example
[Figure: taxonomy with nodes vehicle, car, nissan, honda, toyota, maxima, sentra]
To attach: image of a honda car
- Is image one of vehicle? YES!
- Is image one of toyota? NO!
- Is image one of honda? YES!
target node = intended category
Is the image one of X? = Is the target node reachable from X?
Find the target node by asking minimum number of search questions.
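The reachability view of the questions can be sketched with a naive top-down descent. This is a baseline for illustration, not the minimum-question method the slide asks for. It assumes the taxonomy is a tree, answers are always correct, and the target lies under the root. The edges in `TAXONOMY` are an assumption about the figure (maxima and sentra placed under nissan), and all names are hypothetical.

```python
# Assumed layout of the slide's taxonomy: parent -> list of children.
TAXONOMY = {
    "vehicle": ["car"],
    "car": ["nissan", "honda", "toyota"],
    "nissan": ["maxima", "sentra"],
}


def reachable(tree, src, dst):
    """True if dst can be reached from src by following child edges."""
    stack = [src]
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        stack.extend(tree.get(node, []))
    return False


def descend(tree, root, ask):
    """Walk down from root, asking "is the image one of X?" for each child.

    Returns (node where the walk stops, list of nodes asked about).
    """
    node, asked = root, []
    while True:
        for child in tree.get(node, []):
            asked.append(child)
            if ask(child):
                node = child
                break
        else:
            # No child answered YES (or the node is a leaf): this is the target.
            return node, asked
```

A simulated worker for a honda image is `lambda x: reachable(TAXONOMY, x, "honda")`; passing it as `ask` makes the walk ask about car, nissan, and honda before stopping at honda.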
13
Crowd-Categorize
[Figure: Dataset of Items]
k buckets, n items
Categorize every item, overall error < threshold
For k = 1, same as filters problem
Two versions:
- Discrete
  Independent (like in the filters case)
  Dependent buckets (e.g., colors, GraphSearch)
- Continuous (e.g., age)
14
Questions?