Crowd Algorithms Hector Garcia-Molina, Stephen Guo, Aditya Parameswaran, Hyunjung Park, Alkis Polyzotis, Petros Venetis, Jennifer Widom Stanford and UC.

Presentation transcript:

Crowd Algorithms Hector Garcia-Molina, Stephen Guo, Aditya Parameswaran, Hyunjung Park, Alkis Polyzotis, Petros Venetis, Jennifer Widom Stanford and UC Santa Cruz Scoop — The Stanford – Santa Cruz Project for Cooperative Computing with Algorithms, Data, and People

2 The Goal
Design Fundamental Algorithms for Human Computation
Latency, Cost, Uncertainty
- Which questions do I ask?
- When do I ask the questions?
- When do I stop?
- How do I combine the answers?

3 The Problems
Crowd-: Sort / Max, GraphSearch, Categorize, Filter
Latency, Cost, Uncertainty: Difficult!
Progress! [VLDB 2011]
The focus of this talk. Summaries of the rest

4 Filters
Dataset of Items -> Predicate 1 -> Predicate 2 -> ... -> Predicate k -> Filtered Dataset
Example predicates: Is this image that of Bytes Café? Is the image blurry? Does it show people’s faces?
- Given:
  - Error Probability (FP/FN) & Selectivity for each predicate
  - Desired Overall Error Probability
- To: Compose a filtering strategy
  - Minimize Overall Cost (# of questions)
Which questions do I ask? When do I ask the questions? When do I stop? How do I combine the answers?

5 Single Filter
Dataset of Items -> Predicate 1 -> Filtered Dataset
- Surprisingly difficult!
- Need to meet an overall error threshold
  - Say, up to 10% of my images may be wrongly filtered
- Minimize overall expected number of questions
- Boils down to the following:
  - Take one item
  - Ask some questions
    Results in a certain number of (Y, N) for a given item
  - Do I stop (if so, what do I return), or do I continue asking?

6 Hasn’t this been done before?
- Solutions from statistics guarantee the same error per item
  - Important in contexts like: automobile testing, diagnosis
- We’re worried about aggregate error over all items: a uniquely data-oriented problem
  - I don’t care if every image is perfect as long as the overall error is met.
  - As we will see, this results in $$$ savings

7 Strategies
Grid with axes: YES Answers, NO Answers. Start here, with no questions.
Example points on the grid:
- YES = 5, NO = 6: Return “Passed”
- YES = 3, NO = 7: Return “Failed”
- YES = 3, NO = 5: Continue
Reformulated Task: For each point in grid: Return Pass/Fail/Cont.
Equivalently, find the best shape and color it!
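To make the grid view concrete, here is a minimal sketch of how a deterministic strategy could be scored. It assumes a simple error model that the slide does not spell out: each item satisfies the predicate with probability `selectivity`, and every answer is independently wrong with probability `fp` (false YES) or `fn` (false NO). The names `evaluate_strategy`, `decide`, and `majority_of_three` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict

PASS, FAIL, CONT = "pass", "fail", "continue"


def evaluate_strategy(decide, selectivity, fp, fn, max_questions):
    """Return (expected #questions, expected error) for one item.

    decide(yes, no) maps a grid point to PASS, FAIL, or CONT.
    Assumed model: P(item is true) = selectivity,
    P(YES | false item) = fp, P(NO | true item) = fn.
    """
    # reach[(yes, no)] = (P(reach point and item true), P(reach point and item false))
    reach = {(0, 0): (selectivity, 1.0 - selectivity)}
    expected_cost = 0.0
    expected_error = 0.0
    for asked in range(max_questions + 1):
        next_reach = defaultdict(lambda: (0.0, 0.0))
        for (yes, no), (p_true, p_false) in reach.items():
            action = decide(yes, no)
            if action == PASS:
                expected_error += p_false  # a false item was passed
            elif action == FAIL:
                expected_error += p_true   # a true item was failed
            else:
                if asked == max_questions:
                    raise ValueError("strategy does not stop within max_questions")
                expected_cost += p_true + p_false  # one more question is asked here
                a, b = next_reach[(yes + 1, no)]   # the next answer is YES
                next_reach[(yes + 1, no)] = (a + p_true * (1 - fn), b + p_false * fp)
                a, b = next_reach[(yes, no + 1)]   # the next answer is NO
                next_reach[(yes, no + 1)] = (a + p_true * fn, b + p_false * (1 - fp))
        reach = dict(next_reach)
    return expected_cost, expected_error


def majority_of_three(yes, no):
    """Example strategy: ask exactly three questions, then take the majority."""
    if yes + no < 3:
        return CONT
    return PASS if yes > no else FAIL
```

Under this assumed model, `evaluate_strategy(majority_of_three, 0.5, 0.1, 0.1, 3)` gives a cost of 3 questions and an error of 0.028. Comparing different `decide` functions on these two numbers is one way to read "find the best shape and color it".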

8 Common Strategies
- Always ask X questions, return most likely answer
  - The triangle shape
- If you get X YES, return “Pass” or Y NO, return “Fail”, else keep asking.
  - Rectangular shape
- Ask until |#YES - #NO| > X, or at most Y questions
  - Chopped off rectangle
  - Anhai’s work on MOBS
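The three strategies above can be sketched as functions from a grid point (number of YES answers, number of NO answers) to a decision. This is an illustration only: the function names are hypothetical, and "most likely answer" is approximated by a simple majority vote, which matches the most likely answer only under symmetric error rates and a 50% selectivity. Ties are resolved as "fail" here, which is an arbitrary choice.

```python
PASS, FAIL, CONT = "pass", "fail", "continue"


def fixed_budget(x):
    """Triangle shape: always ask x questions, then return the majority answer."""
    def decide(yes, no):
        if yes + no < x:
            return CONT
        return PASS if yes > no else FAIL
    return decide


def threshold_counts(x, y):
    """Rectangular shape: pass at x YES answers, fail at y NO answers."""
    def decide(yes, no):
        if yes >= x:
            return PASS
        if no >= y:
            return FAIL
        return CONT
    return decide


def bounded_difference(x, y):
    """Chopped-off rectangle: stop when |yes - no| > x or after y questions."""
    def decide(yes, no):
        if abs(yes - no) > x or yes + no >= y:
            return PASS if yes > no else FAIL
        return CONT
    return decide
```

Each factory returns a `decide(yes, no)` function, so the shape of a strategy is simply the set of grid points where it returns "continue".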

9 Summary of Results
- A characterization of which “shapes” are optimal
- An optimal PTIME “probabilistic” approach
  - LP leveraging the inherent DP structure
  - Optimal: Strategy with minimum overall cost for given parameters and requirements
  - Probabilistic: Probability of “Pass”, “Fail”, “Continue”

10 Empirical Results
- Evaluation on synthetic scenarios
- Tested:
  - Optimal, Brute Force, Statistical, 5 Heuristic Algorithms
- Optimal Probabilistic issues fewer questions overall
  - 15% savings on average compared to brute force
    32% savings when optimal wins
  - 22% savings on average compared to the statistics approach
    49% savings when optimal wins
Translates to $$$ for many items!!
[Figure: Generate Parameters; Other Algorithms, Brute Force Deterministic, Optimal Probabilistic; COST1 >> COST2, COST3 >>]

11 Crowd-Max/Sort
- The problem(s):
  - Find the strategy of sorting n items
    Given: Probability of error for a comparison
    Given: Desired threshold on error, #questions, #rounds
- Sorting automatically given evidence
  - NP-Hard even for a simple probability of error model
  - Related work in the area of voting theory, economics
- Which r questions do we ask next?
Decreasing Parallelism, More Accuracy:
- Ask all pairs a total of 2k/n times
- Tournament, with k repetitions at each level
- One question in each round
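As an illustration of the "tournament, with k repetitions at each level" option, the sketch below simulates a single-elimination tournament for finding the maximum. It is not the authors' algorithm: the worker model (`noisy_compare`, which flips the correct answer with a fixed probability), the majority-vote rule, and all names are assumptions made for this example. Items are assumed distinct and k is assumed odd so that votes cannot tie.

```python
import random


def noisy_compare(a, b, error_prob, rng):
    """Simulated worker: returns the larger item, but errs with error_prob."""
    correct, wrong = (a, b) if a > b else (b, a)
    return wrong if rng.random() < error_prob else correct


def tournament_max(items, k, error_prob, rng):
    """Single-elimination tournament; each match is decided by k votes.

    Returns (winner, total questions asked, number of rounds).
    All matches within a round could be issued to workers in parallel.
    """
    questions = 0
    rounds = 0
    current = list(items)
    while len(current) > 1:
        winners = []
        for i in range(0, len(current) - 1, 2):
            a, b = current[i], current[i + 1]
            votes_a = sum(
                noisy_compare(a, b, error_prob, rng) == a for _ in range(k)
            )
            questions += k
            winners.append(a if 2 * votes_a > k else b)
        if len(current) % 2 == 1:
            winners.append(current[-1])  # odd item out gets a bye
        current = winners
        rounds += 1
    return current[0], questions, rounds
```

With 8 items and k = 3, the tournament asks 21 questions over 3 rounds. Raising k spends more questions per match in exchange for fewer wrong eliminations, which is the cost/uncertainty trade-off the slide refers to.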

12 Crowd-GraphSearch: Image Categorization Example
Taxonomy nodes: vehicle, car, nissan, honda, toyota, maxima, sentra
To attach: image of a honda car
- Is image one of vehicle? YES!
- Is image one of toyota? NO!
- Is image one of honda? YES!
target node = intended category
Is the image one of X? = Is the target node reachable from X?
Find the target node by asking minimum number of search questions.
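The reduction from "Is the image one of X?" to "Is the target node reachable from X?" can be sketched as follows. The edges of the taxonomy are an assumption: the transcript only lists the node names, so the hierarchy below (vehicle, then car, then the three makes, with maxima and sentra under nissan) is a guess at the slide's figure. The top-down search is a naive baseline for counting questions, not the minimum-question strategy the slide asks for, and it assumes the target lies under the root and that answers are error-free.

```python
# Assumed hierarchy; only the node names come from the slide.
TAXONOMY = {
    "vehicle": ["car"],
    "car": ["nissan", "honda", "toyota"],
    "nissan": ["maxima", "sentra"],
    "honda": [],
    "toyota": [],
    "maxima": [],
    "sentra": [],
}


def reachable(graph, source, target):
    """True if target can be reached from source by following edges."""
    stack, seen = [source], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False


def top_down_search(graph, root, ask):
    """Descend from root, asking about children until none answers YES.

    ask(node) stands in for the crowd question "Is the image one of node?".
    Returns (node found, number of questions asked).
    """
    questions = 0
    node = root
    while True:
        for child in graph[node]:
            questions += 1
            if ask(child):
                node = child
                break
        else:
            return node, questions  # no child contains the target


def simulated_crowd(target):
    """Error-free stand-in for human answers, using reachability."""
    return lambda node: reachable(TAXONOMY, node, target)
```

For example, `top_down_search(TAXONOMY, "vehicle", simulated_crowd("honda"))` finds "honda" after three questions (car, nissan, honda). A better strategy would choose which nodes to ask about so that fewer questions are needed in the worst case.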

13 Crowd-Categorize
Dataset of Items
- k buckets, n items
- Categorize every item, overall error < threshold
- For k = 1, same as filters problem
- Two versions:
  - Discrete
    Independent (like in the filters case)
    Dependent buckets (e.g., colors, GraphSearch)
  - Continuous (e.g., age)

14 Questions?