CrowdDB : Answering queries with Crowdsourcing

Slides:



Advertisements
Similar presentations
What is a Database By: Cristian Dubon.
Advertisements

Relational Database. Relational database: a set of relations Relation: made up of 2 parts: − Schema : specifies the name of relations, plus name and type.
NaLIX: A Generic Natural Language Search Environment for XML Data Presented by: Erik Mathisen 02/12/2008.
A Guide to SQL, Seventh Edition. Objectives Understand the concepts and terminology associated with relational databases Create and run SQL commands in.
Chapter 4: Database Management. Databases Before the Use of Computers Data kept in books, ledgers, card files, folders, and file cabinets Long response.
Microsoft Access Database software. What is a database? … a database is an organized collection of data. A collection of data of similar information compiled.
Students: Ilya Paskhover, Itay Gal Supervisors: Oleg Rokhlenko, Nadav Golbandi.
ارائه دهندگان: مجتبی بلبلی،احمد رحمانی،مجتبی صادقی استاد درس: دکتر شیخ اسماعیلی درس : پایگاه داده پیشرفته بهار91 دانشگاه آزاد اسلامی واحد علوم و تحقیقات.
1 Overview of Databases. 2 Content Databases Example: Access Structure Query language (SQL)
Introduction to SQL Steve Perry
Introduction to Microsoft Access 2003 Mr. A. Craig Dixon CIS 100: Introduction to Computers Spring 2006.
Data-Centric Human Computation Jennifer Widom Stanford University.
Relationships and Advanced Query Concepts Using Multiple Tables Please use speaker notes for additional information!
1 Intro to JOINs SQL INNER JOIN SQL OUTER JOIN SQL FULL JOIN SQL CROSS JOIN Intro to VIEWs Simple VIEWs Considerations about VIEWs VIEWs as filters ALTER.
Lecture2: Database Environment Prepared by L. Nouf Almujally & Aisha AlArfaj 1 Ref. Chapter2 College of Computer and Information Sciences - Information.
XP New Perspectives on The Internet, Sixth Edition— Comprehensive Tutorial 3 1 Searching the Web Using Search Engines and Directories Effectively Tutorial.
Chapter 18 Object Database Management Systems. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Motivation for object.
JOI/1 Data Manipulation - Joins Objectives –To learn how to join several tables together to produce output Contents –Extending a Select to retrieve data.
Database Systems Design, Implementation, and Management Coronel | Morris 11e ©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or.
1 CSE 2337 Introduction to Data Management Access Book – Ch 1.
Database Systems Design, Implementation, and Management Coronel | Morris 11e ©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or.
Prepared by The Smartpath Information Systems
AL-MAAREFA COLLEGE FOR SCIENCE AND TECHNOLOGY INFO 232: DATABASE SYSTEMS CHAPTER 7 (Part II) INTRODUCTION TO STRUCTURED QUERY LANGUAGE (SQL) Instructor.
DATA-CENTERED CROWDSOURCING WORKSHOP PROF. TOVA MILO SLAVA NOVGORODOV TEL AVIV UNIVERSITY 2015/2016.
Chapter 18 Object Database Management Systems. Outline Motivation for object database management Object-oriented principles Architectures for object database.
MICROSOFT ACCESS – CHAPTER 5 MICROSOFT ACCESS – CHAPTER 6 MICROSOFT ACCESS – CHAPTER 7 Sravanthi Lakkimsety Mar 14,2016.
LECTURE TWO Introduction to Databases: Data models Relational database concepts Introduction to DDL & DML.
Database Design, Application Development, and Administration, 6 th Edition Copyright © 2015 by Michael V. Mannino. All rights reserved. Chapter 5 Understanding.
1 Section 1 - Introduction to SQL u SQL is an abbreviation for Structured Query Language. u It is generally pronounced “Sequel” u SQL is a unified language.
N5 Databases Notes Information Systems Design & Development: Structures and links.
Don't Know Jack About Object-Relational Mapping?
INTRODUCTION TO DATABASES (MICROSOFT ACCESS)
CrowdDb.
Logical Database Design and the Rational Model
The STEM Academy Data Solution
Recruiter 2.0 Overview May 1, 2012.
The Client-Server Model
Practical Office 2007 Chapter 10
Prepared by : Moshira M. Ali CS490 Coordinator Arab Open University
 2012 Pearson Education, Inc. All rights reserved.
Information Systems Today: Managing in the Digital World
So, what was this course about?
Fundamentals of Information Systems, Sixth Edition
A paper on Join Synopses for Approximate Query Answering
Applied CyberInfrastructure Concepts Fall 2017
Quiz Questions Q.1 An entity set that does not have sufficient attributes to form a primary key is a (A) strong entity set. (B) weak entity set. (C) simple.
CrowdDb.
Lecture 2 The Relational Model
Databases.
RELATIONAL DATABASE MODEL
COS 346 Day 8.
Query Execution Presented by Khadke, Suvarna CS 257
Deco: Declarative Crowdsourcing
ORACLE SQL Developer & SQLPLUS Statements
ERD’s REVIEW DBS201.
February 7th – Exam Review
File Systems and Databases
Database Vs. Data Warehouse
Chapter 9 Database and Information Management.
Teaching slides Chapter 8.
CS 440 Database Management Systems
Query Execution Presented by Jiten Oswal CS 257 Chapter 15
Contents Preface I Introduction Lesson Objectives I-2
Chapter 8 Advanced SQL.
A – Pre Join Indexes.
DATABASE Purpose of database
Database Systems: Design, Implementation, and Management
eSeries Entities By Julie Ladner
Presentation transcript:

CrowdDB : Answering queries with Crowdsourcing Michael Franklin et al., SIGMOD’11 Presentation by Parijat Mazumdar

CrowdDB : Motivation Two fundamental problems with present RDBMSs : Closed World Assumption Extremely literal SELECT market_capitalization FROM company WHERE name = “I.B.M” Entity Resolution : easy for humans, difficult for computers Get the best of both sides : Human power for subjective compare and finding new data Traditional RDBMs for heavy lifting data manipulations

Design considerations Design considerations for CrowdDB Performance and variability : Humans are slow, inaccurate, costly, variable Task design and ambiguity : Natural language is inherently ambiguous Affinity and learning : Workers develop relationships with requesters, skills for certain kind of HITs Open world : Possibly unbounded number of answers

CrowdDB : Overview Query database using CrowdSQL : slight extension of standard SQL Automatic UI generation Automatic interaction with crowdsourcing marketplace Automatically store results obtained from crowd into database for future use.

Amazon Mechanical Turk (AMT) HIT : smallest entity of work a worker can accept to do Assignment : Replicas of same HIT for majority voting HIT Group : Similar HITs put together AMT API : createHIT(HIT parameters) → HitID getAssignmentsForHIT(HitID)→list(asnId,workerId,answer) approveAssignment(asnID)/rejectAssignment(asnID) forceExpireHIT(HitID)

CrowdSQL : Incomplete data Crowdsourced Tables CREATE CROWD TABLE Professors ( name STRING PRIMARY KEY, email STRING UNIQUE, university STRING, department STRING, FOREIGN KEY (university, department) REF Department(university, name) ); special keyword : CROWD Crowdsourced Columns CREATE TABLE Department ( university STRING, name STRING, url CROWD STRING, phone STRING, primary key (university, name) ); special keyword : CNULL Crowd equivalent of NULL CNULL values get filled during queries SELECT url from Department WHERE name = “math”

CrowdSQL : Comparisons CROWDEQUAL SELECT name FROM Professor WHERE department ~= “CS” CROWDORDER SELECT p from picture WHERE subject = “Golden Gate Bridge” ORDER BY CROWDORDER(p, “Which picture visualises better %subject”)

Basic UI generation UI Templates are created at compile time Instantiated at run-time for each tuple Can be edited by application developers

Multi-relational UI generation Foreign key references a non-crowdsourced table: Drop-down box Ajax-based “suggest” function Foreign key references a crowdsourced table: Normalised interface with “suggest” function Denormalized interface

Query Processing Extended operators called crowd operators CrowdProbe collects missing information in CROWD columns/tables CrowdJoin collects new tuples from inner relation CrowdCompare implements CROWDEQUAL, CROWDORDER Overall task of crowd operators Create HITs and HIT groups Collect results from AMT Quality control through majority voting Rule-based optimiser Batch size, number of assignments and price per HIT fixed Predicate push-down, join ordering, delete optimisation Better alternative : Cost-based optimiser

Query Processing : Example

Results : Micro-tasks HITs Group size vs Responsiveness Tradeoff between HITs completed per unit time and % completion of HITs

Results : Micro-tasks Reward vs Responsiveness % of HITs fully completed % of HITs with at least 1 assignment done

Results : Micro-tasks Skewed distribution : Community of fan turkers cultivated No variation in error rate between fan turkers and others

Results : Complex queries Entity resolution on company names 4 non-uniform company names match to 100 companies (10 per HIT) Majority vote produces correct result always Ordering pictures in terms of relevance Majority voting based ranking matches expert ranking SELECT p.name, p.email, d.name, d.phone FROM Professor p, Department d WHERE p.department = d.name AND p.university = d.university AND p.name = "[name of a professor]" Joining Professors and Departments Method 1 : first professor details collected, then department details Method 2 : denormalized version, professor and department details collected together Method 1 outperforms Method 2 in accuracy Unclear instructions, turkers submitted professors’ phone numbers

Other observations Relationship of requester with turkers is long-term Keep workers happy Implement less stringent approval standards Good interface design and precise instructions matter Simple design changes like adding “None of the above” option can improve quality dramatically.

Related Works Related work in database systems : Oracle forms : Automatic interface generation from metadata Dealing with open-world nature using operation limiting, top N optimisations Dealing with performance volatility using adaptive query processing Related work in CrowdSourcing : Qurk : SQL plus UDF (Appeared in CIDR’11) Crowdsourced database using datalog (Appeared in CIDR’11) TurKit , toolkit for programming iterative algorithms in crowd Quality control and response time optimisation in CrowdSearch

Thank You!