CrowdDB

History Lesson
- First crowd-powered database
- At that time, the state of the art was TurKit
  - Programming library for the crowd
- Two other crowd-powered databases at around the same time
  - Deco (Stanford, UC Santa Cruz)
  - Qurk (MIT)
- Necessarily incomplete, preliminary

Motivation of CrowdDB
Two reasons why present DB systems won't do:
- Closed-world assumption
  - Get human help for finding new data
- Very literal in processing data
  - SELECT marketcap FROM company WHERE name = 'IBM'
Get the best of both worlds:
- human power for processing and getting data
- traditional systems for heavy lifting / data manipulation
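To make the "very literal" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The stored company name, the placeholder market cap, and the `crowd_resolve` helper are all assumptions made for illustration; they are not taken from CrowdDB itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (name TEXT, marketcap REAL)")
# Hypothetical row; the market cap is a placeholder, not real data.
conn.execute(
    "INSERT INTO company VALUES ('International Business Machines', 100.0)"
)

# A traditional engine compares strings literally, so this finds nothing.
literal = conn.execute(
    "SELECT marketcap FROM company WHERE name = 'IBM'"
).fetchall()


def crowd_resolve(name):
    """Hypothetical stand-in for asking workers which stored name is meant."""
    return {"IBM": "International Business Machines"}.get(name, name)


# With a human-provided mapping, the same question can be answered.
resolved = conn.execute(
    "SELECT marketcap FROM company WHERE name = ?", (crowd_resolve("IBM"),)
).fetchall()

print(literal, resolved)
```

The database still does the lookup; the human step only bridges the gap between how the user names the entity and how it is stored.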

Issues in building CrowdDB
- Performance and variability: humans are slow, costly, variable, inaccurate
- Task design and ambiguity: challenging to get people to do what you want
- Affinity / learning: workers develop relationships with requesters, skills
- Open world: possibly unbounded answers

History Lesson
Even now (4 years later), there is no real complete, fully-functional crowd-powered database. Why?

History Lesson
Even now, there is no real complete, fully-functional crowd-powered database. Why?
- No one understands the crowds (EVEN NOW)
  - We were all naïve in thinking that we could treat crowds as just another data source.
- People don't seem to want to use crowds within databases
  - Crowdsourcing is a one-off task
- Crowds have very different characteristics than other data sources

Still…
- The ideas are very powerful and applicable everywhere you want data to be extracted
- Very common use-case of crowds

Semantics
- Semantics = an understanding of what the query does
- Regular SQL has very understandable semantics: starting from a given state, you know exactly what state you will be in once you execute a statement.
- Does CrowdSQL have understandable semantics? How would you improve it?

Semantics
Does CrowdSQL have understandable semantics? How would you improve it?
- Fill in CNULL; LIMIT clause
  - What if you had more than the limit number of tuples already filled in?
- Overall, very hard. But at the least:
  - A specification of budget?
  - A specification that cost/latency is minimized?
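One possible reading of "a specification of budget" can be sketched as follows. This is only an illustration of one candidate semantics, not how CrowdDB behaves: the function name `answer_with_limit`, the `budget` parameter, and the `ask_crowd` callback are all hypothetical. Tuples that are already complete are used first, and the crowd is asked only while the LIMIT is unmet and budget remains.

```python
CNULL = None  # stand-in for CrowdDB's CNULL marker


def answer_with_limit(rows, column, limit, budget, ask_crowd):
    """Return up to `limit` complete tuples, asking at most `budget` questions."""
    # Tuples that are already filled in cost nothing, so use them first.
    result = [r for r in rows if r[column] is not CNULL][:limit]
    spent = 0
    for r in rows:
        if len(result) >= limit or spent >= budget:
            break
        if r[column] is CNULL:
            # One crowd question per missing value (an assumed cost model).
            result.append(dict(r, **{column: ask_crowd(r)}))
            spent += 1
    return result, spent


rows = [
    {"name": "A", "email": "a@example.edu"},
    {"name": "B", "email": CNULL},
    {"name": "C", "email": CNULL},
    {"name": "D", "email": CNULL},
]
fake_crowd = lambda r: r["name"].lower() + "@example.edu"  # hypothetical answers

# Enough complete tuples already exist: no crowd work is needed.
full, cost_full = answer_with_limit(rows, "email", limit=1, budget=5, ask_crowd=fake_crowd)
# The budget runs out before the LIMIT is reached: a partial answer comes back.
part, cost_part = answer_with_limit(rows, "email", limit=3, budget=1, ask_crowd=fake_crowd)
print(len(full), cost_full, len(part), cost_part)
```

Under this reading, the answer to "more than the limit already filled in" is simply that no tasks are posted; other readings are equally defensible, which is the point of the slide.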

Optimization Techniques
Beyond the ones presented in the paper, what other “database style” optimization techniques can you think of?
- The paper mentions predicate pushdown: e.g., if you only care about tuples in CA, instantiate interfaces with CA filled in.
  - Not always good: evaluating crowd predicates may be costly
- Reorder tables such that more “complete” tables are filled first.
- Reorder predicates such that more “complete” predicates are checked first.
  - SELECT * FROM PROFESSOR WHERE Dept = 'math' AND Email LIKE '%berkeley%'
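The predicate-reordering idea can be sketched on the PROFESSOR query above. Everything here is an assumption for illustration (the sample rows, the canned crowd answers, and the function `run`); it compares filling every CNULL up front against checking already-known columns first and asking the crowd only for tuples that survive.

```python
CNULL = None  # stand-in for CrowdDB's CNULL marker

professors = [
    {"name": "A", "dept": "math", "email": CNULL},
    {"name": "B", "dept": "cs", "email": CNULL},
    {"name": "C", "dept": CNULL, "email": "c@stanford.edu"},
    {"name": "D", "dept": "math", "email": "d@berkeley.edu"},
]

# Hypothetical answers the crowd would give if asked.
CROWD_ANSWERS = {
    ("A", "email"): "a@berkeley.edu",
    ("B", "email"): "b@berkeley.edu",
    ("C", "dept"): "math",
}

# Dept = 'math' AND Email LIKE '%berkeley%'
PREDICATES = {
    "dept": lambda v: v == "math",
    "email": lambda v: "berkeley" in v,
}


def run(rows, known_first):
    """Evaluate the conjunction; return (matching names, crowd questions asked)."""
    asked, matches = 0, []
    for original in rows:
        row = dict(original)
        cols = list(PREDICATES)
        if known_first:
            # Check columns that already have a value before CNULL columns.
            cols.sort(key=lambda c: row[c] is CNULL)
        else:
            # Naive plan: fill every CNULL before evaluating anything.
            for c in cols:
                if row[c] is CNULL:
                    row[c] = CROWD_ANSWERS[(row["name"], c)]
                    asked += 1
        keep = True
        for c in cols:
            if row[c] is CNULL:
                row[c] = CROWD_ANSWERS[(row["name"], c)]
                asked += 1
            if not PREDICATES[c](row[c]):
                keep = False
                break  # short-circuit: no further questions for this tuple
        if keep:
            matches.append(row["name"])
    return matches, asked


naive = run(professors, known_first=False)
reordered = run(professors, known_first=True)
print(naive, reordered)
```

Both plans return the same tuples, but the reordered plan posts fewer crowd tasks because tuples already ruled out by a known value are never sent to workers.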

Recording Data
CrowdDB only records either CNULL or the final outcome. Why might this be a bad idea?
- Needs and aggregation schemes change
  - An application that requires more accuracy
  - We find that people are more erroneous than we expected
- Data may get stale
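A small sketch of why keeping the raw worker answers helps: the same votes can be re-aggregated later under a stricter rule. The votes, the key, and the `aggregate` function are hypothetical, and simple plurality with an agreement threshold is just one possible aggregation scheme.

```python
from collections import Counter

# Hypothetical raw worker answers for one cell, kept instead of only the outcome.
raw_answers = {
    ("IBM", "hq_city"): ["Armonk", "Armonk", "New York", "Armonk", "New York"],
}


def aggregate(votes, min_agreement):
    """Return the most common answer if enough workers agree, else None."""
    value, count = Counter(votes).most_common(1)[0]
    return value if count / len(votes) >= min_agreement else None


votes = raw_answers[("IBM", "hq_city")]
majority = aggregate(votes, min_agreement=0.5)  # original application
strict = aggregate(votes, min_agreement=0.8)    # later, accuracy-critical application
print(majority, strict)
```

If only the final outcome had been stored, the stricter application could not tell that the value rested on a 3-of-5 vote and would have no basis for requesting more answers.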

Joins between crowdsourced relations
CrowdDB forbids joins between two crowdsourced tables. Is there a case where we may want that?
- Sure: people in a department, courses taught in the department
What interesting challenges emerge there?
- Get more tuples for one relation or the other
- Especially if not a K-FK (key/foreign-key) join

FDs?
CrowdDB assumes a primary key per table. What if there are other functional dependencies? Can we do better?
- Example: Company, City, State
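As a sketch of how an extra functional dependency could save crowd work, assume (for illustration only) that City determines State in the Company, City, State example. A State value learned once for a city can then be reused for every other tuple with that city. The rows, the `fill_with_fd` function, and the `ask_crowd` callback are hypothetical.

```python
CNULL = None  # stand-in for CrowdDB's CNULL marker

rows = [
    {"company": "Acme", "city": "Palo Alto", "state": "CA"},
    {"company": "Globex", "city": "Palo Alto", "state": CNULL},
    {"company": "Initech", "city": "Austin", "state": CNULL},
    {"company": "Hooli", "city": "Austin", "state": CNULL},
]


def fill_with_fd(rows, ask_crowd):
    """Fill missing states in place using City -> State; return questions asked."""
    # Seed the lookup with every city whose state is already known.
    known = {r["city"]: r["state"] for r in rows if r["state"] is not CNULL}
    questions = 0
    for r in rows:
        if r["state"] is CNULL:
            if r["city"] not in known:
                known[r["city"]] = ask_crowd(r["city"])
                questions += 1
            # Reuse the value for every tuple with the same city.
            r["state"] = known[r["city"]]
    return questions


fake_crowd = lambda city: {"Austin": "TX"}[city]  # hypothetical worker answer
questions = fill_with_fd(rows, fake_crowd)
print(questions, [r["state"] for r in rows])
```

Without the dependency, each of the three CNULL cells would be a separate task; with it, one question suffices in this example.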

Other things…
What other issues did you identify in the paper?
- CROWDTABLE: what if workers refer to two entities in a slightly different manner?
  - Jiawei Han vs. J. Han
  - Spelling mistakes
- CROWDPROBE: what if some information is just hard to crowdsource?
  - Bottlenecked?
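The name-variant problem can be illustrated with a rough matching heuristic. This is a sketch under stated assumptions, not a technique from the paper: the `same_person` function, the initial-matching rule, and the 0.85 similarity threshold are all made up for the example, and it relies only on Python's standard difflib module.

```python
import difflib


def normalize(name):
    """Lowercase, drop periods, and split into tokens."""
    return name.lower().replace(".", "").split()


def same_person(a, b, threshold=0.85):
    """Heuristic guess at whether two worker-entered names denote one entity."""
    ta, tb = normalize(a), normalize(b)
    if not ta or not tb:
        return False
    # Same last name, and one first name is an initial of the other.
    if ta[-1] == tb[-1]:
        fa, fb = ta[0], tb[0]
        if (len(fa) == 1 or len(fb) == 1) and fa[0] == fb[0]:
            return True
    # Otherwise fall back to string similarity to tolerate small typos.
    ratio = difflib.SequenceMatcher(None, " ".join(ta), " ".join(tb)).ratio()
    return ratio >= threshold


print(same_person("Jiawei Han", "J. Han"))
print(same_person("Jiawei Han", "Jaiwei Han"))
print(same_person("Jiawei Han", "Jian Pei"))
```

A heuristic like this will produce both false merges and missed matches, so in practice uncertain pairs could themselves be sent back to the crowd for confirmation.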