ارائه دهندگان: مجتبی بلبلی،احمد رحمانی،مجتبی صادقی استاد درس: دکتر شیخ اسماعیلی درس : پایگاه داده پیشرفته بهار91 دانشگاه آزاد اسلامی واحد علوم و تحقیقات.

Slides:



Advertisements
Similar presentations
Chapter 10: Designing Databases
Advertisements

What is a Database By: Cristian Dubon.
Lukas Blunschi Claudio Jossen Donald Kossmann Magdalini Mori Kurt Stockinger.
Relational Database. Relational database: a set of relations Relation: made up of 2 parts: − Schema : specifies the name of relations, plus name and type.
CIT 613: Relational Database Development using SQL Introduction to SQL.
Accounting System Design
Relational Databases Chapter 4.
NaLIX: A Generic Natural Language Search Environment for XML Data Presented by: Erik Mathisen 02/12/2008.
Introduction to Structured Query Language (SQL)
Database Management: Getting Data Together Chapter 14.
Introduction to Structured Query Language (SQL)
Chapter 4 Relational Databases Copyright © 2012 Pearson Education, Inc. publishing as Prentice Hall 4-1.
Universe Design Concepts Business Intelligence Copyright © SUPINFO. All rights reserved.
Dr. Kalpakis CMSC 461, Database Management Systems Introduction.
RIZWAN REHMAN, CCS, DU. Advantages of ORDBMSs  The main advantages of extending the relational data model come from reuse and sharing.  Reuse comes.
Chapter 4 Relational Databases Copyright © 2012 Pearson Education 4-1.
Introduction To Databases IDIA 618 Fall 2014 Bridget M. Blodgett.
Objectives of the Lecture :
PHASE 3: SYSTEMS DESIGN Chapter 7 Data Design.
Advance Computer Programming Java Database Connectivity (JDBC) – In order to connect a Java application to a database, you need to use a JDBC driver. –
Database System Concepts and Architecture Lecture # 3 22 June 2012 National University of Computer and Emerging Sciences.
Databases From A to Boyce Codd. What is a database? It depends on your point of view. For Manovich, a database is a means of structuring information in.
Database Design - Lecture 1
I Copyright © 2004, Oracle. All rights reserved. Introduction.
Database Technical Session By: Prof. Adarsh Patel.
Introduction: Databases and Database Users
Chapter 6: Modularity Using Functions. In this chapter, you will learn about: – Function and parameter declarations – Returning a single value – Returning.
Chapter 4: Organizing and Manipulating the Data in Databases
CODD’s 12 RULES OF RELATIONAL DATABASE
I Copyright © Oracle Corporation, All rights reserved. Introduction.
Normalization (Codd, 1972) Practical Information For Real World Database Design.
SQL Structured Query Language Programming Course.
Lecture2: Database Environment Prepared by L. Nouf Almujally & Aisha AlArfaj 1 Ref. Chapter2 College of Computer and Information Sciences - Information.
7 1 Chapter 7 Introduction to Structured Query Language (SQL) Database Systems: Design, Implementation, and Management, Seventh Edition, Rob and Coronel.
Database Design and Management CPTG /23/2015Chapter 12 of 38 Functions of a Database Store data Store data School: student records, class schedules,
Lecture2: Database Environment Prepared by L. Nouf Almujally 1 Ref. Chapter2 Lecture2.
DataBase Management System What is DBMS Purpose of DBMS Data Abstraction Data Definition Language Data Manipulation Language Data Models Data Keys Relationships.
6 1 Lecture 8: Introduction to Structured Query Language (SQL) J. S. Chou, P.E., Ph.D.
Database Systems DBMS Environment Data Abstraction.
6.1 © 2010 by Prentice Hall 6 Chapter Foundations of Business Intelligence: Databases and Information Management.
ITGS Databases.
Database Systems Design, Implementation, and Management Coronel | Morris 11e ©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or.
Efficient RDF Storage and Retrieval in Jena2 Written by: Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds Presented by: Umer Fareed 파리드.
Advanced Accounting Information Systems Day 10 answers Organizing and Manipulating Data September 16, 2009.
Mr.Prasad Sawant, MIT Pune India Introduction to DBMS.
Session 1 Module 1: Introduction to Data Integrity
DBMS 3. course. Reminder Data independence: logical and physical Concurrent processing – Transaction – Deadlock – Rollback – Logging ER Diagrams.
Introduction to Core Database Concepts Getting started with Databases and Structure Query Language (SQL)
LECTURE TWO Introduction to Databases: Data models Relational database concepts Introduction to DDL & DML.
Data Models. 2 The Importance of Data Models Data models –Relatively simple representations, usually graphical, of complex real-world data structures.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 The Relational Model Chapter 3.
SQL Basics Review Reviewing what we’ve learned so far…….
Rationale Databases are an integral part of an organization. Aspiring Database Developers should be able to efficiently design and implement databases.
1 CS122A: Introduction to Data Management Lecture #4 (E-R  Relational Translation) Instructor: Chen Li.
1 Section 1 - Introduction to SQL u SQL is an abbreviation for Structured Query Language. u It is generally pronounced “Sequel” u SQL is a unified language.
Fundamental of Database Systems
CrowdDb.
Quiz Questions Q.1 An entity set that does not have sufficient attributes to form a primary key is a (A) strong entity set. (B) weak entity set. (C) simple.
CrowdDb.
CrowdDB : Answering queries with Crowdsourcing
Chapter 4 Relational Databases
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 2 Database System Concepts and Architecture.
Deco: Declarative Crowdsourcing
What is a Database and Why Use One?
Chapter 2 Database Environment.
Accounting System Design
Teaching slides Chapter 8.
Database Systems Instructor Name: Lecture-3.
Accounting System Design
INTRODUCTION A Database system is basically a computer based record keeping system. The collection of data, usually referred to as the database, contains.
Presentation transcript:

ارائه دهندگان: مجتبی بلبلی،احمد رحمانی،مجتبی صادقی استاد درس: دکتر شیخ اسماعیلی درس : پایگاه داده پیشرفته بهار91 دانشگاه آزاد اسلامی واحد علوم و تحقیقات کردستان SRB of Islamic Azad University, Kurdistan

 Relational database systems have achieved widespread adoption: ◦ business environments for which they were originally envisioned ◦ many types of structured data like personal, social, and even scientific information  as data creation and use become increasingly democratized through web, mobile and other technologies, the limitations of the technology are becoming more apparent !! SRB of Islamic Azad University, Kurdistan2 Relational database

SRB of Islamic Azad University, Kurdistan3 correctness unambiguity completeness RDBMS

SRB of Islamic Azad University, Kurdistan4 correctness unambiguity completeness RDBMS No| Incomplete | Incorrect Answers

 One obvious situation where existing systems produce wrong answers is when they are missing information required for answering the question SRB of Islamic Azad University, Kurdistan5

 return an empty answer if the company table instance in the database at that time does not contain a record for “I.B.M.”  Reasons : ◦ a data entry mistake may have omitted the I.B.M. record or the record may have been inadvertently deleted. ◦ the record was entered incorrectly, like “I.B.N.” ◦ Traditional systems can erroneously return an empty result even when no errors are made. Like “International Business Machines” instead of “IBM” SRB of Islamic Azad University, Kurdistan6

 relational database systems are based on the Closed World Assumption (information that is not in the database is considered to be false or non-existent).  relational databases are extremely literal: ◦ They expect that data has been properly cleaned and validated before entry and do not natively tolerate inconsistencies in data or queries. SRB of Islamic Azad University, Kurdistan7

 unless the relevance of pictures to specific topics has been previously obtained and stored, there is simply no good way to ask this question of a standard RDBMS SRB of Islamic Azad University, Kurdistan8 SELECT image FROM picture WHERE topic = "Business Success" ORDER BY relevance LIMIT 1; SELECT image FROM picture WHERE topic = "Business Success" ORDER BY relevance LIMIT 1; consider a query to find the best among a collection of images to use in a motivational slide show:

SRB of Islamic Azad University, Kurdistan9 SELECT market_capitalization FROM company WHERE name = "I.B.M."; SELECT market_capitalization FROM company WHERE name = "I.B.M."; SELECT image FROM picture WHERE topic = "Business Success" ORDER BY relevance LIMIT 1; SELECT image FROM picture WHERE topic = "Business Success" ORDER BY relevance LIMIT 1; above queries, however, while unanswerable by today’s relational database systems, could easily be answered by people, especially people who have access to the Internet.

 Recently, there has been substantial interest in harnessing Human Computation for solving problems that are impossible or too expensive to answer correctly using computers.  The approach we take in CrowdDB is to exploit the extensibility of the iterator based query processing paradigm to add crowd functionality into a DBMS.  CrowdDB extends a traditional query engine with a small number of operators that solicit human input by generating and submitting work requests to a microtask crowdsourcing platform. SRB of Islamic Azad University, Kurdistan10

SRB of Islamic Azad University, Kurdistan11  Finding new data :  recall that a key limitation of relational technology stems from the Closed World Assumption. People, on the other hand, aided by tools such as search engines and reference sources, are quite capable of finding information that they do not have readily at hand.  Comparing data :  people are skilled at making comparisons that are difficult or impossible to encode in a computer algorithm. For example, it is easy for a person to tell, if given the right context, if “I.B.M.” and “International Business Machines” are names for the same entity. Likewise, people can easily compare items such as images along many dimensions, such as how well the image represents a particular concept.

 by using SQL we create a declarative interface to the crowd.  Crowd DB provides Physical Data Independence for the crowd.  user interface design is a key factor in enabling questions to be answered by people. leverages schema information to support the automatic generation of effective user interfaces for crowd sourced tasks.  because of its declarative programming environment and operator-based approach, Crowd DB presents the opportunity to implement cost-based optimizations to improve query cost, time, and accuracy. SRB of Islamic Azad University, Kurdistan12

 Crowdsourcing is a distributed problem solving and production model.  A crowdsourcing platform creates a marketplace on which requesters offer tasks and workers accept and work on the tasks SRB of Islamic Azad University, Kurdistan13

 is a crowdsourcing Internet marketplace that enables computer programmers (known as Requesters) to co-ordinate the use of human intelligence to perform tasks that computers are unable to do yet.  AMT provides an API for requesting and managing work, which enables us to directly connect it to the Crowd DB query processor.  In AMT, as part of specifying a task, a requester defines a price/reward (minimum $0.01) that the worker receives if the task is completed satisfactorily.  In AMT, workers from anywhere in the world can participate but requesters must be holders of a US credit card. SRB of Islamic Azad University, Kurdistan14

 HIT: ◦ is the smallest entity of work a worker can accept to do  Assignment: ◦ Every HIT can be replicated into multiple assignments.  HIT Group: ◦ AMT automatically groups similar HITs together into HIT Groups based on the requester, the title of the HIT, the description, and the reward. SRB of Islamic Azad University, Kurdistan15

SRB of Islamic Azad University, Kurdistan16 Requester HIT Groups AMT Groups HITs/ Assignment Workers Workflow :

SRB of Islamic Azad University, Kurdistan17 Worker Worker ‘s web interfaces Main AMT interface Provided by Requester

SRB of Islamic Azad University, Kurdistan18 createHIT(title, description, question, keywords, reward, duration, max Assignments, lifetime) HitID

SRB of Islamic Azad University, Kurdistan19 getAssignmentsForHIT(HitID) list(asnId, workerId, answer)

SRB of Islamic Azad University, Kurdistan20 approveAssignment(asnID) / rejectAssignment(asnID) reward to the worker and the commission to Amazon / no reward to the worker and the commission to Amazon

SRB of Islamic Azad University, Kurdistan21 forceExpireHIT(HitID) Expires a HIT immediately

 The crowd can be seen as a set of specialized processors.  Humans are good at certain tasks (e.g., image recognition)  machines are good at certain tasks (e.g., numbers). SRB of Islamic Azad University, Kurdistan22 Human Machine Crowd

 Performance and Variability  Task Design and Ambiguity  Affinity and Learning  Relatively Small Worker Pool  Open vs. Closed World SRB of Islamic Azad University, Kurdistan23

SRB of Islamic Azad University, Kurdistan24

 Turker Relationship Management  User Interface Management  HIT Manager SRB of Islamic Azad University, Kurdistan 25

 a SQL extension that supports crowdsourcing.  Involve missing data and subjective comparisons  SQL code in the same way as they do for traditional databases.  developers are not aware that their code involves crowdsourcing. SRB of Islamic Azad University, Kurdistan26 CrowdSQL

 SQL DDL Extensions  specific attributes of tuples could be crowdsourced.  entire tuples could be crowdsourced.  special keyword,CROWD SRB of Islamic Azad University, Kurdistan27

SRB of Islamic Azad University, Kurdistan28 Crowdsourced Column In the following Department table, the url is marked as crowdsourced.

SRB of Islamic Azad University, Kurdistan29 Crowdsourced Table This example models a Professor table as a crowdsourced table.

SRB of Islamic Azad University, Kurdistan30  CrowdDB allows any column and any table to be marked with the CROWD keyword.  CrowdDB does not impose any limitations with regard to SQL types and integrity constraints  CROWD tables must have a primary key

 DML (Data Manipulation Language ) ◦ SELECT - UPDATE - DELETE - INSERT INTO  DDL ( Data Definition Language) ◦ CREATE TABLE - ALTER TABLE - DROP TABLE - CREATE INDEX - DROP INDEX  DCL ( Data Control Language) ◦ GRANT -REVOKE SRB of Islamic Azad University, Kurdistan 31

 In order to represent values in crowdsourced columns  It is also possible to specify the value of a CROWD column as part of an INSERT statement SRB of Islamic Azad University, Kurdistan 32

 CrowdDB supports any kind of SQL query on CROWD tables and columns  joins and sub-selects between two CROWD tables are allowed SRB of Islamic Azad University, Kurdistan 33 Query Semantics

SRB of Islamic Azad University, Kurdistan 34 Subjective Comparisons  two new built in functions CROWDEQUAL and CROWDORDER.  CROWDEQUAL ◦ takes two parameters (an lvalue and an rvalue)

SRB of Islamic Azad University, Kurdistan 35 CROWDORDER  rank or order results.

SRB of Islamic Azad University, Kurdistan 36 CrowdSQL in Practice  Minimal extension to SQL  CrowdSQL changes the closed-world to an open-world assumption  cost and response time of queries can be unbounded  provide a way to define a budget for a query  Only one way to specify a budget—using a LIMIT

SRB of Islamic Azad University, Kurdistan 37 Lineage ◦ query the lineage in order to take actions. cleansing of crowdsourced data ◦ entity resolution of crowdsourced data ◦ all CROWD tables must have a primary key

SRB of Islamic Azad University, Kurdistan 38 USER INTERFACE GENERATION  provision of effective user interfaces.  automatically generates user interfaces.  two-step process in CrowdDB  user interfaces are HTML (and JavaScript) forms

SRB of Islamic Azad University, Kurdistan 39 Basic Interfaces  The title of the HTML is the name of the Table.  ask the worker to input the missing information.  copying the known field values into the HTML form.  generates JavaScript code to check for the correct types of input.

SRB of Islamic Azad University, Kurdistan 40 Multi-Relation Interfaces  the foreign key references a non-crowdsourced table.  generated user interface shows a drop-down box.  Ajax-based “suggest” function is provided.  CrowdDB supports two types of user interfaces: ◦ normalized interface. ◦ denormalized interface.

SRB of Islamic Azad University, Kurdistan 41 QUERY PROCESSING  query plan generation and execution follows a largely traditional approach.  The main differences are that the CrowdDB parser.

SRB of Islamic Azad University, Kurdistan 42 Crowd Operators  implements all operators of the relational algebra, just like any traditional database system.  initialized with a user interface template and the standard HIT parameters.  quality control is carried out by a majority vote.

SRB of Islamic Azad University, Kurdistan 43 CrowdDB has three Crowd operators:  CrowdProbe: ◦ crowdsources missing information of CROWD columns.  CrowdJoin : ◦ implements an index nested-loop join over two tables.  CrowdCompare : ◦ implements the CROWDEQUAL and CROWDORDER functions

SRB of Islamic Azad University, Kurdistan 44 Physical Plan Generation  A query is first parsed.  optimized using traditional and crowdspecific optimizations.  only predicate push-down was applied.  the logical plan is translated into a physical plan

SRB of Islamic Azad University, Kurdistan 45 Heuristics  based on a simple rule-based optimizer.  implements several essential query rewriting rules such as predicate push-down.  annotates the query plan with the cardinality predictions between the operators.  re-order the operators to minimize the requests.  in contrast to a cost-based optimizer, a rule-based optimizer is not able to exhaustively explore all parameters.

 We ran over 25,000 HITs on AMT and measured the response time and quality of the answers provided by the workers.  It is important to note that the results presented in this section are highly dependent on the properties of the AMT platform at a particular point in time.  our results are highly-dependent on the specific tasks we sent to the crowd SRB of Islamic Azad University, Kurdistan 46

 We begin by describing the results of experiments with simple tasks requiring workers to find and fill in missing data for a table with two crowdsourced columns: SRB of Islamic Azad University, Kurdistan 47

 In the first set of experiments we examined the response time of assignments as a function of the number of HITs in a HIT Group.  The results show that response times decrease dramatically as the size of the HIT Groups is increased.(figure 4) SRB of Islamic Azad University, Kurdistan 48

 This graph shows that there is a tradeoff between throughput (in terms of HITs completed per unit time) and completion percent.(Figure 5) That is, while the best throughput obtained was for the largest Group size (223.8 out of a Group of 400 HITs), the highest completion rates were obtained with roups of 50 or 100 HITs. SRB of Islamic Azad University, Kurdistan 49

 Figure 6 shows the fraction of HITs (i.e., all five assignments) that were completed as a function of time. The figure shows the results for HITs with a reward of (from the bottom up) 1, 2, 3, and 4 cents per assignment. Obviously, for all rewards, the longer we wait the more HITs are completed. The results show that for this particular task, paying 4 cents gives the best performance, while paying 1 cent gives the worst. SRB of Islamic Azad University, Kurdistan 50

 Comparing Figures 6 and 7, two observations can be made. First, within sixty minutes almost all HITs received at least one answer if a reward of 2, 3, or 4 cents was given (Figure 7), whereas only about 65% of the HITs were fully completed even for the highest reward of 4 cents (Figure 6). Second, if a single response to a HIT is acceptable, there is little incentive to pay more than 2 cents in this case. SRB of Islamic Azad University, Kurdistan 51

 The previous experiments focused on the response time and through-put of the crowd. Here, we turn our attention to two other issues: distribution of work among workers and answer quality.  In this experiment, every HIT had five assignments.  Figure 8 shows the results. For each worker, Figure 8 shows the number of HITs computed by that worker and the number of errors made by that worker. In Figure 8, the workers are plotted along the x-axis in decreasing order of the number of HITs they processed. SRB of Islamic Azad University, Kurdistan 52

 For this experiment, we used a simple company schema with two attributes,company Name and headquarter address, and populated it with the Fortune 100 companies.  In Figure 9 we show results for four different instances of this query. Each HIT involved comparing ten company names with one of the four “non-uniform names”. Furthermore, each HIT had three assignments and the reward was 1 cent per HIT. SRB of Islamic Azad University, Kurdistan 53

 Figure 10 shows the ranking of the Top 8 pictures for the “Golden Gate” subject area. More concretely, Figure 10 shows the picture,the number of workers that voted for that picture, the ranking of that picture according to the majority vote, vs. SRB of Islamic Azad University, Kurdistan 54

 We highlighted two cases where human input is needed: 1_unknown or incomplete data, and 2_subjective comparisons.  We described simple extensions to the SQL DDL and DML for crowdsourcing.  Performance, in terms of latency and cost, is also effected by the way in which CrowdDB interacts with workers. SRB of Islamic Azad University, Kurdistan 55

 Of course, there is future work to do. Answer quality is a pervasive issue that must be addressed. Clear semantics in the presence of the open-world assumption are needed for for both crowd-sourced values and tables.  Cost-based optimization for the crowd involves many new parameters and considerations. Caching and managing crowdsourced answers, if done properly, can greatly improve performance as well. SRB of Islamic Azad University, Kurdistan 56

با تشکر از توجه شما SRB of Islamic Azad University, Kurdistan57