ConQuer: Efficient Management of Inconsistent Databases
Presented by: Ariel Fuxman (Univ. of Toronto)

Presentation transcript:

ConQuer: Efficient Management of Inconsistent Databases
Presented by: Ariel Fuxman (Univ. of Toronto)
Joint work with: Renée J. Miller (Univ. of Toronto), Diego Fuxman (Univ. Nacional del Sur)

Ariel Fuxman, Diego Fuxman, Renée J. Miller
ConQuer
A system designed to answer SQL queries over inconsistent databases.
INCONSISTENT DATABASE (name should be the key):
  Name    Income
  Peter   40K
  Peter   200K
  Paul    400K
  Mary    110K
  Mary    130K

One Application
Customer Relationship Management (CRM)
Sources: Sales, Shipping, Customer Support, Web Forms, Demographic Data
Target: Integrated Customer Database

Disagreement Between Sources
- Which tuple for Peter should we delete?
- Removing both tuples loses consistent information.
- Deciding the correct income may require human intervention.
sales:
  name    address              …   income
  Peter   276 College Street   …   40K
  Paul    100 Bloor Street     …   400K
  Mary    20 Union Street      …   110K
web:
  name    address              …   income
  Peter   276 College Street   …   200K
  Paul    100 Bloor Street     …   400K
  Mary    20 Union Street      …   130K

Inconsistent Integrated Database
Transfer all conflicting tuples to the integrated database.
Sales:
  name    …   income
  Peter   …   40K
  Paul    …   400K
  Mary    …   110K
Web:
  name    …   income
  Peter   …   200K
  Paul    …   400K
  Mary    …   130K
Integrated Database (INCONSISTENT DATABASE):
  name    …   income
  Peter   …   40K
  Peter   …   200K
  Paul    …   400K
  Mary    …   110K
  Mary    …   130K

Query Answering
q = "Get customers who make more than 100K"
  name    income   (source)
  Peter   40K      sales
  Peter   200K     web
  Paul    400K     sales/web
  Mary    110K     sales
  Mary    130K     web
q returns: Peter, Paul, Mary
Offering a Platinum credit card…
Peter should NOT be offered a Platinum card!!

Semantics of Query Answering
- Get customers who possibly make more than 100K: Peter, Paul, Mary
- Get customers who certainly make more than 100K: Paul, Mary (CONSISTENT ANSWER) [Arenas et al. 99]
  custid  income   (source)
  Peter   40K      sales
  Peter   200K     web
  Paul    400K     sales/web
  Mary    110K     sales
  Mary    130K     web
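
The two semantics above can be sketched in a few lines of Python. This is only an illustration for this particular single-table selection query, where a customer is a possible answer if some tuple for that custid qualifies and a certain answer if every tuple does. The function name `possible_and_certain` and the encoding of incomes as integers in thousands are assumptions made for the sketch, not part of the slides.

```python
from collections import defaultdict

# Integrated table from the slide: (custid, income in thousands).
ROWS = [("Peter", 40), ("Peter", 200), ("Paul", 400), ("Mary", 110), ("Mary", 130)]

def possible_and_certain(rows, threshold=100):
    """Return (possible, certain) answers to 'income > threshold'.

    Assumes custid is the key, so the tuples sharing a custid are
    mutually exclusive alternatives for that customer.
    """
    incomes = defaultdict(list)
    for custid, income in rows:
        incomes[custid].append(income)
    # Possible: at least one alternative satisfies the condition.
    possible = {c for c, vals in incomes.items() if any(v > threshold for v in vals)}
    # Certain: every alternative satisfies the condition.
    certain = {c for c, vals in incomes.items() if all(v > threshold for v in vals)}
    return possible, certain

possible, certain = possible_and_certain(ROWS)
print(sorted(possible))  # ['Mary', 'Paul', 'Peter']
print(sorted(certain))   # ['Mary', 'Paul']
```

The per-customer shortcut works here because the query touches one relation and no joins; it is not a general procedure.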

Repairs
Key: custid
Inconsistent database:
  custid  income   (source)
  Peter   40K      sales
  Peter   200K     web
  Paul    400K     sales/web
  Mary    110K     sales
  Mary    130K     web
Repairs:
  1. Peter 40K,  Paul 400K, Mary 110K
  2. Peter 40K,  Paul 400K, Mary 130K
  3. Peter 200K, Paul 400K, Mary 110K
  4. Peter 200K, Paul 400K, Mary 130K
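
For a single key constraint, the repairs on this slide can be reproduced by picking exactly one tuple per key value. The sketch below does that with `itertools.product`. The helper name `repairs` and the tuple encoding (incomes as integers in thousands) are assumptions for illustration.

```python
from itertools import groupby, product

# Inconsistent table from the slide: (custid, income in thousands).
ROWS = [("Peter", 40), ("Peter", 200), ("Paul", 400), ("Mary", 110), ("Mary", 130)]

def repairs(rows, key=lambda row: row[0]):
    """Enumerate all repairs of `rows` with respect to a key constraint.

    Each repair keeps exactly one tuple from every group of tuples
    that share the same key value.
    """
    ordered = sorted(rows, key=key)
    groups = [list(group) for _, group in groupby(ordered, key=key)]
    return [set(choice) for choice in product(*groups)]

all_repairs = repairs(ROWS)
print(len(all_repairs))  # 4
for repair in all_repairs:
    print(sorted(repair))
```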

Consistent Query Answers
CONSISTENT ANSWERS: answers obtained no matter which repair we choose.
q = "Get customers who make more than 100K"
  Repair                                q(repair)
  Peter 40K,  Paul 400K, Mary 110K      Paul, Mary
  Peter 40K,  Paul 400K, Mary 130K      Paul, Mary
  Peter 200K, Paul 400K, Mary 110K      Peter, Paul, Mary
  Peter 200K, Paul 400K, Mary 130K      Peter, Paul, Mary
CONSISTENT ANSWER = {Paul, Mary}
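
The definition above translates directly into a brute-force procedure: evaluate q on every repair and intersect the results. The sketch below is that definition written out, not how ConQuer computes answers. The names `q` and `consistent_answers` are hypothetical.

```python
from itertools import groupby, product

# Inconsistent table from the slide: (custid, income in thousands).
ROWS = [("Peter", 40), ("Peter", 200), ("Paul", 400), ("Mary", 110), ("Mary", 130)]

def q(instance):
    """Get customers who make more than 100K."""
    return {custid for custid, income in instance if income > 100}

def consistent_answers(rows, query):
    """Intersect the query results over every repair (key: custid)."""
    ordered = sorted(rows, key=lambda row: row[0])
    groups = [list(g) for _, g in groupby(ordered, key=lambda row: row[0])]
    # One repair per way of choosing a single tuple from each key group.
    per_repair = [query(choice) for choice in product(*groups)]
    return set.intersection(*per_repair)

print(sorted(consistent_answers(ROWS, q)))  # ['Mary', 'Paul']
```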

Problem
Potentially HUGE number of repairs!
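
To see why the number of repairs explodes: under a key constraint, the repair count is the product of the sizes of the groups of tuples sharing a key value. A rough sketch, with the hypothetical helper `count_repairs`:

```python
from collections import Counter
from math import prod

def count_repairs(key_values):
    """Number of repairs for a key constraint.

    `key_values` lists the key value of every tuple; each repair picks
    one tuple per distinct key, so the count is the product of the
    multiplicities.
    """
    return prod(Counter(key_values).values())

# The running example: Peter and Mary appear twice, Paul once.
print(count_repairs(["Peter", "Peter", "Paul", "Mary", "Mary"]))  # 4

# Only 30 customers with two conflicting tuples each already give 2**30 repairs.
print(count_repairs([i for i in range(30) for _ in range(2)]))  # 1073741824
```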

ConQuer
ConQuer is a system designed to compute consistent answers efficiently:
- avoids explicit construction of repairs
- reuses commercial database technology

ConQuer's Solution
Input: query q and the keys
ConQuer's Rewriting Algorithm [ICDT 05] [SIGMOD 05] produces the rewritten query Q*
A commercial database engine evaluates Q* over the inconsistent database
Output: consistent answer to q
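
To make the idea of a rewritten query concrete, here is a hand-written rewriting of the running example, run on SQLite through Python's `sqlite3` module. It is a sketch in the spirit of the approach and is not the output of ConQuer's algorithm. The table name `customer`, its columns, and the SQL text are assumptions. The rewritten query keeps a customer only if no tuple with the same key fails the condition.

```python
import sqlite3

# Hand-written rewriting of q = "customers who make more than 100K"
# under the key `name` (an illustration, not ConQuer's generated SQL).
REWRITTEN_Q = """
SELECT DISTINCT c.name
FROM customer AS c
WHERE c.income > 100
  AND NOT EXISTS (
        SELECT 1
        FROM customer AS c2
        WHERE c2.name = c.name
          AND c2.income <= 100
      )
ORDER BY c.name
"""

def consistent_answer_via_sql(rows):
    """Load the inconsistent table and evaluate the rewritten query on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (name TEXT, income INTEGER)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
    result = [name for (name,) in conn.execute(REWRITTEN_Q)]
    conn.close()
    return result

ROWS = [("Peter", 40), ("Peter", 200), ("Paul", 400), ("Mary", 110), ("Mary", 130)]
print(consistent_answer_via_sql(ROWS))  # ['Mary', 'Paul']
```

The query runs once over the stored data, so no repair is ever materialized.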

Contributions
- Rewriting algorithm
    - From a large class of SPJ SQL queries
    - Into SQL queries
- Rewriting for queries with grouping and aggregation
- Optimized rewriting
    - Exploits precomputed information, if available
- Experimental evaluation
    - Large databases
    - TPC-H queries
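
The slides do not spell out what a consistent answer to an aggregate looks like. One option discussed in the consistent query answering literature is to report the range of values the aggregate takes across all repairs. The sketch below assumes that range semantics for SUM(income) on the running example, with custid as the key. The helper name `sum_range` is hypothetical, and this is not claimed to be ConQuer's rewriting.

```python
from collections import defaultdict

# Inconsistent table from the running example: (custid, income in thousands).
ROWS = [("Peter", 40), ("Peter", 200), ("Paul", 400), ("Mary", 110), ("Mary", 130)]

def sum_range(rows):
    """Smallest and largest SUM(income) over all repairs (key: custid).

    Each repair picks one tuple per custid independently, so the bounds
    are the sums of the per-customer minimum and maximum incomes.
    """
    by_key = defaultdict(list)
    for custid, income in rows:
        by_key[custid].append(income)
    low = sum(min(vals) for vals in by_key.values())
    high = sum(max(vals) for vals in by_key.values())
    return low, high

print(sum_range(ROWS))  # (550, 730)
```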

Demo
- Present a case study of an inconsistent database about airports and cities
- Explain the automatically generated rewritings
- Deal with Select-Project-Join queries with grouping and aggregation

ConQuer papers
- A. Fuxman, E. Fazli, and R. J. Miller. ConQuer: Efficient Management of Inconsistent Databases, SIGMOD 2005.
- A. Fuxman and R. J. Miller. First-Order Query Rewriting for Inconsistent Databases, ICDT 2005.