Data Integration: Aggregate Query Answering under Uncertain Schema Mappings. Avigdor Gal, Maria Vanina Martinez, Gerardo I. Simari, V.S. Subrahmanian. Presented by Stephen Lynn.

Presentation transcript:

Data Integration
Aggregate Query Answering under Uncertain Schema Mappings
Avigdor Gal, Maria Vanina Martinez, Gerardo I. Simari, V.S. Subrahmanian
Presented by Stephen Lynn

Overview
- Aggregate Queries
- Probabilistic Schema Mapping
- Goals/Objectives
- Aggregate Processing (3 proposals)
- By-Table Algorithm
- By-Tuple Algorithm
- Evaluation
- Analysis

Aggregate Queries
- COUNT, MIN, MAX, SUM, AVG
- Example table columns: ID, Price, Quantity
- Simple PTIME algorithms to compute

Probabilistic Schema Mappings

By-Table vs By-Tuple
- By-tuple: consider all possible mappings for each tuple
- By-table: a single mapping for the entire table
- P(date→postedDate) = 0.7
- P(date→reducedDate) = 0.3
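To make the two semantics concrete, the sketch below enumerates possible worlds for a COUNT query over a tiny table. Only the two mapping probabilities come from the slide; the rows, the cutoff date, and the function names are assumptions made up for illustration, and the by-tuple version assumes each tuple picks its mapping independently.

```python
from collections import defaultdict
from itertools import product

# Mapping probabilities from the slide: which source column "date" refers to.
MAPPINGS = {"postedDate": 0.7, "reducedDate": 0.3}

# Hypothetical rows; ISO date strings compare correctly as plain text.
ROWS = [
    {"postedDate": "2008-01-05", "reducedDate": "2008-02-01"},
    {"postedDate": "2008-01-20", "reducedDate": "2008-02-05"},
    {"postedDate": "2008-02-10", "reducedDate": "2008-02-15"},
]
CUTOFF = "2008-01-31"  # hypothetical query: COUNT(*) WHERE date <= CUTOFF


def count_by_table(rows, mappings, cutoff):
    """One mapping holds for the whole table: len(mappings) possible worlds."""
    dist = defaultdict(float)
    for column, prob in mappings.items():
        count = sum(1 for row in rows if row[column] <= cutoff)
        dist[count] += prob
    return dict(dist)


def count_by_tuple(rows, mappings, cutoff):
    """Each tuple picks a mapping: len(mappings) ** len(rows) possible worlds."""
    dist = defaultdict(float)
    for choice in product(mappings, repeat=len(rows)):
        prob = 1.0
        count = 0
        for row, column in zip(rows, choice):
            prob *= mappings[column]
            count += int(row[column] <= cutoff)
        dist[count] += prob
    return dict(dist)
```

On this data the by-table answer is either 2 (probability 0.7) or 0 (probability 0.3), while the by-tuple answer spreads over 0, 1, and 2 with probabilities of roughly 0.09, 0.42, and 0.49. The brute-force by-tuple enumeration is exponential in the number of rows, so it is only useful for checking small cases.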

Goals/Objectives
- Impact Analysis of Probabilistic Schemas on Aggregate Queries
- Aggregate Query Algorithms
- Time Complexity Analysis
- Evaluation

Aggregation Methods
- Range
- Distribution
- Expected Value

Method Relationships
- Distribution
  - Most time consuming
  - Most information
- Range
  - Computed directly from distribution
- Expected Value
  - Computed directly from distribution
More efficient ways to compute
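The relationship between the three methods can be sketched in a few lines: once a distribution over answers is available, the range and the expected value both follow from it. The dictionary representation and the function names below are assumptions for illustration, and the example distribution is made up.

```python
def range_from_distribution(dist):
    """Smallest and largest answers that carry non-zero probability."""
    support = [value for value, prob in dist.items() if prob > 0]
    return min(support), max(support)


def expected_value_from_distribution(dist):
    """Probability-weighted average of the possible answers."""
    return sum(value * prob for value, prob in dist.items())


# Hypothetical distribution over the answers of a COUNT query.
example = {0: 0.09, 1: 0.42, 2: 0.49}
low, high = range_from_distribution(example)
mean = expected_value_from_distribution(example)
```

Here the range is (0, 2) and the expected value is about 1.4. Going through the full distribution is the most expensive route, which is why cheaper direct computations of the range and expected value are worth having.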

By-Table Algorithm
All PTIME computable
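The transcript does not preserve the algorithm itself. A minimal sketch of the by-table idea, assuming it amounts to rewriting the query once per candidate mapping and letting the database evaluate each rewrite, might look as follows. The `items` table, its rows, and `by_table_distribution` are hypothetical; only the mapping probabilities are taken from the earlier slide.

```python
import sqlite3
from collections import defaultdict

MAPPINGS = {"postedDate": 0.7, "reducedDate": 0.3}


def by_table_distribution(conn, aggregate, mappings, cutoff):
    """Run the aggregate once per mapping and weight each answer by its probability."""
    dist = defaultdict(float)
    for column, prob in mappings.items():
        # Column names come from the trusted mapping dict, never from user input.
        sql = f"SELECT {aggregate} FROM items WHERE {column} <= ?"
        (answer,) = conn.execute(sql, (cutoff,)).fetchone()
        dist[answer] += prob
    return dict(dist)


# Hypothetical data set, held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER, price REAL, postedDate TEXT, reducedDate TEXT)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        (1, 10.0, "2008-01-05", "2008-02-01"),
        (2, 20.0, "2008-01-20", "2008-01-25"),
        (3, 30.0, "2008-02-10", "2008-02-15"),
    ],
)

sum_dist = by_table_distribution(conn, "SUM(price)", MAPPINGS, "2008-01-31")
count_dist = by_table_distribution(conn, "COUNT(*)", MAPPINGS, "2008-01-31")
```

With m mappings this issues m ordinary aggregate queries, so the cost is m times that of a deterministic query. Because the database engine does the actual work, its own optimizations also affect the measured running time.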

By-Tuple Algorithm (COUNT)
O(n * m)
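The slide keeps only the O(n * m) bound, not the algorithm. One way such a bound can arise for COUNT, assuming n tuples, m mappings, independent per-tuple mapping choices, and non-zero probability for every mapping, is a single pass that checks each tuple against each mapping. The sketch below is an illustration under those assumptions rather than the paper's procedure; the rows and function names are hypothetical.

```python
def by_tuple_count(rows, mappings, predicate):
    """Single pass over n rows and m mappings: range and expected value of COUNT."""
    low = high = 0
    expected = 0.0
    match_probs = []
    for row in rows:
        # Probabilities of the mappings under which this row satisfies the predicate.
        hits = [prob for column, prob in mappings.items() if predicate(row[column])]
        if len(hits) == len(mappings):
            low += 1   # counted in every possible world
        if hits:
            high += 1  # counted in at least one possible world
        match_prob = sum(hits)
        expected += match_prob  # linearity of expectation
        match_probs.append(match_prob)
    return (low, high), expected, match_probs


def count_distribution(match_probs):
    """Full COUNT distribution by dynamic programming; O(n^2), beyond the single pass."""
    dist = [1.0]  # dist[k] = probability that exactly k rows seen so far match
    for p in match_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        dist = nxt
    return dist


MAPPINGS = {"postedDate": 0.7, "reducedDate": 0.3}
ROWS = [  # hypothetical rows
    {"postedDate": "2008-01-05", "reducedDate": "2008-02-01"},
    {"postedDate": "2008-01-20", "reducedDate": "2008-02-05"},
    {"postedDate": "2008-02-10", "reducedDate": "2008-02-15"},
    {"postedDate": "2008-01-10", "reducedDate": "2008-01-15"},
]
count_range, count_mean, probs = by_tuple_count(
    ROWS, MAPPINGS, lambda date: date <= "2008-01-31"
)
full_dist = count_distribution(probs)
```

For these rows the range is (1, 3) and the expected count is about 2.4. The dynamic program recovers the whole distribution without enumerating the m^n mapping sequences, at the price of a quadratic rather than linear dependence on n.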

Example By-Tuple (COUNT)

Time Complexity

Evaluation
- Empirical Evaluation
  - Real-world dataset (eBay)
  - Synthetic dataset
- Evaluate Time Complexity
  - Vary tuple numbers
  - Vary attribute mappings

Evaluation Results (three slides of figures, not captured in the transcript)

Analysis
- Strengths
  - Effect of probabilistic schemas on aggregates
  - Nice PTIME algorithms
- Weaknesses
  - Evaluation was obvious
  - By-Table results biased by database optimizations
- Future Work
  - Improve algorithms
  - Extend to sub-queries
  - Heuristics