CSE 636 Data Integration Limited Source Capabilities Slides by Hector Garcia-Molina Fall 2006.


CSE 636 Data Integration Limited Source Capabilities Slides by Hector Garcia-Molina Fall 2006

2 Heterogeneous Databases [Diagram: a Distributed Database System spanning DBMS 1, DBMS 2, a legacy system, and a web site, each with its own data]

3 Limited Capabilities

4 Example: Amazon.com [Search form with fields author, title, subject, format, price; callouts on the form: must specify at least one of these; this attribute not returned; cannot query on this attribute; menu of choices]

5 Example: BarnesAndNoble.com [Search form with fields author, title, subject, format, price; callouts on the form: must specify at least one of these; can query if one of the other attributes is specified; menu of choices]

6 Why Limited Capabilities? –Search forms –Security –Indexes –Legacy

7 Capability vs. Content Capability description –Can only search for subject = “art,” “history,” “science” Content description –Source only contains subject = “art,” “history,” “science”

8 Outline –Describing source capabilities –Extending source capabilities –How mediators cope with limited capabilities –Mediator capabilities –Other topics [Diagram: Mediator, Wrapper, Source]

9 Describing Query Capabilities R(X, Y, ..., Z) Adornments: –f: may or may not be specified –u: cannot be specified –b: must be specified –c[S]: must be specified, chosen from list S –o[S]: optional; if specified, chosen from list S

10 Describing Query Capabilities R(X, Y, ..., Z) Adornments: –f: may or may not be specified –u: cannot be specified –b: must be specified –c[S]: must be specified, chosen from list S –o[S]: optional; if specified, chosen from list S With output restriction: f’, u’, b’, c’[S], o’[S]

11 Example Relation: R(X, Y, Z) Description templates: bu’f, uf’c[z1, z2] Answerable queries: R(x1, Y, Z), R(X, Y, z1) Unanswerable queries: R(X, y1, Z), R(X, Y, z3)
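The template check in this example can be sketched in code. The following is a minimal illustration, not the authors' implementation: the names `attribute_ok`, `answerable`, and `TEMPLATES` are hypothetical, a query is modeled as a list with `None` for unspecified attributes, and the output-restriction prime is ignored because it affects which attributes are returned rather than which queries are accepted.

```python
# Sketch under stated assumptions: each adornment is (kind, allowed_values),
# where allowed_values is only used by the 'c' and 'o' kinds.

def attribute_ok(adornment, value):
    """Check one query attribute against one adornment."""
    kind, allowed = adornment
    bound = value is not None
    if kind == "f":          # may or may not be specified
        return True
    if kind == "u":          # cannot be specified
        return not bound
    if kind == "b":          # must be specified
        return bound
    if kind == "c":          # must be specified, chosen from the list
        return bound and value in allowed
    if kind == "o":          # optional; if specified, chosen from the list
        return (not bound) or value in allowed
    raise ValueError(f"unknown adornment kind: {kind!r}")


def answerable(templates, query):
    """A query is answerable if it matches at least one template."""
    return any(
        len(template) == len(query)
        and all(attribute_ok(a, v) for a, v in zip(template, query))
        for template in templates
    )


# Templates bu'f and uf'c[z1, z2] for R(X, Y, Z), primes dropped.
Z_LIST = frozenset({"z1", "z2"})
TEMPLATES = [
    [("b", None), ("u", None), ("f", None)],
    [("u", None), ("f", None), ("c", Z_LIST)],
]

if __name__ == "__main__":
    print(answerable(TEMPLATES, ["x1", None, None]))   # R(x1, Y, Z)
    print(answerable(TEMPLATES, [None, None, "z1"]))   # R(X, Y, z1)
    print(answerable(TEMPLATES, [None, "y1", None]))   # R(X, y1, Z)
    print(answerable(TEMPLATES, [None, None, "z3"]))   # R(X, Y, z3)
```

Under these assumptions the first two queries match a template and the last two do not, which agrees with the slide.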

12 Extending Source Capabilities Query: author = “Freud” AND price > 10 Source: R(author, price, ...) Template: b, u, ... [Diagram: Wrapper in front of amazon]

13 Extending Source Capabilities Source: R(author, price, ...) Template: b, u, ... Query: author = “Freud” AND price > 10 Source Query: author = “Freud” Wrapper Filter: price > 10 [Diagram: Wrapper in front of amazon]
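A small sketch of this wrapper pattern follows. Everything in it is illustrative: `BOOKS`, `source_query`, and `wrapper_query` are hypothetical names, and the in-memory list stands in for the real source. The source accepts only a bound author (template b, u, ...), so the wrapper sends the author condition to the source and applies the price condition itself.

```python
# Hypothetical in-memory stand-in for the book source.
BOOKS = [
    {"author": "Freud", "title": "Book A", "price": 12.0},
    {"author": "Freud", "title": "Book B", "price": 8.0},
    {"author": "Jung", "title": "Book C", "price": 15.0},
]


def source_query(author):
    """Source capability: author must be bound; price cannot be specified."""
    if author is None:
        raise ValueError("the source requires author to be bound")
    return [book for book in BOOKS if book["author"] == author]


def wrapper_query(author, min_price):
    """Answer author = ... AND price > ... on top of the limited source."""
    rows = source_query(author)                            # source query
    return [row for row in rows if row["price"] > min_price]  # wrapper filter


if __name__ == "__main__":
    for row in wrapper_query("Freud", 10):
        print(row["title"], row["price"])
```

The cost of this design is that the source returns every row for the author, including rows the wrapper then discards.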

14 Another Example Query: (author = “Freud” OR author = “Jung”) AND price < 10 Source: R(author, price, …) Restrictions: no disjunctive conditions; price can only be specified with author [Diagram: Wrapper in front of Barnes&Noble]

15 Another Example Query: (author = “Freud” OR author = “Jung”) AND price < 10 Source: R(author, price, …) Restrictions: no disjunctive conditions; price can only be specified with author Q1: author = “Freud” AND price < 10 Q2: author = “Jung” AND price < 10 Wrapper: union of the answers to Q1 and Q2 [Diagram: Wrapper in front of Barnes&Noble]
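The split into Q1 and Q2 followed by a union can be sketched as below. This is only an illustration: `CATALOG`, `bn_source`, and `wrapper_disjunction` are hypothetical names, and the source is modeled as accepting exactly one author, optionally with a price bound.

```python
# Hypothetical in-memory stand-in for the bookstore source.
CATALOG = [
    {"author": "Freud", "title": "F1", "price": 8.0},
    {"author": "Freud", "title": "F2", "price": 12.0},
    {"author": "Jung", "title": "J1", "price": 9.0},
    {"author": "Adler", "title": "A1", "price": 5.0},
]


def bn_source(author, max_price=None):
    """One author per call (no disjunction); price only together with author."""
    if author is None:
        raise ValueError("author must be specified")
    rows = [book for book in CATALOG if book["author"] == author]
    if max_price is not None:
        rows = [book for book in rows if book["price"] < max_price]
    return rows


def wrapper_disjunction(authors, max_price):
    """Issue one conjunctive source query per author and union the answers."""
    seen = set()
    result = []
    for author in authors:
        for row in bn_source(author, max_price):
            key = (row["author"], row["title"])
            if key not in seen:          # union semantics: drop duplicates
                seen.add(key)
                result.append(row)
    return result


if __name__ == "__main__":
    for row in wrapper_disjunction(["Freud", "Jung"], 10):
        print(row["author"], row["title"], row["price"])
```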

16 Other Description Mechanisms Tsimmis –Query templates Information Manifold –capability records (# bound attrs, conditions ok,...) Disco Garlic –black box

17 Extending Source Capabilities General scheme: –try many query rewritings –check if query fragments are supported by the source –check if the wrapper can combine answer fragments –do all this very efficiently! –H. Garcia-Molina, W. Labio, R. Yerneni: Capability-Sensitive Query Processing on Internet Sources, ICDE 1999 Tsimmis, Info Manifold: no disjunctive queries. DISCO: no query splitting. Garlic: only CNF queries.

18 Tsimmis Suppose a database contains information about employees and students. The only queries that the database accepts are: –Retrieve person records by specifying the last name –Retrieve person records by specifying the first and the last name –Retrieve all person records by issuing the command

19 Tsimmis Query templates –Retrieve person records by specifying the last name O :- }> –Retrieve person records by specifying the first and the last name O :- –Retrieve all person records by issuing the command O :-

20 Tsimmis Directly supported queries Q :- }> Logically supported queries Q :- }> Indirectly supported queries Q :- }>

21 Information Manifold (IM) IM uses capability records to capture two kinds of capabilities: –The ability of a source to apply a number of selections. –The limited forms of variable bindings that a source can accept. A capability record has the form (S_in, S_out, S_sel, min, max). The source must be given bindings for at least min elements of S_in. The elements in S_out are the parameters that can be returned from the source.

22 Information Manifold (IM) Car reviews database, containing reviews for cars manufactured after V(m, y, r) :- Car(c), Model(c, m), Year(c, y), ProductReview(m, y, r) Capabilities: ({m}, {m, y, r}, {y}, 1, 2)
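A capability record such as ({m}, {m, y, r}, {y}, 1, 2) can be checked against a candidate source access as sketched below. The class and method names are hypothetical. The slides define only S_in, min, and S_out. Reading S_sel and max as "at most max selections, each on a parameter in S_sel" is an assumption made for this sketch, as is the rule that bindings may only be given for parameters in S_in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityRecord:
    s_in: frozenset       # parameters that may be bound on input
    s_out: frozenset      # parameters the source can return
    s_sel: frozenset      # parameters on which selections may be applied (assumed)
    min_bindings: int     # minimum number of S_in parameters that must be bound
    max_selections: int   # maximum number of selections (assumed meaning)

    def accepts(self, bound, requested, selections):
        """bound/requested: sets of parameter names; selections: one name per selection."""
        bound, requested = set(bound), set(requested)
        if not bound <= self.s_in:                  # assumption: only S_in may be bound
            return False
        if len(bound) < self.min_bindings:          # at least min elements of S_in
            return False
        if not requested <= self.s_out:             # only S_out can be returned
            return False
        if any(p not in self.s_sel for p in selections):
            return False
        return len(selections) <= self.max_selections


# The car-review source from the slide: ({m}, {m, y, r}, {y}, 1, 2).
CAR_REVIEWS = CapabilityRecord(
    s_in=frozenset({"m"}),
    s_out=frozenset({"m", "y", "r"}),
    s_sel=frozenset({"y"}),
    min_bindings=1,
    max_selections=2,
)

if __name__ == "__main__":
    # Model bound, year range expressed as two selections on y.
    print(CAR_REVIEWS.accepts({"m"}, {"y", "r"}, ["y", "y"]))
    # No binding for the model: rejected because min is 1.
    print(CAR_REVIEWS.accepts(set(), {"y", "r"}, []))
```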

23 Garlic Allows unrestricted condition expressions. The condition expressions are transformed into CNF, and then each clause in the CNF expression is considered for evaluation at the source. –If the source cannot evaluate a clause, Garlic evaluates it itself by downloading the source. –This attempt is not only expensive but also may not be allowed by the source.

24 DISCO DISCO does not explore the possibility of splitting the condition expression into parts. It considers only those options in which the source processes either the entire condition expression or no part of it. This strategy limits DISCO’s ability to generate feasible plans for many queries.

25 An Example Suppose we are looking for books written by Freud or Jung on the topic of dreams in the Internet bookstore BarnesAndNoble, which does not allow us to search for two authors at once. Query: (author = “Freud” OR author = “Jung”) AND (title contains “dreams”) With Garlic, the first clause cannot be evaluated at the source, and the second clause alone extracts over 2,000 entries.

26 An Example (cont.) A better plan is to break up the query into two. –First search for (author = “Freud” AND title contains “dreams”) –Then for (author = “Jung” AND title contains “dreams”) –Union the results of the two queries –The plan extracts fewer than 20 entries

27 Mediator Processing R(X, Y, Z): f, f, b T(Z, W, U): f, u, b M(X, Y, Z, W, U) = Join(R, T) Query: M(5, Y, Z, W, 3) [Diagram: Mediator, Wrapper, Source]

28 Plan 1 R(X, Y, Z): f, f, b T(Z, W, U): f, u, b M(X, Y, Z, W, U) = Join(R, T) Query: M(5, Y, Z, W, 3) (1) R(5, Y, Z) (2) T(Z, W, 3) (3) Join answers [Diagram: Mediator, Wrapper, Source]

29 Plan 2 R(X, Y, Z): f, f, b T(Z, W, U): f, u, b M(X, Y, Z, W, U) = Join(R, T) Query: M(5, Y, Z, W, 3) (1) P = T(Z, W, 3) (2) for each (z, w, u) ∈ P: R(5, Y, z) (3) Join answers [Diagram: Mediator, Wrapper, Source]
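The difference between the two plans can be sketched as follows. The data and the names `query_R`, `query_T`, and `plan2` are invented for illustration. Under the templates given, R requires Z to be bound (f, f, b) and T requires U to be bound (f, u, b), so step (1) of Plan 1, R(5, Y, Z), is rejected by R in this sketch, while Plan 2 first queries T and then passes each returned z value to R.

```python
# Hypothetical source contents.
R_DATA = [(5, "a", 1), (5, "b", 2), (6, "c", 1)]      # R(X, Y, Z)
T_DATA = [(1, "w1", 3), (2, "w2", 4), (7, "w3", 3)]   # T(Z, W, U)


def query_R(x=None, y=None, z=None):
    """Template f, f, b: X and Y are optional, Z must be bound."""
    if z is None:
        raise ValueError("R requires Z to be bound")
    return [
        t for t in R_DATA
        if t[2] == z and (x is None or t[0] == x) and (y is None or t[1] == y)
    ]


def query_T(z=None, u=None):
    """Template f, u, b: Z is optional, W cannot be specified, U must be bound."""
    if u is None:
        raise ValueError("T requires U to be bound")
    return [t for t in T_DATA if t[2] == u and (z is None or t[0] == z)]


def plan2(x, u):
    """(1) P = T(Z, W, u); (2) for each tuple in P query R(x, Y, z); (3) join."""
    answers = []
    for (z, w, _u) in query_T(u=u):
        for (_x, y, _z) in query_R(x=x, z=z):
            answers.append((x, y, z, w, u))
    return answers


if __name__ == "__main__":
    print(plan2(5, 3))
    try:
        query_R(x=5)          # step (1) of Plan 1
    except ValueError as err:
        print("Plan 1 step (1) rejected:", err)
```

Plan 2 issues one query to R per tuple returned by T, so its cost depends on how selective T(Z, W, 3) is.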

30 Mediator Plan Generation Need a feasible and efficient plan. Search space is huge. Tsimmis, Info Manifold, Garlic: –exponential algorithms Polynomial algorithms: –often find optimal or near-optimal plan –bounded performance –R. Yerneni, C. Li, J. D. Ullman, H. Garcia-Molina: Optimizing Large Join Queries in Mediation Systems, ICDT 1999

31 Conclusion Not all sources are created equal! Need to –describe what sources can do –efficiently process queries with limited sources –describe what mediators can do –exploit content information –deal with unavailable sources

32 References –Ramana Yerneni, Chen Li, Hector Garcia-Molina, Jeffrey D. Ullman: Computing Capabilities of Mediators, SIGMOD Conference 1999 –Vasilis Vassalos, Yannis Papakonstantinou: Describing and Using Query Capabilities of Heterogeneous Sources, VLDB 1997