CSE 636 Data Integration Limited Source Capabilities Slides by Hector Garcia-Molina.

CSE 636 Data Integration Limited Source Capabilities Slides by Hector Garcia-Molina Fall 2006.
Presentation transcript:

CSE 636 Data Integration Limited Source Capabilities Slides by Hector Garcia-Molina

2 Heterogeneous Databases
[Figure: a distributed database system over heterogeneous sources: DBMS 1, DBMS 2, a legacy system, and a web site, each with its own data]

3 Limited Capabilities

4 Example: Amazon.com
[Figure: search form with fields author, title, subject, format, price. Annotations on the fields: "must specify at least one of these", "this attribute not returned", "cannot query on this attribute", "menu of choices"]

5 Example: BarnesAndNoble.com
[Figure: search form with fields author, title, subject, format, price. Annotations on the fields: "must specify at least one of these", "can query if one of other attributes specified", "menu of choices"]

6 Why Limited Capabilities?
Search forms
Security
Indexes
Legacy

7 Capability vs. Content
Capability description
–Can only search for subject = “art,” “history,” “science”
Content description
–Source only contains subject = “art,” “history,” “science”

8 Outline
Describing source capabilities
Extending source capabilities
How mediators cope with limited capabilities
Mediator capabilities
Other topics
[Figure: Mediator, Source, Wrapper]

9 Describing Query Capabilities
R(X, Y, ..., Z)
Adornments:
f: may or may not be specified
u: cannot be specified
b: must be specified
c[S]: must be specified, chosen from list S
o[S]: optional; if specified, chosen from list S

10 Describing Query Capabilities
R(X, Y, ..., Z)
Adornments:
f: may or may not be specified
u: cannot be specified
b: must be specified
c[S]: must be specified, chosen from list S
o[S]: optional; if specified, chosen from list S
With output restriction (attribute not returned): f’ u’ b’ c’[S] o’[S]

11 Example
Relation R(X, Y, Z)
Description templates: bu’f, uf’c[z1, z2]
Answerable queries: R(x1, Y, Z), R(X, Y, z1)
Unanswerable queries: R(X, y1, Z), R(X, Y, z3)
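The template check in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`parse_template`, `matches`, `answerable`), the string encoding of templates, and the use of `None` for an unspecified attribute are all hypothetical and not from the slides. Output restrictions (the prime) are parsed but ignored, since they affect what is returned rather than what may be asked.

```python
import re

# One adornment: a letter, an optional prime (' or ’), an optional [list].
_ADORNMENT = re.compile(r"([fubco])(['’])?(?:\[([^\]]*)\])?")

def parse_template(template):
    """Split e.g. "uf'c[z1, z2]" into (kind, output_restricted, allowed) triples."""
    adornments, pos = [], 0
    while pos < len(template):
        m = _ADORNMENT.match(template, pos)
        if m is None:
            raise ValueError(f"bad template at position {pos}: {template!r}")
        kind, prime, values = m.groups()
        if kind in "co" and values is None:
            raise ValueError(f"adornment {kind!r} needs a value list")
        allowed = [v.strip() for v in values.split(",")] if values is not None else None
        adornments.append((kind, prime is not None, allowed))
        pos = m.end()
    return adornments

def matches(query, template):
    """query is a list with one entry per attribute; None means 'not specified'."""
    adornments = parse_template(template)
    if len(adornments) != len(query):
        return False
    for value, (kind, _restricted, allowed) in zip(query, adornments):
        bound = value is not None
        if kind == "u" and bound:            # cannot be specified
            return False
        if kind == "b" and not bound:        # must be specified
            return False
        if kind == "c" and (not bound or value not in allowed):
            return False
        if kind == "o" and bound and value not in allowed:
            return False
    return True

def answerable(query, templates):
    """A query is answerable if at least one template accepts it."""
    return any(matches(query, t) for t in templates)

TEMPLATES = ["bu'f", "uf'c[z1, z2]"]
print(answerable(["x1", None, None], TEMPLATES))   # R(x1, Y, Z)
print(answerable([None, None, "z3"], TEMPLATES))   # R(X, Y, z3)
```

With the two templates from the slide, the first call reports an answerable query and the second an unanswerable one, in line with the lists above.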

12 Other Description Mechanisms
Tsimmis
–Query templates
Information Manifold
–capability records (# bound attrs, conditions ok, ...)
Disco
Garlic
–black box
Context-free grammars

13 Extending Source Capabilities
Query: author = “Freud” AND price > 10
Source: R(author, price, ...)
Template: b, u, ...
[Figure: Wrapper in front of the amazon source]

14 Extending Source Capabilities
Query: author = “Freud” AND price > 10
Source: R(author, price, ...)
Template: b, u, ...
Source query: author = “Freud”
Wrapper filter: price > 10
[Figure: Wrapper in front of the amazon source]
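A minimal sketch of the split shown on this slide: the wrapper sends only the supported condition (author) to the source and applies the unsupported one (price) itself. The catalog `BOOKS` and the function names are hypothetical stand-ins, and the sketch assumes the source returns the price attribute, which is what makes wrapper-side filtering possible.

```python
# Hypothetical catalog standing in for the source's data.
BOOKS = [
    {"author": "Freud", "title": "Book A", "price": 8},
    {"author": "Freud", "title": "Book B", "price": 15},
    {"author": "Jung", "title": "Book C", "price": 20},
]

def source_by_author(author):
    """Stand-in for the source: author must be bound, price cannot be queried."""
    return [dict(b) for b in BOOKS if b["author"] == author]

def wrapper_author_min_price(author, min_price):
    """Answer author = ... AND price > ... on top of the limited source."""
    candidates = source_by_author(author)                      # source query
    return [b for b in candidates if b["price"] > min_price]   # wrapper filter

for book in wrapper_author_min_price("Freud", 10):
    print(book["title"], book["price"])
```

The cost of this approach is that the source ships every book by the author, including those the filter later discards.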

15 Another Example
Query: (author = “Freud” OR author = “Jung”) AND price < 10
Source: R(author, price, …)
No disjunctive conditions; price can only be specified with author
[Figure: Wrapper in front of the Barnes&Noble source]

16 Another Example
Query: (author = “Freud” OR author = “Jung”) AND price < 10
Source: R(author, price, …)
No disjunctive conditions; price can only be specified with author
Q1: author = “Freud” AND price < 10
Q2: author = “Jung” AND price < 10
Wrapper: union of the answers to Q1 and Q2
[Figure: Wrapper in front of the Barnes&Noble source]
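The rewriting on this slide can be sketched as follows: the wrapper distributes the price condition over the disjunction, issues one conjunctive query per author, and unions the answers. The catalog and function names are hypothetical, introduced only for illustration.

```python
# Hypothetical catalog standing in for the source's data.
BOOKS = [
    {"author": "Freud", "title": "Book A", "price": 8},
    {"author": "Freud", "title": "Book B", "price": 15},
    {"author": "Jung", "title": "Book C", "price": 9},
    {"author": "Adler", "title": "Book D", "price": 5},
]

def source_conjunctive(author, max_price):
    """Stand-in for the source: one author per query, price only with author."""
    return [dict(b) for b in BOOKS
            if b["author"] == author and b["price"] < max_price]

def wrapper_any_author(authors, max_price):
    """Answer (author = a1 OR author = a2 ...) AND price < max_price."""
    seen, results = set(), []
    for author in authors:                       # one sub-query per disjunct
        for book in source_conjunctive(author, max_price):
            key = (book["author"], book["title"])
            if key not in seen:                  # union: drop duplicates
                seen.add(key)
                results.append(book)
    return results

for book in wrapper_any_author(["Freud", "Jung"], 10):
    print(book["author"], book["title"])
```

Deduplication is not strictly needed when the disjuncts name distinct authors, but it keeps the union correct if the same author appears twice in the condition.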

17 Extending Source Capabilities
General scheme:
–try many query rewritings
–check if query fragments supported by source
–check if wrapper can combine answer fragments
–do all this very efficiently!!
–H. Garcia-Molina, W. Labio, R. Yerneni: Capability-Sensitive Query Processing on Internet Sources, ICDE 1999
Tsimmis, Info Manifold: no disjunctive queries
DISCO: no query splitting
Garlic: only CNF queries

18 Mediator Processing
R(X, Y, Z): f, f, b
T(Z, W, U): f, u, b
M(X, Y, Z, W, U) = Join(R, T)
Query: M(5, Y, Z, W, 3)
[Figure: Mediator, Source, Wrapper]

19 Plan 1
R(X, Y, Z): f, f, b
T(Z, W, U): f, u, b
M(X, Y, Z, W, U) = Join(R, T)
Query: M(5, Y, Z, W, 3)
(1) R(5, Y, Z)
(2) T(Z, W, 3)
(3) Join answers
[Figure: Mediator, Source, Wrapper]

20 Plan 2
R(X, Y, Z): f, f, b
T(Z, W, U): f, u, b
M(X, Y, Z, W, U) = Join(R, T)
Query: M(5, Y, Z, W, 3)
(1) P = T(Z, W, 3)
(2) for each (z, w, u) ∈ P: R(5, Y, z)
(3) Join answers
[Figure: Mediator, Source, Wrapper]
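The two plans can be tried against toy sources that enforce their adornments. This is a sketch under assumptions: the `Source` class, the sample rows, and the function name `plan2` are hypothetical. Read literally, the adornments f, f, b on R require Z to be bound, so the first step of Plan 1 is rejected, while Plan 2 first queries T and then passes each returned Z value into R.

```python
class Source:
    """Toy source that rejects queries violating its adornments (f, u, b only)."""

    def __init__(self, name, adornments, rows):
        self.name, self.adornments, self.rows = name, adornments, rows

    def query(self, *args):
        # None means "attribute not specified in the query".
        for position, (arg, adornment) in enumerate(zip(args, self.adornments)):
            if adornment == "b" and arg is None:
                raise ValueError(f"{self.name}: attribute {position} must be bound")
            if adornment == "u" and arg is not None:
                raise ValueError(f"{self.name}: attribute {position} cannot be bound")
        return [row for row in self.rows
                if all(a is None or a == v for a, v in zip(args, row))]

# Hypothetical data for R(X, Y, Z) and T(Z, W, U).
R = Source("R", "ffb", [(5, "a", 1), (5, "b", 2), (6, "c", 1)])
T = Source("T", "fub", [(1, "w1", 3), (2, "w2", 4), (7, "w3", 3)])

def plan2(x, u):
    """(1) P = T(Z, W, u); (2) for each tuple in P query R(x, Y, z); (3) join."""
    answers = []
    for z, w, u_val in T.query(None, None, u):
        for x_val, y, z_val in R.query(x, None, z):
            answers.append((x_val, y, z_val, w, u_val))
    return answers

try:
    R.query(5, None, None)        # first step of Plan 1
except ValueError as err:
    print("Plan 1 rejected:", err)

print(plan2(5, 3))
```

Plan 2 costs one call to T plus one call to R per tuple returned by T, which is why the number of tuples in P matters when comparing feasible plans.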

21 Mediator Plan Generation
Need feasible and efficient plan
Search space is huge
Tsimmis, Info Manifold, Garlic:
–exponential algorithms
Polynomial algorithms:
–often find optimal or near-optimal plan
–bounded performance
–R. Yerneni, C. Li, J. D. Ullman, H. Garcia-Molina: Optimizing Large Join Queries in Mediation Systems, ICDT 1999
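As a rough illustration of the feasibility half of the problem only, the sketch below greedily orders sources so that each one is queried after all of its "b" attributes have been bound. It is not the algorithm of the cited paper and it ignores cost entirely; the function name, the dictionary encoding of sources, and the assumption that every attribute of a queried source is returned (no output restrictions) are all hypothetical.

```python
def feasible_order(sources, initially_bound):
    """Return an order in which every source's 'b' attributes are bound, or None.

    sources maps a source name to a list of (variable, adornment) pairs.
    A 'u' attribute never blocks a source: the mediator simply does not pass
    that variable and filters on it afterwards.
    """
    bound = set(initially_bound)
    remaining = dict(sources)
    order = []
    while remaining:
        for name, attrs in remaining.items():
            if all(var in bound for var, adornment in attrs if adornment == "b"):
                break                     # this source can be queried now
        else:
            return None                   # no remaining source is answerable
        order.append(name)
        bound.update(var for var, _ in attrs)   # its outputs become bound
        del remaining[name]
    return order

SOURCES = {
    "R": [("X", "f"), ("Y", "f"), ("Z", "b")],
    "T": [("Z", "f"), ("W", "u"), ("U", "b")],
}
print(feasible_order(SOURCES, {"X", "U"}))   # query M(5, Y, Z, W, 3) binds X and U
```

Because the set of bound variables only grows, the greedy choice never rules out a feasible order; choosing among feasible orders by cost is the harder part that the slide refers to.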

22 Conclusion
Not all sources are created equal!
Need to
–describe what sources can do
–efficiently process queries with limited sources
–describe what mediators can do
–exploit content information
–deal with unavailable sources

23 References
Computing Capabilities of Mediators
–Ramana Yerneni, Chen Li, Hector Garcia-Molina, Jeffrey D. Ullman
–SIGMOD Conference 1999
Describing and Using Query Capabilities of Heterogeneous Sources
–Vasilis Vassalos, Yannis Papakonstantinou
–VLDB 1997