CS 245: Database System Principles. Notes 14: Coping with Limited Capabilities of Sources. Hector Garcia-Molina.

Presentation transcript:

Slide 1: CS 245: Database System Principles
Notes 14: Coping with Limited Capabilities of Sources
Hector Garcia-Molina

Slide 2: Heterogeneous Databases
[Figure: a distributed database system over heterogeneous sources: DBMS 1, DBMS 2, a legacy system, and a web site, each with its own data]

Slide 3: Limited Capabilities

Slide 4: Example: Amazon.com
Search form attributes: author, title, subject, format, price
Form annotations: must specify at least one of these; this attribute not returned; cannot query on this attribute; menu of choices

Slide 5: Example: BarnesAndNoble.com
Search form attributes: author, title, subject, format, price
Form annotations: must specify at least one of these; can query if one of the other attributes is specified; menu of choices

Slide 6: Why Limited Capabilities?
Search forms
Security
Indexes
Legacy

Slide 7: Capability vs. Content
Capability description
  – Can only search for subject = “art”, “history”, or “science”
Content description
  – Source only contains subject = “art”, “history”, or “science”

Slide 8: Outline
Describing source capabilities
Extending source capabilities
How mediators cope with limited capabilities
Mediator capabilities
Other topics
[Figure: a mediator on top of a source]

Slide 9: Describing Query Capabilities
R(X, Y, ... Z)
Adornments:
  – f: may or may not be specified
  – u: cannot be specified
  – b: must be specified
  – c[S]: must be specified, chosen from list S
  – o[S]: optional; if specified, chosen from list S

Slide 10: Describing Query Capabilities
R(X, Y, ... Z)
Adornments:
  – f: may or may not be specified
  – u: cannot be specified
  – b: must be specified
  – c[S]: must be specified, chosen from list S
  – o[S]: optional; if specified, chosen from list S
With output restriction (a primed attribute is not returned in the answer): f', u', b', c'[S], o'[S]

Slide 11: Example
Relation: R(X, Y, Z)
Description templates: bu'f, uf'c[z1, z2]
Answerable queries: R(x1, Y, Z), R(X, Y, z1)
Unanswerable queries: R(X, y1, Z), R(X, Y, z3)
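The template check in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Adornment` type, the function names, and the convention that `None` stands for an unbound variable are all hypothetical, not part of the original notes. The `hidden` flag records a primed adornment; it does not affect whether a query may be asked.

```python
from typing import FrozenSet, NamedTuple, Optional, Sequence


class Adornment(NamedTuple):
    """Hypothetical encoding of one adornment: f, u, b, c[S] or o[S]."""
    kind: str                              # "f", "u", "b", "c" or "o"
    choices: FrozenSet[str] = frozenset()  # the list S for c[S] and o[S]
    hidden: bool = False                   # primed: attribute is not returned


def attribute_ok(adornment: Adornment, value: Optional[str]) -> bool:
    """Check one query argument; None means the argument is left unbound."""
    bound = value is not None
    if adornment.kind == "f":
        return True
    if adornment.kind == "u":
        return not bound
    if adornment.kind == "b":
        return bound
    if adornment.kind == "c":
        return bound and value in adornment.choices
    if adornment.kind == "o":
        return (not bound) or value in adornment.choices
    raise ValueError(f"unknown adornment {adornment.kind!r}")


def answerable(templates: Sequence[Sequence[Adornment]],
               query: Sequence[Optional[str]]) -> bool:
    """A query is answerable if at least one template accepts every argument."""
    return any(
        len(template) == len(query)
        and all(attribute_ok(a, v) for a, v in zip(template, query))
        for template in templates
    )


# The two templates from the slide: bu'f and uf'c[z1, z2].
TEMPLATES = [
    (Adornment("b"), Adornment("u", hidden=True), Adornment("f")),
    (Adornment("u"), Adornment("f", hidden=True),
     Adornment("c", frozenset({"z1", "z2"}))),
]

print(answerable(TEMPLATES, ("x1", None, None)))  # R(x1, Y, Z)
print(answerable(TEMPLATES, (None, "y1", None)))  # R(X, y1, Z)
```

Under this encoding the first call reports an answerable query (it fits bu'f) and the second an unanswerable one, since neither template accepts a query that binds only Y.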

Slide 12: Other Description Mechanisms
Tsimmis
  – query templates
Information Manifold
  – capability records (# bound attrs, conditions ok, ...)
Disco
Garlic
  – black box
Context-free grammars

Slide 13: Extending Source Capabilities
[Figure: a wrapper on top of the amazon source]
Source: R(author, price, ...)
Template: b, u, ...
Query: author = “Freud” AND price > 10

Slide 14: Extending Source Capabilities
[Figure: a wrapper on top of the amazon source]
Source: R(author, price, ...)
Template: b, u, ...
Query: author = “Freud” AND price > 10
Source query: author = “Freud”
Wrapper filter: price > 10
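A minimal Python sketch of this wrapper, assuming a toy in-memory catalogue in place of the real site. The rows, `source_query`, and `wrapper_query` are made up for illustration; only the split into a source query on author plus a local price filter comes from the slide.

```python
# Made-up catalogue rows, for illustration only.
CATALOG = [
    {"author": "Freud", "title": "Book A", "price": 8.0},
    {"author": "Freud", "title": "Book B", "price": 15.0},
    {"author": "Jung", "title": "Book C", "price": 20.0},
]


def source_query(author):
    """Hypothetical source with template b, u: author must be given,
    and there is no way to pass a price condition."""
    if author is None:
        raise ValueError("author must be specified")
    return [row for row in CATALOG if row["author"] == author]


def wrapper_query(author, min_price):
    """Send the supported part to the source, then filter on price locally."""
    rows = source_query(author)                                # source query
    return [row for row in rows if row["price"] > min_price]   # wrapper filter


result = wrapper_query("Freud", 10)
print([row["title"] for row in result])
```

The wrapper pays for the missing capability by fetching every book by the author and discarding those that fail the price condition.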

Slide 15: Another Example
[Figure: a wrapper on top of the Barnes&Noble source]
Source: R(author, price, ...)
Restrictions: no disjunctive conditions; price can only be specified with author
Query: (author = “Freud” OR author = “Jung”) AND price < 10

Slide 16: Another Example
[Figure: a wrapper on top of the Barnes&Noble source]
Source: R(author, price, ...)
Restrictions: no disjunctive conditions; price can only be specified with author
Query: (author = “Freud” OR author = “Jung”) AND price < 10
Q1: author = “Freud” AND price < 10
Q2: author = “Jung” AND price < 10
Wrapper: union of the answers to Q1 and Q2
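The query splitting on this slide can be sketched as follows. The catalogue rows and both function names are assumptions made for the example; the source is modelled as accepting exactly one author per call, with an optional price bound.

```python
# Made-up catalogue rows, for illustration only.
CATALOG = [
    {"author": "Freud", "title": "Book A", "price": 8.0},
    {"author": "Freud", "title": "Book B", "price": 15.0},
    {"author": "Jung", "title": "Book C", "price": 20.0},
    {"author": "Jung", "title": "Book D", "price": 5.0},
]


def source_query(author, max_price=None):
    """Hypothetical source: one author per call (no disjunctions);
    a price condition is accepted only together with the author."""
    if not isinstance(author, str):
        raise TypeError("exactly one author must be specified")
    return [
        row for row in CATALOG
        if row["author"] == author
        and (max_price is None or row["price"] < max_price)
    ]


def wrapper_query(authors, max_price):
    """Issue one conjunctive query per author and union the answers."""
    seen = set()
    union = []
    for author in authors:                       # Q1, Q2, ...
        for row in source_query(author, max_price):
            key = (row["author"], row["title"])
            if key not in seen:                  # union drops duplicates
                seen.add(key)
                union.append(row)
    return union


result = wrapper_query(["Freud", "Jung"], 10)
print([row["title"] for row in result])
```

Each disjunct becomes a query the source supports, and the only work left for the wrapper is the union.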

Slide 17: Extending Source Capabilities
General scheme:
  – try many query rewritings
  – check if query fragments are supported by the source
  – check if the wrapper can combine answer fragments
  – do all this very efficiently!
[See ICDE99 paper]
Tsimmis, Info Manifold: no disjunctive queries
DISCO: no query splitting
Garlic: only CNF queries

Slide 18: Mediator Processing
[Figure: a mediator on top of two sources]
Source R(X, Y, Z), template: f, f, b
Source T(Z, W, U), template: f, u, b
Mediator view: M(X, Y, Z, W, U) = Join(R, T)
Query: M(5, Y, Z, W, 3)

Slide 19: Plan 1
[Figure: a mediator on top of two sources]
Source R(X, Y, Z), template: f, f, b
Source T(Z, W, U), template: f, u, b
Mediator view: M(X, Y, Z, W, U) = Join(R, T)
Query: M(5, Y, Z, W, 3)
(1) R(5, Y, Z)
(2) T(Z, W, 3)
(3) Join answers
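Whether each source query in Plan 1 respects its template can be checked mechanically. The sketch below is an illustration only: `step_feasible` and the tuple encoding are hypothetical, and only the adornments f, u, and b used on this slide are handled.

```python
def step_feasible(template, bound_flags):
    """Return True if a source query respects the template.

    template    -- adornments per attribute, each "f", "u" or "b"
    bound_flags -- True where the query supplies a constant
    """
    for adornment, is_bound in zip(template, bound_flags):
        if adornment == "b" and not is_bound:
            return False   # a required binding is missing
        if adornment == "u" and is_bound:
            return False   # the attribute may not be specified
    return True


R_TEMPLATE = ("f", "f", "b")   # R(X, Y, Z)
T_TEMPLATE = ("f", "u", "b")   # T(Z, W, U)

# Plan 1, step (1): R(5, Y, Z) binds only X.
plan1_r_ok = step_feasible(R_TEMPLATE, (True, False, False))
# Plan 1, step (2): T(Z, W, 3) binds only U.
plan1_t_ok = step_feasible(T_TEMPLATE, (False, False, True))

print(plan1_r_ok, plan1_t_ok)
```

Under this reading of the templates, step (2) is acceptable but step (1) is not, because R requires Z to be bound and R(5, Y, Z) leaves it free.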

Slide 20: Plan 2
[Figure: a mediator on top of two sources]
Source R(X, Y, Z), template: f, f, b
Source T(Z, W, U), template: f, u, b
Mediator view: M(X, Y, Z, W, U) = Join(R, T)
Query: M(5, Y, Z, W, 3)
(1) P = T(Z, W, 3)
(2) for each (z, w, u) ∈ P: R(5, Y, z)
(3) Join answers
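Plan 2 is a dependent (bind) join: the answers from T supply the Z values that R insists on. A small Python sketch with made-up rows follows; `query_r`, `query_t`, and `plan2` are hypothetical names, and the two query functions enforce the required bindings to mimic the source templates.

```python
# Made-up source contents, for illustration only.
R_ROWS = [(5, "a", 1), (5, "b", 2), (7, "c", 1)]      # R(X, Y, Z)
T_ROWS = [(1, "w1", 3), (2, "w2", 4), (9, "w3", 3)]   # T(Z, W, U)


def query_r(x, z):
    """Hypothetical source R with template f, f, b: Z must be bound."""
    if z is None:
        raise ValueError("R requires Z to be bound")
    return [r for r in R_ROWS if r[2] == z and (x is None or r[0] == x)]


def query_t(u, z=None):
    """Hypothetical source T with template f, u, b: U must be bound."""
    if u is None:
        raise ValueError("T requires U to be bound")
    return [t for t in T_ROWS if t[2] == u and (z is None or t[0] == z)]


def plan2(x, u):
    """(1) query T, (2) query R once per returned z, (3) join the answers."""
    answers = []
    for z, w, u_val in query_t(u):             # step (1): P = T(Z, W, u)
        for x_val, y, _ in query_r(x, z):      # step (2): R(x, Y, z)
            answers.append((x_val, y, z, w, u_val))   # step (3)
    return answers


print(plan2(5, 3))
```

The cost of this plan is one call to T plus one call to R per tuple in P, which is why plan choice matters when P is large.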

Slide 21: Mediator Plan Generation
Need a feasible and efficient plan
Search space is huge
Tsimmis, Info Manifold, Garlic:
  – exponential algorithms
Polynomial algorithms:
  – often find optimal or near-optimal plan
  – bounded performance
  – [See ICDT99 paper]
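To make "feasible plan" concrete, here is a simple greedy ordering in Python. It is not the ICDT99 algorithm and it ignores cost entirely; it only shows one polynomial-time way to find some feasible order of source queries, by repeatedly picking a source whose required ("b") attributes are already bound. The function name and the data layout are assumptions.

```python
def greedy_order(sources, initially_bound):
    """Return a feasible order of source names, or None if there is none.

    sources         -- dict: name -> (attribute names, adornments)
    initially_bound -- attributes bound by constants in the query
    Only "b" adornments constrain the order here; "u" attributes are
    simply never passed to the source.
    """
    bound = set(initially_bound)
    remaining = dict(sources)
    order = []
    while remaining:
        for name, (attrs, template) in list(remaining.items()):
            required = [a for a, ad in zip(attrs, template) if ad == "b"]
            if all(a in bound for a in required):
                order.append(name)
                bound.update(attrs)    # its answers bind all its attributes
                del remaining[name]
                break
        else:
            return None                # no remaining source can be queried
    return order


SOURCES = {
    "R": (("X", "Y", "Z"), ("f", "f", "b")),
    "T": (("Z", "W", "U"), ("f", "u", "b")),
}

# Query M(5, Y, Z, W, 3) binds X and U.
print(greedy_order(SOURCES, {"X", "U"}))
```

For the running example this yields T before R, the same order as Plan 2; with only X bound, no source can be queried first and the function returns None.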

Slide 22: Conclusion
Not all sources are created equal!
Need to:
  – describe what sources can do
  – efficiently process queries with limited sources
  – describe what mediators can do
  – exploit content information
  – deal with unavailable sources