Databases: Some Research Opportunities For Latin America
  Marcelo Arenas
  Pontificia Universidad Católica de Chile
Goal Of This Talk
  Present an interesting area of research in databases
    Has been identified as an important area
    Has enough open problems for many research projects
    Needs theoretical and practical research
    Has been the subject of some research projects in Latin America
The Problem Of Sharing Data
  Main challenges:
    Data may reside at several different sites
    Data may be stored in several different ways
      Schema level: name, employee_name, emp_name, …
      Format level: relational databases, XML, plain text, …
The Problem Of Sharing Data
  [Figure: local tables with a name column (Peter, John; John, Phil) and an emp_name column (Peter, Ron), combined into a global database]
Data Exchange
  Transform data structured under a source schema into data structured under a target schema
  [Diagram: source schema S, target schema T, mapping Σ_ST from S to T]
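Data Exchange
  Source schema: Emp(name)
  Target schema: Worker(name)
  Rule: Emp(X) → Worker(X)
  Source instance, Emp(name): Peter, John
  Target instance, Worker(name): Peter, John
  Second Worker(name) table shown on the slide: Ron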
Data Exchange: Main Challenges
  What is a good rule language?
  Schema pairs to relate:
    Emp(name) and Worker(name), with rule Emp(X) → Worker(X)
    Emp(name) and Worker(name, salary)
    Emp(name, phone) and Worker(name, salary)
Data Exchange: Main Challenges
  Rule language: precise semantics and good expressive power
  Source schema Emp(name), target schema Worker(name, salary)
  Rule: Emp(X) → ∃Y Worker(X,Y)
  Source instance, Emp(name): Peter, John
  How can we translate the source data? Can we do this efficiently?
Data Exchange: Main Challenges
  Source schema Emp(name), target schema Worker(name, salary)
  Rule: Emp(X) → ∃Y Worker(X,Y)
  Source instance, Emp(name): Peter, John
  Candidate target instances, Worker(name, salary):
    (Peter, 100K), (John, 120K)
    (Peter, NULL), (John, NULL)
    (Peter, NULL1), (John, NULL2)
  What is a good translation?
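The third candidate instance can be produced mechanically. The following is a minimal sketch, assuming the rule is read as "for every Emp row, create a Worker row whose salary is a fresh placeholder". The function name `translate_emp_to_worker` and the use of plain strings for the placeholders NULL1, NULL2 are illustrative choices, not something taken from the talk or from Clio.

```python
from itertools import count


def translate_emp_to_worker(emp_names):
    """Apply Emp(X) -> exists Y. Worker(X, Y) to a list of employee names.

    Each source row gets its own fresh labeled null for the unknown salary,
    so no two workers are forced to share a salary.
    """
    fresh = count(1)  # generator of null labels: 1, 2, 3, ...
    return [(name, f"NULL{next(fresh)}") for name in emp_names]


emp = ["Peter", "John"]                  # source instance Emp(name)
worker = translate_emp_to_worker(emp)    # target instance Worker(name, salary)
print(worker)  # [('Peter', 'NULL1'), ('John', 'NULL2')]
```

Using a distinct null per row, rather than one shared NULL or invented values such as 100K, avoids asserting anything the source data does not say.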
Data Exchange: Main Challenges
  Source schema Emp(name), target schema Worker(name, salary)
  Rule: Emp(X) → ∃Y Worker(X,Y)
  Source instance, Emp(name): Peter, John
  Target instance, Worker(name, salary): (Peter, NULL1), (John, NULL2)
  How do we answer target queries?
    Does Peter have a salary?
    What is the salary of Peter?
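One common way to read the two questions is to return only answers that hold whatever values the nulls stand for. The sketch below assumes that reading. The `LabeledNull` class and both function names are hypothetical helpers introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledNull:
    """A placeholder for an unknown value (assumed representation)."""
    label: int


# Target instance from the slide: Worker(name, salary)
worker = [("Peter", LabeledNull(1)), ("John", LabeledNull(2))]


def has_salary(instance, name):
    """Boolean query 'exists Y. Worker(name, Y)': true if any row mentions name."""
    return any(n == name for n, _ in instance)


def certain_salaries(instance, name):
    """Salaries of name that are known constants; labeled nulls are dropped."""
    return [s for n, s in instance if n == name and not isinstance(s, LabeledNull)]


print(has_salary(worker, "Peter"))        # True
print(certain_salaries(worker, "Peter"))  # []
```

Under this reading, "Does Peter have a salary?" is answered yes, while "What is the salary of Peter?" has no definite answer, because NULL1 could stand for any value.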
Data Exchange: Relational Databases
  Data exchange has been extensively studied in the relational world
    IBM Almaden, UCSC and UofT
  It has also been implemented: Clio (DB2)
  The semantics of data exchange has been precisely defined
  Efficient algorithms for translating source data and answering target queries have been developed
Ongoing Work
  XML data exchange
  Metadata management
XML Data Exchange
  Transform XML data structured under a source schema into data structured under a target schema
  [Diagram: source schema S, target schema T, mapping Σ_ST from S to T]
  What is the difference?
    XML document: data is semi-structured
    XML schema: powerful schema language
    XML query language: navigational capabilities
XML Document: Example
  <company>
    <employee> Peter Buneman </employee>
    <employee>
      <name> Ron Fagin </name>
    </employee>
  </company>
Data Exchange: Relational And XML
  [Diagram: a relational schema with instance Emp(name): Peter, John, and two XML schemas, one with a document containing Peter and John]
  We can do the same for other data formats!
XML Data Exchange: Our Contribution
  Ongoing project: U. Edinburgh, UofT and PUC Chile
  Results: fundamental problems of XML data exchange have been solved
XML Data Exchange: Our Contribution
  [Diagram: XML schema]
  The semantics of XML data exchange has been precisely defined
  Rule language: precise semantics and good expressive power
  Efficient algorithms for translating source data have been developed
  Efficient algorithms for answering target queries have also been developed
What Else Has To Be Done?
Ongoing Work
  XML data exchange
  Metadata management
Metadata Management
  The process of creating schema mappings is time-consuming
  We need tools to manage schema mappings automatically
Metadata Management: Composition
  [Diagram: schemas S, T and U; mapping Σ_ST from S to T, Σ_TU from T to U, Σ_SU from S to U]
  Composition: Σ_SU = Σ_ST ∘ Σ_TU
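Semantically, a mapping can be viewed as a set of (source instance, target instance) pairs, and composition then relates I to K whenever some intermediate J links them. The sketch below is a toy illustration under that view. Real mappings are given by rules and relate unboundedly many instances, so the finite sets and the labels I1, J1, K1 are assumptions made purely for the example.

```python
def compose(sigma_st, sigma_tu):
    """Return all pairs (i, k) such that some j has (i, j) in sigma_st
    and (j, k) in sigma_tu."""
    return {(i, k) for (i, j) in sigma_st for (j2, k) in sigma_tu if j == j2}


# Toy mappings over instance labels (hypothetical data).
sigma_st = {("I1", "J1"), ("I1", "J2"), ("I2", "J3")}
sigma_tu = {("J1", "K1"), ("J3", "K2")}

sigma_su = compose(sigma_st, sigma_tu)
print(sorted(sigma_su))  # [('I1', 'K1'), ('I2', 'K2')]
```

The pair (I1, J2) contributes nothing because J2 is not related to any instance of U. The hard research question is whether the composed mapping can again be written down in the same rule language.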
Metadata Management: Inverse
  [Diagram: schemas S, T and U; mapping Σ_ST from S to T, Σ_UT from U to T, Σ_SU from S to U]
  Inverse: Σ_TU = (Σ_UT)^-1
  Composition: Σ_SU = Σ_ST ∘ Σ_TU = Σ_ST ∘ (Σ_UT)^-1
Metadata Management: More Operators
  [Diagram: schemas S, T, U and W; mappings Σ_ST, Σ_TU, Σ_SW and Σ_WU]
  What do we do in this case?
Metadata Management For Data Exchange Systems
  A general metadata management framework was proposed by Bernstein
    Based on generic schema-mapping operators: Composition, Inverse, ...
  It has been studied for the case of relational databases
    Microsoft, IBM Almaden and UCSC
    The composition operator has been extensively studied
Metadata Management For Data Exchange Systems: Our (Proposed) Contribution
  Starting project: IBM Almaden and PUC Chile
  Two main components:
    Continue the study of the relational metadata operators
    Extend the framework to XML data exchange systems
Thank You!
Another Interesting Area: RDF
  What is RDF?
    A framework for representing information on the Web (W3C)
    Graph data model
RDF: Example
  [Graph: nodes Employee, John, Peter, Person, Microsoft, Company; edges labeled rdf:sc, rdf:type (three times) and works_in]
RDF: Possible Applications
  Web metadata
  Automation of information processing on the Web by agents
RDF Databases: Motivation
  Large volumes of RDF data
  Use of RDF data in ways unpredicted when first designed
  Need to design reliable tools to manage RDF data
RDF Databases: Motivation
  “Perhaps most interesting is the research opportunities suggested by the term ‘semantic Web.’ While it may be unclear what the concept truly entails, much of the recent work has centered on ‘ontologies.’ [...] The database community should be looking for opportunities to exploit these developments in future database management systems.”
  The Lowell Database Research Self-Assessment Meeting, May 2003
RDF Databases: Our Contribution
  Foundations of RDF databases: U. Chile, CWR and UofT
  Querying RDF databases: U. Chile, CWR, U. Talca and PUC Chile
Querying RDF Databases
  [Graph: nodes Employee, John, Peter, Person, Microsoft, Company; edges labeled rdf:sc, rdf:type (four times) and works_in]
Querying RDF Databases: Our Contribution
  SPARQL: a query language for RDF
    Graph-matching query language
    W3C Candidate Recommendation, 6 April 2006
SPARQL: Example
  [Graph: nodes Employee, John, Peter, Person, Microsoft, Company; edges labeled rdf:sc, rdf:type (three times) and works_in]
  Query: ?X :- (?X, works_in, Microsoft)
    Answer table (column ?X): Peter
  Query: ?X, ?Y :- (?X, rdf:type, Employee) OPTIONAL (?X, , ?Y)
    Answer table (columns ?X, ?Y): Peter
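The graph-matching idea behind these queries can be sketched in a few lines. This is a simplified illustration, not the W3C semantics or the formalization from the talk. The edge endpoints in `TRIPLES` are an assumed reconstruction of the slide's graph, and the predicate `email` is a hypothetical stand-in for the predicate that is missing from the OPTIONAL pattern above.

```python
# Assumed reconstruction of the example graph as (subject, predicate, object).
TRIPLES = [
    ("Employee", "rdf:sc", "Person"),
    ("John", "rdf:type", "Person"),
    ("Peter", "rdf:type", "Employee"),
    ("Microsoft", "rdf:type", "Company"),
    ("Peter", "works_in", "Microsoft"),
]


def is_var(term):
    """Variables are written with a leading question mark, as in ?X."""
    return term.startswith("?")


def match(pattern, triples, binding=None):
    """Return every extension of `binding` under which `pattern` is a triple."""
    results = []
    for triple in triples:
        candidate = dict(binding or {})
        ok = True
        for term, value in zip(pattern, triple):
            if is_var(term):
                # Bind the variable, or check it against its earlier value.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            results.append(candidate)
    return results


def optional(bindings, pattern, triples):
    """Left outer join: extend a binding when possible, else keep it as is."""
    out = []
    for b in bindings:
        extended = match(pattern, triples, b)
        out.extend(extended if extended else [b])
    return out


q1 = match(("?X", "works_in", "Microsoft"), TRIPLES)
employees = match(("?X", "rdf:type", "Employee"), TRIPLES)
q2 = optional(employees, ("?X", "email", "?Y"), TRIPLES)  # 'email' is hypothetical
print(q1)  # [{'?X': 'Peter'}]
print(q2)  # [{'?X': 'Peter'}]
```

In the second query no triple matches the optional pattern, so Peter is still returned and ?Y simply stays unbound. That is the behaviour that sets OPTIONAL apart from an ordinary join.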
SPARQL: Our Contribution
  We consider a fragment of SPARQL that encompasses all the main issues yet is simple to formalize
  We provide a formal semantics for this fragment
  We study the complexity of evaluating queries and provide complexity bounds
  We propose some optimization techniques
© 2006 Microsoft Corporation. All rights reserved.