1 Overview of Database Federation and the IBM Garlic Project. Presented by Xiaofen He
2 References: "Data Integration through Database Federation", L.M. Haas, E.T. Lin, M.A. Roth; "Towards Heterogeneous Multimedia Information Systems: The Garlic Approach", IBM Almaden Research Center
3 Outline: approaches to data integration; database federation in IBM DB2; the IBM Garlic Project
4 Various Approaches to Data Integration (1). Application-specific solutions: always work, but are expensive, fragile, and hard to extend. Application-integration frameworks: protect applications from changes in the data sources, but do not address data integration issues. Workflow frameworks: offer limited support for comparing and manipulating data.
5 Various Approaches to Data Integration (2). Digital libraries: act as a meta search engine, with no combination of data across sources. Data warehousing: offers a powerful, high-level query language, but may not be possible or cost effective and can lose source functionality. Database federation: a virtual data warehouse, with a performance tradeoff (query rewrite and cost-based optimization).
6 Database Federation: basics of database federation; DB2 styles of database federation; determining the style of database federation to use
7 Basics of Database Federation. What is "database federation" (DF)? Also known as "mediation": an architecture in which middleware, consisting of a relational database management system, provides uniform access to a number of heterogeneous data sources.
8 Common Mediation Architecture [Figure 1. Common Mediator Architecture. Components shown: data sources, wrappers, mediator]
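The mediator/wrapper layering named on this slide can be illustrated with a minimal Python sketch. It is not DB2 or Garlic code: the class names (`Wrapper`, `CsvLikeSource`, `KeyValueSource`, `Mediator`) and the sample data are assumptions made up for illustration. Each wrapper hides one source's native format behind a common `scan()` call, and the mediator answers a query uniformly across all of them.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Protocol


class Wrapper(Protocol):
    """Hypothetical wrapper interface: exposes one data source as dict rows."""

    def scan(self) -> Iterable[dict]: ...


class CsvLikeSource:
    """Illustrative source that stores records as comma-separated strings."""

    def __init__(self, lines: List[str]) -> None:
        self._lines = lines

    def scan(self) -> Iterator[dict]:
        for line in self._lines:
            name, city = line.split(",")
            yield {"name": name, "city": city}


class KeyValueSource:
    """Illustrative source that stores records as a name -> city mapping."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = data

    def scan(self) -> Iterator[dict]:
        for name, city in self._data.items():
            yield {"name": name, "city": city}


class Mediator:
    """Gives callers one query entry point over all wrapped sources."""

    def __init__(self, wrappers: List[Wrapper]) -> None:
        self._wrappers = wrappers

    def query(self, predicate: Callable[[dict], bool]) -> List[dict]:
        # Rows arrive in a common shape, so one predicate works for every source.
        return [row for w in self._wrappers for row in w.scan() if predicate(row)]


mediator = Mediator([
    CsvLikeSource(["Ann,San Jose", "Bob,Austin"]),
    KeyValueSource({"Cho": "San Jose"}),
])
san_jose = mediator.query(lambda row: row["city"] == "San Jose")
print([row["name"] for row in san_jose])
```

In this sketch the mediator filters every row itself; a real federated system would try to push work down to the sources where possible.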
9 Goals of IBM DF: transparency; support for heterogeneity; a high degree of function; extensibility; openness; autonomy of individual data sources; query optimization
10 DB2 architecture for DF Figure 2. DB2 architecture for database Federation
11 DB2 Styles of Federation. Scalar UDFs: federating function. Table UDFs: federating data. Wrappers: federating function and data. [Figure 3. Different styles of federation]
12 Wrapper Architecture: multi-server integration; multi-dataset and multi-operation integration; optimization; transactional integration
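As a rough analogy for the first style (a scalar UDF that federates a function), the sketch below registers a Python function with SQLite through the standard `sqlite3` module so that SQL can call it. This is SQLite, not DB2: the function `to_celsius`, the `readings` table, and the data are assumptions for illustration, and DB2's own UDF registration syntax is not shown.

```python
import sqlite3


def to_celsius(fahrenheit: float) -> float:
    """Stand-in for a function that lives outside the database engine."""
    return round((fahrenheit - 32) * 5 / 9, 1)


conn = sqlite3.connect(":memory:")

# Register the external function so SQL statements can call it by name.
conn.create_function("to_celsius", 1, to_celsius)

conn.execute("CREATE TABLE readings (city TEXT, temp_f REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("San Jose", 68.0), ("Austin", 95.0)],
)

# The query mixes local table data with the externally supplied function.
rows = conn.execute(
    "SELECT city, to_celsius(temp_f) FROM readings ORDER BY city"
).fetchall()
print(rows)
```

The table-UDF and wrapper styles go further by bringing external data, not just external functions, into the query.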
13 Determining the style of DF to use Figure 4. Determine the style of federation to use
14 IBM Garlic Project: introduction; overview; architecture; repositories and databases; the Garlic data model; queries in Garlic; interfaces and applications; conclusion
15 Introduction: need; goal; object-oriented model
16 Garlic Overview [Figure 5. Garlic System Architecture. Components shown: C++ application, Query/Browser, Query Services & Runtime System, metadata repository, repository wrappers, complex object repository, data repositories]
17 Garlic Overview. Repositories: repository type; repository instance; repository manager. Databases: global schema; wrapper schemas (local schemas).
18 Garlic Data Model (1). Based on the ODMG-93 object model: objects and values; inheritance. Object identity: weak identity (unique, but not necessarily immutable); legacy references; implementation-constrained references.
19 Garlic Data Model (2). Extensions: a degree of support for alternative implementations of interfaces; type system flexibility (conformity); an object-appropriate view definition facility. Object-centered views: enhance objects by adding or hiding some of their attributes or methods.
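The idea of a view that adds or hides attributes of an existing object can be sketched in Python. This is an illustration only, not Garlic's view definition facility: `ObjectView`, `Employee`, and the derived attribute `age_in_2000` are hypothetical names.

```python
class ObjectView:
    """Hypothetical view over an existing object: hides some attributes, adds others."""

    def __init__(self, base, hidden=(), added=None):
        self._base = base
        self._hidden = frozenset(hidden)
        self._added = dict(added or {})  # attribute name -> function of the base object

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for attributes the view forwards.
        if name.startswith("_"):
            raise AttributeError(name)
        if name in self._hidden:
            raise AttributeError(f"{name!r} is hidden by this view")
        if name in self._added:
            return self._added[name](self._base)
        return getattr(self._base, name)


class Employee:
    def __init__(self, name, salary, birth_year):
        self.name = name
        self.salary = salary
        self.birth_year = birth_year


emp = Employee("Ann", 120000, 1980)
view = ObjectView(
    emp,
    hidden={"salary"},
    added={"age_in_2000": lambda e: 2000 - e.birth_year},
)

print(view.name)         # forwarded from the underlying object
print(view.age_in_2000)  # attribute that exists only in the view
try:
    view.salary
except AttributeError as exc:
    print("hidden:", exc)
```

The view keeps a reference to the underlying object rather than copying it, so changes to the base object remain visible through the view.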
20 Queries in Garlic. Query language: an object-oriented extension of SQL that integrates approximate-match query semantics with traditional exact-match query semantics. Query processing: decomposition. Interesting question: how should the query power of a repository be characterized, in terms of the language subset that its wrapper is capable of processing directly?
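Query decomposition against a wrapper with limited query power can be sketched as follows. This is a simplified illustration, not Garlic's query processor: the `Predicate` type, the `supported_ops` capability set, and the helper names are assumptions. The wrapper declares which comparison operators it can evaluate directly; the mediator pushes those predicates down and applies the remaining ones itself.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Predicate:
    attribute: str
    op: str        # "=" or "<" in this sketch
    value: object


OPS = {"=": lambda a, b: a == b, "<": lambda a, b: a < b}


def holds(pred: Predicate, row: dict) -> bool:
    return OPS[pred.op](row[pred.attribute], pred.value)


class RepositoryWrapper:
    """Hypothetical wrapper that can evaluate only some operators itself."""

    def __init__(self, rows: List[dict], supported_ops: Set[str]) -> None:
        self.rows = rows
        self.supported_ops = supported_ops

    def can_handle(self, pred: Predicate) -> bool:
        return pred.op in self.supported_ops

    def execute(self, preds: List[Predicate]) -> List[dict]:
        return [r for r in self.rows if all(holds(p, r) for p in preds)]


def decompose(wrapper: RepositoryWrapper, preds: List[Predicate]
              ) -> Tuple[List[Predicate], List[Predicate]]:
    """Split predicates into those pushed to the wrapper and the residual."""
    pushed = [p for p in preds if wrapper.can_handle(p)]
    residual = [p for p in preds if not wrapper.can_handle(p)]
    return pushed, residual


def run_query(wrapper: RepositoryWrapper, preds: List[Predicate]) -> List[dict]:
    pushed, residual = decompose(wrapper, preds)
    candidates = wrapper.execute(pushed)  # evaluated inside the repository
    # The mediator finishes whatever the wrapper could not process.
    return [r for r in candidates if all(holds(p, r) for p in residual)]


images = RepositoryWrapper(
    rows=[
        {"id": 1, "kind": "image", "size_kb": 40},
        {"id": 2, "kind": "image", "size_kb": 900},
        {"id": 3, "kind": "text", "size_kb": 5},
    ],
    supported_ops={"="},  # this repository supports equality lookups only
)
query = [Predicate("kind", "=", "image"), Predicate("size_kb", "<", 100)]
pushed, residual = decompose(images, query)
result = run_query(images, query)
print(pushed, residual, result)
```

Describing capability as a set of operators is the crudest possible characterization; the slide's question is precisely how to describe a wrapper's supported language subset more expressively.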
21 Interfaces and Applications. C++ API: compiled applications; dynamic applications. Query/Browser: a dynamic application that lets users move back and forth between querying and browsing.
22 Summary. Database federation: a powerful tool for integrating data; future work should improve ease of use and enhance performance. Garlic Project: new research in many dimensions.