The Information Integration Wizard (Iwiz) Project
Report on work in progress
Joachim Hammer
Presented by Muhammed Al-Muhammed
Introduction - People use the Internet to find information of interest. * That is easy if all the information is available in one place. - But this is not the case nowadays! - The information of interest may be spread across multiple sources. So what can we do? - If all sources used the same tools and data model to create and manage their data, finding information of interest would no longer be a problem!
- But what if these sources use different tools, hardware and software platforms to manage their data (heterogeneity at its peak)? What problems could arise? Some are: - schematic problems. - semantic problems. - So what can we do? The obvious solution is a tool that can overcome both the heterogeneity of the information sources and their decentralization. This is why data integration is important.
- So what benefits do users get from data integration tools? The greatest benefit is that the user does not have to worry about: 1. what sources are available; 2. where they are located; 3. how the data is represented in each source; 4. how each data source is queried.
Goals of the project: help users get information from heterogeneous sources. How do they achieve this goal? By building an integration system that uses a hybrid "warehousing/mediator" approach. The warehouse stores frequently accessed data. The mediator answers on-demand queries when the data is not available in the warehouse.
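The hybrid approach can be pictured as a query router: answer from the warehouse when it already holds the result, otherwise forward the query to the mediator, which contacts the sources on demand. This is a minimal sketch under assumed names, not the Iwiz implementation; `HybridIntegrator`, the dictionary-based warehouse and the `fake_mediator` stand-in are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A query is modeled as a hashable key, e.g. ("books", "author=Hammer") (assumption).
Query = Tuple[str, str]
Result = List[dict]


class HybridIntegrator:
    """Hypothetical hybrid warehouse/mediator front end.

    The warehouse is a plain dict of precomputed answers; the mediator is a
    callable that queries the live sources on demand.
    """

    def __init__(self, mediator: Callable[[Query], Result]) -> None:
        self.warehouse: Dict[Query, Result] = {}
        self.mediator = mediator

    def load_warehouse(self, query: Query, result: Result) -> None:
        # Materialize the answer to a frequently asked query ahead of time.
        self.warehouse[query] = result

    def answer(self, query: Query) -> Tuple[str, Result]:
        # Serve from the warehouse when possible; otherwise fall back
        # to an on-demand query through the mediator.
        if query in self.warehouse:
            return "warehouse", self.warehouse[query]
        return "mediator", self.mediator(query)


def fake_mediator(query: Query) -> Result:
    # Hypothetical stand-in for contacting the underlying sources.
    return [{"source": "S1", "query": query[1]}]


integrator = HybridIntegrator(fake_mediator)
integrator.load_warehouse(("books", "author=Hammer"), [{"title": "Example title"}])
print(integrator.answer(("books", "author=Hammer"))[0])  # warehouse
print(integrator.answer(("books", "year=2000"))[0])      # mediator
```

Keeping the mediator as a plain callable means the router does not care how the sources are reached; deciding which queries count as "frequently accessed" is left out of the sketch.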
What issues must be investigated to achieve these goals? 1. Common data model and representation, i.e., what data model can represent the information in the integrated system? They chose XML for their system because it has some useful features, such as a clear separation of data and schema.
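To make the choice of XML concrete, the sketch below wraps records from two differently shaped sources (a keyed relational row and a positional flat-file tuple) in one common XML form. It uses only the standard `xml.etree.ElementTree` module; the `<books>`/`<book>` element names, the `BOOK_FIELDS` list and the sample records are hypothetical, not taken from Iwiz.

```python
import xml.etree.ElementTree as ET

# Hypothetical common structure for a "book" in the integrated system.
BOOK_FIELDS = ["title", "author", "year"]


def to_xml(record: dict, source: str) -> ET.Element:
    """Wrap one source record in a <book> element tagged with its source."""
    book = ET.Element("book", attrib={"source": source})
    for field in BOOK_FIELDS:
        child = ET.SubElement(book, field)
        child.text = str(record.get(field, ""))  # missing fields become empty
    return book


root = ET.Element("books")
# Source S1: a row from a relational table, already keyed by column name.
root.append(to_xml({"title": "Data Integration", "author": "Smith", "year": 1999}, "S1"))
# Source S2: a positional tuple from a flat file, mapped to field names first.
row = ("XML Basics", "Jones", 2000)
root.append(to_xml(dict(zip(BOOK_FIELDS, row)), "S2"))

print(ET.tostring(root, encoding="unicode"))
```

Once both sources are in the same XML shape, later steps (restructuring, merging, querying) only need to handle one representation.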
2. Defining a global schema that represents the relevant data in a form tailored to the user's needs. 3. Semantic heterogeneity (a huge problem). The hurdles it causes: - understanding the meaning of the source data; - relating it to the global schema; - translating values from the source context to the target context; - merging related data.
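The hurdle of translating values from the source context to the target context can be illustrated with a per-source table of conversion functions. This is a sketch under assumed conventions: the global schema here stores prices as US dollars and dates as ISO-8601 strings, and sources `"A"` and `"B"` with their formats are invented for the example.

```python
from datetime import datetime

# Assumed global-schema conventions: prices in US dollars (float),
# dates as ISO-8601 strings (YYYY-MM-DD).


def price_from_cents(value: int) -> float:
    # Hypothetical source A stores prices as integer cents.
    return value / 100


def price_from_text(value: str) -> float:
    # Hypothetical source B stores prices as text such as "$12.50".
    return float(value.lstrip("$"))


def date_from_us(value: str) -> str:
    # Hypothetical source B writes dates as MM/DD/YYYY.
    return datetime.strptime(value, "%m/%d/%Y").date().isoformat()


# Per-source translation table: global attribute -> conversion function.
TRANSLATORS = {
    "A": {"price": price_from_cents, "date": lambda v: v},  # A already uses ISO dates
    "B": {"price": price_from_text, "date": date_from_us},
}


def translate(record: dict, source: str) -> dict:
    """Convert a source record's values into the global-schema context."""
    rules = TRANSLATORS[source]
    return {attr: rules[attr](value) for attr, value in record.items()}


print(translate({"price": 1250, "date": "1999-12-31"}, "A"))
# {'price': 12.5, 'date': '1999-12-31'}
print(translate({"price": "$12.50", "date": "12/31/1999"}, "B"))
# {'price': 12.5, 'date': '1999-12-31'}
```

The same value in two source contexts ends up identical in the target context, which is what makes the later merging step possible.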
Heterogeneity occurs at three levels: - System level: hardware, operating system. - Data management level: differences in data models and access commands. - Semantic level: differences in the way related or similar data is represented in different sources.
How can the three levels of heterogeneity be overcome? The first two are handled by translators and adapters. The third is the serious one! The following diagram gives some idea of the kinds of heterogeneity involved.
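Adapters hide system- and data-management-level differences behind one interface. In the sketch below, a relational source (queried with SQL through the standard `sqlite3` module) and a flat CSV file (read with the standard `csv` module) both expose the same `fetch()` method returning plain records. The `SourceAdapter` protocol, the adapter classes and the sample data are hypothetical illustrations, not components named in the slides.

```python
import csv
import io
import sqlite3
from typing import Dict, List, Protocol


class SourceAdapter(Protocol):
    # Hypothetical common interface every adapter offers to the integrator.
    def fetch(self) -> List[Dict[str, str]]: ...


class SqlAdapter:
    """Wraps a relational source; hides SQL behind fetch()."""

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self.conn = conn
        self.table = table  # trusted table name only: it is inserted into the SQL text

    def fetch(self) -> List[Dict[str, str]]:
        cur = self.conn.execute(f"SELECT * FROM {self.table}")
        names = [d[0] for d in cur.description]
        return [dict(zip(names, map(str, row))) for row in cur.fetchall()]


class CsvAdapter:
    """Wraps a flat-file source; hides CSV parsing behind fetch()."""

    def __init__(self, text: str) -> None:
        self.text = text

    def fetch(self) -> List[Dict[str, str]]:
        return [dict(row) for row in csv.DictReader(io.StringIO(self.text))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, year INTEGER)")
conn.execute("INSERT INTO book VALUES ('Data Integration', 1999)")

adapters: List[SourceAdapter] = [
    SqlAdapter(conn, "book"),
    CsvAdapter("title,year\nXML Basics,2000\n"),
]
for adapter in adapters:
    print(adapter.fetch())
```

Note that adapters only remove differences in access methods and formats; they do nothing about semantic differences, which is why the third level needs the mapping steps described next.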
So what can we do about the heterogeneity problem? * To overcome heterogeneity, a mapping is needed. - Mapping is done in two steps: 1- Schema restructuring: eliminate syntactic and semantic inconsistencies between the source schema and the global schema. 2- Schema merging: remove duplicates and inconsistent data.
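The two mapping steps can be sketched on plain records: restructuring renames each source's attributes to global-schema names, and merging collapses duplicates that share a key. The attribute maps, the choice of `title` as the merge key and the sample data are assumptions for illustration. One deliberate simplification: instead of silently removing inconsistent data, this sketch keeps the first value it sees and reports each conflict so it can be resolved later.

```python
from typing import Dict, List, Tuple

# Hypothetical mappings from each source's attribute names to the global schema.
SCHEMA_MAPS = {
    "S1": {"book_title": "title", "writer": "author"},
    "S2": {"name": "title", "author": "author"},
}


def restructure(record: Dict[str, str], source: str) -> Dict[str, str]:
    """Step 1: rename source attributes to their global-schema names."""
    mapping = SCHEMA_MAPS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def merge(
    records: List[Dict[str, str]], key: str = "title"
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Step 2: collapse duplicates by key; keep the first value, report conflicts."""
    merged: Dict[str, Dict[str, str]] = {}
    conflicts: List[str] = []
    for rec in records:
        k = rec[key]
        if k not in merged:
            merged[k] = dict(rec)
            continue
        for attr, value in rec.items():
            existing = merged[k].get(attr)
            if existing is None:
                merged[k][attr] = value  # fill a gap from a later source
            elif existing != value:
                conflicts.append(f"{k}.{attr}: {existing!r} vs {value!r}")
    return list(merged.values()), conflicts


rows = [
    restructure({"book_title": "XML Basics", "writer": "Jones"}, "S1"),
    restructure({"name": "XML Basics", "author": "Jones"}, "S2"),
    restructure({"name": "Data Integration", "author": "Smith"}, "S2"),
    restructure({"book_title": "Data Integration", "writer": "Smyth"}, "S1"),
]
result, problems = merge(rows)
print(result)
# [{'title': 'XML Basics', 'author': 'Jones'}, {'title': 'Data Integration', 'author': 'Smith'}]
print(problems)
# ["Data Integration.author: 'Smith' vs 'Smyth'"]
```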
4- Knowledge representation: a common metadata knowledge base for reasoning about the meaning of, and relationships among, concepts. “To deal with these issues, they proposed a system called Iwiz.” - Iwiz architecture
Figure: transformation from the XML source data to the XML target schema
Restructuring and Merging. The goals of this step are: 1- generate rules for converting data from its native source format into the global schema; 2- populate the global target schema with data. How is the data restructured and merged?
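Applying conversion rules to populate the target schema can be sketched as an XML-to-XML transformation. The slides say the rules are generated but do not show how, so here they are simply written out by hand as a hypothetical mapping from target element names to paths in the source XML. `RULES`, `populate` and the sample catalog are assumptions; only standard `xml.etree.ElementTree` calls are used.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical restructuring rules: target element name -> path in the source XML.
RULES: Dict[str, str] = {
    "title": "info/name",
    "author": "info/creator",
    "year": "published",
}

SOURCE_XML = """
<catalog>
  <item><info><name>XML Basics</name><creator>Jones</creator></info><published>2000</published></item>
  <item><info><name>Data Integration</name><creator>Smith</creator></info><published>1999</published></item>
</catalog>
"""


def populate(source_root: ET.Element, rules: Dict[str, str]) -> ET.Element:
    """Apply the rules to every source <item>, producing target <book> elements."""
    target = ET.Element("books")
    for item in source_root.findall("item"):
        book = ET.SubElement(target, "book")
        for target_name, source_path in rules.items():
            node = item.find(source_path)
            if node is not None:  # skip attributes this source item lacks
                ET.SubElement(book, target_name).text = node.text
    return target


target_root = populate(ET.fromstring(SOURCE_XML.strip()), RULES)
print(ET.tostring(target_root, encoding="unicode"))
```

Keeping the rules as data rather than code means a rule generator could emit them without touching the transformation routine itself.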
Data restructuring and merging
How is the data accessed? Queries: we distinguish two versions. Version 1:
Version 2 – not built yet!