Published by Suzanna Ray. Modified over 6 years ago.
1
A Knowledge Discovery Approach for Semantic Interoperability among Multiple Heterogeneous Data Sources
Dr. Sudha Ram, Huimin Zhao
Department of MIS, University of Arizona
2
Need for Integration
3
A Framework for Integration
Schema Translation: Describing individual data sources in a common data model, e.g., USM.
Interschema Relationship Identification: Identifying related data objects from multiple data sources.
Integrated Schema Generation: Generating an integrated schema of the data sources.
Schema Mapping Generation: Mapping the integrated schema to local schemas.
Most critical and time-consuming. Intensive human interaction.
4
Understanding Correspondences
Integrator (by letter, phone or fax): "Does their mission start time mean the same as your mission take off time?"
Local DBA (the next day): "I maintain the database. But how to interpret the data is up to the domain experts."
Domain Experts (two weeks later): "That depends, you know."
Volume: Hundreds of tables, thousands of attributes.
MITRE has spent several years, largely on human interaction, to integrate several database systems of the U.S. Air Force.
5
A Knowledge Discovery Approach
Pre-processing, Data Mining, Post-processing
6
Data Mining Technique: Self-Organizing Map
Clustering data objects by "semantic" similarity.
Visualizing the semantic clusters on a "Map".
Suggesting similar or related objects.
Combining all available information:
  Entity name, attribute name, relationship name
  Schematic information (keys, data types, etc.)
  Data contents (domains, mean, stddev, %nulls, etc.)
  Documentation (information retrieval techniques)
  Usage data (frequency of usage, #users, etc.)
  Other available information (e.g., business rules)
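To make the idea concrete, here is a minimal sketch of how data-content statistics could be turned into feature vectors and clustered with a small Self-Organizing Map. It is an illustration under stated assumptions, not the authors' implementation: the function names (`content_features`, `best_matching_unit`, `train_som`), the three-feature vector, the grid size, and the learning-rate and radius schedules are all hypothetical choices, and the features are assumed to be already scaled to comparable ranges.

```python
import math
import random
import statistics


def content_features(values):
    """Summarize one attribute's data contents as [mean, stddev, fraction of nulls].

    Hypothetical helper: a real feature vector would also encode names,
    schematic information, documentation, usage data, and so on.
    """
    present = [v for v in values if v is not None]
    null_frac = 1 - len(present) / len(values) if values else 0.0
    mean = statistics.fmean(present) if present else 0.0
    std = statistics.pstdev(present) if len(present) > 1 else 0.0
    return [mean, std, null_frac]


def best_matching_unit(grid, vector):
    """Return (row, col) of the node whose weights are closest to the vector."""
    best = None
    for r, row in enumerate(grid):
        for c, weights in enumerate(row):
            d2 = sum((a - b) ** 2 for a, b in zip(vector, weights))
            if best is None or d2 < best[0]:
                best = (d2, r, c)
    return best[1], best[2]


def train_som(vectors, rows, cols, epochs=200, seed=0):
    """Train a rows x cols map; similar vectors end up on nearby nodes."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]
    for t in range(epochs):
        frac = t / epochs
        lr = 0.5 * (1 - frac)                             # assumed linear decay
        radius = max(rows, cols) / 2 * (1 - frac) + 0.5   # assumed shrinking radius
        for v in rng.sample(vectors, len(vectors)):       # shuffled pass over the data
            br, bc = best_matching_unit(grid, v)
            for r in range(rows):
                for c in range(cols):
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighborhood
                    w = grid[r][c]
                    for i in range(dim):
                        w[i] += lr * h * (v[i] - w[i])     # pull node toward the input
    return grid


if __name__ == "__main__":
    # Made-up, pre-scaled feature vectors for three attributes.
    vectors = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.1, 0.0, 0.1]]
    som = train_som(vectors, rows=3, cols=3)
    for v in vectors:
        print(v, "->", best_matching_unit(som, v))
```

After training, each attribute is placed on the map at its best-matching node, so attributes that land on the same or adjacent nodes become candidates for the integrator to review.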
7
A Motivating Example
[Figure with legend]
8
A Semantic Map of Attributes
Response nodes are labeled with attribute numbers. Gray levels indicate similarities between clusters. The darker a node, the less similar it is to its neighbors. There is almost no boundary between attribute 4 (DB1.courses.name) and attribute 38 (DB2.course.name), indicating that they are very similar. But they are quite different from attribute 17 (DB1.professors.name) and attribute 32 (DB2.faculty.name).
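The slides do not say how the gray levels were computed. One common way to produce this kind of shading is a unified distance matrix (U-matrix), where each node's value is the average distance between its weight vector and those of its grid neighbors. The sketch below assumes that technique; the function name `u_matrix` and the toy weights are hypothetical.

```python
import math


def u_matrix(grid):
    """For each node, average Euclidean distance to its up/down/left/right neighbors.

    A larger value means the node is less similar to its neighbors, which
    would be drawn as a darker cell (a boundary between clusters).
    """
    rows, cols = len(grid), len(grid[0])
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(grid[r][c], grid[nr][nc]))
            out_row.append(sum(dists) / len(dists) if dists else 0.0)
        result.append(out_row)
    return result


if __name__ == "__main__":
    # Toy 1 x 3 map: the first two nodes are identical, the third is far away.
    toy = [[[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]]
    print(u_matrix(toy))  # [[0.0, 2.5, 5.0]]
```

In the toy map, the first two nodes have no boundary between them (value 0.0), much like attributes 4 and 38 on the slide, while the third node sits behind a dark boundary.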
9
Future Work
Automate the extraction of semantic information from different types of data sources.
Integrate the Interschema Relationship Identification procedure into a complete integration system, which provides the user with an interactive interface and guides the user through a semi-automatic integration process.
Validate the utility of this approach in real-world data warehousing and web information discovery applications.