Published by Erick Arnold. Modified over 9 years ago.
1
Minor Thesis: A scalable schema matching framework for relational databases
Student: Ahmed Saimon Adam
ID: 110022478
Award: MSc (Computer & Information Science)
Date: 17th September 2010
Supervisor: Dr. Jixue Liu
2
Field of thesis
▫ Schema matching
▫ Relational database integration
3
INTRODUCTION
What is a database schema?
▫ The structure of a database: how its concepts, their relationships and their constraints are arranged
What is schema matching?
▫ The process of identifying semantic correspondences between elements of database schemas
4
INTRODUCTION
What is schema matching?
5
Schema matching applications
▫ A critical task in any data-sharing process
▫ Data warehousing: consolidating multiple transaction-processing databases
▫ Database integration, e.g. when two companies merge and must integrate their employee, inventory and financial databases
▫ Cooperation between government agencies and other institutions, e.g. police and the transport department, or immigration and universities
6
Importance of the research
▫ Schema matching is currently done manually or semi-automatically
▫ Manual matching is tedious, error-prone and costly
▫ No fully automatic system is available; existing systems require user interaction
▫ Application areas: semantic query processing, the mobile web, e-commerce, collaboration in enterprises
▫ Demand for more scalable, accurate and efficient schema matching technology is increasing
7
Research objectives
Propose a framework that
▫ adopts a scalable architecture
▫ offers a library of schema matching algorithms that exploit various kinds of information for better accuracy
▫ is independent of any specific application domain
8
Methodology
▫ Build a framework by adopting a composite architecture
▫ Create a library of matchers at different levels
▫ Build a prototype and evaluate it empirically for accuracy, scalability and efficiency
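The slides describe a composite architecture with a library of independent matchers but do not show its code. The following is a minimal Python sketch, assuming each matcher is a function that scores two attribute names in [0, 1] and that the composite simply averages the scores; the `MATCHERS` registry, the `matcher` decorator and the averaging rule are hypothetical and are not taken from the C#.NET prototype.

```python
from typing import Callable, Dict

# Hypothetical matcher type: scores two attribute names in the range [0, 1].
Matcher = Callable[[str, str], float]

# Hypothetical registry of independent matchers.
MATCHERS: Dict[str, Matcher] = {}


def matcher(name: str) -> Callable[[Matcher], Matcher]:
    """Decorator that registers a matcher; adding one does not touch the others."""
    def register(fn: Matcher) -> Matcher:
        MATCHERS[name] = fn
        return fn
    return register


@matcher("exact")
def exact_match(a: str, b: str) -> float:
    # 1.0 for a case-insensitive exact match of the two names, else 0.0
    return 1.0 if a.lower() == b.lower() else 0.0


@matcher("contains")
def contains_match(a: str, b: str) -> float:
    # 1.0 if either non-empty lower-cased name occurs inside the other, else 0.0
    a, b = a.lower(), b.lower()
    return 1.0 if a and b and (a in b or b in a) else 0.0


def composite_score(a: str, b: str) -> Dict[str, float]:
    """Run every registered matcher independently and add their mean as 'combined'."""
    scores = {name: fn(a, b) for name, fn in MATCHERS.items()}
    scores["combined"] = sum(scores.values()) / len(MATCHERS)
    return scores


print(composite_score("Phone", "studentPhone"))
# {'exact': 0.0, 'contains': 1.0, 'combined': 0.5}
```

Because the composite only iterates over the registry, a new algorithm is integrated by registering one more function; a weighted mean or a maximum could replace the plain average.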
9
Schema Matching Architecture: Input
▫ Represented in SQL DDL format
…
CREATE TABLE StudentDB.Student (
    studentId    INT,
    studentName  VARCHAR(100),
    studentPhone VARCHAR(50),
    PRIMARY KEY (studentId)
);
…
10
Schema Matching Architecture: Input
▫ Currently supports versions after Oracle 9 and SQL Server 2000; a data type conversion table is used when the schemas come from different DBMSs
▫ The input processor extracts schema information, e.g. element names, data types and keys
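The slides do not show how the input processor works. One possible minimal sketch in Python, assuming simple `CREATE TABLE` statements like the Student example (column definitions plus an optional `PRIMARY KEY (...)` clause, with no `CONSTRAINT` or other table-level clauses), extracts table names, column names, data types, lengths and primary keys with regular expressions. `parse_ddl` and the entries in `TYPE_MAP` are hypothetical stand-ins for the prototype's data type conversion table.

```python
import re
from typing import Dict, List

# Hypothetical data type conversion table: DBMS-specific names -> common names.
TYPE_MAP = {"VARCHAR2": "VARCHAR", "NVARCHAR": "VARCHAR", "NUMBER": "NUMERIC", "INTEGER": "INT"}

TABLE_RE = re.compile(r"CREATE\s+TABLE\s+([\w.]+)\s*\((.*?)\)\s*;", re.IGNORECASE | re.DOTALL)
COLUMN_RE = re.compile(r"^(\w+)\s+(\w+)(?:\s*\((\d+)\))?")
PK_RE = re.compile(r"^PRIMARY\s+KEY\s*\(([^)]*)\)", re.IGNORECASE)


def parse_ddl(ddl: str) -> List[Dict]:
    """Extract table names, column names, data types, lengths and primary keys."""
    tables = []
    for table_name, body in TABLE_RE.findall(ddl):
        columns, keys = [], []
        # Split on commas that are not inside parentheses, e.g. DECIMAL(10,2).
        for part in re.split(r",(?![^()]*\))", body):
            part = part.strip()
            if not part:
                continue
            pk = PK_RE.match(part)
            if pk:
                keys = [k.strip() for k in pk.group(1).split(",")]
                continue
            col = COLUMN_RE.match(part)
            if col:
                name, dtype, length = col.groups()
                columns.append({
                    "name": name,
                    # Normalise the type name through the conversion table.
                    "type": TYPE_MAP.get(dtype.upper(), dtype.upper()),
                    "length": int(length) if length else None,
                })
        tables.append({"table": table_name, "columns": columns, "primary_key": keys})
    return tables


ddl = """
CREATE TABLE StudentDB.Student (
    studentId    INT,
    studentName  VARCHAR(100),
    studentPhone VARCHAR(50),
    PRIMARY KEY (studentId)
);
"""
print(parse_ddl(ddl))
```

A production input processor would more likely read the DBMS catalog views than parse DDL text, since regular expressions cannot cover the full DDL grammar.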
11
Schema Matching Architecture: Process (schema matching)
▫ Implements multiple matching algorithms (matchers)
Schema level
▫ Element name similarity algorithms: prefix, suffix, n-gram
    Tech = Technology (prefix matching)
    Phone = telephone (suffix matching)
    Context: con, ont, nte, tex, ext (n-gram)
▫ Structural similarities: data type, field length, etc.
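The schema-level matchers on this slide can be sketched in Python as below. The slide names the techniques but not their scoring rules, so these are assumptions: prefix and suffix matching return 1.0 when one lower-cased name is a prefix or suffix of the other, n-gram matching uses the Dice coefficient over character trigrams, and the hypothetical `structural_sim` gives equal weight to data type equality and field length closeness.

```python
def prefix_sim(a: str, b: str) -> float:
    """1.0 if one name is a prefix of the other (e.g. Tech / Technology), else 0.0."""
    a, b = a.lower(), b.lower()
    if not a or not b:
        return 0.0
    return 1.0 if a.startswith(b) or b.startswith(a) else 0.0


def suffix_sim(a: str, b: str) -> float:
    """1.0 if one name is a suffix of the other (e.g. Phone / telephone), else 0.0."""
    a, b = a.lower(), b.lower()
    if not a or not b:
        return 0.0
    return 1.0 if a.endswith(b) or b.endswith(a) else 0.0


def ngrams(s: str, n: int = 3) -> set:
    """Set of character n-grams: 'Context' -> {con, ont, nte, tex, ext} for n = 3."""
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_sim(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over the two n-gram sets, in [0, 1]."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def structural_sim(type_a: str, len_a, type_b: str, len_b) -> float:
    """Hypothetical structural score: half for equal data types, half for length closeness."""
    type_score = 1.0 if type_a.upper() == type_b.upper() else 0.0
    if len_a is None or len_b is None:
        len_score = 1.0 if len_a == len_b else 0.0
    else:
        longest = max(len_a, len_b)
        len_score = min(len_a, len_b) / longest if longest else 1.0
    return 0.5 * type_score + 0.5 * len_score


print(prefix_sim("Tech", "Technology"), suffix_sim("Phone", "telephone"))
print(ngram_sim("studentPhone", "phone"), structural_sim("VARCHAR", 100, "varchar", 50))
```

The n-gram matcher gives partial credit (6 of 13 trigram slots shared between studentPhone and phone), whereas the prefix and suffix matchers are all-or-nothing.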
12
Schema Matching Architecture: Instance Level
▫ Statistical data, e.g. range, % of alphanumeric characters, statistical properties (e.g. mean, standard deviation), distinct values, etc.
▫ Discovering complex correspondences
    Mining actual values
    Matching values across different data types (gender: M, F = 1, 2)
    Ambiguity issues: Jaguar (car or animal)?
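A possible instance-level matcher is sketched below in Python. It is a minimal illustration, assuming column values arrive as strings: the hypothetical `profile` function computes some of the statistics listed on the slide (distinct values, % of alphanumeric characters, and range, mean and standard deviation for numeric columns), and the hypothetical `profile_sim` scores two profiles by the average closeness of the statistics they share. The closeness formula is also an assumption.

```python
import statistics
from typing import Dict, List


def profile(values: List[str]) -> Dict[str, float]:
    """Instance-level statistics for one column (values given as strings)."""
    text = "".join(values)
    prof = {
        "distinct_ratio": len(set(values)) / len(values) if values else 0.0,
        "alnum_pct": 100.0 * sum(ch.isalnum() for ch in text) / len(text) if text else 0.0,
        "avg_len": statistics.mean([len(v) for v in values]) if values else 0.0,
    }
    numbers = []
    for v in values:
        try:
            numbers.append(float(v))
        except ValueError:
            pass
    # Range, mean and standard deviation only when every value is numeric.
    if values and len(numbers) == len(values):
        prof["min"], prof["max"] = min(numbers), max(numbers)
        prof["mean"] = statistics.mean(numbers)
        prof["stdev"] = statistics.pstdev(numbers)
    return prof


def profile_sim(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Average closeness, clamped to [0, 1], over the statistics both profiles contain."""
    scores = []
    for key in p.keys() & q.keys():
        x, y = p[key], q[key]
        denom = max(abs(x), abs(y))
        scores.append(1.0 if denom == 0 else max(0.0, 1.0 - abs(x - y) / denom))
    return sum(scores) / len(scores) if scores else 0.0


gender_text = profile(["M", "F", "M", "F"])
gender_code = profile(["1", "2", "1", "2"])
print(profile_sim(gender_text, gender_code))
```

Because the M/F and 1/2 gender columns have identical distinct-value ratios, character makeup and lengths, their profiles score 1.0 and would be flagged as a candidate correspondence despite the different data types; confirming the M, F = 1, 2 value mapping, or resolving an ambiguity such as Jaguar, would need further evidence.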
13
Schema Matching Architecture: Output
▫ Each matching algorithm produces a similarity score between attributes; all scores are normalized to the range 0 to 1
▫ Match results are held in a similarity cube, from which attribute-level, table-level and schema-level similarities can be generated
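A similarity cube and its roll-up to the three levels might be sketched in Python as follows. This is a minimal illustration under stated assumptions, since the slides do not specify the aggregation rules: schemas are hypothetical `{table: [attributes]}` dictionaries, matchers are functions returning normalized scores, attribute-level similarity is the mean over matchers, table-level similarity is the mean of each source attribute's best target score, and schema-level similarity is the mean of each source table's best target-table score.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Schema = Dict[str, List[str]]          # hypothetical: table name -> attribute names
Attr = Tuple[str, str]                 # (table name, attribute name)
Matcher = Callable[[str, str], float]  # returns a score normalized to [0, 1]
Cube = Dict[Tuple[str, Attr, Attr], float]


def build_cube(src: Schema, tgt: Schema, matchers: Dict[str, Matcher]) -> Cube:
    """One score per (matcher, source attribute, target attribute) cell."""
    cube = {}
    for (m_name, fn), (st, s_attrs), (tt, t_attrs) in product(matchers.items(), src.items(), tgt.items()):
        for sa, ta in product(s_attrs, t_attrs):
            cube[(m_name, (st, sa), (tt, ta))] = fn(sa, ta)
    return cube


def attribute_level(cube: Cube) -> Dict[Tuple[Attr, Attr], float]:
    """Collapse the matcher axis by averaging every matcher's score for each attribute pair."""
    sums: Dict[Tuple[Attr, Attr], float] = {}
    counts: Dict[Tuple[Attr, Attr], int] = {}
    for (_, s, t), score in cube.items():
        sums[(s, t)] = sums.get((s, t), 0.0) + score
        counts[(s, t)] = counts.get((s, t), 0) + 1
    return {pair: sums[pair] / counts[pair] for pair in sums}


def table_level(attr: Dict[Tuple[Attr, Attr], float], src: Schema, tgt: Schema) -> Dict[Tuple[str, str], float]:
    """Mean over source attributes of the best score in the target table (non-empty tables assumed)."""
    result = {}
    for (st, s_attrs), (tt, t_attrs) in product(src.items(), tgt.items()):
        best = [max(attr[((st, sa), (tt, ta))] for ta in t_attrs) for sa in s_attrs]
        result[(st, tt)] = sum(best) / len(best)
    return result


def schema_level(tables: Dict[Tuple[str, str], float], src: Schema, tgt: Schema) -> float:
    """Mean over source tables of the best-scoring target table."""
    best = [max(tables[(st, tt)] for tt in tgt) for st in src]
    return sum(best) / len(best)


src = {"Student": ["studentId", "phone"]}
tgt = {"Pupil": ["studentid", "telephone"], "Course": ["courseId"]}
matchers = {
    "exact": lambda a, b: 1.0 if a.lower() == b.lower() else 0.0,
    "suffix": lambda a, b: 1.0 if a.lower().endswith(b.lower()) or b.lower().endswith(a.lower()) else 0.0,
}
cube = build_cube(src, tgt, matchers)
tables = table_level(attribute_level(cube), src, tgt)
print(tables, schema_level(tables, src, tgt))
```

Keeping the matcher axis in the cube, rather than combining scores immediately, lets the same raw scores be re-aggregated with different rules later.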
14
Methodology
Schema matching prototype implemented in C#.NET
15
Experimental Evaluation: Accuracy
▫ Tested on 2 small schemas of 10 tables each, with 2-10 attributes per table
▫ Results checked against a manually derived result
▫ Accuracy degrades as schema size increases
▫ 55-60% true matches
▫ Tested on a schema with 140 tables and 1360 attributes: 20-40% true matches
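The slides report accuracy as a percentage of true matches but do not define the metric. One common way to check proposed matches against a manually derived result is precision and recall; the Python sketch below assumes that proposals are chosen by a hypothetical score threshold and that a "true match" is a proposed pair that also appears in the manual result. The pairs and scores in the example are made up for illustration and are not the thesis's data.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (source attribute, target attribute)


def select(scores: Dict[Pair, float], threshold: float = 0.5) -> Set[Pair]:
    """Hypothetical selection rule: keep every pair whose score reaches the threshold."""
    return {pair for pair, score in scores.items() if score >= threshold}


def evaluate(proposed: Set[Pair], manual: Set[Pair]) -> Dict[str, float]:
    """Compare proposed matches with a manually derived reference result."""
    true_matches = proposed & manual
    precision = len(true_matches) / len(proposed) if proposed else 0.0
    recall = len(true_matches) / len(manual) if manual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"true_matches": len(true_matches), "precision": precision, "recall": recall, "f1": f1}


# Illustrative data only.
scores = {
    ("studentId", "pupilId"): 0.8,
    ("studentPhone", "telephone"): 0.4,
    ("studentName", "pupilId"): 0.6,
}
manual = {("studentId", "pupilId"), ("studentPhone", "telephone")}
print(evaluate(select(scores), manual))
```

With `threshold=0.4` the telephone pair is also proposed and recall rises to 1.0; in general a lower threshold also admits more false proposals.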
16
Experimental Evaluation: Efficiency
▫ Efficiency falls drastically as schema size increases
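One likely reason for the drop, assuming the matchers compare every source attribute with every target attribute, is that the number of comparisons grows with the product of the two schemas' attribute counts. The Python arithmetic below is a sketch: the 100-attribute bound comes from the small test schemas (10 tables of at most 10 attributes), while matching the 1360-attribute schema against a schema of equal size is an assumption, since the slides do not name its counterpart.

```python
def comparisons(src_attrs: int, tgt_attrs: int, matchers: int = 1) -> int:
    """Attribute-pair comparisons for an exhaustive match: every pair, every matcher."""
    return src_attrs * tgt_attrs * matchers


small = comparisons(100, 100)    # at most 10 tables x 10 attributes per small schema
large = comparisons(1360, 1360)  # assumption: matched against a schema of equal size
print(small, large, large / small)  # 10000 1849600 184.96
```

That is roughly 185 times as many comparisons per matcher, before any instance-level matchers add cost that grows with the number of rows they profile.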
17
Conclusion
▫ A basic framework for schema matching is proposed
▫ Matching functions are performed independently for higher scalability, so additional algorithms can be integrated easily
▫ Efficiency needs to be improved by deploying hybrid matching algorithms
▫ A wider variety of algorithms is required to assess similarity from different viewpoints and increase accuracy
18
END