Minor Thesis A scalable schema matching framework for relational databases Student: Ahmed Saimon Adam ID: 110022478 Award: MSc (Computer & Information.


1 Minor Thesis
A scalable schema matching framework for relational databases
Student: Ahmed Saimon Adam
ID: 110022478
Award: MSc (Computer & Information Science)
Date: 17th September 2010
Supervisor: Dr. Jixue Liu

2 Field of thesis
Schema matching
Relational database integration

3 INTRODUCTION
What is a database schema?
▫ Structure of a database that describes how its concepts, their relationships and constraints are arranged
What is schema matching?
▫ Process of identifying semantic correspondences between elements of database schemas

4 INTRODUCTION What is Schema matching?

5 Schema matching applications
▫ Critical task in any data sharing process
▫ Data warehousing
  ▪ Consolidation of multiple transaction processing databases
▫ Database integration processes
  ▪ E.g. two companies merge and integrate their employee, inventory and financial databases
▫ Cooperation between government agencies and various institutions
  ▪ E.g. police and transport departments, immigration and universities

6 Importance of the research
Currently done manually and semi-automatically
Doing it manually: tedious, error-prone, costly
No fully automatic system available
  ▪ Existing systems require user interaction
Semantic query processing, mobile web, e-commerce collaboration in enterprises
Demand for more scalable, accurate and efficient schema matching technology is increasing

7 Research objectives
Propose a framework that
▫ adopts a scalable architecture
▫ offers a library of schema matching algorithms that exploit various information for better accuracy
▫ is independent of any specific application domain

8 Methodology
Build a framework by adopting a composite architecture
Create a library of matchers at different levels
Build a prototype and evaluate it empirically for accuracy, scalability and efficiency

9 Schema Matching Architecture
Input
▫ Represented in SQL DDL format
…..
CREATE TABLE StudentDB.Student(
    studentId INT,
    studentName VARCHAR(100),
    studentPhone VARCHAR(50),
    PRIMARY KEY (studentId)
);
…..

10 Schema Matching Architecture
Input
▫ Currently supports versions after Oracle9 and SQL Server 2000
  ▪ Uses a data type conversion table if the schemas come from different DBMSs
▫ Input processor extracts schema information
  ▪ E.g. element names, data types, keys
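The input step can be illustrated with a minimal sketch. It is not the thesis's input processor (the prototype is written in C#.NET and its code is not shown); the function name `parse_create_table`, the regular expressions and the entries of `TYPE_MAP` are all assumptions made for illustration. It handles only a single simple CREATE TABLE statement of the shape shown on the input slide, with a comma before the PRIMARY KEY clause.

```python
import re

# Hypothetical conversion table mapping DBMS-specific type names to a common
# vocabulary. The entries are illustrative, not the thesis's actual table.
TYPE_MAP = {
    "VARCHAR2": "VARCHAR",   # Oracle
    "NVARCHAR": "VARCHAR",   # SQL Server
    "NUMBER": "NUMERIC",     # Oracle
    "INT": "INTEGER",
}

def parse_create_table(ddl):
    """Extract table name, columns (name, type, length) and primary key."""
    m = re.search(r"CREATE\s+TABLE\s+([\w.]+)\s*\((.*)\)\s*;", ddl, re.S | re.I)
    if m is None:
        raise ValueError("no CREATE TABLE statement found")
    table, body = m.group(1), m.group(2)
    columns, primary_key = [], []
    # Split on commas that are not inside parentheses, e.g. VARCHAR(100).
    for part in re.split(r",(?![^()]*\))", body):
        part = part.strip()
        pk = re.match(r"PRIMARY\s+KEY\s*\(([^)]*)\)", part, re.I)
        if pk:
            primary_key = [c.strip() for c in pk.group(1).split(",")]
            continue
        col = re.match(r"(\w+)\s+(\w+)(?:\((\d+)\))?", part)
        if col:
            name, dtype, length = col.groups()
            dtype = TYPE_MAP.get(dtype.upper(), dtype.upper())
            columns.append((name, dtype, int(length) if length else None))
    return {"table": table, "columns": columns, "primary_key": primary_key}

ddl = """CREATE TABLE StudentDB.Student(
    studentId INT, studentName VARCHAR(100),
    studentPhone VARCHAR(50), PRIMARY KEY (studentId) );"""
info = parse_create_table(ddl)
```

Normalizing type names at parse time means the later matchers can compare data types directly, whichever DBMS the schema came from.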

11 Schema Matching Architecture
Process (schema matching)
▫ Implements multiple matching algorithms (matchers)
Schema level
▫ Element name similarity algorithms
  ▪ Prefix, suffix, n-gram
  ▪ Tech = Technology (prefix matching)
  ▪ Phone = telephone (suffix matching)
  ▪ Context: con, ont, nte, tex, ext (n-gram)
▫ Structural similarities
  ▪ Data type, field length etc.
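The three name matchers can be sketched as follows. The slides give only the examples, not the scoring rules, so the all-or-nothing prefix and suffix scores and the Dice coefficient used for n-grams are assumptions, as are the function names.

```python
def prefix_similarity(a, b):
    """1.0 if the shorter name is a prefix of the longer one, else 0.0."""
    short, long_ = sorted((a.lower(), b.lower()), key=len)
    return 1.0 if short and long_.startswith(short) else 0.0

def suffix_similarity(a, b):
    """1.0 if the shorter name is a suffix of the longer one, else 0.0."""
    short, long_ = sorted((a.lower(), b.lower()), key=len)
    return 1.0 if short and long_.endswith(short) else 0.0

def ngrams(name, n=3):
    """Set of character n-grams of a name, case-insensitive."""
    name = name.lower()
    return {name[i:i + n] for i in range(len(name) - n + 1)}

def ngram_similarity(a, b, n=3):
    """Dice coefficient over n-gram sets (an assumed scoring rule)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))

tech = prefix_similarity("Tech", "Technology")    # slide example: prefix match
phone = suffix_similarity("Phone", "telephone")   # slide example: suffix match
grams = ngrams("Context")                         # con, ont, nte, tex, ext
```

Every function returns a value between 0 and 1, so the scores can be combined with those of other matchers without further scaling.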

12 Schema Matching Architecture
Instance level
▫ Statistical data
  ▪ E.g. range, % alphanumeric characters, statistical properties (e.g. mean, std. dev.), distinct values
▫ Discovering complex correspondences
  ▪ Mining actual values
  ▪ Matching different data types (gender: M, F = 1, 2)
  ▪ Ambiguity issues: Jaguar (car or animal)?
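A minimal sketch of how such per-column statistics might be collected is given below. The function `column_profile` and the exact set of statistics are assumptions; the slides list the kinds of statistics but not how they are computed or compared.

```python
import statistics

def column_profile(values):
    """Summarize a column's instance data for comparison with other columns."""
    chars = "".join(str(v) for v in values)
    profile = {
        "distinct": len(set(values)),
        "pct_alnum": 100.0 * sum(c.isalnum() for c in chars) / len(chars) if chars else 0.0,
    }
    numbers = [v for v in values
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    # Range, mean and standard deviation only make sense for numeric columns.
    if numbers and len(numbers) == len(values):
        profile["min"] = min(numbers)
        profile["max"] = max(numbers)
        profile["mean"] = statistics.mean(numbers)
        profile["stdev"] = statistics.pstdev(numbers)
    return profile

gender_codes = column_profile([1, 2, 1, 2])
gender_letters = column_profile(["M", "F", "M", "F"])
```

In the gender example both columns have two distinct values even though their data types differ, which is the kind of hint an instance-level matcher can use.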

13 Schema Matching Architecture
Output
▫ Similarity score between attributes obtained from each matching algorithm
  ▪ All scores normalized to the range 0 to 1
▫ Match results stored in a similarity cube
  ▪ Attribute-level, table-level and schema-level similarities can be generated
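One way to picture the similarity cube is as a matcher × source attribute × target attribute array. The sketch below assumes that layout, and it also assumes averaging across matchers and a best-match average for the table level; the slides do not state the aggregation rules. The two toy matchers are hypothetical stand-ins for the real library.

```python
def exact_name(a, b):
    """Toy matcher: 1.0 when the names are equal ignoring case."""
    return 1.0 if a.lower() == b.lower() else 0.0

def same_first_letter(a, b):
    """Toy matcher: 1.0 when the names start with the same letter."""
    return 1.0 if a[:1].lower() == b[:1].lower() else 0.0

def build_cube(source_attrs, target_attrs, matchers):
    """cube[m][i][j] = score of matcher m for source attr i vs target attr j."""
    return [[[min(1.0, max(0.0, matcher(s, t))) for t in target_attrs]
             for s in source_attrs]
            for matcher in matchers]

def attribute_similarity(cube):
    """Collapse the matcher axis by averaging (an assumed aggregation rule)."""
    n, rows, cols = len(cube), len(cube[0]), len(cube[0][0])
    return [[sum(cube[m][i][j] for m in range(n)) / n for j in range(cols)]
            for i in range(rows)]

def table_similarity(attr_matrix):
    """Average of each source attribute's best match (also an assumption)."""
    return sum(max(row) for row in attr_matrix) / len(attr_matrix)

cube = build_cube(["studentId", "studentName"], ["StudentID", "name"],
                  [exact_name, same_first_letter])
attr_level = attribute_similarity(cube)
table_level = table_similarity(attr_level)
```

Because each matcher fills its own layer of the cube independently, a new matcher can be added without touching the existing ones.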

14 Methodology Schema matching prototype in C#.NET

15 Experimental Evaluation
Accuracy
▫ Tested on 2 small schemas of 10 tables each, with 2-10 attributes per table
▫ Checked results against manually derived result
▫ Accuracy degrades as schema size increases
▫ 55-60% true matching
▫ Tested on a schema with 140 tables and 1360 attributes
  ▪ 20-40% true matching
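Checking matcher output against a manually derived result can be sketched as below. The slides do not define "true matching", so this sketch assumes it means the share of reported correspondences that also appear in the manual reference (precision), and it reports recall alongside. The attribute pairs are made up for illustration and are not the thesis's data.

```python
def evaluate(predicted, reference):
    """Compare predicted attribute pairs with a manually derived reference."""
    predicted, reference = set(predicted), set(reference)
    true_matches = predicted & reference
    precision = len(true_matches) / len(predicted) if predicted else 0.0
    recall = len(true_matches) / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical correspondences, each a (source attribute, target attribute) pair.
predicted = [("studentId", "id"), ("studentName", "name"),
             ("studentPhone", "fax"), ("dob", "name")]
reference = [("studentId", "id"), ("studentName", "name"),
             ("studentPhone", "telephone")]
precision, recall = evaluate(predicted, reference)
```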

16 Experimental Evaluation
Efficiency
Drastic fall in efficiency as schema size increases

17 Conclusion
A basic framework for schema matching is proposed
Matching functions are performed independently for higher scalability, so that additional algorithms can be integrated easily
Efficiency needs improvement by deploying hybrid matching algorithms
Requires a variety of algorithms to assess similarities from different views and increase accuracy

18 END

