Experience from Mapping Existing Models to the Transfer Schema Robert Kukla.

Introduction
Three test databases:
  – ITIS (plants part)
  – Berlin Model (mosses/higher plants)
  – Taxonomer (fishes)
Imported into MySQL
Java program to generate XML
Three main aspects:
  – Identifying concepts
  – Extracting relationships
  – Concept details
No CharacterCircumscription, SpecimenCircumscription
No hybrids, as the implications are not fully understood

ITIS
Integrated Taxonomic Information System
“authoritative” taxonomic information
Continuously evolving:
  – New records get added
  – Existing records get updated (!)
taxonomic units (97741 plants)
concepts
Most explored DB

ITIS - Identifying Concepts
ITIS’ own concepts (type = revision)
  – taxonomic unit
  – usage = “accepted”
Synonyms (type = referenced)
  – usage = “not accepted”
  – referenced from synonym table
Vernaculars (type = vernacular)
  – from vernacular table
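The rule on this slide can be sketched in Java as follows. This is a minimal illustration, not the original program: the class name ItisConceptTypes, the enum and the method names are assumptions, and only the usage values and type names come from the slide.

```java
import java.util.Locale;
import java.util.Optional;

class ItisConceptTypes {

    /** The three concept types named on the slide. */
    enum ConceptType { REVISION, REFERENCED, VERNACULAR }

    /**
     * Maps the usage value of a taxonomic unit to a concept type.
     * Returns an empty Optional for any value the slide does not mention.
     */
    static Optional<ConceptType> fromUsage(String usage) {
        if (usage == null) {
            return Optional.empty();
        }
        switch (usage.trim().toLowerCase(Locale.ROOT)) {
            case "accepted":
                return Optional.of(ConceptType.REVISION);
            case "not accepted":
                return Optional.of(ConceptType.REFERENCED);
            default:
                return Optional.empty();
        }
    }

    /** Every row of the vernacular table yields a vernacular concept. */
    static ConceptType fromVernacularRow() {
        return ConceptType.VERNACULAR;
    }

    public static void main(String[] args) {
        System.out.println(fromUsage("accepted"));
        System.out.println(fromUsage("not accepted"));
        System.out.println(fromVernacularRow());
    }
}
```

Returning an empty Optional for unexpected usage values keeps the decision about such rows with the caller, which seems safer than guessing a type.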

ITIS: Extracting Relationships
Concept Circumscription
  – parent_tsn field
Synonymy Relationships
  – Explicit synonyms
  – Vernaculars
Lineage Relationships
  – to concept of same name according to different publication
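Deriving a circumscription from the parent_tsn field amounts to grouping child records under their parent. The Java sketch below shows one way to do that in memory. Only parent_tsn is named on the slide; the tsn key field, the Unit class and the method name are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CircumscriptionBuilder {

    /** One taxonomic unit row, reduced to the two fields used here. */
    static final class Unit {
        final int tsn;            // assumed key column of the taxonomic unit
        final Integer parentTsn;  // parent_tsn; null when the row has no parent

        Unit(int tsn, Integer parentTsn) {
            this.tsn = tsn;
            this.parentTsn = parentTsn;
        }
    }

    /** Groups child identifiers under their parent, preserving input order. */
    static Map<Integer, List<Integer>> childrenByParent(List<Unit> units) {
        Map<Integer, List<Integer>> children = new LinkedHashMap<>();
        for (Unit unit : units) {
            if (unit.parentTsn == null) {
                continue; // top of the hierarchy: belongs to no parent
            }
            children.computeIfAbsent(unit.parentTsn, key -> new ArrayList<>())
                    .add(unit.tsn);
        }
        return children;
    }

    public static void main(String[] args) {
        List<Unit> units = Arrays.asList(
                new Unit(1, null), new Unit(2, 1), new Unit(3, 1), new Unit(4, 2));
        System.out.println(childrenByParent(units));
    }
}
```

Each map entry then corresponds to one circumscription: the key is the circumscribed concept and the list holds its member concepts.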

ITIS – concept details
Names:
  – up to 4 epithets (only 3 used) plus 4 category indicators, to be interpreted depending on rank
  – authorTeam from separate table
  – NameSimple calculated
Publications:
  – Multiple publications per taxon_unit
  – Not completely atomised - compromise
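The slide does not say how NameSimple was calculated. A plausible minimal version, assuming it is simply the non-empty epithets joined by single spaces, could look like this in Java. The category indicators are left out because their rank-dependent interpretation is not described, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class NameSimpleBuilder {

    /**
     * Joins the non-empty epithets with single spaces.
     * Null or blank epithets (unused name slots) are skipped.
     */
    static String nameSimple(String... epithets) {
        List<String> parts = new ArrayList<>();
        for (String epithet : epithets) {
            if (epithet != null && !epithet.trim().isEmpty()) {
                parts.add(epithet.trim());
            }
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        // Four name slots, of which only two are filled in this example.
        System.out.println(nameSimple("Quercus", "robur", null, ""));
    }
}
```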

Berlin Model - Mosses/(German Higher Plants)
Database of Taxonomic Concepts
  – Records will not change
  – Explicit concept relationships + (name-) synonymy
  – 24368 concepts
  – concepts

Berlin Model - Identifying Concepts
From table pTaxon

Taxonomer
Relational data model for managing information relevant to taxonomic research
Records get added; not changed
“Assertion” – mention of a taxonomic name in the taxonomic literature
“Protonym” – taxonomic name in the context of its first publication
Relationships between assertions
assertions – concepts

Taxonomer - Identifying Concepts
Concepts (type=referenced)
  – from table tbl_Assertions
  – ReliabilityID >= 4 (4 = revision, 5 = original/new combination)

Taxonomer – extracting relationships
ConceptCircumscription
  – ParentAssertionID
Relationships
  – Table not populated

Taxonomer – concept details
Number of fields in the database suggested a complexity that was not supported by the data (not all fields filled)
Atomised name difficult to recreate as only terminal epithet is stored – omitted it
Use of cheat fields for NameSimple
Large number of AccordingTo (>4000)
Publication data transferred 1:1

Technical Aspects
Database consistency, e.g.
  – getting all publication records
  – no relationships to non-existent concepts
Charset
  – assume windows-1252 code page
Slow!
  – indexes essential
  – fewer queries with big result sets are faster
Recursive approach is more suitable for wrapper
  – guarantees small, consistent subset
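As an illustration of the charset point, the Java sketch below decodes raw bytes under the assumed windows-1252 code page and re-encodes them as UTF-8 for XML output. The class and method names are hypothetical. The windows-1252 charset is not among the charsets every Java platform is required to support, although common JDKs ship it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LegacyTextDecoder {

    // Looked up once; throws UnsupportedCharsetException if the JVM lacks it.
    private static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    /** Interprets raw database bytes as windows-1252 text. */
    static String decode(byte[] raw) {
        return new String(raw, WINDOWS_1252);
    }

    /** Converts windows-1252 bytes to UTF-8 bytes, e.g. for writing XML. */
    static byte[] toUtf8(byte[] raw) {
        return decode(raw).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFC is "u with diaeresis" in windows-1252.
        byte[] raw = {'M', (byte) 0xFC, 'l', 'l', 'e', 'r'};
        System.out.println(decode(raw));
        System.out.println(toUtf8(raw).length + " bytes as UTF-8");
    }
}
```

Decoding explicitly, rather than relying on the platform default charset, keeps the output identical across machines.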

Mapping software
Universal transformation software to convert relational data to XML (XMlizer)
  – Often GUI based; filling in a skeleton XML file
  – Relate a single query (table or join) to collection of XML nodes
  – Map fields from that query to attributes or child elements of the XML node
Problems
  – No mechanism to use multiple sources (queries) for one XML node
  – No conditional transformation
  – No splitting of fields
  – Limited merging of fields
Write our own universal mapping software
  – addresses first 2 problems
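The basic mapping idea, one query row becoming one XML node whose fields land in attributes or child elements, can be sketched with the standard Java DOM API. This is not the software described on the slide: the class RowToXmlMapper, the FieldMapping type and the element and column names in the example are all assumptions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class RowToXmlMapper {

    /** Says how one column of the query result is written to the XML node. */
    static final class FieldMapping {
        final String column;       // column name in the query result
        final String xmlName;      // attribute or element name in the output
        final boolean asAttribute; // true: attribute, false: child element

        FieldMapping(String column, String xmlName, boolean asAttribute) {
            this.column = column;
            this.xmlName = xmlName;
            this.asAttribute = asAttribute;
        }
    }

    /** Creates an empty DOM document to own the generated nodes. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No XML parser available", e);
        }
    }

    /** Builds one XML node from one row; columns without a value are skipped. */
    static Element mapRow(Document doc, String nodeName,
                          Map<String, String> row, List<FieldMapping> mappings) {
        Element node = doc.createElement(nodeName);
        for (FieldMapping mapping : mappings) {
            String value = row.get(mapping.column);
            if (value == null) {
                continue; // missing or NULL column: emit nothing
            }
            if (mapping.asAttribute) {
                node.setAttribute(mapping.xmlName, value);
            } else {
                Element child = doc.createElement(mapping.xmlName);
                child.setTextContent(value);
                node.appendChild(child);
            }
        }
        return node;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("tsn", "42");
        row.put("name", "Quercus robur");
        Element node = mapRow(newDocument(), "TaxonConcept", row, Arrays.asList(
                new FieldMapping("tsn", "id", true),
                new FieldMapping("name", "Name", false)));
        System.out.println(node.getAttribute("id") + " / " + node.getTextContent());
    }
}
```

Using several sources for one node would mean calling a routine like mapRow with rows from different queries and merging the results, and conditional transformation would need a predicate per mapping; both are left out of this sketch.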

Conclusion
Conversion of legacy data is possible but
  – information missing
  – information will be lost
Data in original DB is open to interpretation, so an expert should be consulted
Required computing resources should not be underestimated