Experience from Mapping Existing Models to the Transfer Schema Robert Kukla.


1 Experience from Mapping Existing Models to the Transfer Schema Robert Kukla

2 Introduction
Three test databases:
– ITIS (plants part)
– Berlin Model (mosses/higher plants)
– Taxonomer (fishes)
Imported into MySQL
Java program to generate XML
Three main aspects:
– Identifying concepts
– Extracting relationships
– Concept details
No CharacterCircumscription, SpecimenCircumscription
No hybrids as implications are not fully understood

3 ITIS
Integrated Taxonomic Information System
“authoritative” taxonomic information
Continuously evolving:
– New records get added
– Existing records get updated (!)
331886 taxonomic units (97741 plants) - 206649 concepts
Most explored DB

4 ITIS - Identifying Concepts
ITIS’ own concepts (type = revision)
– taxonomic unit
– usage = “accepted”
Synonyms (type = referenced)
– usage = “not accepted”
– referenced from synonym table
Vernaculars (type = vernacular)
– from vernacular table
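
The rule on this slide can be sketched in Java, the language the deck names for its XML generator. This is only an illustration: the class and method names are hypothetical, and the only usage values handled are the two quoted on the slide.

```java
// Hypothetical helper; names are illustrative and not taken from the
// original converter or from the transfer schema.
final class ItisConceptType {
    private ItisConceptType() {}

    // Concept type for a row of the taxonomic unit table, chosen by its usage value.
    static String forTaxonomicUnit(String usage) {
        if ("accepted".equals(usage)) {
            return "revision";      // ITIS' own concept
        }
        if ("not accepted".equals(usage)) {
            return "referenced";    // synonym, linked through the synonym table
        }
        // Assumption: any other usage value is reported rather than guessed at.
        throw new IllegalArgumentException("Unhandled usage value: " + usage);
    }

    // Rows taken from the vernacular table always become vernacular concepts.
    static String forVernacular() {
        return "vernacular";
    }
}
```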

5 ITIS: Extracting Relationships
Concept Circumscription
– parent_tsn field
Synonymy Relationships
– Explicit synonyms
– Vernaculars
Lineage Relationships
– to concept of same name according to different publication
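
As a rough illustration of deriving circumscriptions from the parent_tsn field, the sketch below groups child identifiers under their parent. It works on in-memory pairs instead of a database, and it assumes that a parent value of 0 means "no parent"; the slides do not state either detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: groups {tsn, parentTsn} pairs into parent -> children lists.
final class CircumscriptionSketch {
    private CircumscriptionSketch() {}

    static Map<Integer, List<Integer>> childrenByParent(int[][] rows) {
        Map<Integer, List<Integer>> result = new TreeMap<>();
        for (int[] row : rows) {
            int tsn = row[0];
            int parentTsn = row[1];
            if (parentTsn == 0) {
                continue; // assumption: 0 marks a root with no parent
            }
            result.computeIfAbsent(parentTsn, k -> new ArrayList<>()).add(tsn);
        }
        return result;
    }
}
```

Each map entry then corresponds to one circumscription: the key is the parent concept and the list holds the concepts it includes.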

6 ITIS – concept details
Names:
– up to 4 epithets (only 3 used) plus 4 category indicators to be interpreted depending on rank
– authorTeam from separate table
– NameSimple calculated
Publications:
– Multiple publications per taxon_unit
– Not completely atomised - compromise
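
One plausible way to calculate NameSimple is to join the non-empty indicator and epithet fields in order. The sketch below assumes exactly that and does not model the rank-dependent interpretation of the indicators mentioned on the slide; the method name and the argument order are hypothetical.

```java
// Hypothetical sketch: joins non-blank name parts with single spaces.
final class NameSimpleSketch {
    private NameSimpleSketch() {}

    // parts: indicator and epithet values in display order; null or blank parts are skipped.
    static String nameSimple(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            if (part == null) {
                continue;
            }
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(trimmed);
        }
        return sb.toString();
    }
}
```

For example, `nameSimple(null, "Abies", null, "concolor", "var.", "lowiana")` yields `Abies concolor var. lowiana`.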

7 Berlin Model - Mosses/(German Higher Plants)
Database of Taxonomic Concepts
– Records will not change
– Explicit concept relationships + (name-) synonymy
– 24368 concepts – 24368 concepts

8 Berlin Model - Identifying Concepts From table pTaxon

9 Taxonomer
Relational data model for managing information relevant to taxonomic research
Records get added; not changed
“Assertion” – mention of a taxonomic name in the taxonomic literature
“Protonym” – taxonomic name in the context of its first publication
Relationships between assertions
36305 assertions – 14971 concepts

10 Taxonomer - Identifying Concepts
Concepts (type=referenced)
– from table tbl_Assertions
– ReliabilityID >= 4 (4-revision, 5 original/new combination)

11 Taxonomer – extracting relationships
ConceptCircumscription
– ParentAssertionID
Relationships
– Table not populated

12 Taxonomer – concept details
Number of fields in the database suggested a complexity that was not supported by the data (not all fields filled)
Atomised name difficult to recreate as only terminal epithet is stored – omitted it
Use of cheat fields for NameSimple
Large number of AccordingTo (>4000)
Publication data transferred 1:1

13 Technical Aspects
Database consistency e.g.
– getting all publication records
– no relationships to non-existent concepts
Charset
– assume windows-1252 code page
Slow!
– indexes essential
– fewer queries with big result sets faster
Recursive approach is more suitable for wrapper
– guarantees small, consistent subset
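
The recursive approach can be pictured as follows: start from one concept and collect it together with everything reachable below it, so that every relationship emitted later points at a concept that is part of the output. This is a minimal sketch over an in-memory parent-to-children map, not the original wrapper code; all names are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: collects a root concept and all of its descendants.
final class SubsetSketch {
    private SubsetSketch() {}

    static Set<Integer> collect(int root, Map<Integer, List<Integer>> children) {
        Set<Integer> seen = new LinkedHashSet<>();
        visit(root, children, seen);
        return seen;
    }

    private static void visit(int id, Map<Integer, List<Integer>> children, Set<Integer> seen) {
        if (!seen.add(id)) {
            return; // already collected; also guards against cyclic data
        }
        for (int child : children.getOrDefault(id, Collections.emptyList())) {
            visit(child, children, seen);
        }
    }
}
```

Concepts outside the collected set are never visited, which keeps the subset small, and restricting output to relationships between collected identifiers avoids references to concepts that were not exported.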

14 Mapping software
Universal transformation software to convert relational data to XML (XMlizer)
– Often GUI based; filling in a skeleton XML file
– Relate a single query (table or join) to collection of XML nodes
– Map fields from that query to attributes or child elements of the XML node
Problems
– No mechanism to use multiple sources (queries) for one
– No conditional transformation
– No splitting of fields
– Limited merging of fields
Write our own universal mapping software
– addresses first 2 problems
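
The basic mapping step described above, turning the fields of one query row into attributes or child elements of an XML node, might look like the sketch below. The element and field names in the usage example are hypothetical, rows are plain maps instead of JDBC result sets, and skipping null values is only a very simple stand-in for conditional transformation.

```java
import java.util.Map;

// Hypothetical sketch: renders one row as an XML element string.
final class RowToXmlSketch {
    private RowToXmlSketch() {}

    // Escapes the characters that are unsafe in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // attributes and children map XML names to field values; null values are skipped.
    static String toElement(String name, Map<String, String> attributes, Map<String, String> children) {
        StringBuilder sb = new StringBuilder("<").append(name);
        for (Map.Entry<String, String> a : attributes.entrySet()) {
            if (a.getValue() != null) {
                sb.append(' ').append(a.getKey()).append("=\"").append(escape(a.getValue())).append('"');
            }
        }
        sb.append('>');
        for (Map.Entry<String, String> c : children.entrySet()) {
            if (c.getValue() != null) {
                sb.append('<').append(c.getKey()).append('>')
                  .append(escape(c.getValue()))
                  .append("</").append(c.getKey()).append('>');
            }
        }
        return sb.append("</").append(name).append('>').toString();
    }
}
```

Passing insertion-ordered maps such as `LinkedHashMap` keeps the output order stable. Because the two maps can be filled from different queries before the call, the same shape also hints at how several sources could feed one node.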

15 Conclusion
Conversion of legacy data is possible but
– information missing
– information will be lost
Data in original DB is open to interpretation so expert should be consulted
Required computing resources should not be underestimated

