The Prometheus Database Project
BBSRC: Biotechnology and Biological Sciences Research Council


Prometheus I: A Taxonomic Database (POET OODB)
A novel object-based representation of multiple overlapping taxonomic hierarchies, based on specimen circumscription.
Jessie Kennedy, Cédric Raguenaud, Mark Watson, Martin Pullan, Mark Newman, Peter Barclay
• PI refactor (Oracle RDB): JK, Gordon Russell, Andrew Cumming
• PI Visualisation Tools: JK, Martin Graham

Prometheus II: Capturing Botanical Descriptions for Taxonomy (Oracle RDB)
A novel model for composing and recording taxonomic descriptions, using an ontology of defined terms.
JK, PB, GR, CR, Trevor Paterson, MP, MW, Sarah McDonald, Kate Armstrong
• PII Visualisation and Data Entry Tools: JK, MW, TP, Alan Cannon

Problems with Taxonomic Characters
• No formal methodology or model for recognizing and describing Characters
• Language (terminology) used in descriptions is ill-defined (natural language based)
• A taxonomic revision is often the work of one individual and can be highly idiosyncratic (lacking formalism, using personal terminology)
• What IS a character? A state? A property? A property + state? A property + potential states?

Consequences
• Only Characters of interest (to a given revision) are recorded, and raw data (the proforma) is often discarded
• Subsequent taxonomists cannot unambiguously interpret and reuse data
• Character data is not easily compared between projects, as definitions etc. are not captured
• Previous work is often repeated (there is no culture of data reuse)

Taxonomy could be assisted by promoting and providing a methodology for better Data Integration
• Use standardized, defined ‘terms’ to record character descriptions
• Produce a standard conceptual model for the composition of character descriptions
• Encourage the scoring of ‘quantitative characters’ (discourage ‘qualitative characters’)
• Store description data in electronic/database form according to an agreed global schema
A standard data model and terminology will facilitate meaningful and unambiguous comparisons between character descriptions.

Approach
• Model the process of data capture for character descriptions (to form a data model and database schema)
• Develop an ‘ontology’ of ‘defined terms’ to use in character descriptions
• Provide a database and interface for creating the ontology (terms, definitions, relationships between terms)
• Provide a database and interface for recording specimen character descriptions (automatic interface generation from a description ontology)

Taxonomic Data Transfer Standard
XML Standard Schema for exchange of Taxonomic Data between various providers/models
Jessie Kennedy & Robert Kukla
SEEK/GBIF/TDWG

The Prometheus II Character Model
(1) Representing Characters as Atomic Statements: Description Elements
• Numerical Properties: angle, diameter, length, width, density, height, number (count)
• Qualitative Properties: lifecycle, shape, sex, symmetry, orientation, texture, colour
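One way to read "atomic statement" is as a single record that ties one structure to one property and one score. The sketch below is a hypothetical illustration, not the Prometheus II schema: the class and field names (`DescriptionElement`, `structure_path`, `property_name`, `score`, `unit`) and the validation rule are assumptions; only the property names are taken from the slide.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# Property names as listed on the slide.
NUMERICAL_PROPERTIES = {"angle", "diameter", "length", "width",
                        "density", "height", "number"}
QUALITATIVE_PROPERTIES = {"lifecycle", "shape", "sex", "symmetry",
                          "orientation", "texture", "colour"}


@dataclass(frozen=True)
class DescriptionElement:
    """Hypothetical atomic statement: one structure, one property, one score."""
    structure_path: Tuple[str, ...]   # assumed: the structure being described
    property_name: str
    score: Union[int, float, str]     # a number, or the name of a state
    unit: Optional[str] = None        # assumed: only meaningful for numbers

    def __post_init__(self):
        is_number = (isinstance(self.score, (int, float))
                     and not isinstance(self.score, bool))
        if self.property_name in NUMERICAL_PROPERTIES:
            if not is_number:
                raise ValueError("numerical property needs a numeric score")
        elif self.property_name in QUALITATIVE_PROPERTIES:
            if not isinstance(self.score, str):
                raise ValueError("qualitative property needs a state name")
        else:
            raise ValueError(f"unknown property: {self.property_name}")


leaf_length = DescriptionElement(("leaf",), "length", 12.5, "mm")
leaf_shape = DescriptionElement(("leaf",), "shape", "obovate")
```

Keeping each statement to a single property makes two descriptions comparable element by element, which is the kind of comparison the slides argue free-text characters do not allow.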

(2) Increasing the flexibility of Description Scores with Modifiers

(3) Creating a defined description terminology to Provide Consistency and Comparability

Defined Terms
A Defined Term has a:
• TERM: leaf
• DEFINITION: big flappy thing
• AUTHOR: Kennedy
• CITATION: ‘Oor Wullie Annual’
• (ID in Database)
A Defined Term might be a:
• STRUCTURE TERM: leaf, hair, apex...
• PROPERTY TERM: length, shape...
• STATE TERM: tomentose, obovate...
• MODIFIER TERM: before, more than...
• UNIT TERM
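The defined-term record above could be sketched as a small data type. This is an illustrative assumption, not the project's database schema: the names `TermKind`, `DefinedTerm`, `term_id` and `same_concept` are hypothetical, and the second "leaf" entry is invented purely to show why the ID, rather than the word, identifies a term.

```python
from dataclasses import dataclass
from enum import Enum


class TermKind(Enum):
    """The five kinds of defined term listed on the slide."""
    STRUCTURE = "structure"
    PROPERTY = "property"
    STATE = "state"
    MODIFIER = "modifier"
    UNIT = "unit"


@dataclass(frozen=True)
class DefinedTerm:
    term_id: int        # the "ID in Database"
    kind: TermKind
    term: str
    definition: str
    author: str
    citation: str


def same_concept(a: DefinedTerm, b: DefinedTerm) -> bool:
    # Assumption: two records denote the same concept only if they share an ID,
    # so one word with two definitions is kept as two distinct terms.
    return a.term_id == b.term_id


leaf = DefinedTerm(1, TermKind.STRUCTURE, "leaf",
                   "big flappy thing", "Kennedy", "Oor Wullie Annual")
# Hypothetical second definition of the same word, by a different author.
other_leaf = DefinedTerm(2, TermKind.STRUCTURE, "leaf",
                         "a made-up alternative definition", "Someone Else",
                         "a made-up citation")
```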

The PartOf relationship in the ontology specifies all the possible compositional relationships between anatomical structures
• A given structure can potentially be PartOf a number of Parent Structures
• PartOf forms a directed acyclic graph
• It can be materialized as a tree hierarchy (by duplicating structures with more than one potential parent)

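Ontology: Structural Hierarchy
1. Define PartOf Relations: B PartOf A; C PartOf A; E PartOf A; D PartOf B; E PartOf B; D PartOf C
[Figure: the relations drawn as a tree rooted at A, with D and E duplicated under each of their parents]
3. Nodes defined by Materialized Paths: A, BA, DBA, EBA, CA, DCA, EA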
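The step from PartOf relations to materialized paths can be sketched as a depth-first walk from the root. This is a minimal illustration using the slide's example relations; the function name `materialize_paths` and the tuple representation are assumptions. Paths are written child-first, so `("D", "B", "A")` corresponds to the slide's "DBA".

```python
from collections import defaultdict


def materialize_paths(part_of, root):
    """Expand (child, parent) PartOf pairs into one path per tree node.

    A structure with several parents appears once under each parent,
    which is how the acyclic graph becomes a tree.
    """
    children = defaultdict(list)
    for child, parent in part_of:
        children[parent].append(child)

    paths = []

    def walk(node, path):
        paths.append(path)
        for child in children[node]:
            walk(child, (child,) + path)

    walk(root, (root,))
    return paths


# The PartOf relations from the slide.
PART_OF = [("B", "A"), ("C", "A"), ("E", "A"),
           ("D", "B"), ("E", "B"), ("D", "C")]

paths = materialize_paths(PART_OF, "A")
labels = ["".join(p) for p in paths]
# labels == ["A", "BA", "DBA", "EBA", "CA", "DCA", "EA"]
```

D and E each occur more than once in the result, but every occurrence has a distinct path, so each node in the tree remains uniquely identifiable.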

Specifying an exact structural context from the optional Part_Of hierarchy
[Figure: example structures shown: Androecium, Column, Flower, Floret, Inflorescence]

On a project-by-project basis a filtered version of the Ontology can be specified (a ‘Proforma’ Ontology):
• where only the structures of interest for a given project are included
• and ‘generic structures’ and ‘regions’ are explicitly added to the compositional tree
• This then represents a description template or ‘Proforma’ used for a particular project
• Therefore the actual structural ‘Paths’ used vary from project to project
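The filtering described above can be sketched as two steps over materialized paths: keep only paths built entirely from selected structures, then attach regions and generic structures where the project wants them. The function name `build_proforma`, its parameters and the example selection are assumptions for illustration; they are not taken from the Prometheus tools.

```python
def build_proforma(ontology_paths, selected, additions):
    """Return the paths of a project-level 'proforma' ontology.

    ontology_paths: child-first tuples, e.g. ("D", "B", "A")
    selected:       structure names the project is interested in
    additions:      maps a kept path to regions/generic structures to add under it
    """
    # Keep a path only if every structure along it was selected.
    kept = [p for p in ontology_paths if all(s in selected for s in p)]

    # Explicitly add regions and generic structures under chosen nodes.
    for parent_path, names in additions.items():
        if parent_path not in kept:
            raise ValueError(f"cannot add under an excluded path: {parent_path}")
        kept.extend((name,) + parent_path for name in names)
    return kept


ONTOLOGY_PATHS = [("A",), ("B", "A"), ("D", "B", "A"), ("E", "B", "A"),
                  ("C", "A"), ("D", "C", "A"), ("E", "A")]

proforma = build_proforma(
    ONTOLOGY_PATHS,
    selected={"A", "B", "D"},
    additions={("B", "A"): ["apex", "base"],      # regions
               ("D", "B", "A"): ["hair"]},        # a generic structure
)
```

Because C was not selected, the path D-C-A drops out while D-B-A stays, which illustrates why the paths in use differ from project to project.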

Creating a Proforma Ontology
[Figure: the full ONTOLOGY tree is filtered to a smaller PROFORMA ONTOLOGY tree. Generic Structures (spine, hair) and Regions (lower surface, upper surface, apex, base, centre) are available to add to it.]

Representing multiple copies of Structures
1. When finalising the project-level Ontology
• Structure B (Leaf) is cloned
• The path of the Leaf structures B, D and E in the proforma ontology has to include its ‘clone’ identity, i.e. B#1 or B#2
2. When scoring specimen data
• We might want to record data for multiple instances of each Leaf, and include an ‘instance’ identity, e.g. B#1: instance 1, 2, 3...
[Figure: proforma ontology in which the Leaf subtree (B, D, E) appears twice, as ‘Leaf’ #1 [B#1] and ‘Leaf’ #2 [B#2]]
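Cloning and instances can be sketched separately: clone identity is written into the path itself, while instance identity is only attached when data is scored. The helper `clone_structure`, the `#n` string suffix and the `scores` dictionary layout are assumptions made for this illustration.

```python
def clone_structure(paths, structure, copies):
    """Duplicate every path through `structure`, tagging it #1..#copies."""
    result = []
    for path in paths:
        if structure in path:
            for n in range(1, copies + 1):
                result.append(tuple(f"{s}#{n}" if s == structure else s
                                    for s in path))
        else:
            result.append(path)
    return result


PROFORMA_PATHS = [("A",), ("B", "A"), ("D", "B", "A"),
                  ("E", "B", "A"), ("E", "A")]

cloned = clone_structure(PROFORMA_PATHS, "B", copies=2)

# When scoring, an instance number is recorded alongside the cloned path.
# Keys are (path, instance); values here are made-up lengths in mm.
scores = {
    (("B#1", "A"), 1): 41.0,
    (("B#1", "A"), 2): 38.5,
    (("B#2", "A"), 1): 12.0,
}
```

In this sketch the substructures D and E inherit the clone identity through their parent's entry in the path, so "D of Leaf #1" and "D of Leaf #2" stay distinct without renaming D.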

Ontology: Structural Types
• Botanists and Taxonomists frequently refer to structures as Types_Of another structure, e.g. berries and capsules are types of fruit
• The types share all identifying features of the supertype
• But they can be distinguished by possession of a collection of states that are always true, e.g. berries are always soft and fleshy, capsules dry and dehiscent
• For simplification we exclude types from the Part_Of hierarchy, representing them as an attribute of the parent structure, e.g. Structure: Fruit
• This might allow ‘automatic’ scoring of sets of states
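The idea of "automatic" scoring from a type could look something like the sketch below: choosing a type for a structure adds the states that always hold for that type. The table `TYPE_STATES` uses only the berry and capsule examples from the slide; the function names and the way observed states are merged are assumptions.

```python
# States that are always true for a type, keyed by structure then type.
# Only the slide's two examples are included.
TYPE_STATES = {
    "fruit": {
        "berry": {"soft", "fleshy"},
        "capsule": {"dry", "dehiscent"},
    },
}


def implied_states(structure, type_name):
    """Return the states implied by declaring `structure` to be of `type_name`."""
    try:
        return set(TYPE_STATES[structure][type_name])
    except KeyError:
        raise ValueError(f"{type_name!r} is not a known type of {structure!r}")


def score_with_type(structure, type_name, observed_states):
    """Combine observed states with those implied by the structure's type."""
    return implied_states(structure, type_name) | set(observed_states)


berry_states = score_with_type("fruit", "berry", {"red"})
```

Here the type is stored as an attribute of the fruit rather than as a separate node, matching the slide's choice to keep types out of the Part_Of hierarchy.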

A Demonstration Description Ontology
SCOPE:
• Angiosperms chosen, and limited to ‘classical’ anatomical structures and morphological characteristics
• Attempt to pick a taxon level where agreement on the terminology can be reached across users
• Hope that ontologies develop bottom-up and are adopted by an increasingly wide user community

A Demonstration Description Ontology
STRUCTURES:
• >1000 defined terms (term + definition + citation)
• 24 Regions, 46 Generic Structures and 269 Structures (of which 126 are defined as Types)
• 160 optional Part Of relationships (only 19 Structure Terms currently described as potentially part of more than one superstructure)
• Each of the 536 structure nodes in the tree is identifiable by its path; of these 331 are leaf nodes

A Demonstration Description Ontology
QUALITATIVE STATES & PROPERTIES:
• Taxonomists could not readily assign many qualitative states to a ‘qualitative property’ de novo
• They were however able to organize the states into ‘usage groups’
• These seem to circumscribe a hierarchical taxonomy of properties, where the structural context/usage may also contribute to the circumscription of the ‘property’
• State Terms are distributed between 72 State Groups/Properties (with between 2 and 79 members in each group)

The Project Definition Interface (Ontology → Proforma)
• Central organizing relationship for the ontology: the ‘PartOf’ hierarchy of structures. (Select desired structures, add necessary regions and generic structures.)
• Properties applicable to selected structure. (Expand to show states available.)
• Modifiers for the scored property. (Spatial, relational etc.)

A Completed Proforma Interface

The Specimen Scoring Interface (Data Entry)

Taxonomic Data Transfer Standard
XML Standard Schema for exchange of Taxonomic Data between various providers/models
Jessie Kennedy & Robert Kukla
SEEK/GBIF/TDWG

TaxonConcepts

Descriptions

DescriptionElements

Modifiers

Summary: In order to facilitate and promote data integration, comparability and reuse
(1) we propose:
• a novel, flexible model for representing taxonomic character descriptions
• a format for capturing defined terminology specifications (as a simple ontology)
(2) we provide:
• a tool for specifying description ontologies
• a demonstration ontology for the description of angiosperms
• a tool which automatically uses an ontology to specify project description templates (proformas) and facilitates recording of specimen descriptions in terms of the ontology

Ongoing/Future Work:
• User testing of tools and system: creating demonstration proformas, collecting sample specimen data
• Create new ontologies for other taxa; investigate whether ontologies can be shared or extended amongst users
• Integrate Prometheus II descriptions into the Prometheus I representation of taxonomic hierarchies
• Can we represent the Prometheus model in SDD?

Jessie Kennedy, Cédric Raguenaud, Mark Watson, Martin Pullan, Mark Newman, Peter Barclay, Martin Graham, Gordon Russell, Andrew Cumming, Sarah MacDonald, Kate Armstrong, Alan Cannon, Robert Kukla, Trevor Paterson