Representing taxonomy: MarBEF-IODE workshop, Oostende, 19-23 March 2007


Philosophy
- Structure has to be as simple as possible
- But not any simpler!
- Alternatives to represent classification and hierarchy
- Alternatives to represent synonymy

Hierarchy: flat table
- Every rank in the hierarchy is represented by a field in the table
- Simplest solution
  - Easy to create
  - Easy to query
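As a minimal sketch of the flat-table layout, the example below uses Python's built-in `sqlite3` module. The table name `flat_taxa`, the choice of ranks and the sample rows are illustrative assumptions, not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One column per rank: the flat-table layout (table and column names are illustrative).
conn.execute("""
    CREATE TABLE flat_taxa (
        kingdom TEXT, phylum TEXT, family TEXT, genus TEXT, species TEXT
    )
""")
conn.executemany(
    "INSERT INTO flat_taxa VALUES (?, ?, ?, ?, ?)",
    [
        ("Animalia", "Mollusca", "Semelidae", "Abra", "Abra alba"),
        ("Animalia", "Mollusca", "Semelidae", "Abra", "Abra nitida"),
        ("Animalia", "Echinodermata", "Asteriidae", "Asterias", "Asterias rubens"),
    ],
)

# Querying is a plain filter on the column of the rank of interest.
mollusc_species = [
    row[0]
    for row in conn.execute(
        "SELECT species FROM flat_taxa WHERE phylum = ? ORDER BY species",
        ("Mollusca",),
    )
]
print(mollusc_species)
```

Note that the higher ranks are repeated on every row, which is where the maintenance problems of this layout come from.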

Hierarchy: flat table
- Problems
  - Not normalised!
  - Not a real problem if a quick-and-dirty solution is all that is needed
  - Difficult to maintain the hierarchy in the long term
  - ‘Standard’ problems with a non-normalised database: possibly conflicting information, inefficient storage…
  - Cf. MASDEA; too simple

Hierarchy: normalised tables
- Every rank is represented by a separate table
- Not very difficult to write a query to regenerate the flat table
- Every taxon can have additional information
  - Extra fields with description…
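A minimal sketch of one-table-per-rank, again with Python's `sqlite3`. Only three ranks are shown, and all table and column names are assumptions made for the illustration. The join at the end regenerates the flat table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per rank; each table points to its parent rank (names are illustrative).
conn.executescript("""
    CREATE TABLE phylum (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE family (id INTEGER PRIMARY KEY, name TEXT,
                         phylum_id INTEGER REFERENCES phylum(id));
    CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT,
                          family_id INTEGER REFERENCES family(id),
                          description TEXT);  -- extra information at this level
    INSERT INTO phylum VALUES (1, 'Mollusca');
    INSERT INTO family VALUES (10, 'Semelidae', 1);
    INSERT INTO species VALUES (100, 'Abra alba', 10, NULL);
    INSERT INTO species VALUES (101, 'Abra nitida', 10, NULL);
""")

# Regenerate the flat table by joining each rank to its parent.
flat_rows = conn.execute("""
    SELECT phylum.name, family.name, species.name
    FROM species
    JOIN family ON species.family_id = family.id
    JOIN phylum ON family.phylum_id = phylum.id
    ORDER BY species.name
""").fetchall()
print(flat_rows)
```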

Hierarchy: cascading tables

Hierarchy: normalised tables
- Advantages
  - Easy to maintain and query
  - Normalised; possible to add information at any level of the hierarchy
- Drawbacks
  - Ranks are hard-wired into the structure of the database
  - A new rank would require a change to the structure of the database
  - And probably to the user interface, web interface…
  - Number of tables
  - A lot of functionality is duplicated

Taxonomic reality
- Ranks used depend on the taxonomic group
  - Botany: mainly infra-specific; zoology: mainly at higher levels
  - Many of the ranks are only sparsely used
- Need for a more flexible system
- Much of the functionality is the same across all ranks
  - ‘Parent’, synonymy
  - Authority, description…

‘Open Hierarchy’
- Possible to define new ranks without having to rewrite the structure of the database
- All taxonomic names are stored in a field in a single table; other fields indicate parent and rank
- Many-to-one relation: a single parent, several descendants
  - Include the ID of the parent in the record of the descendant
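The single-table layout can be sketched as follows with Python's `sqlite3`. The column names `id`, `tname` and `parent_id` follow the OBIS queries later in this presentation; `rank_name` and the sample rows are assumptions for the illustration. The point of the sketch is that adding a taxon at a rank not used before needs only new rows, not a schema change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# All names in one table; parent and rank are ordinary fields.
conn.executescript("""
    CREATE TABLE tnames (
        id INTEGER PRIMARY KEY,
        tname TEXT NOT NULL,
        rank_name TEXT NOT NULL,                 -- illustrative; could be a rank_id
        parent_id INTEGER REFERENCES tnames(id)  -- NULL for the root
    );
    INSERT INTO tnames VALUES (1, 'Animalia', 'kingdom', NULL);
    INSERT INTO tnames VALUES (2, 'Mollusca', 'phylum', 1);
    INSERT INTO tnames VALUES (3, 'Semelidae', 'family', 2);
    INSERT INTO tnames VALUES (4, 'Abra', 'genus', 3);
    INSERT INTO tnames VALUES (5, 'Abra alba', 'species', 4);
""")

# Introduce a rank that was not used before ('class') with plain data changes.
conn.execute("INSERT INTO tnames VALUES (6, 'Bivalvia', 'class', 2)")
conn.execute("UPDATE tnames SET parent_id = 6 WHERE tname = 'Semelidae'")

parent_of_semelidae = conn.execute("""
    SELECT parent.tname
    FROM tnames AS child
    JOIN tnames AS parent ON child.parent_id = parent.id
    WHERE child.tname = 'Semelidae'
""").fetchone()[0]
print(parent_of_semelidae)
```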

Open Hierarchy
- Advantages
  - Completely normalised
  - Flexible
- Drawbacks
  - Difficult to query the classification
  - Queries of the type ‘all species of the Echinodermata’…
- Solutions
  - ‘Calculated field’
  - Programmatic (loop in a programming language)
  - Recursive query
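The programmatic option can be sketched as a loop in Python. The dictionary `taxa`, its ids and the shortened hierarchy (several ranks are skipped) are assumptions for the illustration; in practice the rows would come from the names table.

```python
from collections import defaultdict, deque

# id -> (name, parent id); a deliberately shortened, illustrative hierarchy.
taxa = {
    1: ("Animalia", None),
    2: ("Echinodermata", 1),
    3: ("Asteroidea", 2),
    4: ("Asterias rubens", 3),
    5: ("Mollusca", 1),
    6: ("Abra alba", 5),
}


def descendants(taxa, root_id):
    """Return the names of all taxa below root_id, level by level."""
    children = defaultdict(list)
    for taxon_id, (_, parent_id) in taxa.items():
        children[parent_id].append(taxon_id)

    found = []
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for child_id in children[current]:
            found.append(taxa[child_id][0])
            queue.append(child_id)
    return found


echinoderm_taxa = descendants(taxa, 2)
print(echinoderm_taxa)
```

Done against a database, a loop like this typically costs one query per level or per taxon, which is one reason to consider the stored path or a recursive query instead.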

Synonymy
- Every taxon can have several synonyms; in principle, only one valid name for any synonym
  - Many-to-one relation: one valid name, many synonymous names
  - Include the ID of the valid name in the record of the synonymous name
  - Other fields for the type of synonymy…
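A minimal sketch of the synonymy relation with Python's `sqlite3`. The column names `valid_id` and `synonymy_type`, the convention that a valid name points to itself, and the made-up placeholder names are all assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each name carries the id of its valid name (assumed convention: valid names
# reference themselves). The names below are invented placeholders, not real taxa.
conn.executescript("""
    CREATE TABLE tnames (
        id INTEGER PRIMARY KEY,
        tname TEXT NOT NULL,
        valid_id INTEGER REFERENCES tnames(id),
        synonymy_type TEXT
    );
    INSERT INTO tnames VALUES (1, 'Exemplum validum', 1, NULL);
    INSERT INTO tnames VALUES (2, 'Exemplum vetus', 1, 'subjective synonym');
    INSERT INTO tnames VALUES (3, 'Exemplum antiquum', 1, 'objective synonym');
""")


def valid_name(conn, name):
    """Resolve any name, valid or synonym, to its valid name."""
    row = conn.execute("""
        SELECT valid.tname
        FROM tnames AS given
        JOIN tnames AS valid ON given.valid_id = valid.id
        WHERE given.tname = ?
    """, (name,)).fetchone()
    return row[0] if row else None


synonyms = [
    row[0]
    for row in conn.execute(
        "SELECT tname FROM tnames WHERE valid_id = 1 AND id <> 1 ORDER BY tname"
    )
]
print(valid_name(conn, "Exemplum vetus"), synonyms)
```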

Implementation in OBIS (PostgreSQL)

Calculated field: ‘stored path’
- Calculate a field as a concatenation of the id of the parent, the parent of the parent…
- E.g. x5x45x65x
  - 5: Animalia
  - 45: Arthropoda
  - 65: Crustacea
- The stored path of every taxon belonging to Crustacea starts with x5x45x65x
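How such a path could be calculated from the parent ids is sketched below in Python. The function name `stored_path` and taxon id 700 are hypothetical; ids 5, 45 and 65 are the ones from the example above.

```python
def stored_path(parents, taxon_id):
    """Build 'x<root>x...x<parent>x' from a mapping of id -> parent id."""
    ancestors = []
    current = parents[taxon_id]
    while current is not None:
        ancestors.append(current)
        current = parents[current]
    # Ancestors were collected bottom-up; the path is written root first.
    return "x" + "".join(f"{ancestor}x" for ancestor in reversed(ancestors))


# 5: Animalia, 45: Arthropoda, 65: Crustacea, 700: a hypothetical crustacean taxon
parents = {5: None, 45: 5, 65: 45, 700: 65}

print(stored_path(parents, 700))
print(stored_path(parents, 65))
```

The `x` delimiters on both sides of every id keep a prefix such as `x5x45x65x` from also matching a path that continues with a longer id, for example `x5x45x650x`.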

Query the Stored Path
- Get all species from Echinodermata:
    select *
    from obis.tnames
    where storedpath ~ (
            select '^' || storedpath || id || 'x'
            from obis.tnames
            where tname = 'Echinodermata'
          )::text
      and rank_id = 220
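The same idea can be tried out on a toy table with Python's `sqlite3`. This is a sketch under assumptions: SQLite stands in for PostgreSQL, so the regular-expression match is replaced by a `LIKE` prefix match; the ids, stored paths and all rank ids other than 220 (species, as in the query above) are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
    CREATE TABLE tnames (id INTEGER PRIMARY KEY, tname TEXT,
                         rank_id INTEGER, storedpath TEXT);
    INSERT INTO tnames VALUES (5, 'Animalia', 10, 'x');
    INSERT INTO tnames VALUES (50, 'Echinodermata', 30, 'x5x');
    INSERT INTO tnames VALUES (51, 'Asteroidea', 60, 'x5x50x');
    INSERT INTO tnames VALUES (52, 'Asterias rubens', 220, 'x5x50x51x');
    INSERT INTO tnames VALUES (500, 'Mollusca', 30, 'x5x');
    INSERT INTO tnames VALUES (501, 'Abra alba', 220, 'x5x500x');
""")

# Prefix = stored path of Echinodermata + its own id + 'x', i.e. 'x5x50x'.
# 'x5x500x' (under Mollusca, id 500) does not match because of the delimiter.
echinoderm_species = [
    row[0]
    for row in conn.execute("""
        SELECT t.tname
        FROM tnames AS t
        WHERE t.storedpath LIKE (
                SELECT storedpath || id || 'x%'
                FROM tnames
                WHERE tname = 'Echinodermata'
              )
          AND t.rank_id = 220
        ORDER BY t.tname
    """)
]
print(echinoderm_species)
```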

Recursive query
- All taxa belonging to a given taxon:
    with recursive includedtaxa(id, tname) as (
        select id, tname
        from obis.tnames
        where tname = 'Semelidae'
      union
        select tnames.id, tnames.tname
        from obis.tnames
        inner join includedtaxa on tnames.parent_id = includedtaxa.id
    )
    select * from includedtaxa order by tname
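To try the recursive query without an OBIS database, the sketch below runs the same statement on a toy table with Python's `sqlite3`. SQLite is swapped in for PostgreSQL here (it also supports `WITH RECURSIVE`), the `obis.` schema prefix is dropped, and the ids and the shortened hierarchy are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
    CREATE TABLE tnames (id INTEGER PRIMARY KEY, tname TEXT, parent_id INTEGER);
    INSERT INTO tnames VALUES (1, 'Mollusca', NULL);
    INSERT INTO tnames VALUES (2, 'Semelidae', 1);
    INSERT INTO tnames VALUES (3, 'Abra', 2);
    INSERT INTO tnames VALUES (4, 'Abra alba', 3);
    INSERT INTO tnames VALUES (5, 'Abra nitida', 3);
    INSERT INTO tnames VALUES (6, 'Mytilidae', 1);
""")

# Start from the given taxon, then repeatedly add the children of what was found.
included = [
    row[0]
    for row in conn.execute("""
        WITH RECURSIVE includedtaxa(id, tname) AS (
            SELECT id, tname FROM tnames WHERE tname = 'Semelidae'
          UNION
            SELECT tnames.id, tnames.tname
            FROM tnames
            INNER JOIN includedtaxa ON tnames.parent_id = includedtaxa.id
        )
        SELECT tname FROM includedtaxa ORDER BY tname
    """)
]
print(included)
```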

The other way
- Finding the parents of a species, up to a given rank:
    with recursive parenttaxa(id, parent_id, tname) as (
        select id, parent_id, tname
        from obis.tnames
        where tname = 'Abra alba'
      union
        select tnames.id, tnames.parent_id, tnames.tname
        from obis.tnames
        inner join parenttaxa on parenttaxa.parent_id = tnames.id
                             and tnames.rank_id >= 140
    )
    select * from parenttaxa order by tname
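A toy version of the upward walk, using Python's `sqlite3` in place of PostgreSQL. Only rank id 220 (species) and the threshold 140 come from the queries in this presentation; which rank 140 stands for is not stated there, so the assignment of 140 to family and the other rank ids below are assumptions. The sketch also orders by `rank_id` rather than by name, to show the lineage from the top down.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
    CREATE TABLE tnames (id INTEGER PRIMARY KEY, tname TEXT,
                         parent_id INTEGER, rank_id INTEGER);
    INSERT INTO tnames VALUES (1, 'Mollusca', NULL, 30);
    INSERT INTO tnames VALUES (2, 'Semelidae', 1, 140);
    INSERT INTO tnames VALUES (3, 'Abra', 2, 180);
    INSERT INTO tnames VALUES (4, 'Abra alba', 3, 220);
""")

# Climb from the species to its parents; the rank condition stops the climb
# once a parent has a rank id below the threshold.
lineage = [
    row[0]
    for row in conn.execute("""
        WITH RECURSIVE parenttaxa(id, parent_id, tname, rank_id) AS (
            SELECT id, parent_id, tname, rank_id
            FROM tnames WHERE tname = 'Abra alba'
          UNION
            SELECT tnames.id, tnames.parent_id, tnames.tname, tnames.rank_id
            FROM tnames
            INNER JOIN parenttaxa ON parenttaxa.parent_id = tnames.id
                                 AND tnames.rank_id >= 140
        )
        SELECT tname FROM parenttaxa ORDER BY rank_id
    """)
]
print(lineage)
```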

Rest of the taxonomic model
- Ranks should be in a separate table
  - Information on the level of the rank can be added
  - Possibility of extra quality control
    - Rank of a parent as compared to rank of descendants
    - Rank of siblings should be the same
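The two quality-control checks could look roughly like this, sketched with Python's `sqlite3`. The assumptions are that a larger `rank_id` means a lower rank (consistent with species being 220 in the OBIS queries), that the other rank ids are invented, and that one row is deliberately wrong so the checks have something to report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
    CREATE TABLE ranks (id INTEGER PRIMARY KEY, rank_name TEXT);
    CREATE TABLE tnames (id INTEGER PRIMARY KEY, tname TEXT,
                         parent_id INTEGER REFERENCES tnames(id),
                         rank_id INTEGER REFERENCES ranks(id));
    INSERT INTO ranks VALUES (30, 'phylum'), (140, 'family'),
                             (180, 'genus'), (220, 'species');
    INSERT INTO tnames VALUES (1, 'Mollusca', NULL, 30);
    INSERT INTO tnames VALUES (2, 'Semelidae', 1, 140);
    INSERT INTO tnames VALUES (3, 'Abra', 2, 180);
    INSERT INTO tnames VALUES (4, 'Abra alba', 3, 220);
    -- Deliberate error: a species entered with the rank of a family.
    INSERT INTO tnames VALUES (5, 'Abra nitida', 3, 140);
""")

# Check 1: a descendant must have a lower rank (larger rank_id) than its parent.
bad_rank = [
    row[0]
    for row in conn.execute("""
        SELECT child.tname
        FROM tnames AS child
        JOIN tnames AS parent ON child.parent_id = parent.id
        WHERE child.rank_id <= parent.rank_id
        ORDER BY child.tname
    """)
]

# Check 2: parents whose children do not all share the same rank.
mixed_sibling_parents = [
    row[0]
    for row in conn.execute("""
        SELECT parent.tname
        FROM tnames AS child
        JOIN tnames AS parent ON child.parent_id = parent.id
        GROUP BY parent.id
        HAVING COUNT(DISTINCT child.rank_id) > 1
        ORDER BY parent.tname
    """)
]
print(bad_rank, mixed_sibling_parents)
```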

Documentation
- Documenting sources of information
- Add sources/references
  - ‘Audit trail’: source of the information in the database
  - Taxonomic information: reference of the original description
  - Type of the source: expert, database, publication
- Date and person responsible for the last revision of the record

Sources
- Many-to-many relation
  - Every source can contain information on several taxa
  - A single taxon can be documented in several sources
- Necessitates an extra table to represent the relationship
  - Divide one many-to-many relationship into two one-to-many relationships
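A minimal sketch of the extra table, using Python's `sqlite3`. The table names `sources` and `taxon_sources`, their columns and the two hypothetical sources are assumptions for the illustration; the source types are the ones listed under Documentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
    CREATE TABLE tnames (id INTEGER PRIMARY KEY, tname TEXT);
    CREATE TABLE sources (id INTEGER PRIMARY KEY, title TEXT, source_type TEXT);
    -- The extra table: one row per (taxon, source) pair.
    CREATE TABLE taxon_sources (
        tname_id INTEGER REFERENCES tnames(id),
        source_id INTEGER REFERENCES sources(id),
        PRIMARY KEY (tname_id, source_id)
    );
    INSERT INTO tnames VALUES (1, 'Abra alba'), (2, 'Abra nitida');
    INSERT INTO sources VALUES (1, 'Hypothetical checklist A', 'publication'),
                               (2, 'Hypothetical expert list B', 'expert');
    INSERT INTO taxon_sources VALUES (1, 1), (1, 2), (2, 1);
""")

# One taxon, several sources.
sources_for_abra_alba = [
    row[0]
    for row in conn.execute("""
        SELECT sources.title
        FROM taxon_sources
        JOIN sources ON taxon_sources.source_id = sources.id
        JOIN tnames ON taxon_sources.tname_id = tnames.id
        WHERE tnames.tname = 'Abra alba'
        ORDER BY sources.title
    """)
]

# One source, several taxa.
taxa_in_checklist = [
    row[0]
    for row in conn.execute("""
        SELECT tnames.tname
        FROM taxon_sources
        JOIN tnames ON taxon_sources.tname_id = tnames.id
        WHERE taxon_sources.source_id = 1
        ORDER BY tnames.tname
    """)
]
print(sources_for_abra_alba, taxa_in_checklist)
```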

Add distribution
- Localities from where a taxon has been reported
- Many-to-many relation
  - One locality has several taxa
  - One taxon is found in several localities
- Relation must be qualified
  - Source!
  - Validity of the observation
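A qualified relation means the linking table carries fields of its own. The sketch below, with Python's `sqlite3`, stores a source and a validity flag on each taxon-locality record. All table and column names, the placeholder localities and sources, and the records themselves are invented for the illustration and are not real observations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
    CREATE TABLE tnames (id INTEGER PRIMARY KEY, tname TEXT);
    CREATE TABLE localities (id INTEGER PRIMARY KEY, locality TEXT);
    CREATE TABLE sources (id INTEGER PRIMARY KEY, title TEXT);
    -- The relation itself is qualified by a source and a validity flag.
    CREATE TABLE distribution (
        tname_id INTEGER REFERENCES tnames(id),
        locality_id INTEGER REFERENCES localities(id),
        source_id INTEGER REFERENCES sources(id),
        is_valid INTEGER NOT NULL
    );
    INSERT INTO tnames VALUES (1, 'Abra alba'), (2, 'Asterias rubens');
    INSERT INTO localities VALUES (1, 'Locality A'), (2, 'Locality B');
    INSERT INTO sources VALUES (1, 'Hypothetical source 1'),
                               (2, 'Hypothetical source 2');
    INSERT INTO distribution VALUES (1, 1, 1, 1), (1, 2, 1, 0), (2, 1, 2, 1);
""")

# Localities of one taxon, restricted to records considered valid.
valid_localities = [
    row[0]
    for row in conn.execute("""
        SELECT localities.locality
        FROM distribution
        JOIN localities ON distribution.locality_id = localities.id
        JOIN tnames ON distribution.tname_id = tnames.id
        WHERE tnames.tname = 'Abra alba' AND distribution.is_valid = 1
        ORDER BY localities.locality
    """)
]

# Taxa validly reported from one locality.
taxa_at_locality_a = [
    row[0]
    for row in conn.execute("""
        SELECT tnames.tname
        FROM distribution
        JOIN tnames ON distribution.tname_id = tnames.id
        WHERE distribution.locality_id = 1 AND distribution.is_valid = 1
        ORDER BY tnames.tname
    """)
]
print(valid_localities, taxa_at_locality_a)
```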