CAAB and taxon management at CSIRO Marine Research Tony Rees Divisional Data Centre CSIRO Marine Research, Hobart

Linking taxonomic resources

Organisation 1 (alongside Organisation 2, Organisation 3...):
- db A (e.g. field surveys): Taxon, Taxon, etc.
- db B (e.g. specimen coll.): Taxon, Taxon, etc.
- db C (e.g. supporting info): Taxon, Taxon, etc.

Available for any taxon:
- scientific name (may or may not be consistent across db’s)
- internal ID in database (may or may not be common across db’s)
- possibly 1 or more external ID’s

Master query system: user searches by scientific or other name, which drives the db query.

Name-based system

Organisation 1 (alongside Organisation 2, Organisation 3...):
- db A (e.g. field surveys): Taxon, Taxon, etc.
- db B (e.g. specimen coll.): Taxon, Taxon, etc.
- db C (e.g. supporting info): Taxon, Taxon, etc.

Names may vary across db’s (different opinions, old data, typographic errors, variant spellings, authorities present/absent, subgenera present/absent, etc.). Names may also change in future.

Master query system:
- holds name 1a = name 1b = name 1c etc. (high maintenance overhead at master db level)
- db query uses all possible names
- a possible name 1d is not found

Example (doughboy scallop)
- Chlamys asperrimum
- Chlamys asperrima
- Chlamys (Mimachlamys) asperrimum
- Chlamys (Mimachlamys) asperrima
- Mimachlamys asperrimum
- Mimachlamys asperrima

This is not really a synonymy, just a partial “potential variants” list; a full list would include versions with/without authors, any other synonyms, “near” matches (possible typographic errors), etc.
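
The maintenance problem of a name-based master system can be illustrated with a minimal Python sketch. The three component databases, their record labels, and the function name `query_by_names` are hypothetical, and the name list is only a subset of the variants above. Any spelling used by a component database but missing from the master list (the "name 1d" case) is silently missed.

```python
# Subset of the "potential variants" for the doughboy scallop; the master
# query system would have to hold and maintain every such name.
KNOWN_VARIANTS = [
    "Chlamys asperrimum",
    "Chlamys asperrima",
    "Mimachlamys asperrimum",
    "Mimachlamys asperrima",
]

# Hypothetical component databases, each keyed by whatever name it uses.
DB_A = {"Chlamys asperrima": ["field survey record"]}
DB_B = {"Mimachlamys asperrima": ["specimen record"]}
DB_C = {"Chlamys asperima": ["supporting info record"]}  # typographic error


def query_by_names(names, databases):
    """Query every database with every known name and collect the hits."""
    hits = []
    for db in databases:
        for name in names:
            hits.extend(db.get(name, []))
    return hits


results = query_by_names(KNOWN_VARIANTS, [DB_A, DB_B, DB_C])
# The record in DB_C is not returned, because its spelling is not in the
# master list of variants.
```

Every new spelling found in any component database means another entry in the master list, which is the overhead the slide refers to.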

External ID-based system

Organisation 1 (alongside Organisation 2, Organisation 3...):
- db A (e.g. field surveys): Taxon (ID1), Taxon (ID2), etc.
- db B (e.g. specimen coll.): Taxon (ID1), Taxon (ID2), etc.
- db C (e.g. supporting info): Taxon (ID1), Taxon (ID2), etc.

Data searching is name independent (user agencies can follow own wishes re consistency, formats, timing of updates etc.).

Master query system:
- holds name 1 = ID1 etc. (low maintenance overhead at master db level)
- db query uses a single ID
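
For comparison, an ID-based lookup might be sketched as follows. This is an illustration only: the `NAME_TO_ID` table, the database contents, and `query_by_id` are hypothetical, and "ID1" is the placeholder used on the slide, not a real code.

```python
# Hypothetical master table: each name the master system knows resolves to
# one external taxon ID.
NAME_TO_ID = {
    "Chlamys asperrima": "ID1",
    "Mimachlamys asperrima": "ID1",
}

# Hypothetical component databases keyed by the shared external ID. Each one
# keeps whatever name it prefers alongside its records.
DB_A = {"ID1": {"name": "Chlamys asperrima", "records": ["field survey record"]}}
DB_B = {"ID1": {"name": "Mimachlamys asperrima", "records": ["specimen record"]}}
DB_C = {"ID1": {"name": "Chlamys asperima", "records": ["supporting info record"]}}


def query_by_id(name, databases):
    """Resolve the name to an ID once, then query each database by that ID."""
    taxon_id = NAME_TO_ID.get(name)
    if taxon_id is None:
        return []
    hits = []
    for db in databases:
        entry = db.get(taxon_id)
        if entry is not None:
            hits.extend(entry["records"])
    return hits


results = query_by_id("Mimachlamys asperrima", [DB_A, DB_B, DB_C])
```

Here the misspelled name in the third database no longer matters, because the component databases are matched on the ID rather than on the name.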

Essential/desirable properties of external taxon identifiers

Essential
- ability to cover all taxonomic groups of interest
- ability to cope with numbers of taxa potentially required
- translation system (codes:names) readily accessible
- codes can be created in realistic time frame for taxa needed

Desirable
- systematic/meaningful approach to code allocation (cf. telephone numbering system), understandable to humans
- not too many digits
- codes are stable (preferably NOT dependent on genus/species name)
- taxon names are reliable (i.e., content is subject to ongoing QC and maintenance as needed)
- compatibility/interoperability with emerging global standards

CMR’s “CAAB” system
- able to cover all taxonomic groups
- up to 999,999 codes (optionally 3 million) per “major category” (phylum or similar)
- web interface for codes/names access
- local (i.e. Australian) control of content (rapid data addition possible)
- systematic/meaningful approach to code allocation (category number, family number and species/taxon number)
- not too many digits (2 digits for category and 6 for family+species)
- codes are stable (not dependent on genus/species name)
- taxon name maintenance can be devolved to relevant specialists
- cross-mapping to ITIS and other codes incorporated in current database structure
- possible candidate or model for a national system?
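
The code layout described above (2 digits for category, 6 for family+species) could be split into its parts along these lines. This is a sketch under stated assumptions: the slide does not say how the 6 digits divide between family and species/taxon, so the 3+3 split used here is an assumption, as are the names `CaabCode` and `parse_caab`. The example code is made up.

```python
from typing import NamedTuple


class CaabCode(NamedTuple):
    category: str  # 2 digits: "major category" (phylum or similar)
    family: str    # ASSUMED: first 3 of the remaining 6 digits
    taxon: str     # ASSUMED: last 3 of the remaining 6 digits


def parse_caab(code: str) -> CaabCode:
    """Split an 8-digit code, written with or without spaces, into parts."""
    digits = code.replace(" ", "")
    if len(digits) != 8 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected 8 digits, got {code!r}")
    return CaabCode(digits[:2], digits[2:5], digits[5:])


# Made-up code for illustration only; not a real CAAB entry.
example = parse_caab("12 345678")
```

Keeping the parts as strings preserves any leading zeros, which matters if the codes are meant to stay readable as category, family and taxon numbers.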

Other CAAB features
- searchable by scientific or common name synonyms/variants, as entered in the database (useful as entry point)
- comments fields available for external display and/or admin use
- holds custom links for database querying at CMR via the web
- holds on-line links to other information resources
- CAAB administration and data entry can be carried out remotely by relevant persons (uses web access tools and user/domain authentication)
- special sections of CAAB are available to deal with family-level groups, other species groupings as needed, and informal/agency-designated taxa

Portion of main “taxon details” table in CAAB

ITIS codes - another option

Pluses...
- possibly a global standard in the future
- able to cover all taxonomic groups
- no limit to number of taxa which can be covered
- web interface for codes/names access
- not too many digits (typically 5 or 6)

Minuses...
- codes are non-meaningful (just a number, semi-random allocation)
- codes are fixed to genus/species name; however, cross-mapping is maintained within the database for synonyms, where held on the system
- would need to investigate how locally supplied content could be added to the master system in a realistic time frame

Topics arising - for consideration for “Australian virtual museum”...
- Names or (name independent) taxon ID’s to be used for database linkages?
- If names, is a master list achievable in real time? Who will undertake continuous update required? What performance implications may arise?
- If taxon ID’s, what is the best route, e.g.:
  - ITIS (with upgraded Australian content)?
  - CAAB or other existing Austr. system (extended)?
  - A new system designed from the ground up?
- If any of the above, what is the best way to manage and resource the process? What time frame is realistic?

Sample scientific name search (NB: also searches synonyms as held in the database)

Sample scientific name search (go to demo if available)