A continuously updated All Genera Index: an achievable goal for Biodiversity Informatics? Tony Rees – CSIRO Marine and Atmospheric Research, Australia.

Presentation transcript:

A continuously updated All Genera Index: an achievable goal for Biodiversity Informatics? Tony Rees – CSIRO Marine and Atmospheric Research, Australia TDWG Conference, October 2011

Tony Rees: Continuously Updated All Genera Index
Why an All Genera Index?
- All-species index(es) will take time to complete; an all-genera index is potentially more tractable:
  - ~10x smaller task (~2m valid species, maybe 250k genera)
  - can leverage existing genus-level compilations, e.g. ING for plant names, Nomenclator Zoologicus for legacy animal names, maybe ZooBank for future animal names, IPNI and others for plants
  - prokaryote and virus names are also well curated and accessible
- Aim for horizontal coverage first (no missing taxonomic sectors, with both extant and fossil names included); vertical completeness, e.g. to species level, can be a secondary consideration
- Can carry the burden of taxonomic assignments; species then merely need to be attached to the correct genus instance
- Genera can have significant nomenclatural and taxonomic interest, i.e. valid vs. invalid names, author / year and place of publication (i.e. original work), genus-level synonyms and homonyms
- Can carry other attributes / assertions, e.g. all species have trait “x”, occur in habitat “y”, within geological range “z”

Continuing a distinguished tradition…
- D. Patterson, Nature, 2003
- Remsen & Patterson, TDWG, 2007
- D. Remsen, in “The Linnaean Ark”, 2010

Different use cases, different approaches
Remsen / Patterson / uBio approach (if correctly understood)
- Assemble the largest possible list of taxonomic names from multiple sources / provenance; reconciliation / deduplication / assignment to a taxonomic hierarchy is a subsequent activity
- Main initial use case is information retrieval / query expansion (multiple variants of name authorship are seen as valuable)
Author / OBIS interest and approach
- Starting point is a taxonomic hierarchy (kingdom through family); all names must live in this structure
- Names from “trusted sources” are given precedence; others are used sparingly and subject to additional verification; multiple variants of name authorship are rationalized to a single preferred version
- Important focus (after taxonomic assignment) for OBIS is on attributes, in particular marine vs. nonmarine and extant vs. fossil, i.e. use the power of the list for non-taxonomic as well as taxonomic purposes
- Linkages to primary taxonomic literature are also of potential value (allows harvesting of attributes, expanded understanding of original taxonomic concepts, more…)
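The "trusted sources first" rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the source categories, their ranking, the record fields and the placeholder names (Aus, Bus, Aidae, Bidae) are all hypothetical and are not taken from IRMNG itself.

```python
from dataclasses import dataclass

# Hypothetical source ranking (assumption): lower number = more trusted.
SOURCE_RANK = {"nomenclator": 0, "checklist": 1, "aggregator": 2}


@dataclass(frozen=True)
class NameRecord:
    genus: str
    authorship: str
    family: str
    source: str


def preferred_records(records):
    """Keep one record per (genus, family), taken from the most trusted source."""
    best = {}
    for rec in records:
        key = (rec.genus, rec.family)
        # Unknown sources rank below every listed source.
        rank = SOURCE_RANK.get(rec.source, len(SOURCE_RANK))
        if key not in best or rank < best[key][0]:
            best[key] = (rank, rec)
    return [rec for _, rec in best.values()]


# Placeholder data: two authorship variants of the same genus, one other genus.
records = [
    NameRecord("Aus", "Smith 1900", "Aidae", "aggregator"),
    NameRecord("Aus", "Smith, 1900", "Aidae", "nomenclator"),
    NameRecord("Bus", "Jones, 1950", "Bidae", "checklist"),
]
chosen = preferred_records(records)
```

Here the two authorship variants of the placeholder genus collapse to the version supplied by the higher-ranked source, which is the "single preferred version" behaviour described in the slide; a query-expansion use case would instead keep every variant.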

Leverage existing genus-level compilations

Leverage existing genus-level compilations (Nomenclator Zoologicus extract)

Characteristics of nomenclator-style compilations
- Emphasis is on nomenclatural information, i.e. facts (name X was established by Y in publication Z on date D) and nomenclatural synonyms / rationale; subsequent taxonomic treatment (“opinions”) may or may not be included
- Literature citations are seen as a critical component (excellent!) and are often verified from the original, i.e. a nomenclator can be considered a proxy for the primary literature
- Recent / on-line nomenclators often have full citation information / reference modules (e.g. Catalog of Fishes, Index Fungorum, Systema Dipterorum, more…)
- ING and Nomenclator Zoologicus use the more terse “nomenclator style” or microcitation (no article title, full authorship or page range included), which is less useful for verifying / sourcing relevant attributes or cross-linking to bibliographic lists
- Non-taxonomic attributes may also be included in some compilations, but not all.

Assembling the “desired” data set
- In practice, for the full set of desired information it may be necessary to supplement information from nomenclators with that from other sources, i.e. subsequent taxonomic treatments and opinions, bibliographies / literature indexes, and sources for attributes such as eco- and geo- characteristics
- Additional effort may be needed to massage supplied fragmentary / inconsistent taxonomies into a coherent whole at higher levels
- Higher taxonomy is itself a moving target, e.g. for Angiosperms (APG, APG II, APG III…), protists, viruses and prokaryotes
- Information varies from readily available / well curated / comprehensive / current (for “exemplar” groups) to fragmentary / out-of-date / hard-to-access / no recent overviews for others
- The desired level of detail is not available at genus level from the current Catalogue of Life; at this time one needs to go to contributing GSDs, checklists, primary literature and elsewhere (also to relevant sources for fossil taxa).

Author’s experience to date
- First “cut” was a names-indexing operation for OBIS, ramped up in 2006 as IRMNG, the Interim Register of Marine and Nonmarine Genera
- Concept name follows ERMS, the European Register of Marine Species (now WoRMS), with “Interim” added to signal incomplete / provisional, but hopefully usable in its present state
- Initial guesstimate to complete was 3-6 months (slight underestimate!)
- All names sourcing and ingestion is based on manual data loading at this time; would like to move to automated data feeds / updates, as available, in future versions
- Uploading initial batches of data is straightforward; problems come with the subsequent ones required for gap filling, i.e.:
  - Duplicate and near-duplicate detection
  - Genus-level homonyms are a significant issue
  - Dealing with data conflicts: same name, different taxonomic opinions or orthographies for supplied information.
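The gap-filling problems listed above (homonyms and near-duplicates in later batches) can be illustrated with a small sketch. It assumes records arrive as (genus name, authorship) pairs, uses made-up placeholder names, and uses Python's generic `difflib.SequenceMatcher` similarity rather than any taxon-specific matcher; the 0.85 threshold is an arbitrary choice for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def find_homonyms(records):
    """Return names that occur with more than one distinct authorship."""
    by_name = defaultdict(set)
    for name, authorship in records:
        by_name[name].add(authorship)
    return {name: sorted(auths) for name, auths in by_name.items() if len(auths) > 1}


def find_near_duplicates(names, threshold=0.85):
    """Return pairs of distinct spellings whose similarity meets the threshold."""
    pairs = []
    # Pairwise comparison is quadratic: fine for a sketch, not for 450k names.
    for a, b in combinations(sorted(set(names)), 2):
        if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
            pairs.append((a, b))
    return pairs


# Placeholder batch (hypothetical names, not real genera).
batch = [
    ("Aus", "Smith, 1900"),
    ("Aus", "Jones, 1950"),       # same spelling, different author: homonym candidate
    ("Busella", "Brown, 1920"),
    ("Bussella", "Brown, 1920"),  # probable orthographic variant
    ("Cus", "Lee, 1980"),
]
homonyms = find_homonyms(batch)
near_duplicates = find_near_duplicates(name for name, _ in batch)
```

Both functions only flag candidates. Deciding whether a flagged pair is a true homonym, a misspelling or a legitimate separate name remains a curatorial step, which is where the manual effort described above goes.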

A portion of the IRMNG master genus table (as at Oct 2011)

Services / views this currently supports
- High-level overview + relevant statistics for “all life” (currently possible for names, in future for valid taxa)
- Navigate the taxonomic hierarchy in any direction
- Generate hierarchical lists
- Generate alphabetic lists
- Sort / filter by any desired criteria, both taxonomic and non-taxonomic
- Generate lists of homonyms, within or across Codes
- Indicate current taxonomic hierarchy, nomenclatural / taxonomic status, and attributes (to varying degrees) for any input name
- Holds partial species lists for selected genus names, e.g. from Catalogue of Life (with permission) and elsewhere (could be developed further as desired)
- Indicate near-match targets for any input name (“did you mean…”) using TAXAMATCH fuzzy matching (the latter also adopted by iPlant, PESI, GNI, more…)
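A “did you mean…” lookup of the kind listed above can be sketched as follows. This is not TAXAMATCH: it substitutes the generic `difflib.get_close_matches` function from the Python standard library as a stand-in, and the function name `suggest`, the cutoff value and the placeholder names are assumptions made for illustration.

```python
from difflib import get_close_matches


def suggest(query, known_names, limit=3, cutoff=0.8):
    """Return up to `limit` known names that closely resemble `query`."""
    # Compare case-insensitively, then map hits back to the stored spelling.
    lookup = {name.lower(): name for name in known_names}
    hits = get_close_matches(query.lower(), list(lookup), n=limit, cutoff=cutoff)
    return [lookup[hit] for hit in hits]


# Placeholder name list (hypothetical names, not real genera).
known = ["Aus", "Busella", "Cus"]
suggestions = suggest("Bussela", known)
no_match = suggest("Zzzz", known)
```

A taxon-aware matcher would typically add steps this sketch lacks, such as phonetic normalisation of Latin endings, so the generic similarity ratio here should be read as a simplification.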

IRMNG-generated statistics for “all life” (web query 6 Oct 2011) (NB, can also generate these lists as required via the web, by navigating the hierarchy, or enter the hierarchy at any level)
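Statistics of this kind follow naturally once every name lives in a single hierarchy. The sketch below assumes a simple (taxon, rank, parent) table with placeholder family and genus names; the table layout and function names are illustrative and are not the IRMNG schema.

```python
from collections import defaultdict

# Hypothetical (taxon, rank, parent) rows; family and genus names are placeholders.
ROWS = [
    ("Animalia", "kingdom", None),
    ("Aidae", "family", "Animalia"),
    ("Bidae", "family", "Animalia"),
    ("Aus", "genus", "Aidae"),
    ("Bus", "genus", "Bidae"),
    ("Cus", "genus", "Bidae"),
]

parent = {name: p for name, _, p in ROWS}
rank = {name: r for name, r, _ in ROWS}
children = defaultdict(list)
for name, _, p in ROWS:
    if p is not None:
        children[p].append(name)


def lineage(taxon):
    """Walk upward from a taxon to the root; return the path root-first."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = parent[taxon]
    return path[::-1]


def count_genera(taxon):
    """Count genus-rank names at or below the given taxon."""
    if rank[taxon] == "genus":
        return 1
    return sum(count_genera(child) for child in children.get(taxon, []))
```

Entering the hierarchy at any level is then a matter of calling `count_genera` on that node, e.g. `count_genera("Bidae")` for one family or `count_genera("Animalia")` for the whole placeholder kingdom, while `lineage("Cus")` navigates in the opposite direction.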

Current IRMNG status
- >450k genus names, in 17k+ families, as at October 2011 (however, a significant subset, ~30%, still awaits family-level allocation)
- Start made on resolving genus-level synonyms on a group-by-group basis, but much more to do
- Genus coverage considered >95% complete, less so for more recent data.

Some questions for this meeting
- Is this a worthwhile effort more generally, i.e. as a community resource, cf. ongoing equivalent activities e.g. Catalogue of Life, GSDs, ITIS, PaleoDB, more…
- If so, where should it reside, and who should manage / curate it for the future
- To what extent can it leverage or synergise with emerging GN* activities and infrastructure
- To what degree can existing manual data upload / infill processes be automated
- How best to achieve continuing population and currency, e.g. as new names appear (~2k genera, 25k new species / yr if relevant).

Thank you
Thanks to data sources and funders who have contributed to development of IRMNG to date!

Supplementary slide

The emerging GN* world: which elements are relevant to this task?