On-line biological data concepts at CSIRO Marine Research, Australia Tony Rees & Kim Finney Divisional Data Centre CSIRO Marine Research, Hobart, Australia.

Our website:

Pre-existing situation at CMR (before 1997)
Data in a variety of databases and flat files
No metadata or digital documentation
No web access to any data or metadata
CAAB (taxon coding system) in existence but coverage patchy and compliance variable

Our implementation path
Stage 1 ( )...
  – Construct a searchable, web-accessible metadata system and start populating it with information - MarLIN v1
  – Upgrade CAAB to form a comprehensive taxon dictionary for MarLIN (also accessible by SQuID)
  – Build a pilot data store and visualisation system with a web-driven GUI (Java applet) - SQuID v1
Stage 2 (2000-)...
  – Build SQuID v2 (onwards) to become a comprehensive data store, with upgraded links to MarLIN and CAAB
  – Implement linkage between MarLIN and Australia-wide, distributed metadata search system
Stage 3… ???

Our system overview
Components:
  – Data directory (metadatabase) - holds info at “dataset” level (e.g. survey, species range)
  – Master data storage (includes index layer) - holds info at the atomic data level
  – Taxon dictionary
Links between components:
  – Subsets of information shared with other metadata directory systems
  – Entry point to data
  – Display relevant metadata

Digression #1: Taxon matching
Simplistic view:
  – text match on one field (“scientific name”) or two (genus + species)
More comprehensive approach:
  – 10 or more fields required, e.g. in CAAB we define the following:
      Genus
      Subgenus
      Species
      Qualifier
      Subspecies
      Variety
      Original Author/s
      Original Date
      Revising Author/s
      Revision Date
      Authority Addendum
  – also need to flag:
      - Is botanical or zoological code applicable?
      - Species name latin or informal (“sp. A”, etc.)?
      - Has name changed from original? (even if no revising author/date stored)
Examples from our database:
  Chlamys (Belchlamys) aktinos (Petterd, 1886) … a scallop
  Ophiaster hydroideus (Lohmann) Lohmann, 1913 emend. Manton & Oates, 1983 … a coccolithophorid
  Heteroclinus sp. 1 [in Gomon et al, 1994] … Kuiter's weedfish
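
To make the contrast between the two views concrete, here is a minimal Python sketch. It is not the CAAB schema: the class name TaxonName, its field names and the match_key helper are assumptions that only loosely follow the field list on the slide. It rebuilds the scallop example from its parts and shows that a plain text match on the full name fails where a field-wise match succeeds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaxonName:
    """Hypothetical record; field names only loosely follow the slide's list."""
    genus: str
    species: str
    subgenus: Optional[str] = None
    original_author: Optional[str] = None
    original_date: Optional[str] = None
    name_changed: bool = False  # flag: has the name changed from the original?

    def full_name(self) -> str:
        parts = [self.genus]
        if self.subgenus:
            parts.append(f"({self.subgenus})")
        parts.append(self.species)
        if self.original_author:
            authority = self.original_author
            if self.original_date:
                authority += f", {self.original_date}"
            # Zoological convention: parentheses mark a changed combination.
            parts.append(f"({authority})" if self.name_changed else authority)
        return " ".join(parts)

    def match_key(self) -> tuple:
        """Compare on genus + species only, ignoring subgenus and authority."""
        return (self.genus.lower(), self.species.lower())


scallop = TaxonName("Chlamys", "aktinos", subgenus="Belchlamys",
                    original_author="Petterd", original_date="1886",
                    name_changed=True)
query = TaxonName("Chlamys", "aktinos")

print(scallop.full_name())                       # Chlamys (Belchlamys) aktinos (Petterd, 1886)
print(query.full_name() == scallop.full_name())  # False: plain text match fails
print(query.match_key() == scallop.match_key())  # True: field-wise match succeeds
```

Keeping the authority and the flags as separate fields is what lets a system choose how strict a match should be, rather than depending on how a name string happens to be punctuated.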

Taxon matching … continued
We have standardised on taxon codes, rather than names, for data storage and matching … names are stored as an attribute of the code (and can be updated in the future as needed)
Our “CAAB” coding system has evolved over 20+ years - earlier generations of codes are maintained on the system
New web-based access facility for retrieving latest name for a code, searching for a taxon, etc.
Same CAAB codes are also used by other marine science/fisheries agencies around Australia
Facility newly implemented in CAAB to hold ITIS codes, for cross-reference to international systems in the future
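
The idea of storing codes and treating names as an updatable attribute can be sketched as follows. Everything here is hypothetical: the codes, the names and the helper functions resolve_code and latest_name are placeholders, not real CAAB entries or the actual CAAB implementation. The sketch assumes an earlier-generation code is kept on the system with a pointer to its replacement.

```python
# Hypothetical codes and names, for illustration only (not real CAAB entries).
CURRENT_NAMES = {
    "C0002": "Examplea nova",
    "C0003": "Examplea altera",
}
SUPERSEDED_BY = {"C0001": "C0002"}  # earlier-generation code -> replacement


def resolve_code(code):
    """Follow supersession links until a current code is reached."""
    seen = set()
    while code in SUPERSEDED_BY:
        if code in seen:
            raise ValueError(f"circular supersession for {code}")
        seen.add(code)
        code = SUPERSEDED_BY[code]
    return code


def latest_name(code):
    """Return the name currently attached to a (possibly obsolete) code."""
    return CURRENT_NAMES[resolve_code(code)]


# Stored data carry codes only; names are looked up at report time.
observations = [("C0001", 12), ("C0002", 7), ("C0003", 3)]
counts = {}
for code, n in observations:
    name = latest_name(code)
    counts[name] = counts.get(name, 0) + n

print(counts)  # {'Examplea nova': 19, 'Examplea altera': 3}
```

Because the stored records never contain the name, a later name change only touches the lookup table, and legacy records with an old code still aggregate with current ones.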

CAAB services available
CAAB user interface
  – User searches by scientific name, common name or taxon code (or portion thereof)
  – List taxa by CAAB category or family
  – Retrieve current sci. name, common name(s), taxon code, taxon report
  – Initiate a MarLIN search, ITIS report, FishBase report
Application-level requests
  – Generate scientific name, common name, current code (if applicable) for a given taxon code
  – Call a CAAB taxon report
  – List taxa matching query
  – Translate an ITIS number to a CAAB code (or vice versa)
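
Two of the services listed above, searching by a portion of a name or code and translating between ITIS numbers and CAAB codes, can be sketched in a few lines of Python. This is an illustration under assumptions, not the CAAB web service: the function names are invented, and the codes and ITIS numbers are placeholders. Only the scientific and common names are taken from the MarLIN search example in this presentation.

```python
# Codes and ITIS numbers below are placeholders, not real CAAB or ITIS values.
TAXA = [
    {"code": "X0001", "sci_name": "Nototodarus gouldi",
     "common": "Gould's squid", "itis": 1},
    {"code": "X0002", "sci_name": "Metanephrops boschmai",
     "common": "Boschma's scampi", "itis": 2},
    {"code": "X0003", "sci_name": "Metanephrops velutinus",
     "common": "velvet scampi", "itis": 3},
]

ITIS_TO_CODE = {t["itis"]: t["code"] for t in TAXA}
CODE_TO_ITIS = {t["code"]: t["itis"] for t in TAXA}


def search(term):
    """Case-insensitive match on any portion of name, common name or code."""
    needle = term.lower()
    return [t for t in TAXA
            if needle in t["sci_name"].lower()
            or needle in t["common"].lower()
            or needle in t["code"].lower()]


def itis_to_caab(itis_number):
    """Translate an ITIS number to a code; None if no cross-reference is held."""
    return ITIS_TO_CODE.get(itis_number)


def caab_to_itis(code):
    """Translate a code to an ITIS number; None if no cross-reference is held."""
    return CODE_TO_ITIS.get(code)


print([t["sci_name"] for t in search("scampi")])
print(itis_to_caab(1), caab_to_itis("X0003"))
```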

CAAB web interface (current version)

Digression #2: taxonomy keywords
CAAB uses “major categories” (mostly = phyla)
MarLIN uses Australian “Blue Pages” keywords (c. 100 terms) - independent of CAAB codes (in current implementation)
NASA GCMD keywords would be an OBIS option (maybe with additions to suit OBIS) - c. 50 currently relevant … could also cross-map to GEMET (EC) list (c. 200)
EARTH SCIENCE >> Biosphere >> Zoology >> Amphibians
EARTH SCIENCE >> Biosphere >> Zoology >> Anemones
EARTH SCIENCE >> Biosphere >> Zoology >> Arachnids
EARTH SCIENCE >> Biosphere >> Zoology >> Arthropods
EARTH SCIENCE >> Biosphere >> Zoology >> Birds
EARTH SCIENCE >> Biosphere >> Zoology >> Centipedes
EARTH SCIENCE >> Biosphere >> Zoology >> Corals
EARTH SCIENCE >> Biosphere >> Zoology >> Crustaceans
EARTH SCIENCE >> Biosphere >> Zoology >> Echinoderms
EARTH SCIENCE >> Biosphere >> Zoology >> Fish
EARTH SCIENCE >> Biosphere >> Zoology >> Flatworms
EARTH SCIENCE >> Biosphere >> Zoology >> Insects
EARTH SCIENCE >> Biosphere >> Zoology >> Invertebrates
EARTH SCIENCE >> Biosphere >> Zoology >> Jellyfish
EARTH SCIENCE >> Biosphere >> Zoology >> Mammals
EARTH SCIENCE >> Biosphere >> Zoology >> Millipedes
EARTH SCIENCE >> Biosphere >> Zoology >> Mollusks
EARTH SCIENCE >> Biosphere >> Zoology >> Reptiles
EARTH SCIENCE >> Biosphere >> Zoology >> Roundworms
EARTH SCIENCE >> Biosphere >> Zoology >> Segmented Worms
EARTH SCIENCE >> Biosphere >> Zoology >> Sponges
EARTH SCIENCE >> Biosphere >> Zoology >> Vertebrates
EARTH SCIENCE >> Biosphere >> Zoology >> Zooplankton
EARTH SCIENCE >> Biosphere >> Microbiota >> Amoebae
EARTH SCIENCE >> Biosphere >> Microbiota >> Bacteria
EARTH SCIENCE >> Biosphere >> Microbiota >> Blue-green Algae
EARTH SCIENCE >> Biosphere >> Microbiota >> Ciliates
EARTH SCIENCE >> Biosphere >> Microbiota >> Coccolithophore
EARTH SCIENCE >> Biosphere >> Microbiota >> Diatoms
EARTH SCIENCE >> Biosphere >> Microbiota >> Flagellates
EARTH SCIENCE >> Biosphere >> Microbiota >> Foraminifers
EARTH SCIENCE >> Biosphere >> Microbiota >> Microalgae
EARTH SCIENCE >> Biosphere >> Microbiota >> Microphyte
EARTH SCIENCE >> Biosphere >> Microbiota >> Phytoplankton
EARTH SCIENCE >> Biosphere >> Microbiota >> Plankton
EARTH SCIENCE >> Biosphere >> Microbiota >> Protist
EARTH SCIENCE >> Biosphere >> Microbiota >> Radiolarians
EARTH SCIENCE >> Biosphere >> Microbiota >> Zooplankton
EARTH SCIENCE >> Biosphere >> Vegetation >> Algae
EARTH SCIENCE >> Biosphere >> Vegetation >> Flowering Plants
EARTH SCIENCE >> Biosphere >> Vegetation >> Lichens
EARTH SCIENCE >> Biosphere >> Vegetation >> Macroalgae
EARTH SCIENCE >> Biosphere >> Vegetation >> Macrophyte
EARTH SCIENCE >> Biosphere >> Vegetation >> Phytoplankton

Taxonomy keyword cross-mapping (examples)
GCMD list:
  Invertebrates
  Sponges
  Jellyfish
  Anemones
  Corals
  Flatworms
  Roundworms
  Segmented Worms
  Mollusks
  Arthropods
  Insects
  Arachnids
  Echinoderms
  Crustaceans
  Vertebrates
  Fish
  Amphibians
  Reptiles
  Birds
  Mammals
GEMET list:
  invertebrate … S709
  poriferan … S744
  coelenterate … S737
  coral … S738
  nematode … S743
  annelid … S
  mollusc … S740
  cephalopod … S741
  gastropod … S742
  arthropod … S713
  insect … S
  chelicerate … S
  echinoderm … S739
  crustacean … S717
  vertebrate … S649
  fish … S754
  amphibian … S
  reptile … S
  bird … S
  mammal … S
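
A cross-mapping of this kind is essentially a lookup table, and a short Python sketch shows the two awkward cases it has to handle: several GCMD terms mapping to one GEMET term, and a GCMD term with no counterpart. The GEMET terms and S-codes are the ones printed on the slide, but the pairings themselves are an assumption inferred from the term names (the connecting lines of the original diagram are not recoverable here), and treating Flatworms as unmapped is likewise an assumption. The function names are invented.

```python
# GCMD term -> (GEMET term, GEMET code as printed on the slide), or None.
# Pairings are inferred from the term names, not taken from the slide itself.
GCMD_TO_GEMET = {
    "Sponges": ("poriferan", "S744"),
    "Jellyfish": ("coelenterate", "S737"),
    "Anemones": ("coelenterate", "S737"),
    "Corals": ("coral", "S738"),
    "Mollusks": ("mollusc", "S740"),
    "Crustaceans": ("crustacean", "S717"),
    "Fish": ("fish", "S754"),
    "Flatworms": None,  # assumed to have no GEMET counterpart in this sketch
}


def to_gemet(gcmd_keyword):
    """Map a full GCMD keyword path to a GEMET (term, code) pair, if any."""
    leaf = gcmd_keyword.split(">>")[-1].strip()
    return GCMD_TO_GEMET.get(leaf)


def gcmd_terms_for(gemet_code):
    """Reverse lookup: all GCMD terms that map to one GEMET code."""
    return sorted(term for term, target in GCMD_TO_GEMET.items()
                  if target is not None and target[1] == gemet_code)


print(to_gemet("EARTH SCIENCE >> Biosphere >> Zoology >> Corals"))
print(gcmd_terms_for("S737"))
print(to_gemet("EARTH SCIENCE >> Biosphere >> Zoology >> Flatworms"))
```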

MarLIN - used for data discovery
MarLIN - based on an Oracle database containing dataset, project, and survey descriptions, plus on-line links to data and web resources
Holds metadata according to regional (ANZLIC and “Blue Pages”) standards, with additional agency-constructed fields (“extended ANZLIC”)
Web interface for searching and metadata contribution/update, using HTML, Oracle Web Server and custom PL/SQL application
Produces lists of datasets, or dataset reports, as requested
Includes links to pre-formatted data “packets” (now) and to SQuID (in future), for access to the data
NB: no data visualising capability, apart from “thumbnails” showing data extent

MarLIN - behind the scenes
Some 25+ tables, holding the following:
  – text-based fields (e.g. title, abstract, contributors, references, etc.)
  – keywords, handled as numeric IDs (including taxonomic keywords)
  – species/species groups, handled as CAAB codes
  – spatial extent, handled as bounding coordinates (max. and min. latitude and longitude)
  – time extent, handled as earliest and latest collection date for items in the dataset
  – originator organisation, present custodian, survey, contact person, etc., handled as numeric IDs
Initial search set up by keyword/ID type, spatial coordinates, time period (if desired)
Then search/browse by subject categories, keywords, taxon names, contributing project, vessel/voyage identifier, location of data, etc.
Free text search also supported
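
The initial search by bounding coordinates and time period amounts to an overlap test on each dataset's stored extent. The Python sketch below is an illustration only: MarLIN itself is described as an Oracle/PL/SQL application, and the class DatasetExtent, the function overlaps and the three sample datasets are hypothetical. The query box and years are the Australian North West Shelf example used in the search results in this presentation. Extents that cross the 180° meridian are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetExtent:
    """Hypothetical dataset record: bounding box plus earliest/latest year."""
    title: str
    north: float
    south: float
    west: float
    east: float
    start_year: int
    end_year: int


def overlaps(ds, north, south, west, east, start_year, end_year):
    """True if the dataset's box and period both intersect the query's."""
    spatial = (ds.south <= north and ds.north >= south
               and ds.west <= east and ds.east >= west)
    temporal = ds.start_year <= end_year and ds.end_year >= start_year
    return spatial and temporal


# Made-up datasets for illustration.
datasets = [
    DatasetExtent("Dataset A", -19, -21, 116, 118, 1991, 1991),  # in box, in period
    DatasetExtent("Dataset B", -42, -44, 146, 149, 1993, 1993),  # outside the box
    DatasetExtent("Dataset C", -19, -21, 116, 118, 1985, 1985),  # before the period
]

# Query: North West Shelf box (North=-17, West=114, South=-24, East=122), 1990-1995.
hits = [ds.title for ds in datasets
        if overlaps(ds, north=-17, south=-24, west=114, east=122,
                    start_year=1990, end_year=1995)]
print(hits)  # ['Dataset A']
```

Storing extents as simple minimum and maximum values keeps this test cheap enough to run as the first filter, before the keyword and taxon browsing steps.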

MarLIN search interface

Example MarLIN search result - by taxonomic group
subject categories | custodian organisations | vessels | voyages | projects | taxonomic groups | species | habitats | parameters | equipment
The following choices are presently available for MarLIN records in the selected region and/or time period:
Start year: 1990
End year: 1995
Selected region: Australian North West Shelf (stored coordinates used: North=-17, West=114, South=-24, East=122)
Click on any hyperlink to see the full listing for that item.
Invertebrates
Cephalopods
Squids 1
Crustaceans
Prawns & Shrimps 2
Fishes 4
Breams 1
Dories 1
Leatherjackets 1
Perches 3
Redfishes 1
Roughies 1
Snappers 4
Whales 1

Example MarLIN search result - by species
subject categories | custodian organisations | vessels | voyages | projects | taxonomic groups | species | habitats | parameters | equipment
The following choices are presently available for MarLIN records in the selected region and/or time period:
Start year: 1990
End year: 1995
Selected region: Australian North West Shelf (stored coordinates used: North=-17, West=114, South=-24, East=122)
Click on any hyperlink to see the full listing for that item
Nototodarus gouldi.. Gould's squid
Metanephrops boschmai.. Boschma's scampi
Metanephrops velutinus.. velvet scampi
Ibacus alticrenatus.. deepwater bug
Ibacus pubescens.. [a shovel-nosed/slipper lobster]
Saurida undosquamis.. brushtooth lizardfish
Saurida sp. 2 [in Sainsbury et al, 1985].. grey lizardfish
Gephyroberyx darwinii.. Darwin's roughy
Beryx splendens.. alfonsino 1
(etc.)

Example MarLIN search result - dataset titles
You searched on the following criteria:
Start year: 1990
End year: 1995
Selected region: Australian North West Shelf
CAAB Species: Saurida undosquamis
There are 3 datasets matching your criteria in MarLIN at this time. Click on the dataset title to view the metadata record for any dataset.
Southern Surveyor Voyage SS 02/90 - Biological Data Overview
Southern Surveyor Voyage SS 04/91 - Biological Data Overview
Southern Surveyor Voyage SS 08/95 - Biological Data Overview

SQuID - data repository and visualisation tool
Oracle relational database containing c. 45 tables (present version)
Holds point, poly-line, and polygon based, geo-referenced data (also time and depth referenced)
Client runs as Java applet, connects to Oracle data store by Remote Method Invocation (RMI) and JDBC
Search by spatial coordinates, time period, data “stream” … can subset by survey if desired
Retrieve atomic-level data for inspection or upload to user’s system
Basic plotting routines provided, such as:
  – geographic distribution of data (sampling points, vessel tracks)
  – vertical plots (e.g. temperature, salinity, oxygen vs depth)
  – time-based plots (e.g. water temperature measurement through a voyage)
  – pie charts for catch composition by number or weight
  – length-frequency data, aggregated or by sex of individual
Taxon handling using CAAB codes (system includes legacy data with obsolete codes)
Links to MarLIN to display relevant metadata
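
As an illustration of the aggregation behind one of the plots listed above (catch composition by weight), here is a small Python sketch. It is not SQuID code, which is described as a Java applet over Oracle: the function catch_composition, the taxon codes and the catch records are all hypothetical. The optional superseded_by mapping is an assumed way of folding legacy records with obsolete codes into their current code before aggregating.

```python
from collections import defaultdict


def catch_composition(records, superseded_by=None):
    """Return each taxon code's share (percent) of the total catch weight.

    records: iterable of (taxon_code, weight_kg) pairs.
    superseded_by: optional mapping of obsolete code -> current code.
    """
    superseded_by = superseded_by or {}
    totals = defaultdict(float)
    for taxon_code, weight_kg in records:
        current = superseded_by.get(taxon_code, taxon_code)
        totals[current] += weight_kg
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {code: 100.0 * weight / grand_total
            for code, weight in totals.items()}


# Made-up catch records; "T001-old" stands in for a legacy, obsolete code.
catch = [("T001", 30.0), ("T002", 50.0), ("T001-old", 20.0)]
shares = catch_composition(catch, superseded_by={"T001-old": "T001"})
print(shares)  # {'T001': 50.0, 'T002': 50.0}
```

The resulting percentages are what a pie chart of catch composition would display, one slice per current taxon code.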

SQuID user interface - version 1.0

Example SQuID search result

SQuID atomic level data - example

Time series data in SQuID

SQuID vs MarLIN / CAAB - two different approaches
SQuID - a data-rich browser environment
  – Large files uploaded to the browser to allow interactive functions (zoomable maps, on-demand display of sample details, cursor tracking, browser-generated plots)
  – Disadvantages: more complex applet to load, longer waits for queries to be serviced, performance on user’s machine may be limiting
MarLIN & CAAB - a minimal browser environment
  – No reliance on JAVA version control, browser plugins etc, no load time at startup
  – All processing takes place on the server (can maximise performance there) - less stringent requirements for users in hardware terms
  – Disadvantage: less real-time interactivity provided (although some workarounds possible)
… May look at a hybrid solution for SQuID v2 - prioritise what level of interactivity/data upload is really needed, handle more at server level

Some considerations for OBIS...
For agency-specific reasons, we have arrived at separate metadata/data systems. OBIS might want to integrate these two aspects more fully
Automated generation/maintenance of metadata might be possible (at least in part) and is certainly desirable
Where would OBIS metadata reside? (centrally or replicated or fully distributed?) - Australian “ASDD” is an example of a fully distributed system, NASA “GCMD” is a centralised one
Need to decide on taxon handling for OBIS (names or codes), plus standard(s) for higher level searching
OBIS software should aim to tolerate a diversity of agency-level systems, while encouraging/facilitating “best practice” data management

The End

CAAB web search