Published by Charla Watts. Modified over 9 years ago.
1
On-line biological data concepts at CSIRO Marine Research, Australia
Tony Rees & Kim Finney
Divisional Data Centre, CSIRO Marine Research, Hobart, Australia
http://www.marine.csiro.au/datacentre/
2
Our website: http://www.marine.csiro.au/datacentre/
3
Pre-existing situation at CMR (before 1997)
Data held in a variety of databases and flat files
No metadata or digital documentation
No web access to any data or metadata
CAAB (taxon coding system) in existence, but coverage patchy and compliance variable
4
Our implementation path
Stage 1 (1997-2000)...
– Construct a searchable, web-accessible metadata system and start populating it with information - MarLIN v1
– Upgrade CAAB to form a comprehensive taxon dictionary for MarLIN (also accessible by SQuID)
– Build a pilot data store and visualisation system with a web-driven GUI (Java applet) - SQuID v1
Stage 2 (2000-)...
– Build SQuID v2 (onwards) to become a comprehensive data store, with upgraded links to MarLIN and CAAB
– Implement linkage between MarLIN and an Australia-wide, distributed metadata search system
Stage 3… ???
5
Our system overview
Subsets of information shared with other metadata directory systems
Entry point to data
Display relevant metadata
Data directory (metadatabase) - holds info at “dataset” level (e.g. survey, species range)
Master data storage (includes index layer) - holds info at the atomic data level
Taxon dictionary
6
Digression #1: Taxon matching
Simplistic view:
– text match on one field (“scientific name”) or two (genus + species)
More comprehensive approach:
– 10 or more fields required, e.g. in CAAB we define the following:
Genus
Subgenus
Species
Qualifier
Subspecies
Variety
Original Author/s
Original Date
Revising Author/s
Revision Date
Authority Addendum
Also need to flag:
– Is botanical or zoological code applicable?
– Species name Latin or informal (“sp. A”, etc.)?
– Has name changed from original? (even if no revising author/date stored)
Examples from our database:
Chlamys (Belchlamys) aktinos (Petterd, 1886) … a scallop
Ophiaster hydroideus (Lohmann) Lohmann, 1913 emend. Manton & Oates, 1983 … a coccolithophorid
Heteroclinus sp. 1 [in Gomon et al, 1994].. Kuiter's weedfish
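The multi-field approach above can be sketched as a small record type. This is only an illustration, not CAAB's actual schema: the class name `TaxonName`, the attribute names, and the way each example name is split across fields are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaxonName:
    """Hypothetical record mirroring the fields listed on the slide."""
    genus: str
    species: str
    subgenus: Optional[str] = None
    qualifier: Optional[str] = None
    subspecies: Optional[str] = None
    variety: Optional[str] = None
    original_authors: Optional[str] = None
    original_date: Optional[str] = None
    revising_authors: Optional[str] = None
    revision_date: Optional[str] = None
    authority_addendum: Optional[str] = None
    # Flags from the slide (names are assumptions).
    nomenclatural_code: str = "zoological"   # or "botanical"
    informal_species: bool = False           # e.g. "sp. A"
    name_changed: bool = False               # changed from original combination

    def display_name(self) -> str:
        """Assemble a printable name from the separate fields."""
        parts = [self.genus]
        if self.subgenus:
            parts.append(f"({self.subgenus})")
        parts.append(self.species)
        for value in (self.subspecies, self.variety, self.qualifier):
            if value:
                parts.append(value)
        if self.original_authors:
            original = self.original_authors
            if self.original_date:
                original += f", {self.original_date}"
            # Parentheses mark an authority whose name has since changed.
            parts.append(f"({original})" if self.name_changed else original)
        if self.revising_authors:
            revising = self.revising_authors
            if self.revision_date:
                revising += f", {self.revision_date}"
            parts.append(revising)
        if self.authority_addendum:
            parts.append(self.authority_addendum)
        return " ".join(parts)


# Field assignments below are one plausible reading of the slide's examples.
scallop = TaxonName(
    genus="Chlamys", subgenus="Belchlamys", species="aktinos",
    original_authors="Petterd", original_date="1886", name_changed=True,
)
coccolithophorid = TaxonName(
    genus="Ophiaster", species="hydroideus",
    original_authors="Lohmann", name_changed=True,
    revising_authors="Lohmann", revision_date="1913",
    authority_addendum="emend. Manton & Oates, 1983",
    nomenclatural_code="botanical",
)
weedfish = TaxonName(
    genus="Heteroclinus", species="sp. 1",
    qualifier="[in Gomon et al, 1994]", informal_species=True,
)

print(scallop.display_name())
print(coccolithophorid.display_name())
print(weedfish.display_name())
```

Keeping the parts separate means two records can be compared field by field, which a single "scientific name" string match cannot do reliably.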
7
Taxon matching … continued
We have standardised on taxon codes, rather than names, for data storage and matching … names are stored as an attribute of the code (and can be updated in the future as needed)
Our “CAAB” coding system has evolved over 20+ years; earlier generations of codes are maintained on the system
New web-based access facility for retrieving the latest name for a code, searching for a taxon, etc.
The same CAAB codes are also used by other marine science/fisheries agencies around Australia
Facility newly implemented in CAAB to hold ITIS codes, for cross-reference to international systems in the future
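A minimal sketch of the "code first, name as attribute" idea follows. The two CAAB codes and names are taken from the MarLIN species listing later in this deck; the legacy code `LEGACY-001`, the table names, and the function names are hypothetical and only illustrate how an earlier-generation code might be resolved to its current equivalent.

```python
# Current name stored as an attribute of the code (codes from the MarLIN example).
CURRENT_NAMES = {
    "37 118001": "Saurida undosquamis",
    "23 636004": "Nototodarus gouldi",
}

# Hypothetical mapping from an earlier-generation code to its replacement.
SUPERSEDED_BY = {
    "LEGACY-001": "37 118001",
}


def resolve_code(code: str) -> str:
    """Follow the chain of superseded codes until a current code is reached."""
    seen = set()
    while code in SUPERSEDED_BY:
        if code in seen:
            raise ValueError(f"circular supersession involving {code!r}")
        seen.add(code)
        code = SUPERSEDED_BY[code]
    return code


def current_name(code: str) -> str:
    """Return the latest name for a current or legacy code."""
    return CURRENT_NAMES[resolve_code(code)]


print(current_name("LEGACY-001"))
print(current_name("23 636004"))
```

Because data records hold only the code, a later name revision is a single update to the name table and leaves stored data untouched.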
8
CAAB services available
CAAB user interface
– User searches by scientific name, common name or taxon code (or portion thereof)
– List taxa by CAAB category or family
– Retrieve current sci. name, common name(s), taxon code, taxon report
– Initiate a MarLIN search, ITIS report, FishBase report
Application-level requests
– Generate scientific name, common name, current code (if applicable) for a given taxon code
– Call a CAAB taxon report
– List taxa matching query
– Translate an ITIS number to a CAAB code (or vice versa)
9
CAAB web interface (current version)
10
Digression #2: taxonomy keywords
CAAB uses “major categories” (mostly = phyla)
MarLIN uses Australian “Blue Pages” keywords (c. 100 terms) - independent of CAAB codes (in current implementation)
NASA GCMD keywords would be an OBIS option (maybe with additions to suit OBIS) - c. 50 currently relevant … could also cross-map to GEMET (EC) list (c. 200)
EARTH SCIENCE >> Biosphere >> Zoology >> Amphibians
EARTH SCIENCE >> Biosphere >> Zoology >> Anemones
EARTH SCIENCE >> Biosphere >> Zoology >> Arachnids
EARTH SCIENCE >> Biosphere >> Zoology >> Arthropods
EARTH SCIENCE >> Biosphere >> Zoology >> Birds
EARTH SCIENCE >> Biosphere >> Zoology >> Centipedes
EARTH SCIENCE >> Biosphere >> Zoology >> Corals
EARTH SCIENCE >> Biosphere >> Zoology >> Crustaceans
EARTH SCIENCE >> Biosphere >> Zoology >> Echinoderms
EARTH SCIENCE >> Biosphere >> Zoology >> Fish
EARTH SCIENCE >> Biosphere >> Zoology >> Flatworms
EARTH SCIENCE >> Biosphere >> Zoology >> Insects
EARTH SCIENCE >> Biosphere >> Zoology >> Invertebrates
EARTH SCIENCE >> Biosphere >> Zoology >> Jellyfish
EARTH SCIENCE >> Biosphere >> Zoology >> Mammals
EARTH SCIENCE >> Biosphere >> Zoology >> Millipedes
EARTH SCIENCE >> Biosphere >> Zoology >> Mollusks
EARTH SCIENCE >> Biosphere >> Zoology >> Reptiles
EARTH SCIENCE >> Biosphere >> Zoology >> Roundworms
EARTH SCIENCE >> Biosphere >> Zoology >> Segmented Worms
EARTH SCIENCE >> Biosphere >> Zoology >> Sponges
EARTH SCIENCE >> Biosphere >> Zoology >> Vertebrates
EARTH SCIENCE >> Biosphere >> Zoology >> Zooplankton
EARTH SCIENCE >> Biosphere >> Microbiota >> Amoebae
EARTH SCIENCE >> Biosphere >> Microbiota >> Bacteria
EARTH SCIENCE >> Biosphere >> Microbiota >> Blue-green Algae
EARTH SCIENCE >> Biosphere >> Microbiota >> Ciliates
EARTH SCIENCE >> Biosphere >> Microbiota >> Coccolithophore
EARTH SCIENCE >> Biosphere >> Microbiota >> Diatoms
EARTH SCIENCE >> Biosphere >> Microbiota >> Flagellates
EARTH SCIENCE >> Biosphere >> Microbiota >> Foraminifers
EARTH SCIENCE >> Biosphere >> Microbiota >> Microalgae
EARTH SCIENCE >> Biosphere >> Microbiota >> Microphyte
EARTH SCIENCE >> Biosphere >> Microbiota >> Phytoplankton
EARTH SCIENCE >> Biosphere >> Microbiota >> Plankton
EARTH SCIENCE >> Biosphere >> Microbiota >> Protist
EARTH SCIENCE >> Biosphere >> Microbiota >> Radiolarians
EARTH SCIENCE >> Biosphere >> Microbiota >> Zooplankton
EARTH SCIENCE >> Biosphere >> Vegetation >> Algae
EARTH SCIENCE >> Biosphere >> Vegetation >> Flowering Plants
EARTH SCIENCE >> Biosphere >> Vegetation >> Lichens
EARTH SCIENCE >> Biosphere >> Vegetation >> Macroalgae
EARTH SCIENCE >> Biosphere >> Vegetation >> Macrophyte
EARTH SCIENCE >> Biosphere >> Vegetation >> Phytoplankton
11
Taxonomy keyword cross-mapping (examples)
GCMD list:
Invertebrates, Sponges, Jellyfish, Anemones, Corals, Flatworms, Roundworms, Segmented Worms, Mollusks, Arthropods, Insects, Arachnids, Echinoderms, Crustaceans, Vertebrates, Fish, Amphibians, Reptiles, Birds, Mammals
GEMET list:
invertebrate … S709
poriferan … S744
coelenterate … S737
coral … S738
nematode … S743
annelid … S711 ++
mollusc … S740
cephalopod … S741
gastropod … S742
arthropod … S713
insect … S719 ++
chelicerate … S714 ++
echinoderm … S739
crustacean … S717
vertebrate … S649
fish … S754
amphibian … S650 ++
reptile … S691 ++
bird … S654 ++
mammal … S664 ++
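A cross-mapping of this kind is naturally one-to-many, and some keywords have no counterpart. The sketch below shows one possible lookup structure. The GEMET terms and codes are those listed on the slide, but the pairings with GCMD keywords are this sketch's own reading (the slide text does not preserve which terms were linked), and the names `GCMD_TO_GEMET`, `gemet_codes` and `gcmd_keywords_for` are hypothetical.

```python
# GCMD keyword -> list of (GEMET term, GEMET code).
# Pairings are assumed for illustration; only a few keywords are included.
GCMD_TO_GEMET = {
    "Sponges": [("poriferan", "S744")],
    "Corals": [("coral", "S738")],
    "Mollusks": [
        ("mollusc", "S740"),
        ("cephalopod", "S741"),
        ("gastropod", "S742"),
    ],
    "Fish": [("fish", "S754")],
    "Flatworms": [],  # no matching term appears in the GEMET list shown
}


def gemet_codes(gcmd_keyword: str) -> list:
    """Return the GEMET codes mapped to a GCMD keyword (empty if none)."""
    return [code for _term, code in GCMD_TO_GEMET.get(gcmd_keyword, [])]


def gcmd_keywords_for(gemet_code: str) -> list:
    """Reverse lookup: GCMD keywords whose mapping includes the GEMET code."""
    return sorted(
        keyword
        for keyword, terms in GCMD_TO_GEMET.items()
        if any(code == gemet_code for _term, code in terms)
    )


print(gemet_codes("Mollusks"))
print(gcmd_keywords_for("S738"))
```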
12
MarLIN - used for data discovery
MarLIN is based on an Oracle database containing dataset, project, and survey descriptions, plus on-line links to data and web resources
Holds metadata according to regional (ANZLIC and “Blue Pages”) standards, with additional agency-constructed fields (“extended ANZLIC”)
Web interface for searching and metadata contribution/update, using HTML, Oracle Web Server and a custom PL/SQL application
Produces lists of datasets, or dataset reports, as requested
Includes links to pre-formatted data “packets” (now) and to SQuID (in future), for access to the data
NB: no data visualisation capability, apart from “thumbnails” showing data extent
13
MarLIN - behind the scenes
Some 25+ tables, holding the following:
– text-based fields (e.g. title, abstract, contributors, references, etc.)
– keywords, handled as numeric IDs (including taxonomic keywords)
– species/species groups, handled as CAAB codes
– spatial extent, handled as bounding coordinates (max. and min. latitude and longitude)
– time extent, handled as earliest and latest collection date for items in the dataset
– originator organisation, present custodian, survey, contact person, etc., handled as numeric IDs
Initial search set up by keyword/ID type, spatial coordinates, time period (if desired)
Then search/browse by subject categories, keywords, taxon names, contributing project, vessel/voyage identifier, location of data, etc.
Free text search also supported
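Storing spatial extent as bounding coordinates and time extent as earliest/latest dates makes the initial search a pair of overlap tests. The following is a rough sketch of that logic, not MarLIN's PL/SQL: the `DatasetExtent` class, the function name, the sample records, and the use of whole years instead of dates are all assumptions. It also ignores boxes that cross the 180° meridian.

```python
from dataclasses import dataclass


@dataclass
class DatasetExtent:
    """Hypothetical dataset-level record: bounding box plus year range."""
    title: str
    north: float
    south: float
    west: float
    east: float
    start_year: int
    end_year: int


def matches_search(ds, north, south, west, east, start_year, end_year):
    """True if the dataset's box and period both overlap the search window."""
    spatial = (
        ds.south <= north and ds.north >= south
        and ds.west <= east and ds.east >= west
    )
    temporal = ds.start_year <= end_year and ds.end_year >= start_year
    return spatial and temporal


# Illustrative records only; not real MarLIN entries.
datasets = [
    DatasetExtent("Shelf survey A", -18.0, -21.0, 115.0, 119.0, 1990, 1990),
    DatasetExtent("Southern survey B", -40.0, -44.0, 144.0, 149.0, 1992, 1993),
    DatasetExtent("Shelf survey C", -19.0, -20.0, 116.0, 118.0, 1997, 1998),
]

# Search window: North=-17, West=114, South=-24, East=122, years 1990-1995.
hits = [
    ds.title
    for ds in datasets
    if matches_search(ds, -17, -24, 114, 122, 1990, 1995)
]
print(hits)
```

Survey B fails the spatial test and survey C fails the temporal test, so only survey A is returned.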
14
MarLIN search interface
15
Example MarLIN search result - by taxonomic group
subject categories | custodian organisations | vessels | voyages | projects | taxonomic groups | species | habitats | parameters | equipment
The following choices are presently available for MarLIN records in the selected region and/or time period:
Start year: 1990 End year: 1995
Selected region: Australian North West Shelf (stored coordinates used: North=-17, West=114, South=-24, East=122)
Click on any hyperlink to see the full listing for that item.
Invertebrates 4
.... Cephalopods 1
...... Squids 1
.. Crustaceans 2
.... Prawns & Shrimps 2
Fishes 4
.. Breams 1
.. Dories 1
.. Leatherjackets 1
.. Perches 3
.. Redfishes 1
.. Roughies 1
.. Snappers 4
.. Whales 1
16
Example MarLIN search result - by species
subject categories | custodian organisations | vessels | voyages | projects | taxonomic groups | species | habitats | parameters | equipment
The following choices are presently available for MarLIN records in the selected region and/or time period:
Start year: 1990 End year: 1995
Selected region: Australian North West Shelf (stored coordinates used: North=-17, West=114, South=-24, East=122)
Click on any hyperlink to see the full listing for that item.
23 636004 Nototodarus gouldi.. Gould's squid 1
28 786002 Metanephrops boschmai.. Boschma's scampi 1
28 786005 Metanephrops velutinus.. velvet scampi 1
28 821001 Ibacus alticrenatus.. deepwater bug 1
28 821002 Ibacus pubescens.. [a shovel-nosed/slipper lobster] 1
37 118001 Saurida undosquamis.. brushtooth lizardfish 3
37 118016 Saurida sp. 2 [in Sainsbury et al, 1985].. grey lizardfish 3
37 255004 Gephyroberyx darwinii.. Darwin's roughy 1
37 258002 Beryx splendens.. alfonsino 1
(etc.)
17
Example MarLIN search result - dataset titles
You searched on the following criteria:
Start year: 1990 End year: 1995
Selected region: Australian North West Shelf
CAAB Species: 37 118001 - Saurida undosquamis
There are 3 datasets matching your criteria in MarLIN at this time. Click on the dataset title to view the metadata record for any dataset.
Southern Surveyor Voyage SS 02/90 - Biological Data Overview
Southern Surveyor Voyage SS 04/91 - Biological Data Overview
Southern Surveyor Voyage SS 08/95 - Biological Data Overview
18
SQuID - data repository and visualisation tool
Oracle relational database containing c. 45 tables (present version)
Holds point, poly-line, and polygon based, geo-referenced data (also time and depth referenced)
Client runs as a Java applet and connects to the Oracle data store by Remote Method Invocation (RMI) and JDBC
Search by spatial coordinates, time period, data “stream” … can subset by survey if desired
Retrieve atomic-level data for inspection or upload to user’s system
Basic plotting routines provided, such as:
– geographic distribution of data (sampling points, vessel tracks)
– vertical plots (e.g. temperature, salinity, oxygen vs depth)
– time-based plots (e.g. water temperature measurement through a voyage)
– pie charts for catch composition by number or weight
– length-frequency data, aggregated or by sex of individual
Taxon handling using CAAB codes (system includes legacy data with obsolete codes)
Links to MarLIN to display relevant metadata
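As an illustration of the last plotting routine listed, the sketch below bins atomic-level length measurements into a length-frequency table, either aggregated or split by sex. It is not SQuID's code (the SQuID client is a Java applet); the function name, the `(length_cm, sex)` record layout, the 1 cm default bin width, and the sample values are assumptions.

```python
from collections import Counter, defaultdict


def length_frequency(records, bin_width_cm=1.0, by_sex=False):
    """Count individuals per length bin.

    records: iterable of (length_cm, sex) tuples.
    Returns {group: {bin_start_cm: count}}, where group is the sex code
    when by_sex is True and "all" otherwise.
    """
    histogram = defaultdict(Counter)
    for length_cm, sex in records:
        # Lower edge of the bin that contains this measurement.
        bin_start = int(length_cm // bin_width_cm) * bin_width_cm
        group = sex if by_sex else "all"
        histogram[group][bin_start] += 1
    return {group: dict(counts) for group, counts in histogram.items()}


# Made-up measurements for illustration.
sample = [(10.2, "F"), (10.8, "M"), (11.5, "F")]

print(length_frequency(sample))
print(length_frequency(sample, by_sex=True))
```

The returned dictionaries can be handed straight to any plotting routine as bar heights per bin.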
19
SQuID user interface - version 1.0
20
Example SQuID search result
21
SQuID atomic level data - example
22
Time series data in SQuID
23
SQuID vs MarLIN / CAAB - two different approaches
SQuID - a data-rich browser environment
– Large files uploaded to the browser to allow interactive functions (zoomable maps, on-demand display of sample details, cursor tracking, browser-generated plots)
– Disadvantages: more complex applet to load, longer waits for queries to be serviced, performance on user’s machine may be limiting
MarLIN & CAAB - a minimal browser environment
– No reliance on Java version control, browser plugins, etc.; no load time at startup
– All processing takes place on the server (can maximise performance there) - less stringent hardware requirements for users
– Disadvantage: less real-time interactivity provided (although some workarounds possible)
… May look at a hybrid solution for SQuID v2 - prioritise what level of interactivity/data upload is really needed, handle more at server level
24
Some considerations for OBIS...
For agency-specific reasons, we have arrived at separate metadata and data systems. OBIS might want to integrate these two aspects more fully
Automated generation/maintenance of metadata might be possible (at least in part) and is certainly desirable
Where would OBIS metadata reside? (centrally, replicated, or fully distributed?) - Australian “ASDD” is an example of a fully distributed system, NASA “GCMD” is a centralised one
Need to decide on taxon handling for OBIS (names or codes), plus standard(s) for higher-level searching
OBIS software should aim to tolerate a diversity of agency-level systems, while encouraging/facilitating “best practice” data management
25
The End
26
CAAB web search