Tony Rees CSIRO Marine Research 29 November 2004



1 Tony Rees CSIRO Marine Research 29 November 2004
Evolving concepts in the architecture of OBIS, the Ocean Biogeographic Information System Tony Rees CSIRO Marine Research 29 November 2004 Phoebe Zhang Rutgers University, N.J.

2 OBIS Basics...
OBIS – the Ocean Biogeographic Information System
- Single access point for distribution records for marine species from multiple sources over the internet, with onward access to analytical tools and maps (including correlations with environmental data etc.) – ultimately to be a 3-d and 4-d atlas of marine species distributions
- Designated role as the data and information management component of the Census of Marine Life (CoML operational lifetime: 2000–2010)
OBIS (brief!) history
- Vision developed during a series of workshops, 1999–2000
- 8 initial data providers funded to develop content for OBIS, 2000–2001
- Initial version of OBIS Portal went “live” in January 2002 – based at Rutgers University, N.J.
- Additional technical development and expanding content, 2002 – ongoing
Focus of this talk...
- “Behind the scenes” look at OBIS architecture, and how the OBIS Portal has evolved over the past 2+ years – focus on features supporting user searches.

3 OBIS Version 1 (Jan 2002–Feb 2004)
[Diagram: www users 1, 2, 3 (etc.) connect to the OBIS Portal, which connects to data providers 1, 2, 3 (etc.); a legend marks the custom database wrappers. Mapping tools: C-squares mapper, Mapping tool 2, Mapping tool 3.]
1: submit search request
2: retrieve matching data
3 (optional): pass to 3rd party tools for mapping, etc.

4 Advantages, disadvantages of this approach
Advantages
- Technically simple to implement: the portal simply relays queries and processes / integrates results
- No custodianship issues – data remain with providers
Disadvantages
- System performance reliant on factors outside portal’s control – provider/s may be slow, off line, etc. at query time
- No prior knowledge of what species are worth querying on, how much data will be returned, etc.
- Have to wait for all data to be returned before passing to mappers, etc.; species handled serially (one at a time)
- Spatial searches slow (have to parse millions of point data records)
- No facility to search by taxonomic group, search for “near matches”, etc. (such facilities not supported at provider end)
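The relay pattern and its weaknesses can be sketched in a few lines. This is an illustration only, not the Version 1 portal code: the three provider functions are hypothetical stand-ins for remote database wrappers, and `portal_search_v1` is an invented name.

```python
# Hypothetical stand-ins for remote provider wrappers; real ones would be network calls.
def provider_1(species):
    return [(59.7, -9.6)] if species == "Balaenoptera acutorostrata" else []

def provider_2(species):
    raise TimeoutError("provider off line")

def provider_3(species):
    return [(58.2, -8.4)] if species == "Balaenoptera acutorostrata" else []

def portal_search_v1(species_list, providers):
    """Relay each species to every provider in turn and integrate whatever comes back."""
    results, failures = {}, []
    for species in species_list:            # species handled serially
        points = []
        for provider in providers:          # each call waits on a remote system
            try:
                points.extend(provider(species))
            except TimeoutError:
                failures.append((species, provider.__name__))
        results[species] = points           # only now can points go to a mapper
    return results, failures

results, failures = portal_search_v1(
    ["Balaenoptera acutorostrata"], [provider_1, provider_2, provider_3])
```

Every query pays the cost of the slowest provider, and an off-line provider silently shrinks the result, which is the behaviour the disadvantages above describe.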

5 Analogy with internet search engines (e.g. Google ® etc.)
- ... Would be impossibly slow to search 8 billion web pages in real time to service a user’s initial request
- Google, etc., construct locally held indexes (e.g. sorted by term etc.) and search these in the first instance – can provide very fast results
- Indexes are constructed by continuously crawling the web for new or updated content (note: may be a currency issue here)
- Also, Google constructs a local data cache of all content – remains accessible even if original provider is off line.
Equivalents for OBIS: a name index and a spatial index – to support name / category, and spatial searching; also a locally held data cache.
- Cache is built by crawling the remote data providers (and refreshed at intervals)
- Name and spatial indexes are built by parsing the Cache content.
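A minimal sketch of the "parse the cache, build the indexes" step, under stated assumptions: the record layout, the `build_indexes` name and the sample records are hypothetical, and the c-square codes are supplied ready-made rather than computed from coordinates.

```python
from collections import defaultdict

# Hypothetical cache records: (provider, species name, c-square code).
# Real cache records would carry full point data; only these fields matter here.
CACHE = [
    ("provider 1", "Balaenoptera acutorostrata", "7500:499:4"),
    ("provider 2", "Balaenoptera acutorostrata", "7500:499:4"),
    ("provider 2", "Balaenoptera acutorostrata", "7500:488:1"),
    ("provider 1", "Lutjanus sebae", "3111:123:2"),
]

def build_indexes(cache):
    """Parse the cache once and derive a name index and a spatial index."""
    name_index = defaultdict(lambda: {"records": 0, "providers": set()})
    spatial_index = defaultdict(set)          # species -> distinct squares
    for provider, species, csquare in cache:
        entry = name_index[species]
        entry["records"] += 1
        entry["providers"].add(provider)
        spatial_index[species].add(csquare)   # repeated squares collapse here
    return dict(name_index), dict(spatial_index)

name_index, spatial_index = build_indexes(CACHE)
```

Because both indexes are derived from the cache alone, they can be rebuilt after each crawl without contacting the providers again.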

6 The OBIS Index
The OBIS Index – actually a small relational database
- Main table “obis_species” has 1 row for every species name currently in OBIS (currently c. 110,000), i.e. substantially smaller than overall number of records in the system (currently 5m+ and rising)
- “obis_groups” table has the custom taxonomic hierarchy currently used in OBIS
- “obis_distributions” table has the spatial index – higher level concept than the GBIF index (which is a data item level listing); intended for rapid, concise taxon-level information to be returned prior to any actual data extraction.
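One way the three tables might fit together is sketched below using an in-memory SQLite database. Only the table names and `tax_id` come from the slides; every other column name, the group code `A1` and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE obis_groups (
    group_code TEXT PRIMARY KEY,   -- hierarchical code (hypothetical column)
    group_name TEXT NOT NULL
);
CREATE TABLE obis_species (
    tax_id      INTEGER PRIMARY KEY,
    sci_name    TEXT NOT NULL UNIQUE,
    group_code  TEXT REFERENCES obis_groups(group_code),
    num_records INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE obis_distributions (
    tax_id  INTEGER REFERENCES obis_species(tax_id),
    csquare TEXT NOT NULL,
    PRIMARY KEY (tax_id, csquare)
);
""")
conn.execute("INSERT INTO obis_groups VALUES ('A1', 'whales')")
conn.execute(
    "INSERT INTO obis_species VALUES (1, 'Balaenoptera acutorostrata', 'A1', 1700)")
conn.executemany("INSERT INTO obis_distributions VALUES (?, ?)",
                 [(1, "7500:499:4"), (1, "7500:488:1")])

# Taxon-level summary returned without touching any point data.
summary = conn.execute("""
    SELECT s.sci_name, s.num_records, COUNT(d.csquare)
    FROM obis_species s JOIN obis_distributions d ON s.tax_id = d.tax_id
    GROUP BY s.tax_id, s.sci_name, s.num_records
""").fetchall()
```

The summary query reads only index tables, which is what allows concise taxon-level information to be shown before any data extraction.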

7 Fragment of the “OBIS groups” (taxonomic categories) table
- Intention is to provide popular / recognisable groupings (not necessarily equivalent to strict phylum, order, family treatment)
- Hierarchical coding allows simple interrogation at any level of the hierarchy.
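The table fragment itself is not reproduced in this transcript, so the coding scheme below is invented purely to show why hierarchical codes make interrogation simple: if each level appends to its parent's code, "everything in a group" becomes a prefix test. The codes, group names and `species_in_group` function are all hypothetical.

```python
# Hypothetical hierarchical codes: each level appends to its parent's code.
GROUPS = {
    "V": "vertebrates",
    "VF": "fishes",
    "VM": "mammals",
    "VMW": "whales",
    "VMS": "seals",
}
SPECIES_GROUP = {
    "Balaenoptera acutorostrata": "VMW",
    "Phoca vitulina": "VMS",
    "Lutjanus sebae": "VF",
}

def species_in_group(group_code):
    """All species whose group code sits at or below group_code in the hierarchy."""
    return sorted(name for name, code in SPECIES_GROUP.items()
                  if code.startswith(group_code))

whales = species_in_group("VMW")
mammals = species_in_group("VM")
```

In SQL the same test would be a `LIKE 'VM%'` condition on the code column.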

8 Underpinning the spatial index ... allocation to global 0.5º x 0.5º grid squares, using “c-squares” hierarchical notation...
[Figure: map of 10º square 7500 (50º N to 60º N, 010º W to 000º W), with nested squares labelled 7500:4, 7500:499 and 7500:499:4.]
0.5º x 0.5º squares (shown in red) are current units of spatial indexing for OBIS
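The slide shows the codes only by example. The sketch below assigns a point to its square using the c-squares rule as it is generally described (global quadrant digit, tens of latitude, tens of longitude, then alternating intermediate-quadrant and unit digits); treat the details and the function names as assumptions rather than the OBIS code.

```python
def _intermediate_quadrant(lat_digit, lon_digit):
    # 1 = low lat / low lon, 2 = low / high, 3 = high / low, 4 = high / high
    return 2 * (lat_digit >= 5) + (lon_digit >= 5) + 1

def csquare_code(lat, lon, resolution=0.5):
    """Return the c-square code for a point at 10, 5, 1 or 0.5 degree resolution."""
    if resolution not in (10, 5, 1, 0.5):
        raise ValueError("resolution must be 10, 5, 1 or 0.5")
    # Global quadrant digit: 1 = NE, 3 = SE, 5 = SW, 7 = NW.
    quadrant = {(True, True): 1, (False, True): 3,
                (False, False): 5, (True, False): 7}[(lat >= 0, lon >= 0)]
    # Clamp so that lat 90 / lon 180 fall inside the outermost squares.
    alat, alon = min(abs(lat), 89.999999), min(abs(lon), 179.999999)
    lat_deg, lon_deg = int(alat), int(alon)
    code = f"{quadrant}{lat_deg // 10}{lon_deg // 10:02d}"
    if resolution == 10:
        return code
    lat_unit, lon_unit = lat_deg % 10, lon_deg % 10
    code += f":{_intermediate_quadrant(lat_unit, lon_unit)}"
    if resolution == 5:
        return code
    code += f"{lat_unit}{lon_unit}"
    if resolution == 1:
        return code
    # Half-degree level: only need to know which half of the degree we are in.
    lat_half = 5 if alat - lat_deg >= 0.5 else 0
    lon_half = 5 if alon - lon_deg >= 0.5 else 0
    return code + f":{_intermediate_quadrant(lat_half, lon_half)}"

code = csquare_code(59.7, -9.6)
```

For a point at 59.7º N, 9.6º W this yields the same nested sequence as the labels on the slide: 7500, 7500:4, 7500:499, 7500:499:4.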

9 Fragment of the spatial index...
Index “knows” all the squares within which any species has data records (out of global set of 259,200)
Now simple to retrieve either:
- all species for a given square (at any level of the hierarchy)
- all squares for a given species
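Both lookups can be illustrated with a small in-memory structure. The data and function names are hypothetical; the one property relied on is that a coarser c-square code is a string prefix of every finer code inside it, which is what makes the "any level of the hierarchy" query cheap.

```python
# Hypothetical spatial index: species -> set of 0.5 degree c-square codes.
SQUARES_BY_SPECIES = {
    "Balaenoptera acutorostrata": {"7500:499:4", "7500:488:1", "7501:110:3"},
    "Phoca vitulina": {"7500:499:4"},
    "Lutjanus sebae": {"3111:123:2"},
}

def squares_for_species(name):
    """All squares in which the species has at least one record."""
    return sorted(SQUARES_BY_SPECIES.get(name, ()))

def species_for_square(square):
    """All species recorded in a square given at 10, 5, 1 or 0.5 degree level."""
    return sorted(name for name, squares in SQUARES_BY_SPECIES.items()
                  if any(code.startswith(square) for code in squares))

in_ten_degree = species_for_square("7500")
in_half_degree = species_for_square("7500:499:4")
```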

10 Spatial index also supports high level mapping direct from the index

11 Spatial index also supports high level mapping direct from the index
- “Quick maps” use the CSIRO Marine Research c-squares mapper, accessed via a web call
- Again, efficiencies here in only sending the list of squares to be mapped rather than all the data points (may execute e.g. 5–50 times faster)
- Effectively a “data preview” function, prior to getting the data for more sophisticated mapping / analysis if desired
- Also functions as a clickable GUI to query individual data points in a region.

12 Other information held in the name index...
- Full list of taxon names from data providers, plus additional “names without data” from the Catalogue of Life compilation (to allow matching on names even when no data yet available, status assessment, etc.)
- Metadata on every species (how many records, which data providers, what date range), for user display prior to “get OBIS data” request
- Taxonomic group allocation for all taxa, plus common names as available
- “Near match” versions of all names to support “fuzzy” search option
- Hiding / screening of species deemed “non-marine” – according to pre-formatted list/s, also of some “junk” data – e.g. “unknown species A” etc. – supplied by data providers
- Preliminary reconciliation of junior synonyms / known misspellings, based on Cat. of Life information as available
- Onward links to Cat. of Life where appropriate, for further taxonomic information, full synonymy, etc.
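The slides do not say how the “near match” versions of names are generated. As a stand-in only, the sketch below uses similarity matching from Python's standard `difflib` module to show what a “fuzzy” name search does for the user; the name list, the `near_matches` function and the 0.85 cutoff are assumptions, and the real index appears to precompute near-match forms rather than compare at query time.

```python
import difflib

# Hypothetical extract of the name index.
NAMES = ["Balaenoptera acutorostrata", "Balaenoptera musculus", "Lutjanus sebae"]

def near_matches(query, names=NAMES, cutoff=0.85):
    """Return indexed names that closely resemble the (possibly misspelled) query."""
    lowered = {name.lower(): name for name in names}   # compare case-insensitively
    hits = difflib.get_close_matches(query.lower(), list(lowered), n=5, cutoff=cutoff)
    return [lowered[hit] for hit in hits]

suggestions = near_matches("Balaenoptera acutorostata")   # missing an "r"
```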

13 Putting it all together...
OBIS home page – includes clickable map for spatial searching – currently set to 10º x 10º squares (could be finer as more data available)

14 Full scientific name / category search page
- Examples: all fishes beginning with “a”, or “all whales”, or all species of “Lutjanus”...
- Common name, browse category searches also available

15 Example search result for “all whales” (initial portion of page)
- Note, all results presented with metadata, “Quick maps”, etc... (all from index content)
- “Get OBIS data” link initiates a data extraction from the Cache (e.g. 1 – 40,000+ points)
- Also included: names without data (from Cat. of Life), plus “near matches” if applicable

16 Revised architecture: OBIS Version 2 (March 2004–current)
[Diagram: www users 1, 2, 3 (etc.) connect to the OBIS Portal. The Portal queries the OBIS Index (“Stage 1” query) and the OBIS Cache (“Stage 2” query). Provider crawling fills the Cache from data providers 1, 2, 3 (etc.); index building derives the Index, with input from Cat. of Life. Mapping tools: C-squares mapper, Mapping tool 2, Mapping tool 3.]
1: submit search request
2: retrieve matching names + metadata
3a (optional): “Quick map” for any species
3b: “get OBIS data” for relevant species, send to mappers, etc.
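The two-stage flow can be sketched as three small functions, one per numbered step after the search request. The data layouts and function names are hypothetical; the point is only that steps 2 and 3a are answered from the index, and the cache is read only at step 3b.

```python
# Stage 1 source: one entry per species, built in advance (hypothetical layout).
INDEX = {
    "Balaenoptera acutorostrata": {
        "records": 3,
        "squares": ["7500:488:1", "7500:499:4"],
    },
}
# Stage 2 source: locally held point records as (lat, lon) (hypothetical layout).
CACHE = {
    "Balaenoptera acutorostrata": [(59.7, -9.6), (59.8, -9.9), (58.2, -8.4)],
}

def stage1_search(term):
    """Step 2: return matching names plus metadata, from the index only."""
    term = term.lower()
    return {name: meta for name, meta in INDEX.items() if term in name.lower()}

def quick_map_squares(name):
    """Step 3a: squares for a quick map, still without reading any point data."""
    return INDEX[name]["squares"]

def get_obis_data(name):
    """Step 3b: extract the actual point records from the cache."""
    return CACHE.get(name, [])

matches = stage1_search("balaenoptera")
```

No remote provider is contacted at query time in any of these steps, which is the main change from Version 1.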

17 Comparison between versions...
- New version supports far better user experience – improved performance, many new features, enhanced browse / preview functionality, etc.
- Data cache and “metadata layer” (index) mean that most / all queries have effectively been run in advance, saving search time
- However, new system is technically more challenging to design / install / maintain, with consequent resourcing implications
- Also, move away from live point-of-origin queries (for performance reasons) means that index, cache, etc. must be continually updated (to avoid currency issues).

18 Next steps...
- More data, data providers every month (scalability, performance monitoring issues)
- Completion / bedding down / more automation of current functionality (most is fairly new and still undergoing a degree of real-world testing)
- Desirable to support gazetteer, polygon-based spatial searching (how?)
- May wish to represent more than simple presence/absence data – biomass / numbers, tracking / tagging, effort, etc. – does this affect options for the index?
- Inclusion of some data screening / flagging at point (=Cache) level (currently, operates only at taxon, = index level)
- Plus ++ ?? (system and concepts are still to a degree an evolving entity).

19 Thank You
Contacts
Dr Tony Rees, Manager, Divisional Data Centre, CSIRO Marine Research, Australia. Phone:
Dr Phoebe Zhang, OBIS Portal Manager, IMCS, Rutgers, the State University of New Jersey, USA. Phone:
Additional information: OBIS web site – C-squares web site –

20 Supplementary slides

21 c-squares notation... 7500:499:4
Square “minimum” corner is at 59.5º N, 009.5º W, according to the following formula:
- 7500:499 is at 59º N, 009º W (7500:499, 7500:499)
- Next digit “4” indicates next intermediate quadrant
- Initial digit “7” indicates NW global quadrant.
[Figure: numbered layouts of the global quadrants and of the intermediate quadrants (1–4).]
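Reading a code back into its “minimum” corner (the corner nearest 0º, 0º) can be sketched as follows. The decoding rule is the c-squares convention as generally described, with global quadrants 1 = NE, 3 = SE, 5 = SW, 7 = NW and intermediate quadrants 1 to 4; the function name and the signed-degree return format are assumptions.

```python
def csquare_min_corner(code):
    """Return (lat, lon, size): signed degrees of the corner nearest 0,0 and the square size."""
    parts = code.split(":")
    head = parts[0]
    quadrant = int(head[0])
    lat = 10.0 * int(head[1])        # tens of degrees latitude
    lon = 10.0 * int(head[2:4])      # tens of degrees longitude
    size = 10.0
    for part in parts[1:]:
        if len(part) == 1:
            # Intermediate quadrant only: halves the square (10 -> 5, 1 -> 0.5).
            iq = int(part)
            size /= 2
            lat += ((iq - 1) // 2) * size   # quadrants 3 and 4 are the high-latitude half
            lon += ((iq - 1) % 2) * size    # quadrants 2 and 4 are the high-longitude half
        else:
            # Quadrant digit plus explicit lat and lon digits: next decimal level.
            size /= 10
            lat += int(part[1]) * size
            lon += int(part[2]) * size
    lat_sign = 1 if quadrant in (1, 7) else -1   # NE and NW are north
    lon_sign = 1 if quadrant in (1, 3) else -1   # NE and SE are east
    return lat_sign * lat, lon_sign * lon, size

corner = csquare_min_corner("7500:499:4")
```

For 7500:499:4 this gives 59.5º N, 9.5º W with a 0.5º square, matching the worked example on the slide.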

22 A “different” / rapid approach to spatial searching (programmatically speaking)
Normal spatial query would be (example for a 10-degree square):
select distinct species code (or tax_id) where lat between 50 and 60 and long between -10 and 0
Equivalent c-squares spatial query:
where csquares like '%7500%'
- ...runs faster, and makes use of efficiencies where multiple records occur in a single square (duplicates are eliminated during the index building process)
- Could also operate with a single, long (skinny) table of species : c-square pairs if advantageous (indexed on both columns)
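The two query styles can be compared on a toy SQLite database. The table names `points` and `species_index`, the `|` separator inside the `csquares` column and the sample rows are assumptions; only the two `WHERE` conditions follow the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE points (tax_id INTEGER, lat REAL, long REAL);                -- point-level data
CREATE TABLE species_index (tax_id INTEGER PRIMARY KEY, csquares TEXT);   -- one row per species
""")
conn.executemany("INSERT INTO points VALUES (?, ?, ?)", [
    (1, 59.7, -9.6), (1, 59.8, -9.9), (1, 58.2, -8.4),   # inside 10-degree square 7500
    (2, -12.3, 113.2),                                   # elsewhere
])
conn.executemany("INSERT INTO species_index VALUES (?, ?)", [
    (1, "7500:488:1|7500:499:4"),    # three points collapse to two distinct squares
    (2, "3111:123:1"),
])

# Coordinate query: scans every point record.
by_coordinates = conn.execute(
    "SELECT DISTINCT tax_id FROM points "
    "WHERE lat BETWEEN 50 AND 60 AND long BETWEEN -10 AND 0").fetchall()

# c-squares query: one string test per species row.
by_csquare = conn.execute(
    "SELECT tax_id FROM species_index WHERE csquares LIKE '%7500%'").fetchall()
```

Within a c-square code only the leading 10-degree group has four consecutive digits, so the pattern `'%7500%'` cannot match part of a finer-level group by accident.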

23 Can now do mapping (“Quick maps”) direct from the index – no need to get the data (in first instance)
Example: for minke whale, Balaenoptera acutorostrata – 1,700 records in the system, in 400 squares ... codes for the latter:

24 Example “quick map”...

