Evolving concepts in the architecture of OBIS, the Ocean Biogeographic Information System Tony Rees CSIRO Marine Research 29 November 2004 Phoebe Zhang.


1 Evolving concepts in the architecture of OBIS, the Ocean Biogeographic Information System
Tony Rees, CSIRO Marine Research
Phoebe Zhang, Rutgers University, N.J.
29 November 2004

2 OBIS basics...
OBIS: the Ocean Biogeographic Information System
- Single access point for distribution records for marine species from multiple sources over the internet, with onward access to analytical tools and maps (including correlations with environmental data, etc.); ultimately to be a 3-D and 4-D atlas of marine species distributions
- Designated role as the data and information management component of the Census of Marine Life (CoML operational lifetime: 2000–2010)
OBIS (brief!) history
- Vision developed during a series of workshops, 1999–2000
- 8 initial data providers funded to develop content for OBIS, 2000–2001
- Initial version of the OBIS Portal went "live" in January 2002, based at Rutgers University, N.J. (www.iobis.org)
- Additional technical development and expanding content, 2002 onward
Focus of this talk...
- A "behind the scenes" look at the OBIS architecture and how the OBIS Portal has evolved over the past 2+ years, focusing on features that support user searches

3 OBIS Version 1 (Jan 2002–Feb 2004)
[Diagram: www users 1, 2, 3 (etc.); OBIS Portal; data providers 1, 2, 3 (etc.), each behind a custom database wrapper; C-squares mapper, Mapping tool 2, Mapping tool 3. Numbered steps: 1: submit search request; 2: retrieve matching data; 3 (optional): pass to 3rd party tools for mapping, etc.]
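The Version 1 pattern, in which the portal relays each query to every provider and merges whatever comes back, can be sketched in Python as below. This is an illustration under stated assumptions, not the OBIS Portal's code: the provider functions, the record fields and the use of `ConnectionError` to signal an offline provider are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical record shape: each provider returns a list of dicts.
Record = Dict[str, object]
Provider = Callable[[str], List[Record]]


def relay_search(species: str, providers: Dict[str, Provider]) -> List[Record]:
    """Forward one species query to every provider in turn and merge the results.

    Providers are queried serially; one that raises ConnectionError (standing in
    for "offline at query time") is skipped, so the user gets partial results.
    """
    merged: List[Record] = []
    for name, query in providers.items():
        try:
            rows = query(species)
        except ConnectionError:
            continue  # provider unavailable: nothing the portal can do about it
        for row in rows:
            merged.append({**row, "provider": name})
    return merged


def provider_a(species: str) -> List[Record]:
    # Invented example data held "at the provider end".
    data = {"Balaenoptera acutorostrata": [(59.7, -9.8), (60.2, -4.1)]}
    return [{"lat": lat, "lon": lon} for lat, lon in data.get(species, [])]


def provider_b(species: str) -> List[Record]:
    raise ConnectionError("provider offline")


results = relay_search("Balaenoptera acutorostrata",
                       {"provider_a": provider_a, "provider_b": provider_b})
print(len(results))  # 2: provider_b was offline, so only provider_a's records
```

Because providers are called one after another, total response time is the sum of every provider's latency, and an offline provider silently shrinks the result set.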

4 Advantages and disadvantages of this approach
Advantages
- Technically simple to implement: the portal simply relays queries and processes / integrates the results
- No custodianship issues: data remain with the providers
Disadvantages
- System performance relies on factors outside the portal's control: providers may be slow, offline, etc. at query time
- No prior knowledge of which species are worth querying, how much data will be returned, etc.
- Must wait for all data to be returned before passing them to mappers, etc.; species are handled serially (one at a time)
- Spatial searches are slow (millions of point data records must be parsed)
- No facility to search by taxonomic group, search for "near matches", etc. (such facilities are not supported at the provider end)

5 Analogy with internet search engines (e.g. Google®, etc.)...
- It would be impossibly slow to search 8 billion web pages in real time to service a user's initial request
    - Google, etc. construct locally held indexes (e.g. sorted by term) and search these in the first instance, which gives very fast results
    - Indexes are constructed by continuously crawling the web for new or updated content (note: there may be a currency issue here)
    - Google also constructs a local cache of all content, which remains accessible even if the original provider is offline
Equivalents for OBIS: a name index and a spatial index, to support name / category and spatial searching; also a locally held data cache
- The Cache is built by crawling the remote data providers (and refreshed at intervals)
- The name and spatial indexes are built by parsing the Cache content
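A minimal sketch of the crawl-then-index idea: records are copied from each provider into a local cache, and one pass over the cache yields a name index (record count and providers per species) and a spatial index (set of squares per species). The provider data are invented, and `half_degree_cell` is a hypothetical, simplified (row, column) stand-in for a real c-square code.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical provider content: provider name -> list of (species, lat, lon).
PROVIDERS: Dict[str, List[Tuple[str, float, float]]] = {
    "provider_a": [("Gadus morhua", 59.7, -9.8), ("Gadus morhua", 59.9, -9.6)],
    "provider_b": [("Gadus morhua", 42.1, -66.3), ("Thunnus thynnus", 36.2, -5.4)],
}

Cache = List[Tuple[str, str, float, float]]  # (provider, species, lat, lon)


def crawl(providers: Dict[str, List[Tuple[str, float, float]]]) -> Cache:
    """Copy every provider's records into one local cache (re-run at intervals)."""
    return [(prov, sp, lat, lon)
            for prov, rows in providers.items()
            for sp, lat, lon in rows]


def half_degree_cell(lat: float, lon: float) -> Tuple[int, int]:
    """Simplified stand-in for a 0.5-degree c-square: (row, column) of the cell."""
    return (math.floor(lat * 2), math.floor(lon * 2))


def build_indexes(cache: Cache):
    """Parse the cache once into a name index and a spatial index."""
    name_index: Dict[str, dict] = {}
    spatial_index: Dict[str, Set[Tuple[int, int]]] = defaultdict(set)
    for prov, sp, lat, lon in cache:
        entry = name_index.setdefault(sp, {"records": 0, "providers": set()})
        entry["records"] += 1
        entry["providers"].add(prov)
        spatial_index[sp].add(half_degree_cell(lat, lon))  # duplicates collapse here
    return name_index, dict(spatial_index)


cache = crawl(PROVIDERS)
names, squares = build_indexes(cache)
print(names["Gadus morhua"]["records"], len(squares["Gadus morhua"]))  # 3 2
```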

6 The OBIS Index
- The OBIS Index is actually a small relational database
- Main table "obis_species" has 1 row for every species name currently in OBIS (c. 110,000), i.e. substantially fewer than the overall number of records in the system (currently 5 million+ and rising)
- "obis_groups" table holds the custom taxonomic hierarchy currently used in OBIS
- "obis_distributions" table holds the spatial index, a higher-level concept than the GBIF index (which lists individual data items); it is intended to return rapid, concise taxon-level information before any actual data extraction
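The three-table layout could look roughly like the SQLite schema below. Only the table names come from the talk; every column name, the hierarchical group code format and the '|'-separated `csquares` field are assumptions for illustration. The 1,700-record count for the minke whale is taken from the supplementary slides; the second c-square is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE obis_groups (
    group_code TEXT PRIMARY KEY,               -- hierarchical code (hypothetical format)
    group_name TEXT NOT NULL
);
CREATE TABLE obis_species (
    tax_id     INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    group_code TEXT REFERENCES obis_groups(group_code),
    n_records  INTEGER NOT NULL DEFAULT 0      -- metadata shown before any extraction
);
CREATE TABLE obis_distributions (
    tax_id   INTEGER PRIMARY KEY REFERENCES obis_species(tax_id),
    csquares TEXT NOT NULL                     -- '|'-separated 0.5-degree c-squares
);
INSERT INTO obis_groups VALUES ('0501', 'Whales');
INSERT INTO obis_species VALUES (1, 'Balaenoptera acutorostrata', '0501', 1700);
INSERT INTO obis_distributions VALUES (1, '7500:499:4|7500:488:1');
""")

# Taxon-level answer assembled from the index alone, no point data touched.
row = conn.execute("""
    SELECT s.name, g.group_name, s.n_records, d.csquares
    FROM obis_species s
    JOIN obis_groups g ON g.group_code = s.group_code
    JOIN obis_distributions d ON d.tax_id = s.tax_id
    WHERE s.name = ?""", ("Balaenoptera acutorostrata",)).fetchone()
print(row[:3])  # ('Balaenoptera acutorostrata', 'Whales', 1700)
```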

7 Fragment of the "OBIS groups" (taxonomic categories) table
- The intention is to provide popular / recognisable groupings (not necessarily equivalent to a strict phylum, order, family treatment)
- Hierarchical coding allows simple interrogation at any level of the hierarchy
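One way hierarchical coding can make interrogation at any level simple: if each level appends a fixed number of characters to its parent's code, a prefix test selects a whole branch and slicing the code recovers its lineage. The two-digits-per-level codes and group names here are hypothetical; the talk does not give OBIS's actual coding scheme.

```python
from typing import Dict, List

# Hypothetical codes: two digits per level of the hierarchy.
GROUPS: Dict[str, str] = {
    "02": "Fishes",
    "0201": "Bony fishes",
    "0202": "Sharks and rays",
    "05": "Mammals",
    "0501": "Whales",
    "0502": "Seals",
}

SPECIES_GROUP: Dict[str, str] = {
    "Lutjanus campechanus": "0201",
    "Carcharodon carcharias": "0202",
    "Balaenoptera acutorostrata": "0501",
    "Halichoerus grypus": "0502",
}


def species_in_group(code: str) -> List[str]:
    """All species whose group code starts with `code`, i.e. at or below that node."""
    return sorted(sp for sp, g in SPECIES_GROUP.items() if g.startswith(code))


def lineage(code: str) -> List[str]:
    """Names of every level from the top down, read off the code two digits at a time."""
    return [GROUPS[code[:i]] for i in range(2, len(code) + 1, 2)]


print(species_in_group("02"))  # ['Carcharodon carcharias', 'Lutjanus campechanus']
print(lineage("0501"))         # ['Mammals', 'Whales']
```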

8 Underpinning the spatial index...
Allocation to global 0.5º x 0.5º grid squares, using the "c-squares" hierarchical notation...
- 0.5º x 0.5º squares (shown in red) are the current units of spatial indexing for OBIS
[Figure: nested c-squares 7500 (10º), 7500:4 (5º), 7500:499 (1º) and 7500:499:4 (0.5º), spanning 50º N to 60º N, 010º W to 000º W]
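Because each c-square code extends the code of the square that contains it, the enclosing squares at coarser resolutions can be read off by truncation. The sketch below assumes the layout shown in the figure (four characters for the 10º square, then two more characters per step down to 0.5º) and deliberately stops at 0.5º.

```python
from typing import Dict


def csquare_levels(code: str) -> Dict[float, str]:
    """Map square size (degrees) to the enclosing c-square code, down to 0.5 degrees.

    Assumed layout: 'QLNN' (10 deg), ':d' (5 deg), 'dd' (1 deg), ':d' (0.5 deg).
    """
    lengths = {10.0: 4, 5.0: 6, 1.0: 8, 0.5: 10}
    return {size: code[:n] for size, n in lengths.items() if len(code) >= n}


levels = csquare_levels("7500:499:4")
print(levels)  # {10.0: '7500', 5.0: '7500:4', 1.0: '7500:499', 0.5: '7500:499:4'}
```

The same prefix property is what lets a single string comparison select everything inside a coarser square.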

9 Fragment of the spatial index...
- The index "knows" all the squares in which any species has data records (out of a global set of 259,200, i.e. 720 x 360 half-degree squares)
- It is now simple to retrieve either:
    - all species for a given square (at any level of the hierarchy), or
    - all squares for a given species
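Holding the index as species : square pairs in two dictionaries gives both lookups directly, and a prefix test on the code covers "any level of the hierarchy". This is a sketch, not OBIS's implementation; the pairs below are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Set

# Illustrative (species, 0.5-degree c-square) pairs, already de-duplicated.
PAIRS = [
    ("Balaenoptera acutorostrata", "7500:499:4"),
    ("Balaenoptera acutorostrata", "7500:488:1"),
    ("Gadus morhua", "7500:499:4"),
    ("Gadus morhua", "1500:100:1"),
]

by_species: Dict[str, Set[str]] = defaultdict(set)
by_square: Dict[str, Set[str]] = defaultdict(set)
for species, square in PAIRS:
    by_species[species].add(square)
    by_square[square].add(species)


def squares_for(species: str) -> Set[str]:
    """All squares for a given species."""
    return set(by_species.get(species, set()))


def species_in(square_prefix: str) -> Set[str]:
    """All species for a given square at any level (10, 5, 1 or 0.5 degree),
    relying on each coarser c-square code being a prefix of the finer ones."""
    return {sp for sq, names in by_square.items()
            if sq.startswith(square_prefix) for sp in names}


print(sorted(species_in("7500")))  # ['Balaenoptera acutorostrata', 'Gadus morhua']
```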

10 Spatial index also supports high-level mapping direct from the index

11 Spatial index also supports high-level mapping direct from the index
- "Quick maps" use the CSIRO Marine Research c-squares mapper, accessed via a web call
- Again, efficiencies arise from sending only the list of squares to be mapped rather than all the data points (may execute e.g. 5–50 times faster)
- Effectively a "data preview" function, before getting the data for more sophisticated mapping / analysis if desired
- Also functions as a clickable GUI to query individual data points in a region
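The saving behind "Quick maps" comes from collapsing many point records into the few distinct squares that a mapper actually has to shade (for the minke whale in the supplementary slides, 1,700 records fall in 400 squares). A sketch of that reduction, using a hypothetical simplified (row, column) cell id in place of a real c-square code; how the list is then passed to the CSIRO mapper is not shown, because the talk does not describe its parameters.

```python
import math
from typing import List, Tuple


def to_cell(lat: float, lon: float) -> Tuple[int, int]:
    """Simplified 0.5-degree cell id (row, column); stand-in for a c-square code."""
    return (math.floor(lat * 2), math.floor(lon * 2))


def quick_map_payload(points: List[Tuple[float, float]]) -> List[Tuple[int, int]]:
    """Collapse point records to the distinct cells a mapper needs to shade."""
    return sorted({to_cell(lat, lon) for lat, lon in points})


points = [(59.7, -9.8), (59.9, -9.6), (59.6, -9.9), (58.2, -8.4)]
cells = quick_map_payload(points)
print(f"{len(points)} points -> {len(cells)} squares")  # 4 points -> 2 squares
```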

12 Other information held in the name index...
- Full list of taxon names from data providers, plus additional "names without data" from the Catalogue of Life compilation (to allow matching on names even when no data are yet available, status assessment, etc.)
- Metadata on every species (how many records, which data providers, what date range), displayed to the user before any "get OBIS data" request
- Taxonomic group allocation for all taxa, plus common names as available
- "Near match" versions of all names to support a "fuzzy" search option
- Hiding / screening of species deemed "non-marine" according to pre-formatted list/s, and of some "junk" data supplied by data providers (e.g. "unknown species A")
- Preliminary reconciliation of junior synonyms / known misspellings, based on Catalogue of Life information as available
- Onward links to the Catalogue of Life where appropriate, for further taxonomic information, full synonymy, etc.
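The talk does not say how OBIS builds its "near match" names, so the sketch below swaps in Python's standard `difflib.get_close_matches` similarity ranking, together with a simple pattern-based screen for junk names such as "unknown species A". The regular expression and the 0.8 cutoff are assumptions chosen for illustration.

```python
import difflib
import re
from typing import List

NAMES = ["Balaenoptera acutorostrata", "Balaenoptera musculus",
         "Lutjanus campechanus", "unknown species A"]

# Hypothetical screening rule for placeholder names supplied by providers.
JUNK = re.compile(r"^(unknown|unidentified)\b", re.IGNORECASE)


def searchable(names: List[str]) -> List[str]:
    """Drop 'junk' names before they reach the name index."""
    return [n for n in names if not JUNK.search(n)]


def near_matches(query: str, names: List[str], n: int = 3, cutoff: float = 0.8) -> List[str]:
    """Fuzzy lookup ranked by difflib's similarity ratio (a stand-in technique)."""
    return difflib.get_close_matches(query, names, n=n, cutoff=cutoff)


index = searchable(NAMES)
print(near_matches("Balaenoptera acutorostratra", index))  # ['Balaenoptera acutorostrata']
```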

13 Putting it all together...
- The OBIS home page includes a clickable map for spatial searching, currently set to 10º x 10º squares (could be finer as more data become available)

14 Full scientific name / category search page
- Examples: all fishes beginning with "a", "all whales", or all species of "Lutjanus"...
- Common name and browse-category searches are also available
- (etc.)
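The example searches map naturally onto SQL over the name index. In this sketch the table layout, group codes and species rows are all illustrative assumptions: a group name is resolved to its hierarchical code and used as a prefix, so "all fishes" also includes any fish subgroups.

```python
import sqlite3
from typing import List, Optional

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE obis_groups (group_code TEXT PRIMARY KEY, group_name TEXT);
CREATE TABLE obis_species (name TEXT PRIMARY KEY, group_code TEXT);
INSERT INTO obis_groups VALUES ('02', 'Fishes'), ('0201', 'Bony fishes'),
                               ('05', 'Mammals'), ('0501', 'Whales');
INSERT INTO obis_species VALUES
    ('Abudefduf saxatilis', '0201'), ('Lutjanus campechanus', '0201'),
    ('Lutjanus analis', '0201'), ('Balaenoptera musculus', '0501');
""")


def search(group_name: Optional[str] = None, name_prefix: str = "") -> List[str]:
    """Species in a named group (and its subgroups) whose name starts with name_prefix.

    SQLite's LIKE is case-insensitive for ASCII, so 'a' also matches 'A...'.
    """
    sql = "SELECT s.name FROM obis_species s WHERE s.name LIKE ? || '%'"
    params = [name_prefix]
    if group_name is not None:
        sql += (" AND s.group_code LIKE (SELECT group_code FROM obis_groups"
                " WHERE group_name = ?) || '%'")
        params.append(group_name)
    return [r[0] for r in conn.execute(sql + " ORDER BY s.name", params)]


print(search("Fishes", "a"))              # ['Abudefduf saxatilis']
print(search(name_prefix="Lutjanus "))    # ['Lutjanus analis', 'Lutjanus campechanus']
print(search("Whales"))                   # ['Balaenoptera musculus']
```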

15 Example search result for "all whales" (initial portion of page)
- Note: all results are presented with metadata, "Quick maps", etc. (all from index content)
- The "Get OBIS data" link initiates a data extraction from the Cache (e.g. 1 to 40,000+ points)
- Also included: names without data (from the Catalogue of Life), plus "near matches" if applicable

16 Revised architecture: OBIS Version 2 (March 2004 to current)
[Diagram: www users 1, 2, 3 (etc.); OBIS Portal; OBIS Index and OBIS Cache; data providers 1, 2, 3 (etc.); Cat. of Life; C-squares mapper, Mapping tool 2, Mapping tool 3. Background processes: provider crawling and index building. Numbered steps: 1: submit search request; 2: retrieve matching names + metadata ("Stage 1" query); 3a (optional): "Quick map" for any species; 3b: "get OBIS data" for relevant species, send to mappers, etc. ("Stage 2" query)]
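The two-stage flow can be sketched as two functions over in-memory stand-ins: Stage 1 answers from the index alone, returning names and metadata (including names without data), and Stage 2 touches the cache only for a species the user picks. The data structures and values here are hypothetical.

```python
from typing import Dict, List, Tuple

# Illustrative stand-ins for the OBIS Index and OBIS Cache.
INDEX: Dict[str, Dict[str, int]] = {
    "Balaenoptera acutorostrata": {"records": 3, "squares": 2},
    "Balaenoptera musculus": {"records": 0, "squares": 0},  # name without data
}
CACHE: Dict[str, List[Tuple[float, float]]] = {
    "Balaenoptera acutorostrata": [(59.7, -9.8), (59.9, -9.6), (58.2, -8.4)],
}


def stage1(prefix: str) -> List[Tuple[str, Dict[str, int]]]:
    """Stage 1: names plus metadata from the index only; no point data touched."""
    hits = [(n, meta) for n, meta in INDEX.items() if n.startswith(prefix)]
    return sorted(hits, key=lambda item: item[0])


def stage2(name: str) -> List[Tuple[float, float]]:
    """Stage 2 ('get OBIS data'): pull the points for one chosen species from the cache."""
    return CACHE.get(name, [])


hits = stage1("Balaenoptera")
for name, meta in hits:
    print(name, meta["records"])
points = stage2(hits[0][0])  # only now is point-level data extracted
```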

17 Comparison between versions...
- The new version supports a far better user experience: improved performance, many new features, enhanced browse / preview functionality, etc.
- The data cache and "metadata layer" (index) mean that most or all queries have effectively been run in advance, saving search time
- However, the new system is technically more challenging to design / install / maintain, with consequent resourcing implications
- Also, the move away from live point-of-origin queries (for performance reasons) means that the index, cache, etc. must be continually updated (to avoid currency issues)

18 Next steps...
- More data and data providers every month (scalability, performance monitoring issues)
- Completion / bedding down / more automation of current functionality (most is fairly new and still undergoing a degree of real-world testing)
- Desirable to support gazetteer and polygon-based spatial searching (how?)
- May wish to represent more than simple presence/absence data (biomass / numbers, tracking / tagging, effort, etc.): does this affect options for the index?
- Inclusion of some data screening / flagging at point (= Cache) level (currently this operates only at taxon (= index) level)
- Plus ++ ?? (the system and concepts are still, to a degree, an evolving entity)

19 Thank you
Contacts
Dr Tony Rees
Manager, Divisional Data Centre, CSIRO Marine Research, Australia
Phone: +61 3 6232 5318
Email: Tony.Rees@csiro.au
www.marine.csiro.au/datacentre
Additional information:
OBIS web site: www.iobis.org
C-squares web site: www.marine.csiro.au/csquares
Dr Phoebe Zhang
OBIS Portal Manager, IMCS, Rutgers, the State University of New Jersey, USA
Phone: +1 732 932 6555
Email: phoebe@marine.rutgers.edu
www.iobis.org

20 Supplementary slides

21 c-squares notation...
Example: square 7500:499:4
- Initial digit "7" indicates the NW global quadrant
- 7500:499 is the 1º square whose "minimum" corner (nearest the equator and the prime meridian) is at 59º N, 009º W
- The next digit, "4", indicates the next intermediate quadrant, so the "minimum" corner of 7500:499:4 is at 59.5º N, 009.5º W
Global quadrants (north at top): 7 1 / 5 3
Intermediate quadrants (north at top), in the same arrangement: NW 4 3 / 2 1; NE 3 4 / 1 2; SW 2 1 / 4 3; SE 1 2 / 3 4
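A minimal encoder for the notation described on this slide, going from a position to a 0.5º c-square. It assumes that latitude 0 counts as north and longitude 0 as east, and it does not handle positions at exactly 90º latitude or 180º longitude, where the c-squares specification has its own conventions; treat it as a sketch rather than a reference implementation.

```python
def csquare(lat: float, lon: float) -> str:
    """Encode a position as a 0.5-degree c-square code, e.g. '7500:499:4'."""
    # Global quadrant: 1 = NE, 3 = SE, 5 = SW, 7 = NW.
    if lat >= 0:
        gq = 1 if lon >= 0 else 7
    else:
        gq = 3 if lon >= 0 else 5
    alat, alon = abs(lat), abs(lon)
    # 10-degree square: tens digit of latitude, two-digit tens of longitude.
    lat10, lon10 = int(alat // 10), int(alon // 10)
    # Intermediate (5-degree) quadrant: 1..4, larger when farther from 0,0.
    rlat, rlon = alat - lat10 * 10, alon - lon10 * 10
    q5 = 2 * int(rlat // 5) + int(rlon // 5) + 1
    # 1-degree square: units digits of latitude and longitude.
    lat1, lon1 = int(rlat), int(rlon)
    # 0.5-degree quadrant within the 1-degree square, same 1..4 rule.
    q05 = 2 * int((rlat - lat1) // 0.5) + int((rlon - lon1) // 0.5) + 1
    return f"{gq}{lat10}{lon10:02d}:{q5}{lat1}{lon1}:{q05}"


print(csquare(59.75, -9.75))  # 7500:499:4
print(csquare(59.5, -9.5))    # 7500:499:4 (the square's "minimum" corner)
```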

22 A "different" / rapid approach to spatial searching (programmatically speaking)
- Normal spatial query would be (example for a 10-degree square):
    select distinct species_code (or tax_id) where lat between 50 and 60 and long between -10 and 0
- Equivalent c-squares spatial query:
    select distinct species_code (or tax_id) where csquares like '%7500%'
  ...runs faster, and takes advantage of cases where multiple records occur in a single square (duplicates are eliminated during the index-building process)
- Could also operate with a single, long ("skinny") table of species : c-square pairs if advantageous (indexed on both columns)
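The two query styles on this slide, plus the "skinny" alternative, can be compared on a toy SQLite database. Table and column names other than `csquares` are assumptions, and the data are invented; species 1 has two point records but a single square, which shows the duplicate elimination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Point-level data (Cache-like): one row per record.
CREATE TABLE points (tax_id INTEGER, lat REAL, lon REAL);
-- Index: one row per species, all its c-squares in one '|'-separated string.
CREATE TABLE obis_distributions (tax_id INTEGER PRIMARY KEY, csquares TEXT);
-- 'Skinny' alternative: one row per species : c-square pair, indexed both ways.
CREATE TABLE species_square (tax_id INTEGER, csquare TEXT, PRIMARY KEY (tax_id, csquare));
CREATE INDEX species_square_by_square ON species_square (csquare, tax_id);

INSERT INTO points VALUES (1, 59.7, -9.8), (1, 59.9, -9.6), (2, 50.2, 0.3),
                          (3, 50.2, 0.3), (3, 55.1, -5.2);
INSERT INTO obis_distributions VALUES (1, '7500:499:4'), (2, '1500:100:1'),
                                      (3, '1500:100:1|7500:455:1');
INSERT INTO species_square VALUES (1, '7500:499:4'), (2, '1500:100:1'),
                                  (3, '1500:100:1'), (3, '7500:455:1');
""")

bbox = [r[0] for r in conn.execute(
    "SELECT DISTINCT tax_id FROM points "
    "WHERE lat BETWEEN 50 AND 60 AND lon BETWEEN -10 AND 0 ORDER BY tax_id")]
csq = [r[0] for r in conn.execute(
    "SELECT tax_id FROM obis_distributions WHERE csquares LIKE '%7500%' ORDER BY tax_id")]
skinny = [r[0] for r in conn.execute(
    "SELECT DISTINCT tax_id FROM species_square WHERE csquare LIKE '7500%' ORDER BY tax_id")]
print(bbox, csq, skinny)  # [1, 3] [1, 3] [1, 3]
```

SQL `BETWEEN` is inclusive at both ends, whereas c-squares partition space into non-overlapping squares, so records lying exactly on a boundary (e.g. 60º N) can be treated differently by the two approaches. In the skinny table a prefix pattern ('7500%') is enough; the leading wildcard in '%7500%' is needed only because the single-row form packs many codes into one string.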

23 Mapping ("Quick maps") can now be done direct from the index, with no need to get the data (in the first instance)
- Example: for the minke whale, Balaenoptera acutorostrata, there are 1,700 records in the system, in 400 squares, each identified by its c-square code

24 Example "quick map"...

