Tony Rees CSIRO Marine Research 29 November 2004


Evolving concepts in the architecture of OBIS, the Ocean Biogeographic Information System
Tony Rees, CSIRO Marine Research
Phoebe Zhang, Rutgers University, N.J.
29 November 2004

OBIS Basics...
OBIS – the Ocean Biogeographic Information System
- Single access point for distribution records for marine species from multiple sources over the internet, with onward access to analytical tools and maps (including correlations with environmental data etc.); ultimately to be a 3-d and 4-d atlas of marine species distributions
- Designated role as the data and information management component of the Census of Marine Life (CoML operational lifetime: 2000–2010)
OBIS (brief!) history
- Vision developed during a series of workshops, 1999–2000
- 8 initial data providers funded to develop content for OBIS, 2000–2001
- Initial version of OBIS Portal went “live” in January 2002, based at Rutgers University, N.J. (www.iobis.org)
- Additional technical development and expanding content, 2002 onwards
Focus of this talk...
- “Behind the scenes” look at OBIS architecture, and how the OBIS Portal has evolved over the past 2+ years, with a focus on features supporting user searches.

OBIS Version 1 (Jan 2002–Feb 2004)
[Architecture diagram: www users 1, 2, 3 (etc.) connect to the OBIS Portal, which connects to data providers 1, 2, 3 (etc.), each through a custom database wrapper. Steps shown:]
1: submit search request
2: retrieve matching data
3 (optional): pass to 3rd party tools for mapping, etc. (Mapping tool 2, Mapping tool 3, C-squares mapper)

Advantages, disadvantages of this approach
Advantages
- Technically simple to implement: portal simply relays queries and processes / integrates results
- No custodianship issues: data remain with providers
Disadvantages
- System performance reliant on factors outside portal’s control: provider/s may be slow, off line, etc. at query time
- No prior knowledge of what species are worth querying on, how much data will be returned, etc.
- Have to wait for all data to be returned before passing to mappers, etc.; species handled serially (one at a time)
- Spatial searches slow (have to parse millions of point data records)
- No facility to search by taxonomic group, search for “near matches”, etc. (such facilities not supported at provider end)

Analogy with internet search engines (e.g. Google etc.) ...
- Would be impossibly slow to search 8 billion web pages in real time to service a user’s initial request
- Google, etc., construct locally held indexes (e.g. sorted by term etc.) and search these in the first instance, so can provide very fast results
- Indexes are constructed by continuously crawling the web for new or updated content (note: may be a currency issue here)
- Also, Google constructs a local data cache of all content, which remains accessible even if the original provider is off line.
Equivalents for OBIS:
- A name index and a spatial index, to support name / category and spatial searching; also a locally held data cache
- Cache is built by crawling the remote data providers (and refreshed at intervals)
- Name and spatial indexes are built by parsing the Cache content.
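The cache-then-index idea above can be sketched roughly as follows. This is only an illustration, not the OBIS implementation: the record layout, the function names `crawl_providers` and `build_indexes`, the in-memory "providers" and the sample coordinates are all assumptions, and the spatial key is simplified to a half-degree cell tuple rather than a c-squares code.

```python
from collections import defaultdict
import math

def crawl_providers(providers):
    """Copy every record from every provider into one local cache (hypothetical layout)."""
    cache = []
    for provider_name, records in providers.items():
        for species, lat, lon in records:
            cache.append({"provider": provider_name, "species": species,
                          "lat": lat, "lon": lon})
    return cache

def build_indexes(cache, cell_size=0.5):
    """Parse the cache once to derive a name index and a spatial index."""
    name_index = defaultdict(lambda: {"records": 0, "providers": set()})
    spatial_index = defaultdict(set)
    for rec in cache:
        entry = name_index[rec["species"]]
        entry["records"] += 1
        entry["providers"].add(rec["provider"])
        # Simplified grid cell: integer row/column of a half-degree grid.
        cell = (math.floor(rec["lat"] / cell_size),
                math.floor(rec["lon"] / cell_size))
        spatial_index[rec["species"]].add(cell)
    return name_index, spatial_index

# Invented sample data standing in for remote data providers.
providers = {
    "provider_1": [("Balaenoptera acutorostrata", 59.7, -9.6),
                   ("Balaenoptera acutorostrata", 59.8, -9.7)],
    "provider_2": [("Balaenoptera acutorostrata", 52.1, -3.2)],
}
cache = crawl_providers(providers)
name_index, spatial_index = build_indexes(cache)
```

Once built, a search only needs to consult `name_index` and `spatial_index`; the providers themselves are touched only when the cache is refreshed.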

The OBIS Index
- The OBIS Index is actually a small relational database
- Main table “obis_species” has 1 row for every species name currently in OBIS (currently c. 110,000), i.e. substantially smaller than the overall number of records in the system (currently 5m+ and rising)
- “obis_groups” table has the custom taxonomic hierarchy currently used in OBIS
- “obis_distributions” table has the spatial index: a higher level concept than the GBIF index (which is a data item level listing), intended for rapid, concise taxon-level information to be returned prior to any actual data extraction.

Fragment of the “OBIS groups” (taxonomic categories) table
- Intention is to provide popular / recognisable groupings (not necessarily equivalent to strict phylum, order, family treatment)
- Hierarchical coding allows simple interrogation at any level of the hierarchy.
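One way hierarchical coding can make "interrogation at any level" simple is prefix matching on the group code. The sketch below assumes a purely hypothetical scheme in which each level appends two digits to its parent's code; the actual codes and group names in the OBIS groups table are not given in this transcript, so everything in the sample data is invented for illustration.

```python
# Hypothetical group codes: each level appends two digits to the parent code.
groups = {
    "01": "Fishes",
    "0101": "Sharks and rays",
    "0102": "Bony fishes",
    "02": "Mammals",
    "0201": "Whales and dolphins",
}

# Hypothetical allocation of species to their most specific group.
species_groups = {
    "Lutjanus sebae": "0102",
    "Carcharodon carcharias": "0101",
    "Balaenoptera acutorostrata": "0201",
}

def species_in_group(group_code):
    """Return all species whose group code starts with the given code."""
    return sorted(name for name, code in species_groups.items()
                  if code.startswith(group_code))
```

With this layout, `species_in_group("01")` returns every fish regardless of sub-group, while `species_in_group("0201")` returns only the whales and dolphins.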

Underpinning the spatial index ... allocation to 0.5º x 0.5º global grid squares, using “c-squares” hierarchical notation...
[Figure: map of the area 50º N to 60º N, 010º W to 000º W, showing the nested squares 7500 (10º), 7500:4 (5º), 7500:499 (1º) and 7500:499:4 (0.5º)]
0.5º x 0.5º squares (shown in red) are current units of spatial indexing for OBIS
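The allocation of a point to a c-square can be sketched as below. The function name `csquare`, its parameters and the clamping at the poles and the date line are assumptions of this sketch; the quadrant digits (1 = NE, 3 = SE, 5 = SW, 7 = NW) and the intermediate-quadrant rule follow the c-squares notation used on these slides, and only the four resolutions shown in the figure are handled.

```python
def csquare(lat, lon, resolution=0.5):
    """Return the c-squares code containing (lat, lon) at 10, 5, 1 or 0.5 degrees."""
    if resolution not in (10, 5, 1, 0.5):
        raise ValueError("resolution must be 10, 5, 1 or 0.5")
    # Global quadrant digit: 1 = NE, 3 = SE, 5 = SW, 7 = NW.
    if lat >= 0:
        quadrant = 1 if lon >= 0 else 7
    else:
        quadrant = 3 if lon >= 0 else 5
    # Work with absolute values; clamp so lat = 90 or lon = 180 stays inside
    # the outermost square (a choice made for this sketch).
    alat = min(abs(lat), 89.999999)
    alon = min(abs(lon), 179.999999)
    lat_deg, lon_deg = int(alat), int(alon)
    # 10-degree square: quadrant, tens of latitude, tens of longitude (2 digits).
    code = f"{quadrant}{lat_deg // 10}{lon_deg // 10:02d}"
    if resolution == 10:
        return code
    # Intermediate quadrant (1-4) from the units digits of latitude and longitude.
    lat_unit, lon_unit = lat_deg % 10, lon_deg % 10
    inter = 2 * (lat_unit >= 5) + (lon_unit >= 5) + 1
    code += f":{inter}"
    if resolution == 5:
        return code
    code += f"{lat_unit}{lon_unit}"
    if resolution == 1:
        return code
    # Next intermediate quadrant from the fractional parts (0.5-degree square).
    inter = 2 * (alat - lat_deg >= 0.5) + (alon - lon_deg >= 0.5) + 1
    return f"{code}:{inter}"
```

For example, a point at 59.7º N, 9.6º W yields `7500:499:4`, matching the nested squares shown in the figure.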

Fragment of the spatial index...
Index “knows” all the squares within which any species has data records (out of global set of 259,200)
Now simple to retrieve either:
- all species for a given square (at any level of the hierarchy)
- all squares for a given species
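Both retrieval directions can be illustrated with a small in-memory structure. The data and the function names `species_in_square` and `squares_for_species` are invented for this sketch; it relies only on the property that a coarser c-square code is a prefix of every finer code nested inside it, so the prefix passed in should be a complete code at some level (e.g. `7500`, `7500:4`, `7500:499`).

```python
# Invented sample index: species -> set of 0.5-degree c-square codes.
squares_by_species = {
    "Balaenoptera acutorostrata": {"7500:499:4", "7500:110:1"},
    "Gadus morhua": {"7500:499:4", "7501:100:2"},
    "Lutjanus sebae": {"3111:248:3"},
}

def squares_for_species(species):
    """All squares in which the given species has records."""
    return sorted(squares_by_species.get(species, set()))

def species_in_square(square):
    """All species with records in the given square, at any level of the hierarchy."""
    return sorted(
        species
        for species, squares in squares_by_species.items()
        if any(code.startswith(square) for code in squares)
    )
```

Here `species_in_square("7500")` covers the whole 10-degree square, while `species_in_square("7500:1")` narrows the same lookup to one 5-degree square.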

Spatial index also supports high level mapping direct from the index
- “Quick maps” use the CSIRO Marine Research c-squares mapper, accessed via a web call
- Again, efficiencies here in only sending the list of squares to be mapped rather than all the data points (may execute e.g. 5–50 times faster)
- Effectively a “data preview” function, prior to getting the data for more sophisticated mapping / analysis if desired
- Also functions as a clickable GUI to query individual data points in a region.

Other information held in the name index...
- Full list of taxon names from data providers, plus additional “names without data” from the Catalogue of Life compilation (to allow matching on names even when no data yet available, status assessment, etc.)
- Metadata on every species (how many records, which data providers, what date range), for user display prior to “get OBIS data” request
- Taxonomic group allocation for all taxa, plus common names as available
- “Near match” versions of all names to support “fuzzy” search option
- Hiding / screening of species deemed “non-marine” according to pre-formatted list/s, and also of some “junk” data (e.g. “unknown species A” etc.) supplied by data providers
- Preliminary reconciliation of junior synonyms / known misspellings, based on Cat. of Life information as available
- Onward links to Cat. of Life where appropriate, for further taxonomic information, full synonymy, etc.
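The “fuzzy” search option can be illustrated with a generic string-similarity lookup. The slides describe stored “near match” versions of names, whose construction is not given here; the sketch below swaps in a different, query-time technique (Python's standard `difflib` similarity ratio) purely to show the user-facing behaviour. The function name `near_matches`, the cutoff value and the name list are assumptions.

```python
import difflib

def near_matches(query, names, cutoff=0.85):
    """Return names that closely resemble the query, best match first."""
    # Compare case-insensitively, but return names in their original form.
    lookup = {name.lower(): name for name in names}
    hits = difflib.get_close_matches(query.lower(), list(lookup),
                                     n=5, cutoff=cutoff)
    return [lookup[hit] for hit in hits]

# Invented sample of indexed names.
indexed_names = [
    "Balaenoptera acutorostrata",
    "Gadus morhua",
    "Lutjanus sebae",
]
```

A misspelt query such as `near_matches("Balenoptera acutorostrata", indexed_names)` still finds the minke whale entry, whereas an unrelated string returns an empty list.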

Putting it all together...
OBIS home page includes a clickable map for spatial searching, currently set to 10º x 10º squares (could be finer as more data become available)

Full scientific name / category search page (etc.)
- Examples: all fishes beginning with “a”, or “all whales”, or all species of “Lutjanus”...
- Common name, browse category searches also available

Example search result for “all whales” (initial portion of page)
- Note, all results presented with metadata, “Quick maps”, etc. (all from index content)
- “Get OBIS data” link initiates a data extraction from the Cache (e.g. 1 to 40,000+ points)
- Also included: names without data (from Cat. of Life), plus “near matches” if applicable

Revised architecture: OBIS Version 2 (March 2004–current)
[Architecture diagram: data providers 1, 2, 3 (etc.) feed the OBIS Cache via provider crawling; the OBIS Index is derived from the Cache by index building, with additional input from the Cat. of Life; www users 1, 2, 3 (etc.) connect to the OBIS Portal. Steps shown:]
1: submit search request
2: retrieve matching names + metadata from the OBIS Index (“Stage 1” query)
3a (optional): “Quick map” for any species, via the C-squares mapper
3b: “get OBIS data” for relevant species from the OBIS Cache (“Stage 2” query), send to mappers, etc. (Mapping tool 2, Mapping tool 3, C-squares mapper)

Comparison between versions...
- New version supports far better user experience: improved performance, many new features, enhanced browse / preview functionality, etc.
- Data cache and “metadata layer” (index) mean that most / all queries have effectively been run in advance, saving search time
- However, new system is technically more challenging to design / install / maintain, with consequent resourcing implications
- Also, move away from live point-of-origin queries (for performance reasons) means that index, cache, etc. must be continually updated (to avoid currency issues).

Next steps...
- More data, data providers every month (scalability, performance monitoring issues)
- Completion / bedding down / more automation of current functionality (most is fairly new and still undergoing a degree of real-world testing)
- Desirable to support gazetteer, polygon-based spatial searching (how?)
- May wish to represent more than simple presence/absence data (biomass / numbers, tracking / tagging, effort, etc.): does this affect options for the index?
- Inclusion of some data screening / flagging at point (= Cache) level (currently operates only at taxon, = index, level)
- Plus ++ ?? (system and concepts are still to a degree an evolving entity).

Thank You
Contacts
Dr Tony Rees, Manager, Divisional Data Centre, CSIRO Marine Research, Australia
Phone: +61 3 6232 5318
Email: Tony.Rees@csiro.au
www.marine.csiro.au/datacentre
Dr Phoebe Zhang, OBIS Portal Manager, IMCS, Rutgers, the State University of New Jersey, USA
Phone: +1 732 932 6555
Email: phoebe@marine.rutgers.edu
www.iobis.org
Additional information
OBIS web site: www.iobis.org
C-squares web site: www.marine.csiro.au/csquares

Supplementary slides

C-squares notation... 7500:499:4
Square “minimum” corner is at 59.5º N, 009.5º W, according to the following formula:
- Initial digit “7” indicates NW global quadrant
- 7500:499 is at 59º N, 009º W (latitude digits 5 and 9; longitude digits 00 and 9)
- Next digit “4” indicates next intermediate quadrant
[Figure: layout of the global quadrants (1 = NE, 3 = SE, 5 = SW, 7 = NW) and of the intermediate quadrants (1 to 4) within a square]
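The decoding described above can be sketched in a few lines. The function name `csquare_corner` and its return layout are assumptions of this sketch; the "minimum" corner is taken to be the corner nearest the equator and the prime meridian, expressed as absolute degrees plus hemisphere letters, as in the slide's example.

```python
def csquare_corner(code):
    """Decode a c-squares code into (lat, N/S, lon, E/W, square size in degrees)."""
    parts = code.split(":")
    head = parts[0]
    quadrant = int(head[0])          # 1 = NE, 3 = SE, 5 = SW, 7 = NW
    lat = int(head[1]) * 10.0        # tens of degrees latitude
    lon = int(head[2:4]) * 10.0      # tens of degrees longitude
    size = 10.0
    for part in parts[1:]:
        if len(part) == 3:
            # Full cycle: quadrant digit, then one latitude and one longitude digit.
            size /= 10
            lat += int(part[1]) * size
            lon += int(part[2]) * size
        else:
            # Intermediate quadrant only: halves the current square.
            inter = int(part[0])
            size /= 2
            lat += size * ((inter - 1) // 2)   # quadrants 3 and 4: higher latitude
            lon += size * ((inter - 1) % 2)    # quadrants 2 and 4: higher longitude
    ns = "N" if quadrant in (1, 7) else "S"
    ew = "E" if quadrant in (1, 3) else "W"
    return lat, ns, lon, ew, size
```

Applied to the slide's example, `csquare_corner("7500:499:4")` gives 59.5º N, 9.5º W with a square size of 0.5º.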

A “different” / rapid approach to spatial searching (programmatically speaking)
Normal spatial query would be (example for a 10-degree square):
select distinct species code (or tax_id) where lat between 50 and 60 and long between -10 and 0
Equivalent c-squares spatial query:
where csquares like ‘%7500%’
- Runs faster, and makes use of efficiencies where multiple records occur in a single square (duplicates are eliminated during the index building process)
- Could also operate with a single, long (skinny) table of species : c-square pairs if advantageous (indexed on both columns)
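The two query styles can be compared on a toy dataset using SQLite from Python. This sketch uses the "skinny" species : c-square pair layout mentioned above, so a prefix pattern (`'7500%'`) is enough; the `'%7500%'` form on the slide presumably reflects a different storage layout. Table names, column names and sample rows are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Raw point records (what a "normal" spatial query has to scan).
con.execute("CREATE TABLE points (species TEXT, lat REAL, lon REAL)")
# Skinny index table: one row per distinct species / c-square pair.
con.execute("CREATE TABLE species_squares (species TEXT, csquare TEXT)")
con.execute("CREATE INDEX idx_square ON species_squares (csquare)")
con.execute("CREATE INDEX idx_species ON species_squares (species)")

con.executemany("INSERT INTO points VALUES (?, ?, ?)", [
    ("Balaenoptera acutorostrata", 59.7, -9.6),
    ("Balaenoptera acutorostrata", 59.8, -9.7),   # same 0.5-degree square as above
    ("Gadus morhua", 55.3, -7.9),
    ("Lutjanus sebae", -18.4, 118.2),
])
# Duplicates within a square are already collapsed in the index table.
con.executemany("INSERT INTO species_squares VALUES (?, ?)", [
    ("Balaenoptera acutorostrata", "7500:499:4"),
    ("Gadus morhua", "7500:457:2"),
    ("Lutjanus sebae", "3111:488:1"),
])

by_points = [row[0] for row in con.execute(
    "SELECT DISTINCT species FROM points "
    "WHERE lat BETWEEN 50 AND 60 AND lon BETWEEN -10 AND 0 ORDER BY species")]

by_squares = [row[0] for row in con.execute(
    "SELECT DISTINCT species FROM species_squares "
    "WHERE csquare LIKE '7500%' ORDER BY species")]
```

Both queries return the same species list, but the second reads three index rows instead of four point records; with millions of points per region the gap would be correspondingly larger.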

Can now do mapping (“Quick maps”) direct from the index, with no need to get the data (in first instance)
Example: for minke whale, Balaenoptera acutorostrata, there are 1,700 records in the system, in 400 squares
[Figure: listing of the c-squares codes for those 400 squares]

Example “quick map”...