October 2006 - Richard White, Andrew Jones & Frank Bisby - TDWG - St Louis
Federating taxonomic databases: progress with the Catalogue of Life Dynamic Checklist
Richard White, Andrew Jones, Computer Science, Cardiff University, UK
Frank Bisby, Plant Sciences, University of Reading, UK
The Species 2000 programme
Species 2000, together with its partner ITIS, operates a federated environment which:
- gathers data from specialist species data providers
- delivers the Catalogue of Life: Species 2000 global Dynamic Checklist (species; hierarchy); regional species checklist for Europe (prototype for further regional hubs, etc.)
Plan to complete the Catalogue in 2011
Main topics
- The Species 2000 federated environment
- Interoperability conventions and standards adopted
The federation: organisation
Species 2000 assembles sectors “side by side”:
- Taxonomic hierarchy (or hierarchies)
- Global species databases (GSDs) and interim checklists: the catalogue of life
- Species information sources (SISs): regional faunas and floras, specialist or sectoral databases, web pages, etc.
[Diagram labels: Species; GSD; interim checklists; SIS]
Uses for the system
- on-line reference tool (available)
- index to further Web-based species resources (planned; rudiments implemented for some taxonomic groups)
- “synonymy server”, exposed as a Web service (available, but to be improved)
Species 2000 home page
User about to click on “Dynamic Checklist” …
Dynamic Checklist search page
User interested in Dwarf Gourami; knows its genus …
Found some species
User interested in Colisa lalia (Dwarf Gourami) and about to click on this name …
Colisa lalia standard data (1)
Scroll to bottom …
Colisa lalia standard data (2)
Follow further information link …
Colisa lalia in FishBase
Information from FishBase in this case
Spice for Species 2000
Currently provides the Common Access System (CAS) for Species 2000:
- implements a hub
- gathers data from providers via wrappers
- integrates and caches
- makes data available to users and other software
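The hub role listed above (gather from wrappers, integrate, cache, serve) can be illustrated with a minimal sketch. This is not the Spice implementation: the `Hub` class, the `search` method and the idea of a wrapper as a plain callable are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Assumption: a wrapper is modelled as a function from a search string
# to a list of matching names. Real Spice wrappers are separate services.
Wrapper = Callable[[str], List[str]]


class Hub:
    """Hypothetical hub: queries every wrapper, merges, caches the result."""

    def __init__(self, wrappers: Dict[str, Wrapper]) -> None:
        self._wrappers = wrappers                 # sector name -> wrapper
        self._cache: Dict[str, List[str]] = {}    # search string -> merged result

    def search(self, query: str) -> List[str]:
        # Contact the wrappers only on a cache miss.
        if query not in self._cache:
            merged: List[str] = []
            for sector, wrapper in self._wrappers.items():
                merged.extend(f"{sector}: {name}" for name in wrapper(query))
            self._cache[query] = merged
        # Return a copy so callers cannot alter the cached list.
        return list(self._cache[query])
```

A repeated `search` with the same string is answered from the cache without calling any wrapper again; a real hub would also need cache expiry, which this sketch omits.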
Recent progress with Species 2000 (1)
EuroCat project:
- added many new data providers for further taxonomic sectors
- improved Spice
- set up “Species 2000 europa” regional hub (using Spice)
- experimented with “cross-mapping”, using Litchi
- gained better understanding of the dynamics of developing and incorporating new GSDs
New wrapper-writing resources made available
Recent progress with Species 2000 (2)
Current activities include:
- Secretariat at Reading
- At least 4 new databases have become available in the last few months: people are busy working on various sectors
- Annual checklist:
  - Long-term plan: snapshot of dynamic checklist
  - Currently: parallel development in the Philippines
  - ≥ 8 new databases being added
  - 2007: expected ≥ 1,000,000 species
Components of the architecture
Main data and software components of the Catalogue:
- Autonomous species databases (GSDs)
- GSD wrappers
- “Hubs” (portals) to assemble data from wrappers and provide data to clients
- Interfaces: for users; for software
- Maintenance and administration software tools (e.g. metadatabase)
Species 2000 protocols – overview
How GSDs interoperate in this federation, at four levels:
1. Organisational model for a federation in which data providers provide data about “taxonomic sectors”; the hub assembles the complete catalogue (see above)
2. Framework for information exchange based on a number of defined requests
3. Human-readable Common Data Model (CDM): abstract definition of the requests, responses and data exchanged
4. Specific computer-readable interface definitions, implementing the CDM
Species 2000 protocols and data standards
- Level 1: activities at the “federation” level, described above
- Levels 2, 3 and 4: Species 2000 defines internal data standards
- Intended to be open standards
Interoperability level 2: Informal data request and response model
Describes informally how information is exchanged:
- between federation components, including data providers, the hub and software clients of the hub
- by means of (currently six) requests defined for specific purposes
- with correspondingly defined response data
This model:
- avoids need for providers to handle general database queries
- treats GSDs as “black boxes”
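The “black box” idea above can be sketched as a wrapper that answers only a fixed set of named requests and never exposes its storage to general queries. The class name, the request names `"search"` and `"metadata"`, and the in-memory list standing in for a provider database are all hypothetical.

```python
from typing import Callable, Dict, List


class BlackBoxWrapper:
    """Hypothetical wrapper: only predefined requests are accepted."""

    def __init__(self, names: List[str], provider: str) -> None:
        self._names = names        # stands in for the provider's own database
        self._provider = provider
        # The complete set of requests this wrapper understands.
        self._handlers: Dict[str, Callable[[str], object]] = {
            "search": self._search,
            "metadata": self._metadata,
        }

    def handle(self, request: str, argument: str = "") -> object:
        """Dispatch a defined request; anything else is rejected."""
        if request not in self._handlers:
            raise ValueError(f"unsupported request: {request}")
        return self._handlers[request](argument)

    def _search(self, text: str) -> List[str]:
        # Case-insensitive substring match on the stored names.
        return [n for n in self._names if text.lower() in n.lower()]

    def _metadata(self, _unused: str) -> Dict[str, str]:
        return {"provider": self._provider}
```

Because the hub can only call `handle` with a known request name, the provider is free to store its data in any way it likes.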
Request types
The request types sent by the CAS to SPICE wrappers:
0: get version of CDM the wrapper implements
1: look up species name or ambiguous search string
2: get “standard data” for a given species name; includes accepted name, synonym(s), common name(s), distribution data, reference(s), latest taxonomic scrutiny, and links to other online resources about the species
3: obtain metadata concerning source database & data provider
4: move one step up taxonomic hierarchy
5: move one step down taxonomic hierarchy
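For client code, the six numbered request types above map naturally onto an enumeration. The numbers come from the slide; the member names and the `describe` helper are illustrative assumptions, not identifiers from the CDM.

```python
from enum import IntEnum


class RequestType(IntEnum):
    """Request types 0 to 5 as numbered on the slide; names are hypothetical."""

    GET_CDM_VERSION = 0     # version of the CDM the wrapper implements
    SEARCH_NAME = 1         # species name or ambiguous search string
    GET_STANDARD_DATA = 2   # standard data for a given species name
    GET_METADATA = 3        # source database and data provider metadata
    MOVE_UP = 4             # one step up the taxonomic hierarchy
    MOVE_DOWN = 5           # one step down the taxonomic hierarchy


def describe(code: int) -> str:
    """Return a readable label; raises ValueError for an undefined code."""
    return RequestType(code).name.replace("_", " ").lower()
```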
Interoperability level 3: Formal data and request/response model
Human-readable Common Data Model (CDM) for reference purposes:
- provides abstract definition for the requests and responses, including parameters, etc.
- candidate set of operations for retrieval of species-related data more generally
- defines the components of data transmitted and received
- data model defined specifically for the Species 2000 “standard data set”
- doesn’t define programming-language or technology-specific implementations
(Also available: “Species 2000 standard data set”, which summarises the CDM briefly)
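Since the CDM is abstract, any concrete rendering is an implementation choice. As one possible illustration, the components of the “standard data” listed under request type 2 could be held in a record like the one below; the class and field names are assumptions, and the actual CDM defines these components in more detail.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StandardData:
    """Hypothetical record for the standard data of one species."""

    accepted_name: str
    synonyms: List[str] = field(default_factory=list)
    common_names: List[str] = field(default_factory=list)
    distribution: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)
    latest_scrutiny: str = ""   # latest taxonomic scrutiny, left as free text here
    links: List[str] = field(default_factory=list)  # other online resources
```

For example, `StandardData("Colisa lalia", common_names=["Dwarf Gourami"])` would carry the species used in the walkthrough, with the remaining components left empty.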
Interoperability level 4: Interface definitions
Computer-readable interface definitions, following the CDM, for use with particular implementations, including CORBA IDL, XML DTD and XML Schema, for:
- requests from hub to wrappers
- requests from external client software to hub
Requests from hub to wrappers
Spice hub communicates with GSD wrappers using HTTP:
- “CGI” GET requests are sent to a wrapper, which returns an XML document in response
- An XML Schema (XSD) defines the specific XML requests and responses
- CORBA is used within SPICE; there is a corresponding IDL document
NB: CDM 1.20 is being updated to reflect minor modifications recently made to the XSD, etc.
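The GET-request and XML-response exchange described above might look roughly like this from the hub side. The parameter names `requesttype` and `searchstring`, the example URL and the `<response>`/`<name>` elements are invented for illustration; the real names are fixed by the Species 2000 XSD, which is not reproduced here. No network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET
from typing import List


def build_wrapper_url(base_url: str, request_type: int, **params: str) -> str:
    """Compose a CGI-style GET URL (parameter names are hypothetical)."""
    query = {"requesttype": request_type, **params}
    return f"{base_url}?{urlencode(query)}"


def parse_names(xml_text: str) -> List[str]:
    """Collect the text of every <name> element in a wrapper response."""
    root = ET.fromstring(xml_text)
    return [el.text or "" for el in root.iter("name")]


# Invented response shape, standing in for a schema-valid document.
SAMPLE_RESPONSE = "<response><name>Colisa lalia</name></response>"
```

A hub would fetch the URL returned by `build_wrapper_url` and pass the response body to a parser such as `parse_names`; validating the document against the XSD is left out of this sketch.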
Requests from external client software to hub
A SOAP Web Service to allow programmatic access to the dynamic checklist (including by the user interface), to interrogate the Spice global & European hubs:
(location and definition may change)
CAS Web Service version 1.0 informal definition & WSDL:
Further information
- Species 2000 programme and Species 2000 & ITIS Catalogue of Life:
- Species 2000 protocols and practices:
- Spice:
- Biodiversity Software Repository at Cardiff for access to Spice, other software and some wrappers:
Collaboration in open standards and software
We would like to see future progress as a community effort for developing:
- data standards
- interoperable software
Especially interoperation with emerging standards, e.g. TCS
Opportunities for standardisation
We would welcome consideration of the request/response model as a useful, data-representation-independent basis for interrogating sources of species-related information
Join us in enhancing SPICE & associated software
Areas for work include:
- sophisticated management tools
- revision of SPICE code-base
- reusable software for wrapper writers
- addition of new protocols and schemas
Towards the Species Banks of the future
- Some Species 2000 GSDs currently provide “onward links” to rich species information
- Plan to investigate link-bases in which the Catalogue of Life can play an important part in the species banks of the future
Date for your diaries
1-day symposium to discuss Species 2000 Phase 2: progressing beyond 1 million species to the target 1.75 million
The University of Reading, UK
March 2007 (probably 29th)
Summary
- These protocols and standards are intended to be open and available for others to use when building similar federated information systems
- Our 6 operations are a candidate set for interchange of taxonomic data (possibly needing augmentation)
- They are described further in Species 2000 data standards documents at:
Acknowledgements
- Funding: BBSRC, European Commission, GBIF
- Species 2000 Project Team and Directors
- Data providers