Common Data Models and Protocols Richard White, Cardiff University Talk given at “Making Species Databases Interoperable”,


1 Common Data Models and Protocols
Richard White, Cardiff University
Talk given at “Making Species Databases Interoperable”, Reading, 15 July 2004
SPICE for Species 2000
Funded in the UK by the BBSRC/EPSRC Bioinformatics Initiative
Universities of Cardiff & Reading
http://www.systematics.rdg.ac.uk/spice/

2 Species 2000 The story so far...
Species 2000 is an international collaborative project to create and provide access to an authoritative and up-to-date checklist and index to all the world’s species.
How is it going to do this?

3 Species 2000 services to users
– Dynamic Checklist
– Annual Checklist
– Web site, including database links submitted by users or producers
– Distribution media, including downloaded data
– Index to species information (hyperlinks to SISs)
– Packaged functions providing services to other software

4 Species 2000 organisation
– Taxonomic hierarchy (or hierarchies)
– Global species databases (GSDs) and interim checklists: the species index
– Species information sources (SISs): regional faunas and floras, specialist or sectoral databases, web pages etc.
[Diagram labels: Species; GSD; interim checklists; SIS]

5 Merging & Linking
– Merging: the original databases are physically copied into a new combined database
– Linking: the original databases remain separate, but are accessed through a single system

6 Merging
1. The original databases are physically copied into a new combined database.
2. The user interacts with the new combined database.

7 Linking
1. The user interacts with an access system which does not itself contain data.
2. When the user requests data, it is fetched from the appropriate database.
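
The contrast between merging and linking can be sketched in Python. This is a minimal illustration, not the Species 2000 implementation: the dictionaries standing in for databases and the names `merge_databases` and `LinkingAccessSystem` are assumptions made for the example.

```python
# Hypothetical stand-ins for two separate source databases.
legumes_db = {"Cytisus scoparius": "accepted name"}
fishes_db = {"Salmo trutta": "accepted name"}


def merge_databases(*databases):
    """Merging: copy every record into one new combined database."""
    combined = {}
    for db in databases:
        combined.update(db)  # physical copy; later edits to db are not seen
    return combined


class LinkingAccessSystem:
    """Linking: holds no data itself, only references to the databases."""

    def __init__(self, *databases):
        self.databases = databases

    def lookup(self, name):
        # Data is fetched from the source database at request time.
        for db in self.databases:
            if name in db:
                return db[name]
        return None


merged = merge_databases(legumes_db, fishes_db)
linked = LinkingAccessSystem(legumes_db, fishes_db)

# A later change in a source database is visible only through linking.
fishes_db["Salmo salar"] = "accepted name"
print("Salmo salar" in merged)       # False
print(linked.lookup("Salmo salar"))  # accepted name
```

The trade-off shown is the usual one: the merged copy is self-contained but goes stale, while the linking system always reflects the current state of each source.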

8 Architecture of Species 2000
[Diagram components: user interface; data collector; CAS (Common Access System) or “harness”; protocol; three wrappers, each in front of a GSD; distributed array of databases]

9 Need for communication
Different people are building the various components of the system:
– GSDs
– wrappers
– CAS
– user interface
We need to ensure they all have a common understanding of the data to avoid embarrassing mistakes

10 Database wrappers
Only the interface to the CAS needs to speak CORBA
Wrappers must:
– Translate CAS requests into a form suitable for the GSD (e.g. SQL) and translate responses back
– Deal with other kinds of heterogeneity, including schema heterogeneity
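
The translation step a wrapper performs might look like the following sketch. It uses Python's standard `sqlite3` module with an in-memory database in place of a real GSD. The class name `SqlWrapper`, the `names` table and its columns are assumptions for illustration, and the CORBA side of the interface is left out entirely.

```python
import sqlite3


class SqlWrapper:
    """Hypothetical wrapper around a GSD held in a relational database."""

    def __init__(self, connection):
        self.connection = connection

    def search(self, search_string):
        # Translate the CAS request into SQL for this GSD's own schema.
        rows = self.connection.execute(
            "SELECT scientific_name, status FROM names "
            "WHERE scientific_name LIKE ? ORDER BY scientific_name",
            (search_string + "%",),
        ).fetchall()
        # Translate the rows back into a schema-neutral response.
        return [{"name": name, "status": status} for name, status in rows]


# A toy GSD; the table and column names are assumptions for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (scientific_name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?)",
    [
        ("Caesalpinia crista", "accepted"),
        ("Caesalpinia bonduc", "accepted"),
        ("Cytisus scoparius", "accepted"),
    ],
)

wrapper = SqlWrapper(conn)
results = wrapper.search("Caesalpinia")
```

A GSD with a different schema would get its own wrapper with different SQL, while the shape of the value returned to the CAS stays the same. That is one way of absorbing schema heterogeneity.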

11 Data flow through a wrapper
[Diagram labels: CAS; wrapper interface; divided wrapper; external wrapper; GSD; XML; strings; e.g. CGI]

12 Common Data Model
We need a Common Data Model (CDM)
– A definition of the information being passed to and fro
– Human-readable, not machine-readable
– This is used as a reference when creating specific implementations for CGI/XML (DTD, XML Schema), Web Services, etc.

13 What does the CDM look like?
It defines the input (“request”) and output (“response”) for six fundamental operations which the system needs to be able to carry out

14 Request Types 0-5
– Type 0: Get version of the CDM with which the GSD complies
– Type 3: Get information about the GSD
– Type 1: Search for a name in the GSD
– Type 2: Fetch “standard data” about a chosen species
– Type 4: Move up the taxonomic hierarchy
– Type 5: Move down the taxonomic hierarchy
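
The six request types listed on this slide can be written as an enumeration, so that code refers to them by name rather than by bare number. The member names below are assumptions chosen for readability; only the numbers and their meanings come from the slide.

```python
from enum import IntEnum


class RequestType(IntEnum):
    """The six CDM request types, numbered as on the slide."""

    GET_CDM_VERSION = 0      # version of the CDM the GSD complies with
    SEARCH_NAME = 1          # search for a name in the GSD
    FETCH_STANDARD_DATA = 2  # "standard data" about a chosen species
    GET_GSD_INFO = 3         # information about the GSD
    MOVE_UP_HIERARCHY = 4    # move up the taxonomic hierarchy
    MOVE_DOWN_HIERARCHY = 5  # move down the taxonomic hierarchy


# A numeric type received over the wire maps back to a named member.
incoming = RequestType(3)
```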

15 Type 0 Request
Request:
– (nothing)
Response:
– CDMVersion

16 Type 3 Request
Request:
– GSDIdentifier
Response:
– GSDInfo (a set of fields including its name, date of last editing, etc.)

17 Type 1 Request
Request:
– SearchString, SearchType (scientific name, common name, unknown), SearchLimit (including higher taxon, maximum number of names to return)
Response:
– Number, SpeciesName[0:N]
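
As a rough sketch, the Type 1 request and response could be modelled with Python dataclasses. The field names follow the slide, but the Python types, the defaults, and the decision to derive `Number` from the length of the name list are assumptions. The toy `search` function is purely illustrative and ignores the higher-taxon limit.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class SearchType(Enum):
    SCIENTIFIC_NAME = "scientific name"
    COMMON_NAME = "common name"
    UNKNOWN = "unknown"


@dataclass
class SearchLimit:
    higher_taxon: Optional[str] = None  # restrict the search to this taxon
    max_names: Optional[int] = None     # maximum number of names to return


@dataclass
class Type1Request:
    search_string: str
    search_type: SearchType = SearchType.UNKNOWN
    search_limit: SearchLimit = field(default_factory=SearchLimit)


@dataclass
class Type1Response:
    species_names: List[str] = field(default_factory=list)

    @property
    def number(self) -> int:
        # "Number" on the slide; derived here from the list (an assumption).
        return len(self.species_names)


def search(names, request):
    """Toy substring search over a list of names, honouring max_names only."""
    hits = [n for n in names if request.search_string.lower() in n.lower()]
    limit = request.search_limit.max_names
    return Type1Response(hits if limit is None else hits[:limit])


request = Type1Request(
    "caesalpinia", SearchType.SCIENTIFIC_NAME, SearchLimit(max_names=1)
)
response = search(
    ["Caesalpinia crista", "Caesalpinia bonduc", "Cytisus scoparius"], request
)
```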

18 Type 2 Request
Request:
– Identifier, GSDIdentifier
Response:
– StandardData (approximately the same as the Standard Data defined by Species 2000 and seen by the user)

19 Type 4 Request
Request:
– Identifier, GSDIdentifier
Response:
– HigherTaxon[0:N]

20 Type 5 Request
Request:
– Identifier, SearchLimit
Response:
– Taxon[0:N]

21 The “standard data”
This comprises the information about a species which Species 2000 wishes to provide:
– AVCNameWithRefs
– SynonymWithRefs
– CommonNameWithRefs
– Family (or other agreed higher taxon)
– Comment
– Scrutiny
– DataLink (links to the GSD’s or other web pages)
– Geography (list of places)
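
One possible in-memory shape for the standard data is sketched below. The field list mirrors the slide, but the Python types and cardinalities (for example, that synonyms and common names are lists, or that a name carries a list of reference strings) are assumptions rather than part of the CDM as presented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NameWithRefs:
    name: str
    references: List[str] = field(default_factory=list)


@dataclass
class StandardData:
    avc_name: NameWithRefs                                        # AVCNameWithRefs
    synonyms: List[NameWithRefs] = field(default_factory=list)    # SynonymWithRefs
    common_names: List[NameWithRefs] = field(default_factory=list)
    family: str = ""      # or other agreed higher taxon
    comment: str = ""
    scrutiny: str = ""
    data_links: List[str] = field(default_factory=list)  # GSD or other web pages
    geography: List[str] = field(default_factory=list)   # list of places


record = StandardData(
    avc_name=NameWithRefs("Cytisus scoparius"),
    synonyms=[NameWithRefs("Sarothamnus scoparius")],
    family="Leguminosae",
)
```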

22 Where are we now?
Is the Spice Project finished?
– We have a fairly stable CDM (version 1.20 is about to be replaced with version 1.21)
– XML DTD exists
– Several CGI/XML implementations in Java and PHP, and a Web Service
– We have a working Spice system
– A few changes are anticipated: geographical information; linking to further information sources; infraspecific taxa

23 “Intelligent” linking
Species 2000 is
– not just a catalogue (which lists things)
– It is an index (which points to things)
It plans to provide links to take a user
– from a species entry (from a GSD)
– to further sources of information about that particular species (Species Information Sources or SISs)

24 “Intelligent” linking
There are experimental “unintelligent” links already (as in the ILDIS GSD), which rely on exact name matching
But there are issues in making links more intelligent
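
The limitation of exact name matching is easy to show in a few lines. In this sketch the SIS index, its page identifiers and the function name `exact_link` are all hypothetical. The point is only that a link keyed on the literal name string fails as soon as the other resource files the species under a different name.

```python
# Hypothetical index of pages held by a Species Information Source (SIS).
sis_pages = {
    "Sarothamnus scoparius": "sis/page/101",
    "Cytisus striatus": "sis/page/102",
}


def exact_link(name, pages):
    """'Unintelligent' link: succeeds only on an identical name string."""
    return pages.get(name)


# Found: the SIS uses exactly the same name.
hit = exact_link("Cytisus striatus", sis_pages)

# Missed: the SIS files the same species under a synonym.
miss = exact_link("Cytisus scoparius", sis_pages)
```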

25 Data quality (again!)
How do we know the information is reliable?
One problem is the differing interpretation of species names (species concepts) in different resources

26 LITCHI Project
A rule-based tool for the detection and repair of conflicts and merging of data in taxonomic databases

27 Summary of Litchi project
We modelled the knowledge integrity rules in a taxonomic treatment.
The knowledge tested is implicit in the assemblage of scientific names and synonyms used to represent each taxon.
Practical uses include detecting and resolving taxonomic conflicts when merging or linking two databases.
Version 2, now implemented, focusses on the creation of “cross-maps”

28 Example 1
Checklist A
– Caesalpinia crista L. [accepted name]
Checklist B
– Caesalpinia crista L. [accepted name]
– Caesalpinia bonduc (L.) Roxb. [accepted name]
– Caesalpinia crista L., p.p. [synonym]
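
A much simplified check can flag the conflict in Example 1. This is not LITCHI's rule set, which the slides do not spell out. It is a single illustrative rule: a name that appears in both checklists but with a different set of statuses is reported. The function names are assumptions.

```python
PRO_PARTE = ", p.p."

checklist_a = [("Caesalpinia crista L.", "accepted")]
checklist_b = [
    ("Caesalpinia crista L.", "accepted"),
    ("Caesalpinia bonduc (L.) Roxb.", "accepted"),
    ("Caesalpinia crista L., p.p.", "synonym"),
]


def base_name(name):
    """Drop a trailing ', p.p.' (pro parte) qualifier, if present."""
    return name[: -len(PRO_PARTE)] if name.endswith(PRO_PARTE) else name


def statuses(checklist):
    """Map each name to the set of statuses it has in a checklist."""
    result = {}
    for name, status in checklist:
        result.setdefault(base_name(name), set()).add(status)
    return result


def find_conflicts(a, b):
    """Names present in both checklists but with different status sets."""
    sa, sb = statuses(a), statuses(b)
    return sorted(n for n in sa.keys() & sb.keys() if sa[n] != sb[n])


conflicts = find_conflicts(checklist_a, checklist_b)
print(conflicts)  # ['Caesalpinia crista L.']
```

Here Caesalpinia crista L. is flagged because Checklist B uses the name both as an accepted name and, in part, as a synonym, which suggests the two checklists apply the name to different species concepts.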

29 Example 2
In the case of the species Cytisus scoparius
– Treatment A will list it as Cytisus scoparius (synonym Sarothamnus scoparius)
– Treatment B will list it as Sarothamnus scoparius (synonym Cytisus scoparius)
Treatment A recognises one genus, Cytisus
Treatment B recognises two genera, Cytisus and Sarothamnus
[Diagram labels: Genus Cytisus; Genus Sarothamnus; Genus Cytisus; Cytisus scoparius; Sarothamnus scoparius; Cytisus striatus; Sarothamnus striatus; Cytisus multiflorus; Cytisus praecox]

30 Cross-mapping
So how can we make intelligent links work?
One way to make links appear more intelligent is to create and maintain “cross-maps” which describe how one or more taxa in one resource (such as the Species 2000 index) relate to one or more taxa in another resource
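
A cross-map is essentially a many-to-many relation between the taxa of two resources, which might be held as in the sketch below. The class `CrossMap`, its methods and the relationship labels are illustrative assumptions. The sample entries are one reading of the Cytisus/Sarothamnus and Caesalpinia cases used in this talk.

```python
from collections import defaultdict


class CrossMap:
    """Relates taxa in one resource to taxa in another (many-to-many)."""

    def __init__(self):
        self._links = defaultdict(list)

    def add(self, taxon_a, relationship, taxon_b):
        self._links[taxon_a].append((relationship, taxon_b))

    def targets(self, taxon_a):
        """Taxa in the other resource that a link should lead to."""
        return [taxon_b for _, taxon_b in self._links.get(taxon_a, [])]


cross_map = CrossMap()
# Same taxon under a different name (relationship labels are illustrative).
cross_map.add("Cytisus scoparius", "same taxon", "Sarothamnus scoparius")
# One taxon in the first resource corresponding to two in the second.
cross_map.add("Caesalpinia crista", "includes", "Caesalpinia crista")
cross_map.add("Caesalpinia crista", "includes", "Caesalpinia bonduc")

print(cross_map.targets("Cytisus scoparius"))
# ['Sarothamnus scoparius']
```

With such a map in place, a link from a species entry can follow the recorded relationship instead of relying on the two resources using an identical name.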

31 Litchi 2.2 in use
[Diagram labels: Checklist A; Checklist B; read into system; rules; heuristics; taxonomic intelligence; conflict detection; inference of concept relationships; concept relationships; cross-map; write]

32 More about cross-maps
They may be created and maintained
– manually by experts
– automatically or semi-automatically by LITCHI (as above)
– by monitoring the behaviour of users following species links
– by analysing data sets describing the taxa, when sufficient such data is available, using the usual species taxonomy tools (phenetic and cladistic analyses)

33 More about cross-maps
They may be held
– by individual GSDs, describing how to link their species to selected related resources, as ILDIS has done for linking to the Northern Eurasia (aka USSR) database
– by Species 2000 as a repository and service to facilitate intelligent species links
– by an “intelligent linking engine”, as planned for Species 2000 Europa to link its two hubs

34 A dream
A system for managing intelligent species links using taxonomic concept relationships would maximise the potential of the plethora of species-based catalogues, indexes and rich species resources currently being assembled all over the world
Perhaps on the Web, as with the current Spice/Species 2000 prototype
Or...

35 The Grid
Or maybe on the Grid
– One of the aims of which is to provide access to such knowledge sources as species checklists, synonymy servers, rich species data sets, and cross-maps, for example in the Biodiversity World project

