The complexity of biodiversity knowledge Andrew C. Jones Cardiff University Malcolm Scoble The Natural History Museum

1 The complexity of biodiversity knowledge
Andrew C. Jones, Cardiff University, Andrew.C.Jones@cs.cardiff.ac.uk
Malcolm Scoble, The Natural History Museum, M.Scoble@nhm.ac.uk

2 Purpose of talk
Jones & Scoble, Semantic Interop., Imperial Coll, 30/03/06
- Malcolm & Andrew are both investigators in BiodiversityWorld (BDW)
- There are many problems BDW doesn't solve yet …
- … and the funding runs out tomorrow!
- We'll present
  - BiodiversityWorld as a framework to support biodiversity research
  - Other projects in which biodiversity informatics problems have been addressed individually
- Major challenge: draw these disparate efforts together

3 Part 1 (Andrew Jones)

4 Why Biodiversity Informatics is hard
- Need to integrate data & tools of different kinds for interesting in silico analyses
- Various computer science issues, e.g.
  - Human-Computer Interaction
    - Design of environments to support scientific research
  - Interoperability
  - Complexity & heterogeneity of data
    - Differences of scientific opinion
    - Data quality problems

5 The BiodiversityWorld project
- 3-year e-Science project funded by BBSRC
- Partners: The University of Reading, Cardiff University, The Natural History Museum, Southampton University
- Aim:
  - Build a Biodiversity Grid (Problem Solving Environment to support Biodiversity research)
  - Support discovery & use of arbitrary tools & data sources for interesting in silico experiments
  - Provide environment to get beyond the "cutting and pasting into Word documents" approach to data integration and analysis

6 Example problems for BiodiversityWorld
- How should conservation efforts be concentrated?
  - (example of Biodiversity Richness & Conservation Evaluation)
- Where might a species be expected to occur, under present or predicted climatic conditions?
  - (example of Bioclimatic & Ecological Niche Modelling)
- How can geographical information assist in selection among possible phylogenetic trees?
  - (example of Phylogenetic Analysis & Palaeoclimate Modelling)

7 BiodiversityWorld architecture
[Architecture diagram. Component labels: User interface; Presentation; BGI API; BiodiversityWorld-GRID Interface (BGI); Workflow enactment engine; Metadata repository; Native BiodiversityWorld Resources; Wrapped resources; The GRID]

10 Some problems not fully solved in BDW
- Flexible data access
  - BGI designed to make BDW maintainable, but currently assumes each resource has a predefined set of operations
  - BioDA project investigated use of OGSA-DAI in BDW
- HCI issues
  - A much more exploratory approach to workflow construction might be appropriate?
- Semantic interoperability & data quality
  - Metadata repository: basic information only
  - Only basic solution to species naming problems (SPICE)
  - Other problems of descriptive terms, differences of expert opinion, etc., remain to be addressed

11 Complexity of biodiversity data: a multi-dimensional problem
- Same specimen might be described with differences of:
  - Terminology
  - Opinion about identification
  - Opinion about whether a particular feature is present
  - Accuracy
- Experts may differ as to:
  - Circumscription associated with a given scientific name
    - (So may not be describing the same concept)
  - Terminology used to describe a given taxon
  - Accepted name for a species in a taxonomic checklist
- There may be errors!...

12 SPICE for Species 2000
- BBSRC/EPSRC- and EU-funded
- SPecies 2000 Interoperability Co-ordination Environment
- Aims:
  - build scalable, federated scientific name catalogue organised by taxon (species, etc.)
  - provide synonymy server, enriching information retrieval
- Issue: how to build an architecture to integrate specialist, heterogeneous databases, providing a consistent federated view of broader scope?
- Common Data Model sufficed …
  - data requirements of federation identical for each database
  - small set of canned queries adequate for the catalogue
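The Common Data Model idea can be illustrated with a minimal Python sketch. This is not SPICE code: SPICE uses CORBA wrappers around real databases, whereas `NameRecord`, `InMemoryWrapper`, `federated_search` and the database names below are all hypothetical, chosen only to show one canned query answered uniformly by heterogeneous sources.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NameRecord:
    """Hypothetical common data model: every wrapper returns this shape."""
    scientific_name: str
    status: str            # e.g. "accepted" or "synonym"
    source_database: str   # which specialist database supplied the record


class InMemoryWrapper:
    """Stand-in for a database wrapper; real wrappers would query a database."""

    def __init__(self, database_name: str, rows: List[Tuple[str, str]]):
        self.database_name = database_name
        self.rows = rows  # (scientific name, status) pairs

    def search_by_name(self, text: str) -> List[NameRecord]:
        # One canned query: case-insensitive substring match on the name.
        return [
            NameRecord(name, status, self.database_name)
            for name, status in self.rows
            if text.lower() in name.lower()
        ]


def federated_search(wrappers, text: str) -> List[NameRecord]:
    """Send the same canned query to every wrapper and merge the answers."""
    results: List[NameRecord] = []
    for wrapper in wrappers:
        results.extend(wrapper.search_by_name(text))
    return results


# Hypothetical specialist databases, using names that appear in this talk.
legumes = InMemoryWrapper("legumes_db", [("Caragana arborescens Lam.", "accepted"),
                                         ("Caragana sibirica Medikus", "synonym")])
moths = InMemoryWrapper("moths_db", [("Sphinx caligineus", "accepted")])

hits = federated_search([legumes, moths], "caragana")
```

Because every wrapper returns the same record type, the co-ordinating code needs no knowledge of how each database stores its data.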

13 SPICE internal architecture
[Architecture diagram. Component labels: User (Web browser); User Server module (HTTP); Common Access System (CAS); Query co-ordinator; CAS knowledge repository (taxonomic hierarchy, annual checklist, genus and other caches, ...); CORBA; GSD Wrapper, with a (in some cases, generic) CORBA wrapper element; Wrapper (e.g. JDBC); Wrapper (e.g. CGI/XML + ODBC); GSD]

14 LITCHI
- BBSRC/EPSRC- and EU-funded
- Logic-based Integration of Taxonomic Conflicts in Heterogeneous Information systems
- Aim: detect conflicts between species checklists and either
  - Assist in producing a consistent checklist, or
  - Generate correspondences between checklists (cross-map)
- Addressing problems of species classification & naming variations when accessing species-related data
- More general, semantic interoperability issue:
  - detecting conflicts between different expert views of same subject matter;
  - supporting data access based on these views

15 LITCHI example
- Checklist 1
  - Caragana arborescens Lam. (accepted name)
    - Caragana sibirica Medikus (synonym)
- Checklist 2
  - Caragana sibirica Medikus (accepted name)
    - Caragana arborescens Lam. (synonym)
- (Lam. = Lamarck)
- A full name which is not a pro-parte name may not appear as both an accepted name and a synonym in the same checklist
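The rule on this slide can be sketched as a simple set intersection. The Python below is an illustration only, not LITCHI's logic-based implementation; `Entry` and `accepted_and_synonym_conflicts` are hypothetical names. Each checklist is consistent on its own, and the rule is violated only when the two are naively merged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    full_name: str           # scientific name including authority
    status: str              # "accepted" or "synonym"
    pro_parte: bool = False  # pro-parte names are exempt from the rule


def accepted_and_synonym_conflicts(checklist):
    """Return full names that appear as both accepted name and synonym."""
    accepted = {e.full_name for e in checklist
                if e.status == "accepted" and not e.pro_parte}
    synonyms = {e.full_name for e in checklist
                if e.status == "synonym" and not e.pro_parte}
    return sorted(accepted & synonyms)


checklist_1 = [Entry("Caragana arborescens Lam.", "accepted"),
               Entry("Caragana sibirica Medikus", "synonym")]
checklist_2 = [Entry("Caragana sibirica Medikus", "accepted"),
               Entry("Caragana arborescens Lam.", "synonym")]

# Merging the two checklists without resolving opinions breaks the rule.
merged = checklist_1 + checklist_2
conflicts = accepted_and_synonym_conflicts(merged)
```

A tool could then either ask an expert which opinion to keep (producing one consistent checklist) or record the pair as a cross-map between the two checklists.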

16 Name relationships (LITCHI 2)

17 myViews
- Not funded yet; limited proof-of-concept prototype only
- Addresses problem that an expert may wish to generate taxon descriptions which are:
  - Coherent;
  - Mapped explicitly to other taxon descriptions, and
  - Based directly on existing documentation (monographs, etc), rather than completely re-coded in some restrictive formalism with a new vocabulary

18 Example: describing the same things?
- Description A:
  - Sarothamnus scoparius (L.) Wimm. ex Koch.
  - Broom
  - ... a bush which is 50-200 cm high...
- Description B:
  - Cytisus scoparius
  - Yellow broom
  - ... a small shrub up to 6ft or more... native in its yellow form...
- Description C:
  - Cytisus scoparius (L.) Link.
  - Broom
  - ... a deciduous shrub growing to 2.4m by 1m at a fast rate... scented flowers...
- Description D:
  - Common Broom
  - Cytisus scoparius
  - ... covered in profuse golden-yellow flowers... shrub about 1-3m tall...
- Description E:
  - Broom
  - Cytisus scoparius
  - ... Like a spineless edition of gorse... with larger scentless flowers...
- Similar problems apply to individual specimen descriptions

19 Things we might want to do
- In a system where
  - data is held in as raw a form as possible, to avoid information loss, but
  - we can impose various views and hypotheses
- we might wish to …
- Create our own view of the data
  - For a given piece of knowledge, we could
    - accept it unaltered
    - accept but re-express in our terms (e.g. different scientific name; different units; ...)
    - state it is equivalent to another piece of knowledge (e.g. minor differences in measurements)
    - flag it as wrong
    - ...
  - In relation to another's view, we might
    - include or ignore it
    - declare some mapping applicable to a group of items (e.g. every species of Sarothamnus is mapped to Cytisus)
    - ...
- Reason with differing levels of precision simultaneously (e.g. binary/continuous characters derived from same features)
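A minimal Python sketch of "accept but re-express in our terms" follows. It assumes a hypothetical record layout and helper names (`raw_statements`, `in_our_terms`, `view`); the point is only that the raw statements stay untouched while a view re-expresses names and units on top of them.

```python
# Conversion factors to centimetres (standard definitions).
TO_CM = {"cm": 1.0, "m": 100.0, "in": 2.54, "ft": 30.48}

# A mapping declared once for a whole group of items:
# every species of Sarothamnus is mapped to Cytisus.
GENUS_MAP = {"Sarothamnus": "Cytisus"}

# Raw statements, held as close to the sources as possible (hypothetical layout).
raw_statements = [
    {"source": 14, "name": ("Sarothamnus", "scoparius"), "height_range": (50, 200, "cm")},
    {"source": 8, "name": ("Cytisus", "scoparius"), "height_range": (1, 3, "m")},
]


def in_our_terms(genus, epithet):
    """Re-express a scientific name using our preferred genus."""
    return (GENUS_MAP.get(genus, genus), epithet)


def view(statement):
    """Build our view of one statement without modifying the raw record."""
    low, high, unit = statement["height_range"]
    return {
        "source": statement["source"],
        "name": in_our_terms(*statement["name"]),
        "height_range_cm": (low * TO_CM[unit], high * TO_CM[unit]),
    }


our_view = [view(s) for s in raw_statements]
```

Keeping the view as a derived structure means a different user can apply a different mapping to the same raw data.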

20 An experimental prototype
- Proof of concept...
  - arbitrary, small data set from various sources: Cytisus & Genista species
  - No real front end or back end yet!
- Implemented in Prolog (a logic programming language)
- Formalisms to record complex assertions & their sources
- Ontological knowledge not currently separated out explicitly; rules perform inference
- User makes his/her own assertions about (for example)
  - synonymy;
  - which assertions of others to accept;
  - ...
- ... both very specific and more general rules
- Main purpose: illustrate handling multiple opinions/hypotheses

21 Sample knowledge base extracts
assertion(1, association(2, 3, absent(scent(flowers)))).
assertion(1, property(2, yellow(flowers))).
assertion(1, label(2, common('Broom'))).
assertion(1, label(2, species('Cytisus', 'scoparius'))).
assertion(4, property(5, shrublet(whole))).
assertion(4, property(5, deciduous(whole))).
assertion(4, property(5, size(6, in, whole))).
assertion(4, property(5, deep_yellow(flowers))).
assertion(4, property(5, small(leaves))).
assertion(4, label(5, species('Cytisus', 'ardoinii'))).
assertion(4, property(7, size(6, ft, whole))).
assertion(4, label(7, species('Cytisus', 'scoparius'))).
assertion(12, label(13, common('Broom'))).
assertion(12, label(13, common('Scotch Broom'))).
assertion(12, property(13, compound('sparteine'))).
assertion(12, property(13, compound('tyramine'))).
assertion(12, label(13, species('Sarothamnus', 'scoparius'))).
assertion(14, label(15, species('Sarothamnus', 'scoparius'))).
assertion(14, property(15, size_range(50, 200, cm, whole))).
assertion(14, property(15, bright_yellow(flowers))).
assertion(16, label(17, species('Cytisus', 'scoparius'))).
assertion(16, property(17, max_height(2.4, m, whole))).
assertion(16, property(17, max_width(1, m, whole))).
assertion(16, property(17, present(scent(flowers)))).
assertion(8, property(9, golden_yellow(flowers))).
assertion(8, property(9, size_range(1, 3, m, whole))).
assertion(8, label(9, species('Cytisus', 'scoparius'))).
Annotation: Source 12 asserts that item 13's label is common name 'Scotch Broom'

22 Deducing from the knowledge base
?- display_accepted_props('Cytisus', 'ardoinii').
shrublet(whole)
deciduous(whole)
size(6, in, whole)
deep_yellow(flowers)
small(leaves)
Yes
?- display_accepted_props('Cytisus', 'scoparius').
yellow(flowers)
size(6, ft, whole)
golden_yellow(flowers)
size_range(1, 3, m, whole)
max_height(2.4, m, whole)
max_width(1, m, whole)
present(scent(flowers))
absent(spines)
absent(scent(flowers))
Yes
?- display_contradictions_for('Cytisus', 'scoparius').
[present(scent(flowers)), absent(scent(flowers))]
Yes
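How such queries might work can be sketched in Python. The prototype's Prolog rules are not shown in the slides, so this is an assumed reconstruction rather than the authors' method: properties are collected from every item labelled with the species, and a contradiction is a feature recorded as both present and absent. The tuple layout and the function names are hypothetical, and the `association` form from the extract is simplified here to a plain property.

```python
# (source, kind, item, payload), a small subset of the extract above.
ASSERTIONS = [
    (1, "property", 2, ("absent", "scent(flowers)")),   # simplified from association(2, 3, ...)
    (1, "property", 2, ("yellow", "flowers")),
    (1, "label", 2, ("species", "Cytisus", "scoparius")),
    (16, "label", 17, ("species", "Cytisus", "scoparius")),
    (16, "property", 17, ("present", "scent(flowers)")),
    (4, "label", 5, ("species", "Cytisus", "ardoinii")),
    (4, "property", 5, ("deciduous", "whole")),
]


def items_labelled(genus, epithet):
    """Items that some source labels with the given species name."""
    wanted = ("species", genus, epithet)
    return {item for (_src, kind, item, payload) in ASSERTIONS
            if kind == "label" and payload == wanted}


def accepted_props(genus, epithet):
    """All properties asserted about any item carrying that species label."""
    items = items_labelled(genus, epithet)
    return [payload for (_src, kind, item, payload) in ASSERTIONS
            if kind == "property" and item in items]


def contradictions(genus, epithet):
    """Features asserted as both present and absent for the species."""
    props = accepted_props(genus, epithet)
    present = {feature for (state, feature) in props if state == "present"}
    absent = {feature for (state, feature) in props if state == "absent"}
    return sorted(present & absent)
```

Here every source is accepted; a fuller version would filter on which sources the user has chosen to trust.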

23 Adding synonymy (1)
- User regards any statement about a Sarothamnus species as being a statement about a Cytisus species with same epithet:
assertion(20, synonym(species('Cytisus', Epithet), _, species('Sarothamnus', Epithet), _)).
- (Could be more restrictive, e.g. apply to only particular information sources)

24 Adding synonymy (2)
?- display_accepted_props('Cytisus', 'scoparius').
yellow(flowers)
size(6, ft, whole)
golden_yellow(flowers)
size_range(1, 3, m, whole)
max_height(2.4, m, whole)
max_width(1, m, whole)
present(scent(flowers))
compound(sparteine)
compound(tyramine)
size_range(50, 200, cm, whole)
bright_yellow(flowers)
absent(spines)
absent(scent(flowers))
Yes
?- display_contradictions_for('Cytisus', 'scoparius').
[size_range(1, 3, m, whole), size_range(50, 200, cm, whole)]
[present(scent(flowers)), absent(scent(flowers))]
Yes
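The effect of the synonymy rule on contradiction detection can be sketched in Python. The slides do not say how the prototype decides that two size ranges conflict, so the test used here (ranges differ after conversion to centimetres) is an assumption, as are the names `canonical`, `size_ranges` and `size_contradictions`.

```python
from itertools import combinations

TO_CM = {"cm": 1, "m": 100}

# Genus-level synonymy: any Sarothamnus species is treated as the
# Cytisus species with the same epithet.
SYNONYM_GENERA = {"Sarothamnus": "Cytisus"}

# (source, species name as stated, size range) taken from the extract.
STATEMENTS = [
    (8, ("Cytisus", "scoparius"), (1, 3, "m")),
    (14, ("Sarothamnus", "scoparius"), (50, 200, "cm")),
]


def canonical(name):
    """Map a stated name to the user's preferred name."""
    genus, epithet = name
    return (SYNONYM_GENERA.get(genus, genus), epithet)


def size_ranges(name, use_synonymy):
    """Size ranges stated for the species, optionally following synonymy."""
    target = canonical(name) if use_synonymy else name
    found = []
    for _source, stated_name, size_range in STATEMENTS:
        resolved = canonical(stated_name) if use_synonymy else stated_name
        if resolved == target:
            found.append(size_range)
    return found


def in_cm(size_range):
    low, high, unit = size_range
    return (low * TO_CM[unit], high * TO_CM[unit])


def size_contradictions(name, use_synonymy=True):
    """Pairs of ranges that differ once expressed in the same unit (assumed test)."""
    ranges = size_ranges(name, use_synonymy)
    return [(a, b) for a, b in combinations(ranges, 2) if in_cm(a) != in_cm(b)]
```

Without the synonymy rule only one range is attached to Cytisus scoparius, so no size conflict is reported; with it, the Sarothamnus range is pulled in and the 1-3 m versus 50-200 cm disagreement surfaces, as in the query output above.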

25 Some important issues for future work
- Complexity, e.g.
  - Trade-off: effective resource discovery v. computational expense of traversing rich ontology
  - Scalability of taxonomic conflict detection
    - May find large data sets need clever techniques such as Rete network
  - Scalability of inference in myViews; caching inferred information
- Managing & ranking large result sets
  - How to rank resources discovered
  - How to rank conflicts to present users with matches they are likely to want
- Joining all these fragmentary projects up together

26 Part 2 (Malcolm Scoble)

27 The complexity of taxonomic/biodiversity data
[Diagram. Labels: Specimen (unit) data; Collection-level; Observations; Locality; Date of specimen collection; Time of specimen collection; Name of collector; Species/taxon concept; Type specimen; Homonyms; Author of taxon; Date of description; Genus name (for binomial); Images; Species name; DNA barcodes; Synonyms; Species concepts]

28 Taxonomy: from a fragmented to a distributed resource
- Where we are now
  - Fragmented results
  - Fragmented effort
  - Largely a paper medium (restricted access)
- Where we want to be
  - Less fragmented; single site or distributed access
  - Easier to update
  - Coordinated effort
  - Electronic (or dual) medium
  - Free access to data
  - Taxonomy easier to use

29 Projects to integrate biodiversity data
- BioCISE (collection-level)
- ENHSIN (specimen (unit)-level)
- BioCASE (unit- & collection-level)
- Species 2000 (species nomenclature)
- SYNTHESYS (taxonomic infrastructure)
- ENBI (network of biodiversity information)
- EDIT (distributed approach to taxonomy)
- PBIs (inventorying the planet's biodiversity)
- CATE: Creating a Taxonomic e-Science

30 BioCASE National Node Network
- 31 National Nodes
- Core Meta Database is updated every night
- Collection-level

31 A Biological Collections Service for Europe
- All levels

33 Creating a taxonomic e-science (CATE)
- Literature scattered over 250 years of paper publications.
- Data inaccessible other than to specialist users
- Aim to transfer in toto the taxonomy of two groups of organisms to the web (Hawkmoths and Aroids).
- Broad aim: to encourage migration of taxonomy to the web.
- Provide data for those studying biodiversity.
- Encourage quality control, peer-review and the development of consensus taxonomies in the web environment.
- Develop means of citation for web-based revisions
- [Image: Arisaema candidissimum. Photo: RBG Kew]
- [Image: The Hawkmoth Sphinx caligineus sinicus from Beijing, China. Photo: Tony Pittaway]

