The complexity of biodiversity knowledge Andrew C. Jones Cardiff University Malcolm Scoble The Natural History Museum



Jones & Scoble, Semantic Interop., Imperial Coll, 30/03/06

Purpose of talk
Malcolm & Andrew are both investigators in BiodiversityWorld (BDW)
There are many problems BDW doesn't solve yet …
… and the funding runs out tomorrow!
We'll present
– BiodiversityWorld as a framework to support biodiversity research
– Other projects in which biodiversity informatics problems have been addressed individually
Major challenge: draw these disparate efforts together

Part 1 (Andrew Jones)

Why Biodiversity Informatics is hard
Need to integrate data & tools of different kinds for interesting in silico analyses
Various computer science issues, e.g.
– Human-Computer Interaction
   Design of environments to support scientific research
– Interoperability
– Complexity & heterogeneity of data
   Differences of scientific opinion
   Data quality problems

The BiodiversityWorld project
3-year e-Science project funded by BBSRC
Partners: The University of Reading, Cardiff University, The Natural History Museum, Southampton University
Aim:
– Build a Biodiversity Grid (Problem Solving Environment to support Biodiversity research)
– Support discovery & use of arbitrary tools & data sources for interesting in silico experiments
– Provide environment to get beyond the "cutting and pasting into Word documents" approach to data integration and analysis

Example problems for BiodiversityWorld
How should conservation efforts be concentrated?
– (example of Biodiversity Richness & Conservation Evaluation)
Where might a species be expected to occur, under present or predicted climatic conditions?
– (example of Bioclimatic & Ecological Niche Modelling)
How can geographical information assist in selection among possible phylogenetic trees?
– (example of Phylogenetic Analysis & Palaeoclimate Modelling)

BiodiversityWorld architecture
[Diagram. Components shown: User interface; Presentation; Workflow enactment engine; Metadata repository; BGI API; BiodiversityWorld-GRID Interface (BGI); The GRID; Wrapped resources; Native BiodiversityWorld Resources]


Some problems not fully solved in BDW
Flexible data access
– BGI designed to make BDW maintainable, but currently assumes each resource has a predefined set of operations
– BioDA project investigated use of OGSA-DAI in BDW
HCI issues
– A much more exploratory approach to workflow construction might be appropriate?
Semantic interoperability & data quality
– Metadata repository: basic information only
– Only basic solution to species naming problems (SPICE)
– Other problems of descriptive terms, differences of expert opinion, etc., remain to be addressed

Complexity of biodiversity data: a multi-dimensional problem
Same specimen might be described with differences of:
– Terminology
– Opinion about identification
– Opinion about whether a particular feature is present
– Accuracy
Experts may differ as to:
– Circumscription associated with a given scientific name (so may not be describing the same concept)
– Terminology used to describe a given taxon
– Accepted name for a species in a taxonomic checklist
There may be errors!...

SPICE for Species 2000
BBSRC/EPSRC- and EU-funded SPecies 2000 Interoperability Co-ordination Environment
Aims:
– build scalable, federated scientific name catalogue organised by taxon (species, etc.)
– provide synonymy server, enriching information retrieval
Issue: how to build an architecture to integrate specialist, heterogeneous databases, providing a consistent federated view of broader scope?
Common Data Model sufficed …
– data requirements of federation identical for each database
– small set of canned queries adequate for the catalogue
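The federation idea on this slide (every database exposed through the same data model and the same small set of canned queries) can be illustrated with a minimal Python sketch. It is not SPICE code: the record fields, the class names `NameRecord`, `DictWrapper` and `QueryCoordinator`, and the single canned query `search_by_name` are all assumptions made for illustration, and plain Python calls stand in for the real middleware.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class NameRecord:
    """One entry in a hypothetical common data model (fields are assumptions)."""
    scientific_name: str
    status: str          # "accepted" or "synonym"
    accepted_name: str   # the accepted name this record resolves to
    source: str          # which database supplied the record


class Wrapper(Protocol):
    """Every wrapped database answers the same canned query."""
    def search_by_name(self, name: str) -> list[NameRecord]: ...


class DictWrapper:
    """Toy wrapper over an in-memory table standing in for one database."""

    def __init__(self, source: str, rows: dict[str, tuple[str, str]]):
        self.source = source
        self.rows = rows  # scientific name -> (status, accepted name)

    def search_by_name(self, name: str) -> list[NameRecord]:
        if name not in self.rows:
            return []
        status, accepted = self.rows[name]
        return [NameRecord(name, status, accepted, self.source)]


class QueryCoordinator:
    """Sends one query to every wrapper and merges the answers."""

    def __init__(self, wrappers: list[Wrapper]):
        self.wrappers = wrappers

    def search_by_name(self, name: str) -> list[NameRecord]:
        results: list[NameRecord] = []
        for wrapper in self.wrappers:
            results.extend(wrapper.search_by_name(name))
        return results


# Two hypothetical databases, each hidden behind the same interface.
db_a = DictWrapper("db_a", {
    "Caragana sibirica": ("synonym", "Caragana arborescens"),
})
db_b = DictWrapper("db_b", {
    "Cytisus scoparius": ("accepted", "Cytisus scoparius"),
})
coordinator = QueryCoordinator([db_a, db_b])
hits = coordinator.search_by_name("Caragana sibirica")
```

Because the coordinator only depends on the shared interface, adding another database in this sketch means writing one more wrapper and nothing else.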

SPICE internal architecture
[Diagram. Components shown: User (Web browser); User Server module (HTTP); Common Access System (CAS), containing the Query co-ordinator and the CAS knowledge repository (taxonomic hierarchy, annual checklist, genus and other caches, ...); CORBA; CORBA wrapper element of GSD Wrapper (in some cases, generic); Wrapper (e.g. JDBC); Wrapper (e.g. CGI/XML + ODBC); GSD]

LITCHI
BBSRC/EPSRC- and EU-funded Logic-based Integration of Taxonomic Conflicts in Heterogeneous Information systems
Aim: detect conflicts between species checklists and either
– Assist in producing a consistent checklist, or
– Generate correspondences between checklists (cross-map)
Addressing problems of species classification & naming variations when accessing species-related data
More general, semantic interoperability issue:
– detecting conflicts between different expert views of same subject matter;
– supporting data access based on these views

LITCHI example
Checklist 1
– Caragana arborescens Lam. (accepted name)
   Caragana sibirica Medikus (synonym)
Checklist 2
– Caragana sibirica Medikus (accepted name)
   Caragana arborescens Lam. (synonym)
(Lam. = Lamarck)
A full name which is not a pro-parte name may not appear as both an accepted name and a synonym in the same checklist
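The rule on this slide can be sketched as a simple check. This is a Python illustration only, not LITCHI's logic-based implementation; the tuple layout `(full_name, status, pro_parte)` and the function name are assumptions.

```python
def accepted_and_synonym_conflicts(entries):
    """Return full names used both as an accepted name and as a synonym.

    `entries` is an iterable of (full_name, status, pro_parte) tuples,
    where status is "accepted" or "synonym". Pro-parte names are exempt
    from the rule, so they are skipped.
    """
    accepted, synonyms = set(), set()
    for full_name, status, pro_parte in entries:
        if pro_parte:
            continue
        if status == "accepted":
            accepted.add(full_name)
        elif status == "synonym":
            synonyms.add(full_name)
    return sorted(accepted & synonyms)


checklist_1 = [
    ("Caragana arborescens Lam.", "accepted", False),
    ("Caragana sibirica Medikus", "synonym", False),
]
checklist_2 = [
    ("Caragana sibirica Medikus", "accepted", False),
    ("Caragana arborescens Lam.", "synonym", False),
]

# Each checklist satisfies the rule on its own; a naive merge does not.
merged_conflicts = accepted_and_synonym_conflicts(checklist_1 + checklist_2)
```

In this sketch both names are reported for the merged list, which is the kind of conflict that has to be resolved before a consistent combined checklist can be produced.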

Name relationships (LITCHI 2)

myViews
Not funded yet – limited proof-of-concept prototype only
Addresses problem that an expert may wish to generate taxon descriptions which are:
– Coherent;
– Mapped explicitly to other taxon descriptions, and
– Based directly on existing documentation (monographs, etc), rather than completely re-coded in some restrictive formalism with a new vocabulary

Example: describing the same things?
Description A:
– Sarothamnus scoparius (L.) Wimm. ex Koch.
– Broom
– ... a bush which is cm high ...
Description B:
– Cytisus scoparius
– Yellow broom
– ... a small shrub up to 6ft or more ... native in its yellow form ...
Description C:
– Cytisus scoparius (L.) Link.
– Broom
– ... a deciduous shrub growing to 2.4m by 1m at a fast rate ... scented flowers ...
Description D:
– Common Broom
– Cytisus scoparius
– ... covered in profuse golden-yellow flowers ... shrub about 1-3m tall ...
Description E:
– Broom
– Cytisus scoparius
– ... Like a spineless edition of gorse ... with larger scentless flowers ...
Similar problems apply to individual specimen descriptions

Things we might want to do
In a system where
– data is held in as raw a form as possible, to avoid information loss, but
– we can impose various views and hypotheses
we might wish to …
Create our own view of the data
– For a given piece of knowledge, we could
   accept it unaltered
   accept but re-express in our terms (e.g. different scientific name; different units; ...)
   state it is equivalent to another piece of knowledge (e.g. minor differences in measurements)
   flag it as wrong
   ...
– In relation to another's view, we might
   include or ignore it
   declare some mapping applicable to a group of items (e.g. every species of Sarothamnus is mapped to Cytisus)
   ...
Reason with differing levels of precision simultaneously (e.g. binary/continuous characters derived from same features)
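Two of the operations listed here, re-expressing a measurement in different units and treating two slightly different measurements as equivalent, can be sketched in a few lines of Python. The function names and the 5% tolerance are assumptions chosen for illustration; what counts as a "minor difference" would be the user's own decision.

```python
import math

# Conversion factors to metres for the units used in the example descriptions.
TO_METRES = {"m": 1.0, "cm": 0.01, "ft": 0.3048, "in": 0.0254}


def to_metres(value, unit):
    """Re-express a measurement in metres; the raw value is left untouched."""
    return value * TO_METRES[unit]


def roughly_equal(a, b, rel_tol=0.05):
    """Treat two (value, unit) measurements as equivalent within a tolerance.

    The 5% default is an arbitrary illustrative choice.
    """
    return math.isclose(to_metres(*a), to_metres(*b), rel_tol=rel_tol)


same_height = roughly_equal((6, "ft"), (1.8, "m"))    # 1.8288 m vs 1.8 m
different_height = roughly_equal((6, "ft"), (2.4, "m"))
```

Keeping the original `(value, unit)` pair and converting only when comparing follows the slide's point that data should be held in as raw a form as possible.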

An experimental prototype
Proof of concept...
– arbitrary, small data set from various sources: Cytisus & Genista species
– No real front end or back end yet!
Implemented in Prolog (a logic programming language)
Formalisms to record complex assertions & their sources
Ontological knowledge not currently separated out explicitly; rules perform inference
User makes his/her own assertions about (for example)
– synonymy;
– which assertions of others to accept;
– both very specific and more general rules
Main purpose: illustrate handling multiple opinions/hypotheses

Sample knowledge base extracts
assertion(1, association(2, 3, absent(scent(flowers)))).
assertion(1, property(2, yellow(flowers))).
assertion(1, label(2, common('Broom'))).
assertion(1, label(2, species('Cytisus', 'scoparius'))).
assertion(4, property(5, shrublet(whole))).
assertion(4, property(5, deciduous(whole))).
assertion(4, property(5, size(6, in, whole))).
assertion(4, property(5, deep_yellow(flowers))).
assertion(4, property(5, small(leaves))).
assertion(4, label(5, species('Cytisus', 'ardoinii'))).
assertion(4, property(7, size(6, ft, whole))).
assertion(4, label(7, species('Cytisus', 'scoparius'))).
assertion(12, label(13, common('Broom'))).
assertion(12, label(13, common('Scotch Broom'))).
assertion(12, property(13, compound('sparteine'))).
assertion(12, property(13, compound('tyramine'))).
assertion(12, label(13, species('Sarothamnus', 'scoparius'))).
assertion(14, label(15, species('Sarothamnus', 'scoparius'))).
assertion(14, property(15, size_range(50, 200, cm, whole))).
assertion(14, property(15, bright_yellow(flowers))).
assertion(16, label(17, species('Cytisus', 'scoparius'))).
assertion(16, property(17, max_height(2.4, m, whole))).
assertion(16, property(17, max_width(1, m, whole))).
assertion(16, property(17, present(scent(flowers)))).
assertion(8, property(9, golden_yellow(flowers))).
assertion(8, property(9, size_range(1, 3, m, whole))).
assertion(8, label(9, species('Cytisus', 'scoparius'))).
Source 12 asserts that item 13's label is the common name 'Scotch Broom'

Deducing from the knowledge base
?- display_accepted_props('Cytisus', 'ardoinii').
shrublet(whole)
deciduous(whole)
size(6, in, whole)
deep_yellow(flowers)
small(leaves)
Yes
?- display_accepted_props('Cytisus', 'scoparius').
yellow(flowers)
size(6, ft, whole)
golden_yellow(flowers)
size_range(1, 3, m, whole)
max_height(2.4, m, whole)
max_width(1, m, whole)
present(scent(flowers))
absent(spines)
absent(scent(flowers))
Yes
?- display_contradictions_for('Cytisus', 'scoparius').
[present(scent(flowers)), absent(scent(flowers))]
Yes
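The queries on this slide can be mimicked in a short Python sketch. It is not the prototype's Prolog code: the tuple layout `(source, item, kind, value)`, the function names, and the reduction of the data to a handful of facts are assumptions, and the only contradiction detected here is a feature recorded as both present and absent.

```python
# (source, item, kind, value): a simplified subset of the sample assertions.
ASSERTIONS = [
    (1, 2, "label", ("species", "Cytisus", "scoparius")),
    (1, 2, "property", ("yellow", "flowers")),
    (1, 2, "property", ("absent", "scent(flowers)")),
    (16, 17, "label", ("species", "Cytisus", "scoparius")),
    (16, 17, "property", ("present", "scent(flowers)")),
    (4, 5, "label", ("species", "Cytisus", "ardoinii")),
    (4, 5, "property", ("deciduous", "whole")),
]


def items_labelled(genus, epithet, assertions=ASSERTIONS):
    """Items that some source has labelled with the given species name."""
    wanted = ("species", genus, epithet)
    return {item for _, item, kind, value in assertions
            if kind == "label" and value == wanted}


def accepted_props(genus, epithet, assertions=ASSERTIONS):
    """All properties asserted about any item carrying that species label."""
    items = items_labelled(genus, epithet, assertions)
    return [value for _, item, kind, value in assertions
            if kind == "property" and item in items]


def contradictions(genus, epithet, assertions=ASSERTIONS):
    """Features asserted as both present and absent for the species."""
    props = accepted_props(genus, epithet, assertions)
    present = {feature for tag, feature in props if tag == "present"}
    absent = {feature for tag, feature in props if tag == "absent"}
    return sorted(present & absent)


scoparius_conflicts = contradictions("Cytisus", "scoparius")
```

Every property keeps its source number in the data, so a conflict reported by this sketch can be traced back to the sources that disagree.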

Adding synonymy (1)
User regards any statement about a Sarothamnus species as being a statement about a Cytisus species with same epithet:
assertion(20, synonym(species('Cytisus', Epithet), _, species('Sarothamnus', Epithet), _)).
(Could be more restrictive, e.g. apply to only particular information sources)

Adding synonymy (2)
?- display_accepted_props('Cytisus', 'scoparius').
yellow(flowers)
size(6, ft, whole)
golden_yellow(flowers)
size_range(1, 3, m, whole)
max_height(2.4, m, whole)
max_width(1, m, whole)
present(scent(flowers))
compound(sparteine)
compound(tyramine)
size_range(50, 200, cm, whole)
bright_yellow(flowers)
absent(spines)
absent(scent(flowers))
Yes
?- display_contradictions_for('Cytisus', 'scoparius').
[size_range(1, 3, m, whole), size_range(50, 200, cm, whole)]
[present(scent(flowers)), absent(scent(flowers))]
Yes
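The effect of the user's synonymy rule can be sketched in Python as a genus-level mapping applied before lookup. This is an illustration under assumptions, not the prototype's inference rules: `GENUS_SYNONYMS`, `canonical` and `props_for` are hypothetical names, and properties are kept as plain strings.

```python
# The user's own view: any Sarothamnus species is treated as the Cytisus
# species with the same epithet.
GENUS_SYNONYMS = {"Sarothamnus": "Cytisus"}

# (species label, property) pairs taken from the sample assertions.
LABELLED_PROPS = [
    (("Cytisus", "scoparius"), "size_range(1, 3, m, whole)"),
    (("Sarothamnus", "scoparius"), "size_range(50, 200, cm, whole)"),
    (("Sarothamnus", "scoparius"), "compound(sparteine)"),
    (("Cytisus", "ardoinii"), "small(leaves)"),
]


def canonical(genus, epithet, genus_synonyms=GENUS_SYNONYMS):
    """Map a name into the user's preferred genus, keeping the epithet."""
    return (genus_synonyms.get(genus, genus), epithet)


def props_for(genus, epithet, use_synonymy=True):
    """Properties for a species, optionally pooling synonymous names."""
    target = (genus, epithet)
    found = []
    for (g, e), prop in LABELLED_PROPS:
        name = canonical(g, e) if use_synonymy else (g, e)
        if name == target:
            found.append(prop)
    return found


without_rule = props_for("Cytisus", "scoparius", use_synonymy=False)
with_rule = props_for("Cytisus", "scoparius")
```

With the rule switched on, the two size ranges end up attached to the same species, which is why the second query on the slide reports an additional contradiction.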

Some important issues for future work
Complexity, e.g.
– Trade-off: effective resource discovery v. computational expense of traversing rich ontology
– Scalability of taxonomic conflict detection
   May find large data sets need clever techniques such as Rete network
– Scalability of inference in myViews; caching inferred information
Managing & ranking large result sets
– How to rank resources discovered
– How to rank conflicts to present users with matches they are likely to want
Joining all these fragmentary projects up together

Part 2 (Malcolm Scoble)

The complexity of taxonomic/biodiversity data
[Diagram. Labels shown: Specimen (unit) data; Collection-level; Observations; Locality; Date of specimen collection; Time of specimen collection; Name of collector; Species/taxon concept; Type specimen; Homonyms; Author of taxon; Date of description; Genus name (for binomial); Images; Species name; DNA barcodes; Synonyms; Species concepts]

Taxonomy: from a fragmented to a distributed resource
Where we are now
– Fragmented results
– Fragmented effort
– Largely a paper medium (restricted access)
Where we want to be
– Less fragmented; single site or distributed access
– Easier to update
– Coordinated effort
– Electronic (or dual) medium
– Free access to data
– Taxonomy easier to use

Projects to integrate biodiversity data
– BioCISE (collection-level)
– ENHSIN (specimen (unit)-level)
– BioCASE (unit- & collection-level)
– Species 2000 (species nomenclature)
– SYNTHESYS (taxonomic infrastructure)
– ENBI (network of biodiversity information)
– EDIT (distributed approach to taxonomy)
– PBIs (inventorying the planet's biodiversity)
– CATE: Creating a Taxonomic e-Science

BioCASE National Node Network
Collection-level
31 National Nodes
Core Meta Database is updated every night

A Biological Collections Service for Europe
All levels


Creating a taxonomic e-science (CATE)
Literature scattered over 250 years of paper publications.
Data inaccessible other than to specialist users.
Aim to transfer in toto the taxonomy of two groups of organisms to the web (Hawkmoths and Aroids).
Broad aim: to encourage migration of taxonomy to the web.
Provide data for those studying biodiversity.
Encourage quality control, peer-review and the development of consensus taxonomies in the web environment.
Develop means of citation for web-based revisions.
[Image: Arisaema candidissimum. Photo: RBG Kew]
[Image: The Hawkmoth Sphinx caligineus sinicus from Beijing, China. Photo: Tony Pittaway]