Example projects using metadata and thesauri: the Biodiversity World Project Richard White Cardiff University, UK


2 The Biodiversity World project
- 3 year e-Science project funded by the UK BBSRC research council
- Universities of Cardiff, Reading and Southampton
- The Natural History Museum (London)

3 Some difficult biodiversity questions
- How should conservation efforts be concentrated? (example of Biodiversity Richness & Conservation Evaluation)
- Where might a species be expected to occur, under present or predicted climatic conditions? (example of Bioclimatic & Ecological Niche Modelling)
- How can geographical information assist in inferring possible evolutionary pathways? (example of Phylogenetic Analysis & Palaeoclimate Modelling)

4 Point data from various herbaria

5 GARP prediction of climatic suitability

6 Distribution data from ILDIS database

7 Types of resource used in these biodiversity studies
Data sources:
- Catalogue of Life (names of species: Species 2000, GBIF)
- Biodiversity data
  - Descriptive data
  - Distribution of specimens and observations
- Geographical data
  - Boundaries of geographical & political units
  - Climate surfaces
- Genetic sequences
Analytic tools:
- Biodiversity richness assessment – various metrics
- Bioclimatic modelling – bioclimatic ‘envelope’ generation
- Phylogenetic analysis (generation of phylogenetic trees)

8 Some challenges …
- Finding the resources
- Knowing how to use these heterogeneous resources
  - Originally constructed for various reasons
  - Often little thought was given to standards or interoperability

9 The Biodiversity World vision (1)
- Problem Solving Environment for Biodiversity studies – Heterogeneous diverse resources
- Facilitating integration of both legacy and newly-developed resources
- Flexible workflows
- Main challenges centre around interoperability, resource discovery, metadata, etc.
- High-performance computing secondary (though relevant)

Our architecture …

11 Biodiversity World as a flexible PSE
[Architecture diagram; labels recovered from the figure:]
- User; Local tools
- Problem Solving Environment user interface (Triana)
- Problem Solving Environment: Resource discovery; Support for workflows
- Ontology: Metadata; Resource & analytic tool descriptions; Maintenance tools
- BDW Grid
- Wrapper (label appears twice)
- Species 2000 & ITIS Catalogue of Life; GSD
- Thematic data source; Abiotic data source
- Analytic tool (label appears twice)

User interaction with BDWorld …

13 Example work-flow (Climate-space Modelling)
- Species 2000: submit scientific name; retrieve accepted name & synonyms for species
- Localities: retrieve distribution data for species of interest
- Climate: present or recent climate surfaces
- Climate Space Model: model of climatic conditions where species is currently found
- Climate: possibly different climate surfaces (e.g. predicted climate)
- Base Maps: world or regional maps
- Prediction: prediction of suitable regions for species of interest
- Projection: projection of predicted distribution on to base map
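
The climate-space work-flow on this slide can be read as a chain of steps, each feeding the next. The sketch below is only an illustration of that chaining in Python: every function name, the species name "Examplea alba", the tiny climate grids and the base map are hypothetical stand-ins, and the min/max "envelope" is a deliberately simple substitute for a real bioclimatic model. None of it is the BDWorld or Triana API.

```python
# Hypothetical stand-ins for the wrapped resources named on the slide.
PRESENT_CLIMATE = [[10, 12, 20], [11, 25, 30]]    # one climate variable per grid cell
PREDICTED_CLIMATE = [[14, 16, 24], [11, 12, 30]]  # same grid under a predicted climate
BASE_MAP = ["..~", "..."]                         # "." = land, "~" = sea


def resolve_name(scientific_name):
    """Species 2000 step: return (accepted name, synonyms). Stubbed lookup."""
    checklist = {"Examplea alba": ("Examplea alba", ["Examplea candida"])}
    return checklist[scientific_name]


def fetch_localities(names):
    """Localities step: grid cells (row, col) with records for any of the names."""
    records = {"Examplea alba": [(0, 0)], "Examplea candida": [(0, 1)]}
    return [cell for name in names for cell in records.get(name, [])]


def build_envelope(localities, climate):
    """Climate Space Model step: min/max of the variable where the species occurs."""
    values = [climate[row][col] for row, col in localities]
    return min(values), max(values)


def predict(envelope, climate):
    """Prediction step: mark cells whose climate lies inside the envelope."""
    low, high = envelope
    return [[low <= value <= high for value in row] for row in climate]


def project(prediction, base_map):
    """Projection step: draw suitable land cells as '#' on the base map."""
    return [
        "".join("#" if ok and ch == "." else ch for ok, ch in zip(p_row, m_row))
        for p_row, m_row in zip(prediction, base_map)
    ]


def run_workflow(scientific_name):
    """Run the steps in the order listed on the slide."""
    accepted, synonyms = resolve_name(scientific_name)
    localities = fetch_localities([accepted] + synonyms)
    envelope = build_envelope(localities, PRESENT_CLIMATE)
    suitable = predict(envelope, PREDICTED_CLIMATE)
    return project(suitable, BASE_MAP)


if __name__ == "__main__":
    for line in run_workflow("Examplea alba"):
        print(line)
```

Note that the model is built from the present climate but applied to a possibly different climate surface, which is why the two climate inputs are kept separate.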

14 BDWorld / Triana in operation: Workflow creation (design, editing)

15 Triana screen-shots

16 Triana screen-shots

17 Triana screen-shots

18 Triana screen-shots

19 Triana screen-shots

20 Triana screen-shots

21 BDWorld / Triana in operation: Workflow execution (enactment, run-time)

22 Triana screen-shots

23 Triana screen-shots

24 Triana screen-shots

25 Triana screen-shots

26 Triana screen-shots

27 A dream
- A desktop environment in which scientists can “drag & drop” data sources, analysis and modelling tools and visualisation interfaces into a desired sequence of operations which can be run automatically
- BDWorld is just about at this stage
- With additional features, the environment could be made richer and more productive, and could support research groups
- Essentially a component-based visual programming environment
- Not just for biodiversity!

28 Role of metadata
- Metadata is needed to enable discovery of resources and to indicate how they are to be used
- Properties to help locate appropriate resources
- Check interoperability, suggest transformations
- Provenance of data sets
- Log of work-flows executed

29 Resources have to be matched
- To the user’s requirements
- To the capabilities of the user’s workstation environment
- To each other, so that data sets generated by one task can be used by another

30 Finding a resource that matches the user’s requirements
- Metadata is stored when a resource is registered
- This metadata is used to find a resource which
  - meets the user’s needs (possibly interpreted with the help of an ontology)
  - can run in the user’s environment (users have to register their metadata too)
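
A minimal sketch of this register-then-discover idea, in Python. The `Registry` class, the resource names and the small "narrower terms" table standing in for an ontology are all assumptions made for illustration; the slides do not describe how BDWorld stores or queries its metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    function: str    # what the resource does, as a controlled term
    platforms: set   # platforms on which it can run


@dataclass
class Registry:
    resources: list = field(default_factory=list)
    # Toy "ontology": maps a broad term to the narrower terms it covers.
    narrower: dict = field(default_factory=dict)

    def register(self, resource):
        """Store the metadata when the resource is registered."""
        self.resources.append(resource)

    def find(self, need, user_platform):
        """Names of resources that meet the need and run on the user's platform."""
        accepted_terms = {need} | set(self.narrower.get(need, ()))
        return [
            r.name
            for r in self.resources
            if r.function in accepted_terms and user_platform in r.platforms
        ]


# Hypothetical registrations.
registry = Registry(
    narrower={"niche modelling": ["GARP modelling", "bioclimatic envelope"]}
)
registry.register(Resource("garp-wrapper", "GARP modelling", {"linux"}))
registry.register(Resource("envelope-tool", "bioclimatic envelope", {"linux", "windows"}))
registry.register(Resource("tree-builder", "phylogenetic analysis", {"linux"}))

if __name__ == "__main__":
    print(registry.find("niche modelling", "windows"))
```

The user asks for the broad term; the term table widens it to the narrower functions that registered resources actually declare, and the platform check then filters out anything the user cannot run.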

31 Metadata about resources
- Description, functionality
- Input and output data sets
- User interaction, if any
- Platform, requirements, restrictions
- Quality & reputation

32 Users’ needs
- What the resource does (or data source delivers)
- Algorithm used
- Whether it uses the right data type
- Quality, reliability, reputation …

33 Matching resources to users
Users have varying capabilities and privileges which may affect their ability to use resources which:
- run on specific platforms only
- have IPR or cost limitations imposed on their use
- interact with their user locally in real time
- have other unexpected requirements

34 The user’s environment
- Platform: OS, supporting software
- Privileges, licences held, etc.
- Connection (bandwidth etc.)
- Workstation hardware (display, memory, speed, etc.)
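
One way to picture the match between a user's registered environment and a resource's requirements is a check that reports which requirements are not met. The Python sketch below is hypothetical throughout: the field names, units and the `blockers` function are assumptions chosen to mirror the items on the slide, not a BDWorld schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserEnvironment:
    os: str
    software: frozenset       # supporting software installed
    licences: frozenset       # licences held
    bandwidth_mbps: float
    memory_mb: int


@dataclass(frozen=True)
class Requirements:
    platforms: frozenset
    software: frozenset = frozenset()
    licence: Optional[str] = None
    min_bandwidth_mbps: float = 0.0
    min_memory_mb: int = 0


def blockers(env, req):
    """Return the reasons (if any) why this user cannot use the resource."""
    problems = []
    if env.os not in req.platforms:
        problems.append("platform")
    missing = req.software - env.software
    if missing:
        problems.append("software: " + ", ".join(sorted(missing)))
    if req.licence is not None and req.licence not in env.licences:
        problems.append("licence")
    if env.bandwidth_mbps < req.min_bandwidth_mbps:
        problems.append("bandwidth")
    if env.memory_mb < req.min_memory_mb:
        problems.append("memory")
    return problems


# Hypothetical user and resource.
user = UserEnvironment("windows", frozenset({"java"}), frozenset(), 2.0, 512)
tool = Requirements(
    platforms=frozenset({"linux"}),
    software=frozenset({"java", "R"}),
    licence="site-licence",
    min_bandwidth_mbps=1.0,
    min_memory_mb=1024,
)

if __name__ == "__main__":
    print(blockers(user, tool))
```

Returning the list of reasons, rather than a plain yes or no, lets a user interface explain why a resource was excluded.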

35 Matching resource inputs and outputs
- The output of an earlier task may be the input of a later one
- Thus inputs and outputs of resources have to be tested for matching
- The only real criterion for this is the later resource – it has been programmed to read a data set, and will complain if it isn’t suitable
- However …

36 Matching input and output data sets
Can be done with various levels of rigour:
- Is the same word used to describe their type?
- Do they have the same schema?
- Do they have schemas which contain the same elements?
- Do they have schemas which can be proved to be equivalent? (this is very hard)
- Are there additional parameters which have to match? (e.g. matrix dimensions)
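
The levels of rigour listed above can be sketched as separate checks on two data set descriptions. In this Python illustration the `DataSetType` structure and the `match_report` function are hypothetical, a schema is reduced to an ordered tuple of (element name, element type) pairs, and the provable-equivalence level is deliberately left out because, as the slide says, it is very hard.

```python
from dataclasses import dataclass, field


@dataclass
class DataSetType:
    type_name: str
    schema: tuple = ()                           # ordered (element name, element type) pairs
    params: dict = field(default_factory=dict)   # e.g. matrix dimensions


def match_report(produced, expected):
    """Compare a produced data set type with the type the next task expects."""
    return {
        # Is the same word used to describe their type?
        "same_type_name": produced.type_name == expected.type_name,
        # Do they have the same schema (same elements, same order)?
        "same_schema": produced.schema == expected.schema,
        # Do the schemas contain the same elements, ignoring order?
        "same_elements": set(produced.schema) == set(expected.schema),
        # Do the additional parameters required by the consumer match?
        "params_match": all(
            produced.params.get(key) == value
            for key, value in expected.params.items()
        ),
    }


# Hypothetical data set descriptions: same elements, different order.
produced = DataSetType(
    "occurrence table", (("lat", "float"), ("lon", "float")), {"columns": 2}
)
expected = DataSetType(
    "occurrence table", (("lon", "float"), ("lat", "float")), {"columns": 2}
)

if __name__ == "__main__":
    print(match_report(produced, expected))
```

Here the type name, the element set and the parameters agree, but the strict schema comparison fails because the element order differs, which is the kind of mismatch a transformation step could repair.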

37 Transforming data sets
If the data sets don’t match, the metadata may allow the workflow designer to
- supply parameters to the wrapper to adjust its generation of output data or its interpretation of input data
- choose a transformation tool which can be
  - inserted into the workflow
  - called as a local tool on the user’s workstation
- control a more flexible data transformation tool
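
As a sketch of the second option, inserting a transformation tool into the workflow, the Python below walks a list of steps and splices in a registered converter wherever one step's output type differs from the next step's input type. The `Step` structure, the type labels and the converter table are hypothetical; a real system would draw them from the registered metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    input_type: str
    output_type: str


def insert_converters(steps, converters):
    """Return the workflow with conversion tools inserted where types differ.

    `converters` maps (from_type, to_type) to the Step that performs the
    conversion. A ValueError is raised when no suitable tool is registered.
    """
    repaired = []
    for step in steps:
        if repaired and repaired[-1].output_type != step.input_type:
            key = (repaired[-1].output_type, step.input_type)
            if key not in converters:
                raise ValueError(f"no conversion tool from {key[0]} to {key[1]}")
            repaired.append(converters[key])
        repaired.append(step)
    return repaired


# Hypothetical workflow and conversion tool.
workflow = [
    Step("localities", "species name", "csv points"),
    Step("model", "xml points", "envelope"),
]
tools = {("csv points", "xml points"): Step("csv2xml", "csv points", "xml points")}

if __name__ == "__main__":
    print([step.name for step in insert_converters(workflow, tools)])
```

This handles only a single conversion between adjacent steps; chaining several converters, or choosing between alternatives by quality, would need a search over the converter table.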

38 Summary
Need metadata about
- Resources
- Operations
- Data set types (and schemas)
- Conversion tools
- Users (and their workstations)