Designing and Building a Biodiversity Grid: the Biodiversity World Project A talk in the workshop “e-Research - Meeting New Research Challenges” at the.

Slides:



Advertisements
Similar presentations
Improving Learning Object Description Mechanisms to Support an Integrated Framework for Ubiquitous Learning Scenarios María Felisa Verdejo Carlos Celorrio.
Advertisements

Abstraction Layers Why do we need them? –Protection against change Where in the hourglass do we put them? –Computer Scientist perspective Expose low-level.
A Workflow Engine with Multi-Level Parallelism Supports Qifeng Huang and Yan Huang School of Computer Science Cardiff University
At Reading Frank Bisby, Alistair Culham, Paul Valdes, Neil Caithness, Tim Sutton, Peter Brewer At Cardiff Alec Gray, Andrew Jones, Nick Fiddian, Nick Pittas,
Cardiff School of Computer Science & Informatics Biodiversity Informatics at COMSC Andrew Jones & Richard White School of Computer Science & Informatics.
BiodiversityWorld GRID Workshop NeSC, Edinburgh – 30 June and 1 July 2005 Resource wrappers, web services, grid services Jaspreet Singh School of Computer.
Holding slide prior to starting show. Supporting Collaborative Working of Construction Industry Consortia via the Grid - P. Burnap, L. Joita, J.S. Pahwa,
Planning for Flexible Integration via Service-Oriented Architecture (SOA) APSR Forum – The Well-Integrated Repository Sydney, Australia February 2006 Sandy.
1 Richard White Design decisions: architecture 1 July 2005 BiodiversityWorld Grid Workshop NeSC, Edinburgh, 30 June - 1 July 2005 Design decisions: architecture.
GenSpace: Exploring Social Networking Metaphors for Knowledge Sharing and Scientific Collaborative Work Chris Murphy, Swapneel Sheth, Gail Kaiser, Lauren.
A Data Curation Application Using DDI: The DAMES Data Curation Tool for Organising Specialist Social Science Data Resources Simon Jones*, Guy Warner*,
The Open Grid Service Architecture (OGSA) Standard for Grid Computing Prepared by: Haoliang Robin Yu.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
A Virtual Laboratory for Global Biodiversity Analysis.
Accessing Biodiversity Resources in Computational Environments from Workflow Application J. S. Pahwa, R. J. White, A. C. Jones, M. Burgess, W. A. Gray,
EUROPEAN UNION Polish Infrastructure for Supporting Computational Science in the European Research Space Cracow Grid Workshop’10 Kraków, October 11-13,
Drivers for a PRAGMA Biodiversity Science Expedition Reed Beaman Florida Museum of Natural History University of Florida.
Špindlerův Mlýn, Czech Republic, SOFSEM Semantically-aided Data-aware Service Workflow Composition Ondrej Habala, Marek Paralič,
BiodiversityWorld GRID Workshop NeSC, Edinburgh – 30 June and 1 July 2005 Metadata Agents and Semantic Mediation Mikhaila Burgess Cardiff University.
©Ian Sommerville 2006Software Engineering, 8th edition. Chapter 12 Slide 1 Distributed Systems Architectures.
1 Dr. Markus Hillenbrand, ICSY Lab, University of Kaiserslautern, Germany A Generic Database Web Service for the Venice Service Grid Michael Koch, Markus.
An Introduction to Software Architecture
Software Processes lecture 8. Topics covered Software process models Process iteration Process activities The Rational Unified Process Computer-aided.
Service-enabling Legacy Applications for the GENIE Project Sofia Panagiotidi, Jeremy Cohen, John Darlington, Marko Krznarić and Eleftheria Katsiri.
DAME: Distributed Engine Health Monitoring on the Grid
1 Technologies for distributed systems Andrew Jones School of Computer Science Cardiff University.
SOFTWARE DESIGN AND ARCHITECTURE LECTURE 09. Review Introduction to architectural styles Distributed architectures – Client Server Architecture – Multi-tier.
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
20 October 2006Workflow Optimization in Distributed Environments Dynamic Workflow Management Using Performance Data David W. Walker, Yan Huang, Omer F.
IPlant cyberifrastructure to support ecological modeling Presented at the Species Distribution Modeling Group at the American Museum of Natural History.
The ACGT Workflow Editing & Enactment Environment Giorgos Zacharioudakis Institute of Computer Science, Foundation for Research & Technology – Hellas (ICS-FORTH)
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
© DATAMAT S.p.A. – Giuseppe Avellino, Stefano Beco, Barbara Cantalupo, Andrea Cavallini A Semantic Workflow Authoring Tool for Programming Grids.
11 CORE Architecture Mauro Bruno, Monica Scannapieco, Carlo Vaccari, Giulia Vaste Antonino Virgillito, Diego Zardetto (Istat)
Interoperability between Scientific Workflows Ahmed Alqaoud, Ian Taylor, and Andrew Jones Cardiff University 10/09/2008.
Andrew Jones Interop. in changing infrastructure BiodiversityWorld GRID Workshop NeSC, Edinburgh – 30 June and 1 July Design Decisions Interoperability.
Grid Computing Research Lab SUNY Binghamton 1 XCAT-C++: A High Performance Distributed CCA Framework Madhu Govindaraju.
Managing and communicating uncertainty in geospatial web service workflows Richard Jones, Dan Cornford, Lucy Bastin, Matthew Williams Computer Science,
Grid Execution Management for Legacy Code Applications Grid Enabling Legacy Code Applications Tamas Kiss Centre for Parallel.
Interoperability Grids, Clouds and Collaboratories Ruth Pordes Executive Director Open Science Grid, Fermilab.
DAME: A Distributed Diagnostics Environment for Maintenance Duncan Russell University of Leeds.
Grids - the near future Mark Hayes NIEeS Summer School 2003.
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
Global Biodiversity Information Facility GLOBAL BIODIVERSITY INFORMATION FACILITY Meredith A. Lane CODATA/ERPANET Workshop: Scientific Data Selection &
Infrastructures for Social Simulation Rob Procter National e-Infrastructure for Social Simulation ISGC 2010 Social Simulation Tutorial.
GRID Overview Internet2 Member Meeting Spring 2003 Sandra Redman Information Technology and Systems Center and Information Technology Research Center National.
Grid Execution Management for Legacy Code Applications Grid Enabling Legacy Applications.
11 CORE Architecture Mauro Bruno, Monica Scannapieco, Carlo Vaccari, Giulia Vaste Antonino Virgillito, Diego Zardetto (Istat)
At Reading Frank Bisby, Alistair Culham, Neil Caithness, Tim Sutton, Peter Brewer, Chris Yesson At Cardiff Alec Gray, Andrew Jones, Nick.
Cooperative experiments in VL-e: from scientific workflows to knowledge sharing Z.Zhao (1) V. Guevara( 1) A. Wibisono(1) A. Belloum(1) M. Bubak(1,2) B.
Registries, ebXML and Web Services in short. Registry A mechanism for allowing users to announce, or discover, the availability and state of a resource:
16/11/ Semantic Web Services Language Requirements Presenter: Emilia Cimpian
1October 2006Richard White, Andrew Jones & Frank Bisby - TDWG - St Louis Federating taxonomic databases: progress with the Catalogue of Life Dynamic Checklist.
Data Integration in Bioinformatics Using OGSA-DAI The BioDA Project Shirley Crompton, Brian Matthews (CCLRC) Alex Gray, Andrew Jones, Richard White (Cardiff.
Slide 1 Service-centric Software Engineering. Slide 2 Objectives To explain the notion of a reusable service, based on web service standards, that provides.
©Ian Sommerville 2006Software Engineering, 8th edition. Chapter 4 Slide 1 Software Processes.
Example projects using metadata and thesauri: the Biodiversity World Project Richard White Cardiff University, UK
The University of Reading Frank Bisby, Alistair Culham, Neil Caithness, Tim Sutton, Peter Brewer, Chris Yesson Cardiff University Alec Gray, Andrew Jones,
Copyright 2007, Information Builders. Slide 1 iWay Web Services and WebFOCUS Consumption Michael Florkowski Information Builders.
Holding slide prior to starting show. Lessons Learned from the GECEM Portal David Walker Cardiff University
© Geodise Project, University of Southampton, Workflow Support for Advanced Grid-Enabled Computing Fenglian Xu *, M.
ETICS An Environment for Distributed Software Development in Aerospace Applications SpaceTransfer09 Hannover Messe, April 2009.
ACGT Architecture and Grid Infrastructure Juliusz Pukacki ‏ EGEE Conference Budapest, 4 October 2007.
InSilicoLab – Grid Environment for Supporting Numerical Experiments in Chemistry Joanna Kocot, Daniel Harężlak, Klemens Noga, Mariusz Sterzel, Tomasz Szepieniec.
BDWorld Alex Gray, Andrew Jones, Frank Bisby, Alastair Culham, Alex Gray, Nick Fiddian, Andrew Jones, Malcolm Scoble, Paul Valdes, Richard White, Peter.
The Open Grid Service Architecture (OGSA) Standard for Grid Computing
Grid Portal Services IeSE (the Integrated e-Science Environment)
Service-centric Software Engineering
Large Scale Distributed Computing
Grid Systems: What do we need from web service standards?
Presentation transcript:

Designing and Building a Biodiversity Grid: the Biodiversity World Project A talk in the workshop “e-Research - Meeting New Research Challenges” at the Welsh e-Science Centre, 14 February 2006 Richard White, Andrew Jones, Alex Gray, Jaspreet S. Pahwa, Mikhaila Burgess Cardiff University, UK

2 The Biodiversity World project 3 year e-Science project funded by the UK BBSRC research council, Universities of Cardiff, Reading and Southampton The Natural History Museum (London)

3 Some difficult biodiversity questions How should conservation efforts be concentrated? (example of Biodiversity Richness & Conservation Evaluation) Where might a species be expected to occur, under present or predicted climatic conditions? (example of Bioclimatic & Ecological Niche Modelling) How can geographical information assist in inferring possible evolutionary pathways? (example of Phylogenetic Analysis & Palaeoclimate Modelling)

4 Point data from various herbaria

5 GARP prediction of climatic suitability

6 Distribution data from ILDIS database

7 Types of resource used in these biodiversity studies Data sources: Catalogue of Life (names of species: Species 2000, GBIF) Biodiversity data Descriptive data Distribution of specimens and observations Geographical data Boundaries of geographical & political units Climate surfaces Genetic sequences Analytic tools: Biodiversity richness assessment – various metrics Bioclimatic modelling – bioclimatic ‘envelope’ generation Phylogenetic analysis (generation of phylogenetic trees)

8 Some challenges … Finding the resources Knowing how to use these heterogeneous resources Originally constructed for various reasons Often little thought was given to standards or interoperability

9 The Biodiversity World vision (1) Problem Solving Environment for Biodiversity studies – Heterogeneous diverse resources Facilitating integration of both legacy and newly-developed resources Flexible workflows Main challenges centre around interoperability, resource discovery, metadata, etc; High-performance computing secondary (though relevant)

10 The Biodiversity World vision (2) Distinctive features: a biodiversity informatics Grid interoperability with heterogeneous data, complex in structure resilience to infrastructure change & interoperation with other Grids interactive collaboration a secondary concern We want to automate tasks such as the previous example analysis, as shown later

Our architecture …

12 Biodiversity World as a flexible PSE Species 2000 & ITIS Catalogue of Life Analytic tool Thematic data source BDW Grid Ontology:  Metadata  Resource & analytic tool descriptions  Maintenance tools Wrapper Abiotic data source User Local tools Problem Solving Environment user interface (Triana) Problem Solving Environment:  Resource discovery  Support for workflows Wrapper Analytic tool GSD

13 Role of metadata Metadata is needed to enable discovery of resources and to indicate how they are to be used Properties to help locate appropriate resources Check interoperability, suggest transformations Provenance of data sets Log of work-flows executed

14 Biodiversity World Wrappers A mechanism to provide consistent interface to resources using a standard resource invocation mechanism Operations on remote resources are invoked via the invokeOperation(resource, operation, dataCollection) method implemented by all the wrappers Wraps various kinds of resources and analytic tools Insulate the core BDWorld System from heterogeneous resources Retain flexibility to use various operations supported by each resource Solves the problem of interoperability between client and heterogeneous resources Wrappers give consistent form to data retrieved from heterogeneous resources by encapsulating them into a set of standard BDWorld data types Can be deployed in Web Services/Grid environment

15 Interoperability in Biodiversity World Have defined Biodiversity World – Grid Interface (BGI) addressing the need to: wrap resources to hide heterogeneity insulate from infrastructure change use metadata to cope with remaining heterogeneity

16 BDWorld-Grid Interface (BGI) Layer Provides standard mechanisms for invoking operations on heterogeneous resources provides an integrated mechanism for seamless access to BDW resources via resource wrappers Uses XML/SOAP messaging system for invoking operations on resource wrappers Potentially interoperable with other e-Science projects Isolates users from Grid/Web Service complexities Isolates resources/ resource wrapper implementation to enable use of web services/grid technologies as part of a separate layer A Helper class is provided to the user (or software such as Triana) for using the BGI layer

17 BDW Wrapper Remote Database Resource ToolJava Resource Unit A Unit B Unit C Triana Workflow Units BDW Datatypes Data Handling Tools BGI Communications Layer BGI Helper Tool

18 Biodiversity World architecture BiodiversityWorld-GRID Interface (BGI) The GRID Workflow enactment engine Wrapped resources Native Biodiversity- World Resources Metadata repository Presentation BGI API User interface BiodiversityWorld-GRID Interface (BGI) The GRID Workflow enactment engine Wrapped resources Native Biodiversity World Resources Metadata repository Presentation BGI API User interface

User interaction with BDWorld …

20 Example work-flow (Climate-space Modelling) Projection Prediction Species 2000 Localities Climate Space Model Base Maps Climate Submit scientific name; retrieve accepted name & synonyms for species Retrieve distribution data for species of interest Present or recent climate surfaces Model of climatic conditions where species is currently found Possibly different climate surfaces (e.g. predicted climate) World or regional maps Prediction of suitable regions for species of interest Projection of predicted distribution on to base map

21 BDWorld / Triana in operation: Workflow creation (design, editing)

22 Triana screen-shots

23 Triana screen-shots

24 Triana screen-shots

25 Triana screen-shots

26 Triana screen-shots

27 Triana screen-shots

28 BDWorld / Triana in operation: Workflow execution (enactment, run-time)

29 Triana screen-shots

30 Triana screen-shots

31 Triana screen-shots

32 Triana screen-shots

33 Triana screen-shots

34 Current and future work What I have described up to now is more or less what was originally envisaged Now we have some ideas on how to improve our architecture Web Services version being evaluated GT4, WSRF in future

35 BDWorld Web Services Architecture Web Services is a mechanism of enabling distributed computing based on open standards Wrappers are now deployed in a Web Services environment which can be accessed via the BGI Layer with the assistance of a BGI Helper Tool Axis SOAP engine provides the WSDL that exposes wrapper operations to outside world The MetadataAgent provides access to MDR via the BGI Layer

36 Drawbacks of Web Services Each web service needs to be deployed individually Web services are not “stateful” provide mechanisms for invoking remote operations but no provision for other functionality such as resource management, persistence, life cycle management, notification etc.

37 GT4 Key Concepts Based on Open Grid Service Architecture (OGSA) OGSA defines common, standard and open architecture for Grid-based applications Standardises various services common to Grid applications (job management, resource monitoring and discovery, resource management, security services etc) Uses Web Services as underlying technology to enable distributed computing But Web Services are not stateful

38 WSRF – An approach to statefulness WSRF provides the mechanism to keep state information by keeping the Web Service and state information completely separate State information is stored in an entity called a resource (not to be confused with a BDWorld resource) A resource can be identified via its unique key When requiring stateful interaction, a web service can be instructed to use a particular resource The resources can be stored in memory or on secondary storage

39 Where do we go from here? Present system is a proof of concept Limited Biodiversity exemplars only Needs more data resources more functionality additional features –Modelling tools –Virtual organisations

40 Workflows Creating a workflow: Workflows clearly good for capturing complex tasks Good for ‘tweaking’ tasks But is this how users think? If not, we should provide an environment that supports a more exploratory approach too, e.g. User tries out some small subtasks … … and joins results together System records interactions, so re-usable workflows can be composed

41 Other aspects of user interface The drag-and-drop metaphor needs further research into the best ways to support resource discovery resource matching data management (e.g. temporary storage of intermediate results)

42 Complex interactions BGI not well-suited to fine-grained interaction Stand-alone applications difficult to wrap may need, e.g., screen scraping We’re looking at: Less portable ‘by-pass’ mechanisms, e.g. New BGI protocol Existing techniques (in extremis) e.g. VNC Plug-ins for the BDW client External tools (which will always be needed)

43 A dream A desktop environment in which scientists can “drag & drop” data sources, analysis and modelling tools and visualisation interfaces into a desired sequence of operations which can be run automatically BDWorld just about at this stage With additional features, the environment could be made richer, more productive, and support research groups. Essentially a component-based visual programming environment Not just for biodiversity!

44 Extra functionality Enhanced metadata Provenance and data lineage Automatic electronic “lab notebook” Stored workflows Repeatability, reproduceability Re-use with different data, changed parameters Ontologies Resource discovery Usability Dynamic interaction of users with resources

45 Virtual organisations Collaborative working environments Shared and private resources: data, tools Controlled release of data, tools and results Shared experimentation User authorisation / authentication Access control Dynamic Membership Resources

46 The way forward New exemplars in environmental science, bioinformatics and health informatics Links with national and international organisations, resources, VOs “End users” Input Feedback Applied use, driven by scientific priorities …

47 Acknowledgements Thanks to Jaspreet Singh Pahwa for the slides concerning wrappers, BGI, GT4, OGSA &WSRF The Triana Project for the workflow environment Other collaborators at Cardiff University The University of Reading The Natural History Museum (London) Organisations that have co-operated with these research projects, especially Species 2000 ILDIS (International Legume Database and Information Service) Hadley Centre for Climate Prediction and Research BBSRC for BDWorld DTI, EPSRC & EU for related projects