A Grid-based System for Microbial Genome Comparison and Analysis
Anil Wipat, University of Newcastle upon Tyne, UK


Motivation: Genome Comparison
- The past decade has seen the emergence of whole-genome sequencing
- Whole-genome sequences can reveal a great deal about the biology of an organism
- Comparing genomes is one of the most effective ways to exploit genome sequence information
- Establishes the differences and similarities at the genetic level
- Aids biologists in understanding pathogenicity, evolution, ecology, metabolism, etc.

Microbial genome comparison is commonly applied at different levels:
- Nucleotide sequence comparison (whole genome): DNA (nucleotide sequence, ..atcggatcgtacgagcgatc..) against DNA (nucleotide sequence, ..atcccatcgaacgagcgatc..)
- All-against-all amino acid sequence comparisons between proteins: proteins (amino acid sequence, MCSAKMQTR..) against proteins (amino acid sequence, MSAKMPTR..)
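As a rough illustration of nucleotide-level comparison, the sketch below compares the two example DNA fragments from the slide position by position. It is a toy that assumes equal-length, already-aligned sequences. The function names are hypothetical, and the tools named later in the talk (Blastn, MUMmer) perform real alignment rather than this simple count.

```python
def mismatch_positions(a: str, b: str) -> list[int]:
    """Return 0-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which the two sequences agree."""
    if not a:
        raise ValueError("sequences must not be empty")
    return 100.0 * (len(a) - len(mismatch_positions(a, b))) / len(a)


# The two nucleotide fragments shown on the slide.
seq1 = "atcggatcgtacgagcgatc"
seq2 = "atcccatcgaacgagcgatc"

print(mismatch_positions(seq1, seq2))  # positions that differ
print(percent_identity(seq1, seq2))    # share of identical positions
```

The same position-wise idea applies to the amino acid fragments only after alignment, since the two protein examples on the slide differ in length.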

Motivation: Genome Comparison
- The number of complete genome sequences is rapidly increasing as sequencing technology advances
  - e.g. ~200 whole genomes have been sequenced
- Sequence analysis and comparison is becoming more computationally intensive
  - Large-scale genome comparison is already beyond the capability of many laboratories
- How are we going to handle all these genomes?
  - New methods and technologies for genome comparison are required.

Microbase Project Overview
- Aims to create a scalable, Grid-enabled analytical system to support microbial genome comparison.
- Aims to support both the biological and bioinformatics communities.
- Funded by BBSRC Bioinformatics and e-Science & DTI
  - Started April
- Collaboration with microbiologists and industrial partners
  - Providing use cases.

Microbase: Functionality
- A system that utilises Grid resources to automatically perform genome comparisons at nucleotide and protein levels
- An information repository that:
  - maintains and exposes the results of these comparisons to users as a base-level dataset
  - provides canned algorithms for analysis
- A Grid-enabled high-performance environment to execute remote user-specified computations
- Data integration with remote, Grid-enabled databases
  - e.g. Genomic, Metabolic, Protein Interaction, Gene Expression databases, etc.

MicrobaseLite: A Prototype
- The first prototype of the Microbase system
- Automatically performs all-against-all genome comparisons and exposes the resulting datasets
- Provides services for biologists to browse and query genome sequences and comparison results
- Informs the specification of the entire Microbase system and the derivation of use cases
- Implemented using a component-based architecture with Web services interfaces
- Also uses existing Grid technology: the myGrid Notification Service
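The all-against-all step can be sketched as enumerating every pair of genomes for every comparison tool. This is only a minimal illustration under stated assumptions: pairs are treated as unordered, self-comparisons are skipped, the genome names are placeholders, and the tool names are plain labels (nothing is executed). The function names are hypothetical and not part of MicrobaseLite.

```python
from itertools import combinations, product


def all_against_all(genomes, tools):
    """Yield one (tool, genome_a, genome_b) task per unordered genome pair."""
    for (a, b), tool in product(combinations(genomes, 2), tools):
        yield (tool, a, b)


def tasks_for_new_genome(new_genome, existing, tools):
    """Tasks needed when one genome is added: the new genome against each existing one."""
    return [(tool, new_genome, g) for g in existing for tool in tools]


genomes = ["genome_A", "genome_B", "genome_C", "genome_D"]  # placeholder names
tools = ["blastn", "mummer"]                                # labels only

tasks = list(all_against_all(genomes, tools))
print(len(tasks))  # 6 unordered pairs x 2 tools

# Adding a fifth genome only requires comparing it with the four already present.
print(len(tasks_for_new_genome("genome_E", genomes, tools)))
```

With n genomes the number of unordered pairs is n(n-1)/2, so the task count grows quadratically with the size of the genome collection, which is why the talk stresses scalability.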

MicrobaseLite: Datasets
- Microbial genomes, including
  - Bacteria, archaea, eukaryota
  - Held in the GenomePool component
- Results of all-against-all nucleotide sequence comparison
  - Blastn, MUMmer
- Results of all-against-all protein sequence comparison
  - Blastp, Ssearch, Promer
  - Held in the ComparisonPool component
- Object-oriented data model of interspecies genome rearrangements
  - The OGRE module component (current research)

MicrobaseLite: Architecture
[Architecture diagram; component labels only]
- Client side: User Tools, Graphical Viewer, Data Processing, Request Builder, Response Receiver, Client Proxies (Notification Proxy, Web Services Proxy)
- Server side:
  - Microbial Genome Pool: Notification Service (External Notification, Internal Notification), BIOSQL, Genome Loader
  - Genome Comparison Pool: Task Scheduler, Post-processing, DNA Comparison and Protein Comparison Database
  - OGRE Module: Object-oriented Database, Object Model Builder, Query & Execution
  - Interfaces: Web Services, Query

MicrobaseLite: Microbial Genome Pool
- Provides a Web / Grid service based information repository of microbial genomes
  - maintains a database of 170+ microbial genomes
- A Web service implementation of BioJava interfaces
- Uses the myGrid Notification Service to notify registered clients of new genomes
- Available for use now with a prototype API
[Diagram: Microbial Genome Pool (Notification Service with External and Internal Notification, BIOSQL, Genome Loader, Web Service API), serving Clients and the Comparison Pool]
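The notification behaviour described above can be sketched with a small publish/subscribe class. This is an in-memory stand-in under stated assumptions, not the myGrid Notification Service or the MicrobaseLite API: the class `GenomePoolSketch` and its methods are hypothetical names chosen for illustration.

```python
from typing import Callable


class GenomePoolSketch:
    """Toy in-memory genome pool that notifies subscribers of new genomes."""

    def __init__(self) -> None:
        self._genomes: dict[str, str] = {}
        self._subscribers: list[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        """Register a client callback that receives the id of each new genome."""
        self._subscribers.append(callback)

    def add_genome(self, genome_id: str, sequence: str) -> None:
        """Store a genome and notify subscribers only if its id was not seen before."""
        is_new = genome_id not in self._genomes
        self._genomes[genome_id] = sequence
        if is_new:
            for callback in self._subscribers:
                callback(genome_id)

    def get_genome(self, genome_id: str) -> str:
        """Return the stored sequence for a genome id."""
        return self._genomes[genome_id]


pool = GenomePoolSketch()
received: list[str] = []
pool.subscribe(received.append)  # e.g. a comparison component registering its interest

pool.add_genome("genome_A", "atcggatcgtacgagcgatc")
pool.add_genome("genome_A", "atcggatcgtacgagcgatc")  # already known: no second notification
print(received)
```

A subscriber such as the comparison component would then fetch the announced genome with `get_genome` and queue the comparisons it needs.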

MicrobaseLite: Genome Comparison Pool
- Retrieves genomes from the Microbial Genome Pool automatically on notification
- Executes a variety of genome comparison tools: Blast, MUMmer, Promer, MSPcrunch
- Incorporates a Task Scheduler for parallel processing
  - Uses N1 Grid Engine (batch system) to dispatch comparison tasks to run on Linux clusters
- Comparison outputs are processed and stored in a relational database (MySQL).
[Diagram: Genome Comparison Pool (Task Scheduler, Post-processing, Protein & Nucleotide Comparison Database) dispatching through N1 Grid Engine to parallel cluster(s)]
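The scheduling idea (independent comparison tasks dispatched in parallel, with results collected for post-processing) can be sketched as follows. This is an assumption-laden stand-in: a local thread pool from Python's standard library replaces N1 Grid Engine, `run_comparison` is a hypothetical placeholder that does not invoke any real tool, and the genome names are invented.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_comparison(tool: str, genome_a: str, genome_b: str) -> dict:
    """Placeholder for launching a comparison tool; returns a dummy result record."""
    # A real system would invoke the tool and parse its output here.
    return {"tool": tool, "a": genome_a, "b": genome_b, "status": "done"}


def schedule(tasks, max_workers: int = 4) -> list[dict]:
    """Run independent comparison tasks concurrently and collect their results."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(run_comparison, *task) for task in tasks]
        for future in as_completed(futures):
            # The post-processing step would store each record in the database.
            results.append(future.result())
    return results


tasks = [
    ("blastn", "genome_A", "genome_B"),
    ("blastn", "genome_A", "genome_C"),
    ("mummer", "genome_B", "genome_C"),
]
results = schedule(tasks)
print(len(results))
```

Because each pairwise comparison is independent of the others, the work divides cleanly across workers; results arrive in completion order, not submission order.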

Task Scheduler and scalability
[Figure: execution time (minutes) against number of processors. Execution times of all-against-all comparisons with 10 microbial genomes (Blastp, Blastn, MSPcrunch, MUMmer and PROmer)]

MicrobaseLite: User Tools
- Demonstration graphical tools under development
- Genome Browser allows users to view genomes, the comparison results and the results of canned algorithms
- Deployed at the client side, operating via Web services

Vision for the full Microbase System
- Continue to explore scalability issues using MicrobaseLite as a platform
  - Towards seamless scalability
  - Harnessing of remote clusters on demand
- A system for the submission and enactment of remotely conceived code or workflows for user-defined comparative analysis
  - Investigating the integration of Taverna core to enact SCUFL workflows within Microbase

Conclusions
- Microbase aims to exploit Grid resources to provide a scalable system for microbial genome comparison
- MicrobaseLite produced as a prototype and demonstrator application for the biologist/bioinformatician
- Work now underway on the full Microbase, a system to support remotely conceived computations

Acknowledgements
- The Microbase Team:
  - Anil Wipat, Yudong Sun, Matthew Pocock, Keith Flanagan, Pete Lee, and Paul Watson
- The Microbase User Requirements/Use case contributors
- myGrid project (particularly Southampton and EBI)
- The industrial supporters: NonLinear Dynamics, NCIMB, Arrow Therapeutics, Angel Biotech, Complement Genomics, ACS Dobfar, AstraZeneca
- See

Microbial genome comparison is commonly applied at two levels:
- Nucleotide sequence comparison (whole genome): DNA (nucleotide sequence, ..atcggatcgtacgagcgatc..) against DNA (nucleotide sequence, ..atcccatcgaacgagcgatc..)
- All-against-all amino acid sequence comparisons between proteins: proteins (amino acid sequence, MCSAKMQTR..) against proteins (amino acid sequence, MSAKMPTR..)

OGRE: Object-oriented Genome REarrangements Model
- A dataset that captures genomic rearrangements between microorganisms
- Object-oriented (OO) concepts and formalism are being used to classify the results of the nucleotide sequence comparison
  - An ontology and OO conceptual model is being developed to describe chromosomal rearrangements and to define objects that can represent them
  - Algorithms developed to recognise defined rearrangement features in nucleotide sequence comparison data
  - Objects made persistent in an OO database
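To make the idea of representing rearrangements as objects more concrete, here is a minimal sketch. It is not the OGRE model, whose classes are not given in the talk: the `Match` and `Inversion` classes, the coordinate convention (descending query coordinates mean a reverse-strand match) and the example coordinates are all assumptions made for illustration, and a reverse-strand match is treated only as a candidate inversion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One matching segment between a reference and a query genome (1-based coordinates)."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int

    @property
    def is_reversed(self) -> bool:
        # Assumed convention: descending query coordinates mean a reverse-strand match.
        return self.qry_start > self.qry_end


@dataclass(frozen=True)
class Inversion:
    """Hypothetical rearrangement object built from a reverse-strand match."""
    ref_start: int
    ref_end: int


def find_inversions(matches: list[Match]) -> list[Inversion]:
    """Turn every reverse-strand match into a candidate Inversion object."""
    return [Inversion(m.ref_start, m.ref_end) for m in matches if m.is_reversed]


# Invented coordinates: the middle segment matches on the reverse strand.
matches = [
    Match(1, 500, 1, 500),
    Match(501, 900, 1300, 901),
    Match(901, 1300, 501, 900),
]
inversions = find_inversions(matches)
print(inversions)
```

Objects of this kind could then be stored and queried as a dataset, which is the role the slide assigns to the OO database.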

MicrobaseLite: OGRE Module
- Performs object-oriented analysis and storage of genome rearrangements
  - An OO dataset captures genomic rearrangements revealed through nucleotide sequence comparison
  - Made persistent in an OO database
- Provides a Web services interface for external users to query and analyse the OO dataset
[Diagram: OGRE Module (Object-oriented Database, Object Model Builder, Query & Execution), connected to the Comparison Pool and Web Services]