www.eu-eela.eu: E-science grid facility for Europe and Latin America. Computational challenges on Grid Computing for workflows applied to Phylogeny. R. Isea.

Presentation transcript:

E-science grid facility for Europe and Latin America
Computational challenges on Grid Computing for workflows applied to Phylogeny
R. Isea (1), E. Montes (2), A. J. Rubio-Montero (2) and R. Mayo (2)
(1) Fundación IDEA (Venezuela)
(2) CIEMAT (Spain)
IWPACBB 2009, Salamanca, June 12th, 2009

Outline
Phylogenetics: a reminder
Challenges in Phylogenetics
  – Computational methods: MrBayes
  – Exploiting Grid technology
MrBayes and Bioinformatics resources on Grid
The PhyloGrid approach
  – General description and objectives
  – Taverna workflow
  – GridSphere portal
  – Future work: GridWay metascheduler
Some results: HPV case study
Summary and conclusions

Phylogenetics: a reminder
Phylogeny: reconstruction of the evolutionary history (evolutionary tree) of organisms
  – Influence and relationship between species
  – Evolution of selected populations
Applications in Life Sciences, Industry, etc.:
  – Know the real history of evolution: Tree of Life
  – Drug discovery
  – Tracing geographical origin, dating the introduction of strains
  – Prediction of gene and protein function
  – Epidemiological studies
[Figure: complete Tree of Life]
[Figure: in July 1837 Darwin drew his first known sketch of an evolutionary tree]

Computational problem: so many trees…
Number of possible labelled topologies with n species or taxa
[Table: Nº of taxa, Nº of rooted trees, Nº of unrooted trees; formulas for rooted and unrooted trees]
Exhaustive enumeration of all possible phylogenies is not computationally feasible
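The table values and formulas of this slide are not preserved in the transcript. As a hedged illustration, the sketch below uses the standard textbook counts for strictly bifurcating labelled trees, (2n-3)!! rooted and (2n-5)!! unrooted topologies for n taxa. Whether the slide used exactly this form is an assumption, and the function names are hypothetical.

```python
def double_factorial(k: int) -> int:
    """Return k * (k - 2) * (k - 4) * ... down to 1 (1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def rooted_trees(n_taxa: int) -> int:
    """Rooted bifurcating labelled topologies: (2n - 3)!! for n >= 2."""
    if n_taxa < 2:
        return 1
    return double_factorial(2 * n_taxa - 3)


def unrooted_trees(n_taxa: int) -> int:
    """Unrooted bifurcating labelled topologies: (2n - 5)!! for n >= 3."""
    if n_taxa < 3:
        return 1
    return double_factorial(2 * n_taxa - 5)


if __name__ == "__main__":
    # Print a small table to show how quickly the search space grows.
    for n in (3, 4, 5, 10, 20, 50):
        print(n, rooted_trees(n), unrooted_trees(n))
```

Already at 10 taxa there are 2,027,025 unrooted and 34,459,425 rooted topologies, which is why exhaustive enumeration is ruled out for realistic data sets.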

Computational methods
Phenetics: no evolutionary model
  – Distance-matrix based methods (Neighbour-Joining)
Cladistics:
  – Maximum Parsimony (not statistically consistent)
  – Maximum Likelihood
  – Bayesian inference (Markov Chain Monte Carlo): simulation techniques for approximating the posterior probability distribution of trees
MrBayes (
  – Sequential and parallel implementations (MPI enabled)
  – High CPU and memory consumption:
    – 50 taxa: simulation of generations ~ 50 hours on a P4 2.8 GHz
    – 2900 sequences of HIV-1: a computational challenge
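To make the MCMC idea concrete, here is a toy Metropolis sampler. It is only a sketch under strong assumptions: the "trees" are three abstract states with made-up unnormalized posterior weights, and the proposal is uniform over states. MrBayes itself uses Metropolis-coupled MCMC with tree-rearrangement proposals over a vastly larger space, so this shows the acceptance rule and nothing more.

```python
import random
from collections import Counter


def metropolis(weights, n_steps, seed=0):
    """Sample state indices with probability proportional to `weights`.

    `weights` are unnormalized posterior values (hypothetical numbers here).
    Returns the fraction of steps spent in each state.
    """
    rng = random.Random(seed)
    n_states = len(weights)
    state = rng.randrange(n_states)
    counts = Counter()
    for _ in range(n_steps):
        # Symmetric proposal: pick any state uniformly at random.
        proposal = rng.randrange(n_states)
        # Metropolis rule: always accept uphill moves, sometimes downhill ones.
        accept_prob = min(1.0, weights[proposal] / weights[state])
        if rng.random() < accept_prob:
            state = proposal
        counts[state] += 1
    return [counts[i] / n_steps for i in range(n_states)]


if __name__ == "__main__":
    # Three candidate topologies with posterior weights in ratio 1 : 2 : 7.
    print(metropolis([1.0, 2.0, 7.0], n_steps=200_000, seed=1))
```

The visit frequencies approach the normalized weights (0.1, 0.2, 0.7) without the normalizing constant ever being computed, which is the property that makes the approach usable on tree space.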

Challenges for Bioinformatics
Still a computational problem
  – Part of the scientific community: inefficient local facilities
  – Rise in provision of HPC facilities: additional skills required
A different approach to access computing infrastructures irrespective of their location: Grid Computing

Why Grid Computing?
Grids represent a powerful new tool for e-Science
  – Provide seamless sharing of computing and storage resources
  – Enable the creation of scalable VOs: Biomed VO
  – Service Grids (EGEE, EELA) and Opportunistic Grids
Benefit for applications demanding non-trivial computing capabilities
Local and remote computing and storage facilities

Bioinformatics Grid resources
Wide range of Bioinformatics resources through Web interfaces:
  – Public database projects (genomes, proteins, etc.):
    – EMBL-EBI (UK), NCBI (USA), DDBJ and PDBJ (Japan), etc.
  – Web services for Bioinformatics toolkits:
    – EBI web services, NCBI Entrez Utils, DDBJ, BioMoby services
  – Bioinformatics Web services index/registry servers:
    – EMBRACE service registry (BioCatalogue), BioMoby Central Registry
Grid-enabled software packages:
  – EELA-2: grEMBOSS (UNAM)
Grid portals to mask applications
  – Genius, GridSphere
Grid infrastructures & VOs
  – EGEE related: Biomed, GENE, EELA-prod VOs
  – myGrid, caBIG, TeraGrid

How to access MrBayes on Grid
Simply sending a standard job to a site
  – Software must be preinstalled at the sites
  – Successfully tested in several projects
    – National Grid Service (UK)
    – FIRB LIBI “International Laboratory for Bioinformatics” project (Italy)
    – BioinfoGRID project
    – EELA: MPI version installed and tested at the EELA-CIEMAT site
  – Supported by EELA-2/EGEE sites
Grid bureaucracy: certificates, VOs, etc.
  – Biologists are usually not advanced grid users
Need for friendly interfaces to Grid facilities
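For readers unfamiliar with what "a standard job" looks like, the sketch below generates a minimal gLite-style JDL description for an MPI MrBayes run. The attribute names (JobType, NodeNumber, Executable, Arguments, StdOutput, StdError, InputSandbox, OutputSandbox) are standard JDL attributes, but everything else is an assumption for illustration: the executable name `mb`, the file names, and the absence of the MPI wrapper script and site requirements that a real submission usually needs.

```python
def build_jdl(nexus_file: str, n_nodes: int = 4) -> str:
    """Return a minimal JDL text for an MPI job (illustrative only).

    `nexus_file` is the MrBayes input shipped in the input sandbox;
    output file names are hypothetical.
    """
    lines = [
        'JobType = "MPICH";',
        f"NodeNumber = {n_nodes};",
        'Executable = "mb";',  # assumed to be preinstalled at the site
        f'Arguments = "{nexus_file}";',
        'StdOutput = "mrbayes.out";',
        'StdError = "mrbayes.err";',
        f'InputSandbox = {{"{nexus_file}"}};',
        'OutputSandbox = {"mrbayes.out", "mrbayes.err"};',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_jdl("hpv_l1.nex", n_nodes=8))
```

On a gLite User Interface such a file would typically be submitted with `glite-wms-job-submit -a job.jdl`, after the user has obtained a valid proxy certificate for their VO. This is the "Grid bureaucracy" the slide refers to.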

PhyloGrid aim
Offer the scientific community an easy interface for calculating phylogenies on the Grid without requiring the user to know the computational procedure:
  – Based on the MPI-enabled version of MrBayes
    – By means of a Taverna workflow
  – Takes advantage of the computational power of current Grid infrastructures
The use of Taverna workflows:
  – Allows multiple database selection
  – Extendable with access to complementary tools (ClustalW-MPI) or other workflows (myExperiment repository)

PhyloGrid architecture
[Architecture diagram. Components: Portal Certificate; GridSphere Portal + WF Enactor/Engine; gLite UI + Submission WS; gLite Grid (WMS, LFC Catalog, SE, CE, WNs). Link labels: HTTPS, SOAP, Grid protocols]

Taverna Workflow Management System
A bioinformatician can easily implement Grid workflows without Grid skills
Public workflow repository (myExperiment)
Several plugins to use WS
  – myGrid, caBIG, GridSAM, BioMoby
  – Many public databases
  – GT4 services and gRavi developer framework
Many tools/plugins
  – Manipulating files, format converters, local and remote execution, visualization applets, tools for accessing WS

PhyloGrid Workflow for MrBayes
  – Input parameters received from the GridSphere portal
  – ALN/ClustalW, PHYLIP, MSA to NEXUS format
  – Builds NEXUS file for MrBayes
  – Creates JDL file
  – Job submission
  – Nested workflow checks Grid job execution
  – Get output from SE
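The format-conversion and NEXUS-building steps can be sketched as follows. This is not the PhyloGrid implementation: it assumes an already aligned FASTA input (the slide lists ALN/ClustalW, PHYLIP and MSA), the function names are hypothetical, and the MrBayes settings (`nst=6`, `rates=invgamma`, `samplefreq=100`) are arbitrary example values rather than the ones used in the study.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            # Keep only the first word of the header as the taxon name.
            name, chunks = (header[0] if header else "unnamed"), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_nexus(records, ngen=100000):
    """Build a NEXUS file (data block plus MrBayes block) as a string."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("expected at least one sequence, all of equal length")
    nchar = lengths.pop()
    width = max(len(name) for name, _ in records)
    lines = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={len(records)} nchar={nchar};",
        "  format datatype=dna missing=? gap=-;",
        "  matrix",
    ]
    lines += [f"  {name.ljust(width)}  {seq}" for name, seq in records]
    lines += [
        "  ;",
        "end;",
        "begin mrbayes;",
        "  set autoclose=yes nowarn=yes;",  # run unattended in batch mode
        "  lset nst=6 rates=invgamma;",     # example substitution model
        f"  mcmc ngen={ngen} samplefreq=100;",
        "  sumt;",
        "end;",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    fasta = ">HPV16 L1\nATGC-A\n>HPV18\nATGCTA\n"
    print(to_nexus(parse_fasta(fasta), ngen=5000))
```

Embedding the `mrbayes` block in the same file lets the Grid job run non-interactively: the executable only needs the NEXUS file name as its argument.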

GridSphere portal
PhyloGrid web portal built on top of the GridSphere portal framework (
  – A Grid portal improves usability of Grids
    – Hiding the complexity of the technology involved
  – A Grid portal improves utilization of Grids
    – Providing an appealing, user-friendly Web interface
    – Enforcing Grid utilization policies: PKI security, etc.
Cohesive Grid portals
[Figure: snapshot of the virtual work area of the PhyloGrid Portal with some results]

Future work: GridWay
The JDL job approach
  – Hard to handle job errors within a Taverna workflow
  – gLite plugin for Taverna is under development
    – Taverna must be installed in a UI, or
    – Use remote execution to a UI (Taverna remote workflow enactor)
GridWay metascheduler
  – Characteristics
    – Fully compatible with gLite-based Grids (EELA-2, EGEE)
    – Better resource selection based on internal statistics
    – Automatic migration and rescheduling of failed jobs
    – Checkpointing management for long-running tasks
  – Taverna binding implementation:
    – WS GRAM interface deployed over GridWay
    – By means of GT4 plugins or by directly implementing a JSDL plugin
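The error handling that is hard to express inside the workflow, and that a metascheduler automates, amounts to a poll-and-resubmit loop. The sketch below is a generic illustration, not GridWay's or PhyloGrid's code: `submit` and `get_status` are hypothetical callables wrapping whatever middleware is in use, and the status strings are invented for the example.

```python
import time


def run_with_resubmission(submit, get_status, max_attempts=3, poll_interval=0.0):
    """Submit a job, poll it, and resubmit on failure.

    submit()           -> job identifier (hypothetical callable)
    get_status(job_id) -> "RUNNING", "DONE" or "FAILED" (assumed states)
    Returns (job_id, attempt_number) of the successful run.
    """
    for attempt in range(1, max_attempts + 1):
        job_id = submit()
        while True:
            status = get_status(job_id)
            if status == "DONE":
                return job_id, attempt
            if status == "FAILED":
                break  # stop polling this job and resubmit
            time.sleep(poll_interval)
    raise RuntimeError(f"job failed after {max_attempts} attempts")


if __name__ == "__main__":
    # Simulated middleware: the first job fails, the second one succeeds.
    ids = iter([1, 2])
    statuses = {1: iter(["RUNNING", "FAILED"]), 2: iter(["RUNNING", "DONE"])}
    print(run_with_resubmission(lambda: next(ids), lambda j: next(statuses[j])))
```

A real metascheduler adds what this loop lacks: choosing a different resource for the retry and restarting long runs from a checkpoint instead of from scratch.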

HPV case study with PhyloGrid
HPV is a recognized underlying factor in cervical cancer:
  – 90% of cases show infection by some HPV strain
Complete HPV nucleotide sequences are about 8000 bases long:
  – E1, E2, E4-E7 early expression genes and L1, L2 late expression genes
  – HPV classification according to L1 variability (> 100 types)
  – Two different categories with respect to oncogenic potential
Study: check whether this categorization really fits the evolutionary history of HPV
  – 121 HPV sequences
  – Molecular phylogenetic calculations for the L1, L2 and E7 genes

Results obtained with PhyloGrid
Molecular phylogeny of HPV in oncogenes from L1, L2, E7
121 HPV nucleotide sequences of L1 (the major capsid gene)
[Figure: phylogenetic tree for L1. Broader lines mark differences between this tree and the tree derived from the L2 gene]
Topology similarity score of 85% between L1 and L2
Conflict with the HPV classification based on variability of the L1 gene

Results obtained with PhyloGrid (II)
121 HPV nucleotide sequences of the L1 late expression gene
[Figure: phylogenetic tree for L1. Broader lines mark differences between this tree and the tree derived from the E7 gene]

Results obtained with PhyloGrid (III)
121 HPV nucleotide sequences of the L2 late expression gene
[Figure: phylogenetic tree for L2. Broader lines mark differences between this tree and the tree derived from the E7 gene]

Summary and conclusions
PhyloGrid is a tool for phylogenetic studies on the Grid by means of MPI-enabled MrBayes:
  – Friendly interface (GridSphere portal): no computational or grid skills required to perform calculations
  – Automation of tasks: Taverna workflow
PhyloGrid takes advantage of the computational power of current Grid infrastructures
  – Allowing phylogenetic analysis on a large scale
  – Reducing the technological divide that part of the scientific community faces in accessing computational platforms such as the Grid

Thanks for your attention. Questions?

Contact
R. Isea (1): raul.isea at gmail.com
E. Montes (2): esther.montes at ciemat.es
A. J. Rubio-Montero (2): antonio.rubio at ciemat.es
R. Mayo (2): rafael.mayo at ciemat.es
(1) Fundación IDEA (Venezuela)
(2) CIEMAT (Spain)