Integromics: a grid-enabled platform for integration of advanced bioinformatics tools and data. Luca Corradi, BIO-Lab, DIST University of Genoa

1 Integromics: a grid-enabled platform for integration of advanced bioinformatics tools and data
Luca Corradi, BIO-Lab, DIST University of Genoa

2 Integromics
Cancer research goal: tailor treatment to the molecular profile of an individual patient's tumor.
Microarrays and other 'omic' technologies make it possible to study tens of thousands of genes simultaneously.
The tools and methodologies in use lack standardization and repeatability.
An "integromic" platform is needed to:
– develop integrative ('integromic') analyses of the data
– combine the tools available for genomics
Outcome: better results and higher quality of work.

3 Focus on...
– How to exploit the back-end gLite infrastructure and an HPC environment to integrate bioinformatics tools and data
– How a Grid portal can:
  – integrate heterogeneous tools and data
  – simplify user interaction through customized web interfaces
  – increase usability and efficiency
– Case study: correlating genomics data with clinical data through a combination of processing tools provided by the platform

4 The challenges
– Manage large volumes of bioinformatics data
– Deal with complex issues such as heterogeneous formats, distributed locations, time-consuming tasks and computational needs
– Integrate heterogeneous tools and platforms
– Speed up the analysis process through automated methodologies
– Improve efficiency and quality of work
– Make the system usable and accessible

5 Microarray technology
– Computes the expression values of thousands of genes at the same time
– A microarray is a collection of microscopic DNA spots, each representing a single gene, arrayed on a solid surface by covalent attachment to chemically suitable matrices
– Estimation of the absolute value of gene expression

6 The use case
– Analyse large microarray datasets for breast cancer prognosis assessment
– Run several R/Bioconductor scripts
– Deploy a reusable and reliable service
– Avoid errors and increase repeatability
– Create a processing pipeline in which new algorithms and data analysis techniques can be tested
– Create a set of "atomic" components that can be combined into workflows

7 Data Analysis Tools
– R/Bioconductor: R is a free software environment for statistical computing and graphics; Bioconductor is a collection of R packages specific to the bioinformatics community, with an active user community
– dChip: free software for the analysis and visualization of gene expression data
– Affymetrix Power Tools (APT): cross-platform command-line programs that implement algorithms for analyzing Affymetrix GeneChip arrays
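Microarray analysis with tools like these typically begins with normalization, which makes intensities measured on different arrays comparable. The sketch below illustrates one standard technique, quantile normalization as used in RMA-style preprocessing. It is a swapped-in textbook example, not dChip's own normalization and not an APT or Bioconductor interface; the function name and the toy intensities are made up for illustration, and ties are not averaged.

```python
from statistics import mean


def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (each a list of probe intensities).

    All arrays must have the same length. Afterwards every array shares the
    same distribution: the mean of the k-th smallest values across arrays.
    Ties are resolved by position rather than averaged, to keep this short.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Reference distribution: mean of the k-th smallest value of each array.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(column) for column in zip(*sorted_arrays)]

    normalized = []
    for a in arrays:
        # Probe indices ordered from smallest to largest intensity.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace value by reference at same rank
        normalized.append(out)
    return normalized


# Three toy arrays with four probes each (made-up intensities).
arrays = [[5.0, 2.0, 3.0, 4.0],
          [4.0, 1.0, 4.5, 2.0],
          [3.0, 4.0, 6.0, 8.0]]
for row in quantile_normalize(arrays):
    print(row)
```

Note that quantile normalization needs all arrays at once, so unlike a per-array method it cannot be split into fully independent jobs.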

8 Parallel dChip execution
– Module 1: n jobs, each opening N/n CEL files and normalizing them; each job produces N/n CSV files (one per input file)
– Module 2: m jobs, each opening all N CSV files and computing the gene expression values for a certain group of genes; each job produces one CSV file
– Module 3: one job opening the m expression files; it searches for differentially expressed genes and performs clustering of the results
[Diagram: CEL 1 … CEL N → Mod1 1 … Mod1 n → CSV 1 … CSV N → Mod2 1 … Mod2 m → CSV 1 … CSV m → Mod3]
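The three modules form a scatter/gather pattern: split the N arrays across n normalization jobs, split the genes across m expression jobs, then gather everything in one final job. The following Python sketch reproduces only that data flow, under stated assumptions: arrays are in-memory dicts instead of CEL/CSV files, threads stand in for Grid jobs, and the hypothetical functions `module1`, `module2` and `module3` use trivial placeholder math (scaling each array to mean 1, averaging across arrays) rather than dChip's actual algorithms.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items, k):
    """Split items into k contiguous chunks whose sizes differ by at most one."""
    q, r = divmod(len(items), k)
    chunks, start = [], 0
    for i in range(k):
        end = start + q + (1 if i < r else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def module1(arrays):
    """Placeholder normalization: scale each array so its mean is 1."""
    out = []
    for arr in arrays:
        m = sum(arr.values()) / len(arr)
        out.append({gene: v / m for gene, v in arr.items()})
    return out


def module2(all_arrays, genes):
    """Placeholder expression value for one gene group: mean across all arrays."""
    return {g: sum(a[g] for a in all_arrays) / len(all_arrays) for g in genes}


def module3(partials):
    """Merge the m partial tables (dChip would then test and cluster genes)."""
    merged = {}
    for p in partials:
        merged.update(p)
    return merged


def run_pipeline(arrays, n, m):
    genes = sorted(arrays[0])
    with ThreadPoolExecutor() as pool:
        # Module 1: n jobs, each normalizing about N/n arrays (order is kept).
        chunks = pool.map(module1, split_evenly(arrays, n))
        normalized = [a for chunk in chunks for a in chunk]
        # Module 2: m jobs, each reading all N arrays but only its gene group.
        groups = split_evenly(genes, m)
        partials = list(pool.map(lambda grp: module2(normalized, grp), groups))
    # Module 3: a single job combining the m partial results.
    return module3(partials)


arrays = [
    {"g1": 2.0, "g2": 4.0, "g3": 6.0},
    {"g1": 1.0, "g2": 1.0, "g3": 1.0},
    {"g1": 3.0, "g2": 3.0, "g3": 6.0},
]
print(run_pipeline(arrays, n=2, m=2))
```

Because module 2 needs all N normalized arrays, it can only start after every module-1 job has finished; that barrier is what separates the stages.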

9 Parallel APT execution

10 The service
– Analyze large microarray datasets for breast cancer prognosis assessment
– Concatenate phenodata and expression results
– Mix of custom and R programs
– Automatic analysis and plot creation
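Concatenating phenodata with expression results is essentially a join on the sample identifier. A minimal sketch using only the Python standard library, assuming both tables are CSV text with a shared `sample` column; the function name, column names and values are hypothetical, and the service's actual mix of custom and R programs is not shown here.

```python
import csv
import io


def join_phenodata(pheno_csv, expr_csv, key="sample"):
    """Attach clinical (phenodata) columns to each sample's expression row."""
    pheno = {row[key]: row for row in csv.DictReader(io.StringIO(pheno_csv))}
    joined = []
    for row in csv.DictReader(io.StringIO(expr_csv)):
        clinical = pheno.get(row[key])
        if clinical is None:
            # Skip samples without a clinical record instead of guessing values.
            continue
        joined.append({**clinical, **row})
    return joined


# Hypothetical tables: one clinical variable, two genes, made-up values.
pheno_csv = "sample,relapse\nS1,yes\nS2,no\n"
expr_csv = "sample,gene_A,gene_B\nS1,7.2,5.1\nS2,6.8,9.0\nS3,5.5,4.4\n"
for row in join_phenodata(pheno_csv, expr_csv):
    print(row)
```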

11 The BioMedicalPortal
Based on EnginFrame, an industry-proven, production-grade grid portal (used by public and private, academic and industry customers worldwide)

12 BMPortal Architecture
[Diagram components: Grid users, non-Grid users and client apps; User Web Interface and Web Service Interface; EnginFrame; gLite WLM; clusters (LSF, PBS, LL, etc.); AMGA (Grid and local); GSAF Secure Storage; other Grids (NorduGrid, Globus, SRB, AliEn, etc.); other Grid DBs]
– APIs based on the EnginFrame product from NICE srl
– The data management and secure storage layer is based on the GSAF / Secure Storage APIs

13 BioMedicalPortal services
– User management, authentication and authorization services
– Data management (extended with metadata support on the Grid)
– Job submission (Grid, local or remote cluster) and monitoring
– Support for any programming and scripting language
– Plugin strategy for application integration
– Web services interface
– Workflow management system
– Many software packages and applications already integrated, etc.
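A plugin strategy lets the portal route the same job to different execution targets (Grid, local, or remote cluster) without changing the caller. The sketch below shows the general registry-of-plugins pattern in Python; it is not EnginFrame's plugin API, and the backend names and placeholder submit functions are hypothetical (they return a message instead of actually submitting anything).

```python
from typing import Callable, Dict

# Registry mapping a backend name to a function that "submits" a command.
BACKENDS: Dict[str, Callable[[str], str]] = {}


def backend(name: str):
    """Decorator that registers a submission plugin under the given name."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        BACKENDS[name] = func
        return func
    return register


@backend("grid")
def submit_grid(command: str) -> str:
    # A real plugin would write a JDL file and submit it through a gLite UI.
    return f"grid job queued: {command}"


@backend("local")
def submit_local(command: str) -> str:
    # A real plugin would start the command on the portal host.
    return f"local job queued: {command}"


@backend("cluster")
def submit_cluster(command: str) -> str:
    # A real plugin would hand the command to a batch scheduler such as LSF.
    return f"cluster job queued: {command}"


def submit(command: str, target: str) -> str:
    """Dispatch a command to the plugin registered for the target."""
    try:
        plugin = BACKENDS[target]
    except KeyError:
        raise ValueError(f"no plugin registered for {target!r}") from None
    return plugin(command)


print(submit("Rscript analysis.R", "cluster"))
```

Adding a new execution target then only means registering one more function; the portal code that calls `submit` stays unchanged.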

14 gLite plugin & GWT
– Authentication and authorization using VOMS (client-side applet is coming)
– Job submission and monitoring, result retrieval and visualization
– Preference settings (RB, CE, …)
– Traditional LFC-based data management
– New Google Web Toolkit interfaces for GSAF integration via Java API using VOMS credentials
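On gLite, job submission starts from a JDL (Job Description Language) file. The following sketch renders a minimal JDL with the standard `Type`, `Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox` and optional `Requirements` attributes. The script and file names are hypothetical, argument strings are not escaped, and this is not the code the portal's gLite plugin actually uses.

```python
def make_jdl(executable, arguments="", input_files=(), output_files=(),
             requirements=None):
    """Render a minimal gLite JDL job description as text."""
    def str_list(items):
        # JDL lists are written as {"a", "b"}.
        return "{" + ", ".join(f'"{f}"' for f in items) + "}"

    lines = [
        'Type = "Job";',
        f'Executable = "{executable}";',
        f'Arguments = "{arguments}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
    ]
    if input_files:
        lines.append(f"InputSandbox = {str_list(input_files)};")
    # Always bring back stdout/stderr, plus any requested result files.
    lines.append(f'OutputSandbox = {str_list(["std.out", "std.err", *output_files])};')
    if requirements:
        lines.append(f"Requirements = {requirements};")
    return "\n".join(lines) + "\n"


jdl = make_jdl("run_analysis.sh", arguments="analysis.R",
               input_files=["run_analysis.sh", "analysis.R"],
               output_files=["results.csv"],
               requirements="other.GlueCEPolicyMaxWallClockTime > 720")
print(jdl)
```

Saved as `job.jdl`, such a file could be submitted from a gLite UI with `glite-wms-job-submit -a job.jdl`, where `-a` requests automatic proxy delegation.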

15 Testbed architecture
[Diagram components: users on Windows, Linux, UNIX and Mac; BMPortal (EF Server & Agent, gLite UI); EGEE gLite infrastructure; local or remote cluster (LSF); input files (primary, include)]
1. The user submits and monitors work via a standard web browser.
2. BMPortal checks the input parameters and files, and submits a job to gLite.
3. The RB matches the user requirements with the available resources on the Grid.
4. The job starts.
5. Results are written to the input file directory.
6. Streaming output makes it possible to monitor the progress of the job.
7. The job is done.
8. The user can check the job status, exit results or messages.
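Step 3 is matchmaking: the Resource Broker discards resources that do not satisfy the job's requirements and ranks the rest. A toy Python sketch of that filter-then-rank logic, assuming made-up computing-element attributes and host names; the real broker evaluates JDL `Requirements` and `Rank` expressions against information-system data, which this does not model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ComputingElement:
    name: str
    free_cpus: int
    max_wallclock_min: int
    software: frozenset


def match(ces: List[ComputingElement],
          requirement: Callable[[ComputingElement], bool],
          rank: Callable[[ComputingElement], float]) -> Optional[ComputingElement]:
    """Keep the CEs that satisfy the requirement, then return the best-ranked one."""
    candidates = [ce for ce in ces if requirement(ce)]
    if not candidates:
        return None  # no resource can run the job
    return max(candidates, key=rank)


# Hypothetical resources.
ces = [
    ComputingElement("ce01.example.org", 4, 2880, frozenset({"R"})),
    ComputingElement("ce02.example.org", 32, 720, frozenset({"R"})),
    ComputingElement("ce03.example.org", 64, 2880, frozenset()),
]
best = match(
    ces,
    # Needs R installed and at least 24 hours of wall-clock time.
    requirement=lambda ce: "R" in ce.software and ce.max_wallclock_min >= 1440,
    rank=lambda ce: ce.free_cpus,  # prefer the CE with the most free CPUs
)
print(best.name)
```

Here only `ce01.example.org` passes the requirement, so it is chosen even though other CEs have more free CPUs: requirements filter first, and ranking only orders what remains.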

16 Analysis /1
– EnginFrame Grid portal interface (web access)
– Input data selection (Affymetrix .CEL files, phenodata, gene list)

17 Analysis /2
– Service execution and monitoring
– Users can come back after coffee
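Because analyses run asynchronously, the portal only needs to poll job status until a terminal state is reached while the user is away. A minimal polling sketch, assuming a hypothetical `get_status` callable supplied by the caller; the state names follow gLite's job states (Submitted, Waiting, Ready, Scheduled, Running, Done, Aborted, Cancelled), but the function itself is illustrative and not part of EnginFrame or gLite.

```python
import time
from typing import Callable

# States after which a job will not change any more (for this sketch).
TERMINAL = {"Done", "Aborted", "Cancelled"}


def wait_for(job_id: str, get_status: Callable[[str], str],
             interval_s: float = 60.0, timeout_s: float = 86400.0) -> str:
    """Poll get_status(job_id) until a terminal state or the timeout is reached."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(job_id)
        if status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status} after {timeout_s} s")
        time.sleep(interval_s)


# Fake status source replaying a typical job lifecycle.
states = iter(["Submitted", "Scheduled", "Running", "Running", "Done"])
print(wait_for("job-42", lambda _id: next(states), interval_s=0))
```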

18 Analysis /3
Result visualization in the portal spooler area (text files, images, etc.)

19 Impact
– Addressed to biomedical researchers without specific computational skills
– The collaboration between molecular oncologists and software engineers made it possible to optimize the system without losing flexibility
– Scales up to data sizes beyond the limits of currently available desktop personal computers
– Following the Software as a Service paradigm, users can focus on experimental design rather than infrastructure

20 Atomic services
– Each processing step is an "atomic" service
– Services can be invoked one by one
– Currently, services are composed using EnginFrame portal features and LSF scheduler tools
– But…
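Composing atomic services amounts to running them in an order consistent with their dependencies. The sketch below does this with Python's standard `graphlib.TopologicalSorter` (Python 3.9+); the service names and the shared-context convention are hypothetical stand-ins, not the EnginFrame or LSF mechanisms the slide refers to.

```python
from graphlib import TopologicalSorter


def run_workflow(services, depends_on, ctx):
    """Run atomic services in an order that respects their dependencies.

    services:   name -> callable taking and returning a shared context dict
    depends_on: name -> set of service names that must run first
    """
    for name in TopologicalSorter(depends_on).static_order():
        ctx = services[name](ctx)
    return ctx


def step(name):
    """Build a stand-in atomic service that only records that it ran."""
    def service(ctx):
        ctx.setdefault("log", []).append(name)
        return ctx
    return service


names = ["normalize", "expression", "join_phenodata", "plots"]
services = {n: step(n) for n in names}
depends_on = {
    "expression": {"normalize"},
    "join_phenodata": {"expression"},
    "plots": {"join_phenodata"},
}
print(run_workflow(services, depends_on, {})["log"])
```

Every service must appear in `depends_on`, either as a key or as a dependency of another service; otherwise it is never scheduled.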

21 Current work (1)
– Visual and easy workflow monitoring
– Fully integrated with the EnginFrame job monitoring and data access
– Useful for very long-running workflows
– User-designed "virtual experiments"

22 Current work (2)
Integration of new algorithms (multi-chip quality control, cross-platform data integration, etc.)

23 Current work (3)
Possibility to perform different analyses in parallel
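Independent analyses can run concurrently as long as dependent steps wait for their inputs. A sketch of a dependency-aware parallel runner built on `graphlib.TopologicalSorter.get_ready()`/`done()` and a thread pool; threads stand in for Grid or cluster jobs, and the analysis names and results are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_parallel(tasks, depends_on, max_workers=4):
    """Start each task as soon as its dependencies finish; independent ones overlap."""
    sorter = TopologicalSorter(depends_on)
    for name in tasks:
        sorter.add(name)  # include tasks that have no dependencies
    sorter.prepare()      # raises graphlib.CycleError on circular dependencies
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}
        while sorter.is_active():
            # Submit every task whose dependencies are now satisfied.
            for name in sorter.get_ready():
                running[pool.submit(tasks[name])] = name
            # Block until at least one running task finishes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                results[name] = fut.result()
                sorter.done(name)  # may unlock dependent tasks
    return results


tasks = {
    "dchip_analysis": lambda: "dChip results",
    "apt_analysis": lambda: "APT results",
    "compare": lambda: "comparison report",
}
# The two analyses run concurrently; "compare" starts only after both finish.
results = run_parallel(tasks, {"compare": {"dchip_analysis", "apt_analysis"}})
print(sorted(results))
```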

24 Acknowledgements
Part of this work has been developed within the Italian FIRB project LITBIO (Laboratory for Interdisciplinary Technologies in Bioinformatics). Thanks are due to Ulrich Pfeffer and his functional genomics group at IST (National Institute for Cancer Research) of Genoa, Italy, for their support.

25 Thank you!