

BioVLAB-Microarray: Microarray Data Analysis in Virtual Environment
Youngik Yang, Jong Youl Choi, Kwangmin Choi, Marlon Pierce, Dennis Gannon, and Sun Kim
School of Informatics, Indiana University

CONTENTS
Introduction
Approach
Related Works
Microarray Technology
System Architecture
Experiments
Conclusion
Demo

INTRODUCTION
Analysis of high-throughput microarray experiments
Performing microarray analysis is a demanding task for biologists and small research labs
Computing infrastructure issue
– Computationally intensive
– Nontrivial to integrate various bioinformatics applications
Exploratory data analysis issue
– Multiple tasks in a single batch
– Repetitive execution

APPROACH
On-demand computing resources
A suite of microarray analysis applications
A reconfigurable GUI workflow composer can alleviate the technical burden
– A well-defined workflow can be used repeatedly
Web portal
A reusable, reconfigurable, high-level workflow execution workbench, powered by computing clouds, for microarray gene expression analyses

RELATED WORKS
Efficient and user-friendly workflow composers and execution engines
– SIBIOS, BioWBI, KDE Bioscience
Distributed and heterogeneous computing resources + workflow system
– Taverna, Triana, Kepler, GNARE, RENCI-Bioportal

MICROARRAY TECHNOLOGY
A subset of genes is expressed in response to environmental changes and the cell's changing needs
Dynamics of cell activity
Measures gene expression levels of hundreds of thousands of genes within a cell
Usage
– Function prediction: guilt by association
– Interaction: co-expression of genes in transcription networks reveals how they interact
– Drug discovery: identify genes related to a certain disease and detect the effectiveness of new drugs
Source:

RESEARCH GOALS
Gene expression analysis
– Search for similar patterns of genes
    Similar expression patterns may reveal the function of a gene with unknown function
– Extraction of differentially expressed genes
    Statistical evaluation
– Clustering
    Protein function prediction
    Genes with similar expression may need to be studied as a group
– Component analysis
    Hidden structure of expression patterns may be revealed
Expression network analysis
– Expose hidden structures
– Protein-protein interaction (PPI) network analysis
    Central issue: key role in understanding how a cellular system works
    Modularity in the structure of a network may reflect higher-level functional organization of cellular components
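
The "search for similar patterns" goal can be illustrated with a short sketch. It assumes Pearson correlation as the similarity measure (the slides only say "correlation", so the choice of coefficient is an assumption), and the gene names and expression values are invented for illustration. Genes are ranked by how closely their profile follows a query gene, which is the idea behind guilt by association.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile has no defined correlation; treated as 0 here
    return cov / sqrt(var_x * var_y)

def most_similar(query, profiles, top=3):
    """Rank all other genes by correlation with the query gene, best first."""
    scores = [
        (name, pearson(profiles[query], profile))
        for name, profile in profiles.items()
        if name != query
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top]

# Hypothetical expression profiles (one value per time point).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # same shape as geneA, different scale
    "geneC": [4.0, 3.0, 2.0, 1.0],  # opposite trend
    "geneD": [1.0, 3.0, 2.0, 4.0],  # roughly similar trend
}
ranked = most_similar("geneA", profiles)
```

Because correlation ignores scale, geneB ranks first even though its absolute values differ from those of geneA.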

MICROARRAY ANALYSIS COMMON TASK
The output of one task can be plugged into another task
The same set of tasks is repeated with small changes of parameters

SYSTEM ARCHITECTURE
Workflow composer and execution engine
Application services
Web portal
[Architecture diagram. Components: Web Portal, Application Services, Workflow Composer & Execution. Labels: Execute, Manage, Data, Create]

WORKFLOW COMPOSER & EXECUTION ENGINE
Introduced in the scientific communities to execute a batch of multiple tasks
Makes repetitive tasks easy to run
Directed acyclic graph
– Node: application to execute
    Starting node: input
    End node: output
– Edge: a flow of data
[Diagram: Input, Task A, Task B, Task C, Output]
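
The directed-acyclic-graph model can be sketched in a few lines. This is not how XBaya or any specific engine is implemented; it is a minimal illustration that runs tasks in topological order (Kahn's algorithm) and passes each task's output along its outgoing edges. The graph shape (Task A and Task B both feeding Task C) and the task bodies are assumptions made for the example.

```python
from collections import deque

def run_workflow(tasks, edges, workflow_input):
    """Run tasks in dependency order; each edge (u, v) feeds u's output into v."""
    preds = {name: [] for name in tasks}
    succs = {name: [] for name in tasks}
    for src, dst in edges:
        preds[dst].append(src)
        succs[src].append(dst)

    pending = {name: len(p) for name, p in preds.items()}
    ready = deque(name for name, count in pending.items() if count == 0)
    outputs = {}
    order = []
    while ready:
        name = ready.popleft()
        # Starting nodes receive the workflow input; all others receive
        # the outputs of their predecessors, in edge order.
        args = [outputs[p] for p in preds[name]] or [workflow_input]
        outputs[name] = tasks[name](*args)
        order.append(name)
        for nxt in succs[name]:
            pending[nxt] -= 1
            if pending[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(tasks):
        raise ValueError("workflow graph contains a cycle")
    return order, outputs

# Hypothetical tasks standing in for real analysis applications.
tasks = {
    "Task A": lambda data: [x * 2 for x in data],
    "Task B": lambda data: [x + 1 for x in data],
    "Task C": lambda a, b: sum(a) + sum(b),
}
edges = [("Task A", "Task C"), ("Task B", "Task C")]
order, outputs = run_workflow(tasks, edges, [1, 2, 3])
```

Re-running the same workflow with different input or parameters only requires another call to `run_workflow`, which is what makes a well-defined workflow reusable.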

XBaya
GUI workflow composer and execution engine
Developed at IU
Drag-and-drop composition from the workbench
Monitors the status of workflow execution
[Screenshot labels: Application Information Panel, Monitor Panel, Workbench Panel, Workflow Composer Panel, Drag-and-drop]

APPLICATION SERVICES
Interoperability among applications can be achieved through application services
Generic Service Toolkit (Gfac)
– Gfac converts a command-line bioinformatics application into a web service
On-demand computing resources
– Amazon Elastic Compute Cloud (EC2)
Remote storage services
– Amazon Simple Storage Service (S3)
– Microsoft Application-Based Storage
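
The idea of turning a command-line application into a callable service can be illustrated without any Gfac-specific API. The sketch below is an assumption-laden stand-in: `wrap_command` is a hypothetical helper, and the wrapped "application" is just a Python one-liner. It shows only the wrapping step (arguments in, captured output out); a real deployment would expose such a function behind a web service interface.

```python
import subprocess
import sys

def wrap_command(argv_prefix):
    """Return a function that runs a command-line tool and returns its stdout."""
    def service(*args):
        result = subprocess.run(
            [*argv_prefix, *map(str, args)],
            capture_output=True,  # collect stdout and stderr
            text=True,            # decode bytes to str
            check=True,           # raise CalledProcessError on a non-zero exit
        )
        return result.stdout.strip()
    return service

# Stand-in "application": a Python one-liner that upper-cases its first argument.
to_upper = wrap_command(
    [sys.executable, "-c", "import sys; print(sys.argv[1].upper())"]
)
reply = to_upper("gds38")
```

Once every tool is wrapped behind the same calling convention, a workflow engine can chain them without knowing how each one is invoked.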

BioVLAB APPLICATION DEVELOPMENT PROCEDURE
Develop a command-line app
Install the app in Amazon EC2
Let the app store any output to Amazon S3 / Microsoft Application-Based Storage
Make a virtual machine image
Register the app by using Gfac
Instantiate EC2 and run the app by using XBaya
[Diagram labels: User, Admin, User (Gfac user manual), Gfac registration form]

WEB PORTAL
Administrator
– Management of registered applications through the Gfac registry portlet
– User management and access control
User
– Access to stored data
Built with Open Grid Computing Environments (OGCE)

ANALYSIS RESOURCES
R: statistical learning
Bioconductor: microarray analysis
Data acquisition: NCBI GEO microarray database
Similar expression pattern: correlation
Differentially expressed genes: limma package
Clustering: K-means, hierarchical clustering, QT clustering, biclustering, self-organizing map (SOM)
Component analysis: principal component analysis (PCA) and independent component analysis (ICA)
Network: Database of Interacting Proteins (DIP), Perl Graph package, and GraphViz
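
Of the clustering methods listed, K-means is the simplest to sketch. The system itself relies on R and Bioconductor packages; the plain-Python version below is only an illustration of Lloyd's algorithm. Taking the first k profiles as initial centroids is an assumption made to keep the example deterministic, and the expression profiles are invented.

```python
def kmeans(points, k, iterations=20):
    """Lloyd's algorithm, using the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical expression profiles: two low-expression and two high-expression genes.
profiles = [
    [0.1, 0.2, 0.1],
    [2.0, 2.1, 1.9],
    [0.0, 0.1, 0.2],
    [2.2, 1.8, 2.0],
]
labels, centroids = kmeans(profiles, k=2)
```

Genes that share a label form one cluster and can then be studied as a group, for example by drawing a heatmap per cluster.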

EXPERIMENT
Data set: GDS38
– Remotely retrieved from the NCBI GEO database
– Time-series gene expression data for observing the cell cycle in the yeast Saccharomyces cerevisiae
– 7680 spots in each of 16 samples
– One sample was taken every 7 minutes as the cells went through the cell cycle
Expression analysis
PPI network analysis

EXPERIMENTS

CONCLUSION
Microarray data analysis in a virtual environment
Coupling of computing clouds and a GUI workflow engine
Effective system design for small research labs

FUTURE WORKS
Integration of more packages and analyses
A system of great flexibility
– Integrate various high-throughput data
    Microarray, mass spectrometry, massively parallel sequencing, etc.
– Integrate various computing resources
    Clouds, grids, and multi-core PCs
– Integrate various public resources
    NCBI, KEGG, PDB, etc.

SCREEN SHOTS

S3 BROWSER

EC2 ACTIVE INSTANCE

WORKFLOW FOR CLUSTERING

INPUT PARAMETERS

WORKFLOW EXECUTION

DATA ACQUISITION

SUBSET EXTRACTION

CLUSTERINGS

WORKFLOW TERMINATION

EXPERIMENT RESULT

DOWNLOAD FILE

HEATMAP FOR K-MEANS CLUSTERING

ACKNOWLEDGEMENT
This work is partially supported by NSF MCB and a MetaCyt Microbial Systems Biology grant from Lilly Foundations.
Extreme Computing Group at IU
– Suresh Marru, Srinath Perera, and Chathura Herath

Thank You