
1 BioVLAB-Microarray: Microarray Data Analysis in Virtual Environment
Youngik Yang, Jong Youl Choi, Kwangmin Choi, Marlon Pierce, Dennis Gannon, and Sun Kim
School of Informatics, Indiana University

2 CONTENTS
Introduction
Approach
Related Works
Microarray technology
System Architecture
Experiments
Conclusion
Demo

3 INTRODUCTION
Analysis of high-throughput microarray experiments
Performing microarray analysis is a demanding task for biologists and small research labs
Computing infrastructure issue
– Computationally intensive
– Nontrivial to integrate various bioinformatics applications
Exploratory data analysis issue
– Multiple tasks in a single batch
– Repetitive execution

4 APPROACH
On-demand computing resources
A suite of microarray analysis applications
A reconfigurable GUI workflow composer can alleviate the technical burden
– A well-defined workflow can be used repeatedly
Web portal
Reusable, reconfigurable, high-level workflow execution workbench powered by computing clouds for microarray gene expression analyses

5 RELATED WORKS
Efficient and user-friendly workflow composers and execution engines
– SIBIOS, BioWBI, KDE Bioscience
Distributed and heterogeneous computing resources + workflow system
– Taverna, Triana, Kepler, GNARE, RENCI-Bioportal

6 MICROARRAY TECHNOLOGY
A subset of genes is expressed in response to environmental changes and the cell's changing needs
Dynamics of cell activity
Measure gene expression levels of hundreds of thousands of genes within a cell
Usage
– Function prediction: guilt by association
– Interaction: co-expression of genes in transcription networks reveals how they interact
– Drug discovery: identify genes related to a certain disease and assess the effectiveness of new drugs
Source: www.liv.ac.uk/lmf/about_microarrays.htm

7 RESEARCH GOALS
Gene expression analysis
– Search for similar patterns of genes
    Similar expression patterns may reveal the function of a gene with unknown function
– Extraction of differentially expressed genes
    Statistical evaluation
– Clustering
    Protein function prediction
    Genes with similar expression may need to be studied as a group
– Component analysis
    Hidden structure of expression patterns may be revealed
Expression network analysis
– Expose hidden structures
– Protein-protein interaction (PPI) network analysis
    Central issue: key role in understanding how a cellular system works
    Modular structure in a network may reflect higher-level functional organization of cellular components
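
Note: the "search for similar patterns" goal can be illustrated with a minimal Python sketch that ranks genes by Pearson correlation against a query gene. This is an illustration only, not the BioVLAB implementation (the slides name R and Bioconductor as the analysis tools); the gene names and expression values are made up, and `pearson` and `most_similar` are hypothetical helper names.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile has no defined correlation; treated as 0 here
    return cov / sqrt(var_x * var_y)

def most_similar(query, profiles, top=3):
    """Rank all other genes by correlation with the query gene's profile."""
    scores = [(pearson(profiles[query], p), g)
              for g, p in profiles.items() if g != query]
    scores.sort(reverse=True)
    return [(g, round(r, 3)) for r, g in scores[:top]]

# Toy expression profiles (made-up values, four time points per gene).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # rises together with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
    "geneD": [1.0, 3.0, 2.0, 4.0],
}
ranking = most_similar("geneA", profiles)
```

Genes at the top of `ranking` are the "guilt by association" candidates: if the query gene has a known function, highly correlated genes of unknown function are plausible candidates for a related role.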

8 MICROARRAY ANALYSIS COMMON TASK
The output of one task can be plugged into another task
The same set of tasks is repeated with small changes to the parameters
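
Note: the two points on this slide (chaining tasks, and rerunning the chain with slightly different parameters) can be sketched in Python as follows. The task names `extract_subset` and `summarize`, the variance threshold, and the data are all hypothetical and chosen only for illustration.

```python
def extract_subset(matrix, min_variance):
    """Task 1: keep genes whose expression variance reaches a threshold."""
    kept = {}
    for gene, values in matrix.items():
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        if variance >= min_variance:
            kept[gene] = values
    return kept

def summarize(matrix):
    """Task 2: consume the output of task 1 and list the surviving genes."""
    return sorted(matrix)

def run_pipeline(matrix, min_variance):
    """Plug the output of one task into the next."""
    return summarize(extract_subset(matrix, min_variance))

# Made-up expression values for three genes.
data = {
    "g1": [1.0, 1.0, 1.0],
    "g2": [0.0, 2.0, 4.0],
    "g3": [1.0, 2.0, 3.0],
}

# Repeat the same set of tasks while changing only one parameter.
results = {t: run_pipeline(data, t) for t in (0.0, 0.5, 2.0)}
```

Each entry of `results` is one full run of the same pipeline, which is the kind of repetition a saved workflow is meant to take off the user's hands.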

9 SYSTEM ARCHITECTURE
Workflow composer and execution engine
Application services
Web portal
[Diagram labels: Web Portal, Application Services, Workflow Composer & Execution; Execute, Manage Data, Create]

10 WORKFLOW COMPOSER & EXECUTION ENGINE
Introduced in the scientific communities to execute a batch of multiple tasks
Makes repetitive tasks easy to run
Directed acyclic graph
– Node: application to execute
    Starting node: input
    End node: output
– Edge: a flow of data
[Diagram labels: Input, Output, Task A, Task B, Task C]
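
Note: the directed-acyclic-graph model described on this slide can be sketched in Python with a topological (Kahn-style) traversal, where each node is a task and each edge carries a task's output to the next task. This is not how XBaya is implemented; `run_workflow` is a hypothetical name, the task bodies are placeholders, and the linear chain Task A -> Task B -> Task C is an assumption, since the slide's diagram layout is not recoverable from the transcript.

```python
from collections import deque

def run_workflow(tasks, edges, workflow_input):
    """Run tasks in dependency order; each task receives its predecessors' outputs."""
    preds = {name: [] for name in tasks}
    succs = {name: [] for name in tasks}
    for src, dst in edges:
        preds[dst].append(src)
        succs[src].append(dst)

    # Number of unfinished predecessors per task.
    pending = {name: len(preds[name]) for name in tasks}
    ready = deque(name for name in tasks if pending[name] == 0)
    outputs, order = {}, []
    while ready:
        name = ready.popleft()
        # Starting nodes (no predecessors) read the workflow input instead.
        inputs = [outputs[p] for p in preds[name]] or [workflow_input]
        outputs[name] = tasks[name](inputs)
        order.append(name)
        for nxt in succs[name]:
            pending[nxt] -= 1
            if pending[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(tasks):
        raise ValueError("workflow graph contains a cycle")
    return order, outputs

# Placeholder tasks; each takes a list of inputs and returns one output.
tasks = {
    "Task A": lambda inputs: [v * 2 for v in inputs[0]],
    "Task B": lambda inputs: [v + 1 for v in inputs[0]],
    "Task C": lambda inputs: sum(inputs[0]),
}
edges = [("Task A", "Task B"), ("Task B", "Task C")]  # assumed linear chain
order, outputs = run_workflow(tasks, edges, [1, 2, 3])
```

The cycle check is what enforces the "acyclic" part: a task that waits on its own downstream output never becomes ready, so the run is rejected instead of hanging.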

11 XBaya
GUI workflow composer and execution engine
Developed at IU
Compose workflows by drag-and-drop from the workbench
Monitor the status of workflow execution
[Screenshot labels: Application Information Panel, Monitor Panel, Workbench Panel, Workflow Composer Panel, Drag-and-drop]

12 APPLICATION SERVICES
Interoperability among applications can be achieved by Application Services
Generic Service Toolkit (Gfac)
– Gfac converts a command-line bioinformatics application into a web service
On-demand computing resources
– Amazon Elastic Compute Cloud (EC2)
Remote storage services
– Amazon Simple Storage Service (S3)
– Microsoft Application-Based Storage

13 BioVLAB APPLICATION DEVELOPMENT PROCEDURE
Develop a command line app.
Install the app. in Amazon EC2
Let the app. store any output to Amazon S3 / Microsoft Application-Based Storage
Make a virtual machine image
Register the app. by using Gfac
Instantiate EC2 and run the app. by using XBaya
[Diagram labels: User, Admin, User; (Gfac user manual); Gfac Registration form]

14 WEB PORTAL
Administrator
– Management of registered applications through the Gfac registry portlet
– User management and access control
User
– Access to stored data
Built with Open Grid Computing Environments (OGCE)

15 ANALYSIS RESOURCES
R: statistical learning
Bioconductor: microarray analysis
Data acquisition: NCBI GEO Microarray DB
Similar expression pattern: correlation
Differentially expressed gene: limma package
Clustering: K-means, hierarchical clustering, QT clustering, biclustering, self-organizing map (SOM)
Component analysis: principal component analysis (PCA) and independent component analysis (ICA)
Network: Database of Interacting Proteins (DIP), Perl Graph package and GraphViz
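
Note: of the clustering methods listed, K-means is the simplest to illustrate. The Python sketch below implements the standard Lloyd iteration (assign each profile to its nearest centroid, then recompute centroids) on made-up data. It is not the R code used in BioVLAB; seeding the centroids from the first k points is a simplification made here so the example is deterministic, whereas practical implementations usually use random or smarter initialization.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; centroids start at the first k points (illustrative choice)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

# Made-up expression profiles: two low-expression and two high-expression genes.
profiles = [
    [0.1, 0.2, 0.1],
    [5.0, 5.1, 4.9],
    [0.0, 0.1, 0.2],
    [5.2, 4.8, 5.0],
]
labels, centroids = kmeans(profiles, k=2)
```

Genes sharing a label form one cluster; in a heatmap such as the one in the screenshots, rows are typically grouped by these labels.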

16 EXPERIMENT
Data set: GDS38
– Remotely retrieved from the NCBI GEO database
– Time-series gene expression data for observing the cell cycle in the yeast Saccharomyces cerevisiae
– 7680 spots in each of 16 samples
– One sample was taken every 7 minutes as the cells went through the cell cycle
Expression analysis
PPI network analysis

17 EXPERIMENTS

18 CONCLUSION
Microarray data analysis in virtual environment
Coupling computing clouds and GUI workflow engine
Effective system design for small research labs

19 FUTURE WORKS
Integration of more packages and analyses
A system of great flexibility
– Integrate various high-throughput data
    Microarray, mass spectrometry, massively parallel sequencing, etc.
– Integrate various computing resources
    Clouds, grid, and multi-core PCs
– Integrate various public resources
    NCBI, KEGG, PDB, etc.

20 SCREEN SHOTS

21 S3 BROWSER

22 EC2 ACTIVE INSTANCE

23 WORKFLOW FOR CLUSTERING

24 INPUT PARAMETERS

25 WORKFLOW EXECUTION

26 DATA ACQUISITION

27 SUBSET EXTRACTION

28 CLUSTERINGS

29 WORKFLOW TERMINATION

30 EXPERIMENT RESULT

31 DOWNLOAD FILE

32 HEATMAP FOR K-MEANS CLUSTERING

33 ACKNOWLEDGEMENT
The work is partially supported by NSF MCB 0731950 and a MetaCyt Microbial Systems Biology grant from Lilly Foundations.
Extreme Computing Group at IU
– Suresh Marru, Srinath Perera, and Chathura Herath

34 Thank You

