1 Using Graph Theory to Analyze Gene Network Coherence José A. Lagares Jesús S. Aguilar Norberto Díaz-Díaz Francisco A. Gómez-Vela

Slides:



Advertisements
Similar presentations
Yinyin Yuan and Chang-Tsun Li Computer Science Department
Advertisements

DREAM4 Puzzle – inferring network structure from microarray data Qiong Cheng.
Probabilistic modelling in computational biology Dirk Husmeier Biomathematics & Statistics Scotland.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Other Classification Techniques 1.Nearest Neighbor Classifiers 2.Support Vector Machines.
Principal Component Analysis (PCA) for Clustering Gene Expression Data K. Y. Yeung and W. L. Ruzzo.
SPARCLE = SPArse ReCovery of Linear combinations of Expression Presented by: Daniel Labenski Seminar in Algorithmic Challenges in Analyzing Big Data in.
1 Efficient Subgraph Search over Large Uncertain Graphs Ye Yuan 1, Guoren Wang 1, Haixun Wang 2, Lei Chen 3 1. Northeastern University, China 2. Microsoft.
Functionality & Speciation in Boolean Networks
A New Biclustering Algorithm for Analyzing Biological Data Prashant Paymal Advisor: Dr. Hesham Ali.
Threshold selection in gene co- expression networks using spectral graph theory techniques Andy D Perkins*,Michael A Langston BMC Bioinformatics 1.
Mutual Information Mathematical Biology Seminar
1 Learning to Detect Objects in Images via a Sparse, Part-Based Representation S. Agarwal, A. Awan and D. Roth IEEE Transactions on Pattern Analysis and.
Data classification based on tolerant rough set reporter: yanan yean.
Biological networks Tutorial 12. Protein-Protein interactions –STRING Protein and genetic interactions –BioGRID Signaling pathways –SPIKE Network visualization.
Simulation and Application on learning gene causal relationships Xin Zhang.
Data Mining Presentation Learning Patterns in the Dynamics of Biological Networks Chang hun You, Lawrence B. Holder, Diane J. Cook.
Feature Selection and Its Application in Genomic Data Analysis March 9, 2004 Lei Yu Arizona State University.
1 Synthesizing High-Frequency Rules from Different Data Sources Xindong Wu and Shichao Zhang IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL.
國立陽明大學生資學程 陳虹瑋. Genetic Algorithm Background Fitness function ……. population selection Cross over mutation Fitness values Random cross over.
Principal Component Analysis (PCA) for Clustering Gene Expression Data K. Y. Yeung and W. L. Ruzzo.
Graph-based consensus clustering for class discovery from gene expression data Zhiwen Yum, Hau-San Wong and Hongqiang Wang Bioinformatics, 2007.
Bioquali : tool for analyzing regulatory networks Carito Guziolowski.
Bayesian integration of biological prior knowledge into the reconstruction of gene regulatory networks Dirk Husmeier Adriano V. Werhli.
Attention Deficit Hyperactivity Disorder (ADHD) Student Classification Using Genetic Algorithm and Artificial Neural Network S. Yenaeng 1, S. Saelee 2.
Alert Correlation for Extracting Attack Strategies Authors: B. Zhu and A. A. Ghorbani Source: IJNS review paper Reporter: Chun-Ta Li ( 李俊達 )
A Simple Method to Extract Fuzzy Rules by Measure of Fuzziness Jieh-Ren Chang Nai-Jian Wang.
Genetic Regulatory Network Inference Russell Schwartz Department of Biological Sciences Carnegie Mellon University.
Reconstructing Gene Networks Presented by Andrew Darling Based on article  “Research Towards Reconstruction of Gene Networks from Expression Data by Supervised.
Focused Matrix Factorization for Audience Selection in Display Advertising BHARGAV KANAGAL, AMR AHMED, SANDEEP PANDEY, VANJA JOSIFOVSKI, LLUIS GARCIA-PUEYO,
Comparison of methods for reconstruction of models for gene expression regulation A.A. Shadrin 1, *, I.N. Kiselev, 1 F.A. Kolpakov 2,1 1 Technological.
Anindya Bhattacharya and Rajat K. De Bioinformatics, 2008.
GA-Based Feature Selection and Parameter Optimization for Support Vector Machine Cheng-Lung Huang, Chieh-Jen Wang Expert Systems with Applications, Volume.
A Clustering Algorithm based on Graph Connectivity Balakrishna Thiagarajan Computer Science and Engineering State University of New York at Buffalo.
HUMAN-MOUSE CONSERVED COEXPRESSION NETWORKS PREDICT CANDIDATE DISEASE GENES Ala U., Piro R., Grassi E., Damasco C., Silengo L., Brunner H., Provero P.
Intel Confidential – Internal Only Co-clustering of biological networks and gene expression data Hanisch et al. This paper appears in: bioinformatics 2002.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Advisor : Dr. Hsu Presenter : Keng-Wei Chang Author: Yehuda.
Semantic Wordfication of Document Collections Presenter: Yingyu Wu.
Problem Limited number of experimental replications. Postgenomic data intrinsically noisy. Poor network reconstruction.
Metabolic Network Inference from Multiple Types of Genomic Data Yoshihiro Yamanishi Centre de Bio-informatique, Ecole des Mines de Paris.
Protein motif extraction with neuro-fuzzy optimization Bill C. H. Chang and Author : Bill C. H. Chang and Saman K. Halgamuge Saman K. Halgamuge Adviser.
Learning Bayesian networks from postgenomic data with an improved structure MCMC sampling scheme Dirk Husmeier Marco Grzegorczyk 1) Biomathematics & Statistics.
By: Amira Djebbari and John Quackenbush BMC Systems Biology 2008, 2: 57 Presented by: Garron Wright April 20, 2009 CSCE 582.
Support Vector Machines and Gene Function Prediction Brown et al PNAS. CS 466 Saurabh Sinha.
Mining Evolutionary Model MEM Rida E. Moustafa And Edward J. Wegman George Mason University Phone:
341- INTRODUCTION TO BIOINFORMATICS Overview of the Course Material 1.
Evaluation of gene-expression clustering via mutual information distance measure Ido Priness, Oded Maimon and Irad Ben-Gal BMC Bioinformatics, 2007.
1 Identifying Differentially Regulated Genes Nirmalya Bandyopadhyay, Manas Somaiya, Sanjay Ranka, and Tamer Kahveci Bioinformatics Lab., CISE Department,
OPTIMAL CONNECTIONS: STRENGTH AND DISTANCE IN VALUED GRAPHS Yang, Song and David Knoke RESEARCH QUESTION: How to identify optimal connections, that is,
Discriminative Frequent Pattern Analysis for Effective Classification By Hong Cheng, Xifeng Yan, Jiawei Han, Chih- Wei Hsu Presented by Mary Biddle.
Reverse engineering of regulatory networks Dirk Husmeier & Adriano Werhli.
Discovering functional interaction patterns in Protein-Protein Interactions Networks   Authors: Mehmet E Turnalp Tolga Can Presented By: Sandeep Kumar.
Nonlinear differential equation model for quantification of transcriptional regulation applied to microarray data of Saccharomyces cerevisiae Vu, T. T.,
1 An Efficient Optimal Leaf Ordering for Hierarchical Clustering in Microarray Gene Expression Data Analysis Jianting Zhang Le Gruenwald School of Computer.
1 Systematic Data Selection to Mine Concept-Drifting Data Streams Wei Fan Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery.
The Minimum Label Spanning Tree Problem: Illustrating the Utility of Genetic Algorithms Yupei Xiong, Univ. of Maryland Bruce Golden, Univ. of Maryland.
Network Partition –Finding modules of the network. Graph Clustering –Partition graphs according to the connectivity. –Nodes within a cluster is highly.
Graph Indexing From managing and mining graph data.
An unsupervised conditional random fields approach for clustering gene expression time series Chang-Tsun Li, Yinyin Yuan and Roland Wilson Bioinformatics,
1 Comparative Study of two Genetic Algorithms Based Task Allocation Models in Distributed Computing System Oğuzhan TAŞ 2005.
Introduction to Machine Learning, its potential usage in network area,
Visualization in Process Mining
1. SELECTION OF THE KEY GENE SET 2. BIOLOGICAL NETWORK SELECTION
Using Graph Theory to Analyze Gene Network Coherence
Tutorial 12 Biological networks.
Introduction to Soft Computing
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
GPX: Interactive Exploration of Time-series Microarray Data
Anastasia Baryshnikova  Cell Systems 
A task of induction to find patterns
All Pairs Shortest Path Examples While the illustrations which follow only show solutions from vertex A (or 1) for simplicity, students should note that.
Presentation transcript:

1 Using Graph Theory to Analyze Gene Network Coherence José A. Lagares Jesús S. Aguilar Norberto Díaz-Díaz Francisco A. Gómez-Vela José A. Sánchez

2 Outlines Introduction Proposed Methodology Experiments Conclusions

3 Outlines Introduction Proposed Methodology Experiments Conclusions

There is a need to generate patterns of expression, and behavioral influences between genes from microarray. GNs arise as a visual and intuitive solution for gene- gene interaction. They are presented as a graph:  Nodes: are made up of genes.  Edges: relationships among these genes. 4 Introduction Gene Network

5 Many GN inference algorithms have been developed as techniques for extracting biological knowledge PPonzoni et al., GGallo et al., They can be broadly classified as (Hecker M, 2009): BBoolean Network IInformation Theory Model BBayesian Networks

6 Introduction Gene Network Validation in Bioinformatics Once the network has been generated, it is very important to assure network reliability in order to illustrate the quality of the generated model. SSynthetic data based validation This approach is normally used to validate new methodologies or algorithms. WWell-Known data based validation The literature prior knowledge is used to validate gene networks.

Introduction Well-Known Biological data based Validation The quality of a GN can be measured by a direct comparison between the obtained GN and prior biological knowledge (Wei and Li, 2007; Zhou and Wong, 2011). However, these approaches are not entirely accurate as they only take direct gene–gene interactions into account for the validation task, leaving aside the weak (indirect) relationships (Poyatos, 2011). 7

8 Outlines Introduction Proposed Methodology Experiments Conclusions

9 Introduction Proposed methodology The main features of our method:  Evaluate the similarities and differences between gene networks and biological database.  Take into account the indirect gene-gene relationships for the validation process.  Using Graph Theory to evaluate with gene networks and obtain different measures.

10 Proposed Methodology B A D C Input Network Biological Database BA E C F Distance Matrices DM IN DM DB DM IN DM DB Coherence Matrix CM CM = |DM IN – DM DB | Floyd Warshall Algorithm CM=|DMi – DMj|

11 Proposed Methodology Floyd-Warshall Algorithm This approach is a graph analysis method that solves the shortest path between nodes. Network ABCEF A02112 B20112 C11021 E11201 F22110 Distance Matrix B F E A C

Proposed Methodology Distance Threshold Distance threshold (δ)  It is used to exclude relationships that lack biological meaning.  This threshold denotes the maximum distance to be considered as relevant in the Distance Matrix generation process.  If the minimum distance between two genes is greater than δ, then no path between the genes will be assumed. 12

ABCEF A02112 B20112 C11021 E11201 F NetworkDistance Matrix δ = 1 Proposed Methodology Distance Threshold B F E A C ABCEF A02112 B20112 C11021 E11201 F22110 ABCEF A0∞11∞ B∞011∞ C110∞1 E11∞01 F∞∞110

14 Proposed Methodology Coherence Matrix (CM) ABCD A0132 B1021 C3201 D2110 ABC A01∞ B101 C∞10 ABCD A01∞2 B1021 C∞201 D2110 ABCEF A02122 B20112 C11011 E21101 F22110 ABCEF A02122 B20112 C11011 E21101 F22110 DM IN DM DB CM=|DMi – DMj|

Coherence Level threshold (θ)  This threshold denotes the maximum coherence level to be considered as relevant in the Coherence Matrix generation process.  It is used to obtain well-Known indices by using the elements of the coherence matrix: Proposed Methodology Obtaining Measures 15 CM i,j |∞-y| ( α ) |v-y|>θ FP FN |∞-∞|( β ) TN |v-y|<=θTP 0< v,y <∞

ABCDE A-1 α 47 B1- β 25 C αβ -18 D421-1 E Coherence Matrix θ = 3 Proposed Methodology ABCDE A-1 α 47 B1- β 25 C αβ -18 D421-1 E7581- ABCDE A-TP α 47 B - β 5 C αβ - 8 D4 - E758 - ABCDE A- α 47 B - β 5 C αβ - 8 D4 - E758 - ABCDE A- α FP BTP- β FP C αβ -TPFP D TP - EFP TP- ABCDE A- α FP BTP- β FP C αβ -TPFP D TP - EFP TP- ABCDE A- FNFP BTP- β FP CFN β -TPFP D TP - EFP TP- ABCDE A- FNFP BTP- β FP CFN β -TPFP D TP - EFP TP- ABCDE A- FNFP BTP-TNTPFP CFNTN-TPFP D TP - EFP TP-

17 Outlines Introduction Proposed Methodology Experiments Conclusions

18 Results Real data experiment Input networks were obtained by applying four inference network techniques on the well-known yeast cell cycle expression data set (Spellman et al., 1998). Soinov et al., Bulashevska et al., Ponzoni (GRNCOP) et al., 2007 Comparison with Well-Known data: BioGrid KEGG SGD YeastNet

19 Results Real data experiment Several studies were carried out using different threshold value combinations:  Distance threshold (δ) and Coherence level threshold (θ) have been modified from one to five, generating 25 different combinations. The results show that the higher δ and θ values, the greater is the noise introduced.  The most representative result, was obtained for δ=4 and θ=1.

20 Results Biogrid KEGG SGD YeastNet Soinov Bulashevska Ponzoni 0,27 0,58 0,31 0,29 0,65 0,34 0,53 0,50 0,82 0, ,42 0,48 0,47 0,45 0,79 0,50 0,69 0,66 0,90 0, Accuracy F-measure

21 Results  These results are consistent with the experiment carried out in Ponzoni et al.,  Ponzoni was successfully compared with Soinov and Bulashevska approaches.

22 Outlines Introduction Proposed Methodology Experiments Conclusions

23 Conclusions A new approach of a gene network validation framework is presented:  The methodology not only takes into account the direct relationships, but also the indirect ones.  Graph theory has been used to perform validation task.

24 Conclusions Experiments with Real Data.  These results are consistent with the experiment carried out in Ponzoni et al.,  Ponzoni was successfully compared with Soinov and Bulashevska approaches.  These behaviours are also found in the obtained results. Ponzoni presents better coherence values than Soinov and Bulashevska in BioGrid, SGD and YeastNet.

25 Future Works The methodology has been improved:  The elements in coherence matrix will be weighted based on the gene-gene relationships distance.  A new measure, based on different databases will be generated. Moreover, a Cytoscape plugin will be implemented.

26 Using Graph Theory to Analyze Gene Network Coherence Thanks for your attention

27 Results - II Biogrid KEGG SGD YeastNet Soinov Bulashevska Ponzoni Gallo 0,27 0,58 0,31 0,29 0,65 0,34 0,53 0,50 0,82 0, ,42 0,48 0,47 0,45 0,79 0,50 0,69 0,66 0,90 0, ,86 0,61 0,73 0,77 0,75 0,47 0,58 0,62 Accuracy F-measure

References 28 Pavlopoulos GA, et al. (2011): Using graph theory to analyze biological networks. BioData Mining, 4:10. Asghar A, et al (2012) Speeding up the Floyd–Warshall algorithm for the cycled shortest path problem. AppliedMathematics Letters 25(1): 1 Bulashevska S and Eils R (2005) Inferring genetic regulatory logic from expression data. Bioinformatics 21(11):2706. Ponzoni I, et al (2007) Inferring adaptive regulationthresh-olds and association rules from gene expressiondata through combinatorial optimization learning.IEEE/ACM Transaction on Computation Biology andBioinformatics 4(4):624. Poyatos JF (2011). The balance of weak and strong interactions in genetic networks. PloS One 6(2):e14598.