Using Graph Theory to Analyze Gene Network Coherence

Slides:



Advertisements
Similar presentations
The Robert Gordon University School of Engineering Dr. Mohamed Amish
Advertisements

DREAM4 Puzzle – inferring network structure from microarray data Qiong Cheng.
Control Case Common Always active
Principal Component Analysis (PCA) for Clustering Gene Expression Data K. Y. Yeung and W. L. Ruzzo.
SPARCLE = SPArse ReCovery of Linear combinations of Expression Presented by: Daniel Labenski Seminar in Algorithmic Challenges in Analyzing Big Data in.
A New Biclustering Algorithm for Analyzing Biological Data Prashant Paymal Advisor: Dr. Hesham Ali.
1 Learning to Detect Objects in Images via a Sparse, Part-Based Representation S. Agarwal, A. Awan and D. Roth IEEE Transactions on Pattern Analysis and.
Data classification based on tolerant rough set reporter: yanan yean.
Integrated analysis of regulatory and metabolic networks reveals novel regulatory mechanisms in Saccharomyces cerevisiae Speaker: Zhu YANG 6 th step, 2006.
Data Mining By Andrie Suherman. Agenda Introduction Major Elements Steps/ Processes Tools used for data mining Advantages and Disadvantages.
Principal Component Analysis (PCA) for Clustering Gene Expression Data K. Y. Yeung and W. L. Ruzzo.
Graph-based consensus clustering for class discovery from gene expression data Zhiwen Yum, Hau-San Wong and Hongqiang Wang Bioinformatics, 2007.
Attention Deficit Hyperactivity Disorder (ADHD) Student Classification Using Genetic Algorithm and Artificial Neural Network S. Yenaeng 1, S. Saelee 2.
Alert Correlation for Extracting Attack Strategies Authors: B. Zhu and A. A. Ghorbani Source: IJNS review paper Reporter: Chun-Ta Li ( 李俊達 )
Genetic Regulatory Network Inference Russell Schwartz Department of Biological Sciences Carnegie Mellon University.
Focused Matrix Factorization for Audience Selection in Display Advertising BHARGAV KANAGAL, AMR AHMED, SANDEEP PANDEY, VANJA JOSIFOVSKI, LLUIS GARCIA-PUEYO,
Anindya Bhattacharya and Rajat K. De Bioinformatics, 2008.
A Clustering Algorithm based on Graph Connectivity Balakrishna Thiagarajan Computer Science and Engineering State University of New York at Buffalo.
Reconstructing gene networks Analysing the properties of gene networks Gene Networks Using gene expression data to reconstruct gene networks.
HUMAN-MOUSE CONSERVED COEXPRESSION NETWORKS PREDICT CANDIDATE DISEASE GENES Ala U., Piro R., Grassi E., Damasco C., Silengo L., Brunner H., Provero P.
Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.
Intel Confidential – Internal Only Co-clustering of biological networks and gene expression data Hanisch et al. This paper appears in: bioinformatics 2002.
Tijana Janjusevic Multimedia and Vision Group, Queen Mary, University of London Clustering of Visual Data using Ant-inspired Methods Supervisor: Prof.
Most of contents are provided by the website Introduction TJTSD66: Advanced Topics in Social Media Dr.
Metabolic Network Inference from Multiple Types of Genomic Data Yoshihiro Yamanishi Centre de Bio-informatique, Ecole des Mines de Paris.
Data Mining BY JEMINI ISLAM. Data Mining Outline: What is data mining? Why use data mining? How does data mining work The process of data mining Tools.
Journal Club Meeting Sept 13, 2010 Tejaswini Narayanan.
By: Amira Djebbari and John Quackenbush BMC Systems Biology 2008, 2: 57 Presented by: Garron Wright April 20, 2009 CSCE 582.
While gene expression data is widely available describing mRNA levels in different cancer cells lines, the molecular regulatory mechanisms responsible.
Carnegie Mellon Novelty and Redundancy Detection in Adaptive Filtering Yi Zhang, Jamie Callan, Thomas Minka Carnegie Mellon University {yiz, callan,
Privacy Preserving Payments in Credit Networks By: Moreno-Sanchez et al from Saarland University Presented By: Cody Watson Some Slides Borrowed From NDSS’15.
Support Vector Machines and Gene Function Prediction Brown et al PNAS. CS 466 Saurabh Sinha.
Chapter 20 Classification and Estimation Classification – Feature selection Good feature have four characteristics: –Discrimination. Features.
Evaluation of gene-expression clustering via mutual information distance measure Ido Priness, Oded Maimon and Irad Ben-Gal BMC Bioinformatics, 2007.
OPTIMAL CONNECTIONS: STRENGTH AND DISTANCE IN VALUED GRAPHS Yang, Song and David Knoke RESEARCH QUESTION: How to identify optimal connections, that is,
Nonlinear differential equation model for quantification of transcriptional regulation applied to microarray data of Saccharomyces cerevisiae Vu, T. T.,
Final Report (30% final score) Bin Liu, PhD, Associate Professor.
1 An Efficient Optimal Leaf Ordering for Hierarchical Clustering in Microarray Gene Expression Data Analysis Jianting Zhang Le Gruenwald School of Computer.
1 Using Graph Theory to Analyze Gene Network Coherence José A. Lagares Jesús S. Aguilar Norberto Díaz-Díaz Francisco A. Gómez-Vela
Advanced Gene Selection Algorithms Designed for Microarray Datasets Limitation of current feature selection methods: –Ignores gene/gene interaction: single.
1 Systematic Data Selection to Mine Concept-Drifting Data Streams Wei Fan Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery.
Network Partition –Finding modules of the network. Graph Clustering –Partition graphs according to the connectivity. –Nodes within a cluster is highly.
Using category-Based Adherence to Cluster Market-Basket Data Author : Ching-Huang Yun, Kun-Ta Chuang, Ming-Syan Chen Graduate : Chien-Ming Hsiao.
An unsupervised conditional random fields approach for clustering gene expression time series Chang-Tsun Li, Yinyin Yuan and Roland Wilson Bioinformatics,
Shadow Detection in Remotely Sensed Images Based on Self-Adaptive Feature Selection Jiahang Liu, Tao Fang, and Deren Li IEEE TRANSACTIONS ON GEOSCIENCE.
1 Munther Abualkibash University of Bridgeport, CT.
 Negnevitsky, Pearson Education, Lecture 12 Hybrid intelligent systems: Evolutionary neural networks and fuzzy evolutionary systems n Introduction.
Mining Coherent Dense Subgraphs across Multiple Biological Networks Vahid Mirjalili CSE 891.
Introduction to Machine Learning, its potential usage in network area,
Graph clustering to detect network modules
Cluster Analysis II 10/03/2012.
1. SELECTION OF THE KEY GENE SET 2. BIOLOGICAL NETWORK SELECTION
Bag-of-Visual-Words Based Feature Extraction
CSC317 Shortest path algorithms
Hansheng Xue School of Computer Science and Technology
Business and Management Research
Enhanced-alignment Measure for Binary Foreground Map Evaluation
Location Recommendation — for Out-of-Town Users in Location-Based Social Network Yina Meng.
Zan Gao, Deyu Wang, Xiangnan He, Hua Zhang
Pyramid Sketch: a Sketch Framework
1 Department of Engineering, 2 Department of Mathematics,
1 Department of Engineering, 2 Department of Mathematics,
1 Department of Engineering, 2 Department of Mathematics,
GPX: Interactive Exploration of Time-series Microarray Data
SEG5010 Presentation Zhou Lanjun.
A Unified View of Multi-Label Performance Measures
Prepared by: Mahmoud Rafeek Al-Farra
A task of induction to find patterns
A task of induction to find patterns
Scalable light field coding using weighted binary images
Presentation transcript:

Using Graph Theory to Analyze Gene Network Coherence Francisco A. Gómez-Vela fgomez@upo.es Norberto Díaz-Díaz ndiaz@upo.es José A. Lagares José A. Sánchez Jesús S. Aguilar Hello Good Morning everybody. My name is Francisco Gómez and I ‘m going to present the work: “Using Graph Theory to Analyze Gene Network Coherence”. This work has been performed in collaborations with Norberto Díaz, Jose Antonio Lagares, Jose A. Sanchez and Jesús Aguilar.

Outlines Introduction Proposed Methodology Experiments Conclusions In this work we propose a novel methodology to assess a gene network reability based on a weel-known Graph theory algorithm. The outline of this presentation is: A short Introduction, where I will present the framework of our work. In the Methodology section, We will see our approach in detail. Next, in the Experiments section, I’m going to show the performed experiments. And the most representative obtained results. And finally, The most important conclusions and future works, will be presented.

Outlines Introduction Proposed Methodology Experiments Conclusions In this work we propose a novel clustering algorithm based on the idea of grouping pairs of genes that present similar interactions. The outline of this presentation is: A short Introduction, where I present the framework of our work. In the Method section, I’m going to explain in detail our approach. Next, in the section Experiments, I’m going to show the experiments carried out. And finally, I will summarize the conclusions of our work and future research directions.

Introduction Gene Network There is a need to generate patterns of expression, and behavioral influences between genes from microarray. GNs arise as a visual and intuitive solution for gene-gene interaction. They are presented as a graph: Nodes: are made up of genes. Edges: relationships among these genes. Currently, many biological studies try to model the processes ocurring in living organinms. In this regard, Gene networks have become as a visual, powerful and intuitive tools to represent the behaviour of genes into these processes. The are presented as a graph, where the genes represents the nodes and the relationships between them are the edges.

Introduction Gene Network Many GN inference algorithms have been developed as techniques for extracting biological knowledge Ponzoni et al., 2007. Gallo et al., 2011. They can be broadly classified as (Hecker M, 2009): Boolean Network Information Theory Model Bayesian Networks There are many algorithm to modeling Gene networks. Depending of the model architecture used they can be broadly classified as: Boolean networks, where there are only two possible states fot hte genes (expressed and no-expressed), Information theory networks (like correlation networks) and bayesian networks (Where the bayes theory is used to generate the models).

Introduction Gene Network Validation in Bioinformatics Once the network has been generated, it is very important to assure network reliability in order to illustrate the quality of the generated model. Synthetic data based validation This approach is normally used to validate new methodologies or algorithms. Well-Known data based validation The literature prior knowledge is used to validate gene networks. Once the model is generated a very important task is to asses the predicted relationships in order to prove the quality of the generated network . Typically, we can found two different tecniquesin the literature to validate the model: -synthetic data validation, that use artificial data to compare the results by different techniques. -well-known data based validation, in this way, the obtained results are compared with contrast biological data in order to prove the reliability of the obtained network. Our approach belongs to this technique.

Introduction Well-Known Biological data based Validation The quality of a GN can be measured by a direct comparison between the obtained GN and prior biological knowledge (Wei and Li, 2007; Zhou and Wong, 2011). However, these approaches are not entirely accurate as they only take direct gene–gene interactions into account for the validation task, leaving aside the weak (indirect) relationships (Poyatos, 2011). Normally, The quality of a GN can be measured by a direct comparison between the obtained GN and prior biological knowledge (Wei and Li, 2007; Zhou and Wong, 2011). However this approaches have the lack as they only take direct gene relationships into account leaving aside indirect ones. Our approach not only takes into account direct but also the indrect ones to make a biological validation. This is posible through the inclusion of graph Theory, as we will see in the methodology section.

Outlines Introduction Proposed Methodology Experiments Conclusions

Introduction Proposed methodology The main features of our method: Evaluate the similarities and differences between gene networks and biological database. Take into account the indirect gene-gene relationships for the validation process. Using Graph Theory to evaluate with gene networks and obtain different measures. As we mentioned before, In this work we use graph theory to make a biological validation of a Gene Network. The main idea is to find similarities and differences between gene networks and prior biological knowledge. With this goal, a well-known graph algorithm is used, concretly the floyd-Warshall algorithm, that is able to find the shortest path between a pair of nodes in a graph, or in our case, between a pair of genes. Finally we will be able to obtain several measures to rate the network quality.

Floyd Warshall Algorithm Proposed Methodology Biological Database Input Network DMIN B A D C DMDB B A E C F CM=|DMi – DMj| Floyd Warshall Algorithm DMIN DMDB Distance Matrices Coherence Matrix CM Measures Firstly i’m going to present a general vision of our proposal. As we mentioned before our method makes an asseasment of an input network through a comparison with prior knowledge. In the first step, the Floyd and warshall algorithm is used to obtain disctance matrices for both networks. These Distance matrices will be used to generate a new one, that we call Coherence Matrix. Using this Coherence Matriz several measures are obtained to stablish the queality of the input network. CM = |DMIN – DMDB|

Proposed Methodology Floyd-Warshall Algorithm This approach is a graph analysis method that solves the shortest path between nodes. Network Distance Matrix B F E A C A B C E F 2 1 Now, i’m going to explain the hole process in detail. The fist step is the generation of the distance matrix for the input network and biological database. With this goal,we use the floy-Warshall Algorithm.This is a graph analysis method that is able to find the shortest path between a pair of nodes, or in our case between a pair of genes. Obviously, the greater the distance, the lower the significance of the relationship between the genes. And to solve this problem,we use a treshol, the distance treshold

Proposed Methodology Distance Threshold It is used to exclude relationships that lack biological meaning. This threshold denotes the maximum distance to be considered as relevant in the Distance Matrix generation process. If the minimum distance between two genes is greater than δ, then no path between the genes will be assumed. Obviously, the greater the distance, the lower the relevance of the relationship. Depending of the performed study, it is possible to choose a threshold to stablish the minimun significance of the relationship With this threshold: -It is used to remove the relationships without relevance. - When the distance between a pair of genes ir greater than the Distance Trhre, we assume that don’t exist a path between them.

Proposed Methodology Distance Threshold Network Distance Matrix δ = 1 B F E A C A B C E F 2 1 A B C E F 2 1 A B C E F ∞ 1 For example, in this case, for a Threshold value of 1. Lets see the gene A, the relations with the genes C and E exxeccd the threshols, so we assume that the relation is no relevant for this case and an infinite value are stored for them.

Proposed Methodology DMDB DMIN A B C E F 2 1 A B C E F 2 1 A B C D 1 ∞ 2 1 A B C E F 2 1 A B C D 1 ∞ 2 A B C D 1 3 2 CM=|DMi – DMj| Coherence Matrix (CM) A B C 1 ∞ Once, the Distance matrices are obtained it is time to generate, the Coherence matrix. This matrix contains the existing gaps amongthe distances of the common genes between the two distance matrices. We have to use the common genes because we have no information to rate the quality of the rest of interactions from the genes that are no present into the database. Each element of the matrix is obtained by the absolute value for the subtraction of the distances previously obtained. For example, in the slide

Proposed Methodology Obtaining Measures Coherence Level threshold (θ) This threshold denotes the maximum coherence level to be considered as relevant in the Coherence Matrix generation process. It is used to obtain well-Known indices by using the elements of the coherence matrix: |v-y|<=θ TP 0< v,y <∞ |v-y|>θ FP Esta diapositiva no explica lo que es el CT CMi,j |∞-y| (α) FN |∞-∞|(β) TN

Proposed Methodology θ = 3 α β α β α β α β α β α β β β A B C D E - TP θ = 3 Coherence Matrix A B C D E - TP α FP β A B C D E - TP FN FP TN A B C D E - TP α FP β A B C D E - TP α 4 7 β 5 8 A B C D E - 1 α 4 7 β 2 5 8 A B C D E - TP α 4 7 β 5 8 A B C D E - 1 α 4 7 β 2 5 8 A B C D E - TP FN FP β A B C D E - TP FN FP β Esto esta mal

Outlines Introduction Proposed Methodology Experiments Conclusions In this work we propose a novel clustering algorithm based on the idea of grouping pairs of genes that present similar interactions. The outline of this presentation is: A short Introduction, where I present the framework of our work. In the Method section, I’m going to explain in detail our approach. Next, in the section Experiments, I’m going to show the experiments carried out. And finally, I will summarize the conclusions of our work and future research directions.

Results Real data experiment Input networks were obtained by applying four inference network techniques on the well-known yeast cell cycle expression data set (Spellman et al., 1998). Soinov et al., 2003. Bulashevska et al., 2005. Ponzoni (GRNCOP) et al., 2007 Comparison with Well-Known data: BioGrid KEGG SGD YeastNet Produced no Obtained. For the experiments we have used 3 yeast networks obtained with different techniques from the well-known yeast dataset (Spellman et al. 1998). The techniques are published by Soinov et al, bulashevka and Ponzoni. For the comparison, we also used 4 well-known gene-gene interaction databases: BioGrid,KEGG, SGD and YeastNet.

Results Real data experiment Several studies were carried out using different threshold value combinations: Distance threshold (δ) and Coherence level threshold (θ) have been modified from one to five, generating 25 different combinations. The results show that the higher δ and θ values, the greater is the noise introduced. The most representative result, was obtained for δ=4 and θ=1.

Soinov Bulashevska Ponzoni Results Soinov Bulashevska Ponzoni Accuracy F-measure Accuracy F-measure Accuracy F-measure Biogrid KEGG SGD YeastNet 0,27 0,58 0,31 0,29 0,42 0,48 0,47 0,45 0,65 0,34 0,53 0,50 0,79 0,50 0,69 0,66 0,82 0,28 1 0,90 0,43 1 Table 1 shows that inference method proposed by Gallo (GRNCOP2) generates the most reliable result, although Ponzoni technique (GRNCOP) provides the best result in three of the four data sets. Soinov approach obtains the worst values.   These results are consistent with the experiment carried out in (Ponzoni et al., 2007). And are cronologically sorted.

Results These results are consistent with the experiment carried out in Ponzoni et al., 2007. Ponzoni was successfully compared with Soinov and Bulashevska approaches. GRNCOP… es mejor poner ponzoni directamente.

Outlines Introduction Proposed Methodology Experiments Conclusions In this work we propose a novel clustering algorithm based on the idea of grouping pairs of genes that present similar interactions. The outline of this presentation is: A short Introduction, where I present the framework of our work. In the Method section, I’m going to explain in detail our approach. Next, in the section Experiments, I’m going to show the experiments carried out. And finally, I will summarize the conclusions of our work and future research directions.

Conclusions A new approach of a gene network validation framework is presented: The methodology not only takes into account the direct relationships, but also the indirect ones. Graph theory has been used to perform validation task.

Conclusions Experiments with Real Data. These results are consistent with the experiment carried out in Ponzoni et al., 2007. Ponzoni was successfully compared with Soinov and Bulashevska approaches. These behaviours are also found in the obtained results. Ponzoni presents better coherence values than Soinov and Bulashevska in BioGrid, SGD and YeastNet. GRNCOP… es mejor poner ponzoni directamente.

Future Works The methodology has been improved: The elements in coherence matrix will be weighted based on the gene-gene relationships distance. A new measure, based on different databases will be generated. Moreover, a Cytoscape plugin will be implemented. The methodology will be improved by weighting the elements of the coherence matrix based on the distance. Futhermore, the methodology will be implemented as a cytoscape plugin in a near future.

Thanks for your attention Using Graph Theory to Analyze Gene Network Coherence Thanks for your attention Thanks for your attention

Soinov Bulashevska Ponzoni Gallo Results - II Soinov Bulashevska Ponzoni Gallo Accuracy F-measure Accuracy F-measure Accuracy F-measure Accuracy F-measure Biogrid KEGG SGD YeastNet 0,27 0,58 0,31 0,29 0,42 0,48 0,47 0,45 0,65 0,34 0,53 0,50 0,79 0,50 0,69 0,66 0,82 0,28 1 0,90 0,43 1 0,75 0,47 0,58 0,62 0,86 0,61 0,73 0,77 Table 1 shows that inference method proposed by Gallo (GRNCOP2) generates the most reliable result, although Ponzoni technique (GRNCOP) provides the best result in three of the four data sets. Soinov approach obtains the worst values.   These results are consistent with the experiment carried out in (Ponzoni et al., 2007). And are cronologically sorted.

References Pavlopoulos GA, et al. (2011): Using graph theory to analyze biological networks. BioData Mining, 4:10. Asghar A, et al (2012) Speeding up the Floyd–Warshall algorithm for the cycled shortest path problem. AppliedMathematics Letters 25(1): 1 Bulashevska S and Eils R (2005) Inferring genetic regulatory logic from expression data. Bioinformatics 21(11):2706. Ponzoni I, et al (2007) Inferring adaptive regulationthresh-olds and association rules from gene expressiondata through combinatorial optimization learning.IEEE/ACM Transaction on Computation Biology andBioinformatics 4(4):624. Poyatos JF (2011). The balance of weak and strong interactions in genetic networks. PloS One 6(2):e14598.