Abstract Background: In this work, a candidate gene prioritization method based on protein-protein interaction network (PPIN) analysis is described and compared with ToppGene (a functional prioritization method).
Results: For the first time, the PageRank and HITS algorithms and the K-Step Markov method, all used in Web and social network analysis, are applied to a PPIN to prioritize disease candidate genes.
Conclusion: PPIN-based candidate gene prioritization performs better than any other individual gene feature or annotation. It can be successfully used for disease candidate gene prioritization.

Background-1 Most current disease candidate gene identification and prioritization methods rely on functional annotations from different data sources: GO, pathways, domains, expression. In their recent work, the authors used a functional prioritization method named ToppGene, which integrates functional data with mouse phenotype data. ToppGene outperforms the other published functional prioritization methods. These methods are limited by the coverage of gene functional annotation:
- only a fraction of the human genome is annotated with pathways and phenotypes
- 2/3 of all genes are annotated by at least one functional annotation
- 1/3 is yet to be annotated

Background-2 A different approach: in this study, for the first time, the authors applied social and Web network analysis-based algorithms to a PPIN to prioritize disease candidate genes. The PPIN is represented as an unweighted, undirected, simple graph G(V, E): genes are nodes and interactions are edges, so V is the set of all genes and E the set of all interactions. The set of known disease genes (seeds) is denoted R. The prioritization approaches are based on the methods of White and Smyth, whose framework of four successive problem formulations defines how to rank nodes in the unweighted graph G(V, E).
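
The graph representation described on this slide can be sketched as an adjacency map. This is only an illustration: the function names `build_ppin` and `count_edges` and the node labels A to D are hypothetical and do not come from the study.

```python
from collections import defaultdict


def build_ppin(interactions):
    """Build an unweighted, undirected, simple graph from (gene, gene) pairs.

    Duplicate pairs collapse into one edge and self-interactions are dropped,
    which is what "simple graph" implies.
    """
    adjacency = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # no self-loops in a simple graph
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def count_edges(adjacency):
    """Each undirected edge is stored twice, once per endpoint."""
    return sum(len(neighbors) for neighbors in adjacency.values()) // 2


# Hypothetical toy interactions (not real PPIN data).
interactions = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "C"), ("C", "D")]
ppin = build_ppin(interactions)   # V = keys, E = implied by the neighbor sets
seeds = {"A"}                     # R, the set of known disease genes
```

Storing neighbors in sets makes the graph simple by construction, and the seed set R is kept separate from the topology so the same graph can be reused for different diseases.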

Methods-1 White and Smyth problem formulations:
1. Given G, where t and r are both nodes in G, compute the importance I(t|r) of node t with respect to the root r.
2. Given G and a root node r in G, rank all vertices in T, a subset of the vertices in G, by computing I(t|r) for each node t in T.
3. Given G and a set of root nodes R in G, rank all vertices in T. The importance I(t|R) is the average of the importance with respect to each node in R: $I(t|R) = \frac{1}{|R|} \sum_{r \in R} I(t|r)$.
4. Given G, rank all nodes, where R = T = V.
Formulation 3 is the one needed in this study: the problem is to prioritize a set of genes in the network based on their importance to a set of root genes (genes known to be associated with a disease). The importance of a gene to the set of root genes is simply the average of its importance with respect to each individual root gene.
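
Formulation 3 reduces to an average over per-root importance values followed by a sort. The sketch below assumes the per-root values I(t|r) have already been computed by some algorithm. The function names, the input layout (a dict of dicts) and the toy numbers are all hypothetical.

```python
def importance_to_roots(importance, roots):
    """I(t|R) = (1/|R|) * sum over r in R of I(t|r).

    `importance[r][t]` holds I(t|r); a node missing from importance[r]
    is treated as having importance 0 with respect to that root.
    """
    roots = list(roots)
    if not roots:
        raise ValueError("the root set R must not be empty")
    nodes = set()
    for r in roots:
        nodes.update(importance[r])
    return {
        t: sum(importance[r].get(t, 0.0) for r in roots) / len(roots)
        for t in nodes
    }


def rank_candidates(scores, candidates):
    """Order the candidate set T by decreasing I(t|R); ties break by name."""
    return sorted(candidates, key=lambda t: (-scores.get(t, 0.0), t))


# Hypothetical per-root importance values for two roots and two candidates.
per_root = {
    "r1": {"a": 0.4, "b": 0.1},
    "r2": {"a": 0.2, "b": 0.7},
}
combined = importance_to_roots(per_root, ["r1", "r2"])
ranking = rank_candidates(combined, ["a", "b"])
```

With these toy values, candidate b averages about 0.4 and candidate a about 0.3, so b is ranked first.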

Methods-2 The remaining task is to compute I(t|r), the importance of node t with respect to a root node r. The authors used three algorithms from White and Smyth's methods:
1. PageRank
2. HITS
3. K-Step Markov
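
As a rough illustration of two of these algorithms, the sketch below implements a PageRank variant that restarts at the seed genes and a K-step Markov score (the summed probability of visiting a node in walks of length 1 to K started at the seeds). It follows the general form usually attributed to White and Smyth and is not the study's implementation. The parameter names `beta`, `iterations` and `k` and their default values are assumptions, since the slides do not report the settings, and HITS is omitted for brevity.

```python
def transition_step(adjacency, dist):
    """One random-walk step: each node splits its probability evenly
    among its neighbors (unweighted, undirected graph)."""
    nxt = {v: 0.0 for v in adjacency}
    for u, p in dist.items():
        neighbors = adjacency[u]
        if not neighbors or p == 0.0:
            continue
        share = p / len(neighbors)
        for v in neighbors:
            nxt[v] += share
    return nxt


def seed_prior(adjacency, seeds):
    """Uniform prior over the seed genes that are present in the graph."""
    present = {s for s in seeds if s in adjacency}
    if not present:
        raise ValueError("no seed gene is present in the graph")
    return {v: (1.0 / len(present) if v in present else 0.0) for v in adjacency}


def pagerank_with_priors(adjacency, seeds, beta=0.3, iterations=100):
    """Random walk that jumps back to the seed prior with probability beta."""
    prior = seed_prior(adjacency, seeds)
    rank = dict(prior)
    for _ in range(iterations):
        walked = transition_step(adjacency, rank)
        rank = {v: (1.0 - beta) * walked[v] + beta * prior[v] for v in adjacency}
    return rank


def k_step_markov(adjacency, seeds, k=4):
    """Sum of visit probabilities over walks of length 1..k from the seeds."""
    dist = seed_prior(adjacency, seeds)
    total = {v: 0.0 for v in adjacency}
    for _ in range(k):
        dist = transition_step(adjacency, dist)
        for v in adjacency:
            total[v] += dist[v]
    return total


# Hypothetical toy graph: a path a - b - c - d with a single seed "a".
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
pr_scores = pagerank_with_priors(path, {"a"})
ks_scores = k_step_markov(path, {"a"}, k=2)
```

On the toy path graph both scores decrease with distance from the seed among the non-seed nodes, which is the behavior the prioritization relies on: candidates close to known disease genes in the network rank higher.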

Methods-3 Human protein interaction network: the human protein-protein interactions were extracted from the NCBI Entrez Gene FTP site, with 8340 nodes and edges (BIND, BioGRID, HPRD).
Evaluation of PPIN for gene prioritization: the authors used the same training data as in their previous study, comprising 19 diseases from the OMIM (Online Mendelian Inheritance in Man) and GAD (Genetic Association Database) databases, with a total of 693 associated genes. 589 genes were used in the cross validation.
Cardiac septal defect candidate gene prioritization: 166 records with the label "atrial septal defect" were extracted from NCBI's OMIM database. 81 genes were mapped to these records and used as the training set. 431 genes (from interactions) were used for ranking (test set).

Results-1 Cross validation: 13 conditions (the 3 algorithms with different parameter settings), each repeated 5 times. Rank-based ROC curves were plotted, and AUC values were used to quantitatively measure the performance.
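
The AUC of a rank-based ROC curve can be computed directly from the scores, as the probability that a known disease gene is scored above a non-disease gene. The sketch below shows that calculation only; the slides do not detail the exact cross-validation protocol, so the function name `rank_auc` and the toy scores are hypothetical.

```python
def rank_auc(scores, positives):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive gene has the higher score; ties count as half a win."""
    pos = [s for gene, s in scores.items() if gene in positives]
    neg = [s for gene, s in scores.items() if gene not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four genes, two of which are held-out disease genes.
toy_scores = {"g1": 0.9, "g2": 0.8, "g3": 0.3, "g4": 0.1}
toy_auc = rank_auc(toy_scores, {"g1", "g3"})
```

An AUC of 0.5 corresponds to random ranking and 1.0 to every disease gene being ranked above every other gene. In the toy example three of the four positive-negative pairs are ordered correctly.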

Results-2

Results-3 Top 20 ranked genes
* Genes associated with cardiac development or malformation: 15 with ToppGene, 14 with the PPIN-based method
# Genes associated with septal defects: 6 with ToppGene, 3 with the PPIN-based method
Combining functional annotation-based and PPIN-based methods is more effective for identifying and ranking disease candidate genes.
- Mouse embryos lacking p300 protein (EP300 gene) show ventricular septal defects.
- Truncated CBP protein (CREBBP gene) leads, in mice, to atrial and ventricular septal defects.
- Mice with a deletion of Erbb2 show ventricular septal defects (VSD), suggesting that the human ortholog ERBB2 could be a potential candidate gene for VSD.

Results-4 Prioritized candidate genes of cardiac septal defects using both functional annotation- and PPIN- based methods.

Results-5 AUC of different feature sets. Red bars indicate the AUC scores based on each feature set, and blue bars are the corresponding random controls.

Conclusions-1 The PageRank, HITS and K-Step Markov algorithms were applied to a literature-based, manually curated protein interaction network. Goal: to prioritize disease candidate genes. Known disease-related genes were used as a training set ("seeds"), and the candidate genes were ranked. Network-based methods are generally not as effective as the integrated functional annotation-based methods. However, compared with the individual functional annotation features, the network-based methods perform better than any single annotation. Therefore, PPINs can be a good feature for disease candidate gene prioritization, especially when the genes lack all other functional annotations or are sparsely annotated.

Conclusions-2 Limitations: as with functional annotation-based methods, performance depends on the quality of the interaction data (missing interactions and false positives). Possible solutions:
- a better fit to biological networks (e.g., using weighted nodes (genes or proteins) or weighted edges (interactions))
- integration with other methods (e.g., combining results from functional annotation-based methods and expression profiles with network-based approaches)
It is expected that using both functional annotations and PPIN-based topological parameters may better facilitate the discovery and prioritization of disease genes.