Graph preprocessing. Common Neighborhood Similarity (CNS) measures.


Graph preprocessing

Common Neighborhood Similarity (CNS) measures

Jaccard similarity

Pvalue

Functional Similarity (FS)

Topological Overlap Measure (TOM)

Pair-wise H-Confidence
Measure of the affinity of two items in terms of the transactions in which they appear simultaneously [Xiong et al, 2006]
For an interaction network represented as an adjacency matrix, Hconf(p1, p2) = min(m/n1, m/n2), where:
– Unweighted networks: n1, n2 = number of neighbors of p1, p2; m = number of shared neighbors of p1, p2
– Weighted networks: n1, n2 = sum of the weights of the edges incident on p1, p2; m = sum of min(weights) of the edges to common neighbors of p1, p2

H-confidence Example
[Adjacency matrices over p1–p5: unweighted network and weighted network]
– Unweighted network: Hconf(p1, p2) = min(0.5, 0.5) = 0.5
– Weighted network: Hconf(p1, p2) = min(0.5/0.6, 0.5/1.2) ≈ 0.417
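The pairwise measure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a symmetric adjacency matrix with a zero diagonal (no self-loops), and the function name `hconf` and both toy networks are hypothetical. Because an unweighted network is just a 0/1 matrix, one function covers both cases.

```python
def hconf(adj, i, j):
    """Pairwise h-confidence of nodes i and j.

    adj is a symmetric matrix (list of lists) with a zero diagonal;
    0 means "no edge". Entries are 0/1 for an unweighted network and
    edge weights for a weighted one.
    """
    n_i = sum(adj[i])  # neighbor count, or total incident weight
    n_j = sum(adj[j])
    if n_i == 0 or n_j == 0:
        return 0.0
    # min(...) is 0 unless k is adjacent to both i and j, so this sums
    # over common neighbors only: a count for 0/1 entries, or the
    # smaller of the two edge weights for weighted entries.
    m = sum(min(adj[i][k], adj[j][k]) for k in range(len(adj)))
    return min(m / n_i, m / n_j)


# Hypothetical 4-node networks with edges 0-1, 0-2, 1-2, 1-3.
unweighted = [
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]
weighted = [
    [0.0, 0.9, 0.4, 0.0],
    [0.9, 0.0, 0.8, 0.3],
    [0.4, 0.8, 0.0, 0.0],
    [0.0, 0.3, 0.0, 0.0],
]

print(hconf(unweighted, 0, 1))  # min(1/2, 1/3), about 0.333
print(hconf(weighted, 0, 1))    # min(0.4/1.3, 0.4/2.0), about 0.2
```

In the weighted toy network, nodes 0 and 1 share only node 2, so m = min(0.4, 0.8) = 0.4, and the larger total weight around node 1 pulls the score down.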

Validation of Final Network
Use the FunctionalFlow algorithm [Nabieva et al, 2005] on the original and transformed graph(s)
– One of the most accurate algorithms for predicting function from interaction networks
– Produces likelihood scores for each protein being annotated with one of 75 MIPS functional labels
Likelihood matrix evaluated using two metrics
– Multi-label versions of precision and recall, where, for protein i, m_i = # predictions made, n_i = # known annotations, k_i = # correct predictions
– Precision/accuracy of top-k predictions: useful for actual biological experimental scenarios
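The two metrics can be sketched as follows. The formulas on the slide did not survive, so this assumes the usual pooled form built from the stated counts: precision = Σ k_i / Σ m_i and recall = Σ k_i / Σ n_i. The function names and the toy annotations are hypothetical.

```python
def precision_recall(predicted, known):
    """Multi-label precision and recall pooled over all proteins.

    predicted, known: dicts mapping protein -> set of functional labels.
    """
    m = n = k = 0
    for protein in set(predicted) | set(known):
        pred = predicted.get(protein, set())
        true = known.get(protein, set())
        m += len(pred)         # m_i: predictions made
        n += len(true)         # n_i: known annotations
        k += len(pred & true)  # k_i: correct predictions
    precision = k / m if m else 0.0
    recall = k / n if n else 0.0
    return precision, recall


def top_k_accuracy(scores, known, k):
    """Fraction of the k highest-scoring (protein, label) predictions
    that match a known annotation.

    scores: list of (likelihood, protein, label) tuples.
    """
    top = sorted(scores, key=lambda t: t[0], reverse=True)[:k]
    if not top:
        return 0.0
    hits = sum(1 for _, protein, label in top
               if label in known.get(protein, set()))
    return hits / len(top)


# Hypothetical predictions and annotations.
predicted = {"a": {"f1", "f2"}, "b": {"f3"}}
known = {"a": {"f1"}, "b": {"f3", "f4"}}
scores = [(0.9, "a", "f1"), (0.8, "a", "f2"), (0.7, "b", "f3")]

print(precision_recall(predicted, known))  # 2 correct of 3 made, 2 of 3 known
print(top_k_accuracy(scores, known, 2))    # 1 of the top 2 is correct
```

Sweeping a threshold on the likelihood scores to build `predicted` would trace out a precision-recall curve, while `top_k_accuracy` looks only at the most confident predictions.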

Test Protein Interaction Networks
Three yeast interaction networks with different types of weighting schemes were used for experiments
– Combined
  – Composed from the data sets of Ito, Uetz and Gavin (2002)
  – Individual reliabilities obtained from the EPR index tool of DIP
  – Overall reliabilities obtained using a noisy-OR
– [Krogan et al, 2006]'s data set
  – 6180 interactions between 2291 annotated proteins
  – Edge reliabilities derived using machine learning techniques
– DIPCore [Deane et al, 2002]
  – ~5K highly reliable interactions in DIP
  – No weights assigned: assumed unweighted
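For the Combined network, the noisy-OR step can be illustrated with a short sketch. It uses the standard noisy-OR rule, r = 1 − Π(1 − r_i), applied to the reliabilities of the data sets that report an interaction. The function name and the sample reliabilities are hypothetical and are not the EPR values used in the study.

```python
def noisy_or(reliabilities):
    """Combine independent per-source reliabilities of one interaction.

    The interaction is treated as spurious only if every source that
    reports it is wrong, so the combined reliability is
    1 - product(1 - r_i).
    """
    prob_all_wrong = 1.0
    for r in reliabilities:
        prob_all_wrong *= (1.0 - r)
    return 1.0 - prob_all_wrong


# Hypothetical: an interaction reported by two sources, each 50% reliable.
print(noisy_or([0.5, 0.5]))  # 0.75
```

An interaction seen in several data sets therefore receives a higher weight than one seen in a single data set with the same individual reliability.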

Results on Combined data set
[Figure panels: Precision-Recall; Accuracy of top-k predictions]

Results on Krogan et al's data set
[Figure panels: Precision-Recall; Accuracy of top-k predictions]

Results on DIPCore
[Figure panels: Precision-Recall; Accuracy of top-k predictions]

Noise removal capabilities of H-confidence
– H-confidence and hypercliques have been shown to have noise removal capabilities [Xiong et al, 2006]
– To test its effectiveness, we added 50% random edges to DIPCore and re-ran the transformation process
– The fall in performance of the transformed network is significantly smaller than that of the original network
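The noise-injection step might look like the sketch below. The slide does not say how the random edges were drawn, so the sketch assumes uniform sampling over node pairs that are not already connected. The function name, the `seed` parameter and the toy edge list are hypothetical.

```python
import random


def add_random_edges(edges, nodes, fraction=0.5, seed=0):
    """Return the edge set plus `fraction` * |edges| random new edges.

    edges: iterable of (u, v) pairs without self-loops (undirected).
    nodes: iterable of all node identifiers.
    """
    rng = random.Random(seed)
    nodes = sorted(nodes)
    existing = {frozenset(e) for e in edges}
    # Never ask for more new edges than there are unconnected pairs.
    free_pairs = len(nodes) * (len(nodes) - 1) // 2 - len(existing)
    target = min(int(round(fraction * len(existing))), free_pairs)
    noisy = set(existing)
    while len(noisy) < len(existing) + target:
        u, v = rng.sample(nodes, 2)   # two distinct nodes
        noisy.add(frozenset((u, v)))  # duplicates are ignored by the set
    return noisy


# Hypothetical 6-node path-like network with 4 edges, so 2 edges are added.
original = [(1, 2), (2, 3), (3, 4), (4, 5)]
noisy = add_random_edges(original, range(1, 7), fraction=0.5)
print(len(noisy))  # 6
```

Running the transformation and the function-prediction evaluation on both `original` and `noisy` would then show how much each network degrades.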

Summary of Results
– H-confidence-based transformations generally produce more accurate and more reliably weighted interaction graphs, as validated by function prediction
– Generally, the less reliable the weights assigned to the edges in the raw network, the greater the improvement in performance obtained by using an h-confidence-based graph transformation
– The better performance of the h-confidence-based graph transformation method is indeed due to the removal of spurious edges, and potentially the addition of biologically viable ones and effective weighting of the resultant set of edges
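How a transformation of this kind can remove, add and re-weight edges is easiest to see on a toy graph. The sketch below is an assumption-laden illustration, not the authors' procedure: it scores every node pair of an unweighted network with pairwise h-confidence and keeps the pairs whose score reaches a hypothetical `threshold`, using the score as the new edge weight.

```python
from itertools import combinations


def hconf_transform(neighbors, threshold):
    """Build a weighted edge dict from pairwise h-confidence scores.

    neighbors: dict mapping node -> set of adjacent nodes
               (unweighted, undirected, no self-loops).
    Returns {(u, v): score} for pairs with 0 < threshold <= score.
    """
    transformed = {}
    for u, v in combinations(sorted(neighbors), 2):
        n_u, n_v = len(neighbors[u]), len(neighbors[v])
        if n_u == 0 or n_v == 0:
            continue
        m = len(neighbors[u] & neighbors[v])  # shared neighbors
        score = min(m / n_u, m / n_v)
        if score > 0 and score >= threshold:
            transformed[(u, v)] = score
    return transformed


# Hypothetical network: triangle a-b-c plus a pendant edge c-d.
neighbors = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
result = hconf_transform(neighbors, threshold=0.4)
print(sorted(result))  # [('a', 'b'), ('a', 'd'), ('b', 'd')]
```

On this toy graph the edge c-d is dropped because its endpoints share no neighbors, a-b is kept, and the pairs a-d and b-d gain edges because each pair shares the neighbor c. Whether such changes are biologically meaningful depends on the threshold and on the network.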

Conclusions and future directions

References (I)
[Pandey et al, 2006] Gaurav Pandey, Vipin Kumar and Michael Steinbach. Computational Approaches for Protein Function Prediction: A Survey. Technical report, Department of Computer Science and Engineering, University of Minnesota, Twin Cities.
[Pandey et al, 2007] G. Pandey, M. Steinbach, R. Gupta, T. Garg and V. Kumar. Association analysis-based transformations for protein interaction networks: a function prediction case study. KDD 2007.
[Xiong et al, 2005] Xiong, H., He, X., Ding, C., Zhang, Y., Kumar, V., and Holbrook, S. R. 2005. Identification of functional modules in protein complexes via hyperclique pattern discovery. In Proc. Pacific Symposium on Biocomputing (PSB), 221–232.
[Xiong et al, 2006a] Xiong, H., Tan, P.-N., and Kumar, V. 2006. Hyperclique Pattern Discovery. Data Mining and Knowledge Discovery, 13(2).
[Xiong et al, 2006b] Xiong, H., Pandey, G., Steinbach, M., and Kumar, V. 2006. Enhancing Data Analysis with Noise Removal. IEEE TKDE, 18(3).
[Xiong et al, 2006c] Hui Xiong, Michael Steinbach, and Vipin Kumar. Privacy Leakage in Multi-relational Databases: A Semi-supervised Learning Perspective. VLDB Journal Special Issue on Privacy Preserving Data Management, Vol. 15, No. 4, November 2006.
[Xiong et al, 2004] Hui Xiong, Michael Steinbach, Pang-Ning Tan and Vipin Kumar. HICAP: Hierarchical Clustering with Pattern Preservation. SIAM Data Mining 2004.
[Tan et al, 2005] Tan, P.-N., Steinbach, M., and Kumar, V. 2005. Introduction to Data Mining. Addison-Wesley.
[Nabieva et al, 2005] Nabieva, E., Jim, K., Agarwal, A., Chazelle, B., and Singh, M. 2005. Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps. Bioinformatics 21, Suppl. 1, i1–i9.
[Deng et al, 2003] Deng, M., Sun, F., and Chen, T. 2003. Assessment of the reliability of protein–protein interactions and protein function prediction. In Pac Symp Biocomputing, 140–151.
[Gavin et al, 2002] A. Gavin et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature, 415, 2002.
[Hart et al, 2006] G Traver Hart, Arun K Ramani and Edward M Marcotte. How complete are current yeast and human protein-interaction networks. Genome Biology, 7:120, 2006.

References (II)
[Brun et al, 2003] Brun, C., Chevenet, F., Martin, D., Wojcik, J., Guenoche, A., and Jacq, B. 2003. Functional classification of proteins for the prediction of cellular function from a protein-protein interaction network. Genome Biology 5, 1, R6.
[Samanta et al, 2003] Samanta, M. P. and Liang, S. 2003. Predicting protein functions from redundancies in large-scale protein interaction networks. Proc Natl Acad Sci U.S.A. 100, 22, 12579–12583.
[Salwinski et al, 2004] Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D (2004) The Database of Interacting Proteins: 2004 update. NAR 32 Database issue:D449-51.
[Gavin et al, 2006] Gavin et al, 2006. Proteome survey reveals modularity of the yeast cell machinery. Nature 440.
[Deane et al, 2002] Deane CM, Salwinski L, Xenarios I, Eisenberg D (2002) Protein interactions: Two methods for assessment of the reliability of high-throughput observations. Mol Cell Prot 1.