Reliable Integration Strategy of PPI databases JWH 2009 / 6 / 19.


Contents
Introduction
Reason for Recent Problem
–Low prediction accuracy on newly published PPI data
Confidence Leveling Strategies
–In terms of interaction type and evaluation method control
–In terms of domain appearances
Evaluation
Conclusion & Work to Do

Introduction
PPI integration
–Means merging heterogeneous PPI databases into a single data source.
–There is an evident need to integrate multiple sources.
–Technical problems exist.
It is essential and important because
–Different distributors curate data for different interests.
–Most machine-learning-based PPI prediction methods (e.g., PreSPI) are highly sensitive to the training set used.
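As a rough illustration of what "merging into a single data source" can mean, the sketch below combines several PPI lists into one map from an unordered protein pair to the set of databases that report it. It assumes all sources already use the same protein identifiers, which is one of the technical problems in practice; the function name `merge_ppi_sources` and the sample pairs are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Pair = FrozenSet[str]


def merge_ppi_sources(
    sources: Dict[str, Iterable[Tuple[str, str]]]
) -> Dict[Pair, Set[str]]:
    """Merge PPI lists into one map: unordered pair -> names of reporting sources."""
    merged: Dict[Pair, Set[str]] = {}
    for name, pairs in sources.items():
        for a, b in pairs:
            # (A, B) and (B, A) describe the same binary interaction.
            key = frozenset((a, b))
            merged.setdefault(key, set()).add(name)
    return merged


# Hypothetical sample data; identifiers are placeholders, not real accessions.
sources = {
    "DIP": [("P1", "P2"), ("P2", "P3")],
    "MINT": [("P2", "P1"), ("P4", "P5")],
    "IntAct": [("P3", "P2")],
}
merged = merge_ppi_sources(sources)
for pair, names in merged.items():
    print(sorted(pair), sorted(names))
```

Keeping the source names per pair, rather than only the union of pairs, makes it possible to see how much the databases overlap.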

Introduction
Meanwhile, in PreSPI,
–Only the Database of Interacting Proteins (DIP) was used as the PPI source.
–PPI type was not considered.
–Its domain source, InterPro, has a redundancy problem, such as:
(Figure: Domain A, Domain B, Domain C)

Introduction
Meanwhile, in PreSPI,
–Only the Database of Interacting Proteins (DIP) was used as the PPI source. → + MINT, IntAct
–PPI type was not considered. → PSI-MI
–Its domain source, InterPro, has a redundancy problem, such as: → Pfam-A
(Figure: Domain A, Domain B, Domain C)

Recent Problem
Reason for the recent low prediction accuracy
–About 33% of PPIs overlap.
–The test set may contain exactly the same PPIs that exist in the learning set.
–Prediction accuracy decreases to 52% sensitivity and 94% specificity.
(Figure: IntAct + DIP, Integrated DB)
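One way to check for the overlap described above is to compare the test set against the learning set as unordered pairs and drop any test pair that also occurs in training. This is a minimal sketch, not the procedure used in the slides; `remove_overlap` and the sample pairs are hypothetical.

```python
from typing import List, Tuple

PPI = Tuple[str, str]


def remove_overlap(train: List[PPI], test: List[PPI]) -> Tuple[List[PPI], float]:
    """Return the test pairs absent from the training set, plus the overlap fraction."""
    train_keys = {frozenset(pair) for pair in train}
    # Order inside a pair is ignored, so ("P2", "P1") matches ("P1", "P2").
    clean = [pair for pair in test if frozenset(pair) not in train_keys]
    overlap = (len(test) - len(clean)) / len(test) if test else 0.0
    return clean, overlap


# Hypothetical sample data.
train = [("P1", "P2"), ("P2", "P3")]
test = [("P2", "P1"), ("P4", "P5")]
clean_test, overlap_fraction = remove_overlap(train, test)
print(clean_test, overlap_fraction)
```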

Recent Problem
Domain distribution analysis for each DB, and results of the domain distribution analysis after updates
–NEW1: DIP ∪ MINT ∪ IntAct (no pre-processing)
–NEW2: DIP ∪ MINT ∪ IntAct (no colocalization)
–NEW3: DIP ∪ MINT ∪ IntAct (no colocalization, no association)
Table columns: OLD, NEW1, NEW2, NEW3
# of PPI pairs
# of PPI pairs (domains are known)
Portion of available PPI pairs: 71.5%, 68.9%, 65.8%
# of proteins
# of proteins (domains are known)
Avg. # of domains for one protein
Sensitivity: 63.4%, 50.68%, 50.23%, 49.82%

Recent Problem
Domain distribution analysis for each DB, and results of the domain distribution analysis after updates
–NEW1: DIP ∪ MINT ∪ IntAct (no pre-processing)
–NEW2: DIP ∪ MINT ∪ IntAct (no colocalization)
–NEW3: DIP ∪ MINT ∪ IntAct (no colocalization, no association)
Table columns: OLD, NEW1, NEW2, NEW3
# of PPI pairs
# of PPI pairs (domains are known)
Portion of available PPI pairs: 72.7%, 68.9%, 65.8%
# of proteins
# of proteins (domains are known)
Avg. # of domains for one protein
Sensitivity: 52.0%, 50.68%, 50.23%, 49.82%

Confidence Leveling Strategy
Control of
–Detected interaction type
–Evaluation method
–Domain appearances: when both proteins in a binary PPI contain only rarely appearing domains, the pair harms prediction accuracy.

Interaction Type Control
PSI-MI Ontology Tree
Association (MI:0914): Molecules that are experimentally shown to be associated, potentially by sharing just one interactor. Often associated molecules are co-purified by a pull-down or coimmunoprecipitation and share the same bait molecule.
Physical association (MI:0915): Molecules that are experimentally shown to belong to the same functional or structural complex.
Direct interaction (MI:0407): Interaction that is proven to involve only its interactors.
Physical interaction (MI:0218): Interaction among molecules that can be direct or indirect. OBSOLETE: split into “association; MI:0914” and “physical association; MI:0915”. For remapping, consider the experimental setting of an interaction. For bulk remapping, a possible criterion is that any physical interaction that has a bait among its participants should become “association; MI:0914”; the others can become “physical association; MI:0915”. Two-hybrid interactions are an exception and can be “physical association; MI:0915”.
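Interaction type control can be pictured as a filter over records that carry a PSI-MI interaction type accession. The sketch below drops every record whose type is in an excluded set, here “association; MI:0914”. The `Interaction` record layout and the function name are hypothetical, and the sketch matches accessions exactly; it does not walk the ontology tree to catch child terms.

```python
from typing import Iterable, List, NamedTuple, Set


class Interaction(NamedTuple):
    protein_a: str
    protein_b: str
    type_id: str  # PSI-MI interaction type accession, e.g. "MI:0915"


def drop_interaction_types(
    interactions: Iterable[Interaction], excluded: Set[str]
) -> List[Interaction]:
    """Keep only interactions whose PSI-MI type accession is not excluded."""
    return [i for i in interactions if i.type_id not in excluded]


# Hypothetical sample records; the accessions are the ones quoted on the slide.
records = [
    Interaction("P1", "P2", "MI:0407"),  # direct interaction
    Interaction("P2", "P3", "MI:0914"),  # association
    Interaction("P3", "P4", "MI:0915"),  # physical association
]
kept = drop_interaction_types(records, excluded={"MI:0914"})
print([i.type_id for i in kept])
```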

Evaluation Method (DIP)
Confidence Score
DIP:
–dip:0005 (high throughput) → non-core
–dip:0005 (high throughput)|dip:0005 (high throughput) → core
–dip:0002 (small scale) → core
–dip:0004 (small scale) → core
Confidence levels:
–High throughput only → 1 → non-core
–High throughput multi (non small scale) → 2
–Small scale → 3
–One small scale + high throughput → 4
–Two or more small scale → 5
(Table: Confidence, Count)
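The five DIP levels above can be read as a small decision rule over the evidence codes attached to an interaction. The following sketch is one possible reading, assuming the evidence arrives as a `|`-separated field in the notation shown on the slide; the helper names, and the precedence between rules when several apply, are assumptions rather than the authors' implementation.

```python
from typing import List

SMALL_SCALE = {"dip:0002", "dip:0004"}
HIGH_THROUGHPUT = {"dip:0005"}


def parse_codes(field: str) -> List[str]:
    """Split 'dip:0005(high throughput)|dip:0002(small scale)' into bare codes."""
    return [part.split("(")[0].strip() for part in field.split("|") if part.strip()]


def dip_confidence(codes: List[str]) -> int:
    """Map a list of DIP evidence codes to a confidence level from 1 to 5."""
    small = sum(code in SMALL_SCALE for code in codes)
    high = sum(code in HIGH_THROUGHPUT for code in codes)
    if small >= 2:
        return 5  # two or more small-scale experiments
    if small == 1 and high >= 1:
        return 4  # one small-scale plus high-throughput
    if small == 1:
        return 3  # a single small-scale experiment
    if high >= 2:
        return 2  # several high-throughput experiments, no small-scale
    if high == 1:
        return 1  # a single high-throughput experiment
    raise ValueError("no recognised DIP evidence code")


level = dip_confidence(parse_codes("dip:0005(high throughput)|dip:0005(high throughput)"))
print(level)
```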

Evaluation Method (MINT)
Confidence Score
MINT:
–mint-score → 0.0 ~ 1.0
–experimental knowledge based → no mint score
–average mint-score of “MI:0018, two hybrid”
Mint-score bins (upper bounds): ~ 0.32, ~ 0.49, ~ 0.66, ~ 0.83, and the top bin → 5
(Table: Confidence, Count)
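Binning a mint-score into a confidence level can be sketched with a sorted list of upper bounds. The bounds 0.32, 0.49, 0.66 and 0.83 and the top level 5 come from the slide; that the lower bins map to levels 1 through 4 in order, and that each upper bound is inclusive, are assumptions made for this illustration.

```python
from bisect import bisect_left

# Upper bounds of the first four bins, as listed on the slide.
UPPER_BOUNDS = [0.32, 0.49, 0.66, 0.83]


def mint_confidence(score: float) -> int:
    """Map a mint-score in [0.0, 1.0] to a level 1..5 (levels 1-4 are assumed)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("mint-score must lie in [0.0, 1.0]")
    # bisect_left returns the index of the first bound >= score,
    # so a score equal to a bound stays in the lower bin.
    return bisect_left(UPPER_BOUNDS, score) + 1


for s in (0.10, 0.32, 0.40, 0.90):
    print(s, mint_confidence(s))
```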

Evaluation Method (IntAct)
Confidence Score
IntAct:
–Any child of transcriptional complementation assay → high-throughput
–Confidence levels are assigned in the same way as for DIP.
(Table: Confidence, Count)

Domain Appearances
PPIs containing domains that appear very rarely across all protein interactions are restricted.
(Figure: proteins P1, P2 with domains D1, D2, D3, D4)
–If the appearance frequencies of D1, D2, D3 and D4 are all smaller than the threshold, this PPI is removed manually.
–How can we decide the threshold?
–2116 PPI pairs have the PF00012 domain, while only one pair has PF…
–… domains appear in only one pair in the PPI databases.
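The removal rule above can be sketched as follows: count in how many PPI pairs each domain appears, then drop a pair when every domain on both proteins falls below the threshold. The slide describes a manual step, so this automated version is only an illustration. The function names and the placeholder domains `DX` and `DY` are hypothetical, and keeping pairs with no known domains is an assumption.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

PPI = Tuple[str, str]


def pair_domains(pair: PPI, domains_of: Dict[str, List[str]]) -> Set[str]:
    """Collect the distinct domains found on either protein of a pair."""
    a, b = pair
    return set(domains_of.get(a, ())) | set(domains_of.get(b, ()))


def filter_rare_domain_ppis(
    ppis: List[PPI], domains_of: Dict[str, List[str]], threshold: int
) -> List[PPI]:
    """Drop pairs whose domains all appear in fewer than `threshold` pairs."""
    counts: Counter = Counter()
    for pair in ppis:
        counts.update(pair_domains(pair, domains_of))  # one count per pair per domain
    kept = []
    for pair in ppis:
        domains = pair_domains(pair, domains_of)
        # Assumption: pairs without any domain annotation are left untouched.
        if domains and all(counts[d] < threshold for d in domains):
            continue
        kept.append(pair)
    return kept


# Hypothetical sample data; DX and DY stand in for rarely seen domains.
domains_of = {"P1": ["PF00012"], "P2": ["PF00012"], "P3": ["PF00012"],
              "P4": ["DX"], "P5": ["DY"]}
ppis = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
kept_ppis = filter_rare_domain_ppis(ppis, domains_of, threshold=2)
print(kept_ppis)
```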

Domain Appearances
For all PPI pairs (sensitivity, specificity):
–Threshold = 10 (209): 52.1%, 94.3%
–Threshold = 20 (584): 53.5%, 94.4%
–Threshold = 30 (1172): 54.3%, 94.3%
Table: Appearances, Count
<10: 209
<20: 584
<…
<…
<50: 1611

Conclusion
–The PPI databases distributed by different research groups are heterogeneous, and the overlap among them is very small.
–PPI prediction methods based on machine learning are very sensitive to the choice of training set.
–The quality of an integrated PPI database can be managed through confidence leveling.