PPI network construction and false positive detection. Jin Chen, CSE891-002, Fall 2012.

Presentation transcript:


Layout
– Protein-protein interaction (PPI) networks
– PPI network construction
– PPI network false-positive detection

Background
– The study of interactions between proteins is fundamental to understanding biological systems.
– PPIs have been studied through a number of high-throughput experiments.
– PPIs have also been predicted through an array of computational methods that leverage the vast amount of sequence data generated.
– Comparative genomics at the sequence level has indicated that species differences are due more to differences in the interactions between the component proteins than to the individual genes themselves*.
* Valencia A, Pazos F: Computational methods for the prediction of protein interactions. Curr Opin Struct Biol 2002, 12:

PPI at different levels
(Figure from Nidhi et al., DSiMB 2009. Panel labels: 3D structure, protein folding, protein docking, domain.)

PPI at different levels
(Figure credit: Hawoong Jeong.)
– Node: a protein. Every node represents a unique protein.
– Edge: a protein interaction, either a physical interaction or a functional interaction.
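
The node/edge view above maps directly onto an undirected graph. The following is a minimal sketch, assuming interactions arrive as pairs of protein identifiers; the function names and the toy identifiers (`P1` to `P4`) are hypothetical and not part of the slides.

```python
from collections import defaultdict

def build_ppi_network(interactions):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    network = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # this sketch ignores self-interactions
        network[a].add(b)
        network[b].add(a)
    return dict(network)

def degree(network, protein):
    """Number of distinct interaction partners (0 if the protein is absent)."""
    return len(network.get(protein, ()))

# Hypothetical toy data: four proteins, four interactions.
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4")]
toy_network = build_ppi_network(toy_edges)
```

Storing neighbours in sets makes the edge undirected and collapses duplicate reports of the same pair, which is usually what is wanted when several experiments are merged.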

PPI Identification
– The concept of a PPI ranges from direct physical interactions inferred from experimental methods (e.g. yeast two-hybrid) to functional linkages predicted by computational analysis of protein sequences and structures.
– Given the difficulty of identifying PPIs experimentally, a wide range of computational methods have been used to identify functional PPIs.

Domain Fusion
Hypothesis: if domains A and B exist fused in a single polypeptide AB in another organism, then A and B are functionally linked.
Marcotte EM et al. Detecting Protein Function and Protein-Protein Interactions from Genome Sequences. Science, 285(5428)
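
The domain fusion hypothesis can be sketched as a search over domain compositions. This is an illustrative sketch, not the procedure of Marcotte et al.: it assumes each protein has already been annotated with a set of domain identifiers, and all names (`rosetta_stone_links`, `protA`, `fusedAB`, ...) are hypothetical.

```python
from itertools import combinations

def rosetta_stone_links(query_proteins, reference_proteins):
    """Predict functional links between proteins of one genome.

    query_proteins:     dict protein -> set of domain IDs (genome of interest)
    reference_proteins: dict protein -> set of domain IDs (other genomes)
    Returns dict (p, q) -> sorted list of reference proteins in which
    a domain of p and a domain of q occur fused in one polypeptide.
    """
    links = {}
    for p, q in combinations(sorted(query_proteins), 2):
        dom_p, dom_q = query_proteins[p], query_proteins[q]
        if dom_p & dom_q:
            continue  # shared domain: the pair may simply be homologous
        fused = [r for r, dom_r in reference_proteins.items()
                 if dom_p & dom_r and dom_q & dom_r]
        if fused:
            links[(p, q)] = sorted(fused)
    return links

# Hypothetical toy annotation.
toy_query = {"protA": {"A"}, "protB": {"B"}, "protC": {"C"}}
toy_reference = {"fusedAB": {"A", "B"}, "refC": {"C"}}
toy_links = rosetta_stone_links(toy_query, toy_reference)
```

Here only `protA` and `protB` are linked, because `fusedAB` is the only reference protein that carries a domain from each of them.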

Domain Fusion
– Including eukaryotic sequences increased the robustness of domain fusion predictions*.
– Eukaryotic cells, with their larger volume, cannot afford to keep A and B as separate proteins: the concentrations of A and B required to reach the same equilibrium concentration of AB would be prohibitively high.
– Limitation: low coverage.
* Veitia RA: Rosetta Stone proteins: "chance and necessity"? Genome Biol 2002, 3(2):interactions

Conserved Neighborhood
Hypothesis: if the genes that encode two proteins are neighbors on the chromosome in several genomes, the corresponding proteins are likely to be functionally linked.
Dandekar T et al. Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochemical Sci 1998, 23(9):

Conserved Neighborhood
– The method has been reported to identify high-quality functional relationships.
– The method suffers from low coverage, because of its dual requirement: orthologues must first be identified in another genome, and those orthologues must then be adjacent on the chromosome.
Marcotte EM: Computational genetics: finding protein function by nonhomology methods. Curr Opin Struct Biol 2000, 10:
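
The conserved-neighborhood test reduces to counting, across genomes, how often two orthologue groups sit next to each other. The sketch below assumes genes have already been mapped to orthologue-group labels and that each genome is a single linear chromosome given as an ordered list; both are simplifying assumptions, and all names are hypothetical.

```python
from collections import Counter

def adjacent_pairs(gene_order):
    """Unordered pairs of orthologue groups that are adjacent in one genome."""
    return {frozenset(pair)
            for pair in zip(gene_order, gene_order[1:])
            if pair[0] != pair[1]}

def conserved_neighbors(genomes, min_genomes=2):
    """Pairs adjacent in at least `min_genomes` genomes, with their counts."""
    counts = Counter()
    for gene_order in genomes.values():
        counts.update(adjacent_pairs(gene_order))  # each genome votes once per pair
    return {tuple(sorted(pair)): n
            for pair, n in counts.items() if n >= min_genomes}

# Hypothetical toy gene orders (labels are orthologue groups).
toy_genomes = {
    "genome1": ["a", "b", "c", "d"],
    "genome2": ["b", "a", "x", "d", "c"],
    "genome3": ["c", "a", "b"],
}
toy_conserved = conserved_neighbors(toy_genomes)
```

The low coverage noted on the slide shows up directly: a pair is only scored if both orthologues are present and adjacent in enough genomes, so most pairs never accumulate any count.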

Phylogenetic Profiles
– Hypothesis: functionally linked proteins co-occur across genomes.
– The phylogenetic profile of a protein can be represented as a 'bit string' encoding the presence or absence of the protein in each of the genomes considered.
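
The bit-string representation can be sketched in a few lines. The example assumes that presence or absence of each protein per genome is already known (in practice this comes from homology searches); the genome and protein names are hypothetical, and Hamming distance is just one possible way to compare profiles.

```python
def phylogenetic_profile(genomes_with_protein, all_genomes):
    """Bit string: '1' if the protein is present in a genome, else '0'."""
    return "".join("1" if g in genomes_with_protein else "0"
                   for g in all_genomes)

def hamming_distance(profile_a, profile_b):
    """Number of genomes in which the two profiles disagree."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(profile_a, profile_b))

# Hypothetical toy presence/absence data over four genomes.
toy_genome_list = ["G1", "G2", "G3", "G4"]
toy_presence = {
    "P1": {"G1", "G3"},
    "P2": {"G1", "G3"},
    "P3": {"G2", "G3", "G4"},
}
toy_profiles = {name: phylogenetic_profile(found_in, toy_genome_list)
                for name, found_in in toy_presence.items()}
```

Proteins with identical or near-identical profiles (distance 0 for `P1` and `P2` here) are the candidates for a functional link under the hypothesis.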

Co-evolution
– Hypothesis: co-evolution requires the existence of mutual selective pressure on two or more species.
– The in silico two-hybrid (i2h) method has been proposed, based on the study of correlated mutations in multiple sequence alignments.
(Figure labels: protein family A, protein family B.)
Pazos F et al: In silico Two-Hybrid System for the Selection of Physically Interacting Protein Pairs. Proteins 2002, 47:

Software: Protein Link Explorer (PLEX)
Date, S.V. and E.M. Marcotte, Protein function prediction using the Protein Link EXplorer (PLEX). Bioinformatics, (10): p

Biological Problem → Algorithm → Knowledge
1. Biological hypothesis
2. Mathematical representation
3. Algorithm design
4. Biological verification

High-throughput PPI Detection
Boom in biotechnology
– Yeast two-hybrid / split-ubiquitin system
– Mass spectrometry
– Protein microarrays
– etc.
Limitations of computational prediction
– Low coverage
– Locally optimized (pair-wise)
– Super-high negative PPI rates

Yeast Two-Hybrid
– Two hybrid proteins are generated, each fused to a transcription factor domain: the bait protein to the DNA-binding domain and the prey protein to the activation domain.
– Both fusions are expressed in a yeast cell that carries a reporter gene whose expression is under the control of binding sites for the DNA-binding domain.
(Figure labels: reporter gene; bait protein with binding domain; prey protein with activation domain.)

Yeast Two-Hybrid
– Interaction of the bait and prey proteins localizes the activation domain to the reporter gene, thus activating transcription.
– Since the reporter gene typically codes for a survival factor, yeast colonies will grow only when an interaction occurs.
(Figure labels: reporter gene; bait protein with binding domain; prey protein with activation domain.)

Mating-based Split-ubiquitin System
(Figure from Lalonde S et al. Plant J 2008.)

Yeast Cell Growth Rate
(Figure: trends in yeast cell growth over time; axis label: biomass.)

PPI Databases
– STRING: PPIs derived from high-throughput experimental data, mining of databases and literature, analyses of co-expressed genes, and computational predictions.
– HPRD: Human Protein Reference Database. It integrates information relevant to the function of human proteins in health and disease.
– DIP: experimentally derived PPIs with assessments. DIP is generally considered a valuable benchmark for verifying the performance of any new PPI prediction method.
– Many others: MIPS, YGD, BIND, TAIR…

False-Positive Detection in PPI Networks
Background: PPI networks generated with high-throughput methods contain a sizeable number of false positives, and their reproducibility is not satisfactory*.
Central to the understanding of PPI is the definition of "interaction" itself
– Binding energy / Interaction / Complex
– We need to define what we mean by interaction
* von Mering Comparative assessment of large-scale data sets of protein-protein interactions. Nature ;417(6887):

Useful Data for False-Positive Detection
– Functional and localization data (Gene Ontology)
– Indirect high-throughput data (gene and protein expression)
– Sequence-related data (protein domains (domain fusion), interologs)
– Structure data (protein 3D structure)
– Network topological features (connectivity, network motifs)

Different Hypotheses for Different Data
Data: example of hypothesis
– Gene Ontology: two proteins that share a similar annotation are more likely to interact than proteins with different or null annotations.
– Gene expression: two proteins whose genes have similar expression patterns are more likely to interact.
– Domain interaction: if two domains are often found in PPIs, two proteins containing those domains are more likely to interact.
– PPI network topological analysis: PPIs whose topology fits the spoke or matrix model are more likely to be true.
Other hypotheses include synthetic lethality, interologs, linear motifs, etc.
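
As one illustration of these hypotheses, the gene expression row can be sketched as a co-expression filter. The sketch assumes each protein has an expression vector measured over the same conditions and uses Pearson correlation with an arbitrary threshold; the threshold, the function names and the toy values are all assumptions, not taken from the slides.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)

def filter_by_coexpression(edges, expression, threshold=0.5):
    """Keep edges whose two proteins have correlated expression.

    Edges lacking expression data for either protein are dropped here;
    keeping them as 'unknown' would be an equally valid design choice.
    """
    return [(a, b) for a, b in edges
            if a in expression and b in expression
            and pearson(expression[a], expression[b]) >= threshold]

# Hypothetical toy expression profiles over four conditions.
toy_expression = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.0, 4.0, 6.0, 8.5],
    "P3": [4.0, 3.0, 2.0, 1.0],
}
toy_kept = filter_by_coexpression([("P1", "P2"), ("P1", "P3")], toy_expression)
```

The same pattern (score each edge with an independent data source, then threshold) applies to the Gene Ontology and domain rows with a different scoring function.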

Gold Standard for PPI Networks
– Used for algorithm evaluation and comparison
– Used as positive training data when training a model
– Sources: manually annotated databases such as DIP, and interactions from low-throughput experiments
– A true negative set is equally important. Are the two proteins co-localized? If not, the pair is a candidate negative.
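
How a gold standard is used for evaluation can be sketched as follows. The example assumes both a positive and a negative gold set are available and scores only predictions that fall in one of them; pairs with no label are ignored rather than counted as errors. All names and toy pairs are hypothetical.

```python
def as_pair_set(pairs):
    """Treat (a, b) and (b, a) as the same undirected interaction."""
    return {frozenset(pair) for pair in pairs}

def evaluate(predicted, gold_positive, gold_negative):
    """Precision and recall of predicted PPIs against a gold standard."""
    pred = as_pair_set(predicted)
    pos = as_pair_set(gold_positive)
    neg = as_pair_set(gold_negative)
    tp = len(pred & pos)   # predicted and known to be true
    fp = len(pred & neg)   # predicted but known to be false
    fn = len(pos - pred)   # known to be true but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical toy data; ("E", "F") is unlabelled and therefore not scored.
toy_predicted = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
toy_positive = [("B", "A"), ("C", "D"), ("A", "D")]
toy_negative = [("B", "C")]
toy_precision, toy_recall = evaluate(toy_predicted, toy_positive, toy_negative)
```

Without the negative set, every unlabelled prediction would have to be counted either as wrong or not at all, which is why the slide stresses that true negatives matter as much as true positives.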

Estimate PPI Network Reliability
An overall index of reliability for a PPI network.

Estimate PPI Network Reliability
"Capture-recapture" model: reaching back to the raw counts of observed bait–prey clones from yeast two-hybrid experiments.
Huang et al. Where Have All the Interactions Gone? Estimating the Coverage of Two-Hybrid Protein Interaction Maps. PLoS Computational Biology 2007
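
To give a feel for the capture-recapture idea, the sketch below uses the classical two-sample Lincoln-Petersen estimator on two independent screens. This is a deliberately simpler stand-in, not the model of Huang et al., which works from repeated bait–prey clone counts; it also assumes the two screens sample the same interactome independently and without false positives. All names and toy pairs are hypothetical.

```python
def as_pair_set(pairs):
    """Treat (a, b) and (b, a) as the same undirected interaction."""
    return {frozenset(pair) for pair in pairs}

def lincoln_petersen(screen_a, screen_b):
    """Estimate the total number of detectable interactions.

    N is approximated by |A| * |B| / |A intersect B|: the fraction of
    screen B that was 'recaptured' from screen A estimates the fraction
    of all interactions that screen A found.
    """
    a, b = as_pair_set(screen_a), as_pair_set(screen_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("no shared interactions: estimate is undefined")
    return len(a) * len(b) / overlap

# Hypothetical toy screens sharing two interactions.
toy_screen_a = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
toy_screen_b = [("B", "A"), ("C", "D"), ("E", "F")]
toy_total = lincoln_petersen(toy_screen_a, toy_screen_b)
toy_coverage = len(as_pair_set(toy_screen_a) | as_pair_set(toy_screen_b)) / toy_total
```

A small overlap between two large screens implies a large unseen interactome, which is the intuition behind estimating coverage from raw observation counts.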

PPI Filtering
– Goal: to identify reliable protein complexes from two existing mass spectrometry (MS) datasets.
– The data are analyzed with a purification enrichment (PE) scoring system.
– Judged against gold standard PPIs, the consolidated dataset is more accurate than the original sets and is comparable to PPIs defined using more conventional small-scale methods.
Collins et al. Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol Cell Proteomics Mar;6(3):439-50

PPI Filtering
– e = 0 → no evidence for or against the validity of a particular interaction was collected.
– There are two types of observations: bait-prey observations and prey-prey observations.
– i and j are two proteins (bait and prey); k indicates a distinct purification.
– M_ij measures indirect evidence due to the co-occurrence of proteins i and j as preys in the same purifications.

PPI Filtering
– r is the probability that a true association will be preserved and detected in a purification experiment.
– p_ijk is the probability that a bait-prey pair will be observed for nonspecific reasons.
– n_ik^prey is the number of preys identified in purification k with bait i.
– n_i^bait is the number of times protein i was used as bait.
– f_j is an estimate of the nonspecific frequency of occurrence of prey j in the dataset.
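
The transcript does not preserve the formulas that go with these symbols, so the following is only a sketch of how r and p_ijk could be combined into a per-observation score. It assumes a log-odds form: a true pair is observed with probability r + (1 - r) * p, a false pair with probability p, and the score is the base-10 log of their ratio. That form, and the function name, are assumptions made for illustration.

```python
from math import log10

def observation_score(r, p):
    """Log-odds evidence from one bait-prey observation (assumed form).

    r: probability that a true association is preserved and detected
    p: probability that the pair is observed for nonspecific reasons
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    # P(observed | true pair) = r + (1 - r) * p ; P(observed | false pair) = p
    return log10((r + (1.0 - r) * p) / p)

# Hypothetical values: a rarely seen prey gives stronger evidence
# than a 'sticky' prey that turns up in many purifications.
specific = observation_score(0.5, 0.01)
sticky = observation_score(0.5, 0.5)
```

Under this assumed form the score is 0 when an observation is uninformative (r = 0 or p = 1), which is consistent with the reading that a value of 0 means no evidence for or against an interaction.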


PPI Filtering
PPI topological analysis
– The first student presentation is about a topological measure called "FS-weight", which was compared with other topological measures.
– Suitable for large PPI networks rather than preliminary networks.

"Most good programmers do programming not because they expect to get paid or get adulation by the public, but because it is fun to program." - Linus Torvalds