Discovering the Correlation Between Evolutionary Genomics and Protein-Protein Interaction
Rezaul Kabir and Brett Thompson

Department of Computer Science, University of North Texas

Introduction

Identifying the protein interactions responsible for cellular operations has become one of the main goals of proteomics and computational biology. Predicting protein-protein interactions is a computationally intensive problem in bioinformatics. Evolutionary genomics comparisons have shown that the number of protein-protein interactions a protein has correlates negatively with its rate of evolution. New and powerful experimental techniques are now used to discover these interaction networks. Because current methods are still far from perfect, we present some of the effective techniques in use today and describe how they will help researchers and scientists uncover the evolution of genomes.

Fig. 2: Scheme of the mirror tree method. The method reduces the initial multiple sequence alignments of the two proteins so that only sequences from the same species remain. The trees constructed from these reduced alignments therefore have the same number of leaves and the same species at the leaves. From the reduced alignments, matrices are constructed that contain the average homology for every possible pair of proteins. These matrices contain the structure of the phylogenetic tree. Finally, a linear correlation coefficient evaluates the similarity between the two matrices and, implicitly, the similarity between the two trees [1].

Second Methodology: The Use of Phylogenetic Trees as Indicators

One way to search for interaction partners among the proteins of a large collection of complete genomes is to compare the evolutionary distances between the sequences of the associated protein families.
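The scheme in the Fig. 2 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from [1]: the function names, the nested-dictionary layout for the distance matrices, and the toy distances are all made up for the example, and building the distance matrices from real alignments is left out.

```python
from math import sqrt


def pearson(xs, ys):
    """Linear (Pearson) correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


def symmetric(pairs):
    """Build a nested dict dist[s][t] from {(s, t): distance} (hypothetical helper)."""
    dist = {}
    for (s, t), d in pairs.items():
        dist.setdefault(s, {})[t] = d
        dist.setdefault(t, {})[s] = d
    return dist


def pairwise_vector(dist, species):
    """Flatten the upper triangle of a distance matrix in a fixed species order."""
    return [dist[s][t] for i, s in enumerate(species) for t in species[i + 1:]]


def mirror_tree_score(dist_a, dist_b):
    """Correlate two families' distance matrices over their shared species."""
    # Reduction step: keep only species present in both families.
    species = sorted(set(dist_a) & set(dist_b))
    if len(species) < 3:
        raise ValueError("need at least three shared species")
    return pearson(pairwise_vector(dist_a, species),
                   pairwise_vector(dist_b, species))


# Toy distances (invented for illustration, not real data).
FAMILY_A = symmetric({
    ("human", "mouse"): 0.1, ("human", "yeast"): 0.6, ("mouse", "yeast"): 0.5,
    ("human", "ecoli"): 0.9, ("mouse", "ecoli"): 0.9, ("yeast", "ecoli"): 0.8,
})
FAMILY_B = symmetric({
    ("human", "mouse"): 0.2, ("human", "yeast"): 1.2, ("mouse", "yeast"): 1.0,
    ("human", "fly"): 0.7, ("mouse", "fly"): 0.7, ("yeast", "fly"): 1.1,
})

if __name__ == "__main__":
    print(mirror_tree_score(FAMILY_A, FAMILY_B))
```

In this toy case the shared species are human, mouse, and yeast, and family B's distances are exactly twice family A's, so the score comes out close to 1.0. A high score would be read as a hint of co-evolution, not as proof of a physical interaction.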
The comparison of evolutionary distances rests on the observed correspondence between the phylogenetic trees of associated proteins. The mirror tree method measures the similarity between two trees as the correlation between the distance matrices used to build them. It assumes that functionally related proteins evolve in a correlated fashion (Fig. 2). Using Pearson's correlation coefficient computed from the phylogenetic trees, the mirror tree method evaluates how strongly two proteins are correlated [1].

References

[1] Pazos, F. and Valencia, A. Similarity of phylogenetic trees as indicator of protein-protein interaction. Protein Eng., 14(9):609–614.
[2] Gong, S., Park, C., Choi, H., Ko, J., Jang, I., Lee, J., Bolser, D., Oh, D., Kim, D., and Bhak, J. Using InterPare, a protein domain interaction interface database, to identify and classify protein interaction interfaces.
[3] Toh, H. and Kanehisa, M. Predicting protein-protein interaction from phylogenetic trees using the partial correlation coefficient.
[4] Fraser, H., Hirsh, A., Steinmetz, L., Scharfe, C., and Feldman, M. Evolutionary rate in the protein interaction network. Science, 296.
[5] Fraser, H., Wall, D., and Hirsh, A. A simple dependence between protein evolution rate and the number of protein-protein interactions. BMC Evolutionary Biology, 3:11, 2003.

Final Thoughts and Conclusion

Evolutionary genomics comparisons have shown that the number of protein-protein interactions a protein has correlates negatively with its rate of evolution [4]. To establish the correlation between evolutionary genomics and protein-protein interactions, several data sets must be compiled, because the correlation cannot be demonstrated with a small set of protein-protein interactions [5].
With a large interaction data set, researchers are able to assess the quality of the data statistically, through the correlation between protein interaction and evolutionary rate, using simple genomic sequence comparisons.

First Methodology: Using Phylogenetic Trees to Predict Protein Interaction with the Partial Correlation Coefficient

A newer method predicts protein-protein interactions from evolutionary information using the partial correlation coefficient. It extracts direct protein interactions, unlike Pearson's correlation coefficient, which does not distinguish direct from indirect interactions between proteins. The partial correlation coefficient is computed from a comparison of the phylogenetic trees of the proteins and is used to predict physical protein interactions [3].

Fig. 1: Accuracy of Pearson's correlation coefficient and the partial correlation coefficient over the five top-ranking predictions. In column 1, Pearson's correlation coefficient has an accuracy of 20% (1/5). In column 2, the partial correlation coefficient has an accuracy of 80% (4/5) [3].

Fig. 3 (left): Protein structure shown by geometrical region. This is an example 3D structure (SCOP id: d1a25a_) that corresponds to a schematic diagram. It shows the three areas of a domain (red: protein surface, blue: protein interior, space-filling model: interaction interface). Interface regions are drawn as a space-filling model to distinguish them from the other regions [2].

Fig. 4 (bottom): Diagram of the interior, interface, and surface in a longitudinal section of a protein domain [2].

Third Methodology: Identifying and Classifying Protein Interaction Interfaces with InterPare

InterPare is a large-scale protein domain interaction interface database. The interfaces it covers are both inter-chain (between chains) and intra-chain (within chains).
InterPare uses three methods to detect protein-protein interactions: the geometric distance method (PSIMAP), accessible surface area (ASA), and the Voronoi diagram. Visual tools display the protein interior, surface, and interaction interfaces, along with statistics on the amino acid propensities of a queried protein for its interior, surface, and interface regions (Fig. 3, left). InterPare makes searching for and looking up protein-protein interactions easy and convenient [2].
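To make the idea of a geometric distance criterion concrete, the sketch below marks residues of two domains as interface residues when any of their atoms lie within a distance cutoff of each other. It is a generic illustration only: the 5.0 Å default cutoff, the data layout, the function name, and the toy coordinates are assumptions for this example, and the exact criteria used by PSIMAP and InterPare are not given in this text. The ASA and Voronoi approaches are not covered.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def interface_residues(domain_a, domain_b, cutoff=5.0):
    """Return (residues of A, residues of B) that lie within `cutoff` of the other domain.

    Each domain is a dict mapping a residue id to a list of (x, y, z)
    atom coordinates. The cutoff value is an assumption, not a published
    InterPare parameter.
    """
    in_a, in_b = set(), set()
    for res_a, atoms_a in domain_a.items():
        for res_b, atoms_b in domain_b.items():
            # A residue pair is in contact if any atom pair is close enough.
            if any(dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                in_a.add(res_a)
                in_b.add(res_b)
    return in_a, in_b


# Toy coordinates (invented for illustration, not taken from a real structure).
DOMAIN_A = {
    "A1": [(0.0, 0.0, 0.0)],
    "A2": [(20.0, 0.0, 0.0)],
}
DOMAIN_B = {
    "B1": [(3.0, 0.0, 0.0)],
    "B2": [(40.0, 0.0, 0.0)],
}

if __name__ == "__main__":
    print(interface_residues(DOMAIN_A, DOMAIN_B))
```

Here only A1 and B1 are within the cutoff of each other (3 Å apart), so they are the only residues reported as interface. The all-pairs loop is quadratic in the number of atoms; a real tool working on full structures would more likely use a spatial index.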