Use of Logic Relationships to Decipher Protein Network Organization Peter M. Bowers, Shawn J. Cokus, David Eisenberg, Todd O. Yeates Presented by Krishna.


Use of Logic Relationships to Decipher Protein Network Organization Peter M. Bowers, Shawn J. Cokus, David Eisenberg, Todd O. Yeates Presented by Krishna Balasubramanian

2 Contents
Introduction
Background
Method Used - LAPP
Results
Observations
Conclusion
Future Work

3 Introduction
Major focus of genome research:
  – Deciphering networks of molecular interactions underlying cellular function.
Developed a computational approach:
  – Identify detailed relationships between proteins based on genomic data.
The method reveals many previously unidentified higher-order relationships.

4 Background
Patterns across multiple complete genomes have been used to infer biological interactions and functional linkages between proteins:
  – Two distinct proteins from one organism genetically fused into a single protein in another organism.
  – Tendency of two proteins to occur in chromosomal proximity across multiple organisms.
  – Phylogenetic profile approach:
      – Detects functional relationships between proteins exhibiting statistically similar patterns of presence or absence.
      – Determines the pattern describing a protein's presence or absence by searching for its homologs across N organisms.
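The phylogenetic profile idea can be illustrated with a minimal sketch. The organism names, family names, and homolog hits below are hypothetical placeholders, not data from the paper; the only assumption is that a profile is a 0/1 vector with one entry per organism.

```python
# Hypothetical homolog search results: protein family -> organisms with a detected homolog.
organisms = ["org1", "org2", "org3", "org4", "org5"]
homolog_hits = {
    "famA": {"org1", "org2", "org5"},
    "famB": {"org1", "org2", "org5"},
    "famC": {"org3", "org4"},
}


def phylogenetic_profile(hits, organisms):
    """Return a binary vector: 1 if the family has a homolog in the organism, else 0."""
    return [1 if org in hits else 0 for org in organisms]


profiles = {fam: phylogenetic_profile(hits, organisms)
            for fam, hits in homolog_hits.items()}

for fam, profile in profiles.items():
    print(fam, profile)
```

In this toy data famA and famB share an identical profile (the classic pairwise "link"), while famC is present exactly where famA is absent.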

5 Background
Original implementations sought to infer "links" between pairs of proteins with similar profiles.
A subsequent variation on that idea linked proteins if their profiles represented the negation of each other.
These are simple notions, with the presence of one protein implying the presence or absence of another.
Such simple relationships cannot adequately describe the full complexity of cellular networks that involve branching, parallel, and alternate pathways.
Higher-order logic relationships involving a pattern of presence/absence of multiple proteins are expected due to:
  – Observed complexity of cellular networks.
  – Evolutionary divergence, convergence, and horizontal transfer events.

6 Method - LAPP
Perform a complete analysis of the logic relations possible between triplets of phylogenetic profiles.
Demonstrate the power of the resulting logic analysis of phylogenetic profiles (LAPP) to:
  – Illuminate relationships among multiple proteins.
  – Infer the coarse function of large numbers of uncharacterized protein families.

7 Logical Relationships to determine presence/absence of Proteins
Venn diagrams and logic statements show the 8 distinct kinds of logic functions that describe the possible dependence of the presence of protein C on the presence of A and B, jointly.
Logic functions are grouped together if they are related by a simple exchange of proteins A and B.

8 Logical Relationships to determine presence/absence of Proteins
There are 8 possible logic relationships combining two phylogenetic profiles to match a third profile.
E.g. 1: protein C might be present if and only if proteins A and B are both present.
  – The function of protein C is necessary only when the functions of proteins A and B are both present.
E.g. 2: gene C may be present if and only if either A or B is present.
  – Different organisms use two different protein families in combination with a common third protein to accomplish some task.
Several of the eight possible logic relationships are intuitively understood to describe commonly observed biological scenarios.
However, a few of the logic relationships are not easily related to real biological situations.
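The count of eight can be checked with a short enumeration. This is a sketch under two assumptions that the slides imply but do not spell out: only Boolean functions that genuinely depend on both A and B are counted, and functions related by exchanging A and B are merged into one kind. It does not reproduce the paper's type numbering.

```python
from itertools import product

# All (a, b) input combinations, in a fixed order: 00, 01, 10, 11.
INPUTS = list(product([0, 1], repeat=2))


def depends_on_both(table):
    """True if the truth table changes with A for some B and with B for some A."""
    f = dict(zip(INPUTS, table))
    dep_a = any(f[(0, b)] != f[(1, b)] for b in (0, 1))
    dep_b = any(f[(a, 0)] != f[(a, 1)] for a in (0, 1))
    return dep_a and dep_b


def swap_inputs(table):
    """Truth table of g(a, b) = f(b, a), i.e. proteins A and B exchanged."""
    f = dict(zip(INPUTS, table))
    return tuple(f[(b, a)] for a, b in INPUTS)


# Group each qualifying function with its A/B-swapped partner.
classes = set()
for table in product([0, 1], repeat=4):
    if depends_on_both(table):
        classes.add(frozenset({table, swap_inputs(table)}))

print(len(classes))
```

Of the 16 Boolean functions of two inputs, 6 ignore at least one input; the remaining 10 collapse to 8 classes once the two asymmetric pairs are merged.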

9 Examples of LAPP based on Phylogenetic Profiles
[Figure panels: Phylogenetic Profiles; Biological examples of LAPP]

10 Examples of LAPP (cont'd)
Hypothetical phylogenetic profiles are used to illustrate the eight possible logic functions.
Real biological examples are shown to illustrate the ternary relationships identified from actual phylogenetic profiles for the 4 most commonly observed logic types.

11 Identifying Protein Triplets
Created a set of binary-valued vectors describing the presence or absence of each of the known protein families across 67 fully sequenced organisms.
Categorized the complete set of proteins into 4873 distinct families called clusters of orthologous groups (COGs).
Examined all triplet combinations of profiles and rank-ordered them according to how well the logical combination f(a,b) of two profiles predicted a third profile, c.
Required that neither profile a nor b alone was predictive of c.

12 Identifying Protein Triplets
Uncertainty coefficients calculated for U(c|a), U(c|b), and the logically combined profile U(c|f(a,b)):
  – U(x|y) = [H(x) + H(y) – H(x,y)] / H(x)
  – H is the entropy of the individual or joint distributions.
U can range between 1.0, where x is a deterministic function of y, and 0.0, where x is completely independent of y.
Selected triplets whose individual pairwise uncertainty scores described protein profile c poorly [U(c|a) < 0.3 and U(c|b) < 0.3] but whose logically combined profile [U(c|f(a,b)) > 0.6] described c well.
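The uncertainty coefficient and the selection thresholds can be sketched directly from the formula on this slide. The helper names (`entropy`, `uncertainty`, `passes_screen`) and the toy profiles are illustrative assumptions, as is the convention of returning 0 when H(x) is zero; this is not the authors' code.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((k / n) * log2(k / n) for k in Counter(values).values())


def uncertainty(x, y):
    """U(x|y) = [H(x) + H(y) - H(x, y)] / H(x)."""
    h_x = entropy(x)
    if h_x == 0:
        return 0.0  # assumed convention: a constant profile carries no information
    h_xy = entropy(list(zip(x, y)))  # joint entropy of the paired values
    return (h_x + entropy(y) - h_xy) / h_x


def passes_screen(a, b, c, f, low=0.3, high=0.6):
    """Neither a nor b alone predicts c, but f(a, b) does (thresholds from the slides)."""
    combined = [f(ai, bi) for ai, bi in zip(a, b)]
    return (uncertainty(c, a) < low
            and uncertainty(c, b) < low
            and uncertainty(c, combined) > high)


# Toy profiles over 8 hypothetical organisms; c is present when exactly one of a, b is.
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
c = [0, 1, 1, 0, 0, 1, 1, 0]


def exactly_one(p, q):
    return p ^ q


print(uncertainty(c, a), uncertainty(c, b), passes_screen(a, b, c, exactly_one))
```

In this toy case each single profile tells nothing about c, while the combined profile determines it completely, which is the kind of triplet the screen is designed to keep.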

13 Example
Synthesis of aromatic amino acids proceeds through the shikimate pathway.
Logic analysis of 5 participating proteins shows:
  – Shikimate can be converted to the end product prephenate by one of two possible routes, leading to a type 7 logic relationship.
Figure: Example showing triplet and pairwise uncertainty coefficients, U.

14 Results
When either one shikimate kinase protein family (protein A, COG1685) or an alternate shikimate kinase protein family (protein B, COG0703) is present in an organism, then 5-enolpyruvylshikimate-3-phosphate (EPSP) synthase must also be present (protein C, COG0128) (U ) to carry out the subsequent enzymatic step.
The same type 7 logic relationship is also observed between the alternate shikimate kinase enzymes and the successive chorismate synthase (protein D, COG0082) and chorismate mutase (protein E, COG1605) enzymatic steps of the pathway.
The ordering of the metabolic steps that follow shikimate kinase is predicted by the value of successive U coefficients: EPSP synthase (second step, U ) is most strongly linked to shikimate kinase, followed directly by chorismate synthase (third step, U ) and lastly by chorismate mutase (fourth step, U ).

15 Results (cont'd)
Organisms synthesize chorismate and prephenate from shikimate with the use of only one of two possible alternate routes: pathways consisting of either ordered enzymes A-C-D-E or enzymes B-C-D-E.
LAPP recovers 750,000 previously unknown relationships among protein families (U(c|f(a,b)) > 0.60; U(c|a) < 0.30; U(c|b) < 0.30).
Validity was assessed by comparing known annotations of the linked proteins.
The ability to recover links between proteins annotated as belonging to a major functional category has been used widely to corroborate computational inferences of protein interactions.

16 Observations
One of the most frequently observed triplet relationships relates three proteins belonging to the cell motility category, confirming that the triplet associations link proteins closely related in function.
Other triplets involve two proteins from the motility category and a third protein of another COG category, producing recognizable horizontal and vertical bands in the histogram.
E.g. the category combinations NNU (COG category U, intracellular trafficking and secretion) and NNS (COG category S, unknown function) are also plentiful.
Connections between these categories make intuitive sense and facilitate placement of unannotated proteins within the context of specific cellular networks of interacting proteins.
Figure: Section taken from a 3-D histogram that describes the frequency of observed logic relationships in which protein A of the triplet is annotated as belonging to the COG functional category N, cell motility.

17 Observations
LAPP leads to a set of statistically significant ternary relationships that are distinct from, and more numerous than, those inferred using traditional pairwise analysis.
A matrix of randomized phylogenetic profiles, containing the same individual and pairwise distributions as the native profiles, was used to assess the probability of observing a given uncertainty coefficient score by chance.
Triplets with U > 0.60 are observed from the unshuffled vectors ~10^2 times more frequently than from shuffled profiles, and ~10^4 times more frequently when U >
Figure: Plot of the cumulative number of protein triplets recovered at an uncertainty coefficient score greater than a given threshold.

18 Observations (cont'd)
A P value was calculated for each triplet relationship by enumerating all possible values of U that could be obtained from shuffled profiles while maintaining the individual and pairwise distributions.
P = number of trials that exceed the observed value of U divided by the total number of trials.
More than 98% of the identified triplets (U > 0.6) have P < 0.05, and more than 75% of the identified triplets have P <
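The P value definition above can be sketched as a Monte Carlo permutation test. This is a deliberately simplified assumption: it shuffles only profile c, which preserves c's individual distribution but not the pairwise distributions that the procedure on this slide maintains, and it samples random shuffles rather than enumerating all of them. The function names and toy profiles are hypothetical.

```python
import random
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((k / n) * log2(k / n) for k in Counter(values).values())


def uncertainty(x, y):
    """U(x|y) = [H(x) + H(y) - H(x, y)] / H(x); 0.0 if x is constant (assumed)."""
    h_x = entropy(x)
    if h_x == 0:
        return 0.0
    return (h_x + entropy(y) - entropy(list(zip(x, y)))) / h_x


def permutation_p_value(c, combined, trials=1000, seed=0):
    """Fraction of shuffles of c whose U(c|combined) reaches the observed value.

    Simplification: only c is shuffled, so pairwise distributions are NOT preserved.
    Ties with the observed value are counted as hits (an assumed convention).
    """
    rng = random.Random(seed)
    observed = uncertainty(c, combined)
    shuffled = list(c)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if uncertainty(shuffled, combined) >= observed - 1e-12:
            hits += 1
    return hits / trials


# Toy case over 16 hypothetical organisms: c matches the combined profile exactly.
combined = [0, 1, 1, 0] * 4
c = list(combined)
p = permutation_p_value(c, combined)
print(p)
```

A faithful implementation would need a shuffling scheme that also holds the pairwise statistics fixed, which is considerably more involved than this sketch.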

19 Observations
The 8 distinct logic types occur with widely varying frequencies within the set of significant ternary relationships.
This is consistent with our understanding of evolution and biological relationships.
Logic types 1, 3, 5, and 7 are observed frequently in the biological data.
Logic types 2, 4, and 8 are more difficult to relate to simple cellular logic and are observed only rarely.
Figure: Number of identified triplets (U > 0.6) for each of the eight logic function types for randomized (black) and real (gray) phylogenetic profiles.

20 Observations
Figure: The 50 highest-scoring relationships (U > 0.75) involving proteins from the cell motility and intracellular trafficking and secretion functional categories.

21 Observations (cont'd)
Cell motility proteins are colored light blue, intracellular trafficking and secretion proteins are colored magenta, and proteins annotated as both are colored orange.
Edges are shown between proteins A-C and B-C of each logic triplet, with each edge labeled according to the logic function type used to associate the protein families.
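The edge construction described on this slide can be sketched as follows. The triplet tuples and family names are hypothetical placeholders, and the `(a, b, c, logic_type)` layout is an assumption chosen for illustration.

```python
# Hypothetical logic triplets: (protein A, protein B, protein C, logic function type).
triplets = [
    ("famA1", "famB1", "famC1", 7),
    ("famA2", "famB2", "famC1", 1),
]


def triplet_edges(triplets):
    """Expand each triplet into two labeled edges: A-C and B-C."""
    edges = []
    for a, b, c, logic_type in triplets:
        edges.append((a, c, logic_type))
        edges.append((b, c, logic_type))
    return edges


edges = triplet_edges(triplets)
for source, target, logic_type in edges:
    print(f"{source} -- {target} [type {logic_type}]")
```

Each triplet contributes exactly two edges, so a protein C that appears in several triplets becomes a hub connecting the corresponding A and B families.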

22 Observations (cont'd)
The proteins linked include adhesin proteins necessary for bacterial pathogenesis, chemotaxis proteins, and translocase proteins.
The network contains previously unknown interactions that suggest mechanisms connecting bacterial pathogenesis and chemotaxis.
CheZ, a chemotaxis dephosphorylase that regulates cell motility, is linked to the surface receptor and virulence factors adhesin AidA and Flp pilus-associated FimT.

23 Conclusion
New higher-order protein associations detected by LAPP provide a framework to understand the complex logical dependencies that relate proteins to one another in the cell.
Also useful in:
  – Modeling and engineering biological systems
  – Generating biological hypotheses for experimentation
  – Investigating additional protein properties

24 Future Work
In all likelihood, logic relationships between proteins in the cell extend beyond ternary relationships to include much larger sets of proteins.
Ideas underlying the logical analysis of phylogenetic profiles can be extended to the investigation of other kinds of genomic data:
  – Gene expression
  – Nucleotide polymorphism
  – Phenotype data

25 Questions??