Published by Lewis Parrish; modified over 9 years ago.
1
Use of Logic Relationships to Decipher Protein Network Organization
Peter M. Bowers, Shawn J. Cokus, David Eisenberg, Todd O. Yeates
Presented by Krishna Balasubramanian
2
Contents
- Introduction
- Background
- Method Used - LAPP
- Results
- Observations
- Conclusion
- Future Work
3
Introduction
- Major focus of genome research:
  - Deciphering the networks of molecular interactions that underlie cellular function.
- Developed a computational approach:
  - Identifies detailed relationships between proteins based on genomic data.
- The method reveals many previously unidentified higher-order relationships.
4
Background
- Patterns across multiple complete genomes have been used to infer biological interactions and functional linkages between proteins:
  - Two distinct proteins in one organism are genetically fused into a single protein in another organism.
  - Two proteins tend to occur in chromosomal proximity across multiple organisms.
  - Phylogenetic profile approach: detects functional relationships between proteins whose patterns of presence or absence are statistically similar. A protein's pattern of presence or absence is determined by searching for its homologs across N organisms.
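A minimal Python sketch of how a phylogenetic profile can be built, assuming homolog search results are already summarized as a hypothetical mapping from each protein family to the set of organisms in which a homolog was found. The family and organism IDs are placeholders, not real data.

```python
from typing import Dict, List, Set


def build_profiles(hits: Dict[str, Set[str]], organisms: List[str]) -> Dict[str, List[int]]:
    """Return a 0/1 presence/absence vector per protein family.

    `hits` (hypothetical input format) maps a family ID to the organisms where a
    homolog was found; `organisms` fixes the column order of every profile.
    """
    return {
        family: [1 if org in found else 0 for org in organisms]
        for family, found in hits.items()
    }


organisms = ["org1", "org2", "org3", "org4"]  # hypothetical organism IDs
hits = {"famA": {"org1", "org3"}, "famB": {"org2"}}
profiles = build_profiles(hits, organisms)
print(profiles)  # {'famA': [1, 0, 1, 0], 'famB': [0, 1, 0, 0]}
```

Fixing the organism order once keeps every profile aligned column by column, which is what later comparisons between profiles rely on.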
5
Background
- Original implementations sought to infer "links" between pairs of proteins with similar profiles.
- A subsequent variation linked proteins whose profiles were the negation of each other.
- These are simple notions: the presence of one protein implies the presence or absence of another.
- Such simple relationships cannot adequately describe the full complexity of cellular networks, which involve branching, parallel, and alternate pathways.
- Higher-order logic relationships, involving the pattern of presence/absence of multiple proteins, are expected because of:
  - The observed complexity of cellular networks.
  - Evolutionary divergence, convergence, and horizontal transfer events.
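The two pairwise ideas can be sketched as follows. This is a deliberate simplification for illustration: it links two families only when their profiles are identical or exact negations, whereas real implementations score statistical similarity rather than requiring exact matches. Profiles and family names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def pairwise_links(profiles: Dict[str, List[int]]) -> List[Tuple[str, str, str]]:
    """Link family pairs whose profiles match exactly ("same") or are exact complements ("negation")."""
    links = []
    for p, q in combinations(sorted(profiles), 2):
        a, b = profiles[p], profiles[q]
        if a == b:
            links.append((p, q, "same"))
        elif all(x != y for x, y in zip(a, b)):
            links.append((p, q, "negation"))
    return links


profiles = {
    "famA": [1, 0, 1, 0],
    "famB": [1, 0, 1, 0],
    "famC": [0, 1, 0, 1],
}
print(pairwise_links(profiles))
```

Both rules only ever relate two profiles, which is exactly the limitation the slide points out.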
6
Method - LAPP
- Perform a complete analysis of the logic relations possible between triplets of phylogenetic profiles.
- Demonstrate the power of the resulting logic analysis of phylogenetic profiles (LAPP) to:
  - Illuminate relationships among multiple proteins.
  - Infer the coarse function of large numbers of uncharacterized protein families.
7
Logical Relationships to Determine Presence/Absence of Proteins
- Venn diagrams and logic statements show the 8 distinct kinds of logic functions that describe the possible dependence of the presence of protein C on the presence of proteins A and B, jointly.
- Logic functions are grouped together if they are related by a simple exchange of proteins A and B.
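The count of eight can be checked by brute force. A sketch, assuming the eight types are exactly the two-input Boolean functions that depend on both A and B, with functions that differ only by exchanging A and B counted once. The slide's type numbering is not reproduced here.

```python
from itertools import product

ROWS = list(product((0, 1), repeat=2))  # input rows (a, b): (0,0), (0,1), (1,0), (1,1)


def uses_both_inputs(outputs):
    """True if the output changes with A for some B, and with B for some A."""
    t = dict(zip(ROWS, outputs))
    uses_a = any(t[(0, b)] != t[(1, b)] for b in (0, 1))
    uses_b = any(t[(a, 0)] != t[(a, 1)] for a in (0, 1))
    return uses_a and uses_b


def canonical(outputs):
    """Pick one representative for a truth table and its A<->B exchanged twin."""
    t = dict(zip(ROWS, outputs))
    swapped = tuple(t[(b, a)] for a, b in ROWS)
    return min(tuple(outputs), swapped)


all_tables = list(product((0, 1), repeat=4))  # all 16 two-input Boolean functions
classes = {canonical(out) for out in all_tables if uses_both_inputs(out)}
print(len(classes))  # 8
```

Of the 16 functions, 6 are constants or depend on only one input; of the remaining 10, two pairs (A AND NOT B with NOT A AND B, and A OR NOT B with NOT A OR B) collapse under the exchange, leaving 8.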
8
Logical Relationships to Determine Presence/Absence of Proteins
- There are 8 possible logic relationships combining two phylogenetic profiles to match a third profile.
- Example 1: protein C might be present if and only if proteins A and B are both present.
  - The function of protein C is necessary only when the functions of proteins A and B are both present.
- Example 2: protein C may be present if and only if either A or B is present.
  - Different organisms use two different protein families, in combination with a common third protein, to accomplish some task.
- Several of the eight possible logic relationships can be intuitively understood to describe commonly observed biological scenarios.
- However, a few of the logic relationships are not easily related to real biological situations.
9
Examples of LAPP Based on Phylogenetic Profiles
Figure panels: Phylogenetic Profiles; Biological examples of LAPP
10
Examples of LAPP (cont'd)
- Hypothetical phylogenetic profiles are used to illustrate the eight possible logic functions.
- Real biological examples illustrate the ternary relationships identified from actual phylogenetic profiles for the 4 most commonly observed logic types.
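In the same spirit, this sketch applies one representative function from each of the eight classes to two hypothetical profiles and shows the profile of C that each function would predict. The descriptive names are placeholders and are not mapped to the paper's type numbers.

```python
a = [1, 1, 0, 0, 1, 0]  # hypothetical profile of protein A across six organisms
b = [1, 0, 1, 0, 0, 1]  # hypothetical profile of protein B

# One representative per logic class (names are descriptive, not the paper's types).
LOGIC = {
    "A AND B": lambda x, y: x & y,
    "A OR B": lambda x, y: x | y,
    "A XOR B": lambda x, y: x ^ y,
    "NOT (A AND B)": lambda x, y: 1 - (x & y),
    "NOT (A OR B)": lambda x, y: 1 - (x | y),
    "NOT (A XOR B)": lambda x, y: 1 - (x ^ y),
    "A AND NOT B": lambda x, y: x & (1 - y),
    "A OR NOT B": lambda x, y: x | (1 - y),
}

# Predicted profile of C for each logic function, organism by organism.
derived = {name: [f(x, y) for x, y in zip(a, b)] for name, f in LOGIC.items()}
for name, c in derived.items():
    print(f"{name:14} {c}")
```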
11
Identifying Protein Triplets
- Created a set of binary-valued vectors describing the presence or absence of each known protein family across 67 fully sequenced organisms.
- Categorized the complete set of proteins into 4873 distinct families called clusters of orthologous groups (COGs).
- Examined all triplet combinations of profiles and rank-ordered them according to how well the logical combination f(a,b) of two profiles predicted a third profile, c.
- Required that neither profile a nor profile b alone be predictive of c.
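A brute-force sketch of the triplet search. It assumes profiles are 0/1 lists keyed by hypothetical family IDs, and that f ranges over one representative per logic class. Candidates are scored with the uncertainty coefficient U defined on the next slide and filtered with the thresholds quoted in the Results (U(c|f(a,b)) > 0.60, U(c|a) < 0.30, U(c|b) < 0.30).

```python
from collections import Counter
from itertools import permutations
from math import log2


def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((k / n) * log2(k / n) for k in Counter(values).values())


def uncertainty(x, y):
    """U(x|y) = [H(x) + H(y) - H(x,y)] / H(x); 0.0 for a constant x (assumption)."""
    hx = entropy(x)
    if hx == 0.0:
        return 0.0
    return (hx + entropy(y) - entropy(list(zip(x, y)))) / hx


# One representative per logic class; the asymmetric ones also cover their mirror
# images because every (a, b) pair is visited in both orders below.
LOGIC = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
    "NAND": lambda x, y: 1 - (x & y),
    "NOR": lambda x, y: 1 - (x | y),
    "XNOR": lambda x, y: 1 - (x ^ y),
    "A AND NOT B": lambda x, y: x & (1 - y),
    "A OR NOT B": lambda x, y: x | (1 - y),
}
SYMMETRIC = {"AND", "OR", "XOR", "NAND", "NOR", "XNOR"}


def find_triplets(profiles, hi=0.60, lo=0.30):
    """Return (U, a, b, c, logic) where f(a, b) predicts c but a and b alone do not."""
    hits = []
    for na, nb, nc in permutations(sorted(profiles), 3):
        a, b, c = profiles[na], profiles[nb], profiles[nc]
        if uncertainty(c, a) >= lo or uncertainty(c, b) >= lo:
            continue  # a or b alone already explains too much of c
        for name, f in LOGIC.items():
            if name in SYMMETRIC and na > nb:
                continue  # symmetric functions: score each unordered (a, b) pair once
            u = uncertainty(c, [f(x, y) for x, y in zip(a, b)])
            if u > hi:
                hits.append((u, na, nb, nc, name))
    return sorted(hits, reverse=True)


profiles = {  # hypothetical profiles over 8 organisms; famC = famA XOR famB
    "famA": [0, 0, 1, 1, 0, 0, 1, 1],
    "famB": [0, 1, 0, 1, 0, 1, 0, 1],
    "famC": [0, 1, 1, 0, 0, 1, 1, 0],
}
for hit in find_triplets(profiles):
    print(hit)
```

With 4873 families this loop would visit on the order of 10^11 ordered triplets, so a practical run would need pruning, for example by computing each pairwise U(c|a) once and reusing it. The sketch makes no claim about how the authors organized the computation.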
12
Identifying Protein Triplets
- Uncertainty coefficients were calculated for U(c|a), U(c|b), and the logically combined profile, U(c|f(a,b)):
  - U(x|y) = [H(x) + H(y) - H(x,y)] / H(x)
  - H is the entropy of the individual or joint distributions.
- U ranges from 0.0, where x is completely independent of y, to 1.0, where x is a deterministic function of y.
- Selected triplets whose individual pairwise uncertainty scores described protein profile c poorly [U(c|a) < 0.3 and U(c|b) < 0.3] but whose logically combined profile described c well [U(c|f(a,b)) > 0.6].
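The uncertainty coefficient translates directly into code. A minimal sketch using empirical frequencies and base-2 logarithms (the base cancels in the ratio). Returning 0.0 when x is constant is an assumption, since the formula is undefined when H(x) = 0.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((k / n) * log2(k / n) for k in Counter(values).values())


def uncertainty(x: Sequence[int], y: Sequence[int]) -> float:
    """U(x|y) = [H(x) + H(y) - H(x,y)] / H(x), with H(x,y) from paired entries."""
    hx = entropy(x)
    if hx == 0.0:
        return 0.0  # assumption: a constant profile has nothing to explain
    return (hx + entropy(y) - entropy(list(zip(x, y)))) / hx


print(uncertainty([1, 0, 1, 0], [0, 1, 0, 1]))  # 1.0: x is the negation of y
print(uncertainty([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0: x and y are independent
```

The numerator is the mutual information between x and y, so dividing by H(x) gives the fraction of x's uncertainty that y removes. This is why a perfect negation scores 1.0, just like a perfect match.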
13
Example
- Synthesis of aromatic amino acids proceeds through the shikimate pathway.
- Logic analysis of 5 participating proteins shows:
  - Shikimate can be converted to the end product prephenate by one of two possible routes, leading to a type 7 logic relationship.
Figure: Example showing triplet and pairwise uncertainty coefficients, U.
14
Results
- When either one shikimate kinase protein family (protein A, COG1685) or an alternate shikimate kinase protein family (protein B, COG0703) is present in an organism, 5-enolpyruvylshikimate-3-phosphate (EPSP) synthase (protein C, COG0128) must also be present (U = 0.85) to carry out the subsequent enzymatic step.
- The same type 7 logic relationship is also observed between the alternate shikimate kinase enzymes and the successive chorismate synthase (protein D, COG0082) and chorismate mutase (protein E, COG1605) enzymatic steps of the pathway.
- The ordering of the metabolic steps that follow shikimate kinase is predicted by the successive U values: EPSP synthase (second step, U = 0.85) is most strongly linked to shikimate kinase, followed by chorismate synthase (third step, U = 0.66) and lastly chorismate mutase (fourth step, U = 0.56).
15
Results (cont'd)
- Organisms synthesize chorismate and prephenate from shikimate using only one of two possible alternate routes: pathways consisting of either the ordered enzymes A-C-D-E or B-C-D-E.
- LAPP recovers 750,000 previously unknown relationships among protein families (U(c|f(a,b)) > 0.60; U(c|a) < 0.30; U(c|b) < 0.30).
- Validity was assessed by comparing the known annotations of the linked proteins.
- The ability to recover links between proteins annotated as belonging to a common major functional category has been used widely to corroborate computational inferences of protein interactions.
16
Observations
- One of the most frequently observed triplet relationships relates three proteins belonging to the cell motility category (N), confirming that the triplet associations link proteins closely related in function.
- Other triplets involve two proteins from the motility category and a third protein from another COG category, producing recognizable horizontal and vertical bands in the histogram.
  - E.g., the category combinations NNU (COG category U, intracellular trafficking and secretion) and NNS (COG category S, unknown function) are also plentiful.
- Connections between these categories make intuitive sense and help place unannotated proteins within the context of specific cellular networks of interacting proteins.
Figure: Section of a 3-D histogram describing the frequency of observed logic relationships in which protein A of the triplet is annotated as belonging to COG functional category N, cell motility.
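Tallying category combinations such as NNU and NNS can be sketched as below. The sketch assumes each COG maps to a string of one-letter functional category codes, where a multi-letter string covers a family annotated in more than one category. The COG IDs and triplets are hypothetical.

```python
from collections import Counter
from itertools import product


def category_histogram(triplets, categories, first="N"):
    """Count (A, B, C) category combinations for triplets whose protein A is in `first`.

    A family with several category letters contributes one count per letter combination.
    """
    counts = Counter()
    for a, b, c in triplets:
        for ca, cb, cc in product(categories[a], categories[b], categories[c]):
            if ca == first:
                counts[ca + cb + cc] += 1
    return counts


categories = {"cog1": "N", "cog2": "N", "cog3": "U", "cog4": "S", "cog5": "E"}
triplets = [("cog1", "cog2", "cog3"), ("cog1", "cog2", "cog4"), ("cog5", "cog1", "cog2")]
print(category_histogram(triplets, categories))
```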
17
Observations
- LAPP yields a set of statistically significant ternary relationships that are distinct from, and more numerous than, those inferred using traditional pairwise analysis.
- A matrix of randomized phylogenetic profiles, containing the same individual and pairwise distributions as the native profiles, was used to assess the probability of observing a given uncertainty coefficient score by chance.
- Triplets with U > 0.60 are observed ~10^2 times more frequently from the unshuffled vectors than from shuffled profiles, and ~10^4 times more frequently when U > 0.80.
Figure: Plot of the cumulative number of protein triplets recovered at an uncertainty coefficient score greater than a given threshold.
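Both the cumulative curve and the real-versus-shuffled comparison reduce to counting scores above each threshold. A sketch with hypothetical score lists (not the paper's data):

```python
from bisect import bisect_right


def cumulative_counts(scores, thresholds):
    """Number of triplets with U strictly greater than each threshold."""
    ordered = sorted(scores)
    return [len(ordered) - bisect_right(ordered, t) for t in thresholds]


def enrichment(real, shuffled, thresholds):
    """Ratio of real to shuffled counts per threshold (inf where the shuffled count is 0)."""
    r = cumulative_counts(real, thresholds)
    s = cumulative_counts(shuffled, thresholds)
    return [x / y if y else float("inf") for x, y in zip(r, s)]


real = [0.65, 0.70, 0.82, 0.90, 0.55]  # hypothetical U scores from native profiles
shuffled = [0.61, 0.40, 0.30]          # hypothetical U scores from shuffled profiles
print(cumulative_counts(real, [0.6, 0.8]))
print(enrichment(real, shuffled, [0.6, 0.8]))
```

`bisect_right` returns the index just past every score less than or equal to the threshold, so subtracting it from the total leaves only the scores strictly above it.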
18
Observations (cont'd)
- A P value was calculated for each triplet relationship by enumerating all possible values of U that could be obtained from shuffled profiles while maintaining the individual and pairwise distributions.
- P = (number of trials that exceed the observed value of U) / (total number of trials).
- More than 98% of the identified triplets (U > 0.6) have P < 0.05, and more than 75% of the identified triplets have P < 0.005.
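The P value can be approximated with a permutation test. This sketch replaces exhaustive enumeration with Monte Carlo sampling. It also assumes a specific shuffle: c is permuted across organisms while a and b stay paired. That preserves each profile's composition and the joint (a, b) distribution, but it is not necessarily the authors' exact scheme. The profiles are hypothetical.

```python
import random
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((k / n) * log2(k / n) for k in Counter(values).values())


def uncertainty(x, y):
    """U(x|y) = [H(x) + H(y) - H(x,y)] / H(x); 0.0 for a constant x (assumption)."""
    hx = entropy(x)
    if hx == 0.0:
        return 0.0
    return (hx + entropy(y) - entropy(list(zip(x, y)))) / hx


def permutation_p_value(a, b, c, f, trials=10_000, seed=0):
    """Fraction of shuffled trials whose U(c|f(a,b)) exceeds the observed value.

    Assumed shuffle: permute c across organisms, keeping each (a, b) column intact.
    """
    fab = [f(x, y) for x, y in zip(a, b)]
    observed = uncertainty(c, fab)
    rng = random.Random(seed)
    shuffled = list(c)
    exceed = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if uncertainty(shuffled, fab) > observed:
            exceed += 1
    return exceed / trials


a = [0, 0, 1, 1, 0, 0, 1, 1]  # hypothetical profiles
b = [0, 1, 0, 1, 0, 1, 0, 1]
c = [0, 1, 1, 0, 0, 1, 1, 0]  # c = a XOR b
print(permutation_p_value(a, b, c, lambda x, y: x ^ y, trials=1000))  # 0.0: nothing exceeds U = 1.0
```

A sampled result of 0.0 means only that P < 1/trials, not that P = 0.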
19
Observations
- The 8 distinct logic types occur with widely varying frequencies within the set of significant ternary relationships, consistent with our understanding of evolution and biological relationships.
- Logic types 1, 3, 5, and 7 are observed frequently in the biological data.
- Logic types 2, 4, and 8 are more difficult to relate to simple cellular logic and are observed only rarely.
Figure: Number of identified triplets (U > 0.6) for each of the eight logic function types for randomized (black) and real (gray) phylogenetic profiles.
20
Observations
Figure: The 50 highest-scoring relationships (U > 0.75) involving proteins from the cell motility and intracellular trafficking and secretion functional categories.
21
Observations (cont'd)
- Cell motility proteins are colored light blue, intracellular trafficking and secretion proteins are colored magenta, and proteins annotated as both are colored orange.
- Edges are shown between proteins A-C and B-C of each logic triplet, and each edge is labeled with the logic function type used to associate the protein families.
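Building the network from scored triplets follows the edge rule described above. A sketch, assuming triplets arrive as hypothetical (a, b, c, logic_type) tuples. Each undirected edge collects every logic type label that produced it. The protein IDs and type numbers below are illustrative labels only.

```python
from collections import defaultdict


def logic_network(triplets):
    """Map each undirected edge {A, C} and {B, C} to the set of logic types that link it."""
    edges = defaultdict(set)
    for a, b, c, logic_type in triplets:
        for src in (a, b):
            edges[frozenset((src, c))].add(logic_type)
    return dict(edges)


triplets = [("p1", "p2", "p3", 7), ("p1", "p4", "p3", 1)]  # hypothetical triplets
network = logic_network(triplets)
for edge, types in network.items():
    print(sorted(edge), sorted(types))
```

Using a `frozenset` as the key treats A-C and C-A as the same edge. The resulting dictionary can be passed to any graph library for layout and coloring.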
22
Observations (cont'd)
- The linked proteins include adhesin proteins necessary for bacterial pathogenesis, chemotaxis proteins, and translocase proteins.
- The network contains previously unknown interactions that suggest mechanisms connecting bacterial pathogenesis and chemotaxis.
- CheZ, a chemotaxis dephosphorylase that regulates cell motility, is linked to the surface receptor and virulence factors adhesin AidA and the Flp pilus-associated protein FimT.
23
Conclusion
- The new higher-order protein associations detected by LAPP provide a framework for understanding the complex logical dependencies that relate proteins to one another in the cell.
- LAPP is also useful for:
  - Modeling and engineering biological systems
  - Generating biological hypotheses for experimentation
  - Investigating additional protein properties
24
Future Work
- In all likelihood, logic relationships between proteins in the cell extend beyond ternary relationships to include much larger sets of proteins.
- The ideas underlying the logic analysis of phylogenetic profiles can be extended to other kinds of genomic data:
  - Gene expression
  - Nucleotide polymorphism
  - Phenotype data
25
Questions?