2
Evolution of Multidomain Proteins CS 374 – Lecture 10 Wissam Kazan
3
Reference Papers
C. Chothia, J. Gough, C. Vogel, S. A. Teichmann, “Evolution of the Protein Repertoire”, Science, Vol. 300, 13 June 2003, www.sciencemag.com
T. Przytycka, G. Davis, N. Song, D. Durand, “Graph Theoretical Insights into Evolution of Multidomain Proteins”, RECOMB 2005, LNBI 3500, pp. 311-325, 2005
4
Proteins Large Organic Compounds made of amino acids Fold into specific Structures, unique to each protein. Three possible representations of the three-dimensional structure of the protein triose phosphate isomerase.
5
Protein Functions Chief actors in the cell Proteins bind to other molecules specifically and tightly at the binding site Act as enzymes to catalyze chemical reactions Antibodies are proteins that bind to antigens and target them for destruction
6
Protein Domains Primary constituents of proteins A domain is an evolutionarily conserved structural unit: –Assumed to fold independently –Observed in different proteins in the context of different neighboring domains –Whose coding sequence can be duplicated and/or undergo recombination
7
Domains – cont’d Small proteins contain just one domain Large proteins are formed by combinations of domains Cartoon representation of the protein Zif268 (blue) containing three zinc finger domains in complex with DNA (orange). The coordinating amino acid residues of the middle zinc ion (green) are highlighted.
8
Domain – cont’d (2) Often each domain has a separate function to perform for the protein Domain lengths typically range from 100 to 250 amino acid residues
9
Binding Domain
10
Domain Family A domain family is a collection of small proteins and/or parts of larger ones that descend from a common ancestor.
11
PR domain family members
12
Increase in Protein Repertoire The dominant mechanisms are: –Duplication of sequences coding for one or more domains –Divergence of duplicated sequences by mutations, deletions and insertions, producing modified structures that may have useful new properties –Recombination of genes that results in new arrangements of domains
13
Family relationships Difficult to detect distant relationships by direct comparisons of sequences Presence/Absence of domains and family relationships can be determined if the 3D structures are known We only know family relationships and domain structures of proteins of known structures or proteins homologous to proteins of known structures
14
C2 domain family
15
Domain Family Sizes In individual genomes, the numbers of members in the different families fit a Pareto distribution: –Few families have many members –Many families have few members This is mainly the result of selection for useful functions –Some families have properties that lend themselves to a wide variety of molecular functions: the P-loop nucleotide family has members functioning as kinases with different specificities and as different kinds of motor proteins
16
Analysis of Evolution About 50% of the sequences in currently known genomes are homologous to proteins of known structure That half gives a detailed picture of evolution, described in the following slides
17
SCOP DB Relationships of domains in proteins of known structures described in the Structural Classification of Proteins (SCOP) Database http://scop.berkeley.edu/data/scop.b.html
18
Families and Species Vertebrates: ~750 different families, with 50 members per family on average Invertebrates: ~670 different families, with 20 members per family on average Yeast and bacteria with large genomes: ~550 different families, with 8 members per family on average Parasitic bacteria: ~220 different families, with 2 members per family on average
19
Protein Repertoire The larger domain families make up the bulk of the protein repertoire in each genome and are widely distributed across genomes 429 families occur in all of the 14 known eukaryote genomes: –Members form 80% of domains in animals –90% of domains in fungi and plants
20
Contribution of common families to the protein repertoire
21
Domain Combinations Many proteins are formed by combinations of two or more domains Domains from some families appear together with domains from several other families Multidomain proteins constitute about 4/5 of eukaryote proteins and 2/3 of prokaryote proteins This phenomenon is called Domain Accretion
22
Known Combinations 1100 families of proteins of known structure in total 1100² = 1,210,000 different possible pairwise combinations Only useful combinations will be present in genomes Studies showed that only 2500 pairwise combinations (about 0.2% of the possible ones) were found in 85 different genomes
23
Combination Properties Few families have members present in many different combinations Many families combine with just one or two others. Power Law (Again!) Sequential Order
24
Supradomains Two-domain and Three-domain combinations recurring in different protein contexts with different partner domains Have a particular functional and spatial relationship Larger than individual domains
25
Supradomain
26
Metabolic Pathway Formation Proteins in a pathway do not function by themselves A metabolic pathway is a series of chemical reactions occurring within a cell, catalyzed by enzymes, resulting in the formation of a metabolic product to be used or stored by the cell. Problem How does the duplication, divergence and recombination process of the proteins fit into the formation or extension of pathways?
27
First Solution Substrates in pathways retain some similarities Enzymes evolve by gene duplications: - Catalytic mechanisms change - Some aspects of their recognition properties are retained
28
Second Solution Enzymes recruited across pathways Duplicated Enzymes conserve their catalytic functions while evolving different substrate specificities
29
Multidomain Protein Mystery Are new domains acquired infrequently, or often enough that the same combinations of domains will be repeated through independent events?
30
Multidomain Protein Mystery Once domain architectures are created, do they persist? If a domain is present in ancestral proteins, is it likely to be observed in current proteins?
31
Protein Family Analysis One Traditional method: –Tree modeling gene family evolution based on multiple sequence alignments Unclear how to build the model for families with heterogeneous domains Solution Proposed: Analyze a graph structure to study multidomain protein evolution
32
Parsimony Model Assume a phylogenetic tree, with each node described by a set of characters (one per domain) Focus on binary characters: – 1: presence of a domain in the node – 0: absence of a domain in the node Perfect Phylogeny: Each character state change occurs at most once State Change: – 0 → 1: Gain – 1 → 0: Loss
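The gain/loss bookkeeping above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the tree encoding (`parent` map), the node names, and the reading of "at most once" as "at most one gain and at most one loss per character" are all assumptions made for the example.

```python
from collections import Counter

def count_state_changes(parent, states):
    """Count gains (0 -> 1) and losses (1 -> 0) per character along tree edges.

    parent: dict mapping each non-root node to its parent
    states: dict mapping each node to a tuple of 0/1, one entry per domain
    """
    gains, losses = Counter(), Counter()
    for child, par in parent.items():
        for i, (before, after) in enumerate(zip(states[par], states[child])):
            if (before, after) == (0, 1):
                gains[i] += 1
            elif (before, after) == (1, 0):
                losses[i] += 1
    return gains, losses

def is_perfect_phylogeny(parent, states):
    """Assumed reading: each character is gained at most once and lost at most once."""
    gains, losses = count_state_changes(parent, states)
    n_chars = len(next(iter(states.values())))
    return all(gains[i] <= 1 and losses[i] <= 1 for i in range(n_chars))

# Hypothetical toy tree with two domains; the root contains neither.
parent = {"A": "root", "leaf1": "A", "leaf2": "A", "leaf3": "root"}
states = {
    "root": (0, 0), "A": (1, 0),
    "leaf1": (1, 1), "leaf2": (1, 0), "leaf3": (0, 0),
}
gains, losses = count_state_changes(parent, states)
```

Here domain 0 is gained on the edge root → A and domain 1 on the edge A → leaf1, with no losses, so the toy labelling is a perfect phylogeny.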
33
Dollo Parsimony A character may change state from zero to one *only* once, but from one to zero *multiple* times Appropriate for complex characters that are hard to gain but relatively easy to lose
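A small sketch of the Dollo constraint on an already labelled tree follows. The `parent`/`states` encoding and node names are hypothetical, and treating a character that is present at the root as having used up its single gain is an assumption of this example.

```python
def is_dollo(parent, states):
    """True if every character is gained at most once; losses are unrestricted.

    parent: dict mapping each non-root node to its parent
    states: dict mapping each node to a tuple of 0/1, one entry per domain
    """
    n_chars = len(next(iter(states.values())))
    roots = [node for node in states if node not in parent]
    # A character already present at the root counts as its single gain.
    gains = [sum(states[r][i] for r in roots) for i in range(n_chars)]
    for child, par in parent.items():
        for i in range(n_chars):
            if states[par][i] == 0 and states[child][i] == 1:
                gains[i] += 1
    return all(g <= 1 for g in gains)

# One gain (root -> A) followed by two independent losses: allowed under Dollo.
one_gain_parent = {"A": "root", "l1": "A", "l2": "A", "l3": "A"}
one_gain_states = {"root": (0,), "A": (1,), "l1": (1,), "l2": (0,), "l3": (0,)}

# The same domain gained independently on two edges: not allowed under Dollo.
two_gain_parent = {"l1": "root", "l2": "root", "l3": "root"}
two_gain_states = {"root": (0,), "l1": (1,), "l2": (1,), "l3": (0,)}
```

The first labelling would fail a perfect phylogeny test (two losses) but passes the Dollo test, which is the asymmetry the slide describes.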
34
Maximum Parsimony Example Unrelated to the work presented but useful to explain the concept of parsimony We want to find a model such that we minimize the total number of insertions and deletions Find a tree that requires the least number of evolutionary changes.
35
Example We have four sequences over three characters (D1, D2, D3):
D1 D2 D3
1  1  0
1  1  1
0  0  1
1  0  1
Find the tree that can explain the observed sequences with a minimal number of substitutions
36
Try Different Trees [Figure: two candidate trees annotated with the number of changes on each branch; one has total cost 3, the other total cost 4]
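The cost of a candidate tree for the four sequences 110, 111, 001, 101 can be computed with Fitch's small-parsimony algorithm. The slides do not say which method produced the costs, so Fitch's algorithm is a swapped-in standard technique here, and the leaf names `a` to `d` and the nested-tuple tree encoding are assumptions of this sketch.

```python
def _fitch(node, leaf_states, i):
    """Return (candidate state set, change count) for character i below node."""
    if isinstance(node, str):          # leaf: its observed state, no changes
        return {leaf_states[node][i]}, 0
    left_set, left_cost = _fitch(node[0], leaf_states, i)
    right_set, right_cost = _fitch(node[1], leaf_states, i)
    common = left_set & right_set
    if common:                         # children agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def fitch_cost(tree, leaf_states):
    """Minimum number of state changes needed on a rooted binary tree.

    tree: nested 2-tuples whose leaves are keys of leaf_states
    leaf_states: dict mapping leaf name to a string of characters, e.g. "110"
    """
    n_chars = len(next(iter(leaf_states.values())))
    return sum(_fitch(tree, leaf_states, i)[1] for i in range(n_chars))

# The four sequences from the example slide, with hypothetical leaf names.
leaves = {"a": "110", "b": "111", "c": "001", "d": "101"}

# The three ways to pair up four leaves.
costs = {
    "ab|cd": fitch_cost((("a", "b"), ("c", "d")), leaves),
    "ac|bd": fitch_cost((("a", "c"), ("b", "d")), leaves),
    "ad|bc": fitch_cost((("a", "d"), ("b", "c")), leaves),
}
best = min(costs, key=costs.get)
```

Grouping 110 with 111 and 001 with 101 costs 3 changes, while the other two groupings cost 4 each, which agrees with the totals of 3 and 4 shown on the slide.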
37
Domain Architectures Phylogenetic tree of the protein tyrosine kinase family, constructed from a Multiple Sequence Alignment (MSA) of the kinase domain Note that the tree is not optimal with respect to a parsimony criterion minimizing the total number of insertions and deletions. For example, if architectures INSR and EGFR were siblings (the only two architectures containing the Furin-like cysteine rich and Receptor ligand binding domains) the number of insertions and deletions would be smaller.
38
Evolution of Multidomain Proteins Multidomain proteins are formed by: –Gene Fusion –Domain Shuffling –Retrotransposition of Exons Represent those by: –Domain Merge –Domain Deletion
39
Domain Merge Any process that unites two or more previously separate domains in a single protein
40
Domain Deletion Any process in which a protein loses one or more domains
41
Protein Overlap Graph Vertices are Proteins If two proteins share a domain, the two corresponding nodes are connected by an edge
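As a minimal sketch of this definition, the protein overlap graph can be built from a map of proteins to their domain sets. The protein and domain names below are made up for illustration.

```python
from itertools import combinations

def protein_overlap_graph(architectures):
    """Edges between proteins that share at least one domain.

    architectures: dict mapping protein name to a set of domain names
    Returns a set of frozenset({p, q}) edges.
    """
    edges = set()
    for p, q in combinations(sorted(architectures), 2):
        if architectures[p] & architectures[q]:
            edges.add(frozenset((p, q)))
    return edges

# Hypothetical architectures: P1 and P2 share domain B, P3 shares nothing.
architectures = {"P1": {"A", "B"}, "P2": {"B", "C"}, "P3": {"D"}}
protein_edges = protein_overlap_graph(architectures)
```

In this toy input the only edge joins P1 and P2; P3 remains an isolated vertex.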
42
Domain Overlap Graph Vertices are protein domains Two domains are connected by an edge if there is a protein containing both domains
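The domain overlap graph can be sketched the same way, this time with domains as vertices. The input format and the protein and domain names are assumptions for the example.

```python
from itertools import combinations

def domain_overlap_graph(architectures):
    """Adjacency map: domain -> set of domains it co-occurs with in some protein.

    architectures: dict mapping protein name to a set of domain names
    """
    adj = {}
    for domains in architectures.values():
        for d in domains:
            adj.setdefault(d, set())   # keep single-domain proteins as isolated vertices
        for a, b in combinations(sorted(domains), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

# Hypothetical architectures: domain B co-occurs with A (in P1) and with C (in P2).
architectures = {"P1": {"A", "B"}, "P2": {"B", "C"}, "P3": {"D"}}
domain_adj = domain_overlap_graph(architectures)
```

Each protein with k domains contributes a clique on those k domains, which is why cliques in this graph matter for the later Helly property check.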
43
Domain Overlap Graph
44
Static Dollo Parsimony For any ancestral node, the set of characters in state one in this node is a subset of the set of characters in state one in some leaf node. Consistent with a history in which no ancestor contains a domain not seen in a leaf node
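The static condition is a subset test between ancestral and leaf domain sets. The sketch below checks only that condition on given node labellings; it does not search for a tree, and the domain names are hypothetical.

```python
def satisfies_static_condition(ancestor_sets, leaf_sets):
    """True if every ancestral domain set is contained in some leaf domain set."""
    return all(
        any(ancestor <= leaf for leaf in leaf_sets)
        for ancestor in ancestor_sets
    )

# Hypothetical observed (leaf) architectures.
leaf_sets = [frozenset("AB"), frozenset("BC")]

ok_ancestors = [frozenset("B"), frozenset("AB")]   # each fits inside a leaf
bad_ancestors = [frozenset("AC")]                  # no leaf holds both A and C
```

An ancestor carrying domains A and C together violates the condition because no observed protein contains both.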
45
Conservative Dollo Parsimony For any ancestral node and any pair of characters that appear in state one in this node, there exists a leaf node where these two characters are also in state one Consistent with a history in which every instance of a domain pair came from a single merge event If domains acting in concert offer a selective advantage, it is unlikely that the pair once formed would later separate
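The conservative condition only constrains pairs of domains, so it is weaker than requiring the whole ancestral set to appear in one leaf. A minimal check on given labellings, with hypothetical domain names, might look like this:

```python
from itertools import combinations

def satisfies_conservative_condition(ancestor_sets, leaf_sets):
    """True if every domain pair present in an ancestor co-occurs in some leaf."""
    for ancestor in ancestor_sets:
        for pair in combinations(sorted(ancestor), 2):
            if not any(set(pair) <= leaf for leaf in leaf_sets):
                return False
    return True

# Hypothetical leaves: every pair of A, B, C is observed, but never all three.
leaf_sets = [frozenset("AB"), frozenset("BC"), frozenset("AC")]
ancestor = frozenset("ABC")

pairwise_ok = satisfies_conservative_condition([ancestor], leaf_sets)
# For contrast: is the full ancestral set contained in a single leaf?
contained_in_one_leaf = any(ancestor <= leaf for leaf in leaf_sets)
```

In this toy case the ancestor passes the pairwise test although no single leaf contains all three domains, which illustrates the gap between pairwise and whole-set containment.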
46
Why all this? If we can show that, for a family, a conservative Dollo parsimony tree does not exist, then: –Single Insertion Assumption is false or –Conservative Assumption is too strong
47
Example Domain Overlap Graph Dollo Parsimony
48
Analyzing the Graph 1. Check whether the domain overlap graph is chordal 2. Check whether the domain overlap graph satisfies the Helly property 3. Conclude
49
Chordal Graph A Chord is any edge connecting two non-consecutive vertices of a cycle A Chordal Graph is a graph which does not contain chordless cycles of length greater than three
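One simple way to test chordality is to repeatedly remove a simplicial vertex (a vertex whose neighbours are pairwise adjacent); a graph is chordal exactly when this removes every vertex. This is a plain sketch of that standard idea, not the method used in the paper, and faster linear-time algorithms exist.

```python
def is_chordal(adj):
    """Chordality test by greedy elimination of simplicial vertices.

    adj: dict mapping each vertex to the set of its neighbours
         (undirected graph, no self-loops)
    """
    g = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    while g:
        simplicial = next(
            (v for v, ns in g.items()
             if all(b in g[a] for a in ns for b in ns if a != b)),
            None,
        )
        if simplicial is None:       # every remaining vertex lies on a chordless cycle
            return False
        for n in g[simplicial]:
            g[n].discard(simplicial)
        del g[simplicial]
    return True

# A chordless 4-cycle a-b-c-d-a, and the same cycle with the chord a-c added.
square = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
square_with_chord = {
    "a": {"b", "c", "d"}, "b": {"a", "c"},
    "c": {"a", "b", "d"}, "d": {"a", "c"},
}
```

The plain 4-cycle has no simplicial vertex and is rejected; adding the chord splits it into two triangles and the graph becomes chordal.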
50
Theorem 1 There exists a conservative Dollo parsimony tree for a given set of multidomain architectures, iff the domain overlap graph for this set is chordal
51
Helly Property A set S of sets S_i has the Helly property if, for every subset T of S, the following holds: if the elements of T pairwise intersect, then the intersection of all elements of T is also non-empty. Formally, a family $\{T_i \mid i \in I\}$ of subsets of a set $T$ is said to satisfy the Helly property if, for any collection of sets from this family, $\{T_j \mid j \in J \subseteq I\}$, $\bigcap_{j \in J} T_j \neq \emptyset$ whenever $T_j \cap T_k \neq \emptyset$ for all $j, k \in J$.
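For small families the definition can be checked directly by brute force over all subfamilies. The sketch below is exponential in the number of sets and is meant only to make the definition concrete; the example families are made up.

```python
from itertools import combinations

def has_helly_property(family):
    """True if every pairwise-intersecting subfamily has a common element.

    family: iterable of sets. Brute force, so only suitable for small inputs.
    """
    sets = [frozenset(s) for s in family]
    for k in range(2, len(sets) + 1):
        for sub in combinations(sets, k):
            pairwise = all(a & b for a, b in combinations(sub, 2))
            if pairwise and not frozenset.intersection(*sub):
                return False
    return True

# Three sets that intersect pairwise but share no common element.
triangle = [{1, 2}, {2, 3}, {1, 3}]
# Three sets that all contain the element 2.
star = [{1, 2}, {2, 3}, {2, 4}]
```

The `triangle` family is the classic counterexample: each pair overlaps, yet the three sets have an empty common intersection.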
52
Example The picture on the left doesn’t satisfy the Helly property but the picture on the right does.
53
Theorem 2 There exists a static Dollo parsimony tree for a set of multidomain proteins, iff the domain overlap graph for this set is chordal and satisfies the Helly property
54
Questions raised Is independent merging of the same pair of domains a rare event? –Yes, for a vast majority of small and medium size superfamilies –No, for large complex superfamilies
55
Second Question Do domain architectures persist through evolution? –Yes, for a vast majority of small and medium size superfamilies –No, for large complex superfamilies
56
Thank You! Questions?
57
Experimental Results Superfamily: Set of proteins sharing one particular domain Complex Superfamily: Superfamily sharing more than one domain with another superfamily. Considering only the dataset restricted to complex superfamilies, the authors check the conservative Dollo parsimony (CDP), static Dollo parsimony (SDP) and perfect phylogeny (PP) criteria
58
Experimental Results