1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology
2 Homologs Orthologs: homologous sequences are orthologous if they were separated by a speciation event. Paralogs: homologous sequences are paralogous if they were separated by a gene duplication event.
3 Genomic duplication Can involve: individual genes, genomic segments, or whole genome duplication (WGD). Gene duplication has a major role in evolution.
4 Whole genome duplication Large scale adaptation Polyploidy instability Back to stability: –gene loss –mutation –genomic rearrangements
5 Fate of duplicated genes Find specialized ‘niche’: localization, temporal expression, expression level. Another classification: sub-functionalization, neo-functionalization (lowest probability), non-functionalization (70%).
6 First article Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Kellis M, Birren BW, Lander ES. Nature. Apr
7 Main ideas The S. cerevisiae genome arose from an ancient whole-genome duplication, shown by comparison with K. waltii. Analyzing post-duplication divergence of paralogs.
8 Expected signature for genome duplication: After duplication, one paralog is usually lost (random local deletions). Both copies are retained only if they acquire distinct functions. Eventually, only a few paralogous genes remain, in the same order and the same orientation. Those regions should be short, since chromosomal rearrangements disrupt gene order over time.
9 Model for WGD followed by massive gene loss Common ancestor
10 Proving existence of an ancient WGD Look for a species (Y) in the lineage of S. cerevisiae (S). Y and S should have 1:2 mapping and: –Nearly every region in Y would correspond to 2 regions in S (‘sister regions’). –Each sister region in S would contain an ordered subsequence of the genes in Y. –Each sister region in S would contain ~half of Y’s genes. –Together, the two sister regions account for nearly all of Y’s genes. –Every region of S would correspond to one region in Y.
11 Y = K. waltii Sequencing and assembling into 8 complete chromosomes (16 in S. cerevisiae). 5,230 likely protein-coding genes (5,714 genes in S. cerevisiae). 7% of its genes show no protein similarity to S. cerevisiae. Identifying orthologous regions: –Matching genes (based on protein similarity) –Regions with numerous matching genes in the same order. Most local regions in K. waltii mapped to two regions in S. cerevisiae. Each of those regions matched a subset of K. waltii genes.
12 Quantify observations DCS – Doubly Conserved Synteny: maximal regions in K. waltii that map across their entire length to two distinct regions in S. cerevisiae.
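The 1:2 mapping criteria on this slide can be checked mechanically for a single region. The following is a minimal sketch, not the authors' pipeline: it assumes orthologs have already been identified and that each S gene is written with the name of its Y ortholog. The function names and the toy gene names are hypothetical.

```python
def is_ordered_subsequence(sub, full):
    """True if every gene in `sub` appears in `full` in the same relative order."""
    remaining = iter(full)
    # `gene in remaining` consumes the iterator up to the match, enforcing order.
    return all(gene in remaining for gene in sub)


def check_sister_regions(y_genes, sister_a, sister_b):
    """Summarise how well two S regions fit the 1:2 signature for one Y region."""
    y_set = set(y_genes)
    covered = (set(sister_a) | set(sister_b)) & y_set
    return {
        "a_ordered": is_ordered_subsequence(sister_a, y_genes),
        "b_ordered": is_ordered_subsequence(sister_b, y_genes),
        "a_fraction": len(set(sister_a) & y_set) / len(y_genes),
        "b_fraction": len(set(sister_b) & y_set) / len(y_genes),
        "joint_coverage": len(covered) / len(y_genes),
    }


# Toy region (hypothetical gene names): g4 is retained in both sister regions.
y_region = ["g1", "g2", "g3", "g4", "g5", "g6"]
result = check_sister_regions(y_region, ["g1", "g3", "g4"], ["g2", "g4", "g5", "g6"])
print(result)
```

In the toy region each sister is an ordered subsequence of the Y genes, each holds roughly half of them, and together they cover all six, which is the pattern the slide describes.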
13 Gene and region correspondence
14 Results 253 DCS blocks containing most of both genomes. (75% of K. waltii genes and 81% of S. cerevisiae genes) DCS blocks tile 85% of each K. waltii chromosome -> as expected in WGD Typical DCS block: –27 genes. –Separated by small segments (~3 genes), that match one conserved region in S. cerevisiae.
15 Duplicate mapping of centromeres Note: no paralogs here!
16 Duplicated blocks in S. cerevisiae Using the DCS blocks: define 253 sister regions in S. cerevisiae. Many of those could not be recognized without K. waltii mediation.
17 Duplicated blocks in S. cerevisiae
18 Zooming in on one sister region
19 Conclusion WGD event occurred in the Saccharomyces lineage after the divergence from K. waltii.
20 Pattern of gene loss The number of chromosomes was doubled. Despite WGD, the current S. cerevisiae genome is: –13% larger than the K. waltii genome. –10% more genes. Gene loss: –Large segmental deletions or individual gene deletions? –Balanced between the two paralogs or acting primarily on one of them? Analysis of DCS blocks shows: –average size of lost segment: 2 genes. –average balance: 43%-57%.
21 Two models – what happens after duplication event One copy preserves original function while the other one is free to diverge (Ohno) Both copies would diverge more rapidly and acquire new functions
22 Evolutionary analysis Study the evolution of the 457 gene pairs that arose by WGD: Use synteny to distinguish them from pairs which arose by local duplication events. Compute divergence rates for them (both amino acid and nucleotide), using sequences of K. waltii, S. cerevisiae and S. bayanus.
23 Results 17% of gene pairs (76 of 457) showed accelerated protein evolution relative to K. waltii. In 95% of them, accelerated evolution was confined to only one paralog Supports Ohno’s model: one paralog retains ancestral function, the other one gains a derived function
24 Ancestral derived paralogs Gene pairs consisting of one paralog which has evolved >50% faster than the other. Often, derived paralogs are specialized in: –Cellular localization (Acc1 - Hfa1) –Temporal expression (Skt5 – Shc1)
25 Ancestral derived paralogs, cont. Functional distinction confirmed with knockout experiments (in rich medium) of all 115 genes: –Deletion of the ancestral paralog was lethal in 18%. –Deletion of the derived paralog was never lethal. Explanation: –The derived paralog is not essential under these conditions. –The ancestral paralog compensates (but not vice versa).
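The ">50% faster" criterion can be written as a simple rate comparison. This is a sketch under stated assumptions: the slides do not give the exact rate measure, so `rate_a` and `rate_b` stand for any positive divergence rates of the two paralogs relative to K. waltii, and the function name and labels are hypothetical.

```python
def classify_pair(rate_a, rate_b, ratio=1.5):
    """Label each paralog in a pair as 'derived', 'ancestral' or 'similar'.

    A paralog is called 'derived' when its rate exceeds the other's by more
    than 50% (ratio=1.5); otherwise both are labelled 'similar'.
    """
    if rate_a <= 0 or rate_b <= 0:
        raise ValueError("rates must be positive")
    if rate_a > ratio * rate_b:
        return ("derived", "ancestral")
    if rate_b > ratio * rate_a:
        return ("ancestral", "derived")
    return ("similar", "similar")


# Toy rates (made-up numbers, for illustration only).
print(classify_pair(0.9, 0.3))  # first paralog is three times faster
print(classify_pair(0.4, 0.5))  # difference is below the 50% threshold
```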
26 More results 60 of the 457 pairs (13%) showed decelerated protein evolution, including highly constrained proteins: –ribosomal proteins (25) –histone proteins (2) –translation factors (4) In 90% of them both paralogs were very similar (98% amino acid identity versus 55% for all pairs).
27 However… ~70% of the gene pairs had neither accelerated protein evolution nor decelerated evolution (321/457) Possible explanations: –Too strict criteria –Divergence in regulatory regions will not be seen here. –Sometimes it’s nice to have two copies.
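As a quick consistency check, the three counts quoted on the results slides (76 accelerated, 60 decelerated, 321 neither) should add up to the 457 WGD pairs and reproduce the quoted percentages. The snippet below only redoes that arithmetic; the dictionary keys are labels chosen here.

```python
total_pairs = 457
counts = {"accelerated": 76, "decelerated": 60, "neither": 321}

# The three classes partition the 457 pairs.
assert sum(counts.values()) == total_pairs

# Percentages rounded to whole numbers, as on the slides.
percentages = {name: round(100 * n / total_pairs) for name, n in counts.items()}
print(percentages)
```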
28 Summary S. cerevisiae arose from an ancient WGD. –Massive loss of ~90% of duplicated genes in small deletions. –Preserving at least one copy of each ancestral gene. Divergence of paralogs: –Accelerated evolution (17%) –Derived genes tend to be specialized in function, expression level and localization. –Derived genes tend to lose essential aspects of their ancestral function.
29 Second article Transcription control reprogramming in genetic backup circuits. Kafri R, Bar-Even A, Pilpel Y. Nat Genet. Mar 2005.
30 Introduction Severe mutations often do not result in an abnormal phenotype. This is partially ascribed to redundant paralogs that provide backup for each other in case of mutation. Suggested mechanism: transcriptional reprogramming.
31 Definitions Working on S. cerevisiae. Paralog pairs defined by BLASTing their DNA sequences. Dispensable genes = non essential.
32 Expression parameters For each pair of paralogs: –Calculate 40 correlation coefficients of mRNA expression. –Define: mean expression similarity = the mean of these coefficients. –Define: partial co-regulation (PCoR) = the standard deviation of these coefficients.
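The two parameters are just the mean and the spread of the per-pair correlation coefficients. The sketch below takes the coefficients as given, since the slide does not say how the 40 values are obtained, and it assumes the population standard deviation; the original analysis may use the sample version. The function name is hypothetical.

```python
from statistics import mean, pstdev


def expression_parameters(correlations):
    """Return (mean expression similarity, PCoR) for one paralog pair.

    `correlations` holds the pair's expression correlation coefficients.
    PCoR is taken here as the population standard deviation (an assumption).
    """
    coeffs = list(correlations)
    if not coeffs:
        raise ValueError("at least one correlation coefficient is required")
    return mean(coeffs), pstdev(coeffs)


# Toy pair: strongly correlated in half of the cases, anti-correlated in the rest.
similarity, pcor = expression_parameters([0.75, 0.75, -0.25, -0.25])
print(similarity, pcor)
```

A pair that is co-expressed in some settings but not in others gets a modest mean similarity and a high PCoR, while a pair whose coefficients are all alike gets a PCoR of zero.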
33 Summary of observations
                  Expressed differently   Co-expressed
Remote paralog              +                  -
Close paralog               -                  +
+ : backup enabled
34 Close paralogs Similar sequences: –Similar expression –Enable backup In close paralogs: backup increases with co-expression.
35 Remote paralogs Backup is optimal in non-co-expressed pairs. Co-expressed pairs (little backup): interaction, sub-functionalization.
36 Suggestion for backup mechanism A, B – genes which are expressed differently. Upon mutation in A: expression of gene B is reprogrammed. Result: B takes on the wild-type expression profile of A.
37 Experimental verification: reprogramming in Acs1/Acs2 [Figure: Acs1 and Acs2 in glucose; labels: Wild-type, Acs1, Acs2]
38 What is the mechanism enabling this change? Suggestion: backup occurs among paralogs with partial co-regulation, which enables switching from a different expression profile to a similar one. Observation: PCoR predicts backup.
39 Partial motif content overlap is optimal for backup $O = \frac{|m_1 \cap m_2|}{|m_1 \cup m_2|}$ [Figure: proportion of dispensable genes (backup measure) vs. motif content overlap (O)]
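The motif content overlap O is the size of the intersection of the two motif sets divided by the size of their union. A minimal sketch follows; the motif names are hypothetical, and returning 0.0 when neither gene has any motif is an assumption, since the slide does not cover that case.

```python
def motif_overlap(m1, m2):
    """Motif content overlap O = |m1 ∩ m2| / |m1 ∪ m2| for two motif collections."""
    m1, m2 = set(m1), set(m2)
    union = m1 | m2
    if not union:
        return 0.0  # assumption: no motifs at all counts as no overlap
    return len(m1 & m2) / len(union)


# Toy promoters (hypothetical motif names): two shared motifs, one unique to each.
overlap = motif_overlap({"motifA", "motifB", "motifC"}, {"motifB", "motifC", "motifD"})
print(overlap)
```

Here the paralogs share two of four distinct motifs, giving a partial overlap rather than the extremes of 0 (nothing shared) or 1 (identical motif content).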
40 suggestion Unique motifs -> different expression level. Shared motifs -> enable responding to the same conditions. Hypothesis: PCoR underlies reprogramming and backup.
41 In high-PCoR paralogs, one gene is upregulated when the other is deleted [Figure: fold change vs. partial co-regulation (predicted backup capacity); bins: <0.35, 0.35 –, >0.45] (Hughes et al. Cell 2000)
42 What controls reprogramming? Kinetic model [Diagram nodes: T, E1, E2, G1, G2, M1, M2]: G1, G2 – paralogous genes. E1, E2 – their products. T – a TF which is generated by M1 and has a binding site in both genes.
43 Conclusions In remote paralogs: genes which are expressed differently but have partial common regulation tend to back each other up. In close paralogs: backup increases with co-expression.
44 Third article Gene regulatory network growth by duplication Teichmann SA, Babu MM. Nat Genet. May, 2004
45 Main questions What is the role of gene duplication in regulatory network evolution? Determine the extent to which duplicated genes inherit interactions from their ancestors. Describe possible mechanisms which lead to the formation of a new interaction.
46 Basic unit of gene regulation Transcription factor, DNA binding site, target gene (or transcription unit). Complex network: 1 gene is regulated by a few transcription factors. 1 transcription factor controls more than one gene.
47 Research subjects E. coli and yeast known regulatory networks: > 100 transcription factors regulate several hundred genes. Gene regulatory network in yeast: 477 proteins (109 TFs TGs), 901 interactions. Gene regulatory network in E. coli: 795 proteins (121 TFs TGs), 1423 interactions. Sources: Shen-Orr et al. Nat. Genet. (2002) and RegulonDB; Guelzim et al. Nat. Genet. (2002)
48 Duplication (reminder) After a duplication event, the new copy can: –Inherit the regulatory interaction –Lose the regulatory interaction Also, a new interaction may arise.
49 Homology detection Structural protein homology: detects more distant relationships than sequence. > 65% of the genes are the result of gene duplication. Same domain architecture -> common ancestor.
50 Duplication of transcription factor [Diagram: transcription factor and target gene; duplication of TF followed by inheritance, or by loss and gain]
51 Duplication of transcription factor (TF) At first, the new TF regulates the same target gene. Divergence: –Regulate the same gene but respond to a different signal. –Recognize a new binding site. More than 2/3 of TFs in E. coli and yeast have at least one interaction in common with their duplicates (128 interactions in E. coli (10%), 188 interactions in yeast (22%)).
52 Example: Duplication of TF in yeast [Diagram: Pdr1, Pdr3, Flr1] Both homologs are involved in drug response. They respond to different signals.
53 Duplication of target gene and its upstream region [Diagram: transcription factor and target gene; inheritance, or loss and gain]
54 Duplication of target gene (TG) and its upstream region First, both genes are regulated by the same TF. Divergence: –Change coding sequence but stay under the same TF control –Change upstream region as well, resulting in recognition by a different TF. 272 interactions in E. coli (22%). 166 interactions in yeast (20%).
55 Example: Duplication of TG in E. coli [Diagram: BirA, BioA, BioF] The BioA and BioBFCD operons are regulated by the BirA TF. BioA and BioF are homologous enzymes in the biotin biosynthesis pathway.
56 Duplication of transcription factor (TF) and its target gene (TG) around the same time [Diagram: duplication of TF + TG, followed by gain]
57 Duplication of transcription factor (TF) and its target gene (TG) around the same time Can happen if both were adjacent on the chromosome. The new TF regulates only the new TG, while the old TF regulates the old TG. Divergence of TF or TG can result in additional interactions. 74 interactions in E. coli (6%). 31 interactions in yeast (4%).
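The three duplication scenarios (TF only, TG only, TF and TG together) can be pictured as operations on a set of (TF, TG) edges. The sketch below is a toy model rather than the method of the paper: it covers only the state immediately after duplication, before any loss or gain of interactions, and all function and gene names are hypothetical.

```python
def duplicate_tf(edges, tf, new_tf):
    """Duplicate a TF: the copy initially inherits every target of the original."""
    return edges | {(new_tf, target) for (factor, target) in edges if factor == tf}


def duplicate_tg(edges, tg, new_tg):
    """Duplicate a TG with its upstream region: the copy inherits every regulator."""
    return edges | {(factor, new_tg) for (factor, target) in edges if target == tg}


def duplicate_both(edges, tf, tg, new_tf, new_tg):
    """Duplicate an interacting TF/TG pair: the new TF regulates only the new TG."""
    if (tf, tg) not in edges:
        raise ValueError("tf does not regulate tg in this network")
    return edges | {(new_tf, new_tg)}


# Toy network (hypothetical names): TF1 -> geneA, TF1 -> geneB, TF2 -> geneB.
edges = {("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneB")}
after_tf = duplicate_tf(edges, "TF1", "TF1_dup")
after_tg = duplicate_tg(edges, "geneB", "geneB_dup")
after_both = duplicate_both(edges, "TF1", "geneA", "TF1_dup", "geneA_dup")
print(sorted(after_tf - edges), sorted(after_tg - edges), sorted(after_both - edges))
```

Later divergence would then remove some of the inherited edges or add new ones, which is what separates the "inheritance" and "loss and gain" outcomes on the slides.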
58 Example: Duplication of both TF and its TG in E. coli [Diagram: AraC, AraBAD; RhaR, RhaBAD]
59 Duplication in the gene regulatory network in E. coli and yeast Gene regulatory network in yeast: 45% duplication and inheritance, 43% duplication and gain, 12% innovations. Gene regulatory network in E. coli: 38% duplication and inheritance, 52% duplication and gain, 10% innovations. Sources: Shen-Orr et al. Nat. Genet. (2002) and RegulonDB; Guelzim et al. Nat. Genet. (2002)
60 Some more numbers
Duplication and inheritance:
        E. coli   yeast
TF:     10%       22%
TG:     22%       20%
Both:   6%        4%
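The per-category percentages can be compared with the "duplication and inheritance" totals quoted on the preceding slide (38% for E. coli, 45% for yeast). The snippet below only redoes that addition. Reading the one-point gap for yeast as a rounding effect is an assumption; the slides do not explain it.

```python
inheritance = {
    "E. coli": {"TF": 10, "TG": 22, "both": 6},
    "yeast": {"TF": 22, "TG": 20, "both": 4},
}
# Totals as quoted on the "duplication and inheritance" slide.
reported_total = {"E. coli": 38, "yeast": 45}

summed = {organism: sum(parts.values()) for organism, parts in inheritance.items()}
differences = {organism: summed[organism] - reported_total[organism] for organism in summed}
print(summed, differences)
```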
61 Are duplication patterns linked to topology of networks? Gene regulatory networks in E. coli and yeast: the number of TGs per TF obeys a power law. Do TFs with many TGs have many homologous genes as their targets? No.
62 Conclusions In both E. coli and yeast ~90% of the interactions evolved by duplication: –Half of them: duplication + inheritance of interaction –Other half: duplication + gain of new interactions.
63 The End Of the semester…