1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology.

Presentation transcript:

1 Paralogs Inbal Yanover Reading Group in Computational Molecular Biology

2 Homologs Orthologs: homologous sequences are orthologous if they were separated by a speciation event. Paralogs: homologous sequences are paralogous if they were separated by a gene duplication event.
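
The two definitions above can be sketched as a lookup of the event at the last common ancestor of two genes in a gene tree. This is only an illustration: the tree, the gene names and the event labels are hypothetical and assumed to be known in advance.

```python
# Toy gene tree (hypothetical gene names). Internal nodes are
# (event, left, right); leaves are gene names.
TREE = (
    "speciation",
    ("duplication", "scer_copy1", "scer_copy2"),
    "kwal_gene",
)


def leaves(node):
    """Return the set of gene names below a node."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def relationship(node, gene_a, gene_b):
    """Classify two genes by the event at their last common ancestor."""
    if isinstance(node, str) or not {gene_a, gene_b} <= leaves(node):
        raise ValueError("both genes must be distinct leaves of the tree")
    event, left, right = node
    # Descend while a single child still contains both genes.
    for child in (left, right):
        if {gene_a, gene_b} <= leaves(child):
            return relationship(child, gene_a, gene_b)
    return "orthologs" if event == "speciation" else "paralogs"


print(relationship(TREE, "scer_copy1", "scer_copy2"))
print(relationship(TREE, "scer_copy1", "kwal_gene"))
```

The two copies inside one species are separated by the duplication node, while either copy and the outgroup gene are separated by the speciation node.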

3 Genomic duplication Can involve: individual genes, genomic segments, or the whole genome (whole genome duplication, WGD). Gene duplication has a major role in evolution.

4 Whole genome duplication Large scale adaptation Polyploidy → instability Back to stability: –gene loss –mutation –genomic rearrangements

5 Fate of duplicated genes Find specialized ‘niche’: Localization Temporal expression Expression level Another classification: Sub-functionalization Neo-functionalization (lowest probability) Non-functionalization (70%)

6 First article Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Kellis M, Birren BW, Lander ES. Nature. Apr

7 Main ideas The S. cerevisiae genome arose from an ancient whole-genome duplication, shown by comparison with K. waltii. Analyzing post-duplication divergence of paralogs.

8 Expected signature for genome duplication: After duplication, one paralog is usually lost (random local deletions). Both copies are retained only if they acquire distinct functions. Eventually, only a few paralogous genes remain, in the same order and orientation. Such regions should be short, since chromosomal rearrangements disrupt gene order over time.

9 Model for WGD followed by massive gene loss Common ancestor

10 Proving existence of an ancient WGD Look for a species (Y) related to S. cerevisiae (S) that diverged before the WGD. Y and S should have a 1:2 mapping and: –Nearly every region in Y would correspond to 2 regions in S (‘sister regions’). –Each sister region in S would contain an ordered subsequence of the genes in Y. –Each sister region in S would contain ~half of Y’s genes. –Together, the two sister regions account for nearly all of Y’s genes. –Every region of S would correspond to one region in Y.

11 Y = K. waltii Sequenced and assembled into 8 complete chromosomes (16 in S. cerevisiae). 5,230 likely protein-coding genes (5,714 genes in S. cerevisiae). 7% of its genes show no protein similarity to S. cerevisiae. Identifying orthologous regions: –Matching genes (based on protein similarity). –Regions with numerous matching genes in the same order. Most local regions in K. waltii mapped to two regions in S. cerevisiae. Each of those regions matched a subset of the K. waltii genes.

12 Quantify observations DCS – Doubly Conserved Synteny: maximal regions in K. waltii that map across their entire length to two distinct regions in S. cerevisiae.
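
A rough sketch of how DCS blocks could be found, under strong simplifying assumptions: each K. waltii gene is already annotated with the set of S. cerevisiae regions in which it has a match (at most two, with hypothetical region labels), and blocks are collected greedily and without overlap. This is not the procedure used in the article, only an illustration of the definition.

```python
def dcs_blocks(matches):
    """Greedy, non-overlapping scan for doubly conserved synteny blocks.

    matches: list in K. waltii gene order; each entry is the set of
    S. cerevisiae region labels (hypothetical) where that gene has a match.
    An empty set means no match (e.g. both copies lost).
    Returns (first_index, last_index, regions) for runs of genes whose
    matches all fall within exactly two regions.
    """
    blocks = []
    start, regions = 0, set()
    for i, gene_regions in enumerate(matches):
        if len(regions | gene_regions) > 2:
            # A third region appears: close the current run.
            if len(regions) == 2:
                blocks.append((start, i - 1, frozenset(regions)))
            start, regions = i, set(gene_regions)
        else:
            regions |= gene_regions
    if len(regions) == 2:
        blocks.append((start, len(matches) - 1, frozenset(regions)))
    return blocks


example = [{"r1"}, {"r2"}, {"r1", "r2"}, set(), {"r1"}, {"r3"}, {"r4"}, {"r3"}]
for block in dcs_blocks(example):
    print(block)
```

In the toy input, genes 0 to 4 map to regions r1 and r2 and genes 5 to 7 map to r3 and r4, so two blocks are reported. A real analysis would also check gene order within each sister region.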

13 Gene and region correspondence

14 Results 253 DCS blocks containing most of both genomes. (75% of K. waltii genes and 81% of S. cerevisiae genes) DCS blocks tile 85% of each K. waltii chromosome -> as expected in WGD Typical DCS block: –27 genes. –Separated by small segments (~3 genes), that match one conserved region in S. cerevisiae.

15 Duplicate mapping of centromeres Note: no paralogs here!

16 Duplicated blocks in S. cerevisiae Using the DCS blocks: define 253 sister regions in S. cerevisiae. Many of those could not be recognized without K. waltii mediation.

17 Duplicated blocks in S. cerevisiae

18 Zooming in on one sister region

19 Conclusion WGD event occurred in the Saccharomyces lineage after the divergence from K. waltii.

20 Pattern of gene loss The number of chromosomes was doubled. Despite the WGD, the current S. cerevisiae genome: –is 13% larger than the K. waltii genome. –has 10% more genes. Gene loss, two questions: –Large segmental deletions or individual gene deletions? –Balanced between the two paralogous regions, or acting primarily on one of them? Analysis of DCS blocks shows: –average size of lost segment: 2 genes. –average balance: 43%-57%.
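
The two statistics above (size of lost segments and balance of loss between sister regions) can be illustrated on a single toy block. The input encoding is an assumption made for this sketch: for each ancestral gene, in order, we record whether a copy survives in sister region "A", in "B", or in both ("AB").

```python
from itertools import groupby


def loss_stats(retained):
    """Summarize gene loss in one DCS block.

    retained: list over ancestral gene order with entries "A", "B" or "AB"
    (hypothetical encoding: which sister region still carries the gene).
    Assumes at least one gene was lost from one of the regions.
    """
    # Region that lost its copy; None when both copies were kept.
    lost_from = [{"A": "B", "B": "A"}.get(r) for r in retained]
    losses_a = lost_from.count("A")
    losses_b = lost_from.count("B")
    total = losses_a + losses_b
    # A lost segment is a run of consecutive genes lost from the same region.
    runs = [len(list(group)) for key, group in groupby(lost_from) if key is not None]
    return {
        "balance": (losses_a / total, losses_b / total),
        "mean_segment": sum(runs) / len(runs),
    }


stats = loss_stats(["A", "A", "B", "AB", "B", "A", "A", "A"])
print(stats)
```

Averaging `mean_segment` and `balance` over all blocks would give numbers comparable in kind to the 2 genes and 43%-57% quoted on the slide.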

21 Two models – what happens after duplication event One copy preserves original function while the other one is free to diverge (Ohno) Both copies would diverge more rapidly and acquire new functions

22 Evolutionary analysis Study the evolution of the 457 gene pairs that arose by WGD: Use synteny to distinguish them from pairs that arose by local duplication events. Compute divergence rates for them (both amino acid and nucleotide), using sequences of K. waltii, S. cerevisiae and S. bayanus.

23 Results 17% of gene pairs (76 of 457) showed accelerated protein evolution relative to K. waltii. In 95% of them, accelerated evolution was confined to only one paralog. This supports Ohno’s model: one paralog retains the ancestral function, the other one gains a derived function.

24 Ancestral and derived paralogs Gene pairs in which one paralog has evolved >50% faster than the other. Often, derived paralogs are specialized in: –Cellular localization (Acc1 – Hfa1) –Temporal expression (Skt5 – Shc1)
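
The ">50% faster" criterion can be written as a small rule. The sketch below assumes that a divergence rate has already been estimated for each paralog (for example relative to the K. waltii ortholog); the function name, the return labels and the example rates are hypothetical.

```python
def classify_pair(rate_1, rate_2, threshold=1.5):
    """Label a paralog pair by the asymmetry of its divergence rates.

    Returns ("asymmetric", n) when copy n evolved more than 50% faster
    than the other copy (n is then the candidate derived paralog),
    otherwise ("symmetric", None).
    """
    fast, slow = max(rate_1, rate_2), min(rate_1, rate_2)
    if fast <= threshold * slow:
        return ("symmetric", None)
    return ("asymmetric", 1 if rate_1 > rate_2 else 2)


print(classify_pair(0.10, 0.30))
print(classify_pair(0.20, 0.25))
```

In the first call copy 2 is three times faster and is flagged as the derived copy; in the second call the difference is 25%, below the threshold.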

25 Ancestral and derived paralogs, cont. Functional distinction confirmed with knockout experiments (in rich medium) of all 115 genes: –Deletion of the ancestral paralog was lethal in 18%. –Deletion of the derived paralog was never lethal. Explanation: –The derived paralog is not essential under these conditions. –The ancestral paralog compensates (but not vice versa).

26 More results 60 of the 457 pairs (13%) showed decelerated protein evolution, including highly constrained proteins: –Ribosomal proteins (25) –Histone proteins (2) –Translation factors (4) In 90% of them both paralogs were very similar (98% amino acid identity versus 55% for all pairs).

27 However… ~70% of the gene pairs had neither accelerated protein evolution nor decelerated evolution (321/457) Possible explanations: –Too strict criteria –Divergence in regulatory regions will not be seen here. –Sometimes it’s nice to have two copies.
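
As a quick arithmetic check, the three classes reported on the last slides (76 accelerated, 60 decelerated, 321 neither) can be tallied against the 457 pairs. Only the counts come from the slides; the dictionary keys are labels chosen for this sketch.

```python
# Counts taken from the slides; key names are illustrative labels.
counts = {"accelerated": 76, "decelerated": 60, "neither": 321}

total = sum(counts.values())
shares = {label: round(100 * n / total) for label, n in counts.items()}

print(total)
print(shares)
```

The classes add up to all 457 pairs, and the rounded shares match the 17%, 13% and ~70% quoted above.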

28 Summary S. cerevisiae arose from an ancient WGD. –Massive loss of ~90% of duplicated genes in small deletions. –Preserving at least one copy of each ancestral gene. Divergence of paralogs: –Accelerated evolution (17% of pairs). –Derived genes tend to be specialized in function, expression level and localization. –Derived genes tend to lose essential aspects of their ancestral function.

29 Second article Transcription control reprogramming in genetic backup circuits. Kafri R, Bar-Even A, Pilpel Y. Nat Genet. Mar 2005.

30 Introduction Severe mutations often don’t result in abnormal phenotype Partially ascribed to redundant paralogs, that provide backup to each other in case of mutation Suggested mechanism: transcriptional reprogramming

31 Definitions Working on S. cerevisiae. Paralog pairs defined by BLASTing their DNA sequences. Dispensable genes = non essential.

32 Expression parameters For each pair of paralogs: –Calculate 40 correlation coefficients of mRNA expression. –Define mean expression similarity as the mean of these coefficients. –Define partial co-regulation (PCoR) as their standard deviation.
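
A minimal sketch of the two expression parameters, assuming each correlation coefficient is a Pearson correlation computed within one expression dataset. The data layout, the function names and the choice of the population standard deviation are assumptions of this sketch, and the toy input uses two datasets instead of 40.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(sxx * syy)


def expression_parameters(profiles_a, profiles_b):
    """Return (mean expression similarity, PCoR) for one paralog pair.

    profiles_a, profiles_b: one expression profile per dataset for each
    paralog (hypothetical layout).
    """
    coefficients = [pearson(a, b) for a, b in zip(profiles_a, profiles_b)]
    return mean(coefficients), pstdev(coefficients)


# Co-expressed in the first dataset, anti-correlated in the second.
gene_a = [[1, 2, 3], [1, 2, 3]]
gene_b = [[2, 4, 6], [3, 2, 1]]
similarity, pcor = expression_parameters(gene_a, gene_b)
print(similarity, pcor)
```

A pair that is correlated in some datasets and not in others gets a low mean similarity but a high standard deviation, which is the situation the PCoR measure is meant to capture.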

33 Summary of observations (+ : backup enabled) Remote paralogs: expressed differently +, co-expressed -. Close paralogs: expressed differently -, co-expressed +.

34 Close paralogs Similar sequences: –Similar expression –Enable backup In close paralogs: backup increases with co-expression. (Table: remote paralogs: expressed differently +, co-expressed -; close paralogs: expressed differently -, co-expressed +.)

35 Remote paralogs Backup is optimal in non-co-expressed pairs. Co-expression (little backup): interaction, sub-functionalization. (Table: remote paralogs: expressed differently +, co-expressed -; close paralogs: expressed differently -, co-expressed +.)

36 Suggestion for backup mechanism A, B - genes which are expressed differently. Upon mutation in A: expression of gene B is reprogrammed. Result: wild type expression profile of A.

37 Experimental verification: reprogramming in Acs1/Acs2 [Figure: expression of Acs1 and Acs2 in glucose; panels labeled Wild-type, Acs1, Acs2]

38 What is the mechanism enabling this change? Suggestion: backup occurs among paralogs with partial co-regulation, which enables switching from a different expression profile to a similar one. Observation: PCoR predicts backup.

39 Partial motif content overlap is optimal for backup Motif content overlap: $O = |m_1 \cap m_2| / |m_1 \cup m_2|$ [Figure: backup measure (proportion of dispensable genes) versus motif content overlap (O)]
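
The overlap measure on this slide is the Jaccard index of the two motif sets. A minimal sketch, assuming m1 and m2 are the sets of regulatory motifs found in the promoters of the two paralogs; the motif names are placeholders.

```python
def motif_overlap(m1, m2):
    """Motif content overlap O = |m1 ∩ m2| / |m1 ∪ m2| (0.0 if both are empty)."""
    m1, m2 = set(m1), set(m2)
    union = m1 | m2
    return len(m1 & m2) / len(union) if union else 0.0


# Placeholder motif names: two shared motifs out of four in total.
print(motif_overlap({"motif_a", "motif_b", "motif_c"}, {"motif_b", "motif_c", "motif_d"}))
```

O is 0 for promoters with no motif in common and 1 for identical motif content; the slide's claim concerns intermediate values.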

40 suggestion Unique motifs -> different expression level. Shared motifs -> enable responding to the same conditions. Hypothesis: PCoR underlies reprogramming and backup.

41 In high-PCoR paralogs one gene is upregulated when the other is deleted [Figure: fold change versus partial co-regulation (predicted backup capacity), binned as <0.35, 0.35 – , >0.45; data from Hughes et al. Cell 2000]

42 What controls reprogramming? Kinetic model [diagram with nodes T, E1, E2, G1, G2, M1, M2]: G1, G2 – paralogous genes. E1, E2 – their products. T – TF which is generated by M1 and has a binding site in both genes.

43 Conclusions In remote paralogs: genes which are expressed differently but have partially common regulation tend to back each other up. In close paralogs: backup increases with co-expression.

44 Third article Gene regulatory network growth by duplication Teichmann SA, Babu MM. Nat Genet. May, 2004

45 Main questions What is the role of gene duplication in regulatory network evolution? Determine the extent to which duplicated genes inherit interactions from their ancestors. Describe possible mechanisms which lead to the formation of a new interaction.

46 Basic unit of gene regulation Transcription factor, DNA binding site, target gene (or transcription unit). Complex network: 1 gene is regulated by a few transcription factors. 1 transcription factor controls more than one gene.

47 Research subjects Known regulatory networks of E. coli and yeast: > 100 transcription factors regulate several hundred genes. Gene regulatory network in yeast: 477 proteins (109 TFs TGs), 901 interactions (Guelzim et al. Nat. Genet. 2002). Gene regulatory network in E. coli: 795 proteins (121 TFs TGs), 1423 interactions (Shen-Orr et al. Nat. Genet. 2002 and RegulonDB).

48 duplication event: –Inherit regulatory interaction –Lose regulatory interaction Also, a new interaction may arise. Duplication (reminder)

49 Homology detection Structural protein homology detects more distant relationships than sequence. > 65% of the genes are the result of gene duplication. Same domain architecture -> common ancestor.

50 Duplication of transcription factor [Figure: transcription factor and target gene; duplication of TF followed by inheritance, or by loss and gain]

51 Duplication of transcription factor (TF) At first, the new TF regulates the same target gene. Divergence: –Regulate the same gene but respond to a different signal. –Recognize a new binding site. More than 2/3 of the TFs in E. coli and yeast have at least one interaction in common with their duplicates (128 interactions in E. coli (10%), 188 interactions in yeast (22%)).

52 Example: Duplication of TF in yeast Both homologs are involved in drug response, but they respond to different signals. [Figure: TFs Pdr1 and Pdr3, target gene Flr1]

53 Duplication of target gene and its upstream region [Figure: transcription factor and target gene; duplication followed by inheritance, or by loss and gain]

54 Duplication of target gene (TG) and its upstream region First, both genes are regulated by the same TF. Divergence: –Change coding sequence but stay under the same TF control. –Change upstream region as well, resulting in recognition by a different TF. 272 interactions in E. coli (22%). 166 interactions in yeast (20%).

55 Example: Duplication of TG in E. coli The BioA and BioBFCD operons are regulated by the BirA TF. BioA and BioF are homologous enzymes in the biotin biosynthesis pathway.

56 Duplication of transcription factor (TF) and its target gene (TG) around the same time [Figure: duplication of TF + TG, followed by gain]

57 Duplication of transcription factor (TF) and its target gene (TG) around the same time Can happen if both were adjacent on the chromosome. The new TF regulates only the new TG, while the old TF regulates the old TG. Divergence of TF or TG can result in additional interactions. 74 interactions in E. coli (6%). 31 interactions in yeast (4%).

58 Example: Duplication of both TF and its TG in E. coli AraC regulates AraBAD; RhaR regulates RhaBAD.

59 Duplication in the gene regulatory networks of E. coli and yeast Yeast: 45% duplication and inheritance, 43% duplication and gain, 12% innovations (Guelzim et al. Nat. Genet. 2002). E. coli: 38% duplication and inheritance, 52% duplication and gain, 10% innovations (Shen-Orr et al. Nat. Genet. 2002 and RegulonDB).

60 Some more numbers Duplication and inheritance (E. coli / yeast): TF: 10% / 22%. TG: 22% / 20%. Both: 6% / 4%.
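
The three inheritance categories counted above (TF duplicated, TG duplicated, both duplicated) can be sketched as a classification of network edges. Assumptions of this sketch: the network is a list of (TF, TG) pairs, homology is given as a family label per protein, and an interaction counts as inherited when a homologous counterpart edge exists. The family labels are invented, and the toy network mixes the yeast and E. coli examples from the slides purely for illustration.

```python
def classify_interactions(edges, family):
    """Map each (TF, TG) edge to the set of duplication categories it shares.

    family: dict protein -> homology family label (hypothetical labels).
    Categories: "TF" (a homologous TF regulates the same TG), "TG" (the
    same TF regulates a homologous TG), "TF+TG" (a homologous TF
    regulates a homologous TG).
    """
    def homologous(a, b):
        return a != b and family[a] == family[b]

    edges = set(edges)
    result = {}
    for tf, tg in edges:
        kinds = set()
        for tf2, tg2 in edges:
            if homologous(tf, tf2) and tg2 == tg:
                kinds.add("TF")
            elif tf2 == tf and homologous(tg, tg2):
                kinds.add("TG")
            elif homologous(tf, tf2) and homologous(tg, tg2):
                kinds.add("TF+TG")
        result[(tf, tg)] = kinds
    return result


network = [("Pdr1", "Flr1"), ("Pdr3", "Flr1"),
           ("BirA", "BioA"), ("BirA", "BioF"),
           ("AraC", "AraBAD"), ("RhaR", "RhaBAD")]
families = {"Pdr1": "f1", "Pdr3": "f1", "Flr1": "f2",
            "BirA": "f3", "BioA": "f4", "BioF": "f4",
            "AraC": "f5", "RhaR": "f5", "AraBAD": "f6", "RhaBAD": "f6"}
labels = classify_interactions(network, families)
for edge in network:
    print(edge, sorted(labels[edge]))
```

Counting the edges in each category and dividing by the total number of interactions would give percentages of the kind shown on the slide.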

61 Are duplication patterns linked to topology of networks? Gene regulatory networks in E. coli and yeast: the number of TGs per TF obeys a power law. Do TFs with many TGs have many homologous genes as their targets? No.
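
To inspect the number of TGs per TF, one can tabulate the out-degree distribution of the network. A small sketch with an invented toy network; it only counts degrees and does not fit or test a power law.

```python
from collections import Counter


def out_degree_distribution(edges):
    """Return {number of targets: number of TFs with that many targets}."""
    targets_per_tf = Counter(tf for tf, _ in set(edges))
    return Counter(targets_per_tf.values())


# Invented toy network: T1 has three targets, T2 and T3 have one each.
toy_edges = [("T1", "a"), ("T1", "b"), ("T1", "c"), ("T2", "a"), ("T3", "d")]
distribution = out_degree_distribution(toy_edges)
print(sorted(distribution.items()))
```

For a power-law network, plotting these counts against the degree on log-log axes gives an approximately straight line.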

62 Conclusions In both E. coli and yeast ~90% of the interactions evolved by duplication: –Half of them: duplication + inheritance of interaction –Other half: duplication + gain of new interactions.

63 The End Of the semester…