Origins and impact of constraints in evolution of gene families Boris E. Shakhnovich and Eugene V.Koonin Genome Research 2006, October 19 Stella Veretnik.

Presentation transcript:

Origins and impact of constraints in evolution of gene families. Boris E. Shakhnovich and Eugene V. Koonin. Genome Research 2006, October 19. Stella Veretnik, Journal Club, November 14, 2006.

Evolution through paralogy.
Paralogous families with essential genes: E-families.
Paralogous families without essential genes: N-families.
Tolerance to mutations -> extent of evolution within the family.
Definition of essential genes: genes that, when mutated, can result in a lethal phenotype.
Essential genes and their families diverge more slowly than non-essential genes, yet diverge to a greater extent than non-essential genes.
Why does this happen? Which parameters are responsible? Unanswered.

Type of selection acting on evolving genes: purifying selection.

What is purifying selection? Selection that removes deleterious changes; it is indicated by a ratio Ka/Ks < 1.
Ka is the number of nonsynonymous substitutions per nonsynonymous site.
Ks is the number of synonymous substitutions per synonymous site.
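A minimal sketch of how the ratio is usually read. The function name and the tolerance band around 1 are illustrative assumptions, not something taken from the paper or the slides.

```python
def classify_selection(ka: float, ks: float, tol: float = 0.05) -> str:
    """Interpret a Ka/Ks ratio.

    `tol` is an arbitrary, illustrative band around 1 within which the
    ratio is treated as indistinguishable from neutral evolution.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if ratio < 1 - tol:
        return "purifying"   # nonsynonymous changes are being removed
    if ratio > 1 + tol:
        return "positive"    # nonsynonymous changes are being favored
    return "neutral"


# Made-up values: Ka = 0.05, Ks = 0.50 gives a ratio of 0.1.
print(classify_selection(0.05, 0.50))
```

In practice Ka and Ks come from a codon-based estimation method applied to aligned coding sequences; this sketch only covers the interpretation step.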

[Figure labels: 9.2%, 18.4%, 3.5%; fraction of essential genes that are not singletons; ratio of non-essential to essential genes in E-families]
Most essential genes do not have paralogs. Why? Is there something special about those that do have paralogs? No answer in this paper…
How can a gene have paralogs and still be essential? All the paralogs together cannot replace all the functions of the essential gene. Once they can, the gene becomes non-essential.

Significantly fewer edges between paralogs in E-families. Edges represent homology relationships.
[Figure: divergence and diffusion graph]
How were the families assembled?

Construction of paralogous families.
Each ORF is a node on a graph.
1. Do an all-vs.-all BLAST comparison of the sequences of all translated ORFs within an organism.
2. Measure the amino acid identity level between nodes.
3. Convert the amino acid alignments back to nucleotides and calculate Ks (synonymous substitutions per site) and Ka (nonsynonymous substitutions per site).
The result is 3 weighted graphs (as defined by 1, 2, and 3).
A paralogous family consists of a strongly connected component of the graph.
Cutoffs of Ks = 5 and E-value = 1e-15 are used in this work. In general there is a near-linear dependency of cutoff on Ks.
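The family-assembly step can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the input tuple format is hypothetical, the graph is treated as undirected (so "strongly connected component" reduces to an ordinary connected component), and the direction of the cutoffs (keep an edge when E-value <= 1e-15 and Ks <= 5) is an assumption.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-15  # cutoff quoted on the slide
KS_CUTOFF = 5.0         # cutoff quoted on the slide


def paralogous_families(hits):
    """Group ORFs into families.

    hits: iterable of (orf_a, orf_b, e_value, ks) tuples; this record
    layout is a hypothetical one chosen for the example.
    ORFs without any surviving edge (singletons) are not returned.
    """
    graph = defaultdict(set)
    for a, b, e_value, ks in hits:
        # Assumed direction of the cutoffs: keep strong, unsaturated hits.
        if a != b and e_value <= E_VALUE_CUTOFF and ks <= KS_CUTOFF:
            graph[a].add(b)
            graph[b].add(a)

    seen, families = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, family = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            family.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        families.append(family)
    return families


# Made-up hits: D-E fails the E-value cutoff, F-G fails the Ks cutoff.
example_hits = [
    ("A", "B", 1e-30, 1.2),
    ("B", "C", 1e-20, 3.0),
    ("D", "E", 1e-5, 0.5),
    ("F", "G", 1e-40, 7.0),
]
families = paralogous_families(example_hits)
print(families)
```

Under these assumptions a family would be labeled an E-family when at least one member appears in a list of essential genes, and an N-family otherwise.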

[Figure: largest families]
What is the typical size of an E-family and of an N-family? Are N-families typically larger? Are there more N-families than E-families? Both?
How paralogous families evolve. After duplication and divergence the following may happen:
a. Nonfunctionalization: a duplicate turns into a pseudogene (a more typical scenario for N-families).
b. Subfunctionalization: multiple functions of the ancestral gene are divided between the paralogs.
c. Neofunctionalization: one of the paralogs evolves a new function, the other keeps the old function(s).
(b and c are more common for E-families.)
Do non-essential members always evolve from essential members of the family? Can a duplicate of a non-essential paralog become essential?

Purifying selection is about 2 times stronger in E-families: the Ka/Ks ratio is lower in E-families.
How this is done:
1. For single feature polymorphisms (SFPs): check within Saccharomyces cerevisiae.
2. For the Ka/Ks ratio: compare orthologs between closely related species (yeast: S. cerevisiae/S. paradoxus; E. coli: K12/CFT073).
Implication: N-families diverge faster…

The rate of conversion to pseudogenes is substantially higher in N-families: a 6.8-fold difference.

Paralogs become fixed more often in N-families (does this explain the larger size of N-families?). An equal rate of duplication in E-families and in N-families is assumed.
What happens to the paralogs that do not go to fixation? Do they become pseudogenes, or something else?

Ks is higher in E-families than in N-families.
Implication: paralogs in E-families stick around for a longer time than in N-families (3 times longer).

Sequence divergence is higher in E-families.
[Figure panels: nonsynonymous substitutions among paralogs within the family; sequence identity among paralogs within the family]

It is possible to identify E- and N-families using only sequence divergence information.
[Figure: ROC plot; axis labels: (true negatives), (true positives)]
The clustering coefficient measures how well connected the neighbors of a given node in a graph are.
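A minimal sketch of the clustering coefficient described above, assuming the standard local definition (the fraction of pairs of a node's neighbors that are themselves linked). Whether the paper uses exactly this variant is an assumption, and the adjacency-dictionary format is chosen only for illustration.

```python
def clustering_coefficient(graph, node):
    """Local clustering coefficient of `node`.

    graph: dict mapping each node to the set of its neighbors
    (undirected, so every edge appears in both directions).
    """
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to connect
    # Count each linked pair of neighbors once (u < v avoids double counting).
    links = sum(
        1 for u in neighbors for v in neighbors if u < v and v in graph[u]
    )
    return 2.0 * links / (k * (k - 1))


# Made-up family graph: A is linked to B, C and D; only B and C are linked.
example_graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
coefficient_a = clustering_coefficient(example_graph, "A")
print(coefficient_a)
```

Here one of the three possible neighbor pairs of A is linked, so the coefficient for A is one third. A sparsely connected family gives low values for most of its nodes.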

Transcriptional regulation of paralogs changes more in E-families: paralogs rarely share transcription factors (ChIP-chip experiments).

Summary:
Two types of paralogous families exist: E-families and N-families.
The two types of families have dramatically different dynamics of molecular evolution.
E-families diverge slowly, but persist for long periods of time, thus diverging further than the paralogs in N-families.
N-families undergo a more dynamic evolution: many duplicates become fixed, many others become pseudogenes. The level of sequence divergence is significantly lower.
Duplicates in E-families typically assume part of the functions of the original gene and/or evolve a new function. This is less so with duplicates in N-families (no data shown for this…).
My musings:
N-families gradually evolve from E-families, when the essential gene(s) in the family are no longer essential. This happens when a sufficient number of duplicates exist to ensure that all functions of the original essential gene are covered.
In a minimalistic organism every gene would be an essential gene. A gene becomes non-essential when its functions are assumed by another gene or split between several genes.
Every non-essential gene will go through the stage of being in an E-family in which there is one essential gene.

In this scenario, the E-families are the transitional link for essential genes on their way to becoming non-essential. (You could argue that a more robust organism has fewer essential genes…)
Essential genes (singletons): very careful creeping forward.
Non-essential genes (N-families): careless evolution.
Transition to non-essentiality (E-families): careful evolution.
Different selection pressures in each category? Yes. But… how does the behavior of the family change once it crosses from E-family to N-family?