Genome Evolution. Amos Tanay 2009 Genome evolution Lecture 12: epistasis and the evolution of gene regulation.

Cells are complex gene networks
Britten and Davidson, July 1969
The view of cells as complex networks of genes that interact and regulate each other became a central part of the modern central dogma of molecular biology.

Many networks in today's biology: most are not directly interpretable in evolutionary/genomic terms, beware!
- Metabolic networks: represent metabolic reactions and the enzymes catalyzing them. State of the art: characterized in many species. Enzymes identified. Dynamics modeled using a linear approximation (flux balance analysis).
- Protein networks: represent different types of (usually physical) interactions among proteins. State of the art: methods in development (mass spec and more). Large surveys in yeast provide reasonable coverage. In mammals, work in progress. Dubious quality for some of the data. Structure-based prediction still minimal.
- Genetic interaction networks: represent fitness interactions among genes. State of the art: available for a large fraction of the gene pairs in yeast. In flies and mammals, techniques using RNAi are in development, but this is not easy.

Transcriptional regulation generates a network that is more directly encoded by the genome
Maps of interactions between TFs and genomic loci. State of the art: almost complete for specific conditions in yeast. Data on larger genomes are rapidly accumulating.
Transcriptional regulation is encoded at several levels of the genome:
- The transcription factor sequence (trans- effect)
- The binding site (cis- effect)
- (The binding site neighborhood: co-factors, epigenetics)
- (The sequences of co-factors and their own regulation...)
Transcriptional regulation (our phenotype) can therefore be:
- Conserved due to conservation of the genotype
- Divergent due to divergence of any of multiple loci in the genome
- Conserved due to coordinated divergence of multiple loci in the genome

Phenotypic innovation through regulatory adaptation (after S. Carroll)

After S. Carroll

Ancient and Recent Positive Selection Transformed Opioid cis-Regulation in Humans (Rockman, PLoS Biol, 2005)
- Sequence evidence for positive selection. Try to remember what can help us establish this (e.g. divergence and polymorphisms).
- The human variant indeed responds differently.

Big questions in the evolution of regulation:
- How does network structure affect genome evolution (conservation and divergence)?
- Can we enhance our understanding of these effects at the population genetics level?
- Which levels in the genome drive regulatory innovation (cis- or trans-)?
- What are the major drivers of phenotypic innovation: regulation or proteins?
Big challenges in comparative genomics of regulation:
- Can we infer regulatory mechanisms from patterns of conservation and divergence?
- Can we incorporate functional experiments on the regulatory phenotype into our models?
- Will extensive comparative genomics ultimately break regulatory codes that are currently not understood?

Comparative genomics:
- Obtain a set of sequenced genomes
- Collect some functional data on them (expression, TF interaction, epigenomics)
- Describe the conservation and divergence of the sequence and functional data
- Build models that describe genome evolution given some regulatory potential and fit them to the data, then infer function from the sequence
Interventions:
- Work with two or more species
- Introduce some genomic alteration, emulating some evolutionary scenario (possibly an absurd one)
- Examine the behavior of the altered genomic fragment
Evolutionary experiment:
- Evolve strains under controlled conditions
- Follow phenotypic and genomic changes (why isn't it actually possible?) (think about s and

Epistasis
Assume we have two loci, each bearing two alleles (A/a and B/b), and that the basal state of the population is homogeneous with alleles ab.
- f(A): the relative fitness of A, defined using the growth rate of the genome Ab, so f(A) = f(Ab)
- f(B): the relative fitness of B, defined using the growth rate of the genome aB, so f(B) = f(aB)
What is the fitness of AB? If the two loci are unrelated, we can expect it to be f(Ab)*f(aB).
When f(A) = 1+s, f(B) = 1+s', and s, s' are small, f(A)*f(B) ≈ 1+s+s' (the product term s*s' is negligible).
Epistasis is defined as the deviation from such linearity/independence:
- f(AB) > f(Ab)*f(aB): synergistic loci
- f(AB) < f(Ab)*f(aB): antagonistic loci
How widespread is epistasis? Is it positive or negative in general? And how does it affect evolution in general?
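
The definition above can be sketched in a few lines of Python. This is only an illustration: the function names (`epistasis`, `classify`, `approximation_error`) and the tolerance `tol` are assumptions made for this example, and fitness values are taken relative to the basal genotype ab (fitness 1).

```python
def epistasis(f_Ab, f_aB, f_AB):
    """Deviation of the double mutant from the multiplicative expectation.

    All fitness values are relative to the basal genotype ab (fitness 1).
    """
    return f_AB - f_Ab * f_aB


def classify(f_Ab, f_aB, f_AB, tol=1e-9):
    """Label a pair of loci using the sign of the deviation.

    `tol` is a hypothetical numerical tolerance, not part of the definition.
    """
    e = epistasis(f_Ab, f_aB, f_AB)
    if e > tol:
        return "synergistic"
    if e < -tol:
        return "antagonistic"
    return "independent"


def approximation_error(s, s2):
    """Difference between (1+s)(1+s') and the linear form 1+s+s'.

    Algebraically this equals s*s', which is negligible for small s, s'.
    """
    return (1 + s) * (1 + s2) - (1 + s + s2)


if __name__ == "__main__":
    print(classify(1.01, 1.02, 1.05))       # double mutant fitter than expected
    print(classify(1.01, 1.02, 1.02))       # double mutant less fit than expected
    print(approximation_error(0.01, 0.02))  # roughly s*s' = 0.0002
```

With s = 0.01 and s' = 0.02 the neglected term is about 0.0002, which is why the multiplicative and additive expectations are used interchangeably for small effects.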

Testing epistasis in viruses: directed mutagenesis (Sanjuan, PNAS)
- Genotypes of vesicular stomatitis virus carrying pairs of nucleotide substitution mutations (filled circles)
- 15 genotypes carrying pairs of beneficial mutations (empty circles)
Epistasis is generally negative here.

Testing epistasis in viruses: HIV-1 (Bonhoeffer et al., Science 2004)
- Isolated drug-resistant strains
- Compared growth in drug-free media (extracting the viral sequence and reintegrating it into a model virus)
- Sequenced the strains and compared them to a standard
- Plotted fitness relative to the number of mutations
For each pair of loci, compute the average fitness of the four genotypes ab, aB, Ab and AB, then estimate epistasis. To assess significance, recompute the same after shuffling the sequences.
- The mean is significantly higher than the randomized means
- The effect is stronger when the analysis is restricted to 59 loci with a significant effect on fitness
The results suggest that epistasis tends to be positive (at least in these viruses and in this condition).
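
A minimal sketch of this pairwise procedure, under stated assumptions: genotypes are encoded as tuples of 0/1 (0 = standard allele, 1 = mutant), epistasis is measured as the deviation from the multiplicative expectation on fitness normalized by the ab class, and "shuffling" is implemented by permuting fitness values among sequences. The function names and these choices are illustrative and are not taken from the original study.

```python
import random
from statistics import mean


def pair_epistasis(genotypes, fitness, i, j):
    """Epistasis between loci i and j from class-averaged fitness.

    Returns None if one of the four genotype classes is not observed.
    """
    groups = {(0, 0): [], (0, 1): [], (1, 0): [], (1, 1): []}
    for g, f in zip(genotypes, fitness):
        groups[(g[i], g[j])].append(f)
    if any(not values for values in groups.values()):
        return None
    ref = mean(groups[(0, 0)])          # average fitness of ab
    f_Ab = mean(groups[(1, 0)]) / ref   # relative fitness of Ab
    f_aB = mean(groups[(0, 1)]) / ref   # relative fitness of aB
    f_AB = mean(groups[(1, 1)]) / ref   # relative fitness of AB
    return f_AB - f_Ab * f_aB


def mean_epistasis(genotypes, fitness):
    """Average epistasis over all locus pairs with all four classes present."""
    n = len(genotypes[0])
    values = [pair_epistasis(genotypes, fitness, i, j)
              for i in range(n) for j in range(i + 1, n)]
    values = [v for v in values if v is not None]
    return mean(values) if values else None


def shuffle_test(genotypes, fitness, n_shuffles=1000, seed=0):
    """Compare the observed mean with means obtained after shuffling.

    Returns (observed mean, fraction of shuffles with mean >= observed),
    using the common (count + 1) / (n + 1) correction.
    """
    rng = random.Random(seed)
    observed = mean_epistasis(genotypes, fitness)
    shuffled = list(fitness)
    count = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if mean_epistasis(genotypes, shuffled) >= observed:
            count += 1
    return observed, (count + 1) / (n_shuffles + 1)
```

For example, with the toy data `genotypes = [(0, 0), (1, 0), (0, 1), (1, 1)]` and `fitness = [1.0, 0.9, 0.8, 0.9]`, the double mutant is fitter than the multiplicative expectation 0.9 * 0.8 = 0.72, so the estimated epistasis is positive.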

Functional sources for epistasis:
- Protein structure (interacting residues)
- Different positions in the same TFBS
- Two interacting TFBSs
- TF DNA-binding domain and its target site
- Two competing enzymes
- Two competing TFBSs
- RNA paired bases
- Groups of TFBSs at co-regulated promoters

RNA folds and the function of RNA molecules
RNA molecules perform a wide variety of functions in the cell. They differ in length and class, from very short miRNAs to much longer rRNAs or other structural RNAs. They are all strongly affected by base-pairing, which makes their structure mostly planar (with many exceptions!) and relatively easy to model.
Simple RNA folding energy: the number of matching base pairs, or a sum over base-pairing weights.
More complex energy (following Zuker): each feature has empirically determined parameters:
- stem stacking energy (adding a pair to a stem)
- bulge loop length
- interior loop length
- hairpin loop length
- dangling nucleotides, and so on
Pseudoknots (breaking of the base-pairing hierarchy) are typically forbidden.

Predicting fold structure
Due to the hierarchical nature of the structure (assuming no pseudoknots), the problem can be solved efficiently using dynamic programming.
We usually cannot be certain that there is a single optimal fold, especially if we are not at all sure we are looking at a functional RNA. It would be better to have posterior probabilities for base-pairing given the data and an energy model.
This can be achieved using a generalization of HMMs called the stochastic context-free grammar (SCFG).
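
As a concrete instance of the dynamic programming idea, here is a minimal sketch of the classic Nussinov-style recursion for the simplest energy model (maximize the number of base pairs, no pseudoknots). It is not the Zuker energy model or an SCFG. Allowing G-U wobble pairs and requiring at least `min_loop` unpaired bases inside a hairpin are assumptions of this example.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs(seq, min_loop=3):
    """Maximum number of nested (pseudoknot-free) base pairs in `seq`.

    best[i][j] holds the optimum for the subsequence seq[i..j] (inclusive).
    A pair (k, j) is allowed only if at least `min_loop` bases lie between.
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]              # case 1: j is unpaired
            for k in range(i, j - min_loop):    # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(max_pairs("GGGAAACCC"))  # a three-pair stem closed by an AAA loop
```

The table has O(n^2) entries and each is filled in O(n) time, so the running time is O(n^3). The nesting assumption is what makes the two cases independent; pseudoknots would break it.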

EvoFold: considering base-pairing as part of the evolutionary model (Pedersen, PLoS CB 2006)
Once base-pairing is predicted, the evolutionary model works with pairs instead of single nucleotides. By neglecting genomic context effects, this gives rise to a simple tree model that is easy to solve.
If we want to consider many possible base pairings simultaneously, things become more complicated. An exact algorithm that finds the best alignment given the fold structure is very expensive (n^5), even when using base-pairing scores and only two sequences.

EvoFold: considering base-pairing as part of the evolutionary model
Whenever we discover compensatory mutations, the prediction of a functional RNA becomes much stronger.
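
To make "compensatory mutation" concrete, the sketch below classifies what happened to one predicted base pair between two aligned, gap-free RNA sequences. It is an illustration only and is not EvoFold's phylogenetic model: the labels, the function name `classify_pair`, and the inclusion of G-U wobble pairs are assumptions of this example, and the reference pair is assumed to be a valid pair.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def classify_pair(ref, other, i, j):
    """Classify the change at paired positions (i, j) between two sequences.

    - "unchanged": both positions are identical in the two sequences
    - "disrupted": the positions no longer form an allowed pair
    - "compensatory": both positions changed and pairing is maintained
    - "single-compatible": one position changed and pairing is maintained
    """
    before = (ref[i], ref[j])
    after = (other[i], other[j])
    if before == after:
        return "unchanged"
    if after not in PAIRS:
        return "disrupted"
    if before[0] != after[0] and before[1] != after[1]:
        return "compensatory"
    return "single-compatible"


if __name__ == "__main__":
    ref = "GGGAAACCC"               # stem pairs: (0, 8), (1, 7), (2, 6)
    print(classify_pair(ref, "GAGAAACUC", 1, 7))  # G-C became A-U
    print(classify_pair(ref, "GGGAAACUC", 1, 7))  # G-C became G-U
    print(classify_pair(ref, "GAGAAACCC", 1, 7))  # G-C became A-C
```

A double change that preserves pairing is hard to explain without selection on the structure, which is why such columns strengthen the prediction of a functional RNA.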

Compensatory mutations in proteins? (Choi et al., Nat Genet 2005)
Figure labels: PDB structures; homology modelling; 3-alignments (rat, mouse, human); pairs of interacting residues.
Find pairs of mutations in interacting residues (DRIP):
- Coupled: occurring in the same lineage
- Uncoupled: occurring in different lineages

eve stripe 2 in D. melanogaster and D. pseudoobscura: a conserved phenotype maintained by a compensatory substitution pattern in two parts of the enhancer (Ludwig, Kreitman 2000)
While the two enhancers drive a conserved expression pattern, we cannot mix and match the two parts between species! Evolution therefore continuously compensates for changes in one part with changes in the other.

Across a larger phylogeny, the phenotype can diverge (Ludwig, .., Kreitman 2005)
Species: D. melanogaster, D. yakuba, D. erecta, D. pseudoobscura
Figure panels: Eve staining in the 4 species; orthologous stripe 2 enhancer (S2E) reporters in a melanogaster embryo.
The D. erecta S2E forms a much weaker stripe in D. melanogaster.

The conservation of the enhancer sequence itself cannot predict the conservation of the phenotype
Species: D. melanogaster, D. yakuba, D. erecta, D. pseudoobscura
- Sequence conserved, enhancer functional in mel.: all conserved
- Sequence not conserved, enhancer functional in mel.: may reflect compensation
- Sequence conserved, enhancer not functional in mel.: may reflect trans- divergence

Species-Specific Transcription in Mice Carrying Human Chromosome 21 (Wilson et al. 2008)
Duncan Odom and co-workers introduced human chromosome 21 into mouse cells. Using ChIP, they showed that most binding sites (mostly of enhancers) remained active as in human cells, suggesting that they are determined in cis.

Coregulation: epistasis of transcriptional modules
- Transcriptional modules are crucial for the organization and function of biological systems
- Gene co-regulation gives rise to major epistatic relations among regulatory loci
- Epistasis reduces evolvability
Figure labels: co-regulation is advantageous; disruption of regulation is deleterious; regulation scheme 1; regulation scheme 2; rugged evolutionary landscape.

Cis-elements underlying conserved TMs
Figure panels compare S. cerevisiae and S. pombe for four modules: ribosome biogenesis, S phase, amino acid metabolism, ribosomal proteins.
Panel labels: 32 genes (P<), 114 genes (P<), 45 genes (P<), 7 genes (P<10^-9).

Phylogenetic cis-profiling with 17 yeast species
Species shown: A. nidulans, S. bayanus, S. cerevisiae, K. waltii, A. gossypii, S. castellii, N. crassa, S. pombe, C. albicans, S. kluyveri, Y. lipolytica, D. hansenii, K. lactis, C. glabrata
Putative Orthologous Module (POM)

Conserved cis-elements (Tanay et al., PNAS 2005)
Species: S. cerevisiae, S. castellii, S. kluyveri, K. waltii, A. gossypii, C. albicans, N. crassa, A. nidulans, S. bayanus, S. kudriavzevii, S. mikatae, S. paradoxus, S. pombe, C. glabrata, K. lactis, D. hansenii, Y. lipolytica
Elements and modules: MCB (S phase), HAP2345 (respiration), GCN4 (amino acid metabolism)
- Conserved FMs are sometimes regulated by remarkably conserved cis-elements
- Conserved cis-elements are bound by conserved TFs

Ribosomal Protein Module: evolutionary change via redundancy
Cis-elements: RAP1, Homol-D, IFHL
Figure labels: Rap1 emergence; Homol-D loss; redundant mechanism; Homol-D based.

Rap1 evolution in trans
Domains: BRCT, Myb, Silencing, TA
Species: S. cerevisiae, S. castellii, K. waltii, A. gossypii, C. albicans, N. crassa, A. nidulans, S. pombe, H. sapiens
The new TA domain co-emerged with the Rap1 role in RP regulation.

Redundant cis-elements are spatially clustered: RP genes in A. gossypii
Figure labels: 3', 6bp, Homol-D, RAP1, 5'.

Evolution of the IFHL element
Species: pombe, nidulans, crassa, lipolytica, albicans, hansenii, sacc. et al.
Figure labels: tandem duplication; conservation; reverse complement duplication; drift...

Evolution of the Ribosomal biogenesis module
Species: S. cerevisiae (225), S. castellii (204), S. kluyveri (178), K. waltii (230), A. gossypii (226), C. albicans (214), N. crassa (193), A. nidulans (187), S. bayanus (195), S. kudri. (196), S. mikatae (187), S. parad. (215), S. pombe (196), C. glabrata (214), K. lactis (225), D. hansenii (219), Y. lipolytica (208)
Figure labels: RRPE PACTC ?

Mating genes in S. cerevisiae and C. albicans (Tsong et al)
S. cerevisiae and C. albicans transcribe their genes according to one of three programs, which produce the a-, α- and a/α-cells. The particular cell type produced is determined by the MAT locus, which encodes a sequence-specific DNA-binding protein. In S. cerevisiae, a-type mating is repressed in α-cells by α2. In C. albicans, a-type mating is activated in a-cells by a2. In both species, a-cells mate with α-cells to form a/α-cells, which cannot mate. a2 is an activator of a-type mating over a broad phylogenetic range of yeasts. In S. cerevisiae and close relatives, a2 is missing and α2 has taken over regulation of the type.
Figure labels: mating genes; a2; α2; C. albicans; S. cerevisiae.

A transition of motifs is observed between S. cerevisiae and C. albicans

Innovation in α2 is observed along with the emergence of a possible Mcm1 interaction. A redundant intermediate may have enabled the switch.

Ihmels, Science, 2005