Putting gene family evolution in its chromosomal context Todd Vision Department of Biology University of North Carolina at Chapel Hill.

Abstract
In complex genomes, the continual duplication, functional divergence, and loss of genes over time result in divergence of gene content among related lineages. In addition to changes in content, the order of genes within the genome can be disturbed by a host of different rearrangement events. Changes in gene content and order are of interest for a number of reasons. Such mutations, particularly those that affect gene content, may as a class have dramatic phenotypic consequences; thus, they merit study from a functional perspective. In order to predict the location of genes in non-model organisms using comparative mapping, molecular breeders will need better models of how gene content and order evolve. And from an evolutionary perspective, it is of interest to understand how closely gene content and gene order are governed by selective forces, and what other forces are at work. Here, I describe what we currently know about the evolution of gene content and order among the flowering plants. This clade contains all of the world's major food crops and is thus the focus of a great deal of comparative mapping effort. I will offer my thoughts on what computational biology has to contribute to this emerging area of inquiry.

Outline
- Gene order rearrangement in plants
  - Chromosomal perspective
  - Gene family perspective
- Gene duplication and functional divergence
  - Segmental duplications as a tool

Chromosomal perspective
- Biological importance
  - Clustering of gene function
  - Clustering of transcriptional activity
- Applied importance
  - Conservation of gene order (synteny)

Devos and Gale 2000 Plant Cell 12, 637

Arabidopsis as a hub for plant comparative maps Arumuganathan and Earle 1991 Plant Mol Biol Rep 9, 208.

Arabidopsis paleopolyploidy The Arabidopsis Genome Initiative 2000 Nature 408, 796

Non-overlapping syntenies

Blanc et al Genome Res. 13, 137.

Blanc and Wolfe 2004 Plant Cell 16, 1667.

Tomato-Arabidopsis synteny Bancroft 2001 TIG 17, 89 after Ku et al PNAS 97, 9121.

Rice-Arabidopsis microsynteny Mayer et al Genome Res. 11

Hidden syntenies Simillion et al PNAS 99,

Interspecies comparison can reveal hidden syntenies Vandepoele et al TIG 18, 606.

Simillion et al Genome Res. 14, 1095

From descriptive to predictive
- Can we predict the gene content of homologous segments when markers are sparse?
- Utility for QTL mapping
  - Prioritize candidate genes in a QTL region from a non-sequenced genome
  - Provide markers for fine-mapping

Hidden Markov Models (HMM)
[Figure: hidden states 1 and 2 plus an end state; emission probabilities $p_1(a)$, $p_1(b)$, $p_2(a)$, $p_2(b)$; transition probabilities $t_{1,1}$, $t_{1,2}$, $t_{2,2}$, $t_{2,\mathrm{end}}$]
- Observed states: a -> b -> a
- Hidden states: 1 -> 1 -> 2 -> end
- Probability: $p_1(a)\, t_{1,1}\, p_1(b)\, t_{1,2}\, p_2(a)\, t_{2,\mathrm{end}}$
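To make the toy example concrete, the sketch below computes the joint probability of one observed sequence and one hidden path, then sums over all hidden paths with the standard forward algorithm. The numeric emission and transition values are hypothetical, chosen only for illustration, and the chain is assumed to start in state 1.

```python
from typing import Dict, Sequence, Union

State = Union[int, str]

# Hypothetical parameters for the left-to-right model in the figure:
# state 1 can stay or move to 2; state 2 can stay or move to "end".
EMIT: Dict[int, Dict[str, float]] = {1: {"a": 0.7, "b": 0.3}, 2: {"a": 0.2, "b": 0.8}}
TRANS: Dict[int, Dict[State, float]] = {1: {1: 0.6, 2: 0.4}, 2: {2: 0.5, "end": 0.5}}


def path_probability(obs: Sequence[str], path: Sequence[int]) -> float:
    """P(observations, hidden path), where the path implicitly ends in 'end'."""
    prob = 1.0
    for i, (symbol, state) in enumerate(zip(obs, path)):
        prob *= EMIT[state][symbol]                      # emission p_state(symbol)
        nxt = path[i + 1] if i + 1 < len(path) else "end"
        prob *= TRANS[state].get(nxt, 0.0)               # transition t_{state,next}
    return prob


def forward(obs: Sequence[str], start: Dict[int, float]) -> float:
    """P(observations), summed over all hidden paths that reach 'end'."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start.get(s, 0.0) * EMIT[s][obs[0]] for s in EMIT}
    for symbol in obs[1:]:
        alpha = {
            s: sum(alpha[r] * TRANS[r].get(s, 0.0) for r in alpha) * EMIT[s][symbol]
            for s in EMIT
        }
    return sum(alpha[s] * TRANS[s].get("end", 0.0) for s in alpha)


if __name__ == "__main__":
    print(path_probability("aba", [1, 1, 2]))  # p1(a) t11 p1(b) t12 p2(a) t2,end
    print(forward("aba", start={1: 1.0}))
```

The forward total is larger than the single-path probability because it also includes the path 1 -> 2 -> 2 -> end, the only other path that can emit three symbols and still reach the end state.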

A gene content HMM
- Observed states: a homologous gene is either observed or not
- Hidden states: presence or absence of the gene within a segment
- Emission probabilities
  - A gene will be unobserved if it is not present
  - A gene may be unobserved even if it is present
  - Dependent on the density of the gene map
- Transition probabilities: reflect conservation of gene content along the branches of a phylogeny
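The emission model can be written down directly. In the sketch below, `detection` stands for the probability that a gene present in a segment actually appears on the map; treating it as a single number set by map density is an assumption made for illustration. Combining it with a prior probability of presence gives the posterior probability that an unobserved gene is nonetheless present.

```python
def emission_prob(observed: bool, present: bool, detection: float) -> float:
    """P(observation | hidden state) for one gene in one segment."""
    if present:
        # A present gene is mapped with probability `detection` (map density).
        return detection if observed else 1.0 - detection
    # An absent gene can never be observed.
    return 0.0 if observed else 1.0


def posterior_present(observed: bool, prior_present: float, detection: float) -> float:
    """P(present | observation) by Bayes' rule over the two hidden states."""
    num = prior_present * emission_prob(observed, True, detection)
    den = num + (1.0 - prior_present) * emission_prob(observed, False, detection)
    return num / den


if __name__ == "__main__":
    # Hypothetical values: 50% prior presence, sparse map detecting 40% of genes.
    print(posterior_present(False, prior_present=0.5, detection=0.4))
```

With these hypothetical numbers an unobserved gene still has a posterior probability of presence of about 0.375, which illustrates why absence from a sparse map is weak evidence of absence from the segment.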

Transition probabilities and the segment phylogeny

[Figure: presence (P) and absence (A) transitions for segments A1 and A2 under three models: Loss (L), Loss-Gain (LG), and Multiple Loss-Gain (MLG)]
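As a hedged illustration of how loss and gain rates translate into transition probabilities along one branch, the sketch below assumes a standard two-state continuous-time Markov chain: a present gene is lost at rate `loss`, and an absent gene is regained at rate `gain`. Setting `gain = 0` gives a loss-only model in the spirit of L, and a nonzero gain corresponds loosely to LG. Whether the models in the talk were parameterized this way is an assumption, and MLG is not covered.

```python
import math
from typing import List


def branch_transition(t: float, loss: float, gain: float = 0.0) -> List[List[float]]:
    """Transition matrix over a branch of length t.

    Rows/columns are ordered (present, absent):
    [[P(P->P), P(P->A)], [P(A->P), P(A->A)]].
    """
    total = loss + gain
    if total == 0.0:
        return [[1.0, 0.0], [0.0, 1.0]]  # no change possible
    decay = math.exp(-total * t)
    p_pp = gain / total + (loss / total) * decay    # stay present
    p_ap = (gain / total) * (1.0 - decay)           # absent -> present
    return [[p_pp, 1.0 - p_pp], [p_ap, 1.0 - p_ap]]


if __name__ == "__main__":
    # Hypothetical rates on a branch of length 1.0.
    print(branch_transition(1.0, loss=0.1))            # loss-only: P->P = exp(-0.1)
    print(branch_transition(1.0, loss=0.1, gain=0.3))  # loss and gain
```

On a very long branch both rows converge to gain / (loss + gain), the equilibrium probability of presence, so distant segments carry little information about each other.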

Estimating model parameters
- Segment phylogeny
  - Each set of homologous genes is missing from some segments
  - Estimate an “averaged” distance matrix
  - Build tree with neighbor-joining and midpoint rooting
- HMM parameter estimation
  - Loss rate(s)
  - Gain rate
  - Number of genes present at the root
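How the “averaged” distance matrix is computed is not stated. As one simple stand-in, the sketch below scores each pair of segments by a Jaccard distance: among homolog sets observed in at least one of the two segments, the fraction observed in only one. The resulting matrix could then be passed to a neighbor-joining implementation (Biopython's `DistanceTreeConstructor`, for example) and the tree rooted at its midpoint. The presence matrix in the example is hypothetical.

```python
from typing import List


def jaccard_distance_matrix(presence: List[List[int]]) -> List[List[float]]:
    """presence[g][s] == 1 if homolog set g is observed in segment s, else 0."""
    n_seg = len(presence[0])
    dist = [[0.0] * n_seg for _ in range(n_seg)]
    for i in range(n_seg):
        for j in range(i + 1, n_seg):
            either = sum(1 for row in presence if row[i] or row[j])
            both = sum(1 for row in presence if row[i] and row[j])
            # Undefined (NaN) when neither segment has any observed gene.
            d = 1.0 - both / either if either else float("nan")
            dist[i][j] = dist[j][i] = d
    return dist


if __name__ == "__main__":
    # Hypothetical: 4 homolog sets (rows) x 3 segments (columns).
    presence = [[1, 1, 0], [1, 1, 1], [1, 1, 0], [0, 0, 1]]
    for row in jaccard_distance_matrix(presence):
        print(row)
```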

Do parameter estimates converge?
[Figure: replicate estimates under the LG model, n = 100 genes, no missing data, rate parameter 1 = 0.1; estimates and SE from different initial values]

Accuracy of hidden state assignments
[Figure: 5-segment phylogeny; rate parameter 1 = 0.1, rate parameter 2 = 0.3, remaining parameters 0.1; 24% gain]

A large multiplicon (Vandepoele et al 2003 Plant Cell 15)
- 12 segments from rice and Arabidopsis
- 56 sets of homologous genes

Self-validation test

Probability of gene presence (8 longest segments)
- Branch lengths scaled so that the longest branch is 1.0
- Parameter estimate = 0.7
[Table columns: Segment, True, Estimate, Diff]

Summary: gene content HMM
- Multispecies comparative maps
  - Becoming more common
  - Most species only partially characterized
  - Usefulness also compromised by sparse synteny
- Probabilistic models will allow us to move from simple descriptions of the extent of synteny to predictive tools that can guide further experiments

Gene family perspective
- Modes of duplication: Tandem (T), Dispersed (D), Segmental (S)
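A duplicate pair could be assigned to one of these modes with a rule like the one sketched below. The gene representation, the `max_gap` threshold for calling a pair tandem, and the list of duplicated block pairs are all hypothetical inputs. The order of the checks (tandem first, then segmental, otherwise dispersed) is also an assumption, not the classification procedure used in the talk.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    chrom: str
    index: int  # rank of the gene along its chromosome (hypothetical encoding)


# A duplicated block: (chromosome, first gene index, last gene index).
Block = Tuple[str, int, int]


def in_block(gene: Gene, block: Block) -> bool:
    chrom, first, last = block
    return gene.chrom == chrom and first <= gene.index <= last


def classify_pair(g1: Gene, g2: Gene,
                  block_pairs: List[Tuple[Block, Block]],
                  max_gap: int = 1) -> str:
    """Return 'T' (tandem), 'S' (segmental) or 'D' (dispersed)."""
    # Tandem: same chromosome, at most `max_gap` genes between the two copies.
    if g1.chrom == g2.chrom and abs(g1.index - g2.index) <= max_gap + 1:
        return "T"
    # Segmental: the copies sit in the two halves of a duplicated block pair.
    for b1, b2 in block_pairs:
        if (in_block(g1, b1) and in_block(g2, b2)) or (in_block(g1, b2) and in_block(g2, b1)):
            return "S"
    return "D"


if __name__ == "__main__":
    blocks = [(("1", 100, 200), ("3", 50, 150))]  # hypothetical block pair
    print(classify_pair(Gene("1", 10), Gene("1", 11), blocks))   # T
    print(classify_pair(Gene("1", 120), Gene("3", 60), blocks))  # S
    print(classify_pair(Gene("2", 5), Gene("4", 900), blocks))   # D
```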

A tale of two sisters: the ARF and the Aux/IAA gene families
- Modulate whole-plant response to auxin
- Interact via dimerization
  - ARFs are transcription factors
  - Aux/IAAs bind and repress ARFs in the absence of auxin

Diversification of ARFs Remington et al 2004 Plant Physiol 135, 1738

The chromosomal context Remington et al 2004 Plant Physiol 135, 1738

Diversification of the Aux/IAAs Remington et al 2004 Plant Physiol 135, 1738

Why the different patterns of diversification?
- 12% (ARF) vs 40% (Aux/IAA) segmental duplications
- Presumably reflects differential retention
- Possible explanations
  - Dosage requirements
  - Coevolution with other interacting genes
  - Regional transcriptional regulation

How typical is the Aux/IAA family? Cannon et al BMC Plant Biology 4, 10.

Gene family                           Genes   S events
Proteasome alpha & beta subunits         23          9
Ser/Thr phosphatase                      26         10
Ras-related GTP-binding                  72         19
Auxin-independent growth promoter        33          8
Major intrinsic protein                  38         10
Calmodulin                               79         20
Phosphatidylcholine transferase          30          8
Cation/hydrogen exchanger                28          8

Blanc and Wolfe 2004 Plant Cell 16, Segmental duplication of pathways?

Summary: gene family perspective
- Chromosomal context can matter
- Gene families differ in their patterns of duplicate gene proliferation
  - Presumably due to differential retention
- Polyploidy
  - Qualitatively differs from other gene duplication modes
  - Divergence of whole pathways possible

Functional divergence and chromosomal context
- Do patterns of divergence (i.e., in spatiotemporal expression) differ among T, D, and S duplicates?

Retention of duplicated genes
- Neofunctionalization (NF)
  - Mutations lead to new divergent functions that are positively selected
- Subfunctionalization (SF)
  - Mutations knock out ancestral functions and make both copies indispensable
  - New divergent functions evolve secondarily
  - SF more likely for tandem than dispersed pairs (due to linkage)
- There are other possibilities
  - Duplicates retained when higher expression is favored

Divergence of duplicated genes
[Figure: divergence in expression profile plotted against age of duplication]

Duplicate pairs in yeast and human (Gu et al. 2002, Makova and Li 2003)
- Approx. 50% of pairs diverge very rapidly
- Proportion of divergent pairs increases with synonymous substitutions (Ks)
- Less so with replacement changes (Ka)
  - Plateaus at Ka ~0.3 in human
- In humans, distantly related pairs with conserved expression tend to be either ubiquitous or very tissue-specific

Digital expression profiling
- Massively Parallel Signature Sequencing (MPSS)
  - Count occurrences of short mRNA signature sequences
  - Cloning and sequencing is done on microbeads
  - Similar to Serial Analysis of Gene Expression (SAGE)
- “Bar-code” counting reduces concerns about
  - cross-hybridization
  - probe affinity
  - background hybridization
- Which enables
  - Accurate counts of low-expression genes
  - Distinguishing the expression profiles of duplicate genes

MPSS technology Brenner et al PNAS 97:1665.
- Clone 3’ ends of transcripts to microbeads
- Sort by FACS and deposit in a channeled monolayer
- Sequence a short signature from the 5’ end by hybridization

MPSS data
- Example signatures: GATCAATCGGACTTGTC, GATCGTGCATCAGCAGT, GATCCGATACAGCTTTG, GATCTATGGGTATAGTC, GATCCATCGTTTGGTGC, GATCCCAGCAAGATAAC, GATCCTCCGTCTTCACA, GATCACTTCTCTCATTA, GATCTACCAGAACTCGG, ..., GATCGGACCGATCGACT
- [Table columns: signature, frequency]
- Total number of tags: >1,000,000

Classifying signatures
[Figure labels: potential alternative splicing or nested gene; potential alternative termination; potential un-annotated ORF; potential anti-sense transcript; anti-sense transcript or nested gene?; duplicated (expression may be from another site in the genome); typical signatures]
Triangles refer to colors used on our web page:
- Class 0: signatures found in the expression libraries but not in the genome
- Class 1: in an exon, same strand as ORF
- Class 2: within 500 bp after stop codon, same strand as ORF
- Class 3: anti-sense of ORF (like Class 1, but on opposite strand)
- Class 4: in genome but NOT class 1, 2, 3, 5 or 6
- Class 5: entirely within intron, same strand
- Class 6: entirely within intron, anti-sense
- Grey: potential signature NOT expressed

Core Arabidopsis MPSS libraries, sequenced by Lynx for Blake Meyers, U. of Delaware

Library    Signatures sequenced    Distinct signatures
Root                  3,645,414                 48,102
Shoot                 2,885,229                 53,396
Flower                1,791,460                 37,754
Callus                1,963,474                 40,903
Silique               2,018,785                 38,503
TOTAL                12,304,362                133,377
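Signature counts are typically compared across libraries after normalizing to parts per million (ppm), the unit of the 5 ppm expression threshold used for the duplicate-pair analysis. The sketch below tallies signatures with `collections.Counter` and divides by library size, assuming the "signatures sequenced" totals in this table are the right denominators; the tag list in the example is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Signatures sequenced per library, from the table above.
LIBRARY_TOTALS: Dict[str, int] = {
    "Root": 3_645_414,
    "Shoot": 2_885_229,
    "Flower": 1_791_460,
    "Callus": 1_963_474,
    "Silique": 2_018_785,
}


def count_signatures(tags: Iterable[str]) -> Counter:
    """Tally how many times each 'bar-code' signature was sequenced."""
    return Counter(tags)


def to_ppm(count: int, library_total: int) -> float:
    """Convert a raw signature count to parts per million of the library."""
    return count * 1_000_000 / library_total


if __name__ == "__main__":
    tags = ["GATCAATCGGACTTGTC", "GATCGTGCATCAGCAGT", "GATCAATCGGACTTGTC"]
    counts = count_signatures(tags)
    print(counts["GATCAATCGGACTTGTC"])           # 2
    print(to_ppm(365, LIBRARY_TOTALS["Root"]))   # about 100 ppm
```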

Query by:
- Sequence
- Arabidopsis gene identifier
- Chromosomal position
- BAC clone ID
- MPSS signature
- Library comparison
Site includes:
- Library and tissue information
- FAQs and help pages

Genome-wide MPSS profile in Arabidopsis
- Of the 29,084 gene models, 17,849 (about 61%) match unambiguous, expressed class 1 and/or class 2 signatures
[Figure: expression profile along chromosomes I, II, III, IV and V]

Dataset of duplicate pairs
- Arabidopsis gene families of size 2, classified as
  - Dispersed (280)
  - Segmental (149)
  - Tandem (63)
- For each pair
  - Measured similarity/distance in expression profile
  - Estimated silent (Ks) and replacement (Ka) changes

Expression distance
[Figure: expression of a gene pair compared across library 1, library 2 and library 3]
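The distance measure is not specified here. One common choice, assumed purely for illustration, is one minus the Pearson correlation of log-transformed expression levels across libraries: 0 for profiles with the same shape, 2 for perfectly opposite ones. The log2 transform with a pseudocount of 1 is also an assumption.

```python
import math
from typing import Sequence


def expression_distance(ppm_a: Sequence[float], ppm_b: Sequence[float]) -> float:
    """1 - Pearson correlation of log2(ppm + 1) profiles across libraries."""
    a = [math.log2(x + 1.0) for x in ppm_a]
    b = [math.log2(y + 1.0) for y in ppm_b]
    if max(a) == min(a) or max(b) == min(b):
        return float("nan")  # correlation undefined for a flat profile
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Hypothetical ppm values for two duplicate copies in three libraries.
    print(expression_distance([0.0, 1.0, 3.0], [3.0, 1.0, 0.0]))  # opposite shapes
```

Correlation-based distances ignore overall expression level, so a pair whose copies differ only by a constant factor counts as similar; a level-sensitive alternative such as Euclidean distance on the log values would treat such a pair as divergent.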

Major findings
- Many pairs are divergent in sequence but not in expression, and vice versa
- Pairs have atypically high expression
  - Especially slowly evolving pairs
- Divergence increases with Ka
  - Particularly among S duplicates!
- Divergence tends to be highly asymmetric

Expression level >5 ppm in x libraries

Libraries (x)   Genes in pairs   All genes
0               153 (15.5%)      4,160 (23.3%)
1               124 (12.6%)      2,643 (14.8%)
2                73 (7.4%)       1,727 (9.6%)
3                93 (9.5%)       1,777 (10.0%)
4               109 (11.1%)      1,930 (10.8%)
5               432 (43.9%)      5,612 (31.4%)
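A table of this kind can be built from a gene-by-library matrix of ppm values by counting, for each gene, how many libraries exceed 5 ppm and then tabulating those counts. The sketch assumes a strict "greater than" comparison and five libraries; the gene profiles in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def libraries_above(profile: Sequence[float], threshold: float = 5.0) -> int:
    """Number of libraries in which a gene's expression exceeds `threshold` ppm."""
    return sum(1 for ppm in profile if ppm > threshold)


def tabulate(profiles: Dict[str, Sequence[float]], n_libraries: int = 5,
             threshold: float = 5.0) -> Dict[int, Tuple[int, float]]:
    """Map x -> (number of genes above threshold in exactly x libraries, percent)."""
    counts = Counter(libraries_above(p, threshold) for p in profiles.values())
    total = len(profiles)
    return {x: (counts[x], 100.0 * counts[x] / total) for x in range(n_libraries + 1)}


if __name__ == "__main__":
    profiles = {  # hypothetical ppm in Root, Shoot, Flower, Callus, Silique
        "g1": [0, 0, 0, 0, 0],
        "g2": [6, 0, 0, 0, 0],
        "g3": [10, 10, 10, 10, 10],
        "g4": [5, 5, 5, 5, 5],
    }
    for x, (n, pct) in tabulate(profiles).items():
        print(x, n, f"{pct:.1f}%")
```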

[Figure: expression divergence vs. dN, where dN = Ka; p < 0.0001]

Asymmetric divergence

Type of pair                  A      B       C       D
Young Dispersed (Ks ≤ 0.5)    …%     68.5%   9.0%    6.7%
Tandem (Ks ≤ 0.5)             …%     51.8%   17.9%   16.1%
Old Dispersed (Ks > 0.5)      …%     58.1%   12.6%   11.0%
Segmental (all)               …%     69.8%   4.7%    4.7%

A: Each copy has higher expression in at least one library
B: One copy has higher expression in all libraries that differ, and at least two libraries differ
C: Copies differ in expression in only one library
D: Copies do not differ in expression in any library
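Categories A to D can be assigned mechanically once each library has been called as "copy 1 higher", "copy 2 higher", or "no difference". How the analysis decided that two copies differ in a library is not stated; the two-fold rule with a 1 ppm floor in `compare_libraries` is a hypothetical placeholder. Because A to D are mutually exclusive as defined, the order of the checks in `classify_asymmetry` does not change the result for any pair.

```python
from typing import List, Sequence


def compare_libraries(ppm1: Sequence[float], ppm2: Sequence[float],
                      fold: float = 2.0, floor: float = 1.0) -> List[int]:
    """Per library: +1 if copy 1 is higher, -1 if copy 2 is higher, 0 otherwise.

    Hypothetical rule: values are floored at `floor` ppm, and a difference
    counts only if one copy is at least `fold` times the other.
    """
    calls = []
    for x, y in zip(ppm1, ppm2):
        x, y = max(x, floor), max(y, floor)
        if x >= fold * y:
            calls.append(1)
        elif y >= fold * x:
            calls.append(-1)
        else:
            calls.append(0)
    return calls


def classify_asymmetry(calls: Sequence[int]) -> str:
    """Assign the pair to category A, B, C or D from per-library calls."""
    differing = [c for c in calls if c != 0]
    if not differing:
        return "D"  # no library differs
    if len(differing) == 1:
        return "C"  # exactly one library differs
    if 1 in differing and -1 in differing:
        return "A"  # each copy is higher in at least one library
    return "B"      # two or more libraries differ, always in the same direction


if __name__ == "__main__":
    copy1 = [10.0, 1.0, 50.0, 8.0, 0.0]   # hypothetical ppm in five libraries
    copy2 = [2.0, 1.0, 100.0, 7.0, 0.0]
    calls = compare_libraries(copy1, copy2)
    print(calls, classify_asymmetry(calls))  # [1, 0, -1, 0, 0] A
```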

Why put gene family evolution into a chromosomal context?
- We can begin to understand and utilize patterns of evolution in gene order
- We can gain insights into the function and evolution of gene families that are not apparent from “beanbag” genomics

Thanks to: Zongli Xu, David Remington, Jason Reed, Tom Guilfoyle, Blake Meyers, NSF