Volume 2, Issue 4, Pages 817-823 (October 2012)

A “Forward Genomics” Approach Links Genotype to Phenotype using Independent Phenotypic Losses among Related Species
Michael Hiller, Bruce T. Schaar, Vahan B. Indjeian, David M. Kingsley, Lee R. Hagey, Gill Bejerano
Cell Reports, Volume 2, Issue 4, Pages 817-823 (October 2012). DOI: 10.1016/j.celrep.2012.08.032. Copyright © 2012 The Authors.

Figure 1 Evolutionary Model and Assumptions behind Our Forward Genomics Approach (A) An ancestral trait is passed to descendant species, along with the genomic regions required for this trait, which evolve under purifying selection. (B) One lineage loses the ancestral trait due to an inactivating mutation in a trait-required region. (C) Following trait loss, all trait-specific (nonpleiotropic) regions switch to evolve neutrally and begin to accumulate random mutations in the first trait-loss lineage. Meanwhile, two additional independent lineages lose this trait, due to independent mutations occurring either in the same or in other trait-required regions. (D) All trait-specific regions continue to erode independently in the three different trait-loss lineages, whereas their counterparts in the trait-preserving species are conserved due to purifying selection. This characteristic evolutionary signature can be detected with forward genomics, revealing functional components of this (monogenic or polygenic) trait.
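The model in Figure 1 can be illustrated with a toy simulation. The sketch below is not the authors' simulator: the function names, the sequence length, the number of proposed mutations, and the acceptance probabilities are all illustrative assumptions. It models purifying selection as a low probability of accepting a proposed mutation, and neutral evolution after trait loss as accepting every proposed mutation.

```python
import random

BASES = "ACGT"

def evolve(seq, proposals, accept_prob, rng):
    """Propose point mutations at random sites; accept each with accept_prob.

    accept_prob near 0 mimics strong purifying selection,
    accept_prob = 1.0 mimics neutral evolution (assumed toy model).
    """
    seq = list(seq)
    for _ in range(proposals):
        pos = rng.randrange(len(seq))
        if rng.random() < accept_prob:
            # Substitute a base that differs from the current one.
            seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(seq)

def percent_identity(a, b):
    """Percentage of aligned positions that are identical (no gaps modeled)."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
ancestor = "".join(rng.choice(BASES) for _ in range(1000))

# Trait-preserving lineage: region stays under purifying selection.
preserving = evolve(ancestor, 500, 0.1, rng)
# Trait-loss lineage: region evolves neutrally after the loss.
trait_loss = evolve(ancestor, 500, 1.0, rng)

print("preserving:", percent_identity(ancestor, preserving))
print("trait loss:", percent_identity(ancestor, trait_loss))
```

Under these assumptions the trait-loss copy ends up clearly more diverged from the ancestor than the trait-preserving copy, which is the signature the approach looks for.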

Figure 2 A Forward Genomics Screen to Match an Ancestral Presence/Absence Trait Pinpoints Gulo Inactivation in Vitamin C-Nonsynthesizing Species (A) For every gene (dot) in the mouse genome (x axis), we measured how well it matches the given phenotree by counting the number of species (y axis) whose divergence level violates the expectation of divergence or conservation based on the vitamin C phenotree shown in (C). Gulo, with 0 violations, is the only gene that perfectly matches. (B) Elevated ratio of nonsynonymous to synonymous (Ka/Ks) substitutions shows that remaining megabat and guinea pig exons evolve under relaxed pressure to preserve the Gulo protein sequence. (C) Nonsynthesizing species show elevated sequence divergence in the Gulo coding sequence, with a divergence margin (gray) that perfectly separates them from synthesizing species. Note that the microbat and megabat lineages have independently lost this trait, as intermediate bat species (without a sequenced genome) were biochemically shown to synthesize vitamin C (Cui et al., 2011a). (D) Graphical sequence alignment of the Gulo coding region. Rows match species in (C). Large deletions (red blocks) occurred only in nonsynthesizing species. See also Figures S1 and S2.
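The violation count in panel (A) can be sketched as follows. The caption does not spell out the exact counting rule, so this is one plausible reading, stated as an assumption: a gene's violation count is the smallest number of species that fall on the wrong side of any single percent-identity cutoff separating trait-loss species (expected to be diverged) from trait-preserving species (expected to be conserved). The function name and the example values are hypothetical.

```python
def count_violations(identity, loss_species):
    """Minimum number of species on the wrong side of any identity cutoff.

    identity: dict mapping species -> percent identity to the ancestor.
    loss_species: set of species that lost the trait (expected diverged).
    Assumed rule: a species counts as diverged if identity <= cutoff.
    """
    # Candidate cutoffs: below every value, and at each observed value.
    cutoffs = [float("-inf")] + sorted(set(identity.values()))
    best = len(identity)
    for cutoff in cutoffs:
        wrong = 0
        for species, pid in identity.items():
            diverged = pid <= cutoff
            if diverged != (species in loss_species):
                wrong += 1
        best = min(best, wrong)
    return best

# Hypothetical percent identities, not values from the paper.
loss = {"human", "guinea_pig", "megabat"}
gene_a = {"mouse": 95, "rat": 94, "dog": 96,
          "human": 70, "guinea_pig": 65, "megabat": 72}
gene_b = {"mouse": 95, "rat": 60, "dog": 96,
          "human": 97, "guinea_pig": 65, "megabat": 72}

print(count_violations(gene_a, loss))  # perfectly separated
print(count_violations(gene_b, loss))  # rat and human are on the wrong side
```

A gene with a count of 0 has a cutoff that cleanly separates the two species groups, which is how Gulo stands out in the screen.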

Figure 3 Forward Genomics Implicates Independent Inactivation of the Human Disease Gene ABCB4 in Two Species with Low Levels of Biliary Phospholipids (A) The level of biliary phospholipids is a continuous trait that varies over 200-fold between mammals. Seven hundred and ninety-six genes show more divergence in guinea pig than in ten other measured species, but only eight genes show elevated divergence in both guinea pig and horse, the two species with the lowest biliary phospholipid levels. (B) We plot (y axis) the number of violations of each gene (dot) in the mouse genome (x axis) against the biliary phospholipid level phenotree in (E). The eight genes with 0 violations are labeled. (C) Of the eight genes, only Abcb4 (bold) has a bile-related function. (D) Increased Abcb4 nonsynonymous to synonymous (Ka/Ks) substitution ratios for guinea pig and horse. (E and F) Divergence from the reconstructed common ancestor (E) and a graphical sequence alignment representation (F) of the Abcb4 coding sequence reveal elevated divergence and deletions (red blocks) in trait-loss species only. See also Figure S3 and Table S1.

Figure 4 Broad Applicability of Our Forward Genomics Approach (A) We show the branches in the phylogeny that evolve neutrally for the trait-associated gene in the trait-loss simulation (red: vitamin C synthesis; green: biliary phospholipids). For biliary phospholipids, we simulated a loss that happened either 0.05 or 0.1 substitutions per site ago. (B) Simulations suggest that the evolutionary signature of independent loss of vitamin C synthesis can highlight exons of the trait-associated gene in nine of ten iterations (iteration 5 gave no hit). We observed no false positives. (C) Simulations of the biliary phospholipid trait show that in at least seven of ten iterations, the single top-ranked hit is an exon of the trait-associated gene (shown as ∗), and that the trait-associated gene often has the most hits (shown as #), whereas false positives are scattered across the genome. The chart (right) shows that true positives (green) usually rank highly. (D) In three very different vertebrate phenotype-scoring studies, an average of 42% of phenotypes have changes in two or more independent lineages, the conditions required for forward genomics analysis. See also Figure S4.

Figure S1 The Gulo Gene and Gulo Exon 2 Perfectly Match the Vitamin C Phenotree regardless of the Details of Our Forward Genomics Approach, Related to Figure 2 (A) The Gulo gene is also the best match to the vitamin C phenotree if we compute divergence values that take the different neutral divergence rates of each species into account. In the main text we quantify ancestral information erasure at the DNA level; however some mammals (e.g., rodents) have generally higher neutral divergence rates than others. Here, we ranked percent identity values across all genes within a species to get relative percent identities (values between 0 and 100). Axes are as for Figure 2A. Gulo is the only gene with no violations. (B) Gulo exon 2 is the only exon out of 173,554 conserved coding exons that is more diverged in all vitamin C nonsynthesizing species and several other (but not all) Gulo exons nearly perfectly match the phenotree signature, making the Gulo genomic locus the strongest genome-wide hit. Each exon is one circle. The circle radius is proportional to the number of circles plotted on top of each other. (C) Gulo is also the strongest genome-wide hit if we consider 544,549 conserved regions (exons and noncoding regions). Each region is one circle. (D) We show a visualization of the percent identity values and DNA sequences for Gulo exon 2 using the reconstructed boreoeutherian ancestor. Left: Question marks indicate missing sequence or assembly gaps. The margin between the percent identity values of the two species groups (1.74%) is in gray. Reconstructing the eutherian, therian, and mammalian ancestor (blue dots in the phylogeny), we consistently found only Gulo exon 2 perfectly matching. Right: Gulo gene structure with the alignment of the exon 2 overlapping conserved region below. Exonic bases are in upper case, intronic bases in lower case, and blue background denotes an identical base to the boreoeutherian ancestral sequence. 
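The rank normalization described in panel (A) can be sketched as a percentile rank within each species. The caption only says that percent identities were ranked across all genes within a species to obtain values between 0 and 100; the scaling and tie handling below, and the function name, are assumptions made for illustration.

```python
from bisect import bisect_left

def relative_identities(pid_by_gene):
    """Convert one species' percent identities to 0-100 rank-based values.

    pid_by_gene: dict mapping gene -> percent identity in a single species.
    Assumed scaling: the lowest value maps to 0, the highest to 100, and
    tied genes receive the same relative value.
    """
    ordered = sorted(pid_by_gene.values())
    denom = max(len(ordered) - 1, 1)
    return {gene: 100.0 * bisect_left(ordered, pid) / denom
            for gene, pid in pid_by_gene.items()}

# Hypothetical values: a slowly and a fast evolving species.
slow = {"geneA": 80.0, "geneB": 90.0, "geneC": 99.0}
fast = {"geneA": 60.0, "geneB": 70.0, "geneC": 85.0}

print(relative_identities(slow))
print(relative_identities(fast))
```

Because only the within-species ordering matters, a species with generally high neutral divergence (for example a rodent) and a slowly evolving species end up on the same scale.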

Figure S2 Independent Gulo-Inactivating Mutations in Independent Vitamin C Nonsynthesizers, Related to Figure 2 (A) The Gulo gene harbors inactivating mutations in all and only vitamin C nonsynthesizing species. All exons were also searched in all unassembled traces of each species (second from last column). Exon 1 containing only the ATG start codon is omitted. We find all previously known mutations, as well as additional previously unreported gene-inactivating mutations in chimpanzee, gorilla, orangutan, and rhesus macaque, and the first clear evidence for Gulo inactivation in the marmoset Callithrix jacchus and the tarsier Tarsius syrichta. Previous studies identified relaxed selection on the Gulo protein sequence of P. vampyrus but failed to find direct evidence for gene inactivation in P. vampyrus (Cui et al., 2011a, 2011b). Our screen is the first to show exon deletions, frameshifting insertions/deletions, and splice site mutations that clearly inactivate Gulo in the bats P. vampyrus and M. lucifugus. (B and C) Experimental amplification and sequencing confirms a frameshifting 1 bp deletion in the microbat M. lucifugus (B) and a donor splice site mutation that destroys the essential GT in the megabat P. vampyrus (C). (D) We verified the exon 9–10 deletion in megabat. Our megabat sample is heterozygous for an additional 529 bp deletion in the exon 9–10 region, suggesting ongoing erosion of the Gulo locus.

Figure S3 Abcb4 Is the Strongest Hit for the Biliary Phospholipid Trait based on Screening Exons, Related to Figure 3 We reconstructed the boreoeutherian ancestral sequence for mouse coding exons, measured the divergence from each ancestral sequence, and counted the number of species that violate the signature, which expects the highest divergence in guinea pig and horse. The circle radius is proportional to the number of exons with the same number of violations per gene. 72 exons perfectly match (0 violations). We ranked all 72 exons by the margin between the percent identity values of the two species groups and show this rank above the circle for the top 10. The Abcb4 gene has four perfectly matching exons in the top 10 and another exon at rank 14 (also indicated in the figure), making it the strongest match in the genome.
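The margin-based ranking of perfectly matching exons can be sketched as below. The margin is taken here to be the lowest percent identity among trait-preserving species minus the highest percent identity among trait-loss species; this definition, the function names, and the example values are assumptions for illustration, not the authors' code.

```python
def margin(identity, loss_species):
    """Gap between the two species groups, in percent identity points.

    Positive when every trait-loss species is more diverged than every
    trait-preserving species (a perfect match with 0 violations).
    """
    loss = [v for s, v in identity.items() if s in loss_species]
    kept = [v for s, v in identity.items() if s not in loss_species]
    return min(kept) - max(loss)

def rank_by_margin(exons, loss_species):
    """Return names of perfectly matching exons, largest margin first."""
    scored = [(margin(ident, loss_species), name)
              for name, ident in exons.items()]
    perfect = [(m, name) for m, name in scored if m > 0]
    return [name for m, name in sorted(perfect, reverse=True)]

# Hypothetical exons and percent identities, not values from the paper.
loss = {"guinea_pig", "horse"}
exons = {
    "exonA": {"mouse": 98, "dog": 97, "guinea_pig": 80, "horse": 85},
    "exonB": {"mouse": 95, "dog": 96, "guinea_pig": 90, "horse": 93},
    "exonC": {"mouse": 90, "dog": 99, "guinea_pig": 85, "horse": 92},
}

print(rank_by_margin(exons, loss))
```

In this toy input, exonC is excluded because one trait-loss species is less diverged than one trait-preserving species, so no cutoff separates the groups.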

Figure S4 Simulation Studies and Numerous Phenotypes with Independent Changes Support Broad Applicability of Forward Genomics, Related to Figure 4 (A–D) The power of forward genomics to detect the genomic regions associated with a given trait is influenced by the number of independent losses, the evolutionary time that has passed since trait loss, the neutral branch length of the species showing trait losses, the length of the region and the strength of purifying selection. (A) We evolved an ancestral genome and analyzed different scenarios involving the loss of a trait in 2, 3, or 4 independent slowly (blue) or fast (red) evolving lineages and varied the age of the trait loss. We always randomly picked ten coding and ten non-coding regions as our true positives. Each data point represents the average from five independent iterations of the entire simulation. Using our forward genomics approach, we tried to find these 20 true positives in the large set of >100,000 false-positive regions, only using the divergence of each region and knowledge about which species lost the trait. We evaluated forward genomics by counting the number of true positives in a fixed number of top-ranked hits, using the percent identity difference between trait-loss and trait-preserving species for ranking. (B) The dashed gray line indicates the optimum, which is identifying the same number of true positives as top ranked hits analyzed (100% recovery of true positives, no recovery of false positives). 
We observe the following trends: (1) the sensitivity and specificity increase with the time that has passed since trait loss, (2) trait losses in more slowly evolving species yield predictions with higher specificity compared to losses in fast evolving species (fewer false positives = identified regions that did not evolve neutrally in trait-loss species), (3) for fast evolving species, the top 10 hits contain more true positives the more independent losses happened, (4) while forward genomics cannot identify all regions associated with the lost trait, the top hits are mostly strongly enriched in true positives, a property that is desirable for experimental validation. Even for very old losses in many independent lineages, forward genomics cannot detect all 20 trait-associated regions, presumably due to the random nature of neutral evolution and the fact that regions under purifying selection also evolve. However, for several scenarios, the top hits contain a subset of the trait-associated regions, which can reveal some genetic components of the given trait. (C) We compared how the length of the genomic regions that are associated with the trait loss affects the power to detect them. We compared trait loss that involves 20 randomly picked short (<100 bp) regions to a trait loss that involves 20 randomly picked long (>180 bp) regions. We found that long regions are easier to identify than short regions, probably because long regions have a higher chance to accumulate random mutations in trait-loss species. (D) We compared trait loss that involves 20 randomly picked regions that evolve under strong purifying selection (probability to accept a mutation is < 0.3) to trait loss that involves 20 randomly picked regions that evolve under weaker purifying selection (probability to accept a mutation is > 0.5). 
Regions under strong purifying selection are easier to identify, probably because their divergence in trait-preserving species is low, which increases the chance that neutral evolution in trait-loss species leads to higher divergence in all trait-loss species. In (C) and (D), we simulated three independent trait losses in guinea pig, pika, and shrew that happened 0.15 substitutions per site ago. Chart axes as in (B). (E) Extrapolation supports the expected notion that the number of phenotypes with independent changes increases as more measurements of the same phenotypes are taken in other species of the same clade. We measured how many phenotypes have independent changes for bat, primate and anglerfish character data if we only analyze subsets of all species with phenotype measurements. We fitted a linear and a logarithmic curve to the data points as they show saturation above ∼9 species. The inset shows (above) the number of species (taken from the NCBI taxonomy) in this and the higher-order clade from which outgroups were taken and (below) how many species were measured.
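The evaluation used in panels (A) to (D), counting true positives among a fixed number of top-ranked hits, can be sketched as follows. The score is assumed to be the percent identity difference between trait-preserving and trait-loss species (larger means a stronger hit); the function names and the numbers are hypothetical.

```python
def true_positives_in_top(scores, true_regions, k):
    """Count simulated trait-associated regions among the k top-ranked hits.

    scores: dict mapping region -> percent identity difference
            (assumed: preserving minus trait-loss, higher ranks first).
    true_regions: set of regions that were simulated as trait-associated.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return sum(1 for region in ranked[:k] if region in true_regions)

def optimum(k, n_true):
    """Best possible count for k hits: every hit is a true positive."""
    return min(k, n_true)

# Hypothetical scores for five regions, three of them true positives.
scores = {"r1": 12.0, "r2": 0.5, "r3": 9.0, "r4": 7.5, "r5": -1.0}
truth = {"r1", "r4", "r5"}

for k in (1, 3, 5):
    print(k, true_positives_in_top(scores, truth, k), optimum(k, len(truth)))
```

The `optimum` helper corresponds to the dashed gray line in panel (B): the closer the observed count is to it, the fewer false positives appear among the top hits.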
