
1 Volume 2, Issue 4, Pages 817-823 (October 2012)
A “Forward Genomics” Approach Links Genotype to Phenotype using Independent Phenotypic Losses among Related Species
Michael Hiller, Bruce T. Schaar, Vahan B. Indjeian, David M. Kingsley, Lee R. Hagey, Gill Bejerano
Cell Reports, Volume 2, Issue 4, Pages 817-823 (October 2012). DOI: 10.1016/j.celrep.2012.08.032. Copyright © 2012 The Authors.

2 Cell Reports 2012 2, 817-823 DOI: (10.1016/j.celrep.2012.08.032)
Copyright © 2012 The Authors Terms and Conditions

3 Figure 1 Evolutionary Model and Assumptions behind Our Forward Genomics Approach
(A) An ancestral trait is passed to descendant species, along with the genomic regions required for this trait, which evolve under purifying selection. (B) One lineage loses the ancestral trait due to an inactivating mutation in a trait-required region. (C) Following trait loss, all trait-specific (nonpleiotropic) regions switch to evolve neutrally and begin to accumulate random mutations in the first trait-loss lineage. Meanwhile, two additional independent lineages lose this trait, due to independent mutations occurring either in the same or in other trait-required regions. (D) All trait-specific regions continue to erode independently in the three different trait-loss lineages, whereas their counterparts in the trait-preserving species are conserved due to purifying selection. This characteristic evolutionary signature can be detected with forward genomics, revealing functional components of this (monogenic or polygenic) trait.

4 Figure 2 A Forward Genomics Screen to Match an Ancestral Presence/Absence Trait Pinpoints Gulo Inactivation in Vitamin C-Nonsynthesizing Species
(A) For every gene (dot) in the mouse genome (x axis), we measured how well it matches the given phenotree by counting the number of species (y axis) whose divergence level violates the expectation of divergence or conservation based on the vitamin C phenotree shown in (C). Gulo, with 0 violations, is the only gene that perfectly matches. (B) Elevated ratio of nonsynonymous to synonymous (Ka/Ks) substitutions shows that remaining megabat and guinea pig exons evolve under relaxed pressure to preserve the Gulo protein sequence. (C) Nonsynthesizing species show elevated sequence divergence in the Gulo coding sequence, with a divergence margin (gray) that perfectly separates them from synthesizing species. Note that the microbat and megabat lineages have independently lost this trait, as intermediate bat species (without a sequenced genome) were biochemically shown to synthesize vitamin C (Cui et al., 2011a). (D) Graphical sequence alignment of the Gulo coding region. Rows match species in (C). Large deletions (red blocks) occurred only in nonsynthesizing species. See also Figures S1 and S2.
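The violation count in panel (A) can be illustrated with a minimal sketch. It assumes one plausible reading of the legend: each species has a percent identity to a reconstructed ancestor, and the number of violations is the smallest number of species that end up on the wrong side of any single identity cutoff. The function name, the species labels, and all numbers below are hypothetical and are not taken from the paper.

```python
def count_violations(percent_identity, trait_loss_species):
    """Smallest number of species misplaced by any single identity cutoff.

    percent_identity: dict mapping species -> percent identity to the ancestor.
    trait_loss_species: set of species expected to be more diverged
    (that is, to have lower percent identity).
    """
    # Candidate cutoffs: every observed value, plus one above the maximum.
    cutoffs = sorted(set(percent_identity.values())) + [float("inf")]
    best = len(percent_identity)
    for cutoff in cutoffs:
        violations = 0
        for species, identity in percent_identity.items():
            predicted_loss = identity < cutoff
            if predicted_loss != (species in trait_loss_species):
                violations += 1
        best = min(best, violations)
    return best


# Hypothetical values: three trait-loss species, three trait-preserving species.
loss = {"human", "guinea_pig", "megabat"}
perfect_gene = {"mouse": 95.0, "rat": 94.0, "dog": 93.0,
                "human": 80.0, "guinea_pig": 78.0, "megabat": 82.0}
noisy_gene = dict(perfect_gene, dog=79.0)  # dog is diverged although it keeps the trait

print(count_violations(perfect_gene, loss))  # 0
print(count_violations(noisy_gene, loss))    # 1
```

Under this reading, a gene with 0 violations is one for which some cutoff separates all trait-loss species from all trait-preserving species, which corresponds to the gray divergence margin described in panel (C).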

5 Figure 3 Forward Genomics Implicates Independent Inactivation of the Human Disease Gene ABCB4 in Two Species with Low Levels of Biliary Phospholipids
(A) The level of biliary phospholipids is a continuous trait that varies over 200-fold between mammals. Seven hundred and ninety-six genes show more divergence in guinea pig than in ten other measured species, but only eight genes show elevated divergence in both guinea pig and horse, the two species with the lowest biliary phospholipid levels. (B) We plot (y axis) the number of violations of each gene (dot) in the mouse genome (x axis) against the biliary phospholipid level phenotree in (E). The eight genes with 0 violations are labeled. (C) Of the eight genes, only Abcb4 (bold) has a bile-related function. (D) Increased Abcb4 nonsynonymous to synonymous (Ka/Ks) substitution ratios for guinea pig and horse. (E and F) Divergence from the reconstructed common ancestor (E) and a graphical sequence alignment representation (F) of the Abcb4 coding sequence reveal elevated divergence and deletions (red blocks) in trait-loss species only. See also Figure S3 and Table S1.

6 Figure 4 Broad Applicability of Our Forward Genomics Approach
(A) We show the branches in the phylogeny that evolve neutrally for the trait-associated gene in the trait-loss simulation (red: vitamin C synthesis; green: biliary phospholipids). For biliary phospholipids, we simulated a loss that happened either 0.05 or 0.1 substitutions per site ago. (B) Simulations suggest that the evolutionary signature of independent loss of vitamin C synthesis can highlight exons of the trait-associated gene in nine of ten iterations (iteration 5 gave no hit). We observed no false positives. (C) Simulations of the biliary phospholipid trait show that in at least seven of ten iterations, the single top-ranked hit is an exon of the trait-associated gene (shown as ∗), and that the trait-associated gene often has the most hits (shown as #), whereas false positives are scattered across the genome. The chart (right) shows that true positives (green) usually rank highly. (D) In three very different vertebrate phenotype-scoring studies, an average of 42% of phenotypes have changes in two or more independent lineages, the conditions required for forward genomics analysis. See also Figure S4.

7 Figure S1 The Gulo Gene and Gulo Exon 2 Perfectly Match the Vitamin C Phenotree regardless of the Details of Our Forward Genomics Approach, Related to Figure 2 (A) The Gulo gene is also the best match to the vitamin C phenotree if we compute divergence values that take the different neutral divergence rates of each species into account. In the main text we quantify ancestral information erasure at the DNA level; however some mammals (e.g., rodents) have generally higher neutral divergence rates than others. Here, we ranked percent identity values across all genes within a species to get relative percent identities (values between 0 and 100). Axes are as for Figure 2A. Gulo is the only gene with no violations. (B) Gulo exon 2 is the only exon out of 173,554 conserved coding exons that is more diverged in all vitamin C nonsynthesizing species and several other (but not all) Gulo exons nearly perfectly match the phenotree signature, making the Gulo genomic locus the strongest genome-wide hit. Each exon is one circle. The circle radius is proportional to the number of circles plotted on top of each other. (C) Gulo is also the strongest genome-wide hit if we consider 544,549 conserved regions (exons and noncoding regions). Each region is one circle. (D) We show a visualization of the percent identity values and DNA sequences for Gulo exon 2 using the reconstructed boreoeutherian ancestor. Left: Question marks indicate missing sequence or assembly gaps. The margin between the percent identity values of the two species groups (1.74%) is in gray. Reconstructing the eutherian, therian, and mammalian ancestor (blue dots in the phylogeny), we consistently found only Gulo exon 2 perfectly matching. Right: Gulo gene structure with the alignment of the exon 2 overlapping conserved region below. Exonic bases are in upper case, intronic bases in lower case, and blue background denotes an identical base to the boreoeutherian ancestral sequence. 
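The rank-based normalization described in panel (A) of Figure S1 can be sketched as follows. The legend only states that percent identity values were ranked across all genes within a species to obtain relative values between 0 and 100; the exact scaling and the handling of ties below are assumptions, and the function name and gene labels are hypothetical.

```python
from bisect import bisect_left


def relative_percent_identity(identities):
    """Convert one species' per-gene percent identities to rank-based values in [0, 100].

    identities: dict mapping gene -> percent identity for a single species.
    Assumption: a gene's relative value is the share of genes with strictly
    lower identity, scaled so the most conserved gene gets 100. Tied genes
    receive the same value.
    """
    values = sorted(identities.values())
    n = len(values)
    if n < 2:
        # With a single gene there is nothing to rank against.
        return {gene: 100.0 for gene in identities}
    return {gene: 100.0 * bisect_left(values, value) / (n - 1)
            for gene, value in identities.items()}


# Hypothetical identities for five genes in one fast-evolving species.
rodent = {"geneA": 70.0, "geneB": 90.0, "geneC": 80.0, "geneD": 95.0, "geneE": 85.0}
print(relative_percent_identity(rodent))
# {'geneA': 0.0, 'geneB': 75.0, 'geneC': 25.0, 'geneD': 100.0, 'geneE': 50.0}
```

Because the output depends only on the order of genes within a species, a uniformly higher neutral divergence rate in one lineage does not by itself lower that lineage's relative values.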

8 Figure S2 Independent Gulo-Inactivating Mutations in Independent Vitamin C Nonsynthesizers, Related to Figure 2
(A) The Gulo gene harbors inactivating mutations in all and only vitamin C nonsynthesizing species. All exons were also searched in all unassembled traces of each species (second from last column). Exon 1 containing only the ATG start codon is omitted. We find all previously known mutations, as well as additional previously unreported gene-inactivating mutations in chimpanzee, gorilla, orangutan, and rhesus macaque, and the first clear evidence for Gulo inactivation in the marmoset Callithrix jacchus and the tarsier Tarsius syrichta. Previous studies identified relaxed selection on the Gulo protein sequence of P. vampyrus but failed to find direct evidence for gene inactivation in P. vampyrus (Cui et al., 2011a, 2011b). Our screen is the first to show exon deletions, frameshifting insertions/deletions, and splice site mutations that clearly inactivate Gulo in the bats P. vampyrus and M. lucifugus. (B and C) Experimental amplification and sequencing confirms a frameshifting 1 bp deletion in the microbat M. lucifugus (B) and a donor splice site mutation that destroys the essential GT in the megabat P. vampyrus (C). (D) We verified the exon 9–10 deletion in megabat. Our megabat sample is heterozygous for an additional 529 bp deletion in the exon 9–10 region, suggesting ongoing erosion of the Gulo locus.

9 Figure S3 Abcb4 Is the Strongest Hit for the Biliary Phospholipid Trait based on Screening Exons, Related to Figure 3
We reconstructed the boreoeutherian ancestral sequence for mouse coding exons, measured the divergence from each ancestral sequence, and counted the number of species that violate the signature, which expects the highest divergence in guinea pig and horse. The circle radius is proportional to the number of exons with the same number of violations per gene. 72 exons perfectly match (0 violations). We ranked all 72 exons by the margin between the percent identity values of the two species groups and show this rank above the circle for the top 10. The Abcb4 gene has four perfectly matching exons in the top 10 and another exon at rank 14 (also indicated in the figure), making it the strongest match in the genome.
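Ranking perfectly matching exons by margin, as described for Figure S3, can be sketched like this. The sketch assumes the margin is the gap between the least conserved trait-preserving species and the most conserved trait-loss species, so that a positive margin means a perfect match. Function names, exon labels, and identity values are hypothetical.

```python
def identity_margin(percent_identity, trait_loss_species):
    """Gap between the two species groups; positive means perfect separation."""
    loss = [v for s, v in percent_identity.items() if s in trait_loss_species]
    kept = [v for s, v in percent_identity.items() if s not in trait_loss_species]
    return min(kept) - max(loss)


def rank_perfect_matches(exons, trait_loss_species):
    """Return perfectly matching exons, largest margin first.

    exons: dict mapping exon name -> {species: percent identity to the ancestor}.
    """
    margins = {name: identity_margin(pid, trait_loss_species)
               for name, pid in exons.items()}
    perfect = [name for name, margin in margins.items() if margin > 0]
    return sorted(perfect, key=lambda name: margins[name], reverse=True)


# Hypothetical exons; guinea pig and horse are the trait-loss species.
loss = {"guinea_pig", "horse"}
exons = {
    "exon_a": {"guinea_pig": 70.0, "horse": 75.0, "mouse": 90.0, "human": 92.0},
    "exon_b": {"guinea_pig": 85.0, "horse": 88.0, "mouse": 90.0, "human": 91.0},
    "exon_c": {"guinea_pig": 80.0, "horse": 93.0, "mouse": 90.0, "human": 95.0},
}
print(rank_perfect_matches(exons, loss))  # ['exon_a', 'exon_b']
```

In this toy input, exon_c is excluded because horse is more conserved than mouse, so no cutoff separates the two groups.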

10 Figure S4 Simulation Studies and Numerous Phenotypes with Independent Changes Support Broad Applicability of Forward Genomics, Related to Figure 4 (A–D) The power of forward genomics to detect the genomic regions associated with a given trait is influenced by the number of independent losses, the evolutionary time that has passed since trait loss, the neutral branch length of the species showing trait losses, the length of the region and the strength of purifying selection. (A) We evolved an ancestral genome and analyzed different scenarios involving the loss of a trait in 2, 3, or 4 independent slowly (blue) or fast (red) evolving lineages and varied the age of the trait loss. We always randomly picked ten coding and ten non-coding regions as our true positives. Each data point represents the average from five independent iterations of the entire simulation. Using our forward genomics approach, we tried to find these 20 true positives in the large set of >100,000 false-positive regions, only using the divergence of each region and knowledge about which species lost the trait. We evaluated forward genomics by counting the number of true positives in a fixed number of top-ranked hits, using the percent identity difference between trait-loss and trait-preserving species for ranking. (B) The dashed gray line indicates the optimum, which is identifying the same number of true positives as top ranked hits analyzed (100% recovery of true positives, no recovery of false positives). 
We observe the following trends: (1) the sensitivity and specificity increase with the time that has passed since trait loss, (2) trait losses in more slowly evolving species yield predictions with higher specificity compared to losses in fast evolving species (fewer false positives = identified regions that did not evolve neutrally in trait-loss species), (3) for fast evolving species, the top 10 hits contain more true positives the more independent losses happened, (4) while forward genomics cannot identify all regions associated with the lost trait, the top hits are mostly strongly enriched in true positives, a property that is desirable for experimental validation. Even for very old losses in many independent lineages, forward genomics cannot detect all 20 trait-associated regions, presumably due to the random nature of neutral evolution and the fact that regions under purifying selection also evolve. However, for several scenarios, the top hits contain a subset of the trait-associated regions, which can reveal some genetic components of the given trait. (C) We compared how the length of the genomic regions that are associated with the trait loss affects the power to detect them. We compared trait loss that involves 20 randomly picked short (<100 bp) regions to a trait loss that involves 20 randomly picked long (>180 bp) regions. We found that long regions are easier to identify than short regions, probably because long regions have a higher chance to accumulate random mutations in trait-loss species. (D) We compared trait loss that involves 20 randomly picked regions that evolve under strong purifying selection (probability to accept a mutation is < 0.3) to trait loss that involves 20 randomly picked regions that evolve under weaker purifying selection (probability to accept a mutation is > 0.5). 
Regions under strong purifying selection are easier to identify, probably because their divergence in trait-preserving species is low, which increases the chance that neutral evolution in trait-loss species leads to higher divergence in all trait-loss species. In (C) and (D), we simulated three independent trait losses in guinea pig, pika, and shrew that happened 0.15 substitutions per site ago. Chart axes as in (B). (E) Extrapolation supports the expected notion that the number of phenotypes with independent changes increases as more measurements of the same phenotypes are taken in other species of the same clade. We measured how many phenotypes have independent changes for bat, primate and anglerfish character data if we only analyze subsets of all species with phenotype measurements. We fitted a linear and logarithmic curve to the data points as they show saturation above ∼9 species. The inset shows (above) the number of species (taken from the NCBI taxonomy) in this and the higher-order clade from which outgroups were taken and (below) how many species were measured.
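The evaluation used in the simulations of Figure S4, counting true positives among a fixed number of top-ranked hits, can be sketched as below. The sketch takes a precomputed score per region (for example, a percent identity difference between trait-preserving and trait-loss species) and makes no assumption about how that score was derived. The function name, region labels, and scores are hypothetical.

```python
def true_positives_in_top_k(scores, true_positives, k):
    """Count how many of the k highest-scoring regions are known true positives.

    scores: dict mapping region -> ranking score (higher ranks first).
    true_positives: set of regions that evolved neutrally in the simulation.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return sum(1 for region in ranked[:k] if region in true_positives)


# Hypothetical scores for five regions, three of which are true positives.
scores = {"r1": 12.0, "r2": 9.5, "r3": 8.0, "r4": 3.0, "r5": 1.0}
truth = {"r1", "r3", "r5"}

for k in (1, 3, 5):
    found = true_positives_in_top_k(scores, truth, k)
    optimum = min(k, len(truth))  # the dashed-line ideal: every top hit is a true positive
    print(k, found, optimum)
```

Plotting the count against k and comparing it with the optimum gives the kind of curve described in panel (B), where the dashed gray line marks full recovery of true positives with no false positives among the top hits.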


