Comparative Genomics and Evolution. Pollard, K.S., et al., Forces Shaping the Fastest Evolving Regions in the Human Genome. PLoS Genetics 2(10), 2006.

Comparative Genomics and Evolution. Papers discussed: Pollard, K.S., et al., Forces Shaping the Fastest Evolving Regions in the Human Genome. PLoS Genetics 2(10), 2006. McLean, C., and Bejerano, G., Dispensability of Mammalian DNA. Genome Research 18 (2008). Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA. Genome Research 18 (2008).

“Forces shaping the fastest evolving regions in the human genome” by Katherine S. Pollard et al.

What’s the difference?

What’s the difference? Humans have greater “brainpower”: for example, creativity, problem solving, and language. Which part of the genome is responsible?

What’s the difference? Human and chimpanzee DNA is 98% similar. The 2% difference is 29 million bases (mostly in non-coding DNA).

Comparative Genomics: Human and rodent genomes are often compared to identify conserved (presumably functional) elements. Humans and chimpanzees are compared to understand what is uniquely human about our genome.

Comparative Genomics: Look at HARs (human accelerated regions) in the human genome. An HAR has a high rate of nucleotide substitution in humans but a low rate in other vertebrates. The fastest-evolving is HAR1, a novel RNA gene expressed during development of the neocortex (the seat of language and conscious thought).

HARs are ~100 bp long and mostly non-coding; their function is likely gene regulation. They seem to have been under strong negative selection up to the common ancestor of chimp and human. Rapid positive selection then started in the human lineage only.

Finding HARs: An evolutionary tree of vertebrates, built by comparing conserved regions in whole-genome alignments between species. Branch lengths are given in substitutions per base or in millions of years. Image from: Pollard, K.S., et al., Forces Shaping the Fastest Evolving Regions in the Human Genome.

Finding HARs: HARs are found with the LRT, the likelihood ratio test. In statistical hypothesis testing, the likelihood ratio (Λ) compares the maximum probability (likelihood) of the observed data under an alternative hypothesis with the maximum probability under a null hypothesis. The LRT decides between the two hypotheses based on the value of this ratio.

Finding HARs: Two models were used for the genome-wide LRT. Model 1 (null): the human substitution rate is held proportional to the other substitution rates in the evolutionary tree. Model 2 (alternative): the human substitution rate can be accelerated relative to the rates in the rest of the tree.

Finding HARs (figure): all the conserved alignments between human and another vertebrate.

Finding HARs, Model 1 (figure: human vs. another vertebrate): determine the 1st, 2nd, and 3rd sets of rates, then scale all of them by the same amount.

Finding HARs, Model 2 (figure: human vs. another vertebrate): scale all the non-human rates by the same amount, but scale the human rates separately.

Finding HARs, procedure:
1. Identify regions conserved between human and other vertebrates (34,498 of them).
2. For all regions, fit Model 1 and determine the proportional rates that maximize the likelihood of the tree.
3. Loop over all conserved regions. For each region, obtain P1 (the maximum probability under Model 1). Then fit Model 2 to the region in human, finding the acceleration for that region that maximizes the likelihood of the tree, and obtain P2 (the maximum probability under Model 2). Finally, calculate the LRT statistic for the region as Λ = log(P2 / P1).
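
A minimal sketch of the per-region LRT follows. It rests on heavy simplifying assumptions and is not the paper's phylogenetic model:
- The tree is collapsed into two groups: the human branch and all other branches.
- Substitution counts per group are assumed to be inferred already.
- Each count is modeled as Poisson, with mean = region length × genome-wide branch length × a regional scale factor.

The function names and parameters are hypothetical. Model 2 contains Model 1 as a special case, so Λ should never be negative.

```python
import math


def poisson_loglik(k: int, mu: float) -> float:
    """Natural-log probability of k events under a Poisson distribution with mean mu."""
    if mu == 0.0:
        return 0.0 if k == 0 else -math.inf
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def region_lrt(k_human: int, k_other: int, length: int,
               b_human: float, b_other: float) -> float:
    """Toy per-region statistic Lambda = log(P2 / P1), using natural logs.

    k_human, k_other: substitution counts in the region on the human branch and
        on the rest of the tree (hypothetical inputs, assumed already inferred).
    length: region length in bases.
    b_human, b_other: genome-wide branch lengths (substitutions per base) for the
        human branch and for all other branches combined.
    """
    # Model 1: one scale factor for the whole tree, so the human rate stays
    # proportional to the other rates. Its maximum-likelihood value is the
    # total count divided by the total expected count at scale 1.
    s = (k_human + k_other) / (length * (b_human + b_other))
    log_p1 = (poisson_loglik(k_human, length * b_human * s)
              + poisson_loglik(k_other, length * b_other * s))

    # Model 2: the human branch gets its own scale factor. With separate
    # factors, each fitted Poisson mean equals its observed count.
    log_p2 = (poisson_loglik(k_human, float(k_human))
              + poisson_loglik(k_other, float(k_other)))

    return log_p2 - log_p1


# Hypothetical 200 bp region: 10 human-branch substitutions, where Model 1
# expects only about 0.24.
print(region_lrt(10, 30, 200, 0.006, 1.0))
```

When the counts are exactly proportional to the branch lengths, both models fit equally well and Λ is 0.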

Finding HARs: A large LRT value indicates an HAR. How large is large? Run 1 million simulations of the 34,498 conserved alignments, generating each simulation from the Model 1 proportional rates, and repeat the LRT calculation for each simulation. Then, for each region, find the proportion of simulated LRTs that are larger than its original LRT. That proportion is an empirical p-value that tells whether the region is an HAR.
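
The simulation step can be sketched as a parametric bootstrap under the same hypothetical two-group Poisson model. Three simplifications apply:
- Simulations are drawn for one region at a time from that region's own Model 1 fit, instead of simulating all 34,498 alignments together as described above.
- "Larger" is taken as "at least as large", with a small floating-point tolerance.
- Poisson draws use Knuth's multiplication method, which is only suitable for small means.

All names are illustrative.

```python
import math
import random


def poisson_loglik(k, mu):
    # Natural-log Poisson probability; a zero mean only allows zero events.
    if mu == 0.0:
        return 0.0 if k == 0 else -math.inf
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def region_lrt(k_h, k_o, length, b_h, b_o):
    # Toy two-group LRT: Model 1 shares one scale factor across the tree;
    # Model 2 frees the human branch, so its fitted means equal the counts.
    s = (k_h + k_o) / (length * (b_h + b_o))
    log_p1 = poisson_loglik(k_h, length * b_h * s) + poisson_loglik(k_o, length * b_o * s)
    log_p2 = poisson_loglik(k_h, float(k_h)) + poisson_loglik(k_o, float(k_o))
    return log_p2 - log_p1


def poisson_sample(mu, rng):
    # Knuth's multiplication method: multiply uniforms until the product
    # drops to exp(-mu); the number of extra multiplications is the draw.
    threshold = math.exp(-mu)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def empirical_p_value(k_h, k_o, length, b_h, b_o, n_sims=10_000, seed=0):
    """Fraction of null (Model 1) simulations whose LRT reaches the observed LRT."""
    rng = random.Random(seed)
    observed = region_lrt(k_h, k_o, length, b_h, b_o)
    # Simulate from the Model 1 fit: rates proportional to the genome-wide tree.
    s = (k_h + k_o) / (length * (b_h + b_o))
    mu_h, mu_o = length * b_h * s, length * b_o * s
    hits = 0
    for _ in range(n_sims):
        sim = region_lrt(poisson_sample(mu_h, rng), poisson_sample(mu_o, rng),
                         length, b_h, b_o)
        # Small tolerance so floating-point ties count as "at least as large".
        if sim >= observed - 1e-9:
            hits += 1
    return hits / n_sims


print(empirical_p_value(10, 30, 200, 0.006, 1.0, n_sims=2000))
```

A small p-value means that rates proportional to the rest of the tree rarely produce a human-branch excess as extreme as the one observed.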

Finding HARs: Note on methods: the vertebrates that were used in selecting the conserved regions (chimp, macaque, mouse, rat, rabbit) were omitted from the LRT analysis. This ensured that the LRT is independent of the method used to select the conserved regions.

Finding HARs: Result: 202 HARs were found in the human genome.

Results for Conserved Elements: 80.4% of the 34,498 conserved regions are non-coding. Of the non-coding regions, 45.4% are intronic and 31% are intergenic. Non-coding regions are enriched near genes for transcription factors, DNA-binding proteins, and regulators of nucleic acid metabolism.

Results for HARs: 202 HARs have p < 0.1, and 49 of them have p < 0.05. HAR1 through HAR5 have p < 4.5e-4 (very accelerated). Most HARs are non-coding: 66.3% are intergenic, 31.7% are intronic, and only 1.5% are coding. These results support the hypothesis (King and Wilson) that most chimp-human differences are regulatory.

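Results: Confirming Accelerated Selection in HARs. Are the HARs just due to relaxation of negative selection? No. Relaxed constraint alone would let a region's substitution rate rise at most to the neutral rate. Comparing the HAR rates with the neutral rate at fourfold-degenerate (4D) sites shows that they exceed it. (Figure: negative selection vs. positive selection.) Image source: [Bejerano Aut 08/09]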

Figure caption: The chimp rates in all five elements fall well below the human rates, which exceed the background rates by as much as an order of magnitude. H, human; C, chimp. Reference levels: the genome-wide neutral rate for 4D sites in human and chimp, and the same rate in chromosome end bands. Image from: K.S. Pollard et al., Forces Shaping the Fastest Evolving Regions in the Human Genome.

Results: W → S Bias in HARs. A dramatic AT → GC (weak-to-strong) substitution bias was observed in HARs. (Figure: counts of GC → AT vs. AT → GC substitutions for HAR1–HAR5, HAR6–HAR49, HAR50–HAR202, and the rest of the conserved elements.) Image from: Pollard, K.S., et al., Forces Shaping the Fastest Evolving Regions in the Human Genome.
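
Measuring this bias comes down to classifying each substitution as weak-to-strong (A/T → G/C), strong-to-weak, or neither. A minimal sketch follows. It assumes two already aligned sequences, a reconstructed ancestral sequence and a derived human sequence, with gaps written as '-'. The function name is hypothetical, and inferring the ancestral sequence is outside its scope.

```python
WEAK = set("AT")    # A-T pairs: two hydrogen bonds
STRONG = set("GC")  # G-C pairs: three hydrogen bonds


def classify_substitutions(ancestral: str, derived: str) -> dict:
    """Count weak-to-strong (W->S) and strong-to-weak (S->W) changes between
    two aligned sequences; any other change is counted as 'other'."""
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"W->S": 0, "S->W": 0, "other": 0}
    for a, d in zip(ancestral.upper(), derived.upper()):
        if a == d or "-" in (a, d):
            continue  # identical column or gap: not a substitution
        if a in WEAK and d in STRONG:
            counts["W->S"] += 1
        elif a in STRONG and d in WEAK:
            counts["S->W"] += 1
        else:
            counts["other"] += 1  # e.g. A<->T, G<->C, or ambiguous bases
    return counts


print(classify_substitutions("ATTACGA", "GTCACGG"))
# {'W->S': 3, 'S->W': 0, 'other': 0}
```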

Results: W → S Bias in HARs. The top 49 HARs are 2.7 times as likely as the other conserved elements to be located near the final (terminal) chromosomal bands. Interestingly, HAR1 and HAR5 are also in end regions in other mammals, but are not accelerated there.

Results: W → S Bias in HARs. HARs tend to be located in regions of high recombination in humans. All of this evidence points to biased gene conversion (BGC) as the driving force behind HARs.

Genetic Recombination: Paired chromosomes can exchange homologous pieces. This typically occurs during meiosis.

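Meiosis (figure): a diploid germ cell carries a maternal and a paternal copy of chromosome A. DNA replication produces sister chromatids joined at the centromere; recombination then exchanges pieces between the homologs, and segregation yields haploid gametes.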

Recombination (figure): a recombination hotspot.

Genetic Recombination (figure): two DNA duplexes (duplex 1 and duplex 2) form a Holliday junction intermediate. Vertical resolution gives a crossover; horizontal resolution, with mismatch repair, gives gene conversion.

Genetic Recombination: Chromosomal Crossover. Chromosomal crossover results in an exchange of DNA pieces between homologous chromosomes, producing recombinant chromatids.

Genetic Recombination: Gene Conversion. Gene conversion results in a nonreciprocal transfer of DNA. Mismatch repair makes the two strands of the heteroduplex match again, which can convert one allele into the other in the recombinant chromatids.

Genetic Recombination: Gene Conversion. The result is a non-Mendelian ratio of alleles among the haploid gametes, such as 3:1 instead of 2:2. This causes homogenization of a species’ gene pool.

Biased Gene Conversion: During gene conversion, the DNA repair machinery tends to replace weak pairings with strong pairings. A–T is a weak pairing (two hydrogen bonds); G–C is a strong pairing (three hydrogen bonds).

Biased Gene Conversion: In addition to causing homogenization, biased gene conversion results in G–C enrichment of a species’ gene pool, because A–T is replaced by G–C during mismatch repair in recombinant chromatids.
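
The enrichment can be illustrated with a deterministic allele-frequency recursion. The sketch assumes an infinite, randomly mating population in which a heterozygote transmits the GC allele with probability (1 + b)/2 instead of 1/2. Here b is a hypothetical per-heterozygote transmission bias, not a value from the slides. Under these assumptions the GC frequency changes as p' = p + b·p·(1 − p), the same form as genic selection, so BGC can drive a GC allele toward fixation with no fitness advantage at all.

```python
def bgc_trajectory(p0: float, b: float, generations: int) -> list[float]:
    """GC-allele frequency over time under biased gene conversion alone.

    Assumes an infinite, randomly mating population where heterozygotes pass
    on the GC allele with probability (1 + b) / 2, so that
        p' = p**2 + 2 p (1 - p) (1 + b) / 2 = p + b p (1 - p).
    """
    if not 0.0 <= p0 <= 1.0 or not -1.0 <= b <= 1.0:
        raise ValueError("p0 must be in [0, 1] and b in [-1, 1]")
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = p + b * p * (1.0 - p)  # deterministic update, no genetic drift
        freqs.append(p)
    return freqs


traj = bgc_trajectory(p0=0.01, b=0.01, generations=2000)
print(f"GC allele frequency after 2000 generations: {traj[-1]:.3f}")
```

Drift is deliberately left out. In a finite population a weak bias competes with random sampling, so this sketch shows only the direction of the push.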

HARs and Recombination Hotspots: HARs tend to be located near recombination hotspots in humans.

Recombination Hotspots: They are mysterious. They are extremely different between chimps and humans (they change rapidly during evolution), yet they are not caused by the local DNA sequence, which is nearly the same in human and chimp.

Figure: some HARs alongside recombination hotspots (a possible link, marked with “?”).

Possible Conclusion: Recombination-driven BGC (often viewed negatively) played a big role in the development of our species.

Alternative Explanation: An isochore is a DNA region (~100 kb) with a high gene concentration. Isochores are stabilized by many strong (GC) pairings. (Figure: an HAR inside an isochore.)

Alternative Explanation: A theory by Bernardi et al. holds that weakly deleterious changes drive an isochore to a critical point of destabilization. At that point, GC content cannot decrease further without making the isochore unstable. An AT → GC substitution in the isochore therefore suddenly gains a selective advantage and sweeps through the population.

Alternative Explanation: The isochore selective sweep theory and the BGC theory predict different DNA signatures. An isochore selective sweep raises GC content across ~100 kb, whereas biased gene conversion raises it across only ~100 bases.
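
One way to look for either signature is to profile GC content at more than one window size. A narrow BGC-like GC peak stands out in ~100 bp windows but is diluted in large windows, whereas an isochore-wide shift would persist at large scales. A minimal sketch follows. The function name and the toy sequence are hypothetical, and every base in a window, including N, counts toward the denominator.

```python
def gc_profile(seq: str, window: int, step: int) -> list[float]:
    """GC fraction in each full window of `window` bases, advancing by `step`."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        profile.append(sum(base in "GC" for base in chunk) / window)
    return profile


# Toy sequence: a 100 bp GC-rich patch inside AT-rich flanks (700 bp total).
toy = "AT" * 150 + "GC" * 50 + "AT" * 150
print(gc_profile(toy, window=100, step=100))  # sharp peak in one window
print(gc_profile(toy, window=700, step=700))  # peak diluted across the region
```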

Alternative Explanation: Evidence so far favors the BGC explanation for HARs. However, the results are not yet conclusive.

“Dispensability of Mammalian DNA” by Gill Bejerano and Cory McLean

Are mammalian CNEs dispensable? A CNE is a conserved non-exonic element. Examples: cis-regulatory DNA and ultraconserved DNA.

Cis-regulatory DNA elements (figure: e.g., a promoter or inhibitor).

Ultraconserved elements: 200 bp and longer; many seem to be regulatory. “100% identity with no insertions or deletions between orthologous regions of the human, rat, and mouse genomes.” “Nearly all of these segments are also conserved in the chicken and dog genomes, with an average of 95 and 99% identity, respectively. Many are also significantly conserved in fish.” (Quotes from “Ultraconserved elements in the human genome” by Bejerano et al.)

Are mammalian CNEs dispensable? About 20% of gene knockout experiments, including cis-regulatory and ultraconserved knockouts, produce no phenotype measurable in lab settings.

Are mammalian CNEs dispensable? Do CNEs have functional redundancy? Or are CNEs indispensable, but in a way that cannot be observed in the lab? Approach: look at CNEs that were lost during rodent evolution.

Finding CNEs lost by rodents, computational pipeline: identify conserved mammalian sequences; pick out the ones absent in rodents; remove artifacts due to assembly, alignment, and structural RNA migration.

Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Use UCSC chains and nets. To avoid assembly artifacts, ignore multi-level nets. Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Identify lost DNA, then validate the quality of the results. Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Identifying DNA lost by rodents: Look at the aligned orthologous sequences in primates (human, macaque), dog, and rodents (mouse, rat). For each 100 bp window, compute the primate-dog %id (percentage of identical alignment columns); columns where primates and dog carry different bases (e.g., A vs. G) lower the %id. A window that aligns between primates and dog but is deleted in rodents marks a deletion in rodents. When such a window is an ultraconserved-like element between primates and dog, it is an ultraconserved-like element that was lost in rodents.
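
The windowed scan can be sketched as follows, under these assumptions:
- The three rows are already taken from one multiple alignment, with gaps as '-'.
- %id counts columns where primate and dog carry the same non-gap base.
- "Lost in rodents" means the rodent row is entirely gaps in the window.
- Windows are non-overlapping.

The function name and the 98% cutoff in the usage example are hypothetical. The actual pipeline works from UCSC chains and nets and includes artifact filtering that this toy omits.

```python
from typing import Iterator, Tuple


def scan_windows(primate: str, dog: str, rodent: str,
                 window: int = 100) -> Iterator[Tuple[int, float, bool]]:
    """Yield (start, pct_id, rodent_deleted) for each non-overlapping window."""
    if not (len(primate) == len(dog) == len(rodent)):
        raise ValueError("alignment rows must have equal length")
    for start in range(0, len(primate) - window + 1, window):
        end = start + window
        p, d, r = primate[start:end], dog[start:end], rodent[start:end]
        # Identical, non-gap columns between primates and dog.
        same = sum(1 for a, b in zip(p, d) if a == b and a != "-")
        pct_id = 100.0 * same / window
        rodent_deleted = set(r) == {"-"}  # rodent row is all gaps here
        yield start, pct_id, rodent_deleted


# Toy alignment: the first window is identical in primates and dog but
# deleted in rodents; the second is only 50% identical and present in rodents.
primate = "A" * 100 + "C" * 100
dog = "A" * 100 + "C" * 50 + "G" * 50
rodent = "-" * 100 + "C" * 100
for start, pct, lost in scan_windows(primate, dog, rodent):
    if lost and pct >= 98.0:  # hypothetical "ultraconserved-like" cutoff
        print(f"ultraconserved-like window at {start} lost in rodents")
```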

Results for non-exonic ultras: 1,691,090 bp of ultraconserved-like sequence were found, and 1,147 bp of it was lost in rodents, i.e., only about 0.068% of ultras. In comparison, about ¼ of neutrally evolving DNA (50%id to 65%id) is lost in rodents. Thus ultraconserved-like sequences are lost over 300 times less often than neutrally evolving DNA.
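
The loss fraction and the fold difference follow from simple arithmetic on the base counts on this slide. The 0.25 stands in for the approximate "¼" figure for neutral DNA, so the fold value is only as precise as that approximation.

```python
ultra_bp = 1_691_090          # ultraconserved-like bases found (from the slide)
ultra_lost_bp = 1_147         # of those, bases lost in rodents (from the slide)
neutral_lost_fraction = 0.25  # "about 1/4" of neutrally evolving DNA lost

ultra_lost_fraction = ultra_lost_bp / ultra_bp
fold = neutral_lost_fraction / ultra_lost_fraction

print(f"ultra lost: {100 * ultra_lost_fraction:.3f}%")   # ultra lost: 0.068%
print(f"neutral / ultra loss ratio: {fold:.0f}")          # neutral / ultra loss ratio: 369
```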

Results for neutral DNA: A uniform rate of loss was expected for neutrally evolving DNA, but less conserved sequences turned out to be retained more often. Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Results for neutral DNA: This happens because poorly conserved sequences tend to lie adjacent to exons and are thus shielded from loss: larger deletions are biased away from gene structures. Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Separating DNA under selection from neutral DNA: Moving away from 100%id, DNA under purifying selection becomes mixed with neutrally evolving DNA. Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Separating DNA under selection from neutral DNA: To distinguish neutral DNA from conserved DNA in this mix, use alignments with longer evolutionary branch lengths. Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Separating DNA under selection from neutral DNA: For example, the human-dog-horse alignment has a longer cumulative branch length than the human-macaque-dog alignment. Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Separating DNA under selection from neutral DNA: Thus the human-dog-horse alignment has a lower %id for neutral DNA than human-macaque-dog, which shifts the neutral DNA curve to the right. Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.
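
Why longer branches lower the neutral %id can be shown with the Jukes-Cantor (JC69) substitution model. For two sequences separated by d substitutions per site, JC69 gives an expected identity of 1 − ¾(1 − e^(−4d/3)). This is a pairwise simplification of the multi-species %id used in the paper, and the distances below are illustrative, not taken from it.

```python
import math


def jc69_expected_identity(d: float) -> float:
    """Expected fraction of identical sites between two sequences separated by
    d substitutions per site, under the Jukes-Cantor (JC69) model."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 - 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# Longer cumulative branch length means lower expected %id for neutral DNA.
for d in (0.1, 0.3, 0.6):
    print(f"d = {d}: expected neutral %id = {100 * jc69_expected_identity(d):.1f}")
```

Identity falls toward 25% (random matching of four bases) as d grows, while DNA under strong purifying selection stays near 100%id. This widening gap is what makes the two populations easier to separate.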

Results for DNA under purifying selection Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Results for DNA under purifying selection: DNA at 80%id to 100%id is identified as being under purifying selection. As the figure shows, practically none of this DNA is lost in primates (only 0.154% of bases). Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Results for DNA under purifying selection: The previous results were for CNEs; compare them with the numbers for lost coding DNA. Fraction of lost CNEs: 0 at 100%id, at 80%id. Fraction of lost exons: 0 at 100%id, at 80%id. Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Results for DNA under purifying selection: Thus CNEs under purifying selection are indispensable, much like coding elements.

CNE dispensability ranking (figure panels: in rodents; in primates). The species deepest in the vertebrate tree corresponds to the most indispensable CNEs; the region of high conservation contains the CNEs. Left plot explanation (the right plot is similar): take the h-m-d alignments and find their conservation %id in each of the species shown. Then, for each of those species, plot the fraction of DNA lost in rodents against the %id. Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

CNE dispensability ranking Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

Conclusion: Many mammalian CNE knockouts produce no observable phenotype in the lab, suggesting great functional redundancy. However, evolutionary analysis shows that CNEs, and particularly ultraconserved regions, are indispensable. It seems that the knockout phenotypes are subtle, but very important.