Population genetics, comparative genomics, and natural selection Simon Myers.


Overview
Identifying selection through
– Use of comparative genomic data (FOXP2)
– Present-day diversity patterns (Lactase)
– Both (conserved non-coding regions)

Separation of evolutionary timescales
The genome evolves over many millions of years
– Our genome is almost 99% identical to the chimpanzee genome
Population genetics studies variation among individuals within a population
– Uses the study of genealogies
– In humans, covers only hundreds of thousands of years
What can population genetics tell us about genome evolution?

Targets of selection are important
Humans
– Disease resistance (LARGE, Duffy)
– What makes us human? (FOXP2)
– Explain observable phenotypes (Lactase, SLC24A5, EDAR…)
Other species
– Resistance to pesticides
– Pathogen evolution

Adaptive evolution
Advantageous mutations arise by chance
Once arisen, carriers have more offspring
“Positive selection”
On average, higher rate of change towards advantageous mutations
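The idea that carriers of an advantageous mutation leave more offspring can be sketched with a deterministic toy model. This is a minimal illustration, not something from the slides: it assumes a simple haploid model with relative fitness 1 + s for carriers, ignores genetic drift and recombination, and the function names are hypothetical.

```python
def next_frequency(p, s):
    """One generation of deterministic haploid selection.

    p is the current frequency of the advantageous allele and s its
    selective advantage (carriers have relative fitness 1 + s).
    Assumption: no drift, no new mutation, non-overlapping generations.
    """
    return p * (1 + s) / (1 + s * p)


def generations_to_reach(p0, target, s):
    """Count generations until the allele frequency reaches `target`."""
    if s <= 0 or not (0 < p0 < 1) or not (0 < target < 1):
        raise ValueError("need s > 0 and frequencies strictly between 0 and 1")
    p, t = p0, 0
    while p < target:
        p = next_frequency(p, s)
        t += 1
    return t


# Stronger selection carries the allele from 1% to 50% in fewer generations.
fast = generations_to_reach(0.01, 0.5, 0.05)
slow = generations_to_reach(0.01, 0.5, 0.01)
print(fast, slow)
```

In this model the odds p / (1 - p) are multiplied by 1 + s each generation, which is why even a small advantage produces a steady rise in frequency.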

Looking for positive selection
Direct approach is very difficult
– Need to observe trait for long time
– Need very strong selection
In many cases, need a more indirect approach
– Compare genomes among closely related species
– Look for “accelerated evolution”
– Current day patterns of diversity
– Look for “signature of selection”

FOXP2
Gene coding for a transcription factor
Mutations in this gene cause speech impairment and other problems (Lai et al., Nature 2001)
– Mutation in FOXP2 co-segregates with a disorder in a family in which half of the members have severe speech, linguistic and grammatical difficulties
– Translocation in same gene in unrelated individual with similar disorder
Are changes in this gene associated with human language development?

FOXP2 (Enard et al., Nature 2002)
Are humans different from other species at FOXP2?
Sequence gene in chimpanzee, gorilla, orang-utan, rhesus macaque and mouse
Comparison

FOXP2 (Enard et al., Nature 2002)
Yellow: human lineage mutations (since chimpanzee-human split)
Blue: mutations on all other lineages
Very conserved gene (top 5% of 1,880 genes)
Only 3 non-repeat amino acid changes in 130 million years between human and mouse
2 occurred on human lineage in last 5-6 million years

FOXP2 (Enard et al., Nature 2002)
156 synonymous changes, 0 on human lineage
4 non-synonymous changes, 2 on human lineage (p= by Fisher’s exact test)
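The comparison of synonymous and non-synonymous counts can be turned into a 2x2 table and tested directly. The sketch below is an illustration only: it assumes a one-sided Fisher's exact test on the counts as they appear in this transcript, and the value it produces is not necessarily the p-value shown on the original slide. The function name is hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the hypergeometric probability that the top-left cell is at
    least as large as observed, with all row and column totals fixed.
    """
    row1 = a + b           # e.g. all non-synonymous changes
    col1 = a + c           # e.g. all changes on the human lineage
    total = a + b + c + d
    upper = min(row1, col1)
    numer = sum(comb(row1, k) * comb(total - row1, col1 - k)
                for k in range(a, upper + 1))
    return numer / comb(total, col1)


# Rows: non-synonymous, synonymous. Columns: human lineage, other lineages.
# Counts taken from the transcript: 4 non-synonymous (2 human),
# 156 synonymous (0 human).
p = fisher_one_sided(2, 2, 0, 156)
print(p)  # 6 / 12720, roughly 0.00047
```

The small value reflects how unlikely it is that both human-lineage changes would be non-synonymous if they were drawn at random from all 160 changes.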

Is this the answer?
Comparative genomics has disadvantages
– Need repeated mutations to give power
– Tells little about the timescale
– Recent research suggests Neanderthals may share FOXP2 mutations with humans (Krause et al., Current Biology 2007)
How do we find out if, and where, we’re currently evolving?

Looking for positive selection
Direct approach is typically difficult
– Need to observe trait for long time
In many cases, need a more indirect approach
– Compare genomes among closely related species
– Look for “accelerated evolution”
– Current day patterns of diversity
– Look for “signature of selection”
– Identify effect of selection on diversity patterns

Variation data and selection
Revolution in population genetics
Genome-wide datasets
– HapMap project
– Many unrelated individuals (60 CEU, 60 YRI, 45 JPT and 45 CHB)
– Typed at ~4,000,000 loci that vary within population
Allow systematic searches for selection
– Comparison of interesting regions to genome
– Identification of novel candidates for selection

Neutral alleles
Neutral variation
Neutral allele arises
Recombination scrambles variation over time
e.g. HapMap

The signature of positive selection
Neutral variation
Advantageous allele arises
Spreads (sweeps) rapidly through population
Recombination has much less time to scramble variation on selected background

The signature of positive selection SelSim (Spencer and Coop, Bioinformatics 2004)

The signature of positive selection
Neutral mutation at 50%
Selected mutation at 50%

EHH
Several authors have developed tests based on similar idea
– Sabeti et al. (Nature 2002), Voight et al. (PLoS Biology 2006)
– Focus on potentially selected mutation
– Measure proportion of haplotypes identical, as a function of distance on either side
– Compare selected/nonselected types
– Look for signal of “extended haplotype homozygosity” (EHH)
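The EHH idea can be sketched on toy data. This is a minimal illustration, not the implementation used in the cited papers: it assumes phased haplotypes encoded as strings of 0/1, uses the pairwise definition (the probability that two randomly chosen haplotypes carrying the same core allele are identical from the core out to a given marker), and all names and haplotypes are made up.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, upto):
    """Extended haplotype homozygosity between marker `core` and marker `upto`.

    haplotypes: equal-length strings, all carrying the same core allele.
    Returns the fraction of haplotype pairs identical over the whole span.
    """
    n = len(haplotypes)
    if n < 2:
        return 0.0
    lo, hi = min(core, upto), max(core, upto)
    groups = Counter(h[lo:hi + 1] for h in haplotypes)
    return sum(comb(count, 2) for count in groups.values()) / comb(n, 2)


# Toy data (hypothetical): the core marker is at index 0.
selected = ["1010", "1010", "1010", "1011"]      # carry the candidate allele
nonselected = ["0010", "0110", "0101", "0011"]   # carry the other allele

for distance in range(4):
    print(distance, ehh(selected, 0, distance), ehh(nonselected, 0, distance))
```

In the toy data, homozygosity on the selected background stays high as the span grows, while on the nonselected background it decays quickly, which is the contrast the tests look for.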

Simulation results (Voight et al., PLoS Biology 2006)

Lactase gene
Most humans lose ability to digest lactose as adults
– 70% of all humans are lactose intolerant
– In Europe, 95% lactose tolerance

Lactase gene
DNA variant C/T kb upstream of Lactase gene
Completely predicts lactose persistence across human populations (Enattah et al., Nature Genetics 2002)
Mutation enhances promoter activity, so probably causal (Olds et al., Hum. Mol. Genet. 2003)

EHH around Lactase From Bersaglieri et al. (AJHG, 2004)

EHH around Lactase
5’: p=0.012
3’: p<0.0004

Another approach
SNPs that are at highly different frequencies across populations are excellent candidates for selection
– SLC24A5 (skin colour, HapMap paper, Lamason et al. Science 2005)
– EDAR (hair follicle development, HapMap paper, Sabeti et al. Nature 2007)
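One common way to quantify "highly different frequencies across populations" is a fixation index such as F_ST. The sketch below is an assumption-laden illustration: it uses a simple two-population, equal-weight heterozygosity-based F_ST for a biallelic SNP, which is not necessarily the statistic used in the cited papers, and the SNP names and frequencies are invented toy values.

```python
def fst_two_populations(p1, p2):
    """Heterozygosity-based F_ST for one biallelic SNP in two populations.

    p1, p2: frequency of the same allele in each population.
    Assumption: both populations are weighted equally.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                    # pooled heterozygosity
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        return 0.0  # allele fixed or absent everywhere: no differentiation
    return (h_total - h_within) / h_total


# Hypothetical SNPs with made-up allele frequencies in two populations.
toy_snps = {"snp_a": (0.5, 0.5), "snp_b": (0.1, 0.9), "snp_c": (0.3, 0.4)}

ranked = sorted(toy_snps, key=lambda name: fst_two_populations(*toy_snps[name]),
                reverse=True)
print(ranked)  # most differentiated SNP first
```

SNPs at the top of such a ranking are candidates only; demographic history can also produce large frequency differences, so outliers are usually judged against the genome-wide distribution.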

Testing for ongoing conservation
IDEA: Look at how common variants occurring within CNCs are in the population
– If the CNCs are functional, mutations in them have a hard time competing
– Tend to be rarer in the population than other mutations
Comparison shown: CNC vs. non-CNC

Purifying selection
Much of the work of selection is removing disadvantageous alleles
Regions performing some useful function (e.g. genes!) evolve more slowly
Once again, comparative genomics can help!
– Look for regions that are conserved between distantly related species
Maladaptive mutation, fewer offspring, mutation lost

Identifying conserved regions
5% of genome is “conserved”, but only 1.5% is exonic sequence

CNCs
So-called conserved non-coding regions (CNCs) make up about 3% of the genome (e.g. Waterston et al. 2002)
Suggests widespread regulatory sequence
Is this stuff real?
– Mutational cold spots
– Old functionality, now lost
Population genetics enables testing
– Approach complements comparative genomics

Disadvantageous mutations should be at lower frequency
From talk by S. Williamson
Curves shown: negatively selected (2Ns=-2) vs. neutral
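The shift towards lower frequencies can be sketched numerically. The code below is an illustration under stated assumptions, not the calculation behind the slide: it assumes the standard diffusion (Poisson random field style) density for derived allele frequency x, proportional to (1 - exp(-2γ(1 - x))) / ((1 - exp(-2γ)) x (1 - x)), with γ a scaled selection coefficient. How γ maps onto the slide's 2Ns depends on convention, so treating γ = -2 as the slide's case is an assumption. Function and parameter names are hypothetical.

```python
from math import comb, expm1


def expected_sfs(n, gamma, steps=20000):
    """Expected site frequency spectrum for a sample of n chromosomes.

    Returns a list whose entry i-1 is the expected number of sites (per unit
    of scaled mutation rate) at which the derived allele is seen i times,
    for i = 1 .. n-1. gamma = 0 gives the neutral spectrum (1/i).
    """
    dx = 1.0 / steps
    sfs = []
    for i in range(1, n):
        total = 0.0
        for k in range(steps):
            x = (k + 0.5) * dx  # midpoint rule keeps x away from 0 and 1
            if gamma == 0:
                h = 1.0 - x     # neutral limit of the selection term
            else:
                h = expm1(-2.0 * gamma * (1.0 - x)) / expm1(-2.0 * gamma)
            # binomial sampling probability times the density h / (x (1 - x))
            total += comb(n, i) * x ** (i - 1) * (1.0 - x) ** (n - i - 1) * h
        sfs.append(total * dx)
    return sfs


neutral = expected_sfs(10, 0.0)
selected = expected_sfs(10, -2.0)  # assumed to correspond to the slide's case

# Fraction of segregating sites that are singletons in each case.
print(neutral[0] / sum(neutral), selected[0] / sum(selected))
```

Under negative selection every frequency class shrinks relative to neutrality, but common variants shrink the most, so rare variants make up a larger share of the spectrum.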

SNP frequency “spectrum” in CNCs
SNPs are at lower frequencies in CNCs (p=3x )
Drake et al. (Nature Genetics, 2005)

CNC results (Drake et al., 2005)
Shift in frequency spectrum relative to non-conserved regions
– Proves conservation is real, and function exists now
– Signal robust to demographic changes
Signal is comparatively weak!
– Not all changes selected against?
– Signal stronger nearer genes
– Near genes, strength comparable to signal for nonsynonymous mutations in exons
Extreme SNP frequency bias for ultraconserved elements (Katzman et al., Science 2007)
– “Ultraconserved elements are ultraselected”

Conclusions
Population genetics provides diverse information about molecular evolution
Combining population genetics with knowledge of genomic sequence
– New insights into adaptive evolution
– Identification of functional sequence
Avalanche of variation data being gathered
– Will bring many more insights
– Presents major challenges in utilising vast and highly informative datasets, whilst keeping analyses computationally tractable

Selected references
- Lai, C.S., S.E. Fisher, J.A. Hurst, F. Vargha-Khadem, and A.P. Monaco. A forkhead-domain gene is mutated in a severe speech and language disorder. Nature 413:
- Lamason, R.L., M.A. Mohideen, J.R. Mest, A.C. Wong, H.L. Norton, M.C. Aros, M.J. Jurynec, X. Mao, V.R. Humphreville, J.E. Humbert et al. SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. Science 310:
- Olds, L.C. and E. Sibley. Lactase persistence DNA variant enhances lactase promoter activity in vitro: functional role as a cis regulatory element. Hum Mol Genet 12:
- Sabeti, P.C., D.E. Reich, J.M. Higgins, H.Z. Levine, D.J. Richter, S.F. Schaffner, S.B. Gabriel, J.V. Platko, N.J. Patterson, G.J. McDonald et al. Detecting recent positive selection in the human genome from haplotype structure. Nature 419:
- Sabeti, P.C., P. Varilly, B. Fry, J. Lohmueller, E. Hostetter, C. Cotsapas, X. Xie, E.H. Byrne, S.A. McCarroll, R. Gaudet et al. Genome-wide detection and characterization of positive selection in human populations. Nature 449:
- Spencer, C.C. and G. Coop. SelSim: a program to simulate population genetic data with natural selection and recombination. Bioinformatics 20:
- The International HapMap Consortium. A haplotype map of the human genome. Nature 437:
- The International HapMap Consortium. The Phase II HapMap. Nature
- Voight, B.F., S. Kudaravalli, X. Wen, and J.K. Pritchard. A map of recent positive selection in the human genome. PLoS Biol 4: e72.
- Waterston, R.H., K. Lindblad-Toh, E. Birney, J. Rogers, J.F. Abril, P. Agarwal, R. Agarwala, R. Ainscough, M. Alexandersson, P. An et al. Initial sequencing and comparative analysis of the mouse genome. Nature 420:

Selected references
- Bersaglieri, T., P.C. Sabeti, N. Patterson, T. Vanderploeg, S.F. Schaffner, J.A. Drake, M. Rhodes, D.E. Reich, and J.N. Hirschhorn. Genetic signatures of strong recent positive selection at the lactase gene. Am J Hum Genet 74:
- Drake, J.A., C. Bird, J. Nemesh, D.J. Thomas, C. Newton-Cheh, A. Reymond, L. Excoffier, H. Attar, S.E. Antonarakis, E.T. Dermitzakis et al. Conserved noncoding sequences are selectively constrained and not mutation cold spots. Nat Genet 38:
- Enard, W., M. Przeworski, S.E. Fisher, C.S. Lai, V. Wiebe, T. Kitano, A.P. Monaco, and S. Paabo. Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418:
- Enattah, N.S., T. Sahi, E. Savilahti, J.D. Terwilliger, L. Peltonen, and I. Jarvela. Identification of a variant associated with adult-type hypolactasia. Nat Genet 30:
- Katzman, S., A.D. Kern, G. Bejerano, G. Fewell, L. Fulton, R.K. Wilson, S.R. Salama, and D. Haussler. Human genome ultraconserved elements are ultraselected. Science 317:
- Krause, J., C. Lalueza-Fox, L. Orlando, W. Enard, R.E. Green, H.A. Burbano, J.J. Hublin, C. Hanni, J. Fortea, M. de la Rasilla et al. The Derived FOXP2 Variant of Modern Humans Was Shared with Neandertals. Curr Biol 17: