Detecting selection using genome scans


Detecting selection using genome scans Roger Butlin University of Sheffield

What signatures does selection leave in the genome? Population differentiation (today's focus!); frequency spectrum, e.g. Tajima's D; selective sweeps; haplotype structure (linkage disequilibrium); McDonald-Kreitman tests (or PAML over long time-scales). Reference: Nielsen R (2005) Molecular signatures of natural selection. Annu. Rev. Genet. 39, 197–218.

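Frequency distribution: from Nielsen (2005), frequency of the derived allele in a sample of 20 alleles. Tajima's D = $(\pi - S/a_1)/\mathrm{sd}$, where $\pi$ is the mean number of pairwise differences, $S$ the number of segregating sites, $a_1 = \sum_{i=1}^{n-1} 1/i$ for a sample of $n$ sequences, and sd the standard deviation of the numerator. Negative D indicates an excess of rare variants.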
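As a minimal sketch of how Tajima's D can be computed from summary statistics, the function below uses the standard constants from Tajima (1989). The function and argument names are illustrative assumptions, not part of the lecture material.

```python
from math import sqrt


def tajimas_d(pi: float, num_segregating: int, n: int) -> float:
    """Tajima's D from pairwise diversity, segregating sites and sample size.

    pi              -- mean number of pairwise differences
    num_segregating -- number of segregating sites (S)
    n               -- number of sampled sequences
    """
    if n < 4 or num_segregating < 1:
        # The variance term is zero for n < 4, and D is undefined when S = 0.
        raise ValueError("need n >= 4 and at least one segregating site")
    s = num_segregating
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Numerator: difference between two estimators of theta (pi and S / a1).
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

With n = 20 and S = 10, a value of pi below S / a1 gives a negative D (excess of rare variants) and a value above it gives a positive D.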

Selective sweep:

Extended haplotype homozygosity (Sabeti et al. 2002)

McDonald-Kreitman and related tests. dN = replacement changes per replacement site; dS = silent changes per silent site. dN/dS = 1: neutral. dN/dS < 1: conserved (purifying selection). dN/dS > 1: adaptive evolution (positive selection).
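The dN/dS definitions above can be sketched in a few lines. This is only an illustration using raw proportions, with no correction for multiple hits and no statistical test (programs such as PAML estimate the ratio by maximum likelihood). The function names and the `tolerance` parameter are assumptions made for the example.

```python
def dn_ds(replacement_changes: int, replacement_sites: float,
          silent_changes: int, silent_sites: float) -> float:
    """Ratio of replacement changes per replacement site to silent changes per silent site."""
    dn = replacement_changes / replacement_sites
    ds = silent_changes / silent_sites
    if ds == 0:
        raise ValueError("dS is zero, so the ratio is undefined")
    return dn / ds


def interpret(omega: float, tolerance: float = 0.0) -> str:
    """Map a dN/dS ratio onto the three cases listed on the slide.

    tolerance is a hypothetical band around 1 that is treated as neutral.
    """
    if omega > 1 + tolerance:
        return "adaptive evolution (positive selection)"
    if omega < 1 - tolerance:
        return "conserved (purifying selection)"
    return "neutral"
```

For example, 5 replacement changes over 100 replacement sites and 10 silent changes over 50 silent sites give dN = 0.05 and dS = 0.2, a ratio well below 1.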

Selection on phenotypic traits: QTL Association analysis Candidate genes

Genome scans (aka ‘Outlier analysis’)

Littorina saxatilis: locally adapted morphs 'H' and 'M' at Thornwick Bay. What signatures of selection might we look for? NB divergent selection (sort of equivalent to RI).

Signatures of selection: departure from HWE; low diversity (selective sweep); frequency spectrum tests; high divergence; elevated proportion of non-synonymous substitutions; LD.

Neutral loci

Stabilizing selection

Local adaptation

Charlesworth et al. 1997 (from Nosil et al. 2009)

A concrete example: adaptation to altitude in Rana temporaria (Bonin et al. 2006). Three altitude categories: high (2000 m), intermediate (1000 m), low (400 m). 190 individuals, 392 AFLP bands.

Generating the expected distribution. DetSel (Vitalis et al. 2001): parameters N0, N1, N2, t, μ, t0; F1 and F2 measure the divergence of population 1 from population 2 and of population 2 from population 1. Dfdist (Beaumont & Nichols 1996): parameters N, m; FST is a symmetrical measure of population differentiation, considered as a function of heterozygosity. Separation of timescales argument… Does the structure/history matter?
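To make the "FST as a function of heterozygosity" idea concrete, here is a minimal sketch that computes both quantities for one biallelic locus in two populations of equal sample size. It is not Dfdist itself, which builds the expected distribution by simulation. The function name and the simple (HT - HS) / HT estimator are assumptions chosen for illustration.

```python
from typing import Optional, Tuple


def fst_and_heterozygosity(p1: float, p2: float) -> Tuple[Optional[float], float]:
    """Return (FST, total heterozygosity) for one biallelic locus.

    p1, p2 -- frequency of the same allele in population 1 and population 2.
    FST is None when the locus is monomorphic overall (HT = 0).
    """
    p_bar = (p1 + p2) / 2.0
    # Mean expected heterozygosity within populations.
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    # Expected heterozygosity of the pooled population.
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return None, 0.0
    return (h_t - h_s) / h_t, h_t
```

Plotting FST against heterozygosity for every locus, and overlaying quantiles from a simulated neutral distribution, gives the kind of outlier plot these programs produce.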

[Figure: DetSel (95% CI) and Dfdist (95%, 50% and 5% quantiles) outlier plots for the 'Low 1' vs 'High 1' comparison.]

Outlier loci by method (392 AFLPs; 12 pairwise comparisons across altitude, or 3 altitude categories; 95% cut-off). Columns in the original table: DetSel, Dfdist, Both, Interpretation.
Monomorphic in one population: 35, N/A (unreliable outliers).
Significant in one comparison: 14, 29 (false positives).
Significant in comparisons involving one population: 3, 11 (local effects).
Significant in at least 2 comparisons: 2, 1 (adaptation to altitude).
Significant in global comparison across altitudes: 6 (2 at 99%).

343 loci 8 loci

Outliers and selected traits. Rogers and Bernatchez (2007), Coregonus clupeaformis (lake whitefish): Dwarf x Normal cross, then both backcrosses. Measure 'adaptive' traits (9). QTL map (>400 AFLP plus microsatellites). Homologous AFLP in 4 natural sympatric population pairs. Outlier analysis (forward simulation based on Winkle).

Cross | Homologous AFLP | Outlier AFLP in homologous set* | Outlier within QTL (based on 1.5 LOD support)
Hybrid x Dwarf | 180 | 19 | 9 (3.6 expected, P=0.0015)
Hybrid x Normal | 131 | 8 | 4 (0.5 expected, P=0.0002)

Expectation based on overall proportion of AFLP associated with QTL. *Only 3 outliers shared between lakes.
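The comparison of observed and expected outliers within QTL can be framed as an enrichment test. The slide does not say which test produced the reported P values, so the one-sided binomial tail below is only one plausible choice, shown as a sketch. The function name is an assumption.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


# Hybrid x Dwarf row: 9 of 19 outliers fall within QTL, 3.6 expected by chance.
# The per-locus probability is taken as expected / total (an assumption).
p_qtl = 3.6 / 19
p_value = binomial_upper_tail(9, 19, p_qtl)
```

A result from this sketch need not match the published P values exactly, since the original analysis may have used a different test or a different null proportion.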


Nosil et al. 2009 review of 14 studies: 0.5–26% outliers (most studies 5–10%); 1–5% outliers replicated in pair-wise comparisons; 25–100% of outliers specific to habitat comparisons; no consistent pattern for EST-associated loci; LD among outliers typically low. But there are many methodological differences between studies: population sampling, marker type, analysis type and options, statistical cut-offs.

Environmental correlations. SAM (Joost et al. 2007). IBA (Nosil et al. 2007): FST for each locus is correlated with 'adaptive distance', controlling for geographic distance (partial Mantel test).
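A partial Mantel test of this kind can be sketched in plain Python: correlate the FST and adaptive distance matrices while partialling out geographic distance, then assess significance by permuting the populations in one matrix. All function names, the default of 999 permutations and the one-sided P value are assumptions for illustration, not the settings used in the cited studies.

```python
import random
from math import sqrt


def _upper(m):
    """Flatten the upper triangle of a square distance matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def _partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = _pearson(x, y), _pearson(x, z), _pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def partial_mantel(fst, adaptive, geographic, permutations=999, seed=0):
    """Return (partial r, one-sided permutation P value)."""
    rng = random.Random(seed)
    n = len(fst)
    y, z = _upper(adaptive), _upper(geographic)
    observed = _partial_r(_upper(fst), y, z)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        # Permute rows and columns of the FST matrix together.
        permuted = [[fst[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if _partial_r(_upper(permuted), y, z) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)
```

Each argument is a symmetric square matrix (list of lists) with populations in the same order. Running the test once per locus yields a per-locus measure of association with the environment.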

Methodological improvements: Bayesian approaches. BayesFst (Beaumont & Balding 2004); Bayescan (Foll & Gaggiotti 2008). For each locus i and population j we have an FST measure relative to the 'ancestral' population, $F_{ij}$. This is decomposed into locus and population components: $\log\left(F_{ij}/(1 - F_{ij})\right) = \alpha_i + \beta_j$. $\alpha_i$ is the locus effect: 0 = neutral, positive = divergent selection, negative = balancing selection. $\beta_j$ is the population effect. Assuming a Dirichlet distribution of allele frequencies among subpopulations, $\alpha_i$ and $\beta_j$ can be estimated by MCMC. Bayescan also explicitly tests $\alpha_i = 0$.
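The logistic decomposition can be illustrated by mapping locus and population effects back to FST. This sketch shows only the link function, not the MCMC estimation performed by BayesFst or Bayescan. The function names are assumptions.

```python
from math import exp, log


def logit(f: float) -> float:
    """log(F / (1 - F)) for 0 < F < 1."""
    return log(f / (1.0 - f))


def fst_from_effects(alpha_i: float, beta_j: float) -> float:
    """Invert the decomposition: F_ij = 1 / (1 + exp(-(alpha_i + beta_j)))."""
    return 1.0 / (1.0 + exp(-(alpha_i + beta_j)))


# With the same population effect, a positive locus effect raises F_ij
# and a negative locus effect lowers it relative to a neutral locus.
beta = -2.0
neutral = fst_from_effects(0.0, beta)
divergent = fst_from_effects(1.5, beta)
balancing = fst_from_effects(-1.5, beta)
```

Working on the logit scale keeps every implied F_ij between 0 and 1 whatever values the two effects take.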

Apparently much greater power to detect balancing selection than FDIST. Lower false positive rate. Wider applicability.

Methodological improvements – hierarchical structure Arlequin – Excoffier et al. 2009

Circles – simulated STR data, grey – null distribution

Bayenv (Coop et al. 2010): estimates the variance-covariance matrix of allele frequencies, then tests for correlations with environmental variables (or categories). Software available at: http://www.eve.ucdavis.edu/gmcoop/Software/Bayenv/Bayenv.html Multiple analyses? Candidate vs control? E.g. Shimada et al. 2010.

Hohenlohe et al. 2010

Mäkinen et al. 2008: 7 populations (3 marine, 4 freshwater), 103 STR loci, analysed by BayesFst (and LnRH). 5 loci under directional selection (3 in the Eda locus), 15 under balancing selection. Used as a test case by Excoffier et al.: 2 directional, 3 balancing.

Can we replicate these results? Bayescan: Stickleback_allele.txt is the input file; view Output_fst.txt with the R routine plot_Bayescan. Arlequin: Stickleback_data_standard.arp (IAM) and Stickleback_data_repeat.arp (SMM); run using Arlequin3.5. Try hierarchical and island models, maybe different hierarchies.

Sympatric speciation? FST distribution as evidence of speciation with gene flow: Savolainen et al. (2006), Howea palms. Cf. Gavrilets and Vose (2007): few loci underlying key traits, intermediate selection, initial environmental effect on phenology.