Genome-wide association mapping Introduction to theory and methodology


1 Genome-wide association mapping Introduction to theory and methodology
Aaron Lorenz, Department of Agronomy and Horticulture

2 GWAS – Genome-wide Association Study
Big subject
Lots of methods and software packages
Lots of considerations for handling data
We have some data to analyze
75 minutes

3 Slide credit: Mike Gore

4 Goal
Find genes contributing to variation in phenotypes of interest

5 Approaches to mapping genes
Yu and Buckler, 2006

6 Germplasm Biometris

7 Germplasm
Any genetically diverse natural or artificial population can be used.
Examples:
71 elite European maize inbred lines (Andersen et al., 2005)
Diverse panel of 288 maize lines (Harjes et al., 2008)
Diverse panel of 191 Arabidopsis lines (stock center accessions and individuals sampled from the wild; Atwell et al., 2010)
915 dogs from 80 domestic breeds, 83 wild canids, and 10 outbred African shelter dogs

8 Linkage disequilibrium (LD)
The non-random association of alleles between loci.
The extent of LD over physical distance determines the marker density needed.
r2 is a common statistic used to quantify LD.
D′ is a normalized value of D.
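The LD statistics named in this deck (D, D′ and r2) can be computed from counts of the four two-locus haplotypes. The following is a minimal Python sketch using the standard textbook definitions; the function name `ld_stats` and the example counts are illustrative assumptions, not taken from the presentation.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r2) from counts of the four two-locus haplotypes."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n          # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n          # frequency of allele B at locus 2
    D = p_AB - p_A * p_B             # raw disequilibrium coefficient
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        # At least one locus is monomorphic, so LD is undefined; report zeros.
        return D, 0.0, 0.0
    r2 = D * D / denom               # squared allelic correlation
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(D) / d_max         # D scaled by its maximum possible value
    return D, d_prime, r2

# Hypothetical haplotype counts: AB=40, Ab=10, aB=10, ab=40
D, d_prime, r2 = ld_stats(40, 10, 10, 40)
print(round(D, 3), round(d_prime, 3), round(r2, 3))
```

D′ reaches 1 whenever at least one of the four haplotypes is absent, whereas r2 reaches 1 only when two haplotypes are absent, which is why r2 is the more useful guide to marker density.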

9 LD decay in bi-parental linkage mapping populations
Slide credit: Peter Bradbury

10 Plots of LD across the Maize d3 Gene (Remington et al., 2001).
r2 above diagonal, D′ below diagonal. Note that LD drops to nearly 0 within 500 base pairs.
(A) Patterns of pair-wise LD between common polymorphisms with minor-allele frequencies of >0.05 in the d3 gene. LD estimates r2 and D′ are plotted for each pair-wise comparison, with D′ below the diagonal and r2 above it.
(B) A plot of r2 versus distance (in bp) between pairs of sites. The line fitted to the data minimizes the sum of the squared differences between r2 and its expected value, assuming recombination scales with physical distance.
Maize has 3e9 base pairs. Assuming a marker is needed every 300 base pairs, 10,000,000 markers are needed.
Gaut B. S., Long A. D. Plant Cell 2010:15: Copyright © American Society of Plant Biologists. All rights reserved.

11 Extensive LD in barley of the Upper Midwest
Total length of barley consensus map = 1100 cM. Assume a marker is needed only every 8 cM. 1100/8 ≈ 138 markers needed (assuming they are evenly spaced).
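The marker-density arithmetic for the maize and barley examples can be checked with a few lines of Python. This is only a sketch of the back-of-the-envelope calculation; the helper name `markers_needed` is an illustrative assumption, and evenly spaced markers are assumed as on the slide.

```python
import math

def markers_needed(genome_length, spacing):
    """Number of evenly spaced markers needed to place one every `spacing` units."""
    return math.ceil(genome_length / spacing)

# Maize: 3e9 bp genome, LD decays within a few hundred bp, one marker per 300 bp
maize = markers_needed(3e9, 300)

# Upper Midwest barley: 1100 cM consensus map, one marker per 8 cM
barley = markers_needed(1100, 8)

print(maize, barley)
```

The contrast of roughly ten million markers versus about 138 shows how strongly the extent of LD in the germplasm drives genotyping cost.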

12 Toy example
500 random individuals from a population were phenotyped and genotyped.
Genotypes were scored for one marker linked to a candidate gene.
Individuals scored as A1A1 = 0, A1A2 = 1, A2A2 = 2.
[Figure: phenotypic value plotted against marker genotype score]

13 R: lm function
Fits a linear model with normal errors and constant variance; generally used for regression analysis with continuous explanatory variables.
Simple linear regression: lm(y ~ x)
See riceGwasEmma.r
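The single-marker test in the toy example is an ordinary least-squares regression of phenotype on allele dosage. The sketch below reproduces what lm(y ~ x) estimates, written in plain Python rather than R so that it runs without extra packages. The data are simulated: the allele frequency (0.5), intercept (10), marker effect (2) and residual standard deviation (1) are all assumptions made for illustration, and `simple_lm` is a hypothetical helper name.

```python
import math
import random

def simple_lm(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, standard error of slope, t statistic).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    t = slope / se if se > 0 else float("inf")
    return intercept, slope, se, t

# Simulate 500 individuals scored 0, 1 or 2 for the number of A2 alleles.
rng = random.Random(1)
dosage = [(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(500)]
pheno = [10.0 + 2.0 * g + rng.gauss(0.0, 1.0) for g in dosage]

intercept, slope, se, t_stat = simple_lm(dosage, pheno)
print(f"intercept={intercept:.2f} slope={slope:.2f} se={se:.3f} t={t_stat:.1f}")
```

The slope is the estimated additive effect of substituting one A2 allele; in a genome scan the same fit is repeated for every marker and the t statistics (or p-values) are what get plotted.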

14 Population structure
Nearly always present in association mapping panels.
Causes spurious associations if not accounted for.
Extreme example:
[Figure: gametes sampled from two subpopulations; only AB and ab gametes appear]
Within each of these subpopulations, the Ab and aB gametes never occur. When the subpopulations are combined into one population and LD is calculated, D = freq(AB) – freq(A)*freq(B) = 0.25 (taking AB and ab as equally frequent), so the two loci are in complete LD regardless of their physical linkage.
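The extreme example can be worked through numerically. The sketch below assumes, for illustration, that one subpopulation carries only AB gametes and the other only ab gametes, in equal numbers; the sample sizes and the helper name `ld_d` are assumptions, not values from the slide.

```python
def ld_d(gametes):
    """D = freq(AB) - freq(A) * freq(B) for a list of two-letter gametes."""
    n = len(gametes)
    f_AB = sum(1 for g in gametes if g == "AB") / n
    f_A = sum(1 for g in gametes if g[0] == "A") / n
    f_B = sum(1 for g in gametes if g[1] == "B") / n
    return f_AB - f_A * f_B

# Assumed subpopulations: each is fixed for a different gamete type.
subpop1 = ["AB"] * 10
subpop2 = ["ab"] * 10

d_sub1 = ld_d(subpop1)               # 1.0 - 1.0 * 1.0
d_sub2 = ld_d(subpop2)               # 0.0 - 0.0 * 0.0
d_pooled = ld_d(subpop1 + subpop2)   # 0.5 - 0.5 * 0.5

print(d_sub1, d_sub2, d_pooled)
```

There is no LD within either subpopulation, yet pooling them produces the maximum possible D for these allele frequencies. Any marker whose frequency differs between subpopulations will therefore appear associated with any trait that also differs between them.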

15 Model population structure
Subpop membership and effect
Marker allele dosage and effect
Matrix notation
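One common way to write a fixed-effects model with the two components named on this slide is sketched below. The symbols are assumptions chosen for illustration and may differ from the notation on the original slide: y is the phenotype, q_ik indicates (or gives the fraction of) membership of individual i in subpopulation k, v_k is the subpopulation effect, x_i is the marker allele dosage, β is the marker effect, and e is the residual.

```latex
% Scalar form for individual i
y_i = \sum_{k} q_{ik} v_k + x_i \beta + e_i

% Matrix form for all individuals
\mathbf{y} = \mathbf{Q}\mathbf{v} + \mathbf{x}\beta + \mathbf{e},
\qquad \mathbf{e} \sim N(\mathbf{0}, \sigma^2_e \mathbf{I})
```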

16 Illustration
3 subpopulations, 2 markers, 10 individuals
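A design of this size can be written out explicitly. The sketch below builds a subpopulation incidence matrix Q (10 x 3) and a marker dosage matrix X (10 x 2) and evaluates the fitted values Qv + Xb. Every number in it (memberships, dosages, and effects) is hypothetical and chosen only to show the shape of the matrices.

```python
# Hypothetical subpopulation membership for 10 individuals (codes 0, 1, 2).
subpop = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]

# Q: one row per individual, one indicator column per subpopulation.
Q = [[1 if s == k else 0 for k in range(3)] for s in subpop]

# X: hypothetical allele dosages (0, 1, 2) at 2 markers.
X = [
    [0, 2], [1, 2], [0, 1],
    [2, 0], [2, 1], [1, 0], [2, 0],
    [1, 1], [0, 1], [1, 2],
]

v = [5.0, 7.0, 6.0]   # hypothetical subpopulation effects
b = [0.5, -1.0]       # hypothetical marker effects

# Fitted value for each individual: its subpopulation effect plus marker effects.
fitted = [
    sum(q * vk for q, vk in zip(q_row, v)) + sum(x * bj for x, bj in zip(x_row, b))
    for q_row, x_row in zip(Q, X)
]

for q_row, x_row, f in zip(Q, X, fitted):
    print(q_row, x_row, f)
```

Because every row of Q sums to one, Q takes the place of an intercept; in practice markers are usually tested one at a time, so X would hold a single column per fit.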

17 Population structure and differential relatedness (or family structure)
Yu and Buckler, 2006

18 Mixed-linear model to account for family structure
Polygenic effect (random)
K = kinship matrix, normally calculated with genome-wide markers
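A standard way to write a mixed linear model with a random polygenic effect is sketched below. The notation is an assumption for illustration and may not match the original slide: Xβ collects the fixed effects (marker and, if used, population structure), u is the random polygenic effect with covariance proportional to the kinship matrix K, Z maps individuals to observations, and e is the residual.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{e},
\qquad
\mathbf{u} \sim N(\mathbf{0}, \sigma^2_g \mathbf{K}),
\qquad
\mathbf{e} \sim N(\mathbf{0}, \sigma^2_e \mathbf{I})

\operatorname{Var}(\mathbf{y}) = \mathbf{V}
  = \sigma^2_g \mathbf{Z}\mathbf{K}\mathbf{Z}^{\top} + \sigma^2_e \mathbf{I}
```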

19 Efficient Mixed-Model Association (EMMA)
Uses eigenvalue decomposition to solve the mixed-model equations more efficiently. (Taking the direct inverse of the covariance matrix is computationally intensive and should be avoided in GWAS.)
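The reason an eigenvalue decomposition helps can be seen from the following identity, shown here as a simplified sketch. It assumes one observation per individual (Z = I) and uses assumed symbols: U holds the eigenvectors of K and d_i its eigenvalues. The published EMMA algorithm applies a related decomposition within restricted maximum likelihood; this block shows only the underlying idea.

```latex
\mathbf{K} = \mathbf{U}\,\operatorname{diag}(d_1,\dots,d_n)\,\mathbf{U}^{\top}
\;\;\Longrightarrow\;\;
\mathbf{V} = \sigma^2_g \mathbf{K} + \sigma^2_e \mathbf{I}
  = \mathbf{U}\,\operatorname{diag}(\sigma^2_g d_i + \sigma^2_e)\,\mathbf{U}^{\top}

\mathbf{V}^{-1}
  = \mathbf{U}\,\operatorname{diag}\!\left(\frac{1}{\sigma^2_g d_i + \sigma^2_e}\right)\mathbf{U}^{\top}
```

K is decomposed once; after that, V can be inverted for any trial value of the variance components by taking n scalar reciprocals instead of performing a new matrix inversion.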

20 Options for modeling structure and kinship [see Price et al. (2010)]
Inferring and modeling structure
  Use knowledge on subpop membership directly
  Subpopulation clustering (explicitly infer ancestry)
    STRUCTURE
    ADMIXTURE
  Principal component analysis
    Use top PCs as covariates to correct for pop structure
    Related approach is multi-dimensional scaling (MDS)
Inferring kinship
  Marker similarity matrix
  Realized genomic additive relationship matrix
  Pedigree additive relationship matrix
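As an illustration of one kinship option, the sketch below computes a realized genomic additive relationship matrix from allele dosages using a widely used formulation (centering each marker by twice its allele frequency and scaling by 2Σp(1−p)). The choice of this particular formula, the function name `genomic_relationship`, and the tiny genotype table are assumptions for illustration; the slide does not specify which estimator was used.

```python
def genomic_relationship(genotypes):
    """Realized additive relationship matrix from 0/1/2 allele dosages.

    genotypes: list of individuals, each a list of dosages, one per marker.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency at each marker, estimated from the sample itself.
    p = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    # Center dosages by their expectation, 2p.
    Z = [[ind[j] - 2 * p[j] for j in range(m)] for ind in genotypes]
    # K = Z Z' / (2 * sum of p(1-p))
    return [[sum(zi[j] * zk[j] for j in range(m)) / scale for zk in Z] for zi in Z]

# Hypothetical panel: 4 individuals, 3 markers. Individuals 0 and 1 are identical.
geno = [
    [0, 2, 1],
    [0, 2, 1],
    [2, 0, 1],
    [1, 1, 0],
]
K = genomic_relationship(geno)
for row in K:
    print([round(val, 2) for val in row])
```

Individuals with identical genotypes get the same value off the diagonal as on it, while individuals carrying opposite alleles get negative values, meaning they are less related than a random pair from the panel.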

21 Efficient Mixed-Model Association (EMMA)
See riceGwasEmma.r

22 Manhattan plot
See riceGwasEmma.r

23 Statistical threshold: Correcting for multiple testing
[Figure: candidate positions for the significance threshold, each marked "Here?"]

24 Statistical threshold: Correcting for multiple testing
Bonferroni correction
  alphaC ≈ alphaE / test#
  Assumes independent tests
  Too conservative
Permutation testing
  Good for linkage mapping
  Generally not valid for GWAS because family structure is not preserved
False-discovery rate (Benjamini and Hochberg, 1995)
  Calculate expected proportion of declared QTL that are false positives
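Two of these corrections are simple enough to sketch in a few lines of Python: the Bonferroni comparison-wise threshold (alphaC ≈ alphaE / test#) and the Benjamini-Hochberg step-up procedure for the false-discovery rate. The function names and the ten example p-values are illustrative assumptions, not output from the presentation's data.

```python
def bonferroni_threshold(alpha_experiment, n_tests):
    """Comparison-wise threshold: alphaC = alphaE / number of tests."""
    return alpha_experiment / n_tests

def benjamini_hochberg(p_values, fdr=0.05):
    """Indices of tests declared significant by the BH step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value falls under its own threshold.
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# Hypothetical p-values from 10 marker tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

alpha_c = bonferroni_threshold(0.05, len(pvals))
bonf_hits = [i for i, p in enumerate(pvals) if p <= alpha_c]
bh_hits = benjamini_hochberg(pvals, fdr=0.05)

print(alpha_c, bonf_hits, bh_hits)
```

With these values Bonferroni declares only the first marker significant, while the FDR procedure declares the first two, illustrating why Bonferroni is described as conservative.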

25 Calculate effective number of tests

26 Other software packages to implement linear models for GWAS
TASSEL, PLINK, EIGENSTRAT, EMMAX, GAPIT, GenABEL, GWASTools, FaST-LMM

