Genome-wide genetic association of complex traits in outbred mice
William Valdar, Leah C. Solberg, Dominique Gauguier, Stephanie Burnett, Paul Klenerman, William O. Cookson, Martin Taylor, J. Nicholas P. Rawlins, Richard Mott, Jonathan Flint.

Genetic Traits
Quantitative (height, weight)
Dichotomous (affected/unaffected)
Factorial (blood group)
Mendelian: controlled by a single gene (cystic fibrosis)
Complex: controlled by multiple genes × environment (diabetes, asthma)

Quantitative Trait Loci
QTL: Quantitative Trait Locus
QTG: Quantitative Trait Gene
QTN: Quantitative Trait Nucleotide
[Diagram: chromosome, genes]

Association Studies: Map in Humans or Animal Models?
Humans
–Disease studied directly
–Population and environment stratification
–Very many SNPs (1,000,000?) required
–Hard to detect trait loci: very large sample sizes (5,000-10,000) required to detect loci of small effect
–Potentially very high mapping resolution: single gene
–Very expensive
Animal models
–Animal model required
–Population and environment controlled
–Fewer SNPs required (~100-10,000)
–Easy to detect QTL with ~500 animals
–Poorer mapping resolution: 1 Mb (10 genes)
–Relatively inexpensive

Mosaic Crosses
Inbred founders
Mixing: F2, diallel
Chopping up: Heterogeneous Stock, Advanced Intercross, Random Outbreds
Inbreeding: Recombinant Inbred Lines
[Diagram generation labels: G3, GN, F20]

Sizes of Behavioural QTL in rodents (% of total phenotypic variance)

Effect size of cloned genes

Mapping Resolution
F2 crosses
–Powerful at detecting QTL
–Poor at localisation: 20 cM
–Too few recombinants
Increase number of recombinants:
–more animals
–more generations in cross

Heterogeneous Stocks
Cross 8 inbred strains for >10 generations
[Diagram label: 0.25 cM]

Multiple Phenotype QTL Experiment

Multiple Phenotypes measured on a Heterogeneous Stock
2000 HS mice (Northport, Bob Hitzeman)
84 families
40th generation
150 traits measured on each animal
–Standardised phenotyping protocol
–Covariates recorded: experimenter, time/date, litter
–Microchipping

Phenotypes
Anxiety (Conditioned and Unconditioned Tests)
Asthma (Plethysmography)
Diabetes (Glucose Tolerance Test)
Haematology
Immunology
Biochemistry
Wound Healing (Ear Punch)
Gene Expression
...others...

High throughput phenotyping facility

Neophobia

Fear Potentiated Startle

Ovalbumin sensitization

Plethysmograph

Intraperitoneal Glucose Tolerance Test

Genotyping
15360 SNPs genotyped by Illumina
–2000 HS mice
–300 HS parents
–8 inbred HS founders
–500 other inbreds
www.well.ox.ac.uk/mouse/snp.selector
13459 SNPs successful
99.8% accuracy (parent-offspring)

Distribution of Marker Spacing (chromosome X) (9 Markers)

LD Decay with distance
99.2% of marker pairs on different autosomes have $R^2 < 0.05$.

Genetic Drift in HS
40 generations of breeding
Allele frequency in founders will drift
8% of genome fixed
Allele Frequency in Founders | Allele Frequency in HS
12.5 | 14.99
25 | 23.23
37.5 | 29.77
50 | 31.45

Analysis
Automated analysis pipeline
–R HAPPY package
–Single Marker Association
Each phenotype analysed independently
–Transformed to Normality, outliers removed
–Tailored set of covariates
–Linear models for most phenotypes
–Survival models for latency phenotypes

Twisted Pair Analysis of Heterogeneous Stock
Want to predict ancestral strain from genotype
We know the alleles in the founder strains
Single marker association lacks power, can’t distinguish all strains
Multipoint analysis: combine data from neighbouring markers
[Diagram: chromosome with markers and their alleles]

Twisted Pair Analysis of Heterogeneous Stock
Hidden Markov model HAPPY
–Hidden states = ancestral strains
–Observed states = genotypes
Unknown phase of genotypes
–Analyse both chromosomes simultaneously
–Twisted pair of HMMs
Mott et al 2000 PNAS
[Diagram: chromosome with markers and their alleles]
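The idea of treating ancestral strains as hidden states can be illustrated with a much-reduced sketch. This is not the HAPPY implementation: it assumes a single haploid chromosome (so no phase problem and no twisted pair), a uniform recombination probability between adjacent markers, and a fixed genotyping error rate. The function name, parameters, and toy data are all hypothetical.

```python
def founder_posteriors(observed, founder_alleles, recomb=0.05, error=0.01):
    """Posterior probability of each founder strain at each marker.

    observed[m]           allele seen at marker m (0 or 1)
    founder_alleles[m][s] allele carried by founder strain s at marker m
    Assumption: haploid chromosome, same recombination rate everywhere.
    """
    n_markers = len(observed)
    n_strains = len(founder_alleles[0])

    def emit(m, s):
        # Probability of the observed allele if the segment came from strain s.
        return 1.0 - error if founder_alleles[m][s] == observed[m] else error

    def trans(a, b):
        # On recombination, jump to a uniformly chosen strain (possibly the same).
        jump = recomb / n_strains
        return (1.0 - recomb) + jump if a == b else jump

    # Forward pass, rescaled at each marker to avoid underflow.
    fwd = [[emit(0, s) / n_strains for s in range(n_strains)]]
    for m in range(1, n_markers):
        prev = fwd[-1]
        row = [emit(m, b) * sum(prev[a] * trans(a, b) for a in range(n_strains))
               for b in range(n_strains)]
        total = sum(row)
        fwd.append([x / total for x in row])

    # Backward pass, also rescaled.
    bwd = [[1.0] * n_strains for _ in range(n_markers)]
    for m in range(n_markers - 2, -1, -1):
        row = [sum(trans(a, b) * emit(m + 1, b) * bwd[m + 1][b]
                   for b in range(n_strains))
               for a in range(n_strains)]
        total = sum(row)
        bwd[m] = [x / total for x in row]

    # Combine and normalise to get per-marker posteriors.
    posteriors = []
    for m in range(n_markers):
        row = [f * b for f, b in zip(fwd[m], bwd[m])]
        total = sum(row)
        posteriors.append([x / total for x in row])
    return posteriors


# Toy data: 3 founder strains, 4 biallelic markers; the observed
# haplotype is identical to strain 2 at every marker.
founder_alleles = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 0, 1],
    [0, 1, 0],
]
observed = [1, 1, 1, 0]
posteriors = founder_posteriors(observed, founder_alleles)
```

In the toy data, markers 1 to 3 are each shared by two strains, so no single marker there identifies the founder; combining neighbouring markers is what resolves the ancestry, which is the motivation for the multipoint approach.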

Testing for a QTL
$p_{iL}(s,t)$ = Prob(animal $i$ is descended from strains $s,t$ at locus $L$)
$p_{iL}(s,t)$ calculated by HMM using
–genotype data
–founder strains’ alleles
Phenotype is modelled
$E(y_i) = \sum_{s,t} p_{iL}(s,t)\,T(s,t) + \mu_i$
$\mathrm{Var}(y_i) = \sigma^2$
Test for no QTL at locus $L$
–$H_0$: $T(s,t)$ are all the same
–ANOVA partial F test
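A partial F test of this kind can be sketched as a comparison of two nested least-squares fits. The sketch below makes several simplifying assumptions that are not from the slides: the strain-pair probabilities are collapsed to an additive expected founder dosage per animal, the null model is an intercept only (no covariates), and the data are simulated. All names are hypothetical.

```python
import numpy as np


def rss_and_rank(X, y):
    """Residual sum of squares and column rank of a least-squares fit."""
    coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid), int(rank)


def partial_f(X_null, X_full, y):
    """F statistic for the extra columns of X_full over nested X_null."""
    rss0, k0 = rss_and_rank(X_null, y)
    rss1, k1 = rss_and_rank(X_full, y)
    df_num = k1 - k0          # extra free parameters in the full model
    df_den = len(y) - k1      # residual degrees of freedom
    f_stat = ((rss0 - rss1) / df_num) / (rss1 / df_den)
    return f_stat, df_num, df_den


# Simulated example: 400 animals, 4 founder strains.
rng = np.random.default_rng(0)
n_animals, n_strains = 400, 4
# Expected dosage of each founder = sum of two per-chromosome
# ancestry probability vectors (each row sums to 2).
dosage = (rng.dirichlet(np.ones(n_strains), size=n_animals)
          + rng.dirichlet(np.ones(n_strains), size=n_animals))
# Strain 0 carries an allele that raises the phenotype.
y = 3.0 * dosage[:, 0] + rng.normal(size=n_animals)

X_null = np.ones((n_animals, 1))
X_full = np.column_stack([X_null, dosage])
f_stat, df_num, df_den = partial_f(X_null, X_full, y)
```

Because the dosages sum to a constant, the full design matrix is rank deficient; using the fitted rank rather than the column count gives 3 numerator degrees of freedom for 4 strains.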

Genome Scan
Additive and dominance models
Record all peaks that exceed 5% genome-wide significance
–Threshold based on 200 permutations
–9000 preliminary candidate QTL found
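A genome-wide permutation threshold can be sketched as follows: shuffle the phenotype, rescan the genome, keep the largest statistic, and take the 95th percentile of those maxima. This sketch uses a squared correlation at each marker as a stand-in for the F test used in the study, ignores covariates and family structure, and runs on simulated genotypes; the function names are hypothetical.

```python
import numpy as np


def association_scan(genotypes, y):
    """Squared correlation between the phenotype and each marker column."""
    g = genotypes - genotypes.mean(axis=0)
    yc = y - y.mean()
    num = (g * yc[:, None]).sum(axis=0) ** 2
    den = (g ** 2).sum(axis=0) * (yc ** 2).sum()
    return num / den


def permutation_threshold(genotypes, y, n_perm=200, alpha=0.05, seed=0):
    """Genome-wide threshold: (1 - alpha) quantile of per-permutation maxima."""
    rng = np.random.default_rng(seed)
    maxima = [association_scan(genotypes, rng.permutation(y)).max()
              for _ in range(n_perm)]
    return float(np.quantile(maxima, 1.0 - alpha))


# Simulated example: 300 animals, 50 markers, one true effect at marker 7.
rng = np.random.default_rng(1)
genotypes = rng.integers(0, 3, size=(300, 50)).astype(float)
y = genotypes[:, 7] + rng.normal(size=300)

scan = association_scan(genotypes, y)
threshold = permutation_threshold(genotypes, y, n_perm=200)
significant = np.flatnonzero(scan > threshold)
```

Taking the maximum over all markers within each permutation is what makes the threshold genome-wide rather than per-marker.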

Results

Many peaks: mean red cell volume

How to select peaks: a simulated example

Simulate 7 x 5% QTLs (ie, 35% genetic effect) + 20% shared environment effect + 45% noise = 100% variance

Simulated example: 1D scan

Peaks from 1D scan phenotype ~ covariates + ?

1D scan: condition on 1 peak phenotype ~ covariates + peak 1 + ?

1D scan: condition on 2 peaks phenotype ~ covariates + peak 1 + peak 2 + ?

1D scan: condition on 3 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + ?

1D scan: condition on 4 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + ?

1D scan: condition on 5 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + ?

1D scan: condition on 6 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + ?

1D scan: condition on 7 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + ?

1D scan: condition on 8 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + ?

1D scan: condition on 9 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + peak 9 + ?

1D scan: condition on 10 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + peak 9 + peak 10 + ?

1D scan: condition on 11 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + peak 9 + peak 10 + peak 11 + ?

Peaks chosen by forward selection
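The conditioning sequence shown on the preceding slides (scan, add the best peak to the model, rescan) can be sketched as a simple forward selection loop. This is an illustration under assumptions, not the study's pipeline: each locus is represented by a single genotype column rather than founder probabilities, the stopping rule is a fixed F cutoff chosen arbitrarily here, and the data are simulated.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of a least-squares fit."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    return float(resid @ resid)


def forward_select(genotypes, y, covariates, f_threshold=15.0, max_peaks=20):
    """Repeatedly add the marker with the largest 1-df F statistic
    conditional on covariates and all previously chosen markers."""
    n = len(y)
    base = covariates
    chosen = []
    current = rss(base, y)
    while len(chosen) < max_peaks:
        best = None
        for j in range(genotypes.shape[1]):
            if j in chosen:
                continue
            X = np.column_stack([base, genotypes[:, j]])
            r = rss(X, y)
            f = (current - r) / (r / (n - X.shape[1]))
            if best is None or f > best[0]:
                best = (f, j, r)
        if best is None or best[0] < f_threshold:
            break  # no remaining marker improves the fit enough
        f, j, r = best
        chosen.append(j)
        base = np.column_stack([base, genotypes[:, j]])
        current = r
    return chosen


# Simulated example: 500 animals, 40 markers, true effects at 3 and 17.
rng = np.random.default_rng(2)
genotypes = rng.integers(0, 3, size=(500, 40)).astype(float)
y = genotypes[:, 3] + genotypes[:, 17] + rng.normal(size=500)
covariates = np.ones((500, 1))  # intercept only
peaks = forward_select(genotypes, y, covariates)
```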

Bootstrap sampling
10 subjects: 1 2 3 4 5 6 7 8 9 10
Bootstrap sample from 10 subjects (sample with replacement): 1 2 2 3 5 5 6 7 7 9

Forward selection on a bootstrap sample

Bootstrap evidence mounts up…

In 700 bootstraps… Bootstrap Posterior Probability (BPP)

Model averaging by bootstrap aggregation
Choosing only one model:
–very data-dependent, arbitrary
–can’t get all the true QTLs in one model
Bootstrap aggregation averages over models
–true QTLs get included more often than false ones
References:
–Broman & Speed (2002)
–Hackett et al (2001)
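The bootstrap posterior probability (BPP) can be sketched as the fraction of bootstrap samples in which a locus is selected. In this sketch the selection procedure is passed in as a function; `toy_select` is a hypothetical stand-in used only to make the example runnable, where the study would run forward selection on each resampled data set.

```python
import random
from collections import Counter


def bootstrap_inclusion(select_peaks, n_subjects, n_boot=700, seed=0):
    """Fraction of bootstrap samples in which each locus is selected.

    select_peaks(sample) takes a list of subject indices (drawn with
    replacement) and returns the loci chosen for that sample.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        sample = rng.choices(range(n_subjects), k=n_subjects)
        counts.update(set(select_peaks(sample)))
    return {locus: c / n_boot for locus, c in counts.items()}


def toy_select(sample):
    """Hypothetical selector: locus A is always found, locus B is
    found only when subject 0 happens to be in the sample."""
    peaks = {"A"}
    if 0 in sample:
        peaks.add("B")
    return peaks


bpp = bootstrap_inclusion(toy_select, n_subjects=10)
```

A locus whose detection depends on a few influential subjects, like B here, ends up with an intermediate BPP, while a robustly detected locus approaches 1.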

Results

We identified 843 QTLs for 97 phenotypes with BPP greater than 0.25, of which, on the basis of simulations, we expect 590 to be genuine.

Performance of multiple QTL modelling
BPP threshold | Number of QTL | Proportion of detected QTLs that are true | Proportion of true QTLs detected | Expected number of false QTLs per genome scan
0.05 | 3127 | 0.26 | 0.90 | 0.47
0.10 | 2119 | 0.41 | 0.90 | 0.43
0.20 | 1105 | 0.63 | 0.89 | 0.32
0.25 | 843 | 0.70 | 0.89 | 0.25
0.30 | 633 | 0.75 | 0.89 | 0.21
0.40 | 364 | 0.82 | 0.89 | 0.17
0.50 | 251 | 0.85 | 0.88 | 0.12
0.75 | 58 | 0.91 | 0.87 | 0.04
0.90 | 30 | 0.91 | 0.85 | 0.03
1.00 | 13 | 0.96 | 0.60 | 0.00
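The figure of about 590 genuine QTLs quoted for the 0.25 threshold appears to follow from multiplying the number of QTLs detected by the proportion of detected QTLs that are true. The short check below applies the same arithmetic to every row; the values are copied from the table, and the variable names are illustrative.

```python
# (BPP threshold, number of QTL, proportion of detected QTLs that are true)
rows = [
    (0.05, 3127, 0.26),
    (0.10, 2119, 0.41),
    (0.20, 1105, 0.63),
    (0.25, 843, 0.70),
    (0.30, 633, 0.75),
    (0.40, 364, 0.82),
    (0.50, 251, 0.85),
    (0.75, 58, 0.91),
    (0.90, 30, 0.91),
    (1.00, 13, 0.96),
]

# Expected number of genuine QTLs at each threshold.
expected_true = {thr: n_qtl * prop_true for thr, n_qtl, prop_true in rows}

for thr, n_qtl, _ in rows:
    print(f"BPP >= {thr:.2f}: {n_qtl} detected, "
          f"about {expected_true[thr]:.0f} expected genuine")
```

At the 0.25 threshold this gives 843 × 0.70 ≈ 590, matching the number stated in the results.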

Where to find the results http://gscan.well.ox.ac.uk/

Distribution of effect sizes

Resolution
[Plot: 95% confidence intervals, in megabases]

Number of Genes per Locus

Results summary
843 peaks found with BPP > 0.25
8.7 peaks per phenotype on average
Based on simulation, we expect ~590 to be genuine
Mean 95% CI width 2.78 Mb
Mean number of genes under each 95% CI is 28.9

Results
~7 jointly significant QTL per phenotype
95% Confidence Interval ~2 Mb
~50% of QTL have a significant non-additive component
Only 3 phenotypes were explained by a single major QTL
–Most phenotypes are complex

Distribution of QTL Effects
Mean effect size 2.7%

%Variance Explained [% Additive Genetic Variance calculated using 3-generation pedigree data, not genotypes]

Coat colour genes

A known QTL: HDL Wang et al, 2003 HS mapping

New QTLs: two examples
Ear Punch Hole Area Regrowth
–wound healing
Cue Conditioning Freeze.During.Tone
–measure of fear

Cue Conditioning Freeze.During.Tone: huge effect, small number of genes
cntn1: Contactin precursor (Neural cell surface protein), chr15

What do we want?
Biological:
–Joint QTL that contain the functional genes and lead to their identification
–But genetic mapping finds the variants, not the genes
Statistical:
–Multi-locus QTL selection algorithms that predict the phenotype of new animals accurately
–Model-Averaging: no best choice?
–Ghost QTL
Are statistical QTL algorithms consistent?
–Do they find the biological QTL given a large enough sample size?
–Simulations of multiple QTL models indicate mapping accuracy declines as complexity increases [Valdar et al 2006 Genetics in press]

Work of many hands
Carmen Arboleda-Hitas, Amarjit Bhomra, Stephanie Burnett, Peter Burns, Richard Copley, Stuart Davidson, Simon Fiddy, Jonathan Flint, Polinka Hernandez, Sue Miller, Richard Mott, Chela Nunez, Gemma Peachey, Sagiv Shifman, Leah Solberg, Amy Taylor, Martin Taylor, William Valdar, Binnaz Yalcin, Dave Bannerman, Shoumo Bhattacharya, Bill Cookson, Rob Deacon, Dominique Gauguier, Doug Higgs, Tertius Hough, Paul Klenerman, Nick Rawlins, Jennifer Taylor, Chris Holmes
Project funded by The Wellcome Trust, UK

Data are publicly available http://gscan.well.ox.ac.uk

Gene x Environment, Gene x Sex
Repeat analysis looking for QTLs that interact with
–Gender
–Litter number
–Season, Month, etc
–Experimenter
Compare models
$E(y) = \mu + \mathrm{locus} + \mathrm{env}$
$E(y) = \mu + \mathrm{locus} * \mathrm{env}$

Gene x Environment
431 jointly significant GxE QTLs
–27 gene x experimenter
–81 gene x litter number
–67 gene x age
–105 gene x study day
–151 gene x season
13% of variation is GxE
25 GxE QTLs overlapped with original joint QTL
–defined as lying within 4 Mb of the peak position
42 GxSex QTLs

Gene Expression Data (with Binnaz Yalcin, Jennifer Taylor)
Illumina 40k chip
Livers, Lungs (Brains)
–190 HS
–HS founders

Phenotype-gene expression correlation
Liver gene expression in 180 HS mice
Slc4a7

Testing for Functional Variants
Is a SNP functional for a trait?
Is a functional assay measured in founders related to a trait?
–Gene expression
–DNA-Protein binding

Testing for non-Functional Variants
Is a SNP’s pattern of variation inconsistent with the QTL’s pattern of action?
Is a functional assay’s distribution inconsistent with the QTL’s pattern of action?

Merge Analysis
Yalcin et al 2005 Genetics
Require sequence of HS founders
–Determine all variants and their strain distribution patterns (SDP)
Don’t genotype every variant in the HS
–Instead predict genotypes in HS at all variants based on a sparse skeleton of genotypes

Merge Analysis
A variant v will partition the HS founder strains into 2 or more groups, depending on its strain distribution pattern (SDP)
If v is functional for the trait then the strain effects at the QTL must be identical for strains with the same allele
–so if merging founders according to v’s SDP destroys significance then we reject v

Merge Analysis Model Comparison
$p_{iL}(s,t)$ = Prob(animal $i$ is descended from strains $s,t$ at locus $L$)
Replace strains $s,t$ by merged pseudo-strains $g,h$
–Add together probabilities for strains with the same allele
–Phenotypic effect of merged strains $g,h$ is $\tau(g,h)$
$v_{iL}(g,h)$ = Prob(animal $i$ is descended from merged strains $g,h$ at locus $L$)
Compare fits of nested models
unmerged: $E(y_i) = \sum_{s,t} p_{iL}(s,t)\,T(s,t) + \mu_i$
merged: $E(y_i) = \sum_{g,h} v_{iL}(g,h)\,\tau(g,h) + \mu_i$
null: $E(y_i) = \mu_i$
Require no significant difference between merged and unmerged models
–and for both to be significant compared to null model
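The merging step, adding together probabilities for strains that share an allele, can be sketched as below. The representation is an assumption made for illustration: strain-pair probabilities for one animal at one locus are held in a dictionary keyed by (s, t), and the strain distribution pattern is a dictionary mapping each founder to its allele. The function name and toy values are hypothetical.

```python
from collections import defaultdict


def merge_probabilities(p, sdp):
    """Collapse strain-pair probabilities to allele-pair probabilities.

    p    maps (strain_s, strain_t) -> probability for one animal at one locus
    sdp  maps strain -> allele at the variant being tested
    Pairs are treated as unordered, so (0, 1) and (1, 0) are pooled.
    """
    v = defaultdict(float)
    for (s, t), prob in p.items():
        pair = tuple(sorted((sdp[s], sdp[t])))
        v[pair] += prob
    return dict(v)


# Toy example: four founders, a variant where A and B carry allele 0
# and C and D carry allele 1.
sdp = {"A": 0, "B": 0, "C": 1, "D": 1}
p = {("A", "B"): 0.5, ("A", "C"): 0.25, ("C", "D"): 0.25}
v = merge_probabilities(p, sdp)
```

The merged probabilities still sum to one, and the number of distinct effects to estimate falls from one per strain pair to one per allele pair, which is what makes the merged model nested within the unmerged one.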

Merge Analysis Open Field Activity, Chr 1

Merge Analysis rgs18

Functional Merge Analysis
Measure functional assay on HS founders
–$F_L(s)$ is value at locus $L$ on founder $s$
–e.g. gene expression
Expected value in HS, assuming additivity, is
$E(f_i) = \sum_{s,t} p_{iL}(s,t)\,[F_L(s) + F_L(t)]$
If assay is related to phenotype $y$ then
$E(y_i) = \beta\,E(f_i) + \mu_i$
Compare nested models (thanks to Chris Holmes)
unmerged: $E(y_i) = \sum_{s,t} p_{iL}(s,t)\,T(s,t) + \mu_i$
merged: $E(y_i) = \beta \sum_{s,t} p_{iL}(s,t)\,[F_L(s) + F_L(t)] + \mu_i$
null: $E(y_i) = \mu_i$
Require no significant difference between merged and unmerged models
–and for both to be significant compared to null model
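The expected assay value for an HS animal under additivity is a probability-weighted sum over founder pairs. A minimal sketch, assuming the same dictionary representation of strain-pair probabilities as might be produced by an ancestry HMM, with made-up founder assay values:

```python
def expected_assay(p, founder_values):
    """Expected assay value for one animal at one locus, assuming
    the two inherited founder haplotypes contribute additively.

    p               maps (strain_s, strain_t) -> probability
    founder_values  maps strain -> assay value measured in that founder
    """
    return sum(prob * (founder_values[s] + founder_values[t])
               for (s, t), prob in p.items())


# Toy example: hypothetical expression levels in three founders.
founder_values = {"A": 1.0, "B": 2.0, "C": 4.0}
p = {("A", "A"): 0.5, ("A", "C"): 0.5}
e_f = expected_assay(p, founder_values)  # 0.5 * 2.0 + 0.5 * 5.0
```

The resulting value per animal would then serve as a single regressor for the phenotype in the merged model, in place of one effect per strain pair.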

Using Gene Expression in HS founders
[Plot axes: model difference, logp]

Future Work

Extensions to basic model
Generalised linear models
Multivariate data
Mixture Models, EM (Chris Holmes)
Family Effects, Variance Components, REML (Peter Visscher, Allan McRae)
Gene Annotation Data (Kate Elliot)
Multiple QTL models
Epistasis
Pleiotropy
