Published by Kimberly Sullivan. Modified over 8 years ago.
1
Genome-wide genetic association of complex traits in outbred mice William Valdar, Leah C. Solberg, Dominique Gauguier, Stephanie Burnett, Paul Klenerman, William O. Cookson, Martin Taylor, J. Nicholas P. Rawlins, Richard Mott, Jonathan Flint.
2
Genetic Traits
Quantitative (height, weight)
Dichotomous (affected/unaffected)
Factorial (blood group)
Mendelian: controlled by a single gene (cystic fibrosis)
Complex: controlled by multiple genes * environment (diabetes, asthma)
3
Quantitative Trait Loci QTL: Quantitative Trait Locus chromosome genes
4
Quantitative Trait Loci QTL: Quantitative Trait Locus QTG: Quantitative Trait Gene chromosome
5
Quantitative Trait Loci QTL: Quantitative Trait Locus QTG: Quantitative Trait Gene QTN: Quantitative Trait Nucleotide chromosome
6
Association Studies: Map in Humans or Animal Models?
Humans:
– Disease studied directly
– Population and environment stratification
– Very many SNPs required (1,000,000?)
– Hard to detect trait loci: very large sample sizes (5,000-10,000) required to detect loci of small effect
– Potentially very high mapping resolution (single gene)
– Very expensive
Animal models:
– Animal model required
– Population and environment controlled
– Fewer SNPs required (~100-10,000)
– Easy to detect QTL with ~500 animals
– Poorer mapping resolution: 1 Mb (10 genes)
– Relatively inexpensive
7
Mosaic Crosses
[Figure: inbred founders are mixed, chopped up over generations G3 ... GN, then inbred to F20]
Mixing: F2, diallel
Chopping up: Heterogeneous Stock, Advanced Intercross, Random Outbreds
Inbreeding: Recombinant Inbred Lines
8
Sizes of Behavioural QTL in rodents (% of total phenotypic variance)
9
Effect size of cloned genes
10
Mapping Resolution
F2 crosses
– Powerful at detecting QTL
– Poor at localisation (20 cM)
– Too few recombinants
Increase number of recombinants:
– more animals
– more generations in cross
11
Heterogeneous Stocks cross 8 inbred strains for >10 generations
13
Heterogeneous Stocks cross 8 inbred strains for >10 generations 0.25 cM
14
Multiple Phenotype QTL Experiment
15
Multiple Phenotypes measured on a Heterogeneous Stock
2000 HS mice (Northport, Bob Hitzeman)
84 families
40th generation
150 traits measured on each animal
– Standardised phenotyping protocol
– Covariates recorded: experimenter, time/date, litter
– Microchipping
16
Phenotypes Anxiety (Conditioned and Unconditioned Tests) Asthma (Plethysmography) Diabetes (Glucose Tolerance Test) Haematology Immunology Biochemistry Wound Healing (Ear Punch) Gene Expression ….others….
17
High throughput phenotyping facility
18
Neophobia
19
Fear Potentiated Startle
20
Ovalbumin sensitization
21
Plethysmograph
22
Intraperitoneal Glucose Tolerance Test
23
Ears
25
Genotyping
15360 SNPs genotyped by Illumina
– 2000 HS mice
– 300 HS parents
– 8 inbred HS founders
– 500 other inbreds
www.well.ox.ac.uk/mouse/snp.selector
13459 SNPs successful
99.8% accuracy (parent-offspring)
26
Distribution of Marker Spacing (chromosome X) (9 Markers)
27
LD Decay with distance
99.2% of marker pairs on different autosomes have R² < 0.05.
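As an illustration of the quantity on this slide, the sketch below computes a genotype-based R² between two markers as the squared Pearson correlation of allele dosages (0/1/2). This is an assumption about the measure; the slide does not say whether a genotype-based or haplotype-based R² was used, and the function name is hypothetical.

```python
from statistics import mean

def genotype_r2(g1, g2):
    """Squared Pearson correlation between two allele-dosage vectors (0/1/2)."""
    m1, m2 = mean(g1), mean(g2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return cov * cov / (v1 * v2)

# Two markers with identical dosages are in complete LD.
print(genotype_r2([0, 1, 2, 1], [0, 1, 2, 1]))
```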
28
Genetic Drift in HS
40 generations of breeding
Allele frequencies in the founders will drift
8% of genome fixed
Allele frequency in founders (%)    Allele frequency in HS (%)
12.5                                14.99
25                                  23.23
37.5                                29.77
50                                  31.45
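A minimal Wright-Fisher sketch of how drift over 40 generations can move or fix an allele that started at a founder frequency of 12.5%. The population size (100 chromosomes), the number of replicates and the function name are hypothetical; the slide does not give the breeding scheme or the effective population size of the HS.

```python
import random

def drift(p0, n_chromosomes, generations, rng):
    """Resample the allele count binomially once per generation."""
    p = p0
    for _ in range(generations):
        count = sum(1 for _ in range(n_chromosomes) if rng.random() < p)
        p = count / n_chromosomes
    return p

rng = random.Random(1)
# 200 independent loci, all starting at 1/8 (one founder of eight carries the allele).
final = [drift(0.125, 100, 40, rng) for _ in range(200)]
fixed = sum(1 for p in final if p in (0.0, 1.0)) / len(final)
print(f"fraction of simulated loci fixed or lost: {fixed:.2f}")
```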
29
Analysis
Automated analysis pipeline
– R HAPPY package
– Single marker association
Each phenotype analysed independently
– Transformed to normality, outliers removed
– Tailored set of covariates
– Linear models for most phenotypes
– Survival models for latency phenotypes
30
Twisted Pair Analysis of Heterogeneous Stock
[Figure: chromosome, markers and observed alleles]
Want to predict ancestral strain from genotype
We know the alleles in the founder strains
Single marker association lacks power, can’t distinguish all strains
Multipoint analysis: combine data from neighbouring markers
31
Twisted Pair Analysis of Heterogeneous Stock
[Figure: chromosome, markers and observed alleles]
Hidden Markov model HAPPY
Hidden states = ancestral strains
Observed states = genotypes
Unknown phase of genotypes
– Analyse both chromosomes simultaneously
– Twisted pair of HMMs
Mott et al 2000 PNAS
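To make the hidden-state idea concrete, here is a deliberately simplified sketch: a forward-backward pass over a single phased chromosome, where the hidden state is the founder strain and the observation is one allele per marker. It is not the HAPPY algorithm, which handles unphased diploid genotypes with a pair of coupled HMMs. The constant recombination probability, the genotyping error rate and all names are assumptions made for illustration.

```python
def founder_posteriors(observed, founder_alleles, recomb=0.05, error=0.01):
    """Posterior probability of each founder strain at each marker (haploid toy HMM)."""
    strains = list(founder_alleles)
    S, M = len(strains), len(observed)

    def emit(k, m):
        # Probability of the observed allele if the segment descends from strain k.
        return 1 - error if founder_alleles[strains[k]][m] == observed[m] else error

    def trans(j, k):
        # With probability `recomb` the ancestry is redrawn uniformly from all strains.
        return (1 - recomb if j == k else 0.0) + recomb / S

    fwd = [[emit(k, 0) / S for k in range(S)]]
    for m in range(1, M):
        fwd.append([emit(k, m) * sum(fwd[-1][j] * trans(j, k) for j in range(S))
                    for k in range(S)])
    bwd = [[1.0] * S for _ in range(M)]
    for m in range(M - 2, -1, -1):
        bwd[m] = [sum(trans(k, j) * emit(j, m + 1) * bwd[m + 1][j] for j in range(S))
                  for k in range(S)]
    post = []
    for m in range(M):
        w = [fwd[m][k] * bwd[m][k] for k in range(S)]
        total = sum(w)
        post.append({strains[k]: w[k] / total for k in range(S)})
    return post

founders = {"A": [1, 1, 2, 1], "B": [1, 2, 2, 2], "C": [2, 1, 1, 1]}  # hypothetical alleles
post = founder_posteriors([1, 2, 2, 2], founders)
```

At the first marker strains A and B carry the same allele, so a single-marker test cannot separate them; the neighbouring markers are what favour B.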
32
Testing for a QTL
$p_{iL}(s,t)$ = Prob(animal i is descended from strains s,t at locus L)
$p_{iL}(s,t)$ calculated by HMM using
– genotype data
– founder strains’ alleles
Phenotype is modelled as
$E(y_i) = \sum_{s,t} p_{iL}(s,t)\,T(s,t) + \mu_i$
$\mathrm{Var}(y_i) = \sigma^2$
Test for no QTL at locus L
– $H_0$: $T(s,t)$ are all the same
– ANOVA partial F test
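The partial F test on this slide can be sketched as a comparison of two nested least-squares fits: one with a common mean only, and one that regresses the phenotype on the strain-pair probabilities. The data below are simulated; the three strain-pair classes, their effects and the function name `partial_f` are hypothetical, and real analyses would also carry covariates in both models.

```python
import numpy as np

def partial_f(y, X_null, X_full):
    """F statistic and degrees of freedom for nested least-squares models."""
    def rss_and_rank(X):
        beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid), int(rank)
    rss0, r0 = rss_and_rank(X_null)
    rss1, r1 = rss_and_rank(X_full)
    df1, df2 = r1 - r0, len(y) - r1
    return ((rss0 - rss1) / df1) / (rss1 / df2), df1, df2

rng = np.random.default_rng(0)
n, n_pairs = 200, 3                           # 3 hypothetical strain-pair classes
P = rng.dirichlet([0.5] * n_pairs, size=n)    # rows play the role of p_iL(s,t), summing to 1
T = np.array([0.0, 2.0, 4.0])                 # hypothetical strain-pair effects
y = P @ T + rng.normal(size=n)
X_null = np.ones((n, 1))                      # no QTL: a common mean only
f_stat, df1, df2 = partial_f(y, X_null, P)
```

Because each row of `P` sums to one, the probability columns already span the intercept, which is why the full model has only `n_pairs - 1` extra degrees of freedom. A p-value could then be read from the F distribution, for example with `scipy.stats.f.sf(f_stat, df1, df2)`.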
33
Genome Scan
Additive and dominance models
Record all peaks that exceed 5% genome-wide significance
– Threshold based on 200 permutations
– 9000 preliminary candidate QTL found
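A permutation threshold of this kind can be sketched as follows: shuffle the phenotype, record the largest test statistic anywhere in the genome, repeat, and take the 95th percentile of those maxima. The statistic used here (squared correlation with a single marker) is a stand-in chosen for brevity, not the F test used in the study, and the genotype matrix is simulated.

```python
import numpy as np

def max_stat(y, G):
    """Largest squared correlation between phenotype y and any marker column of G."""
    yc = y - y.mean()
    Gc = G - G.mean(axis=0)
    num = (Gc.T @ yc) ** 2
    den = (Gc ** 2).sum(axis=0) * (yc @ yc)
    return float((num / den).max())

def permutation_threshold(y, G, n_perm=200, alpha=0.05, seed=0):
    """Genome-wide threshold: the (1 - alpha) quantile of permuted genome-wide maxima."""
    rng = np.random.default_rng(seed)
    maxima = [max_stat(rng.permutation(y), G) for _ in range(n_perm)]
    return float(np.quantile(maxima, 1 - alpha))

rng = np.random.default_rng(1)
G = rng.integers(0, 3, size=(100, 50)).astype(float)   # hypothetical allele dosages
y = rng.normal(size=100)                                # phenotype with no QTL
threshold = permutation_threshold(y, G)
```

Taking the maximum over all markers within each permutation is what makes the threshold genome-wide rather than per-marker.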
34
Results
35
Many peaks mean red cell volume
36
How to select peaks: a simulated example
37
Simulate 7 x 5% QTLs (ie, 35% genetic effect) + 20% shared environment effect + 45% noise = 100% variance
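The variance budget on this slide can be reproduced with a small simulation: each component is rescaled to its target share of the variance and the parts are summed. The genotype coding, the use of family membership as the shared environment, and the sample sizes are assumptions made for illustration.

```python
import numpy as np

def scaled(x, variance):
    """Centre x and rescale it so that its sample variance equals `variance`."""
    x = x - x.mean()
    return x * np.sqrt(variance / x.var())

rng = np.random.default_rng(0)
n, n_families = 2000, 84
genotypes = rng.integers(0, 3, size=(n, 7)).astype(float)  # 7 hypothetical biallelic QTL
family = rng.integers(0, n_families, size=n)               # family membership per animal
family_effect = rng.normal(size=n_families)[family]        # shared within a family

qtl_parts = [scaled(genotypes[:, j], 0.05) for j in range(7)]   # 7 x 5% = 35%
env_part = scaled(family_effect, 0.20)                          # 20% shared environment
noise_part = scaled(rng.normal(size=n), 0.45)                   # 45% noise
phenotype = sum(qtl_parts) + env_part + noise_part
```

The total variance is only approximately 1 because the components are not exactly uncorrelated in a finite sample.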
38
Simulated example: 1D scan
39
Peaks from 1D scan phenotype ~ covariates + ?
40
1D scan: condition on 1 peak phenotype ~ covariates + peak 1 + ?
41
1D scan: condition on 2 peaks phenotype ~ covariates + peak 1 + peak 2 + ?
42
1D scan: condition on 3 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + ?
43
1D scan: condition on 4 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + ?
44
1D scan: condition on 5 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + ?
45
1D scan: condition on 6 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + ?
46
1D scan: condition on 7 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + ?
47
1D scan: condition on 8 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + ?
48
1D scan: condition on 9 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + peak 9 + ?
49
1D scan: condition on 10 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + peak 9 + peak 10 + ?
50
1D scan: condition on 11 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + peak 9 + peak 10 + peak 11 + ?
51
Peaks chosen by forward selection
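The sequence of conditioned scans can be read as greedy forward selection: at each step, add the peak that most improves the fit given the peaks already in the model, and stop when no remaining peak passes the threshold. The sketch below uses one column per candidate locus and a fixed F cut-off, both simplifications; in the study each locus contributes several strain-probability terms and the threshold comes from permutation.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    return float(r @ r)

def forward_select(y, covariates, candidates, f_threshold, max_terms=20):
    """Add the candidate with the largest partial F until none exceeds the threshold."""
    chosen = []
    X = covariates
    while len(chosen) < max_terms:
        rss0 = rss(y, X)
        best = None
        for j in range(candidates.shape[1]):
            if j in chosen:
                continue
            Xj = np.column_stack([X, candidates[:, j]])
            rss1 = rss(y, Xj)
            f = (rss0 - rss1) / (rss1 / (len(y) - Xj.shape[1]))
            if best is None or f > best[0]:
                best = (f, j)
        if best is None or best[0] < f_threshold:
            break
        chosen.append(best[1])
        X = np.column_stack([X, candidates[:, best[1]]])
    return chosen

rng = np.random.default_rng(2)
candidates = rng.normal(size=(300, 20))          # 20 hypothetical candidate loci
y = candidates[:, 3] + 0.8 * candidates[:, 11] + rng.normal(size=300)
chosen = forward_select(y, np.ones((300, 1)), candidates, f_threshold=15.0)
```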
52
Bootstrap sampling
Subjects: 1 2 3 4 5 6 7 8 9 10 (10 subjects)
53
Bootstrap sampling
Subjects: 1 2 3 4 5 6 7 8 9 10 (10 subjects)
Sample with replacement
Bootstrap sample from 10 subjects: 1 2 2 3 5 5 6 7 7 9
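Sampling with replacement, as pictured on this slide, takes only a few lines. The helper name and the seed are arbitrary; the draw is sorted purely to make repeated subjects easy to spot.

```python
import random

def bootstrap_sample(subjects, rng):
    """Draw len(subjects) subjects with replacement, sorted for display."""
    return sorted(rng.choice(subjects) for _ in subjects)

subjects = list(range(1, 11))
sample = bootstrap_sample(subjects, random.Random(0))
print(sample)  # some subjects appear more than once, others not at all
```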
55
Forward selection on a bootstrap sample
58
Bootstrap evidence mounts up…
59
In 700 bootstraps… Bootstrap Posterior Probability (BPP)
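A sketch of how an inclusion score like the BPP could be computed: run the model selection on each bootstrap sample and record the fraction of samples in which each locus is chosen. The `select` argument stands in for forward selection on the resampled animals; the toy selector below is purely hypothetical and exists only to make the example runnable.

```python
import random
from collections import Counter

def bootstrap_inclusion(n_subjects, select, n_boot, seed=0):
    """Fraction of bootstrap samples in which `select` includes each locus."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        sample = [rng.randrange(n_subjects) for _ in range(n_subjects)]
        counts.update(set(select(sample)))
    return {locus: c / n_boot for locus, c in counts.items()}

def toy_select(sample):
    # Hypothetical stand-in: locus_A is always found, locus_B only when
    # subject 0 happens to be in the bootstrap sample.
    return ["locus_A"] + (["locus_B"] if 0 in sample else [])

bpp = bootstrap_inclusion(n_subjects=10, select=toy_select, n_boot=700)
```

A locus whose detection hinges on a few animals gets a lower score than one found in every resample, which is the intuition behind ranking peaks by BPP.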
60
Model averaging by bootstrap aggregation
Choosing only one model:
– very data-dependent, arbitrary
– can’t get all the true QTLs in one model
Bootstrap aggregation averages over models
– true QTLs get included more often than false ones
References:
– Broman & Speed (2002)
– Hackett et al (2001)
61
Results
62
We identified 843 QTLs for 97 phenotypes with BPP greater than 0.25, of which, on the basis of simulations, we expect 590 to be genuine.
63
Performance of multiple QTL modelling
Columns: (a) BPP threshold; (b) number of QTL; (c) proportion of detected QTLs that are true; (d) proportion of true QTLs detected; (e) expected number of false QTLs per genome scan
(a)     (b)     (c)     (d)     (e)
0.05    3127    0.26    0.90    0.47
0.10    2119    0.41    0.90    0.43
0.20    1105    0.63    0.89    0.32
0.25    843     0.70    0.89    0.25
0.30    633     0.75    0.89    0.21
0.40    364     0.82    0.89    0.17
0.50    251     0.85    0.88    0.12
0.75    58      0.91    0.87    0.04
0.90    30      0.91    0.85    0.03
1.00    13      0.96    0.60    0.00
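The count of roughly 590 genuine QTL quoted elsewhere in the talk appears consistent with multiplying the number of QTL at a threshold by the proportion expected to be true (843 x 0.70 at BPP 0.25). The snippet below applies that arithmetic to every row; reading the table this way is an interpretation, not something the slide states.

```python
rows = [  # (BPP threshold, number of QTL, proportion of detected QTL that are true)
    (0.05, 3127, 0.26), (0.10, 2119, 0.41), (0.20, 1105, 0.63), (0.25, 843, 0.70),
    (0.30, 633, 0.75), (0.40, 364, 0.82), (0.50, 251, 0.85), (0.75, 58, 0.91),
    (0.90, 30, 0.91), (1.00, 13, 0.96),
]
# Expected number of genuine QTL at each threshold, rounded to whole QTL.
expected_true = {bpp: round(n * prop) for bpp, n, prop in rows}
print(expected_true[0.25])
```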
64
Where to find the results http://gscan.well.ox.ac.uk/
65
Distribution of effect sizes
66
Resolution
[Figure: 95% confidence intervals, in megabases]
67
Number of Genes per Locus
68
Results summary
843 peaks found with BPP > 0.25
8.7 peaks per phenotype on average
Based on simulation, we expect ~590 to be genuine
Mean 95% CI width 2.78 Mb
Mean number of genes under each 95% CI is 28.9
69
Results
~7 jointly significant QTL per phenotype
95% confidence interval ~2 Mb
~50% of QTL have a significant non-additive component
Only 3 phenotypes were explained by a single major QTL
– Most phenotypes are complex
70
Distribution of QTL Effects Mean Effect size 2.7%
71
%Variance Explained [% Additive Genetic Variance calculated using 3-generation pedigree data, not genotypes]
72
Coat colour genes
73
A known QTL: HDL Wang et al, 2003 HS mapping
74
New QTLs: two examples
Ear Punch Hole Area Regrowth
– wound healing
Cue Conditioning Freeze.During.Tone
– measure of fear
76
Cue Conditioning Freeze.During.Tone: huge effect, small number of genes cntn1: Contactin precursor (Neural cell surface protein) chr15
77
What do we want?
Biological:
– Joint QTL that contain the functional genes and lead to their identification
– But genetic mapping finds the variants, not the genes
Statistical:
– Multi-locus QTL selection algorithms that accurately predict the phenotype of new animals
– Model averaging: no best choice?
– Ghost QTL
Are statistical QTL algorithms consistent?
– Do they find the biological QTL given a large enough sample size?
– Simulations of multiple QTL models indicate mapping accuracy declines as complexity increases [Valdar et al 2006 Genetics in press]
78
Work of many hands Carmen Arboleda-Hitas Amarjit Bhomra Stephanie Burnett Peter Burns Richard Copley Stuart Davidson Simon Fiddy Jonathan Flint Polinka Hernandez Sue Miller Richard Mott Chela Nunez Gemma Peachey Sagiv Shifman Leah Solberg Amy Taylor Martin Taylor William Valdar Binnaz Yalcin Dave Bannerman Shoumo Bhattacharya Bill Cookson Rob Deacon Dominique Gauguier Doug Higgs Tertius Hough Paul Klenerman Nick Rawlins Jennifer Taylor Chris Holmes Project funded by The Wellcome Trust, UK
79
Data are publicly available http://gscan.well.ox.ac.uk
80
Gene x Environment, Gene x Sex
Repeat analysis looking for QTLs that interact with
– Gender
– Litter number
– Season, month, etc.
– Experimenter
Compare models
$E(y) = \mu + \text{locus} + \text{env}$
$E(y) = \mu + \text{locus} * \text{env}$
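The comparison of the additive and interaction models can be sketched as a nested F test in which the larger model adds a locus-by-environment product term. The data are simulated, the locus is coded as a single allele dosage and the environment as a two-level factor; these codings and the effect sizes are assumptions for illustration.

```python
import numpy as np

def nested_f(y, X_small, X_big):
    """F statistic for the extra columns of X_big over the nested model X_small."""
    def rss(X):
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        r = y - X @ beta
        return float(r @ r)
    df1 = X_big.shape[1] - X_small.shape[1]
    df2 = len(y) - X_big.shape[1]
    return ((rss(X_small) - rss(X_big)) / df1) / (rss(X_big) / df2)

rng = np.random.default_rng(0)
n = 500
locus = rng.integers(0, 3, size=n).astype(float)   # hypothetical allele dosage
env = rng.integers(0, 2, size=n).astype(float)     # hypothetical two-level environment
y = 0.3 * locus + 0.5 * env + 1.0 * locus * env + rng.normal(size=n)

ones = np.ones(n)
additive = np.column_stack([ones, locus, env])                   # mean + locus + env
interaction = np.column_stack([ones, locus, env, locus * env])   # adds locus x env
f_gxe = nested_f(y, additive, interaction)
```

A large `f_gxe` indicates that the effect of the locus differs between environments, which is what a GxE QTL means here.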
81
Gene x Environment
431 jointly significant GxE QTLs
– 27 gene x experimenter
– 81 gene x litter number
– 67 gene x age
– 105 gene x study day
– 151 gene x season
13% of variation is GxE
25 GxE QTLs overlapped with original joint QTL
– defined as lying within 4 Mb of the peak position
42 GxSex QTLs
85
Gene Expression Data (with Binnaz Yalcin, Jennifer Taylor)
Illumina 40k chip
Livers, Lungs (Brains)
– 190 HS
– HS founders
86
Phenotype-gene expression correlation Liver gene expression in 180 HS mice Slc4a7
87
Testing for Functional Variants
Is a SNP functional for a trait?
Is a functional assay measured in founders related to a trait?
– Gene expression
– DNA-protein binding
88
Testing for non-Functional Variants
Is a SNP’s pattern of variation inconsistent with the QTL’s pattern of action?
Is a functional assay’s distribution inconsistent with the QTL’s pattern of action?
89
Merge Analysis
Yalcin et al 2005 Genetics
Require sequence of HS founders
– Determine all variants and their strain distribution patterns (SDP)
Don’t genotype every variant in the HS
– Instead predict genotypes in HS at all variants based on a sparse skeleton of genotypes
90
Merge Analysis
A variant v will partition the HS founder strains into 2 or more groups, depending on its strain distribution pattern (SDP)
If v is functional for the trait then the strain effects at the QTL must be identical for strains with the same allele
– so if merging founders according to v’s SDP destroys significance then we reject v
91
Merge Analysis: Model Comparison
$p_{iL}(s,t)$ = Prob(animal i is descended from strains s,t at locus L)
Replace strains s,t by merged pseudo-strains g,h
– Add together probabilities for strains with the same allele
– Phenotypic effect of merged strains g,h is $\tau(g,h)$
$v_{iL}(g,h)$ = Prob(animal i is descended from merged strains g,h at locus L)
Compare fits of nested models
$E(y_i) = \sum_{s,t} p_{iL}(s,t)\,T(s,t) + \mu_i$ (unmerged)
$E(y_i) = \sum_{g,h} v_{iL}(g,h)\,\tau(g,h) + \mu_i$ (merged)
$E(y_i) = \mu_i$ (null)
Require no significant difference between merged and unmerged models
– and for both to be significant compared to null model
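The merging step itself is simple bookkeeping: probabilities of strain pairs that carry the same alleles at the variant are added together. The sketch below does this for one animal at one locus. It assumes strain pairs are unordered (phase unknown), and the three strains, their probabilities and the SDP are made up for illustration.

```python
from collections import defaultdict

def merge_probabilities(p, sdp):
    """Collapse strain-pair probabilities into allele-pair probabilities under an SDP."""
    merged = defaultdict(float)
    for (s, t), prob in p.items():
        g, h = sorted((sdp[s], sdp[t]))   # unordered allele pair
        merged[(g, h)] += prob
    return dict(merged)

# Hypothetical SDP: strains A and B share allele 0, strain C carries allele 1.
sdp = {"A": 0, "B": 0, "C": 1}
p = {("A", "A"): 0.1, ("A", "B"): 0.2, ("A", "C"): 0.3,
     ("B", "B"): 0.1, ("B", "C"): 0.2, ("C", "C"): 0.1}
v = merge_probabilities(p, sdp)
```

Here six strain-pair classes collapse to three allele-pair classes, so the merged model has fewer parameters than the unmerged one and is nested within it.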
92
Merge Analysis Open Field Activity, Chr 1
93
Merge Analysis rgs18
94
Functional Merge Analysis
Measure functional assay on HS founders
– $F_L(s)$ is value at locus L on founder s
– e.g. gene expression
Expected value in HS is
$E(f_i) = \sum_{s,t} p_{iL}(s,t)\,[F(s) + F(t)]$, assuming additivity
If assay is related to phenotype y then
$E(y_i) = E(f_i) + \mu_i$
Compare nested models (thanks to Chris Holmes)
$E(y_i) = \sum_{s,t} p_{iL}(s,t)\,T(s,t) + \mu_i$ (unmerged)
$E(y_i) = \sum_{s,t} p_{iL}(s,t)\,[F(s) + F(t)] + \mu_i$ (merged)
$E(y_i) = \mu_i$ (null)
Require no significant difference between merged and unmerged models
– and for both to be significant compared to null model
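Under the additivity assumption on this slide, the expected assay value for an HS animal is a probability-weighted sum of founder values. A minimal sketch for one animal at one locus, with made-up founder expression values and strain-pair probabilities:

```python
def expected_assay(p, F):
    """Probability-weighted sum of F(s) + F(t) over strain pairs (s, t)."""
    return sum(prob * (F[s] + F[t]) for (s, t), prob in p.items())

F = {"A": 1.0, "B": 2.0, "C": 4.0}   # hypothetical founder assay values
p = {("A", "A"): 0.1, ("A", "B"): 0.2, ("A", "C"): 0.3,
     ("B", "B"): 0.1, ("B", "C"): 0.2, ("C", "C"): 0.1}
e_f = expected_assay(p, F)
print(round(e_f, 6))
```

Computed for every animal, this gives a single predictor that can be compared against the full strain-effect model in the same nested way as the SNP-based merge.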
95
Using Gene Expression in HS founders
[Figure: axes labelled model difference and logp]
96
Future Work
97
Extensions to basic model
Generalised linear models
Multivariate data
Mixture models, EM (Chris Holmes)
Family effects, variance components, REML (Peter Visscher, Allan McRae)
Gene annotation data (Kate Elliot)
Multiple QTL models
Epistasis
Pleiotropy