Genome-wide genetic association of complex traits in outbred mice William Valdar, Leah C. Solberg, Dominique Gauguier, Stephanie Burnett, Paul Klenerman,


1 Genome-wide genetic association of complex traits in outbred mice William Valdar, Leah C. Solberg, Dominique Gauguier, Stephanie Burnett, Paul Klenerman, William O. Cookson, Martin Taylor, J. Nicholas P. Rawlins, Richard Mott, Jonathan Flint.

2 Genetic Traits Quantitative (height, weight) Dichotomous (affected/unaffected) Factorial (blood group) Mendelian - controlled by single gene (cystic fibrosis) Complex – controlled by multiple genes*environment (diabetes, asthma)

3 Quantitative Trait Loci QTL: Quantitative Trait Locus chromosome genes

4 Quantitative Trait Loci QTL: Quantitative Trait Locus QTG: Quantitative Trait Gene chromosome

5 Quantitative Trait Loci QTL: Quantitative Trait Locus QTG: Quantitative Trait Gene QTN: Quantitative Trait Nucleotide chromosome

6 Association Studies: Map in Humans or Animal Models?
Humans:
–Disease studied directly
–Population and environment stratification
–Very many SNPs (1,000,000?) required
–Hard to detect trait loci: very large sample sizes (5,000-10,000) required to detect loci of small effect
–Potentially very high mapping resolution (single gene)
–Very expensive
Animal models:
–Animal model required
–Population and environment controlled
–Fewer SNPs required (~100-10,000)
–Easy to detect QTL with ~500 animals
–Poorer mapping resolution: 1 Mb (10 genes)
–Relatively inexpensive

7 Mosaic Crosses [Diagram. Stages: inbred founders, G3, GN, F20. Processes: mixing, chopping up, inbreeding. Designs: F2, diallel; Heterogeneous Stock, Advanced Intercross, Random Outbreds; Recombinant Inbred Lines]

8 Sizes of Behavioural QTL in rodents (% of total phenotypic variance)

9 Effect size of cloned genes

10 Mapping Resolution F2 crosses –Powerful at detecting QTL –Poor at Localisation – 20cM –Too few recombinants Increase number of recombinants: –more animals –more generations in cross

11 Heterogeneous Stocks cross 8 inbred strains for >10 generations

13 Heterogeneous Stocks cross 8 inbred strains for >10 generations 0.25 cM

14 Multiple Phenotype QTL Experiment

15 Multiple Phenotypes measured on a Heterogeneous Stock 2000 HS mice (Northport, Bob Hitzeman) 84 families 40th generation 150 traits measured on each animal –Standardised phenotyping protocol –Covariates recorded: experimenter, time/date, litter –Microchipping

16 Phenotypes Anxiety (Conditioned and Unconditioned Tests) Asthma (Plethysmography) Diabetes (Glucose Tolerance Test) Haematology Immunology Biochemistry Wound Healing (Ear Punch) Gene Expression ….others….

17 High throughput phenotyping facility

18 Neophobia

19 Fear Potentiated Startle

20 Ovalbumin sensitization

21 Plethysmograph

22 Intraperitoneal Glucose Tolerance Test

23 Ears

25 Genotyping 15360 SNPs genotyped by Illumina –2000 HS mice –300 HS parents –8 inbred HS founders –500 other inbreds www.well.ox.ac.uk/mouse/snp.selector 13459 SNPs successful 99.8% accuracy (parent-offspring)

26 Distribution of Marker Spacing (chromosome X) (9 Markers)

27 LD Decay with distance 99.2% of marker pairs on different autosomes have $R^2 < 0.05$.

28 Genetic Drift in HS 40 generations of breeding Allele Frequency in founders will drift 8% of genome fixed
Allele Frequency in Founders | Allele Frequency in HS
12.5 | 14.99
25 | 23.23
37.5 | 29.77
50 | 31.45
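The drift described on this slide can be illustrated with a minimal Wright-Fisher sketch. The number of chromosomes per generation (`n_chrom`) and the number of simulated loci (`n_loci`) are assumptions chosen for illustration; the slides do not state an effective population size for the HS, so the printed fractions are not expected to reproduce the 8% figure.

```python
import random

def drift(p0, n_chrom, generations, rng):
    """Wright-Fisher drift: return the allele frequency after `generations`."""
    p = p0
    for _ in range(generations):
        # Each chromosome in the next generation carries the allele with probability p.
        count = sum(1 for _ in range(n_chrom) if rng.random() < p)
        p = count / n_chrom
        if p in (0.0, 1.0):  # allele lost or fixed, so no further change is possible
            break
    return p

rng = random.Random(1)
n_chrom = 100       # assumed number of chromosomes per generation (hypothetical)
generations = 40    # from the slide
n_loci = 100        # assumed number of independent loci to simulate

for p0 in (0.125, 0.25, 0.375, 0.5):  # founder frequencies possible with 8 strains
    finals = [drift(p0, n_chrom, generations, rng) for _ in range(n_loci)]
    fixed = sum(1 for p in finals if p in (0.0, 1.0)) / n_loci
    print(f"founder freq {p0:.3f}: fraction of loci lost or fixed = {fixed:.2f}")
```

Raising `n_chrom` slows drift and lowers the fraction of loci that reach 0 or 1.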

29 Analysis Automated analysis pipeline –R HAPPY package –Single Marker Association Each phenotype analysed independently –Transformed to Normality, outliers removed –Tailored set of covariates –Linear models for most phenotypes –Survival models for latency phenotypes

30 Twisted Pair Analysis of Heterogeneous Stock [Figure: chromosome with markers and alleles] Want to predict ancestral strain from genotype We know the alleles in the founder strains Single marker association lacks power, can’t distinguish all strains Multipoint analysis – combine data from neighbouring markers

31 Twisted Pair Analysis of Heterogeneous Stock [Figure: chromosome with markers and alleles] Hidden Markov model HAPPY Hidden states = ancestral strains Observed states = genotypes Unknown phase of genotypes Analyse both chromosomes simultaneously Twisted pair of HMMs Mott et al 2000 PNAS

32 Testing for a QTL
$p_{iL}(s,t) = \mathrm{Prob}(\text{animal } i \text{ is descended from strains } s,t \text{ at locus } L)$
$p_{iL}(s,t)$ calculated by HMM using –genotype data –founder strains’ alleles
Phenotype is modelled
$E(y_i) = \sum_{s,t} p_{iL}(s,t)\,T(s,t) + \mu_i$
$\mathrm{Var}(y_i) = \sigma^2$
Test for no QTL at locus L –$H_0$: $T(s,t)$ are all same –ANOVA partial F test
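For reference, the partial F test named on this slide has the standard form below. The symbols are introduced here for illustration and do not appear on the slide: $RSS_0$ and $k_0$ are the residual sum of squares and number of fitted parameters under $H_0$ (no strain effects), $RSS_1$ and $k_1$ are the same quantities for the model with strain-pair effects $T(s,t)$, and $n$ is the number of animals.

```latex
F = \frac{(RSS_0 - RSS_1)/(k_1 - k_0)}{RSS_1/(n - k_1)}
\quad\sim\quad F_{\,k_1 - k_0,\; n - k_1} \text{ under } H_0
```

A large $F$ means the strain-pair effects explain more variance than expected by chance, which is evidence for a QTL at locus $L$.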

33 Genome Scan Additive and dominance models Record all peaks that exceed 5% genome-wide significance –Threshold based on 200 permutations –9000 preliminary candidate QTL found
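A permutation threshold of this kind can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the squared correlation `corr_sq` stands in for the HAPPY F statistic, and the marker and phenotype data are simulated toy values.

```python
import random

def corr_sq(x, y):
    """Squared Pearson correlation, a stand-in association statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)

def permutation_threshold(markers, phenotype, n_perm=200, alpha=0.05, seed=0):
    """Approximate (1 - alpha) quantile of the genome-wide maximum under permutation."""
    rng = random.Random(seed)
    y = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(y)  # break any marker-phenotype association
        maxima.append(max(corr_sq(m, y) for m in markers))
    maxima.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return maxima[index]

# Toy data (hypothetical): 30 markers with 0/1/2 genotypes on 80 animals.
rng = random.Random(42)
markers = [[rng.choice((0, 1, 2)) for _ in range(80)] for _ in range(30)]
phenotype = [rng.gauss(0, 1) for _ in range(80)]
threshold = permutation_threshold(markers, phenotype)
print(f"5% genome-wide threshold on r^2: {threshold:.3f}")
```

Taking the maximum over all markers within each permutation is what makes the threshold genome-wide rather than per-marker.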

34 Results

35 Many peaks (mean red cell volume)

36 How to select peaks: a simulated example

37 Simulate 7 x 5% QTLs (ie, 35% genetic effect) + 20% shared environment effect + 45% noise = 100% variance
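The variance partition on this slide can be reproduced in a small simulation. This is a sketch under simplifying assumptions that are not in the slides: each QTL is coded as an independent ±1 effect (no linkage, no sharing of genotypes within families), and the family count and size are hypothetical.

```python
import random

rng = random.Random(7)
n_families, family_size, n_qtl = 100, 20, 7       # family layout is hypothetical
var_qtl, var_env, var_noise = 0.05, 0.20, 0.45    # variance shares from the slide

phenotypes = []
for _ in range(n_families):
    shared = rng.gauss(0, var_env ** 0.5)          # one shared-environment draw per family
    for _ in range(family_size):
        # Each QTL contributes +/- sqrt(var_qtl), i.e. variance var_qtl.
        genetic = sum(rng.choice((-1, 1)) * var_qtl ** 0.5 for _ in range(n_qtl))
        noise = rng.gauss(0, var_noise ** 0.5)
        phenotypes.append(genetic + shared + noise)

mean = sum(phenotypes) / len(phenotypes)
variance = sum((y - mean) ** 2 for y in phenotypes) / (len(phenotypes) - 1)
print(f"intended total variance = {n_qtl * var_qtl + var_env + var_noise:.2f}")
print(f"sample variance         = {variance:.2f}")
```

The sample variance should land close to 1, confirming that the three components add up to the full phenotypic variance.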

38 Simulated example: 1D scan

39 Peaks from 1D scan phenotype ~ covariates + ?

40 1D scan: condition on 1 peak phenotype ~ covariates + peak 1 + ?

41 1D scan: condition on 2 peaks phenotype ~ covariates + peak 1 + peak 2 + ?

42 1D scan: condition on 3 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + ?

43 1D scan: condition on 4 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + ?

44 1D scan: condition on 5 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + ?

45 1D scan: condition on 6 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + ?

46 1D scan: condition on 7 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + ?

47 1D scan: condition on 8 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + ?

48 1D scan: condition on 9 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + peak 9 + ?

49 1D scan: condition on 10 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + peak 9 + peak 10 + ?

50 1D scan: condition on 11 peaks phenotype ~ covariates + peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + peak 9 + peak 10 + peak 11 + ?

51 Peaks chosen by forward selection
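The conditioning sequence on the preceding slides (phenotype ~ covariates + peak 1 + ... + ?) is a forward selection. A minimal sketch of the idea follows, with several assumptions: an intercept stands in for the covariates, candidate peaks are plain numeric vectors rather than HAPPY strain probabilities, and the stopping rule `min_gain` is hypothetical because the slides do not state one.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def residualise(v, basis):
    """Remove from v its projection onto each (mutually orthogonal) basis vector."""
    for b in basis:
        coef = dot(v, b) / dot(b, b)
        v = [x - coef * y for x, y in zip(v, b)]
    return v

def forward_selection(candidates, y, max_terms, min_gain):
    """Greedy selection: repeatedly add the candidate that most reduces the RSS."""
    basis = [[1.0] * len(y)]                  # intercept stands in for the covariates
    resid = residualise(list(y), basis)
    chosen = []
    while len(chosen) < max_terms:
        best = None
        for j, x in enumerate(candidates):
            if j in chosen:
                continue
            xp = residualise(list(x), basis)  # candidate adjusted for the current model
            norm = dot(xp, xp)
            if norm < 1e-12:
                continue
            gain = dot(resid, xp) ** 2 / norm  # drop in RSS from adding candidate j
            if best is None or gain > best[0]:
                best = (gain, j, xp)
        rss = dot(resid, resid)
        if best is None or rss == 0 or best[0] / rss < min_gain:
            break                              # no remaining peak is worth adding
        _, j, xp = best
        chosen.append(j)
        basis.append(xp)
        resid = residualise(resid, [xp])
    return chosen

# Toy data (hypothetical): candidates 2, 5 and 11 truly affect the phenotype.
rng = random.Random(3)
n = 300
candidates = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(20)]
y = [candidates[2][i] + 0.8 * candidates[5][i] + 0.6 * candidates[11][i]
     + rng.gauss(0, 1) for i in range(n)]
selected = forward_selection(candidates, y, max_terms=11, min_gain=0.05)
print("selected peaks:", selected)
```

Each candidate is re-evaluated after adjusting for the peaks already in the model, which is why a peak that looked strong in the first scan can vanish once a linked peak is conditioned on.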

52 Bootstrap sampling 10 subjects: 1 2 3 4 5 6 7 8 9 10

53 Bootstrap sampling 10 subjects: 1 2 3 4 5 6 7 8 9 10 sample with replacement bootstrap sample from 10 subjects: 1 2 2 3 5 5 6 7 7 9
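Sampling with replacement, as pictured on this slide, takes one line with Python's standard library. The seed below is arbitrary and only makes the sketch repeatable.

```python
import random

subjects = list(range(1, 11))   # the 10 subjects on the slide
rng = random.Random(0)          # arbitrary fixed seed

# Draw 10 subjects with replacement: some can appear more than once, others not at all.
sample = sorted(rng.choices(subjects, k=len(subjects)))
left_out = sorted(set(subjects) - set(sample))
print("bootstrap sample:", sample)
print("not sampled:     ", left_out)
```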

55 Forward selection on a bootstrap sample

58 Bootstrap evidence mounts up…

59 In 700 bootstraps… Bootstrap Posterior Probability (BPP)

60 Model averaging by bootstrap aggregation Choosing only one model: –very data-dependent, arbitrary –can’t get all the true QTLs in one model Bootstrap aggregation averages over models – true QTLs get included more often than false ones References: –Broman & Speed (2002) –Hackett et al (2001)
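Bootstrap aggregation can be sketched as counting how often each locus is selected across resampled data sets. The code below is illustrative only: `select` is a deliberately simple stand-in (a fixed r² cutoff) for the forward selection used in the study, and the data, cutoff and number of bootstraps are hypothetical.

```python
import random
from collections import Counter

def corr_sq(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)

def bootstrap_inclusion(n_subjects, select, n_boot, seed=0):
    """Fraction of bootstrap samples in which each item is selected."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        rows = rng.choices(range(n_subjects), k=n_subjects)  # resample subjects
        counts.update(set(select(rows)))                     # count each item once
    return {item: c / n_boot for item, c in counts.items()}

# Toy data (hypothetical): markers 0 and 4 have real effects, the rest are noise.
rng = random.Random(5)
n = 200
markers = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(10)]
y = [0.5 * markers[0][i] + 0.3 * markers[4][i] + rng.gauss(0, 1) for i in range(n)]

def select(rows):
    """Stand-in model selection: keep markers with r^2 above a fixed cutoff."""
    ys = [y[i] for i in rows]
    return [j for j, m in enumerate(markers)
            if corr_sq([m[i] for i in rows], ys) > 0.04]

bpp = bootstrap_inclusion(n, select, n_boot=100)
for j in sorted(bpp, key=bpp.get, reverse=True):
    print(f"marker {j}: inclusion frequency {bpp[j]:.2f}")
```

The inclusion frequency plays the role of the BPP on the slides: loci with real effects are selected in most resamples, while spurious ones appear only occasionally.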

61 Results

62 We identified 843 QTLs for 97 phenotypes with BPP greater than 0.25, of which, on the basis of simulations, we expect 590 to be genuine.

63 Performance of multiple QTL modelling
BPP threshold | Number of QTL | Proportion of detected QTLs that are true | Proportion of true QTLs detected | Expected number of false QTLs per genome scan
0.05 | 3127 | 0.26 | 0.90 | 0.47
0.10 | 2119 | 0.41 | 0.90 | 0.43
0.20 | 1105 | 0.63 | 0.89 | 0.32
0.25 | 843 | 0.70 | 0.89 | 0.25
0.30 | 633 | 0.75 | 0.89 | 0.21
0.40 | 364 | 0.82 | 0.89 | 0.17
0.50 | 251 | 0.85 | 0.88 | 0.12
0.75 | 58 | 0.91 | 0.87 | 0.04
0.90 | 30 | 0.91 | 0.85 | 0.03
1.00 | 13 | 0.96 | 0.60 | 0.00

64 Where to find the results http://gscan.well.ox.ac.uk/

65 Distribution of effect sizes

66 Resolution: 95% confidence intervals (Megabases)

67 Number of Genes per Locus

68 Results summary 843 peaks found with BPP > 0.25 8.7 peaks per phenotype on average Based on simulation, we expect ~590 to be genuine. Mean 95% CI width 2.78 Mb Mean number of genes under each 95% CI is 28.9

69 Results ~7 jointly significant QTL per phenotype 95% Confidence Interval ~ 2 Mb ~50% of QTL have a significant non-additive component Only 3 phenotypes were explained by single major QTL –Most phenotypes are complex

70 Distribution of QTL Effects Mean Effect size 2.7%

71 %Variance Explained [% Additive Genetic Variance calculated using 3-generation pedigree data, not genotypes]

72 Coat colour genes

73 A known QTL: HDL Wang et al, 2003 HS mapping

74 New QTLs: two examples Ear Punch Hole Area Regrowth –wound healing Cue Conditioning Freeze.During.Tone –measure of fear

76 Cue Conditioning Freeze.During.Tone: huge effect, small number of genes cntn1: Contactin precursor (Neural cell surface protein) chr15

77 What do we want?
Biological: –Joint QTL that contain the functional genes and lead to their identification –But genetic mapping finds the variants, not the genes
Statistical: –Multi-locus QTL selection algorithms that accurately predict the phenotype of new animals –Model averaging: no best choice? –Ghost QTL
Are statistical QTL algorithms consistent? –Do they find the biological QTL given a large enough sample size? –Simulations of multiple QTL models indicate mapping accuracy declines as complexity increases [Valdar et al 2006 Genetics in press]

78 Work of many hands Carmen Arboleda-Hitas Amarjit Bhomra Stephanie Burnett Peter Burns Richard Copley Stuart Davidson Simon Fiddy Jonathan Flint Polinka Hernandez Sue Miller Richard Mott Chela Nunez Gemma Peachey Sagiv Shifman Leah Solberg Amy Taylor Martin Taylor William Valdar Binnaz Yalcin Dave Bannerman Shoumo Bhattacharya Bill Cookson Rob Deacon Dominique Gauguier Doug Higgs Tertius Hough Paul Klenerman Nick Rawlins Jennifer Taylor Chris Holmes Project funded by The Wellcome Trust, UK

79 Data are publicly available http://gscan.well.ox.ac.uk

80 Gene x Environment Gene x Sex Repeat analysis looking for QTLs that interact with –Gender –Litter number –Season, Month, etc –Experimenter
Compare models
$E(y) = \mu + \text{locus} + \text{env}$
$E(y) = \mu + \text{locus} * \text{env}$

81 Gene x Environment 431 jointly significant GxE QTLs –27 gene x experimenter –81 gene x litter number –67 gene x age –105 gene x study day –151 gene x season 13% of variation is GxE 25 GxE QTLs overlapped with original joint QTL –defined as lying within 4 Mb of the peak position 42 GxSex QTLs

85 Gene Expression Data (with Binnaz Yalcin, Jennifer Taylor) Illumina 40k chip Livers, Lungs (Brains) –190 HS –HS founders

86 Phenotype-gene expression correlation Liver gene expression in 180 HS mice Slc4a7

87 Testing for Functional Variants Is a SNP functional for a trait? Is a functional assay measured in founders related to a trait? –Gene expression –DNA-Protein binding

88 Testing for non-Functional Variants Is a SNP’s pattern of variation inconsistent with the QTL’s pattern of action ? Is a functional assay’s distribution inconsistent with the QTL’s pattern of action ?

89 Merge Analysis Yalcin et al 2005 Genetics Require sequence of HS founders –Determine all variants and their strain distribution patterns (SDP) Don’t genotype every variant in the HS –Instead predict genotypes in HS at all variants based on a sparse skeleton of genotypes

90 Merge Analysis A variant v will partition the HS founder strains into 2 or more groups, depending on its strain distribution pattern (SDP). If v is functional for the trait then the strain effects at the QTL must be identical for strains with the same allele. –So if merging founders according to v’s SDP destroys significance then we reject v

91 Merge Analysis Model Comparison
$p_{iL}(s,t) = \mathrm{Prob}(\text{animal } i \text{ is descended from strains } s,t \text{ at locus } L)$
Replace strains s,t by merged pseudo-strains g,h –Add together probabilities for strains with the same allele –Phenotypic effect of merged strains g,h is $\tau(g,h)$
$v_{iL}(g,h) = \mathrm{Prob}(\text{animal } i \text{ is descended from merged strains } g,h \text{ at locus } L)$
Compare fits of nested models
$E(y_i) = \sum_{s,t} p_{iL}(s,t)\,T(s,t) + \mu_i$ (unmerged)
$E(y_i) = \sum_{g,h} v_{iL}(g,h)\,\tau(g,h) + \mu_i$ (merged)
$E(y_i) = \mu_i$ (null)
Require no significant difference between merged and unmerged models, –and for both to be significant compared to null model
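The merging step ("add together probabilities for strains with the same allele") can be sketched in a few lines. The strain names, the 4-founder example and the function name `merge_probabilities` are hypothetical; the HS itself has 8 founders.

```python
from collections import defaultdict

def merge_probabilities(p, sdp):
    """Collapse strain-pair probabilities p[(s, t)] into allele-pair probabilities.

    sdp maps each founder strain to the allele it carries at the variant.
    """
    v = defaultdict(float)
    for (s, t), prob in p.items():
        g, h = sorted((sdp[s], sdp[t]))  # unordered pair of merged pseudo-strains
        v[(g, h)] += prob                # add probabilities that share the same alleles
    return dict(v)

# Hypothetical example: 4 founders and a biallelic variant.
sdp = {"A": 0, "B": 0, "C": 1, "D": 1}
p = {("A", "A"): 0.1, ("A", "C"): 0.4, ("B", "D"): 0.3, ("C", "D"): 0.2}
v = merge_probabilities(p, sdp)
print(v)
```

Because merging only sums existing probabilities, the merged model is nested within the unmerged one, which is what allows the two fits to be compared.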

92 Merge Analysis Open Field Activity, Chr 1

93 Merge Analysis rgs18

94 Functional Merge Analysis Measure functional assay on HS founders –$F_L(s)$ is value at locus L on founder s –e.g. gene expression
Expected value in HS is $E(f_i) = \sum_{s,t} p_{iL}(s,t)\,[F(s) + F(t)]$ assuming additivity
If assay is related to phenotype y then $E(y_i) = \beta\,E(f_i) + \mu_i$
Compare nested models (thanks to Chris Holmes)
$E(y_i) = \sum_{s,t} p_{iL}(s,t)\,T(s,t) + \mu_i$ (unmerged)
$E(y_i) = \beta \sum_{s,t} p_{iL}(s,t)\,[F(s) + F(t)] + \mu_i$ (merged)
$E(y_i) = \mu_i$ (null)
Require no significant difference between merged and unmerged models, –and for both to be significant compared to null model
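The expected assay value for one HS animal is a probability-weighted sum over founder pairs. A minimal sketch, with hypothetical founder names, assay values and probabilities:

```python
def expected_assay(p, F):
    """E(f_i): sum over strain pairs of p(s, t) * [F(s) + F(t)], assuming additivity."""
    return sum(prob * (F[s] + F[t]) for (s, t), prob in p.items())

# Hypothetical founder assay values (e.g. expression levels) for three founders.
F = {"A": 1.0, "B": 2.0, "C": 4.0}
# Hypothetical strain-pair probabilities for one animal at one locus.
p = {("A", "A"): 0.5, ("A", "B"): 0.25, ("B", "C"): 0.25}

print(expected_assay(p, F))  # prints 3.25
```

Here 0.5 × 2 + 0.25 × 3 + 0.25 × 6 = 3.25; regressing the phenotype on this single predictor gives the merged model, which has one locus parameter instead of one per strain pair.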

95 Using Gene Expression in HS founders [Plot labels: model, difference, logp]

96 Future Work

97 Extensions to basic model Generalised linear models Multivariate data Mixture Models, EM (Chris Holmes) Family Effects, Variance Components, REML (Peter Visscher, Allan McRae) Gene Annotation Data (Kate Elliot) Multiple QTL models Epistasis Pleiotropy

