Complex Trait Genetics in Animal Models

Will Valdar Oxford University

Mapping Genes for Quantitative Traits in Outbred Mice
Will Valdar Oxford University

What’s so great about mice?
Share ~99% of genes with humans
~90% of the two genomes can be partitioned into regions of conserved synteny
Shorter lifespans
You can do invasive experiments
You can breed them as you like: control the genetics

What is an inbred strain?
F20 generation F = 99% BALB/c

one inbred strain

two inbred strains

Mouse model of anxiety
Mice are not natural predators; they are prey to larger animals, and as such they like small, dark places. This is an open field arena, a brightly lit open box that we would expect to be anxiety-producing for a mouse. We place the mouse in this arena and use a video tracking device to monitor its behaviour.

Mouse model of anxiety Anxious mouse Non anxious mouse
On the right, I show the movements of the mouse. The red lines indicate movement and the green spots indicate freezing positions. You can see that the mouse in the top panel is thoroughly exploring its new territory, moving all about the enclosure, including the central area away from the relatively "safe" walls of the open field arena. In the bottom panel, you can see a completely contrasting behaviour. The mouse has entered through the door on the right-hand side and has frozen as soon as it entered the arena; it is not exploring its new territory at all. An additional measure of anxiety is defecation.

F2 cross Generation F0 F1 F2

[Figure: histogram of the phenotype; the axis runs from anxious through about average to non-anxious.]
Suppose we have a quantitative phenotype that is influenced by a genetic variant. For example, say our normally distributed phenotype is affected by a QTL accounting for some small fraction of the variance, 10% or so. If we plotted a histogram of the phenotype, it would look something like this. However, assuming the QTL had two alleles and acted in an additive manner, we would in fact be looking at three populations.

Quantitative Trait Locus
Those individuals homozygous for the low allele.
Those who are homozygous for the high allele.
And those who are heterozygous, and who tend to have an intermediate phenotype.

Here are the three genotypes at the QTL SNP, and the first step is to encode these in some way so that they can be correlated with the phenotype.

Linear models
Also known as: ANOVA, ANCOVA, regression, multiple regression, linear regression
To keep it general, we'll be using linear models. ANOVA, ANCOVA, regression, multiple regression and linear regression are different names that people use for linear models, but of course they're all the same thing.

[Figure: the three genotypes at the QTL SNP, coded -1, 0 and +1.]
A simple way is to represent the SNP by x and let x equal -1, 0 and +1 for the three genotypes in turn.

Having done this, it's natural to model the effect of the genotype on the phenotype as a linear model, where y, representing the phenotype, is equal to mu plus a times x plus epsilon:
y = μ + ax + ε

μ defines the mean.

a then describes the average effect of adding or taking away copies of the high allele, and ε represents the error and accounts for the fact that animals with the same genotype may still have different phenotypes.
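As a minimal sketch of the model just described, the following fits y = μ + ax + ε by ordinary least squares using only the Python standard library. The function name `fit_additive` and the toy genotype and phenotype values are hypothetical illustrations, not data from the talk.

```python
from statistics import fmean


def fit_additive(x, y):
    """Least-squares estimates (mu, a) for the model y = mu + a*x + error."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a = sxy / sxx              # effect of one extra copy of the high allele
    mu = y_bar - a * x_bar     # intercept: expected phenotype when x = 0
    return mu, a


# Hypothetical toy data: genotypes coded -1, 0, +1 and one phenotype per animal.
genotype = [-1, -1, 0, 0, 0, 0, 1, 1]
phenotype = [8.0, 9.0, 10.0, 9.5, 10.5, 10.0, 11.0, 12.0]

mu_hat, a_hat = fit_additive(genotype, phenotype)
print(f"mu = {mu_hat:.2f}, a = {a_hat:.2f}")
```

Note that μ is the intercept, that is, the expected phenotype of a heterozygote (x = 0); it coincides with the overall phenotype mean only when the genotype codes average to zero, as they do in this toy example.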

Hypothesis testing
H0: y = μ + ε
H1: y = μ + ax + ε
Of course, the way we go about testing whether there is a significant effect of genotype is by comparing two models, one representing a null hypothesis and the other an alternative hypothesis: the phenotype is influenced by the mean plus some error, versus the phenotype is influenced by the mean plus the genotype plus error.

Hypothesis testing H0: y ~ 1 H1: y ~ 1 + x
And the way we write those formulas in R is like this. y twiddles 1, y is influenced by the mean, y twiddles 1 plus x.

H1 vs H0: Does x explain a significant amount of the variation?
Our model comparison amounts to asking the question "does x explain a significant amount of the variation?", or: does the larger model fit the data significantly better than the smaller model, given the number of parameters required for that improvement? The comparison of the two models yields a likelihood ratio, which can be written as a LOD score if you like.

Or, more meaningfully, it can be used in a chi-square test, yielding a p-value, which we often find convenient to express on the log scale as a logP.

In the special case of linear models we could do a likelihood ratio test, but it turns out to be more accurate to use explained and unexplained sums of squares, leading to an F-test, and to get our p-value and logP that way.

likelihood ratio -> LOD score
likelihood ratio -> chi-square test -> p-value -> logP
linear models only: SS explained / SS unexplained -> F-test (or t-test) -> p-value -> logP
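The chain from likelihood ratio to LOD score, p-value and logP can be sketched as follows for the single-SNP case. This is an illustration under stated assumptions: errors are taken to be normal, so the maximised likelihood ratio is (RSS0/RSS1)^(n/2); logP is taken to mean -log10(p), which matches the uncorrected value of 1.3 quoted later in the practical; and only the F statistic is computed, because its p-value needs an F distribution that the Python standard library does not provide. The function names and toy data are hypothetical.

```python
import math
from statistics import fmean


def rss_null(y):
    """Residual sum of squares for H0: y ~ 1."""
    y_bar = fmean(y)
    return sum((yi - y_bar) ** 2 for yi in y)


def rss_additive(x, y):
    """Residual sum of squares for H1: y ~ 1 + x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a = sxy / sxx
    return sum(((yi - y_bar) - a * (xi - x_bar)) ** 2 for xi, yi in zip(x, y))


def compare_models(x, y):
    n = len(y)
    rss0, rss1 = rss_null(y), rss_additive(x, y)
    lrt = n * math.log(rss0 / rss1)          # 2 * ln(likelihood ratio)
    lod = lrt / (2 * math.log(10))           # log10(likelihood ratio)
    p_chi2 = math.erfc(math.sqrt(lrt / 2))   # chi-square upper tail, 1 df
    f_stat = (rss0 - rss1) / (rss1 / (n - 2))  # SS explained / mean SS unexplained
    return {"lod": lod, "p_chi2": p_chi2,
            "logp": -math.log10(p_chi2), "f_stat": f_stat}


# Hypothetical toy data: genotypes coded -1, 0, +1.
genotype = [-1, -1, 0, 0, 0, 0, 1, 1]
phenotype = [8.0, 9.0, 10.0, 9.5, 10.5, 10.0, 11.0, 12.0]
result = compare_models(genotype, phenotype)
print(result)
```

With one extra parameter in H1, the F statistic has 1 and n - 2 degrees of freedom and equals the square of the t statistic for a, which is why the slide lists the t-test as an alternative.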

Hypothesis testing H0: y ~ 1 + x1 H1: y ~ 1 + x1 + x2
H1 vs H0: Does x2 explain a significant amount of the variation after accounting for x1? Or: is x2 significant conditional on x1?
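A conditional test of this kind can be sketched by comparing the residual sums of squares of the two nested models, y ~ 1 + x1 and y ~ 1 + x1 + x2. The helper `rss`, which solves the normal equations by Gaussian elimination, and the toy data are hypothetical; in R the same comparison would normally be done with fitted model objects instead.

```python
def rss(y, columns):
    """Residual sum of squares of y regressed on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [c[i] for c in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations A b = c_vec.
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
         for j in range(p)]
    c_vec = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    # Gaussian elimination with partial pivoting.
    for j in range(p):
        piv = max(range(j, p), key=lambda r: abs(A[r][j]))
        A[j], A[piv] = A[piv], A[j]
        c_vec[j], c_vec[piv] = c_vec[piv], c_vec[j]
        for r in range(j + 1, p):
            f = A[r][j] / A[j][j]
            for k in range(j, p):
                A[r][k] -= f * A[j][k]
            c_vec[r] -= f * c_vec[j]
    # Back substitution.
    b = [0.0] * p
    for j in reversed(range(p)):
        b[j] = (c_vec[j] - sum(A[j][k] * b[k] for k in range(j + 1, p))) / A[j][j]
    return sum((y[i] - sum(X[i][j] * b[j] for j in range(p))) ** 2
               for i in range(n))


def conditional_f(y, x1, x2):
    """F statistic for adding x2 to a model that already contains x1."""
    n = len(y)
    rss0 = rss(y, [x1])        # H0: y ~ 1 + x1
    rss1 = rss(y, [x1, x2])    # H1: y ~ 1 + x1 + x2
    return (rss0 - rss1) / (rss1 / (n - 3))


# Hypothetical toy data: two SNPs coded -1, 0, +1, both affecting y.
x1 = [-1, -1, 0, 0, 0, 1, 1, 0, -1, 1]
x2 = [0, 1, -1, 0, 1, -1, 0, 1, -1, 0]
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.2, -0.05, 0.1, -0.15, 0.05]
y = [10 + a + 0.8 * b + e for a, b, e in zip(x1, x2, noise)]

f_x2_given_x1 = conditional_f(y, x1, x2)
print(f"F for x2 conditional on x1: {f_x2_given_x1:.1f}")
```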

F2 cross Generation F0 F1 F2

[Figure: chromosome with every SNP drawn as a line; the QTL SNP is marked.]
This is the chromosome where the QTL is, and all those lines represent every SNP on that chromosome. The marked SNP is the QTL.

[Figure: logP at each SNP along the chromosome; labels: highly significant SNP, non-significant SNP.]
If we perform our hypothesis test at every SNP and plot the results on a graph, we get something like this. SNPs that have no influence on the trait have low scores, and the QTL SNP has the highest score. But what you'll also notice is that the SNPs around the QTL also have high scores.

QTL logP Genotype every animal at snp 1 and compare the

Chromosome scan for F2
[Figure: scan of a typical chromosome. y-axis: goodness of fit (logP); x-axis: position along whole chromosome (Mb), spanning 200 Mb. The QTL and the significance threshold are marked.]

Advanced intercross lines (AILs)
F0 F1 F2 F3 F4 Darvasi & Soller (1995) Genetics

F12 cross F0 F1 F2 F12

Chromosome scan for F12
[Figure: scan of a typical chromosome. y-axis: goodness of fit (logP); x-axis: position along whole chromosome (Mb), spanning 200 Mb. The QTL and the significance threshold are marked.]

Practical
Fitting a linear model to test a marker-phenotype association
Single marker association on an F2
Permutation test
Single marker association on an AIL (F12)
Conditional modelling of loci
Start Firefox, File->Open and go to F:\valdar\ThursdayAfternoonAnimals\practical.R
Start R

Practical: F2 cross
Significance thresholds (logP): Bonferroni = 2.6, permutation ~ 2.1, uncorrected = 1.3
generalized extreme value (GEV) distribution
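The three kinds of threshold can be sketched as below. Several things here are assumptions rather than details of the practical: logP is taken as -log10(p); the marker count of 20 is hypothetical, chosen only because -log10(0.05/20) is about 2.6; the genotypes are simulated with a crude neighbour-copying rule to mimic linkage; p-values come from a chi-square approximation to the likelihood ratio test; and the permutation threshold is the empirical 95th percentile of the per-permutation maximum, with no GEV distribution fitted.

```python
import math
import random
from statistics import fmean


def logp(x, y):
    """-log10 p-value for y ~ 1 + x versus y ~ 1 (chi-square LRT, 1 df)."""
    n = len(y)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r2 = sxy * sxy / (sxx * syy)
    lrt = -n * math.log(1.0 - r2)           # n * ln(RSS0 / RSS1)
    p = max(math.erfc(math.sqrt(lrt / 2)), 1e-300)
    return -math.log10(p)


def permutation_threshold(markers, y, n_perm, alpha, rng):
    """Empirical (1 - alpha) quantile of the maximum logP over all markers."""
    y_perm = list(y)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(y_perm)                 # break any marker-phenotype link
        maxima.append(max(logp(m, y_perm) for m in markers))
    maxima.sort()
    return maxima[math.ceil((1 - alpha) * n_perm) - 1]


rng = random.Random(1)
n_animals, n_markers = 100, 20              # hypothetical sizes

# Hypothetical linked markers: each animal keeps its previous genotype with
# probability 0.9, otherwise draws a fresh F2-like genotype (-1, 0, +1).
def draw():
    return rng.choices([-1, 0, 1], weights=[1, 2, 1])[0]

markers = [[draw() for _ in range(n_animals)]]
for _ in range(n_markers - 1):
    prev = markers[-1]
    markers.append([g if rng.random() < 0.9 else draw() for g in prev])

phenotype = [rng.gauss(0, 1) for _ in range(n_animals)]

uncorrected = -math.log10(0.05)
bonferroni = -math.log10(0.05 / n_markers)
permuted = permutation_threshold(markers, phenotype, 200, 0.05, rng)
print(f"uncorrected {uncorrected:.2f}, Bonferroni {bonferroni:.2f}, "
      f"permutation {permuted:.2f}")
```

Because neighbouring markers are correlated, the tests are not independent, so a permutation threshold usually falls between the uncorrected and Bonferroni values, as it does on the slide.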

Practical: F12
phenotype ~ MARKER
phenotype ~ m37 + MARKER
phenotype ~ m37 + m29 + MARKER

F2, F18
breeding in a small population
population structure: gross genetic differences between groups, where groups = families
multilocus approach

Heterogeneous Stocks
The title of my talk refers to the mouse version of human association studies. In fact, the project I'm going to describe could be seen as something in between an association study and an inbred line cross. The population we've used is the heterogeneous stock mice pictured here, also known as the HS.

Heterogeneous Stock compared with F2 Intercross
Heterogeneous Stock: pseudo-random mating for 50 generations
HS is an outbred population: large number of recombinants
HS has 8 progenitors: greater haplotypic diversity, more likely QTLs will segregate
Avg. distance between recombinations: HS ~2 cM; F2 intercross ~30 cM

124 Phenotypes
Anxiety [24]
Asthma [13]
Biochemistry [15]
Bone Morphology [23]
Diabetes [16]
Haematology [15]
Immunology [9]
Weight/size related [8]
Wound Healing [1]

Intraperitoneal Glucose Tolerance Test

How to select peaks: a simulated example
Realistic example

How to select peaks: a simulated example
Simulate:
7 x 5% QTLs (i.e., 35% genetic effect)
+ 20% shared environment effect
+ 45% noise
= 100% variance
We've estimated 20% cage effects
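One way to build a phenotype with this variance breakdown is sketched below. The genetic model is an assumption made for illustration: each QTL is an independent SNP coded -1, 0, +1 with F2-like frequencies (genotype variance 0.5), which is simpler than real HS genotypes. The cage size of 5 is hypothetical; the 2000 animals follow the number of mice mentioned for the bootstrap.

```python
import math
import random
from statistics import pvariance


def simulate(n_animals, cage_size, rng):
    """Simulate phenotypes with 7 x 5% QTLs, 20% cage effect and 45% noise."""
    n_qtl = 7
    geno_var = 0.5                          # variance of a -1/0/+1 F2-like SNP
    effect = math.sqrt(0.05 / geno_var)     # each QTL explains 5% of variance
    n_cages = math.ceil(n_animals / cage_size)
    cage_effect = [rng.gauss(0, math.sqrt(0.20)) for _ in range(n_cages)]

    genotypes, phenotypes = [], []
    for i in range(n_animals):
        g = rng.choices([-1, 0, 1], weights=[1, 2, 1], k=n_qtl)
        y = (effect * sum(g)                        # 35% genetic
             + cage_effect[i // cage_size]          # 20% shared environment
             + rng.gauss(0, math.sqrt(0.45)))       # 45% noise
        genotypes.append(g)
        phenotypes.append(y)
    return genotypes, phenotypes


rng = random.Random(42)
genotypes, phenotypes = simulate(n_animals=2000, cage_size=5, rng=rng)
print(f"total phenotypic variance ~ {pvariance(phenotypes):.2f} (target 1.00)")
```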

Simulated example

Say forward selection phenotype ~ ?

condition on 1 peak phenotype ~ peak 1 + ?

condition on 2 peaks phenotype ~ peak 1 + peak 2 + ?

condition on 3 peaks phenotype ~ peak 1 + peak 2 + peak 3 + ?

condition on 4 peaks phenotype ~ peak 1 + peak 2 + peak 3 + peak 4 + ?

condition on 5 peaks phenotype ~ peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + ?

condition on 6 peaks phenotype ~ peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + ?

condition on 7 peaks phenotype ~ peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + ?

condition on 8 peaks phenotype ~ peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + ?

condition on 9 peaks phenotype ~ peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + peak 9 + ?

condition on 10 peaks phenotype ~ peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + peak 9 + peak 10 + ?

condition on 11 peaks phenotype ~ peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + peak 9 + peak 10 + peak 11 + ?
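The sequence of slides above can be sketched as a loop: at each step, every remaining marker takes the place of the "?" in the formula, the best one is added as the next peak, and the scan is repeated conditional on the peaks chosen so far. The stopping rule (an F threshold of 15 and a cap on the number of peaks), the helper names and the simulated data are all hypothetical; the talk does not state how the selection was stopped.

```python
import random


def rss(y, columns):
    """Residual sum of squares of y regressed on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [c[i] for c in columns] for i in range(n)]
    p = len(X[0])
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
         for j in range(p)]
    c_vec = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    for j in range(p):                       # Gaussian elimination
        piv = max(range(j, p), key=lambda r: abs(A[r][j]))
        A[j], A[piv] = A[piv], A[j]
        c_vec[j], c_vec[piv] = c_vec[piv], c_vec[j]
        for r in range(j + 1, p):
            f = A[r][j] / A[j][j]
            for k in range(j, p):
                A[r][k] -= f * A[j][k]
            c_vec[r] -= f * c_vec[j]
    b = [0.0] * p
    for j in reversed(range(p)):             # back substitution
        b[j] = (c_vec[j] - sum(A[j][k] * b[k] for k in range(j + 1, p))) / A[j][j]
    return sum((y[i] - sum(X[i][j] * b[j] for j in range(p))) ** 2
               for i in range(n))


def forward_select(y, markers, f_threshold, max_peaks):
    """Greedily add the marker with the largest conditional F statistic."""
    n = len(y)
    selected = []
    current_rss = rss(y, [])                 # phenotype ~ 1
    while len(selected) < max_peaks:
        best = None                          # (F, marker index, RSS)
        for j in range(len(markers)):
            if j in selected:
                continue
            cols = [markers[k] for k in selected] + [markers[j]]
            new_rss = rss(y, cols)           # phenotype ~ peaks so far + marker j
            f = (current_rss - new_rss) / (new_rss / (n - len(cols) - 1))
            if best is None or f > best[0]:
                best = (f, j, new_rss)
        if best is None or best[0] < f_threshold:
            break
        selected.append(best[1])
        current_rss = best[2]
    return selected


# Hypothetical data: 10 independent markers, two of which affect the phenotype.
rng = random.Random(0)
n = 200
markers = [[rng.choice([-1, 0, 0, 1]) for _ in range(n)] for _ in range(10)]
y = [markers[2][i] + markers[7][i] + rng.gauss(0, 0.5) for i in range(n)]

peaks = forward_select(y, markers, f_threshold=15.0, max_peaks=5)
print("selected markers:", peaks)
```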

Peaks chosen by forward selection
We have recovered all the QTLs in this simulation, but also picked up some false positives. Also, in other simulations we missed some. How can we tell which peaks are genuine? One way to think about this is to suppose we had chosen a slightly different set of HS mice: the peak heights would have been different, and our forward selection would have chosen a slightly different set of peaks.

Bootstrap sampling
[Figure: 10 subjects, numbered 1 to 10.]
To deal with this we've taken a bootstrapping approach. Let me explain what this is.

Bootstrap sampling
[Figure: the 10 subjects are sampled with replacement to give a bootstrap sample from 10 subjects.]
So what I will do is create a bootstrap sample of the 2000 mice and repeat the forward selection.

Forward selection on a bootstrap sample

Bootstrap evidence mounts up…

In 1000 bootstraps… Model Inclusion Probability
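A model inclusion probability of this kind can be sketched as the fraction of bootstrap samples in which a marker is selected. To keep the example short, the selection step is passed in as a function, and the stand-in used here simply picks the single best marker; in the analysis described in the talk that step would be the full forward selection. All names, sizes and data below are hypothetical.

```python
import random
from collections import Counter
from statistics import fmean


def r_squared(x, y):
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:                 # monomorphic marker or constant y
        return 0.0
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy * sxy / (sxx * syy)


def best_single_marker(y, markers):
    """Stand-in selection step: return the index of the best single marker."""
    return [max(range(len(markers)), key=lambda j: r_squared(markers[j], y))]


def inclusion_probabilities(y, markers, select, n_boot, rng):
    """Fraction of bootstrap samples in which each marker is selected."""
    n = len(y)
    counts = Counter()
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        y_b = [y[i] for i in idx]
        markers_b = [[m[i] for i in idx] for m in markers]
        counts.update(set(select(y_b, markers_b)))
    return {j: counts[j] / n_boot for j in range(len(markers))}


# Hypothetical data: 5 independent markers, only marker 3 affects the phenotype.
rng = random.Random(7)
n = 100
markers = [[rng.choice([-1, 0, 0, 1]) for _ in range(n)] for _ in range(5)]
y = [markers[3][i] + rng.gauss(0, 1) for i in range(n)]

probs = inclusion_probabilities(y, markers, best_single_marker, 200, rng)
print(probs)
```

A marker that is selected in nearly every bootstrap sample is unlikely to owe its peak to the particular animals that happened to be sampled, which is the question the bootstrap is meant to answer.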

854 loci in all phenotypes, 84 diabetes loci

Servin B, Stephens M (2007)

Bayesian Multiple QTL modelling
Kilpikari R, Sillanpaa MJ (2003) Bayesian analysis of multilocus association in quantitative and qualitative traits. Genet Epidemiol 25:
Yi N (2004) A unified Markov chain Monte Carlo framework for mapping multiple quantitative trait loci. Genetics 167:
Servin B, Stephens M (2007) Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet 3: e114
Fridley BL (2008) Bayesian variable and model selection methods for genetic association studies. Genet Epidemiol.

The Collaborative Cross
Heterogeneous Stocks (HS): outbreed, outbred population
Collaborative Cross: recombinant inbred lines
Churchill et al 2004; Broman 2005; Valdar et al 2006
