Complex Trait Genetics in Animal Models

Slides:



Advertisements
Similar presentations
The genetic dissection of complex traits
Advertisements

Planning breeding programs for impact
Why this paper Causal genetic variants at loci contributing to complex phenotypes unknown Rat/mice model organisms in physiology and diseases Relevant.
Gene by environment effects. Elevated Plus Maze (anxiety)
Experimental crosses. Inbred Strain Cross Backcross.
Qualitative and Quantitative traits
Meta-analysis for GWAS BST775 Fall DEMO Replication Criteria for a successful GWAS P
Basics of Linkage Analysis
QTL Mapping R. M. Sundaram.
A multi-phenotype protocol for fine scale mapping of QTL in outbred heterogeneous stock mice LC Solberg, C Arboledas, P Burns, S Davidson, G Nunez, A Taylor,
1 QTL mapping in mice Lecture 10, Statistics 246 February 24, 2004.
Statistical association of genotype and phenotype.
Genetic Traits Quantitative (height, weight) Dichotomous (affected/unaffected) Factorial (blood group) Mendelian - controlled by single gene (cystic fibrosis)
Quantitative Genetics
Quantitative Genetics
Review Session Monday, November 8 Shantz 242 E (the usual place) 5:00-7:00 PM I’ll answer questions on my material, then Chad will answer questions on.
Psy B07 Chapter 1Slide 1 ANALYSIS OF VARIANCE. Psy B07 Chapter 1Slide 2 t-test refresher  In chapter 7 we talked about analyses that could be conducted.
Modes of selection on quantitative traits. Directional selection The population responds to selection when the mean value changes in one direction Here,
Fall 2013 Lecture 5: Chapter 5 Statistical Analysis of Data …yes the “S” word.
QTL mapping in animals. It works QTL mapping in animals It works It’s cheap.
From QTL to QTG: Are we getting closer? Sagiv Shifman and Ariel Darvasi The Hebrew University of Jerusalem.
Research Project Statistical Analysis. What type of statistical analysis will I use to analyze my data? SEM (does not tell you level of significance)
Regulation of gene expression in the mammalian eye and its relevance to eye disease Todd Scheetz et al. Presented by John MC Ma.
Lecture 5: Chapter 5: Part I: pg Statistical Analysis of Data …yes the “S” word.
Bayesian MCMC QTL mapping in outbred mice Andrew Morris, Binnaz Yalcin, Jan Fullerton, Angela Meesaq, Rob Deacon, Nick Rawlins and Jonathan Flint Wellcome.
Quantitative Genetics. Continuous phenotypic variation within populations- not discrete characters Phenotypic variation due to both genetic and environmental.
Complex Traits Most neurobehavioral traits are complex Multifactorial
Quantitative Genetics
QTL Mapping in Heterogeneous Stocks Talbot et al, Nature Genetics (1999) 21: Mott et at, PNAS (2000) 97:
Statistics for Differential Expression Naomi Altman Oct. 06.
Lecture 3: Statistics Review I Date: 9/3/02  Distributions  Likelihood  Hypothesis tests.
QTL Cartographer A Program Package for finding Quantitative Trait Loci C. J. Basten Z.-B. Zeng and B. S. Weir.
Association between genotype and phenotype
Lecture 21: Quantitative Traits I Date: 11/05/02  Review: covariance, regression, etc  Introduction to quantitative genetics.
Practical With Merlin Gonçalo Abecasis. MERLIN Website Reference FAQ Source.
Lecture 22: Quantitative Traits II
Lecture 23: Quantitative Traits III Date: 11/12/02  Single locus backcross regression  Single locus backcross likelihood  F2 – regression, likelihood,
Chapter 22 - Quantitative genetics: Traits with a continuous distribution of phenotypes are called continuous traits (e.g., height, weight, growth rate,
Why you should know about experimental crosses. To save you from embarrassment.
Association Mapping in Families Gonçalo Abecasis University of Oxford.
Stats Methods at IC Lecture 3: Regression.
Identifying candidate genes for the regulation of the response to Trypanosoma congolense infection Introduction African cattle breeds differ significantly.
Dependent-Samples t-Test
MULTIPLE GENES AND QUANTITATIVE TRAITS
Identifying QTLs in experimental crosses
Why is this important? Requirement Understand research articles
Signatures of Selection
upstream vs. ORF binding and gene expression?
PCB 3043L - General Ecology Data Analysis.
Hypothesis Testing and Confidence Intervals (Part 1): Using the Standard Normal Lecture 8 Justin Kern October 10 and 12, 2017.
Mapping variation in growth in response to glucose concentration
Quantitative traits Lecture 13 By Ms. Shumaila Azam
Genome Wide Association Studies using SNP
Spring 2009: Section 5 – Lecture 1
Gene mapping in mice Karl W Broman Department of Biostatistics
PLANT BIOTECHNOLOGY & GENETIC ENGINEERING (3 CREDIT HOURS)
The same gene can have many versions.
Lecture 7: QTL Mapping I:
Statistical issues in QTL mapping in mice
Mapping Quantitative Trait Loci
MULTIPLE GENES AND QUANTITATIVE TRAITS
Genetic architecture of behaviour
The genomes of recombinant inbred lines
I. Statistical Tests: Why do we use them? What do they involve?
What are BLUP? and why they are useful?
Lecture 10: QTL Mapping II: Outbred Populations
Lecture 9: QTL Mapping II: Outbred Populations
Chapter 7 Beyond alleles: Quantitative Genetics
Cancer as a Complex Genetic Trait
MGS 3100 Business Analysis Regression Feb 18, 2016
Presentation transcript:

Complex Trait Genetics in Animal Models Will Valdar Oxford University

Mapping Genes for Quantitative Traits in Outbred Mice Will Valdar Oxford University

What’s so great about mice? Share ~99% of genes with humans ~90% of the two genomes can be portioned into regions of conserved synteny Shorter lifespans You can do invasive experiments You can breed them as you like – control the genetics

What is an inbred strain? F20 generation F = 99% BALB/c

http://www.informatics.jax.org/mgihome/genealogy/

one inbred strain

two inbred strains

Mouse model of anxiety Mice are not natural predators, they are prey to larger animals. And as such, they like small dark places. This is an open field arena, which is a brightly lit open box, which we would consider would be anxiety-producing to a mouse. We placed the mouse into this arena, and we use a video tracking device to monitor its behaviour.

Mouse model of anxiety Anxious mouse Non anxious mouse On the right, I show the movements of the mouse. The red lines indicate movement and the green spots indicate freezing positions. You can see that the mouse in the top panel is thoroughly exploring it’s new territory, moving all about the enclosure including the central area away from the relatively “safe” walls of the open field arena. On the bottom panel, you can see a completely contrasting behaviour. The mouse has entered through the door on the right hand side, and has frozen as soon as it has entered the arena, it is not exploring it’s new territory at all. An additional measure of anxiety is defecation.

F2 cross Generation F0 F1 F2

anxious about average non-anxious Suppose we have a quantitative phenotype that is influenced by a genetic variant. For example, say our normally distributed phenotype is affected by a QTL accounting for some small fraction of the variance, 10% or so. If we plotted a histogram of the phenotype, it would look something like this. However, assuming the QTL had two alleles and acted in an additive manner, we would in fact be looking at three populations. non-anxious

Quantitative Trait Locus anxious about average Those individuals homozygous for the low allele. non-anxious

Quantitative Trait Locus anxious about average Those who are homozygous for the high allele. non-anxious

Quantitative Trait Locus anxious about average And those who are heterozygous, and who tend to have an intermediate phenotype. non-anxious

Quantitative Trait Locus anxious about average Here are the three genotypes, and the first step is to encode these some way so they can be correlated with the phenotype. non-anxious QTL snp

Linear models Also known as ANOVA ANCOVA regression multiple regression linear regression To keep it general, we’ll be using linear models. ANOVA, ANCOVA, regression, mulitple regression and linear regression. These are different names that people use for linear models, but of course they’re all the same thing.

+1 -1 A simple way is to represent the snp by x and let x equal -1, 0 and +1 for the three genotypes in turn. QTL snp

+1 -1 Having done this, it’s natural to model the effect of the genotype on the phenotype as a linear model, where y, representing the phenotype is equal to mu plus ax plus episilon… QTL snp

+1 -1 mu defines the mean QTL snp

+1 -1 a then describes the average effect of adding or taking away copies of the high allele, and epsilon represents the error and accounts for the fact that animals with the same genotype may still have different phenotypes. QTL snp

Hypothesis testing H0: H1: Of course, the way we go about testing whether there is a significant effect of genotype is by comparing two models: one representing a null hypothesis and the other an alternative hypothesis… the phenotype is influenced by the mean plus some error vs the phenotype is influenced by the mean plus the genotype plus error.

Hypothesis testing H0: y ~ 1 H1: y ~ 1 + x And the way we write those formulas in R is like this. y twiddles 1, y is influenced by the mean, y twiddles 1 plus x.

Hypothesis testing H0: y ~ 1 H1: y ~ 1 + x H1 vs H0 : Does x explain a significant amount of the variation? Our model comparison amounts to asking the question “does x explain a significant amount of the variation” , or Does the larger model fit the data significantly better than the smaller model given the number of parameters required for that improvement. The comparison of the two models yeilds a likelihood ratio, which can be written as a LOD score if you like.

Hypothesis testing H0: y ~ 1 H1: y ~ 1 + x H1 vs H0 : Does x explain a significant amount of the variation? LOD score Our model comparison amounts to asking the question “does x explain a significant amount of the variation” , or Does the larger model fit the data significantly better than the smaller model given the number of parameters required for that improvement. The comparison of the two models yeilds a likelihood ratio, which can be written as a LOD score if you like. likelihood ratio

Hypothesis testing H0: y ~ 1 H1: y ~ 1 + x H1 vs H0 : Does x explain a significant amount of the variation? LOD score Or more meaningfully used in a chi-square test, yeilding a p-value, which we often find convenient to express as a on the log scale as a logP. likelihood ratio Chi Square test p-value logP

Hypothesis testing H0: y ~ 1 H1: y ~ 1 + x H1 vs H0 : Does x explain a significant amount of the variation? LOD score In the special case of linear models we could do a likelihood ratio test, but it turns out to be more accurate to use explained and unexplained sums of squares leading to an F-test, and to get our p-value and logP that way. likelihood ratio Chi Square test p-value logP linear models only SS explained / SS unexplained F-test (or t-test)

Hypothesis testing H0: y ~ 1 + x1 H1: y ~ 1 + x1 + x2 H1 vs H0 : Does x2 explain a significant amount of the variation after accounting for x1? or is x2 significant conditional on x1? Our model comparison amounts to asking the question “does x explain a significant amount of the variation” , or Does the larger model fit the data significantly better than the smaller model given the number of parameters required for that improvement. The comparison of the two models yeilds a likelihood ratio, which can be written as a LOD score if you like.

F2 cross Generation F0 F1 F2

QTL This is the chromosome where the QTL is and all those lines represent every snp on that chromosome. The marked snp is the QTL. 29

QTL logP If we perform our hypothesis test at every snp and plot them on a graph we get something like this. Snps that have no influence on the trait have low scores and the QTL snp has the highest score. But what you’ll also notice is that the snps around the QTL also have high scores. 30

highly significant snp QTL logP If we perform our hypothesis test at every snp and plot them on a graph we get something like this. Snps that have no influence on the trait have low scores and the QTL snp has the highest score. But what you’ll also notice is that the snps around the QTL also have high scores. highly significant snp non significant snp 31

QTL logP If we perform our hypothesis test at every snp and plot them on a graph we get something like this. Snps that have no influence on the trait have low scores and the QTL snp has the highest score. But what you’ll also notice is that the snps around the QTL also have high scores. 32

QTL logP Genotype every animal at snp 1 and compare the 33

Chromosome scan for F2 Typical chromosome QTL goodness of fit (logP) significance threshold 200Mb position along whole chromosome (Mb) Typical chromosome

Advanced intercross lines (AILs) F0 F1 F2 F3 F4 Darvasi & Soller (1995) Genetics

F12 cross F0 F1 F2 F12

Chromosome scan for F12 Typical chromosome QTL goodness of fit (logP) significance threshold 200Mb position along whole chromosome (Mb) Typical chromosome

Practical Fitting a linear model to test a marker-phenotype association Single marker association on an F2 Permutation test Single marker association on an AIL (F12) Conditional modelling of loci Start Firefox, File->Open and go to F:\valdar\ThursdayAfternoonAnimals\practical.R Start R

Practical: F2 cross Bonferroni = 2.6 permutation ~ 2.1 uncorrected = 1.3

Practical: F2 cross Bonferroni = 2.6 permutation ~ 2.1 uncorrected = 1.3 generalized extreme value (GEV) distribution

Practical: F12 phenotype ~ MARKER

Practical: F12 phenotype ~ MARKER phenotype ~ m37 + MARKER

Practical: F12 phenotype ~ MARKER phenotype ~ m37 + MARKER

Practical: F12 phenotype ~ MARKER phenotype ~ m37 + MARKER

Practical: F12 phenotype ~ MARKER phenotype ~ m37 + MARKER phenotype ~ m37 + m29 + MARKER

F2 F18 F18 multilocus approach

F2 F18 F18 multilocus approach

F2 breeding in a small population F18 F18 multilocus approach

F2 population structure gross genetic differences between groups where groups = families F18 multilocus approach

F2 population structure gross genetic differences between groups where groups = families multilocus approach

Heterogeneous Stocks The title of my talk refers to the mouse version of human association studies. In fact the project I’m going to describe could been seen as something in between an association study and an inbred line cross. The population we’ve used are the heterogeneous stock mice pictured here, also known as the HS. 51

Heterogeneous Stock Pseudo-random mating for 50 generations Avg. Distance Between Recombinations: HS is an outbred population large number of recombinants HS has 8 progenitors greater haplotypic diversity more likely QTLs will segregate HS ~2 cM

Heterogeneous Stock F2 Intercross x Pseudo-random mating for 50 generations F1 Avg. Distance Between Recombinations: HS is an outbred population large number of recombinants HS has 8 progenitors greater haplotypic diversity more likely QTLs will segregate HS ~2 cM F2 intercross ~30 cM F2

124 Phenotypes Anxiety [24] Asthma [13] Biochemistry [15] Bone Morphology [23] Diabetes [16] Haematology [15] Immunology [9] Weight/size related [8] Wound Healing [1] 54 54

Intraperitoneal Glucose Tolerance Test 55

How to select peaks: a simulated example Reallistic example

How to select peaks: a simulated example Simulate 7 x 5% QTLs (ie, 35% genetic effect) + 20% shared environment effect + 45% noise = 100% variance We’ve estimated 20% cage effects

Simulated example

Say forward selection phenotype ~ ?

condition on 1 peak phenotype ~ peak 1 + ?

condition on 2 peaks phenotype ~ peak 1 + peak 2 + ?

condition on 3 peaks phenotype ~ peak 1 + peak 2 + peak 3 + ?

condition on 4 peaks phenotype ~ peak 1 + peak 2 + peak 3 + peak 4 + ?

condition on 5 peaks phenotype ~ peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + ?

condition on 6 peaks phenotype ~ peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + ?

condition on 7 peaks phenotype ~ peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + ?

condition on 8 peaks phenotype ~ peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + ?

condition on 9 peaks phenotype ~ peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + peak 9 + ?

condition on 10 peaks phenotype ~ peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + peak 9 + peak 10 + ?

condition on 11 peaks phenotype ~ peak 1 + peak 2 + peak 3 + peak 4 + peak 5 + peak 6 + peak 7 + peak 8 + peak 9 + peak 10 + peak 11 + ?

Peaks chosen by forward selection We have recorded all the QTLs in this simulation, but some false positives. Also in other simulations we missed some. How can we tell which peaks are genuine? One way to think about this sis suppose we had chosen a slightly different set of HS mice: the peak heights would have been different and our forward selection would have chosen slightly different set of peaks.

Bootstrap sampling 1 2 3 4 10 subjects 5 6 7 8 9 10 To deal with this we’ve taken a bootstrapping approach. Let me explain what this is.

Bootstrap sampling sample with replacement 1 2 3 4 5 6 7 8 9 10 1 2 3 bootstrap sample from 10 subjects 10 subjects So what I will do is create a bootstrap of the 2000 mice and repeat the forward selection

Forward selection on a bootstrap sample

Forward selection on a bootstrap sample

Forward selection on a bootstrap sample

Bootstrap evidence mounts up…

In 1000 bootstraps… Model Inclusion Probability

854 loci in all phenotypes, 84 diabetes loci

854 loci in all phenotypes, 84 diabetes loci

Servin B, Stephens M (2007)

Bayesian Multiple QTL modelling Kilpikari R, Sillanpaa MJ (2003) Bayesian analysis of multilocus association in quantitative and qualitative traits. Genet Epidemiol 25: 122-135 Yi N (2004) A unified Markov chain Monte Carlo framework for mapping multiple quantitative trait loci. Genetics 167: 967-975 Servin B, Stephens M (2007) Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet 3: e114 Fridley BL (2008) Bayesian variable and model selection methods for genetic association studies. Genet Epidemiol.

The Collaborative Cross Heterogeneous Stocks (HS) Collaborative Cross outbreed outbred population recombinant inbred lines Churchill et al 2004; Broman 2005; Valdar et al 2006