Statistical issues in QTL mapping in mice

Slides:



Advertisements
Similar presentations
15 The Genetic Basis of Complex Inheritance
Advertisements

The genetic dissection of complex traits
Planning breeding programs for impact
Introduction Materials and methods SUBJECTS : Balb/cJ and C57BL/6J inbred mouse strains, and inbred fruit fly strains number 11 and 70 from the recombinant.
Gene by environment effects. Elevated Plus Maze (anxiety)
Experimental crosses. Inbred Strain Cross Backcross.
Tutorial #2 by Ma’ayan Fishelson. Crossing Over Sometimes in meiosis, homologous chromosomes exchange parts in a process called crossing-over. New combinations.
Basics of Linkage Analysis
QTL Mapping R. M. Sundaram.
1 15 The Genetic Basis of Complex Inheritance. 2 Multifactorial Traits Multifactorial traits are determined by multiple genetic and environmental factors.
1 QTL mapping in mice Lecture 10, Statistics 246 February 24, 2004.
Quantitative Genetics
1 QTL mapping in mice, cont. Lecture 11, Statistics 246 February 26, 2004.
Karl W Broman Department of Biostatistics Johns Hopkins University Gene mapping in model organisms.
Quantitative Genetics
Linkage and LOD score Egmond, 2006 Manuel AR Ferreira Massachusetts General Hospital Harvard Medical School Boston.
Lecture 5: Segregation Analysis I Date: 9/10/02  Counting number of genotypes, mating types  Segregation analysis: dominant, codominant, estimating segregation.
QTL mapping in animals. It works QTL mapping in animals It works It’s cheap.
Genetic Mapping Oregon Wolfe Barley Map (Szucs et al., The Plant Genome 2, )
1 How many genes? Mapping mouse traits, cont. Lecture 3, Statistics 246 January 27, 2004.
Class 3 1. Construction of genetic maps 2. Single marker QTL analysis 3. QTL cartographer.
Experimental Design and Data Structure Supplement to Lecture 8 Fall
Quantitative Genetics. Continuous phenotypic variation within populations- not discrete characters Phenotypic variation due to both genetic and environmental.
Complex Traits Most neurobehavioral traits are complex Multifactorial
Quantitative Genetics
QTL Mapping in Heterogeneous Stocks Talbot et al, Nature Genetics (1999) 21: Mott et at, PNAS (2000) 97:
Lecture 4: Statistics Review II Date: 9/5/02  Hypothesis tests: power  Estimation: likelihood, moment estimation, least square  Statistical properties.
Genetic design. Testing Mendelian segregation Consider marker A with two alleles A and a BackcrossF 2 AaaaAAAaaa Observationn 1 n 0 n 2 n 1 n 0 Expected.
Lecture 12: Linkage Analysis V Date: 10/03/02  Least squares  An EM algorithm  Simulated distribution  Marker coverage and density.
Lecture 3: Statistics Review I Date: 9/3/02  Distributions  Likelihood  Hypothesis tests.
1 How many genes? Mapping mouse traits Lecture 1, Statistics 246 January 20, 2004.
Association between genotype and phenotype
Lecture 21: Quantitative Traits I Date: 11/05/02  Review: covariance, regression, etc  Introduction to quantitative genetics.
Errors in Genetic Data Gonçalo Abecasis. Errors in Genetic Data Pedigree Errors Genotyping Errors Phenotyping Errors.
Practical With Merlin Gonçalo Abecasis. MERLIN Website Reference FAQ Source.
The genomes of recombinant inbred lines
Lecture 22: Quantitative Traits II
Lecture 23: Quantitative Traits III Date: 11/12/02  Single locus backcross regression  Single locus backcross likelihood  F2 – regression, likelihood,
Chapter 22 - Quantitative genetics: Traits with a continuous distribution of phenotypes are called continuous traits (e.g., height, weight, growth rate,
Why you should know about experimental crosses. To save you from embarrassment.
1 Genetic Mapping Establishing relative positions of genes along chromosomes using recombination frequencies Enables location of important disease genes.
Types of genome maps Physical – based on bp Genetic/ linkage – based on recombination from Thomas Hunt Morgan's 1916 ''A Critique of the Theory of Evolution'',
1 QTL mapping in mice, completed. Lecture 12, Statistics 246 March 2, 2004.
Bayesian Variable Selection in Semiparametric Regression Modeling with Applications to Genetic Mappping Fei Zou Department of Biostatistics University.
Identifying QTLs in experimental crosses
Complex Trait Genetics in Animal Models
The genomes of recombinant inbred lines
upstream vs. ORF binding and gene expression?
Statistical Methods for Quantitative Trait Loci (QTL) Mapping
Model Selection for Multiple QTL
Genome Wide Association Studies using SNP
Recombination and Linkage
Relationship between quantitative trait inheritance and
Gene mapping in mice Karl W Broman Department of Biostatistics
15 The Genetic Basis of Complex Inheritance
Genes may be linked or unlinked and are inherited accordingly.
The breakpoint process on RI chromosomes
The breakpoint process on RI chromosomes
Mapping Quantitative Trait Loci
Statistical Methods for Quantitative Trait Loci (QTL) Mapping II
Genome-wide Association Studies
The genomes of recombinant inbred lines
Inferring Genetic Architecture of Complex Biological Processes Brian S
What are BLUP? and why they are useful?
Lecture 9: QTL Mapping II: Outbred Populations
Super computing for “classical” genetics
The genomes of recombinant inbred lines
Chapter 7 Beyond alleles: Quantitative Genetics
Completion and analysis of Punnett squares for dihybrid traits
Quantitative Trait Locus (QTL) Mapping
Presentation transcript:

Statistical issues in QTL mapping in mice Karl W Broman Department of Biostatistics Johns Hopkins University http://www.biostat.jhsph.edu/~kbroman

Outline Overview of QTL mapping The X chromosome Mapping multiple QTLs Recombinant inbred lines Heterogeneous stock and 8-way RILs

The intercross

The data Phenotypes, yi Genotypes, xij = AA/AB/BB, at genetic markers A genetic map, giving the locations of the markers.

Goals Identify genomic regions (QTLs) that contribute to variation in the trait. Obtain interval estimates of the QTL locations. Estimate the effects of the QTLs.

Phenotypes 133 females (NOD  B6)  (NOD  B6)

NOD

C57BL/6

Agouti coat

Genetic map

Genotype data

Goals Identify genomic regions (QTLs) that contribute to variation in the trait. Obtain interval estimates of the QTL locations. Estimate the effects of the QTLs.

Statistical structure Missing data: markers  QTL Model selection: genotypes  phenotype

Models: recombination No crossover interference Locations of breakpoints according to a Poisson process. Genotypes along chromosome follow a Markov chain. Clearly wrong, but super convenient.

E(y | g) = (g1,…,gp) SD(y | g) = (g1,…,gp) Models: gen  phe Phenotype = y, whole-genome genotype = g Imagine that p sites are all that matter. E(y | g) = (g1,…,gp) SD(y | g) = (g1,…,gp) Simplifying assumptions: SD(y | g) = , independent of g y | g ~ normal( (g1,…,gp),  ) (g1,…,gp) =  + ∑ j 1{gj = AB} + j 1{gj = BB}

Interval mapping Lander and Botstein 1989 Imagine that there is a single QTL, at position z. Let qi = genotype of mouse i at the QTL, and assume yi | qi ~ normal( (qi),  ) We won’t know qi, but we can calculate pig = Pr(qi = g | marker data) yi, given the marker data, follows a mixture of normal distributions with known mixing proportions (the pig). Use an EM algorithm to get MLEs of  = (AA, AB, BB, ). Measure the evidence for a QTL via the LOD score, which is the log10 likelihood ratio comparing the hypothesis of a single QTL at position z to the hypothesis of no QTL anywhere.

LOD curves

LOD thresholds To account for the genome-wide search, compare the observed LOD scores to the distribution of the maximum LOD score, genome-wide, that would be obtained if there were no QTL anywhere. The 95th percentile of this distribution is used as a significance threshold. Such a threshold may be estimated via permutations (Churchill and Doerge 1994).

Permutation test Shuffle the phenotypes relative to the genotypes. Calculate M* = max LOD*, with the shuffled data. Repeat many times. LOD threshold = 95th percentile of M*. P-value = Pr(M* ≥ M)

Permutation distribution

Chr 9 and 11

Non-normal traits

Non-normal traits Standard interval mapping assumes that the residual variation is normally distributed (and so the phenotype distribution follows a mixture of normal distributions). In reality: we see binary traits, counts, skewed distributions, outliers, and all sorts of odd things. Interval mapping, with LOD thresholds derived via permutation tests, often performs fine anyway. Alternatives to consider: Nonparametric linkage analysis (Kruglyak and Lander 1995). Transformations (e.g., log or square root). Specially-tailored models (e.g., a generalized linear model, the Cox proportional hazards model, the model of Broman 2003).

Split by sex

Split by sex

Split by parent-of-origin

Split by parent-of-origin Percent of individuals with phenotype Genotype at D15Mit252 Genotype at D19Mit59 P-O-O AA AB Dad 63% 54% 75% 43% Mom 57% 23% 38% 40%

The X chromosome (N  B)  (N  B) (B  N)  (B  N)

The X chromosome BB  BY? NN  NY? Different “degrees of freedom” Autosome NN : NB : BB Females, one direction NN : NB Both sexes, both dir. NY : NN : NB : BB : BY  Need an X-chr-specific LOD threshold. “Null model” should include a sex effect.

Chr 9 and 11

Epistasis

Going after multiple QTLs Greater ability to detect QTLs. Separate linked QTLs. Learn about interactions between QTLs (epistasis).

Miss important loci  include extraneous loci. Model selection Choose a class of models. Additive; pairwise interactions; regression trees Fit a model (allow for missing genotype data). Linear regression; ML via EM; Bayes via MCMC Search model space. Forward/backward/stepwise selection; MCMC Compare models. BIC() = log L() + (/2) || log n Miss important loci  include extraneous loci.

Special features Relationship among the covariates. Missing covariate information. Identify the key players vs. minimize prediction error.

Opportunities for improvements Each individual is unique. Must genotype each mouse. Unable to obtain multiple invasive phenotypes (e.g., in multiple environmental conditions) on the same genotype. Relatively low mapping precision. Design a set of inbred mouse strains. Genotype once. Study multiple phenotypes on the same genotype.

Recombinant inbred lines

AXB/BXA panel

AXB/BXA panel

The usual analysis Calculate the phenotype average within each strain. Use these strain averages for QTL mapping as with a backcross (taking account of the map expansion in RILs). Can we do better? With the above data: Ave. no. mice per strain = 15.8 (SD = 8.4) Range of no. mice per strain = 3 – 39

A simple model for RILs ysi =  +  xs + s + si xs = 0 or 1, according to genotype at putative QTL s = strain (polygenic) effect ~ normal(0, ) si = residual environment effect ~ normal(0, )

RIL analysis If and were known: Work with the strain averages, Weight by Equivalently, weight by where Equal ns: The usual analysis is fine. h2 large: Weight the strains equally. h2 small: Weight the strains by ns.

LOD curves

Chr 7 and 19

Recombination fractions

Chr 7 and 19

RI lines Advantages Disadvantages Each strain is a eternal resource. Only need to genotype once. Reduce individual variation by phenotyping multiple individuals from each strain. Study multiple phenotypes on the same genotype. Greater mapping precision. Disadvantages Time and expense. Available panels are generally too small (10-30 lines). Can learn only about 2 particular alleles. All individuals homozygous.

The RIX design

Heterogeneous stock McClearn et al. (1970) Mott et al. (2000); Mott and Flint (2002) Start with 8 inbred strains. Randomly breed 40 pairs. Repeat the random breeding of 40 pairs for each of ~60 generations (30 years). The genealogy (and protocol) is not completely known.

Heterogeneous stock

Heterogeneous stock Advantages Disadvantages Great mapping precision. Learn about 8 alleles. Disadvantages Time. Each individual is unique. Need extremely dense markers.

The “Collaborative Cross”

Genome of an 8-way RI

Genome of an 8-way RI

Genome of an 8-way RI

Genome of an 8-way RI

Genome of an 8-way RI

The “Collaborative Cross” Advantages Great mapping precision. Eternal resource. Genotype only once. Study multiple invasive phenotypes on the same genotype. Barriers Advantages not widely appreciated. Ask one question at a time, or Ask many questions at once? Time. Expense. Requires large-scale collaboration.

To be worked out Breakpoint process along an 8-way RI chromosome. Reconstruction of genotypes given multipoint marker data. Single-QTL analyses. Mixed models, with random effects for strains and genotypes/alleles. Power and precision (relative to an intercross).

Haldane & Waddington 1930 r = recombination fraction per meiosis between two loci Autosomes Pr(G1=AA) = Pr(G1=BB) = 1/2 Pr(G2=BB | G1=AA) = Pr(G2=AA | G1=BB) = 4r / (1+6r) X chromosome Pr(G1=AA) = 2/3 Pr(G1=BB) = 1/3 Pr(G2=BB | G1=AA) = 2r / (1+4r) Pr(G2=AA | G1=BB) = 4r / (1+4r) Pr(G2  G1) = (8/3) r / (1+4r)

8-way RILs Autosomes X chromosome Pr(G1 = i) = 1/8 Pr(G2 = j | G1 = i) = r / (1+6r) for i  j Pr(G2  G1) = 7r / (1+6r) X chromosome Pr(G1=AA) = Pr(G1=BB) = Pr(G1=EE) = Pr(G1=FF) =1/6 Pr(G1=CC) = 1/3 Pr(G2=AA | G1=CC) = r / (1+4r) Pr(G2=CC | G1=AA) = 2r / (1+4r) Pr(G2=BB | G1=AA) = r / (1+4r) Pr(G2  G1) = (14/3) r / (1+4r)

Acknowledgments Terry Speed, Univ. of California, Berkeley and WEHI Tom Brodnicki, WEHI Gary Churchill, The Jackson Laboratory Joe Nadeau, Case Western Reserve Univ.